FILTERS

FILTERS define the filter conditions on the corpus of documents to be searched. FILTERS let you filter on various attributes of the documents indexed in Enterprise Archive. For example, you can filter on specific participants in an email communication, on attachment size, and so on.

FILTERS narrow down the target documents by applying criteria to other attributes of the indexed documents. The filter types that can be applied to a policy are described in the sections that follow.

The following sample shows how filters are used. Note that there is an implicit AND across the filters:

{
"FILTERS": {
"PARTICIPANTS": {},
"PARTICIPANTS_COUNT": {},
"NETWORKS": []
}
}
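The FILTERS object can also be assembled programmatically before it is added to a policy. The following is a minimal Python sketch that uses only the standard json module; build_filters is a hypothetical helper, not part of Enterprise Archive. Because the filter types in one object are implicitly ANDed, the helper only needs to add keys.

```python
import json


def build_filters(participants=None, participants_count=None, networks=None):
    """Assemble a FILTERS object (hypothetical helper).

    Each filter type that is supplied becomes one key. Filter types in the
    same object are implicitly ANDed, so no explicit operator is added.
    """
    filters = {}
    if participants is not None:
        filters["PARTICIPANTS"] = participants
    if participants_count is not None:
        filters["PARTICIPANTS_COUNT"] = participants_count
    if networks is not None:
        filters["NETWORKS"] = list(networks)
    return {"FILTERS": filters}


policy = build_filters(
    participants_count={"TYPE": "external", "ROLE": "to", "RANGE": "1 TO 5"},
    networks=["lync", "aim"],
)
print(json.dumps(policy, indent=2))
```

Omitted filter types are simply left out of the object rather than sent as empty placeholders.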

PARTICIPANTS

The PARTICIPANTS filter type selects documents on the basis of participant attributes. Its JSON structure is the same as that of WORDSANDPHRASES, but instead of searching for terms, it filters documents on the basis of participant user IDs, endpoint display names, or other participant attributes.

The following PARTICIPANTS filter syntax representations follow the syntactic structures of WORDSANDPHRASES.

Syntax Representation 1

{
"FILTERS": {
"PARTICIPANTS": {
"MUSTALL": {},
"MUSTANY": {}
}
}
}

Syntax Representation 2

{
"FILTERS": {
"PARTICIPANTS": {
"MUSTANY": {}
}
}
}

Syntax Representation 3

{
"FILTERS": {
"PARTICIPANTS": {
"MUSTANY": {
"AND": [
{},
{},
{}
]
}
}
}
}

Syntax Representation 4

{
"FILTERS": {
"PARTICIPANTS": {
"MUSTALL": {
"OR": [
{},
{}
]
},
"MUSTANY": {
"AND": [
{},
{},
{}
]
}
}
}
}

The open-close braces {} in the representations above are placeholders for search criteria definitions.

The attributes that can be used in the PARTICIPANTS filter are:

  • ROLE - The valid ROLE values are from, to/cc/bcc, to, cc, and bcc. If a section has no ROLE, all roles apply.

  • The fields USERIDS, ENDPOINTIDS, GROUPS, DISPLAYNAMES, DOMAINS, DEPARTMENTS, DIVISIONS, BUILDINGS, CITIES, STATES, COUNTRIES, and SYSTEM_ATTRS are used to filter participants based on their attributes. Each of these fields accepts an array of values.
    Each element of the SYSTEM_ATTRS array is a JSON object with "name" and "value" fields. Example 1 and Example 2 show two valid sections.

Note

Values for ENDPOINTIDS, GROUPS, DISPLAYNAMES, and DOMAINS can contain wildcard characters; a term that uses a wildcard must begin with at least three characters.
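The three-character rule can be checked on the client side before a policy is saved. The following Python sketch assumes that "*" and "?" are the wildcard characters (this section does not list them) and interprets the rule as "at least three characters before the first wildcard"; has_valid_wildcard_prefix is a hypothetical helper.

```python
# Assumption: "*" and "?" are the wildcard characters; the exact set is not
# listed in this section.
WILDCARDS = "*?"


def has_valid_wildcard_prefix(term: str, min_prefix: int = 3) -> bool:
    """Return True when term has no wildcard, or has at least min_prefix
    characters before its first wildcard."""
    positions = [term.find(w) for w in WILDCARDS if w in term]
    if not positions:
        return True  # no wildcard, so the three-character rule does not apply
    return min(positions) >= min_prefix


for value in ["eng*", "en*", "fin?nce", "*.com", "Trade"]:
    print(value, has_valid_wildcard_prefix(value))
```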

Example 1
{
"ROLE" : "from",
"GROUPS" : ["eng", "fin"],
"COUNTRIES" : ["US"]
}
Example 2
{
"USERIDS": [
"id1",
"id2"
],
"SYSTEM_ATTRS": [
{
"name": "attr1",
"value": "val1"
},
{
"name": "attr2",
"value": "val2"
}
]
}

Each participant attribute field mentioned above, except for ROLE and SYSTEM_ATTRS, has a counterpart field with the same name suffixed with _LIST. For example, USERIDS_LIST, GROUPS_LIST and so on.

Each attribute suffixed with _LIST contains a list of references to value lists for the corresponding unsuffixed attribute. A library file with each referenced name must be present in the Supervision List Library. For more information, see Configuring List Library. Example 3 and Example 4 show the usage of _LIST.

Example 3
{
"USERIDS": [
"id1"
],
"USERIDS_LIST": [
"userid_list"
],
"GROUPS_LIST": [
"group_list1",
"group_list2"
],
"COUNTRIES_LIST": [
"EMEA"
]
}
Example 4
{
"USERIDS": [
"id1", "id2", "id3", "id4"
],
"GROUPS_LIST": [
"group_list1",
"group_list2"
],
"COUNTRIES_LIST": [
"EMEA"
]
}
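To see how a section with _LIST references could resolve into plain attribute values, the following Python sketch expands the references against an in-memory dictionary. LIST_LIBRARY and its contents are hypothetical stand-ins for the Supervision List Library, and merging the list values into the unsuffixed attribute (a de-duplicated union) is an assumption about the semantics, not documented behavior.

```python
# Hypothetical stand-in for the Supervision List Library: list name -> values.
LIST_LIBRARY = {
    "userid_list": ["id7", "id8"],
    "group_list1": ["Trade"],
    "group_list2": ["Finance", "Trade"],
    "EMEA": ["GB", "DE", "FR"],
}


def expand_lists(section, library):
    """Merge every FIELD_LIST reference into FIELD (assumed union semantics)."""
    expanded = {}
    for key, value in section.items():
        if not isinstance(value, list):
            expanded[key] = value  # e.g. ROLE is a plain string
            continue
        if key.endswith("_LIST"):
            base = key[: -len("_LIST")]
            # A KeyError here means the referenced list file is missing.
            items = [item for name in value for item in library[name]]
        else:
            base, items = key, value
        merged = expanded.setdefault(base, [])
        for item in items:
            if item not in merged:  # keep first occurrence, drop duplicates
                merged.append(item)
    return expanded


# The section from Example 3 above.
example3 = {
    "USERIDS": ["id1"],
    "USERIDS_LIST": ["userid_list"],
    "GROUPS_LIST": ["group_list1", "group_list2"],
    "COUNTRIES_LIST": ["EMEA"],
}
print(expand_lists(example3, LIST_LIBRARY))
```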

Using MUSTX and Boolean Operations within the PARTICIPANTS Filter

A policy with a PARTICIPANTS filter can be refined by applying MUSTX occurrences (MUSTALL, MUSTANY, MUSTNOT) to the section. The MUSTX occurrence criterion applies not only across the values of one participant attribute but also across different participant attributes. The Boolean operators AND and OR allow further flexibility.

Note

Except for ROLE, the attribute filters in a section are applied to the set of participants independently of one another. ROLE is the only attribute that every matching participant must satisfy; no participant is required to satisfy more than one of the other attribute filters.

Example 1

Enterprise Archive filters documents whose participants satisfy all of the following criteria:

  • No "from" participant may be from the "amazon.com" or "google.com" domains.

  • There must be at least one "to_cc_bcc" participant for each of the groups "Trade" and "Finance" and for the country "US". The participant from "Trade" or "Finance" need not be the same as the one from "US".

  • The document must be sent by at least one participant from one of the APAC or EMEA countries.

{
"PARTICIPANTS": {
"MUSTNOT": {
"ROLE": "from",
"DOMAINS": [
"amazon.com",
"google.com"
]
},
"MUSTALL": {
"ROLE": "to_cc_bcc",
"GROUPS": [
"Trade",
"Finance"
],
"COUNTRIES": [
"US"
]
},
"MUSTANY": {
"ROLE": "from",
"COUNTRIES_LIST": [
"APAC",
"EMEA"
]
}
}
}

Example 2

Demonstrates the use of the AND operation and selects documents that satisfy all of the following criteria:

  • At least one participant is from "Trade" or "Finance" group.

  • At least one participant is from "APAC" or "EMEA" region.

{
"PARTICIPANTS": {
"MUSTANY": {
"AND": [
{
"GROUPS": [
"Trade",
"Finance"
]
},
{
"COUNTRIES_LIST": [
"APAC",
"EMEA"
]
}
]
}
}
}

Example 3

Demonstrates the use of OR and selects documents that satisfy at least one of the following criteria:

  • At least one participant from "Trade" and one participant from "Finance".

  • At least one participant from "APAC" and one participant from "EMEA".

{
"PARTICIPANTS": {
"MUSTALL": {
"OR": [
{
"GROUPS": [
"Trade",
"Finance"
]
},
{
"COUNTRIES_LIST": [
"APAC",
"EMEA"
]
}
]
}
}
}
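The occurrence rules described above can be modeled in a few lines to reason about what a section will match. The following Python sketch is an illustrative model inferred from Examples 1 to 3, not the Enterprise Archive implementation: each (attribute, value) pair is checked independently against participants whose role matches, MUSTALL needs every pair, MUSTANY needs at least one, MUSTNOT needs none, and AND/OR combine sub-sections under the same occurrence. The participant dictionaries are hypothetical, _LIST references are not expanded, and both spellings of the combined role used in this section (to/cc/bcc and to_cc_bcc) are accepted.

```python
TO_CC_BCC = {"to", "cc", "bcc"}


def role_matches(section_role, participant_role):
    """A missing ROLE means all roles; to/cc/bcc (or to_cc_bcc) covers three roles."""
    if section_role is None:
        return True
    if section_role in ("to/cc/bcc", "to_cc_bcc"):
        return participant_role in TO_CC_BCC
    return section_role == participant_role


def conditions(section):
    """Every (attribute, value) pair in the section except ROLE."""
    return [(attr, value) for attr, values in section.items()
            if attr != "ROLE" for value in values]


def condition_met(attr, value, role, participants):
    """True if some participant with a matching role has this value.
    Different pairs may be met by different participants."""
    return any(role_matches(role, p["role"]) and value in p.get(attr, [])
               for p in participants)


def evaluate(occurrence, section, participants):
    """Evaluate one MUSTALL/MUSTANY/MUSTNOT section, with optional AND/OR."""
    if "AND" in section:
        return all(evaluate(occurrence, s, participants) for s in section["AND"])
    if "OR" in section:
        return any(evaluate(occurrence, s, participants) for s in section["OR"])
    role = section.get("ROLE")
    hits = [condition_met(a, v, role, participants) for a, v in conditions(section)]
    if occurrence == "MUSTALL":
        return all(hits)
    if occurrence == "MUSTANY":
        return any(hits)
    if occurrence == "MUSTNOT":
        return not any(hits)
    raise ValueError(f"unknown occurrence: {occurrence}")


def participants_filter(filter_obj, participants):
    """The MUSTX sections of one PARTICIPANTS filter are ANDed together."""
    return all(evaluate(occ, sec, participants) for occ, sec in filter_obj.items())


# Example 1 above, minus the COUNTRIES_LIST section (lists are not expanded here).
example1 = {
    "MUSTNOT": {"ROLE": "from", "DOMAINS": ["amazon.com", "google.com"]},
    "MUSTALL": {"ROLE": "to_cc_bcc", "GROUPS": ["Trade", "Finance"], "COUNTRIES": ["US"]},
}
participants = [
    {"role": "from", "DOMAINS": ["example.com"]},
    {"role": "to", "GROUPS": ["Trade"], "COUNTRIES": ["GB"]},
    {"role": "cc", "GROUPS": ["Finance"], "COUNTRIES": ["US"]},
]
print(participants_filter(example1, participants))
```

Here the "Trade" requirement is met by one participant and the "Finance" and "US" requirements by another, which mirrors the note that no participant has to satisfy more than one attribute filter.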

PARTICIPANTS_COUNT

This filter allows you to search documents based on the number of participants.

Valid Values

TYPE

  • external

  • internal

  • all (the default when the TYPE field is absent)

ROLE

from, to/cc/bcc, to, cc, and bcc.

RANGE

  • "1 TO 5" (at least one space is required on each side of "TO")

  • "> 10", ">= 10"

  • "= 10"

  • "< 10", "<= 10"
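The RANGE forms above are simple enough to validate or evaluate outside Enterprise Archive. The following Python sketch turns a RANGE string into a predicate on a whole-number count; parse_range is a hypothetical helper. Treating both bounds of "TO" as inclusive is an assumption, and spaces around the comparison operators are optional because other filters in this section use forms such as "<10".

```python
import re

# "low TO high" with at least one space on each side of TO.
_BETWEEN = re.compile(r"^\s*(\d+)\s+TO\s+(\d+)\s*$")
# A single comparison; >= and <= are listed first so they win over > and <.
_COMPARE = re.compile(r"^\s*(>=|<=|>|<|=)\s*(\d+)\s*$")


def parse_range(expr: str):
    """Return a predicate that tests a whole-number count against RANGE."""
    m = _BETWEEN.match(expr)
    if m:
        low, high = int(m.group(1)), int(m.group(2))
        return lambda n: low <= n <= high  # assumption: both bounds inclusive
    m = _COMPARE.match(expr)
    if m:
        op, bound = m.group(1), int(m.group(2))
        return {
            ">": lambda n: n > bound,
            ">=": lambda n: n >= bound,
            "=": lambda n: n == bound,
            "<": lambda n: n < bound,
            "<=": lambda n: n <= bound,
        }[op]
    raise ValueError(f"unsupported RANGE: {expr!r}")


in_range = parse_range("1 TO 5")
print([n for n in range(8) if in_range(n)])
```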

Example
"PARTICIPANTS_COUNT": {
"TYPE": "external",
"ROLE": "to",
"RANGE": "1 TO 5"
}

Result

Documents whose to field contains the email addresses of one to five external users are filtered.

COMMUNICATION_TYPES

This filter allows you to search documents by communication type. The communication types in Enterprise Archive are as follows:

  • Internal Only

  • External Only

  • External Inbound

  • External Mixed Outbound

  • External Only Outbound

  • External Bi-directional

Example
"COMMUNICATION_TYPES": [
"internal only",
"external only",
"external inbound",
"external mixed outbound",
"external only outbound",
"external bi-directional"
]

Result

Because the example lists every communication type, documents with any communication type are filtered.

NETWORKS

This filter allows you to search documents by network type, such as Lync, AIM, SharePoint, Yammer, and so on.

"FILTERS": {
"NETWORKS": [
"lync",
"aim",
"sharepoint"
]
}

Result

Documents from Lync, AIM, and SharePoint are filtered.

CHANNELS

This filter allows you to search documents by channel type, such as instant messaging (IM), chat, email, and so on.

"FILTERS": {
"CHANNELS": [
"im",
"chat",
"email"
]
}

Result

Documents from the IM, chat, and email channels are filtered.

ACTION_EVENTS

This filter allows you to search documents using the following Action Events:

  • block

  • moderate

  • add x-headers

  • modify subject

  • add disclaimer

  • copy message

  • move message

  • notify message

  • notify originator

  • update recipients

  • archive

  • review

"FILTERS": {
"ACTION_EVENTS": [
"block",
"moderate",
"archive",
"review"
]
}
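A policy with a misspelled event name may not match as intended, so it can help to check the requested values against the documented list first. The following Python sketch copies the ACTION_EVENTS values listed above; unknown_values is a hypothetical helper, and comparing case-insensitively is an assumption (the COMMUNICATION_TYPES example uses lowercase values for Title Case names).

```python
# Values copied from the ACTION_EVENTS list above.
ACTION_EVENTS = {
    "block", "moderate", "add x-headers", "modify subject", "add disclaimer",
    "copy message", "move message", "notify message", "notify originator",
    "update recipients", "archive", "review",
}


def unknown_values(requested, allowed):
    """Return the requested values that are not in allowed.
    Case-insensitive comparison is an assumption."""
    allowed_lower = {a.lower() for a in allowed}
    return [v for v in requested if v.lower() not in allowed_lower]


print(unknown_values(["block", "moderate", "archive", "review"], ACTION_EVENTS))
print(unknown_values(["Block", "quarantine"], ACTION_EVENTS))
```

The same helper works for POLICY_EVENTS or COMMUNICATION_TYPES by passing the corresponding documented values.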

Result

Documents with the block, moderate, archive, or review action events are filtered.

POLICY_EVENTS

This filter allows you to search documents using the following Policy Events:

  • permissions

  • blocked content

  • ethical wall

  • flagged content

  • challenged content

  • infected content

  • moderated content

  • altered content

  • size limit exceeded

  • content type

  • miscellaneous

"FILTERS": {
"POLICY_EVENTS": [
"permissions",
"blocked content",
"moderated content"
]
}

Result

All documents that are categorized under permissions, blocked content, and moderated content are filtered.

FILE_SIZE

This filter allows you to filter documents according to file size. Valid values of UNIT are B (default), KB, MB, and GB. The syntax for RANGE is the same as in PARTICIPANTS_COUNT.

"FILTERS": {
"FILE_SIZE": {
"RANGE": "<10",
"UNIT": "MB"
}
}
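When comparing a FILE_SIZE or NATIVE_SIZE filter with sizes reported in bytes, it can help to normalize the RANGE. The following Python sketch rewrites the numbers in a RANGE into bytes; range_in_bytes is a hypothetical helper. The document does not say whether KB, MB, and GB are binary or decimal multiples, so the 1024-based factors below are an assumption, and only whole numbers are handled.

```python
import re

# Assumption: binary multiples (1 KB = 1024 B); the document does not specify.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def range_in_bytes(size_filter: dict) -> str:
    """Rewrite the whole numbers in a FILE_SIZE/NATIVE_SIZE RANGE as bytes."""
    factor = UNIT_BYTES[size_filter.get("UNIT", "B")]  # UNIT defaults to B
    return re.sub(r"\d+", lambda m: str(int(m.group()) * factor), size_filter["RANGE"])


print(range_in_bytes({"RANGE": "<10", "UNIT": "MB"}))
print(range_in_bytes({"RANGE": "1 TO 5", "UNIT": "KB"}))
print(range_in_bytes({"RANGE": ">= 500"}))
```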

Result

Documents with attachments smaller than 10 MB are filtered.

FILE_COUNT

This filter allows you to filter documents based on file count. The syntax for RANGE is the same as in PARTICIPANTS_COUNT.

"FILTERS": {
"FILE_COUNT": {
"RANGE": "<10"
}
}

Result

Documents with fewer than 10 attachments are filtered.

FILE_NAMES

This filter allows you to filter documents based on file names. The syntax for file names is similar to the TERM syntax; that is, regular expressions and wildcards can be used.

Note

File names can contain wildcard characters; a term that uses a wildcard must begin with at least three characters.

"FILTERS": {
"FILE_NAMES": ["listofdocs.txt"]
}

Result

Documents with an attachment named listofdocs.txt are filtered.

FILE_EXTENSIONS

This filter allows you to filter documents based on the file extensions such as TXT, PDF, DOC, XLS, and so on.

Note

File extensions can contain wildcard characters; a term that uses a wildcard must begin with at least three characters.

"FILTERS": {
"FILE_EXTENSIONS": ["txt"]
}

Result

Documents with .txt attachments are filtered.

FILE_CONTENT_TYPES

This filter allows you to filter documents that have at least one attachment whose content type matches one of the specified names. Attachments are categorized under the following high-level content types:

  • application

  • audio

  • font

  • image

  • message

  • model

  • multipart

  • text

  • video

Note

File content types can contain wildcard characters; a term that uses a wildcard must begin with at least three characters.

"FILTERS": {
"FILE_CONTENT_TYPES": ["application", "audio"]
}

Result

Documents with attachments that are categorized under application or audio types are filtered.

DATE

This filter allows you to filter documents based on the specified date range. Each date criterion has the following fields:

DATE_TYPE

DATE_TYPE can be one of the following:

  • Archived

  • Sent

  • Processed

Multiple date criteria with the same date type result in the union of the documents from each criterion; that is, there is an implicit "OR" between date criteria with the same date type.

Date criteria with different date types select only the documents common to all of those criteria; that is, there is an implicit "AND" between date criteria with different date types.

RANGE_TYPE

RANGE_TYPE can be one of the following:

  • Between

  • Before

  • After

FROM

Represents the start date of the date criterion. FROM must be used with the Between and After range types. See Example 2.

TO

Represents the end date of the date criterion. TO must be used with the Between and Before range types. See Example 3.

Date Formats Supported

Date mentioned in the FROM and TO fields can be in any of the following formats:

  • Epoch time in milliseconds.

  • "yyyy-MM-dd'T'HH:mm:ss" // 2019-04-02T15:03:00 (time zone supplied by the system)

  • "yyyy-MM-dd'T'HH:mm:ssZ" // 2019-04-02T15:03:00+0800, 2019-04-02T15:03:00-0800

  • "yyyy-MM-dd'T'HH:mm:ssZZ" // 2019-04-02T15:03:00+05:30, 2019-04-02T15:03:00-08:00

  • "yyyy-MM-dd'T'HH:mm:ss.SSS" // 2019-04-02T15:03:00.999 (with milliseconds)

  • "yyyy-MM-dd'T'HH:mm:ss.SSSZ" // 2019-04-02T15:03:00.999+0530, 2019-04-02T15:03:00.999-0800

  • "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" // 2019-04-02T15:03:00.999+05:30, 2019-04-02T15:03:00.999-08:00
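To check that a FROM or TO value is in one of the supported formats before it is placed in a policy, the formats can be mapped to Python strptime directives (fractional seconds to %f, numeric offsets with or without a colon to %z, which needs Python 3.7 or later). The following is a minimal sketch; parse_date is a hypothetical helper, and using UTC in place of the "time zone supplied by the system" for values without an offset is an assumption.

```python
from datetime import datetime, timezone

# strptime equivalents of the documented patterns, most specific first.
_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%f%z",  # 2019-04-02T15:03:00.999+05:30 or -0800
    "%Y-%m-%dT%H:%M:%S.%f",    # 2019-04-02T15:03:00.999
    "%Y-%m-%dT%H:%M:%S%z",     # 2019-04-02T15:03:00+0800 or -08:00
    "%Y-%m-%dT%H:%M:%S",       # 2019-04-02T15:03:00
]


def parse_date(value, default_tz=timezone.utc):
    """Parse a FROM/TO value into a timezone-aware datetime.

    Integers are treated as epoch milliseconds. default_tz stands in for the
    system time zone when the string has no offset (assumption).
    """
    if isinstance(value, int):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    for fmt in _FORMATS:
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next documented format
        return dt if dt.tzinfo else dt.replace(tzinfo=default_tz)
    raise ValueError(f"unsupported date format: {value!r}")


print(parse_date("2019-04-02T15:03:00+0800").isoformat())
```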

Example 1

Filters documents archived from "2009-10-12T12:10:30" to "2009-10-14T11:20:30", both inclusive.

{
"DATE": [
{
"DATE_TYPE": "Archived",
"RANGE_TYPE": "Between",
"FROM": "2009-10-12T12:10:30",
"TO": "2009-10-14T11:20:30"
}
]
}

Example 2

Filters documents archived either from "2009-10-12T12:10:30" to "2009-10-14T11:20:30" or after "2015-10-12T00:00:00".

{
"DATE": [
{
"DATE_TYPE": "Archived",
"RANGE_TYPE": "Between",
"FROM": "2009-10-12T12:10:30",
"TO": "2009-10-14T11:20:30"
},
{
"DATE_TYPE": "Archived",
"RANGE_TYPE": "After",
"FROM": "2015-10-12T00:00:00"
}
]
}

Example 3

Filters documents sent before "2009-10-11T00:00:00" but archived from "2009-10-12T00:10:00" to "2009-10-14T00:00:00".

"FILTERS": {
"DATE": [
{
"DATE_TYPE": "Archived",
"RANGE_TYPE": "Between",
"FROM": "2009-10-12T00:10:00",
"TO": "2009-10-14T00:00:00"
},
{
"DATE_TYPE": "Sent",
"RANGE_TYPE": "Before",
"TO": "2009-10-11T00:00:00"
}
]
}
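The rule that date criteria are ORed within a date type and ANDed across date types can be modeled directly. The following Python sketch groups criteria by DATE_TYPE and tests a document's dates against them; date_filter_matches, criterion_matches, and the doc_dates mapping are hypothetical. Between is inclusive of both ends, as Example 1 states; treating Before and After as strict comparisons is an assumption, and only offset-free ISO strings are handled.

```python
from collections import defaultdict
from datetime import datetime


def criterion_matches(criterion, when):
    """Check one date criterion against a naive datetime.

    Between includes both ends (per Example 1). Strict Before/After is an
    assumption.
    """
    kind = criterion["RANGE_TYPE"].lower()
    start = datetime.fromisoformat(criterion["FROM"]) if "FROM" in criterion else None
    end = datetime.fromisoformat(criterion["TO"]) if "TO" in criterion else None
    if kind == "between":
        return start <= when <= end
    if kind == "after":
        return when > start
    if kind == "before":
        return when < end
    raise ValueError(f"unknown RANGE_TYPE: {criterion['RANGE_TYPE']}")


def date_filter_matches(criteria, doc_dates):
    """OR within one DATE_TYPE, AND across different DATE_TYPEs.

    doc_dates maps a DATE_TYPE such as "Sent" to the document's datetime.
    """
    by_type = defaultdict(list)
    for criterion in criteria:
        by_type[criterion["DATE_TYPE"]].append(criterion)
    return all(
        any(criterion_matches(c, doc_dates[date_type]) for c in group)
        for date_type, group in by_type.items()
    )


# The criteria from the Example 3 JSON above.
example3 = [
    {"DATE_TYPE": "Archived", "RANGE_TYPE": "Between",
     "FROM": "2009-10-12T00:10:00", "TO": "2009-10-14T00:00:00"},
    {"DATE_TYPE": "Sent", "RANGE_TYPE": "Before", "TO": "2009-10-11T00:00:00"},
]
doc = {"Archived": datetime(2009, 10, 13), "Sent": datetime(2009, 10, 10)}
print(date_filter_matches(example3, doc))
```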

NATIVE_SIZE

This filter allows you to filter documents on the basis of the size of the original interaction transcript ingested in Enterprise Archive. Valid values of UNIT are B (default), KB, MB, and GB. The syntax for RANGE is the same as in PARTICIPANTS_COUNT.

"FILTERS": {
"NATIVE_SIZE": {
"RANGE": "<10",
"UNIT": "MB"
}
}

Result

Documents with a native file size smaller than 10 MB are filtered.

IMPORTANT

The following filters are intended for users who are familiar with the fields of documents indexed in the Enterprise Archive Index Store. Contact Smarsh Support to use these filters.

CUSTOM_MAIN

This filter allows the user to use Lucene syntax for the MAIN area of the document.

CUSTOM_FILES

This filter allows the user to use Lucene syntax for the FILE area of the document.

Using Boolean Operations in Filters

Multiple groups of filter types (each group in one JSON object) can be logically combined with the Boolean operators AND, OR, and NOT. These operators can be nested to any depth to express complex scenarios.

Example 1
{
"FILTERS": {
"OR": [
{
"COMMUNICATION_TYPES": [
"external mixed outbound"
],
"NETWORKS": [
"lync"
]
},
{
"COMMUNICATION_TYPES": [
"external inbound"
],
"NETWORKS": [
"aim"
]
}
]
}
}

Example 1 represents an OR operation between two groups of filters. A document is selected if it satisfies either of the following conditions:

  • Communication type is external mixed outbound AND Network is lync.

  • Communication type is external inbound AND Network is aim.

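Example 2 requires that the selected documents contain attachments with the pdf or doc extension AND also satisfy an OR condition similar to that of Example 1, except that its second group also accepts the internal only communication type.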

Example 2
{
"FILTERS": {
"AND": [
{
"FILE_EXTENSIONS": [
"pdf",
"doc"
]
},
{
"OR": [
{
"COMMUNICATION_TYPES": [
"external mixed outbound"
],
"NETWORKS": [
"lync"
]
},
{
"COMMUNICATION_TYPES": [
"internal only",
"external inbound"
],
"NETWORKS": [
"aim"
]
}
]
}
]
}
}
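To reason about how nested AND/OR groups select documents, the following Python sketch evaluates a filter group against a hypothetical document record. It is an illustrative model, not the Enterprise Archive engine: keys in one group are ANDed, AND/OR take lists of nested groups, and a list-valued filter matches when at least one of the document's values appears in the filter's list (assumed semantics). NOT is omitted because its exact JSON shape is not shown in this section, and the filter below uses the COMMUNICATION_TYPES key that the rest of this section uses.

```python
def group_matches(group, doc):
    """Return True if doc satisfies a filter group (illustrative model)."""
    for key, value in group.items():
        if key == "AND":
            ok = all(group_matches(g, doc) for g in value)
        elif key == "OR":
            ok = any(group_matches(g, doc) for g in value)
        else:
            # List filter: some value of the document appears in the list.
            ok = any(v in value for v in doc.get(key, []))
        if not ok:
            return False  # keys within one group are implicitly ANDed
    return True


example2 = {
    "FILTERS": {
        "AND": [
            {"FILE_EXTENSIONS": ["pdf", "doc"]},
            {"OR": [
                {"COMMUNICATION_TYPES": ["external mixed outbound"], "NETWORKS": ["lync"]},
                {"COMMUNICATION_TYPES": ["internal only", "external inbound"], "NETWORKS": ["aim"]},
            ]},
        ]
    }
}
# Hypothetical document: each filter type maps to the document's own values.
doc = {"COMMUNICATION_TYPES": ["external inbound"], "NETWORKS": ["aim"],
       "FILE_EXTENSIONS": ["pdf", "txt"]}
print(group_matches(example2["FILTERS"], doc))
```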