Stop Words

Stop Words are a list of some of the most commonly used words in the English language. When a communication is indexed, Stop Words are ignored and excluded from the index; consequently, those specific words cannot be searched for.
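The effect of dropping Stop Words at index time can be illustrated with a minimal sketch. It is not Enterprise Archive's implementation: the function names build_index and search, the simple tokenizer, and the sample documents are all hypothetical, and the Stop Word set is the 31-word list this article describes.

```python
import re
from collections import defaultdict

# The 31 Stop Words named in this article; "for" and "not" are absent
# because Enterprise Archive indexes them.
STOP_WORDS = frozenset(
    "a an and are as at be but by if in into is it no of on or such that "
    "the their then there these they this to was will with".split()
)


def build_index(documents):
    """Map each non-Stop Word term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Hypothetical tokenizer: lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            if term not in STOP_WORDS:  # Stop Words never reach the index
                index[term].add(doc_id)
    return index


def search(index, term):
    """Return the ids of documents indexed under a single term."""
    return index.get(term.lower(), set())


docs = {1: "This is the final offer", 2: "Not for resale"}
index = build_index(docs)
print(search(index, "the"))    # empty set: "the" was never indexed
print(search(index, "offer"))  # {1}
print(search(index, "not"))    # {2}: "not" is indexed
```

Because a Stop Word is never written to the index, a single-word search for it has nothing to match, whichever documents contain it.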

If a Stop Word is used as part of a phrase in a search or a policy lexicon entry, Enterprise Archive substitutes an ANY token as a placeholder for that word. Any word that appears in the placeholder position still produces a match, provided the other non-Stop Words in the phrase are present.

The following table lists the 31 words that Enterprise Archive treats as Stop Words and therefore excludes from indexing.

a

an

and

are

as

at

be

but

by

if

in

into

is

it

no

of

on

or

such

that

the

their

then

there

these

they

this

to

was

will

with


Note

Although Elastic recommends treating 33 words (including "for" and "not") as Stop Words, Enterprise Archive indexes "for" and "not" because of their general importance in phrase searches.

However, for applications such as Enterprise Conduct, an even smaller list of customized Stop Words can be configured based on your organization’s agreement with Smarsh. For clarification, contact Smarsh Support.

If the search phrase "are you sure" is used, a match occurs as long as any character, number, or word appears before "you sure". Valid matches for "are you sure" include the following:

  • "aren’t you sure"

  • "boy you sure"

  • "but you sure"

  • "I hear you sure" and so on.

Additional Examples

Search Phrase

"Cannot accept this"

Because "this" is a Stop Word, the search returns documents containing "Cannot accept ANY". All of the following would be flagged:

  • “Cannot accept this”

  • “Cannot accept money”

  • “Cannot accept anymore,” and so on.

Including “this” in the phrase adds nothing: “Cannot accept” is flagged every time it appears in a document, regardless of which word follows “accept.”

Search Phrase

"Totally in disbelief"

Because "in" is a Stop Word, the search returns documents containing "Totally ANY disbelief". All of the following would be flagged:

  • “Totally in disbelief”

  • “Totally without disbelief”

  • “Totally Absolute Disbelief,” and so on.

“Totally disbelief” will not be flagged, because the search also requires a word to fill the ANY placeholder between “Totally” and “disbelief”. In this case, the Stop Word enforces the structure of the phrase, but not the individual word.
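The phrase examples above can be checked with a small model of the ANY placeholder. This is a sketch under stated assumptions, not the actual search engine: the names to_pattern and phrase_matches, the ANY marker, and the tokenizer are hypothetical, and the model only captures the behavior described in this article (every Stop Word in the phrase must be matched by exactly one word, whatever that word is).

```python
import re

# The 31 Stop Words named in this article ("for" and "not" are indexed).
STOP_WORDS = frozenset(
    "a an and are as at be but by if in into is it no of on or such that "
    "the their then there these they this to was will with".split()
)

ANY = None  # hypothetical marker standing in for the ANY token


def tokenize(text):
    """Lowercase and split into words, keeping apostrophes inside a word."""
    return re.findall(r"[a-z0-9]+(?:['’][a-z0-9]+)*", text.lower())


def to_pattern(phrase):
    """Replace every Stop Word in the phrase with the ANY placeholder."""
    return [ANY if tok in STOP_WORDS else tok for tok in tokenize(phrase)]


def phrase_matches(phrase, document):
    """True if the document contains the phrase, with ANY matching one word."""
    pattern = to_pattern(phrase)
    tokens = tokenize(document)
    if not pattern:
        return False
    # Slide a window the length of the phrase across the document.
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(p is ANY or p == t for p, t in zip(pattern, window)):
            return True
    return False


print(to_pattern("Totally in disbelief"))  # ['totally', None, 'disbelief']
print(phrase_matches("Totally in disbelief", "Totally without disbelief"))  # True
print(phrase_matches("Totally in disbelief", "Totally disbelief"))          # False
print(phrase_matches("Cannot accept this", "Cannot accept money"))          # True
print(phrase_matches("are you sure", "aren’t you sure"))                    # True
```

In this model, the placeholder must be filled by some word, which is why “Totally disbelief” fails while “Totally without disbelief” matches.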


See also: Stop Words Hit Highlighting behavior in Enterprise Archive.