Understanding Enterprise Archive Data

Introduction

This section explains the states a document passes through during its archiving lifecycle in Enterprise Archive. Understanding these states helps application users interpret the Archive Metrics and reporting data that is generated.

Archive Management Dashboards use a Smarsh reporting framework to analyze and represent data through visualizations such as graphs and pie charts. The visualizations under each dashboard compile the following point-in-time statistical data based on Archive Management activities:

  • Data received at gateways and the archive

  • Data purged based on retention policies

What is a Document?


In the context of metrics, a document refers to a single ingested copy of any type of communication, such as an email, a chat message transcript, or a collaborative message. This count cannot be reconciled with UI search results, because documents can be iterative in nature due to updates such as replies to an email or updates to an IM conversation.

What is a Snapshot?


A Snapshot refers to a discoverable document that can be searched for using Enterprise Archive UI search. When a document is created, modified, or deleted, a Snapshot is created for the event.

Archive Metrics

Archive metrics data represents documents that have been ingested into Enterprise Archive. A document can be in any of the following states:

Received

This state represents documents that have been successfully ingested into Enterprise Archive. Rejected documents are not included in this count.

Rejected

This state indicates that the document was rejected due to an authentication failure, a schema validation failure, a breach of size limits, or a system error.

Important

To learn more about rejections and response codes, refer to the related topic. Any response code other than 200 indicates a rejection.
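As a minimal sketch of the rule above, the following Python snippet treats any response code other than 200 as a rejection. The function name `is_rejected` and the sample codes are assumptions made for illustration; they are not part of any Enterprise Archive API.

```python
def is_rejected(response_code: int) -> bool:
    """Return True when the response code indicates a rejection.

    Per the rule above, only 200 means the document was accepted.
    """
    return response_code != 200


# Sample codes chosen for illustration only.
for code in (200, 401, 413, 500):
    status = "rejected" if is_rejected(code) else "received"
    print(f"{code}: {status}")
```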

Duplicate

This state represents documents with duplicate transcripts.

Processing

This state represents the following types of documents:

  • Queued and waiting to be processed.

  • Could not be processed and are awaiting re-processing in Enterprise Archive or Email Gateway.

  • Could not be processed for various reasons, such as temporary failures or large interactions affecting throughput.

Archived

This state indicates that the document has passed all processing and is successfully archived in the system.

The document count in the Archived visualization will not match the count of Archive search results.

This is because the visualization displays the count of documents that are ingested and archived, whereas the search results represent the count of snapshots, not individual documents. For collaboration network data, the document count in the dashboard will never match the Archive search result count.

Discoverable

This state counts Snapshots rather than documents. This count can be matched against the archive search listing count in the UI.

Purged

This state indicates that the document has been successfully purged or disposed of from the system.

How to view the number of Archived documents?


You can view the number of Archived documents in the system by date range by selecting Daily Stats from the Archive Management Dashboards menu as shown in the image below:

[Image: Daily Stats screenshot]

How to identify Archived Raw and Gateway Archived Raw documents?


Archived Raw/Gateway Archived Raw documents are documents that could not be processed in Enterprise Archive/Email Gateway for more than 48 hours due to failures and are therefore stored in their native XML/EML formats. They can be identified by using the Advanced Query search field in Enterprise Archive.

[Image: Archived Raw concepts]
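The 48-hour threshold described above can be sketched as a simple elapsed-time check. This is only an illustration: the function name `should_store_raw` and the idea of measuring from the first processing failure are assumptions, since the text does not say exactly how the 48 hours are measured.

```python
from datetime import datetime, timedelta

# Threshold taken from the text above: more than 48 hours of failed processing.
RAW_THRESHOLD = timedelta(hours=48)


def should_store_raw(first_failure: datetime, now: datetime) -> bool:
    """Return True once processing has been failing for more than 48 hours.

    Measuring from the first failure is an assumption made for this sketch.
    """
    return now - first_failure > RAW_THRESHOLD


first_failure = datetime(2024, 1, 1, 9, 0)
print(should_store_raw(first_failure, datetime(2024, 1, 2, 9, 0)))   # 24 hours elapsed
print(should_store_raw(first_failure, datetime(2024, 1, 3, 10, 0)))  # 49 hours elapsed
```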

Operator: dstate

Description: Displays search results related to a snapshot’s data state, which can be any of the following allowed values:

  • archived_raw

  • Gateway_Archived_Raw

  • Gateway_Archived_Raw_No_Metadata

Syntax: To search for documents in the Archived Raw state, enter the following query in the Advanced Search field:

dstate: "archived_raw"
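As a minimal sketch, the query above can be generated for any of the allowed values listed in this section. The helper name `build_dstate_query` is hypothetical, and the snippet only assembles the query text; it does not call Enterprise Archive. Matching the values exactly as written, including case, is also an assumption.

```python
# Allowed values as listed in the text above.
ALLOWED_DSTATE_VALUES = (
    "archived_raw",
    "Gateway_Archived_Raw",
    "Gateway_Archived_Raw_No_Metadata",
)


def build_dstate_query(value: str) -> str:
    """Return an Advanced Search query string for the given data state."""
    if value not in ALLOWED_DSTATE_VALUES:
        raise ValueError(f"Unsupported dstate value: {value!r}")
    return f'dstate: "{value}"'


print(build_dstate_query("archived_raw"))
```

The resulting string can then be pasted into the Advanced Search field.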

After Gateway_Archived_Raw and Archived_Raw documents are successfully reprocessed, they move to the Archived state and can be found with a normal search. However, the Gateway_Archived_Raw and Archived_Raw copies of the document remain available in the system, and users can search for them using the above query. Enterprise Archive stores them as two separate snapshots. Enterprise Archive does not extract attachments from Gateway_Archived_Raw and Archived_Raw messages.

How to identify Fatal documents?


The Fatal state indicates a permanent failure or a source error. Such documents are not available for reprocessing and must be resent by the point products for archiving. Fatal documents cannot be searched for or identified from within Enterprise Archive. The Daily Health Check Alert contains the count of Fatal documents. No notifications are sent for Fatal documents.

Note

Daily Health Check is not a standard offering from Enterprise Archive. It is set up only on request; contact Smarsh Support.