Enterprise Archive Exports

The export feature in Enterprise Archive lets you export entire communications or only the communication metadata displayed in the column view. To export search results, click the Export button in the results listing. You can run multiple exports in parallel by selecting and exporting different sets of items. This topic explains how to use the Export feature in Enterprise Archive.

What are the content types that can be exported?


  • Entire Communication Content: Export the communication content in the format of your choice. Multiple communications can be combined and exported into a single container. To know more, refer to the Exporting Communication Content topic given below.

  • Communication Metadata Content: Communication metadata refers to message attributes such as Snapshot ID, From, To, Network, Channel, and so on. To know more, refer to the Exporting Communication Metadata topic below.

  • Communication as Email Attachment: To download or view an offline copy of the communication from the message pane, use the application's one-click download feature. To know more, refer to the One-Click Download section.

Are there any limitations on the number of documents being exported?


The limit is 1 million documents per export. Clicking the Export All or Export All as CSV button on a search results page that contains more than 1 million documents displays the following error message:

No. of documents selected for exports is greater than 1 million docs, please refine the export criteria and try again.

This behavior is common across exports performed in Archive Management, Case Management, and Supervision applications.

This limitation is set to prevent outages on export.

Are there any file size limitations while exporting documents?


If the size of an exported package exceeds 1 GB, the package is divided into multiple chunks. This applies to both ZIP and PST files, and no chunk exceeds 1 GB in size.
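The chunking rule above can be turned into a quick estimate of how many files to expect. The following is a minimal sketch, not the product's implementation: the function name is hypothetical, and it assumes that 1 GB means 1024**3 bytes, which the documentation does not specify. Because documents are presumably not split across chunks, the real chunk count may be higher than this lower bound.

```python
# Assumption: 1 GB is treated as 1024**3 bytes; the documentation does not
# say whether the limit is binary or decimal.
CHUNK_LIMIT_BYTES = 1024 ** 3


def min_chunk_count(package_size_bytes: int) -> int:
    """Return the smallest number of chunks that can hold the package.

    This is a lower bound: it assumes every chunk is filled to the limit.
    """
    if package_size_bytes <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return (package_size_bytes + CHUNK_LIMIT_BYTES - 1) // CHUNK_LIMIT_BYTES


print(min_chunk_count(int(2.5 * 1024 ** 3)))  # 3
```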

Are there any limitations while exporting messages with inline images?


Certain image types, such as .webp, are not supported by Outlook or other mail clients. As a result, those inline images are not rendered when a message is exported from Enterprise Archive as an EML.

Are there any limitations in PDF exports?


The Export to PDF feature is unavailable for the Export All option. As a workaround, select all documents and click Export Selected.

Attachments within the document are exported in separate folders. Information on the list of attachments is appended as a slipsheet in the exported PDF. The slipsheet, located in the header section of the PDF, lists the number of attachments in the exported document along with the file name, extension, and size of each attachment.

images/download/attachments/83825435/slipsheet.png

Other limitations that are observed in an exported PDF document:

  • Foreign languages are currently not supported. Only documents in English can be exported.

  • Documents with more than 5 MB of body content cannot be exported. The size limit also applies when exporting multiple documents.

  • Images with broken hyperlinks do not appear.

  • Inline images with an HTTPS (SSL) link do not appear.

  • Images beyond the page size of 1500 x 1942 pixels (96 DPI) appear truncated.

  • Font colors are not preserved for interactions converted from RTF to HTML.

  • CSS tags appear for documents that contain inline images (with table styles).

  • Multiple unwanted lines are displayed.

  • Header links for meta.html files appear.

  • Data in Journal documents is printed twice.

  • Alignment issues are observed in certain documents and in documents containing inline calendars.

What are the resulting file types when documents are exported as Native?


When you export documents from the following media as Native, the resulting file type is as follows:

Original Media | Content Source | Resulting File Type
Facebook IM | Socialite | HTML
Facebook Posts | Socialite | HTML
LinkedIn | Socialite | HTML
Bloomberg IM | Vantage | HTML
Bloomberg IM | API | HTML
Bloomberg EML | Vantage | EML if classification is set as Email during ingestion; HTML if classification is any type other than Email
Bloomberg EML | API | EML if classification is set as Email during ingestion; HTML if classification is any type other than Email
Outlook | EGW | EML
MSG Files | ITM XML | MSG if the attribute under ITM XML is MS-OXProps-Version =<"1" "2" or "3"> MS-OXProps-Message ="<any value>"; otherwise EML
EML Files | ITM XML | EML
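The mapping in the table above can be expressed as a small lookup, which may help when validating an export programmatically. This is an illustrative sketch only: the function and parameter names are hypothetical, and the MSG rule reflects one reading of the table (both MS-OXProps attributes present, with the version being "1", "2", or "3").

```python
# Rows copied from the table above: (original media, content source) -> type.
FIXED_TYPES = {
    ("Facebook IM", "Socialite"): "HTML",
    ("Facebook Posts", "Socialite"): "HTML",
    ("LinkedIn", "Socialite"): "HTML",
    ("Bloomberg IM", "Vantage"): "HTML",
    ("Bloomberg IM", "API"): "HTML",
    ("Outlook", "EGW"): "EML",
    ("EML Files", "ITM XML"): "EML",
}


def native_file_type(media, source, classification=None,
                     oxprops_version=None, oxprops_message=None):
    """Return the expected Native export file type for one document."""
    key = (media, source)
    if key in FIXED_TYPES:
        return FIXED_TYPES[key]
    if media == "Bloomberg EML" and source in ("Vantage", "API"):
        # EML only when the document was classified as Email at ingestion.
        return "EML" if classification == "Email" else "HTML"
    if key == ("MSG Files", "ITM XML"):
        # Assumed reading: both attributes must be present for MSG output.
        if oxprops_version in ("1", "2", "3") and oxprops_message is not None:
            return "MSG"
        return "EML"
    raise KeyError(f"No Native mapping documented for {key}")


print(native_file_type("Bloomberg EML", "API", classification="Email"))  # EML
print(native_file_type("MSG Files", "ITM XML"))  # EML
```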

What is Deduplication?


Deduplication is the process of eliminating redundant copies of data, which significantly decreases storage capacity requirements. This feature also spares reviewers and supervisors from reviewing multiple copies of the same email thread. To enable this feature, select the Remove Duplicate Emails check box.

What happens when "Remove Duplicate Emails" is enabled?


Currently, Deduplication is supported only for email data.

When the user enables Remove Duplicate Emails in an export request, all nearly identical email messages are suppressed in the export. As a result, only one copy of an email having the same email elements (Subject, Body, Attachments, and so on) is included in the export.

The email is deduplicated by calculating a hash on the email elements. If the hash value matches for two email messages, they are considered as duplicates. The email elements that are used by the hash calculation are configured by the Enterprise Archive Operations team for each tenant.

The following email elements can be configured for the deduplication hash calculation: Date, From, Sender, To, Cc, Bcc, Subject, Body, and Attachments. All other email headers are ignored for deduplication. If one of these email elements is removed from the configuration, that element is ignored when calculating the deduplication hash value.

The export output contains a dedup.csv file which lists the email messages that were suppressed in the export because they were duplicates. The export also contains a summary.txt file which is a high-level report showing the deduplication count and whether the deduplication feature was enabled for the export.
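The hash-based comparison described above can be illustrated with a short sketch. It is not the Enterprise Archive implementation: the hash algorithm (SHA-256 here), the way element values are normalized and joined, and the function names are all assumptions made for illustration. Only the list of configurable elements comes from this topic.

```python
import hashlib

# Elements this topic lists as configurable for the hash calculation.
CONFIGURABLE_ELEMENTS = (
    "Date", "From", "Sender", "To", "Cc", "Bcc",
    "Subject", "Body", "Attachments",
)


def dedup_hash(email: dict, elements=CONFIGURABLE_ELEMENTS) -> str:
    """Hash only the configured elements; every other header is ignored."""
    digest = hashlib.sha256()  # Assumption: the real algorithm is not documented.
    for name in elements:
        value = email.get(name, "")
        if isinstance(value, (list, tuple)):
            value = "\x1f".join(str(item) for item in value)
        # Separators keep adjacent fields from running into each other.
        digest.update(name.encode("utf-8") + b"\x00")
        digest.update(str(value).encode("utf-8") + b"\x00")
    return digest.hexdigest()


def split_duplicates(emails, elements=CONFIGURABLE_ELEMENTS):
    """Return (kept, suppressed): the first email with each hash is kept."""
    seen, kept, suppressed = set(), [], []
    for email in emails:
        key = dedup_hash(email, elements)
        if key in seen:
            suppressed.append(email)
        else:
            seen.add(key)
            kept.append(email)
    return kept, suppressed
```

In this sketch, two messages that differ only in a header outside the configured list (for example, Message-ID) produce the same hash, so the second one is suppressed. Passing a shorter `elements` tuple mimics removing an element from the tenant configuration.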

How to view and download completed exports?


Completed exports can be downloaded from the Exports menu in each application. For more information, refer to the Viewing Exported Conversations topic.

How fast are the exports in various export formats?


The following table shows the average export speed for each export format when processing 1 GB of data:

Export Format | Average Export Speed
ZIP-EML | 3600 docs/minute
ZIP-XML | 2700 docs/minute
ZIP-MSG | 1500 docs/minute
ZIP-Native | 1500 docs/minute
ZIP-HTML | 1080 docs/minute
ZIP-PDF | 120 docs/minute
PST-MSG | 480 docs/minute

The export takes a longer time to complete if the following export options are selected:

  • Include Metadata option under Additional options.

  • EDRM v2.0 XML or EDRM v2.0 XML With CoC options under Load Format.
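The average speeds listed above give a rough way to estimate how long an export might take. The sketch below simply divides the document count by the published average; the function name is hypothetical, and the result should be read as an optimistic estimate, since the averages were measured on 1 GB of data and the options listed above slow an export down.

```python
# Average speeds (documents per minute) copied from the table in this topic.
AVERAGE_DOCS_PER_MINUTE = {
    "ZIP-EML": 3600,
    "ZIP-XML": 2700,
    "ZIP-MSG": 1500,
    "ZIP-Native": 1500,
    "ZIP-HTML": 1080,
    "ZIP-PDF": 120,
    "PST-MSG": 480,
}


def estimated_minutes(doc_count: int, export_format: str) -> float:
    """Estimate export duration from the published average speed."""
    return doc_count / AVERAGE_DOCS_PER_MINUTE[export_format]


print(estimated_minutes(108_000, "ZIP-EML"))  # 30.0
print(estimated_minutes(108_000, "ZIP-PDF"))  # 900.0
```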

What is an export guardrail? How is it implemented in Enterprise Archive?


An export guardrail is a set of guidelines and rules that regulate the export process. In Enterprise Archive, export guardrails are implemented as limits that control how much data can be exported. Enforcing these limits helps maintain compliance and prevents issues that can arise from excessively large exports. The export guardrail applies to both regular exports and CSV exports.

For Tier 1 and Tier 2 exports in Enterprise Archive, the guardrail is set at 1 million snapshots. If this limit is exceeded, the export process is not triggered, and an error message is displayed in the user interface (UI).

images/download/attachments/83825435/image2024-1-5_12-12-59.png

The following error messages serve as alerts to inform users that the export limit has been crossed:

Condition | Message Displayed
Count of Tier 1 exports > 1 million | Count of documents selected for Tier 1 exports exceeds 1 million, please refine the export criteria and try again
Count of Tier 2 exports > 1 million | Count of documents selected for Tier 2 exports exceeds 1 million, please refine the export criteria and try again
Count of Tier 1 exports > 1 million and count of Tier 2 exports > 1 million | Count of documents selected for Tier 1 exports exceeds 1 million and count of documents selected for Tier 2 exports exceeds 1 million, please refine the export criteria and try again
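The conditions above can be summarized as a simple check on the two counts. The following sketch reproduces the documented messages from the Tier 1 and Tier 2 document counts; the function name and the way the counts are obtained are hypothetical, and the actual check runs inside Enterprise Archive.

```python
from typing import Optional

GUARDRAIL_LIMIT = 1_000_000  # 1 million snapshots, as stated in this topic.


def guardrail_message(tier1_count: int, tier2_count: int) -> Optional[str]:
    """Return the documented error message, or None if the export may run."""
    clauses = []
    for tier, count in ((1, tier1_count), (2, tier2_count)):
        if count > GUARDRAIL_LIMIT:
            clauses.append(
                f"count of documents selected for Tier {tier} exports "
                "exceeds 1 million"
            )
    if not clauses:
        return None
    message = " and ".join(clauses)
    message += ", please refine the export criteria and try again"
    # Only the first clause starts with a capital letter in the UI message.
    return message[0].upper() + message[1:]


print(guardrail_message(1_000_000, 1_000_000))  # None
```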

Exporting Communication Content

To export communication content from the search results:

  1. Select all communications, or only the specific communications that you wish to export, and click the Export button.

  2. Select the export type as shown in the following image.

    images/download/attachments/83825435/exporting_search_Results.png
  3. Selecting Export Selected or Export All displays the following export window. Specify a name for the file to be exported in the Name text box.

    You cannot export more than 1 million documents at once. The following message is displayed if more than 1 million documents are selected for export:

    No. of documents selected for exports is greater than 1 million docs, please refine the export criteria and try again.

    images/download/attachments/83825435/export_format_optionsEA.png

  4. Choose a Container type: ZIP or PST. The exported documents are downloaded in the chosen container type.

  5. Choose a desired Text or Email Format:

    • XML

    • HTML

    • MSG

    • EML

    • Native

    • PDF (BETA). This format is controlled by a feature flag and is available only to certain customers for evaluation purposes. Contact Smarsh Support to enable this feature. See also Limitations in PDF Export.

      Note

      When you export a document as XML, additional attributes such as Tags added to the document are also exported.

  6. To export all instant messages into a single consolidated file, select the Zip option under Container, and the HTML format from Text or Email Format options. After you export, the downloaded ZIP file contains a full-snapshot-N.html file. This HTML file provides a consolidated view of all the exported messages and can be viewed in any Enterprise Archive supported web browser.

  7. Select one of the following Load Format options:

    • None - To get the exported document in a simple XML or HTML format.

    • EDRM v2.0 XML - To get the exported document in an Electronic Discovery Reference Model (EDRM) format.

    • EDRM v2.0 XML With CoC - To get the exported document in an Electronic Discovery Reference Model (EDRM) format with CoC (Chain of Custody) enabled. Chain of Custody guarantees Enterprise Archive users data authenticity and ensures the data has not been tampered with from the time it has been ingested until the final download.

  8. Select one of the following Additional Options:

    • Send Notifications - Enables email notifications to be sent when the export jobs have completed.

    • Remove Duplicate Emails - Eliminates near-duplicate email messages from the final export package. Near-duplicate emails are email messages that are identical except for minor differences in the email headers. For more information, see What happens when "Remove Duplicate Emails" is enabled?

    • Include Context - All messages from the same conversation thread (with the same GCID) are exported as a single file. Contact your Smarsh representative to enable this feature and configure the network to include context.

    • Compress Zip - Compresses the final packed zip file.

    • Include Expanded Participants - Includes the complete list of participants in a communication.

      Note

      When the Include Expanded Participants checkbox is enabled, the X-ACTIANCE-RECIPIENTS and X-ACTIANCE-SNAPSHOT-ID headers are populated only for journal and email communications. Currently, only Non-MIME emails are supported, not MIME.

      For Non-Journal and Non-Email communications, the X-headers are not populated in the export formats listed below.

      This checkbox is available only for the following export types:

      • ZIP-MSG

      • ZIP-EML

      • ZIP-Native

      • PST-MSG

      images/download/attachments/83825435/ExpandDL.png
      You can choose between the following options:

      • As x-header - Includes all DL and participant information within the interaction, where it can be viewed as internal headers. That is, additional x-headers such as X-ACTIANCE-RECIPIENTS and X-ACTIANCE-SNAPSHOT-ID are added, containing all the participants in the communication, including those expanded from Distribution Lists and BCC. This option is selected by default.
        images/download/attachments/83825435/x-headers.png

      • In respective email Sender/Receipients headers - Includes all participant information inline. That is, the DL and participants information are available in the respective To, Cc, Bcc, or From fields in the exported interaction.

        Note

        • When BCC recipients are not visible in the original interaction, the BCC list of recipients is obtained from the X-MS-Exchange-Organization-BCC header if available to construct BCC header.

        • The X-MS-Exchange-Organization-BCC header is not modified when it is stored-in or exported-from Enterprise Archive.

        • Recipients that are part of DLs and direct participants are not deduplicated in any of the To, CC, or BCC fields.

        • The order of recipients in the BCC field is maintained as it is in the original interaction. If both the BCC and X-MS-Exchange-Organization-BCC headers are present, the recipients listed in the BCC header will be shown first, followed by those listed in the X-MS-Exchange-Organization-BCC header.

        • The BCC header is introduced to EML/MSG output when recipients are available under X-MS-Exchange-Organization-BCC and not available under BCC.

        • When the From field is expanded, only the first participant's information is displayed in the EML/MSG output. This is a Microsoft Outlook limitation; Enterprise Archive itself expands all participant information.

        DLs without Participants Expansion
        images/download/attachments/83825435/DL_withour_Participants.png
        DLs with Participants Expansion
        images/download/attachments/83825435/DL_with_Participants.png

    • Include Metadata - Enabling this check-box will include communication metadata. This is not available for Native and PDF export formats.

      Note

      In case of HTML export, if the Include Metadata option is selected, a meta.html file is included in the exported package. In case of XML, metadata will be included in the exported XML file.

      The following metadata parameters are excluded if this option is not selected:

      • Transcript Info

      • Interaction information such as Global Thread ID, Interaction attributes, Subject metadata, Modality metadata and so on.

      • Participants

      • Action Events

      • Policy events

    • Date Gap - Enabling this check-box generates a report containing the total number of items found per day in a date range specified per communication.

    • High Priority Export - Enabling this check-box ensures the selected documents will be prioritized for export above any other documents in queue.

  9. Click Export. A confirmation dialog appears once the export is completed.

    Note

    Exported documents are available for download for up to 31 days. That is, exports older than 31 days cannot be downloaded from the UI. However, users can rerun the exports and download them.

Exporting Communication Metadata

To export only communication metadata from the search results:

  1. Select all communications or only specific communications from the search results, and then select Export Selected As CSV or Export All As CSV.

  2. Specify a name for the file to be exported in the Name text box.

    images/download/attachments/83825435/export_as_CSV.png
  3. Choose the following options under Additional Options:

    • Send Notifications - Enables email notifications to be sent when the export jobs have completed.

    • High Priority Export - Ensures the selected documents will be prioritized for export above any other documents in queue.

  4. Click Export. A confirmation dialog appears once the export is completed.

Note

In a CSV export, Bcc participants are limited to 50.

When a document containing many participants (for example, more than 500 in the To/Cc/Bcc fields) is exported as CSV, the participant listing in the file is broken across multiple rows when the file is opened in Microsoft Excel.

This is a Microsoft Excel issue. As a workaround, Smarsh recommends using Notepad++ to view such CSV files.
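Another way to inspect such files without Excel is to read them with a script. The sketch below uses Python's standard csv module, which keeps each participant field as a single value. The column names, the ";" separator between participants, and the in-memory sample are assumptions for illustration; the actual CSV layout depends on the columns in your export.

```python
import csv
import io

# Raise the per-field limit so very long participant lists are not rejected.
csv.field_size_limit(10_000_000)


def read_export_rows(stream):
    """Return the CSV rows as dictionaries keyed by the header row."""
    return list(csv.DictReader(stream))


# Demonstration with an in-memory file; column names are hypothetical.
long_to_field = ";".join(f"user{i}@example.com" for i in range(600))
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(["Snapshot ID", "To"])
writer.writerow(["abc-123", long_to_field])
sample.seek(0)

rows = read_export_rows(sample)
print(len(rows), len(rows[0]["To"].split(";")))  # 1 600
```

To read a downloaded file instead of the in-memory sample, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_export_rows`; the UTF-8 encoding is also an assumption.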