Enabling Historical Data Import

The Historical Data Import feature lets you bring past employee information into Enterprise Archive, so that your workforce records cover both past and present employee data. This topic describes how to import historical data using CSV files that include a "Start Time" field. Start Time specifies the date and time from which each employee record is valid, giving you a view of your employee data over time.

This feature is behind a feature flag. Contact Smarsh support to enable this feature.

Why Include Start Time in Import CSV?

Start Time provides a way to associate specific dates with participant data (e.g., groups, attributes) in your CSV files. This is particularly useful for:

  • Handling Email Address Reuse: Companies often reuse email addresses over time. The Start Time attribute ensures accurate participant identification at specific points in time, eliminating confusion during historical data imports.

  • Tracking Employee Profile Transitions: Employee profiles evolve due to promotions, name changes, and department shifts. Start Time allows you to capture these historical transitions, enriching your data analysis and enabling more granular access control for your content.

The table below outlines several relevant use cases for including Start Time within the Enterprise Archive import process. It details the associated context, the challenges encountered with traditional data ingestion, and how the inclusion of a "Start Time" attribute in the import CSV addresses these issues.

Use Case: Email Address Reuse

  • Context: Companies often assign email addresses based on a naming convention (e.g., <firstname>@<company.com>, <lastname>.<initials>@<company.com>). This can lead to duplicate email addresses when multiple employees share similar names. While companies may implement strategies to differentiate these addresses (e.g., john.smith@company.com and john.smith.01@company.com), these solutions may not account for past employees with the same name.

  • Challenge: Existing data ingestion methods in Enterprise Archive might struggle to pinpoint the correct employee associated with an email address in historical data (Tier 2 data) due to email address reuse over time. Customers have no way to specify which employee used a specific email address during a particular period.

  • Solution: Customers can import historical employee data along with a "Start Time" attribute. This attribute specifies the date and time from which a specific employee record becomes valid. This enables Enterprise Archive to differentiate between employees who shared the same email address at different points in history.

Use Case: Employee Profile Transitions

  • Context: Some organizations meticulously maintain historical employee data, including details like promotions, department changes, and name modifications. This information is crucial for applying content restrictions based on user groups or custom attributes.

  • Challenge: Existing historical data processing methods in Enterprise Archive might not capture these employee profile transitions. This makes it difficult to associate content with the appropriate employee based on their role or status at a specific point in time.

  • Solution: Customers can import historical employee data with various attributes alongside a "Start Time". These attributes could include group memberships, custom classifications, or other details relevant to content access control. By associating these attributes with specific timeframes, Enterprise Archive can accurately determine the appropriate access permissions for each piece of historical content based on the employee's profile at the time it was created or interacted with.
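To make the email reuse case concrete, the following Python sketch shows the general idea of a point-in-time lookup: given the Start Times recorded for one reused address, it picks the record that was valid at a given moment. It only illustrates the concept and is not a description of how Enterprise Archive resolves participants internally. The function name, the record labels, and the dates are made up for the example.

```python
from bisect import bisect_right
from datetime import datetime

def employee_at(history, moment):
    """Return the employee whose record was valid at `moment`.

    `history` is a list of (start_time, employee) pairs for a single
    email address, sorted by start_time. A record is treated as valid
    from its start time until the next record's start time.
    """
    starts = [start for start, _ in history]
    index = bisect_right(starts, moment) - 1  # last record starting at or before `moment`
    return history[index][1] if index >= 0 else None

# Hypothetical history for one reused address, john.smith@company.com.
history = [
    (datetime(2015, 3, 1), "John Smith (joined 2015)"),
    (datetime(2021, 6, 15), "John Smith (joined 2021)"),
]

print(employee_at(history, datetime(2018, 1, 1)))  # John Smith (joined 2015)
print(employee_at(history, datetime(2022, 1, 1)))  # John Smith (joined 2021)
print(employee_at(history, datetime(2010, 1, 1)))  # None
```

A message dated before the earliest Start Time matches no record, which is why the sketch returns None in that case.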

Importing Historical Data

  1. Import via API: Historical data import is currently available exclusively through a dedicated API endpoint. Detailed API documentation outlining Historical CSV import functionalities is available at Historical Import of Participants using a CSV file.

  2. Specifying Start Times: Include a "Start Time" column in your CSV file. Values in this column must be formatted as YYYY-MM-DD HH:MM:SS.SSS in UTC, for example, 2020-01-05 09:15:10.005.

Refer to the Sample import participant CSV file for more information.
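If the source system stores timestamps in another time zone or format, they need to be converted before they are written to the Start Time column. The following Python sketch shows one way to render a timestamp as YYYY-MM-DD HH:MM:SS.SSS in UTC. The function name is hypothetical and the sample values are illustrative.

```python
from datetime import datetime, timedelta, timezone

def format_start_time(moment: datetime) -> str:
    """Render a timezone-aware datetime as YYYY-MM-DD HH:MM:SS.SSS in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone before converting")
    utc = moment.astimezone(timezone.utc)
    # strftime's %f would give six digits (microseconds), so the
    # three-digit millisecond part is built separately.
    return utc.strftime("%Y-%m-%d %H:%M:%S.") + f"{utc.microsecond // 1000:03d}"

# A timestamp recorded at UTC+05:30 is shifted to UTC before formatting.
local = datetime(2020, 1, 5, 14, 45, 10, 5000,
                 tzinfo=timezone(timedelta(hours=5, minutes=30)))
print(format_start_time(local))  # 2020-01-05 09:15:10.005
```

Rejecting naive datetimes is a deliberate choice in this sketch: silently assuming a time zone could shift a record's validity by several hours.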

Note

  • A CSV uploaded without a Start Time column is handled by the existing import process.

  • A CSV file containing a Start Time column is rejected if it is uploaded through the Enterprise Archive user interface (UI). Historical data import is currently supported only through the API.

Error Handling

  • Rows with an invalid Start Time format or no Start Time are rejected during upload.

  • Uploads containing both valid and invalid entries are partially processed. The error message identifies the specific rows and columns containing invalid data to help with troubleshooting.
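Because invalid rows are rejected while the rest of the file is still processed, it can help to check the Start Time column locally before uploading. The following Python sketch reports rows whose Start Time is missing or does not match YYYY-MM-DD HH:MM:SS.SSS. It is a local pre-check under stated assumptions, not a reproduction of the server-side validation: the "Email" column and the function names are hypothetical, and only the "Start Time" column name comes from this topic.

```python
import csv
import io
import re
from datetime import datetime

START_TIME_COLUMN = "Start Time"
# Exactly YYYY-MM-DD HH:MM:SS.SSS; strptime alone would accept 1 to 6 fraction digits.
_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}", re.ASCII)

def is_valid_start_time(value: str) -> bool:
    """Check both the textual shape and that the value is a real date and time."""
    if not _SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:  # for example, 2020-02-30
        return False
    return True

def find_invalid_rows(csv_text: str) -> list[tuple[int, str]]:
    """Return (row_number, reason) pairs; row 1 is the header."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):
        value = (row.get(START_TIME_COLUMN) or "").strip()
        if not value:
            problems.append((row_number, "missing Start Time"))
        elif not is_valid_start_time(value):
            problems.append((row_number, f"invalid Start Time: {value!r}"))
    return problems

sample = (
    "Email,Start Time\n"
    "a@company.com,2020-01-05 09:15:10.005\n"
    "b@company.com,01-05-2020 09:15:10:005\n"
    "c@company.com,\n"
)
for row_number, reason in find_invalid_rows(sample):
    print(f"row {row_number}: {reason}")
```

For the sample above, the sketch flags row 3 (wrong field order and separators) and row 4 (no Start Time).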

Note

  • No UI Support: Currently, Historical CSV import is limited to API access.

  • File Size Limit: The maximum supported file size per CSV is 1 GB.

  • Single Upload: Only one CSV file can be uploaded at a time.

  • Historical Updates: If profile information already exists in the system for a given date, it cannot be updated through Historical CSV import.

  • Update Limit: Smarsh allows a maximum of 150 profile updates per employee.

  • Chronological Order is Crucial: Ensure all updates for a given user are uploaded in chronological order. Uploads with out-of-order records are rejected.

  • Complete CSV: Data for an employee must be complete within a single CSV file.

  • Duplicate Entries: Entries with identical timestamps for the same participant are rejected to prevent duplicates.

  • Data Modification: Once uploaded, participant data in Enterprise Archive cannot be directly modified. To update historical data, create a new entry with a revised Start Time.
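Several of the limits above can be checked locally before an upload. The following Python sketch checks file size, chronological order, duplicate timestamps, and the per-employee update count. It rests on assumptions that this topic does not confirm: the "Employee ID" key column is hypothetical, 1 GB is read conservatively as 10^9 bytes, the 150-update limit is applied to the rows in a single file, and Start Time values are assumed to be already valid, so that the fixed-width strings sort chronologically as text.

```python
import csv
import io
from collections import defaultdict

MAX_FILE_BYTES = 10 ** 9           # 1 GB, read conservatively as decimal (assumption)
MAX_UPDATES_PER_EMPLOYEE = 150     # applied per file here (assumption)

def preflight(csv_text: str, key_column: str = "Employee ID") -> list[str]:
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    if len(csv_text.encode("utf-8")) > MAX_FILE_BYTES:
        issues.append("file exceeds 1 GB")

    latest = {}                    # latest Start Time seen so far, per employee
    counts = defaultdict(int)      # number of entries per employee
    rows = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        key, start = row[key_column], row["Start Time"]
        counts[key] += 1
        previous = latest.get(key)
        if previous is None or start > previous:
            latest[key] = start
        elif start == previous:
            issues.append(f"row {row_number}: duplicate Start Time for {key}")
        else:
            issues.append(f"row {row_number}: out of chronological order for {key}")

    for key, count in counts.items():
        if count > MAX_UPDATES_PER_EMPLOYEE:
            issues.append(f"{key}: {count} entries exceed the limit of {MAX_UPDATES_PER_EMPLOYEE}")
    return issues

sample = (
    "Employee ID,Start Time\n"
    "E1,2020-01-05 09:15:10.005\n"
    "E1,2019-01-05 09:15:10.005\n"
    "E2,2020-01-05 09:15:10.005\n"
    "E2,2020-01-05 09:15:10.005\n"
)
for issue in preflight(sample):
    print(issue)
```

For the sample above, the sketch reports row 3 as out of order for E1 and row 5 as a duplicate for E2. For files on disk, the size check would more naturally use the file's size on disk than an in-memory string.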