Introductory Guide to eDiscovery

March 14, 2024

101 Guide: eDiscovery

Organizations produce a deluge of electronic information. Email inboxes are overflowing, hard drives and cloud storage repositories are packed, and social media notifications come at breakneck speed. 

For legal professionals who must locate key documents and information for use as possible evidence in litigation and investigations, the explosion of such data has rendered the process of collecting, culling, and reviewing digital information—known as eDiscovery—incredibly costly and time-consuming. 

Fortunately, advances in eDiscovery technology are reducing the cost and burden for legal teams to narrow the universe of potentially relevant data. This eDiscovery 101 guide provides an introduction to the fundamentals of eDiscovery and explores challenges and best practices, including considerations for multilingual and cross-border eDiscovery.

What Is eDiscovery?

eDiscovery is the process of identifying, collecting, filtering, and reviewing electronically stored information (ESI) that constitutes potential evidence in legal proceedings and investigations. During eDiscovery, legal professionals, IT personnel, and forensics teams work together to narrow very large datasets into more manageable volumes. 

Prior to the digital age, “discovery” in business disputes meant combing through boxes and drawers full of paper documents to locate key information and evidence. Today, since most documents needed for business litigation, arbitration, and investigations are stored in electronic format, traditional discovery is referred to as eDiscovery and is focused on ESI. In the early days of eDiscovery, it was possible for legal professionals to manually review ESI to determine its relevancy to a matter. This was feasible for 25 or 2,500 files, but not so much for 25,000, let alone 250,000 or 2.5 million! 

Today, eDiscovery tools powered by analytics and artificial intelligence (AI) significantly streamline the eDiscovery process. But even with modern technology, given the ever-increasing volumes and diversity of ESI, eDiscovery remains an intensive endeavor that requires human expertise, effective communication, and agile decision-making at every step of the way.

What Is ESI?

ESI stands for electronically stored information. The term covers a wide variety of digital assets, such as emails; e-documents (Word, PPT, and Excel files); image, audio, and video files; mobile device data (e.g., chat programs and text messages); cloud-based applications; website content; and social media postings.

What Is the EDRM Framework for eDiscovery?

Introduced in 2005, the Electronic Discovery Reference Model (EDRM) is a visual representation of the complete eDiscovery lifecycle and is widely used as a reference by legal teams globally. The EDRM breaks down the eDiscovery process into nine steps, although not every step is relevant to every matter. The steps aren’t always followed sequentially, and some may be repeated depending on the project scope and cadence.

What Are the Steps in the EDRM?

01 Information Governance 

Information governance (IG) is often cited as the first step in eDiscovery, but it may be more helpful to think of it as proactive planning that a company can do to prepare for and execute eDiscovery in a streamlined manner. IG helps to provide an understanding of the scope of data as well as data sources by asking questions such as the following:

a.    Where is my data located? 
b.    Who might have possession, custody, and control of my data?
c.    Am I retaining more data than I need to for legal and operational purposes? 
d.    What are the applicable laws and regulations surrounding the retention and transfer of my data?
e.    Do I have a clear understanding of which stakeholders need to take which steps if the eDiscovery process is triggered?

With solid IG protocols in place, the eDiscovery process will be more efficient and streamlined. 

02 Identification 

This is the process of identifying the locations, systems, sources, and people (“custodians”) that may hold data relevant to the case at hand. You’re trying to understand the scope of the data and how to access it for eDiscovery purposes.

03 Preservation 

Once relevant ESI has been identified, it needs to be preserved for litigation. Common ways to accomplish this are to place a formal legal hold on the data so that it cannot be modified, deleted, or destroyed, or create a forensic copy of the ESI for external preservation and storage.

04 Collection

Data collection is the process of creating forensically sound copies of the data, including its metadata. This is typically conducted by forensic professionals using specialized forensic software, although some modern IT systems enable “self-collection” by IT professionals. It is paramount that the contents of the data, including its underlying metadata, are not altered or lost during this process.
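
One common way to confirm that a collected copy has not been altered is to compare cryptographic hashes of the original and the copy. The sketch below is a minimal illustration that assumes ordinary files on disk; the function names `sha256_of` and `copy_is_intact` are hypothetical and not taken from any forensic product. Note that it checks file contents only. File system metadata, such as timestamps, needs separate handling by a forensic tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original: Path, copy: Path) -> bool:
    """True when the copy's contents are byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(copy)
```

Recording the hash at collection time also lets you re-verify the copy later in the matter.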

05 Processing

This is where ESI (including metadata) from a variety of sources is put into a common, usable format that can be prepared and displayed for attorney analysis and review. During the processing phase, data can be deduplicated and further filtered/culled using search terms, date restrictions, filetype restrictions, and, to varying degrees, data content.
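
To make the culling steps concrete, here is a minimal sketch of deduplication followed by filetype, date, and search-term filters. It assumes each document has already been reduced to a name, a sent date, and extracted text; the `Doc` class and `cull` function are hypothetical names used for illustration, and real processing tools handle far more (attachments, email threading, metadata fields).

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    name: str
    sent: date
    text: str


def cull(docs, start, end, terms, excluded_exts=()):
    """Drop exact duplicates, then keep in-range documents that hit a search term.

    excluded_exts is a sequence of lowercase extensions such as (".ics",).
    """
    seen = set()
    kept = []
    for doc in docs:
        fingerprint = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if fingerprint in seen:
            continue  # same text as a document already processed
        seen.add(fingerprint)
        if doc.name.lower().endswith(tuple(excluded_exts)):
            continue  # filetype restriction
        if not (start <= doc.sent <= end):
            continue  # date restriction
        lowered = doc.text.lower()
        if any(term.lower() in lowered for term in terms):
            kept.append(doc)  # at least one search term hit
    return kept
```

For example, calling `cull(docs, date(2023, 1, 1), date(2023, 12, 31), ["contract"], (".ics",))` would keep only unique, non-calendar documents from 2023 that mention "contract".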

06 Review

After data processing and filtering, the “surviving” data is evaluated by attorneys for relevance, privilege, confidentiality, and privacy. Organizations often outsource this step to outside law firms, as in-house legal teams often don’t have the time or resources. In some cases, in-house counsel will do a “first pass” review of data before handing the process to outside parties. Third-party managed review companies also provide contract review attorneys to execute the review process at a fraction of the price of law firm personnel. Typically, the review of ESI is performed on a specialized document review platform, such as Relativity. 

07 Analysis

Attorneys evaluate the data for context and content, categorizing documents based on relevance, issues, and importance.

08 Production

Data is packaged and disclosed to opposing counsel and/or government investigators. It may be placed in a static format, such as PDF, which makes it possible to redact privileged or non-relevant data.

09 Presentation

Data is digitally presented as evidence during a legal proceeding. 

What Is Early Case Assessment?

Early case assessment (ECA) is typically employed during the “processing” stage above to identify key documents and information that the legal team uses to advise its client on case strategy and merits. ECA can help predict the cost of a case as well as likely exposure, which helps teams create realistic case strategies and budgets for the full eDiscovery process.

ECA also uses advanced methods to filter out non-relevant data early in the eDiscovery process, thereby narrowing large datasets. Beyond the standard culling methods of filtering by date and search terms, and deduplicating and de-NISTing files (i.e., removing system files and other non-user-generated files), ECA leverages the following tools to further cull data during processing:

  • Filetype filters (e.g., removing calendar entries or video files when not relevant to the matter)
  • Domain/email handle filters (e.g., removing spam and industry newsletters that hit on search terms but are not relevant to the underlying dispute)
  • Concept clustering (i.e., organizing the files by content to identify and remove groups of search term false positives)

Generally speaking, standard ESI filtering results in an 80% reduction in the volume of ESI. Using ECA advanced filtering, that number can top 90%, meaning only 10% of your ESI moves to the document review phase, rather than 20%. 
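
The difference between 80% and 90% may look small, but it halves the number of documents that reach review. The short calculation below illustrates this with a hypothetical starting volume of one million documents; the `docs_to_review` helper and the starting figure are assumptions for illustration only.

```python
def docs_to_review(total: int, reduction_pct: int) -> int:
    """Documents left for review after culling away reduction_pct percent."""
    return total * (100 - reduction_pct) // 100


total = 1_000_000  # hypothetical collection size, not a figure from this guide

standard = docs_to_review(total, 80)  # standard filtering: 20% remains
advanced = docs_to_review(total, 90)  # ECA advanced filtering: 10% remains

print(standard, advanced)
```

Because review cost scales with document count, cutting the remaining set from 20% to 10% roughly halves the review effort.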

Digital Reef is an ECA and eDiscovery platform from TransPerfect Legal. Digital Reef ingests, culls, analyzes, and exports datasets. It can be installed securely onsite or hosted in a TransPerfect Legal data center. Capable of processing up to 17 terabytes of data in 24 hours and managing multi-language data, Digital Reef can enable you to process, investigate, and preview your data 40% faster while reducing datasets by more than 90%. Attorneys and legal processing service providers use Digital Reef’s built-in document viewer and coding interface to conduct investigations.  

For larger matters that require a linear or technology-assisted review (TAR)-based approach, data is culled in Digital Reef and then exported in the required output formats to easily load into your preferred review solution. Digital Reef integrates with numerous technologies, including the market-leading Relativity document hosting and review platform. The team at TransPerfect Legal builds proprietary add-ons for Relativity.

ECA in Action

In a construction matter, a case-relevant search term entered into an ECA tool produced 300,000 emails. However, many of these documents were for construction projects that were not relevant to the matter under consideration. Emails related to these extraneous projects were then excluded from the dataset, which halved the number of search hits. Further analysis removed irrelevant email domains, for an overall reduction of 200,000 documents—leaving only 100,000 documents to review. Spending a few hours to run through this ECA exercise saved more than 4,000 hours of document review, which saved hundreds of thousands of dollars. While the technology allowed for rapid data exclusion, it took the experienced, skilled eyes of the human user to unlock the efficiencies of the ECA tool.
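
The numbers in this example can be worked through step by step. The sketch below uses the figures from the scenario above; the review pace of 50 documents per hour is an assumption (the scenario does not state one) chosen because it is the pace at which 200,000 excluded documents correspond to 4,000 review hours.

```python
initial_hits = 300_000                      # emails returned by the search term

after_project_filter = initial_hits // 2    # unrelated projects removed: hits halved
remaining = 100_000                         # left after irrelevant domains removed

removed_by_domain_filter = after_project_filter - remaining
total_removed = initial_hits - remaining

ASSUMED_DOCS_PER_HOUR = 50                  # assumption, not stated in the scenario
hours_saved = total_removed // ASSUMED_DOCS_PER_HOUR

print(after_project_filter, removed_by_domain_filter, total_removed, hours_saved)
```

A slower assumed pace would push the savings above 4,000 hours, which is consistent with the "more than 4,000 hours" figure.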

What Happens During Document Review and Analysis?

Document review and analysis is the most time-consuming and costly part of the eDiscovery process. It involves large teams of lawyers examining the documents that have survived the culling process to determine their relevance to the case. During document review, data is evaluated for relevance, privilege, confidentiality, and privacy. Technology-assisted review (TAR) typically happens during document review. 

Before starting document review, it’s important to:

  • Identify names, dates, and keywords you will be looking for
  • Agree on document tags and labeling (e.g., relevant, privileged, smoking gun)
  • Identify whether managed review or contract review attorneys will be utilized (for the first pass of the review and/or the entire review)
  • Identify quality control measures
  • Identify parameters for efficiency
  • Identify whether TAR will be utilized to streamline the review

As with early case assessment, technology can streamline the process with document hosting, review, and analysis. Examples are Relativity and TransPerfect Legal’s newly released Reef Review. Reef Review offers a robust suite of analytics and AI features including continuous active learning, redaction, near-duplicate analysis, and daily review reporting.

How Is AI Streamlining eDiscovery?

AI has become an increasingly integral part of eDiscovery over the last decade. AI uses data-mining techniques that can narrow the set of documents sent for review, saving both time and money. AI-powered eDiscovery software leverages technologies such as machine learning (ML), natural language processing (NLP), and, more recently, generative AI. Here are some of the ways AI can automate and streamline aspects of eDiscovery:

Technology-Assisted Review (TAR)

TAR (sometimes called predictive coding) helps prioritize and identify potentially relevant documents for review by learning from human reviewers' feedback. Over the past 10 years, TAR has become the most popular and powerful AI tool in the eDiscovery toolbox.
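
The core idea of learning from reviewer feedback can be sketched in a few lines. The example below is a deliberately simplified illustration, not how any particular TAR product works: it learns a weight for each word from documents that reviewers have already coded, then ranks unreviewed documents so the likeliest relevant ones come first. The function names `train` and `rank` are hypothetical, and production tools use far more capable models and repeat this loop continuously as coding decisions come in.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """labeled: list of (text, is_relevant) pairs coded by human reviewers.

    Returns a smoothed log-odds weight per word: positive values lean
    relevant, negative values lean not relevant.
    """
    relevant, other = Counter(), Counter()
    for text, is_relevant in labeled:
        (relevant if is_relevant else other).update(set(tokenize(text)))
    n_relevant = sum(1 for _, is_relevant in labeled if is_relevant)
    n_other = len(labeled) - n_relevant
    vocabulary = set(relevant) | set(other)
    return {
        word: math.log((relevant[word] + 1) / (n_relevant + 2))
        - math.log((other[word] + 1) / (n_other + 2))
        for word in vocabulary
    }


def rank(weights, unreviewed):
    """Order unreviewed texts from most to least likely relevant."""
    def score(text):
        return sum(weights.get(word, 0.0) for word in set(tokenize(text)))
    return sorted(unreviewed, key=score, reverse=True)
```

Reviewers would then code the top-ranked documents, and those new decisions would feed the next round of training.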

Concept Clustering

After files have been processed, concept clustering groups the remaining data based on concepts, topics, or ideas. This allows reviewers to examine documents based on similarity, remove groups of search-term “mishits,” and generally focus the review on more relevant content first. 
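
As a rough illustration of grouping by similarity, the sketch below clusters documents by how many distinct words they share. This is a stand-in technique chosen for brevity: commercial concept clustering typically relies on statistical or semantic models rather than raw word overlap, and the `cluster` function and its `threshold` parameter are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct words that two word sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.3):
    """Greedily group texts whose words overlap with a cluster's first member.

    Returns a list of clusters, each a list of indexes into texts.
    """
    word_sets = [set(text.lower().split()) for text in texts]
    clusters = []
    for index, words in enumerate(word_sets):
        for members in clusters:
            if jaccard(words, word_sets[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])  # no similar cluster found, start a new one
    return clusters
```

A reviewer could then inspect one or two documents from each group and exclude an entire cluster of false positives, such as newsletters, in a single step.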

Conceptual Search

Beyond just keyword searches, conceptual searching enables searching based on the context and meaning of the content, thereby improving the accuracy of search results.

Generative AI

Looking ahead, generative AI (e.g., ChatGPT) is poised to take on a more prominent role in eDiscovery. Take, for example, the “document dump” that sometimes happens the night before a deposition. Generative AI may be used to quickly summarize documents or create a chronology, making a previously untenable eleventh-hour review possible. Likewise, generative AI can provide useful features, such as “ask my documents a question,” and may eventually replace TAR as the key driver of document review itself.

Language Identification and Machine Translation

Some AI-powered eDiscovery platforms have language identification capabilities that can automatically detect and tag the language of documents. Likewise, advanced machine translation engines are often built directly into eDiscovery platforms for multilingual datasets.

Named Entity Recognition (NER)

AI-based NER can identify and extract entities such as names, locations, and dates from documents. 

Sentiment Analysis

Sentiment analysis tools can help identify the tone and emotional context of communications.

AI in Action

Let’s say you want to find all documents that include references to “X” and any documents that include negative sentiments around “X.” Suppose you have 75,000 possible documents to review. Using an AI tool, you create your instructions, submit them, and then wait for documents to be reviewed. Each document gets classified in one of four ways: relevant to the issue, not relevant to the issue, needs further review, or has a technical issue. The tool will tell you which issue an item is relevant for (reference to X and/or negative sentiment about X). Now you have a much smaller pool of documents to review.

What Are Some Considerations for Multilanguage and Cross-Border eDiscovery?

The complexities of language and translation can make it difficult to understand and analyze electronic data in foreign languages. Cultural differences can also impact the interpretation and analysis of data, which can affect the accuracy and effectiveness of an investigation. 

Multilanguage eDiscovery tools play an important role when electronic data is in more than one language. They use natural language processing (NLP) and machine learning algorithms to help investigators track, understand, translate, and analyze electronic data in multiple languages. 

Cross-border investigations can be tricky, as laws and regulations regarding the collection, analysis, and transfer of electronic data vary by country. Investigators must have an awareness of data protection laws in each jurisdiction to ensure proper legal compliance. Multilingual eDiscovery tools offer legal and regulatory guidance, helping to ensure that no laws are broken within the jurisdictions in which the investigation is being conducted.

Data Privacy Compliance During eDiscovery

Data privacy and protection are paramount to maintaining the integrity of case materials. Companies need to keep data privacy laws and regulations front and center during eDiscovery. This means adhering to laws such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the European Union Privacy Directives (EUPD) in every jurisdiction in which they do business.

Companies need to be mindful of the way they identify and handle protected health information (PHI) and personally identifiable information (PII). Identifying where PHI and PII appear in datasets helps you avoid data privacy violations. Advanced data mining technology can scan data for people, places, and organizations so that this information can be appropriately redacted. Note that the GDPR defines personal information broadly and applies more stringent rules to sensitive categories, including sexual orientation and political beliefs.
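
For well-structured identifiers, even simple pattern matching can flag and redact PII. The sketch below is a minimal illustration that covers only email addresses and US Social Security number formats; the patterns and the `redact` function are assumptions for demonstration. Names, places, and organizations require the entity recognition described above, and any automated redaction should be checked by a human reviewer.

```python
import re

# Simplified patterns for illustration; they will miss edge cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and SSN-formatted numbers with placeholders."""
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    return US_SSN.sub("[REDACTED SSN]", text)


print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
```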

What Are Some Challenges Associated with eDiscovery?

The Data Itself

The first and ever-increasing challenge has to do with data itself: the sheer volume of it, the varying types, and the disparate locations where it lives. With more people working remotely and on a growing number of platforms, enormous amounts of data are being produced every day. The more widely dispersed the data is, the more complex eDiscovery can be. 

The size of a company and the volume of legal issues it faces can influence the ease with which eDiscovery happens. Larger companies are more likely to have internal IT resources to help with eDiscovery. Companies that do a lot of eDiscovery are more likely to have established and well-enforced data management policies.


Costs incurred during eDiscovery include processing, hosting, and reviewing data as well as costs associated with specialized eDiscovery technology. The document review phase usually carries the bulk of the eDiscovery cost, which will be higher the more data you have. Some companies will use contract lawyers for part of the process to save expenses.

Data Privacy

Ensuring the security of sensitive information can be tricky. What happens if privileged documents are given to opposing counsel? This can happen under the best of circumstances, so it’s prudent to have a clawback agreement in place that allows for privileged materials to be returned without a waiver of privilege.

Data in Multiple Languages

Documents in multiple languages add complexity to the eDiscovery process, requiring translation as well as cultural sensitivity. 




Cross-Border eDiscovery 

Cross-border eDiscovery requires navigating different legal systems and data protection regulations while staying cognizant of, and complying with, international laws.



