
The Data Dump: what to do when you’ve received too much data? Part 1

Have you ever been overwhelmed by a document dump? If your adversary “dumps” a large amount of data on you, with little regard for relevance or organization, how do you deal with it? Years ago, you would have sent associates to a warehouse, where they would pore over pages of moldy and musty documents and return months later with a handful of hot documents. Now, the volumes produced make that approach nearly impossible and prohibitively costly.

Producing parties may dump data for many reasons. They may do it as a cost-saving measure, skipping a potentially expensive relevance review. They may be trying to bury relevant evidence so it is difficult to find. There also just may be a tremendous volume of documents responsive to your requests if you did not sufficiently tailor them.

This blog post will examine a few techniques and tools you can use to triage incoming productions effectively and inexpensively.[1]

When your adversary dumps millions of pages on you, there are three general scenarios that will guide your strategy. First, your case may be well developed, and you know what types of documents you need to find. Or, second, you may have some sense of the case but you need to fill in some gaps. Finally, you may not know much about your case at all, and you want to understand the documents to develop and refine your theory. Let’s look at each one in turn.


If you are looking for specific documents in a data dump, you have the advantage of already knowing information that can lead you to what you need. Now you just need to find a needle in a haystack.

In this scenario, traditional Boolean searches are likely to be helpful, just like you would use on Google or Lexis.
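Under the hood, a Boolean query is just set logic over the words in each document. As a rough sketch (the document names and query terms here are invented for illustration; a real review platform applies the same logic against a proper full-text index), a query like `widget AND (price OR quote) AND NOT draft` can be modeled as:

```python
# Toy corpus; in practice this would be a full-text index built by the review tool.
docs = {
    1: "acme widget price list for 2000",
    2: "draft widget price memo",
    3: "quarterly sales quote for widgets",
}

def matches(text, require, any_of, exclude):
    """Evaluate a simple AND / OR / NOT query against one document's words."""
    words = set(text.lower().split())
    return (all(w in words for w in require)        # AND terms
            and any(w in words for w in any_of)     # OR group
            and not any(w in words for w in exclude))  # NOT terms

# widget AND (price OR quote) AND NOT draft
hits = [doc_id for doc_id, text in docs.items()
        if matches(text, require={"widget"},
                   any_of={"price", "quote"},
                   exclude={"draft"})]
print(hits)  # → [1]: doc 2 is excluded by "draft"; doc 3 says "widgets", not "widget"
```

Note how document 3 falls out of the results only because it uses the plural “widgets” — exactly the kind of brittleness (no stemming, no synonyms) that makes plain Boolean search miss documents when the producing party’s vocabulary differs from yours.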

If a Boolean search does not return what you expect, you can also try using metadata analytics to find what you’re looking for. Metadata analytics group documents by known categories and allow you to drill down based on those classifications. If you know the To, From, or CC of a specific email, for example, or the domain from which an email was sent, the timing of documents, or anything about the metadata at all, you can use that to home in on the specific evidence you want. Using metadata analytics to create a review set of messages from John Smith to Edgar Wright in March of 2000 about Acme’s widgets can produce a manageable set of documents to review. This is particularly helpful in matters where codenames, euphemisms, and nontraditional language are used. In a matter where you know what you’re looking for but not the specific terms used within the documents, metadata analytics are your best first step.
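The John Smith / Edgar Wright example above amounts to filtering on metadata fields rather than document text. A minimal sketch (the records, field names, and addresses here are hypothetical; real platforms expose the same filters through their review interface):

```python
from datetime import date

# Hypothetical extracted email metadata; field names are illustrative.
emails = [
    {"from": "john.smith@acme.com", "to": "edgar.wright@acme.com",
     "sent": date(2000, 3, 15), "subject": "Widget rollout"},
    {"from": "john.smith@acme.com", "to": "jane.doe@acme.com",
     "sent": date(2000, 7, 1), "subject": "Vacation schedule"},
    {"from": "edgar.wright@acme.com", "to": "john.smith@acme.com",
     "sent": date(2000, 3, 20), "subject": "Re: Widget rollout"},
]

def in_march_2000(d):
    return d.year == 2000 and d.month == 3

# Review set: John Smith -> Edgar Wright, March 2000.
review_set = [e for e in emails
              if e["from"].startswith("john.smith")
              and e["to"].startswith("edgar.wright")
              and in_march_2000(e["sent"])]
print(len(review_set))  # → 1
```

Because none of these filters touch the body text, they work even when the substance of the emails is written in codenames or euphemisms.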

If metadata analytics don’t reveal what you want, using an example document with textual analytics may be successful. When provided with an example, the system can use textual analytics to reveal documents that are conceptually similar to the one you provide. If you believe you know what you are looking for but don’t have a sample, you can also create one by combining text from multiple documents, or simply write your own invented example for the system to analyze for similar documents.

In my next post, I will address how you may best assess data in a case where you don’t already know exactly what you need.

[1] These strategies are not usually optimal for producing documents.

Other Articles in this Series:

The Data Dump: what to do when you’ve received too much data? Part 2
The Data Dump: what to do when you’ve received too much data? Part 3
Jonathan Swerdloff
Jonathan Swerdloff is a Consultant at Driven, Inc. Prior to joining Driven, Jonathan was a litigation associate at Hughes, Hubbard & Reed LLP, accumulating more than 10 years’ experience in eDiscovery that included managing large discovery projects, analyzing enterprise systems, and investigating nontraditional data sources. Drawing on his experience as a litigator and programmer, Jonathan focused primarily on creative problem solving across all data types. He analyzed and produced complex enterprise systems and developed internal workflows for large litigations. He deployed Information Governance strategies, has extensive experience with structured data collection, analysis, and production, and has served as an expert witness. His experience also includes developing cost-saving legal processes, managing legal budgets, and supervising legal personnel. Jonathan is admitted to the bars of New York and Connecticut. He holds a J.D. from the Cardozo School of Law and an MPS from NYU’s Tisch School of the Arts Interactive Telecommunications Program, where he studied rapid prototyping and software development. Jonathan is also an adjunct professor at the Parsons School of Design, teaching a Masters-level course in regulatory and ethics contexts for product designers. Jonathan previously served as the Director of Legal Strategy at the Corporate Knowledge Strategies Forum.