
The Data Dump: what to do when you’ve received too much data? Part 1

Have you ever been overwhelmed by a document dump? If your adversary “dumps” a large amount of data on you, with little regard for relevance or organization, how do you deal with it? Years ago, you would have sent associates to a warehouse where they would pore over pages of moldy and musty documents and return with a few hot documents in a few months. Now, the volumes produced make that nearly impossible and prohibitively costly.

Producing parties may dump data for many reasons. They may do it as a cost-saving measure, skipping a potentially expensive relevance review. They may be trying to bury relevant evidence so it is difficult to find. There may also simply be a tremendous volume of documents responsive to your requests if you did not tailor them narrowly enough.

This blog post will examine a few techniques and tools you can use to triage incoming productions effectively and inexpensively.[1]

When your adversary dumps millions of pages on you, there are three general scenarios that will guide your strategy. First, your case may be well developed, and you know what types of documents you need to find. Or, second, you may have some sense of the case but you need to fill in some gaps. Finally, you may not know much about your case at all, and you want to understand the documents to develop and refine your theory. Let’s look at each one in turn.


If you are looking for specific documents in a data dump, you have the advantage of already knowing information that can lead you to what you need. Now you just need to find a needle in a haystack.

In this scenario, traditional Boolean searches are likely to be helpful, just like the ones you would run on Google or Lexis.
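For readers who want to see the mechanics, here is a minimal sketch of a Boolean search, assuming the production has already been reduced to plain text keyed by document ID. The function names, document IDs, and sample text are hypothetical and for illustration only; a real review platform will offer a much richer query syntax (proximity, wildcards, stemming).

```python
import re

def tokens(text):
    """Lowercase a document and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of documents matching AND / OR / NOT term lists.

    Search terms are expected in lowercase.
    """
    hits = []
    for doc_id, text in docs.items():
        words = tokens(text)
        if not all(t in words for t in all_of):       # AND terms
            continue
        if any_of and not any(t in words for t in any_of):  # OR terms
            continue
        if any(t in words for t in none_of):          # NOT terms
            continue
        hits.append(doc_id)
    return hits

# Hypothetical sample production.
docs = {
    "DOC-001": "Acme widget pricing for the March shipment",
    "DOC-002": "Lunch order for the team on Friday",
    "DOC-003": "Widget recall notice sent to Acme distributors",
}

# Equivalent to: acme AND widget AND NOT recall
print(boolean_search(docs, all_of=["acme", "widget"], none_of=["recall"]))
```

Even this simple version shows why Boolean searching works best when you already know the vocabulary: a document that uses a codename instead of "widget" would never be returned.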

If a Boolean search does not return what you expect, you can also try using metadata analytics to find what you’re looking for. Metadata analytics group documents by known categories and allow you to drill down based on those classifications. If you know the To, From, or CC of a specific email, for example, or the domain from which an email was expected, the timing of documents, or anything else about the metadata, you can use that to home in on the specific evidence you want. Using metadata analytics to create a review set with messages from John Smith to Edgar Wright in March of 2000 about Acme’s widgets can provide a manageable set of documents to review. This is particularly helpful in matters where codenames, euphemisms, and nontraditional language are used. In a matter where you know what you’re looking for but not the specific terms used within the document, metadata analytics are your best first step.
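The John Smith to Edgar Wright example can be sketched in a few lines, assuming the email metadata has been extracted into simple records. The `Email` record, the addresses, and the document IDs below are all hypothetical; an actual platform would expose these filters through its own interface.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Email:
    doc_id: str
    sender: str
    recipients: tuple
    sent: date

def metadata_filter(emails, sender, recipient, start, end):
    """Keep emails from `sender` to `recipient` sent between start and end (inclusive)."""
    return [
        e.doc_id
        for e in emails
        if e.sender == sender
        and recipient in e.recipients
        and start <= e.sent <= end
    ]

# Hypothetical metadata extracted from a production.
emails = [
    Email("DOC-101", "john.smith@acme.example", ("edgar.wright@acme.example",), date(2000, 3, 14)),
    Email("DOC-102", "john.smith@acme.example", ("edgar.wright@acme.example",), date(2000, 5, 2)),
    Email("DOC-103", "jane.doe@acme.example", ("edgar.wright@acme.example",), date(2000, 3, 20)),
]

# Messages from John Smith to Edgar Wright in March of 2000.
review_set = metadata_filter(
    emails,
    sender="john.smith@acme.example",
    recipient="edgar.wright@acme.example",
    start=date(2000, 3, 1),
    end=date(2000, 3, 31),
)
print(review_set)
```

Note that the filter never looks at the body of the message, which is why this approach still works when the authors used codenames or euphemisms.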

If metadata analytics don’t reveal what you want, using an example document with textual analytics may be successful. When provided with an example, the system can use textual analytics to reveal documents that are conceptually similar to the one you provide. If you believe you know what you are looking for but don’t have a sample, you can also create one, either by combining text from multiple documents or simply by inventing your own example for the system to compare against.
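To give a rough sense of how "find more like this" can work, the sketch below ranks documents by word overlap with an example, using cosine similarity over raw word counts. This is a deliberately simplified stand-in: commercial tools generally use more sophisticated conceptual models, and the function names and sample text here are hypothetical.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_by_similarity(example, docs):
    """Return document ids ordered from most to least similar to the example."""
    ex = term_counts(example)
    scored = [(cosine(ex, term_counts(text)), doc_id) for doc_id, text in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

# An invented example document, as described above.
example = "The widget shipment was delayed by the supplier"

# Hypothetical sample production.
docs = {
    "DOC-A": "Supplier delayed the widget shipment again",
    "DOC-B": "Holiday party planning for December",
    "DOC-C": "Widget invoice attached",
}

print(rank_by_similarity(example, docs))
```

The example text does not need to be a real document; as long as it uses language close to what you expect to find, the most similar documents should rise to the top of the ranking.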

In my next post, I will address how you may best assess data in a case where you don’t already know exactly what you need.

[1] These strategies are not usually optimal for producing documents.

Other Articles in this Series:

The Data Dump: what to do when you’ve received too much data? Part 2
The Data Dump: what to do when you’ve received too much data? Part 3
Jonathan Swerdloff
Jonathan Swerdloff is Director of Global Client Services and eDiscovery at Scott+Scott Attorneys at Law LLP. Prior to this role, he was an expert Consultant at Driven, Inc.