The Data Dump: what to do when you’ve received too much data? Part 2

<< Read Part 1: The Data Dump: what to do when you’ve received too much data?

In my last post, I addressed finding a needle in a haystack when an opposing party produces its documents to you as a large, unorganized document dump. The next two posts in this series address what happens when you know that there’s a haystack but you’re not so sure about what the needles might look like. In this post, I will address an instance where you know the basic facts of your case but you have some evidentiary gaps.

If you don’t know much about the specific evidence that you need to prove your points, there are several techniques and tools you can use to triage the received production. Because you have a well-developed theory of the case and you are trying to fill evidentiary gaps in a cost-effective way, we’ll start with some very efficient first steps. Clearly, you will need to look at the production, but you almost certainly need not look at all of it to get going.

The more specifics you know, the better off you are. For example, you may know date ranges, important players, and whether specific custodians had conversations with one another. Using this information, you can use metadata analysis to narrow the set of documents. However, when taking this approach, remember: you can’t find what isn’t there. If you decide to search and review only part of a production, you may miss important documents. In the ONE platform, this early data analysis will also give you the ability to select which documents to load. Loading only part of a production can save you review time as well as money in hosting. The data you have not loaded will remain fully searchable and the text can still be reviewed before loading, so you also have less risk of missing something.
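To make the metadata step concrete, here is a minimal sketch in Python of narrowing a production by custodian and date range. It is only an illustration: the `Doc` record, the `narrow_by_metadata` function, and the sample custodians and dates are all hypothetical, and it does not represent how the ONE platform or any other review tool works internally.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical metadata record for one produced document."""
    doc_id: str
    custodian: str
    sent: date


def narrow_by_metadata(docs, custodians, start, end):
    """Keep documents from the named custodians dated within [start, end]."""
    wanted = {name.lower() for name in custodians}
    return [
        d for d in docs
        if d.custodian.lower() in wanted and start <= d.sent <= end
    ]


# Invented sample data, for illustration only.
production = [
    Doc("DOC-001", "Smith", date(2015, 3, 2)),
    Doc("DOC-002", "Jones", date(2015, 3, 9)),
    Doc("DOC-003", "Smith", date(2016, 1, 4)),
    Doc("DOC-004", "Lee", date(2015, 4, 1)),
]

narrowed = narrow_by_metadata(
    production, ["Smith", "Jones"], date(2015, 1, 1), date(2015, 12, 31)
)

# Keep a list of what was set aside, since you can't find what isn't there.
set_aside = [d.doc_id for d in production if d not in narrowed]

print([d.doc_id for d in narrowed])
print(set_aside)
```

Tracking the set-aside documents alongside the narrowed set is a deliberate choice: it keeps the excluded material visible so that you can return to it if the first pass leaves gaps.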

At this stage you will know something about the case and you may have negotiated the search terms that were the cornerstone of the production. If this is the case, you should look at a keyword hit count report. This will tell you how many documents responsive to each of your negotiated terms were produced. Running this report will give you the ability to look at the specific documents within the production that are the most likely to be relevant and give you context for them. You can turn on keyword highlighting within the documents, which will increase speed and efficiency for review. The keyword hit count can also be useful in determining which of the keywords used by the producing party are creating false hits or returning too many documents within your production set. This gives you the opportunity to fine-tune your approach. You may want to run Boolean searches on multiple keywords to find specific concepts.
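As a rough illustration of what a keyword hit count report and a simple Boolean search compute, consider the sketch below. The function names, the sample documents, and the search terms are all hypothetical, and the matching is simplified to whole-word, case-insensitive matching; a real review platform will have its own indexing and query syntax.

```python
import re


def contains(text, term):
    """True if term appears in text as a whole word, ignoring case."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def keyword_hit_counts(docs, terms):
    """Return {term: number of documents that contain the term at least once}."""
    return {
        term: sum(1 for text in docs.values() if contains(text, term))
        for term in terms
    }


def search_all(docs, terms):
    """Boolean AND: IDs of documents that contain every one of the terms."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(contains(text, term) for term in terms)
    )


# Invented sample data, for illustration only.
production = {
    "DOC-001": "The invoice for the Acme contract was approved.",
    "DOC-002": "Lunch on Friday? The invoice can wait.",
    "DOC-003": "Acme contract renewal terms attached.",
}
terms = ["invoice", "contract", "rebate"]

report = keyword_hit_counts(production, terms)
both = search_all(production, ["invoice", "contract"])

print(report)
print(both)
```

In this toy set, a term with zero hits and a term that appears in unrelated chatter (the lunch email) are the two signals worth noticing: the first suggests a gap in the production or in the term, and the second suggests a term that is creating false hits.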

Throughout this process there will inevitably be some manual review. You need to look at your evidence before you can use it. When documents are produced, they are usually grouped in families (see footnote 1 below). Reviewing families can add a huge number of documents to your review, and different circumstances call for different strategies. When you are looking for something specific, for instance, looking at a family may give color to the document you’ve found. On the other hand, if you’re trying to flesh out your case, adding family members can add a significant number of documents to your review. This is a double-edged sword: looking at families gives you a greater sense of the documents but takes more time and thus costs more. Working with keyword highlights can make this process much less onerous by identifying both which documents in an archive are relevant and where in those documents the relevant content is.

If you already have a good sense of your case, you should start filling in evidentiary gaps by leveraging the keyword hit report, keyword highlighting, and looking at documents in families. Whether it’s fleshing out your timeline or expanding on known concepts, these techniques help you shine a spotlight on exactly what you are looking for, using what you already know as a guide.
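The cost side of the family trade-off described above can be estimated before you commit to it. The sketch below, which assumes each document carries a family identifier in its metadata, expands a set of keyword hits to their full families and counts how many extra documents that adds. The `expand_to_families` function, the `family_of` mapping, and the sample IDs are hypothetical.

```python
from collections import defaultdict


def expand_to_families(hit_ids, family_of):
    """Return the hit documents plus every other member of their families."""
    members = defaultdict(set)
    for doc_id, family_id in family_of.items():
        members[family_id].add(doc_id)

    expanded = set()
    for doc_id in hit_ids:
        expanded |= members[family_of[doc_id]]
    return expanded


# Invented sample data: FAM-A is an email with two attachments,
# FAM-B is a standalone document, FAM-C is an email with one attachment.
family_of = {
    "DOC-001": "FAM-A",
    "DOC-002": "FAM-A",
    "DOC-003": "FAM-A",
    "DOC-004": "FAM-B",
    "DOC-005": "FAM-C",
    "DOC-006": "FAM-C",
}
hits = {"DOC-001", "DOC-005"}

expanded = expand_to_families(hits, family_of)
added = len(expanded) - len(hits)

print(sorted(expanded))
print(f"Reviewing families adds {added} documents to {len(hits)} hits.")
```

Running a count like this first lets you decide, family by family if necessary, whether the added context is worth the added review time.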

In the next post, I will turn to what to do when you have a general sense of your case but you are still developing your understanding of issues.

1 A family of documents is a set that was initially grouped together, such as a single email and its attachments or a zip file that contains multiple documents. One caution about zip files and other archives: they can contain a fair number of irrelevant documents, depending on the way the initial zip file was created. If a custodian zipped up an entire file folder, for example, and there was only one relevant document in the folder, you may end up reviewing many documents needlessly.

