
Thirteen Important Factors to Consider in Creating an Effective Keyword Search

Keyword searching is an objective search method commonly used to limit data collections to documents containing terms believed to be strong indicators of potential relevance. To accomplish this goal, eDiscovery practitioners create lists of words or search strings, often joined by proximity connectors, which are then compared against an index of the terms extracted from the documents in the database. Below is a decision tree that can aid in this effort.
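To make the idea of comparing keywords against an index concrete, here is a minimal Python sketch. It assumes a toy in-memory collection; the document IDs, the sample text, and the function names build_index and search are all hypothetical, and a real review platform builds and queries its index in its own way.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9']+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, keywords):
    """Return the IDs of documents containing at least one of the keywords."""
    hits = set()
    for keyword in keywords:
        hits |= index.get(keyword.lower(), set())
    return hits


# Hypothetical sample collection, for illustration only.
documents = {
    "DOC-001": "Please review the draft will before Friday.",
    "DOC-002": "Lunch menu for the holiday party.",
    "DOC-003": "The merger agreement is attached.",
}

index = build_index(documents)
print(sorted(search(index, ["merger", "will"])))  # ['DOC-001', 'DOC-003']
```

The point of the sketch is that a keyword can only return a document if the term made it into the index, which is why several of the questions below concern what the index does and does not contain.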

1) What is the purpose of the review?

a) Is there to be a production? Am I trying to cull the dataset before review, or only to look at the most relevant documents first?

    • For internal investigations and reviews of third-party productions, it is better to use keyword or conceptual searches to focus the review and highlight relevant keywords than to cull the dataset with keywords before processing. Culling before processing risks missing documents that are relevant but do not hit on any keyword.

2)   Have the parties agreed on final keywords without testing?

a)      If yes, it may be acceptable to run the keywords pre-processing.

b)      If not, run the keywords post-processing, test the results, and refine the terms. Keyword searching is most defensible when run post-processing, because running the search before processing presents issues such as:

    • an inability to validate the results
    • difficulty changing the terms
    • the potential need to re-process the data
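One common way to test and refine keywords after processing is a per-term hit report. The Python sketch below is an illustration only: the sample documents, the function names hit_report and flag_overbroad, and the 50% threshold are assumptions, not a standard.

```python
import re


def hit_report(documents, keywords):
    """Count, for each keyword, how many documents contain it as a whole word."""
    report = {}
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        report[keyword] = sum(
            1 for text in documents.values() if pattern.search(text)
        )
    return report


def flag_overbroad(report, total_docs, threshold=0.5):
    """Return keywords that hit more than `threshold` of the collection."""
    return [kw for kw, count in report.items() if count / total_docs > threshold]


# Hypothetical sample collection, for illustration only.
documents = {
    "DOC-001": "Merger terms attached. This email is confidential.",
    "DOC-002": "Lunch on Friday? This email is confidential.",
    "DOC-003": "Quarterly numbers. This email is confidential.",
    "DOC-004": "See you at the game.",
}

report = hit_report(documents, ["confidential", "merger", "escrow"])
print(report)  # {'confidential': 3, 'merger': 1, 'escrow': 0}
print(flag_overbroad(report, len(documents)))  # ['confidential']
```

A term that hits most of the collection probably needs narrowing, and a term with zero hits may be misspelled or missing from the index. Neither problem can be seen if the keywords were applied before processing and the non-hit documents were never loaded.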

3)   Is the dataset conducive to a keyword search?

a)      If not, is Optical Character Recognition needed?

    • Does my dataset contain a lot of images that will need to be reviewed?
    • Does my dataset contain a lot of handwriting?
    • Are there any other barriers to creating an effective index?

4)   Does the review tool index all terms?

a)      If not, which words are not indexed by the tool?

    • How can I tailor my keywords to avoid these non-indexed terms? 
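A quick way to spot problem terms is to compare the proposed keywords against the tool's list of non-indexed words. The Python sketch below assumes a hypothetical list named NOT_INDEXED; the actual list depends on the review tool and its configuration, so it should be taken from the tool's documentation or settings.

```python
# Hypothetical list of words the index skips; replace it with the list
# documented for, or configured in, the review tool actually in use.
NOT_INDEXED = {"a", "an", "and", "it", "of", "the", "to", "will"}


def unsearchable_terms(keywords, not_indexed):
    """Map each keyword or phrase to the words in it that the index skips."""
    problems = {}
    for keyword in keywords:
        skipped = [w for w in keyword.lower().split() if w in not_indexed]
        if skipped:
            problems[keyword] = skipped
    return problems


keywords = ["will", "letter of intent", "escrow"]
print(unsearchable_terms(keywords, NOT_INDEXED))
# {'will': ['will'], 'letter of intent': ['of']}
```

Terms flagged this way can be rephrased, combined with indexed terms, or handled by adjusting the index settings if the tool allows it.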

5)       Does the data set contain foreign languages?

a)      If yes, do you need to capture foreign language documents?

    1. If not, you can cull all foreign-language documents.
    2. If yes, make sure the keywords include foreign-language terms.
      • Make sure the index contains foreign-language terms (you may need to set up a separate index, for example for Asian languages).
      • Have a native speaker review the keywords for actual usage, terms of art, misspellings, slang, etc.

6) Narrowly tailor the terms to capture as much relevant material as possible while limiting the capture of non-relevant material.

7) Check for terms that have multiple uses and connotations.

8) Check for terms that might commonly appear in signature blocks (e.g., “confidential”).

9) Check for terms that might be part of the client’s domain name, since such terms can hit on every email sent to or from the client.

10) Check for common misspellings of words

11) Broaden your terms with common synonyms

12) Think broadly about the actual types of records you intend to return

13) Link terms with search syntax, such as proximity connectors, wherever possible to further narrow your search, e.g., “draft* /10 will” as opposed to “will”.
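To illustrate how a proximity search such as “draft* /10 will” narrows results compared with “will” alone, here is a rough Python sketch. It assumes that “/10” means the two terms appear within ten words of each other in either order and that “*” is a trailing wildcard; the exact syntax and semantics vary by review tool. The function names tokenize and within are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(text, left, right, distance):
    """True if a word matching `left` occurs within `distance` words of a
    word matching `right`, in either order. `*` in a pattern is a wildcard."""
    tokens = tokenize(text)
    left_pos = [i for i, t in enumerate(tokens) if fnmatchcase(t, left.lower())]
    right_pos = [i for i, t in enumerate(tokens) if fnmatchcase(t, right.lower())]
    return any(
        i != j and abs(i - j) <= distance for i in left_pos for j in right_pos
    )


# Hypothetical sample text, for illustration only.
routine = "I will send the report tomorrow."
relevant = "She drafted a new will for her client last week."

print("will" in tokenize(routine))              # True: the bare term hits
print(within(routine, "draft*", "will", 10))    # False: no "draft*" nearby
print(within(relevant, "draft*", "will", 10))   # True: "drafted" is 3 words away
```

The bare term “will” hits the routine email, while the proximity string returns only the document that discusses drafting a will.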