Many organizations struggle with how to get started with a data map, and with how to use one once they have it. Some basic steps are outlined below, though your project may not follow them exactly. The scope of your project will depend on how complex your organization’s IT is, your resources, and timing; creating and using a data map before litigation is different from doing so while you are responding to discovery. It is important to remember that data maps may be developed incrementally, for example by focusing on certain departments or systems before attempting to map everything. In all cases, the goal of a data map is to identify how your organization actually does its business and what traces of digital information are stored in tangible media. If you are successfully documenting this information, you are making progress.
A. How to Create a Data Map
The process should begin by interviewing data stakeholders. In most cases, you will need at least three rounds of interviews. First, interview your IT department, then the key stakeholders in each department. Finally, in litigation or an investigation, you will also need to interview the individual custodians who may be at issue. This sequence allows you to build a full picture and, in litigation, to home in on the data bullseye.
Your interviews should use tailored questions about your data that address the same questions you were taught in fourth grade: Who has the data? Why does the data exist? What is the data? How is the data stored? When can you delete the data?
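If you keep your data map in a spreadsheet or a lightweight database, it can help to give each entry one field per question so that gaps are easy to spot. The sketch below is a hypothetical schema, not a standard: the class name `DataMapEntry` and the field names `owner`, `purpose`, `description`, `storage`, and `retention_until` are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataMapEntry:
    """One row of a data map. Field names are hypothetical, not a standard."""
    system: str                      # the repository or application described
    owner: str                       # Who has the data?
    purpose: str                     # Why does the data exist?
    description: str                 # What is the data?
    storage: str                     # How is the data stored?
    retention_until: Optional[date]  # When can you delete the data? (None = unknown)

    def open_questions(self) -> list[str]:
        """Return the questions that are still unanswered (empty or None) for this entry."""
        answers = {
            "Who has the data?": self.owner,
            "Why does the data exist?": self.purpose,
            "What is the data?": self.description,
            "How is the data stored?": self.storage,
            "When can you delete the data?": self.retention_until,
        }
        return [question for question, answer in answers.items() if not answer]


entry = DataMapEntry(
    system="Sales CRM",
    owner="Sales Operations",
    purpose="Track customer contacts and deals",
    description="",
    storage="Vendor-hosted SaaS",
    retention_until=None,
)
print(entry.open_questions())  # ['What is the data?', 'When can you delete the data?']
```

Listing the open questions per entry gives you a ready-made agenda for the next round of interviews.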
The IT interviews should establish the technical landscape of your data systems. Ask specific, detailed, and granular questions about the computers in use, how drives are shared, and which document collaboration tools are available; anything that the IT department supports should be documented. It is also important to obtain documentation from IT: network drive listings with directory sizes, lists of databases and business applications along with their business owners, and documentation of the IT infrastructure.
IT interviews will lay the foundation for your next rounds of interviews and help you tailor the relevant questions. The discussion should cover where data resides within the organization, how it stays inside the organization, and how it flows into and out of the organization. There are several types of information flow to consider: flow between individuals, between departments, and between your organization and third parties. There is also the flow of data between machines, between databases, and between business processes. This is also the time to discuss which systems are homegrown within your organization and which are purchased from outside.
Further, the IT interviews should focus not just on the current snapshot of the organization but also on what came before. Legacy systems can pose unexpected issues in discovery, particularly if the people who owned or developed them are no longer with the company. Questions of technical accessibility should also be addressed at this stage, because not everything is equally easy to load, use, or understand. As your data map will require regular updating and refreshing, the more you collect today, the easier your job will be when you decommission systems. Legacy data that you are required to retain for regulatory purposes, or that falls within the scope of a legal hold, must be handled with particular care, and the first step is to learn as much about it as you can.
The IT interviews are not your only stop when it comes to determining the informational infrastructure of the organization. Departmental interviews reveal how business happens in your organization and how data is used. Moreover, departmental interviews can reveal “Shadow IT” and “Dark Data” that were unknown to IT. IT generally doesn’t handle the content of organizational data; it addresses the form, and it usually works only with systems that have been approved. Therefore, no matter what the IT department believes, there may be unknown unknowns that IT has never seen.
Shadow IT is any system, storage, or application used in the organization that doesn’t have formal organizational approval and frequently lacks organizational support. Examples include a homegrown Access database, a team that has decided to do its project management in a third-party hosted solution such as Basecamp, or employees who take work home on unsecured USB thumb drives. Shadow IT is particularly common in organizations that have not given employees the tools they need, leading them to adopt or build their own. Employees often create these solutions to be helpful; they are trying to solve a real company need. Shadow IT should surface during the departmental interviews. At times, employee responses about how they do their jobs may reveal answers that surprise even department heads. One practice tip: during the interview process, employees who don’t have much exposure to Legal may fear that they are in the crosshairs of an investigation. It is best to encourage candor by making clear that the purpose of the map is to identify all company data, not to point fingers.
Departmental interviews may also provide information about your organization’s Dark Data. Gartner defines Dark Data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Dark Data can pose unexpected privacy issues and can also leave you with more data than you would otherwise expect. This data is generally of low value to your organization, as evidenced by its lack of use, yet it carries high risk if it is relevant to litigation or is leaked.
Custodial interviews can mean the difference between over- and under-inclusive collections. You can frequently trust what your custodians tell you, but you should also verify it against your data map and the documentation you’ve already collected in this process. The map is not the territory; it serves only as a guide to where to look for things. More on custodial interviews can be found here.
B. How to Use the Data Map
Once you understand your IT infrastructure, what is stored within your organization, and who owns the data, you can use the data map in several ways.
After this process is well underway, you can begin to remediate your data: deleting old, redundant, or outdated data and consolidating duplicate data. Do multiple systems track the same information across departments? Are there efficiencies that can be realized by reorganizing the data behind some processes?
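Once the map records what each system holds, the question of whether multiple systems track the same information can be answered mechanically. The sketch below assumes a hypothetical inventory that maps each system name to the categories of information it holds; the function name `overlapping_systems` and the category labels are illustrative, and a real map would need consistent category names for the comparison to be meaningful.

```python
from collections import defaultdict


def overlapping_systems(inventory: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each data category to the systems holding it, keeping only
    categories held by more than one system (candidates for consolidation)."""
    holders: dict[str, list[str]] = defaultdict(list)
    for system, categories in inventory.items():
        for category in categories:
            holders[category].append(system)
    return {cat: sorted(systems) for cat, systems in holders.items() if len(systems) > 1}


inventory = {
    "Sales CRM": {"customer contacts", "deals"},
    "Billing": {"customer contacts", "invoices"},
    "Marketing list": {"customer contacts"},
}
print(overlapping_systems(inventory))
# {'customer contacts': ['Billing', 'Marketing list', 'Sales CRM']}
```

An overlap is only a prompt for discussion with the business owners; it does not by itself mean one of the systems can be retired.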
The data map also allows you to start a privacy and security assessment. Are you collecting personally identifiable information (PII)? If you are, your interview process can help you identify where it lives in the organization and whether you have adequate controls around it.
If you’ve uncovered Dark Data or Shadow IT infrastructure, is either of them necessary? Can you stop collecting some Dark Data, or bring some of your Shadow IT under the umbrella of IT operations?
A data map is also helpful because it allows you to identify what to preserve. Preserving all of the data within your organization may not be necessary for either a legal or business purpose, but if you don’t know what you have, you don’t know what to keep.
When litigation or an investigation comes along, your map should prove invaluable. It directs you and your outside counsel to who and what is likely relevant, saving you a number of collection-related headaches. Data maps allow you to identify the correct custodians and non-custodial data related to a matter, what data you should expect them to hold, the steps you need to take to preserve and collect that data, and any special confidentiality issues that apply to it.
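To illustrate how a map can be used to scope a matter, the sketch below filters map entries by subject and splits the results into custodians, non-custodial sources, and sources that need special confidentiality handling. Everything here is a hypothetical simplification: the `MapEntry` fields (including tagging entries with business `subjects` and a `contains_pii` flag) and the `scope_matter` function are assumptions for illustration, and subject matching is no substitute for counsel’s judgment about relevance.

```python
from dataclasses import dataclass


@dataclass
class MapEntry:
    """A simplified data map row; all field names are hypothetical."""
    source: str
    custodians: list[str]
    subjects: set[str]            # business topics the data relates to
    contains_pii: bool = False    # flags special confidentiality handling
    custodial: bool = True        # False for shared drives, databases, etc.


def scope_matter(entries: list[MapEntry], matter_subjects: set[str]) -> dict:
    """Return custodians, non-custodial sources, and PII sources for a matter."""
    relevant = [e for e in entries if e.subjects & matter_subjects]
    # Individual custodians come only from custodial sources such as mailboxes.
    custodians = sorted({c for e in relevant if e.custodial for c in e.custodians})
    non_custodial = sorted(e.source for e in relevant if not e.custodial)
    pii_sources = sorted(e.source for e in relevant if e.contains_pii)
    return {
        "custodians": custodians,
        "non_custodial_sources": non_custodial,
        "pii_sources": pii_sources,
    }


data_map = [
    MapEntry("Email", ["J. Smith", "A. Lee"], {"pricing", "contracts"}),
    MapEntry("Contracts database", ["Legal Ops"], {"contracts"}, custodial=False),
    MapEntry("HR system", ["HR"], {"payroll"}, contains_pii=True, custodial=False),
]
print(scope_matter(data_map, {"contracts"}))
# {'custodians': ['A. Lee', 'J. Smith'], 'non_custodial_sources': ['Contracts database'], 'pii_sources': []}
```

The output is a starting list for preservation and collection that you and outside counsel would then refine.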
In Part 3 of this series, I will give you a sample checklist and template for your own customization and use within your organization.
Other Articles in this Series: The Data Dump: What to Do When You’ve Received Too Much Data? Part 1