particularly in showing trends and relationships
Answers
Answer:
I recently finished reading Steve Coll’s book Directorate S, a chronicle of the U.S. war in Afghanistan after 9/11. It’s a good book, and one line stood out to me because I thought it had relevance for data analysis. In one chapter, Coll writes about Lieutenant Colonel John Loftis, who helped run a training program for U.S. military officials preparing to serve in Afghanistan. In reference to Afghan society, Loftis says, “Everything over there is about relationships.” At the time, Afghanistan had few independent institutions, and accomplishing certain tasks depended on knowing certain people and having a good relationship with them.
I find data analysis to be immature as an independent field. It uses many tools (mathematics, statistics, computer science) that are mature and well-studied. But the act of analyzing data is not particularly well-studied. And like any immature organization (or nation), much of data analysis still has to do with human relationships. I think this is an often-ignored aspect of data analysis because people hold out hope that we can build the tools and technology to the point where we do not need to rely on relationships. The hope is that we will eventually find approaches that are universally correct, so there will be little need for discussion.
Human relationships are unstable, unpredictable, and inconsistent. Algorithms and statistical tools are predictable and in some cases, optimal. But for whatever reason, we have not yet been able to completely characterize all of the elements that make a successful data analysis in a “machine readable” format. We haven’t developed the “institutions” of data analysis that can operate without needing the involvement of specific individuals. Therefore, because we have not yet figured out a perfect model for human behavior, data analysis will have to be done by humans for just a bit longer.
In my experience, there are a few key relationships that need to be managed in any data analysis and I discuss them below.
Data Analyst and Patron
At the end of the day, someone has to pay for data analysis, and this person is the patron. This person might have gotten a grant, or signed a customer, or simply identified a need and the resources for doing the analysis. The key thing here is that the patron provides the resources and determines the tools available for analysis. Typically, the resource we are most concerned with is the time available to the analyst. The patron, through the allocation of resources, controls the scope of the analysis. If the patron needs the analysis tomorrow, the analysis is going to be different than if they need it in a month.
A bad relationship here can lead to mismatched expectations between the patron and the analyst. Often the patron thinks the analysis should take less time than it really does. Conversely, the analyst may come to believe that the patron is deliberately allocating fewer resources to the data analysis because of other priorities. None of this is good, and the relationship between the two must be robust enough to straighten out any disagreements or confusion.
Data Analyst and Subject Matter Expert
This relationship is critical because the data analyst must learn the context around the data they are analyzing. The subject matter expert can provide that context and can ask the questions that are relevant to the area that the data inform. The expert is also needed to help interpret the results and to potentially place them in a broader context, allowing the analyst to assess the practical significance (as opposed to statistical significance) of the results. Finally, the expert will have a sense of the potential impact of any results from the analysis.
A bad relationship between the analyst and the expert can often lead to:
Irrelevant analysis. Lack of communication between the expert and the analyst may lead the analyst to go down a road that is not of interest to the audience, no matter how correct the analysis is. In my experience, this outcome is most common when the analyst does not have any relationship with a subject matter expert.
Mistakes. An analyst’s misunderstanding of some of the data or the data’s context may lead to analysis that is relevant but incorrect. Analysts must be comfortable clarifying details of the data with the expert in order to avoid mistakes.
Biased interpretation. The point here is not that a bad relationship leads to bias, but rather that a bad relationship can lead the expert to distrust the analyst and their analysis, and so to rely more strongly on their preconceived notions. A strong relationship between the expert and the analyst could make the expert more open to evidence that contradicts their hypotheses, which can be critical to reducing hidden biases.
Data Analyst and Audience
The data analyst needs to find some way to assess the needs and capabilities of the audience, because there is always an audience.
Explanation: