Read and Understand Data: Methods and Tools

Data Analysis – Correlation Vs. Causation

When analyzing data, there are three aspects to pay attention to in order to ensure that the data is accurate and that any story built on it is coherent.

1) Correlation Vs. Causation 

The most common mistake when reading and analyzing data is confusing correlation with causation.

Correlation

Describes a mutual relationship between variables: when one variable changes, the other tends to change as well, and vice versa. It doesn’t matter which one we start with.
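The symmetry described above can be checked with a small sketch. It computes the Pearson correlation coefficient, one common way to measure correlation, in both directions. The `pearson` helper and the study-hours data are hypothetical and made up purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 72, 80]

r_xy = pearson(hours, scores)  # start from hours
r_yx = pearson(scores, hours)  # start from scores

print(f"hours vs. scores: {r_xy:.3f}")
print(f"scores vs. hours: {r_yx:.3f}")
```

Both calls give the same value, which is why a correlation on its own says nothing about which variable, if any, drives the other.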

Causation

Describes relationships between variables that have an action and a direction: a change in one variable produces a change in the other, but not necessarily the reverse.

Confounding Variable: a variable that clouds other relationships and can make variables seem directly related when they are not. Common confounding variables include:

  • Income, 
  • Age,
  • Sex,
  • Education Level,
  • Race.

When analyzing data, it is necessary to try to identify these confounding variables, since they can flip a correlation between positive and negative, completely mask a relationship, or even create the illusion of one. Confounding variables can also be difficult to measure.
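As a rough illustration of how a confounder can create the illusion of a relationship, the sketch below simulates children whose age drives both shoe size and a vocabulary score. All names, numbers and the simulated data are assumptions made up for this example. The standard partial correlation formula is used here as one possible way to hold the confounder fixed; it is not the only way.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical simulation: age (the confounder) influences both variables,
# but shoe size and vocabulary have no direct effect on each other.
age = [rng.uniform(5, 15) for _ in range(5000)]
shoe_size = [a + rng.gauss(0, 1) for a in age]
vocabulary = [a + rng.gauss(0, 1) for a in age]

r_raw = pearson(shoe_size, vocabulary)

# Partial correlation: the shoe/vocabulary correlation with age held fixed.
r_xz = pearson(shoe_size, age)
r_yz = pearson(vocabulary, age)
r_partial = (r_raw - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(f"raw correlation:             {r_raw:.2f}")
print(f"correlation controlling age: {r_partial:.2f}")
```

The raw correlation is strong even though neither variable affects the other. Once age is accounted for, the apparent relationship largely disappears.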

Identifying whether two variables are linked by correlation or causation is not always straightforward. The best way to determine it is to design a study that controls the scenario so that confounding variables are ruled out.

Watch this video before you continue: Misconceptions in Data Analysis. Study Hall Data Literacy

Data Analysis – Law of Large Numbers and Key Questions

2) Law of Large Numbers

When making decisions based on data, it is good practice to keep the law of large numbers in mind. One should always try to find out the sample size in order to critically assess the sample mean, median or mode.

Law of large numbers – the larger a random sample is, the closer its average should be to the true population average.
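The law can be seen in a short simulation. The sketch below, which assumes a fair six-sided die (true average 3.5), compares the sample mean for three sample sizes chosen arbitrarily for illustration.

```python
import random

def sample_mean_of_die_rolls(n, rng):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(42)  # fixed seed so the run is repeatable
true_mean = 3.5          # (1 + 2 + 3 + 4 + 5 + 6) / 6

# Sample means for increasingly large random samples.
results = {n: sample_mean_of_die_rolls(n, rng) for n in (10, 1_000, 100_000)}

for n, mean in results.items():
    error = abs(mean - true_mean)
    print(f"n={n:>7}: sample mean={mean:.3f}, distance from 3.5={error:.3f}")
```

Small samples can land far from 3.5 purely by chance, while the largest sample should sit very close to it. This is why an average reported without its sample size is hard to judge.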

3) Key Questions

Besides paying close attention to confounding variables and the law of large numbers, it is also key to ask yourself some questions when analyzing data:

  • Where did the data come from?
  • Who analyzed the data?
  • What is missing from the data analysis? (How was the sample collected? How big was it?)
