Data Analysis – Correlation Vs. Causation
When analyzing data, pay attention to three aspects to ensure that the data is accurate and that any story built on it is coherent.
1) Correlation Vs. Causation
The most common misconception when reading and analyzing data is confusing correlation with causation.
Correlation
Describes patterns in mutual relationships between variables. If one variable changes, the other tends to change as well, and vice versa; it doesn’t matter which one we start with.
Causation
Describes relationships between variables that have an action and a direction: a change in one variable produces a change in the other, and not the other way around.
Confounding Variable: a variable that clouds other relationships and makes variables seem directly related, when they are not. Confounding variables usually are:
When analyzing data, it is necessary to try to identify these confounding variables, since they can flip a correlation between positive and negative, completely mask a relationship, or even create the illusion of a relationship. Confounding variables can also be difficult to measure.
Identifying whether two variables are linked by correlation or causation is not always straightforward. The best way to find out is to design a study that controls the scenario so that confounding variables are ruled out.
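A small simulation can illustrate how a confounding variable creates the illusion of a relationship. The sketch below is only an illustration under stated assumptions: the variable names (temperature, ice_cream_sales, sunburn_cases), the coefficients, and the noise levels are all hypothetical and randomly generated, not real data. Temperature drives both of the other variables, which never influence each other directly.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """Remove the linear effect of zs from ys using a least-squares fit."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [y - mean_y - slope * (z - mean_z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confounder: daily temperature.
temperature = [rng.gauss(25, 5) for _ in range(n)]
# Both variables depend on temperature only, never on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [1.5 * t + rng.gauss(0, 5) for t in temperature]

# Raw correlation: strong, although neither variable causes the other.
raw = pearson(ice_cream_sales, sunburn_cases)

# Correlation after controlling for temperature: close to zero.
controlled = pearson(residuals(ice_cream_sales, temperature),
                     residuals(sunburn_cases, temperature))

print(f"raw correlation:        {raw:.3f}")
print(f"controlling for temp.:  {controlled:.3f}")
```

Subtracting the linear effect of the confounder is only one simple way to control for it, and it assumes the relationships are linear and that the confounder can be measured. A study designed to control the scenario remains the more reliable approach.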
Watch this video before you continue: “Misconceptions in Data Analysis. Study Hall Data Literacy”
Data Analysis – Law of Large Numbers and Key Questions
2) Law of Large Numbers
When making decisions based on data, it is good practice to keep the law of large numbers in mind. Always try to find out the sample size so that you can critically assess the sample mean, median, or mode.
Law of large numbers – the larger a random sample is, the closer its average should be to the true population average.
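The law of large numbers can be seen in a short simulation. The sketch below assumes a fair six-sided die, whose true average is 3.5; the function name sample_mean_of_die and the chosen sample sizes are hypothetical and used only for illustration.

```python
import random


def sample_mean_of_die(n, rng):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


TRUE_MEAN = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    mean = sample_mean_of_die(n, rng)
    # The gap to the true mean tends to shrink as n grows,
    # although it does not have to shrink at every single step.
    print(f"n = {n:>7}: sample mean = {mean:.4f}, "
          f"gap = {abs(mean - TRUE_MEAN):.4f}")
```

With only 10 rolls the average can land far from 3.5, which is why an average reported without its sample size is hard to judge.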
3) Key Questions
In addition to watching for confounding variables and keeping the law of large numbers in mind, it is also key to ask yourself some questions when analyzing data: