
Empirical Research and Statistical Analysis course

Thomas Quettier PhD

📝 Exercise: Exploring Anscombe’s Quartet

“Anscombe’s Quartet” is a classic illustration of the importance of visually inspecting data. It consists of four datasets, each with two variables, X and Y. Despite having nearly identical statistical properties, these datasets reveal very different relationships when plotted. Follow the steps outlined below and, as the saying goes, let’s put the elephant in the fridge! 🐘

Dataset Location

Navigate to: Data Library > lsj-data > Anscombe

Instructions

1️⃣ Check the Data

Summarize the Data:
- Calculate descriptive statistics (e.g., mean, standard deviation) for each dataset (X1 and Y1, X2 and Y2, etc.).
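
As an optional cross-check outside the point-and-click workflow, the summary step can be sketched in Python using only the standard library. The values are the published Anscombe (1973) data typed in by hand; the names `quartet`, `describe` and `summary` are made up for this illustration and are not part of the course material.

```python
from statistics import mean, stdev

# Anscombe's quartet (Anscombe, 1973); X1, X2 and X3 share the same values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def describe(values):
    """Return (mean, sample standard deviation), rounded to 2 decimals."""
    return round(mean(values), 2), round(stdev(values), 2)

# One entry per dataset, with separate summaries for X and Y.
summary = {k: {"x": describe(x), "y": describe(y)} for k, (x, y) in quartet.items()}

for k, stats in summary.items():
    print(f"Dataset {k}: X mean/sd = {stats['x']}, Y mean/sd = {stats['y']}")
```

The four rows should look almost interchangeable, which is exactly the point of the exercise.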

2️⃣ Write the Hypotheses

Null Hypothesis (H₀): X and Y are not correlated.
Alternative Hypothesis (H₁): X and Y are correlated.

3️⃣ Provide the Correlation Coefficients

Since there are multiple variables to test:
- Generate a correlation matrix for all datasets.
- Include the following correlation coefficients:
  - Pearson
  - Spearman
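
For readers who want to see what the two coefficients compute, here is a minimal sketch in plain Python. It assumes the published Anscombe values; `pearson`, `ranks` and `spearman` are hand-written helper names for this illustration, not functions from any statistics package. Spearman is implemented as Pearson applied to average ranks, which is the usual definition when ties are present.

```python
# Anscombe's quartet (Anscombe, 1973); X1, X2 and X3 share the same values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# (Pearson, Spearman) for each of the four datasets.
correlations = {k: (pearson(x, y), spearman(x, y)) for k, (x, y) in quartet.items()}

for k, (r, rho) in correlations.items():
    print(f"Dataset {k}: Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because Spearman works on ranks, it reacts differently from Pearson to the single extreme point in datasets 3 and 4, so the two columns need not agree.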

4️⃣ Report the Results

Include:
1. Correlation Coefficient (r) for each test.
2. Sample size (N) for the analysis.
3. P-value to determine statistical significance.

Provide a clear interpretation:
- Do the data support the hypothesis that X and Y are correlated?
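
The three reported quantities can be sketched for the first dataset as follows. Statistical software typically derives the p-value for Pearson's r from a t distribution; to stay within the Python standard library, this sketch swaps in a permutation test instead, so its p-value is an approximation and will not match software output digit for digit. The function names, the number of shuffles and the seed are arbitrary choices for illustration.

```python
import random

# First dataset of Anscombe's quartet (X1, Y1).
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_shuffles=10_000, seed=1):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffling Y breaks any real X-Y link, which simulates the null hypothesis.
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)

r = pearson(x1, y1)
n = len(x1)
p = permutation_p_value(x1, y1)
print(f"Dataset 1: r = {r:.3f}, N = {n}, p = {p:.4f}")
```

A small p-value here speaks against H₀ (no correlation), but it says nothing about whether the relationship is linear; that is what the scatterplots are for.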

5️⃣ Graphical Representation of the Data

For each subset of the dataset (X1 and Y1, X2 and Y2, etc.):
- Create a scatterplot.
- Reflect on the relationship shown in the scatterplot:
  - Is it linear?
  - Are there any outliers?
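
A plotting library such as matplotlib would be the usual choice for scatterplots; as a dependency-free stand-in, the sketch below draws a rough text-mode scatterplot with the standard library only. `text_scatter` is a hypothetical helper written for this illustration, and it assumes that X and Y each take at least two distinct values.

```python
# Datasets 2 and 4 of Anscombe's quartet.
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

def text_scatter(x, y, width=40, height=12):
    """Render points as '*' on a character grid; the top row holds the largest Y."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for a, b in zip(x, y):
        # Scale each coordinate to a column/row index inside the grid.
        col = round((a - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - b) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)

plot2 = text_scatter(x2, y2)
plot4 = text_scatter(x4, y4)

print("Dataset 2 (Y2 against X2)")
print(plot2)
print()
print("Dataset 4 (Y4 against X4)")
print(plot4)
```

Even at this coarse resolution, dataset 2 traces a curve rather than a line, and dataset 4 shows one isolated point far to the right of a vertical stack.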
