📝 Exercise: Exploring Anscombe’s Quartet
“Anscombe’s Quartet” is a classic illustration of the importance of visually inspecting data. It consists of four datasets, each with two variables, X and Y. Despite having nearly identical summary statistics (means, standard deviations, and correlations), these datasets reveal very different relationships when plotted. Follow the steps below and, as the saying goes, let’s put the elephant in the fridge! 🐘
Dataset Location
Navigate to: Data Library > lsj-data > Anscombe
Instructions
1️⃣ Check the Data
- Summarize the Data:
- Calculate descriptive statistics (e.g., mean, standard deviation) for each dataset (X1 and Y1, X2 and Y2, etc.).
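If you want to cross-check the jamovi output, the summary step can be sketched in Python with only the standard library. The values below are the published Anscombe (1973) figures; the column names `X1` to `Y4` are assumed to match the jamovi file, and the `data` dictionary is just an illustrative container.

```python
import statistics

# Anscombe's quartet (published values). The X1..Y4 names are an assumption
# about how the jamovi dataset labels its columns.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # X1, X2 and X3 are identical
data = {
    "X1": x_shared,
    "Y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "X2": x_shared,
    "Y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "X3": x_shared,
    "Y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "X4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "Y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

for name, values in data.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    print(f"{name}: N={len(values)}  mean={mean:.2f}  sd={sd:.2f}")
```

The four X columns should come out with the same mean and standard deviation, and the four Y columns should agree to about two decimal places, which is the point of the exercise.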
2️⃣ Write the Hypotheses
- Null Hypothesis (H₀): X and Y are not correlated.
- Alternative Hypothesis (H₁): X and Y are correlated.
3️⃣ Provide the Correlation Coefficients
- Since there are multiple variables to test:
- Generate a correlation matrix for all datasets.
- Include the following correlation coefficients:
- Pearson
- Spearman
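As a rough cross-check of the correlation matrix, both coefficients can be computed by hand in Python. This is a minimal sketch rather than jamovi's own procedure: `pearson`, `average_ranks`, and `spearman` are hypothetical helper names, and Spearman's coefficient is taken here as Pearson's r applied to average ranks (ties share the mean of their ranks).

```python
import math

def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the spreads."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Published Anscombe (1973) values; pair labels are illustrative.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
pairs = {
    "X1-Y1": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "X2-Y2": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "X3-Y3": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "X4-Y4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for label, (xs, ys) in pairs.items():
    print(f"{label}: Pearson r = {pearson(xs, ys):.3f}, Spearman rho = {spearman(xs, ys):.3f}")
```

Pearson's r is almost the same for all four pairs, while Spearman's coefficient differs noticeably between them, which already hints that the four relationships are not alike.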
4️⃣ Report the Results
- Include:
- Correlation coefficient for each test (Pearson’s r, Spearman’s ρ).
- Sample size (N) for the analysis.
- P-value to determine statistical significance.
- Provide a clear interpretation:
- Do the data support the alternative hypothesis (H₁) that X and Y are correlated?
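To see where the reported numbers come from, here is a hedged sketch that produces r, N, and a two-sided p-value for Pearson's correlation. It assumes the usual t-test for a correlation, t = r·√((N − 2) / (1 − r²)) with N − 2 degrees of freedom, and approximates the tail area by integrating the t density with Simpson's rule so that no extra libraries are needed. The function names are hypothetical, and jamovi's figures may differ slightly in the last decimals.

```python
import math

def pearson(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: no correlation (requires |r| < 1, even steps)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and |t|.
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)  # both tails beyond |t|

# Published Anscombe (1973) values; pair labels are illustrative.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
pairs = {
    "X1-Y1": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "X2-Y2": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "X3-Y3": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "X4-Y4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for label, (xs, ys) in pairs.items():
    r, n = pearson(xs, ys), len(xs)
    print(f"{label}: r = {r:.3f}, N = {n}, p = {two_sided_p(r, n):.4f}")
```

All four tests reject H₀ at the 5% level with practically the same r and p, so the numerical report alone cannot tell the four datasets apart.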
5️⃣ Graphical Representation of the Data
- For each subset of the dataset (X1 and Y1, X2 and Y2, etc.):
- Create a scatterplot.
- Reflect on the relationship shown in the scatterplot:
- Is it linear?
- Are there any outliers?
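jamovi's scatterplot is the intended tool for this step. If you only have a Python prompt at hand, a rough text-mode scatterplot is enough to see the four shapes. The sketch below uses a hypothetical helper, `ascii_scatter`, and draws every panel on the same axis limits so the panels are comparable; each `*` marks one or more points.

```python
# Published Anscombe (1973) values; panel labels are illustrative.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "X1 vs Y1": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "X2 vs Y2": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "X3 vs Y3": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "X4 vs Y4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                 [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Shared axis limits taken from all four datasets together.
all_x = [x for xs, _ in quartet.values() for x in xs]
all_y = [y for _, ys in quartet.values() for y in ys]
X_MIN, X_MAX = min(all_x), max(all_x)
Y_MIN, Y_MAX = min(all_y), max(all_y)

def ascii_scatter(xs, ys, width=46, height=12):
    """Return a text scatterplot as a string; larger Y values are at the top."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - X_MIN) / (X_MAX - X_MIN) * (width - 1))
        row = round((y - Y_MIN) / (Y_MAX - Y_MIN) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so Y increases upward
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

for label, (xs, ys) in quartet.items():
    print(label)
    print(ascii_scatter(xs, ys))
    print()
```

Even at this resolution, the panels should show a roughly linear cloud, a curve, a line with a single point far above it, and a vertical stack with a single point far to the right.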