This file contains a famous example of how the summary statistics of a dataset can be misleading, known as Anscombe's quartet: Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician 27 (1): 17-21.
Four pairs of vectors with nearly identical means (identical for x, equal to two decimal places for y):
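```{r}
# Attach the built-in dataset so its columns (x1..x4, y1..y4) can be used by name
attach(anscombe)
# Means of the x vectors
mean(x1)
mean(x2)
mean(x3)
mean(x4)
# Means of the y vectors
mean(y1)
mean(y2)
mean(y3)
mean(y4)
```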
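The same comparison can be made for every column at once, without attaching the data frame. The chunk below is a minimal sketch: the name `stats` is introduced here for illustration, and the variance column is an addition to the means computed above.

```{r}
# Column-wise means and variances of the built-in anscombe data frame
stats <- data.frame(
  mean = sapply(anscombe, mean),
  variance = sapply(anscombe, var)
)
# Rounded to two decimals, the x columns and the y columns each look alike
round(stats, 2)
```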
...that seem to be related in the same way, judging only from their correlations and fitted regression lines:
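```{r}
# Pearson correlation of each pair
cor(x1, y1)
cor(x2, y2)
cor(x3, y3)
cor(x4, y4)
# Least-squares line for each pair; compare the intercepts and slopes
lm(y1 ~ x1)
lm(y2 ~ x2)
lm(y3 ~ x3)
lm(y4 ~ x4)
```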
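Reading eight separate printouts makes the agreement hard to see, so it may help to collect the numbers in one table. The following is a sketch rather than part of the original analysis: the name `fits` and the added R-squared column are assumptions made for illustration.

```{r}
# For each of the four pairs, collect the correlation and the fitted coefficients
fits <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(cor = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}))
rownames(fits) <- paste("pair", 1:4)
# One row per pair; the rows are nearly indistinguishable
round(fits, 2)
```

Each pair should show a correlation of roughly 0.82, an intercept of roughly 3 and a slope of roughly 0.5.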
Plotting the four pairs shows how different they actually are:
```{r, fig.height=8, fig.width=8}
library(tidyverse)
anscombe %>%
  # Reshape to long format: "x1" splits into the column name "x" and source "1"
  pivot_longer(everything(),
               names_to = c(".value", "source"),
               names_pattern = "(.)(.)") %>%
  # One scatter plot per pair
  ggplot(aes(x = x, y = y)) +
  facet_wrap(~ source) +
  geom_point() +
  theme_bw()
```
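To connect the plots back to the regression output, the fitted line can be drawn over each pair. This is a sketch in base graphics, offered as an alternative to the ggplot2 chunk; the axis limits and the red line colour are arbitrary choices.

```{r, fig.height=8, fig.width=8}
# Arrange the four panels in a 2 x 2 grid, remembering the old settings
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  # Shared axis limits make the panels directly comparable
  plot(x, y, main = paste("Pair", i), xlim = c(3, 19), ylim = c(3, 13))
  # Overlay the least-squares line for this pair
  abline(lm(y ~ x), col = "red")
}
# Restore the previous graphics settings
par(op)
```

The line is almost the same in every panel, while the points around it follow clearly different patterns.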