World Happiness Analysis

Introduction

This is an analysis of the World Happiness Report from 2015-2017, looking at worldwide and region-wise trends in happiness score as well as patterns in the importance of the six factors of happiness in determining overall happiness in each country. The data comes from the Gallup World Poll.

The three datasets used in this analysis are available on Kaggle.

Note: There are some gratuitous remarks made about the data in this report by my friend, Adrian. I’ve indicated these remarks using square brackets [].

Data Import and First Look

Library imports:

library(ggplot2)
library(dplyr)
library(RColorBrewer)
library(tidyr)
library(knitr)
library(readr)

Data import: There are three separate datasets - 2015, 2016, and 2017.

df15 <- read_csv("2015.csv")
df16 <- read_csv("2016.csv")
df17 <- read_csv("2017.csv")

2015 includes 158 countries. 2016 includes 157 countries. 2017 includes 155 countries.

We’re interested in the Country and Region columns of each dataset in order to group overall happiness scores by country and world region. (Note: 2017 does not have a region column, so we create one by joining the region data from the 2016 dataset by country name and filling in any missing cells manually.) Beyond overall happiness score, country, and region, we’re also interested in the columns for Economy (GDP per capita), Family, Health (Life Expectancy), Freedom, Trust (Absence of Government Corruption), and Generosity. These are the six factors of happiness included in the survey. The values in these columns indicate how much each of the factors contributed to the overall happiness score for each country, i.e. how important they were.

Here’s a peek at the relevant columns in the 2015 dataset:

df15 <- df15 %>% select(Country, Region, `Happiness Rank`, `Happiness Score`, `Economy (GDP per Capita)`, Family, `Health (Life Expectancy)`, Freedom, `Trust (Government Corruption)`, Generosity)
kable(head(df15))

Country	Region	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity
Switzerland	Western Europe	1	7.587	1.39651	1.34951	0.94143	0.66557	0.41978	0.29678
Iceland	Western Europe	2	7.561	1.30232	1.40223	0.94784	0.62877	0.14145	0.43630
Denmark	Western Europe	3	7.527	1.32548	1.36058	0.87464	0.64938	0.48357	0.34139
Norway	Western Europe	4	7.522	1.45900	1.33095	0.88521	0.66973	0.36503	0.34699
Canada	North America	5	7.427	1.32629	1.32261	0.90563	0.63297	0.32957	0.45811
Finland	Western Europe	6	7.406	1.29025	1.31826	0.88911	0.64169	0.41372	0.23351

And 2016:

df16 <- df16 %>% select(Country, Region, `Happiness Rank`, `Happiness Score`, `Economy (GDP per Capita)`, Family, `Health (Life Expectancy)`, Freedom, `Trust (Government Corruption)`, Generosity)
kable(head(df16))

Country	Region	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity
Denmark	Western Europe	1	7.526	1.44178	1.16374	0.79504	0.57941	0.44453	0.36171
Switzerland	Western Europe	2	7.509	1.52733	1.14524	0.86303	0.58557	0.41203	0.28083
Iceland	Western Europe	3	7.501	1.42666	1.18326	0.86733	0.56624	0.14975	0.47678
Norway	Western Europe	4	7.498	1.57744	1.12690	0.79579	0.59609	0.35776	0.37895
Finland	Western Europe	5	7.413	1.40598	1.13464	0.81091	0.57104	0.41004	0.25492
Canada	North America	6	7.404	1.44015	1.09610	0.82760	0.57370	0.31329	0.44834

And 2017:

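# Borrow the Region column from the 2016 data, matching on country name
df17 <- left_join(df17, select(df16, "Country", "Region"), by = "Country")
# Fill in Region by hand for the rows left empty by the join
# (these row indices are specific to this version of the 2017 file)
df17$Region[71] <- "Eastern Asia"
df17$Region[33] <- "Eastern Asia"
df17$Region[155] <- "Sub-Saharan Africa"
df17$Region[113] <- "Sub-Saharan Africa"
df17$Region[139] <- "Sub-Saharan Africa"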
df17 <- df17 %>% select(Country, Region, Happiness.Rank, Happiness.Score, Economy..GDP.per.Capita., Family, Health..Life.Expectancy., Freedom, Trust..Government.Corruption., Generosity)
kable(head(df17))

Country	Region	Happiness.Rank	Happiness.Score	Economy..GDP.per.Capita.	Family	Health..Life.Expectancy.	Freedom	Trust..Government.Corruption.	Generosity
Norway	Western Europe	1	7.537	1.616463	1.533524	0.7966665	0.6354226	0.3159638	0.3620122
Denmark	Western Europe	2	7.522	1.482383	1.551122	0.7925655	0.6260067	0.4007701	0.3552805
Iceland	Western Europe	3	7.504	1.480633	1.610574	0.8335521	0.6271626	0.1535266	0.4755402
Switzerland	Western Europe	4	7.494	1.564980	1.516912	0.8581313	0.6200706	0.3670073	0.2905493
Finland	Western Europe	5	7.469	1.443572	1.540247	0.8091577	0.6179509	0.3826115	0.2454828
Netherlands	Western Europe	6	7.377	1.503945	1.428939	0.8106961	0.5853845	0.2826618	0.4704898
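The row numbers used above to fill in the missing regions are tied to one particular ordering of the 2017 file. A more robust variant, sketched here on a toy data frame, lists the countries that came out of the join without a region and then fills them in by name. "Examplestan" and "Example Region" are made-up placeholders, not entries in the real data, and the object names are my own.

```r
library(dplyr)

# Toy stand-ins for the 2016 region lookup and the 2017 scores
regions_2016 <- tibble(Country = c("Norway", "Denmark"),
                       Region  = c("Western Europe", "Western Europe"))
scores_2017  <- tibble(Country = c("Norway", "Denmark", "Examplestan"))

joined <- left_join(scores_2017, regions_2016, by = "Country")

# Which countries did not get a region from the join?
missing_region <- joined %>% filter(is.na(Region)) %>% pull(Country)

# Fill the gaps by country name instead of by row number
# (hypothetical lookup; the real one would list the actual missing countries)
manual_regions <- c(Examplestan = "Example Region")
filled <- joined %>%
  mutate(Region = coalesce(Region, unname(manual_regions[Country])))
```

Filling by name keeps working if rows are reordered or the file is updated, at the cost of having to spell each country exactly as it appears in the data.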

It looks like the rankings don’t shift around very much for the world’s happiest countries. (Yay for Northern Europe! [“Those homogeneous bastards!” - Adrian])

Data Analysis

Now for the fun stuff.

Here’s a world map color-coded by each country’s overall happiness score in 2015.

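# map_data() needs the 'maps' package to be installed
worldmap <- map_data("world")
# Rename the map's "region" column so it can be joined to the happiness data
names(worldmap)[names(worldmap)=="region"] <- "Country"
# The map spells this country differently from the happiness data
worldmap$Country[worldmap$Country == "USA"] <- "United States"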
happy_world <- df15 %>%
  full_join(worldmap, by = "Country")

map_theme <- theme(
    axis.title.x = element_blank(),
    axis.text.x  = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank(),
    axis.text.y  = element_blank(),
    axis.ticks.y = element_blank(),
    panel.background = element_rect(fill = "white"))

ggplot(data = happy_world, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = `Happiness Score`))  +
  scale_fill_continuous(low="thistle2", high="darkred", na.value="snow2") +
  coord_quickmap() +
  labs(title = "Happiness Around the World - 2015") +
  map_theme

The darker the red, the higher the happiness score. Regions in gray either do not have happiness data or are spelled differently in the map data and the happiness data, so the join could not match them. The happiest regions of the world appear to be in Europe, North and South America, Australia and New Zealand. Africa appears to contain the lowest overall happiness scores.
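Since the map is joined to the happiness data by country name, any spelling difference leaves a country unfilled. A quick way to list such mismatches is an anti-join, sketched here on toy name lists. The "USA" spelling comes from the code above; "UK" is my assumption about the map data and should be checked against the real `worldmap`.

```r
library(dplyr)

# Country names as spelled in the happiness data (toy subset)
happiness <- tibble(Country = c("Norway", "United States", "United Kingdom"))

# Country names as spelled in the map data (toy subset; "UK" is an assumption)
map_names <- tibble(Country = c("Norway", "USA", "UK"))

# Rows of the happiness data with no matching name in the map data
unmatched <- anti_join(happiness, map_names, by = "Country")
```

On the real data, the same call with `df15` and `distinct(worldmap, Country)` would show which countries still need a rename before the join.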

[“I’ll bet Americans aren’t actually that happy. They lie.” - Adrian’s unfounded hypothesis]

Let’s have a look at the average happiness of each world region for 2015.

dfavg <- df15 %>%
  select(Region, `Happiness Score`) %>%
  group_by(Region) %>%
  summarize(Average = mean(`Happiness Score`)) %>%
  arrange(desc(Average))
kable(dfavg)

Region	Average
Australia and New Zealand	7.285000
North America	7.273000
Western Europe	6.689619
Latin America and Caribbean	6.144682
Eastern Asia	5.626167
Middle East and Northern Africa	5.406900
Central and Eastern Europe	5.332931
Southeastern Asia	5.317444
Southern Asia	4.580857
Sub-Saharan Africa	4.202800

We can visualize the spread of scores within each region with a boxplot (note that the line inside each box marks the median, not the mean):

ggplot(data = df15, aes(x = Region, y = `Happiness Score`)) +
  geom_boxplot(aes(color = Region, fill = Region), alpha = 0.5) +
  geom_point(aes(color = Region), position = position_jitter(width = .1)) +
  labs(title = "Happiness by World Region - 2015", 
       x = "Region", 
       y = "Happiness Score") +
  theme_minimal() +
  theme(plot.title = element_text(size = rel(2.5)),
        axis.title = element_text(size = rel(1.5)),
        axis.text.x = element_blank())

The table of averages and the boxplot both confirm the intuitions we had about the world map for 2015. Australia & New Zealand, North America, and Western Europe have the highest average happiness scores. Sub-Saharan Africa has the lowest average, followed by Southern Asia.
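A boxplot's center line is the median rather than the mean, so it can help to put both side by side when comparing the table with the plot. Here is a small sketch using only the six 2015 rows printed earlier; the data frame name `top6` is my own.

```r
library(dplyr)

# The six 2015 rows shown in the first table above
top6 <- tibble(
  Region = c(rep("Western Europe", 4), "North America", "Western Europe"),
  `Happiness Score` = c(7.587, 7.561, 7.527, 7.522, 7.427, 7.406)
)

# Mean and median per region, highest mean first
region_stats <- top6 %>%
  group_by(Region) %>%
  summarize(Average = mean(`Happiness Score`),
            Median  = median(`Happiness Score`)) %>%
  arrange(desc(Average))
```

For regions with skewed scores or outliers the two columns can differ noticeably, which is worth keeping in mind when reading the boxes.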

We can make similar boxplots for 2016 and 2017.

ggplot(data = df16, aes(x = Region, y = `Happiness Score`)) +
  geom_boxplot(aes(color = Region, fill = Region), alpha = 0.5) +
  geom_point(aes(color = Region), position = position_jitter(width = .1)) +
  labs(title = "Happiness by World Region - 2016", 
       x = "Region", 
       y = "Happiness Score") +
  theme_minimal() +
  theme(plot.title = element_text(size = rel(2.5)),
        axis.title = element_text(size = rel(1.5)),
        axis.text.x = element_blank())

ggplot(data = df17, aes(x = Region, y = Happiness.Score)) +
  geom_boxplot(aes(color = Region, fill = Region), alpha = 0.5) +
  geom_point(aes(color = Region), position = position_jitter(width = .1)) +
  labs(title = "Happiness by World Region - 2017", 
       x = "Region", 
       y = "Happiness Score") +
  theme_minimal() +
  theme(plot.title = element_text(size = rel(2.5)),
        axis.title = element_text(size = rel(1.5)),
        axis.text.x = element_blank())

Just from these boxplots, we can tell that the typical happiness scores across world regions don’t change very much from 2015 to 2017.
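The three boxplot blocks above differ only in the data frame, the name of the score column, and the year in the title. One possible way to avoid the repetition is a small helper function. The function name `plot_region_boxplot` and its arguments are my own invention, and the toy data reuses the six 2015 rows printed earlier.

```r
library(ggplot2)

# Hypothetical helper: one boxplot per region for a given year's data
plot_region_boxplot <- function(df, score_col, year) {
  ggplot(df, aes(x = Region, y = .data[[score_col]])) +
    geom_boxplot(aes(color = Region, fill = Region), alpha = 0.5) +
    geom_point(aes(color = Region), position = position_jitter(width = .1)) +
    labs(title = paste("Happiness by World Region -", year),
         x = "Region",
         y = "Happiness Score") +
    theme_minimal() +
    theme(plot.title = element_text(size = rel(2.5)),
          axis.title = element_text(size = rel(1.5)),
          axis.text.x = element_blank())
}

# Toy data: the six 2015 rows shown earlier
toy <- data.frame(
  Region = c(rep("Western Europe", 4), "North America", "Western Europe"),
  `Happiness Score` = c(7.587, 7.561, 7.527, 7.522, 7.427, 7.406),
  check.names = FALSE
)

p <- plot_region_boxplot(toy, "Happiness Score", 2015)
```

With the real data this would be called as `plot_region_boxplot(df16, "Happiness Score", 2016)` and `plot_region_boxplot(df17, "Happiness.Score", 2017)`.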

Let’s look at how happiness scores changed for each country over time.

df15$year <- "2015"
df16$year <- "2016"
df17$year <- "2017"

names(df15)[names(df15)=="Happiness Score"] <- "score"
names(df16)[names(df16)=="Happiness Score"] <- "score"
names(df17)[names(df17)=="Happiness.Score"] <- "score"

dfall <- rbind(select(df15,"Country", "Region", "score", "year"),
               select(df16, "Country", "Region", "score", "year"),
               select(df17, "Country", "Region", "score", "year"))

ggplot(data = dfall) +
  geom_line(mapping = aes(x = year, y = score, group = Country, 
                          color = Region),
            alpha = 0.5, show.legend = FALSE) +
  geom_point(aes(x = year, y = score, color = Region), 
             position = position_jitter(width = .1),
             alpha = 0.5,
             show.legend = FALSE) +
  labs(title = "Worldwide Happiness Scores 2015-17", 
       x = "Year", 
       y = "Happiness Score") +
  theme_minimal() +
  theme(plot.title = element_text(size = rel(2.5)),
        axis.title = element_text(size = rel(1.5)),
        strip.text.x = element_text(size = rel(1.5))) +
  facet_wrap(~ Region)

For the most part, the scores for each country do not change much from 2015 to 2017. There are very few countries whose scores dropped noticeably, and fewer still whose scores rose noticeably. The countries that did undergo a visible change were primarily in Sub-Saharan Africa or Latin America & the Caribbean. One plausible explanation is that countries in these regions are more subject to sudden changes in economy and political stability, though this analysis does not test that.
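Reading changes off the line plot is fairly rough. As a sketch of how the largest movers could be listed explicitly, the long table can be reshaped to one column per year and the difference computed. The toy table below holds only the 2015 and 2017 scores printed in the tables earlier; on the real data, `dfall` would take its place.

```r
library(dplyr)
library(tidyr)

# Toy long-format table in the same shape as dfall (Country, year, score)
scores <- tibble(
  Country = rep(c("Switzerland", "Iceland", "Denmark", "Norway", "Finland"),
                times = 2),
  year    = rep(c("2015", "2017"), each = 5),
  score   = c(7.587, 7.561, 7.527, 7.522, 7.406,   # 2015
              7.494, 7.504, 7.522, 7.537, 7.469)   # 2017
)

# One row per country, one column per year, then the 2015 -> 2017 change
changes <- scores %>%
  pivot_wider(names_from = year, values_from = score) %>%
  mutate(change = `2017` - `2015`) %>%
  arrange(desc(abs(change)))   # largest movers first, in either direction
```

Countries missing from one of the years would get `NA` for `change`, so they would need to be filtered or handled separately.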

To explore the factors that could be contributing to the score differences between world regions, let’s have a look at the six factors of happiness for each of these regions. We’ll use the 2015 data.

# 2015
names(df15)[names(df15)=="Economy (GDP per Capita)"] <- "Economy"
names(df15)[names(df15)=="Health (Life Expectancy)"] <- "Health"
names(df15)[names(df15)=="Trust (Government Corruption)"] <- "Trust"
pairs(~ Economy+Family+Health+Freedom+Trust+Generosity, data = df15, 
      main="Importances of the Six Factors of Happiness")

This pairs plot compares the importance of each of the six factors of happiness to each of the others. If there is a strong positive linear correlation between two factors, we can say that if one factor is important in evaluating a country’s overall happiness, it is likely that the other factor is important as well. Based on the plots, it seems that the importances of Economy & Health are strongly correlated, as well as Economy & Family.

I was hoping to be able to draw regression lines over each of the pair plots, but couldn’t figure out how. I’ve settled with eyeballing the correlations.
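One possible way to get regression lines is the `panel` argument of base R's `pairs()`, which accepts a function that draws each off-diagonal panel. The sketch below uses three of the factors from the six 2015 rows printed earlier; the function name `panel_with_fit` is my own. A correlation matrix from `cor()` gives numbers to go with the eyeballing.

```r
# Toy data: Economy, Family and Health for the six 2015 rows shown earlier
factors <- data.frame(
  Economy = c(1.39651, 1.30232, 1.32548, 1.45900, 1.32629, 1.29025),
  Family  = c(1.34951, 1.40223, 1.36058, 1.33095, 1.32261, 1.31826),
  Health  = c(0.94143, 0.94784, 0.87464, 0.88521, 0.90563, 0.88911)
)

# Custom panel: scatter points plus a least-squares line
panel_with_fit <- function(x, y, ...) {
  points(x, y, ...)
  abline(lm(y ~ x), col = "red")
}

pairs(~ Economy + Family + Health, data = factors,
      panel = panel_with_fit,
      main = "Pairs plot with regression lines")

# Pearson correlations between the same columns
cor_matrix <- cor(factors)
```

On the full data the same panel function could be passed to the original `pairs()` call with all six factors.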

Let’s take a closer look at the top 10 happiest countries in 2015 and how much each of the six factors contributed toward their overall happiness scores. For this, a stacked bar plot would be a useful visualization.

dfwide <- df15 %>%
  head(10)

dflong <- gather(dfwide, Factor, `Importance of Factor`, Economy:Generosity, factor_key=TRUE)

ggplot(data = dflong) +
  geom_bar(stat = "identity", 
           aes(x = Country, y = `Importance of Factor`, fill = Factor)) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "top") +
  labs(title = "The Six Factors of Happiness in the Ten Happiest Countries") +
  theme(plot.title = element_text(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)))

In general, Economy and Family seem to be the two most important factors of happiness in these countries. Trust (absence of corruption) and Generosity are the least important.

These six factors don’t add up to the overall happiness score for each country in the bar plot because they should just be thought of as weights and the ‘Dystopia Residual’ isn’t taken into account. More information on what the ‘Dystopia Residual’ is can be found in the data overview on Kaggle.
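To see how large the gap is, the six factor columns can be summed and subtracted from the score. The sketch below does this for two of the 2015 rows printed earlier. Treating the whole gap as the residual is an assumption on my part, and the column name `implied_residual` is my own.

```r
library(dplyr)

# Two 2015 rows copied from the first table above
top2 <- tibble(
  Country    = c("Switzerland", "Iceland"),
  score      = c(7.587, 7.561),
  Economy    = c(1.39651, 1.30232),
  Family     = c(1.34951, 1.40223),
  Health     = c(0.94143, 0.94784),
  Freedom    = c(0.66557, 0.62877),
  Trust      = c(0.41978, 0.14145),
  Generosity = c(0.29678, 0.43630)
)

# Sum of the six factors, and what is left over from the overall score
gaps <- top2 %>%
  mutate(factor_sum = Economy + Family + Health + Freedom + Trust + Generosity,
         implied_residual = score - factor_sum)
```

The leftover values could then be compared against the residual column in the original CSV files, which was dropped by the `select()` calls earlier in this report.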

Conclusion

This analysis illustrated that the world’s happiest countries are primarily in Western Europe (especially Northern Europe), North America, and Australia & New Zealand. The regional averages did not change very much from 2015 to 2017. It also suggested that Economy (GDP per capita) is the most important factor in evaluating a country’s happiness, at least among the ten happiest countries of 2015. Unsurprisingly, the happiest countries and world regions generally tended to be ones with strong and stable economies. The importance of Economy also appears to be strongly positively correlated with those of Family and Health. This is expected, since more economic stability and higher GDP per capita generally encourage stable and comfortable family life and increase the availability of proper medical resources and healthcare. These factors then weigh more when determining overall happiness.

I would also hypothesize that these three factors–Economy, Family, and Health–tend to be particularly important because they directly affect individuals living in these countries. Everyone is affected by the state of the economy, especially since it holds direct sway over the availability and security of jobs and the flow of money. Families are the nucleus of home life for most individuals, and Health also affects people on the level of individuals. Consequently, these are very concrete factors and therefore have more influence on the happiness score gauged by individuals.

Sub-Saharan Africa and Southern Asia could definitely use a lift, but overall, the world doesn’t seem to be doing too badly. Here’s to the future!

Acknowledgements

Many thanks to:

Kenneth Tay - Instructor
Adrian Liu - Moral Support and Witticisms
FloMo Dining Hall - Carbohydrates and Proteins