Foreword
In the most famous TED talks on data and statistics, the data sings, trends come to life, and the big picture snaps into sharp focus.
This can be reproduced in R with the googleVis package. We transform development statistics into moving bubbles and flowing curves that make global trends clear, intuitive and even playful.
First, load data on a country’s evolution of life expectancy, GDP and population over the past years into R.
We can get this data by using Qlik DataMarket.
# Load rdatamarket and initialize client
library(rdatamarket)
dminit(NULL)
# Pull in life expectancy and population data
life_expectancy <- dmlist("15r2!hrp")
population <- dmlist("1cfl!r3d")
# Inspect life_expectancy and population with head() or tail()
head(life_expectancy)
## Country Year Value
## 1 Bahrain 1960 52.09244
## 2 Bahrain 1961 53.46056
## 3 Bahrain 1962 54.81929
## 4 Bahrain 1963 56.14710
## 5 Bahrain 1964 57.42588
## 6 Bahrain 1965 58.64007
tail(life_expectancy)
## Country Year Value
## 13493 Latin America & the Caribbean (IDA & IBRD countries) 2010 74.04205
## 13494 Latin America & the Caribbean (IDA & IBRD countries) 2011 74.28092
## 13495 Latin America & the Caribbean (IDA & IBRD countries) 2012 74.51328
## 13496 Latin America & the Caribbean (IDA & IBRD countries) 2013 74.73790
## 13497 Latin America & the Caribbean (IDA & IBRD countries) 2014 74.95507
## 13498 Latin America & the Caribbean (IDA & IBRD countries) 2015 75.16584
head(population)
## Country Year Value
## 1 Bahrain 1800 64474
## 2 Bahrain 1820 64474
## 3 Bahrain 1870 64474
## 4 Bahrain 1913 81882
## 5 Bahrain 1950 114840
## 6 Bahrain 1951 117580
tail(population)
## Country Year Value
## 20449 British Virgin Islands 2004 22187
## 20450 British Virgin Islands 2005 22643
## 20451 British Virgin Islands 2006 23098
## 20452 British Virgin Islands 2007 23552
## 20453 British Virgin Islands 2008 24004
## 20454 British Virgin Islands 2030 32023
# Load in the yearly GDP data frame for each country as gdp
gdp <- dmlist("15c9!hd1")
# Inspect gdp with tail()
tail(gdp)
## Country Year Value
## 11499 Guam 2011 30862.11
## 11500 Guam 2012 32499.23
## 11501 Guam 2013 33278.25
## 11502 Guam 2014 34361.08
## 11503 Guam 2015 35210.79
## 11504 Guam 2016 35562.57
Preparing the data
The column names are not descriptive: all three data frames use the string Value for their measurement, whether it is the GDP, the life expectancy, or the population. It is better to give each of these columns a descriptive, unique name. (Tip: use names() to see the column names of a dataset.)
Our data is only complete until 2008.
These issues should be fixed before you start creating your graph.
In addition, if we want to map all three development statistics into one interactive graph (and you should because it is extremely cool), we have to merge the three data frames.
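As a small illustration of the renaming step, the sketch below uses two toy data frames with invented values (not the DataMarket data) that share a generic Value column. names() shows the clash, and renaming the third column resolves it.

```r
# Toy data frames with invented values; only the column layout mirrors the real data
toy_gdp <- data.frame(Country = c("A", "B"), Year = c(2000, 2000), Value = c(1000, 2500))
toy_pop <- data.frame(Country = c("A", "B"), Year = c(2000, 2000), Value = c(50000, 75000))

# Both data frames use the same generic name for their third column
names(toy_gdp)
names(toy_pop)

# Give each value column a descriptive, unique name
names(toy_gdp)[3] <- "GDP"
names(toy_pop)[3] <- "Population"
names(toy_gdp)
names(toy_pop)
```

Renaming by position assumes the value column is always the third one, which holds for the data frames shown above.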
# Load in the plyr package
library('plyr')
# Rename the Value for each dataset
names(gdp)[3] <- 'GDP'
names(population)[3] <- 'Population'
names(life_expectancy)[3] <- 'LifeExpectancy'
# Use plyr to join your three data frames into one: development
gdp_life_exp <- join(gdp, life_expectancy)
development <- join(gdp_life_exp, population)
head(development, 3)
## Country Year GDP LifeExpectancy Population
## 1 Bahrain 1980 8537.929 69.77720 347568
## 2 Bahrain 1981 9269.270 70.17266 363427
## 3 Bahrain 1982 9446.158 70.53151 377967
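When no key is given, join() from plyr matches rows on the columns the two data frames have in common (here Country and Year) and, by default, keeps every row of the first data frame. A rough base R counterpart, shown on toy data with invented values, is merge() with all.x = TRUE. This is a sketch of the same idea, not the code used above:

```r
# Toy data with invented values
toy_gdp <- data.frame(Country = c("A", "A", "B"),
                      Year = c(2000, 2001, 2000),
                      GDP = c(1000, 1100, 2500))
toy_life <- data.frame(Country = c("A", "B"),
                       Year = c(2000, 2000),
                       LifeExpectancy = c(70.1, 65.3))

# Left join on the shared key columns: every row of toy_gdp is kept,
# and rows without a match get NA for LifeExpectancy
toy_joined <- merge(toy_gdp, toy_life, by = c("Country", "Year"), all.x = TRUE)
toy_joined
```

Unlike join(), merge() sorts the result by the key columns, so the row order can differ even though the matched values are the same.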
Final data preparation
Now that we have merged the data, it makes sense to trim the data set in two ways: keep only the years up to and including 2008, and keep only a selection of countries.
# Subset development with Year on or before 2008
development_complete <- subset(development, Year <= 2008)
# Print out tail of development_complete
tail(development_complete)
## Country Year GDP LifeExpectancy Population
## 11491 Guam 2003 22572.50 76.12512 163593
## 11492 Guam 2004 24396.11 76.43898 166090
## 11493 Guam 2005 26495.88 76.74034 168564
## 11494 Guam 2006 26555.65 77.03246 171019
## 11495 Guam 2007 27540.84 77.32105 173456
## 11496 Guam 2008 29056.50 77.61024 175877
nrow(development_complete)
## [1] 9537
# Subset development_complete: keep only countries in selection
# (selection is assumed to be a character vector of country names defined beforehand)
development_motion <- subset(development_complete, Country %in% selection)
nrow(development_motion)
## [1] 1793
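The vector selection is not defined in the code shown here, so it has to exist in the workspace before the second subset runs. A minimal sketch of both trimming steps, using a hypothetical selection (toy_selection) and toy data with invented values:

```r
# Toy data with invented values
toy_dev <- data.frame(
  Country = c("A", "A", "B", "B", "C"),
  Year = c(2008, 2009, 2007, 2008, 2008),
  GDP = c(1000, 1100, 2500, 2600, 400)
)

# Hypothetical selection; the real vector would hold actual country names
toy_selection <- c("A", "C")

# Step 1: keep only the years up to and including 2008
toy_complete <- subset(toy_dev, Year <= 2008)

# Step 2: keep only the countries listed in the selection
toy_motion <- subset(toy_complete, Country %in% toy_selection)

nrow(toy_complete)
nrow(toy_motion)
```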
The googleVis package provides an interface between R and the Google Chart Tools.
library(googleVis)
# Create the interactive motion chart
motion_graph <- gvisMotionChart(development_motion,
idvar = 'Country',
timevar = 'Year')
# Plot motion_graph
plot(motion_graph)
For a full view of the plot, check this page.
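A motion chart needs exactly one row per combination of idvar and timevar, and as far as we know gvisMotionChart() refuses data that violates this. A quick base R check before building the chart could look like the sketch below, shown on toy data with invented values:

```r
# Toy data with invented values; "B" appears twice for the year 2000
toy_dev <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(2000, 2001, 2000, 2000),
  GDP = c(1000, 1100, 2500, 2550)
)

# Flag rows whose (Country, Year) pair already occurred earlier in the data
dup_rows <- duplicated(toy_dev[, c("Country", "Year")])

# Show the offending rows and whether any exist at all
toy_dev[dup_rows, ]
any(dup_rows)
```

On the real data, the same check would be applied to the Country and Year columns of development_motion.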
When working with a simple dataset, a single color and size for each observation is sufficient. But what if you would like to know more?
To make the motion chart even more understandable we can play with the size and color of each bubble.
# Update the interactive motion chart
motion_graph <- gvisMotionChart(development_motion,
idvar = 'Country',
timevar = 'Year',
xvar = 'GDP',
yvar = 'LifeExpectancy',
sizevar = 'Population')
# Plot motion_graph
plot(motion_graph)
However, it looks like the relationship is nonlinear. Transform the data to make the plot easier to read.
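To see why a log transform helps, consider invented GDP values that span several orders of magnitude. On the raw scale the small values are squeezed together, while on the log scale equal ratios become equal distances:

```r
# Invented GDP values, each ten times the previous one
toy_gdp <- c(500, 5000, 50000)

# Gaps on the raw scale grow tenfold from one step to the next
raw_gaps <- diff(toy_gdp)
raw_gaps

# Gaps on the log scale are identical, each equal to log(10)
log_gaps <- diff(log(toy_gdp))
log_gaps
```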
# Create a new column that corresponds to the log of the GDP column
development_motion$logGDP <- log(development_motion$GDP)
# Create the interactive motion chart with R and `gvisMotionChart()`
motion_graph <- gvisMotionChart(development_motion,
idvar = 'Country',
timevar = 'Year',
xvar = 'logGDP',
yvar = 'LifeExpectancy',
sizevar = 'Population')
# Plot your new motion graph with the help of `plot()`
plot(motion_graph)
Let us see an alternative view:
The goal of these charts is not only to impress the audience, but also to visualize trends and to provide a clearer view of the data and the corresponding insights.