Foreword

  • Output options: the ‘tango’ syntax and the ‘readable’ theme.
  • Snippets and results.
  • Source: ‘Having Fun with googleVis’ from DataCamp.

Discovering googleVis

Hans Rosling and Gapminder

Hans Rosling gave some of the most famous TED talks on data and statistics.

In his presentations, the data sings, trends come to life, and the big picture snaps into sharp focus.

The same effect can be reproduced in R by transforming development statistics into moving bubbles and flowing curves that make global trends clear, intuitive and even playful.

Loading in your data

First, load into R the yearly life expectancy, GDP and population figures for each country.

We can get this data by using Qlik DataMarket.

# Load rdatamarket and initialize client
library(rdatamarket)
dminit(NULL)
# Pull in life expectancy and population data
life_expectancy <- dmlist("15r2!hrp")
population <- dmlist("1cfl!r3d")

# Inspect life_expectancy and population with head() or tail()
head(life_expectancy)
##   Country Year    Value
## 1 Bahrain 1960 52.09244
## 2 Bahrain 1961 53.46056
## 3 Bahrain 1962 54.81929
## 4 Bahrain 1963 56.14710
## 5 Bahrain 1964 57.42588
## 6 Bahrain 1965 58.64007
tail(life_expectancy)
##                                                    Country Year    Value
## 13493 Latin America & the Caribbean (IDA & IBRD countries) 2010 74.04205
## 13494 Latin America & the Caribbean (IDA & IBRD countries) 2011 74.28092
## 13495 Latin America & the Caribbean (IDA & IBRD countries) 2012 74.51328
## 13496 Latin America & the Caribbean (IDA & IBRD countries) 2013 74.73790
## 13497 Latin America & the Caribbean (IDA & IBRD countries) 2014 74.95507
## 13498 Latin America & the Caribbean (IDA & IBRD countries) 2015 75.16584
head(population)
##   Country Year  Value
## 1 Bahrain 1800  64474
## 2 Bahrain 1820  64474
## 3 Bahrain 1870  64474
## 4 Bahrain 1913  81882
## 5 Bahrain 1950 114840
## 6 Bahrain 1951 117580
tail(population)
##                      Country Year Value
## 20449 British Virgin Islands 2004 22187
## 20450 British Virgin Islands 2005 22643
## 20451 British Virgin Islands 2006 23098
## 20452 British Virgin Islands 2007 23552
## 20453 British Virgin Islands 2008 24004
## 20454 British Virgin Islands 2030 32023
# Load in the yearly GDP data frame for each country as gdp
gdp <- dmlist("15c9!hd1")

# Inspect gdp with tail()
tail(gdp)
##       Country Year    Value
## 11499    Guam 2011 30862.11
## 11500    Guam 2012 32499.23
## 11501    Guam 2013 33278.25
## 11502    Guam 2014 34361.08
## 11503    Guam 2015 35210.79
## 11504    Guam 2016 35562.57

Preparing the data

Not all columns are named properly: the name Value is used for the GDP value, the life expectancy value, and the population value alike. It would be better to make these names descriptive and unique. (Tip: use names() to see the column names of a dataset.)

The data is only complete up to 2008.

These issues should be fixed before you start creating your graph.
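
Both issues are easy to confirm before fixing them. The following is a minimal sketch on invented toy data frames (the `toy_` names and all values are illustrative assumptions, not the DataMarket data): names() exposes the shared Value column, and intersecting the Year columns shows the last year that every series covers.

```r
# Toy stand-ins for two of the downloaded data frames (values are invented)
toy_gdp <- data.frame(Country = "A",
                      Year    = c(2007, 2008, 2009),
                      Value   = c(100, 110, 120))
toy_population <- data.frame(Country = "A",
                             Year    = c(2007, 2008, 2030),
                             Value   = c(1000, 1010, 1200))

# Both data frames use the same generic name for their third column
names(toy_gdp)
names(toy_population)

# Years that appear in every data frame; the largest one is the last
# year for which all series have an observation
common_years <- Reduce(intersect, list(toy_gdp$Year, toy_population$Year))
last_common_year <- max(common_years)
last_common_year
```

In this toy case the population series jumps from 2008 to a 2030 projection, so 2008 is the last year shared by both series.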

In addition, to map all three development statistics onto one interactive graph (and you should, because it is extremely cool), the three data frames have to be merged.

# Load in the plyr package
library('plyr')

# Rename the Value for each dataset
names(gdp)[3] <- 'GDP'
names(population)[3] <- 'Population'
names(life_expectancy)[3] <- 'LifeExpectancy'

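# Use plyr to join your three data frames into one: development
# (the join keys, Country and Year, are stated explicitly)
gdp_life_exp <- join(gdp, life_expectancy, by = c('Country', 'Year'))
development <- join(gdp_life_exp, population, by = c('Country', 'Year'))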

head(development, 3)
##   Country Year      GDP LifeExpectancy Population
## 1 Bahrain 1980 8537.929       69.77720     347568
## 2 Bahrain 1981 9269.270       70.17266     363427
## 3 Bahrain 1982 9446.158       70.53151     377967
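
To see what the two joins do, here is a small sketch on invented toy data. It uses base R's merge() with all.x = TRUE rather than plyr's join(), a substitution made only to keep the example free of package dependencies. Both perform a left join on the shared Country and Year columns, although the row order of the result may differ.

```r
# Toy data frames with already-renamed value columns (values are invented)
toy_gdp <- data.frame(Country = c("A", "A", "B"),
                      Year    = c(2000, 2001, 2000),
                      GDP     = c(100, 110, 50))
toy_life <- data.frame(Country        = c("A", "B"),
                       Year           = c(2000, 2000),
                       LifeExpectancy = c(70, 65))
toy_pop <- data.frame(Country    = c("A", "A", "B"),
                      Year       = c(2000, 2001, 2000),
                      Population = c(1000, 1010, 500))

# Left join: keep every GDP row and attach matching values where they exist
toy_dev <- merge(toy_gdp, toy_life, by = c("Country", "Year"), all.x = TRUE)
toy_dev <- merge(toy_dev, toy_pop, by = c("Country", "Year"), all.x = TRUE)

# Country A has no life expectancy for 2001, so that cell is NA
toy_dev
```

Rows without a match are kept and filled with NA, which is why incomplete years still need to be trimmed afterwards.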

Last data preps

Now that we have merged the data, it makes sense to trim the data set in two ways:

  • Take out data for years that have incomplete observations. In this case, the data is only complete up until 2008.
  • Trim down the data set to include fewer countries. The data frame development currently contains observations for 226 countries per year, which would be messy to plot on one graph.

# Subset development to years up to and including 2008
development_complete <- subset(development, Year <= 2008)

# Print out tail of development_complete
tail(development_complete)
##       Country Year      GDP LifeExpectancy Population
## 11491    Guam 2003 22572.50       76.12512     163593
## 11492    Guam 2004 24396.11       76.43898     166090
## 11493    Guam 2005 26495.88       76.74034     168564
## 11494    Guam 2006 26555.65       77.03246     171019
## 11495    Guam 2007 27540.84       77.32105     173456
## 11496    Guam 2008 29056.50       77.61024     175877
nrow(development_complete)
## [1] 9537
# Subset development_complete: keep only countries in selection
# (selection is a character vector of country names, defined beforehand and not shown here)
development_motion <- subset(development_complete, Country %in% selection)

nrow(development_motion)
## [1] 1793
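
The selection vector is not defined in the code shown here. As a sketch, assuming it is simply a character vector of country names, the two trimming steps behave as follows on invented toy data (the chosen countries and all values are hypothetical):

```r
# Toy merged data frame (values are invented)
toy_dev <- data.frame(
  Country = c("Guam", "Guam", "Bahrain", "Bahrain", "Chile"),
  Year    = c(2008, 2009, 2007, 2008, 2008),
  GDP     = c(300, 310, 200, 210, 100)
)

# Hypothetical selection: the countries to keep in the chart
toy_selection <- c("Guam", "Bahrain")

# Step 1: drop years after 2008
toy_complete <- subset(toy_dev, Year <= 2008)

# Step 2: keep only the selected countries
toy_motion <- subset(toy_complete, Country %in% toy_selection)

nrow(toy_complete)
nrow(toy_motion)
```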

The prelude

The googleVis package provides an interface between R and the Google Chart Tools.

library(googleVis)

# Create the interactive motion chart
motion_graph <- gvisMotionChart(development_motion,
                idvar = 'Country',
                timevar = 'Year')
                                  
# Plot motion_graph
plot(motion_graph)

For a full view of the plot, check this page.

The interlude

When visualizing a simple dataset, a single color and size for each observation is sufficient. But what if you would like to know more?

To make the motion chart easier to understand, we can play with the size and color of each bubble.

# Update the interactive motion chart
motion_graph <- gvisMotionChart(development_motion,
                idvar = 'Country',
                timevar = 'Year',
                xvar = 'GDP',
                yvar = 'LifeExpectancy',
                sizevar = 'Population')

# Plot motion_graph
plot(motion_graph)
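
Color can be mapped in the same way through the colorvar argument of gvisMotionChart(). The sketch below assumes the data has a categorical column to color by; the Region column and all values are invented for illustration, since development_motion as built above has no such column. The chart is only built when googleVis is installed.

```r
# Toy data with a hypothetical Region column (all values are invented)
toy_dev <- data.frame(
  Country        = c("A", "A", "B", "B"),
  Year           = c(2000, 2001, 2000, 2001),
  GDP            = c(100, 110, 50, 55),
  LifeExpectancy = c(70, 71, 65, 66),
  Population     = c(1000, 1010, 500, 505),
  Region         = c("North", "North", "South", "South")
)

toy_graph <- NULL
if (requireNamespace("googleVis", quietly = TRUE)) {
  # Same call as before, plus colorvar to color each bubble by Region
  toy_graph <- googleVis::gvisMotionChart(toy_dev,
                                          idvar    = "Country",
                                          timevar  = "Year",
                                          xvar     = "GDP",
                                          yvar     = "LifeExpectancy",
                                          sizevar  = "Population",
                                          colorvar = "Region")
}
# Calling plot(toy_graph) would then display the chart as in the examples above
```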


The final output

However, the relationship between GDP and life expectancy looks nonlinear. Transform the data to make the plot easier to read.

# Create a new column that corresponds to the log of the GDP column
development_motion$logGDP <- log(development_motion$GDP)

# Create the interactive motion chart with `gvisMotionChart()`
motion_graph <- gvisMotionChart(development_motion,
                idvar = 'Country',
                timevar = 'Year',
                xvar = 'logGDP',
                yvar = 'LifeExpectancy',
                sizevar = 'Population')

# Plot your new motion graph with the help of `plot()`
plot(motion_graph)
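
The logarithm helps because equal ratios in GDP become equal steps on the log scale, so countries whose GDP differs by orders of magnitude are spread evenly along the x-axis instead of being squeezed into one corner. A minimal sketch with invented GDP values:

```r
# Three invented GDP values, each ten times the previous one
toy_gdp <- c(1000, 10000, 100000)

# On the raw scale the gaps grow tenfold: 9000, then 90000
diff(toy_gdp)

# On the log scale both gaps are the same size, namely log(10)
toy_log_gdp <- log(toy_gdp)
diff(toy_log_gdp)
```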

Let us see an alternative view:


The recessional

The goal of these charts is not only to impress the audience, but also to visualize trends and to provide a clearer view of the data and the corresponding insights.