Spatial bubble chart with R

Posted in Analytics

Recently I wanted to know how our members are distributed across the country. A table, or even a bar graph, gives no idea of where the chapters are located or how far apart they are.

Chapter-wise Member distribution

A spatial bubble chart shows both the location and the size of each chapter. Unfortunately, Excel and OpenOffice do not have such a feature, but I could do it easily with R.

For the uninitiated in data science, R is free and open source (Mukt) software for data science, built through the collaboration of thousands of developers around the world. You may read more about it here. R has a library, ‘ggmap’, that can render maps. When I looked for the package, I found it was not installed. Getting a package in R is easy.

I start R as superuser from a root login shell (that is what sudo's -i flag does):

$ sudo -i R

Once at the R prompt, I install the ‘ggmap’ package with:

> install.packages('ggmap')

R prompted me to select a repository; once I selected one, it downloaded and installed the library. In a couple of minutes I got a confirmation:

** building package indices
** testing if installed package can be loaded
* DONE (ggmap)

So I exited R with the q() command.
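
To avoid reinstalling a package that is already present, one can check for it first. This is only a minimal sketch using base R; the helper name missing_packages is my own and not part of R, and the actual install call is left commented out because it needs a network connection and a chosen repository:

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("ggmap")
to_install <- missing_packages(needed)

# Install only what is missing.
if (length(to_install) > 0) {
  message("Would install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)
}
```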

Next, I make a file with the names of the chapters, the number of members and the location of each chapter. There is not much data, so I keep it in a text file with fields separated by commas. R can read this format easily. Here is the structure of the data file:

location,members,lat,lon
HQ,256,22.5726,88.3639
Agra,37,27.176157,77.90980
Ahmedabad,46,23.0201815,72.4393114
Ajmer,33,26.453226,74.5655012
Aligarh,25,27.9060815,78.0184661
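
To check that R parses such a file as expected, the sample rows can be written to a temporary file and read back. This is a sketch with the five rows shown here, not the full data set; the temporary file stands in for the real data file, and the names csv_lines, csv_path and sample_data are made up for the illustration:

```r
# The five sample rows, with the irregular spacing a hand-typed file may have.
csv_lines <- c(
  "location, members, lat, lon",
  "HQ, 256, 22.5726, 88.3639",
  "Agra, 37, 27.176157, 77.90980",
  "Ahmedabad, 46, 23.0201815,72.4393114",
  "Ajmer, 33, 26.453226,74.5655012",
  "Aligarh, 25, 27.9060815,78.0184661"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# strip.white = TRUE drops any blanks around the comma-separated fields.
sample_data <- read.csv(csv_path, strip.white = TRUE, stringsAsFactors = FALSE)

# Show column names and types: location is text, the rest are numeric.
str(sample_data)
```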

I do some massaging of the data to find the percentage of members in each chapter. For this I open an interactive R session in R Commander and load the ‘ggmap’ package from the Tools menu:

Tools > Load package > ggmap

Then I give:

# Read data from file
mdata <- read.csv("ch_data.csv")

# Total of the values in the members column
netMemb <- sum(mdata$members)

# Create a temporary vector with the number of members
memb <- mdata[, "members"]

# Find percentage
percent <- memb / netMemb * 100

# Add new column 'percent' to my data set
mdata$percent <- percent
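
The same calculation can also be written as a single vectorised expression. A small sketch on a stand-in data frame built from the member counts of the sample rows (the name chapters is mine, used so the snippet runs without the data file):

```r
# Stand-in data: member counts from the five sample rows.
chapters <- data.frame(
  location = c("HQ", "Agra", "Ahmedabad", "Ajmer", "Aligarh"),
  members  = c(256, 37, 46, 33, 25)
)

# Each chapter's share of the total, in percent, computed in one step.
chapters$percent <- chapters$members / sum(chapters$members) * 100

# Largest chapters first.
print(chapters[order(-chapters$percent), ])
```

Because the arithmetic is applied element by element, no loop or temporary variable is needed.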

As you can see, processing an array in R is as simple as performing arithmetic. Next I get a map of India. ggmap can fetch map data from various sources, such as Google Maps and OpenStreetMap. I used Google Maps with the roadmap type as my base map; other types such as satellite, hybrid or terrain are available as well. I needed a bit of experimentation to get the zoom level right.

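# Get map data
# (recent ggmap versions need a Google API key, registered with
# register_google(), before this call works)
mapIndia <- get_map(location = "India", color = "color",
       source = "google", maptype = "roadmap", zoom = 5)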
 
#render map
ggmap(mapIndia, extent = "device", 
      ylab = "Latitude",  xlab = "Longitude")
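
One way to cut down on the trial and error is to look at the extent of the chapter coordinates before fetching the map. The sketch below uses base R only, with the coordinates of the five sample rows; the variable names are mine. get_map() also accepts a longitude/latitude pair as its location, so a centre computed this way could be passed in place of "India" (that call is not shown here since it needs network access):

```r
# Coordinates of the five sample chapters.
coords <- data.frame(
  lon = c(88.3639, 77.90980, 72.4393114, 74.5655012, 78.0184661),
  lat = c(22.5726, 27.176157, 23.0201815, 26.453226, 27.9060815)
)

# Smallest and largest longitude and latitude.
lon_range <- range(coords$lon)
lat_range <- range(coords$lat)

# Midpoint of the bounding box, in the (lon, lat) order that ggmap uses.
centre <- c(lon = mean(lon_range), lat = mean(lat_range))

# Width and height of the bounding box in degrees; a wider span needs a lower zoom.
span <- c(lon = diff(lon_range), lat = diff(lat_range))

print(centre)
print(span)
```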

 

Once I got the map looking right, I keep the plot in a variable and add my bubble plot layer to it. Here I use the square root of the percentage, so that the area of each bubble rather than its diameter is proportional to the chapter's share, and a scale factor of 3 to get good visuals:

# Save map in a variable
mapPlot <-ggmap(mapIndia, extent = "device", 
       ylab = "Latitude", xlab = "Longitude")

# Add bubble plot layer
# (size is set outside aes(), so it is taken directly from the data column)
mapPlot + geom_point(aes(x = lon, y = lat), colour = 'blue', 
          size = sqrt(mdata$percent) * 3, data = mdata)
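
To see why the square root is used, here is a small worked example with made-up percentages. It assumes the point size acts as a linear measure, roughly the diameter of the bubble, which is how I understand a size given outside aes() to behave:

```r
# Made-up percentages, chosen so the square roots are whole numbers.
percent_demo <- c(4, 16, 64)

# Same formula as in the plot: square root times a scale factor of 3.
point_size <- sqrt(percent_demo) * 3

# Area of a circle with that diameter.
area <- pi * (point_size / 2)^2

# The area ratios match the percentage ratios.
print(area / area[1])
print(percent_demo / percent_demo[1])
```

Without the square root, a chapter with four times the members would get a bubble with sixteen times the area, which exaggerates the difference.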

Here we see the chart:

Bubble plot on map

Now that I have the picture, I save it for use in my report:

ggsave("bblplot.png", device = "png")

 

The spatial chart makes the problem clear. It shows absolutely no presence in the north-eastern (NE) states, which we already knew. It also shows an absence in central India and a rather low presence in the western parts of the country. This was not so evident from our earlier bar chart.
