Recently I wanted to know how our members are distributed across the country. Data in a table, or even in a bar graph, gives no idea of where the chapters are located or how far apart they are.
A spatial bubble chart would show the location as well as the size of each chapter. Unfortunately, neither Excel nor OpenOffice has such a feature. I could do it easily with R.
For the uninitiated in data science, R is a free and open source (Mukt) software for data science, built through the collaboration of thousands of developers around the world. You may read more about it here. R has a library ‘ggmap’ that can render maps. When I searched for the package, I found it was not installed. Getting a package in R is easy.
I give a command to start an interactive R session as superuser:
$ sudo -i R
Once in R's interactive mode, I give the install command (the standard call for this is install.packages("ggmap")) to install the ‘ggmap’ package. R prompted me to select a repository; once I selected one, it downloaded and installed the library. In a couple of minutes I got a confirmation:
** building package indices
** testing if installed package can be loaded
* DONE (ggmap)
So I exited R with the q() command.
Next, I make a file with the names of the chapters, the number of members and the location of each chapter. There is not much data, so I keep it in a text file with fields separated by commas. R can read such a format easily. Here is the structure of the data file:
location, members, lat, lon
HQ, 256, 22.5726, 88.3639
Agra, 37, 27.176157, 77.90980
Ahmedabad, 46, 23.0201815,72.4393114
Ajmer, 33, 26.453226,74.5655012
Aligarh, 25, 27.9060815,78.0184661
I do some massaging of the data to find the percentage of members in each chapter. For this I open an interactive R session in R Commander. I select the package ‘ggmap’ under the Tools menu:
Tools > Load package > ggmap
Then I give:
# Read data from file
mdata <- read.csv("ch_data.csv")
# Total of the values under the members column
netMemb <- sum(mdata$members)
# Create a temporary array with the number of members
memb <- mdata[, "members"]
# Find percentage
percent <- memb / netMemb * 100
# Add new column 'percent' to my data set
mdata$percent <- percent
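As a quick sanity check on the steps above, the same calculation can be run on the five sample rows shown earlier, retyped here without the spaces after the commas and read inline instead of from ch_data.csv. This is only a sketch: the name sample_csv is mine, and the one-line percentage is an equivalent shortcut for the steps above, not a different method.

```r
# Five sample rows from the data file, retyped inline (hypothetical copy)
sample_csv <- "location,members,lat,lon
HQ,256,22.5726,88.3639
Agra,37,27.176157,77.90980
Ahmedabad,46,23.0201815,72.4393114
Ajmer,33,26.453226,74.5655012
Aligarh,25,27.9060815,78.0184661"

# read.csv() can read from a string through its 'text' argument
mdata <- read.csv(text = sample_csv)

# Percentage of members per chapter in one vectorised step
mdata$percent <- mdata$members / sum(mdata$members) * 100

# The shares should add up to 100
print(sum(mdata$percent))
print(mdata[, c("location", "percent")])
```

If the total does not come out as 100, the members column was probably not read as numbers.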
As you see, processing arrays with R is as simple as performing arithmetic. Next I get a map of India. ggmap can get map data from various sources, like Google Maps, OpenStreetMap etc. I used Google Maps with the road map view as my base map. One can get other views like satellite, hybrid or terrain as well. I needed to do a bit of experimentation to get the correct zoom level.
# Get map data
mapIndia <- get_map(location = "India", color = "color", source = "google", maptype = "roadmap", zoom = 5)
# Render map
ggmap(mapIndia, extent = "device", ylab = "Latitude", xlab = "Longitude")
Once I got the map size right, I keep the map in a variable and add my bubble plot layer to it. Here I use the square root of the percentage so that the area of each bubble is proportional to the chapter's share of members, and a scale factor of 3 to get good visuals:
# Save map in a variable
mapPlot <- ggmap(mapIndia, extent = "device", ylab = "Latitude", xlab = "Longitude")
# Add bubble plot layer
mapPlot + geom_point(aes(x = lon, y = lat), colour = 'blue', size = sqrt(percent) * 3, data = mdata)
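The square root matters because the size of a point acts like a diameter, so the drawn area grows with the square of the size. The small sketch below assumes that relationship and uses made-up percentages; the names share, bubble_size and relative_area are illustrative and not part of the data set.

```r
# Illustrative membership shares in percent (made-up values)
share <- c(1, 4, 16)

# Size as passed to geom_point(), with the same scale factor of 3
bubble_size <- sqrt(share) * 3

# Assumption: drawn area grows with the square of the point size
relative_area <- bubble_size^2

# A constant ratio means area is proportional to share
print(relative_area / share)
```

Every ratio comes out as 9, so a chapter with four times the members gets a bubble with four times the area. Without the square root, its area would be sixteen times larger.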
Here we see the chart:
Now that I have the picture, I save it for use in my report.
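The save command itself is not shown above. One common way to write a ggplot2 plot to a file is ggsave(). The sketch below is an assumption about what could be used, not necessarily the original command. The file name chapters_map.png, the dimensions, the percentages and the stand-in plot p are all hypothetical; in the real session the plot would be mapPlot with the bubble layer added.

```r
library(ggplot2)

# Stand-in data and plot so the sketch runs without downloading a map
# (coordinates are two of the sample chapters, percentages are made up)
mdata <- data.frame(lon = c(88.3639, 77.90980),
                    lat = c(22.5726, 27.176157),
                    percent = c(87, 13))
p <- ggplot(mdata, aes(x = lon, y = lat)) +
  geom_point(colour = "blue", size = sqrt(mdata$percent) * 3)

# Write the plot to a PNG file (hypothetical name and size, in inches)
out_file <- file.path(tempdir(), "chapters_map.png")
ggsave(out_file, plot = p, width = 6, height = 6, dpi = 150)
```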
The spatial chart makes the problem clear. It shows absolutely no presence in the NE states, which we already knew. It also shows an absence in central India and a rather low presence in the western parts of the country. This was not so evident from our previous bar chart.