Different ways of coding a bar chart
I have this bar chart in a web application which shows the number of new projects which are fulfilled each month. The data is extrapolated from scrutinizing a Projects table and the created_at column. The SQL is pretty simple:
select
convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
count(*) as count
from
projects
group by
year(created_at),
month(created_at)
order by
1, 2;
It does a count for each year/month and formats the date as the first of the month. For example:
date | count |
---|---|
2011-06-01 | 6 |
2011-07-01 | 11 |
2011-08-01 | 8 |
JavaScript + D3
In a Ruby on Rails web application I've written, I generate the bar chart using the D3.js library with 156 lines of JavaScript code I had to write. D3 is a lower level library which generates visualizations using SVG. Writing data visualizations this way is more difficult and requires more knowledge of the technologies involved. However, it is also the most flexible and interactive. Using html, css, and javascript you can make truly interactive visualizations. In this case, the bar chart does not require interactivity.
R + plot
Here's an example using a few lines of R code with plot:
library(RODBC);
channel <- odbcConnect("dsn", uid="uid", pwd="pwd");
query <- "
select
convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
count(*) as count
from
projects
group by
year(created_at),
month(created_at)
order by
1, 2;";
results <- sqlQuery(channel, query);
close(channel);
barplot(results$count,
main="New Projects per Month",
names.arg=as.Date(results$date),
xlab="Month",
ylab="# Projects",
col="#3d88ba");
Which produces this simple bar chart:
R + ggplot
Using gglot you can produce a better looking chart:
library(RODBC);
channel <- odbcConnect("dsn", uid="uid", pwd="pwd");
query <- "
select
convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
count(*) as count
from
projects
group by
year(created_at),
month(created_at)
order by
1, 2;";
results <- sqlQuery(channel, query);
close(channel);
library(ggplot2);
image <- ggplot(results, aes(x=date, y=count, fill=count)) +
geom_bar(stat="identity", position="dodge") +
geom_text(aes(label=count), vjust=1.5, colour="white",
position=position_dodge(.9), size=3) +
ggtitle("New Projects per Month") +
xlab("Date") +
ylab("# Projects") +
guides(fill=FALSE) +
theme(axis.text.x = element_text(angle=30, hjust=1, vjust=1));
Output:
Python + Pandas (via an IPython Notebook)
As with R, there are many ways to make the same bar chart in Python. Here’s one example using the Pandas library via an IPython Notebook:
import pyodbc
import pandas.io.sql as psql
dsn = 'dsn'
user = 'user'
password = 'password'
database = 'database'
con_string = 'DSN=%s;UID=%s;PWD=%s;DATABASE=%s;' % (dsn, user, password, database)
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
sql = """
select
convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
count(*) as count
from
projects
group by
year(created_at),
month(created_at)
order by
1, 2;
"""
cursor.execute(query)
df = psql.frame_query(sql, cnxn)
cnxn.close()
df.plot(kind='bar', x='date', y='count')
What's great about IPython notebooks is that you can create a reproducible, step-by-step document which shows your inputs and outputs. It also supports various markup languages such as markdown and LaTeX. Many data scientists use Ipython notebooks to demonstrate a concept or to compliment a paper or thesis.