Make gnuplot cool again!

Ville Klar

18.08.2019

Data visualization

Maybe the title of this post is a bit presumptuous. For many, gnuplot never went out of vogue. Still, I figured it would be nice to play with gnuplot and reflect a little on its current place in the data visualization landscape. Even though gnuplot is arguably not the shiniest wrench in the toolbox, I think it is definitely worth checking out.

A bit of background

There’s no shortage of great data visualization tools out there. Libraries such as matplotlib and ggplot2 (or the tidyverse in general) have become pillars of data science workflows. For most purposes, I would recommend sticking to these established libraries. Matplotlib for the basics, Bokeh for more interactive stuff, pygal for fancy SVGs and so on. However, I think there is still room for gnuplot.

Despite the name implying an association with GNU or the Free Software Foundation, gnuplot is not a member of team Stallman. I have to confess I made the mistake of assuming gnuplot was GNU software. The original authors (Thomas Williams, Colin Kelley, Russell Lang, Dave Kotz, John Campbell, Gershon Elber, Alexander Woo and many others) wanted to name it “newplot”, but since that name had already been taken, they settled on “gnuplot”. So if you see the “gnu” in gnuplot capitalized, feel free to get triggered.

Based on a skim of forum posts and comments, it seems gnuplot was a staple among data visualization tools until matplotlib and ggplot matured to the point of offering pretty much the same functionality. I remember hearing about it around 2012, but back then I was still using Matlab as my daily driver, so I didn’t pay attention. It is nice to see that it is still actively maintained, and based on my quick tests it seems to work great.

A practical example

Figure: Google Trends interest over time for “gnuplot”, “matplotlib” and “ggplot2”

The plot above depicts Google Trends data for the search terms “gnuplot”, “matplotlib” and “ggplot2”. It’s not conclusive evidence, but it may suggest some decline in gnuplot’s popularity. But that’s not the point here. Instead, I want to highlight how easy it was to make the plot with gnuplot.

The workflow with gnuplot is similar to Python or R in the sense that you can either type commands into a REPL-like interactive prompt or write scripts and run them in their entirety. The save command writes the current settings, user-defined variables and functions, and the last plot command to a file, and the load command executes such a file again, which makes storing and retrieving scripts easy. The replot command is also very helpful: if you change a setting, it redraws the last plot with the change applied.
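A short interactive session using these three commands might look roughly like the sketch below. The file name session.gp is a hypothetical example, and the exact contents that save writes vary somewhat between gnuplot versions.

```gnuplot
# Plot a built-in function first
plot sin(x) title "sin(x)"

# Change a setting, then redraw the last plot command with it applied
set xrange [-2*pi:2*pi]
replot

# Write the current settings, user-defined variables/functions and the
# last plot command to a file ("session.gp" is a hypothetical name)
save "session.gp"

# Later, e.g. in a fresh gnuplot session: execute the saved file to get the plot back
load "session.gp"
```

Because replot simply re-runs the last plot command, it pairs naturally with small tweaks made one set command at a time.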

From my point of view, the major distinction between gnuplot and Python/R is that gnuplot is designed for plotting and not much else. Heavy lifting regarding the data analysis is best left to Python, R or Octave (which can use gnuplot as its plotting backend). The results can then be handed over to gnuplot either by piping them in or by writing them to .dat or .csv files that gnuplot reads.
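As a rough sketch of the two hand-over routes, assume a hypothetical analysis script analysis.py that prints whitespace-separated columns to standard output, or alternatively writes them to a hypothetical results.csv:

```gnuplot
# Route 1: read a file written by the analysis step.
# "results.csv" is a hypothetical file name; the separator must match the file.
set datafile separator ","
plot "results.csv" using 1:2 with lines title "from file"

# Route 2: pipe the output of a command straight into gnuplot.
# A data source starting with "<" is run as a shell command (requires a
# system with popen support, e.g. Linux or macOS); "analysis.py" is hypothetical.
set datafile separator whitespace
plot "< python3 analysis.py" using 1:2 with lines title "from pipe"
```

Writing a file keeps the intermediate results around, while piping avoids the temporary file at the cost of rerunning the analysis command whenever the plot is redrawn.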

So, back to the Google Trends plot. Below are the nine gnuplot commands it took to generate it from the CSV file exported from Google Trends.

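set datafile separator ","                       # the Google Trends export is comma-separated
set xdata time                                   # treat the x column as time values
set timefmt "%Y-%m"                              # input dates look like 2019-08
set format x "%Y"                                # label the x tics with the year only
set xlabel "Year" offset 0,0.5                   # nudge the x label up slightly
set ylabel "Interest" offset 2,0                 # nudge the y label right, toward the axis
set terminal svg size 900,300 font "Fira,18"     # SVG output, 900x300 pixels
set output "output.svg"                          # write the plot to this file
plot "multiTimeline.csv" using 1:2 with lines title "gnuplot", "" using 1:3 with lines title "matplotlib", "" using 1:4 with lines title "ggplot2" linetype 7   # "" reuses the previous file; linetype 7 overrides the third line's default style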

Pretty efficient if you ask me. Besides SVG, gnuplot supports a whole bunch of other output formats (called terminals in gnuplot), several of which work very well in, e.g., LaTeX documents.
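For example, assuming the script above has just been run (so that replot has a plot to redraw) and that your gnuplot build includes the cairo-based terminals, switching to LaTeX-friendly output should only require changing the terminal and the output file. The file names trends.pdf and trends-latex.tex are hypothetical.

```gnuplot
# Vector PDF, e.g. for \includegraphics in a LaTeX document
set terminal pdfcairo size 5in,2in font "Fira,10"
set output "trends.pdf"
replot
set output   # close the file so it is completely written

# cairolatex writes the graphics to trends-latex.pdf and the text labels to
# trends-latex.tex, so LaTeX typesets the labels in the document's own font
set terminal cairolatex pdf size 5in,2in
set output "trends-latex.tex"
replot
set output
```

The generated .tex file can then be pulled into a LaTeX document with \input{trends-latex.tex}, with the graphicx package loaded.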

Conclusion

I think gnuplot can be a great tool for the exploratory part of data science. Most would recommend something like Jupyter notebooks for that, but once you get used to it, gnuplot is a marvelous plotting utility for high-speed discovery. It saves time on basic plotting and handles many different styles of plots efficiently.