Dear Salonnières,

thank you so much for attending compu-salon episode 5, beautiful plots. It was fun! Here's a summary of our discussion.

For convenience, I'm now archiving these e-mails at www.vallis.org/salon. One day I will make them into a proper website with links, figures, etc. One day...

There will be no salon this week (unless you want to come to Death Valley with me), but on Friday, Mar 2, we'll talk about version control: what's really happening.

I'll send out a reminder next week. Until then!

Michele

Compu-salon ep. 5 (2012/02/17) in summary

Also at www.vallis.org/salon.

Tufte the man

In our fifth meeting, we talked about making clear, informative, beautiful plots, following the teachings of the quantitative-illustration guru, Edward Tufte. In fact, a lot of what he says is common sense (to sensitive souls), but Tufte gets credit for... walking his talk in a series of illuminating books.

I highly recommend the first book of the series, "The Visual Display of Quantitative Information", although "Beautiful Evidence" is the coffee-table book to have if you want to impress your scientist friends.

Tufte is a vehement critic of PowerPoint: not so much of the tool itself as of the culture of "pitching" ideas, distilling thought into bullet fragments, and cramming precious quantitative information into the limited resolution of LCD projectors. Indeed, Tufte argues convincingly that "PowerPoint culture" is partly to blame for the Columbia space shuttle disaster.

Tufte's principles

So what are Tufte's principles of excellence? There are several summaries on the web, differing slightly in their organization. Here's my mash-up of a few of them.

We begin with the statement that graphical excellence is the well-designed presentation of interesting data. To achieve excellence, the data must be substantive; its statistical treatment must be accurate; and its graphical presentation must be designed for clarity, precision, and efficiency.

Tufte's principles of design then include:

  1. Making intelligent and appropriate comparisons
  2. Using data to explain causality, mechanisms, structure
  3. Remembering that data is often intrinsically multivariate, and finding creative ways to "escape Flatland"
  4. Integrating different modes of evidence by bringing words, tables, and figures together (no legends!). Galileo was doing it already
  5. Documenting the provenance of data used for plots
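For principle 5, one low-effort option in matplotlib is to record provenance inside the figure file itself. The sketch below assumes PDF output and a homemade convention (hypothetical, not any standard) of storing a title, the data-file name plus a SHA-1 digest of its contents, and the plotting script's name in the PDF's Title, Subject and Creator fields through the metadata argument of savefig. The file names and data are made up for illustration.

```python
import hashlib
import io

import matplotlib
matplotlib.use("Agg")           # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as P
import numpy as N


def provenance_metadata(title, data_bytes, source, script):
    """Return a PDF metadata dict recording where the plotted data came from.

    'source' and 'script' are free-form labels (a hypothetical convention);
    the SHA-1 digest of the raw data lets a reader verify the data file later.
    """
    digest = hashlib.sha1(data_bytes).hexdigest()
    return {
        "Title": title,
        "Subject": "data: %s (sha1 %s)" % (source, digest),
        "Creator": script,
    }


# stand-in for the raw contents of a (hypothetical) data file
data_bytes = b"0.0 0.000\n1.0 0.841\n2.0 0.909\n"
meta = provenance_metadata("sine test", data_bytes, "runs/sine.dat", "plot_sine.py")

fig = P.figure(figsize=(5, 3))
x = N.linspace(0, 4 * N.pi, 100)
P.plot(x, N.sin(x))

buf = io.BytesIO()
fig.savefig(buf, format="pdf", metadata=meta)   # stored in the PDF's Info dictionary
```

In practice you would pass a file name instead of the in-memory buffer and read data_bytes from the actual data file.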

Tufte makes specific suggestions on achieving these principles:

To make comparisons (1):

  • Maximize data ink (the amount of ink used for data as opposed to graphical structure) and data density
  • When possible, use paper, which still has the best resolution
  • Put related data side by side: use small multiples and sparklines
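Sparklines are easy to approximate in matplotlib: a tiny axes with everything except the data line switched off. The following is a minimal sketch (the helper sparkline() and the random-walk data are hypothetical, made up for illustration) that stacks four sparklines as small multiples on shared x and y scales, so that they can be compared directly.

```python
import matplotlib
matplotlib.use("Agg")           # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as P
import numpy as N


def sparkline(ax, y, color="0.4"):
    """Draw a word-sized line plot: no frame or ticks, last value marked and labeled."""
    x = N.arange(len(y))
    ax.plot(x, y, color=color, linewidth=0.75)
    ax.plot(x[-1], y[-1], "o", color="red", markersize=2)    # accent only the latest value
    ax.text(x[-1] + 2, y[-1], "%.1f" % y[-1], fontsize=7, va="center")
    ax.set_axis_off()                                        # drop spines, ticks, labels


# small multiples: one sparkline per series, stacked on shared x and y scales
rng = N.random.default_rng(0)
series = [N.cumsum(rng.normal(size=50)) for _ in range(4)]

fig, axs = P.subplots(len(series), 1, figsize=(2, 1.2), sharex=True, sharey=True)
for ax, y in zip(axs, series):
    sparkline(ax, y)
```

Sharing the y scale is the design choice that makes these true small multiples; drop sharey=True if each sparkline should show only its own shape.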

To use data to explain (2):

  • Avoid chartjunk (think Powerpoint chart "effects") and visual clutter (e.g., noisy gridlines, messy icons, clashing colors)
  • Use bright colors sparingly, for important things; prefer natural palettes
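A sketch of both ideas in matplotlib, assuming a plot with one curve of interest among several context curves (synthetic data): the context is drawn in light gray, a single bright color is reserved for the important curve, and non-data ink (the top and right frame, gridlines) is switched off.

```python
import matplotlib
matplotlib.use("Agg")           # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as P
import numpy as N

x = N.linspace(0, 10, 200)
fig, ax = P.subplots(figsize=(5, 3))

# context curves in a muted gray...
for k in range(2, 6):
    ax.plot(x, N.sin(x + 0.3 * k) / k, color="0.7", linewidth=0.75)
# ...and the one curve that matters in a single bright color
ax.plot(x, N.sin(x), color="tab:red", linewidth=1.25)

# remove non-data ink: top/right frame and gridlines; keep ticks small and outside
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(False)
ax.tick_params(direction="out", length=3, width=0.5)
```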

To integrate evidence (4):

  • Treat illustrations as maps: include a scale reference, and annotate important points with their values
  • In flowcharts, words are nouns, lines are verbs
  • Use sparklines as datawords
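The "no legends" and "annotate important points with their values" advice maps onto matplotlib's annotate(): label each curve directly at its end, in its own color, and print the value at the point of interest. This is a minimal sketch with synthetic curves; the names "model A" and "model B" and the 0.1 threshold are placeholders.

```python
import matplotlib
matplotlib.use("Agg")           # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as P
import numpy as N

t = N.linspace(0, 5, 100)
curves = {"model A": N.exp(-0.3 * t), "model B": 0.6 * N.exp(-0.8 * t)}   # synthetic data

fig, ax = P.subplots(figsize=(5, 3))
for name, y in curves.items():
    line, = ax.plot(t, y, linewidth=1)
    # label each curve at its right end, in its own color, instead of using a legend
    ax.annotate(name, (t[-1], y[-1]), xytext=(4, 0), textcoords="offset points",
                color=line.get_color(), va="center")

# annotate an important point with its value, map-style
b = curves["model B"]
i = int(N.argmax(b < 0.1))              # first sample where model B falls below 0.1
ax.annotate("%.3f at t = %.2f" % (b[i], t[i]), (t[i], b[i]),
            xytext=(15, 20), textcoords="offset points",
            arrowprops={"arrowstyle": "-", "linewidth": 0.5})
```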

Above all, Tufte stresses graphical integrity:

  • Represent numbers proportionally in graphics (i.e., avoid distorting relative sizes with perspective effects, represent quantities by length or area as appropriate)
  • Label plots clearly and thoroughly
  • Show data variation, not design variation (i.e., most "information" in a plot should relate to the data, not to decoration or inessential graphical structure)
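A common violation of proportional representation is scaling a circle's radius, rather than its area, with the quantity: a value four times larger then looks sixteen times larger. If a quantity is encoded as area, the radius must go as the square root, r = sqrt(value/pi). The sketch below (hypothetical helper radius_for_area(), made-up values 1, 2, 4) draws area-proportional circles and sets an equal aspect ratio so that they stay circular.

```python
import math

import matplotlib
matplotlib.use("Agg")           # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as P
import matplotlib.patches as MP


def radius_for_area(value):
    """Radius of a circle whose area equals value (area = pi r^2)."""
    return math.sqrt(value / math.pi)


values = [1.0, 2.0, 4.0]
radii = [radius_for_area(v) for v in values]

fig, ax = P.subplots(figsize=(5, 2))
for i, (v, r) in enumerate(zip(values, radii)):
    ax.add_patch(MP.Circle((3 * i, 0), r, facecolor="0.6", edgecolor="none"))
    ax.text(3 * i, -1.6, "%g" % v, ha="center", va="top")
ax.set_xlim(-2, 8)
ax.set_ylim(-2.2, 1.5)
ax.set_aspect("equal")      # equal data units in x and y, so circles are not drawn as ellipses
ax.set_axis_off()
```

Note that the value 4 gets only twice the radius of the value 1, which is exactly what keeps the ink proportional to the data.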

Examples from Tapir work

We then looked at several examples of plots provided by fellow salonnières, and identified several places where we could apply the principles outlined above. I've put the plots here for your browsing convenience.

  • In Roland's plot (an example of small multiples!), the tick labels could be clearer (not italic, and minuses should not be hyphens); the subplots should be labeled with informative words rather than (a), (b), (c); and the hue-cycling colormaps create perceived structural artifacts that have no real correlate in the physics (see Rogowitz and Treinish).

  • Dave showed a before-and-after plot of mode excitations in the core and crust of neutron stars. The re-do exemplifies the ideas of scaling the aspect ratio of plots for maximum information transfer (banking), of presenting related information side by side, and of integrating words and images.

  • Dave's other example triggered a discussion of the role of icons in a plot, and of the balance between... prettiness and quantitative meaning. The labels could also be bigger and placed better, and the vertical bands should be labeled clearly.

  • Chad's LISA histograms again showed side-by-side presentation and figure annotation. Even more could be done to link the subplots together, e.g., by marking the location of the histograms on the density plot.

  • Chad's LIGO template plot illustrated the problem of showing density with markers (they quickly saturate the available ink or pixels), as well as the interpolation artifacts that appear when density is plotted with color maps. We also discussed the vertical axis of histograms: should it be log-scale, and should it show absolute (bin-size-dependent) counts or a normalized probability distribution? It depends on context, of course.

  • Anil's spacetime diagram is nice and essential; some of us thought that it could use a couple more labels.

  • Anil's Mathematica plots of the evolution of a mode amplitude (right?) showed the common problem of log-plotting a zero-crossing quantity: the downward-extending "legs" end at random, jagged heights (basically wherever the plotting routine happened to sample the curve near zero). We did not come up with a definitive solution: Sterl suggested multiplying by a power law to eliminate the trend, while Michele suggested truncating the lines along a diagonal and marking the truncation explicitly.

    Better line dashing was suggested, as well as the explicit visualization of the "zooming out" of a plot inset from the main plot, using Mathematica's graphics primitives.

    Sterl also pointed out that Mathematica goes wrong in placing ticks on a logarithmic scale---it puts ten log tick spacings spanning two decades, so they end up marking 4, 9, 16, ... The only workaround known to us is to give Mathematica the explicit location of all ticks (and also frame ticks).

  • Last, Ajith's small-multiple example generated a discussion of the meaning of discrete color in a contour plot, a request for clearer labeling, and the suggestion of aligning the axes of all plots by re-centering coordinates. The grids were thought to be unnecessary, and Tom suggested emphasizing the vertical grouping of plots (sets of three which display sections along two coordinates out of three) by showing how they wrap into a data cube.
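Anil's zero-crossing problem is easy to reproduce in matplotlib by plotting |amplitude| on a log axis. Below is a sketch of one possible reading of the truncation suggestion, assuming the decay envelope is known: the curve is clipped at a floor placed two decades below the envelope (an arbitrary choice), which on a log axis is a straight diagonal line, and the floor is drawn dotted to mark the truncation. The data is synthetic. The decade ticks are also placed explicitly with set_yticks(), the matplotlib counterpart of the Mathematica workaround, although matplotlib's default log-scale locator generally puts ticks at decades on its own.

```python
import matplotlib
matplotlib.use("Agg")           # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as P
import numpy as N

t = N.linspace(0, 20, 400)
envelope = N.exp(-0.5 * t)              # assumed known decay envelope (synthetic)
amp = envelope * N.cos(3 * t)           # zero-crossing mode amplitude (synthetic)

# floor two decades below the envelope; on a log-y axis an exponential
# envelope (and hence the floor) is a straight, diagonal line
floor = 1e-2 * envelope
clipped = N.maximum(N.abs(amp), floor)  # the "legs" now stop at the floor, not at random depths

fig, ax = P.subplots(figsize=(5, 3))
ax.semilogy(t, clipped, color="k", linewidth=0.75)
ax.semilogy(t, floor, color="0.6", linestyle=":", linewidth=0.75)   # mark the truncation
ax.set_yticks(10.0 ** N.arange(-7, 1))  # one tick per decade, placed explicitly
ax.set_xlabel("$t$")
ax.set_ylabel("|amplitude|")
```

Sterl's alternative (multiplying by a power law to remove the trend) would replace the floor with a rescaling of amp before plotting; which one reads better depends on the data.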

Tuftian plotting in Python and matplotlib

My usual modus operandi for plots is to start them in Mathematica or MATLAB, and then edit them thoroughly in Adobe Illustrator (which is worth its price IMHO, but has a free alternative in Inkscape). However, several salonnières expressed a feeling that an end-to-end scriptable solution is desirable, to avoid going through repeated manual edits after plot revisions, to keep the recipes of the plots under version control, and (I must assume) because scripting is fun.

In the context of this salon, scripting means Python, so plotting means matplotlib, the de facto standard. I have had harsh things to say about matplotlib and its continuously evolving API, but I've got to admit it's getting better. (Actually, ditto with quotes and music.)

So below I go through some Tuftian basics with matplotlib. Fine-tuning the little details is quite painful. As always, some Googling helps, starting from matplotlib's homepage and cookbook.

Basic pretty-plotting in matplotlib

import math
import matplotlib.pyplot as P, matplotlib.patches as MP
import numpy as N

# some settings
P.rc('figure',figsize=[5,3])    # we wish to work at the final graph size
                                # in this case 5in x 3in

P.rc('font',family='Helvetica',size=10) # work in standard sans-serif
P.rc('mathtext',fontset='stixsans')     # with math from www.stixfonts.org

# P.rc('font',family='Times New Roman',size=10)   # OR: work in standard serif
# P.rc('mathtext',fontset='stix')

P.rc('pdf',fonttype=3)          # for proper subsetting of fonts
                                # but use fonttype=42 for Illustrator editing

P.rc('axes',linewidth=0.5)      # thin axes; plotted lines default to 1pt (1.5pt since matplotlib 2.0)

# a pedestrian plot
fig = P.figure()

axes = P.axes([0.1,0.15,            # location of frame within figure:
               1 - 0.1  - 0.02,     # x, y, dx, dy
               1 - 0.15 - 0.02])

x = N.linspace(0,4*math.pi,100)
y = N.sin(x)

# plot takes all the usual matlab options
P.plot(x,y)

# restrict the axis to where we want it
P.axis([0,4*math.pi,-1,1])

# labels --- in all text, feel free to mix in LaTeX expressions
P.xlabel(r'$\phi$ (radians)')   # raw string, so Python leaves the backslash alone
P.ylabel('amplitude')

# do our own ticks
ticklocations = [math.pi * i for i in range(1,5)]
ticklabels = [(r'$%s\,\pi$' % i) for i in range(1,5)]
P.xticks(ticklocations,ticklabels)

# annotate points
dataxy = (x[40],y[40])
textxy = (0.5*math.pi,-0.5)
P.annotate('note',xy=dataxy,xytext=textxy,
           verticalalignment='center',horizontalalignment='center',
           arrowprops={'arrowstyle': '-|>','fc': 'k'})

# add geometric shapes...
art = MP.Circle((math.pi,0),0.5,facecolor='gray',edgecolor='none')
axes.add_patch(art)