Category Archives: Data Visualization

DataViz: The Meaning of the Three-Letter Airport Abbreviations

Airport Codes

It’s one of those things that you’ve perhaps never explicitly thought about, but it may have tickled the back of your mind a few times as you made airplane reservations or stared at a departure screen. Where do the three-letter codes used to distinguish airports actually come from? What’s the deal with the “X” in LAX? Why EWR for Newark?

Thanks to a simple and effective new site from designers Lynn Fisher and Nick Crohn, we can finally know where these little codes come from.

The codes are International Air Transport Association (IATA) codes. The clean, visually appealing site lets you click on a number of different codes (laid over an image of the airport’s location), which delivers a simple, concise explanation of each code’s origin.

The mysterious X is finally understood: it’s simply the letter that’s plugged in if the necessary letter is already taken by another airport. The site also points out that up until the 1930s, airports used only two-letter codes. That’s how you get LAX for Los Angeles International Airport, which was previously just LA. The strange Newark code EWR is also revealed. After the switch to three-letter codes, the Navy reserved all codes beginning with N. Thus, Newark’s code was forced to begin with its second letter.

Many more fun tidbits like these can be found in this very informational little site. Have fun.


Source: Co.DESIGN, What The Hell Do Those Three-Letter Airport Abbreviations Mean?, Fast Company, March 19, 2015.

DataViz: U.S. Bachelor’s Degrees by Gender and Ethnic Diversity


U.S. Bachelor’s Degrees [Click on image for interactive version and author information]

Chart: Baking Units Demystified

A handy chart by Andrew M.H. Alexander.



Shapes, Pictures and Colors: Environmental Print as a Teaching Tool

Environmental Print

My wife and I were driving in the car last weekend and were discussing environmental print. My wife is a retired Special Education teacher who taught in the K-3 grades for 39 years. Currently, she is Adjunct Faculty at Arizona State University. I found her discussion of it very interesting and saw some parallels with what we try to do with data visualization and infographics. My blog today discusses what environmental print is and how it is used to help teach literacy in our early stages of education. It is from a paper by Rebecca McMahon Giles and Karyn Wellhousen Tunks (source noted at the end of the blog post).

Best Regards,


What is Environmental Print?

“Hey, Ms. McMillan, you have three McDonald’s in your name.” This observation, made by 4-year-old Jadin as his pre-kindergarten teacher wrote her name, reflects young children’s familiarity with popular logos and commercial print that they see every day. [1]

Early encounters with environmental print, words, and other graphic symbols found in children’s surroundings are among their first concrete exposures to written language.

These experiences

  • provide an introduction to making meaning of abstract symbols and
  • offer children their first opportunity to make sense of the world through print.

As a result, children typically read print from their environment before reading print in books.

Why Environmental Print Is Important in Early Literacy

More than four decades of research on the role of environmental print has substantiated its important influence in young children’s literacy development. The preponderance of studies on environmental print, however, took place in earlier decades and focused on its impact on early reading behaviors. Interest in the impact of environmental print on children’s early writing is a more recent development.

Research clearly shows the benefits of exposure to environmental print for emergent readers and writers. In one study of preschoolers, 60% of the 3-year-olds and 80% of 5-year-olds could read environmental print in its context of cereal boxes, toothpaste cartons, traffic signs, and soft drink logos.

Children typically read environmental print first.

Children are initially dependent on the label or logo associated with the word. As their understanding of print and phonetic skills necessary for reading increases, they gradually begin to read words presented separately from the logo.

Children’s responses to environmental print are the direct outcomes of their prior experience with it. Academically at-risk preschoolers recognized significantly fewer environmental print logos than did their academically advantaged peers. However, studies consistently show that regardless of socioeconomic status or home language all children benefit from exposure to print in their environment.

Barbie - Environmental Print

Choose Suitable Environmental Print

Using environmental print in preschool, kindergarten, and primary classrooms is an important part of developing a language/literacy-rich learning environment. Many products marketed in the United States are labeled in English, French, and Spanish, so they can be tools to broaden children’s language experiences even further. Even so, reading environmental print is likely to be individual and dependent upon geographic location. For this reason, children should collect much of the environmental print that they will learn from at school.

  • Experiences in which children take ownership, such as cutting out a recognizable name or label from a container or magazine found at home, are particularly beneficial.
  • Contributing their own examples of environmental print to create class books or displays also strengthens the home-school connection.

Activities like these reinforce the fact that readable and writable print can be found everywhere, while ensuring that the print is actually familiar to the children.

Env Print2

The purpose of using familiar environmental print for instruction is to form a bridge between the known and new, so it is important that teachers use examples that are meaningful for the children in each group. Horner (2005) recommends emphasizing the use of child-familiar logos—such as those from toys, movies, and television shows—rather than community signs or household items. These were found to be most recognizable by both males and females of various ages. For instance, the journal entries in Photo 1 (above) [1] by two kindergarten girls reflect their recognition of and interest in the text found on a classmate’s lunchbox.

Horner (2005) also points out that an educator’s use of logos could imply approval of the products they represent. She recommends that teachers use acceptable toy names whenever possible. Children usually enter learning settings already familiar with a wide variety of commercial environmental print, such as road signs and household product logos. Their classrooms often are filled with homemade environmental print, such as daily schedules, labels on shelves, and a list of birthdays. Initial experiences with both types of environmental print enable children to associate print with meaning. This enables them to build confidence in their ability to read, which is necessary for becoming successful readers. In addition to supporting young readers, recent research demonstrates how print from the environment gives young children confidence to experiment and use print resources to improve their writing. These researchers found that children experimenting with writing engage in  “environmental printing”— copying conventional forms of print—directly from sources in their immediate surroundings.

This study of kindergarteners’ journal-writing behavior revealed three distinct ways children used environmental print.

  • Some children used environmental print simply as a source to copy without regard to its meaning.
  • Environmental print also served as a resource for the correct spelling of particular words or phrases, such as the day of the week, needed in the child’s message.
  • Environmental print inspired children’s choices of writing topics.

Environmental Print in Daily Explorations

Env Print

Early writing attempts can easily be promoted by deliberately stocking children’s play and learning areas with a combination of authentic environmental print and writing supplies along with other props. For example, a block center that contains street signs, “under construction” labels, and corporate logos such as those from restaurants and manufacturers encourages the use of environmental print when building. Coupling such signs with blank index cards, sticky notes, and markers promotes environmental printing as children label or write about their structures. Placing cookbooks, large colorful paper, and blank recipe cards in the pretend play area may prompt children to record the dishes being served.

They might design restaurant menus or transfer information from a cookbook to a personalized recipe box using the original text as a model and spelling reference. By adding labeled measuring utensils in pretend and water/sand play, children begin to see the relationship between quantities, numerals, and words. Setting up a classroom movie rental facility, pet rescue service, or grocery store with children for their dramatic play is another way to provide familiar environmental print as a motivation for writing. Telephone books, magazines, travel brochures, play money, and similar items all can expand children’s early literacy resources.

With a wide array of manipulatives that spark the use of environmental print, children will soon be able to write words to their favorite songs, learn color name words (in three languages) from crayons or markers, and match the names and shapes of seashells. Immersing children in a learning setting intentionally filled with environmental print to be used as a writing resource increases their ability and motivation to write.


Children who are surrounded by print flourish in literacy development and are often more successful in school. As children observe, read, discuss, and copy the signs and symbols in their world, they become aware that literacy is part of everyone’s daily activities. They come to realize that reading and writing fulfill various purposes and functions in their lives. Environmental print

  • provides models for children’s writing,
  • helps them internalize correct spellings of commonly used words, and
  • inspires their own writing through environmental printing.

With support and guidance, young children eventually learn to write conventionally, composing messages for a variety of purposes and audiences.

Consciously capitalizing on their familiarity with environmental print as an aid for early writing is one way to promote their progress on the road to becoming independent authors and readers.



[1] Rebecca McMahon Giles and Karyn Wellhousen Tunks, Children Write Their World: Environmental Print as a Teaching Tool, Dimensions of Early Childhood, Fall 2010, Volume 39, Number 3,

Graph: How Winning Relates to Salary for the Top NFL Quarterbacks

QB Pay Graph

The old adage “You get what you pay for” applies in most situations. Apparently not in the NFL.

With the 2014 season in the books, a list of the 25 highest-paid quarterbacks in terms of yearly average salary was compiled and charted to show how they fared, as measured by wins.

QB Wins vs. Salary

Some of the results were expected. Aaron Rodgers, currently the highest-paid signal-caller in the NFL, is tied for the NFL lead among quarterbacks with 12 victories. Joining him at the top of the standings: Peyton Manning (No. 5 on the salary ranking), Tony Romo (No. 8) and Tom Brady (a bargain at No. 16).

Others surprised. NFC South quarterbacks looked egregiously overpaid this season, with Matt Ryan (the NFL’s second-highest paid quarterback), Drew Brees (No. 4) and Cam Newton (No. 20 as he’s still on his rookie-scale contract) all tallying seven wins or fewer. Jay Cutler’s deal (his $120M extension made him the fifth-highest paid quarterback) looks so bad that Phil Emery, the Bears’ GM who gave it to him, lost his job. Two quarterbacks (Sam Bradford and Matt Schaub) were paid nearly $20M combined last season … and won zero games.

Bargains do exist. Andrew Luck, who won 11 games, ranked 19th among all quarterbacks in pay. Recently retired Bills QB Kyle Orton won seven games for Buffalo, but didn’t even crack the top 20 in salary.

Find a full list below.




PhantomFlow: UI Testing With Decision Trees (Huddle Team)


PhantomFlow is an experimental approach to UI Testing, based on Decision Trees, created by James Cryer and the Huddle development team.

Available on GitHub, as a NodeJS wrapper for PhantomJS, CasperJS and PhantomCSS, PhantomFlow enables a fluent way of describing user flows in code whilst generating structured tree data for visualization.

PhantomFlow Report: Test suite overview with radial Dendrogram and pie visualisation

The above visualisation is a real-world example, showing the complexity of visual testing at Huddle.


  • Enable a more expressive way of describing user interaction paths within tests
  • Fluently communicate UI complexity to stakeholders and team members through generated visualisations
  • Support TDD and BDD for web applications and responsive web sites
  • Provide a fast feedback loop for UI testing
  • Raise the profile of visual regression testing
  • Support visual regression workflows, quick inspection & rebasing via the UI.


See also

PhantomFlow also comes as a Grunt plugin: grunt-phantomflow

Try it!

  • node test/test.js – First run will create visual test baselines with PhantomCSS
  • node test/test.js – Second run will compare baseline visuals with the latest screenshots. This’ll pass because there have been no changes.
  • node test/test.js report – An optional step to load the Decision tree visualisation into your Web browser

Mac OS X users should be aware that PhantomJS doesn’t load the FontAwesome glyphs used in the test suite; I don’t understand why. I fixed this locally by downloading FontAwesome and double-clicking the .otf file to install the font.

There are two example test suites; these suites will be executed in parallel, so the command-line output is a bit muddled as a result.

The D3.js visualisation opens with a combined view which merges the test decision trees. Click on a branch label or use the dropdown to show a specific test. Hover over the nodes to see the PhantomCSS screenshots. If there is a visual test failure the node will glow red, hover and click the node to show the original, latest and generated diff screenshots.

Test Example

The demo describes a fictional Coffee machine application.

flow("Get a coffee", function(){
    step("Go to the kitchen", goToKitchen);
    step("Go to the coffee machine", goToMachine);
    decision({
        "Wants Latte": function(){
            chance({
                "There is no milk": function(){
                    step("Request Latte", requestLatte_fail);
                    decision({
                        "Give up": function(){
                            step("Walk away from the coffee machine", walkAway);
                        },
                        "Wants Espresso instead": wantsEspresso
                    });
                },
                "There is milk": function(){
                    step("Request Latte", requestLatte_success);
                }
            });
        },
        "Wants Cappuccino": function(){
            chance({
                "There is no milk": function(){
                    step("Request Cappuccino", requestCappuccino_fail);
                    decision({
                        "Request Espresso instead": wantsEspresso
                    });
                },
                "There is milk": function(){
                    step("Request Cappuccino", requestCappuccino_success);
                }
            });
        },
        "Wants Espresso": wantsEspresso
    });
});

And below is the visualisation generated by this simple feature test.

PhantomFlow Report: Feature test visualisation as tree Dendrogram

The visualisations

Deciding how to visualise this data is the hard part. It has to be readable and insightful. These visualisations are still evolving; it would be great to see contributions for better visualisations. Visit for inspiration.

PhantomFlow methods

  • flow (string, callback) : initialise a test suite with a name, and a function that contains Steps, Chances and Decisions
  • step (string, callback) : a discrete step, with a name and a callback that can contain a PhantomCSS screenshot as well as CasperJS events and asserts.
  • decision (object) : Defines a user decision. It takes an object with key-value pairs, where the key is the label for a particular decision and the value is the function to be executed. The function can contain further decisions, chances and steps.
  • chance (object) : The same as a decision but offers the semantic representation of a chance event, as opposed to a deliberate possible action by the user
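The tree of steps, decisions and chances described above effectively enumerates every distinct user path through the UI. As a minimal sketch of that idea (not PhantomFlow’s actual implementation — the data shapes and function names here are purely illustrative), a nested structure of steps and branches can be flattened into one list of step sequences per path:

```javascript
// Sketch only: flatten a step/decision tree into distinct user paths.
// Strings are steps, arrays are sequences, plain objects are branches
// (decision or chance) whose keys are labels and values are subtrees.
function enumeratePaths(node, prefix = []) {
  if (typeof node === "string") {
    // A leaf step completes one path.
    return [[...prefix, node]];
  }
  if (Array.isArray(node)) {
    // A sequence: each part extends every path accumulated so far.
    let paths = [prefix];
    for (const part of node) {
      paths = paths.flatMap((p) => enumeratePaths(part, p));
    }
    return paths;
  }
  // A branch: each labelled option forks a separate path.
  return Object.entries(node).flatMap(([label, subtree]) =>
    enumeratePaths(subtree, [...prefix, label])
  );
}

// A cut-down version of the coffee-machine example.
const coffee = [
  "Go to the kitchen",
  "Go to the coffee machine",
  {
    "Wants Latte": {
      "There is milk": "Request Latte",
      "There is no milk": "Walk away",
    },
    "Wants Espresso": "Request Espresso",
  },
];

console.log(enumeratePaths(coffee).length); // 3 distinct user paths
```

Each returned path is one end-to-end scenario, which is essentially what the dendrogram visualisation draws as a branch from root to leaf.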

NodeJS setup example

    var flow = require('../phantomflow').init({
        // debug: 2
        // createReport: true,
        // test: 'coffee'
    });

    // flow.report(); // Show report

    flow.run(function(code){
        process.exit(code); // callback is executed when PhantomFlow is complete
    });

NodeJS Methods

  • run (callback) : Runs all the tests. Takes a callback which is executed when complete
  • report () : Spins up a local connect server and loads a browser window with the visualisation generated on the last test run.


  • createReport (boolean) : Should a report/visualisation be built?
  • debug (number) : A value of 1 will output more logging, 2 will generate full page screenshots per test which can be found in the test-results folder. Forces tests onto one thread for readability.
  • earlyexit (boolean) : False by default, if set to true all tests will abort on the first failure
  • includes (string) : Defaults to ‘include’, it is the root directory of custom global includes (within the PhantomJS domain)
  • port (number) : Defaults to 9001, this is the port that will be used to show the report/visualisation
  • results (string) : Defaults to ‘test-results’, it is the root directory of the test results
  • remoteDebug (boolean) : Enable PhantomJS remote debugging
  • remoteDebugAutoStart (boolean) : Enable autostart for PhantomJS remote debugging
  • remoteDebugPort (number) : Defaults to 9000, the port on which Web Inspector will run
  • skipVisualTests (boolean) : If set to true the visual comparison step will not be run
  • test (string) : Test run filtering with a substring match
  • tests (string) : Defaults to ‘test’, it is the root directory of your tests
  • threads (number) : The number of processes across which your tests will be parallelised. Defaults to 4.


Test execution is parallelised for increased speed and a shorter test-to-dev feedback loop. By default your tests will be divided and run on up to 4 spawned processes. You can change the default number of threads to any number you think your machine can handle.
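The division itself can be sketched as a simple round-robin split of test files across worker slots (illustrative only — this is not PhantomFlow’s actual scheduler, and the file names are hypothetical):

```javascript
// Sketch: divide test files across up to `threads` worker processes,
// round-robin. Illustrative only - not PhantomFlow's real scheduler.
function divideTests(testFiles, threads) {
  const groups = Array.from({ length: threads }, () => []);
  testFiles.forEach((file, i) => groups[i % threads].push(file));
  return groups.filter((g) => g.length > 0); // don't spawn idle processes
}

const groups = divideTests(
  ["a.test.js", "b.test.js", "c.test.js", "d.test.js", "e.test.js"],
  4
);
console.log(groups.length); // 4 spawned processes
console.log(groups[0]); // ["a.test.js", "e.test.js"]
```

Each group would then be handed to its own spawned PhantomJS process, which is why the combined console output of parallel suites can look muddled.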


Debugging is often a painful part of writing tests with PhantomJS. If you’re experiencing trouble try the following.

  • Enable debug mode 1 to show more logging. This will also prevent parallelisation – better for readability, but slower.
    var flow = require('../phantomflow').init({
        debug: 1
    });
  • Enable debug mode 2 – the same as mode 1, but it will also generate full-page screenshots per step, to let you see what’s actually going on.
    var flow = require('../phantomflow').init({
        debug: 2
    });
  • PhantomJS provides remote debugging functionality. This setting allows you to use the debugger; statement and add breakpoints with the Web Inspector interface. Remote debugging can be used in conjunction with the debug modes described above.
    var flow = require('../phantomflow').init({
        remoteDebug: true
        // remoteDebugAutoStart: false
        // remoteDebugPort: 9000
    });

Rebasing visual tests

Rebasing is the process of deleting an original visual test, and creating a new baseline, either by renaming the diff image, or running the test suite again to create a new image. The PhantomFlow UI provides a quick way to find and inspect differences, with a ‘rebase’ button to accept the latest image as the baseline.

PhantomFlow UI: Rebase button

What next?

James and the Huddle Team have been using this testing style for many months on Huddle’s biggest UI application. It’s still an evolving idea, but for the team actively working on it, it’s making a huge difference to the way they think about UI and how they communicate about UI. It supports TDD well; they use it for ‘unit’ testing UI, but it has great potential for end-to-end testing as well. James would also like to do more work on the visualisations; they look great and communicate well, but he feels they could be better. Of course, this is an Open Source project and it would be great to see contributions.

Source: Created by James Cryer and the Huddle development team.

Jacob Gube: 6 Ways to Increase the Visual Weight of Something


While perusing Zite, I came across this blog post on the Design Instruct web site by Jacob Gube. Jacob is the co-founder and a managing editor of Design Instruct. He’s a web developer, and also the owner of Six Revisions. Follow Jacob on Twitter: @sixrevisions.

Best Regards,


6 Ways to Increase the Visual Weight of Something

In a design composition, the visual weight of an object refers to how well it draws attention to itself compared to other components of the composition. The “heavier” the object is, the more eye-grabbing it is.

When creating a design, it’s a good idea to prioritize key elements in the visual space by giving them heavier visual weights. For example, things you might consider giving heavier visual weights to — so that they’re more easily seen by the viewer — are call-to-action buttons in a web design, or the subject of a photograph.

I’ll talk about a few tricks for increasing the visual weight of an object.

1. Give It a Different Color

The higher the color contrast between an object and its surroundings (including its background), the better able it is to garner our attention.

In the example above, notice how, even though the size, shape and margins of the stars are identical, the red star is able to get your attention simply because of how distinctive its color is compared to other elements in the composition.

2. Move It Away from Other Objects

One easy trick for increasing the visual weight of an object is distancing it from other objects. Adding plenty of negative space around the object separates it from other objects, which in turn makes the object stand out.

In the example above, look at how our eyes interpret the composition as two groups of rabbits: A big group of 12 rabbits and a small group consisting of only one rabbit. By being farther away from the others, the estranged rabbit is able to command our attention more than any other rabbit in the composition.

3. Make It Look Different

When things look alike, it’s naturally hard for us to differentiate them. So, quite simply, we can make the visual weight of an object heavier by making it look different from other objects.

Even a slight change in the style properties of an object can heavily influence its visual weight if objects in the composition look similar. In the above example, notice how the circle at the center of the first row is able to get our eyes’ attention compared to the other circles.

4. Point to It

A simple trick for increasing the visual weight of something is to direct the viewer’s eyes to it using visual cues such as arrows.

In the above example, check out how the visual weight of the house is increased because it’s surrounded by arrows that point to its location. No matter where our attention goes, we’re redirected to look at the house because of the arrows.

5. Make It Look Visually Complex

An ornate object attracts our eyes more when it’s set among simple and unadorned objects. We can make the appearance of an object complex by giving it textures, drop shadows, changing its shape, adding more color to it, and so forth.

In the example above, the multi-colored circle has the heaviest visual weight because the surrounding objects are styled plainly.

6. Make It Bigger

Making an object larger than the other objects around it will increase its visual weight. It’s a reasonable proposition: The more visual space an object takes up, the more visible it is.

In the example above, notice how our eyes are quickly drawn to the biggest heart. The only thing different about it is its size.

Visual weight is a simple but incredibly powerful design tool for strategically arranging elements so that more important elements are readily seen by our viewers.

What tricks do you use to increase the visual weight of an object? Share your advice in the comments.

DataViz as Music: Pianogram

Pianogram - Flight of the Bumble Bee

When I played mallet percussion back in high school, I played Flight of the Bumble Bee on our xylophone at a regional competition and earned a second-place finish. It was a tricky piece of music to play and took a lot of hand-key coordination. The data visualization above is Flight of the Bumble Bee as it looks when you cross a histogram with piano keys to show the note distribution of a song. It’s the Pianogram by JoeyCloud.Net. View examples such as Alla Turca or the classic Chopsticks, or punch in your own MIDI-formatted song for a taste of the distribution ivories.

Here’s the distribution for everyone’s favorite, Chopsticks.

Pianogram - Chopsticks
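The underlying idea — a histogram over piano keys — is straightforward to sketch. Assuming a song has been reduced to an array of MIDI note numbers (60 = middle C), the per-key counts might be computed like this (illustrative only; this is not Pianogram’s actual code):

```javascript
// Sketch of a "pianogram": count how often each MIDI note number occurs.
// Illustrative only - not the actual code behind JoeyCloud.Net's Pianogram.
function noteHistogram(midiNotes) {
  const counts = new Map();
  for (const note of midiNotes) {
    counts.set(note, (counts.get(note) || 0) + 1);
  }
  return counts;
}

// A few notes of a melody: middle C (60), D (62), E (64), C again.
const counts = noteHistogram([60, 62, 64, 60]);
console.log(counts.get(60)); // 2 - middle C was played twice
```

Drawing each count as a bar rising from its corresponding key is what turns this plain histogram into the piano-shaped chart shown above.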

Data Visualization – A Scientific Treatment (Peter James Thomas)


Peter James Thomas sent me a link to his blog about the scientific treatment of data visualization. Mr. Thomas (photo, right) has extensive management experience in the insurance, reinsurance, software development, manufacturing and retail sectors, with particular focus on forming a deep appreciation of business / customer needs; developing pragmatic strategies to address these; having a passion for high-quality execution; and understanding the key role of education in enacting cultural and organisational change. While Mr. Thomas has predominantly been a General IT or IT Development Manager in most of his roles, his specialties include Business Intelligence / Data Warehousing / Analytics (the main subjects covered in his blog), Financial Systems / ERP, IT Strategy Formation, IT / Business Alignment and Customer Relationship Management systems.

Peter is currently Head of Group Business Intelligence for Validus Holdings, a leading insurance and reinsurance organisation with a global presence. For a considerable portion of his time in this role, he was also Head of IT Development at Validus’s Talbot Underwriting subsidiary.

I am including Mr. Thomas’ blog post in its entirety below. I am also including a link to his blog here.

Best regards,


Data Visualisation – A Scientific Treatment

by Peter James Thomas, November 6, 2014

Introduction

Diagram of the Causes of Mortality in the Army of the East (click to view a larger version in a new tab)

The above diagram was compiled by Florence Nightingale, who was – according to The Font – “a celebrated English social reformer and statistician, and the founder of modern nursing”. It is gratifying to see her less high-profile role as a number-cruncher acknowledged up-front and central; particularly as she died in 1910, eight years before women in the UK were first allowed to vote and eighteen before universal suffrage.

This diagram is one of two which are generally cited in any article on Data Visualisation. The other is Charles Minard’s exhibit detailing the advance on, and retreat from, Moscow of Napoleon Bonaparte’s Grande Armée in 1812 (Data Visualisation had a military genesis in common with – amongst many other things – the internet). I’ll leave the reader to look at this second famous diagram if they want to; it’s just a click away.

While there are more elements of numeric information in Minard’s work (what we would now call measures), there is a differentiating point to be made about Nightingale’s diagram. This is that it was specifically produced to aid members of the British parliament in their understanding of conditions during the Crimean War (1853–56); particularly given that such non-specialists had struggled to understand traditional (and technical) statistical reports. Again, rather remarkably, we have here a scenario where the great and the good were listening to the opinions of someone who was barred from voting on the basis of lacking a Y chromosome.

Perhaps more pertinently to this blog, this scenario relates to one of the objectives of modern-day Data Visualisation in business; namely explaining complex issues, which don’t leap off of a page of figures, to busy decision makers, some of whom may not be experts in the specific subject area (another is of course allowing the expert to discern less than obvious patterns in large or complex sets of data). Fortunately most business decision makers don’t have to grapple with the progression in number of “deaths from Preventible or Mitigable Zymotic diseases” versus “deaths from wounds” over time, but the point remains.

Data Visualisation in one branch of Science

von Laue, Bragg Senior & Junior, Crowfoot Hodgkin, Kendrew, Perutz, Crick, Franklin, Watson & Wilkins

Coming much more up to date, I wanted to consider a modern example of Data Visualisation. As with Nightingale’s work, this is not business-focused, but contains some elements which should be pertinent to the professional considering the creation of diagrams in a business context. The specific area I will now consider is Structural Biology. For the incognoscenti (no advert for IBM intended!), this area of science is focussed on determining the three-dimensional shape of biologically relevant macro-molecules, most frequently proteins or protein complexes. The history of Structural Biology is intertwined with the development of X-ray crystallography by Max von Laue and father and son team William Henry and William Lawrence Bragg; its subsequent application to organic molecules by a host of pioneers including Dorothy Crowfoot Hodgkin, John Kendrew and Max Perutz; and – of greatest resonance to the general population – Francis Crick, Rosalind Franklin, James Watson and Maurice Wilkins’s joint determination of the structure of DNA in 1953.


X-ray diffraction image of the double helix structure of the DNA molecule, taken 1952 by Raymond Gosling, commonly referred to as “Photo 51”, during work by Rosalind Franklin on the structure of DNA

While the masses of data gathered in modern X-ray crystallography needs computer software to extrapolate them to physical structures, things were more accessible in 1953. Indeed, it could be argued that Gosling and Franklin’s famous image, its characteristic “X” suggestive of two helices and thus driving Crick and Watson’s model building, is another notable example of Data Visualisation; at least in the sense of a picture (rather than numbers) suggesting some underlying truth. In this case, the production of Photo 51 led directly to the creation of the even more iconic image below (which was drawn by Francis Crick’s wife Odile and appeared in his and Watson’s seminal Nature paper[1]):

Odile and Francis Crick - structure of DNA

© Nature (1953)
Posted on this site under the non-commercial clause of the right-holder’s licence

It is probably fair to say that the visualisation of data which is displayed above has had something of an impact on humankind in the sixty years since it was first drawn.

Modern Structural Biology

The X-ray Free Electron Laser at Stanford

Today, X-ray crystallography is one of many tools available to the structural biologist with other approaches including Nuclear Magnetic Resonance Spectroscopy, Electron Microscopy and a range of biophysical techniques which I will not detain the reader by listing. The cutting edge is probably represented by the X-ray Free Electron Laser, a device originally created by repurposing the linear accelerators of the previous generation’s particle physicists. In general Structural Biology has historically sat at an intersection of Physics and Biology.

However, before trips to synchrotrons can be planned, the Structural Biologist often faces the prospect of stabilising their protein of interest, ensuring that they can generate sufficient quantities of it, successfully isolating the protein and finally generating crystals of appropriate quality. This process often consumes years, in some cases decades. As with most forms of human endeavour, there are few short-cuts and the outcome is at least loosely correlated to the amount of time and effort applied (though sadly with no guarantee that hard work will always be rewarded).

From the general to the specific

The Journal of Molecular Biology (October 2014)

At this point I should declare a personal interest: the example of Data Visualisation which I am going to consider is taken from a paper recently accepted by the Journal of Molecular Biology (JMB), of which my wife is the first author[2]. Before looking at this exhibit, it’s worth a brief detour to provide some context.

In recent decades, the exponential growth in the breadth and depth of scientific knowledge (plus of course the velocity with which this can be disseminated), coupled with the increase in the range and complexity of techniques and equipment employed, has led to the emergence of specialists. In turn this means that, in a manner analogous to the early production lines, science has become a very collaborative activity; the expert in stage one hands over the fruits of their labour to the expert in stage two and so on. For this reason the typical scientific paper (and certainly those in Structural Biology) will have several authors, often spread across multiple laboratory groups and frequently in different countries. By way of example, the previous paper my wife worked on had 16 authors (including a Nobel Laureate[3]). In this context, the fact that the paper I will now reference was authored by just my wife and her group leader is noteworthy.

The reader may at this point be relieved to learn that I am not going to endeavour to explain the subject matter of my wife’s paper, nor the general area of biology to which it pertains (the interested are recommended to Google “membrane proteins” or “G Protein Coupled Receptors” as a starting point). Instead let’s take a look at one of the exhibits.


© The Journal of Molecular Biology (2014)
Posted on this site under a Creative Commons licence

The above diagram (in common with Nightingale’s much earlier one) attempts to show a connection between sets of data, rather than just the data itself. I’ll elide the scientific specifics here and focus on more general issues.

First, the grey upper section with the darker blots on it – which is labelled (a) – is an image of a biological assay called a Western Blot (for the interested, details can be viewed here); each vertical column (labelled at the top of the diagram) represents a sub-experiment on protein drawn from a specific sample of cells. The vertical position of a blot indicates the size of the molecules found within it (in kilodaltons); the intensity of a given blot indicates how much of the substance is present. Aside from the headings and labels, the upper part of the figure is a photographic image and so essentially analogue data[4]. So, in summary, this upper section represents the findings from one set of experiments.

At the bottom – and labelled (b) – appears an artefact familiar to anyone in business, a bar-graph. This presents results from a parallel experiment on samples of protein from the same cells (for the interested, this set of data relates to the degree to which proteins in the samples bind to a specific radiolabelled ligand). The second set of data is taken from what I might refer to as a “counting machine” and is thus essentially digital. To be 100% clear, the bar chart is not a representation of the data in the upper part of the diagram; it pertains to results from a second experiment on the same samples. As indicated by the labelling, for a given sample, the column in the bar chart (b) is aligned with the column in the Western Blot above (a), connecting the two different sets of results.

Taken together the upper and lower sections[5] establish a relationship between the two sets of data. Again I’ll skip the specifics, but the general point is that while the Western Blot (a) and the binding assay (b) tell us the same story, the Western Blot is a much more straightforward and speedy procedure. The relationship that the paper establishes means that the Western Blot alone can be used to perform a simple new assay which will save significant time and effort for people engaged in the determination of the structures of membrane proteins; a valuable new insight. Clearly the relationships that have been inferred could equally have been presented in tabular form and be just as relevant. It is however testament to the more atavistic side of humans that – in common with many relationships between data – a picture conveys it more surely and (to mix a metaphor) more viscerally. This is the essence of Data Visualisation.

What learnings can Scientific Data Visualisation provide to Business?

Scientific presentation (c/o Nature, but looks a lot like PhD Comics IMO)

Using the JMB exhibit above, I now want to make some more general observations and consider a few questions which arise from comparing scientific and business approaches to Data Visualisation. I think that many of these points are pertinent to analysis in general.

Normalisation
Broadly, normalisation[6] consists of defining results in relation to some established yardstick (or set of yardsticks); displaying relative, as opposed to absolute, numbers. In the JMB exhibit above, the amount of protein solubilised in various detergents is shown with reference to the un-solubilised amount found in native membranes; these reference figures appear as 100% columns to the right and left extremes of the diagram.

The most common usage of normalisation in business is growth percentages. Here the fact that London business has grown by 5% can be compared to Copenhagen having grown by 10%, despite total London business being 20 times the volume of Copenhagen’s. A related business example, depending on implementation details, could be comparing foreign currency amounts at a fixed exchange rate to remove the impact of currency fluctuation.

Normalised figures are very typical in science but, aside from the growth example mentioned above, considerably less prevalent in business. In both avenues of human endeavour the approach should be used with caution; something that increases 200% from a very small starting point may not be relevant, be that the result of an experiment or weekly sales figures. Bearing this in mind, normalisation is often essential when looking to present data of different orders of magnitude on the same graph[7]; the alternative often being that smaller data is swamped by larger, which is not always desirable.
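The London / Copenhagen comparison above can be sketched in a few lines of Python. The outlet names and sales volumes are invented purely for illustration; the point is that the normalised (percentage-growth) view makes two businesses of very different absolute size directly comparable:

```python
# Hypothetical quarterly sales volumes; the figures are invented purely
# to illustrate normalisation via the London / Copenhagen example above.
sales = {
    "London":     {"prior": 100_000, "current": 105_000},
    "Copenhagen": {"prior": 5_000, "current": 5_500},
}

def growth_pct(prior, current):
    """Normalise the change against the prior-period figure."""
    return 100.0 * (current - prior) / prior

for city, figures in sales.items():
    print(f"{city}: {growth_pct(figures['prior'], figures['current']):.0f}% growth")
```

Although London’s absolute volume is 20 times Copenhagen’s, the normalised view shows Copenhagen growing twice as fast; the caution about small starting points applies here just as in the text above.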

Controls
I’ll use an anecdote to illustrate this area from a business perspective. Imagine an organisation which (as you would expect) tracks the volume of sales of a product or service it provides via a number of outlets. Imagine further that it launches some sort of promotion, perhaps valid only for a week, and notices an uptick in these sales. It is extremely tempting to state that the promotion has resulted in increased sales[8].

However this cannot always be stated with certainty. Sales may have increased for some totally unrelated reason such as (depending on what is being sold) good or bad weather, a competitor increasing prices or closing one or more of their comparable outlets, and so on. Equally perniciously, the promotion may have simply moved sales in time – people who were going to buy the organisation’s product or service in the weeks following a promotion may have brought the expenditure forward to take advantage of it. If this is indeed the case, an uptick in sales may well be due to the impact of a promotion, but will be offset by a subsequent decrease.

In science, it is this type of problem that the concept of control tests is designed to combat. As well as testing a result in the presence of substance or condition X, a well-designed scientific experiment will also be carried out in the absence of substance or condition X, the latter being the control. In the JMB exhibit above, the controls appear in the columns with white labels.

There are ways to make the business “experiment” I refer to above more scientific of course. In retail business, the current focus on loyalty cards can help, assuming that these can be associated with the relevant transactions. If the business is on-line then historical records of purchasing behaviour can be similarly referenced. In the above example, the organisation could decide to offer the promotion at only a subset of its outlets, allowing a comparison to those where no promotion applied. This approach may improve rigour somewhat, but of course it does not cater for purchases transferred from a non-promotion outlet to a promotion one (unless a whole raft of assumptions is made). There are entire industries devoted to helping businesses deal with these rather messy scenarios, but it is probably fair to say that it is normally easier to devise and carry out control tests in science.

The general take away here is that a graph which shows some change in a business output (say sales or profit) correlated to some change in a business input (e.g. a promotion, a new product launch, or a price cut) would carry a lot more weight if it also provided some measure of what would have happened without the change in input (not that this is always easy to measure).
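The outlet-subset “experiment” described above can be sketched numerically. A minimal Python example, with entirely invented weekly sales figures, comparing the uplift at promotion outlets against a control group that received no promotion:

```python
# Hypothetical weekly sales at outlets running a promotion and at a
# control group that is not; all figures are invented for illustration.
promo_outlets = {"before": [200, 180, 220], "during": [260, 230, 270]}
control_outlets = {"before": [190, 210, 205], "during": [195, 215, 200]}

def mean(values):
    return sum(values) / len(values)

def uplift_pct(group):
    """Percentage change in average weekly sales, before vs during."""
    before, during = mean(group["before"]), mean(group["during"])
    return 100.0 * (during - before) / before

# The control group estimates what would have happened anyway; only the
# difference between the two uplifts is attributable to the promotion.
attributable = uplift_pct(promo_outlets) - uplift_pct(control_outlets)
print(f"attributable uplift: {attributable:.1f} percentage points")
```

As the text notes, this is a rough-and-ready control at best: it does not capture purchases transferred between outlets, or sales merely pulled forward in time.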

Rigour and Scrutiny

I mention in the footnotes that the JMB paper in question includes versions of the exhibit presented above for four other membrane proteins, this being in order to firmly establish a connection. Looking at just the figure I have included here, each element of the data presented in the lower bar-graph area is based on duplicated or triplicated tests, with average results (and error bars – see the next section) being shown. When you consider that upwards of three months’ preparatory work could have gone into any of these elements and that a mistake at any stage during this time would have rendered the work useless, some impression of the level of rigour involved emerges. The result of this assiduous work is that the authors can be confident that the exhibits they have developed are accurate and will stand up to external scrutiny. Of course such external scrutiny is a key part of the scientific process and the manuscript of the paper was reviewed extensively by independent experts before being accepted for publication.

In the business world, such external scrutiny tends to apply most frequently to publicly published figures (such as audited Financial Accounts); of course external financial analysts will also look to dig into figures. There may be some internal scrutiny around both the additional numbers used to run the business and the graphical representations of these (and indeed some companies take this area very seriously), but not every internal KPI is vetted the way that the report and accounts are. Particularly in the area of Data Visualisation, there is a tension here. Graphical exhibits can have a lot of impact if they relate to the current situation or present trends; contrariwise, if they are substantially out-of-date, people may question their relevance. There is sometimes the expectation that a dashboard is just like its aeronautical counterpart, showing real-time information about what is going on now[9]. However a lot of the value of Data Visualisation is not about the here and now so much as trends and explanations of the factors behind the here and now. A well-thought-out graph can tell a very powerful story, more powerful for most people than a table of figures. However a striking graph based on poor quality data, data which has been combined in the wrong way, or even – as sometimes happens – the wrong datasets entirely, can tell a very misleading story and lead to the wrong decisions being taken.

I am not for a moment suggesting here that every exhibit produced using Data Visualisation tools must be subject to months of scrutiny. As referenced above, in the hands of an expert such tools have the value of sometimes quickly uncovering hidden themes or factors. However, I would argue that – as in science – if the analyst involved finds something truly striking, an association which he or she feels will really resonate with senior business people, then double- or even triple-checking the data would be advisable. Asking a colleague to run their eye over the findings and to then probe for any obvious mistakes or weaknesses sounds like an appropriate next step. Internal Data Visualisations are never going to be subject to peer-review, however their value in taking sound business decisions will be increased substantially if their production reflects at least some of the rigour and scrutiny which are staples of the scientific method.

Dealing with Uncertainty

In the previous section I referred to the error bars appearing on the JMB figure above. Error bars are acknowledgements that what is being represented is variable and they indicate the extent of such variability. When dealing with a physical system (be that mechanical or – as in the case above – biological), behaviour is subject to many factors, not all of which can be eliminated or adjusted for and not all of which are predictable. This means that repeating an experiment under ostensibly identical conditions can lead to different results[10]. If the experiment is well-designed and if the experimenter is diligent, then such variability is minimised, but never eliminated. Error bars are a recognition of this fundamental aspect of the universe as we understand it.
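The quantities behind an error bar are straightforward to compute from replicate measurements. A minimal Python sketch, using invented triplicate readings (not data from the paper), of the mean and the standard error of the mean, one common choice for error-bar length:

```python
import math

# Hypothetical triplicate readings from repeating one assay under
# ostensibly identical conditions; the values are invented.
replicates = [0.82, 0.79, 0.88]

n = len(replicates)
mean = sum(replicates) / n

# Sample standard deviation (n - 1 denominator) and the standard error
# of the mean, often used as the length of the error bar on a chart.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in replicates) / (n - 1))
std_err = std_dev / math.sqrt(n)

print(f"plot bar at {mean:.3f} with error bar of ±{std_err:.3f}")
```

Other conventions exist (standard deviation, confidence intervals); whichever is used, a well-captioned figure should state it.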

While de rigueur in science, error bars seldom make an appearance in business, even – in my experience – in estimates of business measures which emerge from statistical analyses[11]. Even outside the realm of statistically generated figures, more business measures are subject to uncertainty than might initially be thought. An example here might be a comparison (perhaps as part of the externally scrutinised report and accounts) of the current quarter’s sales to the previous one (or the same one last year). In companies where sales may be tied to – for example – the number of outlets, care is taken to make these figures like-for-like. This might include only showing numbers for outlets which were in operation in the prior period and remain in operation now (i.e. excluding sales from both closed and newly opened outlets). However, outside the area of high-volume low-value sales where the Law of Large Numbers[12] rules, other factors could substantially skew a given quarter’s results for many organisations. Something as simple as a key customer delaying a purchase (so that it fell in Q3 this year instead of Q2 last) could have a large impact on quarterly comparisons. Again companies will sometimes look to include adjustments to cater for such timing or related issues, but this cannot be a precise process.
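The like-for-like adjustment just described can be sketched as follows; the outlet names and figures are hypothetical, invented purely for illustration:

```python
# Hypothetical quarterly sales by outlet; outlet C closed after the
# prior quarter and outlet D opened during the current one.
q_prior = {"A": 120, "B": 95, "C": 80}
q_current = {"A": 125, "B": 95, "D": 60}

# Like-for-like: compare only outlets trading in both quarters.
common = sorted(set(q_prior) & set(q_current))
prior_total = sum(q_prior[o] for o in common)
current_total = sum(q_current[o] for o in common)

growth = 100.0 * (current_total - prior_total) / prior_total
print(f"like-for-like outlets {common}: {growth:.1f}% growth")
```

Note that a naive comparison of the raw totals (295 vs 280) would show a decline, while the like-for-like view shows modest growth; which story is “true” depends on the question being asked.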

The main point I am making here is that many aspects of the information produced in companies are uncertain. The cash transactions in a quarter are of course the cash transactions in a quarter, but the above scenario suggests that they may not always 100% reflect actual business conditions (and you cannot adjust for everything). Equally, when you get into figures that would be part of most companies’ financial results – outstanding receivables and allowances for bad debts – the spectre of uncertainty arises again without a statistical model in sight. In many industries, regulators are pushing for companies to include more forward-looking estimates of future assets and liabilities in their Financials. While this may be a sensible reaction to recent economic crises, the approach inevitably leads to more figures being produced from models. Even when these models are subject to external review, as is the case with most regulatory-focussed ones, they are still models and there will be uncertainty around the numbers that they generate. While companies will often provide a range of estimates for things like guidance on future earnings per share, providing a range of estimates for historical financial exhibits is not really a mainstream activity.

Which perhaps gets me back to the subject of error bars on graphs. In general I think that their presence in Data Visualisations can only add value, not subtract it. In my article entitled Limitations of Business Intelligence I include the following passage which contains an exhibit showing how the Bank of England approaches communicating the uncertainty inevitably associated with its inflation estimates:

Business Intelligence is not a crystal ball, Predictive Analytics is not a crystal ball either. They are extremely useful tools […] but they are not universal panaceas.

The Old Lady of Threadneedle Street is clearly not a witch[…] Statistical models will never give you precise answers to what will happen in the future – a range of outcomes, together with probabilities associated with each is the best you can hope for (see above). Predictive Analytics will not make you prescient, instead it can provide you with useful guidance, so long as you remember it is a prediction, not fact.

While I can’t see them figuring in formal financial statements any time soon, perhaps there is a case for more business Data Visualisations to include error bars.

In Summary

So, as is often the case, I have embarked on a journey. I started with an early example of Data Visualisation, diverted into a particular branch of science with which I have some familiarity and hopefully returned to make some points which I think are pertinent to both the Business Intelligence practitioner and the consumers (and indeed commissioners) of Data Visualisations. Back in “All that glisters is not gold” – some thoughts on dashboards I made some more general comments about the best Data Visualisations having strong informational foundations underpinning them. While this observation remains true, I do see a lot of value in numerically able and intellectually curious people using Data Visualisation tools to quickly make connections which had not been made before and to tease out patterns from large data sets. In addition there can be great value in using Data Visualisation to present more quotidian information in a more easily digestible manner. However I also think that some of the learnings from science which I have presented in this article suggest that – as with all powerful tools – appropriate discretion on the part of the people generating Data Visualisation exhibits and on the part of the people consuming such content would be prudent. In particular the business equivalents of establishing controls, applying suitable rigour to data generation and combination, and including information about uncertainty on exhibits where appropriate are all things which can help make Data Visualisation more honest and thus – at least in my opinion – more valuable.


[1] Watson, J.D. and Crick, F.H.C. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738.

[2] Thomas, J.A. and Tate, C.G. (2014). Quality Control in Eukaryotic Membrane Protein Overproduction. J. Mol. Biol. [Epub ahead of print].

[3] The list of scientists involved in the development of X-ray Crystallography and Structural Biology which was presented earlier in the text encompasses a further nine such laureates (four of whom worked at my wife’s current research institute), though sadly this number does not include Rosalind Franklin. Over 20 Nobel Prizes have been awarded to people working in the field of Structural Biology; you can view an interactive timeline of these here.

[4] The intensity, size and position of blots are often digitised by specialist software, but this is an aside for our purposes.

[5] Plus four other analogous exhibits which appear in the paper and relate to different proteins.

[6] Normalisation has a precise mathematical meaning – actually (somewhat ironically for that most precise of activities) more than one. Here I am using the term more loosely.

[7] That’s assuming you don’t want to get into log scales, something I have only come across once in over 25 years in business.

[8] The uptick could be as compared to the week before, or to some other week (e.g. the same one last year or last month), or versus an annual weekly average. The change is what is important here, not what the change is with respect to.

[9] Of course some element of real-time information is indeed both feasible and desirable; for more analytic work (which encompasses many aspects of Data Visualisation) what is normally more important is sufficient historical data of good enough quality.

[10] Anyone interested in some of the reasons for this is directed to my earlier article Patterns patterns everywhere.

[11] See my series of three articles on Using historical data to justify BI investments for just one example of these.

[12] But then 1=2 for very large values of 1.


Information Is Beautiful Awards 2014 Announced



Last Wednesday, November 12th, 2014, the third annual Information is Beautiful Awards celebrated data visualization at its best. Hundreds of entries were trimmed to an elite set of outstandingly illuminating infographics, over which the judges deliberated long and hard. Now, with thanks to their generous sponsors Kantar, here are the winners.

Data Visualization


Gold – Rappers, Sorted by Size of Vocabulary by Matthew Daniels

Silver – Weather Radials Poster by Timm Kekeritz

Bronze – The Depth of the Problem by The Washington Post

Special mention – The Analytical Tourism Map of Piedmont by Marco Bernardi, Federica Fragapane and Francesco Majno




Gold – Creative Routines by RJ Andrews

Silver – Game of Thrones Decoded by Heather Jones

Bronze – The Graphic Continuum by Jonathan Schwabish and Severino Ribecca




Gold – The Refugee Project by Hyperakt and Ekene Ijeoma

Silver – How Americans Die by Bloomberg Visual Data

Joint Bronze – Commonwealth War Dead: First World War Visualised by James Offer

Joint Bronze – World Food Clock by Luke Twyman


Motion infographic


Gold – NYC Taxis: A Day in the Life by Chris Whong

Silver – Beyond Bytes by Maral Pourkazemi

Bronze – Everything You Need to Know about Planet Earth by Kurzgesagt

Special mention – Energy by Adam Nieman




Gold – Selfiecity by Moritz Stefaner

Silver – OECD Regional Well-Being by Moritz Stefaner

Bronze – After Babylon by Sofia Girelli, Eleonora Grotto, Pietro Lodi, Daniele Lupatini and Emilio Patuzzo




Gold – RAW by Density Design Research Lab

Silver – Kennedy by Brendan Dawes

Bronze – Figure it Out by Friedrich Riha



Sam Slover, Wrap Genius




Brendan Dawes, Kennedy





FFunction, Women in Science and HP What Matters




Schwandt Infographics, Biobased Economy



The Rite of Spring by Stephen Malinowski and Jay Bacal



Most Beautiful


RAW by Density Design Research Lab


