From OpenHatch wiki
Revision as of 00:09, 27 July 2012 by Jesstess


Learn how to plot data with the matplotlib plotting library. Ditch Excel forever!


  • practice reading data from a file
  • practice using the matplotlib Python plotting library to analyze data and generate graphs

Project setup

Mac OS X users only

If you do not already have a C compiler installed, you'll need one to install matplotlib. You have several options depending on your situation:

  1. Download and install Xcode (1.5 GB) from
  2. Download and install Command Line Tools for Xcode (175 MB) from This requires an Apple Developer account (free, but you have to sign up).
  3. Download and install kennethreitz's gcc installer (requires 10.6 or 10.7) from

Please wave over a staff member and we'll help you pick which option is best for your computer.

Install the project dependencies

Please follow the official matplotlib installation instructions at

The dependencies vary across operating systems. summarizes what you'll need for your operating system.

A universal dependency is the NumPy scientific computing library. NumPy has download and installation instructions at

Installing matplotlib and its dependencies is somewhat involved; please ask for help if you get stuck or don't know where to start!

Download and un-archive the Jeopardy database project skeleton code

Un-archiving will produce a JeopardyDatabase folder containing three Python files and one SQL database dump.

Create a SQLite database from the database dump

Inside JeopardyDatabase is a file called jeopardy.dump, which contains a SQL database dump. We need to turn that database dump into a SQLite database.

Once you have SQLite installed, you can create a database from jeopardy.dump with:

sqlite3 jeopardy.db < jeopardy.dump

This creates a SQLite database called jeopardy.db.
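If you would rather do this step from Python, the standard library's sqlite3 module can load the dump too. The sketch below assumes jeopardy.dump is a plain-text file of SQL statements (which is what SQLite's .dump command produces). load_dump is a hypothetical helper name, not one of the project's files.

```python
import sqlite3


def load_dump(dump_path, db_path):
    """Create (or add to) the SQLite database at db_path from a SQL dump file."""
    with open(dump_path, encoding="utf-8") as dump_file:
        sql = dump_file.read()
    conn = sqlite3.connect(db_path)  # creates db_path if it does not exist yet
    try:
        # executescript runs every statement in the dump, much like
        # `sqlite3 jeopardy.db < jeopardy.dump` does on the command line.
        conn.executescript(sql)
        conn.commit()
    finally:
        conn.close()
```

Calling load_dump("jeopardy.dump", "jeopardy.db") from inside the JeopardyDatabase folder should give you the same jeopardy.db as the command above. As with the command-line version, running it a second time against the same database will likely fail, because the dump's CREATE TABLE statements collide with tables that already exist.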

Test your setup

At a command prompt, start sqlite3 using the jeopardy.db database by running:

sqlite3 jeopardy.db

That should start a sqlite prompt that looks like this:

SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

At that sqlite prompt, type .tables and hit enter. That should display a list of the tables in this database:

sqlite> .tables
category  clue    
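The same check can be scripted with Python's sqlite3 module. .tables is a feature of the sqlite3 command-line shell rather than SQL, so this sketch queries SQLite's built-in sqlite_master table instead and skips SQLite's internal sqlite_* tables, roughly the way .tables does. list_tables is a hypothetical helper name.

```python
import sqlite3


def list_tables(db_path):
    """Return the table and view names in an SQLite database, like `.tables`."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple such as ('category',); unpack it to a plain string.
    return [name for (name,) in rows]
```

Run against jeopardy.db, list_tables("jeopardy.db") should return the same two names shown above. Keep in mind that sqlite3.connect quietly creates an empty database if the file doesn't exist, so an empty list usually means you are in the wrong directory.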

From a command prompt, navigate to the JeopardyDatabase directory and run


You should see a list of 10 Jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.

Project steps

1. Create a basic plot

  1. Run python This will pop up a window with a dot plot of some data.
  2. Open Read through the code in this file. The meat of the file is in one line:
    pyplot.plot([0, 2, 4, 8, 16, 32], "o")

    In this example, the first argument to pyplot.plot is the list of y values, and the second argument is a format string describing how to plot the data: "o" draws each point as a circle marker, with no connecting lines. If two lists had been supplied, pyplot.plot would treat the first list as the x values and the second list as the y values.

  3. Change the plot to display lines between the data points by changing
    pyplot.plot([0, 2, 4, 8, 16, 32], "o")

    to

    pyplot.plot([0, 2, 4, 8, 16, 32], "o-")
  4. Add x-values to the data by changing pyplot.plot([0, 2, 4, 8, 16, 32], "o-") to
    x_values = [0, 4, 7, 20, 22, 25]
    y_values = [0, 2, 4, 8, 16, 32]
    pyplot.plot(x_values, y_values, "o-")

    Note how matplotlib automatically resizes the graph to fit all of the points in the figure for you.

  5. Read about how to generate random integers on Then, instead of hard-coding the x and y values, generate a list of random y values. An example plot using random y values might look like this:
    [Image: Basic plot.png]
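One possible approach to step 5, as a sketch: it uses random.randint from Python's standard random module. The number of points (10) and the range of values (0 to 100) are arbitrary choices for illustration.

```python
import random

NUM_POINTS = 10  # how many points to generate (arbitrary)
MAX_Y = 100      # largest allowed y value (arbitrary)

# random.randint(a, b) returns a random integer N with a <= N <= b,
# so both endpoints can appear.
y_values = [random.randint(0, MAX_Y) for _ in range(NUM_POINTS)]
print(y_values)
```

Passing y_values to pyplot.plot(y_values, "o-") in place of the hard-coded list gives you a different plot each time you run the script. If you want the same "random" data on every run (handy while debugging), call random.seed with a fixed number first.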

Read these short documents:

Check your understanding:

  • What does matplotlib pick as the x values if you don't supply them yourself?
  • What options would you pass to pyplot.plot to generate a plot with red triangles and dotted lines?

2. Plotting the world population over time

  1. Run python This will pop up a window with a dot plot of the world population over the last 10,000 years.
  2. Open Read through the code in this file. In this example, we read our data from a file. Open the data file world_population.txt and examine the format of the file.
  3. Find the documentation on for customizing the linewidth of plots. Then change the world population plot to use a magenta, down-triangle marker and a linewidth of 2.

World population resources:

Check your understanding:

  • In, what does file("world_population.txt", "r").readlines() return?
  • In, what does point.split() return?

3. Plotting life expectancy over time

In a new file, write code to plot the data in life_expectancies_usa.txt. Each line of this file has the format <year>,<male life expectancy>,<female life expectancy>.
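A minimal sketch of the reading half of this exercise, assuming every non-blank line follows the <year>,<male life expectancy>,<female life expectancy> format exactly (no header line, numbers only). read_life_expectancies is a hypothetical helper name. It returns three parallel lists, which is the shape pyplot.plot expects for x values and y values.

```python
def read_life_expectancies(path):
    """Parse <year>,<male>,<female> lines into three parallel lists."""
    years, male, female = [], [], []
    with open(path) as data_file:
        for line in data_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline at the end
            year, male_value, female_value = line.split(",")
            years.append(int(year))
            male.append(float(male_value))
            female.append(float(female_value))
    return years, male, female
```

For example, years, male, female = read_life_expectancies("life_expectancies_usa.txt"), followed by one pyplot.plot call with years and male and another with years and female, draws the two lines on the same figure.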

You can call pyplot.plot multiple times to draw multiple lines on the same figure. For example:

pyplot.plot(my_data_1, "mo-", label="my data 1")
pyplot.plot(my_data_2, "bo-", label="my data 2")

will plot my_data_1 in magenta and my_data_2 in blue on the same figure.

Supply labels for your plots, like above. Then use pyplot.legend to give your graph a legend.
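Putting those pieces together, here is a small self-contained sketch with two labelled lines and a legend. It assumes matplotlib is installed. The data is made up purely so there are two series to show, and matplotlib.use("Agg") selects a non-interactive backend so the sketch runs without opening a window (remove that line and add pyplot.show() at the end to see the figure).

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen; no window is opened
from matplotlib import pyplot

# Made-up example data, only so there are two series to compare.
my_data_1 = [1, 2, 4, 8, 16]
my_data_2 = [1, 3, 5, 7, 9]

pyplot.plot(my_data_1, "mo-", label="my data 1")  # magenta circles joined by lines
pyplot.plot(my_data_2, "bo-", label="my data 2")  # blue circles joined by lines

# pyplot.legend() collects the label= strings from the plotted lines.
legend = pyplot.legend()
```

pyplot.legend also accepts a loc argument, for example loc="upper left", if the default placement covers part of your data.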

Your graph should look something like this:

[Image: Life expectancies.png]

To save your graph to a file instead of or in addition to displaying it, call pyplot.savefig.
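A sketch of saving a figure instead of displaying it, again assuming matplotlib is installed. The filename my_graph.png is just an example.

```python
import matplotlib

matplotlib.use("Agg")  # only writing a file, so no window is needed
from matplotlib import pyplot

pyplot.plot([0, 2, 4, 8, 16, 32], "o-")

# savefig picks the file format from the extension (.png, .pdf, .svg, ...).
pyplot.savefig("my_graph.png")
```

To both save and display, drop the matplotlib.use line and call pyplot.savefig before pyplot.show(). Once you close the window the figure is discarded, so saving afterwards typically produces a blank image.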

Life expectancy resources:

Bonus exercises

1. Letter frequency analysis of the US Constitution

  1. Run python It will generate a bar chart showing the frequency of each letter in the alphabet in the US Constitution.
  2. Open and read through The code for gathering and displaying the frequencies is a bit more complicated than the previous scripts in this project, but try to trace the general strategy for plotting the data. Be sure to read the comments!
  3. Try to answer the following questions:
    1. On line 11, what is string.ascii_lowercase?
    2. On line 18, what is the purpose of char = char.lower()?
    3. What are the contents of labels after the for loop on line 30 completes?
    4. On line 41, what are the two arguments passed to pyplot.xticks?
    5. On line 44, we use instead of our usual pyplot.plot. What are the 3 arguments passed to
  4. We've included a mystery text file mystery.txt: an excerpt from an actual novel. Alter to process the data in mystery.txt instead of constitution.txt, and re-run the script. What do you notice that is odd about this file? You can read more about this odd novel here.
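To see the core counting idea on its own, here is a compact sketch. It is not the project's script, but it relies on the same two tools the questions above mention, string.ascii_lowercase and char.lower(). letter_frequencies is a hypothetical name.

```python
import string


def letter_frequencies(text):
    """Count how often each letter a-z appears in text, ignoring case."""
    # Start every letter at 0 so letters that never appear still get a bar.
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    for char in text:
        char = char.lower()  # treat "A" and "a" as the same letter
        if char in counts:   # skip digits, spaces, punctuation, and so on
            counts[char] += 1
    return counts
```

For example, reading constitution.txt (or mystery.txt) into a string and passing it to letter_frequencies gives a dictionary mapping each letter to its count, ready to become the heights of the bars in a bar chart.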

2. Tour the matplotlib gallery

You can truly make any kind of graph with matplotlib. You can even create animated graphs. Check out some of the amazing possibilities, including their source code, at the matplotlib gallery:


[Images: Fireworks.png, Balloons.png]