Matplotlib

Project
Learn how to plot data with the matplotlib plotting library. Ditch Excel forever!

Goals

 * practice reading data from a file
 * practice using the matplotlib Python plotting library to analyze data and generate graphs

Mac OS X users only
If you do not already have a C compiler installed, you'll need one to install matplotlib. You have several options depending on your situation:
 * 1) Download and install Xcode (1.5 GB) from https://developer.apple.com/xcode/
 * 2) Download and install Command Line Tools for Xcode (175 MB) from https://developer.apple.com/downloads/index.action. This requires an Apple Developer account (free, but you have to sign up).
 * 3) Download and install kennethreitz's gcc installer (requires 10.6 or 10.7) from https://github.com/kennethreitz/osx-gcc-installer/

Please wave over a staff member and we'll help you pick which option is best for your computer.

1. Install the project dependencies
Please follow the official matplotlib installation instructions at http://matplotlib.sourceforge.net/users/installing.html

The dependencies vary across operating systems. http://matplotlib.sourceforge.net/users/installing.html#build-requirements summarizes what you'll need for your operating system.

A universal dependency is the NumPy scientific computing library. NumPy has download and installation instructions at http://numpy.scipy.org/

Installing matplotlib and its dependencies is somewhat involved; please ask for help if you get stuck or don't know where to start!

2. Download and un-archive the Matplotlib project skeleton code

 * http://web.mit.edu/jesstess/www/IntermediatePythonWorkshop/Matplotlib.zip

Un-archiving will produce a  folder containing several Python and text files.

3. Test your setup
Run the  script in your   directory. A window with a graph should pop up.
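If the graph window does not appear, a quick import check can help narrow down whether the installation itself is the problem. This is a minimal sketch, separate from the workshop's own test script; it only confirms that both libraries import and reports their versions:

```python
# Sanity check: confirm that matplotlib and NumPy can be imported.
import matplotlib
import numpy

print("matplotlib version:", matplotlib.__version__)
print("NumPy version:", numpy.__version__)
```

If either import fails with an ImportError, that library is probably not installed for the Python interpreter you are running.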

1. Create a basic plot
Run. This will pop up a window with a dot plot of some data.

Open. Read through the code in this file. The meat of the file is in one line:

pyplot.plot([0, 2, 4, 8, 16, 32], "o")

In this example, the first argument to  is the list of y values, and the second argument describes how to plot the data. If two lists had been supplied,  would consider the first list to be the x values and the second list to be the y values.

Change the plot to display lines between the data points by changing

pyplot.plot([0, 2, 4, 8, 16, 32], "o")

to

pyplot.plot([0, 2, 4, 8, 16, 32], "o-")

Add x-values to the data by changing

pyplot.plot([0, 2, 4, 8, 16, 32], "o-")

to

x_values = [0, 4, 7, 20, 22, 25]
y_values = [0, 2, 4, 8, 16, 32]
pyplot.plot(x_values, y_values, "o-")

Note how matplotlib automatically resizes the graph to fit all of the points in the figure for you.

Read about how to generate random integers on http://docs.python.org/library/random.html#random.randint.

Then, instead of hard-coding y values in, generate a list of random y values and plot them.
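One way this step could look is sketched below. The number of points and the range of the random values are assumptions chosen for illustration, not values taken from the workshop files:

```python
import random

from matplotlib import pyplot

# Assumed for illustration: 10 points, each between 0 and 100 inclusive.
NUM_POINTS = 10
y_values = [random.randint(0, 100) for _ in range(NUM_POINTS)]

# Same call as before, but with generated data instead of a hard-coded list.
pyplot.plot(y_values, "o-")

if __name__ == "__main__":
    pyplot.show()
```

Because the values are random, the plot should look different each time the script runs.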

An example plot using random y values might look like this:

 

Read these short documents:
 * Pyplot tutorial (just this one section; stop before the next section "Controlling line properties"): http://matplotlib.sourceforge.net/users/pyplot_tutorial.html#pyplot-tutorial
 * List of line options, including line style and marker shapes and colors: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

Check your understanding:
 * What does matplotlib pick as the x values if you don't supply them yourself?
 * What options would you pass to  to generate a plot with red triangles and dotted lines?
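To experiment with the questions above, it can help to ask matplotlib what it actually did with a format string. The sketch below uses a different style (green squares with a dashed line) so the exercise is left for you; it relies on pyplot.plot returning a list of line objects that can be inspected:

```python
from matplotlib import pyplot

y_values = [0, 2, 4, 8, 16, 32]

# "gs--" reads as: g = green, s = square markers, -- = dashed line.
# pyplot.plot returns a list of line objects, one per plotted data set.
lines = pyplot.plot(y_values, "gs--")
line = lines[0]

# Inspect what matplotlib used, including the x values it filled in
# because none were supplied.
print("color:", line.get_color())
print("marker:", line.get_marker())
print("line style:", line.get_linestyle())
print("x values:", list(line.get_xdata()))

if __name__ == "__main__":
    pyplot.show()
```

Swapping in other format strings and re-running is a quick way to check your answers.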

2. Plotting the world population over time
Run. This will pop up a window with a dot plot of the world population over the last 10,000 years.

Open. Read through the code in this file.

In this example, we read our data from a file. Open the data file  and examine the format of the file.

Find the documentation on http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot for customizing the linewidth of plots. Then change the world population plot to use a magenta, down-triangle marker and a linewidth of 2.

World population resources:
 * File input and output: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
 * Splitting strings into parts based on a delimiter: http://www.hacksparrow.com/python-split-string-method-and-examples.html
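The general pattern of reading a file line by line and splitting each line into parts can be sketched as follows. The two-column, whitespace-separated layout and the function name read_two_columns are assumptions made for illustration; check the actual data file for its real format:

```python
import os
import tempfile


def read_two_columns(path):
    """Return two lists, one per column, parsed from a text file."""
    first_column = []
    second_column = []
    with open(path) as data_file:
        for line in data_file:
            parts = line.split()  # split on any run of whitespace
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            first_column.append(int(parts[0]))
            second_column.append(float(parts[1]))
    return first_column, second_column


# Write a tiny placeholder file so the sketch runs on its own.
# These numbers are made up and are not real population data.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as sample_file:
    sample_file.write("1 10\n2 20\n\n3 30\n")

x_values, y_values = read_two_columns(sample_path)
print(x_values)  # [1, 2, 3]
print(y_values)  # [10.0, 20.0, 30.0]
```

The two lists that come back can be passed straight to pyplot.plot as x and y values.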

Check your understanding:
 * In, what does   return?
 * In, what does   return?

3. Plotting life expectancy over time
In a new file, write code to plot the data in. The format in this file is, ,.

You can call  multiple times to draw multiple lines on the same figure. For example:

pyplot.plot(my_data_1, "mo-", label="my data 1")
pyplot.plot(my_data_2, "bo-", label="my data 2")

will plot  in magenta and   in blue on the same figure.

Supply labels for your plots, like above. Then use  to give your graph a legend.
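A small runnable sketch of two labeled lines sharing one figure with a legend is shown here. The data values are made-up placeholders, and pyplot.legend is called with no arguments, which is only one of the ways it can be configured:

```python
from matplotlib import pyplot

# Placeholder values, made up for illustration.
my_data_1 = [1, 3, 2, 5]
my_data_2 = [2, 2, 4, 3]

# Each call adds another line to the same figure.
pyplot.plot(my_data_1, "mo-", label="my data 1")
pyplot.plot(my_data_2, "bo-", label="my data 2")

# With no arguments, legend() builds its entries from the label= values.
legend = pyplot.legend()

if __name__ == "__main__":
    pyplot.show()
```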

Your graph should look something like this:



To save your graph to a file instead of or in addition to displaying it, call.
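Saving can be sketched with pyplot.savefig, the function covered by the "Saving your graph to a file" resource for this section. The output path below is hypothetical, and writing to a temporary directory is just a convenience for this example:

```python
import os
import tempfile

from matplotlib import pyplot

pyplot.plot([0, 2, 4, 8, 16, 32], "o-")

# Hypothetical output location; the file extension (.png here) selects
# the output format.
output_path = os.path.join(tempfile.mkdtemp(), "my_graph.png")
pyplot.savefig(output_path)
print("Saved graph to", output_path)
```

If you also display the graph, it is usually safest to save before showing it, since the figure may no longer be available once the window has been closed.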

Life expectancy resources:
 * File input and output: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
 * Splitting strings into parts based on a delimiter: http://www.hacksparrow.com/python-split-string-method-and-examples.html
 * Examples of legends:
 * Ways to configure your legend: http://matplotlib.sourceforge.net/api/legend_api.html
 * Saving your graph to a file: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.savefig

1. Letter frequency analysis of the US Constitution

 * 1) Run  . It will generate a bar chart showing the frequency of each letter in the alphabet in the US Constitution.
 * 2) Open and read through  . The code for gathering and displaying the frequencies is a bit more complicated than the previous scripts in this project, but try to trace the general strategy for plotting the data. Be sure to read the comments!
 * 3) Try to answer the following questions:
 * 4) On line 11, what is  ?
 * 5) On line 18, what is the purpose of  ?
 * 6) What are the contents of   after the   loop on line 30 completes?
 * 7) On line 41, what are the two arguments passed to
 * 8) On line 44, we use   instead of our usual  . What are the 3 arguments passed to  ?
 * 9) We've included a mystery text file  : an excerpt from an actual novel. Alter   to process the data in   instead of , and re-run the script. What do you notice that is odd about this file? You can read more about this odd novel here.
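For comparison, the overall strategy of counting letters and drawing a bar chart can be sketched in a few lines. This is not the workshop script and may differ from it in structure and naming; the short stand-in text, the use of collections.Counter, and the bar width are all assumptions made for illustration:

```python
import string
from collections import Counter

from matplotlib import pyplot

# Short stand-in text; the workshop script presumably reads its text
# from a file instead.
text = "We the People"

# Count only alphabetic characters, ignoring case.
counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)

letters = list(string.ascii_lowercase)
frequencies = [counts[letter] for letter in letters]
positions = list(range(len(letters)))

# One bar per letter: bar positions, bar heights, then bar width.
pyplot.bar(positions, frequencies, 0.8)
# Label each bar position with its letter.
pyplot.xticks(positions, letters)

if __name__ == "__main__":
    pyplot.show()
```

Replacing the stand-in text with the contents of a file would give a chart for that file instead.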

2. Tour the matplotlib gallery
You can truly make any kind of graph with matplotlib. You can even create animated graphs. Check out some of the amazing possibilities, including their source code, at the matplotlib gallery: http://matplotlib.sourceforge.net/gallery.html.



Congratulations!
You've read, modified, and created scripts that plot and analyze data using matplotlib. Keep practicing!