Matplotlib: Difference between revisions

From OpenHatch wiki
imported>Jesstess
(37 intermediate revisions by 10 users not shown)
[[File:grid.png|right|300px]]


== Project ==


Learn how to plot data with the matplotlib plotting library. Ditch Excel forever!


== Goals ==

* practice reading data from a file
* practice using the matplotlib Python plotting library to analyze data and generate graphs

== Project setup ==

==== Mac OS X users only ====

If you do not already have a C compiler installed, you'll need one to install matplotlib. You have several options depending on your situation:
# Download and install Xcode (1.5 GB) from https://developer.apple.com/xcode/
# Download and install Command Line Tools for Xcode (175 MB) from https://developer.apple.com/downloads/index.action. This requires an Apple Developer account (free, but you have to sign up).
# Download and install kennethreitz's gcc installer (requires 10.6 or 10.7) from https://github.com/kennethreitz/osx-gcc-installer/

Please wave over a staff member and we'll help you pick the best option for your computer.

=== Install the project dependencies ===

Please follow the official matplotlib installation instructions at http://matplotlib.sourceforge.net/users/installing.html

The dependencies vary across operating systems. http://matplotlib.sourceforge.net/users/installing.html#build-requirements summarizes what you'll need for your operating system.

A universal dependency is the NumPy scientific computing library. NumPy has download and installation instructions at http://numpy.scipy.org/

Installing matplotlib and its dependencies is somewhat involved; please ask for help if you get stuck or don't know where to start!
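After installing, you can confirm that both libraries import correctly by running a short script like the one below. This is a sketch, and the version strings it prints depend on what you installed:

```python
# Quick sanity check that NumPy and matplotlib are importable after installation.
import numpy
import matplotlib

print("NumPy version:", numpy.__version__)
print("matplotlib version:", matplotlib.__version__)
```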


=== Download and un-archive the Jeopardy database project skeleton code ===

* http://web.mit.edu/jesstess/www/IntermediatePythonWorkshop/JeopardyDatabase.zip

Un-archiving will produce a <code>JeopardyDatabase</code> folder containing 3 Python files and one SQL database dump.

=== Create a SQLite database from the database dump ===

Inside <code>JeopardyDatabase</code> is a file called <code>jeopardy.dump</code> which contains a SQL database dump. We need to turn that database dump into a SQLite database.

Once you have SQLite installed, you can create a database from jeopardy.dump with:

<pre>sqlite3 jeopardy.db < jeopardy.dump</pre>

This creates a SQLite database called <code>jeopardy.db</code>.
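If the <code>sqlite3</code> command-line tool is not available, Python's built-in <code>sqlite3</code> module can do roughly the same job. This is a minimal sketch that assumes <code>jeopardy.dump</code> is a plain-text SQL script of the kind SQLite's <code>.dump</code> command writes. <code>load_dump</code> is a hypothetical helper name and is not part of the project skeleton:

```python
import sqlite3


def load_dump(dump_path, db_path):
    """Create (or extend) the SQLite database at db_path from a SQL dump file.

    Roughly equivalent to running: sqlite3 db_path < dump_path
    """
    with open(dump_path, "r") as dump_file:
        sql_script = dump_file.read()
    connection = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        # executescript runs every ;-terminated statement in the script in order.
        connection.executescript(sql_script)
        connection.commit()
    finally:
        connection.close()
```

Run from inside the <code>JeopardyDatabase</code> folder, <code>load_dump("jeopardy.dump", "jeopardy.db")</code> should give you an equivalent <code>jeopardy.db</code>.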

=== Test your setup ===

At a command prompt, start <code>sqlite3</code> using the <code>jeopardy.db</code> database by running:

<pre>sqlite3 jeopardy.db</pre>

That should start a sqlite prompt that looks like this:

<pre>
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite></pre>

At that sqlite prompt, type <code>.tables</code> and hit enter. That should display a list of the tables in this database:

<pre>sqlite> .tables
category clue
sqlite></pre>
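<code>.tables</code> is a command of the sqlite shell. From Python, the same information is stored in SQLite's built-in <code>sqlite_master</code> table. The sketch below is a rough equivalent, and <code>list_tables</code> is a hypothetical helper name:

```python
import sqlite3


def list_tables(db_path):
    """Return the names of user tables in a SQLite database, like the .tables command."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip SQLite's internal tables
            "ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    # Each row is a 1-tuple like ("clue",); unpack it to a plain string.
    return [name for (name,) in rows]
```

For the Jeopardy database, <code>list_tables("jeopardy.db")</code> would be expected to return <code>['category', 'clue']</code>, which matches the <code>.tables</code> output above.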

From a command prompt, navigate to the <code>JeopardyDatabase</code> directory and run

<pre>python jeopardy_categories.py</pre>

You should see a list of 10 Jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.


== Project steps ==


=== 1. Create a basic plot ===


<ol>
<li>
Run <code>python basic_plot.py</code>. This will pop up a window with a dot plot of some data.
</li>
<li>
Open <code>basic_plot.py</code> and read through the code. The meat of the file is in one line:

<pre>pyplot.plot([0, 2, 4, 8, 16, 32], "o")</pre>

In this example, the first argument to <code>pyplot.plot</code> is the list of y values, and the second argument describes how to plot the data. If two lists had been supplied, <code>pyplot.plot</code> would treat the first list as the x values and the second list as the y values.
</li>
<li>
Change the plot to display lines between the data points by changing

<pre>pyplot.plot([0, 2, 4, 8, 16, 32], "o")</pre>

to

<pre>pyplot.plot([0, 2, 4, 8, 16, 32], "o-")</pre>

and re-run the script. What changed?
</li>
<li>
Add x-values to the data by changing

<pre>pyplot.plot([0, 2, 4, 8, 16, 32], "o-")</pre>

to

<pre>x_values = [0, 4, 7, 20, 22, 25]
y_values = [0, 2, 4, 8, 16, 32]
pyplot.plot(x_values, y_values, "o-")</pre>

and re-run the script. What changed?

Note how matplotlib automatically rescales the axes so that all of the points fit in the figure.
</li>
<li>
Read about how to generate random integers at http://docs.python.org/library/random.html#random.randint.

Then, instead of hard-coding y values in <code>basic_plot.py</code>, generate a list of random y values and plot them.

An example plot using random y values might look like this:
<br />
[[File:Basic_plot.png|300px]]
</li>
</ol>
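The sketch below shows one way to build random y values with <code>random.randint</code>. The range 0 to 100 and the helper name <code>random_y_values</code> are arbitrary choices for illustration. The sketch selects matplotlib's non-interactive Agg backend so that it runs without a display. To open a window instead, keep the default backend and call <code>pyplot.show()</code>:

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draws without opening a window
from matplotlib import pyplot


def random_y_values(count, low=0, high=100):
    """Return a list of `count` random integers, each between low and high inclusive."""
    return [random.randint(low, high) for _ in range(count)]


x_values = [0, 4, 7, 20, 22, 25]
y_values = random_y_values(len(x_values))  # one random y per x value
pyplot.plot(x_values, y_values, "o-")
# With an interactive backend, pyplot.show() would open the plot window here.
```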


<b>Read these short documents</b>:
* Pyplot tutorial (just this one section; stop before the next section "Controlling line properties"): http://matplotlib.sourceforge.net/users/pyplot_tutorial.html#pyplot-tutorial
* List of line options, including line style and marker shapes and colors: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot


<b>Check your understanding</b>:
* What does matplotlib pick as the x values if you don't supply them yourself?
* What options would you pass to <code>pyplot.plot</code> to generate a plot with red triangles and dotted lines?
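<code>pyplot.plot</code> accepts a short format string that combines a color letter, a marker symbol and a line style. The same styling can also be written out with keyword arguments. The sketch below uses green squares with dashed lines, which is deliberately not the combination asked about above:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
from matplotlib import pyplot

x_values = [0, 1, 2, 3]
y_values = [0, 1, 4, 9]

# Format string: color "g" (green), marker "s" (squares), linestyle "--" (dashed).
short_form = pyplot.plot(x_values, y_values, "gs--")[0]

# The same styling spelled out with keyword arguments.
long_form = pyplot.plot(x_values, y_values, color="g", marker="s", linestyle="--")[0]
```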


=== 2. Query the database with SELECT ===

Try running the following queries from the sqlite prompt:
* <tt>SELECT * FROM category;</tt>
* <tt>SELECT NAME FROM category;</tt>
* <tt>SELECT * FROM clue;</tt>
* <tt>SELECT text, answer, value FROM clue;</tt>
* <tt>SELECT text, answer, value FROM clue LIMIT 10;</tt>

Explore the <code>category</code> and <code>clue</code> tables with your own SELECT queries.

<b>Check your understanding</b>:
* What does <code>*</code> mean in the above queries?
* What does the <code>LIMIT</code> SQL keyword do?
* Does case matter when making SQL queries?

<b>Step 2 resources</b>:
<ul>
<li>
Using SELECT: http://www.w3schools.com/sql/sql_select.asp
</li>
</ul>


=== 2. Plotting the world population over time ===

<ol>
<li>
Run <code>python world_population.py</code>. This will pop up a window with a dot plot of the world population over the last 10,000 years.
</li>
<li>
Open <code>world_population.py</code> and read through the code.

In this example, we read our data from a file. Open the data file <code>world_population.txt</code> and examine the format of the file.
</li>
<li>
Find the documentation at http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot for customizing the linewidth of plots. Then change the world population plot to use a magenta, down-triangle marker and a linewidth of 2.
</li>
</ol>

<b>World population resources</b>:
<ul>
<li>
File input and output: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
</li>
<li>
Splitting strings into parts based on a delimiter: http://www.hacksparrow.com/python-split-string-method-and-examples.html
</li>
</ul>

<b>Check your understanding</b>:
* In <code>world_population.py</code>, what does <code>file("world_population.txt", "r").readlines()</code> return?
* In <code>world_population.py</code>, what does <code>point.split()</code> return?
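The questions above depend on reading lines from a file and splitting them. Note that <code>file(...)</code> exists only in Python 2, while <code>open(...)</code> works in both Python 2 and 3. The sketch below assumes, hypothetically, that each line of the data file holds a year and a population separated by whitespace. Compare that assumption with what you see in <code>world_population.txt</code>. The function names are illustrative only:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
from matplotlib import pyplot


def parse_points(lines):
    """Turn lines like '1950 2.5' into parallel lists of years and populations.

    ASSUMPTION: two whitespace-separated numbers per line; blank lines are skipped.
    """
    years = []
    populations = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace, dropping the newline
        if len(fields) < 2:
            continue
        years.append(float(fields[0]))
        populations.append(float(fields[1]))
    return years, populations


def plot_file(path):
    """Read a data file with open() and draw its points as dots."""
    with open(path, "r") as data_file:
        years, populations = parse_points(data_file.readlines())
    pyplot.plot(years, populations, "o")
    return years, populations
```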


=== 3. Make database queries from Python ===

Examine the code in <code>jeopardy_categories.py</code>. To make a database query from Python, you need to:
# Import a Python library for making database connections
# Establish a connection to the desired database
# Get a cursor from the database for making queries
# Execute the database query using the standard SQL syntax
# Retrieve the list of results from the database cursor
# Do something useful with the results, like print them
# Close the database connection

Match up each of these steps with lines of code in the file.


=== 3. Plotting life expectancy over time ===

In a new file, write code to plot the data in <code>life_expectancies_usa.txt</code>. The format in this file is <year>,<male life expectancy>,<female life expectancy>.

You can call <code>pyplot.plot</code> multiple times to draw multiple lines on the same figure. For example:

<pre>pyplot.plot(my_data_1, "mo-", label="my data 1")
pyplot.plot(my_data_2, "bo-", label="my data 2")</pre>

will plot <code>my_data_1</code> in magenta and <code>my_data_2</code> in blue on the same figure.

Supply labels for your plots, as above. Then use <code>pyplot.legend</code> to give your graph a legend. Plain <code>pyplot.legend()</code> works, but passing options (for example <code>loc</code>, which controls where the legend is placed) can give a better result.

Your graph should look something like this:

[[File:Life_expectancies.png|300px]]

To save your graph to a file instead of, or in addition to, displaying it, call <code>pyplot.savefig</code>.


=== 4. Tweak the existing Jeopardy scripts ===

==== 1. Modify <code>jeopardy_categories.py</code> to print both the category and game number ====

<b>tip</b>: Remind yourself of the <code>category</code> schema by running <code>.schema category</code> at a sqlite prompt.
<br />
<b>Example output</b>:
<pre>Example categories:

DETECTIVE FICTION (game #1)
THE OLD TESTAMENT (game #2)
ASIAN HISTORY (game #4)
RIVER SOURCES (game #5)
WORLD RELIGION (game #3)
SEAN SONG (game #2)
ANIMATED MOVIES (game #1)
NEW YORK CITY (game #6)
AFRICAN WILDLIFE (game #7)
LITTLE RED RIDING HOOD (game #8)</pre>
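The seven steps listed under "Make database queries from Python" map onto Python's built-in <code>sqlite3</code> module roughly as shown below. This is a sketch and not necessarily how <code>jeopardy_categories.py</code> is written. <code>fetch_rows</code> is a hypothetical helper name:

```python
# Step 1: import a Python library for making database connections.
import sqlite3


def fetch_rows(db_path, query, parameters=()):
    """Run one SELECT query and return all result rows as a list of tuples."""
    # Step 2: establish a connection to the desired database.
    connection = sqlite3.connect(db_path)
    try:
        # Step 3: get a cursor for making queries.
        cursor = connection.cursor()
        # Step 4: execute the query using standard SQL syntax.
        cursor.execute(query, parameters)
        # Step 5: retrieve the list of results from the cursor.
        return cursor.fetchall()
    finally:
        # Step 7: close the database connection, even if the query failed.
        connection.close()


# Step 6 (usage sketch; assumes jeopardy.db is in the current directory):
# for (name,) in fetch_rows("jeopardy.db", "SELECT name FROM category LIMIT 10"):
#     print(name)
```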

==== 2. Modify <code>jeopardy_clues.py</code> to only print clues with an $800 value. ====

A good way to achieve this is by adding a <code>WHERE</code> clause to the SQL query in <code>jeopardy_clues.py</code>.

Read about <code>WHERE</code> clauses in this short document:
* The WHERE Clause: http://www.w3schools.com/sql/sql_where.asp

<br />
<b>Example output</b>:

<pre>Example clues:

[$800]
A: She also created the detectives Tuppence &amp; Tommy Beresford
Q: What is 'Agatha Christie'

[$800]
A: According to this Old Testament book, this &quot;swords into plowshares&quot; prophet walked naked for 3 years
Q: What is 'Isaiah'
...</pre>
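A <code>WHERE</code> clause can also take its value from a Python variable through a <code>?</code> placeholder. The sketch below assumes that the <code>value</code> column stores plain integers such as 800. If <code>.schema clue</code> shows a different type (for example text like "$800"), adjust the comparison. <code>clues_worth</code> is a hypothetical helper name:

```python
import sqlite3


def clues_worth(connection, dollar_value):
    """Return (text, answer, value) rows for clues with the given dollar value."""
    # The ? placeholder lets sqlite3 substitute the value safely instead of
    # pasting it into the SQL string yourself.
    return connection.execute(
        "SELECT text, answer, value FROM clue WHERE value = ?", (dollar_value,)
    ).fetchall()
```

For example, <code>clues_worth(sqlite3.connect("jeopardy.db"), 800)</code> would return only the $800 clues, under the assumption above.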


=== 5. Daily Doubles ===

Write a script that prints 10 daily doubles and their responses.

<br />
<b>tip</b>: The <code>clue</code> table has an <code>isDD</code> field.

<br />
<b>Example output</b>:

<pre>Category: NEW YORK CITY
Question: The heart of Little Italy is this street also found in a Dr. Seuss book title
Answer: Mulberry Street
===
Category: RIVER SOURCES
Question: This Mideastern boundary river rises on the slopes of Mount Hermon
Answer: the Jordan
===
Category: ROOM
Question: The Titanic has 3 rooms for this--only men were allowed there, as women weren't supposed to do it in public
Answer: smoking
...</pre>

==Bonus exercises==

=== 1. Random category clues ===

Write a script that randomly chooses a category and prints clues from that category.

<br />
<b>tip</b>: SQLite supports an "<code>ORDER BY RANDOM()</code>" clause that returns rows in a random order. For example, to randomly pick 1 category id you could use:

<pre>SELECT id FROM category ORDER BY RANDOM() LIMIT 1</pre>

You can also use <code>ORDER BY</code> to sort the clues by value.
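Put together from Python, the two tips above might look like the sketch below. The column that links <code>clue</code> rows to a <code>category</code> is not documented on this page. The sketch assumes, hypothetically, that it is named <code>category</code>, so check <code>.schema clue</code> and rename it if needed:

```python
import sqlite3


def random_category(connection):
    """Return (id, name) for one randomly chosen category row."""
    return connection.execute(
        "SELECT id, name FROM category ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


def clues_for_category(connection, category_id):
    """Return (value, text) rows for one category, cheapest clue first.

    ASSUMPTION: clue.category holds the id of the clue's category.
    """
    return connection.execute(
        "SELECT value, text FROM clue WHERE category = ? ORDER BY value",
        (category_id,),
    ).fetchall()
```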

<br />
<b>Example output</b>:


<pre>5 GUYS NAMED MOE
[$200] Last name of Moe of the Three Stooges
[$400] Moe Strauss founded this auto parts chain along with Manny Rosenfield &amp; Jack Jackson
[$600] Major league catcher Moe Berg was also a WWII spy for this agency, precursor of the CIA
[$800] Term for the type of country music Moe Bandy plays, the clubs where he began, or the &quot;Queen&quot; he sang of in 1981
[$1000] This &quot;Kool&quot; rapper's album &quot;How Ya Like Me Now&quot; began a rivalry with LL Cool J</pre>


<b>Exercises resources</b>:
<ul>
<li>
Using ORDER BY: http://www.w3schools.com/sql/sql_orderby.asp
</li>
</ul>

<b>Life expectancy resources</b>:
<ul>
<li>
File input and output: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
</li>
<li>
Splitting strings into parts based on a delimiter: http://www.hacksparrow.com/python-split-string-method-and-examples.html
</li>
<li>
Examples of legends: http://matplotlib.sourceforge.net/examples/pylab_examples/legend_auto.html
</li>
<li>
Ways to configure your legend: http://matplotlib.sourceforge.net/api/legend_api.html
</li>
<li>
Saving your graph to a file: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.savefig
</li>
</ul>


=== 2. Random game categories ===

Write a script to randomly choose a game number and print the categories from that game.
<br />

<b>tip</b>: the <code>category</code> table has <code>game</code> and <code>round</code> fields. Round 0 is the Jeopardy round, round 1 is the Double Jeopardy round, and round 2 is Final Jeopardy.

<br />

<b>Example output</b>:
<pre>Categories for game #136:
0 WELCOME TO MY COUNTRY
0 METALS
0 GO GO GAUGUIN
0 FILE UNDER &quot;M&quot;
0 ANIMATED CATS
0 MAO MAO MAO MAO
1 SHAKESPEARE'S OPENING LINES
1 HEY, MARIO!
1 BRIDGE ON THE RIVER....
1 RUNNING MATES
1 13-LETTER WORDS
1 TONY BENNETT'S SONGBOOK
2 IN THE NEWS 2000</pre>


=== 3. Top 20 Jeopardy categories ===

Read about the <code>GROUP BY</code> clause and write a script using it to print the 20 most common Jeopardy categories.

An example of using <code>GROUP BY</code> and <code>ORDER BY</code> to produce an ordered list of counts on a hypothetical <code>foo</code> field is:

<pre>SELECT foo, COUNT(foo) AS count FROM my_table GROUP BY foo ORDER BY count</pre>

<b>Example output</b>:
<pre>81 LITERATURE
79 BEFORE &amp; AFTER
73 WORD ORIGINS
71 SCIENCE
64 BUSINESS &amp; INDUSTRY
63 AMERICAN HISTORY
...</pre>


==Bonus exercises==

=== 1. Letter frequency analysis of the US Constitution ===

# Run <code>python constitution.py</code>. It will generate a bar chart showing the frequency of each letter of the alphabet in the US Constitution.
# Open and read through <code>constitution.py</code>. The code for gathering and displaying the frequencies is a bit more complicated than in the previous scripts in this project, but try to trace the general strategy for plotting the data. Be sure to read the comments!
# Try to answer the following questions:
## On line 11, what is <code>string.ascii_lowercase</code>?
## On line 18, what is the purpose of <code>char = char.lower()</code>?
## What are the contents of <code>labels</code> after the <code>for</code> loop on line 30 completes?
## On line 41, what are the two arguments passed to <code>pyplot.xticks</code>?
## On line 44, we use <code>pyplot.bar</code> instead of our usual <code>pyplot.plot</code>. What are the 3 arguments passed to <code>pyplot.bar</code>?
# We've included a mystery text file <code>mystery.txt</code>: an excerpt from an actual novel. Alter <code>constitution.py</code> to process the data in <code>mystery.txt</code> instead of <code>constitution.txt</code>, and re-run the script. What do you notice that is odd about this file? You can read more about this odd novel [http://en.wikipedia.org/wiki/Gadsby_(novel) here].


=== 2. Tour the matplotlib gallery ===

You can make almost any kind of graph with matplotlib, including animated graphs. Check out some of the possibilities, with their source code, in the matplotlib gallery: http://matplotlib.sourceforge.net/gallery.html.

[[File:matplotlib_gallery.png|750px]]


===Congratulations!===


You've learned about SQL and making database queries from within Python, and you've read, modified, and created scripts that plot and analyze data using matplotlib. Keep practicing!


[[File:Fireworks.png|150px]]

Latest revision as of 04:47, 23 June 2016


Project steps

1. Create a basic plot

  1. Run python basic_plot.py. This will pop up a window with a dot plot of some data.
  2. Open basic_plot.py. Read through the code in this file. The meat of the file is in one line:
    pyplot.plot([0, 2, 4, 8, 16, 32], "o")

    In this example, the first argument to pyplot.plot is the list of y values, and the second argument describes how to plot the data. If two lists had been supplied, pyplot.plot would consider the first list to be the x values and the second list to be the y values.

  3. Change the plot to display lines between the data points by changing
    pyplot.plot([0, 2, 4, 8, 16, 32], "o")

    to

    pyplot.plot([0, 2, 4, 8, 16, 32], "o-")

    and re-run the script. What changed?

  4. Add x-values to the data by changing
    pyplot.plot([0, 2, 4, 8, 16, 32], "o-")

    to

    x_values = [0, 4, 7, 20, 22, 25]
    y_values = [0, 2, 4, 8, 16, 32]
    pyplot.plot(x_values, y_values, "o-")

    and re-run the script. What changed?

    Note how matplotlib automatically resizes the graph to fit all of the points in the figure for you.

  5. Read about how to generate random integers on http://docs.python.org/library/random.html#random.randint. Then, instead of hard-coding y values in basic_plot.py, generate a list of random y values and plot them. An example plot using random y values might look like this:

Read these short documents:

Check your understanding:

  • What does matplotlib pick as the x values if you don't supply them yourself?
  • What options would you pass to pyplot.plot to generate a plot with red triangles and dotted lines?

2. Plotting the world population over time

  1. Run python world_population.py. This will pop up a window with a dot plot of the world population over the last 10,000 years.
  2. Open world_population.py. Read through the code in this file. In this example, we read our data from a file. Open the data file world_population.txt and examine the format of the file.
  3. Find the documentation on http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot for customizing the linewidth of plots. Then change the world population plot to use a magenta, down-triangle marker and a linewidth of 2.

World population resources:

Check your understanding:

  • In world_population.py, what does file("world_population.txt", "r").readlines() return?
  • In world_population.py, what does point.split() return?


3. Plotting life expectancy over time

In a new file, write code to plot the data in life_expectancies_usa.txt. The format in this file is <year>,<male life expectancy>,<female life expectancy>.

You can call pyplot.plot multiple times to draw multiple lines on the same figure. For example:

pyplot.plot(my_data_1, "mo-", label="my data 1")
pyplot.plot(my_data_2, "bo-", label="my data 2")

will plot my_data_1 in magenta and my_data_2 in blue on the same figure.

Supply labels for your plots, like above. Then use pyplot.legend to give your graph a legend. Just plain pyplot.legend() will work, but providing more options may give a better effect.

Your graph should look something like this:

To save your graph to a file instead of or in addition to displaying it, call pyplot.savefig.
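Below is one possible sketch for the life expectancy exercise. It assumes that every line of life_expectancies_usa.txt is exactly year,male,female with no header row, so check the file before relying on it. The function names are hypothetical. The Agg backend is selected so that savefig works without a window:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; savefig does not need a window
from matplotlib import pyplot


def parse_life_expectancies(lines):
    """Split 'year,male,female' lines into three parallel lists of floats."""
    years, male, female = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        year, male_value, female_value = line.split(",")
        years.append(float(year))
        male.append(float(male_value))
        female.append(float(female_value))
    return years, male, female


def plot_life_expectancies(lines, output_path):
    """Draw both series on one figure with a legend and save it to output_path."""
    years, male, female = parse_life_expectancies(lines)
    pyplot.figure()
    pyplot.plot(years, male, "bo-", label="male")
    pyplot.plot(years, female, "mo-", label="female")
    pyplot.legend(loc="upper left")  # loc picks the legend's corner
    pyplot.savefig(output_path)
    return output_path
```

With the real file you might call: with open("life_expectancies_usa.txt") as f: plot_life_expectancies(f, "life_expectancies.png")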

Life expectancy resources:

Bonus exercises

1. Letter frequency analysis of the US Constitution

  1. Run python constitution.py. It will generate a bar chart showing the frequency of each letter in the alphabet in the US Constitution.
  2. Open and read through constitution.py. The code for gathering and displaying the frequencies is a bit more complicated than in the previous scripts in this project, but try to trace the general strategy for plotting the data. Be sure to read the comments!
  3. Try to answer the following questions:
    1. On line 11, what is string.ascii_lowercase?
    2. On line 18, what is the purpose of char = char.lower()?
    3. What are the contents of labels after the for loop on line 30 completes?
    4. On line 41, what are the two arguments passed to pyplot.xticks?
    5. On line 44, we use pyplot.bar instead of our usual pyplot.plot. What are the 3 arguments passed to pyplot.bar?
  4. We've included a mystery text file mystery.txt: an excerpt from an actual novel. Alter constitution.py to process the data in mystery.txt instead of constitution.txt, and re-run the script. What do you notice that is odd about this file? You can read more about this odd novel here.
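The following is not the code in constitution.py. It is a compact sketch of the same general strategy: count each letter, draw the counts with pyplot.bar, and label the x axis with pyplot.xticks. It uses collections.Counter for the counting, and the function names are hypothetical:

```python
import string
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
from matplotlib import pyplot


def letter_frequencies(text):
    """Return a list of 26 counts, one per letter a-z, ignoring case."""
    counts = Counter(char for char in text.lower() if char in string.ascii_lowercase)
    # Counter returns 0 for letters that never appear.
    return [counts[letter] for letter in string.ascii_lowercase]


def plot_letter_frequencies(text):
    """Draw one bar per letter, labelled a-z along the x axis."""
    frequencies = letter_frequencies(text)
    positions = range(len(string.ascii_lowercase))
    pyplot.bar(positions, frequencies)
    pyplot.xticks(positions, list(string.ascii_lowercase))
    return frequencies
```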


2. Tour the matplotlib gallery

You can truly make any kind of graph with matplotlib. You can even create animated graphs. Check out some of the amazing possibilities, including their source code, at the matplotlib gallery: http://matplotlib.sourceforge.net/gallery.html.


Congratulations!

You've read, modified, and created scripts that plot and analyze data using matplotlib. Keep practicing!