Matplotlib: Difference between revisions
Revision as of 19:58, 26 July 2012
Project
Learn how to plot data with the matplotlib plotting library. Ditch Excel forever!
Goals
- practice reading data from a file
- practice using the matplotlib Python plotting library to analyze data and generate graphs
Project setup
Mac OS X users only
If you do not already have a C compiler installed, you'll need one to install matplotlib. You have several options depending on your situation:
- Download and install Xcode (1.5 GB) from https://developer.apple.com/xcode/
- Download and install Command Line Tools for Xcode (175 MB) from https://developer.apple.com/downloads/index.action. This requires an Apple Developer account (free, but you have to sign up).
- Download and install kennethreitz's gcc installer (requires 10.6 or 10.7) from https://github.com/kennethreitz/osx-gcc-installer/
Please wave over a staff member and we'll help you pick which option is best for your computer.
Install the project dependencies
Please follow the official matplotlib installation instructions at http://matplotlib.sourceforge.net/users/installing.html
The dependencies vary across operating systems. http://matplotlib.sourceforge.net/users/installing.html#build-requirements summarizes what you'll need for your operating system.
A universal dependency is the NumPy scientific computing library. NumPy has download and installation instructions at http://numpy.scipy.org/
Installing matplotlib and its dependencies is somewhat involved; please ask for help if you get stuck or don't know where to start!
Download and un-archive the Jeopardy database project skeleton code
Un-archiving will produce a JeopardyDatabase folder containing 3 Python files and one SQL database dump.
Create a SQLite database from the database dump
Inside JeopardyDatabase is a file called jeopardy.dump which contains a SQL database dump. We need to turn that database dump into a SQLite database.
Once you have SQLite installed, you can create a database from jeopardy.dump with:
sqlite3 jeopardy.db < jeopardy.dump
This creates a sqlite3 database called jeopardy.db.
Test your setup
At a command prompt, start sqlite3 using the jeopardy.db database by running:
sqlite3 jeopardy.db
That should start a sqlite prompt that looks like this:
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
At that sqlite prompt, type .tables and hit enter. That should display a list of the tables in this database:
sqlite> .tables
category clue
sqlite>
From a command prompt, navigate to the JeopardyDatabase directory and run
python jeopardy_categories.py
You should see a list of 10 jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.
Project steps
1. Create a basic plot
- Run python basic_plot.py. This will pop up a window with a dot plot of some data.
- Open basic_plot.py and read through the code in this file. The meat of the file is in one line:
pyplot.plot([0, 2, 4, 8, 16, 32], "o")
In this example, the first argument to pyplot.plot is the list of y values, and the second argument describes how to plot the data. If two lists had been supplied, pyplot.plot would consider the first list to be the x values and the second list to be the y values.
- Change the plot to display lines between the data points by changing
pyplot.plot([0, 2, 4, 8, 16, 32], "o")
to
pyplot.plot([0, 2, 4, 8, 16, 32], "o-")
- Add x-values to the data by changing
pyplot.plot([0, 2, 4, 8, 16, 32], "o-")
to
x_values = [0, 4, 7, 20, 22, 25]
y_values = [0, 2, 4, 8, 16, 32]
pyplot.plot(x_values, y_values, "o-")
Note how matplotlib automatically resizes the graph to fit all of the points in the figure for you.
- Read about how to generate random integers on http://docs.python.org/library/random.html#random.randint. Then, instead of hard-coding x values and y values in basic_plot.py, generate a list of random y values. An example plot using random y values might look like this:
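For the random-values step, one possible approach is sketched here. The helper names random_y_values and plot_random_values, and the 0 to 32 range, are assumptions made for illustration; they are not part of basic_plot.py.

```python
import random


def random_y_values(count, low=0, high=32):
    """Return `count` random integers, each between low and high inclusive."""
    return [random.randint(low, high) for _ in range(count)]


def plot_random_values(y_values):
    """Draw the values as dots joined by lines and open the plot window."""
    # matplotlib is imported here so the helper above can be tried on its own.
    from matplotlib import pyplot

    pyplot.plot(y_values, "o-")
    pyplot.show()


# Six random y values; matplotlib supplies the x values when only one list is given.
y_values = random_y_values(6)
print(y_values)
```

Calling plot_random_values(y_values) afterwards opens the plot window. Each run produces a different list, so the graph changes every time.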
Read these short documents:
- Pyplot tutorial (just this one section; stop before the next section "Controlling line properties"): http://matplotlib.sourceforge.net/users/pyplot_tutorial.html#pyplot-tutorial
- List of line options, including line style and marker shapes and colors: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot
Check your understanding:
- What does matplotlib pick as the x values if you don't supply them yourself?
- What options would you pass to pyplot.plot to generate a plot with red triangles and dotted lines?
2. Plotting the world population over time
- Run python world_population.py. This will pop up a window with a dot plot of the world population over the last 10,000 years.
- Open world_population.py and read through the code in this file. In this example, we read our data from a file. Open the data file world_population.txt and examine the format of the file.
- Find the documentation on http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot for customizing the linewidth of plots. Then change the world population plot to use a magenta, down-triangle marker and a linewidth of 2.
World population resources:
- File input and output: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files.
- Splitting strings into parts based on a delimiter: http://www.hacksparrow.com/python-split-string-method-and-examples.html
Check your understanding:
- In world_population.py, what does file("world_population.txt", "r").readlines() return?
- In world_population.py, what does point.split() return?
3. Plotting life expectancy over time
In a new file, write code to plot the data in life_expectancies_usa.txt. The format in this file is <year>,<male life expectancy>,<female life expectancy>.
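As a rough sketch of how lines in that format could be turned into lists ready for plotting, assuming one record per line: the function name parse_life_expectancies is hypothetical, and the sample values are placeholders, not data from life_expectancies_usa.txt.

```python
def parse_life_expectancies(lines):
    """Split '<year>,<male>,<female>' lines into three parallel lists."""
    years, male, female = [], [], []
    for line in lines:
        line = line.strip()  # drop the trailing newline
        if not line:
            continue  # skip blank lines
        year, male_value, female_value = line.split(",")
        years.append(int(year))
        male.append(float(male_value))
        female.append(float(female_value))
    return years, male, female


# Placeholder lines in the same shape that readlines() would return.
sample_lines = ["2000,70.0,75.0\n", "2001,70.5,75.5\n"]
years, male, female = parse_life_expectancies(sample_lines)
print(years, male, female)
```

With a real file, the list passed in would come from calling readlines() on the opened file instead of sample_lines.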
You can call pyplot.plot multiple times to draw multiple lines on the same figure. For example:
pyplot.plot(my_data_1, "mo-", label="my data 1")
pyplot.plot(my_data_2, "bo-", label="my data 2")
will plot my_data_1 in magenta and my_data_2 in blue on the same figure.
Supply labels for your plots, like above. Then use pyplot.legend to give your graph a legend.
Your graph should look something like this:
To save your graph to a file instead of or in addition to displaying it, call pyplot.savefig.
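Putting the labels, legend, and saving together might look like the following minimal sketch. The function name plot_two_series and its output_path parameter are assumptions for illustration; only pyplot.plot, pyplot.legend, pyplot.savefig, and pyplot.show are matplotlib calls.

```python
def plot_two_series(series_1, series_2, output_path=None):
    """Plot two lists on one figure with a legend; optionally save it to a file."""
    # Imported inside the function to keep the sketch self-contained.
    from matplotlib import pyplot

    pyplot.plot(series_1, "mo-", label="my data 1")  # magenta dots and lines
    pyplot.plot(series_2, "bo-", label="my data 2")  # blue dots and lines
    pyplot.legend()  # builds the legend from the label= text above
    if output_path is not None:
        pyplot.savefig(output_path)  # save before show(), while the figure still exists
    pyplot.show()
```

For example, plot_two_series([1, 2, 3], [2, 4, 6], "my_graph.png") would display the figure and also write it to my_graph.png, a file name chosen here only as an example.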
Life expectancy resources:
- File input and output: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files.
- Splitting strings into parts based on a delimiter: http://www.hacksparrow.com/python-split-string-method-and-examples.html
- Examples of legends:
- Ways to configure your legend: http://matplotlib.sourceforge.net/api/legend_api.html
- Saving your graph to a file: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.savefig
Bonus exercises
1. Random category clues
Write a script that randomly chooses a category and prints clues from that category.
tip: SQL supports an "ORDER BY RANDOM()" clause that will return rows in a random order. For example, to randomly pick 1 category id you could use:
SELECT id FROM category ORDER BY RANDOM() LIMIT 1
You can also use ORDER BY to sort the clues by value.
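A minimal sketch of running both kinds of ORDER BY from Python follows. It uses a tiny in-memory database so it can be tried anywhere. Apart from the category table's id field, the column names (name, category, value, text) and the rows are assumptions made for illustration; check the real schema with .schema at a sqlite prompt before adapting it.

```python
import sqlite3

# Hypothetical miniature database; the real jeopardy.db schema may differ.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("CREATE TABLE clue (category INTEGER, value INTEGER, text TEXT)")
cursor.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(1, "SAMPLE CATEGORY A"), (2, "SAMPLE CATEGORY B")],
)
cursor.executemany(
    "INSERT INTO clue VALUES (?, ?, ?)",
    [(1, 400, "clue a-400"), (1, 200, "clue a-200"),
     (2, 200, "clue b-200"), (2, 400, "clue b-400")],
)

# Pick one category at random.
cursor.execute("SELECT id, name FROM category ORDER BY RANDOM() LIMIT 1")
category_id, category_name = cursor.fetchone()

# Fetch that category's clues, sorted from lowest to highest value.
cursor.execute(
    "SELECT value, text FROM clue WHERE category = ? ORDER BY value",
    (category_id,),
)
clues = cursor.fetchall()

print(category_name)
for value, text in clues:
    print("[$%d] %s" % (value, text))
```

Passing category_id as a ? parameter, rather than pasting it into the SQL string, lets sqlite3 handle the quoting.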
Example output:
5 GUYS NAMED MOE
[$200] Last name of Moe of the Three Stooges
[$400] Moe Strauss founded this auto parts chain along with Manny Rosenfield & Jack Jackson
[$600] Major league catcher Moe Berg was also a WWII spy for this agency, precursor of the CIA
[$800] Term for the type of country music Moe Bandy plays, the clubs where he began, or the "Queen" he sang of in 1981
[$1000] This "Kool" rapper's album "How Ya Like Me Now" began a rivalry with LL Cool J
Exercises resources:
- Using ORDER BY: http://www.w3schools.com/sql/sql_orderby.asp
2. Random game categories
Write a script to randomly choose a game number and print the categories from that game.
tip: the category table has game and round fields. Round 0 is the Jeopardy round, round 1 is the Double Jeopardy round, and round 2 is Final Jeopardy.
Example output:
Categories for game #136:
0 WELCOME TO MY COUNTRY
0 METALS
0 GO GO GAUGUIN
0 FILE UNDER "M"
0 ANIMATED CATS
0 MAO MAO MAO MAO
1 SHAKESPEARE'S OPENING LINES
1 HEY, MARIO!
1 BRIDGE ON THE RIVER....
1 RUNNING MATES
1 13-LETTER WORDS
1 TONY BENNETT'S SONGBOOK
2 IN THE NEWS 2000
3. Top 20 Jeopardy categories
Read about the GROUP BY clause and write a script using it to print the 20 most common Jeopardy categories.
An example of using GROUP BY and ORDER BY to produce an ordered list of counts on a hypothetical foo field is:
SELECT foo, COUNT(foo) AS count FROM my_table GROUP BY foo ORDER BY count
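To see what a query of this shape returns, here is a small sketch against an in-memory table. The table my_table, the field foo, and the rows are all hypothetical. The alias foo_count, the DESC keyword, and the LIMIT are additions to the query above so that the largest counts come first.

```python
import sqlite3

# Hypothetical table with a single foo column and a few repeated values.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE my_table (foo TEXT)")
cursor.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("a",), ("b",)],
)

# Count the rows for each distinct foo, largest count first, top 2 only.
cursor.execute(
    "SELECT foo, COUNT(foo) AS foo_count FROM my_table "
    "GROUP BY foo ORDER BY foo_count DESC LIMIT 2"
)
top_rows = cursor.fetchall()

for foo, foo_count in top_rows:
    print(foo_count, foo)
```

Without DESC the rows come back with the smallest counts first, because ORDER BY sorts in ascending order by default.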
Example output:
81 LITERATURE
79 BEFORE & AFTER
73 WORD ORIGINS
71 SCIENCE
64 BUSINESS & INDUSTRY
63 AMERICAN HISTORY
...
Congratulations!
You've learned about SQL and making database queries from within Python. Keep practicing!