Matplotlib

From OpenHatch wiki
Revision as of 16:17, 26 July 2012 by Jesstess

Project

Learn how to plot data with the matplotlib plotting library. Ditch Excel forever!

Goals

  • practice reading data from a file
  • practice using the matplotlib Python plotting library to analyze data and generate graphs

Project setup

Install the project dependencies

Please follow the official matplotlib installation instructions at http://matplotlib.sourceforge.net/users/installing.html

The dependencies vary across operating systems. http://matplotlib.sourceforge.net/users/installing.html#build-requirements summarizes what you'll need for your operating system.

A universal dependency is the NumPy scientific computing library. NumPy has download and installation instructions at http://numpy.scipy.org/

Installing matplotlib and its dependencies is somewhat involved; please ask for help if you get stuck or don't know where to start!

Mac OS X users only

If you do not already have a C compiler installed, you'll need one to install matplotlib. You have several options depending on your situation:

  1. Download and install Xcode (1.5 GB) from https://developer.apple.com/xcode/
  2. Download and install Command Line Tools for Xcode (175 MB) from https://developer.apple.com/downloads/index.action. This requires an Apple Developer account (free, but you have to sign up).
  3. Download and install kennethreitz's gcc installer (requires 10.6 or 10.7) from https://github.com/kennethreitz/osx-gcc-installer/

Please wave over a staff member and we'll help you pick which option is best for your computer.


Download and un-archive the Jeopardy database project skeleton code

Un-archiving will produce a JeopardyDatabase folder containing 3 Python files and one SQL database dump.

Create a SQLite database from the database dump

Inside JeopardyDatabase is a file called jeopardy.dump which contains a SQL database dump. We need to turn that database dump into a SQLite database.

Once you have SQLite installed, you can create a database from jeopardy.dump with:

sqlite3 jeopardy.db < jeopardy.dump

This creates a SQLite database called jeopardy.db.

Test your setup

At a command prompt, start sqlite3 using the jeopardy.db database by running:

sqlite3 jeopardy.db

That should start a sqlite prompt that looks like this:

SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

At that sqlite prompt, type .tables and hit enter. That should display a list of the tables in this database:

sqlite> .tables
category  clue    
sqlite>

From a command prompt, navigate to the JeopardyDatabase directory and run

python jeopardy_categories.py

You should see a list of 10 Jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.

Project steps

1. Look at the layout of the Jeopardy database

Start sqlite with sqlite3 jeopardy.db. Then look at the tables in your database by running the following commands at the sqlite prompt:

  • .tables, which will list the tables in the database
  • .schema category, which will show the organization of the category table, including the fields and the data types they store.

It should look like this:

sqlite> .schema category
CREATE TABLE "category" (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    game INTEGER NOT NULL,
    boardPosition INTEGER
    );

This tells us that the category table has 4 fields: id, name, game, and boardPosition.

  • .schema clue, which will show the organization of the clue table in the same way.

Read these short documents:

Check your understanding:

  • What tables are in the database?
  • What is a schema?
  • What fields are in the category table?
  • What fields are in the clue table?


2. Query the database with SELECT

Try running the following queries from the sqlite prompt:

  • SELECT * FROM category;
  • SELECT NAME FROM category;
  • SELECT * FROM clue;
  • SELECT text, answer, value FROM clue;
  • SELECT text, answer, value FROM clue LIMIT 10;

Explore the category and clue tables with your own SELECT queries.

Check your understanding:

  • What does * mean in the above queries?
  • What does the LIMIT SQL keyword do?
  • Does case matter when making SQL queries?



3. Make database queries from Python

Examine the code in jeopardy_categories.py. To make a database query from Python, you need to:

  1. Import a Python library for making database connections
  2. Establish a connection to the desired database
  3. Get a cursor from the database for making queries
  4. Execute the database query using the standard SQL syntax
  5. Retrieve the list of results from the database cursor
  6. Do something useful with the results, like print them
  7. Close the database connection

Match up each of these steps with lines of code in the file.
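
The seven steps above can be sketched as follows. This is only an illustration and not the contents of jeopardy_categories.py: it assumes Python's built-in sqlite3 module, and so that it runs without jeopardy.db it builds a throwaway in-memory category table with three sample rows (the boardPosition values are made up).

```python
import sqlite3  # 1. Import a library for making database connections.

# 2. Establish a connection. ":memory:" is a throwaway database used so this
#    sketch runs anywhere; a project script would connect to "jeopardy.db".
connection = sqlite3.connect(":memory:")

# 3. Get a cursor from the database for making queries.
cursor = connection.cursor()

# Setup for this sketch only: recreate the category table and add sample rows.
cursor.execute(
    "CREATE TABLE category ("
    "id INTEGER PRIMARY KEY, name VARCHAR(255) NOT NULL, "
    "game INTEGER NOT NULL, boardPosition INTEGER)"
)
cursor.executemany(
    "INSERT INTO category (name, game, boardPosition) VALUES (?, ?, ?)",
    [
        ("DETECTIVE FICTION", 1, 1),  # boardPosition values are made up
        ("THE OLD TESTAMENT", 2, 1),
        ("ASIAN HISTORY", 4, 1),
    ],
)

# 4. Execute the database query using standard SQL syntax.
cursor.execute("SELECT name FROM category ORDER BY id LIMIT 10")

# 5. Retrieve the list of results from the database cursor.
results = cursor.fetchall()

# 6. Do something useful with the results, like print them.
for (name,) in results:
    print(name)

# 7. Close the database connection.
connection.close()
```

Each row comes back as a tuple, even when only one column is selected, which is why the loop unpacks (name,).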


4. Tweak the existing Jeopardy scripts

1. Modify jeopardy_categories.py to print both the category and game number

tip: Remind yourself of the category table's schema by running .schema category at a sqlite prompt.


Example output:

Example categories:

DETECTIVE FICTION (game #1)
THE OLD TESTAMENT (game #2)
ASIAN HISTORY (game #4)
RIVER SOURCES (game #5)
WORLD RELIGION (game #3)
SEAN SONG (game #2)
ANIMATED MOVIES (game #1)
NEW YORK CITY (game #6)
AFRICAN WILDLIFE (game #7)
LITTLE RED RIDING HOOD (game #8)

2. Modify jeopardy_clues.py to only print clues with an $800 value.

A good way to achieve this is by adding a WHERE clause to the SQL query in jeopardy_clues.py.
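
As a rough sketch of what a WHERE clause does, the example below filters rows by value. It does not use the real database: it assumes a simplified clue table with only the text, answer, and value fields from step 2, assumes value is stored as an integer, and fills the table with made-up rows.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # throwaway database for this sketch
cursor = connection.cursor()

# Assumed, simplified clue table; the real table has more fields.
cursor.execute("CREATE TABLE clue (text TEXT, answer TEXT, value INTEGER)")
cursor.executemany(
    "INSERT INTO clue VALUES (?, ?, ?)",
    [
        ("Made-up clue worth 200", "made-up answer one", 200),
        ("Made-up clue worth 800", "made-up answer two", 800),
        ("Another made-up clue worth 800", "made-up answer three", 800),
    ],
)

# WHERE keeps only the rows whose value matches.
# The ? placeholder lets sqlite3 fill in the value safely.
cursor.execute("SELECT text, answer, value FROM clue WHERE value = ?", (800,))
rows = cursor.fetchall()

for text, answer, value in rows:
    print("[$%d]" % value)
    print("A: %s" % text)
    print("Q: What is '%s'" % answer)
    print()

connection.close()
```

If value turns out to be stored differently in jeopardy.db, the comparison in the WHERE clause would need to change to match.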

Read about WHERE clauses in this short document:


Example output:

Example clues:

[$800]
A: She also created the detectives Tuppence & Tommy Beresford
Q: What is 'Agatha Christie'

[$800]
A: According to this Old Testament book, this "swords into plowshares" prophet walked naked for 3 years
Q: What is 'Isaiah'
...


5. Daily Doubles

Write a script that prints 10 daily doubles and their responses.


tip: The clue table has an isDD field.


Example output:

Category: NEW YORK CITY
Question: The heart of Little Italy is this street also found in a Dr. Seuss book title
Answer: Mulberry Street
===
Category: RIVER SOURCES
Question: This Mideastern boundary river rises on the slopes of Mount Hermon
Answer: the Jordan
===
Category: ROOM
Question: The Titanic has 3 rooms for this--only men were allowed there, as women weren't supposed to do it in public
Answer: smoking
...

Bonus exercises

1. Random category clues

Write a script that randomly chooses a category and prints clues from that category.


tip: SQLite supports an "ORDER BY RANDOM()" clause that will return rows in a random order. For example, to randomly pick 1 category id you could use:

SELECT id FROM category ORDER BY RANDOM() LIMIT 1

You can also use ORDER BY to sort the clues by value.
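
Here is one way the two uses of ORDER BY might fit together, as a sketch under stated assumptions. The tables are simplified in-memory stand-ins with made-up rows, and the clue.category column linking a clue to its category is hypothetical; check .schema clue for the real field name.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # throwaway database for this sketch
cursor = connection.cursor()

# Simplified stand-in tables. The "category" column on clue is an assumption.
cursor.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("CREATE TABLE clue (text TEXT, value INTEGER, category INTEGER)")
cursor.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(1, "SAMPLE CATEGORY ONE"), (2, "SAMPLE CATEGORY TWO")],
)
cursor.executemany(
    "INSERT INTO clue VALUES (?, ?, ?)",
    [
        ("made-up clue a", 400, 1),
        ("made-up clue b", 200, 1),
        ("made-up clue c", 600, 2),
        ("made-up clue d", 200, 2),
    ],
)

# Pick one category at random.
cursor.execute("SELECT id, name FROM category ORDER BY RANDOM() LIMIT 1")
category_id, category_name = cursor.fetchone()

# Fetch that category's clues, cheapest first.
cursor.execute(
    "SELECT text, value FROM clue WHERE category = ? ORDER BY value",
    (category_id,),
)
clues = cursor.fetchall()

print(category_name)
for text, value in clues:
    print("[$%d] %s" % (value, text))

connection.close()
```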


Example output:

5 GUYS NAMED MOE
[$200] Last name of Moe of the Three Stooges
[$400] Moe Strauss founded this auto parts chain along with Manny Rosenfield & Jack Jackson
[$600] Major league catcher Moe Berg was also a WWII spy for this agency, precursor of the CIA
[$800] Term for the type of country music Moe Bandy plays, the clubs where he began, or the "Queen" he sang of in 1981
[$1000] This "Kool" rapper's album "How Ya Like Me Now" began a rivalry with LL Cool J



2. Random game categories

Write a script to randomly choose a game number and print the categories from that game.

tip: the category table has game and round fields. round 0 is the Jeopardy round, round 1 is the Double Jeopardy round, and round 2 is Final Jeopardy.


Example output:

Categories for game #136:
0 WELCOME TO MY COUNTRY
0 METALS
0 GO GO GAUGUIN
0 FILE UNDER "M"
0 ANIMATED CATS
0 MAO MAO MAO MAO
1 SHAKESPEARE'S OPENING LINES
1 HEY, MARIO!
1 BRIDGE ON THE RIVER....
1 RUNNING MATES
1 13-LETTER WORDS
1 TONY BENNETT'S SONGBOOK
2 IN THE NEWS 2000


3. Top 20 Jeopardy categories

Read about the GROUP BY clause and write a script using it to print the 20 most common Jeopardy categories.

An example of using GROUP BY and ORDER BY to produce an ordered list of counts on a hypothetical foo field is:

SELECT foo, COUNT(foo) AS count FROM my_table GROUP BY foo ORDER BY count
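
To see the query pattern in action, here is a small sketch run against the same hypothetical my_table and foo field, filled with made-up values. It adds DESC so the largest counts come first and LIMIT to keep only the top rows, and it names the count column total, an arbitrary choice for this example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # throwaway database for this sketch
cursor = connection.cursor()

# Hypothetical table with made-up values: "a" appears 3 times, "b" twice, "c" once.
cursor.execute("CREATE TABLE my_table (foo TEXT)")
cursor.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("a",), ("b",)],
)

# GROUP BY collapses equal foo values into one row each, COUNT tallies them,
# ORDER BY ... DESC sorts from most to least common, and LIMIT keeps the top 2.
cursor.execute(
    "SELECT foo, COUNT(foo) AS total FROM my_table "
    "GROUP BY foo ORDER BY total DESC LIMIT 2"
)
top = cursor.fetchall()

for foo, total in top:
    print(total, foo)

connection.close()
```

Without DESC, ORDER BY sorts in ascending order, so the least common values would be listed first.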

Example output:

81 LITERATURE
79 BEFORE & AFTER
73 WORD ORIGINS
71 SCIENCE
64 BUSINESS & INDUSTRY
63 AMERICAN HISTORY
...

Congratulations!

You've learned about SQL and making database queries from within Python. Keep practicing!