JeopardyDatabase: Difference between revisions

From OpenHatch wiki
Content added Content deleted
imported>Jesstess
 
(19 intermediate revisions by 4 users not shown)
Line 9: Line 9:
* practice using the SQL database query language
* practice using the SQL database query language
* practice getting data from a database in Python
* practice getting data from a database in Python



== Project setup ==
== Project setup ==


=== Install the project dependencies ===
=== 1. Download and un-archive the Jeopardy database project skeleton code ===

* Download and install the SQLite database engine. Precompiled SQLite binaries for all platforms can be found at: http://www.sqlite.org/download.html
* Download and install the <code>pysqlite2</code> SQLite Python bindings.
** On Windows: install the latest <code>pysqlite</code> <code>.exe</code> from http://code.google.com/p/pysqlite/downloads/list
** On OS X or Linux: either install through your package manager or from source at http://code.google.com/p/pysqlite/downloads/list

=== Download and un-archive the Jeopardy database project skeleton code ===


* http://web.mit.edu/jesstess/www/IntermediatePythonWorkshop/JeopardyDatabase.zip
* http://web.mit.edu/jesstess/www/IntermediatePythonWorkshop/JeopardyDatabase.zip
Line 25: Line 19:
Un-archiving will produce a <code>JeopardyDatabase</code> folder containing 3 Python files and one SQL database dump.
Un-archiving will produce a <code>JeopardyDatabase</code> folder containing 3 Python files and one SQL database dump.


=== Create a SQLite database from the database dump ===
=== 2. Install the project dependencies ===

==== On Windows ====

# Download the SQLite command line binary from http://www.sqlite.org/download.html. Download the first link under "Precompiled Binaries For Windows".
# Copy <code>sqlite3.exe</code> into your <code>JeopardyDatabase</code> folder.

==== On OS X ====

* You already have SQLite installed and don't have to do anything!

==== On Linux ====

* Install the <code>sqlite3</code> package through your package manager.

=== 3. Create a SQLite database from the database dump ===


Inside <code>JeopardyDatabase</code> is a file called <code>jeopardy.dump</code> which contains a SQL database dump. We need to turn that database dump into a SQLite database.
Inside the <code>JeopardyDatabase</code> folder is a file called <code>jeopardy.dump</code> which contains a SQL database dump. We need to turn that database dump into a SQLite database.


Once you have SQLite installed, you can create a database from jeopardy.dump with:
Once you have SQLite installed, you can create a database from jeopardy.dump with (this needs to be run in the command prompt):


<pre>sqlite3 jeopardy.db < jeopardy.dump</pre>
<pre>sqlite3 jeopardy.db < jeopardy.dump</pre>
Line 35: Line 44:
This creates a sqlite3 database called <code>jeopardy.db</code>
This creates a sqlite3 database called <code>jeopardy.db</code>


=== Test your setup ===
=== 4. Test your setup ===


At a command prompt, start <code>sqlite3</code> using the <code>jeopardy.db</code> database by running:
At a command prompt, start <code>sqlite3</code> using the <code>jeopardy.db</code> database by running:
Line 54: Line 63:
category clue
category clue
sqlite></pre>
sqlite></pre>

To quit sqlite:

<pre>sqlite> .quit</pre>


From a command prompt, navigate to the <code>JeopardyDatabase</code> directory and run
From a command prompt, navigate to the <code>JeopardyDatabase</code> directory and run


<pre>python jeopardy_categories.py</pre>
<pre>python jeopardy_categories.py</pre>

If you run into a syntax error because of "print", go into the file and make sure the printed statement is encased in a ()


You should see a list of 10 jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.
You should see a list of 10 jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.
Line 65: Line 80:
=== 1. Look at the layout of the Jeopardy database ===
=== 1. Look at the layout of the Jeopardy database ===


Start sqlite with:
Start sqlite with <code>sqlite3 jeopardy.db</code>. Then look at the tables in your database by running the following commands and the sqlite prompt:
<pre>sqlite3 jeopardy.db</pre>

Then look at the tables in your database by running the following commands and the sqlite prompt:


* <tt>.table</tt>, which will list the tables in the database
* <tt>.table</tt>, which will list the tables in the database
Line 93: Line 111:
* What fields are in the <code>category</code> table?
* What fields are in the <code>category</code> table?
* What fields are in the <code>clue</code> table?
* What fields are in the <code>clue</code> table?



=== 2. Query the database with SELECT ===
=== 2. Query the database with SELECT ===
Line 120: Line 137:




=== 3. Make database queries from Python ===
=== 3. Examine the nested <code>for</code> loop in <code>SolidColorTest</code> ===


Examine the code in <code>jeopardy_categories.py</code>. To make a database query from Python, you need to:
<pre>for x in range(self.wall.width):
# Import a Python library for making database connections
for y in range(self.wall.height):
# Establish a connection to the desired database
self.wall.set_pixel(x, y, hsv)</pre>
# Get a cursor from the database for making queries
# Execute the database query using the standard SQL syntax
# Retrieve the list of results from the database cursor
# Do something useful with the results, like print them
# Close the database connection


Match up each of these steps with lines of code in the file.
This code loops over every pixel in the ColorWall, setting the pixel to a particular <code>hsv</code> value. After that <code>for</code> loop is over, <code>self.wall.draw()</code> updates the display.


<b>Check your understanding</b>: what would happen if you moved the <code>self.wall.draw()</code> to inside the inner <code>for</code> loop, just under <code>self.wall.set_pixel(x, y, hsv)</code> in <code>SaturationTest</code>? (Try it!)


=== 4. Tweak the existing Jeopardy scripts ===
<b>Tip</b>: you can run individual tests by passing their names as command line arguments to <code>run.py</code>. For example, if you only wanted to run <code>SaturationTest</code>, you could:


==== 1. Modify <code>jeopardy_categories.py</code> to print both the category and game number ====
<pre>python run.py SaturationTest</pre>


<b>tip</b>: Remind yourself of the <code>categories</code> schema by running <code>.schema category</code> at a sqlite prompt.
<br />


<b>Example output</b>:
=== 4. Implement a new effect called <code>RainbowTest</code> ===
<pre>Example categories:


DETECTIVE FICTION (game #1)
It should run for 5 seconds, cycling through the colors in the rainbow, pausing for a moment at each color.
THE OLD TESTAMENT (game #2)
ASIAN HISTORY (game #4)
RIVER SOURCES (game #5)
WORLD RELIGION (game #3)
SEAN SONG (game #2)
ANIMATED MOVIES (game #1)
NEW YORK CITY (game #6)
AFRICAN WILDLIFE (game #7)
LITTLE RED RIDING HOOD (game #8)</pre>


==== 2. Modify <code>jeopardy_clues.py</code> to only print clues with an $800 value. ====
Remember to add your effect to the <code>Effect</code> list at the bottom of <code>effects.py</code>!


A good way to achieve this is by adding a <code>WHERE</code> clause to the SQL query in <code>jeopardy_clues.py</code>.
Test your new effect with


Read about <code>WHERE</code> clauses in this short document:
<pre>python run.py RainbowTest</pre>
* The WHERE Clause: http://www.w3schools.com/sql/sql_where.asp


<br />
<b>Example output</b>:


<pre>Example clues:
=== 5. Play with the randomness in <code>Twinkle</code> ===


[$800]
Walk through <code>Twinkle</code>. Find explanations of the <code>random.randint</code> and <code>random.uniform</code> functions in the online documentation at http://docs.python.org/library/random.html.
A: She also created the detectives Tuppence &amp; Tommy Beresford
Q: What is 'Agatha Christie'


[$800]
Experiment with these functions at a Python prompt:
A: According to this Old Testament book, this &quot;swords into plowshares&quot; prophet walked naked for 3 years
Q: What is 'Isaiah'
...</pre>


=== 5. Daily Doubles ===
<pre>
import random
random.randint(0, 1)
random.randint(0, 5)
random.uniform(-1, 1)
</pre>

Then experiment with the numbers that make up the hue and re-run the effect:

<pre>
python run.py Twinkle
</pre>


Write a script that prints 10 daily doubles and their responses.
<b>Challenge</b>: make <code>Twinkle</code> twinkle with shades of red.


<br />
<b>tip</b>: The <code>clue</code> table has an <code>isDD</code> field.
<b>tip</b>: Note that the category name is in the category table, and the question and answer are in the clue table.


<br />
=== 6. Implement a new effect that involves randomness! ===
<b>Example output</b>:


<pre>Category: NEW YORK CITY
Remember to add your effect to the <code>Effect</code> list at the bottom of <code>effects.py</code>.
Question: The heart of Little Italy is this street also found in a Dr. Seuss book title
Answer: Mulberry Street
===
Category: RIVER SOURCES
Question: This Mideastern boundary river rises on the slopes of Mount Hermon
Answer: the Jordan
===
Category: ROOM
Question: The Titanic has 3 rooms for this--only men were allowed there, as women weren't supposed to do it in public
Answer: smoking
...</pre>


==Bonus exercises==
==Bonus exercises==


===1. Checkerboard===
=== 1. Random category clues ===


Write a script that randomly chooses a category and prints clues from that category.
Find and change the colors used in the <code>Checkerboards</code> effect, and re-run the effect:


<pre>
<br />
<b>tip</b>: SQL supports an "<code>ORDER BY RANDOM()</code>" clause that will return rows in a random order. For example, to randomly pick 1 category id you could use:
python run.py Checkerboards
</pre>


<pre>SELECT id FROM category ORDER BY RANDOM() LIMIT 1;</pre>
Then change the line


You can also use <code>ORDER BY</code> to sort the clues by value.
<pre>
if (x + y + i) % 2 == 0:
</pre>


<br />
to
<b>Example output</b>:


<pre>
<pre>5 GUYS NAMED MOE
[$200] Last name of Moe of the Three Stooges
if (x + y + i) % 3 == 0:
[$400] Moe Strauss founded this auto parts chain along with Manny Rosenfield &amp; Jack Jackson
</pre>
[$600] Major league catcher Moe Berg was also a WWII spy for this agency, precursor of the CIA
[$800] Term for the type of country music Moe Bandy plays, the clubs where he began, or the &quot;Queen&quot; he sang of in 1981
[$1000] This &quot;Kool&quot; rapper's album &quot;How Ya Like Me Now&quot; began a rivalry with LL Cool J</pre>


<b>Exercises resources</b>:
re-run the effect, and see what changed.
<ul>
<li>
Using ORDER BY: http://www.w3schools.com/sql/sql_orderby.asp
</li>
</ul>


=== 2. Random game categories ===
What other patterns can you create by tweaking the math for this effect?


Write a script to randomly choose a game number and print the categories from that game.
<br />


<b>tip</b>: the <code>category</code> table has <code>game</code> and <code>round</code> fields. round 0 is the Jeopardy round, round 1 is the Double Jeopardy round, and round 2 is Final Jeopardy.
===2. Matrix ===


<br />
Find and change the color of the columns in the <code>Matrix</code> effect, and re-run the effect:

<b>Example output</b>:
<pre>Categories for game #136:
0 WELCOME TO MY COUNTRY
0 METALS
0 GO GO GAUGUIN
0 FILE UNDER &quot;M&quot;
0 ANIMATED CATS
0 MAO MAO MAO MAO
1 SHAKESPEARE'S OPENING LINES
1 HEY, MARIO!
1 BRIDGE ON THE RIVER....
1 RUNNING MATES
1 13-LETTER WORDS
1 TONY BENNETT'S SONGBOOK
2 IN THE NEWS 2000</pre>


<pre>
python run.py Matrix
</pre>


=== 3. Top 20 Jeopardy categories ===
Each column that we see on the wall corresponds to a <code>Column</code> object. Add some randomness to the color used by each column (the variable whose value you changed above) using the <code>random.random</code> function, re-run the effect, and see what happens.


Read about the <code>GROUP BY</code> clause and write a script using it to print the 20 most common Jeopardy categories.


An example of using <code>GROUP BY</code> and <code>ORDER BY</code> to produce an ordered list of counts on a hypothetical <code>foo</code> field is:
===3. Write more of your own effects! ===


<pre>SELECT foo, COUNT(foo) AS count FROM my_table GROUP BY foo ORDER BY count;</pre>
You have color, time, randomness, letters, and more at your disposal. Go nuts!


<b>Example output</b>:
<pre>81 LITERATURE
79 BEFORE &amp; AFTER
73 WORD ORIGINS
71 SCIENCE
64 BUSINESS &amp; INDUSTRY
63 AMERICAN HISTORY
...</pre>


===Congratulations!===
===Congratulations!===


You've read, modified, and added code to a software project that makes art out of pixels. Keep practicing!
You've learned about SQL and making database queries from within Python. Keep practicing!


[[File:Fireworks.png|150px]]
[[File:Fireworks.png|150px]]

Latest revision as of 16:20, 2 November 2014

Project

Learn how to get data from a database in Python while writing parts of a Jeopardy game, using real Jeopardy data!

Goals

  • practice using the SQL database query language
  • practice getting data from a database in Python


Project setup

1. Download and un-archive the Jeopardy database project skeleton code

Un-archiving will produce a JeopardyDatabase folder containing 3 Python files and one SQL database dump.

2. Install the project dependencies

On Windows

  1. Download the SQLite command line binary from http://www.sqlite.org/download.html. Download the first link under "Precompiled Binaries For Windows".
  2. Copy sqlite3.exe into your JeopardyDatabase folder.

On OS X

  • You already have SQLite installed and don't have to do anything!

On Linux

  • Install the sqlite3 package through your package manager.

3. Create a SQLite database from the database dump

Inside the JeopardyDatabase folder is a file called jeopardy.dump which contains a SQL database dump. We need to turn that database dump into a SQLite database.

Once you have SQLite installed, you can create a database from jeopardy.dump with (this needs to be run in the command prompt):

sqlite3 jeopardy.db < jeopardy.dump

This creates a sqlite3 database called jeopardy.db

4. Test your setup

At a command prompt, start sqlite3 using the jeopardy.db database by running:

sqlite3 jeopardy.db

That should start a sqlite prompt that looks like this:

SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

At that sqlite prompt, type .tables and hit enter. That should display a list of the tables in this database:

sqlite> .tables
category  clue    
sqlite>

To quit sqlite:

sqlite> .quit

From a command prompt, navigate to the JeopardyDatabase directory and run

python jeopardy_categories.py

If you run into a syntax error because of "print", go into the file and make sure the printed statement is encased in a ()

You should see a list of 10 jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.

Project steps

1. Look at the layout of the Jeopardy database

Start sqlite with:

sqlite3 jeopardy.db

Then look at the tables in your database by running the following commands and the sqlite prompt:

  • .table, which will list the tables in the database
  • .schema category, which will show the organization of the category table, including the fields and the data types they store.

It should look like this:

sqlite> .schema category
CREATE TABLE "category" (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    game INTEGER NOT NULL,
    boardPosition INTEGER
    );

This tells us that the category table has 4 fields: id, name, game, and boardPosition.

  • .schema clue

Read these short documents:

Check your understanding:

  • What tables are in the database?
  • What is a schema?
  • What fields are in the category table?
  • What fields are in the clue table?

2. Query the database with SELECT

Try running the following queries from the sqlite prompt:

  • SELECT * FROM category;
  • SELECT NAME FROM category;
  • SELECT * FROM clue;
  • SELECT text, answer, value FROM clue;
  • SELECT text, answer, value FROM clue LIMIT 10;

Explore the category and clue tables with your own SELECT queries.

Check your understanding:

  • What does * mean in the above queries?
  • What does the LIMIT SQL keyword do?
  • Does case matter when making SQL queries?

Step 2 resources:


3. Make database queries from Python

Examine the code in jeopardy_categories.py. To make a database query from Python, you need to:

  1. Import a Python library for making database connections
  2. Establish a connection to the desired database
  3. Get a cursor from the database for making queries
  4. Execute the database query using the standard SQL syntax
  5. Retrieve the list of results from the database cursor
  6. Do something useful with the results, like print them
  7. Close the database connection

Match up each of these steps with lines of code in the file.


4. Tweak the existing Jeopardy scripts

1. Modify jeopardy_categories.py to print both the category and game number

tip: Remind yourself of the categories schema by running .schema category at a sqlite prompt.

Example output:

Example categories:

DETECTIVE FICTION (game #1)
THE OLD TESTAMENT (game #2)
ASIAN HISTORY (game #4)
RIVER SOURCES (game #5)
WORLD RELIGION (game #3)
SEAN SONG (game #2)
ANIMATED MOVIES (game #1)
NEW YORK CITY (game #6)
AFRICAN WILDLIFE (game #7)
LITTLE RED RIDING HOOD (game #8)

2. Modify jeopardy_clues.py to only print clues with an $800 value.

A good way to achieve this is by adding a WHERE clause to the SQL query in jeopardy_clues.py.

Read about WHERE clauses in this short document:


Example output:

Example clues:

[$800]
A: She also created the detectives Tuppence & Tommy Beresford
Q: What is 'Agatha Christie'

[$800]
A: According to this Old Testament book, this "swords into plowshares" prophet walked naked for 3 years
Q: What is 'Isaiah'
...

5. Daily Doubles

Write a script that prints 10 daily doubles and their responses.


tip: The clue table has an isDD field. tip: Note that the category name is in the category table, and the question and answer are in the clue table.


Example output:

Category: NEW YORK CITY
Question: The heart of Little Italy is this street also found in a Dr. Seuss book title
Answer: Mulberry Street
===
Category: RIVER SOURCES
Question: This Mideastern boundary river rises on the slopes of Mount Hermon
Answer: the Jordan
===
Category: ROOM
Question: The Titanic has 3 rooms for this--only men were allowed there, as women weren't supposed to do it in public
Answer: smoking
...

Bonus exercises

1. Random category clues

Write a script that randomly chooses a category and prints clues from that category.


tip: SQL supports an "ORDER BY RANDOM()" clause that will return rows in a random order. For example, to randomly pick 1 category id you could use:

SELECT id FROM category ORDER BY RANDOM() LIMIT 1;

You can also use ORDER BY to sort the clues by value.


Example output:

5 GUYS NAMED MOE
[$200] Last name of Moe of the Three Stooges
[$400] Moe Strauss founded this auto parts chain along with Manny Rosenfield & Jack Jackson
[$600] Major league catcher Moe Berg was also a WWII spy for this agency, precursor of the CIA
[$800] Term for the type of country music Moe Bandy plays, the clubs where he began, or the "Queen" he sang of in 1981
[$1000] This "Kool" rapper's album "How Ya Like Me Now" began a rivalry with LL Cool J

Exercises resources:

2. Random game categories

Write a script to randomly choose a game number and print the categories from that game.

tip: the category table has game and round fields. round 0 is the Jeopardy round, round 1 is the Double Jeopardy round, and round 2 is Final Jeopardy.


Example output:

Categories for game #136:
0 WELCOME TO MY COUNTRY
0 METALS
0 GO GO GAUGUIN
0 FILE UNDER "M"
0 ANIMATED CATS
0 MAO MAO MAO MAO
1 SHAKESPEARE'S OPENING LINES
1 HEY, MARIO!
1 BRIDGE ON THE RIVER....
1 RUNNING MATES
1 13-LETTER WORDS
1 TONY BENNETT'S SONGBOOK
2 IN THE NEWS 2000


3. Top 20 Jeopardy categories

Read about the GROUP BY clause and write a script using it to print the 20 most common Jeopardy categories.

An example of using GROUP BY and ORDER BY to produce an ordered list of counts on a hypothetical foo field is:

SELECT foo, COUNT(foo) AS count FROM my_table GROUP BY foo ORDER BY count;

Example output:

81 LITERATURE
79 BEFORE & AFTER
73 WORD ORIGINS
71 SCIENCE
64 BUSINESS & INDUSTRY
63 AMERICAN HISTORY
...

Congratulations!

You've learned about SQL and making database queries from within Python. Keep practicing!