JeopardyDatabase

From OpenHatch wiki

Project

Learn how to get data from a database in Python while writing parts of a Jeopardy game, using real Jeopardy data!

Goals

  • practice using the SQL database query language
  • practice getting data from a database in Python


Project setup

1. Download and un-archive the Jeopardy database project skeleton code

Un-archiving will produce a JeopardyDatabase folder containing 3 Python files and one SQL database dump.

2. Install the project dependencies

On Windows

  1. Download the SQLite command line binary from http://www.sqlite.org/download.html. Download the first link under "Precompiled Binaries For Windows".
  2. Copy sqlite3.exe into your JeopardyDatabase folder.

On OS X

  • You already have SQLite installed and don't have to do anything!

On Linux

  • Install the sqlite3 package through your package manager.

3. Create a SQLite database from the database dump

Inside the JeopardyDatabase folder is a file called jeopardy.dump which contains a SQL database dump. We need to turn that database dump into a SQLite database.

Once you have SQLite installed, you can create a database from jeopardy.dump with (this needs to be run in the command prompt):

sqlite3 jeopardy.db < jeopardy.dump

This creates a sqlite3 database called jeopardy.db

4. Test your setup

At a command prompt, start sqlite3 using the jeopardy.db database by running:

sqlite3 jeopardy.db

That should start a sqlite prompt that looks like this:

SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

At that sqlite prompt, type .tables and hit enter. That should display a list of the tables in this database:

sqlite> .tables
category  clue    
sqlite>

To quit sqlite:

sqlite> .quit

From a command prompt, navigate to the JeopardyDatabase directory and run

python jeopardy_categories.py

If you run into a syntax error because of "print", go into the file and make sure the printed statement is encased in a ()

You should see a list of 10 jeopardy categories printed to the screen. If you don't, let a staff member know so you can debug this together.

Project steps

1. Look at the layout of the Jeopardy database

Start sqlite with:

sqlite3 jeopardy.db

Then look at the tables in your database by running the following commands and the sqlite prompt:

  • .table, which will list the tables in the database
  • .schema category, which will show the organization of the category table, including the fields and the data types they store.

It should look like this:

sqlite> .schema category
CREATE TABLE "category" (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    game INTEGER NOT NULL,
    boardPosition INTEGER
    );

This tells us that the category table has 4 fields: id, name, game, and boardPosition.

  • .schema clue

Read these short documents:

Check your understanding:

  • What tables are in the database?
  • What is a schema?
  • What fields are in the category table?
  • What fields are in the clue table?

2. Query the database with SELECT

Try running the following queries from the sqlite prompt:

  • SELECT * FROM category;
  • SELECT NAME FROM category;
  • SELECT * FROM clue;
  • SELECT text, answer, value FROM clue;
  • SELECT text, answer, value FROM clue LIMIT 10;

Explore the category and clue tables with your own SELECT queries.

Check your understanding:

  • What does * mean in the above queries?
  • What does the LIMIT SQL keyword do?
  • Does case matter when making SQL queries?

Step 2 resources:


3. Make database queries from Python

Examine the code in jeopardy_categories.py. To make a database query from Python, you need to:

  1. Import a Python library for making database connections
  2. Establish a connection to the desired database
  3. Get a cursor from the database for making queries
  4. Execute the database query using the standard SQL syntax
  5. Retrieve the list of results from the database cursor
  6. Do something useful with the results, like print them
  7. Close the database connection

Match up each of these steps with lines of code in the file.


4. Tweak the existing Jeopardy scripts

1. Modify jeopardy_categories.py to print both the category and game number

tip: Remind yourself of the categories schema by running .schema category at a sqlite prompt.

Example output:

Example categories:

DETECTIVE FICTION (game #1)
THE OLD TESTAMENT (game #2)
ASIAN HISTORY (game #4)
RIVER SOURCES (game #5)
WORLD RELIGION (game #3)
SEAN SONG (game #2)
ANIMATED MOVIES (game #1)
NEW YORK CITY (game #6)
AFRICAN WILDLIFE (game #7)
LITTLE RED RIDING HOOD (game #8)

2. Modify jeopardy_clues.py to only print clues with an $800 value.

A good way to achieve this is by adding a WHERE clause to the SQL query in jeopardy_clues.py.

Read about WHERE clauses in this short document:


Example output:

Example clues:

[$800]
A: She also created the detectives Tuppence & Tommy Beresford
Q: What is 'Agatha Christie'

[$800]
A: According to this Old Testament book, this "swords into plowshares" prophet walked naked for 3 years
Q: What is 'Isaiah'
...

5. Daily Doubles

Write a script that prints 10 daily doubles and their responses.


tip: The clue table has an isDD field. tip: Note that the category name is in the category table, and the question and answer are in the clue table.


Example output:

Category: NEW YORK CITY
Question: The heart of Little Italy is this street also found in a Dr. Seuss book title
Answer: Mulberry Street
===
Category: RIVER SOURCES
Question: This Mideastern boundary river rises on the slopes of Mount Hermon
Answer: the Jordan
===
Category: ROOM
Question: The Titanic has 3 rooms for this--only men were allowed there, as women weren't supposed to do it in public
Answer: smoking
...

Bonus exercises

1. Random category clues

Write a script that randomly chooses a category and prints clues from that category.


tip: SQL supports an "ORDER BY RANDOM()" clause that will return rows in a random order. For example, to randomly pick 1 category id you could use:

SELECT id FROM category ORDER BY RANDOM() LIMIT 1;

You can also use ORDER BY to sort the clues by value.


Example output:

5 GUYS NAMED MOE
[$200] Last name of Moe of the Three Stooges
[$400] Moe Strauss founded this auto parts chain along with Manny Rosenfield & Jack Jackson
[$600] Major league catcher Moe Berg was also a WWII spy for this agency, precursor of the CIA
[$800] Term for the type of country music Moe Bandy plays, the clubs where he began, or the "Queen" he sang of in 1981
[$1000] This "Kool" rapper's album "How Ya Like Me Now" began a rivalry with LL Cool J

Exercises resources:

2. Random game categories

Write a script to randomly choose a game number and print the categories from that game.

tip: the category table has game and round fields. round 0 is the Jeopardy round, round 1 is the Double Jeopardy round, and round 2 is Final Jeopardy.


Example output:

Categories for game #136:
0 WELCOME TO MY COUNTRY
0 METALS
0 GO GO GAUGUIN
0 FILE UNDER "M"
0 ANIMATED CATS
0 MAO MAO MAO MAO
1 SHAKESPEARE'S OPENING LINES
1 HEY, MARIO!
1 BRIDGE ON THE RIVER....
1 RUNNING MATES
1 13-LETTER WORDS
1 TONY BENNETT'S SONGBOOK
2 IN THE NEWS 2000


3. Top 20 Jeopardy categories

Read about the GROUP BY clause and write a script using it to print the 20 most common Jeopardy categories.

An example of using GROUP BY and ORDER BY to produce an ordered list of counts on a hypothetical foo field is:

SELECT foo, COUNT(foo) AS count FROM my_table GROUP BY foo ORDER BY count;

Example output:

81 LITERATURE
79 BEFORE & AFTER
73 WORD ORIGINS
71 SCIENCE
64 BUSINESS & INDUSTRY
63 AMERICAN HISTORY
...

Congratulations!

You've learned about SQL and making database queries from within Python. Keep practicing!