Shakespeare

From OpenHatch wiki
Revision as of 13:22, 27 September 2014 by Zanwenhuahao

How many times does the word 'love' appear in Shakespeare's plays? Is it possible to find negative passages using a list of keywords? We'll use Python to practice our skills and answer questions like these.

Setup

We'll need additional setup here.

Goals

  • Have fun using Python to learn basic data science
  • Practice searching for information in text documents
  • Practice manipulating strings
  • Practice using loops
  • Practice using lists
  • Practice using dictionaries
  • Get experience with regular expressions

Skills & Exercises

Strings

  • Checking if two strings are equal
>>> s = "mama"
>>> s == "mama"
True
>>> s == "papa"
False
  • Checking if a string contains another as a substring
>>> s = "mama"
>>> s in "I love mama"
True
>>> "day" in "Saturday"
True
>>> "Day" in "Saturday"
False
>>> Sat = "Saturday"
>>> "day" in Sat
True
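The `in` test is case-sensitive ("Day" is not found in "Saturday"), so searching a play for "love" would miss "Love" at the start of a line. A minimal sketch, assuming you want case-insensitive matching, lower-cases both strings with the built-in `str.lower()` before testing. The helper name `contains_ignore_case` is hypothetical and not part of the original exercise.

```python
def contains_ignore_case(text, word):
    """Return True if word occurs anywhere in text, ignoring upper/lower case."""
    # str.lower() returns a lower-cased copy; the original strings are unchanged.
    return word.lower() in text.lower()


print(contains_ignore_case("Saturday", "Day"))        # True
print(contains_ignore_case("Love is blind", "love"))  # True
print("love" in "Love is blind")                      # False: plain `in` is case-sensitive
```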

File Operations

  • Open a file
>>> M_S = open("A Midsummer-Night's Dream.txt", "r")
>>> M_S
<open file "A Midsummer-Night's Dream.txt", mode 'r' at 0x102d04660>
  • Read a line
>>> M_S = open("A Midsummer-Night's Dream.txt", "r")
>>> line = M_S.readline()
>>> print(line)
< Shakespeare -- A MIDSUMMER-NIGHT'S DREAM >
  • Read a file line by line until the end of the file (EOF)
>>> for eachline in M_S:
...     # Do something here with each line read; here we just print it
...     print(eachline)
...
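The interactive examples above leave the file open. A slightly more idiomatic sketch, assuming Python 3, uses a `with` block so the file is closed automatically once the loop finishes. Because the play's text file may not be at hand, the example first writes a small stand-in file to a temporary directory; the file name `sample.txt` and its two lines are chosen only for illustration.

```python
import os
import tempfile

# Write a tiny stand-in for a play so the example runs anywhere.
# (File name and contents are illustrative, not the real text file.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w") as f:
    f.write("< Shakespeare -- A MIDSUMMER-NIGHT'S DREAM >\n")
    f.write("The course of true love never did run smooth\n")

lines_read = []
# The with block closes the file when it ends, even if an error occurs.
with open(path, "r") as M_S:
    for eachline in M_S:  # iterating a file yields one line at a time until EOF
        # Each line keeps its trailing "\n"; rstrip removes it before storing.
        lines_read.append(eachline.rstrip("\n"))

print(lines_read)
print(M_S.closed)  # True: the file was closed when the with block ended
```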

File & Strings Exercise

Using the word "love"
  • We will use the play Romeo and Juliet

1. Create a Python program called RomeoJuliet.py and save it in the same directory as the Shakespeare text files.

2. In your program, open the file "Romeo and Juliet.txt". (The text file needs to be saved in the same directory as your Python program so that you can open it by its file name alone.)

3. Check that the file was opened properly.

4. Now create a variable that represents the string "love", for example:

>>> lv = "love"

5. Also create a variable to count the number of times "love" appears, for example:

>>> lv_counter = 0

6. Use a for loop (or a while loop, if you like) to read through the lines of the file. As you read each line, count the number of lines that contain the word "love".

7. Does Shakespeare use the word "love" a lot in his plays? How about synonyms of "love"?
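One possible sketch of steps 2 to 6, assuming Python 3 and that the script is run from the directory containing "Romeo and Juliet.txt". Step 3 is handled here by catching `OSError`, which `open()` raises when the file is missing or unreadable; that is one reasonable reading of "check that the file was opened properly", not the only one. The helper name `count_lines_containing` is hypothetical. Note that a plain substring test also matches words such as "loved", "lovely" or "glove".

```python
def count_lines_containing(lines, lv="love"):
    """Count how many of the given lines contain the string lv."""
    lv_counter = 0
    for eachline in lines:
        # Plain substring test: "loved", "lovely" and "glove" match too.
        if lv in eachline:
            lv_counter += 1
    return lv_counter


if __name__ == "__main__":
    try:
        # Step 2: the relative name is looked up in the directory
        # you run the script from.
        with open("Romeo and Juliet.txt", "r") as play:
            # A file object yields one line per loop iteration.
            total = count_lines_containing(play)
    except OSError as err:
        # Step 3: open() raises OSError (e.g. FileNotFoundError) on failure.
        print("Could not open the play:", err)
    else:
        print('Lines containing "love":', total)
```

For step 7, the same function can be called with a synonym, e.g. `count_lines_containing(play, "adore")`, after reopening the file (the first loop has already read it to the end). Keep in mind that this counts lines, not occurrences: a line with "love" twice counts once, whereas summing `eachline.count(lv)` over the lines would count every occurrence.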

Lists

  • Recall that lists in Python can contain arbitrary objects and grow dynamically as new items are added
  • We can create a list like this:
>>> lst = ["William", "Shakespeare"]
>>> lst
['William', 'Shakespeare']
  • We can append an item to the list like this:
>>> lst.append("Bard of Avon")
>>> lst
['William', 'Shakespeare', 'Bard of Avon']
  • We can also read a whole file into a list, one line per item
>>> lst = open("importList.txt").readlines()
# readlines() reads every line of the file as a string and returns them all in a list, saved here as lst.
>>> lst
['apple\n', 'pear\n', 'oranges\n', 'kiwi\n', 'banana\n', 'monkey']
# Notice the '\n' at the end of each item except the last: it is the newline character, produced when you press Enter to start a new line.
# We can remove them with a list comprehension:
>>> lstFinal = [eachthing.strip("\n") for eachthing in lst]
>>> lstFinal
['apple', 'pear', 'oranges', 'kiwi', 'banana', 'monkey']
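Combining lists and strings gives a first attempt at the question from the introduction: can a list of keywords find negative passages? This is a minimal sketch, not a reliable sentiment detector. The keyword list is a made-up example, the sample lines stand in for a play's text, and the helper name `find_passages` and the file name `negativeWords.txt` in the comment are hypothetical.

```python
def find_passages(lines, keywords):
    """Return (line_number, line) pairs for lines containing any keyword.

    Keywords are expected in lower case, because each line is lower-cased
    before the substring test.
    """
    matches = []
    # enumerate(..., start=1) numbers the lines from 1, like a text editor.
    for number, eachline in enumerate(lines, start=1):
        text = eachline.strip("\n").lower()  # drop the newline, ignore case
        if any(word in text for word in keywords):
            matches.append((number, eachline.strip("\n")))
    return matches


# Illustrative keywords; a real list could be loaded from a file, e.g.
# [w.strip("\n") for w in open("negativeWords.txt").readlines()]
negative = ["hate", "death", "woe"]
lines = [
    "My only love sprung from my only hate!\n",
    "What light through yonder window breaks?\n",
    "For never was a story of more woe\n",
]
print(find_passages(lines, negative))
```

Substring matching keeps the sketch short, but it also flags words that merely contain a keyword (for example "hate" inside "whatever"); matching whole words is one place where regular expressions could help.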