Shakespeare

From OpenHatch wiki

Revision as of 14:14, 27 September 2014

How many times does the word 'love' appear in Shakespeare's plays? Is it possible to find negative passages using a list of keywords? We'll use Python to practice our skills and answer questions like these.

Setup

We'll need additional setup here.

Goals

  • Have fun using Python to learn basic data science.
  • Practice searching for information in text documents
  • Practice manipulating strings
  • Practice using loops
  • Practice using lists
  • Practice using dictionaries

Skills & Exercises

Strings

  • Checking if two strings are equal
>>> s = "mama"
>>> s == "mama"
True
>>> s == "papa"
False
  • Checking if a string contains another as a substring
>>> s = "mama"
>>> s in "I love mama"
True
>>> "day" in "Saturday"
True
>>> "Day" in "Saturday"
False
>>> Sat = "Saturday"
>>> "day" in Sat
True
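The `in` test is case-sensitive, as the `"Day"` example shows. Here is a minimal sketch of a case-insensitive check, assuming you want "Love" and "LOVE" to match too. It converts both strings to lower case with `str.lower()` first. The function name `contains_ignore_case` is just an illustration.

```python
def contains_ignore_case(needle, haystack):
    """Return True if needle occurs in haystack, ignoring upper/lower case."""
    return needle.lower() in haystack.lower()


print(contains_ignore_case("Day", "Saturday"))        # True
print(contains_ignore_case("love", "LOVE is blind"))  # True
print("Day" in "Saturday")                            # False: plain `in` is case-sensitive
```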

File Operations

  • Open a file
>>> M_S=open("A Midsummer-Night's Dream.txt", "r")
>>> M_S
<open file "A Midsummer-Night's Dream.txt", mode 'r' at 0x102d04660>
  • Read a line
>>> M_S = open("A Midsummer-Night's Dream.txt", "r")
>>> line = M_S.readline()
>>> print(line)
< Shakespeare -- A MIDSUMMER-NIGHT'S DREAM >
  • Read a file line by line until the end of file (EOF)
>>> for eachline in M_S:
...     # Do something here with each line read
...     pass
...
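Here is a minimal sketch of the same line-by-line loop using a `with` block, which closes the file automatically when the block ends. To keep the example self-contained, it first writes a small sample file. The file name `sample.txt` and its three lines are made up for illustration and are not part of the workshop files.

```python
import os
import tempfile

# Write a small made-up sample file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write("first line\nsecond line\nthird line\n")

# Read it back line by line until the end of file.
line_count = 0
with open(path, "r") as infile:
    for eachline in infile:
        line_count += 1
        print(eachline.rstrip("\n"))  # drop the trailing newline before printing

# Leaving the with block has already closed the file.
print(infile.closed)  # True
print(line_count)     # 3
```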

File & Strings Exercise

Using the word "love"
  • We will use the play Romeo and Juliet

1. Create a Python program called RomeoJuliet.py and save it in the same directory as the Shakespeare text files.

2. In your program, open the file "Romeo and Juliet.txt". (The text file needs to be in the same directory as your Python program so that you can open it by its file name alone.)

3. Check to see that the file was opened properly

4. Now create a variable that represents the string "love", for example:

>>> lv = "love"

5. Also create a variable to count the number of lines in which "love" appears, for example:

>>> lv_counter = 0

6. Use a for loop (or a while loop, if you like) to read through the lines of the file. As you read each line, count the number of lines that contain the word "love".

7. Does Shakespeare use a lot of love in his plays? How about other synonyms of "love"?
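One possible shape for steps 4 to 6 is sketched below. It is written as a function so you can try it on a few sample lines before running it on the real play. The function name `count_lines_with` is hypothetical. It does not handle two cases: a line containing "love" twice is counted only once, and because `in` matches substrings, "lover" and "beloved" also count.

```python
def count_lines_with(word, lines):
    """Count how many lines contain word (case-sensitive substring match)."""
    lv_counter = 0
    for eachline in lines:
        if word in eachline:
            lv_counter += 1
    return lv_counter


sample = [
    "Did my heart love till now?\n",
    "love love love\n",
    "My only love sprung from my only hate!\n",
    "What light through yonder window breaks?\n",
]
print(count_lines_with("love", sample))  # 3
```

To run it on the real play, pass the open file in place of `sample`, for example by calling `count_lines_with("love", f)` inside `with open("Romeo and Juliet.txt") as f:`. For step 7, call it once for each word in a list of synonyms.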

Lists

  • Recall that lists in Python can contain arbitrary objects and grow dynamically as new items are added
  • We can create a list like this
>>> lst=["William","Shakespeare"]
>>> lst
['William', 'Shakespeare']
  • We can append an item to the end of our list like this
>>> lst.append("Bard of Avon")
>>> lst
['William', 'Shakespeare', 'Bard of Avon']
  • We can also build a list from the lines of a text file
>>> lst = open("importList.txt").readlines()

# readlines() reads every line of the file and returns them as a list of strings, which we save as lst.

>>> lst
['apple\n', 'pear\n', 'oranges\n', 'kiwi\n', 'banana\n', 'monkey']

# Notice the '\n' at the end of every item except the last: it is the newline character (what you get when you press Enter to start a new line).
# We can remove it like this:

>>> lstFinal = [eachthing.strip("\n") for eachthing in lst]
>>> lstFinal
['apple', 'pear', 'oranges', 'kiwi', 'banana', 'monkey']
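`strip("\n")` removes only newline characters. Depending on the Python version and platform, a file saved on Windows (where lines end in `"\r\n"`) may leave a stray `"\r"`, and a file may also contain blank lines. This sketch uses a made-up list to show the difference. Calling `strip()` with no argument removes all surrounding whitespace, and an `if` clause in the list comprehension drops empty entries.

```python
lst = ["apple\r\n", "  pear \n", "\n", "kiwi\n", "monkey"]

# strip("\n") leaves the "\r" and the spaces behind.
print([eachthing.strip("\n") for eachthing in lst])

# strip() removes all leading/trailing whitespace; the if clause skips blank lines.
lstFinal = [eachthing.strip() for eachthing in lst if eachthing.strip()]
print(lstFinal)  # ['apple', 'pear', 'kiwi', 'monkey']
```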

Dictionaries

  • Dictionaries are another data structure commonly used in programming. A real-life dictionary pairs each word with its definition. In the same way, a Python dictionary stores what we call key-value pairs. A common task is to look up an entry's value given its key.
  • In Python 2, which this page uses, dictionary entries are unordered: you cannot choose or rely on the position in which an entry is stored. (Since Python 3.7, dictionaries remember insertion order.)
  • Keys must be unique within a dictionary; values do not have to be.
  • For simplicity, we will use strings as keys.
  • Creating a dictionary:
>>> myDict = {"Python":9, "Workshop": 27, "Interesting":1}
>>> myDict
{'Python': 9, 'Interesting': 1, 'Workshop': 27}

# notice that printing them does not necessarily give the entries in the order you entered them.
  • The following basic operations work with entries in a dictionary: remove an entry, add an entry, check whether a key is in the dictionary, and get the value for a given key:
>>> del myDict["Interesting"]
>>> myDict
{'Python': 9, 'Workshop': 27}
>>> myDict["Interesting"]=1
>>> myDict
{'Python': 9, 'Interesting': 1, 'Workshop': 27}
>>> "Interesting" in myDict
True
>>> myDict["Interesting"]
1

For more information on dictionaries: https://docs.python.org/2/tutorial/datastructures.html#dictionaries
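Looking up a missing key with square brackets, such as `myDict["Missing"]`, raises a `KeyError`. This short sketch uses the same `myDict` to show two safer habits. The `get()` method returns a default value instead of raising an error, and looping over `sorted(myDict)` prints the entries in a predictable (alphabetical) order.

```python
myDict = {"Python": 9, "Workshop": 27, "Interesting": 1}

# get() returns the second argument when the key is missing, instead of raising KeyError.
print(myDict.get("Python", 0))       # 9
print(myDict.get("Shakespeare", 0))  # 0

# A common counting pattern: start at 0 if the key is new, then add 1.
myDict["Shakespeare"] = myDict.get("Shakespeare", 0) + 1

# sorted() on a dictionary gives its keys in alphabetical order.
for key in sorted(myDict):
    print("%s: %d" % (key, myDict[key]))
```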

Lists and Dictionaries Exercises

List & Iteration Exercise 1:

  • Create a new python program.
  • Import the list of characters from A Midsummer Night's Dream saved in the file AMsND_Char.txt
  • Save it to a list in which each item is a character's name enclosed in angle brackets, e.g. ['<HELENA>', ...]
  • Print the list to see that you got it right
  • It is good practice to close the file when you are done with it:
>>> myFile = open("name_of_file", "r")
...
>>> myFile.close()
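The exercise does not say how the names are written in `AMsND_Char.txt`. Here is a minimal sketch, assuming one name per line and that some lines may lack the angle brackets. The hypothetical helper `to_tag` strips whitespace and adds `<` and `>` only when they are missing. The sample lines stand in for the real file.

```python
def to_tag(name):
    """Return name wrapped in angle brackets, e.g. 'HELENA' -> '<HELENA>'."""
    name = name.strip()
    if not (name.startswith("<") and name.endswith(">")):
        name = "<" + name + ">"
    return name


# Stand-in for open("AMsND_Char.txt").readlines(); the real file's format may differ.
raw_lines = ["HELENA\n", "<HERMIA>\n", "OBERON\n", "\n"]
characters = [to_tag(eachline) for eachline in raw_lines if eachline.strip()]
print(characters)  # ['<HELENA>', '<HERMIA>', '<OBERON>']
```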

List & Iteration Exercise 2:

  • In the same program that you created from exercise 1
  • Now open the play A Midsummer Night's Dream
  • Read through the play line by line and count how many times <OBERON> speaks.
  • Print to screen the name <OBERON> and the number of times he spoke.
  • Close the file.
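This sketch assumes that each speech begins with a line starting with the speaker's tag, such as `<OBERON>`. That format is an assumption about the play file, not something this page confirms. The sketch counts those lines with `str.startswith()`. Check a few lines of the real file first. If tags can appear elsewhere in a line, `in` may be the better test. The layout of the sample lines is assumed.

```python
speaker = "<OBERON>"

# Stand-in for the play; the real file's layout may differ.
play_lines = [
    "<OBERON> Ill met by moonlight, proud Titania.\n",
    "<TITANIA> What, jealous Oberon! Fairies, skip hence:\n",
    "I have forsworn his bed and company.\n",
    "<OBERON> Tarry, rash wanton: am not I thy lord?\n",
]

times_spoken = 0
for eachline in play_lines:
    if eachline.startswith(speaker):  # a new speech by this speaker begins here
        times_spoken += 1

print("%s %d" % (speaker, times_spoken))  # <OBERON> 2
```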

List & Iteration Exercise 3*:

  • Use the same program as above
  • Open the same play again
  • Now iterate through the list of characters you saved in exercise 1 and count how many times each of them speaks.
  • Print to screen each character's name and the number of times they spoke.
  • Close the file.
  • Note: You can only read through an open file once, so for each line you read, you must check whether any of the characters starts a new speech on that line.
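Here is a sketch of the single-pass approach the note describes. It uses the same assumption as before: a speech starts with a line that begins with the speaker's tag. The outer loop reads each line once, and the inner loop checks that line against every character. Counts are kept in a second list, `counts`, at the same positions as the names. The sample lines are a stand-in for the real file.

```python
characters = ["<HELENA>", "<HERMIA>", "<OBERON>"]
counts = [0] * len(characters)  # counts[i] belongs to characters[i]

# Stand-in for the play; the real file's layout may differ.
play_lines = [
    "<HERMIA> God speed fair Helena! whither away?\n",
    "<HELENA> Call you me fair? that fair again unsay.\n",
    "Demetrius loves your fair: O happy fair!\n",
    "<HERMIA> I frown upon him, yet he loves me still.\n",
]

for eachline in play_lines:            # read each line only once
    for i in range(len(characters)):   # check it against every character
        if eachline.startswith(characters[i]):
            counts[i] += 1

for i in range(len(characters)):
    print("%s %d" % (characters[i], counts[i]))
```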

List & Iteration Exercise 4**:

  • Do the same as exercise 3, except that instead of printing the results directly, save each character's name and the number of times they spoke as a key:value pair in a dictionary.
  • Print the dictionary to the screen and check that you get the same result as in exercise 3.
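Here is a sketch of the dictionary version. It uses the same assumed speech format, and the sample lines are a stand-in for the real file. Every character starts at 0, so characters who never speak still appear in the result. This makes the comparison with exercise 3 easier.

```python
characters = ["<OBERON>", "<PUCK>", "<TITANIA>"]

# Stand-in for the play; the real file's layout may differ.
play_lines = [
    "<PUCK> How now, spirit! whither wander you?\n",
    "<OBERON> Ill met by moonlight, proud Titania.\n",
    "<PUCK> I go, I go; look how I go,\n",
]

# One key:value pair per character, all starting at 0.
times_spoken = {}
for name in characters:
    times_spoken[name] = 0

for eachline in play_lines:
    for name in characters:
        if eachline.startswith(name):
            times_spoken[name] += 1

print(times_spoken)
```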