Community Data Science Workshops (Fall 2014)/Day 3 lecture: Difference between revisions

no edit summary
imported>Mako
imported>Mako
No edit summary
 
(5 intermediate revisions by the same user not shown)
Line 1:
{{CDSW Moved}}
 
== Material for the lecture ==
 
Line 14 ⟶ 16:
** We'll focus on manipulating data in Python
** Visualizing things in Google Docs
* Lunch (vegetarian Greek!)
* Project-based work
** Project- and challenge-based continuation of the work here, focusing on Google Docs
Line 25 ⟶ 27:
* My philosophy about data analysis: ''use the tools you have''
* Four things in Python I have to teach you:
** Functions
** while loops
*** infinite loops
*** loops with a greater-than or less-than condition
** break / continue
** string.join()
** defining your own functions with <code>def foo(argument):</code>
* Walk-through of <code>get_hpwp_dataset.py</code>
* Look at dataset with <code>more</code> and/or in spreadsheet
* Load data into Python
** review of opening files
*** we can also open them for reading
** the csv module and the csv.reader() function
** csv.DictReader()
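
The Python pieces listed above, and the step of loading the data, can be sketched roughly as follows. This is only an illustration assuming Python 3: the sample rows, the column names (<code>title</code>, <code>user</code>, <code>minor</code>, <code>anon</code>, <code>timestamp</code>), the tab delimiter, and the helper function names are all assumptions, not the actual output of <code>get_hpwp_dataset.py</code>.

```python
import csv
import io

# Hypothetical sample standing in for the real dataset file; the column
# names and the tab delimiter are assumptions for illustration only.
SAMPLE = (
    "title\tuser\tminor\tanon\ttimestamp\n"
    "Harry Potter\tAlice\tTrue\tFalse\t2014-11-01T10:00:00Z\n"
    "Hermione Granger\t10.0.0.1\tFalse\tTrue\t2014-11-01T12:30:00Z\n"
    "Harry Potter\tBob\tFalse\tFalse\t2014-11-02T09:15:00Z\n"
)


def count_up_to(limit):
    """Collect 0, 1, ... below limit, skipping the number 2."""
    numbers = []
    n = 0
    while True:          # an infinite loop...
        if n >= limit:
            break        # ...that we leave explicitly with break
        if n == 2:
            n += 1
            continue     # skip the rest of this pass through the loop
        numbers.append(n)
        n += 1
    return numbers


def countdown(start):
    """A while loop controlled by a greater-than comparison."""
    values = []
    while start > 0:
        values.append(str(start))
        start -= 1
    return ", ".join(values)  # join a list of strings into one string


def read_header(text):
    """csv.reader() yields each line as a list; the first one is the header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return next(reader)


def load_rows(text):
    """csv.DictReader() yields each row as a dictionary keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = load_rows(SAMPLE)
print(count_up_to(5))
print(countdown(3))
print(read_header(SAMPLE))
print(rows[0]["title"])
```

With a real file, the <code>io.StringIO(text)</code> argument would be replaced by a file object opened for reading.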
Line 38 ⟶ 43:
** Answer question: ''What proportion of edits to Wikipedia Harry Potter articles are minor?''
*** Count the number of minor edits and calculate proportion
** Answer question: ''What proportion of edits to Wikipedia Harry Potter articles are made by "anonymous" contributors?''
*** Count the number of anonymous edits and calculate proportion
* Looking at time series data
** "Bin" data by day to generate the trend line
* Exporting and visualizing data
** Export dataset on edits over time
** Export dataset on articles over users
** Load data into Google Docs
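
The proportion questions, the daily binning, and the export step above might look something like the sketch below. It assumes Python 3 (so <code>/</code> is true division) and a hypothetical tab-separated layout in which the <code>minor</code> and <code>anon</code> columns hold the strings <code>True</code> or <code>False</code> and <code>timestamp</code> starts with a <code>YYYY-MM-DD</code> date. The sample rows and function names are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real dataset's columns may differ.
SAMPLE = (
    "title\tuser\tminor\tanon\ttimestamp\n"
    "Harry Potter\tAlice\tTrue\tFalse\t2014-11-01T10:00:00Z\n"
    "Hermione Granger\t10.0.0.1\tFalse\tTrue\t2014-11-01T12:30:00Z\n"
    "Harry Potter\tBob\tFalse\tFalse\t2014-11-02T09:15:00Z\n"
    "Ron Weasley\t10.0.0.2\tFalse\tTrue\t2014-11-02T18:45:00Z\n"
)


def load_rows(text):
    """Read tab-separated text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def proportion(rows, column):
    """Fraction of rows whose flag column holds the string "True"."""
    if not rows:
        return 0.0
    flagged = sum(1 for row in rows if row[column] == "True")
    return flagged / len(rows)


def edits_per_day(rows):
    """Bin edits by day using the first ten characters of the timestamp."""
    counts = Counter()
    for row in rows:
        day = row["timestamp"][:10]  # assumes a leading YYYY-MM-DD date
        counts[day] += 1
    return counts


def export_counts(counts, out_file):
    """Write day/count pairs as tab-separated text, sorted by day."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["day", "edits"])
    for day in sorted(counts):
        writer.writerow([day, counts[day]])


rows = load_rows(SAMPLE)
minor_share = proportion(rows, "minor")
anon_share = proportion(rows, "anon")
daily = edits_per_day(rows)

buffer = io.StringIO()  # stands in for a file opened for writing
export_counts(daily, buffer)
exported = buffer.getvalue()
print(minor_share, anon_share)
print(exported)
```

Writing to a real file instead of the in-memory buffer produces a tab-separated file that a spreadsheet can import and chart as a trend line.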
 
We mostly worked on these questions in the afternoon:
Line 48 ⟶ 57:
** Answer question: ''Who are the most active editors of Harry Potter articles?''
*** Count the number of edits per user
* Looking at time series data
** "Bin" data by day to generate the trend line
* Exporting and visualizing data
** Export dataset on edits over time
** Export dataset on articles over users
** Load data into Google Docs
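
Counting edits per user, as in the question above, can be sketched with a counter keyed by user name. The (user, title) pairs below are invented, the function names are hypothetical, and reading "articles over users" as the number of distinct articles each user edited is only one possible interpretation.

```python
import csv
import io
from collections import Counter

# Hypothetical (user, article title) pairs, one per edit.
EDITS = [
    ("Alice", "Harry Potter"),
    ("Bob", "Harry Potter"),
    ("Alice", "Hermione Granger"),
    ("Alice", "Harry Potter"),
    ("Carol", "Ron Weasley"),
]


def edits_per_user(edits):
    """Count how many edits each user made."""
    counts = Counter()
    for user, _title in edits:
        counts[user] += 1
    return counts


def articles_per_user(edits):
    """Count how many distinct articles each user edited."""
    titles = {}
    for user, title in edits:
        titles.setdefault(user, set()).add(title)
    return {user: len(found) for user, found in titles.items()}


def export_user_counts(counts, out_file):
    """Write user/count pairs, most active first, ties broken by name."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["user", "edits"])
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for user, number in ordered:
        writer.writerow([user, number])


counts = edits_per_user(EDITS)
most_active = counts.most_common(1)
distinct = articles_per_user(EDITS)

buffer = io.StringIO()  # stands in for a file opened for writing
export_user_counts(counts, buffer)
exported = buffer.getvalue()
print(most_active)
print(exported)
```

Sorting by count before exporting means the most active editors appear at the top of the spreadsheet once the file is loaded.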