Community Data Science Workshops (Fall 2014)/Day 3 lecture

imported>Mako
{{CDSW Moved}}
 
== Material for the lecture ==
 
* Four things in Python I have to teach you:
** while loops
*** infinite loops
*** loops with a greater-than or less-than condition
** break / continue
** string.join()
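
A minimal sketch of the four Python items listed above. The variable names and the numbers are made up for illustration and are not taken from the lecture.

```python
# A while loop with a less-than condition: runs as long as counter < 5.
counter = 0
squares = []
while counter < 5:
    squares.append(str(counter * counter))
    counter += 1

# A deliberately "infinite" loop (while True) that only stops via break.
n = 0
evens = []
while True:
    n += 1
    if n > 10:
        break      # leave the loop entirely
    if n % 2 == 1:
        continue   # skip odd numbers and go on to the next iteration
    evens.append(str(n))

# join() is called on the separator string and takes a list of strings.
joined_squares = ", ".join(squares)
joined_evens = "-".join(evens)
print(joined_squares)
print(joined_evens)
```

Note that join() only accepts strings, which is why the numbers are converted with str() before they are appended.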
* Load data into Python
** review of opening files
*** we can also open them for reading
** csv module and csv.reader() function
** csv.DictReader()
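
The difference between csv.reader() and csv.DictReader() can be sketched as follows. The column names and sample rows are hypothetical, and io.StringIO stands in for a real file so that the example runs on its own.

```python
import csv
import io

# Hypothetical sample rows standing in for a real file on disk.
sample = "title,user,minor\nHarry Potter,Alice,1\nHermione Granger,Bob,0\n"

# csv.reader() yields each row as a list of strings, header included.
rows = list(csv.reader(io.StringIO(sample)))
header = rows[0]
first_row = rows[1]

# csv.DictReader() uses the header row as keys, so columns are looked up by name.
dict_rows = list(csv.DictReader(io.StringIO(sample)))
titles = [row["title"] for row in dict_rows]

print(header)
print(first_row)
print(titles)
```

With a real file, a file object opened for reading, for example open(filename, "r", newline=""), would take the place of io.StringIO(sample); filename here is a placeholder.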
** Answer question: ''What proportion of edits to Wikipedia Harry Potter articles are minor?''
*** Count the number of minor edits and calculate proportion
** Answer question: ''What proportion of edits to Wikipedia Harry Potter articles are made by "anonymous" contributors?''
*** Count the number of anonymous edits and calculate proportion
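
One possible way to answer the two proportion questions is sketched below. It assumes each edit is a row with "minor" and "anon" columns that hold "1" or "0"; those column names, the helper proportion_flagged(), and the sample rows are illustrative assumptions, not the workshop's actual dataset.

```python
import csv
import io

# Made-up sample data; the real dataset may use different columns and values.
SAMPLE = """title,user,anon,minor
Harry Potter,Alice,0,1
Harry Potter,192.0.2.1,1,0
Hermione Granger,Bob,0,0
Ron Weasley,192.0.2.7,1,0
"""

def proportion_flagged(rows, column):
    """Return the share of rows whose value in `column` is the string "1"."""
    total = 0
    flagged = 0
    for row in rows:
        total += 1
        if row[column] == "1":
            flagged += 1
    # Guard against dividing by zero when there are no rows at all.
    return flagged / total if total else 0.0

edits = list(csv.DictReader(io.StringIO(SAMPLE)))
minor_share = proportion_flagged(edits, "minor")
anon_share = proportion_flagged(edits, "anon")
print(minor_share, anon_share)
```

Both questions reduce to the same pattern: count the rows that match, count all rows, and divide.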
 
We mostly worked on these questions in the afternoon:
** Answer question: ''Who are the most active editors on Harry Potter articles?''
*** Count the number of edits per user
* Looking at time series data
** "Bin" data by day to generate the trend line
* Exporting and visualizing data
** Export dataset on edits over time
** Export dataset on articles over users
** Load data into Google Docs
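
The afternoon steps above (counting edits per user, binning by day, and exporting) might look roughly like this. The "user" and "timestamp" column names, the ISO-style timestamp format, and the sample rows are assumptions made for illustration. collections.Counter is a standard-library shortcut; a plain dictionary of counts would work just as well.

```python
import csv
import io
from collections import Counter

# Made-up sample data; timestamps are assumed to start with YYYY-MM-DD.
SAMPLE = """title,user,timestamp
Harry Potter,Alice,2014-11-01T09:15:00Z
Harry Potter,Bob,2014-11-01T17:40:00Z
Hermione Granger,Alice,2014-11-02T08:05:00Z
"""

edits = list(csv.DictReader(io.StringIO(SAMPLE)))

# Count the number of edits per user.
edits_per_user = Counter(row["user"] for row in edits)

# "Bin" by day: the first 10 characters of the timestamp are the date.
edits_per_day = Counter(row["timestamp"][:10] for row in edits)

# Export the per-day counts as CSV text, one row per day in date order.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["day", "edits"])
for day in sorted(edits_per_day):
    writer.writerow([day, edits_per_day[day]])
exported = out.getvalue()
print(exported)
```

Writing to a real file opened with open(filename, "w", newline="") instead of io.StringIO() would produce a CSV file that can be uploaded to a spreadsheet tool such as Google Docs; filename is again a placeholder.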