Community Data Science Workshops (Fall 2014)/Day 3 lecture
Revision as of 22:00, 15 March 2015
Revisions by Mako (imported)
{{CDSW Moved}}
== Material for the lecture ==
** We'll focus on manipulating data in Python
** Visualizing things in Google Docs
* Lunch (
* Project-based work
** Project- and challenge-based continuation of the work covered here, focusing on Google Docs
* My philosophy about data analysis: ''use the tools you have''
* Four things in Python I have to teach you:
** while loops
*** infinite loops
*** loops with a greater-than or less-than condition
** break / continue
** string.join()
** defining your own functions with <code>def foo(argument):</code>
* Walk-through of <code>get_hpwp_dataset.py</code>
* Look at dataset with <code>more</code> and/or in spreadsheet
* Load data into Python
** review of opening files
*** we can also open them for reading
** csv module and the csv.reader() function
** csv.DictReader()
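The Python pieces listed above can be tried together in one short sketch. This is only an illustration, not the workshop's own code: the sample data, the column names (<code>title</code>, <code>user</code>, <code>minor</code>), the tab delimiter, and the helper names <code>load_rows</code> and <code>first_named_users</code> are all assumptions made up for the example, and it assumes Python 3.

```python
import csv
import io

# Hypothetical sample data standing in for the real dataset file.
# The column names and the tab delimiter are assumptions for this sketch.
SAMPLE = (
    "title\tuser\tminor\n"
    "Harry Potter\tAlice\t1\n"
    "Hermione Granger\tBob\t0\n"
    "Harry Potter\t\t0\n"
)


def load_rows(handle):
    """Return each row of a tab-separated file as a dictionary."""
    return list(csv.DictReader(handle, delimiter="\t"))


def first_named_users(rows, limit):
    """Collect up to `limit` user names, skipping rows with no user."""
    names = []
    i = 0
    while True:  # an "infinite" loop that we leave with break
        if i >= len(rows) or len(names) >= limit:
            break
        user = rows[i]["user"]
        i += 1
        if user == "":
            continue  # skip this row and move on to the next one
        names.append(user)
    return names


# io.StringIO behaves like a file opened for reading.
rows = load_rows(io.StringIO(SAMPLE))
names = first_named_users(rows, 2)

# string.join() glues a list of strings together with a separator.
print(", ".join(names))
```

With a real file, <code>io.StringIO(SAMPLE)</code> would be replaced by a file object from <code>open()</code> in read mode.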
** Answer question: ''What proportion of edits to Wikipedia Harry Potter articles are minor?''
*** Count the number of minor edits and calculate proportion
* Looking at time series data
** "Bin" data by day to generate the trend line
* Exporting and visualizing data
** Export dataset on edits over time
** Export dataset on articles over users
** Load data into Google Docs
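The counting, binning, and exporting steps above can be sketched roughly as follows. The rows here are invented, and the field names (<code>timestamp</code>, <code>minor</code>), the <code>"1"</code>/<code>"0"</code> encoding of minor edits, and the ISO-style timestamps are assumptions rather than the layout of the actual dataset. The sketch assumes Python 3 and uses <code>collections.Counter</code> for brevity; a plain dictionary would work as well.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; the real field names and timestamp format may differ.
rows = [
    {"timestamp": "2014-11-01T10:00:00Z", "minor": "1"},
    {"timestamp": "2014-11-01T15:30:00Z", "minor": "0"},
    {"timestamp": "2014-11-02T09:12:00Z", "minor": "0"},
    {"timestamp": "2014-11-02T18:45:00Z", "minor": "1"},
]

# Count the minor edits and calculate their proportion of all edits.
minor_count = 0
for row in rows:
    if row["minor"] == "1":
        minor_count += 1
proportion_minor = minor_count / len(rows)

# "Bin" edits by day: the first 10 characters of the timestamp are the date.
edits_by_day = Counter()
for row in rows:
    day = row["timestamp"][:10]
    edits_by_day[day] += 1

# Export one line per day as CSV, which a spreadsheet can import.
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["day", "edits"])
for day in sorted(edits_by_day):
    writer.writerow([day, edits_by_day[day]])
exported = output.getvalue()

print(proportion_minor)
print(exported)
```

To produce a file for upload to Google Docs, the <code>io.StringIO()</code> object would be swapped for a file opened for writing.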
We mostly worked on these questions in the afternoon:
** Answer question: ''Who are the most active editors on articles in Harry Potter?''
*** Count the number of edits per user
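One possible way to count edits per user and find the most active editors is sketched below. The user names are invented for illustration, and <code>collections.Counter</code> is an assumption of this sketch, not necessarily the approach used in the workshop; counting in an ordinary dictionary gives the same result.

```python
from collections import Counter

# Hypothetical list with one entry per edit, naming the user who made it.
edit_users = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]

# Count the number of edits per user.
edits_per_user = Counter(edit_users)

# The two most active editors, ordered from most to fewest edits.
top_editors = edits_per_user.most_common(2)

for user, count in top_editors:
    print("\t".join([user, str(count)]))
```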