Community Data Science Workshops (Fall 2014)/Day 2 Wikipedia project: Difference between revisions

From OpenHatch wiki
imported>Fhocutt
imported>Mako
(6 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{CDSW Moved}}

[[File:Wikipedia.png|right|250px]]
__NOTOC__
Line 11: Line 13:
* Practice reading and extending other people's code
* Create a few collections of different types of data from Wikipedia that you can do research with in the final section

<!--
=== Download and test the Wikipedia project ===


If you are confused by these steps, go back and refresh your memory with the [[Community Data Science Workshops/Friday April 4th setup and tutorial|Friday April 4th setup and tutorial]] and [[Community Data Science Workshops/Friday April 4th Tutorial|Friday April 4th Tutorial]]
If you are confused by these steps, go back and refresh your memory with the [[Community Data Science Workshops (Fall 2014)/Day 0 setup and tutorial|Day 0 setup and tutorial]] and [[Community Data Science Workshops (Fall 2014)/Day 0 tutorial|Day 0 tutorial]]


(Estimated time: 10 minutes)


* [[Community Data Science Workshops/May 3rd Wikipedia project Windows setup|Windows]]
* [[Community Data Science Workshops (Fall 2014)/Wikipedia project Windows setup|Windows]]
* [[Community Data Science Workshops/May 3rd Wikipedia project OS X setup|OS X]]
* [[Community Data Science Workshops (Fall 2014)/Wikipedia project OS X setup|OS X]]
* [[Community Data Science Workshops/May 3rd Wikipedia project Linux setup|Linux]]
* [[Community Data Science Workshops (Fall 2014)/Wikipedia project Linux setup|Linux]]

-->
=== Example topics to cover in Lecture ===


Line 31: Line 33:
* edit count http://en.wikipedia.org/w/api.php?action=query&list=users&ususers=Benjamin_Mako_Hill|Jtmorgan|Sj|Mindspillage&usprop=editcount&format=jsonfm
* get the content of the main page http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content
* example programs: [http://mako.cc/teaching/2014/cdsw-autumn/wikipedia-raw1-unicode-problems-example.py wikipedia-raw1-unicode-problems-example.py] (note: this is an example of Unicode problems when running this on Windows), [http://mako.cc/teaching/2014/cdsw-autumn/wikipedia-raw2-mudslide-edit.py wikipedia-raw2-mudslide-edit.py]
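
The edit count query above can also be built and read from Python rather than typed into a browser. The following is a minimal sketch, not one of the workshop's example programs: it assumes Python 3, the function names are made up for illustration, and the <code>User-Agent</code> string is a placeholder you should replace with your own. It assumes the usual response layout for <code>list=users</code>, where each entry under <code>query</code> → <code>users</code> carries a <code>name</code> and an <code>editcount</code>.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def edit_count_url(usernames):
    """Build the list=users query URL for the given user names."""
    params = {
        "action": "query",
        "list": "users",
        # The API takes several values in one parameter, separated by "|".
        "ususers": "|".join(usernames),
        "usprop": "editcount",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_edit_counts(response):
    """Map each user name to its edit count in a parsed API response."""
    users = response["query"]["users"]
    # .get() returns None for entries without an edit count (e.g. unknown users).
    return {user["name"]: user.get("editcount") for user in users}


def fetch_json(url):
    """Download a URL and parse the body as JSON (needs network access)."""
    # Hypothetical User-Agent value; identify your own script here.
    request = Request(url, headers={"User-Agent": "cdsw-example/0.1"})
    with urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Example use (commented out because it contacts Wikipedia):
# url = edit_count_url(["Benjamin_Mako_Hill", "Jtmorgan", "Sj", "Mindspillage"])
# print(extract_edit_counts(fetch_json(url)))
```

Note that the sketch asks for <code>format=json</code> rather than <code>format=jsonfm</code>: the <code>jsonfm</code> variant is an HTML page meant for reading in a browser, so a program cannot parse it as JSON.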

=== Resources ===
* [https://en.wikipedia.org/w/api.php?action=help&modules=query API documentation for the query module]
* [https://en.wikipedia.org/wiki/Special:ApiSandbox API Sandbox]
* [[Sample API queries]]
* Example that saves command-line output into a text file: <code>python wikipedia-raw2-mudslide-edit.py > OsoRevisionData.txt</code>
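
As an alternative to redirecting output with <code>&gt;</code>, a program can write its results to a file itself. The sketch below is only an illustration: the function names and the sample rows are made up, and it is not taken from <code>wikipedia-raw2-mudslide-edit.py</code>. Opening the file with an explicit UTF-8 encoding is one way to sidestep the kind of Unicode problems on Windows mentioned above.

```python
def format_rows(rows):
    """Turn a list of rows into text with one tab-separated line per row."""
    return "".join("\t".join(str(field) for field in row) + "\n" for row in rows)


def save_rows(rows, filename):
    """Write the rows to a UTF-8 text file, replacing any existing file."""
    with open(filename, "w", encoding="utf-8") as output_file:
        output_file.write(format_rows(rows))


# Hypothetical sample data, invented for this example (not real revisions).
sample_rows = [
    ("2014-03-22T18:00:00Z", "ExampleUser"),
    ("2014-03-22T18:05:00Z", "AnotherUser"),
]

# Example use (commented out so nothing is written by accident):
# save_rows(sample_rows, "OsoRevisionData.txt")
```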

Latest revision as of 22:04, 15 March 2015

Page Moved
All material related to the Community Data Science Workshops has been moved from the OpenHatch wiki to a new dedicated wiki, and this page is no longer being updated here. Please visit the new version of the page on the Community Data Science Collective wiki.

Building a Dataset using the Wikipedia API

In this project, we will explore a few ways to gather data using the Wikipedia API. Once we've done that, we will extend this code to create our own datasets of Wikipedia edits or other data that we can use to ask and answer questions in the final session.

Goals

  • Get set up to build datasets with the Wikipedia API
  • Have fun collecting different types of data from Wikipedia
  • Practice reading and extending other people's code
  • Create a few collections of different types of data from Wikipedia that you can do research with in the final section

Download and test the Wikipedia project

If you are confused by these steps, go back and refresh your memory with the Day 0 setup and tutorial and the Day 0 tutorial.

(Estimated time: 10 minutes)

Example topics to cover in Lecture

Resources