Community Data Science Workshops (Fall 2014)/Day 2 SQL project: Difference between revisions

imported>Jtmorgan
(notes)
imported>Jtmorgan
(rmv stackexchange stuff, since it's in MSSQL :()
Line 5:
== Building a Dataset using MySQL ==


In this project, we will explore a few ways to gather data from Wikimedia and StackExchange projects using [https://en.wikipedia.org/wiki/MySQL MySQL] and the [http://quarry.wmflabs.org/ Quarry] and [http://data.stackexchange.com/ Data Explorer] applications. Once we've done that, we will download the results of our queries in [https://en.wikipedia.org/wiki/Comma-separated_values CSV format], which can be used to ask and answer questions and visualize data in the final session.
In this project, we will explore a few ways to gather data from Wikimedia using [https://en.wikipedia.org/wiki/MySQL MySQL] and the [http://quarry.wmflabs.org/ Quarry] web application. Once we've done that, we will download the results of our queries in [https://en.wikipedia.org/wiki/Comma-separated_values CSV format], which can be used to ask and answer questions and visualize data in the final session.


=== Goals ===


* Learn how to use MySQL (a database system queried with [https://en.wikipedia.org/wiki/SQL Structured Query Language]) to build datasets.
* Get set up to run MySQL queries to gather data from Wikimedia and StackExchange projects.
* Get set up to run MySQL queries to gather data from Wikimedia projects.
* Practice running MySQL queries on your own to get data about who is editing particular Wikipedia articles and answering questions on StackOverflow.
* Practice running MySQL queries on your own to get data about who is editing particular Wikipedia articles.
* Create a few collections of Wikipedia data that you can do research with in the final session.
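
The workflow described above (run a SQL query to find out who is editing an article, then save the results as CSV) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in <code>sqlite3</code> module as a stand-in for MySQL on Quarry, and the <code>revision</code> table, its columns, and the sample rows are hypothetical rather than the actual Wikimedia schema.

```python
import csv
import io
import sqlite3

# Hypothetical toy table standing in for Wikipedia revision data;
# the real Wikimedia schema available through Quarry differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (page_title TEXT, user_name TEXT)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [
        ("Seattle", "Alice"),
        ("Seattle", "Bob"),
        ("Seattle", "Alice"),
        ("Tacoma", "Carol"),
    ],
)

# Count edits per user on one article, most active editors first.
query = """
    SELECT user_name, COUNT(*) AS edits
    FROM revision
    WHERE page_title = ?
    GROUP BY user_name
    ORDER BY edits DESC, user_name ASC
"""
rows = conn.execute(query, ("Seattle",)).fetchall()

# Write the result in CSV format, similar in spirit to a CSV download.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["user_name", "edits"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The <code>SELECT ... GROUP BY</code> statement is the part that carries over to MySQL; the table and column names would need to be replaced with those of the database you are actually querying.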