Community Data Science Workshops (Spring 2014)/Reflections

SimpleAPIs might have been a good example of something we could do as a small-group exercise between parts of the lecture.
 
== Session 3: Data Analysis and Visualization ==
 
Because we only had three sessions, our philosophy in Session 3 was different from that of most other attempts to teach data science in Python:
 
* Teach users to get data into tools they already know. Almost every user who attended our sessions had at least basic experience with spreadsheets and simple charting. We tried to help users use Python to process data into forms that they could load into those tools (a minimal sketch of that workflow follows).
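
For illustration, here is a minimal sketch of that workflow using Python's csv module; the data, field names, and file name are made up for this example:

<syntaxhighlight lang="python">
import csv

# Hypothetical processed results: one row per article with an edit count.
article_counts = [
    {"article": "Harry Potter", "edits": 5231},
    {"article": "Hermione Granger", "edits": 2114},
]

# Write a CSV file that participants can open directly in a spreadsheet.
with open("article_counts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["article", "edits"])
    writer.writeheader()
    writer.writerows(article_counts)
</syntaxhighlight>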
 
=== Lecture ===
 
As a result, the morning lecture focused on basic data manipulation in Python. It was mostly review: a detailed walk-through of the code we wrote to build a new dataset, followed by a focus on counting and grouping data.
 
The lecture started with a dataset of metadata on all revisions to articles about Harry Potter from English Wikipedia. After reviewing the code necessary to build it, we focused on questions related to counting, binning, and grouping data. In that process, we tried to ask and answer simple questions like the following (a rough sketch of this kind of counting code appears after the list):
 
* What proportion of edits to Wikipedia Harry Potter articles are minor?
* What proportion of edits to Wikipedia Harry Potter articles are made by "anonymous" contributors?
* What are the most edited articles on Harry Potter?
* Who are the most active editors on articles in Harry Potter?
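
A minimal sketch of what this kind of counting and grouping code might look like, assuming each revision is a dictionary; the field names and sample values are invented for this example and are not the workshop's actual dataset format:

<syntaxhighlight lang="python">
from collections import Counter

# Hypothetical structure: one dictionary per revision, with the field
# names ("title", "user", "minor", "anon") invented for this sketch.
revisions = [
    {"title": "Harry Potter", "user": "ExampleUser", "minor": True, "anon": False},
    {"title": "Hermione Granger", "user": "AnotherUser", "minor": False, "anon": True},
]

# What proportion of edits are minor? What proportion are anonymous?
print(sum(1 for r in revisions if r["minor"]) / len(revisions))
print(sum(1 for r in revisions if r["anon"]) / len(revisions))

# What are the most edited articles, and who are the most active editors?
print(Counter(r["title"] for r in revisions).most_common(10))
print(Counter(r["user"] for r in revisions).most_common(10))
</syntaxhighlight>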
 
=== Projects ===
 
In the afternoon projects, one group continued with work on English Wikipedia and Harry Potter.
 
In this case, the group focused on building a time series dataset. We were able to bin edits by day and to graph edits to English Wikipedia over time.
 
Users could easily see the release of books and movies. This was a major ''aha'' moment for many of the participants.
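
A rough sketch of the binning-and-plotting approach, assuming the edit timestamps have already been parsed into datetime objects; the variable names and sample values here are illustrative, not the project's actual code:

<syntaxhighlight lang="python">
from collections import Counter
from datetime import datetime
import matplotlib.pyplot as plt

# Hypothetical input: one datetime per edit to a Harry Potter article.
edit_times = [datetime(2005, 7, 16, 14, 2), datetime(2005, 7, 17, 9, 30)]

# Bin edits by day by truncating each timestamp to its date.
edits_per_day = Counter(t.date() for t in edit_times)

# Plot the daily counts as a time series.
days = sorted(edits_per_day)
plt.plot(days, [edits_per_day[d] for d in days])
plt.xlabel("Date")
plt.ylabel("Edits per day")
plt.show()
</syntaxhighlight>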
 
A second project focused on matplotlib and generated heatmaps of contributions to articles about men and women in Wikipedia, based on time within Wikipedia's lifetime and time within the subject's lifetime. The heatmaps were popular with participants and were something that could not easily be done with spreadsheets.
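
A minimal sketch of how a heatmap like this might be drawn with matplotlib and NumPy, assuming each edit has already been reduced to a pair of coordinates (here, the year of the edit and the subject's age at that time); the binning scheme and values are illustrative, not the project's actual code:

<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical coordinates, one pair per edit: the year the edit was
# made and the subject's age at that time. Real code would compute
# these from revision metadata and biographical data.
edit_years = np.array([2004, 2005, 2005, 2010, 2013])
subject_ages = np.array([30, 31, 45, 36, 50])

# Bin the edits onto a 2D grid and draw the grid as a heatmap.
counts, xedges, yedges = np.histogram2d(edit_years, subject_ages, bins=10)
plt.imshow(counts.T, origin="lower", aspect="auto",
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.xlabel("Year of edit")
plt.ylabel("Subject's age at time of edit")
plt.colorbar(label="Number of edits")
plt.show()
</syntaxhighlight>

The two-dimensional binning that numpy.histogram2d does here is the kind of step that is tedious in a spreadsheet, which is part of why the heatmaps resonated with participants.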
 
The main challenge with matplotlib was installation, which took an enormous amount of time. In the future, we will use Anaconda, which we hope will address these issues because Anaconda includes matplotlib.
 
== Final Thoughts ==