Community Data Science Workshops (Spring 2014)/Reflections

imported>Mako
== Session 3: Data Analysis and Visualization ==
 
Because we only had three sessions, our philosophy in Session 3 was different from most other attempts to teach data science in Python. We thought that teaching users to get data into tools they already know and use would be a better use of their time and would help make them independent earlier.
 
* Based on feedback from the application, we know that almost every user who attended our sessions had at least basic experience with spreadsheets and with using spreadsheets to create simple charts. We tried to help users use Python to process data into formats that they could load into existing tools like ''LibreOffice'', ''Microsoft Excel'', or ''Google Docs''.
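
As a minimal sketch of the kind of Python step this involves, assuming revision metadata is held as a list of dictionaries (the field names and sample rows are hypothetical), the standard library's <code>csv</code> module can write tab-separated values that LibreOffice, Microsoft Excel, and Google Docs can all open:

```python
import csv
import io

# Hypothetical sample rows: one dictionary per revision.
revisions = [
    {"title": "Harry Potter", "user": "ExampleUser",
     "timestamp": "2007-07-21T00:12:00Z", "minor": False, "anon": False},
    {"title": "Hermione Granger", "user": "192.0.2.1",
     "timestamp": "2007-07-21T03:40:00Z", "minor": True, "anon": True},
]


def to_tsv(rows, fieldnames):
    """Return rows as tab-separated text with a header line.

    Keys not listed in fieldnames are skipped, so you can choose columns.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Usage: save the text to a file, then open it in a spreadsheet program.
# with open("harry_potter_revisions.tsv", "w", encoding="utf-8") as f:
#     f.write(to_tsv(revisions, ["title", "user", "timestamp", "minor"]))
```

Tabs are used as the delimiter here because article titles rarely contain tabs; the <code>csv</code> module quotes any field that does contain the delimiter.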
 
=== Lecture ===
 
Because much of our analysis was going to take place outside of Python, the lecture focused on review and on new concepts for data manipulation. The lecture began with a detailed walk-through of code [[User:Mako|Mako]] wrote to build a new dataset of metadata for all revisions to articles about [https://en.wikipedia.org/wiki/Harry_Potter Harry Potter] on English Wikipedia.
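
The workshop's own code is not reproduced here. As a rough sketch of one way to collect this kind of metadata, assuming the public MediaWiki API on English Wikipedia and its <code>prop=revisions</code> query, the functions below build a request for a single article, flatten a decoded JSON response into rows, and follow the API's continuation parameters. The function names, the chosen <code>rvprop</code> fields, and the User-Agent string are illustrative assumptions; finding the full list of ''Harry Potter'' articles is not shown.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title, cont=None):
    """Build an API URL asking for revision metadata for one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|flags|size",
        "rvlimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    # The API returns a "continue" object; sending it back fetches the next batch.
    params.update(cont or {})
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_revisions(response):
    """Flatten one decoded response into rows; also return continuation info."""
    rows = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            rows.append({
                "title": page["title"],
                "revid": rev["revid"],
                "user": rev.get("user", ""),
                "timestamp": rev["timestamp"],
                "size": rev.get("size"),
                # bool() handles JSON true/false values as well as missing keys.
                "minor": bool(rev.get("minor", False)),
                "anon": bool(rev.get("anon", False)),
            })
    return rows, response.get("continue")


def fetch_all_revisions(title, user_agent="cdsw-example/0.1 (hypothetical)"):
    """Download metadata for every revision of one article (needs network access)."""
    rows, cont = [], {}
    while cont is not None:
        request = urllib.request.Request(revision_query_url(title, cont),
                                         headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        batch, cont = parse_revisions(data)
        rows.extend(batch)
    return rows
```

Keeping parsing separate from downloading makes it possible to test the parsing step on a saved response without touching the network.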
 
After this review of the code needed to build the dataset, we focused on counting, binning, and grouping data in order to ask and answer simple questions like:
 
* What proportion of edits to Wikipedia ''Harry Potter'' articles are minor?
* What proportion of edits to Wikipedia ''Harry Potter'' articles are made by "anonymous" contributors?
* What are the most-edited ''Harry Potter'' articles?
* Who are the most active editors of ''Harry Potter'' articles?
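
Each of these questions comes down to a proportion or a count per group. A minimal sketch using only the standard library, assuming rows shaped like the hypothetical revision records below:

```python
from collections import Counter

# Hypothetical sample of revision metadata.
revisions = [
    {"title": "Harry Potter", "user": "Alice", "minor": False, "anon": False},
    {"title": "Harry Potter", "user": "192.0.2.1", "minor": True, "anon": True},
    {"title": "Hermione Granger", "user": "Alice", "minor": True, "anon": False},
    {"title": "Harry Potter", "user": "Bob", "minor": False, "anon": False},
]


def proportion(rows, flag):
    """Share of rows whose boolean field `flag` is true (0.0 for no rows)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[flag]) / len(rows)


def most_common(rows, field, n=10):
    """The n most frequent values of `field`, as (value, count) pairs."""
    return Counter(row[field] for row in rows).most_common(n)


print("Share of minor edits:", proportion(revisions, "minor"))     # 0.5
print("Share of anonymous edits:", proportion(revisions, "anon"))  # 0.25
print("Most edited articles:", most_common(revisions, "title", 5))
print("Most active editors:", most_common(revisions, "user", 5))
```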
 
Because it did not require installing any software and ran on every platform, we did sorting and visualization in [http://docs.google.com Google Docs].

=== Projects ===
 
In the afternoon projects, one group continued working with the ''Harry Potter'' dataset from English Wikipedia. This group focused on building a time series dataset: we were able to bin edits by day and graph the number of edits to the ''Harry Potter'' articles over time. Users could easily see the releases of the ''Harry Potter'' books and movies in the time series, and this was a major ''aha'' moment for many of the participants.
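
Binning by day means converting each edit's timestamp to a calendar date and counting edits per date. A minimal sketch, assuming ISO 8601 UTC timestamps in the format the MediaWiki API uses (the sample values are hypothetical); days with no edits are filled in with zero so the graph's time axis has no gaps:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical edit timestamps (UTC).
timestamps = [
    "2007-07-20T23:59:00Z",
    "2007-07-21T00:12:00Z",
    "2007-07-21T08:30:00Z",
    "2007-07-23T10:00:00Z",
]


def edits_per_day(timestamps):
    """Return (date, count) pairs for every day from the first edit to the last."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").date() for ts in timestamps
    )
    if not counts:
        return []
    series = []
    day, last = min(counts), max(counts)
    while day <= last:
        series.append((day, counts[day]))  # Counter gives 0 for days with no edits
        day += timedelta(days=1)
    return series


# Tab-separated output can be pasted into a spreadsheet and charted.
for day, count in edits_per_day(timestamps):
    print(day.isoformat(), count, sep="\t")
```

Without the zero-filled days, a spreadsheet line chart would silently join the points on either side of a quiet period.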
 
A second project focused on [http://matplotlib.org/ Matplotlib] and generated heatmaps of contributions to Wikipedia articles about men and women, with one axis for time in Wikipedia's lifetime and the other for time in the subject's lifetime. The heatmaps were popular with participants and were something that could not easily be done with spreadsheets.
 
[[File:Matplotlib-hist2d.png|400px]]
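
The step that is hard to do in a spreadsheet is the two-dimensional binning behind a heatmap. Matplotlib's <code>pyplot.hist2d</code> bins and draws in one call; the standard-library sketch below shows only the binning. The variable names, units (years since Wikipedia's launch on one axis, the subject's age in years on the other), bin edges, and sample points are all hypothetical.

```python
import bisect

# Hypothetical data: one (x, y) point per contribution.
years_into_wikipedia = [2.5, 3.1, 7.9, 8.2, 8.4]  # x: years since Wikipedia's launch
subject_age = [10, 45, 12, 47, 48]                 # y: hypothetical position in subject's life

x_edges = [0, 5, 10]
y_edges = [0, 25, 50, 75]


def bin2d(xs, ys, x_edges, y_edges):
    """Count points per cell; returns grid[y_bin][x_bin].

    Bins are half-open [edge[i], edge[i + 1]); points outside the edges are dropped.
    """
    n_x, n_y = len(x_edges) - 1, len(y_edges) - 1
    grid = [[0] * n_x for _ in range(n_y)]
    for x, y in zip(xs, ys):
        i = bisect.bisect_right(x_edges, x) - 1
        j = bisect.bisect_right(y_edges, y) - 1
        if 0 <= i < n_x and 0 <= j < n_y:
            grid[j][i] += 1
    return grid


grid = bin2d(years_into_wikipedia, subject_age, x_edges, y_edges)
# Print the highest y bin first so the layout matches a plotted heatmap.
for j in reversed(range(len(grid))):
    print(f"{y_edges[j]}-{y_edges[j + 1]}", *grid[j], sep="\t")
```

With Matplotlib installed, something like <code>plt.hist2d(years_into_wikipedia, subject_age, bins=[x_edges, y_edges])</code> (after <code>import matplotlib.pyplot as plt</code>) does the binning and draws the heatmap. Note that NumPy and Matplotlib treat the last bin as closed on the right, unlike this sketch.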
 
The challenge with ''Matplotlib'' was mostly around installation, which took an enormous amount of time when several learners ran into trouble. In the future, we will use [https://store.continuum.io/cshop/anaconda/ Anaconda], which we hope will address these issues because ''Anaconda'' includes ''Matplotlib''.
 
== General Feedback ==