Community Data Science Workshops (Spring 2014)/Reflections

Over three weekends in Spring 2014, a group of volunteers organized the Community Data Science Workshops (CDSW), a series of four sessions designed to introduce absolute beginners to some of the basic tools of programming and to the analysis of data from online communities. The CDSW were held between April 4th and May 31st, 2014, at the University of Washington, Seattle.

This page hosts reflections on organization and curriculum and is written for anybody interested in organizing their own CDSW. This includes future versions of ourselves.

In feedback, both mentors and students suggested that the workshops were a huge success. Students reported that they learned an enormous amount and benefited enormously. Mentors were also generally very excited about running similar projects in the future. That said, we all felt there were many ways to improve on the projects.

Structure
We had four sessions:


 * Session 0 (Friday April 4th): Setup and Programming Practice
 * Session 1 (Saturday April 5th): Introduction to Python
 * Session 2 (Saturday May 3rd): Building data sets using web APIs
 * Session 3 (Saturday May 31st): Data analysis and visualization

Our organization and the curriculum for Sessions 0 and 1 were borrowed from the Boston Python Workshop. Session 0 was a three-hour evening session to install software. The other sessions were all day-long sessions (10am to 4pm) broken up into the following schedule:


 * Morning, 10am-noon: A two-hour lecture
 * Lunch, noon-1pm
 * Afternoon, 1pm-3:30pm: Practice working on projects in 3 breakout sessions
 * Wrap-up, 3:30pm-4pm: Next steps and upcoming opportunities

We did not take roll or even track how many people were present. Our sense was that nearly every student who came to the first week (Sessions 0 and 1) came to Session 2. Retention between the last two sessions was much worse, with perhaps only 60% of the full group returning for Session 3. We attribute this both to poor timing (the weekend before finals at UW) and to the long gap between sessions.

Morning Lectures
Benjamin Mako Hill gave all three of the two-hour lectures. All of the lectures involved the teacher working through material in an interactive Python interpreter, with students following along on their own computers. In general, the lectures were well received by students.

Concerns about the lectures included the feeling that:


 * Two hours of straight lecture on difficult material was too long.
 * If students got lost, it could be very hard to catch up, given how the interactive session tended to build on earlier steps.
 * There were often more mentors than needed in the morning sessions, meaning that many mentors were idle.
 * As the lectures progressed and the tasks became more complex, working in the interactive interpreter became increasingly difficult, particularly for very long programs.

To address these concerns, we've suggested the following changes:


 * Break up the lecture into at least two parts, with a short (10-15 minute) exercise between them. This will break things up, allow mentors to be of more help, and give students who fell behind a chance to catch up. It will also allow students to grab coffee and such.
 * Record the lectures so that students can catch up after the fact.
 * Arrange for some mentors to arrive after noon if they'd prefer.
 * Upload not only the outline but also all of the example code that we will run interactively.
 * Switch to writing code in files and running those files much earlier, perhaps as soon as a for loop in Session 1 grows beyond 2-3 lines. This would make those loops reusable by students and would introduce the idea of writing and running code in a file (as opposed to a REPL environment) much earlier.
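As a rough illustration of that last suggestion, a loop that has outgrown the interpreter could be saved as a small script. The file name `count_words.py` and its contents are hypothetical examples, not part of the workshop curriculum.

```python
# count_words.py: a hypothetical first script, saved to a file and run from
# the system shell rather than typed line by line into the Python shell.
words = ["spam", "eggs", "spam", "ham", "spam"]

counts = {}
for word in words:
    # dict.get returns 0 the first time we see a word
    counts[word] = counts.get(word, 0) + 1

# Print each word and how often it appeared, in alphabetical order
for word, count in sorted(counts.items()):
    print(word, count)
```

Students would then run it from the system shell with `python count_words.py` (or `python3`, depending on the installation), edit it, and rerun it, instead of retyping the loop each time.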

Projects
In the afternoon, we broke into small groups to work on projects. In each session, we tried to have two projects on different topics for people with different interests. Each project was led by a mentor.

Session 0: Python Setup
Challenges:


 * Users on Windows struggled to get Python set up.
 * Users had different (and often older) versions of Python, which became a bigger issue when we began using URL parsing libraries.
 * Mac users struggled with (and generally did not like) Smultron.

Proposed changes:


 * Use Anaconda to install Python, as Software Carpentry (SWC) does.
 * Use a different text editor on MacOS. TextWrangler was suggested.
 * http://repl.it looks intriguing, but it is perhaps not ready enough, or "real" enough, yet.
 * Emphasize more strongly that Windows users need to come to Session 0.
 * Change the Codecademy lessons to remove or replace the HTML example. Users who already knew HTML were often confused because printing "<b>foo</b>" did not result in bolded text. It was simply the wrong choice for a basic string concatenation example.
 * Add some text emphasizing the difference between the Python shell and the system shell. Students were confused about this until the end.
 * Add a new check-off step: create a file, save it, and run it.

Session 1: Introduction to Python
The curriculum for the Boston Python Workshop (BPW) is well tested and worked well. We had no major challenges, but there are several things we will change when we teach the material again:


 * If possible, we would have liked to do introductions up front (e.g., a simple "your name, where you are from, and what you want to do"), which would have been useful even in a big group.
 * The BPW examples were not focused on data and were more classic computer science projects. In the future, we would like to choose some examples that are a little more data-focused.

In general, we felt that the Colorwall example was way too complicated. It introduced many features and concepts that nobody had seen before. The Wordplay example was much better in this regard.

Grepping through the works of Shakespeare might be a good data-focused example.
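A minimal sketch of what such an exercise might look like, assuming the text is available as a local plain-text file. The `grep` function below is a hypothetical helper written for illustration; it uses only Python's standard `re` module, and the three sample lines stand in for the full text.

```python
import re


def grep(lines, pattern):
    """Yield (line_number, line) for every line matching a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


# Stand-in for lines read from a file of Shakespeare's works
sample = [
    "To be, or not to be, that is the question:\n",
    "Whether 'tis nobler in the mind to suffer\n",
    "The slings and arrows of outrageous fortune,\n",
]

for number, line in grep(sample, r"\bto be\b"):
    print(number, line)
```

Because iterating over an open file yields its lines, the same function would work unchanged on a real file object (for example, one opened from a hypothetical `shakespeare.txt`).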


Session 2: Building Data Sets Using Web APIs

This was by far the most successful session.

The placekitten example was a complete hit.
Lectures

Explaining what an API is was hard. It is worth preparing for in advance.

If we removed anything from the whole curriculum, it might be JSON:

 * The benefit is that it provides a way to think about more complex data structures.
 * The downside is that since most APIs students will use already have Python interfaces, it ends up being a little irrelevant.
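To show that data-structures benefit concretely, here is a minimal sketch using Python's standard `json` module on a made-up API response. The field names and values are invented for illustration and do not come from any real API.

```python
import json

# A made-up API response; real APIs usually return much larger documents.
response_text = """
{
  "user": {"name": "example_user", "followers": 120},
  "posts": [
    {"id": 1, "tags": ["python", "data"]},
    {"id": 2, "tags": ["wikipedia"]}
  ]
}
"""

# JSON objects become dicts and JSON arrays become lists
data = json.loads(response_text)

print(data["user"]["name"])   # a dict nested inside a dict
print(len(data["posts"]))     # a list of dicts

# Walk the nested structure: every tag from every post
all_tags = [tag for post in data["posts"] for tag in post["tags"]]
print(all_tags)
```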


Afternoon sessions

Twitter/Tweepy projects:

Discoverability on the Tweepy objects was a challenge. You get an object, but it's not easy to introspect it and see what's there in the same way you can with a JSON object. This came as a surprise and was not something we taught.
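One way to teach this is to compare the two cases side by side. In the sketch below, `Status` is a hypothetical stand-in class, not Tweepy's real model, and real library objects may expose some fields through properties or other mechanisms that `vars()` does not show. With parsed JSON, the dictionary's keys list every field; with an object, the built-in `vars()` and `dir()` functions are the usual starting points.

```python
import json

raw = '{"id": 42, "text": "hello world", "retweet_count": 3}'

# With parsed JSON, the available fields are simply the dict's keys.
as_dict = json.loads(raw)
print(sorted(as_dict.keys()))


# Hypothetical stand-in for a library model object; NOT Tweepy's real class.
class Status:
    def __init__(self, fields):
        for name, value in fields.items():
            setattr(self, name, value)


status = Status(as_dict)

# vars() shows the attributes stored directly on the instance...
print(sorted(vars(status)))

# ...while dir() also lists methods and double-underscore names, so filter
# out anything starting with an underscore to keep the output readable.
public_names = [name for name in dir(status) if not name.startswith("_")]
print(public_names)
```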

Wikipedia:

We focused on building a version of categories and catfishing.

Both projects worked very well. Reviews were generally super positive.


Ideas

 * Other APIs, maybe ones without existing modules
 * Maybe work on other APIs in small groups


Session 3: Data Analysis and Visualization

We covered basic data manipulation.

Afternoon sessions

matplotlib was very tough to install.

 * Maybe Anaconda would help?
 * Heatmaps were a hit. They are hard to make in other software, but they worked out well.
 * Focus more on things that folks can't do with their spreadsheet.

Seeing the Harry Potter graph was a complete hit.


Final thoughts

We want to focus more on moving people toward independence.

People didn't quite make it all the way.

Our final session seemed to end the class on a bit of a low point.

One suggestion is to have a final session with no lecture or curriculum. People can come, and mentors will be there to work with them on projects.

Of course, we want everybody to come, so we should have a set of "random" projects for folks who don't have projects of their own.


Logistical observations

Budget

For lunch, we spent $400 (pizza), $360 (less pizza), and $600 (fancy Indian food at the last session). This was for 50 students and 18 mentors, but we assumed about 60 people would actually be there. We also spent $50 on coffee in the mornings.

Most mentors could not make it to the dinner after each session, so we spent about $100 per session on mentor dinners. If more people had shown up, it would have been closer to $200-250 per dinner.

The rooms were free.

With a total budget on the order of $2000-2500, I think you could easily run a similar series (three day-long sessions plus an evening session).
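The figures above can be added up as a rough check on that estimate. This is a back-of-the-envelope sketch that assumes the coffee and dinner costs each applied once to each of the three day-long sessions (Sessions 1-3); the page does not say exactly how those costs were split.

```python
# Reported costs in US dollars; the per-session split is an assumption.
LUNCHES = [400, 360, 600]      # pizza, less pizza, Indian food
COFFEE_PER_MORNING = 50        # assumed: once per day-long session
SESSIONS = len(LUNCHES)        # three day-long sessions


def total_cost(dinner_per_session):
    """Total spend on lunches, morning coffee, and mentor dinners."""
    return (sum(LUNCHES)
            + COFFEE_PER_MORNING * SESSIONS
            + dinner_per_session * SESSIONS)


print(total_cost(100))  # dinners as they actually were -> 1810
print(total_cost(250))  # dinners if more mentors had attended -> 2260
```

Under those assumptions, spending comes to about $1,810, or about $2,110-2,260 if each mentor dinner had cost $200-250, which is consistent with the $2000-2500 estimate.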

Things We Should Do Differently

The spacing between sessions was too long. Every other week, perhaps?

The breaks for lunch were a bit too long; 45 minutes should be enough. Folks were interested in getting back into action, and the food was simple and always there on time, so we could have just run with this.

The general structure of the entire series was not as clear as it could have been. This was at least partly because the details of what we would teach in the later sessions were not finished when we started.

Maybe include a spot where we can talk for 10-15 minutes about how to use documentation.

Have more mentors with Windows experience.

Getting to the right directory was a challenge. Understanding the path, and the idea that files and data sets need to be local to the place where the script is run, was unclear.
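The underlying issue is that a relative path such as `open("data.csv")` is resolved against the current working directory (wherever the shell is when the script is launched), not against the folder containing the script. Below is a minimal sketch of one common workaround using only the standard library; `data_path` is a hypothetical helper name.

```python
import os
from pathlib import Path


def data_path(script_file, filename):
    """Return the path of `filename` in the same directory as `script_file`.

    A bare relative path is looked up in the current working directory,
    which changes depending on where the script is launched from. Building
    the path from the script's own location avoids that surprise.
    """
    return Path(script_file).resolve().parent / filename


# Printing the working directory is a quick way to diagnose the problem.
print("Current working directory:", os.getcwd())
```

Inside a script, `open(data_path(__file__, "tweets.json"))` would then find a `tweets.json` saved next to the script regardless of where it was launched from (the file name is only an example).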

Sticky notes worked super well but gained less value as we went along.

Things to teach:

 * Debugging
 * Reading documentation
 * Troubleshooting and looking things up on Stack Exchange