Community Data Science Workshops (Spring 2014)/Reflections: Difference between revisions
Revision as of 22:12, 15 March 2015
imported>Mako |
imported>Jtmorgan m (moved to wiki.communitydata.cc) |
{{CDSW Moved}}
Over three weekends in Spring 2014, a group of volunteers organized the [[Community Data Science Workshops (Spring 2014)]] (CDSW), the first series of four sessions designed to introduce absolute beginners to some of the basic tools of programming and of analyzing data from online communities. This version of the [[CDSW]] was held between April 4th and May 31st, 2014 at the University of Washington in Seattle.
This page hosts reflections on organization and curriculum and is written for anybody interested in organizing their own CDSW — including the authors!
In general, the mentors and students
If you have any questions or issues, you can contact [[Benjamin Mako Hill]] directly or can email the whole group of mentors at cdsw-sp2014-mentors@uw.edu.
== Structure ==
The [[
* '''Session 0 (Friday April 4th)''': [[
* '''Session 1 (Saturday April 5th)''': [[
* '''Session 2 (Saturday May 3rd)''': [[
* '''Session 3 (Saturday May 31st)''': [[
Our organization and the curriculum for Sessions 0 and 1 were borrowed from the [http://bostonpythonworkshop.com/ Boston Python Workshop] (BPW). Session 0 was a three-hour evening session to install software. The other sessions were all day-long sessions (10am to 4pm) broken up into the following schedule:
* '''Morning, 10am-noon''': A 2 hour lecture
* '''Wrap-up, 3:30pm-4pm''': Wrap-up, next steps, and upcoming opportunities
We had 12 mentors volunteer initially although more joined as the event progressed.
We had about 150 participants apply to attend the sessions. We selected on programming skill (to ensure that all attendees were complete beginners) and on enthusiasm, and then chose randomly so as to maintain a learner-to-mentor ratio of between 4 and 5. We admitted just over 50 participants.
Our feeling was that nearly every student who came to the first week (Sessions 0 and 1) came to Session 2. Retention between Sessions 2 and 3 was much worse, with perhaps only 60% of the full group returning for Session 3. We attribute this drop in retention to poor timing (the weekend before finals at UW, which affected many students) and to the long gap between the sessions.
We collected detailed feedback from users at three points using the following Google forms (these are copies):
* [https://docs.google.com/forms/d/1gPmgZvOxfE0KVRkb_ySgTqNvCaa4Rl8PYUY9u-NVwTE/viewform Application to the workshop]
* [https://docs.google.com/forms/d/1FGASnZLA3V13JTuJg5LF0fVvrUX9quKYc95yeEATzHY/viewform After Session 1]
* [https://docs.google.com/forms/d/1UhEU3aWKSuLpfBgR8CZcW8JrdgNRDj6FuT8yAqFCFmE/viewform After Session 2]
We used this feedback to both evaluate what worked well and what did not and to get a sense of what students wanted to learn in the next session and which afternoon sessions they might find interesting. We did not collect feedback after the final session but we should have.
=== Morning Lectures ===
[[Benjamin Mako Hill]] gave all three of the two-hour
Concerns with the lectures include:
* Two hours of straight lecture of difficult material was
* If students
* There were often more mentors than really needed in the morning sessions meaning that many mentors were often idle.
* As the lectures progressed and the work and tasks became more complex, working in the interactive interpreter became increasingly difficult, particularly for
To address these concerns, we
*
* Record the lectures so that students can catch up after the
* Arrange for some mentors to arrive after noon if they
* Upload not only the outline, but examples of all of the code
* Switch to writing code in separate files and running those files much earlier, perhaps as soon as we hit more than 2-3 lines in a <code>for</code> loop in Session 1
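As an illustration of the last point, here is a minimal sketch of the kind of loop that becomes awkward to type into the interactive interpreter and is easier to keep in a file. The file name <code>squares.py</code> and the function <code>describe_numbers</code> are made up for this example and are not part of the workshop material.

```python
# Save this as squares.py (a hypothetical name) and run it from the
# system shell, not the Python shell, with:
#     python squares.py

def describe_numbers(numbers):
    """Return one line of text per number, built inside a multi-line loop."""
    lines = []
    for n in numbers:
        square = n * n
        if n % 2 == 0:
            parity = "even"
        else:
            parity = "odd"
        lines.append(str(n) + " squared is " + str(square) + " (" + parity + ")")
    return lines


if __name__ == "__main__":
    for line in describe_numbers([1, 2, 3]):
        print(line)
```

Once the loop lives in a file, fixing a typo means editing one line and re-running the file rather than retyping the whole loop.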
=== Projects ===
In the
In
In Session 3, we did not use Code Academy but instead devoted the self-directed room to students working with mentors on data science projects of their choice. Because of issues with the student to mentor ratio, we asked that students only participate in the self-directed track if they felt confident they could be self-sufficient working on their own 70-80% of the time.
In all other tracks, students would download a prepared example in the form of a <code>zip</code> file or <code>tar.gz</code> file. In each case, these projects would include:
* All of the libraries necessary to run the examples (e.g., [http://www.tweepy.org/ Tweepy] for the Session 2 Twitter track).
* All of the data necessary to run the example programs (e.g., a full English word list for the Wordplay example).
* Any other necessary code or libraries we had written for the example.
* A series of small numbered example programs (~5-10 examples). Each
On average, the
Learners would work on these challenges at their own pace working with
In some cases, more advanced students could "jump ahead" and begin working on their own challenges or changing the code to work in different ways. This was welcome and encouraged.
In all cases, we gave students red sticky notes they could use to signal that they needed help (a tool borrowed from [http://software-carpentry.org/ SWC]).
== Session 0: Python Setup ==
The goal of this session was to get users set up with Python and starting to learn some of the basics. The setup curriculum was adapted from BPW. We ran into the following challenges:
* Users on Windows struggled to get Python set up and added to their path.
* Users had different (and often older) versions of Python which became a bigger issue when we began using
* Mac users struggled with — and generally did not like
* Use [https://store.continuum.io/cshop/anaconda/ Anaconda] for getting Python
* Use a different text editor for MacOS.
* In browser Python (e.g., http://repl.it)
* Emphasize more strongly that Windows users ''need'' to come to Session 0
* Change the Code Academy lessons to remove or replace the HTML example. Users who already knew HTML were often confused because printing "<b>foo</b>" did not result in actually bolded text. This was just the wrong choice for a simple string concatenation example.
* Add some text to emphasize the difference between the Python shell and the system shell. Students were confused about this.
* Add a new check off step that includes the following: create a file, save it, run it.
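To make the string concatenation point concrete, here is a small sketch of an example that avoids HTML, followed by the HTML case that caused confusion. The function name <code>greet</code> is made up for illustration and is not taken from the lessons themselves.

```python
def greet(name):
    """Join three strings with + and return the result."""
    return "Hello, " + name + "!"


print(greet("world"))

# The confusing case: Python prints the characters exactly as typed.
# Nothing interprets the tags, so the terminal shows them literally.
tagged = "<b>" + "foo" + "</b>"
print(tagged)
```

Saving these lines to a file and running that file from the system shell would also serve as the "create a file, save it, run it" check-off step.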
== Session 1: Introduction to Python ==
The goal of this session was to teach the basics of programming in Python. The curriculum for BPW has been used many times and is well tested.
That said, there are several things we will change when we teach the material again:
* If possible, we would have liked to do introductions up front (i.e., a simple "your name, where you are from, and what you want to do"), even in a big group. This seems more important in a multi-day event and would have been useful for the mentors.
* The BPW projects were not focused on data and were more like classic computer science class projects. In the future, we would like to choose some examples that are a little more data focused.
=== Afternoon sessions ===
In terms of the afternoon sessions, we felt that the [[ColorWall]] example was ''way'' too complicated. It introduced many features and concepts that nobody had seen and many users were flustered.
The [[Wordplay]] project was much better in this regard. In particular, we liked that Wordplay was broken up into a series of small example projects that each did one small thing. This provided us with an opportunity to walk through the example and then pose challenges to students to make changes to the code.
In the future, we will replace [[ColorWall]] with another more data-focused example. Our current thought is to build a small example that involves iterating through a pre-parsed version of the complete works of Shakespeare.
== Session 2: Learning APIs ==
The goal of this session was to describe what web APIs are, how they work (making HTTP requests and receiving data back), how to understand JSON data, and how to use common web APIs from Wikipedia and Twitter.
Mentors and students felt that this session was the most successful and effective session.
=== Morning lecture ===
The morning lecture was well received — if delivered too quickly. Unsurprisingly, the example of [http://placekitten.com/ PlaceKitten] as an API was an enormous hit: informative ''and'' cute.
Defining APIs was difficult. First, general ambiguity around the use of the term and the difference between APIs in general and web APIs should be foregrounded. Learners frequently wanted to ask questions like, "Where in this Python program is the API?" It was difficult for some to grasp that the API is the ''protocol'' that describes what a client can ask for and what they can expect to receive back. Preparing a concise answer to this question ahead of time would have been worthwhile. We spent too much time on this in the session.
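One possible way to answer the "where is the API?" question is to show that, for a web API, the API is the set of rules for forming a request and the shape of the reply, not any particular line of Python. The sketch below only builds a request URL and does not contact the network. It assumes PlaceKitten's <code>http://placekitten.com/WIDTH/HEIGHT</code> URL pattern; the function name <code>kitten_url</code> is ours.

```python
def kitten_url(width, height):
    """Build a PlaceKitten request URL for an image of the given size.

    The API is the rule "put the width, then the height, in the path".
    This function is just one client that follows that rule.
    """
    return "http://placekitten.com/" + str(width) + "/" + str(height)


print(kitten_url(200, 300))
```

Pasting the printed URL into a browser follows the same protocol with no Python involved, which may help learners see that the API is not "in" the program.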
Although there was some debate among the mentors, if there is one thing we might remove from the curriculum for a future session, it would probably be JSON. The reason it seemed less useful is that the APIs most learners plan to use (e.g., Twitter and Wikipedia) already have Python interfaces in the form of modules. In this sense, spending 30 minutes of a lecture learning how to parse JSON objects seems like a poor use of time.
On the other hand, time spent looking at JSON objects provides practice thinking about more complex data structures (e.g., nested lists and dictionaries), which is necessary and which students will otherwise not be prepared for. We were undecided as a group.
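To show the kind of nesting in question, here is a minimal sketch that parses a small JSON document with Python's standard <code>json</code> module and walks through a dictionary that contains a list of dictionaries. The JSON text is hand-written for this example and is not real output from any API; the key names are assumptions.

```python
import json

# Hand-written sample, not real API output.
raw = """
{"query": {"pages": [
    {"title": "Example article", "categories": ["Category A", "Category B"]},
    {"title": "Another article", "categories": []}
]}}
"""


def titles_with_categories(json_text):
    """Map each page title to its list of categories."""
    data = json.loads(json_text)  # a dict containing dicts and lists
    result = {}
    for page in data["query"]["pages"]:  # a list of dicts
        result[page["title"]] = page["categories"]
    return result


for title, categories in titles_with_categories(raw).items():
    print(title, len(categories))
```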
=== Afternoon sessions ===
In our session, more than 60% of students were interested in learning Twitter and that track was heavily attended.
In the Twitter track, discoverability of the structure of [http://www.tweepy.org/ Tweepy] objects was a challenge. Users would create an object but it was not easy to introspect those objects and see what was there in the way we had discussed with JSON objects. This came as a surprise to us and required some real-time consultation with the [http://tweepy.readthedocs.org/en/v2.3.0/ Tweepy module documentation].
The Wikipedia session ended up spending very little time working with the example code we had prepared. Instead, we worked directly from examples in the morning and wrote code almost entirely from scratch while looking directly at the output from the API.
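One general-purpose way to look inside an unfamiliar object is Python's built-in <code>dir()</code> and <code>getattr()</code>. The sketch below uses a made-up stand-in class, <code>FakeStatus</code>, rather than a real Tweepy object, so it says nothing about what Tweepy objects actually contain; the helper name <code>public_attributes</code> is also ours.

```python
class FakeStatus(object):
    """Hypothetical stand-in for an object returned by a library."""

    def __init__(self):
        self.text = "hello"
        self.retweet_count = 3
        self._internal = None


def public_attributes(obj):
    """List data attributes, skipping private names and methods."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue
        if callable(getattr(obj, name)):
            continue
        names.append(name)
    return sorted(names)


status = FakeStatus()
for name in public_attributes(status):
    print(name, "=", getattr(status, name))
```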
The Wikipedia session focused on building a version of the game [http://kevan.org/catfishing.php Catfishing]. Essentially, we set out to write a program that would get a list of categories for a set of articles, randomly select one of those articles, and then show the categories associated with that article back to the user to have them "guess" the article. We modified the program to not include obvious giveaways (e.g., to remove categories that include the answer itself as a substring).
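The selection and filtering steps described above can be sketched roughly as follows. This is not the code written in the session: the function names and the sample data are made up, and the categories are supplied by hand rather than fetched from the Wikipedia API.

```python
import random


def clue_categories(title, categories):
    """Drop categories that contain the article title as a substring."""
    answer = title.lower()
    return [c for c in categories if answer not in c.lower()]


def pick_round(articles, rng=random):
    """Choose one article at random and return (title, filtered clues).

    articles maps each article title to a list of its categories.
    """
    title = rng.choice(sorted(articles))
    return title, clue_categories(title, articles[title])


# Made-up sample data for illustration only.
sample = {
    "Seattle": ["Cities in Washington (state)",
                "Seattle metropolitan area",
                "Port cities"],
    "Hamlet": ["Plays by William Shakespeare",
               "Hamlet adaptations",
               "Tragedies"],
}

title, clues = pick_round(sample)
print("Guess the article from these categories:", clues)
```

The filter is case-insensitive so that a category such as "Seattle metropolitan area" is removed even if the title were stored in a different case.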
Both sessions worked well and received positive feedback.
In future sessions, we might like to focus on other APIs including, perhaps, APIs that do not have Python modules. This would provide a stronger non-pedagogical reason to focus on reading and learning JSON. Working with simple APIs might have been a good example of something we could do as a small group exercise between parts of the lecture.
== Session 3: Data Analysis and Visualization ==
The goal of this session was to get users to the point where they could take data from a web API and ask and answer basic data science questions by using Python to manipulate data and by creating simple visualizations.
Our philosophy in Session 3 was to teach users to get data into tools they already know and use. We thought this would be a better use of their time and help make users independent earlier.
=== Lecture ===
* What proportion of edits to
* What proportion of edits to
* What are the most edited
* Who are the most active editors on
Because it did not require installation of software and because it ran on every platform, we did sorting and visualization in [http://docs.google.com Google Docs].
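A workflow along these lines can be sketched as: count things in Python, then write the counts to a CSV file that a spreadsheet can import for sorting and charting. The rows below are made-up sample data, the <code>minor</code> field is a hypothetical flag chosen for illustration, and the function names are ours; this is not the code used in the lecture.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for edits fetched from an API.
edits = [
    {"user": "Alice", "minor": True},
    {"user": "Bob", "minor": False},
    {"user": "Alice", "minor": False},
    {"user": "Alice", "minor": True},
]


def edits_per_user(rows):
    """Count how many edits each user made."""
    return Counter(row["user"] for row in rows)


def proportion_minor(rows):
    """Fraction of rows whose (hypothetical) 'minor' flag is set."""
    return sum(1 for row in rows if row["minor"]) / float(len(rows))


def write_counts_csv(counts, out):
    """Write user,edits rows, most active user first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user", "edits"])
    for user, n in counts.most_common():
        writer.writerow([user, n])


buffer = io.StringIO()  # use open("edits.csv", "w") to write a real file
write_counts_csv(edits_per_user(edits), buffer)
print(buffer.getvalue())
print(proportion_minor(edits))
```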
=== Projects ===
In the afternoon projects, one group continued with work on the ''Harry Potter'' dataset from English Wikipedia. In this case,
A second project focused on [http://matplotlib.org/ Matplotlib] and generated heatmaps of contributions to articles about men and women in Wikipedia, arranged by time in Wikipedia's lifetime and time in the subject's lifetime. The heatmaps were popular with participants and were something that could not be easily done with spreadsheets.
[[File:Matplotlib-hist2d.png|400px]]
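The figure's file name suggests Matplotlib's <code>hist2d</code>, which bins pairs of values and draws the result in one call. To show what that binning step amounts to without requiring Matplotlib, here is a pure-Python sketch that counts points per two-dimensional bin. The data points, bin sizes, and function names are all made up for illustration and are not the project's actual data or code.

```python
from collections import Counter


def bin_index(value, low, width):
    """Return which bin a value falls into, counting from 0 at low."""
    return int((value - low) // width)


def count_2d(points, x_low, x_width, y_low, y_width):
    """Count how many (x, y) points land in each 2D bin."""
    counts = Counter()
    for x, y in points:
        key = (bin_index(x, x_low, x_width), bin_index(y, y_low, y_width))
        counts[key] += 1
    return counts


# Made-up pairs: (year of an edit, age of the article's subject that year).
points = [(2004, 31), (2004, 38), (2009, 35), (2013, 72)]

# Five-year bins starting in 2001 on x; ten-year bins starting at 0 on y.
counts = count_2d(points, 2001, 5, 0, 10)
for key in sorted(counts):
    print(key, counts[key])
```

A heatmap is this table of counts drawn as a grid, with color standing in for the count in each cell.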
The challenge with
== General Feedback ==
One suggestion to try to address this is to add an additional
* The spacing between sessions was too large. In part, this was due to the fact that we were creating curriculum as we went. Next time, we will try to do the sessions every other week (e.g., 4 sessions in 5 weeks).
* The breaks for lunch were a bit too long. We took hour-long breaks but 45 minutes would have been enough. Learners were interested in getting back to work!
* The general structure of the entire curriculum was not as clear as it might have been which led to some confusion. This was, at least in part, because the details of what we would teach in the later sessions were not decided when we began. In the future, we should present the entire session plan clearly up front.
* We did not have enough mentors with experience using Python in Windows. We had many skilled GNU/Linux users and ''zero'' students running GNU/Linux. Most of the mentors used Mac OS X and most of the learners ran Windows.
* Although we did not use it as a recruitment or selection criterion, a majority of the participants in the session were women. Although we had a mix of men and women mentors, the fact that most of our mentors were male and most of our learners were female was something we would have liked to avoid. If we expect to have a similar ratio in the future, we should try to recruit female mentors and, in particular, to attract women to lead the afternoon sessions (all of the afternoon session lead mentors were male).
* The SWC-style sticky notes worked extremely well but were used less, and seemed to have less value, as we progressed.
In the future, we might also want to devote more time explicitly to teaching:
* Debugging code
* Finding and reading documentation
* Troubleshooting and looking at StackExchange for answers to programming questions
=== Budget ===
For lunch we spent between $400 (pizza), $360 (a few less
Most mentors could not make the
All of our food was generously supported by the [http://escience.washington.edu/ eScience Institute at UW]. The rooms were free because they were provided by the [http://www.com.washington.edu UW Department of Communication].
With a total budget on the order of $2000-2500, I think you could easily run a similar set of 3.5 day-long workshops. If we had a little more, we could do better than pizza for lunch.