Community Data Science Workshops (Fall 2014)/Reflections: Difference between revisions

m
moved to wiki.communitydata.cc
imported>Mako
imported>Jtmorgan
m (moved to wiki.communitydata.cc)
 
(5 intermediate revisions by 2 users not shown)
Line 1:
{{CDSW Moved}}
:''If you're interested in putting on your own CDSW, you should also see our [[Community Data Science Workshops (Spring 2014)/Reflections|reflections from Spring 2014]].''
 
Line 41 ⟶ 42:
We had 30 mentors who attended at least one of the sessions and at least 20 mentors at each session. Many of our mentors were UW students in more technical departments like [https://www.cs.washington.edu/ Computer Science and Engineering] and [https://www.hcde.washington.edu Human Centered Design & Engineering]. Perhaps half of them worked outside of the university as software developers.
 
We had about 150 participants apply to attend the sessions. We selected on programming skill (to ensure that all attendees were complete beginners), enthusiasm, and randomly to maintain a learner-to-mentor ratio of between 4 and 5. We admitted 80 participants. 58 listed a UW affiliation. Affiliations listed by at least three people include the following:
 
{| class=wikitable
Line 62 ⟶ 63:
We had two people each who listed their affiliations as Bio- and Health Informatics, the Foster School of Management, Microsoft, and Wikipedia.
 
We also had people from Psychology, the City of Seattle, the Low Income Housing Project, Seattle Meshnet, Biochemical Engineering, Bio Physical, Chemical Engineering, Game Studies, Linguistics, College of the Environment, Oceanography, the School of Public Health, UW Bothell, Central Washington University, and many people who did not specify an affiliation. We continue to think that it's important that people who are not doing research but who are part of online communities were in the mix with UW-type researchers. Bringing together researchers and participants in online communities is an important goal and we would like to work toward more balance in this regard and to increase the amount of non-UW participation.
 
Retention between sessions 0 and 1 was nearly 100%. Retention between sessions 1 and 2 and between sessions 2 and 3 was roughly 75%, leaving us with perhaps 55-60% retention between session 0 and session 3.
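
The overall figure follows from multiplying the retention rates for each transition. A quick check, using the rough rates quoted above (the variable names are illustrative):

```python
# Approximate retention between consecutive sessions, as quoted above.
# Order: session 0 -> 1, session 1 -> 2, session 2 -> 3.
step_retention = [1.00, 0.75, 0.75]

# Cumulative retention is the product of the per-step rates.
cumulative = 1.0
for rate in step_retention:
    cumulative *= rate

summary = f"Retention from session 0 to session 3: {cumulative:.0%}"
print(summary)
```

With these rounded inputs the product lands inside the 55-60% range reported above.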
Line 80 ⟶ 81:
[[User:Mako|Benjamin Mako Hill]] gave lectures in Sessions 1 and 3. Frances Hocutt gave the lecture in Session 2 and we felt that this was an important step. An important future goal is getting other people to give lectures. Tommy is an obvious choice to do one next time. Different faces, perspectives, and backgrounds are useful to communicate the breadth of interest here. [[User:Mako|Mako]] does not want to be the only one giving these lectures.
 
Our biggest challenge with growing the workshops was with physical space for the lectures. Basically, rooms that can hold more than 100 people at UW are almost exclusively lecture halls that make it almost impossible for mentors to physically reach students in order to help them debug and solve problems.
 
We reserved a lecture hall that fit 200 people and filled it with 100 students in alternating rows to make it at least possible to reach each person. This worked reasonably well although it was still suboptimal.
 
People continue to want a record of lectures. At the very minimum, we should make sure that we turn on console logging so that we can post this after the lectures. We intended to record lectures but, once again, this got lost in all the crazy preparation for the events.
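
One possible way to capture everything typed during a live-coding lecture is a small wrapper around the standard library's interactive console. This is only a sketch of the idea, not what was used in the workshops: the class name <code>LoggingConsole</code> and the log file name are hypothetical.

```python
import code


class LoggingConsole(code.InteractiveConsole):
    """Interactive console that appends every line typed to a log file."""

    def __init__(self, log_path, local_vars=None):
        super().__init__(locals=local_vars)
        self.log_path = log_path

    def push(self, line, *args, **kwargs):
        # Record the input first, then let the normal interpreter handle it.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(line + "\n")
        return super().push(line, *args, **kwargs)
```

Calling <code>LoggingConsole("lecture.log").interact()</code> would start a session whose input ends up in <code>lecture.log</code>, ready to post afterward. If the lecture is given in IPython, the built-in <code>%logstart</code> magic serves a similar purpose.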
Line 99 ⟶ 100:
== Session 0: Python Setup ==
 
The goal of this session was to get users set up with Python and starting to learn some Python basics. We changed the curriculum originally used by BPW enormously to use Continuum's Anaconda instead of Python directly from [http://python.org python.org]. The result was staggering. Not a ''single person'' reported "many problems with set-up" (i.e., respondents reported either "no problems" or a "few problems").
 
That said, we had several major concerns:
Line 105 ⟶ 106:
* Anaconda is not free software/open source.
* Anaconda does not support Python 3 which we'd like to move to.
* Anaconda seems to have at least some remaining i18n bugs. For example, one student had a home directory set to a Chinese string which caused the Anaconda installation to fail at a late stage. This was eventually fixed by a mentor who changed the path by hand.
 
Additionally, we moved the Windows curriculum away from <code>cmd</code> to Powershell. This was a huge and unqualified improvement because it meant that <code>ls</code> works and the rest of the curriculum could converge. The only concern was that Powershell is not installed on Windows XP, although ''not a single student had Windows XP''.
Line 115 ⟶ 116:
* We will make it clearer to mentors whether participants should self-report that they’d completed the steps or whether the mentor should verify that the steps were all taken (the latter). In the future, we will email mentors ahead of time to let them know.
* In a related issue, not everybody loves the checkout step. Maybe there's a way we can make it more fun?
* We need to do a better job of modeling sticky notes so folks use them more effectively.
* The sticky notes we bought were small and of an ambiguous color. We should get large red sticky notes next time.
* We should set up/arrange/select space to facilitate better circulation of mentors. Generally, we found that when mentors can circulate easily things are better for participants.
* We are going to try writing additional installation instructions that do not rely on Anaconda so people have a fully open source option.
* Once again, not a single person outside of the mentor group ran GNU/Linux. We should strongly consider how much effort we want to put into maintaining this part of the curriculum which, to date, has never been used.
* We want to seriously investigate the possibility of moving to Python 3 to try to address lingering Unicode issues.
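
As a small illustration of why Python 3 may help with the Unicode issues, the sketch below uses a hypothetical home directory containing Chinese characters (the path is invented for illustration and is not the student's actual path). In Python 3, <code>str</code> holds Unicode text and converting to bytes is an explicit step:

```python
from pathlib import PurePosixPath

# Hypothetical home directory with non-ASCII characters (illustrative only).
home = PurePosixPath("/home/学生")
script = home / "cdsw" / "hello.py"

# str is Unicode text in Python 3: the directory name is 2 characters long...
name_chars = len(home.name)

# ...while its UTF-8 encoding takes 6 bytes (3 bytes per character here).
name_bytes = len(home.name.encode("utf-8"))

# Encoding and decoding are explicit and round-trip without loss.
round_trip = str(script).encode("utf-8").decode("utf-8")
```

This does not address bugs in third-party installers, but it shows the text/bytes distinction that tends to trip up beginners under Python 2.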
 
We also had [[Community Data Science Workshops (Fall 2014)/Reflections#Mentorship|a bunch of general feedback on how we could improve mentorship]] that is particularly relevant to this session.
Line 136 ⟶ 137:
Suggestions based on feedback include:
 
* Do a better job of bringing folks back together to walk through potential solutions to the questions posed in the project rooms.
* Consider simply having two smaller rooms doing [[Baby Names]] and perhaps having one that emphasizes more numeric and math operations.
* Prepare questions beforehand, list them all up front, and let folks choose what to work on.
Line 146 ⟶ 147:
=== Morning lecture ===
 
The [[Community Data Science Workshops (Fall 2014)/Day 2 lecture|morning lecture]] was given by Frances Hocutt and it was well received. Unsurprisingly, the example of [http://placekitten.com/ PlaceKitten] as an API was an enormous hit: informative ''and'' cute.
 
Frances used excellent slides which are shared [[Community Data Science Workshops (Fall 2014)/Day 2 lecture|on the wiki page]] and which we will reuse. About half found the lecture either too fast or too slow and about half found the lecture to be just right.
 
Since many people felt the lecture was on the slower side, we want to use this time to introduce function definitions. We will also devote a bit less time to review which, because of the one week spacing between sessions, feels less important than it did last time.
Line 154 ⟶ 155:
=== Afternoon sessions ===
 
There were three parallel afternoon sessions on '''Twitter''', '''Wikipedia API''' and '''SQL'''. All three were successful and we plan to do some version of all three sessions next round:
 
'''Twitter''':
Line 161 ⟶ 162:
* We should be careful to make sure that the advance notice asks everybody to download the project zip file ahead of time. If we're going to do this in class instead, we should set up a short URL to help streamline the process without forcing everybody to head to the wiki for things.
* A bunch of people found the Twitter session too fast so we should try to slow this down.
* TweePy continues to be both poorly documented and opaque. The opaqueness of TweePy was a problem and we may want to create an interface to TweePy that just gives users raw JSON.
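
One way to sidestep the opaqueness described in the Twitter notes is a thin wrapper that hands participants plain dictionaries. The following is a minimal sketch, not the workshop's actual code: the function name <code>statuses_to_raw_json</code> is hypothetical, and it assumes each status object keeps its parsed API payload in a <code>_json</code> attribute (which Tweepy's model objects do, to our knowledge). A stand-in object is used so the example runs without Twitter credentials.

```python
import json
from types import SimpleNamespace


def statuses_to_raw_json(statuses):
    """Return the raw JSON dictionaries behind a list of status objects.

    Assumes each object stores the parsed API payload in a `_json`
    attribute; objects without that attribute are skipped.
    """
    raw = []
    for status in statuses:
        payload = getattr(status, "_json", None)
        if payload is not None:
            raw.append(payload)
    return raw


# Stand-in for a status object returned by the library (illustrative only).
fake_status = SimpleNamespace(_json={"id": 1, "text": "hello world"})

tweets = statuses_to_raw_json([fake_status, object()])
pretty = json.dumps(tweets[0], indent=2, sort_keys=True)
```

Participants could then explore <code>tweets</code> with ordinary dictionary and list operations they already know from earlier sessions.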
 
'''Wikipedia''' workshop:
Line 170 ⟶ 171:
'''SQL workshop''':
 
Jonathan ran a session on using SQL. Although this was a diversion from the strong Python focus, it was well attended and appreciated by students trying to build up this skill.
 
* Generally the session was very successful and seemed to do a good job of giving people an overview of data science and a way to hook themselves into it.
* Next session, if we do this again, we should consider integrating Python more closely into this. We may either close the loop in this session or perhaps split into two sessions: (1) introduction to SQL; and (2) using Python to bring data back into Python (e.g., in Pandas).
* We should consider hosting an open SQL database somewhere.
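
To make the "close the loop" idea concrete, here is a minimal sketch of running a SQL query from Python and bringing the result back as ordinary Python data. It uses an in-memory SQLite database from the standard library; the table name, columns, and rows are invented for illustration.

```python
import sqlite3

# In-memory database with a small, made-up table of edits.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us access columns by name
conn.execute("CREATE TABLE edits (page TEXT, editor TEXT)")
conn.executemany(
    "INSERT INTO edits VALUES (?, ?)",
    [("Seattle", "alice"), ("Seattle", "bob"), ("Python", "alice")],
)

# Aggregate in SQL, then pull the result back into a Python dictionary.
rows = conn.execute(
    "SELECT page, COUNT(*) AS n_edits FROM edits GROUP BY page ORDER BY page"
).fetchall()
edit_counts = {row["page"]: row["n_edits"] for row in rows}
conn.close()
```

For the Pandas variant mentioned above, <code>pandas.read_sql_query</code> accepts a query string and a connection like this one and returns a DataFrame instead.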
Line 188 ⟶ 189:
We ran two sessions this time.
 
The first was an '''analysis with spreadsheets session''' similar to what we taught last time. This was improved and more effective. By the end, many participants were modifying the code to build their own datasets and doing their own visualizations. One student built a time series of edits to articles about death by police and another to articles about the NFL. In both cases, real patterns driven by current events became clearly visible.
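
The kind of dataset participants built can be sketched in a few lines. The example below is illustrative rather than the session's actual code: it assumes revision timestamps in the ISO 8601 format returned by the MediaWiki API, and the timestamps themselves are made up.

```python
from collections import Counter

# Made-up revision timestamps in the format the MediaWiki API returns.
timestamps = [
    "2014-08-09T18:02:11Z",
    "2014-08-10T01:15:40Z",
    "2014-08-10T14:47:03Z",
    "2014-08-12T09:30:00Z",
]

# The first 10 characters of each timestamp are the date (YYYY-MM-DD).
edits_per_day = Counter(ts[:10] for ts in timestamps)

# Tab-separated text, sorted by date, can be pasted into a spreadsheet.
rows = sorted(edits_per_day.items())
tsv = "\n".join(f"{day}\t{count}" for day, count in rows)
print(tsv)
```

Charting the resulting two columns in a spreadsheet is enough to make spikes around current events visible.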
 
We also ran a session on '''matplotlib''' which was taught by two mentors we brought in specifically to teach it but who had limited experience with the CDSW. Some people in the session were lost. Because the mentors who taught it were not at the other sessions, they didn’t go in with a good sense of where the participants were at. In the future, we should do a better job of bringing teachers up to speed on where the participants are at. For example, we might encourage new mentors to do a practice session with some friendly folks before they let loose.
 
Also, next session, we are going to consider using [https://pypi.python.org/pypi/seaborn/0.1 SeaBorn] instead of matplotlib, which Tommy seemed excited about.
 
== General Feedback ==
 
* Generally, there was a sense that we should stop creating pages in the wiki by copying and pasting old stuff. This was the BPW model but it's leading to madness. When we archive an old version of a site, we can use MediaWiki to create links to the old version of the pages (we can install templates from English Wikipedia to help make this easier).
* We should try to schedule the workshop not quite so close to the end of the quarter. The beginning or middle of the quarter should be better for UW students.
* Mentors should post the code generated in the break-out sessions. Encourage them to capture the code created in examples and to post these afterward systematically.
* There was general interest in pair programming or more team-based exercises. We should consider changes along this line.
* There was a need for several on-the-fly corrections of the instructions and files on the wiki during the workshop. Better planning and testing for this will be very useful.
 
Line 215 ⟶ 216:
=== More Projects or Better Projects ===
 
Once again, we had certain afternoon project sessions that were much more effective than others. One thing we were conflicted about was whether we wanted more break-out sessions or whether we should just use the best of the break-out sessions (perhaps in two rooms).
 
Arguments for smaller groups of the best break-out session include:
 
* Focus on a known good thing.
* Pre-canned sessions make it easier for new mentors to feel confident and be successful.
 
Arguments against include:
Line 226 ⟶ 227:
* Diversity of projects inspires people to do the kinds of things that people can do with this new knowledge. 

 
We should pursue other ways to encourage creativity with code. For example, giving participants creative/flexible moments within sessions and lectures might be empowering in similar ways. We can also continue to call out participants who are doing creative things.
 
== Budget ==
 
We spent a total of $3280 on the CDSW. About $350 of this funded food and refreshments during post-session meetings among the mentors, and about $280 was spent on coffee.
 
The rest (the large majority) was spent on food. Because we were better able to model retention this time around, we did a much better job of ordering the "right" amount of food. We ordered:
 
* Session 1: Pizza from Jet City Pizza
* Session 2: Indian (four entrees) from Jewel of India
* Session 3: Greek food (e.g., salad, hummus, spinach pies, souvlaki) from Costas
 
Because [[Mako]] did the ordering, everybody ate vegetarian. At least one person complained about the lack of meat in Session 2 (but seemed to be confused into thinking it was present in Session 1).
 
<!--
Line 237 ⟶ 249:
 
-->
 