Community Data Science Workshops (Fall 2014)/Reflections: Difference between revisions

m
moved to wiki.communitydata.cc
imported>Mako
imported>Jtmorgan
m (moved to wiki.communitydata.cc)
 
{{CDSW Moved}}
:''If you're interested in putting on your own CDSW, you should also see our [[Community Data Science Workshops (Spring 2014)/Reflections|reflections from Spring 2014]].''
 
* '''Session 3 (Saturday November 22nd)''': [[Community Data Science Workshops (Fall 2014)#Session 3|Data analysis and visualization]]
 
Our organization and the curriculum for Sessions 0 and 1 were originally borrowed from the [http://bostonpythonworkshop.com/ Boston Python Workshop] (BPW), although our curriculum has diverged quite a bit as we've improved it and tailored it to the specific learning goals in our sessions.
 
Session 0 was a three-hour evening session to install software. The other three sessions were day-long sessions (10am to 4pm) broken up into the following schedule:
== Participants ==
 
We had 30 mentors who attended at least one of the sessions and at least 20 mentors at each session. Many of our mentors were UW students in more technical departments like [https://www.cs.washington.edu/ Computer Science and Engineering] and [https://www.hcde.washington.edu Human Centered Design & Engineering]. Perhaps half of them worked outside of the university as software developers.
 
We had about 150 participants apply to attend the sessions. We selected on programming skill (to ensure that all attendees were complete beginners) and enthusiasm, and then randomly, to maintain a learner-to-mentor ratio of between 4 and 5. We admitted 80 participants, 58 of whom listed a UW affiliation. Affiliations listed by at least three people include the following:
 
{| class=wikitable
! Department !! Participants
|-
| HCDE || 16
|-
| iSchool || 10
|-
| Communication || 8
|-
| Anthropology || 3
|-
| Alumni || 4
|-
| Undergrad || 3
|-
|}
 
We had two people each who listed their affiliations as Bio- and Health Informatics, the Foster School of Management, Microsoft, and Wikipedia.

We also had people from Psychology, the City of Seattle, the Low Income Housing Project, Seattle Meshnet, Biochemical Engineering, Bio Physical, Chemical Engineering, Game Studies, Linguistics, College of the Environment, Oceanography, the School of Public Health, UW Bothell, Central Washington University, and many people who did not specify an affiliation. We continue to think that it's important that people who are not doing research but who are part of online communities were in the mix with UW-type researchers. Bringing together researchers and participants in online communities is an important goal, and we would like to work toward more balance in this regard and to increase the amount of non-UW participation.
 
Retention between Sessions 0 and 1 was nearly 100%. Retention between Sessions 1 and 2 and between Sessions 2 and 3 was roughly 75% each, leaving us with perhaps 55-60% retention between Session 0 and Session 3.
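
The overall figure follows from compounding the session-to-session rates. A minimal sketch in Python, assuming the rounded rates quoted above (the variable names are illustrative and not taken from the workshop materials):

```python
# Approximate session-to-session retention rates quoted above (rounded).
step_retention = {
    "Session 0 to 1": 1.00,  # "nearly 100%"
    "Session 1 to 2": 0.75,  # "roughly 75%"
    "Session 2 to 3": 0.75,  # "roughly 75%"
}

# Overall retention is the product of the step-wise rates.
overall = 1.0
for rate in step_retention.values():
    overall *= rate

print(f"Overall retention from Session 0 to 3: {overall:.0%}")
```

With these rounded inputs the product is 0.5625, which is consistent with the 55-60% estimate.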
 
Anecdotally, there is a sense that those who dropped out were those who had trouble but who didn't struggle visibly.

Although our participant pool in [[CDSW (Spring 2014)]] was overwhelmingly female (80-90%), there was close to gender balance in both students and mentors this time around.
 
Once again, quite a large number of people who applied were already skilled programmers. We're still not exactly sure why these people are applying because we think that the fact that the workshops are for absolute beginners is very clear. Perhaps people just want more exposure to data science?

Once again, the constraint on scaling the workshop was the number of mentors. Every mentor we add means that the workshop can accommodate four more participants.

One suggestion was to allow participants who have some programming skills, especially for the second and third workshops (given predictable rates of retention). There was no consensus among the organizers and mentors on this approach; some preferred getting more newbies and investing more in them.
 
== Morning Lectures ==
 
[[User:Mako|Benjamin Mako Hill]] gave lectures in Sessions 1 and 3. Frances Hocutt gave the lecture in Session 2 and we felt that this was an important step. An important future goal is getting other people to give lectures. Tommy is an obvious choice to do one next time. Different faces, perspectives, and backgrounds are useful to communicate the breadth of interest here. [[User:Mako|Mako]] does not want to be the only one giving these lectures.

Our biggest challenge with growing the workshops was with physical space for the lectures. Basically, rooms that can hold more than 100 people at UW are almost exclusively lecture halls that make it almost impossible for mentors to physically reach students in order to help them debug and solve problems.

We reserved a lecture hall that fit 200 people and filled it with 100 students in alternating rows to make it at least possible to reach each person. This worked reasonably well although it was still suboptimal.
 
People continue to want a record of lectures. At the very minimum, we should make sure that we turn on console logging so that we can post this after the lectures. We intended to record lectures but, once again, this got lost in all the crazy preparation for the events.
 
== Afternoon Sessions ==
 
Projects are done in breakout sessions in a series of three rooms. The general problem was that we insisted on one teacher per topic and topics were very unequal in their popularity. Next time, we will likely prepare to have multiple teachers in multiple rooms for topics we know will be more popular.

'''How can we support self-directed projects?'''

Can we give participants more guidance to support their project interests? It's easier to do that if people are pre-clustered by interest. We could also bring up people's ideas at the end.

The size of the breakout workshops varied, which meant that different degrees of engagement were feasible.

Other notes:

* Post examples of the code used in the lectures.
* Showcase what students have accomplished and point out places where people can change things and do things differently (e.g., the Ferguson example).
* Cover idiomatic Python.
* Talk to Chris to try to fix those things.
 
Several changes we hope to make include:
 
* As we refine this process, we are also interested in selecting or refining breakout sessions so that they are more closely tailored to individuals and their interests. Next time, we will consider mining the registration forms for a list of research questions we might use.
* We want to emphasize bringing people back together more often. In particular, we found that bringing people back together to share work several times during each session, and then once at the end to show off achievements or interesting results, was effective. We also need to designate a person to go between rooms in each session to remind people to reconvene and to create a program of important or inspiring achievements to present to the group at the very end.
* There seemed to be broad interest in examples or projects that are focused on public health and/or epidemiological data.
* We would love to create an afternoon project for Session 3 on basic statistical analysis in Python using scipy, statsmodels, and pandas. At least ten participants would have been enthusiastic to take it.
 
== Session 0: Python Setup ==
 
The goal of this session was to get users set up with Python and starting to learn some Python basics. We changed the curriculum originally used by BPW enormously to use Continuum's Anaconda instead of Python directly from [http://python.org python.org]. The result was staggering. Not a ''single person'' reported "many problems with set-up" (i.e., respondents reported either "no problems" or a "few problems").

Anaconda was key to how smoothly setup went compared to the first workshop series, and it addressed most of our setup and path issues. That said, we had several major concerns:
 
* Anaconda is not free software/open source.
* Anaconda does not support Python 3, which we'd like to move to.
* Anaconda seems to have at least some remaining i18n bugs. For example, one student had a home directory set to a Chinese string, which caused the Anaconda installation to fail at a late stage. This was eventually fixed by a mentor who changed the path by hand.

Additionally, we moved the Windows curriculum away from <code>cmd</code> to Powershell. This was a huge and unqualified improvement because it meant that <code>ls</code> works and the rest of the curriculum could converge. The only concern was that Powershell is not installed on Windows XP, although ''not a single student had Windows XP''.
 
Changes for next time include:
 
* Because it was less necessary, we will deemphasize recruiting mentors to the Friday night session. Many folks were standing around.
* Because Powershell was successful, we're going to try to create a single consolidated set of installation instructions for Windows, Mac OSX, and GNU/Linux!
* We will make it clearer to mentors whether participants should self-report that they've completed the steps or whether the mentor should verify that the steps were all taken (the latter). In the future, we will email mentors ahead of time to let them know.
* In a related issue, not everybody loves the checkout step. Maybe there's a way we can make it more fun?
* We need to do a better job of modeling sticky notes during lectures early on so folks use them more effectively.
* The sticky notes we bought were small and of an ambiguous color. We should get large, bright red sticky notes next time.
* We should set up/arrange/select the space to facilitate better circulation of mentors. Generally, we found that when mentors can circulate easily, things are better for participants.
* We are going to try writing additional installation instructions that do not rely on Anaconda so people have a fully open source option.
* Once again, not a single person outside of the mentor group ran GNU/Linux. We should strongly consider how much effort we want to put into maintaining this part of the curriculum which, to date, has never been used.
* We want to seriously investigate the possibility of moving to Python 3 to try to address lingering Unicode issues. We should try to do this for the next session.
 
We also had [[Community Data Science Workshops (Fall 2014)/Reflections#Mentorship|a bunch of general feedback on how we could improve mentorship]] that is particularly relevant to this session.
 
== Session 1: Introduction to Python ==
 
The goal of this session was to teach the basics of programming in Python. The basic curriculum was originally built off the [[Boston Python Workshop]] curriculum, which has been used many times and is well tested. Unsurprisingly, it worked well for us as well.
 
That said, we made several major changes this time around. The biggest is that we retained only the [[Wordplay]] project. We also created a new project, [[Baby Names]], that uses Social Security Administration data on the frequency of baby names.
 
=== Afternoon sessions ===
 
We felt that the new [[Baby Names]] project was excellent and feedback was overwhelmingly positive. Because it includes both dictionaries and lists (in the form of <code>.keys()</code> methods), it can do everything that [[Wordplay]] can but it has a much stronger feel of data science to it and, generally, a higher ceiling. Wordplay felt relatively boring.
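
To illustrate why a name-frequency dataset exercises both dictionaries and lists, here is a minimal sketch. The names and counts are made-up placeholders, not Social Security Administration figures, and the variable names are hypothetical rather than taken from the [[Baby Names]] project code:

```python
# Hypothetical name -> count data; placeholder values, not real SSA figures.
name_counts = {"alex": 120, "sam": 95, "jordan": 60, "zelda": 3}

# .keys() gives the names, which we can turn into a sorted list.
names = sorted(name_counts.keys())

# Wordplay-style question: which names start with a given letter?
s_names = [name for name in names if name.startswith("s")]

# Data-science-style question: which name is most common?
most_common = max(name_counts, key=name_counts.get)

print(names)
print(s_names)
print(most_common)
```

The same dictionary supports both string puzzles on the keys and numeric questions on the values, which may be part of what gives the project its higher ceiling.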
 
Suggestions based on feedback include:
 
* Do a better job of bringing folks back together to walk through potential solutions to the questions posed in the project rooms.
* Consider simply having two smaller rooms doing [[Baby Names]] and perhaps having one that emphasizes more numeric and math operations.
* Prepare questions beforehand, list them all up front, and let folks choose what to work on.
 
=== Morning lecture ===
 
The [[Community Data Science Workshops (Fall 2014)/Day 2 lecture|morning lecture]] was given by Frances Hocutt and it was well received, if delivered too slowly for a significant minority of attendees. Unsurprisingly, the example of [http://placekitten.com/ PlaceKitten] as an API was an enormous hit: informative ''and'' cute.

Frances used excellent slides which are shared [[Community Data Science Workshops (Fall 2014)/Day 2 lecture|on the wiki page]] and which we will reuse. About half found the lecture either too fast or too slow and about half found the lecture to be just right.

Since many people felt the lecture was on the slower side, we want to use this time to introduce function definitions. We will also devote a bit less time to review which, because of the one-week spacing between sessions, feels less important than it did last time.
 
=== Afternoon sessions ===
 
There were three parallel afternoon sessions on '''Twitter''', '''Wikipedia API''' and '''SQL'''. All three were successful and we plan to do some version of all three sessions next round:
 
'''Twitter''':
 
* Once again, the session had too many people for the room. We should consider splitting it if we have mentors who are comfortable teaching it, and we should try to arrange this ahead of time.
* We should be careful to make sure that the advance notice asks everybody to download the project zip file ahead of time. If we're going to do this in class instead, we should set up a short URL of some sort to help streamline the process without forcing everybody to head to the wiki for things.
* A bunch of people found the Twitter session too fast, so we should try to slow this down.
* TweePy continues to be both poorly documented and opaque. The opaqueness of TweePy was a problem and we may want to create an interface to TweePy that just gives users raw JSON. Miku or Michael may have details on how to do that, and Dharma might be able to do this.
 
'''Wikipedia''' workshop:
 
* In terms of delivery, there was mixed feedback including some excellent feedback and some who felt that it was too detailed and slow. This mirrored some of our feedback from last time. One approach would be to make the Wikipedia room be a designated "slower" room.
* The teacher explained things very clearly. That was frustrating for those who didn’t need it, but super great for people that wanted/needed a lot of explanation.
* We should consider graduated challenges that go from less challenging to more and more challenging, which might help with the fact that there is a range of learning levels.
 
'''SQL workshop''':
 
Jonathan ran a session on using SQL. Although this was a diversion from the strong Python focus, it was well attended and appreciated by students trying to build up this skill.
* Generally, the session was very successful and seemed to do a good job of giving people an overview of data science and a way to hook themselves into it.
* Next session, if we do this again, we should consider integrating Python more closely. We may either close the loop in this session or perhaps split it into two sessions: (1) an introduction to SQL; and (2) using Python to pull data out of SQL and bring it into Python (e.g., in Pandas).
* We should consider hosting an open SQL database somewhere.
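
As one possible shape for "closing the loop" between SQL and Python, the sketch below uses Python's built-in <code>sqlite3</code> module with an in-memory database. The table name, columns, and rows are invented for illustration. A real session would presumably query a hosted database and might load the results into Pandas instead:

```python
import sqlite3

# In-memory database so the example needs no server or file (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (page TEXT, editor TEXT)")
conn.executemany(
    "INSERT INTO edits VALUES (?, ?)",
    [("Seattle", "alice"), ("Seattle", "bob"), ("Python", "alice")],
)

# Do the aggregation in SQL, then bring the rows back into Python.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM edits GROUP BY page ORDER BY page"
).fetchall()
conn.close()

# Rows come back as tuples; a dictionary is often handier.
edits_per_page = dict(rows)
print(edits_per_page)
```

Keeping the aggregation in SQL and the follow-up work in Python mirrors the proposed two-part split of the session.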
 
== Session 3: Data Analysis and Visualization ==
 
The goal of the lecture was to walk people through the actual mess of writing code from scratch. It focused on a single example of code that builds a dataset from Wikipedia.
 
In general, goals were clearer this time and the use of Anaconda meant that we could use <code>requests</code>, which cleaned up several problems from last time and led to clearer code.
 
 
 
One challenge, pointed out in a question at the end of the final lecture, is that we don't do very much actual data analysis during the lecture. Next time, we should make this much clearer up front. The reality is that we were doing analysis from the very first day and that where analysis starts and where data cleaning and munging ends can be fluid, fuzzy, and subjective. We should foreground this in the beginning of the lecture or even at the beginning of the workshops.
 
=== Afternoon sessions ===
 
We ran two sessions this time.
 
The first was an '''analysis with spreadsheets''' session similar to what we taught last time. This was improved and more effective. By the end, many participants were modifying the code to build their own datasets and doing their own visualizations. One student built a time series of edits to articles about death by police and another to articles about the NFL. In both cases, real patterns driven by current events became clearly visible.
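
A time series like the ones described above can be built by counting edits per day. This is a minimal sketch using only the standard library. The timestamps are made-up placeholders in the ISO 8601 style that the MediaWiki API returns, not real edit data, and this is not the code used in the session:

```python
from collections import Counter

# Placeholder revision timestamps (hypothetical, not real edit data).
timestamps = [
    "2014-11-20T08:15:00Z",
    "2014-11-20T17:42:10Z",
    "2014-11-21T09:03:55Z",
    "2014-11-23T12:00:00Z",
]

# The first ten characters of each timestamp are the date (YYYY-MM-DD).
edits_per_day = Counter(ts[:10] for ts in timestamps)

# Print tab-separated rows that can be pasted into a spreadsheet.
for day in sorted(edits_per_day):
    print(day, edits_per_day[day], sep="\t")
```

Days with no edits are simply absent from the counter, so a spreadsheet chart may need those gaps filled with zeros.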
 
We also ran a session on '''matplotlib''' which was taught by two mentors we brought in specifically to teach it but who had limited experience with the CDSW. Some people in the session were deeply lost. Because the mentors who taught it were not at any of the other sessions, they didn't go in with a good sense of where the participants were at. In the future, we should do a better job of looping teachers in on where the participants are at. For example, we might encourage new mentors to do a practice session with some friendly folks before they let loose.

Also, next session, we are going to consider using [https://pypi.python.org/pypi/seaborn/0.1 SeaBorn] instead of matplotlib, which Tommy seemed excited about and would teach.
 
== General Feedback ==
 
* Generally, there was a sense that we should stop creating pages in the wiki by copying and pasting old stuff. This was the BPW model but it's leading to madness. When we archive an old version of a site, we can use MediaWiki to create links to the old versions of the pages (we can install templates from English Wikipedia to help make this easier).
* We should try to schedule the workshop not quite so close to the end of the quarter. The beginning or middle of the quarter should be better for UW students.
* Mentors should post the code generated in the break-out sessions. We should encourage them to capture the code created in examples and to post it afterward systematically.
* There was general interest in pair programming or more team-based exercises. We should consider changes along this line.
 
* There was a need for several on-the-fly corrections of the instructions and files on the wiki during the workshop. Better planning and testing for this will be very useful.
 
=== Mentorship ===
 
Last time through, most of our observations were focused on improving the experience of attendees and we think we didn't spend as much time on helping mentors have a great experience and helping them prepare effectively.
 
We had many new mentors this round. One general concern was the relative lack of mentor training, especially before the first sessions. We felt that we should:
 
 
* Arrange a pre-CDSW mentors meeting (perhaps a day or two before, to go over material), maybe at a bar or other social environment with beer and pizza. We could use this to set norms, best practices, goals, planning, etc.
* Perhaps meet 15-20 minutes early before Session 0 to get to know each other and go over things.
* Create some easier way to distinguish mentors from students (e.g., t-shirts, buttons, paper them head to foot in sticky notes).
* Send out detailed instructions and emails to mentors, or create pages in this wiki, that detail good mentoring. For example:
** How much should you help? Some. But be careful not to just give away the answer or to focus too much on elegance or technical correctness. Be careful not to overwhelm the learners.
** Explicitly encourage mentors to reach out to students and ask them how things are going by walking around to every single person to ask, “How are you doing? What are you working on? Show me what you’re doing.”
 
=== More Projects or Better Projects ===
 
Once again, we had certain afternoon project sessions that were much more effective than others. One thing we were conflicted about was whether we wanted more break-out sessions or whether we should just use the best of the break-out sessions (perhaps in two rooms).
 
Arguments for smaller groups of the best break-out session include:
 
* Focus on a known good thing.
* Pre-canned sessions make it easier for new mentors to feel confident and be successful.
 
Arguments against include:
* Diversity of projects inspires people to do the kinds of things that people can do with this new knowledge. 

 
We should pursue other ways to encourage creativity with code. For example, giving participants creative/flexible moments within sessions and lectures might be empowering in similar ways. We can also continue to call out participants who are doing creative things.
 
== Budget ==
 
We spent a total of $3280 on the CDSW. About $350 of this funded food and refreshments during post-session meetings among the mentors. About $280 was spent on coffee.

The rest (the large majority) was spent on food. Because we were better able to model retention this time around, we did a much better job of ordering the "right" amount of food. We ordered:
 
* Session 1: Pizza from Jet City Pizza
* Session 2: Indian (four entrees) from Jewel of India
* Session 3: Greek food (e.g., salad, hummus, spinach pies, souvlaki) from Costas
 
Because [[Mako]] did the ordering, everybody ate vegetarian. At least one person complained about the lack of meat in Session 2 (but seemed to be confused into thinking it was present in Session 1).
 
<!--
'''The ethnographers get the last word:'''
Some observations about the culture of mentoring from a first time mentor: There are some distinct values that came through strongly. There is a clear vision of empowerment through programming. The degree of inclusivity is impressive. The culture of feedback, iteration, and reflection was really surprising, such as the amount of effort that goes into improving the materials and the teaching. So is the way that other organizations are able to use (and are using) the materials, and the way that this is building the community. For example, participants are organizing their own meet-ups (though that could be encouraged even more).

The pragmatism of what is taught demonstrates a clear value. It would be helpful to make sure that all mentors are clear that part of what is expected of them is that they give pragmatic coaching. That is, they should lead participants to something that works rather than telling them what an expert would do.
 
-->
 
<!-- LocalWords: CDSW th nd BPW Unretained wikitable HCDE iSchool
-->
<!-- LocalWords: Informatics Meshnet Anecdotally suboptimal scipy
-->
<!-- LocalWords: statsmodels cmd Powershell XP deemphasize OSX JSON
-->
<!-- LocalWords: Mentorship mentorship PlaceKitten TweePy SeaBorn
-->
<!-- LocalWords: matplotlib
-->