Community Data Science Workshops (Fall 2014)/Reflections: Difference between revisions

integrating feedback
imported>Mako
imported>Mako
(integrating feedback)
Line 45:
 
== Session 0: Python Setup ==
 
The goal of this session was to get users set up with Python and starting to learn some of the basics. We changed the curriculum enormously to use Continuum's Anaconda instead of Python directly from [http://python.org python.org]. The result was staggering. Not a ''single person'' reported "many problems with set-up" (i.e., respondents reported either "no problems" or a "few problems").
 
Anaconda was key to the smoother setup compared to the first workshop series and addressed most of our setup and path issues. That said, we had several major concerns:
 
* Anaconda is not free software or open source
* Anaconda does not support Python 3 which we'd like to move to
* One student had a home directory named in Chinese, which caused the Anaconda installation to fail at a very late stage. This was eventually fixed by a mentor who changed the path.
 
Additionally, we moved the Windows curriculum away from <code>cmd</code> to PowerShell. This was a huge benefit because it meant that <code>ls</code> works and the rest of the curriculum can converge. The only concern was:
 
* PowerShell is not installed on Windows XP, although ''not a single student had Windows XP''.
 
Changes for next time include:
 
* Because setup caused far fewer problems this time, we can de-emphasize recruiting mentors for the Friday night session.
* Because PowerShell was successful, we're going to try to create a single consolidated set of installation instructions for Windows, Mac OS X, and Linux!
* We will make it clear to mentors whether participants should self-report that they've completed the steps or whether the mentor should verify that all the steps were taken. In the future, we will email mentors ahead of time to let them know.
* We need to do a better job of modelling the use of sticky notes during lectures early on.
* The sticky notes we bought were small and of an ambiguous color. We should get bright red sticky notes next time.
* Set up, arrange, or select the space to facilitate better circulation of mentors. When mentors can circulate easily, things are better for mentees.
* We are going to try writing installation instructions that do not rely on Anaconda so people have a fully open source option.
* Once again, not a single person outside of mentors ran GNU/Linux. We should strongly consider how much effort we want to put into maintaining this part of the curriculum.
* We should move to Python 3 to try to address lingering Unicode issues. We should try to do this for the next session.
 
We had many new mentors this round. One general concern was the relative lack of mentor training, especially before the first sessions. We felt that we should:
 
* Arrange a mentors' meeting (perhaps a day or two before, to go over material), maybe at a bar or other social environment.
* Send out detailed instructions and emails to mentors, or create pages in this wiki with detail on how to do this better.
* Perhaps meet 15-20 minutes early to get to know each other and go over things.
* Create some easier way to distinguish mentors from students (e.g., t-shirts, buttons, etc).
* Explicitly encourage mentors to reach out to students and ask them how things are going.
* Talk to mentors about how much they should help (e.g., some, but be careful not to just give away the answer or focus too much on elegance or technical correctness, and be careful not to overwhelm the learners).
 
== Session 1: Introduction to Python ==
Line 50 ⟶ 84:
=== Morning lecture ===
=== Afternoon sessions ===
 
'''Baby names''' is a good project because it feels data-science-y. Baby Names does everything that '''Word Play''' does but it has the stink of science about it. Next time, let’s have two small rooms doing the exact same thing. Wordplay is kind of boring.
 
 
== Session 2: Learning APIs ==
 
The goal of this session was to describe what web APIs are, how they work (making HTTP requests and receiving data back), how to understand JSON data, and how to use common web APIs from Wikipedia and Twitter.
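
As a rough illustration of the "understanding JSON data" step (a minimal sketch, not the workshop's actual material), the example below parses a hard-coded JSON string instead of making a live HTTP request. The payload and its field names (<code>query</code>, <code>pages</code>, <code>title</code>, <code>length</code>) and the helper <code>page_titles</code> are assumptions made up for this example; a real API will have its own structure.

```python
import json

# Hypothetical response body standing in for what a web API might return.
# The structure and field names are assumptions for illustration only.
SAMPLE_RESPONSE = """
{
  "query": {
    "pages": [
      {"title": "Python (programming language)", "length": 1200},
      {"title": "Data science", "length": 800}
    ]
  }
}
"""


def page_titles(raw_json):
    """Parse a JSON string and return the list of page titles it contains."""
    data = json.loads(raw_json)       # JSON text -> nested dicts and lists
    pages = data["query"]["pages"]    # walk down the nested structure
    return [page["title"] for page in pages]


titles = page_titles(SAMPLE_RESPONSE)
print(titles)
```

Once the JSON text has been turned into ordinary dictionaries and lists, the rest is plain Python, which is the point the session was trying to get across.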
 
=== Morning lecture ===
 
The morning lecture was given by Frances Hocutt and it was well received, if delivered too slowly for a significant minority of attendees. Unsurprisingly, the example of [http://placekitten.com/ PlaceKitten] as an API was an enormous hit: informative ''and'' cute.
 
Frances used excellent slides which are shared on the wiki page and which we will reuse. Since many people felt the lecture was on the slower side, we want to use this time to introduce function definition up front. Then, functions can be reinforced in the week 2 workshops.
 
=== Afternoon sessions ===
 
There were three parallel afternoon sessions on '''Twitter''', '''Wikipedia API''' and '''SQL'''. We plan to do some version of all three sessions next round:
 
'''Twitter''':
 
* Once again, the session had too many people and we should consider splitting it if we have mentors who are comfortable doing so.
* Next time, we should make sure that the advance notice asks everybody to download the project zip file ahead of time. If we're going to do this in class, we should set up a short URL of some sort to streamline the process so that people do not have to navigate the wiki.
* A bunch of people found the Twitter session too fast.
* Tweepy is not well documented.
 
'''Wikipedia''' workshop:
 
* The teacher explained things very clearly. That was frustrating for those who didn’t need it, but super great for people that wanted/needed a lot of explanation.
* Graduated challenges in a workshop, going from less challenging to more and more challenging, help with the fact that there is a range of learning levels.
 
'''SQL workshop''':
 
* Generally very successful. It seemed to work really well and did a good job of giving people an overview of data science and a way to hook themselves into it.
* Next session, also do a workshop that closes the loop between SQL and Python.
* Can we host an open SQL database somewhere?
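
One possible shape for a workshop that closes the loop between SQL and Python is sketched below. It is only an illustration under stated assumptions: it uses Python's built-in <code>sqlite3</code> module with an in-memory database rather than a hosted one, and the table name <code>items</code>, its columns, the sample rows, and the helper <code>count_by_category</code> are all hypothetical.

```python
import sqlite3


def count_by_category(rows):
    """Load (name, category) rows into SQLite and count rows per category."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE items (name TEXT, category TEXT)")
        # Parameterized inserts: Python data goes into SQL.
        conn.executemany(
            "INSERT INTO items (name, category) VALUES (?, ?)", rows
        )
        cursor = conn.execute(
            "SELECT category, COUNT(*) FROM items "
            "GROUP BY category ORDER BY category"
        )
        # SQL results come back as tuples: turn them into a Python dict.
        return dict(cursor.fetchall())
    finally:
        conn.close()


# Hypothetical sample data for illustration only.
sample_rows = [
    ("Emma", "girl"),
    ("Olivia", "girl"),
    ("Noah", "boy"),
]
counts = count_by_category(sample_rows)
print(counts)
```

The same pattern (query in SQL, then continue in Python with the returned rows) would apply to a hosted database, with only the connection step changing.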
 
== Session 3: Data Analysis and Visualization ==
 
'''Afternoon of Session 3:'''
=== Morning Lecture ===
=== Afternoon sessions ===
 
'''The spreadsheets session.''' People were modifying the code to build their own dataset and did their own visualizations. At least a few people. That was cool!
== General Feedback ==
== Budget ==
 
'''The Matplotlib session'''. Most people in the session were deeply lost. The mentors who taught it were not at any of the other sessions and therefore didn’t go in with a good sense of where the mentees were at. Several people left and went to the other room. In the future, we should ensure mentor success by keeping them better informed about where the mentees are at. Next time, consider encouraging new mentors to do a practice session with some friendly folks before they are let loose. Also, next session, consider using Seaborn instead of Matplotlib.
== Unprocessed ==
'''Scheduling the next workshop''': Not too close to the end of the quarter.
 
'''Session 0''' (Friday November 7th Evening 6:30-9:30pm)
Went smoothly. No person reported “many problems with set-up.” All respondents reported either no problems or a few problems.

Anaconda was key to the smoother setup compared to the first workshop series. However, Anaconda is not open source. It reduced issues, but was not 100% issue free. When one person’s home directory was named in Chinese, Anaconda got confused. This was fixed by a mentor who changed the path.
 
=== Morning Lecture ===
At least one mentor was confused about whether mentees should self-report that they’d completed the steps or whether the mentor should verify that all the steps were taken. In the future, email mentors ahead of time to let them know.
 
The goal of the lecture was to walk people through the actual mess of writing code.
Improvements for session 0 for next time.
'''The process people use to flag for mentor help''': We didn’t model enough using sticky notes during lectures early on.
 
=== Afternoon sessions ===
'''Technology improvements''': Get less ambiguous sticky notes.
 
== General Feedback ==
'''Space improvements:'''
Set up/arrange/select the space to facilitate better circulation of mentors.
When mentors can circulate easily things are better for mentees.
 
* We should try to schedule the workshop not as close to the end of the quarter. The beginning or middle of the quarter should be better for UW students.
'''Streamlining the instructions for set-up for next time.'''
 
Q: How can we reduce the number of steps and the number of operating-system-specific versions?
 
This time CDSW moved to PowerShell from CMD. PowerShell doesn’t work on PCs running Windows XP. People were instructed to “find the Windows mentor.” No one had XP. Next time, we might be able to move away from separate instructions for Linux, Mac, and Windows.
 
Consider writing install instructions that do not rely on Anaconda so people have a fully open source option.
 
== Budget ==
In the first two workshop sequences, no mentees were running Linux. Possibly, in future a Linux workshop would be good. Presently, Linux help/instructions may be moot.
 
== Unprocessed ==
 
 
 
'''Maintenance errors on the wiki.''' There was a need for several on-the-fly corrections of the instructions and files on the wiki during the workshop.
Line 104 ⟶ 157:
Each project can be its own namespace as opposed to having event-specific pages.
 
Mentors should post the code generated in the break-outs. Encourage them to capture the code.
 
 
Line 132 ⟶ 185:
'''
A: Our job is not to extol the virtues of open source. Our job is to help mentees solve their data problem. “We are teaching you how to do things with data that help you achieve your goal.” However, open source tools are desirable.
 
'''Q: Should we be teaching Python 3?'''
A: Yes, but when? It may solve some technical issues that are occurring now.
 
 
Line 210 ⟶ 260:
 
'''Q: How can we strengthen the relationship between the lectures and the break-outs?'''
 
'''Baby names''' is a good project because it feels data-science-y. Baby Names does everything that '''Word Play''' does but it has the stink of science about it. Next time, let’s have two small rooms doing the exact same thing. Wordplay is kind of boring.

'''Twitter''' had too many people in it. If you ask people to do some steps in advance and not others, mayhem ensues. Next time, have them download all resources. A bitly URL that helps people find the download more easily would streamline things.
 
A bunch of people found the Twitter session way too fast. Tweepy is not well documented. Squeeze the JSON out of it before the mentees have to cope with it. Get the mentors on it beforehand. Yay!
 
 
'''Wikipedia''' workshop. The mentor explained stuff very clearly. That was frustrating for those who didn’t need it, but super great for people that wanted/needed a lot of explanation.

Graduated challenges in a workshop, going from less challenging to more and more challenging, help with the fact that there is a range of mentee levels.

'''SQL workshop'''. Seemed to work really well. Did a good job of giving people an overview of data science and a way to hook themselves into it. Next session, also do a workshop that closes the loop between SQL and Python. Can we host an open SQL database somewhere?
 
 
'''Session 3: AM lecture'''. The goal of the lecture was to walk people through the actual mess of writing code.
 
Maybe the week 2 lecture should introduce APIs and functions. People thought that the week 2 lecture was slow, so adding functions would be good. Functions can be reinforced in the week 2 workshops. Lecture 2 is both the earliest and the latest point at which it makes sense to introduce functions. Introduce the idea that code is reusable.
 
'''Afternoon of Session 3:'''
 
'''The spreadsheets session.''' People were modifying the code to build their own dataset and did their own visualizations. At least a few people. That was cool!
 
'''The Matplotlib session'''. Most people in the session were deeply lost. The mentors who taught it were not at any of the other sessions and therefore didn’t go in with a good sense of where the mentees were at. Several people left and went to the other room. In the future, we should ensure mentor success by keeping them better informed about where the mentees are at. Next time, consider encouraging new mentors to do a practice session with some friendly folks before they are let loose. Also, next session, consider using Seaborn instead of Matplotlib.
 
 
Line 244 ⟶ 269:
== Mako's Raw Notes ==
 
* general
 
anaconda solved problems
 
- next time recruit fewer mentors for the first session
 
sticky notes didn't work as well this time
 
- especially during the lectures
- we did this better last time
 
-> having better sticky notes would have been helpful
 
rooms:
Line 289 ⟶ 302:
 
get rid of pages that are event specific
 
* friday evening
 
better material/training and information for mentors on what to expect
 
mentors should meet 15-20 minutes early to get to know each other and go over things
 
- maybe t-shirts buttons, etc or something to distinguish mentors
 
- encourage people to reach out
 
topics to cover:
 
how much should you help? (not too much)
 
anaconda
 
- non-free and we're unhappy with that
 
- linux seems like we might actually want to do sidewizse but it does work
 
- if something fully free and almost as good comes along, we'll use it
 
-> write installation instructions for linux
 
3 people who used it out of 80 had problems
 
-> anaconda choked on a person's unicode path because the user's homedir was in simplified chinese

broader unicode support won't be fixed until we can move to python3 and we still seem a little while away from that
 
 