Community Data Science Workshops (Fall 2014)/Reflections: Difference between revisions

imported>Mako
* '''Afternoon, 1pm-3:30pm''': Practice working on projects in 3 breakout sessions
* '''Wrap-up, 3:30pm-4pm''': Wrap-up, next steps, and upcoming opportunities
 
We collected detailed feedback from users at three points using the following Google forms (these are copies):
 
We used this feedback to both evaluate what worked well and what did not and to get a sense of what students wanted to learn in the next session and which afternoon sessions they might find interesting.
 
== Applicants ==
 
We had 30 mentors who attended at least one of the sessions and at least 20 mentors at each of the sessions.
 
We had about 150 participants apply to attend the sessions. We selected on programming skill (to ensure that all attendees were complete beginners) and enthusiasm, and then randomly, to maintain a learner-to-mentor ratio of between 4 and 5. We admitted 80 participants.
 
Retention between sessions 0 and 1 was nearly 100%. Retention between sessions 1 and 2 and between sessions 2 and 3 was probably around 75% each, leaving us with perhaps 55-60% retention at the end of session 3. Anecdotally, there is a sense that those who dropped out were those who had more trouble but didn’t struggle visibly.
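
As a rough check on those numbers, step-to-step retention rates multiply into an overall rate. The sketch below uses only the approximate figures quoted above; the function and variable names are ours, for illustration, and the rates are estimates rather than measured values.

```python
# Approximate step-to-step retention, taken from the rough estimates above.
# These are assumptions for illustration, not measured attendance data.
step_retention = [1.00, 0.75, 0.75]  # sessions 0->1, 1->2, 2->3


def cumulative_retention(rates):
    """Multiply step-to-step retention rates into one overall rate."""
    overall = 1.0
    for rate in rates:
        overall *= rate
    return overall


overall = cumulative_retention(step_retention)
print(f"Estimated overall retention: {overall:.0%}")
```

Two successive 75% steps compound to roughly 56%, which sits inside the 55-60% range reported above.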
 
Although our participant pool in [[CDSW (Spring 2014)]] was overwhelmingly female, this time there was close to gender balance among both students and mentors. Roughly 2/3 of mentees were from UW. Attendees came from a range of places, including someone who works for the city of Seattle. Many random Wikipedians were there. It's cool that people who are not doing research but are part of online communities were in the mix with the researchers.

We had 16 students from HCDE, along with a bunch of mentors from there. They were good mentors.
 
Several people applied who are already good at programming. We're still not exactly sure why these people are applying because we think that the fact that the workshops are for absolute beginners is very clear. Maybe they want more exposure to data science?
 
Once again, the constraint on scaling the workshop is the number of mentors. Every mentor means that the workshop can accommodate four more mentees.
 
One suggestion was to allow mentees who have some programming skills to join, especially for the second and third workshops (given predictable rates of retention). There was no consensus among the organizers and mentors on this approach; some preferred getting more newbies and investing more in them.
 
== Morning Lectures ==
 
Frances Hocutt gave one of the lectures and, generally, this was seen as a huge success. An important goal is getting other people to give more of the lectures. Tommy is an obvious choice to take over some of this next time. Different faces, perspectives, and backgrounds are useful to communicate the breadth of interest here. [[Mako]] does not want to be the only one giving these lectures.

== Afternoon Sessions ==
 
Our biggest challenge with growing the workshops was physical space for the lectures. Basically, rooms that can hold more than 100 people are almost exclusively lecture halls, which make it almost impossible for mentors to reach students.

This time, we reserved a lecture hall that seats 200 people and filled it with 100 students in alternating rows to make it at least possible to reach each person. Projects are done in breakout sessions, which can be split.

- Turn on logging in the console and post the log after the lecture.
 
 
 
== Session 0: Python Setup ==
* Once again, not a single person outside of mentors ran GNU/Linux. We should strongly consider how much effort we want to put into maintaining this part of the curriculum.
* We should move to Python 3 to try to address lingering unicode issues. We should try to do this for the next session.
* Not everybody loves the checkout step. Maybe there's a way we can make it more fun?
 
We had many new mentors this round. One general concern was the relative lack of mentor training, especially before the first sessions. We felt that we should:
 
* Arrange a mentors meeting (perhaps a day or two before, to go over material), maybe at a bar or other social environment.
* Send out detailed instructions and emails to mentors, or create pages in this wiki with detail on how to do this better.
* Perhaps meet 15-20 minutes early to get to know each other and go over things.
* Create some easier way to distinguish mentors from students (e.g., t-shirts, buttons, etc).
* Explicitly encourage mentors to reach out to students and ask them how things are going.
* Talk to mentors about how much they should help (e.g., help some, but be careful not to just give away the answer or focus too much on elegance or technical correctness, and be careful not to overwhelm the learners).

We also had [[Community Data Science Workshops (Fall 2014)/Reflections#Mentorship|a bunch of general feedback on how we could improve mentorship]] that is particularly relevant to the earlier sessions.

== Session 1: Introduction to Python ==
 
The goal of this session was to teach the basics of programming in Python. The basic curriculum was originally built off the [[Boston Python Workshop]] curriculum, which has been used many times and is well tested. Unsurprisingly, it worked well for us as well. We made several major changes. The biggest is that we retained only the [[Wordplay]] project and we created a new project, [[Baby Names]], that uses Social Security Administration data on the frequency of baby names.
 
=== Afternoon sessions ===
 
We felt that the new [[Baby Names]] project was excellent and feedback was overwhelmingly positive. Because it includes both lists of names and numbers, it can do everything that [[Wordplay]] can, but it has a much stronger feel of science to it and a higher ceiling. Wordplay felt relatively boring.
 
Suggestions based on feedback include:
 
* Do a better job of bringing folks back together to walk through potential solutions to the questions posed in the project rooms.
* Consider simply having two smaller rooms doing [[Baby Names]] and perhaps have one that emphasizes more numeric and math operations.
* Prepare questions beforehand, list them all up front, and let folks choose what to work on.
 
== Session 2: Learning APIs ==
 
The goal of this session was to describe what web APIs are, how they work (making HTTP requests and receiving data back), how to understand JSON data, and how to use common web APIs from Wikipedia and Twitter.
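
To illustrate the "receiving data back" half of that description, here is a minimal sketch of turning a JSON response into Python objects. The response text is a made-up example shaped loosely like an API reply; it is not actual output from Wikipedia or Twitter, and the field names are assumptions for illustration only.

```python
import json

# Hypothetical response body: in real use this text would come back from an
# HTTP request to a web API. The structure and field names are invented here.
response_text = '{"query": {"pages": [{"title": "Seattle", "length": 12345}]}}'

# json.loads turns JSON text into ordinary Python dicts, lists, strings, and numbers.
data = json.loads(response_text)

# Once parsed, the data can be explored with normal indexing and loops.
for page in data["query"]["pages"]:
    print(page["title"], page["length"])
```

The point for beginners is that, once parsed, JSON is just nested dictionaries and lists, so everything learned in Session 1 still applies.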
 
=== Morning lecture ===
 
The morning lecture was given by Frances Hocutt and it was well received, if delivered too slowly for a significant minority of attendees. Unsurprisingly, the example of [http://placekitten.com/ PlaceKitten] as an API was an enormous hit: informative ''and'' cute.
 
Frances used excellent slides which are shared on the wiki page and which we will reuse. Since many people felt the lecture was on the slower side, we want to use this time to introduce function definition up front. Then, functions can be reinforced in the week 2 workshops.
 
=== Afternoon sessions ===
 
There were three parallel afternoon sessions on '''Twitter''', '''Wikipedia API''' and '''SQL'''. We plan to do some version of all three sessions next round:
 
'''Twitter''':
 
* Once again, the session had too many people and we should consider splitting it if we have mentors who are comfortable splitting it.
* Next time, we should make sure that the advance notice asks everybody to download the project zip file ahead of time. If we're going to do this in class, we should set up a short URL of some sort to help streamline the process without having to navigate the wiki.
* A bunch of people found the Twitter session too fast.
* Tweepy is not well documented.
 
'''Wikipedia''' workshop:
 
* The teacher explained things very clearly. That was frustrating for those who didn’t need it, but super great for people that wanted/needed a lot of explanation.
* Graduated challenges in a workshop, which go from less challenging to more and more challenging, help with the fact that there is a range of learning levels.
 
'''SQL workshop''':
 
* Generally very successful. It seemed to work really well and did a good job of giving people an overview of data science and a way to hook themselves into it.
* Next session, also do a workshop that closes the loop between SQL and Python.
* Can we host an open SQL database somewhere?
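
One possible shape for a session that closes the loop between SQL and Python is sketched below. It uses Python's built-in <code>sqlite3</code> module with an in-memory database, so nothing needs to be hosted. The table, column names, and rows are invented for illustration and are not from any workshop dataset.

```python
import sqlite3

# In-memory database: nothing to install or host. Table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Ada", 3), ("Grace", 5), ("Alan", 2)],
)

# Run a SQL query from Python; the ? placeholder is filled in safely by sqlite3.
rows = conn.execute(
    "SELECT name, n FROM names WHERE n >= ? ORDER BY n DESC", (3,)
).fetchall()

# The results are plain Python tuples, so normal Python takes over from here.
total = sum(n for _, n in rows)
print(rows, total)

conn.close()
```

This is only a sketch of the idea: the query does the filtering and sorting, and Python does whatever analysis comes next.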
 
== Session 3: Data Analysis and Visualization ==
 
=== Morning Lecture ===

The goal of the lecture was to walk people through the actual mess of writing code.

=== Afternoon sessions ===

'''The spreadsheets session.''' At least a few people modified the code to build their own datasets and made their own visualizations. That was cool!

'''The Matplotlib session.''' Most people in the session were deeply lost. The mentors who taught it were not at any of the other sessions and therefore didn’t go in with a good sense of where the mentees were at. Several people left and went to another room. In future, ensure mentor success by having them loop in better to where the mentees are at. Next time, consider encouraging new mentors to do a practice session with some friendly folks before they are let loose. Also, next session, consider using Seaborn instead of Matplotlib.
 
== Unprocessed ==
 
'''Maintenance errors on the wiki.''' There was a need for several on-the-fly corrections of the instructions and files on the wiki during the workshop.
 
 
'''Q: How to handle when mentees want to refer back to the workshop material that they experienced?'''
 
A: Create an archive template for the page they are looking at during the workshop.
Each project can be its own namespace as opposed to having event-specific pages.
 
Mentors should post the code generated in the break-outs. Encourage them to capture the code.
 
 
'''General observation about mentoring:''' Being a mentor is kind of hard, especially being a good mentor. Some steps for helping mentors that were in place last time were skipped this time.
 
It was hard to tell who was a mentor and who wasn’t.
 
Improvement: Help the mentors to be visually identifiable. E.g. Paper them head to foot in sticky notes.
 
Questions about mentorship:
How to help the mentors to mentor well?
 
Suggestion for mentors: Walk around to every single person. Ask, “How are you doing? What are you working on? Show me what you’re doing.”
 
How much do you help somebody?
 
Should there be a page of guidelines for mentors?
 
Where is uniformity needed in mentor style and where do we want to encourage diverse approaches?
 
Let’s have a mentors workshop! At a bar! With BEER! and PIZZA!
 
The pizza party, er, mentor workshop could cover: norms, best practices, goals. Planning, etc.
 
 
'''Should only fully open source tools be selected for workshops?'''

A: Our job is not to extol the virtues of open source. Our job is to help mentees solve their data problem. “We are teaching you how to do things with data that help you achieve your goal.” However, open source tools are desirable.
 
 
'''Student demographics.'''
This time there was more gender balance in both students and mentors.
 
2/3 of mentees were from UW. Attendees included students from random places, including someone who works for the city of Seattle. Many random Wikipedians were there. It's cool that people who are not doing research but are part of online communities were in the mix with the researchers.

We had 16 students from HCDE, along with a bunch of mentors from there. They were good mentors.
 
'''Demographics of Applicants.'''
Several people applied who are already good at programming. Why do they apply? Maybe they want more exposure to data science?
 
 
'''Desired applicants.'''
The constraint on scaling the workshop is the number of mentors. Every mentor means that the workshop can accommodate four more mentees.
 
Is it good to have mentees who have some programming skills along with those who don’t have any? Or is it a better use of the seats to only take those with no programming background?
 
Who are the priorities? Get more of the newbies and invest more in them?
 
'''Improving Retention:'''
Anecdotally, there is a sense that those who are dropping are those who had more trouble but didn’t struggle visibly.
 
Q: Would it help with retention if we show people what will happen in the following weeks?
 
A: Several mentors say “yes.” We’re doing that, but let’s do more of it.
 
Pair programming for those who want it might be helpful. Working in groups is another possibility.
 
'''Mining research interests/goals.'''
Could we help match up people with similar interests?

The size of the breakout workshops varied and that means different degrees of engagement were feasible.

The BIG feedback from the first series of workshops: Bring people back together more often. Bringing people together in the end was effective this time. We need a go-between for each session to remind people to reconvene. An emcee.

'''Flow of the workshops'''
Q: What degree of dependencies should there be between workshops?

'''Feedback on lectures.'''
About half found Frances’s lecture either too fast or too slow and about half found the lecture to be just right.

Getting other people to do some of the lectures.
Diversity is desired. Mako does not want to be the only one.

'''Selecting workshops for next time.'''
Do we need more break-out sessions? Or do we need to break out the best of the break-out sessions? Two mentors thumb wrestle.

Wrestler one: Smaller groups of the same break-out session might be good.

Precanned sessions make it easier for new mentors to feel confident and be successful.

Wrestler two: Diversity of projects inspires people to do the kinds of things that people can do with this new knowledge.

What else can encourage generative-ness?
Giving mentees generative moments within sessions and lectures might be empowering. Perhaps calling out mentees who are doing generative things.

'''Basic statistical analysis''' in Python would be a fun thing to teach (says Mako) and at least ten mentees would be enthusiastic about it.

Some people love R, some people don’t. The world goes round.
 
'''Q: How can we strengthen the relationship between the lectures and the break-outs?'''

'''The ethnographers get the last word:'''
Some observations about the culture of mentoring from a first-time mentor: There are some distinct values that came through strongly. There is a clear vision of empowerment through programming. The degree of inclusivity is impressive. The culture of feedback, iteration, and reflection was really surprising, as was the amount of effort that goes into improving the materials and the teaching. So is the way that other organizations are able to use (and are using) the materials, and the way that this is building the community. For example, mentees are organizing their own meet-ups (though that could be encouraged even more).

The pragmatism of what is taught demonstrates a clear value. It would be helpful to make sure that all mentors are clear that part of what is expected of them is that they give pragmatic coaching. That is, they should lead mentees to something that works rather than telling them what an expert would do.

== General Feedback ==

* Generally, there was a sense that we should stop creating pages in the wiki by copying and pasting old stuff. When we archive the old version, we can use MediaWiki to create links to the old version of the pages (we can install templates from English Wikipedia) to make this easier.
* We should try to schedule the workshop not as close to the end of the quarter. The beginning or middle of the quarter should be better for UW students.
* Mentors should post the code generated in the break-outs. Encourage them to capture the code created in examples and to post it afterward systematically.
* There was general interest in pair programming or more team-based exercises.
* There was a need for several on-the-fly corrections of the instructions and files on the wiki during the workshop. Better planning and testing for this will be very useful.

=== Mentorship ===

Last time through, most of our observations were focused on improving the experience of attendees, and we think we didn't spend as much time on helping mentors have a great experience and prepare effectively. We received several pieces of feedback on how to improve this.

We had many new mentors this round. One general concern was the relative lack of mentor training, especially before the first sessions. We felt that we should:

* Arrange a mentors meeting (perhaps a day or two before, to go over material), maybe at a bar or other social environment with beer and pizza. We could use this to cover norms, best practices, goals, planning, etc.
* Perhaps meet 15-20 minutes early to get to know each other and go over things.
* Create some easier way to distinguish mentors from students (e.g., t-shirts, buttons, paper them head to foot in sticky notes).
* Send out detailed instructions and emails to mentors, or create pages in this wiki with detail on how to do this better.
** Talk to mentors about how much they should help (e.g., help some, but be careful not to just give away the answer or focus too much on elegance or technical correctness, and be careful not to overwhelm the learners).
** Explicitly encourage mentors to reach out to students and ask them how things are going by walking around to every single person to ask, “How are you doing? What are you working on? Show me what you’re doing.”

=== More Projects or Better Projects ===

We had certain afternoon project sessions that were much more effective than others. One thing we were conflicted about was whether we wanted more break-out sessions or whether we should just use the best of the break-out sessions (perhaps in two rooms).

Arguments for smaller groups of the best break-out session include:

* Focus on a known good thing.
* Precanned sessions make it easier for new mentors to feel confident and be successful.

Arguments against include:

* Diversity of projects inspires people to do the kinds of things that people can do with this new knowledge.

Other ways to encourage generative-ness might include giving mentees creative/flexible moments within sessions and lectures, which might be empowering. Perhaps calling out mentees who are doing creative things?

== Budget ==

== Mako's Raw Notes ==

rooms:

maybe get the Odegaard room
architecture of the space has quickly become the limiting factor

checkout
- not everybody loves the checkout, maybe there's a way we can make it more fun?

communicate the whole setup process to the mentors ahead of time

-> maybe streamline the process

-> finding the directory continues to be hard

we can move away from 3 separate installations in the setup information.

- everybody can use zip instead of both zip and tar.gz

maybe we can consolidate the wiki pages into a single page which will be much easier to install and keep updated in the future

generally, let's stop copying and pasting new stuff into the wiki. when we archive the old version, we can create links to the old version of the wiki pages (install the templates from English Wikipedia)

get rid of pages that are event specific

* demographics

people come in: departments? maybe build a table?

* org suggestions

let people join in the later sessions

maybe letting people skip #1 could be useful

-> maybe we can accept people afterwards

-> alternatively, we can try to accept more newbies and improve retention

mixed feelings

ways to improve and retain people

-> lay out what we're going to do in the next sessions

go to show why we are learning things up front

focus on broad research questions

pair programming?
encouraging people to work in teams or with others on problems they suggest

next time maybe mine the registration for a list of research questions

note:

next time make it explicit that folks can work in groups

tip: introduce mentors to everybody very clearly

introductions would have been good but are hard to do

* session 1

bring folks back together to go over things

- post examples of code used in the lectures

- create code base

- turn on logging in the console and post it after the lecture

mentor workshop:

- get people together before
encourage people to get involved maybe bar meetup

- track diversity of people along more dimensions

- the sql workshop was well received although slightly mixed in terms of feedback

more breakout sessions next time
colorwall was gone and nobody missed it

* session 3

generally:

showcase what students have accomplished and places people can change things and do things differently

e.g., the Ferguson thing with the example from ha=rry party
 
strong connection between the lecture and the introduction
 
-> more connections and takeaways to emphasize the session more clearly
 
how to tap mentors on topics more effectively
 
wordplay
 
- kinda boring
 
next time
 
- public health and epi data session

end of semester was too late. maybe have it early next year
 
twitter:
 
 
- have people do the setup ahead of time
 
-> that was clear ahead of time and it happened in the beginning of class. either fix the instructions or make sure that everybody is doing the same thing

speed was an issue

the opaqueness of tweepy was a problem.. option to create a version of tweepy that just gives you json

or miku or michael for details on how to do that
 
dharma might be able to do this.
 
 
sql session:
 
- maybe split this into two session next time
 
- merge in some more python this time
 
#1 intro into sql
 
#2 using python to grab data and bring in python and pandas
 
wikipedia
 
- too slow
 
we can do it faster
 
lecture
*stress defining functions more and earlier.. maybe in the first project and certainly in session #2 so we can use it in the afternoon projects and tweepy
 
 
 
session 3:
 
show and tell at the end was very effective

we need a designated mc who can go between rooms
 
bring people up to the
 
matplotlib
 
- maybe replace it with seaborn?
- tommy will teach it
 
idiomatic python
 
talk to chris to try to fix those things