Open Source Comes to Campus/Resources
If you're looking for help with the logistics of running an OSCTC-type event, see here.
Command line basics
(This is based on the command line tutorial in the Boston Python Workshop.)
Many of the tools of open source development are primarily used via the command line. Let's get some practice with navigating the computer from the command line.
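A minimal practice session might look like the sketch below. The directory and file names (a temporary scratch directory, `notes`, `first.txt`) are made up for illustration and are not part of the Boston Python Workshop tutorial.

```shell
# Work in a throwaway directory so nothing important is touched.
# (mktemp -d creates a fresh, empty directory and prints its path.)
practice_dir="$(mktemp -d)"
cd "$practice_dir"

pwd                        # print the directory you are currently in
mkdir notes                # make a new directory called "notes"
cd notes                   # move into it
echo "hello" > first.txt   # create a small file containing one line
ls                         # list what is in this directory
cat first.txt              # show the contents of the file
cd ..                      # go back up one level
ls                         # the "notes" directory shows up in the listing
```

Attendees can type each command one at a time and compare what they see with their neighbors.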
Open Source Communication Tools
The slides I use for this talk can be found here. These are modified from a version graciously provided by Jessica McKellar. There are substantial notes in the ODP file which can be viewed by going to the "Notes" tab.
- Slides 1-7 (#7 is titled 'Version Control')
- Version Control Demo
- Slide 8 (#8 is titled 'Sharing changes: diff and patch')
- Diff and Patch Demo
- Slides 9-11 (#11 is titled 'Issue Trackers')
- Issue Tracker Demo
- Slide 12 (#12 is titled 'IRC')
- IRC Demo
- Slides 13-19
Version Control Demo
Go to the Wikipedia page for the host institution, e.g. http://en.wikipedia.org/wiki/Wellesley_College and http://en.wikipedia.org/w/index.php?title=Wellesley_College&action=history. The presenter of the version control tutorial will explore the page more thoroughly later.
Diff and Patch Demo
This demo uses the files in this repository.
I have a To Do List! But maybe it needs editing. First, I'll make a copy to edit:
cp ToDoList new_ToDoList
Then I'll open it up and make changes to it. (Side note: make sure to explain which editor you're using and give options for those following along - emacs, vim, nano, or a GUI.)
How do I view the differences between the two?
diff -u ToDoList new_ToDoList
That's just printed to the command line, though. How do I store it in a file?
diff -u ToDoList new_ToDoList > changes.diff
I open up the file and see it contains the same stuff as was printed out before. Okay, now how do I apply these changes to the original list?
patch -p0 ToDoList < changes.diff
Note that the argument given to patch is the file I want to modify, not the file that already has the changes.
People often ask what the "-u" argument given to diff means, or the "-p0" argument given to patch.
-u tells diff to produce the "unified" format, which shows each change together with a few lines of surrounding context. -p[x] tells patch how many leading components to strip from the file paths named in the diff before it looks for the file to change. -p0 uses the entire file name unmodified. The documentation has a bit more info.
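For presenters who want to rehearse the demo without the repository at hand, here is a self-contained sketch of the same round trip. The to-do items are invented for the example, and the second `printf` stands in for editing the copy by hand in an editor.

```shell
# Scratch directory; the list contents below are invented for this sketch.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

printf 'Buy milk\nCall mom\nWrite essay\n' > ToDoList
cp ToDoList new_ToDoList

# Stand-in for opening new_ToDoList in an editor and adding a line.
printf 'Buy milk\nCall mom\nWrite essay\nGo to the workshop\n' > new_ToDoList

# diff exits with status 1 when the files differ, which is expected here.
diff -u ToDoList new_ToDoList > changes.diff || true
cat changes.diff

# Apply the stored changes to the original list.
patch -p0 ToDoList < changes.diff

# After patching, the two files are identical, so diff prints nothing
# and exits successfully.
diff ToDoList new_ToDoList && echo "files match"
```

In the stored diff, lines starting with "+" were added and lines starting with "-" were removed. Diffs produced by git put "a/" and "b/" in front of file names, which is why those are usually applied with -p1 rather than -p0.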
Issue Tracker Demo
Pick a few random issue trackers, ideally ones which use different platforms, such as:
Information to look for:
- tags like "bite size"
IRC Demo
Log into IRC and join the #openhatch channel.
Forgot to install? Go here.
- how your name is highlighted if someone uses it
- how to do /me
- how to send messages to individual users
- different servers vs different channels (#openhatch on other servers might be empty)
- how to make a new channel, if asked
Introduction to Version Control
This talk assumes you are speaking at Wellesley College. You're probably not going to be speaking at Wellesley, so you'll need to make a handful of changes. For instance, it is highly recommended that you browse through the Wikipedia page history of your own school ahead of time to find a fun edit to use as an example. Another example besides the one below is this one from Northeastern.
Why version control
- I want you to think back to the last time you worked on a big project. Could be for school, could be for some other purpose. (Take a moment to let them think.) How many people worked on that project? (Get answers.) How long did you work on it? (Get answers.) How many separate pieces did it have? (Get answers.)
- Now here's the real question - how did you keep track of the changes that you made?
- Some people will say they emailed around different copies of files.
- Others may say that they had people all working in person and talking about changes.
- Some will probably mention version control systems.
- How well did that work out? (Attendees will typically chuckle.)
- All sorts of problems can crop up when you're working on a project. What if you don't like the changes you made and you want to go back to how things were? What if multiple people want to make changes to the same file at the same time? What if you have some sort of class clown in your group who keeps going through and changing random words to 'Batman'? How do you handle that?
- The answer: version control. It's a system for tracking different versions of a file. The most popular tools let every contributor keep a copy of the project on their own computer, making whatever changes they want, and eventually synchronizing with the rest of the team.
- Okay, so imagine you're working on a project with about 19 million other people, editing 4 million files.
- That's Wikipedia. As you can imagine, they've put a lot of thought into version control. Let's go look at how they do it. (Navigate web browser to http://en.wikipedia.org/wiki/Wellesley_College )
- So one of the great things about Wikipedia is that the system keeps track of every version of every article. The wiki system is a version control system, too. If you click on the "Edit" tab, you can see the built-in way to edit every article. Let's take a look at the history tab here. (click history tab; zoom in so it's large enough for people to see; make it really enormous)
- So here, you can see a little heading for every version of the page, including (mouse over the date) the time and date of that version, who saved that version, and a brief summary of what the changes are. Here, the most recent version says +73 -- that means that 73 characters were added in that revision.
- One standard feature of version control tools is to be able to see each version of the file. On Wikipedia, you can do that by clicking on the date. (click on the most recent edit, taking you to e.g. http://en.wikipedia.org/w/index.php?title=Wellesley_College&oldid=559029901 )
- You can see the box at the top that says, "This is the current revision of the page." The real thing going on here is that we're using an identifier for the page that will never change, even if the page gets edited, because we're looking at the particular revision.
- Let's hit back, and take a look at an older version that has an edit summary. In this case, we see they say they added a note about New Zealand. When we click the date (Click the date)... and load the page, we can see visually that the page seems to have a note about New Zealand. But we don't know just from looking at this version of the page if maybe they made other changes.
- You folks already saw diffs, and like most version control tools, Wikipedia lets you see a diff between the two pages. It's line-based, like a lot of tools used in programming. Let's click back... (back to http://en.wikipedia.org/w/index.php?title=Wellesley_College&action=history ) and instead of clicking on the date, we'll select this revision as the first item selected, and the one right after it as the one we want to diff against. When we click "Compare selected revisions", it takes us to a page like http://en.wikipedia.org/w/index.php?title=Wellesley_College&diff=558216275&oldid=556114296
- On that page, you can see the one + line: the addition of this link to a different school in New Zealand. In particular, this shows us that there are no changes other than the particular one about the new link.
- One key difference between Wikipedia's version control system and most others is that Wikipedia's default is to accept contributions. If you make a change, it will instantly go live. In fact, let's make a change now. (Make a minor edit.) Now, someone could come along and revert the change. They could even ban our account, if they think we're acting in bad faith. But this edit will still have been part of the current version for a brief period of time. In most other version control systems, a change has to be reviewed and approved before it goes live.
Different version control tools out there
- So that's the very basics of what version control is. In open source projects, probably the most popular version control tool is called git. It's a command line program, and hopefully you already installed it during the laptop setup process. A lot of projects tracked with git also use Github, which is a website for browsing these git repositories.
- We're going to focus on git and github today, but you might come across other version control systems and websites today and in the future, so I just want to mention a few of them:
- Other systems: Some of the more popular version control systems include Mercurial, Subversion, and GNU Bazaar. There are some proprietary version control systems, too, but we don't bother with them.
- Then there are some code hosting sites that work with different types of version control systems. Github works with git. Google Code works with git, mercurial, and subversion. Bitbucket works with mercurial and git. Again, there are other hosting sites, too many to mention here.
- If you're trying to work on a project but you're having trouble understanding their version control system, check and see which one it is. We have training missions on Openhatch.org for using git and subversion, and can likely help you informally on #openhatch if it's some other system.
- Let's get back to git, which is again the most popular tool and the one we'll be teaching you about today.
- Git calls your revisions commits. This refers not to changes to individual files, but to the state of the project at a given point. Here's a list of commits (https://github.com/bundestag/gesetze/commits/master, then click on an individual hexadecimal #) which you can see sometimes involve changes to one file, and sometimes to more. You can also see the diff between this commit and the previous one. You get to decide how frequently to bundle your changes into commits. I personally commit frequently and compulsively.
- Another difference is how the version numbers are encoded. In some systems, the version numbers are just numbers that increase over time. In git, the revision number is a checksum of all the important information about the commit: the name and email address of the person who made the change, plus the contents of the files that are part of the commit, plus a timestamp of when the commit was made, plus a few other bits of data. With git, every person uses their own computer and can make commits whenever they want, so simple increasing numbers would collide between contributors. Computing the ID from the content avoids that.
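If attendees want to see for themselves that a commit ID is a checksum of the commit's content, the following sketch can be run in a terminal. The repository, the file name `essay.txt`, and the name and email address are all hypothetical; `git cat-file` and `git hash-object` are standard git commands.

```shell
# Throwaway repository; every name here is made up for the example.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q .
git config user.name "Example Student"
git config user.email "student@example.com"

echo "first draft" > essay.txt
git add essay.txt
git commit -q -m "Add first draft"

# The ID git assigned to the commit we just made.
commit_id="$(git rev-parse HEAD)"
echo "commit ID:  $commit_id"

# The stored commit: a pointer to the file contents (the "tree"),
# the author and committer with timestamps, and the message.
git cat-file -p HEAD

# Recompute the checksum from exactly that text; it matches the ID above.
recomputed_id="$(git cat-file commit HEAD | git hash-object -t commit --stdin)"
echo "recomputed: $recomputed_id"
```

Because the timestamp is part of what gets hashed, running the same commands a moment later produces a different ID, even though the file and message are the same.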
Other things you can use version control for
- A lot of people use version control for code, but as we can see with Wikipedia, that's not the only use for it. You can use version control on almost anything digital. For instance:
- There have been several efforts to keep track of legislation via version control systems. For instance, Gesetze follows the German Bundestag.
- The White House has put part of their open data initiative on Github and has been accepting issues and pull requests.
- The city of Chicago has started sharing its open data via Github.
- Hundreds of people have put up repositories of poetry, and a friend of mine uses git to keep track of her novel.
- Some educators are sharing syllabi.
- Some scientists are publishing their raw data, as well as the scripts they use for processing and analyzing it, on Github.
Exercises & Follow up
- (Put on projector: https://openhatch.org/missions/git)
- This is the URL for an interactive teaching tool that shows you how open source communities use git and helps you get used to using it on your own computer. We'll have teaching assistants walking around to help you.
- If you finish that, work through this second URL:
- (put on projector: http://try.github.io/ ) but we really encourage you to work through the training mission first.
We aim for four panelists representing a diversity of open-source jobs. For example, past panelists have included people who work for Red Hat, BoCoup Loft, the Sunlight Foundation, and the Personal Genome Project, as well as freelancers.
We start off by asking each person to introduce themselves, their current occupation, and their current employer, and to talk very briefly about how that work relates to open source.
Then - questions! Frequent questions include:
- How did you first hear about/get involved in open source?
- What advice do you have on whether students should participate in open source in college? Are there particular things that do or don't matter, as things to focus on? (For example, "Definitely start your own open source project rather than contribute to existing ones" or "Don't waste time talking to people on IRC".)
- Do you have advice on what to do when approaching a project for the first time? Either in terms of code, or in terms of community.
We typically do the career panel just before lunch so that attendees can socialize with panelists immediately afterwards.
Ethics and History of Free Software
Typically this takes 20-30 minutes and takes place right before the Career panel.
The inspiration for the career panel comes partly from a past event, namely the first one (at Penn). After the main lecturer (Asheesh) gave a history tour in the lecture, students did Q&A with him and one other staffer. The main lecturer and the other staffer had similar but different opinions, which helped.
To deliver a lecture equivalent to what was given at UMD or Wellesley College, follow the link to the full Ethics history talk.
The purpose of this demo is to give students more hands-on experience, in a social environment, with submitting pull requests to a project on Github. The content of the pull requests is not the point; instead, the idea is to focus on the social/personal concept that this is how projects often want changes submitted, and the technical side of getting practice with the tools. Students have already been through a basic tutorial on git; the idea is to cement this knowledge.
Typically, this takes the following form.
Before the session, an instructor creates a sample repository where it is safe to merge pull requests from students. This could be a personal project of theirs or a fork of someone else's project through which they create a sandbox to accept students' work where it is okay to eventually erase those changes.
The instructor begins by explaining the project in question, and showing how to find the project's Github page.
They then use the Github website to point to a particular file that can accept simple, text-based edits, typically a file like README. They then navigate to a terminal and demonstrate how to create a pull request, going through all the following steps:
- They "fork" the project into their personal Github account. (In most cases, instructors are demonstrating a personal project; in this case, it would be wise for the instructor to create a dummy Github account to be the person submitting the pull request. This also avoids some trouble where Github accounts that are part of "organizations" see a slightly different fork UI than accounts that aren't.)
- They explain the key concept about forking: This copies all the history of the project into a space they are permitted to modify.
- They "clone" the repository onto their own computers.
- They make a change on their own computer, and run 'git add' on the change. Then they use 'git commit -m' to commit the change. The change can be as simple as adding their name to a README file. It can also be silly.
- Then they 'git push' the change, which updates their master branch on Github.
- Then they visit their fork on the Github website and notice that their fork has their personal work on it.
- Then they click around in the Github UI to submit a pull request.
- Then they visit the upstream project, and show what the pull request looks like. They typically do not click "Merge" because it might create conflicts for student submissions.
- Then they tell students to submit changes, encouraging them to raise their hand if they need any help.
- The instructor sticks around at the front of the room, showing the upstream project's "List of pull requests" page, which typically automatically refreshes as students' work rolls in. Typically, seeing the student submissions makes for a lot of fun in the classroom. We often have to ask, "Hey, who is susanna112" (for example) and see a student raise their hand.
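The terminal part of the steps above can be sketched as follows. This is a rehearsal version under stated assumptions: a local bare repository (`fork.git`) stands in for the fork that would normally live on Github, so the commands run offline. In a real session, the clone URL would be the HTTPS URL of the student's own fork, and the name, email, and commit message here are hypothetical. Forking and opening the pull request happen in the Github web UI and are not shown.

```shell
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the fork on Github (an assumption so this runs offline).
git init -q --bare fork.git

# "Clone" the fork onto your own computer.
# (git warns that the repository is empty; that is fine here.)
git clone -q fork.git my-clone
cd my-clone
git config user.name "Example Student"
git config user.email "student@example.com"

# Make a small change, stage it, and commit it.
echo "Example Student was here" >> README
git add README
git commit -q -m "Add my name to the README"

# Publish the commit to the fork. Pushing HEAD sends the current branch,
# whatever its name is (master on many setups).
git push -q origin HEAD

# Confirm the fork now has the commit.
git --git-dir="$work_dir/fork.git" log --all --oneline
```

The four commands students repeat most are `git clone`, `git add`, `git commit -m`, and `git push`; these could form the core of a cheat sheet.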
NOTE: We would do well to have a cheat sheet for students for the commands they use. Typically, students are supposed to follow along with the instructor by typing in analogous commands, with TAs roaming to help students as they run into issues. (For example, one student once ran into an issue where the Git version we recommend on Mac OS X would segfault in libcurl if using the HTTPS access method. Switching to SSH-based access worked around the issue, but then she was behind.) We have typically worked around this by having someone type the corresponding commands into IRC.
Also NOTE: It is important that the instructor suggest students use the HTTP(S) access method for Github; otherwise, students who do not have SSH keys set up will spend time on that, which is beside the point.
Some students may have trouble following along, and it is essential that TAs work to identify students who are not progressing well by walking around and looking at laptop screens to find students who seem stuck.
We're changing the issue tracker, and therefore this section, soon.
- Provide link to issue tracker
- Show students:
- how they can filter tasks by project, skill, or group/individual (you can do so by clicking the relevant filter)
- how to click through to the bug report
- how to claim a task (once we add that feature)
- Hand out bug report worksheets and ask attendees to fill one out for each issue they work on. Stress that this is really helpful to organizers and is not only a pedagogical tool for the attendees' benefit
- Remind attendees that staff is there to help and that, if they're feeling shy or all the staff looks busy, they can ask on the IRC channel too.
These can vary, based on your event. We often:
- Ask attendees to share their successes. Find a few people (perhaps 5) to stand up and speak for a minute or two about what they accomplished during the day.
- Get attendees to fill out your exit survey. Typically, you should plan your exit survey in advance of the event. At the wrap-up, you can use a projector and project a URL of the exit survey (preferably shortened with a meaningful name, using a service like http://bit.ly/ or http://smarturl.it/ ).
- Thank people. Sponsors! Staff! And don't forget the attendees - they are there! You are glad they are there.
- Hand out tokens of appreciation, if you can. Attendees often love to receive T-shirts, books, stickers.
- Follow up:
- Tell attendees to expect a follow up email and an invitation to join the alumni list (if you have an alumni list) and/or the general OpenHatch mailing list.
- Let them know about follow up events. If this is obvious -- like inviting people to a "project night" after an introductory workshop -- great! Make sure to share that recommendation, and do so with high clarity. Perhaps put the URL on a projector, and give people 90 seconds to check their calendars and sign up for it. If the follow-up methods are less obvious, think harder: for an open source outreach event, are there conferences nearby that are coming up? Is there a programming user group that welcomes newcomers that might be a good fit? Mention them, because attendees new to the community are likely to have never heard about them.
- Encourage attendees to keep hanging out on IRC, keep doing training missions and to keep working on the bugs from our bug tracker.
- Remind them about programs like GSoC and GNOME outreach.
- Prepare a solid plan for a nice looking version of this page (aka a "frame" to help people navigate the materials)
- Improve the Github demo by adding screenshots and stating more clearly what to say out loud
- Revisit the intro to FLOSS
- Make Intro to Version Control less wall-of-text-y
- Improve Contributions Workshop section once we've got the new issue tracker/compiler