Open Source Comes to Campus/Curriculum/Tools/transcript

Hi, I'm Shauna Gordon-McKeon.

Welcome to OpenHatch's presentation about free and open source communication tools and how to use them. I say 'presentation' instead of 'lecture' because, when we give this talk at Open Source Comes to Campus events, it's very interactive. Of course, there's less interaction possible with a screencast. Still, to preserve some of that dynamic, and to give you a chance to get your hands dirty, throughout this presentation I'll suggest that you pause the video and do some activity. Feel free to ignore me! But if you do try the activities, know that you're welcome to contact us to ask for help or feedback. The best way to do so is via IRC - we're at #openhatch on the network Freenode. If you don't know about IRC, or want to contact us in a different way, follow the link at the bottom. All right, let's get started.

[Switch to Free vs Open Source Software slide]

One of the first things you might be wondering is what the term "open source" means, and if it has anything to do with the term "free software".

The short answer is that they mean about the same thing: software that you can see the source of, and that you're free to modify and distribute.

The medium-length answer is as follows: the term "Free Software" is the older of the two. It was coined in 1985 by Richard Stallman, who was one of many people growing increasingly frustrated by changes that were happening in the software community. It used to be that most software was what we'd now call free or open source, but starting in the 70s, companies began hiding away source code and making it proprietary. In response, Stallman articulated what he named the four freedoms. These are: the freedom to run a piece of software for any purpose; the freedom to understand and change it; the freedom to share copies of the software; and the freedom to share changed or modified copies. A piece of software must have all of those properties to be free software. The term "open source" didn't become popular until the late 90s. It was part of an effort to reach out to businesses and convince them that free software could still make them money. OpenHatch uses the term "open source" because it's now the better known of the two, but we like the name free software, too.

The long answer could take all day to give, so we'll stop there. [Switch to logos slide]

There are literally millions of open source projects out there. Of course, many are no longer maintained, and most have only one contributor, but there are thousands of active, multi-person projects for you to get involved with. Here are just a few of them.

Why don't you pause the video for a moment while you try to recognize as many of these as you can? [Pause for a second.] Welcome back. What are these projects? Going clockwise from the top left, they are:

Blender - 3D computer graphics software for simulation, modeling, and animation
Ubuntu - a popular open source operating system based on Linux
Sugar - an educational desktop environment for children, used for One Laptop Per Child laptops
Firefox - an open source web browser
OpenMRS - an open source medical records system for underserved communities
Python - a popular open source programming language
Tremulous - an open source video game that uses the open source game engine ioquake3
R - an open source programming language used for statistics and data manipulation
Raspberry Pi - a low cost, credit card sized computer designed to run an open source operating system
OpenROV - a low cost remote operated mini submarine

I chose these examples because I think they reflect the diversity of projects out there. There's something for every interest! And there's something for every skillset, because there are a lot of different ways you can contribute.

[Click to ways to contribute slide]

Here's a list of some of the most common ways that people contribute to projects. A lot of people assume that writing or editing code is the best, most important, or, heaven help them, *only* way of contributing to a project. But there are many other ways. I'll highlight just a few.

We just looked at a bunch of logos, of varying levels of clarity and appeal. There is a lot of design work you can do for projects, whether that's making a logo, designing a website, or creating an easy and aesthetically pleasing user experience. OpenHatch's lead designer, Karen Rustad, created our penguin mascot which is by far the piece of OpenHatch that gets the most compliments - and deservedly so.

Another really, really important task is testing and documentation. What good is a project if no one knows how to use it? Open source projects depend on feedback from users and from new contributors, so if you're having trouble making a project run, or you can't get the source code installed, or if you're having other problems, please don't just assume it's something you're doing wrong. It's just as likely to be a problem with the project, and letting the maintainers know that the project is buggy, or that its instructions are wrong or even just confusing, gives them the chance to clear things up for the next person. If you really like testing and documentation, you can do systematic testing for projects, or improve and extend their documentation.

Those are just two ways to contribute to projects. There are many more. But we should get to the meat of this talk, which is the communications tools that open source projects use.

There are four main tools that projects use to communicate with their contributors and users. The first is mailing lists. You're likely already familiar with what a mailing list is, but you might not know what it’s for. Many people believe that mailing lists exist only for sending out announcements, but most open source projects use lists as a place for conversation and discussion. If a list really is for announcements only, the project will explicitly say so, and will often use a name like “project-announce”.

Projects with a significant number of community members usually have separate lists for different audiences. The most common split is to have two lists, one for users and one for developers, but larger projects have more... sometimes a lot more.

[Click over to Ubuntu mailing lists, and scroll down in silence.]

I enjoy the fact that their list of discontinued lists is much, much longer than the total number of lists most projects have. Anyway, most projects will only have one or two lists, and many don't have any. When joining a project with multiple mailing lists, check to make sure you've picked the right list, either by reading the descriptions of the lists or by reading through past posts to get a feel for what each one is usually used for.

[Switch to IRC]

The second open source communication tool is IRC. IRC stands for Internet Relay Chat. I used to say that it was like AOL Instant Messenger, but students have informed me that AIM is no longer cool, so I shouldn't use that as an analogy. It actually predates AIM by quite a bit, having been created in 1988. Basically, it's a text-based messaging service. IRC is a distributed system, which means there's no one person, or company, or server responsible for hosting and coordinating everything. Instead, individuals or organizations can run networks off of one or more servers. Other people can then connect to a particular network via a client such as XChat, Quassel, or irssi. They can join channels, which are basically chatrooms, or send direct messages to other users.

Why don't you pause this video and try joining the #openhatch channel on the network Freenode? Make sure to say hi! If you haven't already installed an IRC client, you can follow our instructions to do so here:

[bring up link to IRC laptop setup]

There are a number of neat things that IRC and your IRC client should let you do. For instance, when you type "/me" and then a statement, it appears in this different format. Also, when someone uses your IRC nickname in the channel, your client should alert you in various ways, such as making a noise, or turning the notification icon a different color or, as you can see here, highlighting that remark in the channel. In addition to channels, IRC also allows for direct messages. You can start a direct message with someone by typing "/msg", the user's name, and then the message.
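For readers following along in the transcript, here is a rough sketch of what a client might send to the server when you type those commands. It assumes the usual IRC conventions: messages travel as PRIVMSG lines, and "/me" is carried as a CTCP ACTION wrapped in \x01 characters. The function name to_irc_line and the nickname somenick are made up for illustration, and real clients do more (line endings, length limits, many other commands).

```python
def to_irc_line(user_input: str, current_channel: str) -> str:
    """Translate what a user types into the raw line a client might send.

    A simplified illustration, not any particular client's implementation.
    """
    if user_input.startswith("/me "):
        action = user_input[len("/me "):]
        # "/me" is sent as a CTCP ACTION: the text is wrapped in \x01
        # characters inside an ordinary PRIVMSG to the channel.
        return f"PRIVMSG {current_channel} :\x01ACTION {action}\x01"
    if user_input.startswith("/msg "):
        parts = user_input.split(" ", 2)
        if len(parts) < 3:
            raise ValueError("usage: /msg <nickname> <message>")
        _, nickname, message = parts
        # A direct message is a PRIVMSG addressed to a nickname
        # instead of a channel.
        return f"PRIVMSG {nickname} :{message}"
    # Anything else is plain chat in the current channel.
    return f"PRIVMSG {current_channel} :{user_input}"


print(repr(to_irc_line("/me waves", "#openhatch")))
print(repr(to_irc_line("/msg somenick hello there", "#openhatch")))
```

The only difference between talking in a channel and sending a direct message is the target of the PRIVMSG, which is why both feel so similar in most clients.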

When looking for a channel, it's important to make sure you're on the right network. We can go to #openhatch on another network, for instance, OFTC, but it's pretty lonely in there.

Moving on from IRC, the next communication tool is the issue tracker. Issue trackers are used by projects to organize what needs to be improved. Let's take a look at some projects' issue trackers.

[Go to browser, click through three trackers: OpenStates, GNOME, OpenHatch]

As you can see, they all have the same basic structure. They have a dashboard which lets you search through issues, and then when you click on the issue, you find out more about it. Going back to the dashboards, there are some columns you'll find on virtually every tracker. For instance, the unique ID field, or the brief summary of the issue, or who created the issue, or when the last update to that issue was.

Most dashboards also show a status. This is really useful, because it tells you whether or not you ought to work on it. If the status is "closed", you should leave it alone - that means someone has already addressed the issue, or perhaps the maintainers have decided the issue doesn't need to be addressed. Other statuses to watch out for include "new" or "unconfirmed" - that means no one has yet reproduced the problem, and/or no maintainer has looked at it and decided the issue is worth addressing. With an unconfirmed issue, it's valuable for you to try and reproduce the problem and report your results, but you probably don't want to try fixing it yet. The status you're looking for, for the most part, is "Accepted". This means that an issue has been accepted as needing fixing, but it hasn't been addressed yet. You can also check out issues labeled "In Progress" and see if you can help, or perhaps said progress has stalled and you can pick it up.
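The status advice above can be summed up as a small lookup table. This is only a sketch: the status names are the ones mentioned in this talk, real trackers use their own vocabularies, and the names SUGGESTIONS and suggest_action are invented for the example.

```python
# Suggested next step for a newcomer, keyed by issue status.
# Status names vary between trackers; these follow the talk.
SUGGESTIONS = {
    "closed": "leave it alone",
    "new": "try to reproduce it and report your results",
    "unconfirmed": "try to reproduce it and report your results",
    "accepted": "a good candidate to work on",
    "in progress": "ask whether you can help, or pick it up if it has stalled",
}


def suggest_action(status: str) -> str:
    """Return a suggestion for an issue with the given status."""
    normalized = status.strip().lower()
    # Unknown statuses are common, since projects customize their trackers.
    return SUGGESTIONS.get(normalized, "check the tracker's documentation")


for status in ["Closed", "Unconfirmed", "Accepted", "In Progress", "Triaged"]:
    print(f"{status}: {suggest_action(status)}")
```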

Many projects will also have some custom fields in their issue trackers. For instance, OpenStates is an effort to make civic data from US States such as legislator contact info, voting records, and committee meetings available. They do this by scraping information from state government websites. You can see they have a column labeled states, so people can indicate whether their issue is with a particular state. Other projects may have different customizations. You can often find descriptions of how a particular issue tracker is configured, for instance by going to links labeled 'Documentation', 'Instructions', or simply 'Help'.

[Use GNOME Bugzilla help page as an example.]

So that's how you use an issue tracker. But how do you read an issue? Let's take a look at some real issues that were reported in open source projects' issue trackers. In a moment, I'm going to link you to an activity we give as a handout in our in-person events. The instructions for the activity assume you have a partner. If you happen to be watching this video with someone, great! You can be each other's partners. If you're watching it alone, feel free to join us in the #openhatch IRC channel and find a partner there. Or you can just ignore that part.

The activity description links to two issue tracker threads, ‘No December’ and ‘Can’t Print… Sometimes’. For one or both, read through the thread and see if you can answer the associated questions about what the problem is and how it got fixed. You should pause the video now, and come back to hear my explanations.

[Show link]

Let’s look at the ‘No December’ bug first.

“What was the problem that the person was experiencing?”

The person who filed the bug experienced something strange -- when they went to edit a contact to add a birthday, the month of December was not listed among the choices. Subsequent contributors chimed in with whether or not they were having the same problem. This information is very important - and it allows us to answer the next question:

“When did the wrong behavior get introduced?”

You can see that the original poster had a problem on his Galaxy Nexus running Android 4.2. (A Galaxy Nexus is a model of phone; Android is an operating system for mobile phones.) Another commenter running 4.1.1 does not have the problem. Another half dozen people confirm the bug, and that they’re using Android 4.2. Then, in comment #12, a contributor is able to induce the bug by updating to 4.2, thus confirming that the buggy behavior was introduced by changes made in Android 4.2.

In comment #17, someone provides a link to the precise commit, or change, where the behavior was introduced. You can see that eight lines - six of code, two of comments - were added (they’re the ones in green).

“What caused the wrong behavior?”

The core of this bug is that a list of months was being treated sometimes as starting at 1 and other times starting at 0. This is a classic programming mistake, made by programmers of all types and all ages, known as an off-by-one error. Comment #47 goes into detail about how this happened in this particular case. They say, “This change of behavior and the lack of documentation on what is intended is the real problem I think” - pointing out that this bug might never have been made if the documentation had been clearer.
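To make the off-by-one idea concrete, here is a tiny sketch of how mixing the two counting conventions can drop December. This is not the actual Android code; the function month_choices and its parameter are hypothetical, chosen only to show the pattern.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_choices(max_month: int) -> list[str]:
    """Return the months to show, up to and including max_month.

    Expects max_month as a 1-based month number (12 means December).
    """
    return MONTHS[:max_month]


# A caller that counts months from 0 passes 11, meaning "December".
# The function reads 11 as "November", so December never appears.
buggy = month_choices(11)

# Converting from 0-based to 1-based before the call fixes it.
fixed = month_choices(11 + 1)

print(buggy[-1])  # last month offered by the buggy call
print(fixed[-1])  # last month offered by the fixed call
```

Neither convention is wrong on its own. The bug only appears where the two meet, which is why clear documentation about which one is intended matters so much.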

“When did the behavior get fixed, and who fixed it?”

A commit was added on November 19th, reverting the change that had caused the problems and adding some extra documentation. This commit was made by Svetoslav Ganov, the same person who introduced the problem. Interestingly, although the bug was reported on the 14th, the solution found by the 17th, and the change made to the code on the 19th, users were not able to access the fixed code until the 27th. You can see them post as they’re waiting… my favorite comment is #93, on November 23rd. “Will this be fixed by December? Because my birthday is in December. I mean, I think it is - I’m not sure now, though.”

The reason for the delay is that they were waiting for Google, which develops Android, to release their next update.

Let’s move on to the ‘Can’t Print… Sometimes’ bug.

“What is the problem the person is experiencing?”

The original poster is having difficulty printing from OpenOffice 2.4. Not only will it not print, there are no error messages - absolutely nothing happens. Subsequent comments sometimes confirm the bug, and sometimes fail to. In fact, you can see that the same commenters are sometimes able to reproduce the bug, and sometimes not. For instance, in comment #6 the original poster writes, “Open Office started printing today!” Then, in comment #10, they say it’s stopped working.

“Are there certain days of the week that people can reliably print?”

The commenters try a number of solutions, such as uninstalling and reinstalling OpenOffice, and upgrading, but none of these seem to work. It’s not until comment #28, eight months after the bug was first reported, that somebody notices a pattern. A user comments that their wife noticed that OpenOffice only failed to print on Tuesdays. If you go back and look at the previous comments with a calendar, you will see that reports of problems always occurred on Tuesdays (or on Wednesdays, likely due to time zone differences). The “hey it’s working again” comments are never on Tuesdays.

“What caused the wrong behavior?”

Comment #28 also explains what’s causing the problem. It’s kind of complex, but here’s an overview:

When programs on a Linux-based operating system print, they create a “PostScript” file that contains the document they want to print. However, different printers support - that is, accept - different kinds of files. Not all of them accept PostScript. So on Linux there is a print system called CUPS which converts documents from whatever format they’re given in to whatever format the printer will accept.

When programs send their PostScript files to CUPS, they don’t explicitly say that it’s a PostScript file. CUPS has to look at the document contents and guess what type of document it is. In order to do this guessing, CUPS uses another program called “file”. This program has a database of patterns. When a given document matches a pattern, file says that it knows what type of document it is.

The problem, it turns out, is not with OpenOffice, or with CUPS, but with file. One of the patterns in its database mistakenly specified that any document that contains “Tue” at a particular position is an “Erlang JAM file”. It happens that OpenOffice.org annotates its PostScript files with the date and time they were printed, triggering that rule. This leaves CUPS thinking that it needs to send an “Erlang JAM file” to the printer, which it does not know how to do. So instead, it did nothing.
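Here is a toy version of that kind of pattern-based guessing, to show how one overly broad rule can misidentify a document. It is not the real file database: the rule format, the offset, the sample header, and the made-up timestamp are all assumptions chosen so the example lines up.

```python
# A made-up PostScript header; the date lands right after it.
HEADER = b"%!PS-Adobe-3.0\n%%CreationDate: ("
DATE_OFFSET = len(HEADER)

# Each rule is (offset, expected bytes, label). Rules are checked in
# order and the first match wins, so a bad early rule hides later ones.
RULES = [
    (DATE_OFFSET, b"Tue", "Erlang JAM file"),  # the overly broad rule
    (0, b"%!PS", "PostScript document"),
]


def guess_type(data: bytes) -> str:
    """Guess a document's type by comparing bytes at fixed offsets."""
    for offset, expected, label in RULES:
        if data[offset:offset + len(expected)] == expected:
            return label
    return "unknown"


def make_document(day: str) -> bytes:
    """Build a sample document stamped with the given day name."""
    return HEADER + day.encode("ascii") + b" 12:00:00)\n"


for day in ["Mon", "Tue", "Wed"]:
    print(day, "->", guess_type(make_document(day)))
```

Only the document stamped on a Tuesday trips the first rule, which matches the pattern the commenters eventually spotted with a calendar.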

You can read more about this issue and how it was fixed by following the link at the top of the page to the issue reported in the ‘File’ issue tracker. Because the problem turned out to be not in OpenOffice or CUPS, the discussion spans two threads in two different trackers.

“How could the issue have been fixed more quickly?”

It took eight months for the community to figure out what was going wrong. There are a few different things that could have helped the problem get solved more quickly.

Most importantly, the CUPS program could have been written to “fail loudly” when given a document it doesn’t know how to re-format. If users had seen an error message like “Cannot print format Erlang JAM file”, they would have had something more to go on. Having those search terms might have been enough to allow them to find the separate thread in the File issue tracker which explained that files containing “Tue” were being mislabeled. This was a known issue posted to the File issue tracker three weeks before the thread we were reading even got started.
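The difference between failing silently and failing loudly can be sketched in a few lines. This is an illustration of the idea only, not how CUPS is written; the names CONVERTERS, convert_quietly, convert_loudly, and UnsupportedFormatError are invented for the example.

```python
class UnsupportedFormatError(Exception):
    """Raised when no converter exists for a document type."""


# Hypothetical table of converters, keyed by detected document type.
# The PostScript "converter" here just passes the data through.
CONVERTERS = {
    "PostScript document": lambda data: data,
}


def convert_quietly(kind: str, data: bytes):
    """Silent failure: an unknown type produces nothing and no message."""
    converter = CONVERTERS.get(kind)
    if converter is None:
        return None
    return converter(data)


def convert_loudly(kind: str, data: bytes) -> bytes:
    """Loud failure: an unknown type raises an error naming the type."""
    converter = CONVERTERS.get(kind)
    if converter is None:
        raise UnsupportedFormatError(f"Cannot print format {kind}")
    return converter(data)


print(convert_quietly("Erlang JAM file", b"..."))
try:
    convert_loudly("Erlang JAM file", b"...")
except UnsupportedFormatError as error:
    print("error:", error)
```

The loud version costs two extra lines, and the message it produces contains exactly the words a frustrated user would want to paste into a search engine.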

Hindsight is twenty-twenty, as they say, but it’s an important lesson about the value of good error messages.

All right, I hope you enjoyed that activity. Let's take a look at one last issue thread:

[Link to white house issue]

I wonder how long before the joke gets dated and I have to take it out of this presentation.

Anyway, let's move on to the last of our four communications tools: version control.

What is version control and why would you want it? Let's take a moment to imagine that you're working on a project with some friends, say, a research paper. You write some of the paper and send it off to everyone for feedback. One of your friends makes some edits and sends it back to everyone. Another of your friends doesn't see her updated version, and makes changes to the original document. Even worse, he's changed some of the same lines as she has, to something different. They both send the modified documents back to you. How do you make a final product that accurately reflects what everyone wants to say? It's a headache.

That's why version control exists. It allows you to systematically handle changes from multiple people, even if those changes conflict with each other. It also allows you to go back to previous versions of your document, which is nice, especially if you're prone to second guessing yourself.

At the heart of version control is something called a diff. What is a diff? To explain, think for a moment about how inefficient a version control system would be if it stored actual complete copies of a project for each change you made. What if you were writing a novel and you changed a single word? Do you really need to store another copy of the 100,000 words that make up the full novel? Of course not - you only need to store the change. Diffs are records of changes.
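If you'd like to see a diff without leaving Python, the standard library's difflib module can produce one. This is a small sketch using two made-up versions of a two-line "novel"; the file names are placeholders, and version control systems have their own diff machinery.

```python
import difflib

# Two versions of the same text, differing by a single word.
old_version = [
    "It was a dark and stormy night.",
    "The rain fell in torrents.",
]
new_version = [
    "It was a dark and windy night.",
    "The rain fell in torrents.",
]

# unified_diff yields header lines, then each changed line prefixed
# with "-" (removed) or "+" (added), and unchanged context with " ".
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="novel_v1.txt",
        tofile="novel_v2.txt",
        lineterm="",
    )
)

for line in diff_lines:
    print(line)
```

The line that changed shows up twice, once with a minus sign for the old wording and once with a plus sign for the new one, while the untouched line is kept only as context.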

There's actually a function called diff on your terminal. I'm not going to get into the details of it here, because most version control systems don't require you to generate your own diffs, but if you're interested, you can pause the video and check out our diff tutorial.

[Show link to diff training mission.]

Let's go back to your research paper. What if you were writing a research paper with millions of friends and strangers? What if this research paper spanned millions of different topics and sub-topics? Such a research paper does exist. We call it Wikipedia.

[Go to Wikipedia]

With so many people editing so many pages, it's only natural that the beating heart of Wikipedia is a version control system. On any page, you can go to the tab 'View History' and access any version of the page that has ever existed. You can also compare any two versions. When you do, Wikipedia will show you a diff. You can see here how they display it, with the older version on the left and the newer on the right, with removals in beige and additions in blue. You can also see, at the top, what is often called a commit message - a brief summary of why the change was made. Having commit messages is very useful when you're scanning a long list of changes. As you can see, many of Wikipedia’s editors don’t know about commit messages, which makes scanning the many changes that have been made to their pages extra difficult.

At our events, we like to find a fun change that was made in the host school's Wikipedia page. This was my personal favorite.

[Show Purdue & Indiana.]

We ran back to back events at Bloomington and Purdue. This got a big laugh at both places. I especially like the commit message explaining why the change was deleted.

[Go back to OS Tools slide]

Wikipedia uses one kind of version control system. There are several, some of which you may have heard of: Git, Subversion, Mercurial, DCVS, Bazaar. At our events, we teach people how to use Git, because it's the most common version control system. Along the way, we also teach people how to use GitHub, which is the most popular hosting service for Git.

Sometimes people get confused about the difference between Git and GitHub. To help, I've drawn this handy chart. I drew it in the free software project GIMP, by the way. Any aesthetic failings are my responsibility, not GIMP's. Design is not one of my talents.

At the top level, you can see some of the different version control systems. Below, you can see different hosting services. These are all websites with servers that will store copies of projects. Some of them, like GitHub, only host projects using one kind of version control system - Git. Others, like Google Code and SourceForge, host multiple version control systems. SourceForge hosts projects that use Git, Subversion, CVS, and two other version control systems. Within GitHub, there are multiple users. For instance, there's OpenHatch, which is actually a special kind of user called an organization. And users - organizations or individuals - can own repositories. Users can have multiple repositories, and repositories can have multiple contributors, although each repository is "owned" by a single user.

Before I leave you, I want to stress that repositories can hold many things besides code. For instance, this project is an effort by German citizens to record all the laws passed by the German parliament, which is called the Bundestag, hence the name of the user. When you search through GitHub, you can find all sorts of neat things, including

[search for poetry] - poetry

and, sadly, an order of magnitude more popular

[resumes] - resumes

I have a project myself

[switch to wcweekly]

that does not have a single line of code, but instead a bunch of scans of issues of a radical 1870s feminist newspaper, and text docs which are transcriptions of them.

So those are the four major communications tools widely used by open source projects: mailing lists, IRC, issue trackers, and version control. Most projects will use at least one of these, and many will use all four. When you're approaching a new project, you might find you feel the most comfortable saying hello in the IRC channel first. Maybe you'd prefer to browse through the issue tracker, or download the source repository and poke around. Maybe you want to lurk on a mailing list for a little while. You should do whatever you're the most comfortable with.

[slide with contact info again]

I hope you've enjoyed this presentation. If you think anything could be improved, feel free to give me feedback via the issue tracker, email, or IRC. And, as always, you're welcome to visit us on our IRC channel.

Happy contributing!