Community Data Science Workshops (Fall 2014)/Day 2 lecture

Lecture Outline

 * Introduction and context
 * You can write some tools in Python now. Congratulations!
 * Today we'll learn how to find/create data sets
 * Next week we'll get into data science (asking and answering questions)


 * Outline:
 * What did we learn in Session 1?
 * What is an API?
 * How do we use one to fetch interesting datasets?
 * How do we write programs that use the internet?
 * How can we use the placekitten API to fetch kitten pictures?
 * Introduction to structured data (JSON)
 * How do we use APIs in general?


 * What is a (web) API?
 * API: a structured way for programs to talk to each other (aka an interface for programs)
 * Web APIs: like a website your programs can visit (a web API is to your program what a website is to you)

Basic idea: your program sends a request, the API sends data back
 * How do we use an API to fetch datasets?
 * Where do you direct your request? The site's API endpoint.
 * For example: Wikipedia's web API endpoint is http://en.wikipedia.org/w/api.php
 * How do I write my request? Put together a URL; it will be different for different web APIs.
 * Check the documentation, look for code samples
 * How do you send a request?
 * Python has modules you can use, like requests (they make HTTP requests)
 * What do you get back?
 * Structured data (usually in the JSON format)
 * How do you understand (i.e. parse) the data?
 * There's a module for that!
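
The request-and-response steps above can be sketched in a few lines of Python. This is a minimal sketch rather than the workshop's own code: it uses the standard library's urllib in place of a third-party HTTP module, the helper names build_api_url and fetch_json are made up for illustration, and the query parameters shown (action and format) should be checked against the API's documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Wikipedia's web API endpoint, as given in the outline above.
ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_api_url(endpoint, params):
    """Put together a request URL: the endpoint plus encoded parameters."""
    return endpoint + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Send the request and parse the JSON text that comes back."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return json.loads(text)


# Build the URL without sending anything yet.
url = build_api_url(ENDPOINT, {"action": "query", "format": "json"})
print(url)
```

Calling fetch_json(url) would send the request and return ordinary Python dictionaries and lists; it needs a working network connection, so it is left out of the sketch.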


 * What did we learn in Session 1?
 * Navigating in the terminal and using it to run programs
 * Writing Python:
 * using variables to manipulate data
 * types of data: strings, integers, lists, dictionaries
 * if statements
 * for loops
 * printing
 * importing modules, so you can use code other people have written for you!


 * How do we write Python programs that make web requests?


 * New programming concepts:
 * requests
 * interpolate variables into a string using % and %s
 * open files and write to them
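
Two of these concepts, string interpolation and writing to files, can be tried without any network access. The following is a small sketch; the variable names, the message text, and the file name cdsw_example.txt are all invented for illustration.

```python
import os
import tempfile

# Interpolate variables into a string: each %s is replaced, in order,
# by the values listed after the % operator.
name = "Wikipedia"
count = 3
message = "Fetched %s pages from %s" % (count, name)
print(message)

# Open a file for writing ("w") and write the message to it.
# The file goes in the system's temporary directory (an arbitrary choice).
path = os.path.join(tempfile.gettempdir(), "cdsw_example.txt")
with open(path, "w") as output_file:
    output_file.write(message + "\n")

# Open the same file again and read it back to check what was saved.
with open(path) as input_file:
    print(input_file.read())
```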


 * To use APIs to build a dataset, we will need:
 * all our tools from last session: variables, etc.
 * the ability to open URLs on the web
 * the ability to create custom URLs
 * the ability to save to files
 * the ability to understand (i.e., parse) the JSON data that APIs usually give us


 * How do we use an API to fetch kitten pictures?
 * placekitten.com
 * an API that takes specially crafted URLs and returns appropriately sized pictures of kittens
 * example of placekitten in the browser
 * visit the API documentation
 * kittens of different sizes
 * kittens in greyscale or color
 * show how to use place
 * write a small program that grabs an arbitrary square picture from placekitten by asking for the size on standard input
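
A program along those lines might look like the sketch below. It assumes the placekitten URL pattern is /WIDTH/HEIGHT, with a /g/ prefix for greyscale; check this against the site's documentation, and note that the site may not always be reachable. The function names and the output file name are made up for illustration, and the standard library's urllib stands in for a third-party HTTP module.

```python
import urllib.request


def kitten_url(size, grey=False):
    """Build a placekitten URL for a square picture, size pixels per side."""
    if grey:
        # Assumed pattern for greyscale pictures.
        return "http://placekitten.com/g/%s/%s" % (size, size)
    return "http://placekitten.com/%s/%s" % (size, size)


def save_kitten(size, filename):
    """Download the picture and write the raw bytes to a file."""
    with urllib.request.urlopen(kitten_url(size)) as response:
        picture = response.read()
    # "wb" means write in binary mode, since a picture is not text.
    with open(filename, "wb") as output_file:
        output_file.write(picture)


def main():
    # Ask for the size on standard input.
    size = int(input("How many pixels wide should the kitten be? "))
    filename = "kitten-%s.jpg" % size
    save_kitten(size, filename)
    print("Saved %s" % filename)
```

Adding a call to main() at the bottom of the file runs the program: it prompts for a size, downloads the picture, and saves it in the current directory.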


 * Introduction to structured data (JSON)
 * JSON file (JavaScript Object Notation)
 * what is JSON: a format useful for more structured data
 * import json; json.loads
 * looks like Python (except no single-quoted strings)
 * simple lists, dictionaries
 * can reflect more complicated data structures
 * Example file at http://mako.cc/cdsw.json
 * download it and parse it
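
Parsing can be tried on a short string before downloading anything. The JSON text below is a made-up sample for illustration, not the contents of the example file above.

```python
import json

# A made-up JSON document: a dictionary holding a string, a number,
# and a list of dictionaries. JSON strings use double quotes.
text = '{"workshop": "CDSW", "session": 2, "topics": [{"name": "APIs"}, {"name": "JSON"}]}'

# json.loads turns JSON text into Python dictionaries and lists.
data = json.loads(text)

print(type(data))
print(data["workshop"])
for topic in data["topics"]:
    print(topic["name"])

# json.dumps goes the other way, from Python data to JSON text.
print(json.dumps({"kittens": ["small", "large"]}))
```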


 * Using other APIs
 * every API is different, so read the documentation!
 * If the documentation isn't helpful, search online!
 * for popular APIs, there are Python modules that help you make requests and parse JSON!
 * rate limiting
 * authentication
 * text encoding issues
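
Two of these issues can be illustrated briefly. The sketch below shows one simple way to respect a rate limit, by pausing between requests, and how bytes from the web become text once an encoding is chosen. The function name fetch_politely and the one-second default delay are assumptions made for illustration; the limit that actually applies depends on each API's documentation.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between requests."""
    results = []
    for index, url in enumerate(urls):
        # Wait before every request except the first one.
        if index > 0:
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Text encoding: data arrives from the web as bytes, and decoding
# with the right encoding (often UTF-8) turns it back into text.
raw_bytes = "café".encode("utf-8")
decoded_text = raw_bytes.decode("utf-8")
print(decoded_text)
```

Here fetch is any function that takes a URL and returns a result, so the same loop works with whichever HTTP module the program uses.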