Community Data Science Workshops (Fall 2014)/Day 2 lecture
[Image: Highfivekitten.jpeg. Caption: "In which you learn how to use Python and web APIs to meet the likes of her!"]
Latest revision as of 22:05, 15 March 2015
Lecture Slides
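- Slides (PDF), for viewing: http://mako.cc/teaching/2014/cdsw-autumn/lecture2-web_apis.pdf
- Slides (ODP, LibreOffice slides format), for editing and modification: http://mako.cc/teaching/2014/cdsw-autumn/lecture2-web_apis.odp
Resources
- Encoding:
  - Pragmatic Unicode: http://nedbatchelder.com/text/unipain.html
  - Official Python Unicode documentation: https://docs.python.org/2/howto/unicode.html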
Lecture Outline
- Introduction and context
  - You can write some tools in Python now. Congratulations!
  - Today we'll learn how to find/create data sets
  - Next week we'll get into data science (asking and answering questions)
- Outline
  - What did we learn in Session 1?
  - What is an API?
  - How do we use one to fetch interesting datasets?
  - How do we write programs that use the internet?
  - How can we use the placekitten API to fetch kitten pictures?
  - Introduction to structured data (JSON)
  - How do we use APIs in general?
- What is a (web) API?
  - API: a structured way for programs to talk to each other (aka an interface for programs)
  - Web APIs: like a website your programs can visit (you : a website :: your program : a web API)
- How do we use an API to fetch datasets?
  Basic idea: your program sends a request, the API sends data back
  - Where do you direct your request? The site's API endpoint.
    - For example: Wikipedia's web API endpoint is http://en.wikipedia.org/w/api.php
  - How do I write my request? Put together a URL; it will be different for different web APIs.
    - Check the documentation, look for code samples
  - How do you send a request?
    - Python has modules you can use, like requests (they make HTTP requests)
  - What do you get back?
    - Structured data (usually in the JSON format)
  - How do you understand (i.e. parse) the data?
    - There's a module for that!
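The request-and-response cycle above can be sketched in a few lines of Python. This is only an illustration, not the code used in the lecture: the query parameters (action, titles, format) are the usual MediaWiki ones but should be checked against the API documentation, the helper names are made up for this example, and the response text is a hand-written stand-in so that the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# The outline lists this endpoint with http://; https is used here.
ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_request_url(endpoint, params):
    """Combine an API endpoint and a dict of parameters into one URL."""
    return endpoint + "?" + urlencode(params)


def parse_response(text):
    """Turn the JSON text an API sends back into Python lists and dicts."""
    return json.loads(text)


# Hypothetical query: check the API documentation for the real options.
url = build_request_url(
    ENDPOINT, {"action": "query", "titles": "Kitten", "format": "json"}
)

# Sending the request needs a network connection, for example:
#     import requests
#     text = requests.get(url).text
# A hand-written stand-in response is parsed here instead.
sample_text = '{"query": {"pages": {"1": {"title": "Kitten"}}}}'
data = parse_response(sample_text)

print(url)
print(data["query"]["pages"]["1"]["title"])
```

Keeping URL building and parsing in separate functions makes it easy to test each step before any real request is sent.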
- How do we write Python programs that make web requests?
  To use APIs to build a dataset we will need:
  - all our tools from last session: variables, etc.
  - the ability to open URLs on the web
  - the ability to create custom URLs
  - the ability to save to files
  - the ability to understand (i.e., parse) the JSON data that APIs usually give us
- Session 1 review
  - Navigating in the terminal and using it to run programs
  - Writing Python:
    - using variables to manipulate data
    - types of data: strings, integers, lists, dictionaries
    - if statements
    - for loops
    - printing
    - importing modules, so you can use code other people have written for you!
- New programming concepts
  - interpolating variables into a string using % and %()s
  - making web requests with the requests module
  - opening files and writing to them
  - parsing a string (turning the string into a data structure we can manipulate)
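As a small illustration of two of the new concepts, the sketch below interpolates variables into strings with % and %()s and then writes the result to a file. The variable names and the file name are made up for this example, and the file goes into a temporary directory so that nothing in the current directory is overwritten.

```python
import os
import tempfile

width = 200
height = 300

# Positional interpolation: each %s is filled from the tuple, in order.
url = "http://placekitten.com/%s/%s" % (width, height)

# Named interpolation: each %(name)s is looked up in the dictionary.
filename = "kitten_%(w)sx%(h)s.txt" % {"w": width, "h": height}

# Open a file for writing ("w") and write one line of text to it.
path = os.path.join(tempfile.mkdtemp(), filename)
with open(path, "w") as output_file:
    output_file.write(url + "\n")

# Read the file back to confirm what was written.
with open(path) as input_file:
    print(input_file.read())
```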
- How do we use an API to fetch kitten pictures?
  placekitten.com (http://placekitten.com/)
  - An API that takes specially crafted URLs and returns appropriately sized pictures of kittens
  - Exploring placekitten in a browser:
    - visit the API documentation
    - kittens of different sizes
    - kittens in greyscale or color
  - Now we write a small program to grab an arbitrary square from placekitten by asking for the size on standard input: placekitten_raw_input.py (http://mako.cc/teaching/2014/cdsw-autumn/placekitten_raw_input.py)
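A rough sketch of such a program follows. It is not the linked placekitten_raw_input.py: the function names and the output file name are made up, the URL pattern (width/height, with a g/ prefix for greyscale) is assumed from the placekitten documentation, and Python 3's input() stands in for raw_input. Only the URL-building part runs here; the download is shown in comments because it needs the service to be reachable.

```python
from urllib.request import urlopen


def kitten_url(width, height, grey=False):
    """Build a placekitten URL; the g/ prefix asks for greyscale."""
    if grey:
        return "http://placekitten.com/g/%s/%s" % (width, height)
    return "http://placekitten.com/%s/%s" % (width, height)


def save_kitten(size, filename):
    """Download a square kitten picture and save the raw bytes to a file."""
    image_bytes = urlopen(kitten_url(size, size)).read()
    # "wb" because a picture is binary data, not text.
    with open(filename, "wb") as image_file:
        image_file.write(image_bytes)


print(kitten_url(200, 200))
print(kitten_url(200, 300, grey=True))

# To ask for the size interactively (needs a network connection):
#     size = input("How big should the kitten be? ")
#     save_kitten(size, "kitten.jpg")
```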
- Introduction to structured data (JSON, JavaScript Object Notation)
  - What is JSON? A text format that is useful for more structured data
  - In Python: import json; json.loads()
  - looks like Python (except that strings cannot use single quotes)
  - simple lists, dictionaries
  - can reflect more complicated data structures
  - Example file at http://mako.cc/cdsw.json
  - download it and parse it: parse_cdswjson.py (http://mako.cc/teaching/2014/cdsw-autumn/parse_cdswjson.py)
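To make the points above concrete, here is a minimal sketch of parsing JSON with json.loads(). The JSON text is made up for this example (it is not the contents of cdsw.json), and the last part shows that single-quoted strings, although fine in Python, are rejected as JSON.

```python
import json

# A made-up JSON document with a nested list and a nested dictionary.
text = '{"name": "kitten", "sizes": [100, 200, 300], "colors": {"grey": true}}'

# json.loads() turns the string into Python dictionaries and lists.
data = json.loads(text)
print(data["name"])
print(data["sizes"][1])
print(data["colors"]["grey"])

# Single-quoted strings are valid Python but not valid JSON.
try:
    json.loads("{'name': 'kitten'}")
    single_quotes_ok = True
except ValueError:
    single_quotes_ok = False
print(single_quotes_ok)
```

Once parsed, the data is indexed like any other Python dictionary or list, so the tools from Session 1 (for loops, if statements) apply directly.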
- Using other APIs
  - Every API is different, so read the documentation!
  - If the documentation isn't helpful, search online
  - For popular APIs, there are Python modules that help you make requests and parse JSON
  Possible issues:
  - rate limiting
  - authentication
  - text encoding issues
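Two of these issues can be handled with very little code. The sketch below pauses between requests (a simple way to stay under a rate limit) and decodes the returned bytes explicitly as UTF-8. The function names are hypothetical, the fixed delay and the UTF-8 assumption would need to be checked against each API's documentation, and a stand-in fetch function replaces real requests so the example runs offline. Authentication is not covered because it differs too much between APIs.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Be polite: wait before sending the next request.
            time.sleep(delay_seconds)
        raw_bytes = fetch(url)
        # Decode explicitly; this assumes the API sends UTF-8.
        results.append(raw_bytes.decode("utf-8"))
    return results


def fake_fetch(url):
    """Stand-in for a real request so the example runs offline."""
    return ("caf\u00e9 from " + url).encode("utf-8")


pages = fetch_all(
    ["http://example.com/a", "http://example.com/b"],
    fake_fetch,
    delay_seconds=0.01,
)
print(pages)
```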