PyCon-2015-sprint-wiki

imported>Paulproteus
imported>Jazztpt
 
 
Details about sprint: '''This year at PyCon, we plan to have Sage Days during the sprints, so we will have some "sprint sessions" with other PyCon attendees as well as Sage presentation talks nearby.'''
 
More information at [http://wiki.sagemath.org/days67].
 
''morning sessions at UQAM, room PK-3605, Pavillon Président-Kennedy, 201, Président-Kennedy''
 
=== Stack Storm ===
At this workshop? '''No'''
 
List of useful skills: '''Sprinters of all skill levels are welcome. Contributors just need a basic understanding of Python and HTTP. Experience with REST APIs and web apps is a plus, but not required. Likewise, experience tuning Python code for performance is helpful, but is by no means required. In fact, if you'd like to learn more about any of these topics, Falcon is a great place to start.'''
 
We'll be working on completing our [https://github.com/falconry/falcon/milestones/0.3 0.3 milestone].
 
=== Khmer ===
'''No coding experience required'''
 
List of useful skills: '''Writing Documentation, User Testing, Unit Testing, Redis, Celery, RedisPy, C, Django, Git, VirtualEnv, Tox, Sphinx'''
 
Details about sprint tasks:
* Close existing issues on the [https://github.com/yahoo/redislite/issues issue tracker].
* Try to complete enhancement requests for the Pycon2015 milestone on the [https://github.com/yahoo/redislite/issues issue tracker].
* Work through and review the [http://redislite.readthedocs.org/en/latest/ documentation] and fix issues.
 
=== Tryton ===
=== Hey Duwamish! ===
 
At this workshop? '''Yes'''
 
Days Sprinting: '''Tuesday, Wednesday morning'''
 
List of useful skills: '''Git, JavaScript, documentation, design, Django. And of course, Python! Please be familiar with Git and Unix command-line basics. I am happy to provide training on just about anything else for the project, from MVC basics to front-end design and database models.'''
 
=== PyKinect2 ===
 
List of useful skills: '''Participating is only feasible for people who can already at least roughly read C code. Experience in C programming is recommended, but there are also refactoring, code-arrangement, and cleanup tasks that can be done by moving existing code around; that's why being able to read C code is enough for some tasks. In addition to helping you improve your C skills, I provide guidance on and insight into Python's C extension API, and also into Jython: how to use it and how it works.'''
 
=== streamparse ===
 
At this workshop: '''Andrew Montalenti'''
 
List of useful skills: '''Python; interest in Apache Storm / Apache Kafka; stream processing, data analytics.'''
 
Details about sprint tasks (if supplied): '''We'll be writing a Python Topology DSL for Apache Storm. This is a generic way to specify a directed acyclic graph of computation for data pipelines, which can then run Python code remotely on a cluster of machines, thus sidestepping the GIL and allowing true concurrency. The plan is to use some fun Python features to write a good-looking DSL; I suspect metaclasses and descriptors will be involved.'''
 
Resources: '''[https://github.com/Parsely/streamparse streamparse on Github]; [https://youtube.com/watch?feature=youtu.be&v=ja4Qj9-l6WQ&t=1m22s PyCon2015 video presentation on streamparse]; [http://parse.ly/slides/streamparse/notes/ HTML notes on streamparse slides]; [https://github.com/Parsely/streamparse/issues/84 core Github issue we'll hack on]'''
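To make the metaclass-and-descriptor idea concrete, here is a minimal sketch of what such a DSL could look like. It is a hypothetical illustration, not streamparse's actual API: the names <code>Spec</code>, <code>TopologyMeta</code>, <code>Topology</code>, the <code>inputs</code> parameter, and the component names are all invented here. The descriptor's <code>__set_name__</code> hook records each component's attribute name, and the metaclass collects the declared components and checks (with Kahn's algorithm) that the declared edges form a directed acyclic graph.

```python
class Spec:
    """Hypothetical descriptor declaring one spout or bolt in a topology."""

    def __init__(self, component, inputs=()):
        self.component = component      # name of the Python class that does the work (illustrative)
        self.inputs = tuple(inputs)     # upstream Spec objects that feed this one
        self.name = None                # filled in by __set_name__

    def __set_name__(self, owner, name):
        # Runs while the class is being created: remember the attribute name.
        self.name = name

    def __get__(self, instance, owner):
        # Reading the attribute (on the class or an instance) returns the Spec itself.
        return self


class TopologyMeta(type):
    """Hypothetical metaclass that collects Spec attributes and orders them as a DAG."""

    def __new__(mcls, clsname, bases, namespace, **kwargs):
        cls = super().__new__(mcls, clsname, bases, namespace, **kwargs)
        # type.__new__ has already called __set_name__ on every Spec in the body.
        cls.specs = {k: v for k, v in namespace.items() if isinstance(v, Spec)}
        cls.order = mcls._topological_order(cls.specs)
        return cls

    @staticmethod
    def _topological_order(specs):
        """Kahn's algorithm; raises ValueError on unknown inputs or cycles."""
        indegree = {name: len(spec.inputs) for name, spec in specs.items()}
        children = {name: [] for name in specs}
        for name, spec in specs.items():
            for parent in spec.inputs:
                if parent.name not in specs:
                    raise ValueError(f"{name!r} reads from undeclared component {parent.name!r}")
                children[parent.name].append(name)
        ready = [name for name, degree in indegree.items() if degree == 0]
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        # Defensive check: only reachable if inputs were mutated after declaration.
        if len(order) != len(specs):
            raise ValueError("topology is not acyclic")
        return order


class Topology(metaclass=TopologyMeta):
    """Base class for user-defined topologies."""


class WordCount(Topology):
    words = Spec("WordSpout")
    counts = Spec("WordCountBolt", inputs=[words])
    totals = Spec("TotalsBolt", inputs=[counts])


print(WordCount.order)  # ['words', 'counts', 'totals']
```

Because each <code>Spec</code> can only reference components defined earlier in the class body, the declaration order already rules out most cycles; the metaclass check mainly catches references to components from other topologies.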
 
=== Center for Open Science ===
 
At this workshop? '''No'''
 
List of useful skills: '''Python; JavaScript; web development.'''
 
People of all experience levels welcome: '''Yes'''
 
Details about sprint tasks (if supplied): '''The Open Science Framework (OSF) supports the entire research lifecycle: planning, execution, reporting, archiving, and discovery. It is developed by the Center for Open Science ([http://cos.io COS.io]). Many members of the COS team are here, and they can support various projects related to the open science initiative. We covered some of the things we might sprint on in the lightning talks. The video is at https://www.youtube.com/watch?v=yws4n-0-Yj8; watch the two talks starting at 9:45, then the one at 47:30.'''
 
Resources:
 
* Lightning talk materials - https://osf.io/3winr/
* Waterbutler Github Repo - https://github.com/CenterForOpenScience/waterbutler
* Waterbutler Docs - http://waterbutler.readthedocs.org/en/latest/
* Waterbutler How-To Notebook - http://nbviewer.ipython.org/gist/chrisseto/4e8ef20dc6465cdfcdb1
* Modular ODM Repo - https://github.com/CenterForOpenScience/modular-odm
 
SHARE:
* SHARE Notification Service Repo - https://github.com/CenterForOpenScience/share
* Creating a metadata harvester for SHARE - https://osf.io/wur56/wiki/Creating%20a%20Harvester/
* Elasticsearch API - http://osf.io/api/v1/share/search/?raw
* Command line tool for visualizing SHARE data - http://github.com/erinspace/scrapi_stats
* Tool for converting OOXML files to HTML - https://github.com/CenterForOpenScience/pydocx
* Gist for adding text storage instead of using Cassandra - https://gist.github.com/fabianvf/597f57ffe8351156bb98
* '''Any interesting descriptive statistics or visualizations are worth considering, especially a list of top keywords (showing variance), % of DOIs by provider, and % of each field by provider. For any graphs you make, also send Erin the data: whatever table I'd need to recreate the graph (i.e., the y value for each bar on the x axis).'''
* If you would like to work on a visualization, please just leave your name and the visualization you are working on here, so that we can avoid duplicating effort. To start us off, here are a few unclaimed visualizations/statistics:
** Analyzing which keywords appear most often across services
** Analyzing identifiers that appear across services (DOIs, URLs, etc.)
** Analyzing contributors that appear across services
** Top keywords/contributors/titles/identifiers/etc.
** Analyzing the number of providers that include certain fields
** Histograms would be a nice addition to the command line tool
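As a starting point for the statistics listed above, here is a minimal sketch that computes top keywords, the percentage of each provider's records that have a given field (for example a DOI), and a plain-text histogram of the kind the command line tool could print. It assumes harvested records have already been flattened into dicts with hypothetical keys <code>provider</code>, <code>keywords</code>, and <code>doi</code>; these are not SHARE's actual schema, and the sample records are made up for illustration. The returned dicts can double as the data table to send along with any graph.

```python
from collections import Counter


def top_keywords(records, n=10):
    """Return the n most common keywords across all records (case-insensitive)."""
    counts = Counter(
        kw.strip().lower()
        for record in records
        for kw in record.get("keywords", [])
        if kw.strip()
    )
    return counts.most_common(n)


def percent_with_field_by_provider(records, field):
    """Percentage of each provider's records with a non-empty value for `field`."""
    totals = Counter()
    present = Counter()
    for record in records:
        provider = record.get("provider", "unknown")
        totals[provider] += 1
        if record.get(field):
            present[provider] += 1
    return {p: 100.0 * present[p] / totals[p] for p in totals}


def text_histogram(table, width=40):
    """Render a {label: value} table as horizontal '#' bars, largest first."""
    if not table:
        return ""
    peak = max(table.values()) or 1          # avoid dividing by zero
    label_width = max(len(str(label)) for label in table)
    lines = []
    for label, value in sorted(table.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / peak)
        lines.append(f"{str(label):<{label_width}} | {bar} {value:.1f}")
    return "\n".join(lines)


# Toy records, made up for illustration only.
sample_records = [
    {"provider": "arxiv", "doi": "10.1/abc", "keywords": ["Physics", "optics"]},
    {"provider": "arxiv", "doi": "", "keywords": ["physics"]},
    {"provider": "pubmed", "doi": "10.2/xyz", "keywords": ["genetics", "Physics"]},
]

print(top_keywords(sample_records, 1))
print(text_histogram(percent_with_field_by_provider(sample_records, "doi")))
```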
 
New sources:
 
We can search for new sources via OpenDOAR, the Directory of Open Access Repositories (http://opendoar.org/). We need sources that are licensed CC0. OpenDOAR has an API that allows searching by subject, metadata licensing state, existence of an OAI URL, and other criteria.
* API documentation - http://www.opendoar.org/tools/api.html NOTE: the PDF contains more information about search parameters; read that first.
* Small Python script that uses the above API to query for all sources that are in English, have science content, and allow free access to their metadata; the script parses the XML output and returns a JSON-formatted list of dicts containing the repository name, main URL, and OAI URL: https://gist.github.com/stitchinthyme/dfeac2c8579bbd2d2fb0
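The parsing step of such a script can be sketched with the standard library alone. This is not the linked gist; it is a minimal sketch that turns an XML listing into a JSON list of dicts with the repository name, main URL, and OAI URL. The element names <code>repository</code>, <code>rName</code>, <code>rUrl</code>, and <code>rOaiBaseUrl</code> are assumptions and should be checked against the OpenDOAR API documentation; the sample XML is made up.

```python
import json
import xml.etree.ElementTree as ET

# ASSUMPTION: element names are illustrative, not verified against OpenDOAR's output.
REPO_TAG = "repository"
FIELDS = {"name": "rName", "url": "rUrl", "oai_url": "rOaiBaseUrl"}


def repositories_to_json(xml_text, require_oai=False):
    """Return a JSON list of {name, url, oai_url} dicts parsed from xml_text."""
    root = ET.fromstring(xml_text)
    repos = []
    for repo in root.iter(REPO_TAG):
        # Missing elements become empty strings rather than None.
        entry = {key: (repo.findtext(tag) or "").strip() for key, tag in FIELDS.items()}
        if require_oai and not entry["oai_url"]:
            continue  # optionally skip repositories without an OAI endpoint
        repos.append(entry)
    return json.dumps(repos, indent=2)


# Made-up sample input for illustration.
SAMPLE = """<OpenDOAR>
  <repositories>
    <repository rID="1">
      <rName>Example Repository</rName>
      <rUrl>http://repo.example.org/</rUrl>
      <rOaiBaseUrl>http://repo.example.org/cgi/oai2</rOaiBaseUrl>
    </repository>
    <repository rID="2">
      <rName>No OAI Repository</rName>
      <rUrl>http://noai.example.org/</rUrl>
    </repository>
  </repositories>
</OpenDOAR>"""

print(repositories_to_json(SAMPLE, require_oai=True))
```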
 
 
Budapest Open Access Initiative: search for BOAI, or look on this page for sources dedicated to open access to data: http://www.budapestopenaccessinitiative.org/list_signatures
 
* CalTech Library - http://caltechs.library.caltech.edu/cgi/oai2
* Harvard: Digital Access to Scholarship at Harvard - http://dash.harvard.edu/
* Oklahoma State Thesis and Dissertation Archive - http://www.library.okstate.edu/thesis/
* Oklahoma Library: General archive, some non-science content - http://www.library.okstate.edu/digital/
* Aberdeen University Research Archive - http://eprints.aston.ac.uk/cgi/oai2?verb=Identify
* Digital Commons Network - http://network.bepress.com
* Birkbeck Institutional Research Online - general archive, science and non-science content - http://eprints.bbk.ac.uk/cgi/oai2?verb=Identify
* Bournemouth University Research Online - http://eprints.bournemouth.ac.uk/cgi/oai2?verb=Identify (general - science at http://eprints.bournemouth.ac.uk/view/subjects/sci.html)
* Bradford Scholars - general archive, science and non-science content - http://bradscholars.brad.ac.uk/dspace-oai/request?verb=Identify
* Canterbury Research and Theses Environment - http://create.canterbury.ac.uk/cgi/oai2?verb=Identify (general - science at http://create.canterbury.ac.uk/view/subjects/Q.html)
* CEDA (Centre for Environmental Data Archival) - http://cedadocs.badc.rl.ac.uk/cgi/oai2?verb=Identify
* CentAUR (Central Archive at the University of Reading) - general archive, science and non-science content - http://centaur.reading.ac.uk/cgi/oai2?verb=Identify
* CADAIR (Aberystwyth University Repository) - http://cadair.aber.ac.uk/dspace-oai/request?verb=Identify
* William & Mary Virginia Institute of Marine Science - https://digitalarchive.wm.edu/handle/10288/615
* ARRO (Anglia Ruskin Research Online - general archive, includes science) - http://angliaruskin.openrepository.com/arro/oai/request?verb=Identify
* University of California - http://escholarship.org/
* Aston University Research Archive - http://eprints.aston.ac.uk/cgi/oai2?verb=Identify
* Cognitive Sciences ePrint Archive - http://cogprints.org/cgi/oai2?verb=Identify
* Central Lancashire Online Knowledge - http://clok.uclan.ac.uk/cgi/oai2?verb=Identify
* City University Research Online - http://openaccess.city.ac.uk/cgi/oai2?verb=Identify (general - science at http://openaccess.city.ac.uk/view/subjects/Q.html)
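Most of the candidate sources above are listed by their OAI-PMH <code>Identify</code> URL. A minimal standard-library sketch of how a harvester might reuse those base URLs for other OAI-PMH requests (for example <code>ListRecords</code> with the <code>oai_dc</code> metadata format, which OAI-PMH 2.0 repositories are required to support) and read the repository name from an <code>Identify</code> response. The function names are hypothetical and not part of SHARE or scrapi; the code does no network I/O, and fetching, paging with <code>resumptionToken</code>, and error handling are left out. The sample response is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit, urlunsplit

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL; any existing query (e.g. ?verb=Identify) is replaced."""
    scheme, netloc, path, _query, _fragment = urlsplit(base_url)
    query = urlencode({"verb": verb, **params})
    return urlunsplit((scheme, netloc, path, query, ""))


def repository_name(identify_xml):
    """Extract <repositoryName> from an OAI-PMH Identify response."""
    root = ET.fromstring(identify_xml)
    return root.findtext("oai:Identify/oai:repositoryName", namespaces=OAI_NS)


# Made-up Identify response for illustration.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-04-14T12:00:00Z</responseDate>
  <request verb="Identify">http://repo.example.org/cgi/oai2</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repo.example.org/cgi/oai2</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

print(oai_request_url("http://cogprints.org/cgi/oai2?verb=Identify",
                      "ListRecords", metadataPrefix="oai_dc"))
print(repository_name(SAMPLE_IDENTIFY))
```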
 
=== Influence-USA ===
 
At this workshop: '''Bob Lannon'''
 
List of useful skills: '''Python, web scraping, record linkage, computer vision'''
 
Details about sprint tasks (if supplied): '''We're working on scraping campaign finance and lobbying data that are publicly available on state government websites, to bring them all together in one central, public-domain data commons.'''
 
Resources: '''http://influence-usa.github.io'''
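Bringing records from different states together requires record linkage: deciding when two differently spelled names refer to the same entity. The sketch below shows one simple approach (name normalization, blocking on the first character, and string similarity via <code>difflib.SequenceMatcher</code>); it is not necessarily what the project uses. The ignored-token list, the 0.9 threshold, and the example names are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# ASSUMPTION: illustrative list of tokens to ignore when comparing entity names.
IGNORED_TOKENS = {"inc", "llc", "corp", "corporation", "co", "company", "ltd", "and", "the"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop ignored tokens."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)


def link_records(left, right, threshold=0.9):
    """Pair names from two sources whose normalized forms are similar.

    Candidates are blocked on the first character of the normalized name so
    only plausible pairs are compared, then scored with SequenceMatcher.ratio().
    """
    blocks = {}
    for name in right:
        key = normalize(name)
        blocks.setdefault(key[:1], []).append((name, key))
    matches = []
    for name in left:
        key = normalize(name)
        for other, other_key in blocks.get(key[:1], []):
            score = SequenceMatcher(None, key, other_key).ratio()
            if score >= threshold:
                matches.append((name, other, round(score, 3)))
    return matches


# Made-up names standing in for two state datasets.
print(link_records(["Acme Widgets, Inc.", "Smith & Jones LLC"],
                   ["ACME WIDGETS", "Smith and Jones", "Brown Corp"]))
```

Blocking keeps the number of comparisons manageable on large filings; a real pipeline would likely use a smarter blocking key and review borderline scores by hand.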
 
=== No Null Process ===
 
Inspired by Kate Heddleston's talk "How our engineering environments are killing diversity (and how we can fix it)," this repo contains basic starter docs/checklists for company processes. It is designed to be forked and changed to fit your company's needs. In addition to promoting a more diverse workforce, we believe that having processes like these will improve the experience for all new hires.
 
[https://github.com/jazztpt/NoNullProcess]
 
== Info about the intro event ==
 
 
=== 5:30-5:45 Pre-event ===
 
Attendees and mentors trickle in. Mentors update the sprint wiki (see above) as needed, and socialize with attendees.
 
* IRC, Issue trackers, and beginning Git - Shauna, with help from Naomi Ceder
* Unit tests & testing generally - Ned, Liav
* Virtualenv - Eeshan Garg & Dustin J. Mitchell
* Advanced Git - Trey & Sam
* HTTP basics - Asheesh Laroia
* VirtualBox/Vagrant? - Philip J.

Optional additional topics: SSH
 
=== 7:00-7:20/30 Unsticking Yourself ===