Recap from the January 17, 2017 ProjectARCC conference call. Future meeting dates can be found here.

  • Introduction to ProjectARCC and welcome to phone call – Eira Tansey
  • Projects of interest to ARCChivists
  • Other upcoming events/ideas for collaboration
    • Several people on the call discussed an interest in putting together a virtual DataRescue event. Contact eira.tansey@uc.edu if you’re interested in future plans concerning this.
    • The 2nd Keeping History Above Water conference will take place at the end of October 2017 in Annapolis. If you are interested in sponsorship, in submitting a speaker proposal, or in being included in an advance registration notice, please send your name, email and organization information to histpres@annapolis.gov

Today’s post is from Rachel Appel, Digital Projects & Services Librarian at Temple University.

On January 13-14, I participated in DataRescue Philly at the University of Pennsylvania, one event in a long series of grassroots DataRefuge events. These events are taking place to capture and archive federal environmental data for long-term access and preservation, and to counter the incoming administration’s efforts to deny climate change and the need for ongoing management of digital data. The event was organized by the Penn Program in Environmental Humanities and the University of Pennsylvania Libraries [1]. I was interested in participating because of my commitment to climate change awareness and action, and to learn more about data archiving for a project I am working on to preserve civic data accessed through OpenDataPhilly.org [2].

The first day acted as an orientation to data management and archiving and included a Teach-In on Data Refuge and Environmental Justice, DataRescue Guide Training, and Roundtable on DataRefuge Value and Vulnerability.

The second day was DataRescue: A Creative Coding and Archive-a-thon. DataRescue Philly focused on archiving NOAA (National Oceanic and Atmospheric Administration) data [3]. There were six DataRefuge Paths [4] for participants to join:

  • Seeders: Enter seeds, or individual site URLs, into the Internet Archive’s End of Term Archive [5].
  • Baggers: Use the BagIt tool to bag the contents of web pages that the Internet Archive is unable to archive [6].
  • Metadata: Work on creating a descriptive metadata standard and entering data for bags.
  • Tool Builders: Create tools to assist the Baggers.
  • Storytelling: Capture the event on social media and develop documentation.
  • Long Trail: Strategize DataRefuge into the future.
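For readers unfamiliar with what the Baggers produce, here is a rough sketch of a minimal bag built with only the Python standard library. It is not the tool used at the event (see the BagIt How-to [6] for that); it is an illustration under assumptions. The function name make_minimal_bag is hypothetical, SHA-256 is an assumed checksum choice, and the layout follows the BagIt convention of a data/ payload directory, a bagit.txt declaration, and a checksum manifest.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(source_dir: str, bag_dir: str) -> Path:
    """Copy source_dir into bag_dir/data and write the two basic BagIt tag files.

    Hypothetical helper for illustration only, not the DataRescue tooling.
    """
    source = Path(source_dir)
    bag = Path(bag_dir)
    payload = bag / "data"

    # copytree creates bag_dir and data/; it raises if data/ already exists.
    shutil.copytree(source, payload)

    # One manifest line per payload file: "<sha256 checksum>  <path relative to bag>".
    manifest_lines = []
    for path in sorted(payload.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rel = path.relative_to(bag).as_posix()
            manifest_lines.append(f"{digest}  {rel}")

    # The bag declaration: BagIt version (0.97 assumed here) and tag file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

Calling make_minimal_bag("downloaded_pages", "noaa_bag") (both directory names are placeholders) would leave a noaa_bag folder whose manifest lets anyone later verify that the copied files have not changed.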

I participated in the Metadata Path. I was one of the Guides for the group, and my main role was to facilitate the group and develop a qualified Dublin Core metadata standard for describing bags, which were then uploaded into an S3 Bucket and linked to from the DataRefuge CKAN Page (datarefuge.org). The hardest part was constructing a workflow with the Baggers and the S3 Bucket uploaders. Fortunately, the University of Michigan had developed a way to automate some preservation metadata into a JSON file [7]. We then had to check those fields against CKAN’s fields and the fields we thought were pertinent to description and discovery. We developed a schema for CKAN and were able to work around the software’s limitations by adding custom fields. As soon as data had been bagged, we uploaded it to S3 and then created a record in CKAN, entering the metadata and linking to the file. This is still a work in progress, and we hope to have a more streamlined workflow for future events to use and build upon. This is a model that can be applied to a number of fields, not just climate change.
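To make the record-building step a little more concrete, here is a rough sketch of how automated preservation metadata and hand-entered descriptive metadata might be combined into one CKAN-style record. Everything specific in it is an assumption rather than our actual schema: the function name build_ckan_record, the descriptive keys, and the sample preservation fields are hypothetical, and the record shape (name, title, notes, resources, and a list of key/value extras for custom fields) reflects my understanding of a CKAN dataset dictionary. It makes no network calls.

```python
import json


def build_ckan_record(preservation_json: str, descriptive: dict, s3_url: str) -> dict:
    """Merge preservation metadata (JSON text) with descriptive fields.

    Hypothetical helper: field names are illustrative, not the DataRefuge schema.
    """
    preservation = json.loads(preservation_json)
    return {
        # Descriptive fields entered by the Metadata team.
        "name": descriptive["name"],
        "title": descriptive["title"],
        "notes": descriptive.get("description", ""),
        # Link to the bagged file sitting in the S3 Bucket.
        "resources": [{"url": s3_url, "name": descriptive["title"]}],
        # Preservation fields carried over as custom key/value fields.
        "extras": [
            {"key": key, "value": str(value)}
            for key, value in sorted(preservation.items())
        ],
    }


if __name__ == "__main__":
    sample = build_ckan_record(
        '{"checksum": "abc123", "size_bytes": 2048}',  # made-up sample values
        {"name": "noaa-sample-bag", "title": "NOAA sample bag"},
        "https://example.org/noaa-sample-bag.zip",  # placeholder URL
    )
    print(json.dumps(sample, indent=2))
```

Keeping the preservation fields in a separate list of custom fields mirrors the workaround described above: the automated values travel with the record without having to fit CKAN’s built-in fields.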

At the end of the Archive-a-thon, we archived nearly 4,000 seeds and over 21GB of bagged data.

To learn more about the project please visit the Data Rescue Philly site at ppehlab.org/datarefuge or the GitHub repo at github.com/datarefugephilly. We are continuously working on updating the documentation.

List of upcoming DataRefuge events:

  • January 27-28, 2017 Ann Arbor: #DataRescueAnnArbor
  • February 4, 2017 New York: #DataRescueNYC
  • February 12, 2017 Boston: #DataRescueBoston

I would encourage everyone to try to attend these events, especially if one is hosted near you. You can bring a multitude of skills, technical and non-technical, and help preserve climate data so we can still access it in the years to come.

Photo Credit: Andrew Bergman. Co-organizers of DataRescue Philly: Margaret Janz, Patricia Kim, Laurie Allen, and Bethany Wiggin.
Photo Credit: Michelle Murphy. Metadata team! Justin Schell (Bagger), Delphine Khanna, Rachel Appel, and Anastasia Chiu.
Photo Credit: Margaret Janz. Workflows.

[1] DataRescue Philly http://www.ppehlab.org/datarefugephilly/

[2] Future-Proofing Civic Data Knight Foundation https://www.newschallenge.org/challenge/how-might-libraries-serve-21st-century-information-needs/winning/future-proofing-civic-data

[3] NOAA http://www.noaa.gov/

[4] DataRefuge Paths http://www.ppehlab.org/datarefugepaths

[5] End of Term Archive http://eotarchive.cdlib.org/

[6] BagIt How-to https://github.com/datarefugephilly/bagit-how-to

[7] Data Package Requirements https://docs.google.com/document/d/17vQ6GOIs8aKUex7JdDzPy0QWPW7n2wZISMvz9A8fxG0/edit

Post by Eira Tansey

It is imperative that archivists and our allies who care about climate change educate ourselves as much as possible about the current landscape of federal records, research data, and open government initiatives. Many concerns have been raised about the continuing availability of federal climate change research data, as well as continued access to government webpages. ProjectARCC applauds the work of all of our colleagues who are working to raise awareness of the vulnerability of climate data, particularly the work of our friends at DataRefuge.

This post is part of an ongoing series to educate our professional community on what to prepare for in terms of climate change, environmental regulation, and recordkeeping during the transition to the next presidential administration. The focus of this post will be on agency open government efforts. ProjectARCC also recommends the agency forecasts put together by the Environmental Data and Governance Initiative.

What has been the progress on open-government initiatives to date within these agencies, and what work is left to be done?

Although the Obama administration will end with a mixed record on transparency, it introduced many important changes intended to foster open government. Since President Obama took office, a number of directives intended to promote Open Government Initiatives have been issued, including the Open Government Directive (M-10-06), the Managing Government Records Directive (M-12-18), Increasing Access to the Results of Federally Funded Scientific Research (OSTP Memo of February 22, 2013), and Making Open and Machine Readable the New Default for Government Information (Executive Order 13642).

All of these have implications for general principles of open government, transparency, and access to data (whether it’s governmental data, or scientific data created outside the government but funded by federal dollars). While a deep exploration of the various open government initiatives is beyond the scope of this blog post, here’s a quick look at highlights of what agencies with environment-related work in their mission are doing. Per the Open Government Directive issued in 2010, agencies are required to maintain a webpage documenting their steps to comply with the various open government requirements. A full list can be found here.

Please note all links below were working as of the afternoon of January 18, 2017. However, over the course of working on this post I noticed some URLs had already changed from early drafts. I have nominated many of these URLs to the End of Term archive, but I would urge you to nominate them as well, and to save local copies of any PDFs or webpages you may want to refer to later. I would advise you to save local copies sooner rather than later, given that the new administration will be taking office in 2 days.

Environmental Protection Agency https://www.epa.gov/open

The last EPA open government plan was issued in September 2016.

Data highlights:
Open data currently offered by the Environmental Protection Agency can be found at https://edg.epa.gov/metadata/catalog/main/home.page. The report goes into great detail about EPA’s approach to developing an information management policy to be compliant with the Open Government directives, as well as plans to develop a data lifecycle plan in FY17.

Records Management and FOIA highlights:
EPA is currently investigating email archiving tools based on role or content, and is also evaluating “open-source or cloud-based” records management systems in anticipation of the 2019 deadline laid out in the Managing Government Records directive (https://www.whitehouse.gov/sites/default/files/omb/memoranda/2012/m-12-18.pdf). EPA FOIA requests can be tracked online, and the most recent Open Government report includes an objective to reduce backlog requests by 10%. The records management page for the EPA can be found here: https://www.epa.gov/records More information on EPA FOIA is here: https://www.epa.gov/foia

Department of the Interior https://www.doi.gov/open

The last DOI open government plan report was issued in June 2014.

Data highlights:
Open data currently offered by the Department of the Interior can be found at https://data.doi.gov/dataset. A major initiative towards transparency within the DOI has been the establishment of the US Extractive Industries Transparency Initiative. According to the DOI, EITI is a “voluntary, global effort designed to strengthen accountability and public trust for the revenues paid and received for a country’s oil, gas and mineral resources. Countries that follow the standard publish a report in which governments and companies publicly disclose royalties, rents, bonuses, taxes and other payments from oil, gas, and mining resources.” (https://www.doi.gov/eiti)
The 2016 EITI Executive Summary Report can be found here.

Records Management and FOIA highlights:
According to the 2014 report, many records retention schedules are in the process of consolidation, and migration work had started on several records systems. The records management page for the Department of the Interior can be found here: https://www.doi.gov/ocio/policy-mgmt-support/information-and-records-management/records More information on DOI FOIA is here: https://www.doi.gov/foia

Department of Energy http://energy.gov/open-government

The last Department of Energy open government plan was issued in September 2016.

Data highlights:
Open data currently offered by the Department of Energy can be found at https://www.data.gov/energy/. There is also significant data available through the U.S. Energy Information Administration (EIA) at http://www.eia.gov/tools/. One of the more interesting tools on the EIA website is the real-time tracker documenting power demands on the US electrical grid (http://www.eia.gov/beta/realtime_grid/#/summary/demand?end=20161213&start=20161113).

Records Management and FOIA highlights:
According to the 2016 report, the Department of Energy has opted to use the Capstone method for agency email. The records management page for the Department of Energy can be found here: http://energy.gov/cio/office-chief-information-officer/services/guidance/records-management More information on Department of Energy FOIA is here: http://energy.gov/management/office-management/operational-management/freedom-information-act

National Aeronautics and Space Administration https://open.nasa.gov/

The last National Aeronautics and Space Administration open government plan was issued in September 2016.

Data highlights:
Open data currently offered by the National Aeronautics and Space Administration (NASA) can be found at https://open.nasa.gov/open-data/ (this portal takes you to open data, open code, APIs, and other resources). All research produced with NASA funding is now required to be deposited in the NASA research repository within a year, and is available at PubSpace: https://www.ncbi.nlm.nih.gov/pmc/funder/nasa/

Records Management and FOIA highlights:
The records management page for NASA can be found here: https://www.nasa.gov/content/nasa-records-management More information on NASA FOIA: https://www.nasa.gov/FOIA/index.html NASA maintains a FOIA library of available documents of interest to the public (as determined by frequent requests): http://www.hq.nasa.gov/office/pao/FOIA/err.htm You may be interested in reading the NASA Transition Binder: https://www.hq.nasa.gov/office/pao/FOIA/Transition_Binder.pdf

Department of Commerce (which oversees the National Oceanic and Atmospheric Administration [NOAA])**

**As far as I could find, there is not a dedicated Open Government office for NOAA, so I reviewed the Open Government initiative documents for the Department of Commerce. This can be found here: http://www.osec.doc.gov/opog/OG/default.htm

The last Department of Commerce open government plan was issued in September 2016. Pages 111-121 concern the activities of the National Oceanic and Atmospheric Administration.

Data highlights:
Open data currently offered by NOAA can be found at https://data.noaa.gov/dataset. Interesting NOAA data highlights include their efforts to assign DOIs to datasets that are in the National Centers for Environmental Information. NOAA is also responsible for maintaining climate.gov.

Records Management and FOIA highlights:
The records management page for NOAA can be found here: http://www.corporateservices.noaa.gov/audit/records_management/ More information on NOAA FOIA: http://www.noaa.gov/foia-freedom-of-information-act NOAA maintains a FOIA reading room, including links to frequently requested records: http://www.noaa.gov/foia-reading-room

Recap from the December 13, 2016 ProjectARCC conference call. Future meeting dates can be found here.

  • Introduction to ProjectARCC and welcome to phone call – Eira Tansey
  • Explanation of recent changes made to ProjectARCC and departure from the original committee structure  – Casey Davis
  • Things to keep an eye on with the next administration – Eira Tansey
    • EPA and Monitoring/Regulatory/Research Agencies
    • Continuing access to federal climate research data (see this story)
    • Open Government/Transparency Initiatives
  • Projects of interest to ARCChivists
    • Ben from Penn State discussed his Fracking Documentation Project. You can access the collection here, and if anyone wants to help archive pipeline-related activism, check this out
    • Michelle from University of Toronto shared updates from the Guerrilla Archiving effort
    • Bethany and Laurie from Penn shared updates from #DataRefuge
  • Areas for collaboration
    • Allana would like to write about archives divesting/investing in climate-friendly activities (e.g. vendors, technology, banking). If you want to work with her, get in touch

A slightly modified version of this post first ran on the Issues and Advocacy Roundtable blog. Post by Eira Tansey, University of Cincinnati.

 

Shortly after the results of the US election, many who rely on federal climate and environmental data became very concerned about the continuing public availability of this data in the new administration. I am among this group myself, as my research partners from Penn State and I use data sets from NOAA to map climate change risks to American archival repositories. In the past few weeks, institutions such as the University of Toronto and the Penn Environmental Humanities Lab began to organize hackathons in order to seed the End of Term Project with climate and environmental webpages, and determine ways to effectively copy large data sets. The issue gained steam over the weekend when climate journalist and meteorologist Eric Holthaus began tweeting about it, and has gained major news coverage with stories in the Washington Post and Vice.

 

As a leader within ProjectARCC (Archivists Responding to Climate Change), I had reached out to individuals at Toronto and Penn to get more information about their projects as soon as I heard about them, including the role of librarians and archivists in their efforts. Representatives from the University of Toronto and Penn joined last night’s monthly ProjectARCC conference call to update us on their efforts.

 

Things are moving very swiftly right now on all of these fronts, so additional posts will be forthcoming as information and efforts are updated.

 

What is already in place?

 

Fellows from the Penn Environmental Humanities Lab began raising the issue of vulnerable environmental data with a hackathon earlier this month. The Penn Environmental Humanities Lab is now quickly organizing on many of the issues associated with downloading and distributing the work of copying the many data sets scientists rely on. You can read their initial vision here, their preliminary take on how not all data sets may be equally vulnerable, and yesterday’s update regarding their taking over the initial crowdsourced spreadsheet that Eric Holthaus started, as well as their collaborative work with the University of Toronto.

 

The University of Toronto is hosting a “guerrilla archiving” event on December 17. This event will focus on EPA page URLs that will be seeded for the End of Term project.

 

What is next?

 

The folks at Penn and Toronto have received a massive outpouring of interest. Which is great! It also means that they need to take some time to organize their efforts, so that they can evaluate the offers of help/storage space/etc most effectively. You can visit Penn’s #DataRefuge website, which just went live yesterday, to learn more about the efforts as they evolve.

 

Beyond the work that is coming out of the Toronto event on December 17, Toronto and Penn are planning to develop a toolkit that other institutions can use to host their own hackathons.

 

The Penn folks are currently setting up contacts with representatives from many organizations, including the Society of American Archivists.

 

How can you help?

 

The Penn #DataRefuge project now has an “I’d like to help” form. You can submit your response here.

 

If you have any .gov pages you would like to nominate for the End of Term web archiving project, you can do that right now using the End of Term Nomination Tool.

 

Why are people so worried about this to begin with?

 

Several departments and agencies within the federal government, including the Environmental Protection Agency, Department of the Interior, Department of Energy, National Aeronautics and Space Administration, and the National Oceanic and Atmospheric Administration (to name but a few), create myriad and massive data sets related to monitoring pollution of air and water, weather patterns, energy usage, and tracking indicators associated with climate change (ocean temperature and acidification, sea level modeling, and global temperature records).

 

The incoming Trump administration is signalling that it will likely be hostile to the established consensus science on climate change, as well as existing pollution regulations. The President-elect has denied the reality of global warming, and has chosen a series of appointees with a legislative or business record of undermining environmental regulation and efforts to reduce greenhouse gas emissions. Many of the recent appointees have extensive ties to the fossil fuel industry, including the nominee for EPA administrator (Scott Pruitt, Oklahoma Attorney General) and the nominee for Secretary of State (Rex Tillerson, CEO of ExxonMobil). Multiple meta-surveys of climate science papers have established that climate change is real, and that it is primarily driven by human activities. The most recent publication on this extensively documented issue, from April 2016, shows that between 90 and 100% of climate scientists themselves are in consensus on the causes of global warming, and 18 of America’s prominent scientific organizations are in agreement on the science showing that climate change is primarily driven by human activities.

 

Researchers are worried not only that funding will be cut from existing federal environmental and climate monitoring and research efforts, but also about continued access to currently public data sets. It remains to be seen whether many of the recent Open Government initiatives that increased public access to federal data will receive the same level of support in the next administration. If data sets are removed from public access, researchers could be required to file FOIA requests to obtain them. During the Bush administration, which had similarly extensive ties to the fossil fuel industry, scientists documented dozens of instances of manipulation of scientific advice, restrictions on federal scientists’ work, and cutbacks on public access to environmental information (the most famous case probably being the proposed closure of EPA libraries). Some Canadians are alarmed by what could happen in the United States, given how the Harper administration also reduced public access to federal environmental data.

 

For now, researchers are in wait-and-see mode, but most are erring on the side of caution, which is why so many have mobilized to copy the currently available data as fast as possible.

 

For questions about the current status of this work, please feel free to contact eira.tansey@uc.edu