
Chapter 3: The Rise of Computer-Assisted Reporting by Brant Houston

The third chapter in Partners for Preservation is ‘The Rise of Computer-Assisted Reporting: Challenges and Successes’ by Brant Houston. A chapter on this topic has been at the top of my list of chapter ideas from the very start of this project. Back in February of 2007, Professor Ira Chinoy from the University of Maryland, College Park’s Journalism Department spoke to my graduate school Archival Access class. His presentation and the related class discussion led to my blog post Understanding Born-Digital Records: Journalists And Archivists With Parallel Challenges. Elements of this blog post even inspired a portion of the book’s introduction.

The photo above is from the 1967 Detroit race riots. 50 years ago, the first article recognized to have used computer-assisted reporting was awarded the 1968 Pulitzer Prize for Local General or Spot News Reporting “For its coverage of the Detroit riots of 1967, recognizing both the brilliance of its detailed spot news staff work and its swift and accurate investigation into the underlying causes of the tragedy.” In his chapter, Brant starts here and takes us through the evolution of computer-assisted reporting from 1968 to the present day, looking forward to the future.

As the third chapter in Part 1: Memory, Privacy, and Transparency, it continues to weave these three topics together. Balancing privacy and the goal of creating documentation to preserve memories of all that is going on around us is not easy. Transparency and a strong commitment to ethical choices underpin the work of both journalists and archivists.

This is one of my favorite passages:

“As computer-assisted reporting has become more widespread and routine, it has given rise to discussion and debate over the issues regarding the ethical responsibilities of journalists. There have been criticisms over the publishing of data that was seen as intrusive and violating the privacy of individuals.”

I learned so much in this chapter about the long road journalists had to travel as they sought to use computers to support their reporting. It never occurred to me, as someone who has always had access to the computing power I needed through school or work, that getting the tools journalists needed to do their computational analysis often required negotiating for time on newspaper mainframes or seeking partners outside of the newsroom. It took tenacity and the advent of personal computers to make computer-assisted reporting feasible for the broader community of journalists around the world.

Journalists have sought the help of archivists on projects for many years – seeking archival records as part of the research for their reporting. Now journalists are also taking steps to preserve their field’s born-digital content. Given the high percentage of news articles that exist exclusively online – projects like the Journalism Digital News Archive are crucial to the survival of these articles. I look forward to all the ways that our fields can learn from each other and work together to tackle the challenges of digital preservation.

Bio

Brant Houston

Brant Houston is the Knight Chair in Investigative Reporting at the University of Illinois at Urbana-Champaign where he works on projects and research involving the use of data analysis in journalism. He is co-founder of the Global Investigative Journalism Network and the Institute for Nonprofit News. He is the author of Computer-Assisted Reporting: A Practical Guide and co-author of The Investigative Reporter’s Handbook. He is a contributor to books on freedom of information acts and open government. Before joining the University of Illinois, he was executive director of Investigative Reporters and Editors at the University of Missouri after being an award-winning investigative journalist for 17 years.

 

Chapter 2: Curbing the Online Assimilation of Personal Information by Paulan Korenhof

The second chapter in Partners for Preservation is ‘Curbing the Online Assimilation of Personal Information’ by Paulan Korenhof. Given the amount of attention being focused on the right to be forgotten and the EU General Data Protection Regulation (GDPR), I felt it was essential to include a chapter that addressed these topics. Walking the fine line between providing access to archival records and respecting the privacy of those whose personal information is included in the records has long been an archival challenge.

In this chapter, Korenhof documents the history of the right to be forgotten and the benefits and challenges of GDPR as it is currently being implemented. She also explores the impact of the broad and virtually instantaneous access to content online that the Internet has facilitated.

This quote from the chapter highlights a major issue with making so much content available online, especially content that is being digitized or surfaced from previously offline data sources:

“With global accessibility and the convergence of different contextual knowledge realms, the separating power of space is nullified and the contextual demarcations that we are used to expecting in our informational interactions are missing.”

As the second chapter in Part 1: Memory, Privacy, and Transparency, it continues to pull these ideas together. In addition to providing a solid grounding in the right to be forgotten and GDPR, it should guide the reader to explore the unintended consequences of the mad rush to put everything online and the dramatic impact that search engines (and their human coded algorithms) have on what is seen.

I hope this chapter triggers more contemplation of these issues by archivists within the big picture of the Internet. Often we are so focused on improving access to content online that these questions about the broader impact are not considered.

Bio

Paulan Korenhof

Paulan Korenhof is in the final stages of her PhD research at the Tilburg Institute for Law, Technology, and Society (TILT). Her research is focused on the manner in which the Web affects the relation between users and personal information, and the question of to what degree the Right to Be Forgotten is a fit solution to address these issues. With a background in philosophy, law, and art, she investigates this relation from an applied phenomenological and critical theory perspective. Occasionally she co-operates in projects with Hacklabs and gives privacy awareness workshops to diverse audiences. Recently she started working at the Amsterdam University of Applied Sciences (HvA) as a researcher on Legal Technology.

 

Image credit: Flickr Commons: British Library: Image taken from page 5 of ‘Forget-Me-Nots. [In verse.]’: https://www.flickr.com/photos/britishlibrary/11301997276/

Chapter 1: Inheritance of Digital Media by Dr. Edina Harbinja

You're Dead, Your Data Isn't: What Happens Now?
The first chapter in Partners for Preservation is ‘Inheritance of Digital Media’, written by Dr. Edina Harbinja. This topic was one of the first I was sure I wanted to include in the book. Back in 2011, I attended an SXSW session titled Digital Death. The discussion was wide-ranging and attracted people of many backgrounds including lawyers, librarians, archivists, and social media professionals. I still love the illustration above, created live during the session.

The topic of personal digital archiving has since gained traction, inspiring events and the creation of resources. There are now multiple books addressing the subject. The Library of Congress created a kit to help people host personal digital archiving events. In April 2018 a Personal Digital Archiving Conference (PDA) was held in Houston, TX. You can watch the presentations from PDA2017, hosted by Stanford University Libraries. PDA2016 was held at the University of Michigan Library and PDA2015 was hosted by NYU. In fact, the Internet Archive has an entire collection of videos and presentation materials from various PDAs dating back to 2010.

I wanted the chapter on digital inheritance to address topics at the forefront of current thinking. Dr. Edina Harbinja delivered exactly what I was looking for and more. As the first chapter in Part 1: Memory, Privacy, and Transparency, it sets the stage for many of the common threads I saw in this section of the book.

Here is one of my favorite sentences from the chapter:

“Many digital assets include a large amount of personal data (e.g. e-mails, social media content) and their legal treatment cannot be looked at holistically if one does not consider privacy laws and their lack of application post-mortem.”

This quote gets at the heart of the chapter and provides a great example of the intertwining elements of memory and privacy. What do you think will happen to all of your “digital stuff”? Do you have an expectation that your privacy will be respected? Do you assume that your loved ones will have access to your digital records? To what degree are laws and policies keeping up (or not keeping up) with these questions? As an archivist, how might all this impact your ability to access, extract, and preserve digital records?

Look to chapter one of Partners for Preservation to explore these ideas.

Bio

Dr. Edina Harbinja

Dr. Edina Harbinja is a senior lecturer in media/privacy law at Aston University, Birmingham, UK. Her principal areas of research and teaching are related to the legal issues surrounding the Internet and emerging technologies. In her research, Edina explores the application of property, contract law, intellectual property, and privacy online. Edina is a pioneer and a recognized expert in post-mortem privacy, i.e. privacy of the deceased individuals. Her research has a policy and multidisciplinary focus and aims to explore different options of regulation of online behaviors and phenomena. She has been a visiting scholar and an invited speaker to universities and conferences in the USA, Latin America, and Europe, and has undertaken consultancy for the Fundamental Rights Agency. Her research has been cited by legislators, courts, and policymakers in the US, Australia, and Europe as well. Find her on Twitter at @EdinaRl.

Overview of Partners for Preservation

This friendly llama (spotted in the Flickr Commons) is here to give you a quick high-level tour of Partners for Preservation.

The book’s ten chapters have been organized into three sections:

Part 1: Memory, Privacy, and Transparency

Part 2: The Physical World: Objects, Art, and Architecture

Part 3: Data and Programming

As I recruited authors to write a chapter, the vision for each individual chapter evolved. Each author contributed their own spin on the topic I originally proposed. There were two things I had hoped for and was particularly pleased to see come to pass. First, I learned new things about each of the fields addressed in the book. Second, I discovered threads that wove through multiple chapters. While the chapters are each freestanding and you may read the book’s chapters in any order you like, the section groupings were designed to help highlight common threads of interest to archivists focused on digital preservation.

The book also includes a foreword by Nancy McGovern, and my own introductory and final thoughts.

I will be writing a blog post about each chapter’s author(s) and sharing some favorite tidbits along the way. Thanks for your interest in Partners for Preservation. [Updated 1/29/2018 to add links above to the chapter spotlight posts]

Countdown to Partners for Preservation

Yes. I know. My last blog post was way back in May of 2014. I suspect some of you have assumed this blog was defunct.

When I first launched Spellbound Blog as a graduate student in July of 2006, I needed an outlet and a way to connect to like-minded people pondering the intersection of archives and technology. Since July 2011, I have been doing archival work full time. I work with amazing archivists. I think about archival puzzles all day long. Unsurprisingly, this reduced my drive to also research and write about archival topics in the evenings and on weekends.

Looking at the dates, I also see that after I took an amazing short story writing class, taught by Mary Robinette Kowal in May of 2013, I only wrote one more blog post before setting Spellbound Blog aside for a while in favor of fiction and other creative side-projects in my time outside of work.

Since mid-2014, I have been busy with many things – including (but certainly not limited to):

I’m back to tell you all about the book.

In mid-April of 2016, I received an email from a commissioning editor in the employ of UK-based Facet Publishing (initially described to me as the publishing arm of CILIP, the UK’s equivalent to ALA). That email was the beginning of a great adventure, which will soon culminate in the publication of Partners for Preservation by Facet (and its distribution in the US by ALA). The book, edited by me and including an introduction by Nancy McGovern, features ten chapters by representatives of non-archives professions. Each chapter discusses challenges with and victories over digital problems that share common threads with issues facing those working to preserve digital records.

Over the next few weeks, I will introduce you to each of the book’s contributing authors and highlight a few of my favorite tidbits from the book. This process was very different from writing blog posts and being able to share them immediately. After working for so long in isolation, it is exciting to finally be able to share the results with everyone.

PS: I also suspect that finally posting again may throw open the floodgates to some longer essays on topics that I’ve been thinking about over the past years.

PPS: If you are interested in following my more creative pursuits, I also have a separate mailing list for that.

The CODATA Mission: Preserving Scientific Data for the Future

This session was part of The Memory of the World in the Digital Age: Digitization and Preservation conference and aimed to describe the initiatives of the Data at Risk Task Group (DARTG), part of the Committee on Data for Science and Technology (CODATA), a body of the International Council for Science.

The goal is to preserve scientific data that is in danger of loss because it is not in modern electronic formats or has a particularly short shelf life. DARTG is seeking out sources of such data worldwide, knowing that many are irreplaceable for research into the long-term trends that occur in the natural world.

Organizing Data Rescue

The first speaker was Elizabeth Griffin from Canada’s Dominion Astrophysical Observatory. She spoke of two forms of knowledge that we are concerned with here: the memory of the world and the forgettery of the world. (PDF of session slides)

The “memory of the world” is vast and extends back for aeons of time, but only the digital, or recently digitized, data can be recalled readily and made immediately accessible for research in the digital formats that research needs. The “forgettery of the world” is the analog records, ones that have been set aside for whatever reason, or put away for a long time and almost forgotten. It is the analog data which are considered to be “at risk” and which are the task group’s immediate concern.

Many pre-digital records have never made it into a digital form. Even some of the early digital data are insufficiently described, or the format is out of date and unreadable, or the records cannot easily be located.

How can such “data at risk” be recovered and made usable? The design of an efficient rescue package needs to be based upon the big picture, so a website has been set up to create an inventory where anyone can report data-at-risk. The Data-at-Risk Inventory (built on Omeka) is front-ended by a simple form that asks for specific but fairly obvious information about the datasets, such as field (context), type, amount or volume, age, condition, and ownership. After a few years DARTG should have a better idea as to the actual amounts and distribution of different types of historic analog data.
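As a purely hypothetical sketch (the field names below are my own shorthand for the attributes the form reportedly asks for, not the Inventory’s actual Omeka schema), a single report to the inventory could be modeled like this:

```python
from dataclasses import dataclass, asdict

@dataclass
class DataAtRiskReport:
    """One dataset reported to the inventory.

    Hypothetical fields, modeled on the attributes the form
    reportedly collects; not the actual Omeka schema.
    """
    field: str        # scientific context, e.g. "hydrology"
    data_type: str    # e.g. "stream-flow measurements on paper"
    volume: str       # amount, e.g. "stacks of papers"
    age_years: int    # approximate age of the records
    condition: str    # physical state, if known
    ownership: str    # who holds the records

# Example report, loosely based on the Jonkershoek stream-flow
# records mentioned later in this post
report = DataAtRiskReport(
    field="hydrology",
    data_type="stream-flow measurements on paper",
    volume="stacks of papers",
    age_years=73,
    condition="unknown",
    ownership="Jonkershoek research station",
)
print(asdict(report)["age_years"])  # → 73
```

Even this simple structure shows why aggregating reports works: once the attributes are captured consistently, the amounts and distribution of historic analog data can be tallied across fields.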

Help and support are needed to advertise the Inventory.  A proposal is being made to link data-rescue teams from many scientific fields into an international federation, which would be launched at a major international workshop.  This would give a permanent and visible platform to the rescue of valuable and irreplaceable data.

The overarching goal is to build a research knowledge base that offers a complementary combination of past, present and future records. There will be many benefits, often cross-disciplinary, sometimes unexpected, and perhaps surprising. Some will have economic pay-offs, as in the case of some uncovered pre-digital records concerning the mountain streams that feed the reservoirs of Cape Town, South Africa. The mountain slopes had been deforested a number of years ago and replanted with “economically more appealing” species of tree. In their basement hydrologists found stacks of papers containing 73 years of stream-flow measurements. They digitized all the measurements, analyzed the statistics, and discovered that the new but non-native trees used more water. The finding clearly held significant importance for the management of Cape Town’s reservoirs. For further information about the stream-flow project see Jonkershoek – preserving 73 years of catchment monitoring data by Victoria Goodall & Nicky Allsopp.

DARTG is building a bibliography of research papers which, like the Jonkershoek one, describe projects that have depended partly or completely on the ability to access data that were not born-digital.  Any assistance in extending that bibliography would be greatly appreciated.

Several members of DARTG are themselves engaged in scientific pursuits that seek long-term data.  The following talks describe three such projects.

Data Rescue to Increase Length of the Record

The second speaker, Patrick Caldwell from the US National Oceanographic Data Center (NODC), spoke on rescue of tide gauge data. (PDF of full paper)

He started with an overview of water level measurement, explaining how an analog trace (a line on a paper record generated by a float with a timer) is generated. Tide gauges include a geodetic survey benchmark to make sure that the land isn’t moving. The University of Hawaii maintains a network of gauges internationally. Back in the 1800s, they were keeping track of the tides and sea level for shipping. You never know what the application may turn into – they collected for tides, but in the 1980s they started to see patterns. They used tide gauge measurements to discover El Niño!

As you increase the length of the record, the trustworthiness of the data improves. Within sea level variations, there are some changes that are on the level of decades. To take that shift out, they need 60 years to track sea level trends. They are working to extend the length of the record.

The UNESCO Joint Technical Commission for Oceanography & Marine Meteorology has a Global Sea Level Observing System (GLOSS).

GLOSS has a series of Data Centers:

  • Permanent Service for Mean Sea Level (monthly)
  • Joint archive for sea level (hourly)
  • British Oceanographic Data Centre (high frequency)

The biggest holding starts in the 1940s. They want to increase the number of longer records. A student in France documented where he found records as he hunted for the data he needed. Oregon students documented records available at NARA.

Global Oceanographic Data Archaeology and Rescue (GODAR) and the World Ocean Database Project

The Historic Data Rescue Questionnaire created in November 2011 resulted in 18 replies from 14 countries documenting tide gauge sites with non-digital data that could be rescued. They are particularly interested in the records that are 60 years or more in length.

Future Plans: Move away from identifying what is out there to tackling the rescue aspect. This needs funding. They will continue to search repositories for data-at-risk and continue collaboration with GLOSS/DARTG to freshen on-line inventory. Collaborate with other programs (Atmospheric Circulation Reconstructions over the Earth (ACRE) meeting 11-2012). Eventually move to Phase II = recovery!

The third speaker, Stephen Del Greco from the US NOAA National Climatic Data Center (NCDC), spoke about environmental data through time and extending the climate record. (PDF of full paper) The NCDC is a weather archive with headquarters in Asheville, NC. It fulfills much of the nation’s climate data requirements. Their data comes from many different sources. They provide safe storage of over 5,600 terabytes of climate data (equivalent to 6.5 billion Kindle books). How will they handle the upcoming explosion of data on the way? They need to both handle new content coming in AND provide increased access to larger amounts of data being downloaded over time. In 2011 the data downloaded came to 1,250 terabytes for the year. They expect that download number to increase 10-fold over the next few years.

The Climate Database Modernization Program ran for more than a decade rescuing data. It was well funded, and millions of records were rescued with a budget of roughly $20 million a year. The goal is to preserve and make major climate and environmental data available via the World Wide Web. Over 14 terabytes of climate data are now digitized. 54 million weather and environmental images are online. Hundreds of millions of records are digitized and now online. The biggest challenge was getting the surface observation data digitized. NCDC digital data for hourly surface observations generally stretch back to around 1948. Some historical marine observations go back to the spice trade records.

For international efforts they bring their imaging equipment to other countries where records were at risk. 150,000 records imaged under the Climate Database Modernization Program (CDMP).

Now they are moving from public funding to citizen-fueled projects via crowdsourcing such as the Zooniverse Program. Old Weather is a Zooniverse project which uses crowdsourcing to digitize and analyze climate data. For example, the transcription done by volunteers helps scientists model Earth’s climate using wartime ship logs. The site includes methods to validate efforts from citizens. They have had almost 700,000 volunteers.

Long-term Archive Tasks:

  • Rescuing Satellite Data: raw images in lots of different film formats. All this is at risk. Need to get it all optically imaged. Looking at a ‘citizen alliance’ to do this work.
  • Climate Data Records: Global Essential Climate Variables (ECVs) with Heritage Records. Lots of potential records for rescue.
  • Rescued data helps people building proxy data sets: NOAA Paleoclimatology. ‘Paleoclimate proxies’ – things like boreholes, tree rings, lake levels, pollen, ice cores and more. For example – getting temperature and carbon dioxide from ice cores. These can go back 800,000 years!

We have extended the climate record through international collaboration. For example, the Australian Bureau of Meteorology provided daily temperature records for more than 1,500 additional stations. This meant a more than 10-fold increase in previous historical climate daily data holdings from that country.

Born Digital Maps

The final presentation discussed the map as a fundamental source of memory of the world, delivered by D. R. Fraser Taylor and Tracey Lauriault from Carleton University’s Geomatics and Cartographic Research Centre in Canada. The full set of presentation slides is available online on SlideShare. (PDF of full paper)

We are now moving into born-digital maps. For example, the Canadian Geographic Information System (CGIS) was created in the 1960s and was the world’s first GIS. Maps are ubiquitous in the 21st century. All kinds of organizations are creating their own maps and mash-ups. Community-based NGOs, citizen science, academic and private sector groups are all creating maps.

We are losing born-digital maps almost faster than we are creating them. We have lost 90% of born-digital maps. Above all there is an attitude that preservation is not intrinsically important. No-one thought about the need to preserve the map – everyone thought someone else would do it. There was a complete lack of thought related to the preservation of these maps.

The Canada Land Inventory (CLI) was one of the first and largest born-digital map efforts in the world. It mapped 2.6 million square kilometers of Canada. It was lost in the 1980s. No-one took responsibility for archiving. Those who thought about it believed backup equaled archiving. A group of volunteers rescued the data over time – salvaged from boxes of tapes and paper in the mid-1990s. It was caught just in time and took a huge effort. 80% has been saved and is now online. This was rescued because it was high profile. What about the low-profile data sets? Who will rescue them? No-one.

The 1986 BBC Domesday Project was created in celebration of 900 years after William the Conqueror’s original Domesday Book. It was obsolete by the 1990s. A huge amount of social and economic information was collected for this project. In order to rescue it they needed an Acorn computer and needed to be able to read the optical discs. The platform was emulated in 2002–2003. It cost £600,000 to reverse engineer and put online in 2004. New discs were made in 2003 at the UK Archive.

It is easier to get Ptolemy’s maps from the 15th century than it is to get a map that is 10 years old.

The Inuit Siku (sea ice) Atlas, an example of a Cybercartographic atlas, was produced in cooperation with Inuit communities. Arguing that the memory of what is happening in the north lies in the minds of the elders, they are capturing the information and putting it out in multi-media/multi-sensory map form. The process is controlled by the community themselves. They provide the software and hardware. They created a graphic tied to the Inuit terms for different types of sea ice. In some cases they record the audio of an elder talking about a place. The narrative of the route becomes part of the atlas. There is no right or wrong answer. There are many versions and different points of view. All are based on the same set of facts – but they come from different angles. The atlases capture them all.

The Gwich’in Place Name Atlas is building the idea of long-term preservation into the application from the start.

The Cybercartographic Atlas of the Lake Huron Treaty Relationship Process is taking data from surveyors’ diaries from the 1850s.

There are lots of Government of Canada geospatial data preservation initiatives, but in most cases there is a lot of rhetoric and not so much action. There have been many consultations, studies, reports and initiatives since 2002, but the reality is that apart from the Open Government Consultations (TBS), not very much has translated into action. Even in cases where there is legislation, lots of things look good on paper but don’t get implemented.

There are Library and Archives Guidelines working to support digital preservation of geospatial data. The InterPARES 2 (IP2) Geospatial Case Studies tackle a number of GIS examples, including the Cybercartographic Atlas of Antarctica. See the presentation slides online for more specific examples.

In general, preservation as an afterthought rarely results in full recovery of born digital maps. It is very important to look at open source and interoperable open specifications. Proactive archiving is an important interim strategy.

Geospatial data are fundamental sources of our memory of the world. They help us understand our geo-narratives (stories tied to location), counter colonial mappings, are the result of scientific endeavors, represent multiple worldviews and they inform decisions. We need to overcome the challenges to ensure their preservation.

Q&A:

QUESTION: When I look at the work you are doing with recovering Inuit data from people. You recover data and republish it – who will preserve both the raw data and the new digital publication? What does it mean to try and really preserve this moving forward? Are we really preserving and archiving it?

ANSWER: No we are not. We haven’t been able to find an archive in Canada that can ingest our content. We will manage it ourselves as best we can. Our preservation strategy is temporary and holding, not permanent as it should be. We can’t find an archive to take the data. We are hopeful that we are moving towards finding a place to keep and preserve it. There is some hope on the horizon that we may move in the right directions in the Canadian context.

Luciana: I wanted to attest that we have all the data from InterPARES II. It is published in the final. I am jealously guarding my two servers that I maintain with money out of my own pocket.

QUESTION: Is it possible to have another approach to keep data where it is created, rather than a centralized approach?

ANSWER: We are providing servers to our clients in the north. Keeping copies of the database in the community where they are created. Keeping multiple copies in multiple places.

QUESTION: You mention surveys being sent out and few responses coming back. When you know there is data at risk – there may be governments that have records at risk that they are shy to reveal to the public? How do we get around that secrecy?

ANSWER: (IEDRO representative) We offer our help, rather than a request to get their data.

As is the case with all my session summaries, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

Image Credit: NARA Flickr Commons image “The North Jetty near the Mouth of the Columbia River 05/1973”

Updated 2/20/2013 based on presenter feedback.

Election Eve: Fighting for the Right to Vote

In less than six hours, the polls in Maryland will open for the 2012 general election. Here on ‘election eve’ in the United States of America, I wanted to share some records of those who fought to gain the right to vote for all throughout the USA. Some of these you may have seen before – but I did my best to find images, audio, and video that may not have crossed your path. Why do we have these? In most cases it is because an archive kept them.

Of course I couldn’t do this post without including some of the great images out there of suffragists, but I bet you didn’t know that they had Suffrage Straw Rides.

Or perhaps Suffrage Dancers?

Here we see a group from the Suffrage Hike to Albany, NY in 1914.

Fast forward to the 1960s and the tone shifts. In this excerpt from a telegram sent to President Kennedy in 1961, civil rights activist James Farmer reports on an attack on a bus of Freedom Riders:

We also find images like this one of the leaders of the 1963 Civil Rights March on Washington, DC:

In Alabama from 1964 to 1965, a complicated voter registration process was in place to discourage registration of African-American voters. If you click through you can see a sample of one of these multi-page voter registration forms. In a different glimpse of what voter suppression looked like, listen to Theresa Burroughs tell her daughter Toni Love about registering to vote in this StoryCorps recording:

Finally, you can watch Lyndon B. Johnson’s remarks on the signing of the Voting Rights Act on August 6th, 1965.

These records just scratch the surface, but at least they give you a taste of the hard work by so many that has gone into gaining the right to vote for all in the United States. If you are a registered voter in the USA, please honor this hard work by exercising your right to vote at the polls Tuesday!

Harnessing The Power of We: Transcription, Acquisition and Tagging

In honor of Blog Action Day 2012 and its theme of ‘The Power of We’, I would like to highlight a number of successful crowdsourced projects focused on the transcription, acquisition, and tagging of archival materials. Nothing I can think of embodies ‘the power of we’ more clearly than the work being done by many hands from across the Internet.

Transcription

  • Old Weather Records: “Old Weather volunteers explore, mark, and transcribe historic ship’s logs from the 19th and early 20th centuries. We need your help because this task is impossible for computers, due to diverse and idiosyncratic handwriting that only human beings can read and understand effectively. By participating in Old Weather you’ll be helping advance research in multiple fields. Data about past weather and sea-ice conditions are vital for climate scientists, while historians value knowing about the course of a voyage and the events that transpired. Since many of these logs haven’t been examined since they were originally filled in by a mariner long ago you might even discover something surprising.”
  • From The Page: “FromThePage is free software that allows volunteers to transcribe handwritten documents on-line.” A number of different projects are using this software, including the San Diego Museum of Natural History’s project to transcribe the field notes of herpetologist Laurence M. Klauber and Southwestern University’s project to transcribe the Mexican War Diary of Zenas Matthews.
  • National Archives Transcription: as part of the National Archives Citizen Archivist program, individuals have the opportunity to transcribe a variety of records, described on the transcription home page as “letters to a civil war spy, presidential records, suffrage petitions, and fugitive slave case files”.

Acquisition

  • Archive Team: The ArchiveTeam describes itself as “a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage.” One example of the information gathered and shared by the ArchiveTeam is their effort to save content from Friendster. The rescued data is (whenever possible) uploaded to the Internet Archive:

    Springing into action, Archive Team began mirroring Friendster accounts, downloading all relevant data and archiving it, focusing on the first 2-3 years of Friendster’s existence (for historical purposes and study) as well as samples scattered throughout the site’s history – in all, roughly 20 million of the 112 million accounts of Friendster were mirrored before the site rebooted.

Tagging

  • National Archives Tagging: another part of the Citizen Archivist project encourages tagging of a variety of records, including images of the Titanic, architectural drawings of lighthouses, and the Petition Against the Annexation of Hawaii from 1898.
  • Flickr Commons: throughout the Flickr Commons, archives and other cultural heritage institutions encourage tagging of images.

These are just a taste of the crowdsourced efforts currently underway across the internet. Did I miss your favorite? Please add it below!

UNESCO/UBC Vancouver Declaration

In honor of the 2012 Day of Digital Archives, I am posting a link to the UNESCO/UBC Vancouver Declaration. This is the product of the recent Memory of the World in the Digital Age conference, and the organizers are looking for feedback on the declaration by October 19th, 2012 (see the link on the conference page for sending in feedback).

To give you a better sense of the aim of this conference, here are the ‘conference goals’ from the programme:

The safeguard of digital documents is a fundamental issue that touches everyone, yet most people are unaware of the risk of loss or the magnitude of resources needed for long-term protection. This Conference will provide a platform to showcase major initiatives in the area while scaling up awareness of issues in order to find solutions at a global level. Ensuring digital continuity of content requires a range of legal, technological, social, financial, political and other obstacles to be overcome.

The declaration itself is only four pages long and includes recommendations to UNESCO, member states, and industry. If you are concerned with digital preservation and/or digitization, please take a few minutes to read through it and send in your feedback by October 19th.