

RSS and Mainstream News Outlets

Recently posted on the FP Passport blog, The truth about RSS gives an overview of the results of a recent study of the RSS feeds produced by 19 major news outlets. The complete study (and its results) can be found here: International News and Problems with the News Media’s RSS Feeds.

If you are interested in my part in all this, read the Study Methodology section (which describes my role under the heading ‘How the Research Team Operated’) and the What is RSS? page (which I authored, and which describes both the basics of RSS and some of the other web-based tools we used in the study – YahooPipes and Google Docs).

Why should you care about RSS? RSS feeds are becoming more common on archives websites, and RSS should be treated as just another tool in the outreach toolbox for making sure that your archives maintains or improves its visibility online. To get an idea of how feeds are being used, consider the example of the UK National Archives. They currently publish three RSS feeds:

  • Latest news – Get the latest news and events for The National Archives.
  • New document releases – Highlights of new document releases from The National Archives.
  • Podcasts – Listen to talks, lectures and other events presented by The National Archives.
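For readers wondering what a feed actually looks like under the hood: an RSS feed is just an XML document of channel and item elements, which is what makes it easy for aggregators to consume. Here is a minimal sketch, using only Python’s standard library, of reading a feed of the kind listed above – the feed snippet and its titles and links are invented for illustration, not taken from The National Archives.

```python
import xml.etree.ElementTree as ET

# A tiny invented RSS 2.0 snippet, shaped like the feeds described
# above (titles and links are hypothetical).
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Latest news</title>
    <item>
      <title>New document release</title>
      <link>http://example.org/release</link>
    </item>
    <item>
      <title>Upcoming lecture</title>
      <link>http://example.org/lecture</link>
    </item>
  </channel>
</rss>"""

def feed_items(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in feed_items(FEED):
    print(title, "->", link)
```

A real feed reader would fetch the XML over HTTP and poll it on a schedule, but the parsing step is essentially this simple – which is part of RSS’s appeal.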

The results of the RSS study I link to above shed light on the kinds of choices that are made by content providers who publish feeds – and on the expectations of those who use them. If you don’t know what RSS is – this is a great intro. If you use and love (or hate) RSS already – I would love to know your thoughts on the study’s conclusions.

Epidemiological Research and Archival Records: Source of Records Used for Research Fails to Make the News

Typist wearing mask, New York City, October 16, 1918 (NARA record 165-WW-269B-16)

In early April, Reuters ran an article, picked up by Yahoo News, titled Closing Schools reduced flu deaths in 1918. I was immediately convinced that archival records must have supported this research – even though no mention of that was included in the article. The article did tell me that the research was led by Dr. Richard Hatchett of the National Institute of Allergy and Infectious Diseases (NIAID).

I sent him an email asking about where the data for his research came from. Did the NIH have a set of data from long ago? Here is an excerpt from his kind reply:

Unfortunately, nobody kept track of data like this and you can see the great lengths we went to to track it down. Many of the people we thank in our acknowledgment at the end of the paper tracked down and provided information in local or municipal archives. For Baltimore, I came up and spent an entire day in the library going through old newspapers on microfilm. Some of the information had been gathered by previous historians in works on the epidemic in individual cities (Omaha — an unpublished Master’s thesis — and Newark are examples). Gathering the information was extremely arduous and probably one of the reasons no one had looked at this systematically before. Fortunately, several major newspapers (the NYTimes, Boston Globe, Washington Post, Atlanta Journal-Constitution, etc.) now have online archives going back at least until 1918 that facilitated our search.

Please let me know if you have any other questions. We were amateurs and pulling the information together took a lot longer than we would ever have imagined.

He also sent me a document titled “Supporting Information Methods”. This turned out to be 37 pages of detailed references supporting their research. They were hunting for three types of information: first reported flu cases, amplifying events (such as Liberty Loan Parades) and interventions (such as quarantines, school closings and bans on public gatherings).

Many of the resources cited are newspapers (see The Baltimore Sun’s 1918 flu pandemic timeline for examples of what can be found in newspapers), but I was more intrigued by the wide range of non-newspaper records used to support this research. A few examples:

  • Chicago (First reported case): Robertson JD. Report and handbook of the Department of Health of the City of Chicago for the years 1911 to 1918 inclusive. Chicago, 1919.
  • Cleveland (School closings): The City Record of the Cleveland City Council, October 21, 1918, File No. 47932, citing promulgation of health regulations by Acting Commissioner of Health H.L. Rockwood.
  • New Orleans (Ban on public gatherings): Parish of Orleans and City of New Orleans. Report of the Board of Health, 1919, p. 131.
  • Seattle (Emergency Declaration): Ordinance No. 38799 of the Seattle City Council, signed by Mayor Hanson October 9, 1918.

The journal article referenced in the Reuters story, Public health interventions and epidemic intensity during the 1918 influenza pandemic, was published in the Proceedings of the National Academy of Sciences (PNAS) and is available online.

The good news here is that the acknowledgment that Dr. Hatchett mentions in his email includes this passage:

The analysis presented here would not have been possible without the contributions of a large number of public health and medical professionals, historians, librarians, journalists, and private citizens […followed by a long list of individuals].

The bad news is that the use of archival records is not mentioned in the news story.

We frequently hear about how little money there is at most archives. Cutbacks in funding are the norm. Every few weeks we hear of archives forced to cut their hours, staff or projects. Public understanding of the important ways that archival records are used can only help to reverse this trend.

Maybe we need a bumper sticker to hand out to new researchers. Something catchy and a little pushy – something that says “Tell the world how valuable our records are!” – only shorter.

  • If You Use Archival Records – Go On The Record
  • Put Primary Sources in the Spotlight
  • Archivists for Footnotes: Keep the paper trail alive
  • Archives Remember: Don’t Forget Them

I don’t love any of these – anyone else feeling wittier and willing to share?

(For more images of the 1918 Influenza Epidemic, visit the National Museum of Health and Medicine’s Otis Historical Archives’ Images from the 1918 Influenza Epidemic.)

Understanding Born Digital Records: Journalists and Archivists with Parallel Challenges

My most recent Archival Access class had a great guest speaker from the Journalism department. Professor Ira Chinoy is currently teaching a course on Computer-Assisted Reporting. In the first half of the session, he spoke about ways that archival records can fuel and support reporting. He encouraged the class to brainstorm about what might make archival records newsworthy. How do old records that have been stashed away for so long become news? It took a bit of time, but we got into the swing of it and came up with a decent list. He then went through his own list and gave examples of published news stories that fit each of the scenarios.

In the second half of class he moved on to address issues related to freedom of information and the struggle to gain access to born digital public records. Journalists are usually early in the food chain of those vying for access to, and understanding of, federal, state and local databases. They face many hurdles. They must learn what databases are being kept and figure out which ones are worth pursuing. Professor Chinoy relayed a number of stories about the energy and perseverance required to convince government officials to give access to the data they have collected. The rules vary from state to state (see the Maryland Public Information Act as an example) and journalists often must quote chapter and verse to prove that officials are breaking the law if they do not hand over the information. Some officials deny that the software they use will even permit extraction of the data – or claim that there is no way to edit the records to remove confidential information. Some journalists find themselves hunting down the vendors of proprietary software to find out how to perform the extract they need. They then go back to the officials with that information in the hopes of proving that it can be done. I love this article linked to in Prof. Chinoy’s syllabus: The Top 38 Excuses Government Agencies Give for Not Being Able to Fulfill Your Data Request (And Suggestions on What You Should Say or Do).

After all that work – just getting your hands on the magic file of data is not enough. The data is of no use without the decoder ring of documentation and context.

I spent most of the 1990s designing and building custom databases, many for federal government agencies. An almost inconceivable number of person-hours go into the creation of most of these systems. Stakeholders from all over the organization destined to use the system participate in meetings and design reviews. Huge design documents are created and frequently updated … and adjustments to the logic are often made even after the system goes live (to fix bugs or add enhancements). The systems I am describing are built on complex relational databases with hundreds of tables. It is uncommon for any one person to really understand everything in such a system – even if they are on the IT team for the full development life cycle.

Sometimes you get lucky and the project includes people with amazing technical writing skills, but usually those talented people are aimed at writing documentation for users of the system. Those documents may or may not explain the business processes and context related to the data. They will rarely expose the relationship between a user’s actions on a screen and the data as it is stored in the underlying tables. Some decisions are only documented in the application code itself and that is not likely to be preserved along with the data.
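To make that screen-versus-table gap concrete, here is a minimal sketch (in Python, using an in-memory SQLite database) of a hypothetical two-table schema – the table names, column names and codes are all invented for illustration. What a user sees on screen as a readable status is stored as a bare integer, legible only through a lookup table that a raw data extract may not include.

```python
import sqlite3

# Hypothetical schema: what a user sees as one "case status" field on
# screen is stored as a bare integer code joined to a lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status_codes (code INTEGER PRIMARY KEY, label TEXT);
INSERT INTO status_codes VALUES (1, 'Open'), (2, 'Closed'), (9, 'Referred');
CREATE TABLE cases (case_id INTEGER PRIMARY KEY, status INTEGER);
INSERT INTO cases VALUES (101, 1), (102, 9);
""")

# An extract of the cases table alone is just mysterious integers.
raw = conn.execute(
    "SELECT case_id, status FROM cases ORDER BY case_id").fetchall()

# With the lookup table (and the documentation behind it), the same
# data becomes legible again.
decoded = conn.execute("""
    SELECT c.case_id, s.label
    FROM cases c JOIN status_codes s ON c.status = s.code
    ORDER BY c.case_id
""").fetchall()
```

Real systems multiply this pattern across hundreds of tables, and some of the decoding logic lives only in the application code – which is exactly why a database handed over without its context is so hard to interpret.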

Teams charged with the support of these systems and their users often create their own documents and databases to explain certain confusing aspects of the system and to track bugs and their fixes. A good analogy here would be to the internal files that archivists often maintain about a collection – the notes that are not shared with the researchers but instead help the archivists who work with the collection remember such things as where frequently requested documents are or what restrictions must be applied to certain documents.

So where does that leave those who are playing detective to understand the records in these systems? Trying to figure out what the data in the tables mean based on the understanding of end-users can be a fool’s errand – and that is if you even have access to actual users of the system in the first place. I don’t think there is any easy answer given the realities of how many unique systems of managing data are being used throughout the public sector.

Archivists often find themselves struggling with the same problems. They have to fight to acquire and then understand the records being stored in databases. I suspect they have even less chance of interacting with actual users of the original system that created the records – though I recall discussions in my appraisal class last term about all the benefits of working with the producers of records long before they are earmarked to head to the archives. Unfortunately, it appeared that this was often the exception rather than the rule – even if it is the preferred scenario.

The overly ambitious and optimistic part of me had the idea that what ‘we’ really need is a database that lists common commercial off-the-shelf (COTS) packages used by public agencies – along with information on how to extract and redact data from those packages. For agencies using custom systems, we could include information on which company or contractors did the work – that sort of thing can only help later. Or how about just a list of which agencies use what software? Does something like this exist? The records of what technology is purchased are public record – right? Definitely an interesting idea (for when I have all that spare time I dream about). I wonder, if I set up a wiki for people to populate with this information, whether they would share what they already know.

I would like to imagine a future world in which all this stuff is online and you can login and download any public record you like at any time. You can get a taste of where we are on the path to achieving this dream on the archives side of things by exploring a single series of electronic records published on the US National Archives site. For example, look at the search screen for World War II Army Enlistment Records. It includes links to sample data, record group info and an FAQ. Once you make it to viewing a record – every field includes a link to explain the value. But even this extensive detail would not be enough for someone to just pick up these records and understand them – you still need to understand about World War II and Army enlistment. You still need the context of the events and this is where the FAQ comes in. Look at the information they provide – and then take a moment to imagine what it would take for a journalist to recreate a similar level of detailed information for new database records being created in a public agency today (especially when those records are guarded by officials who are leery about permitting access to the records in the first place).

This isn’t a new problem that has appeared with born digital records. Archivists and journalists have always sought the context of the information with which they are working. The new challenge is in the added obstacles that a cryptic database system can add on top of the already existing challenges of decrypting the meaning of the records.

Archivists and Journalists care about a lot of the same issues related to born digital records. How do we acquire the records people will care about? How do we understand what they mean in the context of why and how they were created? How do we enable access to the information? Where do we get the resources, time and information to support important work like this?

It is interesting for me to find a new angle from which to examine rapid software development. I have spent so much of my time creating software based on the needs of a specific user community. Usually those who are paying for the software get to call the shots on the features that will be included. Certain industries do have detailed regulations designed to promote access by external observers (I am thinking of applications related to medical/pharmaceutical research and perhaps HAZMAT data), but they are definitely exceptions.

Many people are worrying about how we will make sure that the medium upon which we record our born digital records remains viable. I know that others are pondering how to make sure we have software that can actually read the data, so that it isn’t just mysterious 1s and 0s. What I am addressing here is another aspect of preservation – the preservation of context. I know others are worrying about this too, but while I suspect we can eventually come up with best practices for the IT folks to follow to ensure we can still access the data itself, it will ultimately be up to the many individuals carrying on their daily business in offices around the world to ensure that we can understand the information in the records. I suppose that isn’t new either – just another reason for journalists and archivists to make their voices heard while the people who can explain the relationships between born digital records and the business processes that created them are still around to answer questions.