

Getting Your Toes Wet: Basic Principles of Design for the New Web

Ellyssa Kroski of InfoTangle has created a great overview of current trends in website and application design in her post Information Design for the New Web. If you are going to Computers in Libraries, you can see her present the ideas she discusses in her post in a session of the same name on Monday April 16.

She highlights 3 core principles with clear explanations and great examples:

  • Keep it Simple
  • Make it Social
  • Offer Alternate Navigation

As archives continue to dive into the deep end of the internet pool, more and more archivists will find themselves participating in discussions about website design choices. Understanding basic principles like those discussed in Kroski’s post will go a long way toward making archivists feel more comfortable contributing to these sorts of discussions.

Don’t think that things like this should be left to the IT department or only the ‘techie archivists’ on your staff (if you have any). You all have a lot to contribute. You know your collections. You know the importance of the archival principles of provenance, original order and context. There are lots of aspects of archival materials that traditional web designers might not consider important – things that you know are very important if people are to understand your archives’ materials while browsing from the comfort of their homes.

So dip your toes in. Learn some buzz words, look at some fun websites and get comfortable with some innovative ideas. The water is just fine.

Understanding Born Digital Records: Journalists and Archivists with Parallel Challenges

My most recent Archival Access class had a great guest speaker from the Journalism department. Professor Ira Chinoy is currently teaching a course on Computer-Assisted Reporting. In the first half of the session, he spoke about ways that archival records can fuel and support reporting. He encouraged the class to brainstorm about what might make archival records newsworthy. How do old records that have been stashed away for so long become news? It took a bit of time, but we got into the swing of it and came up with a decent list. He then went through his own list and gave examples of published news stories that fit each of the scenarios.

In the second half of class he moved on to address issues related to freedom of information and the struggle to gain access to born digital public records. Journalists are usually early in the food chain of those vying for access to and understanding of federal, state and local databases. They face many hurdles. They must learn what databases are being kept and figure out which ones are worth pursuing. Professor Chinoy relayed a number of stories about the energy and perseverance required to convince government officials to give access to the data they have collected. The rules vary from state to state (see the Maryland Public Information Act as an example) and journalists often must quote chapter and verse to prove that officials are breaking the law if they do not hand over the information. Some officials deny that the software they use will even permit extraction of the data, or claim that there is no way to edit the records to remove confidential information. Some journalists find themselves hunting down the vendors of proprietary software to find out how to perform the extract they need. They then go back to the officials with that information in the hopes of proving that it can be done. I love this article linked to in Prof. Chinoy’s syllabus: The Top 38 Excuses Government Agencies Give for Not Being Able to Fulfill Your Data Request (And Suggestions on What You Should Say or Do).
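
To give a sense of scale, here is a minimal sketch (in Python, assuming a SQLite database and hypothetical table and column names) of what an export-with-redaction can amount to – dropping the confidential columns is often this routine, whatever an agency may claim:

    import csv
    import sqlite3

    # Hypothetical example: export a public-records table to CSV while
    # withholding the columns that hold confidential information.
    CONFIDENTIAL = {"ssn", "home_phone"}  # assumed column names to withhold

    conn = sqlite3.connect("agency_records.db")          # hypothetical database file
    cursor = conn.execute("SELECT * FROM inspections")   # hypothetical table
    columns = [desc[0] for desc in cursor.description]
    keep = [i for i, name in enumerate(columns) if name.lower() not in CONFIDENTIAL]

    with open("inspections_public.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow([columns[i] for i in keep])       # header row, redacted
        for row in cursor:
            writer.writerow([row[i] for i in keep])       # data rows, redacted
    conn.close()

(Real systems are messier than this, of course – confidential data can hide in free-text fields – but the claim that extraction is impossible rarely holds up.)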

After all that work – just getting your hands on the magic file of data is not enough. The data is of no use without the decoder ring of documentation and context.

I spent most of the 1990s designing and building custom databases, many for federal government agencies. An almost inconceivable number of person-hours go into the creation of most of these systems. Stakeholders from all over the organization destined to use the system participate in meetings and design reviews. Huge design documents are created and frequently updated … and adjustments to the logic are often made even after the system goes live (to fix bugs or add enhancements). The systems I am describing are built on complex relational databases with hundreds of tables. It is uncommon for any one person to really understand everything in such a system – even if they are on the IT team for the full development life cycle.

Sometimes you get lucky and the project includes people with amazing technical writing skills, but usually those talented people are assigned to writing documentation for users of the system. Those documents may or may not explain the business processes and context related to the data. They will rarely expose the relationship between a user’s actions on a screen and the data as it is stored in the underlying tables. Some decisions are only documented in the application code itself, and that is not likely to be preserved along with the data.
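
A hypothetical (and much simplified) illustration of what that gap looks like in practice: the raw row below is unreadable without its code lists, and those code lists may survive only in scattered documentation or in the application code itself. All names and values here are invented.

    # Hypothetical illustration: a row as it sits in the underlying table
    # versus what a user of the system actually saw on screen.
    raw_row = {"case_id": 40187, "stat_cd": "03", "dispo_cd": "R2", "org_id": 117}

    # The "decoder ring": code lists that might live in a reference table,
    # a design document, or nowhere but the application code.
    STATUS_CODES = {"01": "Received", "02": "Under review", "03": "Closed"}
    DISPOSITION_CODES = {"R1": "Granted", "R2": "Denied - incomplete application"}

    def explain(row):
        """Translate coded values back into the meanings a user would have seen."""
        return {
            "case_id": row["case_id"],
            "status": STATUS_CODES.get(row["stat_cd"], "UNKNOWN CODE"),
            "disposition": DISPOSITION_CODES.get(row["dispo_cd"], "UNKNOWN CODE"),
            "organization_id": row["org_id"],  # meaning requires yet another table
        }

    print(explain(raw_row))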

Teams charged with the support of these systems and their users often create their own documents and databases to explain certain confusing aspects of the system and to track bugs and their fixes. A good analogy here would be to the internal files that archivists often maintain about a collection – the notes that are not shared with the researchers but instead help the archivists who work with the collection remember such things as where frequently requested documents are or what restrictions must be applied to certain documents.

So where does that leave those who are playing detective to understand the records in these systems? Trying to figure out what the data in the tables mean based on the understanding of end-users can be a fool’s errand – and that is if you even have access to actual users of the system in the first place. I don’t think there is any easy answer given the realities of how many unique systems of managing data are being used throughout the public sector.

Archivists often find themselves struggling with the same problems. They have to fight to acquire and then understand the records being stored in databases. I suspect they have even less chance of interacting with actual users of the original system that created the records – though I recall discussions in my appraisal class last term about all the benefits of working with the producers of records long before they are earmarked to head to the archives. Unfortunately, it appeared that this was often the exception rather than the rule – even if it is the preferred scenario.

The overly ambitious and optimistic part of me had the idea that what ‘we’ really need is a database that lists common commercial off-the-shelf (COTS) packages used by public agencies – along with information on how to extract and redact data from these packages. For those agencies using custom systems, we could include any information on what company or contractors did the work – that sort of thing can only help later. Or how about just a list of which agencies use what software? Does something like this exist? The records of what technology is purchased are public record – right? Definitely an interesting idea (for when I have all that spare time I dream about). I wonder, if I set up a wiki for people to populate with this information, whether people would share what they already know.
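
Just to make the daydream concrete, here is a rough sketch (in Python, with field names that are purely my own assumptions, not an existing standard) of what a single entry in such a registry might record:

    from dataclasses import dataclass, field
    from typing import List

    # One entry in the imagined registry of systems used by public agencies.
    # Every field name here is an assumption, not an existing standard.
    @dataclass
    class SystemEntry:
        agency: str                   # which public agency uses the system
        software: str                 # COTS product name, or "custom"
        vendor: str = ""              # vendor or contractor who built/sold it
        export_notes: str = ""        # known ways to extract the data
        redaction_notes: str = ""     # known ways to remove confidential fields
        sources: List[str] = field(default_factory=list)  # where the info came from

    entry = SystemEntry(
        agency="Example County Clerk",
        software="Hypothetical Case Tracker 4.2",
        vendor="Example Vendor, Inc.",
        export_notes="Built-in report writer can dump any table to CSV.",
        sources=["FY2005 procurement record (hypothetical)"],
    )
    print(entry.software, "used by", entry.agency)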

I would like to imagine a future world in which all this stuff is online and you can login and download any public record you like at any time. You can get a taste of where we are on the path to achieving this dream on the archives side of things by exploring a single series of electronic records published on the US National Archives site. For example, look at the search screen for World War II Army Enlistment Records. It includes links to sample data, record group info and an FAQ. Once you make it to viewing a record – every field includes a link to explain the value. But even this extensive detail would not be enough for someone to just pick up these records and understand them – you still need to understand about World War II and Army enlistment. You still need the context of the events and this is where the FAQ comes in. Look at the information they provide – and then take a moment to imagine what it would take for a journalist to recreate a similar level of detailed information for new database records being created in a public agency today (especially when those records are guarded by officials who are leery about permitting access to the records in the first place).

This isn’t a new problem that has appeared with born digital records. Archivists and journalists have always sought the context of the information with which they are working. The new challenge lies in the obstacles that a cryptic database system piles on top of the already existing work of deciphering the meaning of the records.

Archivists and Journalists care about a lot of the same issues related to born digital records. How do we acquire the records people will care about? How do we understand what they mean in the context of why and how they were created? How do we enable access to the information? Where do we get the resources, time and information to support important work like this?

It is interesting for me to find a new angle from which to examine rapid software development. I have spent so much of my time creating software based on the needs of a specific user community. Usually those who are paying for the software get to call the shots on the features that will be included. Certain industries do have detailed regulations designed to promote access by external observers (I am thinking of applications related to medical/pharmaceutical research and perhaps HAZMAT data), but they are definitely the exceptions.

Many people are worrying about how we will make sure that the medium upon which we record our born digital records remains viable. I know that others are pondering how to make sure we have software that can actually read the data such that it isn’t just mysterious 1s and 0s. What I am addressing here is another aspect of preservation – the preservation of context. I know this too is being worried about by others, but while I suspect we can eventually come up with best practices for the IT folks to follow to ensure we can still access the data itself – it will ultimately be up to the many individuals carrying on their daily business in offices around the world to ensure that we can understand the information in the records. I suppose that isn’t new either – just another reason for journalists and archivists to make their voices heard while the people who can explain the relationships between the born digital records and the business processes that created them are still around to answer questions.

Should we be archiving fonts?

I am a fan of beautiful fonts. This is why I find myself on the mailing list of MyFonts.com. I recently received their Winter 2007 newsletter featuring the short article titled ‘A cast-iron investment’. It starts out with:

Of all the wonderful things about fonts, there’s one that is rarely mentioned by us font sellers. It’s this: fonts last for a very long time. Unlike almost all the other software you may have bought 10 or 15 years ago, any fonts you bought are likely still working well, waiting to be called back into action when you load up that old newsletter or greetings card you made!

Interesting. The article goes on to point out:

But, of course, foundries make updates to their fonts every now and then, with both bug fixes and major upgrades in features and language coverage.

All this leaves me wondering if there is a place in the world for a digital font archive. A single source of digital font files for use by archives around the world. Of course, there would be a number of hurdles:

  1. How do you make sure that the fonts are only available for use in documents that used the fonts legally?
  2. How do you make sure that the right version of the font is used in the document to show us how the document appeared originally?

You could say this is made moot by using something like Adobe’s PDF/A format, which embeds the fonts within the document itself. It is also likely that we won’t be running the original word processing programs that used the fonts a hundred years from now.

Hurdles aside, somehow it feels like a clever thing to do. We can’t know how we might enable access to documents that use fonts in the future. What we can do is keep the font files so we have the option to do clever things with them in the future.
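
As a sketch of the second hurdle, an archive would at minimum need a fingerprint of each exact font file so a document could later be matched to the precise version it was set in. Here is a minimal example (in Python; the directory name and metadata fields are my own assumptions, not any existing standard):

    import hashlib
    import json
    from pathlib import Path

    # Record enough about each font file that a document could later be matched
    # to the exact version it was set in. Field names here are assumptions.
    def describe_font(path: Path) -> dict:
        data = path.read_bytes()
        return {
            "file_name": path.name,
            "sha256": hashlib.sha256(data).hexdigest(),  # fingerprint of this exact version
            "size_bytes": len(data),
        }

    font_dir = Path("fonts_to_archive")  # hypothetical folder of deposited font files
    manifest = [describe_font(p) for p in sorted(font_dir.glob("*.otf"))]
    Path("font_manifest.json").write_text(json.dumps(manifest, indent=2))

Version numbers, licensing terms and foundry details would have to ride along as well, but even a bare manifest like this keeps the options open.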

I would even make a case for the fact that fonts are precious in their own right and deserve to be preserved. My mother spent many years as a graphic designer. From her I inherited a number of type specimen books – including one labeled “Adcraft Typographers, Inc”. Google led me to two archival collections that include font samples from Adcraft.

Another great reason for a digital font archive is the surge in individual foundries creating new fonts every day. What once was an elite craft now has such a low barrier to entry that anyone can download some software and hang out their shingle as a font foundry. Take a look around MyFonts.com. Read about selling your fonts on MyFonts.com.

While looking for a good page about type foundries I discovered the site for Precision Type, which now shows this on its only remaining page:

For the last 12 years, Precision Type has sought to provide our customers with convenient access to a large and diverse range of font software products. Our business grew as a result of the immense impact that digital technology had in the field of type design. At no other time in history had type ever been available from so many different sources. Precision Type was truly proud to play a part in this exciting evolution.

Unfortunately however, sales of font software for Precision Type and many other companies in the font business have been adversely affected in recent years by a growing supply of free font software via the Internet. As a result, we have decided to discontinue our Precision Type business so that we can focus on other business opportunities.

I have to go back to May 23, 2004 in the Internet Archive Wayback Machine to see what Precision Type’s site used to look like.

There are more fonts than ever before. Amateurs are driving professionals out of business. Definitely sounds like digital fonts and their history are a worthy target for archival preservation.

Book Review: Past Time, Past Place: GIS for History

Past Time, Past Place: GIS for History consists mainly of 11 case studies of geographic information systems being applied to the study of history. It includes a nice sprinkling of full-color maps and images and a 20-page glossary of GIS terms. Each case study includes a list of articles and other resources for further reading.

The book begins with an introduction by the editor, Anne Kelly Knowles. This chapter explains the basics of using GIS to study history, as well as giving an overview of how the book is organized.

The meat of the book is the eleven case studies themselves, each applying GIS to a different historical topic.

I suspect that different audiences will take very different ideas away from this book. I was looking for information about GIS and historical records (this is another book found during my mad hunt for information on the appraisal and preservation of GIS records) and found a bit of related information to add to my research. I think this book will be of interest to those who fall into any of the following categories:

  • Archivists curious about how GIS might enhance access to and understanding of the records under their care
  • Historians interested in understanding how GIS can be used to approach historical research in new ways
  • History buffs who love reading a good story (complete with pictures)
  • Map aficionados curious about new and different kinds of information that can be portrayed with GIS

I especially loved the maps and other images. I am a bit particular when it comes to the quality of graphics – but this book comes through with bright colors and clear images. The unusual square book format (measuring 9″x9″) gave those who arranged the layout lots of room to work – and they took full advantage of the space.

Whether you plan to read the case studies for the history they bring to life or are looking for “how-tos” as you tackle your own GIS-History project – this book deserves some attention.

Footnote.com and US National Archives records

Thanks to Digitization 101’s recent post “Footnote launches and announces partnership with National Archives” I was made aware of the big news about the digitization of the US National Archives’ records. Footnote.com has gone live with the first of apparently many planned installments of digitized NARA records. My first instinct was one of suspicion. In the shadow of recent historian alarm about the Smithsonian/Showtime deal, I think it’s valid to be concerned about new agreements between government agencies and private companies.

That said, I am feeling much more positive based on the passage below from the January 10th National Archives Press Release about the agreement with Footnote (emphasis mine):

This non-exclusive agreement, beginning with the sizeable collection of materials currently on microfilm, will enable researchers and the general public to access millions of newly-digitized images of the National Archives historic records on a subscription basis from the Footnote web site. By February 6, the digitized materials will also be available at no charge in National Archives research rooms in Washington D.C. and regional facilities across the country. After an interval of five years, all images digitized through this agreement will be available at no charge through the National Archives web site.

This sounds like a win-win situation. NARA gets millions of records digitized (4.5 million and counting according to the press release). These records will be highlighted on the Footnote web site. They will have the advantages of Footnote’s search and browse interfaces (which I plan to review in depth in the next week).

When signing up for my free account – I actually read through the entire Footnote Terms of Service including this passage (within the section labeled ‘Our Intellectual Property Rights’ – again, emphasis mine):

Content on the Website is provided to you AS IS for your information and personal use only as permitted through the functionality of the Website and may not be used, copied, reproduced, distributed, transmitted, broadcast, displayed, sold, licensed, or otherwise exploited for any other purposes whatsoever without the prior written consent of the respective owners. Footnote.com reserves all rights not expressly granted in and to the Website and the Content. You agree not to engage in the use, copying, or distribution of any of the Content other than expressly permitted herein, including any use, copying, or distribution of User Submissions of third parties obtained through the Website for any commercial purposes. If you download or print a copy of the Content for personal use, you must retain all copyright and other proprietary notices contained therein.

These terms certainly are no different from those under which most archives operate – but it did give me a moment of wondering how many extra hoops one would need to jump through to use any of the NARA records found on Footnote in a major project like a book. A quick experiment with the Pennsylvania Archives (which are available for free with registration) did not show me any copyright information or notices related to rights. I downloaded an image to see what ‘copyright and other proprietary notices’ I might find and found none.

In his post “The Flawed Agreement between the National Archives and Footnote, Inc.”, Dan Cohen expresses his views of the agreement. I had been curious about what percentage of the records being digitized were out of copyright – Dan says they all are. If all of the records are out of copyright, exactly what rights is Footnote.com reserving (in the passage from the terms of service shown above)? I also share his frustration about the age restriction in place for using Footnote.com (you have to be over 18).

My final opinion about the agreement itself will depend on answers to a few more questions:

1) Were any of the records recently made available on Footnote.com already digitized and available via the archives.gov website?

2) What percentage of the records that were digitized by Footnote would have been digitized by NARA without this agreement?

3) What roadblocks will truly be set in place for those interested in using records found on Footnote.com?

4) What interface will be available to those accessing the records for free in “National Archives research rooms in Washington D.C. and regional facilities across the country” (from the press release above)? Will it be the Footnote.com website interface or via NARA’s own Archival Research Catalog (ARC) or Access to Archival Databases (AAD)?

If the records that Footnote has digitized and made available on Footnote.com would not otherwise have been digitized over the course of the next five years (a big if) then I think this is an interesting solution. Even the full $100 fee for a year subscription is much more reasonable than many other research databases out there (and certainly cheaper than even a single night hotel room within striking distance of National Archives II).

As I mentioned above, I plan to post a review of the Footnote.com search and browse interfaces in the next week. The Footnote.com support folks have given me permission to include screen shots – so if this topic is of interest to you, keep an eye out for it.

OBR: Optical Braille Recognition

In the interest of talking about new topics – I opened my little Moleskine notebook and found a note to myself wondering if it is possible to scan Braille with the equivalent of OCR.

Enter Optical Braille Recognition or OBR. Created by a company called Neovision, this software will permit anyone with a scanner and a Windows platform computer to ‘read’ Braille documents.

Why was this in my notebook? I was thinking about unusual records that must be out in the world and wondering about how to improve access to the information within them. So if there are Braille records out there – how does the sighted person who can’t read Braille get at that information? Here is an answer. Not only does the OBR permit reading of Braille documents – but it would permit recreation of these same documents in Braille from any computer that has the right technology.

Reading through the Wikipedia Braille entry, I learned a few things that would throw a monkey wrench into some of this. For example – “because the six-dot Braille cell only offers 64 possible combinations, many Braille characters have different meanings based on their context”. The page on Braille code lists links to an assortment of different Braille codes which translate the different combinations of dots into different characters depending on the language of the text. On top of the different Braille codes used to translate Braille into specific letters or characters – there is another layer to Braille transcription. Grade 2 Braille uses a specific set of contractions and shorthand – and is used for official publications and things like menus, while Grade 3 Braille is used in the creation of personal letters.
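
A toy example makes the context problem concrete. The decoder below uses the standard English Braille values for a few letters and the number sign (everything else about it is deliberately simplified): the very same cells read as letters or as digits depending on what came before them.

    # Cells are written as sets of raised dot positions (1-6). Values for a-e
    # and the number sign are standard English Braille; the rest is simplified.
    LETTERS = {
        frozenset({1}): "a", frozenset({1, 2}): "b", frozenset({1, 4}): "c",
        frozenset({1, 4, 5}): "d", frozenset({1, 5}): "e",
    }
    NUMBER_SIGN = frozenset({3, 4, 5, 6})
    DIGITS = {cell: "12345"[i] for i, cell in enumerate(LETTERS)}  # a-e double as 1-5

    def decode(cells):
        out, number_mode = [], False
        for cell in cells:
            if cell == NUMBER_SIGN:
                number_mode = True      # changes how every following cell is read
                continue
            out.append(DIGITS[cell] if number_mode else LETTERS[cell])
        return "".join(out)

    cells = [frozenset({1, 2}), frozenset({1}), frozenset({1, 4})]
    print(decode(cells))                  # "bac"
    print(decode([NUMBER_SIGN] + cells))  # "213" - identical cells, new meaning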

It all goes back to context (of course!). If you have a set of Braille documents with no information on them giving you details of what sort of documents they are – you have a document that is effectively written in code. Is it music written in Braille music notation? Is it a document in Hiragana using the Japanese code? Is this a personal letter using Grade 3 Braille shorthand? You get the idea.

I suspect that one might even want to include a copy of both the Braille code and the Braille transcription rules that go with a set of documents as a key to their translation in the future. For frequently used records, one could perhaps also include a transcription (both a literal transcription and a ‘translation’ of all the Braille contractions used) to improve access to the analog records.

In a quick search for collections including Braille manuscripts, it came as no surprise that the Helen Keller Archives hold “braille correspondence”. I also came across the finding aids for the Harvard Law School Examinations in Braille (1950-1985) and The Donald G. Morgan Papers (the papers of a blind professor at Mount Holyoke College).

I wonder how many other collections have Braille records or manuscripts. Has anyone reading this ever seen or processed a collection including Braille records?

DMCA Exemption Added That Supports Archivists

The Digital Millennium Copyright Act, aka DMCA (which made it illegal to create or distribute technology that can get around copyright protection technology), had six new classes of exemptions added today.

From the very long-named Rulemaking on Exemptions from Prohibition on Circumvention of Technological Measures that Control Access to Copyrighted Works out of the U.S. Copyright Office (part of the Library of Congress) comes the addition of the following class of work that will not be “subject to the prohibition against circumventing access controls”:

Computer programs and video games distributed in formats that have become obsolete and that require the original media or hardware as a condition of access, when circumvention is accomplished for the purpose of preservation or archival reproduction of published digital works by a library or archive. A format shall be considered obsolete if the machine or system necessary to render perceptible a work stored in that format is no longer manufactured or is no longer reasonably available in the commercial marketplace.

This exemption remains valid from November 27, 2006 through October 27, 2009. Hmm… three years? So what happens if this expires and doesn’t get extended (though one would imagine by then either we will have a better answer to this sort of problem OR the problem will be even worse than it is now)? When you look at the fact that places like NARA have fabulous mission statements for their Electronic Records Archives with phrases like “for the life of the republic” in them – three years sounds pretty paltry.

That said, how interesting to have archivists highlighted as beneficiaries of new legal rules. So now it will be legal (or at least not punishable under the DMCA) to create and share programs to access records created by obsolete software. I don’t know enough about the world of copyright and obsolete software to be clear on how much this REALLY changes what places like NARA’s ERA and other archives pondering the electronic records problem are doing, but clearly this exemption can only validate a lot of work that needs to be done.