

Chapter 8: Preparing and Releasing Official Statistical Data by Professor Natalie Shlomo

[Image: Black and white photo of a woman using a keypunch to tabulate the United States Census, circa 1940.]

Chapter 8 of Partners for Preservation is ‘Preparing and Releasing Official Statistical Data’ by Professor Natalie Shlomo. This is the first chapter of Part III: Data and Programming. I knew early in the planning for the book that I wanted a chapter that talked about privacy and data.

During my graduate program, in March of 2007, Google announced changes to their log retention policies. I was fascinated by the implications for privacy. At the end of my reflections on Google’s proposed changes, I concluded with:

“The intersection of concerns about privacy, government investigations, document retention and tremendous volumes of private sector business data seem destined to cause more major choices such as the one Google has just announced. I just wonder what the researchers of the future will think of what we leave in our wake.”

While developing my chapter list for the book, I followed my curiosity about how the field of statistics preserves privacy and how those approaches might be applied to historical data preserved by archives. Fields of research that rely on statistics and surveys have developed many techniques for balancing the desire for useful data with the confidentiality expectations of those who participate in surveys and censuses. This chapter taught me that “statistical disclosure limitation”, or SDL, aims to prevent the disclosure of sensitive information about individuals.

This short excerpt gives a great overview of the chapter:

“With technological advancements and the increasing push by governments for open data, new forms of data dissemination are currently being explored by statistical agencies. This has changed the landscape of how disclosure risks are defined and typically involves more use of perturbative methods of SDL. In addition, the statistical community has begun to assess whether aspects of differential privacy which focus on the perturbation of outputs may provide solutions for SDL. This has led to collaborations with computer scientists.”
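To make ‘perturbation of outputs’ a bit more concrete, here is a minimal sketch (my own toy example, not code from the chapter) of the Laplace mechanism used in differential privacy: a statistic is released only after adding random noise calibrated to a privacy parameter epsilon.

```python
import numpy as np

def noisy_count(true_count, epsilon=1.0, sensitivity=1.0):
    """Release a count perturbed with Laplace noise scaled to sensitivity/epsilon.

    Smaller epsilon means more noise and stronger privacy protection;
    sensitivity is how much a single person's record can change the count.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# e.g. publishing a table cell of 42 respondents with a strict privacy budget
print(noisy_count(42, epsilon=0.5))
```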

Almost eighty years ago, the woman in the photo above used a keypunch to tabulate the US Census. The amount of detailed hands-on labor required to gather and tabulate that data boggles the mind compared with the born-digital data collection techniques now possible. The 1940 census was released in 2012 and is available online for free through a National Archives website. As archives face the onslaught of born-digital data tied to individuals, the techniques used by statisticians will need to become familiar tools for archivists seeking to increase access to data while respecting the privacy of those who might be identified through unfettered access to it. This chapter serves as a solid introduction to SDL, as well as a look forward to new ideas in the field. It also ties back to topics in Chapter 2: Curbing The Online Assimilation Of Personal Information and Chapter 5: The Internet Of Things.

Bio:

Natalie Shlomo (BSc, Mathematics and Statistics, Hebrew University; MA, Statistics, Hebrew University; PhD, Statistics, Hebrew University) is Professor of Social Statistics at the School of Social Sciences, University of Manchester. Her areas of interest are survey methods, survey design and estimation, record linkage, statistical disclosure control, statistical data editing and imputation, non-response analysis and adjustments, adaptive survey designs and small area estimation. She is the UK principal investigator for several collaborative grants from the 7th Framework Programme and H2020 of the European Union, all involving research on improving survey methods and dissemination. She is also principal investigator for the Leverhulme Trust International Network Grant on Bayesian Adaptive Survey Designs. She is an elected member of the International Statistical Institute and a fellow of the Royal Statistical Society. She is an elected council member and Vice-President of the International Statistical Institute. She is associate editor of several journals, including International Statistical Review and the Journal of the Royal Statistical Society, Series A. She serves as a member of several national and international advisory boards.

Image source:  A woman using a keypunch to tabulate the United States Census, circa 1940. National Archives Identifier (NAID) 513295 https://commons.wikimedia.org/wiki/File:Card_puncher_-_NARA_-_513295.jpg

Chapter 7: Historical Building Information Model (BIM)+: Sharing, Preserving and Reusing Architectural Design Data by Dr. JuHyun Lee and Dr. Ning Gu

Chapter 7 of Partners for Preservation is ‘Historical Building Information Model (BIM)+: Sharing, Preserving and Reusing Architectural Design Data’ by Dr. JuHyun Lee and Dr. Ning Gu. The final chapter in Part II: The physical world: objects, art, and architecture, this chapter addresses the challenges of digital records created to represent physical structures. I picked the image above because I love the contrast between the type of house plans you could order from a catalog a century ago and the way design plans exist today.

This chapter was another of my “must haves” from my initial brainstorm of ideas for the book. I attended a session on ‘Preserving Born-Digital Records Of The Design Community’ at the 2007 annual SAA meeting. It was a compelling discussion, with representatives from multiple fields: archivists working to preserve born-digital designs and people building tools and setting standards. There were lots of questions from the audience – many of which I managed to capture in the notes that became a detailed blog post on the session itself. It was exciting to be in the room with so many enthusiastic experts in overlapping fields, all there to talk about what might work long term.

This chapter takes you forward to see how BIM has evolved – and how historical BIM+ might serve multiple communities. This passage gives a good overview of the chapter:

“…the chapter first briefly introduces the challenges the design and building industry have faced in sharing, preserving and reusing architectural design data before the emergence and adoption of BIM, and discusses BIM as a solution for these challenges. It then reviews the current state of BIM technologies and subsequently presents the concept of historical BIM+ (HBIM+), which aims to share, preserve and reuse historical building information. HBIM+ is based on a new framework that combines the theoretical foundation of HBIM with emerging ontologies and technologies in the field including geographic information systems (GIS), mobile computing and cloud computing to create, manage and exchange historical building data and their associated values more effectively.”

I hope you find the ideas shared in this chapter as intriguing as I do. I see lots of opportunities for archivists to collaborate with those focused on architecture and design, especially in the case of historical buildings and the proposed vision for HBIM+.

Bios:

Ning Gu is Professor of Architecture in the School of Art, Architecture and Design at the University of South Australia. Having an academic background from both Australia and China, Professor Ning Gu’s most significant contributions have been made towards research in design computing and cognition, including topics such as computational design analysis, design cognition, design communication and collaboration, generative design systems, and Building Information Modelling. The outcomes of his research have been documented in over 170 peer-reviewed publications. Professor Gu’s research has been supported by prestigious Australian research funding schemes from the Australian Research Council, the Office for Learning and Teaching, and the Cooperative Research Centre for Construction Innovation. He has guest edited/chaired major international journals/conferences in the field. He was a Visiting Scholar at MIT, Columbia University and Technische Universiteit Eindhoven.

JuHyun Lee is an adjunct senior lecturer at the University of Newcastle (UoN). Dr. Lee has made a significant contribution towards architectural and design research in three main areas: design cognition (design and language), planning and design analysis, and design computing. As an expert in the field of architectural and design computing, Dr. Lee was invited to become a visiting academic at the UoN in 2011. Dr. Lee has developed innovative computational applications for pervasive computing and context awareness in building environments. This research has been published in Computers in Industry, Advanced Engineering Informatics, and the Journal of Intelligent and Robotic Systems. His international contributions include serving as associate editor for a special edition of Architectural Science Review, as a reviewer for many international journals and conferences, and as an international reviewer for national grants.

Image Source: Image from page 717 of ‘Easy steps in architecture and architectural drawing’ by Hodgson, Frederick Thomas, 1915. https://archive.org/details/easystepsinarch00hodg/page/n717

Chapter 6: Accurate Digital Colour Reproduction on Displays: from Hardware Design to Software Features by Dr. Abhijit Sarkar

The sixth chapter in Partners for Preservation is “Accurate Digital Colour Reproduction on Displays: from Hardware Design to Software Features” by Dr. Abhijit Sarkar. As the second chapter in Part II: The physical world: objects, art, and architecture, this chapter continues to walk the edge between the physical and digital worlds.

My mother was an artist. I spent a fair amount of time as a child by her side in museums in New York City. As my own creativity has led me to photography and graphic design, I have become more and more interested in color and how it can change (or not change) across the digital barrier and across digital platforms. Add in the ongoing challenges to archival preservation of born-digital visual records and the ever-increasing efforts to digitize archival materials, and this was a key chapter I was anxious to include.

One of my favorite passages from this chapter:

“If you are involved in digital content creation or digitisation of existing artwork, the single most important advice I can give you is to start by capturing and preserving as much information as possible, and allow redundant information to be discarded later as and when needed. It is a lot more difficult to synthesise missing colour fidelity information than to discard information that is not needed.”

This chapter, perhaps more than any other in the book, can stand alone as a reference. It is a solid introduction to color management and representation, including both information about basic color theory and important aspects of the technology choices that govern what we see when we look at a digital image on a particular piece of hardware.
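To give a small taste of the kind of arithmetic involved (my own illustration, not code from the chapter), here is a sketch that converts an 8-bit sRGB pixel to the device-independent CIE XYZ space by undoing the sRGB gamma curve and then applying the standard matrix. Real colour management layers ICC profiles, white point adaptation and rendering intents on top of this.

```python
def srgb_to_xyz(r8, g8, b8):
    """Convert an 8-bit sRGB pixel to CIE XYZ (D65 white point)."""
    def decode(c8):
        # Undo the sRGB transfer (gamma) curve to get linear light
        c = c8 / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = decode(r8), decode(g8), decode(b8)
    # Published sRGB-to-XYZ matrix for the D65 white point
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return x, y, z

# A mid-grey pixel, for example:
print(srgb_to_xyz(128, 128, 128))
```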

On my computer screen, the colors of the image I selected for the top of this blog post please me. How different might the 24 x 30-inch original screenprint on canvas mounted on paperboard, created fifty years ago in 1969 and now held by the Smithsonian American Art Museum, look to me in person? How different might it look on each device on which people read this blog post? I hope that this type of curiosity will lure you into developing an understanding of the impact that the choices explored in this chapter can have on how the records in your care will be viewed in the future.

Bio: 

Abhijit Sarkar specializes in the area of color science and imaging. Since his early college days, Abhijit wanted to do something different from what all his friends were doing or planning to do. That mission took him through a tortuous path of earning an undergraduate degree in electrical engineering in India, two MS degrees from Penn State and RIT on lighting and color, and a PhD in France on applied computing. His doctoral thesis was mostly focused on the fundamental understanding of how individuals perceive colors differently and devising a novel method of personalized color processing for displays in order to embrace individual differences.

Because of his interdisciplinary background encompassing science, engineering and art, Abhijit regards cross-discipline collaborations like Partners for Preservation as extremely valuable in transcending the boundaries of myriad specialized domains and fields, and thereby developing a much broader understanding of the capabilities and limitations of technology.

Abhijit is currently part of the display design team at Microsoft Surface, focused on developing new display features that enhance users’ color experience. He has authored a number of conference and journal papers on color imaging and was a contributing author for the Encyclopedia of Color Science and Technology.

Image source: Bullet Proof, from the portfolio Series I by artist Gene Davis, Smithsonian American Art Museum, Bequest of Florence Coulson Davis

Chapter 3: The Rise of Computer-Assisted Reporting by Brant Houston

[Embedded photo from Getty Images: the 1967 Detroit riots.]
The third chapter in Partners for Preservation is ‘The Rise of Computer-Assisted Reporting: Challenges and Successes’ by Brant Houston. A chapter on this topic has been at the top of my list of chapter ideas from the very start of this project. Back in February of 2007, Professor Ira Chinoy from the University of Maryland, College Park’s Journalism Department spoke to my graduate school Archival Access class. His presentation and the related class discussion led to my blog post Understanding Born-Digital Records: Journalists And Archivists With Parallel Challenges. Elements of this blog post even inspired a portion of the book’s introduction.

The photo above is from the 1967 Detroit race riots. Fifty years ago, the reporting recognized as the first to use computer-assisted reporting was awarded the 1968 Pulitzer Prize for Local General or Spot News Reporting “For its coverage of the Detroit riots of 1967, recognizing both the brilliance of its detailed spot news staff work and its swift and accurate investigation into the underlying causes of the tragedy.” In his chapter, Brant starts here and takes us through the evolution of computer-assisted reporting from 1968 to the present day, looking forward to the future of the field.

As the third chapter in Part I: Memory, Privacy, and Transparency, it continues to weave these three topics together. Balancing privacy and the goal of creating documentation to preserve memories of all that is going on around us is not easy. Transparency and a strong commitment to ethical choices underpin the work of both journalists and archivists.

This is one of my favorite passages:

“As computer-assisted reporting has become more widespread and routine, it has given rise to discussion and debate over the issues regarding the ethical responsibilities of journalists. There have been criticisms over the publishing of data that was seen as intrusive and violating the privacy of individuals.”

I learned so much in this chapter about the long road journalists had to travel as they sought to use computers to support their reporting. It never occurred to me, as someone who has always had access to the computing power I needed through school or work, that getting the tools journalists needed for their computational analysis often required negotiating for time on newspaper mainframes or finding partners outside of the newsroom. It took tenacity and the advent of personal computers to make computer-assisted reporting feasible for the broader community of journalists around the world.

Journalists have sought the help of archivists on projects for many years – seeking archival records as part of the research for their reporting. Now journalists are also taking steps to preserve their field’s born-digital content. Given the high percentage of news articles that exist exclusively online, projects like the Journalism Digital News Archive are crucial to the survival of these articles. I look forward to all the ways that our fields can learn from each other and work together to tackle the challenges of digital preservation.

Bio

Brant Houston

Brant Houston is the Knight Chair in Investigative Reporting at the University of Illinois at Urbana-Champaign, where he works on projects and research involving the use of data analysis in journalism. He is co-founder of the Global Investigative Journalism Network and the Institute for Nonprofit News. He is the author of Computer-Assisted Reporting: A Practical Guide, co-author of The Investigative Reporter’s Handbook, and a contributor to books on freedom of information acts and open government. Before joining the University of Illinois, he was executive director of Investigative Reporters and Editors at the University of Missouri, after 17 years as an award-winning investigative journalist.

 

Digitization Program Site Visit: Archives of American Art

The image of Alexander Calder above shows him in his studio, circa 1950. It is from a folder titled Photographs: Calder at Work, 1927-1956, undated, part of the Alexander Calder papers held by the Smithsonian Archives of American Art and available online through the efforts of their digitization project. I love that this image captures him in his creative space – you get to see the happy chaos from which Calder drew his often sleek and sparse sculptures.

Back in October, I had the opportunity to visit the staff of the digitization program for the Smithsonian Archives of American Art along with a group of my colleagues from the World Bank. This is a report on that site visit. It is my hope that these details can help others planning digitization projects – much as they are informing our own internal planning.

Date of Visit: October 18, 2011

Destination: Smithsonian Archives of American Art

Smithsonian Archives of American Art Hosts:

Summary:  This visit was two hours in length and consisted of a combination of presentation, discussion and site tour to meet staff and examine equipment.

Background: The Smithsonian Archives of American Art (AAA) digitization program was first funded by a grant from the Terra Foundation for American Art in 2005, recently extended through 2016. This funding supports both staff and research.

Their digitization project replaced their existing microfilm program and focuses on digitizing complete collections. Digitization focuses on in-house collections (in contrast with the microfilm program, which also captured collections held by other institutions across the USA).

Over the course of the past 6 years, they have scanned over 110 collections – a total of 1,000 linear feet – out of an available total of 13,000 linear feet from 4,500 collections. They keep a prioritized list of what they want digitized.

The Smithsonian DAM (digital asset management system) had to be adjusted to handle the hierarchy of EAD and the digitized assets. Master files are stored in the Smithsonian DAM. Files stored in intermediate storage areas are only for processing and evaluation and are disposed of after they have been ingested into the DAM.

Current staffing is two and a half archivists and two digital imaging specialists. One digital imaging specialist focuses on scanning full collections, while the other focuses on on-demand single items.

The website is built in ColdFusion and pulls content from a SQL database. Currently they have no way to post media files (audio, oral histories, video) on the external web interface.

They do not delineate separate items within folders. When feedback comes in from end users about individual items, this information is usually incorporated into the scope note for the collection, or the folder title of the folder containing the item. Full size images in both the image gallery and the full collections are watermarked.

They track the processing stats and status of their projects.

Standard Procedures:

Full Collection Digitization:

  • Their current digitization workflow is based on their microfilm process. The workflow is managed via an internal web-based management system. Every task required for the process is listed, then crossed off and annotated with the staff and date the action was performed.
  • Collections earmarked for digitization are thoroughly described by a processing archivist.
  • Finding aids are encoded in EAD and created in XML using NoteTab Pro software.
  • MARC records are created when the finding aid is complete. The summary information from the MARC record is used to create the summary of the collection published on the website.
  • Box numbers and folder numbers are assigned and associated with a finding aid. The box and folder numbers are all a scanning technician needs.
  • A ‘scanning information worksheet’ provides room for notes from the archivist to the scanning technician. It provides the opportunity to indicate which documents should not be scanned, for example duplicates or documents containing personally identifiable information (PII).
  • A directory structure is generated by a script based on the finding aid, creating a directory folder for each physical folder which exists for the collection (see the sketch after this list for an illustration of how such a script might work). Images are saved directly into this directory structure. The disk space to hold these images is centrally managed by the Smithsonian and automatically backed up.
  • All scanning is done at 600 dpi in color, according to their internal guidelines. They frequently have internal projects that demand high-resolution images for use in publication.
  • After scanning is complete, the processing archivist does the post scanning review before the images are pushed into the DAM for web publication.
  • Their policy is to post everything from a digitized collection, but they do support a take-down policy.
  • A recent improvement came in January 2010, when they relaunched the site to include all of their collections, both digitized and non-digitized, co-located on the same list.
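As an illustration of how a script might generate such a directory structure from an EAD finding aid, here is a hypothetical sketch. It assumes a typical EAD 2002 file without namespaces and box/folder container elements; it is not the AAA’s actual script, and the element names and folder naming convention are my own assumptions.

```python
import os
import xml.etree.ElementTree as ET

def build_directories(ead_path, output_root):
    """Create one Box_X/Folder_Y directory per box/folder pair in an EAD finding aid."""
    tree = ET.parse(ead_path)
    for did in tree.iter("did"):          # each component's descriptive identification
        box = folder = None
        for container in did.findall("container"):
            ctype = (container.get("type") or "").lower()
            value = (container.text or "").strip()
            if ctype == "box":
                box = value
            elif ctype == "folder":
                folder = value
        if box and folder:
            path = os.path.join(output_root, f"Box_{box}", f"Folder_{folder}")
            os.makedirs(path, exist_ok=True)   # safe to re-run; existing folders are kept

# build_directories("calder_finding_aid.xml", "/scans/calder_papers")
```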

On Demand Digitization:

  • Patrons may request the digitization of individual items.
  • These requests are evaluated by archivists to determine if it is appropriate to digitize the entire folder (or even box) to which the item belongs.
  • Requests are logged in a paper log.
  • Item-level scanning ties back to an item-level record with an item ID. An ‘Online Removal Notice’ is used to create an item-level stub.
  • An item level cataloger describes the content after it is scanned.
  • Unless there is an explicit copyright or donor restriction, the item is put online in the Image Gallery (which currently has 12,000 documents).
  • Access to images is provided by keyword searching.
  • Individual images are linked back to the archival description for the collection from which they came.

Improvements/Changes they wish for:

  • They currently have no way to make changes to the database nimbly. Changing the display is a tedious process and each change requires a programmer.
  • They would like to consider a move to open source software or to use a central repository – though they have concerns about what other sacrifices this would require.
  • The ability to show related collections and list connected names (currently the only discovery options are an A-Z list of creators and keyword search).
  • The ability to connect to guides and other exhibits.


Image Credit: Alexander Calder papers, Archives of American Art, Smithsonian Institution.

Day of Digital Archives

To be honest, today was a half day of digital archives, due to personal plans taking me away from computers this afternoon. In light of that, my post is more accurately my ‘week of digital archives’.

The highlight of my digital archives week was the discovery of the Digital Curation Exchange. I promptly joined and began to explore their ‘space for all things digital curation’. This led me to a fabulous list of resources, including a set of syllabi for courses related to digital curation. Each link brought me to an extensive reading list, some with full slide decks from weekly in-class presentations. My ‘to read’ list has gotten much longer – but in a good way!

On other days recently I have found myself involved in all of the following:

  • review of metadata standards for digital objects
  • creation of internal guidelines and requirements documents
  • networking with those at other institutions to help coordinate site visits of other digitization projects
  • records management planning and reviews
  • learning about the OCR software available to our organization
  • contemplation of the web archiving efforts of organizations and governments around the world
  • reviewing my organization’s social media policies
  • listening to the audio of online training available from PLANETS (Preservation and Long-term Access through NETworked Services)
  • contemplation of the new Journal of Digital Media Management and their recent call for articles

My new favorite quote related to digital preservation comes from What we reckon about keeping digital archives: High level principles guiding State Records’ approach, from the State Records folks in New South Wales, Australia, which reads:

We will keep the Robert De Niro principle in mind when adopting any software or hardware solutions: “You want to be makin moves on the street, have no attachments, allow nothing to be in your life that you cannot walk out on in 30 seconds flat if you spot the heat around the corner” (Heat, 1995)

In other words, our digital archives technology will be designed to be sustainable given our limited resources so it will be flexible and scalable to allow us to utilise the most appropriate tools at a given time to carry out actions such as creation of preservation or access copies or monitoring of repository contents, but replace these tools with new ones easily and with minimal cost and with minimal impact.

I like that this speaks to the fact that no plan can perfectly accommodate the changes in technology coming down the line. Being nimble and assuming that change will be the only constant are key to ensuring access to our digital assets in the future.

Rescuing 5.25″ Floppy Disks from Oblivion

This post is a careful log of how I rescued data trapped on 5 1/4″ floppy disks, some dating back to 1984 (including those pictured here). While I have tried to make this detailed enough to help anyone who needs to try this, you will likely have more success if you are comfortable installing and configuring hardware and software.

I will break this down into a number of phases:

  • Phase 1: Hardware
  • Phase 2: Pull the data off the disk
  • Phase 3: Extract the files from the disk image
  • Phase 4: Migrate or Emulate

Phase 1: Hardware

Before you do anything else, you actually need a 5.25″ floppy drive of some kind connected to your computer. I was lucky – a friend had a floppy drive for us to work with. If you aren’t that lucky, you can generally find them on eBay for around $25 (sometimes less). A friend had been helping me by trying to connect the drive to my existing PC, but we could never get the communications working properly. Finally I found Device Side Data’s 5.25″ Floppy Drive Controller, which they sell online for $55. It connects your 5.25″ floppy drive to a USB 2.0 or USB 1.1 port and comes with drivers for Windows, Mac and Linux systems.

If you don’t want to mess around with installing the disk drive inside your computer, you can also purchase an external drive enclosure and a tabletop power supply. Remember, you still need the USB controller too.

Update: I just found a fantastic step-by-step guide to the hardware installation of Device Side’s drive controller from the Maryland Institute for Technology in the Humanities (MITH), including tons of photographs, which should help you get the hardware install portion done right.

Phase 2: Pull the data off the disk

The next step, once you have everything installed, is to extract the bits (all those ones and zeroes) off those floppies. I found that creating a new folder for each disk I was extracting made things easier. In each folder I store the disk image, a copy of the extracted original files and a folder named ‘converted’ in which to store migrated versions of the files.
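If you want to script that folder setup, here is a minimal sketch; the folder names are simply the ones I use, so adjust them to your own conventions.

```python
import os

def make_disk_workspace(root, disk_label):
    """Create the per-disk layout described above: one folder per disk,
    holding the disk image and extracted files, plus a 'converted'
    subfolder for migrated copies."""
    disk_dir = os.path.join(root, disk_label)
    os.makedirs(os.path.join(disk_dir, "converted"), exist_ok=True)
    return disk_dir

# e.g. make_disk_workspace("D:/floppy_rescue", "disk_1984_001")
```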

Device Side provides software they call ‘Disk Image and Browse’. You can see an assortment of screenshots of this software on their website, but this is what I see after putting a floppy in my drive and launching USB Floppy -> Disk Image and Browse:

You will need to select the ‘Disk Type’ and indicate the destination in which to create your disk image. Make sure you create the destination directory before you click on the ‘Capture Disk File Image’ button. This is what it may look like in progress:

Fair warning that this won’t always work. At least the developers of the software that comes with Device Side Data’s controller had a sense of humor. This is what I saw when one of my disk reads didn’t work 100%:

If you are pressed for time and have many disks to work your way through, you can stop here and repeat this step for all the disks you have on hand.

Phase 3: Extract the files from the disk image

Now that you have a disk image of your floppy, how do you interact with it? For this step I used a free tool called Virtual Floppy Drive. Once I had it installed properly, my disk images were associated with this program. Double clicking on the Floppy Image icon opens the floppy in a view like the one shown below:

It looks like any other removable disk drive. Now you can copy any or all of the files to anywhere you like.

Phase 4: Migrate or Emulate

The last step is finding a way to open your files. Your choice for this phase will depend on the file formats of the files you have rescued. My files were almost all WordStar word processing documents. I found a list of tools for converting WordStar files to other formats.

The best one I found was HABit version 3.

It converts WordStar files into text or HTML and even keeps the spacing reasonably well if you choose that option. If you are interested in the content more than the layout, then not retaining spacing is the better choice, because it avoids putting artificial spaces in the middle of sentences to preserve indentation. In a perfect world I think I would capture each file both with layout and without.
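If you would rather script a rough conversion yourself, here is a minimal sketch. It relies on the fact that WordStar document mode sets the high bit on some characters and sprinkles control codes and dot-command lines through the file; it simply strips those out, so it is a lossy approximation of the text and not how HABit works.

```python
def wordstar_to_text(in_path, out_path):
    """Rough, lossy plain-text extraction from a WordStar document file."""
    with open(in_path, "rb") as f:
        data = f.read()

    chars = []
    for byte in data:
        byte &= 0x7F                         # clear WordStar's high 'formatting' bit
        if byte in (0x0D, 0x0A, 0x09) or 0x20 <= byte <= 0x7E:
            chars.append(chr(byte))          # keep printable text and basic whitespace
    text = "".join(chars)

    # Drop dot-command lines such as '.PA' (new page) or '.MT 3' (top margin)
    lines = [line for line in text.splitlines() if not line.startswith(".")]

    with open(out_path, "w", encoding="ascii", errors="replace") as f:
        f.write("\n".join(lines))

# wordstar_to_text("LETTER.WS", "converted/LETTER.txt")
```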

Summary

So my rhythm of working with the floppies after I had all the hardware and software installed was as follows:

  • create a new folder for each disk, with an empty ‘converted’ folder within it
  • insert floppy into the drive
  • run DeviceSide’s Disk Image and Browse software (found on my PC running Windows under Start -> Programs -> USB Floppy)
  • paste the full path of the destination folder
  • name the disk image
  • click ‘Capture Disk Image’
  • double click on the disk image and view the files via vfd (virtual floppy drive)
  • copy all files into the folder for that disk
  • convert files to a stable format (I was going from WordStar to ASCII text) and save the files in the ‘converted’ folder

These are the detailed instructions I tried to find when I started my own data rescue project. I hope this helps you rescue files currently trapped on 5 1/4″ floppies. Please let me know if you have any questions about what I have posted here.

Update: Another great source of information is Archive Team’s wiki page on Rescuing Floppy Disks.

Heading to Austin for SXSW Interactive

Anyone out there going to be at SXSWi? I would love to find like-minded DH (digital humanities) and GLAM (Galleries, Libraries, Archives & Museums) folks in Austin. If you can’t go, what do you wish I would attend and blog about after the fact?

No promises on thoroughness of my blogging of course. I never have mastered the ‘live blogging’ approach, but I do enjoy taking notes and if the past is any guide to the future I usually manage at least 2 really detailed posts on sessions from any one conference. The rest end up being notes to myself that I always mean to somehow go back to and post later. Maybe I need to spend a month just cleaning up and posting old session summaries (or at least those that still seem interesting and relevant!).

Drop me a comment below or contact me directly and let me know if you will be in Austin between March 10 and 15. Hope to see some of you there!

ArchivesZ Needs You!

I got a kind email today asking “Whither ArchivesZ?”. My reply was: “it is sleeping” (projects do need their rest) and “I just started a new job” (I am now a Metadata and Taxonomy Consultant at The World Bank) and “I need to find enthusiastic people to help me”. That final point brings me to this post.

I find myself in the odd position of having finished my Master’s Degree and not wanting to sign on for the long haul of a PhD. So I have a big project that was born in academia, initially as a joint class project and more recently as independent research with a grant-funded programmer, but I am no longer in academia.

What happens to projects like ArchivesZ? Is there an evolutionary path towards it being a collaborative project among dispersed enthusiastic individuals? Or am I more likely to succeed by recruiting current graduate students at my former (and still nearby) institution? I have discussed this one-on-one with a number of individuals, but I haven’t thrown open the gates for those who follow me here online.

For those of you who have been waiting patiently, the ArchivesZ version 2 prototype is available online. I can’t promise it will stay online for long – it is definitely brittle for reasons I haven’t totally identified. A few things to be aware of:

  • when you load the main page, you should see tags listed at the bottom – if you don’t see any, then drop me an email via my contact form and I will try to get Tomcat and Solr back up. If you have a small screen, you may need to view your browser full screen to get to all the parts of the UI.
  • I know there are lots of bugs of various sizes. Some paths through the app work – some don’t. Some screens are just placeholders. Feel free to poke around and try things – you can’t break it for anyone else!

I think there are a few key challenges to building what I would think of as the first ‘full’ version of ArchivesZ – listed here in no particular order:

  • In the process of creating version 2, I was too ambitious. The current version of ArchivesZ has lots of issues – some usability problems, some bugs (see the prototype above!).
  • Wherever a collaborative workspace for ArchivesZ ends up living, it will need large data sets. I did a lot of work on data from eleven institutions in the spring of 2009, so there is a lot of data available – but this remains a challenge.
  • A lot of my future ideas for ArchivesZ are trapped in my head. The good news is that I am honestly open to others’ ideas for where to take it in the future.
  • How do we build a community around the creation of ArchivesZ?

I still feel that there is a lot to be gained by building a centralized visualization tool/service through which researchers and archivists could explore and discover archival materials. I even think there is promise in a freestanding tool that supports exploration of materials within a single institution. I can’t build it alone. This is a good thing – it will be much better in the end with the input, energy and knowledge of others. I am good at ideas and good at playing the devil’s advocate. I have lots of strength on the data side of things, and visualization has been a passion of mine for years. I need smart people with new ideas, strong tech skills (or a desire to learn) and people who can figure out how to organize the herd of cats I hope to recruit.

So – what can you do to help ArchivesZ? Do you have mad ActionScript 3 skills? Do you want to dig into the scary little Ruby script that populates the database? Maybe you prefer to organize and coordinate? Have you always wanted to figure out how a project like this could grow from a happy (or awkward?) prototype into a real service that people depend on?

Do you have a vision for how to tackle this as a project? Open source? Grant funded? Something else clever?

Know any graduate students looking for good research topics? There are juicy bits here for those interested in data, classification, visualization and cross-repository search.

I will be at SAA in DC in August chairing a panel on search engine optimization of archival websites. If there is even just one of you out there who is interested, I would cheerfully organize an ArchivesZ summit of some sort in which I could show folks the good, bad and ugly of the prototype as it stands. Let me know in the comments below.

Won’t be at SAA but want to help? Chime in here too. I am happy to set up some shared desktop tours of whatever you would like to see.

PS: Yes, I do have all the version 2 code – and what is online at the Google Code ArchivesZ page is not up to date. Updating the ArchivesZ website and uploading the current code is on my to do list!