Metadata World Building: Freebase.com and OpenLibrary.org

July 17, 2007

I find it interesting to have discovered both Freebase.com (an open shared database of the world’s knowledge) and the Open Library Demo Site (ultimately intended to be a library of all the world’s information about all the world’s books) in the same week. Both are counting on crowdsourcing to populate large databases of information, all of which will be available for use by the general public. Freebase.com’s data is all licensed under Creative Commons CC-BY, while Open Library states that the data must be “a product of the people: letting them create and curate its catalog, contribute to its content, participate in its governance, and have full, free access to its data”.

For the Open Library project, the creators “wrote a new type of wiki that lets users enter structured data”. They have a page showing the data structure for bibliographic items on a Schema page. They also document their creation of a database framework called ThingDB which was designed to hold huge quantities of records, “hold arbitrary semi-structured data” and handle history/versioning.
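To picture what "arbitrary semi-structured data" plus history/versioning could look like, here is a minimal Python sketch. It is not ThingDB's actual design or API: the `VersionedStore` class, its method names and the `author/mark_twain` key are all hypothetical, chosen only to illustrate the idea of keeping every saved version of a free-form record.

```python
from copy import deepcopy


class VersionedStore:
    """Toy store (hypothetical, not ThingDB): each key keeps every saved version."""

    def __init__(self):
        self._history = {}  # key -> list of record versions, oldest first

    def save(self, key, record):
        """Append a new version of the record and return its 1-based version number."""
        versions = self._history.setdefault(key, [])
        versions.append(deepcopy(record))
        return len(versions)

    def get(self, key, version=None):
        """Return the latest version, or a specific 1-based version if given."""
        versions = self._history[key]
        index = len(versions) - 1 if version is None else version - 1
        return deepcopy(versions[index])


store = VersionedStore()
# Records are plain dicts, so each one can carry whatever fields it needs.
store.save("author/mark_twain", {"name": "Mark Twain"})
store.save("author/mark_twain",
           {"name": "Mark Twain", "alternate_name": "Samuel Clemens"})
```

Because nothing is overwritten, an earlier state of a record can always be read back with `store.get(key, version=1)`.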

For Freebase, the data structure itself is evolving – and alpha users have a major hand in guiding that evolution. They have three major building blocks: Topics, Properties and Types. A Topic may be anything about which you want to create a Freebase entry: a person, place, thing, or idea. Properties are exactly what they sound like – such as the name of a person, the location of a place or the material a thing is made of. The Types are what make it all interesting. A Freebase Type is a set of Properties. For any Topic that is about a person, it makes sense that you want an easy way to add all the ‘person’ related Properties to that Topic at once. But a Topic can have more than one Type associated with it – and a Property can be populated by values that themselves are (or become) Topics. I would give you links to examples in Freebase, but to get your fingers in the mix at this point you need to put your name in the hat and wait for them to invite you to play with the alpha version.
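The relationship between Topics, Types and Properties can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Type` and `Topic` classes and their fields are hypothetical and do not reflect Freebase's real schema or API. The point is simply that a Type is a named set of Properties, and a Topic gains Properties by being assigned Types.

```python
from dataclasses import dataclass, field


@dataclass
class Type:
    name: str
    properties: tuple  # names of the Properties this Type contributes


@dataclass
class Topic:
    name: str
    types: list = field(default_factory=list)   # a Topic can have several Types
    values: dict = field(default_factory=dict)  # Property name -> value

    def add_type(self, new_type):
        """Associate a Type with this Topic (ignored if already present)."""
        if new_type not in self.types:
            self.types.append(new_type)

    def available_properties(self):
        """All Property names offered by the Topic's Types, in order, no repeats."""
        names = []
        for topic_type in self.types:
            for prop in topic_type.properties:
                if prop not in names:
                    names.append(prop)
        return names


person = Type("Person", ("Gender", "Date of Birth", "Place of Birth"))
author = Type("Author", ("Books Written",))

twain = Topic("Mark Twain")
twain.add_type(person)
twain.add_type(author)
twain.values["Gender"] = "Male"
```

Adding the Author Type is what makes "Books Written" available on the Topic; no Property has to be attached to the Topic one at a time.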

Let us consider Books. Freebase already has the notion of an Author. Any Topic associated with the Author Type automagically can have the books that person wrote associated with it as values of a multi-value Property. Each book added instantly becomes a new Topic (or you can pick it from a list if Freebase already knows about the book) with all the Properties you need for a book. And yes – it already knows who the Author is if you added the book from the Author’s page.
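A rough sketch of that behavior, assuming a simple in-memory dictionary of topics: the function names and the dictionary layout below are hypothetical and are not how Freebase is implemented. It shows a Property value becoming a Topic in its own right, with the link back to the Author filled in automatically.

```python
topics = {}  # topic name -> {"types": set of Type names, "properties": dict}


def get_or_create_topic(name):
    """Return the existing topic with this name, or create an empty one."""
    return topics.setdefault(name, {"types": set(), "properties": {}})


def add_book_to_author(author_name, book_title):
    """Record a book on the author's page; the book itself becomes a topic."""
    author = get_or_create_topic(author_name)
    author["types"].add("Author")

    book = get_or_create_topic(book_title)  # reused if the book is already known
    book["types"].add("Book")

    written = author["properties"].setdefault("Books Written", [])
    if book_title not in written:
        written.append(book_title)

    # The reverse link is filled in without asking the user again.
    book["properties"]["Author"] = author_name


add_book_to_author("Mark Twain", "Adventures of Huckleberry Finn")
add_book_to_author("Mark Twain", "The Adventures of Tom Sawyer")
```

After these two calls there are three topics: the author and one for each book, each book already pointing back at Mark Twain.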

At the time I wrote this post, the Freebase topic page for Mark Twain shows the following associated types:

Person (People)
Film Writer (Film)
Author (Publishing)
Deceased Person (People)
Influence Node (mikelove’s types)
Book Subject (Publishing)

The values in parentheses above are the Domains to which a Type belongs. You will note that the ‘Influence Node’ Type is associated with a domain called mikelove’s types. This is because Freebase lets individuals create new Types. These Types can later be promoted for use by anyone – and if I am reading the help properly, can be ‘published’ for others to use even before they are ‘promoted’ to belong to an official Domain.

For Mark Twain, each of the Types listed above brings the opportunity to populate various Properties of structured data. Here are all the structured data Properties available for population for Mark Twain (some with their current values):

Name: Mark Twain
Description: currently imported from Wikipedia
Also known as: Samuel Langhorne Clemens
Gender: Male
Date of Birth: Nov 30, 1835
Place of Birth: Florida, Missouri
Country Of Nationality: United States
Profession
Religion
Spouse(s)
Parents: Jane Lampton Clemens, John Marshall Clemens
Children
Sibling(s)
Height (meters)
Weight (kg)
Date of Death: 1910
Place of Death: Redding, Connecticut
Cause Of Death
Date of burial
Place of burial
Web Links
Employment History
Education
Quotations
Film Writing Credits
Books Written
Short Stories Written
Essays/Articles Written
Influenced By
Peers
Influenced
Books About This Topic
Short Works of Non-Fiction About This Topic

In contrast – take a look at the Open Library page for Mark Twain – these are the structured data elements I spotted (populated or otherwise) on that page:

Name
Text Entry
November 30, 1835 – April 21, 1910
Genres
Related Authors
Website
Location: Elmira, New York (buried)
Alternate Name: Samuel Clemens
Books by this author

I wonder how much of the book/publishing/literary world of data will be duplicated between Open Library and Freebase. They are both very young (‘alpha’ for Freebase vs ‘early technology preview’ for Open Library) and both are throwing open their arms for help. Open Library has a special page about the librarianship and another about how you can help. If you can get the secret pass into Freebase Alpha, there are plenty of discussions, examples, demos, help and opportunities to contribute. (I set the value of the Gender Property on the Mark Twain record while pondering it for this blog post).

The biggest difference between these two projects relates to what each is trying to accomplish. Freebase wants to be a super flexible universal database of knowledge that can be used to power applications (and they have tons of tools and APIs aimed at developers). Open Library is all about books, books, books – and all things related to them. Freebase has many Topics assigned the Book Type (8,749), but has even more associated with Person (355,359), Restaurant (99,971), City/Town (59,848), Company (20,405), Film (22,641) and many more.

Freebase is the growing creation of Metaweb (made up of veterans of Netscape, The Internet Archive, Alexa, Tellme, Intel and Broderbund). The Open Library Demo includes a tidy list of the people who created it. I noticed Alexa and The Internet Archive on both lists – small world.

I love the idea of Open Library (and will enthusiastically follow its progress), but personally I am more excited to work on archives-related data in Freebase. I am pondering the creation of a new Archives Type and an Archival Collection or Archival Finding Aid Type. Sound like fun? Ask for an Alpha account and let me know when you get one … I would love to brainstorm this with others of a like mind.

Posted in access, appraisal, metadata, software

10 Comments

Jane Stevenson
July 17, 2007 at 10:14 am

This looks very interesting. I’ve signed up for an account. Is Freebase selective or do all-comers get an account? It will be great to add some archival types into the mix.
Jeanne Post author
July 17, 2007 at 10:18 am

I don’t know how selective they are – all I know is that I stuck my name in the magic box and got an invite after not a long time (a week? two weeks?).
Bruce Smith
July 17, 2007 at 10:32 pm

Wow,

Good job on representing archives in the open source world. Thank you.

I am curious, what will you populate the freebase with? Data from where you work? Data that you find on the web? Do you have a particular research interest that you want to share with the world? I find that I often do not quite know how to participate in these projects (such as the archivist’s toolkit) as an individual archivist, because I have no body of archival data to play with (I am not sure if the organizational culture of the archive where I work is ready for this open source world).

I will try to find the time to join, but if I don’t is ‘collection’ the best term for archives? Could it be Archival Unit, Group, or Body?

“Perhaps the only records that do not reflect organic activity are artificial collections of private papers brought together by collectors or by archivists themselves.” -T. R. Schellenberg. 1961. The American Archivist 24(1):12.

Thanks again. And as always, wonderful blog and I love the ReCaptcha!

Bruce
Jeanne Post author
July 18, 2007 at 12:24 am

Bruce,

I don’t have any special access to archival data – but am enthusiastic to both empower those with access to data as well as help support capture of public information already available online.

I guess I had at least two kinds of Types in mind. The first would include properties of institutions that have archival and/or manuscript collections. This would permit inclusion of information such as collecting policies or specializations. The second would be for an archival unit or collection. I don’t want to recreate entire finding aids (EAD or otherwise) in freebase – but I suspect there is a tidy subset of EAD elements that would be very useful to include as properties of a collection.

This all ties back to my desire to create applications based on structured data about archives and collections – much as I have begun working towards with the ArchivesZ prototype.

I suspect I will be posting more on this soon. Thanks for the feedback – it is very appreciated!
Peter Van Garderen
July 18, 2007 at 3:40 pm

Thanks for this great intro to Freebase Jeanne. As usual, you are on the forefront of exploring new technologies and anticipating how they might be applied to improve access to archives. I applied for a Freebase account over two months ago and only got an invite a couple of days ago so I am not sure how they are handling that. Anyway, this project is amazing. In my opinion their technology and methodology has finally brought the power of the semantic web to the everyday web. Their API, querying language and sample applications are just awesome and the potential is huge.
Jane Stevenson
August 9, 2007 at 11:23 am

Hi Jeanne,

I’ve got a Freebase account now and have found that my service, the Archives Hub, is already represented on there as the data has been taken from Wikipedia. I created a type ‘Archive’ so that I could classify the Archives Hub under this type. But I’m now thinking that we would need Archive (Collection) and Archive (repository). I wondered whether we could think about this together? Terminology might be different in the UK and the US so between us we could maybe come up with something?

If you’d like to email me it’s: jane.stevenson@manchester.ac.uk

I’d like to think about the properties to add as well. I must say it’s quite impressive, although there is old wikipedia data in there that confuses things a bit.

cheers,
Jane.
David Mattison
October 7, 2007 at 4:50 pm

As I’m working on an article for Searcher magazine (http://www.infotoday.com/searcher) about Freebase that I’m submitting on October 15, I’d be interested in reactions to and use of Freebase by archivists and librarians. So far Jeanne’s comment on my blog (http://www.davidmattison.ca/wordpress/?p=2146#comments) is the only reaction I’ve received. I also sent the same question to a couple of library and archival mailing lists. I’m also looking at the Domain of Publishing in Freebase as one of the examples for my article and hope to do some work on the archives area. Freebase alpha has some structural defects that need addressing, most immediately in the area of the workflow between the registered user and Metaweb staff. Using discussion forums as the primary method of user submissions seems totally at odds with all the other advanced capabilities of Freebase.
Pingback: Pondering Structured Data About Archives: Archives Wiki, Freebase and OCLC’s World Map & WikiD - SpellboundBlog.com - spellbound by archival science and information technology in the digital age
George Oates
June 25, 2010 at 6:14 pm

Hi there – I wondered if you’d seen the redesign of the Open Library site that went live in early May? We’re also beginning to poke at working with Freebase more directly – they’ve been making use of OL data for some time, and have even (apparently) started helping to spot duplicates etc, which is a struggle we’re still fighting with at OL. The future is bright!
Jeanne Post author
June 30, 2010 at 11:15 pm

@George – thanks for the pointer. I will definitely check it out!
