I find it interesting to have discovered both Freebase.com (an open shared database of the world’s knowledge) and the Open Library Demo Site (ultimately intended to be a library of all the world’s information about all the world’s books) in the same week. They are both counting on crowdsourcing to populate large databases of information – all of which will be available for use by the general public. Freebase.com’s data is all licensed under Creative Commons CC-BY, while Open Library states that its data must be “a product of the people: letting them create and curate its catalog, contribute to its content, participate in its governance, and have full, free access to its data”.
For the Open Library project, the creators “wrote a new type of wiki that lets users enter structured data”. Their Schema page shows the data structure for bibliographic items. They also document their creation of a database framework called ThingDB, which was designed to hold huge quantities of records, “hold arbitrary semi-structured data” and handle history/versioning.
For Freebase, the data structure itself is evolving – and alpha users have a major hand in guiding that evolution. They have three major building blocks: Topics, Properties and Types. A Topic may be anything about which you want to create a Freebase entry: a person, place, thing, or idea. Properties are exactly what they sound like – such as the name of a person, the location of a place or the material a thing is made of. The Types are what make it all interesting. A Freebase Type is a set of Properties. For any Topic that is about a person, it makes sense that you would want an easy way to add all the ‘person’ related Properties to that Topic at once. But a Topic can have more than one Type associated with it – and a Property can be populated by values that themselves are (or become) Topics. I would give you links to examples in Freebase, but to get your fingers in the mix at this point you need to put your name in the hat and wait for them to invite you to play with the alpha version.
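To make the Topic/Property/Type relationship a bit more concrete, here is a minimal Python sketch of how I picture it. This is only my own illustration, not Freebase's actual implementation or API: the class names `FreebaseType` and `Topic` and their methods are hypothetical, and the property lists are trimmed down.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FreebaseType:
    """Hypothetical model: a Type is just a named set of Property names."""
    name: str
    properties: tuple[str, ...]


@dataclass
class Topic:
    """Hypothetical model: a Topic carries any number of Types."""
    name: str
    types: list[FreebaseType] = field(default_factory=list)
    values: dict[str, object] = field(default_factory=dict)

    def add_type(self, new_type: FreebaseType) -> None:
        # Attaching a Type makes all of its Properties available at once.
        if new_type not in self.types:
            self.types.append(new_type)

    def available_properties(self) -> list[str]:
        # Union of the Properties of every attached Type, in attachment order.
        seen: list[str] = []
        for topic_type in self.types:
            for prop in topic_type.properties:
                if prop not in seen:
                    seen.append(prop)
        return seen

    def set_value(self, prop: str, value: object) -> None:
        # Only Properties contributed by an attached Type can be populated.
        if prop not in self.available_properties():
            raise KeyError(f"{prop!r} is not provided by any Type on {self.name!r}")
        self.values[prop] = value


# Illustrative Types with shortened property lists (assumed, not official).
person = FreebaseType("Person", ("Gender", "Date of Birth", "Place of Birth"))
author = FreebaseType("Author", ("Books Written",))

twain = Topic("Mark Twain")
twain.add_type(person)
twain.add_type(author)
twain.set_value("Gender", "Male")
```

The point of the sketch is that a Topic never declares Properties itself; it picks them up from whichever Types are attached to it.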
Let us consider Books. Freebase already has the notion of an Author. Any Topic associated with the Author Type automagically can have books the person wrote associated with them as values for a multi-value property — and each of the books added will instantly become a new Topic (or you can pick them from a list if Freebase already knows about the book) with all the Properties you need for a book (and yes – it already knows who the Author is if you added the book from the Author’s page).
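Here is a rough sketch of that author-and-book behavior as I understand it from using the site. Everything in it is an assumption made for illustration: the `TopicStore` class, its method names and the dictionary layout are mine, not anything Freebase publishes.

```python
class TopicStore:
    """Hypothetical in-memory stand-in for a collection of Topics."""

    def __init__(self):
        self.topics = {}  # topic name -> {"name", "types", "props"}

    def get_or_create(self, name, type_name):
        # Reuse the Topic if it is already known, otherwise create it.
        topic = self.topics.setdefault(
            name, {"name": name, "types": set(), "props": {}}
        )
        topic["types"].add(type_name)
        return topic

    def add_book_to_author(self, author_name, title):
        author = self.get_or_create(author_name, "Author")
        book = self.get_or_create(title, "Book")
        # "Books Written" is multi-valued, so it is modeled as a list.
        written = author["props"].setdefault("Books Written", [])
        if title not in written:
            written.append(title)
        # The new book already knows its author, because it was
        # added from the author's side.
        book["props"]["Author"] = author_name
        return book


store = TopicStore()
store.add_book_to_author("Mark Twain", "Adventures of Huckleberry Finn")
store.add_book_to_author("Mark Twain", "The Adventures of Tom Sawyer")
```

After those two calls the store holds three Topics: one Author and two Books, each Book pointing back at Mark Twain.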
At the time I wrote this post, the Freebase topic page for Mark Twain shows the following associated types:
- Person (People)
- Film Writer (Film)
- Author (Publishing)
- Deceased Person (People)
- Influence Node (mikelove’s types)
- Book Subject (Publishing)
The values in parentheses above are the Domains to which a Type belongs. You will note that the ‘Influence Node’ Type is associated with a domain called mikelove’s types. This is because Freebase lets individuals create new Types. These types can later be promoted for use by anyone – and if I am reading the help properly, can be ‘published’ for others to use even before they are ‘promoted’ to belong to an official Domain.
For Mark Twain, each of the Types listed above brings the opportunity to populate various Properties of structured data. Here are all the structured data Properties available for population for Mark Twain (some with their current values):
- Name: Mark Twain
- Description: currently imported from Wikipedia
- Also known as: Samuel Langhorne Clemens
- Gender: Male
- Date of Birth: Nov 30, 1835
- Place of Birth: Florida, Missouri
- Country Of Nationality: United States
- Parents: Jane Lampton Clemens, John Marshall Clemens
- Height (meters)
- Weight (kg)
- Date of Death: 1910
- Place of Death: Redding, Connecticut
- Cause Of Death
- Date of burial
- Place of burial
- Web Links
- Employment History
- Film Writing Credits
- Books Written
- Short Stories Written
- Essays/Articles Written
- Influenced By
- Books About This Topic
- Short Works of Non-Fiction About This Topic
In contrast – take a look at the Open Library page for Mark Twain – these are the structured data elements I spotted (populated or otherwise) on that page:
- Text Entry
- November 30, 1835 – April 21, 1910
- Related Authors
- Location: Elmira, New York (buried)
- Alternate Name: Samuel Clemens
- Books by this author
I wonder how much of the book/publishing/literary world of data will be duplicated between Open Library and Freebase. They are both very young (‘alpha’ for Freebase vs ‘early technology preview’ for Open Library) and both are throwing open their arms for help. Open Library has a special page about the librarianship and another about how you can help. If you can get the secret pass into Freebase Alpha, there are plenty of discussions, examples, demos, help and opportunities to contribute. (I set the value of the Gender Property on the Mark Twain record while pondering it for this blog post.)
The biggest difference between these two projects relates to what each is trying to accomplish. Freebase wants to be a super flexible universal database of knowledge that can be used to power applications (and they have tons of tools and APIs aimed at developers). Open Library is all about books, books, books – and all things related to them. Freebase has many Topics assigned the Book Type (8,749), but has even more associated with Person (355,359), Restaurant (99,971), City/Town (59,848), Company (20,405), Film (22,641) and many more.
Freebase is the growing creation of Metaweb (made up of veterans of Netscape, The Internet Archive, Alexa, Tellme, Intel and Broderbund). The Open Library Demo includes a tidy list of the people who created it. I noticed Alexa and The Internet Archive on both lists – small world.
I love the idea of Open Library (and will enthusiastically follow its progress), but personally I am more excited to work on Archives related data in Freebase. I am pondering the creation of a new Archives Type and an Archival Collection or Archival Finding Aid Type. Sound like fun? Ask for an Alpha account and let me know when you get one … I would love to brainstorm this with others of a like mind.