Improve metadata

You can help preserve books by improving metadata! First, read the background on metadata at Anna’s Archive, then learn how to improve metadata by linking with Open Library, and earn a free membership on Anna’s Archive.

Background

When you look at a book on Anna’s Archive, you can see various fields: title, author, publisher, edition, year, description, filename, and more. All those pieces of information are called metadata.

Since we combine books from various source libraries, we show whatever metadata is available in that source library. For example, for a book that we got from Library Genesis, we’ll show the title from Library Genesis’ database.

Sometimes a book is present in multiple source libraries, which might have different values for the same metadata field. In that case, we simply show the longest version of each field, since that one hopefully contains the most useful information! We still show the other values below the description, e.g. as “alternative title” (but only if they are different).
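As a rough illustration of the “longest version wins” rule, here is a minimal Python sketch. The function name `merge_field` and the sample titles are made up for this example; it only shows the idea and is not the actual Anna’s Archive code.

```python
def merge_field(values):
    """Pick the longest non-empty value; return the others as alternatives."""
    # Drop empty values and exact duplicates while preserving order.
    unique = []
    for value in values:
        value = (value or "").strip()
        if value and value not in unique:
            unique.append(value)
    if not unique:
        return None, []
    # On a tie in length, max() keeps the first candidate it sees.
    primary = max(unique, key=len)
    alternatives = [value for value in unique if value != primary]
    return primary, alternatives


# Hypothetical titles for the same book from three source libraries.
titles = ["Dune", "Dune (Dune Chronicles, Book 1)", "Dune"]
primary, alternatives = merge_field(titles)
print("title:", primary)
print("alternative titles:", alternatives)
```

Identical values are collapsed first, which matches the note that alternatives are only shown when they differ.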

We also extract codes such as identifiers and classifiers from the source library. Identifiers uniquely represent a particular edition of a book; examples are ISBN, DOI, Open Library ID, Google Books ID, or Amazon ID. Classifiers group together multiple similar books; examples are Dewey Decimal (DDC), UDC, LCC, RVK, or GOST. Sometimes these codes are explicitly linked in source libraries, and sometimes we can extract them from the filename or description (primarily ISBN and DOI).
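To give a feel for how an ISBN or DOI could be pulled out of a filename or description, here is a minimal Python sketch. The regular expressions and the function names (`extract_codes`, `isbn13_is_valid`) are assumptions made for this example, not the patterns Anna’s Archive actually uses, and only ISBN-13 is handled.

```python
import re

# ISBN-13: prefix 978 or 979, then 10 more digits, with optional "-" or " " separators.
ISBN13_RE = re.compile(r"(?<!\d)97[89](?:[- ]?\d){10}(?!\d)")
# DOI: "10.", a 4-9 digit registrant code, "/", then a suffix without whitespace.
DOI_RE = re.compile(r"(?<![\d.])10\.\d{4,9}/[^\s\"<>]+")


def isbn13_is_valid(digits):
    """Verify the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def extract_codes(text):
    """Return the ISBN-13s and DOIs found in a filename or description."""
    isbns = []
    for match in ISBN13_RE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group(0))
        # Keep only candidates whose check digit is correct.
        if isbn13_is_valid(digits) and digits not in isbns:
            isbns.append(digits)
    # Strip punctuation that usually belongs to the sentence, not the DOI.
    dois = [m.group(0).rstrip(".,;)") for m in DOI_RE.finditer(text)]
    return {"isbn13": isbns, "doi": dois}


print(extract_codes("Some_Book_978-0-306-40615-7.pdf, see doi:10.1000/182."))
```

Checking the ISBN check digit filters out most random 13-digit numbers that merely look like an ISBN.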

We can use identifiers to find records in metadata-only collections, such as OpenLibrary, ISBNdb, or WorldCat/OCLC. There is a specific metadata tab in our search engine if you’d like to browse those collections. We use matching records to fill in missing metadata fields (e.g. if a title is missing), or to show additional values, e.g. as “alternative title” (if a title already exists).
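The fill-in behaviour described above can be sketched in a few lines of Python. The function name `fill_from_match` and the sample records are hypothetical; this is only an illustration of the idea under those assumptions.

```python
def fill_from_match(record, matched):
    """Fill empty fields from a matched record; keep differing values as alternatives."""
    filled = dict(record)
    alternatives = {}
    for field, value in matched.items():
        if not value:
            continue
        if not filled.get(field):
            # The field is missing in the source library, so take the matched value.
            filled[field] = value
        elif filled[field] != value:
            # The field already exists, so the matched value becomes an alternative.
            alternatives[field] = value
    return filled, alternatives


# Hypothetical source-library record and a record matched via a shared ISBN.
record = {"title": "", "author": "Frank Herbert", "year": "1965"}
matched = {"title": "Dune", "author": "Herbert, Frank", "year": "1965"}
filled, alternatives = fill_from_match(record, matched)
print(filled)
print(alternatives)
```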

To see exactly where a book’s metadata came from, see the “Technical details” tab on a book page. It has a link to the raw JSON for that book, with pointers to the raw JSON of the original records.

For more information, see the following pages: Datasets, Search (metadata tab), Codes Explorer, and Example metadata JSON. Finally, all our metadata can be generated or downloaded as ElasticSearch and MariaDB databases.

Open Library linking

So if you encounter a file with bad metadata, how should you fix it? You can go to the source library and follow its procedures for fixing metadata, but what should you do if a file is present in multiple source libraries?

There is one identifier that is treated specially on Anna’s Archive. The annas_archive md5 field on Open Library always overrides all other metadata! Let’s back up a bit first and learn about Open Library.

Open Library was founded in 2006 by Aaron Swartz with the goal of “one web page for every book ever published”. It is a kind of Wikipedia for book metadata: everyone can edit it, it is freely licensed, and it can be downloaded in bulk. It is the book database most aligned with our mission; in fact, Anna’s Archive has been inspired by Aaron Swartz’ vision and life.

Instead of reinventing the wheel, we decided to redirect our volunteers towards Open Library. If you see a book that has incorrect metadata, you can help out in the following way:

Go to the Open Library website.
Find the correct book record. WARNING: be sure to select the correct edition. In Open Library, you have “works” and “editions”.
- A “work” could be “Harry Potter and the Philosopher's Stone”.
- An “edition” could be:
  - The 1997 first edition published by Bloomsbury with 256 pages.
  - The 2003 paperback edition published by Raincoast Books with 223 pages.
  - The 2000 Polish translation “Harry Potter i Kamień Filozoficzny” by Media Rodzina with 328 pages.
- All of those editions have different ISBNs and different contents, so be sure to select the right one!
Edit the record (or create it if none exists), and add as much useful information as you can! You’re here now anyway, so you might as well make the record really amazing.
Under “ID Numbers” select “Anna’s Archive” and add the MD5 of the book from Anna’s Archive. This is the long string of letters and numbers after “/md5/” in the URL.
- Try to find other files in Anna’s Archive that also match this record, and add those as well. In the future we can group those as duplicates on the Anna’s Archive search page.
When you’re done, write down the URL of the record that you just updated. Once you’ve updated at least 30 records with Anna’s Archive MD5s, email us the list. We’ll give you a free membership for Anna’s Archive, so you can more easily do this work (and as a thank you for your help). These have to be high-quality edits that add substantial amounts of information, otherwise your request will be rejected. Your request will also be rejected if any of the edits get reverted or corrected by Open Library moderators.
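If you are handling many files, a small helper can save copy-paste mistakes when pulling the MD5 out of a book URL. The sketch below assumes only what the steps above state, namely that the MD5 is the string after “/md5/” in the URL. The function names and the `example.org` URLs are placeholders made up for this example.

```python
import re
from urllib.parse import urlparse

# An MD5 is 32 hexadecimal characters; here it must directly follow "/md5/".
MD5_PATH_RE = re.compile(r"/md5/([0-9a-fA-F]{32})(?:/|$)")


def md5_from_url(url):
    """Return the lowercase MD5 from a book URL, or None if there is none."""
    match = MD5_PATH_RE.search(urlparse(url).path)
    return match.group(1).lower() if match else None


def collect_md5s(urls):
    """Extract MD5s from several URLs, skipping duplicates and invalid entries."""
    md5s = []
    for url in urls:
        md5 = md5_from_url(url)
        if md5 and md5 not in md5s:
            md5s.append(md5)
    return md5s


# Placeholder URLs; substitute the book pages you are actually working on.
urls = [
    "https://example.org/md5/d41d8cd98f00b204e9800998ecf8427e",
    "https://example.org/md5/D41D8CD98F00B204E9800998ECF8427E",
    "https://example.org/search?q=dune",
]
print(collect_md5s(urls))
```

Lowercasing the result means the same file is not counted twice when its URL appears with different letter case.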

Note that this only works for books, not academic papers or other types of files. For other types of files we still recommend finding the source library. It might take a few weeks for changes to be included in Anna’s Archive, since we need to download the latest Open Library data dump, and regenerate our search index.