Metadata Hell

[Metadata screencap]

I’ve been in metadata tagging hell.

I updated iTunes a few weeks ago and somehow managed to wipe out my music library. Not to worry — it was only metadata that got lost, not any files. So reconstructing the library wouldn’t be difficult, just tedious.

I’ve used MusicBrainz to manage the tagging of my files since 2006, but the service has expanded greatly since then and now supports a number of fields that weren’t available when I first ran my library through its database.

So I thought I’d take the time to re-tag my collection and perhaps contribute a few more edits to MusicBrainz itself.

The effort reminded me of a big blind spot that the recorded music industry refuses to acknowledge — the lack of an industry-wide standard for metadata.

Yes, I know there are technical specifications for CD-TEXT, and developers of audio formats have their own specs for metadata. But that fragmentation is exactly the issue, particularly when it comes to compilations, classical and even world music. Throw in internationalization and localization, and things get really hairy.

And these efforts are being led by technologists, not the recorded music industry itself. Rather, the technologists are extrapolating their specifications from generally accepted practices which have no documentation (that I know of, at least).

What you get is a lot of redundancy. MusicBrainz, freedb, Discogs, Rate Your Music and Collectorz all depend on user-contributed data, and each has its own schema and style guide for that data. None of them can consult an open standard to resolve questions such as how to prioritize the billing on classical releases, or how to group solo artists who make a duet album, or how to capitalize English-language titles on international releases where capitalization is stylized.
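To make the capitalization question concrete, here is a rough Python sketch. Both rules and the album title are made up for illustration; neither function reflects the actual style guide of any service named above. The point is only that two reasonable rules, applied to the same stylized title, produce two different "correct" entries.

```python
# Hypothetical style rules, invented for illustration only.
# Neither corresponds to any real service's style guide.
MINOR_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for"}


def title_case_style(title: str) -> str:
    """Capitalize every word except minor words that are not first or last."""
    words = title.lower().split()
    last = len(words) - 1
    return " ".join(
        w if (w in MINOR_WORDS and 0 < i < last) else w.capitalize()
        for i, w in enumerate(words)
    )


def as_printed_style(title: str) -> str:
    """Keep the title exactly as it appears on the release."""
    return title


# A made-up title with stylized capitalization.
stylized = "the sound of a GOOD day"

print(title_case_style(stylized))   # normalized to English title case
print(as_printed_style(stylized))   # preserved as printed on the cover
```

Without a shared standard, nothing says which of the two outputs belongs in the title field, so the same release can end up filed two ways in two databases.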

Perhaps Gracenote, the commercial data juggernaut, has the capacity to normalize this data, but I doubt it. The lack of an open standard still means Sony Classical can submit data to Gracenote according to its style, while Deutsche Grammophon follows its own standard. Just because a data field is available doesn’t mean users are going to put the right information in there.

I’m not the first person to suggest centralizing all this metadata, but I won’t go so far as advocating for an industry-provided database. Big labels and technology companies don’t exactly have a warm relationship. Remember the Secure Digital Music Initiative?

But a standard would be nice. Maybe even an international standard. Just something to which people can refer and know that from service to service, the data will be somewhat clean.

As it now stands, it’s up to listeners to provide data to all these services, and it’s tiring dealing with these competing schemas and all the bad input they invite.