We Need Open Standards in Webcomics
Submitted by Joey Manley on November 19, 2006 - 11:58
Before getting too far into this, a clarification: open standards and Open Source are not synonymous. Both are objectively good things, and many, but not all, Open Source projects also conform to open standards. Here's the difference, as I understand it: Open Source projects allow programmers to share actual source code, the internal stuff that makes programs work, with one another, and improve upon one another's ideas; open standards, on the other hand, allow programmers to write programs that interact well with programs written by others -- without necessarily having to have an understanding of the internals of the other programs. HTTP, which defines the transport mechanism for web pages, for example, is an open standard. Internet Explorer, a non-Open Source program from Microsoft, can implement HTTP to talk to the talkaboutcomics.com web server, which runs Apache, an example of Open Source software -- and Apache, in turn, can talk back to Internet Explorer using the HTTP open standard. Both web browser and web server are key pieces of software in your experience of browsing the World Wide Web. Neither has to know how the other works, in order to be able to work together -- and you, of course, don't have to know anything about any of that, more than likely. Which is probably why there hasn't been a lot of action on open standards in the webcomics world. Comics creators, after all, are artists. Open standards aren't about artists communicating with people (which is what you do when you make your artistic choices -- choices which should always be left infinitely free) -- open standards are about computers communicating with other computers and computer programs.
But if you're a webcartoonist, you should care about open standards, especially if you are using a webcomics automation system to run your comic, like my own WCN, or one of the other competing hosting/automation platforms, or even one of the many Open Source install-it-yourself systems. These are all examples of "content management systems." Unlike the "good old days" of hand-writing and hand-linking every HTML page yourself, a content management system runs like a machine -- you input the stuff that matters (your comic, your formatting choices) sometimes spending hours to get it just right, and developing, over time, an immense database, then the content management system creates your website for you, generating and linking all the pages automatically. For most webcartoonists, especially those carrying vast archives, using a content management system is the only way to go. There are obvious drawbacks, given the state of webcomics content management systems as they exist today (yes, even my own), but the benefits are even more obvious.
One of the benefits, which has not yet been fully realized, or even, frankly, touched upon much at all, is the possibility that those of us who create content management systems for webcomics could develop a body of open standards which would make life easier for webcomics artists. I see two areas where open standards could come in very, very handy indeed:
1. Searchability and Findability
2. Portability
More on those two below:
Have you ever heard of microformats? The idea, if I understand it correctly, is that the actual meaning of stuff from the Internet can be made more transparent, to other computers, by developing simple, universally defined markers for particular kinds of content, so that, for example, when the Google searchbot comes upon a web page that also happens to be a review, or a job posting, or a resume, it can be automatically indexed as such, and made available specifically to people who are looking for reviews, or job postings, or resumes, more immediately and reliably than the usual fuzzy language interpretation that "intelligent" searchbots try to perform today to categorize content along those lines. There are solid microformats for things like contact information (digital business cards, you might say) and calendars, and draft specs for things like reviews and resumes. There are plans for more. If we had a webcomic microformat, webcomics search engines and external portals, like OnlineComics.net and The Webcomic List, could take a quantum leap forward, no longer depending on human data entry from readers and cartoonists, but going out and discovering new webcomics (and their RSS feeds -- but that's a subject for another day) automatically.
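To make the idea concrete, here is a minimal sketch of what a crawler could do if such a microformat existed. Everything specific in it is an assumption for illustration: the class names "hcomic", "title", "author", "image" and "transcript" are invented here and are not part of any published microformat, and the sample page is made up. The parsing itself uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical class names for an imagined webcomic microformat.
# None of these come from a published specification.
ROOT_CLASS = "hcomic"
TEXT_FIELDS = {"title", "author", "transcript"}
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}  # tags with no end tag


class ComicParser(HTMLParser):
    """Collects one dict per element marked with the hypothetical root class."""

    def __init__(self):
        super().__init__()
        self.comics = []
        self._depth = 0           # how many non-void elements are currently open
        self._comic = None        # the dict being filled, if inside a comic block
        self._comic_depth = None
        self._field = None        # the text field being read, if any
        self._field_depth = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        # The image is a void element, so record its src right away.
        if self._comic is not None and tag == "img" and "image" in classes:
            self._comic["image"] = attrs.get("src")
        if tag in VOID_TAGS:
            return
        self._depth += 1
        if self._comic is None and ROOT_CLASS in classes:
            self._comic, self._comic_depth = {}, self._depth
        elif self._comic is not None and self._field is None:
            found = classes & TEXT_FIELDS
            if found:
                self._field, self._field_depth = found.pop(), self._depth
                self._comic[self._field] = ""

    def handle_data(self, data):
        if self._field is not None:
            self._comic[self._field] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._field is not None and self._depth == self._field_depth:
            # Collapse runs of whitespace so line breaks in the HTML don't matter.
            self._comic[self._field] = " ".join(self._comic[self._field].split())
            self._field = None
        if self._comic is not None and self._depth == self._comic_depth:
            self.comics.append(self._comic)
            self._comic = None
        self._depth -= 1


def extract_comics(html):
    """Return a list of dicts, one per marked-up comic found in the page."""
    parser = ComicParser()
    parser.feed(html)
    parser.close()
    return parser.comics


# A made-up page using the hypothetical markers.
PAGE = """
<div class="hcomic">
  <h1 class="title">Example Strip #12</h1>
  <span class="author">A. Cartoonist</span>
  <img class="image" src="/comics/0012.png" alt="">
  <p class="transcript">Robot: Oh no.
     Cat: Oh yes.</p>
</div>
"""

comics = extract_comics(PAGE)
```

The point of the sketch is that the crawler never has to guess: it looks for agreed-upon markers, and anything it finds is a webcomic by definition.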
Likewise, we need some means of sharing webcomics transcriptions and other metadata. At present, most webcomics don't offer transcriptions, of course. That needs to be fixed. What's more, the only automated way to offer transcriptions, OhNoRobot, is a sealed black box, from a computer's perspective -- available to humans to play with, but not fully interactive to other computers and (specifically) difficult for other search engines and content management systems to extract data from and submit data to. I've had some preliminary conversations with Ryan about this (they were dropped because I got busy -- totally my fault, not Ryan's), but there's a lot of work that needs to be done on this issue. The people who make webcomics automation software, like me, need to work hard to make transcription a key part of our projects, either using OCR, having the cartoonists enter transcriptions by hand when uploading the comic, or relying on fans to input transcriptions (the way OhNoRobot does), or some combination of the three. But that's just the beginning. There also needs to be an automated way to extract the transcriptions from the black boxes where they're stored -- ONR being the only significant such black box so far -- so that, say, instead of building a transcription engine within WCN itself, I could, if I wanted, just have WCN connect automatically to ONR and extract the information, for my own indexing and search functionality. Or, assuming that I decide to build my own transcription storing house (which I do), ONR should be able to connect automatically to WCN and do the same. And maybe there's some way to synchronize the two, so that only one transcription is ever the "official" transcription of a comic, and any search engine or content management system participating in the open standard knows exactly which one it is, and where it's stored, and how to access it and index it.
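As a rough illustration of the "only one official transcription" idea, here is a small Python sketch. It is not how WCN or ONR work; the record fields, the example URLs, and the tie-breaking rule (latest update wins, with ties going to the holder whose name sorts first) are all assumptions chosen so that every participating system would arrive at the same answer independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """One system's copy of a transcription (hypothetical record layout)."""
    comic_url: str   # permanent URL of the comic page; the shared key
    text: str
    holder: str      # which system stores this copy, e.g. "WCN" or "ONR"
    updated: str     # UTC timestamp, always formatted YYYY-MM-DDTHH:MM:SSZ


def _beats(a, b):
    """Assumed rule: the later update wins; ties go to the first holder name."""
    if a.updated != b.updated:
        # Same fixed format, so plain string comparison orders by time.
        return a.updated > b.updated
    return a.holder < b.holder


def pick_official(copies):
    """Return {comic_url: Transcript} with exactly one official copy per comic."""
    official = {}
    for copy in copies:
        current = official.get(copy.comic_url)
        if current is None or _beats(copy, current):
            official[copy.comic_url] = copy
    return official


# Made-up data: two systems hold copies, one comic overlaps.
wcn_copies = [
    Transcript("http://example.com/comic/1", "Robot: Oh no.", "WCN",
               "2006-11-01T10:00:00Z"),
]
onr_copies = [
    Transcript("http://example.com/comic/1", "Robot: Oh no!", "ONR",
               "2006-11-05T09:30:00Z"),
    Transcript("http://example.com/comic/2", "Cat: Hello.", "ONR",
               "2006-11-02T08:00:00Z"),
]

official = pick_official(wcn_copies + onr_copies)
```

Each entry in the result says which text is official and which system holds it, which is the minimum a search engine or another content management system would need to know.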
At some point, Ryan and I will probably make this happen for WCN and ONR. But without greater participation from others -- from Comic Genesis and DrunkDuck and the myriad standalone webcomics automation packages out there, as well as other search engine developers, whether within the webcomics community or from outside it -- any cooperation Ryan and I manage to accomplish will only serve to make WCN better (and ONR better) but will not really help webcartoonists and webcomics readers, except for the ones who happen to use our own particular websites.
What we really need is an open mechanism for the creation, formatting, discovery, extraction and synchronization of webcomics transcriptions and other metadata across as many networks and systems as possible. This will make the development and popularization of more webcomics-specific search engines and portals feasible, and that's a good thing. But maybe even more importantly, it will give us a better way to communicate with the Googles, Yahoos and Ask.coms of the world. Presently, any given webcomic, whose content is "locked" within an image file that is opaque to search engine bots, operates at a disadvantage compared with, say, a blog entry, or some other standard text-based nugget of content, when it comes to searchability and findability.
Some artists try. They stick transcriptions in their meta headers. Others use "alt" attributes for their image tags. Others just copy/paste the transcription into the visible part of the page, beneath the comic itself. And so on, and so on. Humans can handle that kind of thing -- variable information in variable places. Computers cannot -- or, at least, they're not very good at it. They work best chewing up controlled, well-defined formats and spitting them back out in an index. An open standard, providing predictable and repeatable ways for computers to extract and "understand" the textual content from a webcomic, would go a long way toward solving the searchability/findability problem, for the Googles, for the webcomics-specific projects, and for individual people, too.
Now on to number two: portability. In this case, I'm not talking about being able to read comics on cellphones or PSPs or whatever. What I mean is the ability of a cartoonist to easily pick up his/her webcomic archive, and move it from one automation system to another, or even just to store it on his/her own hard drive for backup purposes. Webcomics automation systems need to be able to spit out their entire archives, to the owner of the material, on demand, and in a format that allows for immediate upload to another system or restoration on the current system, no questions asked. Those of us who build webcomics automation systems work hard, and we deserve some perks (and some of us get quite a few), but one of those perks isn't a lifetime entitlement to our current list of users. Yet any webcartoonist who has been using a particular automation system for, say, five years, will find that switching systems is difficult as hell. It shouldn't be this way.
I've looked into providing portability on WCN. At the very least, I hope to provide backup-ability (a backup which includes the image files, of course, as well as all the commentary, metadata, transcriptions, and organizing structure of the archive itself) -- but my ability to do so only extends to the WCN system itself. That is, I can let you download your archive, but the only system you'll be able to re-upload it automatically to is another WCN-managed system (Modern Tales, say, or Rocket Pirates). That's because there's no standard format, understood by all the other webcomics automation systems, in existence. It's true that, as a developer, I could spend my time creating a way to output your data to Comic Genesis' date-based filenaming system, or whatever -- but that's not feasible for me. For one thing, I could only support so many particular formats, meaning that a few would be privileged over the others -- strengthening a couple of key competitors, and helping to drive the top 3 hosting sites (mine, Keen's, and DD) even higher up on the food chain than they already are, while leaving start-ups and less popular sites in the dark, still. For another, there's the high probability that Keen or DD could change their format -- either deliberately or incidentally -- breaking my ability to provide automatically uploadable archives. Until there's an open standard, it's not in my interest to do any of this. Once there's an open standard, it's very much in my interest to do this -- because everybody else will be doing it, too (or will be failing in the marketplace, assuming large numbers of cartoonists are educated about the importance of portability and backup-ability) -- meaning that every automation system will be competing directly on its own merits, not on its ability to lock in cartoonists with large archives and make it difficult for them to move away.
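For a sense of how little a shared archive format would need, here is a minimal sketch in Python. No such standard exists; the format name, the manifest fields and the folder layout are all invented for illustration. The archive is an ordinary zip file holding the images plus one manifest.json that carries the dates, transcriptions, commentary and archive order, and the import function simply reverses the export.

```python
import io
import json
import zipfile

# Hypothetical identifiers; no such standard format exists.
FORMAT_NAME = "webcomic-archive"
FORMAT_VERSION = 1
MANIFEST_KEYS = ("filename", "posted", "transcript", "commentary")


def export_archive(title, strips):
    """Pack a comic into zip bytes: manifest.json plus the image files.

    `strips` is a list of dicts with the MANIFEST_KEYS plus "image_bytes".
    The order of the list is the reading order of the archive.
    """
    manifest = {
        "format": FORMAT_NAME,
        "version": FORMAT_VERSION,
        "title": title,
        "strips": [{key: s[key] for key in MANIFEST_KEYS} for s in strips],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for s in strips:
            archive.writestr("images/" + s["filename"], s["image_bytes"])
    return buffer.getvalue()


def import_archive(data):
    """Read zip bytes back into (title, strips); reject unknown formats."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        if (manifest.get("format") != FORMAT_NAME
                or manifest.get("version") != FORMAT_VERSION):
            raise ValueError("not a recognized webcomic archive")
        strips = [
            dict(entry, image_bytes=archive.read("images/" + entry["filename"]))
            for entry in manifest["strips"]
        ]
    return manifest["title"], strips


# Made-up one-strip comic; the image bytes are a stand-in, not a real PNG.
sample_strips = [{
    "filename": "0001.png",
    "image_bytes": b"fake image data",
    "posted": "2006-11-19",
    "transcript": "Robot: Oh no.",
    "commentary": "First strip!",
}]

archive_bytes = export_archive("Example Comic", sample_strips)
restored_title, restored_strips = import_archive(archive_bytes)
```

If every automation system could write and read the same manifest, the same file would serve as a backup and as a moving van.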
So there you go. Thoughts?