Interactive Comics? by Neil Cohn

Sometimes, when people hear my proposal that the "comics medium" is literally a visual language (VL), I receive a response of disbelief stating something like, "What, do you expect people to carry around little pads of paper so they can 'talk' in comic form?" Statements like this bring up an important aspect of language that is essential to address in visual language studies: the social and interactive role of language. Throughout this article, I will address how the role of social interactivity with regard to comics contributes to a further understanding of visual language.

As humans' primary means of communication, language serves as an important connection between people. Given that most language is produced dynamically in real-time, while drawings usually are not, it is no wonder that the prospect of a visual language might seem perplexing in light of language's social aspects. However, such concerns can be answered by considering where our modern VL finds most of its exposure: comics, a recognizable print culture. In print, VL faces the same restrictions as written language, separating the "encoder" from the "decoder" across both space and time.

This was not always the case. Some early comic authors actually performed their works. In the early 1900s, many well-known cartoonists had vaudeville acts where they would speedily draw pictures on a sketch-pad, canvas, or blackboard, often while performing voices and commentary. These "chalk talks" or "lightning sketches" emerged from Victorian parlor acts in the late nineteenth century, and quickly produced several celebrity performers both in America and Europe.

In fact, it was the connections to these sorts of real-time performances that inspired Winsor McCay (1869-1934) to create his famous cartoon Gertie the Dinosaur, which opened the door for animated features in movie theatres. While performance is not entirely the same as the interactive use of language, it does provide a middle ground between static print and interactive exchange. Moreover, this mixing of verbal and visual expression in real-time reflects a common practice of storytelling that has pervaded human history across cultural boundaries.

Taking a "modality holistic" view of language, these methods should be expected. Real-time narratives often combine all three "modalities" in which language can manifest: spoken, gestured, and drawn. Indeed, spoken language rarely occurs in isolation, save for restricted media such as radio and, in written form, as bare text. In person, most speech is accompanied by gesturing, which occurs systematically with spoken language for various communicative purposes. The gestural code provides a wealth of information in addition to speech, yet it often goes unnoticed because of our cultural perceptions of the dominance of spoken language (which has resulted in pushing public speakers to stand coldly static, or use only artificial-looking hand motions). Co-speech gesture shows that language is naturally a "multimodal" act, with meaning created through the combination of the modalities' expressions. This predisposition is so strong that people often gesture when on the phone, when their communicative partner can't even see them!

When manual expression uses systematized signs with sequential structures (a grammar), it becomes a "sign language," found most often in deaf communities, such as those that use American Sign Language. Likewise, communication with individual drawings occurs frequently, but when put into a grammatical sequence those drawings take on the qualities of a visual language, as found in the social objects of our culture's "comics." The shift towards the interactive poverty of our print culture has not restricted multimodality, however. In print, comics feature both written and visual language, both of which can take on fully grammatical forms, as evidenced in their structured sequences.

The sequential aspect to language could in fact play a role in the inability to recognize older artifacts of visual language as well. Because VL is by nature visual, any temporally made sequential production of its form will be preserved statically. Thus, if a person were to "visually converse" in real-time (as we do in spoken language) but drawing only in one physical area, those images would lose their sequential characteristics. The temporal aspect of that succession would be preserved merely as a static representation: only the aftermath of that sequence would be seen. Furthermore, the accompaniment of any spoken words or gesture might also be lost because they are not physically fossilized, negating any semblance of a multimodal act.

Historical findings of graphic representation such as cave paintings, drawings by native cultures, and other such visual expressions could be instances of visual language use, though overlooked because our analysis of them sees only the finality of their static preservation (not to mention our own inability to decode their visual grammar). Our perception of them is lacking in both temporal sequence and understanding of them as potentially multimodal communicative acts. Forms such as these might very well be examples of visual languages, though unrecognizable because of our limited categorization of them. Granted, this does not prove that such fully developed structures existed, though pictures that we consider as static works of "art" may have been part of a multimodal interactive process, accompanied by speech and/or gestures.

By comparison, this highlights the features of our own modern VL's structure, which mimics the spatial juxtaposition employed in written language to circumvent such real-time interactivity. By having a sequence of panels, VL in comics takes on the spatial characteristics of writing, making it understandable even when preserved temporally. Interestingly, with digital tools, the preservation of a temporally unfurling sequence has become possible, such as in Scott McCloud's The Right Number.

Our cultural views on the medium could also have contributed to the lack of interactivity in our visual language. The current perception ties VL users to the roles of a print culture: a static state emerging from an "artist" and interpreted by a "reader," most often in a different time and place from that of the creation. This is a far cry from an interactive exchange using visual material merely as a tool for expression rather than the expression itself. Culturally, such a position comes from the perception of the form as some sort of "Art" as opposed to "Language," which again inhibits the social interactivity of VL.

Interactivity in visual creation has also been shown to aid learning. Studies have shown that children improve their image-making ability far more when they are able to see the actual sequential process of drawing, rather than just imitating the style of the final product. Perhaps this is why drawing instructors like "Commander" Mark Kistler are so effective in their methods, as they provide a setting of social interactivity necessary for learning visual creation.

Such findings also lend insight into why researchers have found a drop-off in drawing ability at puberty for most children in our culture: kids are not receiving the stimulus necessary to develop productive visual language fluency. In conjunction with the view of the form as Art rather than Language, this also helps explain why a systematic visual vocabulary hasn't developed. Without social interaction providing a regular supply of signs to acquire, learners have been free to hit the reset button on their lexicon every time they learn the visual language. In addition, consideration must be given to the effect that non-interactivity has on the time allotted to learning. When children learn spoken language, they constantly acquire it in contact with other users as it happens, and when learning to write they do it in a setting of mandated education with high frequency. With a visual language of a print culture, children must devote a specific time to practicing and developing productive fluency, because such exposure arises neither out of everyday human interactions nor from a time of guided learning.

Furthermore, the print culture also limits how many speakers the learner comes into contact with. To gain a "voice" in the culture, most people first need to be published. Compare this to the extremely democratic use of spoken language, where everybody can partake in the culture's language simply by coming into contact with another person. One of the promises of web-comics is a leveling of this playing field, allowing a "digital print culture" that provides unrestrained access to other speakers (across unlimited geographic boundaries) without the gate-keeping of a publisher.

A narrow field of speakers is probably one reason why popular authors have such vast influence on other people's styles. If the "voices" are limited to a select group, learners can consciously select specific styles they wish to imitate (as opposed to acquiring the general "style" of the group). That is, of course, if learners decide to imitate at all, given the emphasis that Art in our culture places on innovation and individuality.

However, a print culture alone does not limit widespread regularity. Take for example the generalized style that permeates most Japanese comics, with facial features like big eyes, pointy noses and slender chins. Originally, that style stemmed from the "God of Comics" Osamu Tezuka (who himself emulated Walt Disney). First, his single popular "voice" influenced the styles of several others. In time though, it spread to so many people that it could no longer be identified as the way a small group of individuals drew, but fossilized as a "manga style" permeating a culture. At this point, new learners (such as the American children now reading manga) become more interested in learning the generalized system, regardless of the individual authors associated with it.

In contrast, American comics authors by and large have styles that slightly resemble those of other authors, but not to the degree of allowing for a complete generalized style. Widespread regularity would have difficulty emerging in a culture emphasizing originality of style. For instance, recall the many Jim Lee clone artists from the early and mid-1990s. These people started out like those who originally imitated Tezuka: they all shared common styles derived from an individual influence. However, unlike the Japanese example, most Jim Lee clones that have survived continued to develop their own individual styles, using his as a foundation for broader personal development. As a result, they might be systematic in their own work, but have only tenuous relations to the rest of the language group. Thus, though the print culture might play a role in the exposure that individual "voices" have on the language users, it alone does not determine how the learners of the visual language might develop.

At the same time, the print culture could also have contributed to development of a more widespread visual language. The advantage of print is its potential to reach a large quantity of people not in the same time and space as the producer. Though it may have had constraining effects on the actual structure of the VL, mass distribution has offered many people the exposure necessary for the language as a whole to flourish. Learners may need to contribute more to the language's structure during their development than in a real-time exchange, but they are still provided with stimuli that they can use to develop their own fluency.

Think of this in contrast to ancient methods of image-making. Many instances of sequential narratives in the past (at least those that have been found archeologically) take the form of carvings into wood or stone, or paintings onto walls or everyday items like pottery. Speculatively, while other decomposed artifacts might not have survived en masse for our inspection, if these forms were the primary media in use (and not drawn in real time to an audience), they could hardly have served to initiate any sort of mass fluency. Carving visual expressions would require too much time, effort, and labor to be effective across a wide populace, much less in real-time exchange. And putting visual representation onto walls (often inaccessible, as in tombs) or everyday items (such as pottery) seems to imply the usage of the form less for direct communicative means and more for ritual or aesthetics (though the categories are admittedly not mutually exclusive).

At the same time, though print might excel at mass distribution, the issue of "media of transmission" could also play a role in limiting our personal interactivity. This harkens back to the opening statement about carrying around pens and paper to draw to each other. Practically speaking: what form must VL take in our culture for people to use it interactively? Comparison to another culture's VL might help illuminate the issue further.

Linguist David Wilkins has documented that the Central Australian community of the Arrernte (pronounced "Ar-un-da") create narratives by drawing in the sand, in what could be interpreted as a visual language. These narratives are sequential expressions with highly developed features. Notably, in contrast to our own culture, the Arrernte have no writing system, but call their VL "writing" when asked. Their VL seems to have developed without the influence of writing, which perhaps shows in its structure: it does not feature spatially juxtaposed images like those in comics. Rather, it is drawn temporally in one location with structurally simple icons. More interestingly though, these narratives are made in real-time exchanges, drawn into the culturally accessible medium of sand.

Part of what makes the Arrernte system so effective is that their landscape provides them with the resource for expression: the sand allows for easy real-time interactivity, without the need for extraneous tools (such as pens or paper). Without that accessible sand, their VL would be forced to take on different structural features to adapt to a different ecological environment, or might not have even developed as a language at all.

In our own context, the main media of expression match those of a print culture: pens and paper or computers. Most interactive graphic use in our culture comes in restricted forms, such as drawings by teachers on classroom blackboards, or in the game Pictionary. These instances still lack the syntactic qualities found in VL, though. While there have been some attempts to bring visual language into everyday use through technology (such as Microsoft's Comic Chat and the experimental Japanese ComicDiary), the results have yet to produce a system where the full allowance of creative expression and grammaticality is paired with real-time interactivity.

However, if our current VL has taken on the characteristics of written language, then interactivity could conceivably be sponsored by utilizing that form. We have already seen text used in real-time over the internet by chat programs. Why not incorporate our visual language into this type of interactivity, not in the contextual sense of a program like Comic Chat, but fully integrating word and image communicatively?

Part of the difficulty lies in the structure of the VL itself. Again, we return to the question at the beginning: put simply, drawing full panels in high detail requires time, and real-time language interaction must be immediate. Nevertheless, these factors will no doubt work with each other if allowed to. The visual language will become more capable of interactivity if it is used as such, and interactivity will grow through that change in structure.

No doubt, this perspective on the "comic medium" is a far cry from the positions and attitudes currently surrounding it. Indeed, people may not even want to propagate it in any sort of interactive form! However, within the medium is a chance to go beyond the print culture of comics, carrying the potential to literally converse with another person in a visual language.