The Trouble with Words: Word Analyses through Data Visualization
Created by Cassidy. Last Updated on May 3, 2017.
Word Analysis in DH
When thinking about comic books, one’s first inclination is not to picture graphs, charts, or word maps. Yet data visualizations can be incredibly useful when it comes to analyzing the textual information in comic books. We decided to collect data on Michigan State University’s comic collection, a corpus of over 250,000 items that was narrowed down to just over nine hundred items published before 1938, when the copyright laws were put into place. But what was in these comics? Translation was a main concern for our project because we had to decide whether to foreignize or modernize our texts and data.
We worked through multiple data visualizations, such as publishing location maps and comic distribution timelines. However, we felt it would be particularly beneficial to our project to include data visualizations as part of our textual analysis of the comic collections, both with the MSU collection as a whole and with a sample of five comics from the class collection. Using Voyant, one can find trends, ideas, and relationships among the comics. Through these analyses, we found certain benefits and difficulties that bear on the translation process, such as misinterpreted data, social norms, and the use of slang or colloquial language. Exploring the theories of individuals such as Venuti, Rosenzweig, and Tufte helps explain the varying viewpoints on translation in regard to fluency and the integrity of the literary work. This information will then be related back to the digital humanities and our project.
To discover the relationships between comics, we used the text-analysis tool Voyant, since it compares word frequency and language trends. We chose to look specifically at word frequency in MSU’s Special Collections’ stock, the class collection, and several of our sample comics. Through these frequencies we hoped to gain an idea of common themes and topics in comics from the 1920s and 1930s. We hypothesized that certain words would be more frequent than others, which would indicate the tropes and language used during that time period.
We felt Voyant would be particularly useful for the Michigan State University comic collection as a whole because, by exploring the collective titles of the comics, we could see the most frequent words, which would indicate the genres or topics people most enjoyed reading. While working with the MSU corpus from Erin’s data, we used the Cirrus feature in Voyant to create a word map of the top fifty words used in the nine hundred comic titles. The most used word in this case was “little,” with a frequency of thirty-nine. This could indicate that many of the comics had something to do with children or smaller characters. Could this have been more appealing to readers? After “little,” words that point to genre or format, such as “cartoons,” “pictures,” “history,” “comics,” and “book,” were also frequently used. While Cirrus was the easiest to read and the most straightforward, we tested other features such as TermsBerry, which displays frequent words and links them with the words that commonly appear alongside them. However, we felt the Cirrus word maps were much clearer, so we proceeded with that feature.
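The kind of count that sits behind a word map of titles can be approximated in a few lines. The following is a minimal sketch, not Voyant’s own implementation; the function name top_words and the sample titles are invented for illustration and are not drawn from the MSU data.

```python
import re
from collections import Counter


def top_words(titles, n=50):
    """Return the n most frequent words across a list of titles."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep runs of letters (allowing internal apostrophes).
        counts.update(re.findall(r"[a-z]+(?:['’][a-z]+)*", title.lower()))
    return counts.most_common(n)


# Hypothetical titles, made up for this example only.
sample_titles = [
    "The Little Sailor",
    "Little Folks at Play",
    "A Little History of Cartoons",
    "Cartoons for Everyone",
]

print(top_words(sample_titles, n=2))
```

Note that this sketch counts every word, including common ones like “the,” so a real run would still need a stopword list before the results say much about genre.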
We found that this word map included numerous names of both authors and characters. We did not edit out recurring names from comic editions, nor did we edit the data to show only names; however, we would predict that the comics are mostly male-centered. Even in this word map there is a male dominance: sixteen male-oriented words, including names (Walt, Jack, Buck) and nouns (men, man, father), compared to an arguable two female names (Terry, Tracy). Gender roles were very much a norm of society in the 1920s and 1930s, so it is not surprising that men seem to be favored and advertised in these comics. It was quite acceptable during this time period to feature women only in the kitchen or cleaning the house, while the men were the breadwinners and took care of business. In “Mr. and Mrs.” by Briggs, for example, there wasn’t one scene where Vi left the house or wasn’t complaining about some chore, while her husband Joe grumbled about her neediness and worked long hours.
Voyant Page for MSU Comic Collection
While using Voyant, we found several benefits and drawbacks. It works very well with a large corpus like the MSU comic collection: the bigger the data sample, the broader the scope. Voyant can also be helpful with smaller corpora because one can compare specific pieces. We analyzed the text from a portion of our class comics to see the relationships between works; this sample included “Mr. and Mrs.,” “Ain’t it a Grand and Glorious Feeling,” “Adventures of Peck’s Bad Boy,” “The Mischievous Monks of Crocodile Isle,” and “Charlie Chaplin’s Comic Capers.” Once again, it seemed to be a fairly male-centered sample, with “Pa” being the most commonly used word throughout the comic texts, along with “boy” and “Pop.” We found the Cirrus map very easy for readers to follow; however, the stopword list, which lets you discard common words like “it” and “got,” would not save the words that we added to it. This glitch did not affect the overall function of the website, but it made it difficult to draw firm conclusions from the texts’ data.
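One possible workaround for a stopword list that will not save is to filter the words before counting them. The sketch below is an assumption about how that could be done outside Voyant, not something we ran for this project; “it” and “got” come from the discussion above, while the other stopwords, the function name frequent_terms, and the sample sentence are illustrative.

```python
import re
from collections import Counter

# "it" and "got" are the examples named in the essay; the rest are assumed.
CUSTOM_STOPWORDS = frozenset({"it", "got", "the", "a", "and"})


def frequent_terms(text, stopwords=CUSTOM_STOPWORDS, n=50):
    """Count words in text, skipping any word in the stopword set."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    kept = [word for word in words if word not in stopwords]
    return Counter(kept).most_common(n)


# Invented sentence for illustration; not a transcription from the comics.
sample_text = "Pa got it and Pa got the boy a pop"

print(frequent_terms(sample_text, n=3))
```

Keeping the stopword set in a file alongside the transcriptions would also make the choice of excluded words part of the project record.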
Another problem our team ran into involved the texts themselves. While transcribing the five comics from our class sample, we discovered that all of the comic stories included in “Ain’t it a Grand and Glorious Feeling” were exactly the same as the comic stories in “Mr. and Mrs.” This meant that every word from those stories would be counted twice in the class sample, which could throw off our conclusions. We entered both scenarios into Voyant, one including “Ain’t it a Grand and Glorious Feeling” in the corpus and one excluding it. There wasn’t much of a difference in the word maps: “Joe,” “oh,” and “ha” were the most frequently used words throughout the corpus whether “Ain’t it a Grand and Glorious Feeling” was included or not.
Five Comics - class sample - most frequent fifty words (including “Ain’t it a Grand and Glorious Feeling”)
Five Comics - class sample - most frequent fifty words (excluding “Ain’t it a Grand and Glorious Feeling”)
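The comparison between the two word maps can also be expressed as a number. The following sketch measures how much the top-word lists of two versions of a corpus overlap (shared words divided by all words in either list). It is only an illustration under stated assumptions: the function names and the placeholder texts are invented, and the strings below are not our transcriptions.

```python
import re
from collections import Counter


def top_set(texts, n):
    """Return the set of the n most frequent words across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower()))
    return {word for word, _ in counts.most_common(n)}


def top_word_overlap(texts_a, texts_b, n=50):
    """Share of top-n words common to both corpora (1.0 means identical sets)."""
    a, b = top_set(texts_a, n), top_set(texts_b, n)
    return len(a & b) / max(len(a | b), 1)


# Placeholder texts standing in for transcriptions (hypothetical content).
duplicated_story = "oh joe ha ha oh joe"
other_comic = "pa boy pop oh ha"

with_duplicate = [duplicated_story, duplicated_story, other_comic]
without_duplicate = [duplicated_story, other_comic]

print(top_word_overlap(with_duplicate, without_duplicate, n=3))
```

A value near 1.0 would match what we saw in the two word maps: doubling one comic raises its counts but does not necessarily change which words rise to the top.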
Translation in DH
The information gained from the word maps leads to the other question we had, about slang and syntax in our transcriptions. While the majority of our comics used standard English, there were moments in all of them where words like “ha-ha” (more of a sound than a word) or “s’alright” (colloquial language) were used. While transcribing, we had to ask ourselves whether we should in fact include words that convey emotion or emphasis, like the “dee-liberately” found in “Charlie Chaplin’s Comic Capers.” However, this was a mild language barrier to overcome; four out of five of our sample comics were intelligible, whereas “Adventures of Peck’s Bad Boy” contained a vast number of invented words such as “ijot,” “skiddoo,” and “unreasonist.” Not only did we need to worry about including words such as these in our corpus as meaningful words, but we also needed to worry about whether Voyant, or other text-analysis tools, would even recognize them as English. How would one distinguish invented words from typos? How would regular typos affect our corpus?
Individual analysis of “Adventures of Peck’s Bad Boy”
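One rough way to approach the question of invented words versus typos is to check each unfamiliar word against a reference word list and see whether anything close exists. The sketch below uses Python’s standard difflib module for the comparison. It is a heuristic under loud assumptions: the tiny vocabulary, the 0.8 similarity cutoff, the function name flag_unknown, and the misspelling “deliberatly” are all invented for illustration, and a real run would need a full English word list and human review.

```python
from difflib import get_close_matches

# Tiny stand-in vocabulary (assumed); a real check would load a full word list.
KNOWN_WORDS = ["idiot", "reason", "unreasonable", "deliberately", "alright", "boy"]


def flag_unknown(words, vocabulary=KNOWN_WORDS, cutoff=0.8):
    """Map each word missing from the vocabulary to its closest known word, if any.

    An empty list means nothing similar was found, which hints at an invented
    word; a suggestion hints at a typo or a stylized spelling.
    """
    report = {}
    for word in words:
        lowered = word.lower()
        if lowered in vocabulary:
            continue  # Already a known word; nothing to flag.
        report[lowered] = get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return report


# "ijot", "skiddoo", and "unreasonist" are from the comic; "deliberatly" is a
# made-up misspelling used here only to show the contrast.
report = flag_unknown(["ijot", "skiddoo", "unreasonist", "deliberatly", "boy"])
for word, suggestions in report.items():
    print(word, suggestions)
```

Even then, the tool only sorts words into “close to something known” and “not close to anything”; deciding whether a spelling is a joke, a dialect, or a mistake remains the transcriber’s call.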
All of these questions are key components of the idea of translation in the digital humanities. A translation replaces the linguistic and cultural aspects of a foreign text with ones that are comprehensible to readers of the translating language. “Successful” translations have been thought of in the Anglo-American world as the result of successful transparencies, which, as Lawrence Venuti describes in his article The Translator’s Invisibility, are “the effects of a fluent translation strategy, of the translator’s effort to ensure easy readability by adhering to current usage, continuous syntax, and fixing a precise meaning” (Venuti, 1). Transparencies go hand in hand with the process, as Venuti refers to it, of enacting “violence” through translation. This violence often wrestles with respecting and acknowledging the pre-constructed beliefs, morals, and customs of the foreign text’s culture. Typically, this idea is applied to translating between different languages “correctly”; however, we analyzed English comics. These theories have been explored over many years by multiple people, such as Lawrence Venuti, Eugene Nida, Franz Rosenzweig, Lauren Klein, and Edward Tufte. There seems to be a debate over how translations should be created: through “foreignizing” or through modernizing the text. Foreignizing refers to translating a text so that it maintains the integrity of the original; slang is left in the translation, which, as Nida describes, generates “an equivalent effect in the receiving culture” (Nida). Modernizing, by contrast, refers to translating the text so that it is more convenient to use and more compelling to read; this often includes changing slang to modern words and removing extra bits of information that are otherwise irrelevant to the text.
Nida and Tufte both favored a system of translation that foreignizes the text and maintains its authenticity, while Venuti, Klein, and Rosenzweig preferred an approach that stays authentic but is adapted for modern readers, as a way of looking at the texts differently.
These theories helped shape our own word analysis of the comics’ text. While we chose to include slang and stay consistent with the integrity of the text, we also chose to exclude additional material such as publishing information and authors’ notes. Although we sit more on the foreignizing side of the spectrum, we believed there was no right or wrong in either theory, so we took a middle road that includes a little of both strategies. Foreignizing, in our case, meant transcribing all of the text as written; modernizing meant deleting publishing information from our transcriptions and excluding repeated text. These techniques allowed us to draw some conclusions about the sample comics that are consistent with the MSU corpus. While the two corpus sizes offer different relationships in terms of scale, we can see that the comics from the 1920s and 1930s were generally male-centered, and that many of the comics’ titles contained descriptive genre words like “cartoons” and “history.”
Voyant is capable of producing numerous data visualizations of a text. While we experimented with other features, we found Cirrus to be the most helpful for our project; however, the stopword glitch made it difficult to get complete information. Through the data visualizations, one can gain a general feel for the comics’ topics, whether from the genres in the MSU corpus or from the character names and slang in the individually analyzed comics. Translation was a consideration during this phase of the project because we had to decide whether to foreignize or modernize our texts during transcription. The ever-present question of accessibility carries through this process: should we have kept the transcriptions one hundred percent authentic, or should we have made them one hundred percent modern and comprehensible? In this project we tried to sit in the middle, which hopefully made the texts informative but authentic. Though there is no right answer, we are at least one step closer to making our data visualizations of the comics as accessible as possible.