“Language is one of the best data-compression mechanisms we have. The information contained in literature, or even email, encodes our identity as human beings. The entire literary canon may be smaller than what comes out of particle accelerators or models of the human brain, but the meaning coded into words can’t be measured in bytes. It’s deeply compressed. Twelve words from Voltaire can hold a lifetime of experience.”
– Fernanda Viegas, IBM
I came across this quote while reading Wired. Viegas runs a site called Many Eyes that lets users upload data and see it rendered in a visualization of their choice.
What really intrigues me, as an avid writer, is her description of how words compress data. I often wonder what exactly makes good writing. The factors I had looked at before were things like sentence structure, a developed personal style, vocabulary, and in most cases a high concept. Those are all pretty hard things; hard as in they’re easy targets, hard data that we can look at to judge something. You can usually apply those criteria to a piece of writing and gauge roughly how ‘developed’ it is.
But just as with game mechanics, there is always a core functionality: something that operates at a low level and is harder to locate and describe, especially when you’re trying to gauge quality. Why are some games just fun, and why are some writers just eye-opening?
Data compression is an excellent theory (would you call it a theory? It seems to be proven) to apply to writing in order to locate and evaluate those soft targets.
So at its lowest level, what is data compression? You’re taking a good deal of information and pushing it all together into something smaller. When we’re talking about words, smaller means we’re centralizing ‘things’. One word can contain several meanings because, just as with lossy compression of digital data, you lose some quality. The cost of compression is the loss of specificity.
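For the technically curious, here is a rough sketch of that trade-off in Python. It is only an illustration under my own assumptions: the `GENERAL` word mapping and both function names are made up for this example. The first function uses the standard `zlib` module to show lossless compression, where the original comes back intact; the second swaps specific words for broader ones, which is closer to the lossy kind I mean here, where specificity is the price.

```python
import zlib

def lossless_roundtrip(text: str) -> str:
    """Compress with zlib, then decompress. Nothing is lost."""
    packed = zlib.compress(text.encode("utf-8"))
    return zlib.decompress(packed).decode("utf-8")

# Hypothetical mapping from specific words to broader ones (illustration only).
GENERAL = {"sparrow": "bird", "heron": "bird", "oak": "tree", "birch": "tree"}

def lossy_generalize(words):
    """Replace each specific word with its broader category.

    This is lossy: once 'sparrow' becomes 'bird', the reader has to
    supply the missing detail themselves.
    """
    return [GENERAL.get(word, word) for word in words]

specific = ["sparrow", "heron", "oak", "birch"]
general = lossy_generalize(specific)
print(general)  # ['bird', 'bird', 'tree', 'tree']
```

Four distinct words collapse into two, and there is no way to get the originals back from the result alone. That gap is what the reader fills in.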
This is fascinating because it implies that what a good writer is really doing is arranging pieces of compressed data so that readers can unravel, or decompress, them in a way that is pleasing to the human brain. I’m no scientist, but I believe that’s called triggering the ‘aha’ moment, wherein people feel quite clever for figuring something out. Most of the time it’s something they already knew but hadn’t been told in quite that way before; confirmation of a belief is a strong sensation.
Anyway, I’m wandering. I think this idea of data compression is more or less where subjectivity was born. If words are our way of compressing things in the natural world, and compressing them makes them more vague, then people can interpret that vagueness for themselves.
The same thing goes for art, although I don’t believe I’m quite as qualified to speak about semiotics or visual art. In order to create something usable and transferable, such as visual symbols or the words in a language, it was necessary to simplify things. This doesn’t mean they don’t contain complexity; it’s just implied rather than inherent. A painting of a landscape can evoke different feelings and inspire different thoughts in different people because each person is deconstructing that simplification and creating their own complex model.
This personal revelation has led me to second-guess my use of certain vocabulary. We can say that the more complex a word is, the less data it compresses and the more specific it is. For technical writing that’s sometimes great, but in most of the writing I do I’m trying to stir emotions. If using simpler words means that data is compressed more, and is therefore left open to more interpretation, then great. Simple words are also easier for people to learn, so you have a larger *potential* reading audience.
For a long time I was under the impression that having a wide vocabulary meant learning and using lots of ‘big’, or complex, words. But as I’m learning now, for most of my work, it really means finding the ‘right’ words, not necessarily the most complex.