One thing on which I think we can agree at the outset is that you, in reading this article, are currently consuming some content.
But what is this thing you’re seemingly soaking up now? What is its nature? How is it different from other things that aren’t content – or in fact is everything you can consume on the internet content? What is content?
It’s neither idle curiosity nor academic interest that’s led me to try and define this term. For those of us engaged in the construction of connected content ecosystems, or even just navigating the world of contemporary enterprise-scale digital content, it’s important to have a precise understanding of just what such ecosystems can contain. In building any type of content network, what is the material of which this network consists?
That, I discovered, is actually the wrong question. But by dint of my efforts to define content I’m now able to ask better questions about content systems, questions informed both by that undertaking and the definition I eventually arrived at.
The story so far
I’m of course hardly the first person to tackle this question, and there have been many good explorations of the nature of content over the years.
Some of the best treatments hail from the world of content strategy and content marketing. More than a decade ago, Margot Carmichael Lester arrived at this definition:
Content is information presented with a purpose distributed to people in a form through a channel.
Lester’s description is admirable in its concision, and by leading with “Content is information” she captures perhaps the essential characteristic of content in the first three words of her definition. From a content engineering perspective, however, “with a purpose” seems somewhat exclusionary, and there’s ambiguity surrounding the concepts “form” and “channel” that isn’t satisfactorily resolved in her commentary.
In discussing content strategy, I think Camden Gaspar comes closer to a functionally useful definition with this:
Content is information that is relevant in a given context and has a form shaped by the medium through which it’s transmitted.
I like Gaspar’s definition and his elaboration on it in the referenced post, and especially its acknowledgement of the role of context in shaping content.
But both of these definitions omit any direct reference to the material of which content is made (for reasons I’ll explain anon, “information” is a laconic and ultimately inadequate description of that source material), and fail to provide a general purpose for its existence. What is it made of, and what does it do?
These are essential questions for a content engineer to be able to answer so they can judge, with precision, whether or not the stuff they’re dealing with (or endeavoring to create) is really content. What’s been lacking, or at least what I’m seeking, is a normative definition of content, one that can be confidently wielded by content architects, engineers, designers and analysts to distinguish what’s content and what’s not.
And in that latter category is data.
Just data is not content
Per the definitions quoted above there’s a pretty broad consensus that data is not in itself content. And that makes sense if you think about data’s place in the DIKW pyramid, where its meaning is well summarized by Jennifer Rowley:
Data can be characterized as being discrete, objective facts or observations, which are unorganized and unprocessed and therefore have no meaning or lack value because of the lack of context and interpretation.
It is through its materialization in a meaningful context that data is transformed into information. Rowley again:
Information is described as organized or structured data, which has been processed in such a way that the information now has relevance for a specific purpose or context, and is therefore meaningful, valuable, useful and relevant.
A couple of quick examples will illustrate this hopefully straightforward principle. You might have, say, a table with some data in it that looks like this.
Useful? I think not. But process that data and present it in a context that provides it with meaning and – voilà, you now know to grab an umbrella if you’re heading out in the afternoon.
The importance of the context in which data is materialized cannot be overstated. If I send you here you’ll be little better informed than before you clicked the link. But simply by changing the anchor text of the link I can transform a small bit of data into information.
Mood.
I could keep going. (Snap quiz. Complete this verse: “I see a red door / And I want it painted —–.” Answer.) The point is that when data is made meaningful by being rendered in context, the information that’s conveyed to a person consuming it is dependent on that context. As it pertains to content, the meaning of data is always mutable.
That may seem evident when the material from which you’re crafting your content is limited to an image of a black box, but it’s also true when that source material seems independently comprehensible.
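The anchor-text point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URL, the `render_link` function and the anchor strings are hypothetical placeholders, not anything taken from the examples above. The idea is simply that one unchanged datum yields different information depending on the context in which it is rendered.

```python
# Hypothetical datum: a link target that, on its own, tells a reader very little.
BLACK_BOX_URL = "https://example.com/black-box.png"  # assumed placeholder URL


def render_link(url: str, anchor_text: str) -> str:
    """Materialize the same datum (url) inside a context (anchor_text)."""
    return f'<a href="{url}">{anchor_text}</a>'


# Same data, three contexts, three different pieces of information.
contexts = ["here", "Mood.", "The color I want the red door painted"]
rendered = [render_link(BLACK_BOX_URL, text) for text in contexts]

for link in rendered:
    print(link)
```

The data never changes between the three renderings; only the context does, and with it what a reader takes away.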
All content is built of data
The source material for content is never itself content, because we cannot know how it will be shaped and the context in which it will be provided. However well-described the data used in a piece of content may be, its informational capabilities remain potential rather than kinetic until it’s made tangible.
Consider this:
Ah now, it was tootwoly torrific, the mummurrlubejubes! And then after that they used to be so forgetful, counting mother-peributts (up one up four) to membore her beaufu mouldern maiden name, for overflauwing, by the dream of woman the owneirist, in forty lands.
“Nonsense!” you say. Correct, but if I tell you this is from James Joyce’s Finnegans Wake (which it is), you probably know that it’s well-respected nonsense. By adding an author and title I’ve provided a context that changes how that passage is interpreted. (If you knew it was Joyce – bravo! If you guessed and got it right I hope you performed a fist-pump: like almost every other person on earth I have not read Finnegans Wake.)
Sure, but what about something more evidently self-describing, like a newspaper article? Isn’t it useful to distinguish between content and data when you’re designing content products or managing content models?
No. It is, however, critical that you understand the shape and meaning of the data in order to know how you can use it, and how this differs between the data sources that might be inputs into your content. And coming to this understanding is not only useful, it allows you to take a more expansive view of what your content could contain and the contexts in which it could be provided.
In contemporary practice, from the moment of its inception that newspaper article is divided, and so becomes divisible. The title, byline, date, text and photos are combined and expressed as an article, but those and other discrete pieces of data can be as easily and selectively combined to generate many different types of content.
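To make that divisibility concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ArticleData` fields mirror the title, byline, date, text and photos mentioned above, while the sample values and the three rendering functions are hypothetical and do not describe any particular CMS or content model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArticleData:
    """Discrete pieces of data behind a hypothetical newspaper article."""
    title: str
    byline: str
    published: date
    text: str
    photo_urls: list[str]


def as_full_article(a: ArticleData) -> str:
    """Combine every field into the familiar article form."""
    header = f"By {a.byline}, {a.published.isoformat()}"
    return "\n".join([a.title, header, "", a.text])


def as_headline_card(a: ArticleData) -> str:
    """Title plus first photo, e.g. for a home-page teaser."""
    photo = a.photo_urls[0] if a.photo_urls else "(no photo)"
    return f"{a.title} [{photo}]"


def as_author_listing(a: ArticleData) -> str:
    """Byline-first entry, e.g. for a page listing an author's work."""
    return f"{a.byline}: {a.title} ({a.published.isoformat()})"


# Hypothetical sample data, invented for this sketch.
article = ArticleData(
    title="Council approves new bridge",
    byline="A. Reporter",
    published=date(2022, 1, 15),
    text="Placeholder body text.",
    photo_urls=["bridge.jpg"],
)

for render in (as_full_article, as_headline_card, as_author_listing):
    print(render(article))
    print("---")
```

None of the three outputs is "the" article; each is a different selection and arrangement of the same underlying data.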
(If, as I propose, content is only what’s actually made tangibly available to a person, then we can thankfully drive a stake through the heart of the concept of metadata as it pertains to content engineering. But as I don’t want to veer too far off course here I’ll save that eulogy for another day.)
All sorts of data inputs are combined, too, in generating the context required for us to make sense of media objects (remember this?). Even if you fired up Netflix with the sole intent of watching Don’t Look Up, it’s an image or a title or a search result that not only allows you to access the video but is itself part of the content that’s received by you in the moment.
Yes, when you’ve settled down with your bowl of popcorn and the movie’s playing without content warnings or subtitles on the screen you’re absolutely consuming content. And when you’re comparing notes with your buddy who now lives in Tulsa it’s fair to say you saw the same movie, but more accurate to say you watched different screenings of the same movie.
Content is representational
Whatever the source data for a piece of content, it is transformed by the context of its materialization. It is no longer the thing it was. Content is always a representation of data, no matter how little context is required to provide it as information. Put simply, content is not the thing itself.
This was driven home to me in a tweet from the venerable Emperor Palatine:
In the unlikely event I ever get a Wikipedia page the title would be my name. Which is not content or metadata, it’s my name
Quite so: his name is the thing itself, and is no more content than his date of birth or one of the flowers growing in his garden. But behold, by providing it as something that you can view (materializing it), in a context that gives it meaning (making it information), it hath becometh content.
What you see in big bold type next to the picture isn’t Michael’s name, but a representation of his name.
If it seems pedantic to underline that point (or too Plato Cave-y, or too HTTPRange-14-y), consider the extension of this principle to the picture above. That is not Michael, but a depiction of him. And even though we’re all fetching the same binary, that binary isn’t itself content: the content is rather the browser-rendered image that any given reader of this article actually sees.
The representational nature of content isn’t captured in my definition because it’s inherent to the concept of materialization: if, as I contend, content is data that’s manifested in some consumable form it must be representational because the source of the information presented is necessarily transformed.
But I’m underlining this point because it can be a trap in a content ecosystem to consider things like images and videos to be wholly self-describing, to view them both as content and as the thing itself. The meaning of that representation is dependent on the context in which it is materialized – and it’s not content unless it’s materialized.
Content is consumable
Let’s say I have a copy of Moby Dick sitting somewhere on my hard drive. Is that content?
Not for you, because I’m not going to let you browse my hard drive.
It is for me once I open it – if it’s, say, a text file I can open in Notepad. If it’s a .lit file and I don’t have a LIT reader I’m hooped, though – it’s no more content to me than to you.
Content is data conveyed as information. There is no such thing as content-in-the-rough: content is the artifact actually consumed by some human, received through the senses and processed in the brain.
I don’t know whether this seems self-evident to you, but coming to this understanding was both liberating and illuminating for me.
Liberating because it’s an Occam’s razor that fundamentally frees an inquisitor from having to fret over the “is it, isn’t it?” question. Some sort of thing in front of me? Content. Some sort of thing in your CMS? Not content.
Illuminating because I think it leads to more interesting and better-structured questions to be asked about that stuff of which content is made.
Which brings me to Michael Andrews’s excellent piece, “Revisiting the difference between content and data.” I wanted to provide context before citing it because where Michael’s approach is nuanced, mine, as I’ve just outlined, makes a clear-cut distinction between data and content. Where Michael allows for “limited overlap” of content and data, that’s not possible as I’ve defined content because it’s either been materialized (content) or it hasn’t (not content).
But the way Michael frames and discusses those differences doesn’t, I think, put us at odds. I’m thumbs-up for most items in the opening compare-and-contrast table, and ultimately Michael’s piece raises many of the same questions I arrived at about the shape and semantics of data in the context of content engineering.
A new way of looking at content
If by this point you’re asking why this all matters, consider the importance for a content engineer of understanding the different inputs and processes at play in the generation of content.
That content is constructed out of data underscores the point that any data can be used as content, and that no type of data need be privileged when it comes to creating useful content.
That content doesn’t exist until it’s actually conveyed to some human underscores the point that the information provided is dependent on the context in which it’s conveyed.
This understanding shifts the focus from what your content is made of to how it all comes together at the moment of instantiation.
It helps us recognize that content is dynamic and experiential, rather than static and detached from the moment of its consumption. It allows us to view content less as an artifact and more as an information experience.
A definition, and some better questions
I said at the outset that when I set out to answer the question, “what is content?” it was in the context of wanting to better understand what I’ve got to work with in a given content production environment.
Insofar as I’ve addressed that question, mission accomplished!
So what is content?
Content is data materialized as information for human consumption.
But now that I’ve arrived at that definition, and feel pretty comfortable with it, it’s apparent that I was asking the wrong question, and one that fundamentally presupposed the answer. Because it turns out that the stuff you’ve got to work with in a content ecosystem isn’t, in fact, content.
Perhaps that wasn’t the wrong question so much as the first question, because the understanding I gained in attempting to answer it has naturally spawned other questions.
What data is available to me in a given environment that I may use for content? How is it structured? How is it described? What’s required for me to effect the transformation of that data into content?
Ultimately I think these are more useful and interesting questions for a content engineer, but ones I’ll explore another day – and be able to better answer, I think, by virtue of what I’ve come to understand about the nature of content.
Aaron Bradley is Senior Structured Content Architect at Telus Digital, and chief cook and bottle washer at The Graph Lounge.