The Bits and Pieces II: Content Model

The content model is an XML representation of your content. The great thing about XML is you get to decide what all the tags are and how they all fit together. That’s also the bad thing about XML. There’s no “right” way to do anything.

One approach is to determine whether an existing content model will work for your content. If you’re doing narrative texts, DocBook might be the way to go. Online help or topic-based content might work in DITA. If you’re doing assessment, QTI might work. If it’s all about math, then MathML is something to explore. If you’re doing all of the above plus more (like we are), then a combination of them all could be the answer.

In our project, we looked at all these standards (except DITA, which didn’t exist yet) and tried to figure out the parts that would work for us. We realized that nothing fit exactly, and that the language used in the various models didn’t match the language we used to describe our content. That kind of freaked us out a little.

We put all those standards on the shelf for the time being and decided to analyze our content on its own and describe it using our own vocabulary.

Content analysis is something I learned about while studying linguistics in college. We would take texts and break them down into functional parts, then describe what the pieces were doing. I wrote a brilliant (in my opinion) analysis of the content structure of the LP, CD, and 8-track packaging of Blood, Sweat and Tears’ Greatest Hits album. (Yes, I had a copy of each and liked to entertain my dorm neighbors by cranking the 8-track at random times.) What I found was that each format had its own conventions for organizing information, and that the information on all three was different. The song titles and cover art were the same, but even those were formatted differently depending on the packaging size and shape. This is an early example of content reuse, I suppose.

I applied that kind of thinking to the products we wanted to convert to XML. I sat down with the design director (who knows why things are formatted the way they are) and editors (who know what the content is and why it’s written the way it is) and pored over thousands upon thousands of pages, looking for similarities in the way content is presented, and the underlying reason that the content is even there in the first place.

This exercise took many months. In the end, we came up with a chart showing all the content chunks, stripped of their presentation attributes (font, size, and so on) and context. These were the building blocks. We theorized that you could take any of these building blocks, and put them together in any order (like Legos) and build any kind of product we could ever possibly want to build.
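To make the Lego idea a little more concrete, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. The tag names (head, para, list, item, product) and the helper functions are hypothetical, made up for illustration; they are not the vocabulary we actually ended up with. The point is only that the same presentation-free blocks can be snapped together in a different order to make a different product.

```python
import xml.etree.ElementTree as ET

# Hypothetical building blocks: each carries content only,
# with no font, size, or other presentation attributes.
def head(text):
    el = ET.Element("head")
    el.text = text
    return el

def para(text):
    el = ET.Element("para")
    el.text = text
    return el

def bullet_list(items):
    el = ET.Element("list")
    for item in items:
        ET.SubElement(el, "item").text = item
    return el

def build_product(kind, blocks):
    """Assemble any sequence of blocks into a product wrapper."""
    product = ET.Element("product", {"kind": kind})
    product.extend(blocks)
    return product

# Same kinds of blocks, different order, different product.
worksheet = build_product("worksheet", [
    head("Plants"),
    para("Plants make food from sunlight."),
    bullet_list(["roots", "stem", "leaves"]),
])

study_guide = build_product("study-guide", [
    head("Plants"),
    bullet_list(["roots", "stem", "leaves"]),
    para("Plants make food from sunlight."),
])

worksheet_xml = ET.tostring(worksheet, encoding="unicode")
print(worksheet_xml)
```

Nothing in either product says how a head or a list should look. That decision is left to whatever renders the XML later.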

We found obvious things like paragraphs and heads and lists. We also found that of all the zillions of questions on all the bazillions of worksheets, we really only had about 14 different kinds. They just had different numbers of write-on lines or were stacked in slightly different ways. We also found that wildly different products, intended for different audiences, really had more in common than we thought they would. A paragraph is a paragraph regardless of the topic of the words inside. Multiple-choice questions are universal, whether they’re intended for kindergartners and have pictures of kittens, or for college students and have quotes from Ulysses.
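Here is a rough Python sketch of what “only about 14 kinds of questions” means in practice. The element and attribute names (question, type, lines, prompt, choice) are assumptions for the sake of the example, not our real model. The number of write-on lines becomes a piece of data on the question, and a multiple-choice question has the same structure whatever the audience.

```python
import xml.etree.ElementTree as ET

def short_answer(prompt, lines):
    """One hypothetical question kind; the line count is just data."""
    q = ET.Element("question", {"type": "short-answer", "lines": str(lines)})
    ET.SubElement(q, "prompt").text = prompt
    return q

def multiple_choice(prompt, choices):
    """Another hypothetical kind: a prompt followed by choices."""
    q = ET.Element("question", {"type": "multiple-choice"})
    ET.SubElement(q, "prompt").text = prompt
    for choice in choices:
        ET.SubElement(q, "choice").text = choice
    return q

def shape(el):
    """Return the nested tag structure, ignoring text and attributes."""
    return (el.tag, [shape(child) for child in el])

# Two worksheets that looked different on paper are the same kind of block.
two_lines = short_answer("Name one part of a plant.", 2)
five_lines = short_answer("Explain how plants make food.", 5)

# Kittens or Ulysses, the structure is identical.
kinder = multiple_choice("Which animal says meow?",
                         ["kitten", "cow", "duck"])
college = multiple_choice("Who narrates the final episode of Ulysses?",
                          ["Molly Bloom", "Leopold Bloom", "Stephen Dedalus"])

print(shape(two_lines) == shape(five_lines))
print(shape(kinder) == shape(college))
```

Both comparisons come out equal because the only things that vary are the text and one attribute value, which is exactly what lets wildly different products share one model.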

Which brings us back to the whole “make sure your content people know what’s going on” thing. In early discussions with the various product groups, a common refrain was “our content is SO different from everybody else’s, there’s no way we can all use one model.” In our case, that just wasn’t true. When we presented the building blocks and showed how each product group’s content would fit into them, it became apparent to everyone that the nature of the content, if not the subject matter, was universal. We were, after all, presenting educational content to a particular audience, and THAT is what determined the conventions and content organization we were using.

Next up: how we identified building blocks.


2 Responses

  1. What is very difficult for “designers” especially is the concept that content is separate from presentation.

    If you’re doing a test booklet, you’ll want to be sure to CODE the answers as either correct or incorrect. In presentation, you might not be noting a differentiation, but it will be beneficial in some other rendering of the data (in the book the answers are hidden; in an electronic version, the answer might suddenly appear once a question has been answered correctly or incorrectly).

    In planning XML it’s also vital to decide how granular you want to make your content. Is it important to code every element as separate, or can some content elements just be ignored? What do I mean? An example that I’ve used in the past is a VERY granular XML file for an elementary reading text. In this example the text will not only be used for a book, but will also be used for a teacher’s electronic file. In my example each character is coded for whether it is a vowel or consonant, each word is coded with its part of speech, suffixes are coded, prefixes are coded, etc. In this manner, you can programmatically manipulate the data so the teacher can visually highlight vowels, or show verbs, etc. Again, using this example, you can visualize a VERY granular XML implementation.

    As for standards, different publishers have also come up with different standards. About 8 years ago I was working on a committee with McGraw-Hill to create a standard set of coding for textbooks (I can’t remember the name of the initiative off-hand) and I also was trained on the ETM initiative at Pearson. These two separate XML implementations were extremely similar in the kinds of content they were coding, but had different naming/syntax conventions. Really fun!
