The content model is an XML representation of your content. The great thing about XML is you get to decide what all the tags are and how they all fit together. That’s also the bad thing about XML. There’s no “right” way to do anything.
One approach is to determine whether an existing content model will work for your content. If you’re doing narrative texts, DocBook might be the way to go. Online help or topic-based content might work in DITA. If you’re doing assessment, QTI might work. If it’s all about math, then MathML is something to explore. If you’re doing all of the above plus more (like we are), then a combination of them could be the answer.
In our project, we looked at all these standards (except DITA, which didn’t exist yet) and tried to figure out which parts would work for us. We realized that nothing fit exactly, and that the language used in the various models didn’t match the language we used to describe our content. That kind of freaked us out a little.
We put all those standards on the shelf for the time being and decided to analyze our content on its own and describe it using our own vocabulary.
Content analysis is something I learned about while studying linguistics in college. We would take texts and break them down into functional parts, then describe what the pieces were doing. I wrote a brilliant (in my opinion) analysis of the content structure of the LP, CD, and 8-track packaging of Blood, Sweat and Tears’ Greatest Hits album. (Yes, I had a copy of each and liked to entertain my dorm neighbors by cranking the 8-track at random times.) What I found was that each format had its own conventions for organizing information, and that the information on all three was different. The song titles and cover art were the same, but even those were formatted differently depending on the packaging size and shape. This is an early example of content reuse, I suppose.
I applied that kind of thinking to the products we wanted to convert to XML. I sat down with the design director (who knows why things are formatted the way they are) and editors (who know what the content is and why it’s written the way it is) and pored over thousands upon thousands of pages, looking for similarities in the way content is presented, and the underlying reason that the content is even there in the first place.
This exercise took many months. In the end, we came up with a chart showing all the content chunks, stripped of their presentation attributes (font, size, and so on) and context. These were the building blocks. We theorized that we could take any of these building blocks, put them together in any order (like Legos), and build any kind of product we could ever possibly want.
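To make the Lego idea a little more concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the tag names (`product`, `head`, `para`, `list`, `item`) and the helper functions are assumptions, not the vocabulary from our real chart. The point is only that the blocks carry no font or size information and can be snapped together in any order.

```python
import xml.etree.ElementTree as ET


def block(tag, text=None, children=()):
    """Build one presentation-free content block (hypothetical tag names)."""
    el = ET.Element(tag)
    el.text = text
    el.extend(children)
    return el


def build_product(name, blocks):
    """Assemble blocks, in whatever order they are given, into one product."""
    product = ET.Element("product", {"name": name})
    product.extend(blocks)
    return product


def make_blocks():
    """Return fresh head, para, and list blocks with no font or size info."""
    head = block("head", "Photosynthesis")
    para = block("para", "Plants make food from sunlight.")
    steps = block("list", children=[block("item", "Light"), block("item", "Water")])
    return head, para, steps


# Same kinds of blocks, two different products, two different orders.
head, para, steps = make_blocks()
textbook = build_product("textbook", [head, para, steps])

head2, para2, steps2 = make_blocks()
worksheet = build_product("worksheet", [steps2, head2])

textbook_xml = ET.tostring(textbook, encoding="unicode")
worksheet_xml = ET.tostring(worksheet, encoding="unicode")
print(textbook_xml)
print(worksheet_xml)
```

How a head actually looks on the page is left to whatever turns this XML into a finished product, which is exactly why the same block can show up in both.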
We found obvious things like paragraphs and heads and lists. We also found that of all the zillions of questions on all the bazillions of worksheets, we really only had about 14 different kinds. They just had different numbers of write-on lines or were stacked in slightly different ways. We also found that wildly different products, intended for different audiences, really had more in common than we thought they would. A paragraph is a paragraph regardless of the topic of the words inside. Multiple-choice questions are universal, whether they’re intended for kindergartners and have pictures of kittens, or for college students and have quotes from Ulysses.
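One way to picture the question finding: the kind of question belongs to the content model, while the number of write-on lines is just a parameter on it. The sketch below is a rough Python illustration under that assumption. The `Question` class, its fields, and the `write-on-lines` attribute are hypothetical names, not our actual schema.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Question:
    kind: str                 # e.g. "multiple-choice" or "short-answer" (illustrative)
    prompt: str
    write_on_lines: int = 0   # a layout-ish detail kept as a parameter, not a new kind
    choices: list = field(default_factory=list)

    def to_xml(self):
        """Serialize the question to an element; tag names are hypothetical."""
        el = ET.Element("question", {"kind": self.kind})
        ET.SubElement(el, "prompt").text = self.prompt
        if self.write_on_lines:
            el.set("write-on-lines", str(self.write_on_lines))
        for choice in self.choices:
            ET.SubElement(el, "choice").text = choice
        return el


# Very different audiences, same kind of question.
kinder = Question("multiple-choice", "Which one is a kitten?", choices=["A", "B"])
college = Question(
    "multiple-choice",
    "Who is speaking in this passage from Ulysses?",
    choices=["Bloom", "Stephen", "Molly"],
)

# Different numbers of write-on lines, still the same kind.
short2 = Question("short-answer", "Name a mammal.", write_on_lines=2)
short5 = Question("short-answer", "Explain why.", write_on_lines=5)

kinds = {q.kind for q in (kinder, college, short2, short5)}
print(sorted(kinds))
print(ET.tostring(short5.to_xml(), encoding="unicode"))
```

Four worksheet questions collapse to two kinds here, which is the same effect we saw at a much larger scale.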
Which brings us back to the whole “make sure your content people know what’s going on” thing. In early discussions with the various product groups, a common refrain was “our content is SO different from everybody else’s, there’s no way we can all use one model.” In our case, that just wasn’t true. When we presented the building blocks and showed how each product group’s content would fit into them, it became apparent to everyone that the nature of the content, if not the subject matter, was universal. We were, after all, presenting educational content to a particular audience, and THAT is what determined the conventions and content organization we were using.
Next up: how we identified building blocks.