At a certain level, your content is probably nothing like my content. At another level, your content is exactly like my content, assuming your content is in the form of words printed on a page or displayed on a screen. (If it’s sand painting or butter sculpture, then the following may not apply.)
For our XML project, we wanted to automate the composition of workbooks. We figured that since we basically gave them away as part of the whole package we were selling, why spend the time and money redesigning them for every project, and why pay a vendor to set all those thousands of pages? (Sorry about that, vendors!) The first step in making this happen was to analyze the pages in question.
We needed a reasonable sample size. It might seem like a good idea to grab every product you’ve ever made, but since it’s likely you don’t have the time and staffing to analyze all of that, it’s better to narrow it down. We took the latest published version of all our product lines, all the products in development at the time, and to round it out, some selected samples of older products that had things the newer stuff didn’t. We still ended up with an obnoxious pile of books and files to look at, but it could have been worse.
One of the first things we noticed was that there are different levels of granularity to deal with. There’s the “book-level” stuff, like chapters and sections, and appendices and tables of contents and copyright pages and answer keys. The next level down describes what’s on those pages, like introductions or review questions or the selected poems of Emily Dickinson. Below that, there are more generic bits, like paragraphs and lists and titles. And even below that are the inline bits, like bold and italic, which raise the question of why something is bold or italic, which leads to things like vocabulary terms, pronunciations, and book titles. Then there’s the information that doesn’t necessarily appear on pages but that we still need to think about in order to use the content. That’s the metadata, like difficulty level, state standards, or stuff that’s not needed for print but is needed for other outputs.
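To make those levels a little more concrete, here is a minimal sketch in Python that builds one tiny chapter and tags each element with the level it belongs to. Every element name here (`chapter`, `introduction`, `vocab-term`, and so on) and the `LEVELS` mapping are made up for illustration; they are assumptions, not the content model we actually ended up with.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of element names to the levels described above.
# These names are illustrative assumptions, not a real content model.
LEVELS = {
    "chapter": "book",
    "introduction": "content",
    "review-questions": "content",
    "title": "generic",
    "para": "generic",
    "list": "generic",
    "item": "generic",
    "vocab-term": "inline",
    "metadata": "metadata",
    "difficulty": "metadata",
}


def build_sample_chapter():
    """Build a small chapter that touches every level at least once."""
    # Book level: the chapter itself.
    chapter = ET.Element("chapter", {"id": "ch01"})

    # Metadata: never printed, but needed to use the content.
    meta = ET.SubElement(chapter, "metadata")
    ET.SubElement(meta, "difficulty").text = "intermediate"

    # Generic level: a title.
    ET.SubElement(chapter, "title").text = "Sample Chapter"

    # Content level: an introduction holding a generic paragraph.
    intro = ET.SubElement(chapter, "introduction")
    para = ET.SubElement(intro, "para")
    para.text = "A "

    # Inline level: the element says *why* the word is bold.
    term = ET.SubElement(para, "vocab-term")
    term.text = "metaphor"
    term.tail = " compares two unlike things."

    # Content level again: review questions holding a generic list.
    review = ET.SubElement(chapter, "review-questions")
    questions = ET.SubElement(review, "list")
    ET.SubElement(questions, "item").text = "Define the vocabulary term."

    return chapter


def levels_in(root):
    """Return the set of levels used by the root and all its descendants."""
    return {LEVELS[element.tag] for element in root.iter()}


if __name__ == "__main__":
    sample = build_sample_chapter()
    print(ET.tostring(sample, encoding="unicode"))
    print(sorted(levels_in(sample)))
```

The thing to notice is the inline element: marking the word as a `vocab-term` rather than as bold records the reason for the formatting, and the design can decide later what a vocabulary term looks like in each output.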
Are we having fun yet? It took several months of analysis, working closely with design and editorial, to identify all the different pieces of content at all these different levels. It might have gone faster without their input, but the result would have been much weaker, and selling it to them afterward would have been very difficult. Because our design department was involved in identifying all the parts, for example, it was much easier for them to come up with the design solutions needed to accommodate those parts.
So after all that, we had some cool design sketches, a list of all the building blocks we had identified, and a general idea of how all these pieces might fit together. We printed out all the information, put it in a box, and waited for the XML fairy to come and turn it all into a content model. She never came, so we had to figure out how to turn this information into something we could plug into an authoring tool. And because we’re publishers and not, by any stretch of the imagination, programmers, we had to hire somebody to do it for us.
Our next installment will discuss the two vendors we used at the beginning of the project. I won’t name names, but rather will explore the two approaches they took. This will also open the topic of where to put all those adorable XML files once you start making them.
Our next installment will also come in two weeks, as next week I’ll be in sunny Phoenix attending several spring training games, and presumably consuming a great deal of the fine food and beverages the Valley of the Sun has to offer. And I’m not bringing my computer or thinking about XML. Even if someone asks (and you know they will).