The Bits and Pieces II: Content Model

The content model is an XML representation of your content. The great thing about XML is you get to decide what all the tags are and how they all fit together. That’s also the bad thing about XML. There’s no “right” way to do anything.

One approach is to determine whether an existing content model will work for your content. If you’re doing narrative texts, DocBook might be the way to go. Online help or topic-based content might work in DITA. If you’re doing assessment, QTI might work. If it’s all about math, then MathML is something to explore. If you’re doing all of the above plus more (like we are), then a combination of them all could be the answer.

In our project, we looked at all these standards (except DITA, which didn’t exist yet) and tried to figure out the parts that would work for us. We realized that nothing fit exactly, and that the language used in the various models didn’t match the language we used to describe our content. That kind of freaked us out a little.

We put all those standards on the shelf for the time being and decided to analyze our content on its own, and tried to describe it using our own vocabulary.

Content analysis is something I learned about while studying linguistics in college. We would take texts and break them down into functional parts, then describe what the pieces were doing. I wrote a brilliant (in my opinion) analysis of the content structure of the LP, CD, and 8-track packaging of Blood, Sweat and Tears’ Greatest Hits album. (Yes, I had a copy of each and liked to entertain my dorm neighbors by cranking the 8-track at random times.) What I found was that each format had specific conventions about how they organized information, and that the information on all three was different. The song titles and cover art were the same, but even those were formatted differently depending on the packaging size and shape. This is an early example of content reuse, I suppose.

I applied that kind of thinking to the products we wanted to convert to XML. I sat down with the design director (who knows why things are formatted the way they are) and editors (who know what the content is and why it’s written the way it is) and pored over thousands upon thousands of pages, looking for similarities in the way content is presented, and the underlying reason that the content is even there in the first place.

This exercise took many months. In the end, we came up with a chart showing all the content chunks, stripped of their presentation attributes (font, size, and so on) and context. These were the building blocks. We theorized that you could take any of these building blocks, and put them together in any order (like Legos) and build any kind of product we could ever possibly want to build.

We found obvious things like paragraphs and heads and lists. We also found that of all the zillions of questions on all the bazillions of worksheets, we really only had about 14 different kinds. They just had different numbers of write-on lines or were stacked in slightly different ways. We also found that wildly different products, intended for different audiences, really had more in common than we thought they would. A paragraph is a paragraph regardless of the topic of the words inside. Multiple-choice questions are universal, whether they’re intended for kindergartners and have pictures of kittens, or for college students and have quotes from Ulysses.
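To make the building-block idea a little more concrete, here is a minimal sketch in Python. The tag names (`question`, `stem`, `choice`) and the `correct` attribute are made up for illustration and are not our actual content model. The point is only that the kitten question and the Ulysses question end up with the same structure once presentation is stripped away.

```python
import xml.etree.ElementTree as ET


def multiple_choice(stem, choices, answer_index):
    """Build one hypothetical multiple-choice block with no presentation info."""
    question = ET.Element("question", {"type": "multiple-choice"})
    ET.SubElement(question, "stem").text = stem
    for i, text in enumerate(choices):
        choice = ET.SubElement(question, "choice")
        choice.text = text
        if i == answer_index:
            # Mark the right answer as data, not as formatting
            choice.set("correct", "true")
    return question


def shape(element):
    """Return the tag structure of a block, ignoring the words inside."""
    return (element.tag, [child.tag for child in element])


# Two wildly different audiences, one building block
kinder = multiple_choice("Which animal says meow?", ["Kitten", "Cow"], 0)
college = multiple_choice("Who wrote Ulysses?", ["James Joyce", "Virginia Woolf"], 0)

print(ET.tostring(kinder, encoding="unicode"))
print(shape(kinder) == shape(college))
```

Fonts, sizes, and write-on lines never show up here. In this sketch they would be decided later, by whatever turns the block into a page.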

Which brings us back to the whole “make sure your content people know what’s going on” thing. In early discussions with the various product groups, a common refrain was “our content is SO different from everybody else’s, there’s no way we can all use one model.” In our case, that just wasn’t true. When we presented the building blocks and showed how each product group’s content would fit into them, it became apparent to everyone that the nature of the content, if not the subject matter, was universal. We were, after all, presenting educational content to a particular audience, and THAT is what determined the conventions and content organization we were using.

Next up: how we identified building blocks.

Cross-Promotional Log-rolling, vol. 3

The new issue of InDesign Magazine is out on virtual newsstands throughout the cosmos. It sports 14 pages of awesome tips and tricks, info on handing off InDesign content to Flash developers, and a review of InMath by yours truly. So if you ever find yourself having to do this:


Read this. And tell ’em Mike sent ya.

Lunchtime Links

Going to the O’Reilly conference was like going to the supermarket for Lunchtime Links. Grab a shopping cart and we’ll see if we can sneak 15 items in the 10 Links or Less aisle. Check the labels for how many of our items have the magical “social” ingredient. “Social” is the high-fructose corn syrup of new media.

Shelfari is a social network site devoted to reading. You create a bookshelf with areas for the books you’ve read/are reading/want to read. You can write reviews and give star ratings à la Netflix. You can connect with others and share your bookish experiences and discover new things. You can also try to make yourself look smarter and cooler than you really are by putting One Hundred Years of Solitude on your shelf and leaving off Garfield Beefs Up! You’re welcome to check out my Shelfari page, where I will attempt to look smarter and cooler than I really am.

The unfortunately named Bookglutton is an online social reading site where you can read books (mostly public domain oldies) in a window called an “unbound reader.” The book is displayed in the middle, and on either side you can open windows for chat with other people reading the book, or leave/read comments. You can start or join reading groups devoted to authors or subjects. You can also upload your own work for people to find and read. I would’ve called it BookJunky or something.

Feedbooks is a universal e-reading platform for mobile devices. You can download free e-books and share your own content. The thing I’m most curious about: the ability to create your own customized newspapers from RSS feeds and widgets. I love my RSS, but it’s crying out for something that brings it organization and design.

Bookworm is an O’Reilly site where users can create their own online library and read eBooks in their browser or on a mobile device. You can store your eBooks on Bookworm and download them when you want to read them on your iPhone (via the Stanza app).

Espresso Book Machine is a print-on-demand machine that makes paperbacks while-U-wait. It takes about 4 minutes to churn out an average book. The quality is indistinguishable from something you’d buy in a book store. At the O’Reilly show they had one with a clear side, so we could see how it works. Watching it in action is weirdly hypnotic. It was the most simultaneously amazing and boring experience of my life. (“This is incredible; when will it be over?”) Sort of like watching microwave popcorn. The makers humbly state, “What Gutenberg’s press did for Europe in the 15th century, digitization and the Espresso Book Machine will do for the world tomorrow.”

Buzzmachine is the blog of Jeff Jarvis, author of What Would Google Do? Jeff blogs about new media and the ways in which it is changing (or could change) business, journalism, the universe, you. Lotsa Big Ideers from smart people. Good stuff.

Institute for the Future of the Book is a “think and do tank” based on the premise that print is dead, and that we need to deal with that and positively shape the tools that will replace it. In their mission statement, they state one of their goals is to build tools for “ordinary, non-technical people to assemble complex, elegant and durable electronic documents without having to master overly complicated applications or seek the help of programmers.” Hmmm, wonder if they’re hiring.

CommentPress is one of the tools created by the Institute for the Future of the Book. It is a WordPress theme that re-orients the comments on the page to enable social interaction around long-form texts.

Safari Rough Cuts is a social, interactive publishing service that gives you access to pre-published manuscripts on technology topics from O’Reilly. Authors submit their working manuscript, which you can read and comment on to help to shape the final book. Call it CrowdEdit.

E-Ink is an electronic paper display technology with a paper-like high contrast appearance, low power consumption, and a taste just like raspberries (just kiddin’). It’s the technology behind the Plastic Logic reader.

IDPF is the standards body responsible for ePub. Lots of publishing companies, technology companies, and publishing technology companies are members. Important because ePub is going to be the standard format for eBooks.

The DAISY Pipeline is an open source collaborative software development project hosted by the DAISY Consortium. It includes beta versions of tools for the transformation of documents between different formats: “uptransforms” (non-XML text to XML), “crosstransforms” (XML-grammar to XML-grammar), and “downtransforms” (XML to non-XML deliverable format).

Adobe Digital Editions is a free RIA (Rich Internet Application) for viewing and managing eBooks and other digital publications in ePub and PDF/A formats. Although it’s free, it’s not DRM-free. You can use it with eBooks you download from your public library. Here’s the FAQ.

Bonus Quiz!

The Bits and Pieces I: Making XML

Cutting through the purple prose of my prior posts, we have:

1. Why do you want to use XML?
2. Will using XML actually save or make you money?
3. Do the people who are going to pay for it believe you?
4. Do these people back the project 100%?
5. Is everybody who’s going to be affected at least aware of what you’re up to?

If the answer to all these questions is “yes”, go back and re-read #1. If the answers to all but 1 are “yes” and the answer to 1 is a list of compelling things, then you’re all set to actually start an XML workflow project.

As we’ve established, XML is not magic, or even useful in its raw form, just sitting there, quivering with potential (or botulism). You need to be able to make it, store it, and do stuff with it. (I suppose that applies to lots of things, but here we’re just talking about XML.)

To make XML, you could go a number of ways. You could have content written in a word processor, then use a vendor to turn it into XML. You could also buy something that turns word processor files into XML via stylesheets. I think that’s lame, but sometimes the only way to get content is from an author who uses a word processor and there’s nothing you can do about it.

Why do I think it’s lame? Because word processors are format-centric platforms. There. I said it. You can make stylesheets that describe content, but you’re asking a lot to get people to use them. It’s so much easier to click the Bold button at 3am than to search through 1200 character styles to find the “scientifictermusedinchapteropener” style. And if you fail to capture the content information from the author, there’s no way your vendor or magical stylesheet-to-XML automator is going to be able to do it.
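Here is a rough sketch, in Python, of why the Bold button is the problem. It assumes a made-up converter that gets each run of text along with the name of the style applied to it. The mapping table, the `term` tag, and the `unknown` fallback are all hypothetical. No real vendor tool or stylesheet automator is being described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from semantic character styles to content tags
STYLE_TO_TAG = {"scientifictermusedinchapteropener": "term"}


def run_to_xml(style, text):
    """Convert one styled run of text into an XML element."""
    tag = STYLE_TO_TAG.get(style)
    if tag is None:
        # Direct formatting like "Bold" says how the text looks, not what it
        # is, so all the converter can do is flag it for a human to sort out.
        element = ET.Element("unknown", {"style": style})
    else:
        element = ET.Element(tag)
    element.text = text
    return element


styled = run_to_xml("scientifictermusedinchapteropener", "mitochondria")
bolded = run_to_xml("Bold", "mitochondria")

print(ET.tostring(styled, encoding="unicode"))  # <term>mitochondria</term>
print(ET.tostring(bolded, encoding="unicode"))  # <unknown style="Bold">mitochondria</unknown>
```

Same word, same look on the page, but only one of them tells you what the author meant.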

XML authoring tools are equally lame. They’re either so XML-geeky that an author is going to get lost amidst the flashing lights and smoke machines, or so “Friendly” that the authoring experience is reduced to filling out forms, which can be insulting to Someone Who’s Been Doing This For 25 Years (I’ve only Been Doing This For 14 Years and I’m outraged).

Nothing that I’ve seen has been able to find a middle ground. Maybe I’m not as good at googling as I think I am, and there IS something out there that mixes the familiarity of a word processor with the rigor of an XML markup tool (though I found this, this, and this in under 15 seconds, so I’m clearly pretty good at googling).

Until somebody makes the “just-right” authoring tool, we’ll all have to deal with the “existing” ones. I’d think about who your authors are, and how comfortable they are with technology. If your company is like my company, the answer is “they are all different with different comfort levels technology-wise”. If that’s the case then you might need more than one pathway to XML-land.

A handy thing to do is make a list of criteria for choosing an authoring tool. And by that I mean have your authors make a list of criteria for choosing an authoring tool. They’re the ones who are going to be using it all day. They might ask for crazy things like “ease of use” and “intuitive interface” but that’s what you’re going to have to look for.

Find a bunch of authoring tools, download the demos, and . . . what. Start writing in XML? Have your authors try them out? Really? Writing WHAT in XML? I wasn’t kidding about the whole “work up-front” thing. You can’t just buy an authoring tool and start writing. There’s a bit missing. What KIND of XML are you making? Whether you can see them onscreen or not, there are tags in there. What are they?

You, my friend, need a content model.

Is This What a Kindle Killer Looks Like?

One of the coolest things I saw and held at the O’Reilly Tools of Change conference was the Plastic Logic Reader.


As I played with it, two words came to mind: Kindle Killer. Yes, I know you won’t be able to buy a Plastic Logic reader until 2010. Yes, I know Amazon is bigger than the Milky Way, Coca-Cola, and Andre the Giant put together. I also know that what I held was like an iPod and the Kindle, even the much-improved Kindle 2, is like a Zune. During the session breaks attendees swarmed around the Plastic Logic display asking questions and pawing at the thing. I had to trample two authors and a developer to get my hands on one.

Plastic Logic is positioning the product as more professional and business-oriented than the Kindle, but from what I’ve seen it’s just a more compelling device, period. In my view, the Plastic Logic reader has three killer advantages over the Kindle: size, touchscreen, and file format support. Plus, I’m betting there’s an ace in the hole.

The touchscreen technology supports gestures for navigation, annotation, and note taking. You can draw on the screen, and attach notes. The touchscreen also allows for a virtual keyboard. I’ve never liked the Kindle’s look because of the keyboard. Maybe I’ve been brainwashed from years of iPod UI, but if it’s a reading device, the vast majority of the surface area should be devoted to reading.

This also relates to the size issue. The 8.5 x 11 display is much more like what I’d want to have for reading a magazine, news, or a complex work document. I know that makes it less portable, but the Kindle is 8 inches tall, so I’m not sticking that in my pocket either. Plastic Logic has a 150 ppi resolution screen (Kindle 2 is 167 ppi) which can be rotated to display content in either portrait or landscape format. Color capability is planned as well. Here are some YouTube videos on the product.

In terms of file format support, Plastic Logic wins too. For reading content, the Kindle 2 supports Kindle (.AZW, .AZW1), Text (.TXT), and Unprotected Mobipocket (.MOBI, .PRC). You can use .PDF, .DOC, and .HTML files only after they have been converted to Kindle-readable formats. To convert files you have to either pay Amazon a small fee (ca-ching!), or you have to attach your files to an e-mail that you send to Amazon (privacy? we don’t need no stinkin’ privacy), and they send you a link to the converted file. Come on. I just want things to work. Period. Plastic Logic supports Office file formats, HTML, EPUB, PDF, and more, out of the box. The claim is that it can display any file you can print.

That ace in the hole? Plastic Logic’s eReader already has a flexible screen. It’s just attached to a hard backing. So it’s not too hard to picture a foldable reader evolving from this product. Then you have one killer eReader. Any file format you want, big, color, foldable display in your pocket. Of course, Amazon has walked the walk. The Kindle 2 is out and you can own one. Plastic Logic is still somewhere between drawing board and reality. No word on street date or pricing, but they’re off to a very promising start.

PS: Memo to Amazon documentation department, regarding the 100-page Kindle 2 User Guide. Thanks for making it readily available. But if you’re not going to put page numbers in the table of contents, for God’s sake give me hyperlinks to the pages. Don’t make me search or scroll up and down to find where a section might be. Never stop thinking UI, people. Thanks.

Reversing the Curve

Think about a typical project. In my experience, they start with planning and prototyping, then go into production, then as production ends all the bits and pieces are sent out for printing or conversion for electronic use, and eventually archiving. At the beginning of a project, there’s a small staff doing the market research and planning, then a little more staff for prototyping, then the whole Machinery of Progress gets thrown at it for the production and post-production phases. Lovely.

Except during the production and post-production phases (or really any of the phases), things happen, schedules get strained or broken, and more and more people and effort are required to keep the whole thing going. People start working overtime, temps might need to be brought in, or parts of the project might need to be sent out. Typical.

And since you don’t just make one product then close for a six-month tropical vacation (and if you do, are you hiring?), the next project has to start during the Crazy Time of the current project. So the planning and prototyping get fewer staff. The next project isn’t as well-planned or developed when production needs to start. More things happen that require rework and more help. And on it goes forever.

I drew a chart of this once (nerd!), with time on the horizontal axis and effort on the vertical axis. As time goes on, effort goes up. The highest effort of one project overlaps with the beginning effort of the next, shortchanging the next project. These curves keep overlapping as projects go on. It’s an endless cycle of sadness.

The use of XML can, in fact, reverse those curves. Throw the effort at the start of a project, and as time goes on, effort goes down, freeing up people to start the next project. If the effort is expended up front, you still have time to make adjustments without blowing up the whole project.

How does that work? Theoretically, XML should help with planning and prototyping, since you have access to all your stuff in an easy-to-reuse format. It should help with production, especially if you’re using it to facilitate page comp activity. It will definitely help with delivery to other departments, vendors, or the archives. Theoretically.

So what’s the up-front effort? Design and templating, which should be part of that up-front work anyway. Content and metadata tagging, which usually isn’t.

In introducing an XML-based workflow, we’re really talking about changing more than just the tools. We also have to change the whole idea of how a project gets done. We’re asking authors, editors and designers to trade up-front effort for a smoother project down the road. We’re asking them to do more work than they might be responsible for now. And it’s not work they signed up to do. This might be the most difficult part of getting an XML workflow up and running. If they’re not 100% on board, and, dare to dream, enthusiastic, then you’ll end up with an expensive system with nothing in it. Or you’ll end up spending time and money getting stuff ready to put in it, which adds steps to the project, and might end up putting you right back in that old way of working, but now with even more work to do in the middle of the project.

Baby, It’s Cold Outside

I got chilled to the bone today and it had nothing to do with the fact it was 9 degrees outside. Nor that polar bears, tired of drifting in the melting arctic, have come to live in my back yard. Hope they don’t eat cats.

No, that stuff is warmth and sunshine compared to this:


Brrr! Those were today’s job listings from the Bookbuilders of Boston website. That whitespace under “Design and Desktop Publishing” isn’t just ice buildup on my screen. I have a feeling that it’s not particularly new news either. It seems that a desktop publishing professional nowadays has fewer friends than Bernie Madoff on Facebook. Is this what it was like to be a blacksmith in Manhattan right around the time that Model T’s were creating the first traffic jams? I’d better use up the rest of my Quark jokes soon before there’s no one left around to get them.

Two text boxes walk into a bar…

S’cuse me, I need to go evolve.

Lunchtime Links

In anticipation of the O’Reilly Tools of Change Conference, today’s menu of lunch links has a mostly “changey” flavor. As opposed to my usual links, which often taste like chicken.

iPublishCentral is a solution by Impelsys that allows publishers to upload PDFs and build marketing and distribution tools around them. Everything from Flash-enabled micro widgets to full-blown Web portals. Whenever I hear “portal” I still think of the pylons from the Land of the Lost TV show, or the portal into John Malkovich’s head in Being John Malkovich. Sadly, uploading a PDF to iPublishCentral will not transport you to a prehistoric jungle. But it may help you sell some eBooks.

Lulu has an interesting idea for reincarnating your old books: Vintage Publishing Services. Basically you mail them a crumbling, but beloved old tome, they “gently scan” it, and send you back the original, plus a brand spankin’ new copy. You can also get a DVD with the high-res PDFs. The service isn’t cheap, but I find it interesting because I collect old history books, some of which are more than 100 years old and are quite literally turning to dust. Of course, you could always do the scanning yourself, send Lulu your PDFs and save the dough.

Ars Technica has a huge and insightful article on the past, present, and future of eBooks. It makes the point that with the slightest effort, Apple could’ve dominated the eBook world with the iTunes store and the iPhone. So why haven’t they? I won’t give away the answer, but it begins with Steve and ends with Jobs.

WoodWing has released Smart Connection 6, the latest iteration of their enterprise publishing platform. I’m interested in checking out the Content Station, a Flex-flavored rich internet application for publication planning and monitoring. Now if we can just get the RIA for authoring… SCE 6 also supports InDesign Libraries and Books. Hallefreakinleujah.

Thenextweb has TwitterKeys which are entities you can copy and paste into your tweets to spice them up with some graphical goodness (aka Dingbats).

Last, but never least, InCopySecrets has the straight dope on the right way to fix missing links to InCopy stories.

The Life of O’Reilly

I’m happy to announce that next Monday and Tuesday I will be in New York attending the O’Reilly Tools of Change conference. With all the bad news permeating the publishing world, it will be a welcome change to be immersed in the technology and the vision of people focused on the future of publishing. I’ll be gathering and sharing as much info as I can on some of the most forward-thinking publishing technologies and the companies behind them.

I plan to sample as many sessions as possible, and will be posting and tweeting anything and everything I find share-worthy. The things that directly relate to InDesign, I’ll post at Everything else I’ll post here or tweet here. The official conference Twitter page is here. I’ve already been doing my homework, reading up on the solutions that will be showcased, and there’s some very cool stuff. Should be a lot of fun—and a lot to write about.

The industry is quickly evolving from a linear print-driven desktop publishing type workflow to a collaborative, cross-media delivery of rich content. Publishing is being redefined. “Ecosystems” is the buzzword. No longer is it adequate to put out a beautiful book. You need the beautiful book, plus the Flash-enabled PDF eBook. Make it customized. And throw in a blog, a wiki, and a Twitter page for people to follow. In my mind, the key questions are: what do your customers really want? how do you make it? and how do you make money doing it? And that is what I will seek to find out. Stay tuned.

Introducing XML Into the Wild

You’ve studied what XML is and how it can be applied to your workflow. You’ve made a stunning multi-color presentation proving it will save or make you money. The people who sign the checks are on board and enthusiastic about bringing this technology in. So you’re about 10% of the way there.

Publishing, as you may have noticed, is a creative business. We’re not making millions of standardized widgets on an assembly line. Content is the product. Its presentation is a major part of what makes your products sell. The people who create the content and design the presentation will have to be comfortable with the new XML lifestyle.

When we introduced the concept of XML to our editorial, design and production departments, we were given a block of time at a monthly managers’ meeting. I stood in front of a room of publishing professionals and showed them a color-coded screen of XML tags surrounding the content from a sample page. I showed them how you could take those tags and rearrange them or transform them into another set of tags to produce a new page. I showed them how you could search on all the tags to find content that would normally be buried in page comp files. I threw lots of acronyms at them. They all nodded sagely, asked no questions, and moved to the next agenda item. It was a complete waste of their time.
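For the curious, the searching, transforming, and rearranging from that demo boils down to something like the sketch below, written in Python. The sample page and every tag name in it (`page`, `head`, `para`, `question`, `worksheet`) are invented for illustration. This is not the markup I actually showed that day.

```python
import xml.etree.ElementTree as ET

# A made-up sample page with hypothetical tag names
SAMPLE = """<page>
  <head>Photosynthesis</head>
  <para>Plants make food from light.</para>
  <question type="multiple-choice"><stem>What do plants need?</stem></question>
  <para>Leaves are green.</para>
</page>"""

page = ET.fromstring(SAMPLE)

# Search: find every question, wherever it sits on the page
questions = page.findall(".//question")

# Transform: map one set of tags onto another (hypothetical) set
RENAME = {"head": "h1", "para": "p"}


def transform(element):
    """Copy an element tree, renaming any tag listed in RENAME."""
    new = ET.Element(RENAME.get(element.tag, element.tag), element.attrib)
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(transform(child))
    return new


web_page = transform(page)

# Rearrange: pull just the head and the questions into a new product
worksheet = ET.Element("worksheet")
for child in page:
    if child.tag in ("head", "question"):
        worksheet.append(child)

print(len(questions))
print([child.tag for child in web_page])
print([child.tag for child in worksheet])
```

The content is typed once. The web page and the worksheet are just different walks through the same tags.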

Based on that, I think it’s better to explain what XML does, rather than what it is, and it’s best to do that without focusing too much on, you know, the whole XML thing. I don’t recall being lectured on what PostScript is when starting to work with PageMaker. Knowing how your car’s engine works doesn’t mean you can drive.

If I had to do it again, I’d ask the content creators and designers what kinds of things they have to do that they consider repetitive or not a good use of their time. Do they have to spend a week paging through old products or archives to find that particular bit that they want to reuse? Is somebody spending all their time keeping track of art usage so you don’t get sued for using it where you don’t have permission? Are entire projects based on the concept that you’re going to take large numbers of pages and alter them ever so slightly for a customized use? At the end of a project, does someone have to gather all the files, package them up, and ship them to a vendor or another department for further processing for electronic use?

You can also look at your current workflow and determine where the silly bits are. If using XML can make them less silly, then there’s a good illustration of why you want to use it.

Once you know what all the inconveniences, inefficiencies, and idiocies are, you can determine how (or if) XML can help. That’s the presentation you want to give to content creators and designers. The big benefits. The what’s in it for them.

Once they get it, you can hit them with the catch. Did I not mention the catch? Yeah, the benefits only come if they put in some extra work up front. Tagging, file management, adhering to some pretty strict processes, maybe giving up some flexibility, especially on the design side. These are all fascinating topics for another time.