Lightweight DITA: I’ve seen the light

[Image: DITA logo being held aloft by balloons]

Lightweight DITA doesn’t have a logo yet. The technical committee is welcome to use this one.

If you’ve taken one of my DITA classes, you’ve heard me extol the power of DITA. One aspect of that power is semantic tagging. In DITA, a piece of content isn’t boldface or italics. It’s a command name. Or it’s a citation to another document. Or it’s the name of a screen (a wintitle, in DITA parlance).

That’s a big selling point for DITA, you probably heard me say. Each DITA element represents what a thing is (hence the term semantic) rather than how it looks. Just think: you can take a big document and generate a list of all the command names, or all the screen names. You can’t do that when you’re just tagging things as boldface and italics.
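To make that concrete, here is a minimal Python sketch of the kind of list I mean. The sample topic, its command and screen names, and the list_elements helper are all made up for illustration. A real project would run something like this over its whole set of DITA files rather than one inline string.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical DITA task topic; the content is invented for illustration.
TOPIC = """\
<task id="backup">
  <title>Backing up a volume</title>
  <taskbody>
    <steps>
      <step><cmd>Open the <wintitle>Storage Manager</wintitle> window.</cmd></step>
      <step><cmd>Run <cmdname>snapvol</cmdname> with the <parmname>--target</parmname> parameter.</cmd></step>
      <step><cmd>Confirm the result with <cmdname>lsvol</cmdname>.</cmd></step>
    </steps>
  </taskbody>
</task>
"""


def list_elements(xml_text, tag):
    """Return the sorted, de-duplicated text of every element named `tag`."""
    root = ET.fromstring(xml_text)
    # itertext() gathers the text of the element and any nested inline elements.
    return sorted({"".join(el.itertext()).strip() for el in root.iter(tag)})


command_names = list_elements(TOPIC, "cmdname")
screen_names = list_elements(TOPIC, "wintitle")
print(command_names)
print(screen_names)
```

The lookup works only because the markup says what each phrase is. With plain boldface and italics, there would be nothing to search for.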

Turns out there are a couple of problems.

  • First, I’ve never met anyone who wanted to generate a list of all the command names, or all the screen names. While it sounds good in theory, in practice it’s more like a solution in search of a problem.
  • Second, it’s a lot to remember. When is a command parameter a parameter? When is it an option? (DITA has tags for both.) Writers working side by side, writing content for the same help system, might tag the same object in different ways.

Just now, in fact, as I wrote this article, I couldn’t remember the name of the tag for citations. Even though I’m accustomed to using it, I couldn’t retrieve <cite> from my brain. I had to look it up.

Enter Lightweight DITA.

Currently being developed as an OASIS standard, Lightweight DITA (LwDITA) sells itself in two ways:

  • It’s simpler than traditional DITA, with many fewer elements and stricter content rules. In other words, it’s less powerful but much less complicated.
  • It’s flexible, accommodating (so far) 3 major authoring formats: XML, HTML, and Markdown. (Traditional DITA is authored in XML; rendering the content in the other formats requires publishing software, or transforms.)

For the part of me that loves the power of DITA, that sings the praises of semantic tags to my students, LwDITA was hard to accept at first. Take away the semantic tags, and allow tags that just control the appearance of the content (bold, italics, underline, subscript, superscript), and you take away a lot of what makes DITA special.

I had to get over that attitude.

A lot of DITA writers, I’ve come to admit, simply don’t bother with the multitude of semantic tags — and a lot of DITA writers don’t need to bother. Those who do bother often struggle to use them properly because there are so darn many of them.

LwDITA, wisely, doesn’t claim to be a replacement for full DITA. It’s not a be-all and end-all. According to the spec, it’s designed “for situations in which DITA…would be too complex or for communities that do not use XML as an authoring platform.”

And, for me, here’s the clincher: because LwDITA makes DITA useful and palatable to a larger set of content developers, I’m betting it will eventually increase the acceptance of DITA in our professional community. Writers will appreciate having a flavor of DITA they find approachable. Businesses will appreciate the economy and the flexibility. Everyone will appreciate the gentler learning curve.

Don’t get me wrong. I’m still a big fan of traditional DITA. It’s still the right solution in plenty of situations. But maybe not in every situation. For those other situations, we now have an alternative that still delivers the value inherent in structured authoring.

You have until March 12 if you want to comment on the current LwDITA specification. Here’s my comment: I like it. Its simplicity and flexibility won’t be ideal for every situation, but much of the time they’ll be just right.


3 thoughts on “Lightweight DITA: I’ve seen the light”

  1. Mark Baker


    You say, “First, I’ve never met anyone who wanted to generate a list of all the command names, or all the screen names.” But you have. You have met me. I do this kind of thing all the time.

    Why? Because those lists provide important audit functions. Generating lists like this lets you ensure that all of the command or screen names (or whatever they are) are correct and that all the commands or screens are covered by the docs. By doing checks like this I have discovered major mistakes and omissions in documentation that were not caught by human review.
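    A minimal Python sketch of such an audit, under stated assumptions: the sample topic, the PRODUCT_COMMANDS set, and the audit_commands helper are all hypothetical, and a real audit would read the product's actual command list and the full documentation set.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation content; note the misspelled command "lsvols".
DOC = """\
<topic id="cli">
  <title>CLI reference</title>
  <body>
    <p>Use <cmdname>snapvol</cmdname> to take a snapshot.</p>
    <p>Use <cmdname>lsvols</cmdname> to list volumes.</p>
  </body>
</topic>
"""

# Hypothetical list of the commands the product actually ships.
PRODUCT_COMMANDS = {"snapvol", "lsvol", "rmvol"}


def audit_commands(xml_text, known_commands):
    """Compare the <cmdname> values in a topic against a known command set."""
    root = ET.fromstring(xml_text)
    documented = {(el.text or "").strip() for el in root.iter("cmdname")}
    return {
        # Names in the docs that match no real command (likely typos).
        "unknown": sorted(documented - known_commands),
        # Real commands that the docs never mention (coverage gaps).
        "undocumented": sorted(known_commands - documented),
    }


report = audit_commands(DOC, PRODUCT_COMMANDS)
print(report)
```

    In this made-up sample, the check flags one misspelled command name and two commands that the docs never mention, which is the sort of mistake and omission that human review can miss.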

    Audit is one of the most neglected and potentially useful applications of structured writing. Unfortunately, because it is so neglected it is often not adequately provided for. In many cases people rush to apply semantic markup because they have been told it is a good thing, only to figure out down the road that they are not actually using it because they don’t actually know what they were supposed to use it for.

    While auditing and author guidance are two of the biggest applications of semantic markup, there are a number of others. It can be used to generate linking, to organize content, to select content elements for use in different contexts.

    Unfortunately, these highly productive uses of structured markup tend to get lost, at least in part, because of the rush to standardization. People are told that they want semantics, but they are also told that they should stick to standards. But semantics are, as you point out, about saying what a thing is rather than what it looks like: the subject matter of a text rather than its formatting. And subject matter, its relationships to other pieces of subject matter, and the uses people put it to are highly individual, and therefore not subject to broad standardization. So people choose a standardized format with vaguely generalized semantic markup that they are not quite sure how to apply and have no idea what to do with. And eventually they figure out that they are realizing no gain for this pain, chuck it all, and switch to non-semantic formats.

    Technically, of course, DITA is intended to support specific semantics through specialization. For reasons too complex to go into here, there are significant drawbacks to this approach. But the most significant drawback is simply that few people use it, meaning they get only the residual semantics of the base types and don’t know what to do with them. Once they realize that implementing those semantics is work for no gain, they are free to move to non-semantic forms of block reuse, of which there are several providers in the market. Will Lightweight DITA become a factor in that space? TBD.

    For me, though, the way forward is through richer semantics, not poorer, and a far richer appreciation of the potential of semantic markup. Look for my forthcoming book on that subject: _Structured Writing: Rhetoric and Process_. Coming from XML Press in the hopefully not too distant future.

  2. Larry Kunz Post author

    Thanks, Mark. And thanks for using the semantic tags for audit purposes. (Somewhere, someone on the DITA technical committee is pounding their fist on the table and yelling “YES!”) The reason more people don’t do this, I’m afraid, has to do with priorities: people feel they have more pressing business and don’t get around to doing things like you’re doing.

    DITA’s creators provided a very rich set of semantic elements — probably with the expectation that various industry-specific specializations would evolve, giving writers a more-or-less standard, unambiguous set of elements for their particular industry. This has happened, but not as quickly as they probably expected. Perhaps LwDITA, by increasing the size of the DITA community, will engender a market for these specializations. While some writing teams will be happy using LwDITA indefinitely, others will want to “move up” to something more powerful. At least that’s one plausible outcome.

    1. Mark Baker

      Partly priorities, certainly. But also I think a lack of solid theory and tools for doing it plays a big role.

      The belief in, and hope for, the emergence of industry-wide vocabularies goes all the way back to the earliest SGML days, if not earlier. In more transactional fields, such as banking, such vocabularies have become widespread, indeed essential. But in content it really has not happened. One could point to S1000D, though how useful the semantics of S1000D are for the kinds of things I am talking about is more than I know.

      But there is a reason why these initiatives have not prospered in the content field. I think you are spot on in describing the expectation: “that various industry-specific specializations would evolve, giving writers a more-or-less standard, unambiguous set of elements for their particular industry.” The problem with that is that there is enough variation from company to company and product to product, not to mention different visions and priorities among the people who get appointed to the standards committees, that anything that an industry consortium can actually agree on is vast and full of ambiguities. Even base DITA topics contain all kinds of ambiguities and lead to all kinds of debates about the right way to tag things. So there is a fourfold problem here:

      1. The resulting vocabularies are large, complex and hard to learn, which increases your authoring costs and reduces your pool of available authors.

      2. The compromises and the generalization required to get to agreement mean that many of the audit and build functions you would like to perform are not supported by the industry standard markup.

      3. There is enough ambiguity and variability in the standard that valid documents from different authors are not directly comparable in their markup and therefore there is little or no reliable auditing and automation you can do, beyond basic publishing functions.

      4. Getting to agreement takes so long, and then agreeing on needed changes and updating the tools to match takes so long, that the standards and tools are never ready when you actually need them.

      All this means that there is very little practical gain from adopting the standard. Thus these things just don’t prosper. These are human problems and information complexity problems, which means that they are not going to be solved with yet another markup technology. DITA brings nothing new to the party here.

      The simple fact is that a semantic markup language that is uniform enough and specific enough to drive the kinds of auditing and automation functions that I am talking about has to be very local to the organization and the subject matter. And to be simple enough to use consistently, it similarly has to be highly specific to the local organization, and highly strict to avoid competing interpretations.

      Such markup languages can be marvels of simplicity, making authoring far easier and processing algorithms easier to code and more reliable to execute. But this requires a different skill set and a different way of looking at the world. (A standardized publishing chain sitting behind all this is a huge boon to this kind of development. Standardization, where it works, is a grand thing. But some parts of every system have to be customized.)

      And I think it is in this regard that the DITA technical committee has rather boxed itself in. They have chosen standardization as the rallying cry of DITA. (Every DITA pundit and vendor has been trained to always say “the DITA standard”, never simply “DITA”.) In doing so they undermine the idea that you should actually use specialization to create your own specialized vocabularies that are specific enough to be both easy to use and useful for auditing and automation.

      Mind you, specialization is not the best mechanism for this. It is based on the idea that you can create markup in the subject domain as a specialization of markup in the document domain (terms I will explain in detail in my forthcoming book) and that does not work well because subject domain markup does not have the is-a relationship to document domain structures that the principle of specialization demands. The whole point of it is to break that relationship.

      Which really leaves DITA in an awkward position. While it is riding a wave of undeniable enthusiasm at the moment, I think it stands at a fork in the road. On the one hand, we are seeing alternative, less structured systems for topic-based reuse being developed. On the other, there is definitely a growing demand for highly semantic content, which DITA may not be best structured to deliver, and for which its own rhetoric of standardization may work against it. LwDITA seems to be an attempt to let it take both roads, but it is not clear that it is the strongest candidate for either road.

      What we really want, of course, is not to have to choose between the two roads, but to have a semantically rich system that is easy enough to use that there is no impetus to ignore semantics in the name of simplicity. The book attempts to show what such a system might look like.

