Overcoming DITA’s acceptance hurdles

This is an appeal to the DITA community: the experts and the evangelists, and possibly the tools vendors as well.

We’ve done a good job selling DITA: after years of slow growth it’s gaining momentum. As it does so, paradoxically, I’m hearing more and more anti-DITA rhetoric. While some of the rhetoric reflects a lack of understanding or even a hidden agenda, some is worth listening to.

I’m thinking of two things in particular that the DITA community often touts as selling points: authors no longer have to worry about formatting, and their DITA content can readily be used for adaptive content — output customized for the audience.

As good as those sound, I don’t see content authors raving about them. We need to understand why that is, and find a way to address it.

Leave the formatting to us

I’ve proudly touted this in every DITA class I’ve taught: Freed from having to worry about fonts, indentations, and other formatting issues, authors at long last can concentrate on content.

Except that a lot of authors like to worry about formatting. Oh, they’re grateful to be rid of weirdly indented paragraphs and things like that. But they want to tweak table formats and insert page breaks so that their content is perfect for a particular output type (usually PDF).

You can’t blame the authors for wanting their content to be good. Perhaps our tools fall short by not making the content as good as it should be.

Table formatting probably isn’t a big problem. By and large, it can be managed through the XML transform. But page breaks are a special case.

Usually when an author wants to insert a page break, it’s because the XML transform has placed a break where it shouldn’t be. So it’s not about inserting desirable page breaks; it’s about avoiding undesirable ones. The solution? How about smarter XSL transforms?

The default DITA Open Toolkit already knows not to separate a title or a heading from the first line of following text. I think it also knows to avoid widows and orphans. Would it be feasible to build in additional checks, like not breaking up short tables and not separating a figure from the preceding text?

If it’s not feasible to update the DITA-OT, then perhaps someone in the community can provide some sample XSLT code to handle common formatting issues. I’m thinking of something akin to the excellent examples in Leigh White’s DITA for Print. If the XSL transforms can be made smarter in this way, a lot of authors will no longer feel the itch to tweak their formatting. And they’ll be more inclined to embrace DITA.
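To make the suggestion concrete, here’s a minimal sketch of the kind of override I have in mind for the DITA-OT’s PDF (XSL-FO) output. The attribute-set names and values are illustrative assumptions, not the toolkit’s actual ones; check your own plugin’s attribute sets before borrowing this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical PDF-plugin customization: standard XSL-FO "keep"
     properties used to discourage undesirable page breaks.
     Attribute-set names here are assumptions; match them to the
     sets your toolkit version actually defines. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="2.0">

  <!-- Keep a figure on the same page as the text that precedes it -->
  <xsl:attribute-set name="fig">
    <xsl:attribute name="keep-with-previous.within-page">always</xsl:attribute>
  </xsl:attribute-set>

  <!-- Discourage breaking tables across pages; an integer keep
       strength (rather than "always") lets the formatter still
       break a table that is too long to fit on one page -->
  <xsl:attribute-set name="table">
    <xsl:attribute name="keep-together.within-page">5</xsl:attribute>
  </xsl:attribute-set>

</xsl:stylesheet>
```

Using a keep strength instead of an absolute keep is a deliberate trade-off: it avoids the pathological case where a long table can never be placed at all.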

Adaptive content

The other big selling point for DITA is the ability to adapt DITA content to particular audiences in particular contexts. It sounds great. Except that practically no one is using DITA in that way.

Why not? Metadata is the key to making adaptive content work, and most of the authors I know simply aren’t comfortable working with metadata. Even when an information architect develops a great taxonomy to support a solid content strategy, metadata still eludes many writers. Perhaps it’s counter-intuitive to create content and also tag it according to a taxonomy.

How can we make metadata easier for authors, so they’ll be comfortable using it effectively?

DITA already has a simple way of including metadata in topics and maps. Its metadata elements are flexible enough to support any and all taxonomies. But they require authors to do too much by hand.
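For readers who haven’t seen it, here’s what that topic-level metadata looks like in practice. This is standard DITA prolog markup; the taxonomy values (the audience type, the product-line name) are invented for illustration.

```xml
<task id="install-widget">
  <title>Installing the widget</title>
  <prolog>
    <metadata>
      <!-- Standard DITA metadata elements; the values are hypothetical -->
      <audience type="administrator"/>
      <othermeta name="product-line" content="widget-pro"/>
    </metadata>
  </prolog>
  <taskbody>
    <steps>
      <step><cmd>Run the installer.</cmd></step>
    </steps>
  </taskbody>
</task>
```

Hand-coding a prolog like this in every topic is exactly the kind of manual work that puts authors off.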

Some higher-end content-management systems (CMSs) have become good at managing metadata. But that’s at the CMS level, an environment where authors might be less comfortable working. Can we do anything to make metadata more intuitive, and less manual, at the topic and map level?

What if editing tools, as part of their file-create dialogs, included checkboxes with which authors could assign metadata based on the local taxonomy? The result would be pre-populated metadata that could be modified later, if necessary, in the normal way.

What other ideas might help authors become more comfortable with metadata and taxonomies?

A glass half full

Unlike my colleague Mark Baker, I see DITA as a glass half full, not half empty. When Mark asks “Why does XML suck?” I paraphrase Churchill and say that XML is the worst way to develop content — except for all of the others. If XML sucks as a way of communicating information and telling stories, then so did cave drawings, papyrus scrolls, typewriters, and word processors. All of them had strengths and weaknesses; all of them enabled savvy communicators to reach their audiences.

Yet for all its potential, DITA still has hurdles to overcome in terms of being accepted by the majority of content authors. What about it, DITA community?

Do you agree that we need to help build authors’ trust in DITA’s ability to format their content? That we need to help them become more comfortable with using metadata?

What should the community do to address these issues? What is the community already doing, that I haven’t mentioned?

Are there other hurdles to overcome besides the two I’ve mentioned?

I’d love to hear what you think in the comments section.


22 thoughts on “Overcoming DITA’s acceptance hurdles”

  1. Ugur Akinci

    Larry, the need to make working with metadata easier and the ever-present concern with aesthetic and precision formatting are great points. There is one more major factor that needs to be mentioned I believe — DITA is not as affordable as it first looks. It really requires an enterprise-level managerial commitment before we technical writers even begin to worry about metadata and formatting. The real and high “hidden cost” of adopting DITA (in terms of reformatting the unstructured “silo” of existing documents, training cost and time, and budget for CMS acquisition and maintenance) still remains a major obstacle for most “unstructured authors” that I know of.

    Reply
  2. Mark Baker

    Larry, re “If XML sucks as a way of communicating information and telling stories…” XML is not a way of communicating information and telling stories. It is a way of encoding content data structures. And where it sucks is as a format for writing content into those data structures.

    The primary reason it sucks as a format for writing is that it introduces an abstraction: it asks the writer to create a structure of elements and attributes. We name those elements and attributes to denote parts of a document, but the author is now creating elements named p, not paragraphs. That abstraction is both mental overhead and mechanical overhead, and the mechanical complexity can become quite problematic, especially in large vocabularies.

    And abstraction is at the heart of the other difficulties people have with DITA (and with DocBook and other similar systems as well). The reason that so many writers prefer to deal with formatting is that it is concrete. Yes, they like to get the page breaks right, but that is not the heart of the problem. DITA forces them to deal with abstractions and robs them of immediate concrete feedback.

    But WYSIWYG is not the only way to be concrete. Markdown is concrete. It is the least powerful, least consistent, and most concrete of all the lightweight markup languages, and it is by far the most popular. Why? Because it is concrete and concrete is easy.

    Similarly with metadata, it involves abstraction. The idea that a piece of content has a place in a metadata schema is an abstraction. Worse, taxonomies are inevitably reductive. They give up nuance for internal consistency. (Don’t confuse consistency with accuracy, though.) They reduce the complexity of the world to simple labeled boxes. Useful perhaps, but a reductive abstraction that is mentally taxing. Even if I accept the need for reductive abstractions, the way I would reduce and abstract is different from how you reduced and abstracted, so I’m being asked to fit with a model that seems foreign to me. It’s all just really hard, and foreign, and a big mental overhead I have to deal with on top of my writing work.

    There is a real curse-of-knowledge problem here. People who love DITA love its reductive and abstract nature. They rightly see that this kind of reductive and abstract approach opens up possibilities for organization and processing. But being immersed in these things, they forget how foreign they are to most people. They also tend to forget how much violence they do to meaning and storytelling. Writers are understandably reluctant to lie down in the Procrustean bed of DITA topic types. Their first loyalty is to the integrity of their individual work, not the consistency of the overall collection.

    All structured writing is to one extent or another reductive and abstracting. I’m a structured content guy, though I fully appreciate the position of those who reject the abstract and reductive nature of it wholesale. My problem with DITA is that it revels in and boasts of its abstract and reductive nature. It exposes it to the author. It makes no attempt to diminish or hide its abstractions or to mitigate its reductive nature. My issue is not whether the glass is half full or half empty but whether it is a glass or a vitreous vessel for the hydration of sentient bipeds.

    This is not a tools problem. Tools can’t paper over the difficulties that the reductive and abstract nature of DITA (and XML) creates for writers. What I am trying to figure out is how we can do structured writing in a way that gives us the benefits we seek while reducing to a minimum the amount of reductiveness and abstraction that we ask writers to deal with.

    Reply
    1. John Tait

      It’s a paradox for a single role.

      I admire the tools and approaches of tech comm – I’ve learned a lot – but some of what it’s doing seems very odd. Writing is complex intellectual work and it’s only one part of a range of different skills needed to help people through publishing.

      I work closely with engineers and safety professionals, and help them produce work that might make sense beyond their local groups (proactively overcoming that curse of knowledge). Otherwise it wouldn’t make sense.

      I also work with a designer, who takes edited work and presents it in a way that, ultimately, changes people’s behaviour.

      At another time, I’ve worked as an indexer, applying metadata consistently (the Derwent World Patent Index) to help retrieval. I’m a much better editor than an indexer – indexing well is dead hard and that’s even when you’ve got a very mature set of indexing codes to use. Publishing follows all of this.

      All of these (explaining/editing/designing/indexing/publishing) are completely different jobs requiring different experience. It’s interesting to see all of this being demanded from a writer plus a schema and a stylesheet. I occasionally ask myself why your industry doesn’t employ these roles rather than gamble on expensive experimental tooling.

      (I actually like DITA but I come from industries with formal senior project manager roles.)

      Reply
      1. Mark Baker

        John, you are right about the single role. Writing already requires a dual focus on the subject matter and the exposition. These are fully taxing in themselves, so any additional duties you pile on top reduce the mental resources available to do these things well.

        Structured writing was supposed to relieve the writer of formatting concerns. Unfortunately, merely abstracting them out of the writing task does not remove them from the writer’s list of concerns, as Steven’s comment shows.

        But what our current approaches to structured writing have done is heap even more responsibilities on the writer. Even if the writer gets used to being responsible for reuse, content management, and metadata, as they got used to being responsible for publishing, all these things still take mental energy away from the already fully taxing roles of understanding subject matter and composing effective narrative, which must necessarily reduce their performance in these areas.

        Not to mention, of course, that the more stuff you have to master to work in the authoring system, the fewer qualified authors you will have available to you.

        It is possible to use structured writing techniques to significantly simplify the task for all authors — by, among other things, helping with understanding subject matter and composing effective narratives. Unfortunately, that does not seem to be the direction we are heading. We are adding burdens rather than reducing them.

  3. Susan Carpenter

    I suspect that I’m going to be the outlier here. I spent my first ten years in the business poring over Mil-spec documentation. When I left that world, I let go of any urges to fuss with formatting. That isn’t to say that I pay no attention to formatting at all, only that I’d rather apply those energies toward improving the XSLT transforms and any governing CSS.

    Regarding the metadata issue, I’m a huge fan of marking content for what it is and what it supports. The semantic nature of DITA means that I don’t have to worry too much about the former – the tags describe what the content is. (We did use some metadata to identify certain types within types, though.) The latter – marking what the content supports – can be as simple or as complicated as a group of writers is willing to tolerate. Because we were on a relentless monthly maintenance cycle and our customers wanted version applicability keenly marked, we used metadata very effectively to track content provenance as well as flag specifics for customers. We were building doc for more than a dozen distinct offering configurations every week, and metadata did the heavy lifting. Our writers came from a variety of backgrounds and fields of experience, but they all did just fine with it.

    Reply
  4. Michael McCallister

    Great questions, Larry! I will also echo Ugur’s point. The conversion costs for a lone-writer shop like mine proved to be the roadblock to adopting DITA a few years ago. I know there are businesses that can do this electronically now, but it still is a somewhat scary proposition.

    Reply
    1. Ugur Akinci

      Michael, thanks for your feedback. Sometimes I find myself wondering if it’s just me who finds the cost of DITA a bit too much for a lone-writer to shoulder. Glad to hear it’s not just me 🙂

      Reply
  5. Steven Jong

    I have been exposed to DITA at only one company, but I also co-presented on moving from a FrameMaker environment to DITA at an STC New England program, during which I compared notes with a colleague elsewhere, so that’s at least two data points.

    Both of us are veterans who worked in and remembered the days of presenting information on pages, and I admit I still have that mindset. Everyone in my workgroup, and everyone in hers, went through the DITA conversion. It is the opinion of everyone in both groups that judging by the standards of the past, the documents now produced in DITA are embarrassingly poor in presentation and layout. I mean embarrassing to the point that neither of us submits documents to STC competitions any more, because by those standards our work products are shameful.

    Now, it could be that the standards for presentation and layout have changed since then, but I doubt it. It could be that readers don’t care about whether a first-level head appears at the bottom of a right-hand page any more, but I doubt it (I can observe readers missing it even today). It could be that our focus should be on Help files, which admittedly we can produce from the same source with the click of a button, but I see analogous formatting issues in that medium as well. I think we’re just producing poor output.

    The root cause in both workgroups seems to be using free formatting output processors (for which, it seems, we’re paying a steep price) that we can’t or won’t pay to customize. But even for the open-source, free formatters, I don’t see why they can’t be taught the same formatting rules that computerized typesetters knew about in the 1980s.

    The ability to type and tag information is powerful. It’s just that we have historically had little call to filter topics or elements by type (to create, say, a concepts guide, or to filter screenshots out of Help files). However, I think that will change in the future. Perversely, I think some writers have used “wrong” tags to format information to their liking, which creates problems down the road. That speaks to the first problem.

    Reply
  6. Tom Johnson (@tomjohnson)

    Excellent points. I agree that if DITA had more rockstar tooling, there would be greater adoption. But users are always going to want to customize their outputs, and DITA makes it really hard to give users this level of control using common languages like CSS.

    One thing I’d like to try is authoring lightweight DITA using a robust platform like Fluid Topics or something else. I haven’t really tried that route with DITA — just used the OxygenXML approach with individual file outputs.

    Reply
  7. Larry Kunz Post author

    Forgive me, everyone, for not responding to your comments sooner. Every one of you has made some great points, and I thank you.

    Ugur, Mike: Yes, it’s true that DITA out-of-the-box doesn’t produce very satisfying results. The cost of creating and maintaining transforms, while not prohibitive for an enterprise-sized operation, can certainly be a tough pill for smaller shops. They should consider whether the cost is justified by their need to reuse content, save on translation, etc. It’s true: DITA isn’t for everyone.

    Mark, I know you’re not saying you’d stop worrying and learn to love XML if they only renamed <p> to <paragraph>. But in terms of level of abstraction: every advance in technology — especially the invention of the written word — introduced a level of abstraction. People learned to cope. The trick was, and still is, making the leap from existing technology to new technology as short as practicable.

    Susan, I know that you work for IBM — where DITA was born — and so I think your experience isn’t typical. Your experience is interesting, however, because it shows this can be done. Can you point us to papers or presentations that you or your colleagues have written, describing how you got writers to buy in and work successfully with metadata?

    Reply
    1. Mark Baker

      Larry, just this morning I came across an article in the Globe and Mail about our changing relationship with the car. http://flip.it/mIybK It talks about how advances in car technology mean we no longer know or care how cars work. They just work.

      “There was a time when you had to understand a bit of mechanical technology to drive a car – unless you could work a crank starter and set the spark advance, you weren’t going anywhere in a Model T. In the 1970s, many cars still had manually-operated chokes, which altered the fuel mixture through a dash-mounted lever. Starting a Fiat 600 on a sub-zero morning demanded the expert touch that Jimi Hendrix once brought to the tremolo bar of his Stratocaster. Fast forward to today. Every car on the market starts with robotic reliability, and advertising focuses on infotainment features, luxury interiors and storage space for lifestyle accessories.”

      Technological advances do start by introducing new complexities and new abstractions, but they progress not by people getting used to them but by the technology hiding them.

      Reply
  8. radaghastblog

    What if the author were to be presented with a series of questions and could answer them with highlighting? For example, let’s assume that an article will have three possible audiences: medical professionals, patients, and med school students. When the writer is done (for the moment) writing, he or she is asked to “Please highlight the material that your MEDICAL PROFESSIONAL audience should see.” And so on for each defined audience.

    The key idea here is to find easier ways to apply metadata to blocks of content.

    Reply
    1. Mark Baker

      But the real issue is not how does the writer denote the answer, but how do they figure out what the answer is. Are they qualified to answer it? How much time/study does it take to answer it? How much overhead is this for their time and their thought processes?

      Equally importantly, why are they being asked to answer it over and over again? If our concern is with reuse, why should writers have to make this distinction over and over again for every piece of content? If such a distinction is being made across a body of content, presumably it should be made in a consistent way that we can factor out.

      Too much of the emphasis, I find, is placed on reusing content, when what we really should be focused on is reusing information, which includes metadata. Let’s see if we can factor these things out and simplify the author’s life.

      Reply
  9. Don Bridges

    Re: Metadata and “can we do anything to make metadata more intuitive, and less manual, at the topic and map level?” Two ways we do this are:

    * Provide UI within the authoring environments to help the authors enter metadata rather than relying on them to use tags. In most cases, a standard taxonomy is leveraged to drive the UI.

    * Selectively synchronize the DITA metadata with the CMS metadata to take advantage of the benefits of both worlds. This prevents users from having to double-enter metadata while still being able to leverage the benefits of both the CMS and the editor.

    It seems that most organizations have a loosely defined metadata strategy (or none at all). I recently wrote a LinkedIn blog post on metadata that’s available at https://www.linkedin.com/pulse/metadata-primer-why-you-should-care-don-bridges?trk=prof-post (Comments appreciated).

    Reply
  10. Ray

    I completely agree with Mark that metadata needs to be as reusable as the content. Seems something that should go on the wish list for the next DITA revision.

    Larry, I like the idea of the check boxes for metadata. This is doable today, at least in oXygen, and I imagine in other editors. It does take a geek to set up the forms based environment for the writers, but once done, it works just fine.

    Reply
    1. Michael Priestley

      Today in DITA you can put metadata on either individual topics, or collections of topics, or branches of collections, or (using attributes) on individual sections, paragraphs, or phrases. I think you’d have to look pretty hard for a level of content that does NOT allow metadata attribution today. And of course, at the collection level, that means you can have different metadata for different reusing contexts.

      I’m a big fan of just using the attributes for all levels, because there’s a specified way to derive a controlled value list for them from a taxonomy, they can be used for filtering at any level with no additional coding, and the cascading effects/priorities are specified.
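        For anyone following along, here is a small illustration of the attribute-based approach Michael describes. The audience values below are examples, not a recommended taxonomy.

```xml
<!-- In a topic: paragraphs profiled with the audience attribute -->
<p audience="administrator">Back up the database before upgrading.</p>
<p audience="user">Ask your administrator to schedule the upgrade.</p>

<!-- In a DITAVAL file passed to the build: exclude user-facing
     content when producing the administrator deliverable -->
<val>
  <prop att="audience" val="user" action="exclude"/>
</val>
```

        The controlled value list Michael mentions would typically be defined in a subject scheme map bound to the attribute, so editors can offer the values as a pick list.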

      I’m not sure what flexibility Mark is looking for, beyond what DITA already gives.

      Reply
      1. Mark Baker

        Michael,

        My point was not about where you can put metadata (though that is important for other reasons). It is about factoring out the need to repeatedly apply the same metadata in hundreds of places. For product metadata, for instance, rather than entering product metadata in hundreds of places in content that is reused between products, we should try to factor out the product differences so that we can manage them centrally.

        I know of no reason why you can’t do that in DITA today (so I don’t know what Ray may be thinking of adding to support it — though he may have some ideas to make it easier or more obvious). But it is not DITA’s default — not a feature of the standard topic types out of the box.

        Defaults are incredibly powerful in how they shape the use and understanding of any technology. Going beyond the defaults in meaningful ways requires a whole other level of understanding, skill, and resources. People tend to choose tools whose defaults are closer to their needs because the cost of initial acquisition is so much lower. JSON replacing XML as the language of web services is a case in point.

        On a related note, I am not looking for flexibility. Structured writing is about constraints. Conformance to appropriate constraints improves quality. Recorded conformance to constraints allows for reliable automation. Flexibility complicates the business of enforcing constraints. I am looking for ease of constraint.

        The easiest constraint of all is the one that exists by default. But that argues for many small dedicated systems rather than one large flexible one that attempts to be all things to all people. DITA does, of course, have a constraint mechanism, so you can build constrained systems with it. I am just not enamoured of a system in which I have to constrain out stuff I never wanted in the first place on the way to creating the set of constrained structures I really want.

        I do think DITA is right that starting from absolute zero, the way you do if you build something from scratch in XML, is not the right approach. (This is a big change for me.) You need defaults. A system with no defaults requires those higher levels of skill, understanding, and resources from day one. But there is no single universal default that works for all problems.

        Default + extension is clearly the right model, and we see it in every successful content tool. But the power of defaults is such that I can see no possibility of consolidation around a single model. We didn’t have it in the WYSIWYG era, and the structured writing era, with its emphasis on business-specific constraints, seems even more inimical to it.

      2. Michael Priestley

        Mark, you wrote:

        >It is about factoring out the need to repeatedly apply the same metadata in hundreds of places. For product metadata, for instance, rather than entering product metadata in hundreds of places in content that is reused between products, we should try to factor out the product differences so that we can manage them centrally.

        That was exactly my point. That’s standard practice in DITA. The only time I’d put product metadata on a topic, rather than the collection, is when it’s part of the topic’s identity (a good clue is when the topic title includes the product name, like “Installing product X”). For the 90% case, entering the product metadata once – at the root of the product collection – is sufficient, and works out of the box to apply that product metadata to every child/descendant topic (except where overridden).

        >But it is not DITA’s default — not a feature of the standard topic types out of the box.

        Yes it is, and well documented in the spec.
        http://docs.oasis-open.org/dita/dita/v1.3/os/part1-base/archSpec/base/cascading-in-a-ditamap.html#cascading-in-a-ditamap
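        The cascading behavior the spec describes looks like this in a map (the topic file names and attribute values here are invented for illustration):

```xml
<map>
  <title>Product X administrator guide</title>
  <!-- product and audience set once here cascade to every
       descendant topicref -->
  <topicgroup product="productX" audience="administrator">
    <topicref href="install.dita"/>
    <topicref href="configure.dita"/>
    <!-- an explicit value on a child overrides the cascaded one -->
    <topicref href="quickstart.dita" audience="user"/>
  </topicgroup>
</map>
```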

      3. Mark Baker

        Michael,

        I think you are talking about a different use case. Yes, if an entire document tree is product specific then you can move the product filter to the point in the tree where it diverges. I’m talking about the case where the divergences happen at many different points in the tree or within the text of an individual topic. In those cases, if you look at it as an exercise in filtering the document tree, you have to put the filter metadata at every point of divergence across the document set. DITA implements the filter-the-document-tree model pretty well.

        But I am talking about factoring out the divergences be moving more of the content from the document domain to the subject domain. In that model you are not moving the filter point up or down the tree because there is no filter point any more. Rather document domain structures are constructed from subject domain structures. It is possible to do that in DITA too, but it is not the default.
      4. Michael Priestley

        Mark,

        I think your use of the phrase “document tree” gives me a clue to the way you’re thinking about this. In DITA, a collection of topics is not the same as a single document – it is a collection with internal organization and hierarchy, but any number of collections can exist over the same topics, applying different metadata as required.

        In standard DITA processing, you do in fact typically have a product collection (DITA map), which then does the job you asked for: centralizing management of product metadata, so that authors don’t have to reapply it in every instance it is required. But that’s not the same as having a document tree, in which every difference across reuse contexts is managed with additional (competing) metadata. Different collections represent different contexts, and are not aware of each other.

        The basic question is: how can we make it easier to manage metadata in DITA? I think part of the answer is recognizing the mechanisms it already has in place to do just that: leveraging map hierarchies as a way to express metadata in the place and at the scope that it applies.

        I honestly think the number one thing we can do to improve metadata usage in DITA is to make it more obviously useful. It doesn’t matter how easy you make it to express metadata, if the use case isn’t compelling; although making it easy – for example with linguistically assisted keyword classification, and project-scoped or task-scoped metadata defaults in your authoring tool – is still important. One possibility is to make sure that we are auto-populating the right fields in the HTML output to take full advantage of what consuming applications can recognize, eg leveraging Google support for schema.org metadata around products.

  11. Don Day

    In catching up with the comments here, one thing I noticed is an almost implicit falling back to stored XML in some form for this role of metadata. “Stored” matters because most HTML is served from WebCMS systems that simply plug stored chunks into the page matrix, so there is little or no metadata in the chunk itself–the CMS provides database fields for storing both the chunk and its metadata, and the end user’s viewing context definitely dictates which, if any, metadata even matters to what is displayed (an RSS feed being an example of content delivered with very little style or metadata). The advantage of this arrangement is that the content is dynamically selectable by search (facets are simply LIKE filters in the SQL query), but usually not very adaptable since the query knows nothing about what is actually inside the chunk itself (other than full text search).

    “Stored” XML in a CCMS is also searchable, but usually only to the writer. Readers see that content only after a build process that interposes a firewall between the reader and the source nature of the content as seen by the writer. To the extent that you build your XML as a static deliverable with appropriate metadata in the static page, it may behave like the same content retrieved as a chunk via an SQL search on a database. Changing some context may mean producing a new build.

    So if you depend on processing your DITA content into HTML, then a particular static collection depends on the map-based metadata inheritance that Michael mentioned. If you change the publication model so that DITA (or any other XML) content is transformed on the fly into HTML, then your metadata needs to be somewhere else so that it is as facile for querying as the SQL/HTML database. Sure, you can depend on XML-aware databases like BaseX or eXist for querying internal metadata. But the obvious standard fallback is to use an SQL database just like the WebCMS does. This means that any query that retrieves a collection of topics is, for all practical purposes, a formula that represents a map of those same topics. The secret sauce of this kind of system is representing that database schema in such a way that the inheritance mechanism, as such, is moved out of the “build” and into the “query.”

    This is all Business As Usual for Web developers. It is because of the near universal paradigm of WebCMS as a publishing engine that I think DITA adoption could be pushed into more such places if our DITA publishing engines simply behaved the same way. This is the change from the build paradigm that I think may better popularize DITA uptake into new areas such as marketing, live eBooks, adaptive tutorials, and more.

    Reply
