Posts Tagged "Technical Writing"

The complexities of translation and the need for dynamic variables in the build process

You can download the MP3 file or subscribe in iTunes.

I mentioned in previous posts that I was tackling translation with static site generators, and that I would circle back around on this topic to provide more detail (see Will the docs-as-code approach scale?).

Translation is a complex undertaking. In Andrew Etter’s Modern Technical Writing, he says translation projects are time-consuming and costly. To quote:

Internationalization, the process of translating documentation to other languages, is a nightmare. If you ever think you need to do it, interface with management and perform a careful cost-benefit analysis, because the process is expensive, time-consuming, error-prone, and tedious. Once you’ve arrive at what you believe is an accurate estimate for company costs, triple it. Now you have a realistic estimate.

Etter briefly describes his translation workflow using a static site generator, Sphinx. The workflow involves using gettext scripts to convert the content into POT (Portable Object Template) files, which he sends to a translation company. The translation company converts them to PO (Portable Object) files (these file formats basically facilitate separating the text into strings that translators can manage) and, after finishing the translation, sends the files back. He commits them to a repo, converts the PO files to MO (Machine Object) files, and builds them again in his Sphinx project.
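To make the PO format a little more concrete, here is a minimal Python sketch that reads a simplified PO file into a lookup table of source strings (msgid) and their translations (msgstr). It only illustrates the file format and is not part of Etter's Sphinx toolchain. It assumes single-line entries and ignores contexts, plural forms, and escaping, all of which real gettext tooling handles. The sample strings are made up.

```python
def parse_simple_po(text: str) -> dict[str, str]:
    """Return a {msgid: msgstr} mapping from a simplified PO file.

    Assumes each msgid/msgstr value sits on a single line. Comments (#),
    continuation lines, contexts, and plural forms are ignored.
    """
    catalog: dict[str, str] = {}
    msgid = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            msgid = line[len("msgid "):].strip('"')
        elif line.startswith("msgstr ") and msgid is not None:
            if msgid:  # skip the header entry, whose msgid is ""
                catalog[msgid] = line[len("msgstr "):].strip('"')
            msgid = None
    return catalog


# Hypothetical German catalog in simplified PO syntax.
SAMPLE_PO = r'''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: getting-started.rst:3
msgid "Getting started"
msgstr "Erste Schritte"

msgid "Submit feedback"
msgstr "Feedback senden"
'''

catalog = parse_simple_po(SAMPLE_PO)
print(catalog["Getting started"])  # Erste Schritte
```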

There are quite a few different tools, formats, workflows, and approaches for doing translation. For example, here’s how one group handles translation with Middleman, another static site generator. Their process is quite different. They set environment variables in their configuration files that provide information to the server about which language to build. Their process involves Codeship, Heroku, submodules in Git repositories, webhooks, custom Rack apps, and more.

My scenario is a lot simpler. For some projects, we send out files to translation agencies. One translation agency requires the content to either be HTML or Microsoft Word, while another translation agency accepts Markdown, HTML, or Word. I’m not sure what the agency does with the files (I assume they convert them to another format to ingest them in their computer-assisted translation systems), but we get the files back in the same format that we sent.

Since my content is in kramdown Markdown, translating the Markdown source format would be ideal, but translating HTML isn’t a deal-breaker either. However, here I should note that just saying Markdown is the source format hardly scratches the surface. If Markdown is your only source format (and you just have a collection of static files as your source), it would be very difficult to handle translation. You need a more robust tool to handle dynamic content, which is where a static site generator like Jekyll becomes essential.

Notice I used the word “dynamic” in the last sentence. The word “static” in “static site generator” is somewhat of a misnomer. In your source content, you aren’t working just with static content, because if you were, translation would be extremely difficult. In Discover Meteor, the authors explain that static is really more dynamic than we typically give it credit for. They note,

A classic Ruby or PHP app can be said to be dynamic because it can react to various parameters on each request (for example, variables passed through the URL).

True, static HTML files can’t do that. But your static site generator can still take into account parameters during the build process. In other words, static sites are only static after they have been generated. (See Three More Ways To Make Your Static Sites Smarter)

The ability to use variables and parameters in your source is essential when setting up translation to multiple languages. It’s the ability to use these parameters, variables, and other dynamic techniques during the build process – before the files become static – that allows you to account for more sophisticated scenarios like translation even though you’re using a static site generator.

My overall strategy is to translate the [dynamic] source as much as possible, so that I can easily roll in future updates and then regenerate the site as needed. I don’t want to just work with the static HTML output. If I have an update that adds a new link in the navigation, or if I add a new file and want to reference it on multiple pages, I want to be able to re-build my site with the latest updates.

To accomplish this, in my project I have multiple directories (for example, _pages_english, _pages_german, etc.) for the documentation pages – one directory for each language. I also have separate configuration files for each language (for example, configenglish.yml, configgerman.yml, etc.). Each configuration file specifies which page directories should be included in the build.

When I build a Jekyll site, Jekyll processes the build based on information in the configuration file. By making changes to the configuration file (such as specifying the project language or directories to be included), I can control the output.
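With one configuration file per language, a small script can run one build per language. The following Python sketch is hypothetical: it assumes the configenglish.yml-style file names mentioned above and an output layout of one folder per language code. The --config and --destination options are standard jekyll build flags; by default the script only prints the commands rather than running them.

```python
import subprocess

# Hypothetical mapping of config-file language names to output folders.
LANGUAGES = {"english": "en", "german": "de", "japanese": "ja", "spanish": "es"}


def build_command(language: str, code: str) -> list[str]:
    """Return the `jekyll build` command for one language (not executed here)."""
    return [
        "jekyll", "build",
        "--config", f"config{language}.yml",  # per-language configuration file
        "--destination", f"_site/{code}",     # per-language output folder
    ]


def build_all(dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute one build per language."""
    commands = [build_command(lang, code) for lang, code in LANGUAGES.items()]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # requires Jekyll on the PATH
    return commands


if __name__ == "__main__":
    build_all(dry_run=True)
```

Jekyll's --config option also accepts a comma-separated list of files, with later files overriding earlier ones, so settings shared by every language could live in a base file that each language file then overrides.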

For my sidebar navigation, I want to generate the navigation from the same YAML file rather than splitting my navigation into multiple files for each language, which I would then need to keep in sync. To accomplish this, in my sidebar data file, I have different instances of the title, but each entry uses the same URL and other properties. For example, a sidebar data entry might look like this:

    - title: Sample
      titleja: ガイドスタート
      titlede: Frschun Rontzahnfüllungen
      titlesp: Ejemplo
      url: /sample.html
      ref: sample

When Jekyll builds my site, logic in my sidebar scripts (driven by the language specified in the configuration file) checks whether each navigation entry contains the title field for that language (title, titleja, titlede, or titlesp). If it does, the entry gets included in the navigation using that language’s title.
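The selection rule can be modeled in a few lines of Python. This is not the Liquid code from the sidebar scripts; it assumes the language-to-key mapping below, plus a hypothetical second entry with no translated titles, which therefore drops out of the non-English builds.

```python
# Assumed mapping from the config file's `language` value to a title field.
TITLE_KEYS = {"english": "title", "japanese": "titleja", "german": "titlede", "spanish": "titlesp"}

sidebar = [
    {"title": "Sample", "titleja": "ガイドスタート", "titlede": "Frschun Rontzahnfüllungen",
     "titlesp": "Ejemplo", "url": "/sample.html", "ref": "sample"},
    # Hypothetical entry that hasn't been translated yet.
    {"title": "Release notes", "url": "/release_notes.html", "ref": "release_notes"},
]


def nav_items(entries: list[dict], language: str) -> list[tuple[str, str]]:
    """Keep only entries that have a title in the build language."""
    key = TITLE_KEYS[language]
    return [(entry[key], entry["url"]) for entry in entries if key in entry]


print(nav_items(sidebar, "german"))  # [('Frschun Rontzahnfüllungen', '/sample.html')]
```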

See what I mean by static site generators not really being static? Their output is static, but they can build in dynamic ways that leverage variables, parameters, and other programming logic.

Links on the site are relative, so I can just publish the built site into a sub-directory (such as /ja/) from the base URL, and the links should all work. Making links relative is actually a huge advantage when it comes to configuring your static site generator output. I don’t think you could easily push out translated sites with absolute links, since the links wouldn’t point to the language subdirectories.
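Python's standard urllib.parse.urljoin resolves links the same way a browser does, so it shows why relative links survive being moved into a language subdirectory while root-relative links do not. The base URL here is a placeholder.

```python
from urllib.parse import urljoin

# Placeholder URL of a page in the Japanese build, published under /ja/.
page = "https://docs.example.com/ja/sample.html"

# A relative link resolves inside the language subdirectory...
relative = urljoin(page, "installation.html")
# ...but a root-relative link escapes it and points at the default-language page.
root_absolute = urljoin(page, "/installation.html")

print(relative)       # https://docs.example.com/ja/installation.html
print(root_absolute)  # https://docs.example.com/installation.html
```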

For images, instead of using standard Markdown image tags, I have an image template (an include) that automatically appends a suffix on the image file name (such as “-german” or “-spanish”) based on the language specified in the configuration file. If a lang property is specified in my include template, then the template appends the language suffix onto the file name. If no lang property is specified, no suffix gets appended.
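Here is the include's suffix logic sketched in Python. The suffix table is an assumption (the post names only “-german” and “-spanish”), as is the choice to give English images no suffix.

```python
import os

# Assumed suffixes, keyed by the `language` value in each config file.
SUFFIXES = {"english": "", "german": "-german", "spanish": "-spanish", "japanese": "-japanese"}


def image_src(file: str, site_language: str, lang: bool = False) -> str:
    """Add the language suffix before the extension only when `lang` is set."""
    if not lang:
        return file
    stem, ext = os.path.splitext(file)
    return f"{stem}{SUFFIXES[site_language]}{ext}"


print(image_src("myimagefile.png", "german", lang=True))  # myimagefile-german.png
print(image_src("myimagefile.png", "german"))             # myimagefile.png
```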

Further, if some screenshots aren’t translated, I can use conditional logic like this to filter them out of the translated outputs:

    {% if site.language == "english" %}
    {% include image.html file="myimagefile.png" %}
    {% endif %}

I access values from my configuration file through the site namespace. So my configuration file just has a property that says this:

    language: english

BTW, I realize that unless you’re already familiar with Jekyll, you will probably just glaze over these technical details. I explain them here only to reinforce the fact that you need more than just static Markdown in your source to handle a translation project.

There are some strings that are re-used in notes (tips, cautions, warnings, etc.) and in the interface (a “Submit Feedback” button, for example). I store these strings in the configuration file and then reference them with tags such as {{site.alerts.note}}. In the configuration file, that note might look like this:

    alerts:
      note: Hinweis
      tip: Spitze
      warning: Warnung
      important: Wichtig

The note then uses a note include template (referenced just like the image template, except with {% include note.html content="..."%}). The template for the include contains content that looks something like this:

    <div class="mynote">{{site.alerts.note}}: {{include.content}}</div>

The {{site.alerts.note}} references the value in the configuration file – in this case, since the configuration file is for the German build, the value is Hinweis. If you had other strings that you wanted to isolate like this, you could separate them out into either your configuration file or into other data files (stored in your _data folder) and reference them as variables in your code.

(BTW, these sample translations are just filler text – I haven’t actually translated them yet.)
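The same lookup can be sketched in Python, with dictionaries standing in for the alerts block of each language's configuration file. The English strings are assumed, and the German ones are the placeholder values above.

```python
# Stand-ins for the `alerts` block in two language configuration files.
CONFIGS = {
    "english": {"alerts": {"note": "Note", "tip": "Tip", "warning": "Warning", "important": "Important"}},
    "german": {"alerts": {"note": "Hinweis", "tip": "Spitze", "warning": "Warnung", "important": "Wichtig"}},
}


def render_note(site: dict, content: str) -> str:
    """Python equivalent of the note include: label comes from the config."""
    return f'<div class="mynote">{site["alerts"]["note"]}: {content}</div>'


print(render_note(CONFIGS["german"], "Speichern Sie Ihre Arbeit."))
# <div class="mynote">Hinweis: Speichern Sie Ihre Arbeit.</div>
```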

So far, so good. But now comes the tricky part. Let’s say I have an include to re-use the same paragraph of content, and the translator accepts only HTML. My Markdown source for the re-used component looks like this:

    {% include_relative myfile.md %}

In the HTML output, the contents of myfile.md will automatically be included wherever I’ve included this file (which I wouldn’t know until I searched for all instances of this tag in my content). Won’t this result in the translator translating the same passage multiple times, which would increase the cost?

No, it shouldn’t. The idea of translation memory (standard in most translation tools) is that the translator gets prompted when a previously translated segment appears again in the text. Without translation memory, repeated passages would be a problem, but if the translator isn’t using a computer-assisted translation (CAT) tool that provides translation memory, you probably shouldn’t be using that translator.
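A toy model of exact-match translation memory shows why a repeated include shouldn't multiply the cost: each distinct segment is translated once and reused afterward. Real CAT tools also do fuzzy matching and segment text far more carefully; the segments and the stand-in translate function here are hypothetical.

```python
def translate_with_memory(segments, translate, memory=None):
    """Translate segments, reusing stored translations for exact repeats.

    Returns the translated segments and how many needed new (billable) work.
    """
    memory = {} if memory is None else memory
    new_segments = 0
    output = []
    for seg in segments:
        if seg not in memory:
            memory[seg] = translate(seg)  # new work
            new_segments += 1
        output.append(memory[seg])        # repeats come from memory
    return output, new_segments


def fake_translate(segment: str) -> str:
    """Stand-in for a human translator."""
    return f"[de] {segment}"


reused = "Back up your data before upgrading."  # e.g., the contents of myfile.md
pages = [reused, "Open the settings page.", reused, "Click Save.", reused]

translated, billable = translate_with_memory(pages, fake_translate)
print(billable)  # 3
```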

So the HTML files go out to the translators, they plug them into their CAT tools, translate the files, and regenerate them back into HTML or Markdown and return them to me. Now I take the translated content and insert it into my source Markdown files (not my site output), using either Markdown or HTML (kramdown Markdown files can contain either). I can now generate my site from the source into multiple formats.

Theoretically this should work, though I say theoretically because I haven’t pushed my approach through the whole workflow yet.

Also, the scenario that I’ve outlined at a very high level here is just the best-case scenario. In real life, there are additional complications and quirks that I have to account for.

While static site generators are flexible and allow me to implement hacks to get around quirks or oddities in non-standard systems, at some point I have to take a stance against absolute flexibility and lay down some rules, such as not allowing authors to mix relative and absolute links in the sidebar navigation, or not allowing custom include templates that don’t implement translation variables.

I spent the last few days working through my translation scenario, and I hope I’ve accounted for all the details, but I won’t know for sure until I’ve gone through this process multiple times over the next few months. I’m taking Andrew’s initial advice to triple my estimates.

I will readily admit that the more languages and formats you output to (for example, suppose I was outputting to 10 languages in both HTML and PDF), the more cumbersome the approach I’ve described here becomes. At some point, it might make sense to plug into tools like Paligo that have been designed from the ground up to support robust translation workflows.

Then again, I’m betting that each system has a learning curve along with some strengths and weaknesses, and given the variety of scenarios and requirements, push-button solutions might not offer much advantage over more flexible, custom setups like the one I described here.

If you’ve managed translation projects and want to share your insights, please do so in the comments below. One reason I write posts about topics or techniques that I’m still developing is to learn from others ahead of time and hopefully avoid mistakes that I might otherwise make without this input.


Presentation recording: Hunting for API developer documentation jobs in the San Francisco Bay area, by Andrew Davis

Presentation description

Andrew gave the presentation “Hunting for Dev Doc Work Around the Bay” on August 15, 2016 at the STC Silicon Valley chapter in Santa Clara, California.

Here’s a description of Andrew’s presentation:

The job market for developer documentation in Silicon Valley is the hottest in the country. But developer documentation jobs can be tricky to get, and also difficult to excel in, especially if you lack an engineering background. Despite the difficulties, technical writers in these positions can bring tremendous value to engineering teams. Immersed in developer documentation, you’re constantly learning new technologies and interacting closely with engineers.

In this presentation, Andrew Davis will talk about developer documentation jobs in the Bay Area, specifically:

  • Why is developer doc work desirable?
  • Which skills and abilities open doors into developer doc jobs?
  • Which traits and assets keep the doors open?
  • Where is demand strongest?
  • What does the work pay?
  • How can I get there from here?

Video recording

Here’s the video recording (slides + audio):

Slides

Here are the slides. The slides have detailed speaker notes, so be sure to check those out too.

Audio

Just want the audio?

You can download the MP3 file or subscribe in iTunes.

About Andrew Davis

Andrew Davis has recruited technical content developers in the SF Bay Area since 1995. He is a former software industry Technical Writer and has a reputation for both understanding and championing the role of content development.

Andrew enjoys helping those who communicate complex information get ahead by recognizing and refining their value to technology companies. He’s candid and connected and, just as importantly, he likes to help tech industry workers achieve their goals and gain independence from intermediaries.

Andrew ran Synergistech Communications during the Internet Gold Rush years and has recently returned to solo recruiting mode, incorporating his business as Tech Comm Talent. He remains focused on recruiting great technical content development talent for discerning local technology companies. Join him on LinkedIn (http://www.linkedin.com/in/synergistech) and go to Synergistech to learn more.


The Story of Paligo – a new browser-based CCMS with all the features you’d ever want

You can download the MP3 file or subscribe in iTunes.

Beginnings

Up until two years ago, Anders Svensson and his colleagues, based in Sweden, provided DITA and XML consulting full-time to European companies looking to migrate and manage their content in an XML structure.

Although many companies could understand the DITA spec, migrating content to DITA in bulk, managing it in a user-friendly content management system, and building out the PDF and HTML deliverables were more complex and daunting tasks than companies could handle themselves. This was the focus of Svensson’s company, Expertinfo.

After years of steering companies toward custom setups or existing CCMSs, which often cost a small fortune to use and included a host of problems (long deployment projects, steep learning curve, poor user acceptance, etc.), Svensson felt it was time to build their own system.

Having been on the lookout for good systems for a long time, he finally came into contact with Frank Arensmeier, who Svensson describes as nothing short of a programming genius. Frank had been working on exactly the type of system Svensson was looking for.

They teamed up and, starting with the existing foundation code base that Arensmeier had built, they put together a group of engineers and set out to build an affordable, easy-to-use CCMS that would solve the many problems Svensson and his colleagues had encountered through their years of DITA consulting. This is how Paligo started.

The release

Paligo is an XML-based component content management system (CCMS) that users access entirely in the cloud. Paligo’s team built on top of a custom topic-based version of DocBook XML to create a number of user-friendly features.


Paligo’s user interface

With Paligo, you can do the following:

  • Drag and drop topics into publications (similar to DITA maps)
  • Configure variants for different products and/or output formats
  • Easily find strings and text snippets that can be re-used
  • Render attractive, responsive HTML5 websites as well as print quality PDFs and several other formats
  • Collaborate with other users and reviewers simultaneously on the same project
  • Manage translations to any language
  • Manage versions and branches
  • Tag topics with taxonomy categories to surface related content, and more

You can see a more detailed list of features here.

Since its launch in 2014, Paligo has been steadily growing and has just released Version 2, which significantly revamps the code and makes authoring and content management in the interface smoother and easier.

Who uses Paligo

Paligo primarily attracts documentation teams who want to take the next step beyond help authoring tools (HATs) toward a content management system. The most common customers are those who use Flare, RoboHelp, Author-it, or DITA and need something more robust to handle their content. These customers usually have heavy single-sourcing requirements for their content, often including translation as well.

While these customers want a CCMS, they either don’t have the budget and resources to implement high-end CCMS platforms like IXIASOFT, Trisoft, or others (which can cost $100-200k per year), or they want to avoid the barrier to entry and complexity of an installed system. But they still want a fully featured system to handle every documentation need.

On a platform and price comparison, Paligo’s most comparable competitor would probably be easyDITA, which is a cloud-based CCMS that has also been growing. Both systems are XML-based, with easyDITA being based on DITA and Paligo on DocBook. But the products differ in other ways as well. easyDITA follows the DITA content model and the DITA Open Toolkit more closely, whereas in Paligo the DocBook content model only provides the document structure, and the features are developed around it in the database programming paradigm.

Because Paligo is built on a database model in which each individual text string is stored separately, Svensson says you can do things like prompt users when they might consider re-using the same string and store it as a re-usable chunk, find out where each text string is re-used, and so on.
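As a generic illustration of string-level reuse tracking, the following Python sketch indexes each text string by the topics that reference it. It is not Paligo's schema or API; the topic IDs and strings are invented.

```python
from collections import defaultdict

# Hypothetical topics, each a list of text strings.
topics = {
    "install-windows": ["Download the installer.", "Restart your computer."],
    "install-mac": ["Download the installer.", "Drag the app to Applications."],
    "upgrade": ["Restart your computer."],
}


def build_reuse_index(topics: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each distinct string to the IDs of the topics that contain it."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for topic_id, strings in topics.items():
        for text in strings:
            index[text].append(topic_id)
    return dict(index)


index = build_reuse_index(topics)
reused = {text: where for text, where in index.items() if len(where) > 1}
for text, where in reused.items():
    print(f"{text!r} is reused in {where}")
```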

With Paligo, you write in a visual editor, which makes it easy to focus on the content. The system guides you to the elements you need, and authors quickly learn how to step in and out of elements as they create their content.

Svensson says most users don’t actually care whether the underlying XML is DITA or DocBook, as long as the CCMS provides the features they need. It’s how the CCMS implements the schemas, not so much the schemas themselves, that makes the difference.

And it turns out that building on a foundation of DocBook XML is considerably easier than building with DITA. DITA tends to impose more restrictions about what you can and can’t do, Svensson says. Even so, Paligo is only “based on DocBook.” Paligo extends from this foundation, adding what they need and not letting the content model restrict the system, while maintaining full capability to export to the open standard.

Instead of deliberating about schemas, users are much more focused on the CCMS’s features. In a thread on Techwr-l about “modern” authoring tools, Robert Lauriston, who recently switched from Confluence to Paligo, noted:

Paligo is my idea of what the next generation should look like: SaaS, browser client, DocBook source, GUI editor, built-in CMS for versioning, topic reuse, translation, and multiple language support, integrated review and approval workflow management, integrated tech support so they can work directly with my docs to fix things, CMS plugin for Oxygen so you can do things not directly supported by Paligo.

Lauriston’s comment underscores how the list of doc needs extends beyond HATs and encompasses the full feature set needed for robust content management.

Growth

With the steady increase in users, Paligo had to ramp up the number of engineers and support staff. Their product roadmap is growing as well. Right now Paligo is just a component content management system, meaning you author and store your content in it, and push out the HTML5 or PDF publications you want.

The long list of planned future enhancements includes the following:

  • Integrations for help desk systems, third-party translation systems, and more
  • Hosting the HTML5 and PDF deliverables on servers for users
  • Enhanced contributor role, beyond the standard author, reviewer, and translator roles
  • Batch and scheduled publishing
  • Gallery of more ready-made output templates

Pricing

Paligo is priced on a SaaS model, starting as low as $79/month for a Solo user up to $259/month per user in the Business plan. (See more pricing details here.)

Paligo even has a pricing model that “grandfathers” in customers at their starting price. If you become a user at a specific price, you get to keep that monthly subscription price for as long as you maintain your subscription. This way you can avoid fears that Paligo might suddenly escalate their pricing.

Summary

Paligo provides a compelling solution to content management. With their releases, they’ve shown the tech comm world a number of trends:

  • Users who are initially happy with HATs eventually grow out of them and need more robust systems to manage their content.
  • Users working in teams on the web need browser-based CCMSs that allow for easy collaboration and interaction.
  • You can build a robust CCMS that includes all the features users need without charging users the equivalent of a small house every year.

To explore Paligo with a free trial, click Get Started in the upper-right corner of the Paligo homepage.

Note: This post was sponsored by Paligo, which is one of the advertisers on this site.

Interface Tour

The following are some screenshots of Paligo that give you a better sense of the user interface and functionality.

Structure view

Overview of resources, such as image assets

Searching for content based on keywords and elements

Selecting variables

Drag and drop widget structure

Viewing statistics about re-use

Tags for taxonomy

Translation management and status

Workflow for documents
