Posts Tagged "api-doc"

The complexities of translation and the need for dynamic variables in the build process

Listen to this post:

You can download the MP3 file or subscribe in iTunes.

I mentioned in previous posts that I was tackling translation with static site generators, and that I would circle back around on this topic to provide more detail (see Will the docs-as-code approach scale?).

Translation is a complex undertaking. In Modern Technical Writing, Andrew Etter says translation projects are time-consuming and costly. To quote:

Internationalization, the process of translating documentation to other languages, is a nightmare. If you ever think you need to do it, interface with management and perform a careful cost-benefit analysis, because the process is expensive, time-consuming, error-prone, and tedious. Once you’ve arrived at what you believe is an accurate estimate for company costs, triple it. Now you have a realistic estimate.

Etter briefly describes his translation workflow using a static site generator, Sphinx. The workflow involves using gettext scripts to convert the content into POT (Portable Object Template) files, which he sends to a translation company. The translation company converts them to PO (Portable Object) files (these file formats basically facilitate separating the text into strings that translators can manage) and, after finishing the translation, sends the files back. He commits them to a repo, converts the PO files to MO (Machine Object) files, and builds them again in his Sphinx project.

There are quite a few different tools, formats, workflows, and approaches for doing translation. For example, here’s how one group handles translation with Middleman, another static site generator. Their process is quite different. They set environment variables in their configuration files that provide information to the server about which language to build. Their process involves Codeship, Heroku, submodules in Git repositories, webhooks, custom Rack apps, and more.

My scenario is a lot simpler. For some projects, we send out files to translation agencies. One translation agency requires the content to either be HTML or Microsoft Word, while another translation agency accepts Markdown, HTML, or Word. I’m not sure what the agency does with the files (I assume they convert them to another format to ingest them in their computer-assisted translation systems), but we get the files back in the same format that we sent.

Since my content is in kramdown Markdown, translating the Markdown source format would be ideal, but translating HTML isn’t a deal-breaker either. However, here I should note that just saying Markdown is the source format hardly scratches the surface. If Markdown is your only source format (and you just have a collection of static files as your source), it would be very difficult to handle translation. You need a more robust tool to handle dynamic content, which is where a static site generator like Jekyll becomes essential.

Notice I used the word “dynamic” in the last sentence. The term “static site generator” is somewhat of a misnomer. In your source content, you aren’t working just with static content, because if you were, translation would be extremely difficult. In Discover Meteor, the authors explain that static sites are really more dynamic than we typically give them credit for. They note,

A classic Ruby or PHP app can be said to be dynamic because it can react to various parameters on each request (for example, variables passed through the URL).

True, static HTML files can’t do that. But your static site generator can still take into account parameters during the build process. In other words, static sites are only static after they have been generated. (See Three More Ways To Make Your Static Sites Smarter)

The ability to use variables and parameters in your source is essential when setting up translation to multiple languages. It’s the ability to use these parameters, variables, and other dynamic techniques during the build process – before the files become static – that allows you to account for more sophisticated scenarios like translation even though you’re using a static site generator.

My overall strategy is to translate the [dynamic] source as much as possible, so that I can easily roll in future updates and then regenerate the site as needed. I don’t want to just work with the static HTML output. If I have an update that adds a new link in the navigation, or if I add a new file and want to reference it on multiple pages, I want to be able to re-build my site with the latest updates.

To accomplish this, in my project I have multiple directories (for example, _pages_english, _pages_german, etc.) for the documentation pages – one directory for each language. I also have separate configuration files for each language (for example, configenglish.yml, configgerman.yml, etc.). Each configuration file specifies which page directories should be included in the build.

When I build a Jekyll site, Jekyll processes the build based on information in the configuration file. By making changes to the configuration file (such as specifying the project language or directories to be included), I can control the output.
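
For concreteness, here’s a minimal sketch of what one of these per-language configuration files might contain. The names are illustrative rather than the exact structure of my project; the point is simply that each config file tells Jekyll which language’s content to build.

    # configgerman.yml -- a minimal, illustrative sketch
    language: german

    # Build only the German pages; the _pages_german directory maps to a collection
    collections:
      pages_german:
        output: true

Each language then gets built with something like jekyll build --config configgerman.yml, so the same source can produce a separate output per language.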

For my sidebar navigation, I want to generate the navigation from the same YAML file rather than splitting my navigation into multiple files for each language, which I would then need to keep in sync. To accomplish this, in my sidebar data file, I have different instances of the title, but each entry uses the same URL and other properties. For example, a sidebar data entry might look like this:

    - title: Sample
      titleja: ガイドスタート
      titlede: Frschun Rontzahnfüllungen
      titlesp: Ejemplo
      url: /sample.html
      ref: sample

When Jekyll builds my site, there’s logic in my sidebar scripts (driven by the language specified in the configuration file) that checks whether the navigation entry contains title, titleja, titlede, or titlesp; if so, the entry gets included in the navigation with the appropriate language title.
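
Here’s a rough sketch of what that logic might look like in the sidebar include. This is a simplified reconstruction rather than my actual sidebar code, and it assumes the entries live in a data file exposed as site.data.sidebar.entries:

    {% comment %} Pick the title field that matches the language set in the config file {% endcomment %}
    {% for entry in site.data.sidebar.entries %}
      {% if site.language == "japanese" %}
        {% assign entry_title = entry.titleja %}
      {% elsif site.language == "german" %}
        {% assign entry_title = entry.titlede %}
      {% elsif site.language == "spanish" %}
        {% assign entry_title = entry.titlesp %}
      {% else %}
        {% assign entry_title = entry.title %}
      {% endif %}
      {% if entry_title %}
        <li><a href="{{ entry.url }}">{{ entry_title }}</a></li>
      {% endif %}
    {% endfor %}

An entry that lacks a title for the current language simply doesn’t get rendered in that build.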

See what I mean by static site generators not really being static? Their output is static, but they can build in dynamic ways that leverage variables, parameters, and other programming logic.

Links on the site are relative, so I can just publish the built site into a subdirectory (such as /ja/) off the base URL, and the links should all work. Making links relative is actually a huge advantage when it comes to configuring your static site generator output. I don’t think you could easily push out translated sites with absolute links, since the links wouldn’t point to the language subdirectories.
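
To make the distinction concrete, here’s the difference with a simple link (my own illustration):

    <!-- Relative link: resolves under whatever directory the site is published to,
         so it still works when the translated build lives at /ja/ -->
    <a href="sample.html">Sample</a>

    <!-- Root-relative (absolute) link: always points at the top-level page,
         so it breaks for a build published under /ja/ -->
    <a href="/sample.html">Sample</a>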

For images, instead of using standard Markdown image tags, I have an image template (an include) that automatically appends a suffix to the image file name (such as “-german” or “-spanish”) based on the language specified in the configuration file. If a lang property is specified in my include template, the template appends the language suffix onto the file name. If no lang property is specified, no suffix gets appended.
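
A bare-bones version of that kind of include might look something like this (a simplified sketch, not my actual template; the alt parameter is just illustrative):

    {% comment %} image.html -- appends a language suffix when lang is passed in {% endcomment %}
    {% assign base = include.file | split: "." | first %}
    {% assign ext = include.file | split: "." | last %}
    {% if include.lang %}
      <img src="images/{{ base }}-{{ site.language }}.{{ ext }}" alt="{{ include.alt }}" />
    {% else %}
      <img src="images/{{ include.file }}" alt="{{ include.alt }}" />
    {% endif %}

So an include like {% include image.html file="workflow.png" lang="true" %} in the German build would pull in workflow-german.png, while an include without the lang parameter keeps the original file name.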

Further, if some screenshots aren’t translated, I can use conditional logic like this to filter them out of the translated outputs:

{% if site.language == "english" %}
{% include image.html file="myimagefile.png" %}
{% endif %}

I access values from my configuration file through the site namespace. So my configuration file just has a property that says this:

language: english

BTW, I realize that unless you’re already familiar with Jekyll, you will probably just glaze over these technical details. I explain them here only to reinforce the fact that you need more than just static Markdown in your source to handle a translation project.

There are some strings that are re-used in notes (tips, cautions, warnings, etc.) and in the interface (a “Submit Feedback” button, for example). I store these strings in the configuration file and then reference them with tags such as {{site.note}}. In the configuration file, that note might look like this:

alerts:
  note: Hinweis
  tip: Spitze
  warning: Warnung
  important: Wichtig

The note then uses a note include template (referenced just like the image template, except with {% include note.html content="..." %}). The template for the include contains content that looks something like this:

<div class="mynote">{{site.alerts.note}}: {{include.content}}</div>

The {{site.alerts.note}} references the value in the configuration file – in this case, since the configuration file is for the German build, the value is Hinweis. If you had other strings that you wanted to isolate like this, you could separate them out into either your configuration file or into other data files (stored in your _data folder) and reference them as variables in your code.
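
For example, a small data file for interface strings might look something like this (file name and keys are just illustrative):

    # _data/strings.yml -- illustrative only; one set of strings per language build
    feedback_button: Feedback senden
    search_placeholder: Suchen
    last_updated: Zuletzt aktualisiert

A layout or include could then reference {{ site.data.strings.feedback_button }}, and each language build could point at its own data directory (Jekyll’s data_dir setting) or use per-language keys, similar to the titleja/titlede pattern in the sidebar.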

(BTW, these sample translations are just filler text – I haven’t actually translated them yet.)

So far, so good. But now comes the tricky part. Let’s say I have an include to re-use the same paragraph of content, and the translator accepts only HTML. My Markdown source for the re-used component looks like this:

{% include_relative myfile.md %} 

In the HTML output, the contents of myfile.md will automatically be included wherever I’ve included this note (which I wouldn’t know until I searched for all instances of this tag in my content). Won’t this result in the translator translating the same passage multiple times, which would increase the cost?

No, it shouldn’t. The idea of translation memory (standard in most translation tools) is that the translator gets prompted when a previously translated segment appears again in the text. If the translator isn’t using a computer-assisted translation (CAT) tool that provides translation memory, you probably shouldn’t be using that translator. Without translation memory, though, repeated passages would indeed be a problem.

So the HTML files go out to the translators, they plug them into their CAT tools, translate the files, regenerate them into HTML or Markdown, and return them to me. Then I take the translated content and insert it into my source Markdown files (not my site output), using either Markdown or HTML (Markdown files can contain either). I can now generate my site from the source into multiple formats.

Theoretically this should work, though I say theoretically because I haven’t pushed my approach through the whole workflow yet.

Also, the scenario that I’ve outlined at a very high level here is just the best-case scenario. In real life, there are additional complications and quirks that I have to account for.

While static site generators are flexible and allow me to implement hacks to get around quirks or oddities in non-standard systems, at some point I have to take a stance against absolute flexibility and lay down some rules, such as not allowing authors to mix relative and absolute links in the sidebar navigation, or not allowing custom include templates that don’t implement translation variables.

I spent the last few days working through my translation scenario, and I hope I’ve accounted for all the details, but I won’t know for sure until I’ve gone through this process multiple times over the next few months. I’m taking Andrew’s initial advice to triple my estimates.

I will readily admit that the more languages and formats you output to (for example, suppose I were outputting to 10 languages in both HTML and PDF), the more cumbersome the scenario I’ve described here becomes. At some point, it might make sense to plug into tools like Paligo that have been designed from the ground up to support robust translation workflows.

Then again, I’m betting that each system has a learning curve along with some strengths and weaknesses, and given the variety of scenarios and requirements, push-button solutions might not have an advantage over more flexible, custom setups using processes like the one I described here.

If you’ve managed translation projects and want to share your insights, please do so in the comments below. One reason I write posts about topics or techniques that I’m still developing is to learn from others ahead of time and hopefully avoid mistakes that I might otherwise make without this input.


Presentation recording: Hunting for API developer documentation jobs in the San Francisco Bay area, by Andrew Davis

Presentation description

Andrew gave the presentation “Hunting for Dev Doc Work Around the Bay” on August 15, 2016 at the STC Silicon Valley chapter in Santa Clara, California.

Here’s a description of Andrew’s presentation:

The job market for developer documentation in Silicon Valley is the hottest in the country. But developer documentation jobs can be tricky to get, and also difficult to excel in, especially if you lack an engineering background. Despite the difficulties, technical writers in these positions can bring tremendous value to engineering teams. Immersed in developer documentation, you’re constantly learning new technologies and interacting closely with engineers.

In this presentation, Andrew Davis will talk about developer documentation jobs in the Bay area, specifically:


  • Why is developer doc work desirable?
  • Which skills and abilities open doors into developer doc jobs?
  • Which traits and assets keep the doors open?
  • Where is demand strongest?
  • What does the work pay?
  • How can I get there from here?

Video recording

Here’s the video recording (slides + audio):

Slides

Here are the slides. The slides have detailed speaker notes, so be sure to check those out too.

Audio

Just want the audio?

You can download the MP3 file or subscribe in iTunes.

About Andrew Davis

Andrew Davis has recruited technical content developers in the SF Bay Area since 1995. He is a former software industry Technical Writer and has a reputation for both understanding and championing the role of content development.

Andrew enjoys helping those who communicate complex information get ahead by recognizing and refining their value to technology companies. He’s candid and connected and, just as importantly, he likes to help tech industry workers achieve their goals and gain independence from intermediaries.

Andrew ran Synergistech Communications during the Internet Gold Rush years and has recently returned to solo recruiting mode, incorporating his business as Tech Comm Talent. He remains focused on recruiting great technical content development talent for discerning local technology companies. Join him on LinkedIn (http://www.linkedin.com/in/synergistech) and go to Synergistech to learn more.


Will the docs-as-code approach scale? Responding to comments on my Review of Modern Technical Writing

Listen to this post:

You can download the MP3 file or subscribe in iTunes.

Responses to the review post were enormous

The number of comments on my recent post, Review of Andrew Etter’s ebook on Modern Technical Writing, showed an overwhelming interest in this topic. The post went viral. In addition to 300+ clicks on Twitter and LinkedIn, there were about 1,200 page views and an equal number of clicks on the audio files. People spent an average of 5 minutes 11 seconds reading the post. There was also a lively discussion both on techwr-l and in the post’s comments.

I want to respond generally to at least one trend in the comments – the idea that the docs-as-code solution won’t scale.

By the way, it’s not as if the techniques Etter describes in Modern Technical Writing (or rather the “docs as code” approach) are new. Lots of people (myself included) have been emphasizing static site generators, version control, lightweight markup languages, and other approaches for documentation for a long time.

But Etter’s ebook seems to legitimize the approach, and his title “Modern Technical Writing” might have irked a few people who suddenly felt that their XML/CCMS-based approaches were being labeled antiquated.

In their responses, a few people argued that the docs-as-code approach would only work for small shops. For larger documentation scenarios, it wouldn’t scale.

For example, one reader said:

Having worked in ginormous and more boutique-sized content projects, the tips and methods described in Andrew’s book are geared more towards the boutique-sized end of the spectrum. Which is fine, as there are many writers who work on projects of that size.

Another person (who noted that he works for easyDITA) said:

I’m sure it could work for a very small team but when you get into tens of thousands of topics (not uncommon) it would be a nightmare. And the entire publishing process enabled by a CCMS offers the advantage of eliminating constant formatting while enabling real time updates. Having multiple Jekyll sites, for example, means keeping track of what is where and somehow making them easily accessible to end users. And what about search? How do you search across all these apps with one search? Etc., etc.

He continued with some more commentary about content re-use and formatting later:

let’s imagine a part number changes that is used across multiple products and appears in multiple docs published to various media. How do you change every instance of that part number other than manually looking for them? There are so many reasons why structure was developed. The average tech writer spends as much as 40% of their time formatting their documentation for multiple delivery options. Formatting is automated in XML-based systems.

Another person also pointed out the difficulty of scaling these solutions, saying:

when content grows, from my experience, it’s really hard to manage it all with a self-made solution. Especially when it comes to translation!

These commenters make some good points, so let’s take a look at them in more detail. The points might be summarized as follows:

  1. For a project consisting of tens of thousands of topics, the solution would be a nightmare.
  2. You can’t identify and change a component that appears in multiple places.
  3. Translation becomes problematic.
  4. When you have multiple delivery options, this approach becomes inefficient.

For a project consisting of tens of thousands of topics, the solution would be a nightmare

Let’s look at the first objection: For a project consisting of tens of thousands of topics, the solution would be a nightmare.

I’m not entirely sure why a large docs-as-code project would be a nightmare. Microsoft and Rackspace are two examples of large documentation sites that follow a docs-as-code model. SendGrid, Balsamiq, and MongoDB are also more than just “boutique-sized” doc sites.

But are massive projects really an issue? How common are projects that have tens of thousands of topics?

I’ve been a tech writer for more than a dozen years, and I’ve never worked on a single project that includes tens of thousands of topics. I may be an outlier, but almost all documentation at companies I’ve worked for involves lots of different writers working on semi-independent, smallish projects.

A company may have a large amount of documentation, but the documentation is usually broken down into smaller groups based on different products. The documentation for each product tends to be fairly self-contained and rarely exceeds 500 topics per project. Additionally, usually the topics in one project don’t need to be re-used in the other projects.

Almost all large doc sites break down into aggregations of smaller projects (see the MongoDB docs as an example). The documentation may all be published on the same website, but each product’s documentation is usually segmented off from the others.

When I think of possible projects that might have tens of thousands of topics, I think of documenting robust software like the Adobe Creative Suite or maybe a Boeing 747 airplane. Is there something about the docs-as-code approach (and using a version control system like Github) that couldn’t scale to hold 50,000 topics for a single project? Because docs-as-code projects are text files rather than binary files, the projects might actually scale more easily.

If you moved the source content into a GitHub repository, you might have 50 folders, each with 1,000 files. You could have one team working within a specific folder, with each team committing content into the project. Why wouldn’t that work?

We wouldn’t tell a large group of developers that they couldn’t scale their code using version control systems, right? Let’s say a company has 1,000 developers. They will find a way to manage their code using various repositories. Teams work in specific repos, and large teams working in the same repo develop in separate branches and then merge those branches into the master.

Developers do usually have package management systems that control and automate builds, though. Maybe that’s the equivalent of the CCMS for content?

I’ll concede that I don’t have a lot of experience working on massive documentation projects, so maybe my views would change here given more exposure to the problems that crop up with these projects.

You can’t identify and change a component that appears in multiple places

Let’s look at another objection: You can’t identify and change a component that appears in multiple places.

This idea assumes that you can’t re-use components in a static site generator model. Actually, you can. I talked about this in my series comparing DITA versus Jekyll. See Creating re-usable chunks (conref) in Jekyll versus DITA.

In fact, re-using content is easier in a static site generator than it is with DITA, because DITA requires the content to be “valid” in every place it’s re-used. This means a note that appears inside a task element might not be valid if the note appears outside the task.

But it does get difficult if you have to re-use content across projects. For example, suppose you have 5 different technical writers working on 5 separate projects, with each project in a separate repository. How do you re-use a topic from Project A in Project B?

When I ran into this scenario at a former company, we changed our repository model from separate repos to a single shared repo. Every writer started committing to the same repo but in different folders. As long as writers didn’t edit each other’s content without coordinating first, we didn’t run into merge conflicts.

Many technical writers aren’t used to version control workflows, so these concepts may be unfamiliar. But working in the same project has a number of benefits:

  • You can see what other writers are doing by looking at their commit logs.
  • All projects can be controlled by the same stylesheets, templates, and other project files.
  • Writers can help each other out by solving problems that others run into.

Writers who aren’t ready to release content work in separate branches; when they’re ready to release, they merge their branch into the master.

Let’s say you want to maintain separate repos but still re-use content across the repos. Jekyll recently released a feature called gem-based themes. With gem-based themes, you deliver the theme files (the includes, layouts, styles, and other assets) through Ruby gems. This mechanism allows you to deliver updates across repos without requiring each writer to manually copy in updates to the theme.

While you could manage theme updates this way, you could also distribute re-usable content through the same mechanism: instead of just theme includes, you could deliver content includes to be shared across projects.
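
Mechanically, hooking a project up to a gem-based theme is only a couple of lines (the gem name here is hypothetical):

    # In the project's Gemfile
    gem "acme-docs-theme"

    # In _config.yml
    theme: acme-docs-theme

Any _includes files shipped inside the theme gem (whether layout fragments or re-usable content snippets) can then be referenced with a normal {% include %} tag from every project that uses the theme, and updating the gem updates them everywhere.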

Another way to re-use content across projects is by rendering the content into JSON and then pulling the content into your project where it’s needed. I explored this approach in Help APIs and UI Tooltips. This is also more or less the approach used by Contentful, an API-based content management system.
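
As a sketch of what that looks like in Jekyll, you can create a page whose output is JSON rather than HTML and loop over a data file to populate it (the file names and structure here are just illustrative):

    ---
    layout: null
    permalink: /api/reuse.json
    ---
    {
      "snippets": [
        {% for entry in site.data.reuse %}
        { "id": {{ entry.id | jsonify }}, "text": {{ entry.text | jsonify }} }{% unless forloop.last %},{% endunless %}
        {% endfor %}
      ]
    }

Other projects (or a UI that needs tooltips) can then request this JSON endpoint and pull in the re-used strings wherever they’re needed.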

One thing static site generators won’t give you, though, is an automatic UI that lists everywhere a piece of content is re-used. You would likely need to do a text search for the re-used component. However, searching for text in an editor like Atom, Sublime, or WebStorm is quite easy. It’s not as if you’re using Notepad to author content.

Translation becomes problematic

Let’s look at another objection: Translation becomes problematic.

Translation is something I’m working on right now. Although most translation agencies can readily consume Word, HTML, or XML, consuming Markdown is less standard and sometimes more problematic, since Markdown has different flavors of syntax.

However, in my tests with two translation agencies, both agencies had Markdown filters that could process the Markdown syntax without problems. I’ll write more on the topic of translation in another post. (I’m currently adjusting my Jekyll theme to accommodate translation.)

When you have multiple delivery options, this approach becomes inefficient

Let’s look at the final objection: When you have multiple delivery options, this approach becomes inefficient.

Without question, most static site generators are optimized to deliver web-based HTML output. They aren’t optimized for generating PDFs, so if PDF is a huge requirement, you might want to use a standard XML-based system.

That said, you can generate beautiful PDFs using a static site generator. I explained how to do it using Prince XML with Jekyll here: Generating PDFs. It’s a little trickier to set up, but you also have greater control to customize the output without having to dig into XSL-FO stylesheets.
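
The general shape of that setup, heavily simplified: use Liquid to generate a plain-text list of the built HTML pages, then feed that list to Prince. A sketch of the list-generating file (names are illustrative, and it assumes the same sidebar data file as before):

    ---
    layout: null
    permalink: /prince-list.txt
    ---
    {% comment %} One built HTML file per line, in sidebar order, for Prince to consume {% endcomment %}
    {% for entry in site.data.sidebar.entries %}_site{{ entry.url }}
    {% endfor %}

After jekyll build, running something like prince --input-list=_site/prince-list.txt -o docs.pdf stitches those pages into a single PDF, with the print styling controlled by ordinary CSS rather than XSL-FO.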

Generally I’m content with a single HTML output, because you should avoid PDF for all the reasons I listed in this post: Why do we need PDFs?

Other reasons: Developer documentation

There’s another reason why you might want to embrace the docs-as-code model: if you’re working in developer documentation.

For example, take a look at this documentation site: PlayFab. Much of the documentation is built from JSON files that the API generates (you can see the JSON here). Their SDK generator, built on NodeJS, also makes use of the same JSON. How would you integrate this kind of workflow into an XML/CCMS based system? If you can generate some doc material from code annotations, that’s usually the approach engineers prefer.

When I was developing my API documentation course, I surveyed about 100 REST APIs. Almost all of them follow similar characteristics:

  • The docs are somewhat small (at least not tens of thousands of pages).
  • The docs are published on custom-branded websites.
  • The docs almost never provide PDF output.
  • The docs are rarely translated.
  • The docs often have publishing workflows that pull from code repositories.
  • The docs often include interactive API explorers that allow users to try out requests.
  • The docs follow specific templates for REST APIs (including sections such as endpoint definitions, parameters, sample requests, sample responses, and so on.)
  • The docs often have Hello World tutorials (instead of strictly task-based patterns).
  • The docs’ interactivity is often driven through Swagger, RAML, or API Blueprint specs.

Developer documentation often doesn’t present the same problems that CCMSs solve – such as multi-channel output, PDF generation, content re-use, strict enforcement of information typing patterns, topic metadata, and so on.

Given all of these differences, is it hard to imagine that a docs-as-code approach might be a better fit for developer documentation?

When you work in developer documentation, you don’t want to be behind a WYSIWYG interface. You want to work in the raw code, and a text editor like WebStorm feels like a natural fit if you’re already testing your code in Android Studio (since both use the same JetBrains UI framework).

Using for loops, if-else logic, and the filters available in Liquid feels right at home to me. The idea that I can iterate through a collection of items and do something with the content is pretty awesome. It means I can push content out into JSON or other custom formats, create my own scripts, or come up with ways to handle what I need (for example, creating a custom include template that I can populate with parameters).
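
A trivial example of the kind of thing I mean, just to show the flavor (the collection name here is made up):

    {% comment %} List every document in a hypothetical "tutorials" collection that matches the build language {% endcomment %}
    {% assign docs = site.tutorials | where: "lang", site.language %}
    <ul>
    {% for doc in docs %}
      <li><a href="{{ doc.url }}">{{ doc.title }}</a></li>
    {% endfor %}
    </ul>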

When I’m working in the code, I’m not at the mercy of a black box. I can build my own doc tooling to suit my needs and workflow, and I can incorporate programming workflows and techniques more directly in my content.

Conclusion

Overall, maybe some of the disagreement about whether the docs-as-code approach scales is due to differences in product categories more than product sizes. Working on something like machinery documentation that is translated into 10 languages, has numerous PDFs for different user roles, and has lots of content re-use is much different from working on a software API documentation project for a targeted developer audience.

Even so, I still think the techniques used in the docs-as-code model might work for a wider variety of documentation projects than some of the critics will admit. This is an emerging model that will prove disruptive, and with that disruption comes a lot of friction.

