How YAML actually works

YAML stands for “YAML Ain’t Markup Language”, and YAML.org describes it as a human-friendly data serialization language. In other words, it’s a standardized way to write down structured data, such as a list. To ensure that what you write can be reliably parsed, though, there are strict rules about how YAML is formatted, and those rules sometimes confuse new YAML users.

The good news is that YAML is essentially governed by just three principles.

Indent your YAML

The first rule of YAML formatting is that each node in your list must be indented further than its parent node, and all sibling nodes use the same indentation.

For example, suppose you have the item operating systems in a YAML list. This item contains Linux and BSD. The child elements (Linux and BSD) must each be indented further to the right than the operating systems element, and by the same amount as one another.

---
- operating systems:
  - Linux
  - BSD

It’s a common convention to indent in multiples of two, probably because one space is harder to notice at a glance than two. Multiples of two aren’t technically required (YAML only insists that you indent with spaces rather than tabs, and that siblings line up), but some YAML linters (applications that check YAML for errors and style problems) can be set up to warn about other indentation widths.
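
The convention is easy to check mechanically. The following is a minimal Python sketch, not a real linter: the function name odd_indents is made up for this illustration, and it only counts leading spaces, ignoring everything else about YAML syntax.

```python
def odd_indents(text, width=2):
    """Return the 1-based numbers of lines whose leading-space count
    is not a multiple of `width`. Blank lines are skipped."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # nothing to measure on an empty line
        indent = len(line) - len(line.lstrip(" "))
        if indent % width != 0:
            flagged.append(number)
    return flagged


good = "---\n- operating systems:\n  - Linux\n  - BSD\n"
bad = "---\n- operating systems:\n   - Linux\n   - BSD\n"

print(odd_indents(good))  # []
print(odd_indents(bad))   # [3, 4]
```

Note that the second document is still valid YAML, because its sibling items line up with one another. It only breaks the two-space convention.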

Create a sequence in YAML

The second principle of YAML is the sequence datatype. This is the simplest collection in YAML, and it’s basically a fancy term for “list”, which is exactly what it constructs. When you put each item (a single value is called a scalar in YAML) on its own line, prefixed with a dash and a space, you create a sequence.

Be careful, though: the dashes are what make it a sequence. Without them, a parser folds these three lines into one scalar, the string “Linux BSD Illumos”:

---
Linux
BSD
Illumos

To get a sequence of three items, write each one with a dash in front:

---
- Linux
- BSD
- Illumos
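
To make the idea concrete, here is a toy Python reader for exactly this kind of document: a flat list of scalars, one per line, each introduced by a dash and a space. It’s a sketch for illustration only (the name read_flat_sequence is hypothetical), and it’s no substitute for a real YAML parser, which handles quoting, nesting, data types, and much more.

```python
def read_flat_sequence(text):
    """Collect the items of a flat, dash-prefixed YAML sequence.

    Toy code: understands only '---' markers, blank lines,
    and '- item' lines at the left margin.
    """
    items = []
    for line in text.splitlines():
        if not line.strip() or line.strip() == "---":
            continue  # skip blanks and the document start marker
        if not line.startswith("- "):
            raise ValueError(f"not a sequence entry: {line!r}")
        items.append(line[2:].strip())
    return items


doc = "---\n- Linux\n- BSD\n- Illumos\n"
print(read_flat_sequence(doc))  # ['Linux', 'BSD', 'Illumos']
```

A real parser hands back the same three strings as a list (or array) in whatever language you’re using.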

That’s all a sequence is: a list of items, one on each line. The items in these examples are plain scalars. To give a value a name, or to hang children from a named parent, you need a mapping.

Create a mapping in YAML

The third principle of YAML is the mapping datatype. A mapping is a collection of key and value pairs, each written as a key, a colon, a space, and a value. When you want to include a term and its definition in a YAML file, then you use a mapping.

This is a valid mapping in YAML:

---
OS: Linux

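A mapping can hold more than one pair. This is still a single mapping, now with three keys:

---
OS: Linux
CPU: AMD
RAM: 32G

Adding dashes changes the structure. Each dash starts a new sequence item, so this is a sequence of three mappings with one pair apiece, which a parser does not treat as the same data: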

---
- OS: Linux
- CPU: AMD
- RAM: 32G
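
In a program, the difference between these two layouts shows up in the shape of the data you get back. As a rough guide, written here as Python literals rather than actual parser output, a block of bare key and value lines corresponds to one dictionary, while the dashed version corresponds to a list of one-entry dictionaries. The variable names are just for illustration.

```python
# OS: Linux / CPU: AMD / RAM: 32G, written without dashes
plain = {"OS": "Linux", "CPU": "AMD", "RAM": "32G"}

# The same three lines, each prefixed with a dash
dashed = [{"OS": "Linux"}, {"CPU": "AMD"}, {"RAM": "32G"}]

# Looking up a value by key is direct in the first shape...
print(plain["CPU"])  # AMD

# ...but needs a search through the list in the second.
cpu = next(item["CPU"] for item in dashed if "CPU" in item)
print(cpu)  # AMD
```

Which shape you want depends on the application reading the file, so check what it expects before deciding whether to add dashes.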

That’s not all you can do with a mapping, though. You can also include a sequence within a mapping! You’ve already seen this in this very article, in fact. My first example of YAML contained a mapping whose value is a sequence. Without its leading dash, it looks like this:

---
operating systems:
  - Linux
  - BSD

This maps the key operating systems to a sequence containing the values Linux and BSD.
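
Sketched as Python literals (again an illustration of the resulting shape, not output from any particular parser), the nesting looks like this:

```python
# operating systems: followed by two indented, dashed entries
nested = {"operating systems": ["Linux", "BSD"]}

# The key leads to the whole list; items are then reached by position.
systems = nested["operating systems"]
print(systems)     # ['Linux', 'BSD']
print(systems[1])  # BSD

# With a leading dash on the first line, as in the earlier
# indentation example, the mapping itself becomes a list item.
wrapped = [nested]
print(wrapped[0]["operating systems"][0])  # Linux
```

Indentation is what tells the parser that Linux and BSD belong to the operating systems key rather than sitting beside it.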

Making YAML simple

You now know everything you need to know about YAML! You can learn more about further features of the format (such as commenting, line breaks, null nodes, and so on) by reading the official YAML specification, but what you’ve learnt in this article may be all you ever need.

In addition to the basics I’ve provided in this article, there are two general rules to keep in mind:

  1. Use yamllint or some kind of linter or validator before committing your YAML to production. A good linter alerts you to suboptimal syntax that could cause some parsers (human or computer) trouble and, most importantly, catches errors that definitely render your YAML unusable.
  2. When in doubt, keep in mind the three principles described in this article! These are the building blocks of YAML: sequence, mapping, and indentation. When a linter throws an error, it often means you’ve violated one of these principles. Look at the offending line, determine whether it’s a sequence or a mapping, and verify its indentation.
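
If you want to make the first rule a habit, you can wrap the linter in a small script. The following Python sketch assumes the yamllint command is installed and on your PATH; the function name lint_yaml and the file name config.yaml are placeholders invented for this example.

```python
import shutil
import subprocess


def lint_yaml(path):
    """Run yamllint on `path` and return its exit code,
    or None when yamllint is not installed."""
    if shutil.which("yamllint") is None:
        return None
    result = subprocess.run(["yamllint", path])
    return result.returncode


status = lint_yaml("config.yaml")  # hypothetical file name
if status is None:
    print("yamllint not found; install it before committing")
elif status == 0:
    print("no errors reported")
else:
    print("yamllint reported problems; see its output above")
```

The script only relays the linter’s verdict. The messages yamllint prints still tell you which line to look at.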

YAML is designed to be easy, but often when something is simple in technology it’s because it maintains strict rules that can’t withstand variation. Don’t confuse YAML with an arbitrary bullet point list you jot down in a notebook before you go shopping. YAML is a structured data format, carefully defined for reliable parsing. Learn it once, and write it well.