An Introduction to YAML

Introduction

YAML stands for "YAML ain't markup language" (recursion acronym). It is a language used to store data and represent it in a format that is readable and serialized.

It used to be called "Yet another markup language". Why? Because it was created at a time when markup languages were at the forefront of connectivity and data visualization.

With due time, it became a common misconception about how it functions. It was recognized as more data-oriented rather than just a document markup. Hence it was changed to "YAML ain't markup language".

Well, if you are guessing what a markup language is? It is nothing but the way how you structure pages or documents by adding different elements like tags, headers, lists, components, and so on. A good example is HTML!

What is YAML exactly?

As mentioned earlier, it provides us with a format that helps us to visualize and store data in a simple way that is easier to read. It is similar to XML and JSON files. It also performs data serialization and deserialization.

what does it mean?

Serializing data is converting objects (code + data) into a machine-readable (common standard) format (streams of bytes) that can be transmitted and understood by various computer environment applications. Deserialization is just the opposite. It is just extracting the necessary information for us to visualize the code.

Let's understand with the help of an example.

A developer writes a few lines of code for an Android application. Now he thinks to scale to various applications like web apps, websites, and so on. So he needs to write code or a configuration file in such a way that it can be understood by all of them. He can achieve this by implementing data serialization.

Why YAML?

The configuration files for Docker, Kubernetes, Ansible, Prometheus, all are written in YAML. Hence it is a widely used format for writing configurations and manifests for many DevOps tools and applications.

Let's say you created some applications and you want to run a few of them on these many servers on the cloud. So Kubernetes will create an object or a pod that will run those applications. But first, you need to provide a configuration file or a YAML file that tells Kubernetes what your object containing the application should look like.

What are its benefits?

It is simple and easy to read.
It gives high importance to syntax, indentation being one of them.
It can be converted to JSON, XML, etc.
Most languages use YAML.
YAML files can be easily shared and used across different platforms and operating systems without requiring any special software or tools.
It is widely used in configuration files for software applications, build systems, continuous integration/continuous deployment (CI/CD) pipelines, and infrastructure-as-code (IaC) tools like Ansible, Terraform, Docker Compose, orchestration tool like Kubernetes and so many out there.
It is more powerful when representing complex data.
There are libraries and parsers available for most popular programming languages to work with YAML.
Parsing is easy.

Hands-on understanding

Let's have a brief understanding of how YAML code is structured and written.The syntax of a YAML code is very simple. It is written in key-value pairs.

Scalars

The first type of data that can be written in YAML is scalars. These include strings, numbers, booleans, null, timestamps, etc.

A string is a collection of characters that can be anything. There are 5 ways to write a string -

Double quotes (")
Single quotes (')
No quotes (plain)
Folded strings (>)
Multiple line strings (|)

"name": "xyz"
name: 'xyz'
name: xyz

Code becomes messy when a single line becomes long enough to scroll sideways. To prevent this, the symbol ">" can be added to let the compiler know that even if we are breaking a single line of code into multiple lines, it should treat it as a single line. This is how folding is done.

description: >
  This is a long description
  that spans multiple lines
  in YAML.

And when you want your lines of code to be treated exactly as mentioned even if they are folded into multiple lines, we can use the pipe symbol (|).

message: |
  This is a multiline
  string in YAML.
  Line breaks and spacing
  are preserved as is.

Since we know what a string looks like in YAML, let's move on to other scalars. Some self-explanatory examples are...

"mammals": "humans" # string
name: John Doe  # String
---
age: 25        # Number -> integer
marks: 98.3    # float
exponents: 78E56  # exponent number
notANum: .nan   # not a number
---
isStudent: true  # Boolean alternatives -> TRUE, yes, YES, on, ON
canFly: false    # FALSE, no, NO, off, OFF.
---
score: null    # Null value
timestamp: 2023-06-03T10:15:00Z  # Timestamp

datatypes can be explicitly specified by mentioning "!! (datatype)" in front of the value...

---
# integer numbers
zero: !!int 0
positiveNumber: !!int 67
negativeNumber: !!int -90
binaryNum: !!int 10011
octalVal: !!int 0342
hexaVal: !!int 0x46
commaValue: !!int +69_000 # 69,000
---
# floating point numbers
marks: !!float 86.3
infinite: !!float .inf
---
# boolean
canRun: !!bool No
canFly: !!bool Yes
---
# string 
name: !!str xyz
---
# null
haveKids: !!null null # or NULL or ~
---
# timestamp
defaultTime: !! timestamp 2023-06-03T10:15:00Z # UTC time
IndiaTime: !! timestamp 2023-06-03T10:15:00Z +5:30 # gives India Time

List

The second one is a list or a map. In YAML, it represents an ordered collection of values. Each item starts with a hyphen. Each item can be of any YAML data type, including scalars, lists, or dictionaries. This is how you can write down a list of things...

- apple
- banana
- orange
- mango

A list can be written in a block style referring to a key element.

supercars:
  - Lamborghini
  - Buggati
  - Aston Martin
  - Audi
  - Mercedes

...make sure you space them out properly because YAML has a very strict indentation rule.

Let's say, it becomes irritating to take care of the indentation problem, you can add the components of the above list separated by commas between square brackets.

This is called the float style.

supercars: [Lamborghini, Buggati, Aston Martin, Audi, Mercedes]

Lists can also be explicitly sequenced by using the "!!seq" tag as follows..

supercars: !!seq
  - Lamborghini
  - Buggati
  - Aston Martin
  - Audi
  - Mercedes

A list can also have one or two items missing. In this case, it is called a sparse sequence or a sparse list.

supercars:
  - Lamborghini
  - 
  - Aston Martin
  - 
  - Mercedes

Nested sequence of lists....

  - apple
  - banana
  - orange
  - mango
-
  - Lamborghini
  - Buggati
  - Aston Martin
  - Audi
  - Mercedes

In the above example, we have a nested sequence of lists with the first list of fruits and the second list of supercars.

'!!pairs' is a tag used to represent a sequence of key-value pairs. It helps to assign a key to have multiple values..

specs: !!pairs
  - car1: red
  - car1: 4-wheel drive

'!!set' allows to have unique values...

values: !!set
  ? value1
  ? value2
  ? value3

Dictionary

The third one is a dictionary. It is an object giving information about various types of data represented by the key element. The key is of string data type, whereas the value can be of any type. We can say that a dictionary is an unordered list of component features.

car:
  brand: Lamborghini
  model: Aventador SVJ
  year: 2023
  top_speed: 217 mph
  body_style: Coupe
  color: Arancio Atlas (Orange)
  features:
    - Carbon fiber body panels
    - Active aerodynamics
    - Adaptive suspension
    - Ceramic brakes
  price: $517,770

It can also be typed in the flow notation like this..

car:
  brand: Lamborghini
  model: Aventador SVJ
  year: 2023
  keyFeatures: { top_speed: 217 mph, body_style: Coupe }

'!!omap' allows us to have an ordered map or dictionary..

fruits: !!omap
  - apple: red
  - orange: orange
  - banana: yellow

The next one is nested data types. A YAML file can have lists and dictionaries nested within another list or dictionaries.

In the following code, we can see that we have a list of fruits, each being a dictionary (name, color, nutrients). And each dictionary contains a list of nutrients the fruit gives.

fruits:
  - name: apple
    color: red
    nutrients:
      - vitamin C
      - fiber
  - name: banana
    color: yellow
    nutrients:
      - vitamin B6
      - potassium
  - name: orange
    color: orange
    nutrients:
      - vitamin C
      - folate

Different types of documents can be separated by 3 dashes like this " --- ", and to end your document, add "..." , for ex.

---
- name: John Doe
  age: 25
  isStudent: true
- brand: Apple
  product: iPhone
  price: 999
---
title: My Todo List
tasks:
  - task: Buy groceries
    priority: high
  - task: Pay bills
    priority: medium
...

In the above example, we see that the first doc. is a list of dictionaries (John Doe and Apple), whereas the second doc. represents a dictionary key-value pair and a list of tasks each being a dictionary within.

The code can be commented simply by putting a hash symbol (#) in front of the line like this..

colors:
  - red
  # - blue
  - green
  # - pink

Anchors and Aliases

These help to reuse properties in YAML using symbols &, <<, and * . '&' is used to define anchors, '<<' to merge properties, and '*' to create aliases allowing us to reuse data.

# Define an anchor using &
  person: &person_details
  name: John Doe
  age: 30

# Use the << to merge additional properties with the anchored values
customer:
  <<: *person_details
  email: johndoe@example.com

# Another alias referencing the same anchored values
employee: *person_details
    age: 26

In this example, we define an anchor named person_details first using &, then with the anchor name (&person_details). The anchor is used in a dictionary that contains a person's name and age.

Then, in the 'customer' section, we use the '<<' syntax to combine the anchored values with extra characteristics. The '<<' makes sure that the 'person_details' anchor's properties should be combined with the current dictionary. As a result, the 'customer' dictionary will include the 'person_details' anchor's 'name' and 'age' properties, as well as the extra 'email' property.

Finally, we use 'person_details' to construct an alias called 'employee'. The '*' character followed by the anchor name (*person_details) generates an alias that refers to the anchored values. As a result, the 'employee' dictionary will have the same 'name' and 'age' properties. One additional thing happening over here is, the property 'age' is being overridden with the value 26.

Comparison among YAML, JSON, and XML

Like YAML, JSON (Javascript Object Notation) too performs its way of representing data in a key-value format. Except for the fact that it has a stricter syntax which includes spaces, indentation, mandatory double quotes in keys and values, and most importantly curly brackets.

On the other hand, XML (Extensible Markup language) uses tags like HTML to show data.

Let's have a look at the tree representation of a person. He has a name, age, and address. The address in turn contains street and city properties. (As simple as that)

Let's compare them code-wise...

person:
  name: Alex
  age: 30
  address:
    street: 123 Bridgeton lane
    city: Brooklyn

{
  "person": {
    "name": "Alex",
    "age": 30,
    "address": {
      "street": "123 Bridgeton lane",
      "city": "Brooklyn"
    }
  }
}

<person>
  <name>Alex</name>
  <age>30</age>
  <address>
    <street>123 Bridgeton lane</street>
    <city>Brooklyn</city>
  </address>
</person>

YAML from a technical perspective

Let's see how YAML works in a real-world scenario.

Ansible

Ansible, a prominent IT automation and configuration management tool, frequently employs YAML. YAML is used in Ansible to define playbooks, which are high-level descriptions of a system's desired state.

Ansible playbooks are used to automate repetitive operations that execute actions automatically.

Down below we have an Ansible playbook that installs Nginx, replaces the existing default Nginx landing page with the supplied template, and ultimately allows TCP access on port 80.

---
- hosts: all
  become: yes
  vars:
    page_title: Spacelift
    page_description: Spacelift is a sophisticated CI/CD platform for Terraform, CloudFormation, Pulumi, and Kubernetes.
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: latest

    - name: Apply Page Template
      template:
        src: files/spacelift-intro.j2
        dest: /var/www/html/index.nginx-debian.html

    - name: Allow all access to tcp port 80
      ufw:
        rule: allow
        port: '80'
        proto: tcp

Kubernetes

YAML is widely used in Kubernetes as the preferred format for specifying and configuring various cluster resources. Users can declare the attributes, behavior, and relationships of Kubernetes resources using YAML files, which serve as a declarative representation of the desired state of objects.

Kubernetes operates on a state model, attempting to attain the desired state declaratively from the present state. Kubernetes defines the Kubernetes object with YAML files, which are then applied to the cluster to build resources such as pods, services, and deployments.

Kubernetes is a container orchestrator which does more than that including deployment, scaling, and management of containerized applications.

Here is an example of a YAML file that describes a resource kind "deployment" that runs an app "my-app" with 3 replicas of it.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app-image:latest
        ports:
        - containerPort: 8080

GitLab CI/CD

GitLab is a DevOps platform that is web-based and offers a full collection of tools for managing the entire software development lifecycle. It includes tools for teams to interact, version control their code, automate CI/CD pipelines, track bugs, manage repositories, and deploy apps.

Here is an example of YAML's contribution to building CI/CD pipelines.

stages:
  - build
  - test
  - deploy

variables:
  IMAGE_TAG: "latest"

build_job:
  stage: build
  script:
    - docker build -t my-app:${IMAGE_TAG} .
    - docker push my-app:${IMAGE_TAG}

test_job:
  stage: test
  script:
    - docker run my-app:${IMAGE_TAG} npm test

deploy_job:
  stage: deploy
  script:
    - kubectl apply -f deployment.yaml

The YAML file in this example describes a pipeline with three stages: build, test, and deploy. Each level has a job that completes particular responsibilities.

The "variables" section defines an "IMAGE_TAG" variable with the value "latest" that may be used throughout the pipeline.

The "build_job" class represents a job in the build stage. It creates a Docker image for a given project, tags it with the "IMAGE_TAG" variable, and uploads it to a container registry.

The "test_job" class represents a job in the testing stage. It executes the project's tests and runs the Docker image built during the build step.

The "deploy_job" class represents a job in the deployment stage. It employs the Kubernetes deployment settings specified in the "deployment.yaml".

Conclusion

YAML provides a robust and user-friendly means of representing and exchanging data. Because of its readability, versatility, compatibility, and integration capabilities, it is a popular choice in a variety of disciplines, including configuration files, data serialization, and system orchestration. Developers can improve productivity, code maintainability, and cooperation within their projects by exploiting YAML's features.