Consul ACLs vs Nomad ACLs

Caveat: Consul’s ACL system was revamped in November of 2018 so this comparison is only valid for Consul prior to version 1.4.0. In particular, Consul’s token system in >= 1.4.0 has a strong resemblance to Nomad’s.

One of the more interesting things about Nomad’s ACL system is how it differs from Consul’s. ACLs came to Consul in September of 2014 and to Nomad in November of 2017, and I can’t help but think the intervening years have informed improvements in the way Nomad handles ACLs. If you (like me) assumed that both ACL systems would largely resemble each other, then you’ve learned something about your own assumptions too!

Speaking of assumptions: I’ll assume that you (dear reader) have a passing familiarity with both Consul and Nomad so that I can dive straight into the differences between their ACL systems. If you’re not familiar, here’s an intro to Consul, which is geared towards service discovery, and an intro to Nomad, which is a workload scheduler. Both have ACL systems that can place fine-grained restrictions on the actions an API consumer can perform.

Fundamental Differences in ACL Scopes

Perhaps the biggest difference between Consul’s and Nomad’s ACL systems is scope. To join a Consul cluster with all protection features enabled, an agent must have:

A valid ACL token
A valid gossip encryption key
A valid client-side TLS certificate

Compare this to Nomad, where an agent only needs the following to join:

A valid gossip encryption key
A valid client-side TLS certificate

This is an interesting omission for Nomad. Consul agents must have an ACL token with the agent:write policy just to join the cluster, whereas Nomad has no such requirement (the ACL policy needed to join a Nomad cluster is simply none).

This makes a certain amount of sense. Nomad nodes have only one function, which is to run scheduled workloads. If the cluster operator’s goal is to prevent a rogue machine from joining the cluster and accepting workloads, then client-side TLS and gossip encryption are enough. Nomad nodes don’t modify the cluster and can, at best, modify themselves through changes to their configuration file. No ACL policy is required to simply participate in the cluster.

Consul nodes have comparatively broader responsibilities such as registering services and healthchecks, operations which modify the catalog. The catalog is Consul, so membership in the cluster does not implicitly mean a node can mutate the catalog. Catalog policies can be broad (allowing a node to register any service it wants) or narrow (scoping service registration so that a rogue node can’t register a MitM service pretending to be an authentication API). I’ll admit it seems odd that nodes require an ACL token just to join the cluster, which means I’m overlooking something.
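To make the broad-versus-narrow distinction concrete, here is a minimal Python sketch that renders service rules in Consul’s legacy (pre-1.4.0) rule syntax. The helper name service_rules and the service name web are hypothetical, chosen only for illustration. Legacy rules match on name prefix, so an empty string covers every service.

```python
def service_rules(prefixes, policy="write"):
    """Render legacy (pre-1.4.0) Consul service rules, one block per prefix.

    Hypothetical helper for illustration only. Legacy rules match by name
    prefix, so "" applies to every service.
    """
    if policy not in ("read", "write", "deny"):
        raise ValueError(f"unknown policy: {policy}")
    blocks = [f'service "{p}" {{ policy = "{policy}" }}' for p in prefixes]
    return "\n".join(blocks)


# Broad: the node may register any service it wants.
broad = service_rules([""])

# Narrow: the node may only register services whose names start with "web".
narrow = service_rules(["web"])
```

A token carrying the narrow rules cannot register a service under some other name, which is the kind of protection against impersonation described above.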

How ACLs are Managed and Consumed

At a high level, Consul and Nomad both have a similar ACL model:

Capabilities describe actions that an API consumer can take (register a service, query a node)
Policy documents describe a list of capabilities
Tokens are associated with policy documents which can be used to authenticate API consumers

Whereas Consul’s and Nomad’s approaches to ACL scope reflect different requirements, Nomad’s approach to ACL management is an objective improvement over Consul’s. I’m sure the only reason that Consul doesn’t have the same ACL management approach as Nomad is the difficulty of retrofitting such a system after the ship has already sailed.

Anyway, some examples are in order! In Consul’s ACL system, tokens and policies are associated explicitly; you can see this with Consul’s /acl/create endpoint:

PUT /acl/create

{
  "ID": "cea03e7f-e217-448c-b1d4-d7dafbcd4805",
  "Name": "some-acl-token",
  "Type": "client",
  "Rules": "..."
}

Essentially we’re creating a policy document with inline rules and getting back a token that points at (and only at) this document. The token itself (the ID field) can be set explicitly, or we can leave the field blank and let Consul generate and return one itself.
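As a rough sketch of what that call looks like from a client, the Python below builds (but does not send) the request. The agent address, the management token value, and the function name build_legacy_token_request are assumptions for illustration; the /v1 prefix is Consul’s HTTP API prefix, omitted in the shorthand above.

```python
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent on the default port


def build_legacy_token_request(name, rules, management_token, token_id=None):
    """Build a PUT /v1/acl/create request for Consul's legacy ACL API.

    Rules are inline: the token and its policy are a single object.
    """
    body = {"Name": name, "Type": "client", "Rules": rules}
    if token_id is not None:
        # Optional: assert the token value instead of letting Consul generate it.
        body["ID"] = token_id
    return urllib.request.Request(
        url=f"{CONSUL_ADDR}/v1/acl/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Consul-Token": management_token},
        method="PUT",
    )


req = build_legacy_token_request(
    name="some-acl-token",
    rules='service "web" { policy = "write" }',
    management_token="hypothetical-management-token",
)
# Against a real pre-1.4.0 cluster, urllib.request.urlopen(req) would send it,
# and the JSON response would carry the token in its "ID" field.
```

Note that there is no way to point a second token at the same rules here; each token carries its own copy.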

In Nomad’s ACL system, policy documents and tokens are separate objects. A policy document must first be created:

POST /acl/policy/some-acl-policy

{
  "Name": "some-acl-policy",
  "Description": "An example policy",
  "Rules": "..."
}

This endpoint does not return a token. Instead we create a token which will be associated with one or more existing policy documents:

POST /acl/token

{
  "Name": "some-acl-token",
  "Type": "client",
  "Policies": ["some-acl-policy"]
}

Nomad then returns the token’s accessor key (essentially its internal name) and a secret key which grants access to attached policies. Yep, we’re talking plural! A token can be associated with multiple policies, and a policy can be associated with multiple tokens. Operationally this seems friendlier, in that policy documents can be reused by multiple tokens, and modifications to said policies do not require finding and modifying all consumers of said tokens.
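The operational benefit is easier to see in a toy model. The Python sketch below is not Nomad’s implementation; ACLStore, Token, and the token names are hypothetical, and the rules strings are only sample Nomad-style policy text. It shows tokens holding references to policy names, so a single policy update reaches every token attached to it.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    name: str
    policy_names: list = field(default_factory=list)  # references, not copies


class ACLStore:
    """Toy in-memory model: policies and tokens are separate objects."""

    def __init__(self):
        self.policies = {}  # policy name -> rules text
        self.tokens = {}    # token name -> Token

    def upsert_policy(self, name, rules):
        self.policies[name] = rules

    def create_token(self, name, policy_names):
        self.tokens[name] = Token(name, list(policy_names))

    def effective_rules(self, token_name):
        # Resolved at lookup time, so policy edits reach every attached token.
        token = self.tokens[token_name]
        return [self.policies[p] for p in token.policy_names if p in self.policies]


store = ACLStore()
store.upsert_policy("some-acl-policy", 'namespace "default" { policy = "read" }')
store.create_token("token-a", ["some-acl-policy"])
store.create_token("token-b", ["some-acl-policy"])

# One policy update; neither token is touched.
store.upsert_policy("some-acl-policy", 'namespace "default" { policy = "write" }')
```

After the update, effective_rules returns the new rules for both token-a and token-b, whereas the inline-rules model would need each token updated individually.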

Another subtle difference: Nomad token values cannot be set explicitly at creation; the cluster always generates and returns the accessor and secret keys itself.
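Putting the two Nomad calls together, here is a hedged Python sketch that builds (but does not send) both requests. The agent address, the management token value, and the helper names are assumptions for illustration; /v1 is Nomad’s HTTP API prefix, omitted in the shorthand above.

```python
import json
import urllib.request

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: local agent on the default port


def _post(path, body, management_token):
    """Build a POST request against the Nomad HTTP API (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{NOMAD_ADDR}/v1{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Nomad-Token": management_token},
        method="POST",
    )


def build_policy_request(name, description, rules, management_token):
    # Step 1: the policy is created on its own; no token comes back.
    body = {"Name": name, "Description": description, "Rules": rules}
    return _post(f"/acl/policy/{name}", body, management_token)


def build_token_request(name, policy_names, management_token):
    # Step 2: the token references policies by name. The body carries no
    # token value, because Nomad generates the accessor and secret itself.
    body = {"Name": name, "Type": "client", "Policies": list(policy_names)}
    return _post("/acl/token", body, management_token)


mgmt = "hypothetical-management-token"
policy_req = build_policy_request("some-acl-policy", "An example policy", "...", mgmt)
token_req = build_token_request("some-acl-token", ["some-acl-policy"], mgmt)
# Sending token_req with urllib.request.urlopen would return JSON containing
# the generated "AccessorID" and "SecretID" fields.
```

The "..." rules value is the same placeholder used in the example above and would need real policy rules before being sent.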

That’s All!

Hopefully this is helpful to someone else! Nomad’s ACL system is still fairly new at the time of writing, so there isn’t a lot of information out in the wild yet. Overall, the differences in how Consul and Nomad scope and manage their ACL systems are minor (but important) considerations for shops preparing to deploy these tools.