It’s another Vector post! As with the last, this won’t be a post describing Vector’s basic usage (Vector’s docs do that pretty well!), but in short, Vector is an extremely competent tool for building observability pipelines. This post is about a neat trick you can do with its log namespacing feature, which effectively splits the log event into the log body, ., and its metadata, %.
Normal Usage
Log namespacing is the feature that splits a log event into the root object . for the log body and a separate root object % for metadata about it. Metadata is set by Vector sources; for example, the http_server source adds %http_server.path to events, which is useful for router components making decisions about logs, for executing conditions inside Vector remaps, or even for parsing data out of a URL pattern like /v1/logs/:source/:destination.
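For instance, a remap could pull those path segments into the event. Here’s a minimal sketch – the component names, regex, and target fields are my own inventions, not anything from Vector’s docs:
[transforms.extract_route]
type = "remap"
inputs = ["http_in"]
source = '''
# Pull :source and :destination out of a path like /v1/logs/myapp/datadog
parts, err = parse_regex(string!(%http_server.path), r'^/v1/logs/(?P<source>[^/]+)/(?P<destination>[^/]+)$')
if err == null {
  .log_source = parts.source
  .log_destination = parts.destination
}
'''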
The % metadata object travels alongside the event . the whole way through the pipeline, is accessible in all the same places, is mutable, and changes to the metadata object persist across different components. Take the following remaps for example:
[transforms.one]
type = "remap"
inputs = ["some_input"]
source = '''
non_prod = "tier:non-prod"
prod = "tier:prod"
# Cast from `any`
%datadog_agent.ddtags = array!(%datadog_agent.ddtags)
# Set tier based on `env` tag
if includes(%datadog_agent.ddtags, "env:prod") {
  %datadog_agent.ddtags = push(%datadog_agent.ddtags, prod)
} else {
  %datadog_agent.ddtags = push(%datadog_agent.ddtags, non_prod)
}
'''
[transforms.two]
type = "remap"
inputs = ["one"]
source = '''
three_days = "use_index:3day"
more_days = "use_index:main"
# Cast from `any`
%datadog_agent.ddtags = array!(%datadog_agent.ddtags)
# Set retention based on `tier` tag
if includes(%datadog_agent.ddtags, "tier:non-prod") {
  %datadog_agent.ddtags = push(%datadog_agent.ddtags, three_days)
} else {
  %datadog_agent.ddtags = push(%datadog_agent.ddtags, more_days)
}
'''
Though a bit contrived, here’s what’s going on: the datadog_agent source jams a bunch of stuff into %, including tags, since technically tags are metadata and not part of the log body itself. If an event comes in with an env:test tag, then the event will exit the second component with tags of
["env:test","tier:non-prod","use_index:3day"]
Since tags are part of metadata, metadata must be mutable. It would be weird for Vector not to let you set tags, since jazzing up your observability data is what it’s all about. That got me thinking: if you can modify metadata, can you also create it?
Oh yes.
From simplest to goofiest, here are a few uses:
Invisible Annotations
The simplest way to make use of custom metadata is to treat it as a global-ish variable that travels with your event. Vector also allows you to instantiate regular variables inside VRL programs, but those are scoped to (at most) the component they live inside. Metadata, on the other hand, travels:
[transforms.parse]
type = "remap"
inputs = ["some_input"]
source = '''
if (parsed, err = parse_json(.); err == null) {
  %custom.parsed = "json"
  . = parsed
} else if (parsed, err = parse_syslog(.); err == null) {
  %custom.parsed = "syslog"
  . = parsed
} else if (parsed, err = parse_aws_alb_log(.); err == null) {
  %custom.parsed = "aws_alb"
  . = parsed
} else {
  %custom.parsed = "unrecognized"
}
'''
Here we’re parsing a raw log which might be JSON or syslog or an ALB log. On the first successful match, %custom.parsed is set and then follows the event through the pipeline, where we can use it to make future decisions about what to do with the event, or even interpolate it into templates.
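To pick one example (a sketch – the component name is made up), a downstream route transform could split the stream by how each event was parsed:
[transforms.by_format]
type = "route"
inputs = ["parse"]
[transforms.by_format.route]
unrecognized = '%custom.parsed == "unrecognized"'
alb = '%custom.parsed == "aws_alb"'
Events then come out of the named outputs (by_format.unrecognized and so on), without the parse format ever touching the log body.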
As an aside, trying multiple different parsers is a good use for the coalesce operator – the above approach is more verbose to demonstrate setting custom metadata.
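Something like this (a sketch) would do the same parsing in one line, at the cost of the annotation:
[transforms.parse_terse]
type = "remap"
inputs = ["some_input"]
source = '''
# Try each parser in turn; fall back to the raw event if none match
. = parse_json(.) ?? parse_syslog(.) ?? parse_aws_alb_log(.) ?? .
'''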
Sentinel Values
Vector has ways to error-check fallible functions, but if you’re ever 100% sure that – pinky swear! – a fallible function won’t fail, you can include a ! to skip the error check:
[transforms.florp]
type = "remap"
inputs = ["some_input"]
source = '''
# This:
.florp = string!(.florp)
# Is equivalent-ish to this:
.florp, err = string(.florp)
if err != null {
  abort
}
'''
In string!(.florp), if .florp is not a string, or if it is null (which is, incidentally, also not a string), then string!() throws an error and the component aborts.
One of Vector’s promises is that when a component aborts, it won’t leave an event in a half-modified state. A remap can only modify events if it doesn’t throw an error; if it does throw an error, the event passes through the component unmodified. This is a net good thing, but can be a bit tricksy to troubleshoot. If Vector is processing a million logs a minute and 2-3 of them are failing due to a bad cast somewhere, it might be hard to actually go find those logs. You were so certain that .florp was always a string, but somewhere it isn’t! If only you could find the specific log event where that error occurred.
With metadata, you can set a sentinel value so that a failed step gets tagged:
[transforms.big_transform]
type = "remap"
inputs = ["some_input"]
source = '''
# Attempt a number of things in a very long transform
# ...
# ...
# It's so long, and our unit tests miss a typecast that will fail:
.bonk = string!(.florp)
%custom.sentinel = "ok"
'''
[transforms.sentinel_check]
type = "remap"
inputs = ["big_transform"]
source = '''
if is_nullish(%custom.sentinel) {
  # Tag/annotate the log to let us know
  .big_transform_failed = true
}
'''
If big_transform fails, then none of its changes happen, which means %custom.sentinel never gets set. The second transform checks this, and when it finds the sentinel value is null, it annotates the log so that you can go find it by searching your log store for the custom attribute.
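From there it’s a short hop to routing the flagged events somewhere you’ll actually look. As a sketch (the console sink is just a stand-in for wherever you triage failures):
[transforms.failures]
type = "route"
inputs = ["sentinel_check"]
[transforms.failures.route]
failed = '.big_transform_failed == true'

[sinks.failure_triage]
type = "console"
inputs = ["failures.failed"]
encoding.codec = "json"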
Since the purpose of these sentinel checks is to find unexpected errors, the sentinel check itself should be kept simple, and tested.
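A test for it might look something like this sketch (the unit test syntax gets more attention in the next section):
[[tests]]
name = "sentinel check tags failed events"
[[tests.inputs]]
insert_at = "sentinel_check"
type = "vrl"
source = '''
# Simulate big_transform having failed by not setting %custom.sentinel
.msg = "hello"
'''
[[tests.outputs]]
extract_from = "sentinel_check"
[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert!(.big_transform_failed == true)
'''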
Speaking of tests…
Smuggling in per-test values
One of my favourite things about Vector is its unit testing framework. Sometimes, though, it can be tricky to test conditions that might only occur in production. For example:
[transforms.some_remap]
type = "remap"
inputs = ["some_input"]
source = '''
if "${ENV}" == "prod" {
# Do one thing
} else {
# Do something else
}
'''
It would be a bit awkward (but not impossible) to run unit tests with different values of the ENV environment variable, and doing so may conflict with other constraints – for example, never executing the pipeline with ENV=prod during a CI/CD run. We don’t have time for me to go off on a “proximal causes” tangent here, so let’s just agree that pretending something is prod when it isn’t can lead to bad things.
And anyway you can do this instead:
[transforms.some_remap]
type = "remap"
inputs = ["some_input"]
source = '''
env = string(%ci.environment) ?? "${ENV}"
if env == "prod" {
# Do one thing
} else {
# Do something else
}
'''
Oh yeah. This is cool. The thing about unit tests is that they let you set both the event . and metadata % in the inputs stanza:
[[tests]]
name = "some test"
[[tests.inputs]]
insert_at = "some_remap"
type = "vrl"
source = '''
.msg = "hello"
%ci.environment = "prod"
'''
When VRL executes env = string(%ci.environment) ?? "${ENV}" we find our friend, the coalesce operator. Vector tries to cast %ci.environment to a string, which, when running tests, will succeed because %ci.environment was set in the test inputs! During regular operation though, the string() function will fail because %ci.environment is null. Since that would throw an error, the coalesce operator falls back to ${ENV} instead.
There are other methods of injecting different values into tests using environment variables or a secrets backend, but what I like about this method is that the value of %ci.environment can be set per test.
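Concretely, a pair of tests can exercise both branches of some_remap without ever touching the real ENV – a sketch, with placeholder assertions:
[[tests]]
name = "prod branch"
[[tests.inputs]]
insert_at = "some_remap"
type = "vrl"
source = '''
%ci.environment = "prod"
'''
[[tests.outputs]]
extract_from = "some_remap"
[[tests.outputs.conditions]]
type = "vrl"
source = '''
# Assert whatever "do one thing" is supposed to do
assert!(true)
'''

[[tests]]
name = "non-prod branch"
[[tests.inputs]]
insert_at = "some_remap"
type = "vrl"
source = '''
%ci.environment = "dev"
'''
[[tests.outputs]]
extract_from = "some_remap"
[[tests.outputs.conditions]]
type = "vrl"
source = '''
# Assert whatever "do something else" is supposed to do
assert!(true)
'''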
Metadata is fun and cool
There are probably some other neat tricks you can do with metadata thanks to its versatility. Metadata % travels with the event ., can be accessed or modified anywhere the event can be accessed or modified, and changes to metadata persist (or are otherwise atomicized) the same as changes to the event. You could do all of these things without log namespacing just by packing the data into the regular event, but then you’d have to remember to discard these extra fields before sinking an event to its final destination.
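Without namespacing, that cleanup would mean a janitorial remap like this in front of every sink (a sketch – the field names are stand-ins for the metadata paths used above):
[transforms.cleanup]
type = "remap"
inputs = ["last_transform"]
source = '''
# Strip the bookkeeping fields before they reach the destination
del(.parsed_format)
del(.sentinel)
'''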
At the end of the day, metadata’s real killer feature is being discarded. It’s clean, it’s easy, and it’s fun.