Add vmagent role #2704
Draft
scibi wants to merge 3 commits into
Install and configure VictoriaMetrics vmagent via systemd template units, with multi-instance support, SHA256-verified binary installs and DebOps secret integration for remote-write bearer tokens.

Co-authored-by: Cursor <cursoragent@cursor.com>
Import the service playbook from layer/agent.yml alongside the other observability agents, register global handlers, add the role to the Monitoring section of role-index.rst and document the new role in CHANGELOG.rst.

Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Adds a new `debops.vmagent` role that installs and configures VictoriaMetrics' `vmagent` on Debian-family hosts. The role manages one or more named instances via a single hardened `vmagent@.service` systemd template unit, sources the binary from the upstream `vmutils` GitHub release archive (with full SHA256 verification), and integrates with the standard DebOps `secret` mechanism for per-instance bearer tokens. It also wires the new playbook into `layer/agent.yml` so that `debops run site` configures vmagent alongside the other observability agents (Filebeat, Metricbeat, Telegraf, Zabbix Agent).
A first-class air-gapped install path is built in: the binary can be sourced from an internal HTTP(S) mirror (Nexus / Artifactory / MinIO / nginx), copied from the Ansible Controller, picked up from a path already on the host (e.g. Packer-baked), or its management can be skipped entirely. The same SHA256 contract applies to every channel.
Motivation
VictoriaMetrics does not publish an APT repository; the official binaries are distributed only as `.tar.gz` archives on the GitHub Releases page, which makes the `debops.apt`-style pattern unusable. Most DebOps deployments of vmagent today rely on either (a) running it as a container (which adds an unnecessary network namespace between the agent and the host exporters it scrapes) or (b) hand-rolled `resources` role tasks plus manual systemd units.

A dedicated DebOps role offers:

- Multiple named instances (`vmagent@default.service`, `vmagent@aggregator.service`, ...) sharing the same binary, each with an independent persistent queue and remote-write targets.
- Pinned, verified installs: `vmagent__version` + `vmagent__archive_sha256_map` gate every install; bumping is an explicit two-line change.
- Bearer tokens stored in `secret/vmagent/instances/<name>/bearer_token` on the Controller, never in cleartext in inventory.
- A hardened unit: `ProtectSystem=strict`, `PrivateTmp`, `ProtectKernel*`, an empty `CapabilityBoundingSet`, `MemoryDenyWriteExecute`, `StateDirectory=vmagent/%i` (the only writable path for the daemon), and a `TimeoutStopSec` large enough for the persistent queue to flush before SIGKILL.

Design
Systemd template unit
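Each instance's `<name>.env` file carries its CLI flags as a single `ARGS="..."` line, and the PR describes `instance.env.j2` as handling booleans, scalars and lists (lists become repeating flags). The following is a minimal Python sketch of that rendering, not the role's Jinja2 template. Assumptions: the hypothetical `render_env` helper takes a flag-name-to-value mapping, booleans are written as `-flag=true` / `-flag=false`, and values contain no whitespace or quotes, since an unquoted `$ARGS` on an `ExecStart=` line is split on whitespace by systemd.

```python
"""Sketch: render a vmagent instance's ARGS line from a flag mapping.

Hypothetical helper, not the role's instance.env.j2 template. Assumes:
- keys are vmagent flag names without the leading dash,
- booleans render as -flag=true / -flag=false,
- lists render as one repeated flag per element (e.g. -remoteWrite.url),
- values contain no whitespace or quotes, because an unquoted $ARGS in
  ExecStart= is split on whitespace by systemd.
"""
from typing import Mapping, Union

FlagValue = Union[bool, int, float, str, list]


def _scalar(value: Union[bool, int, float, str]) -> str:
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value)
    if any(ch.isspace() for ch in text) or '"' in text:
        raise ValueError(f"unsupported value for ARGS: {text!r}")
    return text


def render_env(flags: Mapping[str, FlagValue]) -> str:
    """Return the ARGS="..." line for an EnvironmentFile=."""
    parts = []
    for name in sorted(flags):  # stable order keeps the rendered file idempotent
        value = flags[name]
        values = value if isinstance(value, list) else [value]
        for item in values:
            parts.append(f"-{name}={_scalar(item)}")
    return 'ARGS="' + " ".join(parts) + '"'


if __name__ == "__main__":
    # Illustrative values only; the URLs are placeholders.
    print(render_env({
        "promscrape.config": "/etc/vmagent/default.yml",
        "remoteWrite.url": [
            "https://vm-a.example/api/v1/write",
            "https://vm-b.example/api/v1/write",
        ],
        "remoteWrite.tmpDataPath": "/var/lib/vmagent/default",
        "httpListenAddr": "127.0.0.1:8429",
    }))
```

Sorting the flag names is a design choice for the sketch: it keeps the rendered file byte-identical across runs, so an unchanged inventory never registers as a config change.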
A single `vmagent@.service` template unit drives all instances. Per-instance configuration lives in two files under `/etc/vmagent`:

- `<name>.yml`: Prometheus-format `scrape_configs:` consumed by `-promscrape.config=`.
- `<name>.env`: `EnvironmentFile=` containing `ARGS="..."` (CLI flags).

Persistent queues live under `/var/lib/vmagent/<name>/`, restricted to the instance via `StateDirectory=vmagent/%i` so that hardening (`ProtectSystem=strict`) does not block writes.

Adding a new instance is a single YAML entry under `vmagent__instances`; removing one is `state: 'absent'`, which stops and disables the instance unit and cleans up its config, env and queue files.

Validated instance definitions
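The per-instance checks in this section can be sketched in Python as follows. This is a minimal sketch, not the role's `assert` task: the `validate_instance` helper and `InstanceError` class are hypothetical, instances are assumed to be plain dicts, `state` is assumed to default to `'present'`, and `'present'` / `'absent'` are assumed to be the only valid states.

```python
"""Sketch of the per-instance checks described for manage_instance.yml.

Hypothetical helper; assumes instances are dicts, that 'state' defaults to
'present', and that only 'present' and 'absent' are valid states.
"""
from typing import Any, Mapping

VALID_STATES = {"present", "absent"}


class InstanceError(ValueError):
    """Raised instead of deploying a broken systemd unit."""


def validate_instance(instance: Mapping[str, Any]) -> None:
    name = instance.get("name")
    if not name:
        raise InstanceError("vmagent instance is missing 'name'")

    state = instance.get("state", "present")
    if state not in VALID_STATES:
        raise InstanceError(
            f"vmagent instance {name!r}: unknown state {state!r}, "
            f"expected one of {sorted(VALID_STATES)}"
        )

    # Only instances that will actually run need somewhere to write to.
    if state == "present" and not instance.get("remote_write_urls"):
        raise InstanceError(
            f"vmagent instance {name!r}: 'remote_write_urls' must not be empty"
        )


if __name__ == "__main__":
    validate_instance({"name": "default",
                       "remote_write_urls": ["https://vm.example/api/v1/write"]})
    validate_instance({"name": "old", "state": "absent"})  # no URLs needed
    try:
        validate_instance({"name": "broken"})
    except InstanceError as err:
        print(err)
```

Note that an `absent` instance is accepted without `remote_write_urls`, so removing an instance does not require keeping its targets in inventory.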
Each instance passes through an `assert` in `manage_instance.yml` before any files are deployed. A missing `name`, an unknown `state`, or a `present` instance with no `remote_write_urls` fails the play immediately with a clear message instead of silently deploying a broken systemd unit.

Binary install waterfall
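The waterfall in this section and its checksum contract can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the source labels (`skip`, `installed`, `local`, `controller`, `download`) and the `pick_source`, `sha256_of` and `verify_archive` helpers are hypothetical, versions are assumed to be compared as already-normalized strings (e.g. `1.144.0`), and in the role itself each step is an Ansible task (`ansible.builtin.copy`, `ansible.builtin.get_url`, `stat` + `assert`).

```python
"""Sketch of the vmagent binary-source waterfall and SHA256 check.

Hypothetical helpers; the role implements each step as Ansible tasks.
Versions are assumed to be normalized strings such as '1.144.0'.
"""
import hashlib
from pathlib import Path
from typing import Mapping, Optional


def pick_source(skip_install: bool,
                installed_version: Optional[str],
                wanted_version: str,
                local_archive_path: Optional[str],
                controller_archive_path: Optional[str]) -> str:
    """Return the first binary source that resolves, in waterfall order."""
    if skip_install:
        return "skip"          # binary is managed elsewhere (baked image)
    if installed_version == wanted_version:
        return "installed"     # short circuit, nothing to download
    if local_archive_path:
        return "local"         # archive already on the remote host
    if controller_archive_path:
        return "controller"    # archive copied from the Controller
    return "download"          # get_url from the release (or mirror) URL


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, arch: str,
                   sha256_map: Mapping[str, str]) -> None:
    """Fail unless the archive matches the pinned checksum for this arch."""
    expected = sha256_map.get(arch)
    if expected is None:
        raise ValueError(f"no pinned SHA256 for architecture {arch!r}")
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(f"{path}: SHA256 {actual} != pinned {expected}")
```

Only the `local`, `controller` and `download` branches produce an archive, so those are the branches where `verify_archive` would apply.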
The role evaluates five sources of the `vmagent` binary, in order, and uses the first one that resolves:

1. `vmagent__skip_install: True`: the role is a no-op for binary management (image-baked deployments).
2. `vmagent -version` already reports the matching version: short circuit, no downloads.
3. `vmagent__local_archive_path`: archive already present on the remote host.
4. `vmagent__controller_archive_path`: archive copied from the Ansible Controller via `ansible.builtin.copy`.
5. `ansible.builtin.get_url` from `vmagent__release_url` = `{{ release_base_url }}/v{{ version }}/{{ archive_name }}`.

The same SHA256 from `vmagent__archive_sha256_map[arch]` is enforced in every branch that handles an archive (3 to 5). For the controller-side and local-host branches the check runs as a separate `stat` + `assert` step after the file is in place.

Restart semantics
The role registers separate change variables for binary, config, env, and secret per instance, and restarts an instance only when the binary or one of that instance's own files changes.
Editing one instance's scrape config does not bounce other instances on the same host - important when multiple instances target different remote-write tenants.
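The restart decision can be sketched in Python as follows. This is a minimal sketch with a hypothetical data layout: one shared `binary_changed` flag (assumed to affect every instance, since they all run the same binary) plus per-instance `config` / `env` / `secret` change flags. The `instances_to_restart` helper is illustrative; the role itself works from registered change variables.

```python
"""Sketch: decide which vmagent@<name>.service units need a restart.

Hypothetical data layout: one shared binary_changed flag (all instances run
the same binary) plus per-instance change flags for config, env and secret.
"""
from typing import Dict, List, Mapping


def instances_to_restart(binary_changed: bool,
                         changes: Mapping[str, Mapping[str, bool]]) -> List[str]:
    restart = []
    for name, flags in sorted(changes.items()):
        own_files_changed = any(flags.get(key, False)
                                for key in ("config", "env", "secret"))
        if binary_changed or own_files_changed:
            restart.append(f"vmagent@{name}.service")
    return restart


if __name__ == "__main__":
    changes: Dict[str, Dict[str, bool]] = {
        "default": {"config": False, "env": False, "secret": False},
        "aggregator": {"config": True, "env": False, "secret": False},
    }
    # Only the aggregator's scrape config changed, so only it is bounced.
    print(instances_to_restart(False, changes))
```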
Hardened systemd unit
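The directive set in this section, plus the `vmagent__systemd_hardening_extra` escape hatch, can be sketched as a small Python renderer. Assumptions: `ProtectKernel*` is expanded to systemd's `ProtectKernelTunables`, `ProtectKernelModules` and `ProtectKernelLogs`; boolean settings are written as `=yes`; `vmagent__systemd_hardening_extra` is taken to be a list of directive lines appended after the base set. The `hardening_lines` helper is hypothetical and is not the role's `vmagent@.service.j2`.

```python
"""Sketch: assemble the [Service] hardening directives for vmagent@.service.

Hypothetical helper. ProtectKernel* is expanded to the three systemd
ProtectKernel* settings; 'extra' stands in for
vmagent__systemd_hardening_extra (assumed to be a list of directive lines).
"""
from typing import Iterable, List

BASE_HARDENING = [
    "ProtectSystem=strict",
    "PrivateTmp=yes",
    "ProtectKernelTunables=yes",
    "ProtectKernelModules=yes",
    "ProtectKernelLogs=yes",
    "ProtectClock=yes",
    "ProtectHostname=yes",
    "RestrictRealtime=yes",
    "RestrictSUIDSGID=yes",
    "RestrictNamespaces=yes",
    "LockPersonality=yes",
    "MemoryDenyWriteExecute=yes",
    "CapabilityBoundingSet=",  # empty value resets the set: no capabilities
    "RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK",
]


def hardening_lines(home: str, extra: Iterable[str] = ()) -> List[str]:
    """Base directives, the per-instance writable path, then user extras."""
    lines = list(BASE_HARDENING)
    # %i is the systemd instance specifier, e.g. 'default' or 'aggregator'.
    lines.append(f"ReadWritePaths={home.rstrip('/')}/%i")
    lines.extend(extra)
    return lines


if __name__ == "__main__":
    extra = ["IPAddressAllow=localhost 192.0.2.0/24", "IPAddressDeny=any"]
    print("\n".join(hardening_lines("/var/lib/vmagent", extra)))
```

Appending the extras last keeps the base set intact while letting an inventory add restrictions such as `IPAddressAllow=` without forking the template.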
`vmagent` only needs to read its config, write to its queue directory, listen on a loopback port for its HTTP endpoint, and open outbound TCP connections to the remote-write target. The hardened template applies `ProtectSystem=strict`, `PrivateTmp`, `ProtectKernel*`, `ProtectClock`, `ProtectHostname`, `RestrictRealtime`, `RestrictSUIDSGID`, `RestrictNamespaces`, `LockPersonality`, `MemoryDenyWriteExecute`, an empty `CapabilityBoundingSet`, and `RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK`. The only writable path is `{{ vmagent__home }}/%i`, scoped via `ReadWritePaths`. Additional restrictions (e.g. `IPAddressAllow=`) can be appended via `vmagent__systemd_hardening_extra` without forking the role.

Hook points
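A hook such as "wait for vmagent to report `/-/ready`" would normally be an Ansible task in a project's `post_main.yml`; its logic is a bounded poll. A minimal Python sketch of that poll, assuming the instance listens on `127.0.0.1:8429` (vmagent's default HTTP port; per-instance ports in this role may differ) and treating HTTP 200 as ready. `wait_until_ready` and `http_status` are hypothetical, and the injectable `fetch` / `sleep` parameters exist only so the loop can be exercised without a running vmagent.

```python
"""Sketch: bounded poll of a vmagent readiness endpoint.

Hypothetical helper for a post_main-style hook. The URL (127.0.0.1:8429,
vmagent's default HTTP port, plus the /-/ready path named in this PR) is an
assumption; per-instance listen addresses may differ.
"""
import time
import urllib.error
import urllib.request
from typing import Callable

READY_URL = "http://127.0.0.1:8429/-/ready"


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status for url, or 0 if the connection failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a code
        return err.code
    except (urllib.error.URLError, OSError):  # refused, timed out, DNS, ...
        return 0


def wait_until_ready(url: str = READY_URL,
                     attempts: int = 30,
                     delay: float = 1.0,
                     fetch: Callable[[str], int] = http_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until fetch(url) returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:  # no pointless sleep after the last try
            sleep(delay)
    return False
```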
`tasks/vmagent/{pre_main,post_main}.yml` are empty placeholders that project-level overrides can populate via the standard `debops.debops.task_src` lookup. They are useful for hooks such as "wait for vmagent to report `/-/ready`" or "register a Prometheus scrape target for the agent's own metrics".

What this PR adds
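Among the files in this section, `tasks/main_env.yml` and `templates/lookup/vmagent_env_secret_directories.j2` compute `vmagent__secret_directories`, which the service playbook computes before running `secret` and then `vmagent`. A minimal Python sketch of that computation, assuming paths are relative to the DebOps secret root, that every instance not marked `state: 'absent'` gets a directory, and using hypothetical helper names (`secret_directories`, `bearer_token_path`):

```python
"""Sketch: derive per-instance secret directories from vmagent__instances.

Hypothetical equivalent of what main_env.yml / the lookup template compute.
Assumes paths are relative to the DebOps secret root and that every
instance not marked state: 'absent' gets a directory.
"""
from typing import Any, List, Mapping, Sequence


def secret_directories(instances: Sequence[Mapping[str, Any]]) -> List[str]:
    dirs: List[str] = []
    for instance in instances:
        if instance.get("state", "present") == "absent":
            continue
        path = f"vmagent/instances/{instance['name']}"
        if path not in dirs:  # keep first-seen order, drop duplicates
            dirs.append(path)
    return dirs


def bearer_token_path(name: str) -> str:
    # Matches the layout secret/vmagent/instances/<name>/bearer_token.
    return f"vmagent/instances/{name}/bearer_token"
```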
- `ansible/roles/vmagent/defaults/main.yml` (~370 lines, fully documented)
- `meta/main.yml`, `COPYRIGHT`
- `tasks/main.yml`: user/group/dirs, binary install include, systemd template unit, loop over `vmagent__combined_instances`
- `tasks/main_env.yml`: pre-task computing `vmagent__secret_directories`
- `tasks/install_binary.yml`: five-stage binary delivery waterfall with mandatory SHA256 verification
- `tasks/manage_instance.yml`: per-instance deploy / remove block with validation and selective restart
- `tasks/vmagent/{pre_main,post_main}.yml`: hook placeholders
- `templates/etc/vmagent/instance.yml.j2`: Prometheus `scrape_configs:` renderer
- `templates/etc/vmagent/instance.env.j2`: CLI-flag renderer (handles bools, scalars, lists for repeating flags)
- `templates/etc/systemd/system/vmagent@.service.j2`: hardened systemd template unit
- `templates/etc/ansible/facts.d/vmagent.fact.j2`: Python local facts exposing version + active instances
- `templates/lookup/vmagent_env_secret_directories.j2`
- `roles/global_handlers/handlers/vmagent.yml` with `Restart vmagent instances`, registered alphabetically in `roles/global_handlers/handlers/main.yml`.
- `ansible/playbooks/service/vmagent.yml`: `vmagent/main_env` as `pre_task` to populate `vmagent__secret_directories`; `roles:` order: `secret` (creates secret dirs) -> `vmagent` (main role). No `keyring` role, since there is no APT repo.
- `ansible/playbooks/layer/agent.yml`
- `docs/ansible/roles/vmagent/`:
  - `getting-started.rst`: what vmagent is, prerequisites, minimal inventory, multi-instance pattern, secret management, tags
  - `defaults-detailed.rst`: `vmagent__ref_instances`, `vmagent__ref_binary_source` (waterfall), `vmagent__ref_systemd_hardening_extra`
  - `guide-victoriametrics-integration.rst`: typical patterns: per-host scrape + remote write, self-monitoring on the VictoriaMetrics host, bearer-token auth, multi-target fan-out, aggregator
  - `guide-airgapped-install.rst`: the four non-GitHub delivery patterns step by step
- `docs/ansible/role-index.rst` updated.

system):

- `ansible/views/system/inventory/groups.yml`: new file; `debops_service_vmagent.children = debops_all_hosts`, so vmagent rolls out to every managed host without touching `hosts_*.yml` files.
- `ansible/views/system/inventory/group_vars/all/vmagent.yml`: central VictoriaMetrics endpoint default (`https://vmetrics.sciborek.com/api/v1/write`) and a placeholder empty `scrape_configs` ready for `node_exporter`.
- `ansible/views/system/inventory/host_vars/vmetrics.sciborek.com/vmagent.yml`: `http://127.0.0.1:8428/api/v1/write` to bypass nginx + DNS.

Dependencies
Independent: depends only on `master`. No coupling with any other PR in flight.

Testing
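Several checks in this section read `ansible_local.vmagent`. The PR describes `vmagent.fact.j2` as Python local facts exposing the version and active instances; the following is a minimal sketch of such a script, not the role's template. Assumptions: the binary lives at `/usr/local/bin/vmagent`, a `vX.Y.Z` string appears somewhere in `vmagent -version` output (stdout or stderr), "active instances" means running `vmagent@*.service` units as listed by `systemctl list-units`, and the `installed` / `version` JSON keys are hypothetical (only `instances` is named in this PR).

```python
#!/usr/bin/env python3
"""Sketch of an Ansible local fact for vmagent (facts.d/vmagent.fact).

Hypothetical stand-in for the role's vmagent.fact.j2. Assumes the binary is
at /usr/local/bin/vmagent and that its -version output contains 'vX.Y.Z'.
Ansible reads the JSON printed on stdout as ansible_local.vmagent.
"""
import json
import re
import subprocess
from typing import List, Optional

BINARY = "/usr/local/bin/vmagent"
UNIT_RE = re.compile(r"^vmagent@(?P<name>[^.\s]+)\.service\b")


def parse_version(output: str) -> Optional[str]:
    match = re.search(r"v(\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None


def parse_instances(list_units_output: str) -> List[str]:
    names = []
    for line in list_units_output.splitlines():
        match = UNIT_RE.match(line.strip())
        if match:
            names.append(match.group("name"))
    return sorted(names)


def run(cmd: List[str]) -> str:
    """Return combined stdout + stderr, or '' if the command is missing."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    version = parse_version(run([BINARY, "-version"]))
    units = run(["systemctl", "list-units", "--type=service", "--state=active",
                 "--no-legend", "--plain", "vmagent@*.service"])
    print(json.dumps({"installed": version is not None,
                      "version": version,
                      "instances": parse_instances(units)}))
```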
Tested in a homelab DebOps deployment running Debian on unprivileged Proxmox LXC containers, with a central VictoriaMetrics single-node already exposed at `vmetrics.sciborek.com:8428` via `debops.docker_service` + nginx:

- `/usr/local/bin/vmagent` absent. `debops run service/vmagent` downloads `vmutils-linux-amd64-v1.144.0.tar.gz`, verifies SHA256, installs the binary, deploys the unit and starts `vmagent@default.service`; queries on the central VM confirm ingestion within 30 s.
- `vmagent__release_base_url: 'https://nexus.lan/raw/vendor/victoriametrics/releases'` and the archive pre-uploaded to the mirror. No GitHub traffic; SHA256 verification still applies.
- `vmagent__controller_archive_path` pointed at a tarball in `files/`. The role copies, verifies and extracts it; the binary lands at the expected version.
- `/usr/local/bin/vmagent`, `vmagent__skip_install: True`. The role manages only configuration / systemd and never touches the binary.
- `vmagent@default.service` + `vmagent@aggregator.service`. Modifying the aggregator's `scrape_configs` triggers exactly one restart, of the aggregator unit only.
- `state: 'absent'` on an instance stops and disables the unit and removes config / env / queue. Removing `vmagent__instances` entries is verified by inspecting `ansible_local.vmagent.instances` after a re-run.
- `changed=0` for the role; `vmagent.fact` reports unchanged version.
- `systemd-analyze security vmagent@default.service` returns an `Exposure level` of `1.x SAFE`.

Compatibility
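The supported architectures and the release URL pattern can be sketched in Python as follows. Assumptions: the hypothetical `ARCH_MAP` is keyed by Ansible's `ansible_architecture` values (`x86_64`, `aarch64`), `vmagent__archive_sha256_map` is keyed by the mapped name (`amd64`, `arm64`), and the default base URL is GitHub's standard release-download path for the VictoriaMetrics repository. The archive name follows the `vmutils-linux-amd64-v1.144.0.tar.gz` pattern from the test log, and the URL follows `{{ release_base_url }}/v{{ version }}/{{ archive_name }}`.

```python
"""Sketch: map the host architecture to the vmutils release archive URL.

Hypothetical stand-in for vmagent__arch_map / vmagent__release_url. The
archive name pattern follows vmutils-linux-amd64-v1.144.0.tar.gz.
"""
from typing import Mapping

ARCH_MAP = {"x86_64": "amd64", "aarch64": "arm64"}
GITHUB_BASE = ("https://github.com/VictoriaMetrics/VictoriaMetrics"
               "/releases/download")


def archive_name(version: str, arch: str) -> str:
    return f"vmutils-linux-{arch}-v{version}.tar.gz"


def release_url(version: str, ansible_architecture: str,
                sha256_map: Mapping[str, str],
                base_url: str = GITHUB_BASE,
                arch_map: Mapping[str, str] = ARCH_MAP) -> str:
    arch = arch_map.get(ansible_architecture)
    if arch is None:
        raise ValueError(f"unsupported architecture {ansible_architecture!r}")
    if arch not in sha256_map:
        # Every architecture needs a pinned checksum before any download.
        raise ValueError(f"no SHA256 pinned for {arch!r}")
    return f"{base_url.rstrip('/')}/v{version}/{archive_name(version, arch)}"
```

Pointing `base_url` at an internal mirror (as in the Nexus test above) changes only the prefix; the archive name and the checksum requirement stay the same.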
- `v1.144.0` (the pinned default; bumping is a two-line change in `defaults/main.yml`).
- `linux-amd64` and `linux-arm64` (via `vmagent__arch_map`). Other architectures require an extra entry in `vmagent__archive_sha256_map`.

Checklist
- `vmagent__` variable namespace, `[ 'role::vmagent', 'skip::vmagent' ]` tags, `become: True` on the playbook level
- `service/<name>.yml` structure (pre_task computes dependent vars, then `secret` -> main role)
- `docs/ansible/roles/vmagent/` (getting-started, defaults-detailed, two guides, all listed in `role-index.rst`)
- `tasks/vmagent/{pre_main,post_main}.yml` hook placeholders provided so default-config users don't need to create empty files
- `TimeoutStopSec` large enough for persistent queue flush before SIGKILL