Netdata - v1.44.0



Right on schedule, this is another great Netdata release!

[!IMPORTANT]
Stay informed about upcoming changes and potential deprecations by reviewing the deprecation notice sections. This will help you plan for any necessary adjustments to ensure a smooth transition.

Netdata Growth

  • 66k+ GitHub Stars ⭐
    Since October 2023, Netdata has been leading the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ too, on GitHub!

  • 600M+ Docker Hub pulls
    Netdata runs with about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits for our users, allowing all of them to integrate Netdata into their CI/CD toolchains.

Release Summary

  • Netdata beats Prometheus in all aspects: this version of Netdata includes significant improvements that make Netdata a lot more performant than Prometheus at scale. Full performance analysis included.
  • Netdata Journal Logs: Netdata can now deal with huge systemd-journal databases and can query the host logs when Netdata runs in a container.
  • First beta version of Netdata's log2journal: a utility to extract, convert, transform and send to systemd-journal any kind of structured logs (including JSON and logfmt logs), similar to what promtail does for Loki.
  • More Netdata Functions: monitor containers and VMs, network interfaces, mount points, block devices, systemd units, systemd services, and more!
  • Netdata now logs to journal instead of log files and the results are amazing!

Release Highlights

Netdata beats Prometheus in all aspects


We tested Netdata and Prometheus at scale, both ingesting 2.7 million metrics per second. On the same workload, compared to Prometheus, Netdata needs:

  • 35% less CPU
  • 49% less RAM
  • 12% less bandwidth
  • 75% less disk space
  • 98% less disk I/O

Read the full performance comparison between Netdata and Prometheus.

To achieve these astonishing results, we made the following changes to Netdata since the previous release:

New SLOTS streaming protocol

A new streaming protocol allows Netdata children and parents to share a common index of the metrics streamed. Parents can then receive metrics without consulting hashtables, which reduces the overall overhead on parents by about 30% without increasing the overhead on children (the children just number each metric).

The new protocol, called SLOTS, is automatically selected when both the child and the parent support it.
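The idea of a shared index can be illustrated with a toy model. This is a minimal sketch and not the actual SLOTS wire protocol: the class `ParentIndex`, its methods, and the metric names are hypothetical, chosen only to show why a slot number lets the parent skip a hash lookup on every sample.

```python
class ParentIndex:
    """Toy model of a parent resolving streamed metrics by slot number."""

    def __init__(self):
        self.by_name = {}      # metric name -> list of samples (hash lookup)
        self.by_slot = []      # slot number -> same list (direct index)
        self.hash_lookups = 0  # how many times the name index was consulted

    def define(self, slot, name):
        # Done once per metric: resolve the name, remember it under the slot.
        self.hash_lookups += 1
        series = self.by_name.setdefault(name, [])
        if slot >= len(self.by_slot):
            self.by_slot.extend([None] * (slot + 1 - len(self.by_slot)))
        self.by_slot[slot] = series

    def push(self, slot, value):
        # Done for every sample: a plain list index, no hashing of the name.
        self.by_slot[slot].append(value)


parent = ParentIndex()
# The child numbers its metrics; these names are examples only.
parent.define(0, "system.cpu.user")
parent.define(1, "system.ram.used")
for value in (10, 12, 11):
    parent.push(0, value)
parent.push(1, 2048)
print(parent.hash_lookups, "name lookups for 4 samples")
```

In this model the name index is touched once per metric, while every subsequent sample costs only an array access on the parent and a small integer on the child.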

Streaming compression algorithms

Streaming now supports multiple compression algorithms. Previous Netdata releases supported only LZ4, which is known for its speed and average compression ratio. This release adds support for ZSTD, GZIP, and BROTLI.

ZSTD provides the best balance between compression ratio and CPU consumption, and therefore it is now the default.

The order in which compression algorithms are selected can be configured on parents, in the [API] section of stream.conf, by setting compression algorithms order = zstd lz4 brotli gzip.

If you need to save as much bandwidth as possible at the expense of CPU utilization, set this so that brotli or gzip appear first in the list, before zstd and lz4.

This also means that parents can now have a different compression order for each API key, allowing the use of different API keys based on the location of the child (e.g. children on billable egress bandwidth can use an API key that prefers the best compression, like brotli and gzip, while children on non-billable egress bandwidth can use an API key that prefers the lowest CPU utilization, like zstd or lz4).
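As a rough illustration of per-API-key ordering, a parent's stream.conf could look like the fragment below. The two section names are placeholder API keys invented for this example, and all other per-key settings are omitted; only the `compression algorithms order` option and its values come from the text above.

```ini
# stream.conf on a parent (fragment; section names are placeholder API keys)

# Children on billable egress bandwidth: prefer the best compression ratio.
[11111111-2222-3333-4444-555555555555]
    compression algorithms order = brotli gzip zstd lz4

# Children on non-billable egress bandwidth: prefer low CPU utilization.
[aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee]
    compression algorithms order = zstd lz4 brotli gzip
```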

Gorilla compression beta

Gorilla compression is a time series data compression technique, developed by Facebook for their time series database, Gorilla. It's particularly efficient for compressing data that changes incrementally over time, which is a common characteristic of time series data.

This release of Netdata includes an adaptation of Gorilla compression which, once enabled, provides an additional 30% memory reduction to Netdata.

This was not ready when we compared Netdata and Prometheus, so the Gorilla compression benefits were not accounted for in the comparison. With Gorilla compression enabled, Netdata's memory reduction is 70%+ compared to Prometheus.

To try Gorilla compression, edit netdata.conf and set at the [db] section, dbengine page type = gorilla.
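The setting described above, shown as a netdata.conf fragment (other options in the [db] section are omitted):

```ini
# netdata.conf (fragment)
[db]
    dbengine page type = gorilla
```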

Keep in mind that enabling Gorilla compression changes the dbengine file format to Gorilla-compressed metrics. This version of Netdata can read Gorilla-compressed data from dbengine even if Gorilla compression is not enabled, but previous versions of Netdata cannot read it. So, enable Gorilla only if you don't plan to switch back to a previous version of Netdata.

Our plan is to have Gorilla compression enabled by default at the next release of Netdata.

systemd-journal logs

Our systemd-journal.plugin was already much faster (10x) than journalctl, but it was still slow when the journal database is huge (e.g. at journal centralization points where hundreds or thousands of nodes push their logs).

In this release, we introduce several changes to allow the plugin to work promptly in such environments.

Sampling and estimations

The biggest performance issue with systemd-journal logs is the query performance when dealing with huge logs databases.

To overcome this performance issue and provide prompt responses to queries, Netdata now uses the following strategy:

  1. The latest 500k log entries read from journal files work like before: we read all of them and all the values for all their fields, so that we can have accurate histograms and counters per field value at the filters.
  2. Once we hit the 500k log entries limit on a single query, we turn on sampling and estimations.
  3. Sampling distributes 500k more log entries to all the journal files to be read, so that the total log entries queried for their field values will be 1M. This means that if we have to read 100 files, 10k log entries per file will be sampled and 10k log entries more will be unsampled. Since files are usually spread over time, this provides a good sample across time.
  4. When the sampling threshold is hit, Netdata continues reading more log entries without querying the values of the fields. These log entries appear as [unsampled] at the histogram. We know these log entries are there, but the value counters on the field filters do not include them.
  5. When the [unsampled] threshold is hit, and we have read more than 1% of each file, Netdata estimates the number of entries that will be read from the file and skips the rest of it. This estimation appears as [estimated] in the histogram.

The above process allows Netdata to provide a histogram of the logs in a timely manner, even when the number of log entries in the visible timeframe is in the tens of millions.
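To make steps 3 to 5 concrete, here is a minimal sketch of how one journal file's entries could end up split into sampled, [unsampled] and [estimated]. It is not the plugin's code. The function name and parameters are hypothetical, the default budgets reuse the 10k figures from the 100-file example above, and the file's total entry count is assumed to be known in advance, whereas the plugin has to estimate it.

```python
def classify_file(total_entries, sampled_budget=10_000,
                  unsampled_budget=10_000, min_read_percent=1):
    """Split one file's entries into (sampled, unsampled, estimated)."""
    # Entries read in full, with all their field values evaluated.
    sampled = min(total_entries, sampled_budget)

    # Entries that are still read and counted, but whose field values are
    # not evaluated. Reading continues until the unsampled budget is used
    # AND at least min_read_percent of the file has been read.
    min_read = (total_entries * min_read_percent + 99) // 100  # ceiling
    wanted = max(unsampled_budget, min_read - sampled)
    unsampled = min(total_entries - sampled, wanted)

    # The rest of the file is skipped; only its size is estimated.
    estimated = total_entries - sampled - unsampled
    return sampled, unsampled, estimated


for size in (5_000, 1_000_000, 5_000_000):
    print(size, classify_file(size))
```

Small files are read completely, while for very large files the 1% rule, rather than the fixed unsampled budget, decides how much is read before estimation starts.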

A similar process is usually used by log management systems, including Grafana Loki and Elasticsearch. However, Netdata takes a much bigger sample of the data (other systems usually sample only a few thousand log entries, while Netdata usually samples more than a million) and the visualization allows exposing the exact sampling and estimations made at the histogram.

Image showing [unsampled] and [estimated] on a systemd journal system that collects about 10k nginx log entries per second:
image

Read more about journals query performance.

journals scan

On busy logs centralization servers, the number of journal files available in /var/log/journal/remote can grow significantly, slowing down directory listing (even ls -l is very slow on them).

To overcome this issue, Netdata now uses inotify events and sorts the files to be scanned from the latest to the oldest.

These changes allow Netdata to present the logs user interface for the most recent journals, immediately after a Netdata restart, while the journals database is scanned in the background.

Logs UI is now available when using Netdata docker images

We switched Netdata docker images from Alpine Linux to Debian, so that libsystemd will be available inside the docker image, allowing systemd-journal.plugin to be compiled and shipped with Netdata docker images.

Using Netdata docker images, Netdata can now query the host system journal files, while running inside the container.

MESSAGE_ID support

systemd-journal has a nice feature where certain events of common interest are given a specific MESSAGE_ID. Several such MESSAGE_IDs have been assigned to track common events, like coredumps, units start/stop events, VMs start/stop events, time changes, etc. In total, we found more than 50 unique events that are tracked this way.

This version of systemd-journal.plugin automatically tracks and annotates these MESSAGE_IDs using their names, allowing quick spotting of events of common interest.

This feature is available at the MESSAGE_ID field filter, at the right side of the dashboard.

log2journal, a new tool in your quiver for managing logs

log2journal is a new utility allowing the conversion of log files into structured systemd-journal log entries. This is currently in beta.

The utility allows processing logs like this:

tail -F /var/log/nginx/access.log |\
   log2journal -c nginx-combined |\
   systemd-cat-native

The above builds a basic pipeline for converting the access.log of an Nginx web server into structured log entries in the local systemd-journal.

  • tail is responsible for feeding the latest log lines to log2journal. Multiple files can be specified and log2journal can also pick up the filename from tail and add it as a field to the journal logs.
  • log2journal extracts fields from the log lines it is fed. This is a powerful tool that can read JSON and logfmt logs, but also extract fields using PCRE2 patterns from any log. It supports filtering, renaming, and rewriting rules using command line arguments or YAML configuration files. The output of log2journal is the standard Journal Export Format.
  • systemd-cat-native is another new Netdata utility, reading standard Journal Export Format entries, which are then sent to a local or remote systemd-journal system.
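For readers unfamiliar with the Journal Export Format that connects the last two stages of the pipeline, the sketch below serializes one entry by hand. It is not how log2journal is implemented; the function name and the field names in the example (such as NGINX_STATUS) are hypothetical. The format itself is simple: text fields are written as NAME=value lines, values containing newlines use a binary-safe form with a 64-bit little-endian length, and an empty line ends the entry.

```python
import struct


def to_journal_export(fields: dict) -> bytes:
    """Serialize one log entry in the Journal Export Format."""
    out = bytearray()
    for key, value in fields.items():
        name = key.upper().encode("ascii")
        data = str(value).encode("utf-8")
        if b"\n" in data:
            # Binary-safe form: NAME, newline, 64-bit little-endian length,
            # the raw data, newline.
            out += name + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            # Text form: NAME=value followed by a newline.
            out += name + b"=" + data + b"\n"
    out += b"\n"  # an empty line terminates the entry
    return bytes(out)


# Hypothetical fields extracted from one access.log line.
entry = to_journal_export({
    "MESSAGE": "GET /index.html HTTP/1.1",
    "NGINX_STATUS": 200,
    "PRIORITY": 6,
})
print(entry.decode("utf-8"), end="")
```

A stream of such entries is what a consumer like systemd-cat-native reads on its standard input.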

Read more here.

Image showing structured nginx logs in systemd-journal:
image

Netdata now logs to systemd-journal

The logging layer of Netdata has been rewritten, so that Netdata logs now go to the systemd-journal, in a namespace called netdata.

The obvious outcome is that you can now monitor Netdata logs using Netdata's systemd-journal.plugin user interface and, thanks to journal namespaces, this does not pollute the system logs. But this is just the beginning...

Netdata utilizes the MESSAGE_ID feature of systemd-journal to register:

  • all alert transitions
  • all alert notifications
  • all connections from Netdata children
  • all connections to Netdata parents

This means that the systemd-journal.plugin user interface and journalctl can now be used to list all such events uniformly.

Screenshot of Netdata alert transitions in systemd-journals:
image

All Netdata logs are now structured. Netdata can also log in json or logfmt formats. We introduced a lot of new fields to track every aspect of Netdata, in a uniform and consistent way. Read more here.

Furthermore, we introduced a new tool called systemd-cat-native allowing any application or shell script to send structured logs to systemd-journal. Read more here.

Functions, power up your troubleshooting toolkit!

Several new Functions have been added to help us in our troubleshooting journeys. On top of processes, streaming and systemd-journal, we are leveraging the wide range of collectors and metrics Netdata has to bring data in a different visual representation.

The updated list can be found in our documentation here. Below is a summary of the currently available Functions, each with the CLI tools it is an alternative to:

| Function | Description | Alternative to CLI tools | plugin - module |
|:-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------|:-----------------------------------------------------------------------------------------------------------|
| block-devices | Disk I/O activity for all block devices, offering insights into both data transfer volume and operation performance. | iostat | proc |
| containers-vms | Insights into the resource utilization of containers and QEMU virtual machines: CPU usage, memory consumption, disk I/O, and network traffic. | docker stats, systemd-cgtop | cgroups |
| ipmi-sensors | Readings and status of IPMI sensors. | ipmi-sensors | freeipmi |
| mount-points | Disk usage for each mount point, including used and available space, both in terms of percentage and actual bytes, as well as used and available inode counts. | df | diskspace |
| network-interfaces | Network traffic, packet drop rates, interface states, MTU, speed, and duplex mode for all network interfaces. | bmon, bwm-ng | proc |
| processes | Real-time information about the system's resource usage, including CPU utilization, memory consumption, and disk IO for every running process. | top, htop | apps |
| systemd-journal | Viewing, exploring and analyzing systemd journal logs. | journalctl | systemd-journal |
| systemd-list-units | Information about all systemd units, including their active state, description, whether or not they are enabled, and more. | systemctl list-units | systemd-journal |
| systemd-services | System resource utilization for all running systemd services: CPU, memory, and disk IO. | systemd-cgtop | cgroups |
| streaming | Comprehensive overview of all Netdata children instances, offering detailed information about their status, replication completion time, and many more. | | |

In the short term, we will keep adding more (hopefully) helpful Functions. In the longer term, we plan to expand this functionality to potentially allow taking and storing snapshots of the results, based on triggered alerts or a periodic configuration.

If you have suggestions, we have a running GitHub Discussion open here.

New Alert Notification Integrations to Netdata Cloud

We've been working on adding more Alert Notification Integrations to Netdata Cloud and recently added the following new ones:

  • Amazon Simple Notification Service (Amazon SNS)
  • Telegram


The full list of Alert Notification Integrations from Netdata Cloud can be found on our documentation here.

Acknowledgments

  • @ClaraCrazy for improving degraded adapters detection in python.d/megacli.
  • @thomasbeaudry for adding UPS selftest and status metrics to charts.d/apcupsd.
  • @watsonbox for adding LBAs written/read metrics to python.d/smartd_log.
  • @sepek for correcting an error in the "Change how long Netdata stores metrics" guide.
  • @seniorquico for fixing parsing and adding MAINT status metrics to python.d/haproxy.
  • @luisj1983 for correcting errors in the Health API documentation.
  • @andyundso for improving apps plugin by adding Erlang in apps_groups.conf.
  • @vobruba-martin for adding various improvements to go.d/mysql.

Contributions

Collectors

Improvements

  • Add more cases for megacli adapter degraded state (python.d/megacli) ([#16522](https://github.com/netdata/netdata/pull/16522), [@ClaraCrazy](https://github.com/ClaraCrazy))
  • Improve estimations accuracy (systemd-journal.plugin) ([#16467](https://github.com/netdata/netdata/pull/16467), [@ktsaou](https://github.com/ktsaou))
  • Implement estimations (systemd-journal.plugin) ([#16445](https://github.com/netdata/netdata/pull/16445), [@ktsaou](https://github.com/ktsaou))
  • Improve startup time (systemd-journal.plugin) ([#16443](https://github.com/netdata/netdata/pull/16443), [@ktsaou](https://github.com/ktsaou))
  • Implement sampling (systemd-journal.plugin) ([#16433](https://github.com/netdata/netdata/pull/16433), [@ktsaou](https://github.com/ktsaou))
  • Add cgroup current pids metric (cgroups.plugin) ([#16369](https://github.com/netdata/netdata/pull/16369), [@ilyam8](https://github.com/ilyam8))
  • Add Ipmi-sensors function (freeipmi.plugin) ([#16363](https://github.com/netdata/netdata/pull/16363), [@ilyam8](https://github.com/ilyam8))
  • Add UPS status code metric (charts.d/apcupsd) ([#16361](https://github.com/netdata/netdata/pull/16361), [@thomasbeaudry](https://github.com/thomasbeaudry))
  • Add Mount-points function (diskspace.plugin) ([#16345](https://github.com/netdata/netdata/pull/16345), [@ilyam8](https://github.com/ilyam8))
  • Add Block-devices function (proc/diskstats) ([#16338](https://github.com/netdata/netdata/pull/16338), [@ilyam8](https://github.com/ilyam8))
  • Add UsedBy field to Network-interfaces function (proc/proc_net_dev) ([#16337](https://github.com/netdata/netdata/pull/16337), [@ilyam8](https://github.com/ilyam8))
  • Add various improvements to Network-interfaces function (proc/proc_net_dev) ([#16336](https://github.com/netdata/netdata/pull/16336), [@ilyam8](https://github.com/ilyam8))
  • Add Network-interfaces function (proc/proc_net_dev) ([#16334](https://github.com/netdata/netdata/pull/16334), [@ilyam8](https://github.com/ilyam8))
  • Add Systemd-list-units function (systemd-journal.plugin) ([#16318](https://github.com/netdata/netdata/pull/16318), [@ktsaou](https://github.com/ktsaou))
  • Add Containers-vms function (cgroups.plugin) ([#16314](https://github.com/netdata/netdata/pull/16314), [@ktsaou](https://github.com/ktsaou))
  • Add UPS selftest status metric (charts.d/apcupsd) ([#16286](https://github.com/netdata/netdata/pull/16286), [@thomasbeaudry](https://github.com/thomasbeaudry))
  • Add a configuration option to set private cleanup timeout (statsd.plugin) ([#16269](https://github.com/netdata/netdata/pull/16269), [@MrZammler](https://github.com/MrZammler))
  • Add container_device label to network interfaces (cgroups.plugin) ([#16261](https://github.com/netdata/netdata/pull/16261), [@ilyam8](https://github.com/ilyam8))
  • Add selecting multiple sources support (systemd-journal.plugin) ([#16252](https://github.com/netdata/netdata/pull/16252), [@ktsaou](https://github.com/ktsaou))
  • Add total LBAs written/read metrics (python.d/smartd_log) ([#16245](https://github.com/netdata/netdata/pull/16245), [@watsonbox](https://github.com/watsonbox))
  • Add Erlang to apps_groups.conf (apps.plugin) ([#16231](https://github.com/netdata/netdata/pull/16231), [@andyundso](https://github.com/andyundso))
  • Add support for Proxmox vms/containers name resolution in Docker (cgroups.plugin) ([#16193](https://github.com/netdata/netdata/pull/16193), [@ilyam8](https://github.com/ilyam8))
  • Add nested JSON support to log parser (go.d/weblog) ([#1416](https://github.com/netdata/go.d.plugin/pull/1416), [@ilyam8](https://github.com/ilyam8))

Bug fixes

  • Fix configuration loading (charts.d.plugin) ([#16471](https://github.com/netdata/netdata/pull/16471), [@ilyam8](https://github.com/ilyam8))
  • Fix an issue where systemd-journal would stop trying different socket paths after the first failure (systemd-journal.plugin) ([#16458](https://github.com/netdata/netdata/pull/16458), [@ktsaou](https://github.com/ktsaou))
  • Fix parsing PD without NCQ status (python.d/adaptec_raid) ([#16400](https://github.com/netdata/netdata/pull/16400), [@ilyam8](https://github.com/ilyam8))
  • Fix Systemd-list-units function expiration time ([#16393](https://github.com/netdata/netdata/pull/16393), [@ilyam8](https://github.com/ilyam8))
  • Fix lack of system.net when running inside LXC ([#16364](https://github.com/netdata/netdata/pull/16364), [@ilyam8](https://github.com/ilyam8))
  • Fix memory leak in Systemd-list-units function (systemd-journal.plugin) ([#16333](https://github.com/netdata/netdata/pull/16333), [@ktsaou](https://github.com/ktsaou))
  • Fix server status parsing and add MAINT status chart (python.d/haproxy) ([#16253](https://github.com/netdata/netdata/pull/16253), [@seniorquico](https://github.com/seniorquico))

Other

  • Skip timestamp when logging to journald (python.d.plugin) ([#16516](https://github.com/netdata/netdata/pull/16516), [@ilyam8](https://github.com/ilyam8))
  • Mute stock jobs logging during check() (python.d.plugin) ([#16515](https://github.com/netdata/netdata/pull/16515), [@ilyam8](https://github.com/ilyam8))
  • Improve performance of the plugin (systemd-journal.plugin) ([#16509](https://github.com/netdata/netdata/pull/16509), [@ktsaou](https://github.com/ktsaou))
  • Don't create runtime disk config by default (proc/diskspace, proc/diskstats) ([#16503](https://github.com/netdata/netdata/pull/16503), [@ilyam8](https://github.com/ilyam8))
  • Don't create runtime device config by default (proc/proc_net_dev) ([#16501](https://github.com/netdata/netdata/pull/16501), [@ilyam8](https://github.com/ilyam8))
  • Disable netdata monitoring section by default ([#16480](https://github.com/netdata/netdata/pull/16480), [@MrZammler](https://github.com/MrZammler))
  • Change apps oom and net charts order (ebpf.plugin) ([#16395](https://github.com/netdata/netdata/pull/16395), [@thiagoftsm](https://github.com/thiagoftsm))
  • Fix "differ in signedness" warn in cgroups plugin ([#16391](https://github.com/netdata/netdata/pull/16391), [@ilyam8](https://github.com/ilyam8))
  • Fix throttle_duration chart context (cgroups.plugin) ([#16367](https://github.com/netdata/netdata/pull/16367), [@ilyam8](https://github.com/ilyam8))
  • Hide summary columns in network and block devices functions (proc/diskstats, proc/proc_net_dev) ([#16347](https://github.com/netdata/netdata/pull/16347), [@ktsaou](https://github.com/ktsaou))
  • Fix crash when a container has no CPU/mem metrics in Containers-vms function (cgroups.plugin) ([#16331](https://github.com/netdata/netdata/pull/16331), [@ilyam8](https://github.com/ilyam8))
  • Add tcp v6 connect calls to Ebpf_socket function (ebpf.plugin) ([#16316](https://github.com/netdata/netdata/pull/16316), [@thiagoftsm](https://github.com/thiagoftsm))
  • Update journal sources once per minute (systemd-journal.plugin) ([#16298](https://github.com/netdata/netdata/pull/16298), [@ktsaou](https://github.com/ktsaou))
  • Minor updates and cleanup (systemd-journal.plugin) ([#16267](https://github.com/netdata/netdata/pull/16267), [@ktsaou](https://github.com/ktsaou))
  • Stop using deprecated distutils module (python.d.plugin) ([#16259](https://github.com/netdata/netdata/pull/16259), [@MrZammler](https://github.com/MrZammler))
  • Remove charts.d/nut ([#16230](https://github.com/netdata/netdata/pull/16230), [@ilyam8](https://github.com/ilyam8))
  • Don't log an error opening `cgroup.procs`/`tasks` if it does not exist (cgroups.plugin) ([#16196](https://github.com/netdata/netdata/pull/16196), [@ilyam8](https://github.com/ilyam8))
  • Improve exposing metrics by creating a chart for each app group (ebpf.plugin) ([#16139](https://github.com/netdata/netdata/pull/16139), [@thiagoftsm](https://github.com/thiagoftsm))
  • Skip timestamp when logging to journald (go.d.plugin) ([#1418](https://github.com/netdata/go.d.plugin/pull/1418), [@ilyam8](https://github.com/ilyam8))
  • Replace logger with structured logger (go.d.plugin) ([#1418](https://github.com/netdata/go.d.plugin/pull/1418), [@ilyam8](https://github.com/ilyam8))
  • Use SHOW REPLICA STATUS for MySQL v8.0.22+ (go.d/mysql) ([#1392](https://github.com/netdata/go.d.plugin/pull/1392), [@vobruba-martin](https://github.com/vobruba-martin))
  • Use performance_schema instead of information_schema for MySQL v8.0.22+ (go.d/mysql) ([#1390](https://github.com/netdata/go.d.plugin/pull/1390), [@vobruba-martin](https://github.com/vobruba-martin))

Packaging/Installation

All changes

  • Add curl example to create_netdata_conf() ([#16498](https://github.com/netdata/netdata/pull/16498), [@ilyam8](https://github.com/ilyam8))
  • Fix incorrect DEB package build dependencies ([#16483](https://github.com/netdata/netdata/pull/16483), [@Ferroin](https://github.com/Ferroin))
  • Update go.d plugin version to v0.57.1 ([#16465](https://github.com/netdata/netdata/pull/16465), [@ilyam8](https://github.com/ilyam8))
  • Add sbindir_POST to the PATH of bash scripts using `systemd-cat-native` ([#16456](https://github.com/netdata/netdata/pull/16456), [@ilyam8](https://github.com/ilyam8))
  • Add LogNamespace to Netdata systemd unit ([#16454](https://github.com/netdata/netdata/pull/16454), [@ilyam8](https://github.com/ilyam8))
  • Add linking daemon.log to stderr in docker ([#16447](https://github.com/netdata/netdata/pull/16447), [@ilyam8](https://github.com/ilyam8))
  • Add support for installing a specific major version of the agent on install ([#16413](https://github.com/netdata/netdata/pull/16413), [@Ferroin](https://github.com/Ferroin))
  • Improve handling around EPEL requirement for RPM packages ([#16406](https://github.com/netdata/netdata/pull/16406), [@Ferroin](https://github.com/Ferroin))
  • Add zstd-dev to install-required-packages ([#16370](https://github.com/netdata/netdata/pull/16370), [@ilyam8](https://github.com/ilyam8))
  • Fix zstd in static build ([#16349](https://github.com/netdata/netdata/pull/16349), [@ilyam8](https://github.com/ilyam8))
  • Cleanup systemd unit files After ([#16332](https://github.com/netdata/netdata/pull/16332), [@ilyam8](https://github.com/ilyam8))
  • Update openssl to 3.1.4 for static build ([#16303](https://github.com/netdata/netdata/pull/16303), [@tkatsoulas](https://github.com/tkatsoulas))
  • Fix not removing /etc/cron.d/netdata-updater-daily during uninstall ([#16233](https://github.com/netdata/netdata/pull/16233), [@ilyam8](https://github.com/ilyam8))
  • Rename auto-update-method to auto-update-type in kickstart ([#16229](https://github.com/netdata/netdata/pull/16229), [@ilyam8](https://github.com/ilyam8))
  • Remove support for Alpine 3.15 ([#16205](https://github.com/netdata/netdata/pull/16205), [@tkatsoulas](https://github.com/tkatsoulas))
  • Switch to using Debian as a base for our Docker images. ([#15823](https://github.com/netdata/netdata/pull/15823), [@Ferroin](https://github.com/Ferroin))

Documentation

All changes

  • Add with-systemd-units-monitoring example to docker ([#16513](https://github.com/netdata/netdata/pull/16513), [@ilyam8](https://github.com/ilyam8))
  • Add /var/log mount to docker install guide ([#16496](https://github.com/netdata/netdata/pull/16496), [@ilyam8](https://github.com/ilyam8))
  • Fix spelling in Cloud on-prem documentation ([#16490](https://github.com/netdata/netdata/pull/16490), [@M4itee](https://github.com/M4itee))
  • Fix disable or silence alerts examples in Health API doc ([#16446](https://github.com/netdata/netdata/pull/16446), [@luisj1983](https://github.com/luisj1983))
  • Fix icon filename in freebsd meta ([#16441](https://github.com/netdata/netdata/pull/16441), [@shyamvalsan](https://github.com/shyamvalsan))
  • Add Cloud on-prem documentation ([#16440](https://github.com/netdata/netdata/pull/16440), [@M4itee](https://github.com/M4itee))
  • Fix typos in Health reference doc ([#16439](https://github.com/netdata/netdata/pull/16439), [@MrZammler](https://github.com/MrZammler))
  • Adds Cloud Telegram notification metadata ([#16424](https://github.com/netdata/netdata/pull/16424), [@juacker](https://github.com/juacker))
  • Fix chart labels examples in Health reference doc ([#16423](https://github.com/netdata/netdata/pull/16423), [@MrZammler](https://github.com/MrZammler))
  • Improve Netdata Functions documentation ([#16421](https://github.com/netdata/netdata/pull/16421), [@shyamvalsan](https://github.com/shyamvalsan))
  • Add /etc/localtime mount to docker install guide ([#16392](https://github.com/netdata/netdata/pull/16392), [@ilyam8](https://github.com/ilyam8))
  • Remove 'families' from Health reference ([#16380](https://github.com/netdata/netdata/pull/16380), [@ilyam8](https://github.com/ilyam8))
  • Fix indentation in Cloud AWS SNS notification meta ([#16379](https://github.com/netdata/netdata/pull/16379), [@ilyam8](https://github.com/ilyam8))
  • Remove alert guides that are not in the repository ([#16375](https://github.com/netdata/netdata/pull/16375), [@ilyam8](https://github.com/ilyam8))
  • Remove unused cloud notification methods docs ([#16372](https://github.com/netdata/netdata/pull/16372), [@ilyam8](https://github.com/ilyam8))
  • Add configuration documentation for Cloud AWS SNS ([#16371](https://github.com/netdata/netdata/pull/16371), [@car12o](https://github.com/car12o))
  • Correct time unit for dbengine tier 2 explanation ([#16368](https://github.com/netdata/netdata/pull/16368), [@sepek](https://github.com/sepek))
  • Add assorted improvements to the version policy draft ([#16362](https://github.com/netdata/netdata/pull/16362), [@Ferroin](https://github.com/Ferroin))
  • Import alert guides from Netdata Assistant ([#16355](https://github.com/netdata/netdata/pull/16355), [@ralphm](https://github.com/ralphm))
  • Update packaging instructions ([#16344](https://github.com/netdata/netdata/pull/16344), [@tkatsoulas](https://github.com/tkatsoulas))
  • Fix readme images ([#16327](https://github.com/netdata/netdata/pull/16327), [@Ancairon](https://github.com/Ancairon))
  • Fix nightly tag in helm deploy meta ([#16326](https://github.com/netdata/netdata/pull/16326), [@ilyam8](https://github.com/ilyam8))
  • Added section Blog posts to readme ([#16323](https://github.com/netdata/netdata/pull/16323), [@Aliki92](https://github.com/Aliki92))
  • Add a note for the docker deployment alongside with cetus ([#16312](https://github.com/netdata/netdata/pull/16312), [@tkatsoulas](https://github.com/tkatsoulas))
  • Fix missing privileges and mounts in Docker Swarm deploy doc ([#16308](https://github.com/netdata/netdata/pull/16308), [@ilyam8](https://github.com/ilyam8))
  • Use proper icons for deploy integrations ([#16305](https://github.com/netdata/netdata/pull/16305), [@Ancairon](https://github.com/Ancairon))
  • Change True/False to yes/no in collectors docs ([#16289](https://github.com/netdata/netdata/pull/16289), [@ilyam8](https://github.com/ilyam8))
  • Fix 404s in markdown files ([#16285](https://github.com/netdata/netdata/pull/16285), [@Ancairon](https://github.com/Ancairon))
  • Add Forward Secure Sealing in Systemd-Journal doc ([#16247](https://github.com/netdata/netdata/pull/16247), [@ktsaou](https://github.com/ktsaou))
  • Fix apps plugin metric names in meta ([#16243](https://github.com/netdata/netdata/pull/16243), [@ilyam8](https://github.com/ilyam8))
  • Add active/passive journald centralization without encryption guide ([#16236](https://github.com/netdata/netdata/pull/16236), [@tkatsoulas](https://github.com/tkatsoulas))
  • Add document outlining our versioning policy and public API ([#16227](https://github.com/netdata/netdata/pull/16227), [@Ferroin](https://github.com/Ferroin))
  • Cleanup journald centralization guide ([#16225](https://github.com/netdata/netdata/pull/16225), [@Ancairon](https://github.com/Ancairon))
  • Update info about custom dashboards ([#16121](https://github.com/netdata/netdata/pull/16121), [@elizabyte8](https://github.com/elizabyte8))
  • Add info to native packages docs about mirroring our repos ([#16069](https://github.com/netdata/netdata/pull/16069), [@Ferroin](https://github.com/Ferroin))
  • Add getting started with netdata Cloud on-prem doc ([#15954](https://github.com/netdata/netdata/pull/15954), [@M4itee](https://github.com/M4itee))

Other Notable Changes

Improvements

  • Implement removal of unregistered ephemeral host chart labels ([#16486](https://github.com/netdata/netdata/pull/16486), [@stelfrag](https://github.com/stelfrag))
  • Implement ephemeral hosts cleanup ([#16381](https://github.com/netdata/netdata/pull/16381), [@stelfrag](https://github.com/stelfrag))
  • Implement structured logging and log to Systemd journal by default ([#16357](https://github.com/netdata/netdata/pull/16357), [@ktsaou](https://github.com/ktsaou))
  • Made Streaming function available to all users ([#16346](https://github.com/netdata/netdata/pull/16346), [@ktsaou](https://github.com/ktsaou))
  • Improve database corruption detection during runtime ([#16343](https://github.com/netdata/netdata/pull/16343), [@stelfrag](https://github.com/stelfrag))
  • Add Brotli support for streaming ([#16287](https://github.com/netdata/netdata/pull/16287), [@ktsaou](https://github.com/ktsaou))
  • Add ZSTD compression support for streaming ([#16268](https://github.com/netdata/netdata/pull/16268), [@ktsaou](https://github.com/ktsaou))
  • Add script to generate self-signed-certificates for systemd-journal-remote ([#16235](https://github.com/netdata/netdata/pull/16235), [@ktsaou](https://github.com/ktsaou))
  • Optimizations to make busy parents more efficient ([#16127](https://github.com/netdata/netdata/pull/16127), [@ktsaou](https://github.com/ktsaou))
  • Add support for gorilla pages for tier 0 ([#15969](https://github.com/netdata/netdata/pull/15969), [@vkalintiris](https://github.com/vkalintiris))

Bug Fixes

  • Fix occasional shutdown deadlock ([#16495](https://github.com/netdata/netdata/pull/16495), [@stelfrag](https://github.com/stelfrag))
  • Fix reusing TCP port when binding ([#16420](https://github.com/netdata/netdata/pull/16420), [@ilyam8](https://github.com/ilyam8))
  • Fix counting spaces as CPU cores when reading `cpuset.cpus` ([#16385](https://github.com/netdata/netdata/pull/16385), [@ilyam8](https://github.com/ilyam8))
  • Improve agent to cloud status update process ([#16342](https://github.com/netdata/netdata/pull/16342), [@stelfrag](https://github.com/stelfrag))

Other

  • Improve logging clarity: change log level to debug for dbengine routine operations on start ([#16518](https://github.com/netdata/netdata/pull/16518), [@ilyam8](https://github.com/ilyam8))
  • Improve logging clarity: remove logging of system info ([#16517](https://github.com/netdata/netdata/pull/16517), [@ilyam8](https://github.com/ilyam8))
  • Add prefix to logs-management chart names ([#16514](https://github.com/netdata/netdata/pull/16514), [@Dim-P](https://github.com/Dim-P))
  • Fix "has aggregated" debug output in apps.plugin ([#16512](https://github.com/netdata/netdata/pull/16512), [@ilyam8](https://github.com/ilyam8))
  • Various improvements to log2journal part 4 ([#16510](https://github.com/netdata/netdata/pull/16510), [@ktsaou](https://github.com/ktsaou))
  • Improve logging clarity: reclassify specific messages as information part2 ([#16508](https://github.com/netdata/netdata/pull/16508), [@ilyam8](https://github.com/ilyam8))
  • Fix coverity issue 410232 ([#16507](https://github.com/netdata/netdata/pull/16507), [@stelfrag](https://github.com/stelfrag))
  • Improve logging clarity: reclassify specific messages as information part1 ([#16505](https://github.com/netdata/netdata/pull/16505), [@ilyam8](https://github.com/ilyam8))
  • Fix CID 410152 dereference after null check ([#16502](https://github.com/netdata/netdata/pull/16502), [@stelfrag](https://github.com/stelfrag))
  • Various improvements to log2journal part 2 ([#16494](https://github.com/netdata/netdata/pull/16494), [@ktsaou](https://github.com/ktsaou))
  • Fix builds on macOS due to missing endianness functions ([#16489](https://github.com/netdata/netdata/pull/16489), [@vkalintiris](https://github.com/vkalintiris))
  • Add missing yaml elements to log2journal ([#16488](https://github.com/netdata/netdata/pull/16488), [@ktsaou](https://github.com/ktsaou))
  • Add option to submit logs to systemd journal to logs-management ([#16485](https://github.com/netdata/netdata/pull/16485), [@Dim-P](https://github.com/Dim-P))
  • Add function cancellability to logs-management ([#16484](https://github.com/netdata/netdata/pull/16484), [@Dim-P](https://github.com/Dim-P))
  • Move log2journal to collectors/ ([#16481](https://github.com/netdata/netdata/pull/16481), [@ktsaou](https://github.com/ktsaou))
  • Add YAML configuration support to log2journal ([#16479](https://github.com/netdata/netdata/pull/16479), [@ktsaou](https://github.com/ktsaou))
  • Log alarm notifications to health.log ([#16476](https://github.com/netdata/netdata/pull/16476), [@ktsaou](https://github.com/ktsaou))
  • Cleanup systemd-journald plugin code ([#16475](https://github.com/netdata/netdata/pull/16475), [@ktsaou](https://github.com/ktsaou))
  • Check context post processing queue before sending status to Cloud ([#16472](https://github.com/netdata/netdata/pull/16472), [@stelfrag](https://github.com/stelfrag))
  • Fix error limit to respect the log every ([#16469](https://github.com/netdata/netdata/pull/16469), [@stelfrag](https://github.com/stelfrag))
  • Fix analytics logs ([#16462](https://github.com/netdata/netdata/pull/16462), [@ktsaou](https://github.com/ktsaou))
  • Fix logs bashism ([#16461](https://github.com/netdata/netdata/pull/16461), [@ktsaou](https://github.com/ktsaou))
  • Fix log2journal incorrect log ([#16460](https://github.com/netdata/netdata/pull/16460), [@ktsaou](https://github.com/ktsaou))
  • Fix various logging environment variables issues ([#16459](https://github.com/netdata/netdata/pull/16459), [@ktsaou](https://github.com/ktsaou))
  • Add environment variable for the Journald socket path ([#16457](https://github.com/netdata/netdata/pull/16457), [@ktsaou](https://github.com/ktsaou))
  • Fix missing argument on log init in xenstat plugin ([#16451](https://github.com/netdata/netdata/pull/16451), [@vkalintiris](https://github.com/vkalintiris))
  • Change log flood protection to 1000 log lines / 1 minute ([#16450](https://github.com/netdata/netdata/pull/16450), [@ilyam8](https://github.com/ilyam8))
  • Change the interface_speed alarm value to Mbit ([#16429](https://github.com/netdata/netdata/pull/16429), [@ilyam8](https://github.com/ilyam8))
  • Improve logging clarity: removing logging errors from reading filtered alerts ([#16417](https://github.com/netdata/netdata/pull/16417), [@MrZammler](https://github.com/MrZammler))
  • Bring back chart id to `title` in /api/v1/charts ([#16416](https://github.com/netdata/netdata/pull/16416), [@ilyam8](https://github.com/ilyam8))
  • Fix an issue where reused connections were counted as new ([#16414](https://github.com/netdata/netdata/pull/16414), [@ilyam8](https://github.com/ilyam8))
  • Remove queue limit from ACLK sync event loop ([#16411](https://github.com/netdata/netdata/pull/16411), [@stelfrag](https://github.com/stelfrag))
  • Implement using /host/etc/hostname in Docker ([#16401](https://github.com/netdata/netdata/pull/16401), [@ilyam8](https://github.com/ilyam8))
  • Fix v0 dashboard ([#16389](https://github.com/netdata/netdata/pull/16389), [@ilyam8](https://github.com/ilyam8))
  • Use pre-configured message_ids to identify common logs ([#16383](https://github.com/netdata/netdata/pull/16383), [@ktsaou](https://github.com/ktsaou))
  • Switch alarm_log to use the buffer json functions ([#16360](https://github.com/netdata/netdata/pull/16360), [@stelfrag](https://github.com/stelfrag))
  • Switch to using buffer json functions in charts/chart endpoints ([#16359](https://github.com/netdata/netdata/pull/16359), [@stelfrag](https://github.com/stelfrag))
  • Replace rrdset_is_obsolete & rrdset_isnot_obsolete ([#16351](https://github.com/netdata/netdata/pull/16351), [@MrZammler](https://github.com/MrZammler))
  • Add rrddim_get_last_stored_value to simplify function code in internal collectors ([#16348](https://github.com/netdata/netdata/pull/16348), [@ilyam8](https://github.com/ilyam8))
  • Add api/v2 support to h2o ([#16340](https://github.com/netdata/netdata/pull/16340), [@underhood](https://github.com/underhood))
  • Improve unittests ([#16329](https://github.com/netdata/netdata/pull/16329), [@stelfrag](https://github.com/stelfrag))
  • Fix coverity warnings in cgroups ([#16328](https://github.com/netdata/netdata/pull/16328), [@ilyam8](https://github.com/ilyam8))
  • Rename newly added functions ([#16325](https://github.com/netdata/netdata/pull/16325), [@ktsaou](https://github.com/ktsaou))
  • Improve performance of alarm log queries by keeping precompiled statements ([#16321](https://github.com/netdata/netdata/pull/16321), [@stelfrag](https://github.com/stelfrag))
  • Fix journal file index when collision is detected ([#16319](https://github.com/netdata/netdata/pull/16319), [@stelfrag](https://github.com/stelfrag))
  • Optimize database before agent shutdown ([#16317](https://github.com/netdata/netdata/pull/16317), [@stelfrag](https://github.com/stelfrag))
  • Improve shutdown when collectors are active ([#16315](https://github.com/netdata/netdata/pull/16315), [@stelfrag](https://github.com/stelfrag))
  • Fix using absolute path for echo in netdata-claim.sh ([#16300](https://github.com/netdata/netdata/pull/16300), [@ilyam8](https://github.com/ilyam8))
  • Fix various issues identified by coverity ([#16294](https://github.com/netdata/netdata/pull/16294), [@ktsaou](https://github.com/ktsaou))
  • Fix renames in freebsd ([#16292](https://github.com/netdata/netdata/pull/16292), [@ktsaou](https://github.com/ktsaou))
  • Fix retention loading ([#16290](https://github.com/netdata/netdata/pull/16290), [@ktsaou](https://github.com/ktsaou))
  • Add cmd args for reading specific files to local_listeners ([#16273](https://github.com/netdata/netdata/pull/16273), [@ilyam8](https://github.com/ilyam8))
  • Fix the issue of streaming disconnection when sending REPORT_JOB_STATUS ([#16272](https://github.com/netdata/netdata/pull/16272), [@underhood](https://github.com/underhood))
  • Fix sources match in systemd-journal plugin ([#16271](https://github.com/netdata/netdata/pull/16271), [@ktsaou](https://github.com/ktsaou))
  • Fix coverity issue 403725 ([#16265](https://github.com/netdata/netdata/pull/16265), [@stelfrag](https://github.com/stelfrag))
  • Improve dimension ML model load ([#16262](https://github.com/netdata/netdata/pull/16262), [@stelfrag](https://github.com/stelfrag))
  • Improvements to Dyncfg ([#16250](https://github.com/netdata/netdata/pull/16250), [@ktsaou](https://github.com/ktsaou))
  • Drop an unused index from aclk_alert table ([#16242](https://github.com/netdata/netdata/pull/16242), [@stelfrag](https://github.com/stelfrag))
  • Add DYNCFG_RESET command to Dynamic configuration ([#16241](https://github.com/netdata/netdata/pull/16241), [@underhood](https://github.com/underhood))
  • Reuse ML load prepared statement ([#16240](https://github.com/netdata/netdata/pull/16240), [@stelfrag](https://github.com/stelfrag))
  • Fix meta unittest ([#16221](https://github.com/netdata/netdata/pull/16221), [@stelfrag](https://github.com/stelfrag))
  • Minimize hashtable collisions in facets ([#16215](https://github.com/netdata/netdata/pull/16215), [@ktsaou](https://github.com/ktsaou))
  • Improve context load on startup ([#16203](https://github.com/netdata/netdata/pull/16203), [@stelfrag](https://github.com/stelfrag))
  • Add support for h2o evloop netdata stream ([#14868](https://github.com/netdata/netdata/pull/14868), [@underhood](https://github.com/underhood))
  • Add Logs Management ([#13291](https://github.com/netdata/netdata/pull/13291), [@Dim-P](https://github.com/Dim-P))

Deprecation notice

Changed in this release

In accordance with our previous deprecation notice, the following items in this release have been changed:

Other unannounced changes:

  • Netdata internal metrics (Netdata Monitoring section) are disabled by default to reduce the overall data volume. Later we plan to enable only important internal metrics by default.

These metrics can be re-enabled in netdata.conf by uncommenting the following options and changing no to yes:

```ini
[plugins]
# netdata monitoring = no
# netdata monitoring extended = no
```

  • Logging
    • Logs format changed to logfmt.
    • Default logging destination changed to systemd-journal (systemd-only): logs are now sent to the "netdata" namespace in systemd-journal. Systemd-journal provides a centralized repository for all system logs, making it easier to manage and search for logs. To override the default behavior and continue using the file-based logging, refer to the netdata.conf file and make the necessary changes under the [logs] section.
    • File-based logging: error.log renamed to daemon.log.
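
For readers with scripts that used to scrape the old log files, the logfmt change means each log line is now a sequence of space-separated `key=value` pairs, with values containing spaces wrapped in double quotes. The sketch below shows one way to turn such a line into a dictionary. The function name `parse_logfmt` and the sample line with its field names are hypothetical, chosen for illustration only; they are not taken from Netdata's actual output. It leans on `shlex.split` for quote handling, so lines containing stray apostrophes or backslashes would need a dedicated logfmt parser.

```python
import shlex


def parse_logfmt(line):
    """Parse one logfmt line (space-separated key=value pairs) into a dict."""
    fields = {}
    # shlex.split honours double quotes, so msg="two words" stays one token.
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        # A bare key without "=" is kept with an empty string as its value.
        fields[key] = value
    return fields


# Hypothetical sample line; the field names are illustrative only.
sample = 'time=2023-12-06T18:15:00Z level=info comm=netdata msg="agent started"'
record = parse_logfmt(sample)
print(record["level"], "|", record["msg"])
```

Because every field is named, filtering on a single key such as `level` no longer depends on column positions in the line.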

Will be changed in the next release

  • To ensure seamless compatibility with future updates, we recommend transitioning from source-built installations to our distribution packages or static binaries. Starting with our next release, we will no longer guarantee compatibility when updating source-built installations. This change allows us to focus on enhancing the stability and feature delivery for the rest of our supported installation methods.

  • Gorilla compression will be enabled by default.

  • The Google Cloud Pub/Sub and AWS Kinesis exporters will be removed in the next release. Neither was maintained or used when building packages. Users can consult the exporting documentation for alternative exporters to use.

  • The database modes map and save will be removed in the next release. The dbengine database mode will be used to persist metrics on disk automatically.

  • Per-core CPU metrics will be disabled by default to reduce data volume. Summary (per-system) metrics are still collected. This change enhances performance and resource utilization. Disabled metrics:
    • cpu.cpu (utilization).
    • cpu.interrupts (all interrupts).
    • cpu.softirqs (software interrupts).
    • cpu.softnet_stat (software interrupts related to network receive work).
    • cpu.cpu_cstate_residency_time (idle states).

These metrics can be re-enabled in netdata.conf by uncommenting the following options and changing no to yes:

```ini
[plugin:proc:/proc/stat]
# per cpu core utilization = no
# cpu idle states = no

[plugin:proc:/proc/interrupts]
# interrupts per core = no

[plugin:proc:/proc/softirqs]
# interrupts per core = no

[plugin:proc:/proc/net/softnet_stat]
# softnet_stat per core = no
```
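
When many nodes need the same change, the "uncomment and change no to yes" edit can be scripted. The following is a minimal sketch that assumes the configuration text has already been read into a string. The helper `enable_options` is hypothetical and not part of Netdata. It tracks the current section header because `interrupts per core` appears under two different sections, so matching on the option name alone would be ambiguous.

```python
import re


def enable_options(conf_text, wanted):
    """Uncomment `# key = no` lines and set them to yes.

    `wanted` maps a section name to the set of option names to enable.
    Options that are not listed are left exactly as they were.
    """
    out = []
    section = None
    for line in conf_text.splitlines():
        header = re.match(r"\s*\[(.+)\]\s*$", line)
        if header:
            section = header.group(1)
            out.append(line)
            continue
        # Matches "key = no" with or without a leading "#".
        option = re.match(r"(\s*)#?\s*([^=#]+?)\s*=\s*no\s*$", line)
        if option and option.group(2) in wanted.get(section, set()):
            out.append(f"{option.group(1)}{option.group(2)} = yes")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Sample text copied from the defaults listed in these notes.
conf = """\
[plugin:proc:/proc/stat]
# per cpu core utilization = no
# cpu idle states = no

[plugin:proc:/proc/interrupts]
# interrupts per core = no

[plugin:proc:/proc/softirqs]
# interrupts per core = no
"""

updated = enable_options(conf, {
    "plugin:proc:/proc/stat": {"per cpu core utilization"},
    "plugin:proc:/proc/interrupts": {"interrupts per core"},
})
print(updated)
```

In this run only per-core utilization and per-core hardware interrupts are switched on; the idle-state and softirq options stay commented out. The sketch works on text only and leaves reading and writing the actual netdata.conf to the caller.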

  • To optimize system performance, several eBPF.plugin modules have been disabled by default. While these modules provide valuable insights into system resource usage, they can also contribute to system overhead. They will expose metrics using Functions (run on demand and for a limited period of time). These modules include:
    • cachestat
    • fd
    • process
    • oomkill
    • shm
    • swap

Netdata Release Meetup

Join the Netdata team on the 11th of December at 16:30 UTC for the Netdata Release Meetup.

Together we’ll cover:
* Release Highlights.
* Acknowledgments.
* Q&A with the community.

RSVP now - we look forward to meeting you.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1800 engineers are already using it!

Details

  • Date: Dec. 6, 2023, 6:15 p.m.
  • Name: v1.44.0
  • Type: Minor