Monitoring and managing the impact of query performance on Osquery

Performance testing and benchmarking Osquery queries

The Osquery watchdog

Scheduled query may have failed: <query_name>

Osquery watchdog command line options

  • --disable_watchdog=false
    If true, the watchdog process is disabled and running workers will not be restarted. If false and any performance limit defined below is violated, the “worker” process will be restarted.
  • --watchdog_level=0
    The watchdog supervisor can be run in one of three modes. These modes are used to configure the performance limits:
  • Normal --watchdog_level=0 (default)
    The default performance limits are 200MB memory cap and 25% CPU usage for 9 seconds. The default mode allows for 10 restarts of the worker if the limits are violated.
  • Restrictive --watchdog_level=1
    The restrictive profile allows a 100MB memory cap and 18% CPU usage for 9 seconds. The restrictive mode allows for only 4 restarts before the service is disabled.
  • Disabled --watchdog_level=-1
    It is better to set the watchdog level to disabled than to disable the watchdog outright, because the worker/watcher concept is used for extensions too.
  • Watchdog memory limits --watchdog_memory_limit=0
    The memory limit is expressed as a value in MB. It is recommended to allocate more than 200MB, but less than 1GB. Zercurity uses 400MB (--watchdog_memory_limit=400), which we’ve found to be a good upper limit.
  • Watchdog CPU limits --watchdog_utilization_limit=0
    The utilization limit value is the maximum number of CPU cycles counted against the process. The default is 90, meaning less than 90 seconds of CPU time per 3 seconds of wall time is allowed.
  • Watchdog delay --watchdog_delay=60
    This is the delay in seconds before the watchdog process starts enforcing memory and CPU utilization limits. The default is 60 seconds. This allows the daemon to perform resource-intensive actions, such as forwarding logs, at startup.
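The options above are plain `--flag=value` pairs, so they can be collected in an Osquery flagfile with one flag per line. Below is a minimal Python sketch of that idea. The helper name `build_watchdog_flags` is hypothetical, it only formats strings, and the 400MB value is simply the figure mentioned above used as an illustration.

```python
def build_watchdog_flags(level=0, memory_limit_mb=0, utilization_limit=0, delay=60):
    """Return flagfile lines for the watchdog options described above.

    Hypothetical helper for illustration: it only formats strings and
    never talks to Osquery. The two overrides are omitted when left at 0.
    """
    if level not in (-1, 0, 1):
        raise ValueError("watchdog_level must be -1, 0 or 1")
    flags = [
        "--disable_watchdog=false",
        f"--watchdog_level={level}",
        f"--watchdog_delay={delay}",
    ]
    if memory_limit_mb:
        flags.append(f"--watchdog_memory_limit={memory_limit_mb}")
    if utilization_limit:
        flags.append(f"--watchdog_utilization_limit={utilization_limit}")
    return flags


# Normal level with a 400MB memory override.
flag_lines = build_watchdog_flags(level=0, memory_limit_mb=400)
print("\n".join(flag_lines))
```

The printed lines could be saved to a file and handed to osqueryd through its `--flagfile` option.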

Osquery profiler: profile.py

pip3 install psutil
git clone git@github.com:osquery/osquery.git
{
  "schedule": {
    "proc1": {
      "query": "SELECT * FROM processes;",
      "interval": 60
    },
    "proc2": {
      "query": "SELECT * FROM processes WHERE pid > 1000;",
      "interval": 60
    }
  }
}
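If the schedule grows beyond a couple of queries, the config above can be generated rather than edited by hand. The sketch below assumes a hypothetical helper, `render_config`, which checks that every entry has the `query` and `interval` keys used above and returns the JSON text. It is an illustration, not part of Osquery.

```python
import json


def render_config(schedule):
    """Return Osquery config JSON for a dict of scheduled queries.

    Hypothetical helper: validates only the two keys used in this article.
    """
    for name, entry in schedule.items():
        missing = {"query", "interval"} - set(entry)
        if missing:
            raise ValueError(f"{name} is missing {sorted(missing)}")
    return json.dumps({"schedule": schedule}, indent=2)


# The same two scheduled queries as in the config above.
schedule = {
    "proc1": {"query": "SELECT * FROM processes;", "interval": 60},
    "proc2": {"query": "SELECT * FROM processes WHERE pid > 1000;", "interval": 60},
}

config_text = render_config(schedule)
print(config_text)
```

Redirecting the output to test.conf gives a file that can be passed to the profiler with --config.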
./tools/analysis/profile.py --shell `which osqueryi` --config test.conf
Profiling query: SELECT * FROM processes;
U:2 C:1 M:3 F:0 D:2 proc1 (1/1): utilization: 43.95 cpu_time: 0.44 memory: 26013696 fds: 4 duration: 1.0
Profiling query: SELECT * FROM processes WHERE pid > 1000;
U:2 C:1 M:3 F:0 D:2 proc2 (1/1): utilization: 43.95 cpu_time: 0.44 memory: 26013696 fds: 4 duration: 1.0
./tools/analysis/profile.py --shell `which osqueryi` --query "SELECT * FROM processes;" --rounds 3 --count 10
Profiling query: SELECT * FROM processes;
U:3 C:2 M:3 F:0 D:3 manual (1/3): utilization: 78.66 cpu_time: 2.37 memory: 26435584 fds: 4 duration: 3.0
U:3 C:2 M:3 F:0 D:3 manual (2/3): utilization: 84.87 cpu_time: 3.41 memory: 26210304 fds: 4 duration: 4.0
U:3 C:2 M:3 F:0 D:2 manual (3/3): utilization: 75.25 cpu_time: 2.27 memory: 25980928 fds: 4 duration: 2.5
U:3 C:2 M:3 F:0 D:3 manual avg: utilization: 79.59 cpu_time: 2.6 memory: 26208938 fds: 4.0 duration: 3.2
Osquery profiler results.
  • Utilization U:3
    CPU usage as a percentage. This value can be greater than 100% for processes with threads running across different CPUs.
  • CPU time C:2
    Shows the total CPU time, made up of user, system, children_user and children_system time.
  • Memory M:26208938
    Shows the total memory used in bytes. In the averaged example above, 26208938 bytes is roughly 26MB.
  • File descriptors (FDS) F:4
    Shows the number of file descriptors used by the osqueryi process during query execution.
  • Duration D:3.2
    The number of seconds elapsed whilst running the query.
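When comparing many queries it can help to turn a result line into numbers. The following is a minimal Python sketch that assumes the output format shown above. The function name `parse_profile_line` and the extra `memory_mb` key are hypothetical and not part of profile.py.

```python
import re

# Matches "key: value" pairs such as "memory: 26208938" in a result line.
METRIC_PATTERN = re.compile(r"(utilization|cpu_time|memory|fds|duration): ([0-9.]+)")


def parse_profile_line(line):
    """Extract the numeric metrics from one profile.py result line."""
    metrics = {name: float(value) for name, value in METRIC_PATTERN.findall(line)}
    # Memory is reported in bytes; add a megabyte figure for readability.
    metrics["memory_mb"] = round(metrics["memory"] / 1_000_000, 2)
    return metrics


# The averaged line from the example run above.
sample = (
    "U:3 C:2 M:3 F:0 D:3 manual avg: utilization: 79.59 cpu_time: 2.6 "
    "memory: 26208938 fds: 4.0 duration: 3.2"
)
result = parse_profile_line(sample)
print(result)
```

The parsed memory figure can then be compared with whichever watchdog memory cap applies to your deployment.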

Osquery osquery_schedule

SELECT * FROM osquery_schedule WHERE last_executed > 0;
Querying Osquery’s osquery_schedule table to fetch performance information related to our scheduled queries.
  • name
    This is the name of the scheduled query (as defined in your config). Zercurity will always provide the query UUID as its name, but the query field will show you the actual query.
  • query
    The exact query which was run e.g. SELECT * FROM users;
  • interval
    The interval, in seconds, at which the query is set to run. Please note that the Osquery scheduler does not guarantee this interval.
  • executions
    This is the number of times the query was executed.
  • last_executed
    UNIX timestamp in seconds of the last completed execution.
  • denylisted
    If the query keeps hitting the limits imposed by the Osquery watchdog, it will be prevented from running in the future so that the user’s machine isn’t impacted. The value is 1 if the query has been denylisted, or 0 if the query is still able to run.
  • output_size
    Total number of bytes generated by the query.
  • wall_time
    Total wall time spent executing. This is the elapsed time, including time spent waiting for its turn on the CPU. Note that this is the cumulative total across all executions, not the time of the last execution.
  • user_time
    Total time spent executing in user land. Note that this is the cumulative total across all executions, not the time of the last execution.
  • system_time
    Total time spent executing in the kernel. Note that this is the cumulative total across all executions, not the time of the last execution.
  • average_memory
    This is the average private memory (resident_size) left after executing, i.e. the total divided by the number of executions.
SELECT name, query, interval, executions, last_executed, denylisted, output_size,
  IFNULL(system_time / executions, 0) AS average_system_time,
  IFNULL(user_time / executions, 0) AS average_user_time,
  IFNULL(wall_time / executions, 0) AS average_wall_time,
  ROUND(average_memory / 1000000.0, 2) AS average_memory_mb
FROM osquery_schedule;
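Osquery evaluates its SQL with SQLite, so the per-execution averages can be tried out offline. The sketch below uses Python's sqlite3 module with a stand-in `osquery_schedule` table holding made-up sample values and a reduced version of the query. It is only meant to show why IFNULL is there: in SQLite, dividing by zero executions yields NULL, and dividing one integer by another truncates the result.

```python
import sqlite3

# Stand-in table for illustration only; real rows come from osqueryi/osqueryd.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE osquery_schedule (name TEXT, executions INTEGER, "
    "wall_time INTEGER, average_memory INTEGER)"
)
conn.executemany(
    "INSERT INTO osquery_schedule VALUES (?, ?, ?, ?)",
    [
        ("proc1", 10, 30, 26208938),  # made-up sample values
        ("never_ran", 0, 0, 0),       # zero executions: the division yields NULL
    ],
)

rows = conn.execute(
    "SELECT name, "
    "IFNULL(wall_time / executions, 0) AS average_wall_time, "
    "ROUND(average_memory / 1000000.0, 2) AS average_memory_mb "
    "FROM osquery_schedule ORDER BY name DESC"
).fetchall()

for row in rows:
    print(row)
```

Without the IFNULL wrapper the second row would report NULL for its average wall time instead of 0.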

Optimizing queries

Table delay

File Hashing

It's all over!
