Managing compliance with Osquery for local & remote workers

11 min read · Apr 29, 2020

Remote working is obviously quite topical at the moment. Following our earlier update on the introduction of workflows into the Zercurity platform, I wanted to run through some examples of how Osquery can be used to automate the monitoring of your fleet’s hygiene and notify users directly to remedy issues, taking the pressure off your IT teams.

Zercurity’s workflow engine, which allows notifications to be sent to users if the compliance state of their system changes.

Background

It’s incredibly important to continually monitor the compliance state of systems deployed within your environment. It helps you gauge changes in user behavior and whether policies need to change to better reflect your users’ current habits. Otherwise, as is human nature, those controls sometimes end up being circumvented.

Zercurity makes use of an open source tool called Osquery, a fantastic tool that lets you remotely query your infrastructure with SQL, just as you would a database.

One of the key use cases for Osquery is monitoring & compliance.

Compliance

We’ve gone to great lengths to build thousands of checks using Osquery, based on many common cybersecurity best practices and frameworks (CIS, Cyber Essentials, etc.), which are useful. However, in the majority of cases you just want to check that basic compliance policies are being adhered to. For example:

Is disk encryption turned on?
Is the firewall enabled?
Are updates enabled and being kept up-to-date?
Is anti-virus enabled & definitions up-to-date?

These are all things that, on the face of it, you’d assume are working based on system policies. However, some users, such as developers, need the flexibility to turn these services on and off. Services may also have been intentionally or unintentionally disabled, or may simply have failed to start. It’s important to keep tabs on these kinds of checks, especially if you’re trying to adhere to a compliance framework.

Osquery

Osquery is great for running quick checks against the scenarios listed above.

The examples below use Osquery version 4.2.0. Some of the tables queried may not exist in earlier versions, or may have been removed or deprecated in later ones.

In the examples below you’ll notice that our statements try to return either a 1 or a 0, indicating a pass or a failure, by using the COUNT(*) function. You can change the queries to suit your needs.

Disk encryption

Osquery provides two tables: bitlocker_info for Windows and disk_encryption for Mac and Linux. For Mac and Linux we first need to get the name of the device mounted at our filesystem root, /. Once we have that, we can check the encryption status of that drive. For Windows, we can simply check the BitLocker status of drive C:.

-- macOS & Linux
SELECT COUNT(*) AS passed FROM disk_encryption
WHERE encrypted = 1
AND name IN (
  SELECT device FROM mounts WHERE path = '/'
);

-- Windows
SELECT COUNT(*) AS passed FROM bitlocker_info
WHERE protection_status = 1
AND conversion_status = 1
AND drive_letter = 'C:';
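To see how the Mac and Linux check behaves, here is a minimal sketch that runs the same shape of query against in-memory SQLite stand-ins for the mounts and disk_encryption tables. The device names and rows are invented for illustration, and only the columns the check needs are modelled. It filters on the encrypted column so that an unencrypted root volume counts as a failure.

```python
import sqlite3

# In-memory stand-ins for osquery's `mounts` and `disk_encryption` tables.
# Only the columns used by the check are modelled; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mounts (device TEXT, path TEXT);
CREATE TABLE disk_encryption (name TEXT, encrypted INTEGER);
INSERT INTO mounts VALUES ('/dev/disk1s1', '/'), ('/dev/disk2s1', '/Volumes/USB');
INSERT INTO disk_encryption VALUES ('/dev/disk1s1', 1), ('/dev/disk2s1', 0);
""")

ROOT_ENCRYPTED = """
SELECT COUNT(*) AS passed FROM disk_encryption
WHERE encrypted = 1
AND name IN (SELECT device FROM mounts WHERE path = '/');
"""


def root_encrypted(db):
    """Return 1 if the device mounted at / is encrypted, otherwise 0."""
    return db.execute(ROOT_ENCRYPTED).fetchone()[0]


passed_before = root_encrypted(conn)

# Simulate the root volume losing its encryption and run the check again.
conn.execute("UPDATE disk_encryption SET encrypted = 0 WHERE name = '/dev/disk1s1'")
passed_after = root_encrypted(conn)

print(passed_before, passed_after)
```

The unencrypted USB volume never affects the result, because the subquery restricts the check to whichever device is mounted at the root.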

Firewalls

Osquery helpfully provides the alf table for macOS to quickly check the firewall state, including whether the host is in stealth mode. For Linux we’re using the augeas table. Whilst the Augeas parser supports many configuration file formats, the Debian ufw.conf isn’t one of them, though it’s very easy to add support for additional file types. For Windows we’re able to run a simple registry query to check the firewall state.

-- macOS
SELECT COUNT(*) AS passed FROM alf
WHERE global_state = 1;

-- Linux
-- NOTE: We use custom Augeas lenses in order to parse the ufw.conf
--       file on Debian-based systems.
SELECT COUNT(*) AS passed FROM augeas
WHERE path = '/etc/ufw/ufw.conf'
AND label = 'ENABLED'
AND value = 'yes';

-- Windows
SELECT COUNT(*) AS passed FROM registry
WHERE key = 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SharedAccess\Parameters\FirewallPolicy\StandardProfile'
AND name = 'EnableFirewall'
AND data = '1';

System updates

Checking whether the system is fully up to date is a little trickier using Osquery alone, and is something we’ve covered in depth in earlier posts. However, we can at least check that auto-update is enabled. For macOS we can parse the com.apple.SoftwareUpdate.plist file to ensure AutomaticCheckEnabled is enabled, i.e. 1. For Linux we just need to check for the existence of the 50unattended-upgrades file on Debian. You could also write a custom parser for Augeas to check that the file has been configured correctly. Lastly, for Windows we run a registry check against the AU (Automatic Updates) key to ensure NoAutoUpdate is set to 0, meaning automatic updates have not been disabled. There are many other values within the AU registry key that can also be checked, depending on your corporate policy.

-- macOS
SELECT value AS passed FROM plist
WHERE path = '/Library/Preferences/com.apple.SoftwareUpdate.plist'
AND key = 'AutomaticCheckEnabled';

-- Linux
-- This will only check for the existence of unattended upgrades but
-- will not verify the configuration
SELECT COUNT(*) AS passed FROM file
WHERE path = '/etc/apt/apt.conf.d/50unattended-upgrades';

-- Windows
SELECT COUNT(*) AS passed FROM registry
WHERE key = 'HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU'
AND name = 'NoAutoUpdate' AND data = '0';

Anti-virus definitions

Anti-virus is a little more difficult, as it depends on your AV vendor and what API access, if any, they provide. Generally, though, file checks or hashes will work, and on Windows registry checks will usually let you query this information. The result may then need to be checked against the vendor, who will typically publish the latest definitions revision number for you to compare against.

This is where the need for post-processing generally comes into play: taking the result of a given query and acting on it, in some cases even chaining these actions together to create workflows.
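As one illustration of such post-processing, the definitions comparison described above could be sketched as follows. The dotted revision format and the example values are assumptions for illustration only. Real vendors publish revisions in differing formats, so the parsing would need adapting.

```python
def parse_revision(revision: str) -> tuple:
    """Turn a dotted revision such as '1.313.1687.0' into a tuple of ints."""
    return tuple(int(part) for part in revision.strip().split("."))


def definitions_current(installed: str, latest: str) -> bool:
    """Pass when the installed definitions are at least the published revision."""
    return parse_revision(installed) >= parse_revision(latest)


# Hypothetical values: `installed` would come from an Osquery result and
# `latest` from whatever the vendor publishes.
print(definitions_current("1.313.99.0", "1.313.1687.0"))
print(definitions_current("1.313.1687.0", "1.313.1687.0"))
```

Comparing the revisions as tuples of integers rather than as strings matters here: as plain strings, "1.313.99.0" would sort after "1.313.1687.0" and an out-of-date host would appear compliant.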

Workflows

Our workflow platform lets you design and run your applications without the need for code. You can subscribe to events generated by Zercurity or run workflows at scheduled intervals. You can even perform tasks based on events from external sources. We’re trying to reduce the complexity of transforming data between services, which has become common within cybersecurity.

Workflows are made up of a series of steps, with the output of one step acting as input into the next. Development is easier and more intuitive using Workflows than running and maintaining scripts. We translate your workflow into a state machine diagram that’s easy to understand, explain to others, and easy to change.

Our workflow engine tracks and instruments each step, even retrying when there are errors. You can build long-running workflows for machine learning jobs, report generation and IT automation. You can also build high-volume, short-duration workflows for data ingestion and streaming data processing.

Automating compliance

Given both the query examples above and the ability to craft workflows to automate tasks, we can combine these two tools to notify users if and when their system’s compliance state changes.

Osquery

In the query examples above we’ve used the COUNT(*) function to return either a 1 or a 0. That is, either the query has found a result matching the provided conditions and the check is passing, or no result has been found and the check is failing.

osquery> SELECT COUNT(*) AS passed FROM alf WHERE global_state = 1;
+--------+
| passed |
+--------+
| 1      |
+--------+
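If you want to consume this pass/fail convention from a script, a minimal sketch might look like the following. It assumes osqueryi is installed locally and relies on its --json flag, which prints results as a JSON array of rows with every value rendered as a string. The helper names are hypothetical.

```python
import json
import subprocess


def passed(osquery_json: str) -> bool:
    """Interpret the JSON output of a `... AS passed` query as pass or fail."""
    rows = json.loads(osquery_json)
    # No row at all is treated as a failure, as is a count of zero.
    return len(rows) == 1 and int(rows[0]["passed"]) >= 1


def run_check(sql: str) -> bool:
    """Run a query through a locally installed osqueryi (not called below)."""
    result = subprocess.run(
        ["osqueryi", "--json", sql],
        capture_output=True, text=True, check=True,
    )
    return passed(result.stdout)


# Sample outputs in the shape osqueryi --json produces.
print(passed('[{"passed": "1"}]'))
print(passed('[{"passed": "0"}]'))
```

Calling run_check("SELECT COUNT(*) AS passed FROM alf WHERE global_state = 1;") on a Mac would then return True or False for the firewall check shown above.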

If you’re using Zercurity, Osquery compliance checks can be created to run periodically and check the state of various policies. Whenever the state changes between passing and failing, Zercurity will fire an event. For asset compliance changes, it’s the ASSET_COMPLIANCE event, which in its JSON format looks like this:

{
  "uuid": "5c15f50e-432f-43fa-b169-40ea0b2957d1",
  "timestamp": "2020-03-04T14:18:14+00:00",
  "level": "INFO",
  "action": "ASSET_COMPLIANCE",
  "meta": null,
  "changes": {
    "passed": {
      "last": false,
      "latest": true
    }
  },
  "item": {
    "id": "45e0bba6-9775-4dc9-b280-7ab4d35c5e0b",
    "name": "FW-MAC Firewall check"
  },
  "company": {
    "uuid": "f50dfca1-c8b0-40ca-80cd-984d2d6dce10",
    "name": "Zercurity"
  },
  "user": null,
  "team": {
    "uuid": "53d27c4a-41f5-4d37-83ab-93532dbfeab6",
    "name": "Zercurity"
  },
  "asset": {
    "uuid": "4e8924a8-31ff-4895-8956-92879b8c617e",
    "name": "James (laptop)"
  }
}

This will be the input that’s dispatched to your Workflow. From the Zercurity interface, you can see all the latest compliance events. These are only triggered when a change in state has occurred.

From the events tab you can filter by Asset->Compliance to see all the latest compliance changes
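To make the shape of the event concrete, here is a small sketch that classifies an ASSET_COMPLIANCE event by the direction of its state change. The event is a trimmed copy of the example above, and the function name and return labels are hypothetical.

```python
import json

# A trimmed copy of the example event; unrelated fields are omitted.
EVENT = json.loads("""
{
  "action": "ASSET_COMPLIANCE",
  "changes": {"passed": {"last": false, "latest": true}},
  "item": {"id": "45e0bba6-9775-4dc9-b280-7ab4d35c5e0b",
           "name": "FW-MAC Firewall check"},
  "asset": {"uuid": "4e8924a8-31ff-4895-8956-92879b8c617e",
            "name": "James (laptop)"}
}
""")


def compliance_transition(event: dict) -> str:
    """Describe which way a compliance check changed state."""
    if event.get("action") != "ASSET_COMPLIANCE":
        return "ignored"
    change = event["changes"]["passed"]
    if change["last"] and not change["latest"]:
        return "now_failing"
    if not change["last"] and change["latest"]:
        return "now_passing"
    return "unchanged"


print(compliance_transition(EVENT))
```

In the example event, last is false and latest is true, so the firewall check on this asset has just started passing again.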

Building our workflow

I’m not going to go through each step in the process of creating a workflow to message a given user on Slack as there is a pre-built example that you can use from within the Workflow builder. However, I do just want to explain briefly how it all works.

Zercurity can integrate with several third-party services. In our example we’re using Google G Suite to automatically tie user accounts to assets, so that we know which system belongs to which user and can then notify that user via Slack.

From the Workflows menu, click Create workflow. This will give you a basic starting example, which you can either build on or replace with one of our many pre-built examples.

Triggers

Triggers are used to execute your workflow. There are two types of trigger: you can either subscribe to events generated by Zercurity, or schedule a workflow to be executed at a specific interval or given time. You may also execute workflows directly via our API.

For our example we’re making use of the ASSET_COMPLIANCE event, which is triggered every time a compliance check rule changes, either from passing to failing or vice versa.

States

The next stage is to build our workflow, which is composed of states. These states accept an input, transform the data and return an output, which is then either passed on to another state or returned as the final output.

When creating a new workflow your starting JSON schema will look something like this:

{
  "name": "Your workflow",
  "description": "Your new workflow",
  "entry": "start",
  "states": {
    "start": {
      "type": "PASS",
      "description": "Your entry function",
      "end": true
    }
  }
}

In the root of your document, you have both a name and a description of your workflow, followed by the entry key, which provides the name of the first state called when your workflow is executed.

The states key provides a JSON dictionary of every state in your workflow. Each key names a state and its value is that state’s definition, which is composed of:

Description: A short description of your state and its function.
Type: States can have multiple purposes.
  PASS: Directs the input to the output with no transformation on the data.
  TASK: Executes a given task on the data, transforming the input.
  PARALLEL: Used to branch the execution and perform tasks in parallel. Useful for fetching data and merging the result.
  CHOICE: Used to create a decision based on the input provided.
  FAIL: Used to throw an exception or abort the workflow.
  SUCCESS: Used to successfully end the workflow’s execution.

Lastly, one of two keys is usually provided. Either:

End: If set, causes the workflow to terminate with a successful completion.
Next: Takes a string containing the name of the next state to execute, which receives the output of this state as its input.

There are additional properties that can be set dependent on the state type. However, these are the basics.
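The structure described above can be sanity-checked before a workflow is saved. The sketch below is a hypothetical validator, not part of Zercurity. It only enforces what this post states: the entry state must exist, each type must be one of the six listed, and any next key must name a defined state.

```python
STATE_TYPES = {"PASS", "TASK", "PARALLEL", "CHOICE", "FAIL", "SUCCESS"}


def validate_workflow(workflow: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    states = workflow.get("states", {})
    if workflow.get("entry") not in states:
        errors.append(f"entry state {workflow.get('entry')!r} is not defined")
    for name, state in states.items():
        if state.get("type") not in STATE_TYPES:
            errors.append(f"{name}: unknown type {state.get('type')!r}")
        target = state.get("next")
        if target is not None and target not in states:
            errors.append(f"{name}: next state {target!r} is not defined")
    return errors


# The starting schema from this post.
starter = {
    "name": "Your workflow",
    "description": "Your new workflow",
    "entry": "start",
    "states": {
        "start": {"type": "PASS", "description": "Your entry function", "end": True}
    },
}

print(validate_workflow(starter))
```

A check like this catches the most common slip when hand-editing the JSON, which is a next key pointing at a state that was renamed or never defined.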

Our execution plan

As shown above, the event we receive will report a check as either passing or failing. With our workflow, we’re really only interested in the failing checks, so that we can notify our user. For this reason, our first state will have the type CHOICE. In the event that no choice matches, the default key will be used to execute the end state, which halts the workflow. Note that end is not a reserved word and needs to be defined as its own state with type SUCCESS.

"check_rule": {
  "default": "end",
  "type": "CHOICE",
  "description": "Check if we have a failing compliance rule",
  "choices": [{
    "variable": "$.changes.passed.latest",
    "booleanEquals": true,
    "description": "Rule is passing so ignore it",
    "next": "passing_rule"
  }, {
    "variable": "$.changes.passed.latest",
    "booleanEquals": false,
    "description": "Rule is failing so lets notify the user",
    "next": "failing_rule"
  }]
}

The CHOICE type lets us execute a state based on a decision made on our input. We want to ensure that $.changes.passed.latest is false, i.e. failing. In this example we only really need to define the failing choice, because if the value were true the default state would be used to continue. However, I wanted to include the passing case to show how multiple decisions can be described.
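To illustrate how a CHOICE state could be evaluated, here is a minimal sketch. It is an interpretation of the behavior described in this post rather than Zercurity’s actual engine. It supports only simple dotted paths such as $.changes.passed.latest and the booleanEquals comparison.

```python
def resolve(path: str, data: dict):
    """Resolve a simple '$.a.b.c' path; only dotted keys are supported."""
    value = data
    for key in path.lstrip("$").strip(".").split("."):
        value = value[key]
    return value


def choose_next(state: dict, data: dict) -> str:
    """Return the next state for the first matching choice, else the default."""
    for choice in state["choices"]:
        if resolve(choice["variable"], data) is choice["booleanEquals"]:
            return choice["next"]
    return state["default"]


check_rule = {
    "default": "end",
    "type": "CHOICE",
    "choices": [
        {"variable": "$.changes.passed.latest", "booleanEquals": True,
         "next": "passing_rule"},
        {"variable": "$.changes.passed.latest", "booleanEquals": False,
         "next": "failing_rule"},
    ],
}

failing_event = {"changes": {"passed": {"last": True, "latest": False}}}
print(choose_next(check_rule, failing_event))
```

If the passing choice were removed from check_rule, an event with latest set to true would match nothing and fall through to the default end state.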

Our next state, failing_rule, simply grabs more information about our failed compliance rule, including remediation steps the user can take to resolve the issue. We’re also going to need the identity of the owner of the asset in order to send them a message.

For fetching multiple blocks of information at the same time, the PARALLEL state type is really useful. Each of its execution branches is going to call the Zercurity API and merge the result into our output.

"get_compliance_rule": {
  "resource": "zrn:zercurity:api:compliance:frameworks:get",
  "description": "Get more information about our compliance rule",
  "parameters": {
    "uuid": "$.item.id"
  },
  "mapping": {
    "result": "$.compliance"
  },
  "type": "TASK"
}

The example state above simply calls the Zercurity API with the compliance rule’s UUID, taken from our input at $.item.id. The result is then merged into our output, alongside our original input, under the path given by the mapping’s result key, $.compliance.

Once we’ve got the results from our internal API calls, we can use Slack’s API, via the same TASK type, to send a message directly to the user. Zercurity also lets you incorporate Slack’s own UI blocks to make this an interactive experience.
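The parameter resolution and result mapping for a TASK state can be sketched as below. This is an assumed reading of the behavior described here, not Zercurity’s implementation. The fake_api function stands in for the real API call, its response fields are invented, and only single-level mapping targets such as $.compliance are handled.

```python
import copy


def resolve(path, data):
    """Resolve a simple '$.a.b' path against nested dictionaries."""
    value = data
    for key in path[2:].split("."):
        value = value[key]
    return value


def run_task(state, data, call_api):
    # Build the API parameters by resolving each '$.' path against the input.
    params = {name: resolve(path, data) for name, path in state["parameters"].items()}
    result = call_api(state["resource"], params)
    # Merge: keep the original input and attach the result under the mapped key.
    output = copy.deepcopy(data)
    output[state["mapping"]["result"][2:]] = result
    return output


def fake_api(resource, params):
    # Stand-in for the real API call; the response fields are invented.
    return {"uuid": params["uuid"], "rule": "FW-MAC Firewall check",
            "remediation": "Turn the firewall back on."}


get_compliance_rule = {
    "resource": "zrn:zercurity:api:compliance:frameworks:get",
    "parameters": {"uuid": "$.item.id"},
    "mapping": {"result": "$.compliance"},
    "type": "TASK",
}

event = {"item": {"id": "45e0bba6-9775-4dc9-b280-7ab4d35c5e0b"}}
output = run_task(get_compliance_rule, event, fake_api)
print(sorted(output))
```

The output keeps the original item key and gains a compliance key, which is what later states can reference with paths such as $.compliance.remediation.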

https://api.slack.com/tools/block-kit-builder

You can copy and paste the JSON directly into the blocks parameter for your TASK state. Below is a full example of sending a message to a Slack user using the TASK state.

"slack_message_user": {
  "resource": "zrn:integration:slack:chat:postMessage",
  "type": "TASK",
  "description": "Send a message directly to the user",
  "parameters": {
    "blocks": [{
      "text": {
        "text": "Hi {{$.owner.name}}, The compliance rule {{$.compliance.rule}} is *failing*. Please follow these steps to resolve the issue.",
        "type": "mrkdwn"
      },
      "type": "section"
    }, {
      "text": {
        "text": "$.compliance.remediation",
        "type": "mrkdwn"
      },
      "type": "section"
    }, {
      "elements": [{
        "text": {
          "text": "Yes, I've fixed it",
          "type": "plain_text",
          "emoji": false
        },
        "style": "primary",
        "type": "button",
        "value": "yes"
      }, {
        "text": {
          "text": "Remind me later",
          "type": "plain_text",
          "emoji": false
        },
        "style": "danger",
        "type": "button",
        "value": "no"
      }],
      "type": "actions"
    }],
    "channel": "$.slack.channel"
  },
  "events": [{
    "next": "slack_yes",
    "response": {
      "text": "I've fixed it",
      "replace_original": "true"
    },
    "name": "yes"
  }, {
    "next": "slack_no",
    "response": {
      "text": "Ok, I'll check back in tomorrow.",
      "replace_original": "true"
    },
    "name": "no"
  }]
}

Zercurity also provides the ability to hook into Slack’s interactive components. If you provide a value for your action, it can be mapped to an event within your state. In the above example we provide two buttons: one to request that the test be re-run and the other to remind the user again in 24 hours. This can obviously be customized to your corporate policy.
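Two mechanics in the Slack state above are worth sketching: the {{$.path}} placeholders in the message text, and the mapping from a button’s value to the event that selects the next state. The code below is a hypothetical illustration of both, with made-up input data. It is not how Zercurity necessarily implements them.

```python
import re

# Matches placeholders such as {{$.owner.name}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\$\.[\w.]+)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Replace each {{$.a.b}} placeholder with the value found at that path."""
    def substitute(match):
        value = data
        for key in match.group(1)[2:].split("."):
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(substitute, template)


def next_for_button(state: dict, value: str):
    """Find the next state for the event whose name matches the button value."""
    for event in state["events"]:
        if event["name"] == value:
            return event["next"]
    return None


data = {"owner": {"name": "James"}, "compliance": {"rule": "FW-MAC Firewall check"}}
message = render("Hi {{$.owner.name}}, the rule {{$.compliance.rule}} is *failing*.", data)
print(message)

slack_state = {"events": [{"name": "yes", "next": "slack_yes"},
                          {"name": "no", "next": "slack_no"}]}
print(next_for_button(slack_state, "no"))
```

The button’s value ("yes" or "no") is the only link between the Slack UI and the workflow, so the names in the events list need to match the button values exactly.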

Putting it all together

Whilst we’ve covered a lot, multiple states have been intentionally left out in the interest of time. A full, working example is available via our app. However, below is a short video of the whole process working end-to-end, telling me to get my firewall turned back on!

A workflow showing the notification via Slack when a compliance rule fails.

I hope that shows off some of the exciting things you can build with Zercurity’s workflows to help automate your compliance tasks. They not only help educate and encourage end users to be responsible for their own cybersecurity, but also take the strain off IT, who would otherwise have to continually monitor and administer machines that are no longer compliant.

We’re also trying to use the data from these workflows to provide high-level KPIs to security teams, helping them identify common threads of recurring issues within their policies and better fit the organisation’s cybersecurity objectives.

Thanks!

Written by Zercurity