Capturing Osquery query results with AWS Firehose (Kinesis) and AWS Athena

Zercurity
7 min read · Mar 1, 2021

Osquery provides inbuilt support for multiple external logging plugins; one of those is AWS Firehose via Kinesis.

AWS Kinesis, in Amazon’s own words, is a real-time, fully managed, and scalable platform for streaming data on Amazon Web Services. AWS Firehose, by contrast, is a service for loading streaming data into an external destination such as AWS S3, Redshift or Elasticsearch.

Why?

Whilst Zercurity captures and presents the data we query across your many assets, you may want to log what’s going on in its entirety, capturing exactly what both Zercurity and your own queries are doing, for processing in external systems such as Elasticsearch or AWS S3.

How?

Getting Osquery to log to an additional external source is super simple. The --logger_plugin flag can be used to specify additional logging endpoints, separated by commas.
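For example, to keep the default filesystem logs and also ship results to a Firehose stream, the flag would look like this:

--logger_plugin=filesystem,aws_firehose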

By default, Osquery uses the filesystem plugin to write logs to disk, whilst Zercurity also makes use of the tls plugin for logging errors back to the Zercurity platform. For our setup we’ll be using the following additional parameters:

--aws_access_key_id "ACCESS_KEY"
--aws_secret_access_key "SECRET_KEY"
--aws_region "us-east-1"
--aws_kinesis_stream "osquery"
--aws_firehose_stream "osquery"

Configuring AWS Kinesis

Before all that, however, we first need to set up and configure our AWS Firehose delivery stream to receive and process the incoming data.

AWS S3

We’re going to use S3 as the destination for our Firehose as we’ll then be able to use AWS Athena to query our results.

From the AWS console navigate to the S3 Management Console and create a new bucket. It doesn’t need to have any special permissions or configuration; just make sure it’s not public.

Creating our S3 bucket for our Osquery Firehose logs.
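If you prefer the command line, a rough equivalent with the AWS CLI would be the following (the bucket name is just a placeholder):

aws s3 mb s3://osquery-firehose-logs --region us-east-1
aws s3api put-public-access-block --bucket osquery-firehose-logs \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true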

Kinesis

From the AWS console navigate to the Kinesis dashboard and click “Create delivery stream”. From here, name your new delivery stream and ensure that the Source is set to “Direct PUT or other sources”, as the Osquery agent will be interfacing with the AWS API directly.

Creating our Osquery Firehose for “Direct PUT or other sources”

Click next on the “Process records” step, as we’re just going to dump the raw output straight into S3. Here you could call other AWS services, such as Lambda, to clean up and enrich the records before they reach their final destination.

Choose “S3” as our destination and select the name of the Bucket you configured earlier.

Choosing our S3 bucket for our AWS Kinesis Firehose

For the final stage you can leave everything as is, letting AWS create a new IAM role for the new delivery stream and S3 bucket.

Create our IAM role for our Kinesis delivery stream.

Creating our Kinesis IAM policy

For our Osquery agent we’ll need to create a programmatic user account in order for the agent to send data to our Firehose. The first step is to set up a new IAM policy to which we’ll apply the required permissions to push data into our AWS Kinesis Firehose.

You’ll only require the one permission: PutRecordBatch. Also ensure the Resource is the same ARN shown for your delivery stream, which you can find by clicking on it from the main list view. Don’t use any wildcards.
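As a rough sketch, the resulting policy document would look something like this (the account ID and stream name in the Resource ARN below are placeholders; use the exact ARN of your own delivery stream):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "firehose:PutRecordBatch",
      "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/osquery"
    }
  ]
}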

Creating our programmatic user

Next, we need to create our new programmatic user (which can be done via the Add user wizard).

Creating our AWS Kinesis user

Attach our Osquery policy, with the Firehose service and PutRecordBatch action.

Attaching our Osquery Firehose policy

Finally, download and store the provided AWS keys. These are the keys we’ll provide to Osquery to communicate with the AWS Firehose API.

Downloading our API keys for Osquery to access the AWS Firehose API
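For reference, the same user could be created from the AWS CLI along these lines (the user name, account ID and policy ARN are placeholders for your own):

aws iam create-user --user-name osquery-firehose
aws iam attach-user-policy --user-name osquery-firehose \
  --policy-arn arn:aws:iam::123456789012:policy/osquery-firehose
aws iam create-access-key --user-name osquery-firehose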

Configuring Osquery to send data to AWS Kinesis Firehose

The Zercurity configuration file currently uses both the filesystem and tls plugins as its loggers. We’re going to add the aws_firehose plugin to this as well.

However, if you have the tls plugin configured (which Zercurity uses), Osquery will be passed a set of certificates provided by the user via --tls_server_certs. This flag currently overrides all certificate chains, even outside of the TLS logger’s scope. The certificate chain for Zercurity resides in the following locations:

/usr/local/zercurity/zercurity.pem  # Mac OSX
/opt/zercurity/zercurity.pem # Linux
C:\Program Data\zercurity\zercurity.pem # Windows

NOTE: By passing Osquery your own certificate chain, you also override the one used by the AWS Firehose plugin. This configuration issue can be seen in the following tickets: 6964 and 3437.

If the certificates are incorrectly configured you’ll encounter the following error:

Exception making HTTP request to URL (https://firehose.eu-central-1.amazonaws.com): certificate verify failed

To correctly install the required certificates you’ll need to provide the entire certificate chain AWS uses for its Firehose. You can download these certificates using openssl and append them to the Zercurity key chain file.

openssl s_client -showcerts -verify 5 -connect firehose.eu-central-1.amazonaws.com:443
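As a minimal sketch (assuming the Linux chain file path from above), you can extract the presented certificates and append them in one go. Note that the server may not present the Amazon root CA itself, so you may also need to append that separately:

openssl s_client -showcerts -connect firehose.eu-central-1.amazonaws.com:443 </dev/null 2>/dev/null \
  | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/' \
  | sudo tee -a /opt/zercurity/zercurity.pem > /dev/null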

Alternatively, if you’re on Linux and have ca-certificates installed, you can use that bundle instead as your trusted certificate chain. This is a massive catch-all, but it saves time down the line if you’re doing a lot of external TLS interactions, for example when using the curl table.

cat /etc/ssl/certs/ca-certificates.crt
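In that case, you’d simply point the flag introduced earlier at the system bundle instead:

--tls_server_certs=/etc/ssl/certs/ca-certificates.crt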

Updating your Osquery configuration

To test things locally before deploying the new configuration changes to your wider fleet, you’ll need to update your Osquery flags file, which can be found at one of the following paths:

/Library/LaunchDaemons/com.zercurity.osqueryd.plist  # Mac OSX
/etc/osquery/osquery.flags # Linux
C:\Program Data\zercurity\osquery\osquery.flags # Windows

To this flags file you’ll need to add the following parameters:

--aws_access_key_id "ACCESS_KEY"
--aws_secret_access_key "SECRET_KEY"
--aws_region "us-east-1"
--aws_firehose_stream "osquery"

Then update the --logger_plugin flag to include the Firehose plugin like so: --logger_plugin=filesystem,tls,aws_firehose. For Mac OSX you’ll need to follow the XML plist format.
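As a rough sketch of what that looks like on Mac OSX (the osqueryd binary path below is just a placeholder for whatever your existing plist already launches), each flag becomes its own ProgramArguments entry:

<key>ProgramArguments</key>
<array>
  <string>/usr/local/bin/osqueryd</string>
  <string>--logger_plugin=filesystem,tls,aws_firehose</string>
  <string>--aws_access_key_id=ACCESS_KEY</string>
  <string>--aws_secret_access_key=SECRET_KEY</string>
  <string>--aws_region=us-east-1</string>
  <string>--aws_firehose_stream=osquery</string>
</array>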

Once done you can restart the Osquery service.

# For Mac OSX run the following from the command line to restart the Osquery service
sudo launchctl unload \
  /Library/LaunchDaemons/com.zercurity.osqueryd.plist
sudo launchctl load \
  /Library/LaunchDaemons/com.zercurity.osqueryd.plist

# For Linux. Depending on your distribution, run one of the following:
sudo systemctl restart osqueryd
sudo /etc/init.d/osqueryd restart

# For Windows: Start -> Run, launch services.msc and then restart the Osquery service
services.msc

Once restarted, after a few minutes you’ll see your S3 bucket begin to fill with the Osquery results.
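If you’d rather not wait on the console, you can check for delivered objects from the AWS CLI as well (the bucket name below is a placeholder for the one you created earlier):

# Firehose writes objects under a year/month/day/hour prefix by default
aws s3 ls s3://osquery-firehose-logs/ --recursive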

AWS Osquery Firehose results in AWS S3

These files will contain the results of the queries being run across your fleet (the example below contains a socket network event from the sockets table) and will look something like this:

{
"name":"f64af85f-a05e-4601-98cd-6c9a8f35feec",
"hostIdentifier":"MacBook-Pro.local",
"calendarTime":"Sun Feb 28 09:40:01 2021 UTC",
"unixTime":"1614505201",
...

"columns: {
"action":"CONNECT",
"family":"2",
"local_address":"192.168.15.249",
"local_port":"62531",
"path":"\/usr\/bin\/ssh",
"pid":"78775",
"protocol":"6",
"remote_address":"192.168.31.248",
"remote_port":"22",
"timestamp":"1614456741"
},
"action":"added",
"log_type":"result"
}

AWS Athena

Now that we’ve got all our wonderful data being dumped into our S3 bucket, we can configure AWS Athena to let us query it.

Given the unstructured mishmash of data, we’re going to use AWS Glue to crawl our S3 bucket and build a schema for us.

From the data sources tab choose “Connect data source” and then choose “AWS Glue data catalog”.

If you’re doing this for a specific query or you’ve pre-filtered the results being dumped into your S3 bucket you can go ahead and choose “Add a table and enter schema information manually”.

However, given we’ve just got a firehose of data coming in, choose the pre-selected “Set up crawler”.

Creating our AWS Glue crawler

From here follow the steps, making sure to select that all sub-folders are crawled. As Firehose partitions the objects it creates by date, down to the hour, there will be plenty of sub-folders. If this is a large dataset you may only want to crawl new folders, or a certain directory prefix, to cut down on the crawl time.

Configuring our AWS Glue crawler

You’ll have to let AWS create yet another IAM role in order for the crawler to access your S3 bucket. We’re also going to set this bucket to be crawled on demand for now.

Giving the AWS Glue crawler access to our S3 bucket containing our Osquery result data

Once the crawler has been created, select the Osquery S3 crawler and click “Run crawler” from the menu above. This will take a few minutes depending on the bucket size.

Running our new on-demand AWS Glue crawler

Once complete, you’ll see your AWS Athena data sources and databases populated with your newly crawled tables.

The data we’re most interested in is confined to the nested columns struct. This is where one of the pre-processors during the Firehose ingest could be configured, either to flatten the data with an AWS Lambda function or to break the data out depending on the executing query.

For now, however, we can construct a simple query to return just our ingested socket data. Note the nested columns struct used in order to access the underlying query results.
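As a sketch (the database and table names below are placeholders for whatever names the Glue crawler generated for your bucket), a query pulling back just the socket events might look like this:

SELECT
  hostidentifier,
  calendartime,
  "columns".action,
  "columns".local_address,
  "columns".remote_address,
  "columns".remote_port,
  "columns".path
FROM osquery_firehose.osquery_results
WHERE log_type = 'result'
  AND "columns".remote_port IS NOT NULL
LIMIT 100;

Filtering on a field that only the sockets query populates (remote_port here) is a quick way to isolate those rows without needing a Lambda pre-processor.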

Querying Osquery sockets data with AWS Athena

It’s all over!

We hope you found this helpful in getting AWS Firehose and AWS Athena working alongside Zercurity. Please feel free to get in touch if you have any questions.
