File retrieval with Osquery using carves on Zercurity

4 min read · Nov 8, 2020

With remote server support in Osquery you can remotely grab files from systems using the carves table, with a query as simple as:

SELECT * FROM carves
WHERE path LIKE '/Users/tim/Downloads/%'
AND carve = 1;

The query above will fetch all the files residing within the user's Downloads folder. The % wildcard can be used anywhere within the directory path. Carving is supported across Windows, macOS and Linux.

Osquery will round up all the files that match the path, bundle them into a single archive (either a .tar or a .zst) and upload it back to the remote server.


Zercurity is free to deploy locally and free to test online. The GIF below shows a quick query to lift any and all files found within each user's Downloads folder on macOS.

Downloading files from a user's Downloads folder using Osquery carves on Zercurity.

Extracting the archives

To open your carved archive you'll need to use tar to extract it (with zstd support for .zst archives).

tar -xvf bundle.tar  # extract TAR archive
tar -I zstd -xvf bundle.zst  # extract ZST archive

If you get the following error: tar (child): zstd: Cannot exec: No such file or directory, you'll need to install Facebook's zstd package.

sudo apt install zstd

Osquery carver remote server settings

In order to get carving working you will require a remote Osquery server. You can download an example server here. Or you can spin up Zercurity using docker-compose here (which is configured with carving enabled by default).


Alongside these are the flags for letting the Osquery agent know about the remote server. The flags highlighted in bold are required. However, for ad-hoc carves you'll also need to configure a distributed TLS endpoint.
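As an illustration, a minimal carving flagfile might look like the sketch below. The hostname, certificate path and endpoint paths are placeholders; match them to whatever routes your remote server actually exposes.

```
--tls_hostname=zercurity.example.com
--tls_server_certs=/etc/osquery/cert.pem

# Carving must be explicitly enabled on the agent
--disable_carver=false
--carver_disable_function=false
--carver_compression=true
--carver_block_size=5120000
--carver_start_endpoint=/start
--carver_continue_endpoint=/upload

# Needed for ad-hoc carves via distributed queries
--disable_distributed=false
--distributed_plugin=tls
--distributed_tls_read_endpoint=/distributed/read
--distributed_tls_write_endpoint=/distributed/write
```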


Osquery carver server

There are two important resources that are needed in order for the client to upload data back to the server.

The /start resource

The /start resource is initially hit by the client to retrieve a session token and let the server know how many blocks will be uploaded to the server. The archive is chunked into blocks as specified by the carver_block_size. The request payload is as follows:

{
  "block_count": 17,
  "block_size": 5120000,
  "carve_size": 878610143,
  "carve_id": "f47cfd64-4750-449b-a78d-2b631f317ae4",
  "request_id": "5028338a-2461-41c6-b6e3-1414ec78b208",
  "node_key": "hFTdv8F...y39YkKaEC"
}

The block_count is the number of blocks the archive has been divided into. Each call to the /upload resource includes a block_id indicating the current block being uploaded. Once all 17 blocks have been uploaded they can be stitched together to form the final file.

The block_size is the client's carver_block_size as configured by the remote server or local config file.

The carve_size is the total payload size, which can be checked against the final size of the stitched-together archive.

The carve_id is the UUID given to the current carve job. This is not supplied in the /upload requests.

The request_id is the name or identifier of the query that kicked off the carve job. Zercurity uses UUIDs for each query. However, this may just be the query name depending on how your remote server is configured.

Note: There is currently a bug where the request_id may be incorrect and the id of another job currently in-flight is used instead. Please see the GitHub issue for more information.

Lastly, the node_key is the shared secret used to authenticate the client with the remote server. This is also not provided during the subsequent /upload requests.

The client expects a session_id in the response. This should be a unique token that lets the server both authenticate the client and identify the current carve job during the subsequent /upload requests. Using an HMAC-signed token with an embedded payload is a useful way of passing data to the client that can be relayed back to the server to help identify the current carve.

{
  "session_id": "qhKvSKME...gxEEw37Z"
}
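A minimal sketch of that HMAC approach is shown below. The function names and SECRET_KEY are hypothetical; the idea is simply to sign a small payload identifying the carve so the server can verify and decode it on each /upload request.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; in practice load this from config.
SECRET_KEY = b'change-me'

def make_session_id(carve_id, request_id):
    # Embed the carve details in the token so subsequent /upload
    # requests can be tied back to the carve without a lookup.
    payload = base64.urlsafe_b64encode(json.dumps(
        {'carve_id': carve_id, 'request_id': request_id}).encode())
    mac = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + '.' + mac

def verify_session_id(session_id):
    # Recompute the HMAC and compare in constant time.
    payload, _, mac = session_id.rpartition('.')
    expected = hmac.new(SECRET_KEY, payload.encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError('invalid session token')
    return json.loads(base64.urlsafe_b64decode(payload))
```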

The /upload resource

Once the /start request has completed and a session_id has been returned, the client will start to upload the necessary blocks. The payload for each request is as follows:

{
  "block_id": 4,
  "session_id": "qhKvSKME...gxEEw37Z",
  "request_id": "5028338a-2461-41c6-b6e3-1414ec78b208",
  "data": "aDZYWmV3dUJ0NW..FaczlmaGYZlgyVUQ="
}

For the given carve task, block_id is the current segment being uploaded to the server. This forms part of the total blocks, each of block_size bytes.

The session_id is the token provided to the client in the response during the /start request.

As before, the request_id is the identifier of the requesting carve.

Lastly, the data field houses a base64-encoded value of the current segment. This needs to be decoded and then either stored to disk or kept in memory.
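A handler for this step could be as simple as the sketch below. The blocks store and handle_upload name are hypothetical; a real server might write each block to disk or a database rather than holding it in memory.

```python
import base64

# In-memory block store keyed by (session_id, block_id).
blocks = {}

def handle_upload(payload):
    # Decode the base64 segment and stash it under its block_id,
    # ready to be stitched together once all blocks have arrived.
    data = base64.b64decode(payload['data'])
    blocks[(payload['session_id'], payload['block_id'])] = data
    return len(data)
```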

Finishing up

The Osquery client will not post a final request to let the server know it has finished uploading all the blocks. Instead, the server needs to keep track of all the blocks uploaded. Once all the blocks have been received (matching the final block_count), the uploaded blocks can be stitched back together to form the final archive.

The last step is to work out the archive type, which will be either a .tar or a .zst archive, by checking the first 4 bytes of the file for the zstd magic bytes: \x28\xB5\x2F\xFD. If they match, the archive is zstd-compressed; otherwise it's a plain tar.

Below is some example Python code for pulling all the blocks together and working out the archive file format:

import tempfile

# First four bytes of a zstd-compressed archive
ZSTD_MAGIC = b'\x28\xB5\x2F\xFD'

def assemble_carve(carve, get_block):
    file_ext = 'tar'
    file_size = 0
    fd, path = tempfile.mkstemp()
    with open(fd, 'wb') as f:
        for block_id in range(carve.block_count):
            # get_block fetches the decoded bytes of a stored block
            raw_data = get_block(carve.uuid, block_id)
            file_size += len(raw_data)
            if block_id == 0 and raw_data[0:4] == ZSTD_MAGIC:
                file_ext = 'zst'
            f.write(raw_data)
    return path, file_ext, file_size

Feel free to reach out directly if you have any questions.