How to integrate EQL into your tooling
At DerbyCon I had a conversation with Ross Wolf (@rw_access) from Endgame about the capabilities of EQL (Event Query Language) and how to integrate it into other tools. The purpose of this blog is to share what I learned, together with Python code that helps others integrate EQL into their own tools.
What is EQL?
See the quote below for a quick introduction to EQL. For more details and examples, see the blogs on Endgame’s website [1, 2, 3].
EQL is a language that can match events, generate sequences, stack data, build aggregations, and perform analysis. EQL is schemaless and supports multiple database backends. It supports field lookups, boolean logic, comparisons, wildcard matching, and function calls. EQL also has a preprocessor that can perform parse and translation time evaluation, allowing for easily sharable components between queries.
Source: Endgame
At the end of 2018, Endgame made EQL open source, so you can use it within your own tools or extend its capabilities. I think it is truly great to see this being shared with the community! I also like the fact that it is written in Python: a very popular language for security tools.
When we wanted DeTT&CT (written in Python) to be able to include or exclude certain objects (e.g. to filter or highlight detections for ATT&CK techniques with a low score), I was happy that I did not have to write that code myself and could instead rely mainly on EQL, which does it in a much better way.
Integrate into your code
Now, let’s explain how to integrate EQL in your Python tool to query your data. Before you continue, make sure to have the EQL Python libraries available on your system. This can easily be done via pip: pip install eql
EQL: Not only for JSON and security events
EQL is mainly used for security events stored in the JSON format, but it can also be used for other purposes and data formats, as long as you provide the data as a list of Python dictionaries whose key-value pairs are compatible with the JSON data format.
Within DeTT&CT we use several YAML files to store data. Luckily, YAML loaded in Python is also represented as a list of dictionaries. To make these YAML files compatible with the JSON specs, I had to serialise every key-value pair of the type ‘datetime.date’ to a Python string (after loading the YAML file into a list of dictionaries):
Create EQL events
The first step (or the second, if you first had to make the data compatible with the JSON specs) is to create EQL Event objects, which are what EQL runs its queries against. Although this is not a mandatory step, it gives you more control over how the Events are created.
Set the value for the ‘event_type’ to be used within your search queries:
A descriptive value makes it clear what kind of data your search query runs against:
For example: “web_proxy where …” instead of “generic where …”
When not creating the EQL Events yourself, EQL will try to derive the ‘event_type’ based on any field within your data named ‘event_type’ or ‘event_type_full’.
If it does not result in a value for the ‘event_type’, the default value ‘generic’ will be used.
Similarly, you can specify which data field is used as the timestamp for your events. Otherwise, EQL tries to derive the timestamp from a data field named ‘timestamp’.
Start by creating a list of EQL Events from a list of Python dictionaries containing the data you want to query. The function ‘_create_events’ below does precisely that. When you are dealing with, for example, YAML as input, the function ‘_serialize_date’ serialises every ‘datetime.date’ key-value pair to a string before the EQL Events are created:
Learn the schema
The next step is to learn the schema of the data. In the code below the variable ‘events’ is the list of EQL Event objects created previously by the function ‘_create_events’:
Learning the schema is not required to execute EQL search queries, but it does let you print a detailed error message when a query contains a mistake such as an unknown field name.
See the picture for an example of an error message, which nicely highlights that ‘pi’ is an unrecognised field. As shown in the schema, ‘pid’ should be used instead.
The EQL library does not print this error message and the schema for you. Fortunately, this only requires a few lines of code:
Create the EQL engine and execute the query
After you have created a list of EQL Events from your data and optionally learned the schema, you can continue with creating the EQL Python engine to execute the query. The function below will return a list of dictionaries containing the Event data that match your query, which you can use within your own tooling to do whatever you want.
11: create a Python list to store the results of the query.
14-16: in my code, I chose to use an inner function ‘store_result’ as a callback for the EQL engine. The engine calls this function with the query output, and the function stores the result in the list ‘query_result’.
18: create the EQL Python engine.
21-22: parse the query using the function: ‘parse_query’.
Because the parameter ‘implied_any’ is set to ‘True’, queries can be shortened by omitting the event type and the ‘where’ keyword. For example: “process where pid == 424” and “pid == 424” are both valid and return the same result.
23: add the query to the engine.
24-28: the query syntax error handling as discussed earlier.
29: add the callback function ‘store_result’ to the engine using ‘add_output_hook’.
32: execute the query by calling the function ‘stream_events’ from the engine. The engine passes the query output to the ‘store_result’ callback function, which saves it in the list ‘query_result’.
That is all you need to start using EQL within your tools.
The complete code example
All Python code for integrating EQL in your tool(s) can be downloaded from my GitHub account: github.com/marcusbakker/EQL. This code also includes functionality that was not shared above, such as code that performs two example search queries on the files example.json and data-sources-endpoints.yaml.
The code requires Python 3 and was tested with EQL v0.7 and PyYAML v5.1.2. When run successfully, it should produce the following output: