Comparator
The MAEC Comparator is a python-maec API, formerly a separate utility and now implemented in the Bundle module (bundle/bundle.py), that allows some basic comparisons to be made between two or more MAEC Bundles. It currently works at the Object level (including Objects embedded as Associated Objects in Actions), but we plan on adding support for comparison between other MAEC entities in the near future.
The comparator currently supports the following comparisons:
- Find all unique Objects in one or more MAEC Bundles
- Find all common* Objects between two or more MAEC Bundles
*By common, we mean those in different Bundles that are of the same type with certain matching properties (the set on which to match can be specified - see below), while ignoring those that are not relevant for such a comparison (e.g., ids and descriptions). For example, two File Objects in two different Bundles with the same File_Path would be considered common.
For finding common Objects in two or more MAEC Bundles, it is important to be able to control the set of properties to match on, especially when dealing with some of the more complicated Objects. As such, we've created a relatively simple way to define the set of Objects and their respective properties to match on. The syntax is the following Python dictionary:
match_on = {'RootObjectComplexType':['matching_element_name', 'other_matching_element_name']}
That is, for each type of Object one wishes to match on, one simply specifies the name of the Object's root complex type (e.g., FileObjectType), along with a list of the element names to use in the matching. Note that this list effectively constitutes an AND: every element specified must match for two Objects of that type to be considered a match.
For example, to match only on File Objects that have the same File_Name and File_Path, one would create the following dictionary:
match_on = {"FileObjectType": ["file_name", "file_path"]}
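The AND semantics described above can be sketched with plain Python dictionaries. This is only an illustration, not python-maec code: the `objects_match` helper and the dictionary representation of an Object (with `type` and `id` keys) are assumptions made for this example, as is the choice to treat Object types absent from `match_on` as never matching.

```python
# Hypothetical helper, NOT part of python-maec: illustrates the AND
# semantics of a match_on dictionary using plain dicts as stand-in Objects.
def objects_match(obj_a, obj_b, match_on):
    """Return True if both objects share a type and agree on every listed element."""
    if obj_a["type"] != obj_b["type"]:
        return False
    elements = match_on.get(obj_a["type"])
    if not elements:
        # Assumption: types that are not listed in match_on are never matched.
        return False
    # Every listed element must be present in both objects and be equal (AND).
    return all(
        name in obj_a and name in obj_b and obj_a[name] == obj_b[name]
        for name in elements
    )


match_on = {"FileObjectType": ["file_name", "file_path"]}

file_a = {"type": "FileObjectType", "id": "obj-1",
          "file_name": "a.exe", "file_path": "C:\\Temp"}
file_b = {"type": "FileObjectType", "id": "obj-2",
          "file_name": "a.exe", "file_path": "C:\\Temp"}
file_c = {"type": "FileObjectType", "id": "obj-3",
          "file_name": "a.exe", "file_path": "C:\\Windows"}

print(objects_match(file_a, file_b, match_on))  # True
print(objects_match(file_a, file_c, match_on))  # False
```

Here `file_a` and `file_b` agree on both elements and therefore match, while `file_c` shares only the file name and does not. The differing `id` values are ignored because `id` is not listed in `match_on`.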
Elements that are embedded in other element hierarchies may be specified by writing out the path to the element starting from the root of the Object, using '.' as a separator between the different layers of element names. For example, to match on the Path element in an Image_Info section of a Process Object, one would use:
match_on = {"ProcessObjectType": ["image_info.path"]}
Similarly, to match on multiple embedded elements that share the same parent path, simply use a '/' between each of the element names. For example, to match on the Path and Command_Line elements in an Image_Info section of a Process Object, one would use:
match_on = {"ProcessObjectType": ["image_info.path/command_line"]}
Finally, in the case of embedded list-based elements, use the '.' notation as above, but do not include the element used to signify a list entry. For example, to match on the Data element in the list of Values contained in a Registry Object, instead of "values.value.data", one would use:
match_on = {"WindowsRegistryKeyObjectType": ["values.data"]}
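The path notation can be illustrated with a small sketch over nested dictionaries. The `expand_paths` and `resolve` helpers below are hypothetical and are not part of python-maec; they assume an Object is modelled as nested dicts, with list-based elements modelled as Python lists, and show only how the '.' and '/' notations could be interpreted.

```python
# Hypothetical helpers, NOT part of python-maec.
def expand_paths(spec):
    """Split a spec such as 'image_info.path/command_line' into one dotted path per element."""
    prefix, _, last = spec.rpartition(".")
    names = last.split("/")
    return [f"{prefix}.{name}" if prefix else name for name in names]


def resolve(node, path):
    """Collect every value reachable via a dotted path, stepping through lists implicitly."""
    nodes = [node]
    for key in path.split("."):
        next_nodes = []
        for current in nodes:
            # A list stands for repeated entries, so look inside each entry
            # without requiring the entry element to be named in the path.
            items = current if isinstance(current, list) else [current]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    # Flatten any list-valued results into a single list of values.
    values = []
    for found in nodes:
        values.extend(found if isinstance(found, list) else [found])
    return values


# Stand-in Objects modelled as nested dicts (an assumption for this sketch).
process = {"image_info": {"path": "C:\\a.exe", "command_line": "a.exe -q"}}
registry = {"values": [{"name": "Run", "data": "evil.exe"},
                       {"name": "Start", "data": "1"}]}

print(expand_paths("image_info.path/command_line"))
for dotted in expand_paths("image_info.path/command_line"):
    print(dotted, resolve(process, dotted))
print(resolve(registry, "values.data"))
```

In this model, "values.data" reaches the data of every entry in the list even though the list-entry element itself never appears in the path.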
If one does not wish to specify the Objects and properties to match on, the API includes a default dictionary for this purpose, which covers some commonly observed Objects and some of their relevant properties. Currently this dictionary is the following:
match_on = {"FileObjectType": ["file_name", "file_path"],
            "WindowsRegistryKeyObjectType": ["hive", "key"],
            "WindowsMutexObjectType": ["name"],
            "SocketObjectType": ["address_value", "port_value"],
            "WindowsPipeObjectType": ["name"],
            "ProcessObjectType": ["name"]}
The comparator takes two input parameters:
- A list of Bundles (specifically, python-maec bundle.Bundle instances) to be compared.
- A dictionary describing the Objects and their elements to match on (as described above). This dictionary is optional; if not specified, the default dictionary described in the previous section will be used.
To instantiate and use the MAEC comparator, simply import the python-maec MAEC Bundle class from the bundle module and call the 'compare' Bundle class method:
from maec.bundle.bundle import Bundle
comparison_results = Bundle.compare(bundle_list, match_on)
This will perform both the unique (for each Bundle in the bundle_list) and common (between all of the Bundles in the bundle_list) comparisons and return a ComparisonResults object, described in the next section.
The ComparisonResults object returned by compare() contains the results of both the unique and common comparisons. Accordingly, this object has two methods:
- get_common(): Returns a list of the Objects common to all Bundles. Each common Object is captured in a dictionary with the following structure, which holds the matching properties of the Object along with the instance(s) of the Object found in each Bundle (more than one Object in a Bundle may match). The instances are represented as a separate dictionary, with the keys representing the IDs of the Bundles, and the values representing the IDs of the matching Objects:
common_objects = [{"object" : "<matching object properties>",
                   "object_instances" : {"bundle_id_1" : ["matching_object_id_1", "matching_object_id_n"...],
                                         "bundle_id_n" : ...}}]
- get_unique(): Returns a dictionary of the Objects unique to each Bundle. The dictionary has the following structure, with the keys representing the IDs of the Bundles, and values representing the IDs of the Objects unique to their respective Bundles:
unique_objects = {"bundle_id_1" : ["unique_object_id_1", "unique_object_id_n"...],
                  "bundle_id_n" : ...}
The following is a sample code snippet that reads in two MAEC Bundle instances (assumed to be XML files on disk), performs the comparison between them, and then prints out the common and unique Objects found.