Data Dictionary Authoring Guide

This guide explains how to author data dictionaries by describing the structure and organization of events.

Let's recap what a data dictionary event is before going into the details.


Data dictionaries are atomic structures of base events emitted by log sources on a platform. These structures contain the definition of a base event and its fields.

The event definition enables the entity to be fully functional by itself. It describes the title, code, platform, log source, and other metadata that provides context. The event definition plays a major role in how the event is consumed.

The event fields contain the list of fields available. Each field has properties that provide context about the field, and ultimately enable it to be correlated with other OSSEM data like the Common Data Model and Detection Data Model.

OSSEM data dictionaries are structured to be as lean as possible, for two reasons: to avoid redundant information between events, and to promote the adoption of external references (pointers).

OSSEM data dictionaries are represented in YAML. The goal is to balance human readability and ease of automation, which made YAML a relatively easy pick as the OSSEM data language.


The event definition fields are:

  • Title: the event title if any, otherwise use the event code

  • Description: the description of the event

  • Platform: the platform where the log source is hosted

  • Log source: the log source that generates the event

  • Event code: the code or ID of the event

  • Reference list: text and link for external references relevant to the event

  • Tag list: tags applicable to the event

For every field in the event, the properties are:

  • Standard name: the standard name assigned to the field name, if applicable

  • Standard type: the standard type assigned to the field type, if applicable

  • Name: the field name as per vendor documentation

  • Type: the field type as per vendor documentation

  • Description: the field description

  • Sample value: the field sample value, if applicable
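
As a rough illustration, the two property lists above can be turned into a small structural check. The Python sketch below works on an already-parsed event (a plain dict, as a YAML loader would return). The key names follow the example in this guide, but `event_fields` as the name of the field list, the helper name `find_problems`, and the decision to treat the "if applicable" properties as optional are assumptions made for illustration, not a confirmed OSSEM schema.

```python
# Keys taken from the lists above. Properties marked "if applicable"
# (standard name, standard type, sample value) are treated as optional here.
DEFINITION_KEYS = ("title", "description", "platform", "log_source", "event_code")
FIELD_KEYS = ("name", "type", "description")


def find_problems(event):
    """Return human-readable problems found in a parsed event dict."""
    problems = [f"event: missing '{key}'"
                for key in DEFINITION_KEYS if key not in event]
    # "event_fields" is an assumed key name for the list of fields.
    for index, field in enumerate(event.get("event_fields") or []):
        label = field.get("name", f"#{index}")
        problems += [f"field {label}: missing '{key}'"
                     for key in FIELD_KEYS if key not in field]
    return problems


event = {
    "title": "Event ID 4616: The system time was changed.",
    "description": "This event generates every time system time was changed.",
    "platform": "windows",
    "log_source": "Microsoft-Windows-Security-Auditing",
    "event_code": "4616",
    "event_fields": [
        {"name": "SubjectUserName", "type": "UnicodeString",
         "description": "the name of the account that requested the operation."},
        # Deliberately incomplete, to show what a report looks like.
        {"name": "NewTime", "type": "FILETIME"},
    ],
}

problems = find_problems(event)
print(problems)
```

A check like this is only a starting point; it says nothing about whether the values themselves are correct.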

An example of a Windows Security Auditing Event 4616 follows:

title: 'Event ID 4616: The system time was changed.'
description: This event generates every time system time was changed.
platform: windows
log_source: Microsoft-Windows-Security-Auditing
event_code: '4616'
event_fields:
- standard_name: user_sid
  standard_type: TBD
  name: SubjectUserSid
  type: SID
  description: SID of account that requested the "change system time" operation.
  sample_value: S-1-5-21-3457937927-2839227994-823803824-1104
- standard_name: user_name
  standard_type: TBD
  name: SubjectUserName
  type: UnicodeString
  description: the name of the account that requested the "change system time" operation.
  sample_value: dadmin
- standard_name: user_domain
  standard_type: TBD
  name: SubjectDomainName
  type: UnicodeString
  description: subject's domain or computer name.
  sample_value: CONTOSO
- standard_name: user_logon_id
  standard_type: TBD
  name: SubjectLogonId
  type: HexInt64
  description: 'hexadecimal value that can help you correlate this event with recent
    events that might contain the same Logon ID, for example, "4624: An account was
    successfully logged on".'
  sample_value: '0x48f29'
- standard_name: TBD
  standard_type: TBD
  name: PreviousTime
  type: FILETIME
  description: previous time in UTC time zone.
  sample_value: '2015-10-09T05:04:30.000941900Z'
- standard_name: TBD
  standard_type: TBD
  name: NewTime
  type: FILETIME
  description: new time that was set in UTC time zone.
  sample_value: '2015-10-09T05:04:30.000000000Z'
- standard_name: process_id
  standard_type: TBD
  name: ProcessId
  type: Pointer
  description: hexadecimal Process ID of the process that changed the system time.
    Process ID (PID) is a number used by the operating system to uniquely identify
    an active process.
  sample_value: '0x1074'
- standard_name: process_path
  standard_type: TBD
  name: ProcessName
  type: UnicodeString
  description: full path and the name of the executable for the process.
  sample_value: C:\Windows\WinSxS\amd64_microsoft-windows-com-surrogate-core_31bf3856ad364e35_6.3.9600.16384_none_25a8f00faa8f185c\dllhost.exe
references:
- text: MS SOURCE
- text: MS Security Auditing Category - System
- text: MS Security Auditing Sub-category - Audit Security State Change
tags:
- etw_level_Informational
- etw_task_task_0
- version_1
- System
- Audit Security State Change


The standard_name and standard_type are special properties of event fields, as they represent the first layer of data standardization on the event.

In the example above (event 4616), the SubjectUserSid name was translated to the user_sid standard name. This translation ensures the data dictionary is aligned with the Common Data Model (CDM) User schema, reduces complexity, and enhances the development of detection analytics.
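
One way to picture this translation is as a lookup table built from the event fields and applied to a raw log record. The Python sketch below is illustrative only: `build_name_map` and `standardize` are hypothetical helpers, and leaving fields whose standard name is still `TBD` under their vendor name is an assumption about how a consumer might behave. The field names and values come from the 4616 example, where SubjectUserSid carries the standard name user_sid.

```python
def build_name_map(event_fields):
    """Map vendor field names to standard names, skipping undefined ones."""
    return {
        field["name"]: field["standard_name"]
        for field in event_fields
        if field.get("standard_name") not in (None, "", "TBD")
    }


def standardize(record, name_map):
    """Return a copy of a raw log record with standardized key names."""
    return {name_map.get(key, key): value for key, value in record.items()}


event_fields = [
    {"name": "SubjectUserSid", "standard_name": "user_sid"},
    {"name": "SubjectUserName", "standard_name": "user_name"},
    {"name": "PreviousTime", "standard_name": "TBD"},
]
raw = {
    "SubjectUserSid": "S-1-5-21-3457937927-2839227994-823803824-1104",
    "SubjectUserName": "dadmin",
    "PreviousTime": "2015-10-09T05:04:30.000941900Z",
}

standardized = standardize(raw, build_name_map(event_fields))
print(standardized)
```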

The standard type is still a work in progress, hence the TBD (to be defined) default, but it aims at defining standard field types like ‘boolean’, ‘list’, ‘json’, etc. The goal is to provide guidance to anyone post-processing or enriching base events.

Note that it is not mandatory to define a standard name for every field in your event. Some good practices when defining standard names include:

  • Search for the field name in other OSSEM events. It is not uncommon that you can apply the same standard name to identical field names, especially if the log source is the same.

  • Check if the standard name already exists in one of the Common Data Model entity schemas.
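
The first practice above amounts to indexing the standard names already in use. A minimal Python sketch, assuming events have been parsed into dicts with their fields under an `event_fields` key (an assumed name); `standard_names_in_use` is a hypothetical helper and the second event is invented for the demonstration:

```python
from collections import defaultdict


def standard_names_in_use(events):
    """Index the standard names already assigned to each vendor field name."""
    index = defaultdict(set)
    for event in events:
        for field in event.get("event_fields") or []:
            standard = field.get("standard_name")
            if standard and standard != "TBD":
                index[field["name"]].add(standard)
    return index


events = [
    {"event_code": "4616", "event_fields": [
        {"name": "SubjectUserName", "standard_name": "user_name"},
        {"name": "NewTime", "standard_name": "TBD"},
    ]},
    # Made-up second event that reuses the same vendor field name.
    {"event_code": "hypothetical", "event_fields": [
        {"name": "SubjectUserName", "standard_name": "user_name"},
    ]},
]

index = standard_names_in_use(events)
print(sorted(index.get("SubjectUserName", ())))  # ['user_name']
print(sorted(index.get("NewTime", ())))          # []
```

A vendor name that maps to more than one standard name in such an index is a hint that the existing events are inconsistent and worth reviewing.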


OSSEM built-in data dictionaries are organized in a file system folder structure that groups them according to their characteristics. While there is no limit to the folder depth, the root folder and log source folders must follow a predefined structure.

Data dictionaries are located in /source/data_dictionaries, the root folder. The first level of organization is by platform.

An example of a built-in Sysmon data dictionary follows:

├── data_dictionaries   <--------- root folder
│   ├── README.yml
│   ├── windows         <--------- platform (operating system/sensor folder)
│   │   ├── README.yml
│   │   ├── sysmon      <--------- log source folder
│   │   │   ├── README.yml
│   │   │   └── events  <--------- events folder
│   │   │       ├── event-1.yml
│   │   │       ├── event-7.yml
│   │   │       ├── event-8.yml <- data dictionary entry

Each platform folder contains a sub-folder for each log source, which in turn always contains an events/ folder where the events are stored.

Since the platform and log source properties are already defined in the event, it is fairly straightforward to figure out where to store your events.
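
The layout can be expressed as a small path-building function. This Python sketch makes two assumptions: the `event-<code>.yml` file name pattern is inferred from the tree shown earlier, and the log source folder name (such as `sysmon`) is passed in explicitly, because this guide does not state how a log source value maps onto a folder name. `event_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Root folder as given in this guide.
ROOT = PurePosixPath("/source/data_dictionaries")


def event_path(platform, log_source_folder, event_code):
    """Build the location of an event file under the root folder."""
    return ROOT / platform / log_source_folder / "events" / f"event-{event_code}.yml"


print(event_path("windows", "sysmon", "8"))
# /source/data_dictionaries/windows/sysmon/events/event-8.yml
```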

Because keeping this folder structure consistent can be tricky, especially when dealing with dozens of log sources, OSSEM uses README files that give additional information about the current folder. These files are particularly helpful when converting OSSEM to markdown, where they are used as indexes.
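
A consistency check for README files might look like the Python sketch below. It assumes, based on the example tree, that the root, platform, and log source folders each carry a `README.yml` while `events` folders do not; `folders_missing_readme` is a hypothetical helper, and the README contents written here are placeholders.

```python
import tempfile
from pathlib import Path


def folders_missing_readme(root):
    """Return folders under root (root included) that lack a README.yml,
    skipping events folders, which hold only event files."""
    root = Path(root)
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    return sorted(
        folder.relative_to(root).as_posix()
        for folder in folders
        if folder.name != "events" and not (folder / "README.yml").is_file()
    )


# Build a throwaway tree mirroring the layout above, minus one README.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data_dictionaries"
    (root / "windows" / "sysmon" / "events").mkdir(parents=True)
    (root / "README.yml").write_text("title: placeholder\n")
    (root / "windows" / "README.yml").write_text("title: placeholder\n")
    missing = folders_missing_readme(root)

print(missing)  # ['windows/sysmon']
```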


Like data dictionaries, README files are defined in YAML and contain the following properties:

  • title: for example the log source title

  • description: for example the log source description

  • images: text and path to images relevant to the readme

  • references: text and link for external references relevant to the readme

An example of a README follows:

title: Sysmon Event Logs
description: System Monitor (Sysmon) is a Windows system service and device 
  driver that, once installed on a system, remains resident across system 
  reboots to monitor and log system activity to the Windows event log. It 
  provides detailed information about process creations, network connections, 
  and changes to file creation time. By collecting the events it generates 
  using Windows Event Collection or SIEM agents and subsequently analyzing 
  them, you can identify malicious or anomalous activity and understand how 
  intruders and malware operate on your network.
images:
- title: Data model
  source: /resources/images/SysmonDataModel.png
references:
- text: Sysmon Source
- text: TrustedSec Sysinternals Sysmon Community Guide
