Main Contents »

Copyright © 2020 Ashok P. Nadkarni. All rights reserved.

Windows includes the Event Tracing for Windows (ETW) facility to analyse the sequence and timing of application and operating system events for problem diagnosis and performance analysis. This chapter describes using Tcl to control, read and write traces.

1. Introduction

The devil is in the details.
— Anonymous

Windows performance counters provide a high level view of system performance and resource utilization. However, troubleshooting functional problems or analyzing performance bottlenecks and complex data flows requires a level of detail and an understanding of event sequencing that is not available with performance counters. This capability is provided by the ETW facility which allows for the recording of a sequence of events (a trace) which can be analysed in real time or through post-processing. Both system and individual application performance can be tracked to an extreme level of detail, even across multiple applications and systems.

The process of event tracing involves the event provider, which may be an application or the system itself, writing a event record (or just event) containing a description of the occurence of some activity of interest, to an event trace from where it can be read by an event consumer. Event controllers are programs that control what events get written and to which event trace they are written.

The TWAPI twapi_etw package lets Tcl applications participate in all aspects of this process. You can use it to gather and analyze traces of other applications and the system. You can also use it to add tracing output to the Tcl application itself.

Note The sample code sessions in this chapter are from a Windows 8 system. The event providers used for demonstration purposes may not be present on other Windows versions or may not support the same features.

The sample code assumes the following commands have been executed to load TWAPI.

% package require twapi
→ 4.4.1
% namespace path twapi

2. ETW Concepts

2.1. Events

An event trace is a sequence of event records, each of which is composed of a standard system defined header with well defined fields that are common to all events, such as the time that the event was written. It may also optionally contain custom data that is specific to that event type.

The event header contains the following fields shown in ETW event header fields, though some fields may have empty values if the field is not applicable for the event or is not defined by the provider.

Table 1. ETW event header fields
Field Description

Event id

A 16-bit integer that identifies the event type within the provider.

Version

The 8-bit integer that is the version of the event type. An event’s structure is defined by the provider, the event id and the version.

Timestamp

A 128-bit value which contains the time the event was logged in 100ns units since Jan 1, 1601.

Process Id

The PID of the process that logged the event.

Thread Id

The TID of the thread that logged the event.

SID

The SID of the account under which the logging thread was running.

Kernel time

The elapsed execution time for kernel-mode instructions in 100ns units.

User time

The elapsed execution time for user-mode instructions in 100ns units.

Channel

An integer identifying the channel to which the event should be directed. See the discussion of channels in the Windows Event Log chapter.

Channel name

A human-readable, localized string corresponding to the integer channel identifier.

Level

An integer between 0 and 255 that is used to indicate the severity or importance of the event. Lower values are considered of greater criticality. Event providers and viewers can be configured log or show events based on event levels. Windows predefines levels 0 through 5 as Log Always, Critical, Error, Warning, Informational and Verbose.

Level name

A human-readable, localized string corresponding to each integer level. Windows provides the localized names corresponding to its predefined levels above. Event providers can optionally provide names for other levels that they might use.

Task

An integer that identifies a specific logical component within the event provider.

Task name

A human-readable, localized string corresponding to the integer task field.

Opcode

An integer in the range 0-255 that identifies the operation pertaining to which the event is logged. Providers may define opcodes in the range 11-238. The remaining are predefined by Windows with specific intended semantics. Providers may use these as well when writing events. Some of the Windows-defined opcodes are described in Consuming Event Traces. When a provider uses tasks, it can define per-task opcodes as well. These task level opcodes follow the same range restrictions.

Opcode name

A human-readable, localized string corresponding to the integer opcode. Windows provides the localized strings for the predefined opcode values.

Keywords

A bit mask (32 or 64-bit depending on the Windows version) used for grouping events of a similar type. The semantics of each bit is defined by the provider. Keywords can be used to control which events are written by a provider.

Keyword names

A set of localized strings corresponding to the set bits in the keywords bit mask.

ETW event user data fields

In addition to the above standard fields, an event may optionally contain custom user data, the structure of which is completely upto the writer of the event. For example, the user data fields for a trace event that records process starting may contain fields with the process id and the path of the executable. The fields for a network connection trace event may contain the local and remote addresses and ports.

The format of this user event data need not be published if the event will only be read or consumed by some private application that already knows about that specific structure. However, for the custom fields to be decoded and intelligently interpreted by any arbitrary event consuming application, including the event viewing programs included with Windows, the event structure must be published.

In Windows versions prior to Vista, the structure of the event was defined in the Managed Object Format (MOF) and registered in the Windows Management Instrumentation (WMI) database where event consumers could use it to decode events. Although this method is supported in Vista and later versions of Windows as well, the recommended method is now to define events using an XML manifest which is embedded in event provider executable. The twapi_etw package supports consuming both types of events. However, for compatibility with Windows XP and Windows Server 2003, events are written using the older MOF-based format.

Some ETW events may also be associated with messages which are constructed from a string template specific to the event id in that provider and the data contained in the user data fields for the event.

2.2. Event Traces

Events are written to a event trace which is identified by a name and consists of a set of buffers which store the events and can be read by event consuming applications or written to an associated file for persistent storage. Events are written to one or more traces from where they can be read by any interested parties.

Note The maximum number of traces allowed depends on the Windows version. Exceeding this will result in an error when creating a new trace.

The Windows kernel provides a special preconfigured trace named NT Kernel Logger that receives events from the operating system kernel. There are some special considerations when using this trace as described in The NT Kernel Logger.

2.3. ETW Providers

Event providers are components that write events to a trace. The provider is responsible for the format and content of the written events but has no control over which trace(s) the events are actually sent; that is an ETW controller function.

Controllers send configuration information to providers in the form of enable flags and the enable level. The interpretation of these is entirely to the discretion of the event provider.

By convention, applications treat the enable flags as a bit mask that indicates the class(es) of events to log. For example, each bit might correspond to a specific subsystem.

2.4. ETW Controllers

Event controllers manage the overall process of event tracing including

  • Starting and stopping the an event trace.

  • Attaching event providers to a trace. Multiple event providers may be attached to a single trace. However, an event provider of the older MOF-based variety can only log to one trace at time.

  • Passing control information to the providers as to what events are to be logged.

  • Controlling resources used by the event trace, e.g. the number of allocated buffers for storing events, and the persistent trace file, if any, to which the events are written.

2.5. Event Consumers

The last type of component that plays an role in event tracing is the event consumer. This is any application that reads event records either from a persistent log file or from a real time trace and interprets them as per the published definition from an event provider.

Microsoft itself provides several applications that process and display ETW event traces and are either shipped with the operating system or free to download. Examples are the Windows Event Viewer, the Microsoft Network Monitor and xperf which is part of the Windows Performance Analyzer.

2.6. Event tracing security

3. Controlling Event Traces

This section describes the Tcl commands required to implement an ETW controller.

Windows comes with several programs that can act as ETW controllers. Depending on the Windows versions, these include logman, wevtutil, and the Windows Performance Recorder. Microsoft also provides a free download - Windows Performance Analyzer - which includes xperf. The specific set of programs available depends on the Windows version. Even for the same program, the available options and commands differ between Windows versions. The examples in this chapter that use these programs may have to be adapted for your system.

3.1. Starting a trace

Note The procedure for starting a trace for Windows kernel events differs from what is described here and is described in The NT Kernel Logger.

The only item that absolutely must be specified when starting or creating an event trace is its name. Since a event trace is identified by its name, it must be unique on the system. Like file names, trace names are case-insensitive.

More often than not, you will also want to specify the name of the log file to which events are to be written. This is not strictly necessary since events can also be consumed in real-time without being written to a log file. This is described in Real Time Tracing. In most cases, it is advisable to specify a log file for a couple of reasons. Firstly, it reduces the chance that events are lost because the consumer was not able to read the events fast enough. Secondly, most usage scenarios call for the event traces to be iteratively processed which requires the events to have been persisted to a file.

The command etw_start_trace returns a handle to a new event trace.

% set process_log [file join $::env(TEMP) process.etl]
→ D:/temp/process.etl
% set htrace [etw_start_trace MyTrace -logfile $process_log -maxfilesize 1]
→ 41

By default, the log file is written in circular fashion so that newer events overwrite older events once the specified maximum log file size is reached. This behaviour can be changed through the -filemode option to the etw_start_trace command. For example, the -filemode rotate option will create a new log file once the maximum file size is reached. See the documentation of etw_start_trace for the possible values of -filemode and their effect.

Various other options can be specified when starting a trace. The more commonly used ones are shown in Trace configuration options.

Table 2. Trace configuration options
Option Description

-buffersize BUFFERSIZE

Size in kilobytes of the buffers used in the event trace. This is only an requested size and will be adjusted by ETW.

-clockresolution RESOLUTION

The clock resolution to use when timestamping events in the event trace. Specify RESOLUTION as qpc to use the query performance counter. This is the highest resolution but is relatively expensive. Specify system to use system time. This is lower resolution but is cheaper in run time cost and is the default. Specify cpucycle to use the CPU cycle counter. This has the highest resolution and is the cheapest to obtain. However it can be very inaccurate as it is affected by power and temperature management features of the CPU.

-filemode FILEMODE

The file logging mode which allows specification of several logging mode in addition to those discussed earlier. See the command documentation for details.

-logfile FILEPATH

The name of the log file in which to write events. This option need not be specified if the event trace is only a real time trace.

-maxbuffers MINBUFS

The minimum number of buffers to allocate.

-minbuffers MAXBUFS

The maximum number of buffers to allocate.

-maxfilesize MAXMB

The maximum size of the log file in megabytes.

A new created event trace has no providers attached to it and therefore will not receive any events. For events to be written to the newly started event trace, one or more providers need to added to it. We next proceed to just that.

3.2. Discovering providers

Before adding a provider to a trace, we need to find out which provider it is that should be added. A listing of all providers can be obtained through the etw_get_providers command.

% print_sorted [etw_get_providers]
→ .NET Common Language Runtime
  ACPI Driver Trace Provider
  Active Directory Domain Services: SAM
  Active Directory: Kerberos Client
  Active Directory: NetLogon
...Additional lines omitted...

Alternatively, the list is also available through the one of the previously mentioned Windows programs, for example, logman

d:\src\tcl-on-windows\book> logman query providers
→
  Provider                                 GUID
  -------------------------------------------------------------------------------
  .NET Common Language Runtime             {E13C0D23-CCBC-4E12-931B-D9CC2EEE27E4}
  ACPI Driver Trace Provider               {DAB01D4D-2D48-477D-B1C3-DAAD0CE6F06B}
...Additional lines omitted...

Knowing the name of the provider is not sufficient. When adding a provider to an event trace, a controller also needs to indicate to the provider which events it should log based on the following criteria:

  1. Level - the provider is expected to only log events with severity level less than or equal to this level (note lower values indicate higher severity).

  2. Keywords - only events whose keyword mask has at least one bit in common with the specified keyword mask should be logged. Each keyword bit usually corresponds to a specific component or function of the provider. If the keyword mask is specified as 0, the provider generally assumes a default set of keywords as opposed to assuming that no keywords match.

Note Depending on the Windows version, these criteria may be enforced by Windows or be left to the provider. Event consumers should be prepared to deal with events that were not requested to be logged.

Specifying the level numbers is generally straightforward as levels have the same semantics for all providers. However, the keyword masks and their meaning do not have standard definitions. A provider is free to assign any meaning for each keyword bit so you need to look up them up on a per-provider basis. For older providers that do not use manifests to describe events, this information is sometimes only available through the provider’s documentation. For newer manifest based providers, and providers that come with the system, the easiest way is to use the logman command.

d:\src\tcl-on-windows\book> logman query providers Microsoft-Windows-Kernel-Process
→
  Provider                                 GUID
  -------------------------------------------------------------------------------
  Microsoft-Windows-Kernel-Process         {22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716}

  Value               Keyword              Description
  -------------------------------------------------------------------------------
  0x0000000000000010  WINEVENT_KEYWORD_PROCESS
  0x0000000000000020  WINEVENT_KEYWORD_THREAD
  0x0000000000000040  WINEVENT_KEYWORD_IMAGE
...Additional lines omitted...

This shows the list of keyword bits defined by the Microsoft-Windows-Kernel-Process provider and a brief description of each. We can guess from the above listing that this provider logs events related to processes, threads and image loading. We will assume only process events are of interest to us.

Note The above example also shows that there is a GUID associated with ETW provider. You can obtain this from Tcl with the etw_get_provider_guid command or with the -detail option to etw_get_providers. However, the provider GUID is rarely needed.

3.3. Adding providers to a trace

We can now go ahead and enable the provider in the trace we created earlier. We will enable messages at informational level and below for process related events (keyword mask 0x10) only.

Readers who have been careful about reading the fine print will question how we have added what appears to be a provider of kernel events to the trace when we had earlier said kernel traces needed to be treated differently. This point is explained in The NT Kernel Logger.

The etw_enable_provider command attaches the provider to the trace. Just to ensure that there are events written to the trace we will start a child process and have it immediately exit.

% etw_enable_provider $htrace Microsoft-Windows-Kernel-Process 0x10 informational 1
% after 100 2
% exec cmd /c  exit
1 0x10 → flag to log process creation
2 Providers are enabled asynchronously so we provide a small delay to ensure we capture the exec trace

Event providers can be added to a trace at any time and multiple event providers can be attached to a trace. You can even change the settings for a previously enabled provider.

For example, we can request for image loading events to be logged as well.

% etw_enable_provider $htrace Microsoft-Windows-Kernel-Process 0x50 informational 1
% exec cmd /c  exit
1 0x50 → flag to log image loading

3.4. Querying the status of a trace

When a trace is running we can query its status from Tcl with etw_query_trace which returns a dictionary containing various pieces of information about the trace.

% array set trace_status [etw_query_trace $htrace]
% print_array trace_status
→ trace_status(age_limit)              = 0
  trace_status(buffer_count)           = 16
  trace_status(buffer_size)            = 64
  trace_status(buffers_written)        = 1
  trace_status(clock_resolution)       = system
  trace_status(enable_flags)           = 0
  trace_status(events_lost)            = 0
  trace_status(flush_timer)            = 0
  trace_status(free_buffers)           = 13
  trace_status(log_buffers_lost)       = 0
  trace_status(logfile)                = D:\temp\process.etl
  trace_status(logfile_mode)           = 4194306
  trace_status(logger_tid)             = 9624
  trace_status(max_buffers)            = 38
  trace_status(max_file_size)          = 1
  trace_status(min_buffers)            = 16
  trace_status(real_time_buffers_lost) = 0
  trace_status(trace_guid)             = {63D30266-0923-11EB-9ACD-ACED5C4BFA28}
  trace_status(trace_name)             = MyTrace

Many of the status fields are self-explanatory and you can refer to the etw_query_trace documentation. Some fields reflect the configuration settings of the trace and are discussed in the next section. Here we draw attention to only those fields pertaining to lost events.

An event trace has a set of associated buffers, buffer_count in number, of a specific size, buffer_size KB, that store events as they are written by providers. As each buffer fills up, it is written out to the persistent log file or provided to a Real Time Tracing consumer. This frees up the buffer to receive more events. If the rate at which buffers can be flushed cannot keep up with the rate at which events are being written to the trace, new events are thrown away. The events_lost counter keeps track of this number. The trace may then have to be configured to increase the number of buffers.

The log_buffers_lost and real_time_buffers_lost fields counts the number of buffers that could not be written to disk or delivered to real-time consumers respectively. Note this is not necessarily due to direct event rate problems; for example, the disk may be inaccessible or the consuming process may have died. Changing buffer configuration in such cases may not help.

The next section describes how to change the configuration of a trace to deal with these issues.

3.5. Configuring a trace

Some aspects of the configuration of an event trace can be modified while it is running. For example, you may want to increase the number of buffers used by the trace if you see events being lost as described in the previous section.

% set new_max [expr {$trace_status(max_buffers) + 1}]
→ 39
% dict get [etw_update_trace $htrace -maxbuffers $new_max] max_buffers
→ 39

The command returns the state of the event trace in the same dictionary format as the evt_query_trace. As you can see, the max_buffers configuration parameter is incremented.

Another configuration setting that is often useful is -logfile which switches the trace to a different log file. See the documentation of etw_update_trace for other configuration options that can be set.

3.6. Removing providers from a trace

Event providers can be removed from a trace at any time with the etw_disable_trace_provider.

% etw_disable_provider $htrace Microsoft-Windows-Kernel-Process

It is good practice to do this before stopping a trace. If an event provider remains enabled when a trace is stopped, behaviour depends on the version of Windows and the specific provider. On newer versions of Windows with manifest based providers, when a trace with the same name is started again, the provider will be automatically be reenabled for that trace. On older systems and with MOF based providers, the exact behaviour depends on the provider implementation and whether it was enabled for other traces in the meantime. It is best to not rely on this.

3.7. Stopping a trace

Stopping a trace is simple.

% etw_stop_trace MyTrace
→ logfile {D:\temp\process.etl} trace_name MyTrace trace_guid {{63D30266-0923-1...

The etw_stop_trace command returns the final state information about the trace.

Notice we supplied the name of event trace to etw_stop_trace, not the trace handle. We could have used the trace handle as well. In fact, most commands demonstrated above will accept either the trace name or the trace handle. Using the trace handle is slightly more efficient but since the commands are infrequently invoked, this is not of much consequence.

Tip Using the event trace name has one useful advantage. It allows you to control event traces that were started by other applications. For example, you can manipulate a trace that was started by xperf.

3.8. The NT Kernel Logger

Windows contains one specific predefined trace named NT Kernel Logger which logs kernel events like disk I/O, context switches etc. This trace has one provider - Windows Kernel Trace - attached to it. Kernel traces differ from the user traces discussed so far in several respects.

  1. There can be only one instance of a kernel event trace and the name has to be NT Kernel Logger. Attempting to create a new kernel trace while one is active will result in an error.

  2. Kernel events can only be logged to this event trace and not to any other user event trace.

  3. Conversely, other providers cannot be added to the trace with etw_enable_provider as was the case with user traces.

  4. Since etw_enable_provider cannot be used with the NT Kernel Logger trace, the tracing parameters, like keywords and level, have to be specified when the trace is started.

  5. Certain logging options, like the rotate value for -logmode cannot be specified.

Note The second point does not strictly hold on newer Windows versions. Listing providers will show ones with names prefixed with Microsoft-Windows-Kernel-, for example Microsoft-Windows-Kernel-Process, which we used in our earlier examples. These providers allow kernel events to be written in user mode event traces as well. However, the other characteristics mentioned about the NT Kernel Logger trace still hold.

Although the etw_start_trace command can be used to start a kernel trace, the easier method is to use the etw_start_kernel_trace wrapper which allows the kernel components of interest to be specified without having to know the appropriate keyword masks.

In the example below, we trace a socket connection using the kernel logger.

% set user_net_log [file join $::env(TEMP) kernel_network.etl]
→ D:/temp/kernel_network.etl
% set kernel_net_log [file join $::env(TEMP) kernel_network.etl]
→ D:/temp/kernel_network.etl
% set ktrace [etw_start_kernel_trace [list tcpip] -logfile $kernel_net_log]
→ 65535
% after 100
% close [socket www.google.com 80]
% after 100
% etw_stop_trace $ktrace
→ logfile {D:\temp\kernel_network.etl} trace_name {NT Kernel Logger} trace_guid...

We will analyze and compare these trace in Consuming Event Traces.

The kernel components that can be traced depend on the Windows version. See the documentation for etw_start_kernel_trace for the allowed values for the various Windows versions. The references Improve Debugging And Performance Tuning With ETW, Core OS Events in Windows 7, Part 1 and Core Instrumentation Events in Windows 7, Part 2 provide additional information about the various types of kernel events and their usage in application analysis.

4. Consuming Event Traces

Event traces written to a log file can be read by an event trace consumer application for display and analysis. Just like for ETW controllers, Windows includes several command line and UI programs for this purpose. Depending on the Windows version, these include tracerpt, xperf and the Windows Event Viewer application.

Although the built-in Windows programs are fairly sophisticated and comprehensive (especially in newer Windows versions), there are times when why programmatically consuming event traces is useful, for example for reacting to events or customized application analysis and correlation.

In this section, we describe the commands for processing events from an ETW trace file from Tcl. In the simplest case, the TWAPI command etw_dump returns a record array that contains parsed events from one or more trace files. For more sophisticated processing or where the event set is very large, lower level operations that provide more flexibility can be used. Both these approaches are described here.

4.1. Reading traces with etw_dump

The etw_dump command returns a record array containing trace records from one or more trace files. The recordarray command can then be used to extract specific records or fields. In the following example, we extract the first record pertaining to process start events as a dictionary and print it.

% print_dict [lindex [recordarray getlist [etw_dump $process_log] -filter \
    {{-taskname eq ProcessStart}} -format dict] 0]
→ -channel        = 16
  -channelname    = Microsoft-Windows-Kernel-Process/Analytic
  -eventguid      = {00000000-0000-0000-0000-000000000000}
  -eventid        = 1
  -kerneltime     = 82031250
  -keywordmask    = 9223372036854775824
  -keywords       = WINEVENT_KEYWORD_PROCESS
  -level          = 4
  -levelname      = Information
  -message        = Process %1 started at time %3 by parent %4 running in sessi...
  -opcode         = 1
  -opcodename     = Start
  -pid            = 21852
  -properties     = ProcessID 24428 ProcessSequenceNumber 21586 CreateTime 2020...
  -providerguid   = {22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716}
  -providername   = Microsoft-Windows-Kernel-Process
  -sid            =
  -task           = 1
  -taskname       = ProcessStart
  -tid            = 21384
  -timecreated    = 132466947087647778
  -usertime       = 43437500
  -version        = 3

The trace record fields correspond to the fields described in ETW event header fields with the -properties field being a dictionary that holds the custom event data specified to the event type.

Note the message string is actually a string template that includes insert placeholders of the form %N. The message string for an event is formed by replacing the placeholders with appropriate values from the -properties dictionary for the trace record. The etw_format_event_message does this as shown in the sample code below.

% proc print_process_starts {etl_file} {
    set ra [etw_dump $etl_file] 1
    set event_recs [recordarray getlist $ra -filter {
        {-taskname eq ProcessStart}
    } -format dict] 2
    foreach rec $event_recs {
        array set arec $rec
        puts "$arec(-timecreated):  [etw_format_event_message $arec(-message) \
            $arec(-properties)]"
    }
}
% print_process_starts $process_log 3
→ 132466947087647778: Process 24428 started at time 2020-10-09T05:25:08.764Z by
   ↳ parent 21852 running in session 1 with name
   ↳ \Device\HarddiskVolume3\Windows\System32\cmd.exe.
  132466947089103424: Process 19308 started at time 2020-10-09T05:25:08.910Z by
   ↳ parent 21852 running in session 1 with name
   ↳ \Device\HarddiskVolume3\Windows\System32\cmd.exe.
1 Read trace records into a recordarray
2 Get list of ProcessStart records each formatted as a dictionary
3 process_log holds the name of the process trace file we had created earlier

Although the above example only processed one trace file, etw_dump as well as the lower level commands discussed later, will accept multiple trace file arguments. The Windows libraries will merge the records from multiple traces.

Tip This merge facility is useful on older Windows systems where the kernel can only only log to a particular session that is not writable by any other providers. In this case, you can simply have the user mode provider log to a different file and then process the two files as a single event trace source.

4.2. Processing traces using callbacks

The command etw_dump, and the similar etw_dump_to_file, is useful for quick dumps or searches of log files. However, they have the drawback of reading all trace records into memory which may not be desirable if the traces are large and only a subset of the records are of interest. We now look at the lower level programming interfaces which allow consuming of events in smaller chunks using callbacks.

The first step in processing events from an event trace log file is to open it with etw_open_file.

% set kfile [etw_open_file $kernel_net_log]
→ 68

We also need a trace formatter which will translate the raw binary events into a form we can consume.

% set formatter [etw_open_formatter]
→ tdh7

We define a callback that will be invoked for every buffer in the traces. The callback needs a formatter as it will print records of interest. We will explicitly pass this as the first parameter. The other two parameters are appended on every invocation of the callback and contain the trace buffer and the raw event records respectively. The callback will format the raw events using etw_format_events and then iterates over the returned record array using a filter to restrict the records to those related to TCP/IP. The NT Kernel Logger distinguishes between different kernel components through -taskname event record field values like DiskIo or TcpIp. We will filter the record array accordingly and print out the connection information for each event.

Note Even on newer versions of Windows, the NT Kernel Logger is MOF-based and has an empty message field so we leave it out.
% proc etw_cb {fmt bufd raw_event_list} {
    set formatted_events [etw_format_events $fmt $bufd $raw_event_list]
    recordarray iterate arec $formatted_events -filter {{-taskname eq TcpIp}} {
        puts "$arec(-timecreated) $arec(-opcodename) $arec(-properties)"
    }
}

We then do the actual processing of events by invoking etw_process_events. This command will invoke the callback for every buffer in the trace and only return when all the events have been consumed or the callback has terminated the event processing. Note the formatter is explicitly passed to the callback as its first parameter.

% etw_process_events -callback [list etw_cb $formatter] $kfile
→ 132466947092502159 ConnectIPV4 PID 21852 size 0 daddr 172.217.163.36 saddr
   ↳ 192.168.1.3 dport 80 sport 51751 mss 1380 sackopt 1 tsopt 0 wsopt 1 rcvwin
   ↳ 262200 rcvwinscale 8 sndwinscale 8 seqnum 0
  132466947092643327 DisconnectIPV4 PID 21852 size 0 daddr 172.217.163.36 saddr
   ↳ 192.168.1.3 dport 80 sport 51751 seqnum 0

We finally need to close the trace consuming session as well as the formatter to release allocated resources.

% etw_close_session $kfile
% etw_close_formatter $formatter

There are additional options that can be used with the processing of events using callbacks, for example, terminating the processing before all events are processed or limiting the retrieved events to a specific time interval. See the documentation of etw_process_events for details.

5. Real Time Tracing

ETW allows event traces to be consumed in real time in addition to or instead of being logged to a file for later processing. Normally, the buffers holding the trace data are flushed to the log file when they become full or when the session is closed. With real time mode, the buffers are flushed within a specific (system dependent) time even if they are not full. If real-time and logging to file are both enabled, this implies that the log file could grow quickly since buffers are written in fixed size blocks which may be only partially filled.

Warning Real time mode had implementation issues on Windows systems prior to Vista due to which its use was advised only for low volume notification type events.

For real-time tracing,

  • The event trace itself must be configured to be a real-time session, and

  • The event trace consumer must read events directly from the session instead of a log file.

5.1. Setting up a real time session

The mechanics of setting up a real-time trace are almost identical to those described in Starting a trace. The only difference is that the -realtime option needs to be specified. Although not shown in the sample code below, the -logfile option could have been specified as before to log the trace to a file as well.

% set htrace [etw_start_trace MyRTrace -realtime 1]
→ 41
% etw_enable_provider $htrace Microsoft-Windows-Kernel-Process 0x10 informational 1
% after 10 2
1 0x10 → flag to log process creation
2 Providers are enabled asynchronously so we provide a small delay to ensure we capture the exec trace

5.2. Consuming a real time session

Consuming a real time session is also similar to consuming a trace from a log file with the following differences:

  • Because trace events will consumed directly in real time instead of a log file, the command etw_open_session has to be used instead of etw_open_file.

  • The etw_dump command is not suitable for consuming real time traces for obvious reasons. The etw_process_events command must be used instead.

  • In many situations the events need to be consumed in a separate thread.

This last point needs some elaboration. The Win32 function ProcessTrace on which etw_process_events is layered, does not return from the call until the trace is stopped or until the callback return a break status. In the former case, the program is prevented from doing any other work while the trace exists. In the latter case, the program has to periodically call etw_process_events but risks losing events.

Both these strategies are feasible depending on the application involved, rate of events etc. Below we demonstrate a third option - use of a separate thread - which alleviates both potential issues.

We start off by loading the Thread package which allows manipluation of threads at the Tcl script level.

% package require Thread
→ 2.8.5

We then create a thread using thread::create, passing it a script that will do the actual processing of events in real time. The script is identical to that described in Processing traces using callbacks with the exception that since we are dealing with a real time trace, etw_open_session is called instead of etw_open_file. The script collects events in real time and queues them to be retrieved by the main thread. This is merely to demonstrate passing data between threads. In a real application, the created thread could do all required processing itself.

% set consumer_thread [thread::create {
    package require twapi
    namespace path twapi
    proc etw_cb {fmt bufd raw_event_list} {
        set formatted_events [etw_format_events $fmt $bufd $raw_event_list]
        recordarray iterate arec $formatted_events {
            lappend ::events [array get arec] 1
        }
    }
    set formatter [etw_open_formatter]
    set hsess [etw_open_session MyRTrace]
    etw_process_events -callback [list etw_cb $formatter] $hsess
    thread::wait 2
}]
→ tid0000000000000788
1 Do whatever processing is desired for the event here. For our tutorial we simply append to a global variable which will be retrieved by the main thread.
2 We want to wait until the main thread retrieves the event list

Having started a separate thread for consuming trace events, back in the main thread we start a child process so as to trigger a trace event. We use the combination of after and vwait for the process to start and exit.

% after 5000 {set ::done 1}
→ after#6
% exec cmd /c exit &
→ 4588
% vwait ::done

We finish up as described in earlier sections by disabling the trace provider and stopping the trace.

% etw_disable_provider $htrace Microsoft-Windows-Kernel-Process
% etw_stop_trace $htrace
→ logfile {} trace_name MyRTrace trace_guid {{63D30290-0923-11EB-9ACD-ACED5C4BF...

Stopping the trace will also cause the etw_process_events command in the secondary thread to return. Because of the thread::wait at the end of the secondary thread script, the thread will not exit but will continue processing messages from other threads. This allows the main thread to retrieve the collected events and print them.

% print_list [thread::send $consumer_thread {set ::events}]
→ -opcodename Start -eventguid {{00000000-0000-0000-0000-000000000000}} -taskna...
  -opcodename Stop -eventguid {{00000000-0000-0000-0000-000000000000}} -tasknam...

Finally, after retrieving the data we release the secondary thread so it can exit.

% thread::release $consumer_thread
→ 0

6. Providing Events

So far we have described how to implement event trace controllers and consumers. Now we look at implementing an event trace provider. This will allow us to use ETW to analyze the behaviour of our Tcl programs.

Before an event trace provider writes events to an event trace, it has to register itself. Any ETW controller can then attach the provider to a trace session to which events written by the provider will be directed.

TWAPI includes a predefined provider that is registered through the etw_twapi_provider_register call. The etw_log_message command can then be used to write trace events. In addition, TWAPI provides convenience commands that make use of Tcl’s built-in trace command to write ETW traces without having to make changes to the code being analyzed.

Registering a provider permits trace events to be written. However, trace consumers need to have access to event definitions in order to parse event traces. The etw_install_twapi_mof command registers the TWAPI ETW provider event definitions in the system so they can be accessed by consumers.

The next section demonstrates these features.

Note For compatibility with older Windows versions, this provider is MOF-based and can therefore only be attached to a single trace session at a time.

6.1. Tracing Tcl execution with ETW

We now show a simple example of using ETW to analyse program flow when retrieving a URL using the Tcl http package.

We start off by registering ourselves as a ETW provider and starting a ETW trace. This second action could also be done from a different ETW controller program such as logman.

% set hprov [etw_twapi_provider_register]
→ 66430108458154464
% set http_log [file join $::env(TEMP) http.etl]
→ D:/temp/http.etl
% set htrace [twapi::etw_start_trace MyHttpTrace -logfile $http_log]
→ 41

We then load the http package and use the Tcl trace command to have the TWAPI ETW execution trace logger automatically invoked whenever the http::geturl command is executed.

% package require http
→ 2.9.1
% trace add execution ::http::geturl {enter leave} [list \
    ::twapi::etw_execution_tracker $hprov]

As discussed in a previous section we need to add ourselves to the event trace. Because we want to also correlate execution with network events, we will also add the Microsoft-Windows-Kernel-Network provider to the trace.

Tip We use the Microsoft-Windows-Kernel-Network provider instead of the NT kernel logger so that we can add it to the same trace. However, this provider is not available on older Windows versions.
% twapi::etw_enable_provider $htrace [twapi::etw_twapi_provider_guid] -1 verbose
% twapi::etw_enable_provider $htrace Microsoft-Windows-Kernel-Network -1 verbose
% after 10 1
1 Traces are controlled asynchronously so we put in a small delay to make sure they are enabled before we run our code.

We now fetch a URL.

% etw_log_message $htrace "This event just demonstrates use of etw_log_message"
% http::geturl http://www.google.com
→ ::http::1

Assuming no more tracing is to be done, we remove the traces and unregister the provider.

% etw_disable_provider $htrace Microsoft-Windows-Kernel-Network
% etw_disable_provider $htrace [etw_twapi_provider_guid]
% etw_stop_trace $htrace
→ logfile {D:\temp\http.etl} trace_name MyHttpTrace trace_guid {{63D30291-0923-...
% etw_unregister_provider $hprov
% trace remove execution ::http::geturl {enter leave} [list \
    ::twapi::etw_execution_tracker $hprov]

Now that the tracing is done, we will read the trace as described in Consuming Event Traces. We first need to install out event definitions for the trace file to be parsed and then dump the events. Note we filter out events from MSNT_SystemTrace as they are of no interest for our purposes.

% etw_install_twapi_mof
% recordarray iterate arec [etw_dump $http_log] -filter {{-providername ne \
    MSNT_SystemTrace}} {
    puts "$arec(-timecreated): $arec(-providername) $arec(-opcodename) \
        $arec(-properties)"
}
→ 132466947162515550: TwapiETWProvider Message Message {{This event just
   ↳ demonstrates use of etw_log_message}}
  132466947162517646: TwapiETWProvider ExecutionTrace Operation enter Command
   ↳ {{http::geturl http://www.google.com}} Code {{}} Result {{}} Context {{}}
  132466947162690862: Microsoft-Windows-Kernel-Network Connection attempted. PID
   ↳ 21852 size 0 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 mss
   ↳ 1380 sackopt 1 tsopt 0 wsopt 1 rcvwin 262200 rcvwinscale 8 sndwinscale 8 seqnum
   ↳ 0 connid 0
  132466947162974441: Microsoft-Windows-Kernel-Network Data sent. PID 21852 size
   ↳ 189 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 startime
   ↳ 16947179 endtime 16947180 seqnum 0 connid 0
  132466947163693000: Microsoft-Windows-Kernel-Network Protocol copied data on
   ↳ behalf of user. PID 21852 size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport
   ↳ 80 sport 51752 seqnum 0 connid 0
  132466947163693631: Microsoft-Windows-Kernel-Network Data received. PID 21852
   ↳ size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
   ↳ connid 0
  132466947163695007: Microsoft-Windows-Kernel-Network Protocol copied data on
   ↳ behalf of user. PID 21852 size 1059 daddr 172.217.163.36 saddr 192.168.1.3 dport
   ↳ 80 sport 51752 seqnum 0 connid 0
  132466947163695238: Microsoft-Windows-Kernel-Network Data received. PID 21852
   ↳ size 1059 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
   ↳ connid 0
  132466947163695579: Microsoft-Windows-Kernel-Network Protocol copied data on
   ↳ behalf of user. PID 21852 size 120 daddr 172.217.163.36 saddr 192.168.1.3 dport
   ↳ 80 sport 51752 seqnum 0 connid 0
  132466947163695772: Microsoft-Windows-Kernel-Network Data received. PID 21852
   ↳ size 120 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
   ↳ connid 0
  132466947163723013: Microsoft-Windows-Kernel-Network Protocol copied data on
   ↳ behalf of user. PID 21852 size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport
   ↳ 80 sport 51752 seqnum 0 connid 0
  132466947163723756: Microsoft-Windows-Kernel-Network Data received. PID 21852
   ↳ size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
   ↳ connid 0
  132466947163823975: Microsoft-Windows-Kernel-Network Protocol copied data on
   ↳ behalf of user. PID 21852 size 4236 daddr 172.217.163.36 saddr 192.168.1.3 dport
   ↳ 80 sport 51752 seqnum 0 connid 0
  132466947163824253: Microsoft-Windows-Kernel-Network Data received. PID 21852
   ↳ size 4236 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
   ↳ connid 0
  132466947163825306: Microsoft-Windows-Kernel-Network Data received. PID 21852
   ↳ size 179 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
   ↳ connid 0
  132466947163899066: TwapiETWProvider ExecutionTrace Operation leave Command
   ↳ {{http::geturl http://www.google.com}} Code 0 Result ::http::1 Context {{}}
  132466947164047744: Microsoft-Windows-Kernel-Network Disconnect issued. PID
   ↳ 21852 size 0 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum
   ↳ 0 connid 0
  132466947164295180: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164295497: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164296721: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164297033: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164323703: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164324001: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164324616: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164324825: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164325279: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164325575: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164326494: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164326752: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164361206: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164361466: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164362229: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164362414: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164362928: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164363032: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164364332: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164364485: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164372876: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164373059: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164374262: Microsoft-Windows-Kernel-Network Data received. PID 10072
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
  132466947164374391: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
   ↳ endtime 16947194 seqnum 0 connid 0
  132466947164559307: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164559536: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
   ↳ endtime 16947196 seqnum 0 connid 0
  132466947164560206: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164560443: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
   ↳ endtime 16947196 seqnum 0 connid 0
  132466947164560719: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164560817: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
   ↳ endtime 16947196 seqnum 0 connid 0
  132466947164561306: Microsoft-Windows-Kernel-Network Data received. PID 12052
   ↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
  132466947164561397: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
   ↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
   ↳ endtime 16947196 seqnum 0 connid 0
  132466947164879973: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size
   ↳ 969 daddr 104.244.42.65 saddr 192.168.1.3 dport 443 sport 51348 startime
   ↳ 16947194 endtime 16947199 seqnum 0 connid 0

For purpose of exposition, we have simply printed the event records. For detailed analysis of program flow, you would add enterstep and leavestep to the trace command arguments. The events would then include enough information to build a complete call tree including nested calls, arguments and return values at each level. Inclusion of system trace information, the network traffic in this case, allows inclusion of flow information through the operating system as well.

Warning For all practical purposes, the Tcl trace add execution command disables Tcl’s byte code compilation. The above technique is therefore not suitable for execution time measurements. For that purpose, you need to add the calls to log ETW events, either explicitly to the commands of interest or by redefining those commands to log an event before and after invoking the original command.

7. References

SDKETW

Event Tracing, Windows SDK documentation. User guide and reference for ETW.

PARK2007

Improve Debugging And Performance Tuning With ETW, Park, Buch, MSDN Magazine, April 2007. Overview of the new manifest based ETW architecture, API’s and tools introduced in Vista.

PARK2009

Core OS Events in Windows 7, Part 1, Park, Bendetov, MSDN Magazine, September 2009. Describes the kernel ETW providers in Windows 7 and their use.

PARK2009A

Core Instrumentation Events in Windows 7, Part 2, Park, Bendetov, MSDN Magazine, October 2009. Continuation of prior article with additional kernel providers and usage examples.

SOUL2012

Inside Windows Debugging, Soulami, Microsoft Press, 2012. Part 3 of this book provides detailed coverage of using ETW for application analysis.

TUNS2002

Developing WMI Solutions, Tunstall, Cole, Addison-Wesley, 2002. Chapter 13 details all aspects of the older MOF-based ETW implementation.