Copyright © 2020 Ashok P. Nadkarni. All rights reserved.
Windows includes the Event Tracing for Windows (ETW) facility to analyse the sequence and timing of application and operating system events for problem diagnosis and performance analysis. This chapter describes using Tcl to control, read and write traces.
1. Introduction
The devil is in the details.
Windows performance counters provide a high level view of system performance and resource utilization. However, troubleshooting functional problems or analyzing performance bottlenecks and complex data flows requires a level of detail and an understanding of event sequencing that is not available with performance counters. This capability is provided by the ETW facility which allows for the recording of a sequence of events (a trace) which can be analysed in real time or through post-processing. Both system and individual application performance can be tracked to an extreme level of detail, even across multiple applications and systems.
The process of event tracing involves the event provider, which may be an application or the system itself, writing a event record (or just event) containing a description of the occurence of some activity of interest, to an event trace from where it can be read by an event consumer. Event controllers are programs that control what events get written and to which event trace they are written.
The TWAPI twapi_etw
package lets Tcl applications participate
in all aspects of this process. You can use it to gather
and analyze traces of other applications and the system. You can
also use it to add tracing output to the Tcl application itself.
The sample code sessions in this chapter are from a Windows 8 system. The event providers used for demonstration purposes may not be present on other Windows versions or may not support the same features. |
The sample code assumes the following commands have been executed to load TWAPI.
% package require twapi
→ 4.4.1
% namespace path twapi
2. ETW Concepts
2.1. Events
An event trace is a sequence of event records, each of which is composed of a standard system defined header with well defined fields that are common to all events, such as the time that the event was written. It may also optionally contain custom data that is specific to that event type.
The event header contains the following fields shown in ETW event header fields, though some fields may have empty values if the field is not applicable for the event or is not defined by the provider.
Field | Description |
---|---|
Event id |
A 16-bit integer that identifies the event type within the provider. |
Version |
The 8-bit integer that is the version of the event type. An event’s structure is defined by the provider, the event id and the version. |
Timestamp |
A 128-bit value which contains the time the event was logged in 100ns units since Jan 1, 1601. |
Process Id |
The PID of the process that logged the event. |
Thread Id |
The TID of the thread that logged the event. |
SID |
The SID of the account under which the logging thread was running. |
Kernel time |
The elapsed execution time for kernel-mode instructions in 100ns units. |
User time |
The elapsed execution time for user-mode instructions in 100ns units. |
Channel |
An integer identifying the channel to which the event should be directed. See the discussion of channels in the Windows Event Log chapter. |
Channel name |
A human-readable, localized string corresponding to the integer channel identifier. |
An integer between 0 and 255 that is used to
indicate the severity or importance of the event. Lower values
are considered of greater criticality. Event providers
and viewers can be configured log or show events based on event
levels. Windows predefines levels 0 through 5 as |
|
Level name |
A human-readable, localized string corresponding to each integer level. Windows provides the localized names corresponding to its predefined levels above. Event providers can optionally provide names for other levels that they might use. |
Task |
An integer that identifies a specific logical component within the event provider. |
Task name |
A human-readable, localized string corresponding to the integer task field. |
Opcode |
An integer in the range 0-255 that identifies the operation pertaining to which the event is logged. Providers may define opcodes in the range 11-238. The remaining are predefined by Windows with specific intended semantics. Providers may use these as well when writing events. Some of the Windows-defined opcodes are described in Consuming Event Traces. When a provider uses tasks, it can define per-task opcodes as well. These task level opcodes follow the same range restrictions. |
Opcode name |
A human-readable, localized string corresponding to the integer opcode. Windows provides the localized strings for the predefined opcode values. |
A bit mask (32 or 64-bit depending on the Windows version) used for grouping events of a similar type. The semantics of each bit is defined by the provider. Keywords can be used to control which events are written by a provider. |
|
Keyword names |
A set of localized strings corresponding to the set bits in the keywords bit mask. |
In addition to the above standard fields, an event may optionally contain custom user data, the structure of which is completely upto the writer of the event. For example, the user data fields for a trace event that records process starting may contain fields with the process id and the path of the executable. The fields for a network connection trace event may contain the local and remote addresses and ports.
The format of this user event data need not be published if the event will only be read or consumed by some private application that already knows about that specific structure. However, for the custom fields to be decoded and intelligently interpreted by any arbitrary event consuming application, including the event viewing programs included with Windows, the event structure must be published.
In Windows versions prior to Vista, the structure of the event
was defined in the Managed Object Format (MOF) and registered
in the Windows Management Instrumentation (WMI) database
where event consumers could use it to decode events.
Although this method is supported in Vista and later versions
of Windows as well, the recommended method is now to
define events using an XML manifest which is embedded in
event provider executable. The twapi_etw
package supports
consuming both types of events. However, for compatibility
with Windows XP and Windows Server 2003, events are written
using the older MOF-based format.
Some ETW events may also be associated with messages which are constructed from a string template specific to the event id in that provider and the data contained in the user data fields for the event.
2.2. Event Traces
Events are written to a event trace which is identified by a name and consists of a set of buffers which store the events and can be read by event consuming applications or written to an associated file for persistent storage. Events are written to one or more traces from where they can be read by any interested parties.
The maximum number of traces allowed depends on the Windows version. Exceeding this will result in an error when creating a new trace. |
The Windows kernel provides a special preconfigured trace named
NT Kernel Logger
that receives events from the operating system
kernel. There are some special considerations when using this
trace as described in The NT Kernel Logger.
2.3. ETW Providers
Event providers are components that write events to a trace. The provider is responsible for the format and content of the written events but has no control over which trace(s) the events are actually sent; that is an ETW controller function.
Controllers send configuration information to providers in the form of enable flags and the enable level. The interpretation of these is entirely to the discretion of the event provider.
By convention, applications treat the enable flags as a bit mask that indicates the class(es) of events to log. For example, each bit might correspond to a specific subsystem.
2.4. ETW Controllers
Event controllers manage the overall process of event tracing including
-
Starting and stopping the an event trace.
-
Attaching event providers to a trace. Multiple event providers may be attached to a single trace. However, an event provider of the older MOF-based variety can only log to one trace at time.
-
Passing control information to the providers as to what events are to be logged.
-
Controlling resources used by the event trace, e.g. the number of allocated buffers for storing events, and the persistent trace file, if any, to which the events are written.
2.5. Event Consumers
The last type of component that plays an role in event tracing is the event consumer. This is any application that reads event records either from a persistent log file or from a real time trace and interprets them as per the published definition from an event provider.
Microsoft itself provides
several applications that process and display ETW event traces
and are either shipped with the operating system or free to
download. Examples are the Windows Event Viewer, the Microsoft Network
Monitor and xperf
which is part of the Windows Performance
Analyzer.
2.6. Event tracing security
3. Controlling Event Traces
This section describes the Tcl commands required to implement an ETW controller.
3.1. Starting a trace
The procedure for starting a trace for Windows kernel events differs from what is described here and is described in The NT Kernel Logger. |
The only item that absolutely must be specified when starting or creating an event trace is its name. Since a event trace is identified by its name, it must be unique on the system. Like file names, trace names are case-insensitive.
More often than not, you will also want to specify the name of the log file to which events are to be written. This is not strictly necessary since events can also be consumed in real-time without being written to a log file. This is described in Real Time Tracing. In most cases, it is advisable to specify a log file for a couple of reasons. Firstly, it reduces the chance that events are lost because the consumer was not able to read the events fast enough. Secondly, most usage scenarios call for the event traces to be iteratively processed which requires the events to have been persisted to a file.
The command etw_start_trace
returns a handle to a new event trace.
% set process_log [file join $::env(TEMP) process.etl]
→ D:/temp/process.etl
% set htrace [etw_start_trace MyTrace -logfile $process_log -maxfilesize 1]
→ 41
By default, the log file
is written in circular fashion so that newer events overwrite older
events once the specified maximum log file size is reached. This behaviour
can be changed through the -filemode
option to the etw_start_trace
command. For example, the -filemode rotate
option will create
a new log file once the maximum file size is reached. See the
documentation of etw_start_trace
for the possible values
of -filemode
and their effect.
Various other options can be specified when starting a trace. The more commonly used ones are shown in Trace configuration options.
A new created event trace has no providers attached to it and therefore will not receive any events. For events to be written to the newly started event trace, one or more providers need to added to it. We next proceed to just that.
3.2. Discovering providers
Before adding a provider to a trace, we need to find out which
provider it is that should be added. A listing of all providers
can be obtained through the etw_get_providers
command.
% print_sorted [etw_get_providers]
→ .NET Common Language Runtime
ACPI Driver Trace Provider
Active Directory Domain Services: SAM
Active Directory: Kerberos Client
Active Directory: NetLogon
...Additional lines omitted...
Alternatively, the list is also available through the one of the previously
mentioned Windows programs, for example, logman
d:\src\tcl-on-windows\book> logman query providers → Provider GUID ------------------------------------------------------------------------------- .NET Common Language Runtime {E13C0D23-CCBC-4E12-931B-D9CC2EEE27E4} ACPI Driver Trace Provider {DAB01D4D-2D48-477D-B1C3-DAAD0CE6F06B} ...Additional lines omitted...
Knowing the name of the provider is not sufficient. When adding a provider to an event trace, a controller also needs to indicate to the provider which events it should log based on the following criteria:
-
Level - the provider is expected to only log events with severity level less than or equal to this level (note lower values indicate higher severity).
-
Keywords - only events whose keyword mask has at least one bit in common with the specified keyword mask should be logged. Each keyword bit usually corresponds to a specific component or function of the provider. If the keyword mask is specified as
0
, the provider generally assumes a default set of keywords as opposed to assuming that no keywords match.
Depending on the Windows version, these criteria may be enforced by Windows or be left to the provider. Event consumers should be prepared to deal with events that were not requested to be logged. |
Specifying the level numbers is generally straightforward as levels
have the same semantics for all providers. However, the keyword
masks and their meaning do not have standard definitions. A provider
is free to assign any meaning for each keyword bit so you
need to look up them up on a per-provider basis. For older
providers that do not use manifests to describe events, this
information is sometimes only available through the provider’s
documentation. For newer manifest based providers, and providers
that come with the system, the easiest way is to use the
logman
command.
d:\src\tcl-on-windows\book> logman query providers Microsoft-Windows-Kernel-Process → Provider GUID ------------------------------------------------------------------------------- Microsoft-Windows-Kernel-Process {22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716} Value Keyword Description ------------------------------------------------------------------------------- 0x0000000000000010 WINEVENT_KEYWORD_PROCESS 0x0000000000000020 WINEVENT_KEYWORD_THREAD 0x0000000000000040 WINEVENT_KEYWORD_IMAGE ...Additional lines omitted...
This shows the list of keyword bits defined by the
Microsoft-Windows-Kernel-Process
provider and a brief
description of each. We can guess from the above listing
that this provider logs events related to processes, threads
and image loading. We will assume only process events
are of interest to us.
The above example also shows that there is a GUID associated
with ETW provider. You can obtain this from Tcl with
the etw_get_provider_guid command or with the -detail
option to etw_get_providers . However, the provider GUID
is rarely needed.
|
3.3. Adding providers to a trace
We can now go ahead and enable the provider in the trace
we created earlier. We will enable messages at informational
level and below for process related events (keyword mask 0x10
) only.
The etw_enable_provider
command attaches the provider to
the trace. Just to ensure that there are events written to the trace
we will start a child process and have it immediately exit.
% etw_enable_provider $htrace Microsoft-Windows-Kernel-Process 0x10 informational
% after 100
% exec cmd /c exit
0x10 → flag to log process creation | |
Providers are enabled asynchronously so we provide a small delay to ensure we capture the exec trace |
Event providers can be added to a trace at any time and multiple event providers can be attached to a trace. You can even change the settings for a previously enabled provider.
For example, we can request for image loading events to be logged as well.
% etw_enable_provider $htrace Microsoft-Windows-Kernel-Process 0x50 informational
% exec cmd /c exit
0x50 → flag to log image loading |
3.4. Querying the status of a trace
When a trace is running we can query its status from Tcl with
etw_query_trace
which returns a dictionary containing various
pieces of information about the trace.
% array set trace_status [etw_query_trace $htrace]
% print_array trace_status
→ trace_status(age_limit) = 0
trace_status(buffer_count) = 16
trace_status(buffer_size) = 64
trace_status(buffers_written) = 1
trace_status(clock_resolution) = system
trace_status(enable_flags) = 0
trace_status(events_lost) = 0
trace_status(flush_timer) = 0
trace_status(free_buffers) = 13
trace_status(log_buffers_lost) = 0
trace_status(logfile) = D:\temp\process.etl
trace_status(logfile_mode) = 4194306
trace_status(logger_tid) = 9624
trace_status(max_buffers) = 38
trace_status(max_file_size) = 1
trace_status(min_buffers) = 16
trace_status(real_time_buffers_lost) = 0
trace_status(trace_guid) = {63D30266-0923-11EB-9ACD-ACED5C4BFA28}
trace_status(trace_name) = MyTrace
Many of the status fields are self-explanatory and you can refer
to the etw_query_trace
documentation. Some fields reflect the
configuration settings of the trace and are discussed in the
next section. Here we draw attention to only those fields pertaining
to lost events.
An event trace has a set of associated buffers, buffer_count
in
number, of a specific size, buffer_size
KB, that store events
as they are written by providers. As each buffer fills up, it is
written out to the persistent log file or provided to a
Real Time Tracing consumer.
This frees up the buffer to receive more events.
If the rate at which buffers can be flushed cannot keep
up with the rate at which events are being written to the trace,
new events are thrown away. The events_lost
counter keeps
track of this number. The trace may then have to be configured
to increase the number of buffers.
The log_buffers_lost
and real_time_buffers_lost
fields
counts the number of buffers that could not be written to disk
or delivered to real-time consumers respectively.
Note this is not necessarily due to direct event rate problems; for
example, the disk may be inaccessible or the consuming process
may have died. Changing buffer configuration in such cases
may not help.
The next section describes how to change the configuration of a trace to deal with these issues.
3.5. Configuring a trace
Some aspects of the configuration of an event trace can be modified while it is running. For example, you may want to increase the number of buffers used by the trace if you see events being lost as described in the previous section.
% set new_max [expr {$trace_status(max_buffers) + 1}]
→ 39
% dict get [etw_update_trace $htrace -maxbuffers $new_max] max_buffers
→ 39
The command returns the state of the event trace in the same
dictionary format as the evt_query_trace
. As you can see,
the max_buffers
configuration parameter is incremented.
Another configuration setting that is often useful is -logfile
which
switches the trace to a different log file. See the documentation of
etw_update_trace
for other configuration options that can be set.
3.6. Removing providers from a trace
Event providers can be removed from a trace at any time with
the etw_disable_trace_provider
.
% etw_disable_provider $htrace Microsoft-Windows-Kernel-Process
It is good practice to do this before stopping a trace. If an event provider remains enabled when a trace is stopped, behaviour depends on the version of Windows and the specific provider. On newer versions of Windows with manifest based providers, when a trace with the same name is started again, the provider will be automatically be reenabled for that trace. On older systems and with MOF based providers, the exact behaviour depends on the provider implementation and whether it was enabled for other traces in the meantime. It is best to not rely on this.
3.7. Stopping a trace
Stopping a trace is simple.
% etw_stop_trace MyTrace
→ logfile {D:\temp\process.etl} trace_name MyTrace trace_guid {{63D30266-0923-1...
The etw_stop_trace
command returns the final state information
about the trace.
Notice we supplied the name of event trace to etw_stop_trace
,
not the trace handle. We could have used the trace handle as well.
In fact, most commands demonstrated above will accept either
the trace name or the trace handle.
Using the trace handle is slightly more efficient but since the
commands are infrequently invoked, this is not of much consequence.
Using the event trace name has one useful advantage. It allows
you to control event traces that were started by other applications.
For example, you can manipulate a trace that was started by
xperf .
|
3.8. The NT Kernel Logger
Windows contains one specific predefined trace named
NT Kernel Logger
which logs kernel events like disk I/O,
context switches etc. This trace has one provider -
Windows Kernel Trace
- attached to it. Kernel traces differ from the
user traces discussed so far in several respects.
-
There can be only one instance of a kernel event trace and the name has to be
NT Kernel Logger
. Attempting to create a new kernel trace while one is active will result in an error. -
Kernel events can only be logged to this event trace and not to any other user event trace.
-
Conversely, other providers cannot be added to the trace with
etw_enable_provider
as was the case with user traces. -
Since
etw_enable_provider
cannot be used with theNT Kernel Logger
trace, the tracing parameters, like keywords and level, have to be specified when the trace is started. -
Certain logging options, like the
rotate
value for-logmode
cannot be specified.
The second point does not strictly hold on newer Windows versions.
Listing providers will show ones
with names prefixed with Microsoft-Windows-Kernel- , for example
Microsoft-Windows-Kernel-Process , which we used in our earlier
examples. These providers allow kernel events to be written
in user mode event traces as well. However, the other characteristics
mentioned about the NT Kernel Logger trace still hold.
|
Although the etw_start_trace
command can be used to start
a kernel trace, the easier method is to use the
etw_start_kernel_trace
wrapper which allows the kernel components
of interest to be specified without having to know the appropriate
keyword masks.
In the example below, we trace a socket connection using the kernel logger.
% set user_net_log [file join $::env(TEMP) kernel_network.etl]
→ D:/temp/kernel_network.etl
% set kernel_net_log [file join $::env(TEMP) kernel_network.etl]
→ D:/temp/kernel_network.etl
% set ktrace [etw_start_kernel_trace [list tcpip] -logfile $kernel_net_log]
→ 65535
% after 100
% close [socket www.google.com 80]
% after 100
% etw_stop_trace $ktrace
→ logfile {D:\temp\kernel_network.etl} trace_name {NT Kernel Logger} trace_guid...
We will analyze and compare these trace in Consuming Event Traces.
The kernel components that can be traced depend on the Windows version.
See the documentation for etw_start_kernel_trace
for the allowed
values for the various Windows versions. The references
Improve Debugging And Performance Tuning With ETW, Core OS Events in Windows 7, Part 1 and Core Instrumentation Events in Windows 7, Part 2 provide
additional information about
the various types of kernel events and their usage in application
analysis.
4. Consuming Event Traces
Event traces written to a log file can be read
by an event trace consumer application for display and analysis.
Just like for ETW controllers, Windows includes several
command line and UI programs for this purpose. Depending
on the Windows version, these include tracerpt
, xperf
and the Windows Event Viewer application.
Although the built-in Windows programs are fairly sophisticated and comprehensive (especially in newer Windows versions), there are times when why programmatically consuming event traces is useful, for example for reacting to events or customized application analysis and correlation.
In this section, we describe the commands for processing events
from an ETW trace file from Tcl. In the simplest case, the
TWAPI command etw_dump
returns a record array
that contains parsed events from one or more trace files. For
more sophisticated processing or where the event set is very
large, lower level operations that provide more flexibility
can be used. Both these approaches are described here.
4.1. Reading traces with etw_dump
The etw_dump
command returns a record array containing
trace records from one or more trace files. The recordarray
command can then be used to extract specific records or fields.
In the following example, we extract the first record pertaining
to process start events as a dictionary and print it.
% print_dict [lindex [recordarray getlist [etw_dump $process_log] -filter \
{{-taskname eq ProcessStart}} -format dict] 0]
→ -channel = 16
-channelname = Microsoft-Windows-Kernel-Process/Analytic
-eventguid = {00000000-0000-0000-0000-000000000000}
-eventid = 1
-kerneltime = 82031250
-keywordmask = 9223372036854775824
-keywords = WINEVENT_KEYWORD_PROCESS
-level = 4
-levelname = Information
-message = Process %1 started at time %3 by parent %4 running in sessi...
-opcode = 1
-opcodename = Start
-pid = 21852
-properties = ProcessID 24428 ProcessSequenceNumber 21586 CreateTime 2020...
-providerguid = {22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716}
-providername = Microsoft-Windows-Kernel-Process
-sid =
-task = 1
-taskname = ProcessStart
-tid = 21384
-timecreated = 132466947087647778
-usertime = 43437500
-version = 3
The trace record fields correspond to the fields described in
ETW event header fields with the -properties
field
being a dictionary that holds the
custom event data specified
to the event type.
Note the message string is actually a string template that includes
insert placeholders of the form %N
. The message string for
an event is formed by replacing the placeholders with appropriate
values from the -properties
dictionary for the trace record.
The etw_format_event_message
does this as shown in the sample
code below.
% proc print_process_starts {etl_file} {
set ra [etw_dump $etl_file]
set event_recs [recordarray getlist $ra -filter {
{-taskname eq ProcessStart}
} -format dict]
foreach rec $event_recs {
array set arec $rec
puts "$arec(-timecreated): [etw_format_event_message $arec(-message) \
$arec(-properties)]"
}
}
% print_process_starts $process_log
→ 132466947087647778: Process 24428 started at time 2020-10-09T05:25:08.764Z by
↳ parent 21852 running in session 1 with name
↳ \Device\HarddiskVolume3\Windows\System32\cmd.exe.
132466947089103424: Process 19308 started at time 2020-10-09T05:25:08.910Z by
↳ parent 21852 running in session 1 with name
↳ \Device\HarddiskVolume3\Windows\System32\cmd.exe.
Read trace records into a recordarray | |
Get list of ProcessStart records each formatted as a dictionary | |
process_log holds the name of the process trace file we had created earlier |
Although the above example only processed one trace file, etw_dump
as well as the lower level commands discussed later, will accept
multiple trace file arguments. The Windows libraries
will merge the records from multiple traces.
This merge facility is useful on older Windows systems where the kernel can only only log to a particular session that is not writable by any other providers. In this case, you can simply have the user mode provider log to a different file and then process the two files as a single event trace source. |
4.2. Processing traces using callbacks
The command etw_dump
, and the similar etw_dump_to_file
, is
useful for quick dumps or searches of log files. However, they
have the drawback of reading all trace records into memory which
may not be desirable if the traces are large and only a subset of
the records are of interest.
We now look at the lower level programming
interfaces which allow consuming of events in smaller chunks
using callbacks.
The first step in processing events from an event trace log file
is to open it with etw_open_file
.
% set kfile [etw_open_file $kernel_net_log]
→ 68
We also need a trace formatter which will translate the raw binary events into a form we can consume.
% set formatter [etw_open_formatter]
→ tdh7
We define a callback that will be invoked for every
buffer in the traces. The callback needs a formatter as it
will print records of interest. We will explicitly pass this
as the first parameter. The other two parameters are appended
on every invocation of the callback and contain the trace buffer
and the raw event records respectively. The callback will format
the raw events using etw_format_events
and then iterates
over the returned record array using a filter to restrict
the records to those related to TCP/IP. The NT Kernel Logger
distinguishes between different kernel components through
-taskname
event record field values like DiskIo
or TcpIp
.
We will filter the record array accordingly and print out
the connection information for each event.
Even on newer versions of Windows, the NT Kernel Logger is MOF-based and has an empty message field so we leave it out. |
% proc etw_cb {fmt bufd raw_event_list} {
set formatted_events [etw_format_events $fmt $bufd $raw_event_list]
recordarray iterate arec $formatted_events -filter {{-taskname eq TcpIp}} {
puts "$arec(-timecreated) $arec(-opcodename) $arec(-properties)"
}
}
We then do the actual processing of events by invoking
etw_process_events
. This command will invoke the callback for
every buffer in the trace and only return when all the events
have been consumed or the callback has terminated the event
processing. Note the formatter is explicitly passed to the
callback as its first parameter.
% etw_process_events -callback [list etw_cb $formatter] $kfile
→ 132466947092502159 ConnectIPV4 PID 21852 size 0 daddr 172.217.163.36 saddr
↳ 192.168.1.3 dport 80 sport 51751 mss 1380 sackopt 1 tsopt 0 wsopt 1 rcvwin
↳ 262200 rcvwinscale 8 sndwinscale 8 seqnum 0
132466947092643327 DisconnectIPV4 PID 21852 size 0 daddr 172.217.163.36 saddr
↳ 192.168.1.3 dport 80 sport 51751 seqnum 0
We finally need to close the trace consuming session as well as the formatter to release allocated resources.
% etw_close_session $kfile
% etw_close_formatter $formatter
There are additional options that can be used with the processing
of events using callbacks, for example, terminating the processing
before all events are processed or limiting the retrieved events
to a specific time interval. See the documentation of
etw_process_events
for details.
5. Real Time Tracing
ETW allows event traces to be consumed in real time in addition to or instead of being logged to a file for later processing. Normally, the buffers holding the trace data are flushed to the log file when they become full or when the session is closed. With real time mode, the buffers are flushed within a specific (system dependent) time even if they are not full. If real-time and logging to file are both enabled, this implies that the log file could grow quickly since buffers are written in fixed size blocks which may be only partially filled.
Real time mode had implementation issues on Windows systems prior to Vista due to which its use was advised only for low volume notification type events. |
For real-time tracing,
-
The event trace itself must be configured to be a real-time session, and
-
The event trace consumer must read events directly from the session instead of a log file.
5.1. Setting up a real time session
The mechanics of setting up a real-time trace are almost identical
to those described in Starting a trace. The only difference
is that the -realtime
option needs to be specified. Although not
shown in the sample code below, the -logfile
option could
have been specified as before to log the trace to a file as well.
% set htrace [etw_start_trace MyRTrace -realtime 1]
→ 41
% etw_enable_provider $htrace Microsoft-Windows-Kernel-Process 0x10 informational
% after 10
0x10 → flag to log process creation | |
Providers are enabled asynchronously so we provide a small delay to ensure we capture the exec trace |
5.2. Consuming a real time session
Consuming a real time session is also similar to consuming a trace from a log file with the following differences:
-
Because trace events will consumed directly in real time instead of a log file, the command
etw_open_session
has to be used instead ofetw_open_file
. -
The
etw_dump
command is not suitable for consuming real time traces for obvious reasons. Theetw_process_events
command must be used instead. -
In many situations the events need to be consumed in a separate thread.
This last point needs some elaboration. The Win32 function
ProcessTrace
on which etw_process_events
is layered, does not
return from the call until the trace is stopped or until the
callback return a break
status. In the former case, the program
is prevented from doing any other work while the trace exists.
In the latter case, the program has to periodically call
etw_process_events
but risks losing events.
Both these strategies are feasible depending on the application involved, rate of events etc. Below we demonstrate a third option - use of a separate thread - which alleviates both potential issues.
We start off by loading the Thread
package which allows manipluation
of threads at the Tcl script level.
% package require Thread
→ 2.8.5
We then create a thread using thread::create
, passing it a script
that will do the actual processing of events
in real time. The script is identical to that described
in Processing traces using callbacks with the exception that since
we are dealing with a real time trace, etw_open_session
is called instead of etw_open_file
. The script collects events in
real time and queues them to be retrieved by the main thread.
This is merely to demonstrate passing data between threads. In
a real application, the created thread could do all required processing
itself.
% set consumer_thread [thread::create {
package require twapi
namespace path twapi
proc etw_cb {fmt bufd raw_event_list} {
set formatted_events [etw_format_events $fmt $bufd $raw_event_list]
recordarray iterate arec $formatted_events {
lappend ::events [array get arec]
}
}
set formatter [etw_open_formatter]
set hsess [etw_open_session MyRTrace]
etw_process_events -callback [list etw_cb $formatter] $hsess
thread::wait
}]
→ tid0000000000000788
Do whatever processing is desired for the event here. For our tutorial we simply append to a global variable which will be retrieved by the main thread. | |
We want to wait until the main thread retrieves the event list |
Having started a separate thread for consuming trace events,
back in the main thread
we start a child process so as to trigger a trace event.
We use the combination of after
and vwait
for the process
to start and exit.
% after 5000 {set ::done 1}
→ after#6
% exec cmd /c exit &
→ 4588
% vwait ::done
We finish up as described in earlier sections by disabling the trace provider and stopping the trace.
% etw_disable_provider $htrace Microsoft-Windows-Kernel-Process
% etw_stop_trace $htrace
→ logfile {} trace_name MyRTrace trace_guid {{63D30290-0923-11EB-9ACD-ACED5C4BF...
Stopping the trace will also cause the etw_process_events
command
in the secondary thread to return. Because of the thread::wait
at the end of the secondary thread script, the thread will not
exit but will continue processing messages from other threads.
This allows the main thread to retrieve the collected events
and print them.
% print_list [thread::send $consumer_thread {set ::events}]
→ -opcodename Start -eventguid {{00000000-0000-0000-0000-000000000000}} -taskna...
-opcodename Stop -eventguid {{00000000-0000-0000-0000-000000000000}} -tasknam...
Finally, after retrieving the data we release the secondary thread so it can exit.
% thread::release $consumer_thread
→ 0
6. Providing Events
So far we have described how to implement event trace controllers and consumers. Now we look at implementing an event trace provider. This will allow us to use ETW to analyze the behaviour of our Tcl programs.
Before an event trace provider writes events to an event trace, it has to register itself. Any ETW controller can then attach the provider to a trace session to which events written by the provider will be directed.
TWAPI includes a predefined
provider that is registered through the etw_twapi_provider_register
call. The etw_log_message
command can then be used to write
trace events. In addition, TWAPI provides convenience commands
that make use of Tcl’s built-in trace
command to write ETW traces
without having to make changes to the code being analyzed.
Registering a provider permits trace events to be written. However,
trace consumers need to have access to event definitions in order
to parse event traces. The etw_install_twapi_mof
command
registers the TWAPI ETW provider event definitions in the system
so they can be accessed by consumers.
The next section demonstrates these features.
For compatibility with older Windows versions, this provider is MOF-based and can therefore only be attached to a single trace session at a time. |
6.1. Tracing Tcl execution with ETW
We now show a simple example of using ETW to analyse program
flow when retrieving a URL using the Tcl http
package.
We start off by registering ourselves as a ETW provider and starting
a ETW trace. This second action could also be done from a different
ETW controller program such as logman
.
% set hprov [etw_twapi_provider_register]
→ 66430108458154464
% set http_log [file join $::env(TEMP) http.etl]
→ D:/temp/http.etl
% set htrace [twapi::etw_start_trace MyHttpTrace -logfile $http_log]
→ 41
We then load the http
package and use
the Tcl trace
command to have the TWAPI ETW execution
trace logger automatically invoked whenever
the http::geturl
command is executed.
% package require http
→ 2.9.1
% trace add execution ::http::geturl {enter leave} [list \
::twapi::etw_execution_tracker $hprov]
As discussed in a previous section
we need to add ourselves to the event trace. Because we want to
also correlate execution with network events, we will also
add the Microsoft-Windows-Kernel-Network
provider to the trace.
We use the Microsoft-Windows-Kernel-Network provider instead of
the NT kernel logger so that we can add it to the same trace. However,
this provider is not available on older Windows versions.
|
% twapi::etw_enable_provider $htrace [twapi::etw_twapi_provider_guid] -1 verbose
% twapi::etw_enable_provider $htrace Microsoft-Windows-Kernel-Network -1 verbose
% after 10
Traces are controlled asynchronously so we put in a small delay to make sure they are enabled before we run our code. |
We now fetch a URL.
% etw_log_message $htrace "This event just demonstrates use of etw_log_message"
% http::geturl http://www.google.com
→ ::http::1
Assuming no more tracing is to be done, we remove the traces and unregister the provider.
% etw_disable_provider $htrace Microsoft-Windows-Kernel-Network
% etw_disable_provider $htrace [etw_twapi_provider_guid]
% etw_stop_trace $htrace
→ logfile {D:\temp\http.etl} trace_name MyHttpTrace trace_guid {{63D30291-0923-...
% etw_unregister_provider $hprov
% trace remove execution ::http::geturl {enter leave} [list \
::twapi::etw_execution_tracker $hprov]
Now that the tracing is done, we will read the trace as described
in Consuming Event Traces. We first need to install
out event definitions for the trace file to be parsed and then
dump the events. Note we filter out events from MSNT_SystemTrace
as they are of no interest for our purposes.
% etw_install_twapi_mof
% recordarray iterate arec [etw_dump $http_log] -filter {{-providername ne \
MSNT_SystemTrace}} {
puts "$arec(-timecreated): $arec(-providername) $arec(-opcodename) \
$arec(-properties)"
}
→ 132466947162515550: TwapiETWProvider Message Message {{This event just
↳ demonstrates use of etw_log_message}}
132466947162517646: TwapiETWProvider ExecutionTrace Operation enter Command
↳ {{http::geturl http://www.google.com}} Code {{}} Result {{}} Context {{}}
132466947162690862: Microsoft-Windows-Kernel-Network Connection attempted. PID
↳ 21852 size 0 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 mss
↳ 1380 sackopt 1 tsopt 0 wsopt 1 rcvwin 262200 rcvwinscale 8 sndwinscale 8 seqnum
↳ 0 connid 0
132466947162974441: Microsoft-Windows-Kernel-Network Data sent. PID 21852 size
↳ 189 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 startime
↳ 16947179 endtime 16947180 seqnum 0 connid 0
132466947163693000: Microsoft-Windows-Kernel-Network Protocol copied data on
↳ behalf of user. PID 21852 size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport
↳ 80 sport 51752 seqnum 0 connid 0
132466947163693631: Microsoft-Windows-Kernel-Network Data received. PID 21852
↳ size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
↳ connid 0
132466947163695007: Microsoft-Windows-Kernel-Network Protocol copied data on
↳ behalf of user. PID 21852 size 1059 daddr 172.217.163.36 saddr 192.168.1.3 dport
↳ 80 sport 51752 seqnum 0 connid 0
132466947163695238: Microsoft-Windows-Kernel-Network Data received. PID 21852
↳ size 1059 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
↳ connid 0
132466947163695579: Microsoft-Windows-Kernel-Network Protocol copied data on
↳ behalf of user. PID 21852 size 120 daddr 172.217.163.36 saddr 192.168.1.3 dport
↳ 80 sport 51752 seqnum 0 connid 0
132466947163695772: Microsoft-Windows-Kernel-Network Data received. PID 21852
↳ size 120 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
↳ connid 0
132466947163723013: Microsoft-Windows-Kernel-Network Protocol copied data on
↳ behalf of user. PID 21852 size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport
↳ 80 sport 51752 seqnum 0 connid 0
132466947163723756: Microsoft-Windows-Kernel-Network Data received. PID 21852
↳ size 5648 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
↳ connid 0
132466947163823975: Microsoft-Windows-Kernel-Network Protocol copied data on
↳ behalf of user. PID 21852 size 4236 daddr 172.217.163.36 saddr 192.168.1.3 dport
↳ 80 sport 51752 seqnum 0 connid 0
132466947163824253: Microsoft-Windows-Kernel-Network Data received. PID 21852
↳ size 4236 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
↳ connid 0
132466947163825306: Microsoft-Windows-Kernel-Network Data received. PID 21852
↳ size 179 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum 0
↳ connid 0
132466947163899066: TwapiETWProvider ExecutionTrace Operation leave Command
↳ {{http::geturl http://www.google.com}} Code 0 Result ::http::1 Context {{}}
132466947164047744: Microsoft-Windows-Kernel-Network Disconnect issued. PID
↳ 21852 size 0 daddr 172.217.163.36 saddr 192.168.1.3 dport 80 sport 51752 seqnum
↳ 0 connid 0
132466947164295180: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164295497: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164296721: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164297033: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164323703: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164324001: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164324616: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164324825: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164325279: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164325575: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164326494: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164326752: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164361206: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164361466: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164362229: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164362414: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164362928: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164363032: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164364332: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164364485: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164372876: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164373059: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164374262: Microsoft-Windows-Kernel-Network Data received. PID 10072
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54068 sport 54067 seqnum 0 connid 0
132466947164374391: Microsoft-Windows-Kernel-Network Data sent. PID 10072 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54067 sport 54068 startime 16947194
↳ endtime 16947194 seqnum 0 connid 0
132466947164559307: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164559536: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
↳ endtime 16947196 seqnum 0 connid 0
132466947164560206: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164560443: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
↳ endtime 16947196 seqnum 0 connid 0
132466947164560719: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164560817: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
↳ endtime 16947196 seqnum 0 connid 0
132466947164561306: Microsoft-Windows-Kernel-Network Data received. PID 12052
↳ size 1 daddr 127.0.0.1 saddr 127.0.0.1 dport 54062 sport 54061 seqnum 0 connid 0
132466947164561397: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size 1
↳ daddr 127.0.0.1 saddr 127.0.0.1 dport 54061 sport 54062 startime 16947196
↳ endtime 16947196 seqnum 0 connid 0
132466947164879973: Microsoft-Windows-Kernel-Network Data sent. PID 12052 size
↳ 969 daddr 104.244.42.65 saddr 192.168.1.3 dport 443 sport 51348 startime
↳ 16947194 endtime 16947199 seqnum 0 connid 0
For purpose of exposition, we have simply printed the event records.
For detailed analysis of program flow, you would add
enterstep
and leavestep
to the trace
command arguments. The
events would then include enough information to build a
complete call tree including nested calls, arguments and return values
at each level. Inclusion of system trace information, the
network traffic in this case, allows inclusion of flow information
through the operating system as well.
For all practical purposes,
the Tcl trace add execution command disables Tcl’s byte code
compilation. The above technique is therefore not suitable for
execution time measurements. For that purpose, you need to add
the calls to log ETW events, either explicitly to the commands of
interest or by redefining those commands to log an event before and
after invoking the original command.
|
7. References
- SDKETW
-
Event Tracing, Windows SDK documentation. User guide and reference for ETW.
- PARK2007
-
Improve Debugging And Performance Tuning With ETW, Park, Buch, MSDN Magazine, April 2007. Overview of the new manifest based ETW architecture, API’s and tools introduced in Vista.
- PARK2009
-
Core OS Events in Windows 7, Part 1, Park, Bendetov, MSDN Magazine, September 2009. Describes the kernel ETW providers in Windows 7 and their use.
- PARK2009A
-
Core Instrumentation Events in Windows 7, Part 2, Park, Bendetov, MSDN Magazine, October 2009. Continuation of prior article with additional kernel providers and usage examples.
- SOUL2012
-
Inside Windows Debugging, Soulami, Microsoft Press, 2012. Part 3 of this book provides detailed coverage of using ETW for application analysis.
- TUNS2002
-
Developing WMI Solutions, Tunstall, Cole, Addison-Wesley, 2002. Chapter 13 details all aspects of the older MOF-based ETW implementation.