Exploring Tcl internals from script - Part II

Published 2017-08-04, updated 2017-08-04

A previous blog post described the representation command and its use for introspecting Tcl's internal structures for storing data. I promised a follow-up post that talked about Tcl's compiled byte code and the disassemble command for inspecting it. Well, only two years later, here is that post as promised.

To be fair to myself, the delay is a consequence of all my spare cycles being devoted to writing a book, The Tcl Programming Language, which is now finally complete. In fact this post is adapted from material from the book.

Code execution

Let us start with the sequence of steps required to execute a procedure. In simplifed terms,

At the time of procedure definition, Tcl does very little work per se. The body of the procedure is simply stored in a location associated with the procedure name. It is a plain string at this point; there is no check of any kind for syntax errors etc.
When a procedure is invoked, a check if made to see if it has a valid compiled form. If not, the procedure body is compiled into byte code. You can think of byte codes as being an instruction set for a virtual machine -- the Tcl byte code interpreter. Compiling into byte code saves on the parsing step the next time the procedure is invoked resulting in faster execution.
Any supplied arguments are pushed onto the call stack and the procedure is invoked.

Of these steps, the second is the one of most interest and we can use the disassemble command to examine what the compiled body of a procedure looks like. We will look at the byte code generated for a simple procedure, demo, that returns the first word in a string.

proc demo {line} {
    set words [split $line $::splitter]
    return [lindex $words 0]
}

The `disassemble` command

We can look at the compiled form of this procedure with the disassemble command which, like representation is placed in the tcl::unsupported namespace. Note, as an aside, that there is also a getbytecode command that is similar but returns the disassembled code as a dictionary instead of in human readable form.

After defining the above procedure, type the following command at the Tcl shell prompt and examine the resulting output.

% tcl::unsupported::disassemble proc demo
�?? ByteCode 0x000000000403D300, refCt 1, epoch 16, interp 0x0000000003B661A0 (epoch 16)
    Source "\n    set words [split $line $::splitter]\n    return [..."
    Cmds 4, src 70, inst 30, litObjs 2, aux 0, stkDepth 3, code/src 0.00
    Proc 0x0000000003F89A10, refCt 1, args 1, compiled locals 2
        slot 0, scalar, arg, "line"
        slot 1, scalar, "words"
    Commands 4:
        1: pc 0-11, src 5-39        2: pc 0-8, src 16-38
        3: pc 12-28, src 45-68        4: pc 21-27, src 53-67
    Command 1: "set words [split $line $::splitter]..."
    Command 2: "split $line $::splitter..."
      (0) push1 0   # "split"
      (2) loadScalar1 %v0   # var "line"
      (4) push1 1   # "::splitter"
      (6) loadStk 
      (7) invokeStk1 3 
      (9) storeScalar1 %v1  # var "words"
      (11) pop 
    Command 3: "return [lindex $words 0]..."
      (12) startCommand +17 2   # next cmd at pc 29, 2 cmds start here
    Command 4: "lindex $words 0..."
      (21) loadScalar1 %v1  # var "words"
      (23) listIndexImm 0 
      (28) done 
      (29) done

Going through the disassembled listing line by line will provide us with a basic understanding of the byte code interpreter.

The first part of the disassembly provides summary information about the compiled procedure. The fields are shown in the table below.

Field	Description
`ByteCode`	Memory address where the compiled byte code is stored.
`interp`	Memory address of the Tcl interpreter structure owning the byte code
`refCt`	The number of references to the byte code structure which is reference counted.
`epoch`	The compiler epoch.
`Source`	The Tcl source code from which the byte code was compiled.
`Cmds`	The number of script level commands in the compiled source
`src`	The number of characters in the script
`inst`	The number of bytes in the generated byte code
`litObjs`	The size of the local literals table.
`stkDepth`	The maximum stack depth utilized by this byte code fragment. This is used at run time to preallocate the required stack space before the byte code is executed.
`Proc`	The address of the procedure descriptor associated with the byte code.
`args`	The number of parameters defined by the procedure.
`compiled locals`	Size of the local variable table.
`slot`	Entries in the local variable table.

The remaining lines in the disassembly are discussed below.

Compiler epochs

The epoch field denotes a compiler epoch. Earlier we mentioned in passing that a procedure may have associated byte code which is invalid and needs to be recompiled. This can happen because of Tcl's dynamic nature. For instance, consider what happens when a built-in command like lindex is redefined. Calls to many built-in commands are compiled to a sequence of inline byte code instructions like listIndexImm above. If this command is redefined, the compiled byte code for our procedure would not longer be correct as listIndexImm would (likely) not be the correct implementation of the redefined lindex. Tcl detects this situation by maintaining a compiler epoch. This epoch is incremented on any action, such as command definition, that would invalidate any compiled byte code. The epoch at the time a procedure is compiled is also stored with the compiled byte code as shown above. When the procedure is called, the procedure is recompiled if its compilation epoch is not the same as the current compiler epoch.

The local literals table

As we saw in representation, the compiler maintains a table of literals so that they can be shared across the interpreter. Correspondingly, the compiled byte code unit maintains a table that points to locations in this global literal table. The purpose of this indirection is that the corresponding byte code instructions can use a single byte to index into the local literal table which is not possible for the global table which tends to be quite large. For example, the byte code instruction

push1 0

pushes the first (i.e. index 0) from the literal table on to the evaluation stack.

Note that the literal table is not just used for literal operands that appear in the script but also for things like names of commands and variables. In our example, there are two literals in the table as indicated by the value of the litObjs field. The disassembler-generated comments for the push1 instructions at locations 0 and 4 indicate these correspond to the command name split and global variable ::splitter respectively. Notice though that there is no literal object corresponding to the lindex command. We shall shortly see why that is so.

The local variable table

Local variables and arguments are stored in slots allocated in the call frame when a procedure is invoked. Our disassembly shows two slots defined for our procedure, one for the line argument and the other for the words local variable.

slot 0, scalar, arg, "line"
slot 1, scalar, "words"

The key thing to note about local variables is that access to slots is much faster than access to global and namespace variables. We will see why in the next section.

The stack machine

The Tcl byte code interpreter is stack-based (as opposed to register-based) where byte code instructions operate on operands placed on an evaluation stack. This evaluation stack is internal to the byte code interpreter and not to be confused with the C-stack or procedure call stack discussed earlier. Moreover, each Tcl interpreter (not to be confused with the byte code interpreter!) has its own evaluation stack.

The call to split in our example is compiled to the byte code

Command 2: "split $line $::splitter..."
  (0) push1 0   # "split"
  (2) loadScalar1 %v0   # var "line"
  (4) push1 1   # "::splitter"
  (6) loadStk 
  (7) invokeStk1 3 
  (9) storeScalar1 %v1  # var "words"
  (11) pop

Here the actual call to split takes place via the invokeStk1 instruction. The corresponding three arguments to be pushed on the stack are the name of the command to be invoked, the value of the line local variable, and the value of the ::splitter global variable.

The command name, split, is stored in the local literals table at index 0. The push1 0 instruction pushes it onto the evaluation stack.
The local variable line is stored at in the local variable slot 0. The loadScalar1 instruction pushes the value of the variable at this slot onto the stack.
The global variable ::splitter needs to be handled differently. The variable name is stored at index 1 in the local literal table. However, we need to pass the value of this variable and not the name. This is done in two steps. First, the push1 1 instruction places the literal table entry at index 1, i.e. the literal ::splitter, on to the top of the stack. The loadStk instruction then looks up the variable by name and replaces the top of the stack with the value of the variable.

This now explains why access to local variables is much faster than to globals. The former is a direct indexed look-up into the local slots whereas the latter involves locating the global via a hash table (as part of execution of loadStk) and then retrieving its value.

The invokeStk1 instruction looks up the supplied command name and executes it, passing the arguments on the stack. The result of the command replaces all arguments passed in. The storeScalar1 instruction then stores the value on the top of the stack into the local variable slot at index 1. Finally, the stack is cleaned up by popping the topmost entry with pop.

Inlined byte code

The byte code instruction invokeStk1 used to invoke commands needs to resolve the command name in the current namespace context to locate the appropriate C function to call, wrap arguments into a form accessible to the C function, arrange for exception handling and so on. These operations make invocation of commands relatively expensive. As an optimization therefore, Tcl will directly inline byte code for many built-in commands at the call site itself. In our example, contrast the call to split with the call to lindex. The latter does not involve a call to invokeStk at all. Instead the byte code sequence implementing the command, a single instruction listIndexImm in this case, is directly inserted in the procedure body.

(21) loadScalar1 %v1    # var "words"
(23) listIndexImm 0

At the time of compilation, before inlining the command, the compiler ensures that the name resolves to the built-in command and not some procedure of the same name. Moreover as discussed before, redefinition of the command invalidates the generated byte code by bumping the compiler epoch. This ensures the inlined code is valid whenever in use.

You might have noticed the inlined command is prefixed with a startCommand instruction. This is present to take care of some special checks that are not needed with non-inlined invocation because the invokeStk1 command internally takes care of them. These checks include verifying that the inlined byte code is still valid by checking the compiler epoch and that any set interpreter limits are not crossed.

Note that at some point in the future, it is possible that the split command may have an inline implementation as well. This would depend on the complexity of the implementation and how commonly the command is used in programs.