Future Electronics – Debug and Trace in Embedded ARM Designs: Introducing CoreSight

By: Francesco Sinacori, Field Applications Engineer, Future Electronics

Read this to find out about:

  • The benefits of ARM’s CoreSight debug and trace tool for users of complex and multi-core processors
  • The simpler implementation of CoreSight that may be used with MCUs based on an ARM Cortex-M core
  • The potential to perform software optimization in CoreSight in addition to the basic bug-finding and bug-fixing functions

Microcontrollers based on ARM® processor cores have a very powerful and sophisticated set of features for debug and trace, which are provided within an overall framework called CoreSight. Because these features are so powerful, however, they are also complex.

For users of ARM’s simpler set of processor cores for microcontrollers, the ARM Cortex®-M series, the full CoreSight feature set, and the overhead that it places on the silicon implementation of the core, are more than is required. Yet in order to gain the benefit of the high-performance debug and trace capabilities in CoreSight, it is important to understand the function of each component and to work out how to use these functions to best meet the needs of a microcontroller design project.

This is particularly the case given that, in embedded design, there is always a trade-off between system performance and reliability on the one hand, and design time, design cost and time-to-market on the other. Adopting a more effective method of debugging can make this trade-off far less painful.

This article presents a simple view of the debug and trace capabilities provided by CoreSight, and shows how its functions are implemented in two families of ARM Cortex-M-based MCUs, providing the tools for software optimization without unnecessarily burdening the MCU design.

The Role of Software Optimization
Following the path of Moore’s Law, microcontrollers have become ever more powerful computing and signal-processing devices. With more computing power available to the designer, the software in embedded devices has become more complex and intricate.

So now embedded designers using microcontrollers have to invest a far greater proportion of their development effort in software optimization and quality assurance than they ever did in the past. Indeed, this is now a fundamental requirement for ensuring that a product is successful.

In this software optimization effort, debugging and software tracing are the primary techniques available to the designer. It is worth emphasizing that the purpose of debugging is not simply to fix bugs; it extends to the profiling of system behavior and to performance analysis, helping the designer to make the operation of the system as a whole as efficient and effective as possible.

The Types of Debugging Operation
The debugging of a microcontroller can be either invasive or non-invasive.

In invasive debugging, the user may halt the processor (by setting break points) and restart it, and may examine and alter the processor’s registers and the MCU’s memory (setting watch points to monitor data accesses).

In non-invasive debugging, the debugger observes the system without disturbing the normal code-execution flow, obtaining information about running threads and the associated program and/or data flow (that is, tracing).

Users of MCUs based on ARM processor cores can perform these debug and trace functions through CoreSight. CoreSight, an advanced debug and trace system, was developed by ARM in part to help the designers of complex, multicore Systems-on-Chip (SoCs), since previous generations of debug and trace tools built for conventional single-core processors were inadequate for these more complex devices. It enables a single debugger connection to control and interrogate all processors and their memory, and a single trace port to provide visibility of all trace sources in the system.

In terms of its basic construction, CoreSight consists of a collection of interface specifications and protocols, and a set of debug, trace and connection components implementing them.

One of the big advantages of this architecture is that the debug and trace interfaces are de-coupled from the processor design, a feature which allows SoC designers to add debug and trace capabilities for other IP cores to the CoreSight infrastructure.

But for users of the simpler, single-core ARM Cortex-M series, it also means that the complex CoreSight architecture can be broken down, to enable the user to study separately its debug and trace features.

Debugging Capabilities in CoreSight
Figure 1 shows a simplified schematic of the CoreSight debug architecture.

Figure 1: CoreSight implements debug features through a set of interface specifications and protocols

The debugging features are accessed through the Debug Access Port (DAP). This port provides a single external interface to a CoreSight system, providing real-time access for the debugger to Advanced Microcontroller Bus Architecture (AMBA) system memory and peripheral registers, to all debug configuration registers, and to JTAG scan chains. Access to the memory system is provided without any need to halt the core.

Externally, the DAP is linked to the system via a physical serial interface: a standard JTAG port or a reduced pin-count Serial Wire Debug (SWD) port.

The DAP accesses the cores through a debug bus, and can also be connected to a system bus for memory-mapped peripheral access and memory download. The Cross Trigger Matrix (CTM) and Cross Trigger Interface (CTI) on the debug bus allow for synchronization of operations between multiple cores, such as synchronous start and stop operations.

Debug components inside the system can be automatically located by the external debugger through a ROM table inside the DAP, which defines the topology of the system.
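The discovery step can be illustrated with a small host-side model. The sketch below is not a debugger implementation: it assumes the 32-bit ROM table entry layout as commonly documented for CoreSight (bit 0 marks a component as present, bits [31:12] hold a two's-complement offset from the table base, and an all-zero entry ends the table). The base address and the sample entries are assumptions chosen to resemble a typical ARM Cortex-M3/M4 device; a real tool would read them through the DAP.

```python
# Illustrative model only: decodes 32-bit CoreSight ROM table entries.
ROM_TABLE_BASE = 0xE00FF000  # assumed base address (typical Cortex-M3/M4 value)

# Assumed sample entries; a real debugger would read these over the DAP.
SAMPLE_ENTRIES = [
    0xFFF0F003,  # System Control Space
    0xFFF02003,  # DWT
    0xFFF03003,  # FPB
    0xFFF01003,  # ITM
    0xFFF41002,  # TPIU slot, marked "not present" in this example
    0x00000000,  # end-of-table marker
]


def decode_rom_table(base, entries):
    """Return the base addresses of the components marked as present."""
    found = []
    for entry in entries:
        if entry == 0:
            break  # an all-zero entry terminates the table
        if not entry & 0x1:
            continue  # bit 0 clear: component not implemented
        offset = entry & 0xFFFFF000  # bits [31:12]: two's-complement offset
        # Masking to 32 bits makes the addition wrap, which handles
        # negative offsets without explicit sign extension.
        found.append((base + offset) & 0xFFFFFFFF)
    return found


component_addresses = decode_rom_table(ROM_TABLE_BASE, SAMPLE_ENTRIES)
for address in component_addresses:
    print(hex(address))
```

With these assumed entries, the model lists four component base addresses and skips the slot whose present bit is clear, which is how a debugger can tell which optional components a given MCU actually implements.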

Figure 2: The architecture governing trace operations in CoreSight

Performing Trace Operations in CoreSight
The trace architecture in the CoreSight system (see Figure 2) supports both hardware and software trace. In software trace, the application code itself generates debug messages. The Instrumentation Trace Macrocell (ITM) simplifies this process and reduces its overhead, providing a dedicated trace buffer with a deterministic cycle time.
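To make the idea of software trace more concrete, the following sketch models how a debug message might be framed as ITM software-source packets. It assumes the packet layout as understood from the ARMv7-M architecture documentation: a one-byte header carrying the stimulus port number in bits [7:3] and a size code in bits [1:0], followed by a 1-, 2- or 4-byte payload. The function names are hypothetical, and the layout should be checked against the reference manual before it is relied upon.

```python
# Host-side model of ITM software-source packet framing (assumed layout).
SIZE_CODES = {1: 0b01, 2: 0b10, 4: 0b11}  # payload length -> header size code


def encode_itm_software_packet(port, payload):
    """Frame one payload for the given stimulus port (0 to 31)."""
    if not 0 <= port <= 31:
        raise ValueError("stimulus port must be in the range 0..31")
    if len(payload) not in SIZE_CODES:
        raise ValueError("payload must be 1, 2 or 4 bytes long")
    header = (port << 3) | SIZE_CODES[len(payload)]
    return bytes([header]) + bytes(payload)


def trace_string(port, text):
    """Emit one single-byte packet per character, as a printf-style trace would."""
    stream = b""
    for character in text.encode("ascii"):
        stream += encode_itm_software_packet(port, bytes([character]))
    return stream


packets = trace_string(0, "OK")
print(packets.hex())
```

Each character costs only one header byte of overhead in this model, which illustrates why routing messages through the ITM is lighter than formatting and sending them over a UART.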

Hardware trace is supported through the Embedded Trace Macrocell (ETM). The ETM monitors processor activity, generating instruction traces (which are very useful in software profiling and analysis of code coverage).

Several trace streams (of different types and/or from different sources) can be multiplexed through the Trace Funnel component. Finally, the combined trace streams may be stored in on-chip Embedded Trace Buffers, or may leave the chip through the Trace Port Interface Unit (TPIU), which drives the external trace port.

When combined, the debug and trace capabilities of the entire CoreSight architecture are structured as shown in Figure 3.

Figure 3: The complete CoreSight architecture (Source: ARM)

As stated above, this architecture is scaled for the requirements of complex, multi-core SoCs. So what about users of the relatively simple ARM Cortex-M microcontrollers? The characteristics of standard CoreSight components – both gate count and power consumption – could in many cases be incompatible with the limitations of the typical applications for which these smaller cores are suitable.

This is why the debug components for ARM Cortex-M cores are configured differently from traditional CoreSight components, while remaining compliant with its communications interface and protocols. (This allows the ARM Cortex-M cores to be integrated into multi-core systems with other ARM processors with a unified trace and debug system.)

Specifically, microcontrollers based on the ARM Cortex-M3 and -M4 cores will have a special version of the TPIU that occupies a smaller area of the core’s die.

The debug features embedded in the ARM Cortex-M cores, then, are a subset of the ARM CoreSight Design Kit. These features are shown in Figure 4, an implementation of the CoreSight architecture in a commercial series of ARM Cortex-M4-based microcontrollers from STMicroelectronics, part of its STM32 family.

It should be noted that as the TPIU and ETM require dedicated pins, they are available only in versions with a relatively large package, where the corresponding pins are mapped.

A careful look at Figure 4 reveals additional debug components not included in the simplified debug architecture shown in Figure 1: a Flash Patch and Breakpoint unit (FPB) and a Data Watchpoint and Trace unit (DWT).

The FPB can be used either to support hardware break points in both program code and literals, or as a patch unit which may be used to correct software bugs located in the code memory space.

The DWT consists of comparators and counters, supporting hardware watch points, ETM triggers, program counter sampling and data address sampling. The counters can also generate profiling information such as sleep cycles, interrupt overhead, cycles per instruction, total clock cycles and folded instructions. This information is very useful in software optimization and real-time application debugging.
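As an illustration of how such counters might feed a profiling calculation, the sketch below applies the relationship given in ARM's documentation for the profiling counters, in which the number of executed instructions is estimated as CYCCNT - CPICNT - EXCCNT - SLEEPCNT - LSUCNT + FOLDCNT. The function names and the sample values are invented for illustration. The sketch also assumes that the inputs are totals already accumulated by the debug tool, since several of the hardware counters are narrow and overflow quickly.

```python
# Profiling arithmetic on DWT-style counter totals (sample values are invented).
def estimate_instructions(cyccnt, cpicnt, exccnt, sleepcnt, lsucnt, foldcnt):
    """Estimate executed instructions from accumulated counter totals."""
    return cyccnt - cpicnt - exccnt - sleepcnt - lsucnt + foldcnt


def cycles_per_instruction(cyccnt, instructions):
    """Average number of clock cycles spent per executed instruction."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cyccnt / instructions


# Hypothetical totals collected over one profiling window.
instructions = estimate_instructions(
    cyccnt=1000, cpicnt=120, exccnt=40, sleepcnt=200, lsucnt=60, foldcnt=20
)
cpi = cycles_per_instruction(1000, instructions)
print(instructions, round(cpi, 3))
```

A window with a large sleep or exception share shows up directly as a higher cycles-per-instruction figure, which is the kind of signal that points to where optimization effort is best spent.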

The DWT can also generate hardware trace packets sent to the TPIU by the ITM; each packet is time-stamped through a 21-bit counter in the ITM.
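The width of the time-stamp counter limits the interval that one counter range can span. The arithmetic below is a sketch only: the 48MHz clock is an assumed example value, and the prescaler parameter is a hypothetical stand-in for whatever clock division a particular device applies to the counter.

```python
# How long a 21-bit time-stamp counter can run before it wraps.
TIMESTAMP_BITS = 21
TIMESTAMP_MODULUS = 1 << TIMESTAMP_BITS  # 2,097,152 counter states


def timestamp_span_seconds(clock_hz, prescaler=1):
    """Time covered by one full counter range at the given clock rate."""
    if clock_hz <= 0 or prescaler <= 0:
        raise ValueError("clock rate and prescaler must be positive")
    return TIMESTAMP_MODULUS * prescaler / clock_hz


span = timestamp_span_seconds(48_000_000)  # assumed 48MHz time-stamp clock
print(f"{span * 1000:.2f} ms")
```

At the assumed clock rate the counter covers a little under 44ms, so events separated by longer gaps need either a prescaled counter or additional handling in the trace tool.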

Simpler cores such as the ARM Cortex-M0 and ARM Cortex-M0+ do not support ETM or TPIU. Figure 5 shows how debugging is supported in the STM32L0 family of MCUs from STMicroelectronics, based on the ARM Cortex-M0+ core. The Break Point Unit (BPU) supports break points on instruction fetches only, and only in the first 512MB of program memory. This memory coverage is ample for the types of applications implemented by STM32L0 MCUs.
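The address restriction is easy to express as a range check. The sketch below reads the limit quoted above as 512 megabytes, which puts the boundary at address 0x20000000. The function name is hypothetical, and the two sample addresses assume the usual STM32 memory map, with flash at 0x08000000 and SRAM starting at 0x20000000.

```python
# Range check for BPU-style hardware break points (illustrative only).
BPU_LIMIT = 512 * 1024 * 1024  # first 512MB of the address space: 0x2000_0000


def can_use_hardware_breakpoint(address):
    """True if the address lies inside the region the BPU can match."""
    return 0 <= address < BPU_LIMIT


flash_address = 0x08000400  # assumed: code located in on-chip flash
sram_address = 0x20000100   # assumed: code copied into SRAM

print(hex(BPU_LIMIT))
print(can_use_hardware_breakpoint(flash_address))
print(can_use_hardware_breakpoint(sram_address))
```

Under these assumptions, code executing from flash is covered, while a routine copied into SRAM falls outside the comparators' range and would need another break-point mechanism.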

The implementation of debugging in some commercial ARM Cortex-M0+-based MCUs is a little different. For instance, the Atmel SAMD20 and NXP Semiconductors LPC800 and LPC11U6x families perform execution trace through a Micro Trace Buffer (MTB). Using this optional, low gate-count, CoreSight-compliant component, the programmer allocates a small part of the system SRAM to function as a trace buffer; the MTB stores instruction-flow information to the reserved SRAM as a circular buffer.

Figure 4: Implementation of CoreSight architecture in an ARM Cortex-M4 MCU from STMicroelectronics

Figure 5: Debug architecture of the STM32L0 family of microcontrollers

After the processor is halted, for example at a break point, the debugger can retrieve the trace information via the Serial Wire Debug (SWD) connection. The MTB can also support ‘one-shot’ triggering. This trace peripheral is very powerful and makes program trace possible even on very small cores.
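The circular-buffer behavior of the MTB can be modeled in a few lines. This is a conceptual sketch rather than a description of the hardware: it assumes that each trace packet records the source and destination address of one change in program flow, and the class name, method names and addresses are invented for illustration.

```python
# Conceptual model of a Micro Trace Buffer held in a reserved slice of SRAM.
class MicroTraceBufferModel:
    def __init__(self, capacity_packets):
        if capacity_packets <= 0:
            raise ValueError("capacity must be positive")
        self.slots = [None] * capacity_packets  # reserved SRAM region
        self.write_index = 0                    # next slot to overwrite
        self.wrapped = False                    # set once old data is lost

    def record_branch(self, source, destination):
        """Store one (source, destination) pair, overwriting the oldest entry."""
        self.slots[self.write_index] = (source, destination)
        self.write_index += 1
        if self.write_index == len(self.slots):
            self.write_index = 0
            self.wrapped = True

    def read_oldest_first(self):
        """What a debugger would reconstruct after the core halts."""
        if not self.wrapped:
            return self.slots[:self.write_index]
        return self.slots[self.write_index:] + self.slots[:self.write_index]


mtb = MicroTraceBufferModel(capacity_packets=3)
for source, destination in [(0x100, 0x200), (0x204, 0x300),
                            (0x310, 0x100), (0x104, 0x400)]:
    mtb.record_branch(source, destination)

history = mtb.read_oldest_first()
print([(hex(s), hex(d)) for s, d in history])
```

With room for only three packets, the first branch is overwritten by the fourth, so the debugger sees the most recent history leading up to the halt. The amount of SRAM reserved for the buffer therefore sets how far back the trace reaches.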

ARM has provided the designer of complex, multi-core SoCs with a powerful debug and trace tool in CoreSight, although it is somewhat over-specified for users of MCUs based on the smaller ARM Cortex-M series of cores. The flexibility of the CoreSight architecture, however, has allowed for a streamlined implementation in various MCU families from STMicroelectronics and other MCU vendors. This gives design engineers access to powerful tools for software optimization without burdening their systems with an excessive gate-count or power-consumption overhead, and without creating an excessively complex tool for the designer to use.