Real-Time Embedded Systems Lab
School of Computing, Informatics, and Decision Systems Engineering
Ira A. Fulton School of Engineering, Arizona State University



Modeling and Analysis of Embedded Software in Multicore Systems

Multicore processors have been widely applied in server and desktop systems. Given their high performance and energy efficiency, the technology has attracted great interest in the embedded application domain. Porting multi-threaded embedded software to multicore processor platforms may look straightforward as long as the underlying RTOS is SMP-ready. However, to run multi-threaded embedded software correctly, any potential data races caused by concurrent accesses to shared data must be detected and identified. To fully utilize the computation power of multicore architectures, thread synchronization and scheduling mechanisms should be augmented to reduce operation overhead and execution interference among processor cores. Finally, any nondeterministic execution resulting from timing and scheduling variations should be confined.

Model of Embedded Program Execution

Most embedded applications are constructed with multiple threads to handle concurrent events. Threads are synchronized for proper sharing of resources and data. To model the concurrent execution of threads and their interaction with the external environment, we adopt the following model:

$T_1 \dots T_n$ : Application threads
$T_0$ : System thread for OS and device activities
Event $e$ is generated by the execution of event function $f$, written $f \rightarrow e$
                     "Thread enters $f$" -- the logical order is decided
                     "Event $e$ happens" -- $e$ is globally posted

     Embedded program execution is represented by a graph $G = \langle V, E \rangle$, where $V$ is the set of events and $(e_a, e_b) \in E$ iff $e_a$ happens before $e_b$.
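The happens-before graph can be made concrete with a small sketch. Beyond the model above, it assumes that each thread's events are listed in program order and that cross-thread orderings (for example, T0 posting an event that an application thread waits for) are supplied as explicit pairs; E is then taken as the transitive closure of these edges. The function names `happens_before_relation` and `concurrent` are illustrative only.

```python
def happens_before_relation(threads, sync_edges):
    """Build G = <V, E> for the execution model.

    threads: dict mapping a thread name to its events in program order.
    sync_edges: (ea, eb) pairs where ea in one thread is known to precede eb
    in another (assumed input, e.g. an event post and the matching wait).
    Returns (V, E), where E is the transitive closure of all ordering edges.
    """
    V = {e for events in threads.values() for e in events}
    E = set()
    for events in threads.values():
        E.update(zip(events, events[1:]))   # program order within a thread
    E.update(sync_edges)                    # cross-thread ordering
    # Warshall's algorithm: if i -> k and k -> j, then i -> j.
    for k in V:
        for i in V:
            if (i, k) in E:
                for j in V:
                    if (k, j) in E:
                        E.add((i, j))
    return V, E


def concurrent(E, a, b):
    """Two distinct events are concurrent if neither happens before the other."""
    return a != b and (a, b) not in E and (b, a) not in E


# T0 (system thread) posts an event that T1 waits for; T2 runs independently.
threads = {
    "T0": ["post"],
    "T1": ["start", "wait", "use"],
    "T2": ["write"],
}
V, E = happens_before_relation(threads, [("post", "wait")])
print(("post", "use") in E)           # True: post -> wait -> use
print(concurrent(E, "write", "use"))  # True: no ordering between T2 and T1
```

Events that are concurrent in this sense are exactly the ones whose relative order can vary between runs, which is where timing and scheduling variations can change program behavior.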

Analysis of Probe Effect

Software instrumentation has been a convenient and portable approach for dynamic analysis, debugging, and profiling of program execution. Unfortunately, instrumentation may change the temporal behavior of a multi-threaded program and result in a different ordering of thread operations, a phenomenon called the probe effect. While approaches to reduce instrumentation overhead, to enable reproducible execution, and to enforce deterministic threading have been studied, no research has yet answered whether an instrumented execution behaves the same as the program execution without any instrumentation, and how the execution changes if it does not. In this research, we propose a simulation-based analysis to detect changes in execution event ordering that are induced by instrumentation operations. An execution model of the program is constructed from the trace of the instrumented execution and is used in a simulation analysis in which the instrumentation overhead is removed. As a consequence, we can infer the ordering of events in the original program execution and verify whether a probe effect results from instrumentation.
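As a rough illustration of removing instrumentation overhead from a trace, the sketch below makes strong simplifying assumptions that are not taken from the paper: each trace record carries the instrumentation time its thread spent just before the event (a hypothetical trace format), threads never block on each other so removing overhead simply shifts a thread's later events earlier, and a probe effect is flagged when the order of a chosen set of contending events (for example, two requests for the same lock) changes. A real simulation would also have to model synchronization and scheduling; this only shows the idea of comparing the recorded ordering with the estimated overhead-free ordering.

```python
def adjusted_times(trace):
    """Estimate uninstrumented timestamps by removing each thread's accumulated
    probe cost. Assumes threads do not block on one another (simplification).

    trace: list of (thread, event, timestamp, probe_cost) records, where
    probe_cost is the instrumentation time spent on that thread right before
    the event (hypothetical trace format).
    """
    spent = {}
    estimated = {}
    for thread, event, ts, cost in trace:
        spent[thread] = spent.get(thread, 0) + cost
        estimated[event] = ts - spent[thread]
    return estimated


def ordering(times, events):
    """Order events by time, breaking ties by name to keep the result stable."""
    return sorted(events, key=lambda e: (times[e], e))


def probe_effect(trace, contending):
    """Return (changed, recorded_order, estimated_original_order)."""
    recorded = {event: ts for _, event, ts, _ in trace}
    before = ordering(recorded, contending)
    after = ordering(adjusted_times(trace), contending)
    return before != after, before, after


trace = [
    ("T1", "T1_init", 3, 1),
    ("T2", "T2_init", 4, 2),
    ("T1", "T1_lock", 10, 1),   # T1 has spent 2 time units in probes so far
    ("T2", "T2_lock", 12, 3),   # T2 has spent 5 time units in probes so far
]
changed, before, after = probe_effect(trace, ["T1_lock", "T2_lock"])
print(changed, before, after)
# True ['T1_lock', 'T2_lock'] ['T2_lock', 'T1_lock']
```

Here the more heavily instrumented thread T2 appears to request the lock second only because of its probes; with the overhead removed, its request would come first.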

On the Existence of Probe Effect in Multi-threaded Embedded Programs -- EMSOFT 2014

Race Detection for C/C++ Embedded Programs

To detect races precisely without false alarms, vector-clock-based race detectors can be applied if their time and space overhead can be contained. This is indeed the case for applications developed in object-oriented programming languages, where objects can be used as detection units. Embedded applications, on the other hand, are often written in C/C++ and necessitate fine-grained detection approaches that lead to significant execution overhead. In this research, we look into a dynamic granularity algorithm for vector-clock-based data race detectors. The algorithm exploits the fact that neighboring memory locations tend to be accessed together and can therefore share the same vector clock, achieving dynamic granularity of detection. The proposed heuristic for sharing vector clocks is simple but robust; it can improve performance in both time and space with minimal loss of detection accuracy. The algorithm is implemented on top of FastTrack and uses the Intel Pin tool for dynamic binary instrumentation. Experimental results on benchmarks show that, on average, the race detection tool using the dynamic granularity algorithm is 43% faster than FastTrack with byte granularity and uses 60% less memory. A comparison with existing industrial tools, Valgrind DRD and Intel Inspector XE, also suggests that the proposed dynamic granularity approach is viable.
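The core check of a vector-clock detector, plus one possible way for neighboring bytes to share clock state, can be sketched as follows. This is not the paper's algorithm or FastTrack itself: it keeps only FastTrack-style write epochs (thread, clock), checks write-write races only, ignores reads, and uses a hypothetical sharing rule in which a newly written byte reuses its left neighbor's shadow cell when that neighbor was last written by the same thread at the same clock. The class and method names are illustrative.

```python
class SharedClockDetector:
    """Toy write-write race detector with vector clocks and shared shadow cells."""

    def __init__(self, nthreads):
        # vc[t][u]: what thread t knows of thread u's clock; each thread starts at 1.
        self.vc = [[1 if u == t else 0 for u in range(nthreads)]
                   for t in range(nthreads)]
        self.lock_vc = {}   # lock name -> vector clock at its last release
        self.shadow = {}    # byte address -> shadow cell (may be shared)

    def release(self, t, lock):
        self.lock_vc[lock] = list(self.vc[t])
        self.vc[t][t] += 1          # start a new epoch for thread t

    def acquire(self, t, lock):
        if lock in self.lock_vc:    # join the releaser's knowledge
            self.vc[t] = [max(a, b) for a, b in zip(self.vc[t], self.lock_vc[lock])]

    def write(self, t, addr):
        """Record a write by thread t; return True if it races with the last write."""
        now = (t, self.vc[t][t])
        cell = self.shadow.get(addr)
        if cell is None:
            left = self.shadow.get(addr - 1)
            # Hypothetical heuristic: share the neighbor's cell if it holds
            # the same write epoch, otherwise allocate a fresh cell.
            cell = left if left is not None and left["w"] == now else {"w": None}
            self.shadow[addr] = cell
        race = False
        if cell["w"] is not None:
            u, c = cell["w"]
            # The previous write c@u is ordered before this one only if c <= vc[t][u].
            race = u != t and c > self.vc[t][u]
        cell["w"] = now
        return race

    def cell_count(self):
        return len({id(cell) for cell in self.shadow.values()})


d = SharedClockDetector(2)
for a in range(4):
    d.write(0, a)              # T0 writes bytes 0..3: one shared cell
print(d.cell_count())          # 1
print(d.write(1, 2))           # True: unsynchronized write by T1
for a in range(8, 12):
    d.write(0, a)              # second group of bytes, second cell
d.release(0, "L")
d.acquire(1, "L")
print(d.write(1, 9))           # False: ordered through lock L
```

Because cells are never split in this sketch, sharing can cause false alarms: after the calls above, `d.write(0, 0)` returns `True` even though only T0 ever wrote byte 0, since T1's write to byte 2 updated the shared cell. A practical heuristic would need to detect such divergence and split the cell, which is where the trade-off between granularity and accuracy lies.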

Efficient Data Race Detection for C/C++ Programs Using Dynamic Granularity -- IPDPS 2014

Profiling with Minimal Instrumentation

For program optimization and debugging, dynamic analysis tools such as profilers and data race detectors are widely used. To gather execution information, software instrumentation is often employed for its portability and convenience. Unfortunately, instrumentation overhead may change the execution of a program and lead to distorted analysis results, i.e., the probe effect. In embedded software, which usually consists of multiple threads and external inputs, program executions are determined by the timing of external inputs and the order of thread executions. Hence, the probe effect incurred in an analysis of embedded software will be more prominent than in desktop software. This research investigates a reliable dynamic analysis method for embedded software using deterministic replay. The idea is to record thread executions and I/O with minimal recording overhead and to apply dynamic analysis tools to the replayed execution. To this end, we have developed a record/replay framework called P-Replayer, based on Lamport's happens-before relation. Our experimental results show that dynamic analyses can be conducted in the replayed execution enabled by P-Replayer as if there were no instrumentation on the program.
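The recording side of a happens-before-based record/replay scheme can be sketched with Lamport clocks. This is an illustrative sketch, not P-Replayer's implementation: it assumes every synchronization or I/O operation is reported as (thread, object, kind), that the caller serializes the calls, and that ordering all operations on the same shared object is sufficient. Each event receives a timestamp larger than both its thread's previous event and the object's previous event, so if one recorded event happens before another, it gets a smaller timestamp.

```python
class LamportRecorder:
    """Record sync/I-O events with Lamport timestamps (illustrative format)."""

    def __init__(self):
        self.thread_clock = {}   # thread -> clock of its last event
        self.object_clock = {}   # shared object -> clock of its last event
        self.log = []            # (clock, thread, object, kind)

    def record(self, thread, obj, kind):
        clock = max(self.thread_clock.get(thread, 0),
                    self.object_clock.get(obj, 0)) + 1
        self.thread_clock[thread] = clock
        self.object_clock[obj] = clock
        self.log.append((clock, thread, obj, kind))
        return clock

    def schedule(self):
        """Per-object thread order, which a replayer would enforce."""
        order = {}
        for clock, thread, obj, _ in sorted(self.log):
            order.setdefault(obj, []).append(thread)
        return order


r = LamportRecorder()
r.record("T1", "L", "acquire")   # 1
r.record("T1", "L", "release")   # 2
r.record("T2", "L", "acquire")   # 3: after T1's release of L
r.record("T2", "uart", "read")   # 4: later in T2's program order
r.record("T1", "uart", "read")   # 5: after T2's read of the same device
print(r.schedule())
# {'L': ['T1', 'T1', 'T2'], 'uart': ['T2', 'T1']}
```

Only the relative order of operations on each shared object is kept, which is what keeps the log small: unrelated events in different threads may receive equal timestamps and need no ordering during replay.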

Dynamic Analysis of Embedded Software using Execution Replay -- ISORC 2014

Replay Debugger

The non-deterministic behavior of multi-threaded embedded software makes cyclic debugging difficult. Even with the same input data, consecutive runs may result in different executions, and reproducing the same bug is itself a challenge. Although several approaches to deterministic replay have been proposed, none of them addresses the capabilities and functionalities that replay can provide for better debugging. We introduce a practical replay mechanism for multi-threaded embedded software. The Replay Debugger, based on Lamport clocks, offers a user-controlled debugging environment in which the program execution follows the same partially ordered happens-before dependencies among threads and I/O events as the recorded run. With the order of thread synchronizations assured, users can focus their debugging effort on the behavior of any thread while retaining an understanding of thread-level concurrency. Using a set of benchmark programs, experimental results of a prototype implementation show that, on average, the software-based approach incurs a small probe effect of 3.3% in its record stage.
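On the replay side, enforcing a recorded order can be sketched with a condition variable: before operating on a shared object, a thread waits until it is the next thread recorded for that object. This is a hypothetical Python sketch of the idea, not the Replay Debugger's implementation; the schedule format (an object name mapped to the recorded list of thread names) is an assumption, and a thread whose operation is missing from the schedule would block forever.

```python
import threading


class ReplayGate:
    """Make operations on each shared object follow a recorded thread order."""

    def __init__(self, schedule):
        self.schedule = {obj: list(order) for obj, order in schedule.items()}
        self.position = {obj: 0 for obj in schedule}
        self.cv = threading.Condition()

    def _my_turn(self, obj, thread):
        pos = self.position[obj]
        return pos < len(self.schedule[obj]) and self.schedule[obj][pos] == thread

    def enter(self, obj, thread):
        with self.cv:
            self.cv.wait_for(lambda: self._my_turn(obj, thread))

    def leave(self, obj):
        with self.cv:
            self.position[obj] += 1
            self.cv.notify_all()


def replay(schedule, names):
    """Start the threads in `names` order; the gate imposes the recorded order."""
    gate = ReplayGate(schedule)
    observed = []

    def worker(name):
        gate.enter("L", name)
        observed.append(name)        # the "critical section" being replayed
        gate.leave("L")

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


print(replay({"L": ["T2", "T3", "T1"]}, ["T1", "T2", "T3"]))
# ['T2', 'T3', 'T1']
```

A single condition variable keeps the sketch short; a per-object condition would avoid waking threads that are waiting on unrelated objects.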

Replay Debugging for Multi-threaded Embedded Software -- EUC 2010


Our Research
Our Goal
We do research on real-time Java, embedded software and systems, smart homes, and related areas.
To build reliable real-time embedded systems, contribute to the computer engineering community, and make our future better.

Copyright 2011 RTES, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University. All rights reserved.