In embedded systems, the terms concurrency and parallelism are often used interchangeably, but they describe fundamentally different concepts. Misunderstanding them can lead to subtle bugs, wasted CPU cycles, or systems that fail under load. This guide clarifies both ideas, shows how they apply to resource-constrained microcontrollers, and provides practical steps to design robust concurrent systems.
Why the Distinction Matters in Embedded Systems
Embedded developers frequently juggle multiple tasks: reading sensors, processing data, updating displays, and communicating over buses like I2C or CAN. On a single-core microcontroller, only one instruction executes at a time. Yet we still need the system to appear responsive—handling a button press while a motor is running. That is concurrency: managing multiple tasks in overlapping time intervals, not necessarily simultaneously. Parallelism, by contrast, requires multiple execution units (e.g., multi-core CPUs or dedicated hardware peripherals) to run tasks at the exact same instant. In many embedded projects, concurrency is a design necessity, while parallelism is a performance optimization available only on certain hardware.
Why does this distinction matter? Consider a typical IoT sensor node. It must sample an analog input every 10 ms, transmit data via Wi-Fi every second, and respond to a user button. If you naively treat these as parallel tasks on a single core and write each one as its own infinite loop, the first loop never returns, so the others never run at all. If you merge them into one loop with a blocking Wi-Fi call, sensor sampling stalls for the duration of every transmission. The correct approach is to design for concurrency using a scheduler (such as a real-time operating system or a cooperative round-robin loop) that interleaves the tasks. Parallelism would help only if a second core could take over the Wi-Fi stack, and even then you still need concurrency management to coordinate shared resources.
Another common mistake is assuming that using an RTOS automatically gives you parallelism. An RTOS provides threads and synchronization primitives, but on a single core, threads still run one at a time. The OS merely switches between them quickly, creating the illusion of simultaneity. True parallelism requires hardware support. Understanding this helps you set realistic performance expectations and avoid over-engineering solutions with complex thread pools when a simple state machine would suffice.
The Core Difference in One Sentence
Concurrency is about dealing with lots of things at once (structuring a program to handle multiple tasks), while parallelism is about doing lots of things at once (executing multiple tasks simultaneously). In embedded systems, you almost always need concurrency; parallelism is a bonus when hardware permits.
Concurrency Models: From Bare-Metal to RTOS
Concurrency in embedded systems can be implemented in several ways, each with trade-offs in complexity, memory footprint, and determinism. The three most common models are the super loop (bare-metal), cooperative multitasking, and preemptive multitasking via an RTOS.
Super Loop with Interrupts
The simplest form of concurrency is a main loop that polls flags set by interrupt service routines (ISRs). For example, a timer interrupt sets a 'sample_sensor' flag every 10 ms, and the main loop checks the flag, reads the sensor, and clears it. This model is easy to implement and has minimal overhead. However, the loop has no notion of priority, so one long-running operation delays the handling of every other flag until it finishes. It works well for systems with few tasks and loose timing requirements.
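Here is a minimal sketch of this pattern in C. It assumes a periodic timer interrupt is already configured to call timer_isr() every 10 ms. The names timer_isr, read_sensor, and super_loop_iteration are hypothetical. The sensor read is stubbed so the code compiles on a host, and in real firmware the loop body would sit inside for (;;).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set by the timer ISR, consumed by the main loop. volatile forces the
 * compiler to re-read it on every check, because it changes outside the
 * normal flow of the program. */
static volatile bool sample_sensor = false;

static uint16_t last_sample;
static uint32_t samples_taken;

/* Hypothetical ISR, assumed to be called by a 10 ms periodic timer.
 * It does the minimum possible work: set the flag and return. */
void timer_isr(void) {
    sample_sensor = true;
}

/* Hypothetical stand-in for an ADC read. */
static uint16_t read_sensor(void) {
    return 512;
}

/* One pass of the super loop; real firmware runs this inside for (;;). */
void super_loop_iteration(void) {
    if (sample_sensor) {
        sample_sensor = false;       /* clear first so a tick during the read is not lost */
        last_sample = read_sensor();
        samples_taken++;
    }
    /* Checks for other flags (button, display refresh, ...) would follow. */
}
```

The code clears the flag before doing the work. If a tick arrives during the sensor read, it sets the flag again and is handled on the next pass instead of being lost.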
Cooperative Multitasking
In cooperative multitasking, tasks voluntarily yield control back to a scheduler. Each task runs to completion or until a predefined yield point. No task can be switched out by another task between yield points, so task-to-task race conditions are avoided as long as no shared update spans a yield. Data shared with ISRs still needs protection, however, because interrupts can fire at any time. The main weakness is that a misbehaving task that never yields can hang the entire system. Cooperative schedulers are common in very resource-constrained devices (e.g., 8-bit PIC or AVR) where an RTOS would be too heavy.
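The sketch below shows one common shape for a cooperative scheduler: a table of run-to-completion tasks, each with a period, polled against a millisecond tick. The task names, the periods, and coop_schedule() are hypothetical. Returning from a task function acts as its yield point, so a task that loops forever would starve every other entry in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn)(void);

typedef struct {
    task_fn  run;          /* must return quickly: returning is the "yield" */
    uint32_t period_ms;    /* how often the task should run */
    uint32_t last_run_ms;  /* tick value when it last ran */
} coop_task;

static uint32_t sensor_runs, lcd_runs;
static void sensor_task(void) { sensor_runs++; }  /* hypothetical work */
static void lcd_task(void)    { lcd_runs++; }     /* hypothetical work */

static coop_task tasks[] = {
    { sensor_task, 10u,  0u },
    { lcd_task,    100u, 0u },
};

/* Run every task whose period has elapsed, one after another.
 * Unsigned subtraction keeps the comparison correct across
 * wraparound of the 32-bit tick counter. */
void coop_schedule(uint32_t now_ms) {
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
        if ((uint32_t)(now_ms - tasks[i].last_run_ms) >= tasks[i].period_ms) {
            tasks[i].last_run_ms = now_ms;
            tasks[i].run();
        }
    }
}
```

In firmware, the main loop would call coop_schedule() with the current tick (for example, one maintained by a SysTick-style timer ISR) and sleep when nothing is due.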
Preemptive Multitasking with an RTOS
A real-time operating system (RTOS) like FreeRTOS or Zephyr provides preemptive scheduling: the kernel can interrupt a running task to switch to a higher-priority task. This gives more responsive concurrency, especially for time-critical operations. The trade-off is increased complexity—you must manage shared resources with mutexes, semaphores, and queues to prevent race conditions. Memory overhead is also higher, typically requiring a few KB of RAM for task stacks. For medium-to-complex projects (e.g., a drone flight controller or a smart home hub), an RTOS is often the right choice.
Comparison Table
| Model | Pros | Cons | Typical Use Case |
|---|---|---|---|
| Super Loop + ISRs | Minimal code, low RAM, predictable | Poor for many tasks, ISR latency | Simple sensors, LED controllers |
| Cooperative | No task-to-task races, low overhead | A task that never yields blocks the entire system | 8-bit MCUs, low-power devices |
| Preemptive RTOS | Responsive, priority-based | Higher RAM, sync complexity | Complex IoT, robotics, automotive |
Parallelism in Embedded: When and How to Use It
Parallelism becomes relevant when your hardware offers multiple cores or specialized accelerators. Several modern microcontrollers allow true simultaneous execution. Examples include the dual-core ESP32, the i.MX RT1170 (Cortex-M7 plus Cortex-M4), and dual-core STM32H7 parts such as the STM32H745/H747 (also Cortex-M7 plus Cortex-M4). However, parallelism introduces new challenges: cache coherency, shared memory access, and load balancing. The decision to use parallelism should be driven by measured performance bottlenecks, not by a desire to use all cores.
Identifying Parallelism Opportunities
Start by profiling your system to find CPU-bound tasks that can run independently. For example, in a digital signal processing application, one core might handle FFT computations while another manages user interface and communication. Communication between cores should be minimized to avoid contention. Use message passing or shared memory with atomic operations. Many dual-core chips provide inter-processor interrupts (IPIs) for efficient signaling.
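As a rough illustration of message passing through shared memory, here is a single-slot mailbox built on C11 atomics. It assumes two things: both cores see the mailbox coherently (for example, it is placed in non-cacheable shared SRAM), and the toolchain's C11 atomics are valid across cores on your part. Check the vendor's guidance, because some chips provide a hardware semaphore peripheral for this purpose instead. The mailbox type and function names are hypothetical. On a host, a POSIX thread stands in for the second core.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Single-slot mailbox, assumed to live in memory both cores see coherently. */
typedef struct {
    atomic_bool full;     /* true while payload holds an unread message */
    uint32_t    payload;
} mailbox;

static mailbox mb;        /* static storage: starts empty (full == false) */

/* Producer side, e.g. the core running the DSP code. */
void mailbox_send(mailbox *m, uint32_t value) {
    /* acquire pairs with the consumer's release, so its read of the old
     * payload is complete before we overwrite it */
    while (atomic_load_explicit(&m->full, memory_order_acquire)) {
        /* spin; real firmware might sleep or wait for an IPI instead */
    }
    m->payload = value;
    /* release: the payload write is visible before full reads as true */
    atomic_store_explicit(&m->full, true, memory_order_release);
}

/* Consumer side, e.g. the core running communication and UI. */
uint32_t mailbox_receive(mailbox *m) {
    while (!atomic_load_explicit(&m->full, memory_order_acquire)) {
        /* spin */
    }
    uint32_t value = m->payload;
    atomic_store_explicit(&m->full, false, memory_order_release);
    return value;
}

/* Host-only demo: a thread plays the role of the second core. */
static void *dsp_core(void *arg) {
    (void)arg;
    for (uint32_t i = 1; i <= 100; i++)
        mailbox_send(&mb, i);
    return NULL;
}

uint32_t run_mailbox_demo(void) {
    pthread_t core1;
    uint32_t sum = 0;
    if (pthread_create(&core1, NULL, dsp_core, NULL) != 0)
        return 0;
    for (int i = 0; i < 100; i++)
        sum += mailbox_receive(&mb);
    pthread_join(core1, NULL);
    return sum;
}
```

Spinning keeps the sketch short. On a real dual-core MCU, an inter-processor interrupt could wake the receiving core instead, so it does not burn cycles polling.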
Common Parallelism Patterns
One pattern is the asymmetric multiprocessing (AMP) approach, where each core runs a separate application or RTOS instance. This is simpler to implement but requires careful partitioning of peripherals and memory. Another is symmetric multiprocessing (SMP), where a single OS instance manages both cores, automatically scheduling tasks. SMP is more flexible but demands OS support and can suffer from lock contention. For most embedded projects, AMP is easier to get right because it avoids complex synchronization.
When Parallelism Does Not Help
Parallelism does not speed up I/O-bound tasks (waiting for a sensor or network packet) because the bottleneck is external. It also adds overhead for task creation and synchronization. If your system is already meeting timing requirements on a single core, adding parallelism may increase power consumption and code complexity without tangible benefit. Always measure before parallelizing.
Designing a Concurrent System: A Step-by-Step Approach
Building a robust concurrent system requires deliberate design, especially in embedded environments where resources are scarce. Follow these steps to avoid common pitfalls.
Step 1: Identify Concurrent Activities
List all tasks your system must perform, along with their timing constraints (deadlines, periods, and maximum latencies). Group tasks that share resources or data. For example, a temperature logger might have: read sensor (10 ms period), update LCD (100 ms period), log to SD card (1 s period), and handle button press (asynchronous).
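Writing the list down as data makes it easy to sanity-check. The sketch below records each activity's period and worst-case execution time (WCET), then sums the processor utilization, U = sum of (WCET / period). The WCET values and the 50 ms minimum spacing assumed for button presses are hypothetical placeholders, so measure real values on your hardware. A total above 1.0 means a single core cannot keep up, no matter how the tasks are scheduled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per concurrent activity. WCETs are hypothetical placeholders. */
typedef struct {
    const char *name;
    uint32_t period_us;  /* period, or minimum inter-arrival time for sporadic events */
    uint32_t wcet_us;    /* worst-case execution time */
} activity;

static const activity logger_tasks[] = {
    { "read_sensor", 10000u,   1000u  },
    { "update_lcd",  100000u,  8000u  },
    { "log_sd",      1000000u, 50000u },
    { "button",      50000u,   500u   },  /* assumed: debounced, presses >= 50 ms apart */
};

/* Processor utilization U = sum(WCET / period). U > 1.0 means the
 * single core cannot keep up no matter how tasks are scheduled. */
double total_utilization(const activity *a, size_t n) {
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (double)a[i].wcet_us / (double)a[i].period_us;
    return u;
}
```

With these placeholder numbers, U comes to 0.24, which leaves plenty of headroom on a single core.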
Step 2: Choose a Concurrency Model
Based on the number of tasks and their urgency, select a model. If you have fewer than 5 tasks with loose timing, a super loop or cooperative scheduler may suffice. For more tasks or hard real-time constraints, use an RTOS. Document the decision and its rationale.
Step 3: Define Task Priorities and Stack Sizes
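In an RTOS, assign priorities based on deadlines: the tighter a task's timing requirement, the higher its priority. Allocate stack sizes carefully, because too small a stack causes overflow and too large a stack wastes RAM. Measure actual usage under load with tools such as FreeRTOS's uxTaskGetStackHighWaterMark(), which reports the minimum free stack space a task has had since it started, and tune sizes from there.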
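One widely used rule for turning deadlines into priorities is deadline-monotonic assignment: the shorter a task's relative deadline, the higher its priority. The sketch below applies this rule to a task table. The rt_task type and the function name are hypothetical. The sketch assumes the FreeRTOS convention that larger numbers mean higher priority, with the idle task at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t deadline_ms;  /* relative deadline */
    unsigned priority;     /* output: larger number = more urgent */
} rt_task;

/* Deadline-monotonic assignment: a task's priority is base_priority plus
 * the number of tasks with a strictly longer deadline, so shorter deadlines
 * rank higher and equal deadlines share a level. base_priority keeps user
 * tasks above the idle task. */
void assign_deadline_monotonic(rt_task *t, size_t n, unsigned base_priority) {
    for (size_t i = 0; i < n; i++) {
        unsigned longer = 0;
        for (size_t j = 0; j < n; j++)
            if (t[j].deadline_ms > t[i].deadline_ms)
                longer++;
        t[i].priority = base_priority + longer;
    }
}
```

Take the temperature logger's periods of 10, 100, and 1000 ms (used as deadlines) with base_priority = 1. The sensor task gets priority 3, the LCD task 2, and SD logging 1. Tasks with equal deadlines land on the same level, where a time-slicing scheduler round-robins between them.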
Step 4: Manage Shared Resources
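Identify all shared data (global variables, peripheral registers, buffers). Protect each item with a mutex, semaphore, or critical section. Prefer message queues for passing data between tasks, so tasks do not share data directly. ISRs must never block, so they cannot wait on a mutex. Instead, use lock-free mechanisms (e.g., atomic flags or single-producer ring buffers) or the RTOS's ISR-safe calls, such as FreeRTOS's xQueueSendFromISR().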
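A single-producer, single-consumer ring buffer is a common lock-free way to pass data from an ISR to a task. The ISR pushes samples and the task pops them, and neither side blocks or disables interrupts. The sketch below is one such buffer, with hypothetical names. It uses C11 atomics for the indices, which also makes it usable between cores. RB_SIZE must be a power of two, and one slot is always left empty so that a full buffer can be told apart from an empty one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u  /* power of two; usable capacity is RB_SIZE - 1 */

typedef struct {
    uint16_t    data[RB_SIZE];
    atomic_uint head;  /* written only by the producer (ISR) */
    atomic_uint tail;  /* written only by the consumer (task) */
} ring_buffer;

static ring_buffer adc_rb;  /* static storage: starts empty */

/* Producer (ISR) side. Never waits: returns false and drops v if full. */
bool rb_push(ring_buffer *rb, uint16_t v) {
    unsigned head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    unsigned next = (head + 1u) & (RB_SIZE - 1u);
    if (next == atomic_load_explicit(&rb->tail, memory_order_acquire))
        return false;                  /* full */
    rb->data[head] = v;
    /* release: data[head] is written before the new head is published */
    atomic_store_explicit(&rb->head, next, memory_order_release);
    return true;
}

/* Consumer (task) side. Returns false if there is nothing to read. */
bool rb_pop(ring_buffer *rb, uint16_t *out) {
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    if (tail == atomic_load_explicit(&rb->head, memory_order_acquire))
        return false;                  /* empty */
    *out = rb->data[tail];
    /* release: the slot is read before the producer may reuse it */
    atomic_store_explicit(&rb->tail, (tail + 1u) & (RB_SIZE - 1u), memory_order_release);
    return true;
}
```

Because rb_push() returns false instead of waiting when the buffer is full, the ISR can count overruns and return immediately.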
Step 5: Test Under Worst-Case Load
Simulate the maximum expected workload (e.g., all interrupts firing at highest rate, all tasks active). Monitor for missed deadlines, stack overflows, and priority inversions. Use a logic analyzer or trace tool to verify timing.
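A logic analyzer shows timing from the outside. A small software monitor can complement it by recording each task's worst observed response time and counting deadline misses during a stress test. The sketch assumes a hypothetical free-running 32-bit microsecond timestamp (for example, from a hardware timer) that is read at the start and end of each job. The deadline_monitor type and monitor_record() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t deadline_us;  /* allowed response time for this task */
    uint32_t worst_us;     /* longest response time seen so far */
    uint32_t misses;       /* number of jobs that exceeded the deadline */
} deadline_monitor;

/* Record one job. Unsigned subtraction gives the correct elapsed time even
 * if the 32-bit timer wrapped between start and end, as long as the job
 * took less than one full timer period. */
void monitor_record(deadline_monitor *m, uint32_t start_us, uint32_t end_us) {
    uint32_t elapsed = end_us - start_us;
    if (elapsed > m->worst_us)
        m->worst_us = elapsed;
    if (elapsed > m->deadline_us)
        m->misses++;
}
```

After a worst-case run, compare worst_us with the deadline for each task. A value close to the deadline is a warning even when misses is still zero.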
Common Pitfalls and How to Avoid Them
Even experienced developers fall into concurrency traps. Here are the most frequent ones in embedded systems, along with mitigations.
Race Conditions on Shared Variables
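When two tasks, or an ISR and a task, access a variable without synchronization, the result can be corrupted. For example, suppose an ISR writes a 16-bit variable and a task reads it on an 8-bit MCU. The task reads the value one byte at a time. If the ISR fires between the two reads, the task combines one byte of the old value with one byte of the new value. Always use atomic operations or disable interrupts around critical sections. For multi-byte data shared with an ISR, copy the data to a local variable with interrupts briefly disabled. Mutexes work only for data shared between tasks, because an ISR cannot wait on one.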
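The sketch below shows the copy-under-critical-section approach. The interrupt-masking hooks irq_save() and irq_restore() are hypothetical stubs so that it compiles on a host. On a Cortex-M part you would typically save PRIMASK and call the CMSIS __disable_irq(). On AVR, avr-libc's ATOMIC_BLOCK(ATOMIC_RESTORESTATE) does the same job. The ADC names are also illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt-masking hooks. Real versions would save the
 * current interrupt state, mask interrupts, and later restore that state.
 * The nesting counter is stub bookkeeping only. */
static unsigned irq_nesting;
static uint32_t irq_save(void)          { irq_nesting++; return 0u; }
static void irq_restore(uint32_t state) { (void)state; irq_nesting--; }

/* Written by the ADC ISR. On an 8-bit MCU a 16-bit access takes two
 * instructions, so an unprotected read in a task can be torn. */
static volatile uint16_t adc_result;

/* Stand-in for the ISR body; a real ISR would read the ADC data register.
 * It needs no lock because a task cannot preempt an ISR. */
void adc_isr_simulated(uint16_t sample) {
    adc_result = sample;
}

/* Task side: copy the shared value with interrupts masked, then work on
 * the local copy. Keep the masked region as short as possible. */
uint16_t adc_snapshot(void) {
    uint32_t state = irq_save();
    uint16_t copy = adc_result;   /* the ISR cannot fire mid-read */
    irq_restore(state);
    return copy;
}
```

Restoring the saved state, rather than unconditionally re-enabling interrupts, keeps the helper safe to call from code that already had interrupts disabled.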
Deadlocks
A deadlock occurs when two tasks each hold a resource the other needs. For instance, Task A locks mutex 1 and waits for mutex 2, while Task B locks mutex 2 and waits for mutex 1. To avoid this, always acquire mutexes in a consistent order (lock ordering). Use a timeout when acquiring locks to detect potential deadlocks.
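Lock ordering can be enforced in code rather than left to convention. The sketch below gives each mutex a fixed rank and always locks the lower rank first, whichever order the caller passes. It uses POSIX threads on a host as a stand-in for RTOS tasks. The ranked_mutex type, lock_pair(), and the resource names are hypothetical, but the same idea applies to RTOS mutexes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each lock carries a fixed rank; lower ranks are always taken first. */
typedef struct {
    pthread_mutex_t m;
    int rank;
} ranked_mutex;

static ranked_mutex spi_bus = { PTHREAD_MUTEX_INITIALIZER, 1 };
static ranked_mutex log_buf = { PTHREAD_MUTEX_INITIALIZER, 2 };
static long shared_counter;

/* Acquire both locks in rank order regardless of argument order. */
void lock_pair(ranked_mutex *a, ranked_mutex *b) {
    if (a->rank > b->rank) { ranked_mutex *t = a; a = b; b = t; }
    pthread_mutex_lock(&a->m);
    pthread_mutex_lock(&b->m);
}

void unlock_pair(ranked_mutex *a, ranked_mutex *b) {
    pthread_mutex_unlock(&a->m);
    pthread_mutex_unlock(&b->m);
}

static void *task_a(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_pair(&spi_bus, &log_buf);
        shared_counter++;
        unlock_pair(&spi_bus, &log_buf);
    }
    return NULL;
}

static void *task_b(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_pair(&log_buf, &spi_bus);  /* opposite argument order, same lock order */
        shared_counter++;
        unlock_pair(&log_buf, &spi_bus);
    }
    return NULL;
}

/* Runs both tasks to completion; returns -1 if a thread cannot start. */
long run_two_tasks(void) {
    pthread_t a, b;
    if (pthread_create(&a, NULL, task_a, NULL) != 0) return -1;
    if (pthread_create(&b, NULL, task_b, NULL) != 0) { pthread_join(a, NULL); return -1; }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

Without the swap in lock_pair(), task_a and task_b would take the two locks in opposite orders. That is exactly the circular wait that produces a deadlock.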
Priority Inversion
When a high-priority task is blocked waiting for a resource held by a low-priority task, and a medium-priority task preempts the low-priority task, the high-priority task is indirectly delayed. RTOSes like FreeRTOS offer priority inheritance to mitigate this: the low-priority task temporarily inherits the high priority while holding the resource. In FreeRTOS, mutexes created with xSemaphoreCreateMutex() include priority inheritance, but binary semaphores do not. Use a mutex for any resource shared by tasks of different priorities.
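On a POSIX system, used here as a host-side stand-in for an RTOS, you request priority inheritance explicitly through a mutex attribute. The minimal sketch below shows how; init_pi_mutex() is a hypothetical helper name. Observing the effect would require threads running at different real-time priorities, which this sketch does not set up.

```c
#define _GNU_SOURCE  /* expose the full POSIX mutex-attribute API on glibc */
#include <assert.h>
#include <pthread.h>

/* Initialize *m so that a thread holding it inherits the priority of the
 * highest-priority thread blocked on it. Returns 0 or an error number. */
int init_pi_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);  /* the mutex keeps its own copy of the settings */
    return rc;
}
```

Checking the return value matters here, because platforms without priority-inheritance support report an error rather than silently falling back.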
Starvation
A task may never get CPU time if higher-priority tasks are always ready to run. This is common in preemptive systems with poorly chosen priorities, or with high-priority tasks that busy-wait. Make sure high-priority tasks block (on a queue, semaphore, or delay) when they have no work. Use round-robin time slicing among tasks of equal priority so that all of them make progress.
Decision Checklist: Concurrency vs. Parallelism for Your Project
Use this checklist to decide which approach fits your embedded system. Answer each question honestly to avoid over-engineering.
Checklist Questions
- How many concurrent tasks do you have? Fewer than 5? Consider a super loop or cooperative scheduler. Five or more? Consider an RTOS.
- Do any tasks have hard real-time deadlines? If yes, preemptive RTOS is recommended for predictable scheduling.
- Is your MCU single-core or multi-core? Single-core: concurrency only. Multi-core: parallelism possible but not mandatory.
- Is a task CPU-bound? If it uses 100% CPU for long periods (e.g., heavy math), parallelism on another core may help.
- What is your power budget? Parallelism often increases power consumption. For battery devices, concurrency with sleep modes may be better.
- How much RAM can you spare? RTOS stacks consume RAM. If memory is tight (<16 KB), consider cooperative or super loop.
- Do you need to reuse code across projects? An RTOS provides a standardized API, making porting easier.
Mini-FAQ
Q: Can I use both concurrency and parallelism in the same system? Yes. For example, on a dual-core MCU, you might run an RTOS on one core (managing concurrency for I/O tasks) and a bare-metal loop on the other (handling a CPU-intensive algorithm). Communication between cores must be carefully synchronized.
Q: Does using an RTOS guarantee real-time behavior? No. Real-time performance depends on task priorities, interrupt handling, and system load. An RTOS only provides the mechanisms; you must design the system correctly.
Q: What is the simplest way to add concurrency to an existing super loop? Start by moving time-critical operations into timer ISRs and using flags. Then refactor the main loop into a cooperative scheduler with explicit yield points. This minimizes code changes.
Synthesis: Building Robust Systems with the Right Mental Model
Understanding the difference between concurrency and parallelism is not just academic—it directly impacts the reliability and performance of your embedded systems. Concurrency is about structuring your code to handle multiple tasks, while parallelism is about executing them simultaneously on suitable hardware. Most embedded projects need concurrency; parallelism is a tool to be applied only when profiling shows a clear bottleneck that cannot be solved otherwise.
Start by analyzing your system's tasks and timing requirements. Choose a concurrency model that matches your resource constraints and complexity. If you opt for an RTOS, invest time in learning synchronization primitives and testing under worst-case conditions. For parallelism, consider AMP for simpler implementation and reserve SMP for when you have OS support and need dynamic load balancing.
Finally, remember that the goal is not to use every core or thread available, but to build a system that meets its requirements reliably. For a small system, a simple, well-tested super loop is often more stable and easier to maintain than a complex RTOS design. Use the checklist in this guide to make informed decisions, and always validate with real hardware under expected loads.