<sub>2025-03-09 @13:16</sub>
#digital-garden #lua #programming #ccsds
# Parsing CCSDS Headers
I've come across code that parses CCSDS headers at the byte level. This got me thinking: is byte-by-byte parsing efficient compared to parsing at the word level?
Decoding a CCSDS header by working with words first and then extracting fields **should offer both performance (time) and memory (space) benefits**.
I can think of four main reasons why this approach is more efficient. The impact depends on the machine architecture and on how often the decoding function is called.
- **Fewer memory accesses**
- **Avoiding alignment penalties**
- **Faster bitwise manipulation vs. multi-byte assembly**
- **Better space efficiency and cache benefits**
# Fewer memory accesses
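Operating on words instead of bytes can reduce the number of memory accesses, depending on the machine architecture.
For example, the CCSDS primary header is 6 bytes, packed as three 16-bit big-endian words: version, packet type, secondary header flag, and APID in the first; sequence flags and sequence count in the second; packet data length in the third. Reading a single 32-bit word covering the first two lets six of the seven fields be processed at once. This is **more efficient than fetching and reconstructing fields byte-by-byte**, especially when fields span multiple bytes. Since **no primary header field crosses a 16-bit word boundary**, it makes sense to treat the header as words.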
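To make the 32-bit case concrete, here's a minimal sketch. It assumes `bytes` is a 1-indexed table of numbers (as in the decoders in this note) and uses made-up header values. The small fallback at the top, for running without LuaJIT's `bit` module, is also an assumption and not part of the original code.

```Lua
-- LuaJIT ships BitOp as `bit`; elsewhere fall back to bit32 (Lua 5.2) or
-- native operators (Lua 5.3+). This fallback is an assumption for portability.
local ok, bit = pcall(require, "bit")
if not ok then
  bit = bit32 or load([[return {
    band = function(a, b) return a & b end,
    bor = function(a, b) return a | b end,
    lshift = function(a, n) return a << n end,
    rshift = function(a, n) return a >> n end,
  }]])()
end

-- Made-up sample header: APID 100, sequence count 42, data length field 5.
local hdr = { 0x08, 0x64, 0xC0, 0x2A, 0x00, 0x05 }

-- Bytes 1-4 (big-endian) as one 32-bit value: word 1 in the high half,
-- word 2 in the low half.
local w = bit.bor(bit.bor(bit.lshift(hdr[1], 24), bit.lshift(hdr[2], 16)),
                  bit.bor(bit.lshift(hdr[3], 8), hdr[4]))

-- Six of the seven primary header fields come out of that one value.
local fields = {
  version   = bit.band(bit.rshift(w, 29), 0x7),
  pkt_type  = bit.band(bit.rshift(w, 28), 0x1),
  sec_hdr   = bit.band(bit.rshift(w, 27), 0x1),
  apid      = bit.band(bit.rshift(w, 16), 0x7FF),
  seq_flags = bit.band(bit.rshift(w, 14), 0x3),
  seq_count = bit.band(w, 0x3FFF),
}
print(fields.apid, fields.seq_count)  -- prints 100 and 42
```

Only the packet data length is left for the third word.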
Many CPU architectures optimize for word-sized operations, as memory buses, caches, and registers are designed to efficiently fetch aligned words.
By contrast, **byte-by-byte extraction incurs additional overhead**. If a function extracts fields one byte at a time, each operation requires extra shifts and masks, increasing instruction count. On a **resource-constrained embedded system**, this overhead can be costly.
A **bad example** of byte-by-byte access in Lua:
```Lua
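local bit = require("bit")  -- LuaJIT BitOp module
local CcsdsHdr = {}         -- stand-in for the module table this method belongs to

-- Decode the 6-byte primary header; `bytes` is a 1-indexed table of numbers
function CcsdsHdr._decode(bytes)
  return {
    bit.band(bit.rshift(bytes[1], 5), 0x07),                    -- version (3 bits)
    bit.band(bit.rshift(bytes[1], 4), 0x01),                    -- packet type (1 bit)
    bit.band(bit.rshift(bytes[1], 3), 0x01),                    -- secondary header flag (1 bit)
    bit.bor(bit.lshift(bit.band(bytes[1], 0x07), 8), bytes[2]), -- APID (11 bits)
    bit.band(bit.rshift(bytes[3], 6), 0x03),                    -- sequence flags (2 bits)
    bit.bor(bit.lshift(bit.band(bytes[3], 0x3F), 8), bytes[4]), -- sequence count (14 bits)
    bit.bor(bit.lshift(bytes[5], 8), bytes[6]),                 -- packet data length (16 bits)
  }
end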
```
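Both decoders in this note take `bytes` as a table of numbers. Assuming the raw packet arrives as a Lua string (read from a file or socket, say), a hypothetical `header_bytes` helper can build that table with `string.byte`. The packet data length field then gives the total packet size: the field plus 7 (the 6-byte header, plus 1 because the field stores the data length minus one). The sample bytes are made up, and the `bit` fallback is the same portability assumption as elsewhere.

```Lua
-- LuaJIT ships BitOp as `bit`; elsewhere fall back to bit32 (Lua 5.2) or
-- native operators (Lua 5.3+). This fallback is an assumption for portability.
local ok, bit = pcall(require, "bit")
if not ok then
  bit = bit32 or load([[return {
    band = function(a, b) return a & b end,
    bor = function(a, b) return a | b end,
    lshift = function(a, n) return a << n end,
    rshift = function(a, n) return a >> n end,
  }]])()
end

-- Hypothetical helper: first six bytes of a raw packet string as the
-- 1-indexed table of numbers the decoders expect.
local function header_bytes(packet)
  assert(#packet >= 6, "need at least 6 bytes for a CCSDS primary header")
  return { packet:byte(1, 6) }  -- string.byte returns one number per byte
end

-- Total packet size in bytes: 6-byte header + (data length field + 1).
local function total_length(bytes)
  return bit.bor(bit.lshift(bytes[5], 8), bytes[6]) + 7
end

-- Made-up packet: header with data length field 5, then 6 data bytes
local packet = string.char(0x08, 0x64, 0xC0, 0x2A, 0x00, 0x05,
                           0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x02)
local bytes = header_bytes(packet)
print(#bytes, total_length(bytes), #packet)  -- prints 6, 12, 12
```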
# Avoid alignment penalties
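Because each header field sits inside a 16-bit word, reading the header as words keeps accesses naturally aligned, provided the packet buffer itself starts on a suitable boundary. That prevents unaligned access penalties.
Unaligned accesses on some architectures (including some ARM and RISC-V implementations) can cause:
- Performance penalties (multiple memory accesses instead of one)
- CPU traps or faults (depending on system behavior)
Single-byte loads are always aligned, so the risk only appears once wider words are loaded; working at the word level on an aligned buffer keeps every fetch aligned and avoids these issues.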
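Alignment isn't guaranteed just because the fields sit in 16-bit words. Packets are often stored back to back, and a packet's total size (data length field + 7) can be odd, so the next header can start at an odd offset. Here's a sketch that walks such a stream and flags misaligned headers, assuming the buffer itself starts on an aligned address. `header_offsets` is a hypothetical name, and the sample packets are made up.

```Lua
-- LuaJIT ships BitOp as `bit`; elsewhere fall back to bit32 (Lua 5.2) or
-- native operators (Lua 5.3+). This fallback is an assumption for portability.
local ok, bit = pcall(require, "bit")
if not ok then
  bit = bit32 or load([[return {
    band = function(a, b) return a & b end,
    bor = function(a, b) return a | b end,
    lshift = function(a, n) return a << n end,
    rshift = function(a, n) return a >> n end,
  }]])()
end

-- Hypothetical helper: walk back-to-back CCSDS packets in a Lua string and
-- record each header's 0-based offset and whether it is 2-byte aligned.
-- Assumption: the buffer itself starts on an aligned address.
local function header_offsets(stream)
  local result = {}
  local pos = 1                                 -- 1-based string index
  while pos + 5 <= #stream do                   -- a full 6-byte header fits
    local len_hi, len_lo = stream:byte(pos + 4, pos + 5)
    local offset = pos - 1
    result[#result + 1] = { offset = offset, aligned = offset % 2 == 0 }
    -- Total packet size = 6-byte header + (data length field + 1)
    pos = pos + bit.bor(bit.lshift(len_hi, 8), len_lo) + 7
  end
  return result
end

-- Made-up stream: a 12-byte packet followed by two 7-byte packets
local stream = string.char(0x08, 0x64, 0xC0, 0x2A, 0x00, 0x05, 1, 2, 3, 4, 5, 6)
  .. string.char(0x08, 0x64, 0xC0, 0x2B, 0x00, 0x00, 7)
  .. string.char(0x08, 0x64, 0xC0, 0x2C, 0x00, 0x00, 8)

for _, h in ipairs(header_offsets(stream)) do
  print(h.offset, h.aligned)  -- 0 true, then 12 true, then 19 false
end
```

The third header lands on offset 19, so a C decoder reading it would need byte loads (or a `memcpy` into a word) rather than a direct 16-bit load on a strict-alignment target.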
# Bit manipulation may be faster than multi-byte assembly
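Once the header is held as words, each field can be extracted with a shift and a mask, rather than manually reassembling multi-byte fields one at a time. CCSDS headers are big-endian, so a big-endian CPU can load each word directly, while a little-endian host needs a byte swap first (or the shift-and-OR shown below).
Most CPUs have efficient bitwise instructions (`AND`, `OR`, shifts) that operate on a whole word, typically in a single cycle, making this approach both faster and simpler.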
For example:
```Lua
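local bit = require("bit")                               -- LuaJIT BitOp module
local bytes = { 0x08, 0x64, 0xC0, 0x2A, 0x00, 0x05 }     -- made-up sample header
local word1 = bit.bor(bit.lshift(bytes[1], 8), bytes[2]) -- first 16-bit big-endian word
local apid = bit.band(word1, 0x7FF)                      -- low 11 bits: APID (100 here)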
```
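Because `word1` is assembled once and then reused for the version, packet type, secondary header flag, and APID, this approach **can reduce instruction count** compared to re-reading and recombining bytes for each field.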
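The shift-and-mask step can be wrapped in a small helper so each field is described by its position and width. `field` is a hypothetical name, and the layout comment follows the primary header's first word. The `bit` fallback at the top is only there so the sketch also runs outside LuaJIT.

```Lua
-- LuaJIT ships BitOp as `bit`; elsewhere fall back to bit32 (Lua 5.2) or
-- native operators (Lua 5.3+). This fallback is an assumption for portability.
local ok, bit = pcall(require, "bit")
if not ok then
  bit = bit32 or load([[return {
    band = function(a, b) return a & b end,
    bor = function(a, b) return a | b end,
    lshift = function(a, n) return a << n end,
    rshift = function(a, n) return a >> n end,
  }]])()
end

-- Hypothetical helper: extract a `width`-bit field whose lowest bit is
-- `shift` bits above bit 0 of `word` (width must be below 32).
local function field(word, shift, width)
  local mask = bit.lshift(1, width) - 1  -- e.g. width 11 -> 0x7FF
  return bit.band(bit.rshift(word, shift), mask)
end

-- First word of the primary header, big-endian on the wire:
-- version (3) | packet type (1) | secondary header flag (1) | APID (11)
local bytes = { 0x08, 0x64 }  -- made-up sample bytes
local word1 = bit.bor(bit.lshift(bytes[1], 8), bytes[2])

local version  = field(word1, 13, 3)
local pkt_type = field(word1, 12, 1)
local sec_hdr  = field(word1, 11, 1)
local apid     = field(word1, 0, 11)
print(version, pkt_type, sec_hdr, apid)  -- prints 0, 0, 1, 100
```

The helper computes each mask at runtime; hard-coding masks, as the decoders in this note do, saves that work in a hot path.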
# Better space efficiency and cache benefits
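Fetching word-aligned data also makes good use of CPU caches. Cache lines are typically 32 or 64 bytes, far larger than a word, so a 6-byte header usually sits within a single line whichever way it is read; the saving comes from issuing fewer loads rather than from touching fewer lines.
Byte-wise operations can still cost more, since:
- Each byte is a separate load, and the byte-wise decoder above reads some bytes (like `bytes[1]`) several times.
- In a hot decode loop, the extra loads and shifts compete with the rest of the work for load ports and execution units.
By issuing fewer loads, word-based parsing keeps cache and memory traffic down.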
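As a rough check on the cache-line point, here's how often a 6-byte header straddles two lines, assuming 64-byte lines (common, but hardware-specific) and a header that can start at any byte offset. `lines_touched` is a hypothetical helper; this is plain arithmetic, with no bit operations needed.

```Lua
-- Hypothetical helper: number of cache lines touched by `len` bytes starting
-- at 0-based byte `offset`, for lines of `line_size` bytes.
local function lines_touched(offset, len, line_size)
  local first = math.floor(offset / line_size)
  local last = math.floor((offset + len - 1) / line_size)
  return last - first + 1
end

-- Assumption: 64-byte cache lines (common, but hardware-specific).
local LINE = 64
local straddling = 0
for offset = 0, LINE - 1 do
  if lines_touched(offset, 6, LINE) == 2 then
    straddling = straddling + 1
  end
end
print(straddling .. " of " .. LINE .. " start offsets straddle two lines")
```

Only 5 of the 64 starting offsets (59 through 63) cross a line boundary, and in those cases byte-wise and word-wise reads touch the same two lines.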
# Conclusion
While I haven’t benchmarked this yet, the reasoning suggests that word-aligned parsing is a more efficient approach for decoding CCSDS headers.
Instead of extracting fields byte-by-byte, using aligned word operations reduces memory accesses, avoids unaligned penalties, and leverages efficient bitwise operations.
Here’s a better example of a function that processes words instead of bytes, improving both efficiency and readability:
```Lua
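local bit = require("bit")  -- LuaJIT BitOp module
local CcsdsHdr = {}         -- stand-in for the module table this method belongs to

-- Decode the 6-byte primary header by first assembling its three 16-bit big-endian words
function CcsdsHdr._decode(bytes)
  local word1 = bit.bor(bit.lshift(bytes[1], 8), bytes[2]) -- version | type | sec hdr flag | APID
  local word2 = bit.bor(bit.lshift(bytes[3], 8), bytes[4]) -- sequence flags | sequence count
  local word3 = bit.bor(bit.lshift(bytes[5], 8), bytes[6]) -- packet data length
  return {
    bit.band(bit.rshift(word1, 13), 0x07), -- version (3 bits)
    bit.band(bit.rshift(word1, 12), 0x01), -- packet type (1 bit)
    bit.band(bit.rshift(word1, 11), 0x01), -- secondary header flag (1 bit)
    bit.band(word1, 0x7FF),                -- APID (11 bits)
    bit.band(bit.rshift(word2, 14), 0x03), -- sequence flags (2 bits)
    bit.band(word2, 0x3FFF),               -- sequence count (14 bits)
    bit.band(word3, 0xFFFF),               -- packet data length (16 bits)
  }
end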
```
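Since I haven't benchmarked yet, here's a minimal harness sketch for doing so. It re-declares both decoders as plain local functions (`decode_bytewise` and `decode_wordwise` are hypothetical names mirroring the two examples above), counts table reads through a metatable proxy, and times each with `os.clock`. Table reads in Lua are only a loose stand-in for CPU memory accesses, and a JIT may optimize the loop in ways that skew timings, so treat the output as a hint rather than a verdict.

```Lua
-- LuaJIT ships BitOp as `bit`; elsewhere fall back to bit32 (Lua 5.2) or
-- native operators (Lua 5.3+). This fallback is an assumption for portability.
local ok, bit = pcall(require, "bit")
if not ok then
  bit = bit32 or load([[return {
    band = function(a, b) return a & b end,
    bor = function(a, b) return a | b end,
    lshift = function(a, n) return a << n end,
    rshift = function(a, n) return a >> n end,
  }]])()
end

local function decode_bytewise(b)  -- mirrors the byte-by-byte example
  return {
    bit.band(bit.rshift(b[1], 5), 0x07),
    bit.band(bit.rshift(b[1], 4), 0x01),
    bit.band(bit.rshift(b[1], 3), 0x01),
    bit.bor(bit.lshift(bit.band(b[1], 0x07), 8), b[2]),
    bit.band(bit.rshift(b[3], 6), 0x03),
    bit.bor(bit.lshift(bit.band(b[3], 0x3F), 8), b[4]),
    bit.bor(bit.lshift(b[5], 8), b[6]),
  }
end

local function decode_wordwise(b)  -- mirrors the word-based example
  local w1 = bit.bor(bit.lshift(b[1], 8), b[2])
  local w2 = bit.bor(bit.lshift(b[3], 8), b[4])
  local w3 = bit.bor(bit.lshift(b[5], 8), b[6])
  return {
    bit.band(bit.rshift(w1, 13), 0x07),
    bit.band(bit.rshift(w1, 12), 0x01),
    bit.band(bit.rshift(w1, 11), 0x01),
    bit.band(w1, 0x7FF),
    bit.band(bit.rshift(w2, 14), 0x03),
    bit.band(w2, 0x3FFF),
    w3,
  }
end

-- Count table reads by handing the decoder an empty proxy whose __index
-- fires on every lookup (a loose stand-in for memory accesses).
local function count_reads(decode, bytes)
  local n = 0
  decode(setmetatable({}, { __index = function(_, k) n = n + 1; return bytes[k] end }))
  return n
end

-- 256 made-up headers (APID 100, sequence count 0-255) so the loop does not
-- decode one constant input over and over.
local headers = {}
for i = 0, 255 do
  headers[i + 1] = { 0x08, 0x64, 0xC0, i, 0x00, 0x05 }
end

-- Time `iterations` decodes; summing two fields keeps the results in use.
local function time_it(decode, iterations)
  local sum = 0
  local start = os.clock()
  for i = 1, iterations do
    local f = decode(headers[i % 256 + 1])
    sum = sum + f[4] + f[6]
  end
  return os.clock() - start, sum
end

print("reads:", count_reads(decode_bytewise, headers[1]), count_reads(decode_wordwise, headers[1]))

local N = 200000  -- raise for steadier numbers
local t_byte, s_byte = time_it(decode_bytewise, N)
local t_word, s_word = time_it(decode_wordwise, N)
assert(s_byte == s_word, "decoders disagree")
print(string.format("byte-wise: %.3fs  word-wise: %.3fs", t_byte, t_word))
```

The read counter reports 10 reads for the byte-wise version and 6 for the word-wise one; the timings will vary by machine and by whether the harness runs under LuaJIT or the reference Lua interpreter.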
Or perhaps it doesn't matter and the difference is negligible. I'm going to eat some waffles and watch "The Gorge" w/ Miles Teller and Anya Taylor-Joy. Looks pretty good.