<sub>2025-03-09 @13:16</sub>

#digital-garden #lua #programming #ccsds

# Parsing CCSDS Headers

I've come across code that attempts CCSDS parsing at the byte level. This got me thinking: is byte-by-byte parsing efficient compared to parsing at the word level?

Decoding a CCSDS header by working with words first and then extracting fields **offers both performance (time) and memory (space) benefits**. I can think of four main reasons why this approach is more efficient. The impact depends on the machine architecture and on how often the decoding function is called.

- **Fewer memory accesses**
- **Avoiding alignment penalties**
- **Faster bitwise manipulation vs. multi-byte assembly**
- **Better space efficiency and cache benefits**

# Fewer memory accesses

Depending on the machine architecture, operating on words instead of bytes reduces the number of memory accesses. For example, reading a single 32-bit word allows multiple CCSDS fields to be processed at once. This is **more efficient than fetching and reconstructing fields byte-by-byte**, especially when fields span multiple bytes.

**CCSDS primary header fields are word-aligned**: the header is six bytes, i.e. three 16-bit words, and no field crosses a 16-bit word boundary. It makes sense to treat it as words. Many CPU architectures optimize for word-sized operations, since memory buses, caches, and registers are designed to fetch aligned words efficiently.

By contrast, **byte-by-byte extraction incurs additional overhead**. If a function extracts fields one byte at a time, each operation requires extra shifts and masks, increasing the instruction count. On a **resource-constrained embedded system**, this overhead can be costly.
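To make the "one 32-bit word, several fields" idea concrete, here is a minimal sketch that reads the first four header bytes as one big-endian word and masks every field out of it. It assumes Lua 5.3 or later, which provides `string.unpack` and native bitwise operators (the rest of this post uses LuaJIT's `bit` library instead); `decodeFirstWord` and the field names are hypothetical, chosen for illustration.

```Lua
-- Sketch only: assumes Lua 5.3+ (string.unpack and native bit operators).
-- decodeFirstWord is a hypothetical helper, not part of any library.
local function decodeFirstWord(packet)
  -- One call reads header bytes 1..4 as a single unsigned 32-bit integer
  -- (">" = big-endian, "I4" = 4-byte unsigned).
  local w = string.unpack(">I4", packet)
  return {
    version  = (w >> 29) & 0x7,    -- bits 31..29
    pktType  = (w >> 28) & 0x1,    -- bit 28
    secHdr   = (w >> 27) & 0x1,    -- bit 27
    apid     = (w >> 16) & 0x7FF,  -- bits 26..16
    seqFlags = (w >> 14) & 0x3,    -- bits 15..14
    seqCount = w & 0x3FFF,         -- bits 13..0
  }
end

-- Example header: APID 0x123, unsegmented (flags = 3), sequence count 42.
local hdr = decodeFirstWord("\x01\x23\xC0\x2A\x00\x07")
print(string.format("apid=%d flags=%d count=%d",
  hdr.apid, hdr.seqFlags, hdr.seqCount))  --> apid=291 flags=3 count=42
```

The shift amounts are the usual 16-bit offsets plus 16, because the first two header words now share one 32-bit integer; the packet data length would come from a further `">I2"` read at position 5.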
A **bad example** of byte-by-byte access in Lua:

```Lua
local bit = require("bit")  -- LuaJIT BitOp (built in to LuaJIT)
local CcsdsHdr = {}

-- `bytes` is a 1-indexed array of the six header byte values (0-255).
function CcsdsHdr._decode(bytes)
  return {
    bit.band(bit.rshift(bytes[1], 5), 0x07),                    -- version
    bit.band(bit.rshift(bytes[1], 4), 0x01),                    -- packet type
    bit.band(bit.rshift(bytes[1], 3), 0x01),                    -- secondary header flag
    bit.bor(bit.lshift(bit.band(bytes[1], 0x07), 8), bytes[2]), -- APID
    bit.band(bit.rshift(bytes[3], 6), 0x03),                    -- sequence flags
    bit.bor(bit.lshift(bit.band(bytes[3], 0x3F), 8), bytes[4]), -- sequence count
    bit.bor(bit.lshift(bytes[5], 8), bytes[6]),                 -- packet data length
  }
end
```

# Avoiding alignment penalties

Because the header's fields fall on 16-bit word boundaries, reading them as words keeps memory accesses naturally aligned, provided the packet buffer itself starts at an aligned address. On some architectures (like ARM and RISC-V), unaligned accesses can cause:

- Performance penalties (multiple memory accesses instead of one)
- CPU traps or faults (depending on the core and system configuration)

Working at the word level on an aligned buffer keeps fetches aligned and avoids these issues.

# Bit manipulation may be faster than multi-byte assembly

With a single word fetch, fields can be extracted using bit shifts and masks, rather than by manually assembling multiple bytes into an integer. Most CPUs have efficient bitwise instructions (`AND`, `SHIFT`, `OR`) that operate on entire words in a single cycle, making this approach both faster and simpler. For example:

```Lua
local bit = require("bit")  -- LuaJIT BitOp; `bytes` holds the header byte values
local word1 = bit.bor(bit.lshift(bytes[1], 8), bytes[2])  -- assemble the first 16-bit word once
local apid = bit.band(word1, 0x7FF)                       -- APID = low 11 bits of that word
```

The word is assembled once, and every field inside it comes out with a shift and a mask. This can **reduce the instruction count** compared to reassembling each multi-byte field from individual bytes.

# Better space efficiency and cache benefits

Fetching word-aligned data makes better use of CPU caches, since a cache line (commonly 32 to 128 bytes) holds many aligned words. Byte-wise operations can make less effective use of the cache, since:

- Each byte is a separate load; even when consecutive bytes hit the same cache line, the repeated accesses add latency.
- Frequent memory accesses increase cache traffic, reducing efficiency.

By reducing memory accesses, word-based parsing helps make better use of the cache.
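None of these arguments has been measured yet, and a rough timing harness is a cheap way to start. The sketch below assumes Lua 5.3 or later (native bitwise operators, `string.byte`, `string.unpack`); `decodeBytewise`, `decodeWordwise`, and `timeIt` are hypothetical names. `os.clock` reports CPU time, so the numbers only compare the two decoders on one machine and one interpreter.

```Lua
-- Rough timing sketch, assuming Lua 5.3+. All function names are hypothetical.

-- Byte-by-byte: read six bytes, then shift/mask each field out of them.
local function decodeBytewise(s)
  local b1, b2, b3, b4, b5, b6 = string.byte(s, 1, 6)
  return (b1 >> 5) & 0x07, (b1 >> 4) & 0x01, (b1 >> 3) & 0x01,
         ((b1 & 0x07) << 8) | b2,
         (b3 >> 6) & 0x03, ((b3 & 0x3F) << 8) | b4,
         (b5 << 8) | b6
end

-- Word-wise: fetch three big-endian 16-bit words in one call, then mask.
local function decodeWordwise(s)
  local w1, w2, w3 = string.unpack(">I2I2I2", s)
  return (w1 >> 13) & 0x07, (w1 >> 12) & 0x01, (w1 >> 11) & 0x01,
         w1 & 0x7FF,
         (w2 >> 14) & 0x03, w2 & 0x3FFF,
         w3
end

-- Run fn over the header n times and return the elapsed CPU seconds.
local function timeIt(fn, s, n)
  local start = os.clock()
  for _ = 1, n do fn(s) end
  return os.clock() - start
end

local header = "\x18\x23\xC0\x2A\x00\x07"

-- Sanity check: both decoders must agree before the timing means anything.
local a = { decodeBytewise(header) }
local b = { decodeWordwise(header) }
for i = 1, 7 do assert(a[i] == b[i], "field " .. i .. " differs") end

local n = 200000
print(string.format("bytewise: %.3fs  wordwise: %.3fs",
  timeIt(decodeBytewise, header, n), timeIt(decodeWordwise, header, n)))
```

In an interpreter, the timings are likely dominated by bytecode dispatch and function calls rather than by the hardware memory effects described above, so a C version built for the target would be the fairer test of those arguments.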
# Conclusion

While I haven’t benchmarked this yet, the reasoning suggests that word-aligned parsing is a more efficient approach for decoding CCSDS headers. Instead of extracting fields byte-by-byte, using aligned word operations reduces memory accesses, avoids unaligned penalties, and leverages efficient bitwise operations.

Here’s a better example of a function that processes words instead of bytes, improving both efficiency and readability:

```Lua
local bit = require("bit")  -- LuaJIT BitOp (built in to LuaJIT)
local CcsdsHdr = {}

-- `bytes` is a 1-indexed array of the six header byte values (0-255).
function CcsdsHdr._decode(bytes)
  -- Assemble the three big-endian 16-bit header words once.
  local word1 = bit.bor(bit.lshift(bytes[1], 8), bytes[2])
  local word2 = bit.bor(bit.lshift(bytes[3], 8), bytes[4])
  local word3 = bit.bor(bit.lshift(bytes[5], 8), bytes[6])

  return {
    bit.band(bit.rshift(word1, 13), 0x07), -- version (bits 15..13)
    bit.band(bit.rshift(word1, 12), 0x01), -- packet type (bit 12)
    bit.band(bit.rshift(word1, 11), 0x01), -- secondary header flag (bit 11)
    bit.band(word1, 0x7FF),                -- APID (bits 10..0)
    bit.band(bit.rshift(word2, 14), 0x03), -- sequence flags (bits 15..14)
    bit.band(word2, 0x3FFF),               -- sequence count (bits 13..0)
    word3,                                 -- packet data length (16 bits)
  }
end
```

Or perhaps it doesn't matter and the difference is negligible.

I'm going to eat some waffles and watch "The Gorge" w/ Miles Teller and Anya Taylor-Joy. Looks pretty good.