<sub>2026-03-14 @1800</sub> #python #simulator

# Why My Sampling Loop Needed Drift Compensation

I have been building a small host application that connects to an in-game physics server and pulls telemetry data at a fixed rate. The idea is to sample a bundle of signals at some fixed rate, then feed those samples into an ML model for anomaly classification. The project is a personal learning environment. Nothing production, nothing high stakes. Just a way to get hands-on with signal capture and ML outside of work.

At some point I realized my sampler was slowly falling behind. Not dramatically, not in a way that broke anything immediately, but consistently. The samples were not landing where I expected them to on the timeline. I started digging into my design and realized I was not compensating for drift. I knew what the loop was doing mechanically. I did not understand why it mattered so much until all of this started happening. It took a few visualizations and some honest tracing through the numbers before it clicked.

## The Naive Approach and Why It Drifts

The naive version of a fixed-rate loop might look something like the following. We do some work, then we sleep for however long we want to throttle the loop. Is it good design? Probably not, but like everything else it depends on what we're trying to do.

```python
while True:
    do_work()
    sleep(period)
```

The problem is subtle here, or it was for me initially. Assuming a period of $200ms$, the `sleep(period)` means "sleep $200ms$ starting from right now, **after the work is already done**." If the work itself takes $5ms$, then each cycle actually takes $205ms$. That $5ms$ error is small, but it never cancels. It just stacks. After a thousand samples you are roughly $5$ seconds behind where you expected to be.

The situation gets worse when one sample takes much longer than usual. Say the work takes $250ms$ instead of $5ms$.
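Before getting to the slow-sample case, that thousand-sample figure is easy to check with a few lines of simulated arithmetic. No real sleeping here, just bookkeeping on when each sample fires, using the hypothetical $200ms$ period and $5ms$ work time from above:

```python
# Simulate the naive loop's drift: each cycle is work + a full period.
period = 0.200  # seconds, hypothetical sampling period
work = 0.005    # seconds, hypothetical per-sample work time

t = 0.0  # simulated time at which each sample fires
for _ in range(1000):
    t += work + period  # naive: sleep a full period after the work

expected = 1000 * period  # where the schedule says we should be
print(f"behind by {t - expected:.3f} s")  # prints: behind by 5.000 s
```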
The naive loop finishes that work, then sleeps another full $200ms$ on top of it. The next sample fires at `work + sleep = 450ms` past the last one. That extra $200ms$ of sleep is just gone. The loop has no memory of what the original schedule was.

## What Deadline Scheduling Actually Does

The alternative is to stop thinking in terms of "how long should I sleep" and start thinking in terms of "when is the next scheduled tick." So that is what I've been doing.

```python
t_next += period
sleep_until(t_next)
```

The deadline is computed once and advanced by a fixed amount every iteration. It does not re-anchor to whatever time it happens to be when work finishes. The schedule is rigid; the loop just tries to catch up to it. Could I do better? Definitely, but for now it gives me a suitable mechanism for drift compensation around a simulator.

When a slow sample runs long and blows past the next deadline, `sleep_until(t_next)` returns immediately because that point on the clock is already in the past. The loop skips the sleep entirely and fires the next sample right away. One cycle later, the deadline is back in the future and the schedule resumes normally. The error is absorbed rather than inherited.

A more complete example:

```python
import time

period = 0.200  # 200ms sampling period
t_next = time.monotonic() + period

while True:
    sleep_s = t_next - time.monotonic()
    if sleep_s > 0:
        time.sleep(sleep_s)
    # do work here
    ...
    t_next += period  # drift compensating
```

## The Moment It Clicked

I was still fuzzy on this until I traced through it in a table with explicit columns for work time, sleep time, fires-at, and error. Watching the naive error column grow by $5ms$ every row made the accumulation obvious. Then injecting a slow sample and seeing the naive sleep column stay locked at $200ms$ while the error jumped by $250ms$ made the core problem clear. The naive loop has no mechanism to recover because it does not know where the schedule was supposed to be.

The deadline table showed the slow row differently. The sleep column read $0ms$, indicating the sleep was skipped.
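That table is easy to reproduce in code. Here's a simulated side-by-side trace of both loops with one injected slow sample. All timings are the hypothetical numbers from above ($200ms$ period, $5ms$ work, one $250ms$ overrun); no real sleeping happens:

```python
period = 0.200
works = [0.005, 0.005, 0.250, 0.005, 0.005]  # third sample runs long

def naive_trace(works):
    """Naive loop: always sleep a full period after the work."""
    t, rows = 0.0, []
    for i, w in enumerate(works):
        t += w + period
        rows.append((period, t - (i + 1) * period))  # (sleep, error)
    return rows

def deadline_trace(works):
    """Deadline loop: advance a rigid schedule, sleep only up to it."""
    t, t_next, rows = 0.0, 0.0, []
    for w in works:
        t += w                        # work finishes
        t_next += period              # advance the rigid schedule
        sleep_s = max(0.0, t_next - t)  # skipped if past the deadline
        t += sleep_s
        rows.append((sleep_s, t - t_next))  # (sleep, error)
    return rows

for (s_n, e_n), (s_d, e_d) in zip(naive_trace(works), deadline_trace(works)):
    print(f"naive: sleep={s_n*1e3:6.1f}ms err={e_n*1e3:6.1f}ms | "
          f"deadline: sleep={s_d*1e3:6.1f}ms err={e_d*1e3:6.1f}ms")
```

The naive error column climbs and never comes back down after the slow row; the deadline column shows a $0ms$ sleep on the slow row and an error back near zero one row later.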
The next normal row showed the error back near zero. That was the light bulb moment for me.

![[drift_compensation.gif]]

## Host Timing Jitter Is Not a Telemetry Anomaly

One question that came up while working through this was whether compensating for the timing drift might accidentally hide real anomalies in the sampled data. If a sampling overrun shows up and the compensation removes it, are you losing signal? The answer depends on the source of the overrun, and I realized I was confusing two layers: the in-game physics engine and my host-side application. I definitely want to correct for drift in my host application, but the in-game physics engine is different. A ton of questions start to arise here and I just don't have the answers right now. For example, here are a couple I've thought of today.

- Does the in-game physics engine have drift? How could I tell or measure it? Could I synthetically inject drift into the engine? If so, could I train a model to detect drift or events that lead to drift? Would that even be useful?
- In real-time environments there is often some means of drift prevention. If a system is capable of correcting for drift, how do you detect it? Perhaps you don't care, but detecting the events that lead to drift still seems practical. Could you break out the drift as telemetry before correcting it?

I am still not entirely sure where I land on this for my specific use case. The in-game physics engine runs on a fixed timestep far faster than what I'm sampling: it runs at 60Hz where I'm only sampling at 5Hz. Still work to do here.

---

## Notes to Myself

1. Is there some variant like `clock_nanosleep` with `TIMER_ABSTIME` in C? How does that map to what I'm doing with `t_next += period` in Python?
2. Think about what it would look like to include `sampler_latency_ms` as a derived signal for feeding the model. For example, would a CNN treat it as a meaningful channel or would it just be noise in the spatial structure?
3. Consider whether there is value in intentionally stress-testing my sampler (the host application) by injecting synthetic slow cycles and observing how many cycles it takes to recover under different load conditions.
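That last note doesn't need the real sampler to get started. Here's a hypothetical harness sketch against a simulated deadline loop: the period, the overrun size, and the $1ms$ "recovered" threshold are all made-up test values, not anything from the actual host application:

```python
def cycles_to_recover(period, works, threshold=0.001):
    """Count cycles from the first overrun until error drops below threshold."""
    t = t_next = 0.0
    overrun_at = None
    for i, w in enumerate(works):
        t += w
        t_next += period
        sleep_s = max(0.0, t_next - t)  # deadline loop: skip sleep if late
        t += sleep_s
        err = t - t_next
        if err >= threshold and overrun_at is None:
            overrun_at = i              # first cycle that blew the schedule
        if overrun_at is not None and err < threshold:
            return i - overrun_at       # cycles needed to absorb the debt
    return None  # never recovered within the trace

# One synthetic 450ms spike against a 200ms period: the loop owes more
# than a full period, so it needs two fast cycles to absorb it.
works = [0.005] * 3 + [0.450] + [0.005] * 5
print(cycles_to_recover(0.200, works))  # prints: 2
```

Sweeping the spike size and the background work time would give a small recovery-time surface, which seems like a reasonable first stress-test before touching the real loop.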