# The buffered reader
The bulk-fetch gap against IfxPy stayed stubbornly at ~2× from Phase 36 through Phase 38. Two phases of codec optimization shrank it by a few percent each. Phase 39 — a connection-scoped buffered reader — closed it from 2.4× to ~1.05–1.15× in about thirty minutes of code plus ten minutes of architectural debugging.
This page is about both the technical change and the failure mode that hid the win for two phases.
## The lever we couldn’t see

After Phase 38, profiling a 100,000-row fetch showed:
| Category | Self time | % of wall clock |
|---|---|---|
| I/O machinery | 555 ms | 66% |
| Codec | 205 ms | 24% |
| Other | ~80 ms | 10% |
The headline “I/O dominated” was true. The interesting half is the breakdown of that 555 ms:
- Actual recv() syscalls: ~153 ms
- Python wrapper overhead: ~400 ms
That ~400 ms was our own buffer abstraction — a read_exact loop that called recv() per fragment, reassembled fragments via bytes.join, and traversed two layers of cursor wrappers per call. For 100,000 rows that’s 451,402 calls to read_exact, each one paying Python wrapper cost the kernel didn’t cause.
The kernel was doing maybe 25–30 ms of work. The other 130 ms of the gap-vs-IfxPy was friction we had introduced ourselves.
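The shape of that overhead is easy to reproduce. The sketch below is a reconstruction of the pre-Phase-39 pattern as the text describes it (the class name and exact structure are my assumptions, not the driver’s actual code): one recv() per fragment, plus a bytes.join per call.

```python
import socket


class NaiveReader:
    """Sketch of the old per-fragment shape (name hypothetical):
    every read_exact() hits the kernel, and fragments are
    reassembled with bytes.join — Python overhead on every field."""

    def __init__(self, sock: socket.socket):
        self.sock = sock

    def read_exact(self, n: int) -> bytes:
        fragments = []
        remaining = n
        while remaining:
            chunk = self.sock.recv(remaining)  # one syscall per fragment
            if not chunk:
                raise ConnectionError("socket closed mid-read")
            fragments.append(chunk)
            remaining -= len(chunk)
        return b"".join(fragments)  # a fresh bytes object on every call
```

Multiply that by 451,402 calls and the wrapper cost dwarfs the syscalls themselves.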
## The architecture pattern

Both asyncpg (in buffer.pyx) and psycopg3 (in pq.PGconn) put a single growing read buffer on the protocol/connection object. The parser indexes into it via struct.unpack_from(buf, offset) rather than slicing copies. Refills happen via one large recv(64K) rather than many small recv()s for individual fields.
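The indexing half of that pattern looks like this in miniature (the field layout here is invented for illustration; it is not the Informix wire format):

```python
import struct

# Pretend the wire handed us a big-endian short followed by an int.
buf = bytearray()
buf += struct.pack("!hi", 7, 1000)

offset = 0
# unpack_from reads directly out of the bytearray at the given offset —
# no intermediate bytes slice is allocated.
(tag,) = struct.unpack_from("!h", buf, offset)
offset += 2
(length,) = struct.unpack_from("!i", buf, offset)
offset += 4

# The copying alternative allocates a new bytes object per field:
# tag = struct.unpack("!h", bytes(buf[0:2]))[0]
```

The win is that the buffer is touched in place and only the cursor moves.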
Phase 39 ports that pattern to informix-driver. The state machine:
```
┌───────────────────────────────┐
│ IfxSocket                     │
│  ─ socket: socket.socket      │
│  ─ buf: bytearray (growable)  │
│  ─ offset: int (read cursor)  │
│                               │
│  recv(64K) when buf exhausted │
└───────────────────────────────┘
               ▲
               │ reads via read_exact(n)
               │
┌───────────────────────────────┐
│ SocketReader (per-PDU)        │
│  ─ short-lived view           │
│  ─ no buffer of its own       │
└───────────────────────────────┘
```

The reader is a parser-view. The buffer outlives the reader. When the parser asks for read_short(), the reader calls socket.read_exact(2), which slices two bytes out of the bytearray at the current offset and advances it. If the bytearray runs out, socket.recv(64K) refills it.
Result: one recv() per ~64 KB of incoming data, not per field.
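A runnable sketch of that refill discipline, using the attribute names the page quotes from the driver; the compaction and EOF handling are my assumptions, and the real implementation may differ:

```python
import socket

RECV_SIZE = 64 * 1024  # one large recv per refill, not one per field


class IfxSocket:
    """Connection-scoped buffer sketch; internals beyond the quoted
    attribute names are assumptions."""

    def __init__(self, sock: socket.socket):
        self.sock = sock
        self._read_buf = bytearray()
        self._read_offset = 0

    def read_exact(self, n: int) -> bytes:
        # Refill until the buffer holds at least n unread bytes.
        while len(self._read_buf) - self._read_offset < n:
            self._refill()
        start = self._read_offset
        self._read_offset += n
        return bytes(self._read_buf[start:self._read_offset])

    def _refill(self) -> None:
        # Drop already-consumed bytes so the buffer doesn't grow forever.
        if self._read_offset:
            del self._read_buf[:self._read_offset]
            self._read_offset = 0
        chunk = self.sock.recv(RECV_SIZE)
        if not chunk:
            raise ConnectionError("socket closed during refill")
        self._read_buf += chunk
```

A read_short() on top of this is just read_exact(2) plus an unpack; no reader in the stack owns any bytes of its own.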
## The architectural mistake the first pass got wrong

The natural thing to call this is “BufferedSocketReader”. The natural thing to do is put the bytearray on the reader. That’s what I did first.
Then test_executemany_1000_rows hung. The kernel wait channel (cat /proc/PID/wchan) read wait_woken — the process was blocked in recv(), waiting for bytes that weren’t coming.
The bug was foreseeable, and it was architectural rather than implementational. Phase 33’s pipelined executemany sends N BIND+EXECUTE PDUs back-to-back and drains responses afterward. Each cursor read constructs a new reader instance. When my reader did recv(64K) and pulled in 600 bytes — 200 bytes for response 1, 400 bytes for response 2 — it consumed bytes for response 2 and then was destroyed. The next reader called recv(), the kernel buffer was empty, and we waited forever for bytes the kernel had already given to a dead reader.
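The failure is easy to demonstrate in isolation. This sketch reconstructs the buggy first pass (class name and byte values invented for the demo): two pipelined responses arrive back to back, one recv() pulls in both, and the second response dies with the first reader.

```python
import socket


class ReaderScopedBuffer:
    """Sketch of the buggy first pass (name hypothetical): the buffer
    is owned by the reader, so it is destroyed with the reader."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()  # scoped to this reader — the bug

    def read(self, n):
        while len(self.buf) < n:
            self.buf += self.sock.recv(65536)  # may pull in the NEXT response too
        out = bytes(self.buf[:n])
        del self.buf[:n]
        return out


# Two pipelined responses, back to back, as in Phase 33's executemany:
a, b = socket.socketpair()
b.sendall(b"\x01" * 200 + b"\x02" * 400)  # response 1, then response 2

reader1 = ReaderScopedBuffer(a)
reader1.read(200)            # parse response 1 — recv() grabs all 600 bytes
leftover = len(reader1.buf)  # response 2's 400 bytes, trapped in reader1
del reader1                  # ...and destroyed with it

# A fresh reader for response 2 would now block in recv() forever: the
# kernel buffer is empty because the dead reader already consumed it.
a.close(); b.close()
```

Nothing in the type system or the tests for single-statement fetches flags this; it only surfaces when more than one response can share a recv().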
The fix moved the buffer one level down. The bytearray and offset cursor live on IfxSocket (the connection-scoped wrapper) — readers are short-lived parser-views, the buffer outlives them.
```python
# WRONG (first pass) — buffer scoped to reader
class BufferedSocketReader:
    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()  # ← dies with the reader
        self.offset = 0
```
```python
# RIGHT (Phase 39) — buffer scoped to connection
class IfxSocket:
    def __init__(self, sock):
        self.sock = sock
        self._read_buf = bytearray()  # ← outlives all readers
        self._read_offset = 0

    def read_exact(self, n):
        if self._read_offset + n > len(self._read_buf):
            self._refill()
        ...
```

asyncpg and psycopg3 both place the buffer on the protocol/connection object. The architectural template was sitting in front of me before I started; I built the wrong shape anyway, because “buffered reader” implies the buffer is on the reader.
It is not. The reader is a view. The buffer is state.
## The numbers

A/B-measured against the same Docker container, warmed cache, only the env flag differing:
| Workload | Phase 38 | Phase 39 | Δ |
|---|---|---|---|
| select_scaling_1000 | 2.901 ms | 1.716 ms | −41% |
| select_scaling_10000 | 24.317 ms | 16.084 ms | −34% |
| select_scaling_100000 | 250.363 ms | 168.982 ms | −32% |
Re-running the IfxPy comparison after Phase 39:
| Workload | IfxPy 2.0.7 (C) | informix-driver Phase 39 | Ratio |
|---|---|---|---|
| select_scaling_1000 | 1.637 ms | 1.716 ms | 1.05× |
| select_scaling_10000 | 15.07 ms | 16.08 ms | 1.07× |
| select_scaling_100000 | 147.4 ms | 169.0 ms | 1.15× |
The steady-state gap, 2.4× before Phase 37, is now 5–15%, and the lower bound may already be within IfxPy’s own measurement noise (its IQR on the 100k workload is 21%; ours is 0.2%).
## What the feature flag does

The buffered reader ships enabled by default in version 2026.05.05.12. To opt out (debugging, regressing a workload, A/B-measuring your own code):
```
IFX_BUFFERED_READER=0 python my_app.py
```

The flag is read once at connection construction. Existing connections in a pool aren’t affected by changing the env at runtime — close and reopen the pool to flip behavior.
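Read-once semantics of that kind usually come down to capturing the env var at construction time. A plausible sketch (the helper name and default handling are my assumptions, not the driver’s actual code):

```python
import os


def _buffered_reader_enabled() -> bool:
    """Hypothetical helper: called once when a connection is built.
    "0" opts out; unset or any other value keeps the default-on
    behavior."""
    return os.environ.get("IFX_BUFFERED_READER", "1") != "0"


class Connection:
    def __init__(self):
        # Captured here, never re-read — which is why flipping the env
        # var at runtime doesn't affect connections already in a pool.
        self.buffered = _buffered_reader_enabled()
```

Because each pooled connection snapshots the value, the only way to flip behavior is to tear the pool down and rebuild it.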
## What we learned

The general pattern: what’s visible gets optimization attention; what’s invisible gets written off as irreducible.
The codec is visible — there’s a loop, a _decode_varchar function, a struct.unpack call. You can read the inner loop and reason about it. Phases 37 and 38 attacked it, both got modest wins.
The I/O machinery looked invisible. _socket.read_exact is eight lines. The cursor’s _SocketReader wrapper is twelve. The framing reads — read_short, read_int, read_exact(payload_size) — are one-liners. What could be slow about that?
It was carrying ~30% of total wall time. Two phases of changelogs implicitly blamed “the protocol” for the remaining gap. The actual culprit was a few lines of bytes.join in a wrapper from Phase 1 that nobody had revisited.
The lesson is small and easy to state: a profile turns vibes into an attack surface. Write the closing paragraph after you’ve measured, not before.
## Read more

- Architecture overview → — where the buffered reader sits in the layer stack.
- Phase log → — the full progression from Phase 1 through Phase 39+.
- “The 156 Milliseconds I’d Been Hand-Waving About” — Claude’s reflection on the session in which Phase 39 shipped, including the two pushbacks that triggered the work.