
The buffered reader

The bulk-fetch gap against IfxPy stayed stubbornly at ~2× from Phase 36 through Phase 38. Two phases of codec optimization shrank it by a few percent each. Phase 39 — a connection-scoped buffered reader — closed it from 2.4× to ~1.05–1.15× in about thirty minutes of code plus ten minutes of architectural debugging.

This page is about both the technical change and the failure mode that hid the win for two phases.

After Phase 38, profiling a 100,000-row fetch showed:

Category         Self time   % of wall clock
I/O machinery    555 ms      66%
Codec            205 ms      24%
Other            ~80 ms      10%

The headline “I/O dominated” was true. The interesting half is the breakdown of that 555 ms:

  • Actual recv() syscalls: ~153 ms
  • Python wrapper overhead: ~400 ms

That ~400 ms was our own buffer abstraction — a read_exact loop that called recv() per fragment, reassembled fragments via bytes.join, and traversed two layers of cursor wrappers per call. For 100,000 rows that’s 451,402 calls to read_exact, each one paying Python wrapper cost the kernel didn’t cause.
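
Schematically, the old read path had this shape (a reconstruction for illustration, not the driver's exact code):

def read_exact(sock, n):
    """Pre-Phase-39 shape: one recv() per fragment, one join per call."""
    parts = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)        # a syscall per fragment
        if not chunk:
            raise ConnectionError("socket closed mid-read")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)                  # a fresh bytes object per call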

The kernel itself was doing maybe 25–30 ms of real work. The other ~130 ms of the gap against IfxPy was friction we had introduced ourselves.

Both asyncpg (in buffer.pyx) and psycopg3 (in pq.PGconn) put a single growing read buffer on the protocol/connection object. The parser indexes into it via struct.unpack_from(fmt, buf, offset) rather than slicing copies. Refills happen via one large recv(64K) rather than many small recv()s for individual fields.
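
The difference between the two read styles is visible in a couple of lines (illustrative buffer; !h is a big-endian short):

import struct

buf = bytearray(b"\x00\x2a" + b"rest-of-pdu")
offset = 0

# Index into the shared buffer in place; no intermediate bytes object:
(value,) = struct.unpack_from("!h", buf, offset)
offset += 2

# The slicing equivalent copies two bytes into a new object first:
(same,) = struct.unpack("!h", buf[offset - 2:offset])
assert value == same == 42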

Phase 39 ports that pattern to informix-driver. The state machine:

┌────────────────────────────────┐
│ IfxSocket                      │
│  ─ socket: socket.socket       │
│  ─ buf: bytearray (growable)   │
│  ─ offset: int (read cursor)   │
│                                │
│  recv(64K) when buf exhausted  │
└────────────────────────────────┘
              │ reads via read_exact(n)
┌────────────────────────────────┐
│ SocketReader (per-PDU)         │
│  ─ short-lived view            │
│  ─ no buffer of its own        │
└────────────────────────────────┘

The reader is a parser-view. The buffer outlives the reader. When the parser asks for read_short(), the reader calls socket.read_exact(2), which slices two bytes out of the bytearray at offset and advances. If the bytearray runs out, socket.recv(64K) refills it.

Result: one recv() per ~64 KB of incoming data, not per field.
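
A sketch of the reader side of that flow, with names following the diagram above and big-endian framing assumed (the driver's real internals may differ):

import struct

class SocketReader:
    """Per-PDU parser view. Owns no buffer; delegates to the connection."""

    def __init__(self, socket):
        self._socket = socket   # the connection-scoped IfxSocket

    def read_short(self):
        # Two bytes sliced out of the connection's bytearray; the
        # connection refills via recv(64K) only when it runs dry.
        (value,) = struct.unpack("!h", self._socket.read_exact(2))
        return value

    def read_int(self):
        (value,) = struct.unpack("!i", self._socket.read_exact(4))
        return value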

The architectural mistake in the first pass

The natural thing to call this is “BufferedSocketReader”. The natural thing to do is put the bytearray on the reader. That’s what I did first.

Then test_executemany_1000_rows hung. The kernel wait channel via cat /proc/PID/wchan said wait_woken: the process was blocked in recv(), waiting for bytes that weren't coming.

The bug was foreseeable, and it was architectural rather than implementational. Phase 33’s pipelined executemany sends N BIND+EXECUTE PDUs back-to-back and drains responses afterward. Each cursor read constructs a new reader instance. When my reader did recv(64K) and pulled in 600 bytes — 200 bytes for response 1, 400 bytes for response 2 — it consumed bytes for response 2 and then was destroyed. The next reader called recv(), the kernel buffer was empty, and we waited forever for bytes the kernel had already given to a dead reader.
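
The failure reproduces outside the driver. A toy version over socketpair (none of it driver code; the last line hangs by design):

import socket

a, b = socket.socketpair()
a.sendall(b"resp1resp2")   # "server" pipelines two responses back-to-back

class BufferedReader:      # wrong scope: the buffer dies with the reader
    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()

    def read(self, n):
        while len(self.buf) < n:
            self.buf += self.sock.recv(64 * 1024)
        out = bytes(self.buf[:n])
        del self.buf[:n]
        return out

r1 = BufferedReader(b)
print(r1.read(5))          # b'resp1', but 'resp2' is now in r1.buf
del r1                     # drops the only reference; 'resp2' dies with it

r2 = BufferedReader(b)
print(r2.read(5))          # blocks forever: the kernel buffer is empty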

The fix moved the buffer one level down. The bytearray and offset cursor live on IfxSocket (the connection-scoped wrapper) — readers are short-lived parser-views, the buffer outlives them.

# WRONG (first pass) — buffer scoped to reader
class BufferedSocketReader:
    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()        # ← dies with the reader
        self.offset = 0

# RIGHT (Phase 39) — buffer scoped to connection
class IfxSocket:
    def __init__(self, sock):
        self.sock = sock
        self._read_buf = bytearray()  # ← outlives all readers
        self._read_offset = 0

    def read_exact(self, n):
        if self._read_offset + n > len(self._read_buf):
            self._refill()
        ...
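
Filled out, the connection-scoped read path might look like this (a sketch; the refill size and compaction policy here are assumptions, not the shipped code):

RECV_SIZE = 64 * 1024   # one large recv() instead of many small ones

class IfxSocket:
    def __init__(self, sock):
        self.sock = sock
        self._read_buf = bytearray()
        self._read_offset = 0

    def read_exact(self, n):
        while len(self._read_buf) - self._read_offset < n:
            self._refill()
        start = self._read_offset
        self._read_offset += n
        return bytes(self._read_buf[start:self._read_offset])

    def _refill(self):
        # Compact first: drop already-consumed bytes so the buffer
        # stays bounded instead of growing for the connection's lifetime.
        if self._read_offset:
            del self._read_buf[:self._read_offset]
            self._read_offset = 0
        chunk = self.sock.recv(RECV_SIZE)
        if not chunk:
            raise ConnectionError("socket closed mid-read")
        self._read_buf += chunk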

asyncpg and psycopg3 both place the buffer on the protocol/connection object. The architectural template was sitting in front of me before I started; I built the wrong shape anyway because “buffered reader” implies the buffer is on the reader.

It is not. The reader is a view. The buffer is state.

A/B-measured against the same Docker container, warmed cache, only the env flag differing:

Workload                Phase 38     Phase 39     Δ
select_scaling_1000     2.901 ms     1.716 ms     −41%
select_scaling_10000    24.317 ms    16.084 ms    −34%
select_scaling_100000   250.363 ms   168.982 ms   −32%

Re-running the IfxPy comparison after Phase 39:

Workload                IfxPy 2.0.7 (C)   informix-driver Phase 39   Ratio
select_scaling_1000     1.637 ms          1.716 ms                   1.05×
select_scaling_10000    15.07 ms          16.08 ms                   1.07×
select_scaling_100000   147.4 ms          169.0 ms                   1.15×

The 2.4× steady-state gap that existed before Phase 37 has narrowed to 5–15% over the C driver, and the lower end of that range may already be within IfxPy's own measurement noise (its IQR on the 100k workload is 21%; ours is 0.2%).

The buffered reader ships enabled by default in version 2026.05.05.12. To opt out (for debugging, a workload that regressed, or A/B-measuring your own code):

IFX_BUFFERED_READER=0 python my_app.py

The flag is read once at connection construction. Existing connections in a pool aren’t affected by changing the env at runtime — close and reopen the pool to flip behavior.
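
A construction-time flag read looks something like this (a sketch; the attribute name is illustrative), which is why pooled connections keep whatever value they saw when they were built:

import os

class IfxSocket:
    def __init__(self, sock):
        self.sock = sock
        # Read once at construction; flipping the env var afterwards
        # does not affect connections that already exist.
        self._buffered = os.environ.get("IFX_BUFFERED_READER", "1") != "0"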

The general pattern: what’s visible gets optimization attention; what’s invisible gets written off as irreducible.

The codec is visible — there’s a loop, a _decode_varchar function, a struct.unpack call. You can read the inner loop and reason about it. Phases 37 and 38 attacked it; both got modest wins.

The I/O machinery looked invisible. _socket.read_exact is eight lines. The cursor’s _SocketReader wrapper is twelve. The framing reads — read_short, read_int, read_exact(payload_size) — are one-liners. What could be slow about that?

It was carrying ~30% of total wall time. Two phases of changelogs implicitly blamed “the protocol” for the remaining gap. The actual culprit was a few lines of bytes.join in a wrapper from Phase 1 that nobody had revisited.

The lesson is small and easy to state: a profile turns vibes into an attack surface. Write the closing paragraph after you’ve measured, not before.