Bulk inserts (executemany)
executemany() is the right tool for bulk inserts and updates. With informix-driver’s pipelined implementation it’s 1.6× faster than IfxPy at 10k+ rows.
The basic shape
```python
rows = [(1, "alice"), (2, "bob"), (3, "carol"), ...]  # 10_000 tuples

with conn:  # opens a transaction
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        rows,
    )
# commits on normal exit
```

That inserts 10,000 rows in ~161 ms against a loopback Informix container.
The 53× gotcha
```python
# SLOW: 8.5 seconds for 10k rows
conn = informix_db.connect(..., autocommit=True)
cur = conn.cursor()
cur.executemany("INSERT ...", rows)
```

```python
# FAST: 161 ms for 10k rows
conn = informix_db.connect(..., autocommit=False)  # default
with conn:
    cur = conn.cursor()
    cur.executemany("INSERT ...", rows)
```

The default is autocommit=False, so this only catches you if you’ve explicitly opted into autocommit.
Why it’s faster than IfxPy
IfxPy’s executemany calls IfxPy.execute(stmt, tuple) internally, once per row. That’s one round trip per row; for 10,000 rows on an 80 µs RTT, that’s 800 ms spent just waiting for ACKs.
Phase 33 changed our executemany to pipeline the BIND+EXECUTE PDUs:
- Send all N BIND PDUs back-to-back without reading responses
- Send all N EXECUTE PDUs back-to-back
- Drain N response sets at the end
The kernel buffers the outbound bytes; the server processes the BIND/EXECUTE pipeline as fast as it can; we read all responses at the end. One RTT for the whole batch instead of N.
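The effect of pipelining on round-trip count can be sketched with a toy model (not the driver’s actual wire code; `FakeServer` and the PDU tuples are illustrative stand-ins):

```python
class FakeServer:
    """Counts network round trips; each exchange() call is one RTT,
    no matter how many PDUs ride in that buffer."""

    def __init__(self):
        self.round_trips = 0

    def exchange(self, pdus):
        self.round_trips += 1
        return ["OK"] * len(pdus)


def per_row(server, rows):
    # IfxPy-style: one BIND+EXECUTE exchange per row
    for row in rows:
        server.exchange([("BIND", row), ("EXECUTE",)])


def pipelined(server, rows):
    # informix-driver-style: buffer every BIND and EXECUTE PDU,
    # then drain all responses in a single exchange
    buf = []
    for row in rows:
        buf.append(("BIND", row))
        buf.append(("EXECUTE",))
    server.exchange(buf)


rows = [(i, f"user{i}") for i in range(10_000)]
s1, s2 = FakeServer(), FakeServer()
per_row(s1, rows)
pipelined(s2, rows)
print(s1.round_trips, s2.round_trips)  # → 10000 1
```

Same PDUs on the wire in both cases; only the interleaving of writes and reads changes, which is why the win scales with RTT rather than with server-side work.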
Chunking large batches
For very large batches (millions of rows), break into chunks to bound memory:
```python
def chunks(it, n):
    buf = []
    for x in it:
        buf.append(x)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf
```

```python
with conn:
    cur = conn.cursor()
    for batch in chunks(huge_iterator, 10_000):
        cur.executemany("INSERT INTO logs ...", batch)
# one transaction, many batched executemany calls; one commit at the end
```

10,000 rows per chunk is a reasonable default; the per-chunk Python memory cost is roughly N × bytes_per_row. For 10k tuples of 5 small fields, that’s a few MB.
Returning generated keys
Informix’s SERIAL columns are server-assigned. To get the IDs back, use a single-row insert per row and read cur.lastrowid; executemany doesn’t return per-row IDs.
For batch inserts that need the IDs, the idiomatic pattern is:
```sql
INSERT INTO orders (id, customer_id, total)
SELECT MAX(id) + ROWNUM, ?, ?
FROM orders
```

…or pre-generate IDs from a sequence in your app code. Or accept the same limitation IfxPy has: IfxPy.executemany doesn’t return per-row generated keys either.
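The single-row lastrowid pattern looks like this. A minimal sketch using sqlite3 as a stand-in driver, since the cursor calls are plain DB-API; with informix_db the shape is the same, but `id` would be a SERIAL column and you would connect with informix_db.connect:

```python
import sqlite3

# stand-in for an Informix connection; the INSERT/lastrowid loop is
# identical in shape for any DB-API driver with qmark paramstyle
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

rows = [(7, 19.99), (8, 5.00), (9, 42.50)]  # (customer_id, total) pairs
ids = []
with conn:  # one transaction around all the single-row inserts
    cur = conn.cursor()
    for customer_id, total in rows:
        cur.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
        ids.append(cur.lastrowid)  # server-assigned key for this row

print(ids)  # → [1, 2, 3]
```

You trade the pipelining win for the IDs: this is one round trip per row again, so reserve it for batches where you genuinely need every generated key back.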