
Bulk inserts (executemany)

executemany() is the right tool for bulk inserts and updates. With informix-driver’s pipelined implementation it’s 1.6× faster than IfxPy at 10k+ rows.

rows = [(1, "alice"), (2, "bob"), (3, "carol"), ...]  # 10_000 tuples

with conn:  # opens a transaction
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        rows,
    )  # commits on normal exit

That inserts 10,000 rows in ~161 ms against a loopback Informix container.
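If you want to reproduce the number, a rough harness (timings will vary with hardware and the container’s loopback latency; conn and rows are as above):

import time

start = time.perf_counter()
with conn:
    cur = conn.cursor()
    cur.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
print(f"{(time.perf_counter() - start) * 1000:.0f} ms")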

The autocommit setting dominates the timing:

# SLOW: 8.5 seconds for 10k rows
conn = informix_db.connect(..., autocommit=True)
cur = conn.cursor()
cur.executemany("INSERT ...", rows)

# FAST: 161 ms for 10k rows
conn = informix_db.connect(..., autocommit=False)  # default
with conn:
    cur = conn.cursor()
    cur.executemany("INSERT ...", rows)

The default is autocommit=False, so this only catches you if you’ve explicitly opted into autocommit.

IfxPy’s executemany calls IfxPy.execute(stmt, tuple) internally, once per row. That’s one round-trip per row: for 10,000 rows at an 80 µs RTT, that’s 800 ms spent just waiting for ACKs.
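Roughly what that amounts to (IfxPy.prepare and IfxPy.execute are the real API calls; the loop paraphrases the internal behavior, and ifx_conn is a hypothetical IfxPy connection handle):

import IfxPy

stmt = IfxPy.prepare(ifx_conn, "INSERT INTO users (id, name) VALUES (?, ?)")
for row in rows:
    IfxPy.execute(stmt, row)  # synchronous: send, then block for the reply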

Phase 33 changed our executemany to pipeline the BIND+EXECUTE PDUs:

  1. Send all N BIND PDUs back-to-back without reading responses
  2. Send all N EXECUTE PDUs back-to-back
  3. Drain N response sets at the end

The kernel buffers the outbound bytes; the server processes the BIND/EXECUTE pipeline as fast as it can; we read all responses at the end. One RTT for the whole batch instead of N.
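A minimal sketch of the three phases, assuming a hypothetical wire-protocol object with buffered _send_bind/_send_execute writers and a _read_response drain (not the driver’s actual internals):

def executemany_pipelined(proto, stmt_id, rows):
    for row in rows:
        proto._send_bind(stmt_id, row)   # phase 1: buffered write, no read
    for _ in rows:
        proto._send_execute(stmt_id)     # phase 2: still write-only
    proto.flush()                        # one outbound burst to the server
    for _ in rows:
        proto._read_response()           # phase 3: drain N response sets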

For very large batches (millions of rows), break into chunks to bound memory:

def chunks(it, n):
    buf = []
    for x in it:
        buf.append(x)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf

with conn:
    cur = conn.cursor()
    for batch in chunks(huge_iterator, 10_000):
        cur.executemany("INSERT INTO logs ...", batch)
    # one transaction, many batched executemany calls; one commit at the end

10,000 rows per chunk is a reasonable default; the per-chunk Python memory cost is roughly N × bytes_per_row. For 10k tuples of 5 small fields (a few hundred bytes each once CPython object headers are counted), that’s a few MB.
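A quick back-of-envelope check (CPython sizes are approximate and ignore interned small ints and strings; the sample row is illustrative):

import sys

row = (1, "alice", "a@example.com", 3.14, True)  # 5 small fields
per_row = sys.getsizeof(row) + sum(sys.getsizeof(x) for x in row)
print(per_row * 10_000 / 1e6, "MB")  # on the order of a few MB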

Informix’s SERIAL columns are server-assigned. To get the IDs back, insert one row at a time and read cur.lastrowid; executemany doesn’t return per-row IDs.
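A sketch of that per-row fallback (cur.lastrowid is the standard DB-API attribute; here users is assumed to have a SERIAL id column and names is a hypothetical input list):

ids = []
with conn:
    cur = conn.cursor()
    for name in names:
        cur.execute("INSERT INTO users (name) VALUES (?)", (name,))
        ids.append(cur.lastrowid)  # SERIAL value the server just assigned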

For batch inserts that need the IDs, the idiomatic pattern is to switch the column from SERIAL to a SEQUENCE and let the insert draw values inline (orders_seq being the sequence backing the table):

INSERT INTO orders (id, customer_id, total)
VALUES (orders_seq.NEXTVAL, ?, ?)

…or pre-generate IDs from the sequence in your app code, which is the variant that actually hands the values back to the client (sketched below). Or accept the limitation IfxPy shares: IfxPy.executemany doesn’t return per-row generated keys either.
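A hedged sketch of the pre-generation variant. orders_seq is a hypothetical sequence (CREATE SEQUENCE orders_seq), pending_orders a hypothetical input, and systables WHERE tabid = 1 the usual one-row stand-in for a dual table in Informix:

def next_order_id(cur):
    cur.execute("SELECT orders_seq.NEXTVAL FROM systables WHERE tabid = 1")
    return cur.fetchone()[0]

with conn:
    cur = conn.cursor()
    batch = [
        (next_order_id(cur), customer_id, total)  # id known client-side
        for customer_id, total in pending_orders
    ]
    cur.executemany(
        "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
        batch,
    )

Each NEXTVAL fetch is its own round-trip, so for very large batches reserve ranges instead, e.g. a sequence created with a larger INCREMENT that hands out one value per N rows.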