Bulk inserts (executemany)
executemany() is the right tool for bulk inserts and updates. With informix-driver’s pipelined implementation it’s 1.6× faster than IfxPy at 10k+ rows.
The basic shape
```python
rows = [(1, "alice"), (2, "bob"), (3, "carol"), ...]  # 10_000 tuples

with conn:  # opens a transaction
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        rows,
    )
# commits on normal exit
```

That inserts 10,000 rows in ~161 ms against a loopback Informix container.
The 53× gotcha
```python
# SLOW: 8.5 seconds for 10k rows
conn = informix_db.connect(..., autocommit=True)
cur = conn.cursor()
cur.executemany("INSERT ...", rows)
```

```python
# FAST: 161 ms for 10k rows
conn = informix_db.connect(..., autocommit=False)  # default
with conn:
    cur = conn.cursor()
    cur.executemany("INSERT ...", rows)
```

The default is autocommit=False, so this only catches you if you’ve explicitly opted into autocommit.
Why it’s faster than IfxPy
IfxPy’s executemany calls IfxPy.execute(stmt, tuple) internally, once per row. That’s one round trip per row; for 10,000 rows on an 80 µs RTT, that’s 800 ms spent just waiting for ACKs.
Phase 33 changed our executemany to pipeline the BIND+EXECUTE PDUs:
- Send all N BIND PDUs back-to-back without reading responses
- Send all N EXECUTE PDUs back-to-back
- Drain N response sets at the end
The kernel buffers the outbound bytes; the server processes the BIND/EXECUTE pipeline as fast as it can; we read all responses at the end. One RTT for the whole batch instead of N.
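The effect of pipelining on round-trip count can be sketched with a toy model (not the driver’s actual wire code; `FakeServer` and the PDU tuples are illustrative stand-ins):

```python
class FakeServer:
    """Counts network round trips; each exchange() call is one RTT,
    no matter how many PDUs ride in that buffer."""

    def __init__(self):
        self.round_trips = 0

    def exchange(self, pdus):
        self.round_trips += 1
        return ["OK"] * len(pdus)


def per_row(server, rows):
    # IfxPy-style: one BIND+EXECUTE exchange per row
    for row in rows:
        server.exchange([("BIND", row), ("EXECUTE",)])


def pipelined(server, rows):
    # informix-driver-style: buffer every BIND and EXECUTE PDU,
    # then drain all responses in a single exchange
    buf = []
    for row in rows:
        buf.append(("BIND", row))
        buf.append(("EXECUTE",))
    server.exchange(buf)


rows = [(i, f"user{i}") for i in range(10_000)]
s1, s2 = FakeServer(), FakeServer()
per_row(s1, rows)
pipelined(s2, rows)
print(s1.round_trips, s2.round_trips)  # → 10000 1
```

Same PDUs on the wire in both cases; only the interleaving of writes and reads changes, which is why the win scales with RTT rather than with server-side work.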
Chunking large batches
For very large batches (millions of rows), break into chunks to bound memory:
```python
def chunks(it, n):
    buf = []
    for x in it:
        buf.append(x)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf
```

```python
with conn:
    cur = conn.cursor()
    for batch in chunks(huge_iterator, 10_000):
        cur.executemany("INSERT INTO logs ...", batch)
# one transaction, many batched executemany calls; one commit at the end
```

10,000 rows per chunk is a reasonable default; the per-chunk Python memory cost is roughly N × bytes_per_row. For 10k tuples of 5 small fields, that’s a few MB.
Returning generated keys
Informix’s SERIAL columns are server-assigned. To get the IDs back, use a single-row insert per row and read cur.lastrowid; executemany doesn’t return per-row IDs.
For batch inserts that need the IDs, the idiomatic pattern is:
```sql
INSERT INTO orders (id, customer_id, total)
SELECT MAX(id) + ROWNUM, ?, ?
FROM orders
```

…or pre-generate IDs from a sequence in your app code. Or accept the same limitation IfxPy has: IfxPy.executemany doesn’t return per-row generated keys either.
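The single-row lastrowid pattern looks like this. A minimal sketch using sqlite3 as a stand-in driver, since the cursor calls are plain DB-API; with informix_db the shape is the same, but `id` would be a SERIAL column and you would connect with informix_db.connect:

```python
import sqlite3

# stand-in for an Informix connection; the INSERT/lastrowid loop is
# identical in shape for any DB-API driver with qmark paramstyle
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

rows = [(7, 19.99), (8, 5.00), (9, 42.50)]  # (customer_id, total) pairs
ids = []
with conn:  # one transaction around all the single-row inserts
    cur = conn.cursor()
    for customer_id, total in rows:
        cur.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
        ids.append(cur.lastrowid)  # server-assigned key for this row

print(ids)  # → [1, 2, 3]
```

You trade the pipelining win for the IDs: this is one round trip per row again, so reserve it for batches where you genuinely need every generated key back.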