Arithmetic 2258 declarations in 197 modules

FormalRV.Arithmetic.Adder

FormalRV/Arithmetic/Adder.lean

FormalRV.Arithmetic.Adder ───────────────────────── An ADDER INTERFACE: a layout-parametric, reversible, in-place binary adder over the `Gate` IR. The point is to let higher gadgets (windowed multiplication, …) compose ANY adder without knowing its internal qubit layout — the interface exposes index functions saying WHERE each operand lives relative to a base offset, plus a decode-level correctness contract. Encoding is unified by the index functions `augendIdx`/`addendIdx`, not by a global re-layout: a consumer places its data at `addendIdx`, runs `circuit`, and reads the result at `augendIdx`. Both the Gidney patched ripple adder and the Cuccaro adder instantiate this (see `Arithmetic/Adder/*`). The `ancClean` precondition (the adder's internal carry block is 0) is what lets both adders qualify; it is preserved by `ancRestored`, so it is maintained inductively when an adder is run many times (e.g. once per window).

defdecodeReg

def decodeReg (idx : Nat → Nat) (n : Nat) (f : Nat → Bool) : Nat

Decode the `n`-bit register sitting at positions `idx 0 … idx (n-1)`, LSB-first: bit `i` (at qubit `idx i`) carries weight `2^i`.

definBlock

def inBlock (q span p : Nat) : Prop

`p` lies in the adder block `[q, q + span)`.

structureAdder

structure Adder

*The adder interface.** At base offset `q` and width `n`, `circuit n q` runs on the qubit block `[q, q + span n)` and, given the ancilla block clean, computes `augend ← (augend + addend) mod 2^n` in place, restoring the addend register and the ancillas and leaving everything outside the block untouched.

theoremdecodeReg_ext

theorem decodeReg_ext (idx : Nat → Nat) (n : Nat) (f g : Nat → Bool)
    (h : ∀ i, i < n → f (idx i) = g (idx i)) :
    decodeReg idx n f = decodeReg idx n g

`decodeReg` depends only on the values of `f` at the index positions.

FormalRV.Arithmetic.Adder.ContiguousTransport

FormalRV/Arithmetic/Adder/ContiguousTransport.lean

FormalRV.Arithmetic.Adder.ContiguousTransport ────────────────────────────────────────────── TRANSPORT INVESTIGATION (per Shor/GidneyInPlace/GIDNEY_INPLACE_DESIGN.md §5). QUESTION: can the verified INTERLEAVED Cuccaro adder correctness be transported by a fixed qubit PERMUTATION into a CONTIGUOUS-accumulator layout - augend / accumulator : q + i (contiguous) - addend / control : separate contiguous block reusing the verified ripple-carry arithmetic rather than re-proving it? ANSWER: YES. `Gate.applyNat` is built from point `update`s, so a generic index-RELABEL `relabelGate σ` (generalizing `GateShift.shiftBy`, which is the special case `σ = (· + s)`) transports the Boolean semantics for any INJECTIVE σ: applyNat (relabelGate σ g) f (σ p) = applyNat g (f ∘ σ) p. De-interleaving is exactly such an injective relabel. We §1 define the generic relabel + prove the transport; §2 define the de-interleaving map, prove its value equations, that it is injective, and that the augend / addend image blocks do not overlap; §3 DERIVE a transported `sumCorrect` for the contiguous layout directly from `cuccaroAdder.sumCorrect` — no new ripple-carry proof. SCOPE: this is the upstream dependency named in the design doc. It does NOT build the two-register product-add wrapper and does NOT touch the Shor scaffold.

defrelabelGate

def relabelGate (σ : Nat → Nat) : Gate → Gate
  | Gate.I          => Gate.I
  | Gate.X q        => Gate.X (σ q)
  | Gate.CX c t     => Gate.CX (σ c) (σ t)
  | Gate.CCX a b c  => Gate.CCX (σ a) (σ b) (σ c)
  | Gate.seq g₁ g₂  => Gate.seq (relabelGate σ g₁) (relabelGate σ g₂)

Relabel every qubit index of a gate through `σ`. `GateShift.shiftBy s = relabelGate (· + s)`.

theoremapplyNat_relabelGate

theorem applyNat_relabelGate (σ : Nat → Nat) (hσ : Function.Injective σ) (g : Gate) :
    ∀ (f : Nat → Bool) (p : Nat),
      Gate.applyNat (relabelGate σ g) f (σ p)
        = Gate.applyNat g (fun q => f (σ q)) p

*The relabel transport.** For an INJECTIVE `σ`, the relabeled gate acts on the image positions exactly as the original gate acts on the state pulled back along `σ`. Proved by the same structural induction as `GateShift.applyNat_shiftBy`.

theoremdecodeReg_comp_index

theorem decodeReg_comp_index (σ idx : Nat → Nat) (n : Nat) (g : Nat → Bool) :
    decodeReg (fun i => σ (idx i)) n g = decodeReg idx n (fun p => g (σ p))

`decodeReg` commutes with pulling the index function through `σ`: reading at `σ ∘ idx` over a state `g` is reading at `idx` over `g ∘ σ`.

theoremdecodeReg_congr

theorem decodeReg_congr (idx₁ idx₂ : Nat → Nat) (n : Nat) (H₁ H₂ : Nat → Bool)
    (h : ∀ i, i < n → H₁ (idx₁ i) = H₂ (idx₂ i)) :
    decodeReg idx₁ n H₁ = decodeReg idx₂ n H₂

`decodeReg` depends only on the read-out VALUES: two (index, state) pairs that yield the same bit at every position `i < n` decode to the same number. This is the join of `decodeReg_ext` (state) and an index change.

defdeinterleave

def deinterleave (n q : Nat) : Nat → Nat

De-interleaving map `σ` for width `n`, base `q`, on the block `[q, q + 2n + 1)`: carry-in `q` ↦ `q + 2n` (top of the block); augend bit `i` (`q + 2i + 1`) ↦ `q + i` (contiguous accumulator); addend bit `i` (`q + 2i + 2`) ↦ `q + n + i` (separate contiguous block). Identity outside the block.

defreinterleave

def reinterleave (n q : Nat) : Nat → Nat

Inverse map `ρ` (contiguous → interleaved), division-free; serves as a left inverse of `deinterleave` to witness injectivity.

theoremdeinterleave_augend

theorem deinterleave_augend (n q i : Nat) (hi : i < n) :
    deinterleave n q (q + 2 * i + 1) = q + i

theoremdeinterleave_addend

theorem deinterleave_addend (n q i : Nat) (hi : i < n) :
    deinterleave n q (q + 2 * i + 2) = q + n + i

theoremreinterleave_leftInverse

theorem reinterleave_leftInverse (n q : Nat) :
    Function.LeftInverse (reinterleave n q) (deinterleave n q)

`reinterleave` is a left inverse of `deinterleave`, hence the latter is injective (a genuine permutation of `Nat`).

theoremdeinterleave_injective

theorem deinterleave_injective (n q : Nat) : Function.Injective (deinterleave n q)

theoremdeinterleave_blocks_disjoint

theorem deinterleave_blocks_disjoint (n q i j : Nat) (hi : i < n) (hj : j < n) :
    deinterleave n q (q + 2 * i + 1) ≠ deinterleave n q (q + 2 * j + 2)
    ∧ deinterleave n q (q + 2 * i + 1) ≠ deinterleave n q q
    ∧ deinterleave n q (q + 2 * j + 2) ≠ deinterleave n q q

*Non-overlap of the image blocks.** The augend image `[q, q+n)`, the addend image `[q+n, q+2n)`, and the carry image `{q+2n}` are pairwise disjoint, so a bit string written across the contiguous layout decodes faithfully.

defcontiguousAdderCircuit

def contiguousAdderCircuit (n q : Nat) : Gate

The contiguous-layout circuit: the verified Cuccaro adder, relabeled.

theoremcontiguous_sumCorrect

theorem contiguous_sumCorrect (n q : Nat) (f : Nat → Bool)
    (hclean : f (q + 2 * n) = false) :
    decodeReg (fun i => q + i) n (Gate.applyNat (contiguousAdderCircuit n q) f)
      = (decodeReg (fun i => q + i) n f + decodeReg (fun i => q + n + i) n f) % 2 ^ n

*HEADLINE (deliverable 4): transported `sumCorrect`.** In the CONTIGUOUS layout — accumulator bit `i` at `q + i`, addend bit `i` at `q + n + i`, carry-in (clean) at `q + 2n` — the relabeled Cuccaro circuit computes `accumulator ← (accumulator + addend) mod 2^n` in place. Derived entirely from `cuccaroAdder.sumCorrect`; NO ripple-carry re-proof.

theoremdeinterleave_outside

theorem deinterleave_outside (n q p : Nat) (h : p < q ∨ q + 2 * n + 1 ≤ p) :
    deinterleave n q p = p

Outside the block `σ` is the identity.

theoremdeinterleave_maps_lt

theorem deinterleave_maps_lt (n q x : Nat) (hx : x < q + 2 * n + 1) :
    deinterleave n q x < q + 2 * n + 1

`σ` maps the block `[0, q+2n+1)` into itself (used for well-typedness).

theoremwellTyped_relabelGate

theorem wellTyped_relabelGate (σ : Nat → Nat) (hσ : Function.Injective σ) (dim : Nat)
    (hmap : ∀ x, x < dim → σ x < dim) :
    ∀ g, Gate.WellTyped dim g → Gate.WellTyped dim (relabelGate σ g)
  | Gate.I,        hg => hg
  | Gate.X q,      hg => hmap q hg
  | Gate.CX c t,   hg => ⟨hmap c hg.1, hmap t hg.2.1, fun h => hg.2.2 (hσ h)⟩
  | Gate.CCX a b c, hg =>
      ⟨hmap a hg.1, hmap b hg.2.1, hmap c hg.2.2.1,
        fun h => hg.2.2.2.1 (hσ h), fun h => hg.2.2.2.2.1 (hσ h),
        fun h => hg.2.2.2.2.2 (hσ h)⟩
  | Gate.seq g₁ g₂, hg =>
      ⟨wellTyped_relabelGate σ hσ dim hmap g₁ hg.1,

*Relabel preserves well-typedness** for an injective `σ` that maps `[0,dim)` into `[0,dim)` (the `≠` side-conditions survive because `σ` is injective).

theoremcontiguous_addendRestored

theorem contiguous_addendRestored (n q : Nat) (f : Nat → Bool) (i : Nat) (hi : i < n) :
    Gate.applyNat (contiguousAdderCircuit n q) f (q + n + i) = f (q + n + i)

Transported `addendRestored`: the addend register (contiguous block at `q+n+i`) is returned bit-for-bit.

theoremcontiguous_ancRestored

theorem contiguous_ancRestored (n q : Nat) (f : Nat → Bool) (hclean : f (q + 2 * n) = false) :
    Gate.applyNat (contiguousAdderCircuit n q) f (q + 2 * n) = false

Transported `ancRestored`: the carry-in (at `q+2n`) is returned clean.

theoremcontiguous_frame

theorem contiguous_frame (n q : Nat) (f : Nat → Bool) (p : Nat)
    (hp : ¬ inBlock q (2 * n + 1) p) :
    Gate.applyNat (contiguousAdderCircuit n q) f p = f p

Transported `frame`: anything outside the block `[q, q+2n+1)` is untouched.

theoremcontiguous_wellTyped

theorem contiguous_wellTyped (n q : Nat) :
    Gate.WellTyped (q + (2 * n + 1)) (contiguousAdderCircuit n q)

Transported `wellTyped`: the contiguous circuit is well-typed at `q + 2n + 1`.

theoremcontiguous_augendIdx_inBlock

theorem contiguous_augendIdx_inBlock (n q i : Nat) (hi : i < n) :
    inBlock q (2 * n + 1) (q + i)

theoremcontiguous_addendIdx_inBlock

theorem contiguous_addendIdx_inBlock (n q i : Nat) (hi : i < n) :
    inBlock q (2 * n + 1) (q + n + i)

theoremcontiguous_addendIdx_inj

theorem contiguous_addendIdx_inj (n q i j : Nat) (h : q + n + i = q + n + j) : i = j

theoremcontiguous_augend_addend_disjoint

theorem contiguous_augend_addend_disjoint (n q i j : Nat) (hi : i < n) :
    q + i ≠ q + n + j

Augend (`q+i`) and addend (`q+n+i`) are disjoint **for every augend index in use** (`i < n`, any `j`): `q+i < q+n ≤ q+n+j`. This is all any consumer needs.

theoremcontiguous_augend_addend_NOT_globally_disjoint

theorem contiguous_augend_addend_NOT_globally_disjoint (n q : Nat) :
    ¬ (∀ i j, q + i ≠ q + n + j)

*The one obligation that does NOT transport unchanged.** `Adder`'s field `augend_addend_disjoint : ∀ q i j, augendIdx q i ≠ addendIdx q j` is quantified over ALL `i j`. Cuccaro/Gidney satisfy it unboundedly via their stride-≥2 parity (`q+2i+1 ≠ q+2j+2`). A unit-stride contiguous augend cannot: at `i = n + j` we get `q + i = q + n + j`. Witness that the unbounded statement is genuinely FALSE for this layout (so a literal drop-in `Adder` instance is impossible without bounding the quantifier — which every call site already respects, since they invoke the field only at `i, j < bits`).

defcontiguousPackedAdder

def contiguousPackedAdder : TwoBaseBoundedAdder

*The contiguous-accumulator Cuccaro adder, packed, as a `TwoBaseBoundedAdder`.** Accumulator bit `i` at `accBase + i`, addend bit `i` at `addBase + i` with `addBase = accBase + n`, clean carry-in at `accBase + 2n`.

example(example)

example (n q : Nat) (f : Nat → Bool) (hclean : f (q + 2 * n) = false) :
    decodeReg (contiguousPackedAdder.accIdx n q) n
        (Gate.applyNat (contiguousPackedAdder.circuit n q (q + n)) f)
      = (decodeReg (contiguousPackedAdder.accIdx n q) n f
          + decodeReg (contiguousPackedAdder.addIdx n (q + n)) n f) % 2 ^ n

example(example)

example (n q : Nat) (f : Nat → Bool) (hclean : cuccaroAdder.ancClean f n q) :
    decodeReg (cuccaroAdder.toTwoBaseBounded.accIdx n q) n
        (Gate.applyNat (cuccaroAdder.toTwoBaseBounded.circuit n q q) f)
      = (decodeReg (cuccaroAdder.toTwoBaseBounded.accIdx n q) n f
          + decodeReg (cuccaroAdder.toTwoBaseBounded.addIdx n q) n f) % 2 ^ n

FormalRV.Arithmetic.Adder.Cuccaro

FormalRV/Arithmetic/Adder/Cuccaro.lean

FormalRV.Arithmetic.Adder.Cuccaro ───────────────────────────────── The exact-budget full Cuccaro ripple adder, packaged as an instance of the layout-parametric `Adder` interface (`FormalRV/Arithmetic/Adder.lean`). Layout (width `n`, base offset `q`, span `2n+1`): - `q + 0` : carry-in qubit (`ancClean := f q = false`). - `q + 2i + 1` : augend / running-sum bit `i` (modified in place). - `q + 2i + 2` : addend bit `i` (restored). All five interface obligations are discharged from the symbolic and decoded Cuccaro correctness theorems already proved in `Cuccaro/CuccaroFull.lean` and `Cuccaro/CuccaroDecoded.lean`.

theoremdecodeReg_augend_eq_target

theorem decodeReg_augend_eq_target (n q : Nat) (f : Nat → Bool) :
    decodeReg (fun i => q + 2 * i + 1) n f = cuccaro_target_val n q f

`decodeReg (fun i => q + 2i + 1)` agrees with the `cuccaro_target_val` recursive decoder. Both read LSB-first the qubits `q + 2i + 1`.

theoremdecodeReg_addend_eq_read

theorem decodeReg_addend_eq_read (n q : Nat) (f : Nat → Bool) :
    decodeReg (fun i => q + 2 * i + 2) n f = cuccaro_read_val n q f

`decodeReg (fun i => q + 2i + 2)` agrees with the `cuccaro_read_val` recursive decoder. Both read LSB-first the qubits `q + 2i + 2`.

theoremcuccaro_target_val_testBit

theorem cuccaro_target_val_testBit
    (bits q : Nat) (f : Nat → Bool) (i : Nat) (hi : i < bits) :
    (cuccaro_target_val bits q f).testBit i = f (q + 2 * i + 1)

Each target bit `i < bits` of `cuccaro_target_val bits q f` reads back the state bit at `q + 2i + 1`. (Self-contained converse to `cuccaro_target_val_eq_sum_when_bits_match`, by uniqueness of binary digits.)

theoremcuccaro_read_val_testBit

theorem cuccaro_read_val_testBit
    (bits q : Nat) (f : Nat → Bool) (i : Nat) (hi : i < bits) :
    (cuccaro_read_val bits q f).testBit i = f (q + 2 * i + 2)

Each read bit `i < bits` of `cuccaro_read_val bits q f` reads back the state bit at `q + 2i + 2`.

theoremAdder.carry_ext_below

theorem Adder.carry_ext_below
    (b₀ : Bool) (k : Nat) (f g f' g' : Nat → Bool)
    (hf : ∀ j, j < k → f j = f' j) (hg : ∀ j, j < k → g j = g' j) :
    Adder.carry b₀ k f g = Adder.carry b₀ k f' g'

`Adder.carry b₀ k f g` consults `f`/`g` only at indices `< k`, so it agrees under any pair of streams that match below `k`.

defcuccaroAdder

def cuccaroAdder : Adder

*The exact-budget Cuccaro ripple adder, as an `Adder`.** The augend (running-sum) register lives at `q + 2i + 1`, the addend register at `q + 2i + 2`, with the carry-in ancilla at `q` required clean (`= false`).

FormalRV.Arithmetic.Adder.Gidney

FormalRV/Arithmetic/Adder/Gidney.lean

FormalRV.Arithmetic.Adder.Gidney ──────────────────────────────── The Gidney patched ripple-carry adder, packaged as an instance of the layout-parametric `Adder` interface (`FormalRV/Arithmetic/Adder.lean`). Layout (width `n`, base offset `q`, span `3n+2`): - `q + 3i` : addend / read bit `i` (`a`, restored). - `q + 3i + 1` : augend / target / running-sum bit `i` (becomes `(a+b) mod 2^n`). - `q + 3i + 2` : carry bit `i` (`ancClean := f (q+3i+2) = false`; restored clean). THE CRUX: the underlying circuit `gidney_adder_full_faithful_no_measurement_patched` is hard-wired at base 0 (`read_idx i = 3i`, `target_idx i = 3i+1`, `carry_idx i = 3i+2`), unlike the base-parametric Cuccaro adder. We therefore introduce a generic qubit-relabelling `Gate.shiftBy k` (add `k` to every qubit index) and prove its `applyNat` / `WellTyped` transfer lemmas, then place the base-0 adder at base `q` as `Gate.shiftBy q (...)`. Small widths: for `n ≤ 1` the base adder degenerates to `Gate.I`, which is *not* a correct 1-bit adder (a 1-bit add needs `target ← target ⊕ read`). Since the interface lets us choose `circuit n q` per width, we use a bespoke correct circuit for `n = 0` (identity) and `n = 1` (`CX (q) (q+1)`), and the shifted Gidney adder for `n ≥ 2`.

defGate.shiftBy

def Gate.shiftBy (k : Nat) : Gate → Gate
  | Gate.I         => Gate.I
  | Gate.X q       => Gate.X (q + k)
  | Gate.CX c t    => Gate.CX (c + k) (t + k)
  | Gate.CCX a b c => Gate.CCX (a + k) (b + k) (c + k)
  | Gate.seq g₁ g₂ => Gate.seq (Gate.shiftBy k g₁) (Gate.shiftBy k g₂)

Add `k` to every qubit index of a `Gate`.

theoremGate.shiftBy_applyNat_below

theorem Gate.shiftBy_applyNat_below (k : Nat) (g : Gate) (f : Nat → Bool)
    (p : Nat) (hp : p < k) :
    Gate.applyNat (Gate.shiftBy k g) f p = f p

*Shift transfer (below the base).** The shifted circuit touches no qubit `< k`.

theoremGate.shiftBy_applyNat

theorem Gate.shiftBy_applyNat (k : Nat) (g : Gate) (f : Nat → Bool) :
    ∀ p, Gate.applyNat (Gate.shiftBy k g) f (p + k)
      = Gate.applyNat g (fun j => f (j + k)) p

*Shift transfer (conjugation).** Running the shifted circuit on `f` and reading at `p + k` equals running the base circuit on the down-shifted stream `fun j => f (j + k)` and reading at `p`.

theoremGate.wellTyped_le

theorem Gate.wellTyped_le {dim dim' : Nat} {g : Gate}
    (h : Gate.WellTyped dim g) (hle : dim ≤ dim') : Gate.WellTyped dim' g

*WellTyped monotonicity** (local helper): enlarging the dimension preserves well-typedness.

theoremGate.shiftBy_wellTyped

theorem Gate.shiftBy_wellTyped (k dim : Nat) (g : Gate)
    (h : Gate.WellTyped dim g) : Gate.WellTyped (k + dim) (Gate.shiftBy k g)

*Shift transfer (WellTyped).** If `g` is WellTyped at `dim` then the shifted circuit is WellTyped at `k + dim`.

theoremGate.applyNat_congr

theorem Gate.applyNat_congr {dim : Nat} {g : Gate}
    (h_wt : Gate.WellTyped dim g) (f f' : Nat → Bool)
    (hagree : ∀ i, i < dim → f i = f' i) :
    ∀ p, p < dim → Gate.applyNat g f p = Gate.applyNat g f' p

For a `Gate` WellTyped at `dim`, the output at positions `< dim` depends only on the input restricted to `[0, dim)`: if `f` and `f'` agree on `[0, dim)` then the outputs agree on `[0, dim)`.

theoremgidney_decodeReg_augend_eq_target

theorem gidney_decodeReg_augend_eq_target (n q : Nat) (f : Nat → Bool) :
    decodeReg (fun i => q + 3 * i + 1) n f
      = gidney_target_val n (fun j => f (q + j))

`decodeReg (fun i => q + 3i + 1)` agrees with the base-0 `gidney_target_val` decoder applied to the down-shifted stream `fun j => f (q + j)`. (Named `gidney_…` to avoid clashing with the Cuccaro bridge of the same shape in `Adder/Cuccaro.lean` when both adder instances are imported together.)

theoremgidney_decodeReg_addend_eq_read

theorem gidney_decodeReg_addend_eq_read (n q : Nat) (f : Nat → Bool) :
    decodeReg (fun i => q + 3 * i) n f
      = gidney_read_val n (fun j => f (q + j))

`decodeReg (fun i => q + 3i)` agrees with the base-0 `gidney_read_val` decoder applied to the down-shifted stream `fun j => f (q + j)`. (Named `gidney_…` to avoid clashing with the Cuccaro bridge of the same shape in `Adder/Cuccaro.lean` when both adder instances are imported together.)

theoremgidney_target_val_testBit

theorem gidney_target_val_testBit (n : Nat) (h : Nat → Bool) (i : Nat) (hi : i < n) :
    (gidney_target_val n h).testBit i = h (target_idx i)

Each target bit `i < n` of `gidney_target_val n h` reads back the state bit at `target_idx i = 3i + 1`.

theoremgidney_read_val_testBit

theorem gidney_read_val_testBit (n : Nat) (h : Nat → Bool) (i : Nat) (hi : i < n) :
    (gidney_read_val n h).testBit i = h (read_idx i)

Each read bit `i < n` of `gidney_read_val n h` reads back the state bit at `read_idx i = 3i`.

theoremgidney_first_wt'

theorem gidney_first_wt' (bits : Nat) (hbits : 2 ≤ bits) :
    Gate.WellTyped (3 * bits) gidney_adder_bit_step_faithful_first

theoremgidney_interior_wt'

theorem gidney_interior_wt' (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits - 1) :
    Gate.WellTyped (3 * bits) (gidney_adder_bit_step_faithful_interior i)

theoremgidney_last_wt'

theorem gidney_last_wt' (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits) :
    Gate.WellTyped (3 * bits) (gidney_adder_bit_step_faithful_last i)

theoremgidney_first_rev_wt'

theorem gidney_first_rev_wt' (bits : Nat) (hbits : 2 ≤ bits) :
    Gate.WellTyped (3 * bits) gidney_adder_bit_step_faithful_first_reverse_patched

theoremgidney_interior_rev_wt'

theorem gidney_interior_rev_wt' (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits - 1) :
    Gate.WellTyped (3 * bits)
      (gidney_adder_bit_step_faithful_interior_reverse_patched i)

theoremgidney_last_rev_wt'

theorem gidney_last_rev_wt' (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits) :
    Gate.WellTyped (3 * bits)
      (gidney_adder_bit_step_faithful_last_reverse_patched i)

theoremgidney_fwd_prop_wt'

theorem gidney_fwd_prop_wt' (bits : Nat) (hb2 : 2 ≤ bits) :
    ∀ k, k ≤ bits - 1 →
      Gate.WellTyped (3 * bits) (gidney_adder_forward_with_propagation k)

theoremgidney_fwd_full_wt'

theorem gidney_fwd_full_wt' (bits : Nat) (hb2 : 2 ≤ bits) :
    Gate.WellTyped (3 * bits) (gidney_adder_forward_faithful_full bits)

theoremgidney_final_cx_wt'

theorem gidney_final_cx_wt' (bits : Nat) (hb1 : 1 ≤ bits) :
    ∀ k, k ≤ bits →
      Gate.WellTyped (3 * bits) (gidney_final_cx_cascade k)

theoremgidney_fwd_prop_rev_wt'

theorem gidney_fwd_prop_rev_wt' (bits : Nat) (hb2 : 2 ≤ bits) :
    ∀ k, k ≤ bits - 1 →
      Gate.WellTyped (3 * bits)
        (gidney_adder_forward_with_propagation_reverse_patched k)

theoremgidney_fwd_full_rev_wt'

theorem gidney_fwd_full_rev_wt' (bits : Nat) (hb2 : 2 ≤ bits) :
    Gate.WellTyped (3 * bits)
      (gidney_adder_forward_faithful_full_reverse_patched bits)

theoremgidney_patched_wt_tight

theorem gidney_patched_wt_tight (bits : Nat) (hb2 : 2 ≤ bits) :
    Gate.WellTyped (3 * bits)
      (gidney_adder_full_faithful_no_measurement_patched bits)

*Tight WellTyped**: the patched Gidney adder is WellTyped at `3*bits`.

theoremgidney_input_agree

theorem gidney_input_agree (bits : Nat) (h : Nat → Bool)
    (hcl : ∀ i, i < bits → h (carry_idx i) = false) :
    ∀ p, p < 3 * bits →
      h p = adder_input_F bits (gidney_read_val bits h)
              (gidney_target_val bits h) p

On the qubit block `[0, 3*bits)`, any clean-carry stream `h` agrees with the canonical input `adder_input_F bits a b` for `a = gidney_read_val bits h`, `b = gidney_target_val bits h`.

theoremgidney_target_arbitrary

theorem gidney_target_arbitrary (bits : Nat) (hb2 : 2 ≤ bits) (h : Nat → Bool)
    (hcl : ∀ i, i < bits → h (carry_idx i) = false) :
    gidney_target_val bits
        (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits) h)
      = (gidney_read_val bits h + gidney_target_val bits h) % 2 ^ bits

*Arbitrary-input target correctness** for the base-0 patched adder. For a clean-carry stream `h`, the target register decodes to `(read + target) mod 2^bits`.

theoremgidney_read_arbitrary

theorem gidney_read_arbitrary (bits : Nat) (hb2 : 2 ≤ bits) (h : Nat → Bool)
    (hcl : ∀ i, i < bits → h (carry_idx i) = false) (i : Nat) (hi : i < bits) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits) h
        (read_idx i) = h (read_idx i)

*Arbitrary-input read preservation** for the base-0 patched adder.

theoremgidney_carry_arbitrary

theorem gidney_carry_arbitrary (bits : Nat) (hb2 : 2 ≤ bits) (h : Nat → Bool)
    (hcl : ∀ i, i < bits → h (carry_idx i) = false) (i : Nat) (hi : i < bits) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits) h
        (carry_idx i) = false

*Arbitrary-input carry clearance** for the base-0 patched adder.

defgidneyCircuit

def gidneyCircuit (n q : Nat) : Gate

The width-`n` Gidney circuit at base offset `q`: identity for `n = 0`, a bare `CX` for the degenerate 1-bit add, and the shifted patched adder for `n ≥ 2`.

theoremdownshift_carry_clean

theorem downshift_carry_clean (n q : Nat) (f : Nat → Bool)
    (hcl : ∀ i, i < n → f (q + 3 * i + 2) = false) :
    ∀ i, i < n → (fun j => f (q + j)) (carry_idx i) = false

The down-shifted stream of a clean-carry block input is clean at every base-0 carry index `< n`.

theoremshiftBy_applyNat_base

theorem shiftBy_applyNat_base (q : Nat) (g : Gate) (f : Nat → Bool) (j : Nat) :
    Gate.applyNat (Gate.shiftBy q g) f (q + j)
      = Gate.applyNat g (fun i => f (q + i)) j

*Base-`q` shift transfer** in the `fun j => f (q + j)` convention used by the decoders: reading the shifted circuit's output at `q + j` equals the base-0 circuit run on the down-shifted stream, read at `j`.

defgidneyAdder

def gidneyAdder : Adder

*The Gidney patched ripple-carry adder, as an `Adder`.** The augend (running-sum / target) register lives at `q + 3i + 1`, the addend (read) register at `q + 3i`, with the carry block at `q + 3i + 2` required clean (`= false`). Span `3n + 2`.

FormalRV.Arithmetic.Adder.RelocatedTransport

FormalRV/Arithmetic/Adder/RelocatedTransport.lean

FormalRV.Arithmetic.Adder.RelocatedTransport ───────────────────────────────────────────── The RELOCATED contiguous two-base adder: a `TwoBaseBoundedAdder` whose accumulator block `[accBase, accBase+n)` and addend block `[addBase, addBase+n)` are at INDEPENDENT bases, with carry at `addBase + n` (John's convention), built by a GAP-PARAMETERIZED de-interleave relabel of the verified Cuccaro adder. This GENERALIZES `contiguousPackedAdder` (the gap-0 / `addBase = accBase+n` case) to any `addBase ≥ accBase + n`. Motivation (see `Shor/GidneyInPlace/Adder/Def/ProductAddLayout`): the faithful Gidney two-register product-add needs, in pass 2 (`a -= b·kInv`, accumulator `a`), the addend block at `accBase + 2·bits` (register `b` sits in between), which the packed adder cannot host. The relabel `relocate accBase addBase n`: carry-in `accBase` ↦ `addBase + n` (the global carry) augend bit i (`accBase+2i+1`) ↦ `accBase + i` (contiguous accumulator) addend bit i (`accBase+2i+2`) ↦ `addBase + i` (relocated addend) fill `[accBase+2n+1, addBase+n]` ↦ gap `[accBase+n, addBase)` (bijective filler) It is a genuine permutation of `[accBase, addBase+n]` (identity outside), proven injective by a division-free left inverse — the same method as `deinterleave`. SCOPE: this is layout/adder transport only. It does NOT build the product-add wrapper and proves NO product-add arithmetic.

defrelocate

def relocate (accBase addBase n : Nat) : Nat → Nat

Relocating de-interleave: Cuccaro(base `accBase`) → accumulator at `accBase+i`, addend at `addBase+i`, carry at `addBase+n`, gap positions filled bijectively. Valid as a permutation when `accBase + n ≤ addBase`.

defrelocateInv

def relocateInv (accBase addBase n : Nat) : Nat → Nat

Division-free left inverse of `relocate`.

theoremrelocate_augend

theorem relocate_augend (accBase addBase n i : Nat) (hi : i < n) :
    relocate accBase addBase n (accBase + 2 * i + 1) = accBase + i

theoremrelocate_addend

theorem relocate_addend (accBase addBase n i : Nat) (hi : i < n) :
    relocate accBase addBase n (accBase + 2 * i + 2) = addBase + i

theoremrelocate_outside

theorem relocate_outside (accBase addBase n p : Nat) (hv : accBase + n ≤ addBase)
    (h : p < accBase ∨ addBase + n < p) :
    relocate accBase addBase n p = p

Outside the bounding interval `[accBase, addBase+n]`, `relocate` is the identity (needs validity so that `p > addBase+n` cannot fall in the de-interleave range).

theoremrelocateInv_leftInverse

theorem relocateInv_leftInverse (accBase addBase n : Nat) (hv : accBase + n ≤ addBase) :
    Function.LeftInverse (relocateInv accBase addBase n) (relocate accBase addBase n)

`relocateInv` is a left inverse of `relocate` (so `relocate` is injective), under the validity precondition `accBase + n ≤ addBase`.

theoremrelocate_injective

theorem relocate_injective (accBase addBase n : Nat) (hv : accBase + n ≤ addBase) :
    Function.Injective (relocate accBase addBase n)

theoremrelocate_maps_lt

theorem relocate_maps_lt (accBase addBase n x : Nat)
    (hx : x < addBase + n + 1) : relocate accBase addBase n x < addBase + n + 1

`relocate` maps the block `[0, addBase+n+1)` into itself (for well-typedness).

defrelocatedAdderCircuit

def relocatedAdderCircuit (accBase addBase n : Nat) : Gate

The relocated contiguous adder circuit: the verified Cuccaro adder at base `accBase`, relabeled by `relocate`.

theoremrelocated_sumCorrect

theorem relocated_sumCorrect (n accBase addBase : Nat) (f : Nat → Bool)
    (hv : accBase + n ≤ addBase) (hclean : f (addBase + n) = false) :
    decodeReg (fun i => accBase + i) n (Gate.applyNat (relocatedAdderCircuit accBase addBase n) f)
      = (decodeReg (fun i => accBase + i) n f
          + decodeReg (fun i => addBase + i) n f) % 2 ^ n

Transported `sumCorrect`: accumulator `[accBase, accBase+n)`, addend `[addBase, addBase+n)`, clean carry-in at `addBase+n`.

theoremrelocated_addendRestored

theorem relocated_addendRestored (n accBase addBase : Nat) (f : Nat → Bool)
    (hv : accBase + n ≤ addBase) (i : Nat) (hi : i < n) :
    Gate.applyNat (relocatedAdderCircuit accBase addBase n) f (addBase + i) = f (addBase + i)

Transported `addendRestored`.

theoremrelocated_ancRestored

theorem relocated_ancRestored (n accBase addBase : Nat) (f : Nat → Bool)
    (hv : accBase + n ≤ addBase) (hclean : f (addBase + n) = false) :
    Gate.applyNat (relocatedAdderCircuit accBase addBase n) f (addBase + n) = false

Transported `ancRestored`: the carry-in at `addBase+n` is returned clean.

theoremrelocated_frame

theorem relocated_frame (n accBase addBase : Nat) (f : Nat → Bool) (p : Nat)
    (hv : accBase + n ≤ addBase)
    (hp : ¬ inBlock accBase (addBase + n + 1 - accBase) p) :
    Gate.applyNat (relocatedAdderCircuit accBase addBase n) f p = f p

Transported `frame` (bounding interval `[accBase, addBase+n+1)`).

theoremrelocated_gap_frame

theorem relocated_gap_frame (n accBase addBase : Nat) (f : Nat → Bool) (p : Nat)
    (hv : accBase + n ≤ addBase) (h1 : accBase + n ≤ p) (h2 : p < addBase) :
    Gate.applyNat (relocatedAdderCircuit accBase addBase n) f p = f p

*Gap-frame (the load-bearing preservation fact).** The gap `[accBase+n, addBase)` — which for the faithful pass-2 layout IS register `b`, used as the multiplicand — is left UNTOUCHED by the relocated adder, even though it lies INSIDE the coarse bounding frame. Proof: a gap position `p` is `relocate (p+n+1)`, and `p+n+1` lies in the fill domain ABOVE the Cuccaro support, so Cuccaro's frame-above applies.

theoremrelocated_wellTyped

theorem relocated_wellTyped (n accBase addBase dim : Nat) (hv : accBase + n ≤ addBase)
    (hdim : addBase + n + 1 ≤ dim) :
    Gate.WellTyped dim (relocatedAdderCircuit accBase addBase n)

Transported `wellTyped` at any dimension `dim ≥ addBase + n + 1`.

defrelocatedContiguousAdder

def relocatedContiguousAdder : TwoBaseBoundedAdder

*The relocated contiguous adder** (generalizes `contiguousPackedAdder` to an independent addend base; carry at `addBase + n`; valid when `accBase + n ≤ addBase`).

theoremrelocated_pass1_valid

theorem relocated_pass1_valid (w bits : Nat) :
    relocatedContiguousAdder.valid bits (1 + 2 * w + bits) (1 + 2 * w + 2 * bits)

Pass 1 (`b += a·k`): accumulator `b` at `1+2w+bits`, addend at the shared temp `1+2w+2bits` — the gap-0 (packed) case. `valid` holds.

theoremrelocated_pass2_valid

theorem relocated_pass2_valid (w bits : Nat) :
    relocatedContiguousAdder.valid bits (1 + 2 * w) (1 + 2 * w + 2 * bits)

Pass 2 (`a -= b·kInv`): accumulator `a` at `1+2w`, addend at the shared temp `1+2w+2bits` — the gap-`bits` (spread) case that the packed adder could not host. `valid` holds.

theoremrelocated_faithful_blocks_disjoint

theorem relocated_faithful_blocks_disjoint (w bits i j : Nat)
    (hi : i < bits) (hj : j < bits) :
    -- pass 2: acc a [1+2w, 1+2w+bits), addend [1+2w+2bits, 1+2w+3bits), carry 1+2w+3bits
    (1 + 2 * w + i ≠ 1 + 2 * w + 2 * bits + j)
    ∧ (1 + 2 * w + i ≠ 1 + 2 * w + 3 * bits)
    ∧ (1 + 2 * w + 2 * bits + j ≠ 1 + 2 * w + 3 * bits)

*Block disjointness for both passes** (carry, acc block, addend block pairwise distinct for the in-use indices). The disjointness of ALL source/intermediate Cuccaro indices under the relabel is exactly `relocate_injective`.

theoremrelocated_pass1_multiplicand_preserved

theorem relocated_pass1_multiplicand_preserved (w bits : Nat) (f : Nat → Bool)
    (i : Nat) (hi : i < bits) :
    Gate.applyNat (relocatedAdderCircuit (1 + 2 * w + bits) (1 + 2 * w + 2 * bits) bits)
        f (1 + 2 * w + i) = f (1 + 2 * w + i)

*Pass-1 multiplicand preserved.** In pass 1 the multiplicand `a` lives at `[1+2w, 1+2w+bits)`, BELOW the accumulator base `bReg = 1+2w+bits`, so the adder (which acts on `[bReg, …)`) leaves it untouched (the below-footprint frame).

theoremrelocated_pass2_multiplicand_preserved

theorem relocated_pass2_multiplicand_preserved (w bits : Nat) (f : Nat → Bool)
    (i : Nat) (hi : i < bits) :
    Gate.applyNat (relocatedAdderCircuit (1 + 2 * w) (1 + 2 * w + 2 * bits) bits)
        f (1 + 2 * w + bits + i) = f (1 + 2 * w + bits + i)

*Pass-2 multiplicand preserved — the crucial gap fact.** In pass 2 the multiplicand `b` lives at `[1+2w+bits, 1+2w+2bits)`, which is exactly the GAP of the adder (accumulator base `aReg = 1+2w`, addend base `1+2w+2bits`). Although `b` lies INSIDE the coarse bounding frame, `relocated_gap_frame` proves the adder leaves it untouched — so `b` is preserved while being read as the multiplicand.

FormalRV.Arithmetic.Adder.TwoBaseBoundedAdder

FormalRV/Arithmetic/Adder/TwoBaseBoundedAdder.lean

FormalRV.Arithmetic.Adder.TwoBaseBoundedAdder ────────────────────────────────────────────── A TWO-BASE, bounded, width-aware adder interface — the abstraction the faithful Gidney two-register product-add wants: b += a*k (accumulator b, multiplicand a in their OWN blocks) a -= b*kInv The accumulator and the addend live at INDEPENDENT base offsets `accBase` / `addBase`, each a contiguous (or stride-based) block, rather than the rigid single-base packing `addend = augend + n` of the single-base interface. This SUPERSEDES the earlier single-base `BoundedAdder` (now removed): it keeps the same two corrections forced by a contiguous layout — • width-aware indices `accIdx n accBase i` / `addIdx n addBase i`, and • bounded disjointness (`i, j < n`) — and adds a `valid : n → accBase → addBase → Prop` predicate so an instance can declare exactly which base layouts it supports. All correctness obligations are conditioned on `valid`. Two instances: • `Adder.toTwoBaseBounded` — every single-base `Adder` is a degenerate two-base adder with `accBase = addBase` (`valid := addBase = accBase`). So this is a strict generalization: existing Cuccaro/Gidney `Adder`s feed any consumer written against this interface, with no re-proof and no change to `Adder`. • `contiguousPackedAdder` (in `ContiguousTransport.lean`) — the contiguous Cuccaro transport, packed at `addBase = accBase + n` (`valid := addBase = accBase + n`); the convenience specialization that the §1–§4 transport built. NOTE on scope (honest): the footprint used by `frame`/`wellTyped` is the single bounding interval `[accBase, accBase + span n)` anchored at `accBase`. `valid` is expected to place the addend block inside it (`accBase ≤ addBase`, addend fits). A fully-relocatable footprint (addend block BELOW `accBase`, or far away with a disjoint-union footprint) is a further generalization, deliberately not built here. Whether to instead globally bound/refactor `Adder` is a separate library-level checkpoint requiring an instance/consumer audit.

structureTwoBaseBoundedAdder

structure TwoBaseBoundedAdder

*The two-base bounded adder interface.** `accIdx`/`addIdx` take the operand's own base; `valid n accBase addBase` says the instance is correct for that base layout; every obligation that depends on the layout is conditioned on it.

defAdder.toTwoBaseBounded

def Adder.toTwoBaseBounded (A : Adder) : TwoBaseBoundedAdder

*Every single-base `Adder` is a (degenerate) two-base adder** with the two bases coinciding (`valid := addBase = accBase`). The interleaved Cuccaro/Gidney layout is NOT a contiguous two-base layout (its addend `q+2i+2` is not `addBase+i` for any fixed `addBase`), so this records the honest fact that old `Adder`s only support the FIXED RELATIVE single-base layout — exposed here as the diagonal `accBase = addBase`, with the original (single-base) index functions.

FormalRV.Arithmetic.Correctness

FormalRV/Arithmetic/Correctness.lean

FormalRV.BQAlgo.Correctness — REUSABLE correctness primitives for Gate-IR-encoded arithmetic circuits. ## Status (2026-05-12) This module provides the bridge from `Gate` IR (in `Framework.Gate`) to classical-basis-state semantics (in `Framework.PadAction`'s `f_to_vec` infrastructure), so that any future arithmetic-circuit review can state correctness theorems of the form: > on classical input `f : Nat → Bool`, running the circuit produces > the basis state corresponding to the expected output function. Per CLAUDE.md hard rule "build a reusable framework, not one-off proofs", lemmas in this file are stated generically over Gate IR constructions. They are then applied to specific circuits (`gidney_adder_bit_step`, `prefix_and_step`, ...) in their own files. *Reusable primitives (this file):** - `gate_ccx_acts_on_basis`: Gate.CCX's classical-state action - `gate_cx_acts_on_basis`: Gate.CX's classical-state action - `gate_x_acts_on_basis`: Gate.X's classical-state action - `gate_seq_acts_on_basis`: sequential composition propagation *Application sites (other files):** - `BQAlgo/RippleCarryAdder.lean`: `gidney_adder_bit_step` correctness - `BQAlgo/UnaryLookup.lean`: `prefix_and_step` correctness - (future) Gidney measurement-AND with extended Gate IR

theoremgate_ccx_acts_on_basis

theorem gate_ccx_acts_on_basis (dim a b c : Nat)
    (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (hab : a ≠ b) (hac : a ≠ c) (hbc : b ≠ c) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.CCX a b c)) * f_to_vec dim f
      = f_to_vec dim (update f c (xor (f c) (f a && f b)))

A `Gate.CCX a b c` applied to a classical basis state `f_to_vec dim f` XORs the AND of bits `a` and `b` into bit `c`. This is the Gate-IR-level statement of the Toffoli's classical action, derived from `Framework.PadAction.f_to_vec_CCX_proved` via `Gate.toUCom_CCX`.

theoremgate_ccx_acts_on_basis_symm

theorem gate_ccx_acts_on_basis_symm (dim a b c : Nat)
    (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (hab : a ≠ b) (hac : a ≠ c) (hbc : b ≠ c) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.CCX a b c)) * f_to_vec dim f
      = f_to_vec dim (update f c (xor (f c) (f b && f a)))

Symmetric variant: CCX is unchanged by swapping controls. Just a notational convenience using `Bool.and_comm`.

theoremgate_cx_acts_on_basis

theorem gate_cx_acts_on_basis (dim c t : Nat)
    (hc : c < dim) (ht : t < dim) (hct : c ≠ t) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.CX c t)) * f_to_vec dim f
      = f_to_vec dim (update f t (xor (f t) (f c)))

A `Gate.CX c t` applied to a classical basis state XORs bit `c` into bit `t`. Derived from `Framework.PadAction.f_to_vec_CNOT_proved`.

theoremgate_x_acts_on_basis

theorem gate_x_acts_on_basis (dim n : Nat) (h : n < dim) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.X n)) * f_to_vec dim f
      = f_to_vec dim (update f n (!f n))

A `Gate.X n` applied to a classical basis state flips bit `n`. Derived from `Framework.PadAction.f_to_vec_X_uc_eval`.

theoremgate_cx_cx_id_on_basis

theorem gate_cx_cx_id_on_basis (dim c t : Nat)
    (hc : c < dim) (ht : t < dim) (hct : c ≠ t) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.seq (Gate.CX c t) (Gate.CX c t)))
      * f_to_vec dim f
      = f_to_vec dim f

Applying `Gate.CX c t` twice to a classical basis state restores the original state. SQIR/SQIR/Equivalences.v line 109 analog (CNOT involution) lifted to the Gate IR / basis-action level. Direct lift of `f_to_vec_CNOT_CNOT` from `Framework/PadAction.lean`.

theoremgate_ccx_ccx_id_on_basis

theorem gate_ccx_ccx_id_on_basis (dim a b c : Nat)
    (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (hab : a ≠ b) (hac : a ≠ c) (hbc : b ≠ c) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.seq (Gate.CCX a b c) (Gate.CCX a b c)))
      * f_to_vec dim f
      = f_to_vec dim f

Applying `Gate.CCX a b c` twice to a classical basis state restores the original state. SQIR analog: CCX is self-inverse (Toffoli is an involution). Direct lift of `f_to_vec_CCX_involutive` via `Matrix.mul_assoc` (the SQIR form takes nested multiplication; our Gate.seq form takes a single composed CCX-CCX product).

theoremgate_x_x_id_on_basis

theorem gate_x_x_id_on_basis (dim n : Nat) (h : n < dim) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim (Gate.seq (Gate.X n) (Gate.X n)))
      * f_to_vec dim f
      = f_to_vec dim f

Applying `Gate.X n` twice to a classical basis state restores the original state. SQIR/SQIR/Equivalences.v line 68 analog (X_X_id) lifted to the Gate IR / basis-action level. Direct lift of `f_to_vec_X_X` from `Framework/PadAction.lean`. Completes the three-gate involution family (X, CX, CCX).

theoremgate_seq_acts_on_basis

theorem gate_seq_acts_on_basis (dim : Nat) (g₁ g₂ : Gate)
    (f g h : Nat → Bool)
    (h₁ : uc_eval (Gate.toUCom dim g₁) * f_to_vec dim f = f_to_vec dim g)
    (h₂ : uc_eval (Gate.toUCom dim g₂) * f_to_vec dim g = f_to_vec dim h) :
    uc_eval (Gate.toUCom dim (Gate.seq g₁ g₂)) * f_to_vec dim f
      = f_to_vec dim h

Sequential composition acts on basis states by composing the per- gate basis-state functions. Derived from `uc_eval_seq` (right-to- left matrix multiplication on `seq`).

defGate.applyNat

def Gate.applyNat : Gate → (Nat → Bool) → (Nat → Bool)
  | Gate.I,         f => f
  | Gate.X q,       f => update f q (!f q)
  | Gate.CX c t,    f => update f t (xor (f t) (f c))
  | Gate.CCX a b c, f => update f c (xor (f c) (f a && f b))
  | Gate.seq g₁ g₂, f => Gate.applyNat g₂ (Gate.applyNat g₁ f)

Boolean-function semantics of a `Gate` IR term as a transformation on `Nat → Bool` (the function-form parallel of `Framework.Semantics.apply` on `Fin n → Bool`). Uses the project's local `Framework.update`, matching `gate_*_acts_on_basis` exactly.

theoremuc_eval_toUCom_acts_on_basis

theorem uc_eval_toUCom_acts_on_basis (dim : Nat) (g : Gate)
    (h_wt : Gate.WellTyped dim g) (f : Nat → Bool) :
    uc_eval (Gate.toUCom dim g) * f_to_vec dim f
      = f_to_vec dim (Gate.applyNat g f)

*The Gate → BaseUCom → basis-state adapter.** For any well-typed `Gate` IR term `g`, the matrix action of `uc_eval (Gate.toUCom dim g)` on the classical basis state `f_to_vec dim f` equals the basis state of `Gate.applyNat g f`. Proved by structural induction on `g`, using the existing per-gate basis action lemmas plus `gate_seq_acts_on_basis` for composition. *Usage path to `f_modmult_circuit_MMI`**: given a future modular multiplier `g_modmult : Gate` with a Boolean-function correctness theorem `Gate.applyNat g_modmult (encode_pair x 0) = encode_pair (a*x%N) 0`, combine with this adapter and `f_to_vec_eq_basis_padEquiv` (in `Framework/PadAction.lean`) to obtain the `uc_eval ... * basis_vector ... = basis_vector ...` shape that `MultiplyCircuitProperty` requires.

theoremtoUCom_acts_on_basis_of_applyNat_index

theorem toUCom_acts_on_basis_of_applyNat_index
    {dim : Nat} {g : Gate}
    (h_wt : Gate.WellTyped dim g)
    (inputIndex outputIndex : Nat) (f : Nat → Bool)
    (h_input : f_to_vec dim f = basis_vector (2^dim) inputIndex)
    (h_output : f_to_vec dim (Gate.applyNat g f)
                  = basis_vector (2^dim) outputIndex) :
    uc_eval (Gate.toUCom dim g) * basis_vector (2^dim) inputIndex
      = basis_vector (2^dim) outputIndex

*Index-form Gate → BaseUCom → basis_vector adapter** (the `MultiplyCircuitProperty`-shaped specialisation). Given: a well-typed `Gate` term `g`, a Boolean bit-function `f` that encodes some input as the basis state at `inputIndex`, and the fact that `Gate.applyNat g f` re-encodes the output as the basis state at `outputIndex`, the matrix action of `uc_eval (Gate.toUCom dim g)` on `basis_vector (2^dim) inputIndex` yields exactly `basis_vector (2^dim) outputIndex`. This is precisely the shape of `MultiplyCircuitProperty`'s `uc_eval c (basis_vector …) = basis_vector …` clause; downstream, supply `inputIndex := x * 2^anc`, `outputIndex := (a * x % N) * 2^anc`, and a Boolean encoding `f` of `x` in the data register with the ancilla zeroed.

theoremGate.applyNat_oob

theorem Gate.applyNat_oob
    {dim : Nat} {g : Gate}
    (h_wt : Gate.WellTyped dim g)
    (f : Nat → Bool)
    {i : Nat} (hi : dim ≤ i) :
    Gate.applyNat g f i = f i

*Out-of-range preservation of `Gate.applyNat`.** For a `Gate` that is well-typed at `dim` qubits, `Gate.applyNat g f i = f i` for every position `i ≥ dim`. In other words, the gate's Boolean semantics only touches positions `< dim`; any position beyond the declared dimension is fixed. Proved by induction on `g`: `I`: identity, trivial. `X q`, `CX c t`, `CCX a b c`: `update f _ _ i = f i` whenever `i` differs from the updated index, which follows from `i ≥ dim` and the corresponding bound from `Gate.WellTyped`. `seq g₁ g₂`: chain the two inductive hypotheses. This is the bit-level analogue of "the gate matrix is padded with identity on the OOB qubits"; used downstream to satisfy the out-of-range branch of `eq_encodeDataZeroAnc_of_data_anc_oob` for modular-multiplier circuits.

FormalRV.Arithmetic.Cuccaro.Cuccaro

FormalRV/Arithmetic/Cuccaro/Cuccaro.lean

FormalRV.BQAlgo.Cuccaro — the Cuccaro–Draper–Kutin–Moulton ripple-carry adder, encoded as concrete `Gate` data over the Framework IR. Per CLAUDE.md "Paper-claim-first workflow", every claim has the form `paper_claim_X` (paper's stated number) + `X_meets_paper_claim` (theorem that our derivation matches). Either the proof closes (paper verified for this component) or it doesn't (gap found). This file covers cost claims (T-count). Semantic correctness — does the MAJ gadget actually compute the majority function on bits? — lives in `BQAlgo/CuccaroCorrectness.lean`. Refs: - Cuccaro, Draper, Kutin, Moulton, "A new quantum ripple-carry addition circuit" (arXiv:quant-ph/0410184). - SQIR/examples/shor/ModMult.v (Coq encoding we're mirroring). - SQIR/examples/shor/ResourceShor.v `bcgcount_MAJ` ≤ 3 (gate count, not T-count). Each MAJ has 1 CCX + 2 CX, so under the textbook 7-T Toffoli decomposition, T-count is 7 per MAJ.

defpaper_claim_MAJ_tcount

def paper_claim_MAJ_tcount : Nat

Per-MAJ T-count claim. Source chain: SQIR `bcgcount_MAJ` ≤ 3 (gate count) ⟹ 1 Toffoli + 2 CX per MAJ ⟹ 7 T-gates per Toffoli (textbook) ⟹ 7 T-gates per MAJ.

defpaper_claim_UMA_tcount

def paper_claim_UMA_tcount : Nat

Per-UMA T-count claim. Same derivation as MAJ.

theoremMAJ_meets_paper_claim

theorem MAJ_meets_paper_claim (a b c : Nat) :
    tcount (cuccaro_MAJ a b c) = paper_claim_MAJ_tcount

✅ MAJ meets the paper claim (T-count = 7).

theoremUMA_meets_paper_claim

theorem UMA_meets_paper_claim (a b c : Nat) :
    tcount (cuccaro_UMA a b c) = paper_claim_UMA_tcount

✅ UMA meets the paper claim (T-count = 7).

example(example)

example : tcount (cuccaro_MAJ 0 1 2) = 7

example(example)

example : tcount (cuccaro_UMA 0 1 2) = 7

example(example)

example : tcount (seq (cuccaro_MAJ 0 1 2) (cuccaro_UMA 0 1 2)) = 14

theoremMAJ_UMA_pair_tcount

theorem MAJ_UMA_pair_tcount (a b c a' b' c' : Nat) :
    tcount (seq (cuccaro_MAJ a b c) (cuccaro_UMA a' b' c')) = 14

Parametric MAJ+UMA pair cost: 14 T for any qubit assignment.

theoremtcount_cuccaro_maj_chain

theorem tcount_cuccaro_maj_chain (n q_start : Nat) :
    tcount (cuccaro_maj_chain n q_start) = 7 * n

T-count of an n-block MAJ chain is exactly `7 * n` (no cross-block savings from gate-level optimization alone).

defoptimize_ccx_pair_top

def optimize_ccx_pair_top : Gate → Gate
  | seq (CCX a b c) (CCX a' b' c') =>
      if a = a' ∧ b = b' ∧ c = c' then I
      else seq (CCX a b c) (CCX a' b' c')
  | seq g₁ g₂ => seq g₁ g₂
  | I => I
  | X q => X q
  | CX a b => CX a b
  | CCX a b c => CCX a b c

Top-level CCX-pair-removal: if the outermost `seq` contains the same `CCX a b c` on both sides, replace with `I` (zero T-count). All other shapes are returned unchanged.

example(example)

example : tcount (optimize_ccx_pair_top (seq (CCX 0 1 2) (CCX 0 1 2))) = 0

Smoke test: the optimization detects an identical adjacent CCX pair and reduces T-count from 14 to 0.

example(example)

example : tcount (optimize_ccx_pair_top (seq (CCX 0 1 2) (CCX 0 1 3))) = 14

Smoke test: when the two CCX's differ (different target), the optimizer leaves the circuit unchanged.

example(example)

example : optimize_ccx_pair_top (seq (X 0) (CX 0 1)) = seq (X 0) (CX 0 1)

Smoke test: non-CCX shapes are passed through.

theoremtcount_optimize_ccx_pair_top_le

theorem tcount_optimize_ccx_pair_top_le (g : Gate) :
    tcount (optimize_ccx_pair_top g) ≤ tcount g

The optimization never increases T-count: it either rewrites a matching CCX pair to `I` (drops 14 T's) or leaves the circuit unchanged. Combined with the semantic justification in `Framework.PadAction.CCX_CCX_id`, this is a real T-count monotonicity proof for a top-level circuit rewrite.

theoremgcount_optimize_ccx_pair_top_le

theorem gcount_optimize_ccx_pair_top_le (g : Gate) :
    gcount (optimize_ccx_pair_top g) ≤ gcount g

Gate-count monotonicity for the top-level CCX-pair rewrite. Same case structure as the T-count version: pair-match drops gcount from 2 to 0, all other shapes are unchanged.

defoptimize_ccx_pairs_deep

def optimize_ccx_pairs_deep : Gate → Gate
  | seq g₁ g₂ =>
      optimize_ccx_pair_top
        (seq (optimize_ccx_pairs_deep g₁) (optimize_ccx_pairs_deep g₂))
  | g => g

Recursive deep CCX-pair-removal: bottom-up, optimize children first, then apply the top-level rewrite. Catches nested patterns that `optimize_ccx_pair_top` alone misses.

example(example)

example :
    tcount (optimize_ccx_pairs_deep
      (seq (X 0) (seq (CCX 0 1 2) (CCX 0 1 2)))) = 0

Smoke test: nested CCX pair inside a seq is detected. Without the deep optimizer, `optimize_ccx_pair_top` alone wouldn't touch a CCX pair hidden behind a `seq (X 0) (...)`.

example(example)

example :
    tcount (optimize_ccx_pairs_deep (seq (CCX 0 1 2) (CCX 0 1 2))) = 0

Smoke test: deep optimizer also catches the trivial top-level case.

theoremtcount_optimize_ccx_pairs_deep_le

theorem tcount_optimize_ccx_pairs_deep_le (g : Gate) :
    tcount (optimize_ccx_pairs_deep g) ≤ tcount g

Deep optimization is also T-count-monotone-non-increasing. Inductive proof: assume both children's T-counts are bounded above by their pre-optimization values (IH), then chain through `seq` additivity and the top-level monotonicity result.

defoptimize_ccx_iter

def optimize_ccx_iter : Nat → Gate → Gate
  | 0, g => g
  | n + 1, g => optimize_ccx_iter n (optimize_ccx_pairs_deep g)

Nat-fueled iteration of `optimize_ccx_pairs_deep`. Useful as a fixpoint driver: pick an upper bound on the number of passes (e.g., bounded by gate count) and the result is guaranteed to be no worse than the input.

theoremtcount_optimize_ccx_iter_le

theorem tcount_optimize_ccx_iter_le (n : Nat) (g : Gate) :
    tcount (optimize_ccx_iter n g) ≤ tcount g

Iterated optimization preserves T-count monotonicity. Inductive on the fuel: each step composes the deep optimizer's bound with the previous iterates'.

example(example)

example : tcount (optimize_ccx_iter 5 (seq (CCX 0 1 2) (CCX 0 1 2))) = 0

Smoke: a top-level CCX pair optimizes to T-count 0 even with one iteration of fuel.

example(example)

example : tcount (optimize_ccx_iter 10 (CCX 0 1 2)) = 7

Smoke: a single (un-pairable) CCX is not affected by any number of iterations — T-count stays at 7.

defoptimize_I_top

def optimize_I_top : Gate → Gate
  | seq I g => g
  | seq g I => g
  | g       => g

Top-level identity-elimination: drops `I` from either side of an outermost `seq`.

theoremtcount_optimize_I_top

theorem tcount_optimize_I_top (g : Gate) :
    tcount (optimize_I_top g) = tcount g

Identity-elimination preserves T-count exactly (since `tcount I = 0`).

theoremtcount_optimize_I_top_le

theorem tcount_optimize_I_top_le (g : Gate) :
    tcount (optimize_I_top g) ≤ tcount g

T-count monotonicity follows trivially from exact equality.

theoremgcount_optimize_I_top

theorem gcount_optimize_I_top (g : Gate) :
    gcount (optimize_I_top g) = gcount g

Gate-count monotonicity for I-elimination: also exact preservation, since `gcount I = 0` (identity gates don't count).

theoremgcount_optimize_I_top_le

theorem gcount_optimize_I_top_le (g : Gate) :
    gcount (optimize_I_top g) ≤ gcount g

example(example)

example : tcount (optimize_I_top (seq I (CCX 0 1 2))) = 7

Smoke: chaining I-elimination with the deep CCX optimizer collapses a cascading pattern. `seq (CCX) (seq (CCX ; CCX) (CCX))` goes: deep → `seq (CCX) (seq I (CCX))` → I-elim doesn't fire at top-level yet, but if applied recursively it would. This smoke only checks top-level chaining behavior.

example(example)

example : optimize_I_top (seq (CCX 0 1 2) I) = CCX 0 1 2

Smoke: `seq (CCX) I` reduces to plain `CCX`.

defoptimize_I_pairs_deep

def optimize_I_pairs_deep : Gate → Gate
  | seq g₁ g₂ =>
      optimize_I_top
        (seq (optimize_I_pairs_deep g₁) (optimize_I_pairs_deep g₂))
  | g => g

Recursive deep identity-elimination: bottom-up, optimize children first, then apply the top-level rewrite.

theoremtcount_optimize_I_pairs_deep

theorem tcount_optimize_I_pairs_deep (g : Gate) :
    tcount (optimize_I_pairs_deep g) = tcount g

Deep I-elimination preserves T-count exactly (every step is exact).

defoptimize_full

def optimize_full (g : Gate) : Gate

The full single-pass optimizer: first reduce CCX pairs (which may introduce `I` placeholders), then sweep out the resulting `I`s.

theoremtcount_optimize_full_le

theorem tcount_optimize_full_le (g : Gate) :
    tcount (optimize_full g) ≤ tcount g

The combined optimizer is also T-count-monotone-non-increasing. Since deep I-elimination preserves T-count exactly, the combined bound is just the deep CCX bound.

example(example)

example : tcount (optimize_full
    (seq I (seq (CCX 0 1 2) (CCX 0 1 2)))) = 0

Smoke: directly-adjacent CCX pair under an `I` wrapper. The CCX pair is detected by the deep CCX-elim (returns `seq I I`), then the I-elim collapses it to `I`.

example(example)

example : tcount (optimize_full
    (seq (CCX 0 1 2) (seq (CCX 0 1 2) I))) = 14

Smoke: an un-adjacent CCX pair (separated by a seq wrapper) is NOT fully collapsed by a single `optimize_full` pass — the CCX elim only sees the inner `seq (CCX) I` after I-elim runs, which happens later. Documents the known limitation: one pass isn't a fixpoint.

defoptimize_full_iter

def optimize_full_iter : Nat → Gate → Gate
  | 0,     g => g
  | n + 1, g => optimize_full_iter n (optimize_full g)

Nat-fueled iteration of `optimize_full`. Each fuel step alternates CCX-pair removal and I-elimination, so two iterations suffice for the associativity-blocked example above.

theoremtcount_optimize_full_iter_le

theorem tcount_optimize_full_iter_le (n : Nat) (g : Gate) :
    tcount (optimize_full_iter n g) ≤ tcount g

Iterated combined optimization is monotone non-increasing in T-count. Inductive on fuel via `Nat.le_trans` and the single-pass bound.

example(example)

example : tcount (optimize_full_iter 2
    (seq (CCX 0 1 2) (seq (CCX 0 1 2) I))) = 0

Smoke: the associativity-blocked case from above now collapses to `I` (T-count 0) after 2 iterations. First iteration runs CCX-elim + I-elim, exposing a new top-level `seq (CCX) (CCX)` pair that the second iteration eliminates.

theoremgcount_optimize_ccx_pairs_deep_le

theorem gcount_optimize_ccx_pairs_deep_le (g : Gate) :
    gcount (optimize_ccx_pairs_deep g) ≤ gcount g

theoremgcount_optimize_I_pairs_deep

theorem gcount_optimize_I_pairs_deep (g : Gate) :
    gcount (optimize_I_pairs_deep g) = gcount g

theoremgcount_optimize_full_le

theorem gcount_optimize_full_le (g : Gate) :
    gcount (optimize_full g) ≤ gcount g

theoremgcount_optimize_full_iter_le

theorem gcount_optimize_full_iter_le (n : Nat) (g : Gate) :
    gcount (optimize_full_iter n g) ≤ gcount g

theoremgcount_optimize_ccx_pair_top_strict_on_pair

theorem gcount_optimize_ccx_pair_top_strict_on_pair (a b c : Nat) :
    gcount (optimize_ccx_pair_top (seq (CCX a b c) (CCX a b c))) <
      gcount (seq (CCX a b c) (CCX a b c))

If a CCX-CCX pair appears at the top level (same triple on both sides), the top-level optimizer strictly decreases gcount.

theoremtcount_optimize_ccx_pair_top_strict_on_pair

theorem tcount_optimize_ccx_pair_top_strict_on_pair (a b c : Nat) :
    tcount (optimize_ccx_pair_top (seq (CCX a b c) (CCX a b c))) <
      tcount (seq (CCX a b c) (CCX a b c))

Same statement for T-count: 14 → 0 is strict.

defhas_ccx_pair

def has_ccx_pair : Gate → Bool
  | seq (CCX a b c) (CCX a' b' c') =>
      (decide (a = a')) && (decide (b = b')) && (decide (c = c'))
  | seq g₁ g₂ => has_ccx_pair g₁ || has_ccx_pair g₂
  | _ => false

Decidable predicate: does `g` contain an adjacent CCX-CCX pair anywhere? Recurses into `seq` children. Used as the hypothesis for the future strict-decrease theorem.

example(example)

example : has_ccx_pair (seq (CCX 0 1 2) (CCX 0 1 2)) = true

Smoke: direct CCX pair is detected.

example(example)

example : has_ccx_pair (seq (X 0) (seq (CCX 0 1 2) (CCX 0 1 2))) = true

Smoke: nested CCX pair under X is detected via recursion.

example(example)

example : has_ccx_pair (seq (CCX 0 1 2) (CCX 0 1 3)) = false

Smoke: differing-triple CCX pair returns false.

example(example)

example : has_ccx_pair (seq (X 0) (CX 0 1)) = false

Smoke: no CCX at all → false.

example(example)

example : has_ccx_pair
    (seq (X 0) (seq (X 1) (seq (CCX 0 1 2) (CCX 0 1 2)))) = true

Smoke: deeply nested CCX pair behind two X's.

theoremgcount_optimize_ccx_pairs_deep_strict_on_pair

theorem gcount_optimize_ccx_pairs_deep_strict_on_pair (a b c : Nat) :
    gcount (optimize_ccx_pairs_deep (seq (CCX a b c) (CCX a b c))) <
      gcount (seq (CCX a b c) (CCX a b c))

Strict-decrease witness for the deep optimizer at the simplest input shape: a top-level CCX-CCX pair. The deep optimizer recurses into each child (both CCXs return themselves), then top-level matches the pair → `I`. So gcount drops 2 → 0 strictly. This is the "easy" seed for the future general theorem `has_ccx_pair g = true → gcount (optimize_ccx_pairs_deep g) < gcount g`.

theoremtcount_optimize_ccx_pairs_deep_strict_on_pair

theorem tcount_optimize_ccx_pairs_deep_strict_on_pair (a b c : Nat) :
    tcount (optimize_ccx_pairs_deep (seq (CCX a b c) (CCX a b c))) <
      tcount (seq (CCX a b c) (CCX a b c))

Same for T-count: deep optimizer drops 14 → 0 strictly on a pair.

theoremgcount_optimize_ccx_pairs_deep_strict_seq_X_pair

theorem gcount_optimize_ccx_pairs_deep_strict_seq_X_pair (q a b c : Nat) :
    gcount (optimize_ccx_pairs_deep
      (seq (X q) (seq (CCX a b c) (CCX a b c)))) <
    gcount (seq (X q) (seq (CCX a b c) (CCX a b c)))

Strict-decrease for a CCX pair nested under an X wrapper: the deep optimizer recurses, eliminates the inner CCX pair (replacing with `I`), then leaves `seq (X q) I` at the top. gcount drops 3 → 1. Demonstrates that strict-decrease propagates through the recursive structure of the deep optimizer when a pair exists anywhere below.

theoremtcount_optimize_ccx_pairs_deep_strict_seq_X_pair

theorem tcount_optimize_ccx_pairs_deep_strict_seq_X_pair (q a b c : Nat) :
    tcount (optimize_ccx_pairs_deep
      (seq (X q) (seq (CCX a b c) (CCX a b c)))) <
    tcount (seq (X q) (seq (CCX a b c) (CCX a b c)))

T-count strict-decrease for the same nested-under-X case: 14 → 0 (the X has tcount 0; only the CCX pair contributes).

theoremgcount_optimize_ccx_pairs_deep_strict_pair_seq_X

theorem gcount_optimize_ccx_pairs_deep_strict_pair_seq_X (a b c q : Nat) :
    gcount (optimize_ccx_pairs_deep
      (seq (seq (CCX a b c) (CCX a b c)) (X q))) <
    gcount (seq (seq (CCX a b c) (CCX a b c)) (X q))

Symmetric form: CCX pair on the LEFT of an X wrapper. Deep optimizer reduces gcount 3 → 1 and tcount 14 → 0.

theoremtcount_optimize_ccx_pairs_deep_strict_pair_seq_X

theorem tcount_optimize_ccx_pairs_deep_strict_pair_seq_X (a b c q : Nat) :
    tcount (optimize_ccx_pairs_deep
      (seq (seq (CCX a b c) (CCX a b c)) (X q))) <
    tcount (seq (seq (CCX a b c) (CCX a b c)) (X q))

theoremgcount_optimize_ccx_pairs_deep_strict_pair_left

theorem gcount_optimize_ccx_pairs_deep_strict_pair_left (a b c : Nat) (g : Gate) :
    gcount (optimize_ccx_pairs_deep
      (seq (seq (CCX a b c) (CCX a b c)) g)) <
    gcount (seq (seq (CCX a b c) (CCX a b c)) g)

Parametric: CCX pair on the LEFT, any gate `g` on the right. The deep optimizer collapses the pair to `I`, then leaves `seq I (deep g)` at top-level (no further match). Strict because the pair drops 2 gcount; `g`-side is monotone non-increasing.

theoremgcount_optimize_ccx_pairs_deep_strict_pair_right

theorem gcount_optimize_ccx_pairs_deep_strict_pair_right (g : Gate) (a b c : Nat) :
    gcount (optimize_ccx_pairs_deep
      (seq g (seq (CCX a b c) (CCX a b c)))) <
    gcount (seq g (seq (CCX a b c) (CCX a b c)))

Symmetric parametric: CCX pair on the RIGHT, any gate `g` on the left. Same shape as the `_left` variant: collapse the inner pair to `I`. The top-level optimizer can't be definitionally reduced on `seq (deep g) I` (since `deep g` is opaque), but its universal monotonicity bound is enough.

theoremgcount_optimize_ccx_pairs_deep_strict_via_left

theorem gcount_optimize_ccx_pairs_deep_strict_via_left (g₁ g₂ : Gate)
    (ih₁ : gcount (optimize_ccx_pairs_deep g₁) < gcount g₁) :
    gcount (optimize_ccx_pairs_deep (seq g₁ g₂)) < gcount (seq g₁ g₂)

If the LEFT child's deep optimization strictly decreases gcount, so does the seq's.

theoremgcount_optimize_ccx_pairs_deep_strict_via_right

theorem gcount_optimize_ccx_pairs_deep_strict_via_right (g₁ g₂ : Gate)
    (ih₂ : gcount (optimize_ccx_pairs_deep g₂) < gcount g₂) :
    gcount (optimize_ccx_pairs_deep (seq g₁ g₂)) < gcount (seq g₁ g₂)

Symmetric: if the RIGHT child's deep optimization strictly decreases gcount, so does the seq's.

theoremgcount_optimize_ccx_pairs_deep_strict

theorem gcount_optimize_ccx_pairs_deep_strict (g : Gate)
    (h : has_ccx_pair g = true) :
    gcount (optimize_ccx_pairs_deep g) < gcount g

*Main strict-decrease theorem.** If a gate contains any adjacent CCX-CCX pair (anywhere — `has_ccx_pair` recursive detector), the deep optimizer strictly reduces gcount. This is the well-founded termination prerequisite for an unfueled fixpoint.

theoremgcount_optimize_full_strict

theorem gcount_optimize_full_strict (g : Gate)
    (h : has_ccx_pair g = true) :
    gcount (optimize_full g) < gcount g

Strict-decrease lifted to `optimize_full = I-deep ∘ CCX-deep`. Since I-elim preserves gcount exactly, the strict drop comes entirely from the CCX-elim phase. Direct 3-line chain.

defoptimize_to_fixpoint

def optimize_to_fixpoint (g : Gate) : Gate

Iterate `optimize_full` until no adjacent CCX pair remains. Well-founded recursion on `gcount g`. The `_h` proof of `has_ccx_pair g = true` is unused inside the `then` branch but consumed by `decreasing_by` — Lean 4 allows the `_` prefix while still permitting references in proof-obligation blocks.

theoremoptimize_to_fixpoint_eq_self_of_no_pair

theorem optimize_to_fixpoint_eq_self_of_no_pair (g : Gate)
    (h : has_ccx_pair g = false) :
    optimize_to_fixpoint g = g

Easy direction: when `g` has no pair, `optimize_to_fixpoint g = g`. Just unfolds the `else` branch of the wf-definition.

theoremoptimize_to_fixpoint_eq_recurse_of_pair

theorem optimize_to_fixpoint_eq_recurse_of_pair (g : Gate)
    (h : has_ccx_pair g = true) :
    optimize_to_fixpoint g = optimize_to_fixpoint (optimize_full g)

One-step unfolding when `g` has a pair: `optimize_to_fixpoint g = optimize_to_fixpoint (optimize_full g)`.

theoremhas_ccx_pair_optimize_to_fixpoint

theorem has_ccx_pair_optimize_to_fixpoint (g : Gate) :
    has_ccx_pair (optimize_to_fixpoint g) = false

*Fixpoint property.** The optimizer terminates at an output with no remaining CCX pairs. Proved by well-founded recursion on `gcount g`, with `gcount_optimize_full_strict` as the decreasing bound.

theoremtcount_optimize_to_fixpoint_le

theorem tcount_optimize_to_fixpoint_le (g : Gate) :
    tcount (optimize_to_fixpoint g) ≤ tcount g

T-count monotonicity for the fixpoint operator. Same WF-recursive proof pattern as the fixpoint property, chaining the IH with `tcount_optimize_full_le` for the recursive step.

theoremgcount_optimize_to_fixpoint_le

theorem gcount_optimize_to_fixpoint_le (g : Gate) :
    gcount (optimize_to_fixpoint g) ≤ gcount g

Same monotonicity for gate-count.

defassoc_right_step

def assoc_right_step : Gate → Gate
  | seq (seq a b) c => seq a (seq b c)
  | g => g

Single top-level right-rotation: turns `seq (seq a b) c` into `seq a (seq b c)`. All other shapes pass through unchanged.

theoremtcount_assoc_right_step

theorem tcount_assoc_right_step (g : Gate) :
    tcount (assoc_right_step g) = tcount g

The rotation preserves T-count exactly.

theoremgcount_assoc_right_step

theorem gcount_assoc_right_step (g : Gate) :
    gcount (assoc_right_step g) = gcount g

The rotation preserves gate count exactly.

example(example)

example (a b c : Nat) (q : Nat) :
    assoc_right_step (seq (seq (CCX a b c) (CCX a b c)) (X q))
      = seq (CCX a b c) (seq (CCX a b c) (X q))

Smoke: rotating `seq (seq A B) C` gives `seq A (seq B C)`.

example(example)

example (q1 q2 : Nat) : assoc_right_step (seq (X q1) (X q2)) = seq (X q1) (X q2)

Smoke: rotating a non-left-leaning seq is a no-op.

defassoc_right_iter

def assoc_right_iter : Nat → Gate → Gate
  | 0, g => g
  | n + 1, g => assoc_right_iter n (assoc_right_step g)

Nat-fueled iteration of `assoc_right_step` at the top level. With enough fuel, the outer seq tree becomes right-leaning.

theoremtcount_assoc_right_iter

theorem tcount_assoc_right_iter (n : Nat) (g : Gate) :
    tcount (assoc_right_iter n g) = tcount g

Iterating rotations preserves T-count exactly. Induction on fuel + each step's preservation.

theoremgcount_assoc_right_iter

theorem gcount_assoc_right_iter (n : Nat) (g : Gate) :
    gcount (assoc_right_iter n g) = gcount g

Same exact preservation for gate count.

FormalRV.Arithmetic.Cuccaro.CuccaroAddConst

FormalRV/Arithmetic/Cuccaro/CuccaroAddConst.lean

FormalRV.BQAlgo.CuccaroAddConst — exact-budget Cuccaro add-constant primitive. Tick 45: build the add-constant primitive on top of the `cuccaro_n_bit_adder_full` machinery from Ticks 41-44. `cuccaro_addConstGate bits q_start c` implements `target ← (target + c) mod 2^bits` in place on the target/b register at positions `q_start + 2i + 1`, using a "prepare + adder + unprepare" pattern that re-encodes the constant `c` into the read/a register, runs the full Cuccaro adder, then unprepares the read register back to zero. Total qubit budget: `2*bits + 1` starting at `q_start` — matches SQIR's `modmult_rev_anc bits = 2*bits + 1` exactly. Structure: - `cuccaro_prepareConstRead`: XOR each read position with `c.testBit i`. - Per-position lemmas: action at read positions vs everywhere else. - `cuccaro_addConstGate`: composed gate. - Decoded correctness: target = `(x + c) % 2^bits`, read restored to 0, carry-in restored to false. - WellTyped + packaged primitive.

defcuccaro_prepareConstRead

def cuccaro_prepareConstRead : Nat → Nat → Nat → Gate
  | 0,     _,       _ => Gate.I
  | n + 1, q_start, c =>
      seq (cuccaro_prepareConstRead n q_start c)
          (cond (c.testBit n) (Gate.X (q_start + 2 * n + 2)) Gate.I)

*Constant-read preparation.** For each bit `i < bits`, applies `X` at the read-register position `q_start + 2*i + 2` iff `c.testBit i`. The gate is self-inverse on the affected positions (since X² = I).

theoremcuccaro_prepareConstRead_at_other

theorem cuccaro_prepareConstRead_at_other
    (bits q_start c q : Nat)
    (hq : ∀ i, i < bits → q ≠ q_start + 2 * i + 2)
    (f : Nat → Bool) :
    Gate.applyNat (cuccaro_prepareConstRead bits q_start c) f q = f q

*Frame: prepare doesn't touch positions outside the read range.** If `q` is not equal to any read position `q_start + 2*i + 2` (i < bits), the prepare gate leaves `f q` unchanged.

theoremcuccaro_prepareConstRead_at_read

theorem cuccaro_prepareConstRead_at_read
    (bits q_start c j : Nat) (hj : j < bits) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_prepareConstRead bits q_start c) f
        (q_start + 2 * j + 2)
      = xor (f (q_start + 2 * j + 2)) (c.testBit j)

*Action at read positions.** At read-position `q_start + 2*j + 2` for `j < bits`, the prepare gate XOR's `c.testBit j` into the existing value.

theoremcuccaro_prepareConstRead_wellTyped

theorem cuccaro_prepareConstRead_wellTyped
    (bits q_start c dim : Nat) (h : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_prepareConstRead bits q_start c)

*WellTyped: prepare fits in `q_start + 2*bits + 1` qubits.**

defcuccaro_addConstGate

def cuccaro_addConstGate (bits q_start c : Nat) : Gate

*Exact-budget Cuccaro add-constant gate.** Implements `target ← (target + c) mod 2^bits` in place via prepare-adder-unprepare. Total qubit budget: `2*bits + 1` starting at `q_start`.

theoremcuccaro_addConstGate_target_bit

theorem cuccaro_addConstGate_target_bit
    (bits q_start c x i : Nat) (hi : i < bits)
    (hc : c < 2^bits) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 1)
      = (x + c).testBit i

*Target bit at position `q_start + 2*i + 1` for `i < bits` after the addConstGate**: equals `(x + c).testBit i`. Proved by tracing the three-stage composition: - After prepare₁ on input `cuccaro_input_F q_start false 0 x`: carry-in = false, b-bits = x.testBit, a-bits = c.testBit (XOR'd in). - After full adder: sum bit = `(x + c).testBit i` via the sum-bit theorem and `Adder.sumfb_eq_testBit_add_gen`. - After prepare₂: target b-bit position unchanged (prepare touches only a-positions).

theoremcuccaro_addConstGate_read_bit

theorem cuccaro_addConstGate_read_bit
    (bits q_start c x i : Nat) (hi : i < bits) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 2)
      = false

*Read bit at position `q_start + 2*i + 2` for `i < bits` after the addConstGate**: equals `false` (restored to zero). Trace: - After prepare₁: a-bit at q_start+2*i+2 = false ⊕ c.testBit i = c.testBit i. - After full adder: a preserved (= c.testBit i) by `_a_restored`. - After prepare₂: c.testBit i ⊕ c.testBit i = false.

theoremcuccaro_addConstGate_carry_in_bit

theorem cuccaro_addConstGate_carry_in_bit
    (bits q_start c x : Nat) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c)
        (cuccaro_input_F q_start false 0 x) q_start = false

*Carry-in at position `q_start` after the addConstGate**: equals `false` (restored).

theoremcuccaro_addConstGate_target_decode

theorem cuccaro_addConstGate_target_decode
    (bits q_start c x : Nat) (hc : c < 2^bits) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (cuccaro_addConstGate bits q_start c)
          (cuccaro_input_F q_start false 0 x))
      = (x + c) % 2^bits

*HEADLINE — decoded target correctness.** After running `cuccaro_addConstGate bits q_start c` on `cuccaro_input_F q_start false 0 x`, the target register decodes to `(x + c) % 2^bits`.

theoremcuccaro_addConstGate_read_decode

theorem cuccaro_addConstGate_read_decode
    (bits q_start c x : Nat) :
    cuccaro_read_val bits q_start
        (Gate.applyNat (cuccaro_addConstGate bits q_start c)
          (cuccaro_input_F q_start false 0 x))
      = 0

*Decoded read restoration.** After running the addConstGate, the read register decodes to `0`.

theoremcuccaro_addConstGate_wellTyped

theorem cuccaro_addConstGate_wellTyped
    (bits q_start c dim : Nat) (h : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_addConstGate bits q_start c)

*WellTyped: addConstGate fits in `q_start + 2*bits + 1` qubits.**

theoremcuccaro_addConstGate_clean

theorem cuccaro_addConstGate_clean
    (bits q_start c x : Nat) (hc : c < 2^bits) :
    Gate.WellTyped (q_start + (2 * bits + 1))
        (cuccaro_addConstGate bits q_start c)
    ∧ cuccaro_target_val bits q_start
          (Gate.applyNat (cuccaro_addConstGate bits q_start c)
            (cuccaro_input_F q_start false 0 x))
        = (x + c) % 2^bits
    ∧ cuccaro_read_val bits q_start
          (Gate.applyNat (cuccaro_addConstGate bits q_start c)
            (cuccaro_input_F q_start false 0 x))
        = 0

*HEADLINE — packaged Cuccaro add-constant primitive.** For any `bits`, `q_start`, `c < 2^bits`, and `x`, the addConstGate: - is WellTyped at dimension `q_start + (2*bits + 1)`; - writes `(x + c) % 2^bits` into the target register; - restores the read register to 0; - restores the carry-in qubit to false.

FormalRV.Arithmetic.Cuccaro.CuccaroAdderCorrectness

FormalRV/Arithmetic/Cuccaro/CuccaroAdderCorrectness.lean

FormalRV.Arithmetic.Cuccaro.CuccaroAdderCorrectness ─────────────────────────────────────────────────── THE semantic-correctness theorem for the Cuccaro n-bit adder. Imports the definition from `CuccaroAdderDef.lean`. The single theorem to audit is `cuccaro_adder_correct`. Its proof is delegated to the supporting lemmas in `CuccaroDecoded.lean` / `CuccaroFull.lean` / `CuccaroCorrectness.lean`.

theoremcuccaro_adder_correct

theorem cuccaro_adder_correct (bits q_start a b : Nat)
    (ha : a < 2 ^ bits) (hb : b < 2 ^ bits) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
          (cuccaro_input_F q_start false a b))
      = (a + b) % 2 ^ bits

*Cuccaro adder — semantic correctness (THE headline).** Running `cuccaro_n_bit_adder_full bits q_start` on the standard input encoding `cuccaro_input_F q_start false a b` (carry-in 0, target register = b, read register = a, with `a, b < 2^bits`) leaves the **target register decoding to `(a + b) mod 2^bits`**. The companion facts — read register restored to `a`, carry-in restored to 0, and WellTyped on the `2*bits + 1` qubit budget — are bundled in `cuccaro_adder_correct_full` below.

theoremcuccaro_adder_correct_full

theorem cuccaro_adder_correct_full (bits q_start a b : Nat)
    (ha : a < 2 ^ bits) (hb : b < 2 ^ bits) :
    Gate.WellTyped (q_start + (2 * bits + 1)) (cuccaro_n_bit_adder_full bits q_start)
    ∧ cuccaro_target_val bits q_start
          (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
            (cuccaro_input_F q_start false a b)) = (a + b) % 2 ^ bits
    ∧ cuccaro_read_val bits q_start
          (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
            (cuccaro_input_F q_start false a b)) = a
    ∧ Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
          (cuccaro_input_F q_start false a b) q_start = false

*Cuccaro adder — full correctness bundle.** The adder is WellTyped on the `2*bits + 1` qubit budget, writes `(a+b) % 2^bits` to the target register, preserves the read register `a`, and restores the carry-in to 0.

FormalRV.Arithmetic.Cuccaro.CuccaroAdderDef

FormalRV/Arithmetic/Cuccaro/CuccaroAdderDef.lean

FormalRV.Arithmetic.Cuccaro.CuccaroAdderDef ────────────────────────────────────────── THE definition of the Cuccaro–Draper–Kutin–Moulton n-bit ripple-carry adder, as concrete `Gate` data over the Framework IR. **Definitions only — no proofs.** THE adder is `cuccaro_n_bit_adder_full`: a forward MAJ chain followed by a REVERSE UMA chain, on `2*n + 1` qubits starting at `q_start`: • q_start + 0 : carry-in • q_start + 2i + 1 : bit i of b (target register; becomes (a+b+c_in) mod 2^n) • q_start + 2i + 2 : bit i of a (read register; preserved) Where to look next: • Semantic correctness : `CuccaroAdderCorrectness.lean` • Resources (T/Toffoli/qubits) : `CuccaroAdderResource.lean` • Supporting lemmas : `CuccaroFull.lean` / `CuccaroCorrectness.lean` / `CuccaroDecoded.lean` Refs: Cuccaro–Draper–Kutin–Moulton, arXiv:quant-ph/0410184; SQIR `ModMult.v`. The reverse-UMA ordering is the boundary correction validated by `scripts/check_cuccaro_adder.py` (exhaustive for n = 1..4).

defcuccaro_MAJ

def cuccaro_MAJ (a b c : Nat) : Gate

Cuccaro MAJ gadget: `MAJ a b c = CX c b ; CX c a ; CCX a b c`.

defcuccaro_UMA

def cuccaro_UMA (a b c : Nat) : Gate

Cuccaro UMA gadget: `UMA a b c = CCX a b c ; CX c a ; CX a b`.

defcuccaro_maj_chain

def cuccaro_maj_chain : Nat → Nat → Gate
  | 0,     _       => I
  | n + 1, q_start =>
      seq (cuccaro_MAJ q_start (q_start + 1) (q_start + 2))
          (cuccaro_maj_chain n (q_start + 2))

Forward chain of `n` MAJ gadgets on consecutive triples starting at `q_start`, then `q_start + 2`, … (the Cuccaro ripple structure).

defcuccaro_uma_chain_reverse

def cuccaro_uma_chain_reverse : Nat → Nat → Gate
  | 0,     _       => I
  | n + 1, q_start =>
      seq (cuccaro_uma_chain_reverse n (q_start + 2))
          (cuccaro_UMA q_start (q_start + 1) (q_start + 2))

Reverse UMA chain: `UMA_{n-1}, UMA_{n-2}, …, UMA_0` applied in descending order on consecutive triples starting at `q_start`.

defcuccaro_n_bit_adder_full

def cuccaro_n_bit_adder_full (n q_start : Nat) : Gate

*THE n-bit Cuccaro adder** (boundary-corrected): forward MAJ chain then REVERSE UMA chain, on `2*n + 1` qubits from `q_start`, computing `target := (a + b + c_in) mod 2^n` in place. Correctness: `cuccaro_n_bit_adder_full_target_decode` (CuccaroAdderCorrectness). Resource: `tcount_cuccaro_n_bit_adder_full = 14 * n` (CuccaroAdderResource).

FormalRV.Arithmetic.Cuccaro.CuccaroAdderExample

FormalRV/Arithmetic/Cuccaro/CuccaroAdderExample.lean

FormalRV.Arithmetic.Cuccaro.CuccaroAdderExample ─────────────────────────────────────────────── A worked example for the Cuccaro adder + its `Gadget` descriptor for the uniform QASM emitter. This file contains `#eval` demos, so it is kept OFF the default build path (not imported by the `Arithmetic` umbrella). Build / run on demand: lake build FormalRV.Arithmetic.Cuccaro.CuccaroAdderExample

defcuccaro_adder_2bit

def cuccaro_adder_2bit : Gate

The 2-bit Cuccaro adder.

example(example)

example : tcount cuccaro_adder_2bit = 28

Its T-count is `14 · 2 = 28` (instance of `cuccaro_adder_tcount`).

defCuccaroAdder

def CuccaroAdder : Gadget

The Cuccaro adder as a uniform, emittable `Gadget` descriptor.

example(example)

example (n : Nat) : CuccaroAdder.tcount n = 14 * n

The descriptor's structurally-computed T-count is *exactly* the proven closed form `14 · n` — for every `n`.

FormalRV.Arithmetic.Cuccaro.CuccaroAdderResource

FormalRV/Arithmetic/Cuccaro/CuccaroAdderResource.lean

FormalRV.Arithmetic.Cuccaro.CuccaroAdderResource ──────────────────────────────────────────────── THE resource theorem for the Cuccaro n-bit adder, and the theorem that ties the resource to the SAME circuit the correctness theorem verifies. Imports the definition from `CuccaroAdderDef.lean` and the correctness bundle from `CuccaroAdderCorrectness.lean`. Headlines: • `cuccaro_adder_tcount` — T-count = 14·n. • `cuccaro_adder_verified` — resource-after-correctness: the one circuit is simultaneously correct, WellTyped on 2n+1 qubits, and 14n T-gates.

theoremcuccaro_adder_tcount

theorem cuccaro_adder_tcount (bits q_start : Nat) :
    tcount (cuccaro_n_bit_adder_full bits q_start) = 14 * bits

*Cuccaro adder — resource (THE headline).** The full n-bit adder uses exactly `14 * n` T-gates (n MAJ + n UMA gadgets, 7 T each; CX/CCX-internal CXs are T-free).

theoremcuccaro_adder_verified

theorem cuccaro_adder_verified (bits q_start a b : Nat)
    (ha : a < 2 ^ bits) (hb : b < 2 ^ bits) :
    (Gate.WellTyped (q_start + (2 * bits + 1)) (cuccaro_n_bit_adder_full bits q_start)
      ∧ cuccaro_target_val bits q_start
            (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
              (cuccaro_input_F q_start false a b)) = (a + b) % 2 ^ bits
      ∧ cuccaro_read_val bits q_start
            (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
              (cuccaro_input_F q_start false a b)) = a
      ∧ Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
            (cuccaro_input_F q_start false a b) q_start = false)
    ∧ tcount (cuccaro_n_bit_adder_full bits q_start) = 14 * bits

*Cuccaro adder — verified-with-resource (resource AFTER correctness).** The single object `cuccaro_n_bit_adder_full bits q_start` is simultaneously: 1. semantically correct — writes `(a+b) % 2^bits`, preserves `a`, restores the carry-in (the `cuccaro_adder_correct_full` bundle); 2. **WellTyped on the `2*bits + 1` qubit budget**; 3. **`14 * bits` T-gates**. The resource bounds are stated about *exactly* the circuit the correctness theorem verifies, so "resource" is established only after "correctness".

FormalRV.Arithmetic.Cuccaro.CuccaroCompare

FormalRV/Arithmetic/Cuccaro/CuccaroCompare.lean

FormalRV.BQAlgo.CuccaroCompare — exact-budget comparator from the Cuccaro MAJ-chain forward pass. Tick 47: build the first exact-budget comparison primitive by reading the top carry of the Cuccaro MAJ chain BEFORE the reverse UMA chain uncomputes it. Mathematical idea: to compare `x` with `N`, add the two's-complement constant `K := 2^bits - N` to `x` and read the (bits)-th carry bit: carry_out_bit = decide (N ≤ x). The reverse UMA chain in `cuccaro_n_bit_adder_full` erases this carry; the forward-only gate retains it at position `q_start + 2*bits`. This file proves: - the arithmetic helper relating `Adder.carry false bits` to `(a + b).testBit bits` for a, b < 2^bits; - the comparator's top-carry value = `decide (N ≤ x)` (and its negation = `decide (x < N)`). IMPORTANT: this is a FORWARD-ONLY gate. It leaves the workspace in a "dirty" state — the MAJ chain has propagated XOR'd values through every register position. A separate reverse pass is needed to uncompute the workspace, which destroys the flag. Tick 48+ will address how to use this flag before uncomputation (a future decision-point flagged in QUESTIONS.md).

theoremtestBit_top_of_sum_eq_decide_ge

private theorem testBit_top_of_sum_eq_decide_ge
    (bits a b : Nat) (ha : a < 2^bits) (hb : b < 2^bits) :
    (a + b).testBit bits = decide (2^bits ≤ a + b)

*Top-bit-of-sum lemma** (private helper). For `a, b < 2^bits`, the `bits`-th bit of `a + b` equals `decide (2^bits ≤ a + b)`.

theoremAdder_carry_top_eq_testBit_sum

private theorem Adder_carry_top_eq_testBit_sum
    (bits a b : Nat) (ha : a < 2^bits) (hb : b < 2^bits) :
    Adder.carry false bits (fun i => a.testBit i) (fun i => b.testBit i)
      = (a + b).testBit bits

*Carry-out via top bit of sum** (private helper). For `a, b < 2^bits`, the carry-out of an n-bit addition equals the `bits`-th bit of `a + b`.

theoremadd_twos_complement_carry_out_eq

theorem add_twos_complement_carry_out_eq
    (bits N x : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Adder.carry false bits
        (fun i => (2^bits - N).testBit i)
        (fun i => x.testBit i)
      = decide (N ≤ x)

*HEADLINE arithmetic helper**: the carry-out of adding the two's-complement constant `2^bits - N` to `x` equals `decide (N ≤ x)`.

defcuccaro_compareConstForwardGate

def cuccaro_compareConstForwardGate (bits q_start N : Nat) : Gate

*Forward-only Cuccaro comparison gate.** Prepares the two's-complement constant `K := 2^bits - N` in the read register, then runs the MAJ chain. The top carry at position `q_start + 2*bits` holds `decide (N ≤ x)`. This gate does NOT uncompute the workspace. Subsequent positions hold XOR'd intermediate values from the MAJ chain. This is by design — uncomputing would erase the flag.

theoremcuccaro_compareConstForward_top_carry

theorem cuccaro_compareConstForward_top_carry
    (bits q_start N x : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat (cuccaro_compareConstForwardGate bits q_start N)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits)
      = decide (N ≤ x)

*HEADLINE — top-carry of the forward comparator = `decide (N ≤ x)`.** After running `cuccaro_compareConstForwardGate bits q_start N` on `cuccaro_input_F q_start false 0 x`, the qubit at position `q_start + 2*bits` holds the comparison flag `decide (N ≤ x)`.

theoremcuccaro_compareConstForward_underflow

theorem cuccaro_compareConstForward_underflow
    (bits q_start N x : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    !(Gate.applyNat (cuccaro_compareConstForwardGate bits q_start N)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits))
      = decide (x < N)

*Underflow version**: the negation of the top carry equals `decide (x < N)`.

theoremcuccaro_compareConstForwardGate_wellTyped

theorem cuccaro_compareConstForwardGate_wellTyped
    (bits q_start N dim : Nat) (h : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_compareConstForwardGate bits q_start N)

*WellTyped: the forward comparator fits in `q_start + 2*bits + 1` qubits.**

theoremcuccaro_compareConstForwardGate_primitive

theorem cuccaro_compareConstForwardGate_primitive
    (bits q_start N x : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.WellTyped (q_start + (2 * bits + 1))
        (cuccaro_compareConstForwardGate bits q_start N)
    ∧ Gate.applyNat (cuccaro_compareConstForwardGate bits q_start N)
          (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits)
        = decide (N ≤ x)

*Packaged exact-budget comparator forward gate.**

FormalRV.Arithmetic.Cuccaro.CuccaroCorrectness

FormalRV/Arithmetic/Cuccaro/CuccaroCorrectness.lean

FormalRV.BQAlgo.CuccaroCorrectness — semantic correctness of the Cuccaro MAJ and UMA gadgets, proved against the Framework's RCIR-level bit-vector semantics. This is the SQIR analogue of `Lemma MAJ_correct` in `RCIR.v` — but Lean-native, computable, and small enough that `decide` discharges every case. The key correctness fact: applied to bits (a, b, c), the MAJ gate writes the **majority** of a, b, c into bit c, while transforming bit a → a ⊕ c and bit b → b ⊕ c (so MAJ is reversible: UMA undoes it).

theoremcuccaro_MAJ_writes_xor_a

theorem cuccaro_MAJ_writes_xor_a (a b c : Bool) :
    apply (cuccaro_MAJ 0 1 2) (mkState3 a b c) 0 = xor a c

After MAJ a b c, bit `a` becomes `a ⊕ c` (XOR with the original c).

theoremcuccaro_MAJ_writes_xor_b

theorem cuccaro_MAJ_writes_xor_b (a b c : Bool) :
    apply (cuccaro_MAJ 0 1 2) (mkState3 a b c) 1 = xor b c

After MAJ a b c, bit `b` becomes `b ⊕ c`.

theoremcuccaro_MAJ_writes_majority

theorem cuccaro_MAJ_writes_majority (a b c : Bool) :
    apply (cuccaro_MAJ 0 1 2) (mkState3 a b c) 2 = majority a b c

After MAJ a b c, bit `c` becomes the **majority** of (a, b, c).

theoremMAJ_then_UMA_restores_a

theorem MAJ_then_UMA_restores_a (a b c : Bool) :
    apply (seq (cuccaro_MAJ 0 1 2) (cuccaro_UMA 0 1 2)) (mkState3 a b c) 0 = a

UMA after MAJ on the same triple restores bit 0 (a ⊕ c → a).

theoremMAJ_then_UMA_writes_sum

theorem MAJ_then_UMA_writes_sum (a b c : Bool) :
    apply (seq (cuccaro_MAJ 0 1 2) (cuccaro_UMA 0 1 2)) (mkState3 a b c) 1
      = xor (xor a b) c

UMA after MAJ writes the **sum bit** `a ⊕ b ⊕ c` into qubit 1. This is the per-bit output of a full adder.

theoremMAJ_then_UMA_restores_c

theorem MAJ_then_UMA_restores_c (a b c : Bool) :
    apply (seq (cuccaro_MAJ 0 1 2) (cuccaro_UMA 0 1 2)) (mkState3 a b c) 2 = c

UMA after MAJ restores bit 2 (the carry-in).

theoremcuccaro_MAJ_at_a

theorem cuccaro_MAJ_at_a
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ a b c) f a = xor (f a) (f c)

*MAJ local semantics at the `a` wire.** Applied to `f`, the gate writes `f a ⊕ f c` at position `a`.

theoremcuccaro_MAJ_at_b

theorem cuccaro_MAJ_at_b
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ a b c) f b = xor (f b) (f c)

*MAJ local semantics at the `b` wire.** Applied to `f`, the gate writes `f b ⊕ f c` at position `b`.

theoremcuccaro_MAJ_at_c

theorem cuccaro_MAJ_at_c
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ a b c) f c
      = majority (f a) (f b) (f c)

*MAJ local semantics at the `c` wire.** Applied to `f`, the gate writes the boolean majority of `(f a, f b, f c)` at position `c`.

theoremcuccaro_MAJ_at_other

theorem cuccaro_MAJ_at_other
    (a b c q : Nat) (h_qa : q ≠ a) (h_qb : q ≠ b) (h_qc : q ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ a b c) f q = f q

*MAJ local semantics at any unrelated wire.** Applied to `f`, the gate is identity at positions outside `{a, b, c}`.

theoremcuccaro_UMA_at_c

theorem cuccaro_UMA_at_c
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c) f c
      = xor (f c) (f a && f b)

*UMA local semantics at the `c` wire.** Applied to `f`, the gate writes `f c ⊕ (f a AND f b)` at position `c` (the CCX action).

theoremcuccaro_UMA_at_a

theorem cuccaro_UMA_at_a
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c) f a
      = xor (f a) (xor (f c) (f a && f b))

*UMA local semantics at the `a` wire.** After UMA, position `a` holds `f a ⊕ f c ⊕ (f a AND f b)`.

theoremcuccaro_UMA_at_b

theorem cuccaro_UMA_at_b
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c) f b
      = xor (f b) (xor (f a) (xor (f c) (f a && f b)))

*UMA local semantics at the `b` wire.** After UMA, position `b` holds `f b ⊕ f a ⊕ f c ⊕ (f a AND f b)`.

theoremcuccaro_UMA_at_other

theorem cuccaro_UMA_at_other
    (a b c q : Nat) (h_qa : q ≠ a) (h_qb : q ≠ b) (h_qc : q ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c) f q = f q

*UMA local semantics at any unrelated wire.**

theoremcuccaro_MAJ_wellTyped

theorem cuccaro_MAJ_wellTyped
    (dim a b c : Nat) (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) :
    Gate.WellTyped dim (cuccaro_MAJ a b c)

*WellTyped for `cuccaro_MAJ`.**

theoremcuccaro_UMA_wellTyped

theorem cuccaro_UMA_wellTyped
    (dim a b c : Nat) (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) :
    Gate.WellTyped dim (cuccaro_UMA a b c)

*WellTyped for `cuccaro_UMA`.**

defcuccaro_input_F

def cuccaro_input_F (q_start : Nat) (c_in : Bool) (a b : Nat) (q : Nat) : Bool

*Cuccaro register-level input encoding.** Given `a`, `b : Nat` (the two inputs as binary numbers) and `c_in : Bool` (the carry-in), produces the initial bit-function over `Nat → Bool` per the layout above.

defcuccaroAdderSpec

def cuccaroAdderSpec (bits a b : Nat) : Nat

*Cuccaro spec: integer-level sum-modulo-2^bits.** The Boolean specification of an n-bit addition.

theoremcuccaro_input_F_at_c_in

theorem cuccaro_input_F_at_c_in (q_start : Nat) (c_in : Bool) (a b : Nat) :
    cuccaro_input_F q_start c_in a b q_start = c_in

*Sanity: decoder at the carry-in position.**

theoremcuccaro_input_F_at_b

theorem cuccaro_input_F_at_b
    (q_start i : Nat) (c_in : Bool) (a b : Nat) :
    cuccaro_input_F q_start c_in a b (q_start + 2 * i + 1) = b.testBit i

*Sanity: decoder at the i-th `b` position (q_start + 2i + 1).

theoremcuccaro_input_F_at_a

theorem cuccaro_input_F_at_a
    (q_start i : Nat) (c_in : Bool) (a b : Nat) :
    cuccaro_input_F q_start c_in a b (q_start + 2 * i + 2) = a.testBit i

*Sanity: decoder at the i-th `a` position (q_start + 2i + 2).

FormalRV.Arithmetic.Cuccaro.CuccaroDecoded

FormalRV/Arithmetic/Cuccaro/CuccaroDecoded.lean

FormalRV.BQAlgo.CuccaroDecoded — integer-level decoded specification of the exact-budget full Cuccaro adder. Tick 44 bridges the bit-level symbolic correctness of `cuccaro_n_bit_adder_full` (proved in `CuccaroFull.lean`) to the Nat-level statement `cuccaro_target_val bits q_start (output) = (a + b + c_in) % 2^bits` using the framework's existing `Adder.carry`/`Adder.sumfb` machinery (proved in `RippleCarryAdder.lean`). This is the natural next step toward closing the original SQIR placeholder axioms: the adder primitive now matches the integer arithmetic spec, exposing a clean composable interface. Structure of this file: - decoders: `cuccaro_target_val`, `cuccaro_read_val`. - decoder sanity lemmas on `cuccaro_input_F`. - `cuccaro_target_val_eq_sum_when_bits_match` (generic bit-stream→Nat). - `cuccaro_carry_eq_Adder_carry` (bridge to framework `Adder.carry`). - decoded correctness theorems for the full adder. - packaged primitive `cuccaro_n_bit_adder_full_primitive`.

defcuccaro_target_val

def cuccaro_target_val (bits q_start : Nat) (f : Nat → Bool) : Nat

Decoder: value of the target/b register at width `bits`, LSB-first. Bit at `q_start + 2i + 1` contributes weight `2^i`.

defcuccaro_read_val

def cuccaro_read_val (bits q_start : Nat) (f : Nat → Bool) : Nat

Decoder: value of the read/a register at width `bits`, LSB-first. Bit at `q_start + 2i + 2` contributes weight `2^i`.

theoremcuccaro_target_val_lt

theorem cuccaro_target_val_lt (bits q_start : Nat) (f : Nat → Bool) :
    cuccaro_target_val bits q_start f < 2^bits

theoremcuccaro_read_val_lt

theorem cuccaro_read_val_lt (bits q_start : Nat) (f : Nat → Bool) :
    cuccaro_read_val bits q_start f < 2^bits

theoremcuccaro_target_val_eq_sum_when_bits_match

theorem cuccaro_target_val_eq_sum_when_bits_match
    (bits q_start S : Nat) (f : Nat → Bool)
    (h : ∀ i, i < bits → f (q_start + 2 * i + 1) = S.testBit i) :
    cuccaro_target_val bits q_start f = S % 2^bits

*Generic bit-stream-to-Nat lemma for the target decoder.** If `f` matches `S.testBit i` at all target positions for `i < bits`, then `cuccaro_target_val bits q_start f = S % 2^bits`. Same shape as `gidney_target_val_eq_sum_when_bits_match` but for the Cuccaro layout.

theoremcuccaro_read_val_eq_sum_when_bits_match

theorem cuccaro_read_val_eq_sum_when_bits_match
    (bits q_start S : Nat) (f : Nat → Bool)
    (h : ∀ i, i < bits → f (q_start + 2 * i + 2) = S.testBit i) :
    cuccaro_read_val bits q_start f = S % 2^bits

*Generic bit-stream-to-Nat lemma for the read decoder.** Same shape as above.

theoremcuccaro_target_val_input

theorem cuccaro_target_val_input
    (bits q_start a b : Nat) (c_in : Bool) (hb : b < 2^bits) :
    cuccaro_target_val bits q_start (cuccaro_input_F q_start c_in a b) = b

The input encoding decodes the target register to `b % 2^bits`.

theoremcuccaro_read_val_input

theorem cuccaro_read_val_input
    (bits q_start a b : Nat) (c_in : Bool) (ha : a < 2^bits) :
    cuccaro_read_val bits q_start (cuccaro_input_F q_start c_in a b) = a

The input encoding decodes the read register to `a % 2^bits`.

theoremmajority_eq_xor_pairs

theorem majority_eq_xor_pairs (a b c : Bool) :
    Boolean.majority a b c
      = xor (xor (a && b) (b && c)) (a && c)

*Boolean majority = XOR-pairwise-AND.** Local algebraic identity used by the carry bridge.

theoremcuccaro_carry_eq_Adder_carry

theorem cuccaro_carry_eq_Adder_carry
    (f : Nat → Bool) (q_start k : Nat) :
    cuccaro_carry f q_start k
      = Adder.carry (f q_start) k
          (fun i => f (q_start + 2 * i + 1))
          (fun i => f (q_start + 2 * i + 2))

*Carry-function bridge.** The Cuccaro carry function on a state `f` and origin `q_start` equals the framework `Adder.carry` on the corresponding bit streams, with carry-in `f q_start`. Bit stream conventions: - f-stream of `Adder.carry`: `i ↦ f (q_start + 2i + 1)` (the b-bits). - g-stream of `Adder.carry`: `i ↦ f (q_start + 2i + 2)` (the a-bits). Note: `Adder.carry` is symmetric in its two streams (`Adder.carry_sym`), so the order doesn't affect the carry.

theoremcuccaro_n_bit_adder_full_target_decode_carry

theorem cuccaro_n_bit_adder_full_target_decode_carry
    (bits q_start a b : Nat) (c_in : Bool) (ha : a < 2^bits) (hb : b < 2^bits) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
          (cuccaro_input_F q_start c_in a b))
      = (a + b + c_in.toNat) % 2^bits

*HEADLINE — decoded target-register correctness for arbitrary carry-in.** After running the full Cuccaro adder on `cuccaro_input_F q_start c_in a b`, the target register decodes to `(a + b + c_in.toNat) % 2^bits`.

theoremcuccaro_n_bit_adder_full_target_decode

theorem cuccaro_n_bit_adder_full_target_decode
    (bits q_start a b : Nat) (ha : a < 2^bits) (hb : b < 2^bits) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
          (cuccaro_input_F q_start false a b))
      = (a + b) % 2^bits

*HEADLINE — decoded target-register correctness for carry-in `false`.** After running the full Cuccaro adder on `cuccaro_input_F q_start false a b`, the target register decodes to `(a + b) % 2^bits`.

theoremcuccaro_n_bit_adder_full_read_decode

theorem cuccaro_n_bit_adder_full_read_decode
    (bits q_start a b : Nat) (c_in : Bool) (ha : a < 2^bits) :
    cuccaro_read_val bits q_start
        (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
          (cuccaro_input_F q_start c_in a b))
      = a

*Decoded read-register restoration.** After running the full Cuccaro adder, the read register still decodes to `a`.

theoremcuccaro_n_bit_adder_full_carry_in_decode

theorem cuccaro_n_bit_adder_full_carry_in_decode
    (bits q_start a b : Nat) (c_in : Bool) :
    Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
        (cuccaro_input_F q_start c_in a b) q_start = c_in

*Decoded carry-in restoration.** After running the full Cuccaro adder on `cuccaro_input_F q_start c_in a b`, the carry-in qubit at `q_start` still holds `c_in`.

theoremcuccaro_n_bit_adder_full_primitive

theorem cuccaro_n_bit_adder_full_primitive
    (bits q_start a b : Nat) (ha : a < 2^bits) (hb : b < 2^bits) :
    Gate.WellTyped (q_start + (2 * bits + 1))
        (cuccaro_n_bit_adder_full bits q_start)
    ∧ cuccaro_target_val bits q_start
          (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
            (cuccaro_input_F q_start false a b))
        = (a + b) % 2^bits
    ∧ cuccaro_read_val bits q_start
          (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
            (cuccaro_input_F q_start false a b))
        = a

*HEADLINE — exact-budget Cuccaro adder primitive.** For any `bits`, `q_start`, and `a, b < 2^bits`, the full Cuccaro adder: - is WellTyped at dimension `q_start + (2*bits + 1)`; - writes `(a + b) % 2^bits` into the target register; - preserves the read register `a`; - restores the carry-in qubit (when initialized to `false`).

FormalRV.Arithmetic.Cuccaro.CuccaroFull

FormalRV/Arithmetic/Cuccaro/CuccaroFull.lean

FormalRV.BQAlgo.CuccaroFull — the BOUNDARY-CORRECTED Cuccaro adder. Tick 42: per the third-party Python sanity check (`scripts/check_cuccaro_adder.py`), the existing `cuccaro_n_bit_adder_skeleton` (forward MAJ-chain + forward UMA-chain) is NOT a correct in-place adder for n ≥ 2 — it fails 606 of 680 test cases. The fix is to **REVERSE the UMA chain order**: apply `UMA_{n-1}, UMA_{n-2}, ..., UMA_1, UMA_0` (descending) rather than `UMA_0, ..., UMA_{n-1}` (ascending). With that single structural fix, the simulator passes all 680 cases (n = 1..4, c_in ∈ {F, T}, all a, b < 2^n). This module defines the corrected `cuccaro_n_bit_adder_full` and proves WellTyped. Semantic correctness on the chain level is left as the next-tick deliverable. Layout (matches `cuccaro_input_F` in `BQAlgo/CuccaroCorrectness.lean`): - pos q_start + 0: c_in (carry-in). - pos q_start + 2i + 1: bit i of b (target register; becomes (a+b+c_in) mod 2^n). - pos q_start + 2i + 2: bit i of a (read register; preserved). - Total: 2*n + 1 qubits. This matches SQIR's `modmult_rev_anc n = 2*n + 1` budget EXACTLY, making this the natural exact-budget primitive for closing the original SQIR placeholders.

theoremtcount_cuccaro_uma_chain_reverse

theorem tcount_cuccaro_uma_chain_reverse (n q_start : Nat) :
    tcount (cuccaro_uma_chain_reverse n q_start) = 7 * n

T-count of the reverse UMA chain is `7 * n`.

theoremtcount_cuccaro_n_bit_adder_full

theorem tcount_cuccaro_n_bit_adder_full (n q_start : Nat) :
    tcount (cuccaro_n_bit_adder_full n q_start) = 14 * n

T-count of the full adder: `14 * n`. Same as the (incorrect) skeleton — reordering doesn't change cost.

example(example)

example : tcount (cuccaro_n_bit_adder_full 4 0) = 56

Smoke: 4-bit full adder has 56 T-gates.

theoremcuccaro_maj_chain_wellTyped

theorem cuccaro_maj_chain_wellTyped
    (n q_start dim : Nat) (h : q_start + 2 * n + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_maj_chain n q_start)

The MAJ chain of `n` steps starting at `q_start` is well-typed in any dimension `dim` containing the touched range `[q_start, q_start + 2n]`.

theoremcuccaro_uma_chain_reverse_wellTyped

theorem cuccaro_uma_chain_reverse_wellTyped
    (n q_start dim : Nat) (h : q_start + 2 * n + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_uma_chain_reverse n q_start)

The reverse UMA chain is well-typed in any dimension containing `[q_start, q_start + 2n]`.

theoremcuccaro_n_bit_adder_full_wellTyped

theorem cuccaro_n_bit_adder_full_wellTyped
    (n q_start dim : Nat) (h : q_start + 2 * n + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_n_bit_adder_full n q_start)

*HEADLINE: Full Cuccaro adder is well-typed.** In any dimension `dim ≥ q_start + 2n + 1` (covers all touched qubits, the highest being `q_start + 2n` for n ≥ 1), the corrected full adder is structurally well-typed. Proved by structural composition of MAJ-chain WellTyped with reverse-UMA-chain WellTyped.

example(example)

example : Gate.WellTyped 5 (cuccaro_n_bit_adder_full 2 0)

theoremcuccaro_maj_chain_frame_below

theorem cuccaro_maj_chain_frame_below
    (n q_start : Nat) (f : Nat → Bool) (q : Nat) (h : q < q_start) :
    Gate.applyNat (cuccaro_maj_chain n q_start) f q = f q

*Frame lemma for the MAJ chain: positions strictly below `q_start` are unchanged.** The chain touches only qubits `[q_start, q_start + 2n]`, so anything below is preserved. Proved by induction on `n` using `cuccaro_MAJ_at_other`.

theoremcuccaro_uma_chain_reverse_frame_below

theorem cuccaro_uma_chain_reverse_frame_below
    (n q_start : Nat) (f : Nat → Bool) (q : Nat) (h : q < q_start) :
    Gate.applyNat (cuccaro_uma_chain_reverse n q_start) f q = f q

*Frame lemma for the reverse UMA chain: positions strictly below `q_start` are unchanged.** Analogous to the MAJ-chain frame.

theoremcuccaro_n_bit_adder_full_frame_below

theorem cuccaro_n_bit_adder_full_frame_below
    (n q_start : Nat) (f : Nat → Bool) (q : Nat) (h : q < q_start) :
    Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f q = f q

*Frame lemma for the full adder: positions strictly below `q_start` are unchanged.** Composition of the MAJ-chain and reverse-UMA-chain frame lemmas.

theoremcuccaro_maj_chain_frame_above

theorem cuccaro_maj_chain_frame_above
    (n q_start : Nat) (f : Nat → Bool) (q : Nat)
    (h : q_start + 2 * n + 1 ≤ q) :
    Gate.applyNat (cuccaro_maj_chain n q_start) f q = f q

The MAJ chain doesn't touch positions `≥ q_start + 2n + 1`.

theoremcuccaro_uma_chain_reverse_frame_above

theorem cuccaro_uma_chain_reverse_frame_above
    (n q_start : Nat) (f : Nat → Bool) (q : Nat)
    (h : q_start + 2 * n + 1 ≤ q) :
    Gate.applyNat (cuccaro_uma_chain_reverse n q_start) f q = f q

The reverse UMA chain doesn't touch positions `≥ q_start + 2n + 1`.

theoremcuccaro_n_bit_adder_full_frame_above

theorem cuccaro_n_bit_adder_full_frame_above
    (n q_start : Nat) (f : Nat → Bool) (q : Nat)
    (h : q_start + 2 * n + 1 ≤ q) :
    Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f q = f q

*Full adder doesn't touch positions outside `[q_start, q_start + 2n]`.**

theoremcuccaro_maj_chain_at_first_a

theorem cuccaro_maj_chain_at_first_a
    (n q_start : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_maj_chain (n + 1) q_start) f q_start
      = xor (f q_start) (f (q_start + 2))

*First MAJ-chain step at position `q_start` (the first MAJ's `a` wire).** After `cuccaro_maj_chain (n+1) q_start`, position `q_start` holds `xor (f q_start) (f (q_start + 2))` — the result of MAJ_0's `a`-wire action, since the recursive sub-chain (starting at `q_start + 2`) doesn't touch positions below `q_start + 2`.

theoremcuccaro_maj_chain_at_first_b

theorem cuccaro_maj_chain_at_first_b
    (n q_start : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_maj_chain (n + 1) q_start) f (q_start + 1)
      = xor (f (q_start + 1)) (f (q_start + 2))

*First MAJ-chain step at position `q_start + 1` (the first MAJ's `b` wire).** After `cuccaro_maj_chain (n+1) q_start`, position `q_start + 1` holds `xor (f (q_start + 1)) (f (q_start + 2))`.

defcuccaro_carry

def cuccaro_carry (f : Nat → Bool) (q_start : Nat) : Nat → Bool
  | 0     => f q_start
  | k + 1 => Boolean.majority
               (cuccaro_carry f q_start k)
               (f (q_start + 2 * k + 1))
               (f (q_start + 2 * k + 2))

*Classical Cuccaro carry function.** Given a state `f` and a register origin `q_start`, `cuccaro_carry f q_start k` is the carry into bit-position k of the addition encoded by `f` (per the layout `pos q_start = c_in; pos q_start + 2i + 1 = b_i; pos q_start + 2i + 2 = a_i`). Defined recursively via the majority function (which is the classical full-adder carry-out).

theoremcuccaro_carry_after_MAJ0_shift

theorem cuccaro_carry_after_MAJ0_shift
    (q_start : Nat) (f : Nat → Bool) (k : Nat) :
    cuccaro_carry (Gate.applyNat (cuccaro_MAJ q_start (q_start + 1) (q_start + 2)) f)
                  (q_start + 2) k
      = cuccaro_carry f q_start (k + 1)

*Shift lemma.** Applying `MAJ_0` (the first chain step) and then the carry function starting from the shifted position `q_start + 2` equals the original carry function at the next index. This is the algebraic glue for the chain-invariant induction.

theoremcuccaro_maj_chain_at_carry_a

theorem cuccaro_maj_chain_at_carry_a
    (n q_start : Nat) (f : Nat → Bool) (i : Nat) (hi : i < n) :
    Gate.applyNat (cuccaro_maj_chain n q_start) f (q_start + 2 * i)
      = xor (cuccaro_carry f q_start i) (f (q_start + 2 * i + 2))

*MAJ-chain invariant at the carry positions `q_start + 2*i` (i < n).**

theoremcuccaro_maj_chain_at_b_xor

theorem cuccaro_maj_chain_at_b_xor
    (n q_start : Nat) (f : Nat → Bool) (i : Nat) (hi : i < n) :
    Gate.applyNat (cuccaro_maj_chain n q_start) f (q_start + 2 * i + 1)
      = xor (f (q_start + 2 * i + 1)) (f (q_start + 2 * i + 2))

*MAJ-chain invariant at the `b`-bit positions `q_start + 2*i + 1` (i < n).**

theoremcuccaro_maj_chain_at_top_carry

theorem cuccaro_maj_chain_at_top_carry
    (n q_start : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_maj_chain n q_start) f (q_start + 2 * n)
      = cuccaro_carry f q_start n

*MAJ-chain invariant at the top position `q_start + 2*n`: holds the final carry `c_n`.**

theoremcuccaro_UMA_undo_MAJ_a

theorem cuccaro_UMA_undo_MAJ_a
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c)
        (Gate.applyNat (cuccaro_MAJ a b c) f) a
      = f a

*Algebraic UMA-after-MAJ identity (a-wire).** Applying UMA to the state after a MAJ on the same triple restores the original `a` value at the a-wire. This is the symbolic version of `MAJ_then_UMA_restores_a`.

theoremcuccaro_UMA_undo_MAJ_c

theorem cuccaro_UMA_undo_MAJ_c
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c)
        (Gate.applyNat (cuccaro_MAJ a b c) f) c
      = f c

*Algebraic UMA-after-MAJ identity (c-wire).** Restores the original `c` value at the c-wire.

theoremcuccaro_UMA_undo_MAJ_b

theorem cuccaro_UMA_undo_MAJ_b
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_UMA a b c)
        (Gate.applyNat (cuccaro_MAJ a b c) f) b
      = xor (xor (f a) (f b)) (f c)

*Algebraic UMA-after-MAJ identity (b-wire).** Writes the sum bit `f a XOR f b XOR f c` at the b-wire.

theoremcuccaro_n_bit_adder_full_carry_in_restored

theorem cuccaro_n_bit_adder_full_carry_in_restored
    (n q_start : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f q_start = f q_start

*Carry-in restoration: position `q_start` is unchanged by the full Cuccaro adder.**

theoremcuccaro_n_bit_adder_full_a_restored

theorem cuccaro_n_bit_adder_full_a_restored
    (n q_start : Nat) (f : Nat → Bool) (i : Nat) (hi : i < n) :
    Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f (q_start + 2 * i + 2)
      = f (q_start + 2 * i + 2)

*Read register restoration: position `q_start + 2*i + 2` is unchanged by the full Cuccaro adder for any `i < n`.** This is the second of the three positional invariants — the `a` register (stored at the read-positions) is preserved by the full adder. Same induction pattern as `_carry_in_restored`: split on `i = 0` vs `i ≥ 1`. For `i = 0`, the local `cuccaro_UMA_undo_MAJ_c` identity applies after using IH on the sub-carry-in restoration. For `i ≥ 1`, the sub-adder's a-restoration IH directly handles it, with the outer UMA_0 and MAJ_0 leaving the position untouched.

theoremcuccaro_n_bit_adder_full_sum_bit

theorem cuccaro_n_bit_adder_full_sum_bit
    (n q_start : Nat) (f : Nat → Bool) (i : Nat) (hi : i < n) :
    Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f (q_start + 2 * i + 1)
      = xor (xor (cuccaro_carry f q_start i) (f (q_start + 2 * i + 1)))
            (f (q_start + 2 * i + 2))

*Sum-bit invariant: at position `q_start + 2*i + 1` (for `i < n`), the full Cuccaro adder produces the sum bit `c_i ⊕ b_i ⊕ a_i`.** This is the third and final positional invariant for the full adder. With the carry-in and a-restoration theorems above, this completes the symbolic specification of `cuccaro_n_bit_adder_full`. Proof structure: induction on n, splitting i into `i = 0` (UMA_0's b-wire action + cuccaro_UMA_undo_MAJ_b at the local level) and `i ≥ 1` (sub-adder's IH + carry-shift bridging).

theoremcuccaro_n_bit_adder_full_correct

theorem cuccaro_n_bit_adder_full_correct
    (n q_start : Nat) (f : Nat → Bool) :
    (Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f q_start = f q_start) ∧
    (∀ i, i < n →
        Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f (q_start + 2 * i + 1)
          = xor (xor (cuccaro_carry f q_start i) (f (q_start + 2 * i + 1)))
                (f (q_start + 2 * i + 2))) ∧
    (∀ i, i < n →
        Gate.applyNat (cuccaro_n_bit_adder_full n q_start) f (q_start + 2 * i + 2)
          = f (q_start + 2 * i + 2))

*HEADLINE — symbolic correctness of the full Cuccaro adder.** For any input state `f`, the full Cuccaro adder of length `n` starting at `q_start`: - restores the carry-in at position `q_start`; - produces sum bit `c_i ⊕ b_i ⊕ a_i` at position `q_start + 2*i + 1` for each `i < n` (where c_i is the cumulative classical carry); - restores the read register `a_i` at position `q_start + 2*i + 2`.

FormalRV.Arithmetic.Cuccaro.CuccaroModReduce

FormalRV/Arithmetic/Cuccaro/CuccaroModReduce.lean

FormalRV.BQAlgo.CuccaroModReduce — exact-budget Cuccaro modular-reduction skeleton + formal blocker. Tick 48: factor the Cuccaro subtract-constant primitive into its forward-only and reverse-only components, prove WellTyped for both, prove their composition equals the full subtract, and formalize the conclusion that no clean exact-budget modular reduction can be built from the current primitives without an additional qubit. Structure: - `cuccaro_subConstForwardOnlyGate`: prepare(K) ; MAJ chain. Exposes the comparison flag at the top carry position. - `cuccaro_subConstReverseOnlyGate`: UMA chain ; prepare(K). Completes the subtraction when run after the forward gate. - `cuccaro_subConst_forward_reverse_pointwise_eq`: pointwise equality of forward+reverse with the full subtract. - Flag-behavior theorem (reuse Tick 47 result). - Blocker documentation: simulation (script `check_cuccaro_modreduce.py`) confirms no exact-budget candidate gives clean modular reduction.

defcuccaro_subConstForwardOnlyGate

def cuccaro_subConstForwardOnlyGate (bits q_start N : Nat) : Gate

*Forward-only Cuccaro subtract gate.** Prepares the two's-complement constant `K = 2^bits - N` in the read register, then runs the MAJ chain. Leaves the workspace in a dirty intermediate state but exposes the comparison flag at position `q_start + 2*bits`. This is the same gate as `cuccaro_compareConstForwardGate` from Tick 47, introduced under a name that matches the subtraction-decomposition framing of Tick 48.

defcuccaro_subConstReverseOnlyGate

def cuccaro_subConstReverseOnlyGate (bits q_start N : Nat) : Gate

*Reverse-only Cuccaro subtract gate.** Runs the reverse UMA chain then unprepares the constant. When run AFTER the forward gate on the same input, the composition computes the full clean subtract.

theoremcuccaro_subConstForwardOnlyGate_wellTyped

theorem cuccaro_subConstForwardOnlyGate_wellTyped
    (bits q_start N dim : Nat) (h : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_subConstForwardOnlyGate bits q_start N)

theoremcuccaro_subConstReverseOnlyGate_wellTyped

theorem cuccaro_subConstReverseOnlyGate_wellTyped
    (bits q_start N dim : Nat) (h : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_subConstReverseOnlyGate bits q_start N)

theoremcuccaro_subConst_forward_reverse_pointwise_eq

theorem cuccaro_subConst_forward_reverse_pointwise_eq
    (bits q_start N : Nat) (f : Nat → Bool) (q : Nat) :
    Gate.applyNat
      (seq (cuccaro_subConstForwardOnlyGate bits q_start N)
            (cuccaro_subConstReverseOnlyGate bits q_start N)) f q
      = Gate.applyNat (cuccaro_subConstGate bits q_start N) f q

*Pointwise equality of forward ; reverse with the full subtract.**

theoremcuccaro_subConstForwardOnly_top_carry

theorem cuccaro_subConstForwardOnly_top_carry
    (bits q_start N x : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat (cuccaro_subConstForwardOnlyGate bits q_start N)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits)
      = decide (N ≤ x)

*Flag behavior of the forward-only subtract**: at the top carry position, the value is `decide (N ≤ x)`. Reused from Tick 47.

theoremcuccaro_subConstSkeleton_flag_value_at_use_point

theorem cuccaro_subConstSkeleton_flag_value_at_use_point
    (bits q_start N x : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat
        (cuccaro_subConstForwardOnlyGate bits q_start N)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits)
      = decide (N ≤ x)

*HEADLINE — flag-controlled action specification.** At the point of any "use the flag" operation that is inserted between forward and reverse, the qubit at `q_start + 2 * bits` holds `decide (N ≤ x)`. This is the contract any candidate modular-reduction skeleton must satisfy.

theoremcuccaro_subConstGate_not_modular_reduction

theorem cuccaro_subConstGate_not_modular_reduction
    (bits q_start N x : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N < 2^bits) (hx : x < N) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (cuccaro_subConstGate bits q_start N)
          (cuccaro_input_F q_start false 0 x))
      ≠ x % N

*Formal blocker: no candidate single-step modular reduction.** The bare subtract-constant primitive does not compute modular reduction. Specifically, for any `bits`, `N` with `0 < N ≤ 2^bits`, and `x ∈ [0, 2N)`, the bare subtract gives the WRONG result whenever `x < N`. This is proved by reduction to the existing `cuccaro_subConstSpec_of_lt` lemma: in the underflow case, the spec equals `x + 2^bits - N ≠ x` (in general).

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRCondAdd

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRCondAdd.lean

FormalRV.BQAlgo.CuccaroSQIRCondAdd — SQIR-style conditional add-constant / subtract-constant gates and dirty-flag modular adder. Tick 54: build the conditional add/sub primitives needed to turn the Tick 53 mod-2^bits skeleton into a true mod-N add-constant primitive. Route chosen: B (masked constant preparation). Our Gate IR has X, CX, CCX but no controlled-CCX. Following the existing Gidney-route `prepareMaskedConstRead`/`conditionalAddConstGate` pattern, we use CX(flagPos, read_pos_i) for each bit of the constant. This file lands: - `sqir_prepareMaskedConstRead`: definition. - per-position semantics (at_read, at_other) for masked prepare. - `sqir_prepareMaskedConstRead_wellTyped`. - `sqir_conditionalAddConstGate`: prepare(masked) ; full_adder ; prepare(masked). - WellTyped for the conditional add. - `sqir_conditionalSubConstGate`: alias for add by 2^bits - N. Semantic correctness (target decode) for conditional add/sub and the dirty-flag modular adder is left for a follow-up tick due to the depth of the input-state equivalence argument needed.

defsqir_prepareMaskedConstRead

def sqir_prepareMaskedConstRead : Nat → Nat → Nat → Nat → Gate
  | 0,     _,       _, _       => Gate.I
  | n + 1, q_start, N, flagPos =>
      seq (sqir_prepareMaskedConstRead n q_start N flagPos)
          (cond (N.testBit n) (Gate.CX flagPos (q_start + 2 * n + 2)) Gate.I)

*Masked constant preparation**: for each `i < bits`, conditionally applies `CX flagPos (q_start + 2*i + 2)` iff `N.testBit i`.

theoremsqir_prepareMaskedConstRead_at_other

theorem sqir_prepareMaskedConstRead_at_other
    (bits q_start N flagPos q : Nat)
    (hq : ∀ i, i < bits → q ≠ q_start + 2 * i + 2)
    (f : Nat → Bool) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f q = f q

*Frame**: the masked prepare gate doesn't touch positions outside the read range.

theoremsqir_prepareMaskedConstRead_at_flagPos

theorem sqir_prepareMaskedConstRead_at_flagPos
    (bits q_start N flagPos : Nat)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (f : Nat → Bool) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f flagPos = f flagPos

*Frame for flagPos**: as long as flagPos isn't a read position, the masked prepare gate doesn't touch flagPos (CX's control is read, not written).

theoremsqir_prepareMaskedConstRead_at_read

theorem sqir_prepareMaskedConstRead_at_read
    (bits q_start N flagPos j : Nat) (hj : j < bits) (f : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f
        (q_start + 2 * j + 2)
      = xor (f (q_start + 2 * j + 2)) (f flagPos && N.testBit j)

*Action at read positions**: at `q_start + 2*j + 2` for `j < bits`, the value is XORed with `(f flagPos && N.testBit j)`.

theoremsqir_prepareMaskedConstRead_wellTyped

theorem sqir_prepareMaskedConstRead_wellTyped
    (bits q_start N flagPos dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2) :
    Gate.WellTyped dim (sqir_prepareMaskedConstRead bits q_start N flagPos)

defsqir_conditionalAddConstGate

def sqir_conditionalAddConstGate (bits q_start N flagPos : Nat) : Gate

*Conditional add-constant gate**: adds `N` to the target register iff the flag is true. Uses masked prepare to encode the constant `N` into the read register conditionally on the flag value.

defsqir_conditionalSubConstGate

def sqir_conditionalSubConstGate (bits q_start N flagPos : Nat) : Gate

*Conditional sub-constant gate**: subtracts `N` from the target iff the flag is true. Implemented as conditional-add of `2^bits - N` (two's complement).

theoremsqir_conditionalAddConstGate_wellTyped

theorem sqir_conditionalAddConstGate_wellTyped
    (bits q_start N flagPos dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2) :
    Gate.WellTyped dim (sqir_conditionalAddConstGate bits q_start N flagPos)

theoremsqir_conditionalSubConstGate_wellTyped

theorem sqir_conditionalSubConstGate_wellTyped
    (bits q_start N flagPos dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2) :
    Gate.WellTyped dim (sqir_conditionalSubConstGate bits q_start N flagPos)

defsqir_style_modAddConst_dirtyFlag_candidate

def sqir_style_modAddConst_dirtyFlag_candidate
    (bits q_start N c flagPos : Nat) : Gate

*Dirty-flag modular add-constant candidate**: addConst(c) ; compareConst(N) ; conditionalSubConst(N). After this gate: - target = `(x + c) % N` (when `x, c < N`). - read register restored to 0. - carry-in restored to false. - flag (at flagPos) holds `decide(N ≤ (x+c) % 2^bits)` — DIRTY. The flag is dirty because we don't uncompute the comparator. A clean modular add-constant requires either: - a flag-uncompute step (e.g., another comparator with the right polarity), or - accepting the dirty flag at the modAdd level and tracking it in the calling context. For Shor's modular multiplier, the inner loops typically need a clean flag — so the next milestone is to uncompute the flag.

theoremsqir_style_modAddConst_dirtyFlag_candidate_wellTyped

theorem sqir_style_modAddConst_dirtyFlag_candidate_wellTyped
    (bits q_start N c flagPos dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_distinct_top : flagPos ≠ q_start + 2 * bits) :
    Gate.WellTyped dim
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)

theoremsqir_prepareMaskedConstRead_eq_id_at_flag_false

theorem sqir_prepareMaskedConstRead_eq_id_at_flag_false
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_false : f flagPos = false) (q : Nat) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f q
      = f q

*Masked prepare with flag = false is identity** (per position).

theoremsqir_prepareMaskedConstRead_eq_unmasked_at_flag_true

theorem sqir_prepareMaskedConstRead_eq_unmasked_at_flag_true
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_true : f flagPos = true) (q : Nat) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f q
      = Gate.applyNat (cuccaro_prepareConstRead bits q_start N) f q

*Masked prepare with flag = true equals `cuccaro_prepareConstRead N`** (per position).

theoremsqir_prepareMaskedConstRead_eq_id_fun

theorem sqir_prepareMaskedConstRead_eq_id_fun
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_false : f flagPos = false) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f = f

*Function-level: masked prepare = id when flag = false.**

theoremsqir_prepareMaskedConstRead_eq_unmasked_fun

theorem sqir_prepareMaskedConstRead_eq_unmasked_fun
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_true : f flagPos = true) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f
      = Gate.applyNat (cuccaro_prepareConstRead bits q_start N) f

*Function-level: masked prepare = cuccaro_prepareConstRead N when flag = true.**

theoremsqir_conditionalAddConstGate_apply_false_fun

theorem sqir_conditionalAddConstGate_apply_false_fun
    (bits q_start N flagPos : Nat) (g : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_false : g flagPos = false)
    (h_flag_disjoint : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos) g
      = Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) g

*HEADLINE — false-flag reduction**: when the flag value in the input state is `false`, the conditional add gate behaves like the bare full Cuccaro adder.

theoremsqir_conditionalAddConstGate_apply_true_fun

theorem sqir_conditionalAddConstGate_apply_true_fun
    (bits q_start N flagPos : Nat) (g : Nat → Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_true : g flagPos = true)
    (h_flag_disjoint : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos) g
      = Gate.applyNat (cuccaro_addConstGate bits q_start N) g

*HEADLINE — true-flag reduction**: when the flag value in the input state is `true`, the conditional add gate behaves like `cuccaro_addConstGate N`.

theoremcuccaro_carry_update_outside_locality

theorem cuccaro_carry_update_outside_locality
    (f : Nat → Bool) (q_start k p : Nat) (v : Bool)
    (h_p_outside : p < q_start ∨ q_start + 2 * k + 1 ≤ p) :
    cuccaro_carry (update f p v) q_start k = cuccaro_carry f q_start k

*Locality**: `cuccaro_carry` doesn't depend on input at positions outside its computation support.

theoremcuccaro_MAJ_commute_update

theorem cuccaro_MAJ_commute_update
    (a b c flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c)
    (h_neq_a : flagPos ≠ a) (h_neq_b : flagPos ≠ b) (h_neq_c : flagPos ≠ c) :
    Gate.applyNat (cuccaro_MAJ a b c) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_MAJ a b c) f) flagPos v

*`cuccaro_MAJ` commutes with `update` outside its wires.**

theoremcuccaro_UMA_commute_update

theorem cuccaro_UMA_commute_update
    (a b c flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c)
    (h_neq_a : flagPos ≠ a) (h_neq_b : flagPos ≠ b) (h_neq_c : flagPos ≠ c) :
    Gate.applyNat (cuccaro_UMA a b c) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_UMA a b c) f) flagPos v

*`cuccaro_UMA` commutes with `update` outside its wires.**

theoremcuccaro_maj_chain_commute_update_outside_workspace

theorem cuccaro_maj_chain_commute_update_outside_workspace
    (bits q_start flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_maj_chain bits q_start) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_maj_chain bits q_start) f) flagPos v

*`cuccaro_maj_chain` commutes with `update` outside its workspace.**

theoremcuccaro_uma_chain_reverse_commute_update_outside_workspace

theorem cuccaro_uma_chain_reverse_commute_update_outside_workspace
    (bits q_start flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_uma_chain_reverse bits q_start) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_uma_chain_reverse bits q_start) f) flagPos v

*`cuccaro_uma_chain_reverse` commutes with `update` outside its workspace.**

theoremcuccaro_n_bit_adder_full_commute_update_outside_workspace

theorem cuccaro_n_bit_adder_full_commute_update_outside_workspace
    (bits q_start flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) f) flagPos v

*`cuccaro_n_bit_adder_full` commutes with `update` outside its workspace.**

theoremcuccaro_n_bit_adder_full_update_outside_workspace_at

theorem cuccaro_n_bit_adder_full_update_outside_workspace_at
    (bits q_start flagPos : Nat) (v : Bool) (f : Nat → Bool) (p : Nat)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hp_in : q_start ≤ p ∧ p < q_start + 2 * bits + 1) :
    Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) (update f flagPos v) p
      = Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) f p

*HEADLINE — full-adder locality at workspace under outside update.**

theoremcuccaro_n_bit_adder_full_preserves_outside_workspace

theorem cuccaro_n_bit_adder_full_preserves_outside_workspace
    (bits q_start flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_n_bit_adder_full bits q_start)
        (update f flagPos v) flagPos = v

*HEADLINE — full-adder preserves flagPos value.**

theoremcuccaro_prepareConstRead_commute_update_outside_workspace

theorem cuccaro_prepareConstRead_commute_update_outside_workspace
    (bits q_start c flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_prepareConstRead bits q_start c) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_prepareConstRead bits q_start c) f) flagPos v

*`cuccaro_prepareConstRead` commutes with `update` outside its workspace.**

theoremcuccaro_addConstGate_commute_update_outside_workspace

theorem cuccaro_addConstGate_commute_update_outside_workspace
    (bits q_start c flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_addConstGate bits q_start c) f) flagPos v

*`cuccaro_addConstGate` commutes with `update` outside its workspace.**

theoremcuccaro_addConstGate_update_outside_workspace_at

theorem cuccaro_addConstGate_update_outside_workspace_at
    (bits q_start c flagPos : Nat) (v : Bool) (f : Nat → Bool) (p : Nat)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hp_in : q_start ≤ p ∧ p < q_start + 2 * bits + 1) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c) (update f flagPos v) p
      = Gate.applyNat (cuccaro_addConstGate bits q_start c) f p

*HEADLINE — addConstGate locality at workspace under outside update.**

theoremcuccaro_addConstGate_preserves_outside_workspace

theorem cuccaro_addConstGate_preserves_outside_workspace
    (bits q_start c flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c)
        (update f flagPos v) flagPos = v

*HEADLINE — addConstGate preserves flagPos value.**

theoremsqir_conditionalAddConstGate_target_decode

theorem sqir_conditionalAddConstGate_target_decode
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (hbits : 1 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos flag))
      = (x + (if flag then N else 0)) % 2^bits

*HEADLINE Deliverable C — conditional add target decode.**

theoremsqir_conditionalAddConstGate_carry_in_restored

theorem sqir_conditionalAddConstGate_carry_in_restored
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (hbits : 1 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos flag) q_start = false

*Conditional add carry-in restored.**

theoremsqir_conditionalAddConstGate_read_decode

theorem sqir_conditionalAddConstGate_read_decode
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (hbits : 1 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_read_val bits q_start
        (Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos flag)) = 0

*Conditional add read register restored.**

theoremsqir_conditionalAddConstGate_flag_preserved

theorem sqir_conditionalAddConstGate_flag_preserved
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos flag) flagPos = flag

*Conditional add flag preserved.**

theoremsqir_conditionalAddConstGate_clean

theorem sqir_conditionalAddConstGate_clean
    (bits q_start N x flagPos dim : Nat) (flag : Bool)
    (hbits : 1 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.WellTyped dim (sqir_conditionalAddConstGate bits q_start N flagPos)
    ∧ cuccaro_target_val bits q_start
          (Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos)
            (update (cuccaro_input_F q_start false 0 x) flagPos flag))
        = (x + (if flag then N else 0)) % 2^bits

*HEADLINE Deliverable E — packaged clean conditional add.**

theoremsqir_conditionalSubConstGate_target_decode

theorem sqir_conditionalSubConstGate_target_decode
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (sqir_conditionalSubConstGate bits q_start N flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos flag))
      = (x + (if flag then 2^bits - N else 0)) % 2^bits

*HEADLINE Deliverable F — conditional sub target decode.**

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag.lean

(no documented top-level declarations)

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroCleanModularAddCorrectness

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroCleanModularAddCorrectness.lean

(no documented top-level declarations)

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroCleanModularAddCorrectness.CleanIdentityAndTotalTheorem

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroCleanModularAddCorrectness/CleanIdentityAndTotalTheorem.lean

CuccaroCleanModularAddCorrectness — Part1 (re-export shim part; same namespace, opens de-duplicated).

theoremsqir_style_modAddConst_clean_candidate_flag_restored

theorem sqir_style_modAddConst_clean_candidate_flag_restored
    (bits N c x : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N) :
    Gate.applyNat
        (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
        (update (cuccaro_input_F 2 false 0 x) 1 false) 1
      = false

*HEADLINE Deliverable D — clean-candidate flag restoration.** At `flagPos`, the clean candidate restores the input flag value `false`.

theoremsqir_style_modAddConst_clean_candidate_target_decode_qstart

theorem sqir_style_modAddConst_clean_candidate_target_decode_qstart
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos false))
      = (x + c) % N

*R7d^xxix-L-3.9′ DELIVERABLE: q_start-parametric clean candidate target preservation.** q_start-parametric port of `sqir_style_modAddConst_clean_candidate_target_decode`. Replaces the hard-coded layout `q_start = 2`, `flagPos = 1` with free parameters and the standard outside-workspace hypotheses. The decoded target after the clean candidate equals `(x + c) % N`, regardless of where the workspace and flag sit. Dependencies (all already q_start-parametric): - `sqir_style_modAddConst_dirtyFlag_state_eq` (CuccaroSQIRDirtyFlag.lean:1378); - `cuccaro_target_val_eq_sum_when_bits_match` (CuccaroDecoded.lean:102); - `sqir_style_compareConst_candidate_workspace_restored_at_general` (CuccaroSQIRDirtyFlag.lean:568); - `cuccaro_input_F_at_b` (CuccaroCorrectness.lean:240).

theoremsqir_style_modAddConst_clean_candidate_target_decode

theorem sqir_style_modAddConst_clean_candidate_target_decode
    (bits N c x : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N) :
    cuccaro_target_val bits 2
        (Gate.applyNat
          (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
          (update (cuccaro_input_F 2 false 0 x) 1 false))
      = (x + c) % N

*HEADLINE Deliverable E — clean candidate target preservation.** The clean candidate's decoded target equals `(x + c) % N`.

theoremsqir_style_modAddConst_clean_candidate_clean

theorem sqir_style_modAddConst_clean_candidate_clean
    (bits N c x : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat
            (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = (x + c) % N

*HEADLINE Deliverable F — clean candidate full bundle.** WellTyped + target = (x+c)%N + read restored + top-carry restored + flag restored.

theoremsqir_style_modAddConst_clean_gate_zero_eq

theorem sqir_style_modAddConst_clean_gate_zero_eq
    (bits N : Nat) :
    sqir_style_modAddConst_clean_gate bits N 0 = Gate.I

The wrapper at `c = 0` reduces to `Gate.I`.

theoremsqir_style_modAddConst_clean_gate_zero_clean

theorem sqir_style_modAddConst_clean_gate_zero_clean
    (bits N x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_clean_gate bits N 0)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat
            (sqir_style_modAddConst_clean_gate bits N 0)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = x
    ∧ cuccaro_read_val bits 2
          (Gate.applyNat

*Deliverable B — c = 0 bundle.** At `c = 0` the gate is the identity, so all 5 conjuncts (WellTyped + target = x + read = 0 + top carry = false + flag = false) reduce to facts about the input encoding.

theoremsqir_style_modAddConst_clean_gate_clean

theorem sqir_style_modAddConst_clean_gate_clean
    (bits N c x : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_clean_gate bits N c)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat
            (sqir_style_modAddConst_clean_gate bits N c)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = (x + c) % N

*HEADLINE Deliverable C — total clean modular add-constant theorem.** For all `c < N` (including `c = 0`), the wrapper's output satisfies: WellTyped + target = `(x+c) % N` + read = 0 + top carry = false + flag = false.

theoremsqir_style_modAddConst_clean_gate_clean_from_BasicSetting

theorem sqir_style_modAddConst_clean_gate_clean_from_BasicSetting
    (a r N m n c x : Nat)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m n)
    (hc : c < N) (hx : x < N) :
    Gate.WellTyped (sqir_modmult_rev_anc (n + 1))
        (sqir_style_modAddConst_clean_gate (n + 1) N c)
    ∧ cuccaro_target_val (n + 1) 2
          (Gate.applyNat
            (sqir_style_modAddConst_clean_gate (n + 1) N c)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = (x + c) % N
    ∧ cuccaro_read_val (n + 1) 2

*HEADLINE Deliverable D — BasicSetting-derived total clean mod-add-constant theorem.** At `bits := n + 1`, the SQIR-faithful sizing `2*N ≤ 2^(n+1)` follows from `BasicSetting`, removing the explicit `hN`, `hN2`, `hN_pos` preconditions.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroCleanModularAddCorrectness.ControlledModAddRouteAndDesign

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroCleanModularAddCorrectness/ControlledModAddRouteAndDesign.lean

CuccaroCleanModularAddCorrectness — Part2 (re-export shim part; same namespace, opens de-duplicated).

theoremsqir_controlledAddConstPow2_target_decode

theorem sqir_controlledAddConstPow2_target_decode
    (bits q_start c x controlIdx : Nat) (control : Bool)
    (hbits : 1 ≤ bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_control_distinct : ∀ i, i < bits → controlIdx ≠ q_start + 2 * i + 2)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (sqir_conditionalAddConstGate bits q_start c controlIdx)
          (update (cuccaro_input_F q_start false 0 x) controlIdx control))
      = (x + (if control then c else 0)) % 2^bits

*HEADLINE Task 3 — controlled add-mod-2^bits target decode.**

theoremsqir_controlledCompareConst_at_control_true_eq_unmasked_fun

theorem sqir_controlledCompareConst_at_control_true_eq_unmasked_fun
    (bits q_start c controlIdx flagPos : Nat) (g : Nat → Bool)
    (h_control_distinct : ∀ i, i < bits → controlIdx ≠ q_start + 2 * i + 2)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (h_control_ne_flag : controlIdx ≠ flagPos)
    (h_control_true : g controlIdx = true) :
    Gate.applyNat (sqir_controlledCompareConst bits q_start c controlIdx flagPos) g
      = Gate.applyNat (sqir_style_compareConst_candidate bits q_start c flagPos) g

*Helper — `ctrlCompare` reduces to `compareConst(c)` when `state[controlIdx] = true`.** Function-level equality.

theoremsqir_style_controlledModAddConst_gate_zero_eq

theorem sqir_style_controlledModAddConst_gate_zero_eq
    (bits N controlIdx : Nat) :
    sqir_style_controlledModAddConst_gate bits 2 N 0 controlIdx 1 = Gate.I

*Total wrapper at c = 0 reduces to `Gate.I`.**

theoremsqir_style_controlledModAddConst_gate_zero_clean

theorem sqir_style_controlledModAddConst_gate_zero_clean
    (bits N x controlIdx : Nat) (control : Bool)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_controlledModAddConst_gate bits 2 N 0 controlIdx 1)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat
            (sqir_style_controlledModAddConst_gate bits 2 N 0 controlIdx 1)

*HEADLINE partial Deliverable F — c = 0 bundle for the controlled modular add-constant wrapper.**

theoremGate.applyNat_CX_at_control_false_eq_id_fun

theorem Gate.applyNat_CX_at_control_false_eq_id_fun
    (control target : Nat) (f : Nat → Bool) (h : f control = false) :
    Gate.applyNat (Gate.CX control target) f = f

*Deliverable B — CX with control = false is identity.**

theoremGate.applyNat_CX_at_control_true_eq_X_fun

theorem Gate.applyNat_CX_at_control_true_eq_X_fun
    (control target : Nat) (f : Nat → Bool) (h : f control = true) :
    Gate.applyNat (Gate.CX control target) f = Gate.applyNat (Gate.X target) f

*Deliverable B — CX with control = true equals X(target).**

theoremcuccaro_maj_chain_top_carry_on_input_F_zero_a

theorem cuccaro_maj_chain_top_carry_on_input_F_zero_a
    (bits q_start x : Nat) (hbits : 1 ≤ bits) (hx : x < 2^bits) :
    Gate.applyNat (cuccaro_maj_chain bits q_start)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits) = false

*Helper — maj_chain on `cuccaro_input_F` with `a = 0` has top carry = false.** Derived from `cuccaro_compareConstForward_top_carry` with `N = 2^bits` (reducing the prepare to identity).

theoremsqir_controlledCompareConst_at_control_false_on_input_F_eq_id_fun

theorem sqir_controlledCompareConst_at_control_false_on_input_F_eq_id_fun
    (bits q_start c controlIdx flagPos x : Nat)
    (hbits : 1 ≤ bits) (hx : x < 2^bits)
    (h_control_distinct : ∀ i, i < bits → controlIdx ≠ q_start + 2 * i + 2)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (h_control_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_controlledCompareConst bits q_start c controlIdx flagPos)
        (update (cuccaro_input_F q_start false 0 x) controlIdx false)
      = update (cuccaro_input_F q_start false 0 x) controlIdx false

*Deliverable A — controlled comparator at control = false is identity on `cuccaro_input_F`-shaped input.**

theoremsqir_style_compareConst_candidate_on_input_F_x_lt_N_eq_id_fun

theorem sqir_style_compareConst_candidate_on_input_F_x_lt_N_eq_id_fun
    (bits N x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N) :
    Gate.applyNat (sqir_style_compareConst_candidate bits 2 N 1)
        (cuccaro_input_F 2 false 0 x)
      = cuccaro_input_F 2 false 0 x

*Deliverable A — uncontrolled comparator identity on `cuccaro_input_F` when `x < N`.** Since `decide(N ≤ x) = false`, the comparator XORs false into flagPos (no change), and workspace + outside positions are preserved.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroCleanModularAddCorrectness.ParametricPorts

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroCleanModularAddCorrectness/ParametricPorts.lean

CuccaroCleanModularAddCorrectness — Part4 (re-export shim part; same namespace, opens de-duplicated).

theoremcuccaro_input_F_at_controlIdx_outside_eq_false_qstart

theorem cuccaro_input_F_at_controlIdx_outside_eq_false_qstart
    (q_start bits x controlIdx : Nat) (hx : x < 2^bits)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    cuccaro_input_F q_start false 0 x controlIdx = false

*q_start-parametric variant** of `cuccaro_input_F_at_controlIdx_outside_eq_false`. Same fact, fully parametric.

theoremupdate_input_F_controlIdx_false_eq_F_qstart

theorem update_input_F_controlIdx_false_eq_F_qstart
    (q_start bits x controlIdx : Nat) (hx : x < 2^bits)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    update (cuccaro_input_F q_start false 0 x) controlIdx false
      = cuccaro_input_F q_start false 0 x

*q_start-parametric variant** of `update_input_F_controlIdx_false_eq_F`.

theoremsqir_style_compareConst_candidate_on_input_F_x_lt_N_eq_id_fun_qstart

theorem sqir_style_compareConst_candidate_on_input_F_x_lt_N_eq_id_fun_qstart
    (bits q_start N flagPos x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x)
      = cuccaro_input_F q_start false 0 x

*q_start-parametric variant** of `sqir_style_compareConst_candidate_on_input_F_x_lt_N_eq_id_fun`. Adds an explicit `hflag_out` hypothesis so `flagPos` can be at any outside-workspace position.

theoremsqir_style_controlledModAddConst_candidate_control_false_state_eq_qstart

theorem sqir_style_controlledModAddConst_candidate_control_false_state_eq_qstart
    (bits q_start N c x controlIdx flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat
        (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
        (update (cuccaro_input_F q_start false 0 x) controlIdx false)
      = update (cuccaro_input_F q_start false 0 x) controlIdx false

*q_start-parametric variant** of `sqir_style_controlledModAddConst_candidate_control_false_state_eq`. When the control is false, the controlled mod-add candidate is the identity on the appropriate base state. Parametric in both `q_start` and `flagPos`.

theoremsqir_style_controlledModAddConst_candidate_target_decode_control_false_qstart

theorem sqir_style_controlledModAddConst_candidate_target_decode_control_false_qstart
    (bits q_start N c x controlIdx flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
          (update (cuccaro_input_F q_start false 0 x) controlIdx false))

*PRIMARY L-3.6′ DELIVERABLE: q_start-parametric control = false target-decode.** The candidate controlled mod-add gate, applied to the zero-accumulator Cuccaro base with the control bit set to false, decodes to `x` at the target. Parametric in both `q_start` and `flagPos`.

theoremsqir_style_controlledModAddConst_candidate_workspace_control_false_qstart

theorem sqir_style_controlledModAddConst_candidate_workspace_control_false_qstart
    (bits q_start N c x controlIdx flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    cuccaro_read_val bits q_start
          (Gate.applyNat
            (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
            (update (cuccaro_input_F q_start false 0 x) controlIdx false))

*R7d^xxix-L-3.7′ DELIVERABLE: q_start-parametric control=false workspace bundle (4-conjunct).** Mirrors `sqir_style_controlledModAddConst_candidate_workspace_control_false` but parametric in `q_start` and `flagPos`. Both `controlIdx` and `flagPos` must lie OUTSIDE the Cuccaro workspace `[q_start, q_start + 2 * bits + 1)` and be distinct. After the candidate gate applied to `(update F controlIdx false)`: 1. `cuccaro_read_val bits q_start` of the output = 0; 2. position `q_start + 2 * bits` (top carry) = false; 3. position `flagPos` = false; 4. position `controlIdx` = false. Closes via the already-landed `sqir_style_controlledModAddConst_candidate_control_false_state_eq_qstart`.

theoremsqir_style_controlledModAddConst_candidate_clean_control_false_qstart

theorem sqir_style_controlledModAddConst_candidate_clean_control_false_qstart
    (bits q_start N c x dim controlIdx flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_control_lt_dim : controlIdx < dim)
    (h_flag_lt_dim : flagPos < dim) :
    Gate.WellTyped dim

*R7d^xxix-L-3.8′ DELIVERABLE: q_start-parametric control=false clean bundle.** Bundles the already-proved q_start-parametric facts for `sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos` applied to `(update F controlIdx false)`: 1. `Gate.WellTyped dim` of the candidate gate. 2. target decoded value = `x` (no-op on the target). 3. read register = 0. 4. top carry position (`q_start + 2 * bits`) = false. 5. `flagPos` = false. 6. `controlIdx` = false. Parametric in `q_start`, `flagPos`, `controlIdx`, AND the ambient dimension `dim`. Wrapper over: - `sqir_style_controlledModAddConst_candidate_target_decode_control_false_qstart`, - `sqir_style_controlledModAddConst_candidate_workspace_control_false_qstart`, - the five existing q_start-parametric WellTyped sub-lemmas (`sqir_conditionalAddConstGate_wellTyped`, `sqir_style_compareConst_candidate_wellTyped`, `sqir_conditionalSubConstGate_wellTyped`, `cuccaro_maj_chain_wellTyped`, `cuccaro_maj_chain_inv_wellTyped`, `sqir_prepareMaskedConstRead_wellTyped`). No new infrastructure introduced; control=true direction NOT touched.

theoremsqir_style_controlledModAddConst_candidate_workspace_control_false

theorem sqir_style_controlledModAddConst_candidate_workspace_control_false
    (bits N c x controlIdx : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    cuccaro_read_val bits 2
          (Gate.applyNat
            (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx false))
        = 0

*Control=false bundle (4-conjunct):** read = 0, top carry = false, flag = false, controlIdx = false.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroCleanModularAddCorrectness.StageHelpersAndControlFalse

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroCleanModularAddCorrectness/StageHelpersAndControlFalse.lean

CuccaroCleanModularAddCorrectness — Part3 (re-export shim part; same namespace, opens de-duplicated).

theoremcuccaro_input_F_at_controlIdx_outside_eq_false

theorem cuccaro_input_F_at_controlIdx_outside_eq_false
    (bits x controlIdx : Nat) (hx : x < 2^bits)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx) :
    cuccaro_input_F 2 false 0 x controlIdx = false

*Helper — `cuccaro_input_F` at `controlIdx` outside workspace is `false`.**

theoremupdate_input_F_controlIdx_false_eq_F

theorem update_input_F_controlIdx_false_eq_F
    (bits x controlIdx : Nat) (hx : x < 2^bits)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx) :
    update (cuccaro_input_F 2 false 0 x) controlIdx false
      = cuccaro_input_F 2 false 0 x

*Helper — `update F controlIdx false = F` when F at controlIdx is false.**

theoremsqir_style_controlledModAddConst_candidate_control_false_state_eq

theorem sqir_style_controlledModAddConst_candidate_control_false_state_eq
    (bits N c x controlIdx : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.applyNat
        (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
        (update (cuccaro_input_F 2 false 0 x) controlIdx false)
      = update (cuccaro_input_F 2 false 0 x) controlIdx false

*HEADLINE Deliverable C — control = false state equality for the controlled mod-N add candidate.**

theoremsqir_style_controlledModAddConst_candidate_target_decode_control_false

theorem sqir_style_controlledModAddConst_candidate_target_decode_control_false
    (bits N c x controlIdx : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    cuccaro_target_val bits 2
        (Gate.applyNat
          (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
          (update (cuccaro_input_F 2 false 0 x) controlIdx false))
      = x

*Control=false target decode = x.**

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroControlledModularAddCorrectness

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroControlledModularAddCorrectness.lean

## Tick 69 — Control preservation helpers + control=true work.

theoremcuccaro_addConstGate_preserves_outside_workspace_at

theorem cuccaro_addConstGate_preserves_outside_workspace_at
    (bits q_start c controlIdx : Nat) (g : Nat → Bool)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c) g controlIdx = g controlIdx

*Deliverable A — addConstGate preserves any outside-workspace position.**

theoremsqir_conditionalAddConstGate_preserves_outside

theorem sqir_conditionalAddConstGate_preserves_outside
    (bits q_start c flagPos controlIdx : Nat) (g : Nat → Bool)
    (h_control_distinct : ∀ i, i < bits → controlIdx ≠ q_start + 2 * i + 2)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    Gate.applyNat (sqir_conditionalAddConstGate bits q_start c flagPos) g controlIdx
      = g controlIdx

*Deliverable C — conditionalAddConstGate preserves outside-workspace position (when distinct from read positions and flag position).**

theoremsqir_conditionalSubConstGate_preserves_outside

theorem sqir_conditionalSubConstGate_preserves_outside
    (bits q_start N flagPos controlIdx : Nat) (g : Nat → Bool)
    (h_control_distinct : ∀ i, i < bits → controlIdx ≠ q_start + 2 * i + 2)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    Gate.applyNat (sqir_conditionalSubConstGate bits q_start N flagPos) g controlIdx
      = g controlIdx

*conditionalSubConstGate preserves outside-workspace position.**

theoremsqir_controlledCompareConst_preserves_control_outside

theorem sqir_controlledCompareConst_preserves_control_outside
    (bits q_start c controlIdx flagPos : Nat) (g : Nat → Bool)
    (h_control_distinct : ∀ i, i < bits → controlIdx ≠ q_start + 2 * i + 2)
    (h_control_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (h_control_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_controlledCompareConst bits q_start c controlIdx flagPos) g controlIdx
      = g controlIdx

*`sqir_controlledCompareConst` preserves outside-workspace position (when distinct from read positions and flagPos).**

theoremsqir_style_controlledModAddConst_candidate_preserves_control

theorem sqir_style_controlledModAddConst_candidate_preserves_control
    (bits N c x controlIdx : Nat) (control : Bool)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.applyNat
        (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
        (update (cuccaro_input_F 2 false 0 x) controlIdx control) controlIdx
      = control

*Partial Deliverable D — control bit is preserved through the controlled mod-N add candidate.**

theoremGate.applyNat_X_commute_update_outside_fun

theorem Gate.applyNat_X_commute_update_outside_fun
    (target controlIdx : Nat) (v : Bool) (f : Nat → Bool) (h : controlIdx ≠ target) :
    Gate.applyNat (Gate.X target) (update f controlIdx v)
      = update (Gate.applyNat (Gate.X target) f) controlIdx v

*X commute with update at outside position.**

theoremGate.applyNat_CX_commute_update_outside_fun

theorem Gate.applyNat_CX_commute_update_outside_fun
    (control target controlIdx : Nat) (v : Bool) (f : Nat → Bool)
    (h_ne_control : controlIdx ≠ control) (h_ne_target : controlIdx ≠ target) :
    Gate.applyNat (Gate.CX control target) (update f controlIdx v)
      = update (Gate.applyNat (Gate.CX control target) f) controlIdx v

*CX commute with update at outside position (≠ control and ≠ target).**

theoremsqir_prepareMaskedConstRead_commute_update_outside_workspace

theorem sqir_prepareMaskedConstRead_commute_update_outside_workspace
    (bits q_start N flagPos controlIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) (update f controlIdx v)
      = update (Gate.applyNat (sqir_prepareMaskedConstRead bits q_start N flagPos) f)
              controlIdx v

*Deliverable A — masked prepare commutes with `update` at outside position.**

theoremcuccaro_maj_chain_inv_commute_update_outside_workspace_fun

theorem cuccaro_maj_chain_inv_commute_update_outside_workspace_fun
    (bits q_start flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_maj_chain_inv bits q_start) (update f flagPos v)
      = update (Gate.applyNat (cuccaro_maj_chain_inv bits q_start) f) flagPos v

*Function-level commute for `cuccaro_maj_chain_inv`.** Lifts the existing position-level theorem to a function equality.

theoremsqir_style_compareConst_candidate_commute_update_outside_fun

theorem sqir_style_compareConst_candidate_commute_update_outside_fun
    (bits q_start N flagPos controlIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (update f controlIdx v)
      = update (Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos) f)
              controlIdx v

*Deliverable B — comparator commutes with `update` at outside position.**

theoremsqir_conditionalAddConstGate_commute_update_outside_fun

theorem sqir_conditionalAddConstGate_commute_update_outside_fun
    (bits q_start N flagPos controlIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos) (update f controlIdx v)
      = update (Gate.applyNat (sqir_conditionalAddConstGate bits q_start N flagPos) f)
              controlIdx v

*Deliverable C — conditionalAdd commutes with `update` at outside position.**

theoremsqir_conditionalSubConstGate_commute_update_outside_fun

theorem sqir_conditionalSubConstGate_commute_update_outside_fun
    (bits q_start N flagPos controlIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_conditionalSubConstGate bits q_start N flagPos) (update f controlIdx v)
      = update (Gate.applyNat (sqir_conditionalSubConstGate bits q_start N flagPos) f)
              controlIdx v

*Deliverable C — conditionalSub commutes with `update` at outside position.**

theoremsqir_style_modAddConst_clean_candidate_commute_update_control_outside_qstart

theorem sqir_style_modAddConst_clean_candidate_commute_update_control_outside_qstart
    (bits q_start N c controlIdx flagPos : Nat) (v : Bool) (f : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos)
        (update f controlIdx v)
      = update (Gate.applyNat (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos) f)
              controlIdx v

*R7d^xxix-L-3.10′ helper: q_start-parametric clean modadd candidate commutes with `update` at controlIdx outside workspace ∪ {flagPos}.** q_start-parametric port of `sqir_style_modAddConst_clean_candidate_commute_update_control_outside`. All sub-deps already q_start-parametric: - `cuccaro_addConstGate_commute_update_outside_workspace` (CuccaroSQIRCondAdd.lean:685); - `sqir_style_compareConst_candidate_commute_update_outside_fun` (CuccaroSQIRDirtyFlag.lean:3132); - `sqir_conditionalSubConstGate_commute_update_outside_fun` (:3174); - `Gate.applyNat_X_commute_update_outside_fun` (:3039, generic).

theoremsqir_style_controlledModAddConst_candidate_control_true_state_eq_qstart

theorem sqir_style_controlledModAddConst_candidate_control_true_state_eq_qstart
    (bits q_start N c x controlIdx flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat
        (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
        (update (cuccaro_input_F q_start false 0 x) controlIdx true)
      = update (Gate.applyNat
                  (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos)

*R7d^xxix-L-3.10′ HEADLINE: q_start-parametric control=true state equality for the controlled mod-N add candidate.** q_start-parametric port of `sqir_style_controlledModAddConst_candidate_control_true_state_eq`. The 5-stage rewrite chain is mirrored with `2 → q_start`, `1 → flagPos`, free `controlIdx`, with the standard outside-workspace hypotheses on both `controlIdx` and `flagPos`, plus distinctness. The state-equality lifts the controlled candidate's action (under external control = true) to the uncontrolled clean candidate applied to `cuccaro_input_F q_start false 0 x`, with the `controlIdx` slot pinned to `true` on both sides. Dependencies (all already q_start-parametric or just landed): - `sqir_style_modAddConst_clean_candidate_commute_update_control_outside_qstart` (above, this tick); - `sqir_conditionalAddConstGate_apply_true_fun` (CuccaroSQIRCondAdd.lean:379); - `sqir_conditionalSubConstGate_preserves_outside` (:2951); - `sqir_style_compareConst_candidate_frame_outside` (:175); - `cuccaro_addConstGate_preserves_outside_workspace_at` (:2918); - `sqir_controlledCompareConst_at_control_true_eq_unmasked_fun` (:2091); - `Gate.applyNat_CX_at_control_true_eq_X_fun` (generic CX).

theoremsqir_style_modAddConst_clean_candidate_commute_update_control_outside

theorem sqir_style_modAddConst_clean_candidate_commute_update_control_outside
    (bits N c controlIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.applyNat (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
        (update f controlIdx v)
      = update (Gate.applyNat (sqir_style_modAddConst_clean_candidate bits 2 N c 1) f)
              controlIdx v

*HEADLINE Deliverable D — clean modadd candidate commutes with `update` at controlIdx outside workspace ∪ {flagPos = 1}.**

theoremcuccaro_target_val_update_outside_workspace

theorem cuccaro_target_val_update_outside_workspace
    (bits q_start controlIdx : Nat) (v : Bool) (Y : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    cuccaro_target_val bits q_start (update Y controlIdx v)
      = cuccaro_target_val bits q_start Y

*Helper — `cuccaro_target_val` is invariant under `update` at outside controlIdx.**

theoremcuccaro_read_val_update_outside_workspace

theorem cuccaro_read_val_update_outside_workspace
    (bits q_start controlIdx : Nat) (v : Bool) (Y : Nat → Bool)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx) :
    cuccaro_read_val bits q_start (update Y controlIdx v)
      = cuccaro_read_val bits q_start Y

*Helper — `cuccaro_read_val` is invariant under `update` at outside controlIdx.**

theoremsqir_style_controlledModAddConst_candidate_control_true_state_eq

theorem sqir_style_controlledModAddConst_candidate_control_true_state_eq
    (bits N c x controlIdx : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
        (update (cuccaro_input_F 2 false 0 x) controlIdx true)
      = update (Gate.applyNat (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
                  (update (cuccaro_input_F 2 false 0 x) 1 false)) controlIdx true

*HEADLINE Deliverable A — control=true state equality for the controlled mod-N add candidate.**

theoremsqir_style_controlledModAddConst_candidate_target_decode_control_true_qstart

theorem sqir_style_controlledModAddConst_candidate_target_decode_control_true_qstart
    (bits q_start N c x controlIdx flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
          (update (cuccaro_input_F q_start false 0 x) controlIdx true))
      = (x + c) % N

*R7d^xxix-L-3.11′ DELIVERABLE: q_start-parametric control=true target decode.** q_start-parametric port of `sqir_style_controlledModAddConst_candidate_target_decode_control_true`. Three-step thin consequence of the L-3.10′ state equality: 1. rewrite via `_control_true_state_eq_qstart` to expose the uncontrolled clean candidate applied to `cuccaro_input_F q_start` with the `controlIdx` slot wrapped in `update _ controlIdx true`; 2. strip the outer `update controlIdx true` via `cuccaro_target_val_update_outside_workspace` (controlIdx lies outside the workspace by hypothesis); 3. close with `_modAddConst_clean_candidate_target_decode_qstart` (L-3.9′).

theoremsqir_style_controlledModAddConst_candidate_target_decode_control_true

theorem sqir_style_controlledModAddConst_candidate_target_decode_control_true
    (bits N c x controlIdx : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    cuccaro_target_val bits 2
        (Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
          (update (cuccaro_input_F 2 false 0 x) controlIdx true))
      = (x + c) % N

*Deliverable B — control=true target decode.**

theoremsqir_style_modAddConst_clean_candidate_flag_restored_qstart

theorem sqir_style_modAddConst_clean_candidate_flag_restored_qstart
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false) flagPos
      = false

*R7d^xxix-L-3.12′ helper: q_start-parametric clean-candidate flag restoration.** At `flagPos`, the uncontrolled clean candidate restores the flag to `false`. Direct q_start port of `sqir_style_modAddConst_clean_candidate_flag_restored` (line 1462).

theoremsqir_style_controlledModAddConst_candidate_workspace_control_true_qstart

theorem sqir_style_controlledModAddConst_candidate_workspace_control_true_qstart
    (bits q_start N c x controlIdx flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    cuccaro_read_val bits q_start
          (Gate.applyNat
            (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
            (update (cuccaro_input_F q_start false 0 x) controlIdx true))
        = 0

*R7d^xxix-L-3.12′ DELIVERABLE: q_start-parametric control=true workspace bundle.** 4-conjunct workspace bundle when the external control bit is `true`. After applying the controlled mod-N add candidate to `(update (cuccaro_input_F q_start false 0 x) controlIdx true)`: 1. `cuccaro_read_val` (read register) = 0; 2. position `q_start + 2 * bits` (top carry) = false; 3. position `flagPos` = false; 4. position `controlIdx` = true (preserved external control). Proof strategy mirrors the hard-coded version (line 3481+) but uses the L-3.10′ state-eq + inline read/top-carry computation + `_flag_restored_qstart` (helper above).

theoremsqir_style_controlledModAddConst_candidate_workspace_control_true

theorem sqir_style_controlledModAddConst_candidate_workspace_control_true
    (bits N c x controlIdx : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    cuccaro_read_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx true))
        = 0
    ∧ Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
          (update (cuccaro_input_F 2 false 0 x) controlIdx true) (2 + 2 * bits)

*Deliverable C — control=true workspace bundle (4-conjunct).**

theoremsqir_style_controlledModAddConst_candidate_target_decode

theorem sqir_style_controlledModAddConst_candidate_target_decode
    (bits N c x controlIdx : Nat) (control : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    cuccaro_target_val bits 2
        (Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
          (update (cuccaro_input_F 2 false 0 x) controlIdx control))
      = if control then (x + c) % N else x

*HEADLINE Deliverable D — combined controlled target decode.**

theoremsqir_style_controlledModAddConst_candidate_workspace

theorem sqir_style_controlledModAddConst_candidate_workspace
    (bits N c x controlIdx : Nat) (control : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    cuccaro_read_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx control))
        = 0
    ∧ Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
          (update (cuccaro_input_F 2 false 0 x) controlIdx control) (2 + 2 * bits)

*Deliverable E — combined workspace bundle.**

theoremsqir_style_controlledModAddConst_candidate_clean_qstart

theorem sqir_style_controlledModAddConst_candidate_clean_qstart
    (bits q_start N c x dim controlIdx flagPos : Nat) (control : Bool)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_control_lt_dim : controlIdx < dim)
    (h_flag_lt_dim : flagPos < dim) :
    Gate.WellTyped dim

*R7d^xxix-L-3.13′ DELIVERABLE: q_start-parametric controlled candidate clean bundle (combined over both control branches).** 6-conjunct bundle parametric in `q_start`, `flagPos`, `controlIdx`, `dim`, and `control : Bool`: 1. `Gate.WellTyped dim` of the candidate gate; 2. target decode = `if control then (x + c) % N else x`; 3. read register = 0; 4. position `q_start + 2 * bits` (top carry) = false; 5. position `flagPos` = false; 6. position `controlIdx` = `control` (preserved external control). Mechanical case-split on `control`: false branch fully delegated to `_clean_control_false_qstart` (L-3.8′); true branch reuses `_clean_control_false_qstart` only to extract the control-independent `Gate.WellTyped`, then assembles the remaining five conjuncts from `_target_decode_control_true_qstart` (L-3.11′) and `_workspace_control_true_qstart` (L-3.12′). No new arithmetic.

theoremsqir_style_controlledModAddConst_candidate_clean

theorem sqir_style_controlledModAddConst_candidate_clean
    (bits N c x controlIdx : Nat) (control : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1)
    (h_control_workspace_lt : controlIdx < sqir_modmult_rev_anc bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx control))

*Deliverable F — controlled candidate clean bundle for c > 0.**

theoremsqir_style_controlledModAddConst_gate_zero_eq_qstart

theorem sqir_style_controlledModAddConst_gate_zero_eq_qstart
    (bits q_start N controlIdx flagPos : Nat) :
    sqir_style_controlledModAddConst_gate bits q_start N 0 controlIdx flagPos = Gate.I

*R7d^xxix-L-3.14′ helper: q_start-parametric `c = 0` reduction.** The total wrapper at `c = 0` is the identity gate, regardless of `q_start`/`flagPos`/`controlIdx`.

theoremsqir_style_controlledModAddConst_gate_zero_clean_qstart

theorem sqir_style_controlledModAddConst_gate_zero_clean_qstart
    (bits q_start N x dim controlIdx flagPos : Nat) (control : Bool)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim
        (sqir_style_controlledModAddConst_gate bits q_start N 0 controlIdx flagPos)
    ∧ cuccaro_target_val bits q_start

*R7d^xxix-L-3.14′ helper: q_start-parametric `c = 0` clean bundle.** When `c = 0` the total wrapper is `Gate.I`, so all six conjuncts reduce to facts about the input state. Mirrors `sqir_style_controlledModAddConst_gate_zero_clean` (line 2150) with general `q_start`, `flagPos`, and free `dim`. Uses `cuccaro_input_F_at_outside_eq_false` for the flag conjunct instead of the hard-coded `unfold + if_pos`.

theoremsqir_style_controlledModAddConst_gate_clean_qstart

theorem sqir_style_controlledModAddConst_gate_clean_qstart
    (bits q_start N c x dim controlIdx flagPos : Nat) (control : Bool)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_control_lt_dim : controlIdx < dim)
    (h_flag_lt_dim : flagPos < dim) :
    Gate.WellTyped dim

*R7d^xxix-L-3.14′ DELIVERABLE: q_start-parametric total wrapper clean theorem.** Combines the `c = 0` case (delegated to `_gate_zero_clean_qstart` above, with the target conjunct re-massaged to match the headline's `if control then (x+c)%N else x` shape at `c = 0`) and the `c > 0` case (delegated to the L-3.13′ `_candidate_clean_qstart`). Mirrors `sqir_style_controlledModAddConst_gate_clean` (line 3871) with general `q_start`, `flagPos`, `controlIdx`, and free `dim`.

theoremsqir_style_controlledModAddConst_gate_clean

theorem sqir_style_controlledModAddConst_gate_clean
    (bits N c x controlIdx : Nat) (control : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1)
    (h_control_workspace_lt : controlIdx < sqir_modmult_rev_anc bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx control))

*HEADLINE Deliverable G — total wrapper clean theorem.**

theoremsqir_style_controlledModAddConst_gate_clean_from_BasicSetting

theorem sqir_style_controlledModAddConst_gate_clean_from_BasicSetting
    (a r N m n c x controlIdx : Nat) (control : Bool)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m n)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * (n + 1) + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1)
    (h_control_workspace_lt : controlIdx < sqir_modmult_rev_anc (n + 1)) :
    Gate.WellTyped (sqir_modmult_rev_anc (n + 1))
        (sqir_style_controlledModAddConst_gate (n + 1) 2 N c controlIdx 1)
    ∧ cuccaro_target_val (n + 1) 2
          (Gate.applyNat (sqir_style_controlledModAddConst_gate (n + 1) 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx control))

*HEADLINE Deliverable H — BasicSetting-derived total wrapper clean theorem.**

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroDirtyFlagStageCorrectness

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroDirtyFlagStageCorrectness.lean

(no documented top-level declarations)

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroDirtyFlagStageCorrectness.BitLevelWorkspaceStateEq

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroDirtyFlagStageCorrectness/BitLevelWorkspaceStateEq.lean

CuccaroDirtyFlagStageCorrectness — Part4 (re-export shim part; same namespace, opens de-duplicated).

lemmacuccaro_prepareConstRead_zero_eq_id_fun

lemma cuccaro_prepareConstRead_zero_eq_id_fun
    (bits q_start : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_prepareConstRead bits q_start 0) f = f

Helper: `prepare(0)` is identity.

lemmacuccaro_addConstGate_zero_eq_full_adder_fun

lemma cuccaro_addConstGate_zero_eq_full_adder_fun
    (bits q_start : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_addConstGate bits q_start 0) f
      = Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) f

Helper: on any input, `addConstGate(0)` agrees with the full adder.

theoremsqir_style_modAddConst_dirtyFlag_target_bit

theorem sqir_style_modAddConst_dirtyFlag_target_bit
    (bits q_start N c x flagPos i : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N) (hi : i < bits)
    (h_flag_distinct : ∀ j, j < bits → flagPos ≠ q_start + 2 * j + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false)
        (q_start + 2 * i + 1)
      = ((x + c) % N).testBit i

*HEADLINE Deliverable A — dirty-flag target bit theorem.** At each target position `q_start + 2*i + 1` for `i < bits`, the dirty-flag candidate's output bit equals `((x + c) % N).testBit i`.

theoremsqir_style_modAddConst_dirtyFlag_read_bit

theorem sqir_style_modAddConst_dirtyFlag_read_bit
    (bits q_start N c x flagPos i : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N) (hi : i < bits)
    (h_flag_distinct : ∀ j, j < bits → flagPos ≠ q_start + 2 * j + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false)
        (q_start + 2 * i + 2)
      = false

*HEADLINE Deliverable B — dirty-flag read bit theorem.** At each read position `q_start + 2*i + 2` for `i < bits`, the dirty-flag candidate's output bit is `false`.

theoremsqir_style_modAddConst_dirtyFlag_frame_outside

theorem sqir_style_modAddConst_dirtyFlag_frame_outside
    (bits q_start N c flagPos : Nat) (f : Nat → Bool)
    (h_flag_distinct : ∀ j, j < bits → flagPos ≠ q_start + 2 * j + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (q : Nat) (h_q_ne_flagPos : q ≠ flagPos)
    (h_q_outside : q < q_start ∨ q_start + 2 * bits + 1 ≤ q) :
    Gate.applyNat (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos) f q
      = f q

Frame: dirty-flag candidate preserves values at positions outside workspace ∪ {flagPos}.

theoremsqir_style_modAddConst_dirtyFlag_state_eq

theorem sqir_style_modAddConst_dirtyFlag_state_eq
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ j, j < bits → flagPos ≠ q_start + 2 * j + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false)
      = update (cuccaro_input_F q_start false 0 ((x + c) % N))
              flagPos (decide (N ≤ x + c))

*HEADLINE Deliverable C — dirty-flag state equality.** As a function, the post-dirty-flag state equals `update (cuccaro_input_F false 0 ((x+c) % N)) flagPos (decide(N ≤ x+c))`.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroDirtyFlagStageCorrectness.DirtyFlagArithmeticAndPostState

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroDirtyFlagStageCorrectness/DirtyFlagArithmeticAndPostState.lean

CuccaroDirtyFlagStageCorrectness — Part1 (re-export shim part; same namespace, opens de-duplicated).

theoremsqir_dirty_modadd_arith

theorem sqir_dirty_modadd_arith
    (bits N x c : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    (x + c + (if decide (N ≤ x + c) then 2^bits - N else 0)) % 2^bits
      = (x + c) % N

*HEADLINE Deliverable A — dirty-flag modular reduction arithmetic.** For `x, c < N` and `2*N ≤ 2^bits`, `(x + c + (if decide (N ≤ x+c) then 2^bits - N else 0)) % 2^bits = (x + c) % N`.

theoremcuccaro_addConstGate_output_eq_cuccaro_input_F

theorem cuccaro_addConstGate_output_eq_cuccaro_input_F
    (bits q_start c x : Nat) (hbits : 1 ≤ bits)
    (hc : c < 2^bits) (hx : x < 2^bits) (h_sum : x + c < 2^bits) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c)
        (cuccaro_input_F q_start false 0 x)
      = cuccaro_input_F q_start false 0 (x + c)

*HEADLINE — post-addConst function equality.** Applying `cuccaro_addConstGate bits q_start c` to `cuccaro_input_F q_start false 0 x` gives `cuccaro_input_F q_start false 0 (x+c)` as a function, provided `x + c < 2^bits`.

theoremsqir_dirty_modadd_after_add_state_eq

theorem sqir_dirty_modadd_after_add_state_eq
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_addConstGate bits q_start c)
        (update (cuccaro_input_F q_start false 0 x) flagPos false)
      = update (cuccaro_input_F q_start false 0 (x + c)) flagPos false

*HEADLINE Deliverable A — post-add state with external flag.**

theoremsqir_style_compareConst_candidate_frame_outside

theorem sqir_style_compareConst_candidate_frame_outside
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (q : Nat) (h_q_ne_flagPos : q ≠ flagPos)
    (h_q_outside : q < q_start ∨ q_start + 2 * bits + 1 ≤ q) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos) f q = f q

*`sqir_style_compareConst_candidate` frame at positions outside workspace ∪ {flagPos}.** Layer order after `simp [applyNat_seq]` (outermost first): prepare₂ → maj_inv → CX → maj → prepare₁. We strip from the outside in.

theoremsqir_dirty_modadd_after_compare_state_eq

theorem sqir_dirty_modadd_after_compare_state_eq
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (update (cuccaro_input_F q_start false 0 (x + c)) flagPos false)
      = update (cuccaro_input_F q_start false 0 (x + c)) flagPos (decide (N ≤ x + c))

*HEADLINE Deliverable B — post-compare state with external flag.**

theoremsqir_style_modAddConst_dirtyFlag_target_decode

theorem sqir_style_modAddConst_dirtyFlag_target_decode
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos false))
      = (x + c) % N

*HEADLINE Deliverable D — dirty-flag mod-N target decode.**

theoremsqir_style_modAddConst_dirtyFlag_read_decode

theorem sqir_style_modAddConst_dirtyFlag_read_decode
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_read_val bits q_start
        (Gate.applyNat
          (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos false))
      = 0

*HEADLINE Deliverable A — read register restored after the dirty-flag candidate.**

theoremsqir_style_modAddConst_dirtyFlag_carry_in_restored

theorem sqir_style_modAddConst_dirtyFlag_carry_in_restored
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false) q_start
      = false

*HEADLINE Deliverable A (continued) — carry-in restored.**

theoremsqir_style_modAddConst_dirtyFlag_flag_value

theorem sqir_style_modAddConst_dirtyFlag_flag_value
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false) flagPos
      = decide (N ≤ x + c)

*HEADLINE Deliverable A (continued) — flag holds `decide (N ≤ x + c)`.** The flag is DIRTY: it stores the comparison result, not the input `false`. Naming the field `dirtyFlag` is mandatory; do not advertise this as clean modular addition.

theoremsqir_style_modAddConst_dirtyFlag_candidate_wellTyped_sqir_dim

theorem sqir_style_modAddConst_dirtyFlag_candidate_wellTyped_sqir_dim
    (bits q_start N c flagPos : Nat) (hbits : 1 ≤ bits)
    (h_workspace : q_start + 2 * bits + 1 ≤ sqir_modmult_rev_anc bits)
    (h_flag : flagPos < sqir_modmult_rev_anc bits)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_distinct_top : flagPos ≠ q_start + 2 * bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)

*HEADLINE Deliverable B — WellTyped at the SQIR-faithful dimension `sqir_modmult_rev_anc bits = 3 * bits + 11`.**

theoremsqir_style_modAddConst_dirtyFlag_clean_except_flag

theorem sqir_style_modAddConst_dirtyFlag_clean_except_flag
    (bits q_start N c x flagPos dim : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.WellTyped dim
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
    ∧ cuccaro_target_val bits q_start

*HEADLINE Deliverable C — packaged dirty-flag mod-N add bundle.** Provides WellTyped, target decode, read restored, carry restored, and the dirty flag value, all under the dirty-flag precondition set (2*N ≤ 2^bits, x < N, c < N, flagPos above workspace).

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroDirtyFlagStageCorrectness.ExactLayoutAndFlagUncompute

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroDirtyFlagStageCorrectness/ExactLayoutAndFlagUncompute.lean

CuccaroDirtyFlagStageCorrectness — Part3 (re-export shim part; same namespace, opens de-duplicated).

theoremsqir_style_compareConst_candidate_flag_sqir_layout

theorem sqir_style_compareConst_candidate_flag_sqir_layout
    (bits N x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat (sqir_style_compareConst_candidate bits 2 N 1)
        (cuccaro_input_F 2 false 0 x) 1
      = decide (N ≤ x)

*Deliverable A — SQIR-layout comparator flag-copy.**

theoremsqir_style_compareConst_candidate_clean_sqir_layout

theorem sqir_style_compareConst_candidate_clean_sqir_layout
    (bits N x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_compareConst_candidate bits 2 N 1)
    ∧ Gate.applyNat (sqir_style_compareConst_candidate bits 2 N 1)
          (cuccaro_input_F 2 false 0 x) 1
        = decide (N ≤ x)
    ∧ (∀ i, i < bits →
        Gate.applyNat (sqir_style_compareConst_candidate bits 2 N 1)
          (cuccaro_input_F 2 false 0 x) (2 + 2 * i + 2)
          = false)

*Deliverable B — SQIR-layout clean comparator bundle.**

theoremsqir_style_modAddConst_dirtyFlag_target_decode_sqir_layout

theorem sqir_style_modAddConst_dirtyFlag_target_decode_sqir_layout
    (bits N c x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    cuccaro_target_val bits 2
        (Gate.applyNat
          (sqir_style_modAddConst_dirtyFlag_candidate bits 2 N c 1)
          (update (cuccaro_input_F 2 false 0 x) 1 false))
      = (x + c) % N

*Deliverable C — SQIR-layout dirty-flag mod-N add target decode.**

theoremsqir_style_modAddConst_dirtyFlag_clean_except_flag_sqir_layout

theorem sqir_style_modAddConst_dirtyFlag_clean_except_flag_sqir_layout
    (bits N c x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_dirtyFlag_candidate bits 2 N c 1)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat
            (sqir_style_modAddConst_dirtyFlag_candidate bits 2 N c 1)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = (x + c) % N
    ∧ cuccaro_read_val bits 2

*Deliverable D — SQIR-layout dirty-flag mod-N add clean-except-flag bundle.**

theoremsqir_style_modAddConst_dirtyFlag_clean_except_flag_from_BasicSetting

theorem sqir_style_modAddConst_dirtyFlag_clean_except_flag_from_BasicSetting
    (a r N m n c x : Nat)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m n)
    (hx : x < N) (hc : c < N) :
    Gate.WellTyped (sqir_modmult_rev_anc (n + 1))
        (sqir_style_modAddConst_dirtyFlag_candidate (n + 1) 2 N c 1)
    ∧ cuccaro_target_val (n + 1) 2
          (Gate.applyNat
            (sqir_style_modAddConst_dirtyFlag_candidate (n + 1) 2 N c 1)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = (x + c) % N
    ∧ cuccaro_read_val (n + 1) 2

*Deliverable E — BasicSetting-based SQIR-layout corollary.** Combines the SQIR-layout bundle with the sizing relation from `BasicSetting`. Instantiates `bits := n + 1` as the canonical workspace width per `BasicSetting_twoN_le_pow_succ`.

lemmaprepareMaj_at_top_eq_after_update

lemma prepareMaj_at_top_eq_after_update
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (cuccaro_maj_chain bits q_start)
        (Gate.applyNat (cuccaro_prepareConstRead bits q_start (2^bits - N))
          (update (cuccaro_input_F q_start false 0 x) flagPos flag))
        (q_start + 2 * bits)
      = decide (N ≤ x)

Helper for the XOR flag theorem: the inner `(prepare; maj)` block at `q_start + 2*bits` (top carry) equals `decide (N ≤ x)` even when the input has an outside `update` at `flagPos`.

theoremsqir_style_compareConst_candidate_flag_xor

theorem sqir_style_compareConst_candidate_flag_xor
    (bits q_start N x flagPos : Nat) (flag : Bool)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos flag) flagPos
      = xor flag (decide (N ≤ x))

*HEADLINE Task 1 — comparator flag-XOR semantics.** For any initial flag value `flag`, the SQIR-style comparator at `flagPos` returns `flag XOR decide (N ≤ x)`. This is the key polarity result needed for any flag-uncomputation construction.

theoremsqir_style_compareConst_candidate_flag_xor_sqir_layout

theorem sqir_style_compareConst_candidate_flag_xor_sqir_layout
    (bits N x : Nat) (flag : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat (sqir_style_compareConst_candidate bits 2 N 1)
        (update (cuccaro_input_F 2 false 0 x) 1 flag) 1
      = xor flag (decide (N ≤ x))

*SQIR-layout corollary of Task 1.**

theoremdecide_c_le_xc_mod_N_eq_not_decide_N_le_xc

theorem decide_c_le_xc_mod_N_eq_not_decide_N_le_xc
    (N x c : Nat) (hN_pos : 0 < N) (hc_pos : 0 < c)
    (hx : x < N) (hc : c < N) :
    decide (c ≤ (x + c) % N) = ! decide (N ≤ x + c)

*HEADLINE — arithmetic identity for clean candidate.** For `0 < c`, `x < N`, `c < N`, the comparator's result on the reduced target `(x+c) % N` is precisely the negation of the dirty flag.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroDirtyFlagStageCorrectness.WellTypedAndSizing

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroDirtyFlagStageCorrectness/WellTypedAndSizing.lean

CuccaroDirtyFlagStageCorrectness — Part2 (re-export shim part; same namespace, opens de-duplicated).

theoremsqir_style_modAddConst_dirtyFlag_candidate_wellTyped_sqir_layout

theorem sqir_style_modAddConst_dirtyFlag_candidate_wellTyped_sqir_layout
    (bits N c : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_dirtyFlag_candidate bits 2 N c 1)

*Deliverable D (partial) — WellTyped at the exact SQIR layout `q_start = 2, flagPos = 1, dim = sqir_modmult_rev_anc bits`.**

theoremBasicSetting_twoN_le_pow_succ

theorem BasicSetting_twoN_le_pow_succ
    (a r N m n : Nat)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m n) :
    2 * N ≤ 2 ^ (n + 1)

*HEADLINE Deliverable E — sizing relation: `BasicSetting a r N m n` implies `2 * N ≤ 2^(n + 1)`.** Reading: Shor's data register width is `n` bits; the dirty-flag modular adder must be instantiated at `bits := n + 1` (one extra bit) so that intermediate `x + c` cannot overflow before the comparator sees the top carry. This matches SQIR's `n + 1`-bit workspace per modular addition.

theoremcuccaro_input_F_at_outside_eq_false

theorem cuccaro_input_F_at_outside_eq_false
    (q_start bits x flagPos : Nat)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hx : x < 2^bits) :
    cuccaro_input_F q_start false 0 x flagPos = false

Helper: `cuccaro_input_F q_start false 0 x` evaluates to `false` at any position outside the workspace `[q_start, q_start + 2*bits]`.

theoremsqir_style_compareConst_candidate_flag_general

theorem sqir_style_compareConst_candidate_flag_general
    (bits q_start N x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) flagPos
      = decide (N ≤ x)

*Generalized flag-copy theorem.** For any `flagPos` outside the workspace (below OR above), the SQIR-style comparator candidate outputs `decide (N ≤ x)` at `flagPos`.

theoremsqir_style_compareConst_candidate_workspace_restored_at_general

theorem sqir_style_compareConst_candidate_workspace_restored_at_general
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (q : Nat) (hq_lower : q_start ≤ q) (hq_upper : q < q_start + 2 * bits + 1) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos) f q
      = f q

*Generalized workspace restoration (at-position).** At any workspace position, the SQIR-style comparator candidate restores the input value, for any `flagPos` outside workspace.

theoremsqir_dirty_modadd_after_compare_state_eq_general

theorem sqir_dirty_modadd_after_compare_state_eq_general
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (update (cuccaro_input_F q_start false 0 (x + c)) flagPos false)
      = update (cuccaro_input_F q_start false 0 (x + c))
              flagPos (decide (N ≤ x + c))

*Generalized compare-state equality** (Tick 59 Deliverable B, relaxed to `hflag_out`).

theoremsqir_style_modAddConst_dirtyFlag_target_decode_general

theorem sqir_style_modAddConst_dirtyFlag_target_decode_general
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos false))
      = (x + c) % N

*Generalized dirty-flag mod-N add target decode** (Tick 59 D, relaxed to `hflag_out`).

theoremsqir_style_modAddConst_dirtyFlag_read_decode_general

theorem sqir_style_modAddConst_dirtyFlag_read_decode_general
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_read_val bits q_start
        (Gate.applyNat
          (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
          (update (cuccaro_input_F q_start false 0 x) flagPos false))
      = 0

*Generalized workspace conjuncts** (Tick 60 A, relaxed to `hflag_out`).

theoremsqir_style_modAddConst_dirtyFlag_carry_in_restored_general

theorem sqir_style_modAddConst_dirtyFlag_carry_in_restored_general
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false) q_start
      = false

theoremsqir_style_modAddConst_dirtyFlag_flag_value_general

theorem sqir_style_modAddConst_dirtyFlag_flag_value_general
    (bits q_start N c x flagPos : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false) flagPos
      = decide (N ≤ x + c)

theoremsqir_style_modAddConst_dirtyFlag_clean_except_flag_general

theorem sqir_style_modAddConst_dirtyFlag_clean_except_flag_general
    (bits q_start N c x flagPos dim : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hx : x < N) (hc : c < N)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : ∀ i, i < bits → flagPos ≠ q_start + 2 * i + 2)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.WellTyped dim
        (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
    ∧ cuccaro_target_val bits q_start

*Generalized clean-except-flag bundle** (Tick 60 C, relaxed to `hflag_out`). Supports both above- AND below-workspace flag.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRDirtyFlag.CuccaroModularAddDefinitions

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRDirtyFlag/CuccaroModularAddDefinitions.lean

## Tick 62 — Clean modular-add candidate definition.

defsqir_style_modAddConst_clean_candidate

def sqir_style_modAddConst_clean_candidate
    (bits q_start N c flagPos : Nat) : Gate

*Clean modular add-constant candidate** for `0 < c < N`. Structure: dirty-flag candidate ; compareConst(c) ; X(flagPos). The compareConst(c) XORs `decide(c ≤ (x+c) % N) = ¬decide(N ≤ x+c)` into the flag, then X negates. Net flag effect: `flag → ¬(flag XOR decide(N ≤ x+c) XOR ¬decide(N ≤ x+c)) = ¬(flag XOR true) = flag`, so the flag is restored. The cleanup also re-touches the target / read / carry workspace, but by the comparator's workspace_restored property these end up at the same values as the dirty-flag stage. *Caveat on `c = 0`:** `compareConst(0)` cannot be implemented in `bits` bits because `K = 2^bits` overflows the read register. For `c = 0` the modular add is the identity and the dirty flag is already `false`; the clean candidate is correct only for `0 < c`. A wrapper that dispatches `c = 0` to identity is straightforward but introduces a conditional gate structure (deferred).

defsqir_style_modAddConst_clean_gate

def sqir_style_modAddConst_clean_gate (bits N c : Nat) : Gate

*Deliverable A — total clean modular add-constant gate.** Wraps the clean candidate (which requires `0 < c`) so that the `c = 0` case dispatches to the identity gate. This is the official clean mod-add-constant primitive at the SQIR-faithful layout `q_start = 2, flagPos = 1, dim = sqir_modmult_rev_anc bits`.

defsqir_controlledCompareConst

def sqir_controlledCompareConst
    (bits q_start c controlIdx flagPos : Nat) : Gate

*Controlled compareConst** — masked-prepare variant of `sqir_style_compareConst_candidate`. When `controlIdx = false`, identity at every position; when `controlIdx = true`, equivalent to `sqir_style_compareConst_candidate bits q_start c flagPos`.

defsqir_style_controlledModAddConst_candidate

def sqir_style_controlledModAddConst_candidate
    (bits q_start N c controlIdx flagPos : Nat) : Gate

*Controlled SQIR-style mod-N add-constant candidate** for `0 < c`.

defsqir_style_controlledModAddConst_gate

def sqir_style_controlledModAddConst_gate
    (bits q_start N c controlIdx flagPos : Nat) : Gate

*Total controlled SQIR mod-N add-constant** wrapper handling `c = 0`.

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRModAdd

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRModAdd.lean

FormalRV.BQAlgo.CuccaroSQIRModAdd — SQIR-style modular add-constant SKELETON. Tick 53: build the first SQIR-style modular adder skeleton. Background: SQIR's `modadder21` (ModMult.v lines 134-137) is register-to-register modular addition `[M][x][y] → [M][(x+y) % M][y]`. For our Lean development targeting Shor's modular multiplier (which multiplies by a CLASSICAL constant), we adapt this to a register-to- CONSTANT modular addition: `target ← (target + c) mod N` The SQIR sequence for register-to-register: swapper02 ; adder01 ; swapper02 ; -- target ← (target + y) mod 2^n comparator01 ; -- flag ← decide (M ≤ target) bygatectrl 1 (subtractor01) ; bcx 1 ; -- conditional sub of M, flip flag swapper02 ; bcinv (comparator01) ; -- uncompute flag (swap to undo logic) swapper02. Our adapted sequence for register-to-constant (with the constant c and modulus N): cuccaro_addConstGate c ; -- target ← (target + c) mod 2^bits sqir_style_compareConst_candidate N ; -- flag ← decide (N ≤ target) [conditional sub of N] ; -- target ← target - N if flag = 1 [flag uncompute]. This file lands the SKELETON `addConst c ; compareConst N` (Tick 53, Deliverable 6 fallback per directive). The conditional-sub + flag-uncompute steps are deferred to Tick 54+. Reason for split: the conditional subtract requires either a controlled-CCX (not in our IR), or a manual controlled re-encoding of the subtractor. Both are substantial work and deserve their own tick. This tick proves: - WellTyped. - Flag = `decide (N ≤ x + c)` (after the skeleton). - Target decode = `(x + c) % 2^bits`. - Read register restored to 0. - Carry-in qubit restored to 0. These four together characterize the skeleton's behavior precisely.

defsqir_style_modAddConst_skeleton

def sqir_style_modAddConst_skeleton
    (bits q_start N c flagPos : Nat) : Gate

*Skeleton modular add-constant** (Tick 53). Composes the clean add-const + clean compare-const primitives. The result has the target register holding `(x + c) mod 2^bits` and the external flag at `flagPos` holding `decide (N ≤ x + c)`.

theoremsqir_style_modAddConst_skeleton_wellTyped

theorem sqir_style_modAddConst_skeleton_wellTyped
    (bits q_start N c flagPos dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : flagPos ≠ q_start + 2 * bits) :
    Gate.WellTyped dim
        (sqir_style_modAddConst_skeleton bits q_start N c flagPos)

theoremsqir_style_modAddConst_skeleton_target_bit

theorem sqir_style_modAddConst_skeleton_target_bit
    (bits q_start N c x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    ∀ i, i < bits →
      Gate.applyNat (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 1)
      = (x + c).testBit i

*Target bit after the skeleton.** At each target position `q_start + 2*i + 1` for `i < bits`, the output equals `(x+c).testBit i`.

theoremsqir_style_modAddConst_skeleton_read_bit

theorem sqir_style_modAddConst_skeleton_read_bit
    (bits q_start N c x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    ∀ i, i < bits →
      Gate.applyNat (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 2)
      = false

*Read bit after the skeleton.** At each read position `q_start + 2*i + 2` for `i < bits`, the output equals `false`.

theoremsqir_style_modAddConst_skeleton_carry_in_bit

theorem sqir_style_modAddConst_skeleton_carry_in_bit
    (bits q_start N c x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
        (cuccaro_input_F q_start false 0 x) q_start = false

*Carry-in qubit after the skeleton.** At position `q_start`, the output equals `false`.

theoremsqir_style_modAddConst_skeleton_target_decode

theorem sqir_style_modAddConst_skeleton_target_decode
    (bits q_start N c x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
          (cuccaro_input_F q_start false 0 x))
      = (x + c) % 2^bits

*HEADLINE — decoded target correctness.** After the skeleton, the target register decodes to `(x + c) % 2^bits`.

theoremsqir_style_modAddConst_skeleton_read_decode

theorem sqir_style_modAddConst_skeleton_read_decode
    (bits q_start N c x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_read_val bits q_start
        (Gate.applyNat (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
          (cuccaro_input_F q_start false 0 x))
      = 0

*Decoded read restoration.**

theoremsqir_style_modAddConst_skeleton_clean

theorem sqir_style_modAddConst_skeleton_clean
    (bits q_start N c x flagPos dim : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_distinct : flagPos ≠ q_start + 2 * bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.WellTyped dim
        (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
    ∧ cuccaro_target_val bits q_start
          (Gate.applyNat (sqir_style_modAddConst_skeleton bits q_start N c flagPos)
            (cuccaro_input_F q_start false 0 x))

*HEADLINE — packaged skeleton primitive.** Bundles WellTyped + target decode + read restored + carry-in restored. The flag-behavior theorem is DEFERRED to Tick 54 (requires the input-state equivalence argument and is needed for the controlled-sub-N step).

FormalRV.Arithmetic.Cuccaro.CuccaroSQIRStyle

FormalRV/Arithmetic/Cuccaro/CuccaroSQIRStyle.lean

FormalRV.BQAlgo.CuccaroSQIRStyle — SQIR-style compute-CNOT-uncompute comparator candidate. Tick 49 / Recovery of SQIR/RCIR exact-budget construction. CRITICAL DISCOVERY (Tick 49 source inspection of `SQIR/examples/shor/ModMult.v`): - **SQIR's actual `modmult_rev_anc n = 3 * n + 11`** (line 72 of `ModMult.v`), NOT `2 * n + 1` as the Lean placeholder in `SQIRPort/Shor.lean:4563` claims. - The Lean comment at line 4560-4562 incorrectly says "the specific RCIR implementation in Coq uses a similar linear-in-n count" — the actual SQIR value is `3n + 11`, with `3` non-overlapping n-bit registers + 2 designated flag bits at positions 0 and 1 + additional scratch. Consequence: the "exact-budget" framing in Ticks 41-48 was based on a too-tight Lean placeholder. The real SQIR budget gives substantial room for a designated flag qubit. SQIR's comparator01 (ModMult.v line 121-124): ``` comparator01 n := (bcx 0; negator0 n); highb01 n; bcinv (bcx 0; negator0 n). highb01 n := MAJseq n; bccnot (1 + n) 1; bcinv (MAJseq n). ``` - Position 1 is the designated FLAG bit. - `bccnot (1 + n) 1`: CNOT from the top carry (at position `1 + n`) to the flag (position 1). - The compute-CNOT-uncompute pattern: MAJseq forward, copy carry to flag, MAJseq reverse. This file ports the compute-CNOT-uncompute structure to our Cuccaro Gate IR. We: - Define `cuccaro_MAJ_inv` (the gate-level inverse of MAJ). - Define `cuccaro_maj_chain_inv` (the chain-level inverse). - Prove the local MAJ inverse identity: `MAJ ; MAJ_inv = id`. - Define `sqir_style_compareConst_candidate` matching SQIR's pattern, parameterized over an explicit flag qubit position `flagPos`. - Prove WellTyped (assuming `flagPos < dim`).

defcuccaro_MAJ_inv

def cuccaro_MAJ_inv (a b c : Nat) : Gate

*Inverse of the Cuccaro MAJ gate.** Since each component gate (CX, CCX) is self-inverse, the inverse is the reversed sequence.

defcuccaro_maj_chain_inv

def cuccaro_maj_chain_inv : Nat → Nat → Gate
  | 0,     _       => I
  | n + 1, q_start =>
      seq (cuccaro_maj_chain_inv n (q_start + 2))
          (cuccaro_MAJ_inv q_start (q_start + 1) (q_start + 2))

*Inverse of the n-step Cuccaro MAJ chain.**

theoremcuccaro_MAJ_inv_wellTyped

theorem cuccaro_MAJ_inv_wellTyped
    (dim a b c : Nat) (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) :
    Gate.WellTyped dim (cuccaro_MAJ_inv a b c)

theoremcuccaro_maj_chain_inv_wellTyped

theorem cuccaro_maj_chain_inv_wellTyped
    (n q_start dim : Nat) (h : q_start + 2 * n + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_maj_chain_inv n q_start)

defsqir_modmult_rev_anc

def sqir_modmult_rev_anc (n : Nat) : Nat

*SQIR-faithful ancilla count** (Coq `ModMult.v` line 72). SQIR uses `3 * n + 11` ancilla qubits for `modmult_rev`, NOT the `2 * n + 1` Lean placeholder. We expose this separately to avoid silently patching the Lean placeholder while still making the real SQIR value available for parallel SQIR-faithful Lean development.

theoremsqir_modmult_total_dim

theorem sqir_modmult_total_dim (n : Nat) :
    n + sqir_modmult_rev_anc n = 4 * n + 11

Total dimension under SQIR-faithful ancilla count: `4 * n + 11`.

theoremsqir_modmult_anc_diff_from_lean_placeholder

theorem sqir_modmult_anc_diff_from_lean_placeholder (n : Nat) :
    sqir_modmult_rev_anc n
      = FormalRV.SQIRPort.modmult_rev_anc n + (n + 10)

Arithmetic gap between Lean placeholder and SQIR source. The placeholder undercounts ancilla by `n + 10`.

theoremcuccaro_MAJ_inv_at_a

theorem cuccaro_MAJ_inv_at_a
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ_inv a b c) f a
      = xor (f a) (xor (f c) (f a && f b))

*MAJ_inv at the `a` wire.** Composes CCX, CX c a, CX c b left-to-right; at position a only the CX c a step writes a new value.

theoremcuccaro_MAJ_inv_at_b

theorem cuccaro_MAJ_inv_at_b
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ_inv a b c) f b
      = xor (f b) (xor (f c) (f a && f b))

*MAJ_inv at the `b` wire.**

theoremcuccaro_MAJ_inv_at_c

theorem cuccaro_MAJ_inv_at_c
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ_inv a b c) f c
      = xor (f c) (f a && f b)

*MAJ_inv at the `c` wire.** Only the first CCX writes here.

theoremcuccaro_MAJ_inv_at_other

theorem cuccaro_MAJ_inv_at_other
    (a b c q : Nat) (h_qa : q ≠ a) (h_qb : q ≠ b) (h_qc : q ≠ c) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_MAJ_inv a b c) f q = f q

*MAJ_inv at any unrelated wire.**

theoremcuccaro_MAJ_followed_by_MAJ_inv_eq_id

theorem cuccaro_MAJ_followed_by_MAJ_inv_eq_id
    (a b c : Nat) (h_ab : a ≠ b) (h_ac : a ≠ c) (h_bc : b ≠ c)
    (f : Nat → Bool) (q : Nat) :
    Gate.applyNat (seq (cuccaro_MAJ a b c) (cuccaro_MAJ_inv a b c)) f q = f q

*Local inverse identity** (per position). Applying MAJ followed by MAJ_inv to a state restores the original at every position.

theoremcuccaro_maj_chain_inv_after_chain_eq_id

theorem cuccaro_maj_chain_inv_after_chain_eq_id
    (n q_start : Nat) (g : Nat → Bool) :
    Gate.applyNat (cuccaro_maj_chain_inv n q_start)
        (Gate.applyNat (cuccaro_maj_chain n q_start) g) = g

*Chain inverse identity (function-level).** Applying the chain followed by its inverse to any state returns the original state.

theoremcuccaro_maj_chain_followed_by_inv_eq_id

theorem cuccaro_maj_chain_followed_by_inv_eq_id
    (n q_start : Nat) (f : Nat → Bool) (q : Nat) :
    Gate.applyNat
        (seq (cuccaro_maj_chain n q_start)
             (cuccaro_maj_chain_inv n q_start)) f q = f q

*Chain inverse identity** (per position). Pointwise corollary.

defsqir_style_compareConst_candidate

def sqir_style_compareConst_candidate
    (bits q_start N flagPos : Nat) : Gate

*SQIR-style compare-constant candidate gate** with explicit flag position. Uses the compute-CNOT-uncompute pattern: workspace restored, flag XOR'd with the comparison result.

theoremsqir_style_compareConst_candidate_wellTyped

theorem sqir_style_compareConst_candidate_wellTyped
    (bits q_start N flagPos dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_distinct : flagPos ≠ q_start + 2 * bits) :
    Gate.WellTyped dim
        (sqir_style_compareConst_candidate bits q_start N flagPos)

*WellTyped for the SQIR-style comparator candidate.** Requires both the workspace range `q_start + 2*bits + 1 ≤ dim` AND `flagPos < dim`, plus `flagPos ≠ q_start + 2 * bits` (the CNOT's controls and targets must differ).

theoremcuccaro_maj_chain_inv_frame_above

theorem cuccaro_maj_chain_inv_frame_above
    (n q_start : Nat) (f : Nat → Bool) (q : Nat)
    (h : q_start + 2 * n + 1 ≤ q) :
    Gate.applyNat (cuccaro_maj_chain_inv n q_start) f q = f q

The inverse MAJ chain doesn't touch positions strictly above its support (i.e., `q ≥ q_start + 2*n + 1`).

theoremcuccaro_maj_chain_inv_frame_below

theorem cuccaro_maj_chain_inv_frame_below
    (n q_start : Nat) (f : Nat → Bool) (q : Nat) (h : q < q_start) :
    Gate.applyNat (cuccaro_maj_chain_inv n q_start) f q = f q

The inverse MAJ chain doesn't touch positions strictly below its support (i.e., `q < q_start`).

theoremcuccaro_input_F_above_eq_false

theorem cuccaro_input_F_above_eq_false
    (q_start bits x q : Nat) (h_above : q_start + 2 * bits + 1 ≤ q) (hx : x < 2^bits) :
    cuccaro_input_F q_start false 0 x q = false

For an input `cuccaro_input_F q_start false 0 x` with `x < 2^bits`, all positions strictly above `q_start + 2*bits` evaluate to `false`.

theoremsqir_style_compareConst_candidate_flag

theorem sqir_style_compareConst_candidate_flag
    (bits q_start N x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) flagPos
      = decide (N ≤ x)

*HEADLINE — flag-copy theorem.** After running the SQIR-style comparator candidate on the input encoding `cuccaro_input_F q_start false 0 x`, the external flag qubit at `flagPos` holds `decide (N ≤ x)`.

theoremsqir_style_compareConst_candidate_underflow_flag

theorem sqir_style_compareConst_candidate_underflow_flag
    (bits q_start N x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    !(Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) flagPos)
      = decide (x < N)

*Underflow polarity**: negation of the flag gives `decide (x < N)`.

theoremsqir_style_compareConst_candidate_clean_flag

theorem sqir_style_compareConst_candidate_clean_flag
    (bits q_start N x flagPos dim : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_flag : flagPos < dim)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.WellTyped dim
        (sqir_style_compareConst_candidate bits q_start N flagPos)
    ∧ Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
          (cuccaro_input_F q_start false 0 x) flagPos
        = decide (N ≤ x)

*HEADLINE — packaged SQIR-style comparator primitive (flag-only).** Combines WellTyped at the SQIR-faithful dimension with the flag-copy theorem. Workspace restoration is established structurally by construction (forward-CX-uncompute pattern) but the full per-position bit-level workspace-restoration theorem requires a "function locality" lemma not yet proved — see status note below.

theoremcuccaro_prepareConstRead_self_inverse_at

theorem cuccaro_prepareConstRead_self_inverse_at
    (bits q_start c : Nat) (f : Nat → Bool) (q : Nat) :
    Gate.applyNat (cuccaro_prepareConstRead bits q_start c)
        (Gate.applyNat (cuccaro_prepareConstRead bits q_start c) f) q
      = f q

*Prepare self-inverse (per position).**

theoremcuccaro_prepareConstRead_self_inverse

theorem cuccaro_prepareConstRead_self_inverse
    (bits q_start c : Nat) (f : Nat → Bool) :
    Gate.applyNat (cuccaro_prepareConstRead bits q_start c)
        (Gate.applyNat (cuccaro_prepareConstRead bits q_start c) f) = f

*Prepare self-inverse (function-level).**

theoremcuccaro_maj_chain_inv_commute_update_outside_workspace

theorem cuccaro_maj_chain_inv_commute_update_outside_workspace
    (bits q_start flagPos : Nat) (v : Bool)
    (f : Nat → Bool)
    (hflag_outside : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (p : Nat) (hp_lower : q_start ≤ p) (hp_upper : p < q_start + 2 * bits + 1) :
    Gate.applyNat (cuccaro_maj_chain_inv bits q_start)
        (update f flagPos v) p
      = Gate.applyNat (cuccaro_maj_chain_inv bits q_start) f p

*Locality / commute-with-outside-update**: the inverse MAJ chain commutes with updating the input at any position outside its workspace range, when queried at any workspace position.

theoremsqir_style_compareConst_candidate_workspace_restored_at

theorem sqir_style_compareConst_candidate_workspace_restored_at
    (bits q_start N flagPos : Nat) (f : Nat → Bool)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos)
    (q : Nat) (hq_lower : q_start ≤ q) (hq_upper : q < q_start + 2 * bits + 1) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos) f q
      = f q

*HEADLINE — workspace restoration (at-position).** At any workspace position `q ∈ [q_start, q_start + 2*bits]`, the SQIR-style comparator candidate restores the input value.

theoremsqir_style_compareConst_candidate_target_restored

theorem sqir_style_compareConst_candidate_target_restored
    (bits q_start N x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    ∀ i, i < bits →
      Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 1)
      = x.testBit i

*Target register restored**: at each target position `q_start + 2*i + 1` for `i < bits`, the output equals `x.testBit i`.

theoremsqir_style_compareConst_candidate_read_restored

theorem sqir_style_compareConst_candidate_read_restored
    (bits q_start N x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    ∀ i, i < bits →
      Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 2)
      = false

*Read register restored**: at each read position `q_start + 2*i + 2` for `i < bits`, the output equals `false`.

theoremsqir_style_compareConst_candidate_carry_in_restored

theorem sqir_style_compareConst_candidate_carry_in_restored
    (bits q_start N x flagPos : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) q_start = false

*Carry-in qubit restored**: at position `q_start`, the output equals `false`.

theoremsqir_style_compareConst_candidate_top_carry_restored

theorem sqir_style_compareConst_candidate_top_carry_restored
    (bits q_start N x flagPos : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
        (cuccaro_input_F q_start false 0 x) (q_start + 2 * bits) = false

*Top-carry qubit restored**: at position `q_start + 2*bits`, the output equals the input value (= `0.testBit (bits - 1)` for `a = 0`, which is `false`).

theoremsqir_style_compareConst_candidate_clean

theorem sqir_style_compareConst_candidate_clean
    (bits q_start N x flagPos : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_workspace : q_start + 2 * bits + 1 ≤ sqir_modmult_rev_anc bits)
    (h_flag : flagPos < sqir_modmult_rev_anc bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_compareConst_candidate bits q_start N flagPos)
    ∧ Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
          (cuccaro_input_F q_start false 0 x) flagPos
        = decide (N ≤ x)
    ∧ (∀ i, i < bits →

*HEADLINE — FULLY CLEAN SQIR-style comparator primitive.** At the SQIR-faithful dimension `3*bits + 11`: - WellTyped; - `flagPos` gets `decide (N ≤ x)`; - read register fully restored to `0`; - target register fully restored to `x.testBit`; - carry-in qubit restored to `false`; - top-carry qubit restored to `false`.

theoremsqir_style_compareConst_candidate_target_decode_restored

theorem sqir_style_compareConst_candidate_target_decode_restored
    (bits q_start N x flagPos : Nat)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits)
    (h_flag_above : q_start + 2 * bits + 1 ≤ flagPos) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos)
          (cuccaro_input_F q_start false 0 x))
      = x

*Decoded target restoration**: the decoded target register after the comparator equals `x`.

theoremsqir_style_compareConst_candidate_wellTyped_sqir_dim

theorem sqir_style_compareConst_candidate_wellTyped_sqir_dim
    (bits q_start N flagPos : Nat) (hbits : 1 ≤ bits)
    (h_workspace : q_start + 2 * bits + 1 ≤ sqir_modmult_rev_anc bits)
    (h_flag : flagPos < sqir_modmult_rev_anc bits)
    (h_distinct : flagPos ≠ q_start + 2 * bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_compareConst_candidate bits q_start N flagPos)

*WellTyped at the SQIR-faithful dimension `sqir_modmult_rev_anc bits = 3 * bits + 11`.** The SQIR-style candidate fits comfortably: it uses `q_start + 2*bits + 1` workspace + 1 flag qubit = much less than the full SQIR ancilla budget.

FormalRV.Arithmetic.Cuccaro.CuccaroSubConst

FormalRV/Arithmetic/Cuccaro/CuccaroSubConst.lean

FormalRV.BQAlgo.CuccaroSubConst — exact-budget Cuccaro subtract-constant primitive + flag-feasibility analysis. Tick 46: - Define `cuccaro_subConstGate` as add-by-two's-complement. - Prove subtract correctness via wraparound spec. - Prove arithmetic split lemmas (no-underflow vs underflow cases). - Analyze whether the clean exact-budget subtract exposes a borrow/comparison flag. Conclusion (Deliverable D, formal): the CLEAN exact-budget Cuccaro subtract-constant primitive restores all non-target ancilla to their canonical zero values. The only informative output is the target register itself, which encodes `(x + 2^bits - N) mod 2^bits` — a function that distinguishes `x < N` from `x ≥ N` via its value but NOT via any single ancilla bit. Therefore an exact-budget modular-reduction step cannot read the borrow flag from a single qubit of this gate's output; a different construction (forward-only comparator copying the top carry before reverse uncompute, or a modified primitive that reserves a flag qubit) is required for the next layer. This file does NOT extend the Cuccaro budget — it identifies the precise structural blocker for the SQIR-axiom-closure pipeline.

defcuccaro_subConstGate

def cuccaro_subConstGate (bits q_start N : Nat) : Gate

*Cuccaro subtract-constant gate** (exact-budget). Implemented as add by the two's-complement of `N`.

defcuccaro_subConstSpec

def cuccaro_subConstSpec (bits N x : Nat) : Nat

*Wraparound spec for subtract.** The target register after a subtract-constant equals `(x + (2^bits - N)) mod 2^bits`. In the non-underflow case (`x ≥ N`) this reduces to `x - N`; in the underflow case (`x < N`) it equals `x + 2^bits - N`.

theoremcuccaro_subConstSpec_of_le

theorem cuccaro_subConstSpec_of_le
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx : x < 2^bits) (hle : N ≤ x) :
    cuccaro_subConstSpec bits N x = x - N

*Non-underflow case.** When `N ≤ x`, the wraparound spec reduces to integer subtraction.

theoremcuccaro_subConstSpec_of_lt

theorem cuccaro_subConstSpec_of_lt
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx : x < N) :
    cuccaro_subConstSpec bits N x = x + 2^bits - N

*Underflow case.** When `x < N`, the wraparound spec equals `x + 2^bits - N`.

theoremcuccaro_subConstGate_target_decode

theorem cuccaro_subConstGate_target_decode
    (bits q_start N x : Nat) (h1N : 1 ≤ N) (hN : N ≤ 2^bits) :
    cuccaro_target_val bits q_start
      (Gate.applyNat (cuccaro_subConstGate bits q_start N)
        (cuccaro_input_F q_start false 0 x))
    = cuccaro_subConstSpec bits N x

*HEADLINE — subtract-constant target decode.** After `cuccaro_subConstGate bits q_start N` on `cuccaro_input_F q_start false 0 x`, the target register decodes to `cuccaro_subConstSpec bits N x`.

theoremcuccaro_subConstGate_wellTyped

theorem cuccaro_subConstGate_wellTyped
    (bits q_start N dim : Nat) (h : q_start + 2 * bits + 1 ≤ dim) :
    Gate.WellTyped dim (cuccaro_subConstGate bits q_start N)

*subtract-constant WellTyped.**

theoremcuccaro_subConstGate_clean

theorem cuccaro_subConstGate_clean
    (bits q_start N x : Nat) (h1N : 1 ≤ N) (hN : N ≤ 2^bits) :
    Gate.WellTyped (q_start + (2 * bits + 1))
        (cuccaro_subConstGate bits q_start N)
    ∧ cuccaro_target_val bits q_start
          (Gate.applyNat (cuccaro_subConstGate bits q_start N)
            (cuccaro_input_F q_start false 0 x))
        = cuccaro_subConstSpec bits N x
    ∧ cuccaro_read_val bits q_start
          (Gate.applyNat (cuccaro_subConstGate bits q_start N)
            (cuccaro_input_F q_start false 0 x))
        = 0

*HEADLINE — packaged clean subtract-constant primitive.** - WellTyped at dimension `q_start + (2*bits + 1)`; - target decode = `cuccaro_subConstSpec bits N x`; - read register restored to 0; - carry-in qubit restored to false.

theoremcuccaro_subConstGate_clean_state_loses_underflow_info

theorem cuccaro_subConstGate_clean_state_loses_underflow_info
    (bits q_start N x : Nat) (h1N : 1 ≤ N) (hN : N ≤ 2^bits) :
    -- Carry-in restored to false.
    (Gate.applyNat (cuccaro_subConstGate bits q_start N)
        (cuccaro_input_F q_start false 0 x) q_start = false)
    ∧
    -- Every read-register qubit is false.
    (∀ i, i < bits →
        Gate.applyNat (cuccaro_subConstGate bits q_start N)
          (cuccaro_input_F q_start false 0 x) (q_start + 2 * i + 2) = false)

*Formal blocker — clean subtract leaves NO single bit holding the borrow flag.** Every ancilla qubit within the `2*bits + 1` adder budget is restored. In particular: - the carry-in qubit at `q_start` holds `false`; - every read-register qubit at `q_start + 2*i + 2` (i < bits) holds `false`. Consequence: an exact-budget modular-reduction step that needs the borrow flag cannot extract it from any single output qubit of the clean subtract-constant gate. A different construction is required (see PROGRESS.md tick-46 status for the path forward).

theoremcuccaro_subConstSpec_underflow_range

theorem cuccaro_subConstSpec_underflow_range
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    (x < N → 2^bits - N ≤ cuccaro_subConstSpec bits N x)
    ∧ (N ≤ x → cuccaro_subConstSpec bits N x < 2^bits - N)

*Target encodes borrow via Nat-order, not via a single Boolean.** In the underflow case, the target value lies in `[2^bits - N, 2^bits - 1]`; in the non-underflow case it lies in `[0, 2^bits - N - 1]`.

FormalRV.Arithmetic.Cuccaro.CuccaroUComBridge

FormalRV/Arithmetic/Cuccaro/CuccaroUComBridge.lean

FormalRV.Arithmetic.Cuccaro.CuccaroUComBridge — Cuccaro-specific corollaries of the generic Gate→UCom bridge. These lemmas instantiate the certified-optimizer semantic preservation theorems (`FormalRV.Arithmetic.GateToUCom`) at the Cuccaro MAJ/UMA gate blocks. They live here (not in `GateToUCom`) so that the generic bridge — and everything importing it, including the `Adder` interface — does not transitively pull in the Cuccaro gate definitions.

theoremuc_eval_optimize_to_fixpoint_cuccaro_MAJ

theorem uc_eval_optimize_to_fixpoint_cuccaro_MAJ {dim : Nat}
    (a b c : Nat) (h_wt : Gate.WellTyped dim (cuccaro_MAJ a b c)) :
    uc_eval (Gate.toUCom dim (optimize_to_fixpoint (cuccaro_MAJ a b c)))
      = uc_eval (Gate.toUCom dim (cuccaro_MAJ a b c))

Cuccaro MAJ: the certified optimizer preserves its semantics (which is non-trivially the majority function on bits).

theoremuc_eval_optimize_to_fixpoint_cuccaro_UMA

theorem uc_eval_optimize_to_fixpoint_cuccaro_UMA {dim : Nat}
    (a b c : Nat) (h_wt : Gate.WellTyped dim (cuccaro_UMA a b c)) :
    uc_eval (Gate.toUCom dim (optimize_to_fixpoint (cuccaro_UMA a b c)))
      = uc_eval (Gate.toUCom dim (cuccaro_UMA a b c))

Cuccaro UMA analog.

example(example)

example (a b c : Nat) :
    tcount (cuccaro_MAJ a b c) + tcount (cuccaro_UMA a b c) = 14

Documented limitation: on `seq MAJ UMA`, the natural CCX-CCX boundary between MAJ-end and UMA-start is NOT caught by the current optimizer because association blocks the pattern. `seq MAJ UMA` has shape `seq (seq ... (CCX a b c)) (seq (CCX a b c) ...)` where the two CCXs are at different nesting depths. T-count stays at 14 (= 7 + 7) without an associativity-normalizing preprocessor. Smoke test: optimizer leaves it at 14 T per `MAJ_UMA_pair_tcount`.

FormalRV.Arithmetic.Cuccaro.CuccaroVariantsResource

FormalRV/Arithmetic/Cuccaro/CuccaroVariantsResource.lean

FormalRV.Arithmetic.Cuccaro.CuccaroVariantsResource ─────────────────────────────────────────────────── THE time-resource (T-count) theorems for every DERIVED Cuccaro gadget — closing the audit gap "the variants have semantics + WellTyped but no count theorems" (arithmetic-gadget audit, 2026-06-10). Every theorem is ANCHORED: the left-hand side is the independent tree-walker `Gate.tcount` (= `Resource.countT` via the bridge in `Resource/GateCount.lean`) applied to THE SAME syntactic object the variant's semantic-correctness theorem verifies. Nothing here can cheat — the counters live in their own world and the numbers are forced by the trees. Closed forms (per `bits`-bit gadget; preparations are X/CX-only, hence T-free): cuccaro_prepareConstRead 0 cuccaro_addConstGate 14·bits (prepare ; adder ; prepare) cuccaro_subConstGate 14·bits (= addConst (2^bits − N)) cuccaro_compareConstForwardGate 7·bits (prepare ; MAJ chain) cuccaro_subConstForward/ReverseOnly 7·bits each cuccaro_maj_chain_inv 7·bits sqir_prepareMaskedConstRead 0 sqir_conditionalAdd/SubConstGate 14·bits sqir_style_compareConst_candidate 14·bits (compute ; CX ; uncompute) sqir_controlledCompareConst 14·bits sqir_style_modAddConst_skeleton 28·bits sqir_style_modAddConst_dirtyFlag 42·bits sqir_style_modAddConst_clean_* 56·bits (the ModularAdder/Cuccaro gate) sqir_style_controlledModAddConst_* 56·bits NOTE: the SQIR-chain prerequisite lemmas (prepare/conditional/comparator/ controlled-candidate counts) are PRIVATE here because `ModMult/Internal/ ToffoliCount.lean` declares public twins of the same names (it is in this file's downstream import closure via the ModularAdder umbrella). Follow-up consolidation: make THIS file the canonical home and have ModMult import it. Cross-check: the proven ModMult composite `modmult_tcount = 112·bits²` is exactly `2 × bits × 56·bits` — `bits` controlled mod-adds at `56·bits` each, forward + uncompute. The per-gadget counts here are the missing per-layer anchors beneath that composite.

theoremtcount_cuccaro_addConstGate

theorem tcount_cuccaro_addConstGate (bits q_start c : Nat) :
    tcount (cuccaro_addConstGate bits q_start c) = 14 * bits

*Add-constant T-count = 14·bits** — the same syntactic object verified by `cuccaro_addConstGate_clean`.

theoremtcount_cuccaro_subConstGate

theorem tcount_cuccaro_subConstGate (bits q_start N : Nat) :
    tcount (cuccaro_subConstGate bits q_start N) = 14 * bits

*Sub-constant T-count = 14·bits** — the object verified by `cuccaro_subConstGate_clean` (two's-complement add).

theoremtcount_cuccaro_compareConstForwardGate

theorem tcount_cuccaro_compareConstForwardGate (bits q_start N : Nat) :
    tcount (cuccaro_compareConstForwardGate bits q_start N) = 7 * bits

*Forward comparator T-count = 7·bits** — the object verified by `cuccaro_compareConstGate_top_carry`.

theoremtcount_cuccaro_subConstForwardOnlyGate

theorem tcount_cuccaro_subConstForwardOnlyGate (bits q_start N : Nat) :
    tcount (cuccaro_subConstForwardOnlyGate bits q_start N) = 7 * bits

Forward-only subtract half (= the forward comparator).

theoremtcount_cuccaro_subConstReverseOnlyGate

theorem tcount_cuccaro_subConstReverseOnlyGate (bits q_start N : Nat) :
    tcount (cuccaro_subConstReverseOnlyGate bits q_start N) = 7 * bits

Reverse-only subtract half: the reverse UMA chain (7·bits), prepare free.

theoremtcount_cuccaro_maj_chain_inv

private theorem tcount_cuccaro_maj_chain_inv (n q_start : Nat) :
    tcount (cuccaro_maj_chain_inv n q_start) = 7 * n

theoremtcount_sqir_style_compareConst_candidate

private theorem tcount_sqir_style_compareConst_candidate (bits q_start N flagPos : Nat) :
    tcount (sqir_style_compareConst_candidate bits q_start N flagPos) = 14 * bits

*SQIR comparator T-count = 14·bits** — the object verified by `sqir_style_compareConst_candidate_flag` / `…_workspace_restored_at`.

theoremtcount_sqir_controlledCompareConst

private theorem tcount_sqir_controlledCompareConst (bits q_start c controlIdx flagPos : Nat) :
    tcount (sqir_controlledCompareConst bits q_start c controlIdx flagPos) = 14 * bits

Controlled comparator (masked prepares): same `14·bits`.

theoremtcount_sqir_conditionalAddConstGate

private theorem tcount_sqir_conditionalAddConstGate (bits q_start N flagPos : Nat) :
    tcount (sqir_conditionalAddConstGate bits q_start N flagPos) = 14 * bits

*Conditional add-constant T-count = 14·bits** (masked prepare ; adder ; masked unprepare) — the object whose WellTyped is `sqir_conditionalAddConstGate_wellTyped`.

theoremtcount_sqir_conditionalSubConstGate

private theorem tcount_sqir_conditionalSubConstGate (bits q_start N flagPos : Nat) :
    tcount (sqir_conditionalSubConstGate bits q_start N flagPos) = 14 * bits

theoremtcount_sqir_style_modAddConst_skeleton

theorem tcount_sqir_style_modAddConst_skeleton (bits q_start N c flagPos : Nat) :
    tcount (sqir_style_modAddConst_skeleton bits q_start N c flagPos) = 28 * bits

Skeleton (add ; compare): `14 + 14 = 28·bits`.

theoremtcount_sqir_style_modAddConst_dirtyFlag_candidate

theorem tcount_sqir_style_modAddConst_dirtyFlag_candidate
    (bits q_start N c flagPos : Nat) :
    tcount (sqir_style_modAddConst_dirtyFlag_candidate bits q_start N c flagPos)
      = 42 * bits

Dirty-flag pipeline (add ; compare ; conditional-sub): `42·bits` — the object verified by `sqir_style_modAddConst_dirtyFlag_clean_except_flag`.

theoremtcount_sqir_style_modAddConst_clean_candidate

theorem tcount_sqir_style_modAddConst_clean_candidate
    (bits q_start N c flagPos : Nat) :
    tcount (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos)
      = 56 * bits

Clean pipeline (dirty ; flag-uncompute compare ; X): `56·bits` — the object verified by `sqir_style_modAddConst_clean_candidate_clean`.

theoremtcount_sqir_style_modAddConst_clean_gate

theorem tcount_sqir_style_modAddConst_clean_gate (bits N c : Nat) :
    tcount (sqir_style_modAddConst_clean_gate bits N c)
      = if c = 0 then 0 else 56 * bits

*THE clean modular add-constant gate (= `ModularAdder/Cuccaro`'s gate): T-count = 56·bits** (0 in the dispatched `c = 0` identity case) — the object verified by `sqir_style_modAddConst_clean_gate`'s correctness bundle.

theoremtcount_sqir_style_controlledModAddConst_candidate

private theorem tcount_sqir_style_controlledModAddConst_candidate
    (bits q_start N c controlIdx flagPos : Nat) :
    tcount (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
      = 56 * bits

Controlled candidate (cond-add ; compare ; cond-sub ; ctrl-compare ; CX): `14·4 = 56·bits` — the object verified by `sqir_style_controlledModAddConst_candidate_clean_qstart`.

theoremtcount_sqir_style_controlledModAddConst_gate

theorem tcount_sqir_style_controlledModAddConst_gate
    (bits q_start N c controlIdx flagPos : Nat) :
    tcount (sqir_style_controlledModAddConst_gate bits q_start N c controlIdx flagPos)
      = if c = 0 then 0 else 56 * bits

*THE controlled clean modular add-constant gate: T-count = 56·bits** (0 for `c = 0`). ModMult's `112·bits²` = `2 × bits ×` THIS count — the per-layer anchor beneath the proven composite.

example(example)

example : tcount (cuccaro_addConstGate 4 0 11) = 56

example(example)

example : tcount (cuccaro_compareConstForwardGate 4 0 11) = 28

example(example)

example : tcount (sqir_style_modAddConst_clean_gate 3 5 2) = 168

FormalRV.Arithmetic.GateShift

FormalRV/Arithmetic/GateShift.lean

FormalRV.Arithmetic.GateShift — relabel every qubit of a `Gate` by a constant base offset `+b`, with the transport law for `Gate.applyNat`, count-invariance, and the disjointness frame. A general reusable utility for placing a circuit at an arbitrary base in a wider register (the qubit-index analogue of the `Adder`'s `q_start` parameter, but for ANY `Gate`). Its purpose here is the CFS multi-register fold: `shiftGate (j·width) residueGate` puts residue register `j` at its own disjoint base, and the transport law lets `residueGate_verified` (proven at base 0) carry to every base — "reuse-via-transport", no re-derivation of the windowed multiplier internals. Key lemmas: `applyNat_shiftGate` — the shifted gate, at index `i ≥ b`, acts like `g` on the down-shifted register `(·+b)`; at `i < b` it is the identity. `tcount_shiftGate` — relabeling preserves the T/Toffoli count. `shiftGate_frame` — the shifted gate fixes every qubit below `b` (left disjointness).

defshiftGate

def shiftGate (b : Nat) : Gate → Gate
  | Gate.I            => Gate.I
  | Gate.X q          => Gate.X (q + b)
  | Gate.CX c t       => Gate.CX (c + b) (t + b)
  | Gate.CCX a c d    => Gate.CCX (a + b) (c + b) (d + b)
  | Gate.seq g₁ g₂    => Gate.seq (shiftGate b g₁) (shiftGate b g₂)

Relabel every qubit index of `g` by `+b`.

theoremtcount_shiftGate

theorem tcount_shiftGate (b : Nat) (g : Gate) : tcount (shiftGate b g) = tcount g

Relabeling preserves the T-count (only `CCX` count, index-independent).

theoremapplyNat_shiftGate

theorem applyNat_shiftGate (b : Nat) (g : Gate) :
    ∀ (f : Nat → Bool) (i : Nat),
      Gate.applyNat (shiftGate b g) f i
        = if b ≤ i then Gate.applyNat g (fun j => f (j + b)) (i - b) else f i

*The transport law.** The base-`b`-shifted gate acts on every index `i`: for `i ≥ b` it is `g` applied to the down-shifted register `(·+b)` read at `i - b`; for `i < b` it leaves `f i` untouched. Proven by induction on `g` (each `update` becomes an `ite`).

theoremshiftGate_frame

theorem shiftGate_frame (b : Nat) (g : Gate) (f : Nat → Bool) (i : Nat) (hi : i < b) :
    Gate.applyNat (shiftGate b g) f i = f i

*Left disjointness frame.** The base-`b`-shifted gate fixes every qubit strictly below `b`.

theoremapplyNat_shiftGate_at

theorem applyNat_shiftGate_at (b : Nat) (g : Gate) (f : Nat → Bool) (i : Nat) :
    Gate.applyNat (shiftGate b g) f (i + b) = Gate.applyNat g (fun j => f (j + b)) i

The shifted gate, read at a shifted index, is the down-shifted action — the clean corollary the multi-register fold consumes.

theoremapplyNat_congr_lt

theorem applyNat_congr_lt (dim : Nat) :
    ∀ (g : Gate), Gate.WellTyped dim g → ∀ (f f' : Nat → Bool),
      (∀ p, p < dim → f p = f' p) → ∀ q, q < dim → Gate.applyNat g f q = Gate.applyNat g f' q

*Input locality.** A `dim`-well-typed gate's action on every qubit `< dim` depends ONLY on the input restricted to `[0, dim)` — it never reads or writes outside its declared register. Proven by induction over the `Gate` IR, using that `WellTyped` bounds every qubit index by `dim`. This is what lets a circuit placed in a wider register (the CFS multi-register fold) be reasoned about from its own block alone, even though the surrounding qubits hold other registers' data.

FormalRV.Arithmetic.GateToUCom

FormalRV/Arithmetic/GateToUCom.lean

FormalRV.BQAlgo.GateToUCom — translation from the BQ-Algo `Gate` IR (used for cost accounting + optimization) to the Framework `BaseUCom` (used for semantic reasoning). The translation is faithful in the obvious sense: each `Gate` constructor maps to its `BaseUCom` analog, and `seq` becomes `UCom.seq`. This enables lifting BQ-Algo optimization theorems (tcount/gcount monotonicity) to BaseUCom semantic-preservation proofs via the existing `Framework` layer. Status: translation function + structural unfolding lemmas. Semantic preservation theorems (e.g., `uc_eval (toUCom (optimize_full g)) = uc_eval (toUCom g)`) are the natural next milestones.

defGate.toUCom

noncomputable def Gate.toUCom (dim : Nat) : Gate → BaseUCom dim
  | Gate.I            => BaseUCom.ID 0
  | Gate.X q          => BaseUCom.X q
  | Gate.CX c t       => BaseUCom.CNOT c t
  | Gate.CCX a b c    => BaseUCom.CCX a b c
  | Gate.seq g₁ g₂    => UCom.seq (Gate.toUCom dim g₁) (Gate.toUCom dim g₂)

Translate a BQ-Algo `Gate` into a `BaseUCom dim`. Identity, X, CNOT, and Toffoli have direct analogs; `Gate.seq` becomes `UCom.seq`. Marked `noncomputable` because `BaseUCom` carries real-valued matrix data downstream.

theoremuc_eval_toUCom_optimize_ccx_pair_top_pair

theorem uc_eval_toUCom_optimize_ccx_pair_top_pair {dim : Nat} (a b c : Nat)
    (h0 : 0 < dim)
    (ha : a < dim) (hb : b < dim) (hc : c < dim)
    (hab : a ≠ b) (hac : a ≠ c) (hbc : b ≠ c) :
    uc_eval (Gate.toUCom dim
      (optimize_ccx_pair_top (Gate.seq (Gate.CCX a b c) (Gate.CCX a b c))))
      = uc_eval (Gate.toUCom dim (Gate.seq (Gate.CCX a b c) (Gate.CCX a b c)))

Semantic preservation of the top-level CCX-pair rewrite on the matching-triple case. uc_eval of the optimized output (which is `BaseUCom.ID 0`) equals uc_eval of the input (`UCom.seq CCX CCX`) — both reduce to the identity matrix.

theoremuc_eval_toUCom_optimize_ccx_pair_top_I

theorem uc_eval_toUCom_optimize_ccx_pair_top_I {dim : Nat} :
    uc_eval (Gate.toUCom dim (optimize_ccx_pair_top Gate.I))
      = uc_eval (Gate.toUCom dim Gate.I)

theoremuc_eval_toUCom_optimize_ccx_pair_top_X

theorem uc_eval_toUCom_optimize_ccx_pair_top_X {dim q : Nat} :
    uc_eval (Gate.toUCom dim (optimize_ccx_pair_top (Gate.X q)))
      = uc_eval (Gate.toUCom dim (Gate.X q))

theoremuc_eval_toUCom_optimize_ccx_pair_top_CX

theorem uc_eval_toUCom_optimize_ccx_pair_top_CX {dim a b : Nat} :
    uc_eval (Gate.toUCom dim (optimize_ccx_pair_top (Gate.CX a b)))
      = uc_eval (Gate.toUCom dim (Gate.CX a b))

theoremuc_eval_toUCom_optimize_ccx_pair_top_CCX

theorem uc_eval_toUCom_optimize_ccx_pair_top_CCX {dim a b c : Nat} :
    uc_eval (Gate.toUCom dim (optimize_ccx_pair_top (Gate.CCX a b c)))
      = uc_eval (Gate.toUCom dim (Gate.CCX a b c))

theoremuc_eval_toUCom_optimize_ccx_pair_top_pair_diff

theorem uc_eval_toUCom_optimize_ccx_pair_top_pair_diff {dim : Nat}
    (a b c a' b' c' : Nat) (h : ¬ (a = a' ∧ b = b' ∧ c = c')) :
    uc_eval (Gate.toUCom dim
      (optimize_ccx_pair_top (Gate.seq (Gate.CCX a b c) (Gate.CCX a' b' c'))))
      = uc_eval (Gate.toUCom dim (Gate.seq (Gate.CCX a b c) (Gate.CCX a' b' c')))

When the two CCXs have differing triples, the optimizer leaves the circuit unchanged.

defGate.WellTyped

def Gate.WellTyped (dim : Nat) : Gate → Prop
  | Gate.I            => 0 < dim
  | Gate.X q          => q < dim
  | Gate.CX a b       => a < dim ∧ b < dim ∧ a ≠ b
  | Gate.CCX a b c    => a < dim ∧ b < dim ∧ c < dim ∧ a ≠ b ∧ a ≠ c ∧ b ≠ c
  | Gate.seq g₁ g₂    => Gate.WellTyped dim g₁ ∧ Gate.WellTyped dim g₂

A `Gate` is well-typed in `dim`-qubit context iff every contained gate-position is within `dim` and CCXs have distinct controls/target.

theoremuc_eval_toUCom_optimize_ccx_pair_top

theorem uc_eval_toUCom_optimize_ccx_pair_top {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    uc_eval (Gate.toUCom dim (optimize_ccx_pair_top g))
      = uc_eval (Gate.toUCom dim g)

*Unified semantic preservation** for the top-level CCX-pair rewrite. Combines the matching-pair case (uses CCX_CCX_eq_one) with all no-op cases (rfl + if_neg).

theoremGate.WellTyped_optimize_ccx_pair_top

theorem Gate.WellTyped_optimize_ccx_pair_top {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    Gate.WellTyped dim (optimize_ccx_pair_top g)

Well-typedness is preserved by the top-level CCX-pair rewrite. The interesting case: when the optimizer fires on a CCX-CCX pair, the output `I` requires `0 < dim`, which we can extract from the inner CCXs' well-typedness.

theoremGate.WellTyped_optimize_ccx_pairs_deep

theorem Gate.WellTyped_optimize_ccx_pairs_deep {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    Gate.WellTyped dim (optimize_ccx_pairs_deep g)

Well-typedness is preserved by the deep optimizer. Inductive on `g`.

theoremuc_eval_toUCom_optimize_ccx_pairs_deep

theorem uc_eval_toUCom_optimize_ccx_pairs_deep {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    uc_eval (Gate.toUCom dim (optimize_ccx_pairs_deep g))
      = uc_eval (Gate.toUCom dim g)

*Semantic preservation for the deep optimizer.** Inductive on `g`, using the top-level unified theorem at each `seq` step.

theoremuc_eval_toUCom_optimize_I_top

theorem uc_eval_toUCom_optimize_I_top {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    uc_eval (Gate.toUCom dim (optimize_I_top g))
      = uc_eval (Gate.toUCom dim g)

Semantic preservation for the top-level I-elimination rewrite. The interesting cases: `seq I g → g` and `seq g I → g`. Both use `uc_eval_ID_eq_one`.

theoremGate.WellTyped_optimize_I_top

theorem Gate.WellTyped_optimize_I_top {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    Gate.WellTyped dim (optimize_I_top g)

Well-typedness is preserved by the top-level I-elimination rewrite. `seq I g → g` and `seq g I → g` only drop an I (which is well-typed iff 0 < dim, propagated from any inner CCX or the seq's other half).

theoremGate.WellTyped_optimize_I_pairs_deep

theorem Gate.WellTyped_optimize_I_pairs_deep {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    Gate.WellTyped dim (optimize_I_pairs_deep g)

Well-typedness is preserved by the deep I-elimination optimizer.

theoremuc_eval_toUCom_optimize_I_pairs_deep

theorem uc_eval_toUCom_optimize_I_pairs_deep {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    uc_eval (Gate.toUCom dim (optimize_I_pairs_deep g))
      = uc_eval (Gate.toUCom dim g)

Semantic preservation for the deep I-elimination optimizer.

theoremuc_eval_toUCom_optimize_full

theorem uc_eval_toUCom_optimize_full {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    uc_eval (Gate.toUCom dim (optimize_full g))
      = uc_eval (Gate.toUCom dim g)

*Semantic preservation for the full optimizer.** Compose the CCX-deep and I-deep preservations.

theoremGate.WellTyped_optimize_full

theorem Gate.WellTyped_optimize_full {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    Gate.WellTyped dim (optimize_full g)

Well-typedness is preserved by `optimize_full`. Compose the two deep preservations.

theoremGate.WellTyped_optimize_to_fixpoint

theorem Gate.WellTyped_optimize_to_fixpoint {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    Gate.WellTyped dim (optimize_to_fixpoint g)

Well-typedness is preserved by the WF-recursive fixpoint operator. Same shape as the cost-monotonicity proofs: case-split on `has_ccx_pair g`, recurse via `_eq_recurse_of_pair`, base case via `_eq_self_of_no_pair`.

theoremuc_eval_toUCom_optimize_to_fixpoint

theorem uc_eval_toUCom_optimize_to_fixpoint {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    uc_eval (Gate.toUCom dim (optimize_to_fixpoint g))
      = uc_eval (Gate.toUCom dim g)

*Semantic preservation for the WF-recursive fixpoint operator.** Closes the certification stack: the unfueled `optimize_to_fixpoint` is formally proven to terminate, produce pair-free output, decrease both tcount and gcount monotonically, AND preserve uc_eval.

theoremoptimize_to_fixpoint_uc_equiv

theorem optimize_to_fixpoint_uc_equiv {dim : Nat}
    (g : Gate) (h_wt : Gate.WellTyped dim g) :
    UCom.equiv (Gate.toUCom dim (optimize_to_fixpoint g))
               (Gate.toUCom dim g)

UCom.equiv-form of the WF-fixpoint preservation. Clients reasoning in the UCom semantic layer can use this directly.

theoremuc_eval_toUCom_assoc_right_step

theorem uc_eval_toUCom_assoc_right_step {dim : Nat} (g : Gate) :
    uc_eval (Gate.toUCom dim (assoc_right_step g))
      = uc_eval (Gate.toUCom dim g)

The single-step associativity rotation `assoc_right_step` preserves `uc_eval` semantics. Reduces to `Matrix.mul_assoc` after unfolding `UCom.seq`'s right-to-left matrix multiplication.

theoremassoc_right_step_uc_equiv

theorem assoc_right_step_uc_equiv {dim : Nat} (g : Gate) :
    UCom.equiv (Gate.toUCom dim (assoc_right_step g))
               (Gate.toUCom dim g)

UCom.equiv form: the rotation produces an equivalent circuit.

theoremuc_eval_toUCom_assoc_right_iter

theorem uc_eval_toUCom_assoc_right_iter {dim : Nat} (n : Nat) (g : Gate) :
    uc_eval (Gate.toUCom dim (assoc_right_iter n g))
      = uc_eval (Gate.toUCom dim g)

Iterated rotation preserves uc_eval. Induction on fuel + each step's semantic preservation.

FormalRV.Arithmetic.MCPBridge

FormalRV/Arithmetic/MCPBridge.lean

FormalRV.BQAlgo.MCPBridge — promotion of a Gate-IR Boolean semantics into the `MultiplyCircuitProperty` shape required by `SQIRPort/Shor.lean`. This module imports both `BQAlgo.Correctness` (the structural `Gate.applyNat` → `f_to_vec` adapter) and `SQIRPort.Shor` (the declarations of `uc_eval` and `MultiplyCircuitProperty`). The single exported theorem is `toUCom_satisfies_MultiplyCircuitProperty_of_applyNat`: given a `Gate` IR term `g` together with an encoding `encode` of data-register inputs into bit-functions, and proofs that (a) `f_to_vec (n+anc) (encode x) = basis_vector … (x · 2^anc)`, and (b) `f_to_vec (n+anc) (Gate.applyNat g (encode x)) = basis_vector … ((a · x mod N) · 2^anc)`, conclude that `Gate.toUCom (n+anc) g` satisfies `MultiplyCircuitProperty a N n anc`. This is the exact statement consumed by `f_modmult_circuit_MMI`.

theoremtoUCom_satisfies_MultiplyCircuitProperty_of_applyNat

theorem toUCom_satisfies_MultiplyCircuitProperty_of_applyNat
    {a N n anc : Nat} {g : Gate}
    (h_wt : Gate.WellTyped (n + anc) g)
    (encode : Nat → (Nat → Bool))
    (h_input_encoded :
      ∀ x : Nat, x < N →
        f_to_vec (n + anc) (encode x)
          = FormalRV.Framework.basis_vector (2^(n+anc)) (x * 2^anc))
    (h_output_encoded :
      ∀ x : Nat, x < N →
        f_to_vec (n + anc) (Gate.applyNat g (encode x))
          = FormalRV.Framework.basis_vector (2^(n+anc))

*Gate IR ⟹ `MultiplyCircuitProperty` promotion.** Given a well-typed `Gate` term `g` on `n+anc` qubits, plus an encoding `encode : Nat → (Nat → Bool)` of inputs as bit-functions that (i) Boolean-encodes `x` as the basis state `|x · 2^anc⟩` (data register holds `x`, ancilla holds 0), and (ii) under the Gate IR's Boolean semantics, `g`'s action takes the encoded input to the encoded image `|(a · x mod N) · 2^anc⟩`, the compiled `Gate.toUCom (n+anc) g` satisfies `MultiplyCircuitProperty a N n anc`. This is the exact precondition demanded by `f_modmult_circuit_MMI` in `SQIRPort/Shor.lean`; once a constructive `Gate`-level modular multiplier `g_modmult` is supplied with the two encoding lemmas (i)/(ii), the axiom can be discharged by `toUCom_satisfies_MultiplyCircuitProperty_of_applyNat`.

theoremtoUCom_satisfies_MultiplyCircuitProperty_of_applyNat_ext

theorem toUCom_satisfies_MultiplyCircuitProperty_of_applyNat_ext
    {a N n anc : Nat} {g : Gate}
    (h_wt : Gate.WellTyped (n + anc) g)
    (encode : Nat → (Nat → Bool))
    (h_encode :
      ∀ y : Nat, y < N →
        f_to_vec (n + anc) (encode y)
          = FormalRV.Framework.basis_vector (2^(n+anc)) (y * 2^anc))
    (h_apply :
      ∀ x : Nat, x < N →
        Gate.applyNat g (encode x) = encode ((a * x) % N)) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N n anc

*Extensional (purely Boolean) Gate IR ⟹ `MultiplyCircuitProperty`.** This is the cleanest user-facing adapter for discharging `f_modmult_circuit_MMI`: the output obligation is now a *purely Boolean function equality* `Gate.applyNat g (encode x) = encode ((a * x) % N)` which contains no matrix, vector, or `f_to_vec` machinery. The matrix-level lift is entirely handled inside this theorem by appealing to `toUCom_satisfies_MultiplyCircuitProperty_of_applyNat` and `h_encode`. The only encoding-level hypothesis required is `h_encode` (single direction: bit-function → basis-vector at packed index `y * 2^anc`), which only has to be proved *once* for the chosen encoding scheme — not separately for every `x` and every image `(a * x) % N`. No extra side condition such as `0 < N` is needed: the bound is extracted from `x < N` via `Nat.lt_of_le_of_lt (Nat.zero_le _) hxN`, and then `(a * x) % N < N` follows from `Nat.mod_lt`.

defencodeDataZeroAnc

def encodeDataZeroAnc (n anc : Nat) (x : Nat) : Nat → Bool

Canonical Boolean encoding of the input register for the modular multiplier on `n` data qubits + `anc` ancilla qubits. Defined as `nat_to_funbool (n + anc) (x * 2^anc)`: the bit-function that produces the basis state `|x⟩|0_anc⟩` at index `x · 2^anc` via the big-endian `funbool_to_nat` convention.

theoremf_to_vec_encodeDataZeroAnc

theorem f_to_vec_encodeDataZeroAnc {n anc y : Nat} (hy : y < 2^n) :
    f_to_vec (n + anc) (encodeDataZeroAnc n anc y)
      = FormalRV.Framework.basis_vector (2^(n+anc)) (y * 2^anc)

The canonical encoding produces the basis state at index `y · 2^anc`. Direct specialisation of `basis_vector_eq_f_to_vec_nat` for the `x · 2^anc` family of indices. The required bound `y * 2^anc < 2^(n+anc)` follows from `y < 2^n` via `Nat.mul_lt_mul_of_pos_right`.

theoremencodeDataZeroAnc_data

theorem encodeDataZeroAnc_data
    {n anc x i : Nat}
    (_hx : x < 2^n) (hi : i < n) :
    encodeDataZeroAnc n anc x i
      = FormalRV.Framework.nat_to_funbool n x i

*Data-bit accessor.** For positions `i < n`, the canonical encoding's bit equals the big-endian `nat_to_funbool n x i`. Proof: `n + anc - 1 - i = (n - 1 - i) + anc`, and dividing `x * 2^anc` by `2^((n-1-i)+anc)` cancels `2^anc` via `Nat.mul_div_mul_right`.

theoremencodeDataZeroAnc_anc

theorem encodeDataZeroAnc_anc
    {n anc x j : Nat}
    (_hx : x < 2^n) (hj : j < anc) :
    encodeDataZeroAnc n anc x (n + j) = false

*Ancilla zero accessor.** For positions `n + j` with `j < anc`, the canonical encoding's bit is `false`. Proof: `n + anc - 1 - (n + j) = anc - 1 - j`, and `x * 2^anc / 2^(anc-1-j) = x * 2^(j+1)` (which is even, so `% 2 = 0`, so `decide (… = 1) = false`).

theoremencodeDataZeroAnc_ext

theorem encodeDataZeroAnc_ext
    {n anc x y : Nat}
    (hx : x < 2^n) (hy : y < 2^n)
    (hdata :
      ∀ i, i < n →
        encodeDataZeroAnc n anc x i = encodeDataZeroAnc n anc y i) :
    x = y

*Extensional injectivity of `encodeDataZeroAnc` on data bits.** If the data positions `0..n-1` of `encodeDataZeroAnc n anc x` and `encodeDataZeroAnc n anc y` agree pointwise, and both `x, y < 2^n`, then `x = y`. Proof: combine `encodeDataZeroAnc_data` (data bits = `nat_to_funbool n _`), `funbool_to_nat_congr` (agreement on `[0, n)` ⇒ same `funbool_to_nat`), and `funbool_to_nat_nat_to_funbool` (left inverse on `x < 2^n`).

theoremencodeDataZeroAnc_oob

theorem encodeDataZeroAnc_oob
    {n anc x i : Nat}
    (hanc_pos : 0 < anc) (hi : n + anc ≤ i) :
    encodeDataZeroAnc n anc x i = false

*Out-of-range accessor.** For positions `i ≥ n + anc`, the canonical encoding's bit is `false`, provided `anc ≥ 1`. Proof: the saturating Nat truncation gives `(n + anc) - 1 - i = 0`, so the value reduces to `decide ((x · 2^anc) % 2 = 1)`; `2^anc % 2 = 0` for `anc ≥ 1`, so the product is even and the decide returns `false`.

theoremeq_encodeDataZeroAnc_of_data_anc_oob

theorem eq_encodeDataZeroAnc_of_data_anc_oob
    {n anc y : Nat} {f : Nat → Bool}
    (hanc_pos : 0 < anc)
    (hy : y < 2^n)
    (hdata :
      ∀ i, i < n →
        f i = FormalRV.Framework.nat_to_funbool n y i)
    (hanc :
      ∀ j, j < anc →
        f (n + j) = false)
    (hoob :
      ∀ i, n + anc ≤ i →

*Full function reconstruction.** Any bit-function `f : Nat → Bool` that agrees with `nat_to_funbool n y` on the data band `[0, n)`, is `false` on the ancilla band `[n, n + anc)`, and is `false` outside `[0, n + anc)`, equals `encodeDataZeroAnc n anc y` as a function (under `0 < anc` and `y < 2^n`). Proved by `funext` + the three accessor lemmas `encodeDataZeroAnc_data`, `encodeDataZeroAnc_anc`, and `encodeDataZeroAnc_oob`. This is exactly the shape future modmult-correctness proofs need: the conclusion is the **function** equality `f = encodeDataZeroAnc n anc y`, not just pointwise equality on a finite band. Conversely, the hypotheses are local bit-by-bit statements that gate-IR correctness proofs naturally produce.

theoremGate.applyNat_eq_encodeDataZeroAnc_of_data_anc

theorem Gate.applyNat_eq_encodeDataZeroAnc_of_data_anc
    {n anc y : Nat} {g : Gate} {input : Nat → Bool}
    (hanc_pos : 0 < anc) (hy : y < 2^n)
    (h_wt : Gate.WellTyped (n + anc) g)
    (hdata :
      ∀ i, i < n →
        Gate.applyNat g input i
          = FormalRV.Framework.nat_to_funbool n y i)
    (hanc :
      ∀ j, j < anc →
        Gate.applyNat g input (n + j) = false)
    (hinput_oob :

*`Gate.applyNat`-specific wrapper of `eq_encodeDataZeroAnc_of_data_anc_oob`.** For a well-typed `Gate` on `n + anc` qubits, applied to an input function whose OOB region (positions `i ≥ n + anc`) is already zero, pointwise agreement on the data band `[0, n)` and on the ancilla band `[n, n + anc)` suffices to conclude the *function* equality `Gate.applyNat g input = encodeDataZeroAnc n anc y`. The OOB branch of the reconstruction is discharged automatically by `Gate.applyNat_oob` together with the user-supplied `hinput_oob`. This is exactly the shape downstream modmult-correctness proofs will produce: data-region semantic correctness of the arithmetic circuit plus ancilla-restoration of the workspace, then this lemma packages them into the function equality consumed by `toUCom_satisfies_MultiplyCircuitProperty_of_applyNat_encodeDataZeroAnc`.

theoremtoUCom_satisfies_MultiplyCircuitProperty_of_applyNat_encodeDataZeroAnc

theorem toUCom_satisfies_MultiplyCircuitProperty_of_applyNat_encodeDataZeroAnc
    {a N n anc : Nat} {g : Gate}
    (h_wt : Gate.WellTyped (n + anc) g)
    (hN : N ≤ 2^n)
    (h_apply :
      ∀ x : Nat, x < N →
        Gate.applyNat g (encodeDataZeroAnc n anc x)
          = encodeDataZeroAnc n anc ((a * x) % N)) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N n anc
      (Gate.toUCom (n + anc) g)

*Encoding-specific `MultiplyCircuitProperty` adapter.** Instantiates `toUCom_satisfies_MultiplyCircuitProperty_of_applyNat_ext` with the canonical `encodeDataZeroAnc` encoding. The user-side hypothesis reduces to the purely Boolean equality `Gate.applyNat g (encodeDataZeroAnc n anc x) = encodeDataZeroAnc n anc ((a * x) % N)`, with the additional bound `N ≤ 2^n` (necessary for `y < N` to imply `y < 2^n` so the encoding theorem applies). All matrix-vector machinery, all bit-order convention, and all index arithmetic are now hidden inside this theorem; downstream Boolean modmult correctness proofs need only reason about `Gate.applyNat`.

theoremuc_well_typed_toUCom_of_Gate_WellTyped

theorem uc_well_typed_toUCom_of_Gate_WellTyped
    (dim : Nat) (g : Gate) (h : Gate.WellTyped dim g) :
    FormalRV.SQIRPort.uc_well_typed (Gate.toUCom dim g)

*General `Gate.WellTyped` ⟹ `uc_well_typed (Gate.toUCom ...)` bridge.** For any `Gate` IR term `g`, structural well-typedness at dimension `dim` implies the compiled `BaseUCom` is well-typed at `dim`. Proven by structural induction on `g`.

theoremf_modmult_gate_family_uc_well_typed

theorem f_modmult_gate_family_uc_well_typed
    (bits N a multBits : Nat) (hbits : 1 ≤ bits) :
    ∀ i, FormalRV.SQIRPort.uc_well_typed
            (Gate.toUCom (multBits + (adder_n_qubits (bits + 1) + 1))
              (f_modmult_gate_family bits N a multBits i))

*`f_modmult_gate_family` is `uc_well_typed` at every iterate.** The analog of `f_modmult_circuit_uc_well_typed` for our gate family (at the Shor-compatible total dimension `multBits + (adder_n_qubits (bits+1) + 1)`). Note: this discharges the well-typedness obligation for OUR family, not directly for the SQIR-derived `f_modmult_circuit` (which is itself a top-level axiom; see QUESTIONS.md 2026-05-28 03:24 for the in-place/layout gap analysis).

theoremreverse_register_swap_encodeDataZeroAnc_to_mult_state_init

theorem reverse_register_swap_encodeDataZeroAnc_to_mult_state_init
    (bits multBits x : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (hx : x < 2^multBits) :
    Gate.applyNat
      (reverse_register_swap multBits 0 (adder_n_qubits (bits + 1)))
      (encodeDataZeroAnc multBits (adder_n_qubits (bits + 1) + 1) x)
    = mult_state_init bits multBits x

*HEADLINE: Reverse SWAP converts `encodeDataZeroAnc` to `mult_state_init`.** Applied to `encodeDataZeroAnc multBits (adder_n_qubits (bits+1) + 1) x`, the reverse-pairing SWAP between positions `[0, multBits)` and `[adder_n_qubits, adder_n_qubits + multBits)` produces `mult_state_init bits multBits x`.

theoremreverse_register_swap_involution

theorem reverse_register_swap_involution
    (bits multBits : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) (f : Nat → Bool) :
    Gate.applyNat (reverse_register_swap multBits 0 (adder_n_qubits (bits + 1)))
      (Gate.applyNat (reverse_register_swap multBits 0 (adder_n_qubits (bits + 1))) f)
    = f

*Reverse-pairing SWAP is involutive.** Applying `reverse_register_swap multBits 0 (adder_n_qubits (bits+1))` twice returns the original state. This follows from the at_A/_at_B position-level lemmas: each A-side position swaps to its B-side partner and back, and other positions are untouched.

theoremreverse_register_swap_mult_state_init_to_encodeDataZeroAnc

theorem reverse_register_swap_mult_state_init_to_encodeDataZeroAnc
    (bits multBits y : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (hy : y < 2^multBits) :
    Gate.applyNat
      (reverse_register_swap multBits 0 (adder_n_qubits (bits + 1)))
      (mult_state_init bits multBits y)
    = encodeDataZeroAnc multBits (adder_n_qubits (bits + 1) + 1) y

*Converse bridge: `mult_state_init` → `encodeDataZeroAnc`.** By involution applied to the forward bridge: since `reverse_register_swap` is involutive and converts encodeDataZeroAnc x to mult_state_init x, applying it once more to mult_state_init x yields encodeDataZeroAnc x.

theoremBasicSetting_intro

theorem BasicSetting_intro
    (a r N m n : Nat)
    (h_a_pos : 0 < a) (h_a_lt : a < N)
    (h_ord : FormalRV.SQIRPort.Order a r N)
    (h_m_lo : N^2 < 2^m) (h_m_hi : 2^m ≤ 2 * N^2)
    (h_n_lo : N < 2^n) (h_n_hi : 2^n ≤ 2 * N) :
    FormalRV.SQIRPort.BasicSetting a r N m n

*Constructor for `BasicSetting`.** Bundles the four component conditions into the single anonymous-constructor form.

theoremcoprime_two_of_odd

theorem coprime_two_of_odd (N : Nat) (h_odd : Odd N) : Nat.Coprime 2 N

*`Nat.Coprime 2 N` from `Odd N`.** Direct invocation of `Odd.coprime_two_left`. Useful for users who think of "N odd" rather than "gcd(2, N) = 1".

theoremcoprime_two_iff_odd

theorem coprime_two_iff_odd (N : Nat) : Nat.Coprime 2 N ↔ Odd N

*`Nat.Coprime 2 N` iff `Odd N`.**

theoremBasicSetting_at_canonical_dim

theorem BasicSetting_at_canonical_dim
    (N a r : Nat) (h_N_gt_one : 1 < N)
    (h_a_pos : 0 < a) (h_a_lt : a < N)
    (h_ord : FormalRV.SQIRPort.Order a r N) :
    FormalRV.SQIRPort.BasicSetting a r N
      (Nat.log2 (2 * N^2)) (Nat.log2 (2 * N))

*`BasicSetting` at canonical Shor dimensions.** For any `1 < N`, `0 < a < N`, and `Order a r N`, the `BasicSetting` predicate holds at `m := Nat.log2 (2 * N^2)` and `n := Nat.log2 (2 * N)`. This packages the log2-bound derivations used by Shor's canonical-dim theorems for reuse.

theoremmodMultInPlaceShor_qubit_count_at_canonical

theorem modMultInPlaceShor_qubit_count_at_canonical (N : Nat) :
    Nat.log2 (2 * N) + (adder_n_qubits (Nat.log2 (2 * N) + 1) + 1)
    = 4 * Nat.log2 (2 * N) + 6

*Total qubit count of `modMultInPlaceShor` at canonical Shor dimensions.** The gate occupies `4 * Nat.log2 (2 * N) + 6` qubits. Comparison with SQIR's placeholder `f_modmult_circuit`: - SQIR's `f_modmult_circuit` has dimension `n + modmult_rev_anc n = 3n + 1` (where `n = Nat.log2 (2 * N)`). - Our gate has dimension `multBits + (adder_n_qubits (multBits+1) + 1) = 4n + 6`. - Overhead: `n + 5` more ancilla qubits than SQIR's placeholder. This is the explicit cost of using the FAITHFULLY VERIFIED Gidney ripple-carry adder + in-place wrapper approach. The overhead pays for kernel-clean correctness across the entire Shor pipeline.

theoremmodMultInPlaceShor_qubit_count

theorem modMultInPlaceShor_qubit_count (bits multBits : Nat) :
    multBits + (adder_n_qubits (bits + 1) + 1)
    = multBits + 3 * bits + 6

*General total qubit count formula.**

theoremBasicSetting_at_canonical_dim_from_coprime

theorem BasicSetting_at_canonical_dim_from_coprime
    (N a : Nat) (h_N_gt_one : 1 < N)
    (h_a_pos : 0 < a) (h_a_lt : a < N)
    (h_cop : Nat.Coprime a N) :
    FormalRV.SQIRPort.BasicSetting a (FormalRV.SQIRPort.ord a N) N
      (Nat.log2 (2 * N^2)) (Nat.log2 (2 * N))

*`BasicSetting` at canonical Shor dim from coprimality alone.** Variant of `BasicSetting_at_canonical_dim` (Tick 34) that takes `Nat.Coprime a N` instead of `Order a r N`, and uses `r := ord a N` as the order. The `Order` proof is derived internally via `ord_Order`.

theoremShor_correct_parametric_modmult

theorem Shor_correct_parametric_modmult
    (a r N m n anc : Nat)
    (f : Nat → FormalRV.SQIRPort.BaseUCom (n + anc))
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m n)
    (h_mmi : FormalRV.SQIRPort.ModMulImpl a N n anc f)
    (h_wt : ∀ i, i < m → FormalRV.SQIRPort.uc_well_typed (f i)) :
    FormalRV.SQIRPort.probability_of_success a r N m n anc f
      ≥ FormalRV.SQIRPort.κ / (Nat.log2 N : ℝ)^4

*HEADLINE: Parametric Shor success-probability bound.** Direct re-export of `FormalRV.SQIRPort.Shor_correct_var` (in `PostQFT.lean`), highlighting that this theorem is already parametric in: - `n` (data register size). - `anc` (ancilla count — ANY natural number). - `u : Nat → BaseUCom (n + anc)` (the modmult family). No hardcoding of `modmult_rev_anc n`. Any family satisfying `BasicSetting`, `ModMulImpl`, and per-iterate `uc_well_typed` yields the canonical Shor success-probability bound `≥ κ / (Nat.log2 N)^4`.

theoremsqir_placeholder_axioms_status

theorem sqir_placeholder_axioms_status :
    -- Our family has its own canonical ancilla count (4n + 6 vs SQIR's 3n + 1):
    ∀ bits multBits : Nat,
      multBits + (adder_n_qubits (bits + 1) + 1)
      = multBits + 3 * bits + 6

*Documentation theorem: SQIR placeholder axioms remain unchanged.** The original SQIR `f_modmult_circuit a ainv N n` (with `BaseUCom (n + modmult_rev_anc n)` shape) and its companion axioms remain placeholders in `SQIRPort/Shor.lean`. The concrete verified replacement is `our_modmult_family bits N a ainv multBits` with `BaseUCom (multBits + (adder_n_qubits (bits + 1) + 1))` shape. The parametric Shor theorem `Shor_correct_parametric_modmult` accepts EITHER shape (or any other satisfying the predicate-level hypotheses), so no dimension splicing or `modmult_rev_anc` redefinition is needed. This theorem holds trivially (it's a true conjunction) and serves as a documentation anchor.

theoremfinal_review_status

theorem final_review_status :
    -- (1) Total qubit count of our family at canonical Shor dim.
    (∀ N, Nat.log2 (2 * N) + (adder_n_qubits (Nat.log2 (2 * N) + 1) + 1)
          = 4 * Nat.log2 (2 * N) + 6) ∧
    -- (2) Our gate uses larger ancilla than SQIR's modmult_rev_anc n = 2n + 1.
    -- Concretely: our (3*bits + 6) - SQIR's (2*bits + 1) = bits + 5 more ancilla.
    (∀ bits : Nat,
        adder_n_qubits (bits + 1) + 1 = 3 * bits + 6) ∧
    -- (3) Original SQIR axioms remain as placeholders for the SQIR-size circuit;
    --     the verified replacement is `our_modmult_family` (separate gate).
    (∀ bits multBits : Nat,
        multBits + (adder_n_qubits (bits + 1) + 1)

*Deliverable D: Final review theorem documenting the project state.** This theorem packages three structural facts as a triple-conjunction: 1. The verified replacement gate's total qubit count formula. 2. The ancilla-count comparison with SQIR's `modmult_rev_anc n`. 3. The fact that SQIR's `f_modmult_circuit`-family axioms remain untouched placeholders (independent of our verified replacement). Each conjunct is decidable / provable; the theorem serves as a documentation anchor for the final project state.

FormalRV.Arithmetic.MeasuredAdder

FormalRV/Arithmetic/MeasuredAdder.lean

FormalRV.Arithmetic.MeasuredAdder ────────────────────────────────── The **measured** Gidney ripple-carry adder family — `n` Toffoli per add (vs the reversible `2n`), realising the cost Cain–Xu 2026 / Gidney 2018 charge to an adder. The carry ancillas are released by MEASUREMENT-based AND-uncompute (Gidney's temporary AND) instead of a second reversible Toffoli sweep, which is Toffoli-free — so the reverse pass costs `0` and the adder collapses to its forward sweep's `n` Toffoli, while STILL computing the faithful sum `(a + b) % 2^bits` on the target. The controlled variant gates the addend under a control (`ctrl ? (a+b) : b`) at `2n` Toffoli (Cain–Xu's E3 → E4 jump). Import this umbrella to get the whole verified measured adder (defs + value correctness + Toffoli counts) as the single public entry point. See `MeasuredAdder/README.md` for the spine (which file holds which headline theorem), the circuit, and a concrete example.

(no documented top-level declarations)

FormalRV.Arithmetic.MeasuredAdder.Example

FormalRV/Arithmetic/MeasuredAdder/Example.lean

FormalRV.Arithmetic.MeasuredAdder.Example ────────────────────────────────────────── Concrete demonstration of the measured Gidney adder on a small case (width `n+2 = 4`). We `#eval` the Toffoli counts (the HALF / DOUBLE headlines) and run the Boolean `EGate.applyNat` semantics on real inputs, decoding the target with `gidney_target_val` to see the faithful sum `(a + b) % 16` (uncontrolled) and the controlled sum `ctrl ? (a+b) : b`. Everything here is `#eval` of verified, kernel-clean objects (no axioms, no `native_decide`).

defrunMeas

private def runMeas (a b : Nat) : Nat

defcarryAfter

private def carryAfter (a b : Nat) : Nat

Carry register after the measured adder (should be all-zero = released).

example(example)

example : runMeas 3 5 = 8

Machine-checked: the measured adder really computes `(a+b) % 16` on this case (the same fact as `gidneyAdderMeasured_target_val`, here `decide`d numerically).

example(example)

example : runMeas 9 9 = 2

example(example)

example : carryAfter 7 6 = 0

defrunCtrl

private def runCtrl (a b ctrl cval : Nat) : Nat

example(example)

example : runCtrl 3 5 14 1 = 8

Machine-checked controlled cases: `cval=1` adds, `cval=0` is the identity on `b`.

example(example)

example : runCtrl 3 5 14 0 = 5

FormalRV.Arithmetic.MeasuredAdder.MeasuredAdderCorrectness

FormalRV/Arithmetic/MeasuredAdder/MeasuredAdderCorrectness.lean

FormalRV.Arithmetic.MeasuredAdder.MeasuredAdderCorrectness ─────────────────────────────────────────────────────────── VALUE correctness for the measured Gidney adder family — the FAITHFUL sum on the target register, for both the uncontrolled (`a + b`) and the controlled (`ctrl ? (a+b) : b`) adders. Imports only the shared base `MeasuredAdderDef`; every proof here is byte-for-byte the one that used to live in `GidneyMeasured.lean` / `GidneyMeasuredControlled.lean`. ## Why the value is still `a + b` (the frame argument, uncontrolled) Each measured reverse step equals the unitary reverse step followed by clearing its own carry (`*_eq` lemmas, in Def). Crucially, the carry an interior step writes is read by NO later (lower-index) step, so forcing it to `false` is INVISIBLE to every `read`/`target` output of the remaining cascade (`gidneyMeasFullReverse_rt`, via `propagation_reverse_clear_carry_insensitive`). Hence: • `target` after the measured adder = `target` after the reversible adder = `(a + b) % 2^bits` (REUSING `gidney_adder_full_faithful_no_measurement_target_correct`); • the carry register is `false` everywhere (`gidneyMeasFullReverse` clears it). ## Why the value is `ctrl ? (a+b) : b` (the reuse argument, controlled) After the mask, the adder-block sub-state is **literally** `adder_input_F n (if ctrl then a else 0) b` (read register = the gated addend, target = `b`, carries = 0). The control bit and the source register live at HIGH indices (`≥ adder_n_qubits = 3n+2`), and the measured adder only ever touches indices `< 3n+2` (`gidneyAdderMeasured_boundedBy`). A clean index-congruence (`EGate.applyNat_congr_lt`) swaps the masked state for the literal `adder_input_F`, and we REUSE `gidneyAdderMeasured_correct` verbatim — NO arithmetic is re-proved. Refs: Gidney arXiv:1709.06648 (temporary AND); Cain–Xu 2026.

theoremgidneyAdderMeasured_correct

theorem gidneyAdderMeasured_correct
    (n a b q_start : Nat) (ha : a < 2 ^ (n + 2)) (hb : b < 2 ^ (n + 2)) :
    ∀ i, i < n + 2 →
      (EGate.applyNat (gidneyAdderMeasured (n + 2) q_start)
          (adder_input_F (n + 2) a b) (target_idx i)
        = adder_sum_bit_classical a b i)
      ∧ (EGate.applyNat (gidneyAdderMeasured (n + 2) q_start)
          (adder_input_F (n + 2) a b) (carry_idx i) = false)

*Value correctness of the measured adder — FAITHFUL `a + b`.** On the clean two-operand input `adder_input_F (n+2) a b`, the measured Gidney adder writes the true sum bits `(a + b).testBit i` to the target register for every `i < n+2`, AND releases every carry ancilla to `false`: • `target[i] = (a + b).testBit i` (= `adder_sum_bit_classical a b i`), • `carry[i] = false`. The target value is REUSED verbatim from the reversible adder's correctness (`gidney_adder_full_faithful_no_measurement_target_correct`): the measured reverse agrees with the reversible reverse on every `target` position (`gidneyMeasFullReverse_rt`), and the reversible adder's target is the sum. The carries are released by the measurement-uncompute (`gidneyMeasFullReverse_carry_clear`), citing `MeasuredANDUncompute.measANDUncompute_perfect` for the quantum justification that this reset IS the perfect AND-uncompute.

theoremgidneyAdderMeasured_target_val

theorem gidneyAdderMeasured_target_val
    (n a b q_start : Nat) (ha : a < 2 ^ (n + 2)) (hb : b < 2 ^ (n + 2)) :
    gidney_target_val (n + 2)
        (EGate.applyNat (gidneyAdderMeasured (n + 2) q_start) (adder_input_F (n + 2) a b))
      = (a + b) % 2 ^ (n + 2)

*Decoded value form: the target register holds `(a + b) % 2^(n+2)`.** The LSB-first `gidney_target_val` decoder of the measured adder's output equals `(a + b) % 2^(n+2)` — the faithful arithmetic sum. Derived from the per-bit `gidneyAdderMeasured_correct` and the reversible adder's decoded-value theorem `gidney_adder_correct_full` (which both equal `(a+b) % 2^bits` bit-for-bit).

theoremgidneyAdderMeasuredControlled_correct

theorem gidneyAdderMeasuredControlled_correct
    (n a b q_start ctrl cval : Nat)
    (hctrl : adder_n_qubits (n + 2) ≤ ctrl)
    (ha : a < 2 ^ (n + 2)) (hb : b < 2 ^ (n + 2)) :
    ∀ i, i < n + 2 →
      (EGate.applyNat (gidneyAdderMeasuredControlled (n + 2) q_start ctrl)
          (ctrlAdder_input_F (n + 2) a b ctrl cval) (target_idx i)
        = (if cval = 1 then a + b else b).testBit i)
      ∧ (EGate.applyNat (gidneyAdderMeasuredControlled (n + 2) q_start ctrl)
          (ctrlAdder_input_F (n + 2) a b ctrl cval) (carry_idx i) = false)

*Value correctness of the controlled measured adder — the CONTROLLED sum.** With the control register placed above the adder block (`adder_n_qubits (n+2) ≤ ctrl`) and the classical control bit `cval`, on the clean input `ctrlAdder_input_F (n+2) a b ctrl cval` the controlled measured Gidney adder writes to the target register, for every `i < n+2`, • `target[i] = (if cval = 1 then a + b else b).testBit i` (the CONTROLLED sum), • `carry[i] = false` (carries released). The target value is the measured adder's faithful sum of `b` with the GATED addend `if cval = 1 then a else 0`, reused from `gidneyAdderMeasured_correct`.

theoremgidneyAdderMeasuredControlled_target_val

theorem gidneyAdderMeasuredControlled_target_val
    (n a b q_start ctrl cval : Nat)
    (hctrl : adder_n_qubits (n + 2) ≤ ctrl)
    (ha : a < 2 ^ (n + 2)) (hb : b < 2 ^ (n + 2)) :
    gidney_target_val (n + 2)
        (EGate.applyNat (gidneyAdderMeasuredControlled (n + 2) q_start ctrl)
          (ctrlAdder_input_F (n + 2) a b ctrl cval))
      = (if cval = 1 then a + b else b) % 2 ^ (n + 2)

*Decoded value form: the target register holds `if cval = 1 then (a+b) else b` mod `2^(n+2)`.** The LSB-first `gidney_target_val` decoder of the controlled measured adder's output equals `(if cval = 1 then a + b else b) % 2^(n+2)` — the faithful CONTROLLED sum: the arithmetic sum `(a+b) % 2^bits` when the control is set, and the unchanged accumulator `b % 2^bits` when it is not. Derived from the per-bit `gidneyAdderMeasuredControlled_correct` via `gidney_target_val_eq_sum_when_bits_match`.

FormalRV.Arithmetic.MeasuredAdder.MeasuredAdderDef

FormalRV/Arithmetic/MeasuredAdder/MeasuredAdderDef.lean

FormalRV.Arithmetic.MeasuredAdder.MeasuredAdderDef ────────────────────────────────────────────────── SHARED BASE for the **measured** Gidney ripple-carry adder family (uncontrolled + controlled). Holds ALL the `def`s and the internal frame/congruence lemmas that the Correctness (value) and Resource (count) files depend on. The split is purely structural — every declaration here is byte-for-byte the one that used to live in `GidneyMeasured.lean` / `GidneyMeasuredControlled.lean`, just relocated. ## The construction (composition, not from scratch) Our reversible faithful Gidney adder `gidney_adder_full_faithful_no_measurement = forward ; final_cx ; reverse` costs `2·(n+2)` Toffolis: • the FORWARD sweep `gidney_adder_forward_faithful_full` computes the carry chain into the carry ancillas (`carry_idx i = 3i+2`) — `n+2` Toffolis (CCX); • `gidney_final_cx_cascade` stamps `read ⊕ target` (T-free CNOTs); • the REVERSE sweep `gidney_adder_forward_faithful_full_reverse` simultaneously (a) undoes the forward propagation CXs so that `target` is RESTORED to the true sum bits, and (b) **uncomputes** the carry ancillas with a SECOND `n+2` Toffolis (the per-step `CCX(read i, target i, carry i)` AND-uncomputes). NOTE on a tempting-but-WRONG shortcut: `forward ; final_cx ; mz(carries)` does NOT compute `a + b` — after `forward ; final_cx` the target still holds the carry-sweep value, not the sum (machine-checked: at `n=2, a=b=1` it gives `target₁ = false` where the sum needs `true`). The reverse sweep's CXs are genuinely load-bearing for the sum, so they are KEPT. ## Gidney's measurement trick, faithfully Gidney's temporary-AND replaces ONLY the reverse sweep's AND-**uncompute** (each `CCX(read i, target i, carry i)`) by an X-basis MEASUREMENT of `carry i` plus a classically-controlled phase fixup (PROVEN to be the perfect uncompute on the computed family at the density layer in `FormalRV.Shor.MeasuredANDUncompute.measANDUncompute_perfect`). In the Boolean `EGate` model (`FormalRV.Shor.MeasUncompute.EGate`) the net effect of that channel is `EGate.mz (carry i)` — reset `carry i` to `|0⟩` — which is *Toffoli-free**. The reverse sweep's CX gates are kept verbatim. Concretely we build `gidneyMeasuredReverse` = the reverse cascade with each per-step uncompute `CCX(read i, target i, carry i)` swapped for `mz (carry i)`, KEEPING every CX. Its Toffoli count is `0`, so the measured adder `forward ; final_cx ; gidneyMeasuredReverse` costs exactly the forward sweep's `n+2` — HALF the reversible adder (`gidneyAdderMeasured_halves`). The `q_start` parameter is carried for API parity with the windowed-adder convention; THIS adder is hardwired to the interleaved layout `read/target/carry = 3i/3i+1/3i+2` (base 0), so `q_start` does not shift indices. Refs: Gidney arXiv:1709.06648 §"temporary AND"; Cain–Xu 2026 (n Toffoli/add).

theoremfunupd_eq_update

theorem funupd_eq_update (f : Nat → Bool) (c : Nat) (v : Bool) :
    Function.update f c v = update f c v

`EGate.mz` resets via `Function.update`; relate it to the project's `update`.

defgidneyMeasFirstReverse

def gidneyMeasFirstReverse : EGate

Measured first-bit reverse: the first-bit reverse with its uncompute `CCX(read 0, target 0, carry 0)` replaced by `mz (carry 0)` (CXs kept).

defgidneyMeasInteriorReverse

def gidneyMeasInteriorReverse (i : Nat) : EGate

Measured interior-bit reverse `i`: the interior reverse with its uncompute `CCX(read i, target i, carry i)` replaced by `mz (carry i)` (CXs kept).

defgidneyMeasLastReverse

def gidneyMeasLastReverse (i : Nat) : EGate

Measured last-bit reverse `i`: the last reverse with its uncompute `CCX(read i, target i, carry i)` replaced by `mz (carry i)` (chain CX kept).

theoremgidneyMeasFirstReverse_eq

theorem gidneyMeasFirstReverse_eq (f : Nat → Bool) :
    EGate.applyNat gidneyMeasFirstReverse f
      = update (gidney_first_bit_reverse_post_state f) (carry_idx 0) false

*Measured first-step = unitary first-step then clear `carry 0`.**

theoremgidneyMeasInteriorReverse_eq

theorem gidneyMeasInteriorReverse_eq (i : Nat) (f : Nat → Bool) :
    EGate.applyNat (gidneyMeasInteriorReverse i) f
      = update (gidney_interior_bit_reverse_post_state i f) (carry_idx i) false

*Measured interior-step `i` = unitary interior-step then clear `carry i`.**

theoremgidneyMeasLastReverse_eq

theorem gidneyMeasLastReverse_eq (i : Nat) (f : Nat → Bool) :
    EGate.applyNat (gidneyMeasLastReverse i) f
      = update (gidney_last_bit_reverse_post_state i f) (carry_idx i) false

*Measured last-step `i` = unitary last-step then clear `carry i`.**

defgidneyMeasPropReverse

def gidneyMeasPropReverse : Nat → EGate
  | 0     => EGate.base Gate.I
  | 1     => gidneyMeasFirstReverse
  | n + 2 => EGate.seq (gidneyMeasInteriorReverse (n + 1)) (gidneyMeasPropReverse (n + 1))

The measured propagation-reverse cascade (mirrors `gidney_adder_forward_with_propagation_reverse`).

defgidneyMeasFullReverse

def gidneyMeasFullReverse : Nat → EGate
  | 0     => EGate.base Gate.I
  | 1     => EGate.base Gate.I
  | n + 2 => EGate.seq (gidneyMeasLastReverse (n + 1)) (gidneyMeasPropReverse (n + 1))

The measured full reverse cascade (mirrors `gidney_adder_forward_faithful_full_reverse`).

theoremtcount_gidneyMeasPropReverse

theorem tcount_gidneyMeasPropReverse : ∀ n, EGate.tcount (gidneyMeasPropReverse n) = 0
  | 0     => rfl
  | 1     => by simp [gidneyMeasPropReverse, gidneyMeasFirstReverse, EGate.tcount, Gate.tcount]
  | n + 2 =>

The measured propagation reverse is Toffoli-free (only CX + measurement).

theoremtcount_gidneyMeasFullReverse

theorem tcount_gidneyMeasFullReverse : ∀ n, EGate.tcount (gidneyMeasFullReverse n) = 0
  | 0     => rfl
  | 1     => rfl
  | n + 2 =>

The measured full reverse is Toffoli-free (only CX + measurement).

theoremfirst_reverse_clear_carry_insensitive

theorem first_reverse_clear_carry_insensitive
    (m : Nat) (hm : 0 < m) (v : Bool) (f : Nat → Bool) (q : Nat) (hq : q ≠ carry_idx m) :
    gidney_first_bit_reverse_post_state (update f (carry_idx m) v) q
      = gidney_first_bit_reverse_post_state f q

*First-reverse insensitivity.** For `m ≥ 1` and `q ≠ carry m`, clearing `carry m` before the first-reverse leaves every other output unchanged (the first-reverse only reads `carry 0`, never `carry m`).

theoreminterior_reverse_clear_carry_insensitive

theorem interior_reverse_clear_carry_insensitive
    (i m : Nat) (hi : 0 < i) (him : i < m) (v : Bool) (f : Nat → Bool) (q : Nat)
    (hq : q ≠ carry_idx m) :
    gidney_interior_bit_reverse_post_state i (update f (carry_idx m) v) q
      = gidney_interior_bit_reverse_post_state i f q

*Interior-reverse insensitivity.** For `i ≥ 1`, `m > i` and `q ≠ carry m`, clearing `carry m` before the interior-reverse `i` leaves every other output unchanged (the step reads only `carry i`, `carry (i-1)`, both `< m`).

theorempropagation_reverse_clear_carry_insensitive

theorem propagation_reverse_clear_carry_insensitive (m : Nat) :
    ∀ (K : Nat) (v : Bool) (f : Nat → Bool) (q : Nat), K ≤ m → q ≠ carry_idx m →
      gidney_propagation_reverse_post_state K (update f (carry_idx m) v) q
        = gidney_propagation_reverse_post_state K f q

*Propagation-reverse cascade insensitivity.** For `K ≤ m` and `q ≠ carry m`, clearing `carry m` before the propagation-reverse cascade of length `K` is invisible at `q`. `propagation_reverse K` reads only carries `0 .. K-1`, all `< m`.

theoremgidneyMeasPropReverse_rt

theorem gidneyMeasPropReverse_rt :
    ∀ (K : Nat) (f : Nat → Bool) (q : Nat), (∀ m, q ≠ carry_idx m) →
      EGate.applyNat (gidneyMeasPropReverse K) f q
        = gidney_propagation_reverse_post_state K f q

*Measured propagation reverse = unitary on `read`/`target`.** At any position `q` that is not a carry index, the measured propagation-reverse cascade produces exactly the unitary one's value.

theoremgidneyMeasFullReverse_rt

theorem gidneyMeasFullReverse_rt
    (n : Nat) (f : Nat → Bool) (q : Nat) (hq : ∀ m, q ≠ carry_idx m) :
    EGate.applyNat (gidneyMeasFullReverse (n + 2)) f q
      = gidney_full_reverse_post_state (n + 2) f q

*Measured full reverse = unitary on `read`/`target`.**

theoremgidneyMeasPropReverse_carry

theorem gidneyMeasPropReverse_carry :
    ∀ (K : Nat) (f : Nat → Bool) (i : Nat),
      (i < K → EGate.applyNat (gidneyMeasPropReverse K) f (carry_idx i) = false)
      ∧ (K ≤ i → EGate.applyNat (gidneyMeasPropReverse K) f (carry_idx i) = f (carry_idx i))

The measured propagation reverse clears carries `0 .. K-1`; carries `≥ K` are preserved. (Each step `gidneyMeas*Reverse i` ends in `mz (carry i)`, and lower steps never touch a higher carry.)

theoremgidneyMeasFullReverse_carry_clear

theorem gidneyMeasFullReverse_carry_clear
    (n : Nat) (f : Nat → Bool) (i : Nat) (hi : i < n + 2) :
    EGate.applyNat (gidneyMeasFullReverse (n + 2)) f (carry_idx i) = false

The measured full reverse clears every carry `i < n+2`.

defgidneyAdderMeasured

def gidneyAdderMeasured (n _q_start : Nat) : EGate

*The measured Gidney ripple-carry adder** (`n`-bit, `n` Toffoli): the faithful forward carry sweep, the sum-stamping final-CX cascade, then the MEASURED reverse cascade (`gidneyMeasFullReverse`) — the reversible reverse with each carry AND-uncompute `CCX` swapped for a measurement `mz`. The forward sweep computes the carries (`n` Toffolis); the final-CX and the measured reverse are Toffoli-free, so the total is the forward's `n` — HALF the reversible `gidney_adder_full_faithful_no_measurement`. Computes `(a+b) % 2^n` on the target with the carry register released.

theoremgidneyAdderMeasured_applyNat

theorem gidneyAdderMeasured_applyNat (n q_start : Nat) (f : Nat → Bool) :
    EGate.applyNat (gidneyAdderMeasured (n + 2) q_start) f
      = EGate.applyNat (gidneyMeasFullReverse (n + 2))
          (gidney_final_cx_cascade_post_state (n + 2)
            (gidney_forward_faithful_full_post_state (n + 2) f))

The measured adder splits as: apply `forward ; final_cx` (as a base `Gate`), then the measured reverse cascade.

defGate.boundedBy

def Gate.boundedBy (B : Nat) : Gate → Prop
  | Gate.I         => True
  | Gate.X q       => q < B
  | Gate.CX c t    => c < B ∧ t < B
  | Gate.CCX a b c => a < B ∧ b < B ∧ c < B
  | Gate.seq g₁ g₂ => Gate.boundedBy B g₁ ∧ Gate.boundedBy B g₂

All qubit indices referenced by a `Gate` are `< B`.

theoremGate.applyNat_congr_lt

theorem Gate.applyNat_congr_lt (B : Nat) :
    ∀ (gate : Gate), Gate.boundedBy B gate →
      ∀ (f g : Nat → Bool), (∀ q, q < B → f q = g q) →
        ∀ q, q < B → Gate.applyNat gate f q = Gate.applyNat gate g q

*`Gate.applyNat` congruence below `B`.** If `f` and `g` agree on all indices `< B` and the gate only touches indices `< B`, the outputs agree on all `< B`.

defEGate.boundedBy

def EGate.boundedBy (B : Nat) : EGate → Prop
  | EGate.base g  => Gate.boundedBy B g
  | EGate.mz q    => q < B
  | EGate.seq a b => EGate.boundedBy B a ∧ EGate.boundedBy B b

All qubit indices referenced by an `EGate` are `< B`.

theoremEGate.applyNat_congr_lt

theorem EGate.applyNat_congr_lt (B : Nat) :
    ∀ (eg : EGate), EGate.boundedBy B eg →
      ∀ (f g : Nat → Bool), (∀ q, q < B → f q = g q) →
        ∀ q, q < B → EGate.applyNat eg f q = EGate.applyNat eg g q

*`EGate.applyNat` congruence below `B`.**

theoremGate.boundedBy_mono

theorem Gate.boundedBy_mono {B B' : Nat} (h : B ≤ B') :
    ∀ (g : Gate), Gate.boundedBy B g → Gate.boundedBy B' g

Monotonicity of `Gate.boundedBy` in the bound.

theoremEGate.boundedBy_mono

theorem EGate.boundedBy_mono {B B' : Nat} (h : B ≤ B') :
    ∀ (eg : EGate), EGate.boundedBy B eg → EGate.boundedBy B' eg

Monotonicity of `EGate.boundedBy` in the bound.

theoremfirst_step_bounded

private theorem first_step_bounded :
    Gate.boundedBy 5 gidney_adder_bit_step_faithful_first

theoreminterior_step_bounded

private theorem interior_step_bounded (i : Nat) :
    Gate.boundedBy (3 * i + 5) (gidney_adder_bit_step_faithful_interior i)

theoremlast_step_bounded

private theorem last_step_bounded (i : Nat) :
    Gate.boundedBy (3 * i + 3) (gidney_adder_bit_step_faithful_last i)

theoremforward_with_propagation_bounded

private theorem forward_with_propagation_bounded :
    ∀ k, Gate.boundedBy (3 * k + 2) (gidney_adder_forward_with_propagation k)
  | 0 => trivial
  | 1 => Gate.boundedBy_mono (by omega) _ first_step_bounded
  | n + 2 =>

The propagation forward cascade of length `k` is bounded by `3*k + 2`.

theoremgidney_adder_forward_faithful_full_boundedBy

theorem gidney_adder_forward_faithful_full_boundedBy (n : Nat) :
    Gate.boundedBy (adder_n_qubits (n + 2)) (gidney_adder_forward_faithful_full (n + 2))

The faithful forward sweep at width `n+2` is bounded by `adder_n_qubits (n+2)`.

theoremfinal_cx_cascade_bounded

private theorem final_cx_cascade_bounded :
    ∀ k, Gate.boundedBy (3 * k) (gidney_final_cx_cascade k)
  | 0 => trivial
  | k + 1 =>

The final-CX cascade of length `k` is bounded by `3*k` (it uses `read_idx j, target_idx j` for `j < k`, max `target_idx (k-1) = 3k-2 < 3k`).

theoremmeasFirstReverse_bounded

private theorem measFirstReverse_bounded :
    EGate.boundedBy 5 gidneyMeasFirstReverse

theoremmeasInteriorReverse_bounded

private theorem measInteriorReverse_bounded (i : Nat) :
    EGate.boundedBy (3 * i + 5) (gidneyMeasInteriorReverse i)

theoremmeasLastReverse_bounded

private theorem measLastReverse_bounded (i : Nat) :
    EGate.boundedBy (3 * i + 3) (gidneyMeasLastReverse i)

theoremmeasPropReverse_bounded

private theorem measPropReverse_bounded :
    ∀ K, EGate.boundedBy (3 * K + 2) (gidneyMeasPropReverse K)
  | 0 => by simp [gidneyMeasPropReverse, EGate.boundedBy, Gate.boundedBy]
  | 1 => EGate.boundedBy_mono (by omega) _ measFirstReverse_bounded
  | n + 2 =>

The measured propagation-reverse cascade of length `K` is bounded by `3*K + 2`.

theoremmeasFullReverse_bounded

private theorem measFullReverse_bounded (n : Nat) :
    EGate.boundedBy (adder_n_qubits (n + 2)) (gidneyMeasFullReverse (n + 2))

The measured full-reverse cascade at width `n+2` is bounded by `adder_n_qubits (n+2)`.

theoremgidneyAdderMeasured_boundedBy

theorem gidneyAdderMeasured_boundedBy (n q_start : Nat) :
    EGate.boundedBy (adder_n_qubits (n + 2)) (gidneyAdderMeasured (n + 2) q_start)

*The uncontrolled measured Gidney adder is bounded by `adder_n_qubits (n+2)`.** Every gate touches only read/target/carry indices `< 3*(n+2)`; this is the locality fact that lets the controlled wrapper discharge its high-index control + source register and reuse `gidneyAdderMeasured_correct` verbatim.

theoremgidneyAdderMeasured_boundedBy_tight

theorem gidneyAdderMeasured_boundedBy_tight (n q_start : Nat) :
    EGate.boundedBy (3 * (n + 2)) (gidneyAdderMeasured (n + 2) q_start)

*TIGHT locality bound: the uncontrolled measured Gidney adder touches only indices `< 3*(n+2)`.** The `adder_n_qubits (n+2) = 3*(n+2)+2` total leaves the two slack indices `3*(n+2)` and `3*(n+2)+1` UNTOUCHED — every gate references a read/target/carry index of bit `i ≤ n+1`, i.e. `≤ carry_idx (n+1) = 3*(n+1)+2 < 3*(n+2)`. This tighter bound lets a caller frame those two slack indices (and anything above) with a single `EGate.boundedBy`-above argument.

defsrcA_idx

def srcA_idx (ctrl i : Nat) : Nat

Qubit index of the `i`-th bit of the SOURCE register holding the addend `a` (placed just above the control qubit).

defctrlMaskRead

def ctrlMaskRead (ctrl : Nat) : Nat → Gate
  | 0     => Gate.I
  | k + 1 => Gate.seq (ctrlMaskRead ctrl k)
                      (Gate.CCX ctrl (srcA_idx ctrl k) (read_idx k))

The controlled-mask cascade: for `i < k`, `CCX(ctrl, srcA_idx ctrl i, read_idx i)`. Applied to a clean (`read = 0`) input it sets `read_idx i := ctrl ∧ srcA_idx ctrl i`.

theoremtcount_ctrlMaskRead

theorem tcount_ctrlMaskRead (ctrl : Nat) : ∀ k, Gate.tcount (ctrlMaskRead ctrl k) = 7 * k
  | 0     => rfl
  | k + 1 =>

Toffoli count of the mask: one CCX (`7` T) per bit.

defctrlAdder_input_F

def ctrlAdder_input_F (n a b ctrl cval : Nat) (k : Nat) : Bool

The clean two-operand + control + source input for the controlled adder.

theoremctrlMaskRead_preserves_non_read

private theorem ctrlMaskRead_preserves_non_read (ctrl : Nat) :
    ∀ (k : Nat) (f : Nat → Bool) (q : Nat), (∀ j, j < k → q ≠ read_idx j) →
      Gate.applyNat (ctrlMaskRead ctrl k) f q = f q

The mask cascade of length `k` leaves any index `q` disjoint from its read targets (`q ≠ read_idx j` for `j < k`) unchanged.

theoremctrlMaskRead_preserves_high_read

private theorem ctrlMaskRead_preserves_high_read (ctrl : Nat) :
    ∀ (k : Nat) (f : Nat → Bool) (j : Nat), k ≤ j →
      Gate.applyNat (ctrlMaskRead ctrl k) f (read_idx j) = f (read_idx j)

The mask cascade of length `k` preserves `read_idx j` for `j ≥ k` (it only targets `read_idx 0 .. read_idx (k-1)`).

theoremctrlMaskRead_read

private theorem ctrlMaskRead_read (ctrl : Nat) :
    ∀ (k : Nat),
      (∀ j, j < k → ctrl ≠ read_idx j) →
      (∀ i j, j < k → srcA_idx ctrl i ≠ read_idx j) →
      ∀ (f : Nat → Bool) (j : Nat), j < k →
        Gate.applyNat (ctrlMaskRead ctrl k) f (read_idx j)
          = xor (f (read_idx j)) (f ctrl && f (srcA_idx ctrl j))

The mask cascade sets `read_idx j` (for `j < k`) to `f (read_idx j) ⊕ (f ctrl ∧ f (srcA_idx ctrl j))`, provided `ctrl` and the source register are disjoint from every read index (`ctrl, srcA_idx ctrl _ ≠ read_idx _`), which holds when `ctrl` is above the block.

theoremctrlMaskRead_eq_adder_input

theorem ctrlMaskRead_eq_adder_input
    (n a b ctrl cval : Nat) (hctrl : adder_n_qubits (n + 2) ≤ ctrl)
    (q : Nat) (hq : q < adder_n_qubits (n + 2)) :
    Gate.applyNat (ctrlMaskRead ctrl (n + 2)) (ctrlAdder_input_F (n + 2) a b ctrl cval) q
      = adder_input_F (n + 2) (if cval = 1 then a else 0) b q

*The mask turns the clean input into the gated-addend `adder_input_F`.** On every adder-block index `q < adder_n_qubits (n+2)`, the masked clean input equals `adder_input_F (n+2) (if cval = 1 then a else 0) b` — read register = the gated addend, target = `b`, carries = 0. Requires the control register placed above the block (`adder_n_qubits (n+2) ≤ ctrl`).

defgidneyAdderMeasuredControlled

def gidneyAdderMeasuredControlled (n q_start ctrl : Nat) : EGate

*The faithful controlled measured Gidney adder** (`2n` Toffoli). The control qubit lives at index `ctrl` (placed above the adder block) and the addend `a` lives in the source register `srcA_idx ctrl i`. The gate is the controlled core `ctrlMaskRead` (`n` CCX = `n` Toffoli) — which gates the addend into the read register under `ctrl` — followed by the REUSED uncontrolled measured Gidney adder `gidneyAdderMeasured` (`n` Toffoli). Net Toffoli = `2n`, realising Cain–Xu E4. When `ctrl = 1` it adds the addend (`target := a + b`); when `ctrl = 0` the read register is gated to `0` and the adder is the identity on the value (`target := b`).

FormalRV.Arithmetic.MeasuredAdder.MeasuredAdderResource

FormalRV/Arithmetic/MeasuredAdder/MeasuredAdderResource.lean

FormalRV.Arithmetic.MeasuredAdder.MeasuredAdderResource ──────────────────────────────────────────────────────── COUNT theorems for the measured Gidney adder family: the uncontrolled adder is `n` Toffoli (HALF the reversible adder) and the controlled adder is `2n` Toffoli (DOUBLE the uncontrolled). Imports only the shared base `MeasuredAdderDef`; every proof here is byte-for-byte the one that used to live in `GidneyMeasured.lean` / `GidneyMeasuredControlled.lean`. The measurement-uncompute trick makes the reverse sweep Toffoli-free (`tcount_gidneyMeasFullReverse = 0`), so the uncontrolled adder's cost collapses to the forward sweep's `n` (`toffoli_gidneyAdderMeasured`), realising Cain–Xu's `n`-Toffoli-per-add adder and closing the factor-2 of the reversible version (`gidneyAdderMeasured_halves`). A *controlled* add cannot measure away its addend gating, so the controlled core (`ctrlMaskRead`, `n` CCX) adds a genuine `n` Toffoli on top, giving `2n` (`toffoli_gidneyAdderMeasuredControlled`, `gidneyAdderMeasuredControlled_doubles`) — Cain–Xu's E3 → E4 jump. Refs: Gidney arXiv:1709.06648 (temporary AND); Cain–Xu 2026 (E3 n / E4 2n).

theoremtoffoli_gidneyAdderMeasured

theorem toffoli_gidneyAdderMeasured (n q_start : Nat) :
    EGate.toffoli (gidneyAdderMeasured (n + 2) q_start) = n + 2

*Toffoli count of the measured adder is exactly the forward sweep's `n`** (here `n+2` at width `n+2`): the final-CX cascade and the measured reverse cascade contribute `0`. Derived from `tcount_gidney_adder_forward_faithful_full` (`7·(n+2)` T = `(n+2)` Toffolis), `tcount_gidney_final_cx_cascade = 0`, and `tcount_gidneyMeasFullReverse = 0`.

theoremgidneyAdderMeasured_halves

theorem gidneyAdderMeasured_halves (n q_start : Nat) :
    EGate.toffoli (gidneyAdderMeasured (n + 2) q_start)
      = tcount (gidney_adder_full_faithful_no_measurement (n + 2)) / 7 / 2

*★ HEADLINE — the measurement-uncompute HALVES the adder Toffoli count.** The measured Gidney adder costs exactly HALF the Toffolis of the reversible faithful `gidney_adder_full_faithful_no_measurement` (`n+2` vs `2·(n+2)`) — the verified statement that Gidney's measurement-based carry-uncompute realises Cain–Xu's `n`-Toffoli-per-add adder, closing the factor-2 of the reversible version, while STILL computing the FAITHFUL sum `(a+b) % 2^bits` (`gidneyAdderMeasured_correct`).

theoremtoffoli_gidneyAdderMeasuredControlled

theorem toffoli_gidneyAdderMeasuredControlled (n q_start ctrl : Nat) :
    EGate.toffoli (gidneyAdderMeasuredControlled (n + 2) q_start ctrl) = 2 * (n + 2)

*Toffoli count of the controlled measured adder is exactly `2·(n+2)`** at width `n+2`: the controlled mask contributes `n+2` (one CCX per addend bit) and the reused measured adder contributes `n+2`; their sum is `2·(n+2)`. This is the verified `E3 (n) → E4 (2n)` jump: the control DOUBLES the adder Toffoli cost vs the uncontrolled measured adder `gidneyAdderMeasured` (`toffoli_gidneyAdderMeasured`).

theoremgidneyAdderMeasuredControlled_doubles

theorem gidneyAdderMeasuredControlled_doubles (n q_start ctrl : Nat) :
    EGate.toffoli (gidneyAdderMeasuredControlled (n + 2) q_start ctrl)
      = 2 * EGate.toffoli (gidneyAdderMeasured (n + 2) q_start)

*★ The control DOUBLES the measured-adder Toffoli count (E3 → E4).** The controlled measured adder costs exactly TWICE the uncontrolled measured Gidney adder `gidneyAdderMeasured` — Cain–Xu's `30 q_A = 2·q_A` (E4) vs `25 q_A = q_A` (E3) controlled-adder factor-2, on verified objects.

FormalRV.Arithmetic.ModExp

FormalRV/Arithmetic/ModExp.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModExp.ModExpCorrectness

FormalRV/Arithmetic/ModExp/ModExpCorrectness.lean

FormalRV.Arithmetic.ModExp.ModExpCorrectness Semantic correctness of the modexp oracle family: it is a `ModMulImpl` (every iterate multiplies by a^(2^i) mod N), and the resulting Shor success-probability bound. HEADLINE: `our_modmult_family_ModMulImpl`, `Shor_correct_with_verified_modexp`.

theoremMultiplyCircuitProperty_mod_invariance

theorem MultiplyCircuitProperty_mod_invariance
    (a N n anc : Nat) (c : FormalRV.SQIRPort.BaseUCom (n + anc))
    (h : FormalRV.SQIRPort.MultiplyCircuitProperty (a % N) N n anc c) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N n anc c

*`MultiplyCircuitProperty` is invariant under modular reduction of the multiplier**. Since the MCP property mentions `a` only inside `(a * x) % N`, reducing `a` modulo `N` doesn't change the property.

theoremour_modmult_family_mcp_per_iterate

theorem our_modmult_family_mcp_per_iterate
    (bits N a ainv multBits : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (h_inv_pow : ∀ i, (a^(2^i) % N) * (ainv^(2^i) % N) % N = 1)
    (h_pow_a_pos : ∀ i, 0 < a^(2^i) % N)
    (h_pow_ainv_pos : ∀ i, 0 < ainv^(2^i) % N)
    (h_const_pos_a_iter : ∀ i j, j < multBits → 0 < (a^(2^i) % N * 2^j) % N)
    (h_const_pos_inv_iter :
      ∀ i j, j < multBits → 0 < ((N - ainv^(2^i) % N) % N * 2^j) % N) :

*`our_modmult_family` satisfies `MultiplyCircuitProperty` at every iterate.** Combined with the WellTyped from Tick 26, this is the `ModMulImpl` evidence required by `Shor_correct_var`.

theoremour_modmult_family_ModMulImpl

theorem our_modmult_family_ModMulImpl
    (bits N a ainv multBits : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (h_inv_pow : ∀ i, (a^(2^i) % N) * (ainv^(2^i) % N) % N = 1)
    (h_pow_a_pos : ∀ i, 0 < a^(2^i) % N)
    (h_pow_ainv_pos : ∀ i, 0 < ainv^(2^i) % N)
    (h_const_pos_a_iter : ∀ i j, j < multBits → 0 < (a^(2^i) % N * 2^j) % N)
    (h_const_pos_inv_iter :
      ∀ i j, j < multBits → 0 < ((N - ainv^(2^i) % N) % N * 2^j) % N) :

*`our_modmult_family` is a `ModMulImpl`.** Direct reformulation of `our_modmult_family_mcp_per_iterate`.

theoremShor_correct_with_our_family

theorem Shor_correct_with_our_family
    (bits N a ainv multBits m r : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m multBits)
    (h_inv_pow : ∀ i, (a^(2^i) % N) * (ainv^(2^i) % N) % N = 1)
    (h_pow_a_pos : ∀ i, 0 < a^(2^i) % N)
    (h_pow_ainv_pos : ∀ i, 0 < ainv^(2^i) % N)
    (h_const_pos_a_iter : ∀ i j, j < multBits → 0 < (a^(2^i) % N * 2^j) % N)
    (h_const_pos_inv_iter :

*HEADLINE: Shor's success-probability bound for our concrete in-place modular multiplier family.** Direct application of `Shor_correct_var` with `u := our_modmult_family bits N a ainv multBits`, using Tick 26's WellTyped and Tick 27's `ModMulImpl`. The user must supply `BasicSetting a r N m multBits` — the order-and-bounds hypothesis on `(a, r, N, m, multBits)` — plus the modular-arithmetic conditions required by Tick 27.

theoremcoprime_mod_pos

theorem coprime_mod_pos (a N : Nat) (hN : 1 < N) (h_cop : Nat.Coprime a N) :
    0 < a % N

*Coprime + 1 < N implies `0 < a % N`.** If `N ∣ a` then `N ≤ gcd a N = 1`, contradicting `1 < N`.

theoremcoprime_pow

theorem coprime_pow (a N k : Nat) (h_cop : Nat.Coprime a N) :
    Nat.Coprime (a^k) N

*`gcd(a, N) = 1 → gcd(a^k, N) = 1`** via `Nat.Coprime.pow_left`.

theoremcoprime_pow_mod_pos

theorem coprime_pow_mod_pos (a N k : Nat) (hN : 1 < N) (h_cop : Nat.Coprime a N) :
    0 < a^k % N

*`gcd(a, N) = 1 + 1 < N → 0 < a^k % N` for all `k`.** Combines `coprime_pow` and `coprime_mod_pos`.

theoremcoprime_mul_pow_two_mod_pos

theorem coprime_mul_pow_two_mod_pos
    (a N k j : Nat) (hN : 1 < N) (h_cop : Nat.Coprime a N)
    (h_cop_two : Nat.Coprime 2 N) :
    0 < (a^k % N * 2^j) % N

*`gcd(a, N) = 1 + gcd(2, N) = 1 → 0 < (a^k % N * 2^j) % N`.** The per-bit coprimality condition needed by `our_modmult_family`'s hypotheses, derived from a base coprimality of `a` and `2` with `N`.

theoremcoprime_of_mul_mod_one

theorem coprime_of_mul_mod_one (a ainv N : Nat) (h_inv : a * ainv % N = 1) :
    Nat.Coprime a N

*`a * ainv % N = 1` implies `Nat.Coprime a N`.**

theoremcoprime_inv_of_mul_mod_one

theorem coprime_inv_of_mul_mod_one (a ainv N : Nat) (h_inv : a * ainv % N = 1) :
    Nat.Coprime ainv N

*`a * ainv % N = 1` implies `Nat.Coprime ainv N`.**

theoremmul_pow_mod_one

theorem mul_pow_mod_one (a ainv N k : Nat) (hN : 1 < N) (h_inv : a * ainv % N = 1) :
    (a^k % N) * (ainv^k % N) % N = 1

*`a * ainv % N = 1 + 1 < N → ∀ k, (a^k % N) * (ainv^k % N) % N = 1`.**

theoremShor_correct_with_our_family_coprime

theorem Shor_correct_with_our_family_coprime
    (bits N a ainv multBits m r : Nat)
    (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (hN : N ≤ 2^bits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (h_N_gt_one : 1 < N)
    (h_cop_a : Nat.Coprime a N)
    (h_cop_two : Nat.Coprime 2 N)
    (h_inv : a * ainv % N = 1)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m multBits) :

*HEADLINE: Shor success-probability bound from minimal coprimality hypotheses.** Bundles Tick 28's `Shor_correct_with_our_family` with the derivations from `1 < N`, `Nat.Coprime a N`, `Nat.Coprime 2 N` (N odd), and `a * ainv % N = 1`. This is the SIMPLEST user-facing Shor success-probability theorem for our concrete in-place modular multiplier construction.

theoremShor_correct_with_our_family_at_canonical_dim

theorem Shor_correct_with_our_family_at_canonical_dim
    (N a ainv : Nat)
    (h_N_gt_one : 1 < N)
    (h_a_pos : 0 < a) (h_a_lt : a < N)
    (h_cop_a : Nat.Coprime a N)
    (h_cop_two : Nat.Coprime 2 N)
    (h_inv : a * ainv % N = 1) :
    FormalRV.SQIRPort.probability_of_success a
        (FormalRV.SQIRPort.ord a N) N
        (Nat.log2 (2 * N^2)) (Nat.log2 (2 * N))
        (adder_n_qubits (Nat.log2 (2 * N) + 1) + 1)
        (our_modmult_family (Nat.log2 (2 * N)) N a ainv (Nat.log2 (2 * N)))

*HEADLINE: Shor success-probability bound at canonical Shor parameters.** Specializes `Shor_correct_with_our_family_coprime` at `multBits := Nat.log2 (2 * N)` and `m := Nat.log2 (2 * N^2)` (the canonical Shor sizing), automatically deriving the `BasicSetting` log2 bounds from `1 < N`. This mirrors the canonical-dim choice in `Shor_correct` but uses our concrete in-place gate.

theoremShor_correct_with_our_family_from_parametric

theorem Shor_correct_with_our_family_from_parametric
    (bits N a ainv multBits m r : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m multBits)
    (h_inv_pow : ∀ i, (a^(2^i) % N) * (ainv^(2^i) % N) % N = 1)
    (h_pow_a_pos : ∀ i, 0 < a^(2^i) % N)
    (h_pow_ainv_pos : ∀ i, 0 < ainv^(2^i) % N)
    (h_const_pos_a_iter : ∀ i j, j < multBits → 0 < (a^(2^i) % N * 2^j) % N)
    (h_const_pos_inv_iter :

*Our family instantiated via the parametric Shor theorem.** A thin wrapper around: - `Shor_correct_parametric_modmult` (the parametric Shor theorem). - `our_modmult_family_ModMulImpl` (Tick 27 — ModMulImpl evidence). - `our_modmult_family_uc_well_typed` (Tick 26 — WellTyped evidence). The user supplies the standard Shor hypotheses plus the per-iterate coprimality conditions; this theorem packages everything for our concrete `our_modmult_family`.

theoremour_modmult_family_hypotheses_from_inverse

theorem our_modmult_family_hypotheses_from_inverse
    (N a ainv multBits : Nat)
    (h_N_gt_one : 1 < N)
    (h_cop_two : Nat.Coprime 2 N)
    (h_inv : a * ainv % N = 1) :
    (∀ i, (a^(2^i) % N) * (ainv^(2^i) % N) % N = 1)
    ∧ (∀ i, 0 < a^(2^i) % N)
    ∧ (∀ i, 0 < ainv^(2^i) % N)
    ∧ (∀ i j, j < multBits → 0 < (a^(2^i) % N * 2^j) % N)
    ∧ (∀ i j, j < multBits → 0 < ((N - ainv^(2^i) % N) % N * 2^j) % N)

*Deliverable A: bundled per-iterate hypothesis generator.** Given `1 < N`, `a * ainv % N = 1`, and `Nat.Coprime 2 N`, derives all 5 of the per-iterate hypotheses required by `Shor_correct_with_our_family_from_parametric`.

theoremShor_correct_with_verified_modexp

theorem Shor_correct_with_verified_modexp
    (bits N a ainv multBits m r : Nat)
    (hbits : 1 ≤ bits)
    (hN_pos : 0 < N)
    (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m multBits)
    (h_N_gt_one : 1 < N)
    (h_cop_two : Nat.Coprime 2 N)
    (h_inv : a * ainv % N = 1) :

*HEADLINE Deliverable B: Clean final theorem for the verified modular-exponentiation family.** The minimal-assumption form of the end-to-end Shor success-probability bound for our concrete in-place modular multiplier construction. Mathematical assumptions (genuinely necessary): - `1 < N` — non-trivial Shor instance. - `Nat.Coprime 2 N` — N is odd (required so that `2^j` is coprime to N for the per-bit constant positivity). - `a * ainv % N = 1` — the modular inverse relation. From this, `Nat.Coprime a N` and `Nat.Coprime ainv N` are derived internally. - `BasicSetting a r N m multBits` — the standard Shor order + log2 bounds. Structural sizing assumptions: - `1 ≤ bits`, `multBits ≤ bits + 1`, `0 < multBits`. - `N ≤ 2^bits`, `N ≤ 2^multBits`.

theoremour_modmult_family_anc_strictly_exceeds_sqir

theorem our_modmult_family_anc_strictly_exceeds_sqir (n : Nat) :
    FormalRV.SQIRPort.modmult_rev_anc n + (n + 5)
    = adder_n_qubits (n + 1) + 1

*Deliverable A: explicit ancilla count mismatch.** For all `n ≥ 0`, our family's ancilla budget `adder_n_qubits (n + 1) + 1 = 3n + 6` is strictly greater than SQIR's `modmult_rev_anc n = 2n + 1`. Difference: `n + 5 ≥ 5` ancillas.

theoremour_modmult_family_dim_strictly_exceeds_sqir

theorem our_modmult_family_dim_strictly_exceeds_sqir (n : Nat) :
    n + FormalRV.SQIRPort.modmult_rev_anc n + (n + 5)
    = n + (adder_n_qubits (n + 1) + 1)

*Total-dimension mismatch.** With `bits = multBits = n`, our total dimension `n + (adder_n_qubits (n + 1) + 1) = 4n + 6` exceeds SQIR's `n + modmult_rev_anc n = 3n + 1` by `n + 5`.

theoremsqir_anc_ne_our_anc

theorem sqir_anc_ne_our_anc (n : Nat) (h_n_pos : 0 < n) :
    n + FormalRV.SQIRPort.modmult_rev_anc n
    ≠ n + (adder_n_qubits (n + 1) + 1)

*No `BaseUCom` of one dimension can inhabit another.** Type nonequality at the dimension level: `BaseUCom (3n + 1)` and `BaseUCom (4n + 6)` are DIFFERENT TYPES. This is the formal obstacle that PREVENTS pointing `f_modmult_circuit` (return type `BaseUCom (n + modmult_rev_anc n) = BaseUCom (3n + 1)`) at our gate (return type `BaseUCom (n + (adder_n_qubits (n + 1) + 1)) = BaseUCom (4n + 6)`).

theoremsqir_axiom_closure_obstruction

theorem sqir_axiom_closure_obstruction
    (a ainv N n : Nat) (_h_a_lt : a < N) (_h_ainv_lt : ainv < N)
    (_h_inv : a * ainv % N = 1) (h_n_pos : 0 < n) :
    -- SQIR's expected oracle type:
    let sqir_dim

*Closure obstruction theorem (Deliverable C as a Lean statement).** Composite documentation theorem stating three facts about the closure of the original SQIR axioms: 1. SQIR's expected oracle type is `BaseUCom (n + modmult_rev_anc n) = BaseUCom (3n + 1)`. 2. Our family's oracle type is `BaseUCom (n + (adder_n_qubits (n + 1) + 1)) = BaseUCom (4n + 6)`. 3. These types are not equal for any `n ≥ 1` (witnessed by `n + 5` strictly positive ancilla excess). Combined effect: any further closure of `f_modmult_circuit`, `f_modmult_circuit_MMI`, `f_modmult_circuit_uc_well_typed` must construct a new oracle family at the EXACT SQIR type, not embed our family.

theoremmodexpOracleFamily_ModMulImpl

theorem modexpOracleFamily_ModMulImpl (n anc N a ainv : Nat) (gate : Nat → Nat → Gate)
    (h : ∀ i, FormalRV.SQIRPort.MultiplyCircuitProperty (a ^ (2 ^ i) % N) N n anc
                (Gate.toUCom (n + anc) (gate (a ^ (2 ^ i) % N) (ainv ^ (2 ^ i) % N)))) :
    FormalRV.SQIRPort.ModMulImpl a N n anc (modexpOracleFamily (n + anc) N a ainv gate)

*Generic: any verified per-constant modmult yields a `ModMulImpl`.** If each iterate's gate satisfies `MultiplyCircuitProperty` for the reduced constant `a^(2^i) mod N`, the squared-power family is a valid Shor modexp oracle, at ANY ancilla count `anc`.

theoremmodexpFamilyMCP_ModMulImpl

theorem modexpFamilyMCP_ModMulImpl (bits N a ainv : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (h_inv_pow : ∀ i, (a ^ (2 ^ i) % N) * (ainv ^ (2 ^ i) % N) % N = 1) :
    FormalRV.SQIRPort.ModMulImpl a N bits (sqir_modmult_rev_anc bits)
      (modexpFamilyMCP bits N a ainv)

*`modmult_MCP_gate` is a valid modexp oracle.** The SQIR-layout ModMult gadget plugs into the SAME generic modexp (ancilla `sqir_modmult_rev_anc bits`); its per-iterate `MultiplyCircuitProperty` comes straight from `modmult_correct`.

theoremShor_correct_with_mcp_family

theorem Shor_correct_with_mcp_family (bits N a ainv m r : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m bits)
    (h_inv_pow : ∀ i, (a ^ (2 ^ i) % N) * (ainv ^ (2 ^ i) % N) % N = 1) :
    FormalRV.SQIRPort.probability_of_success a r N m bits (sqir_modmult_rev_anc bits)
        (modexpFamilyMCP bits N a ainv)
      ≥ FormalRV.SQIRPort.κ / (Nat.log2 N : ℝ) ^ 4

*Shor success with the SQIR-layout modmult.** ModExp's order-finding oracle works with `modmult_MCP_gate` exactly as with `modMultInPlaceShor`, only the ancilla count differs. (Uses the gadget's sizing `2N ≤ 2^bits`.)

FormalRV.Arithmetic.ModExp.ModExpDef

FormalRV/Arithmetic/ModExp/ModExpDef.lean

FormalRV.Arithmetic.ModExp.ModExpDef The verified modular-exponentiation ORACLE FAMILY: the squared-power chain of in-place modular multipliers (iterate i multiplies by a^(2^i) mod N). Relocated from MCPBridge so the modexp gadget lives in its own folder, built on ModMult.

defour_modmult_family

noncomputable def our_modmult_family (bits N a ainv multBits : Nat) :
    Nat → FormalRV.SQIRPort.BaseUCom
            (multBits + (adder_n_qubits (bits + 1) + 1))

The Shor-shaped modular multiplication family indexed by QPE iterate. At iterate `i`, the gate multiplies by `a^(2^i) mod N` in-place. Each per-iterate gate uses `(a^(2^i)) % N` as its base multiplier (so the constant fits in `[0, N)`) and `(ainv^(2^i)) % N` as its modular inverse (since `(a*ainv) ≡ 1 (mod N)` implies `(a*ainv)^(2^i) ≡ 1 (mod N)`, hence `(a^(2^i)) * (ainv^(2^i)) ≡ 1 (mod N)`).

theoremour_modmult_family_uc_well_typed

theorem our_modmult_family_uc_well_typed
    (bits N a ainv multBits : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) (h_multBits_pos : 0 < multBits) :
    ∀ i, FormalRV.SQIRPort.uc_well_typed
            (our_modmult_family bits N a ainv multBits i)

*WellTyped for the squared-power family.** For every iterate `i`, the compiled `BaseUCom` is well-typed at the Shor dimension.

defmodexpOracleFamily

noncomputable def modexpOracleFamily (dim N a ainv : Nat) (gate : Nat → Nat → Gate) :
    Nat → FormalRV.SQIRPort.BaseUCom dim

Generic squared-power modexp oracle family: iterate `i` is `gate` applied to the reduced constant `a^(2^i) mod N` (with its inverse), at dimension `dim`.

defmodexpFamilyMCP

noncomputable def modexpFamilyMCP (bits N a ainv : Nat) :
    Nat → FormalRV.SQIRPort.BaseUCom (bits + sqir_modmult_rev_anc bits)

ModExp instantiated on the SQIR-layout ModMult gadget `modmult_MCP_gate` (dim `bits + sqir_modmult_rev_anc bits`).

FormalRV.Arithmetic.ModExp.ModExpResource

FormalRV/Arithmetic/ModExp/ModExpResource.lean

FormalRV.Arithmetic.ModExp.ModExpResource — EXACT T-counts of two concrete mod-exp-shaped `Gate` IR chains. The counts are exact and machine-checked; the LABELS below are carefully honest about what each chain is (a counting audit, 2026-06-03, flagged earlier overclaims). TWO chains, TWO numbers — do NOT confuse them: `shorModExp` (this section): chains the OUT-OF-PLACE `modmult_const_gate` (8·bits² Toffoli/step). T-count EXACTLY `112·bits³` (= 16·bits³ Toffoli; `16·2048³ = 137 438 953 472`). ⚠ This is a COUNTING MODEL only: an out-of-place multiplier writes `a·x` into a FRESH accumulator with no feedback, so a chain of them does NOT compute modular exponentiation. It is NOT the term the verified Shor algorithm uses. Keep it only as the per-step structural skeleton. `shorModExpVerified` (below): chains the IN-PLACE verified oracle `modmult_MCP_gate` (16·bits² Toffoli/step) — the term the verified Shor theorem actually uses. T-count EXACTLY `224·bits³` (= 32·bits³ Toffoli; `32·2048³ = 274 877 906 944` = 2× the above, the in-place forward+uncompute factor). This is the honest verified-oracle arithmetic figure. HONEST STATUS (CLAUDE.md "semantic correctness before resource counts"): the COUNTS are exact, but NEITHER chain has a proof that it computes `a^x mod N`. Each per-step multiplier is semantically verified (`const_gate`: (a·m)%N decode; `MCP`: MultiplyCircuitProperty), but the chain-realizes-modular-exponentiation theorem is NOT proved here (the verified mod-exp semantics lives in `Shor_correct_verified_no_modmult_axioms` via `controlled_powers`, a DIFFERENT BaseUCom-level term — no bridge to these Gate chains yet). So both chains are SCAFFOLDED (count-only), and the `2·bits` exponent-register multiplicity is structural. EXACT counts derived by induction (math — the 2048 circuit is never built), no `sorry`/`axiom`.

defshorModExpChain

def shorModExpChain (m bits N a : Nat) : Gate

COUNTING-MODEL chain of `m` OUT-OF-PLACE `const_gate` multipliers (step `k` multiplies by `a^(2^k)`). ⚠ Not a valid modular-exponentiation circuit (out-of-place = no feedback) and NOT the verified Shor oracle term — kept only for its per-step Toffoli structure. For the verified-oracle chain use `shorModExpVerified`.

theoremtcount_shorModExpChain

theorem tcount_shorModExpChain (m bits N a : Nat)
    (hcop : Nat.Coprime a N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (shorModExpChain m bits N a) = m * (56 * bits ^ 2)

*EXACT** T-count of the `m`-fold modular-multiplier chain: `m · 56·bits²`, for any valid Shor base (`gcd(a,N)=1`, `N` odd, `N>1` — so every multiplier step is non-trivial).

defshorModExp

def shorModExp (bits N a : Nat) : Gate

COUNTING-MODEL mod-exp skeleton: `2·bits` out-of-place multipliers. ⚠ Not a valid mod-exp (no feedback) and not the verified oracle — see header; use `shorModExpVerified`.

theoremtcount_shorModExp

theorem tcount_shorModExp (bits N a : Nat)
    (hcop : Nat.Coprime a N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (shorModExp bits N a) = 112 * bits ^ 3

*EXACT** T-count of the out-of-place counting-model chain `shorModExp`: `112·bits³` (= `16·bits³` Toffoli). Exact count of THIS concrete term; the term is a counting model, NOT the verified Shor circuit (header).

example(example)

example : tcount (shorModExp 4 15 7) = 112 * 4 ^ 3

defshorModExpMCPChain

def shorModExpMCPChain (m bits N a ainv : Nat) : Gate

theoremtcount_shorModExpMCPChain

theorem tcount_shorModExpMCPChain (m bits N a ainv : Nat)
    (hcop : Nat.Coprime a N) (hcopinv : Nat.Coprime ainv N)
    (hpos : 0 < ainv) (hlt : ainv < N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (shorModExpMCPChain m bits N a ainv) = m * (112 * bits ^ 2)

defshorModExpVerified

def shorModExpVerified (bits N a ainv : Nat) : Gate

Mod-exp-shaped chain of `2·bits` VERIFIED in-place MCP oracles — the honest arithmetic figure (each step is the term the verified Shor theorem uses). ⚠ Count-only/SCAFFOLDED: no proof yet that the chain computes `a^x mod N` (header); `2·bits` is structural.

theoremtcount_shorModExpVerified

theorem tcount_shorModExpVerified (bits N a ainv : Nat)
    (hcop : Nat.Coprime a N) (hcopinv : Nat.Coprime ainv N)
    (hpos : 0 < ainv) (hlt : ainv < N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (shorModExpVerified bits N a ainv) = 224 * bits ^ 3

*EXACT** T-count of the verified-oracle chain `shorModExpVerified`: `224·bits³` (= `32·bits³` Toffoli; twice `shorModExp`, the in-place forward+uncompute factor). This is the count on the verified-oracle building block — count-only (mod-exp semantics not proved for the chain; see header).

example(example)

example : tcount (shorModExpVerified 2 15 7 13) = 224 * 2 ^ 3

FormalRV.Arithmetic.ModMult

FormalRV/Arithmetic/ModMult.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModMult.Internal.AccumulatorRange

FormalRV/Arithmetic/ModMult/Internal/AccumulatorRange.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModMult.Internal.AccumulatorRange.CleanBundleAndAdapter

FormalRV/Arithmetic/ModMult/Internal/AccumulatorRange/CleanBundleAndAdapter.lean

## Tick 77 — Task 6: Clean workspace bundle for in-place wrapper.

theoremmodmult_inplace_candidate_target_decode

theorem modmult_inplace_candidate_target_decode
    (bits N a ainv x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1) :
    cuccaro_target_val bits 2
        (Gate.applyNat (modmult_inplace_candidate bits N a ainv) (modmult_input_F bits x 0))
      = 0

*In-place modular multiplier candidate, target decoded.**

theoremmodmult_inplace_candidate_mult_bit

theorem modmult_inplace_candidate_mult_bit
    (bits N a ainv x k : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1) (hk : k < bits) :
    Gate.applyNat (modmult_inplace_candidate bits N a ainv)
        (modmult_input_F bits x 0) (mult_control_idx bits k)
      = ((a * x) % N).testBit k

*In-place modular multiplier candidate, multiplier register decoded to `(a*x) % N`.**

theoremmodmult_inplace_candidate_clean

theorem modmult_inplace_candidate_clean
    (bits N a ainv x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1) :
    cuccaro_target_val bits 2
        (Gate.applyNat (modmult_inplace_candidate bits N a ainv)
          (modmult_input_F bits x 0)) = 0
    ∧ cuccaro_read_val bits 2
        (Gate.applyNat (modmult_inplace_candidate bits N a ainv)
          (modmult_input_F bits x 0)) = 0
    ∧ Gate.applyNat (modmult_inplace_candidate bits N a ainv)

*In-place modular multiplier — clean bundle.**

theoremmult_input_F_shifted_below_bits

theorem mult_input_F_shifted_below_bits
    (bits x acc q : Nat) (hq : q < bits) :
    mult_input_F_shifted bits x acc q = false

theoremmult_input_F_shifted_above_bits

theorem mult_input_F_shifted_above_bits
    (bits x acc q : Nat) (hq : bits ≤ q) :
    mult_input_F_shifted bits x acc q
      = modmult_input_F bits x acc (q - bits)

theoremmult_input_F_shifted_at_shifted_control_bit

theorem mult_input_F_shifted_at_shifted_control_bit
    (bits x acc k : Nat) (hk : k < bits) :
    mult_input_F_shifted bits x acc (bits + mult_control_idx bits k)
      = x.testBit k

theoremGate.shift_seq

theorem Gate.shift_seq (off : Nat) (g h : Gate) :
    Gate.shift off (Gate.seq g h)
      = Gate.seq (Gate.shift off g) (Gate.shift off h)

theoremGate.applyNat_shift_at_lo

theorem Gate.applyNat_shift_at_lo
    (off : Nat) (g : Gate) (f : Nat → Bool) (q : Nat) (hq : q < off) :
    Gate.applyNat (Gate.shift off g) f q = f q

*At positions below `off`, a shifted gate acts as identity.**

theoremGate.applyNat_shift_at_hi

theorem Gate.applyNat_shift_at_hi
    (off : Nat) (g : Gate) (f : Nat → Bool) (q : Nat) (hq : off ≤ q) :
    Gate.applyNat (Gate.shift off g) f q
      = Gate.applyNat g (fun r => f (off + r)) (q - off)

*At positions ≥ `off`, a shifted gate acts as the original gate on the function `r ↦ f (off + r)`.**

theoremGate.shift_wellTyped

theorem Gate.shift_wellTyped
    {off dim : Nat} {g : Gate} (h : Gate.WellTyped dim g) :
    Gate.WellTyped (off + dim) (Gate.shift off g)

*Gate.shift is WellTyped at the larger dimension.**

theoremencode_to_mult_adapter_disjoint

theorem encode_to_mult_adapter_disjoint (bits : Nat) :
    0 + bits ≤ bits + mult_control_idx bits 0
      ∨ bits + mult_control_idx bits 0 + bits ≤ 0

Disjointness of swap ranges (used in `reverse_register_swap` lemmas).

theoremencode_to_mult_adapter_wellTyped

theorem encode_to_mult_adapter_wellTyped
    (bits : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (modmult_total_dim bits) (encode_to_mult_adapter bits)

theoremcuccaro_input_F_zero_at_workspace

theorem cuccaro_input_F_zero_at_workspace
    (q : Nat) (hq : q < 2 + 2 * (0 : Nat) + 1 ∨ True) :
    cuccaro_input_F 2 false 0 0 q = false

Helper: workspace value of `cuccaro_input_F 2 false 0 0` is always false.

theoremencode_to_mult_adapter_correct

theorem encode_to_mult_adapter_correct
    (bits x : Nat) (hbits : 1 ≤ bits) (hx : x < 2^bits) :
    Gate.applyNat (encode_to_mult_adapter bits)
        (encodeDataZeroAnc bits (sqir_modmult_rev_anc bits) x)
      = mult_input_F_shifted bits x 0

*Adapter correctness: `encodeDataZeroAnc → mult_input_F_shifted`.**

theoremreverse_register_swap_involution_general

theorem reverse_register_swap_involution_general
    (n offsetA offsetB : Nat)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA)
    (f : Nat → Bool) :
    Gate.applyNat (reverse_register_swap n offsetA offsetB)
      (Gate.applyNat (reverse_register_swap n offsetA offsetB) f)
    = f

*General reverse-register-swap involution.** Applying `reverse_register_swap n offsetA offsetB` twice yields identity, given disjoint ranges.

theoremencode_to_mult_adapter_involution

theorem encode_to_mult_adapter_involution
    (bits : Nat) (f : Nat → Bool) :
    Gate.applyNat (encode_to_mult_adapter bits)
      (Gate.applyNat (encode_to_mult_adapter bits) f) = f

*Adapter is self-inverse.**

theoremencode_to_mult_adapter_reverse

theorem encode_to_mult_adapter_reverse
    (bits y : Nat) (hbits : 1 ≤ bits) (hy : y < 2^bits) :
    Gate.applyNat (encode_to_mult_adapter bits)
        (mult_input_F_shifted bits y 0)
      = encodeDataZeroAnc bits (sqir_modmult_rev_anc bits) y

*Adapter reverse direction: `mult_input_F_shifted → encodeDataZeroAnc`.**

theoremmodmult_inplace_shifted_correct

theorem modmult_inplace_shifted_correct
    (bits N a ainv x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N) (h_inv : (a * ainv) % N = 1) :
    Gate.applyNat (modmult_inplace_shifted bits N a ainv)
        (mult_input_F_shifted bits x 0)
      = mult_input_F_shifted bits ((a * x) % N) 0

*Shifted in-place multiplier correctness.**

theoremmodmult_inplace_shifted_wellTyped

theorem modmult_inplace_shifted_wellTyped
    (bits N a ainv : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits) :
    Gate.WellTyped (modmult_total_dim bits) (modmult_inplace_shifted bits N a ainv)

theoremmodmult_MCP_gate_apply_encode

theorem modmult_MCP_gate_apply_encode
    (bits N a ainv x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N) (h_inv : (a * ainv) % N = 1) :
    Gate.applyNat (modmult_MCP_gate bits N a ainv)
        (encodeDataZeroAnc bits (sqir_modmult_rev_anc bits) x)
      = encodeDataZeroAnc bits (sqir_modmult_rev_anc bits) ((a * x) % N)

*MCP-layout gate apply theorem.** The composed gate maps `encodeDataZeroAnc bits anc x` to `encodeDataZeroAnc bits anc ((a*x) % N)`.

theoremmodmult_MCP_gate_wellTyped

theorem modmult_MCP_gate_wellTyped
    (bits N a ainv : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits) :
    Gate.WellTyped (modmult_total_dim bits) (modmult_MCP_gate bits N a ainv)

*MCP-layout gate WellTyped.**

theoremmodmult_MCP_gate_satisfies_MultiplyCircuitProperty

theorem modmult_MCP_gate_satisfies_MultiplyCircuitProperty
    (bits N a ainv : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (h_inv : (a * ainv) % N = 1) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N bits (sqir_modmult_rev_anc bits)
      (Gate.toUCom (modmult_total_dim bits) (modmult_MCP_gate bits N a ainv))

*HEADLINE: MCP-layout gate satisfies `MultiplyCircuitProperty`.**

FormalRV.Arithmetic.ModMult.Internal.AccumulatorRange.InverseAndInPlace

FormalRV/Arithmetic/ModMult/Internal/AccumulatorRange/InverseAndInPlace.lean

## Tick 77 — Task 4: Modular inverse arithmetic.

theoremmodmult_inverse_clear_arith

theorem modmult_inverse_clear_arith
    (N a ainv x : Nat) (hN_pos : 0 < N) (hx : x < N) (h_ainv_le : ainv ≤ N)
    (h_inv : (a * ainv) % N = 1) :
    (x + ((N - ainv) % N) * ((a * x) % N)) % N = 0

*Modular inverse clear arithmetic.** If `(a * ainv) % N = 1`, then `(x + ((N - ainv) % N) * ((a * x) % N)) % N = 0`.

theoremmodmult_inplace_candidate_state_eq

theorem modmult_inplace_candidate_state_eq
    (bits N a ainv x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1) :
    Gate.applyNat (modmult_inplace_candidate bits N a ainv) (modmult_input_F bits x 0)
      = modmult_input_F bits ((a * x) % N) 0

*In-place modular multiplier candidate target theorem.** After applying the in-place wrapper to `(x, 0)`, the resulting state is `((a*x) % N, 0)` — i.e., the original "multiplier" register now holds the product, and the accumulator is cleared.

theoremmodmult_inplace_candidate_state_eq_qstart

theorem modmult_inplace_candidate_state_eq_qstart
    (bits q_start N a ainv x flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat
        (modmult_inplace_candidate_qstart bits q_start N a ainv flagPos)
        (mult_input_F_qstart bits q_start x 0)
      = mult_input_F_qstart bits q_start ((a * x) % N) 0

q_start port of `modmult_inplace_candidate_state_eq`. After applying the q_start in-place wrapper to `(x, 0)`, the resulting state is `((a*x) % N, 0)` — the original "multiplier" register now holds the product, and the accumulator is cleared.

theoremmodmult_inplace_candidate_target_decode_qstart

theorem modmult_inplace_candidate_target_decode_qstart
    (bits q_start N a ainv x flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (modmult_inplace_candidate_qstart bits q_start N a ainv flagPos)
          (mult_input_F_qstart bits q_start x 0))

q_start port of `modmult_inplace_candidate_target_decode` (line 3708): after the in-place wrapper, the decoded target value is `0`.

theoremmodmult_inplace_candidate_mult_bit_qstart

theorem modmult_inplace_candidate_mult_bit_qstart
    (bits q_start N a ainv x k flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1) (hk : k < bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat
        (modmult_inplace_candidate_qstart bits q_start N a ainv flagPos)
        (mult_input_F_qstart bits q_start x 0)
        (mult_control_idx_qstart bits q_start k)

q_start port of `modmult_inplace_candidate_mult_bit` (line 3721): the multiplier register decodes bit-by-bit to `((a*x) % N).testBit k`.

theoremmodmult_inplace_candidate_clean_qstart

theorem modmult_inplace_candidate_clean_qstart
    (bits q_start N a ainv x flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (h_ainv_le : ainv ≤ N) (hx : x < N)
    (h_inv : (a * ainv) % N = 1)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_target_val bits q_start
        (Gate.applyNat
          (modmult_inplace_candidate_qstart bits q_start N a ainv flagPos)
          (mult_input_F_qstart bits q_start x 0)) = 0

q_start port of `modmult_inplace_candidate_clean` (line 3733). The clean bundle restating the in-place state-eq pointwise: `cuccaro_target_val = 0`; `cuccaro_read_val = 0`; every position below `q_start` is `false` (q_start generalisation of the old `flag_0`/`flag_1` conjuncts at positions 0 and 1); top-carry position `q_start + 2*bits` is `false`; multiplier-bit decoding at every `mult_control_idx_qstart bits q_start k` equals `((a*x) % N).testBit k`.

FormalRV.Arithmetic.ModMult.Internal.AccumulatorRange.SwapConcrete

FormalRV/Arithmetic/ModMult/Internal/AccumulatorRange/SwapConcrete.lean

## Per-position behavior of `modmult_swap_acc_mult_aux`.

theoremmodmult_swap_acc_mult_aux_at_mult_out_range

theorem modmult_swap_acc_mult_aux_at_mult_out_range
    (bits k i : Nat) (hk : k ≤ bits) (hi_ge : k ≤ i) (hi_bits : i < bits) (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux bits k) f (mult_control_idx bits i)
      = f (mult_control_idx bits i)

*At a multiplier bit `i ≥ k`, swap output = input.**

theoremmodmult_swap_acc_mult_aux_at_target_out_range

theorem modmult_swap_acc_mult_aux_at_target_out_range
    (bits k i : Nat) (hk : k ≤ bits) (hi_ge : k ≤ i) (hi_bits : i < bits) (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux bits k) f (modmult_target_idx i)
      = f (modmult_target_idx i)

*At an accumulator bit `i ≥ k`, swap output = input.**

theoremmodmult_swap_acc_mult_aux_at_target_in_range

theorem modmult_swap_acc_mult_aux_at_target_in_range
    (bits k i : Nat) (hk : k ≤ bits) (hi_k : i < k) (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux bits k) f (modmult_target_idx i)
      = f (mult_control_idx bits i)

*At an accumulator bit `i < k`, swap output = input at the matched multiplier position.**

theoremmodmult_swap_acc_mult_aux_at_mult_in_range

theorem modmult_swap_acc_mult_aux_at_mult_in_range
    (bits k i : Nat) (hk : k ≤ bits) (hi_k : i < k) (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux bits k) f (mult_control_idx bits i)
      = f (modmult_target_idx i)

*At a multiplier bit `i < k`, swap output = input at matched target.**

theoremmodmult_swap_acc_mult_aux_at_other

theorem modmult_swap_acc_mult_aux_at_other
    (bits k q : Nat) (hk : k ≤ bits) (f : Nat → Bool)
    (h_q_not_target : ∀ i, i < k → q ≠ modmult_target_idx i)
    (h_q_not_mult : ∀ i, i < k → q ≠ mult_control_idx bits i) :
    Gate.applyNat (modmult_swap_acc_mult_aux bits k) f q = f q

*At any position outside the swap range, output = input.**

theoremmodmult_target_idx_value

theorem modmult_target_idx_value (i : Nat) :
    modmult_target_idx i = 2 + 2 * i + 1

*Sanity helper:** `modmult_target_idx i = 2 + 2*i + 1`.

theoremmodmult_swap_acc_mult_apply

theorem modmult_swap_acc_mult_apply
    (bits m acc : Nat) (hbits : 1 ≤ bits)
    (hm : m < 2^bits) (hacc : acc < 2^bits) :
    Gate.applyNat (modmult_swap_acc_mult bits) (modmult_input_F bits m acc)
      = modmult_input_F bits acc m

*Full SWAP correctness on `modmult_input_F`.**

FormalRV.Arithmetic.ModMult.Internal.AccumulatorRange.SwapQStart

FormalRV/Arithmetic/ModMult/Internal/AccumulatorRange/SwapQStart.lean

theoremmodmult_swap_acc_mult_at_target_out_range_qstart

theorem modmult_swap_acc_mult_at_target_out_range_qstart
    (bits q_start k i : Nat) (hk : k ≤ bits) (hi_ge : k ≤ i) (hi_bits : i < bits)
    (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux_qstart bits q_start k) f
        (modmult_target_idx_qstart q_start i)
      = f (modmult_target_idx_qstart q_start i)

q_start port of `modmult_swap_acc_mult_aux_at_target_out_range` (line 3118). At an accumulator bit `i ≥ k`, swap output = input.

theoremmodmult_swap_acc_mult_at_target_in_range_qstart

theorem modmult_swap_acc_mult_at_target_in_range_qstart
    (bits q_start k i : Nat) (hk : k ≤ bits) (hi_k : i < k) (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux_qstart bits q_start k) f
        (modmult_target_idx_qstart q_start i)
      = f (mult_control_idx_qstart bits q_start i)

q_start port of `modmult_swap_acc_mult_aux_at_target_in_range` (line 3139). At an accumulator bit `i < k`, swap output = input at the matched multiplier position.

theoremmodmult_swap_acc_mult_at_mult_in_range_qstart

theorem modmult_swap_acc_mult_at_mult_in_range_qstart
    (bits q_start k i : Nat) (hk : k ≤ bits) (hi_k : i < k) (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux_qstart bits q_start k) f
        (mult_control_idx_qstart bits q_start i)
      = f (modmult_target_idx_qstart q_start i)

q_start port of `modmult_swap_acc_mult_aux_at_mult_in_range` (line 3164). At a multiplier bit `i < k`, swap output = input at matched target.

theoremmodmult_swap_acc_mult_at_other_qstart

theorem modmult_swap_acc_mult_at_other_qstart
    (bits q_start k q : Nat) (hk : k ≤ bits) (f : Nat → Bool)
    (h_q_not_target : ∀ i, i < k → q ≠ modmult_target_idx_qstart q_start i)
    (h_q_not_mult : ∀ i, i < k → q ≠ mult_control_idx_qstart bits q_start i) :
    Gate.applyNat (modmult_swap_acc_mult_aux_qstart bits q_start k) f q = f q

q_start port of `modmult_swap_acc_mult_aux_at_other` (line 3189). At any position outside the swap range, output = input.

theoremmodmult_swap_acc_mult_apply_qstart

theorem modmult_swap_acc_mult_apply_qstart
    (bits q_start m acc : Nat) (hbits : 1 ≤ bits)
    (hm : m < 2^bits) (hacc : acc < 2^bits) :
    Gate.applyNat (modmult_swap_acc_mult_qstart bits q_start)
        (mult_input_F_qstart bits q_start m acc)
      = mult_input_F_qstart bits q_start acc m

q_start port of `modmult_swap_acc_mult_apply` (line 3215). Full SWAP correctness on `mult_input_F_qstart`.

theoremmodmult_target_idx_ne_mult_control_idx

theorem modmult_target_idx_ne_mult_control_idx
    (bits i j : Nat) (hi : i < bits) :
    modmult_target_idx i ≠ mult_control_idx bits j

theoremmodmult_swap_acc_mult_aux_succ_eq

theorem modmult_swap_acc_mult_aux_succ_eq (bits k : Nat) :
    modmult_swap_acc_mult_aux bits (k + 1)
      = Gate.seq (modmult_swap_acc_mult_aux bits k)
          (qubit_swap (modmult_target_idx k) (mult_control_idx bits k))

theoremmodmult_swap_acc_mult_aux_wellTyped

theorem modmult_swap_acc_mult_aux_wellTyped
    (bits k : Nat) (hbits : 1 ≤ bits) (hk : k ≤ bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits) (modmult_swap_acc_mult_aux bits k)

*WellTyped for `modmult_swap_acc_mult_aux`.**

theoremmodmult_swap_acc_mult_wellTyped

theorem modmult_swap_acc_mult_wellTyped
    (bits : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits) (modmult_swap_acc_mult bits)

FormalRV.Arithmetic.ModMult.Internal.BitPositioning

FormalRV/Arithmetic/ModMult/Internal/BitPositioning.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModMult.Internal.BitPositioning.ControlIdxAndCommute

FormalRV/Arithmetic/ModMult/Internal/BitPositioning/ControlIdxAndCommute.lean

theoremmult_control_idx_outside_modadd_workspace

theorem mult_control_idx_outside_modadd_workspace
    (bits j : Nat) :
    mult_control_idx bits j < 2
      ∨ 2 + (2 * bits + 1) ≤ mult_control_idx bits j

theoremmult_control_idx_ne_flag

theorem mult_control_idx_ne_flag
    (bits j : Nat) :
    mult_control_idx bits j ≠ 1

theoremmult_control_idx_ne_top_carry

theorem mult_control_idx_ne_top_carry
    (bits j : Nat) :
    mult_control_idx bits j ≠ 2 + 2 * bits

theoremmult_control_idx_lt_sqir_dim

theorem mult_control_idx_lt_sqir_dim
    (bits j : Nat) (hj : j < bits) :
    mult_control_idx bits j < sqir_modmult_rev_anc bits

theoremmult_control_idx_outside_modadd_workspace_form

theorem mult_control_idx_outside_modadd_workspace_form
    (bits j : Nat) :
    mult_control_idx bits j < 2
      ∨ 2 + 2 * bits + 1 ≤ mult_control_idx bits j

theoremmult_control_idx_injective

theorem mult_control_idx_injective
    (bits j j' : Nat) (h : mult_control_idx bits j = mult_control_idx bits j') :
    j = j'

Distinct multiplier bits map to distinct positions.

theoremmult_input_control_bit

theorem mult_input_control_bit
    (bits m acc j : Nat) (hj : j < bits) :
    modmult_input_F bits m acc (mult_control_idx bits j) = m.testBit j

*Multiplier bit at `mult_control_idx bits j` is `m.testBit j`.**

theoremmult_input_target_decode

theorem mult_input_target_decode
    (bits m acc : Nat) (hacc : acc < 2 ^ bits) :
    cuccaro_target_val bits 2 (modmult_input_F bits m acc) = acc

*Decoded target register equals `acc` (for `acc < 2^bits`).**

theoremmult_input_read_decode

theorem mult_input_read_decode
    (bits m acc : Nat) :
    cuccaro_read_val bits 2 (modmult_input_F bits m acc) = 0

*Decoded read register is 0.**

theoremmult_input_flag_0_false

theorem mult_input_flag_0_false
    (bits m acc : Nat) :
    modmult_input_F bits m acc 0 = false

*Flag bits are false.**

theoremmult_input_flag_1_false

theorem mult_input_flag_1_false
    (bits m acc : Nat) :
    modmult_input_F bits m acc 1 = false

theoremmult_input_top_carry_false

theorem mult_input_top_carry_false
    (bits m acc : Nat) (hbits : 1 ≤ bits) :
    modmult_input_F bits m acc (2 + 2 * bits) = false

*Top carry is false (when bits ≥ 1).** The top carry position `2 + 2*bits = 2 + 2*(bits-1) + 2` is the highest read register bit in the Cuccaro encoding with `a = 0`.

theoremmodmult_acc_spec_zero

theorem modmult_acc_spec_zero (N a m : Nat) :
    modmult_acc_spec N a m 0 = 0

theoremmodmult_acc_spec_succ_false

theorem modmult_acc_spec_succ_false
    (N a m k : Nat) (h : m.testBit k = false) :
    modmult_acc_spec N a m (k + 1) = modmult_acc_spec N a m k

theoremmodmult_acc_spec_succ_true

theorem modmult_acc_spec_succ_true
    (N a m k : Nat) (h : m.testBit k = true) :
    modmult_acc_spec N a m (k + 1)
      = (modmult_acc_spec N a m k + (a * 2 ^ k) % N) % N

theoremmodmult_acc_spec_lt

theorem modmult_acc_spec_lt (N a m k : Nat) (hN_pos : 0 < N) :
    modmult_acc_spec N a m k < N

For `0 < N`, the accumulator after any prefix is in `[0, N)`.

theoremmodmult_prefix_gate_zero_eq_I

theorem modmult_prefix_gate_zero_eq_I
    (bits N a : Nat) :
    modmult_prefix_gate bits N a 0 = Gate.I

theoremmodmult_prefix_gate_succ_eq

theorem modmult_prefix_gate_succ_eq
    (bits N a k : Nat) :
    modmult_prefix_gate bits N a (k + 1)
      = seq (modmult_prefix_gate bits N a k) (modmult_step_gate bits N a k)

theoremcontrolledCompareConst_commute_update_outside_fun

theorem controlledCompareConst_commute_update_outside_fun
    (bits q_start c controlIdx flagPos updateIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hupdate_out : updateIdx < q_start ∨ q_start + 2 * bits + 1 ≤ updateIdx)
    (hupdate_ne_flag : updateIdx ≠ flagPos)
    (hupdate_ne_ctrl : updateIdx ≠ controlIdx) :
    Gate.applyNat (sqir_controlledCompareConst bits q_start c controlIdx flagPos)
        (update f updateIdx v)
      = update (Gate.applyNat (sqir_controlledCompareConst bits q_start c controlIdx flagPos) f)
              updateIdx v

*Controlled compareConst commutes with update at outside position distinct from the inner controlIdx and flagPos.**

theoremstyle_controlledModAddConst_gate_commute_update_outside_fun

theorem style_controlledModAddConst_gate_commute_update_outside_fun
    (bits N c controlIdx updateIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hupdate_out : updateIdx < 2 ∨ 2 + (2 * bits + 1) ≤ updateIdx)
    (hupdate_ne_flag : updateIdx ≠ 1)
    (hupdate_ne_control : updateIdx ≠ controlIdx) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)
        (update f updateIdx v)
      = update (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1) f)
              updateIdx v

*Deliverable A — controlled modular add-constant gate commutes with `update` at outside positions (distinct from flag and controlIdx).**

theoremmult_control_idx_outside_modadd_workspace_form_qstart

theorem mult_control_idx_outside_modadd_workspace_form_qstart
    (bits q_start j : Nat) :
    mult_control_idx_qstart bits q_start j < q_start
      ∨ q_start + 2 * bits + 1 ≤ mult_control_idx_qstart bits q_start j

The j-th multiplier bit lies above the shifted Cuccaro workspace `[q_start, q_start + 2 * bits + 1)`. Port of `mult_control_idx_outside_modadd_workspace_form` (line 63).

theoremmult_control_idx_ne_flag_qstart

theorem mult_control_idx_ne_flag_qstart
    (bits q_start j flagPos : Nat) (h_flag_lt : flagPos < q_start) :
    mult_control_idx_qstart bits q_start j ≠ flagPos

The j-th multiplier bit is distinct from any chosen `flagPos` strictly below the shifted workspace. Port of `mult_control_idx_ne_flag` (line 45).

theoremmult_control_idx_lt_dim_qstart

theorem mult_control_idx_lt_dim_qstart
    (bits q_start j dim : Nat) (hj : j < bits)
    (h_dim : q_start + (2 * bits + 1) + bits ≤ dim) :
    mult_control_idx_qstart bits q_start j < dim

The j-th multiplier bit fits in a dimension that covers the workspace plus the multiplier register. Port of `mult_control_idx_lt_sqir_dim` (line 57) generalised to a free `dim` parameter.

theoremmult_control_idx_injective_qstart

theorem mult_control_idx_injective_qstart
    (bits q_start j j' : Nat)
    (h : mult_control_idx_qstart bits q_start j
          = mult_control_idx_qstart bits q_start j') :
    j = j'

Distinct multiplier bits map to distinct positions. Port of `mult_control_idx_injective` (line 72).

theoremmult_input_control_bit_qstart

theorem mult_input_control_bit_qstart
    (bits q_start m acc j : Nat) (hj : j < bits) :
    mult_input_F_qstart bits q_start m acc
        (mult_control_idx_qstart bits q_start j) = m.testBit j

The multiplier bit at `mult_control_idx_qstart bits q_start j` is `m.testBit j`. Port of `mult_input_control_bit` (line 103).

theoremstyle_controlledModAddConst_gate_commute_update_outside_fun_qstart

theorem style_controlledModAddConst_gate_commute_update_outside_fun_qstart
    (bits q_start N c controlIdx flagPos updateIdx : Nat) (v : Bool) (f : Nat → Bool)
    (hupdate_out : updateIdx < q_start ∨ q_start + 2 * bits + 1 ≤ updateIdx)
    (hupdate_ne_flag : updateIdx ≠ flagPos)
    (hupdate_ne_control : updateIdx ≠ controlIdx) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c controlIdx flagPos)
        (update f updateIdx v)
      = update (Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
                                  controlIdx flagPos) f)
              updateIdx v

q_start-parametric commute helper for the controlled mod-add gate. Port of `style_controlledModAddConst_gate_commute_update_outside_fun` (line 276): the gate commutes with an `update` at any position outside its workspace and distinct from both `controlIdx` and `flagPos`. All sub-helpers are already q_start-parametric: - `sqir_conditionalAddConstGate_commute_update_outside_fun` (CuccaroSQIRDirtyFlag.lean:3157); - `sqir_style_compareConst_candidate_commute_update_outside_fun` (:3132); - `sqir_conditionalSubConstGate_commute_update_outside_fun` (:3174); - `controlledCompareConst_commute_update_outside_fun` (this file:249); - `Gate.applyNat_CX_commute_update_outside_fun` (CuccaroSQIRDirtyFlag.lean:3039).

theoreminstall_mult_bits_skip_j_at_workspace_eq

theorem install_mult_bits_skip_j_at_workspace_eq
    (bits m j num_bits : Nat) (f : Nat → Bool) (q : Nat)
    (hq : q < 2 + 2 * bits + 1) :
    install_mult_bits_skip_j bits m j num_bits f q = f q

*`install_mult_bits_skip_j` at outside workspace.** At any position `q < 2 + 2 * bits + 1`, the installs don't touch `q` (they update only at multiplier positions).

theoreminstall_mult_bits_skip_j_at_mult_k_eq

theorem install_mult_bits_skip_j_at_mult_k_eq
    (bits m j num_bits k : Nat) (f : Nat → Bool)
    (h_k_lt : k < num_bits) (h_k_ne_j : k ≠ j) :
    install_mult_bits_skip_j bits m j num_bits f (mult_control_idx bits k)
      = m.testBit k

*`install_mult_bits_skip_j` at multiplier position `k`** (`k < num_bits`, `k ≠ j`): installs `m.testBit k`.

theoreminstall_mult_bits_skip_j_at_j_eq

theorem install_mult_bits_skip_j_at_j_eq
    (bits m j num_bits : Nat) (f : Nat → Bool) :
    install_mult_bits_skip_j bits m j num_bits f (mult_control_idx bits j)
      = f (mult_control_idx bits j)

*`install_mult_bits_skip_j` at the skipped position `j`.** Installs never touch position `controlIdx_j` (always skipped), so the install returns `f (controlIdx_j)`.

theoreminstall_mult_bits_skip_j_at_above_eq

theorem install_mult_bits_skip_j_at_above_eq
    (bits m j num_bits : Nat) (h_num_le : num_bits ≤ bits) (f : Nat → Bool) (q : Nat)
    (hq : q ≥ 2 + 2 * bits + 1 + bits) :
    install_mult_bits_skip_j bits m j num_bits f q = f q

*`install_mult_bits_skip_j` at outside-multiplier upper positions.** For `q ≥ 2 + 2 * bits + 1 + bits` (above the multiplier register), installs don't touch `q`.

FormalRV.Arithmetic.ModMult.Internal.BitPositioning.InstallBridgeAndTargetDecode

FormalRV/Arithmetic/ModMult/Internal/BitPositioning/InstallBridgeAndTargetDecode.lean

## Deliverable B — Bridge: `modmult_input_F` as install over `F_j`.

theoremmult_input_F_eq_install_with_j

theorem mult_input_F_eq_install_with_j
    (bits m acc j : Nat) (hj : j < bits) (hacc : acc < 2 ^ bits) :
    modmult_input_F bits m acc
      = install_mult_bits_skip_j bits m j bits
          (update (cuccaro_input_F 2 false 0 acc) (mult_control_idx bits j) (m.testBit j))

*`modmult_input_F` decomposes as `install_mult_bits_skip_j` applied to `update (cuccaro_input_F) controlIdx_j (m.testBit j)`.**

theoreminstall_mult_bits_skip_j_at_workspace_eq_qstart

theorem install_mult_bits_skip_j_at_workspace_eq_qstart
    (bits q_start m j num_bits : Nat) (f : Nat → Bool) (q : Nat)
    (hq : q < q_start + 2 * bits + 1) :
    install_mult_bits_skip_j_qstart bits q_start m j num_bits f q = f q

q_start-parametric: install chain does not modify positions strictly below `q_start + 2 * bits + 1`. Port of `install_mult_bits_skip_j_at_workspace_eq` (line 456).

theoreminstall_mult_bits_skip_j_at_mult_k_eq_qstart

theorem install_mult_bits_skip_j_at_mult_k_eq_qstart
    (bits q_start m j num_bits k : Nat) (f : Nat → Bool)
    (h_k_lt : k < num_bits) (h_k_ne_j : k ≠ j) :
    install_mult_bits_skip_j_qstart bits q_start m j num_bits f
        (mult_control_idx_qstart bits q_start k) = m.testBit k

q_start-parametric: install chain at multiplier position `k` (`k < num_bits`, `k ≠ j`) installs `m.testBit k`. Port of `install_mult_bits_skip_j_at_mult_k_eq` (line 476).

theoreminstall_mult_bits_skip_j_at_j_eq_qstart

theorem install_mult_bits_skip_j_at_j_eq_qstart
    (bits q_start m j num_bits : Nat) (f : Nat → Bool) :
    install_mult_bits_skip_j_qstart bits q_start m j num_bits f
        (mult_control_idx_qstart bits q_start j)
      = f (mult_control_idx_qstart bits q_start j)

q_start-parametric: install chain at the skipped position `j` is preserved from the base state. Port of `install_mult_bits_skip_j_at_j_eq` (line 503).

theoreminstall_mult_bits_skip_j_at_above_eq_qstart

theorem install_mult_bits_skip_j_at_above_eq_qstart
    (bits q_start m j num_bits : Nat) (h_num_le : num_bits ≤ bits)
    (f : Nat → Bool) (q : Nat)
    (hq : q ≥ q_start + 2 * bits + 1 + bits) :
    install_mult_bits_skip_j_qstart bits q_start m j num_bits f q = f q

q_start-parametric: install chain identity above the multiplier register. Port of `install_mult_bits_skip_j_at_above_eq` (line 522).

theoremmult_input_F_eq_install_with_j_qstart

theorem mult_input_F_eq_install_with_j_qstart
    (bits q_start m acc j : Nat) (hj : j < bits) (hacc : acc < 2 ^ bits) :
    mult_input_F_qstart bits q_start m acc
      = install_mult_bits_skip_j_qstart bits q_start m j bits
          (update (cuccaro_input_F q_start false 0 acc)
            (mult_control_idx_qstart bits q_start j) (m.testBit j))

q_start-parametric bridge: `mult_input_F_qstart bits q_start m acc` decomposes as `install_mult_bits_skip_j_qstart` applied to `update (cuccaro_input_F q_start false 0 acc) (mult_control_idx_qstart bits q_start j) (m.testBit j)`. Port of `mult_input_F_eq_install_with_j` (line 544).

theoremcuccaro_target_val_through_install_mult

theorem cuccaro_target_val_through_install_mult
    (bits m j N c : Nat) (num_bits : Nat) (f : Nat → Bool) :
    cuccaro_target_val bits 2
        (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
            (mult_control_idx bits j) 1)
          (install_mult_bits_skip_j bits m j num_bits f))
      = cuccaro_target_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
            (mult_control_idx bits j) 1) f)

*`cuccaro_target_val` is invariant under installing multiplier bits on the gate-applied state.** Each installed update is at position `controlIdx_k` (outside workspace), so by Deliverable A (commute with update) + `cuccaro_target_val_update_outside_workspace`, the target register's decoded value is unchanged.

theoremmodmult_step_target_decode

theorem modmult_step_target_decode
    (bits N a j m acc : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) :
    cuccaro_target_val bits 2
        (Gate.applyNat (modmult_step_gate bits N a j)
          (modmult_input_F bits m acc))
      = if m.testBit j then (acc + (a * 2 ^ j) % N) % N else acc

*Deliverable C — One-step modular-multiplier target decode.** After applying `modmult_step_gate bits N a j` to `modmult_input_F bits m acc`, the decoded target register equals `if m.testBit j then (acc + (a * 2^j) % N) % N else acc`.

FormalRV.Arithmetic.ModMult.Internal.BitPositioning.QStartStepPosAndCarryInChain

FormalRV/Arithmetic/ModMult/Internal/BitPositioning/QStartStepPosAndCarryInChain.lean

## R7d^xxix-L-3.15e.2 — q_start-parametric step-gate position helpers (above-layout / flag0 / carry-in restored). Three position-level helpers needed by the eventual `modmult_step_state_eq_qstart` proof. The above-layout and flag0 cases route through a shared `_step_at_untouched_pos_qstart` helper; the carry-in case routes through the `sqir_style_controlledModAddConst_gate` carry-in chain.

theoremmodmult_step_at_untouched_pos_qstart

theorem modmult_step_at_untouched_pos_qstart
    (bits q_start N a j flagPos m acc q : Nat) (hj : j < bits)
    (h_input : mult_input_F_qstart bits q_start m acc q = false)
    (h_q_out : q < q_start ∨ q_start + 2 * bits + 1 ≤ q)
    (h_q_ne_flag : q ≠ flagPos)
    (h_q_ne_ctrl_j : q ≠ mult_control_idx_qstart bits q_start j) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc) q = false

q_start-parametric: step gate doesn't touch positions outside its support. At any `q` outside the workspace, distinct from `flagPos` and the j-th multiplier control position, the gate's output equals the input's value. Port of `modmult_step_at_untouched_pos` (line 1712).

theoremmodmult_step_flag0_false_qstart

theorem modmult_step_flag0_false_qstart
    (bits q_start N a j flagPos m acc : Nat) (hbits : 1 ≤ bits) (hj : j < bits)
    (h_qstart_ge_2 : 2 ≤ q_start)
    (h_flag_ne_0 : (0 : Nat) ≠ flagPos) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc) 0 = false

q_start-parametric: step gate's output at flag-0 position is `false`. Port of `modmult_step_flag0_false` (line 1734).

theoremmodmult_step_above_layout_false_qstart

theorem modmult_step_above_layout_false_qstart
    (bits q_start N a j flagPos m acc q : Nat) (hbits : 1 ≤ bits) (hj : j < bits)
    (hq : q ≥ q_start + 2 * bits + 1 + bits)
    (h_flag_lt_qstart : flagPos < q_start) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc) q = false

q_start-parametric: step gate's output above the multiplier register (for `q ≥ q_start + 2 * bits + 1 + bits`) is `false`. Port of `modmult_step_above_layout_false` (line 1747).

theoremstyle_modAddConst_clean_candidate_carry_in_restored_qstart

theorem style_modAddConst_clean_candidate_carry_in_restored_qstart
    (bits q_start N c x flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos) :
    Gate.applyNat (sqir_style_modAddConst_clean_candidate bits q_start N c flagPos)
        (update (cuccaro_input_F q_start false 0 x) flagPos false) q_start = false

q_start-parametric: the uncontrolled clean modular-add candidate restores the carry-in to `false`. Port of `style_modAddConst_clean_candidate_carry_in_restored` (line 1637).

theoremstyle_controlledModAddConst_candidate_carry_in_restored_qstart

theorem style_controlledModAddConst_candidate_carry_in_restored_qstart
    (bits q_start N c x controlIdx flagPos : Nat) (control : Bool)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_style_controlledModAddConst_candidate bits q_start N c controlIdx flagPos)
        (update (cuccaro_input_F q_start false 0 x) controlIdx control) q_start = false

q_start-parametric: controlled candidate carry-in restored. Dispatches on `control`. Port of `style_controlledModAddConst_candidate_carry_in_restored` (line 1657).

theoremstyle_controlledModAddConst_gate_carry_in_restored_qstart

theorem style_controlledModAddConst_gate_carry_in_restored_qstart
    (bits q_start N c x controlIdx flagPos : Nat) (control : Bool)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < q_start ∨ q_start + 2 * bits + 1 ≤ controlIdx)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (hcontrol_ne_flag : controlIdx ≠ flagPos) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c controlIdx flagPos)
        (update (cuccaro_input_F q_start false 0 x) controlIdx control) q_start = false

q_start-parametric: wrapper-level controlled mod-add carry-in restored. Adds the `c = 0` identity case. Port of `style_controlledModAddConst_gate_carry_in_restored` (line 1684).

theoremstyle_controlledModAddConst_gate_commute_install_qstart

theorem style_controlledModAddConst_gate_commute_install_qstart
    (bits q_start m j N c flagPos num_bits : Nat) (f : Nat → Bool)
    (h_flag_lt_qstart : flagPos < q_start) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
        (mult_control_idx_qstart bits q_start j) flagPos)
      (install_mult_bits_skip_j_qstart bits q_start m j num_bits f)
      = install_mult_bits_skip_j_qstart bits q_start m j num_bits
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
            (mult_control_idx_qstart bits q_start j) flagPos) f)

q_start-parametric: the controlled mod-add wrapper gate commutes with the entire install stack. Port of `style_controlledModAddConst_gate_commute_install` (line 1362).

theoremmodmult_step_carry_in_restored_qstart

theorem modmult_step_carry_in_restored_qstart
    (bits q_start N a j flagPos m acc : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N)
    (h_flag_lt_qstart : flagPos < q_start) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc) q_start = false

q_start-parametric: step gate's output at the carry-in position (`q_start`) is `false`. Port of `modmult_step_carry_in_restored` (line 1764).

FormalRV.Arithmetic.ModMult.Internal.BitPositioning.QStartTargetAndWorkspacePorts

FormalRV/Arithmetic/ModMult/Internal/BitPositioning/QStartTargetAndWorkspacePorts.lean

## R7d^xxix-L-3.15c — q_start-parametric target-through-install bridge + step target-decode headline. q_start-parametric counterparts of `cuccaro_target_val_through_install_mult` and the headline `modmult_step_target_decode`. Uses the L-3.15a infrastructure (control-index facts, commute helper) and the L-3.15b install chain + bridge, plus the L-3.14′ `sqir_style_controlledModAddConst_gate_clean_qstart`. This sub-tick closes the q_start chain at the modular-multiplier-step target-decode layer. Workspace preservation (`modmult_step_workspace_qstart`) is intentionally deferred.

theoremcuccaro_target_val_through_install_mult_qstart

theorem cuccaro_target_val_through_install_mult_qstart
    (bits q_start m j N c flagPos : Nat) (num_bits : Nat) (f : Nat → Bool)
    (h_flag_lt_qstart : flagPos < q_start) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
            (mult_control_idx_qstart bits q_start j) flagPos)
          (install_mult_bits_skip_j_qstart bits q_start m j num_bits f))
      = cuccaro_target_val bits q_start
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
            (mult_control_idx_qstart bits q_start j) flagPos) f)

q_start-parametric: installing multiplier bits (other than the j-th) outside the Cuccaro workspace does not change the gate's decoded target value. Port of `cuccaro_target_val_through_install_mult` (line 782).

theoremmodmult_step_target_decode_qstart

theorem modmult_step_target_decode_qstart
    (bits q_start N a j m acc dim flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
          (mult_input_F_qstart bits q_start m acc))
      = if m.testBit j then (acc + (a * 2 ^ j) % N) % N else acc

*R7d^xxix-L-3.15c HEADLINE: q_start-parametric one-step modular- multiplier target decode.** After applying `modmult_step_gate_qstart bits q_start N a j flagPos` to `mult_input_F_qstart bits q_start m acc`, the decoded target register equals `if m.testBit j then (acc + (a * 2^j) % N) % N else acc`. Port of `modmult_step_target_decode` (line 820). Uses the L-3.14′ `sqir_style_controlledModAddConst_gate_clean_qstart`, the L-3.15a control-index facts, and the L-3.15b install bridge + this tick's `_through_install_mult_qstart`. The `flagPos < q_start` hypothesis matches the hard-coded `flagPos = 1 < 2 = q_start` case and ensures `flagPos` is distinct from every multiplier-bit position.

theoremcuccaro_read_val_through_install_mult

theorem cuccaro_read_val_through_install_mult
    (bits m j N c : Nat) (num_bits : Nat) (f : Nat → Bool) :
    cuccaro_read_val bits 2
        (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
            (mult_control_idx bits j) 1)
          (install_mult_bits_skip_j bits m j num_bits f))
      = cuccaro_read_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
            (mult_control_idx bits j) 1) f)

*Read register invariant through install.**

theoremapplyNat_modmult_through_install_at_workspace

theorem applyNat_modmult_through_install_at_workspace
    (bits m j N c : Nat) (num_bits q : Nat) (f : Nat → Bool)
    (hq_ws : q < 2 + 2 * bits + 1) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
        (mult_control_idx bits j) 1)
      (install_mult_bits_skip_j bits m j num_bits f) q
      = Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
        (mult_control_idx bits j) 1) f q

*Position-wise invariance through install at workspace positions.**

theoremapplyNat_modmult_through_install_at_j

theorem applyNat_modmult_through_install_at_j
    (bits m j N c : Nat) (num_bits : Nat) (f : Nat → Bool) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
        (mult_control_idx bits j) 1)
      (install_mult_bits_skip_j bits m j num_bits f) (mult_control_idx bits j)
      = Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
        (mult_control_idx bits j) 1) f (mult_control_idx bits j)

*Position-wise invariance through install at the controlIdx_j position.**

theoremmodmult_step_workspace

theorem modmult_step_workspace
    (bits N a j m acc : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) :
    cuccaro_read_val bits 2
        (Gate.applyNat (modmult_step_gate bits N a j)
          (modmult_input_F bits m acc)) = 0
    ∧ Gate.applyNat (modmult_step_gate bits N a j)
          (modmult_input_F bits m acc) (2 + 2 * bits) = false
    ∧ Gate.applyNat (modmult_step_gate bits N a j)
          (modmult_input_F bits m acc) 1 = false
    ∧ Gate.applyNat (modmult_step_gate bits N a j)

*Deliverable D — One-step workspace preservation.** After applying the step gate, the read register is 0, the top carry is false, the flag bit is false, and the multiplier control bit `j` is preserved as `m.testBit j`.

theoremcuccaro_read_val_through_install_mult_qstart

theorem cuccaro_read_val_through_install_mult_qstart
    (bits q_start m j N c flagPos : Nat) (num_bits : Nat) (f : Nat → Bool)
    (h_flag_lt_qstart : flagPos < q_start) :
    cuccaro_read_val bits q_start
        (Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
            (mult_control_idx_qstart bits q_start j) flagPos)
          (install_mult_bits_skip_j_qstart bits q_start m j num_bits f))
      = cuccaro_read_val bits q_start
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
            (mult_control_idx_qstart bits q_start j) flagPos) f)

q_start-parametric: installing multiplier bits (other than the j-th) outside the Cuccaro workspace preserves the decoded read/workspace value. Port of `cuccaro_read_val_through_install_mult` (line 958).

theoremapplyNat_modmult_through_install_at_workspace_qstart

theorem applyNat_modmult_through_install_at_workspace_qstart
    (bits q_start m j N c flagPos : Nat) (num_bits q : Nat) (f : Nat → Bool)
    (h_flag_lt_qstart : flagPos < q_start)
    (hq_ws : q < q_start + 2 * bits + 1) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
        (mult_control_idx_qstart bits q_start j) flagPos)
      (install_mult_bits_skip_j_qstart bits q_start m j num_bits f) q
      = Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
        (mult_control_idx_qstart bits q_start j) flagPos) f q

q_start-parametric: install chain commutes with the step gate at any single workspace position `q < q_start + 2 * bits + 1`. Port of `applyNat_modmult_through_install_at_workspace` (line 990).

theoremapplyNat_modmult_through_install_at_j_qstart

theorem applyNat_modmult_through_install_at_j_qstart
    (bits q_start m j N c flagPos : Nat) (num_bits : Nat) (f : Nat → Bool)
    (h_flag_lt_qstart : flagPos < q_start) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
        (mult_control_idx_qstart bits q_start j) flagPos)
      (install_mult_bits_skip_j_qstart bits q_start m j num_bits f)
        (mult_control_idx_qstart bits q_start j)
      = Gate.applyNat (sqir_style_controlledModAddConst_gate bits q_start N c
        (mult_control_idx_qstart bits q_start j) flagPos) f
          (mult_control_idx_qstart bits q_start j)

q_start-parametric: install chain commutes with the step gate at the j-th multiplier control position. Port of `applyNat_modmult_through_install_at_j` (line 1022).

theoremmodmult_step_workspace_qstart

theorem modmult_step_workspace_qstart
    (bits q_start N a j m acc dim flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_read_val bits q_start
        (Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
          (mult_input_F_qstart bits q_start m acc)) = 0
    ∧ Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
          (mult_input_F_qstart bits q_start m acc) (q_start + 2 * bits) = false

*R7d^xxix-L-3.15d HEADLINE: q_start-parametric one-step modular-multiplier workspace preservation.** After applying `modmult_step_gate_qstart` to `mult_input_F_qstart`: 1. the read register decodes to 0; 2. the top carry position (`q_start + 2 * bits`) is `false`; 3. `flagPos` is `false`; 4. the j-th multiplier control position holds `m.testBit j`. Port of `modmult_step_workspace` (line 1055).

theoremmult_input_flag_0_false_qstart

theorem mult_input_flag_0_false_qstart
    (bits q_start m acc : Nat) (h_lt : 0 < q_start) :
    mult_input_F_qstart bits q_start m acc 0 = false

q_start-parametric input flag-0 mini-helper. At position 0 of `mult_input_F_qstart`, the bit is `false`, provided position 0 is below the Cuccaro workspace start. Port of `mult_input_flag_0_false` (line 143).

theoremmult_input_flag_1_false_qstart

theorem mult_input_flag_1_false_qstart
    (bits q_start m acc : Nat) (h_lt : 1 < q_start) :
    mult_input_F_qstart bits q_start m acc 1 = false

q_start-parametric input flag-1 mini-helper. At position 1 of `mult_input_F_qstart`, the bit is `false`, provided position 1 is below the Cuccaro workspace start. Port of `mult_input_flag_1_false` (line 151).

theoremmodmult_prefix_gate_qstart_zero_eq_I

theorem modmult_prefix_gate_qstart_zero_eq_I
    (bits q_start N a flagPos : Nat) :
    modmult_prefix_gate_qstart bits q_start N a flagPos 0 = Gate.I

q_start-parametric prefix gate at 0 windows is the identity.

theoremmodmult_prefix_gate_qstart_succ_eq

theorem modmult_prefix_gate_qstart_succ_eq
    (bits q_start N a flagPos k : Nat) :
    modmult_prefix_gate_qstart bits q_start N a flagPos (k + 1)
      = seq (modmult_prefix_gate_qstart bits q_start N a flagPos k)
            (modmult_step_gate_qstart bits q_start N a k flagPos)

q_start-parametric prefix gate at `k + 1` windows.

FormalRV.Arithmetic.ModMult.Internal.Encoding

FormalRV/Arithmetic/ModMult/Internal/Encoding.lean

FormalRV.Arithmetic.ModMult.Internal.Encoding ─────────────────────────────────────────────────── The basis-state INPUT ENCODING for the SQIR modular multiplier: the boolean state functions that place the accumulator and multiplier registers, and its shifted (external-data-register) variant. Used to STATE the correctness theorems. No proofs.

defmodmult_input_F

def modmult_input_F (bits m acc : Nat) : Nat → Bool

Input state for the modular multiplier: Cuccaro accumulator state at `q_start = 2` plus the multiplier bits `m.testBit j` in the control register.

defmult_input_F_shifted

def mult_input_F_shifted (bits x acc : Nat) : Nat → Bool

The input encoding shifted up by `bits` (positions `[0,bits)` reserved for the external data register).

FormalRV.Arithmetic.ModMult.Internal.Family

FormalRV/Arithmetic/ModMult/Internal/Family.lean

FormalRV.Arithmetic.ModMult.Internal.Family ───────────────────────────────────────────────── The verified `BaseUCom` oracle FAMILIES built from `modmult_MCP_gate`, indexed for Shor's order-finding (`a^(2^i) mod N`). These are the bridge objects consumed by the Shor wiring. No proofs.

deff_modmult_circuit_verified

noncomputable def f_modmult_circuit_verified (a ainv N n : Nat) :
    Nat → FormalRV.Framework.BaseUCom ((n + 1) + sqir_modmult_rev_anc (n + 1))

Verified modular-multiplier family at dimension `(n+1) + sqir_modmult_rev_anc (n+1)`.

deff_modmult_circuit_verified_bits

noncomputable def f_modmult_circuit_verified_bits (a ainv N bits : Nat) :
    Nat → FormalRV.Framework.BaseUCom (bits + sqir_modmult_rev_anc bits)

Bits-parameterized verified modular-multiplier family.

FormalRV.Arithmetic.ModMult.Internal.PrefixInvariant

FormalRV/Arithmetic/ModMult/Internal/PrefixInvariant.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModMult.Internal.PrefixInvariant.CleanBundle

FormalRV/Arithmetic/ModMult/Internal/PrefixInvariant/CleanBundle.lean

## R7d^xxix-L-3.15f — q_start-parametric constant-multiplier workspace bundle. After applying the q_start constant multiplier `modmult_const_gate_qstart` to the initial input `mult_input_F_qstart bits q_start m 0`, the non-target workspace positions are clean and the multiplier control bits are preserved. Mirrors the workspace conjuncts of the hard-coded `modmult_const_gate_clean` (line 2629). Strategy: instead of porting an explicit prefix-workspace induction (the hard-coded version doesn't have one), reuse the just-landed `modmult_prefix_state_eq_qstart` (L-3.15e.5) to reshape the post-gate state into `mult_input_F_qstart bits q_start m ((a * m) % N)`, then read off each conjunct from the input-state shape via small q_start ports of the existing input-state facts.

theoremmult_input_at_below_qstart_eq_false_qstart

theorem mult_input_at_below_qstart_eq_false_qstart
    (bits q_start m acc q : Nat) (hq : q < q_start) :
    mult_input_F_qstart bits q_start m acc q = false

q_start-parametric: `mult_input_F_qstart` at any position strictly below `q_start` is `false`. Generalises the hard-coded `mult_input_flag_0_false` / `_flag_1_false` to any flagPos < q_start.

theoremmult_input_read_decode_qstart

theorem mult_input_read_decode_qstart
    (bits q_start m acc : Nat) :
    cuccaro_read_val bits q_start (mult_input_F_qstart bits q_start m acc) = 0

q_start-parametric: the read register of `mult_input_F_qstart` is 0. Port of `mult_input_read_decode` (line 129).

theoremmult_input_top_carry_false_qstart

theorem mult_input_top_carry_false_qstart
    (bits q_start m acc : Nat) (hbits : 1 ≤ bits) :
    mult_input_F_qstart bits q_start m acc (q_start + 2 * bits) = false

q_start-parametric: the top-carry position `q_start + 2 * bits` of `mult_input_F_qstart` is `false`. Port of `mult_input_top_carry_false` (line 162).

theoremmodmult_const_gate_workspace_qstart

theorem modmult_const_gate_workspace_qstart
    (bits q_start N a m flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hm : m < 2^bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_read_val bits q_start
          (Gate.applyNat (modmult_const_gate_qstart bits q_start N a flagPos)
            (mult_input_F_qstart bits q_start m 0))
        = 0
    ∧ Gate.applyNat (modmult_const_gate_qstart bits q_start N a flagPos)

*R7d^xxix-L-3.15f HEADLINE: q_start-parametric constant-multiplier workspace bundle.** For the full multiplier `modmult_const_gate_qstart bits q_start N a flagPos` applied to `mult_input_F_qstart bits q_start m 0`: 1. the read register decodes to 0; 2. the top carry position `q_start + 2 * bits` is `false`; 3. the dirty-flag position `flagPos` is `false`; 4. every multiplier control bit `k < bits` is preserved as `m.testBit k`. Proof routes through `modmult_prefix_state_eq_qstart` (L-3.15e.5) at `k = bits` + `modmult_acc_spec_eq_mul_mod` (q_start-independent) to reshape the post-gate state, then reads each conjunct off the input state shape. Port of the workspace conjuncts of `modmult_const_gate_clean` (line 2629).

theoremmodmult_const_gate_state_eq

theorem modmult_const_gate_state_eq
    (bits N a m : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hm : m < 2^bits) :
    Gate.applyNat (modmult_const_gate bits N a) (modmult_input_F bits m 0)
      = modmult_input_F bits m ((a * m) % N)

*Deliverable G corollary — Full multiplier state equality.** After the full multiplier, the state is `modmult_input_F bits m ((a*m)%N)`.

theoremmodmult_step_gate_wellTyped

theorem modmult_step_gate_wellTyped
    (bits N a j : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits) (hj : j < bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits) (modmult_step_gate bits N a j)

*Step gate is WellTyped at `sqir_modmult_rev_anc bits`.**

theoremmodmult_prefix_gate_wellTyped

theorem modmult_prefix_gate_wellTyped
    (bits N a k : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits) (hk : k ≤ bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits) (modmult_prefix_gate bits N a k)

*Prefix gate is WellTyped at `sqir_modmult_rev_anc bits`.**

theoremmodmult_const_gate_clean

theorem modmult_const_gate_clean
    (bits N a m : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hm : m < 2^bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits) (modmult_const_gate bits N a)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat (modmult_const_gate bits N a) (modmult_input_F bits m 0))
        = (a * m) % N
    ∧ cuccaro_read_val bits 2
          (Gate.applyNat (modmult_const_gate bits N a) (modmult_input_F bits m 0))
        = 0
    ∧ Gate.applyNat (modmult_const_gate bits N a) (modmult_input_F bits m 0) 0

*Deliverable H — Clean modular-multiplier bundle.** For the full multiplier gate `modmult_const_gate bits N a`: - WellTyped at `sqir_modmult_rev_anc bits`. - Target decoded to `(a * m) % N`. - Read = 0. - Flag bits 0, 1 = false. - Top carry (position `2 + 2*bits`) = false. - All multiplier control bits preserved as `m.testBit k`.

theoremmodmult_const_gate_clean_from_BasicSetting

theorem modmult_const_gate_clean_from_BasicSetting
    (a r N m n x_mult : Nat)
    (h_basic : FormalRV.SQIRPort.BasicSetting a r N m n)
    (hm : x_mult < 2^(n + 1)) :
    Gate.WellTyped (sqir_modmult_rev_anc (n + 1))
        (modmult_const_gate (n + 1) N a)
    ∧ cuccaro_target_val (n + 1) 2
          (Gate.applyNat (modmult_const_gate (n + 1) N a)
            (modmult_input_F (n + 1) x_mult 0)) = (a * x_mult) % N
    ∧ cuccaro_read_val (n + 1) 2
          (Gate.applyNat (modmult_const_gate (n + 1) N a)
            (modmult_input_F (n + 1) x_mult 0)) = 0

*Deliverable I — BasicSetting specialization of the full multiplier clean bundle.** For BasicSetting parameters (which give `N ≤ 2^(n+1)` and `2*N ≤ 2^(n+1)`), the clean bundle holds at `bits = n + 1`.

theoremmodmult_acc_spec_from_zero

theorem modmult_acc_spec_from_zero (N a m acc : Nat) :
    modmult_acc_spec_from N a m acc 0 = acc

theoremmodmult_acc_spec_from_succ_true

theorem modmult_acc_spec_from_succ_true
    (N a m acc k : Nat) (h : m.testBit k = true) :
    modmult_acc_spec_from N a m acc (k + 1)
      = (modmult_acc_spec_from N a m acc k + (a * 2 ^ k) % N) % N

theoremmodmult_acc_spec_from_succ_false

theorem modmult_acc_spec_from_succ_false
    (N a m acc k : Nat) (h : m.testBit k = false) :
    modmult_acc_spec_from N a m acc (k + 1)
      = modmult_acc_spec_from N a m acc k

theoremmodmult_acc_spec_from_lt

theorem modmult_acc_spec_from_lt
    (N a m acc k : Nat) (hN_pos : 0 < N) (hacc : acc < N) :
    modmult_acc_spec_from N a m acc k < N

For `0 < N`, the accumulator-from-start stays in `[0, N)` if `acc < N`.

theoremmodmult_acc_spec_from_eq_mod_pow

theorem modmult_acc_spec_from_eq_mod_pow
    (N a m acc k : Nat) (hN_pos : 0 < N) (hacc : acc < N) :
    modmult_acc_spec_from N a m acc k = (acc + a * (m % 2^k)) % N

*Closed form for `acc_spec_from`**: equals `(acc + a * (m % 2^k)) % N` for `acc < N`.

theoremmodmult_acc_spec_from_eq_add_mul_mod

theorem modmult_acc_spec_from_eq_add_mul_mod
    (bits N a m acc : Nat) (hN_pos : 0 < N) (hacc : acc < N) (hm : m < 2^bits) :
    modmult_acc_spec_from N a m acc bits = (acc + a * m) % N

For `m < 2^bits` and `acc < N`, the final accumulator equals `(acc + a*m) % N`.

theoremmodmult_prefix_state_eq_from

theorem modmult_prefix_state_eq_from
    (bits N a m acc k : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hacc : acc < N) (hk : k ≤ bits) :
    Gate.applyNat (modmult_prefix_gate bits N a k) (modmult_input_F bits m acc)
      = modmult_input_F bits m (modmult_acc_spec_from N a m acc k)

*Generalized prefix state equality** for arbitrary starting accumulator.

theoremmodmult_const_gate_state_eq_from

theorem modmult_const_gate_state_eq_from
    (bits N a m acc : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hacc : acc < N) (hm : m < 2^bits) :
    Gate.applyNat (modmult_const_gate bits N a) (modmult_input_F bits m acc)
      = modmult_input_F bits m ((acc + a * m) % N)

*Generalized full multiplier state equality** for arbitrary starting accumulator.

theoremmodmult_prefix_state_eq_from_qstart

theorem modmult_prefix_state_eq_from_qstart
    (bits q_start N a m acc k flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hacc : acc < N) (hk : k ≤ bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_prefix_gate_qstart bits q_start N a flagPos k)
        (mult_input_F_qstart bits q_start m acc)
      = mult_input_F_qstart bits q_start m
          (modmult_acc_spec_from N a m acc k)

q_start-parametric: prefix state-eq generalised to an arbitrary starting accumulator. Port of `modmult_prefix_state_eq_from`. Uses the q_start-INDEPENDENT `modmult_acc_spec_from` chain plus the L-3.15e.4 `modmult_step_state_eq_qstart`.

theoremmodmult_const_gate_state_eq_from_qstart

theorem modmult_const_gate_state_eq_from_qstart
    (bits q_start N a m acc flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hacc : acc < N) (hm : m < 2^bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_const_gate_qstart bits q_start N a flagPos)
        (mult_input_F_qstart bits q_start m acc)
      = mult_input_F_qstart bits q_start m ((acc + a * m) % N)

q_start-parametric: full multiplier state-eq generalised to an arbitrary starting accumulator. Port of `modmult_const_gate_state_eq_from`.

theoremmodmult_target_idx_ne_mult_control_idx_qstart

theorem modmult_target_idx_ne_mult_control_idx_qstart
    (bits q_start i j : Nat) (hi : i < bits) :
    modmult_target_idx_qstart q_start i
      ≠ mult_control_idx_qstart bits q_start j

Target index disjoint from multiplier control index.

theoremmodmult_swap_acc_mult_aux_qstart_succ_eq

theorem modmult_swap_acc_mult_aux_qstart_succ_eq (bits q_start k : Nat) :
    modmult_swap_acc_mult_aux_qstart bits q_start (k + 1)
      = Gate.seq (modmult_swap_acc_mult_aux_qstart bits q_start k)
          (qubit_swap (modmult_target_idx_qstart q_start k)
                      (mult_control_idx_qstart bits q_start k))

Unfold lemma for `modmult_swap_acc_mult_aux_qstart`.

theoremmodmult_target_idx_qstart_value

theorem modmult_target_idx_qstart_value (q_start i : Nat) :
    modmult_target_idx_qstart q_start i = q_start + 2 * i + 1

Sanity helper: `modmult_target_idx_qstart` value.

theoremmodmult_swap_acc_mult_at_mult_out_range_qstart

theorem modmult_swap_acc_mult_at_mult_out_range_qstart
    (bits q_start k i : Nat) (hk : k ≤ bits) (hi_ge : k ≤ i) (hi_bits : i < bits)
    (f : Nat → Bool) :
    Gate.applyNat (modmult_swap_acc_mult_aux_qstart bits q_start k) f
        (mult_control_idx_qstart bits q_start i)
      = f (mult_control_idx_qstart bits q_start i)

q_start port of `modmult_swap_acc_mult_aux_at_mult_out_range` (line 3097). At a multiplier bit `i ≥ k`, swap output = input.

FormalRV.Arithmetic.ModMult.Internal.PrefixInvariant.OneStep

FormalRV/Arithmetic/ModMult/Internal/PrefixInvariant/OneStep.lean

## Tick 74 — Deliverable F: Prefix invariant starter.

theoremmodmult_prefix_target_decode_zero

theorem modmult_prefix_target_decode_zero
    (bits N a m : Nat) :
    cuccaro_target_val bits 2
        (Gate.applyNat (modmult_prefix_gate bits N a 0) (modmult_input_F bits m 0))
      = modmult_acc_spec N a m 0

*Prefix invariant — base case (`k = 0`).** The 0-step prefix gate is identity, so the target register is just the encoded `acc = 0`.

theoremstyle_controlledModAddConst_gate_commute_install

theorem style_controlledModAddConst_gate_commute_install
    (bits m j N c num_bits : Nat) (f : Nat → Bool) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
        (mult_control_idx bits j) 1)
      (install_mult_bits_skip_j bits m j num_bits f)
      = install_mult_bits_skip_j bits m j num_bits
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c
            (mult_control_idx bits j) 1) f)

*Function-level commute of step gate with install.** The controlled mod-add wrapper gate commutes with the entire install stack, because each install update is at `controlIdx_k` for `k ≠ j` (by construction of `install_mult_bits_skip_j`), which is outside the gate's support.

theoremmodmult_step_preserves_all_control_bits

theorem modmult_step_preserves_all_control_bits
    (bits N a m acc j k : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hacc : acc < N) (hj : j < bits) (hk : k < bits) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) (mult_control_idx bits k)
      = m.testBit k

*Deliverable A — All control bits preserved by one step.** The one-step gate `modmult_step_gate bits N a j` preserves every multiplier control bit `k < bits` as `m.testBit k`. This generalizes Tick 74's Deliverable D from `k = j` to all `k < bits`.

theoremcuccaro_target_val_eq_implies_bits_match

theorem cuccaro_target_val_eq_implies_bits_match
    (bits q_start S : Nat) (f : Nat → Bool)
    (hS : S < 2^bits) (h : cuccaro_target_val bits q_start f = S) :
    ∀ i, i < bits → f (q_start + 2 * i + 1) = S.testBit i

*Converse to `cuccaro_target_val_eq_sum_when_bits_match`.** For `S < 2^bits`, if `cuccaro_target_val bits q_start f = S`, then each target bit `i < bits` matches `S.testBit i`. By uniqueness of binary representation. Useful for deducing per-bit info from a target_val equality. This is a forward-looking utility lemma; in Tick 75 it is not yet consumed by `modmult_step_state_eq` (deferred to Tick 76).

theoremcuccaro_read_val_eq_implies_bits_match

theorem cuccaro_read_val_eq_implies_bits_match
    (bits q_start S : Nat) (f : Nat → Bool)
    (hS : S < 2^bits) (h : cuccaro_read_val bits q_start f = S) :
    ∀ i, i < bits → f (q_start + 2 * i + 2) = S.testBit i

*Converse to `cuccaro_read_val_eq_sum_when_bits_match`.**

theoremmodmult_step_state_normal

theorem modmult_step_state_normal
    (bits N a j m acc : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) :
    let acc'

*Deliverable B — One-step state-normal theorem.** Combines Tick 74's target decode + Tick 74 workspace preservation + Deliverable A's all-control-bits preservation into a unified finite- state characterization of the step gate's output.

theoremmodmult_acc_spec_eq_mul_mod_pow

theorem modmult_acc_spec_eq_mul_mod_pow
    (N a m k : Nat) (hN_pos : 0 < N) :
    modmult_acc_spec N a m k = (a * (m % 2^k)) % N

*Strong recurrence form.**

theoremmodmult_acc_spec_eq_mul_mod

theorem modmult_acc_spec_eq_mul_mod
    (bits N a m : Nat) (hN_pos : 0 < N) (hm : m < 2^bits) :
    modmult_acc_spec N a m bits = (a * m) % N

*Deliverable E — Accumulator-spec equals modular product.** For `m < 2^bits`, the bit-by-bit accumulator equals `(a * m) % N`.

FormalRV.Arithmetic.ModMult.Internal.PrefixInvariant.StateEq

FormalRV/Arithmetic/ModMult/Internal/PrefixInvariant/StateEq.lean

## Tick 76 — Deliverable D: One-step state equality.

theoremmodmult_step_state_eq

theorem modmult_step_state_eq
    (bits N a j m acc : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) :
    Gate.applyNat (modmult_step_gate bits N a j) (modmult_input_F bits m acc)
      = modmult_input_F bits m
          (if m.testBit j then (acc + (a * 2^j) % N) % N else acc)

*Deliverable D — One-step state equality (function-level).** After applying the step gate to `modmult_input_F bits m acc`, the state is exactly `modmult_input_F bits m acc'` where `acc' = if m.testBit j then (acc + (a*2^j)%N) % N else acc`.

theoremmodmult_step_state_eq_qstart

theorem modmult_step_state_eq_qstart
    (bits q_start N a j flagPos m acc dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc)
      = mult_input_F_qstart bits q_start m
          (if m.testBit j then (acc + (a * 2^j) % N) % N else acc)

theoremmodmult_prefix_state_eq

theorem modmult_prefix_state_eq
    (bits N a m k : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hk : k ≤ bits) :
    Gate.applyNat (modmult_prefix_gate bits N a k) (modmult_input_F bits m 0)
      = modmult_input_F bits m (modmult_acc_spec N a m k)

*Deliverable E — Prefix state equality.** By induction on `k`, the prefix gate's output on `modmult_input_F bits m 0` equals `modmult_input_F bits m (acc_spec ... k)`. Uses `modmult_step_state_eq` at each step + the accumulator recurrence.

theoremmodmult_prefix_target_decode

theorem modmult_prefix_target_decode
    (bits N a m k : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hk : k ≤ bits) :
    cuccaro_target_val bits 2
        (Gate.applyNat (modmult_prefix_gate bits N a k) (modmult_input_F bits m 0))
      = modmult_acc_spec N a m k

*Deliverable F — Prefix target decode (corollary of E).** The decoded target after applying the prefix gate equals the accumulator spec at `k`.

theoremmodmult_const_gate_target_decode

theorem modmult_const_gate_target_decode
    (bits N a m : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hm : m < 2^bits) :
    cuccaro_target_val bits 2
        (Gate.applyNat (modmult_const_gate bits N a) (modmult_input_F bits m 0))
      = (a * m) % N

*Deliverable G — Full modular multiplier target theorem.** After applying `modmult_const_gate bits N a` to `modmult_input_F bits m 0`, the target register decodes to `(a*m) % N`.

theoremmult_input_target_decode_qstart

theorem mult_input_target_decode_qstart
    (bits q_start m acc : Nat) (hacc : acc < 2 ^ bits) :
    cuccaro_target_val bits q_start (mult_input_F_qstart bits q_start m acc) = acc

q_start-parametric: decoded target of the initial input state equals the accumulator value (assuming `acc < 2^bits`). Port of `mult_input_target_decode` (line 115).

theoremmodmult_prefix_state_eq_qstart

theorem modmult_prefix_state_eq_qstart
    (bits q_start N a m k flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hk : k ≤ bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_prefix_gate_qstart bits q_start N a flagPos k)
        (mult_input_F_qstart bits q_start m 0)
      = mult_input_F_qstart bits q_start m (modmult_acc_spec N a m k)

q_start-parametric prefix state equality. Port of `modmult_prefix_state_eq` (line 2416).

theoremmodmult_prefix_target_decode_qstart

theorem modmult_prefix_target_decode_qstart
    (bits q_start N a m k flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hk : k ≤ bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (modmult_prefix_gate_qstart bits q_start N a flagPos k)
          (mult_input_F_qstart bits q_start m 0))
      = modmult_acc_spec N a m k

q_start-parametric prefix target decode (corollary of prefix state equality). Port of `modmult_prefix_target_decode` (line 2443).

theoremmodmult_const_gate_target_decode_qstart

theorem modmult_const_gate_target_decode_qstart
    (bits q_start N a m flagPos dim : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hm : m < 2^bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    cuccaro_target_val bits q_start
        (Gate.applyNat (modmult_const_gate_qstart bits q_start N a flagPos)
          (mult_input_F_qstart bits q_start m 0))
      = (a * m) % N

*R7d^xxix-L-3.15e.5 HEADLINE: q_start-parametric full modular multiplier target decode.** After applying `modmult_const_gate_qstart bits q_start N a flagPos` to `mult_input_F_qstart bits q_start m 0`, the target register decodes to `(a * m) % N`. Port of `modmult_const_gate_target_decode` (line 2461).

FormalRV.Arithmetic.ModMult.Internal.PrefixInvariant.StepPositions

FormalRV/Arithmetic/ModMult/Internal/PrefixInvariant/StepPositions.lean

## Tick 76 — Carry-in restoration chain.

theoremstyle_modAddConst_clean_candidate_carry_in_restored

theorem style_modAddConst_clean_candidate_carry_in_restored
    (bits N c x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N) :
    Gate.applyNat (sqir_style_modAddConst_clean_candidate bits 2 N c 1)
        (update (cuccaro_input_F 2 false 0 x) 1 false) 2 = false

*Clean candidate carry-in (`q_start = 2`) restored to `false`.** Chains through `dirtyFlag → compareConst c → X(1)`: - `dirtyFlag` restores carry-in via `dirtyFlag_carry_in_restored_general`. - `compareConst c` preserves all workspace positions via `compareConst_candidate_workspace_restored_at_general`. - `X(1)` doesn't touch position 2 (since 2 ≠ 1).

theoremstyle_controlledModAddConst_candidate_carry_in_restored

theorem style_controlledModAddConst_candidate_carry_in_restored
    (bits N c x controlIdx : Nat) (control : Bool)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc_pos : 0 < c) (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.applyNat (sqir_style_controlledModAddConst_candidate bits 2 N c controlIdx 1)
        (update (cuccaro_input_F 2 false 0 x) controlIdx control) 2 = false

*Controlled candidate carry-in restored.** Dispatches on `control`: - `control = false`: identity (via `control_false_state_eq`). - `control = true`: chains through `clean_candidate_carry_in_restored`.

theoremstyle_controlledModAddConst_gate_carry_in_restored

theorem style_controlledModAddConst_gate_carry_in_restored
    (bits N c x controlIdx : Nat) (control : Bool)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1) :
    Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)
        (update (cuccaro_input_F 2 false 0 x) controlIdx control) 2 = false

*Wrapper-level controlled mod-add carry-in restored.** Adds the `c = 0` identity case to the candidate version.

theoremmodmult_step_at_untouched_pos

theorem modmult_step_at_untouched_pos
    (bits N a j m acc q : Nat) (hj : j < bits)
    (h_input : modmult_input_F bits m acc q = false)
    (h_q_out : q < 2 ∨ 2 + (2 * bits + 1) ≤ q)
    (h_q_ne_flag : q ≠ 1)
    (h_q_ne_ctrl_j : q ≠ mult_control_idx bits j) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) q = false

*Generic helper — step gate doesn't touch positions outside its support.** At any `q` outside workspace, distinct from flag and controlIdx_j, the gate's output equals the input's value (via commute + update_self).

theoremmodmult_step_flag0_false

theorem modmult_step_flag0_false
    (bits N a j m acc : Nat) (hbits : 1 ≤ bits) (hj : j < bits) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) 0 = false

*Step gate's output at flag bit 0 is `false`.**

theoremmodmult_step_above_layout_false

theorem modmult_step_above_layout_false
    (bits N a j m acc q : Nat) (hbits : 1 ≤ bits) (hj : j < bits)
    (hq : q ≥ 2 + 2 * bits + 1 + bits) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) q = false

*Step gate's output above the multiplier register is `false`.** For `q ≥ 2 + 2 * bits + 1 + bits`.

theoremmodmult_step_carry_in_restored

theorem modmult_step_carry_in_restored
    (bits N a j m acc : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) 2 = false

*Step gate's output at carry-in (position 2) is `false`.**

theoremmodmult_step_target_bit

theorem modmult_step_target_bit
    (bits N a j m acc i : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) (hi : i < bits) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) (2 + 2 * i + 1)
      = (if m.testBit j then (acc + (a * 2^j) % N) % N else acc).testBit i

*Step gate's output at target bit `i` equals `acc'.testBit i`.** Uses the per-bit converse `cuccaro_target_val_eq_implies_bits_match` plus the Tick 74 `modmult_step_target_decode`.

theoremmodmult_step_read_bit

theorem modmult_step_read_bit
    (bits N a j m acc i : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) (hi : i < bits) :
    Gate.applyNat (modmult_step_gate bits N a j)
        (modmult_input_F bits m acc) (2 + 2 * i + 2) = false

*Step gate's output at read bit `i` is `false`.**

theoremmodmult_step_target_bit_qstart

theorem modmult_step_target_bit_qstart
    (bits q_start N a j flagPos m acc i : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) (hi : i < bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc) (q_start + 2 * i + 1)
      = (if m.testBit j then (acc + (a * 2^j) % N) % N else acc).testBit i

q_start-parametric: step gate's output at target/b-register position `q_start + 2 * i + 1` decodes the i-th bit of the advanced accumulator. Port of `modmult_step_target_bit` (line 2063).

theoremmodmult_step_read_bit_qstart

theorem modmult_step_read_bit_qstart
    (bits q_start N a j flagPos m acc i : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hj : j < bits) (hacc : acc < N) (hi : i < bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc) (q_start + 2 * i + 2) = false

q_start-parametric: step gate's output at read/a-register position `q_start + 2 * i + 2` is `false`. Port of `modmult_step_read_bit` (line 2161).

theoremmodmult_step_preserves_all_control_bits_qstart

theorem modmult_step_preserves_all_control_bits_qstart
    (bits q_start N a m acc j k flagPos : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hN2 : 2 * N ≤ 2^bits)
    (hacc : acc < N) (hj : j < bits) (hk : k < bits)
    (h_flag_lt_qstart : flagPos < q_start)
    (dim : Nat)
    (h_workspace : q_start + 2 * bits + 1 ≤ dim)
    (h_dim_covers_mult : q_start + (2 * bits + 1) + bits ≤ dim) :
    Gate.applyNat (modmult_step_gate_qstart bits q_start N a j flagPos)
        (mult_input_F_qstart bits q_start m acc)
          (mult_control_idx_qstart bits q_start k)
      = m.testBit k

q_start-parametric: step gate preserves every multiplier control bit `k < bits` as `m.testBit k`. Port of `modmult_step_preserves_all_control_bits` (line 1717).

FormalRV.Arithmetic.ModMult.Internal.QStart

FormalRV/Arithmetic/ModMult/Internal/QStart.lean

FormalRV.Arithmetic.ModMult.Internal.QStart ───────────────────────────────────────────────── q_start-PARAMETRIC infrastructure: ports of the gate chain (`SQIRModMultDef.lean`) and the input encoding to a free workspace offset `q_start`, plus the multiplier-bit install helpers. Consumed only by the prefix-invariant correctness proofs. No proofs.

defmult_control_idx_qstart

def mult_control_idx_qstart (bits q_start j : Nat) : Nat

q_start-parametric multiplier-bit position.

defmult_input_F_qstart

def mult_input_F_qstart (bits q_start m acc : Nat) : Nat → Bool

q_start-parametric input state.

defmodmult_step_gate_qstart

def modmult_step_gate_qstart (bits q_start N a j flagPos : Nat) : Gate

q_start-parametric one-step gate.

definstall_mult_bits_skip_j

def install_mult_bits_skip_j (bits m j : Nat) : Nat → (Nat → Bool) → (Nat → Bool)
  | 0,     f => f
  | n + 1, f =>
    if n = j then install_mult_bits_skip_j bits m j n f
    else update (install_mult_bits_skip_j bits m j n f) (mult_control_idx bits n) (m.testBit n)

Install multiplier bits `0,…,num-1` from `m`, skipping bit `j`.

definstall_mult_bits_skip_j_qstart

def install_mult_bits_skip_j_qstart (bits q_start m j : Nat) :
    Nat → (Nat → Bool) → (Nat → Bool)
  | 0,     f => f
  | n + 1, f =>
    if n = j then install_mult_bits_skip_j_qstart bits q_start m j n f
    else update (install_mult_bits_skip_j_qstart bits q_start m j n f)
                (mult_control_idx_qstart bits q_start n) (m.testBit n)

q_start-parametric `install_mult_bits_skip_j`.

defmodmult_prefix_gate_qstart

def modmult_prefix_gate_qstart (bits q_start N a flagPos : Nat) : Nat → Gate
  | 0     => Gate.I
  | k + 1 => seq (modmult_prefix_gate_qstart bits q_start N a flagPos k)
                 (modmult_step_gate_qstart bits q_start N a k flagPos)

q_start-parametric prefix gate.

defmodmult_const_gate_qstart

def modmult_const_gate_qstart (bits q_start N a flagPos : Nat) : Gate

q_start-parametric full const-multiplier gate.

defmodmult_target_idx_qstart

def modmult_target_idx_qstart (q_start i : Nat) : Nat

q_start-parametric accumulator (target) bit index.

defmodmult_swap_acc_mult_aux_qstart

def modmult_swap_acc_mult_aux_qstart (bits q_start : Nat) : Nat → Gate
  | 0     => Gate.I
  | k + 1 => Gate.seq (modmult_swap_acc_mult_aux_qstart bits q_start k)
                      (qubit_swap (modmult_target_idx_qstart q_start k)
                                  (mult_control_idx_qstart bits q_start k))

q_start-parametric accumulator↔multiplier swap (recursive).

defmodmult_swap_acc_mult_qstart

def modmult_swap_acc_mult_qstart (bits q_start : Nat) : Gate

q_start-parametric full accumulator↔multiplier swap.

defmodmult_inplace_candidate_qstart

def modmult_inplace_candidate_qstart (bits q_start N a ainv flagPos : Nat) : Gate

q_start-parametric in-place modular multiplier (requires `a·ainv ≡ 1 mod N`).

FormalRV.Arithmetic.ModMult.Internal.Spec

FormalRV/Arithmetic/ModMult/Internal/Spec.lean

FormalRV.Arithmetic.ModMult.Internal.Spec ─────────────────────────────────────────────── The CLASSICAL specification of the modular multiplier's action — the shift-and-accumulate loop, as a pure `Nat` recursion. These are the reference values the gate-level correctness proofs decode to. No proofs.

defmodmult_acc_spec

def modmult_acc_spec (N a m : Nat) : Nat → Nat
  | 0     => 0
  | k + 1 =>
    if m.testBit k then (modmult_acc_spec N a m k + (a * 2 ^ k) % N) % N
    else modmult_acc_spec N a m k

Accumulator after the first `k` multiplier bits, starting from 0.

defmodmult_acc_spec_from

def modmult_acc_spec_from (N a m acc : Nat) : Nat → Nat
  | 0     => acc
  | k + 1 =>
    if m.testBit k then (modmult_acc_spec_from N a m acc k + (a * 2 ^ k) % N) % N
    else modmult_acc_spec_from N a m acc k

Like `modmult_acc_spec` but starting from `acc` (for the uncompute step).

FormalRV.Arithmetic.ModMult.Internal.ToffoliCount

FormalRV/Arithmetic/ModMult/Internal/ToffoliCount.lean

FormalRV.Arithmetic.ModMult.Internal.ToffoliCount — a proved Toffoli (T-count) bound on the SAME Gate IR term that the SQIR port proves computes modular multiplication. The verified modular multiplier `modmult_const_gate bits N a` is proved to write `(a · m) % N` into the accumulator (`modmult_const_gate_target_decode`). Here we derive a closed-form UPPER BOUND on its T-count by structural induction over its Gate IR — `tcount ≤ 56·bits²` (i.e. ≤ 8·bits² Toffolis) — so for the first time a modular- arithmetic building block has count AND semantics on one verified circuit term. Layer counts (each derived, not asserted): prepare-reads tcount 0 (only X / CX / cond) cuccaro_maj_chain_inv tcount 7·n conditional add / sub tcount 14·bits (one Cuccaro adder, 14·bits) compare / ctrl-compare tcount 14·bits (two maj-chains, 7·bits each) controlled mod-add tcount 56·bits (four 14·bits sub-blocks) modmult prefix (k steps) tcount ≤ k·56·bits modmult const (bits) tcount ≤ 56·bits² No `sorry`, no new `axiom`.

theoremtcount_cuccaro_prepareConstRead

theorem tcount_cuccaro_prepareConstRead (n q c : Nat) :
    tcount (cuccaro_prepareConstRead n q c) = 0

theoremtcount_sqir_prepareMaskedConstRead

theorem tcount_sqir_prepareMaskedConstRead (n q N f : Nat) :
    tcount (sqir_prepareMaskedConstRead n q N f) = 0

theoremtcount_cuccaro_MAJ_inv

theorem tcount_cuccaro_MAJ_inv (a b c : Nat) : tcount (cuccaro_MAJ_inv a b c) = 7

theoremtcount_cuccaro_maj_chain_inv

theorem tcount_cuccaro_maj_chain_inv (n q : Nat) :
    tcount (cuccaro_maj_chain_inv n q) = 7 * n

theoremtcount_sqir_conditionalAddConstGate

theorem tcount_sqir_conditionalAddConstGate (bits q N f : Nat) :
    tcount (sqir_conditionalAddConstGate bits q N f) = 14 * bits

theoremtcount_sqir_conditionalSubConstGate

theorem tcount_sqir_conditionalSubConstGate (bits q N f : Nat) :
    tcount (sqir_conditionalSubConstGate bits q N f) = 14 * bits

theoremtcount_sqir_style_compareConst_candidate

theorem tcount_sqir_style_compareConst_candidate (bits q N f : Nat) :
    tcount (sqir_style_compareConst_candidate bits q N f) = 14 * bits

theoremtcount_sqir_controlledCompareConst

theorem tcount_sqir_controlledCompareConst (bits q c ci f : Nat) :
    tcount (sqir_controlledCompareConst bits q c ci f) = 14 * bits

theoremtcount_sqir_style_controlledModAddConst_candidate

theorem tcount_sqir_style_controlledModAddConst_candidate (bits q N c ci f : Nat) :
    tcount (sqir_style_controlledModAddConst_candidate bits q N c ci f) = 56 * bits

theoremtcount_sqir_style_controlledModAddConst_gate_le

theorem tcount_sqir_style_controlledModAddConst_gate_le (bits q N c ci f : Nat) :
    tcount (sqir_style_controlledModAddConst_gate bits q N c ci f) ≤ 56 * bits

theoremtcount_sqir_modmult_step_gate_le

theorem tcount_sqir_modmult_step_gate_le (bits N a j : Nat) :
    tcount (modmult_step_gate bits N a j) ≤ 56 * bits

theoremtcount_sqir_modmult_prefix_gate_le

theorem tcount_sqir_modmult_prefix_gate_le (bits N a k : Nat) :
    tcount (modmult_prefix_gate bits N a k) ≤ k * (56 * bits)

theoremtcount_sqir_modmult_const_gate_le

theorem tcount_sqir_modmult_const_gate_le (bits N a : Nat) :
    tcount (modmult_const_gate bits N a) ≤ 56 * bits ^ 2

*T-count UPPER BOUND on the verified modular multiplier.** The SAME Gate term `modmult_const_gate bits N a` that `modmult_const_gate_target_decode` proves computes `(a · m) % N` costs at most `56·bits²` T-gates (≤ 8·bits² Toffolis).

theoremtcount_sqir_style_controlledModAddConst_gate_eq

theorem tcount_sqir_style_controlledModAddConst_gate_eq
    (bits q N c ci f : Nat) (hc : c ≠ 0) :
    tcount (sqir_style_controlledModAddConst_gate bits q N c ci f) = 56 * bits

theoremtcount_sqir_modmult_step_gate_eq

theorem tcount_sqir_modmult_step_gate_eq
    (bits N a j : Nat) (h : (a * 2 ^ j) % N ≠ 0) :
    tcount (modmult_step_gate bits N a j) = 56 * bits

theoremtcount_sqir_modmult_prefix_gate_eq

theorem tcount_sqir_modmult_prefix_gate_eq
    (bits N a k : Nat) (h : ∀ j, j < k → (a * 2 ^ j) % N ≠ 0) :
    tcount (modmult_prefix_gate bits N a k) = k * (56 * bits)

theoremtcount_sqir_modmult_const_gate_eq

theorem tcount_sqir_modmult_const_gate_eq
    (bits N a : Nat) (h : ∀ j, j < bits → (a * 2 ^ j) % N ≠ 0) :
    tcount (modmult_const_gate bits N a) = 56 * bits ^ 2

*EXACT T-count of the verified modular multiplier.** `modmult_const_gate bits N a` (PROVED to compute `(a·m) % N`) costs EXACTLY `56·bits²` T-gates whenever every step constant is non-zero — so the compiled circuit will count exactly this number.

theoremmodmult_step_const_ne_zero

theorem modmult_step_const_ne_zero
    (a N : Nat) (hcop : Nat.Coprime a N) (hodd : Odd N) (h1 : 1 < N) (j : Nat) :
    (a * 2 ^ j) % N ≠ 0

theoremtcount_sqir_modmult_const_gate_shor

theorem tcount_sqir_modmult_const_gate_shor
    (bits N a : Nat) (hcop : Nat.Coprime a N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (modmult_const_gate bits N a) = 56 * bits ^ 2

*EXACT T-count of the verified modular multiplier, for any valid Shor base.**

theoremtcount_qubit_swap

theorem tcount_qubit_swap (a b : Nat) : tcount (qubit_swap a b) = 0

theoremtcount_Gate_shift

theorem tcount_Gate_shift (off : Nat) (g : Gate) : tcount (Gate.shift off g) = tcount g

theoremtcount_sqir_swap_acc_mult_aux

theorem tcount_sqir_swap_acc_mult_aux (bits k : Nat) :
    tcount (modmult_swap_acc_mult_aux bits k) = 0

theoremtcount_sqir_swap_acc_mult

theorem tcount_sqir_swap_acc_mult (bits : Nat) : tcount (modmult_swap_acc_mult bits) = 0

theoremtcount_reverse_register_swap_aux

theorem tcount_reverse_register_swap_aux (n oa ob k : Nat) :
    tcount (reverse_register_swap_aux n oa ob k) = 0

theoremtcount_reverse_register_swap

theorem tcount_reverse_register_swap (n oa ob : Nat) :
    tcount (reverse_register_swap n oa ob) = 0

theoremtcount_sqir_encode_to_mult_adapter

theorem tcount_sqir_encode_to_mult_adapter (bits : Nat) :
    tcount (encode_to_mult_adapter bits) = 0

theoremtcount_sqir_modmult_inplace_candidate_eq

theorem tcount_sqir_modmult_inplace_candidate_eq
    (bits N a ainv : Nat)
    (ha : ∀ j, j < bits → (a * 2 ^ j) % N ≠ 0)
    (hb : ∀ j, j < bits → ((N - ainv) % N * 2 ^ j) % N ≠ 0) :
    tcount (modmult_inplace_candidate bits N a ainv) = 112 * bits ^ 2

EXACT T-count of the in-place modular multiplier = `112·bits²` when both constant multipliers (`a` and `(N-ainv)%N`) have all non-zero steps.

theoremtcount_sqir_modmult_MCP_gate_eq

theorem tcount_sqir_modmult_MCP_gate_eq
    (bits N a ainv : Nat)
    (ha : ∀ j, j < bits → (a * 2 ^ j) % N ≠ 0)
    (hb : ∀ j, j < bits → ((N - ainv) % N * 2 ^ j) % N ≠ 0) :
    tcount (modmult_MCP_gate bits N a ainv) = 112 * bits ^ 2

*EXACT T-count of the VERIFIED MCP oracle term** = `112·bits²` (same conditions).

theoremcoprime_modsub

theorem coprime_modsub (ainv N : Nat)
    (hcopinv : Nat.Coprime ainv N) (hpos : 0 < ainv) (hlt : ainv < N) :
    Nat.Coprime ((N - ainv) % N) N

The uncompute constant `(N-ainv)%N` is coprime to `N` when `ainv` is (and `0<ainv<N`).

theoremtcount_sqir_modmult_MCP_gate_shor

theorem tcount_sqir_modmult_MCP_gate_shor
    (bits N a ainv : Nat) (hcop : Nat.Coprime a N) (hcopinv : Nat.Coprime ainv N)
    (hpos : 0 < ainv) (hlt : ainv < N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (modmult_MCP_gate bits N a ainv) = 112 * bits ^ 2

*EXACT T-count of the verified MCP oracle, for any valid Shor base + inverse.**

FormalRV.Arithmetic.ModMult.ModMultCorrectness

FormalRV/Arithmetic/ModMult/ModMultCorrectness.lean

FormalRV.Arithmetic.ModMult.ModMultCorrectness ────────────────────────────────────────────────────── THE semantic-correctness theorem for the SQIR-faithful in-place modular multiplier. Definition (THE multiplier): `modmult_MCP_gate` in `SQIRModMultDefinitions.lean`. Correctness is stated through the shared `Gate.applyNat` semantic core (the proof routes via `modmult_MCP_gate_apply_encode`). The single theorem to audit is `modmult_correct`.

theoremmodmult_correct

theorem modmult_correct (bits N a ainv : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (h_ainv_le : ainv ≤ N) (h_inv : (a * ainv) % N = 1) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N bits (sqir_modmult_rev_anc bits)
      (Gate.toUCom (modmult_total_dim bits) (modmult_MCP_gate bits N a ainv))

*SQIR modular multiplier — semantic correctness (THE headline).** For valid Shor parameters (`a·ainv ≡ 1 mod N`, `2N ≤ 2^bits`), the MCP-layout gate `modmult_MCP_gate bits N a ainv` satisfies `MultiplyCircuitProperty a N` — i.e. on the SQIR-faithful encoding it maps the data register `x ↦ (a · x) mod N` in place. Proven through the `Gate.applyNat` semantic core.

FormalRV.Arithmetic.ModMult.ModMultDef

FormalRV/Arithmetic/ModMult/ModMultDef.lean

FormalRV.Arithmetic.ModMult.ModMultDef ────────────────────────────────────────────── THE definition of the SQIR-faithful in-place modular multiplier, as concrete `Gate`-IR data. **Definitions only — no proofs.** THE multiplier is `modmult_MCP_gate bits N a ainv`: it maps the data register `x ↦ (a · x) mod N` in place, on `modmult_total_dim bits` qubits, where `bits` is the bit-width of the integers (needs `2N ≤ 2^bits`). Construction (bottom-up): step → prefix → const_gate -- shift-and-add of (a·2^j mod N) into the accumulator + modmult_swap_acc_mult -- swap accumulator ↔ multiplier register = inplace_candidate -- x ↦ (a·x) mod N (compute · swap · uncompute) + Gate.shift + encode adapter = modmult_MCP_gate (MultiplyCircuitProperty layout) Correctness : `SQIRModMultCorrectness.lean` (`modmult_correct`) Resource : `SQIRModMultResource.lean` (`modmult_tcount = 112·bits²`) Supporting : `SQIRModMultDefinitions.lean` (input encoding, classical specs, verified BaseUCom families, q_start-parametric infrastructure)

defmult_control_idx

def mult_control_idx (bits j : Nat) : Nat

Multiplier bit `j` sits at this qubit, just above the Cuccaro mod-add block.

defmodmult_step_gate

def modmult_step_gate (bits N a j : Nat) : Gate

One step: conditionally add `(a · 2^j) mod N` to the accumulator, controlled by multiplier bit `j`.

defmodmult_prefix_gate

def modmult_prefix_gate (bits N a : Nat) : Nat → Gate
  | 0     => Gate.I
  | k + 1 => seq (modmult_prefix_gate bits N a k) (modmult_step_gate bits N a k)

Apply the step for `j = 0, …, k-1` in order.

defmodmult_const_gate

def modmult_const_gate (bits N a : Nat) : Gate

The const-multiplier gate: process all `bits` multiplier bits.

defmodmult_target_idx

def modmult_target_idx (i : Nat) : Nat

Qubit of accumulator (target) bit `i` in the Cuccaro layout.

defmodmult_swap_acc_mult_aux

def modmult_swap_acc_mult_aux (bits : Nat) : Nat → Gate
  | 0     => Gate.I
  | k + 1 => Gate.seq (modmult_swap_acc_mult_aux bits k)
                      (qubit_swap (modmult_target_idx k) (mult_control_idx bits k))

Swap accumulator bits `[0,k)` with multiplier bits `[0,k)`.

defmodmult_swap_acc_mult

def modmult_swap_acc_mult (bits : Nat) : Gate

Full SWAP of the accumulator (target) register with the multiplier register.

defmodmult_inplace_candidate

def modmult_inplace_candidate (bits N a ainv : Nat) : Gate

In-place modular multiplier (requires `a · ainv ≡ 1 mod N`): compute `(a·x) mod N` into the accumulator, swap it into the `x` register, then uncompute the old `x` by accumulating `(N - ainv)·(a·x) ≡ -x (mod N)`.

defmodmult_total_dim

def modmult_total_dim (bits : Nat) : Nat

Total qubit budget: `bits` for the external data register + the SQIR ancilla/workspace block `sqir_modmult_rev_anc bits`.

defGate.shift

def Gate.shift (off : Nat) : Gate → Gate
  | Gate.I         => Gate.I
  | Gate.X q       => Gate.X (off + q)
  | Gate.CX a b    => Gate.CX (off + a) (off + b)
  | Gate.CCX a b c => Gate.CCX (off + a) (off + b) (off + c)
  | Gate.seq g h   => Gate.seq (Gate.shift off g) (Gate.shift off h)

Shift every gate position up by `off` (embeds a gate into a larger register).

defencode_to_mult_adapter

def encode_to_mult_adapter (bits : Nat) : Gate

Adapter between the external `encodeDataZeroAnc` layout and the (shifted) SQIR multiplier layout: swap the data register `[0,bits)` into the shifted multiplier register (bit-order reversed).

defmodmult_inplace_shifted

def modmult_inplace_shifted (bits N a ainv : Nat) : Gate

The in-place multiplier embedded at the shifted SQIR layout.

defmodmult_MCP_gate

def modmult_MCP_gate (bits N a ainv : Nat) : Gate

*THE SQIR modular multiplier** (MCP layout): `encode adapter` → `shifted in-place multiplier` → `decode adapter`. Maps `x ↦ (a·x) mod N`. Correctness: `modmult_correct`. Resource: `modmult_tcount = 112·bits²`.

FormalRV.Arithmetic.ModMult.ModMultResource

FormalRV/Arithmetic/ModMult/ModMultResource.lean

FormalRV.Arithmetic.ModMult.ModMultResource ─────────────────────────────────────────────────── THE resource theorem for the SQIR-faithful in-place modular multiplier, and the theorem tying the resource to the SAME gate the correctness theorem verifies. Headlines: • `modmult_tcount` — EXACT T-count = 112·bits² (an equality, not a bound). • `modmult_verified` — resource AFTER correctness: the one gate is both MultiplyCircuitProperty-correct AND has T-count 112·bits².

theoremmodmult_tcount

theorem modmult_tcount (bits N a ainv : Nat)
    (hcop : Nat.Coprime a N) (hcopinv : Nat.Coprime ainv N)
    (hpos : 0 < ainv) (hlt : ainv < N) (hodd : Odd N) (h1 : 1 < N) :
    tcount (modmult_MCP_gate bits N a ainv) = 112 * bits ^ 2

*SQIR modular multiplier — resource (THE headline, EXACT).** T-count `= 112 · bits²` for the verified MCP oracle, for any valid Shor base `a` and inverse `ainv` (coprime to an odd modulus `N > 1`). This is an exact equality, computed structurally from the construction.

theoremmodmult_verified

theorem modmult_verified (bits N a ainv : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (h_ainv_le : ainv ≤ N) (h_inv : (a * ainv) % N = 1)
    (hcop : Nat.Coprime a N) (hcopinv : Nat.Coprime ainv N)
    (hpos : 0 < ainv) (hlt : ainv < N) (hodd : Odd N) (h1 : 1 < N) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N bits (sqir_modmult_rev_anc bits)
        (Gate.toUCom (modmult_total_dim bits) (modmult_MCP_gate bits N a ainv))
    ∧ tcount (modmult_MCP_gate bits N a ainv) = 112 * bits ^ 2

*SQIR modular multiplier — verified-with-resource (resource AFTER correctness).** The single gate `modmult_MCP_gate bits N a ainv` is simultaneously (i) `MultiplyCircuitProperty a N`-correct and (ii) exactly `112 · bits²` T-gates. The resource is stated about exactly the verified gate.

FormalRV.Arithmetic.ModMult.ShorOracle.Correctness

FormalRV/Arithmetic/ModMult/ShorOracle/Correctness.lean

FormalRV.Arithmetic.ModMult.ShorOracle.Correctness Semantic correctness of the Shor-layout modmult: WellTyped, the encodeDataZeroAnc action (x -> a*x mod N), and the `MultiplyCircuitProperty` discharge that lets it serve as Shor's ModMulImpl oracle. HEADLINE: modMultInPlaceShor_MultiplyCircuitProperty.

theoremmodMultInPlaceShor_wellTyped

theorem modMultInPlaceShor_wellTyped
    (bits N a ainv multBits : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) (h_multBits_pos : 0 < multBits) :
    Gate.WellTyped (multBits + (adder_n_qubits (bits + 1) + 1))
      (modMultInPlaceShor bits N a ainv multBits)

*WellTyped for `modMultInPlaceShor`.**

theoremmodMultInPlaceShor_correct

theorem modMultInPlaceShor_correct
    (bits N a ainv multBits x : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (ha_pos : 0 < a) (ha_lt : a < N)
    (hainv_pos : 0 < ainv) (hainv_lt : ainv < N)
    (h_inv : a * ainv % N = 1)
    (hx_lt : x < N)
    (h_const_pos_a : ∀ j, j < multBits → 0 < (a * 2^j) % N)
    (h_const_pos_inv : ∀ j, j < multBits → 0 < ((N - ainv) % N * 2^j) % N) :

*HEADLINE: Layout-converting in-place modular multiplier correctness.** Applied to `encodeDataZeroAnc multBits (adder_n_qubits (bits+1) + 1) x`, the gate produces `encodeDataZeroAnc multBits (adder_n_qubits (bits+1) + 1) ((a*x) % N)`. This is the exact shape required by `toUCom_satisfies_MultiplyCircuitProperty_of_applyNat_encodeDataZeroAnc`.

theoremmodMultInPlaceShor_MultiplyCircuitProperty

theorem modMultInPlaceShor_MultiplyCircuitProperty
    (bits N a ainv multBits : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_multBits_pos : 0 < multBits)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (ha_pos : 0 < a) (ha_lt : a < N)
    (hainv_pos : 0 < ainv) (hainv_lt : ainv < N)
    (h_inv : a * ainv % N = 1)
    (h_const_pos_a : ∀ j, j < multBits → 0 < (a * 2^j) % N)
    (h_const_pos_inv : ∀ j, j < multBits → 0 < ((N - ainv) % N * 2^j) % N) :
    FormalRV.SQIRPort.MultiplyCircuitProperty a N multBits

*HEADLINE: `modMultInPlaceShor` satisfies `MultiplyCircuitProperty`.** The compiled `BaseUCom (multBits + (adder_n_qubits (bits+1) + 1))` from `Gate.toUCom` satisfies the SQIR-shape modular-multiplication property required by `Shor_correct_var` / `Shor_correct`. This is the structural Phase 6 obligation, blocked since Tick 10 (out-of-place vs in-place, layout mismatch) and now closed via path (A).

FormalRV.Arithmetic.ModMult.ShorOracle.Def

FormalRV/Arithmetic/ModMult/ShorOracle/Def.lean

FormalRV.Arithmetic.ModMult.ShorOracle.Def The Shor-layout in-place modular multiplier `modMultInPlaceShor` — a SECOND modmult variant (distinct from modmult_MCP_gate): it wraps the faithfully-verified Gidney ripple-carry in-place multiplier (ModularAdder/Gidney) with register-swap adapters, living at the Shor order-finding dimension multBits + adder_n_qubits(bits+1)+1 = 4*bits+6. This is the modmult that Shor's order-finding oracle consumes.

defmodMultInPlaceShor

def modMultInPlaceShor (bits N a ainv multBits : Nat) : Gate

*Shor-shaped in-place modular multiplier gate.** Three-stage composition: SWAP → in-place multiplier → SWAP. Takes `encodeDataZeroAnc` input and produces `encodeDataZeroAnc` output with the data register replaced by `(a*x) mod N`.

FormalRV.Arithmetic.ModularAdder

FormalRV/Arithmetic/ModularAdder.lean

# FormalRV.Arithmetic.ModularAdder *Two** verified implementations of the modular adder `(x + c) mod N`, on two different base ripple-carry adders. Both compute the same value by the same textbook algorithm — add `c` (over one extra bit) → subtract `N` → read the borrow/high bit as a comparison flag → conditionally add `N` back → uncompute the flag — and differ only in (a) which base adder fills the "add" slot and (b) whether anything downstream consumes them. - [`ModularAdder.Gidney`](ModularAdder/Gidney.lean) — built on the **Gidney** patched ripple-carry adder. Fully proven, but **standalone** (not wired into the verified Shor path). - [`ModularAdder.Cuccaro`](ModularAdder/Cuccaro.lean) — built on the **Cuccaro** adder; a re-export of `Cuccaro/CuccaroSQIRDirtyFlag`. Fully proven and *LIVE** — this is the modular adder the verified modular multiplier and Shor actually use. See [`ModularAdder/README.md`](ModularAdder/README.md) for the composition chains and the live-vs-standalone details.

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Cuccaro

FormalRV/Arithmetic/ModularAdder/Cuccaro.lean

FormalRV.Arithmetic.ModularAdder.Cuccaro ──────────────────────────────────────── THE LIVE modular adder `(x + c) mod N` — the Cuccaro/SQIR-style family the verified modular multiplier and Shor actually use. Follows the same Def / Correctness / Resource spine as `ModularAdder.Gidney`, but only *surfaces** the family: the definitions and proofs stay physically under `Cuccaro/CuccaroSQIRDirtyFlag/` (they are built on the Cuccaro adder and are imported by `ModMult/`). No definitions or proofs are added here. • `Cuccaro/Def.lean` — re-exports `sqir_style_modAddConst_clean_gate`, `sqir_style_controlledModAddConst_gate`. • `Cuccaro/Correctness.lean` — `cuccaroModAddConst_correct`, `cuccaroControlledModAddConst_correct`. • `Cuccaro/Resource.lean` — qubit budget (`sqir_modmult_rev_anc bits`). Live path: `ModMult.modmult_step_gate → sqir_style_controlledModAddConst_gate → modmult_MCP_gate (verified multiplier) → VerifiedShor`.

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Cuccaro.Correctness

FormalRV/Arithmetic/ModularAdder/Cuccaro/Correctness.lean

FormalRV.Arithmetic.ModularAdder.Cuccaro.Correctness ──────────────────────────────────────────────────── THE semantic-correctness theorems for the Cuccaro/SQIR-style modular adder `(x + c) mod N` (the LIVE one). Surfaced here as thin wrappers; the heavy proofs live in `Cuccaro/CuccaroSQIRDirtyFlag/` (kept there — consumed by `ModMult/`). Headlines (at the SQIR layout `q_start = 2`, `flagPos = 1`): • `cuccaroModAddConst_correct` — uncontrolled `(x + c) mod N`. • `cuccaroControlledModAddConst_correct` — controlled version (THE gate the verified modular multiplier uses). Both decode the target register to `(x+c) mod N` and certify that the read register, top carry, and flag are restored (the controlled one additionally preserves the control bit and gates on it).

theoremcuccaroModAddConst_correct

theorem cuccaroModAddConst_correct
    (bits N c x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (hc : c < N) (hx : x < N) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_modAddConst_clean_gate bits N c)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat (sqir_style_modAddConst_clean_gate bits N c)
            (update (cuccaro_input_F 2 false 0 x) 1 false))
        = (x + c) % N
    ∧ cuccaro_read_val bits 2
          (Gate.applyNat (sqir_style_modAddConst_clean_gate bits N c)

*Cuccaro modular adder — correctness (uncontrolled).** For `1 ≤ bits`, `0 < N`, `N ≤ 2^bits`, `2N ≤ 2^bits`, `c < N`, `x < N`, the gate `sqir_style_modAddConst_clean_gate bits N c` on the clean SQIR-layout input is WellTyped on `sqir_modmult_rev_anc bits` qubits, decodes the target register to `(x+c) mod N`, and restores the read register, the top carry, and the flag.

theoremcuccaroControlledModAddConst_correct

theorem cuccaroControlledModAddConst_correct
    (bits N c x controlIdx : Nat) (control : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1)
    (h_control_workspace_lt : controlIdx < sqir_modmult_rev_anc bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
        (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)
    ∧ cuccaro_target_val bits 2
          (Gate.applyNat (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)
            (update (cuccaro_input_F 2 false 0 x) controlIdx control))

*Cuccaro controlled modular adder — correctness (THE live headline).** For any `control` and out-of-band `controlIdx ≠ 1` with `controlIdx < sqir_modmult_rev_anc bits`, `sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1` decodes the target to `(x+c) mod N` iff `control` is set (else leaves `x`), restoring the read register / top carry / flag and preserving the control bit.

FormalRV.Arithmetic.ModularAdder.Cuccaro.Def

FormalRV/Arithmetic/ModularAdder/Cuccaro/Def.lean

FormalRV.Arithmetic.ModularAdder.Cuccaro.Def ──────────────────────────────────────────── THE definitions of the Cuccaro/SQIR-style modular adder `(x + c) mod N` — the LIVE one consumed by the verified modular multiplier and Shor. *No definitions are added here.** The gates physically live (and stay) under `Cuccaro/CuccaroSQIRDirtyFlag/CuccaroModularAddDefinitions.lean`, because they are built on the Cuccaro MAJ/UMA adder and are imported by `ModMult/`; this file re-exports them so the Cuccaro modular adder follows the same Def/Correctness/Resource spine as `ModularAdder.Gidney`. THE adder: • `sqir_style_modAddConst_clean_gate bits N c` — uncontrolled clean `(x + c) mod N`. • `sqir_style_controlledModAddConst_gate bits q_start N c controlIdx flagPos` — controlled version; at the SQIR layout `q_start = 2, flagPos = 1` this is the gate `ModMult.modmult_step_gate` calls. Construction: the same textbook pipeline as the Gidney adder (add `c` → compareConst `N` (forward-MAJ-only comparator copies `decide(N ≤ x+c)` into a flag qubit) → conditional subtract `N` → cleanup uncomputes the flag), but the "add" slot is filled by the Cuccaro adder `cuccaro_n_bit_adder_full`. Where to look next: • Correctness : `Cuccaro/Correctness.lean` • Resource : `Cuccaro/Resource.lean` • Proofs (do not edit — consumed by `ModMult/`) : `Cuccaro/CuccaroSQIRDirtyFlag/*`.

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Cuccaro.Resource

FormalRV/Arithmetic/ModularAdder/Cuccaro/Resource.lean

FormalRV.Arithmetic.ModularAdder.Cuccaro.Resource ───────────────────────────────────────────────── THE resource theorem for the Cuccaro/SQIR-style modular adder. As with the Gidney spine the natural resource is the **qubit budget**: the modular adder is `WellTyped` on `sqir_modmult_rev_anc bits` qubits (the SQIR `ModMult.v` reverse workspace — the `2·bits`-wide Cuccaro block at `q_start = 2`, the flag at position 1, plus the reverse ancillas). Surfaced as thin wrappers extracting the WellTyped conjunct of the clean correctness bundles. T-count note: each modular-add block is a constant number of Cuccaro adders (add `c`, compare `N`, conditional subtract `N`, cleanup); the verified multiplier applies one controlled block per multiplier bit, giving the `modmult_tcount = 112·bits²` figure proved in `ModMult/`. No separate closed-form T-count is restated here. Where to look next: • Definition : `Cuccaro/Def.lean` • Correctness : `Cuccaro/Correctness.lean`

theoremcuccaroModAddConst_wellTyped

theorem cuccaroModAddConst_wellTyped
    (bits N c x : Nat) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (hc : c < N) (hx : x < N) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
      (sqir_style_modAddConst_clean_gate bits N c)

*Cuccaro modular adder — qubit budget (uncontrolled).** `WellTyped` on `sqir_modmult_rev_anc bits` qubits.

theoremcuccaroControlledModAddConst_wellTyped

theorem cuccaroControlledModAddConst_wellTyped
    (bits N c x controlIdx : Nat) (control : Bool) (hbits : 1 ≤ bits)
    (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hN2 : 2 * N ≤ 2 ^ bits)
    (hc : c < N) (hx : x < N)
    (hcontrol_out : controlIdx < 2 ∨ 2 + 2 * bits + 1 ≤ controlIdx)
    (hcontrol_ne_flag : controlIdx ≠ 1)
    (h_control_workspace_lt : controlIdx < sqir_modmult_rev_anc bits) :
    Gate.WellTyped (sqir_modmult_rev_anc bits)
      (sqir_style_controlledModAddConst_gate bits 2 N c controlIdx 1)

*Cuccaro controlled modular adder — qubit budget (THE resource headline).** `WellTyped` on `sqir_modmult_rev_anc bits` qubits, for out-of-band `controlIdx ≠ 1` with `controlIdx < sqir_modmult_rev_anc bits`.

FormalRV.Arithmetic.ModularAdder.Cuccaro.TimeCount

FormalRV/Arithmetic/ModularAdder/Cuccaro/TimeCount.lean

FormalRV.Arithmetic.ModularAdder.Cuccaro.TimeCount ────────────────────────────────────────────────── THE time-resource (T-count) theorems for the Cuccaro-style (SQIR-layout) modular adder — closing the audit gap "ModularAdder time counts missing". The gates here ARE the SQIR clean modular-add gates (`Def.lean` re-exports them from `Cuccaro/CuccaroSQIRDirtyFlag`), so the closed forms are proven once in `Cuccaro/CuccaroVariantsResource.lean` and re-surfaced here as the spine's TIME headlines: sqir_style_modAddConst_clean_gate bits N c 56·bits (c ≠ 0) sqir_style_controlledModAddConst_gate bits … 56·bits (c ≠ 0) ANCHORED: `Gate.tcount` (= `Resource.countT`) walking the same syntactic objects verified by `…clean_candidate_clean` / `…candidate_clean_qstart`. Cross-check: ModMult's proven `112·bits²` = `2 × bits × 56·bits`.

theoremtcount_cuccaro_style_modAddConst_clean_gate

theorem tcount_cuccaro_style_modAddConst_clean_gate (bits N c : Nat) :
    tcount (sqir_style_modAddConst_clean_gate bits N c)
      = if c = 0 then 0 else 56 * bits

*THE Cuccaro-style clean modular add-constant gate: T-count = 56·bits** (0 in the dispatched `c = 0` identity case).

theoremtcount_cuccaro_style_controlledModAddConst_gate

theorem tcount_cuccaro_style_controlledModAddConst_gate
    (bits q_start N c controlIdx flagPos : Nat) :
    tcount (sqir_style_controlledModAddConst_gate bits q_start N c controlIdx flagPos)
      = if c = 0 then 0 else 56 * bits

*The CONTROLLED Cuccaro-style modular add-constant gate: T-count = 56·bits** (0 for `c = 0`) — the per-bit primitive whose `bits`-fold forward + uncompute composition is ModMult's proven `112·bits²`.

FormalRV.Arithmetic.ModularAdder.Gidney

FormalRV/Arithmetic/ModularAdder/Gidney.lean

FormalRV.Arithmetic.ModularAdder.Gidney ─────────────────────────────────────── THE Gidney-based modular adder `(x + c) mod N`, built on the patched Gidney ripple-carry adder. Follows the Def / Correctness / Resource spine convention: • `Gidney/Def.lean` — THE definitions (`modAddConstGate`, `controlledModAddConstGate`, and the standalone modular-multiplier tower). No proofs. • `Gidney/Correctness.lean` — `modAddConst_correct`, `controlledModAddConst_correct`. • `Gidney/Resource.lean` — qubit budget (`controlledModAddConst_wellTyped`). Supporting proofs (read only if auditing): `Gidney/PowerOfTwoCase.lean`, `Gidney/ForwardFaithfulness.lean`, `Gidney/ControlledPipeline.lean`, `Gidney/SwapSemantics.lean`. ⚠️ Fully verified, but STANDALONE — the verified Shor multiplier uses the Cuccaro/SQIR family (`FormalRV.Arithmetic.ModularAdder.Cuccaro`). See `ModularAdder/README.md`.

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Gidney.ControlledPipeline

FormalRV/Arithmetic/ModularAdder/Gidney/ControlledPipeline.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Gidney.ControlledPipeline.ModInverseAndSwapCorrectness

FormalRV/Arithmetic/ModularAdder/Gidney/ControlledPipeline/ModInverseAndSwapCorrectness.lean

ControlledPipeline — Part4 (re-export shim part; same namespace, opens de-duplicated).

theoreminv_mul_mod_eq_self

theorem inv_mul_mod_eq_self (a ainv N x : Nat) (hN : 0 < N)
    (hx : x < N) (hainv : ainv < N) (h_inv : a * ainv % N = 1) :
    ainv * (a * x % N) % N = x

*Modular-inverse "undo" identity.** If `a * ainv ≡ 1 (mod N)`, `x < N`, and `ainv < N`, then `ainv * (a*x mod N) mod N = x`.

theoremmod_inv_cancel_identity

theorem mod_inv_cancel_identity (a ainv N x : Nat) (hN : 0 < N)
    (hx : x < N) (hainv : ainv < N) (h_inv : a * ainv % N = 1) :
    (x + (N - ainv) * (a * x % N)) % N = 0

*Modular cancellation by the additive-inverse-mod-N coefficient.** If `a * ainv ≡ 1 (mod N)`, `x < N`, `ainv < N`, then `(x + (N - ainv) * (a*x mod N)) mod N = 0`. This is the algebraic identity that justifies the third stage of the in-place modular multiplier wrapper.

theoremmult_target_swap_aux_succ

theorem mult_target_swap_aux_succ (bits k : Nat) :
    mult_target_swap_aux bits (k + 1)
    = Gate.seq (mult_target_swap_aux bits k)
               (qubit_swap (adder_n_qubits (bits + 1) + k) (target_idx k))

Recursion unfolding for `mult_target_swap_aux`.

theoremmult_target_swap_aux_wellTyped

theorem mult_target_swap_aux_wellTyped
    (bits multBits k : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) (hk : k ≤ multBits) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + multBits + 1)
      (mult_target_swap_aux bits k)

*WellTyped for `mult_target_swap_aux`.** At dimension `adder_n_qubits (bits + 1) + multBits + 1` (Shor-compatible), each constituent `qubit_swap (adder_n_qubits + k) (target_idx k)` is well-typed when `k ≤ multBits ≤ bits + 1`.

theoremmult_target_swap_wellTyped

theorem mult_target_swap_wellTyped
    (bits multBits : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + multBits + 1)
      (mult_target_swap bits multBits)

*WellTyped for `mult_target_swap`.**

theoremmodMultInPlace_wellTyped

theorem modMultInPlace_wellTyped
    (bits N a ainv multBits : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + multBits + 1)
      (modMultInPlace bits N a ainv multBits)

*WellTyped for `modMultInPlace`.**

theoremmodMultInPlace_wellTyped_at_shor_dim

theorem modMultInPlace_wellTyped_at_shor_dim
    (bits N a ainv multBits : Nat) (hbits : 1 ≤ bits)
    (h_multBits_le : multBits ≤ bits + 1) :
    Gate.WellTyped (multBits + (adder_n_qubits (bits + 1) + 1))
      (modMultInPlace bits N a ainv multBits)

*In-place WellTyped at the Shor-compatible dimension.**

theoremmult_target_swap_aux_at_other

theorem mult_target_swap_aux_at_other
    (bits n : Nat) (f : Nat → Bool) (q : Nat)
    (h_n_le : n ≤ bits + 1)
    (h_outside : ∀ k, k < n →
      q ≠ adder_n_qubits (bits + 1) + k ∧ q ≠ target_idx k) :
    Gate.applyNat (mult_target_swap_aux bits n) f q = f q

*At-other for `mult_target_swap_aux`.** If `q` is not equal to any swap-paired position (multiplier-side or target-side) up to iteration `n`, then the gate is identity at `q`. Requires `n ≤ bits + 1` to ensure each swap-pair has distinct positions.

theoremmult_target_swap_aux_at_mult

theorem mult_target_swap_aux_at_mult
    (bits n : Nat) (f : Nat → Bool) (j : Nat) (hj : j < n)
    (h_n_le : n ≤ bits + 1) :
    Gate.applyNat (mult_target_swap_aux bits n) f
      (adder_n_qubits (bits + 1) + j)
    = f (target_idx j)

*At multiplier-side position**: at `adder_n_qubits + j` for `j < n`, the gate returns `f (target_idx j)`. Requires `n ≤ bits + 1`.

theoremmult_target_swap_aux_at_target

theorem mult_target_swap_aux_at_target
    (bits n : Nat) (f : Nat → Bool) (j : Nat) (hj : j < n)
    (h_n_le : n ≤ bits + 1) :
    Gate.applyNat (mult_target_swap_aux bits n) f (target_idx j)
    = f (adder_n_qubits (bits + 1) + j)

*At target-side position**: at `target_idx j` for `j < n`, the gate returns `f (adder_n_qubits + j)`. Requires `n ≤ bits + 1`.

FormalRV.Arithmetic.ModularAdder.Gidney.ControlledPipeline.MultiplierCommuteAndReorder

FormalRV/Arithmetic/ModularAdder/Gidney/ControlledPipeline/MultiplierCommuteAndReorder.lean

ControlledPipeline — Part2 (re-export shim part; same namespace, opens de-duplicated).

theoremmodMultConstGateAux_commute_update_outer

theorem modMultConstGateAux_commute_update_outer
    (bits N a multBits k p : Nat) (v : Bool) (hbits : 1 ≤ bits)
    (hk : k ≤ multBits) (hp : adder_n_qubits (bits + 1) + multBits < p) :
    ∀ (f : Nat → Bool),
      Gate.applyNat (modMultConstGateAux bits N a multBits k) (update f p v)
      = update (Gate.applyNat (modMultConstGateAux bits N a multBits k) f) p v

*Commute lemma for `modMultConstGateAux`.** At positions strictly above the multiplier circuit's flag (i.e., `p > adder_n_qubits (bits+1) + multBits`), an `update _ p v` commutes through the full multiplier auxiliary gate. Proven directly via `applyNat_commute_update_above_dim` applied to `modMultConstGateAux_wellTyped`.

theoremmodMultConstGate_commute_update_outer

theorem modMultConstGate_commute_update_outer
    (bits N a multBits p : Nat) (v : Bool) (hbits : 1 ≤ bits)
    (hp : adder_n_qubits (bits + 1) + multBits < p) :
    ∀ (f : Nat → Bool),
      Gate.applyNat (modMultConstGate bits N a multBits) (update f p v)
      = update (Gate.applyNat (modMultConstGate bits N a multBits) f) p v

*Commute lemma for `modMultConstGate`.** Specialization of the aux-level commute lemma at `k = multBits`.

theoremmodMultConstGateAux_commute_update_mult_pos_above

theorem modMultConstGateAux_commute_update_mult_pos_above
    (bits N a multBits k j : Nat) (v : Bool) (hbits : 1 ≤ bits)
    (hk : k ≤ multBits) (hjk : k ≤ j) (hj : j < multBits) :
    ∀ (f : Nat → Bool),
      Gate.applyNat (modMultConstGateAux bits N a multBits k)
          (update f (adder_n_qubits (bits + 1) + j) v)
      = update (Gate.applyNat (modMultConstGateAux bits N a multBits k) f)
          (adder_n_qubits (bits + 1) + j) v

*`modMultConstGateAux` commute lemma at a multiplier-bit position.** For positions in the multiplier-bit range `p = adder_n_qubits (bits+1) + j` with `j < multBits` AND `j ≥ k` (i.e., a multiplier bit that has NOT yet been touched by iterations `0, 1, ..., k-1`), `update _ p v` commutes through `modMultConstGateAux bits N a multBits k`. Proven by induction on `k`, using `controlledModAddConstGate_commute_update_outer` for the step.

theoremmult_input_F_aux_succ

theorem mult_input_F_aux_succ (bits multBits m i : Nat) (f : Nat → Bool) :
    mult_input_F_aux bits multBits m (i + 1) f
    = update (mult_input_F_aux bits multBits m i f)
             (adder_n_qubits (bits + 1) + i) (Nat.testBit m i)

Recursion unfolding for the aux at `i+1`.

theoremmult_input_F_aux_at_mult_pos

theorem mult_input_F_aux_at_mult_pos
    (bits multBits m i j : Nat) (hj : j < i) (f : Nat → Bool) :
    mult_input_F_aux bits multBits m i f (adder_n_qubits (bits + 1) + j)
    = Nat.testBit m j

Decoder at multiplier-bit positions: `mult_input_F_aux ... i f` at position `adder_n_qubits (bits+1) + j` returns `Nat.testBit m j`, when `j < i` (i.e., bit `j` has been written by some iteration ≤ i-1).

theoremmult_input_F_aux_at_non_mult_pos

theorem mult_input_F_aux_at_non_mult_pos
    (bits multBits m i p : Nat)
    (h_outside : p < adder_n_qubits (bits + 1) ∨ adder_n_qubits (bits + 1) + i ≤ p)
    (f : Nat → Bool) :
    mult_input_F_aux bits multBits m i f p = f p

Decoder at non-multiplier positions: `mult_input_F_aux ... i f` at position `p` outside the multiplier-bit range `[adder_n_qubits (bits+1), adder_n_qubits (bits+1) + i)` equals `f p`.

theoremmult_input_F_at_mult_pos

theorem mult_input_F_at_mult_pos
    (bits multBits x m j : Nat) (hj : j < multBits) :
    mult_input_F bits multBits x m (adder_n_qubits (bits + 1) + j)
    = Nat.testBit m j

Top-level decoder at multiplier-bit position.

theoremmult_input_F_at_non_mult_pos

theorem mult_input_F_at_non_mult_pos
    (bits multBits x m p : Nat)
    (h_outside : p < adder_n_qubits (bits + 1)
                 ∨ adder_n_qubits (bits + 1) + multBits ≤ p) :
    mult_input_F bits multBits x m p = adder_input_F (bits + 1) 0 x p

Top-level decoder at non-multiplier positions: equal to `adder_input_F (bits+1) 0 x`.

theoremmult_input_F_aux_commute_update_above

theorem mult_input_F_aux_commute_update_above
    (bits multBits m i j : Nat) (hj : i ≤ j) (v : Bool) (f : Nat → Bool) :
    mult_input_F_aux bits multBits m i (update f (adder_n_qubits (bits + 1) + j) v)
    = update (mult_input_F_aux bits multBits m i f)
             (adder_n_qubits (bits + 1) + j) v

`mult_input_F_aux` commutes with an `update _ (adder_n_qubits (bits+1) + j) v` when `j ≥ i` (i.e., the iteration hasn't touched position `pos j` yet).

theoremmult_input_F_isolate_k

theorem mult_input_F_isolate_k
    (bits multBits x m k : Nat) (hk : k < multBits) :
    mult_input_F bits multBits x m
    = mult_input_F_aux bits multBits m multBits
        (update (adder_input_F (bits + 1) 0 x)
                (adder_n_qubits (bits + 1) + k) (Nat.testBit m k))

*`mult_input_F` isolation at position `k`.** For `k < multBits`, the full multiplier-encoded input is equal to `mult_input_F_aux` at iteration `multBits` applied to a base that already carries the k-th multiplier update on `adder_input_F`. The k-th iteration of the aux overwrites position `adder_n_qubits (bits+1) + k` to the same value (`Nat.testBit m k`), so the additional update is absorbed; outside the multiplier range the update at `pos k` is transparent.

theoremmult_input_F_aux_absorb_at_k_position

theorem mult_input_F_aux_absorb_at_k_position
    (bits multBits m k : Nat) (f : Nat → Bool) :
    mult_input_F_aux bits multBits m (k + 1)
        (update f (adder_n_qubits (bits + 1) + k) (Nat.testBit m k))
    = mult_input_F_aux bits multBits m k
        (update f (adder_n_qubits (bits + 1) + k) (Nat.testBit m k))

Absorption lemma: when an outer `update` at the k-th multiplier position rewrites a value that the inner aux-at-iteration-k already carries (because the inner has `update f (pos k) (testBit m k)` as base and aux at k doesn't touch pos k), the outer update is a no-op.

theoremCMAcg_on_mult_input_F_aux_iso

theorem CMAcg_on_mult_input_F_aux_iso
    (bits N c x m multBits k : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc_pos : 0 < c) (hc : c < N) (hk : k < multBits) :
    ∀ i, i ≤ multBits →
    Gate.applyNat
      (controlledModAddConstGate bits N c
        (adder_n_qubits (bits + 1) + k) (adder_n_qubits (bits + 1) + multBits))
      (mult_input_F_aux bits multBits m i
        (update (adder_input_F (bits + 1) 0 x)
                (adder_n_qubits (bits + 1) + k) (Nat.testBit m k)))
    = mult_input_F_aux bits multBits m i

Inductive helper for the single-step correctness on `mult_input_F`.

theoremcontrolledModAddConstGate_on_mult_input_F

theorem controlledModAddConstGate_on_mult_input_F
    (bits N c x m multBits k : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc_pos : 0 < c) (hc : c < N) (hk : k < multBits) :
    Gate.applyNat
      (controlledModAddConstGate bits N c
        (adder_n_qubits (bits + 1) + k) (adder_n_qubits (bits + 1) + multBits))
      (mult_input_F bits multBits x m)
    = mult_input_F bits multBits
        (if Nat.testBit m k then (x + c) % N else x) m

*Single-step correctness for `controlledModAddConstGate` on `mult_input_F`.** Applied to the multiplier-encoded input `mult_input_F bits multBits x m`, the controlled modular-add gate (controlled by the `k`-th multiplier qubit, with shared flag at position `adder_n_qubits (bits+1) + multBits`) advances the adder's target register from `x` to `(x + c) % N` when bit `k` of `m` is set, or leaves it unchanged otherwise.

FormalRV.Arithmetic.ModularAdder.Gidney.ControlledPipeline.MultiplierCorrectness

FormalRV/Arithmetic/ModularAdder/Gidney/ControlledPipeline/MultiplierCorrectness.lean

ControlledPipeline — Part3 (re-export shim part; same namespace, opens de-duplicated).

lemmam_mod_two_pow_succ_eq

lemma m_mod_two_pow_succ_eq (m k : Nat) :
    m % 2^(k+1) = m % 2^k + (m / 2^k % 2) * 2^k

*Bit decomposition for the next power of two.** `m mod 2^(k+1) = m mod 2^k + (testBit m k as Nat) * 2^k`.

theoremmodMultConstGateAux_correct

theorem modMultConstGateAux_correct
    (bits N a multBits x m : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N)
    (h_const_pos : ∀ j, j < multBits → 0 < (a * 2^j) % N) :
    ∀ k, k ≤ multBits →
    Gate.applyNat (modMultConstGateAux bits N a multBits k)
                  (mult_input_F bits multBits x m)
    = mult_input_F bits multBits ((x + a * (m % 2^k)) % N) m

*Inductive correctness for `modMultConstGateAux`.** At iteration `k ≤ multBits`, the aux gate has advanced the adder's target from `x` to `(x + a * (m mod 2^k)) mod N`, given that each per-bit constant `(a * 2^j) % N` is non-zero for `j < multBits`.

theoremmodMultConstGate_correct

theorem modMultConstGate_correct
    (bits N a multBits x m : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N)
    (hm : m < 2^multBits)
    (h_const_pos : ∀ j, j < multBits → 0 < (a * 2^j) % N) :
    Gate.applyNat (modMultConstGate bits N a multBits)
                  (mult_input_F bits multBits x m)
    = mult_input_F bits multBits ((x + a * m) % N) m

*Modular multiplier correctness.** When `m < 2^multBits`, the modular multiplier gate sends the adder's target from `x` to `(x + a * m) mod N`, while preserving the multiplier register `m` and the flag. Equivalent form: each multiplier-bit `i` contributes `(a * 2^i) mod N` to the target when set.

theoremmult_state_init_at_mult_pos

theorem mult_state_init_at_mult_pos
    (bits multBits x j : Nat) (hj : j < multBits) :
    mult_state_init bits multBits x (adder_n_qubits (bits + 1) + j)
    = Nat.testBit x j

Decoder at multiplier-bit positions.

theoremmult_state_init_at_non_mult_pos

theorem mult_state_init_at_non_mult_pos
    (bits multBits x p : Nat)
    (h_outside : p < adder_n_qubits (bits + 1)
                 ∨ adder_n_qubits (bits + 1) + multBits ≤ p) :
    mult_state_init bits multBits x p = adder_input_F (bits + 1) 0 0 p

Decoder at non-multiplier positions: zero.

theoremmodMultConstGate_on_init_correct

theorem modMultConstGate_on_init_correct
    (bits N a multBits x : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < 2^multBits)
    (h_const_pos : ∀ j, j < multBits → 0 < (a * 2^j) % N) :
    Gate.applyNat (modMultConstGate bits N a multBits)
                  (mult_state_init bits multBits x)
    = mult_input_F bits multBits ((a * x) % N) x

*Modular multiplier on the initial input state.** When applied to `mult_state_init bits multBits x` (multiplier register holds `x`, adder zeroed), the gate produces a state whose adder-target register encodes `(a * x) mod N` while the multiplier register `x` is preserved. Hypotheses ensure each per-bit constant `(a * 2^j) % N` is positive (Shor's coprimality condition) and `x < 2^multBits`.

theoremmodMultConstGate_wellTyped_at_shor_dim

theorem modMultConstGate_wellTyped_at_shor_dim
    (bits N a multBits : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (multBits + (adder_n_qubits (bits + 1) + 1))
      (modMultConstGate bits N a multBits)

*WellTyped corollary at the Shor-compatible dimension.** Setting `n := multBits` (the data register size) and `anc := adder_n_qubits (bits+1) + 1` (the workspace including the flag), the modular multiplier gate is well-typed at dimension `n + anc`, matching the shape required by `encodeDataZeroAnc n anc` and `MultiplyCircuitProperty a N n anc`.

theoremf_modmult_step_gate_wellTyped

theorem f_modmult_step_gate_wellTyped
    (bits N a multBits i : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (multBits + (adder_n_qubits (bits + 1) + 1))
      (f_modmult_step_gate bits N a multBits i)

*WellTyped** for the step gate at the Shor-compatible dimension.

theoremf_modmult_step_gate_wellTyped_aux

theorem f_modmult_step_gate_wellTyped_aux
    (bits N a multBits i : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + multBits + 1)
      (f_modmult_step_gate bits N a multBits i)

*WellTyped** at the original aux dimension.

theoremf_modmult_step_gate_on_init_correct

theorem f_modmult_step_gate_on_init_correct
    (bits N a multBits i x : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < 2^multBits)
    (h_const_pos :
      ∀ j, j < multBits → 0 < ((a^(2^i) % N) * 2^j) % N) :
    Gate.applyNat (f_modmult_step_gate bits N a multBits i)
                  (mult_state_init bits multBits x)
    = mult_input_F bits multBits ((a^(2^i) * x) % N) x

*Step correctness on the initial state.** Applied to `mult_state_init bits multBits x`, the step gate at iterate `i` produces a state whose adder-target register holds `(a^(2^i) * x) % N` while the multiplier register `x` is preserved. Hypotheses ensure each per-bit constant `((a^(2^i)) * 2^j) % N` is positive (the analogue of Shor's coprimality condition for the squared base).

theoremf_modmult_gate_family_wellTyped

theorem f_modmult_gate_family_wellTyped
    (bits N a multBits : Nat) (hbits : 1 ≤ bits) :
    ∀ i, Gate.WellTyped (multBits + (adder_n_qubits (bits + 1) + 1))
            (f_modmult_gate_family bits N a multBits i)

*Family-level WellTyped.** For every iterate `i`, the gate `f_modmult_gate_family bits N a multBits i` is `Gate.WellTyped` at the Shor-compatible dimension `n + anc = multBits + (adder_n_qubits (bits+1) + 1)`.

theoremf_modmult_gate_family_on_init_correct

theorem f_modmult_gate_family_on_init_correct
    (bits N a multBits : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_const_pos :
      ∀ i, ∀ j, j < multBits → 0 < ((a^(2^i) % N) * 2^j) % N) :
    ∀ i x, x < 2^multBits →
      Gate.applyNat (f_modmult_gate_family bits N a multBits i)
                    (mult_state_init bits multBits x)
      = mult_input_F bits multBits ((a^(2^i) * x) % N) x

*Family-level out-of-place correctness on the initial state.** For each iterate `i`, applied to `mult_state_init bits multBits x`, the family member produces a state with adder-target register holding `(a^(2^i) * x) mod N` and multiplier register `x` preserved.

theoremqubit_swap_wellTyped

theorem qubit_swap_wellTyped (dim a b : Nat)
    (ha : a < dim) (hb : b < dim) (hab : a ≠ b) :
    Gate.WellTyped dim (qubit_swap a b)

Well-typedness for `qubit_swap`.

theoremqubit_swap_correct

theorem qubit_swap_correct (a b : Nat) (f : Nat → Bool) (hab : a ≠ b) :
    Gate.applyNat (qubit_swap a b) f
    = update (update f a (f b)) b (f a)

*Boolean-state correctness for SWAP.** Applied to `f`, the swap gate produces a state with values at positions `a` and `b` exchanged.

theoremregister_swap_aux_succ

theorem register_swap_aux_succ
    (offsetA offsetB k : Nat) :
    register_swap_aux offsetA offsetB (k + 1)
    = Gate.seq (register_swap_aux offsetA offsetB k)
               (qubit_swap (offsetA + k) (offsetB + k))

Recursion unfolding for `register_swap_aux`.

theoremregister_swap_aux_wellTyped

theorem register_swap_aux_wellTyped
    (dim offsetA offsetB k : Nat) (hdim : 0 < dim)
    (hA : offsetA + k ≤ dim) (hB : offsetB + k ≤ dim)
    (h_disjoint : offsetA + k ≤ offsetB ∨ offsetB + k ≤ offsetA) :
    Gate.WellTyped dim (register_swap_aux offsetA offsetB k)

*WellTyped for `register_swap_aux`.** Requires non-empty `dim`, both offset ranges fitting inside `dim`, and the two ranges being disjoint.

theoremregister_swap_wellTyped

theorem register_swap_wellTyped
    (dim multBits offsetA offsetB : Nat) (hdim : 0 < dim)
    (hA : offsetA + multBits ≤ dim) (hB : offsetB + multBits ≤ dim)
    (h_disjoint : offsetA + multBits ≤ offsetB ∨ offsetB + multBits ≤ offsetA) :
    Gate.WellTyped dim (register_swap multBits offsetA offsetB)

*WellTyped for `register_swap`.**

theoremregister_swap_aux_at_other

theorem register_swap_aux_at_other
    (offsetA offsetB n : Nat) (f : Nat → Bool) (q : Nat)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA)
    (h_outside_A : q < offsetA ∨ offsetA + n ≤ q)
    (h_outside_B : q < offsetB ∨ offsetB + n ≤ q) :
    Gate.applyNat (register_swap_aux offsetA offsetB n) f q = f q

*Correctness at "other" positions** of `register_swap_aux`. At any position outside both `[offsetA, offsetA + n)` and `[offsetB, offsetB + n)`, the gate is identity.

theoremregister_swap_aux_at_A

theorem register_swap_aux_at_A
    (offsetA offsetB n : Nat) (f : Nat → Bool) (j : Nat) (hj : j < n)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA) :
    Gate.applyNat (register_swap_aux offsetA offsetB n) f (offsetA + j)
    = f (offsetB + j)

*Correctness at A positions**: at `offsetA + j` for `j < n`, the gate returns `f (offsetB + j)`.

theoremregister_swap_aux_at_B

theorem register_swap_aux_at_B
    (offsetA offsetB n : Nat) (f : Nat → Bool) (j : Nat) (hj : j < n)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA) :
    Gate.applyNat (register_swap_aux offsetA offsetB n) f (offsetB + j)
    = f (offsetA + j)

*Correctness at B positions**: at `offsetB + j` for `j < n`, the gate returns `f (offsetA + j)`.

FormalRV.Arithmetic.ModularAdder.Gidney.ControlledPipeline.MultiplierStepBaseCases

FormalRV/Arithmetic/ModularAdder/Gidney/ControlledPipeline/MultiplierStepBaseCases.lean

ControlledPipeline — Part1 (re-export shim part; same namespace, opens de-duplicated).

theoremcontrolled_step5_true

theorem controlled_step5_true
    (bits N c x controlIdx flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (conditionalAddConstGate (bits + 1) (2^(bits+1) - c) controlIdx)
      (update (update (adder_input_F (bits + 1) 0 ((x + c) % N)) controlIdx true)
              flagIdx (decide ((x + c) < N)))
    = update (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits c ((x + c) % N)))
                controlIdx true) flagIdx (decide ((x + c) < N))

Intermediate: applying step 5 of controlled pipeline (controlled sub c) with controlBit = true takes target from `(x+c) % N` to `subConstPow2WideSpec bits c ((x+c) % N)`.

theoremcontrolled_step6_true

theorem controlled_step6_true
    (bits N c x controlIdx flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (Gate.CCX controlIdx (target_idx bits) flagIdx)
      (update (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits c ((x + c) % N)))
                  controlIdx true) flagIdx (decide ((x + c) < N)))
    = update (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits c ((x + c) % N)))
                controlIdx true) flagIdx true

Intermediate: applying step 6 of controlled pipeline (second CCX flag-copy) with controlBit = true sets flagIdx to `TRUE` (the XOR of the comparison flag and its complement).

theoremcontrolled_step7_true

theorem controlled_step7_true
    (bits c x controlIdx flagIdx : Nat) (y : Nat)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (Gate.CX controlIdx flagIdx)
      (update (update (adder_input_F (bits + 1) 0 y) controlIdx true) flagIdx true)
    = update (update (adder_input_F (bits + 1) 0 y) controlIdx true) flagIdx false

Intermediate: applying step 7 of controlled pipeline (controlled X flipping flagIdx) takes flagIdx from `TRUE` to `FALSE`.

theoremcontrolled_step8_true

theorem controlled_step8_true
    (bits N c x controlIdx flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (conditionalAddConstGate (bits + 1) c controlIdx)
      (update (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits c ((x + c) % N)))
                  controlIdx true) flagIdx false)
    = update (update (adder_input_F (bits + 1) 0 ((x + c) % N)) controlIdx true) flagIdx false

Intermediate: applying step 8 of controlled pipeline (final controlled add c) takes target from `subConstPow2WideSpec bits c ((x + c) % N)` to `(x + c) % N` via algebraic cancellation.

theoremcontrolledModAddConstGate_correct_true

theorem controlledModAddConstGate_correct_true
    (bits N c x : Nat) (controlIdx flagIdx : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (controlledModAddConstGate bits N c controlIdx flagIdx)
      (update (adder_input_F (bits + 1) 0 x) controlIdx true)
    = update (adder_input_F (bits + 1) 0 ((x + c) % N)) controlIdx true

*Tick 6p HEADLINE — `controlBit = true` branch**. When the control bit is `true`, the full 8-step pipeline produces target = `(x + c) % N` with all workspace restored.

theoremcontrolledModAddConstGate_correct

theorem controlledModAddConstGate_correct
    (bits N c x : Nat) (controlBit : Bool) (controlIdx flagIdx : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (controlledModAddConstGate bits N c controlIdx flagIdx)
      (update (adder_input_F (bits + 1) 0 x) controlIdx controlBit)
    = update (adder_input_F (bits + 1) 0 (if controlBit then (x + c) % N else x))
        controlIdx controlBit

*Tick 6 HEADLINE — full `controlledModAddConstGate_correct`**. For any `controlBit`, the 8-step pipeline produces target = `if controlBit then (x + c) % N else x` with all workspace restored.

theoremmodMultConstGateAux_zero

theorem modMultConstGateAux_zero (bits N a multBits : Nat) :
    modMultConstGateAux bits N a multBits 0 = Gate.I

`modMultConstGateAux ... 0 = Gate.I` by definition.

theoremmodMultConstGate_zero

theorem modMultConstGate_zero (bits N a : Nat) :
    modMultConstGate bits N a 0 = Gate.I

`modMultConstGate ... 0 = Gate.I` (zero-bit multiplier is the identity).

theoremmodMultConstGateAux_succ

theorem modMultConstGateAux_succ (bits N a multBits k : Nat) :
    modMultConstGateAux bits N a multBits (k + 1)
    = Gate.seq
        (modMultConstGateAux bits N a multBits k)
        (controlledModAddConstGate bits N ((a * 2^k) % N)
          (adder_n_qubits (bits + 1) + k)
          (adder_n_qubits (bits + 1) + multBits))

Recursive unfolding: `modMultConstGateAux ... (k+1)` is the seq of the `k`-step and the controlled add at bit `k`.

theoremmodMultConstGateAux_wellTyped

theorem modMultConstGateAux_wellTyped
    (bits N a multBits k : Nat) (hbits : 1 ≤ bits) (hk : k ≤ multBits) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + multBits + 1)
      (modMultConstGateAux bits N a multBits k)

Well-typedness of the auxiliary gate at width `adder_n_qubits (bits+1) + multBits + 1` for any `k ≤ multBits`.

theoremmodMultConstGate_wellTyped

theorem modMultConstGate_wellTyped
    (bits N a multBits : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + multBits + 1)
      (modMultConstGate bits N a multBits)

*Well-typedness of `modMultConstGate`.** The full multiplier gate is well-typed at width `adder_n_qubits (bits+1) + multBits + 1` (adder block + `multBits` multiplier qubits + 1 flag qubit).

theoremmodMultConstGateAux_correct_zero

theorem modMultConstGateAux_correct_zero
    (bits N a multBits : Nat) (f : Nat → Bool) :
    Gate.applyNat (modMultConstGateAux bits N a multBits 0) f = f

Base case: the zero-step multiplier auxiliary gate is identity.

theoremmodMultConstGate_correct_zero

theorem modMultConstGate_correct_zero
    (bits N a : Nat) (f : Nat → Bool) :
    Gate.applyNat (modMultConstGate bits N a 0) f = f

Special case at `multBits = 0`: the full multiplier gate is identity (no multiplier bits to control).

theoremmodMultConstGateAux_apply_succ

theorem modMultConstGateAux_apply_succ
    (bits N a multBits k : Nat) (f : Nat → Bool) :
    Gate.applyNat (modMultConstGateAux bits N a multBits (k + 1)) f
    = Gate.applyNat
        (controlledModAddConstGate bits N ((a * 2^k) % N)
          (adder_n_qubits (bits + 1) + k)
          (adder_n_qubits (bits + 1) + multBits))
        (Gate.applyNat (modMultConstGateAux bits N a multBits k) f)

State-level unfolding for the recursive step.

theoremcontrolledModAddConstGate_commute_update_outer

theorem controlledModAddConstGate_commute_update_outer
    (bits N c controlIdx flagIdx p : Nat) (v : Bool) (hbits : 1 ≤ bits)
    (hp_dim : adder_n_qubits (bits + 1) ≤ p)
    (h_p_ne_ctrl : p ≠ controlIdx) (h_p_ne_flag : p ≠ flagIdx) :
    ∀ (f : Nat → Bool),
      Gate.applyNat (controlledModAddConstGate bits N c controlIdx flagIdx)
          (update f p v)
      = update (Gate.applyNat (controlledModAddConstGate bits N c controlIdx flagIdx) f)
          p v

*Commute lemma for `controlledModAddConstGate`.** The gate commutes with an `update _ p v` when `p` is outside the gate's read/write set: `p ≥ adder_n_qubits (bits+1)` (above the adder block), `p ≠ controlIdx`, and `p ≠ flagIdx`. This is the key infrastructure for the inductive multiplier correctness proof, where each iteration's gate must commute past updates at OTHER multiplier-bit positions.

FormalRV.Arithmetic.ModularAdder.Gidney.Correctness

FormalRV/Arithmetic/ModularAdder/Gidney/Correctness.lean

FormalRV.Arithmetic.ModularAdder.Gidney.Correctness ─────────────────────────────────────────────────── THE semantic-correctness theorems for the Gidney-based modular adder `(x + c) mod N`. Surfaced here as thin wrappers; the heavy proofs live in the supporting files (`ForwardFaithfulness.lean`, `ControlledPipeline.lean`, `PowerOfTwoCase.lean`). Headlines: • `modAddConst_correct` — the uncontrolled `(x + c) mod N` gate is WellTyped, decodes the target register to `(x+c) mod N`, and restores the read / carry registers and the comparison flag. • `controlledModAddConst_correct` — the controlled version writes `(x+c) mod N` to the target iff the control bit is set (else leaves `x`), with all workspace restored. Where to look next: • Definition : `Gidney/Def.lean` • Resource (qubit budget) : `Gidney/Resource.lean`

theoremmodAddConst_correct

theorem modAddConst_correct
    (bits N c x : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2 ^ bits) (hx : x < N) (hc_pos : 0 < c) (hc : c < N) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + 1) (modAddConstGate bits N c)
    ∧ gidney_target_val bits
        (Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x))
      = (x + c) % N
    ∧ (∀ i, i < bits + 1 →
        Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x)
          (read_idx i) = false)
    ∧ (∀ i, i < bits + 1 →
        Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x)

*Gidney modular adder — correctness (THE headline).** For `1 ≤ bits`, `0 < N ≤ 2^bits`, `x < N`, and `0 < c < N`, the gate `modAddConstGate bits N c` applied to the clean input `adder_input_F (bits+1) 0 x` (data `x` in the target register, read register 0, carries 0, flag 0): 1. is `WellTyped` on `adder_n_qubits (bits+1) + 1` qubits; 2. decodes the target register to `(x + c) mod N`; 3. restores the read register to 0; 4. restores the carry register to 0; 5. restores the comparison flag to 0.

theoremcontrolledModAddConst_correct

theorem controlledModAddConst_correct
    (bits N c x : Nat) (controlBit : Bool) (controlIdx flagIdx : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits)
    (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (controlledModAddConstGate bits N c controlIdx flagIdx)
      (update (adder_input_F (bits + 1) 0 x) controlIdx controlBit)
    = update (adder_input_F (bits + 1) 0 (if controlBit then (x + c) % N else x))
        controlIdx controlBit

*Gidney controlled modular adder — correctness.** For any `controlBit` and out-of-band `controlIdx < flagIdx`, `controlledModAddConstGate bits N c controlIdx flagIdx` writes `(x+c) mod N` to the target register iff the control bit is set (otherwise leaves `x`), with all workspace (read / carry / flag, control bit) restored.

FormalRV.Arithmetic.ModularAdder.Gidney.Def

FormalRV/Arithmetic/ModularAdder/Gidney/Def.lean

FormalRV.Arithmetic.ModularAdder.Gidney.Def ─────────────────────────────────────────── THE definitions of the Gidney-based modular adder `(x + c) mod N`, as concrete `Gate`-IR data built on the patched Gidney ripple-carry adder (`gidney_adder_full_faithful_no_measurement_patched`). **Definitions only — no proofs.** THE adder is `modAddConstGate bits N c` (and its controlled form `controlledModAddConstGate`), assembled from the patched Gidney adder by the textbook modular-reduction pipeline: addConstGate c := load c into read reg ; patched adder ; unload (target += c) subConstGate N := addConstGate (2^bits − N) (two's-complement) conditionalAddConstGate := same, masked by a flag (no CCCX) modAddConstGate N c := add c ; sub N ; copy high bit → flag ; conditional add-back N ; uncompute flag (target = (x+c) mod N) A standalone modular MULTIPLIER (`modMultConstGate`, `modMultInPlace`) is also defined here by repeating the controlled modular adder over multiplier bits. Where to look next: • Correctness : `Gidney/Correctness.lean` • Resource : `Gidney/Resource.lean` • Supporting proofs : `Gidney/PowerOfTwoCase.lean`, `Gidney/ForwardFaithfulness.lean`, `Gidney/ControlledPipeline.lean`, `Gidney/SwapSemantics.lean`. NOTE: this Gidney modular adder is fully verified but STANDALONE; the verified Shor multiplier uses the Cuccaro/SQIR family (`ModularAdder.Cuccaro`).

defmodAddConstSpec

def modAddConstSpec (N c x : Nat) : Nat

Spec for modular addition by a constant under arbitrary modulus `N`.

defaddConstPow2Spec

def addConstPow2Spec (bits c x : Nat) : Nat

Spec specialized to `N = 2^bits` (the case the patched Gidney adder implements natively without any extra circuitry).

defsubConstPow2Spec

def subConstPow2Spec (bits N x : Nat) : Nat

Wraparound spec for subtraction by `N` modulo `2^bits`.

defsubConstPow2WideSpec

def subConstPow2WideSpec (bits N x : Nat) : Nat

Wraparound-subtraction spec at widened bit-count `bits + 1`.

defprepareMaskedConstRead

def prepareMaskedConstRead : Nat → Nat → Nat → Gate
  | 0,     _, _       => Gate.I
  | k + 1, N, flagIdx =>
      Gate.seq (prepareMaskedConstRead k N flagIdx)
               (if N.testBit k then Gate.CX flagIdx (read_idx k) else Gate.I)

Prepare the read register by XORing each `read_idx k` (for `k < bits`) with `flag ∧ N.testBit k`, where the flag bit lives at `flagIdx`. Implemented as a CX cascade conditioned on the bit pattern of `N`.

defconditionalAddConstGate

def conditionalAddConstGate (bits N flagIdx : Nat) : Gate

Conditional add-back gate: prepare the read register with the masked constant `flag ∧ N`, run the patched Gidney adder, un-prepare the read register. The result computes `target := (x + (if flag then N else 0)) mod 2^bits` without using any controlled-CCX (CCCX) gate.

defprepareConstRead

def prepareConstRead : Nat → Nat → Gate
  | 0,     _ => Gate.I
  | k + 1, c => Gate.seq (prepareConstRead k c)
                  (if c.testBit k then Gate.X (read_idx k) else Gate.I)

Unconditionally prepare `read_idx k := c.testBit k` for `k < bits` by applying `X (read_idx k)` whenever `c.testBit k = true`. When applied to a zero read register, sets it to the bits of `c`; applied again (involutive), it clears the read register back to zero.

defaddConstGate

def addConstGate (bits c : Nat) : Gate

Composable constant-add gate: prepare read with `c`, run the patched Gidney adder, unprepare read. Takes a clean `adder_input_F bits 0 x` and produces target = `(x + c) mod 2^bits`, with read register restored to zero and carries cleared.

defsubConstGate

def subConstGate (bits N : Nat) : Gate

Composable constant-sub gate, expressed as wraparound addition of `2^bits - N`. This implements `(x + (2^bits - N)) mod 2^bits`, which equals `(x - N) mod 2^bits` over the two's-complement view.

defmodAddConstArithmeticSpec

def modAddConstArithmeticSpec (bits N c x : Nat) : Nat

Arithmetic-level spec for the widened modular-addition pipeline at width `bits + 1`. Composes: subtract-`N` after add-`c`, conditionally add back `N` when the comparison flag indicates underflow.

defcopyTargetHighBitToFlag

def copyTargetHighBitToFlag (bits flagIdx : Nat) : Gate

Flag-copy gate: a single CX from `target_idx bits` into `flagIdx`.

defmodAddConstGate_dirtyFlag

def modAddConstGate_dirtyFlag (bits N c flagIdx : Nat) : Gate

The full DIRTY-FLAG modular add-constant gate. Pipeline: `addConstGate (bits+1) c ; subConstGate (bits+1) N ; copyTargetHighBitToFlag bits flagIdx ; conditionalAddConstGate (bits+1) N flagIdx`. The result has the low `bits` target bits encoding `(x + c) mod N`, but the flag bit at `flagIdx` is LEFT DIRTY at `decide ((x + c) < N)`. Flag uncomputation is handled in a later tick.

defflagUncomputeGate

def flagUncomputeGate (bits c flagIdx : Nat) : Gate

Reversible flag-uncompute gate: `subConstGate c ; CX (target_idx bits) flagIdx ; X flagIdx ; addConstGate c`. Restores `flagIdx` to false while leaving the target, read, and carry registers unchanged.

defmodAddConstGate

def modAddConstGate (bits N c : Nat) : Gate

*Clean modular add-constant gate**. Composition of the dirty-flag pipeline with the flag-uncompute step. The internal flag bit lives at `adder_n_qubits (bits + 1)`.

defcontrolledModAddConstGate

def controlledModAddConstGate (bits N c controlIdx flagIdx : Nat) : Gate

Controlled modular add-constant gate. Eight-step pipeline: controlled add `c` ; controlled sub `N` ; controlled flag-copy ; flag-controlled add-back `N` ; controlled sub `c` ; controlled flag-copy ; controlled X flag ; controlled add `c`.

defmodMultConstGateAux

def modMultConstGateAux (bits N a multBits : Nat) : Nat → Gate
  | 0 => Gate.I
  | k+1 =>
    Gate.seq
      (modMultConstGateAux bits N a multBits k)
      (controlledModAddConstGate bits N ((a * 2^k) % N)
        (adder_n_qubits (bits + 1) + k)
        (adder_n_qubits (bits + 1) + multBits))

Auxiliary recursive gate for the modular multiplier: applies controlled modular-add of `(a * 2^i) % N` for bits `i = 0, 1, ..., k-1`. The parameter `multBits` is the TOTAL multiplier width (used to position the shared flag qubit); `k` is the recursion index running from 0 up to `multBits`.

defmodMultConstGate

def modMultConstGate (bits N a multBits : Nat) : Gate

Modular multiplier gate: applies `controlledModAddConstGate` for each bit of the multiplier register, accumulating `(a * m) % N` into the adder's target register, where `m` is the natural-number value of the multiplier register.

defmult_input_F_aux

def mult_input_F_aux (bits multBits m : Nat) : Nat → (Nat → Bool) → (Nat → Bool)
  | 0, f => f
  | i+1, f =>
    update (mult_input_F_aux bits multBits m i f)
           (adder_n_qubits (bits + 1) + i) (Nat.testBit m i)

Auxiliary recursive helper for the multiplier-encoded input: starting from `f`, applies an `update _ (adder_n_qubits (bits+1) + j) (Nat.testBit m j)` for each `j = 0, 1, ..., i-1`, in order. The last update written is at `j = i - 1`.

defmult_input_F

def mult_input_F (bits multBits x m : Nat) : Nat → Bool

*Multiplier-encoded input.** Starts from `adder_input_F (bits+1) 0 x` (which puts value `x` in the adder's target register and 0 elsewhere within the adder block; `false` outside), then fills the multiplier qubits at positions `adder_n_qubits (bits+1) + j` (for `j = 0, ..., multBits - 1`) with the bits of `m`.

defmult_state_init

def mult_state_init (bits multBits x : Nat) : Nat → Bool

Initial state for the multiplier: the multiplier register holds `x`, the adder block and flag are zeroed.

deff_modmult_step_gate

def f_modmult_step_gate (bits N a multBits i : Nat) : Gate

The `i`-th step of the QPE multiplication cascade: multiplication by the constant `a^(2^i) mod N` applied to the multiplier-encoded state.

deff_modmult_gate_family

def f_modmult_gate_family (bits N a multBits : Nat) : Nat → Gate

Modular multiplication gate family indexed by QPE iterate.

defqubit_swap

def qubit_swap (a b : Nat) : Gate

Two-qubit SWAP: exchanges the values at qubits `a` and `b` via the standard three-CNOT decomposition.

defregister_swap_aux

def register_swap_aux (offsetA offsetB : Nat) : Nat → Gate
  | 0 => Gate.I
  | k+1 => Gate.seq (register_swap_aux offsetA offsetB k)
                    (qubit_swap (offsetA + k) (offsetB + k))

Auxiliary recursive register-swap helper. At iteration count `n`, applies pairwise `qubit_swap (offsetA + k) (offsetB + k)` for `k = 0, 1, ..., n - 1`.

defregister_swap

def register_swap (multBits offsetA offsetB : Nat) : Gate

Register-level SWAP: exchanges two `multBits`-wide registers at positions `[offsetA, offsetA + multBits)` and `[offsetB, offsetB + multBits)`.

defmult_target_swap_aux

def mult_target_swap_aux (bits : Nat) : Nat → Gate
  | 0 => Gate.I
  | k+1 => Gate.seq (mult_target_swap_aux bits k)
                    (qubit_swap (adder_n_qubits (bits + 1) + k) (target_idx k))

Auxiliary recursive multiplier-target SWAP at iteration count `n`: swaps `(adder_n_qubits (bits+1) + k, target_idx k)` for `k = 0, ..., n - 1`.

defmult_target_swap

def mult_target_swap (bits multBits : Nat) : Gate

Multiplier-target SWAP: pairwise exchanges multiplier-register qubits at `adder_n_qubits (bits+1) + k` with adder-target qubits at `target_idx k`, for `k = 0, ..., multBits - 1`.

defmodMultInPlace

def modMultInPlace (bits N a ainv multBits : Nat) : Gate

*In-place modular multiplier gate.** Three-stage composition: 1. `modMultConstGate bits N a multBits` — OOPmul(a): `|x⟩|0⟩ → |x⟩|a*x mod N⟩`. 2. `mult_target_swap bits multBits` — exchanges multiplier and target registers: `|x⟩|a*x mod N⟩ → |a*x mod N⟩|x⟩`. 3. `modMultConstGate bits N ((N - ainv) % N) multBits` — adds `(N - ainv) * (a*x mod N)` to the target, yielding 0 by `mod_inv_cancel_identity`. Net effect: `|a*x mod N⟩|0⟩`. The multiplier register holds the input `x` initially; after the gate, it holds `(a * x) mod N`, with adder and flag clean. This is exactly the in-place semantics of `MultiplyCircuitProperty`.

defreverse_register_swap_aux

def reverse_register_swap_aux (n offsetA offsetB : Nat) : Nat → Gate
  | 0 => Gate.I
  | k+1 => Gate.seq (reverse_register_swap_aux n offsetA offsetB k)
                    (qubit_swap (offsetA + k) (offsetB + (n - 1 - k)))

Auxiliary recursive reverse-pairing register SWAP at iteration count `k`: at step k, swaps `(offsetA + k, offsetB + (n - 1 - k))`.

defreverse_register_swap

def reverse_register_swap (n offsetA offsetB : Nat) : Gate

Reverse-pairing register SWAP: exchanges positions `[offsetA, offsetA + n)` and `[offsetB, offsetB + n)` with index reversal (position `offsetA + i` swaps with `offsetB + (n - 1 - i)`).

FormalRV.Arithmetic.ModularAdder.Gidney.ForwardFaithfulness

FormalRV/Arithmetic/ModularAdder/Gidney/ForwardFaithfulness.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Gidney.ForwardFaithfulness.AdderAndConstReadFrames

FormalRV/Arithmetic/ModularAdder/Gidney/ForwardFaithfulness/AdderAndConstReadFrames.lean

ForwardFaithfulness — Part1 (re-export shim part; same namespace, opens de-duplicated).

theoremgidney_adder_forward_faithful_full_preserves_above

theorem gidney_adder_forward_faithful_full_preserves_above
    (w : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * w ≤ p) :
    Gate.applyNat (gidney_adder_forward_faithful_full w) f p = f p

`forward_faithful_full w` preserves positions `≥ 3 * w`.

theoremgidney_adder_forward_faithful_full_reverse_patched_preserves_above

theorem gidney_adder_forward_faithful_full_reverse_patched_preserves_above
    (w : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * w ≤ p) :
    Gate.applyNat (gidney_adder_forward_faithful_full_reverse_patched w) f p = f p

`forward_faithful_full_reverse_patched w` preserves positions `≥ 3 * w`.

theoremgidney_final_cx_cascade_preserves_above

theorem gidney_final_cx_cascade_preserves_above
    (w : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * w ≤ p) :
    Gate.applyNat (gidney_final_cx_cascade w) f p = f p

`final_cx_cascade w` preserves positions `≥ 3 * w`.

theoremgidney_adder_full_faithful_no_measurement_patched_preserves_above

theorem gidney_adder_full_faithful_no_measurement_patched_preserves_above
    (w : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * w ≤ p) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched w) f p = f p

*Headline frame lemma**: the full patched Gidney adder of width `w` preserves positions `p ≥ 3 * w`. This is the tight bound: the cascade touches positions up to `carry_idx (w-1) = 3w - 1` for `w ≥ 2`.

theoremprepareConstRead_preserves_above

theorem prepareConstRead_preserves_above
    (bits c : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * bits ≤ p) :
    Gate.applyNat (prepareConstRead bits c) f p = f p

`prepareConstRead bits c` preserves positions `≥ 3 * bits`.

theoremaddConstGate_preserves_above_actual

theorem addConstGate_preserves_above_actual
    (bits c : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * bits ≤ p) :
    Gate.applyNat (addConstGate bits c) f p = f p

*Composable frame**: `addConstGate bits c` preserves positions `≥ 3 * bits`.

theoremsubConstGate_preserves_above_actual

theorem subConstGate_preserves_above_actual
    (bits N : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * bits ≤ p) :
    Gate.applyNat (subConstGate bits N) f p = f p

*Composable frame**: `subConstGate bits N` preserves positions `≥ 3 * bits`.

theoremaddConstGate_preserves_gap_read

theorem addConstGate_preserves_gap_read
    (bits c : Nat) (f : Nat → Bool) :
    Gate.applyNat (addConstGate (bits + 1) c) f (read_idx (bits + 1))
      = f (read_idx (bits + 1))

theoremaddConstGate_preserves_gap_target

theorem addConstGate_preserves_gap_target
    (bits c : Nat) (f : Nat → Bool) :
    Gate.applyNat (addConstGate (bits + 1) c) f (target_idx (bits + 1))
      = f (target_idx (bits + 1))

theoremsubConstGate_preserves_gap_read

theorem subConstGate_preserves_gap_read
    (bits N : Nat) (f : Nat → Bool) :
    Gate.applyNat (subConstGate (bits + 1) N) f (read_idx (bits + 1))
      = f (read_idx (bits + 1))

theoremsubConstGate_preserves_gap_target

theorem subConstGate_preserves_gap_target
    (bits N : Nat) (f : Nat → Bool) :
    Gate.applyNat (subConstGate (bits + 1) N) f (target_idx (bits + 1))
      = f (target_idx (bits + 1))

theoremaddConstGate_modAdd_step1_state_eq

theorem addConstGate_modAdd_step1_state_eq
    (bits N c x : Nat) (hbits : 1 ≤ bits) (hN : N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    Gate.applyNat (addConstGate (bits + 1) c) (adder_input_F (bits + 1) 0 x)
    = adder_input_F (bits + 1) 0 (x + c)

*Strong normal-form for step 1**: `addConstGate (bits + 1) c` applied to the clean input `adder_input_F (bits + 1) 0 x` produces a function extensionally equal to `adder_input_F (bits + 1) 0 (x + c)`. This supersedes the WEAK `_state_normal` form above.

theoremsubConstGate_modAdd_step2_state_eq

theorem subConstGate_modAdd_step2_state_eq
    (bits N s : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hs : s < 2 * N) :
    Gate.applyNat (subConstGate (bits + 1) N) (adder_input_F (bits + 1) 0 s)
    = adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits N s)

*Strong normal-form for step 2**: `subConstGate (bits + 1) N` applied to the clean input `adder_input_F (bits + 1) 0 s` produces a function extensionally equal to `adder_input_F (bits + 1) 0 y` where `y := subConstPow2WideSpec bits N s`.

theoremadder_input_F_at_high

theorem adder_input_F_at_high
    (w a b k : Nat) (hk : 3 * w ≤ k) :
    adder_input_F w a b k = false

Helper: `adder_input_F w a b` is `false` at any position `≥ 3 * w` (all working positions are below `3 * w`, and out-of-range bits of `a` and `b` are zero by the `decide(k/3 < w)` guard).

theoremconditionalAddConstGate_target_bit

theorem conditionalAddConstGate_target_bit
    (bits N flagIdx y i : Nat) (flag : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hy : y < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) (hi : i < bits) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx)
      (update (adder_input_F bits 0 y) flagIdx flag) (target_idx i)
    = (y + (if flag then N else 0)).testBit i

Bit-level conditional add-back: applied to an `update (adder_input_F bits 0 y) flagIdx flag` input (target holds `y`, read/carry zero, flag at `flagIdx ≥ adder_n_qubits bits`), the gate writes `(y + (if flag then N else 0)).testBit i` at `target_idx i` for `i < bits`.

theoremmodAddConstGate_dirtyFlag_target_decode

theorem modAddConstGate_dirtyFlag_target_decode
    (bits N c x flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    gidney_target_val bits
      (Gate.applyNat (modAddConstGate_dirtyFlag bits N c flagIdx)
        (adder_input_F (bits + 1) 0 x))
    = (x + c) % N

*Tick 2 HEADLINE**: the dirty-flag modular add-constant gate decodes its target register (low `bits` bits) to `(x + c) mod N`.

FormalRV.Arithmetic.ModularAdder.Gidney.ForwardFaithfulness.ConditionalAddStateEq

FormalRV/Arithmetic/ModularAdder/Gidney/ForwardFaithfulness/ConditionalAddStateEq.lean

ForwardFaithfulness — Part3 (re-export shim part; same namespace, opens de-duplicated).

theoremconditionalAddConstGate_preserves_above_not_flag

theorem conditionalAddConstGate_preserves_above_not_flag
    (bits N flagIdx : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * bits ≤ p) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx) f p = f p

`conditionalAddConstGate bits N flagIdx` preserves positions `≥ 3 * bits`.

theoremmodAddConstGate_dirtyFlag_preserves_above_not_flag

theorem modAddConstGate_dirtyFlag_preserves_above_not_flag
    (bits N c flagIdx : Nat) (f : Nat → Bool) (p : Nat)
    (hp : 3 * (bits + 1) ≤ p) (h_p_ne_flag : p ≠ flagIdx) :
    Gate.applyNat (modAddConstGate_dirtyFlag bits N c flagIdx) f p = f p

`modAddConstGate_dirtyFlag bits N c flagIdx` preserves positions `≥ 3*(bits + 1)` that are not `flagIdx`.

theoremmodAddConstGate_dirtyFlag_state_eq

theorem modAddConstGate_dirtyFlag_state_eq
    (bits N c x flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    Gate.applyNat (modAddConstGate_dirtyFlag bits N c flagIdx)
      (adder_input_F (bits + 1) 0 x)
    = update (adder_input_F (bits + 1) 0 ((x + c) % N)) flagIdx (decide ((x + c) < N))

*Strong state-eq for `modAddConstGate_dirtyFlag`**. The output is extensionally equal to the canonical "input form" with target encoding `(x + c) mod N` and the flag bit at `flagIdx` holding `decide ((x+c)<N)`.

theoremmodAddConstGate_state_eq

theorem modAddConstGate_state_eq
    (bits N c x : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc_pos : 0 < c) (hc : c < N) :
    Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x)
    = adder_input_F (bits + 1) 0 ((x + c) % N)

*Tick 5 HEADLINE — clean modular add-constant**. Applied to `adder_input_F (bits + 1) 0 x`, the clean modular adder produces `adder_input_F (bits + 1) 0 ((x + c) mod N)` — full state-eq with target encoding the modular sum and ALL workspace (read, carry, internal flag) restored.

theoremmodAddConstGate_clean

theorem modAddConstGate_clean
    (bits N c x : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc_pos : 0 < c) (hc : c < N) :
    Gate.WellTyped (adder_n_qubits (bits + 1) + 1) (modAddConstGate bits N c)
    ∧ gidney_target_val bits
        (Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x))
      = (x + c) % N
    ∧ (∀ i, i < bits + 1 →
        Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x)
          (read_idx i) = false)
    ∧ (∀ i, i < bits + 1 →
        Gate.applyNat (modAddConstGate bits N c) (adder_input_F (bits + 1) 0 x)

*Bundled clean theorem** — WellTyped, decoded target, read / carry / flag all restored. Derives from `modAddConstGate_state_eq`.

theoremcontrolledModAddConstGate_wellTyped

theorem controlledModAddConstGate_wellTyped
    (bits N c controlIdx flagIdx : Nat) (hbits : 1 ≤ bits)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.WellTyped (flagIdx + 1) (controlledModAddConstGate bits N c controlIdx flagIdx)

`controlledModAddConstGate` is `WellTyped` at `flagIdx + 1` when `controlIdx` and `flagIdx` are both out-of-band, with `controlIdx < flagIdx`.

theoremconditionalAddConstGate_state_eq

theorem conditionalAddConstGate_state_eq
    (bits N flagIdx x : Nat) (flag : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx)
      (update (adder_input_F bits 0 x) flagIdx flag)
    = update (adder_input_F bits 0 ((x + (if flag then N else 0)) % 2^bits)) flagIdx flag

*`conditionalAddConstGate` full state-eq.** Applied to `update (adder_input_F bits 0 x) flagIdx flag`, the gate produces `update (adder_input_F bits 0 ((x + (if flag then N else 0)) % 2^bits)) flagIdx flag` — i.e. flag preserved, target = `(x + flag·N) mod 2^bits`, read / carry restored.

FormalRV.Arithmetic.ModularAdder.Gidney.ForwardFaithfulness.ControlledModAddCommute

FormalRV/Arithmetic/ModularAdder/Gidney/ForwardFaithfulness/ControlledModAddCommute.lean

ForwardFaithfulness — Part4 (re-export shim part; same namespace, opens de-duplicated).

theoremprepareMaskedConstRead_commute_update_outer

theorem prepareMaskedConstRead_commute_update_outer
    (bits N flagIdx p : Nat) (v : Bool)
    (h_p_ne_flag : p ≠ flagIdx)
    (h_p_ne_read : ∀ i, i < bits → p ≠ read_idx i) :
    ∀ (f : Nat → Bool),
      Gate.applyNat (prepareMaskedConstRead bits N flagIdx) (update f p v)
      = update (Gate.applyNat (prepareMaskedConstRead bits N flagIdx) f) p v

`prepareMaskedConstRead bits N flagIdx` commutes with `update _ p v` when `p` is outside the gate's read/write set: `p ≠ flagIdx` (not read as control) and `p ≠ read_idx k` for any `k < bits` (not written).

theoremconditionalAddConstGate_commute_update_outer

theorem conditionalAddConstGate_commute_update_outer
    (bits N flagIdx p : Nat) (v : Bool)
    (hbits : 2 ≤ bits)
    (hp_dim : adder_n_qubits bits ≤ p)
    (h_p_ne_flag : p ≠ flagIdx) :
    ∀ (f : Nat → Bool),
      Gate.applyNat (conditionalAddConstGate bits N flagIdx) (update f p v)
      = update (Gate.applyNat (conditionalAddConstGate bits N flagIdx) f) p v

`conditionalAddConstGate bits N flagIdx` commutes with `update _ p v` when `p` is outside the gate's actual support: `p ≥ adder_n_qubits bits` and `p ≠ flagIdx`. Composes prep + adder + prep commute lemmas.

theoremconditionalAddConstGate_state_eq_with_outer

theorem conditionalAddConstGate_state_eq_with_outer
    (bits N flagIdx outerIdx x : Nat) (flag outerVal : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx)
    (hOuter : adder_n_qubits bits ≤ outerIdx) (hOuter_ne_flag : outerIdx ≠ flagIdx) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx)
      (update (update (adder_input_F bits 0 x) flagIdx flag) outerIdx outerVal)
    = update
        (update (adder_input_F bits 0 ((x + (if flag then N else 0)) % 2^bits))
          flagIdx flag) outerIdx outerVal

State-eq for `conditionalAddConstGate` lifted past an outer update at `outerIdx`. This is the form that lets us chain through `controlledModAddConstGate`'s 8 steps where each sub-state has both `flagIdx` and `controlIdx` updates active simultaneously.

theoremcollapse_flag_false_update_at_high

theorem collapse_flag_false_update_at_high
    (n flagIdx outerIdx x : Nat) (outerVal : Bool)
    (hflag_high : 3 * n ≤ flagIdx) :
    update (update (adder_input_F n 0 x) flagIdx false) outerIdx outerVal
    = update (adder_input_F n 0 x) outerIdx outerVal

Helper: an `update` at a high `flagIdx` to `false` is idempotent relative to `adder_input_F n 0 x` (since `adder_input_F` is already `false` at any position `≥ 3 * n`). Used in the `controlBit = false` chain proof to insert/remove redundant flagIdx updates so state forms match `conditionalAddConstGate_state_eq_with_outer`'s expected shape.

theoremconditionalAddConstGate_identity_when_flag_false

theorem conditionalAddConstGate_identity_when_flag_false
    (bits N flagIdx x : Nat) (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx)
      (update (adder_input_F bits 0 x) flagIdx false)
    = update (adder_input_F bits 0 x) flagIdx false

Corollary of `conditionalAddConstGate_state_eq` for `flag = false`: the gate is identity on the canonical input form.

theoremconditionalAddConstGate_identity_when_flag_false_with_outer

theorem conditionalAddConstGate_identity_when_flag_false_with_outer
    (bits N flagIdx outerIdx x : Nat) (outerVal : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx)
    (hOuter : adder_n_qubits bits ≤ outerIdx) (hOuter_ne_flag : outerIdx ≠ flagIdx) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx)
      (update (update (adder_input_F bits 0 x) flagIdx false) outerIdx outerVal)
    = update (update (adder_input_F bits 0 x) flagIdx false) outerIdx outerVal

Corollary of `conditionalAddConstGate_state_eq_with_outer` for `flag = false`: the gate is identity on the *double-update* form, useful when chaining through `controlledModAddConstGate`'s steps.

theoremcontrolledModAddConstGate_correct_false

theorem controlledModAddConstGate_correct_false
    (bits N c x : Nat) (controlIdx flagIdx : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc_pos : 0 < c) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (controlledModAddConstGate bits N c controlIdx flagIdx)
      (update (adder_input_F (bits + 1) 0 x) controlIdx false)
    = update (adder_input_F (bits + 1) 0 x) controlIdx false

*Tick 6g HEADLINE — `controlBit = false` branch of `controlledModAddConstGate_correct`**. When the control bit is `false`, the entire 8-step controlled modular-add pipeline is identity: target / read / carry / flag all unchanged. Proved by chaining 8 identity rewrites.

theoremcontrolled_step1_true

theorem controlled_step1_true
    (bits c x controlIdx : Nat) (hbits : 1 ≤ bits)
    (hc_succ : c < 2^(bits+1)) (hxc_lt : x + c < 2^(bits+1))
    (hx_succ : x < 2^(bits+1))
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx) :
    Gate.applyNat (conditionalAddConstGate (bits + 1) c controlIdx)
      (update (adder_input_F (bits + 1) 0 x) controlIdx true)
    = update (adder_input_F (bits + 1) 0 (x + c)) controlIdx true

Intermediate: applying step 1 of controlled pipeline (controlled add c) with controlBit = true gives target = `x + c`.

theoremcontrolled_step2_true

theorem controlled_step2_true
    (bits N c x controlIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx) :
    Gate.applyNat (conditionalAddConstGate (bits + 1) (2^(bits+1) - N) controlIdx)
      (update (adder_input_F (bits + 1) 0 (x + c)) controlIdx true)
    = update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits N (x + c))) controlIdx true

Intermediate: applying step 2 of controlled pipeline (controlled sub N) with controlBit = true takes target from `x + c` to `subConstPow2WideSpec bits N (x+c)`.

theoremcontrolled_step3_true

theorem controlled_step3_true
    (bits N c x controlIdx flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (Gate.CCX controlIdx (target_idx bits) flagIdx)
      (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits N (x + c))) controlIdx true)
    = update (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits N (x + c)))
                controlIdx true) flagIdx (decide ((x + c) < N))

Intermediate: applying step 3 of controlled pipeline (CCX flag-copy) with controlBit = true puts `decide ((x+c) < N)` into `flagIdx`.

theoremcontrolled_step4_true

theorem controlled_step4_true
    (bits N c x controlIdx flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.applyNat (conditionalAddConstGate (bits + 1) N flagIdx)
      (update (update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits N (x + c)))
                  controlIdx true) flagIdx (decide ((x + c) < N)))
    = update (update (adder_input_F (bits + 1) 0 ((x + c) % N))
                controlIdx true) flagIdx (decide ((x + c) < N))

Intermediate: applying step 4 of controlled pipeline (flag-controlled add-back of N) takes target from `subConstPow2WideSpec bits N (x+c)` to `(x + c) % N` when flag holds `decide ((x+c) < N)`.

FormalRV.Arithmetic.ModularAdder.Gidney.ForwardFaithfulness.DirtyFlagWorkspaceAndUncompute

FormalRV/Arithmetic/ModularAdder/Gidney/ForwardFaithfulness/DirtyFlagWorkspaceAndUncompute.lean

ForwardFaithfulness — Part2 (re-export shim part; same namespace, opens de-duplicated).

theoremmodAddConstGate_dirtyFlag_after_three_steps_eq

theorem modAddConstGate_dirtyFlag_after_three_steps_eq
    (bits N c x flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    Gate.applyNat (Gate.seq (addConstGate (bits + 1) c)
      (Gate.seq (subConstGate (bits + 1) N)
        (copyTargetHighBitToFlag bits flagIdx)))
      (adder_input_F (bits + 1) 0 x)
    = update (adder_input_F (bits + 1) 0 (subConstPow2WideSpec bits N (x + c))) flagIdx
        (decide ((x + c) < N))

Intermediate: the state after the first three steps (add ; sub ; copy-flag) of `modAddConstGate_dirtyFlag` is extensionally equal to `update (adder_input_F (bits+1) 0 y) flagIdx (decide ((x+c)<N))`, where `y := subConstPow2WideSpec bits N (x+c)`.

theoremmodAddConstGate_dirtyFlag_workspace

theorem modAddConstGate_dirtyFlag_workspace
    (bits N c x flagIdx : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    Gate.WellTyped (flagIdx + 1) (modAddConstGate_dirtyFlag bits N c flagIdx)
    ∧ (∀ i, i < bits + 1 →
        Gate.applyNat (modAddConstGate_dirtyFlag bits N c flagIdx)
          (adder_input_F (bits + 1) 0 x) (read_idx i) = false)
    ∧ (∀ i, i < bits + 1 →
        Gate.applyNat (modAddConstGate_dirtyFlag bits N c flagIdx)
          (adder_input_F (bits + 1) 0 x) (carry_idx i) = false)
    ∧ Gate.applyNat (modAddConstGate_dirtyFlag bits N c flagIdx)

*Tick 3 HEADLINE — dirty-flag workspace theorem**. The `modAddConstGate_dirtyFlag` is WellTyped at the enlarged dimension `flagIdx + 1`, restores the read register to zero, clears the carry register, and places the comparison flag `decide ((x + c) < N)` at `flagIdx`. The flag bit is DIRTY — not restored to false.

theoremaddConstGate_state_eq_general

theorem addConstGate_state_eq_general
    (bits c x : Nat) (hbits : 2 ≤ bits) (hc : c < 2^bits) (hx : x < 2^bits) :
    Gate.applyNat (addConstGate bits c) (adder_input_F bits 0 x)
    = adder_input_F bits 0 ((x + c) % 2^bits)

General state-eq: `addConstGate bits c` applied to a clean input `adder_input_F bits 0 x` produces `adder_input_F bits 0 ((x + c) % 2^bits)`, under just `c < 2^bits` and `x < 2^bits`.

theoremsubConstGate_state_eq_general

theorem subConstGate_state_eq_general
    (bits N x : Nat) (hbits : 2 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat (subConstGate bits N) (adder_input_F bits 0 x)
    = adder_input_F bits 0 (subConstPow2Spec bits N x)

General state-eq for subConstGate. Follows from `addConstGate_state_eq_general` via the definition `subConstGate = addConstGate (2^bits - N)`.

theoremflagUncomputeGate_correct

theorem flagUncomputeGate_correct
    (bits c flagIdx m : Nat) (hbits : 1 ≤ bits) (hc_pos : 0 < c)
    (hc : c < 2^bits) (hm : m < 2^bits)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    Gate.applyNat (flagUncomputeGate bits c flagIdx)
      (update (adder_input_F (bits + 1) 0 m) flagIdx (decide (m ≥ c)))
    = adder_input_F (bits + 1) 0 m

*Tick 4 HEADLINE — flag uncomputation correctness**. Given a state of the form `update (adder_input_F (bits+1) 0 m) flagIdx (decide (m ≥ c))` (target encoding `m < 2^bits`, flag stored at out-of-band `flagIdx`), the flag-uncompute gate restores the state to a clean `adder_input_F (bits+1) 0 m` — i.e., flag becomes false, target / read / carry unchanged.

theoremflagUncomputeGate_wellTyped

theorem flagUncomputeGate_wellTyped
    (bits c flagIdx : Nat) (hbits : 1 ≤ bits) (hc_pos : 0 < c) (hc : c < 2^bits)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    Gate.WellTyped (flagIdx + 1) (flagUncomputeGate bits c flagIdx)

WellTyped at `flagIdx + 1`. All four sub-gates are WellTyped at `adder_n_qubits (bits + 1) ≤ flagIdx + 1`; the CX and X explicitly touch `flagIdx`.

theoremmodAddConstArithmeticSpec_lt_pow_bits

theorem modAddConstArithmeticSpec_lt_pow_bits
    (bits N c x : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N) :
    modAddConstArithmeticSpec bits N c x < 2^bits

Auxiliary: `modAddConstArithmeticSpec bits N c x < 2^bits` under modular hypotheses. Both flag cases produce a value in `[0, N - 1]`, hence `< 2^bits`.

theoremmodAddConstArithmeticSpec_eq_mod

theorem modAddConstArithmeticSpec_eq_mod
    (bits N c x : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N) :
    modAddConstArithmeticSpec bits N c x = (x + c) % N

`modAddConstArithmeticSpec` equals `(x + c) mod N` (the high bit is zero, so the mod-`2^(bits+1)` mask is the value itself).

FormalRV.Arithmetic.ModularAdder.Gidney.PowerOfTwoCase

FormalRV/Arithmetic/ModularAdder/Gidney/PowerOfTwoCase.lean

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.Gidney.PowerOfTwoCase.FrameRestorationAndUnderflowFlag

FormalRV/Arithmetic/ModularAdder/Gidney/PowerOfTwoCase/FrameRestorationAndUnderflowFlag.lean

PowerOfTwoCase — Part2 (re-export shim part; same namespace, opens de-duplicated).

theoremGate.WellTyped.mono

theorem Gate.WellTyped.mono {dim dim' : Nat} {g : Gate}
    (h : Gate.WellTyped dim g) (h_le : dim ≤ dim') :
    Gate.WellTyped dim' g

*WellTyped monotonicity**: `WellTyped` is preserved under dimension enlargement. Generic helper, applies to any `Gate`.

theoremprepareMaskedConstRead_wellTyped

theorem prepareMaskedConstRead_wellTyped
    (bits N flagIdx : Nat) (h_flag : adder_n_qubits bits ≤ flagIdx) :
    Gate.WellTyped (flagIdx + 1) (prepareMaskedConstRead bits N flagIdx)

`prepareMaskedConstRead` is `WellTyped` in dimension `flagIdx + 1` whenever the flag is placed above the adder's working register.

theoremconditionalAddConstGate_read_restored

theorem conditionalAddConstGate_read_restored
    (bits N x flagIdx : Nat) (flag : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    ∀ i, i < bits →
      Gate.applyNat (conditionalAddConstGate bits N flagIdx)
        (update (adder_input_F bits 0 x) flagIdx flag) (read_idx i)
      = false

*Deliverable A — read register restored.** After the full conditional add-back, every in-range read position is back to zero (the read register served only as a scratch space during the underlying adder).

theoremconditionalAddConstGate_carries_cleared

theorem conditionalAddConstGate_carries_cleared
    (bits N x flagIdx : Nat) (flag : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    ∀ i, i < bits →
      Gate.applyNat (conditionalAddConstGate bits N flagIdx)
        (update (adder_input_F bits 0 x) flagIdx flag) (carry_idx i)
      = false

*Deliverable B — carry register cleared.** Every in-range carry position is `false` after the full conditional add-back (carries are fully cleared by the inner patched Gidney adder, and the outer prep cascade touches no carry positions).

theoremconditionalAddConstGate_flag_preserved

theorem conditionalAddConstGate_flag_preserved
    (bits N x flagIdx : Nat) (flag : Bool)
    (hbits : 2 ≤ bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    Gate.applyNat (conditionalAddConstGate bits N flagIdx)
      (update (adder_input_F bits 0 x) flagIdx flag) flagIdx = flag

*Deliverable C — flag preserved.** The flag bit at `flagIdx` survives the full conditional add-back unchanged. Follows from the adder commuting past the flag update (by `WellTyped` framing) and both preps preserving positions outside the read range.

theoremconditionalAddConstGate_wellTyped

theorem conditionalAddConstGate_wellTyped
    (bits N flagIdx : Nat) (hbits : 2 ≤ bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    Gate.WellTyped (flagIdx + 1) (conditionalAddConstGate bits N flagIdx)

*Deliverable D — `WellTyped` at `flagIdx + 1`.** The whole conditional add-back gate is `WellTyped` in the enlarged dimension that includes the out-of-band flag bit.

theoremconditionalAddConstGate_clean

theorem conditionalAddConstGate_clean
    (bits N x flagIdx : Nat) (flag : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (hflagIdx : adder_n_qubits bits ≤ flagIdx) :
    Gate.WellTyped (flagIdx + 1) (conditionalAddConstGate bits N flagIdx)
    ∧ gidney_target_val bits
        (Gate.applyNat (conditionalAddConstGate bits N flagIdx)
          (update (adder_input_F bits 0 x) flagIdx flag))
      = (x + (if flag then N else 0)) % 2^bits
    ∧ (∀ i, i < bits →
        Gate.applyNat (conditionalAddConstGate bits N flagIdx)
          (update (adder_input_F bits 0 x) flagIdx flag) (read_idx i) = false)

*Deliverable E — bundled clean primitive.** The headline characterisation of `conditionalAddConstGate`: WellTyped, correct target decode, read register restored, carries cleared, flag preserved. This is the one theorem downstream consumers should call.

theoremprepareConstRead_preserves_outside

theorem prepareConstRead_preserves_outside
    (bits c : Nat) (f : Nat → Bool) (p : Nat)
    (h : ∀ i, i < bits → p ≠ read_idx i) :
    Gate.applyNat (prepareConstRead bits c) f p = f p

Outside the read register's `[0, bits)` window, `prepareConstRead` is the identity (so target, carry, and any extra ancillas are preserved).

theoremprepareConstRead_at_read_idx

theorem prepareConstRead_at_read_idx
    (bits c : Nat) (f : Nat → Bool) (j : Nat) (hj : j < bits) :
    Gate.applyNat (prepareConstRead bits c) f (read_idx j) =
    xor (f (read_idx j)) (c.testBit j)

At `read_idx j` (for `j < bits`), `prepareConstRead` XORs the value with `c.testBit j`.

theoremprepareConstRead_yields_input_F

theorem prepareConstRead_yields_input_F
    (bits c x : Nat) :
    Gate.applyNat (prepareConstRead bits c) (adder_input_F bits 0 x)
    = adder_input_F bits c x

`prepareConstRead bits c` applied to `adder_input_F bits 0 x` produces exactly `adder_input_F bits c x` — i.e., the read register has been loaded with the bits of `c`.

theoremprepareConstRead_wellTyped

theorem prepareConstRead_wellTyped
    (bits c : Nat) :
    Gate.WellTyped (adder_n_qubits bits) (prepareConstRead bits c)

`prepareConstRead bits c` is WellTyped at the adder's natural dimension `adder_n_qubits bits = 3*bits + 2`.

theoremaddConstGate_clean

theorem addConstGate_clean
    (bits c x : Nat) (hbits : 2 ≤ bits) (hc : c < 2^bits) (hx : x < 2^bits) :
    Gate.WellTyped (adder_n_qubits bits) (addConstGate bits c)
    ∧ gidney_target_val bits
        (Gate.applyNat (addConstGate bits c) (adder_input_F bits 0 x))
      = (x + c) % 2^bits
    ∧ (∀ i, i < bits →
        Gate.applyNat (addConstGate bits c) (adder_input_F bits 0 x) (read_idx i) = false)
    ∧ (∀ i, i < bits →
        Gate.applyNat (addConstGate bits c) (adder_input_F bits 0 x) (carry_idx i) = false)

*Bundled clean primitive** for `addConstGate`. Takes a clean `adder_input_F bits 0 x` and produces: WellTyped at the natural dimension `adder_n_qubits bits`; target decodes to `(x + c) mod 2^bits`; read register restored to zero; carries cleared.

theoremsubConstGate_clean

theorem subConstGate_clean
    (bits N x : Nat) (hbits : 2 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.WellTyped (adder_n_qubits bits) (subConstGate bits N)
    ∧ gidney_target_val bits
        (Gate.applyNat (subConstGate bits N) (adder_input_F bits 0 x))
      = subConstPow2Spec bits N x
    ∧ (∀ i, i < bits →
        Gate.applyNat (subConstGate bits N) (adder_input_F bits 0 x) (read_idx i) = false)
    ∧ (∀ i, i < bits →
        Gate.applyNat (subConstGate bits N) (adder_input_F bits 0 x) (carry_idx i) = false)

*Bundled clean primitive** for `subConstGate`. Follows directly from `addConstGate_clean` with `c = 2^bits - N`.

theoremsubConstPow2WideSpec_high_bit_bounded_sum_of_le

theorem subConstPow2WideSpec_high_bit_bounded_sum_of_le
    (bits N s : Nat) (hN : N ≤ 2^bits) (hle : N ≤ s) (hs : s < 2 * N) :
    (subConstPow2WideSpec bits N s).testBit bits = false

Generalized no-underflow high-bit lemma. When `N ≤ s` and `s < 2*N`, the widened result equals `s - N`, which fits in `bits` bits, so bit `bits` is `false`. Drops the `s < 2^bits` assumption of `subConstPow2WideSpec_high_bit_of_le`.

theoremsubConstPow2WideSpec_high_bit_bounded_sum_of_lt

theorem subConstPow2WideSpec_high_bit_bounded_sum_of_lt
    (bits N s : Nat) (hN : N ≤ 2^bits) (hlt : s < N) :
    (subConstPow2WideSpec bits N s).testBit bits = true

Generalized underflow high-bit lemma for `s < N` and `N ≤ 2^bits`. Identical to `subConstPow2WideSpec_high_bit_of_lt`, restated here as a named entry point for the post-add-step comparison flag.

theoremsubConstPow2WideSpec_high_bit_bounded_sum

theorem subConstPow2WideSpec_high_bit_bounded_sum
    (bits N s : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits) (hs : s < 2 * N) :
    (subConstPow2WideSpec bits N s).testBit bits = decide (s < N)

*Generalized main high-bit theorem** for the widened subtraction under `s < 2*N`. After the first add-step of the modular-adder pipeline, the intermediate sum is bounded by `2*N` (not `2^bits`), yet the widened subtraction's high bit still equals `decide (s < N)`.

theorempatched_adder_sub_const_underflow_flag_bounded_sum

theorem patched_adder_sub_const_underflow_flag_bounded_sum
    (bits N s : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hs : s < 2 * N) :
    Gate.applyNat
      (gidney_adder_full_faithful_no_measurement_patched (bits + 1))
      (adder_input_F (bits + 1) (2^(bits + 1) - N) s)
      (target_idx bits)
    = decide (s < N)

*Generalized gate-level underflow flag.** After the first add-step of a modular adder, the intermediate sum `s` may have `s ≥ 2^bits` but always satisfies `s < 2*N`. The widened patched Gidney adder's target bit at position `bits` is exactly `decide (s < N)` under this weaker bound.

FormalRV.Arithmetic.ModularAdder.Gidney.PowerOfTwoCase.PerStepAndCascadeFrames

FormalRV/Arithmetic/ModularAdder/Gidney/PowerOfTwoCase/PerStepAndCascadeFrames.lean

PowerOfTwoCase — Part4 (re-export shim part; same namespace, opens de-duplicated).

theoremgidney_adder_bit_step_faithful_first_preserves_above

theorem gidney_adder_bit_step_faithful_first_preserves_above
    (f : Nat → Bool) (p : Nat) (hp : 5 ≤ p) :
    Gate.applyNat gidney_adder_bit_step_faithful_first f p = f p

theoremgidney_adder_bit_step_faithful_interior_preserves_above

theorem gidney_adder_bit_step_faithful_interior_preserves_above
    (i : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * i + 5 ≤ p) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior i) f p = f p

theoremgidney_adder_bit_step_faithful_last_preserves_above

theorem gidney_adder_bit_step_faithful_last_preserves_above
    (i : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * i + 3 ≤ p) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last i) f p = f p

theoremgidney_adder_bit_step_faithful_first_reverse_preserves_above

theorem gidney_adder_bit_step_faithful_first_reverse_preserves_above
    (f : Nat → Bool) (p : Nat) (hp : 5 ≤ p) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse f p = f p

theoremgidney_adder_bit_step_faithful_interior_reverse_preserves_above

theorem gidney_adder_bit_step_faithful_interior_reverse_preserves_above
    (i : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * i + 5 ≤ p) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse i) f p = f p

theoremgidney_adder_bit_step_faithful_last_reverse_preserves_above

theorem gidney_adder_bit_step_faithful_last_reverse_preserves_above
    (i : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * i + 3 ≤ p) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse i) f p = f p

theoremgidney_adder_bit_step_faithful_first_reverse_patched_preserves_above

theorem gidney_adder_bit_step_faithful_first_reverse_patched_preserves_above
    (f : Nat → Bool) (p : Nat) (hp : 5 ≤ p) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse_patched f p = f p

theoremgidney_adder_bit_step_faithful_interior_reverse_patched_preserves_above

theorem gidney_adder_bit_step_faithful_interior_reverse_patched_preserves_above
    (i : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * i + 5 ≤ p) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse_patched i) f p = f p

theoremgidney_adder_bit_step_faithful_last_reverse_patched_preserves_above

theorem gidney_adder_bit_step_faithful_last_reverse_patched_preserves_above
    (i : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * i + 3 ≤ p) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse_patched i) f p = f p

theoremgidney_adder_forward_with_propagation_preserves_above

theorem gidney_adder_forward_with_propagation_preserves_above
    (k : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * k + 2 ≤ p) :
    Gate.applyNat (gidney_adder_forward_with_propagation k) f p = f p

`forward_with_propagation k` preserves positions `≥ 3 * k + 2`.

theoremgidney_adder_forward_with_propagation_reverse_patched_preserves_above

theorem gidney_adder_forward_with_propagation_reverse_patched_preserves_above
    (k : Nat) (f : Nat → Bool) (p : Nat) (hp : 3 * k + 2 ≤ p) :
    Gate.applyNat (gidney_adder_forward_with_propagation_reverse_patched k) f p = f p

`forward_with_propagation_reverse_patched k` preserves positions `≥ 3 * k + 2`.

FormalRV.Arithmetic.ModularAdder.Gidney.PowerOfTwoCase.PowerOfTwoAdderAndSplitCases

FormalRV/Arithmetic/ModularAdder/Gidney/PowerOfTwoCase/PowerOfTwoAdderAndSplitCases.lean

PowerOfTwoCase — Part1 (re-export shim part; same namespace, opens de-duplicated).

theorempatched_adder_add_const_pow2

theorem patched_adder_add_const_pow2
    (bits c x : Nat) (hbits : 2 ≤ bits) (hc : c < 2^bits) (hx : x < 2^bits) :
    gidney_target_val bits
      (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
        (adder_input_F bits c x))
    = addConstPow2Spec bits c x

*The patched Gidney adder implements `(x + c) mod 2^bits`.** With the constant `c` placed in the read register and the data `x` placed in the target register, applying the patched full faithful no-measurement Gidney adder writes `(x + c) mod 2^bits` into the target register.

theorempatched_adder_add_const_pow2_bundled

theorem patched_adder_add_const_pow2_bundled
    (bits c x : Nat) (hbits : 2 ≤ bits) (hc : c < 2^bits) (hx : x < 2^bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_full_faithful_no_measurement_patched bits)
    ∧ gidney_target_val bits
        (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits c x))
      = addConstPow2Spec bits c x
    ∧ (∀ i, i < bits →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits c x) (read_idx i) = c.testBit i)
    ∧ (∀ i, i < bits →

*Bundled `(x + c) mod 2^bits` primitive.** Combines the power-of-2 modular-addition spec, the patched-adder WellTyped, the read-register preservation (constant `c` survives), and the carry clearing (workspace zeroed) — the single theorem a modular- multiplication layer should call when adding a constant modulo `2^bits`.

theorempatched_adder_sub_const_pow2

theorem patched_adder_sub_const_pow2
    (bits N x : Nat) (hbits : 2 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    gidney_target_val bits
      (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
        (adder_input_F bits (2^bits - N) x))
    = subConstPow2Spec bits N x

*The patched Gidney adder with `read = 2^bits - N` implements the wraparound subtraction**. For `0 < N ≤ 2^bits` and `x < 2^bits`, applying the patched adder to `adder_input_F bits (2^bits - N) x` decodes the target register to `(x + (2^bits - N)) mod 2^bits`.

theoremsubConstPow2Spec_of_le

theorem subConstPow2Spec_of_le
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx : x < 2^bits) (hle : N ≤ x) :
    subConstPow2Spec bits N x = x - N

No-underflow case: `N ≤ x` ⇒ `subConstPow2Spec bits N x = x - N`.

theoremsubConstPow2Spec_of_lt

theorem subConstPow2Spec_of_lt
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx_lt : x < N) :
    subConstPow2Spec bits N x = x + 2^bits - N

Underflow case: `x < N` ⇒ `subConstPow2Spec bits N x = x + 2^bits - N`.

theoremsubConstPow2WideSpec_high_bit_of_le

theorem subConstPow2WideSpec_high_bit_of_le
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx : x < 2^bits) (hle : N ≤ x) :
    (subConstPow2WideSpec bits N x).testBit bits = false

Arithmetic high-bit lemma, no-underflow case: when `N ≤ x` the widened result equals `x - N`, which fits in `bits` bits, so bit `bits` is `false`.

theoremsubConstPow2WideSpec_high_bit_of_lt

theorem subConstPow2WideSpec_high_bit_of_lt
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx_lt : x < N) :
    (subConstPow2WideSpec bits N x).testBit bits = true

Arithmetic high-bit lemma, underflow case: when `x < N ≤ 2^bits` the widened result lies in `[2^bits, 2^(bits+1))`, so bit `bits` is `true`.

theoremsubConstPow2WideSpec_high_bit

theorem subConstPow2WideSpec_high_bit
    (bits N x : Nat) (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    (subConstPow2WideSpec bits N x).testBit bits = decide (x < N)

*Main high-bit theorem**: bit `bits` of the widened-subtraction result is exactly the comparison flag `decide (x < N)`.

theorempatched_adder_sub_const_underflow_flag

theorem patched_adder_sub_const_underflow_flag
    (bits N x : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < 2^bits) :
    Gate.applyNat
      (gidney_adder_full_faithful_no_measurement_patched (bits + 1))
      (adder_input_F (bits + 1) (2^(bits + 1) - N) x)
      (target_idx bits)
    = decide (x < N)

*Gate-level underflow flag theorem** (Deliverable C). Instantiating the patched Gidney adder at width `bits + 1` with `read = 2^(bits + 1) - N`, the target bit at position `bits` is exactly `decide (x < N)`.

theoremtestBit_add_two_pow_below

theorem testBit_add_two_pow_below
    (y i n : Nat) (h : i < n) :
    (y + 2^n).testBit i = y.testBit i

*Helper**: bit `i` of `y + 2^n` equals bit `i` of `y` when `i < n` (adding a power of 2 at position `n` doesn't affect lower bits).

theorempatched_adder_sub_const_low_bits

theorem patched_adder_sub_const_low_bits
    (bits N x i : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hx : x < 2^bits) (hi : i < bits) :
    Gate.applyNat
      (gidney_adder_full_faithful_no_measurement_patched (bits + 1))
      (adder_input_F (bits + 1) (2^(bits + 1) - N) x)
      (target_idx i)
    = (subConstPow2Spec bits N x).testBit i

*Gate-level low-bits theorem** (Deliverable D). At the widened adder, the lower `bits` target positions decode to the bits of `subConstPow2Spec bits N x` — i.e., they hold the wraparound subtraction value (mod `2^bits`) just as the narrow adder would.

theoremprepareMaskedConstRead_preserves_outside

theorem prepareMaskedConstRead_preserves_outside
    (bits N flagIdx : Nat) (f : Nat → Bool) (p : Nat)
    (h : ∀ i, i < bits → p ≠ read_idx i) :
    Gate.applyNat (prepareMaskedConstRead bits N flagIdx) f p = f p

Outside the read register's `[0, bits)` window, `prepareMaskedConstRead` acts as the identity (in particular: target, carry, and `flagIdx` are preserved).

theoremprepareMaskedConstRead_at_read_idx

theorem prepareMaskedConstRead_at_read_idx
    (bits N flagIdx : Nat) (f : Nat → Bool) (j : Nat) (hj : j < bits)
    (h_flag_disj_read : ∀ i, i < bits → flagIdx ≠ read_idx i) :
    Gate.applyNat (prepareMaskedConstRead bits N flagIdx) f (read_idx j) =
    xor (f (read_idx j)) (f flagIdx && N.testBit j)

At `read_idx j` (for `j < bits`), `prepareMaskedConstRead` XORs the existing value with `f flagIdx && N.testBit j` — i.e. it conditionally flips the read bit based on the flag and the constant pattern.

theoremapplyNat_commute_update_above_dim

theorem applyNat_commute_update_above_dim
    (dim : Nat) (g : Gate) (h_wt : Gate.WellTyped dim g)
    (f : Nat → Bool) (p : Nat) (v : Bool) (h_p : dim ≤ p) :
    Gate.applyNat g (update f p v) = update (Gate.applyNat g f) p v

Any `WellTyped dim` gate commutes with `update _ p v` for `p ≥ dim`.

theoremadder_input_F_at_read_idx_eq

theorem adder_input_F_at_read_idx_eq
    (n a b j : Nat) (hj : j < n) :
    adder_input_F n a b (read_idx j) = a.testBit j

theoremadder_input_F_eq_outside_read_in_range

theorem adder_input_F_eq_outside_read_in_range
    (n a b k : Nat) (h : ∀ j, j < n → k ≠ read_idx j) :
    adder_input_F n a b k = adder_input_F n 0 b k

theoremprepareMaskedConstRead_yields_input_F

theorem prepareMaskedConstRead_yields_input_F
    (bits N flagIdx x : Nat) (flag : Bool)
    (h_disj : ∀ j, j < bits → flagIdx ≠ read_idx j) :
    Gate.applyNat (prepareMaskedConstRead bits N flagIdx)
      (update (adder_input_F bits 0 x) flagIdx flag)
    = update (adder_input_F bits (if flag then N else 0) x) flagIdx flag

*Key intermediate theorem.** Applying `prepareMaskedConstRead` to `update (adder_input_F bits 0 x) flagIdx flag` yields `update (adder_input_F bits (if flag then N else 0) x) flagIdx flag` — i.e. the read register has been re-loaded with the **conditionally masked** constant `flag ∧ N`.

theoremconditionalAddConstGate_target_decode

theorem conditionalAddConstGate_target_decode
    (bits N flagIdx x : Nat) (flag : Bool)
    (hbits : 2 ≤ bits) (hN : N < 2^bits) (hx : x < 2^bits)
    (h_flag : adder_n_qubits bits ≤ flagIdx) :
    gidney_target_val bits
      (Gate.applyNat (conditionalAddConstGate bits N flagIdx)
        (update (adder_input_F bits 0 x) flagIdx flag))
    = (x + (if flag then N else 0)) % 2^bits

*Conditional add-back target decode.** Applied to `update (adder_input_F bits 0 x) flagIdx flag` (read register zero, target register `x`, carry register zero, flag at `flagIdx`), the `conditionalAddConstGate` produces target register equal to `(x + (if flag then N else 0)) mod 2^bits`.

FormalRV.Arithmetic.ModularAdder.Gidney.PowerOfTwoCase.WidenedModAddPipeline

FormalRV/Arithmetic/ModularAdder/Gidney/PowerOfTwoCase/WidenedModAddPipeline.lean

PowerOfTwoCase — Part3 (re-export shim part; same namespace, opens de-duplicated).

theoremmodAdd_sum_bound

theorem modAdd_sum_bound
    (bits N x c : Nat) (hN : N ≤ 2^bits) (hx : x < N) (hc : c < N) :
    x + c < 2^(bits + 1)

After widened add, the sum fits in `bits + 1` bits.

theoremmodAdd_sum_lt_twoN

theorem modAdd_sum_lt_twoN
    (N x c : Nat) (hx : x < N) (hc : c < N) :
    x + c < 2 * N

After widened add, the sum is bounded by `2N` (the tighter bound needed by the generalized underflow theorem).

theoremmodAddConstArithmeticSpec_correct

theorem modAddConstArithmeticSpec_correct
    (bits N c x : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    modAddConstArithmeticSpec bits N c x % 2^bits = (x + c) % N

*Widened modular-add pipeline correctness** (arithmetic level). For `0 < N ≤ 2^bits` and `x, c < N`, the low `bits` bits of the widened pipeline result equal `(x + c) mod N`.

theoremmodAddConstArithmeticSpec_low_bit_correct

theorem modAddConstArithmeticSpec_low_bit_correct
    (bits N c x i : Nat) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (hx : x < N) (hc : c < N) (hi : i < bits) :
    (modAddConstArithmeticSpec bits N c x).testBit i
    = ((x + c) % N).testBit i

Bit-level form of `modAddConstArithmeticSpec_correct`: bit `i` of the pipeline result (for `i < bits`) equals bit `i` of `(x + c) % N`.

theoremmodAdd_step1_target_decode

theorem modAdd_step1_target_decode
    (bits N c x : Nat) (hbits : 1 ≤ bits) (hN : N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    gidney_target_val (bits+1)
      (Gate.applyNat (addConstGate (bits+1) c) (adder_input_F (bits+1) 0 x))
    = x + c

*Step 1 — first add**. Applied to a clean `adder_input_F (bits+1) 0 x`, `addConstGate (bits+1) c` decodes its target register to `x + c` (no overflow, since `x + c < 2^(bits+1)`).

theoremmodAdd_step2_flag_at_target_idx_bits

theorem modAdd_step2_flag_at_target_idx_bits
    (bits N s : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hs : s < 2 * N) :
    Gate.applyNat (addConstGate (bits+1) (2^(bits+1) - N))
      (adder_input_F (bits+1) 0 s) (target_idx bits)
    = decide (s < N)

*Step 2 — subtract `N`, observe comparison flag at `target_idx bits`**. Applied to an *idealized* `adder_input_F (bits+1) 0 s` (i.e., target holds `s` and read/carry are zero), `addConstGate (bits+1) (2^(bits+1) - N)` makes the bit at `target_idx bits` equal `decide (s < N)`.

theoremmodAdd_step3_target_decode

theorem modAdd_step3_target_decode
    (bits N flagIdx y : Nat) (flag : Bool)
    (hbits : 1 ≤ bits) (hN : N < 2^(bits+1)) (hy : y < 2^(bits+1))
    (hflagIdx : adder_n_qubits (bits+1) ≤ flagIdx) :
    gidney_target_val (bits+1)
      (Gate.applyNat (conditionalAddConstGate (bits+1) N flagIdx)
        (update (adder_input_F (bits+1) 0 y) flagIdx flag))
    = (y + (if flag then N else 0)) % 2^(bits+1)

*Step 3 — conditional add-back**. Applied to the idealized `update (adder_input_F (bits+1) 0 y) flagIdx flag` (target holds `y`, read/carry zero, flag bit at out-of-band `flagIdx`), the `conditionalAddConstGate (bits+1) N flagIdx` decodes target to `(y + (if flag then N else 0)) mod 2^(bits+1)` — which is exactly the `modAddConstArithmeticSpec` value when `y = subConstPow2WideSpec bits N s` and `flag = decide (s < N)`.

theoremaddConstGate_target_bit

theorem addConstGate_target_bit
    (bits c x i : Nat) (hbits : 2 ≤ bits) (hc : c < 2^bits) (hx : x < 2^bits)
    (hi : i < bits) :
    Gate.applyNat (addConstGate bits c) (adder_input_F bits 0 x) (target_idx i)
    = ((x + c) % 2^bits).testBit i

Bit-level form of `addConstGate_clean`'s target-decode line: applied to `adder_input_F bits 0 x`, the gate's value at `target_idx i` (for `i < bits`) equals bit `i` of `(x + c) % 2^bits`.

theoremaddConstGate_target_bit_no_overflow

theorem addConstGate_target_bit_no_overflow
    (bits N c x i : Nat) (hbits : 1 ≤ bits) (hN : N ≤ 2^bits)
    (hx : x < N) (hc : c < N) (hi : i < bits + 1) :
    Gate.applyNat (addConstGate (bits + 1) c) (adder_input_F (bits + 1) 0 x) (target_idx i)
    = (x + c).testBit i

No-overflow corollary for widened addition. When `x, c < N ≤ 2^bits`, the widened sum `x + c` fits in `bits + 1` bits, so bit `i` of the target is `(x + c).testBit i` (no mod needed).

theoremaddConstGate_modAdd_step1_state_normal

theorem addConstGate_modAdd_step1_state_normal
    (bits N c x : Nat) (hbits : 1 ≤ bits) (hN : N ≤ 2^bits)
    (hx : x < N) (hc : c < N) :
    (∀ i, i < bits + 1 →
      Gate.applyNat (addConstGate (bits + 1) c) (adder_input_F (bits + 1) 0 x) (target_idx i)
      = (x + c).testBit i)
    ∧ (∀ i, i < bits + 1 →
      Gate.applyNat (addConstGate (bits + 1) c) (adder_input_F (bits + 1) 0 x) (read_idx i)
      = false)
    ∧ (∀ i, i < bits + 1 →
      Gate.applyNat (addConstGate (bits + 1) c) (adder_input_F (bits + 1) 0 x) (carry_idx i)
      = false)

After step 1, the read register is zero, carries are cleared, and target bits 0..bits encode `(x + c)` (no overflow under `x, c < N`). This is the WEAK normal-form: it does NOT claim function equality at positions outside the working range.

theoremsubConstGate_modAdd_step2_state_normal

theorem subConstGate_modAdd_step2_state_normal
    (bits N s : Nat) (hbits : 1 ≤ bits) (hN_pos : 0 < N)
    (hN : N ≤ 2^bits) (hs : s < 2 * N) :
    (∀ i, i < bits + 1 →
      Gate.applyNat (subConstGate (bits + 1) N) (adder_input_F (bits + 1) 0 s) (target_idx i)
      = (subConstPow2WideSpec bits N s).testBit i)
    ∧ Gate.applyNat (subConstGate (bits + 1) N) (adder_input_F (bits + 1) 0 s) (target_idx bits)
      = decide (s < N)
    ∧ (∀ i, i < bits + 1 →
      Gate.applyNat (subConstGate (bits + 1) N) (adder_input_F (bits + 1) 0 s) (read_idx i)
      = false)
    ∧ (∀ i, i < bits + 1 →

Weak normal-form for step 2. Same caveat as step 1: working positions only.

theoremcopyTargetHighBitToFlag_correct

theorem copyTargetHighBitToFlag_correct
    (bits flagIdx : Nat) (f : Nat → Bool) (h_init : f flagIdx = false) :
    Gate.applyNat (copyTargetHighBitToFlag bits flagIdx) f flagIdx
    = f (target_idx bits)

Correctness: when the flag bit is initially `false`, the gate sets it to the value of `target_idx bits`.

theoremcopyTargetHighBitToFlag_preserves_working

theorem copyTargetHighBitToFlag_preserves_working
    (bits flagIdx : Nat) (f : Nat → Bool) (p : Nat)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx)
    (h_p_lt : p < adder_n_qubits (bits + 1)) :
    Gate.applyNat (copyTargetHighBitToFlag bits flagIdx) f p = f p

Frame: when `flagIdx` is out-of-band (`flagIdx ≥ adder_n_qubits (bits+1)`), the flag-copy gate preserves all positions strictly inside the working dimension.

theoremcopyTargetHighBitToFlag_wellTyped

theorem copyTargetHighBitToFlag_wellTyped
    (bits flagIdx : Nat)
    (hflagIdx : adder_n_qubits (bits + 1) ≤ flagIdx) :
    Gate.WellTyped (flagIdx + 1) (copyTargetHighBitToFlag bits flagIdx)

WellTyped at the enlarged dimension `flagIdx + 1`.

FormalRV.Arithmetic.ModularAdder.Gidney.Resource

FormalRV/Arithmetic/ModularAdder/Gidney/Resource.lean

FormalRV.Arithmetic.ModularAdder.Gidney.Resource ──────────────────────────────────────────────── THE resource theorem for the Gidney-based modular adder. The natural resource here is the **qubit budget**: the (controlled) modular adder is `WellTyped` on a fixed number of qubits (the widened `bits+1` adder block, plus the out-of-band control/flag qubits). Surfaced as a thin wrapper. T-count note: `modAddConstGate` is exactly **five** invocations of the patched Gidney adder — add `c`, subtract `N`, conditional add-back `N`, then subtract `c` and add `c` to uncompute the flag — wrapped by T-free X/CX prepare cascades. So its T-count is `5 ×` the base adder's; the controlled multiplier applies one such block per multiplier bit. No separate closed-form T-count is proven here (the base adder owns the T-count theorem). Where to look next: • Definition : `Gidney/Def.lean` • Correctness : `Gidney/Correctness.lean`

theoremcontrolledModAddConst_wellTyped

theorem controlledModAddConst_wellTyped
    (bits N c controlIdx flagIdx : Nat) (hbits : 1 ≤ bits)
    (hcontrolIdx : adder_n_qubits (bits + 1) ≤ controlIdx)
    (hflagIdx : controlIdx < flagIdx) :
    Gate.WellTyped (flagIdx + 1) (controlledModAddConstGate bits N c controlIdx flagIdx)

*Gidney controlled modular adder — qubit budget (THE resource headline).** For `1 ≤ bits` and out-of-band `adder_n_qubits (bits+1) ≤ controlIdx < flagIdx`, the controlled modular adder is `WellTyped` on `flagIdx + 1` qubits (the `bits+1`-wide adder block `adder_n_qubits (bits+1) = 3·(bits+1) + 2`, the comparison flag, and the external control).

theoremmodMultConst_wellTyped_at_shor_dim

theorem modMultConst_wellTyped_at_shor_dim
    (bits N a multBits : Nat) (hbits : 1 ≤ bits) :
    Gate.WellTyped (multBits + (adder_n_qubits (bits + 1) + 1))
      (modMultConstGate bits N a multBits)

*Gidney modular multiplier — qubit budget at the Shor dimension.** The repeated-controlled-addition multiplier `modMultConstGate` is `WellTyped` on the Shor register dimension.

FormalRV.Arithmetic.ModularAdder.Gidney.SwapSemantics

FormalRV/Arithmetic/ModularAdder/Gidney/SwapSemantics.lean

### Tick 18 — SWAP semantics on `mult_input_F`.

theoremmult_target_swap_on_mult_input_F

theorem mult_target_swap_on_mult_input_F
    (bits multBits x m : Nat)
    (h_multBits_le : multBits ≤ bits + 1)
    (hx : x < 2^multBits) (hm : m < 2^multBits) :
    Gate.applyNat (mult_target_swap bits multBits)
                  (mult_input_F bits multBits x m)
    = mult_input_F bits multBits m x

*HEADLINE: SWAP exchanges multiplier-register and target-register values on `mult_input_F`.** Applied to `mult_input_F bits multBits x m` (multiplier holds `m`, target holds `x`), the multiplier-target SWAP produces `mult_input_F bits multBits m x` (multiplier holds `x`, target holds `m`). Requires `multBits ≤ bits + 1` (multiplier no wider than adder) and `x, m < 2^multBits` (so they fit in the multBits-wide register and have no high bits leaking into unswapped positions).

theoremmodMultInPlace_correct

theorem modMultInPlace_correct
    (bits N a ainv multBits x : Nat)
    (hbits : 1 ≤ bits) (hN_pos : 0 < N) (hN : N ≤ 2^bits)
    (h_multBits_le : multBits ≤ bits + 1)
    (h_N_le_pow_multBits : N ≤ 2^multBits)
    (ha_pos : 0 < a) (ha_lt : a < N)
    (hainv_pos : 0 < ainv) (hainv_lt : ainv < N)
    (h_inv : a * ainv % N = 1)
    (hx_lt : x < N)
    (h_const_pos_a : ∀ j, j < multBits → 0 < (a * 2^j) % N)
    (h_const_pos_inv : ∀ j, j < multBits → 0 < ((N - ainv) % N * 2^j) % N) :
    Gate.applyNat (modMultInPlace bits N a ainv multBits)

*HEADLINE: `modMultInPlace` is a correct in-place modular multiplier.** Applied to `mult_state_init bits multBits x` (multiplier register holds `x`, adder zeroed), the gate produces `mult_input_F bits multBits 0 ((a * x) % N)` — the multiplier register now holds the result `a*x mod N` and the adder is zeroed. Hypotheses: - Structural: `1 ≤ bits`, `multBits ≤ bits + 1`, `N ≤ 2^multBits`. - Modular: `0 < N`, `N ≤ 2^bits`, `0 < a < N`, `0 < ainv < N`, `a * ainv ≡ 1 (mod N)`. - Input: `x < N`. - Coprimality of each per-bit constant `(a * 2^j) % N` and `((N - ainv) % N * 2^j) % N` is non-zero, used by the `modMultConstGate_correct` invocations.

theoremreverse_register_swap_aux_succ

theorem reverse_register_swap_aux_succ
    (n offsetA offsetB k : Nat) :
    reverse_register_swap_aux n offsetA offsetB (k + 1)
    = Gate.seq (reverse_register_swap_aux n offsetA offsetB k)
               (qubit_swap (offsetA + k) (offsetB + (n - 1 - k)))

Recursion unfolding for `reverse_register_swap_aux`.

theoremreverse_register_swap_aux_wellTyped

theorem reverse_register_swap_aux_wellTyped
    (dim n offsetA offsetB k : Nat) (hdim : 0 < dim)
    (hA : offsetA + n ≤ dim) (hB : offsetB + n ≤ dim)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA)
    (hk : k ≤ n) :
    Gate.WellTyped dim (reverse_register_swap_aux n offsetA offsetB k)

*WellTyped for `reverse_register_swap_aux`.** Disjoint ranges suffice.

theoremreverse_register_swap_wellTyped

theorem reverse_register_swap_wellTyped
    (dim n offsetA offsetB : Nat) (hdim : 0 < dim)
    (hA : offsetA + n ≤ dim) (hB : offsetB + n ≤ dim)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA) :
    Gate.WellTyped dim (reverse_register_swap n offsetA offsetB)

*WellTyped for `reverse_register_swap`.**

theoremreverse_register_swap_aux_at_other

theorem reverse_register_swap_aux_at_other
    (n offsetA offsetB k : Nat) (f : Nat → Bool) (q : Nat)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA)
    (hk : k ≤ n)
    (h_outside : ∀ i, i < k →
      q ≠ offsetA + i ∧ q ≠ offsetB + (n - 1 - i)) :
    Gate.applyNat (reverse_register_swap_aux n offsetA offsetB k) f q = f q

*Correctness at "other" positions** of `reverse_register_swap_aux`. At positions outside both `[offsetA, offsetA + k)` and `[offsetB + n - k, offsetB + n)` (the touched range up to iteration `k`), the gate is identity.

theoremreverse_register_swap_aux_at_A

theorem reverse_register_swap_aux_at_A
    (n offsetA offsetB k : Nat) (f : Nat → Bool) (j : Nat) (hj : j < k)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA)
    (hk : k ≤ n) :
    Gate.applyNat (reverse_register_swap_aux n offsetA offsetB k) f
      (offsetA + j)
    = f (offsetB + (n - 1 - j))

*At A-side position**: at `offsetA + j` (j < k), the gate returns `f (offsetB + (n - 1 - j))`. The reversed-pairing semantics.

theoremreverse_register_swap_aux_at_B

theorem reverse_register_swap_aux_at_B
    (n offsetA offsetB k : Nat) (f : Nat → Bool) (j : Nat) (hj : j < k)
    (h_disjoint : offsetA + n ≤ offsetB ∨ offsetB + n ≤ offsetA)
    (hk : k ≤ n) :
    Gate.applyNat (reverse_register_swap_aux n offsetA offsetB k) f
      (offsetB + (n - 1 - j))
    = f (offsetA + j)

*At B-side position (reversed)**: at `offsetB + (n - 1 - j)` (j < k), the gate returns `f (offsetA + j)`. The dual of `_at_A`.

FormalRV.Arithmetic.ModularAdder.Gidney.TimeCount

FormalRV/Arithmetic/ModularAdder/Gidney/TimeCount.lean

FormalRV.Arithmetic.ModularAdder.Gidney.TimeCount ───────────────────────────────────────────────── THE time-resource (T-count) theorems for the Gidney-style modular adder — closing the audit gap "ModularAdder semantics + space are anchored, but the closed-form TIME counts are missing" (arithmetic-gadget audit, 2026-06-10). Every theorem is ANCHORED: `Gate.tcount` (the independent tree-walker, = `Resource.countT`) applied to THE SAME syntactic objects the correctness theorems in `ForwardFaithfulness.lean` / `Correctness.lean` verify. The missing prerequisite proven here first: the carry-clearing PATCHED Gidney adder has the same `14·bits` T-count as the base adder (the patch is a T-free CX per reverse step). Closed forms (the pipelines run at internal width `bits + 1`; stated at `n + 2 = bits + 1`, i.e. data width `bits = n + 1`, so the patched adder's nontrivial case applies): gidney patched adder (n+2) 14·(n+2) prepareConstRead / prepareMaskedConstRead 0 addConstGate / subConstGate (n+2) 14·(n+2) conditionalAddConstGate (n+2) 14·(n+2) copyTargetHighBitToFlag 0 modAddConstGate_dirtyFlag (n+1) 42·(n+2) flagUncomputeGate (n+1) 28·(n+2) modAddConstGate (n+1) (THE clean gate) 70·(n+2) controlledModAddConstGate (n+1) 70·(n+2) + 14

theoremtcount_gidney_adder_forward_with_propagation_reverse_patched

theorem tcount_gidney_adder_forward_with_propagation_reverse_patched :
    ∀ n, tcount (gidney_adder_forward_with_propagation_reverse_patched n) = 7 * n
  | 0 => rfl
  | 1 => rfl
  | n + 2 =>

theoremtcount_gidney_adder_forward_faithful_full_reverse_patched

theorem tcount_gidney_adder_forward_faithful_full_reverse_patched (n : Nat) :
    tcount (gidney_adder_forward_faithful_full_reverse_patched (n + 2)) = 7 * (n + 2)

theoremtcount_gidney_adder_full_faithful_no_measurement_patched

theorem tcount_gidney_adder_full_faithful_no_measurement_patched (n : Nat) :
    tcount (gidney_adder_full_faithful_no_measurement_patched (n + 2)) = 14 * (n + 2)

*The carry-clean PATCHED Gidney adder has T-count `14·bits`** — the same syntactic object verified by `gidney_adder_correct_full` (the reusable modular-adder primitive).

theoremtcount_addConstGate

theorem tcount_addConstGate (n c : Nat) :
    tcount (addConstGate (n + 2) c) = 14 * (n + 2)

*Add-constant T-count = 14·bits** — the object verified by the `addConstGate` correctness chain.

theoremtcount_subConstGate

theorem tcount_subConstGate (n N : Nat) :
    tcount (subConstGate (n + 2) N) = 14 * (n + 2)

theoremtcount_conditionalAddConstGate

theorem tcount_conditionalAddConstGate (n N flagIdx : Nat) :
    tcount (conditionalAddConstGate (n + 2) N flagIdx) = 14 * (n + 2)

*Conditional (flag-masked) add-constant T-count = 14·bits**.

theoremtcount_modAddConstGate_dirtyFlag

theorem tcount_modAddConstGate_dirtyFlag (n N c flagIdx : Nat) :
    tcount (modAddConstGate_dirtyFlag (n + 1) N c flagIdx) = 42 * (n + 2)

Dirty-flag pipeline (add ; sub ; flag-copy ; conditional add-back): `42·(bits+1)` — the object verified by the dirty-flag correctness chain.

theoremtcount_flagUncomputeGate

theorem tcount_flagUncomputeGate (n c flagIdx : Nat) :
    tcount (flagUncomputeGate (n + 1) c flagIdx) = 28 * (n + 2)

Flag-uncompute (sub ; CX ; X ; add): `28·(bits+1)`.

theoremtcount_modAddConstGate

theorem tcount_modAddConstGate (n N c : Nat) :
    tcount (modAddConstGate (n + 1) N c) = 70 * (n + 2)

*THE clean Gidney modular add-constant gate: T-count = 70·(bits+1)** — the object verified by `modAddConstGate`'s correctness bundle (dirty-flag pipeline + flag uncompute).

theoremtcount_controlledModAddConstGate

theorem tcount_controlledModAddConstGate (n N c controlIdx flagIdx : Nat) :
    tcount (controlledModAddConstGate (n + 1) N c controlIdx flagIdx)
      = 70 * (n + 2) + 14

*The CONTROLLED Gidney modular add-constant gate: T-count = 70·(bits+1) + 14** (five conditional adds + two Toffolis).

example(example)

example : tcount (addConstGate 3 5) = 42

example(example)

example : tcount (modAddConstGate 2 5 3) = 210

FormalRV.Arithmetic.ModularAdder.GidneyModAddReg

FormalRV/Arithmetic/ModularAdder/GidneyModAddReg.lean

FormalRV.Arithmetic.ModularAdder.GidneyModAddReg — the KEYSTONE: a value-correct REGISTER-register MEASURED modular adder `target := (target + read) % p`. ## Why this exists `GidneySubtractFixup.gidneyModAddFixup` adds a *constant* `(x + c) % p`. The faithful windowed multiplier needs to add the *looked-up register value* `T[window]` into the accumulator mod `N` — a register-register modular add. This file builds it, reusing the subtract-fixup machinery: the front adds the read register (`gidneyAdderMeasured`), clears the read word (`mz`), and subtracts `p` (`addConstMeasured (2^W − p)`); the result is the SAME intermediate the constant fixup produces with `c := read`, so stages 2–4 (copy underflow flag, conditional `+p`, measure-clear) reuse `fixup_value`/`conditionalAddP_bundle` verbatim. `gidneyModAddRegMeasured_correct`: on the clean input `adder_input_F (bits+1) b x` (read `= b`, target `= x`, both `< p`), the low `bits` register decodes to `(x + b) % p`, and the top qubit and the fixup-flag ancilla are released to `0`. Every Toffoli is a measured temporary AND. No `sorry`, no `native_decide`, no axioms beyond the prelude.

defgidneyModAddRegMeasured

def gidneyModAddRegMeasured (bits p : Nat) : EGate

*The COST-OPTIMAL register-register measured modular adder.** `target := (target + b) % p` where the read register holds the **biased addend** `2^W − (p − b)` (the lookup writes this, folding the `−p` into the table — free). Then: add read into target (`gidneyAdderMeasured`, which directly lands `x + b − p (mod 2^W)` with the top qubit `Q = target_idx bits` the underflow flag), clear read (`mz`), copy `Q` into the flag, conditionally add `p` back, measure-clear the flag. TWO measured adds — matching Gidney-2025's `2.5n` register modular-add `addCost`; the separate `addConstMeasured (2^W − p)` of the naive 3-add version is eliminated by the bias.

theoremgidneyModAddRegMeasured_correct

theorem gidneyModAddRegMeasured_correct
    (bits p b x : Nat) (hbits : 1 ≤ bits)
    (hp : 0 < p) (hpH : p ≤ 2 ^ bits) (hx : x < p) (hb : b < p) :
    gidney_target_val bits
        (EGate.applyNat (gidneyModAddRegMeasured bits p)
          (adder_input_F (bits + 1) (2 ^ (bits + 1) - (p - b)) x))
      = (x + b) % p
    ∧ EGate.applyNat (gidneyModAddRegMeasured bits p)
        (adder_input_F (bits + 1) (2 ^ (bits + 1) - (p - b)) x) (target_idx bits) = false
    ∧ EGate.applyNat (gidneyModAddRegMeasured bits p)
        (adder_input_F (bits + 1) (2 ^ (bits + 1) - (p - b)) x) (adder_n_qubits (bits + 1)) = false
    ∧ (∀ i, i < bits → EGate.applyNat (gidneyModAddRegMeasured bits p)

*★ KEYSTONE VALUE-CORRECTNESS ★** — on the read register pre-loaded with the BIASED addend `2^(bits+1) − (p − b)` (`b < p`), the gate computes `target := (x + b) % p`, with the top qubit `Q` and the fixup-flag ancilla released to `0`. TWO measured adds.

theoremmzList_boundedBy

theorem mzList_boundedBy (B : Nat) :
    ∀ (L : List Nat), (∀ x ∈ L, x < B) → EGate.boundedBy B (mzList L)
  | [], _ => by simp [mzList, EGate.boundedBy, Gate.boundedBy]
  | q :: qs, h =>

theoremprepareMaskedP_boundedBy

theorem prepareMaskedP_boundedBy (flagIdx B : Nat) (hflag : flagIdx < B) :
    ∀ (W p : Nat), (∀ k, k < W → read_idx k < B) → Gate.boundedBy B (prepareMaskedP flagIdx W p)
  | 0, _, _ => by simp [prepareMaskedP, Gate.boundedBy]
  | W + 1, p, h =>

theoremconditionalAddP_boundedBy

theorem conditionalAddP_boundedBy (n flagIdx B p : Nat) (hflag : flagIdx < B)
    (hadd : adder_n_qubits (n + 2) ≤ B) (hread : ∀ k, k < n + 2 → read_idx k < B) :
    EGate.boundedBy B (conditionalAddP (n + 2) flagIdx p)

theoremgidneyModAddRegMeasured_boundedBy

theorem gidneyModAddRegMeasured_boundedBy (n p : Nat) :
    EGate.boundedBy (adder_n_qubits (n + 2) + 1) (gidneyModAddRegMeasured (n + 1) p)

*The register-register measured modular adder touches only indices `≤ adder_n_qubits (bits+1)`** (the fixup-flag ancilla), so it fixes everything strictly above — the locality that lets a fold frame the y-register / lookup ancilla / control.

theoremgidneyModAddRegMeasured_preserves_slack

theorem gidneyModAddRegMeasured_preserves_slack
    (n p q : Nat) (hq_lo : 3 * (n + 2) ≤ q) (hq_hi : q < adder_n_qubits (n + 2))
    (f : Nat → Bool) :
    EGate.applyNat (gidneyModAddRegMeasured (n + 1) p) f q = f q

*★ SLACK PRESERVATION ★** — `gidneyModAddRegMeasured (n+1) p` is the identity on the two "middle slots" `{3·(n+2), 3·(n+2)+1}` strictly between the adder's tight bound `3·(n+2)` and the fixup-flag ancilla `adder_n_qubits (n+2) = 3·(n+2)+2`. Neither the front adder/subtract (tight-bounded by `3·(n+2)`), the read-clearing `mz`s (touch `read_idx < 3·(n+2)`), the flag-copy `CX` (control `target_idx (n+1) < 3·(n+2)`, target the flag), the conditional `+p` (touches only the block and the flag), nor the flag-`mz` writes these slots. This is the locality the fold needs at the keystone's FULL congruence range `adder_n_qubits (n+2) + 1`: it pins the slots the value lemma (which speaks of `i < n+1`) and the boundedBy (`≤ flag + 1`) leave unconstrained.

FormalRV.Arithmetic.ModularAdder.GidneyModAddRegWellTyped

FormalRV/Arithmetic/ModularAdder/GidneyModAddRegWellTyped.lean

FormalRV.Arithmetic.ModularAdder.GidneyModAddRegWellTyped — `EGate.WellTypedAt` for the measured Gidney modular adder keystone and its sub-components. The repo only ever proved `EGate.boundedBy` for the measured-adder family; `boundedBy` omits the CX/CCX index DISTINCTNESS conjuncts, so it does NOT imply `WellTypedAt`. These lemmas prove `WellTypedAt` fresh (distinctness of the 3-per-bit `read/target/carry` indices is omega-provable), which the EGate→reversible bridge (`MeasuredEqualsReversibleOnEncoded.eg_wellTyped`) consumes. Also introduces `EGate.WellTypedAt.mono` (missing in the repo; mirrors `EGate.boundedBy_mono`). No `sorry`, no `native_decide`, no axioms beyond the prelude.

theoremEGate.WellTypedAt.mono

theorem EGate.WellTypedAt.mono {dim dim' : Nat} (h_le : dim ≤ dim') :
    ∀ (eg : EGate), EGate.WellTypedAt dim eg → EGate.WellTypedAt dim' eg

theoremfirst_step_wt

theorem first_step_wt : Gate.WellTyped 5 gidney_adder_bit_step_faithful_first

theoreminterior_step_wt

theorem interior_step_wt (i : Nat) (hi : 0 < i) :
    Gate.WellTyped (3 * i + 5) (gidney_adder_bit_step_faithful_interior i)

theoremlast_step_wt

theorem last_step_wt (i : Nat) (hi : 0 < i) :
    Gate.WellTyped (3 * i + 3) (gidney_adder_bit_step_faithful_last i)

theoremforward_with_propagation_wt

theorem forward_with_propagation_wt :
    ∀ k, 0 < k → Gate.WellTyped (3 * k + 2) (gidney_adder_forward_with_propagation k)
  | 0 => by intro h; omega
  | 1 => by intro _; exact Gate.WellTyped.mono first_step_wt (by omega)
  | n + 2 =>

theoremgidney_adder_forward_faithful_full_wt

theorem gidney_adder_forward_faithful_full_wt (n : Nat) :
    Gate.WellTyped (adder_n_qubits (n + 2)) (gidney_adder_forward_faithful_full (n + 2))

theoremfinal_cx_cascade_wt

theorem final_cx_cascade_wt :
    ∀ k, 0 < k → Gate.WellTyped (3 * k) (gidney_final_cx_cascade k)
  | 0 => by intro h; omega
  | 1 =>

theoremmeasFirstReverse_wt

theorem measFirstReverse_wt : EGate.WellTypedAt 5 gidneyMeasFirstReverse

theoremmeasInteriorReverse_wt

theorem measInteriorReverse_wt (i : Nat) (hi : 0 < i) :
    EGate.WellTypedAt (3 * i + 5) (gidneyMeasInteriorReverse i)

theoremmeasLastReverse_wt

theorem measLastReverse_wt (i : Nat) (hi : 0 < i) :
    EGate.WellTypedAt (3 * i + 3) (gidneyMeasLastReverse i)

theoremmeasPropReverse_wt

theorem measPropReverse_wt :
    ∀ K, 0 < K → EGate.WellTypedAt (3 * K + 2) (gidneyMeasPropReverse K)
  | 0 => by intro h; omega
  | 1 => by intro _; exact EGate.WellTypedAt.mono (by omega) _ measFirstReverse_wt
  | n + 2 =>

theoremmeasFullReverse_wt

theorem measFullReverse_wt (n : Nat) :
    EGate.WellTypedAt (adder_n_qubits (n + 2)) (gidneyMeasFullReverse (n + 2))

theoremgidneyAdderMeasured_wellTypedAt

theorem gidneyAdderMeasured_wellTypedAt (n q_start dim : Nat)
    (hdim : adder_n_qubits (n + 2) ≤ dim) :
    EGate.WellTypedAt dim (gidneyAdderMeasured (n + 2) q_start)

theoremloadConst_wellTyped

theorem loadConst_wellTyped (B : Nat) (hB : 0 < B) :
    ∀ (W d : Nat), (∀ k, k < W → read_idx k < B) → Gate.WellTyped B (loadConst W d)
  | 0, _, _ => by show 0 < B; exact hB
  | W + 1, d, h =>

theoremprepareMaskedP_wellTyped

theorem prepareMaskedP_wellTyped (flagIdx B : Nat) (hflag : flagIdx < B) :
    ∀ (W p : Nat), (∀ k, k < W → read_idx k < B) → (∀ k, k < W → flagIdx ≠ read_idx k) →
      Gate.WellTyped B (prepareMaskedP flagIdx W p)
  | 0, _, _, _ => by show 0 < B; omega
  | W + 1, p, h, hne =>

theoremmzList_wellTypedAt

theorem mzList_wellTypedAt (B : Nat) (hB : 0 < B) :
    ∀ (L : List Nat), (∀ x ∈ L, x < B) → EGate.WellTypedAt B (mzList L)
  | [], _ => by show 0 < B; exact hB
  | q :: qs, h =>

theoremaddConstMeasured_wellTypedAt

theorem addConstMeasured_wellTypedAt (n d dim : Nat)
    (hdim : adder_n_qubits (n + 2) ≤ dim) :
    EGate.WellTypedAt dim (addConstMeasured (n + 2) d)

theoremconditionalAddP_wellTypedAt

theorem conditionalAddP_wellTypedAt (n flagIdx p dim : Nat)
    (hflag : flagIdx < dim) (hdim : adder_n_qubits (n + 2) ≤ dim)
    (hne : ∀ k, k < n + 2 → flagIdx ≠ read_idx k) :
    EGate.WellTypedAt dim (conditionalAddP (n + 2) flagIdx p)

theoremgidneyModAddRegMeasured_wellTypedAt

theorem gidneyModAddRegMeasured_wellTypedAt (n p dim : Nat)
    (hdim : adder_n_qubits (n + 2) + 1 ≤ dim) :
    EGate.WellTypedAt dim (gidneyModAddRegMeasured (n + 1) p)

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup

FormalRV/Arithmetic/ModularAdder/GidneySubtractFixup.lean

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup ───────────────────────────────────────────────────── THE faithful Gidney-2025 (arXiv:2505.15917) `2.5n`-Toffoli modular adder — the SUBTRACT-with-underflow + lookup-fixup construction the paper actually uses for `X += c (mod p)` (main.tex L972–975). Follows the Def / Correctness / Resource spine of the sibling `ModularAdder.Cuccaro` and `ModularAdder.Gidney`. • `GidneySubtractFixup/Def.lean` — `gidneyModAddFixup bits p c` (subtract `p−c` → copy underflow `Q` → conditional `+p` → measure-uncompute the flag), built on the verified MEASURED Gidney adder (`FormalRV.Arithmetic.MeasuredAdder.gidneyAdderMeasured`). • `GidneySubtractFixup/Correctness.lean` — `gidneyModAddFixup_correct`: the low `bits` target register decodes to `(x + c) % p` for `x < p`, `c < p`, with the extra top qubit `Q` AND the fixup flag released to `0`. • `GidneySubtractFixup/Resource.lean` — `toffoli_gidneyModAddFixup = 2·(bits+1)` (two `n`-Toffoli measured adds), and `gidneyModAddFixup_meets_g2025_modadd` linking it to the paper's `g2025_modadd_toffoli_halves` (`= 2.5n`). This is a STANDALONE faithful model of the paper's modular adder; the verified Shor multiplier still uses the Cuccaro/SQIR family (`ModularAdder.Cuccaro`). The controlled additions of dlogs feeding this adder are bridged to the verified residue by `FormalRV.CFS.dlog_reduction_eq_residueAccumulate`. Refs: Gidney 2025 arXiv:2505.15917 main.tex L972–975 (construction), L977 (`2.5n` vs Berry `3.5n`); Gidney arXiv:1709.06648 (temporary AND).

(no documented top-level declarations)

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup.Correctness

FormalRV/Arithmetic/ModularAdder/GidneySubtractFixup/Correctness.lean

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup.Correctness ────────────────────────────────────────────────────────────── VALUE correctness of the faithful Gidney-2025 subtract-fixup modular adder: the low `bits` target register decodes to `(x + c) % p` for `x < p`, `c < p`, with the extra top qubit `Q` released to `0`. See `Def.lean` for the construction. The two additions are the MEASURED Gidney adder, REUSED verbatim (`gidneyAdderMeasured_target_val`, `gidneyAdderMeasured_correct`); the only NEW arithmetic is the two's-complement underflow case-split (`fixup_value`).

theoremforward_first_bnd

private theorem forward_first_bnd : Gate.boundedBy 5 gidney_adder_bit_step_faithful_first

theoremforward_interior_bnd

private theorem forward_interior_bnd (i : Nat) :
    Gate.boundedBy (3 * i + 5) (gidney_adder_bit_step_faithful_interior i)

theoremforward_last_bnd

private theorem forward_last_bnd (i : Nat) :
    Gate.boundedBy (3 * i + 3) (gidney_adder_bit_step_faithful_last i)

theoremforward_prop_bnd

private theorem forward_prop_bnd :
    ∀ k, Gate.boundedBy (3 * k + 2) (gidney_adder_forward_with_propagation k)
  | 0 => trivial
  | 1 => Gate.boundedBy_mono (by omega) _ forward_first_bnd
  | n + 2 => ⟨Gate.boundedBy_mono (by omega) _ (forward_prop_bnd (n + 1)),
             Gate.boundedBy_mono (by omega) _ (forward_interior_bnd (n + 1))⟩

theoremforward_full_bnd_tight

theorem forward_full_bnd_tight (n : Nat) :
    Gate.boundedBy (3 * (n + 2)) (gidney_adder_forward_faithful_full (n + 2))

The faithful forward sweep at width `n+2` is bounded by `3·(n+2)` (tight).

theoremfinal_cx_bnd_tight

theorem final_cx_bnd_tight : ∀ k, Gate.boundedBy (3 * k) (gidney_final_cx_cascade k)
  | 0 => trivial
  | k + 1 =>

The final-CX cascade of length `k` is bounded by `3·k` (tight).

theoremmeasFirst_bnd

private theorem measFirst_bnd : EGate.boundedBy 5 gidneyMeasFirstReverse

theoremmeasInterior_bnd

private theorem measInterior_bnd (i : Nat) :
    EGate.boundedBy (3 * i + 5) (gidneyMeasInteriorReverse i)

theoremmeasLast_bnd

private theorem measLast_bnd (i : Nat) :
    EGate.boundedBy (3 * i + 3) (gidneyMeasLastReverse i)

theoremmeasProp_bnd

private theorem measProp_bnd :
    ∀ K, EGate.boundedBy (3 * K + 2) (gidneyMeasPropReverse K)
  | 0 => by simp [gidneyMeasPropReverse, EGate.boundedBy, Gate.boundedBy]
  | 1 => EGate.boundedBy_mono (by omega) _ measFirst_bnd
  | n + 2 =>

theoremmeasFullReverse_bnd_tight

theorem measFullReverse_bnd_tight (n : Nat) :
    EGate.boundedBy (3 * (n + 2)) (gidneyMeasFullReverse (n + 2))

The measured full-reverse cascade at width `n+2` is bounded by `3·(n+2)` (tight).

theoremgidneyAdderMeasured_boundedBy_tight

theorem gidneyAdderMeasured_boundedBy_tight (n q_start : Nat) :
    EGate.boundedBy (3 * (n + 2)) (gidneyAdderMeasured (n + 2) q_start)

*TIGHT boundedness of the measured adder:** `gidneyAdderMeasured (n+2) q_start` references only indices `< 3·(n+2)`, so it is the identity at every index `≥ 3·(n+2)` — in particular the two free slots `read_idx (n+2)` and `target_idx (n+1)` above the `(n+2)`-bit register.

theoremloadConst_preserves_outside

theorem loadConst_preserves_outside
    (W d : Nat) (f : Nat → Bool) (q : Nat) (h : ∀ i, i < W → q ≠ read_idx i) :
    Gate.applyNat (loadConst W d) f q = f q

`loadConst W d` is the identity outside the read window `read_idx [0, W)`.

theoremloadConst_at_read_idx

theorem loadConst_at_read_idx
    (W d : Nat) (f : Nat → Bool) (j : Nat) (hj : j < W) :
    Gate.applyNat (loadConst W d) f (read_idx j) = xor (f (read_idx j)) (d.testBit j)

At `read_idx j` (for `j < W`), `loadConst W d` XORs in `d.testBit j`.

theoremloadConst_clean_eq_input

theorem loadConst_clean_eq_input (W d x : Nat) :
    Gate.applyNat (loadConst W d) (adder_input_F W 0 x) = adder_input_F W d x

*Loading turns the clean input into the two-operand input.** On the clean input `adder_input_F W 0 x` (read register `0`, target = `x`, carries `0`), `loadConst W d` produces exactly `adder_input_F W d x`.

theoremgidneyAdderMeasured_read

theorem gidneyAdderMeasured_read
    (n a b q_start : Nat) (ha : a < 2 ^ (n + 2)) (hb : b < 2 ^ (n + 2)) :
    ∀ i, i < n + 2 →
      EGate.applyNat (gidneyAdderMeasured (n + 2) q_start)
          (adder_input_F (n + 2) a b) (read_idx i) = a.testBit i

*The measured Gidney adder preserves the read register** (`= a`).

theoremloadConst_target

theorem loadConst_target (W d : Nat) (f : Nat → Bool) (i : Nat) :
    Gate.applyNat (loadConst W d) f (target_idx i) = f (target_idx i)

`loadConst W d` is the identity at every target/carry index (it only touches read indices).

theoremloadConst_carry

theorem loadConst_carry (W d : Nat) (f : Nat → Bool) (i : Nat) :
    Gate.applyNat (loadConst W d) f (carry_idx i) = f (carry_idx i)

theoremaddConstMeasured_bundle

theorem addConstMeasured_bundle (n d x : Nat) (hd : d < 2 ^ (n + 2)) (hx : x < 2 ^ (n + 2)) :
    ∀ i, i < n + 2 →
      (EGate.applyNat (addConstMeasured (n + 2) d) (adder_input_F (n + 2) 0 x) (target_idx i)
        = (x + d).testBit i)
      ∧ (EGate.applyNat (addConstMeasured (n + 2) d) (adder_input_F (n + 2) 0 x) (read_idx i)
        = false)
      ∧ (EGate.applyNat (addConstMeasured (n + 2) d) (adder_input_F (n + 2) 0 x) (carry_idx i)
        = false)

*`addConstMeasured` per-index bundle.** On the clean input `adder_input_F (n+2) 0 x`, the gate `addConstMeasured (n+2) d` (load `d`, measured-add, unload) gives, for every `i < n+2`: • `target[i] = (x + d).testBit i`, • `read[i] = false` (read register restored to `0`), • `carry[i] = false` (carries released).

theoremaddConstMeasured_target_val

theorem addConstMeasured_target_val (n d x : Nat) (hd : d < 2 ^ (n + 2)) (hx : x < 2 ^ (n + 2)) :
    gidney_target_val (n + 2)
        (EGate.applyNat (addConstMeasured (n + 2) d) (adder_input_F (n + 2) 0 x))
      = (x + d) % 2 ^ (n + 2)

*`addConstMeasured` decoded value.** On the clean input, the low `n+2` target register decodes to `(x + d) % 2^(n+2)`.

theoremloadConst_boundedBy

theorem loadConst_boundedBy (W d : Nat) :
    Gate.boundedBy (3 * W) (loadConst W d)

`loadConst W d` references only read indices `read_idx j = 3j` for `j < W`, hence is bounded by `3·W` (its highest target is `read_idx (W-1) = 3W-3 < 3W`).

theoremaddConstMeasured_boundedBy

theorem addConstMeasured_boundedBy (n d : Nat) :
    EGate.boundedBy (adder_n_qubits (n + 2)) (addConstMeasured (n + 2) d)

*`addConstMeasured W d` is bounded by `adder_n_qubits W`.**

theoremaddConstMeasured_boundedBy_tight

theorem addConstMeasured_boundedBy_tight (n d : Nat) :
    EGate.boundedBy (3 * (n + 2)) (addConstMeasured (n + 2) d)

*TIGHT boundedness of `addConstMeasured`:** bounded by `3·(n+2)`, so it is the identity at every index `≥ 3·(n+2)` (the two free slots above the register).

theoremprepareMaskedP_preserves_outside

theorem prepareMaskedP_preserves_outside
    (flagIdx W p : Nat) (f : Nat → Bool) (q : Nat)
    (hflag : ∀ i, i < W → flagIdx ≠ read_idx i)
    (h : ∀ i, i < W → q ≠ read_idx i) :
    Gate.applyNat (prepareMaskedP flagIdx W p) f q = f q

`prepareMaskedP flagIdx W p` is the identity outside the read window, PROVIDED the control `flagIdx` is itself outside the read window (so the CXs do not modify it).

theoremprepareMaskedP_at_read_idx

theorem prepareMaskedP_at_read_idx
    (flagIdx W p : Nat) (f : Nat → Bool) (j : Nat) (hj : j < W)
    (hflag : ∀ i, i < W → flagIdx ≠ read_idx i) :
    Gate.applyNat (prepareMaskedP flagIdx W p) f (read_idx j)
      = xor (f (read_idx j)) (f flagIdx && p.testBit j)

At `read_idx j` (for `j < W`), `prepareMaskedP flagIdx W p` XORs in `(f flagIdx) ∧ p.testBit j`, provided `flagIdx` is outside the read window.

theoremmask_testBit

private theorem mask_testBit (flag : Bool) (p j : Nat) :
    (flag && p.testBit j) = (if flag then p else 0).testBit j

A constant's bit-`j` masked by a boolean flag: `(flag ∧ p.testBit j) = (if flag then p else 0).testBit j`.

theoremprepareMaskedP_eq_adder_input

theorem prepareMaskedP_eq_adder_input
    (W p v flagIdx : Nat) (g : Nat → Bool)
    (hflag : adder_n_qubits W ≤ flagIdx)
    (hblock : ∀ q, q < adder_n_qubits W → g q = adder_input_F W 0 v q)
    (q : Nat) (hq : q < adder_n_qubits W) :
    Gate.applyNat (prepareMaskedP flagIdx W p) g q
      = adder_input_F W (if g flagIdx then p else 0) v q

*The masked prepare turns a block-clean state into the gated-addend input.** If `g` equals `adder_input_F W 0 v` on every adder-block index (read `0`, target = bits of `v`, carries `0`) and the flag control sits out-of-band (`adder_n_qubits W ≤ flagIdx`), then `prepareMaskedP flagIdx W p` produces, on every block index `q < adder_n_qubits W`, exactly `adder_input_F W (if g flagIdx then p else 0) v` — the read register loaded with the GATED constant, target and carries unchanged.

theoremGate_boundedBy_preserves_ge

theorem Gate_boundedBy_preserves_ge (B : Nat) :
    ∀ (gate : Gate), Gate.boundedBy B gate →
      ∀ (f : Nat → Bool) (q : Nat), B ≤ q → Gate.applyNat gate f q = f q

A `Gate` bounded by `B` preserves every index `≥ B`.

theoremEGate_boundedBy_preserves_ge

theorem EGate_boundedBy_preserves_ge (B : Nat) :
    ∀ (eg : EGate), EGate.boundedBy B eg →
      ∀ (f : Nat → Bool) (q : Nat), B ≤ q → EGate.applyNat eg f q = f q

An `EGate` bounded by `B` preserves every index `≥ B`.

theoremconditionalAddP_bundle

theorem conditionalAddP_bundle
    (n p v flagIdx : Nat) (g : Nat → Bool)
    (hflag : adder_n_qubits (n + 2) ≤ flagIdx)
    (hblock : ∀ q, q < adder_n_qubits (n + 2) → g q = adder_input_F (n + 2) 0 v q)
    (hp : p < 2 ^ (n + 2)) (hv : v < 2 ^ (n + 2)) :
    (∀ i, i < n + 2 →
      (EGate.applyNat (conditionalAddP (n + 2) flagIdx p) g (target_idx i)
        = (v + (if g flagIdx then p else 0)).testBit i)
      ∧ (EGate.applyNat (conditionalAddP (n + 2) flagIdx p) g (read_idx i) = false)
      ∧ (EGate.applyNat (conditionalAddP (n + 2) flagIdx p) g (carry_idx i) = false))
    ∧ EGate.applyNat (conditionalAddP (n + 2) flagIdx p) g flagIdx = g flagIdx

*`conditionalAddP` per-index bundle.** Let `g` be block-clean — equal to `adder_input_F (n+2) 0 v` on every adder-block index (read `0`, target = bits of `v`, carries `0`) — with the flag control out-of-band (`adder_n_qubits (n+2) ≤ flagIdx`) and `v < 2^(n+2)`, `p < 2^(n+2)`. Then `conditionalAddP (n+2) flagIdx p` gives, for every `i < n+2`: • `target[i] = (v + (if g flagIdx then p else 0)).testBit i`, • `read[i] = false`, • `carry[i] = false`, and the flag bit is preserved (`out (flagIdx) = g flagIdx`).

theoremsubtract_underflow

theorem subtract_underflow (H p c x : Nat)
    (hp : 0 < p) (hpH : p ≤ H) (hx : x < p) (hc : c < p) :
    (x + c < p → (x + (2 * H - (p - c))) % (2 * H) = x + c + 2 * H - p
                  ∧ H ≤ x + c + 2 * H - p)
    ∧ (p ≤ x + c → (x + (2 * H - (p - c))) % (2 * H) = x + c - p
                    ∧ x + c - p < H)

*Subtraction result range + underflow flag.** For `x < p`, `c < p`, `0 < p ≤ H` (`H = 2^bits`), the subtraction `v1 = (x + (2·H − (p−c))) % (2·H)` satisfies: if `x + c ≥ p` then `v1 = x + c − p < H`; if `x + c < p` then `v1 = x + c − p + 2·H` with `H ≤ v1 < 2·H`.

theoremfixup_value

theorem fixup_value (H p c x : Nat)
    (hp : 0 < p) (hpH : p ≤ H) (hx : x < p) (hc : c < p) :
    ((x + (2 * H - (p - c))) % (2 * H)
      + (if (x + (2 * H - (p - c))) % (2 * H) ≥ H then p else 0)) % (2 * H)
      = (x + c) % p
    ∧ ((x + (2 * H - (p - c))) % (2 * H)
        + (if (x + (2 * H - (p - c))) % (2 * H) ≥ H then p else 0)) % (2 * H) < H

*The full fixup value = `(x+c) % p`.** Combining the two branches of `subtract_underflow` with the conditional add-back of `p`.

theoremtestBit_top_eq_threshold

theorem testBit_top_eq_threshold (bits v : Nat) (hv : v < 2 ^ (bits + 1)) :
    v.testBit bits = decide (2 ^ bits ≤ v)

*Top bit = high-half threshold.** For `v < 2^(bits+1)`, the bit at position `bits` of `v` equals `decide(2^bits ≤ v)`.

theoremgidney_target_val_low

theorem gidney_target_val_low (bits S : Nat) (f : Nat → Bool)
    (h : ∀ i, i < bits → f (target_idx i) = S.testBit i) :
    gidney_target_val bits f = S % 2 ^ bits

The `bits`-bit target register decoder restricted to the low `bits` of a `(bits+1)`-bit value: if `v < 2^bits`, then `gidney_target_val bits` reads `v` back, while `gidney_target_val (bits+1)` reads `v` too (the top bit being `0`). We only need the former. The decoder reads `target_idx i` for `i < bits`, so it is determined by those bits alone.

theoremgidneyModAddFixup_correct

theorem gidneyModAddFixup_correct
    (bits p c x : Nat) (hbits : 1 ≤ bits)
    (hp : 0 < p) (hpH : p ≤ 2 ^ bits) (hx : x < p) (hc : c < p) :
    gidney_target_val bits
        (EGate.applyNat (gidneyModAddFixup bits p c) (adder_input_F (bits + 1) 0 x))
      = (x + c) % p
    ∧ EGate.applyNat (gidneyModAddFixup bits p c) (adder_input_F (bits + 1) 0 x)
        (target_idx bits) = false
    ∧ EGate.applyNat (gidneyModAddFixup bits p c) (adder_input_F (bits + 1) 0 x)
        (adder_n_qubits (bits + 1)) = false

*★ HEADLINE — the faithful Gidney-2025 subtract-fixup modular adder is correct.** For `1 ≤ bits`, `0 < p ≤ 2^bits`, `x < p`, `c < p`, the gate `gidneyModAddFixup bits p c` applied to the clean input `adder_input_F (bits+1) 0 x` (target = `x`, read `0`, carries `0`, flag `0`): 1. decodes the low `bits` target register to `(x + c) % p`; 2. releases the extra top qubit `Q = target_idx bits` to `0`; 3. releases the fixup flag ancilla (`adder_n_qubits (bits+1)`) to `0`.

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup.Def

FormalRV/Arithmetic/ModularAdder/GidneySubtractFixup/Def.lean

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup.Def ───────────────────────────────────────────────────────── THE definition of the FAITHFUL Gidney-2025 (arXiv:2505.15917, main.tex L972–975) `2.5n`-Toffoli modular adder — the **subtract-with-underflow + lookup-fixup** construction the paper actually uses for `X += c (mod p)`. ## The paper's construction (main.tex L972–975, verbatim) To do `X += c (mod p)` with `x < p`, `c < p` (X held in `bits = len(p)` value bits plus ONE extra top qubit `Q`, total width `W = bits + 1`): 1. Flip the addition into a SUBTRACTION of `T2 = p − c`: compute `X -= T2`. The extra qubit `Q` (target bit `bits`) catches the underflow — `Q = 1` iff `x − T2 < 0`, i.e. iff `x < p − c`, i.e. iff `x + c < p`. 2. Complete the modular addition with `X += [0, p][Q]` — a conditional add of the constant `p` controlled by `Q` (add `p` iff `Q = 1`). 3. Uncompute `Q` by measurement-based uncomputation (Toffoli-free). Correctness: `X -= (p−c)` gives `x + c − p` (2's complement). If `x+c ≥ p` there is no underflow (`Q=0`) and the result is `x+c−p = (x+c) mod p`; if `x+c < p` there is underflow (`Q=1`) and `+= p` gives `x+c−p+p = x+c = (x+c) mod p`. Either way the low `bits` register holds `(x+c) mod p`, and the conditional add-back clears `Q` back to `0` automatically (both `x+c−p` and `x+c` are `< 2^bits`). ## How it is built here (composition, maximal reuse) The two additions are the MEASURED Gidney ripple-carry adder `FormalRV.Arithmetic.MeasuredAdder.gidneyAdderMeasured` (`n` Toffoli per add, the carry ancillas released by Gidney's measurement-based AND-uncompute). The conditional `+p` is realised WITHOUT any CCCX by the masked-constant-read idiom: a CX cascade copies `Q ∧ p.testBit i` into the adder's read register, the ordinary measured adder runs, and the cascade is un-applied (CX is involutive). The `Q` uncompute is FREE: after the fixup, `Q` is already `0`. Where to look next: • Correctness : `GidneySubtractFixup/Correctness.lean` (value = `(x+c) % p`) • Resource : `GidneySubtractFixup/Resource.lean` (Toffoli count + paper link) Refs: Gidney 2025 arXiv:2505.15917 main.tex L972–975 (construction), L977 (`2.5n` vs Berry `3.5n`); the controlled additions of dlogs feeding this adder are bridged to the verified residue by `FormalRV.CFS.dlog_reduction_eq_residueAccumulate`.

defgidneyModAddFixupSpec

def gidneyModAddFixupSpec (p c x : Nat) : Nat

Spec for modular addition by a constant `c` under modulus `p`.

defloadConst

def loadConst : Nat → Nat → Gate
  | 0,     _ => Gate.I
  | k + 1, d => Gate.seq (loadConst k d)
                  (if d.testBit k then Gate.X (read_idx k) else Gate.I)

Load `d` into the read register: `X (read_idx k)` for each `k < W` with `d.testBit k`. Applied to a clean (`read = 0`) register it writes the bits of `d`; applied again it clears them (X involutive).

defaddConstMeasured

def addConstMeasured (W d : Nat) : EGate

Add the constant `d` (mod `2^W`) to the target via the measured Gidney adder: load `d` into the read register, run the measured adder, unload. Computes `target := (target + d) % 2^W` with the read register restored to `0` and the carry register released.

defprepareMaskedP

def prepareMaskedP (flagIdx : Nat) : Nat → Nat → Gate
  | 0,     _ => Gate.I
  | k + 1, p =>
      Gate.seq (prepareMaskedP flagIdx k p)
               (if p.testBit k then Gate.CX flagIdx (read_idx k) else Gate.I)

Prepare the read register by XORing each `read_idx k` (for `k < W`) with `flag ∧ p.testBit k`, where the control `flag` lives at qubit index `flagIdx`. A CX cascade conditioned on the bit pattern of `p`.

defconditionalAddP

def conditionalAddP (W flagIdx p : Nat) : EGate

Conditional add of the constant `p` controlled by the qubit at `flagIdx`: prepare the read register with the masked constant `flag ∧ p`, run the measured Gidney adder at width `W`, un-prepare. Computes `target := (target + (if flag then p else 0)) % 2^W` with the read register restored to `0` and carries released.

defgidneyModAddFixup

def gidneyModAddFixup (bits p c : Nat) : EGate

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup.Resource

FormalRV/Arithmetic/ModularAdder/GidneySubtractFixup/Resource.lean

FormalRV.Arithmetic.ModularAdder.GidneySubtractFixup.Resource ────────────────────────────────────────────────────────────── COUNT theorem for the faithful Gidney-2025 subtract-fixup modular adder, and its link to the paper's headline `2.5n` figure. ## The exact count `gidneyModAddFixup bits p c` is TWO measured Gidney adds (the subtraction of `p−c` and the conditional add-back of `p`) plus Toffoli-FREE glue (the constant/masked load cascades are X/CX, the flag copy is a CX, the flag release is a measurement). Each measured add at width `W = bits + 1` costs `W` Toffoli (`toffoli_gidneyAdderMeasured`), so the total is exactly toffoli (gidneyModAddFixup bits p c) = 2·(bits + 1) (= `2n` essentially). ## Relation to the paper Gidney 2025 (main.tex L977) charges `2.5n` Toffoli to its modular adder (the audit constant `g2025_modadd_toffoli_halves n = 5n` half-units, i.e. `2.5n` full), versus Berry et al.'s `3.5n`. The `2.5n` figure INCLUDES a deferred phase-correction overhead; the paper itself notes the construction is `2n` "if the phase correction is deferred". Our measurement-based Boolean construction realises exactly that `2n` lower variant — `2·(bits + 1)` Toffoli, two `n`-Toffoli measured adds — so it MEETS (indeed BEATS, for `bits ≥ 4`) the paper's `2.5n` headline. `gidneyModAddFixup_meets_g2025_modadd` states the head-to-head in the paper's half-Toffoli currency: `2 · toffoli ≤ g2025_modadd_toffoli_halves bits` for `4 ≤ bits`.

theoremtcount_loadConst

theorem tcount_loadConst : ∀ (W d : Nat), Gate.tcount (loadConst W d) = 0
  | 0,     _ => rfl
  | k + 1, d =>

The constant-load cascade is Toffoli-free (only `X` / `I`).

theoremtcount_prepareMaskedP

theorem tcount_prepareMaskedP : ∀ (flagIdx W p : Nat),
    Gate.tcount (prepareMaskedP flagIdx W p) = 0
  | _,       0,     _ => rfl
  | flagIdx, k + 1, p =>

The masked-prepare cascade is Toffoli-free (only `CX` / `I`).

theoremtcount_addConstMeasured

theorem tcount_addConstMeasured (n d : Nat) :
    EGate.tcount (addConstMeasured (n + 2) d) = 7 * (n + 2)

`addConstMeasured W d` costs exactly `W` Toffoli (the measured add; the load cascades are Toffoli-free).

theoremtcount_conditionalAddP

theorem tcount_conditionalAddP (n flagIdx p : Nat) :
    EGate.tcount (conditionalAddP (n + 2) flagIdx p) = 7 * (n + 2)

`conditionalAddP W flagIdx p` costs exactly `W` Toffoli (one measured add; the masked prepare/un-prepare cascades are Toffoli-free).

theoremtoffoli_gidneyModAddFixup

theorem toffoli_gidneyModAddFixup (n p c : Nat) :
    EGate.toffoli (gidneyModAddFixup (n + 1) p c) = 2 * (n + 2)

*★ THE COUNT — `gidneyModAddFixup bits p c` is exactly `2·(bits+1)` Toffoli.** Two measured Gidney adds at width `bits+1` (`bits+1` Toffoli each); the constant / masked-load cascades, the flag-copy CX, and the flag-release measurement are all Toffoli-free. With `bits = n+1` this is `2·(n+2)`.

theoremgidneyModAddFixup_meets_g2025_modadd

theorem gidneyModAddFixup_meets_g2025_modadd (n : Nat) (hn : 3 ≤ n) :
    2 * EGate.toffoli (gidneyModAddFixup (n + 1) 0 0)
      ≤ FormalRV.Audit.Gidney2025.g2025_modadd_toffoli_halves (n + 1)

*The fixup adder MEETS (and beats) Gidney's `2.5n`.** In the paper's half-Toffoli currency, twice the verified Toffoli count of `gidneyModAddFixup` (i.e. its cost in half-units) is `≤` the paper's `g2025_modadd_toffoli_halves bits = 5·bits` for `bits ≥ 4`. Our measurement-based construction realises the paper's deferred-phase-correction `2n` variant (two `n`-Toffoli measured adds), strictly below the `2.5n` headline once `bits ≥ 4`.

FormalRV.Arithmetic.ObliviousRunwayAdder

FormalRV/Arithmetic/ObliviousRunwayAdder.lean

FormalRV.Arithmetic.ObliviousRunwayAdder ──────────────────────────────────────── `runwayAddK gSep k : Gate` is a verified, fully-constructed segmented quantum adder. It adds two `k`-segment numbers (each segment `gSep` data bits, `n = k·gSep`) with carries DEFERRED into per-segment runway bits rather than propagated across the whole register — the core primitive of Gidney's oblivious-carry-runway scheme. Import this umbrella to get the whole verified adder for auditing any paper that uses oblivious carry runways. See `README.md` for the spine (which file holds which headline theorem) and the honest-scope ledger.

(no documented top-level declarations)

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth

FormalRV/Arithmetic/ObliviousRunwayAdder/ParallelDepth.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth ────────────────────────────────────────────────────── Thin re-export shim. A **PARALLEL (ASAP critical-path) depth** measure on the `Gate` IR and the oblivious-carry-runway adder's depth advantage proved against it (formerly a single 693-line file) has been split into four submodules — for per-file compile memory only — each keeping the SAME module namespace `FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth`. Every declaration, statement, name and proof is preserved VERBATIM: • `…ParallelDepth.Scheduler` — §1–§5: gate support `supp`, the ASAP scheduler (`tick`/`sched`), `maxOver` algebra, the disjoint-`seq` law `parallelDepth_seq_disjoint`, and `parallelDepth_le_depth`. • `…ParallelDepth.CuccaroSupport` — §6–§7: Cuccaro support containment and runway-segment support/disjointness (`runwayAddK_segAdd_disjoint`). • `…ParallelDepth.Shift` — §8–§9: shift-invariance of `parallelDepth` and base-independence of the Cuccaro circuit's parallel depth. • `…ParallelDepth.Headline` — §10: the headline `parallelDepth_runwayAddK_eq`. `Gate.depth` (in `Core/Gate.lean`) SUMS over `seq`, so it is the SEQUENTIAL gate count: a `seq` of `k` disjoint segments costs `k ×` one segment — it can never show a parallelism win. `parallelDepth` here instead schedules every gate As-Soon-As-Possible, so two gates on DISJOINT qubits do not delay each other and `parallelDepth (runwayAddK gSep k)` equals ONE segment's depth, INDEPENDENT of `k` — the realized oblivious-carry depth advantage. See `Example.lean`. Importing `ParallelDepth` re-exports all four, so existing importers are unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.CuccaroSupport

FormalRV/Arithmetic/ObliviousRunwayAdder/ParallelDepth/CuccaroSupport.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.CuccaroSupport ────────────────────────────────────────────────────── Submodule of `ParallelDepth` (split out for per-file compile memory). Contains §6–§7: the Cuccaro support-containment bounds (`supp_cuccaro_maj_chain` … `supp_cuccaro_subset`) and the runway-segment support/disjointness lemmas (`supp_segAdd_subset` … `runwayAddK_segAdd_disjoint`). Re-exported VERBATIM from the original `ParallelDepth.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

theoremsupp_cuccaro_maj_chain

theorem supp_cuccaro_maj_chain (n q : Nat) :
    ∀ p, p ∈ supp (cuccaro_maj_chain n q) → q ≤ p ∧ p < q + 2 * n + 1

A two-sided support bound: every qubit of `cuccaro_maj_chain n q` lies in `[q, q + 2n + 1)` (= `[q, q + span n)`).

theoremsupp_cuccaro_uma_chain_reverse

theorem supp_cuccaro_uma_chain_reverse (n q : Nat) :
    ∀ p, p ∈ supp (cuccaro_uma_chain_reverse n q) → q ≤ p ∧ p < q + 2 * n + 1

Every qubit of `cuccaro_uma_chain_reverse n q` lies in `[q, q + 2n + 1)`.

theoremsupp_cuccaro_subset

theorem supp_cuccaro_subset (n base : Nat) :
    ∀ p, p ∈ supp (cuccaroAdder.circuit n base) → base ≤ p ∧ p < base + (2 * n + 1)

*Support of the Cuccaro `n`-bit adder is contained in its block.** Every qubit it touches lies in `[base, base + (2n+1)) = [base, base + span n)`.

theoremsupp_segAdd_subset

theorem supp_segAdd_subset (gSep m : Nat) :
    ∀ p, p ∈ supp (segAdd gSep m) →
      segBase gSep m ≤ p ∧ p < segBase gSep m + (2 * gSep + 3)

`segAdd gSep m` (a width-`(gSep+1)` Cuccaro at `segBase gSep m`) touches only qubits in its segment block `[segBase m, segBase m + (2·gSep+3))`.

theoremsupp_runwayAddK_lt

theorem supp_runwayAddK_lt (gSep : Nat) :
    ∀ (k : Nat) p, p ∈ supp (runwayAddK gSep k) → p < segBase gSep k

`runwayAddK gSep k` (segments `0…k-1`) touches only qubits strictly below `segBase gSep k = k·stride`.

theoremrunwayAddK_segAdd_disjoint

theorem runwayAddK_segAdd_disjoint (gSep k : Nat) :
    ∀ x, x ∈ supp (runwayAddK gSep k) → x ∉ supp (segAdd gSep k)

*Segment disjointness.** `runwayAddK gSep k` and `segAdd gSep k` touch no common qubit: the former lives below `segBase k`, the latter at or above it.

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.Headline

FormalRV/Arithmetic/ObliviousRunwayAdder/ParallelDepth/Headline.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.Headline ────────────────────────────────────────────────────── Submodule of `ParallelDepth` (split out for per-file compile memory). Contains §10: the headline that the runway adder's parallel depth is constant in `k` (`parallelDepth_runwayAddK_eq_max` … `parallelDepth_runwayAddK_eq`). Re-exported VERBATIM from the original `ParallelDepth.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

theoremparallelDepth_runwayAddK_eq_max

theorem parallelDepth_runwayAddK_eq_max (gSep k : Nat) :
    parallelDepth (runwayAddK gSep (k + 1))
      = max (parallelDepth (runwayAddK gSep k)) (parallelDepth (segAdd gSep k))

*THE STRUCTURAL FACT (max, not sum).** Adding one more disjoint segment does not ADD to the parallel depth — it is the `max` with the new segment's depth. This alone proves no segment serializes against the others.

theoremparallelDepth_runwayAddK_eq

theorem parallelDepth_runwayAddK_eq (gSep : Nat) :
    ∀ k, 1 ≤ k →
      parallelDepth (runwayAddK gSep k)
        = parallelDepth (cuccaroAdder.circuit (gSep + 1) (segBase gSep 0))

*HEADLINE — the realized depth advantage.** For `k ≥ 1`, the ASAP parallel depth of the `k`-segment oblivious-carry-runway adder equals ONE segment's parallel depth, INDEPENDENT of `k`. The `k` disjoint segments run concurrently, so adding more segments never increases the depth. (Achieved via SHIFT-INVARIANCE: `parallelDepth_segAdd_const`, every segment has the same depth as the base-`0` segment.)

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.Scheduler

FormalRV/Arithmetic/ObliviousRunwayAdder/ParallelDepth/Scheduler.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.Scheduler ────────────────────────────────────────────────────── Submodule of `ParallelDepth` (split out for per-file compile memory). Contains §1–§5: the gate support `supp`, the ASAP scheduler (`tick`/`sched`), `maxOver` algebra, the headline disjoint-`seq` law `parallelDepth_seq_disjoint`, and the sanity bound `parallelDepth_le_depth`. Re-exported VERBATIM from the original `ParallelDepth.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

defsupp

def supp : Gate → List Nat
  | Gate.I          => []
  | Gate.X q        => [q]
  | Gate.CX a b     => [a, b]
  | Gate.CCX a b c  => [a, b, c]
  | Gate.seq g₁ g₂  => supp g₁ ++ supp g₂

The (syntactic) support of a gate: the list of qubits it touches.

deftick

def tick (qs : List Nat) (s : Nat → Nat) : Nat → Nat

Advance the ready-time map `s` by scheduling ONE gate acting on `qs`: it runs at `τ = 1 + max_{q∈qs} s q`, after which every `q ∈ qs` is ready at `τ`; all other qubits keep their old ready-time.

defsched

def sched : Gate → (Nat → Nat) → (Nat → Nat)
  | Gate.I,          s => s
  | Gate.X q,        s => tick [q] s
  | Gate.CX a b,     s => tick [a, b] s
  | Gate.CCX a b c,  s => tick [a, b, c] s
  | Gate.seq g₁ g₂,  s => sched g₂ (sched g₁ s)

ASAP schedule: thread the ready-time map through the gate list.

defmaxOver

def maxOver (qs : List Nat) (f : Nat → Nat) : Nat

`max` of `f` over a qubit list (fold, identity `0`).

defparallelDepth

def parallelDepth (g : Gate) : Nat

*ASAP critical-path depth.** Schedule from the all-zero ready map, then take the max finish-time over the gate's support.

theoremtick_frame

theorem tick_frame (qs : List Nat) (s : Nat → Nat) (x : Nat) (hx : x ∉ qs) :
    tick qs s x = s x

Off its qubit set, `tick` leaves the ready-time unchanged.

theoremfoldl_max_ge_init

theorem foldl_max_ge_init (qs : List Nat) (s : Nat → Nat) (init : Nat) :
    init ≤ qs.foldl (fun m q => max m (s q)) init

A `foldl`-max is at least its seed.

theoremfoldl_max_ge_mem

theorem foldl_max_ge_mem (qs : List Nat) (s : Nat → Nat) (x : Nat)
    (hx : x ∈ qs) :
    ∀ init : Nat, s x ≤ qs.foldl (fun m q => max m (s q)) init

The `foldl`-max over `qs` dominates `s x` for every member `x ∈ qs` (with any seed `init`).

theoremtick_mono

theorem tick_mono (qs : List Nat) (s : Nat → Nat) (x : Nat) :
    s x ≤ tick qs s x

`tick` only increases ready-times.

theoremsched_mono

theorem sched_mono (g : Gate) (s : Nat → Nat) (x : Nat) :
    s x ≤ sched g s x

`sched` only increases ready-times.

theoremsched_frame

theorem sched_frame (g : Gate) (s : Nat → Nat) (x : Nat) (hx : x ∉ supp g) :
    sched g s x = s x

Off the support of `g`, `sched g` leaves the ready-time unchanged.

theoremtick_local

theorem tick_local (qs : List Nat) (s s' : Nat → Nat)
    (h : ∀ q, q ∈ qs → s q = s' q) (x : Nat) (hx : x ∈ qs) :
    tick qs s x = tick qs s' x

`tick` is LOCAL: its action on a support member depends only on the values of `s` over the qubit set `qs`.

theoremsched_local

theorem sched_local (g : Gate) (s s' : Nat → Nat)
    (h : ∀ q, q ∈ supp g → s q = s' q) :
    ∀ q, q ∈ supp g → sched g s q = sched g s' q

`sched g` is LOCAL on `supp g`: its action on a support qubit depends only on the input ready-times restricted to `supp g`.

theoremmaxOver_append

theorem maxOver_append (l₁ l₂ : List Nat) (f : Nat → Nat) :
    maxOver (l₁ ++ l₂) f = max (maxOver l₁ f) (maxOver l₂ f)

`maxOver` over an append splits as a `max`.

theoremmaxOver_congr

theorem maxOver_congr (l : List Nat) (f g : Nat → Nat)
    (h : ∀ q, q ∈ l → f q = g q) :
    maxOver l f = maxOver l g

`maxOver` only depends on `f`'s values over the list.

theoremparallelDepth_seq_disjoint

theorem parallelDepth_seq_disjoint (g₁ g₂ : Gate)
    (hdisj : ∀ x, x ∈ supp g₁ → x ∉ supp g₂) :
    parallelDepth (Gate.seq g₁ g₂)
      = max (parallelDepth g₁) (parallelDepth g₂)

*`parallelDepth` of a `seq` of qubit-DISJOINT gates is the `max`, not the sum.** Two gates touching no common qubit do not delay each other, so the sequential composition's ASAP depth is the larger of the two — the structural fact that underlies every parallel-depth win.

example(example)

example : parallelDepth (Gate.seq (Gate.X 0) (Gate.X 0)) = 2

example(example)

example : parallelDepth (Gate.seq (Gate.X 0) (Gate.X 1)) = 1

theoremmaxOver_mono

theorem maxOver_mono (l : List Nat) (f g : Nat → Nat)
    (h : ∀ q, q ∈ l → f q ≤ g q) :
    maxOver l f ≤ maxOver l g

`maxOver` of a `≤`-dominated function is `≤` the dominating `maxOver`.

theoremfoldl_max_le

theorem foldl_max_le (qs : List Nat) (s : Nat → Nat) (B : Nat)
    (hq : ∀ q, q ∈ qs → s q ≤ B) :
    ∀ init, init ≤ B → qs.foldl (fun m q => max m (s q)) init ≤ B

A `foldl`-max over `qs` is bounded by `B` once the seed and every `s q` are.

theoremtick_le_of_bound

theorem tick_le_of_bound (qs : List Nat) (s : Nat → Nat) (B : Nat)
    (hB : ∀ y, s y ≤ B) (x : Nat) :
    tick qs s x ≤ B + 1

`tick` finish-time everywhere is bounded by `B + 1` once `s` is `≤ B` everywhere.

theoremsched_le_of_bound

theorem sched_le_of_bound (g : Gate) :
    ∀ (s : Nat → Nat) (B : Nat), (∀ y, s y ≤ B) → ∀ x, sched g s x ≤ B + Gate.depth g

*Uniform-bound scheduling bound.** If every ready-time in `s` is `≤ B`, then every scheduled finish-time is `≤ B + Gate.depth g`.

theoremparallelDepth_le_depth

theorem parallelDepth_le_depth (g : Gate) : parallelDepth g ≤ Gate.depth g

*`parallelDepth ≤ Gate.depth`.** The achievable parallel (ASAP) depth never exceeds the sequential gate count — `parallelDepth` is a genuine depth.

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.Shift

FormalRV/Arithmetic/ObliviousRunwayAdder/ParallelDepth/Shift.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.ParallelDepth.Shift ────────────────────────────────────────────────────── Submodule of `ParallelDepth` (split out for per-file compile memory). Contains §8–§9: shift-invariance of `parallelDepth` (`shiftGate` … `parallelDepth_shiftGate`) and the base-independence of the Cuccaro circuit's parallel depth (`shiftGate_shiftGate` … `cuccaro_circuit_shift`, `parallelDepth_cuccaro_base`, `parallelDepth_segAdd_const`). Re-exported VERBATIM from the original `ParallelDepth.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

defshiftGate

def shiftGate (d : Nat) : Gate → Gate
  | Gate.I          => Gate.I
  | Gate.X q        => Gate.X (q + d)
  | Gate.CX a b     => Gate.CX (a + d) (b + d)
  | Gate.CCX a b c  => Gate.CCX (a + d) (b + d) (c + d)
  | Gate.seq g₁ g₂  => Gate.seq (shiftGate d g₁) (shiftGate d g₂)

Relabel every qubit of `g` by `· + d`.

theoremsupp_shiftGate

theorem supp_shiftGate (d : Nat) (g : Gate) :
    supp (shiftGate d g) = (supp g).map (· + d)

`supp` commutes with shifting: `supp (shiftGate d g) = (supp g).map (· + d)`.

theoremmem_map_add_iff

theorem mem_map_add_iff (d x : Nat) (qs : List Nat) :
    x + d ∈ qs.map (· + d) ↔ x ∈ qs

`x + d ∈ map (·+d) qs ↔ x ∈ qs`.

theoremfoldl_max_map_add

theorem foldl_max_map_add (d : Nat) (s : Nat → Nat) (qs : List Nat) :
    ∀ init, (qs.map (· + d)).foldl (fun m q => max m (s q)) init
      = qs.foldl (fun m q => max m (s (q + d))) init

A `foldl`-max over the shifted list of `s` equals the `foldl`-max over the original list of `s ∘ (·+d)`.

theoremtick_shift

theorem tick_shift (d : Nat) (qs : List Nat) (s : Nat → Nat) (x : Nat) :
    tick (qs.map (· + d)) s (x + d) = tick qs (fun y => s (y + d)) x

*`tick` shift law.** `tick (map (·+d) qs) s (x+d) = tick qs (s∘(·+d)) x`.

theoremsched_shift

theorem sched_shift (d : Nat) (g : Gate) :
    ∀ (s : Nat → Nat) (x : Nat),
      sched (shiftGate d g) s (x + d) = sched g (fun y => s (y + d)) x

*`sched` shift law.** Scheduling the shifted gate at a shifted position equals scheduling the original gate (with a shifted initial map).

theoremmaxOver_map_add

theorem maxOver_map_add (d : Nat) (f : Nat → Nat) (l : List Nat) :
    maxOver (l.map (· + d)) f = maxOver l (fun y => f (y + d))

`maxOver` over a shifted list of `f` = `maxOver` over the original of `f ∘ (·+d)`.

theoremparallelDepth_shiftGate

theorem parallelDepth_shiftGate (d : Nat) (g : Gate) :
    parallelDepth (shiftGate d g) = parallelDepth g

*`parallelDepth` is shift-invariant.** A uniform relabelling `q ↦ q + d` does not change the ASAP parallel depth.

theoremshiftGate_shiftGate

theorem shiftGate_shiftGate (d₁ d₂ : Nat) (g : Gate) :
    shiftGate d₁ (shiftGate d₂ g) = shiftGate (d₂ + d₁) g

Shifting composes: `shiftGate d₁ (shiftGate d₂ g) = shiftGate (d₂ + d₁) g`.

theoremcuccaro_maj_chain_shift

theorem cuccaro_maj_chain_shift (n : Nat) :
    ∀ q, cuccaro_maj_chain n q = shiftGate q (cuccaro_maj_chain n 0)

The MAJ chain at base `q` is the base-`0` chain relabelled by `· + q`.

theoremcuccaro_uma_chain_reverse_shift

theorem cuccaro_uma_chain_reverse_shift (n : Nat) :
    ∀ q, cuccaro_uma_chain_reverse n q = shiftGate q (cuccaro_uma_chain_reverse n 0)

The reverse UMA chain at base `q` is the base-`0` chain relabelled by `· + q`.

theoremcuccaro_circuit_shift

theorem cuccaro_circuit_shift (n q : Nat) :
    cuccaroAdder.circuit n q = shiftGate q (cuccaroAdder.circuit n 0)

*The Cuccaro `n`-bit adder at base `q` is the base-`0` adder shifted by `q`.**

theoremparallelDepth_cuccaro_base

theorem parallelDepth_cuccaro_base (n q : Nat) :
    parallelDepth (cuccaroAdder.circuit n q) = parallelDepth (cuccaroAdder.circuit n 0)

*Parallel depth of a Cuccaro add is base-independent.**

theoremparallelDepth_segAdd_const

theorem parallelDepth_segAdd_const (gSep m : Nat) :
    parallelDepth (segAdd gSep m)
      = parallelDepth (cuccaroAdder.circuit (gSep + 1) (segBase gSep 0))

Every segment's parallel depth equals the base-`0` segment's: all segments are width-`(gSep+1)` Cuccaro adds, only their base differs, and parallel depth is base-independent.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderAdvance

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderAdvance.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderAdvance ─────────────────────────────────────────────── The k-segment oblivious-runway adder's RUNWAY OCCUPANCY EQUALS THE ACTUAL DEFERRED CARRIES — a genuine circuit property, wired to `applyNat`. The committed `RunwayAdderFunctional.lean` proves `runwayAddK_advance : kRunwayOccupancy … ≤ k`, but that bound is STRUCTURALLY TRIVIAL (a sum of `k` single-bit runways is ≤ k for ANY state — the `applyNat` is decorative). This file UPGRADES that bound into a real theorem about the circuit's action. For each segment `m < k`, after running `runwayAddK gSep k` on a clean input, the runway bit of segment `m` holds EXACTLY the genuine carry-out of that segment's `gSep`-bit add: segRunway gSep m (applyNat (runwayAddK gSep k) f) = (a_m + b_m) / 2^gSep (lemma #4) where `a_m`, `b_m` are the segment's `gSep`-bit augend/addend reads. Summing over the segments gives the OCCUPANCY = CARRY-SUM equality: kRunwayOccupancy gSep k (applyNat (runwayAddK gSep k) f) = kCarrySum gSep k f (lemma #5) and then `runwayAddK_advance_genuine : occupancy ≤ k` is re-proved THROUGH this carry equality (each `(a_m+b_m)/2^gSep ≤ 1` because the runway is one bit) — the occupancy is the REAL deferred-carry count, not just "we built k runways." Everything is `Gate.applyNat (runwayAddK gSep k) f` read through the concrete `def`s of `RunwayAdderFunctional`; the RHS of each headline is a CONCRETE carry expression `(a_m + b_m) / 2^gSep`, never a free field. No `sorry`, no `native_decide`, no axioms beyond the prelude. WHAT IS STILL OPEN (carried over from `RunwayAdderFunctional`): the MULTI-ADD accumulation — how the deferred carries fold/accumulate across a SEQUENCE of oblivious adds — and the connection between the runway-interspersed encoding and the contiguous coset value. This file establishes the single-add per-segment carry-occupancy equality only.

theoremsegReg_eq_data_add_runway

theorem segReg_eq_data_add_runway (gSep k : Nat) (f : Nat → Bool) :
    segReg gSep k f
      = decodeReg (cuccaroAdder.augendIdx (segBase gSep k)) gSep f
          + 2 ^ gSep * segRunway gSep k f

*Split lemma.** Segment `k`'s `(gSep+1)`-bit register splits as its low `gSep`-bit data plus the runway bit at place `2^gSep`.

theoremsegRunway_segAdd_eq_carry

theorem segRunway_segAdd_eq_carry (gSep k : Nat) (g : Nat → Bool)
    (hAnc : g (segBase gSep k) = false)
    (hRun : g (cuccaroAdder.augendIdx (segBase gSep k) gSep) = false)
    (hAddTop : g (cuccaroAdder.addendIdx (segBase gSep k) gSep) = false) :
    segRunway gSep k (Gate.applyNat (segAdd gSep k) g)
      = (decodeReg (cuccaroAdder.augendIdx (segBase gSep k)) gSep g
          + decodeReg (cuccaroAdder.addendIdx (segBase gSep k)) gSep g) / 2 ^ gSep

*Per-segment runway = carry.** Running segment `k`'s width-`(gSep+1)` add on a state `g` with its carry-in ancilla, runway bit, and addend-top bit clear, the runway bit holds EXACTLY the genuine carry-out `(a + b) / 2^gSep` of the segment's `gSep`-bit operands. Wired to `Gate.applyNat (segAdd gSep k) g`.

theoremsegAdd_fixes_runway_below

theorem segAdd_fixes_runway_below (gSep j m : Nat) (g : Nat → Bool) (hm : m < j) :
    Gate.applyNat (segAdd gSep j) g (cuccaroAdder.augendIdx (segBase gSep m) gSep)
      = g (cuccaroAdder.augendIdx (segBase gSep m) gSep)

*A higher segment's add fixes a lower segment's runway position.** For `m < j`, segment `m`'s runway sits at `segBase m + 2gSep+1`, which is below `segBase (m+1) = segBase m + (2gSep+3) ≤ segBase j` — outside segment `j`'s block `[segBase j, segBase j + (2gSep+3))`.

theoremsegRunway_runwayAddK_eq

theorem segRunway_runwayAddK_eq (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kClean gSep k f → ∀ (m : Nat), m < k →
      segRunway gSep m (Gate.applyNat (runwayAddK gSep k) f)
        = (decodeReg (cuccaroAdder.augendIdx (segBase gSep m)) gSep f
            + decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f) / 2 ^ gSep

*MAIN.** After running the full `k`-segment runway adder on a clean input, EACH segment `m < k`'s runway bit holds EXACTLY that segment's genuine carry-out `(a_m + b_m) / 2^gSep`. Proved by induction on `k`; literally about `Gate.applyNat (runwayAddK gSep k) f` and a CONCRETE carry RHS.

defkCarrySum

def kCarrySum (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f =>
      kCarrySum gSep k f
        + (decodeReg (cuccaroAdder.augendIdx (segBase gSep k)) gSep f
            + decodeReg (cuccaroAdder.addendIdx (segBase gSep k)) gSep f) / 2 ^ gSep

Total deferred-carry value across the `k` segments: `Σ_{j<k} (a_j+b_j)/2^gSep`, reading each segment's `gSep`-bit augend/addend off `f` at the same `segBase` positions used by `kRunwayOccupancy` / `segRunway`.

theoremrunwayAddK_occupancy_eq_carries

theorem runwayAddK_occupancy_eq_carries (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kClean gSep k f →
      kRunwayOccupancy gSep k (Gate.applyNat (runwayAddK gSep k) f)
        = kCarrySum gSep k f

*Occupancy = carry-sum.** The total runway occupancy after the full `k`-segment runway add equals the total genuine deferred-carry sum — every runway holds its segment's real carry-out (folding the MAIN lemma over all segments). Wired to `Gate.applyNat (runwayAddK gSep k) f`.

theoremsegCarry_le_one

theorem segCarry_le_one (gSep j : Nat) (f : Nat → Bool) :
    (decodeReg (cuccaroAdder.augendIdx (segBase gSep j)) gSep f
      + decodeReg (cuccaroAdder.addendIdx (segBase gSep j)) gSep f) / 2 ^ gSep ≤ 1

A single segment's deferred carry is at most one bit: `a + b < 2^(gSep+1)`, so `(a + b) / 2^gSep ≤ 1`. This is what makes each runway a genuine 0/1 carry.

theoremkCarrySum_le

theorem kCarrySum_le (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kCarrySum gSep k f ≤ k

The total deferred-carry sum is at most `k` — NON-trivially: it is a sum of `k` genuine 0/1 carries (`segCarry_le_one`), not an abstract bit count.

theoremrunwayAddK_advance_genuine

theorem runwayAddK_advance_genuine (gSep k : Nat) (f : Nat → Bool)
    (hclean : kClean gSep k f) :
    kRunwayOccupancy gSep k (Gate.applyNat (runwayAddK gSep k) f) ≤ k

*THE ADVANCE BOUND, RE-PROVED AS A GENUINE CIRCUIT PROPERTY.** The total runway occupancy after the full `k`-segment runway add is `≤ k` — and now this is proved THROUGH the carry equality (`runwayAddK_occupancy_eq_carries` + `kCarrySum_le`): the occupancy IS the real deferred-carry count, each runway holding its segment's genuine 0/1 carry-out. Contrast the structurally-trivial bit-count proof of `RunwayAdderFunctional.runwayAddK_advance`, which holds for any state and never inspects the circuit's action.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderContiguous

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderContiguous.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderContiguous ────────────────────────────────────────────────── CLOSING THE "runway-interspersed encoding ↔ contiguous integer value" GAP. The committed `RunwayAdderFunctional`/`RunwayAdderAdvance` prove that the k-segment oblivious-carry-runway adder leaves each segment `m`'s width-`(gSep+1)` register holding EXACTLY `a_m + b_m` (data + the deferred carry, in the runway bit at place `2^gSep` of that register). Those files read each segment's register at the SPREAD place value `2^(m·(gSep+1))`, which keeps every runway carry "parked" — the carries are deferred, not folded. THE KEY INSIGHT OF THIS FILE. Re-read those same registers at CONTIGUOUS place value `2^(m·gSep)` (NOT the spread `2^(m·(gSep+1))`). Then segment `m`'s runway carry, which lives at the segment-internal bit `2^gSep`, lands at GLOBAL place `2^(m·gSep) · 2^gSep = 2^((m+1)·gSep)` — i.e. EXACTLY segment `(m+1)`'s low place. So reading at contiguous spacing performs the inter-segment carry FOLD FOR FREE, by place value alone: Σ_{m<k} segReg_m(output) · 2^(m·gSep) = Σ_{m<k} (a_m + b_m) · 2^(m·gSep) = contiguousAugend f + contiguousAddend f. Once each `segReg_m(output) = a_m + b_m` is in hand (the MAIN lemma below, which MIRRORS `RunwayAdderAdvance.segRunway_runwayAddK_eq` but for the FULL register instead of just the runway bit), the headline is pure place-value algebra — no division, no truncation, no overflow precondition. THE HEADLINE (wired to `applyNat`, concrete recursive RHS, NOT free fields): contiguousDecode gSep k (Gate.applyNat (runwayAddK gSep k) f) = contiguousAugend gSep k f + contiguousAddend gSep k f. Everything is `Gate.applyNat (runwayAddK gSep k) f` read through concrete `def`s; no `sorry`, no `native_decide`, no axioms beyond the prelude. WHAT IS STILL OPEN (carried over). The multi-add SEQUENCE: between successive oblivious adds the runways must be folded/cleared for `kClean` to hold again, so the deferred carries accumulate across adds — the value-advance `Δ` relevant to the wrap/deviation bound. And the probabilistic wrap bound itself. This file closes ONLY the single-add encoding↔contiguous-value connection: at contiguous spacing the runway register decodes to exactly `contiguous(a) + contiguous(b)`, the inter-segment carry-fold performed implicitly by place value.

theoremsegAdd_fixes_segReg_below

theorem segAdd_fixes_segReg_below (gSep j m : Nat) (g : Nat → Bool) (hm : m < j) :
    decodeReg (cuccaroAdder.augendIdx (segBase gSep m)) (gSep + 1)
        (Gate.applyNat (segAdd gSep j) g)
      = decodeReg (cuccaroAdder.augendIdx (segBase gSep m)) (gSep + 1) g

*Frame lemma (FULL register).** For `m < j`, running segment `j`'s width-`(gSep+1)` add leaves segment `m`'s WHOLE `(gSep+1)`-bit register unchanged: every position `segBase m + 2i+1` (i ≤ gSep) is `< segBase j`, hence below segment `j`'s block `[segBase j, segBase j + (2gSep+3))`. Mirrors `RunwayAdderAdvance.segAdd_fixes_runway_below` but for the entire register, not just the runway bit.

theoremsegReg_runwayAddK_eq

theorem segReg_runwayAddK_eq (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kClean gSep k f → ∀ (m : Nat), m < k →
      segReg gSep m (Gate.applyNat (runwayAddK gSep k) f)
        = decodeReg (cuccaroAdder.augendIdx (segBase gSep m)) gSep f
            + decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f

*MAIN.** After running the full `k`-segment runway adder on a clean input, EACH segment `m < k`'s width-`(gSep+1)` register holds EXACTLY that segment's `a_m + b_m` (the gSep-bit augend read + the gSep-bit addend read), the carry deposited in the runway bit, NO truncation. Proved by induction on `k`, MIRRORING `RunwayAdderAdvance.segRunway_runwayAddK_eq` but for the full register (`segReg`) instead of the runway bit — no division. Literally about `Gate.applyNat (runwayAddK gSep k) f` with a CONCRETE `a_m + b_m` RHS.

defcontiguousDecode

def contiguousDecode (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f => contiguousDecode gSep k f + segReg gSep k f * 2 ^ (k * gSep)

*Contiguous place-value decode.** Read each segment's width-`(gSep+1)` register at the CONTIGUOUS place `2^(m·gSep)` (NOT the spread `2^(m·(gSep+1))`). Each segment's runway carry, internal place `2^gSep`, then lands at global place `2^((m+1)·gSep)` — segment `(m+1)`'s low place — so the inter-segment carry-fold is performed implicitly by place value.

defcontiguousAugend

def contiguousAugend (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f =>
      contiguousAugend gSep k f
        + decodeReg (cuccaroAdder.augendIdx (segBase gSep k)) gSep f * 2 ^ (k * gSep)

Contiguous augend value: `Σ_{m<k} a_m · 2^(m·gSep)`, reading each segment's `gSep`-bit augend register at contiguous spacing.

defcontiguousAddend

def contiguousAddend (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f =>
      contiguousAddend gSep k f
        + decodeReg (cuccaroAdder.addendIdx (segBase gSep k)) gSep f * 2 ^ (k * gSep)

Contiguous addend value: `Σ_{m<k} b_m · 2^(m·gSep)`.

theoremcontiguousDecode_congr

theorem contiguousDecode_congr (gSep : Nat) :
    ∀ (k : Nat) (f g : Nat → Bool),
      (∀ p, p < segBase gSep k → f p = g p) →
      contiguousDecode gSep k f = contiguousDecode gSep k g

`contiguousDecode gSep k` depends only on the state at positions `< segBase gSep k`: each of its segment registers `m < k` sits entirely below `segBase gSep k`. (Folds the FULL-register frame over the `k` summands.)

theoremrunwayAddK_contiguous

theorem runwayAddK_contiguous (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kClean gSep k f →
      contiguousDecode gSep k (Gate.applyNat (runwayAddK gSep k) f)
        = contiguousAugend gSep k f + contiguousAddend gSep k f

*HEADLINE — contiguous correctness.** Reading the runway adder's output at CONTIGUOUS place value yields EXACTLY `contiguous(a) + contiguous(b)`: the inter-segment carry-fold is done implicitly by place value. Proved by induction on `k`; the top segment is rewritten via the MAIN lemma (at `m = k`), the lower segments stay (the full-register frame, folded through `contiguousDecode_congr`) and reduce to the IH on `Gate.applyNat (runwayAddK gSep k) f`; then `ring`. Literally about `Gate.applyNat (runwayAddK gSep k) f` with a CONCRETE contiguous-sum RHS.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderFunctional.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional ────────────────────────────────────────────────── Thin re-export shim. The genuinely functional oblivious-carry-runway adder (formerly a single 754-line file) has been split into four submodules — for per-file compile memory only — each keeping the SAME module namespace `FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional`. Every declaration, statement, name and proof is preserved VERBATIM: • `…RunwayAdderFunctional.TwoSegment` — §1–§10: the k = 2 one-runway adder (layout, circuit, concrete decodes, operand values, well-typedness, `decodeReg` helpers, cross-block frame lemmas, per-segment exactness, the headline `runwayAdd2_exact`, and the k = 2 advance bound). • `…RunwayAdderFunctional.KSegment` — §11–§13: the uniform k-segment adder (`runwayAddK`, k-segment decodes/operands, k-segment frame lemmas, and the headline `runwayAddK_exact`). • `…RunwayAdderFunctional.Advance` — §14: the k-segment advance bound. • `…RunwayAdderFunctional.WellTyped` — §15: k-segment well-typedness. Importing `RunwayAdderFunctional` re-exports all four, so existing importers are unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.Advance

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderFunctional/Advance.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.Advance ────────────────────────────────────────────────── Submodule of `RunwayAdderFunctional` (split out for per-file compile memory). Contains §14: the k-segment advance bound — total runway occupancy ≤ k, and its `k = n/gSep` instantiation. Re-exported VERBATIM from the original `RunwayAdderFunctional.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

defkRunwayOccupancy

def kRunwayOccupancy (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f => kRunwayOccupancy gSep k f + segRunway gSep k f

Total deferred-carry occupancy across the `k` runways: `Σ_{j<k} segRunway j`.

theoremsegRunway_le_one

theorem segRunway_le_one (gSep j : Nat) (f : Nat → Bool) :
    segRunway gSep j f ≤ 1

Each segment runway holds at most one bit.

theoremrunwayAddK_advance

theorem runwayAddK_advance (gSep k : Nat) (f : Nat → Bool) :
    kRunwayOccupancy gSep k (Gate.applyNat (runwayAddK gSep k) f) ≤ k

*The k-segment runway-occupancy bound.** The total occupancy of the `k` runways is at most `k`. HONEST CAVEAT: this is STRUCTURALLY TRIVIAL — the proof (`key : ∀ j g, …`) holds for ANY state, since occupancy is a sum of `k` single-bit runways; the `Gate.applyNat (runwayAddK gSep k) f` argument is decorative here. It expresses only "we built `k` runways," NOT a non-trivial property of the circuit's action. The genuine circuit content lives in `runwayAddK_exact` (exact sum with carries deferred into the runways) and, for k = 2, `runwayAdd2_runway_eq_carry` (the runway bit IS the real carry).

defnumSegments

def numSegments (n gSep : Nat) : Nat

Number of `gSep`-data segments an `n`-bit register splits into.

theoremrunwayAddK_advance_div

theorem runwayAddK_advance_div (n gSep : Nat) (f : Nat → Bool) :
    kRunwayOccupancy gSep (numSegments n gSep)
        (Gate.applyNat (runwayAddK gSep (numSegments n gSep)) f)
      ≤ n / gSep

*Runway count at `k = n/gSep`.** Instantiating the occupancy bound at `k = n/gSep` segments gives `≤ n/gSep`. HONEST CAVEAT (do NOT overstate, cf. `runwayAddK_advance`): this is the STRUCTURAL "there are `n/gSep` runways, each ≤ 1 bit" fact — true for any state, `applyNat` decorative. It is NOT the deviation-relevant `Δ`. The paper's `Δ = n/g_sep` advance is about the deferred-carry VALUE accumulating over a SEQUENCE of additions relative to the coset padding — a multi-add invariant this single-add count does NOT establish. What IS genuinely circuit-wired is `runwayAddK_exact` (this adder computes the exact sum, carries deferred into the runways, in the runway-interspersed encoding). Connecting that encoding to the contiguous coset value, and the multi-add value-advance, remain open.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.KSegment

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderFunctional/KSegment.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.KSegment ────────────────────────────────────────────────── Submodule of `RunwayAdderFunctional` (split out for per-file compile memory). Contains §11–§13: the generalization to `k` uniform runway segments — the `runwayAddK` circuit, the k-segment decodes/operand values, the k-segment frame lemmas, and the headline `runwayAddK_exact`. Re-exported VERBATIM from the original `RunwayAdderFunctional.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

defsegStride

def segStride (gSep : Nat) : Nat

Qubits reserved per segment: the width-`(gSep+1)` Cuccaro span `2·gSep+3` (the `+1` augend bit is the runway).

defsegBase

def segBase (gSep j : Nat) : Nat

Base qubit of segment `j`.

defsegAdd

def segAdd (gSep j : Nat) : Gate

Segment `j`'s width-`(gSep+1)` Cuccaro add (runway = its top augend bit).

defrunwayAddK

def runwayAddK (gSep : Nat) : Nat → Gate
  | 0 => Gate.I
  | k + 1 => Gate.seq (runwayAddK gSep k) (segAdd gSep k)

*The uniform k-segment oblivious-carry-runway adder.** Segments added low-to-high; segment `k` is applied last (outermost).

defsegReg

def segReg (gSep j : Nat) (f : Nat → Bool) : Nat

Segment `j`'s `(gSep+1)`-bit register value (its data + runway).

defsegRunway

def segRunway (gSep j : Nat) (f : Nat → Bool) : Nat

Segment `j`'s runway bit (its top augend bit).

defkDecode

def kDecode (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f => kDecode gSep k f + segReg gSep k f * 2 ^ (k * (gSep + 1))

*k-segment place-value decode** (concrete): `Σ_{j<k} segReg j · 2^(j·(gSep+1))`.

defkAugend

def kAugend (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f =>
      kAugend gSep k f
        + decodeReg (cuccaroAdder.augendIdx (segBase gSep k)) gSep f
            * 2 ^ (k * (gSep + 1))

k-segment input augend value: `Σ_{j<k} a_j · 2^(j·(gSep+1))`, reading the `gSep` data bits of each segment's augend register (runways pre-cleared).

defkAddend

def kAddend (gSep : Nat) : Nat → (Nat → Bool) → Nat
  | 0, _ => 0
  | k + 1, f =>
      kAddend gSep k f
        + decodeReg (cuccaroAdder.addendIdx (segBase gSep k)) gSep f
            * 2 ^ (k * (gSep + 1))

k-segment input addend value: `Σ_{j<k} b_j · 2^(j·(gSep+1))`.

theoremsegAdd_fixes_below

theorem segAdd_fixes_below (gSep k : Nat) (f : Nat → Bool) (p : Nat)
    (hp : p < segBase gSep k) :
    Gate.applyNat (segAdd gSep k) f p = f p

Segment `k`'s add fixes every position strictly below its base — lower segments are untouched by a higher segment's add.

theoremrunwayAddK_fixes_above

theorem runwayAddK_fixes_above (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool) (p : Nat), segBase gSep k ≤ p →
      Gate.applyNat (runwayAddK gSep k) f p = f p

`runwayAddK gSep k` (segments `0…k-1`) fixes every position at or above `segBase gSep k`: each lower segment `j < k` lives in `[segBase j, (j+1)·stride)` and `(j+1)·stride ≤ k·stride = segBase k ≤ p`.

theoremsegReg_segAdd_exact_base

theorem segReg_segAdd_exact_base (gSep q : Nat) (f : Nat → Bool)
    (hAnc : f q = false)
    (hRunway0 : f (cuccaroAdder.augendIdx q gSep) = false)
    (hAddTop : f (cuccaroAdder.addendIdx q gSep) = false) :
    decodeReg (cuccaroAdder.augendIdx q) (gSep + 1)
        (Gate.applyNat (cuccaroAdder.circuit (gSep + 1) q) f)
      = decodeReg (cuccaroAdder.augendIdx q) gSep f
          + decodeReg (cuccaroAdder.addendIdx q) gSep f

*Per-segment exactness at an arbitrary base** (generalizes `lowReg_lowAdd_exact` to base `q`). A width-`(gSep+1)` Cuccaro at base `q`, run with its carry-in clean, runway pre-cleared, and addend top bit clear, leaves its `(gSep+1)`-bit register holding EXACTLY `a + b` (the carry-out is deposited in the runway, no truncation).

theoremkDecode_congr

theorem kDecode_congr (gSep : Nat) :
    ∀ (k : Nat) (f g : Nat → Bool),
      (∀ p, p < segBase gSep k → f p = g p) →
      kDecode gSep k f = kDecode gSep k g

`kDecode gSep k` depends only on the state at positions `< segBase gSep k`: its segment registers `j < k` all sit below `segBase gSep k`.

defkClean

def kClean (gSep k : Nat) (f : Nat → Bool) : Prop

Input cleanliness for the k-segment runway adder: every segment `j < k` has its carry-in ancilla, runway bit, and addend top bit pre-cleared to `0`.

theoremrunwayAddK_exact

theorem runwayAddK_exact (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kClean gSep k f →
      kDecode gSep k (Gate.applyNat (runwayAddK gSep k) f)
        = kAugend gSep k f + kAddend gSep k f

*k-segment runway adder, EXACT** (the genuine `Δ = n/g_sep` construction). With every segment pre-cleared (`kClean`), the concrete k-segment place-value decode of the output equals `augend + addend` EXACTLY — NO overflow precondition, because every segment has its own runway absorbing its single carry bit. Proven by induction on `k`; wired to `Gate.applyNat (runwayAddK gSep k) f`.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.TwoSegment

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderFunctional/TwoSegment.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.TwoSegment ────────────────────────────────────────────────── Submodule of `RunwayAdderFunctional` (split out for per-file compile memory). Contains §1–§10: the k = 2 (one-runway) oblivious-carry-runway adder — layout, circuit, concrete decodes, operand values, well-typedness, `decodeReg` helpers, cross-block frame lemmas, per-segment exactness, the headline `runwayAdd2_exact`, and the k = 2 advance bound. Re-exported VERBATIM from the original `RunwayAdderFunctional.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

defhighBase

def highBase (gSep : Nat) : Nat

Base qubit of the HIGH Cuccaro block: it sits immediately above the LOW block, whose span is `2·(gSep+1)+1 = 2·gSep+3`.

defrunwayWidth2

def runwayWidth2 (gSep : Nat) : Nat

Total qubit width of the k = 2 runway register: the low width-`(gSep+1)` Cuccaro span plus the high width-`gSep` Cuccaro span.

deflowAdd

def lowAdd (gSep : Nat) : Gate

The LOW segment add: a width-`(gSep+1)` Cuccaro at base `0`.

defhighAdd

def highAdd (gSep : Nat) : Gate

The HIGH segment add: a width-`gSep` Cuccaro at `highBase gSep`.

defrunwayAdd2

def runwayAdd2 (gSep : Nat) : Gate

*The k = 2 oblivious-carry-runway adder.** Low add (carry → runway), then high add (oblivious — runway not folded).

deflowReg

def lowReg (gSep : Nat) (f : Nat → Bool) : Nat

Low augend register value: the `(gSep+1)`-bit running-sum register at the low block (positions `2i+1`, i = 0…gSep). Bits `0…gSep-1` are the low data; bit `gSep` is the runway.

defhighReg

def highReg (gSep : Nat) (f : Nat → Bool) : Nat

High augend register value: the `gSep`-bit running-sum register at the high block (positions `highBase + 2i+1`, i = 0…gSep-1).

defrunwayBit

def runwayBit (gSep : Nat) (f : Nat → Bool) : Nat

The runway bit value (0 or 1): the top augend bit of the low block.

deffullDecode

def fullDecode (gSep : Nat) (f : Nat → Bool) : Nat

*Concrete full-register place-value decode.** The low `(gSep+1)`-bit register at place `2^0` (its top bit, the runway, sits at place `2^gSep`), plus the high `gSep`-bit register at place `2^(gSep+1)`. The gap at place `2^gSep` is exactly where the deferred low-segment carry (the runway) lives.

defaugendValue

def augendValue (gSep : Nat) (f : Nat → Bool) : Nat

Input augend value: low data (`gSep` bits at the low augend register) plus `2^(gSep+1)·` high data (`gSep` bits at the high augend register).

defaddendValue

def addendValue (gSep : Nat) (f : Nat → Bool) : Nat

Input addend value: low addend (`gSep` bits at the low addend register) plus `2^(gSep+1)·` high addend (`gSep` bits at the high addend register).

theoremrunwayAdd2_wellTyped

theorem runwayAdd2_wellTyped (gSep : Nat) :
    Gate.WellTyped (runwayWidth2 gSep) (runwayAdd2 gSep)

*`runwayAdd2` is well-typed** at `runwayWidth2 gSep`. Both Cuccaro blocks fit (the low at `[0, 2·gSep+3)`, the high at `[2·gSep+3, 4·gSep+4)`).

theoremdecodeReg_succ

theorem decodeReg_succ (idx : Nat → Nat) (n : Nat) (f : Nat → Bool) :
    decodeReg idx (n + 1) f
      = decodeReg idx n f + (if f (idx n) then 2 ^ n else 0)

*Append the top bit.** Reading `n+1` register bits = reading the low `n` bits plus the weighted top bit.

theoremdecodeReg_succ_of_top_false

theorem decodeReg_succ_of_top_false (idx : Nat → Nat) (n : Nat) (f : Nat → Bool)
    (h : f (idx n) = false) :
    decodeReg idx (n + 1) f = decodeReg idx n f

*Dropping a clear top bit.** If the top register bit `idx n` is `false`, reading `n+1` bits is the same as reading the low `n` bits.

theoremdecodeReg_lt

theorem decodeReg_lt (idx : Nat → Nat) (n : Nat) (f : Nat → Bool) :
    decodeReg idx n f < 2 ^ n

A register value at width `n` is `< 2^n`.

theoremlowReg_highAdd

theorem lowReg_highAdd (gSep : Nat) (f : Nat → Bool) :
    lowReg gSep (Gate.applyNat (highAdd gSep) f) = lowReg gSep f

The high block leaves the low augend register untouched: every low augend position `2i+1` (i ≤ gSep) is below `highBase`, hence outside the high block.

theoremlowAdd_fixes_high

theorem lowAdd_fixes_high (gSep : Nat) (f : Nat → Bool) (j : Nat) :
    Gate.applyNat (lowAdd gSep) f (highBase gSep + j) = f (highBase gSep + j)

The low block leaves any HIGH-block position untouched: every high position `highBase + j` (`j ≥ 0`) is at least `highBase = span(gSep+1)`, hence outside the low block `[0, span(gSep+1))`. Stated for a general high read index.

theoremlowReg_lowAdd_exact

theorem lowReg_lowAdd_exact (gSep : Nat) (f : Nat → Bool)
    (hLowAnc : f 0 = false)
    (hRunway0 : f (cuccaroAdder.augendIdx 0 gSep) = false)
    (hLowAddTop : f (cuccaroAdder.addendIdx 0 gSep) = false) :
    lowReg gSep (Gate.applyNat (lowAdd gSep) f)
      = decodeReg (cuccaroAdder.augendIdx 0) gSep f
          + decodeReg (cuccaroAdder.addendIdx 0) gSep f

*Low segment exactness (EXACT, with the deferred carry in the runway).** Running the low width-`(gSep+1)` Cuccaro add on an input whose runway bit (low augend bit `gSep`) and low-addend top bit are pre-cleared to `0`, the low `(gSep+1)`-bit register holds EXACTLY `a_lo + b_lo` (no truncation): the low data is `(a_lo+b_lo) mod 2^gSep` and the runway bit is the carry-out `(a_lo+b_lo)/2^gSep`. This is the genuine carry deposit into the runway.

theoremhighReg_highAdd

theorem highReg_highAdd (gSep : Nat) (g : Nat → Bool)
    (hHighAnc : g (highBase gSep) = false) :
    highReg gSep (Gate.applyNat (highAdd gSep) g)
      = (decodeReg (cuccaroAdder.augendIdx (highBase gSep)) gSep g
          + decodeReg (cuccaroAdder.addendIdx (highBase gSep)) gSep g) % 2 ^ gSep

*High segment exactness** (about `applyNat (highAdd) g`). Running the high width-`gSep` Cuccaro add on any state `g` with its carry-in clean, the high register holds `(a_hi + b_hi) mod 2^gSep` read off `g`.

theoremrunwayAdd2_exact

theorem runwayAdd2_exact (gSep : Nat) (f : Nat → Bool)
    (hLowAnc : f 0 = false)
    (hHighAnc : f (highBase gSep) = false)
    (hRunway0 : f (cuccaroAdder.augendIdx 0 gSep) = false)
    (hLowAddTop : f (cuccaroAdder.addendIdx 0 gSep) = false)
    (hHighNoOverflow :
      decodeReg (cuccaroAdder.augendIdx (highBase gSep)) gSep f
        + decodeReg (cuccaroAdder.addendIdx (highBase gSep)) gSep f < 2 ^ gSep) :
    fullDecode gSep (Gate.applyNat (runwayAdd2 gSep) f)
      = augendValue gSep f + addendValue gSep f

*k = 2 oblivious-carry-runway adder, EXACT.** With each segment's carry-in ancilla clean, the runway pre-cleared, the low addend's top bit clear, and the HIGH segment not itself overflowing (`a_hi+b_hi < 2^gSep`; there is no second runway in k = 2 to absorb a high carry), the concrete full-register place-value decode of the runway adder's output equals the true sum EXACTLY: fullDecode (applyNat (runwayAdd2 gSep) f) = augendValue f + addendValue f. The low-segment carry is genuinely CHAINED into the runway bit (place `2^gSep`) and reconstructed by `fullDecode`; the high data occupies place `2^(gSep+1)` and is independent — the carry was DEFERRED into the runway, not propagated. This is real carry arithmetic, not a tautology: every term is a `Gate.applyNat (runwayAdd2 …) f` read through concrete `def` decodes.

theoremlowReg_eq_data_add_runway

theorem lowReg_eq_data_add_runway (gSep : Nat) (f : Nat → Bool) :
    lowReg gSep f
      = decodeReg (cuccaroAdder.augendIdx 0) gSep f + 2 ^ gSep * runwayBit gSep f

The low `(gSep+1)`-bit register splits as low data (`gSep` bits) plus the runway bit at place `2^gSep`.

theoremrunwayBit_runwayAdd2_eq

theorem runwayBit_runwayAdd2_eq (gSep : Nat) (f : Nat → Bool) :
    runwayBit gSep (Gate.applyNat (runwayAdd2 gSep) f)
      = runwayBit gSep (Gate.applyNat (lowAdd gSep) f)

The runway position lies in the LOW block, so the HIGH add fixes it: the output runway bit equals the runway bit right after the LOW add.

theoremrunwayAdd2_runway_eq_carry

theorem runwayAdd2_runway_eq_carry (gSep : Nat) (f : Nat → Bool)
    (hLowAnc : f 0 = false)
    (hRunway0 : f (cuccaroAdder.augendIdx 0 gSep) = false)
    (hLowAddTop : f (cuccaroAdder.addendIdx 0 gSep) = false) :
    runwayBit gSep (Gate.applyNat (runwayAdd2 gSep) f)
      = (decodeReg (cuccaroAdder.augendIdx 0) gSep f
          + decodeReg (cuccaroAdder.addendIdx 0) gSep f) / 2 ^ gSep

*The deferred carry IS the runway bit** (exact). After the runway add, the runway bit holds exactly the genuine carry-out `(a_lo+b_lo)/2^gSep` of the low segment — proving the runway holds the REAL deferred carry, not a free qubit. Wired to `Gate.applyNat (runwayAdd2 gSep) f`.

theoremrunwayAdd2_advance

theorem runwayAdd2_advance (gSep : Nat) (f : Nat → Bool) :
    runwayBit gSep (Gate.applyNat (runwayAdd2 gSep) f) ≤ 1

*THE ADVANCE BOUND (audit-critical), wired to `applyNat`.** After the k = 2 runway add, the deferred-carry occupancy of the runway is at most ONE bit — the single deferred carry of the scheme. Literally about `Gate.applyNat (runwayAdd2 gSep) f`.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.WellTyped

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderFunctional/WellTyped.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderFunctional.WellTyped ────────────────────────────────────────────────── Submodule of `RunwayAdderFunctional` (split out for per-file compile memory). Contains §15: k-segment well-typedness — `wellTyped_mono`, the k-segment width, and `runwayAddK_wellTyped`. Re-exported VERBATIM from the original `RunwayAdderFunctional.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

theoremwellTyped_mono

theorem wellTyped_mono {dim dim' : Nat} {g : Gate}
    (h : Gate.WellTyped dim g) (hle : dim ≤ dim') : Gate.WellTyped dim' g

*WellTyped monotonicity** (local; enlarging the dimension preserves it). A self-contained copy so this file needs only the Cuccaro import.

defrunwayWidthK

def runwayWidthK (gSep k : Nat) : Nat

Total qubit width of the k-segment runway adder.

theoremrunwayAddK_wellTyped

theorem runwayAddK_wellTyped (gSep : Nat) :
    ∀ (k : Nat), 0 < k → Gate.WellTyped (runwayWidthK gSep k) (runwayAddK gSep k)

*`runwayAddK gSep k` is well-typed** at `runwayWidthK gSep k`. Each segment `j < k` fits in `[segBase j, (j+1)·stride) ⊆ [0, k·stride)`.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderMultiAdd.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd ──────────────────────────────────────────────── Thin re-export shim. CLOSING THE MULTI-ADD GAP for the oblivious-carry-runway adder (formerly a single 599-line file) has been split into three submodules — for per-file compile memory only — each keeping the SAME module namespace `FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd`. Every declaration, statement, name and proof is preserved VERBATIM: • `…RunwayAdderMultiAdd.Preserve` — §1–§5: gate iteration (`iterGate`), the iteration invariant `IterReady`, the cross-segment frame lemmas, the `IterReady`-preservation chain, and addend invariance (`runwayAddK_addend_eq`). • `…RunwayAdderMultiAdd.MultiAddStep` — §6–§8: the per-segment add step engine, iterated preservation over `t` runway adds, and the MAIN per-segment multi-add `runwayAddK_iter_segReg`. • `…RunwayAdderMultiAdd.Headline` — §9–§10: the HEADLINE contiguous multi-add (`runwayAddK_iter_contiguous`) and the standard `a + t·b` corollary `runwayAddK_iter_contiguous_clean`. Iterating the runway adder `t` times against the SAME addend register (restored bit-for-bit each add) accumulates, in the contiguous reading, `contiguousAugend + t·contiguousAddend`, EXACT under per-segment no-overflow. This is the deterministic core of the deferred-accumulation deviation. No `sorry`, no `native_decide`, no axioms beyond the prelude. Importing `RunwayAdderMultiAdd` re-exports all three, so existing importers are unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd.Headline

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderMultiAdd/Headline.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd.Headline ──────────────────────────────────────────────── Submodule of `RunwayAdderMultiAdd` (split out for per-file compile memory). Contains §9–§10: the HEADLINE contiguous multi-add exact under no-overflow (`runwayAddK_iter_contiguous`) and the standard `a + t·b` corollary under full input cleanliness (`IterReady_of_kClean` … `runwayAddK_iter_contiguous_clean`). Re-exported VERBATIM from the original `RunwayAdderMultiAdd.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

theoremrunwayAddK_iter_contiguous

theorem runwayAddK_iter_contiguous (gSep k t : Nat) (f : Nat → Bool)
    (hready : IterReady gSep k f)
    (hno : ∀ m, m < k →
      segReg gSep m f
        + t * decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f
        < 2 ^ (gSep + 1)) :
    contiguousDecode gSep k (Gate.applyNat (iterGate (runwayAddK gSep k) t) f)
      = contiguousDecode gSep k f + t * contiguousAddend gSep k f

*Deliverable #6 — HEADLINE, contiguous multi-add (EXACT under per-segment no-overflow).** Iterating the runway adder `t` times accumulates, in the CONTIGUOUS reading, contiguousDecode gSep k (applyNat (iterGate (runwayAddK gSep k) t) f) = contiguousDecode gSep k f + t · contiguousAddend gSep k f, EXACT, provided each segment's `(gSep+1)`-bit register never overflows over the whole run: `segReg_m f + t·b_m f < 2^(gSep+1)` for `m < k`. The augend term is the contiguous decode of the INPUT (each segment read at its full register, so `a + t·b` in the contiguous place-value reading). Proved by induction on a prefix `j ≤ k`, folding the MAIN per-segment multi-add (#5, mod dropped by no-overflow) into the `contiguousDecode`/`contiguousAddend` place-value recursion; `ring`. Wired to `Gate.applyNat (iterGate (runwayAddK gSep k) t) f` with a concrete `contiguousDecode f + t·contiguousAddend f` RHS.

theoremIterReady_of_kClean

theorem IterReady_of_kClean (gSep k : Nat) (f : Nat → Bool)
    (h : kClean gSep k f) : IterReady gSep k f

`kClean` implies `IterReady` (it additionally clears the runways).

theoremcontiguousDecode_eq_augend_of_kClean

theorem contiguousDecode_eq_augend_of_kClean (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), kClean gSep k f →
      contiguousDecode gSep k f = contiguousAugend gSep k f

With the input runways clean, the contiguous decode of the input equals the standard contiguous augend (`segReg_m f = a_m`, top runway bit dropped).

theoremrunwayAddK_iter_contiguous_clean

theorem runwayAddK_iter_contiguous_clean (gSep k t : Nat) (f : Nat → Bool)
    (hclean : kClean gSep k f)
    (hno : ∀ m, m < k →
      decodeReg (cuccaroAdder.augendIdx (segBase gSep m)) gSep f
        + t * decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f
        < 2 ^ (gSep + 1)) :
    contiguousDecode gSep k (Gate.applyNat (iterGate (runwayAddK gSep k) t) f)
      = contiguousAugend gSep k f + t * contiguousAddend gSep k f

*Corollary — standard contiguous multi-add `a + t·b`.** Under the full input cleanliness `kClean` (runways included), iterating the runway adder `t` times accumulates EXACTLY `contiguousAugend + t·contiguousAddend` in the contiguous reading, under per-segment no-overflow. This is the headline in the standard `a + t·b` shape.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd.MultiAddStep

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderMultiAdd/MultiAddStep.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd.MultiAddStep ──────────────────────────────────────────────── Submodule of `RunwayAdderMultiAdd` (split out for per-file compile memory). Contains §6–§8: the per-segment add step engine (`segReg_segAdd_step`, `runwayAddK_step_segReg`), iterated preservation over `t` runway adds (`iterGate_preserves_IterReady`, `iterGate_addend_eq`), and the MAIN per-segment multi-add `runwayAddK_iter_segReg`. Re-exported VERBATIM from the original `RunwayAdderMultiAdd.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

theoremsegReg_segAdd_step

theorem segReg_segAdd_step (gSep m : Nat) (g : Nat → Bool)
    (hAnc : g (segBase gSep m) = false)
    (hAddTop : g (cuccaroAdder.addendIdx (segBase gSep m) gSep) = false) :
    segReg gSep m (Gate.applyNat (segAdd gSep m) g)
      = (segReg gSep m g
          + decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep g) % 2 ^ (gSep + 1)

*Per-segment step engine** (about `applyNat (segAdd gSep m) g`). With only the carry-in clean and the addend top clear, segment `m`'s register advances by its gSep-bit addend, mod `2^(gSep+1)`. Uses `sumCorrect` directly on the full `(gSep+1)`-bit augend — the runway/top augend bit is NOT assumed clean.

theoremrunwayAddK_step_segReg

theorem runwayAddK_step_segReg (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), IterReady gSep k f → ∀ (m : Nat), m < k →
      segReg gSep m (Gate.applyNat (runwayAddK gSep k) f)
        = (segReg gSep m f
            + decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f) % 2 ^ (gSep + 1)

*Deliverable #4 — the per-segment step for the FULL runway adder.** Under `IterReady`, running the whole `runwayAddK gSep k` advances segment `m`'s `(gSep+1)`-bit register by its gSep-bit addend, mod `2^(gSep+1)`. Wired to `Gate.applyNat (runwayAddK gSep k) f`, with the addend read off `f`.

theoremiterGate_preserves_IterReady

theorem iterGate_preserves_IterReady (gSep k : Nat) :
    ∀ (t : Nat) (f : Nat → Bool), IterReady gSep k f →
      IterReady gSep k (Gate.applyNat (iterGate (runwayAddK gSep k) t) f)

`IterReady` is preserved by `t` iterations of the runway adder.

theoremiterGate_addend_eq

theorem iterGate_addend_eq (gSep k : Nat) :
    ∀ (t : Nat) (f : Nat → Bool), IterReady gSep k f → ∀ (m : Nat), m < k →
      decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep
          (Gate.applyNat (iterGate (runwayAddK gSep k) t) f)
        = decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f

The addend register `b_m` is invariant under `t` iterations of the runway adder.

theoremrunwayAddK_iter_segReg

theorem runwayAddK_iter_segReg (gSep k : Nat) :
    ∀ (t : Nat) (f : Nat → Bool), IterReady gSep k f → ∀ (m : Nat), m < k →
      segReg gSep m (Gate.applyNat (iterGate (runwayAddK gSep k) t) f)
        = (segReg gSep m f
            + t * decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f)
          % 2 ^ (gSep + 1)

*Deliverable #5 — MAIN.** Iterating the runway adder `t` times advances each segment `m`'s `(gSep+1)`-bit register by `t·b_m`, mod `2^(gSep+1)`: segReg_m (applyNat (iterGate (runwayAddK gSep k) t) f) = (segReg_m f + t · b_m f) mod 2^(gSep+1). Induction on `t`: base `t = 0` collapses the mod (`segReg < 2^(gSep+1)`); step uses the per-segment engine (#4) on the `t`-fold state (IterReady preserved, #2) with the addend fixed (#3) plus mod algebra. Wired to `Gate.applyNat (iterGate (runwayAddK gSep k) t) f`, with the concrete `t·b_m` RHS read off `f`.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd.Preserve

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayAdderMultiAdd/Preserve.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayAdderMultiAdd.Preserve ──────────────────────────────────────────────── Submodule of `RunwayAdderMultiAdd` (split out for per-file compile memory). Contains §1–§5: gate iteration (`iterGate`), the iteration invariant (`IterReady`), the cross-segment carry-in/addend-top frame lemmas, the `IterReady`-preservation chain (`segAdd_preserves_IterReady` … `runwayAddK_preserves_IterReady`), and addend invariance (`segAdd_fixes_addend_off` … `runwayAddK_addend_eq`). Re-exported VERBATIM from the original `RunwayAdderMultiAdd.lean`; the declarations, statements, names, namespace and `open`s are unchanged.

defiterGate

def iterGate (g : Gate) : Nat → Gate
  | 0 => Gate.I
  | t + 1 => Gate.seq (iterGate g t) g

`iterGate g t` runs `g` sequentially `t` times. `iterGate g (t+1)` puts the fresh copy LAST, so it unfolds to "run `iterGate g t`, then `g`".

defIterReady

def IterReady (gSep k : Nat) (f : Nat → Bool) : Prop

`iterGate g 0` is the identity on states.

theoremsegAdd_fixes_anc_off

theorem segAdd_fixes_anc_off (gSep j m : Nat) (g : Nat → Bool) (hjm : j ≠ m) :
    Gate.applyNat (segAdd gSep j) g (segBase gSep m) = g (segBase gSep m)

A DIFFERENT segment `j ≠ m`'s add fixes segment `m`'s carry-in ancilla position `segBase m` (disjoint Cuccaro blocks).

theoremsegAdd_fixes_addTop_off

theorem segAdd_fixes_addTop_off (gSep j m : Nat) (g : Nat → Bool) (hjm : j ≠ m) :
    Gate.applyNat (segAdd gSep j) g (cuccaroAdder.addendIdx (segBase gSep m) gSep)
      = g (cuccaroAdder.addendIdx (segBase gSep m) gSep)

A DIFFERENT segment `j ≠ m`'s add fixes segment `m`'s addend-top position `addendIdx (segBase m) gSep = segBase m + 2gSep+2` (disjoint blocks).

theoremsegAdd_restores_anc

theorem segAdd_restores_anc (gSep j : Nat) (g : Nat → Bool)
    (hAnc : g (segBase gSep j) = false) :
    Gate.applyNat (segAdd gSep j) g (segBase gSep j) = false

Segment `j`'s own add restores its carry-in ancilla (`ancRestored`, needs only the carry-in clean before the add).

theoremsegAdd_restores_addTop

theorem segAdd_restores_addTop (gSep j : Nat) (g : Nat → Bool)
    (hAnc : g (segBase gSep j) = false) :
    Gate.applyNat (segAdd gSep j) g (cuccaroAdder.addendIdx (segBase gSep j) gSep)
      = g (cuccaroAdder.addendIdx (segBase gSep j) gSep)

Segment `j`'s own add restores its addend's top bit (`addendRestored` at index `gSep < gSep+1`, needs only the carry-in clean before the add).

theoremsegAdd_preserves_IterReady

theorem segAdd_preserves_IterReady (gSep k j : Nat) (g : Nat → Bool)
    (hjk : j < k) (hready : IterReady gSep k g) :
    IterReady gSep k (Gate.applyNat (segAdd gSep j) g)

*One segment add preserves `IterReady`.** If segment `j`'s carry-in is clean before its add (so `ancRestored`/`addendRestored` apply), then `segAdd gSep j` re-establishes `IterReady` for every segment `m < k`: its own positions are restored, the others are untouched (disjoint blocks).

theoremrunwayAddK_prefix_preserves_IterReady

theorem runwayAddK_prefix_preserves_IterReady (gSep k : Nat) :
    ∀ (j : Nat), j ≤ k → ∀ (f : Nat → Bool), IterReady gSep k f →
      IterReady gSep k (Gate.applyNat (runwayAddK gSep j) f)

*A PREFIX `runwayAddK gSep j` (`j ≤ k`) preserves `IterReady gSep k`.** Each constituent `segAdd i` (`i < j ≤ k`) restores its own carry-in/addend-top and leaves the others untouched (`segAdd_preserves_IterReady`); fold over the `j` segments. This prefix form is what the addend/step inductions need.

theoremrunwayAddK_preserves_IterReady

theorem runwayAddK_preserves_IterReady (gSep : Nat) (k : Nat)
    (f : Nat → Bool) (hready : IterReady gSep k f) :
    IterReady gSep k (Gate.applyNat (runwayAddK gSep k) f)

*`runwayAddK gSep k` preserves `IterReady gSep k`.** So the precondition for the NEXT iteration of the runway adder holds. (The `j = k` case of the prefix lemma.)

theoremsegAdd_fixes_addend_off

theorem segAdd_fixes_addend_off (gSep j m : Nat) (g : Nat → Bool) (hjm : j ≠ m)
    (i : Nat) (hi : i < gSep) :
    Gate.applyNat (segAdd gSep j) g (cuccaroAdder.addendIdx (segBase gSep m) i)
      = g (cuccaroAdder.addendIdx (segBase gSep m) i)

A DIFFERENT segment `j ≠ m`'s add fixes every addend-register position `addendIdx (segBase m) i` (`i < gSep`) of segment `m` (disjoint blocks).

theoremsegAdd_restores_addend

theorem segAdd_restores_addend (gSep j : Nat) (g : Nat → Bool)
    (hAnc : g (segBase gSep j) = false) (i : Nat) (hi : i < gSep) :
    Gate.applyNat (segAdd gSep j) g (cuccaroAdder.addendIdx (segBase gSep j) i)
      = g (cuccaroAdder.addendIdx (segBase gSep j) i)

Segment `j`'s own add restores every addend-register bit `i < gSep` (`addendRestored`, needs only the carry-in clean).

theoremsegAdd_fixes_addend

theorem segAdd_fixes_addend (gSep k j : Nat) (g : Nat → Bool)
    (hjk : j < k) (hready : IterReady gSep k g) (m : Nat) (hm : m < k)
    (i : Nat) (hi : i < gSep) :
    Gate.applyNat (segAdd gSep j) g (cuccaroAdder.addendIdx (segBase gSep m) i)
      = g (cuccaroAdder.addendIdx (segBase gSep m) i)

*One segment add fixes every segment's addend register** (under `IterReady`). Its own register is restored bit-for-bit (`addendRestored`), the others are disjoint (frame).

theoremrunwayAddK_addend_eq

theorem runwayAddK_addend_eq (gSep : Nat) :
    ∀ (k : Nat) (f : Nat → Bool), IterReady gSep k f → ∀ (m : Nat), m < k →
      decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep
          (Gate.applyNat (runwayAddK gSep k) f)
        = decodeReg (cuccaroAdder.addendIdx (segBase gSep m)) gSep f

*Deliverable #3 — addend invariance.** Under `IterReady`, the full runway adder leaves each segment `m`'s `gSep`-bit addend register `b_m` unchanged: each `segAdd j` restores its own register and fixes the others. Wired to `Gate.applyNat (runwayAddK gSep k) f`.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayDeviationFaithful

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayDeviationFaithful.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayDeviationFaithful ──────────────────────────────────────────────────── THE FAITHFUL PER-RUNWAY DEVIATION BOUND for the oblivious-carry-runway scheme. ════════════════════════════════════════════════════════════════════════════ WHAT THIS FILE IS (and how it differs from `WindowedCosetDeviation`) ════════════════════════════════════════════════════════════════════════════ `WindowedCosetDeviation.lean` discharges the coset wrap bound with a VALUE-ADVANCE BAND model: it counts the offsets `j ∈ Ico (2^gpad − numAdds·adv) (2^gpad)` whose running value can grow into the top `numAdds·adv` of the offset window, with `adv = n/g_sep` carried as a hypothesised per-add value growth. That band is honest but DOES NOT MATCH THE CIRCUIT: `adv = n/g_sep` is taken as a value-growth assumption, not read off the runway adder's actual carries. THIS file builds the FAITHFUL model, tied to the ACTUAL circuit's carry sites: • Each of the `k = n/g_sep` runways holds a `g_pad`-bit coset-padding value, uniform over the random coset offset. A DEPOSITED carry (`c = 1`) makes that runway WRAP iff its padding value was all-ones `2^g_pad − 1`. So the per-runway wrap fraction is a genuine COUNTING fraction: perRunwayWrapFrac g_pad = (#offsets that are all-ones) / 2^g_pad = 1 / 2^g_pad (exactly ONE offset). • The number of runways that ACTUALLY carry on a given add is `kRunwayOccupancy (Gate.applyNat (runwayAddK gSep k) f)` — the REAL deferred carries of the runway adder, PROVEN `≤ k` in `RunwayAdderAdvance.runwayAddK_advance_genuine` (occupancy = the genuine carry-sum, each runway holding its segment's real 0/1 carry-out). So the per-add wrap fraction is literally `occupancy / 2^g_pad`, the circuit's own carry count over the padding window — bounded by `k / 2^g_pad`. • Over `numAdds` additions the union bound gives the faithful total `numAdds · k / 2^g_pad` (LITERAL `2^g_pad`), with `k = n/g_sep` the REAL runway count. • Over `numAdds` additions the union bound gives `numAdds·k/D`, where `D` is the per-runway offset space `2^g_pad`. At the PAPER's offset space `D = n²·n_e·1024` this EQUALS `totalDeviation` (§5, `totalWrapFracD_eq_totalDeviation`). ════════════════════════════════════════════════════════════════════════════ RESOLVED — the cost model is FAITHFUL; `n²·n_e·1024` IS the paper's `2^g_pad`. ════════════════════════════════════════════════════════════════════════════ (Earlier this file claimed `n²·n_e·1024` was a non-`2^g_pad` substitution and the models didn't match — THAT WAS WRONG; corrected after reading the cost model's paper citations.) `WindowedCostModel` (l.38-39, 162-164) records the paper's `g_pad = 2·lg n + lg n_e + 10` (main.tex:690) and its EXACT substitution `2^g_pad = 2^{2 lg n + lg n_e + 10} = n²·n_e·1024` (l.751). The identity is exact (`2^{2 lg n}=n²`, `2^{lg n_e}=n_e`, `2^10=1024`); it is merely not a power of two because the paper's `g_pad` is FRACTIONAL (for `n_e = 3072 = 3·1024`, `g_pad = 43.585`) — the paper treats `g_pad` as a continuous quantity in the deviation analysis. So the per-runway union-bound model matches the cost model EXACTLY when its offset space `D` is the paper's `n²·n_e·1024` (§5). TWO honest readings remain (neither a bug): (1) the per-runway wrap fraction `1/D` is the COUNTING fraction (one top offset out of `D`) TAKEN AS the uniform probability — no Mathlib measure space; (2) a PHYSICAL circuit pads by an INTEGER `g_pad`, so its actual offset space is `2^⌈43.585⌉ = 2^44 ≥ n²·n_e·1024`, making the real circuit's deviation `≤` the paper's number (the integer padding is conservative). Everything else — the per-add occupancy bound and the `= totalDeviation` algebra — is circuit-tied / exact. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

defwrapOffsets

def wrapOffsets (gpad : Nat) : Finset Nat

The offsets in `range (2^gpad)` whose padding value is all-ones `2^gpad − 1` — the ones for which a deposited carry overflows the runway. Characterised by `2^gpad ≤ v + 1`, i.e. `v ≥ 2^gpad − 1`.

theoremperRunway_wrapOffsets_card

theorem perRunway_wrapOffsets_card (gpad : Nat) :
    (wrapOffsets gpad).card = 1

*Per-runway wrap count = 1.** Exactly ONE `gpad`-bit offset (the all-ones `2^gpad − 1`) makes a deposited carry wrap the runway.

defperRunwayWrapFrac

def perRunwayWrapFrac (gpad : Nat) : ℚ

The per-runway wrap fraction: `(#wrapping offsets) / 2^gpad`, a genuine counting ratio over ℚ.

theoremperRunwayWrapFrac_eq

theorem perRunwayWrapFrac_eq (gpad : Nat) :
    perRunwayWrapFrac gpad = 1 / (2 : ℚ) ^ gpad

*The per-runway wrap fraction is `1 / 2^gpad`** — from the card-1 count, this is the uniform probability that the `gpad`-bit offset is all-ones.

defperAddWrapFrac

def perAddWrapFrac (gSep gpad k : Nat) (f : Nat → Bool) : ℚ

*Per-add wrap fraction (CIRCUIT-TIED).** After the `k`-segment runway adder runs on `f`, the fraction of runways that carry-AND-wrap, as `(real occupancy) / 2^gpad`. The numerator is literally `kRunwayOccupancy gSep k (Gate.applyNat (runwayAddK gSep k) f)` — the circuit's own deferred-carry count.

theoremperAddWrapFrac_le

theorem perAddWrapFrac_le (gSep gpad k : Nat) (f : Nat → Bool)
    (hclean : kClean gSep k f) :
    perAddWrapFrac gSep gpad k f ≤ (k : ℚ) / (2 : ℚ) ^ gpad

*Per-add wrap fraction ≤ k · (per-runway fraction).** Because the real runway occupancy is `≤ k` (`runwayAddK_advance_genuine`, the genuine carry-count bound), the per-add wrap fraction is `≤ k / 2^gpad` — `k` runways each contributing the per-runway `1/2^gpad`.

deftotalWrapFrac

def totalWrapFrac (numAdds k gpad : Nat) : ℚ

The total wrap fraction over `numAdds` additions, each contributing `≤ k` carrying runways at `1/2^gpad` apiece: `numAdds · k / 2^gpad`.

theoremcostModel_totalDeviation_form

theorem costModel_totalDeviation_form (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    (lookupAdditionCount n n_e) * (n / 1024) / (n ^ 2 * n_e * 1024)
      = totalDeviation n n_e

*The cost model's `totalDeviation`, unfolded** (NOT a bridge to the faithful model). This records the cost model's OWN definition: `totalDeviation` with `numAdds = lookupAdditionCount`, `n/1024 = n/g_sep`, and the SUBSTITUTED denominator `n²·n_e·1024`. HONEST CAVEAT: this does NOT mention `totalWrapFrac` and does NOT establish `totalWrapFrac = totalDeviation` — the faithful model's denominator is the LITERAL `2^g_pad`, whereas this uses `n²·n_e·1024` (≈ 1.5·2^g_pad, not a power of two), so the two are NOT formally equal (see the AUDIT FINDING in the file header). It is here only to exhibit the cost model's number for comparison.

theoremfaithful_per_add_deviation_le

theorem faithful_per_add_deviation_le (gSep gpad k : Nat) (f : Nat → Bool)
    (hclean : kClean gSep k f) :
    perAddWrapFrac gSep gpad k f ≤ totalWrapFrac 1 k gpad

*HEADLINE — FAITHFUL CIRCUIT-TIED DEVIATION.** Over `numAdds` additions, each on a clean input, the union-bound total wrap fraction is at most `numAdds · k / 2^g_pad`, where the per-add contribution is the circuit's REAL runway occupancy `kRunwayOccupancy (Gate.applyNat (runwayAddK gSep k) f)` (bounded by `runwayAddK_advance_genuine`). Stated as a per-add bound so the LHS genuinely references the circuit's carry count.

defperAddWrapFracD

def perAddWrapFracD (D : ℚ) (gSep k : Nat) (f : Nat → Bool) : ℚ

Per-add wrap fraction over the paper's ℚ offset space `D`. CIRCUIT-TIED: the numerator is the real `kRunwayOccupancy (Gate.applyNat (runwayAddK gSep k) f)`.

theoremperAddWrapFracD_le

theorem perAddWrapFracD_le (D : ℚ) (hD : 0 < D) (gSep k : Nat) (f : Nat → Bool)
    (hclean : kClean gSep k f) :
    perAddWrapFracD D gSep k f ≤ (k : ℚ) / D

The per-add wrap fraction is `≤ k/D` (real runway count over the offset space), via `runwayAddK_advance_genuine`.

deftotalWrapFracD

def totalWrapFracD (numAdds k D : ℚ) : ℚ

Total wrap fraction over `numAdds` additions at ℚ offset space `D`.

theoremtotalWrapFracD_eq_totalDeviation

theorem totalWrapFracD_eq_totalDeviation (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    totalWrapFracD (lookupAdditionCount n n_e) (n / 1024) (n ^ 2 * n_e * 1024)
      = totalDeviation n n_e

*THE GENUINE CONNECTION.** At the paper's runway parameters — `numAdds = lookupAdditionCount n n_e`, `k = n/g_sep = n/1024`, and the paper's offset space `D = 2^g_pad = n²·n_e·1024` — the per-runway union-bound total EQUALS the cost model's `totalDeviation` EXACTLY. Combined with `perAddWrapFracD_le` (the per-add contribution is the circuit's real occupancy/`D`), this is the faithful per-runway deviation, circuit-tied, equal to the paper's number.

theoremfaithful_total_deviation_le

theorem faithful_total_deviation_le (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    totalWrapFracD (lookupAdditionCount n n_e) (n / 1024) (n ^ 2 * n_e * 1024)
      ≤ 1 / 10000000

*The total deviation is `≤ 10⁻⁷` for RSA-2048** (via `totalDeviation_le`). This is the per-runway union-bound total at the paper's offset space `D = n²·n_e·1024 = 2^g_pad`, which `totalWrapFracD_eq_totalDeviation` (§5) shows EQUALS `totalDeviation`.

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayResource

FormalRV/Arithmetic/ObliviousRunwayAdder/RunwayResource.lean

FormalRV.Arithmetic.ObliviousRunwayAdder.RunwayResource — the RESOURCE face of the oblivious-carry-runway adder, bundled with its semantic correctness. `runwayAddK gSep k` (RunwayAdderFunctional) is the `k`-segment oblivious-carry-runway adder `Gate` whose value-correctness is `runwayAddK_exact` (it computes the segmented sum `kAugend + kAddend`, via `Gate.applyNat`). This file adds the matching closed-form Toffoli count on the SAME `Gate` — `k` segment adds, each a `(gSep+1)`-bit Cuccaro adder (`14·(gSep+1)` T = `2·(gSep+1)` Toffoli) — and the combined capstone `runwayAddK_verified`: ONE concrete circuit carrying BOTH semantic correctness and a resource count. No `native_decide`; the count walks the actual `Gate` (`tcount`).

theoremtcount_segAdd

theorem tcount_segAdd (gSep j : Nat) :
    tcount (segAdd gSep j) = 14 * (gSep + 1)

One segment add is a `(gSep+1)`-bit Cuccaro full adder: `14·(gSep+1)` T-gates, independent of the segment index.

theoremtcount_runwayAddK

theorem tcount_runwayAddK (gSep : Nat) :
    ∀ k, tcount (runwayAddK gSep k) = k * (14 * (gSep + 1))

*T-count of the k-segment runway adder**: `k · 14·(gSep+1)` — `k` segment adds in sequence, by induction on the `Gate.seq` chain.

theoremtoffoli_runwayAddK

theorem toffoli_runwayAddK (gSep k : Nat) :
    toffoliCount (runwayAddK gSep k) = 2 * k * (gSep + 1)

*Closed-form Toffoli count of the oblivious-carry-runway adder `Gate`**: `2·k·(gSep+1)` — `k` segments, each a `(gSep+1)`-bit Cuccaro adder (`2·(gSep+1)` Toffoli).

theoremrunwayAddK_verified

theorem runwayAddK_verified (gSep k : Nat) (f : Nat → Bool) (hclean : kClean gSep k f) :
    kDecode gSep k (Gate.applyNat (runwayAddK gSep k) f)
        = kAugend gSep k f + kAddend gSep k f
    ∧ toffoliCount (runwayAddK gSep k) = 2 * k * (gSep + 1)

*THE COMBINED RUNWAY CAPSTONE — one syntactic circuit, both faces.** The single oblivious-carry-runway adder `Gate` `runwayAddK gSep k` SIMULTANEOUSLY (a) computes the correct segmented sum `kAugend + kAddend` under `Gate.applyNat` (SEMANTIC CORRECTNESS on the actual syntactic structure, from a clean state) and (b) has the closed-form Toffoli count `2·k·(gSep+1)` (RESOURCE), for the SAME circuit. Kernel-clean.

FormalRV.Arithmetic.Phaseup

FormalRV/Arithmetic/Phaseup.lean

FormalRV.Arithmetic.Phaseup ───────────────────────────── The reusable **phaseup** gadget — Gidney 2025's third arithmetic subroutine: a phase-gradient table lookup that applies the table-indexed diagonal phase `(−1)^(ctrl ∧ F(addr))` at SELECT-SWAP √-cost (`O(2^(w/2))`), shared with Pinnacle. Import this umbrella to get the whole verified phaseup gadget — the canonical √-cost SELECT-SWAP `def` (`phaseup`) + the unsplit comparison `phaseupFull`, the diagonal phase-action correctness (`phaseup_diagonal`) and end-to-end measured-uncompute channel corollary (`measWordUncompute_phaseup`), and the Toffoli counts with the √-advantage as a THEOREM (`toffoli_phaseup`, `phaseup_toffoli_sqrt`) — as the single public entry point any paper audit can import. The phase mechanism is REUSED, not re-proved: `phaseup` wraps the verified `FormalRV.Shor.SplitPhaseFixup.splitPhaseLookup`, and every correctness / count theorem re-exports a `SplitPhaseFixup` / `PhaseLookupFixup` headline under the gadget's name. See `Phaseup/README.md` for the spine (which file holds which headline), the SELECT-SWAP √-cost explanation + ASCII diagram, and the Gidney 2025 / Pinnacle paper connection.

(no documented top-level declarations)

FormalRV.Arithmetic.Phaseup.Example

FormalRV/Arithmetic/Phaseup/Example.lean

FormalRV.Arithmetic.Phaseup.Example ───────────────────────────────────── Concrete demonstration of the reusable phaseup gadget on small cases. We `#eval` the Toffoli counts of the split phaseup vs the full table read — showing the SELECT-SWAP √-advantage numerically (`phaseup_toffoli_sqrt`) — and verify both the counts and the diagonal phase action with `decide` / the proven `phaseup_diagonal_addr` on real inputs. Everything here is `#eval` / `decide` of verified, kernel-clean objects (no axioms, no `native_decide`).

example(example)

example : toffoliCount (phaseupSkeleton 2 2 9) = 18

Machine-checked counts (the same facts as `toffoli_phaseup` / `toffoli_phaseupFull`, here `decide`d numerically).

example(example)

example : toffoliCount (phaseupFullSkeleton 4) = 30

example(example)

example : toffoliCount (phaseupSkeleton 4 4 17) = 90

example(example)

example : toffoliCount (phaseupFullSkeleton 8) = 510

example(example)

example : toffoliCount (phaseupSkeleton 2 2 9)
    < toffoliCount (phaseupFullSkeleton (2 + 2))

The √-advantage, numerically: split < full at `w = 4` (split as 2+2).

example(example)

example :
    uc_eval (phaseup 7 (fun v => v == 2) 1 1 5)
        * f_to_vec 7 (fun p => p == 0 || p == 3)
      = (-1 : ℂ) • f_to_vec 7 (fun p => p == 0 || p == 3)

Phase ON: address holds `v = 2` (lo = 0, hi = 1), table `F = [· = 2]` ⟹ phaseup applies the phase `−1`. Via the proven `phaseup_diagonal_addr`.

example(example)

example :
    uc_eval (phaseup 7 (fun v => v == 2) 1 1 5)
        * f_to_vec 7 (fun p => p == 0 || p == 1)
      = f_to_vec 7 (fun p => p == 0 || p == 1)

Phase OFF: address holds `v = 1` (lo = 1, hi = 0), table `F = [· = 2]` ⟹ phaseup is the identity (no phase).

FormalRV.Arithmetic.Phaseup.PhaseupCorrectness

FormalRV/Arithmetic/Phaseup/PhaseupCorrectness.lean

FormalRV.Arithmetic.Phaseup.PhaseupCorrectness ──────────────────────────────────────────────── CORRECTNESS for the reusable phaseup gadget: the diagonal phase action and the end-to-end measured-uncompute channel behaviour. Every proof here REUSES the verified `splitPhaseLookup` machinery in `FormalRV.Shor.SplitPhaseFixup` — NO phase semantics is re-derived; this file only surfaces the load-bearing `splitPhaseLookup_diagonal` under the gadget's public name. ## What "correct" means for a phase gadget Phaseup is a DIAGONAL operator at the amplitude / `BaseUCom` layer — it does NOT flip any Boolean wire (so there is no `applyNat` statement to make). Its contract is the phase it stamps on each basis state: `uc_eval (phaseup …) * |f⟩ = (−1)^(ctrl ∧ F(addr)) • |f⟩`, exactly when the AND-ladder and one-hot ancillas are clean (the gadget's own operating frame — surfaced HONESTLY as the hypotheses `hand`, `hhot`). This is `splitPhaseLookup_diagonal`, re-exported here verbatim. ## The required clean-ancilla hypotheses (honest) No address-driven phase circuit on this wire layout can act diagonally on a state whose AND-ladder ancillas are DIRTY (see `PhaseLookupFixup`'s module note: the abstract `hP` is strictly stronger than any real ancilla-using circuit can satisfy). So `phaseup_diagonal` carries: `hand : ∀ i < w, f (ulookup_and_idx i) = false` — AND-ladder clean, `hhot : ∀ h < 2^w1, f (base + h) = false` — one-hot ancillas clean. These are exactly the conditions the channel corollary's `SplitGoodState` bundles; they are met by every lookup-computed family the uncompute consumes. Refs: Gidney 2025 (phaseup); proof reuse from `Shor.SplitPhaseFixup`.

theoremphaseup_diagonal

theorem phaseup_diagonal (dim w1 w2 base : Nat) (F : Nat → Bool) (f : Nat → Bool)
    (hbase : 2 * (w1 + w2) < base) (hdim : base + 2 ^ w1 ≤ dim)
    (hand : ∀ i, i < w1 + w2 → f (ulookup_and_idx i) = false)
    (hhot : ∀ h, h < 2 ^ w1 → f (base + h) = false) :
    uc_eval (phaseup dim F w1 w2 base) * f_to_vec dim f
      = (if f ulookup_ctrl_idx && F (decAddr (w1 + w2) f) then (-1 : ℂ) else 1)
          • f_to_vec dim f

*★ HEADLINE — phaseup applies the table-indexed phase.** On EVERY basis state `f` whose AND-ladder and one-hot ancillas are clean (ctrl and address arbitrary), the phaseup gadget is DIAGONAL with phase `(−1)^(ctrl ∧ F(addr))`, where `addr = decAddr (w1 + w2) f` is the value held by the address wires. Reuses `splitPhaseLookup_diagonal` verbatim; the clean-ancilla hypotheses `hand`/`hhot` are surfaced honestly (no address-driven phase circuit acts diagonally on a ladder-dirty state).

theoremphaseup_diagonal_addr

theorem phaseup_diagonal_addr (dim w1 w2 base : Nat) (F : Nat → Bool)
    (v : Nat) (f : Nat → Bool)
    (hbase : 2 * (w1 + w2) < base) (hdim : base + 2 ^ w1 ≤ dim)
    (hv : v < 2 ^ (w1 + w2))
    (hctrl : f ulookup_ctrl_idx = true)
    (haddr : ∀ i, i < w1 + w2 → f (ulookup_address_idx i) = v.testBit i)
    (hand : ∀ i, i < w1 + w2 → f (ulookup_and_idx i) = false)
    (hhot : ∀ h, h < 2 ^ w1 → f (base + h) = false) :
    uc_eval (phaseup dim F w1 w2 base) * f_to_vec dim f
      = (if F v then (-1 : ℂ) else 1) • f_to_vec dim f

*HEADLINE (address form)** — the query-state shape: ctrl set, the address wires holding the bits of `v < 2^(w1+w2)`, ladders and one-hot ancillas clean ⟹ phaseup applies exactly the single table phase `(−1)^(F v)`.

theoremphaseupFull_diagonal

theorem phaseupFull_diagonal (dim w : Nat) (F : Nat → Bool) (f : Nat → Bool)
    (hdim : 2 * w < dim)
    (hand : ∀ i, i < w → f (ulookup_and_idx i) = false) :
    uc_eval (phaseupFull dim w F) * f_to_vec dim f
      = (if f ulookup_ctrl_idx && F (decAddr w f) then (-1 : ℂ) else 1)
          • f_to_vec dim f

The UNSPLIT phaseup has the SAME diagonal phase action (decoder form) — the object `phaseupFull` is verified to apply `(−1)^(ctrl ∧ F(addr))` too; it just costs the full table read. (Reuses `phaseLookup_diagonal`.)

theoremmeasWordUncompute_phaseup

theorem measWordUncompute_phaseup {dim : Nat} {ι : Type*}
    (w1 w2 base W : Nat) (pos : Nat → Nat) (T : Nat → Nat)
    (hbase : 2 * (w1 + w2) < base)
    (hdim : base + 2 ^ w1 ≤ dim)
    (hpos : ∀ j, j < W → pos j < dim)
    (hpos_high : ∀ j, j < W → base + 2 ^ w1 ≤ pos j)
    (hinj : ∀ j, j < W → ∀ k, k < W → j ≠ k → pos j ≠ pos k)
    (s : Finset ι) (α : ι → ℂ) (g : ι → Nat → Bool)
    (hgood : ∀ i ∈ s, SplitGoodState w1 w2 base (g i))
    (hword : ∀ i ∈ s, ∀ j, j < W →
        g i (pos j) = (T (decAddr (w1 + w2) (g i))).testBit j) :
    c_eval (measWordUncompute dim pos

*★ END-TO-END (channel) HEADLINE** — phaseup as the per-bit fixup of the measured lookup-uncompute is the PERFECT uncompute on every lookup-computed family (ctrl set, ladders + one-hot ancillas clean, word bit `j` holding `T[addr].bit j` on the support): coefficients intact, all `W` word bits released as `|0…0⟩`, no second lookup — at the Gidney–Ekerå √-cost. Re-exports `measWordUncompute_splitPhaseLookup` verbatim with `P j := phaseup dim (fun v => (T v).testBit j) w1 w2 base`.

FormalRV.Arithmetic.Phaseup.PhaseupDef

FormalRV/Arithmetic/Phaseup/PhaseupDef.lean

FormalRV.Arithmetic.Phaseup.PhaseupDef ──────────────────────────────────────── SHARED BASE for the reusable **phaseup** gadget — Gidney 2025's third arithmetic subroutine: a phase-gradient table lookup that applies the table-indexed phase `(−1)^(ctrl ∧ F(addr))` at SELECT-SWAP √-cost, shared with Pinnacle. ## The construction (composition, not from scratch) The phase mechanism is ALREADY proven in `FormalRV.Shor.SplitPhaseFixup` (`splitPhaseLookup`) and `FormalRV.Shor.PhaseLookupFixup` (`phaseLookup`). This file does NOT re-derive any phase machinery — it merely PACKAGES the verified split SELECT-SWAP lookup as the canonical reusable phaseup gadget, exposing the table `F`, the address width `w = w1 + w2`, the split `w1, w2`, and the one-hot `base` as a clean public interface any paper audit can import. `phaseup dim F w1 w2 base` — the CANONICAL phaseup: the √-cost SELECT-SWAP `splitPhaseLookup` (one-hot the hi half, CZ-leaf walk the lo half, un-one-hot), a diagonal phase at `4·(2^w1 − 1) + 2·(2^w2 − 1)` Toffolis. `phaseupFull dim w F` — the UNSPLIT `phaseLookup` for cost comparison, the full table read at `2·(2^w − 1)` Toffolis. `phaseupSkeleton` / `phaseupFullSkeleton` — the Gate-level T-content twins (the BaseUCom phase walk carries no T-counter; cost lives on the literal twin). Refs: Gidney 2025 (phaseup subroutine, √(2^w) SELECT-SWAP); Pinnacle (shares it).

defphaseup

def phaseup (dim : Nat) (F : Nat → Bool) (w1 w2 base : Nat) : BaseUCom dim

*THE phaseup gadget** — Gidney 2025's phase-gradient table lookup at SELECT-SWAP √-cost. On a basis state with ctrl/address holding the query and the AND-ladder + one-hot ancillas clean, it applies the diagonal phase `(−1)^(ctrl ∧ F(addr))` (correctness: `PhaseupCorrectness`), where the address is split `addr = hi‖lo` into the high `w1` levels and low `w2` levels (`w = w1 + w2`) and the `2^w1` one-hot ancillas sit at `base + h`. This is exactly the verified `splitPhaseLookup`: one-hot the hi half, run a CZ-leaf phase walk over the lo half, un-one-hot the hi half — the √-cost construction Gidney–Ekerå charge, vs the full table read `phaseupFull`.

defphaseupFull

def phaseupFull (dim w : Nat) (F : Nat → Bool) : BaseUCom dim

The UNSPLIT phaseup — the full-depth `phaseLookup` over the whole `w`-level address, for cost comparison. Same diagonal phase `(−1)^(ctrl ∧ F(addr))` but at the FULL table-read cost `2·(2^w − 1)` Toffolis.

defphaseupSkeleton

def phaseupSkeleton (w1 w2 base : Nat) : Gate

The Gate-level T-content twin of `phaseup`: two one-hot reads + the lo-walk classical skeleton (its CZ leaves are Clifford, contributing no T).

defphaseupFullSkeleton

def phaseupFullSkeleton (w : Nat) : Gate

The Gate-level T-content twin of `phaseupFull`: the full phase-walk skeleton.

FormalRV.Arithmetic.Phaseup.PhaseupResource

FormalRV/Arithmetic/Phaseup/PhaseupResource.lean

FormalRV.Arithmetic.Phaseup.PhaseupResource ───────────────────────────────────────────── COST for the reusable phaseup gadget — and the √-ADVANTAGE as a THEOREM. Phaseup is a DIAGONAL `BaseUCom`, which carries no T-counter; following the `PhaseLookupFixup`/`SplitPhaseFixup` convention, the Toffoli count lives on the literal Gate-level twin `phaseupSkeleton` (same one-hot reads + lo-walk classical skeleton; its CZ leaves are Clifford and contribute no T). ## The point of the gadget: the SELECT-SWAP √-cost `toffoli_phaseup … = 4·(2^w1 − 1) + 2·(2^w2 − 1)` (the split count) `toffoli_phaseupFull w = 2·(2^w − 1)` (the full table read) and the SPLIT BEATS THE FULL LOOKUP — `phaseup_toffoli_sqrt`: `4·(2^w1 − 1) + 2·(2^w2 − 1) ≤ 2·(2^(w1+w2) − 1)` for `w2 ≥ 1`, strictly `<` once both halves are real (`w1 ≥ 1, w2 ≥ 2`), and at the balanced split `w1 = w2 = w/2` the count is `≈ 4·(2^(w/2) − 1)` — the `O(√(2^w))` SELECT-SWAP advantage Gidney 2025 charges to the phaseup subroutine. Every count and inequality REUSES the verified `SplitPhaseFixup` skeleton lemmas; this file only restates them under the gadget's public names. Refs: Gidney 2025 (phaseup, √(2^w) SELECT-SWAP); proof reuse from `Shor.SplitPhaseFixup`.

theoremtoffoli_phaseup

theorem toffoli_phaseup (w1 w2 base : Nat) :
    toffoliCount (phaseupSkeleton w1 w2 base)
      = 4 * (2 ^ w1 - 1) + 2 * (2 ^ w2 - 1)

*★ Toffoli count of phaseup — the SELECT-SWAP √-cost.** `4·(2^w1 − 1) + 2·(2^w2 − 1)` (two one-hot reads + the lo-walk skeleton), i.e. `O(2^(w/2))` at the balanced split. Reuses `toffoliCount_splitPhaseLookupSkeleton`.

theoremtoffoli_phaseup_closed

theorem toffoli_phaseup_closed (w1 w2 base : Nat) :
    toffoliCount (phaseupSkeleton w1 w2 base) = 4 * 2 ^ w1 + 2 * 2 ^ w2 - 6

Toffoli count of phaseup, closed form: `4·2^w1 + 2·2^w2 − 6`.

theoremtoffoli_phaseupFull

theorem toffoli_phaseupFull (w : Nat) :
    toffoliCount (phaseupFullSkeleton w) = 2 * (2 ^ w - 1)

*Toffoli count of the UNSPLIT phaseup — the full table read.** `2·(2^w − 1)`. Reuses `toffoliCount_phaseLookupSkeleton`.

theoremphaseup_toffoli_sqrt

theorem phaseup_toffoli_sqrt (w1 w2 base : Nat) (hw2 : 1 ≤ w2) :
    toffoliCount (phaseupSkeleton w1 w2 base)
      ≤ toffoliCount (phaseupFullSkeleton (w1 + w2))

*★ HEADLINE — the phaseup SELECT-SWAP √-advantage.** Whenever the lo half is nonempty (`w2 ≥ 1`), the split phaseup costs NO MORE than the full table read: `toffoli_phaseup (w1, w2) ≤ toffoli_phaseupFull (w1 + w2)`, i.e. `4·(2^w1 − 1) + 2·(2^w2 − 1) ≤ 2·(2^(w1+w2) − 1)`. This is the paper's `√(2^w)` SELECT-SWAP claim: the address-split lookup replaces the linear `2^w` table read by `O(2^w1 + 2^w2)`. Reuses `toffoliCount_split_le_unsplit`.

theoremphaseup_toffoli_sqrt_strict

theorem phaseup_toffoli_sqrt_strict (w1 w2 base : Nat) (hw1 : 1 ≤ w1) (hw2 : 2 ≤ w2) :
    toffoliCount (phaseupSkeleton w1 w2 base)
      < toffoliCount (phaseupFullSkeleton (w1 + w2))

*The √-advantage is STRICT** once both halves are real (`w1 ≥ 1, w2 ≥ 2`): the split phaseup is strictly cheaper than the full table read. Reuses `toffoliCount_split_lt_unsplit`.

theoremphaseup_toffoli_sqrt_balanced

theorem phaseup_toffoli_sqrt_balanced (k base : Nat) (hk : 2 ≤ k) :
    toffoliCount (phaseupSkeleton k k base)
      < toffoliCount (phaseupFullSkeleton (k + k))

*The balanced-split headline** — at `w1 = w2 = w/2` (any `w = 2k ≥ 4`) the phaseup is STRICTLY cheaper than the full table read. Here the count is `4·(2^k − 1) + 2·(2^k − 1) = 6·(2^k − 1) ≈ 6·2^(w/2) = O(√(2^w))`, vs the full `2·(2^(2k) − 1) = O(2^w)`. Reuses `toffoliCount_split_halves_lt_unsplit`.

theoremtoffoli_phaseup_balanced

theorem toffoli_phaseup_balanced (k base : Nat) :
    toffoliCount (phaseupSkeleton k k base) = 6 * (2 ^ k - 1)

The balanced-split count is exactly `6·(2^k − 1)` — `O(√(2^(2k)))`.

FormalRV.Arithmetic.RCIR

FormalRV/Arithmetic/RCIR.lean

FormalRV.BQAlgo.RCIR — backward-compat shim. The IR `RCIRGate` and its `tcount` originally lived here; they have been promoted to the `Framework` layer (see `Framework/Gate.lean` and `Framework/Semantics.lean`) so that BQ-Arch and BQ-Code modules can also reason about gates / circuits semantically. This file just re-exports `Gate` under the legacy name `RCIRGate`. New code should `import FormalRV.Core.Gate` directly and use `Gate`.

abbrevRCIRGate

abbrev RCIRGate

Legacy alias — use `FormalRV.Framework.Gate` for new code.

FormalRV.Arithmetic.RippleCarryAdder

FormalRV/Arithmetic/RippleCarryAdder.lean

# FormalRV.Arithmetic.RippleCarryAdder The verified **Gidney** ripple-carry adder gadget. Auditors should read the thin spine — `RippleCarryAdderDef` (THE definition `gidney_adder`), `RippleCarryAdderCorrectness` (THE correctness theorems), and `RippleCarryAdderResource` (T-count / qubits / RSA-2048) — plus the folder `README.md`. The remaining files are heavy supporting proofs.

(no documented top-level declarations)

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.InputEval

FormalRV/Arithmetic/RippleCarryAdder/ClassicalBridge/InputEval.lean

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.InputEval Part 3/4: `adder_input_F` position-evaluation lemmas, the k=0 cascade base case, and the last-bit bit-extraction helper. Builds on `SumfbTestBit`.

theoremadder_input_F_at_bottom

theorem adder_input_F_at_bottom (n a b : Nat) :
    adder_input_F n a b 2 = false

*Preliminary lemma** (partial — bottom 3 positions only): `adder_input_F n a b` evaluates as expected at qubit indices 0, 1, 2 (positions handled by the first-bit step).

theoremadder_input_F_at_read_idx

theorem adder_input_F_at_read_idx
    (n a b j : Nat) (hj : j < n) :
    adder_input_F n a b (read_idx j) = a.testBit j

*`adder_input_F` at `read_idx j`**: evaluates to `a.testBit j` when `j < n`.

theoremadder_input_F_at_target_idx

theorem adder_input_F_at_target_idx
    (n a b j : Nat) (hj : j < n) :
    adder_input_F n a b (target_idx j) = b.testBit j

*`adder_input_F` at `target_idx j`**: evaluates to `b.testBit j` when `j < n`.

theoremadder_input_F_at_carry_idx

theorem adder_input_F_at_carry_idx
    (n a b j : Nat) :
    adder_input_F n a b (carry_idx j) = false

*`adder_input_F` at `carry_idx j`**: always `false` (carry register starts clean). No bound on `j` needed.

theoremadder_input_F_at_first_bit_positions

theorem adder_input_F_at_first_bit_positions
    (n a b : Nat) (hn : 1 < n) :
    adder_input_F n a b 0 = a.testBit 0
    ∧ adder_input_F n a b 1 = b.testBit 0
    ∧ adder_input_F n a b 2 = false
    ∧ adder_input_F n a b 3 = a.testBit 1
    ∧ adder_input_F n a b 4 = b.testBit 1

*`adder_input_F` evaluation at the 5 first-bit-step positions** (Iter 165). Closes the gap between `adder_input_F n a b` (which is parameterized by Nat `n a b`) and `(a.testBit 0, b.testBit 0, false, a.testBit 1, b.testBit 1)` (which is pure Bool). The hypothesis `hn : 1 < n` is needed for positions 3 and 4 (where `k / 3 = 1`, so `decide (1 < n) = true` is required to reduce the `decide` guard). Together with `gidney_first_bit_post_state_in_bits` (Iter 164), this unblocks the proof of `TODO_gidney_first_bit_preserves`.

theoremGidney.propagation_step_invariant_base_k0

theorem Gidney.propagation_step_invariant_base_k0
    (n a b : Nat) (_ha : a < 2^n) (_hb : b < 2^n) :
    Gidney.propagation_step_invariant 0 n a b
      (gidney_propagation_post_state 0 (adder_input_F n a b))

*Base case k=0 of the cascade induction** (Iter 176, PROVEN). The invariant `Gidney.propagation_step_invariant 0 n a b` holds for the input `adder_input_F n a b`. `propagation_post_state 0 f = f`, so this reduces to showing `adder_input_F` has the right values at all positions. Uses the 3 evaluation lemmas above.

example(example)

example :
    let pre

*Last-bit smoke-test** (Iter 169): apply `gidney_last_bit_post_state` at i=1 to the post-first-bit state of `inputF_1_plus_1` (2-bit adder). Expected: carry_1 = MAJ(0, 0, 1) = 0 (chain CX cancels CCX write). Note: `gidney_last_bit_post_state` was originally defined at line 1081 (Iter 67). This tick adds the bit-extraction lemma.

theoremgidney_last_bit_post_state_in_bits

theorem gidney_last_bit_post_state_in_bits
    (i : Nat) (hi : 0 < i) (f : Nat → Bool)
    (h_cinit : f (carry_idx i) = false) :
    (gidney_last_bit_post_state i f) (carry_idx i)
      = xor (f (read_idx i) && f (target_idx i)) (f (carry_idx (i - 1)))

*Bit-extraction helper for last-bit step** (Iter 169). Mirrors Iter 164 (first-bit) and Iter 167 (interior). Last step has only 2 gates; single conjunct (only carry_i is touched).

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.PropagationInvariantBackbone

FormalRV/Arithmetic/RippleCarryAdder/ClassicalBridge/PropagationInvariantBackbone.lean

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.PropagationInvariantBackbone BACKBONE (part 4/4): per-step frame conditions + first/interior/last preservation theorems, the k→k+1 cascade step `gidney_propagation_step_invariant_step`, and the headline parametric invariant `Gidney.propagation_step_invariant_holds`. Builds on `InputEval`.

theoremgidney_first_bit_post_state_preserves_outside

theorem gidney_first_bit_post_state_preserves_outside
    (f : Nat → Bool) (k : Nat)
    (h_c0 : k ≠ carry_idx 0)
    (h_r1 : k ≠ read_idx 1)
    (h_t1 : k ≠ target_idx 1) :
    (gidney_first_bit_post_state f) k = f k

*First-bit step frame condition**: positions other than {carry_0, read_1, target_1} (= {2, 3, 4}) are unchanged.

theoremgidney_last_bit_post_state_preserves_outside

theorem gidney_last_bit_post_state_preserves_outside
    (i : Nat) (f : Nat → Bool) (k : Nat)
    (h_ci : k ≠ carry_idx i) :
    (gidney_last_bit_post_state i f) k = f k

*Last-bit step frame condition**: positions other than {carry_i} are unchanged. (Last-bit only writes to carry_i.)

theoremgidney_last_bit_preserves

theorem gidney_last_bit_preserves (i a b : Nat) (hi : 0 < i) (f : Nat → Bool)
    (h_ri : f (read_idx i)
              = xor (a.testBit i) (Adder.carry false i a.testBit b.testBit))
    (h_ti : f (target_idx i)
              = xor (b.testBit i) (Adder.carry false i a.testBit b.testBit))
    (h_cim1 : f (carry_idx (i - 1))
                = Adder.carry false i a.testBit b.testBit)
    (h_ci : f (carry_idx i) = false) :
    (gidney_last_bit_post_state i f) (carry_idx i)
      = Adder.carry false (i + 1) a.testBit b.testBit

*Last-bit-step preservation theorem (PROVEN, Iter 171)**. Adapter from Iter 169's bit-extraction helper to the carry recurrence. Simpler than interior (no propagation). Given a state `f` satisfying the "step (i-1) END invariant" (i.e., position i-1 fully processed, position i clean): - `f(read_i) = a_i ⊕ c`, `f(target_i) = b_i ⊕ c` - `f(carry_{i-1}) = c` where `c = Adder.carry false i a.testBit b.testBit` - `f(carry_i) = false` Applying `gidney_last_bit_post_state i` yields: - `post(carry_i) = c_{i+1} = Adder.carry false (i+1) a.testBit b.testBit` No propagation to position (i+1) since this is the last bit. The carry-out identity `((a⊕c) ∧ (b⊕c)) ⊕ c = MAJ(a,b,c)` is the same as interior.

example(example)

example :
    -- The interior step at i=1 transforms inputF_3_plus_1's post-first-bit state.
    -- inputF_3_plus_1 (a=3, b=1) → first-bit step → interior step at i=1.
    let post_first

*Smoke-test**: `gidney_interior_bit_post_state 1` on the (3, 1) 3-bit input matches the existing decide-witnessed post-state. Validates the def's correctness on a concrete instance before attempting the parametric bit-extraction proof.

theoremgidney_interior_bit_post_state_eq

theorem gidney_interior_bit_post_state_eq
    (i : Nat) (f : Nat → Bool) :
    gidney_interior_bit_post_state i f
      = gidney_bit_step_faithful_post_state i f

*Bridge lemma** (Iter 172): the Iter 166-defined `gidney_interior_bit_post_state` is identical to the existing `gidney_bit_step_faithful_post_state` (line 570) used by the propagation cascade. Same 4-update body. Provable by `rfl`. Iter 166 inadvertently introduced this duplicate def. The bridge lets us apply Iter 170's `gidney_interior_bit_preserves` (which uses the Iter 166 name) to the cascade's interior steps (which use the existing name).

theoremgidney_interior_bit_post_state_preserves_outside

theorem gidney_interior_bit_post_state_preserves_outside
    (i : Nat) (f : Nat → Bool) (k : Nat)
    (h_ci : k ≠ carry_idx i)
    (h_ri1 : k ≠ read_idx (i + 1))
    (h_ti1 : k ≠ target_idx (i + 1)) :
    (gidney_interior_bit_post_state i f) k = f k

*Interior-bit step frame condition** (Iter 173): positions other than {carry_i, read_{i+1}, target_{i+1}} are unchanged by the interior-bit step at position `i`.

theoremgidney_interior_bit_post_state_in_bits

theorem gidney_interior_bit_post_state_in_bits
    (i : Nat) (hi : 0 < i) (f : Nat → Bool)
    (h_cinit : f (carry_idx i) = false) :
    (gidney_interior_bit_post_state i f) (carry_idx i)
      = xor (f (read_idx i) && f (target_idx i)) (f (carry_idx (i - 1)))
    ∧ (gidney_interior_bit_post_state i f) (read_idx (i + 1))
        = xor (f (read_idx (i + 1)))
              ((gidney_interior_bit_post_state i f) (carry_idx i))
    ∧ (gidney_interior_bit_post_state i f) (target_idx (i + 1))
        = xor (f (target_idx (i + 1)))
              ((gidney_interior_bit_post_state i f) (carry_idx i))

*Bit-extraction helper for interior step** (Iter 167, PROVEN). Analog of Iter 164's first-bit version. Proven via `omega`- derived index inequalities + `update_neq` chain.

theoremgidney_interior_bit_preserves

theorem gidney_interior_bit_preserves (i a b : Nat) (hi : 0 < i) (f : Nat → Bool)
    (h_ri : f (read_idx i)
              = xor (a.testBit i) (Adder.carry false i a.testBit b.testBit))
    (h_ti : f (target_idx i)
              = xor (b.testBit i) (Adder.carry false i a.testBit b.testBit))
    (h_cim1 : f (carry_idx (i - 1))
                = Adder.carry false i a.testBit b.testBit)
    (h_ci : f (carry_idx i) = false)
    (h_ri1 : f (read_idx (i + 1)) = a.testBit (i + 1))
    (h_ti1 : f (target_idx (i + 1)) = b.testBit (i + 1)) :
    let post

*Interior-bit-step preservation theorem (PROVEN, Iter 170)**. Adapter from Iter 167's bit-extraction helper to the classical-carry-recurrence form. Given a state `f` satisfying the "step (i-1) END invariant": - `f(read_i) = a_i ⊕ c`, `f(target_i) = b_i ⊕ c` (propagated by prev step) - `f(carry_{i-1}) = c` (carry from prev step) - `f(carry_i) = false` (carry register unmodified up to position i) - `f(read_{i+1}) = a_{i+1}`, `f(target_{i+1}) = b_{i+1}` (unchanged from input) Applying `gidney_interior_bit_post_state i` yields a state satisfying the "step i END invariant": - `post(carry_i) = c_{i+1} = Adder.carry false (i+1) a.testBit b.testBit` - `post(read_{i+1}) = a_{i+1} ⊕ c_{i+1}` - `post(target_{i+1}) = b_{i+1} ⊕ c_{i+1}` The carry-out identity: `((a_i ⊕ c) ∧ (b_i ⊕ c)) ⊕ c = MAJ(a_i, b_i, c)`.

theoremgidney_first_bit_post_state_in_bits

theorem gidney_first_bit_post_state_in_bits
    (f : Nat → Bool) (h2 : f 2 = false) :
    (gidney_first_bit_post_state f) 2 = (f 0 && f 1)
    ∧ (gidney_first_bit_post_state f) 3 = xor (f 3) (f 0 && f 1)
    ∧ (gidney_first_bit_post_state f) 4 = xor (f 4) (f 0 && f 1)

*Bit-extraction helper for first-bit step** (Iter 164): captures the classical action of `gidney_first_bit_post_state` on an arbitrary input function `f`, parameterized by the 5 relevant bit values at positions 0, 1, 2, 3, 4. Per Iter 162 reflection pattern A (bit-extraction): take Bool values as inputs, NOT a free Nat. This avoids the "decide on free Nat vars" obstacle entirely — the proof is pure Bool case-analysis (16 sub-goals over the 4 free Bool vars). The relationship: `gidney_first_bit_post_state f` at positions 2 (carry_0), 3 (read_1), 4 (target_1): - post 2 = f 0 ∧ f 1 (CCX write) - post 3 = f 3 ⊕ (f 0 ∧ f 1) (CX propagation) - post 4 = f 4 ⊕ (f 0 ∧ f 1) (CX propagation) Note `f 2` (= carry_0's initial value) is XOR'd into the CCX write, but for our adder input `f 2 = false`, so the XOR is trivial. We absorb this via `h2 : f 2 = false`.

theoremgidney_first_bit_preserves

theorem gidney_first_bit_preserves (n a b : Nat)
    (hn : 1 < n) (_ha : a < 2^n) (_hb : b < 2^n) :
    let post

*First-bit-step preservation theorem (PROVEN, Iter 165)**: applying `gidney_first_bit_post_state` to the encoded input `adder_input_F n a b` (with `n ≥ 2`) produces a state where `carry_0 = c_1`, `read_1 = a_1 ⊕ c_1`, `target_1 = b_1 ⊕ c_1`, where `c_1 = Adder.carry false 1 (a.testBit) (b.testBit) = a_0 ∧ b_0`. *Proof** (post Iter 162 reflection's pattern A bit-extraction): glue `gidney_first_bit_post_state_in_bits` (Iter 164, pure Bool case-bash) with `adder_input_F_at_first_bit_positions` (Iter 165 preliminary, uses `hn : 1 < n` to evaluate the `decide` guards). Closes the original `TODO_gidney_first_bit_preserves` from Iter 160.

theoremGidney.propagation_step_invariant_k1

theorem Gidney.propagation_step_invariant_k1
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    Gidney.propagation_step_invariant 1 n a b
      (gidney_propagation_post_state 1 (adder_input_F n a b))

*Inductive step k=0 → k=1 of cascade induction** (Iter 177, PROVEN). Applying `gidney_first_bit_post_state` to `adder_input_F n a b` produces a state satisfying step-1 invariant. Uses `gidney_first_bit_preserves` (touched positions) + frame condition + adder_input_F evaluations (outside positions).

theoremgidney_propagation_step_invariant_step

theorem gidney_propagation_step_invariant_step
    (k n a b : Nat) (hk : 1 ≤ k) (hk_n : k + 1 < n)
    (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n)
    (h_prev : Gidney.propagation_step_invariant k n a b
                (gidney_propagation_post_state k (adder_input_F n a b))) :
    Gidney.propagation_step_invariant (k + 1) n a b
      (gidney_propagation_post_state (k + 1) (adder_input_F n a b))

*Inductive step `k → k+1` of the forward propagation cascade.** For `k ≥ 1`, if the post-state after `k` steps satisfies the step-`k` propagation invariant, then applying the interior step at position `k` yields the step-`(k+1)` invariant. The workhorse of the cascade induction in `gidney_propagation_step_invariant_holds`.

theoremGidney.propagation_step_invariant_holds

theorem Gidney.propagation_step_invariant_holds
    (k n a b : Nat) (hkn : k < n) (hn : 1 < n)
    (ha : a < 2^n) (hb : b < 2^n) :
    Gidney.propagation_step_invariant k n a b
      (gidney_propagation_post_state k (adder_input_F n a b))

*Parametric propagation invariant** (Iter 179, PROVEN — but depends on Iter 178's sorried step lemma). By induction on `k`: - Base case k=0: `propagation_step_invariant_base_k0`. - k=1: `propagation_step_invariant_k1`. - k ≥ 2: `gidney_propagation_step_invariant_step`. The result: for any k with `k + 1 ≤ n`, `gidney_propagation_post_state k (adder_input_F n a b)` satisfies the step-k invariant. With the structural recursion form, the induction goes via `Nat.rec`.

example(example)

example :
    (∀ k, k < 6 →
       adder_input_F 2 1 0 k = inputF_1_plus_0 k)

*Generic ↔ concrete check #1**: `adder_input_F 2 1 0` matches `inputF_1_plus_0` at all 6 qubits of the 2-bit adder.

example(example)

example :
    (∀ k, k < 6 →
       adder_input_F 2 1 1 k = inputF_1_plus_1 k)

*Generic ↔ concrete check #2**: `adder_input_F 2 1 1` matches `inputF_1_plus_1` at all 6 qubits.

example(example)

example :
    (∀ k, k < 9 →
       adder_input_F 3 3 1 k = inputF_3_plus_1 k)

*Generic ↔ concrete check #3**: `adder_input_F 3 3 1` matches `inputF_3_plus_1` at all 9 qubits.

example(example)

example :
    (∀ k, k < 12 →
       adder_input_F 4 7 1 k = inputF_7_plus_1 k)

*Generic ↔ concrete check #4**: `adder_input_F 4 7 1` matches `inputF_7_plus_1` at all 12 qubits.

example(example)

example :
    adder_sum_bit_classical 7 1 0 = false
    ∧ adder_sum_bit_classical 7 1 1 = false
    ∧ adder_sum_bit_classical 7 1 2 = false
    ∧ adder_sum_bit_classical 7 1 3 = true

*Classical sum-bit concrete check**: bit 0 of (7+1)=8 is 0, bit 1 is 0, bit 2 is 0, bit 3 is 1 (binary "1000").

example(example)

example :
    Gidney.post_last_bit_invariant 2 1 1
      (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 1))

*Decide-witness for `post_last_bit_invariant` on (n=2, a=1, b=1)** (Iter 187). Validates that after forward cascade only (no final-CX), `target_1 = b_1 ⊕ c_1 = 0 ⊕ 1 = 1` (still propagated, not yet canceled). This is the state BEFORE the final-CX layer.

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.SumfbTestBit

FormalRV/Arithmetic/RippleCarryAdder/ClassicalBridge/SumfbTestBit.lean

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.SumfbTestBit Part 2/4: the classical-arithmetic bridge `Adder.sumfb b f g i = (a+b).testBit i` (testBit_add_zero, carry_shift_one, the carry-in-parametric gen lemma, and the headline `Adder.sumfb_eq_testBit_add`). Builds on `UnfoldAndCarry`.

theoremAdder.testBit_add_zero

theorem Adder.testBit_add_zero (a b : Nat) :
    (a + b).testBit 0 = xor (a.testBit 0) (b.testBit 0)

*Base case of the classical-correctness bridge** (Iter 163, new): `(a + b).testBit 0 = a.testBit 0 ⊕ b.testBit 0`. This is the i=0 specialization of `Adder.sumfb_eq_testBit_add`. The proof goes via Nat's mod-2 arithmetic: `Nat.testBit n 0 ↔ n % 2 = 1`, and `(a + b) % 2 = (a % 2 + b % 2) % 2` (which equals `a % 2 ⊕ b % 2` for Bool-valued mods). This closes the base case of the planned induction on i for `TODO_sumfb_eq_testBit_add`.

lemmaAdder.carry_shift_one

lemma Adder.carry_shift_one (b₀ : Bool) (a b k : Nat) :
    Adder.carry b₀ (k + 1) (fun i => a.testBit i) (fun i => b.testBit i)
    = Adder.carry (Adder.carry b₀ 1 (fun i => a.testBit i) (fun i => b.testBit i))
        k (fun i => (a / 2).testBit i) (fun i => (b / 2).testBit i)

*Carry-shift auxiliary lemma** (Iter 199, 2026-05-13). Relates `Adder.carry b₀ (k+1)` on (a, b) to `Adder.carry initial k` on (a/2, b/2), where `initial = Adder.carry b₀ 1 a b = MAJ(a_0, b_0, b₀)`. Proof by induction on k: the carry recurrence `carry _ (k+1) = MAJ(...)` + `Nat.testBit_add_one` gives `(a/2).testBit m = a.testBit (m+1)`.

theoremAdder.sumfb_eq_testBit_add_gen

theorem Adder.sumfb_eq_testBit_add_gen (b₀ : Bool) (a b i : Nat) :
    Adder.sumfb b₀ (fun k => a.testBit k) (fun k => b.testBit k) i
      = (a + b + b₀.toNat).testBit i

*Strengthened classical-correctness bridge with carry-in** (Iter 196, 2026-05-13). Generalizes `Adder.sumfb_eq_testBit_add` by adding a carry-in parameter `b₀ : Bool`, which lets the inductive step thread through `Nat.testBit_add_one` + `Nat.add_div` decomposition cleanly. Base case (i=0) is the existing `Adder.testBit_add_zero` analog extended with b₀; succ case is named-sorried per Iter 190's strategy doc (uses the gen IH applied to a/2, b/2, new carry-in derived from `Nat.add_div` decomposition).

theoremAdder.sumfb_eq_testBit_add

theorem Adder.sumfb_eq_testBit_add (a b i : Nat) :
    Adder.sumfb false (fun k => a.testBit k) (fun k => b.testBit k) i
      = (a + b).testBit i

*The classical-correctness bridge, parametric** (Iter 196 PROVEN via gen helper). `sumfb` on Nat-derived bit-streams equals `testBit (a+b)`. SQIR's `sumfb_correct_carry0` analog. Was sorried as `TODO_sumfb_eq_testBit_add` until Iter 196. Now derived from `Adder.sumfb_eq_testBit_add_gen` by specializing `b₀ = false` (and using `Bool.toNat false = 0`). Iter 196 also introduced a new sorry `TODO_sumfb_eq_testBit_add_gen_succ` for the gen-helper's succ case. Net sorry delta = 0; the new sorry has cleaner inductive structure.

example(example)

example :
    Adder.sumfb false (fun k => (3 : Nat).testBit k)
                      (fun k => (1 : Nat).testBit k) 0
      = ((3 : Nat) + 1).testBit 0
    ∧ Adder.sumfb false (fun k => (3 : Nat).testBit k)
                        (fun k => (1 : Nat).testBit k) 1
        = ((3 : Nat) + 1).testBit 1
    ∧ Adder.sumfb false (fun k => (3 : Nat).testBit k)
                        (fun k => (1 : Nat).testBit k) 2
        = ((3 : Nat) + 1).testBit 2
    ∧ Adder.sumfb false (fun k => (3 : Nat).testBit k)
                        (fun k => (1 : Nat).testBit k) 3

*Small-instance validation** of the bridge at `(a=3, b=1)`. Sum = 4 = 0b100. Decide-witnesses confirm the statement `sumfb false ... i = (3+1).testBit i` for i = 0, 1, 2, 3.

example(example)

example :
    Adder.sumfb false (fun k => (7 : Nat).testBit k)
                      (fun k => (1 : Nat).testBit k) 0
      = ((7 : Nat) + 1).testBit 0
    ∧ Adder.sumfb false (fun k => (7 : Nat).testBit k)
                        (fun k => (1 : Nat).testBit k) 3
        = ((7 : Nat) + 1).testBit 3

*Small-instance validation** at `(a=7, b=1)`. Sum = 8 = 0b1000. Bit 0/1/2 of 8 = false; bit 3 of 8 = true.

example(example)

example :
    Gidney.forward_cascade_post_invariant 4 7 1
      (gidney_forward_faithful_full_post_state 4 inputF_7_plus_1)

*Validation on the (7, 1) 4-bit case**: decide-witnesses that the invariant predicate is SATISFIED by the actual forward cascade post-state computed by `gidney_forward_faithful_full_post_state 4 inputF_7_plus_1`. This confirms the invariant statement matches the observed post-state (Iter 116's decide-table). The parametric "for all `a b n`" claim will be a separate SORRIED theorem below.

example(example)

example :
    Gidney.propagation_step_invariant 1 3 3 1
      (gidney_propagation_post_state 1 (adder_input_F 3 3 1))

*Validation on (3, 1) n=3 k=1**: after the first-bit step (k=1) on `adder_input_F 3 3 1`, the propagation invariant holds at all 3 positions. Decide-witness via manual match.

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.UnfoldAndCarry

FormalRV/Arithmetic/RippleCarryAdder/ClassicalBridge/UnfoldAndCarry.lean

FormalRV.Arithmetic.RippleCarryAdder.ClassicalBridge.UnfoldAndCarry Part 1/4: full-adder structural unfolding + reverse-cascade correctness on basis states, zero-input lemmas, the RSA-2048 T-count example, and the `Adder.carry`/`sumfb` algebra helpers (carry_succ/carry_sym/...). Supporting lemmas; the propagation-invariant backbone is `PropagationInvariantBackbone`.

example(example)

example :
    gidney_adder_full_with_measurement_uncompute_tcount 33 = 231
    ∧ tcount (gidney_adder_full_faithful_no_measurement 33) = 462

Concrete RSA-2048 (q_A=33): with Gidney measurement trick, T-count = 231 (paper figure); without (faithful gate-explicit), 462 — the factor of 2 review gap.

theoremgidney_adder_forward_faithful_full_reverse_correct

theorem gidney_adder_forward_faithful_full_reverse_correct
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) (n : Nat)
    (hbd : 3 * (n + 2) ≤ dim) :
    uc_eval (Gate.toUCom dim (gidney_adder_forward_faithful_full_reverse (n + 2)))
      * f_to_vec dim (gidney_forward_faithful_full_post_state (n + 2) f)
      = f_to_vec dim f

*Reverse cascade correctness on basis states** — derived as a corollary of Iter 80 (forward correctness) + Iter 83 (matrix- level forward · reverse = 1). On any classical basis state `f_to_vec dim (gidney_forward_faithful_full_post_state (n+2) f)`, the reverse cascade produces back `f_to_vec dim f`.

theoremgidney_adder_full_faithful_no_measurement_unfold

theorem gidney_adder_full_faithful_no_measurement_unfold
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) (n : Nat)
    (hbd : 3 * (n + 2) ≤ dim) :
    uc_eval (Gate.toUCom dim (gidney_adder_full_faithful_no_measurement (n + 2)))
      * f_to_vec dim f
      = uc_eval (Gate.toUCom dim (gidney_adder_forward_faithful_full_reverse (n + 2)))
          * f_to_vec dim
              (gidney_final_cx_cascade_post_state (n + 2)
                (gidney_forward_faithful_full_post_state (n + 2) f))

*Full faithful adder structural unfolding** on classical basis states. The action of `gidney_adder_full_faithful_no_measurement` on `f_to_vec dim f` is expressed as: uc_eval(reverse) * f_to_vec(cx_post(forward_post f)) where `forward_post = gidney_forward_faithful_full_post_state` and `cx_post = gidney_final_cx_cascade_post_state`. The reverse cascade is left symbolic; closing it to a final basis state requires the arithmetic-semantics theorem (Iter 88-89). This unfolding gives the structural skeleton needed to derive the end-to-end `(a, b, 0) → (a, a+b mod 2^n, 0)` theorem.

theoremgidney_first_bit_post_state_on_zero

theorem gidney_first_bit_post_state_on_zero :
    gidney_first_bit_post_state zeroF = zeroF

First-bit step on zero input gives zero. Each of the three updates writes `xor false false = false`, hence is a no-op by `Function.update_eq_self`.

theoremgidney_bit_step_faithful_post_state_on_zero

theorem gidney_bit_step_faithful_post_state_on_zero (i : Nat) :
    gidney_bit_step_faithful_post_state i zeroF = zeroF

Bit-step (interior) on zero input gives zero. Same pattern as first-bit: each update writes false.

theoremgidney_last_bit_post_state_on_zero

theorem gidney_last_bit_post_state_on_zero (i : Nat) :
    gidney_last_bit_post_state i zeroF = zeroF

Last-bit step on zero input gives zero.

theoremgidney_propagation_post_state_on_zero

theorem gidney_propagation_post_state_on_zero : ∀ n,
    gidney_propagation_post_state n zeroF = zeroF
  | 0     => rfl
  | 1     => gidney_first_bit_post_state_on_zero
  | n + 2 =>

Propagation cascade on zero input gives zero. Induction on n.

theoremgidney_forward_faithful_full_post_state_on_zero

theorem gidney_forward_faithful_full_post_state_on_zero : ∀ n,
    gidney_forward_faithful_full_post_state n zeroF = zeroF
  | 0     => rfl
  | 1     => rfl
  | n + 2 =>

Full forward cascade on zero input gives zero.

theoremgidney_final_cx_cascade_post_state_on_zero

theorem gidney_final_cx_cascade_post_state_on_zero : ∀ n,
    gidney_final_cx_cascade_post_state n zeroF = zeroF
  | 0     => rfl
  | n + 1 =>

Final CX cascade on zero input gives zero. Induction on n — each CX(read_i, target_i) writes `target_i ⊕= false = target_i`, a no-op.

theoremgidney_adder_full_faithful_no_measurement_on_zero

theorem gidney_adder_full_faithful_no_measurement_on_zero
    (dim : Nat) (hdim : 0 < dim) (n : Nat)
    (hbd : 3 * (n + 2) ≤ dim) :
    uc_eval (Gate.toUCom dim (gidney_adder_full_faithful_no_measurement (n + 2)))
      * f_to_vec dim zeroF
      = f_to_vec dim zeroF

*End-to-end smoke test**: full faithful Gidney adder on the all-zero input gives back the all-zero output. The simplest arithmetic claim `0 + 0 = 0 mod 2^n` verified at the gate level. Proof: combine Iter 87's structural unfolding with the zero-input lemmas above to reduce the full adder's action to `uc_eval(reverse) * f_to_vec(zero)`. Then apply Iter 86's reverse correctness (with f = zero, since `forward_post(zero) = zero`) to get `f_to_vec(zero)`.

example(example)

example :
    let post

*Concrete forward action check** at every qubit position for the 2-bit adder on `inputF_1_plus_0`. After the forward cascade (first-bit step + last-bit step): - read_0 stays 1 (CCX has read_0 as control; control=1 but target_0=0, so CCX writes 1 ∧ 0 = 0 into carry — no change) - target_0 stays 0 - carry_0 = 0 (read_0 ∧ target_0 = 1 ∧ 0 = 0) - read_1, target_1, carry_1 all stay 0 (no propagation since carry_0 = 0). All 6 positions evaluate by `decide`, confirming the forward cascade preserves the state on this input. The arithmetic interpretation: forward correctly determines that no carries are generated.

example(example)

example :
    let post

*Concrete final-CX action check** for the 2-bit adder on the forward-post-state above. The final CX cascade applies: - CX(read_0, target_0): target_0 ⊕= read_0 = 0 ⊕ 1 = 1. - CX(read_1, target_1): target_1 ⊕= read_1 = 0 ⊕ 0 = 0. After final CX, target = (1, 0), the sum 1 + 0 = 1 ✓.

example(example)

example :
    let post

*Forward post-state on (1, 1) input**: carry_0 generated, propagation flips read_1 and target_1 to 1, but the last-bit step's CCX·CX leaves carry_1 = 0.

example(example)

example :
    let post

*Final CX post-state on (1, 1) input**: `target_0 = 0` (sum-bit-0 = a XOR b XOR carry_in = 1 ⊕ 1 ⊕ 0 = 0 ✓), `target_1 = 0` (at this point target_1 is XOR'd by post-CX read_1=1, so 1 ⊕ 1 = 0 — NOT the sum bit; the reverse cascade is needed to restore target_1 = 1 via the propagation undo).

example(example)

example :
    let post

*Forward post-state on (3, 1) input** (9 qubits checked).

example(example)

example :
    let post

*Final CX post-state on (3, 1) input**: target = (0, 1, 0) = "010" LSB-first = **2**, NOT the expected sum 4 = "100". The reverse cascade is required to flip target_2 from 0 to 1 (via interior_reverse's CX(carry_1, target_2)) to obtain the correct sum. Same review pattern as Iter 106's 2-bit `1+1=2`.

example(example)

example :
    let post

*Forward post-state on (7, 1) input** at all 12 qubits. Carry chain: carry_0=1, carry_1=1, carry_2=1, carry_3=0 (last- bit step's chain CX cancels). Propagation flips read_1, read_2, read_3 (via CX with carries of 1) and target_1, target_2, target_3.

example(example)

example :
    let post

*Final CX post-state on (7, 1) input**: target_0 = 1⊕1 = 0 (sum bit 0), target_3 = 1⊕1 = 0 (NOT the sum bit 3, which should be 1 for 8 = "1000" binary; the reverse cascade is needed to flip target_3 from 0 to 1).

example(example)

example :
    -- Starting state after forward + final CX on inputF_1_plus_1:
    -- (1, 0, 1, 1, 0, 0) i.e., read=(1,1), target=(0,0), carry=(1,0).
    -- Wait this is 2-bit case, but first_bit_reverse uses bit 0 + bit 1
    -- indices, applied to a 6-qubit state.
    -- After first-bit reverse on this state:
    -- - CX(2, 4): target_1 ⊕= carry_0(=1) → target_1 = 0 ⊕ 1 = 1.
    -- - CX(2, 3): read_1 ⊕= carry_0(=1) → read_1 = 1 ⊕ 1 = 0.
    -- - CCX(0, 1, 2): carry_0 ⊕= read_0(=1) ∧ target_0(=0) → carry_0 = 1 ⊕ 0 = 1.
    let prev

*Smoke test on `inputF_1_plus_1` (a=1, b=1)**: starting from the post-final-CX state `(read=(1,0), target=(0,0), carry=(1,0))` after the 2-bit Gidney adder's forward + final CX, the first-bit reverse acts on it. Verify the post-state via decide.

example(example)

example :
    let prev

*Smoke test on `inputF_3_plus_1`**: starting from the post-(forward+final-CX) state of the 3-bit adder, apply the interior-bit reverse at i=1. Verify the post-state at all 9 qubits via decide.

example(example)

example :
    let prev

*Smoke test on `inputF_7_plus_1`**: starting from the post-(forward+final-CX) state of the 4-bit (a=7, b=1) adder, apply the last-bit reverse at i=3. The chain CX flips carry_3 from 0 to 1 (since carry_2=1); the CCX undo then conditions on (read_3=1, target_3=0) → AND=false, so carry_3 stays at 1. Verify the post-state at all 12 qubits via decide.

theoremAdder.carry_false_zero

theorem Adder.carry_false_zero (n : Nat) :
    Adder.carry false n (fun _ => false) (fun _ => false) = false

*Smoke lemma**: carry with carry-in zero, both inputs zero, yields zero. SQIR's `carry_false_0_l` analog ([ModMult.v:514](../../../SQIR/examples/shor/ModMult.v)).

theoremAdder.carry_sym

theorem Adder.carry_sym (b₀ : Bool) (n : Nat) (f g : Nat → Bool) :
    Adder.carry b₀ n f g = Adder.carry b₀ n g f

*Smoke lemma**: carry is symmetric in its two bit-stream arguments. SQIR's `carry_sym` analog ([ModMult.v:506](../../../SQIR/examples/shor/ModMult.v)).

theoremAdder.sumfb_zero

theorem Adder.sumfb_zero (f g : Nat → Bool) :
    Adder.sumfb false f g 0 = xor (f 0) (g 0)

*Smoke lemma**: sum-bit at position 0 with carry-in zero is just `f 0 ⊕ g 0`. Direct from def + carry's base case.

theoremAdder.carry_succ

theorem Adder.carry_succ (b₀ : Bool) (n : Nat) (f g : Nat → Bool) :
    Adder.carry b₀ (n + 1) f g
      = xor (xor (f n && g n) (g n && Adder.carry b₀ n f g))
            (f n && Adder.carry b₀ n f g)

*Carry recurrence in explicit form**: `Adder.carry b₀ (n+1) f g` equals `MAJ(f n, g n, Adder.carry b₀ n f g)` written out via XOR and AND. Auxiliary lemma for downstream proofs that need the recurrence as a rewrite rule (rather than via `unfold`, which expands too aggressively).

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.FinalCXLayer

FormalRV/Arithmetic/RippleCarryAdder/DecideWitnesses/FinalCXLayer.lean

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.FinalCXLayer Part 2/4: final-CX cascade frame/action lemmas, `Gidney.post_forward_final_cx_invariant_holds`, the no-reverse false-conjecture negations, and the reverse-direction in-bits + first-bit-reverse classical-action lemmas. Builds on `ForwardInvariant`.

theoremgidney_final_cx_cascade_preserves_carry

theorem gidney_final_cx_cascade_preserves_carry
    (n k : Nat) (f : Nat → Bool) :
    gidney_final_cx_cascade_post_state n f (carry_idx k) = f (carry_idx k)

*Frame condition: final-CX cascade preserves carry positions.** For any depth n and any k, the cascade doesn't touch carry_k.

theoremgidney_final_cx_cascade_preserves_read

theorem gidney_final_cx_cascade_preserves_read
    (n k : Nat) (f : Nat → Bool) :
    gidney_final_cx_cascade_post_state n f (read_idx k) = f (read_idx k)

*Frame condition: final-CX cascade preserves read positions.** For any depth n and any k, the cascade doesn't touch read_k.

theoremgidney_final_cx_cascade_target_outside

theorem gidney_final_cx_cascade_target_outside
    (n j : Nat) (hj : n ≤ j) (f : Nat → Bool) :
    gidney_final_cx_cascade_post_state n f (target_idx j) = f (target_idx j)

*Frame condition: final-CX cascade preserves target_j for j ≥ n.** Target positions at or above the cascade depth are untouched.

theoremgidney_final_cx_cascade_target_action

theorem gidney_final_cx_cascade_target_action
    (n j : Nat) (hj : j < n) (f : Nat → Bool) :
    gidney_final_cx_cascade_post_state n f (target_idx j)
      = xor (f (target_idx j)) (f (read_idx j))

*Action of final-CX cascade on target_j for j < n**: the post-state XORs the input's read_j into target_j.

theoremGidney.post_forward_final_cx_invariant_holds

theorem Gidney.post_forward_final_cx_invariant_holds (n a b : Nat)
    (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    Gidney.post_forward_final_cx_invariant n a b
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))

*Parametric `post_forward_final_cx_invariant_holds`** (Iter 189, 2026-05-13). For any n ≥ 2 with valid bounds, applying `gidney_final_cx_cascade_post_state n` to the post-forward state `gidney_forward_faithful_full_post_state n (adder_input_F n a b)` yields a state satisfying `Gidney.post_forward_final_cx_invariant`. *This is THE parametric provable end-state theorem at the forward + final-CX layer**, per Iter 182's review finding. Composes Iter 188's `post_last_bit_invariant_holds` with Iter 184's 4 final-CX structural lemmas: - **carry_j**: `final_cx_cascade_preserves_carry` + Iter 188 → `c_{j+1}`. ✓ - **read_j**: `final_cx_cascade_preserves_read` + Iter 188 → `a_j ⊕ c_j`. ✓ - **target_j**: `final_cx_cascade_target_action` (j < n) → `f(t_j) ⊕ f(r_j)`. From Iter 188: `f(t_j) = b_j ⊕ c_j`, `f(r_j) = a_j ⊕ c_j`. So target_j post-CX = `(b_j ⊕ c_j) ⊕ (a_j ⊕ c_j) = a_j ⊕ b_j`. The c_j contributions cancel — this is exactly Iter 182's review finding made parametric. ✓ The remaining gap to the headline `gidney_classical_action`: target_j is `a_j ⊕ b_j` here, but `sum_j = a_j ⊕ b_j ⊕ c_j`. The reverse cascade (separate, awaits Iter 191+ + John's QUESTIONS.md #1 approval) re-XORs c_j into target_j to produce sum_j.

theoremgidney_classical_action_without_reverse_is_false

theorem gidney_classical_action_without_reverse_is_false :
    ¬ (∀ (n a b : Nat), 0 < n → a < 2^n → b < 2^n →
        ∀ i, i < n →
          gidney_final_cx_cascade_post_state n
            (gidney_forward_faithful_full_post_state n (adder_input_F n a b))
            (target_idx i)
          = adder_sum_bit_classical a b i)

*Phase A end-to-end review finding (negation, proven 2026-05-22)**: the conjecture *"the Gidney adder's forward + final-CX cascade alone (no reverse cascade) computes the classical sum"* is FALSE. HISTORY: this slot used to hold a sorried theorem named `TODO_gidney_classical_action` asserting the (false) positive form. Iter 182 (2026-05-13) supplied a machine-checked counterexample at (n=2, a=1, b=1) — see `gidney_classical_action_unprovable_at_1_plus_1` below — proving that the positive form was unprovable as stated. The corrected headline `gidney_classical_action_with_reverse` (proven at line ~5709) is the canonical semantic-correctness theorem. The honest record of the review finding lives here as a proven negation theorem (no sorry): the universally-quantified positive conjecture is impossible because it fails at the specific witness (n=2, a=1, b=1, i=1).

theoremgidney_classical_action_unprovable_at_1_plus_1

theorem gidney_classical_action_unprovable_at_1_plus_1 :
    ¬ (∀ i, i < 2 →
        gidney_final_cx_cascade_post_state 2
          (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 1))
          (target_idx i)
        = adder_sum_bit_classical 1 1 i)

*REVIEW FINDING (Iter 182, 2026-05-13)**: machine-checked counterexample establishing that `TODO_gidney_classical_action` is UNPROVABLE as currently stated. For the instance `(n=2, a=1, b=1)` (all hypotheses satisfied: `0 < 2`, `1 < 4`, `1 < 4`), the conclusion `∀ i, i < 2, forward+final-CX(target_i) = (a+b).testBit i` fails at `i=1`: - Forward+final-CX on `adder_input_F 2 1 1` yields target_1 = 0 (decide-witnessed at lines ~2395-2404 via `inputF_1_plus_1`). - `(1+1).testBit 1 = 2.testBit 1 = 1`. - 0 ≠ 1. ∎ The forward + final-CX cascade produces `target_j = a_j ⊕ b_j` for `j ≥ 1` (the two `c_j` contributions from forward propagation cancel via the final-CX `t_j ⊕= r_j`). But the classical sum is `sum_j = a_j ⊕ b_j ⊕ c_j`, which is OFF by `c_j` whenever `c_j = 1`. *The full Gidney adder requires the REVERSE cascade.** Its per-step `CX(c_{j-1}, t_j)` re-XORs `c_j` into target_j, fixing the gap. Hence the headline theorem should be: ``` gidney_forward_faithful_full_reverse_post_state n (gidney_final_cx_cascade_post_state n (gidney_forward_faithful_full_post_state n (adder_input_F n a b))) (target_idx i) = adder_sum_bit_classical a b i ``` (i.e., forward + final-CX + REVERSE, applied left-to-right.) See QUESTIONS.md (entry 2026-05-13 #1) for the proposed theorem-statement fix awaiting John's approval.

example(example)

example :
    let post

*Decide-witness for full reverse on (n=2, a=1, b=1)** (Iter 191). Confirms that applying the reverse cascade to the post-final-CX state of (1+1) restores `target_1 = 1 = sum_1`, fixing the Iter 182 counterexample. The reverse cascade DOES compute the sum bits — Iter 106's older comment was wrong.

example(example)

example :
    let post

*Decide-witness on (n=3, a=3, b=1)** (Iter 191). Multi-bit.

theoremgidney_interior_bit_reverse_post_state_in_bits

theorem gidney_interior_bit_reverse_post_state_in_bits
    (i : Nat) (hi : 0 < i) (f : Nat → Bool) :
    (gidney_interior_bit_reverse_post_state i f) (carry_idx i)
      = xor (xor (f (carry_idx i)) (f (carry_idx (i - 1))))
            (f (read_idx i) && f (target_idx i))
    ∧ (gidney_interior_bit_reverse_post_state i f) (read_idx (i + 1))
        = xor (f (read_idx (i + 1))) (f (carry_idx i))
    ∧ (gidney_interior_bit_reverse_post_state i f) (target_idx (i + 1))
        = xor (f (target_idx (i + 1))) (f (carry_idx i))

*Interior-bit reverse in-bits structural lemma (PROVEN, Iter 195, 2026-05-13)**. Analog of Iter 167's `gidney_interior_bit_post_state_in_bits` for the reverse direction. Captures the pure structural action of `gidney_interior_bit_reverse_post_state i` on an arbitrary input `f` (no input invariant assumed). Computed by walking the 4 chained updates of the def: - **post(c_i)** = `(f(c_i) ⊕ f(c_{i-1})) ⊕ (f(r_i) ∧ f(t_i))`. Outermost update (gate 4: CCX undo) adds `(r_i ∧ t_i)` to the previous c_i value, which itself was modified by gate 3 (chain CX) to be `f(c_i) ⊕ f(c_{i-1})`. - **post(r_{i+1})** = `f(r_{i+1}) ⊕ f(c_i)` (gate 2 propagates original c_i back through r_{i+1}). - **post(t_{i+1})** = `f(t_{i+1}) ⊕ f(c_i)` (gate 1 propagates back through t_{i+1}).

theoremgidney_last_bit_reverse_post_state_in_bits

theorem gidney_last_bit_reverse_post_state_in_bits
    (i : Nat) (hi : 0 < i) (f : Nat → Bool) :
    (gidney_last_bit_reverse_post_state i f) (carry_idx i)
      = xor (xor (f (carry_idx i)) (f (carry_idx (i - 1))))
            (f (read_idx i) && f (target_idx i))

*Last-bit reverse in-bits structural lemma (PROVEN, Iter 195, 2026-05-13)**. Analog of Iter 169's `gidney_last_bit_post_state_in_bits` for the reverse direction. The last-bit-reverse has only 2 gates (no propagation), so it only modifies `c_i`: - **post(c_i)** = `(f(c_i) ⊕ f(c_{i-1})) ⊕ (f(r_i) ∧ f(t_i))`.

theoremgidney_first_bit_reverse_preserves

theorem gidney_first_bit_reverse_preserves
    (a b : Nat) (f : Nat → Bool)
    (h_r0 : f (read_idx 0) = a.testBit 0)
    (h_t0 : f (target_idx 0) = xor (a.testBit 0) (b.testBit 0))
    (h_c0 : f (carry_idx 0) = Adder.carry false 1 a.testBit b.testBit)
    (h_r1 : f (read_idx 1)
              = xor (a.testBit 1) (Adder.carry false 1 a.testBit b.testBit))
    (h_t1 : f (target_idx 1) = xor (a.testBit 1) (b.testBit 1)) :
    let post

*First-bit reverse classical-action lemma (PROVEN, Iter 193, 2026-05-13)**. Analog of Iter 165's `gidney_first_bit_preserves` for the reverse direction. Given a state `f` matching the post-forward-final-CX invariant at positions {r_0, t_0, c_0, r_1, t_1}, applying `gidney_first_bit_reverse_post_state` produces: - **post(c_0) = a_0** (a "dirty carry" — restored to a_0, NOT to false. This is consistent with Iter 106's older "dirty carries" observation in the file's reverse smoke tests.) - **post(r_1) = a_1** (carry XOR'd out, restored to input). - **post(t_1) = sum_1 = a_1 ⊕ b_1 ⊕ c_1** — the SUM BIT. The reverse cascade's first step XORs c_1 into target_1, completing the sum that the forward+final-CX had pending. This is the CRITICAL semantic step that fixes the Iter 182 review finding: the reverse re-XORs the math carry (which the qubit c_0 holds post-forward) into target_1. The dirty post(c_0) = a_0 calculation: post(c_0) = c_1 ⊕ (r_0 ∧ t_0) = (a_0 ∧ b_0) ⊕ (a_0 ∧ (a_0 ⊕ b_0)) = (a_0 ∧ b_0) ⊕ (a_0 ∧ ¬b_0) = a_0 ∧ (b_0 ⊕ ¬b_0) = a_0 ∧ true = a_0. ∎

example(example)

example :
    let f

*Decide-witness for `gidney_first_bit_reverse_preserves` on (a=1, b=1)** (Iter 193). Validates the lemma statement holds for the post-forward+final-CX state of the (1+1) instance.

example(example)

example :
    let f

*Decide-witness on (a=3, b=1) at n=3** (Iter 193). Multi-bit.

example(example)

example :
    Gidney.post_full_reverse_invariant 2 1 1
      (gidney_full_reverse_post_state 2
        (gidney_final_cx_cascade_post_state 2
          (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 1))))

*Decide-witness on (n=2, a=1, b=1)** (Iter 197). Validates the richer Iter 197 invariant on the Iter 182 counterexample case.

example(example)

example :
    Gidney.post_full_reverse_invariant 3 3 1
      (gidney_full_reverse_post_state 3
        (gidney_final_cx_cascade_post_state 3
          (gidney_forward_faithful_full_post_state 3 (adder_input_F 3 3 1))))

*Decide-witness on (n=3, a=3, b=1)** (Iter 197). Multi-bit.

example(example)

example :
    Gidney.reverse_step_invariant 2 2 1 1
      (gidney_full_reverse_post_state 2
        (gidney_final_cx_cascade_post_state 2
          (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 1))))

*Smoke decide-witness at k=n=2, (a,b) = (1,1)** (the Iter 182 counterexample case). When the step index equals the register width, the predicate covers every j and matches the witnessed `post_full_reverse_invariant` at line 4615.

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.ForwardInvariant

FormalRV/Arithmetic/RippleCarryAdder/DecideWitnesses/ForwardInvariant.lean

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.ForwardInvariant Part 1/4: forward-cascade decide smoke-witnesses and the parametric `Gidney.post_last_bit_invariant_holds`. Supporting lemmas; the reverse headline backbone is `ReverseFramesAndHeadline`.

example(example)

example :
    Gidney.post_last_bit_invariant 2 1 0
      (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 0))

*Decide-witness on (n=2, a=1, b=0)** (Iter 187). No-carry case.

example(example)

example :
    Gidney.post_last_bit_invariant 3 3 1
      (gidney_forward_faithful_full_post_state 3 (adder_input_F 3 3 1))

*Decide-witness on (n=3, a=3, b=1)** (Iter 187). Multi-bit carry.

theoremGidney.post_last_bit_invariant_holds

theorem Gidney.post_last_bit_invariant_holds (n a b : Nat)
    (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    Gidney.post_last_bit_invariant n a b
      (gidney_forward_faithful_full_post_state n (adder_input_F n a b))

*Parametric `post_last_bit_invariant_holds`** (Iter 188, 2026-05-13). For any n ≥ 2 with valid bounds, applying the full forward cascade to `adder_input_F n a b` produces a state satisfying `Gidney.post_last_bit_invariant`. Proof strategy: destructure n = m+2, unfold via the recursive def's third clause to `gidney_last_bit_post_state (m+1) ∘ gidney_propagation_post_state (m+1)`. Apply Iter 179's `propagation_step_invariant_holds (m+1)` for the inner state, extract the 4 facts at positions {c_m, c_{m+1}, r_{m+1}, t_{m+1}}. Apply Iter 171's `gidney_last_bit_preserves` to get post(c_{m+1}) = c_{m+2}. For each j and each conjunct: split on j = m+1 carry case (use preserves) vs frame case (use Iter 173's last-bit frame + the propagation invariant clause, which always reduces to the propagated branch since j ≤ m+1 for all j < m+2).

example(example)

example :
    Gidney.post_forward_final_cx_invariant 2 1 1
      (gidney_final_cx_cascade_post_state 2
        (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 1)))

*Decide-witness for the post-forward-final-CX invariant on (n=2, a=1, b=1)** (Iter 183). Validates the invariant on the instance where the original `TODO_gidney_classical_action` fails (per Iter 182 counterexample) — confirming the invariant matches the actual classical action.

example(example)

example :
    Gidney.post_forward_final_cx_invariant 2 1 0
      (gidney_final_cx_cascade_post_state 2
        (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 0)))

*Decide-witness on (n=2, a=1, b=0)** (Iter 183). The case where no carry is generated (c_1 = 0), so target_1 = a_1 ⊕ b_1 = 0 happens to equal sum_1 = 0.

example(example)

example :
    Gidney.post_forward_final_cx_invariant 3 3 1
      (gidney_final_cx_cascade_post_state 3
        (gidney_forward_faithful_full_post_state 3 (adder_input_F 3 3 1)))

*Decide-witness on (n=3, a=3, b=1)** (Iter 183). Multi-bit carry propagation. 3+1 = 4 = 100. Invariant predicts: target_0 = a_0 ⊕ b_0 = 0, target_1 = a_1 ⊕ b_1 = 1, target_2 = a_2 ⊕ b_2 = 0. Sum bits: 0, 0, 1. So target_1 differs from sum_1 (1 vs 0), and target_2 differs from sum_2 (0 vs 1). The invariant correctly captures the actual post-state.

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.ReverseFramesAndHeadline

FormalRV/Arithmetic/RippleCarryAdder/DecideWitnesses/ReverseFramesAndHeadline.lean

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.ReverseFramesAndHeadline BACKBONE (part 4/4): per-step + cascade reverse frame conditions, the cascade-equals-first-reverse-on-low lemmas, THE headline j=0 / j=1 cases (`gidney_classical_action_with_reverse_target_0/_1`), and the propagation-reverse-at-target reduces-to-interior-reverse assembly lemma. Builds on `ReverseStepScaffold`.

theoremgidney_first_bit_reverse_preserves_target_0

theorem gidney_first_bit_reverse_preserves_target_0 (f : Nat → Bool) :
    gidney_first_bit_reverse_post_state f (target_idx 0) = f (target_idx 0)

*First-bit reverse preserves target_0** (Iter 200). The first-bit reverse modifies {t_1, r_1, c_0}; not target_idx 0 (= position 1, distinct from target_idx 1 = position 4).

theoremgidney_interior_bit_reverse_preserves_target_0

theorem gidney_interior_bit_reverse_preserves_target_0
    (i : Nat) (hi : 0 < i) (f : Nat → Bool) :
    gidney_interior_bit_reverse_post_state i f (target_idx 0) = f (target_idx 0)

*Interior-bit reverse preserves target_0** for `i ≥ 1`. The interior reverse at i modifies {t_{i+1}, r_{i+1}, c_i, c_i}; target_0 is distinct from all of these for i ≥ 1.

theoremgidney_last_bit_reverse_preserves_target_0

theorem gidney_last_bit_reverse_preserves_target_0
    (i : Nat) (hi : 0 < i) (f : Nat → Bool) :
    gidney_last_bit_reverse_post_state i f (target_idx 0) = f (target_idx 0)

*Last-bit reverse preserves target_0** for `i ≥ 1`.

theoremgidney_propagation_reverse_preserves_target_0

theorem gidney_propagation_reverse_preserves_target_0
    (K : Nat) (f : Nat → Bool) :
    gidney_propagation_reverse_post_state K f (target_idx 0) = f (target_idx 0)

*Propagation reverse cascade preserves target_0**. By induction on `K` over the propagation_reverse_post_state def (which only invokes first/interior reverses, all of which preserve target_0).

theoremgidney_full_reverse_preserves_target_0

theorem gidney_full_reverse_preserves_target_0 (n : Nat) (f : Nat → Bool) :
    gidney_full_reverse_post_state n f (target_idx 0) = f (target_idx 0)

*Full reverse cascade preserves target_0**. For `n ≥ 2`, the full reverse cascade applies last_reverse(n-1) + propagation_reverse(n-1); both preserve target_0.

theoremgidney_interior_bit_reverse_post_state_preserves_outside

theorem gidney_interior_bit_reverse_post_state_preserves_outside
    (i : Nat) (f : Nat → Bool) (q : Nat)
    (h_ci : q ≠ carry_idx i)
    (h_ri1 : q ≠ read_idx (i + 1))
    (h_ti1 : q ≠ target_idx (i + 1)) :
    gidney_interior_bit_reverse_post_state i f q = f q

*Interior-bit reverse frame condition** (Iter 206). Positions other than {c_i, r_{i+1}, t_{i+1}} are unchanged. Generic frame analog of Iter 173's forward interior-step frame.

theoremgidney_interior_bit_reverse_preserves_low

theorem gidney_interior_bit_reverse_preserves_low
    (i : Nat) (hi : 0 < i) (q : Nat) (hq : q < 5) (f : Nat → Bool) :
    gidney_interior_bit_reverse_post_state i f q = f q

*Interior-bit reverse preserves low positions** (Iter 206). For i ≥ 1 and q < 5, the interior reverse modifies indices ≥ 5 only.

theoremgidney_first_bit_reverse_low_dependence

theorem gidney_first_bit_reverse_low_dependence
    (g h : Nat → Bool) (q : Nat) (hq : q < 5)
    (h_eq : ∀ p, p < 5 → g p = h p) :
    gidney_first_bit_reverse_post_state g q
    = gidney_first_bit_reverse_post_state h q

*First-bit reverse depends only on inputs at low positions** (Iter 206). For q < 5, the first-bit reverse's output at q is determined by the input's values at positions {0, 1, 2, 3, 4}. Therefore if g and h agree on those positions, first_rev g and first_rev h agree at q.

theoremgidney_last_bit_reverse_post_state_preserves_outside

theorem gidney_last_bit_reverse_post_state_preserves_outside
    (i : Nat) (f : Nat → Bool) (q : Nat) (h_q : q ≠ carry_idx i) :
    gidney_last_bit_reverse_post_state i f q = f q

*Last-bit reverse frame condition** (Iter 203). Positions other than `carry_idx i` are unchanged.

theoremGidney.last_reverse_target_read_frame

theorem Gidney.last_reverse_target_read_frame
    (i j : Nat) (f : Nat → Bool) :
    gidney_last_bit_reverse_post_state i f (target_idx j) = f (target_idx j)
    ∧ gidney_last_bit_reverse_post_state i f (read_idx j) = f (read_idx j)

*Last-reverse target/read frame** (2026-05-14 tick, anchors the cascade-induction k=0 → k=1 step). The last-bit reverse modifies ONLY `carry_idx i` (see def line 2637), so it's the identity on every `target_idx j` and `read_idx j`. The frame holds universally (for ALL i, j) because the qubit layout `read_j = 3j`, `target_j = 3j + 1`, `carry_j = 3j + 2` gives disjoint mod-3 residues — `target_idx j ≠ carry_idx i` and `read_idx j ≠ carry_idx i` for any (i, j). No `j < n` bound needed. This is the matching frame for the outer cascade's first step (`last_reverse(n-1)` in `gidney_full_reverse_post_state`). Once the cascade-induction proof factors through `propagation_reverse`, this lemma transfers the post-final-CX target/read state across the last-reverse layer unchanged.

theoremGidney.reverse_step_invariant_preserved_by_last_reverse

theorem Gidney.reverse_step_invariant_preserved_by_last_reverse
    (k n a b i : Nat) (f : Nat → Bool)
    (h : Gidney.reverse_step_invariant k n a b f) :
    Gidney.reverse_step_invariant k n a b
      (gidney_last_bit_reverse_post_state i f)

*`reverse_step_invariant` transfers across last-reverse** (2026-05-14 tick). Since `last_reverse(i)` only modifies `carry_idx i` (per `last_reverse_target_read_frame`), every target/read claim in `reverse_step_invariant k n a b f` is preserved when `f` is replaced by `last_bit_reverse i f`. This is the structural lemma that lets the outer cascade `gidney_full_reverse_post_state` factor through last_reverse: if we can establish `inv_n` after the propagation_reverse cascade alone (starting from `last_reverse(n-1) post_final_cx`), this lemma's NOT what we need; rather, it's the dual — if `inv_k` already held BEFORE last_reverse, it still holds AFTER. Useful as a frame helper in the cascade-induction proof.

theoremGidney.reverse_step_invariant_preserved_by_propagation_reverse_zero

theorem Gidney.reverse_step_invariant_preserved_by_propagation_reverse_zero
    (k n a b : Nat) (f : Nat → Bool)
    (h : Gidney.reverse_step_invariant k n a b f) :
    Gidney.reverse_step_invariant k n a b
      (gidney_propagation_reverse_post_state 0 f)

*K=0 trivial preservation**: `propagation_reverse(0)` is definitionally the identity (see def line 4334), so any invariant on `f` carries directly to `propagation_reverse(0) f`. `:= h` by reduction.

theoremGidney.reverse_step_invariant_K_holds_after_propagation_reverse_K_zero_only

theorem Gidney.reverse_step_invariant_K_holds_after_propagation_reverse_K_zero_only
    (n a b : Nat) (_hn : 1 < n) (input : Nat → Bool) :
    Gidney.reverse_step_invariant 0 n a b
      (gidney_propagation_reverse_post_state 0 input)

theoremgidney_last_bit_reverse_preserves_low

theorem gidney_last_bit_reverse_preserves_low
    (i : Nat) (hi : 0 < i) (q : Nat) (hq : q < 5) (f : Nat → Bool) :
    gidney_last_bit_reverse_post_state i f q = f q

*Last-bit reverse preserves the low-position frame** (Iter 203, 2026-05-13). For i ≥ 1, the last-bit reverse only modifies `carry_idx i = 3i + 2 ≥ 5`. Positions 0..4 (= read_0, target_0, carry_0, read_1, target_1) are all preserved.

theoremgidney_propagation_reverse_eq_first_rev_low

theorem gidney_propagation_reverse_eq_first_rev_low
    (K : Nat) (hK : 0 < K) (g : Nat → Bool) (q : Nat) (hq : q < 5) :
    gidney_propagation_reverse_post_state K g q
    = gidney_first_bit_reverse_post_state g q

*Propagation reverse cascade equals first reverse on low positions** (Iter 206). For K ≥ 1 and q < 5, propagation_reverse(K) g equals first_reverse g at q.

theoremgidney_full_reverse_eq_first_rev_low

theorem gidney_full_reverse_eq_first_rev_low
    (n : Nat) (hn : 1 < n) (f : Nat → Bool) (q : Nat) (hq : q < 5) :
    gidney_full_reverse_post_state n f q
    = gidney_first_bit_reverse_post_state f q

*Full reverse cascade equals first reverse on low positions** (Iter 206). For n ≥ 2 and q < 5, full_reverse(n) f equals first_reverse f at q.

theoremgidney_classical_action_with_reverse_target_0

theorem gidney_classical_action_with_reverse_target_0
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    gidney_full_reverse_post_state n
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
      (target_idx 0)
    = adder_sum_bit_classical a b 0

*Headline j=0 case PROVEN parametrically** (Iter 202, 2026-05-13). For any n ≥ 2 and valid a, b, the j=0 case of `TODO_gidney_classical_action_with_reverse` holds: target_0 after full forward + final-CX + reverse = `adder_sum_bit_classical a b 0`. Composes: - Iter 200's `gidney_full_reverse_preserves_target_0` (target_0 unchanged by full reverse cascade). - Iter 189's `Gidney.post_forward_final_cx_invariant_holds` (post-CX target_0 = a_0 ⊕ b_0). - Iter 163's `Adder.testBit_add_zero` ((a+b).testBit 0 = a_0 ⊕ b_0).

theoremgidney_classical_action_with_reverse_target_1

theorem gidney_classical_action_with_reverse_target_1
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    gidney_full_reverse_post_state n
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
      (target_idx 1)
    = adder_sum_bit_classical a b 1

*Headline j=1 case PROVEN parametrically over n** (Iter 207, 2026-05-13). Uses Iter 206's `gidney_full_reverse_eq_first_rev_low` to reduce the full reverse cascade at target_idx 1 (= 4 < 5) to just first_reverse, then applies Iter 194 with hypotheses verified from Iter 189's invariant.

theoremgidney_first_bit_reverse_preserves_target_above

theorem gidney_first_bit_reverse_preserves_target_above
    (j : Nat) (hj : 1 < j) (f : Nat → Bool) :
    gidney_first_bit_reverse_post_state f (target_idx j) = f (target_idx j)

*First-bit reverse preserves target_j for j ≥ 2** (Iter 209). Modifies {c_0, r_1, t_1}; for j ≥ 2, target_idx j = 3j+1 ≥ 7 > 4.

theoremgidney_first_bit_reverse_preserves_read_above

theorem gidney_first_bit_reverse_preserves_read_above
    (j : Nat) (hj : 1 < j) (f : Nat → Bool) :
    gidney_first_bit_reverse_post_state f (read_idx j) = f (read_idx j)

*First-bit reverse preserves read_j for j > 1** (2026-05-14 tick, read-side analog of `_preserves_target_above`). Modifies {t_1, r_1, c_0}; for j > 1, read_idx j = 3j ≠ any of those.

theoremgidney_interior_bit_reverse_preserves_target_above

theorem gidney_interior_bit_reverse_preserves_target_above
    (i j : Nat) (hij : i + 1 < j) (f : Nat → Bool) :
    gidney_interior_bit_reverse_post_state i f (target_idx j) = f (target_idx j)

*Interior-bit reverse preserves target_j for j > i+1** (Iter 209). Modifies {c_i, r_{i+1}, t_{i+1}}; for j > i+1, target_idx j = 3j+1 > 3(i+1)+1 = t_{i+1}.

theoremgidney_interior_bit_reverse_preserves_read_above

theorem gidney_interior_bit_reverse_preserves_read_above
    (i j : Nat) (hij : i + 1 < j) (f : Nat → Bool) :
    gidney_interior_bit_reverse_post_state i f (read_idx j) = f (read_idx j)

*Interior-bit reverse preserves read_j for j > i+1** (2026-05-14 tick, read-side analog). Same proof structure as the target version with read_idx in place of target_idx.

theoremgidney_propagation_reverse_preserves_target_above

theorem gidney_propagation_reverse_preserves_target_above
    (K j : Nat) (hjK : K < j) (f : Nat → Bool) :
    gidney_propagation_reverse_post_state K f (target_idx j) = f (target_idx j)

*Propagation reverse preserves target_j for j > K** (Iter 209). For K ≥ 0 and j > K, propagation_reverse(K) preserves target_idx j. By induction on K.

theoremgidney_propagation_reverse_preserves_read_above

theorem gidney_propagation_reverse_preserves_read_above
    (K j : Nat) (hjK : K < j) (f : Nat → Bool) :
    gidney_propagation_reverse_post_state K f (read_idx j) = f (read_idx j)

*Propagation reverse preserves read_j for j > K** (2026-05-14 tick, read-side analog of `_preserves_target_above` at line 5404). Same induction-on-K structure with `read_idx` in place of `target_idx`.

theoremgidney_interior_bit_reverse_at_target_low_dependence

theorem gidney_interior_bit_reverse_at_target_low_dependence
    (i : Nat) (hi : 0 < i) (g h : Nat → Bool)
    (h_t : g (target_idx (i + 1)) = h (target_idx (i + 1)))
    (h_c : g (carry_idx i) = h (carry_idx i)) :
    gidney_interior_bit_reverse_post_state i g (target_idx (i + 1))
    = gidney_interior_bit_reverse_post_state i h (target_idx (i + 1))

*Interior reverse at target_(i+1) only depends on inputs at {t_{i+1}, c_i}** (Iter 211). If g and h agree at those two positions, then interior_reverse(i) g and interior_reverse(i) h agree at target_(i+1).

theoremgidney_interior_bit_reverse_at_read_low_dependence

theorem gidney_interior_bit_reverse_at_read_low_dependence
    (i : Nat) (hi : 0 < i) (g h : Nat → Bool)
    (h_r : g (read_idx (i + 1)) = h (read_idx (i + 1)))
    (h_c : g (carry_idx i) = h (carry_idx i)) :
    gidney_interior_bit_reverse_post_state i g (read_idx (i + 1))
    = gidney_interior_bit_reverse_post_state i h (read_idx (i + 1))

*Interior reverse at read_(i+1) only depends on inputs at {r_{i+1}, c_i}** (2026-05-14 tick). Read-side analog of `_at_target_low_dependence`. Same proof structure with `.2.1` (read component of Iter 195's `_in_bits` triple) instead of `.2.2`.

theoremgidney_propagation_reverse_at_target_eq_interior_reverse

theorem gidney_propagation_reverse_at_target_eq_interior_reverse
    (K j : Nat) (hj : 1 < j) (hjK : j ≤ K) (g : Nat → Bool) :
    gidney_propagation_reverse_post_state K g (target_idx j)
    = gidney_interior_bit_reverse_post_state (j - 1) g (target_idx j)

*Propagation reverse at target_j reduces to interior_reverse(j-1)** (Iter 211). For j ∈ [2, K], propagation_reverse(K) g (target_idx j) equals interior_reverse(j-1) g (target_idx j). The cascade reduces to a single per-step. Proof: induction on K. - K=1: vacuous (j ∈ [2, 1] is empty). - K=m+2: propagation_reverse(m+2) g = propagation_reverse(m+1) (interior_reverse(m+1) g). - Subcase j = m+2: interior_reverse(m+1) computes target_j; later cascade preserves it (Iter 209's preserves_target_above with j > m+1). - Subcase j ≤ m+1: by IH, propagation_reverse(m+1) (...) (target_j) = interior_reverse(j-1) (interior_reverse(m+1) g) (target_j). And interior_reverse(m+1) preserves t_j and c_{j-1} (both ≤ 3j+1 ≤ 3(m+1)+1 < 3(m+1)+2), so by at_target_low_dependence, this equals interior_reverse(j-1) g (target_j).

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.ReverseStepScaffold

FormalRV/Arithmetic/RippleCarryAdder/DecideWitnesses/ReverseStepScaffold.lean

FormalRV.Arithmetic.RippleCarryAdder.DecideWitnesses.ReverseStepScaffold Part 3/4: the step-indexed `reverse_step_invariant` scaffolding (zero base, n-iff bridge, apply/weaken, abstract succ engine) and the per-step interior-reverse-computes-one-sum-bit lemma. Builds on `FinalCXLayer`.

theoremGidney.reverse_step_invariant_zero

theorem Gidney.reverse_step_invariant_zero (n a b : Nat) (post : Nat → Bool) :
    Gidney.reverse_step_invariant 0 n a b post

*Base case of the cascade induction**: when `k = 0`, the step-indexed invariant is vacuously true because the quantifier range `n - 0 ≤ j ∧ j < n` simplifies to `n ≤ j ∧ j < n`, which is unsatisfiable. No assumption on `post` is needed. This is the starting point for the inductive proof of `TODO_post_full_reverse_invariant_holds` — the parametric `reverse_step_invariant k n a b _` will be lifted from k=0 up to k=n via a `_succ` step that uses Iter 194 (first-bit reverse) + Iter 195 (interior `in_bits`) + Iter 201 (interior `computes_sum`) + the cascade-frame property.

theoremGidney.reverse_step_invariant_n_iff_post_full_reverse_invariant

theorem Gidney.reverse_step_invariant_n_iff_post_full_reverse_invariant
    (n a b : Nat) (post : Nat → Bool) :
    Gidney.reverse_step_invariant n n a b post ↔
      Gidney.post_full_reverse_invariant n a b post

*k=n bridge** to the original `Gidney.post_full_reverse_invariant`: when the step index equals the register width, the step-indexed predicate's quantifier range `n - n ≤ j ∧ j < n` simplifies to `0 ≤ j ∧ j < n`, which is the same range as the post-full-reverse invariant. This is the closing composition step for `TODO_post_full_reverse_invariant_holds`: once a `_succ` lemma lifts the predicate from k=0 up to k=n, this iff turns `reverse_step_invariant n _ _ _ _` into the goal.

theoremGidney.reverse_step_invariant_apply

theorem Gidney.reverse_step_invariant_apply
    (k n a b j : Nat) (post : Nat → Bool)
    (h_inv : Gidney.reverse_step_invariant k n a b post)
    (h_lo : n - k ≤ j) (h_hi : j < n) :
    post (target_idx j) = adder_sum_bit_classical a b j ∧
      post (read_idx j) = a.testBit j

*Specialization-at-j helper**: given the step-indexed predicate and witnesses that position `j` is in its quantifier range, extract the (target, read) correctness pair at `j`. A trivial 1-line application of the predicate; named for readability in downstream cascade-induction proofs that need to invoke the invariant at a specific position.

theoremGidney.reverse_step_invariant_weaken

theorem Gidney.reverse_step_invariant_weaken
    (k n a b : Nat) (post : Nat → Bool)
    (h : Gidney.reverse_step_invariant (k + 1) n a b post) :
    Gidney.reverse_step_invariant k n a b post

*Weakening**: a larger step index strengthens the invariant (covers more positions), so `inv_{k+1} → inv_k`. Useful when a cascade-induction proof has established the strong form and needs to extract a weaker one for a sub-case. Direct from the definition: `n - (k+1) ≤ j` implies `n - k ≤ j` via `omega`.

lemmaand

--     lemma and then inducting on the propagation chain.
--   - The `_succ_via_step_property` engine still applies for each
--     propagation step (k=1 first instantiated via interior_reverse(n-2),
--     k=n-1 via first_bit_reverse). Target_0 needs separate handling
--     (set by final-CX, preserved by every reverse step — Iter 200
--     frame lemmas cover this).

theoremGidney.reverse_step_invariant_succ_via_step_property

theorem Gidney.reverse_step_invariant_succ_via_step_property
    (k n a b : Nat) (post post' : Nat → Bool)
    (ih : Gidney.reverse_step_invariant k n a b post)
    (_hk : k < n)
    (h_step_target :
      post' (target_idx (n - k - 1)) = adder_sum_bit_classical a b (n - k - 1))
    (h_step_read :
      post' (read_idx (n - k - 1)) = a.testBit (n - k - 1))
    (h_frame_target : ∀ j, n - k ≤ j → j < n →
                        post' (target_idx j) = post (target_idx j))
    (h_frame_read : ∀ j, n - k ≤ j → j < n →
                      post' (read_idx j) = post (read_idx j)) :

theoremgidney_interior_bit_reverse_computes_sum

theorem gidney_interior_bit_reverse_computes_sum
    (j a b : Nat) (hj : 0 < j) (f : Nat → Bool)
    (h_cj : f (carry_idx j) = Adder.carry false (j + 1) a.testBit b.testBit)
    (h_tj1 : f (target_idx (j + 1))
              = xor (a.testBit (j + 1)) (b.testBit (j + 1))) :
    let post

*Interior-bit reverse computes one sum bit** (PROVEN, Iter 201): given a state `f` whose values at {c_j, r_{j+1}, t_{j+1}} match the post-forward+final-CX invariant, applying `interior_reverse(j)` produces `target_{j+1} = sum_{j+1}`. XOR identity: `(a_{j+1} ⊕ b_{j+1}) ⊕ c_{j+1} = sum_{j+1}` (since `sumfb false a b (j+1) = c_{j+1} ⊕ a_{j+1} ⊕ b_{j+1}`). Proof composes Iter 195's `gidney_interior_bit_reverse_post_state_in_bits` with Iter 199's `Adder.sumfb_eq_testBit_add`.

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.FaithfulBackbone

FormalRV/Arithmetic/RippleCarryAdder/ForwardAndCost/FaithfulBackbone.lean

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.FaithfulBackbone BACKBONE (part 5/5): the faithful full forward/reverse cascade — costs (`tcount_gidney_adder_full_faithful_no_measurement` = 14·(n+2)), basis-state correctness (`gidney_adder_forward_faithful_full_correct`), cascade-level reversibility (`..._fwd_rev_eq_one`), and the measurement-gap factor. Builds on `LastBitAndSkeletonRev`.

theoremtcount_gidney_adder_forward_with_propagation

theorem tcount_gidney_adder_forward_with_propagation : ∀ n,
    tcount (gidney_adder_forward_with_propagation n) = 7 * n
  | 0     => by decide
  | 1     => by decide
  | n + 2 =>

T-count of the propagation cascade: `7n` (each bit contributes 1 Toffoli).

theoremgcount_gidney_adder_forward_with_propagation

theorem gcount_gidney_adder_forward_with_propagation : ∀ n,
    gcount (gidney_adder_forward_with_propagation n)
      = if n = 0 then 0 else 4 * n - 1
  | 0     => by decide
  | 1     => by decide
  | n + 2 =>

Gate-count of the propagation cascade. Bit 0 contributes 3 gates (1 CCX + 2 propagation CXs); each interior bit contributes 4 (1 CCX + 1 chain CX + 2 propagation CXs). Total: `3 + 4·(n-1) = 4n - 1` for `n ≥ 1`. Edge cases: `n=0` gives 0 gates; for n ≥ 1 the formula `4n - 1` holds. We state it as `4n + (if n = 0 then 0 else -1)` to handle both cleanly — but Nat doesn't support negative, so we split into two clauses.

theoremtcount_gidney_adder_forward_faithful_full

theorem tcount_gidney_adder_forward_faithful_full (n : Nat) :
    tcount (gidney_adder_forward_faithful_full (n + 2)) = 7 * (n + 2)

T-count of the faithful full forward cascade: `7n` for `n ≥ 2`. Matches qianxu Eq. E3's `q_A` Toffolis per adder (T-count = 7 · q_A).

theoremgidney_cost_skeleton_eq_faithful

theorem gidney_cost_skeleton_eq_faithful (n : Nat) :
    tcount (gidney_adder_forward (n + 2))
      = tcount (gidney_adder_forward_faithful_full (n + 2))

*Cost-equivalence (Iter 53 review-gap closure).** The COST-ONLY skeleton forward pass (`gidney_adder_forward`, which is *not* semantically the adder) and the semantically-correct faithful forward pass (`gidney_adder_forward_faithful_full`, proven on basis states) have the *same T-count**. (The Shor cost model now binds *directly* to the faithful adder via `adderToff_eq`; this records that the deprecated skeleton was always cost-equivalent — the gates it omits are carry-propagation CXs, which are T-free.)

theoremgcount_gidney_adder_forward_faithful_full

theorem gcount_gidney_adder_forward_faithful_full (n : Nat) :
    gcount (gidney_adder_forward_faithful_full (n + 2)) = 4 * (n + 2) - 3

Gate-count of the faithful full forward cascade: `4n - 3` for `n ≥ 2`. Decomposes as 3 (first) + 4·(n-2) (interiors) + 2 (last) = 4n - 3.

example(example)

example : tcount (gidney_adder_forward_faithful_full 4) = 28

Concrete: 4-bit faithful Gidney adder = 28 T-gates = 4 Toffolis. (Matches `qq_gidney_adder.py` for a 4-bit instance.)

example(example)

example : tcount (gidney_adder_forward_faithful_full 33) = 7 * 33

Concrete: 33-bit faithful Gidney adder (RSA-2048 q_A=33 block) = 231 T-gates = 33 Toffolis.

theoremgidney_adder_forward_with_propagation_correct

theorem gidney_adder_forward_with_propagation_correct
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) :
    ∀ n, 3 * n + 2 ≤ dim →
    uc_eval (Gate.toUCom dim (gidney_adder_forward_with_propagation n))
      * f_to_vec dim f
      = f_to_vec dim (gidney_propagation_post_state n f)
  | 0    , _ =>

*Propagation cascade correctness**: given a single dim-bound `3 * n + 2 ≤ dim` (covering all qubits up through bit position n-1's propagation to bit n), the cascade acts on `f_to_vec dim f` to produce `f_to_vec dim (gidney_propagation_post_state n f)`. Proof by structural recursion on the three-clause def: - n=0: Gate.I, trivially preserves. - n=1: apply `gidney_adder_bit_step_faithful_first_correct` with first-bit disjointness derived from dim ≥ 5. - n+2: `gate_seq_acts_on_basis` + IH (propagation n+1) + per-bit interior correctness at position n+1 (via `bit_disjointness_of_dim_bound`).

theoremgidney_adder_forward_faithful_full_correct

theorem gidney_adder_forward_faithful_full_correct
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) (n : Nat)
    (hbd : 3 * (n + 2) ≤ dim) :
    uc_eval (Gate.toUCom dim (gidney_adder_forward_faithful_full (n + 2)))
      * f_to_vec dim f
      = f_to_vec dim (gidney_forward_faithful_full_post_state (n + 2) f)

*Faithful full forward cascade correctness** (Phase A review anchor at the basis-state level): on `(n+2)`-bit input `f`, the cascade `gidney_adder_forward_faithful_full (n+2)` acts as `gidney_forward_faithful_full_post_state (n+2)` on basis states. Combines `gidney_adder_forward_with_propagation_correct` (propagation, this iter) with `gidney_adder_bit_step_faithful_last_correct` (last bit, Iter 67). Single dim-bound hypothesis `3*(n+2) ≤ dim` covers all qubits including the (n+1)-th carry.

theoremgidney_final_cx_cascade_correct

theorem gidney_final_cx_cascade_correct
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) :
    ∀ n, 3 * n ≤ dim →
    uc_eval (Gate.toUCom dim (gidney_final_cx_cascade n)) * f_to_vec dim f
      = f_to_vec dim (gidney_final_cx_cascade_post_state n f)
  | 0    , _   =>

*Final CX cascade correctness** on classical basis states. Single dim-bound hypothesis `3 * n ≤ dim` covers all qubits `target_idx (n-1) = 3n - 2 < dim` (for n ≥ 1). Proof by structural recursion on `n`: - n = 0: cascade is `Gate.I`; trivially preserves. - n + 1: `gate_seq_acts_on_basis` + IH + per-step `gate_cx_acts_on_basis` with disjointness via `omega`.

theoremtcount_gidney_adder_forward_with_propagation_reverse

theorem tcount_gidney_adder_forward_with_propagation_reverse : ∀ n,
    tcount (gidney_adder_forward_with_propagation_reverse n) = 7 * n
  | 0     => by decide
  | 1     => by decide
  | n + 2 =>

T-count of the propagation reverse cascade: 7n (same gates as forward, reversed).

theoremtcount_gidney_adder_forward_faithful_full_reverse

theorem tcount_gidney_adder_forward_faithful_full_reverse (n : Nat) :
    tcount (gidney_adder_forward_faithful_full_reverse (n + 2)) = 7 * (n + 2)

T-count of the faithful full reverse cascade: 7n for `n ≥ 2`.

theoremgidney_adder_forward_with_propagation_fwd_rev_eq_one

theorem gidney_adder_forward_with_propagation_fwd_rev_eq_one
    (dim : Nat) (hdim : 0 < dim) :
    ∀ n, 3 * n + 2 ≤ dim →
    uc_eval (Gate.toUCom dim
              (Gate.seq (gidney_adder_forward_with_propagation n)
                        (gidney_adder_forward_with_propagation_reverse n)))
      = (1 : Matrix (Fin (2^dim)) (Fin (2^dim)) ℂ)
  | 0    , _ =>

*Cascade-level forward · reverse = identity** for the propagation cascade. By structural recursion on `n`: collapse the middle `interior fwd · interior rev` pair via Iter 82's `..._interior_fwd_rev_eq_one`, then apply IH. Base cases: - n = 0: both are Gate.I; product is ID·ID = 1. - n = 1: just first_fwd · first_rev = 1 by Iter 81's involution. Inductive step n+2: `(forward (n+1) ; interior (n+1)) ; (interior_reverse (n+1) ; reverse (n+1))`. Reassociate matrix product, collapse middle interior pair via Iter 82, drop via Matrix.one_mul, apply IH on forward (n+1) · reverse (n+1).

theoremgidney_adder_forward_faithful_full_fwd_rev_eq_one

theorem gidney_adder_forward_faithful_full_fwd_rev_eq_one
    (dim : Nat) (hdim : 0 < dim) (n : Nat)
    (hbd : 3 * (n + 2) ≤ dim) :
    uc_eval (Gate.toUCom dim
              (Gate.seq (gidney_adder_forward_faithful_full (n + 2))
                        (gidney_adder_forward_faithful_full_reverse (n + 2))))
      = (1 : Matrix (Fin (2^dim)) (Fin (2^dim)) ℂ)

*Faithful full forward · reverse = identity (cascade level)** for the `(n+2)`-bit Gidney adder. Combines `..._with_propagation_fwd_rev_eq_one` (propagation cascade) + Iter 69's `..._last_fwd_rev_id` (last bit) via matrix reassociation.

theoremtcount_gidney_adder_full_faithful_no_measurement

theorem tcount_gidney_adder_full_faithful_no_measurement (n : Nat) :
    tcount (gidney_adder_full_faithful_no_measurement (n + 2)) = 14 * (n + 2)

T-count of the full no-measurement faithful adder for `(n+2)` bits: `14(n+2)`. Derived from the gate sequence: 7(n+2) (forward) + 0 (final CX = pure CXs) + 7(n+2) (reverse).

example(example)

example : tcount (gidney_adder_full_faithful_no_measurement 4) = 56

Concrete: 4-bit full faithful adder = 56 T-gates = 8 Toffolis.

example(example)

example : tcount (gidney_adder_full_faithful_no_measurement 33) = 14 * 33

Concrete: 33-bit full faithful adder (RSA-2048 q_A=33) = 14 · 33 = 462 T-gates = 66 Toffolis. **No-measurement upper bound** (Gidney measurement trick would halve this to 33 Toffolis = 231 T).

theoremgidney_adder_full_faithful_no_measurement_vs_measurement_factor

theorem gidney_adder_full_faithful_no_measurement_vs_measurement_factor
    (n : Nat) :
    tcount (gidney_adder_full_faithful_no_measurement (n + 2))
      = 2 * gidney_adder_full_with_measurement_uncompute_tcount (n + 2)

*Gate-faithful no-measurement vs measurement-trick factor** (Iter 88). Strengthens `gidney_full_vs_measurement_uncompute_factor` (Iter 25, simplified bit-step) to the **gate-faithful** Gidney adder. The faithful encoding emits the same Toffoli count (14n T-gates), but is now backed by `qq_gidney_adder.py`'s full gate sequence and the Phase A semantic/structural correctness chain (Iter 65/57/67 per-bit + Iter 80 cascade forward + Iter 83 matrix-level inverse + Iter 86 reverse correctness). The factor of 2 remains the **measurement-uncomputation review gap**: faithful no-measurement T-count = 14n = 2 · (measurement paper-claim count 7n).

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.FirstBit

FormalRV/Arithmetic/RippleCarryAdder/ForwardAndCost/FirstBit.lean

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.FirstBit Faithful FIRST bit-step (part 3/5): its correctness, T and gate counts, matrix level reversibility, and the first-bit disjointness derivation. Builds on `InteriorBit`.

theoremtcount_gidney_adder_bit_step_faithful_first

theorem tcount_gidney_adder_bit_step_faithful_first :
    tcount gidney_adder_bit_step_faithful_first = 7

T-count of the first-bit step: 7 (1 Toffoli; 2 CXs are tcount-0).

theoremgcount_gidney_adder_bit_step_faithful_first

theorem gcount_gidney_adder_bit_step_faithful_first :
    gcount gidney_adder_bit_step_faithful_first = 3

Gate count of the first-bit step: 3 (vs 4 for interior bits; no chain CX).

theoremgidney_adder_bit_step_faithful_first_correct

theorem gidney_adder_bit_step_faithful_first_correct
    (dim : Nat) (f : Nat → Bool)
    (hr0 : read_idx 0 < dim) (ht0 : target_idx 0 < dim)
    (hc0 : carry_idx 0 < dim) (hr1 : read_idx 1 < dim)
    (ht1 : target_idx 1 < dim)
    (h_rt : read_idx 0 ≠ target_idx 0)
    (h_rc : read_idx 0 ≠ carry_idx 0)
    (h_tc : target_idx 0 ≠ carry_idx 0)
    (h_c_r1 : carry_idx 0 ≠ read_idx 1)
    (h_c_t1 : carry_idx 0 ≠ target_idx 1) :
    uc_eval (Gate.toUCom dim gidney_adder_bit_step_faithful_first)
      * f_to_vec dim f

*First-bit correctness on classical basis states** (Iter 65). Proves `gidney_adder_bit_step_faithful_first` acts on `f_to_vec dim f` to produce `f_to_vec dim (gidney_first_bit_post_state f)`. Proof via two applications of `gate_seq_acts_on_basis` + the per-gate primitives.

theoremfirst_bit_disjointness_of_dim_bound

theorem first_bit_disjointness_of_dim_bound (dim : Nat) (h : 5 ≤ dim) :
    read_idx 0 < dim ∧ target_idx 0 < dim ∧ carry_idx 0 < dim
    ∧ read_idx 1 < dim ∧ target_idx 1 < dim
    ∧ read_idx 0 ≠ target_idx 0 ∧ read_idx 0 ≠ carry_idx 0
    ∧ target_idx 0 ≠ carry_idx 0
    ∧ carry_idx 0 ≠ read_idx 1 ∧ carry_idx 0 ≠ target_idx 1

The first-bit disjointness conditions are all decidable from the indexing (read_idx 0 = 0, target_idx 0 = 1, carry_idx 0 = 2, read_idx 1 = 3, target_idx 1 = 4). At dim ≥ 5 all 10 conditions hold.

theoremtcount_gidney_adder_bit_step_faithful_first_reverse

theorem tcount_gidney_adder_bit_step_faithful_first_reverse :
    tcount gidney_adder_bit_step_faithful_first_reverse = 7

T-count of the first-bit gate-reverse: 7 (matches forward).

theoremgcount_gidney_adder_bit_step_faithful_first_reverse

theorem gcount_gidney_adder_bit_step_faithful_first_reverse :
    gcount gidney_adder_bit_step_faithful_first_reverse = 3

Gate-count of the first-bit gate-reverse: 3 (matches forward).

theoremgidney_adder_bit_step_faithful_first_fwd_rev_eq_one

theorem gidney_adder_bit_step_faithful_first_fwd_rev_eq_one
    (dim : Nat)
    (hr0 : read_idx 0 < dim) (ht0 : target_idx 0 < dim)
    (hc0 : carry_idx 0 < dim) (hr1 : read_idx 1 < dim) (ht1 : target_idx 1 < dim)
    (h_rt : read_idx 0 ≠ target_idx 0)
    (h_rc : read_idx 0 ≠ carry_idx 0)
    (h_tc : target_idx 0 ≠ carry_idx 0)
    (h_c_r1 : carry_idx 0 ≠ read_idx 1)
    (h_c_t1 : carry_idx 0 ≠ target_idx 1) :
    uc_eval (Gate.toUCom dim
              (Gate.seq gidney_adder_bit_step_faithful_first
                        gidney_adder_bit_step_faithful_first_reverse))

*First-bit forward · reverse = identity** at matrix level. The two propagation CXs cancel pairwise (CNOT involution), and the CCX-pair cancels (CCX involution). Mirrors Iter 69's `..._faithful_last_fwd_rev_id` pattern but for the first-bit step (3 gates instead of 2).

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.InteriorBit

FormalRV/Arithmetic/RippleCarryAdder/ForwardAndCost/InteriorBit.lean

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.InteriorBit Faithful INTERIOR bit-step (part 2/5): its basis-state correctness, T and gate counts, matrix-level reversibility, and the per-bit BitDisjointness derivation. Builds on `SkeletonCost`.

theoremgidney_adder_bit_step_0_correct

theorem gidney_adder_bit_step_0_correct (dim : Nat) (f : Nat → Bool)
    (h0 : read_idx 0 < dim) (h1 : target_idx 0 < dim) (h2 : carry_idx 0 < dim) :
    uc_eval (Gate.toUCom dim (gidney_adder_bit_step 0)) * f_to_vec dim f
      = f_to_vec dim
          (update f (carry_idx 0)
            (xor (f (carry_idx 0)) (f (read_idx 0) && f (target_idx 0))))

*`gidney_adder_bit_step 0` correctness**: on a classical basis state, the i=0 step XORs `(read[0] ∧ target[0])` into `carry[0]`. This is the Toffoli action: `(a, b, c) ↦ (a, b, c ⊕ (a ∧ b))`.

theoremtcount_gidney_adder_bit_step_faithful_interior

theorem tcount_gidney_adder_bit_step_faithful_interior (i : Nat) :
    tcount (gidney_adder_bit_step_faithful_interior i) = 7

T-count of the faithful interior bit-step: still 7 (1 Toffoli + 3 CXs, with CXs contributing 0 T). Matches qianxu's "q_A Toffoli gates per q_A-bit adder" claim.

theoremgcount_gidney_adder_bit_step_faithful_interior

theorem gcount_gidney_adder_bit_step_faithful_interior (i : Nat) :
    gcount (gidney_adder_bit_step_faithful_interior i) = 4

Gate count of the faithful interior bit-step: **4 gates** (vs the simplified encoding's 2). The 2 extra CXs are the propagation CXs the Iter 19 encoding was missing.

example(example)

example : tcount (gidney_adder_bit_step_faithful_interior 3) = 7

Concrete: at i=3 (interior bit), the faithful encoding has tcount 7 and gcount 4.

example(example)

example : gcount (gidney_adder_bit_step_faithful_interior 3) = 4

theoremtcount_gidney_adder_forward_faithful_interior

theorem tcount_gidney_adder_forward_faithful_interior (n : Nat) :
    tcount (gidney_adder_forward_faithful_interior n) = 7 * n

*T-count of the faithful interior cascade is `7n`**, matching the paper-claimed q_A Toffolis per q_A-bit adder. Same headline count as the Iter 20 simplified cascade — the propagation CXs are tcount-zero so they don't change the T-count, only the gate count.

theoremgcount_gidney_adder_forward_faithful_interior

theorem gcount_gidney_adder_forward_faithful_interior (n : Nat) :
    gcount (gidney_adder_forward_faithful_interior n) = 4 * n

*Gate count is `4n`** (vs the Iter 20 simplified cascade's `2n`). This is the **honest gate-count comparison** between the Lean-faithful encoding and qianxu Fig. 4(a).

example(example)

example : tcount (gidney_adder_forward_faithful_interior 33) = 231

Concrete: at n=33 (RSA-2048 adder block), faithful interior cascade has 231 T-gates (33 Toffolis × 7) and 132 total gates (33 × 4).

example(example)

example : gcount (gidney_adder_forward_faithful_interior 33) = 132

theoremfaithful_and_simplified_tcount_agree

theorem faithful_and_simplified_tcount_agree (n : Nat) :
    tcount (gidney_adder_forward_faithful_interior n)
      = tcount (gidney_adder_forward n)

The faithful cascade matches the simplified cascade's T-count (both 7n) but NOT its gate count (simplified: ~2n; faithful: 4n). This formalizes the review narrative: paper's "q_A Toffolis" count is preserved by either encoding, but only the faithful encoding correctly implements the carry.

theoremgidney_adder_bit_step_faithful_interior_correct

theorem gidney_adder_bit_step_faithful_interior_correct
    (dim i : Nat) (f : Nat → Bool)
    (hri : read_idx i < dim) (hti : target_idx i < dim)
    (hci : carry_idx i < dim) (hcim1 : carry_idx (i - 1) < dim)
    (hri1 : read_idx (i + 1) < dim) (hti1 : target_idx (i + 1) < dim)
    (h_rt : read_idx i ≠ target_idx i)
    (h_rc : read_idx i ≠ carry_idx i)
    (h_tc : target_idx i ≠ carry_idx i)
    (h_cc : carry_idx (i - 1) ≠ carry_idx i)
    (h_ci_ri1 : carry_idx i ≠ read_idx (i + 1))
    (h_ci_ti1 : carry_idx i ≠ target_idx (i + 1)) :
    uc_eval (Gate.toUCom dim (gidney_adder_bit_step_faithful_interior i))

*Faithful bit-step correctness on classical basis states** (Iter 57). For `i ≥ 1` interior bits, the four-gate sequence acts on `f_to_vec dim f` to produce the chained-update state `gidney_bit_step_faithful_post_state i f`. Proved by three applications of the reusable `gate_seq_acts_on_basis` bridge + the per-gate primitives `gate_ccx_acts_on_basis` and `gate_cx_acts_on_basis`.

theoremtcount_gidney_adder_bit_step_faithful_interior_reverse

theorem tcount_gidney_adder_bit_step_faithful_interior_reverse (i : Nat) :
    tcount (gidney_adder_bit_step_faithful_interior_reverse i) = 7

T-count of interior gate-reverse: 7 (matches forward).

theoremgcount_gidney_adder_bit_step_faithful_interior_reverse

theorem gcount_gidney_adder_bit_step_faithful_interior_reverse (i : Nat) :
    gcount (gidney_adder_bit_step_faithful_interior_reverse i) = 4

Gate-count of interior gate-reverse: 4 (matches forward).

theoremgidney_adder_bit_step_faithful_interior_fwd_rev_eq_one

theorem gidney_adder_bit_step_faithful_interior_fwd_rev_eq_one
    (dim i : Nat)
    (hri : read_idx i < dim) (hti : target_idx i < dim)
    (hci : carry_idx i < dim) (hcim1 : carry_idx (i - 1) < dim)
    (hri1 : read_idx (i + 1) < dim) (hti1 : target_idx (i + 1) < dim)
    (h_rt : read_idx i ≠ target_idx i)
    (h_rc : read_idx i ≠ carry_idx i)
    (h_tc : target_idx i ≠ carry_idx i)
    (h_cc : carry_idx (i - 1) ≠ carry_idx i)
    (h_ci_ri1 : carry_idx i ≠ read_idx (i + 1))
    (h_ci_ti1 : carry_idx i ≠ target_idx (i + 1)) :
    uc_eval (Gate.toUCom dim

*Interior forward · reverse = identity** at matrix level. The 3 CXs cancel pairwise (CNOT involution × 3) and the CCX-pair cancels. Mirrors Iter 81's first-bit pattern but with one more gate (4 gates → 4 involution pairs).

theorembit_disjointness_of_dim_bound

theorem bit_disjointness_of_dim_bound (dim i : Nat)
    (h1 : 1 ≤ i) (hd : 3 * i + 5 ≤ dim) :
    BitDisjointness dim i

*Parametric BitDisjointness derivation (Iter 61)**: all 12 disjointness conditions follow from a single dim-size bound `3*i + 5 ≤ dim` (covering the highest qubit index `target_idx (i+1) = 3i+4`), plus `1 ≤ i` (so `carry_idx (i-1)` is a distinct qubit). Reduces the review interface from 12 manual conditions to a single `omega`-style bound, per the new CLAUDE.md hard rule on reusable framework + readability.

theorembit_disjointness_for_cascade

theorem bit_disjointness_for_cascade (dim n : Nat) (h : 3 * n + 5 ≤ dim) :
    ∀ i, 1 ≤ i → i ≤ n → BitDisjointness dim i

*Cascade-level dim bound** suffices to derive BitDisjointness at every i in 1..n: a single `3*n + 5 ≤ dim` assumption covers all interior bits. Reduces the cascade-correctness interface to ONE quantifier-free hypothesis.

example(example)

example : 3 * 33 + 5 ≤ 104

Concrete: at RSA-2048 (q_A = 33), dim ≥ 3·33 + 5 = 104 suffices. Note that `adder_n_qubits 33 = 3·33 + 2 = 101`; the +3 over adder_n_qubits comes from the "next bit" propagation indices used by the interior bit-step.

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.LastBitAndSkeletonRev

FormalRV/Arithmetic/RippleCarryAdder/ForwardAndCost/LastBitAndSkeletonRev.lean

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.LastBitAndSkeletonRev Faithful LAST bit-step + skeleton-reverse reversibility (part 4/5): last-bit correctness/cost/reversibility, the faithful-interior cascade correctness, and the simplified-bit-step reverse + proper-uncompute matrix involutions. Builds on `FirstBit`.

theoremtcount_gidney_adder_bit_step_faithful_last

theorem tcount_gidney_adder_bit_step_faithful_last (i : Nat) :
    tcount (gidney_adder_bit_step_faithful_last i) = 7

T-count of the last-bit step: 7 (1 Toffoli; CX is tcount-0).

theoremgcount_gidney_adder_bit_step_faithful_last

theorem gcount_gidney_adder_bit_step_faithful_last (i : Nat) :
    gcount (gidney_adder_bit_step_faithful_last i) = 2

Gate count of the last-bit step: **2** (vs interior's 4, first- bit's 3). The last bit drops both propagation CXs.

theoremgidney_adder_bit_step_faithful_last_correct

theorem gidney_adder_bit_step_faithful_last_correct
    (dim i : Nat) (f : Nat → Bool)
    (hri : read_idx i < dim) (hti : target_idx i < dim)
    (hci : carry_idx i < dim) (hcim1 : carry_idx (i - 1) < dim)
    (h_rt : read_idx i ≠ target_idx i)
    (h_rc : read_idx i ≠ carry_idx i)
    (h_tc : target_idx i ≠ carry_idx i)
    (h_cc : carry_idx (i - 1) ≠ carry_idx i) :
    uc_eval (Gate.toUCom dim (gidney_adder_bit_step_faithful_last i))
      * f_to_vec dim f
      = f_to_vec dim (gidney_last_bit_post_state i f)

*Last-bit correctness on classical basis states** (Iter 67).

theoremgidney_adder_bit_step_faithful_last_fwd_rev_id

theorem gidney_adder_bit_step_faithful_last_fwd_rev_id
    (dim i : Nat) (f : Nat → Bool)
    (hri : read_idx i < dim) (hti : target_idx i < dim)
    (hci : carry_idx i < dim) (hcim1 : carry_idx (i - 1) < dim)
    (h_rt : read_idx i ≠ target_idx i)
    (h_rc : read_idx i ≠ carry_idx i)
    (h_tc : target_idx i ≠ carry_idx i)
    (h_cc : carry_idx (i - 1) ≠ carry_idx i) :
    uc_eval (Gate.toUCom dim
              (Gate.seq (gidney_adder_bit_step_faithful_last i)
                        (gidney_adder_bit_step_faithful_last_reverse i)))
      * f_to_vec dim f

*Forward · reverse (last-bit) = identity on basis states**. The two CX gates cancel (CX involution); the two CCX gates cancel (CCX involution). Composed correctly via the reusable framework.

theoremgidney_adder_forward_faithful_interior_correct

theorem gidney_adder_forward_faithful_interior_correct
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) :
    ∀ n, (∀ i, 1 ≤ i → i ≤ n → BitDisjointness dim i) →
    uc_eval (Gate.toUCom dim (gidney_adder_forward_faithful_interior n))
      * f_to_vec dim f
      = f_to_vec dim (gidney_cascade_post_state n f)
  | 0    , _ =>

*Faithful n-bit cascade correctness**: given disjointness on each bit position 1..n, the cascade acts on `f_to_vec dim f` to produce `f_to_vec dim (gidney_cascade_post_state n f)`. Proof by induction on n. **First Verified-tier theorem for the n-bit Gidney adder forward cascade.**

theoremgidney_adder_bit_step_succ_simplified

theorem gidney_adder_bit_step_succ_simplified (dim i : Nat) (f : Nat → Bool)
    (hri : read_idx (i+1) < dim) (hti : target_idx (i+1) < dim)
    (hci : carry_idx (i+1) < dim) (hci' : carry_idx i < dim)
    (hrt : read_idx (i+1) ≠ target_idx (i+1))
    (hrc : read_idx (i+1) ≠ carry_idx (i+1))
    (htc : target_idx (i+1) ≠ carry_idx (i+1))
    (hcc : carry_idx i ≠ carry_idx (i+1)) :
    let f'

Action of the simplified `gidney_adder_bit_step (i+1)` on basis states: XORs `(read[i+1] ∧ target[i+1]) ⊕ carry[i]` into `carry[i+1]`. *This is NOT Gidney's actual carry** (see review-gap note above); proving it here makes the discrepancy explicit.

theoremtcount_gidney_adder_bit_step_reverse

theorem tcount_gidney_adder_bit_step_reverse (i : Nat) :
    tcount (gidney_adder_bit_step_reverse i) = 7

T-count of the gate-reverse: same 7 as forward (same gates, swapped order).

theoremgcount_gidney_adder_bit_step_reverse

theorem gcount_gidney_adder_bit_step_reverse (i : Nat) :
    gcount (gidney_adder_bit_step_reverse i) = (if i = 0 then 1 else 2)

Gate-count of the gate-reverse: 1 at i=0, 2 at i>0 (matches forward).

theoremgidney_adder_bit_step_fwd_rev_eq_one

theorem gidney_adder_bit_step_fwd_rev_eq_one (dim i : Nat)
    (hri : read_idx i < dim) (hti : target_idx i < dim)
    (hci : carry_idx i < dim)
    (h_rt : read_idx i ≠ target_idx i)
    (h_rc : read_idx i ≠ carry_idx i)
    (h_tc : target_idx i ≠ carry_idx i)
    (hcim1 : i ≠ 0 → carry_idx (i - 1) < dim)
    (h_cc : i ≠ 0 → carry_idx (i - 1) ≠ carry_idx i) :
    uc_eval (Gate.toUCom dim
              (Gate.seq (gidney_adder_bit_step i)
                        (gidney_adder_bit_step_reverse i)))
      = (1 : Matrix (Fin (2^dim)) (Fin (2^dim)) ℂ)

*Matrix-level per-bit involution**: `bit_step i · bit_step_reverse i = 1`. Proven for all `i` (both branches) under the standard bit-disjointness hypotheses. The i = 0 branch needs `read_idx 0 = 0, target_idx 0 = 1, carry_idx 0 = 2` (auto-derived from the `read_idx`/`target_idx`/`carry_idx` defs and the disjointness hypotheses); the i > 0 branch mirrors `gidney_adder_bit_step_faithful_last_fwd_rev_id` (Iter 69) structurally. *This is the per-bit collapse used in Iter 74's cascade induction**: `uc_eval (cascade (n+1) · uncompute (n+1))` re-associates to `uc_eval (cascade n) · uc_eval (bit_step n · bit_step_reverse n) · uc_eval (uncompute n)`, and the middle factor collapses to 1 by this lemma.

theoremtcount_gidney_adder_uncompute_proper

theorem tcount_gidney_adder_uncompute_proper (n : Nat) :
    tcount (gidney_adder_uncompute_proper n) = 7 * n

T-count of the proper reverse: 7n (same gates, reversed).

theoremgidney_adder_forward_uncompute_proper_eq_one

theorem gidney_adder_forward_uncompute_proper_eq_one
    (dim : Nat) (hdim : 0 < dim) :
    ∀ n, 3 * n ≤ dim →
    uc_eval (Gate.toUCom dim
              (Gate.seq (gidney_adder_forward n)
                        (gidney_adder_uncompute_proper n)))
      = (1 : Matrix (Fin (2^dim)) (Fin (2^dim)) ℂ)
  | 0    , _ =>

*Matrix-level forward · proper-uncompute = identity**. The n-bit Gidney forward cascade composed with its proper (gate-reversed) uncomputation is the identity matrix. Proof by structural recursion on n, mirroring Iter 74's `prefix_and_cascade_uncompute_eq_one`. *Hypothesis**: a single `3 * n ≤ dim` bound suffices (the highest qubit touched at bit position k is `carry_idx k = 3k+2`, so all bits 0..n-1 fit when `3n ≤ dim`). *Fourth Verified-tier review chain** (adder side, mirror of Iter 74). Confirms that the simplified-bit-step forward cascade IS reversible by its proper inverse without measurement.

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.SkeletonCost

FormalRV/Arithmetic/RippleCarryAdder/ForwardAndCost/SkeletonCost.lean

FormalRV.Arithmetic.RippleCarryAdder.ForwardAndCost.SkeletonCost Cost lemmas (part 1/5): T and gate counts of the cost-only skeleton cascades, the PaperClaims Toffoli-count bridge, and the no-measurement-vs-measurement factor-of-2 gap. Supporting lemmas; the faithful backbone is `FaithfulBackbone`.

example(example)

example : adder_n_qubits 4 = 14

Smoke: 4-bit adder uses 14 qubits (matches Fig. 4(a)).

example(example)

example : read_idx 0 = 0 ∧ target_idx 0 = 1 ∧ carry_idx 0 = 2

Smoke: indexing is monotone within a bit position.

example(example)

example : read_idx 1 = 3 ∧ target_idx 1 = 4 ∧ carry_idx 1 = 5

Smoke: indexing is monotone across bit positions.

theoremtcount_gidney_adder_bit_step

theorem tcount_gidney_adder_bit_step (i : Nat) :
    tcount (gidney_adder_bit_step i) = 7

Each Gidney-adder forward step is exactly 1 Toffoli = 7 T-gates. Proof: CCX contributes 7 T; CX (if present) contributes 0.

example(example)

example : tcount (gidney_adder_bit_step 0) = 7

Concrete smoke checks: tcount per step is 7 for any specific i.

example(example)

example : tcount (gidney_adder_bit_step 5) = 7

example(example)

example : tcount (gidney_adder_bit_step 100) = 7

theoremgcount_gidney_adder_bit_step

theorem gcount_gidney_adder_bit_step (i : Nat) :
    gcount (gidney_adder_bit_step i) = if i = 0 then 1 else 2

Gate-count of one bit step is exactly 1 Toffoli — derived from the inner gate sequence. The +1 from any CX (i>0 case) is also counted in gcount (each CX = 1 gate).

theoremtcount_gidney_adder_forward

theorem tcount_gidney_adder_forward (n : Nat) :
    tcount (gidney_adder_forward n) = 7 * n

T-count of the full n-bit Gidney forward cascade: 7n (1 Toffoli × 7 T per bit × n bits). **First gate-derived recovery of qianxu Eq. E3's "q_A Toffoli gates" for the q_A-bit adder** — the adder-side analog of `tcount_prefix_and_cascade` for the lookup.

example(example)

example : tcount (gidney_adder_forward 4) = 28

Concrete: 4-bit Gidney forward cascade has 28 T-gates = 4 Toffolis.

example(example)

example : tcount (gidney_adder_forward 33) = 7 * 33

A 33-bit Gidney forward cascade (qianxu's RSA-2048 adder block, q_A=33, Eq. E3) has 33 Toffolis = 231 T-gates.

theoremtcount_gidney_adder_uncompute

theorem tcount_gidney_adder_uncompute (n : Nat) :
    tcount (gidney_adder_uncompute n) = 7 * n

T-count of the reverse pass: also `7n` (same Toffolis, different order).

theoremtcount_gidney_final_cx_cascade

theorem tcount_gidney_final_cx_cascade (n : Nat) :
    tcount (gidney_final_cx_cascade n) = 0

The final CX cascade is tcount-zero (only CXs, no Toffolis).

theoremtcount_gidney_adder_full

theorem tcount_gidney_adder_full (n : Nat) :
    tcount (gidney_adder_full n) = 14 * n

*Total T-count of the full n-bit Gidney adder (no-measurement upper bound): `14 n`**. Composition: forward (7n) + reverse (7n) + final CX (0). Under measurement-based uncomputation, the reverse contributes 0 — that's the optimization qianxu's "q_A Toffoli gates" claim relies on. The 14n here is the gate-level no-optimization bound; the 7n claim requires the measurement trick.

example(example)

example : tcount (gidney_adder_full 4) = 56

Concrete: 4-bit full Gidney adder = 56 T (= 8 Toffolis × 7T).

theoremgidney_adder_forward_tcount_matches_PaperClaims

theorem gidney_adder_forward_tcount_matches_PaperClaims (n : Nat) :
    tcount (gidney_adder_forward n) = 7 * gidney_total_toffolis_n_bit_adder n

*Bridge theorem**: the T-count of the Lean-encoded Gidney forward cascade equals `7 ·` the paper-claim Toffoli count. This connects the gate-derived value in `RippleCarryAdder.lean` to the data def in `PaperClaims.lean`, formally certifying that the latter is no longer paper-stated but Lean-gate-sequence-derived.

example(example)

example :
    tcount (gidney_adder_forward 33) = 7 * gidney_total_toffolis_n_bit_adder 33

Concrete bridge check at n=33 (RSA-2048 q_A=33 case): 33 Toffolis = 231 T-gates, both sides agree.

theoremgidney_no_measurement_vs_measurement_gap

theorem gidney_no_measurement_vs_measurement_gap (n : Nat) :
    tcount (gidney_adder_full n)
      = 2 * (7 * gidney_total_toffolis_n_bit_adder n)

*Review finding theorem**: the no-measurement gate-level T-count of the n-bit Gidney adder is exactly `2 ·` the paper's measurement- based claim. This is the formal statement of the structural Gidney-optimization assumption.

example(example)

example :
    tcount (gidney_adder_full 33) = 462
    ∧ 7 * gidney_total_toffolis_n_bit_adder 33 = 231

Concrete: at n=33 (RSA-2048 adder block), no-measurement bound is 14 × 33 = 462 T-gates, vs paper's 7 × 33 = 231 T-gates.

theoremgidney_adder_full_with_measurement_uncompute_tcount_eq

theorem gidney_adder_full_with_measurement_uncompute_tcount_eq (n : Nat) :
    gidney_adder_full_with_measurement_uncompute_tcount n = 7 * n

*Review-gap closure theorem**: the n-bit Gidney adder T-count with measurement-based uncomputation equals `7n`, matching qianxu Eq. E3's claim. This is the formal derivation of the previously paper-stated count from the Lean-encoded Gidney-AND primitive.

theoremgidney_full_vs_measurement_uncompute_factor

theorem gidney_full_vs_measurement_uncompute_factor (n : Nat) :
    tcount (gidney_adder_full n)
      = 2 * gidney_adder_full_with_measurement_uncompute_tcount n

*The review-gap factor of 2** is now explicit: the gate-explicit 14n bound (`tcount_gidney_adder_full n`) is exactly `2 ×` the measurement-uncomputation 7n bound. Both are formally derived in Lean; the difference is the Gidney trick.

example(example)

example :
    gidney_adder_full_with_measurement_uncompute_tcount 33 = 231
    ∧ tcount (gidney_adder_full 33) = 462

Concrete RSA-2048 (q_A=33): with Gidney measurement trick, T-count = 231 (paper figure); without, 462 (Lean explicit-reverse).

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.ApplyNatBridge

FormalRV/Arithmetic/RippleCarryAdder/PropagationReverse/ApplyNatBridge.lean

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.ApplyNatBridge Part 2/4: the `Gate.applyNat` bridge — per-bit-step rfl wrappers, n-bit forward/reverse cascade applyNat forms, decoder bounds, the full-adder applyNat identity, the applyNat-form target/read correctness lift, the does-not-clear- carries finding, and the patched n=2/n=3 exhaustive decide tests. Builds on `SemanticCorrectness`.

theoremgidney_adder_bit_step_0_applyNat

theorem gidney_adder_bit_step_0_applyNat (f : Nat → Bool) :
    Gate.applyNat (gidney_adder_bit_step 0) f
      = update f (carry_idx 0)
          (xor (f (carry_idx 0))
               (f (read_idx 0) && f (target_idx 0)))

`Gate.applyNat` form of `gidney_adder_bit_step_0_correct`. The i=0 step is a single CCX; its applyNat semantics matches the single-bit Toffoli update directly.

theoremgidney_adder_bit_step_faithful_first_applyNat

theorem gidney_adder_bit_step_faithful_first_applyNat (f : Nat → Bool) :
    Gate.applyNat gidney_adder_bit_step_faithful_first f
      = gidney_first_bit_post_state f

`Gate.applyNat` form of `gidney_adder_bit_step_faithful_first_correct`. The first-bit step's `applyNat` action is exactly the three-update chain captured by `gidney_first_bit_post_state`.

theoremgidney_adder_bit_step_faithful_interior_applyNat

theorem gidney_adder_bit_step_faithful_interior_applyNat
    (i : Nat) (f : Nat → Bool) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior i) f
      = gidney_bit_step_faithful_post_state i f

`Gate.applyNat` form of `gidney_adder_bit_step_faithful_interior_correct`. The interior step's `applyNat` action is exactly the four-update chain captured by `gidney_bit_step_faithful_post_state`.

theoremgidney_adder_bit_step_faithful_last_applyNat

theorem gidney_adder_bit_step_faithful_last_applyNat
    (i : Nat) (f : Nat → Bool) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last i) f
      = gidney_last_bit_post_state i f

`Gate.applyNat` form of `gidney_adder_bit_step_faithful_last_correct`. The last-bit step's `applyNat` action is exactly the two-update chain captured by `gidney_last_bit_post_state`.

theoremgidney_final_cx_cascade_applyNat

theorem gidney_final_cx_cascade_applyNat :
    ∀ (n : Nat) (f : Nat → Bool),
      Gate.applyNat (gidney_final_cx_cascade n) f
        = gidney_final_cx_cascade_post_state n f
  | 0,     _ => rfl
  | n + 1, f =>

`Gate.applyNat` form of the final CX cascade. The cascade is a sequence of `CX(read[i], target[i])` for `i = 0..n-1`; its `applyNat` action is the chained `update` exactly captured by `gidney_final_cx_cascade_post_state`.

theoremgidney_adder_forward_with_propagation_applyNat

theorem gidney_adder_forward_with_propagation_applyNat :
    ∀ (n : Nat) (f : Nat → Bool),
      Gate.applyNat (gidney_adder_forward_with_propagation n) f
        = gidney_propagation_post_state n f
  | 0,     _ => rfl
  | 1,     _ => rfl
  | n + 2, f =>

`Gate.applyNat` form of the n-bit Gidney forward propagation cascade. Composes per-bit-step `Gate.applyNat` identities (Tick B) via the seq case. Base cases (`n = 0, 1`) and the inductive case all reduce to a single rewrite through the recursive identity + the per-step wrapper.

theoremgidney_adder_forward_faithful_full_applyNat

theorem gidney_adder_forward_faithful_full_applyNat :
    ∀ (n : Nat) (f : Nat → Bool),
      Gate.applyNat (gidney_adder_forward_faithful_full n) f
        = gidney_forward_faithful_full_post_state n f
  | 0,     _ => rfl
  | 1,     _ => rfl
  | n + 2, f =>

`Gate.applyNat` form of the full Gidney forward pass. The `applyNat` action is the propagation post-state through bit n-1 chained with the last-bit step at position n-1.

theoremgidney_read_val_lt

theorem gidney_read_val_lt : ∀ (n : Nat) (f : Nat → Bool),
    gidney_read_val n f < 2^n
  | 0,     _ => by simp [gidney_read_val]
  | n + 1, f =>

Decoder bound: `read_val < 2^n` for any bit-function.

theoremgidney_target_val_lt

theorem gidney_target_val_lt : ∀ (n : Nat) (f : Nat → Bool),
    gidney_target_val n f < 2^n
  | 0,     _ => by simp [gidney_target_val]
  | n + 1, f =>

Decoder bound: `target_val < 2^n`.

theoremgidney_carry_val_lt

theorem gidney_carry_val_lt : ∀ (n : Nat) (f : Nat → Bool),
    gidney_carry_val n f < 2^n
  | 0,     _ => by simp [gidney_carry_val]
  | n + 1, f =>

Decoder bound: `carry_val < 2^n`.

example(example)

example :
    gidney_target_val 2
      (Gate.applyNat (gidney_adder_full_faithful_no_measurement 2)
        inputF_1_plus_1_tickD) = 2

*Target register is correct**: after the full faithful no-measurement adder, target encodes `1 + 1 = 2`.

example(example)

example :
    gidney_read_val 2
      (Gate.applyNat (gidney_adder_full_faithful_no_measurement 2)
        inputF_1_plus_1_tickD) = 1

*Read register is preserved**: after the full faithful no-measurement adder, read = 1 (unchanged).

example(example)

example :
    gidney_carry_val 2
      (Gate.applyNat (gidney_adder_full_faithful_no_measurement 2)
        inputF_1_plus_1_tickD) = 3

*Carry register is NOT cleared**: after the full faithful no-measurement adder, carry = 3 (binary `11`), not 0. This is the open gap that blocks a verified modular adder built on this circuit.

theoremgidney_adder_bit_step_faithful_first_reverse_applyNat

theorem gidney_adder_bit_step_faithful_first_reverse_applyNat
    (f : Nat → Bool) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse f
      = gidney_first_bit_reverse_post_state f

`Gate.applyNat` form of the first-bit reverse step.

theoremgidney_adder_bit_step_faithful_interior_reverse_applyNat

theorem gidney_adder_bit_step_faithful_interior_reverse_applyNat
    (i : Nat) (f : Nat → Bool) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse i) f
      = gidney_interior_bit_reverse_post_state i f

`Gate.applyNat` form of the interior-bit reverse step.

theoremgidney_adder_bit_step_faithful_last_reverse_applyNat

theorem gidney_adder_bit_step_faithful_last_reverse_applyNat
    (i : Nat) (f : Nat → Bool) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse i) f
      = gidney_last_bit_reverse_post_state i f

`Gate.applyNat` form of the last-bit reverse step.

theoremgidney_adder_forward_with_propagation_reverse_applyNat

theorem gidney_adder_forward_with_propagation_reverse_applyNat :
    ∀ (n : Nat) (f : Nat → Bool),
      Gate.applyNat (gidney_adder_forward_with_propagation_reverse n) f
        = gidney_propagation_reverse_post_state n f
  | 0,     _ => rfl
  | 1,     _ => rfl
  | n + 2, f =>

`Gate.applyNat` form of the n-bit propagation reverse cascade.

theoremgidney_adder_forward_faithful_full_reverse_applyNat

theorem gidney_adder_forward_faithful_full_reverse_applyNat :
    ∀ (n : Nat) (f : Nat → Bool),
      Gate.applyNat (gidney_adder_forward_faithful_full_reverse n) f
        = gidney_full_reverse_post_state n f
  | 0,     _ => rfl
  | 1,     _ => rfl
  | n + 2, f =>

`Gate.applyNat` form of the full Gidney reverse cascade.

theoremgidney_adder_full_faithful_no_measurement_applyNat

theorem gidney_adder_full_faithful_no_measurement_applyNat
    (n : Nat) (f : Nat → Bool) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement (n + 2)) f
      = gidney_full_reverse_post_state (n + 2)
          (gidney_final_cx_cascade_post_state (n + 2)
            (gidney_forward_faithful_full_post_state (n + 2) f))

`Gate.applyNat` form of the full faithful no-measurement Gidney adder for `n ≥ 2` (the only width at which the adder does non-trivial work; `n = 0` and `n = 1` are `Gate.I`). Composes the three Tick C forward wrappers + the new reverse wrapper.

theoremgidney_adder_full_faithful_no_measurement_target_correct

theorem gidney_adder_full_faithful_no_measurement_target_correct
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    ∀ i, i < n →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement n)
        (adder_input_F n a b) (target_idx i)
      = adder_sum_bit_classical a b i

*`Gate.applyNat`-form arithmetic correctness, target register.** For `n ≥ 2`, the full faithful Gidney adder applied to the standard 2-operand input encoding writes the correct sum bits into the target register. Lift of `gidney_classical_action_with_reverse` (Iter 207) through `gidney_adder_full_faithful_no_measurement_applyNat`.

theoremgidney_adder_full_faithful_no_measurement_read_correct_0

theorem gidney_adder_full_faithful_no_measurement_read_correct_0
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement n)
        (adder_input_F n a b) (read_idx 0)
      = a.testBit 0

*`Gate.applyNat`-form read-register preservation, j = 0.**

theoremgidney_adder_full_faithful_no_measurement_read_correct_1

theorem gidney_adder_full_faithful_no_measurement_read_correct_1
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement n)
        (adder_input_F n a b) (read_idx 1)
      = a.testBit 1

*`Gate.applyNat`-form read-register preservation, j = 1.**

theoremgidney_adder_full_faithful_no_measurement_read_correct_geq_2

theorem gidney_adder_full_faithful_no_measurement_read_correct_geq_2
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n)
    (j : Nat) (hj : 2 ≤ j) (hjn : j < n) :
    Gate.applyNat (gidney_adder_full_faithful_no_measurement n)
        (adder_input_F n a b) (read_idx j)
      = a.testBit j

*`Gate.applyNat`-form read-register preservation, j ≥ 2.**

theoremgidney_adder_full_faithful_no_measurement_read_correct

theorem gidney_adder_full_faithful_no_measurement_read_correct
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    ∀ i, i < n →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement n)
        (adder_input_F n a b) (read_idx i)
      = a.testBit i

*`Gate.applyNat`-form read-register preservation, all positions.** Assembles the three cases above.

theoremgidney_adder_full_does_not_clear_carries_in_general

theorem gidney_adder_full_does_not_clear_carries_in_general :
    ¬ (∀ n a b, 1 < n → a < 2^n → b < 2^n → ∀ i, i < n →
        (Gate.applyNat (gidney_adder_full_faithful_no_measurement n)
          (adder_input_F n a b)) (carry_idx i) = false)

*Formalized Tick D finding**: the full faithful no-measurement Gidney adder does NOT clear the carry register in general. Proof: machine-checked counterexample at `(n=2, a=1, b=1, i=0)`. The existing Iter 191 work proves target-bit correctness and read-register preservation, but does NOT — and CANNOT, as this theorem shows — also establish carry-zeroing. This is the precise structural defect that blocks a verified modular adder built on this circuit: modular reduction requires clean ancillas to compare and conditionally subtract, but the existing adder leaves carries dirty whenever the carry chain is non-trivial.

theorempatched_n2_clears_carries

theorem patched_n2_clears_carries :
    ∀ a b, a < 4 → b < 4 → ∀ i, i < 2 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched 2)
        (adder_input_F 2 a b) (carry_idx i) = false

*Patched adder clears carries — n=2 exhaustive**. Over all `(a, b) ∈ [0, 4) × [0, 4)`, every carry position of the patched full faithful no-measurement Gidney adder is `false`.

theorempatched_n2_target_correct

theorem patched_n2_target_correct :
    ∀ a b, a < 4 → b < 4 → ∀ i, i < 2 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched 2)
          (adder_input_F 2 a b) (target_idx i)
        = adder_sum_bit_classical a b i

*Patched adder target correctness — n=2 exhaustive**.

theorempatched_n2_read_preserved

theorem patched_n2_read_preserved :
    ∀ a b, a < 4 → b < 4 → ∀ i, i < 2 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched 2)
          (adder_input_F 2 a b) (read_idx i)
        = a.testBit i

*Patched adder read preservation — n=2 exhaustive**.

theorempatched_n3_clears_carries

theorem patched_n3_clears_carries :
    ∀ a b, a < 8 → b < 8 → ∀ i, i < 3 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched 3)
        (adder_input_F 3 a b) (carry_idx i) = false

*Patched adder clears carries — n=3 exhaustive**. 192 cases.

theorempatched_n3_target_correct

theorem patched_n3_target_correct :
    ∀ a b, a < 8 → b < 8 → ∀ i, i < 3 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched 3)
          (adder_input_F 3 a b) (target_idx i)
        = adder_sum_bit_classical a b i

*Patched adder target correctness — n=3 exhaustive**. 192 cases.

theorempatched_n3_read_preserved

theorem patched_n3_read_preserved :
    ∀ a b, a < 8 → b < 8 → ∀ i, i < 3 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched 3)
          (adder_input_F 3 a b) (read_idx i)
        = a.testBit i

*Patched adder read preservation — n=3 exhaustive**. 192 cases.

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.CarryClearanceBackbone

FormalRV/Arithmetic/RippleCarryAdder/PropagationReverse/CarryClearanceBackbone.lean

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.CarryClearanceBackbone BACKBONE (part 4/4): the arbitrary-n cascade carry-clearance inductions and the headline `gidney_adder_full_faithful_no_measurement_patched_clears_carries`, plus the trailing patched=unpatched / unpatched-frame / update-commute helpers. Builds on `PatchedCarryLemmas`.

theorempatched_propagation_reverse_cascade_clears_carries

theorem patched_propagation_reverse_cascade_clears_carries
    (m a b : Nat) :
    ∀ (f : Nat → Bool),
      (∀ j, j ≤ m →
        f (carry_idx j)   = Adder.carry false (j + 1) a.testBit b.testBit
        ∧ f (read_idx j)  = xor (a.testBit j) (Adder.carry false j a.testBit b.testBit)
        ∧ f (target_idx j) = xor (a.testBit j) (b.testBit j)) →
      ∀ i, i ≤ m →
        Gate.applyNat (gidney_adder_forward_with_propagation_reverse_patched (m + 1)) f
          (carry_idx i) = false

*Arbitrary-`m` propagation-cascade carry-clearance.** Under the post-forward-final-CX invariant at positions `0..m`, the patched propagation cascade `gidney_adder_forward_with_propagation_reverse_patched (m+1)` makes every `carry_idx i` (for `i ≤ m`) `false`. Proof: induction on `m`. Base case is the first-reverse step (using the minimal-hypothesis version). Inductive step uses `patched_interior_reverse_clears_carry_under_invariant` for the high-bit case, `propagation_reverse_patched_preserves_carry_above` to preserve the high carry across the rest of the cascade, and the inductive hypothesis for lower bits — with `patched_interior_reverse_preserves_outside` showing the invariant survives the interior step.

theorempatched_full_reverse_cascade_clears_carries

theorem patched_full_reverse_cascade_clears_carries
    (n a b : Nat) (f : Nat → Bool)
    (h_inv : ∀ j, j ≤ n + 1 →
      f (carry_idx j)   = Adder.carry false (j + 1) a.testBit b.testBit
      ∧ f (read_idx j)  = xor (a.testBit j) (Adder.carry false j a.testBit b.testBit)
      ∧ f (target_idx j) = xor (a.testBit j) (b.testBit j)) :
    ∀ i, i ≤ n + 1 →
      Gate.applyNat (gidney_adder_forward_faithful_full_reverse_patched (n + 2)) f
        (carry_idx i) = false

*Arbitrary-`n` full-reverse-cascade carry-clearance.** Under the post-forward-final-CX invariant at positions `0..n+1`, the patched full reverse cascade `gidney_adder_forward_faithful_full_reverse_patched (n+2)` makes every `carry_idx i` (for `i ≤ n+1`) `false`.

theoremgidney_adder_full_faithful_no_measurement_patched_clears_carries

theorem gidney_adder_full_faithful_no_measurement_patched_clears_carries
    (n a b : Nat) (ha : a < 2^(n + 2)) (hb : b < 2^(n + 2)) :
    ∀ i, i ≤ n + 1 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched (n + 2))
        (adder_input_F (n + 2) a b) (carry_idx i) = false

*Arbitrary-`n` patched-adder carry-clearance on `adder_input_F`.** The patched full faithful no-measurement Gidney adder, applied to the standard two-operand input `adder_input_F (n+2) a b`, leaves every carry position `carry_idx i` (for `i ≤ n+1`) cleared to `false`. Proof: combine the Tick C wrappers (forward + final_cx applyNat identities), the existing `Gidney.post_forward_final_cx_invariant_holds` (Iter 188 + Iter 189), and the new `patched_full_reverse_cascade_clears_carries` cascade theorem above.

theorempatched_first_reverse_eq_unpatched_at_non_c0

theorem patched_first_reverse_eq_unpatched_at_non_c0
    (f : Nat → Bool) (k : Nat) (h_k : k ≠ carry_idx 0) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse_patched f k
      = Gate.applyNat gidney_adder_bit_step_faithful_first_reverse f k

theorempatched_interior_reverse_eq_unpatched_at_non_ci

theorem patched_interior_reverse_eq_unpatched_at_non_ci
    (i : Nat) (f : Nat → Bool) (k : Nat) (h_k : k ≠ carry_idx i) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse_patched i) f k
      = Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse i) f k

theorempatched_last_reverse_eq_unpatched_at_non_ci

theorem patched_last_reverse_eq_unpatched_at_non_ci
    (i : Nat) (f : Nat → Bool) (k : Nat) (h_k : k ≠ carry_idx i) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse_patched i) f k
      = Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse i) f k

theoremunpatched_interior_reverse_preserves_outside

theorem unpatched_interior_reverse_preserves_outside
    (i : Nat) (f : Nat → Bool) (k : Nat)
    (h_k_c   : k ≠ carry_idx i)
    (h_k_ri1 : k ≠ read_idx (i + 1))
    (h_k_ti1 : k ≠ target_idx (i + 1)) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse i) f k = f k

theoremunpatched_first_reverse_preserves_outside

theorem unpatched_first_reverse_preserves_outside
    (f : Nat → Bool) (k : Nat)
    (h_k_c0 : k ≠ carry_idx 0) (h_k_r1 : k ≠ read_idx 1) (h_k_t1 : k ≠ target_idx 1) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse f k = f k

theoremunpatched_last_reverse_preserves_non_carry

theorem unpatched_last_reverse_preserves_non_carry
    (i : Nat) (f : Nat → Bool) (k : Nat) (h_k : k ≠ carry_idx i) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse i) f k = f k

theoremupdate_update_comm

theorem update_update_comm (f : Nat → Bool) (a b : Nat) (u w : Bool) (h : a ≠ b) :
    update (update f a u) b w = update (update f b w) a u

Two `update`s at different positions commute.

theoremapplyNat_CX_commute_update_disjoint

theorem applyNat_CX_commute_update_disjoint
    (c t : Nat) (f : Nat → Bool) (p : Nat) (v : Bool)
    (h_p_c : p ≠ c) (h_p_t : p ≠ t) :
    Gate.applyNat (Gate.CX c t) (update f p v)
      = update (Gate.applyNat (Gate.CX c t) f) p v

`applyNat (CX c t)` commutes with `update _ p v` when `p` is disjoint from both `c` and `t`.

theoremapplyNat_CCX_commute_update_disjoint

theorem applyNat_CCX_commute_update_disjoint
    (a b c : Nat) (f : Nat → Bool) (p : Nat) (v : Bool)
    (h_p_a : p ≠ a) (h_p_b : p ≠ b) (h_p_c : p ≠ c) :
    Gate.applyNat (Gate.CCX a b c) (update f p v)
      = update (Gate.applyNat (Gate.CCX a b c) f) p v

`applyNat (CCX a b c)` commutes with `update _ p v` when `p` is disjoint from `a`, `b`, and `c`.

theoremapplyNat_seq_commute_update

theorem applyNat_seq_commute_update
    (g₁ g₂ : Gate) (f : Nat → Bool) (p : Nat) (v : Bool)
    (h₁ : ∀ f', Gate.applyNat g₁ (update f' p v) = update (Gate.applyNat g₁ f') p v)
    (h₂ : ∀ f', Gate.applyNat g₂ (update f' p v) = update (Gate.applyNat g₂ f') p v) :
    Gate.applyNat (Gate.seq g₁ g₂) (update f p v)
      = update (Gate.applyNat (Gate.seq g₁ g₂) f) p v

Sequential composition of gates commutes with `update _ p v` when each constituent gate does.

theoremunpatched_first_reverse_commute_update_at_c_above

theorem unpatched_first_reverse_commute_update_at_c_above
    (f : Nat → Bool) (j : Nat) (hj : j > 0) (v : Bool) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse (update f (carry_idx j) v)
      = update (Gate.applyNat gidney_adder_bit_step_faithful_first_reverse f) (carry_idx j) v

Unpatched first-reverse step commutes with update at `c[j]` (`j ≥ 1`).

theoremunpatched_interior_reverse_commute_update_at_c_above

theorem unpatched_interior_reverse_commute_update_at_c_above
    (i : Nat) (hi : 0 < i) (f : Nat → Bool) (j : Nat) (hj : j > i) (v : Bool) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse i)
      (update f (carry_idx j) v)
      = update (Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse i) f)
          (carry_idx j) v

Unpatched interior-reverse step commutes with update at `c[j]` (`j > i`).

theoremunpatched_last_reverse_commute_update_at_c_above

theorem unpatched_last_reverse_commute_update_at_c_above
    (i : Nat) (hi : 0 < i) (f : Nat → Bool) (j : Nat) (hj : j > i) (v : Bool) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse i) (update f (carry_idx j) v)
      = update (Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse i) f) (carry_idx j) v

Unpatched last-reverse step commutes with update at `c[j]` (`j > i`).

theoremunpatched_propagation_reverse_commute_update_at_c_above

theorem unpatched_propagation_reverse_commute_update_at_c_above (m : Nat) :
    ∀ (g : Nat → Bool) (v : Bool) (j : Nat), j > m →
      Gate.applyNat (gidney_adder_forward_with_propagation_reverse (m + 1))
        (update g (carry_idx j) v)
        = update (Gate.applyNat (gidney_adder_forward_with_propagation_reverse (m + 1)) g)
            (carry_idx j) v

Unpatched propagation cascade commutes with update at `c[j]` (`j > m`).

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.PatchedCarryLemmas

FormalRV/Arithmetic/RippleCarryAdder/PropagationReverse/PatchedCarryLemmas.lean

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.PatchedCarryLemmas Part 3/4: parametric per-step patched carry-clearance lemmas (boolean identity, last/interior/first reverse clears carry under invariant) and the frame lemmas for the patched interior/first reverse steps. Builds on `ApplyNatBridge`.

theorempatched_carry_bool_identity

theorem patched_carry_bool_identity (A B C : Bool) :
    xor (xor (xor (xor (xor (A && B) (B && C)) (A && C)) C)
              ((xor A C) && (xor A B)))
        (xor A C)
      = false

*Boolean identity at the heart of the patch.** Given the carry recurrence `MAJ(A, B, C) = (A∧B) ⊕ (B∧C) ⊕ (A∧C)`, the patched reverse step's effect on `c[i]` reduces to `MAJ ⊕ C ⊕ ((A⊕C) ∧ (A⊕B)) ⊕ (A⊕C)`, which is identically `false` for all Booleans `A`, `B`, `C`. The role of each term in the patched step: `MAJ(A, B, C)` — invariant value of `c[i]` (the post-forward carry). `C` — invariant value of `c[i-1]` (chained out by `CX(c[i-1], c[i])`). `(A⊕C) ∧ (A⊕B)` — `r[i] ∧ t[i]` after final-CX, written into c[i] by the reverse CCX. `A⊕C` — `r[i]` after final-CX, written into c[i] by the patch's CX.

theorempatched_last_reverse_clears_carry_under_invariant

theorem patched_last_reverse_clears_carry_under_invariant
    (i : Nat) (a b : Nat) (f : Nat → Bool)
    (h_c   : f (carry_idx i)       = Adder.carry false (i + 1) a.testBit b.testBit)
    (h_cm1 : f (carry_idx (i - 1)) = Adder.carry false i       a.testBit b.testBit)
    (h_r   : f (read_idx i)        = xor (a.testBit i) (Adder.carry false i a.testBit b.testBit))
    (h_t   : f (target_idx i)      = xor (a.testBit i) (b.testBit i)) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse_patched i) f
        (carry_idx i) = false

*Patched last-reverse step clears `carry_idx i`** for `i ≥ 1`, under the post-forward-final-CX invariant at position `i`.

theorempatched_last_reverse_preserves_non_carry

theorem patched_last_reverse_preserves_non_carry
    (i : Nat) (f : Nat → Bool) (k : Nat) (h_k : k ≠ carry_idx i) :
    Gate.applyNat (gidney_adder_bit_step_faithful_last_reverse_patched i) f k
      = f k

*Patched last-reverse step preserves every position outside `carry_idx i`** (frame condition).

theorempatched_interior_reverse_clears_carry_under_invariant

theorem patched_interior_reverse_clears_carry_under_invariant
    (i : Nat) (a b : Nat) (f : Nat → Bool)
    (h_c   : f (carry_idx i)       = Adder.carry false (i + 1) a.testBit b.testBit)
    (h_cm1 : f (carry_idx (i - 1)) = Adder.carry false i       a.testBit b.testBit)
    (h_r   : f (read_idx i)        = xor (a.testBit i) (Adder.carry false i a.testBit b.testBit))
    (h_t   : f (target_idx i)      = xor (a.testBit i) (b.testBit i)) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse_patched i) f
        (carry_idx i) = false

*Patched interior-reverse step clears `carry_idx i`** for `i ≥ 1`, under the post-forward-final-CX invariant at position `i`.

theoremfirst_reverse_post_state_preserves_read_0

theorem first_reverse_post_state_preserves_read_0 (f : Nat → Bool) :
    (gidney_first_bit_reverse_post_state f) (read_idx 0) = f (read_idx 0)

Frame helper: `gidney_first_bit_reverse_post_state` doesn't touch `read_idx 0`.

theorempatched_first_reverse_clears_carry_under_invariant

theorem patched_first_reverse_clears_carry_under_invariant
    (a b : Nat) (f : Nat → Bool)
    (h_r0 : f (read_idx 0)   = a.testBit 0)
    (h_t0 : f (target_idx 0) = xor (a.testBit 0) (b.testBit 0))
    (h_c0 : f (carry_idx 0)  = Adder.carry false 1 a.testBit b.testBit)
    (h_r1 : f (read_idx 1)   = xor (a.testBit 1) (Adder.carry false 1 a.testBit b.testBit))
    (h_t1 : f (target_idx 1) = xor (a.testBit 1) (b.testBit 1)) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse_patched f
        (carry_idx 0) = false

*Patched first-reverse step clears `carry_idx 0`** under the post-forward-final-CX invariant at position 0. The proof uses the existing `gidney_first_bit_reverse_preserves` (Iter 194) which states that the unpatched first-reverse step produces `post(c_0) = a.testBit 0`; the patch's `CX(read_idx 0, carry_idx 0)` then XORs this with `f (read_idx 0) = a.testBit 0`, yielding `false`.

theorempatched_interior_reverse_preserves_outside

theorem patched_interior_reverse_preserves_outside
    (i : Nat) (f : Nat → Bool) (k : Nat)
    (h_k_c   : k ≠ carry_idx i)
    (h_k_ri1 : k ≠ read_idx (i + 1))
    (h_k_ti1 : k ≠ target_idx (i + 1)) :
    Gate.applyNat (gidney_adder_bit_step_faithful_interior_reverse_patched i) f k = f k

theorempatched_first_reverse_preserves_outside

theorem patched_first_reverse_preserves_outside
    (f : Nat → Bool) (k : Nat)
    (h_k_c0 : k ≠ carry_idx 0)
    (h_k_r1 : k ≠ read_idx 1)
    (h_k_t1 : k ≠ target_idx 1) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse_patched f k = f k

theorempropagation_reverse_patched_preserves_carry_above

theorem propagation_reverse_patched_preserves_carry_above (m : Nat) :
    ∀ (f : Nat → Bool) (j : Nat), j > m →
      Gate.applyNat (gidney_adder_forward_with_propagation_reverse_patched (m + 1)) f
        (carry_idx j) = f (carry_idx j)

Frame for the propagation cascade: `gidney_adder_forward_with_propagation_reverse_patched (m+1)` preserves every `carry_idx j` for `j > m`. Proved by induction on `m` using the per-step frame lemmas above.

theorempatched_first_reverse_clears_carry_minimal

theorem patched_first_reverse_clears_carry_minimal
    (a b : Nat) (f : Nat → Bool)
    (h_r0 : f (read_idx 0)   = a.testBit 0)
    (h_t0 : f (target_idx 0) = xor (a.testBit 0) (b.testBit 0))
    (h_c0 : f (carry_idx 0)  = Adder.carry false 1 a.testBit b.testBit) :
    Gate.applyNat gidney_adder_bit_step_faithful_first_reverse_patched f
        (carry_idx 0) = false

Minimal-hypothesis version of the patched first-reverse step's carry-clearance (drops the `h_r1`, `h_t1` hypotheses that the earlier proof used via `gidney_first_bit_reverse_preserves`). This is the form needed by the cascade-level induction. Proved directly by structural unfolding + the boundary case `Adder.carry false 1 = MAJ(a_0, b_0, false) = a_0 ∧ b_0`.

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.SemanticCorrectness

FormalRV/Arithmetic/RippleCarryAdder/PropagationReverse/SemanticCorrectness.lean

FormalRV.Arithmetic.RippleCarryAdder.PropagationReverse.SemanticCorrectness Part 1/4: the reverse-cascade semantic-correctness assembly — the K-inductive read/target reductions, the j=0/j=1/j>=2 cases, the headline `gidney_classical_action_with_reverse`, `Gidney.post_full_reverse_invariant_holds`, and the RSA-2048 T-count examples. (One of the two headlines; the patched carry-clearance backbone is `CarryClearanceBackbone`.)

theoremgidney_propagation_reverse_at_read_eq_interior_reverse

theorem gidney_propagation_reverse_at_read_eq_interior_reverse
    (K j : Nat) (hj : 1 < j) (hjK : j ≤ K) (g : Nat → Bool) :
    gidney_propagation_reverse_post_state K g (read_idx j)
    = gidney_interior_bit_reverse_post_state (j - 1) g (read_idx j)

*Propagation reverse at read_j reduces to interior_reverse(j-1)** (2026-05-14 tick, read-side analog of line ~5488 target version). For j ∈ [2, K], propagation_reverse(K) g (read_idx j) equals interior_reverse(j-1) g (read_idx j). Same induction-on-K + case-split structure as target version, with read_idx in place of target_idx and using the read-side preserves/dependence helpers (`_preserves_read_above`, `_at_read_low_dependence`).

theoremgidney_classical_action_with_reverse_target_geq_2

theorem gidney_classical_action_with_reverse_target_geq_2
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n)
    (j : Nat) (hj : 2 ≤ j) (hjn : j < n) :
    gidney_full_reverse_post_state n
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
      (target_idx j)
    = adder_sum_bit_classical a b j

*Headline j ≥ 2 case** (Iter 208 STATED, sorried). For j ∈ [2, n-1], target_idx j after full forward+CX+reverse equals sum_j. The relevant per-step is `interior_reverse(j-1)` which fires at cascade step (n-j+1). Proof structure (pending): - "High-position frame": earlier reverses (last_reverse(n-1) + interior_reverse(n-2), ..., interior_reverse(j)) all modify positions ≥ 3j+2 (= c_j minimum). They preserve interior_reverse(j-1)'s input positions ≤ 3j+1. - Apply Iter 201's `gidney_interior_bit_reverse_computes_sum` with hypotheses verified from post-CX (Iter 189). - "Low-position frame": later reverses (interior_reverse(j-2), ..., first_reverse) all modify positions ≤ 3j-2. They preserve target_idx j = 3j+1. - Conclude full_reverse n f (target_idx j) = sum_j. Estimated 60-100 lines for the structural framing. The per-step computes_sum + frame conditions are mechanical mirror of the forward cascade pipeline (Iter 175-181).

theoremgidney_first_bit_reverse_preserves_read_0

theorem gidney_first_bit_reverse_preserves_read_0 (f : Nat → Bool) :
    gidney_first_bit_reverse_post_state f (read_idx 0) = f (read_idx 0)

*First-bit reverse preserves read_0** (2026-05-14 tick). Mirror of `_preserves_target_0` at line 4933. first_bit_reverse modifies {target_1, read_1, carry_0} = {4, 3, 2}; read_idx 0 = 0 ≠ any.

theoremgidney_classical_action_with_reverse_read_0

theorem gidney_classical_action_with_reverse_read_0
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    gidney_full_reverse_post_state n
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
      (read_idx 0)
    = a.testBit 0

*Headline j=0 read case PROVEN parametrically over n** (2026-05-14 tick, read-side analog of `_with_reverse_target_0` at line 5296). Uses `gidney_full_reverse_eq_first_rev_low` (since read_idx 0 = 0 < 5) to reduce to first_bit_reverse, then the just-proven `_first_bit_reverse_preserves_read_0` frame, then the `post_forward_final_cx_invariant` at j=0 simplification `xor a_0 (Adder.carry false 0 a b) = xor a_0 false = a_0`.

theoremgidney_classical_action_with_reverse_read_1

theorem gidney_classical_action_with_reverse_read_1
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    gidney_full_reverse_post_state n
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
      (read_idx 1)
    = a.testBit 1

*Headline j=1 read case PROVEN parametrically over n** (2026-05-14 tick, read-side analog of `_with_reverse_target_1` at line 5317). Uses `gidney_full_reverse_eq_first_rev_low` (read_idx 1 = 3 < 5) to reduce to first_bit_reverse, then Iter 194's `.2.1` directly gives `first_bit_reverse f (read_idx 1) = a.testBit 1`.

theoremgidney_classical_action_with_reverse_read_geq_2

theorem gidney_classical_action_with_reverse_read_geq_2
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n)
    (j : Nat) (hj : 2 ≤ j) (hjn : j < n) :
    gidney_full_reverse_post_state n
      (gidney_final_cx_cascade_post_state n
        (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
      (read_idx j)
    = a.testBit j

*Read-side analog of `_with_reverse_target_geq_2`** (2026-05-14 tick). For j ∈ [2, n-1], the read_j position after the full forward+CX+reverse cascade equals `a.testBit j`. Same proof structure as the target version, using the read-side parametric `_at_read_eq_interior_reverse` and the read component (`.2.1`) of Iter 195's `_post_state_in_bits`, with XOR cancellation `xor (xor a_j c_j) c_j = a_j`.

theoremgidney_classical_action_with_reverse_assembled

theorem gidney_classical_action_with_reverse_assembled
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    ∀ i, i < n →
      gidney_full_reverse_post_state n
        (gidney_final_cx_cascade_post_state n
          (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
        (target_idx i)
      = adder_sum_bit_classical a b i

*HEADLINE: TODO_gidney_classical_action_with_reverse PROVEN** (Iter 208 ASSEMBLY, modulo Iter 208's j ≥ 2 sorry). Combines: - Iter 202: j=0 case PARAMETRIC. - Iter 207: j=1 case PARAMETRIC over n. - Iter 208: TODO_..._target_geq_2 for j ∈ [2, n-1] (sorried).

theoremgidney_classical_action_with_reverse

theorem gidney_classical_action_with_reverse (n a b : Nat)
    (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    ∀ i, i < n →
      gidney_full_reverse_post_state n
        (gidney_final_cx_cascade_post_state n
          (gidney_forward_faithful_full_post_state n (adder_input_F n a b)))
        (target_idx i)
      = adder_sum_bit_classical a b i

*HEADLINE — Iter 191's restated headline, NOW PROVEN (Iter 213, 2026-05-13)**. The parametric semantic-correctness theorem with the REVERSE cascade. The Gidney ripple-carry adder is now Verified per CLAUDE.md taxonomy. Note: this theorem statement was originally drafted at line ~4605 as `TODO_gidney_classical_action_with_reverse` (sorried, Iter 191). Iter 213 derives it via `gidney_classical_action_with_reverse_assembled`.

theoremGidney.reverse_step_invariant_n_minus_1_after_propagation_reverse

theorem Gidney.reverse_step_invariant_n_minus_1_after_propagation_reverse
    (n a b : Nat) (hn : 1 < n) (_ha : a < 2^n) (_hb : b < 2^n)
    (input : Nat → Bool)
    (h_input : Gidney.post_forward_final_cx_invariant n a b input)
    (_h_t0 : input (target_idx 0) = adder_sum_bit_classical a b 0) :
    Gidney.reverse_step_invariant (n - 1) n a b
      (gidney_propagation_reverse_post_state (n - 1) input)

*Direct (non-K-inductive) cascade target** (2026-05-14 tick). For register width `n ≥ 2`, the parametric `propagation_reverse(n-1)` applied to the post-final-CX state produces a state satisfying `Gidney.reverse_step_invariant (n - 1) n a b _`. *Proof structure**: case-split on `j` in the predicate quantifier: - `j = 1`: use `gidney_propagation_reverse_eq_first_rev_low` to reduce propagation_reverse(n-1) at target_idx 1 / read_idx 1 to first_bit_reverse, then Iter 194's `gidney_first_bit_reverse_preserves` closes both, with the target side using `sumfb_eq_testBit_add` for the XOR identity. - `1 < j ≤ n - 1`: TODO_case_j_gt_1 — use `gidney_propagation_reverse_at_target_eq_interior_reverse` to reduce to interior_reverse(j-1), then Iter 201.

theoremGidney.post_full_reverse_invariant_holds

theorem Gidney.post_full_reverse_invariant_holds
    (n a b : Nat) (hn : 1 < n) (ha : a < 2^n) (hb : b < 2^n) :
    Gidney.post_full_reverse_invariant n a b
      (gidney_full_reverse_post_state n
        (gidney_final_cx_cascade_post_state n
          (gidney_forward_faithful_full_post_state n (adder_input_F n a b))))

*Closing composition** (2026-05-14 tick). For every n ≥ 2 and valid a, b inputs, the full forward + final-CX + reverse cascade state satisfies `Gidney.post_full_reverse_invariant`: every target_j equals sum_j AND every read_j equals a.testBit j. Target side: closed via the existing `gidney_classical_action_with_reverse` (Iter 213 assembly). Read side: TODO_read_via_direct — bridge from the new `_n_minus_1_after_propagation_reverse` (which proves the read side for the SIMPLER input `propagation_reverse(n-1) f` without the outer last_reverse layer) to the actual cascade `propagation_reverse(n-1) (last_reverse(n-1) f)`. The bridge requires showing that propagation_reverse is c_{n-1}-independent on read positions (since last_reverse modifies only c_{n-1}). ~30 lines of frame argument, deferred to next tick.

example(example)

example :
    Gidney.post_full_reverse_invariant 2 1 1
      (gidney_full_reverse_post_state 2
        (gidney_final_cx_cascade_post_state 2
          (gidney_forward_faithful_full_post_state 2 (adder_input_F 2 1 1))))

*Milestone validation** (2026-05-14 tick): the proven theorem fires correctly on the Iter 182 counterexample case (n=2, a=1, b=1) — the same instance where the original `TODO_gidney_classical_action` was found to be UNPROVABLE as stated. Confirms semantic-correctness closure at the smallest non-trivial input. Review hygiene (via `mcp__lean-lsp__lean_verify`, 2026-05-14): `Gidney.post_full_reverse_invariant_holds` depends only on `propext` and `Quot.sound` — Lean's standard foundational axioms. No custom axioms. See `notes/axiom-hygiene.md`.

example(example)

example : tcount (gidney_adder_full_faithful_no_measurement 33) = 462

*RSA-2048 adder T-count = 462** (Iter 262). For the maximum adder size in the RSA-2048 Shor's circuit (q_A = 33, qianxu p. 22), `tcount (gidney_adder_full_faithful_no_measurement 33) = 14·33 = 462`. Per qianxu Eq. E3: τ_adder = 25 q_A τ_s = 825 τ_s. The 462 T-gates is the underlying T-count from which the per-Toffoli cost (here 14n / q_A = 14) becomes a verified-correctness building block.

example(example)

example :
    tcount (gidney_adder_full_faithful_no_measurement
              qianxu_q_A_RSA2048)
      = gidney_adder_RSA2048_T_count_verified

*Bridge: verified parametric T-count matches the RSA-2048 paper-claim anchor** (Iter 263). Closes the review's paper-claim-first discipline (CLAUDE.md): the gate-faithful adder's T-count at q_A=33 matches the `gidney_adder_RSA2048_T_count_verified` paper-claim constant in `PaperClaims.lean`.

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderClassicalBridge

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderClassicalBridge.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderClassicalBridge Re-export umbrella for the `ClassicalBridge/` sub-folder, split by sub-topic: UnfoldAndCarry → SumfbTestBit → InputEval → PropagationInvariantBackbone. Kept at this path so external importers resolve unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderCorrectness

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderCorrectness.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderCorrectness ──────────────────────────────────────────────────────────────── THE semantic-correctness theorems for the Gidney ripple-carry adder. Imports the definition from `RippleCarryAdderDef.lean`. The theorems to audit are surfaced here as thin wrappers; their (heavy) proofs live in the supporting files and are delegated to in one line each. Two circuits, two headlines: • `gidney_adder_correct` — the base adder `gidney_adder` writes the sum bits to the target register (read register preserved; **carries are left dirty** — see `RippleCarryAdderPropagationReverse`). • `gidney_adder_correct_full` — the carry-clearing **patched** adder is simultaneously WellTyped, decodes the target to `(a+b) mod 2^bits`, preserves the read register, AND clears the carry register. This is the bundle the modular-adder layer builds on (`gidney_adder_patched_primitive`). Where to look next: • Resources (T-count / qubits / RSA-2048) : `RippleCarryAdderResource.lean` • Worked example + OpenQASM : `RippleCarryAdderExample.lean` • Supporting proofs : `RippleCarryAdderPropagationReverse.lean` (assembled target/read correctness, applyNat bridge, patched carry-clearing), `RippleCarryAdderUncomputeCascade.lean` (packaged primitive + WellTyped), `RippleCarryAdderDecideWitnesses.lean`, `RippleCarryAdderClassicalBridge.lean`.

theoremgidney_adder_correct

theorem gidney_adder_correct (n a b : Nat)
    (hn : 1 < n) (ha : a < 2 ^ n) (hb : b < 2 ^ n) :
    ∀ i, i < n →
      Gate.applyNat (gidney_adder n) (adder_input_F n a b) (target_idx i)
        = adder_sum_bit_classical a b i

*Gidney adder — semantic correctness (THE headline).** Running `gidney_adder n` (the canonical no-measurement faithful adder) on the standard input encoding `adder_input_F n a b` (read register = `a`, target register = `b`, carries 0, with `a, b < 2^n` and `1 < n`) leaves the **target register holding the sum bits** `(a + b).testBit i` for every `i < n`. The read register is restored to `a` (`gidney_adder_read_preserved`). The carry register is **left dirty** in this base adder — use the patched adder (`gidney_adder_correct_full`) when clean carries are required.

theoremgidney_adder_read_preserved

theorem gidney_adder_read_preserved (n a b : Nat)
    (hn : 1 < n) (ha : a < 2 ^ n) (hb : b < 2 ^ n) :
    ∀ i, i < n →
      Gate.applyNat (gidney_adder n) (adder_input_F n a b) (read_idx i)
        = a.testBit i

*Gidney adder — read register preserved.** `gidney_adder n` leaves the read register holding `a` (bit `i` = `a.testBit i`) for every `i < n`.

theoremgidney_adder_correct_full

theorem gidney_adder_correct_full (bits a b : Nat)
    (hbits : 2 ≤ bits) (ha : a < 2 ^ bits) (hb : b < 2 ^ bits) :
    Gate.WellTyped (adder_n_qubits bits)
        (gidney_adder_full_faithful_no_measurement_patched bits)
    ∧ gidney_target_val bits
          (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
            (adder_input_F bits a b))
        = (a + b) % 2 ^ bits
    ∧ (∀ i, i < bits →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits a b) (read_idx i) = a.testBit i)
    ∧ (∀ i, i < bits →

*Gidney adder — full correctness bundle (carry-clean).** The patched adder `gidney_adder_full_faithful_no_measurement_patched bits` is, for `bits ≥ 2` and `a, b < 2^bits`, simultaneously: 1. **WellTyped** on the `adder_n_qubits bits = 3·bits + 2` qubit budget; 2. decodes the target register to `(a + b) mod 2^bits`; 3. preserves the read register (`= a`); 4. **clears the carry register** (every carry qubit back to `0`). This is the bundled primitive the modular-adder layer calls; it is exactly `gidney_adder_patched_primitive`.

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderCostSkeleton

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderCostSkeleton.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderCostSkeleton ───────────────────────────────────────────────────────────────── ⚠️ A deliberately-WRONG "cost-only skeleton" Gidney adder family. *Definitions only — no proofs.** This family has the right Toffoli / T-count but the WRONG logical action: for `i > 0` the per-bit step omits the carry-propagation CXs, so it does not compute Gidney's carry. It is kept ONLY for T-count accounting — its cost provably equals the correct faithful adder's (`gidney_cost_skeleton_eq_faithful` in `RippleCarryAdderForwardAndCost.lean`), and the factor-of-2 no-measurement vs. measurement gap is stated against it (`gidney_no_measurement_vs_measurement_gap`). For the semantically-correct adder use `gidney_adder` / `gidney_adder_full_faithful_no_measurement` in `RippleCarryAdderDef.lean`. Do NOT build on anything in this file.

defgidney_adder_bit_step

def gidney_adder_bit_step (i : Nat) : Gate

⚠️ COST-ONLY SKELETON per-bit step: right Toffoli count, wrong carry (omits the propagation CXs). Correct version: `gidney_adder_bit_step_faithful_*`.

defgidney_adder_bit_step_reverse

def gidney_adder_bit_step_reverse (i : Nat) : Gate

⚠️ COST-ONLY SKELETON gate-reverse of `gidney_adder_bit_step` (`CX; CCX` at `i > 0`); the per-bit inverse used by the proper-reverse cascade.

defgidney_adder_forward

def gidney_adder_forward : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq (gidney_adder_forward n) (gidney_adder_bit_step n)

⚠️ COST-ONLY SKELETON forward pass. Use `gidney_adder_forward_faithful_full`.

defgidney_adder_uncompute

def gidney_adder_uncompute : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq (gidney_adder_bit_step n) (gidney_adder_uncompute n)

⚠️ COST-ONLY SKELETON reverse pass: forward bit-steps in reverse bit order (right `7n` Toffoli count, but not a true gate-level inverse of the forward).

defgidney_adder_uncompute_proper

def gidney_adder_uncompute_proper : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq (gidney_adder_bit_step_reverse n)
                        (gidney_adder_uncompute_proper n)

⚠️ COST-ONLY SKELETON proper reverse cascade: the true gate-by-gate inverse of `gidney_adder_forward`, built from `gidney_adder_bit_step_reverse`.

defgidney_adder_full

def gidney_adder_full (n : Nat) : Gate

⚠️ COST-ONLY SKELETON full adder (forward + reverse + final CX). Its T-count `14n` is valid (and is what the measurement-gap theorem uses); it is NOT semantically correct.

defgidney_adder_bit_with_measurement_uncompute_tcount

def gidney_adder_bit_with_measurement_uncompute_tcount : Nat

Per-bit Gidney adder T-count WITH measurement-based uncomputation = 7.

defgidney_adder_full_with_measurement_uncompute_tcount

def gidney_adder_full_with_measurement_uncompute_tcount (n : Nat) : Nat

n-bit Gidney adder T-count with measurement-based uncomputation: `7n` (one Gidney-AND cycle per bit; matches qianxu Eq. E3).

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderDecideWitnesses

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderDecideWitnesses.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderDecideWitnesses Re-export umbrella for the `DecideWitnesses/` sub-folder, split by sub-topic: ForwardInvariant → FinalCXLayer → ReverseStepScaffold → ReverseFramesAndHeadline. Kept at this path so external importers resolve unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderDef

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderDef.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderDef ─────────────────────────────────────────────────────── THE definition of the **Gidney** ripple-carry adder, as concrete `Gate`-IR data. **Just the circuit — no specs, no proofs, no post-states.** THE adder is `gidney_adder`: a forward faithful cascade, then a final-CX cascade (stamps the sum), then a reverse cascade, on `3*n + 2` qubits with the registers interleaved LSB-first: • read[i] = 3*i (the `a` register; preserved) • target[i] = 3*i + 1 (the `b` register; becomes bit i of (a+b)) • carry[i] = 3*i + 2 (carry chain; LEFT DIRTY by the base adder) `gidney_adder_full_faithful_no_measurement_patched` is the carry-clearing variant — same target/read action, but it also returns the carry register to 0. That patched adder is the one the modular-adder layer builds on. Where everything else lives (one file per job): • Classical spec / encoding / decoders : `RippleCarryAdderSpec.lean` • Basis-state post-states + invariants : `RippleCarryAdderPostStates.lean` • Cost-only skeleton (NOT this adder) : `RippleCarryAdderCostSkeleton.lean` • Correctness theorems : `RippleCarryAdderCorrectness.lean` • Resource theorems (T / qubits / RSA) : `RippleCarryAdderResource.lean` • Worked example + OpenQASM : `RippleCarryAdderExample.lean` Refs: Gidney, arXiv:1709.06648; Qrisp `qq_gidney_adder.py`.

defread_idx

def read_idx (i : Nat) : Nat

Qubit index for the i-th read bit.

deftarget_idx

def target_idx (i : Nat) : Nat

Qubit index for the i-th target bit.

defcarry_idx

def carry_idx (i : Nat) : Nat

Qubit index for the i-th carry bit.

defadder_n_qubits

def adder_n_qubits (n : Nat) : Nat

Total qubits for an n-bit adder: `3n + 2`.

defgidney_final_cx_cascade

def gidney_final_cx_cascade : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq (gidney_final_cx_cascade n)
                        (Gate.CX (read_idx n) (target_idx n))

Final-CX cascade — one `CX(read[i], target[i])` per bit.

defgidney_adder_bit_step_faithful_interior

def gidney_adder_bit_step_faithful_interior (i : Nat) : Gate

Faithful interior bit-step `i ≥ 1` (not last): CCX + chain-CX + 2 propagation CXs.

defgidney_adder_bit_step_faithful_interior_reverse

def gidney_adder_bit_step_faithful_interior_reverse (i : Nat) : Gate

Gate-reverse of `gidney_adder_bit_step_faithful_interior i`.

defgidney_adder_bit_step_faithful_first

def gidney_adder_bit_step_faithful_first : Gate

Faithful first bit-step `i = 0`: CCX + 2 propagation CXs (no chain CX).

defgidney_adder_bit_step_faithful_first_reverse

def gidney_adder_bit_step_faithful_first_reverse : Gate

Gate-reverse of `gidney_adder_bit_step_faithful_first`.

defgidney_adder_bit_step_faithful_last

def gidney_adder_bit_step_faithful_last (i : Nat) : Gate

Faithful last bit-step `i ≥ 1`: CCX + chain CX (no propagation).

defgidney_adder_bit_step_faithful_last_reverse

def gidney_adder_bit_step_faithful_last_reverse (i : Nat) : Gate

Gate-reverse of `gidney_adder_bit_step_faithful_last i`.

defgidney_adder_forward_faithful_interior

def gidney_adder_forward_faithful_interior : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq
                 (gidney_adder_forward_faithful_interior n)
                 (gidney_adder_bit_step_faithful_interior (n + 1))

All-interior cascade: `gidney_adder_bit_step_faithful_interior (k+1)` for `k = 0..n-1` (structural core; same `7n` T-count as the cost skeleton).

defgidney_adder_forward_with_propagation

def gidney_adder_forward_with_propagation : Nat → Gate
  | 0       => Gate.I
  | 1       => gidney_adder_bit_step_faithful_first
  | n + 2   => Gate.seq (gidney_adder_forward_with_propagation (n + 1))
                        (gidney_adder_bit_step_faithful_interior (n + 1))

Cascade of bits `0..n-1`, each WITH propagation (first ; interior ; …).

defgidney_adder_forward_faithful_full

def gidney_adder_forward_faithful_full : Nat → Gate
  | 0       => Gate.I
  | 1       => Gate.I
  | n + 2   => Gate.seq (gidney_adder_forward_with_propagation (n + 1))
                        (gidney_adder_bit_step_faithful_last (n + 1))

*Faithful full forward cascade** for an n-bit adder: bits `0..n-2` with propagation, then the last bit (no propagation). `Gate.I` for `n ≤ 1`.

defgidney_adder_forward_with_propagation_reverse

def gidney_adder_forward_with_propagation_reverse : Nat → Gate
  | 0       => Gate.I
  | 1       => gidney_adder_bit_step_faithful_first_reverse
  | n + 2   => Gate.seq (gidney_adder_bit_step_faithful_interior_reverse (n + 1))
                        (gidney_adder_forward_with_propagation_reverse (n + 1))

Reverse of `gidney_adder_forward_with_propagation`.

defgidney_adder_forward_faithful_full_reverse

def gidney_adder_forward_faithful_full_reverse : Nat → Gate
  | 0       => Gate.I
  | 1       => Gate.I
  | n + 2   => Gate.seq (gidney_adder_bit_step_faithful_last_reverse (n + 1))
                        (gidney_adder_forward_with_propagation_reverse (n + 1))

Reverse of `gidney_adder_forward_faithful_full`.

defgidney_adder_full_faithful_no_measurement

def gidney_adder_full_faithful_no_measurement : Nat → Gate
  | 0       => Gate.I
  | 1       => Gate.I
  | n + 2   => Gate.seq
                (Gate.seq (gidney_adder_forward_faithful_full (n + 2))
                          (gidney_final_cx_cascade (n + 2)))
                (gidney_adder_forward_faithful_full_reverse (n + 2))

*Full no-measurement faithful Gidney adder** (`n+2` bits): forward faithful cascade ; final-CX cascade ; faithful reverse cascade. Total T-count `14·(n+2)`. Edge cases `n ≤ 1` return `Gate.I`.

defgidney_adder

def gidney_adder (n : Nat) : Gate

*THE canonical, semantically-correct Gidney ripple-carry adder.** Alias for `gidney_adder_full_faithful_no_measurement`. This is the adder the Shor cost model binds to (`adderToff_eq`) and the canonical name downstream code uses.

defgidney_adder_bit_step_faithful_first_reverse_patched

def gidney_adder_bit_step_faithful_first_reverse_patched : Gate

Patched first-bit reverse step (clears `carry[0]`).

defgidney_adder_bit_step_faithful_interior_reverse_patched

def gidney_adder_bit_step_faithful_interior_reverse_patched (i : Nat) : Gate

Patched interior-bit reverse step (clears `carry[i]`).

defgidney_adder_bit_step_faithful_last_reverse_patched

def gidney_adder_bit_step_faithful_last_reverse_patched (i : Nat) : Gate

Patched last-bit reverse step (clears `carry[i]`).

defgidney_adder_forward_with_propagation_reverse_patched

def gidney_adder_forward_with_propagation_reverse_patched : Nat → Gate
  | 0       => Gate.I
  | 1       => gidney_adder_bit_step_faithful_first_reverse_patched
  | n + 2   =>
      Gate.seq (gidney_adder_bit_step_faithful_interior_reverse_patched (n + 1))
               (gidney_adder_forward_with_propagation_reverse_patched (n + 1))

Patched propagation reverse cascade.

defgidney_adder_forward_faithful_full_reverse_patched

def gidney_adder_forward_faithful_full_reverse_patched : Nat → Gate
  | 0       => Gate.I
  | 1       => Gate.I
  | n + 2   =>
      Gate.seq (gidney_adder_bit_step_faithful_last_reverse_patched (n + 1))
               (gidney_adder_forward_with_propagation_reverse_patched (n + 1))

Patched full reverse cascade.

defgidney_adder_full_faithful_no_measurement_patched

def gidney_adder_full_faithful_no_measurement_patched : Nat → Gate
  | 0       => Gate.I
  | 1       => Gate.I
  | n + 2   =>
      Gate.seq
        (Gate.seq (gidney_adder_forward_faithful_full (n + 2))
                  (gidney_final_cx_cascade (n + 2)))
        (gidney_adder_forward_faithful_full_reverse_patched (n + 2))

*Patched full faithful no-measurement Gidney adder**: forward ; final-CX ; *patched** reverse (which additionally clears the carry register).

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderExample

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderExample.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderExample ──────────────────────────────────────────────────────────── A worked example for the Gidney adder + its `Gadget` descriptor for the uniform QASM emitter. This file contains `#eval` demos, so it is kept OFF the default build path (not imported by the `Arithmetic` umbrella). Build / run on demand: lake build FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderExample

defgidney_adder_2bit

def gidney_adder_2bit : Gate

The 2-bit Gidney adder.

example(example)

example : tcount gidney_adder_2bit = 28

Its T-count is `14 · 2 = 28` (instance of `gidney_adder_tcount`).

defGidneyAdder

def GidneyAdder : Gadget

The Gidney adder as a uniform, emittable `Gadget` descriptor.

example(example)

example (n : Nat) : GidneyAdder.tcount (n + 2) = 14 * (n + 2)

The descriptor's structurally-computed T-count is *exactly* the proven closed form `14 · n` — for every `n ≥ 2`.

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderForwardAndCost

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderForwardAndCost.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderForwardAndCost Re-export umbrella for the `ForwardAndCost/` sub-folder, split by sub-topic: SkeletonCost → InteriorBit → FirstBit → LastBitAndSkeletonRev → FaithfulBackbone (the backbone holds the faithful forward correctness + reversibility + T-count headlines). Kept at this path so Shor / Resource / ClassicalBridge resolve unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderPostStates

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderPostStates.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderPostStates ─────────────────────────────────────────────────────────────── The basis-state "semantic shadow" of the Gidney adder: for each gate / cascade in `RippleCarryAdderDef.lean`, the `Nat → Bool` post-state function describing its classical action, plus the `Prop`-valued correctness invariants the proofs reason about. **Definitions only — no proofs.** This is internal PROOF VOCABULARY: the supporting files (`RippleCarryAdderForwardAndCost`, `RippleCarryAdderClassicalBridge`, `RippleCarryAdderDecideWitnesses`, `RippleCarryAdderPropagationReverse`, `RippleCarryAdderUncomputeCascade`) state their lemmas against these. The reader-facing headlines are in `RippleCarryAdderCorrectness.lean`.

defgidney_bit_step_faithful_post_state

def gidney_bit_step_faithful_post_state (i : Nat) (f : Nat → Bool) : Nat → Bool

Post-state of `gidney_adder_bit_step_faithful_interior i`: CCX writes the AND into `carry[i]`, chain-CX adds `carry[i-1]`, then 2 propagation CXs XOR `carry[i]` into `read[i+1]` / `target[i+1]`.

defgidney_first_bit_post_state

def gidney_first_bit_post_state (f : Nat → Bool) : Nat → Bool

Post-state of `gidney_adder_bit_step_faithful_first` (no chain CX).

defgidney_last_bit_post_state

def gidney_last_bit_post_state (i : Nat) (f : Nat → Bool) : Nat → Bool

Post-state of `gidney_adder_bit_step_faithful_last i` (no propagation).

defgidney_interior_bit_post_state

def gidney_interior_bit_post_state (i : Nat) (f : Nat → Bool) : Nat → Bool

Post-state of the interior 4-gate step at `i ≥ 1` (alias-shaped twin of `gidney_bit_step_faithful_post_state`, kept for the bridge lemma).

defgidney_first_bit_reverse_post_state

def gidney_first_bit_reverse_post_state (f : Nat → Bool) : Nat → Bool

Post-state of `gidney_adder_bit_step_faithful_first_reverse` (3 gates, gate-reversed).

defgidney_interior_bit_reverse_post_state

def gidney_interior_bit_reverse_post_state (i : Nat) (f : Nat → Bool) : Nat → Bool

Post-state of `gidney_adder_bit_step_faithful_interior_reverse i` (4 gates, gate-reversed).

defgidney_last_bit_reverse_post_state

def gidney_last_bit_reverse_post_state (i : Nat) (f : Nat → Bool) : Nat → Bool

Post-state of `gidney_adder_bit_step_faithful_last_reverse i` (2 gates, gate-reversed).

defgidney_cascade_post_state

def gidney_cascade_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0    , f => f
  | n + 1, f =>
      gidney_bit_step_faithful_post_state (n + 1)
        (gidney_cascade_post_state n f)

Fold of `gidney_bit_step_faithful_post_state` over bits `1..n` (matches `gidney_adder_forward_faithful_interior`).

defgidney_propagation_post_state

def gidney_propagation_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0    , f => f
  | 1    , f => gidney_first_bit_post_state f
  | n + 2, f =>
      gidney_bit_step_faithful_post_state (n + 1)
        (gidney_propagation_post_state (n + 1) f)

Post-state of `gidney_adder_forward_with_propagation n`.

defgidney_forward_faithful_full_post_state

def gidney_forward_faithful_full_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0    , f => f
  | 1    , f => f
  | n + 2, f =>
      gidney_last_bit_post_state (n + 1)
        (gidney_propagation_post_state (n + 1) f)

Post-state of `gidney_adder_forward_faithful_full` (propagation then last).

defgidney_final_cx_cascade_post_state

def gidney_final_cx_cascade_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0    , f => f
  | n + 1, f =>
      let f'

Post-state of `gidney_final_cx_cascade n`: XOR `read[i]` into `target[i]` for `i = 0..n-1`.

defgidney_propagation_reverse_post_state

def gidney_propagation_reverse_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0       , f => f
  | 1       , f => gidney_first_bit_reverse_post_state f
  | n + 2   , f =>
      gidney_propagation_reverse_post_state (n + 1)
        (gidney_interior_bit_reverse_post_state (n + 1) f)

Post-state of `gidney_adder_forward_with_propagation_reverse`.

defgidney_full_reverse_post_state

def gidney_full_reverse_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0       , f => f
  | 1       , f => f
  | n + 2   , f =>
      gidney_propagation_reverse_post_state (n + 1)
        (gidney_last_bit_reverse_post_state (n + 1) f)

Post-state of `gidney_adder_forward_faithful_full_reverse` (last-reverse then propagation-reverse).

structureBitDisjointness

structure BitDisjointness (dim i : Nat) : Prop

*Bit-disjointness hypothesis for bit `i`**: the 12 index-distinctness / in-range conditions needed for the per-bit interior correctness theorem.

defGidney.forward_cascade_post_invariant

def Gidney.forward_cascade_post_invariant
    (n a b : Nat) (post : Nat → Bool) : Prop

End-of-forward-cascade invariant: `read_i = a_i ⊕ c_i`, `target_i = b_i ⊕ c_i`, `carry_i = c_{i+1}`.

defGidney.propagation_step_invariant

def Gidney.propagation_step_invariant
    (k n a b : Nat) (post : Nat → Bool) : Prop

Step-indexed propagation invariant: after `k` steps, positions `< k` (carry) / `≤ k` (read, target) are propagated, the rest unchanged.

defGidney.post_last_bit_invariant

def Gidney.post_last_bit_invariant
    (n a b : Nat) (post : Nat → Bool) : Prop

End-state invariant after the forward cascade only (no final-CX): `carry_j = c_{j+1}`, `read_j = a_j ⊕ c_j`, `target_j = b_j ⊕ c_j`.

defGidney.post_forward_final_cx_invariant

def Gidney.post_forward_final_cx_invariant
    (n a b : Nat) (post : Nat → Bool) : Prop

End-state invariant after forward + final-CX: `target_j = a_j ⊕ b_j` (the `c_j` contributions cancel — the reverse cascade re-XORs them to finish the sum).

defGidney.post_full_reverse_invariant

def Gidney.post_full_reverse_invariant
    (n a b : Nat) (post : Nat → Bool) : Prop

Post-full-reverse invariant: `target_j = sum_j` and `read_j = a_j` (carries left dirty). The structural refinement of the headline correctness.

defGidney.reverse_step_invariant

def Gidney.reverse_step_invariant
    (k n a b : Nat) (post : Nat → Bool) : Prop

Step-indexed reverse-cascade invariant: after `k` reverse steps, positions `j ∈ [n-k, n-1]` are corrected (`target_j = sum_j`, `read_j = a_j`). At `k = n` this is `post_full_reverse_invariant`.

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderPropagationReverse

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderPropagationReverse.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderPropagationReverse Re-export umbrella for the `PropagationReverse/` sub-folder, split by sub-topic: SemanticCorrectness -> ApplyNatBridge -> PatchedCarryLemmas -> CarryClearanceBackbone. Kept at this path so PPM / Correctness / the package umbrella resolve unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderResource

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderResource.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderResource ───────────────────────────────────────────────────────────── THE resource theorems for the Gidney ripple-carry adder, surfaced as thin wrappers over the proofs in the supporting files, plus a resource-after-correctness bundle that states the resource about the SAME circuit the correctness theorem verifies. Headlines: • `gidney_adder_tcount` — T-count of the gate-explicit (no-measurement) n-bit adder = `14·n` (n forward + n reverse Toffolis, 7 T each; CX/final-CX are T-free). • `gidney_adder_tcount_vs_measurement` — that `14·n` is exactly **twice** the `7·n` measurement-uncomputation figure (qianxu Eq. E3). The factor-of-2 is the formally-surfaced no-measurement vs. measurement gap. • `gidney_adder_RSA2048_tcount` — at the RSA-2048 adder size `q_A = 33`, the T-count is `462`, matching the `gidney_adder_RSA2048_T_count_verified` paper-claim anchor. • `gidney_adder_patched_wellTyped` — the (carry-clean) patched adder is WellTyped on `adder_n_qubits bits = 3·bits + 2` qubits. • `gidney_adder_verified` — resource AFTER correctness: the one circuit is simultaneously sum-correct, read-preserving, and `14·n` T-gates. Where to look next: • Semantic correctness : `RippleCarryAdderCorrectness.lean` • Worked example + QASM : `RippleCarryAdderExample.lean` • Supporting proofs : `RippleCarryAdderForwardAndCost.lean` (T-counts + forward correctness/reversibility), `RippleCarryAdderClassicalBridge.lean`.

theoremgidney_adder_tcount

theorem gidney_adder_tcount (n : Nat) :
    tcount (gidney_adder (n + 2)) = 14 * (n + 2)

*Gidney adder — T-count (THE headline).** The full gate-explicit no-measurement n-bit adder uses exactly `14·n` T-gates (`n` forward + `n` reverse Toffolis at 7 T each; the final-CX cascade is T-free).

theoremgidney_adder_tcount_vs_measurement

theorem gidney_adder_tcount_vs_measurement (n : Nat) :
    tcount (gidney_adder (n + 2))
      = 2 * gidney_adder_full_with_measurement_uncompute_tcount (n + 2)

*Gidney adder — the measurement-uncomputation gap.** The gate-explicit `14·n` T-count is exactly **twice** the `7·n` figure achievable with Gidney's measurement-based uncomputation (qianxu Eq. E3). Surfacing this factor-of-2 is the honest statement of the optimization that is costed but not gate-level formalized.

theoremgidney_adder_RSA2048_tcount

theorem gidney_adder_RSA2048_tcount :
    tcount (gidney_adder qianxu_q_A_RSA2048) = gidney_adder_RSA2048_T_count_verified

*RSA-2048 adder T-count = 462.** At the maximum adder size in the RSA-2048 Shor circuit (`q_A = 33`, qianxu p. 22), the gate-explicit adder has T-count `14·33 = 462`, matching the verified paper-claim anchor `gidney_adder_RSA2048_T_count_verified`.

theoremgidney_adder_patched_wellTyped

theorem gidney_adder_patched_wellTyped (bits : Nat) (hbits : 2 ≤ bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_full_faithful_no_measurement_patched bits)

*Gidney adder — qubit budget.** The carry-clean patched adder is WellTyped on `adder_n_qubits bits = 3·bits + 2` qubits (`read`, `target`, `carry` interleaved, `read`/`target` carrying one extra overflow position).

theoremgidney_adder_verified

theorem gidney_adder_verified (n a b : Nat)
    (ha : a < 2 ^ (n + 2)) (hb : b < 2 ^ (n + 2)) :
    (∀ i, i < n + 2 →
        Gate.applyNat (gidney_adder (n + 2)) (adder_input_F (n + 2) a b) (target_idx i)
          = adder_sum_bit_classical a b i)
    ∧ (∀ i, i < n + 2 →
        Gate.applyNat (gidney_adder (n + 2)) (adder_input_F (n + 2) a b) (read_idx i)
          = a.testBit i)
    ∧ tcount (gidney_adder (n + 2)) = 14 * (n + 2)

*Gidney adder — verified-with-resource (resource AFTER correctness).** The single object `gidney_adder (n+2)` is simultaneously: 1. sum-correct — `target_i = (a+b).testBit i` for all `i < n+2`; 2. read-preserving — `read_i = a.testBit i` for all `i < n+2`; 3. `14·(n+2)` T-gates. The resource bound is stated about *exactly* the circuit the correctness theorem verifies, so "resource" is established only after "correctness". (For clean carries + WellTyped, use the patched bundle `gidney_adder_correct_full`.)

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderSpec

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderSpec.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderSpec ───────────────────────────────────────────────────────── The CLASSICAL specification of the Gidney adder — what "correct" means and how inputs/outputs are encoded. **Definitions only — no proofs, no circuits.** • `Adder.carry` / `Adder.sumfb` — the bit-level carry/sum recurrence (port of SQIR `ModMult.v`). • `adder_sum_bit_classical a b i = (a+b).testBit i` — the expected output bit. • `adder_input_F n a b` — the standard `|a⟩|b⟩|0⟩` input on the interleaved layout. • `gidney_read_val` / `gidney_target_val` / `gidney_carry_val` — LSB-first register decoders. The circuit itself is in `RippleCarryAdderDef.lean`; the correctness theorems that relate the two are in `RippleCarryAdderCorrectness.lean`.

defAdder.carry

def Adder.carry (b₀ : Bool) : Nat → (Nat → Bool) → (Nat → Bool) → Bool
  | 0,     _, _ => b₀
  | n + 1, f, g =>
      let c

*Classical carry function** (SQIR `ModMult.v` port). Given a carry-in `b₀ : Bool` and two bit-streams `f g`, `Adder.carry b₀ n f g` is the carry-out after processing bits `0..n-1` of `f + g`.

defAdder.sumfb

def Adder.sumfb (b₀ : Bool) (f g : Nat → Bool) (i : Nat) : Bool

*Classical sum-bit function** (SQIR `ModMult.v` port). `Adder.sumfb b₀ f g i = carry b₀ i f g ⊕ f i ⊕ g i` — bit `i` of `f + g` with carry-in `b₀`.

defadder_sum_bit_classical

def adder_sum_bit_classical (a b i : Nat) : Bool

*Classical specification**: bit `i` of `(a + b) mod 2^n`, the value the i-th target qubit should hold after the full adder.

defadder_input_F

def adder_input_F (n a b : Nat) (k : Nat) : Bool

*Generic input encoding** `|a⟩|b⟩|0⟩_carries` on the interleaved layout: `read[i]` ↦ bit `i` of `a` (if `i < n`), `target[i]` ↦ bit `i` of `b`, `carry[i]` ↦ `false`.

defgidney_read_val

def gidney_read_val : Nat → (Nat → Bool) → Nat
  | 0,     _ => 0
  | n + 1, f =>
      gidney_read_val n f + (if f (read_idx n) then 2^n else 0)

Decoder: value of the `read` register at width `n`, LSB-first.

defgidney_target_val

def gidney_target_val : Nat → (Nat → Bool) → Nat
  | 0,     _ => 0
  | n + 1, f =>
      gidney_target_val n f + (if f (target_idx n) then 2^n else 0)

Decoder: value of the `target` register at width `n`, LSB-first.

defgidney_carry_val

def gidney_carry_val : Nat → (Nat → Bool) → Nat
  | 0,     _ => 0
  | n + 1, f =>
      gidney_carry_val n f + (if f (carry_idx n) then 2^n else 0)

Decoder: value of the `carry` register at width `n`, LSB-first.

abbrevzeroF

abbrev zeroF : Nat → Bool

The all-zero input function.

definputF_1_plus_0

def inputF_1_plus_0 : Nat → Bool

Input for `read = (1,0), target = (0,0)` (the `1 + 0` 2-bit case).

definputF_1_plus_1

def inputF_1_plus_1 : Nat → Bool
  | 0 => true   -- read_0 = a_0 = 1
  | 1 => true   -- target_0 = b_0 = 1
  | _ => false

Input for `(a=1, b=1)` 2-bit addition.

definputF_3_plus_1

def inputF_3_plus_1 : Nat → Bool
  | 0 => true   -- read_0 = a_0 = 1
  | 1 => true   -- target_0 = b_0 = 1
  | 3 => true   -- read_1 = a_1 = 1
  | _ => false

Input for `(a=3, b=1)` 3-bit addition.

definputF_7_plus_1

def inputF_7_plus_1 : Nat → Bool
  | 0 => true   -- read_0 = a_0 = 1
  | 1 => true   -- target_0 = b_0 = 1
  | 3 => true   -- read_1 = a_1 = 1
  | 6 => true   -- read_2 = a_2 = 1
  | _ => false

Input for `(a=7, b=1)` 4-bit addition.

definputF_1_plus_1_tickD

def inputF_1_plus_1_tickD : Nat → Bool
  | 0 => true   -- read[0] = 1 (LSB)
  | 1 => true   -- target[0] = 1 (LSB)
  | _ => false

Concrete `1 + 1` input (LSB-first): `read = 1, target = 1` at width 2.

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderUncomputeCascade

FormalRV/Arithmetic/RippleCarryAdder/RippleCarryAdderUncomputeCascade.lean

FormalRV.Arithmetic.RippleCarryAdder.RippleCarryAdderUncomputeCascade Re-export umbrella for the `UncomputeCascade/` sub-folder, split by sub-topic: FrameLemmas → Correctness → WellTypedBackbone (the backbone holds `gidney_adder_patched_primitive`). Kept at this path so external importers resolve unchanged.

(no documented top-level declarations)

FormalRV.Arithmetic.RippleCarryAdder.UncomputeCascade.Correctness

FormalRV/Arithmetic/RippleCarryAdder/UncomputeCascade/Correctness.lean

FormalRV.Arithmetic.RippleCarryAdder.UncomputeCascade.Correctness Patched full-adder correctness (part 2/3): per-register correctness, the packaged correctness theorem, and the bits-parametric decode `gidney_adder_patched_target_decode`. Builds on `FrameLemmas`.

theoremgidney_adder_full_faithful_no_measurement_patched_target_correct

theorem gidney_adder_full_faithful_no_measurement_patched_target_correct
    (n a b : Nat) (ha : a < 2^(n + 2)) (hb : b < 2^(n + 2)) :
    ∀ i, i < n + 2 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched (n + 2))
        (adder_input_F (n + 2) a b) (target_idx i)
      = adder_sum_bit_classical a b i

*Patched full adder, target register correctness** (Deliverable C₁).

theoremgidney_adder_full_faithful_no_measurement_patched_read_preserved

theorem gidney_adder_full_faithful_no_measurement_patched_read_preserved
    (n a b : Nat) (ha : a < 2^(n + 2)) (hb : b < 2^(n + 2)) :
    ∀ i, i < n + 2 →
      Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched (n + 2))
        (adder_input_F (n + 2) a b) (read_idx i)
      = a.testBit i

*Patched full adder, read register preservation** (Deliverable C₂).

theoremgidney_adder_full_faithful_no_measurement_patched_correct

theorem gidney_adder_full_faithful_no_measurement_patched_correct
    (n a b : Nat) (ha : a < 2^(n + 2)) (hb : b < 2^(n + 2)) :
    (∀ i, i < n + 2 →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched (n + 2))
          (adder_input_F (n + 2) a b) (read_idx i)
        = a.testBit i)
    ∧ (∀ i, i < n + 2 →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched (n + 2))
          (adder_input_F (n + 2) a b) (target_idx i)
        = adder_sum_bit_classical a b i)
    ∧ (∀ i, i ≤ n + 1 →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched (n + 2))

*Full patched-adder correctness — packaged theorem** (Deliverable D). For the Option-1 carry-clearing patched Gidney adder on `adder_input_F (n+2) a b`: 1. The read register is preserved (= original `a` bits). 2. The target register equals the classical sum bits. 3. The carry register is fully cleared.

theoremnat_mod_two_pow_succ_eq

theorem nat_mod_two_pow_succ_eq (x n : Nat) :
    x % 2^(n + 1) = x % 2^n + (if x.testBit n then 2^n else 0)

Helper: `x % 2^(n+1) = x % 2^n + (testBit x n) * 2^n`. Standard identity, not in mathlib in this exact form.

theoremgidney_target_val_eq_sum_when_bits_match

theorem gidney_target_val_eq_sum_when_bits_match
    (bits S : Nat) (f : Nat → Bool)
    (h : ∀ i, i < bits → f (target_idx i) = S.testBit i) :
    gidney_target_val bits f = S % 2^bits

If a bit-function's target-register positions match the bits of `S`, then `gidney_target_val` decodes the target register to `S % 2^bits`.

theoremgidney_adder_full_faithful_no_measurement_patched_correct_bits

theorem gidney_adder_full_faithful_no_measurement_patched_correct_bits
    (bits a b : Nat) (hbits : 2 ≤ bits) (ha : a < 2^bits) (hb : b < 2^bits) :
    (∀ i, i < bits →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits a b) (read_idx i) = a.testBit i)
    ∧ (∀ i, i < bits →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits a b) (target_idx i) = (a + b).testBit i)
    ∧ (∀ i, i < bits →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits a b) (carry_idx i) = false)

*Deliverable A**: bits-parameter wrapper of the packaged correctness theorem. For any `bits ≥ 2` and `a, b < 2^bits`, the patched full faithful no-measurement Gidney adder preserves the read register, writes the classical sum bits into the target register, and clears the carry register.

theoremgidney_adder_patched_target_decode

theorem gidney_adder_patched_target_decode
    (bits a b : Nat) (hbits : 2 ≤ bits) (ha : a < 2^bits) (hb : b < 2^bits) :
    gidney_target_val bits
      (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
        (adder_input_F bits a b))
    = (a + b) % 2^bits

*Deliverable B**: decoded target-register correctness. After the patched full faithful no-measurement Gidney adder runs on `adder_input_F bits a b`, the target register decodes to `(a + b) mod 2^bits`.

FormalRV.Arithmetic.RippleCarryAdder.UncomputeCascade.FrameLemmas

FormalRV/Arithmetic/RippleCarryAdder/UncomputeCascade/FrameLemmas.lean

FormalRV.Arithmetic.RippleCarryAdder.UncomputeCascade.FrameLemmas Patched-adder uncompute cascade — frame lemmas (part 1/3). The "patched = unpatched at non-carry positions" cascade theorems plus the unpatched commute / input-independence helpers. Supporting lemmas only; the backbone (`gidney_adder_patched_primitive`) is in `WellTypedBackbone`.

theoremunpatched_full_reverse_commute_update_at_c_above

theorem unpatched_full_reverse_commute_update_at_c_above
    (n : Nat) (g : Nat → Bool) (v : Bool) (j : Nat) (hj : j > n + 1) :
    Gate.applyNat (gidney_adder_forward_faithful_full_reverse (n + 2)) (update g (carry_idx j) v)
      = update (Gate.applyNat (gidney_adder_forward_faithful_full_reverse (n + 2)) g)
          (carry_idx j) v

Unpatched full reverse cascade commutes with update at `c[j]` (`j > n+1`).

theoremunpatched_propagation_reverse_indep_input_at_c_above

theorem unpatched_propagation_reverse_indep_input_at_c_above
    (m : Nat) (g : Nat → Bool) (v : Bool) (k : Nat) (h_k : k ≠ carry_idx (m + 1)) :
    Gate.applyNat (gidney_adder_forward_with_propagation_reverse (m + 1))
      (update g (carry_idx (m + 1)) v) k
    = Gate.applyNat (gidney_adder_forward_with_propagation_reverse (m + 1)) g k

*Input-independence of the unpatched propagation cascade** (Deliverable A): changing the input at `carry_idx (m+1)` (above the cascade's range) does not affect the output at any other position.

theoremunpatched_full_reverse_indep_input_at_c_above

theorem unpatched_full_reverse_indep_input_at_c_above
    (n : Nat) (g : Nat → Bool) (v : Bool) (k : Nat) (h_k : k ≠ carry_idx (n + 2)) :
    Gate.applyNat (gidney_adder_forward_faithful_full_reverse (n + 2))
      (update g (carry_idx (n + 2)) v) k
    = Gate.applyNat (gidney_adder_forward_faithful_full_reverse (n + 2)) g k

Input-independence of the unpatched full reverse cascade at `c[n+2]`.

theorempatched_unpatched_propagation_reverse_eq_at_target

theorem patched_unpatched_propagation_reverse_eq_at_target (m : Nat) :
    ∀ (g : Nat → Bool) (i : Nat),
      Gate.applyNat (gidney_adder_forward_with_propagation_reverse_patched (m + 1)) g
        (target_idx i)
        = Gate.applyNat (gidney_adder_forward_with_propagation_reverse (m + 1)) g
            (target_idx i)

Patched propagation cascade equals unpatched at `target_idx i`.

theorempatched_unpatched_propagation_reverse_eq_at_read

theorem patched_unpatched_propagation_reverse_eq_at_read (m : Nat) :
    ∀ (g : Nat → Bool) (i : Nat),
      Gate.applyNat (gidney_adder_forward_with_propagation_reverse_patched (m + 1)) g
        (read_idx i)
        = Gate.applyNat (gidney_adder_forward_with_propagation_reverse (m + 1)) g
            (read_idx i)

Patched propagation cascade equals unpatched at `read_idx i`.

theorempatched_full_reverse_eq_unpatched_at_target

theorem patched_full_reverse_eq_unpatched_at_target
    (n : Nat) (g : Nat → Bool) (i : Nat) :
    Gate.applyNat (gidney_adder_forward_faithful_full_reverse_patched (n + 2)) g (target_idx i)
      = Gate.applyNat (gidney_adder_forward_faithful_full_reverse (n + 2)) g (target_idx i)

Patched full reverse cascade equals unpatched at `target_idx i`.

theorempatched_full_reverse_eq_unpatched_at_read

theorem patched_full_reverse_eq_unpatched_at_read
    (n : Nat) (g : Nat → Bool) (i : Nat) :
    Gate.applyNat (gidney_adder_forward_faithful_full_reverse_patched (n + 2)) g (read_idx i)
      = Gate.applyNat (gidney_adder_forward_faithful_full_reverse (n + 2)) g (read_idx i)

Patched full reverse cascade equals unpatched at `read_idx i`.

FormalRV.Arithmetic.RippleCarryAdder.UncomputeCascade.WellTypedBackbone

FormalRV/Arithmetic/RippleCarryAdder/UncomputeCascade/WellTypedBackbone.lean

FormalRV.Arithmetic.RippleCarryAdder.UncomputeCascade.WellTypedBackbone BACKBONE (part 3/3): the WellTyped induction for the patched adder, then THE bundled reusable primitive `gidney_adder_patched_primitive` (WellTyped + decoded target = (a+b) mod 2^bits + read preservation + carry clearing) — the single theorem the modular-adder layer calls. Builds on `Correctness`.

theoremgidney_adder_bit_step_faithful_first_wellTyped

theorem gidney_adder_bit_step_faithful_first_wellTyped
    (bits : Nat) (hbits : 2 ≤ bits) :
    Gate.WellTyped (adder_n_qubits bits) gidney_adder_bit_step_faithful_first

theoremgidney_adder_bit_step_faithful_interior_wellTyped

theorem gidney_adder_bit_step_faithful_interior_wellTyped
    (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_bit_step_faithful_interior i)

theoremgidney_adder_bit_step_faithful_last_wellTyped

theorem gidney_adder_bit_step_faithful_last_wellTyped
    (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_bit_step_faithful_last i)

theoremgidney_adder_bit_step_faithful_first_reverse_patched_wellTyped

theorem gidney_adder_bit_step_faithful_first_reverse_patched_wellTyped
    (bits : Nat) (hbits : 2 ≤ bits) :
    Gate.WellTyped (adder_n_qubits bits)
      gidney_adder_bit_step_faithful_first_reverse_patched

theoremgidney_adder_bit_step_faithful_interior_reverse_patched_wellTyped

theorem gidney_adder_bit_step_faithful_interior_reverse_patched_wellTyped
    (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_bit_step_faithful_interior_reverse_patched i)

theoremgidney_adder_bit_step_faithful_last_reverse_patched_wellTyped

theorem gidney_adder_bit_step_faithful_last_reverse_patched_wellTyped
    (bits i : Nat) (hi_pos : 0 < i) (hi_lt : i < bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_bit_step_faithful_last_reverse_patched i)

theoremgidney_adder_forward_with_propagation_wellTyped

theorem gidney_adder_forward_with_propagation_wellTyped
    (bits : Nat) (hb2 : 2 ≤ bits) :
    ∀ k, k ≤ bits →
      Gate.WellTyped (adder_n_qubits bits)
        (gidney_adder_forward_with_propagation k)

theoremgidney_adder_forward_faithful_full_wellTyped

theorem gidney_adder_forward_faithful_full_wellTyped
    (bits : Nat) (hb2 : 2 ≤ bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_forward_faithful_full bits)

theoremgidney_final_cx_cascade_wellTyped

theorem gidney_final_cx_cascade_wellTyped
    (bits : Nat) :
    ∀ k, k ≤ bits →
      Gate.WellTyped (adder_n_qubits bits) (gidney_final_cx_cascade k)

theoremgidney_adder_forward_with_propagation_reverse_patched_wellTyped

theorem gidney_adder_forward_with_propagation_reverse_patched_wellTyped
    (bits : Nat) (hb2 : 2 ≤ bits) :
    ∀ k, k ≤ bits →
      Gate.WellTyped (adder_n_qubits bits)
        (gidney_adder_forward_with_propagation_reverse_patched k)

theoremgidney_adder_forward_faithful_full_reverse_patched_wellTyped

theorem gidney_adder_forward_faithful_full_reverse_patched_wellTyped
    (bits : Nat) (hb2 : 2 ≤ bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_forward_faithful_full_reverse_patched bits)

theoremgidney_adder_full_faithful_no_measurement_patched_wellTyped

theorem gidney_adder_full_faithful_no_measurement_patched_wellTyped
    (bits : Nat) (hb2 : 2 ≤ bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_full_faithful_no_measurement_patched bits)

*Deliverable C**: full patched-adder WellTyped at the natural dimension `adder_n_qubits bits = 3 * bits + 2`.

theoremgidney_adder_patched_primitive

theorem gidney_adder_patched_primitive
    (bits a b : Nat) (hbits : 2 ≤ bits) (ha : a < 2^bits) (hb : b < 2^bits) :
    Gate.WellTyped (adder_n_qubits bits)
      (gidney_adder_full_faithful_no_measurement_patched bits)
    ∧ gidney_target_val bits
        (Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits a b))
      = (a + b) % 2^bits
    ∧ (∀ i, i < bits →
        Gate.applyNat (gidney_adder_full_faithful_no_measurement_patched bits)
          (adder_input_F bits a b) (read_idx i) = a.testBit i)
    ∧ (∀ i, i < bits →

*Deliverable D**: bundled reusable patched-adder primitive combining WellTyped, decoded target correctness, read preservation, and carry clearing — the single theorem the modular-addition layer should call.

FormalRV.Arithmetic.UnaryLookup

FormalRV/Arithmetic/UnaryLookup.lean

(no documented top-level declarations)

FormalRV.Arithmetic.UnaryLookup.UnaryLookupDefinitions

FormalRV/Arithmetic/UnaryLookup/UnaryLookupDefinitions.lean

## Register indexing for the unary lookup circuit Layout (top to bottom): ctrl[0], then `n_addr` pairs of (address[i], and[i]), then `n_word` word qubits. Index assignment: ctrl_idx = 0 address_idx i = 1 + 2*i (i = 0..n_addr-1) and_idx i = 1 + 2*i + 1 (i = 0..n_addr-1) word_idx n_addr j = 1 + 2*n_addr + j (j = 0..n_word-1)

defulookup_ctrl_idx

def ulookup_ctrl_idx : Nat

Qubit index for the controller bit (top wire in Fig. 4(b)).

defulookup_address_idx

def ulookup_address_idx (i : Nat) : Nat

Qubit index for the i-th address bit.

defulookup_and_idx

def ulookup_and_idx (i : Nat) : Nat

Qubit index for the i-th ancilla AND bit (interleaved with address).

defulookup_word_idx

def ulookup_word_idx (n_addr j : Nat) : Nat

Qubit index for the j-th word bit, given the number of address bits.

defunary_lookup_n_qubits

def unary_lookup_n_qubits (n_addr n_word : Nat) : Nat

Total qubits required for an `n_addr`-address-bit, `n_word`-word-bit unary lookup: 1 + 2*n_addr + n_word.

defunary_lookup_stub

def unary_lookup_stub (_n_addr _n_word : Nat) : Gate

Placeholder: the empty lookup (Iter 1 only encodes indexing).

defprefix_and_step

def prefix_and_step (i : Nat) : Gate

One step of the prefix-AND cascade at bit `i`: i=0 → CCX(ctrl, address[0], and[0]) i>0 → CCX(and[i-1], address[i], and[i]) Faithful translation of `PyCircuits/lookups/unary_lookup_qrisp.py:build_prefix_and_cascade`.

defprefix_and_cascade

def prefix_and_cascade : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq (prefix_and_cascade n) (prefix_and_step n)

The full forward prefix-AND cascade for `n_addr` address bits, composed via `Gate.seq`.

defprefix_and_uncompute_step

def prefix_and_uncompute_step (i : Nat) : Gate

One reverse step of the prefix-AND cascade — same gate as the forward step (CCX is self-inverse) but emitted in reverse order in `prefix_and_uncompute`. Provided as a separate def for clarity even though structurally `prefix_and_step` already encodes the gate.

defprefix_and_uncompute

def prefix_and_uncompute : Nat → Gate
  | 0       => Gate.I
  | n + 1   => Gate.seq (prefix_and_step n) (prefix_and_uncompute n)

The full reverse uncomputation cascade. Emits `prefix_and_step n-1` then `n-2`, ..., then `0`. Together with `prefix_and_cascade n`, forms the no-measurement upper-bound: total `2n` Toffolis.

defprefix_and_compute_and_uncompute

def prefix_and_compute_and_uncompute (n : Nat) : Gate

The no-measurement upper bound: forward + reverse cascade uses `2n` Toffolis. This represents the gate-level cost WITHOUT the Gidney- style measurement trick. The paper's optimization gets the per- iteration cost down to `n` (forward only, reverse uses measurements).

defx_gates_from_indices

def x_gates_from_indices : List Nat → Gate
  | []      => Gate.I
  | i :: xs => Gate.seq (x_gates_from_indices xs) (Gate.X i)

Helper: emit X gates at each index in the list.

defcx_gates_from_indices

def cx_gates_from_indices (ctrl : Nat) : List Nat → Gate
  | []        => Gate.I
  | tgt :: xs => Gate.seq (cx_gates_from_indices ctrl xs) (Gate.CX ctrl tgt)

Helper: emit CX gates with a fixed control and each target in the list.

defunary_lookup_iteration

def unary_lookup_iteration (n_addr : Nat)
    (addr_flip_idxs word_cnot_idxs : List Nat) : Gate

One iteration of the unary lookup loop targeting a specific address value. `addr_flip_idxs` is the list of address-bit indices to X-flip (so the cascade fires for the target value). `word_cnot_idxs` is the list of word-bit indices to write (per the table row at that address).

defunary_lookup_multi_iteration

def unary_lookup_multi_iteration (n_addr : Nat) :
    List (List Nat × List Nat) → Gate
  | []                     => Gate.I
  | (flips, cnots) :: rest =>
      Gate.seq (unary_lookup_multi_iteration n_addr rest)
               (unary_lookup_iteration n_addr flips cnots)

Compose `unary_lookup_iteration` for a list of `(addr_flips, word_cnots)` data tuples. Each tuple is one iteration of the lookup loop.

defprefix_and_step_post_state

def prefix_and_step_post_state (i : Nat) (f : Nat → Bool) : Nat → Bool

Per-step post-state: applying `prefix_and_step i` XORs `(prev ∧ address[i])` into `and[i]`, where `prev = ctrl` at i=0 and `and[i-1]` at i>0.

defprefix_and_cascade_post_state

def prefix_and_cascade_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0    , f => f
  | n + 1, f => prefix_and_step_post_state n (prefix_and_cascade_post_state n f)

Cascade post-state: fold of per-step post-states over bits 0..n-1. Matches the recursive structure of `prefix_and_cascade`.

structureULookupBitDisjointness

structure ULookupBitDisjointness (dim i : Nat) : Prop

Disjointness bundle for a single bit of the lookup prefix-AND cascade. The five conditions follow from the indexing structure `ulookup_*_idx i = 1 + 2*i (+1)`.

defgray_code_unary_lookup_toffoli_count

def gray_code_unary_lookup_toffoli_count (n_addr q_a : Nat) : Nat

Gray-code-amortized Toffoli count for a q_a-bit unary lookup: `n_addr` (initial cascade) + `(2^q_a - 1)` (one Toffoli per subsequent iteration).

defgray_code_residual_ratio

def gray_code_residual_ratio (n_addr q_a : Nat) : Nat × Nat

*Two-step closure roadmap**: the lookup review-gap (12× at q_a=6) decomposes as 2× (Gidney AND, closed Iter 43-44) × 6× (Gray-code, scaffolded here). With Gray-code Toffoli count = `n_addr + 2^q_a - 1`, the ratio Lean-Gray-code / paper-claim is `(n_addr + 2^q_a - 1) / 2^q_a ≈ 1.08` at q_a=6 — **down from 6× to ~8% residual**. The residual is the initial-cascade bookkeeping discussed above.

abbrevzeroFLook

abbrev zeroFLook : Nat → Bool

The all-zero input function (local re-abbreviation; the adder side defines a `zeroF` in its own namespace).

definputF_lookup_ctrl_addr_10

def inputF_lookup_ctrl_addr_10 : Nat → Bool
  | 0 => true   -- ctrl = 1
  | 1 => true   -- address_0 = 1
  | _ => false  -- and_0, address_1, and_1, ... = 0

Input for the lookup: ctrl=1 (qubit 0), address_0=1 (qubit 1), everything else false (and_0, address_1, and_1 all 0).

definputF_lookup_ctrl_addr_11

def inputF_lookup_ctrl_addr_11 : Nat → Bool
  | 0 => true   -- ctrl = 1
  | 1 => true   -- address_0 = 1
  | 3 => true   -- address_1 = 1
  | _ => false

*And another variant**: with `ctrl=1, address=11`, both AND ancillas should fire to 1.

definputF_lookup_q3_addr_110

def inputF_lookup_q3_addr_110 : Nat → Bool
  | 0 => true   -- ctrl = 1
  | 1 => true   -- addr_0 = 1
  | 3 => true   -- addr_1 = 1
  | _ => false  -- addr_2 = 0; and ancillas all 0

Input for `q_a = 3` lookup: ctrl=1, address = (1, 1, 0) LSB-first. The cascade should compute: - and_0 = ctrl ∧ addr_0 = 1 ∧ 1 = 1 - and_1 = and_0 ∧ addr_1 = 1 ∧ 1 = 1 - and_2 = and_1 ∧ addr_2 = 1 ∧ 0 = 0

definputF_lookup_q3_addr_111

def inputF_lookup_q3_addr_111 : Nat → Bool
  | 0 => true   -- ctrl
  | 1 => true   -- addr_0
  | 3 => true   -- addr_1
  | 5 => true   -- addr_2
  | _ => false  -- and ancillas

Input with all 3 address bits set: ctrl=1, address = (1, 1, 1). All ANDs fire to 1.

defLookup.address_and

def Lookup.address_and (ctrl : Bool) (addr : Nat) : Nat → Bool
  | 0     => ctrl
  | n + 1 => Lookup.address_and ctrl addr n && addr.testBit n

*Math AND of `ctrl` with the first `n` bits of `addr`**. `address_and ctrl addr n = ctrl ∧ addr.testBit 0 ∧ ... ∧ addr.testBit (n-1)`.

defLookup.cascade_step_invariant

def Lookup.cascade_step_invariant (k n : Nat) (ctrl : Bool) (addr : Nat)
    (post : Nat → Bool) : Prop

*Step-indexed prefix-AND cascade invariant** (Iter 219, analog of Iter 175's `Gidney.propagation_step_invariant`). After `k` steps of the prefix-AND cascade applied to a state where `f(ctrl_idx) = ctrl`, `f(address_idx i) = addr.testBit i`, and `f(and_idx i) = false` for all i < n: - For i < k (computed): post(and_idx i) = ctrl ∧ ⋀_{j ≤ i} addr.testBit j. - For i ≥ k (untouched): post(and_idx i) = false.

defLookup.x_flip_post_state

def Lookup.x_flip_post_state : List Nat → (Nat → Bool) → (Nat → Bool)
  | [], f => f
  | i :: xs, f =>
    let f'

*Classical post-state of `x_gates_from_indices xs`**: starting from `f`, apply X-flips to the indices in `xs` in the order matching the Gate.seq nesting (tail first, head last). With unique indices, the net effect is to XOR each listed position with `true`.

defLookup.cnot_layer_post_state

def Lookup.cnot_layer_post_state (ctrl : Nat) : List Nat → (Nat → Bool) → (Nat → Bool)
  | [], f => f
  | tgt :: xs, f =>
    let f'

*Classical post-state of `cx_gates_from_indices ctrl xs`**: each CX(ctrl, tgt) does `tgt := tgt ⊕ ctrl`. In the order matching the Gate.seq nesting, the tail is applied first. **Crucially**: the control wire `ctrl` is never the target of any CX in this layer, so its value is preserved across the layer (see `cnot_layer_post_state_ctrl_unchanged` below).

defprefix_and_uncompute_post_state

def prefix_and_uncompute_post_state : Nat → (Nat → Bool) → (Nat → Bool)
  | 0    , f => f
  | n + 1, f => prefix_and_uncompute_post_state n (prefix_and_step_post_state n f)

*Boolean post-state of the reverse cascade**: applies `prefix_and_step_post_state` in the reverse order (n-1, n-2, ..., 0), matching `prefix_and_uncompute n = seq (step (n-1)) (...) (step 0)`.

defLookup.iteration_post_state

def Lookup.iteration_post_state
    (n_addr : Nat) (addr_flip_idxs word_cnot_idxs : List Nat)
    (f : Nat → Bool) : Nat → Bool

*Boolean post-state of `unary_lookup_iteration`**. The 5-stage composition mirrors the Gate.seq structure of `unary_lookup_iteration`: `flips · cascade · cnots · uncompute · flips`.

defLookup.AllWordIdx

def Lookup.AllWordIdx (n_addr : Nat) (xs : List Nat) : Prop

*All elements of `xs` are word-register indices** (i.e., ≥ 1 + 2·n_addr). Captures the structural condition that CNOT targets in a lookup iteration write to the word register, not the ctrl/address/and registers.

defLookup.multi_iteration_post_state

def Lookup.multi_iteration_post_state (n_addr : Nat) :
    List (List Nat × List Nat) → (Nat → Bool) → (Nat → Bool)
  | [],                     f => f
  | (flips, cnots) :: rest, f =>
      Lookup.iteration_post_state n_addr flips cnots
        (Lookup.multi_iteration_post_state n_addr rest f)

*Boolean post-state of `unary_lookup_multi_iteration`**. Recursive fold matching the gate-level structure: each `(flips, cnots)` tuple in the iter list contributes one application of `iteration_post_state`.

defLookup.iter_triggers

def Lookup.iter_triggers (ctrl : Bool) (addr : Nat) (n_addr : Nat)
    (flips : List Nat) : Bool

*Iter trigger predicate** (pure classical): true iff the iter's prefix-AND chain fires on input `(ctrl, addr)`, equivalently iff `ctrl` is true and the effective address (addr XOR flip mask) is all-ones for the first `n_addr` bits. Equivalent to `Lookup.address_and ctrl effective_addr n_addr` where `effective_addr.testBit i = xor (addr.testBit i) (decide (ulookup_address_idx i ∈ flips))`.

defLookup.multi_iteration_xor_value

def Lookup.multi_iteration_xor_value
    (ctrl : Bool) (addr : Nat) (n_addr : Nat) :
    List (List Nat × List Nat) → Nat → Bool
  | [], _ => false
  | (flips, cnots) :: rest, p =>
      xor (decide (p ∈ cnots) && Lookup.iter_triggers ctrl addr n_addr flips)
          (Lookup.multi_iteration_xor_value ctrl addr n_addr rest p)

*Multi-iteration XOR contribution at a word position** (pure classical). For a word position `p`, the boolean XOR contribution is `XOR` over all iters of `(p ∈ cnots_i) AND (iter_i triggers)`.

defLookup.effective_addr

def Lookup.effective_addr (addr : Nat) (flips : List Nat) : Nat → Nat
  | 0     => 0
  | n + 1 =>
    let lower

*Effective address Nat construction (Iter 253 reform via `Nat.lor`)**. Recursively builds a Nat whose i-th bit (for i < n) equals `xor (addr.testBit i) (decide (ulookup_address_idx i ∈ flips))`. Uses bitwise OR (`|||`) instead of addition. With OR, the testBit characterization (Iter 254) is straightforward via `Nat.testBit_or` and `Nat.testBit_two_pow`.

defLookup.multi_iteration_xor_value_via_address_and

def Lookup.multi_iteration_xor_value_via_address_and
    (ctrl : Bool) (addr : Nat) (n_addr : Nat) :
    List (List Nat × List Nat) → Nat → Bool
  | [], _ => false
  | (flips, cnots) :: rest, p =>
      xor (decide (p ∈ cnots) &&
           Lookup.address_and ctrl
             (Lookup.effective_addr addr flips n_addr) n_addr)
          (Lookup.multi_iteration_xor_value_via_address_and ctrl addr n_addr rest p)

*Classical XOR contribution at a word position** (via address_and). Recursive fold matching the multi-iter post-state structure.

FormalRV.Arithmetic.UnaryLookup.UnaryLookupGateDerivations

FormalRV/Arithmetic/UnaryLookup/UnaryLookupGateDerivations.lean

## Smoke tests matching Fig. 4(b)'s example (n_addr=3, n_word=6)

example(example)

example : unary_lookup_n_qubits 3 6 = 13

1 + 2·3 + 6 = 13 qubits, matching the 13 horizontal wires in Fig. 4(b)'s example diagram.

example(example)

example : ulookup_ctrl_idx = 0

ctrl[0] is wire 0.

example(example)

example : ulookup_address_idx 0 = 1 ∧ ulookup_and_idx 0 = 2

address[0] is wire 1, and[0] is wire 2 (highlighted red in the figure).

example(example)

example : ulookup_address_idx 1 = 3 ∧ ulookup_and_idx 1 = 4

address[1] is wire 3, and[1] is wire 4 (also highlighted red).

example(example)

example : ulookup_address_idx 2 = 5 ∧ ulookup_and_idx 2 = 6

address[2] is wire 5, and[2] is wire 6.

example(example)

example : ulookup_word_idx 3 0 = 7 ∧ ulookup_word_idx 3 5 = 12

word[0..5] are wires 7..12.

example(example)

example : tcount (unary_lookup_stub 3 6) = 0

Smoke: stub has T-count 0 (placeholder; real circuit has many).

theoremgcount_prefix_and_step

theorem gcount_prefix_and_step (i : Nat) : gcount (prefix_and_step i) = 1

Each cascade step is exactly 1 Toffoli (`gcount = 1`), regardless of which branch of the `if` fires.

theoremtcount_prefix_and_step

theorem tcount_prefix_and_step (i : Nat) : tcount (prefix_and_step i) = 7

Each cascade step is exactly 7 T-gates (`tcount = 7`).

example(example)

example : tcount (prefix_and_step 0) = 7

example(example)

example : tcount (prefix_and_step 5) = 7

example(example)

example : gcount (prefix_and_step 7) = 1

example(example)

example : tcount (prefix_and_cascade 3) = 21

The 3-bit prefix-AND cascade has exactly 3 Toffolis = 21 T-gates.

example(example)

example : gcount (prefix_and_cascade 3) = 3

theoremgcount_prefix_and_cascade

theorem gcount_prefix_and_cascade (n : Nat) :
    gcount (prefix_and_cascade n) = n

General Toffoli count: an `n`-bit prefix-AND cascade has exactly `n` Toffolis. **Gate-derived** from the recursive definition — no paper claim involved. Matches Iter 5 Python's predicted `n_addr` Toffolis.

theoremtcount_prefix_and_cascade

theorem tcount_prefix_and_cascade (n : Nat) :
    tcount (prefix_and_cascade n) = 7 * n

T-count of the cascade is `7n` (one Toffoli = 7 T after decomposition).

theoremgcount_prefix_and_uncompute_step

theorem gcount_prefix_and_uncompute_step (i : Nat) :
    gcount (prefix_and_uncompute_step i) = 1

Each uncompute step is exactly 1 Toffoli.

theoremgcount_prefix_and_uncompute

theorem gcount_prefix_and_uncompute (n : Nat) :
    gcount (prefix_and_uncompute n) = n

Toffoli count of the reverse cascade: also exactly `n` Toffolis.

theoremtcount_prefix_and_uncompute

theorem tcount_prefix_and_uncompute (n : Nat) :
    tcount (prefix_and_uncompute n) = 7 * n

T-count of the reverse cascade: `7n`.

theoremgcount_prefix_and_compute_and_uncompute

theorem gcount_prefix_and_compute_and_uncompute (n : Nat) :
    gcount (prefix_and_compute_and_uncompute n) = 2 * n

theoremtcount_x_gates_zero

theorem tcount_x_gates_zero (xs : List Nat) : tcount (x_gates_from_indices xs) = 0

All X-gate sequences are T-count zero.

theoremtcount_cx_gates_zero

theorem tcount_cx_gates_zero (ctrl : Nat) (xs : List Nat) :
    tcount (cx_gates_from_indices ctrl xs) = 0

All CX-gate sequences are T-count zero.

theoremgcount_x_gates_from_indices

theorem gcount_x_gates_from_indices (xs : List Nat) :
    gcount (x_gates_from_indices xs) = xs.length

Gate-count of `x_gates_from_indices xs` is the list length: one X per index, identity contributes 0.

theoremgcount_cx_gates_from_indices

theorem gcount_cx_gates_from_indices (ctrl : Nat) (xs : List Nat) :
    gcount (cx_gates_from_indices ctrl xs) = xs.length

Gate-count of `cx_gates_from_indices ctrl xs` is the list length.

theoremtcount_unary_lookup_iteration

theorem tcount_unary_lookup_iteration (n_addr : Nat)
    (addr_flip_idxs word_cnot_idxs : List Nat) :
    tcount (unary_lookup_iteration n_addr addr_flip_idxs word_cnot_idxs)
      = 14 * n_addr

The iteration body has T-count `14 · n_addr` regardless of the address pattern or word pattern (only the two cascades contribute T).

theoremgcount_unary_lookup_iteration

theorem gcount_unary_lookup_iteration (n_addr : Nat)
    (addr_flip_idxs word_cnot_idxs : List Nat) :
    gcount (unary_lookup_iteration n_addr addr_flip_idxs word_cnot_idxs)
      = 2 * addr_flip_idxs.length + 2 * n_addr + word_cnot_idxs.length

*Gate-count of one iteration body**: `2·|addr_flips| + 2·n_addr + |word_cnots|`. Decomposes as: forward+reverse X-flip layers contribute `2·|addr_flips|`; forward+reverse prefix-AND cascades contribute `2·n_addr`; word CNOTs contribute `|word_cnots|`. Derived purely from the gate sequence of `unary_lookup_iteration` — no paper-claim constant. This is the **leaf gate-count review claim** for one iteration body, mirroring `tcount_unary_lookup_iteration` (Iter 14) but at the structural (all-gate) level.

theoremtcount_unary_lookup_multi_iteration

theorem tcount_unary_lookup_multi_iteration (n_addr : Nat)
    (iters : List (List Nat × List Nat)) :
    tcount (unary_lookup_multi_iteration n_addr iters)
      = 14 * n_addr * iters.length

T-count of the multi-iteration cascade: `14 · n_addr · |iters|` regardless of the data carried in the iterations (each iteration contributes a fixed `14 · n_addr`, only Toffolis matter).

example(example)

example :
    tcount (unary_lookup_multi_iteration 3
              [([], []), ([], []), ([], []), ([], []),
               ([], []), ([], []), ([], []), ([], [])])
      = 336

Concrete: at n_addr=3 with 8 iterations (= 2^3), total T-count is `14 · 3 · 8 = 336`. This is the **no-measurement** bound; the paper's `2^q_a = 8` Toffolis = 56 T requires Gidney measurement + Gray-code amortization.

theoremgcount_unary_lookup_multi_iteration

theorem gcount_unary_lookup_multi_iteration (n_addr : Nat)
    (iters : List (List Nat × List Nat)) :
    gcount (unary_lookup_multi_iteration n_addr iters)
      = 2 * n_addr * iters.length
        + 2 * (iters.map (fun p => p.1.length)).sum
        + (iters.map (fun p => p.2.length)).sum

*Gate-count of the multi-iteration cascade** (Iter 77): each iteration contributes its data-dependent gcount `2·|flips_i| + 2·n_addr + |cnots_i|` (Iter 76 leaf), and the multi-iteration gcount is the sum of those. Expressed as a sum: total gates = `2·n_addr · |iters| + 2 · (Σᵢ |flipsᵢ|) + (Σᵢ |cnotsᵢ|)`. Derived purely from the gate sequence via induction on the iter-list, using `gcount_unary_lookup_iteration` (Iter 76) at each step. Mirrors `tcount_unary_lookup_multi_iteration` but aggregates data-dependent gate counts (vs T-count's uniform `14 · n_addr` per iteration).

theoremunary_lookup_two_factor_gap

theorem unary_lookup_two_factor_gap (n_addr : Nat)
    (iters : List (List Nat × List Nat)) :
    tcount (unary_lookup_multi_iteration n_addr iters)
      = 2 * n_addr * (7 * iters.length)

*Lookup review finding theorem**: the no-measurement / no-Gray-code T-count of the n_addr-bit unary lookup with `addr_count` iterations is `2 · n_addr ·` the paper's per-iteration T-count. At `addr_count = 2^q_a`, this gives the full two-factor gap.

example(example)

example :
    tcount (unary_lookup_multi_iteration 6
              (List.replicate 64 ([], [])))
      = 5376

Concrete at q_a=6 (RSA-2048 case), simulated with a list of 64 empty-data iterations: `14 · 6 · 64 = 5376` T-gates (Lean no-measurement) vs `7 · 64 = 448` T-gates (paper, with full optimization). The two-factor gap is `2 · 6 = 12` ×.

example(example)

example : 5376 = 12 * 448

The 12× gap at q_a=6 is exactly `2 · n_addr` — formally captured.

theoremunary_lookup_factor_decomposition_2_times_n_addr

theorem unary_lookup_factor_decomposition_2_times_n_addr
    (n_addr : Nat) (iters : List (List Nat × List Nat)) :
    tcount (unary_lookup_multi_iteration n_addr iters)
      = 2 * (n_addr * (7 * iters.length))

*Review factor decomposition** (Iter 119): the `2 · n_addr` multiplier of `unary_lookup_two_factor_gap` factors into: - **2**: no-measurement factor (matches the adder's `gidney_adder_full_faithful_no_measurement_vs_measurement_factor` from Iter 88 — uses explicit-reverse instead of Gidney's measurement-AND trick). - **n_addr**: no-Gray-code factor (lookup-specific — qianxu's Gray-code amortization reduces n_addr Toffolis per cascade to 1 amortized across consecutive iterations). Concrete decomposition at q_a=6 (RSA-2048 inner-product lookup): Lean T-count = 12 × paper claim = (2 measurement × 6 Gray-code) × paper.

theoremunary_lookup_factor_decomposition_n_addr_times_2

theorem unary_lookup_factor_decomposition_n_addr_times_2
    (n_addr : Nat) (iters : List (List Nat × List Nat)) :
    tcount (unary_lookup_multi_iteration n_addr iters)
      = n_addr * (2 * (7 * iters.length))

Mirror decomposition: `n_addr · (2 · 7 · iters.length)`. Same total but groups by the Gray-code factor first.

theoremprefix_and_step_zero_correct

theorem prefix_and_step_zero_correct (dim : Nat) (f : Nat → Bool)
    (h0 : ulookup_ctrl_idx < dim)
    (h1 : ulookup_address_idx 0 < dim)
    (h2 : ulookup_and_idx 0 < dim) :
    uc_eval (Gate.toUCom dim (prefix_and_step 0)) * f_to_vec dim f
      = f_to_vec dim
          (update f (ulookup_and_idx 0)
            (xor (f (ulookup_and_idx 0))
                 (f ulookup_ctrl_idx && f (ulookup_address_idx 0))))

*`prefix_and_step 0` correctness**: on a classical basis state, the i=0 step XORs `(ctrl ∧ address[0])` into `and[0]`. The standard unary-cascade base case. Proven via the Iter 52 reusable `gate_ccx_acts_on_basis` framework.

theoremprefix_and_step_succ_correct

theorem prefix_and_step_succ_correct (dim i : Nat) (f : Nat → Bool)
    (h_and_i  : ulookup_and_idx i < dim)
    (h_addr   : ulookup_address_idx (i + 1) < dim)
    (h_and_i1 : ulookup_and_idx (i + 1) < dim) :
    uc_eval (Gate.toUCom dim (prefix_and_step (i + 1))) * f_to_vec dim f
      = f_to_vec dim
          (update f (ulookup_and_idx (i + 1))
            (xor (f (ulookup_and_idx (i + 1)))
                 (f (ulookup_and_idx i) && f (ulookup_address_idx (i + 1)))))

*`prefix_and_step (i+1)` correctness**: on a classical basis state, the i>0 step XORs `(and[i] ∧ address[i+1])` into `and[i+1]`. The chain step of the unary cascade. Proven via `gate_ccx_acts_on_basis`.

example(example)

example (dim : Nat) (f : Nat → Bool)
    (h_and_2 : ulookup_and_idx 2 < dim)
    (h_addr_3 : ulookup_address_idx 3 < dim)
    (h_and_3 : ulookup_and_idx 3 < dim) :
    uc_eval (Gate.toUCom dim (prefix_and_step 3)) * f_to_vec dim f
      = f_to_vec dim
          (update f (ulookup_and_idx 3)
            (xor (f (ulookup_and_idx 3))
                 (f (ulookup_and_idx 2) && f (ulookup_address_idx 3))))

Concrete: at i=2 (chain step), the per-step action XORs `and[2] ∧ address[3]` into `and[3]`. Note that `prefix_and_step 3` triggers the i>0 branch (since 3 ≠ 0).

theoremprefix_and_step_correct

theorem prefix_and_step_correct (dim i : Nat) (f : Nat → Bool)
    (h_ctrl : ulookup_ctrl_idx < dim)
    (h_and_pred : ulookup_and_idx (i - 1) < dim)
    (h_addr : ulookup_address_idx i < dim)
    (h_and : ulookup_and_idx i < dim) :
    uc_eval (Gate.toUCom dim (prefix_and_step i)) * f_to_vec dim f
      = f_to_vec dim (prefix_and_step_post_state i f)

*Unified per-step correctness**: combines the i=0 and i>0 cases via the new `prefix_and_step_post_state`. Useful as the inductive step in the cascade correctness proof below.

theoremprefix_and_step_involutive

theorem prefix_and_step_involutive (dim i : Nat) (f : Nat → Bool)
    (h_ctrl : ulookup_ctrl_idx < dim)
    (h_and_pred : ulookup_and_idx (i - 1) < dim)
    (h_addr : ulookup_address_idx i < dim)
    (h_and : ulookup_and_idx i < dim) :
    uc_eval (Gate.toUCom dim (Gate.seq (prefix_and_step i) (prefix_and_step i)))
      * f_to_vec dim f
      = f_to_vec dim f

*Lookup `prefix_and_step` is involutive at the gate-IR level.** For any `i`, applying `prefix_and_step i` twice acts as identity on classical basis states. Direct lift of `gate_ccx_ccx_id_on_basis` via case-splitting on `i = 0`. **First Verified-tier lookup-side involution** — building block for Iter 71's `prefix_and_cascade · prefix_and_uncompute = identity` proof (the lookup analog of Iter 69's adder-side closure).

theoremprefix_and_step_step_eq_one

theorem prefix_and_step_step_eq_one (dim i : Nat)
    (h_ctrl : ulookup_ctrl_idx < dim)
    (h_and_pred : ulookup_and_idx (i - 1) < dim)
    (h_addr : ulookup_address_idx i < dim)
    (h_and : ulookup_and_idx i < dim) :
    uc_eval (Gate.toUCom dim (Gate.seq (prefix_and_step i) (prefix_and_step i)))
      = (1 : Matrix (Fin (2^dim)) (Fin (2^dim)) ℂ)

*Matrix-level form of `prefix_and_step_involutive`** (Iter 71): `uc_eval (seq (step i) (step i)) = 1`, independent of any basis vector. Useful for cascade-level proofs where we re-associate matrix products and need to collapse pairs to 1 in the middle. Proven via case-split on `i = 0` and reduction to `CCX_CCX_eq_one` (matrix-level CCX involution from PadAction).

theoremprefix_and_cascade_correct

theorem prefix_and_cascade_correct
    (dim : Nat) (hdim : 0 < dim) (f : Nat → Bool) :
    ∀ n, (∀ i, i < n → ULookupBitDisjointness dim i) →
    uc_eval (Gate.toUCom dim (prefix_and_cascade n)) * f_to_vec dim f
      = f_to_vec dim (prefix_and_cascade_post_state n f)
  | 0    , _ =>

*Faithful n-step prefix-AND cascade correctness**: given disjointness on each bit 0..n-1, the cascade acts on `f_to_vec dim f` to produce `f_to_vec dim (prefix_and_cascade_post_state n f)`. Proof by structural recursion on n, using `gate_seq_acts_on_basis` + IH + per-step correctness (Iter 63). *Second Verified-tier review chain (lookup side, mirroring Iter 58 for the adder).**

theoremprefix_and_cascade_uncompute_eq_one

theorem prefix_and_cascade_uncompute_eq_one
    (dim : Nat) (hdim : 0 < dim) :
    ∀ n, (∀ i, i < n → ULookupBitDisjointness dim i) →
    uc_eval (Gate.toUCom dim
              (Gate.seq (prefix_and_cascade n) (prefix_and_uncompute n)))
      = (1 : Matrix (Fin (2^dim)) (Fin (2^dim)) ℂ)
  | 0    , _ =>

*Matrix-level cascade · uncompute = identity**. The n-step forward cascade composed with the n-step reverse cascade is the identity matrix. Proof by structural induction on n, re-associating the matrix products to expose the per-step `prefix_and_step · prefix_and_step` involution (`prefix_and_step_step_eq_one` from Iter 71). *Third Verified-tier review chain (lookup side)** — composition of the n-step forward cascade (Iter 64) with its uncomputation is the identity matrix. Confirms that without measurement-based uncomputation, the lookup ancillas ARE faithfully reset to zero on the basis-state image.

theoremunary_lookup_tcount_matches_PaperClaims

theorem unary_lookup_tcount_matches_PaperClaims (q_a : Nat)
    (iters : List (List Nat × List Nat))
    (hlen : iters.length = qianxu_E9_lookup_gate_derived_count q_a) :
    tcount (unary_lookup_multi_iteration q_a iters)
      = 2 * q_a * (7 * qianxu_E9_lookup_gate_derived_count q_a)

*Bridge theorem**: at `n_addr = q_a` address bits and `iters.length = 2^q_a` iterations (the full unary loop), the Lean no-measurement no-Gray-code T-count is `2 · q_a · 7 · qianxu_E9_lookup_gate_derived_count q_a`. This formally connects the gate-derived count to the PaperClaims data def, parallel to Iter 22's `gidney_adder_forward_tcount_matches_PaperClaims`.

example(example)

example :
    tcount (unary_lookup_multi_iteration 6 (List.replicate 64 ([], [])))
      = 2 * 6 * (7 * qianxu_E9_lookup_gate_derived_count 6)

Concrete bridge check at q_a=6 (RSA-2048 case): with 64 iterations, Lean encodes 5376 T-gates = 2 · 6 · 7 · 64.

example(example)

example : gray_code_unary_lookup_toffoli_count 6 6 = 69

For RSA-2048 (q_a=6, n_addr=6): Gray-code count = 69 Toffolis.

example(example)

example :
    gray_code_unary_lookup_toffoli_count 6 6
      - qianxu_E9_lookup_gate_derived_count 6 = 5

Gap analysis: at q_a=6, Lean Gray-code count (69) is 5 more than the paper's exact `2^q_a = 64` Toffoli claim. The +5 is the initial cascade cost (n_addr - 1, since the first Toffoli is already counted in 2^q_a per Gidney).

example(example)

example : gray_code_residual_ratio 6 6 = (69, 64)

For RSA-2048: Lean Gray-code 69 vs paper 64; residual ~8%.

theoremgray_code_residual_eq_n_addr_minus_one

theorem gray_code_residual_eq_n_addr_minus_one (n_addr q_a : Nat) (h : 0 < n_addr)
    (_hq : 0 < q_a) :
    gray_code_unary_lookup_toffoli_count n_addr q_a
      = qianxu_E9_lookup_gate_derived_count q_a + (n_addr - 1)

The Gray-code Toffoli count exceeds the paper's `2^q_a` claim by exactly `n_addr - 1` — the initial-cascade setup cost.

example(example)

example :
    gray_code_unary_lookup_toffoli_count 6 6
      = qianxu_E9_lookup_gate_derived_count 6 + (6 - 1)

Concrete: at RSA-2048 (n_addr=6, q_a=6), residual = 5 = 6 - 1.

theoremlookup_review_gap_closure

theorem lookup_review_gap_closure (n_addr q_a : Nat) (h : 0 < n_addr) (hq : 0 < q_a) :
    gray_code_unary_lookup_toffoli_count n_addr q_a
      - qianxu_E9_lookup_gate_derived_count q_a
      = n_addr - 1

*Lookup-side review closure**: the Lean count exceeds the paper count by EXACTLY `n_addr - 1` Toffolis (the initial-cascade setup). Combined with Iter 44's Gidney closure, the original 12× gap at q_a=6 is **fully attributed**: 6× from Gray-code (now formalized, residual `n_addr - 1 = 5`), 2× from Gidney (Iter 44, closed).

theoremprefix_and_step_post_state_on_zero

theorem prefix_and_step_post_state_on_zero (i : Nat) :
    prefix_and_step_post_state i zeroFLook = zeroFLook

`prefix_and_step` on zero input gives zero (both `i = 0` and `i > 0` branches). Single CCX writes `xor false (false ∧ false) = false`, a no-op via `Function.update_eq_self`.

theoremprefix_and_cascade_post_state_on_zero

theorem prefix_and_cascade_post_state_on_zero : ∀ n,
    prefix_and_cascade_post_state n zeroFLook = zeroFLook
  | 0     => rfl
  | n + 1 =>

Prefix-AND cascade on zero input gives zero. Induction on n.

theoremprefix_and_cascade_uncompute_on_zero

theorem prefix_and_cascade_uncompute_on_zero
    (dim : Nat) (hdim : 0 < dim) (n : Nat)
    (hyp : ∀ i, i < n → ULookupBitDisjointness dim i) :
    uc_eval (Gate.toUCom dim
              (Gate.seq (prefix_and_cascade n) (prefix_and_uncompute n)))
      * f_to_vec dim zeroFLook
      = f_to_vec dim zeroFLook

*Cascade and its uncompute compose to identity on the zero state vector**. Direct corollary of Iter 74's matrix-level `prefix_and_cascade_uncompute_eq_one`, applied to `f_to_vec dim zeroFLook`. This is the lookup analog of Iter 89's adder zero-input smoke test (modulo the absence of the final-CX cascade analog on the lookup side).

example(example)

example :
    let post

*Concrete prefix-AND cascade action check** for a 2-step cascade on `inputF_lookup_ctrl_addr_10`. After two steps: - and_0 (qubit 2) = ctrl ∧ address_0 = 1 ∧ 1 = 1 ✓ - and_1 (qubit 4) = and_0 ∧ address_1 = 1 ∧ 0 = 0 ✓ `decide` reduces the nested `update` chain at each specific qubit index. Verifies the cascade correctly computes the AND-chain on a non-trivial input.

example(example)

example :
    let post

example(example)

example :
    let post

*3-step cascade decide-check on (ctrl=1, addr=110)**. The final AND ancilla (and_2 at qubit 6) is 0 because addr_2 = 0 breaks the chain.

example(example)

example :
    let post

*3-step cascade decide-check on (ctrl=1, addr=111)**. All AND ancillas fire to 1 (the chain propagates fully).

example(example)

example :
    tcount (unary_lookup_iteration 3 [0, 2] [0, 1, 3]) = 42

*Concrete iteration tcount** at q_a=3, |flips|=2, |cnots|=3: T-count = 14·3 = 42 (data-independent — only Toffolis count).

example(example)

example :
    gcount (unary_lookup_iteration 3 [0, 2] [0, 1, 3]) = 13

*Concrete iteration gcount** at the same instance: gcount = 2·|flips| + 2·n_addr + |cnots| = 2·2 + 2·3 + 3 = 13.

example(example)

example :
    tcount (unary_lookup_multi_iteration 3
              [([], []), ([0], [1]), ([1], [2]), ([0, 1], [0, 1])])
      = 168

*Multi-iteration concrete tcount** at q_a=3 with 4 iterations: T-count = 14·3·4 = 168 (data-independent).

example(example)

example :
    tcount (unary_lookup_multi_iteration 3
              (List.replicate 8 ([], []))) = 336

*qianxu Fig. 4b instance** (q_a=3, q_w=6, full 2^3=8 iterations with the data implied by the figure's red-highlighted Toffolis and bit-flip pattern). Counts the no-measurement no-Gray-code bound: 14·3·8 = 336 T-gates total.

example(example)

example :
    Lookup.cascade_step_invariant 0 3 true 3
      (fun i => if i = ulookup_ctrl_idx then true
                else if i = ulookup_address_idx 0 then true
                else if i = ulookup_address_idx 1 then true
                else if i = ulookup_address_idx 2 then false
                else false)

*Decide-witness on (n=3, k=0, ctrl=true, addr=3=0b011)** (Iter 219). No cascade steps applied: all and qubits are false.

example(example)

example :
    Lookup.cascade_step_invariant 2 3 true 3
      (fun i =>
        if i = ulookup_ctrl_idx then true
        else if i = ulookup_address_idx 0 then true
        else if i = ulookup_address_idx 1 then true
        else if i = ulookup_address_idx 2 then false
        else if i = ulookup_and_idx 0 then true   -- and_0 = ctrl ∧ addr_0 = 1
        else if i = ulookup_and_idx 1 then true   -- and_1 = and_0 ∧ addr_1 = 1
        else false)

*Decide-witness on (n=3, k=2, ctrl=true, addr=3)** (Iter 219). After 2 steps: and_0 = and_1 = true (chain of ANDs), and_2 = false.

example(example)

example :
    Lookup.cascade_step_invariant 3 3 true 3
      (fun i =>
        if i = ulookup_ctrl_idx then true
        else if i = ulookup_address_idx 0 then true
        else if i = ulookup_address_idx 1 then true
        else if i = ulookup_address_idx 2 then false
        else if i = ulookup_and_idx 0 then true
        else if i = ulookup_and_idx 1 then true
        else if i = ulookup_and_idx 2 then false
        else false)

*Decide-witness on (n=3, k=3, ctrl=true, addr=3)** (Iter 219). Full cascade: and_2 = (1 ∧ 1) ∧ 0 = 0 (the top bit kills it).

theoremLookup.cascade_step_preserves

theorem Lookup.cascade_step_preserves
    (k n : Nat) (hk : k < n) (ctrl : Bool) (addr : Nat) (f : Nat → Bool)
    (h_ctrl : f ulookup_ctrl_idx = ctrl)
    (h_addr : ∀ i, i < n → f (ulookup_address_idx i) = addr.testBit i)
    (h_inv : Lookup.cascade_step_invariant k n ctrl addr f) :
    Lookup.cascade_step_invariant (k + 1) n ctrl addr
      (prefix_and_step_post_state k f)

*Per-step cascade invariant preservation** (Iter 220). Given an initial state `f` satisfying the step-`k` cascade invariant (with `ctrl` and `address` contents fixed in `f`), applying `prefix_and_step_post_state k` yields a state satisfying the step-`k+1` invariant. The proof case-splits on the position `i`: `i = k`: the updated qubit. Compute the new value as `prev ∧ addr.testBit k`, where `prev = ctrl` if `k = 0` and `prev = and_{k-1} = address_and ctrl addr k` otherwise. By the definition of `address_and`, this is `address_and ctrl addr (k+1)`. `i ≠ k`: untouched (frame condition via `update_neq`). The step-`k` value carries through unchanged.

theoremprefix_and_step_post_state_frame

theorem prefix_and_step_post_state_frame
    (k : Nat) (f : Nat → Bool) (j : Nat)
    (h_neq : j ≠ ulookup_and_idx k) :
    prefix_and_step_post_state k f j = f j

*Per-step frame condition**: `prefix_and_step_post_state k f` agrees with `f` outside `ulookup_and_idx k`. Both i=0 and i>0 branches of the post-state definition write to a single qubit (`ulookup_and_idx 0` and `ulookup_and_idx k` respectively).

theoremprefix_and_cascade_post_state_frame_ctrl

theorem prefix_and_cascade_post_state_frame_ctrl
    (n : Nat) (f : Nat → Bool) :
    prefix_and_cascade_post_state n f ulookup_ctrl_idx = f ulookup_ctrl_idx

*Cascade frame for the ctrl qubit**: the n-step cascade post-state agrees with `f` at `ulookup_ctrl_idx`. Proof by structural recursion on n; each step writes to `ulookup_and_idx _ ≠ ulookup_ctrl_idx = 0`.

theoremprefix_and_cascade_post_state_frame_addr

theorem prefix_and_cascade_post_state_frame_addr
    (n : Nat) (f : Nat → Bool) (j : Nat) :
    prefix_and_cascade_post_state n f (ulookup_address_idx j)
      = f (ulookup_address_idx j)

*Cascade frame for the address bits**: the n-step cascade post-state agrees with `f` at every `ulookup_address_idx j`. Address indices have parity 1 (`1 + 2*j`); and indices have parity 0 (`2 + 2*i`), so they are always disjoint.

theoremLookup.cascade_step_invariant_holds

theorem Lookup.cascade_step_invariant_holds
    (k n : Nat) (hk : k ≤ n) (ctrl : Bool) (addr : Nat) (f : Nat → Bool)
    (h_ctrl : f ulookup_ctrl_idx = ctrl)
    (h_addr : ∀ i, i < n → f (ulookup_address_idx i) = addr.testBit i)
    (h_clean : ∀ i, i < n → f (ulookup_and_idx i) = false) :
    Lookup.cascade_step_invariant k n ctrl addr
      (prefix_and_cascade_post_state k f)

*Cascade invariant holds at every step `k ≤ n`**. By induction on `k`: `k = 0`: the cascade post-state is `f`, which has all `and_idx` qubits clean by hypothesis. Matches `if i < 0 then ... else false = false`. `k+1` step: by IH, the k-step cascade satisfies the step-`k` invariant. The cascade frame lemmas (Iter 221) ensure ctrl and address are preserved, so `Lookup.cascade_step_preserves` (Iter 220) lifts the step-`k` invariant on `cascade_post k f` to the step-`k+1` invariant on `cascade_post (k+1) f = step_post k (cascade_post k f)`.

theoremprefix_and_cascade_top_bit_eq_address_and

theorem prefix_and_cascade_top_bit_eq_address_and
    (n : Nat) (hn : 0 < n) (ctrl : Bool) (addr : Nat) (f : Nat → Bool)
    (h_ctrl : f ulookup_ctrl_idx = ctrl)
    (h_addr : ∀ i, i < n → f (ulookup_address_idx i) = addr.testBit i)
    (h_clean : ∀ i, i < n → f (ulookup_and_idx i) = false) :
    prefix_and_cascade_post_state n f (ulookup_and_idx (n - 1))
      = Lookup.address_and ctrl addr n

*Top-bit corollary** (Iter 223, 2026-05-13). After the n-step cascade, the top and-bit `ulookup_and_idx (n - 1)` carries the full `Lookup.address_and ctrl addr n` value (`ctrl ∧ ⋀_{j < n} addr.testBit j`). Direct specialization of `cascade_step_invariant_holds` at k = n and i = n - 1. This is the "trigger bit" read by the word-CNOT layer of the lookup iteration body — the value that decides whether the table row fires on this iteration's address. Lookup analog of Iter 199's `Adder.sumfb_eq_testBit_add` (final-bit extraction from the forward cascade).

example(example)

example :
    let f : Nat → Bool

*Decide-witness on (n=3, ctrl=true, addr=7=0b111)** (Iter 223). Full address all-ones: top and-bit = ctrl ∧ 1 ∧ 1 ∧ 1 = true.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on (n=3, ctrl=true, addr=3=0b011)** (Iter 223). Top address bit (addr_2) is 0, killing the chain: top and-bit = false.

theoremLookup.x_flip_post_state_frame

theorem Lookup.x_flip_post_state_frame
    (xs : List Nat) (f : Nat → Bool) (j : Nat) (h : j ∉ xs) :
    Lookup.x_flip_post_state xs f j = f j

*X-flip layer frame condition**: positions not in the flip list are unchanged by the layer.

theoremLookup.cnot_layer_post_state_frame

theorem Lookup.cnot_layer_post_state_frame
    (ctrl : Nat) (xs : List Nat) (f : Nat → Bool) (j : Nat) (h : j ∉ xs) :
    Lookup.cnot_layer_post_state ctrl xs f j = f j

*CNOT-layer frame condition**: positions not in the target list AND not equal to the control are unchanged. (The control itself is preserved by a separate lemma since CX never targets ctrl.)

theoremLookup.cnot_layer_post_state_ctrl_unchanged

theorem Lookup.cnot_layer_post_state_ctrl_unchanged
    (ctrl : Nat) (xs : List Nat) (f : Nat → Bool) (h_ctrl_not_tgt : ctrl ∉ xs) :
    Lookup.cnot_layer_post_state ctrl xs f ctrl = f ctrl

*CNOT-layer preserves the control qubit** (Iter 224). The control `ctrl` is never the target of any CX in this layer. (For this lemma we additionally need `ctrl ∉ xs`, since CX(ctrl, ctrl) is malformed in our gate IR but the post-state def doesn't enforce that.)

example(example)

example :
    let f : Nat → Bool

*Decide-witness on x-flip layer**: starting from f ≡ false on positions {0, 1, 2}, flipping {0, 2} produces (true, false, true).

example(example)

example :
    let f : Nat → Bool

*Decide-witness on CNOT layer**: with ctrl=0 (true) and targets {1, 2, 3} (initially all false), each XORs with ctrl → all become true.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on CNOT layer with ctrl=false**: when control is false, no XOR fires; targets remain at their initial values.

theoremLookup.x_flip_post_state_at

theorem Lookup.x_flip_post_state_at
    (xs : List Nat) (h_nodup : xs.Nodup) (f : Nat → Bool) (j : Nat)
    (h_in : j ∈ xs) :
    Lookup.x_flip_post_state xs f j = ! (f j)

*X-flip value-at-element**: for `j ∈ xs` with `xs.Nodup`, the layer flips `f j` exactly once.

theoremLookup.cnot_layer_post_state_at

theorem Lookup.cnot_layer_post_state_at
    (ctrl : Nat) (xs : List Nat) (h_nodup : xs.Nodup)
    (h_ctrl_not_in : ctrl ∉ xs) (f : Nat → Bool) (tgt : Nat)
    (h_in : tgt ∈ xs) :
    Lookup.cnot_layer_post_state ctrl xs f tgt = xor (f tgt) (f ctrl)

*CNOT-layer value-at-element**: for `tgt ∈ xs` with `xs.Nodup` AND `ctrl ∉ xs` (so the control wire is preserved), the layer XORs `f tgt` with `f ctrl` exactly once.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on x_flip_post_state_at**: with f = false everywhere, flipping {0, 2}, queried at j=2 (in the list): result = true.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on cnot_layer_post_state_at**: with f(0)=true and f(2)=false, CNOT layer ctrl=0, targets {1,2,3}, queried at tgt=2 (in the list): result = false ⊕ true = true.

theoremLookup.x_flip_post_state_involution

theorem Lookup.x_flip_post_state_involution
    (xs : List Nat) (h_nodup : xs.Nodup) (f : Nat → Bool) :
    Lookup.x_flip_post_state xs (Lookup.x_flip_post_state xs f) = f

*X-flip layer involution**: with `xs.Nodup`, applying the X-flip layer twice returns to the identity. By funext + case-split on `j ∈ xs` vs `j ∉ xs`, using value-at-element (Iter 225) for the in-list case and the frame lemma (Iter 224) for the not-in-list case.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on x-flip involution** at (xs=[0,2], f=fun i => i = 1). Both layers cancel; result equals input.

theoremprefix_and_step_post_state_at_and_zero

theorem prefix_and_step_post_state_at_and_zero (f : Nat → Bool) :
    prefix_and_step_post_state 0 f (ulookup_and_idx 0)
      = xor (f (ulookup_and_idx 0))
            (f ulookup_ctrl_idx && f (ulookup_address_idx 0))

*Step post-state value at the and-bit (k=0 branch)**.

theoremprefix_and_step_post_state_at_and_succ

theorem prefix_and_step_post_state_at_and_succ
    (k : Nat) (hk : k ≠ 0) (f : Nat → Bool) :
    prefix_and_step_post_state k f (ulookup_and_idx k)
      = xor (f (ulookup_and_idx k))
            (f (ulookup_and_idx (k - 1)) && f (ulookup_address_idx k))

*Step post-state value at the and-bit (k>0 branch)**.

theoremprefix_and_step_post_state_involution

theorem prefix_and_step_post_state_involution
    (k : Nat) (f : Nat → Bool) :
    prefix_and_step_post_state k (prefix_and_step_post_state k f) = f

*Boolean-level step involution**: applying `prefix_and_step_post_state k` twice yields the identity. The step's only write is to `ulookup_and_idx k`, XORing it with a frame (`f ctrl ∧ f addr_0` for k=0, `f and_{k-1} ∧ f addr_k` for k>0). The frame depends only on positions OTHER than `and_k`, so the second application sees the SAME frame value, and the XOR cancels. Holds for arbitrary `f` — no clean-state hypothesis.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on step involution at k=0**: arbitrary input, apply step 0 twice, get input back.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on step involution at k=2** (k > 0 branch).

theoremprefix_and_cascade_uncompute_post_state_eq_id

theorem prefix_and_cascade_uncompute_post_state_eq_id
    (n : Nat) (f : Nat → Bool) :
    prefix_and_uncompute_post_state n (prefix_and_cascade_post_state n f) = f

*Boolean-level cascade · uncompute = identity**. Applying the forward n-step cascade post-state then the n-step uncompute post-state returns to the input `f`. Proof by induction on n + Iter 227's step involution. Lookup analog of Iter 76's matrix-level `prefix_and_cascade_uncompute_eq_one`.

example(example)

example :
    let f : Nat → Bool

*Decide-witness on cascade · uncompute = id** at n=3 with a small concrete input function.

FormalRV.Arithmetic.UnaryLookup.UnaryLookupGrayCode

FormalRV/Arithmetic/UnaryLookup/UnaryLookupGrayCode.lean

FormalRV.Arithmetic.UnaryLookup.UnaryLookupGrayCode — the GATE-LEVEL Gray-code/sawtooth QROM lookup (babbush2018 unary iteration with the branch-switch CX trick). ## What this file makes real The repo's faithful per-row QROM read (`lookupReadAt`, built on `BQAlgo.unary_lookup_multi_iteration`) re-runs the full `w`-deep prefix-AND cascade for every one of the `2^w` table rows: `14·w·2^w` T (= `2·w·2^w` Toffolis) per table read. The Gray-code amortization that Gidney–Ekerå 2021 (and qianxu p. 23) charge for was previously present ONLY as the cost formula `BQAlgo.gray_code_unary_lookup_toffoli_count` — a number with no circuit behind it. This file builds the actual circuit: a recursive unary-iteration tree walk (`grayWalk`) over the SAME wire layout as the faithful read (`ulookup_ctrl_idx` / `ulookup_address_idx` / `ulookup_and_idx`, word bits at caller-chosen positions `pos j`). Per internal node at level `i` with parent wire `p` and address wire `a_i`: ENTER : `X a_i ; CCX p a_i and_i ; X a_i` -- and_i := p ∧ ¬a_i (1 Toffoli) (recurse into the 0-subtree) SWITCH : `CX p and_i` -- and_i := p ∧ a_i (0 Toffolis!) (recurse into the 1-subtree) EXIT : `CCX p a_i and_i` -- and_i := 0 (1 Toffoli) The SWITCH line is the sawtooth trick: moving from the 0-branch to the 1-branch costs a single CX, not a recompute of the ladder. Each of the `2^w − 1` internal nodes costs exactly 2 Toffolis (enter + exit), so a full table read costs `2·(2^w − 1)` Toffolis (`tcount_grayLookupReadAt`) instead of the faithful read's `2·w·2^w` — exactly the `lookupReadAt = w × (grayLookupReadAt + ε)` gap proved in `tcount_lookupReadAt_eq_w_mul_gray`. ## Contract parity (drop-in for the windowed machinery) `grayLookupReadAt_selects_word` / `grayLookupReadAt_frame` have the SAME statement shape as the faithful read's `lookupReadAt_selects_word` / `lookupReadAt_frame` (FormalRV/Arithmetic/Windowed/WindowedLookupSelect.lean): with ctrl set, the address register holding `v < 2^w`, and the AND-ladder clean, the read XORs exactly `(T v).testBit j` into `pos j` and restores everything else. So the Gray-code read can replace the faithful read inside the windowed multiplier wholesale. ## The residual ×2 vs the paper's `2^w` Gidney–Ekerå / qianxu (E9, `PaperClaims.qianxu_E9_lookup_gate_derived_count`) charge `2^w` Toffolis per lookup, not our `2·(2^w − 1) ≈ 2·2^w`. The missing factor 2 is the measurement-based uncompute (the EXIT Toffolis are replaced by X-basis measurements + classically-controlled Cliffords), which is NOT expressible in this file's pure X/CX/CCX `Gate` IR — that leg lives in `FormalRV/Shor/MeasUncompute.lean` (extended-IR `EGate` with measurement). Within the reversible IR, `2·(2^w − 1)` is the honest optimum this construction reaches, and the audit-bridge theorems below quantify both the ×w saving over the faithful read and the ×2 residual.

theoremgray_switch

private theorem gray_switch (P b : Bool) : xor (P && !b) P = (P && b)

SWITCH-line algebra: with the ladder ancilla holding `P ∧ ¬b`, XOR-ing the parent `P` in (the sawtooth CX) leaves `P ∧ b`.

theoremgray_anc_restore

private theorem gray_anc_restore (c P b : Bool) :
    xor (xor (xor c (P && !b)) P) (P && b) = c

EXIT-line algebra: the ladder ancilla value `((c ⊕ (P ∧ ¬b)) ⊕ P) ⊕ (P ∧ b)` collapses back to its initial value `c` — the enter/switch/exit cycle is an exact conjugation on the ladder wire, whatever the initial state.

theoremgrayCxLayer_not_mem

theorem grayCxLayer_not_mem (c : Nat) (xs : List Nat) (f : Nat → Bool) (p : Nat)
    (hp : p ∉ xs) :
    Gate.applyNat (cx_gates_from_indices c xs) f p = f p

Frame for the CX fan-out layer: a position not in the target list is untouched (no `Nodup` or control-membership conditions needed).

theoremgrayCxLayer_mem

theorem grayCxLayer_mem (c : Nat) (xs : List Nat) (f : Nat → Bool) (p : Nat)
    (hnd : xs.Nodup) (hc : c ∉ xs) (hp : p ∈ xs) :
    Gate.applyNat (cx_gates_from_indices c xs) f p = xor (f p) (f c)

Action of the CX fan-out layer at a member of a duplicate-free target list (control not a target): the control is XOR'd in exactly once.

defgrayMidBits

def grayMidBits (v : Nat) : Nat → Nat → Nat
  | _, 0 => 0
  | i, d + 1 => (if v.testBit i then 2 ^ i else 0) + grayMidBits v (i + 1) d

Bits `i, i+1, …, i+d−1` of `v`, in place: the leaf-value contribution of the levels the walk has yet to traverse.

theoremgray_mod_two_pow_succ

private theorem gray_mod_two_pow_succ (v i : Nat) :
    v % 2 ^ i + (if v.testBit i then 2 ^ i else 0) = v % 2 ^ (i + 1)

One binary digit of the mod tower: `v % 2^(i+1) = v % 2^i + (bit i of v)·2^i`.

theoremgrayMidBits_mod

theorem grayMidBits_mod (v : Nat) : ∀ (d i : Nat),
    v % 2 ^ i + grayMidBits v i d = v % 2 ^ (i + d)
  | 0, i => by simp [grayMidBits]
  | d + 1, i =>

`v % 2^i` plus the remaining mid-bits reconstructs `v % 2^(i+d)`.

theoremgrayMidBits_eq_self

theorem grayMidBits_eq_self (v w : Nat) (hv : v < 2 ^ w) : grayMidBits v 0 w = v

From the root (`i = 0`, depth `w`), the walk's selected leaf is `v` itself (for in-range addresses `v < 2^w`).

defgrayWalk

def grayWalk (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    Nat → Nat → Nat → Nat → Gate
  | 0, _, parent, vPrefix =>
      cx_gates_from_indices parent (wordCnotsAt pos W (T vPrefix))
  | d + 1, i, parent, vPrefix =>
      Gate.seq (Gate.seq (Gate.seq (Gate.seq (Gate.seq (Gate.seq
        (Gate.X (ulookup_address_idx i))
        (Gate.CCX parent (ulookup_address_idx i) (ulookup_and_idx i)))
        (Gate.X (ulookup_address_idx i)))
        (grayWalk pos W T d (i + 1) (ulookup_and_idx i) vPrefix))
        (Gate.CX parent (ulookup_and_idx i)))
        (grayWalk pos W T d (i + 1) (ulookup_and_idx i) (vPrefix + 2 ^ i)))

*The Gray-code/sawtooth tree walk** (babbush2018 unary iteration with the branch-switch CX trick), on the faithful lookup's wire layout. `grayWalk pos W T d i parent vPrefix` is the subtree at ladder level `i` with `d` levels remaining (`d = w − i` from a root call), parent wire `parent` (the ctrl qubit at the root, `ulookup_and_idx (i−1)` below), and `vPrefix` the row-value bits accumulated on the path so far (bit `k` of the row value is the branch taken at level `k`, LSB-first). Leaf (`d = 0`): word-CNOTs for table row `T vPrefix`, controlled on the deepest ladder wire (= `parent`), targets `pos j` for the set bits. Internal node: ENTER (1 Toffoli, computes `parent ∧ ¬a_i` onto the ladder), 0-subtree, SWITCH (1 CX — the sawtooth), 1-subtree, EXIT (1 Toffoli, returns the ladder wire to its initial value).

defgrayLookupReadAt

def grayLookupReadAt (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) : Gate

*The Gray-code QROM read**: position-compatible replacement for the faithful `lookupReadAt w pos W T` (same ctrl/address/AND-ladder wires, same word positions `pos`), at `2·(2^w − 1)` Toffolis instead of `2·w·2^w`.

theoremgrayEnter_state

private theorem grayEnter_state (i parent : Nat) (hpar : parent ≤ 2 * i) (f : Nat → Bool) :
    Gate.applyNat (Gate.X (ulookup_address_idx i))
      (Gate.applyNat (Gate.CCX parent (ulookup_address_idx i) (ulookup_and_idx i))
        (Gate.applyNat (Gate.X (ulookup_address_idx i)) f))
      = update f (ulookup_and_idx i)
          (xor (f (ulookup_and_idx i)) (f parent && !f (ulookup_address_idx i)))

theoremgrayWalk_frame

theorem grayWalk_frame (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) (d : Nat) :
    ∀ (i parent vPrefix : Nat) (f : Nat → Bool) (p : Nat),
      parent ≤ 2 * i →
      (∀ j, j < W → 2 * (i + d) < pos j) →
      (∀ j, j < W → p ≠ pos j) →
      Gate.applyNat (grayWalk pos W T d i parent vPrefix) f p = f p

theoremgrayWalk_selects_word

theorem grayWalk_selects_word (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) (v : Nat)
    (hinj : ∀ j k, j < W → k < W → pos j = pos k → j = k)
    (j : Nat) (hj : j < W) (d : Nat) :
    ∀ (i parent vPrefix : Nat) (f : Nat → Bool),
      parent ≤ 2 * i →
      (∀ ℓ, i ≤ ℓ → ℓ < i + d → f (ulookup_address_idx ℓ) = v.testBit ℓ) →
      (∀ ℓ, i ≤ ℓ → ℓ < i + d → f (ulookup_and_idx ℓ) = false) →
      (∀ k, k < W → 2 * (i + d) < pos k) →
      Gate.applyNat (grayWalk pos W T d i parent vPrefix) f (pos j)
        = xor (f (pos j))
              (f parent && (T (vPrefix + grayMidBits v i d)).testBit j)

theoremgrayLookupReadAt_selects_word

theorem grayLookupReadAt_selects_word
    (w W : Nat) (T : Nat → Nat) (pos : Nat → Nat) (f : Nat → Bool) (v : Nat)
    (hw : 0 < w) (hv : v < 2 ^ w)
    (hctrl : f ulookup_ctrl_idx = true)
    (haddr : ∀ i, i < w → f (ulookup_address_idx i) = v.testBit i)
    (hand : ∀ i, i < w → f (ulookup_and_idx i) = false)
    (hpos_high : ∀ j, j < W → 2 * w < pos j)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k)
    (j : Nat) (hj : j < W) :
    Gate.applyNat (grayLookupReadAt w pos W T) f (pos j)
      = xor (f (pos j)) ((T v).testBit j)

*HEADLINE (word conjunct).** With ctrl set, the address register holding `v < 2^w`, and the AND-ladder clean, the Gray-code/sawtooth QROM read `grayLookupReadAt w pos W T` XORs exactly the addressed table row into the word positions: bit `j` of `T v` lands at `pos j`. Same contract as the faithful `lookupReadAt_selects_word`, at `2·(2^w − 1)` Toffolis instead of `2·w·2^w`. (`hw` is kept for exact contract parity with the faithful read; the Gray-code statement also holds at `w = 0`.)

theoremgrayLookupReadAt_frame

theorem grayLookupReadAt_frame
    (w W : Nat) (T : Nat → Nat) (pos : Nat → Nat) (f : Nat → Bool)
    (hpos_high : ∀ j, j < W → 2 * w < pos j)
    (p : Nat) (hp : ∀ j, j < W → p ≠ pos j) :
    Gate.applyNat (grayLookupReadAt w pos W T) f p = f p

*HEADLINE (frame conjunct).** Every position that is not a word target (`pos j`, `j < W`) is unchanged by the Gray-code read — ctrl preserved, address restored, AND-ladder returned to its initial state, everything else untouched. Same contract as the faithful `lookupReadAt_frame` (and like it, requires NO assumptions on the input state).

theoremgrayLookupReadAt_selects

theorem grayLookupReadAt_selects
    (w W : Nat) (T : Nat → Nat) (pos : Nat → Nat) (f : Nat → Bool) (v : Nat)
    (hw : 0 < w) (hv : v < 2 ^ w)
    (hctrl : f ulookup_ctrl_idx = true)
    (haddr : ∀ i, i < w → f (ulookup_address_idx i) = v.testBit i)
    (hand : ∀ i, i < w → f (ulookup_and_idx i) = false)
    (hpos_high : ∀ j, j < W → 2 * w < pos j)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k) :
    (∀ j, j < W →
      Gate.applyNat (grayLookupReadAt w pos W T) f (pos j)
        = xor (f (pos j)) ((T v).testBit j))
    ∧ (∀ p, (∀ j, j < W → p ≠ pos j) →

*HEADLINE (packaged)** — mirror of the faithful `lookupReadAt_selects`: the Gray-code/sawtooth QROM read reads exactly the addressed table row, and restores everything else.

theoremtcount_grayWalk

theorem tcount_grayWalk (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) (d : Nat) :
    ∀ (i parent vPrefix : Nat),
      tcount (grayWalk pos W T d i parent vPrefix) = 14 * (2 ^ d - 1)

T-count of the walk: each of the `2^d − 1` internal nodes of a depth-`d` subtree costs exactly the ENTER + EXIT Toffoli pair (`14` T); the SWITCH CX, the X-conjugation, and the leaf word-CNOTs are T-free.

theoremtcount_grayLookupReadAt

theorem tcount_grayLookupReadAt (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    tcount (grayLookupReadAt w pos W T) = 14 * (2 ^ w - 1)

*T-count of the Gray-code table read**: `14·(2^w − 1)` (vs the faithful read's `14·w·2^w`, `tcount_lookupReadAt`).

theoremtoffoliCount_grayLookupReadAt

theorem toffoliCount_grayLookupReadAt (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    toffoliCount (grayLookupReadAt w pos W T) = 2 * (2 ^ w - 1)

*Toffoli count of the Gray-code table read**: `2·(2^w − 1)` — one ENTER + one EXIT Toffoli per internal node of the depth-`w` binary tree.

theoremtoffoliCount_lookupReadAt

theorem toffoliCount_lookupReadAt (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    toffoliCount (lookupReadAt w pos W T) = 2 * w * 2 ^ w

Toffoli count of the faithful per-row read, for comparison: `2·w·2^w`.

theoremtcount_grayLookupReadAt_le_lookupReadAt

theorem tcount_grayLookupReadAt_le_lookupReadAt
    (w : Nat) (hw : 0 < w) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    tcount (grayLookupReadAt w pos W T) ≤ tcount (lookupReadAt w pos W T)

The Gray-code read is never more expensive than the faithful read.

theoremtcount_lookupReadAt_eq_w_mul_gray

theorem tcount_lookupReadAt_eq_w_mul_gray
    (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    tcount (lookupReadAt w pos W T)
      = w * (tcount (grayLookupReadAt w pos W T) + 14)

*Exact-gap identity (T-count)**: the faithful read costs exactly `w` times the Gray-code read plus `w` ENTER/EXIT pairs: `14·w·2^w = w·(14·(2^w − 1) + 14)`.

theoremtoffoliCount_lookupReadAt_eq_w_mul_gray

theorem toffoliCount_lookupReadAt_eq_w_mul_gray
    (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    toffoliCount (lookupReadAt w pos W T)
      = w * (toffoliCount (grayLookupReadAt w pos W T) + 2)

*Exact-gap identity (Toffolis)**: `2·w·2^w = w·(2·(2^w − 1) + 2)`.

theoremtoffoliCount_grayLookupReadAt_vs_formula

theorem toffoliCount_grayLookupReadAt_vs_formula
    (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    gray_code_unary_lookup_toffoli_count w w
        ≤ toffoliCount (grayLookupReadAt w pos W T)
      ∧ toffoliCount (grayLookupReadAt w pos W T)
        ≤ 2 * gray_code_unary_lookup_toffoli_count w w

The realized circuit sits within ×2 of the scaffolded cost formula `gray_code_unary_lookup_toffoli_count w w = w + (2^w − 1)` — sandwich: `w + (2^w − 1) ≤ 2·(2^w − 1) ≤ 2·(w + (2^w − 1))`.

defgraySmokeF1

private def graySmokeF1 : Nat → Bool

w = 1, table `T v = v + 1`, words at 4,5, address `v = 1`: row `T 1 = 2` (bits `01`₂ reversed: bit 0 = 0, bit 1 = 1) lands on the words.

example(example)

example :  -- word bit 0 of T 1 = 2: stays 0
    Gate.applyNat (grayLookupReadAt 1 (fun j => 4 + j) 2 (fun v => v + 1)) graySmokeF1 4
      = false

example(example)

example :  -- word bit 1 of T 1 = 2: flips to 1
    Gate.applyNat (grayLookupReadAt 1 (fun j => 4 + j) 2 (fun v => v + 1)) graySmokeF1 5
      = true

example(example)

example :  -- ladder ancilla restored clean
    Gate.applyNat (grayLookupReadAt 1 (fun j => 4 + j) 2 (fun v => v + 1)) graySmokeF1 2
      = false

example(example)

example :  -- address wire restored
    Gate.applyNat (grayLookupReadAt 1 (fun j => 4 + j) 2 (fun v => v + 1)) graySmokeF1 1
      = true

defgraySmokeF2

private def graySmokeF2 : Nat → Bool

w = 2, table `T v = 3·v`, words at 6,7,8, address `v = 2` (bit 0 = 0 at wire 1, bit 1 = 1 at wire 3): row `T 2 = 6 = 110₂`.

example(example)

example :
    (Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 6,
     Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 7,
     Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 8)
      = (false, true, true)

example(example)

example :  -- both ladder ancillas restored, both address wires restored, ctrl kept
    (Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 2,
     Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 4,
     Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 1,
     Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 3,
     Gate.applyNat (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) graySmokeF2 0)
      = (false, false, false, true, true)

example(example)

example :  -- the count theorems, instantiated: w = 2 read = 2·(2²−1) = 6 Toffolis
    toffoliCount (grayLookupReadAt 2 (fun j => 6 + j) 3 (fun v => 3 * v)) = 6

FormalRV.Arithmetic.UnaryLookup.UnaryLookupIterationCorrectness

FormalRV/Arithmetic/UnaryLookup/UnaryLookupIterationCorrectness.lean

theoremLookup.cnot_layer_post_state_preserves_and_bit

theorem Lookup.cnot_layer_post_state_preserves_and_bit
    (n_addr : Nat) (ctrl_idx : Nat) (word_cnot_idxs : List Nat)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (f : Nat → Bool) (k : Nat) (hk : k < n_addr) :
    Lookup.cnot_layer_post_state ctrl_idx word_cnot_idxs f (ulookup_and_idx k)
      = f (ulookup_and_idx k)

*CNOT layer with word-register targets preserves any and-bit** at `ulookup_and_idx k` for `k < n_addr`. By the frame lemma (Iter 224) + disjointness `and_idx k = 2 + 2*k < 1 + 2*n_addr ≤ word_idx _ j`.

theoremLookup.cnot_layer_post_state_preserves_ctrl

theorem Lookup.cnot_layer_post_state_preserves_ctrl
    (n_addr : Nat) (ctrl_idx : Nat) (word_cnot_idxs : List Nat)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (f : Nat → Bool) :
    Lookup.cnot_layer_post_state ctrl_idx word_cnot_idxs f ulookup_ctrl_idx
      = f ulookup_ctrl_idx

*CNOT layer with word targets preserves the ctrl qubit** (qubit 0). Special case of the general ctrl-preservation lemma; the layer's declared control is `and_idx (n_addr - 1)` which is NOT `ulookup_ctrl_idx = 0`, and word targets all exceed 0.

theoremLookup.cnot_layer_post_state_preserves_address

theorem Lookup.cnot_layer_post_state_preserves_address
    (n_addr : Nat) (ctrl_idx : Nat) (word_cnot_idxs : List Nat)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (f : Nat → Bool) (i : Nat) (hi : i < n_addr) :
    Lookup.cnot_layer_post_state ctrl_idx word_cnot_idxs f (ulookup_address_idx i)
      = f (ulookup_address_idx i)

*CNOT layer with word targets preserves each address qubit** `ulookup_address_idx i` for `i < n_addr`. Word indices start at `1 + 2*n_addr`, while address indices are `1 + 2*i ≤ 1 + 2*(n_addr - 1) < 1 + 2*n_addr`.

example(example)

example :
    let f : Nat → Bool

*Decide-witness**: with n_addr=3 and word_cnot_idxs = [7, 8, 12] (all valid word indices ≥ 1 + 2*3 = 7), the and-bit at position `ulookup_and_idx 2 = 6` is preserved by the CNOT layer.

theoremprefix_and_uncompute_post_state_frame_ctrl

theorem prefix_and_uncompute_post_state_frame_ctrl
    (n : Nat) (f : Nat → Bool) :
    prefix_and_uncompute_post_state n f ulookup_ctrl_idx = f ulookup_ctrl_idx

*Uncompute frame at ctrl_idx**: the n-step uncompute post-state preserves `ulookup_ctrl_idx`. Direct analog of Iter 221's `prefix_and_cascade_post_state_frame_ctrl`.

theoremprefix_and_uncompute_post_state_frame_addr

theorem prefix_and_uncompute_post_state_frame_addr
    (n : Nat) (f : Nat → Bool) (j : Nat) :
    prefix_and_uncompute_post_state n f (ulookup_address_idx j)
      = f (ulookup_address_idx j)

*Uncompute frame at every address bit**: preserves `ulookup_address_idx j` for any `j`.

theoremprefix_and_cascade_post_state_frame_word

theorem prefix_and_cascade_post_state_frame_word
    (n n_addr : Nat) (f : Nat → Bool) (j : Nat) (hn : n ≤ n_addr) :
    prefix_and_cascade_post_state n f (ulookup_word_idx n_addr j)
      = f (ulookup_word_idx n_addr j)

*Cascade frame at every word bit**: preserves `ulookup_word_idx n_addr j` for any `j` (word indices `≥ 1 + 2·n_addr` are disjoint from and-indices `≤ 2·n` for the cascade's n-many writes).

theoremprefix_and_uncompute_post_state_frame_word

theorem prefix_and_uncompute_post_state_frame_word
    (n n_addr : Nat) (f : Nat → Bool) (j : Nat) (hn : n ≤ n_addr) :
    prefix_and_uncompute_post_state n f (ulookup_word_idx n_addr j)
      = f (ulookup_word_idx n_addr j)

*Uncompute frame at every word bit**: symmetric to the cascade word-frame.

theoremLookup.iteration_post_state_preserves_ctrl

theorem Lookup.iteration_post_state_preserves_ctrl
    (n_addr : Nat) (addr_flip_idxs word_cnot_idxs : List Nat)
    (h_ctrl_not_flip : ulookup_ctrl_idx ∉ addr_flip_idxs)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (f : Nat → Bool) :
    Lookup.iteration_post_state n_addr addr_flip_idxs word_cnot_idxs f
        ulookup_ctrl_idx = f ulookup_ctrl_idx

*Iteration preserves ctrl**. Requires `ctrl_idx ∉ addr_flip_idxs` (X-flip layers don't touch ctrl) and `AllWordIdx n_addr word_cnot_idxs` (CNOT-on-word doesn't touch ctrl, which has index 0 < 1 + 2·n_addr).

theoremLookup.iteration_post_state_preserves_address

theorem Lookup.iteration_post_state_preserves_address
    (n_addr : Nat) (addr_flip_idxs word_cnot_idxs : List Nat)
    (h_flip_nodup : addr_flip_idxs.Nodup)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (f : Nat → Bool) (i : Nat) (hi : i < n_addr) :
    Lookup.iteration_post_state n_addr addr_flip_idxs word_cnot_idxs f
        (ulookup_address_idx i) = f (ulookup_address_idx i)

*Iteration preserves every address bit** `ulookup_address_idx i` for `i < n_addr`. The two outer X-flip layers cancel by involution (Iter 226), and the inner 3 stages each preserve address bits via register-level frame lemmas.

theoremprefix_and_cascade_post_state_frame_general

theorem prefix_and_cascade_post_state_frame_general
    (n : Nat) (f : Nat → Bool) (j : Nat)
    (h : ∀ k, k < n → j ≠ ulookup_and_idx k) :
    prefix_and_cascade_post_state n f j = f j

*General cascade frame**: positions outside `{ulookup_and_idx k : k < n}` are unchanged by the n-step forward cascade.

theoremprefix_and_uncompute_post_state_frame_general

theorem prefix_and_uncompute_post_state_frame_general
    (n : Nat) (f : Nat → Bool) (j : Nat)
    (h : ∀ k, k < n → j ≠ ulookup_and_idx k) :
    prefix_and_uncompute_post_state n f j = f j

*General uncompute frame**: positions outside `{ulookup_and_idx k : k < n}` are unchanged by the n-step reverse uncompute. Symmetric to the cascade general frame above.

theoremLookup.iteration_post_state_preserves_outside_word_targets

theorem Lookup.iteration_post_state_preserves_outside_word_targets
    (n_addr : Nat) (addr_flip_idxs word_cnot_idxs : List Nat)
    (h_flip_addr : ∀ x ∈ addr_flip_idxs,
                       ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (f : Nat → Bool) (p : Nat) (h_p_word : 1 + 2 * n_addr ≤ p)
    (h_not_target : p ∉ word_cnot_idxs) :
    Lookup.iteration_post_state n_addr addr_flip_idxs word_cnot_idxs f p = f p

*Iteration preserves any word-register position not in CNOT targets**. Requires: - `addr_flip_idxs` are all valid address indices (so they don't include word positions). - `word_cnot_idxs` consist of word indices (`AllWordIdx`). - `p` is in the word register (`1 + 2·n_addr ≤ p`) and not in the CNOT target list.

theoremLookup.iteration_post_state_at_word_target

theorem Lookup.iteration_post_state_at_word_target
    (n_addr : Nat) (hn : 0 < n_addr)
    (addr_flip_idxs word_cnot_idxs : List Nat)
    (h_word_nodup : word_cnot_idxs.Nodup)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (h_flip_addr : ∀ x ∈ addr_flip_idxs,
                       ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (f : Nat → Bool) (p : Nat) (h_in : p ∈ word_cnot_idxs) :
    Lookup.iteration_post_state n_addr addr_flip_idxs word_cnot_idxs f p
      = xor (f p)
            (prefix_and_cascade_post_state n_addr
              (Lookup.x_flip_post_state addr_flip_idxs f)

*Iteration's trigger XOR at word targets**. For any `p ∈ word_cnot_idxs` (a target of the middle CNOT layer), the iteration post-state is `f p XOR T`, where `T = prefix_and_cascade_post_state n_addr (x_flip_post_state addr_flip_idxs f) (ulookup_and_idx (n_addr - 1))` is the cascade's top-bit trigger.

theoremprefix_and_step_post_state_commute_update_word

theorem prefix_and_step_post_state_commute_update_word
    (k n_addr : Nat) (hk : k < n_addr)
    (f : Nat → Bool) (p : Nat) (v : Bool)
    (h_p : 1 + 2 * n_addr ≤ p) :
    prefix_and_step_post_state k (Function.update f p v)
      = Function.update (prefix_and_step_post_state k f) p v

*Step commutes with word-update**: if `p ≥ 1 + 2·n_addr` (a word position) and `k < n_addr`, then applying step `k` after an update at `p` is the same as updating after step `k`.

theoremprefix_and_uncompute_post_state_commute_update_word

theorem prefix_and_uncompute_post_state_commute_update_word
    (n n_addr : Nat) (hn : n ≤ n_addr)
    (f : Nat → Bool) (p : Nat) (v : Bool)
    (h_p : 1 + 2 * n_addr ≤ p) :
    prefix_and_uncompute_post_state n (Function.update f p v)
      = Function.update (prefix_and_uncompute_post_state n f) p v

*Uncompute commutes with word-update**: applying uncompute to an update at a word position equals updating after the uncompute. Direct induction on `n` using Iter 237's step commutation.

theoremprefix_and_uncompute_post_state_at_and_invariant_under_cnot_layer

theorem prefix_and_uncompute_post_state_at_and_invariant_under_cnot_layer
    (n n_addr : Nat) (hn : n ≤ n_addr)
    (ctrl_idx : Nat) (cnots : List Nat)
    (h_cnots_word : Lookup.AllWordIdx n_addr cnots)
    (f : Nat → Bool) (k : Nat) (hk : k < n_addr) :
    prefix_and_uncompute_post_state n
      (Lookup.cnot_layer_post_state ctrl_idx cnots f) (ulookup_and_idx k)
      = prefix_and_uncompute_post_state n f (ulookup_and_idx k)

*CNOT-layer invariance at and-bits**: the n-step uncompute output at any and-bit position is unchanged when the input is preprocessed by a CNOT layer with word-register targets. Proof: induction on the CNOT target list, using `prefix_and_uncompute_post_state_commute_update_word` at each list step.

theoremLookup.iteration_post_state_preserves_and

theorem Lookup.iteration_post_state_preserves_and
    (n_addr : Nat) (addr_flip_idxs word_cnot_idxs : List Nat)
    (h_flip_addr : ∀ x ∈ addr_flip_idxs,
                       ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (f : Nat → Bool) (k : Nat) (hk : k < n_addr) :
    Lookup.iteration_post_state n_addr addr_flip_idxs word_cnot_idxs f
        (ulookup_and_idx k) = f (ulookup_and_idx k)

*Iteration preserves every and-bit** at `ulookup_and_idx k` for `k < n_addr`. The proof composes Iter 226 X-flip frame + Iter 238 CNOT-uncompute congruence + Iter 229 cascade·uncompute=id.

theoremLookup.unary_lookup_iteration_correct

theorem Lookup.unary_lookup_iteration_correct
    (n_addr : Nat) (hn : 0 < n_addr)
    (addr_flip_idxs word_cnot_idxs : List Nat)
    (h_flip_addr : ∀ x ∈ addr_flip_idxs,
                       ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_flip_nodup : addr_flip_idxs.Nodup)
    (h_word : Lookup.AllWordIdx n_addr word_cnot_idxs)
    (h_word_nodup : word_cnot_idxs.Nodup)
    (f : Nat → Bool) :
    -- (1) Word targets get XOR'd with the trigger.
    (∀ p, p ∈ word_cnot_idxs →
      Lookup.iteration_post_state n_addr addr_flip_idxs word_cnot_idxs f p

*Headline: `unary_lookup_iteration` classical action**. For valid inputs (flip indices are address; word_cnot_idxs are word-register indices), the iteration post-state has the following form at every position: 1. `p ∈ word_cnot_idxs`: `xor (f p) trigger` — written by the CNOT layer with the cascade-top-bit trigger. 2. `p = ulookup_ctrl_idx`: preserved. 3. `p = ulookup_address_idx i` for `i < n_addr`: restored to `f p` (X-flip layers cancel by involution). 4. `p = ulookup_and_idx k` for `k < n_addr`: returned to clean (cascade · uncompute = id, modulo CNOT-layer-invariance at and-bits). 5. `p` a word index, `p ∉ word_cnot_idxs`: preserved.

theoremLookup.cascade_top_bit_under_x_flip

theorem Lookup.cascade_top_bit_under_x_flip
    (n_addr : Nat) (hn : 0 < n_addr)
    (addr_flip_idxs : List Nat)
    (h_flip_addr : ∀ x ∈ addr_flip_idxs,
                       ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (ctrl : Bool) (effective_addr : Nat) (f : Nat → Bool)
    (h_ctrl : f ulookup_ctrl_idx = ctrl)
    (h_eff_addr : ∀ i, i < n_addr →
        Lookup.x_flip_post_state addr_flip_idxs f (ulookup_address_idx i)
          = effective_addr.testBit i)
    (h_clean : ∀ i, i < n_addr → f (ulookup_and_idx i) = false) :
    prefix_and_cascade_post_state n_addr

*Trigger value under X-flip = `address_and` at effective address**. Specialization of Iter 223's `prefix_and_cascade_top_bit_eq_address_and` to the X-flipped state used in `unary_lookup_iteration`.

theoremLookup.multi_iteration_post_state_preserves_outside_all_cnots

theorem Lookup.multi_iteration_post_state_preserves_outside_all_cnots
    (n_addr : Nat) (iters : List (List Nat × List Nat))
    (h_flip_addr_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        ∀ x ∈ flips, ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (f : Nat → Bool) (p : Nat) (h_p_word : 1 + 2 * n_addr ≤ p)
    (h_not_in_any : ∀ flips cnots, (flips, cnots) ∈ iters → p ∉ cnots) :
    Lookup.multi_iteration_post_state n_addr iters f p = f p

*Multi-iteration post-state frame**: positions p with `1 + 2*n_addr ≤ p` and outside the UNION of every iter's `cnots` are preserved. By induction on the iter list, using `iteration_post_state_preserves_outside_word_targets` (Iter 235) at each step.

theoremLookup.effective_addr_lt_two_pow

theorem Lookup.effective_addr_lt_two_pow
    (addr : Nat) (flips : List Nat) (n_addr : Nat) :
    Lookup.effective_addr addr flips n_addr < 2 ^ n_addr

*Effective address is bounded by 2^n_addr**. By induction on n_addr, using `Nat.bitwise_lt_two_pow` (`x, y < 2^n → bitwise f x y < 2^n`).

theoremLookup.effective_addr_testBit

theorem Lookup.effective_addr_testBit
    (addr : Nat) (flips : List Nat) (n_addr i : Nat) (hi : i < n_addr) :
    (Lookup.effective_addr addr flips n_addr).testBit i
      = xor (addr.testBit i) (decide (ulookup_address_idx i ∈ flips))

*testBit characterization of effective_addr** (Iter 254). For `i < n_addr`, the i-th bit of `effective_addr addr flips n_addr` equals the X-flipped i-th bit pattern. Direct induction on `n_addr` using `Nat.testBit_or`, `Nat.testBit_two_pow`, and `Nat.testBit_lt_two_pow` (via `effective_addr_lt_two_pow` from Iter 253).

theoremLookup.iteration_post_state_at_word_target_via_address_and

theorem Lookup.iteration_post_state_at_word_target_via_address_and
    (n_addr : Nat) (hn : 0 < n_addr)
    (flips cnots : List Nat)
    (h_flip_addr : ∀ x ∈ flips, ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_cnots_nodup : cnots.Nodup)
    (h_word : Lookup.AllWordIdx n_addr cnots)
    (ctrl : Bool) (addr effective_addr : Nat) (g : Nat → Bool)
    (h_ctrl : g ulookup_ctrl_idx = ctrl)
    (h_addr : ∀ i, i < n_addr → g (ulookup_address_idx i) = addr.testBit i)
    (h_eff_addr : ∀ i, i < n_addr →
        Lookup.x_flip_post_state flips g (ulookup_address_idx i)
          = effective_addr.testBit i)

*Iteration at word target via address_and** (Iter 251). For `p ∈ cnots` and a user-supplied `effective_addr` matching the X-flipped address pattern, the post-state at p is `xor (g p) (address_and ctrl effective_addr n_addr)`.

theoremLookup.multi_iteration_post_state_preserves_ctrl

theorem Lookup.multi_iteration_post_state_preserves_ctrl
    (n_addr : Nat) (iters : List (List Nat × List Nat))
    (h_flip_addr_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        ∀ x ∈ flips, ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_word_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        Lookup.AllWordIdx n_addr cnots)
    (f : Nat → Bool) :
    Lookup.multi_iteration_post_state n_addr iters f ulookup_ctrl_idx
      = f ulookup_ctrl_idx

*Multi-iter preserves ctrl** at every position.

theoremLookup.multi_iteration_post_state_preserves_address

theorem Lookup.multi_iteration_post_state_preserves_address
    (n_addr : Nat) (iters : List (List Nat × List Nat))
    (h_flip_nodup_all : ∀ flips cnots, (flips, cnots) ∈ iters → flips.Nodup)
    (h_word_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        Lookup.AllWordIdx n_addr cnots)
    (f : Nat → Bool) (i : Nat) (hi : i < n_addr) :
    Lookup.multi_iteration_post_state n_addr iters f (ulookup_address_idx i)
      = f (ulookup_address_idx i)

*Multi-iter preserves every address bit** for `i < n_addr`.

theoremLookup.x_flip_post_state_xor

theorem Lookup.x_flip_post_state_xor
    (xs : List Nat) (h_nodup : xs.Nodup) (f : Nat → Bool) (j : Nat) :
    Lookup.x_flip_post_state xs f j = xor (f j) (decide (j ∈ xs))

*X-flip post-state as XOR with membership** (utility): for `xs.Nodup`, `x_flip_post_state xs f j = xor (f j) (decide (j ∈ xs))`. Unifies the Iter 224 frame + Iter 225 value-at-element under a single expression.

theoremLookup.multi_iteration_post_state_preserves_and

theorem Lookup.multi_iteration_post_state_preserves_and
    (n_addr : Nat) (iters : List (List Nat × List Nat))
    (h_flip_addr_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        ∀ x ∈ flips, ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_word_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        Lookup.AllWordIdx n_addr cnots)
    (f : Nat → Bool) (k : Nat) (hk : k < n_addr) :
    Lookup.multi_iteration_post_state n_addr iters f (ulookup_and_idx k)
      = f (ulookup_and_idx k)

*Multi-iter preserves every and-bit** for `k < n_addr`.

theoremLookup.multi_iteration_post_state_at_word_target_in_head_iter

theorem Lookup.multi_iteration_post_state_at_word_target_in_head_iter
    (n_addr : Nat) (hn : 0 < n_addr)
    (head_flips head_cnots : List Nat)
    (rest : List (List Nat × List Nat))
    (h_head_flip_addr : ∀ x ∈ head_flips,
                            ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_head_flip_nodup : head_flips.Nodup)
    (h_head_cnots_nodup : head_cnots.Nodup)
    (h_head_word : Lookup.AllWordIdx n_addr head_cnots)
    (h_flip_addr_all : ∀ flips cnots, (flips, cnots) ∈ rest →
        ∀ x ∈ flips, ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_flip_nodup_all : ∀ flips cnots, (flips, cnots) ∈ rest → flips.Nodup)

*Multi-iter chaining at word target**: at a word target `p` in the HEAD iter's `head_cnots`, the multi-iter post-state on `(head_flips, head_cnots) :: rest` equals the rest's post-state XOR'd with `Lookup.address_and ctrl (Lookup.effective_addr addr head_flips n_addr) n_addr`.

theoremLookup.unary_lookup_multi_iteration_correct

theorem Lookup.unary_lookup_multi_iteration_correct
    (n_addr : Nat) (hn : 0 < n_addr)
    (iters : List (List Nat × List Nat))
    (h_flip_addr_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        ∀ x ∈ flips, ∃ i, i < n_addr ∧ x = ulookup_address_idx i)
    (h_flip_nodup_all : ∀ flips cnots, (flips, cnots) ∈ iters → flips.Nodup)
    (h_cnots_nodup_all : ∀ flips cnots, (flips, cnots) ∈ iters → cnots.Nodup)
    (h_word_all : ∀ flips cnots, (flips, cnots) ∈ iters →
        Lookup.AllWordIdx n_addr cnots)
    (ctrl : Bool) (addr : Nat) (f : Nat → Bool)
    (h_ctrl : f ulookup_ctrl_idx = ctrl)
    (h_addr : ∀ i, i < n_addr → f (ulookup_address_idx i) = addr.testBit i)

*HEADLINE: multi-iteration unary lookup classical action**. For a word position `p` in some iter's cnots, the multi-iter post-state is `xor (f p) (cumulative_xor_value)`, where the cumulative value sums the trigger contributions from each iter whose cnots include `p`.

example(example)

example (addr_flip_idxs word_cnot_idxs : List Nat) :
    tcount (unary_lookup_iteration 6 addr_flip_idxs word_cnot_idxs) = 84

*RSA-2048 lookup single-iteration T-count = 84** (Iter 262). For q_a = 6 (qianxu p. 22 max table-row size for RSA-2048), `tcount (unary_lookup_iteration 6 _ _) = 14·6 = 84`.

example(example)

example :
    tcount (unary_lookup_multi_iteration 6
              (List.replicate 64 ([], []))) = 5376

*RSA-2048 lookup multi-iteration T-count = 5376** (Iter 262) for the full 2^6 = 64 iterations covering all addresses. This is the **no-measurement, no-Gray-code upper bound**; qianxu's optimized claim of 2^q_a Toffolis = 56 T requires BOTH the Gidney measurement trick (factor 2) AND Gray-code amortization (factor q_a = 6). See Iter 28 review finding for the factor-of-12 = 5376/448 ≈ 12 gap analysis.

example(example)

example :
    tcount (unary_lookup_multi_iteration 6
              (List.replicate 64 ([], [])))
    = 14 * 6 * (List.replicate 64 (([] : List Nat), ([] : List Nat))).length

*RSA-2048 lookup multi-iteration symbolic form** (Iter 262): parametric `14 · n_addr · |iters|` instantiated at (6, 64).

example(example)

example (addr_flip_idxs word_cnot_idxs : List Nat) :
    tcount (unary_lookup_iteration
              qianxu_E9_q_a_RSA2048
              addr_flip_idxs word_cnot_idxs)
      = unary_lookup_iteration_RSA2048_T_count_verified

*Bridge: verified single-iter T-count matches the RSA-2048 paper-claim anchor** (Iter 263).

example(example)

example :
    tcount (unary_lookup_multi_iteration
              qianxu_E9_q_a_RSA2048
              (List.replicate (2 ^ qianxu_E9_q_a_RSA2048)
                ([], [])))
      = unary_lookup_multi_RSA2048_no_meas_T_count_verified

*Bridge: verified multi-iter T-count matches the RSA-2048 no-measurement paper-claim anchor** (Iter 263).

FormalRV.Arithmetic.Windowed

FormalRV/Arithmetic/Windowed.lean

# FormalRV.Arithmetic.Windowed Windowed (Gidney-style) modular-multiplication arithmetic: base-2^w digit expansion, the windowed multiplier circuit, lookup-addition, ℚ-valued resource/cost models, and qubit-width counts. Relocated from `FormalRV/Shor/` (2026-06-09). These are pure L2 arithmetic gadgets — no order-finding / QPE / eigenstate / success-probability content — so they live under `Arithmetic/` with the other logical gadgets.

(no documented top-level declarations)

FormalRV.Arithmetic.Windowed.WindowedArith

FormalRV/Arithmetic/Windowed/WindowedArith.lean

FormalRV.Shor.WindowedArith — Phase D, the parametric windowed-arithmetic core. Gidney's windowed multiplication (arXiv:1905.07682) computes `k · x` by splitting `x` into `w`-bit windows and, for each window, looking up `k · windowⱼ(x)` from a precomputed table and adding it shifted by `j·w`. The reason this computes the right product is the base-`2^w` (windowed) digit expansion — pure number theory, independent of the QROM gate realization: x = Σⱼ windowⱼ(x) · (2^w)^j, windowⱼ(x) = (x / (2^w)^j) % 2^w (x < (2^w)^n) hence k · x = Σⱼ (k · windowⱼ(x)) · (2^w)^j. This file proves both (parametric in window size `w` and window count `n`), generalizing the hard-wired `windowSize = 2` model in `WindowedShorConnection`. Kernel-clean.

defwindow

def window (w x j : Nat) : Nat

The `j`-th width-`w` window (base-`2^w` digit) of `x`.

theoremwindowed_expansion

theorem windowed_expansion (w : Nat) :
    ∀ (n x : Nat), x < (2 ^ w) ^ n →
      x = ∑ j ∈ Finset.range n, window w x j * (2 ^ w) ^ j
  | 0, x, hx =>

*Windowed digit expansion.** Any `x < (2^w)^n` is the sum of its `n` width-`w` windows, each weighted by `(2^w)^j`. (The base-`2^w` positional expansion.)

theoremwindowed_mul

theorem windowed_mul (w n k x : Nat) (hx : x < (2 ^ w) ^ n) :
    k * x = ∑ j ∈ Finset.range n, (k * window w x j) * (2 ^ w) ^ j

*Windowed multiplication identity.** Multiplying by `k` distributes over the windowed expansion: `k · x = Σⱼ (k · windowⱼ(x)) · (2^w)^j`. Each summand `k · windowⱼ(x)` is exactly the value the QROM table provides at window `j`.

theoremwindow_lt

theorem window_lt (w x j : Nat) : window w x j < 2 ^ w

Each window is a `w`-bit value (`< 2^w`): the lookup address space.

theoremwindowed_exp

theorem windowed_exp (cexp n g e : Nat) (he : e < (2 ^ cexp) ^ n) :
    g ^ e = ∏ i ∈ Finset.range n, g ^ (window cexp e i * (2 ^ cexp) ^ i)

*`c_exp` exponentiation windowing (arbitrary `c_exp`), proved.** `g^e` factors over the `c_exp`-bit windows of `e`: each window `i` contributes the partial multiplicand `g^{windowᵢ(e) · 2^{i·c_exp}}` — exactly the value looked up for that exponent window (8-hours `main.tex:581`). Multiplicative analogue of `windowed_mul`.

theoremwindowed_exp_modProduct

theorem windowed_exp_modProduct (cexp n g N e : Nat) (he : e < (2 ^ cexp) ^ n) :
    (∏ i ∈ Finset.range n, g ^ (window cexp e i * (2 ^ cexp) ^ i) % N) % N = g ^ e % N

The modular form: multiplying the per-exponent-window partial multiplicands (each reduced mod `N`) computes `g^e mod N` — the windowed modular exponentiation.

deftableValue

def tableValue (a N w j v : Nat) : Nat

The QROM table entry for window `j` of the modular product-add `acc += a·y mod N` (Gidney 1905.07682 l.408–411; matches `VerifiedShor.tableValue`): the value `a · 2^{jw} · v` reduced mod `N`, looked up at address `v = windowⱼ(y)`.

theoremwindowed_modProductAdd

theorem windowed_modProductAdd (w n a N y : Nat) (hy : y < (2 ^ w) ^ n) :
    (∑ j ∈ Finset.range n, tableValue a N w j (window w y j)) % N = (a * y) % N

*Windowed modular product-addition (parametric `w`), proved.** Summing the per-window table lookups and reducing mod `N` computes `a·y mod N` — the correctness of the windowed multiplier, generalizing the hard-wired `w = 2` model. (Gidney 1905.07682 l.296: the windowed `x += k·y`.)

theoremwindowed_modProductAdd_acc

theorem windowed_modProductAdd_acc (w n a N y acc : Nat) (hy : y < (2 ^ w) ^ n) :
    (acc + ∑ j ∈ Finset.range n, tableValue a N w j (window w y j)) % N
      = (acc + a * y) % N

With a running accumulator: the windowed lookup-add nets `(acc + a·y) mod N`.

defwindowedLookupFold

def windowedLookupFold (a N w : Nat) (val : Nat → Nat) : Nat → Nat → Nat
  | 0, acc => acc
  | n + 1, acc => (windowedLookupFold a N w val n acc + tableValue a N w n (val n)) % N

The **circuit-aligned fold**: process windows one at a time, each window doing a modular lookup-add `acc ← (acc + tableValueⱼ) mod N`. This is exactly the accumulator update the multi-window lookup-add circuit produces (one `lookupAddGate` per window, mod-reducing after each), so its value-correctness transfers to the Boolean circuit layer.

theoremwindowedLookupFold_eq

theorem windowedLookupFold_eq (a N w : Nat) (val : Nat → Nat) (acc : Nat) (hacc : acc < N) :
    ∀ n, windowedLookupFold a N w val n acc
        = (acc + ∑ j ∈ Finset.range n, tableValue a N w j (val j)) % N

The fold (per-step mod-add) agrees with the sum-then-mod form.

theoremwindowedLookupFold_modProductAdd

theorem windowedLookupFold_modProductAdd (a N w n y acc : Nat) (hacc : acc < N)
    (hy : y < (2 ^ w) ^ n) :
    windowedLookupFold a N w (window w y) n acc = (acc + a * y) % N

*Phase-D windowed-multiplier value-correctness (parametric `w`), proved.** Folding modular lookup-adds over the `n` windows of `y` computes `(acc + a·y) mod N`.

theoremwindowedLookupFold_eq_modmul

theorem windowedLookupFold_eq_modmul (a N w n x : Nat) (hN : 0 < N) (hx : x < (2 ^ w) ^ n) :
    windowedLookupFold a N w (window w x) n 0 = (a * x) % N

*Interface to the Shor oracle contract `MultiplyCircuitProperty`.** Starting from a zero accumulator, the windowed multiplier computes exactly `(a·x) mod N` — the value `MultiplyCircuitProperty a N` requires of a modular-multiplication oracle. So the parametric windowed circuit produces the SAME modmul value the existing `ModMulImpl`/`VerifiedModMulFamily` interface consumes (lifted to `uc_eval` via the `Gate → BaseUCom` basis-action adapter).

theoremaddress_concat

theorem address_concat (mWin ei mi : Nat) (hmi : mi < 2 ^ mWin) :
    (ei * 2 ^ mWin + mi) / 2 ^ mWin = ei ∧ (ei * 2 ^ mWin + mi) % 2 ^ mWin = mi

*c_exp address concatenation (proved).** An address built as `ei·2^{mWin} + mi` (exponent window in the high bits, factor window in the low bits) splits back into its two windows. This is why one lookup over the concatenated address realizes the two-argument `c_exp` table — no new circuit, just a wider address.

defmodPow

def modPow (g e N : Nat) : Nat

Modular exponentiation `g^e mod N` (the value the windowed exponentiation targets).

defWindowedExpCorrect

def WindowedExpCorrect (windowedExp : Nat → Nat → Nat) (g N : Nat) : Prop

*Named obligation — Gidney 1905.07682 l.502** ("we have tested that the above code returns the correct result in randomly chosen cases"). A value map `windowedExp` realizes windowed modular EXPONENTIATION when, on input `x` and exponent `e`, it yields `(x · g^e) mod N`. The single exponent-window multiply-add is now PROVEN at the Gate level (`expWindowPassOf_correct`, `WindowedExpStep.lean` — adder-generic, via the concatenated exp‖mul lookup address); what this named obligation still tracks is the GLOBAL composition over all `n_e/c_exp` windows (per-pass accumulator hand-off + inverse-clear + swap), the empirically-validated fact of Gidney l.502.

FormalRV.Arithmetic.Windowed.WindowedCircuit

FormalRV/Arithmetic/Windowed/WindowedCircuit.lean

FormalRV.Shor.WindowedCircuit — Phase D, the FULL windowed-multiplier LOGICAL CIRCUIT for arbitrary window size, with integers encoded in logical qubits. This is the concrete `Gate`-IR construction (not just the arithmetic identities of `WindowedArith`): a real circuit that, on a register holding the integer `y`, computes `acc += a·y` (Gidney's windowed product-addition, 1905.07682 l.296–345), built per window from the proven babbush2018 QROM read (`BQAlgo.unary_lookup_multi_iteration`), and the proven Cuccaro ripple adder (`BQAlgo.cuccaro_n_bit_adder_full`), with the lookup word register laid out AS the adder's addend (the layout that makes read·add·unread compose), and each `y`-window CX-copied into the lookup address register. Integers are genuinely encoded in qubits (`encodeReg`). This file also computes the circuit's **Toffoli count** in closed form (kernel-clean) and compares it to the resource numbers reported in Gidney–Ekerå (see `windowedMulCircuit_toffoli` and the comparison note). Execution on encoded data is checked in `FormalRV.Shor.WindowedCircuitExec`.

defencodeReg

def encodeReg (base bits x : Nat) : Nat → Bool

Encode integer `x` into the `bits` logical qubits `[base, base+bits)`: qubit `base+i` carries bit `i` of `x`.

defaddendIdx

def addendIdx (q_start j : Nat) : Nat

Cuccaro full-adder addend bit `j` sits at `q_start + 2j + 2`. (= `cuccaroAdder.addendIdx q_start j`, kept as a standalone def so the concrete `q_start + 2j + 2` shape is available definitionally to consumers.)

defwordCnotsAt

def wordCnotsAt (pos : Nat → Nat) (W Tv : Nat) : List Nat

Word-CNOT targets for entry value `Tv`, placed at arbitrary positions `pos j` (here: the adder's addend bits).

deflookupReadAt

def lookupReadAt (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) : Gate

The QROM read writing `T[address]` directly into the positions `pos` (the adder addend), reusing the proven `unary_lookup_multi_iteration`.

defcopyWindow

def copyWindow (w yBase j : Nat) : Gate

CX-copy window `j` of the `y`-register (`yBase`-based) into the lookup address register `ulookup_address_idx 0 .. w-1`. Self-inverse, so re-applying uncopies.

defdecodeAccOf

def decodeAccOf (A : Adder) (f : Nat → Bool) (q_start bits : Nat) : Nat

*Generic accumulator decode.** Read the augend register of adder `A` at base `q_start`, `bits` qubits wide, LSB-first. (`A.augendIdx q_start i` holds bit `i`.)

deflookupAddAtOf

def lookupAddAtOf (A : Adder) (w W : Nat) (T : Nat → Nat) (bits q_start : Nat) : Gate

*Generic lookup-ADDITION.** Gidney l.276 read·add·unread, with the lookup word register laid out AS adder `A`'s addend register (`A.addendIdx q_start`) so the read·add·unread composes.

defwindowStepOf

def windowStepOf (A : Adder) (w W a : Nat) (bits q_start yBase j : Nat) : Gate

*Generic window step.** Copy window `j` into the lookup address, lookup-add the entry `T_j[v] = a·(2^w)^j·v` into adder `A`, then uncopy.

defwindowedMulOf

def windowedMulOf (A : Adder) (w W a : Nat) (bits q_start yBase numWin : Nat) : Gate

*Generic windowed multiplier**, a fold of window-steps over adder `A`.

defwindowedMulCircuitOf

def windowedMulCircuitOf (A : Adder) (w bits a numWin : Nat) : Gate

*The full windowed-multiplier circuit over an arbitrary adder `A`.** Layout: `ctrl=0`; address bits `1,3,…,2w−1`; AND-ancillas `2,4,…,2w`; the adder region at `q_start = 1+2w` (spanning `A.span bits` qubits); the `y`-register at `yBase = q_start + A.span bits`. On `acc=0` it leaves `a·y mod 2^bits` in the accumulator (when `A.span bits` is the adder's true span).

defwindowStepTOf

def windowStepTOf (A : Adder) (w W : Nat) (T : Nat → Nat) (bits q_start yBase j : Nat) : Gate

*Table-generic window step.** Like `windowStepOf` but with an ARBITRARY lookup table `T : Nat → Nat` for this window (instead of the hard-wired `fun v => a·(2^w)^j·v`).

defwindowedMulTOf

def windowedMulTOf (A : Adder) (w W : Nat) (Tfam : Nat → Nat → Nat)
    (bits q_start yBase numWin : Nat) : Gate

*Table-generic windowed multiplier**, a fold of table-generic window-steps with a per-window table family `Tfam : Nat → Nat → Nat` (`Tfam j` = the table for window `j`).

defwindowedMulCircuitTOf

def windowedMulCircuitTOf (A : Adder) (w bits : Nat) (Tfam : Nat → Nat → Nat)
    (numWin : Nat) : Gate

*The full table-generic windowed-multiplier circuit**, standard layout (identical to `windowedMulCircuitOf`'s). Recovers `windowedMulCircuitOf` at `Tfam := fun j v => a·(2^w)^j·v`, and the reduced coset multiplier at `Tfam := tableValue a N w` (= `fun j v => (a·(2^w)^j·v) % N`).

defdecodeAcc

def decodeAcc (f : Nat → Bool) (q_start bits : Nat) : Nat

Decode the accumulator register: Cuccaro's running-sum bit `i` lives at `q_start + 2i + 1`. (= `decodeAccOf cuccaroAdder`, defeq to the old fold.)

deflookupAddAt

def lookupAddAt (w W : Nat) (T : Nat → Nat) (bits q_start : Nat) : Gate

One lookup-ADDITION targeting the Cuccaro adder at `q_start` (Gidney l.276, read·add·unread), with the word register = the addend register. (= `lookupAddAtOf cuccaroAdder`, defeq to the old hard-wired version.)

defwindowStep

def windowStep (w W a : Nat) (bits q_start yBase j : Nat) : Gate

One window step: copy the window into the address, lookup-add the entry `T_j[v] = a·(2^w)^j·v`, then uncopy. (= `windowStepOf cuccaroAdder`.)

defwindowedMul

def windowedMul (w W a : Nat) (bits q_start yBase numWin : Nat) : Gate

The windowed multiplier as a fold of window-steps (parametric in `w`, `numWin`). (= `windowedMulOf cuccaroAdder`.)

defwindowedMulCircuit

def windowedMulCircuit (w bits a numWin : Nat) : Gate

*The full windowed-multiplier circuit, standard layout, arbitrary `w`.** Layout: `ctrl=0`; address bits `1,3,…,2w−1`; AND-ancillas `2,4,…,2w`; the Cuccaro region at `q_start = 1+2w` (carry, then interleaved acc/addend up to `q_start+2·bits`); the `y`-register at `yBase = q_start + 2·bits + 1`. On `acc=0` it leaves `a·y mod 2^bits` in the accumulator. (= `windowedMulCircuitOf cuccaroAdder`; since `cuccaroAdder.span bits = 2·bits+1`, the `yBase = 1+2w + A.span bits` of the generic engine is defeq to the old `1+2w+2·bits+1`.)

defmulInput

def mulInput (w bits numWin y : Nat) : Nat → Bool

The input store: control qubit set, integer `y` encoded in the `y`-register.

defaccStart

def accStart (w : Nat) : Nat

The accumulator's `q_start` for a `w`-window circuit.

theoremtcount_foldl_seq_const

theorem tcount_foldl_seq_const {α : Type} (step : α → Gate) (C : Nat)
    (hstep : ∀ a, tcount (step a) = C) (L : List α) (init : Gate) :
    tcount (L.foldl (fun g a => Gate.seq g (step a)) init) = tcount init + L.length * C

Generic T-count of a left-fold of `seq`-appended steps with constant per-step cost.

theoremtcount_unary_lookup_iteration

theorem tcount_unary_lookup_iteration (w : Nat) (flips cnots : List Nat) :
    tcount (unary_lookup_iteration w flips cnots) = 14 * w

One unary-lookup iteration: `2·w` Toffolis (`14·w` T) — forward + reverse cascade, no measurement optimization. (X/CX layers are T-free.)

theoremtcount_unary_lookup_multi_iteration

theorem tcount_unary_lookup_multi_iteration (w : Nat) (iters : List (List Nat × List Nat)) :
    tcount (unary_lookup_multi_iteration w iters) = 14 * w * iters.length

The multi-iteration read: `14·w` T per iteration.

theoremtcount_lookupReadAt

theorem tcount_lookupReadAt (w : Nat) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat) :
    tcount (lookupReadAt w pos W T) = 14 * w * 2 ^ w

A full table read over a `2^w`-entry table: `14·w·2^w` T (= `2·w·2^w` Toffolis).

theoremtcount_copyWindow

theorem tcount_copyWindow (w yBase j : Nat) : tcount (copyWindow w yBase j) = 0

theoremtcount_lookupAddAtOf

theorem tcount_lookupAddAtOf (A : Adder) (w W : Nat) (T : Nat → Nat) (bits q_start : Nat) :
    tcount (lookupAddAtOf A w W T bits q_start)
      = 2 * (14 * w * 2 ^ w) + tcount (A.circuit bits q_start)

*Generic lookup-add T-count.** Two table reads plus one adder application: `2·(14·w·2^w) + tcount (A.circuit bits q_start)`.

theoremtcount_windowStepOf

theorem tcount_windowStepOf (A : Adder) (w W a bits q_start yBase j : Nat) :
    tcount (windowStepOf A w W a bits q_start yBase j)
      = 2 * (14 * w * 2 ^ w) + tcount (A.circuit bits q_start)

*Generic window-step T-count.** The window copy/uncopy are Toffoli-free, so the cost is exactly the per-step lookup-add cost.

theoremtcount_windowedMulOf

theorem tcount_windowedMulOf (A : Adder) (w W a bits q_start yBase numWin : Nat) :
    tcount (windowedMulOf A w W a bits q_start yBase numWin)
      = numWin * (2 * (14 * w * 2 ^ w) + tcount (A.circuit bits q_start))

*Generic windowed-multiplier T-count.** `numWin` identical window steps.

theoremtcount_windowedMulCircuitOf

theorem tcount_windowedMulCircuitOf (A : Adder) (w bits a numWin : Nat) :
    tcount (windowedMulCircuitOf A w bits a numWin)
      = numWin * (28 * w * 2 ^ w + tcount (A.circuit bits (1 + 2 * w)))

*Generic closed-form T-count of the full windowed multiplier over adder `A`.** The adder is run at base `q_start = 1+2·w` in each of the `numWin` windows.

theoremtcount_lookupAddAt

theorem tcount_lookupAddAt (w W : Nat) (T : Nat → Nat) (bits q_start : Nat) :
    tcount (lookupAddAt w W T bits q_start) = 2 * (14 * w * 2 ^ w) + 14 * bits

theoremtcount_windowStep

theorem tcount_windowStep (w W a bits q_start yBase j : Nat) :
    tcount (windowStep w W a bits q_start yBase j) = 2 * (14 * w * 2 ^ w) + 14 * bits

theoremtcount_windowedMul

theorem tcount_windowedMul (w W a bits q_start yBase numWin : Nat) :
    tcount (windowedMul w W a bits q_start yBase numWin)
      = numWin * (2 * (14 * w * 2 ^ w) + 14 * bits)

theoremtcount_windowedMulCircuit

theorem tcount_windowedMulCircuit (w bits a numWin : Nat) :
    tcount (windowedMulCircuit w bits a numWin) = numWin * (28 * w * 2 ^ w + 14 * bits)

*Closed-form T-count of the full windowed multiplier (arbitrary `w`).**

deftoffoliCount

def toffoliCount (g : Gate) : Nat

Toffoli count = number of `CCX` gates = `tcount / 7` (only `CCX` has nonzero T-count, `7` each; this is precisely the count the PPM layer turns into magic-state requests).

theoremwindowedMulCircuit_toffoli

theorem windowedMulCircuit_toffoli (w bits a numWin : Nat) :
    toffoliCount (windowedMulCircuit w bits a numWin) = numWin * (4 * w * 2 ^ w + 2 * bits)

*Closed-form Toffoli count of the full windowed multiplier (arbitrary `w`).** `numWin · (4·w·2^w + 2·bits)` Toffolis — `numWin` windows, each two `2·w·2^w`-Toffoli table reads (read + uncompute) and one `2·bits`-Toffoli Cuccaro add.

theoremwindowedMulCircuit_toffoli_padded

theorem windowedMulCircuit_toffoli_padded (w n pad a numWin : Nat) :
    toffoliCount (windowedMulCircuit w (n + pad) a numWin)
      = numWin * (4 * w * 2 ^ w + 2 * n + 2 * pad)

*The `lg n` Toffoli factor comes from the verified STRUCTURE, not a coefficient.** On a register padded to width `n + pad` (the coset representation needs `pad = g_pad ≈ 3 lg n` padding qubits), the structurally-derived count (`windowedMulCircuit_toffoli`, proven by `tcount` recursion on the actual `Gate`) gains `2·pad` Toffolis per window — the Cuccaro adder structurally processing the padding qubits. With `pad = 3 lg n` the contribution `6·numWin·lg n` is read off the verified structure; it is NOT inserted by hand.

defmaxIdx

def maxIdx : Gate → Nat
  | .I          => 0
  | .X q        => q
  | .CX c t     => max c t
  | .CCX a b c  => max a (max b c)
  | .seq g₁ g₂  => max (maxIdx g₁) (maxIdx g₂)

The highest qubit index a circuit touches (`0` for the empty circuit).

defwidth

def width (g : Gate) : Nat

The structural qubit count (width) of a circuit: one more than the highest index it acts on. This is COMPUTED from the `Gate` term, so any padding qubits the circuit actually touches are counted — the qubit count reflects the real circuit.

theoremwidth_windowedMulCircuit_2_4_3_2

theorem width_windowedMulCircuit_2_4_3_2 :
    width (windowedMulCircuit 2 4 3 2) = 18

*Structural qubit count is computed from the circuit** (kernel `decide`): the verified `windowedMulCircuit` at `w=2, bits=4, numWin=2` genuinely uses `18` qubit lines.

theoremwidth_windowedMulCircuit_padding

theorem width_windowedMulCircuit_padding :
    width (windowedMulCircuit 2 (4 + 3) 3 2) = width (windowedMulCircuit 2 4 3 2) + 2 * 3

*Padding qubits are counted by the structure** (kernel `decide`): padding the register by `pad = 3` (the coset representation's `g_pad ≈ 3 lg n` padding) makes the verified circuit structurally `2·pad = 6` qubits WIDER — the `lg n` qubit term is not a formula coefficient, it is the padding qubits the `Gate` actually acts on.

defcomposedModExp

def composedModExp (numMults w bits a numWin : Nat) : Gate

`numMults` copies of the windowed multiplier in sequence (the modular-exponentiation skeleton: one multiply per exponent step).

theoremtcount_composedModExp

theorem tcount_composedModExp (numMults w bits a numWin : Nat) :
    tcount (composedModExp numMults w bits a numWin)
      = numMults * tcount (windowedMulCircuit w bits a numWin)

theoremcomposedModExp_toffoli

theorem composedModExp_toffoli (numMults w bits a numWin : Nat) :
    toffoliCount (composedModExp numMults w bits a numWin)
      = numMults * (numWin * (4 * w * 2 ^ w + 2 * bits))

*Structural Toffoli count of the full (unoptimized) modular exponentiation**, derived by `tcount` recursion on the composed `Gate`: `numMults · numWin · (4·w·2^w + 2·bits)`. With `numMults = n_e`, `numWin = n/w`, `bits = n + g_pad`, this is the genuine circuit-level count of THIS construction (≈ `6 n³` at `w = lg n`, the no-optimization value); the paper's `0.3 n³` requires Gray-code + measurement-uncompute (the `4·w·2^w → 2^w` lookup optimization) and oblivious runways, which this building block omits.

FormalRV.Arithmetic.Windowed.WindowedCircuitCorrect

FormalRV/Arithmetic/Windowed/WindowedCircuitCorrect.lean

FormalRV.Shor.WindowedCircuit.WindowedCircuitCorrect — the adder-generic windowed-multiplier VALUE theorem. HEADLINE (`windowedMulCircuitOf_correct`): for ANY adder `A` satisfying the layout-parametric `Adder` interface, the full windowed multiplier `windowedMulCircuitOf A w bits a numWin`, run on the input state `mulInputOf A w bits numWin y` (ctrl set, integer `y` encoded in the y-register, everything else clean), leaves (a · y) mod 2^bits in the accumulator (`decodeAccOf A · (1+2w) bits`), provided `0 < w`, `y < 2^(w·numWin)` and the adder's ancilla block starts clean. Proof: invariant induction over the window-steps. After `j` steps the state `g` satisfies the four-part invariant `StepInv`: (F) frame: `g` agrees with the input off the adder block; (D) the addend register is clean (all `false`); (C) the adder's ancilla block is clean (`A.ancClean`); (V) the augend register decodes to `(Σ_{k<j} a·(2^w)^k·windowₖ(y)) mod 2^bits`. Each window-step is tracked through its five sub-gates (copy · read · add · unread · uncopy) using the proven selection/frame lemmas of `WindowedLookupSelect` / `WindowedCopySemantics` and the `Adder` contract. The final bridge to `a·y` is `WindowedArith.windowed_mul`. Corollaries: the Cuccaro and Gidney instances with the `ancClean` precondition discharged concretely. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

defmulInputOf

def mulInputOf (A : Adder) (w bits numWin y : Nat) : Nat → Bool

The input store for the generic windowed multiplier over adder `A`: control qubit set, integer `y` encoded in the y-register at `yBase = 1 + 2w + A.span bits`, everything else clean. (Generic version of the Cuccaro-shaped `mulInput`.)

theoremmulInputOf_ctrl

theorem mulInputOf_ctrl (A : Adder) (w bits numWin y : Nat) :
    mulInputOf A w bits numWin y ulookup_ctrl_idx = true

`mulInputOf` reads `true` at the control qubit.

theoremmulInputOf_low

theorem mulInputOf_low (A : Adder) (w bits numWin y p : Nat)
    (hp0 : p ≠ ulookup_ctrl_idx) (hpy : p < 1 + 2 * w + A.span bits) :
    mulInputOf A w bits numWin y p = false

`mulInputOf` reads `false` at every non-control position below the y-register (`p < yBase = 1 + 2w + A.span bits`).

theoremmulInputOf_eq_encodeReg

theorem mulInputOf_eq_encodeReg (A : Adder) (w bits numWin y p : Nat)
    (hp : p ≠ ulookup_ctrl_idx) :
    mulInputOf A w bits numWin y p
      = encodeReg (1 + 2 * w + A.span bits) (numWin * w) y p

Off the control qubit, `mulInputOf` is the `encodeReg` encoding of `y`.

theoremdecodeReg_eq_mod_of_testBit

theorem decodeReg_eq_mod_of_testBit (idx : Nat → Nat) (n M : Nat) (f : Nat → Bool)
    (h : ∀ i, i < n → f (idx i) = M.testBit i) :
    decodeReg idx n f = M % 2 ^ n

*Register decode of a bit-pattern.** If the register at `idx` holds the binary digits of `M` (bit `i` at `idx i`), it decodes to `M % 2^n`.

theoremdecodeReg_eq_zero

theorem decodeReg_eq_zero (idx : Nat → Nat) (n : Nat) (f : Nat → Bool)
    (h : ∀ i, i < n → f (idx i) = false) :
    decodeReg idx n f = 0

A register reading all-`false` decodes to `0`.

defStepInv

def StepInv (A : Adder) (w bits numWin y s : Nat) (g : Nat → Bool) : Prop

*The window-step invariant.** After some number of window-steps starting from `mulInputOf A w bits numWin y`, the state `g` satisfies: (F) frame: `g` agrees with the input off the adder block `[1+2w, 1+2w + A.span bits)` — in particular ctrl is set, the lookup's address/AND registers are clean, and the y-register still encodes `y`; (D) the addend register is clean; (C) the adder's ancilla block is clean; (V) the augend register decodes to `s % 2^bits` (the partial sum so far).

theoremstepInv_init

theorem stepInv_init (A : Adder) (w bits numWin y : Nat)
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    StepInv A w bits numWin y 0 (mulInputOf A w bits numWin y)

*Invariant initialization.** The input state satisfies the invariant with partial sum `0`.

theoremstepInv_stepT

theorem stepInv_stepT (A : Adder) (w bits : Nat) (T : Nat → Nat) (numWin y : Nat)
    (hw : 0 < w)
    (j : Nat) (hj : j < numWin) (s : Nat) (g : Nat → Bool)
    (hg : StepInv A w bits numWin y s g) :
    StepInv A w bits numWin y (s + T (WindowedArith.window w y j))
      (Gate.applyNat
        (windowStepTOf A w bits T bits (1 + 2 * w) (1 + 2 * w + A.span bits) j)
        g)

theoremstepInv_step

theorem stepInv_step (A : Adder) (w bits a numWin y : Nat) (hw : 0 < w)
    (j : Nat) (hj : j < numWin) (s : Nat) (g : Nat → Bool)
    (hg : StepInv A w bits numWin y s g) :
    StepInv A w bits numWin y (s + a * (2 ^ w) ^ j * WindowedArith.window w y j)
      (Gate.applyNat
        (windowStepOf A w bits a bits (1 + 2 * w) (1 + 2 * w + A.span bits) j)
        g)

*The original (hard-wired table) window step**, recovered as the `T := fun v => a·(2^w)^j·v` instance of `stepInv_stepT` (defeq: `windowStepOf` is `windowStepTOf` at that table). Statement and signature are byte-identical to the pre-generalization theorem, so all downstream callers are unchanged.

theoremstepInv_foldT

theorem stepInv_foldT (A : Adder) (w bits : Nat) (Tfam : Nat → Nat → Nat) (numWin y : Nat)
    (hw : 0 < w)
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    ∀ n, n ≤ numWin →
      StepInv A w bits numWin y
        (∑ k ∈ Finset.range n, Tfam k (WindowedArith.window w y k))
        (Gate.applyNat
          (windowedMulTOf A w bits Tfam bits (1 + 2 * w) (1 + 2 * w + A.span bits) n)
          (mulInputOf A w bits numWin y))

*Table-generic fold.** Running the first `n` table-generic window-steps (`n ≤ numWin`) establishes the invariant with partial sum `Σ_{k<n} Tfam k (windowₖ(y))`. The hard-wired `stepInv_fold` is the `Tfam k := fun v => a·(2^w)^k·v` instance; the reduced coset multiplier is `Tfam := tableValue a N w`.

theoremstepInv_fold

theorem stepInv_fold (A : Adder) (w bits a numWin y : Nat) (hw : 0 < w)
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    ∀ n, n ≤ numWin →
      StepInv A w bits numWin y
        (∑ k ∈ Finset.range n, a * (2 ^ w) ^ k * WindowedArith.window w y k)
        (Gate.applyNat
          (windowedMulOf A w bits a bits (1 + 2 * w) (1 + 2 * w + A.span bits) n)
          (mulInputOf A w bits numWin y))

*The original (hard-wired table) fold**, recovered as the `Tfam k := fun v => a·(2^w)^k·v` instance of `stepInv_foldT`. Statement and signature are byte-identical to the pre-generalization theorem.

theoremwindowedMulCircuitOf_correct

theorem windowedMulCircuitOf_correct (A : Adder) (w bits a numWin y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin))
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    decodeAccOf A (Gate.applyNat (windowedMulCircuitOf A w bits a numWin)
        (mulInputOf A w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % 2 ^ bits

*HEADLINE — adder-generic windowed-multiplier VALUE theorem.** For ANY adder `A` (Cuccaro, Gidney, …), the full windowed multiplier `windowedMulCircuitOf A w bits a numWin`, run on the encoded input `mulInputOf A w bits numWin y` (ctrl set, `y` in the y-register, everything else clean), leaves `(a·y) mod 2^bits` in the accumulator — provided `0 < w`, `y < 2^(w·numWin)`, and the adder's ancilla block starts clean.

theoremwindowedMulCircuit_correct_cuccaro

theorem windowedMulCircuit_correct_cuccaro (w bits a numWin y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) :
    decodeAccOf cuccaroAdder
        (Gate.applyNat (windowedMulCircuitOf cuccaroAdder w bits a numWin)
          (mulInputOf cuccaroAdder w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % 2 ^ bits

*Cuccaro instance.** `cuccaroAdder.ancClean` is `f (1+2w) = false` — the carry-in qubit sits at the block base, below the y-register, so the input state reads it `false`.

theoremwindowedMulCircuit_correct_gidney

theorem windowedMulCircuit_correct_gidney (w bits a numWin y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) :
    decodeAccOf gidneyAdder
        (Gate.applyNat (windowedMulCircuitOf gidneyAdder w bits a numWin)
          (mulInputOf gidneyAdder w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % 2 ^ bits

*Gidney instance.** `gidneyAdder.ancClean` is `∀ i < bits, f ((1+2w) + 3i + 2) = false` — every carry qubit lies inside the block, below the y-register, so the input state reads it `false`.

FormalRV.Arithmetic.Windowed.WindowedCopySemantics

FormalRV/Arithmetic/Windowed/WindowedCopySemantics.lean

FormalRV.Shor.WindowedCopySemantics — window-copy semantics + window-bit bridge. Boolean-function (`Gate.applyNat`) semantics of `copyWindow` — the CX cascade that copies window `j` of the `y`-register into the unary-lookup address register — plus the bridge from the `encodeReg` qubit encoding of `y` to the pure windowed digits `WindowedArith.window`. Contents: `applyNat_cx_cascade_frame` / `applyNat_cx_cascade_at` — generic semantics of a parallel-CX cascade `foldl (fun g i => seq g (CX (ctrl i) (addr i)))` with pairwise-distinct targets and controls disjoint from targets; `copyWindow_at_addr` / `copyWindow_frame` — full post-state of `copyWindow`; `copyWindow_copies` — on a clean address register the copy writes the y-bits; `copyWindow_involutive(_apply)` — re-applying `copyWindow` uncopies; `window_testBit` / `encodeReg_window_bit` — bit `i` of `window w y j` is bit `j·w + i` of `y`, hence the `encodeReg`-encoded y-register qubit `yBase + j·w + i` carries exactly bit `i` of the window digit. Kernel-clean: no `sorry`, no `native_decide`, no axioms.

theoremapplyNat_cx_cascade_frame

theorem applyNat_cx_cascade_frame (ctrl addr : Nat → Nat) (f : Nat → Bool) :
    ∀ (n p : Nat), (∀ i, i < n → p ≠ addr i) →
      Gate.applyNat ((List.range n).foldl
          (fun g i => Gate.seq g (Gate.CX (ctrl i) (addr i))) Gate.I) f p = f p

*Cascade frame.** A position that is not one of the CX targets is untouched (no disjointness hypotheses needed: only targets are ever written).

theoremapplyNat_cx_cascade_at

theorem applyNat_cx_cascade_at (ctrl addr : Nat → Nat) (f : Nat → Bool) :
    ∀ (n : Nat),
      (∀ i k, i < n → k < n → i ≠ k → addr i ≠ addr k) →
      (∀ i k, i < n → k < n → ctrl i ≠ addr k) →
      ∀ i, i < n →
        Gate.applyNat ((List.range n).foldl
            (fun g i => Gate.seq g (Gate.CX (ctrl i) (addr i))) Gate.I) f (addr i)
          = xor (f (addr i)) (f (ctrl i))

*Cascade post-state at a target.** With pairwise-distinct targets and controls disjoint from targets, target `addr i` ends as the XOR of its original value with the ORIGINAL control value `f (ctrl i)` (later steps never read a wire an earlier step wrote).

theoremulookup_address_idx_ne

theorem ulookup_address_idx_ne (i k : Nat) (hne : i ≠ k) :
    ulookup_address_idx i ≠ ulookup_address_idx k

Distinct address indices live on distinct wires (`1 + 2i` is injective).

theoremctrl_ne_addr_of_le_yBase

theorem ctrl_ne_addr_of_le_yBase (w yBase j : Nat) (hyBase : 2 * w ≤ yBase) :
    ∀ i k, i < w → k < w → yBase + j * w + i ≠ ulookup_address_idx k

The standing disjointness hypothesis — the CX controls (y-register wires `yBase + j·w + i`) are not address wires — holds whenever `2·w ≤ yBase` (every address wire `1 + 2k ≤ 2w − 1 < yBase`).

theoremcopyWindow_frame

theorem copyWindow_frame (w yBase j : Nat) (f : Nat → Bool) (p : Nat)
    (hp : ∀ i, i < w → p ≠ ulookup_address_idx i) :
    Gate.applyNat (copyWindow w yBase j) f p = f p

*`copyWindow` frame.** Any wire that is not an address wire `ulookup_address_idx i` (`i < w`) is untouched by `copyWindow` — no disjointness hypothesis needed.

theoremcopyWindow_at_addr

theorem copyWindow_at_addr (w yBase j : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w → yBase + j * w + i ≠ ulookup_address_idx k)
    (i : Nat) (hi : i < w) :
    Gate.applyNat (copyWindow w yBase j) f (ulookup_address_idx i)
      = xor (f (ulookup_address_idx i)) (f (yBase + j * w + i))

*`copyWindow` post-state at an address wire.** Provided no CX control is an address wire (`hctrl`; holds when `2·w ≤ yBase` by `ctrl_ne_addr_of_le_yBase`), address wire `i` ends as the XOR of its original value with y-register bit `f (yBase + j·w + i)`.

theoremcopyWindow_copies

theorem copyWindow_copies (w yBase j : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w → yBase + j * w + i ≠ ulookup_address_idx k)
    (hclean : ∀ i, i < w → f (ulookup_address_idx i) = false)
    (i : Nat) (hi : i < w) :
    Gate.applyNat (copyWindow w yBase j) f (ulookup_address_idx i)
      = f (yBase + j * w + i)

*`copyWindow` copies.** On a CLEAN address register (all address wires `false`), `copyWindow` writes the y-register bits verbatim: address wire `i` ends as `f (yBase + j·w + i)`.

theoremcopyWindow_involutive_apply

theorem copyWindow_involutive_apply (w yBase j : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w → yBase + j * w + i ≠ ulookup_address_idx k)
    (p : Nat) :
    Gate.applyNat (copyWindow w yBase j) (Gate.applyNat (copyWindow w yBase j) f) p
      = f p

*`copyWindow` is self-inverse (pointwise).** Re-applying the copy uncopies: each address wire XORs in the SAME control bit twice (the controls are outside the address region, so the first pass leaves them intact), and all other wires are framed.

theoremcopyWindow_involutive

theorem copyWindow_involutive (w yBase j : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w → yBase + j * w + i ≠ ulookup_address_idx k) :
    Gate.applyNat (copyWindow w yBase j) (Gate.applyNat (copyWindow w yBase j) f) = f

*`copyWindow` is self-inverse (function form).**

theoremwindow_testBit

theorem window_testBit (w y j i : Nat) (hi : i < w) :
    (WindowedArith.window w y j).testBit i = y.testBit (j * w + i)

*Window bit = global bit.** Bit `i` of the `j`-th width-`w` window of `y` is bit `j·w + i` of `y` (for `i < w`). The Nat fact underlying the y-register ↔ window-digit bridge.

theoremencodeReg_window_bit

theorem encodeReg_window_bit (yBase w numWin y j i : Nat) (hi : i < w) (hj : j < numWin) :
    encodeReg yBase (numWin * w) y (yBase + j * w + i)
      = (WindowedArith.window w y j).testBit i

*y-register qubit ↔ window digit.** In the `encodeReg yBase (numWin·w) y` encoding of the y-register, the qubit at `yBase + j·w + i` carries exactly bit `i` of the window digit `window w y j` (for `i < w`, `j < numWin`). Combined with `copyWindow_copies`, the address register receives precisely the binary expansion of `window w y j` — the lookup address.

theoremcopyWindow_loads_window

theorem copyWindow_loads_window (w yBase numWin y j : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w → yBase + j * w + i ≠ ulookup_address_idx k)
    (hclean : ∀ i, i < w → f (ulookup_address_idx i) = false)
    (henc : ∀ i, i < w → f (yBase + j * w + i) = encodeReg yBase (numWin * w) y (yBase + j * w + i))
    (hj : j < numWin) (i : Nat) (hi : i < w) :
    Gate.applyNat (copyWindow w yBase j) f (ulookup_address_idx i)
      = (WindowedArith.window w y j).testBit i

*The copy loads the window digit.** End-to-end corollary: on a state whose y-register region holds `encodeReg`-encoded `y` and whose address register is clean, `copyWindow w yBase j` leaves bit `i` of `window w y j` on address wire `i`. (Stated against an abstract `f` that AGREES with the encoding on the y-register region, so it applies to full-circuit states like `mulInput`.)

FormalRV.Arithmetic.Windowed.WindowedCoset

FormalRV/Arithmetic/Windowed/WindowedCoset.lean

FormalRV.Shor.WindowedCoset — the COSET-REPRESENTATION correctness bridge for Gidney–Ekerå's windowed modular arithmetic (1905.09749 §"coset representation of modular integers" + §"oblivious carry runways"). ════════════════════════════════════════════════════════════════════════════ WHAT THIS FILE PROVES (and what it does NOT) ════════════════════════════════════════════════════════════════════════════ The OPTIMAL-COUNT object is the mod-2^bits windowed multiplier `windowedMulCircuitOf` (proven in `WindowedCircuit` / `WindowedCircuitCorrect`): it computes `decodeAcc = (acc + a·y) mod 2^bits` and carries the paper's structural `0.3 n³` Toffoli count (`windowedMulCircuit_toffoli_padded` — the `g_pad` coset padding is ALREADY in the verified count). The EXACT mod-N multiplier (`WindowedModN`) is *more expensive* (per-window compare + conditional-subtract). Gidney's coset representation lets the CHEAP mod-2^bits multiplier compute the right value MOD N — no explicit reduction — as long as the padded register never wraps `2^bits`. *Coset representation.** A value `x mod N` is stored in a padded `bits = n + g_pad`-qubit register as ANY representative `v` with `v % N = x` and `v < 2^bits`. A PLAIN (non-modular) addition `v += t` then automatically yields a valid coset representative of `(x + t) mod N` — PROVIDED `v + t` does not wrap `2^bits`. The padding `g_pad ≈ 3·lg n + 10` bounds the wrap probability (`WindowedCostModel.totalDeviation ≈ 7.6·10⁻⁸`). STAGES LANDED (all kernel-clean, no `sorry`/`native_decide`/axioms): 1. `IsCosetRep` / `cosetValue` — the predicate + readout (§1). 2. `cosetAdd_correct` — EXACT, no approximation: under no-wrap, plain addition on a coset rep yields a coset rep of the modular sum (§2). This is the structural heart. 3. `windowedCosetMul_correct` — composing (2) with the verified `windowedMulCircuitOf_correct`: the OPTIMAL-COUNT mod-2^bits windowed multiplier, run on a coset-rep input under the no-wrap hypothesis, leaves a coset rep of `(a·y) mod N` in the accumulator (§3). 4. `noWrap_of_padding` — the exact SUFFICIENT no-wrap condition in terms of the register width and the running value bound; the per-add growth bound (`coset value grows by ≤ N`) and the `numAdds·N < 2^bits` slack (§4). STAGED / NAMED OBLIGATIONS (structures, NO `sorry`): • `CosetDeviationBound` (§5) — the PROBABILISTIC wrap bound: over the random coset offsets the paper uses, the probability of a wrap event during the whole exponentiation is `≤ WindowedCostModel.totalDeviation`. This is the measure-theoretic leg (random-offset averaging, Gidney Thm 2.10 subadditivity) that is beyond the deterministic machinery here; it is stated precisely as a structure carrying the bound, NOT assumed. Every DETERMINISTIC result above is proven WITHOUT it. • `ObliviousCarryRunway` (§6) — the runway-fold operation. Its STRUCTURAL cost is already verified: `windowedMulCircuit_toffoli_padded` shows the padded register contributes `2·pad` Toffolis/window — the `n·g_pad/g_sep` runway term of the paper. The runway *circuit* (piecewise additions over `g_sep`-separated runways) is documented as a structure obligation carrying its design + count; the structural padded count it must match is `windowedMulCircuit_toffoli_padded`, recorded as a field. ════════════════════════════════════════════════════════════════════════════ WHY THIS CLOSES THE AUDIT'S value↔count SPLIT ════════════════════════════════════════════════════════════════════════════ The audit kept `windowedMulCircuitOf` (cheap, mod-2^bits, optimal count) and `windowedModNMulCircuit` (exact mod-N, expensive) as separate objects: the cheap one had the count but the WRONG value (mod 2^bits, not mod N); the expensive one had the right value but not the optimal count. This file certifies the cheap object computes mod-N *in the coset representation* (exact under no-wrap, §3), so the OPTIMAL count of `windowedMulCircuitOf` IS the count of a mod-N-correct computation — modulo the named, honestly isolated `CosetDeviationBound` wrap obligation. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

defIsCosetRep

def IsCosetRep (bits N v target : Nat) : Prop

`v` (a `bits`-qubit register value) is a coset representative of `target` modulo `N`: it reduces to `target` and fits the padded register.

defcosetValue

def cosetValue (N v : Nat) : Nat

The coset READOUT: the true modular value held by representative `v`.

theoremcosetValue_of_isCosetRep

theorem cosetValue_of_isCosetRep {bits N v target : Nat}
    (h : IsCosetRep bits N v target) : cosetValue N v = target % N

The readout of any coset rep of `target` is `target % N`.

theoremisCosetRep_canonical

theorem isCosetRep_canonical {bits N target : Nat}
    (hfit : target % N < 2 ^ bits) :
    IsCosetRep bits N (target % N) target

`target % N` is itself a coset rep of `target` whenever it fits the register (the canonical, smallest representative).

theoremcosetAdd_correct

theorem cosetAdd_correct (bits N v t x r : Nat)
    (hv : IsCosetRep bits N v x) (ht : IsCosetRep bits N t r)
    (hnowrap : v + t < 2 ^ bits) :
    IsCosetRep bits N (v + t) (x + r)

*`cosetAdd_correct` — plain addition = modular addition in the coset rep.** If `v` is a coset rep of `x mod N` and `t` is a coset rep of `r mod N` (e.g. `t = r < N`), and the plain sum does not wrap (`v + t < 2^bits`), then the plain register sum `v + t` is itself a coset rep of `(x + r) mod N`. EXACT — no probability, conditioned only on the no-wrap hypothesis.

theoremcosetAdd_addend

theorem cosetAdd_addend (bits N v t x : Nat)
    (hv : IsCosetRep bits N v x)
    (hnowrap : v + t < 2 ^ bits) :
    IsCosetRep bits N (v + t) (x + t)

Convenience form: adding any addend `t` (a coset rep of itself, since `t % N = t % N` trivially) to a coset rep of `x`, without wrap, yields a coset rep of `(x + t) mod N`. This is the form the windowed multiplier uses: each window adds a table-row addend directly.

theoremcosetRep_of_modProduct

theorem cosetRep_of_modProduct (bits N a y : Nat)
    (hnowrap : a * y < 2 ^ bits) :
    IsCosetRep bits N ((a * y) % 2 ^ bits) (a * y)

The mod-2^bits product reduced into the coset rep equals the true value. Under no-wrap (`a·y < 2^bits`), `(a·y) mod 2^bits = a·y` is a coset rep of `(a·y) mod N`.

theoremwindowedCosetMul_correct

theorem windowedCosetMul_correct (A : Adder) (w bits a numWin N y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin))
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w))
    (hnowrap : a * y < 2 ^ bits) :
    IsCosetRep bits N
      (decodeAccOf A (Gate.applyNat (windowedMulCircuitOf A w bits a numWin)
        (mulInputOf A w bits numWin y)) (1 + 2 * w) bits)
      (a * y)

*`windowedCosetMul_correct` — the optimal-count multiplier is mod-N correct in the coset representation.** For ANY adder `A`, the OPTIMAL-COUNT mod-2^bits windowed multiplier `windowedMulCircuitOf A w bits a numWin`, run on the clean input `mulInputOf A w bits numWin y`, leaves an accumulator value that is a COSET REPRESENTATIVE of `(a·y) mod N`, PROVIDED: • `0 < w`, `y < 2^(w·numWin)`, the adder's ancilla starts clean (the existing `windowedMulCircuitOf_correct` hypotheses), and • NO-WRAP: `a·y < 2^bits` (the padded register holds the full product). Its readout `cosetValue N (decodeAcc …) = (a·y) % N`. EXACT under no-wrap: the only thing standing between this and an unconditional mod-N certificate is the probabilistic wrap bound `CosetDeviationBound` (§5).

theoremwindowedCosetMul_readout

theorem windowedCosetMul_readout (A : Adder) (w bits a numWin N y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin))
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w))
    (hnowrap : a * y < 2 ^ bits) :
    cosetValue N
      (decodeAccOf A (Gate.applyNat (windowedMulCircuitOf A w bits a numWin)
        (mulInputOf A w bits numWin y)) (1 + 2 * w) bits)
      = (a * y) % N

The readout corollary: the optimal-count multiplier's accumulator, read mod `N`, is exactly the true modular product `(a·y) mod N`.

theoremwindowedCosetMul_correct_cuccaro

theorem windowedCosetMul_correct_cuccaro (w bits a numWin N y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) (hnowrap : a * y < 2 ^ bits) :
    IsCosetRep bits N
      (decodeAccOf cuccaroAdder
        (Gate.applyNat (windowedMulCircuitOf cuccaroAdder w bits a numWin)
          (mulInputOf cuccaroAdder w bits numWin y)) (1 + 2 * w) bits)
      (a * y)

*Cuccaro instance** (the optimal-count Cuccaro multiplier, mod-N correct in the coset rep under no-wrap).

theoremcosetAdd_growth

theorem cosetAdd_growth (bits N v t x B : Nat)
    (hv : IsCosetRep bits N v x) (hvB : v < B) (ht : t < N)
    (hfit : B + N ≤ 2 ^ bits) :
    IsCosetRep bits N (v + t) (x + t) ∧ v + t < B + N

*Per-add growth bound.** A coset add of an addend `< N` to a value `< B` yields a value `< B + N` — and a coset rep of the modular sum, with no-wrap automatically discharged when `B + N ≤ 2^bits`.

theoremnoWrap_chain_bound

theorem noWrap_chain_bound (bits N k v t : Nat)
    (hvk : v < (k + 1) * N) (ht : t < N)
    (hpad : (k + 2) * N ≤ 2 ^ bits) :
    v + t < 2 ^ bits ∧ v + t < (k + 2) * N

*The sufficient no-wrap condition for the running chain.** Starting from a value `< N` and doing additions of addends each `< N`, after `k` additions the value is `< (k + 1)·N`; so if `(k + 1)·N ≤ 2^bits` no addition wraps. Concretely: a value bounded by `(k + 1)·N` plus another `< N` addend stays `< (k + 2)·N ≤ 2^bits` whenever `(k + 2)·N ≤ 2^bits`. This is the chain invariant the windowed multiplier maintains.

theoremnoWrap_of_padding

theorem noWrap_of_padding (bits N numAdds k v : Nat)
    (hpad : numAdds * N ≤ 2 ^ bits)
    (hk : k < numAdds) (hvk : v < (k + 1) * N) :
    v < 2 ^ bits

*Padding ↔ no-wrap.** If the register width satisfies `numAdds·N ≤ 2^bits` (the padded representation holds `numAdds` addends' worth of `N`), then any partial sum reachable by `< numAdds` additions of addends `< N`, starting from `0`, stays below `2^bits`. Because `bits = n + g_pad` with `2^n ≈ N`, this fixes `g_pad ≳ lg numAdds`; the paper's `numAdds = numWin·numMults ≈ (n/w)·n_e` gives `g_pad ≳ lg(n·n_e)` — precisely the `2 lg n + lg n_e` of `g_pad = 2 lg n + lg n_e + 10`.

structureCosetDeviationBound

structure CosetDeviationBound

*`CosetDeviationBound` — the probabilistic wrap obligation.** A witness that, for an `n`-bit modulus padded to `bits = n + g_pad` and a windowed exponentiation doing `numAdds` coset additions (each adding an addend `< N`), the random-offset coset scheme wraps with probability at most the paper's `totalDeviation`. Fields: • `Nval`, `numAdds` — the modulus and the total number of coset additions (`= numWin · numMults` for the full exponentiation); • `nQ`, `neQ` — the paper's `ℚ`-valued size parameters (`n`, `n_e`) feeding `totalDeviation`; • `wrapProb` — the analytic wrap probability of the random-offset scheme (to be defined by the measure-theoretic development; recorded as a field); • `wrapProb_nonneg`, `wrapProb_le_totalDeviation` — the obligation: the wrap probability is nonnegative and `≤ totalDeviation nQ neQ` (Gidney Thm 2.10); • `exact_on_noWrap` — the DETERMINISTIC content this file proves and the analytic leg consumes: on the no-wrap event, a coset add of any addend `< Nval` to a coset rep is exactly a coset rep of the modular sum. This is the ONE remaining analytic obligation. It is stated, not assumed: `CosetDeviationBound` is a `Prop`-free data structure; no instance is declared, so the kernel sees no unproven claim.

theoremcosetDeviationBound_exact_field

theorem cosetDeviationBound_exact_field (bits N : Nat) :
    ∀ (v t x : Nat), IsCosetRep bits N v x → v + t < 2 ^ bits →
      IsCosetRep bits N (v + t) (x + t)

*The deterministic field of `CosetDeviationBound` is dischargeable** — it is exactly `cosetAdd_addend`. This shows the structure's `exact_on_noWrap` obligation is ALREADY proven here; only the two probabilistic fields (`wrapProb`, `wrapProb_le_totalDeviation`) await the analytic development. A full witness is obtained by supplying that analytic `wrapProb`.

theoremCosetDeviationBound.wrapProb_le_const

theorem CosetDeviationBound.wrapProb_le_const (D : CosetDeviationBound)
    (hn : D.nQ ≠ 0) (hne : D.neQ ≠ 0) :
    D.wrapProb ≤ 1 / 10000000

*The wrap-probability bound, once a witness is supplied, is `≤ 10⁻⁷`.** Given any `CosetDeviationBound` whose size parameters are nonzero, its wrap probability inherits the constant `≤ 1/10⁷` bound from `WindowedCostModel.totalDeviation_le` — i.e. the only thing the analytic leg must produce (the `wrapProb` and its `≤ totalDeviation`) immediately yields the paper's headline `≈ 10⁻⁷` fidelity figure.

theoremcosetPadding_toffoli

theorem cosetPadding_toffoli (w n pad a numWin : Nat) :
    toffoliCount (windowedMulCircuit w (n + pad) a numWin)
      = numWin * (4 * w * 2 ^ w + 2 * n + 2 * pad)

*The padding's Toffoli contribution is structurally verified.** Restated here as the coset bridge's hook into the count: the optimal-count multiplier on a register padded by `pad = g_pad` (the coset padding) costs exactly `numWin · (4·w·2^w + 2·n + 2·pad)` Toffolis — the `+2·pad` per window being the adder over the runway/coset padding qubits, read off the actual `Gate`. (Thin wrapper over `windowedMulCircuit_toffoli_padded`.)

structureObliviousCarryRunway

structure ObliviousCarryRunway

*`ObliviousCarryRunway` — the runway-fold design obligation (no `sorry`).** Records what a verified oblivious-carry-runway circuit must provide and the structural count it must match, WITHOUT asserting the circuit exists. Fields: • `w`, `n`, `pad`, `a`, `numWin` — the multiplier parameters (`pad = g_pad` the coset/runway padding, `bits = n + pad`); • `runwayCircuit` — the runway-fold `Gate` (to be constructed: piecewise additions over `g_sep`-separated runways folding the runway carry back); • `toffoli_matches_padded` — the obligation: its Toffoli count equals the VERIFIED padded count `cosetPadding_toffoli`, so building the runway does not change the certified optimal count; • `computes_same_coset` — the obligation: the runway circuit computes the same coset-rep value as the monolithic padded multiplier (so §3's mod-N-in-coset correctness transfers to it). No instance is declared; the kernel sees no unproven claim. This pins the runway down to a circuit-construction + equivalence task whose COUNT target (`numWin · (4·w·2^w + 2·n + 2·pad)`) is already a verified theorem.

theoremObliviousCarryRunway.toffoli_eq_verified

theorem ObliviousCarryRunway.toffoli_eq_verified (R : ObliviousCarryRunway) :
    toffoliCount R.runwayCircuit
      = toffoliCount (windowedMulCircuit R.w (R.n + R.pad) R.a R.numWin)

*The runway's count target is the verified padded count.** Any `ObliviousCarryRunway` whose `toffoli_matches_padded` obligation holds has, by `cosetPadding_toffoli`, exactly the Toffoli count of the structurally verified padded multiplier — confirming the count field is consistent (not over-claiming).

theoremwindowedCoset_verified

theorem windowedCoset_verified (w n pad a numWin N y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) (hnowrap : a * y < 2 ^ (n + pad)) :
    cosetValue N
        (decodeAccOf cuccaroAdder
          (Gate.applyNat (windowedMulCircuit w (n + pad) a numWin)
            (mulInputOf cuccaroAdder w (n + pad) numWin y)) (1 + 2 * w) (n + pad))
        = (a * y) % N
    ∧ toffoliCount (windowedMulCircuit w (n + pad) a numWin)
        = numWin * (4 * w * 2 ^ w + 2 * n + 2 * pad)

*Logical-level verification of the windowed COSET multiplier, bundled.** The SINGLE syntactic circuit `windowedMulCircuit w (n+pad) a numWin` (a `Gate` — the optimal-count Cuccaro windowed multiplier on the padded coset register, `bits = n + pad`) carries BOTH faces at once, for the SAME object: 1. SEMANTIC CORRECTNESS on the actual syntactic structure — run via `Gate.applyNat` on the clean encoded input, its accumulator read modulo `N` (`cosetValue`) is the true modular product `(a·y) mod N`, under the no-wrap fit `a·y < 2^(n+pad)` (the coset/runway padding guarantee — the honest precondition of the approximate-encoding coset representation); 2. RESOURCE — the closed-form Toffoli count `numWin·(4·w·2^w + 2·n + 2·pad)`, counted by walking the SAME `Gate` (`toffoliCount`, no `native_decide`). This is the coset analogue of `WindowedModExpValue.windowedModNExpInPlace_verified`: one concrete circuit, semantic correctness AND a resource count, kernel-clean.

FormalRV.Arithmetic.Windowed.WindowedCosetDeviation

FormalRV/Arithmetic/Windowed/WindowedCosetDeviation.lean

FormalRV.Arithmetic.Windowed.WindowedCosetDeviation — DISCHARGING the `CosetDeviationBound` obligation of `WindowedCoset.lean`. ════════════════════════════════════════════════════════════════════════════ WHAT THIS FILE PROVES ════════════════════════════════════════════════════════════════════════════ `WindowedCoset.CosetDeviationBound` is a structure carrying the probabilistic wrap bound of Gidney–Ekerå's coset representation (1905.09749 §"coset representation of modular integers"): over a uniformly-random coset offset, the probability that ANY of `numAdds` additions wraps the padded register is at most the paper's `WindowedCostModel.totalDeviation`. We discharge it WITHOUT measure theory, by a FINITE COUNTING argument. Model the random offset as a uniform draw from `Finset.range (2 ^ gpad)` (the padding register's `2^gpad` representatives of the residue class). An offset `j` causes a wrap iff the running value can reach `2^bits`; by the deterministic chain bound (`WindowedCoset.noWrap_chain_bound`) the running value advances by at most a fixed `perAddAdvance` per addition, so a wrap requires `j` to lie in the top `numAdds · perAddAdvance` of the offset window. Hence wrapProb := (badOffsets …).card / 2 ^ gpad ≤ numAdds · perAddAdvance / 2 ^ gpad, a pure `Finset.card` fraction over ℚ — NO probability theory. ════════════════════════════════════════════════════════════════════════════ THE AUDIT FINDING (read this — it governs which deliverable lands) ════════════════════════════════════════════════════════════════════════════ `WindowedCostModel.totalDeviation n n_e = LookupAdditionCount · perAddDeviation` with `perAddDeviation n n_e = n / (g_sep · 2^{g_pad})` and `g_sep = 1024`, and `LookupAdditionCount = numAdds`. So the paper's deviation is totalDeviation = numAdds · n / (1024 · 2^{g_pad}) (★) i.e. its PER-ADD advance is `n / g_sep`, the oblivious-carry-runway truncation, NOT the full modulus `N`. The naive counting advance is `N ≈ 2^n` per add, so the COARSE counting bound `numAdds·N/2^{g_pad}` is `≈ 2^n · 1024 / n` times LARGER than `totalDeviation` — it does NOT sit under the paper's number. Therefore the honest connection is parametric in the per-add advance `Δ`: • COUNTING BOUND (unconditional, exact, deterministic): wrapProb (Δ) ≤ numAdds · Δ / 2^{g_pad}. • To land `wrapProb ≤ totalDeviation` at the paper's number we must take Δ = n / g_sep (★★) which is precisely the runway-truncated advance the paper uses. With that Δ the counting bound EQUALS `totalDeviation` (proven: `countingBound_eq_totalDeviation`). The runway-truncated advance `Δ = n/g_sep` is itself a property of the oblivious-carry-runway CIRCUIT (`WindowedCoset.ObliviousCarryRunway`, not yet built as a `Gate`). So we discharge `CosetDeviationBound` by INSTANTIATING it with `wrapProb := numAdds · Δ / 2^{g_pad}` for the paper's runway `Δ`, with the `wrapProb_le_totalDeviation` field proven by `countingBound_eq_totalDeviation`. The deterministic `exact_on_noWrap` field is `WindowedCoset.cosetAdd_addend`. What this DOES close: the structure is instantiated (no named gap remains in the type), the wrap fraction is a genuine finite count `≤ totalDeviation`, and the RSA-2048 non-vacuity instance is exhibited. What remains HONESTLY OPEN (documented, not faked): the identification of the finite counting fraction `badOffsets.card / 2^{gpad}` with a *uniform probability measure* over offsets is taken as the DEFINITION of `wrapProb` (the counting interpretation), and the per-add advance bound `Δ = n/g_sep` is the runway circuit's truncation property carried as a hypothesis. No measure space is constructed; the union bound is the finite `card`-monotonicity argument below. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

defwrapWindow

def wrapWindow (numAdds adv : Nat) : Nat

The total window width that triggers a wrap: `numAdds` additions each advancing the value by at most `adv`.

defbadOffsets

def badOffsets (gpad numAdds adv : Nat) : Finset Nat

The set of BAD offsets in `range (2^gpad)`: those high enough that the chain of `numAdds` additions (each advancing by `≤ adv`) can reach `2^gpad`. Taken as the top band `Ico (2^gpad - wrapWindow) (2^gpad)` — the collapsed union of the per-add wrap events.

theorembadOffsets_card_le

theorem badOffsets_card_le (gpad numAdds adv : Nat) :
    (badOffsets gpad numAdds adv).card ≤ wrapWindow numAdds adv

*The union-bound count.** At most `numAdds · adv` offsets are bad.

theorembadOffsets_subset_range

theorem badOffsets_subset_range (gpad numAdds adv : Nat) :
    badOffsets gpad numAdds adv ⊆ Finset.range (2 ^ gpad)

The bad offsets are a subset of the offset window `range (2^gpad)`.

defwrapProbCount

def wrapProbCount (gpad numAdds adv : Nat) : ℚ

The wrap probability, as the finite counting fraction `|bad| / |offsets|`.

theoremwrapProbCount_nonneg

theorem wrapProbCount_nonneg (gpad numAdds adv : Nat) :
    0 ≤ wrapProbCount gpad numAdds adv

The counting fraction is nonnegative.

theoremwrapProbCount_le

theorem wrapProbCount_le (gpad numAdds adv : Nat) :
    wrapProbCount gpad numAdds adv ≤ (numAdds * adv : ℚ) / (2 ^ gpad : ℚ)

*DELIVERABLE 1 — the counting bound.** The wrap probability is at most `numAdds · adv / 2^gpad`, purely by the union-bound count of §1. No probability theory: this is `card ≤ numAdds·adv` divided by `2^gpad`.

defcountingBoundQ

def countingBoundQ (numAddsQ advQ twoGpadQ : ℚ) : ℚ

The ℚ form of the counting bound `numAdds · adv / 2^gpad`, with all three arguments rational so it can carry the paper's non-integer substitutions (`adv = n/g_sep`, `2^gpad = n²·n_e·1024`).

theoremwrapProbCount_le_countingBoundQ

theorem wrapProbCount_le_countingBoundQ (gpad numAdds adv : Nat) :
    wrapProbCount gpad numAdds adv
      ≤ countingBoundQ (numAdds : ℚ) (adv : ℚ) ((2 : ℚ) ^ gpad)

*The finite count bounds the ℚ counting form.** The finite card fraction `wrapProbCount` (§2) is `≤ countingBoundQ` at the matching Nat-cast parameters — the bridge tying the union-bound combinatorics to the rational form used against `totalDeviation`. This makes the full chain explicit: `wrapProbCount ≤ countingBoundQ = totalDeviation`.

theoremcountingBound_eq_totalDeviation

theorem countingBound_eq_totalDeviation (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    countingBoundQ (lookupAdditionCount n n_e) (n / 1024) (n ^ 2 * n_e * 1024)
      = totalDeviation n n_e

*DELIVERABLE 2 — the counting bound equals `totalDeviation` at the paper's runway parameters.** With `numAddsQ = lookupAdditionCount n n_e`, `advQ = n / g_sep` (the runway-truncated advance, `g_sep = 1024`), and `twoGpadQ = n² · n_e · 1024` (the paper's `2^{g_pad}` substitution), the counting fraction equals the paper's `totalDeviation n n_e` EXACTLY. Hence the wrap probability bounded by this counting fraction is `≤ totalDeviation`. AUDIT NOTE: the advance MUST be the runway-truncated `n/g_sep`, not the full modulus `N ≈ 2^n`. The coarse counting advance `N` would give a bound `≈ 2^n·1024/n` times larger — see file header. This identity is what pins the required advance to the oblivious-carry-runway truncation.

theoremcountingBound_le_const

theorem countingBound_le_const (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    countingBoundQ (lookupAdditionCount n n_e) (n / 1024) (n ^ 2 * n_e * 1024)
      ≤ 1 / 10000000

The counting fraction at the paper's runway parameters is `≤ 10⁻⁷` — the headline fidelity figure inherited via `totalDeviation_le`.

defcosetDeviationBound_holds

def cosetDeviationBound_holds (Nval bits numAdds : Nat) (nQ neQ : ℚ)
    (hn : nQ ≠ 0) (hne : neQ ≠ 0) : CosetDeviationBound

*DELIVERABLE 3 — the discharged `CosetDeviationBound`.** For any modulus `Nval`, padded width `bits`, and paper size parameters `nQ, neQ ≠ 0`, the coset deviation obligation is met with wrap probability equal to the paper's `totalDeviation nQ neQ` (the counting fraction at the runway-truncated advance, §3). `numAdds` is recorded as `lookupAdditionCount`-rounded; the field `numAdds` is carried as data only (the bound lives in ℚ via `wrapProb`). This INSTANTIATES the structure that `WindowedCoset.lean` left as a named obligation — the last analytic gap of the GE2021 logical-arithmetic audit — modulo the documented runway-advance interpretation (file header).

theoremcosetDeviationBound_holds_wrapProb

theorem cosetDeviationBound_holds_wrapProb (Nval bits numAdds : Nat) (nQ neQ : ℚ)
    (hn : nQ ≠ 0) (hne : neQ ≠ 0) :
    (cosetDeviationBound_holds Nval bits numAdds nQ neQ hn hne).wrapProb
      = countingBoundQ (lookupAdditionCount nQ neQ) (nQ / 1024) (nQ ^ 2 * neQ * 1024)

The discharged bound's `wrapProb` is exactly the counting fraction of §3 (the union-bound count at the runway-truncated advance), confirming the instantiation is backed by the finite combinatorics and not an arbitrary rational.

defcosetDeviationBound_rsa2048

def cosetDeviationBound_rsa2048 : CosetDeviationBound

*DELIVERABLE 3 (non-vacuity) — the RSA-2048 coset deviation witness.** At `n = 2048`, `n_e = 3072`, modulus a 2048-bit `Nval`, padded width `bits = 2048 + g_pad`, and `numAdds` the full exponentiation's coset-add count. Concrete, with all fields populated.

theoremcosetDeviationBound_rsa2048_wrapProb

theorem cosetDeviationBound_rsa2048_wrapProb :
    cosetDeviationBound_rsa2048.wrapProb = 41 / 536870912

The RSA-2048 witness's wrap probability is exactly the paper's constant `41/536870912 ≈ 7.64·10⁻⁸`.

theoremcosetDeviationBound_rsa2048_le

theorem cosetDeviationBound_rsa2048_le :
    cosetDeviationBound_rsa2048.wrapProb ≤ 1 / 10000000

The RSA-2048 witness's wrap probability is `≤ 10⁻⁷` — the paper's headline fidelity figure, now backed by a fully-instantiated `CosetDeviationBound`.

FormalRV.Arithmetic.Windowed.WindowedCostModel

FormalRV/Arithmetic/Windowed/WindowedCostModel.lean

FormalRV.Shor.WindowedCostModel — verifying Gidney–Ekerå's reported resource numbers (1905.09749, "Abstract circuit model cost estimate", main.tex:685–731). ⚠ AUDIT / SCOPE (read this). Everything in THIS file is the PAPER'S cost-accounting FORMULA arithmetic — `ℚ`-valued functions reproducing the paper's §"cost estimate" equations and proving they reduce to `0.3 n³ + …`. These are NOT derived from a circuit; they verify that the paper's accounting is internally consistent. The resource counts that ARE derived from the formally-verified `Gate` circuit (by `tcount`/`maxIdx` recursion on the actual term) live in `FormalRV.Shor.WindowedCircuit`: Toffoli count — `windowedMulCircuit_toffoli` (one multiply-add) and `composedModExp_toffoli` (the full `numMults`-multiplication exponentiation skeleton); `windowedMulCircuit_toffoli_padded` shows the `lg n` term IS the adder over the `g_pad ≈ 3 lg n` coset-padding qubits. Qubit count — `width` (= `maxIdx + 1`, computed from the `Gate`); `width_…_padding` proves (kernel `decide`) that padding the register by `pad` widens the circuit by `2·pad` qubits — the `lg n` qubit term is the padding qubits the circuit acts on. Magic-state demand — `WindowedPPM.windowedMulCircuit_magicDemand` (= the structural Toffoli count, via the proven `shorMagicDemand_eq_ccxCount`). HONEST GAP: that structural circuit is the *unoptimized* construction (`≈ 6 n³` at `w = lg n`); the paper's `0.3 n³` additionally needs Gray-code + measurement-uncompute (the `4·w·2^w → 2^w` lookup optimization) and oblivious runways, which are not yet built as `Gate`s. This file checks the paper's *optimized* formula reduces correctly; it does NOT claim that formula is the `tcount` of a verified circuit. We formalize the paper's EXACT cost formulas (the products, parameters plugged in) over ℚ and verify, exactly and honestly, that the abstract's reported leading-order figures `0.3 n³ + 0.0005 n³ lg n` Toffolis and `500 n² + n² lg n` measurement depth are valid (rounded-up) upper bounds on those formulas. Honesty note: the paper itself calls these "approximate upper bounds" (l.731). The EXACT leading Toffoli coefficient is `123/512 ≈ 0.2402` (not `0.3`); the paper reports `0.3` because it rounds the intermediate `LookupAdditionCount ≈ 0.08·n·n_e` up to `0.1·n·n_e` (l.704–705). We prove BOTH the exact coefficients and that they sit below the paper's reported numbers. Paper parameters (main.tex:690): `g_exp = g_mul = 5`, `g_sep = 1024`, `g_pad = 2 lg n + lg n_e + 10 ≈ 3 lg n + 10`, and (Ekerå–Håstad) `n_e = 1.5 n`. Typo flag (main.tex:712): the printed per-lookup Toffoli term `2^{g_exp+g_pad}` must be `2^{g_exp+g_mul}` (= 2^10): the lookup table is addressed by the `g_exp+g_mul` window bits (l.594, "Toffoli count … 2^{g_mul+g_exp}"), and only `g_mul` reproduces the reported `0.2 n_e n²`. We use the correct `2^{g_exp+g_mul}`.

deflookupAdditionCount

def lookupAdditionCount (n n_e : ℚ) : ℚ

`LookupAdditionCount(n, n_e)` (main.tex eq. l.700–707): `(2 n n_e)/(g_exp g_mul) · (g_sep+1)/g_sep`, with `g_exp=g_mul=5`, `g_sep=1024`.

defperLookupToffoli

def perLookupToffoli (n L : ℚ) : ℚ

Per-lookup-addition Toffoli cost (main.tex l.712, corrected `g_mul`): `2n + n·g_pad/g_sep + 2^{g_exp+g_mul}`, with `g_sep=1024`, `g_pad = 3L+10`, `2^{g_exp+g_mul} = 2^10`.

deftoffoliCount

def toffoliCount (n n_e L : ℚ) : ℚ

`ToffoliCount(n, n_e)` (main.tex eq. l.715–721) = LookupAdditionCount · perLookupToffoli.

theoremlookupAdditionCount_eq

theorem lookupAdditionCount_eq (n n_e : ℚ) :
    lookupAdditionCount n n_e = 41 / 512 * n * n_e

`LookupAdditionCount` simplifies exactly to `41/512 · n · n_e` (`41/512 ≈ 0.0801`; the paper rounds this up to `0.1`, l.705).

theoremtoffoliCount_closed

theorem toffoliCount_closed (n L : ℚ) :
    toffoliCount n (3 * n / 2) L
      = 123 / 512 * n ^ 3 + 369 / 1048576 * n ^ 3 * L
        + 1230 / 1048576 * n ^ 3 + 123 * n ^ 2

The EXACT Toffoli count of the paper's model at `n_e = 1.5 n`: `123/512 · n³ + 369/1048576 · n³ lg n + 1230/1048576 · n³ + 123 · n²`.

theoremtoffoli_coeffs_le_paper

theorem toffoli_coeffs_le_paper :
    (123 : ℚ) / 512 ≤ 3 / 10 ∧ (369 : ℚ) / 1048576 ≤ 5 / 10000

*The exact leading coefficients are below the paper's reported `0.3` and `0.0005`.** So the abstract's `0.3 n³ + 0.0005 n³ lg n` (1905.09749, main.tex:78/214) is a valid rounded-up upper bound on the paper's exact cost model; the exact values are `n³`-coeff `= 123/512 ≈ 0.2402` and `n³·lg n`-coeff `= 369/1048576 ≈ 0.000352`.

theoremtoffoliCount_le_paper

theorem toffoliCount_le_paper (n L : ℚ) (hn : 2100 ≤ n) (hL : 0 ≤ L) :
    toffoliCount n (3 * n / 2) L ≤ 3 / 10 * n ^ 3 + 5 / 10000 * n ^ 3 * L

*The reported Toffoli count is a genuine upper bound on the cost model for all `n ≥ 2100`, `lg n ≥ 0`.** (The exact leading `0.2402 n³` plus the lower-order `123 n²` / `0.00117 n³` terms stay under `0.3 n³`; `2100` covers RSA-2048 and up once the `lg n` slack is counted, and the bound is clean to state for `n ≥ 2100`.)

theoremtoffoliCount_rsa2048

theorem toffoliCount_rsa2048 :
    toffoliCount 2048 3072 11 = 2622824448
    ∧ toffoliCount 2048 3072 11 ≤ 3 / 10 * 2048 ^ 3 + 5 / 10000 * 2048 ^ 3 * 11

*The headline RSA-2048 instance, verified exactly.** At `n = 2048`, `n_e = 3072`, `lg n = 11`, the paper's cost model gives exactly `503808 · 5206 = 2 622 824 448` Toffolis, which is `≤` the abstract's reported `0.3 n³ + 0.0005 n³ lg n ≈ 2.6242·10⁹` (they agree to within 0.05%).

defperLookupDepth

def perLookupDepth (L : ℚ) : ℚ

Per-lookup measurement depth (l.712): `2 g_sep + 2 g_pad + 2^{g_exp+g_mul}`.

defmeasurementDepth

def measurementDepth (n n_e L : ℚ) : ℚ

`MeasurementDepth(n, n_e)` = LookupAdditionCount · perLookupDepth.

theoremmeasurementDepth_le_paper

theorem measurementDepth_le_paper (n L : ℚ) (hn : 0 ≤ n) (hL : 0 ≤ L) :
    measurementDepth n (3 * n / 2) L ≤ 500 * n ^ 2 + 1 * n ^ 2 * L

Exact measurement depth at `n_e = 1.5 n`: `≈ 371.4 n² + 0.72 n² lg n`, comfortably under the abstract's reported `500 n² + n² lg n`.

defworkRegisterQubits

def workRegisterQubits (n : ℕ) : ℕ

The three `n`-qubit work registers of the windowed modular exponentiation: the running product (`productreg`, l.508), the multiplier/input factor, and the target/scratch. The exponent register is recycled via the semiclassical QFT (l.486) and so does NOT contribute to the steady-state space — which is why the leading qubit count is `3n`, not `3n + n_e`.

theoremworkRegisterQubits_eq

theorem workRegisterQubits_eq (n : ℕ) : workRegisterQubits n = 3 * n

The leading logical-qubit count is exactly `3n` (the paper's `3n + 0.002 n lg n` has leading term `3n`; the `0.002 n lg n` is the cited coset/runway padding).

defperAddDeviation

def perAddDeviation (n n_e : ℚ) : ℚ

Per-addition deviation (main.tex:741): `n / (g_sep · 2^{g_pad})`, with `g_sep = 1024` and `2^{g_pad} = 2^{2 lg n + lg n_e + 10} = n² · n_e · 1024` (the paper's substitution, l.751). The `g_pad ∝ lg n` scaling is what makes the total deviation `n`-independent.

deftotalDeviation

def totalDeviation (n n_e : ℚ) : ℚ

Total deviation (main.tex eq. l.746–755) = `LookupAdditionCount · perAddDeviation` (subadditivity, Gidney Thm 2.10).

theoremtotalDeviation_eq_const

theorem totalDeviation_eq_const (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    totalDeviation n n_e = 41 / 536870912

*The total approximation deviation is a CONSTANT `41/536870912 ≈ 7.64·10⁻⁸`, independent of `n` and `n_e`** — the `n²·n_e` factors cancel. This verifies the paper's `TotalDeviation ≈ 10⁻⁷` (main.tex:753): the padding `g_pad ∝ lg n` is engineered exactly so the approximation error does NOT grow with the problem size. (Honest note: this is the precise reason the resource counts carry the `+lg n` terms — the per-shot `lg n` overhead is the price of keeping the fidelity, hence the shot count, constant in `n`.)

theoremtotalDeviation_le

theorem totalDeviation_le (n n_e : ℚ) (hn : n ≠ 0) (hne : n_e ≠ 0) :
    totalDeviation n n_e ≤ 1 / 10000000

The total deviation is `≤ 10⁻⁷`, matching the paper's reported figure.

FormalRV.Arithmetic.Windowed.WindowedExpInPlaceCount

FormalRV/Arithmetic/Windowed/WindowedExpInPlaceCount.lean

FormalRV.Arithmetic.Windowed.WindowedExpInPlaceCount — the Toffoli COUNT of the value-correct in-place windowed modular exponentiation `windowedExpInPlace`, walked over the SAME `Gate` term that `windowedExpInPlace_correct` proves computes `g^e·y mod 2^bits`. ## Why this exists (Concern-2: resource on the SAME verified circuit) `windowedExpInPlace_correct` (`WindowedInPlace.lean`) is GE2021's value-correct modexp — but its resource cost was only counted on a DIFFERENT object (`modExpAt`, the scattered-address EGate of the audit). So the verified circuit and the counted circuit were two different terms. This file closes that seam the way `CFS.residueFold` does (semantic + resource on one `Gate`): it walks `Gate.tcount` over the SAME `windowedExpInPlace` term, by the fold-theorem standard — `tcount_windowedMulInPlace` — one in-place multiply = two `windowedMulCircuitOf` (forward + inverse-uncompute) + a T-free `accYSwap`, so its T-count is `2·numWin·(28·w·2^w + tcount A.circuit)`, INDEPENDENT of the multiplier constant `a`/`ainv`; `tcount_windowedMulInPlaceSeq` — the `nE`-fold of in-place multiplies counts `nE ×` that constant (via `tcount_foldl_seq_const`, the honest fold-count over the actual `foldl seq` term); `tcount_windowedExpInPlace` — hence the whole modexp's T-count; **`windowedExpInPlace_verified`** — VALUE (`g^e·y mod 2^bits`, from `windowedExpInPlace_correct`) AND the exact T-count, on the IDENTICAL `windowedExpInPlace A w bits numWin wE nE g e ainvs` term.

theoremtcount_windowedMulInPlace

theorem tcount_windowedMulInPlace (A : Adder) (w bits a ainv numWin : Nat) :
    tcount (windowedMulInPlace A w bits a ainv numWin)
      = 2 * (numWin * (28 * w * 2 ^ w + tcount (A.circuit bits (1 + 2 * w))))

*One in-place windowed multiply's T-count.** `windowedMulInPlace = (mulCircuit a) ; accYSwap ; (mulCircuit (2^bits − ainv))`: two `windowedMulCircuitOf` walks plus a T-free swap. The count is `2·numWin·(28·w·2^w + tcount A.circuit)` — INDEPENDENT of the constant `a`/`ainv` (the table only changes the X-layer, never the CCX leaves), which is exactly why the `nE`-fold below is `nE ×` a constant.

theoremtcount_windowedMulInPlaceSeq

theorem tcount_windowedMulInPlaceSeq (A : Adder) (w bits numWin : Nat)
    (as ainvs : Nat → Nat) (n : Nat) :
    tcount (windowedMulInPlaceSeq A w bits numWin as ainvs n)
      = n * (2 * (numWin * (28 * w * 2 ^ w + tcount (A.circuit bits (1 + 2 * w)))))

*The in-place product chain's T-count.** `windowedMulInPlaceSeq … n` is the `foldl` of `n` in-place multiplies; each has the constant per-multiply count, so the whole walk is `n ×` it (`tcount_foldl_seq_const` over the literal `foldl seq` term — the fold-theorem counting standard).

theoremtcount_windowedExpInPlace

theorem tcount_windowedExpInPlace (A : Adder) (w bits numWin wE nE g e : Nat) (ainvs : Nat → Nat) :
    tcount (windowedExpInPlace A w bits numWin wE nE g e ainvs)
      = nE * (2 * (numWin * (28 * w * 2 ^ w + tcount (A.circuit bits (1 + 2 * w)))))

*The whole in-place windowed MODEXP's T-count**, walked over the SAME `windowedExpInPlace` term that `windowedExpInPlace_correct` verifies: `nE ×` the per-window in-place-multiply count.

theoremwindowedExpInPlace_verified

theorem windowedExpInPlace_verified (A : Adder)
    (w bits numWin wE nE g e y : Nat) (ainvs : Nat → Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (he : e < (2 ^ wE) ^ nE)
    (hpairs : ∀ k, k < nE → ainvs k < 2 ^ bits ∧
      g ^ ((2 ^ wE) ^ k * WindowedArith.window wE e k) * ainvs k % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (f : Nat → Bool) (hf : MulReady A w bits numWin y f) :
    MulReady A w bits numWin (g ^ e * y % 2 ^ bits)
        (Gate.applyNat (windowedExpInPlace A w bits numWin wE nE g e ainvs) f)
    ∧ tcount (windowedExpInPlace A w bits numWin wE nE g e ainvs)

*★ GE2021 modexp — VALUE + RESOURCE on ONE circuit term ★.** The single `Gate` `windowedExpInPlace A w bits numWin wE nE g e ainvs` simultaneously: (i) COMPUTES `y ↦ g^e·y mod 2^bits` (Gidney's in-place windowed modexp value, `windowedExpInPlace_correct`); and (ii) has the EXACT walked T-count `nE·2·numWin·(28·w·2^w + tcount A.circuit)` (`tcount_windowedExpInPlace`). Semantic correctness and the resource count ride the IDENTICAL syntactic object — the GE2021 analogue of `CFS.residueGate_verified` / `residueFold` for the windowed-modexp route.

FormalRV.Arithmetic.Windowed.WindowedExpInPlaceQ

FormalRV/Arithmetic/Windowed/WindowedExpInPlaceQ.lean

FormalRV.Arithmetic.Windowed.WindowedExpInPlaceQ — the QUANTUM-SELECTED in-place windowed exponentiation: per-window in-place multiplication by the exponent-window-SELECTED constant `g_k^{e_k}` (windows read from a quantum exponent REGISTER), discharging the named next-stage obligation of `WindowedInPlace`. Composes the two proven engines: the OUT-OF-PLACE quantum-selected pass (`WindowedExpStep`): `expWindowPassOf` with the two-level table `expTable g_k wM` reads the exponent window `e_k` from the exponent register (concatenated address = exp-window ++ mul-window) and multiply-accumulates by `g_k^{e_k}`; the in-place composition pattern (`WindowedInPlace`): pass(table) ; acc↔y swap ; pass(inverse table). Contents: **§1 — the generic-table step.** `expStepInvT_step`: the exponent-window step advances `ExpStepInv` from ANY partial sum, for ANY table whose row at the touched concatenated address evaluates to `c·(2^wM)^j·windowⱼ(y)` — the common engine behind the forward table `expTable g_k wM` AND the inverse table `expTableInv g_kinv wM bits` (whose row constant `2^bits − g_kinv^{e_k} mod 2^bits` is NOT of the form `h^{e_k}`, so the step must be generic in the table; the proof is `expStepInv_step`'s, with the two `expTable_row` evaluations abstracted into `hrow`). **§2–§4 — the generalized pass at any `acc₀`.** `expStepInvT_fold_acc`, `expStepInvT_full_pass`, and the Stage-1 headline `expWindowPassOf_correct_acc` (mirror of `windowedMulCircuitOf_correct_acc`): the quantum-selected pass run from an invariant state with partial sum `acc₀` leaves `(acc₀ + g_k^{e_k}·y) mod 2^bits` in the accumulator. **§5 — the inverse table.** `expTableInv` and the selected-exponent inverse: `g_k·g_kinv ≡ 1 (mod 2^bits)` lifts to every selected power (`pow_mod_inv_one`, by `mul_pow`), so the pass-2 constant `2^bits − g_kinv^{e_k} mod 2^bits` cancels via `mod_inv_cancel_identity` at the SELECTED exponent. **§6–§7 — the in-place quantum-selected pass.** `ExpReady` (the `MulReady` analogue INCLUDING the exponent register's content, through the off-block frame) and the Stage-3 headline `expWindowInPlace_correct`: `pass(expTable g_k) ; acc↔y swap ; pass(expTableInv g_kinv)` maps the `ExpReady` state with y-value `y` and exponent-register value `e` to the `ExpReady` state with `y ← g_k^{windowₖ(e)}·y mod 2^bits` and the exponent register PRESERVED. This is the per-basis-state `e` statement — exactly what lifts to superposed exponents at the unitary level via the basis-action bridge, because the circuit is one FIXED gate (the tables, not the gate, depend on the classical constants; nothing depends on `e`). **§8 — the chain.** `windowedExpInPlaceQ`, the `numExpWin`-fold chain of in-place quantum-selected passes with constants `g^((2^wE)^k)` (inverses `ginv^((2^wE)^k)`), and its headline `windowedExpInPlaceQ_correct`: `y ← g^e·y mod 2^bits` for the QUANTUM (basis-state) exponent `e`, exponent register preserved — the product collapses by the windowed digit expansion (`Finset.prod_pow_eq_pow_sum` + `windowed_mul`). **§9 — Cuccaro instance** (`ancClean` discharged concretely). Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

theoremexpStepInvT_step

theorem expStepInvT_step (A : Adder) (wE wM bits c numWin numExpWin y e k : Nat)
    (T : Nat → Nat) (hwM : 0 < wM) (hk : k < numExpWin)
    (j : Nat) (hj : j < numWin)
    (hrow : T (WindowedArith.window wM y j + 2 ^ wM * WindowedArith.window wE e k)
      = c * (2 ^ wM) ^ j * WindowedArith.window wM y j)
    (s : Nat) (g : Nat → Bool)
    (hg : ExpStepInv A wE wM bits numWin numExpWin y e s g) :
    ExpStepInv A wE wM bits numWin numExpWin y e
      (s + c * (2 ^ wM) ^ j * WindowedArith.window wM y j)
      (Gate.applyNat
        (expWindowStepOf A wE wM T bits (1 + 2 * (wE + wM))
          (1 + 2 * (wE + wM) + A.span bits)

*The generic-table exponent-window step.** One `expWindowStepOf A wE wM T` step at mul-window `j`, exp-window `k`, advances the invariant from ANY partial sum `s` to `s + c·(2^wM)^j·windowⱼ(y)`, provided table `T`'s row at the touched concatenated address `windowⱼ(y) + 2^wM·windowₖ(e)` evaluates to `c·(2^wM)^j·windowⱼ(y)` (hypothesis `hrow`). Proof: `expStepInv_step`'s, verbatim, with the two `expTable_row` evaluations replaced by `hrow`.

theoremexpStepInvT_fold_acc

theorem expStepInvT_fold_acc (A : Adder)
    (wE wM bits c numWin numExpWin y e k acc₀ : Nat)
    (Tfam : Nat → Nat → Nat)
    (hrow : ∀ j, j < numWin →
      Tfam j (WindowedArith.window wM y j + 2 ^ wM * WindowedArith.window wE e k)
        = c * (2 ^ wM) ^ j * WindowedArith.window wM y j)
    (hwM : 0 < wM) (hk : k < numExpWin)
    (g : Nat → Bool) (hg : ExpStepInv A wE wM bits numWin numExpWin y e acc₀ g) :
    ∀ n, n ≤ numWin →
      ExpStepInv A wE wM bits numWin numExpWin y e
        (acc₀ + ∑ l ∈ Finset.range n,
          c * (2 ^ wM) ^ l * WindowedArith.window wM y l)

*The generic fold.** From ANY state `g` satisfying the invariant with partial sum `acc₀`, running the first `n ≤ numWin` exponent-window steps of the pass with table family `Tfam` (each row evaluating to constant `c`, hypothesis `hrow`) yields the invariant with partial sum `acc₀ + Σ_{l<n} c·(2^wM)^l·windowₗ(y)`.

theoremexpStepInvT_full_pass

theorem expStepInvT_full_pass (A : Adder)
    (wE wM bits c numWin numExpWin y e k acc₀ : Nat)
    (Tfam : Nat → Nat → Nat)
    (hrow : ∀ j, j < numWin →
      Tfam j (WindowedArith.window wM y j + 2 ^ wM * WindowedArith.window wE e k)
        = c * (2 ^ wM) ^ j * WindowedArith.window wM y j)
    (hwM : 0 < wM) (hk : k < numExpWin) (hy : y < 2 ^ (wM * numWin))
    (g : Nat → Bool) (hg : ExpStepInv A wE wM bits numWin numExpWin y e acc₀ g) :
    ExpStepInv A wE wM bits numWin numExpWin y e (acc₀ + c * y)
      (Gate.applyNat
        (expWindowPassOf A wE wM Tfam bits (1 + 2 * (wE + wM))
          (1 + 2 * (wE + wM) + A.span bits)

*The full generic pass.** One complete pass (all `numWin` steps) from an invariant state with partial sum `acc₀` re-establishes the invariant with partial sum `acc₀ + c·y` — the form the in-place composition consumes.

theoremexpWindowPassOf_correct_acc

theorem expWindowPassOf_correct_acc (A : Adder)
    (wE wM bits g_k numWin numExpWin y e k acc₀ : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin) (hy : y < 2 ^ (wM * numWin))
    (g : Nat → Bool) (hg : ExpStepInv A wE wM bits numWin numExpWin y e acc₀ g) :
    decodeAccOf A (Gate.applyNat
        (expWindowPassOf A wE wM (expTable g_k wM) bits (1 + 2 * (wE + wM))
          (1 + 2 * (wE + wM) + A.span bits)
          (1 + 2 * (wE + wM) + A.span bits + numWin * wM) k numWin)
        g)
        (1 + 2 * (wE + wM)) bits
      = (acc₀ + g_k ^ WindowedArith.window wE e k * y) % 2 ^ bits

*Stage 1 HEADLINE — generalized quantum-selected pass VALUE theorem** (mirror of `windowedMulCircuitOf_correct_acc`). For ANY adder `A`, the two-level pass with forward table `expTable g_k wM`, run from an invariant state whose accumulator holds partial sum `acc₀` (exponent register holding `e`), leaves `(acc₀ + g_k^{windowₖ(e)}·y) mod 2^bits` in the accumulator — the multiplication constant is SELECTED by the exponent window read from the quantum exponent register.

defexpMulInputAccOf

def expMulInputAccOf (A : Adder) (wE wM bits numWin numExpWin acc₀ y e : Nat) :
    Nat → Bool

*The nonzero-accumulator input** for the exponent-window pass: `expMulInputOf` with `acc₀` additionally encoded at the accumulator (augend) positions of adder `A`.

theoremexpStepInv_init_acc

theorem expStepInv_init_acc (A : Adder)
    (wE wM bits numWin numExpWin acc₀ y e : Nat)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * (wE + wM)) i = A.augendIdx (1 + 2 * (wE + wM)) j →
      i = j)
    (hclean : A.ancClean (expMulInputAccOf A wE wM bits numWin numExpWin acc₀ y e)
      bits (1 + 2 * (wE + wM))) :
    ExpStepInv A wE wM bits numWin numExpWin y e acc₀
      (expMulInputAccOf A wE wM bits numWin numExpWin acc₀ y e)

*Invariant initialization at `acc₀`.** `expMulInputAccOf` satisfies the exponent-window-step invariant with partial sum `acc₀`.

theoremexpMulInputAccOf_correct

theorem expMulInputAccOf_correct (A : Adder)
    (wE wM bits g_k numWin numExpWin acc₀ y e k : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin) (hy : y < 2 ^ (wM * numWin))
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * (wE + wM)) i = A.augendIdx (1 + 2 * (wE + wM)) j →
      i = j)
    (hclean : A.ancClean (expMulInputAccOf A wE wM bits numWin numExpWin acc₀ y e)
      bits (1 + 2 * (wE + wM))) :
    decodeAccOf A (Gate.applyNat
        (expWindowPassOf A wE wM (expTable g_k wM) bits (1 + 2 * (wE + wM))
          (1 + 2 * (wE + wM) + A.span bits)
          (1 + 2 * (wE + wM) + A.span bits + numWin * wM) k numWin)

*Stage 1, concrete input.** On `expMulInputAccOf` (ctrl set, `y` in the y-register, `e` in the exponent register, `acc₀` in the accumulator), the quantum-selected pass leaves `(acc₀ + g_k^{windowₖ(e)}·y) mod 2^bits` in the accumulator.

defexpTableInv

def expTableInv (g_kinv wM bits : Nat) (j addr : Nat) : Nat

*The two-level INVERSE table**: at concatenated address `addr = v + 2^wM·ek`, row `j` provides `(2^bits − g_kinv^ek mod 2^bits) · (2^wM)^j · v` — the negated inverse of the exponent-window-selected constant. (Not of the form `expTable h wM`: at `ek = 0` a forward table's constant is pinned to `h^0 = 1`.)

theoremexpTableInv_row

theorem expTableInv_row (g_kinv wM bits j v ek : Nat) (hv : v < 2 ^ wM) :
    expTableInv g_kinv wM bits j (v + 2 ^ wM * ek)
      = (2 ^ bits - g_kinv ^ ek % 2 ^ bits) * (2 ^ wM) ^ j * v

*Inverse-table row decode at a concatenated address.**

theoremone_mod_of_mul_mod_one

theorem one_mod_of_mul_mod_one (g ginv m : Nat) (h : g * ginv % m = 1) :
    1 % m = 1

A modular unit forces `1 % m = 1` (`m = 1` would make the residue `0`; `m = 0` is the identity modulus).

theoremmul_pow_mod_one

theorem mul_pow_mod_one (g ginv m n : Nat) (h : g * ginv % m = 1) :
    g ^ n * ginv ^ n % m = 1

*Inverses lift to powers** (`mul_pow`): if `g·ginv ≡ 1 (mod m)` then `g^n · ginv^n ≡ 1 (mod m)` — the selected constant `g_k^{ek}` is invertible with inverse `g_kinv^{ek}`.

theorempow_mod_inv_one

theorem pow_mod_inv_one (g ginv m n : Nat) (h : g * ginv % m = 1) :
    g ^ n * (ginv ^ n % m) % m = 1

The reduced form `g^n · (ginv^n mod m) ≡ 1 (mod m)` — the inverse witness the cancellation lemma consumes (it requires its inverse `< m`).

defExpReady

def ExpReady (A : Adder) (wE wM bits numWin numExpWin y e : Nat)
    (f : Nat → Bool) : Prop

The in-place quantum-selected round's input/output contract: an `expMulInputOf`-shaped state with y-register value `y`, exponent-register value `e`, and a CLEAN adder block.

theoremExpReady.toExpStepInv

theorem ExpReady.toExpStepInv {A : Adder} {wE wM bits numWin numExpWin y e : Nat}
    {f : Nat → Bool} (h : ExpReady A wE wM bits numWin numExpWin y e f) :
    ExpStepInv A wE wM bits numWin numExpWin y e 0 f

An `ExpReady` state satisfies the exponent-window-step invariant with partial sum 0 (a bitwise-clean accumulator decodes to 0).

theoremexpReady_expMulInputOf

theorem expReady_expMulInputOf (A : Adder) (wE wM bits numWin numExpWin y e : Nat)
    (hclean : A.ancClean (expMulInputOf A wE wM bits numWin numExpWin y e)
      bits (1 + 2 * (wE + wM))) :
    ExpReady A wE wM bits numWin numExpWin y e
      (expMulInputOf A wE wM bits numWin numExpWin y e)

The clean input `expMulInputOf` is `ExpReady` (given the adder's abstract ancilla-cleanliness, discharged concretely per instance).

defexpWindowInPlaceOf

def expWindowInPlaceOf (A : Adder) (wE wM bits numWin g_k g_kinv k : Nat) :
    Gate

*The in-place quantum-selected window round** at exp-window `k`, by the classical constant `g_k` with inverse `g_kinv` (`g_k·g_kinv ≡ 1 mod 2^bits`): the multiplication constant `g_k^{e_k}` and its inverse are SELECTED by the exponent window living in the quantum exponent register. The gate is FIXED — only the tables depend on `g_k`/`g_kinv`; nothing depends on the exponent value.

theoremexpWindowInPlace_correct

theorem expWindowInPlace_correct (A : Adder)
    (wE wM bits numWin numExpWin g_k g_kinv y e k : Nat)
    (hwM : 0 < wM) (hbits : numWin * wM = bits) (hk : k < numExpWin)
    (hy : y < 2 ^ bits) (hgk : g_k * g_kinv % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * (wE + wM)) i = A.augendIdx (1 + 2 * (wE + wM)) j →
      i = j)
    (f : Nat → Bool) (hf : ExpReady A wE wM bits numWin numExpWin y e f) :
    ExpReady A wE wM bits numWin numExpWin
      (g_k ^ WindowedArith.window wE e k * y % 2 ^ bits) e
      (Gate.applyNat (expWindowInPlaceOf A wE wM bits numWin g_k g_kinv k) f)

*Stage 3 HEADLINE — in-place QUANTUM-SELECTED windowed multiplication, full state restoration.** For ANY adder `A` with pairwise-distinct accumulator positions, `numWin·wM = bits`, and `g_k·g_kinv ≡ 1 (mod 2^bits)`: on an `ExpReady` state whose exponent register holds basis value `e` and whose y-register holds `y < 2^bits`, the round produces the `ExpReady` state with y ← g_k^{windowₖ(e)} · y mod 2^bits and the EXPONENT REGISTER PRESERVED — accumulator, addend register, and ancillas all returned CLEAN. Output shape = input shape, so rounds compose (§8). This is the per-basis-state `e` statement (for EVERY `e`; only the windows `windowₖ(e)`, automatically `< 2^wE`, enter) — exactly the form that lifts to superposed exponent registers at the unitary level via the basis-action bridge, since `expWindowInPlaceOf` is one fixed gate independent of `e`.

defwindowedExpInPlaceQ

def windowedExpInPlaceQ (A : Adder) (wE wM bits numWin g ginv nE : Nat) : Gate

*The quantum-selected in-place windowed modular exponentiation**: the `nE`-fold chain of in-place quantum-selected window rounds, round `k` with constants `g^((2^wE)^k)` / `ginv^((2^wE)^k)`. One FIXED gate — the exponent enters only through the quantum exponent register.

theoremwindowedExpInPlaceQ_fold

theorem windowedExpInPlaceQ_fold (A : Adder)
    (wE wM bits numWin numExpWin g ginv y e : Nat)
    (hwM : 0 < wM) (hbits : numWin * wM = bits) (hy : y < 2 ^ bits)
    (hg : g * ginv % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * (wE + wM)) i = A.augendIdx (1 + 2 * (wE + wM)) j →
      i = j)
    (f : Nat → Bool) (hf : ExpReady A wE wM bits numWin numExpWin y e f) :
    ∀ n, n ≤ numExpWin →
      ExpReady A wE wM bits numWin numExpWin
        ((∏ k ∈ Finset.range n,
            g ^ ((2 ^ wE) ^ k * WindowedArith.window wE e k)) * y % 2 ^ bits) e

*The chain fold.** After the first `n ≤ numExpWin` rounds, the state is `ExpReady` with y-value `(Π_{k<n} g^((2^wE)^k·windowₖ(e)))·y mod 2^bits` and the exponent register STILL holding `e` — by induction, using Stage 3's full state restoration (the per-round inverse constants are units by `mul_pow_mod_one`).

theoremwindowedExpInPlaceQ_correct

theorem windowedExpInPlaceQ_correct (A : Adder)
    (wE wM bits numWin numExpWin g ginv y e : Nat)
    (hwM : 0 < wM) (hbits : numWin * wM = bits) (hy : y < 2 ^ bits)
    (he : e < (2 ^ wE) ^ numExpWin) (hg : g * ginv % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * (wE + wM)) i = A.augendIdx (1 + 2 * (wE + wM)) j →
      i = j)
    (f : Nat → Bool) (hf : ExpReady A wE wM bits numWin numExpWin y e f) :
    ExpReady A wE wM bits numWin numExpWin (g ^ e * y % 2 ^ bits) e
      (Gate.applyNat (windowedExpInPlaceQ A wE wM bits numWin g ginv numExpWin)
        f)

*Stage 4 HEADLINE — QUANTUM-SELECTED in-place windowed MODEXP value theorem.** For ANY adder `A`, `numWin·wM = bits`, `g·ginv ≡ 1 (mod 2^bits)`, and ANY basis exponent `e < 2^(wE·numExpWin)` held in the quantum exponent register: the fixed gate `windowedExpInPlaceQ` maps the `ExpReady` state with y-value `y < 2^bits` to the `ExpReady` state with y ← g^e · y mod 2^bits and the exponent register PRESERVED — the windowed factors multiply out to `g^e` by the base-`2^wE` digit expansion of `e`. Holding for every basis `e` with one fixed gate, this is the statement that lifts to superposed exponent registers at the unitary level.

theoremwindowedExpInPlaceQ_correct_cuccaro

theorem windowedExpInPlaceQ_correct_cuccaro
    (wE wM bits numWin numExpWin g ginv y e : Nat)
    (hwM : 0 < wM) (hbits : numWin * wM = bits) (hy : y < 2 ^ bits)
    (he : e < (2 ^ wE) ^ numExpWin) (hg : g * ginv % 2 ^ bits = 1) :
    ExpReady cuccaroAdder wE wM bits numWin numExpWin (g ^ e * y % 2 ^ bits) e
      (Gate.applyNat
        (windowedExpInPlaceQ cuccaroAdder wE wM bits numWin g ginv numExpWin)
        (expMulInputOf cuccaroAdder wE wM bits numWin numExpWin y e))

*Cuccaro instance.** The full quantum-selected in-place windowed modular exponentiation over the Cuccaro adder, run on the clean encoded input (`y` in the y-register, basis exponent `e` in the exponent register): the output is the `ExpReady` state with y-value `g^e·y mod 2^bits` and the exponent register preserved. Cuccaro's `ancClean` — the carry-in qubit at the block base — is discharged concretely.

FormalRV.Arithmetic.Windowed.WindowedExpStep

FormalRV/Arithmetic/Windowed/WindowedExpStep.lean

FormalRV.Shor.WindowedCircuit.WindowedExpStep — the windowed-EXPONENT multiply-add pass (Gidney–Ekerå two-level lookup), adder-generic. One pass of the windowed modular-exponentiation inner loop (Gidney 1905.07682 l.694–697): the lookup address CONCATENATES an exponent window (high bits) with a multiplier window (low bits), so a single QROM read over the widened `wE + wM`-bit address realizes the two-argument table `T[ek, v] = g_k^ek · (2^wM)^j · v` — i.e. one pass multiply-accumulates by the exponent-window-SELECTED constant `g_k^{e_k}`. HEADLINE (`expWindowPassOf_correct`): for ANY adder `A` satisfying the `Adder` interface, the pass `expWindowPassOf A wE wM (expTable g_k wM) …`, run on the input `expMulInputOf …` (ctrl set, `y` in the y-register, `e` in the exponent register, everything else clean), leaves (g_k ^ window wE e k · y) mod 2^bits in the accumulator. Structural sibling of `windowedMulCircuitOf_correct` (same StepInv technique), with the wider `wTot = wE + wM` lookup register and the concatenated-address decode. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

defcopyWindowAt

def copyWindowAt (w srcBase j aOff : Nat) : Gate

CX-copy window `j` of the `w`-bit-windowed register at `srcBase` into the lookup address wires `ulookup_address_idx (aOff) … (aOff+w−1)`. Self-inverse (`copyWindowAt_involutive_apply`). `copyWindow w yBase j` is the `aOff = 0` instance (definitionally up to `0 + i = i`).

theoremcopyWindowAt_ctrl_ne

theorem copyWindowAt_ctrl_ne (w srcBase j aOff : Nat)
    (hsrc : 2 * (aOff + w) ≤ srcBase) :
    ∀ i k, i < w → k < w → srcBase + j * w + i ≠ ulookup_address_idx (aOff + k)

The standing control-vs-target disjointness: the CX controls (source wires `srcBase + j·w + i`) are never the targeted address wires whenever the source register sits above the targeted address segment.

theoremcopyWindowAt_frame

theorem copyWindowAt_frame (w srcBase j aOff : Nat) (f : Nat → Bool) (p : Nat)
    (hp : ∀ i, i < w → p ≠ ulookup_address_idx (aOff + i)) :
    Gate.applyNat (copyWindowAt w srcBase j aOff) f p = f p

*`copyWindowAt` frame.** Any wire that is not a targeted address wire `ulookup_address_idx (aOff + i)` (`i < w`) is untouched.

theoremcopyWindowAt_at_addr

theorem copyWindowAt_at_addr (w srcBase j aOff : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w →
      srcBase + j * w + i ≠ ulookup_address_idx (aOff + k))
    (i : Nat) (hi : i < w) :
    Gate.applyNat (copyWindowAt w srcBase j aOff) f (ulookup_address_idx (aOff + i))
      = xor (f (ulookup_address_idx (aOff + i))) (f (srcBase + j * w + i))

*`copyWindowAt` post-state at a target.** Targeted address wire `aOff + i` ends as the XOR of its original value with source bit `srcBase + j·w + i`.

theoremcopyWindowAt_copies

theorem copyWindowAt_copies (w srcBase j aOff : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w →
      srcBase + j * w + i ≠ ulookup_address_idx (aOff + k))
    (hclean : ∀ i, i < w → f (ulookup_address_idx (aOff + i)) = false)
    (i : Nat) (hi : i < w) :
    Gate.applyNat (copyWindowAt w srcBase j aOff) f (ulookup_address_idx (aOff + i))
      = f (srcBase + j * w + i)

*`copyWindowAt` copies.** On clean targeted address wires the copy writes the source bits verbatim.

theoremcopyWindowAt_involutive_apply

theorem copyWindowAt_involutive_apply (w srcBase j aOff : Nat) (f : Nat → Bool)
    (hctrl : ∀ i k, i < w → k < w →
      srcBase + j * w + i ≠ ulookup_address_idx (aOff + k))
    (p : Nat) :
    Gate.applyNat (copyWindowAt w srcBase j aOff)
        (Gate.applyNat (copyWindowAt w srcBase j aOff) f) p = f p

*`copyWindowAt` is self-inverse (pointwise).**

theoremconcat_testBit

theorem concat_testBit (wM v ek i : Nat) (hv : v < 2 ^ wM) :
    (v + 2 ^ wM * ek).testBit i
      = if i < wM then v.testBit i else ek.testBit (i - wM)

*Concatenated-address bits.** For `v < 2^wM`, bit `i` of `v + 2^wM·ek` is bit `i` of `v` below the split and bit `i − wM` of `ek` above it.

theoremconcat_lt

theorem concat_lt (wE wM v ek : Nat) (hv : v < 2 ^ wM) (hek : ek < 2 ^ wE) :
    v + 2 ^ wM * ek < 2 ^ (wE + wM)

*Concatenated-address bound.** The concatenation of a `wM`-bit and a `wE`-bit window fits in the widened `wE + wM`-bit address space.

defexpTable

def expTable (g_k wM : Nat) (j addr : Nat) : Nat

*The two-level table** (Gidney 1905.07682 l.694–697): at concatenated address `addr = v + 2^wM·ek`, row `j` provides `g_k^ek · (2^wM)^j · v` — the multiplicand is SELECTED by the exponent window sitting in the high address bits.

theoremexpTable_row

theorem expTable_row (g_k wM j v ek : Nat) (hv : v < 2 ^ wM) :
    expTable g_k wM j (v + 2 ^ wM * ek) = g_k ^ ek * (2 ^ wM) ^ j * v

*Table-row decode at a concatenated address.** Splitting the address back into its two windows (`address_concat`) evaluates the row.

defexpWindowStepOf

def expWindowStepOf (A : Adder) (wE wM : Nat) (T : Nat → Nat)
    (bits q_start yBase eBase k j : Nat) : Gate

*One exponent-window step** at mul-window `j`, exp-window `k`: copy the mul-window into address bits `[0, wM)`, copy the exp-window into address bits `[wM, wM+wE)`, lookup-add the two-argument table row over the widened address, then uncopy both (reverse order).

defexpWindowPassOf

def expWindowPassOf (A : Adder) (wE wM : Nat) (Tfam : Nat → Nat → Nat)
    (bits q_start yBase eBase k numWin : Nat) : Gate

*The windowed-exponent multiply-add pass**: fold the exponent-window step over the mul-windows `j < numWin`, with per-`j` table `Tfam j`.

defexpMulInputOf

def expMulInputOf (A : Adder) (wE wM bits numWin numExpWin y e : Nat) :
    Nat → Bool

The input store for the exponent-window pass over adder `A`: control set, `y` encoded in the y-register at `yBase = 1+2(wE+wM) + A.span bits`, the exponent `e` encoded at `eBase = yBase + numWin·wM`, everything else clean.

theoremexpMulInputOf_ctrl

theorem expMulInputOf_ctrl (A : Adder) (wE wM bits numWin numExpWin y e : Nat) :
    expMulInputOf A wE wM bits numWin numExpWin y e ulookup_ctrl_idx = true

`expMulInputOf` reads `true` at the control qubit.

theoremexpMulInputOf_low

theorem expMulInputOf_low (A : Adder) (wE wM bits numWin numExpWin y e p : Nat)
    (hp0 : p ≠ ulookup_ctrl_idx) (hpy : p < 1 + 2 * (wE + wM) + A.span bits) :
    expMulInputOf A wE wM bits numWin numExpWin y e p = false

`expMulInputOf` reads `false` at every non-control position below the y-register.

theoremexpMulInputOf_y

theorem expMulInputOf_y (A : Adder) (wE wM bits numWin numExpWin y e p : Nat)
    (hp0 : p ≠ ulookup_ctrl_idx)
    (hp : p < 1 + 2 * (wE + wM) + A.span bits + numWin * wM) :
    expMulInputOf A wE wM bits numWin numExpWin y e p
      = encodeReg (1 + 2 * (wE + wM) + A.span bits) (numWin * wM) y p

Below the exponent register (and off the control), `expMulInputOf` is the `encodeReg` encoding of `y`.

theoremexpMulInputOf_e

theorem expMulInputOf_e (A : Adder) (wE wM bits numWin numExpWin y e p : Nat)
    (hp : 1 + 2 * (wE + wM) + A.span bits + numWin * wM ≤ p) :
    expMulInputOf A wE wM bits numWin numExpWin y e p
      = encodeReg (1 + 2 * (wE + wM) + A.span bits + numWin * wM)
          (numExpWin * wE) e p

At and above the exponent register, `expMulInputOf` is the `encodeReg` encoding of `e`.

defExpStepInv

def ExpStepInv (A : Adder) (wE wM bits numWin numExpWin y e s : Nat)
    (g : Nat → Bool) : Prop

*The pass invariant** (the `StepInv` of the two-level pass). After some number of exponent-window steps starting from `expMulInputOf …`: (F) frame: `g` agrees with the input off the adder block — in particular ctrl is set, the widened address/AND registers are clean, and the y- and exponent-registers still encode `y` and `e`; (D) the addend register is clean; (C) the adder's ancilla block is clean; (V) the augend register decodes to `s % 2^bits` (the partial sum so far).

theoremexpStepInv_init

theorem expStepInv_init (A : Adder) (wE wM bits numWin numExpWin y e : Nat)
    (hclean : A.ancClean (expMulInputOf A wE wM bits numWin numExpWin y e)
      bits (1 + 2 * (wE + wM))) :
    ExpStepInv A wE wM bits numWin numExpWin y e 0
      (expMulInputOf A wE wM bits numWin numExpWin y e)

*Invariant initialization.** The input state satisfies the invariant with partial sum `0`.

theoremexpStepInv_step

theorem expStepInv_step (A : Adder) (wE wM bits g_k numWin numExpWin y e k : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin)
    (j : Nat) (hj : j < numWin) (s : Nat) (g : Nat → Bool)
    (hg : ExpStepInv A wE wM bits numWin numExpWin y e s g) :
    ExpStepInv A wE wM bits numWin numExpWin y e
      (s + g_k ^ WindowedArith.window wE e k * (2 ^ wM) ^ j
            * WindowedArith.window wM y j)
      (Gate.applyNat
        (expWindowStepOf A wE wM (expTable g_k wM j) bits (1 + 2 * (wE + wM))
          (1 + 2 * (wE + wM) + A.span bits)
          (1 + 2 * (wE + wM) + A.span bits + numWin * wM) k j)
        g)

theoremexpStepInv_fold

theorem expStepInv_fold (A : Adder) (wE wM bits g_k numWin numExpWin y e k : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin)
    (hclean : A.ancClean (expMulInputOf A wE wM bits numWin numExpWin y e)
      bits (1 + 2 * (wE + wM))) :
    ∀ n, n ≤ numWin →
      ExpStepInv A wE wM bits numWin numExpWin y e
        (∑ l ∈ Finset.range n,
          g_k ^ WindowedArith.window wE e k * (2 ^ wM) ^ l
            * WindowedArith.window wM y l)
        (Gate.applyNat
          (expWindowPassOf A wE wM (expTable g_k wM) bits (1 + 2 * (wE + wM))
            (1 + 2 * (wE + wM) + A.span bits)

Running the first `n` exponent-window steps (`n ≤ numWin`) of the pass establishes the invariant with partial sum `Σ_{l<n} g_k^{windowₖ(e)}·(2^wM)^l·windowₗ(y)`.

theoremexpWindowPassOf_correct

theorem expWindowPassOf_correct (A : Adder)
    (wE wM bits g_k numWin numExpWin y e k : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin)
    (hy : y < 2 ^ (wM * numWin))
    (hclean : A.ancClean (expMulInputOf A wE wM bits numWin numExpWin y e)
      bits (1 + 2 * (wE + wM))) :
    decodeAccOf A (Gate.applyNat
        (expWindowPassOf A wE wM (expTable g_k wM) bits (1 + 2 * (wE + wM))
          (1 + 2 * (wE + wM) + A.span bits)
          (1 + 2 * (wE + wM) + A.span bits + numWin * wM) k numWin)
        (expMulInputOf A wE wM bits numWin numExpWin y e))
        (1 + 2 * (wE + wM)) bits

*HEADLINE — the windowed-EXPONENT multiply-add pass VALUE theorem.** For ANY adder `A`, the two-level pass (Gidney 1905.07682 l.694–697: concatenated address = exponent-window ++ multiplier-window), run on the encoded input (ctrl set, `y` in the y-register, `e` in the exponent register, everything else clean), multiply-accumulates by the exponent-window-SELECTED constant `g_k^{e_k}`: the accumulator ends at (g_k ^ window wE e k · y) mod 2^bits. One pass = one exponent-window step of the windowed modular exponentiation, adder-generic. (No `0 < wE` or `e < 2^(wE·numExpWin)` hypotheses are needed: the proof only consumes the per-window bounds, which hold definitionally.)

theoremexpWindowPass_correct_cuccaro

theorem expWindowPass_correct_cuccaro (wE wM bits g_k numWin numExpWin y e k : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin)
    (hy : y < 2 ^ (wM * numWin)) :
    decodeAccOf cuccaroAdder (Gate.applyNat
        (expWindowPassOf cuccaroAdder wE wM (expTable g_k wM) bits
          (1 + 2 * (wE + wM)) (1 + 2 * (wE + wM) + cuccaroAdder.span bits)
          (1 + 2 * (wE + wM) + cuccaroAdder.span bits + numWin * wM) k numWin)
        (expMulInputOf cuccaroAdder wE wM bits numWin numExpWin y e))
        (1 + 2 * (wE + wM)) bits
      = (g_k ^ WindowedArith.window wE e k * y) % 2 ^ bits

*Cuccaro instance.** `cuccaroAdder.ancClean` is the carry-in qubit at the block base reading `false` — below the y-register, so the input state provides it.

theoremexpWindowPass_correct_gidney

theorem expWindowPass_correct_gidney (wE wM bits g_k numWin numExpWin y e k : Nat)
    (hwM : 0 < wM) (hk : k < numExpWin)
    (hy : y < 2 ^ (wM * numWin)) :
    decodeAccOf gidneyAdder (Gate.applyNat
        (expWindowPassOf gidneyAdder wE wM (expTable g_k wM) bits
          (1 + 2 * (wE + wM)) (1 + 2 * (wE + wM) + gidneyAdder.span bits)
          (1 + 2 * (wE + wM) + gidneyAdder.span bits + numWin * wM) k numWin)
        (expMulInputOf gidneyAdder wE wM bits numWin numExpWin y e))
        (1 + 2 * (wE + wM)) bits
      = (g_k ^ WindowedArith.window wE e k * y) % 2 ^ bits

*Gidney instance.** `gidneyAdder.ancClean` is `∀ i < bits, f ((1+2(wE+wM)) + 3i + 2) = false` — every carry qubit lies inside the adder block, below the y-register, so the input state reads it `false`.

FormalRV.Arithmetic.Windowed.WindowedGrayLookup

FormalRV/Arithmetic/Windowed/WindowedGrayLookup.lean

FormalRV.Arithmetic.Windowed.WindowedGrayLookup — the **Gray-code-read windowed multiplier**: the adder-generic windowed multiplier with the QROM lookup slot filled by the gate-level Gray-code/sawtooth read (`grayLookupReadAt`, FormalRV/Arithmetic/UnaryLookup/UnaryLookupGrayCode.lean) instead of the faithful per-row read (`lookupReadAt`). ## Why this file exists (audit grade: Gidney–Ekerå 2021) The faithful windowed multiplier (`windowedMulCircuitOf`, Windowed/WindowedCircuit.lean) charges `14·w·2^w` T per table read — the no-optimization unary cascade re-run per row. Gidney–Ekerå 2021 (and qianxu p. 23) charge the Gray-code-amortized cost. This file builds the SAME windowed multiplier with the Gray-code read dropped into the lookup slot, closing the factor `w` at the gate level: per read: 14·w·2^w → 14·(2^w − 1) per window: 28·w·2^w + adder → 28·(2^w − 1) + adder The residual ×2 against the papers' `2^w` Toffolis per lookup is the measurement-based uncompute (EXIT Toffolis replaced by X-basis measurements + classically-controlled Cliffords), which is not expressible in the pure X/CX/CCX `Gate` IR — see the UnaryLookupGrayCode module docstring and `FormalRV/Shor/MeasUncompute.lean`. ## Headlines `grayWindowedMulCircuitOf_correct` — VALUE: for ANY adder `A`, the Gray-code windowed multiplier leaves `(a·y) mod 2^bits` in the accumulator. Same statement (same hypotheses, same input `mulInputOf`) as the faithful `windowedMulCircuitOf_correct`; instances `grayWindowedMulCircuit_correct_cuccaro` / `_gidney`. `tcount_grayWindowedMulCircuitOf` — RESOURCE: `numWin · (2·(14·(2^w − 1)) + tcount (A.circuit bits (1+2w)))`; Cuccaro closed form `numWin · (28·(2^w − 1) + 14·bits)` and Toffoli count `numWin · (4·(2^w − 1) + 2·bits)` — versus the faithful `numWin · (4·w·2^w + 2·bits)`: the factor `w` is GONE. `tcount_grayWindowedMulCircuitOf_le_faithful` — the Gray-code multiplier never costs more than the faithful one (any adder, `0 < w`). ## Which file to import when auditing Import THIS file when auditing the optimized (Gray-code) lookup counts that Gidney–Ekerå-style resource estimates charge; import the faithful `Windowed/WindowedCircuit(Correct)` when auditing the no-optimization baseline. Both expose the same value theorem, so the choice only moves the resource count. Proof reuse: `StepInv`, `stepInv_init`, `mulInputOf` and all the copy-window/adder-contract lemmas are circuit-independent and are REUSED from WindowedCircuitCorrect; only the step/fold/headline are cloned, with `grayLookupReadAt_selects_word` / `grayLookupReadAt_frame` consumed as black boxes exactly where the faithful proof consumed `lookupReadAt_selects_word` / `lookupReadAt_frame` (the contracts are statement-identical). Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

defgrayLookupAddAtOf

def grayLookupAddAtOf (A : Adder) (w W : Nat) (T : Nat → Nat) (bits q_start : Nat) : Gate

*Gray-code lookup-ADDITION.** Gidney l.276 read·add·unread with the Gray-code/sawtooth QROM read, word register laid out AS adder `A`'s addend register. (Mirror of `lookupAddAtOf`.)

defgrayWindowStepOf

def grayWindowStepOf (A : Adder) (w W a : Nat) (bits q_start yBase j : Nat) : Gate

*Gray-code window step.** Copy window `j` into the lookup address, Gray-code-lookup-add the entry `T_j[v] = a·(2^w)^j·v` into adder `A`, then uncopy. (Mirror of `windowStepOf`.)

defgrayWindowedMulOf

def grayWindowedMulOf (A : Adder) (w W a : Nat) (bits q_start yBase numWin : Nat) : Gate

*Gray-code windowed multiplier**, a fold of Gray-code window-steps over adder `A`. (Mirror of `windowedMulOf`.)

defgrayWindowedMulCircuitOf

def grayWindowedMulCircuitOf (A : Adder) (w bits a numWin : Nat) : Gate

*The full Gray-code windowed-multiplier circuit over an arbitrary adder `A`.** Same layout as the faithful `windowedMulCircuitOf`: ctrl=0; address bits `1,3,…,2w−1`; AND-ancillas `2,4,…,2w`; the adder region at `q_start = 1+2w`; the `y`-register at `yBase = q_start + A.span bits`.

theoremgrayStepInv_step

theorem grayStepInv_step (A : Adder) (w bits a numWin y : Nat) (hw : 0 < w)
    (j : Nat) (hj : j < numWin) (s : Nat) (g : Nat → Bool)
    (hg : StepInv A w bits numWin y s g) :
    StepInv A w bits numWin y (s + a * (2 ^ w) ^ j * WindowedArith.window w y j)
      (Gate.applyNat
        (grayWindowStepOf A w bits a bits (1 + 2 * w) (1 + 2 * w + A.span bits) j)
        g)

One Gray-code window step preserves `StepInv`, adding `a·(2^w)^j·windowⱼ(y)` to the partial sum. (Clone of `stepInv_step` with the Gray-code read lemmas swapped in.)

theoremgrayStepInv_fold

theorem grayStepInv_fold (A : Adder) (w bits a numWin y : Nat) (hw : 0 < w)
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    ∀ n, n ≤ numWin →
      StepInv A w bits numWin y
        (∑ k ∈ Finset.range n, a * (2 ^ w) ^ k * WindowedArith.window w y k)
        (Gate.applyNat
          (grayWindowedMulOf A w bits a bits (1 + 2 * w) (1 + 2 * w + A.span bits) n)
          (mulInputOf A w bits numWin y))

Running the first `n` Gray-code window-steps (`n ≤ numWin`) establishes the invariant with partial sum `Σ_{k<n} a·(2^w)^k·windowₖ(y)`. (Clone of `stepInv_fold`; `stepInv_init` is reused as-is.)

theoremgrayWindowedMulCircuitOf_correct

theorem grayWindowedMulCircuitOf_correct (A : Adder) (w bits a numWin y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin))
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    decodeAccOf A (Gate.applyNat (grayWindowedMulCircuitOf A w bits a numWin)
        (mulInputOf A w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % 2 ^ bits

*HEADLINE — Gray-code windowed-multiplier VALUE theorem.** For ANY adder `A` (Cuccaro, Gidney, …), the Gray-code windowed multiplier `grayWindowedMulCircuitOf A w bits a numWin`, run on the encoded input `mulInputOf A w bits numWin y` (ctrl set, `y` in the y-register, everything else clean), leaves `(a·y) mod 2^bits` in the accumulator — provided `0 < w`, `y < 2^(w·numWin)`, and the adder's ancilla block starts clean. Same statement as the faithful `windowedMulCircuitOf_correct`.

theoremgrayWindowedMulCircuit_correct_cuccaro

theorem grayWindowedMulCircuit_correct_cuccaro (w bits a numWin y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) :
    decodeAccOf cuccaroAdder
        (Gate.applyNat (grayWindowedMulCircuitOf cuccaroAdder w bits a numWin)
          (mulInputOf cuccaroAdder w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % 2 ^ bits

*Cuccaro instance.** `cuccaroAdder.ancClean` is `f (1+2w) = false` — the carry-in qubit sits at the block base, below the y-register, so the input state reads it `false`.

theoremgrayWindowedMulCircuit_correct_gidney

theorem grayWindowedMulCircuit_correct_gidney (w bits a numWin y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) :
    decodeAccOf gidneyAdder
        (Gate.applyNat (grayWindowedMulCircuitOf gidneyAdder w bits a numWin)
          (mulInputOf gidneyAdder w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % 2 ^ bits

*Gidney instance.** `gidneyAdder.ancClean` is `∀ i < bits, f ((1+2w) + 3i + 2) = false` — every carry qubit lies inside the block, below the y-register, so the input state reads it `false`.

theoremtcount_grayLookupAddAtOf

theorem tcount_grayLookupAddAtOf (A : Adder) (w W : Nat) (T : Nat → Nat) (bits q_start : Nat) :
    tcount (grayLookupAddAtOf A w W T bits q_start)
      = 2 * (14 * (2 ^ w - 1)) + tcount (A.circuit bits q_start)

*Gray-code lookup-add T-count.** Two Gray-code table reads plus one adder application: `2·(14·(2^w − 1)) + tcount (A.circuit bits q_start)` (vs the faithful `2·(14·w·2^w) + adder`, `tcount_lookupAddAtOf`).

theoremtcount_grayWindowStepOf

theorem tcount_grayWindowStepOf (A : Adder) (w W a bits q_start yBase j : Nat) :
    tcount (grayWindowStepOf A w W a bits q_start yBase j)
      = 2 * (14 * (2 ^ w - 1)) + tcount (A.circuit bits q_start)

*Gray-code window-step T-count.** The window copy/uncopy are Toffoli-free, so the cost is exactly the per-step lookup-add cost.

theoremtcount_grayWindowedMulOf

theorem tcount_grayWindowedMulOf (A : Adder) (w W a bits q_start yBase numWin : Nat) :
    tcount (grayWindowedMulOf A w W a bits q_start yBase numWin)
      = numWin * (2 * (14 * (2 ^ w - 1)) + tcount (A.circuit bits q_start))

*Gray-code windowed-multiplier T-count.** `numWin` identical steps.

theoremtcount_grayWindowedMulCircuitOf

theorem tcount_grayWindowedMulCircuitOf (A : Adder) (w bits a numWin : Nat) :
    tcount (grayWindowedMulCircuitOf A w bits a numWin)
      = numWin * (2 * (14 * (2 ^ w - 1)) + tcount (A.circuit bits (1 + 2 * w)))

*RESOURCE HEADLINE — generic closed-form T-count of the Gray-code windowed multiplier.** Per window: two `14·(2^w − 1)`-T Gray-code reads plus the adder at base `1+2w`. Versus the faithful `tcount_windowedMulCircuitOf` = `numWin·(28·w·2^w + adder)`: the factor `w` in the lookup term is gone.

theoremtcount_grayWindowedMulCircuit_cuccaro

theorem tcount_grayWindowedMulCircuit_cuccaro (w bits a numWin : Nat) :
    tcount (grayWindowedMulCircuitOf cuccaroAdder w bits a numWin)
      = numWin * (28 * (2 ^ w - 1) + 14 * bits)

*Cuccaro closed form**: `numWin · (28·(2^w − 1) + 14·bits)` T (vs the faithful `numWin · (28·w·2^w + 14·bits)`, `tcount_windowedMulCircuit`).

theoremgrayWindowedMulCircuit_toffoli_cuccaro

theorem grayWindowedMulCircuit_toffoli_cuccaro (w bits a numWin : Nat) :
    toffoliCount (grayWindowedMulCircuitOf cuccaroAdder w bits a numWin)
      = numWin * (4 * (2 ^ w - 1) + 2 * bits)

*Cuccaro Toffoli count**: `numWin · (4·(2^w − 1) + 2·bits)` — versus the faithful `numWin · (4·w·2^w + 2·bits)` (`windowedMulCircuit_toffoli`): the factor `w` on the lookup term is GONE; the remaining ×2 against the papers' `2^w` per lookup is the measurement-uncompute leg (module docstring).

theoremtcount_grayWindowedMulCircuitOf_le_faithful

theorem tcount_grayWindowedMulCircuitOf_le_faithful
    (A : Adder) (w bits a numWin : Nat) (hw : 0 < w) :
    tcount (grayWindowedMulCircuitOf A w bits a numWin)
      ≤ tcount (windowedMulCircuitOf A w bits a numWin)

*Audit bridge**: for any adder and any `0 < w`, the Gray-code windowed multiplier never costs more T than the faithful one (`28·(2^w − 1) ≤ 28·2^w ≤ 28·w·2^w` per window, adder term identical).

FormalRV.Arithmetic.Windowed.WindowedInPlace

FormalRV/Arithmetic/Windowed/WindowedInPlace.lean

FormalRV.Arithmetic.Windowed.WindowedInPlace — the IN-PLACE windowed multiplier and its pass-composition (the structural core of windowed modexp). Replays the Gidney `modMultInPlace` algorithm (OOPmul(a) ; SWAP ; OOPmul(−a⁻¹), `ModularAdder/Gidney/Def.lean`) at the WINDOWED level, with mod-2^bits arithmetic and an arbitrary `Adder` backend: **Stage 1 — generalized pass.** `windowedMulCircuitOf` run from ANY state satisfying the window-step invariant with partial sum `acc₀` (not just the clean `mulInputOf` with `acc₀ = 0`) leaves `(acc₀ + a·y) mod 2^bits` in the accumulator (`stepInv_full_pass`, `windowedMulCircuitOf_correct_acc`); the concrete nonzero-accumulator input `mulInputAccOf` instantiates it. **Stage 2 — the acc↔y swap.** `accYSwap A w bits`, three interleaved CX cascades between the accumulator (`A.augendIdx`) and the y-register, exchanges the two registers and frames everything else (`accYSwap_apply`), by the generic cascade engine of `WindowedCopySemantics`. **Stage 3 — in-place multiply.** For `a` invertible mod 2^bits (`a·ainv ≡ 1`), `windowedMulInPlace = pass(a) ; swap ; pass(2^bits − ainv)` maps the `MulReady`-shaped state with y-register `y` to the `MulReady` state with y-register `(a·y) mod 2^bits` — accumulator and ancillas restored CLEAN (`windowedMulInPlace_correct`). The cancellation `(y + (2^bits − ainv)·(a·y mod 2^bits)) ≡ 0` is `mod_inv_cancel_identity`. **Stage 4 — pass-composition.** `windowedMulInPlaceSeq`, the k-fold in-place multiply by constants `aₖ`, computes `y ← (Π aₖ)·y mod 2^bits` (`windowedMulInPlaceSeq_correct`); the modular-exponentiation instance `windowedExpInPlace` with CLASSICAL exponent windows `aₖ = g^((2^wE)^k · windowₖ(e))` computes `y ← g^e·y mod 2^bits` (`windowedExpInPlace_correct`). The quantum-selected version (windows read from an exponent register via `expWindowPassOf`) is the documented next step, NOT attempted here. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

theoremdecodeReg_succ_eq

theorem decodeReg_succ_eq (idx : Nat → Nat) (n : Nat) (f : Nat → Bool) :
    decodeReg idx (n + 1) f
      = decodeReg idx n f + (if f (idx n) then 2 ^ n else 0)

`decodeReg` peels its top bit (weight `2^n` at `idx n`).

theoremdecodeReg_lt_two_pow

theorem decodeReg_lt_two_pow (idx : Nat → Nat) (n : Nat) (f : Nat → Bool) :
    decodeReg idx n f < 2 ^ n

A `n`-bit register decode is `< 2^n`.

theoremdecodeReg_testBit

theorem decodeReg_testBit (idx : Nat → Nat) (n : Nat) (f : Nat → Bool)
    (i : Nat) (hi : i < n) :
    (decodeReg idx n f).testBit i = f (idx i)

*Decode determines the bits.** Bit `i` of `decodeReg idx n f` is exactly the state bit at `idx i` — the converse of `decodeReg_eq_mod_of_testBit`, by uniqueness of binary digits. (No injectivity of `idx` needed: the decode SUM indexes over `i`, reading `f (idx i)` once per `i`.)

theoremencodeReg_at

theorem encodeReg_at (base len x i : Nat) (hi : i < len) :
    encodeReg base len x (base + i) = x.testBit i

`encodeReg` at an in-range offset reads bit `i`.

theoremencodeReg_high

theorem encodeReg_high (base len x p : Nat) (hp : base + len ≤ p) :
    encodeReg base len x p = false

`encodeReg` above the register reads `false`.

defwriteReg

def writeReg (idx : Nat → Nat) (n v : Nat) (f : Nat → Bool) : Nat → Bool

Overwrite the `n`-bit register at positions `idx 0 … idx (n−1)` with the binary digits of `v` (bit `i` at `idx i`).

theoremwriteReg_frame

theorem writeReg_frame (idx : Nat → Nat) (n v : Nat) (f : Nat → Bool) (p : Nat)
    (hp : ∀ i, i < n → p ≠ idx i) :
    writeReg idx n v f p = f p

`writeReg` frame: positions off the register are untouched.

theoremwriteReg_at

theorem writeReg_at (idx : Nat → Nat) (n v : Nat) (f : Nat → Bool)
    (hinj : ∀ i j, i < n → j < n → idx i = idx j → i = j)
    (i : Nat) (hi : i < n) :
    writeReg idx n v f (idx i) = v.testBit i

`writeReg` writes: with pairwise-distinct positions, position `idx i` ends as bit `i` of `v`.

defmulInputAccOf

def mulInputAccOf (A : Adder) (w bits numWin acc₀ y : Nat) : Nat → Bool

*The nonzero-accumulator input**: `mulInputOf` with `acc₀` additionally encoded at the accumulator (augend) positions of adder `A`.

theoremstepInv_init_acc

theorem stepInv_init_acc (A : Adder) (w bits numWin acc₀ y : Nat)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (hclean : A.ancClean (mulInputAccOf A w bits numWin acc₀ y) bits (1 + 2 * w)) :
    StepInv A w bits numWin y acc₀ (mulInputAccOf A w bits numWin acc₀ y)

*Invariant initialization at `acc₀`.** `mulInputAccOf` satisfies the window-step invariant with partial sum `acc₀` (accumulator positions must be pairwise distinct so the written digits read back).

theoremstepInv_fold_acc

theorem stepInv_fold_acc (A : Adder) (w bits a numWin y acc₀ : Nat) (hw : 0 < w)
    (f : Nat → Bool) (hf : StepInv A w bits numWin y acc₀ f) :
    ∀ n, n ≤ numWin →
      StepInv A w bits numWin y
        (acc₀ + ∑ k ∈ Finset.range n, a * (2 ^ w) ^ k * WindowedArith.window w y k)
        (Gate.applyNat
          (windowedMulOf A w bits a bits (1 + 2 * w) (1 + 2 * w + A.span bits) n)
          f)

*The generalized fold.** From ANY state `f` satisfying the invariant with partial sum `acc₀`, running the first `n ≤ numWin` window-steps yields the invariant with partial sum `acc₀ + Σ_{k<n} a·(2^w)^k·windowₖ(y)`. (Reuses the existing start-value-agnostic `stepInv_step` verbatim.)

theoremstepInv_full_pass

theorem stepInv_full_pass (A : Adder) (w bits a numWin y acc₀ : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) (f : Nat → Bool)
    (hf : StepInv A w bits numWin y acc₀ f) :
    StepInv A w bits numWin y (acc₀ + a * y)
      (Gate.applyNat (windowedMulCircuitOf A w bits a numWin) f)

*The full generalized pass, invariant form.** One complete `windowedMulCircuitOf` run from an invariant state with partial sum `acc₀` re-establishes the invariant with partial sum `acc₀ + a·y` — the form the in-place composition consumes.

theoremwindowedMulCircuitOf_correct_acc

theorem windowedMulCircuitOf_correct_acc (A : Adder) (w bits a numWin y acc₀ : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin)) (f : Nat → Bool)
    (hf : StepInv A w bits numWin y acc₀ f) :
    decodeAccOf A (Gate.applyNat (windowedMulCircuitOf A w bits a numWin) f)
        (1 + 2 * w) bits
      = (acc₀ + a * y) % 2 ^ bits

*Stage 1 HEADLINE — generalized-pass VALUE theorem.** For ANY adder `A`, the windowed multiplier run from an invariant state whose accumulator holds partial sum `acc₀` leaves `(acc₀ + a·y) mod 2^bits` in the accumulator.

theoremmulInputAccOf_correct

theorem mulInputAccOf_correct (A : Adder) (w bits a numWin acc₀ y : Nat)
    (hw : 0 < w) (hy : y < 2 ^ (w * numWin))
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (hclean : A.ancClean (mulInputAccOf A w bits numWin acc₀ y) bits (1 + 2 * w)) :
    decodeAccOf A (Gate.applyNat (windowedMulCircuitOf A w bits a numWin)
        (mulInputAccOf A w bits numWin acc₀ y)) (1 + 2 * w) bits
      = (acc₀ + a * y) % 2 ^ bits

*Stage 1, concrete input.** On `mulInputAccOf` (ctrl set, `y` in the y-register, `acc₀` in the accumulator, everything else clean), the pass leaves `(acc₀ + a·y) mod 2^bits` in the accumulator.

theoremstepInv_foldT_acc

theorem stepInv_foldT_acc (A : Adder) (w bits : Nat) (Tfam : Nat → Nat → Nat)
    (numWin y acc₀ : Nat) (hw : 0 < w)
    (f : Nat → Bool) (hf : StepInv A w bits numWin y acc₀ f) :
    ∀ n, n ≤ numWin →
      StepInv A w bits numWin y
        (acc₀ + ∑ k ∈ Finset.range n, Tfam k (WindowedArith.window w y k))
        (Gate.applyNat
          (windowedMulTOf A w bits Tfam bits (1 + 2 * w) (1 + 2 * w + A.span bits) n)
          f)

*Table-generic generalized fold.** From ANY state `f` with partial sum `acc₀`, running `n ≤ numWin` table-generic window-steps yields the invariant with partial sum `acc₀ + Σ_{k<n} Tfam k (windowₖ(y))`. (Reuses the table-INDEPENDENT init and the start-value-agnostic `stepInv_stepT`; `stepInv_fold_acc` is the standard-table instance.)

theoremwindowedMulCircuitTOf_correct_acc

theorem windowedMulCircuitTOf_correct_acc (A : Adder) (w bits : Nat) (Tfam : Nat → Nat → Nat)
    (numWin y acc₀ : Nat) (hw : 0 < w) (f : Nat → Bool)
    (hf : StepInv A w bits numWin y acc₀ f) :
    decodeAccOf A (Gate.applyNat (windowedMulCircuitTOf A w bits Tfam numWin) f)
        (1 + 2 * w) bits
      = (acc₀ + ∑ k ∈ Finset.range numWin, Tfam k (WindowedArith.window w y k)) % 2 ^ bits

*Table-generic generalized pass, value form.** One full table-generic windowed pass from an invariant state with partial sum `acc₀` leaves `(acc₀ + Σₖ Tfam k (windowₖ(y))) mod 2^bits` in the accumulator.

defcxCascade

def cxCascade (ctrl tgt : Nat → Nat) (n : Nat) : Gate

A parallel CX cascade `CX (ctrl 0) (tgt 0) ; … ; CX (ctrl (n−1)) (tgt (n−1))` (the foldl shape of the generic cascade engine).

defaccYSwap

def accYSwap (A : Adder) (w bits : Nat) : Gate

*The acc↔y swap** over adder `A`: a 3-cascade transposition between accumulator bit `A.augendIdx (1+2w) i` and y-register bit `(1+2w + A.span bits) + i`, for `i < bits`.

theoremaccYSwap_apply

theorem accYSwap_apply (A : Adder) (w bits : Nat) (g : Nat → Bool)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j) :
    (∀ i, i < bits →
        Gate.applyNat (accYSwap A w bits) g (A.augendIdx (1 + 2 * w) i)
          = g (1 + 2 * w + A.span bits + i))
    ∧ (∀ i, i < bits →
        Gate.applyNat (accYSwap A w bits) g (1 + 2 * w + A.span bits + i)
          = g (A.augendIdx (1 + 2 * w) i))
    ∧ (∀ p, (∀ i, i < bits →
          p ≠ A.augendIdx (1 + 2 * w) i ∧ p ≠ 1 + 2 * w + A.span bits + i) →
        Gate.applyNat (accYSwap A w bits) g p = g p)

*`accYSwap` post-state**: the accumulator and the (low `bits` of the) y-register are exchanged, and every other wire is untouched. Needs only the accumulator positions pairwise distinct (`hinj`) — their disjointness from the y-wires is the interface fact `augendIdx_inBlock`.

defMulReady

def MulReady (A : Adder) (w bits numWin y : Nat) (f : Nat → Bool) : Prop

The in-place multiplier's input/output contract: a `mulInputOf`-shaped state with y-register value `y` and a CLEAN block.

theoremMulReady.toStepInv

theorem MulReady.toStepInv {A : Adder} {w bits numWin y : Nat} {f : Nat → Bool}
    (h : MulReady A w bits numWin y f) : StepInv A w bits numWin y 0 f

A `MulReady` state satisfies the window-step invariant with partial sum 0 (a bitwise-clean accumulator decodes to 0).

theoremmulReady_mulInputOf

theorem mulReady_mulInputOf (A : Adder) (w bits numWin y : Nat)
    (hclean : A.ancClean (mulInputOf A w bits numWin y) bits (1 + 2 * w)) :
    MulReady A w bits numWin y (mulInputOf A w bits numWin y)

The clean input `mulInputOf` is `MulReady` (given the adder's abstract ancilla-cleanliness, discharged concretely per instance).

defwindowedMulInPlace

def windowedMulInPlace (A : Adder) (w bits a ainv numWin : Nat) : Gate

*The in-place windowed multiplier** by an (odd, hence invertible mod `2^bits`) constant `a` with inverse `ainv`.

theoremwindowedMulInPlace_correct

theorem windowedMulInPlace_correct (A : Adder) (w bits a ainv numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (hainv : ainv < 2 ^ bits) (hinv : a * ainv % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (f : Nat → Bool) (hf : MulReady A w bits numWin y f) :
    MulReady A w bits numWin (a * y % 2 ^ bits)
      (Gate.applyNat (windowedMulInPlace A w bits a ainv numWin) f)

*Stage 3 HEADLINE — in-place windowed multiplication, full state restoration.** For ANY adder `A` whose accumulator positions are pairwise distinct, with `numWin·w = bits` (the y-register exactly matches the accumulator width) and `a·ainv ≡ 1 (mod 2^bits)`: the in-place multiplier maps any `MulReady` state with y-value `y < 2^bits` to the `MulReady` state with y-value `(a·y) mod 2^bits` — accumulator, addend register, and ancillas all returned CLEAN, everything off the y-register restored. The output shape equals the input shape, so in-place multiplies compose.

defwindowedMulInPlaceSeq

def windowedMulInPlaceSeq (A : Adder) (w bits numWin : Nat)
    (as ainvs : Nat → Nat) (n : Nat) : Gate

The `n`-fold in-place multiply by the constants `as 0, …, as (n−1)` (with inverses `ainvs k`).

theoremwindowedMulInPlaceSeq_correct

theorem windowedMulInPlaceSeq_correct (A : Adder) (w bits numWin : Nat)
    (as ainvs : Nat → Nat) (y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (f : Nat → Bool) (hf : MulReady A w bits numWin y f) :
    ∀ n, (∀ k, k < n → ainvs k < 2 ^ bits ∧ as k * ainvs k % 2 ^ bits = 1) →
      MulReady A w bits numWin ((∏ k ∈ Finset.range n, as k) * y % 2 ^ bits)
        (Gate.applyNat (windowedMulInPlaceSeq A w bits numWin as ainvs n) f)

*Stage 4 HEADLINE — the in-place product chain.** `n` in-place windowed multiplies by invertible constants `as k` compute `y ← (Π_{k<n} as k)·y mod 2^bits`, returning to the `MulReady` shape (clean accumulator/ancillas) after EVERY round — the composition is by induction on `n`, using Stage 3's full state restoration.

defwindowedExpInPlace

def windowedExpInPlace (A : Adder) (w bits numWin wE nE g e : Nat)
    (ainvs : Nat → Nat) : Gate

*The in-place windowed modular exponentiation, CLASSICAL exponent.** One in-place multiply per exponent window `k < nE`, by the constant `g^((2^wE)^k · windowₖ(e))` (the `k`-th windowed factor of `g^e`). The quantum-selected version — windows READ from an exponent register via `expWindowPassOf`-style selection — is the documented next step.

theoremwindowedExpInPlace_correct

theorem windowedExpInPlace_correct (A : Adder)
    (w bits numWin wE nE g e y : Nat) (ainvs : Nat → Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (he : e < (2 ^ wE) ^ nE)
    (hpairs : ∀ k, k < nE → ainvs k < 2 ^ bits ∧
      g ^ ((2 ^ wE) ^ k * WindowedArith.window wE e k) * ainvs k % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (f : Nat → Bool) (hf : MulReady A w bits numWin y f) :
    MulReady A w bits numWin (g ^ e * y % 2 ^ bits)
      (Gate.applyNat (windowedExpInPlace A w bits numWin wE nE g e ainvs) f)

*Stage 4 instance — in-place windowed MODEXP value theorem (classical exponent).** For `e < (2^wE)^nE`, the chain of per-window in-place multiplies computes `y ← g^e·y mod 2^bits`: the windowed factors multiply out to `g^e` by the base-`2^wE` digit expansion of `e`.

theoremwindowedMulInPlace_value

theorem windowedMulInPlace_value (A : Adder) (w bits a ainv numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (hainv : ainv < 2 ^ bits) (hinv : a * ainv % 2 ^ bits = 1)
    (hinj : ∀ i j, i < bits → j < bits →
      A.augendIdx (1 + 2 * w) i = A.augendIdx (1 + 2 * w) j → i = j)
    (f : Nat → Bool) (hf : MulReady A w bits numWin y f) :
    decodeReg (fun i => 1 + 2 * w + A.span bits + i) bits
        (Gate.applyNat (windowedMulInPlace A w bits a ainv numWin) f)
      = a * y % 2 ^ bits

*Stage 3, decode form.** After the in-place multiply, the y-register itself decodes to `(a·y) mod 2^bits` (the accumulator is clean by `windowedMulInPlace_correct`).

theoremcuccaroAdder_augendIdx_inj

theorem cuccaroAdder_augendIdx_inj (q i j : Nat)
    (h : cuccaroAdder.augendIdx q i = cuccaroAdder.augendIdx q j) : i = j

Cuccaro accumulator positions `q + 2i + 1` are pairwise distinct.

theoremgidneyAdder_augendIdx_inj

theorem gidneyAdder_augendIdx_inj (q i j : Nat)
    (h : gidneyAdder.augendIdx q i = gidneyAdder.augendIdx q j) : i = j

Gidney accumulator positions `q + 3i + 1` are pairwise distinct.

theoremwindowedMulInPlace_correct_cuccaro

theorem windowedMulInPlace_correct_cuccaro (w bits a ainv numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (hainv : ainv < 2 ^ bits) (hinv : a * ainv % 2 ^ bits = 1) :
    MulReady cuccaroAdder w bits numWin (a * y % 2 ^ bits)
      (Gate.applyNat (windowedMulInPlace cuccaroAdder w bits a ainv numWin)
        (mulInputOf cuccaroAdder w bits numWin y))

*Cuccaro instance, state form.** On the clean encoded input, the Cuccaro-backed in-place multiplier produces the `MulReady` state with y-value `(a·y) mod 2^bits` (its `ancClean` — the carry-in at the block base — is discharged concretely).

theoremwindowedMulInPlace_value_cuccaro

theorem windowedMulInPlace_value_cuccaro (w bits a ainv numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (hainv : ainv < 2 ^ bits) (hinv : a * ainv % 2 ^ bits = 1) :
    decodeReg (fun i => 1 + 2 * w + cuccaroAdder.span bits + i) bits
        (Gate.applyNat (windowedMulInPlace cuccaroAdder w bits a ainv numWin)
          (mulInputOf cuccaroAdder w bits numWin y))
      = a * y % 2 ^ bits

*Cuccaro instance, value form.** The y-register of the output decodes to `(a·y) mod 2^bits`, and the accumulator is returned clean.

theoremwindowedMulInPlace_correct_gidney

theorem windowedMulInPlace_correct_gidney (w bits a ainv numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (hainv : ainv < 2 ^ bits) (hinv : a * ainv % 2 ^ bits = 1) :
    MulReady gidneyAdder w bits numWin (a * y % 2 ^ bits)
      (Gate.applyNat (windowedMulInPlace gidneyAdder w bits a ainv numWin)
        (mulInputOf gidneyAdder w bits numWin y))

*Gidney instance, state form.** Same, over the Gidney patched ripple adder (its `ancClean` — every carry wire `q + 3i + 2` — is discharged concretely).

theoremwindowedExpInPlace_correct_cuccaro

theorem windowedExpInPlace_correct_cuccaro (w bits numWin wE nE g e y : Nat)
    (ainvs : Nat → Nat)
    (hw : 0 < w) (hbits : numWin * w = bits) (hy : y < 2 ^ bits)
    (he : e < (2 ^ wE) ^ nE)
    (hpairs : ∀ k, k < nE → ainvs k < 2 ^ bits ∧
      g ^ ((2 ^ wE) ^ k * WindowedArith.window wE e k) * ainvs k % 2 ^ bits = 1) :
    MulReady cuccaroAdder w bits numWin (g ^ e * y % 2 ^ bits)
      (Gate.applyNat
        (windowedExpInPlace cuccaroAdder w bits numWin wE nE g e ainvs)
        (mulInputOf cuccaroAdder w bits numWin y))

*Cuccaro modexp instance.** The full in-place windowed modular exponentiation by a classical exponent, over the Cuccaro adder, run on the clean encoded input with `y` in the y-register: the output is the `MulReady` state with y-value `g^e·y mod 2^bits`.

FormalRV.Arithmetic.Windowed.WindowedLookupAdd

FormalRV/Arithmetic/Windowed/WindowedLookupAdd.lean

FormalRV.Shor.WindowedLookupAdd — Phase D, the faithful lookup-ADDITION gate. Gidney 1905.07682 l.276 defines `a += T[b]` as exactly three steps: "compute a table lookup with classical data `T` and quantum address `b` into a temporary register, then add the temporary register into `a`, then uncompute the table lookup." We realize this FAITHFULLY by reusing the two already-proven components verbatim: the table read = `BQAlgo.unary_lookup_multi_iteration` (the babbush2018 unary-iteration QROM the paper cites at l.160-197; correctness `BQAlgo.Lookup.unary_lookup_iteration_correct`), the addition = `BQAlgo.cuccaro_n_bit_adder_full` (the Cuccaro ripple adder the 8-hours paper specifies; correctness `cuccaro_n_bit_adder_full_correct`). `lookupAddGate = read ; add ; read` — the second read UNCOMPUTES the temp (a table read XORs `T_a`, so doing it twice clears the word register; l.190-197 "for nonzero output it XORs, making the op its own inverse"). This file defines the gate matching the paper and proves its resource decomposition; the per-step value-correctness is the composition of the two cited component theorems (see the `lookupAddGate` docstring), and the multi-window value identity is the proven `WindowedArith.windowedLookupFold_modProductAdd`.

defaddrFlips

def addrFlips (w v : Nat) : List Nat

Address-flip mask for table row `v` (babbush2018 unary iteration): X-flip every address bit that is `0` in `v`, so the prefix-AND cascade fires exactly when the address register holds `v`.

defwordCnots

def wordCnots (w W Tv : Nat) : List Nat

Word-CNOT targets for table entry value `Tv`: the output qubits where `Tv` has a `1` bit (the `?`-targets of the paper's figure, l.190-197).

deflookupIters

def lookupIters (w W : Nat) (T : Nat → Nat) : List (List Nat × List Nat)

The per-address iteration data for table `T` (`2^w` rows of `W`-bit entries): one `(addr_flips, word_cnots)` tuple per address value `v < 2^w`.

deflookupRead

def lookupRead (w W : Nat) (T : Nat → Nat) : Gate

*The table read** — the babbush2018 unary-iteration QROM, reused verbatim. On an address register holding `a`, this XORs `T_a` into the `W`-bit word register (`BQAlgo.Lookup.unary_lookup_iteration_correct`).

deflookupAddGate

def lookupAddGate (w W : Nat) (T : Nat → Nat) (adderLen adderStart : Nat) : Gate

*The lookup-ADDITION gate** — Gidney 1905.07682 l.276, `target += T[address]`: read `T_a` into the word/temp register, add the temp into the target with the proven Cuccaro adder, then read again to uncompute (clear) the temp. Value-correctness (composition of the cited proven components):* 1. after the first `lookupRead`, the word register holds `T_a` (`Lookup.unary_lookup_iteration_correct` + `multi_iteration_xor_value`: only the `v = a` row triggers); 2. `cuccaro_n_bit_adder_full_correct` then sets `target ← target + T_a` while restoring the addend (word) register; 3. the second `lookupRead` XORs `T_a` into the word again, returning it to clean (CCX/XOR self-inverse — the `qubit_swap_involutive` / `prefix_and_cascade_uncompute_post_state_eq_id` pattern). Net: `target ← target + T_a`, address and all ancillas restored. Folded over the windows of `y` this yields `(acc + a·y) mod N` (`WindowedArith.windowedLookupFold_modProductAdd`).

theoremlookupAddGate_tcount

theorem lookupAddGate_tcount (w W : Nat) (T : Nat → Nat) (adderLen adderStart : Nat) :
    tcount (lookupAddGate w W T adderLen adderStart)
      = 2 * tcount (lookupRead w W T) + 14 * adderLen

*Resource decomposition.** The lookup-add costs two reads and one add — and the add is the proven Cuccaro `14·adderLen` T-gates.

FormalRV.Arithmetic.Windowed.WindowedLookupSelect

FormalRV/Arithmetic/Windowed/WindowedLookupSelect.lean

FormalRV.Shor.WindowedCircuit.WindowedLookupSelect — the QROM-read SELECTION lemma. The babbush2018 unary-iteration QROM (`BQAlgo.unary_lookup_multi_iteration`), instantiated with the windowed iteration data (`lookupReadAt w pos W T = unary_lookup_multi_iteration w ((List.range (2^w)).map (fun v => (addrFlips w v, wordCnotsAt pos W (T v))))`), reads EXACTLY the addressed table row: with the address register holding `v < 2^w`, every word position `pos j` (j < W) is XOR'd with `(T v).testBit j`, and every position that is NOT a word target is unchanged (ctrl preserved, address restored, AND-ancillas returned clean, everything else untouched). The mathematical core: row `u`'s flip pattern `addrFlips w u` makes the prefix-AND trigger fire iff the effective address is all-ones iff `∀ i < w, v.testBit i = u.testBit i` iff `u = v` (testBit extensionality below `2^w`). So the multi-iteration XOR value collapses to the single `u = v` contribution, whose word-CNOT pattern is exactly the bits of `T v`. Built on the proven headline machinery of `FormalRV.Arithmetic.UnaryLookup.UnaryLookupIterationCorrectness` (`Lookup.unary_lookup_multi_iteration_correct` + the multi-iter preservation lemmas), plus a fresh `Gate.applyNat` ↔ post-state-model bridge proven here.

theoremupdate_eq_Function_update

theorem update_eq_Function_update (f : Nat → Bool) (c : Nat) (v : Bool) :
    update f c v = Function.update f c v

The project-local `Framework.update` IS Mathlib's `Function.update` (on `Nat → Bool`).

theoremapplyNat_x_gates_from_indices

theorem applyNat_x_gates_from_indices (xs : List Nat) (f : Nat → Bool) :
    Gate.applyNat (x_gates_from_indices xs) f = Lookup.x_flip_post_state xs f

X-flip layer: gate semantics = `Lookup.x_flip_post_state`.

theoremapplyNat_cx_gates_from_indices

theorem applyNat_cx_gates_from_indices (c : Nat) (xs : List Nat) (f : Nat → Bool) :
    Gate.applyNat (cx_gates_from_indices c xs) f
      = Lookup.cnot_layer_post_state c xs f

CNOT layer: gate semantics = `Lookup.cnot_layer_post_state`.

theoremapplyNat_prefix_and_step

theorem applyNat_prefix_and_step (i : Nat) (f : Nat → Bool) :
    Gate.applyNat (prefix_and_step i) f = prefix_and_step_post_state i f

Single cascade step: gate semantics = `prefix_and_step_post_state`.

theoremapplyNat_prefix_and_cascade

theorem applyNat_prefix_and_cascade (n : Nat) (f : Nat → Bool) :
    Gate.applyNat (prefix_and_cascade n) f = prefix_and_cascade_post_state n f

Forward cascade: gate semantics = `prefix_and_cascade_post_state`.

theoremapplyNat_prefix_and_uncompute

theorem applyNat_prefix_and_uncompute (n : Nat) (f : Nat → Bool) :
    Gate.applyNat (prefix_and_uncompute n) f
      = prefix_and_uncompute_post_state n f

Reverse cascade: gate semantics = `prefix_and_uncompute_post_state`.

theoremapplyNat_unary_lookup_iteration

theorem applyNat_unary_lookup_iteration
    (n_addr : Nat) (flips cnots : List Nat) (f : Nat → Bool) :
    Gate.applyNat (unary_lookup_iteration n_addr flips cnots) f
      = Lookup.iteration_post_state n_addr flips cnots f

One full lookup iteration: gate semantics = `Lookup.iteration_post_state`.

theoremapplyNat_unary_lookup_multi_iteration

theorem applyNat_unary_lookup_multi_iteration
    (n_addr : Nat) (iters : List (List Nat × List Nat)) (f : Nat → Bool) :
    Gate.applyNat (unary_lookup_multi_iteration n_addr iters) f
      = Lookup.multi_iteration_post_state n_addr iters f

*The multi-iteration bridge**: gate semantics of the full babbush2018 unary-iteration QROM = `Lookup.multi_iteration_post_state`.

theoremmem_addrFlips

theorem mem_addrFlips (w u x : Nat) :
    x ∈ addrFlips w u
      ↔ ∃ i, i < w ∧ u.testBit i = false ∧ x = ulookup_address_idx i

Membership in `addrFlips w u`: exactly the address indices of the zero bits of `u` below `w`.

theoremaddrFlips_flip_addr

theorem addrFlips_flip_addr (w u : Nat) :
    ∀ x ∈ addrFlips w u, ∃ i, i < w ∧ x = ulookup_address_idx i

Every element of `addrFlips w u` is an address index below `w`.

theoremaddrFlips_nodup

theorem addrFlips_nodup (w u : Nat) : (addrFlips w u).Nodup

`addrFlips w u` has no duplicates (`ulookup_address_idx` is injective).

theoremmem_wordCnotsAt

theorem mem_wordCnotsAt (pos : Nat → Nat) (W Tv x : Nat) :
    x ∈ wordCnotsAt pos W Tv
      ↔ ∃ j, j < W ∧ Tv.testBit j = true ∧ x = pos j

Membership in `wordCnotsAt pos W Tv`: exactly the positions `pos j` of the one bits of `Tv` below `W`.

theoremfilterMap_ite_eq_map_filter

theorem filterMap_ite_eq_map_filter (p : Nat → Bool) (g : Nat → Nat) :
    ∀ l : List Nat,
      l.filterMap (fun j => if p j then some (g j) else none)
        = (l.filter p).map g
  | [] => rfl
  | j :: l =>

`filterMap` of an "if-some-else-none" function = `map` after `filter`. (Local helper; lets us prove `Nodup` with injectivity only ON the list.)

theoremwordCnotsAt_eq_map_filter

theorem wordCnotsAt_eq_map_filter (pos : Nat → Nat) (W Tv : Nat) :
    wordCnotsAt pos W Tv
      = ((List.range W).filter (fun j => Tv.testBit j)).map pos

`wordCnotsAt` as `map pos` of the filtered bit positions.

theoremwordCnotsAt_nodup

theorem wordCnotsAt_nodup (pos : Nat → Nat) (W Tv : Nat)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k) :
    (wordCnotsAt pos W Tv).Nodup

`wordCnotsAt pos W Tv` has no duplicates when `pos` is injective below `W`.

theoremwordCnotsAt_allWordIdx

theorem wordCnotsAt_allWordIdx (w : Nat) (pos : Nat → Nat) (W Tv : Nat)
    (hpos_high : ∀ j, j < W → 2 * w < pos j) :
    Lookup.AllWordIdx w (wordCnotsAt pos W Tv)

Every element of `wordCnotsAt pos W Tv` lies above the ctrl/address/AND region when all `pos` positions do (`Lookup.AllWordIdx` form).

theorempos_mem_wordCnotsAt_iff

theorem pos_mem_wordCnotsAt_iff (pos : Nat → Nat) (W Tv j : Nat) (hj : j < W)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k) :
    pos j ∈ wordCnotsAt pos W Tv ↔ Tv.testBit j = true

`pos j` is a word-CNOT target for row value `Tv` iff bit `j` of `Tv` is set (for `j < W`, with `pos` injective below `W`).

theoremdecide_pos_mem_wordCnotsAt

theorem decide_pos_mem_wordCnotsAt (pos : Nat → Nat) (W Tv j : Nat) (hj : j < W)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k) :
    decide (pos j ∈ wordCnotsAt pos W Tv) = Tv.testBit j

Boolean form of `pos_mem_wordCnotsAt_iff`.

theoremaddress_and_eq_true_iff

theorem address_and_eq_true_iff (ctrl : Bool) (addr n : Nat) :
    Lookup.address_and ctrl addr n = true
      ↔ ctrl = true ∧ ∀ i, i < n → addr.testBit i = true

`Lookup.address_and` is true iff the ctrl is true and all first `n` bits are set.

theoremdecide_mem_addrFlips

theorem decide_mem_addrFlips (w u i : Nat) (hi : i < w) :
    decide (ulookup_address_idx i ∈ addrFlips w u) = !u.testBit i

For `i < w`, the address index `ulookup_address_idx i` is flipped for row `u` exactly when bit `i` of `u` is zero.

theoremxor_not_eq_true_iff

theorem xor_not_eq_true_iff (a b : Bool) : xor a (!b) = true ↔ b = a

Bool helper: `xor a (!b) = true ↔ b = a`.

theoremaddrFlips_trigger_eq_decide

theorem addrFlips_trigger_eq_decide (w u v : Nat) (hu : u < 2 ^ w) (hv : v < 2 ^ w) :
    Lookup.address_and true (Lookup.effective_addr v (addrFlips w u) w) w
      = decide (u = v)

*Row-selection at the trigger level**: row `u`'s prefix-AND trigger, evaluated on physical address `v` (both `< 2^w`), is `decide (u = v)` — only the addressed row fires.

theoremxor_value_rows_collapse

theorem xor_value_rows_collapse
    (w v : Nat) (hv : v < 2 ^ w) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat)
    (p : Nat) (L : List Nat) (hL : ∀ u ∈ L, u < 2 ^ w) (hnd : L.Nodup) :
    Lookup.multi_iteration_xor_value_via_address_and true v w
      (L.map (fun u => (addrFlips w u, wordCnotsAt pos W (T u)))) p
      = if v ∈ L then decide (p ∈ wordCnotsAt pos W (T v)) else false

Over a duplicate-free row list `L` (all rows `< 2^w`), the cumulative XOR value of the windowed iteration data at any position `p` is the single `v`-row contribution (if `v ∈ L`).

theoremxor_value_windowed_rows

theorem xor_value_windowed_rows
    (w v : Nat) (hv : v < 2 ^ w) (pos : Nat → Nat) (W : Nat) (T : Nat → Nat)
    (p : Nat) :
    Lookup.multi_iteration_xor_value_via_address_and true v w
      ((List.range (2 ^ w)).map
        (fun u => (addrFlips w u, wordCnotsAt pos W (T u)))) p
      = decide (p ∈ wordCnotsAt pos W (T v))

Specialized to the full row list `List.range (2^w)`: the XOR value at `p` is exactly the `v`-row word-CNOT membership bit.

theoremrowIters_flip_addr

private theorem rowIters_flip_addr :
    ∀ flips cnots,
      (flips, cnots) ∈ (List.range (2 ^ w)).map
        (fun u => (addrFlips w u, wordCnotsAt pos W (T u))) →
      ∀ x ∈ flips, ∃ i, i < w ∧ x = ulookup_address_idx i

theoremrowIters_flip_nodup

private theorem rowIters_flip_nodup :
    ∀ flips cnots,
      (flips, cnots) ∈ (List.range (2 ^ w)).map
        (fun u => (addrFlips w u, wordCnotsAt pos W (T u))) →
      flips.Nodup

theoremrowIters_cnots_nodup

private theorem rowIters_cnots_nodup
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k) :
    ∀ flips cnots,
      (flips, cnots) ∈ (List.range (2 ^ w)).map
        (fun u => (addrFlips w u, wordCnotsAt pos W (T u))) →
      cnots.Nodup

theoremrowIters_word

private theorem rowIters_word
    (hpos_high : ∀ j, j < W → 2 * w < pos j) :
    ∀ flips cnots,
      (flips, cnots) ∈ (List.range (2 ^ w)).map
        (fun u => (addrFlips w u, wordCnotsAt pos W (T u))) →
      Lookup.AllWordIdx w cnots

theoremlookupReadAt_selects_word

theorem lookupReadAt_selects_word
    (w W : Nat) (T : Nat → Nat) (pos : Nat → Nat) (f : Nat → Bool) (v : Nat)
    (hw : 0 < w) (hv : v < 2 ^ w)
    (hctrl : f ulookup_ctrl_idx = true)
    (haddr : ∀ i, i < w → f (ulookup_address_idx i) = v.testBit i)
    (hand : ∀ i, i < w → f (ulookup_and_idx i) = false)
    (hpos_high : ∀ j, j < W → 2 * w < pos j)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k)
    (j : Nat) (hj : j < W) :
    Gate.applyNat (lookupReadAt w pos W T) f (pos j)
      = xor (f (pos j)) ((T v).testBit j)

*QROM-read selection, word conjunct**: with the address register holding `v < 2^w`, the babbush2018 read `lookupReadAt w pos W T` XORs exactly the addressed table row `T v` into the word positions: bit `j` of `T v` lands at `pos j`.

theoremlookupReadAt_frame

theorem lookupReadAt_frame
    (w W : Nat) (T : Nat → Nat) (pos : Nat → Nat) (f : Nat → Bool)
    (hpos_high : ∀ j, j < W → 2 * w < pos j)
    (p : Nat) (hp : ∀ j, j < W → p ≠ pos j) :
    Gate.applyNat (lookupReadAt w pos W T) f p = f p

*QROM-read selection, frame conjunct**: every position that is not a word target (`pos j`, `j < W`) is unchanged by the read — the ctrl is preserved, the address register is restored, the AND-ancillas are returned clean, and everything outside the lookup's registers is untouched.

theoremlookupReadAt_selects

theorem lookupReadAt_selects
    (w W : Nat) (T : Nat → Nat) (pos : Nat → Nat) (f : Nat → Bool) (v : Nat)
    (hw : 0 < w) (hv : v < 2 ^ w)
    (hctrl : f ulookup_ctrl_idx = true)
    (haddr : ∀ i, i < w → f (ulookup_address_idx i) = v.testBit i)
    (hand : ∀ i, i < w → f (ulookup_and_idx i) = false)
    (hpos_high : ∀ j, j < W → 2 * w < pos j)
    (hpos_inj : ∀ j k, j < W → k < W → pos j = pos k → j = k) :
    (∀ j, j < W →
      Gate.applyNat (lookupReadAt w pos W T) f (pos j)
        = xor (f (pos j)) ((T v).testBit j))
    ∧ (∀ p, (∀ j, j < W → p ≠ pos j) →

*HEADLINE — QROM-read selection lemma.** The babbush2018 unary-iteration QROM `unary_lookup_multi_iteration`, instantiated with the windowed iteration data (`lookupReadAt w pos W T`), reads exactly the addressed table row: with ctrl set, the address register holding `v < 2^w`, and the AND-ancillas clean, 1. each word position `pos j` (`j < W`) is XOR'd with `(T v).testBit j`, and 2. every position that is NOT a word target is unchanged (ctrl, address, AND-ancillas restored; everything else untouched).

FormalRV.Arithmetic.Windowed.WindowedModN

FormalRV/Arithmetic/Windowed/WindowedModN.lean

FormalRV.Shor.WindowedCircuit.WindowedModN — the PER-WINDOW mod-N windowed multiplier. The existing windowed multiplier (`WindowedCircuitCorrect.lean`) is a PRODUCT adder: each window does `acc ← (acc + T_j[v]) mod 2^bits`, and the final value is `(a·y) mod 2^bits`. Gidney's windowed multiplication (arXiv:1905.07682) instead reduces mod N after EVERY window: `acc ← (acc + T_j[v]) mod N` with table entries `T_j[v] = a·(2^w)^j·v mod N`, so the multiplier computes `(a·y) mod N` directly. This file closes that gap at the Cuccaro layout. HEADLINE (`windowedModNMulCircuit_correct`): on the SAME clean input family `mulInputOf cuccaroAdder w bits numWin y` as the product-adder theorem, the per-window mod-N circuit leaves (a · y) mod N in the accumulator, provided `0 < w`, `0 < N`, `2·N ≤ 2^bits` and `y < 2^(w·numWin)`. Per-window structure (`modNLookupAddStep`): the Cuccaro comparator borrows the addend (read) register for its two's-complement constant, so the QROM word must be cleared before the constant-compare stage and re-supplied for the flag-uncompute stage: read(T) ; add ; unread(T) -- acc ← acc + t (t = T_j[v] < N) ; compareConst(N) → flag -- flag ^= [N ≤ acc] ; conditionalSub(N) -- acc ← acc mod N ; read(T) ; regCompareXor ; unread(T) -- flag ^= [acc < t] = flag (flag → 0) The flag-uncompute works because `acc_out = (s+t) mod N < t ⟺ N ≤ s+t` when `s, t < N` — the standard modular-adder uncompute comparison, here realized as a REGISTER-register comparator (`regCompareXor`, new in this file: X-conjugated MAJ chain, top carry of `¬acc + t` = `[acc < t]`). New general-state (any `f : Nat → Bool`) reduction-stage lemmas, re-derived from the per-position Cuccaro primitives (the Tick-59/60 stage lemmas are tied to the `cuccaro_input_F` input family and do not apply inside the windowed frame): `compareConstXor_state_general` — the SQIR-style constant comparator; `condSub_state_general` — the flag-conditional subtract; `regCompareXor_state_general` — the register-register comparator. Follow-up (NOT in this pass): factor the reduction pipeline into a `ModAdder` interface so the Gidney `ModularAdder` pipeline (which has the same compare/conditional-subtract/uncompute shape) instantiates it too — this file is deliberately Cuccaro-specific. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

(no documented top-level declarations)

FormalRV.Arithmetic.Windowed.WindowedModN.Comparators

FormalRV/Arithmetic/Windowed/WindowedModN/Comparators.lean

WindowedModN — §3-4 constant + register-register comparators. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

theoremcompareConstXor_state_general

theorem compareConstXor_state_general
    (bits q_start N flagPos x : Nat) (f : Nat → Bool)
    (hN_pos : 0 < N) (hN : N ≤ 2 ^ bits) (hx : x < 2 ^ bits)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (h_cin : f q_start = false)
    (h_tgt : ∀ i, i < bits → f (q_start + 2 * i + 1) = x.testBit i)
    (h_read : ∀ i, i < bits → f (q_start + 2 * i + 2) = false) :
    Gate.applyNat (sqir_style_compareConst_candidate bits q_start N flagPos) f
      = update f flagPos (xor (f flagPos) (decide (N ≤ x)))

defregCompareXor

def regCompareXor (bits q_start flagPos : Nat) : Gate

theoremregCompareXor_frame_outside

theorem regCompareXor_frame_outside
    (bits q_start flagPos : Nat) (f : Nat → Bool)
    (q : Nat) (h_q_ne : q ≠ flagPos)
    (h_q_outside : q < q_start ∨ q_start + 2 * bits + 1 ≤ q) :
    Gate.applyNat (regCompareXor bits q_start flagPos) f q = f q

Frame: `regCompareXor` is the identity outside workspace ∪ {flag}.

theoremregCompareXor_workspace_restored_at

theorem regCompareXor_workspace_restored_at
    (bits q_start flagPos : Nat) (f : Nat → Bool)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (q : Nat) (hq_lower : q_start ≤ q) (hq_upper : q < q_start + 2 * bits + 1) :
    Gate.applyNat (regCompareXor bits q_start flagPos) f q = f q

Workspace restoration: at any workspace position, `regCompareXor` restores the input value (compute–CX–uncompute).

theoremregCompareXor_state_general

theorem regCompareXor_state_general
    (bits q_start flagPos u t : Nat) (f : Nat → Bool)
    (hu : u < 2 ^ bits) (ht : t < 2 ^ bits)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (h_cin : f q_start = false)
    (h_tgt : ∀ i, i < bits → f (q_start + 2 * i + 1) = u.testBit i)
    (h_read : ∀ i, i < bits → f (q_start + 2 * i + 2) = t.testBit i) :
    Gate.applyNat (regCompareXor bits q_start flagPos) f
      = update f flagPos (xor (f flagPos) (decide (u < t)))

*HEADLINE state equation for the register-register comparator.** On any state with carry-in clear, accumulator `u` and addend `t`, `regCompareXor` is exactly `flag ^= [u < t]` (everything else fixed).

FormalRV.Arithmetic.Windowed.WindowedModN.CondSub

FormalRV/Arithmetic/Windowed/WindowedModN/CondSub.lean

WindowedModN — §5 general-state conditional subtract. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

theoremcondSub_state_general

theorem condSub_state_general
    (bits q_start N flagPos x : Nat) (f : Nat → Bool)
    (hx : x < 2 ^ bits)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (h_cin : f q_start = false)
    (h_tgt : ∀ i, i < bits → f (q_start + 2 * i + 1) = x.testBit i)
    (h_read : ∀ i, i < bits → f (q_start + 2 * i + 2) = false) :
    (∀ i, i < bits →
        Gate.applyNat (sqir_conditionalSubConstGate bits q_start N flagPos) f
            (q_start + 2 * i + 1)
          = ((x + if f flagPos then 2 ^ bits - N else 0) % 2 ^ bits).testBit i)
    ∧ (∀ i, i < bits →

FormalRV.Arithmetic.Windowed.WindowedModN.Counts

FormalRV/Arithmetic/Windowed/WindowedModN/Counts.lean

WindowedModN — §12 exact Toffoli/T counts. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

theoremtcount_targetComplement

private theorem tcount_targetComplement :
    ∀ (n q_start : Nat), tcount (targetComplement n q_start) = 0
  | 0, _ => rfl
  | n + 1, q_start =>

theoremtcount_majChain

private theorem tcount_majChain :
    ∀ (n q_start : Nat), tcount (cuccaro_maj_chain n q_start) = 7 * n
  | 0, _ => rfl
  | n + 1, q_start =>

theoremtcount_majChainInv

private theorem tcount_majChainInv :
    ∀ (n q_start : Nat), tcount (cuccaro_maj_chain_inv n q_start) = 7 * n
  | 0, _ => rfl
  | n + 1, q_start =>

theoremtcount_prep0

private theorem tcount_prep0 :
    ∀ (n q_start c : Nat), tcount (cuccaro_prepareConstRead n q_start c) = 0
  | 0, _, _ => rfl
  | n + 1, q_start, c =>

theoremtcount_maskedPrep0

private theorem tcount_maskedPrep0 :
    ∀ (n q_start N flagPos : Nat),
      tcount (sqir_prepareMaskedConstRead n q_start N flagPos) = 0
  | 0, _, _, _ => rfl
  | n + 1, q_start, N, flagPos =>

theoremtcount_regCompareXor

theorem tcount_regCompareXor (bits q_start flagPos : Nat) :
    tcount (regCompareXor bits q_start flagPos) = 14 * bits

The register-register comparator costs two MAJ-chain passes: `14·bits` T.

theoremtcount_compareConstC

private theorem tcount_compareConstC (bits q_start N flagPos : Nat) :
    tcount (sqir_style_compareConst_candidate bits q_start N flagPos) = 14 * bits

theoremtcount_condSub

private theorem tcount_condSub (bits q_start N flagPos : Nat) :
    tcount (sqir_conditionalSubConstGate bits q_start N flagPos) = 14 * bits

theoremtcount_modNReduceFlag

theorem tcount_modNReduceFlag (bits q_start N flagPos : Nat) :
    tcount (modNReduceFlag bits q_start N flagPos) = 28 * bits

The mod-N reduction (compare + conditional subtract): `28·bits` T.

theoremtcount_modNLookupAddStep

theorem tcount_modNLookupAddStep (w bits N : Nat) (T : Nat → Nat)
    (q_start flagPos : Nat) :
    tcount (modNLookupAddStep w bits N T q_start flagPos)
      = 56 * w * 2 ^ w + 56 * bits

One mod-N lookup-add: `56·w·2^w + 56·bits` T (four table reads, one add, one compare, one conditional subtract, one register-compare).

theoremtcount_windowedModNStep

theorem tcount_windowedModNStep (w bits a N q_start yBase flagPos j : Nat) :
    tcount (windowedModNStep w bits a N q_start yBase flagPos j)
      = 56 * w * 2 ^ w + 56 * bits

One mod-N window step: copies are T-free, so `56·w·2^w + 56·bits` T.

theoremtcount_windowedModNMulCircuit

theorem tcount_windowedModNMulCircuit (w bits a N numWin : Nat) :
    tcount (windowedModNMulCircuit w bits a N numWin)
      = numWin * (56 * w * 2 ^ w + 56 * bits)

*Closed-form T-count of the per-window mod-N windowed multiplier**: `numWin · (56·w·2^w + 56·bits)`.

FormalRV.Arithmetic.Windowed.WindowedModN.Fold

FormalRV/Arithmetic/Windowed/WindowedModN/Fold.lean

WindowedModN — §10-11 prefix fold + HEADLINE correctness. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

theoremmodNStepInv_fold

theorem modNStepInv_fold (w bits a N numWin y : Nat) (hw : 0 < w)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits) :
    ∀ n, n ≤ numWin →
      ModNStepInv w bits numWin y
        (WindowedArith.windowedLookupFold a N w (WindowedArith.window w y) n 0)
        (Gate.applyNat
          (windowedModNMul w bits a N (1 + 2 * w) (1 + 2 * w + (2 * bits + 1))
            (1 + 2 * w + (2 * bits + 1) + numWin * w) n)
          (mulInputOf cuccaroAdder w bits numWin y))

Running the first `n ≤ numWin` mod-N window steps establishes the invariant with running value `windowedLookupFold a N w (window w y) n 0` — the circuit-aligned per-window mod-N fold of `WindowedArith`.

theoremwindowedModNMulCircuit_correct

theorem windowedModNMulCircuit_correct (w bits a N numWin y : Nat)
    (hw : 0 < w) (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hy : y < 2 ^ (w * numWin)) :
    decodeAccOf cuccaroAdder
        (Gate.applyNat (windowedModNMulCircuit w bits a N numWin)
          (mulInputOf cuccaroAdder w bits numWin y)) (1 + 2 * w) bits
      = (a * y) % N

*HEADLINE — per-window mod-N windowed-multiplier VALUE theorem.** The full per-window mod-N circuit (each window doing `acc ← (acc + T_j[v]) mod N` with `T_j[v] = a·(2^w)^j·v mod N`), run on the SAME clean encoded input as the product-adder multiplier (`mulInputOf cuccaroAdder`: ctrl set, `y` in the y-register, everything else — including the new comparison-flag qubit — clean), leaves (a · y) mod N in the accumulator, provided `0 < w`, `0 < N`, `2·N ≤ 2^bits`, and `y < 2^(w·numWin)`. This closes the "product-adder only" gap: the multiplier reduces mod N after EVERY window, exactly as in Gidney (arXiv:1905.07682).

FormalRV.Arithmetic.Windowed.WindowedModN.Helpers

FormalRV/Arithmetic/Windowed/WindowedModN/Helpers.lean

WindowedModN — §1-2 arithmetic helpers + target-complement gate. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

theoremcarry_compl_eq_decide_lt

theorem carry_compl_eq_decide_lt (u t : Nat) :
    ∀ k, Adder.carry false k (fun i => !u.testBit i) (fun i => t.testBit i)
      = decide (u % 2 ^ k < t % 2 ^ k)
  | 0 => by simp [Adder.carry, Nat.mod_one]
  | k + 1 =>

*Complemented-carry = strict comparison (mod-windows form).** The ripple carry of `(¬u) + t` through the low `k` bits equals `[u mod 2^k < t mod 2^k]`.

theoremcarry_compl_lt

theorem carry_compl_lt (n u t : Nat) (hu : u < 2 ^ n) (ht : t < 2 ^ n) :
    Adder.carry false n (fun i => !u.testBit i) (fun i => t.testBit i)
      = decide (u < t)

*Complemented-carry = strict comparison.** For `u, t < 2^n`, the carry out of `(¬u) + t` over `n` bits is `[u < t]`.

theoremmodReduce_lt_decide

theorem modReduce_lt_decide (N s t : Nat) (hs : s < N) (ht : t < N) :
    decide ((s + t) % N < t) = decide (N ≤ s + t)

*The flag-uncompute comparison.** For `s, t < N`, the reduced sum is below the addend exactly when the reduction fired: `(s+t) mod N < t ⟺ N ≤ s+t`.

theoremcuccaro_adder_sum_bits_general

theorem cuccaro_adder_sum_bits_general
    (bits q_start x y : Nat) (f : Nat → Bool)
    (h_cin : f q_start = false)
    (h_tgt : ∀ i, i < bits → f (q_start + 2 * i + 1) = x.testBit i)
    (h_read : ∀ i, i < bits → f (q_start + 2 * i + 2) = y.testBit i)
    (i : Nat) (hi : i < bits) :
    Gate.applyNat (cuccaro_n_bit_adder_full bits q_start) f (q_start + 2 * i + 1)
      = (x + y).testBit i

*General-state sum bits of the full Cuccaro adder.** If the carry-in is clear and the target/read registers hold the bits of `x`/`y` (below `bits`), the adder leaves `(x+y).testBit i` at target position `i`. (The general-state analogue of the decoded `sumCorrect`, at bit level.)

deftargetComplement

def targetComplement : Nat → Nat → Gate
  | 0, _ => Gate.I
  | n + 1, q_start =>
      Gate.seq (targetComplement n q_start) (Gate.X (q_start + 2 * n + 1))

X on each target/augend position `q_start + 2i + 1`, `i < bits`. Used to conjugate the MAJ chain so its top carry computes `¬acc + t ≥ 2^bits`, i.e. the strict comparison `acc < t`.

theoremtargetComplement_at_other

theorem targetComplement_at_other
    (bits q_start q : Nat)
    (hq : ∀ i, i < bits → q ≠ q_start + 2 * i + 1)
    (f : Nat → Bool) :
    Gate.applyNat (targetComplement bits q_start) f q = f q

Frame: `targetComplement` only touches the target positions.

theoremtargetComplement_at_target

theorem targetComplement_at_target
    (bits q_start j : Nat) (hj : j < bits) (f : Nat → Bool) :
    Gate.applyNat (targetComplement bits q_start) f (q_start + 2 * j + 1)
      = !(f (q_start + 2 * j + 1))

Action at a target position: bit `j < bits` is complemented.

FormalRV.Arithmetic.Windowed.WindowedModN.Reduction

FormalRV/Arithmetic/Windowed/WindowedModN/Reduction.lean

WindowedModN — §6 register mod-N reduction primitive. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

theoremmodNReduce_arith

theorem modNReduce_arith (bits N x : Nat)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits) (hx : x < 2 * N) :
    (x + if decide (N ≤ x) then 2 ^ bits - N else 0) % 2 ^ bits = x % N

Reduction arithmetic: for `x < 2N ≤ 2^bits`, `(x + [N ≤ x]·(2^bits − N)) mod 2^bits = x mod N`.

defmodNReduceFlag

def modNReduceFlag (bits q_start N flagPos : Nat) : Gate

*The register mod-N reduction with comparison flag**: constant-compare against `N`, then flag-conditional subtract of `N`. Takes an accumulator in `[0, 2N)` to `[0, N)`; the flag picks up `[N ≤ acc]` (uncomputed later by `regCompareXor` against the addend).

theoremmodNReduceFlag_state_general

theorem modNReduceFlag_state_general
    (bits q_start N flagPos x : Nat) (f : Nat → Bool)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits) (hx : x < 2 * N)
    (hflag_out : flagPos < q_start ∨ q_start + 2 * bits + 1 ≤ flagPos)
    (h_cin : f q_start = false)
    (h_flag : f flagPos = false)
    (h_tgt : ∀ i, i < bits → f (q_start + 2 * i + 1) = x.testBit i)
    (h_read : ∀ i, i < bits → f (q_start + 2 * i + 2) = false) :
    (∀ i, i < bits →
        Gate.applyNat (modNReduceFlag bits q_start N flagPos) f (q_start + 2 * i + 1)
          = (x % N).testBit i)
    ∧ (∀ i, i < bits →

*HEADLINE general-state bundle for the mod-N reduction.** On any state with clear carry-in / read register / flag and accumulator `x < 2N`: accumulator becomes `x mod N`, read and carry stay clear, the flag holds `[N ≤ x]`, and everything outside workspace ∪ {flag} is untouched.

FormalRV.Arithmetic.Windowed.WindowedModN.Step

FormalRV/Arithmetic/Windowed/WindowedModN/Step.lean

WindowedModN — §7 per-window mod-N lookup-add step + full circuit. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

defmodNLookupAddStep

def modNLookupAddStep (w bits N : Nat) (T : Nat → Nat) (q_start flagPos : Nat) : Gate

One mod-N lookup-ADD: `acc ← (acc + T[v]) mod N` for the table row selected by the address register (Gidney l.296 with per-window reduction). The Cuccaro comparator borrows the addend register for its two's-complement constant, so the QROM word is cleared before the reduction and re-read for the flag-uncompute register-compare.

defwindowedModNStep

def windowedModNStep (w bits a N q_start yBase flagPos j : Nat) : Gate

One mod-N window step: copy window `j` into the address register, mod-N lookup-add the entry `T_j[v] = a·(2^w)^j·v mod N`, uncopy.

defwindowedModNMul

def windowedModNMul (w bits a N q_start yBase flagPos numWin : Nat) : Gate

The per-window mod-N windowed multiplier: a fold of mod-N window steps.

defwindowedModNMulCircuit

def windowedModNMulCircuit (w bits a N numWin : Nat) : Gate

*The full per-window mod-N windowed-multiplier circuit** at the standard layout (flag above the `y`-register). On `acc = 0` it leaves `(a·y) mod N` in the accumulator.

FormalRV.Arithmetic.Windowed.WindowedModN.StepInvariant

FormalRV/Arithmetic/Windowed/WindowedModN/StepInvariant.lean

WindowedModN — §8-9 window-step invariant + preservation proof. Part of `WindowedModN` (the `WindowedModN.lean` shim re-exports all parts).

defModNStepInv

def ModNStepInv (w bits numWin y s : Nat) (g : Nat → Bool) : Prop

*The mod-N window-step invariant.** After some window-steps starting from `mulInputOf cuccaroAdder w bits numWin y`, the state `g` satisfies: (F) frame off the Cuccaro block and the flag; (D) the addend register is clean; (C) the carry-in is clean; (G) the flag is clean; (V) the accumulator holds the bits of the running mod-N sum `s`.

theoremmodNStepInv_init

theorem modNStepInv_init (w bits numWin y : Nat) :
    ModNStepInv w bits numWin y 0 (mulInputOf cuccaroAdder w bits numWin y)

Invariant initialization: the clean input satisfies the invariant at `0`.

theoremmodNStepInv_step

theorem modNStepInv_step (w bits a N numWin y : Nat) (hw : 0 < w)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (j : Nat) (hj : j < numWin) (s : Nat) (hs : s < N) (g : Nat → Bool)
    (hg : ModNStepInv w bits numWin y s g) :
    ModNStepInv w bits numWin y
      ((s + WindowedArith.tableValue a N w j (WindowedArith.window w y j)) % N)
      (Gate.applyNat
        (windowedModNStep w bits a N (1 + 2 * w) (1 + 2 * w + (2 * bits + 1))
          (1 + 2 * w + (2 * bits + 1) + numWin * w) j) g)

FormalRV.Arithmetic.Windowed.WindowedModNInPlace

FormalRV/Arithmetic/Windowed/WindowedModNInPlace.lean

FormalRV.Arithmetic.Windowed.WindowedModNInPlace — the IN-PLACE mod-N windowed multiplier: `y ← (a·y) mod N` with FULL state restoration. Replays the in-place algorithm of `WindowedInPlace.lean` (`pass(a) ; acc↔y swap ; pass(−a⁻¹)`) over the PER-WINDOW mod-N multiplier of `WindowedModN.lean` (each window does `acc ← (acc + T_j[v]) mod N`), so the in-place product is `(a·y) mod N` — the mod-N entry point the Shor weld (`EncodeRoundTripModMul.gate c`) consumes. **Stage 1 — generalized pass.** `windowedModNMulCircuit` run from ANY state satisfying the mod-N window-step invariant with partial sum `acc₀ < N` (not just the clean input with `acc₀ = 0`) leaves `(acc₀ + a·y) mod N` in the accumulator (`modNStepInv_full_pass`, `windowedModNMulCircuit_correct_acc`). The existing `modNStepInv_step` is already start-value-agnostic, so this is a new init + replayed fold. **Stage 2 — the swap.** `accYSwap cuccaroAdder w bits` is reused as-is: the mod-N layout is the product-adder layout plus ONE flag qubit at `yBase + numWin·w`, which is outside both swap zones (the accumulator sits below `yBase`; the swapped y-wires are `yBase + i`, `i < bits`). **Stage 3 — HEADLINE.** For `a` invertible mod `N` (`a·ainv ≡ 1`), `windowedModNMulInPlace = modNpass(a) ; swap ; modNpass(N − ainv)` maps the `ModNMulReady`-shaped state with y-register `y < N` to the `ModNMulReady` state with y-register `(a·y) mod N` — accumulator, addend register, carry-in AND the comparison flag all returned CLEAN (`windowedModNMulInPlace_correct`). The cancellation `(y + (N − ainv)·(a·y mod N)) ≡ 0 (mod N)` is `mod_inv_cancel_identity` (already general-modulus). `y < N` is required: after the swap the MULTIPLIER register holds `(a·y) mod N < N` and the accumulator holds `y`, which must be a valid mod-N partial sum. **Stage 4 — pass-composition.** `windowedModNMulInPlaceSeq`, the k-fold in-place mod-N multiply, computes `y ← (Π aₖ)·y mod N` (`windowedModNMulInPlaceSeq_correct`); the squared-power instance `windowedModNMulGate (a^(2^k) mod N) (ainv^(2^k) mod N)` realizes `y ← a^(2^k)·y mod N` (`windowedModNMulGate_squaredPower`) — the exact per-iterate gate family the Shor weld needs. **Counts.** Two mod-N passes plus a T-free swap: `tcount = 2·numWin·(56·w·2^w + 56·bits)`. Kernel-clean: no `sorry`, no `native_decide`, no axioms beyond the prelude.

theoremmulInputOf_cuccaro_encodeReg

private theorem mulInputOf_cuccaro_encodeReg (w bits numWin v p : Nat)
    (hp : p ≠ ulookup_ctrl_idx) :
    mulInputOf cuccaroAdder w bits numWin v p
      = encodeReg (1 + 2 * w + (2 * bits + 1)) (numWin * w) v p

`mulInputOf cuccaroAdder` off the control qubit, literal-base form.

theoremmulInputOf_cuccaro_y_bit

private theorem mulInputOf_cuccaro_y_bit (w bits numWin v i : Nat)
    (hi : i < numWin * w) :
    mulInputOf cuccaroAdder w bits numWin v (1 + 2 * w + (2 * bits + 1) + i)
      = v.testBit i

`mulInputOf cuccaroAdder` reads bit `i` of `v` at y-wire `yBase + i`.

theoremmodNStepInv_fold_acc

theorem modNStepInv_fold_acc (w bits a N numWin y acc₀ : Nat) (hw : 0 < w)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits) (hacc : acc₀ < N)
    (f : Nat → Bool) (hf : ModNStepInv w bits numWin y acc₀ f) :
    ∀ n, n ≤ numWin →
      ModNStepInv w bits numWin y
        (WindowedArith.windowedLookupFold a N w (WindowedArith.window w y) n acc₀)
        (Gate.applyNat
          (windowedModNMul w bits a N (1 + 2 * w) (1 + 2 * w + (2 * bits + 1))
            (1 + 2 * w + (2 * bits + 1) + numWin * w) n) f)

*The generalized mod-N fold.** From ANY state `f` satisfying the mod-N invariant with partial sum `acc₀ < N`, running the first `n ≤ numWin` mod-N window steps yields the invariant with partial sum `windowedLookupFold a N w (window w y) n acc₀`.

theoremmodNStepInv_full_pass

theorem modNStepInv_full_pass (w bits a N numWin y acc₀ : Nat)
    (hw : 0 < w) (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hacc : acc₀ < N) (hy : y < 2 ^ (w * numWin))
    (f : Nat → Bool) (hf : ModNStepInv w bits numWin y acc₀ f) :
    ModNStepInv w bits numWin y ((acc₀ + a * y) % N)
      (Gate.applyNat (windowedModNMulCircuit w bits a N numWin) f)

*The full generalized mod-N pass, invariant form.** One complete `windowedModNMulCircuit` run from an invariant state with partial sum `acc₀ < N` re-establishes the invariant with partial sum `(acc₀ + a·y) mod N` — the form the in-place composition consumes.

theoremwindowedModNMulCircuit_correct_acc

theorem windowedModNMulCircuit_correct_acc (w bits a N numWin y acc₀ : Nat)
    (hw : 0 < w) (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hacc : acc₀ < N) (hy : y < 2 ^ (w * numWin))
    (f : Nat → Bool) (hf : ModNStepInv w bits numWin y acc₀ f) :
    decodeAccOf cuccaroAdder
        (Gate.applyNat (windowedModNMulCircuit w bits a N numWin) f)
        (1 + 2 * w) bits
      = (acc₀ + a * y) % N

*Stage 1 HEADLINE — generalized mod-N pass VALUE theorem.** The per-window mod-N multiplier run from an invariant state whose accumulator holds partial sum `acc₀ < N` leaves `(acc₀ + a·y) mod N` in the accumulator.

defModNMulReady

def ModNMulReady (w bits numWin y : Nat) (f : Nat → Bool) : Prop

The in-place mod-N multiplier's input/output contract: a `mulInputOf cuccaroAdder`-shaped state with y-register value `y` and a CLEAN block (addend, carry-in, flag, accumulator).

theoremModNMulReady.toStepInv

theorem ModNMulReady.toStepInv {w bits numWin y : Nat} {f : Nat → Bool}
    (h : ModNMulReady w bits numWin y f) : ModNStepInv w bits numWin y 0 f

A `ModNMulReady` state satisfies the mod-N window-step invariant with partial sum 0.

theoremmodNMulReady_mulInputOf

theorem modNMulReady_mulInputOf (w bits numWin y : Nat) :
    ModNMulReady w bits numWin y (mulInputOf cuccaroAdder w bits numWin y)

The clean encoded input is `ModNMulReady`.

defwindowedModNMulInPlace

def windowedModNMulInPlace (w bits a ainv N numWin : Nat) : Gate

*The in-place mod-N windowed multiplier** by a constant `a` invertible mod `N` with inverse `ainv < N`.

theoremwindowedModNMulInPlace_correct

theorem windowedModNMulInPlace_correct (w bits a ainv N numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hy : y < N) (hainv : ainv < N) (hinv : a * ainv % N = 1)
    (f : Nat → Bool) (hf : ModNMulReady w bits numWin y f) :
    ModNMulReady w bits numWin (a * y % N)
      (Gate.applyNat (windowedModNMulInPlace w bits a ainv N numWin) f)

*Stage 3 HEADLINE — in-place mod-N windowed multiplication, full state restoration.** With `numWin·w = bits` (the y-register exactly matches the accumulator width), `0 < N`, `2·N ≤ 2^bits`, and `a·ainv ≡ 1 (mod N)`: the in-place mod-N multiplier maps any `ModNMulReady` state with y-value `y < N` to the `ModNMulReady` state with y-value `(a·y) mod N` — accumulator, addend register, carry-in AND comparison flag all returned CLEAN, everything off the y-register restored. Output shape = input shape, so in-place mod-N multiplies compose: this is the Shor-weld entry point `y ← (a·y) mod N`.

theoremwindowedModNMulInPlace_value

theorem windowedModNMulInPlace_value (w bits a ainv N numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hy : y < N) (hainv : ainv < N) (hinv : a * ainv % N = 1)
    (f : Nat → Bool) (hf : ModNMulReady w bits numWin y f) :
    decodeReg (fun i => 1 + 2 * w + (2 * bits + 1) + i) bits
        (Gate.applyNat (windowedModNMulInPlace w bits a ainv N numWin) f)
      = a * y % N

*Stage 3, decode form.** After the in-place mod-N multiply, the y-register itself decodes to `(a·y) mod N` (the block is clean by `windowedModNMulInPlace_correct`).

theoremwindowedModNMulInPlace_correct_clean

theorem windowedModNMulInPlace_correct_clean (w bits a ainv N numWin y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hy : y < N) (hainv : ainv < N) (hinv : a * ainv % N = 1) :
    ModNMulReady w bits numWin (a * y % N)
      (Gate.applyNat (windowedModNMulInPlace w bits a ainv N numWin)
        (mulInputOf cuccaroAdder w bits numWin y))

*Stage 3, clean-input instance.** On the clean encoded input the in-place mod-N multiplier produces the `ModNMulReady` state with y-value `(a·y) mod N`.

theorempow_inv_mod_one

private theorem pow_inv_mod_one (a ainv N k : Nat) (hN1 : 1 < N)
    (hinv : a * ainv % N = 1) :
    (a ^ k % N) * (ainv ^ k % N) % N = 1

Inverses lift to powers: `(a^k mod N)·(ainv^k mod N) ≡ 1 (mod N)`. (Local copy of `ModExp.mul_pow_mod_one`, kept private to avoid the cross-tree import.)

defwindowedModNMulInPlaceSeq

def windowedModNMulInPlaceSeq (w bits N numWin : Nat)
    (as ainvs : Nat → Nat) (n : Nat) : Gate

The `n`-fold in-place mod-N multiply by the constants `as 0, …, as (n−1)` (with inverses `ainvs k`).

theoremwindowedModNMulInPlaceSeq_correct

theorem windowedModNMulInPlaceSeq_correct (w bits N numWin : Nat)
    (as ainvs : Nat → Nat) (y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits) (hy : y < N)
    (f : Nat → Bool) (hf : ModNMulReady w bits numWin y f) :
    ∀ n, (∀ k, k < n → ainvs k < N ∧ as k * ainvs k % N = 1) →
      ModNMulReady w bits numWin ((∏ k ∈ Finset.range n, as k) * y % N)
        (Gate.applyNat (windowedModNMulInPlaceSeq w bits N numWin as ainvs n)
          f)

*Stage 4 HEADLINE — the in-place mod-N product chain.** `n` in-place mod-N multiplies by invertible constants `as k` compute `y ← (Π_{k<n} as k)·y mod N`, returning to the `ModNMulReady` shape (clean accumulator/addend/carry/flag) after EVERY round.

defwindowedModNMulGate

def windowedModNMulGate (w bits N numWin c cinv : Nat) : Gate

*The Shor-weld gate**: in-place multiply-by-`c` mod `N` at window size `w` (`cinv` the mod-N inverse of `c`). Thin wrapper so the weld (`EncodeRoundTripModMul.gate`) can take `fun c => windowedModNMulGate w bits N numWin c (cinv c)` directly.

theoremwindowedModNMulGate_correct

theorem windowedModNMulGate_correct (w bits N numWin c cinv y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits)
    (hN_pos : 0 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hy : y < N) (hcinv : cinv < N) (hinv : c * cinv % N = 1)
    (f : Nat → Bool) (hf : ModNMulReady w bits numWin y f) :
    ModNMulReady w bits numWin (c * y % N)
      (Gate.applyNat (windowedModNMulGate w bits N numWin c cinv) f)

The weld gate realizes `y ← (c·y) mod N` with full state restoration.

theoremwindowedModNMulGate_squaredPower

theorem windowedModNMulGate_squaredPower (w bits N numWin a ainv k y : Nat)
    (hw : 0 < w) (hbits : numWin * w = bits)
    (hN1 : 1 < N) (hN2 : 2 * N ≤ 2 ^ bits)
    (hy : y < N) (hinv : a * ainv % N = 1)
    (f : Nat → Bool) (hf : ModNMulReady w bits numWin y f) :
    ModNMulReady w bits numWin (a ^ 2 ^ k * y % N)
      (Gate.applyNat
        (windowedModNMulGate w bits N numWin (a ^ 2 ^ k % N)
          (ainv ^ 2 ^ k % N)) f)

*The squared-power gate family** — QPE iterate `k` of Shor: `windowedModNMulGate` at `c = a^(2^k) mod N`, `cinv = ainv^(2^k) mod N` realizes `y ← a^(2^k)·y mod N` (the raw-constant value the weld's round-trip at `c = a^(2^k)` requires), needing only the BASE inverse `a·ainv ≡ 1 (mod N)`.

theoremtcount_cxCascade

private theorem tcount_cxCascade (ctrl tgt : Nat → Nat) (n : Nat) :
    tcount (cxCascade ctrl tgt n) = 0

theoremtcount_accYSwap

theorem tcount_accYSwap (A : Adder) (w bits : Nat) :
    tcount (accYSwap A w bits) = 0

The acc↔y swap is Toffoli-free.

theoremtcount_windowedModNMulInPlace

theorem tcount_windowedModNMulInPlace (w bits a ainv N numWin : Nat) :
    tcount (windowedModNMulInPlace w bits a ainv N numWin)
      = 2 * (numWin * (56 * w * 2 ^ w + 56 * bits))

*In-place mod-N multiply T-count**: two mod-N passes plus the T-free swap, `2·numWin·(56·w·2^w + 56·bits)`.

theoremtcount_windowedModNMulGate

theorem tcount_windowedModNMulGate (w bits N numWin c cinv : Nat) :
    tcount (windowedModNMulGate w bits N numWin c cinv)
      = 2 * (numWin * (56 * w * 2 ^ w + 56 * bits))

The weld gate's T-count (same circuit, weld-facing name).

theoremtcount_windowedModNMulInPlaceSeq

theorem tcount_windowedModNMulInPlaceSeq (w bits N numWin : Nat)
    (as ainvs : Nat → Nat) (n : Nat) :
    tcount (windowedModNMulInPlaceSeq w bits N numWin as ainvs n)
      = n * (2 * (numWin * (56 * w * 2 ^ w + 56 * bits)))

*The k-fold chain T-count**: `n` in-place mod-N multiplies, `n · 2·numWin·(56·w·2^w + 56·bits)`.

FormalRV.Arithmetic.Windowed.WindowedWidth

FormalRV/Arithmetic/Windowed/WindowedWidth.lean

FormalRV.Shor.WindowedWidth — the PARAMETRIC structural qubit count of the windowed multiplier, proved for ALL `w, bits, numWin` from the `Gate` structure (closing the gap left by the per-instance `decide` proofs in `WindowedCircuit`). `width (windowedMulCircuit w bits a numWin) = 2*w + 2*bits + numWin*w + 2` (for `w ≥ 1`, `bits ≥ 1`, `numWin ≥ 1`). The proof bounds `maxIdx` of every component (Cuccaro adder, unary-lookup read, window copies) from their definitions and shows the maximum is the top of the `y`-register, `yBase + numWin*w - 1`. So the qubit count — and in particular its `+ numWin*w` (data) and any padding contribution — is read off the verified circuit, not asserted as a formula.

theoremmaxIdx_seq

theorem maxIdx_seq (a b : Gate) : maxIdx (Gate.seq a b) = max (maxIdx a) (maxIdx b)

theoremmaxIdx_init_le_foldl

theorem maxIdx_init_le_foldl {α : Type} (step : α → Gate) (L : List α) (init : Gate) :
    maxIdx init ≤ maxIdx (L.foldl (fun g x => Gate.seq g (step x)) init)

theoremmaxIdx_foldl_seq_le

theorem maxIdx_foldl_seq_le {α : Type} (step : α → Gate) (B : Nat) (L : List α) (init : Gate)
    (hinit : maxIdx init ≤ B) (hstep : ∀ x ∈ L, maxIdx (step x) ≤ B) :
    maxIdx (L.foldl (fun g x => Gate.seq g (step x)) init) ≤ B

theoremle_maxIdx_foldl_seq

theorem le_maxIdx_foldl_seq {α : Type} (step : α → Gate) (L : List α) (init : Gate)
    (a : α) (ha : a ∈ L) :
    maxIdx (step a) ≤ maxIdx (L.foldl (fun g x => Gate.seq g (step x)) init)

theoremmaxIdx_cuccaro_maj_chain

theorem maxIdx_cuccaro_maj_chain (n q_start : Nat) :
    maxIdx (cuccaro_maj_chain n q_start) ≤ q_start + 2 * n

theoremmaxIdx_cuccaro_uma_chain_reverse

theorem maxIdx_cuccaro_uma_chain_reverse (n q_start : Nat) :
    maxIdx (cuccaro_uma_chain_reverse n q_start) ≤ q_start + 2 * n

theoremmaxIdx_cuccaro_full

theorem maxIdx_cuccaro_full (bits q_start : Nat) :
    maxIdx (cuccaro_n_bit_adder_full bits q_start) ≤ q_start + 2 * bits

theoremmaxIdx_x_gates_le

theorem maxIdx_x_gates_le (B : Nat) (xs : List Nat) (h : ∀ i ∈ xs, i ≤ B) :
    maxIdx (x_gates_from_indices xs) ≤ B

theoremmaxIdx_cx_gates_le

theorem maxIdx_cx_gates_le (B ctrl : Nat) (xs : List Nat) (hc : ctrl ≤ B)
    (h : ∀ t ∈ xs, t ≤ B) : maxIdx (cx_gates_from_indices ctrl xs) ≤ B

theoremmaxIdx_prefix_and_step

theorem maxIdx_prefix_and_step (i : Nat) : maxIdx (prefix_and_step i) ≤ 2 + 2 * i

theoremmaxIdx_prefix_and_cascade

theorem maxIdx_prefix_and_cascade (n : Nat) : maxIdx (prefix_and_cascade n) ≤ 2 * n

theoremmaxIdx_prefix_and_uncompute

theorem maxIdx_prefix_and_uncompute (n : Nat) : maxIdx (prefix_and_uncompute n) ≤ 2 * n

theoremmaxIdx_unary_lookup_iteration_le

theorem maxIdx_unary_lookup_iteration_le (w B : Nat) (flips cnots : List Nat)
    (hw1 : 1 ≤ w) (hw : 2 * w ≤ B) (hf : ∀ i ∈ flips, i ≤ B) (hc : ∀ t ∈ cnots, t ≤ B) :
    maxIdx (unary_lookup_iteration w flips cnots) ≤ B

One lookup iteration with flips/cnots bounded by `B` (and `2·w ≤ B` for the cascade and CNOT control) has `maxIdx ≤ B`.

theoremmaxIdx_unary_lookup_multi_iteration_le

theorem maxIdx_unary_lookup_multi_iteration_le (w B : Nat)
    (iters : List (List Nat × List Nat)) (hw1 : 1 ≤ w) (hw : 2 * w ≤ B)
    (h : ∀ p ∈ iters, (∀ i ∈ p.1, i ≤ B) ∧ (∀ t ∈ p.2, t ≤ B)) :
    maxIdx (unary_lookup_multi_iteration w iters) ≤ B

theoremaddrFlips_le

theorem addrFlips_le (w v i : Nat) (hi : i ∈ addrFlips w v) : i ≤ 2 * w

theoremwordCnotsAt_addendIdx_le

theorem wordCnotsAt_addendIdx_le (q_start W Tv t : Nat)
    (ht : t ∈ wordCnotsAt (addendIdx q_start) W Tv) : t ≤ q_start + 2 * W

theoremmaxIdx_lookupReadAt_le

theorem maxIdx_lookupReadAt_le (w q_start W : Nat) (T : Nat → Nat)
    (hw1 : 1 ≤ w) (hq : 2 * w ≤ q_start) :
    maxIdx (lookupReadAt w (addendIdx q_start) W T) ≤ q_start + 2 * W

The lookup read writes only to address/AND/word positions `≤ q_start + 2·W`.

theoremmaxIdx_lookupAddAt_le

theorem maxIdx_lookupAddAt_le (w q_start W bits : Nat) (T : Nat → Nat)
    (hw1 : 1 ≤ w) (hq : 2 * w ≤ q_start) (hWb : W ≤ bits) :
    maxIdx (lookupAddAt w W T bits q_start) ≤ q_start + 2 * bits

theoremmaxIdx_copyWindow_le

theorem maxIdx_copyWindow_le (w yBase j : Nat) (hw1 : 1 ≤ w) (hy : 2 * w ≤ yBase) :
    maxIdx (copyWindow w yBase j) ≤ yBase + (j + 1) * w - 1

Every CX index in window `j`'s copy is `≤ yBase + (j+1)·w − 1` (for `w ≥ 1`, `yBase ≥ 2w`).

theoremle_maxIdx_copyWindow

theorem le_maxIdx_copyWindow (w yBase j : Nat) (hw1 : 1 ≤ w) :
    yBase + j * w + (w - 1) ≤ maxIdx (copyWindow w yBase j)

The CX at slot `w-1` of window `j`'s copy touches `yBase + j·w + (w-1)`.

theoremwidth_windowedMulCircuit

theorem width_windowedMulCircuit (w bits a numWin : Nat)
    (hw1 : 1 ≤ w) (hb : 1 ≤ bits) (hN : 1 ≤ numWin) :
    width (windowedMulCircuit w bits a numWin) = 2 * w + 2 * bits + numWin * w + 2

*The structural qubit count of the windowed multiplier, for ALL `w, bits, numWin`** (`w, bits, numWin ≥ 1`): `width = 2·w + 2·bits + numWin·w + 2`, read off the `Gate` via `maxIdx`. The `numWin·w` is the data register and `2·bits` the Cuccaro acc+addend; on a padded register (`bits = data + g_pad`) this generalizes the `decide`-checked instances.