8
Mins
Verified AI
Tax
Audit Defensible Tax Optimization
How Pramaana uses Lean 4 to prove tax plans are compliant, optimal, and audit-defensible — even when frontier LLMs miss by thousands of dollars.
Author
Author
Yoshiki
Yoshiki

TL;DR
Fear of getting tax filings wrong often prevents taxpayers from taking the optimal deductions for their situation. Taking one set of deductions and credits may exclude another set, presenting a complex optimization problem. Pramaana provides provable tax calculation and optimization verified in Lean4, enabling users to optimize with certainty.
Introduction
American taxpayers encounter a tangle of tax liabilities, deductions, and credits when filing taxes. Tax deductions are available for parents of children, adults over 65, owners of qualifying businesses and other tax-advantaged situations. Taking a set of credits may exclude another. For example, if an unmarried couple with a child files separately, only one can claim the child as a dependent. Claiming the child as a dependent by one parent may save more in taxes than splitting the allocation across two. Deductions and credits that exclude other options present an optimization challenge: stay compliant, but don’t leave any money on the table. These tradeoffs form a combinatorial space where a potentially exponential number of scenarios need to be considered to show that a particular tax plan is optimal.
Without formal guarantees, the fear of being out of compliance drives the process. Taking two deductions that conflict would put the taxfiler out of compliance with the tax code. An overcautious taxpayer might leave plenty of savings on the table. Instead, leveraging a formal model of tax law can help the taxpayer analyze their situation with more certainty. Given a tax plan that consists of incomes, deductions, and credits, formal models can check that the plan is compliant. Conversely, if any increase in a particular kind of income leads to the plan becoming non-compliant, then the plan is provably optimal with respect to that variable.
Our Approach
Pramaana has formalized the 1040 tax return form, and income reporting forms associated with it. We focus on form 1040 because it is the interface between the taxpayer and the U.S. federal tax code. The way form 1040 is filled operationalizes the income tax system. Our formalization is cell-for-cell precise, allowing professionals to check the model in the same way as a complete 1040 form. Optimization proofs proceed by proving mathematical properties on functions that represent cell values, then using that to show that a larger metric value will yield an invalid plan.
The lean encoding proceeds by encoding each cell of the 1040 form as a function. For each cell, the value is computed either by pulling data from the inputs or from previous cells. The following code represents cell 15, the total income. The function derives the total income value by aggregating cells 7 through 14 that represent various income sources like wages, tips, and so forth.
def cell_015_l1z_total_income (s : FormState) : Float := cell_007_l1a_wages_input s + cell_008_l1b_household_wages_input s + cell_009_l1c_tip_income_input s + cell_010_l1d_medicaid_waiver_input s + cell_011_l1e_dependent_care_benefits_input s + cell_012_l1f_adoption_benefits_input s + cell_013_l1g_form_8919_wages_input s + cell_014_l1h_other_earned_income_input s
def cell_015_l1z_total_income (s : FormState) : Float := cell_007_l1a_wages_input s + cell_008_l1b_household_wages_input s + cell_009_l1c_tip_income_input s + cell_010_l1d_medicaid_waiver_input s + cell_011_l1e_dependent_care_benefits_input s + cell_012_l1f_adoption_benefits_input s + cell_013_l1g_form_8919_wages_input s + cell_014_l1h_other_earned_income_input s
def cell_015_l1z_total_income (s : FormState) : Float := cell_007_l1a_wages_input s + cell_008_l1b_household_wages_input s + cell_009_l1c_tip_income_input s + cell_010_l1d_medicaid_waiver_input s + cell_011_l1e_dependent_care_benefits_input s + cell_012_l1f_adoption_benefits_input s + cell_013_l1g_form_8919_wages_input s + cell_014_l1h_other_earned_income_input s
The cell-by-cell encoding is repeated for all cells of 1040 and income reporting forms like W-2. In aggregate, the formalization consists of 265 cell formalizations, with the longest computation chains at 23 cells long.
Once the 1040 is encoded in Lean, we can define metrics and constraints in lean. These will be the definitions we use when we prove that a particular tax plan is optimal. Given a tax plan, the total federal income tax after deductions and credits is given in cell 97. A plan is valid if it can be passed through the cell-by-cell 1040 model with matching outputs and no contradictions. Our model captures that value as an abstract expression, allowing it to be computed for every possible scenario. For a tax plan with multiple taxpayers, we merge the total tax across all taxpayers to compute the cost metric that we aim to push down. Once the validity and metric is defined precisely and mathematically, a provably optimal solution is a valid solution whose metric is optimal across all valid solutions. Now fearless optimization can begin.
Tax Optimization Example
Consider the following scenario:
Alice and Bob are American, unmarried, and share two qualifying children. Apply 2025 federal law (post-OBBBA). Bob is 66, Alice 64. Bob has $60,000 W-2 wages, Alice $22,000. Each parent additionally has $5,000 taxable interest, $2,000 tax-exempt interest, $10,000 ordinary dividends (of which $8,000 are qualified), $12,000 long-term capital-gain distributions, and $18,000 Social Security. Each is also the sole shareholder of an S-corporation consulting practice (a specified service trade or business) that passes through $50,000 of qualified business income, pays no W-2 wages, and holds no qualified property; this pass-through is in AGI but bears no self-employment tax. Each itemizes (state tax $12,000, mortgage interest $20,000, cash charity $5,000, medical $4,000) and takes a $600 foreign tax credit on $8,000 of Canadian income. Alice has $1,000 additional itemized deductions. Each has a $400,000 Traditional IRA with zero basis.
They jointly choose how to allocate the two children and how much each converts from Traditional to Roth in 2025. A parent claiming at least one child files Head of Household, otherwise Single. Each child carries the 2025 Child Tax Credit of $2,200; the age-65 senior deduction is $6,000; the SALT cap is $40,000; NIIT (3.8%) is levied; and the §199A QBI deduction follows 2025 Form 8995-A rules. For every phase-out (CTC, senior deduction, NIIT, §199A) use modified AGI = AGI.
Maximize the combined conversion (rA + rB) subject to the average tax rate on the conversions being at most 25% — the increase in combined Form 1040 line 24 (total tax, including NIIT and every phase-out the conversions trigger) divided by (rA + rB). State the child allocation and each parent's exact conversion amount.
Let’s see how Claude Opus 4.8 and GPT-5.5 do.
Model | Combined | Avg rate | Result |
|---|---|---|---|
True optimum (machine-checked) | 187,149 | 24.9999% | feasible, axis-tight |
gpt-5.5-high | 195,490 | 25.60% | infeasible, +$8,341 |
claude-opus-4-8-max | 179,091 | 24.85% | suboptimal, −$8,058 |
Both frontier models miss by around $8k. But the reason for the error is different, as is the direction it misses in. First GPT fails to model a complex tax deduction phase-out with quadratic value behavior. Specified Service Trades or Businesses (SSTB) classification allows qualified business income (QBI) to have deductions up to 20%. The benefits phase out when the QBI increases beyond the threshold. On top of this, deductions have another phase-in based on the wage limit. As the two combine, the deductible amount reduces quadratically. GPT assumes it is linear, and thus underapproximates the taxes owed. As a consequence, GPT pushes a higher conversion amount that is out of compliance with the actual tax regime.
In contrast, Claude has a compliant but suboptimal tax plan. In its reasoning chain, Claude stops considering increases to Bob’s adjusted gross income at $200k: "Bob = $47,700 lands his AGI at exactly $200,000 [...] so the optimizer stops him there." While it knows that different kinds of taxes start applying here, noting Net Investment Income Tax and phase-out of the Child Tax Credit, it does not check if converting beyond it can still keep the average tax rate below 25%. By not exploring that possibility, Claude leaves money on the table. Pramaana’s formal reasoning stays compliant while taking the optimal path.
Why You Need Verification
GPT and Claude produced answers that were off by more than $8k while producing reasoning chains that are equally unverifiable. When called through the API, GPT produced almost no reasoning trace, only giving the final answer. Since there is no reasoning trace we cannot trace GPT’s thought process to check the result. In contrast, Claude produced over 20 pages of reasoning. This massive length makes verifying Claude’s step-by-step logic difficult and costly. Making matters worse, the CoT contains several backtracks where Claude became unsure of the output, then restarted a sub-procedure. For users who encounter only one bot, the only way to verify the output is to read the entire CoT and check step-by-step. In contrast, Pramaana's formal models and its proofs are checked automatically through the Lean4 Theorem Prover. Each step of the deduction is a formally correct lean deduction. Unlike natural language, the output of the previous step is formally guaranteed to be the input of the next step.
The Audit Defensible Proof: Not a Single Dollar More
Given a candidate solution, we can verify that (1) it’s a valid solution, and (2) it’s the optimal one. We will go over each of the definitions and the proof. A plan records how the children are allocated and how much Alice and Bob convert from their respective Roth accounts.
structure Plan where alloc : Allocation /-- Dependency allocation -/ rA : Nat /-- Alice's conversion amount -/ rB : Nat /-- Bob's conversion amount-/ deriving Repr
structure Plan where alloc : Allocation /-- Dependency allocation -/ rA : Nat /-- Alice's conversion amount -/ rB : Nat /-- Bob's conversion amount-/ deriving Repr
structure Plan where alloc : Allocation /-- Dependency allocation -/ rA : Nat /-- Alice's conversion amount -/ rB : Nat /-- Bob's conversion amount-/ deriving Repr
Given a plan, we can run it through Pramaana’s income tax model to derive how much taxes Alice and Bob owe. `stateX` is the tax situation where the respective partner converts rX. In contrast, `baseStateX` is the state with the respective spouse converting 0 dollars. We’ll need to keep track of all four states so we can compare against them.
def Plan.taxA (p : Plan) : Float := cell_097_l24_total_tax p.stateA def Plan.taxB (p : Plan) : Float := cell_097_l24_total_tax p.stateB def Plan.baseTaxA (p : Plan) : Float := cell_097_l24_total_tax p.baseStateA def Plan.baseTaxB (p : Plan) : Float := cell_097_l24_total_tax p.baseStateB
def Plan.taxA (p : Plan) : Float := cell_097_l24_total_tax p.stateA def Plan.taxB (p : Plan) : Float := cell_097_l24_total_tax p.stateB def Plan.baseTaxA (p : Plan) : Float := cell_097_l24_total_tax p.baseStateA def Plan.baseTaxB (p : Plan) : Float := cell_097_l24_total_tax p.baseStateB
def Plan.taxA (p : Plan) : Float := cell_097_l24_total_tax p.stateA def Plan.taxB (p : Plan) : Float := cell_097_l24_total_tax p.stateB def Plan.baseTaxA (p : Plan) : Float := cell_097_l24_total_tax p.baseStateA def Plan.baseTaxB (p : Plan) : Float := cell_097_l24_total_tax p.baseStateB
Taxes on the conversion is defined as the delta between the two states. A particular plan is valid if the conversion amount is below the original IRA balance ($400k per partner) and the tax on the conversion remains less than 25%.
def Plan.deltaTax (p : Plan) : Float := (p.taxA - p.baseTaxA) + (p.taxB - p.baseTaxB) def Plan.valid (p : Plan) : Prop := p.rA ≤ iraBalance ∧ p.rB ≤ iraBalance ∧ 4.0 * p.deltaTax ≤ Float.ofNat (p.rA + p.rB)
def Plan.deltaTax (p : Plan) : Float := (p.taxA - p.baseTaxA) + (p.taxB - p.baseTaxB) def Plan.valid (p : Plan) : Prop := p.rA ≤ iraBalance ∧ p.rB ≤ iraBalance ∧ 4.0 * p.deltaTax ≤ Float.ofNat (p.rA + p.rB)
def Plan.deltaTax (p : Plan) : Float := (p.taxA - p.baseTaxA) + (p.taxB - p.baseTaxB) def Plan.valid (p : Plan) : Prop := p.rA ≤ iraBalance ∧ p.rB ≤ iraBalance ∧ 4.0 * p.deltaTax ≤ Float.ofNat (p.rA + p.rB)
Finally, the objective to maximize the combined conversion is given as the sum of the two conversions.
def Plan.objective (p : Plan) : Nat := p.rA + p.rB
def Plan.objective (p : Plan) : Nat := p.rA + p.rB
def Plan.objective (p : Plan) : Nat := p.rA + p.rB
Given a particular plan we can test validity
def optimalPlan : Plan := { alloc := .bobBoth, rA := 121449, rB := 65700 } theorem feasible : optimalPlan.valid := by native_decide
def optimalPlan : Plan := { alloc := .bobBoth, rA := 121449, rB := 65700 } theorem feasible : optimalPlan.valid := by native_decide
def optimalPlan : Plan := { alloc := .bobBoth, rA := 121449, rB := 65700 } theorem feasible : optimalPlan.valid := by native_decide
In contrast, optimality is when adding one more dollar to either Alice’s conversion or Bob’s pushes the solution above the validity requirement.
thmOptimal : ¬ ({ optimalPlan with rA := optimalPlan.rA + 1 } : Plan).valid := by native_decide thmOptimal : ¬ ({ optimalPlan with rB := optimalPlan.rB + 1 } : Plan).valid := by native_decide
thmOptimal : ¬ ({ optimalPlan with rA := optimalPlan.rA + 1 } : Plan).valid := by native_decide thmOptimal : ¬ ({ optimalPlan with rB := optimalPlan.rB + 1 } : Plan).valid := by native_decide
thmOptimal : ¬ ({ optimalPlan with rA := optimalPlan.rA + 1 } : Plan).valid := by native_decide thmOptimal : ¬ ({ optimalPlan with rB := optimalPlan.rB + 1 } : Plan).valid := by native_decide
Fearless Optimization for Your Domain
While we started with taxes, any high-stakes domain with well-defined metrics and rules can benefit from formally verified fearless optimization.
Designing a mass-market product to meet emissions standards? Fearless optimization lets you get the best engine configuration without worrying about being out of compliance.
Building a power grid? Fearless optimization lets you build the optimal distribution system while providing sufficient power to everyone.
Planning a wedding? You can maximize compatibility while guaranteeing that your feuding cousins don’t end up next to each other.
If fearless optimization unblocks a critical application for you, we’d love to know.

TL;DR
Fear of getting tax filings wrong often prevents taxpayers from taking the optimal deductions for their situation. Taking one set of deductions and credits may exclude another set, presenting a complex optimization problem. Pramaana provides provable tax calculation and optimization verified in Lean4, enabling users to optimize with certainty.
Introduction
American taxpayers encounter a tangle of tax liabilities, deductions, and credits when filing taxes. Tax deductions are available for parents of children, adults over 65, owners of qualifying businesses and other tax-advantaged situations. Taking a set of credits may exclude another. For example, if an unmarried couple with a child files separately, only one can claim the child as a dependent. Claiming the child as a dependent by one parent may save more in taxes than splitting the allocation across two. Deductions and credits that exclude other options present an optimization challenge: stay compliant, but don’t leave any money on the table. These tradeoffs form a combinatorial space where a potentially exponential number of scenarios need to be considered to show that a particular tax plan is optimal.
Without formal guarantees, the fear of being out of compliance drives the process. Taking two deductions that conflict would put the taxfiler out of compliance with the tax code. An overcautious taxpayer might leave plenty of savings on the table. Instead, leveraging a formal model of tax law can help the taxpayer analyze their situation with more certainty. Given a tax plan that consists of incomes, deductions, and credits, formal models can check that the plan is compliant. Conversely, if any increase in a particular kind of income leads to the plan becoming non-compliant, then the plan is provably optimal with respect to that variable.
Our Approach
Pramaana has formalized the 1040 tax return form, and income reporting forms associated with it. We focus on form 1040 because it is the interface between the taxpayer and the U.S. federal tax code. The way form 1040 is filled operationalizes the income tax system. Our formalization is cell-for-cell precise, allowing professionals to check the model in the same way as a complete 1040 form. Optimization proofs proceed by proving mathematical properties on functions that represent cell values, then using that to show that a larger metric value will yield an invalid plan.
The lean encoding proceeds by encoding each cell of the 1040 form as a function. For each cell, the value is computed either by pulling data from the inputs or from previous cells. The following code represents cell 15, the total income. The function derives the total income value by aggregating cells 7 through 14 that represent various income sources like wages, tips, and so forth.
def cell_015_l1z_total_income (s : FormState) : Float := cell_007_l1a_wages_input s + cell_008_l1b_household_wages_input s + cell_009_l1c_tip_income_input s + cell_010_l1d_medicaid_waiver_input s + cell_011_l1e_dependent_care_benefits_input s + cell_012_l1f_adoption_benefits_input s + cell_013_l1g_form_8919_wages_input s + cell_014_l1h_other_earned_income_input s
The cell-by-cell encoding is repeated for all cells of 1040 and income reporting forms like W-2. In aggregate, the formalization consists of 265 cell formalizations, with the longest computation chains at 23 cells long.
Once the 1040 is encoded in Lean, we can define metrics and constraints in lean. These will be the definitions we use when we prove that a particular tax plan is optimal. Given a tax plan, the total federal income tax after deductions and credits is given in cell 97. A plan is valid if it can be passed through the cell-by-cell 1040 model with matching outputs and no contradictions. Our model captures that value as an abstract expression, allowing it to be computed for every possible scenario. For a tax plan with multiple taxpayers, we merge the total tax across all taxpayers to compute the cost metric that we aim to push down. Once the validity and metric is defined precisely and mathematically, a provably optimal solution is a valid solution whose metric is optimal across all valid solutions. Now fearless optimization can begin.
Tax Optimization Example
Consider the following scenario:
Alice and Bob are American, unmarried, and share two qualifying children. Apply 2025 federal law (post-OBBBA). Bob is 66, Alice 64. Bob has $60,000 W-2 wages, Alice $22,000. Each parent additionally has $5,000 taxable interest, $2,000 tax-exempt interest, $10,000 ordinary dividends (of which $8,000 are qualified), $12,000 long-term capital-gain distributions, and $18,000 Social Security. Each is also the sole shareholder of an S-corporation consulting practice (a specified service trade or business) that passes through $50,000 of qualified business income, pays no W-2 wages, and holds no qualified property; this pass-through is in AGI but bears no self-employment tax. Each itemizes (state tax $12,000, mortgage interest $20,000, cash charity $5,000, medical $4,000) and takes a $600 foreign tax credit on $8,000 of Canadian income. Alice has $1,000 additional itemized deductions. Each has a $400,000 Traditional IRA with zero basis.
They jointly choose how to allocate the two children and how much each converts from Traditional to Roth in 2025. A parent claiming at least one child files Head of Household, otherwise Single. Each child carries the 2025 Child Tax Credit of $2,200; the age-65 senior deduction is $6,000; the SALT cap is $40,000; NIIT (3.8%) is levied; and the §199A QBI deduction follows 2025 Form 8995-A rules. For every phase-out (CTC, senior deduction, NIIT, §199A) use modified AGI = AGI.
Maximize the combined conversion (rA + rB) subject to the average tax rate on the conversions being at most 25% — the increase in combined Form 1040 line 24 (total tax, including NIIT and every phase-out the conversions trigger) divided by (rA + rB). State the child allocation and each parent's exact conversion amount.
Let’s see how Claude Opus 4.8 and GPT-5.5 do.
Model | Combined | Avg rate | Result |
|---|---|---|---|
True optimum (machine-checked) | 187,149 | 24.9999% | feasible, axis-tight |
gpt-5.5-high | 195,490 | 25.60% | infeasible, +$8,341 |
claude-opus-4-8-max | 179,091 | 24.85% | suboptimal, −$8,058 |
Both frontier models miss by around $8k. But the reason for the error is different, as is the direction it misses in. First GPT fails to model a complex tax deduction phase-out with quadratic value behavior. Specified Service Trades or Businesses (SSTB) classification allows qualified business income (QBI) to have deductions up to 20%. The benefits phase out when the QBI increases beyond the threshold. On top of this, deductions have another phase-in based on the wage limit. As the two combine, the deductible amount reduces quadratically. GPT assumes it is linear, and thus underapproximates the taxes owed. As a consequence, GPT pushes a higher conversion amount that is out of compliance with the actual tax regime.
In contrast, Claude has a compliant but suboptimal tax plan. In its reasoning chain, Claude stops considering increases to Bob’s adjusted gross income at $200k: "Bob = $47,700 lands his AGI at exactly $200,000 [...] so the optimizer stops him there." While it knows that different kinds of taxes start applying here, noting Net Investment Income Tax and phase-out of the Child Tax Credit, it does not check if converting beyond it can still keep the average tax rate below 25%. By not exploring that possibility, Claude leaves money on the table. Pramaana’s formal reasoning stays compliant while taking the optimal path.
Why You Need Verification
GPT and Claude produced answers that were off by more than $8k while producing reasoning chains that are equally unverifiable. When called through the API, GPT produced almost no reasoning trace, only giving the final answer. Since there is no reasoning trace we cannot trace GPT’s thought process to check the result. In contrast, Claude produced over 20 pages of reasoning. This massive length makes verifying Claude’s step-by-step logic difficult and costly. Making matters worse, the CoT contains several backtracks where Claude became unsure of the output, then restarted a sub-procedure. For users who encounter only one bot, the only way to verify the output is to read the entire CoT and check step-by-step. In contrast, Pramaana's formal models and its proofs are checked automatically through the Lean4 Theorem Prover. Each step of the deduction is a formally correct lean deduction. Unlike natural language, the output of the previous step is formally guaranteed to be the input of the next step.
The Audit Defensible Proof: Not a Single Dollar More
Given a candidate solution, we can verify that (1) it’s a valid solution, and (2) it’s the optimal one. We will go over each of the definitions and the proof. A plan records how the children are allocated and how much Alice and Bob convert from their respective Roth accounts.
structure Plan where alloc : Allocation /-- Dependency allocation -/ rA : Nat /-- Alice's conversion amount -/ rB : Nat /-- Bob's conversion amount-/ deriving Repr
Given a plan, we can run it through Pramaana’s income tax model to derive how much taxes Alice and Bob owe. `stateX` is the tax situation where the respective partner converts rX. In contrast, `baseStateX` is the state with the respective spouse converting 0 dollars. We’ll need to keep track of all four states so we can compare against them.
def Plan.taxA (p : Plan) : Float := cell_097_l24_total_tax p.stateA def Plan.taxB (p : Plan) : Float := cell_097_l24_total_tax p.stateB def Plan.baseTaxA (p : Plan) : Float := cell_097_l24_total_tax p.baseStateA def Plan.baseTaxB (p : Plan) : Float := cell_097_l24_total_tax p.baseStateB
Taxes on the conversion is defined as the delta between the two states. A particular plan is valid if the conversion amount is below the original IRA balance ($400k per partner) and the tax on the conversion remains less than 25%.
def Plan.deltaTax (p : Plan) : Float := (p.taxA - p.baseTaxA) + (p.taxB - p.baseTaxB) def Plan.valid (p : Plan) : Prop := p.rA ≤ iraBalance ∧ p.rB ≤ iraBalance ∧ 4.0 * p.deltaTax ≤ Float.ofNat (p.rA + p.rB)
Finally, the objective to maximize the combined conversion is given as the sum of the two conversions.
def Plan.objective (p : Plan) : Nat := p.rA + p.rB
Given a particular plan we can test validity
def optimalPlan : Plan := { alloc := .bobBoth, rA := 121449, rB := 65700 } theorem feasible : optimalPlan.valid := by native_decide
In contrast, optimality is when adding one more dollar to either Alice’s conversion or Bob’s pushes the solution above the validity requirement.
thmOptimal : ¬ ({ optimalPlan with rA := optimalPlan.rA + 1 } : Plan).valid := by native_decide thmOptimal : ¬ ({ optimalPlan with rB := optimalPlan.rB + 1 } : Plan).valid := by native_decide
Fearless Optimization for Your Domain
While we started with taxes, any high-stakes domain with well-defined metrics and rules can benefit from formally verified fearless optimization.
Designing a mass-market product to meet emissions standards? Fearless optimization lets you get the best engine configuration without worrying about being out of compliance.
Building a power grid? Fearless optimization lets you build the optimal distribution system while providing sufficient power to everyone.
Planning a wedding? You can maximize compatibility while guaranteeing that your feuding cousins don’t end up next to each other.
If fearless optimization unblocks a critical application for you, we’d love to know.