Building Systems That Cannot Be Wrong

March 20, 2026

Most software tolerates mistakes. Financial systems do not.

In many domains, a bug is inconvenient. A request fails, a page reloads, a user retries. The system recovers and the impact is limited. In regulated financial systems, the same mistake can move money incorrectly, violate compliance rules, or create a reconciliation problem that takes days to unwind. The cost is not just technical. It is operational, regulatory, and sometimes legal.

Correctness is not a feature in these systems. It is the baseline.

Systems that move money do not get to be wrong.


The Illusion of Normal Operation

Many systems are designed around the assumption that things will work as expected. Requests succeed. Dependencies respond. Data is available when needed. Under those assumptions, the system appears stable.

That assumption does not hold in financial systems.

Files arrive late. External systems respond inconsistently. Data is incomplete. Messages are duplicated. Processes fail midway through execution. These are not edge cases. They are normal operating conditions.

A system that is only correct when everything works is not correct.

The architecture has to assume that every boundary can fail, that data can arrive out of order, and that operations can be partially completed. If those conditions are not modeled explicitly, the system will behave unpredictably under pressure. That is where most real failures occur.
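As one illustration of treating duplication as a normal condition rather than an edge case, a consumer can record the result of each message it has already applied and return that result on re-delivery instead of applying the effect twice. This is a minimal sketch; the message shape, the in-memory store, and the `apply_effect` callback are assumptions for illustration, and a real system would persist processed IDs in the same transaction as the side effect.

```python
class DuplicateSafeProcessor:
    """Illustrative sketch: duplicated delivery is expected, so re-delivery
    of an already-processed message returns the recorded result rather than
    producing a second side effect."""

    def __init__(self):
        # message_id -> recorded result; in practice this lives in durable
        # storage, committed atomically with the effect itself.
        self.processed = {}

    def handle(self, message_id, payload, apply_effect):
        if message_id in self.processed:
            # Normal operating condition, not an error: return the
            # previously recorded outcome.
            return self.processed[message_id]
        result = apply_effect(payload)
        self.processed[message_id] = result
        return result
```

The design choice is that idempotency is a property of the consumer, not a hope about the transport: the boundary assumes it will see the same message twice.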


Determinism Over Convenience

One of the most important properties in financial systems is determinism. Given the same input, the system must produce the same output every time. Not approximately the same. Exactly the same.

This requirement shows up everywhere. It affects how identifiers are generated, how state transitions are modeled, how retries are handled, and how data is persisted and retrieved. When determinism is missing, ambiguity is introduced. Ambiguity leads to manual reconciliation. Manual reconciliation introduces risk.

It is often easier to build systems that rely on randomness, implicit ordering, or loosely defined state. Those choices work until they do not. When a system needs to be replayed, audited, or corrected, those same choices become liabilities.
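One concrete form this takes is identifier generation: deriving an ID from the fields that define a transaction, rather than generating a random one. This is a hedged sketch under assumed inputs (`source`, `account`, `sequence` are illustrative field names, not a prescribed schema).

```python
import hashlib

def deterministic_id(source: str, account: str, sequence: int) -> str:
    """Derive an identifier from the content of the transaction.
    The same input always yields the same ID, so a replayed file or a
    retried request resolves to the same identifier instead of minting
    a new one and creating a reconciliation problem."""
    material = f"{source}|{account}|{sequence}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:32]
```

With a random UUID, a retry creates a second, unrelated record; with a content-derived ID, the retry collides with the original and the duplicate is detectable by construction.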

Determinism is not a constraint imposed by regulation. It is a requirement for operating a system that must be explainable after the fact.


State Must Be Explicit

In many applications, state is inferred. A record exists, a flag is set, a status is implied by the presence or absence of data. That approach breaks down quickly in financial workflows.

Every meaningful transition needs to be explicit. A transaction is not just processed. It moves through defined states such as received, validated, accepted, settled, or failed. Each transition has rules. Each rule has conditions. Each condition has to be observable.
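The states above can be made explicit as a transition table, so that an illegal move is rejected rather than silently recorded. The table below is a sketch using the state names from this section; the exact lifecycle of a real system would differ.

```python
# Explicit transition table: only the listed moves are legal.
ALLOWED = {
    "received":  {"validated", "failed"},
    "validated": {"accepted", "failed"},
    "accepted":  {"settled", "failed"},
    "settled":   set(),  # terminal
    "failed":    set(),  # terminal
}

def transition(current: str, target: str) -> str:
    """Apply a state transition, refusing anything the table does not allow.
    The rule is observable: a rejected transition names both states."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Because every transition goes through one function, the question "how did this transaction reach this state" always has an answer the table can verify.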

When state is implicit, failures become difficult to reason about. When state is explicit, the system becomes inspectable. You can answer not just what happened, but why.

This is the difference between systems that require investigation and systems that provide answers.


Boundaries Are Where Systems Fail

Most failures do not occur within a single component. They occur at the boundaries between them.

A service assumes data is complete. Another assumes it is validated. A third assumes it is unique. None of those assumptions are enforced consistently.

In regulated systems, boundaries must be defined and enforced clearly. Ownership of the data, responsibility for validation, accountability for correctness, and the guarantees that exist when data crosses a boundary all need to be explicit. When these are not defined, responsibility becomes diffuse and failures become difficult to correct.
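One way to make a boundary hard to violate is to validate at the point of entry and hand downstream code a type that can only exist if validation succeeded. The sketch below assumes a simple payment shape (`payment_id`, `amount_minor`, `currency` are illustrative fields); Python enforces this by convention rather than by the compiler, but the pattern still concentrates the guarantee in one place.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidatedPayment:
    """By convention, constructed only via parse_payment, so any code
    holding one knows validation happened at the boundary."""
    payment_id: str
    amount_minor: int  # integer minor units; never floats for money
    currency: str

def parse_payment(raw: dict) -> ValidatedPayment:
    # The boundary owns validation: incomplete or invalid data is
    # rejected here, not discovered three services later.
    if raw.get("amount_minor", 0) <= 0:
        raise ValueError("amount must be positive")
    if raw.get("currency") not in {"USD", "EUR", "GBP"}:
        raise ValueError("unsupported currency")
    return ValidatedPayment(raw["payment_id"], raw["amount_minor"], raw["currency"])
```

Ownership, responsibility, and the guarantee are all visible in one function: if `parse_payment` returned, the data crossed the boundary validated.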

Strong systems do not just define boundaries. They make those boundaries difficult to violate.


Reconciliation Is a First-Class Concern

In most software, reconciliation is an afterthought. In financial systems, it is part of the architecture.

You need to be able to answer what was supposed to happen, what actually happened, and where the two diverged. That requires traceability across the lifecycle, consistency in state transitions, and the ability to replay or reconstruct events when needed.
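The comparison described above can be sketched as a function over two record sets keyed by transaction ID: what was supposed to happen and what actually happened. The keying and the three divergence classes are an illustrative simplification of a real reconciliation report.

```python
def reconcile(expected: dict, actual: dict):
    """Compare intended outcomes against observed outcomes, keyed by
    transaction ID. Returns the three divergence classes a
    reconciliation report needs: missing, unexpected, mismatched."""
    missing = {k: v for k, v in expected.items() if k not in actual}
    unexpected = {k: v for k, v in actual.items() if k not in expected}
    mismatched = {k: (expected[k], actual[k])
                  for k in expected.keys() & actual.keys()
                  if expected[k] != actual[k]}
    return missing, unexpected, mismatched
```

This only works if both sides exist: the system has to record its intent with the same identifiers it uses to record results, which is why reconciliation is an architectural decision rather than a reporting task.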

If reconciliation is not designed into the system, it will be handled manually. That does not scale, and it does not hold up under audit.

Systems that cannot be reconciled cannot be trusted.


Simplicity Is Harder Here

There is a tendency to over-engineer financial systems because the domain is complex. Regulation, edge cases, and failure modes create pressure to add abstraction and flexibility early. That instinct often makes the system worse.

The goal is not to eliminate complexity. It is to place it where it belongs. Clear data models, explicit state machines, and well-defined boundaries reduce the need for defensive complexity elsewhere.

Simple systems in this domain are not simplistic. They are deliberate. They reflect a clear understanding of what must be correct and what can be flexible. Everything else is noise.


What This Requires

Building systems that cannot be wrong requires a different mindset.

You do not optimize for speed of implementation. You optimize for clarity of behavior. You do not assume success. You design for failure as a normal condition. You do not rely on implicit behavior. You make state, transitions, and boundaries explicit.

These are not theoretical principles. They are practical requirements for systems that operate under real constraints. The difference is visible over time. Systems built this way remain stable under pressure. Systems built without this discipline eventually require constant intervention.

That is the dividing line.

This reflects how I approach building systems at XRiley. If you are solving real problems and need clarity in how to design, scale, or stabilize your software, that is the work I do.