The Performance Dilemma
The 'too many allocations' rebuttal - DOD where it matters, clean architecture everywhere else.
The Challenge
Let’s be honest: “Clean Architecture” has a reputation problem in game development.
The objection goes like this: “All those interfaces and abstractions? That’s allocations. That’s indirection. That’s GC spikes. Maybe that works for enterprise CRUD apps, but games run at 60 fps. We can’t afford the overhead.”
This concern is not a myth. It’s not ignorance. It comes from experienced developers who have profiled real projects and seen real costs. Dismissing it - “performance doesn’t matter for game logic” - is the fastest way to lose credibility with the people who most need to hear the architectural argument.
So let’s take it seriously.
The Knuth Principle
Donald Knuth’s full quote is worth reading:
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative effect when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
The key is not “don’t optimize.” The key is don’t optimize the wrong thing. And in a typical game project, the vast majority of your code is not the critical 3%.
Consider what most game logic actually does:
- Inventory system: Player picks up an item. Validate capacity. Add to collection. Notify UI. This happens when the player presses a button - maybe a few times per minute.
- Quest system: Check objective completion. Update state machine. Award rewards. This triggers on specific game events - dozens of times per session at most.
- Progression system: Calculate XP. Check level thresholds. Unlock abilities. This runs on player actions.
- Matchmaking logic: Evaluate player criteria. Find compatible sessions. This runs once between matches.
- Economy: Process a purchase. Validate currency. Update balances. Once per transaction.
- Dialog system: Advance conversation state. Evaluate branching conditions. Check prerequisites. Player-paced.
None of these need nanosecond performance. They need correctness. They need to be testable, maintainable, and safe to modify under deadline pressure. Clean Architecture is the right default for all of them.
The question is: what about the other 3%?
Know Your Hot Paths
Hot paths are the loops that run thousands or millions of times per frame. They are identifiable through profiling - not guessing. Here are the usual suspects:
- Particle and projectile systems: Thousands of entities updated per frame, each with position, velocity, lifetime, collision checks.
- Pathfinding: A* or navmesh queries for hundreds of AI agents every frame or every few frames.
- Physics simulation: Collision detection and resolution for large numbers of bodies.
- Procedural generation: Noise functions, voxel meshing, terrain generation - often needing to complete within a frame budget.
- Real-time audio DSP: Sample-level processing at 44,100+ Hz with zero tolerance for latency.
- Spatial queries: “What’s near me?” for hundreds of entities - spatial hashing, BVH traversal, quadtree lookups.
Notice something about this list: it’s short. In a game with 50+ systems, maybe 3-5 are genuine hot paths. The rest - the overwhelming majority of your codebase - are correctness-critical, not performance-critical.
The critical discipline is: profile first, then optimize. The hot paths that actually matter are almost never the ones you’d guess. Developers routinely optimize code that runs once per user action while ignoring the real bottleneck three files away.
The Data-Oriented Answer
When profiling proves you have a hot path, the answer is not “abandon architecture.” The answer is Data-Oriented Design at the right layer.
The Bullet-Hell Example
Imagine a bullet-hell game. You have a projectile system that manages thousands of active bullets. Each bullet has a position, velocity, lifetime, damage value, and owner. Each frame, you update positions, check collisions, and expire dead bullets.
The naive object-oriented approach creates a Bullet class and instantiates a thousand of them. Each one flows through the Core use-case logic. Each one is a separate heap allocation. Each one causes a cache miss when you iterate through them. The garbage collector eventually has to clean up the dead ones.
This is, in fact, slow. The critics are right - for this specific case.
The Architecture-Preserving Solution
But the architectural answer is not “put all the bullet logic in a MonoBehaviour and hope for the best.” It’s this:
Core defines the contract:
```csharp
// Core layer - the rules and the interface
public interface IProjectileSystem
{
    void SpawnProjectile(Position origin, Direction velocity, float damage);
    void Tick(float deltaTime);
    EventSource<ITarget, float> OnTargetDamaged { get; }
}
```
Core owns the rules - what happens when a projectile hits, what damage means, how spawning is triggered by game logic. The interface defines the contract: you can spawn bullets, tick the system, and retrieve hit events.
Services own the execution strategy - when Core requests a new projectile, it is appended to an array of entity data; when a collision with a specific ITarget is detected, the damage dealt by all projectiles in the current frame is reported back to Core through the hit event.
Inside this implementation, there are no objects. There are arrays of data - positions, velocities, lifetimes - laid out contiguously in memory for cache-friendly iteration. The tight loop is a Burst-compiled Job that runs on multiple cores with SIMD instructions. Zero allocations per frame. Zero GC pressure.
The architecture boundary is preserved. Core defines what a projectile system does. Services defines how it does it. The rest of the game - the weapon that spawns projectiles, the health system that processes hit events, the UI that shows bullet counts - interacts only with the IProjectileSystem interface. They don’t know or care whether the implementation uses object-per-bullet, struct-of-arrays, ECS entities, or GPU compute shaders.
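Sketched in plain C#, a struct-of-arrays implementation might look like the following. The Position and Direction types, the fixed 3-second lifetime, and the ActiveCount property are illustrative assumptions, not code from the text; a production Unity version would use NativeArray&lt;T&gt; and a Burst-compiled job, and would raise its hit event from the collision pass (omitted here for brevity):

```csharp
// Services layer sketch: struct-of-arrays, zero allocations per frame.
public struct Position { public float X, Y; }
public struct Direction { public float X, Y; }

public sealed class SoAProjectileSystem
{
    const int Capacity = 4096;

    // Each field lives in its own contiguous array, so the update loop
    // streams through cache lines instead of chasing object pointers.
    readonly float[] posX = new float[Capacity];
    readonly float[] posY = new float[Capacity];
    readonly float[] velX = new float[Capacity];
    readonly float[] velY = new float[Capacity];
    readonly float[] life = new float[Capacity];
    readonly float[] damage = new float[Capacity];
    int count;

    public int ActiveCount => count;

    public void SpawnProjectile(Position origin, Direction velocity, float damageValue)
    {
        if (count == Capacity) return;             // or grow / recycle oldest
        posX[count] = origin.X;   posY[count] = origin.Y;
        velX[count] = velocity.X; velY[count] = velocity.Y;
        life[count] = 3f;                          // assumed fixed lifetime
        damage[count] = damageValue;
        count++;
    }

    public void Tick(float deltaTime)
    {
        // Tight loop over raw data: no virtual calls, no allocations.
        for (int i = 0; i < count; i++)
        {
            posX[i] += velX[i] * deltaTime;
            posY[i] += velY[i] * deltaTime;
            life[i] -= deltaTime;
        }

        // Swap-remove expired bullets; ordering doesn't matter for bullets.
        for (int i = count - 1; i >= 0; i--)
        {
            if (life[i] > 0f) continue;
            count--;
            posX[i] = posX[count]; posY[i] = posY[count];
            velX[i] = velX[count]; velY[i] = velY[count];
            life[i] = life[count]; damage[i] = damage[count];
        }
    }
}
```

The rest of the game never sees these arrays - only the interface.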
The Pattern Generalizes
This isn’t specific to projectiles. The same pattern works for any hot path:
| Hot Path | Core Defines | Services Implements |
|---|---|---|
| Projectiles | Spawn rules, hit processing, lifetime | Struct-of-arrays, Burst jobs |
| Pathfinding | “Agent X needs a path to Y” | A*, JPS, flowfields - with Job System |
| Spatial queries | “What entities are near position P?” | Spatial hash, BVH, octree |
| Proc-gen | Biome rules, generation parameters | Noise functions, chunk meshing, GPU compute |
| Crowd simulation | Behavior rules, group logic | DOTS entities, parallel steering |
In every case: one system manages data, not many objects managed individually. Core defines the contract. Services executes with whatever engine-specific optimization is needed.
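As a concrete sketch of the spatial-query row: Core could define a contract that takes a caller-supplied results buffer (so the query itself allocates nothing), and Services could start with a brute-force scan until profiling demands a spatial hash. The interface and class names here are illustrative, not from the text:

```csharp
// Core defines the question; callers never know how it's answered.
public interface ISpatialQuery
{
    // Caller supplies the buffer, so the query allocates nothing per call.
    // Returns the number of entity indices written into results.
    int QueryNearby(float x, float y, float radius, int[] results);
}

// Week-1 Services implementation: O(n) scan. Swappable for a spatial hash
// or BVH later without touching any caller.
public sealed class BruteForceSpatialQuery : ISpatialQuery
{
    readonly float[] xs, ys;   // entity positions, index = entity id

    public BruteForceSpatialQuery(float[] xs, float[] ys)
    {
        this.xs = xs; this.ys = ys;
    }

    public int QueryNearby(float x, float y, float radius, int[] results)
    {
        int n = 0;
        float r2 = radius * radius;   // compare squared distances, no sqrt
        for (int i = 0; i < xs.Length && n < results.Length; i++)
        {
            float dx = xs[i] - x, dy = ys[i] - y;
            if (dx * dx + dy * dy <= r2) results[n++] = i;
        }
        return n;
    }
}
```

Because both the brute-force version and the eventual spatial hash implement the same interface, the swap is invisible to every caller.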
The Architecture Enables Optimization
Here’s the counterintuitive insight: clean architecture actually makes performance optimization easier, not harder.
You can swap implementations without touching game logic. Because the hot path is behind an interface, you can start with a simple implementation and optimize later:
- Week 1: Naive `List<Bullet>` implementation. It works. Ship the prototype.
- Week 8: Profiling shows projectiles are the bottleneck. Swap to struct-of-arrays. No other code changes.
- Week 15: Need more throughput. Swap to Burst-compiled Jobs. Still no other code changes.
- Week 20: Targeting mobile. Swap to a GPU compute implementation. Still no other code changes.
Each optimization is isolated behind the interface. The weapon system, the damage system, the UI - none of them are touched. None of them need to be retested. The optimization has zero blast radius.
Compare this to the “no architecture” approach: the bullet logic is spread across three MonoBehaviours, a ScriptableObject, and a static utility class. Optimizing means rewriting all five, updating every reference, and retesting every system that touches bullets. The optimization becomes a project-wide refactor, and the team avoids it until the performance is truly unacceptable.
You can benchmark and A/B test implementations. Need to know if your new spatial hash is actually faster than the old quadtree? Both implement the same interface. Swap one for the other, profile, compare. The rest of the game doesn’t even know the experiment is happening.
You can specialize per platform. One IProjectileSystem for PC (Burst + Job System), another for mobile (simplified, fewer max projectiles), another for the server (headless, no rendering). The Core is identical across all platforms. Only the Service implementation varies.
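In code, this specialization can live entirely in the composition root. A sketch, where the implementation class names are hypothetical and the `UNITY_*` symbols are Unity’s standard platform defines:

```csharp
// Composition-root sketch: the one place that knows which implementation
// runs on which platform. Nothing else in the game changes.
public static class ProjectileSystemFactory
{
    public static IProjectileSystem Create()
    {
#if UNITY_SERVER
        return new HeadlessProjectileSystem();    // no rendering, no VFX
#elif UNITY_ANDROID || UNITY_IOS
        return new MobileProjectileSystem();      // lower projectile cap
#else
        return new BurstJobProjectileSystem();    // PC: Jobs + Burst + SIMD
#endif
    }
}
```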
What Clean Architecture Actually Costs
Let’s address the real runtime costs head-on, because vague reassurances don’t convince anyone who has profiled a C# game:
DI Container Resolution
Cost: One-time at startup or scene load. Not per-frame.
The DI container resolves dependencies when objects are created - typically during initialization. Once the IProjectileSystem reference is injected into the weapon logic, it’s a direct reference for the rest of the session. There is no per-frame lookup. The runtime cost is functionally zero during gameplay.
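A sketch of what “resolved once, direct reference thereafter” means in practice. The interface here is trimmed to primitives so the example is self-contained, and `WeaponLogic` is a hypothetical consumer:

```csharp
// Trimmed, self-contained stand-in for the projectile contract.
public interface IProjectileSpawner
{
    void Spawn(float x, float y, float vx, float vy, float damage);
}

public sealed class WeaponLogic
{
    // The container injects this once, during initialization.
    readonly IProjectileSpawner projectiles;

    public WeaponLogic(IProjectileSpawner projectiles) =>
        this.projectiles = projectiles;

    // Gameplay-time call: an ordinary field access plus one virtual call.
    // The container is long gone by the time this runs.
    public void Fire(float x, float y, float dx, float dy) =>
        projectiles.Spawn(x, y, dx, dy, 10f);
}
```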
Interface Indirection
Cost: One virtual call per invocation. Effectively zero.
A virtual method call (interface dispatch) costs a few nanoseconds. The JIT compiler often inlines these in hot paths after warmup. For game-logic-frequency calls - “spawn a projectile,” “process a quest update” - the cost is unmeasurable against the cost of the actual work being done.
The important distinction: you do NOT call through an interface per particle or per bullet. You call the interface once per frame (Tick(deltaTime)), and inside that call, the tight loop runs on raw data with no abstraction overhead.
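A toy illustration of that amortization, where `CountingSystem` stands in for a real hot-path Service:

```csharp
public interface ITickable { void Tick(float deltaTime); }

public sealed class CountingSystem : ITickable
{
    public int InterfaceDispatches;                  // boundary crossings
    readonly float[] lifetimes = new float[10_000];  // 10,000 "bullets"

    public void Tick(float deltaTime)
    {
        InterfaceDispatches++;          // the one virtual call this frame...
        for (int i = 0; i < lifetimes.Length; i++)
            lifetimes[i] -= deltaTime;  // ...then 10,000 updates, zero dispatch
    }
}
```

One dispatch per frame for ten thousand bullets: the cost of the abstraction is amortized to nothing.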
Event Systems
Cost: Negligible for game-logic-frequency events.
An event that fires when the player picks up an item, or when a quest state changes, or when a level loads - these are infrequent enough that the dispatch overhead is irrelevant. The concern would be valid if you fired events per-particle per-frame. Don’t do that. The event system is for game logic; the tight loop inside a hot-path Service has its own internal communication.
Object Allocations and GC
Cost: Legitimate concern - but only in per-frame hot paths.
This is the one real cost that deserves respect. In C# (and especially in Unity’s older Mono runtime), heap allocations in per-frame code cause GC pressure. But this applies to per-frame hot paths, not to business logic that runs on user actions.
The projectile system’s Tick() method should allocate nothing - it operates on pre-allocated native arrays. The inventory system that runs when the player opens a menu can allocate freely; no one notices a microsecond of GC during a UI interaction.
The rule: allocations matter inside tight loops. They don’t matter in code that runs at “human speed” - player actions, game events, state transitions.
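The rule side by side, with hypothetical class names. The hot path reuses one pre-allocated list (`List<T>.Clear()` keeps the backing array, so steady-state per-frame allocations are zero); the menu-speed path allocates freely:

```csharp
using System.Collections.Generic;

// Per-frame hot path: one list, allocated once, cleared and refilled.
public sealed class HitCollector
{
    readonly List<int> frameHits = new List<int>(capacity: 256);

    public List<int> BeginFrame()
    {
        frameHits.Clear();   // reuse - Clear keeps the backing array
        return frameHits;
    }
}

// Menu-speed path: runs when the player opens a screen. Allocating a fresh
// list here is fine - a microsecond of GC during a UI interaction.
public sealed class InventoryScreen
{
    public List<string> BuildItemLabels(IEnumerable<string> itemNames)
    {
        var labels = new List<string>();
        foreach (var name in itemNames)
            labels.Add(name.ToUpperInvariant());
        return labels;
    }
}
```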
The Honest Summary
| Concern | Real Cost | Where It Matters |
|---|---|---|
| DI resolution | One-time startup | Nowhere during gameplay |
| Virtual dispatch | ~nanoseconds per call | Not in tight inner loops |
| Event dispatch | Negligible | Not per-particle, not per-frame |
| Heap allocations | GC pressure | Only inside per-frame hot-path loops |
The architecture adds no meaningful cost to 97% of your code. For the 3% where it would matter, you push the tight loop behind an interface and optimize inside the implementation, where the architecture’s abstractions don’t exist.
Profiling-Driven Development
The practical workflow for performance in a clean architecture:
1. Build it clean. Use interfaces, events, proper layering. Don’t optimize. The code is readable, testable, and maintainable. Most of it will never need optimization.
2. Profile. Use your engine’s profiler (Unity Profiler, Unreal Insights, or platform tools). Find the actual bottlenecks. They are almost never where you expect.
3. Isolate the hot path. If the bottleneck is inside a system that’s already behind an interface - great, replace the implementation. If it’s not, refactor it behind an interface first. This refactor is a one-time cost that pays for itself immediately.
4. Optimize the implementation. Inside the Service, use whatever technique is needed: struct-of-arrays, SIMD, Job System, Burst, GPU compute, object pooling, native containers. No architectural constraints apply inside the implementation - only at the boundary.
5. Verify. Profile again. The rest of the codebase is untouched, so you know exactly what changed and can measure the improvement precisely.
This is the opposite of premature optimization. You spend zero effort optimizing code that doesn’t need it. When optimization is needed, the architecture makes it cheaper - not more expensive - because the hot path is isolated and replaceable.
The Rule
Clean Architecture is the default. It provides correctness, testability, and maintainability for the 97% of your game that doesn’t need nanosecond performance.
When profiling proves a hot path exists - and only then - push the tight loop into a Service implementation using Data-Oriented Design. Core still owns the contract: what the system does, what events it produces, what rules govern its behavior. Services own the execution strategy: how the data is laid out, how the loop is parallelized, what engine-specific optimizations are applied.
The architecture boundary is the optimization boundary. Inside the boundary, you have complete freedom to use whatever technique delivers the performance you need. Outside the boundary, the rest of your game is insulated from optimization churn and remains clean, testable, and maintainable.
This is not a compromise. It’s the best of both worlds - and the only approach that scales to large, long-lived game projects where both correctness and performance are non-negotiable.