Spatial interfaces vs layouts

2026 · 05 · 2,300 words

Seven years building spatial interfaces, and I can still predict the first question in any demo. Someone drags an object, notices that it stays exactly where they dropped it, and asks the same thing every time: "Won't this just become a mess?"

They're right to worry. A spatial interface is a freeform canvas where you control the exact position of every object instead of flowing content through a layout algorithm, and that one difference has consequences that run the whole way down the stack. Freeform space does become a mess, and I've watched it happen, not in demos but in daily use. Give someone an infinite canvas and enough objects and they will build a junk drawer. The interesting part is what you do about it, because the fix isn't to go back to grids. It's to think harder about where structure actually comes from.

The case for structure

Layout engines exist because structure works. CSS flexbox, SwiftUI stacks, Figma auto-layout: they all solve the same problem, which is taking N things of varying size and finding positions that produce something readable.

A flexbox container distributes space, wraps when the viewport shrinks, and keeps text legible at any width. You describe the relationships (this comes after that, these share a row, this one stretches to fill) and the engine computes the coordinates. You never think about position, and that is exactly the point.
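To make "the engine computes the coordinates" concrete, here is a toy sketch of a wrapping row. It is not real flexbox, and `layoutWrappingRow`, `Size`, and `Placed` are names I made up for illustration. The thing to notice is that position only ever appears as an output.

```typescript
// Hypothetical toy model of a wrapping row; real flexbox does far more.
interface Size { w: number; h: number }
interface Placed extends Size { x: number; y: number }

// Place items left to right, wrapping to a new row when the next item
// would overflow the container. Positions are outputs, never inputs.
function layoutWrappingRow(items: Size[], containerWidth: number, gap: number): Placed[] {
  const placed: Placed[] = [];
  let x = 0;
  let y = 0;
  let rowHeight = 0;
  for (const item of items) {
    // Wrap only if the row already holds something, so an oversized
    // item still gets a row to itself.
    if (x > 0 && x + item.w > containerWidth) {
      x = 0;
      y += rowHeight + gap;
      rowHeight = 0;
    }
    placed.push({ w: item.w, h: item.h, x, y });
    x += item.w + gap;
    rowHeight = Math.max(rowHeight, item.h);
  }
  return placed;
}

const cards: Size[] = [
  { w: 200, h: 100 },
  { w: 150, h: 140 },
  { w: 180, h: 90 },
];
const wide = layoutWrappingRow(cards, 600, 16);   // all three share one row
const narrow = layoutWrappingRow(cards, 400, 16); // the third card wraps
```

Shrink the container from 600 to 400 and the third card drops to a second row without anyone touching its coordinates.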

For blogs, settings panels, documentation, anywhere the content is mostly sequential, layout engines are precisely right. They absorb the spatial decisions so you don't have to. A well-configured grid will produce a readable layout for content it has never seen, at screen sizes nobody tested. I don't think people in the spatial computing world give that enough credit.

Where layout stops being enough

Layout engines assume they already know where things should go. That assumption holds right up until you have spatial intent: until you need something here, and not wherever the algorithm would have put it.

You want a diagram next to the paragraph it illustrates. Not below it, not in a sidebar, but adjacent, so your eyes move between the text and the image without scrolling. You want three clusters of related notes arranged so the connections between the clusters are visible. You want to annotate a sketch by putting your comment physically next to the line you're responding to.

Layout can't express "here." It expresses "after the previous sibling, inside this container, subject to these constraints." Position is always derived: a consequence of the rules rather than a choice you get to make.

CSS does have position: absolute, but the moment you reach for it you've taken that element out of normal flow. Its siblings lay themselves out as if it weren't there: no reflow around it, no responsiveness, no automatic spacing. The engine's answer is blunt: you want to control position? Then you're on your own.

This keeps coming up across the whole stack. Layout systems handle position-as-consequence beautifully. Position-as-intent, the idea that where something sits means something and should belong to the person rather than the algorithm, simply isn't in the model.

Figma arrived at the same place from the other direction. It started as a spatial canvas, infinite and freeform, then added auto-layout, because when you're designing lists and buttons you genuinely do want the algorithm to handle the spacing and the flow. The fact that one tool needed both paradigms tells you that neither one covers the full range of what people actually do.

Structure that comes from context

At Kosmik we build a spatial canvas. You can place anything anywhere. But freeform on its own is just a whiteboard, and whiteboards become unreadable somewhere past thirty objects.

So we don't add a grid. We add contextual structure instead. Drag an object across the canvas and the system looks at its neighbors: their bounding boxes, their edges, the gaps between them. It computes alignment candidates in real time. Your left edge could match a neighbor's right edge. Your top could snap to a cluster's top line. Your center could line up with the group you're joining.

[Interactive demo: edge alignment. Drag any block; guide lines show when edges are close, regardless of the gap.]

The suggestions come from what is already on the canvas, not from a predetermined grid designed for different content. If your neighbors are scattered at odd intervals, the snapping adapts to them. If they form a neat row, the snap points extend that row.
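A minimal sketch of neighbor-derived alignment, under assumptions of my own: edges only match edges, centers only match centers, and a 6px threshold. The names (`edgeGuides`, `linesOf`, `Guide`) are hypothetical and this is not Kosmik's production code. It only shows that the snap targets come from the neighbors rather than from a fixed grid.

```typescript
interface Rect { x: number; y: number; w: number; h: number }
type Axis = "x" | "y";
interface Line { at: number; kind: "edge" | "center" }
interface Guide { axis: Axis; at: number; delta: number }

// The three alignment lines of a rect along one axis: start, center, end.
function linesOf(r: Rect, axis: Axis): Line[] {
  const start = axis === "x" ? r.x : r.y;
  const size = axis === "x" ? r.w : r.h;
  return [
    { at: start, kind: "edge" },
    { at: start + size / 2, kind: "center" },
    { at: start + size, kind: "edge" },
  ];
}

// Per axis, find the smallest shift (within `threshold` px, an assumed
// value) that lines the dragged rect up with some neighbor's line.
function edgeGuides(dragged: Rect, neighbors: Rect[], threshold = 6): Guide[] {
  const guides: Guide[] = [];
  const axes: Axis[] = ["x", "y"];
  for (const axis of axes) {
    let best: Guide | null = null;
    for (const neighbor of neighbors) {
      for (const target of linesOf(neighbor, axis)) {
        for (const own of linesOf(dragged, axis)) {
          if (own.kind !== target.kind) continue;
          const delta = target.at - own.at;
          const closer = best === null || Math.abs(delta) < Math.abs(best.delta);
          if (Math.abs(delta) <= threshold && closer) {
            best = { axis, at: target.at, delta };
          }
        }
      }
    }
    if (best !== null) guides.push(best);
  }
  return guides;
}

const neighbor: Rect = { x: 100, y: 50, w: 100, h: 80 };
const dragged: Rect = { x: 203, y: 48, w: 100, h: 60 };
const guides = edgeGuides(dragged, [neighbor]);

// Applying each guide's delta gives the snapped position.
const snapped = { x: dragged.x, y: dragged.y };
for (const g of guides) {
  if (g.axis === "x") snapped.x += g.delta;
  else snapped.y += g.delta;
}
```

Here the dragged block's left edge is 3px from the neighbor's right edge and its top is 2px from the neighbor's top, so both axes produce a guide.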

The first version of this was a traditional snap-to-grid: fixed intervals, uniform spacing. It worked beautifully for the first five objects. By the twentieth, everything felt like a spreadsheet, tidy but lifeless, fighting you the moment the content didn't fit the grid's assumptions.

Neighbor-aware snapping changed how people used the canvas. They started building clusters with internal structure, aligned within a cluster and freely placed between clusters. Structure where it helped them, looseness where it didn't.

The distinction

In a layout system, the algorithm decides where things go and you configure parameters. In a spatial interface, you decide where things go and the system assists. Same structure, different authority.

What alignment misses

The demo above shows edge alignment: your left edge matches a neighbor's right edge, your top matches their top line. This is what every design tool with snapping does, and has done for decades. Sketch, Miro, PowerPoint, Figma's smart guides, all the same idea.

But edge alignment only solves half the problem. It tells you where edges line up, and it knows nothing about the space between objects. Worse, the guide lines can feel arbitrary. A line appears because your left edge happens to match the right edge of some object three hundred pixels away, an object you weren't even thinking about. You're trying to align with the card next to you, and the system is showing you a relationship with something on the far side of the canvas: technically correct, practically useless.

Drag three different-sized cards next to each other using only edge alignment and you get neat edges with inconsistent gaps. The first gap is 12 pixels, the second is 23. Everything lines up, and nothing breathes evenly, because the snapping never once considered size.

[Interactive demo: size-aware placement. Drag any block; the system computes positions that maintain a consistent 16px gutter.]

What we build at Kosmik works the other way around. Instead of asking "which edges are close?", the system asks "where should this object actually go?" The answer uses both objects' sizes together with a consistent gutter, 16 pixels in our case. It doesn't align edges and hope the rest works out. It computes full placement positions, the spots where the object would create even spacing with its neighbors whatever their dimensions happen to be.
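One way to sketch the idea, with heavy caveats: the function names, the four-candidates-per-neighbor scheme, the top/left cross-axis alignment, and the 24px capture radius are all my assumptions for illustration, and the sketch ignores overlap with other objects. Only the 16px gutter comes from the text above.

```typescript
interface Rect { x: number; y: number; w: number; h: number }
interface Point { x: number; y: number }

const GUTTER = 16; // the gutter mentioned in the text

// Candidate top-left positions that leave exactly `gutter` between the
// dragged rect and the neighbor. Left and above depend on the dragged
// rect's own size, which is what edge alignment never looks at.
function placementsAround(neighbor: Rect, dragged: Rect, gutter: number): Point[] {
  return [
    { x: neighbor.x + neighbor.w + gutter, y: neighbor.y }, // right of neighbor
    { x: neighbor.x - gutter - dragged.w, y: neighbor.y },  // left of neighbor
    { x: neighbor.x, y: neighbor.y + neighbor.h + gutter }, // below neighbor
    { x: neighbor.x, y: neighbor.y - gutter - dragged.h },  // above neighbor
  ];
}

// Pick the candidate closest to where the object currently is, or null
// if nothing lies within `maxDistance` (an assumed capture radius).
function nearestPlacement(
  dragged: Rect,
  neighbors: Rect[],
  maxDistance = 24,
  gutter = GUTTER,
): Point | null {
  let best: Point | null = null;
  let bestDistance = maxDistance;
  for (const neighbor of neighbors) {
    for (const spot of placementsAround(neighbor, dragged, gutter)) {
      const dx = spot.x - dragged.x;
      const dy = spot.y - dragged.y;
      const distance = Math.sqrt(dx * dx + dy * dy);
      if (distance <= bestDistance) {
        best = spot;
        bestDistance = distance;
      }
    }
  }
  return best;
}

const card: Rect = { x: 0, y: 0, w: 120, h: 80 };
const toRight = nearestPlacement({ x: 140, y: 5, w: 200, h: 60 }, [card]);
const toLeft = nearestPlacement({ x: -80, y: 3, w: 60, h: 60 }, [card]);
const tooFar = nearestPlacement({ x: 500, y: 500, w: 60, h: 60 }, [card]);
```

A 200px-wide block and a 60px-wide block both end up exactly 16px from the card, on whichever side they were dropped near. A block dropped far from everything gets no suggestion at all.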

The difference looks subtle in a screenshot and becomes obvious the instant you use it. Edge alignment makes things look organized. Size-aware placement makes them actually be organized.

What's actually different under the hood

The architecture gap is bigger than it looks from the outside.

A layout engine stores its content in a tree. Each node carries style properties (width, padding, flex-grow) but no intrinsic position of its own. Position gets computed top-down: the root establishes the viewport, each container measures its children, resolves the constraints, and assigns coordinates. Change one node's size and the engine reflows its siblings, its ancestors, and its descendants. A single width change can cascade into hundreds of position recalculations.
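A stripped-down sketch of that cascade, assuming a tree of vertical stacks and nothing else. `LayoutNode`, `layout`, and `makeTree` are hypothetical names; real engines also handle width, wrapping, and constraint resolution.

```typescript
interface LayoutNode {
  id: string;
  height?: number;       // leaves only
  padding?: number;      // containers only
  children?: LayoutNode[];
}
interface Box { y: number; height: number }

// Lay out `node` with its top at `y`, record every box in `out`, and
// return the node's total height. Parents hand positions down; sizes
// bubble back up.
function layout(node: LayoutNode, y: number, out: Record<string, Box>): number {
  const pad = node.padding ?? 0;
  let height: number;
  if (node.children && node.children.length > 0) {
    let cursor = y + pad;
    for (const child of node.children) {
      cursor += layout(child, cursor, out);
    }
    height = cursor - y + pad;
  } else {
    height = node.height ?? 0;
  }
  out[node.id] = { y, height };
  return height;
}

function makeTree(bHeight: number): LayoutNode {
  return {
    id: "root",
    padding: 8,
    children: [
      { id: "a", height: 40 },
      { id: "group", children: [{ id: "b", height: bHeight }, { id: "c", height: 20 }] },
      { id: "d", height: 10 },
    ],
  };
}

const before: Record<string, Box> = {};
layout(makeTree(30), 0, before);
const after: Record<string, Box> = {};
layout(makeTree(50), 0, after);

// Ids whose position changed although nobody edited them.
const moved = Object.keys(before).filter((id) => before[id].y !== after[id].y);
```

Growing `b` by 20px moves its sibling `c` and its parent's sibling `d`, and stretches both `group` and `root`. None of those nodes was touched directly.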

A spatial engine stores its objects in a flat or shallow scene graph, and each object owns its position as state: { x: 412, y: 88, w: 200, h: 140 }. There is no reflow. Move an object and only that object's coordinates change. The rendering pipeline applies a camera transform (the pan offset and the zoom level) and draws each object where it sits. Move the camera and everything shifts together; move an object and only it moves. (How that per-object state is actually modelled, once the same object needs to live in more than one place at once, is its own design problem.)
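The camera half of that can be sketched in a few lines. I'm assuming the convention screen = world × zoom + pan, with the CSS transform applied to a single container whose transform-origin is 0 0. The function names are mine, not from any particular tool.

```typescript
interface Point { x: number; y: number }
interface Camera { panX: number; panY: number; zoom: number }

// Assumed convention: screen = world * zoom + pan.
function worldToScreen(p: Point, cam: Camera): Point {
  return { x: p.x * cam.zoom + cam.panX, y: p.y * cam.zoom + cam.panY };
}

function screenToWorld(p: Point, cam: Camera): Point {
  return { x: (p.x - cam.panX) / cam.zoom, y: (p.y - cam.panY) / cam.zoom };
}

// Zoom by `factor` while keeping the world point under `anchor`
// (a screen position, e.g. the cursor) fixed on screen.
function zoomAt(cam: Camera, anchor: Point, factor: number): Camera {
  const world = screenToWorld(anchor, cam);
  const zoom = cam.zoom * factor;
  return { zoom, panX: anchor.x - world.x * zoom, panY: anchor.y - world.y * zoom };
}

// The same camera as a CSS transform for one container element,
// assuming `transform-origin: 0 0` on that element.
function cameraToCss(cam: Camera): string {
  return `translate(${cam.panX}px, ${cam.panY}px) scale(${cam.zoom})`;
}

const card: Point = { x: 412, y: 88 };              // stored state, never recomputed
const cam: Camera = { panX: 100, panY: 50, zoom: 2 };
const onScreen = worldToScreen(card, cam);
const zoomedOut = zoomAt(cam, onScreen, 0.5);       // zoom out around the card
const stillThere = worldToScreen(card, zoomedOut);
```

The card's stored coordinates never change. Only the camera does, which is why a pan or zoom costs one transform update instead of a reflow.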

	Layout engine	Spatial engine
Position source	Computed from constraints	Stored per object
Move an element	Reflow cascade	Single coordinate update
Viewport resize	Re-layout everything	Camera adjustment
Data structure	Tree (parent → child)	Flat scene graph
Reading order	Implicit in tree order	Must be derived from 2D positions

This is why spatial tools usually can't live inside the DOM. The DOM is a layout tree: every element's position depends on its ancestors and its siblings. To build a spatial interface on top of it you either fight the layout engine at every turn (position: absolute on everything, then reimplementing hit-testing and scroll and focus management yourself) or you leave the DOM behind altogether and render onto a <canvas> or a WebGL surface.

Figma and Miro chose canvas. Kosmik tried that too, then settled back onto the DOM with enough tweaks to make it work: minimizing JavaScript, leaning hard on CSS transforms for the camera and the positioning. That's a story for another time. The point here is that neither path is free.

What you give up

Now for the bill.

You give up automatic responsiveness. Resize the viewport and the objects don't rearrange themselves; they stay at their stored coordinates while the camera adjusts around them. This preserves spatial meaning, so the relationships don't scramble when someone resizes their browser, but it also means spatial interfaces struggle on small screens. On a phone, Kosmik simply shows you less of the canvas. What you do see is in the right place, but a canvas that reads beautifully on a 27-inch monitor can become a navigation puzzle, and there is no clean way around that.

You give up any built-in reading order. A layout tree has an implicit sequential order, which is just DOM order: screen readers walk it, keyboard navigation follows it. A spatial scene graph has none of that, so you have to derive one yourself. Left-to-right then top-to-bottom? Within each cluster first, then between clusters? What about diagonal arrangements? Every derivation is a heuristic, and every heuristic breaks for some canvas.
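Here is one such heuristic as a sketch: group objects into rough rows by vertical overlap, then read each row left to right. The banding rule (an object joins a row if its vertical center sits above the row's current bottom) is my assumption, and it is easy to construct canvases that defeat it, such as a diagonal staircase of cards.

```typescript
interface Item { id: string; x: number; y: number; w: number; h: number }

// Derive a linear order from 2D positions: rows top to bottom, and
// left to right within each row.
function readingOrder(items: Item[]): string[] {
  const sorted = items.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const rows: Item[][] = [];
  let rowBottom = -Infinity;
  for (const item of sorted) {
    const centerY = item.y + item.h / 2;
    if (rows.length > 0 && centerY < rowBottom) {
      // Vertical center falls inside the current band: same row.
      rows[rows.length - 1].push(item);
      rowBottom = Math.max(rowBottom, item.y + item.h);
    } else {
      rows.push([item]);
      rowBottom = item.y + item.h;
    }
  }
  const order: string[] = [];
  for (const row of rows) {
    row.sort((a, b) => a.x - b.x);
    for (const item of row) order.push(item.id);
  }
  return order;
}

const order = readingOrder([
  { id: "D", x: 120, y: 90, w: 100, h: 50 },
  { id: "A", x: 0, y: 0, w: 100, h: 50 },
  { id: "C", x: 0, y: 100, w: 100, h: 50 },
  { id: "B", x: 120, y: 10, w: 100, h: 50 },
]);
```

Slightly misaligned cards still land in the same row, which is the forgiving part. The unforgiving part is that the band boundaries are arbitrary, and a cluster-first order would need a different heuristic entirely.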

And the tidiness burden falls on the user. Layout engines are authoritarian but tidy. Spatial engines hand you freedom and then trust you to keep things coherent. The snapping helps, and the alignment guides help, but nothing rescues a canvas that someone scattered things across in a hurry.

These are not small tradeoffs. Accessibility alone is reason enough for many applications to never go spatial at all. If your content has a natural reading order, a layout system will serve those users better than a canvas that depends on heuristic navigation.

How much structure before it stops being spatial

This is the design question I think about most.

Add enough snap strength and you have built a grid. Add spacing normalization and you have built flexbox. Add a "tidy up" button that rearranges everything into columns and you have built a layout engine, only with extra steps and less tooling.

The line I keep coming back to is that the user can always override. Snap to an alignment and then drag straight past it, and the snap releases. Offset something from the group on purpose, and the system won't quietly "correct" it the next time you move a neighbor. Once the interface starts knowing better than you do, it has become a layout engine again; once it stops helping you entirely, it's back to being a whiteboard. The useful space sits between those two, and it's narrower than you would think.
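The "drag past it and the snap releases" behavior mostly falls out of one rule: resolve the snap from the raw pointer-driven position on every frame, never from the previously snapped one. A sketch, assuming an 8px threshold and a hypothetical `bypass` flag (a modifier key, say) for opting out entirely:

```typescript
interface SnapResult { value: number; snappedTo: number | null }

// Resolve one axis. `raw` is where the pointer alone would put the
// object; `targets` are the current snap candidates on this axis.
function resolveSnap(
  raw: number,
  targets: number[],
  threshold = 8,   // assumed value
  bypass = false,  // hypothetical opt-out, e.g. a held modifier key
): SnapResult {
  if (bypass) return { value: raw, snappedTo: null };
  let nearest: number | null = null;
  for (const t of targets) {
    const d = Math.abs(t - raw);
    if (d <= threshold && (nearest === null || d < Math.abs(nearest - raw))) {
      nearest = t;
    }
  }
  return nearest === null
    ? { value: raw, snappedTo: null }
    : { value: nearest, snappedTo: nearest };
}

// A drag moving right across a snap target at x = 200.
const rawPath = [190, 195, 204, 207, 209, 230];
const snappedPath = rawPath.map((raw) => resolveSnap(raw, [200]).value);
const forced = resolveSnap(195, [200], 8, true).value;
```

The object holds at 200 while the pointer is within 8px, then lets go at 209. Because nothing here runs when a neighbor moves, an object you offset on purpose stays offset.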

There's a constant temptation to keep adding to it: auto-spacing, group detection, symmetry enforcement. Each one sounds perfectly reasonable on its own. Put them all together, though, and you have rebuilt the layout paradigm inside the spatial one, with all the rigidity of a grid and none of the decades of accessibility work and familiarity that make CSS layouts usable in the first place. Here, restraint matters more than ambition.

Spatial vs layout: a false binary

I think "spatial vs layout" probably is a false binary, and I say that as someone who has spent seven years building a spatial canvas.

The best spatial tools borrow heavily from layout. Kosmik's neighbor snapping is really just layout logic (alignment, spacing, edge matching) applied locally and optionally instead of globally and mandatorily. Figma added auto-layout to a spatial canvas, not the other way around, because people needed both at once. The best layout tools, in turn, let you occasionally break the grid.

What I keep circling back to instead is the real question underneath it all: who holds authority over position, the algorithm or the person, and how smoothly that authority can pass between them.

The web platform has no vocabulary for this. CSS doesn't know that two elements are spatially related; it only knows that they share a container. That gap is the reason spatial tools on the web end up either rebuilding their rendering from scratch on a canvas element or bending the DOM until it behaves like one.

So maybe spatial will always live outside the DOM. Or maybe the line between a "layout document" and a "spatial workspace" will one day feel as arbitrary as the line we used to draw between a "file" and an "application."