Spatial interfaces vs layouts

2026 · 05 · 2,000 words

Seven years at Kosmik, and I can still predict the first question in any demo. Someone drags an object, notices it stays wherever they put it, and asks: "Won't this just become a mess?"

They're right to worry. Freeform space does become a mess. I've watched it happen, not in demos but in daily use. Give someone an infinite canvas and enough objects, and they'll build a junk drawer.

The answer isn't to go back to grids, though. It's to think harder about where structure comes from.

The case for structure

Layout engines exist because structure works. CSS flexbox, SwiftUI stacks, Figma auto-layout all solve the same problem: given N things of varying size, find positions that produce something readable.

A flexbox container distributes space, wraps when the viewport shrinks, keeps text legible at any width. You describe relationships (this comes after that, these share a row, this stretches to fill) and the engine computes coordinates. You never think about position. That's the point.

For blogs, settings panels, documentation, anywhere content is mostly sequential, layout engines are exactly right. They absorb spatial decisions so you don't have to. A well-configured grid system will produce readable layouts for content it has never seen, at screen sizes nobody tested. I don't think people in the spatial computing world give that enough credit.

Where layout stops being enough

Layout engines assume they know where things should go. That assumption holds until you have spatial intent, until you need something here, not wherever the algorithm would place it.

You want a diagram next to the paragraph it illustrates. Not below it, not in a sidebar, but adjacent, so your eyes can move between text and image without scrolling. You want three clusters of related notes arranged so the connections between clusters are visible. You want to annotate a sketch by placing your comment physically next to the line you're responding to.

Layout can't express "here." It expresses "after the previous sibling, inside this container, subject to these constraints." Position is always derived, a consequence of rules rather than a choice you make.

CSS has position: absolute, which takes an element out of normal flow and, with it, out of everything the layout system does for you. No reflow, no responsiveness, no automatic spacing. The engine says: you want to control position? You're on your own.

This keeps coming up across the stack. Layout systems handle position-as-consequence well. Position-as-intent, the idea that where something sits means something and should be controlled by the person rather than the algorithm, just isn't part of the model.

Figma saw this from the other direction. It started as a spatial canvas, infinite and freeform. Then it added auto-layout, because when you're designing lists and buttons, you genuinely need the algorithm to handle spacing and flow. That Figma needed both paradigms in the same tool tells you neither one covers the full range of what people actually do.

Structure that comes from context

At Kosmik, we build a spatial canvas. You can place anything anywhere. But freeform alone is just a whiteboard, and whiteboards become unreadable past about thirty objects.

What we build instead is contextual structure. When you drag an object across the canvas, the system looks at its neighbors: their bounding boxes, their edges, the gaps between them. It computes alignment candidates in real time. Your left edge could match a neighbor's right edge. Your top could snap to a cluster's top line. Your center could align with the group you're joining.

[Interactive demo: edge alignment. Blocks: notes, sketch, reference, drag me. Drag any block. Guide lines show when edges are close, regardless of the gap.]

The suggestions come from what's already on the canvas, not from a predetermined grid or a column system designed for different content. If your neighbors are scattered at odd intervals, the snapping adapts. If they form a neat row, the snap points extend that row.
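To make the idea concrete, here is a minimal sketch of neighbor-derived snapping. It is not Kosmik's implementation: the names (Box, alignLines, bestShift, snapToNeighbors) and the 6-pixel threshold are assumptions for illustration, and it compares every edge and center of the dragged box against every edge and center of each neighbor, which is cruder than a real tool would be.

```typescript
// Hypothetical names throughout; a sketch of neighbor-derived snapping,
// not Kosmik's actual implementation.
interface Box { x: number; y: number; w: number; h: number; }

// The three alignable lines of a box on one axis: start edge, center, end edge.
function alignLines(start: number, size: number): number[] {
  return [start, start + size / 2, start + size];
}

// Smallest shift (in px) that lines up any line of `dragged` with any line
// of any neighbor on the given axis, or 0 if nothing is within `threshold`.
function bestShift(dragged: Box, neighbors: Box[], axis: "x" | "y", threshold: number): number {
  const own = axis === "x" ? alignLines(dragged.x, dragged.w) : alignLines(dragged.y, dragged.h);
  let best = 0;
  let bestDist = Infinity;
  for (const n of neighbors) {
    const theirs = axis === "x" ? alignLines(n.x, n.w) : alignLines(n.y, n.h);
    for (const a of own) {
      for (const b of theirs) {
        const dist = Math.abs(b - a);
        if (dist <= threshold && dist < bestDist) {
          best = b - a;
          bestDist = dist;
        }
      }
    }
  }
  return best;
}

// Snap each axis independently; the candidates come only from the neighbors.
function snapToNeighbors(dragged: Box, neighbors: Box[], threshold = 6): Box {
  return {
    x: dragged.x + bestShift(dragged, neighbors, "x", threshold),
    y: dragged.y + bestShift(dragged, neighbors, "y", threshold),
    w: dragged.w,
    h: dragged.h,
  };
}
```

With a single neighbor at { x: 100, y: 50, w: 200, h: 100 }, a box dragged to (303, 52) lands at (300, 50): its left edge meets the neighbor's right edge and its top meets the neighbor's top. There is no grid anywhere in the code. Remove the neighbor and the snap points disappear with it.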

The first version of this feature was a traditional snap-to-grid. Fixed intervals, uniform spacing. It worked for the first five objects. By the twentieth, everything felt like a spreadsheet, tidy but lifeless, fighting you whenever the content didn't fit the grid's assumptions.

Switching to neighbor-aware snapping changed how people used the canvas. They started building clusters with internal structure, aligned within the cluster but freely placed between clusters. Structure where it helped, looseness where it didn't.

The distinction
In a layout system, the algorithm decides where things go and you configure parameters. In a spatial interface, you decide where things go and the system assists. Same structure, different authority.

What alignment misses

The demo above shows edge alignment: your left edge matches a neighbor's right edge, your top matches their top line. This is what every design tool with snapping does. Sketch, Miro, PowerPoint, Figma's smart guides. It's been the same idea for decades.

But edge alignment only solves half the problem. It tells you where edges line up. It knows nothing about the space between objects. Worse, the guide lines can feel arbitrary. A line appears because your left edge happens to match some object's right edge three hundred pixels away, an object you weren't even thinking about. You're trying to align with the card next to you, and the system is showing you a relationship with something across the canvas. The guides are technically correct and practically useless.

Drag three different-sized cards next to each other using only edge alignment and you'll get neat edges with inconsistent gaps. The first gap might be 12 pixels. The second might be 23. Everything lines up. Nothing breathes evenly. The snapping never considered size.

[Interactive demo: size-aware placement. Blocks: wide card, tall card, drag me. Drag any block. The system computes positions that maintain a consistent 16px gutter.]

What we build at Kosmik works differently. Instead of asking "which edges are close?", the system asks "where should this object go?" The answer uses both objects' sizes and a consistent gutter (16 pixels in our case). It doesn't align edges and hope for the best. It computes full placement positions where the object would create consistent spacing with its neighbors, regardless of their dimensions.
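A rough sketch of what "computing full placement positions" could look like, under stated assumptions: the names (Rect, placementsAround, nearestPlacement) and the 24-pixel capture distance are hypothetical, only four candidate slots per neighbor are considered, and overlap with other objects is ignored. The 16-pixel gutter is the one figure taken from the text.

```typescript
// Hypothetical sketch of size-aware placement; not Kosmik's actual code.
interface Rect { x: number; y: number; w: number; h: number; }

const GUTTER = 16; // the gutter mentioned in the text

// Four slots around a neighbor. Each one uses BOTH objects' sizes so that
// exactly `gutter` pixels separate them, whatever their dimensions.
function placementsAround(moving: Rect, neighbor: Rect, gutter: number = GUTTER): Rect[] {
  const w = moving.w;
  const h = moving.h;
  return [
    { x: neighbor.x + neighbor.w + gutter, y: neighbor.y, w, h }, // right of neighbor
    { x: neighbor.x - gutter - w, y: neighbor.y, w, h },          // left of neighbor
    { x: neighbor.x, y: neighbor.y + neighbor.h + gutter, w, h }, // below neighbor
    { x: neighbor.x, y: neighbor.y - gutter - h, w, h },          // above neighbor
  ];
}

// Return the closest candidate slot within `maxDistance` of where the user
// is holding the object, or the object unchanged if no slot is that close.
function nearestPlacement(moving: Rect, neighbors: Rect[], maxDistance = 24): Rect {
  let best = moving;
  let bestDist = maxDistance;
  for (const n of neighbors) {
    for (const c of placementsAround(moving, n)) {
      const dx = c.x - moving.x;
      const dy = c.y - moving.y;
      const dist = Math.sqrt(dx * dx + dy * dy);
      if (dist <= bestDist) {
        best = c;
        bestDist = dist;
      }
    }
  }
  return best;
}
```

Drop a 100×200 card at (262, 5) beside a 240×80 card at the origin and it settles at (256, 0), exactly 16 pixels from the neighbor's right edge. The question the code answers is "where does this object fit?", not "which edges are close?"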

The difference looks subtle in a screenshot but becomes obvious when you use it. Edge alignment makes things look organized. Size-aware placement makes them actually be organized.

What's actually different under the hood

The architecture gap is bigger than it looks from the outside.

A layout engine stores content in a tree. Each node carries style properties (width, padding, flex-grow) but no intrinsic position. Position is computed top-down: the root establishes the viewport, each container measures its children, resolves constraints, assigns coordinates. Change one node's size and the engine reflows siblings, ancestors, descendants. A single width change can cascade into hundreds of position recalculations.
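The cascade is easy to see in a toy version. This sketch assumes the simplest possible layout rule, a vertical stack with a fixed gap, and invents its own names (LayoutNode, measure, layout); real engines resolve far richer constraints, but the shape of the computation is similar.

```typescript
// A toy layout tree: nodes carry size, never position.
interface LayoutNode {
  id: string;
  height?: number;          // intrinsic height, used by leaves
  children?: LayoutNode[];  // a container stacks its children vertically
}

// Measure: a container is as tall as its children plus the gaps between them.
function measure(node: LayoutNode, gap: number): number {
  const kids = node.children || [];
  if (kids.length === 0) return node.height || 0;
  let total = gap * (kids.length - 1);
  for (const kid of kids) total += measure(kid, gap);
  return total;
}

// Place: walk top-down. Every y is derived from whatever came before it.
function layout(
  node: LayoutNode,
  gap: number,
  y = 0,
  out: Record<string, number> = {}
): Record<string, number> {
  out[node.id] = y;
  let cursor = y;
  for (const child of node.children || []) {
    layout(child, gap, cursor, out);
    cursor += measure(child, gap) + gap;
  }
  return out;
}
```

Take a root holding a (height 100), a group of b (50) and c (80), and d (40), with a gap of 10. The computed y values are b: 110, c: 170, d: 260. Grow a to 150 and they become 160, 220, and 310. Nobody touched b, c, or d, yet all three moved.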

A spatial engine stores objects in a flat or shallow scene graph. Each object owns its position as state: { x: 412, y: 88, w: 200, h: 140 }. There's no reflow. Move an object and only that object's coordinates change. The rendering pipeline applies a camera transform (pan offset, zoom level) and draws each object at its stored position. Move the camera and everything shifts. Move an object and only it moves.
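The spatial side is almost embarrassingly short by comparison. A sketch, with hypothetical names (SceneObject, Camera, moveObject, toScreen) and the assumption that the camera is a pan offset in world units plus a uniform zoom factor:

```typescript
// Each object owns its position as plain state.
interface SceneObject { x: number; y: number; w: number; h: number; }

// Assumed camera model: pan offset in world coordinates plus uniform zoom.
interface Camera { panX: number; panY: number; zoom: number; }

type Scene = Record<string, SceneObject>;

// Moving an object writes two numbers. Nothing else in the scene is touched.
function moveObject(scene: Scene, id: string, dx: number, dy: number): void {
  const obj = scene[id];
  if (!obj) return;
  obj.x += dx;
  obj.y += dy;
}

// Rendering maps stored world coordinates to screen coordinates.
// Moving the camera changes this mapping for every object at once.
function toScreen(obj: SceneObject, cam: Camera): SceneObject {
  return {
    x: (obj.x - cam.panX) * cam.zoom,
    y: (obj.y - cam.panY) * cam.zoom,
    w: obj.w * cam.zoom,
    h: obj.h * cam.zoom,
  };
}
```

There is no measure pass and no tree walk. The cost of that simplicity is that nothing in this code knows whether two objects overlap, line up, or belong together; any such knowledge has to be computed separately.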

| | Layout engine | Spatial engine |
| --- | --- | --- |
| Position source | Computed from constraints | Stored per object |
| Move an element | Reflow cascade | Single coordinate update |
| Viewport resize | Re-layout everything | Camera adjustment |
| Data structure | Tree (parent → child) | Flat scene graph |
| Reading order | Implicit in tree order | Must be derived from 2D positions |

This is why spatial tools rarely sit comfortably inside the DOM. The DOM is a layout tree. Every element's position depends on its ancestors and siblings. To build a spatial interface on top of it, you either fight the layout engine at every turn (position: absolute on everything, reimplementing hit-testing and scroll and focus management) or you leave the DOM behind and render onto a <canvas> or WebGL surface.

Figma and Miro chose canvas. Kosmik tried that too, then settled back on the DOM with enough tweaks to make it work, minimizing JavaScript and leaning on CSS transforms for camera and positioning. That's a story for another time. The point is that neither path is free.

What you give up

Going spatial costs you three things.

No automatic responsiveness. Resize the viewport and objects don't rearrange. They stay at their stored coordinates while the camera adjusts. This preserves spatial meaning (the relationships between objects don't scramble when someone resizes their browser), but it means spatial interfaces struggle on small screens. On a phone, Kosmik shows you less of the canvas, but what you see is in the right place. Even so, some canvases that read well on a 27-inch monitor become navigation puzzles on a phone, and there's no clean way around that.

No built-in reading order. A layout tree has implicit sequential order: DOM order. Screen readers walk it. Keyboard navigation follows it. A spatial scene graph has no such thing. You have to derive one. Left-to-right then top-to-bottom? Within each cluster, then between clusters? Diagonal arrangements? Every derivation is a heuristic, and every heuristic breaks for some canvas.
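Here is one such heuristic, the "left-to-right then top-to-bottom" one, as a sketch. The names (Item, readingOrder) and the row rule are assumptions: an object joins the current row if its top sits above the vertical midpoint of the row's first object.

```typescript
interface Item { id: string; x: number; y: number; w: number; h: number; }

// Derive a linear order from 2D positions: bucket items into rows,
// then read each row left to right. A heuristic, not a guarantee.
function readingOrder(items: Item[]): string[] {
  const byTop = items.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const rows: Item[][] = [];
  let current: Item[] = [];
  let anchor: Item | null = null; // first (topmost) item of the current row
  for (const item of byTop) {
    if (anchor !== null && item.y < anchor.y + anchor.h / 2) {
      current.push(item); // top is above the anchor's midpoint: same row
    } else {
      current = [item];   // otherwise this item starts a new row
      rows.push(current);
      anchor = item;
    }
  }
  const order: string[] = [];
  for (const row of rows) {
    row.sort((a, b) => a.x - b.x);
    for (const item of row) order.push(item.id);
  }
  return order;
}
```

It behaves sensibly on tidy rows and falls apart quickly elsewhere. Two side-by-side clusters, each meant to be read top to bottom, get interleaved row by row, which is exactly the kind of failure the paragraph above describes.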

The tidiness burden falls on the user. Layout engines are authoritarian but tidy. Spatial engines give you freedom and trust you to maintain coherence. The snapping helps. Alignment guides help. But nothing will rescue a canvas someone scattered things across in a hurry.

These are not small tradeoffs. Accessibility alone is a reason many applications should never go spatial. If your content has a natural reading order, a layout system will serve screen reader and keyboard users better than a canvas that requires heuristic navigation.

How much structure before it stops being spatial

This is the design question I think about most.

Add enough snap strength and you've built a grid. Add spacing normalization and you've built flexbox. Add a "tidy up" button that rearranges everything into columns and you've just built a layout engine with extra steps and less tooling.

The line I keep coming back to: the user can always override. Snap to an alignment, keep dragging, and the snap releases. Deliberately offset something from the group, and the system doesn't "correct" it the next time you move a neighbor. Once the interface starts knowing better than you, it's a layout engine again. Once it stops helping entirely, it's a whiteboard. The useful space is in between, and it's narrower than you'd think.
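One way to keep the override cheap, sketched on a single axis with hypothetical names (snapAxis, threshold): treat snapping as a pure function of the raw pointer position rather than of the last snapped position. The snap then has no memory, so dragging past the threshold releases it without any special case.

```typescript
// Snap `raw` to the nearest target within `threshold`, else return it as-is.
// Always feed this the raw pointer position, never the previous snapped
// value, so that continuing to drag carries the object out of the snap.
function snapAxis(raw: number, targets: number[], threshold: number): number {
  let best = raw;
  let bestDist = threshold;
  for (const t of targets) {
    const dist = Math.abs(t - raw);
    if (dist <= bestDist) {
      best = t;
      bestDist = dist;
    }
  }
  return best;
}
```

With one target at 300 and a threshold of 6, raw positions 290, 296, 300, 305, 310 come out as 290, 300, 300, 300, 310. The object holds at 300 while the pointer is nearby and lets go the moment the pointer leaves. Nothing is stored, so nothing gets "corrected" later.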

There's a temptation to keep adding features: auto-spacing, group detection, symmetry enforcement. Each one sounds reasonable on its own. Together they recreate the layout paradigm inside the spatial one, all the rigidity of a grid system but without the decades of accessibility work and user familiarity that make CSS layouts actually usable.

Restraint matters here more than ambition.

The wrong framing

"Spatial vs. layout" is probably a false binary. I say that as someone whose job title literally includes "spatial interfaces."

The best spatial tools borrow from layout. Kosmik's neighbor snapping is layout logic (alignment, spacing, edge matching) applied locally and optionally rather than globally and mandatorily. The best layout tools let you occasionally break the grid. Figma added auto-layout to a spatial canvas, not the other way around, because people needed both.

What I keep thinking about instead is who holds authority over position, the algorithm or the person, and how smoothly that authority transfers between them.

The web platform has no vocabulary for this. CSS doesn't know that two elements are spatially related. It only knows they share a container. That gap is why most spatial tools on the web end up rebuilding rendering from scratch on a canvas element, and why the ones that stay in the DOM spend so much effort working around it.

Maybe spatial will always live outside the DOM. Or maybe the line between "layout document" and "spatial workspace" will eventually feel as arbitrary as the line between "file" and "application."