Generative UI and the Downfall of Digital Experiences — The Swift Path to Average

Joe Fletcher
7 min read · Nov 21, 2023


An image generated by a coworker around inclusivity scenarios.

At this point, most of us who use Generative AI tools like ChatGPT have at least a basic understanding of how a Large Language Model (LLM) works. A model is trained on an enormous set of data — millions of text samples, billions of words — and gradually coaxed into recognizing patterns and associating them with descriptive terms. So when I ask it to answer a question, or write in a certain style, or output in a particular format, the model has millions of examples to reference, and does the math to figure out which approach is most likely to satisfy the request.

This same logic applies to images as well. Tools like Midjourney and Stable Diffusion are based on large models trained on millions or billions of images, enabling them to identify the patterns that correlate with phrases like “cyberpunk heroine” or “art nouveau” or “monkey in a cowboy hat riding a poodle.” Large models trained on code are able to work similar magic for programmers.

Whether they’re learning from words, images, or code, these models have proven incredibly capable. But what else could they be trained on? In theory, a “large model” can be created from any data set that provides huge numbers of well-defined examples. There are large models of financial transactions, of course, but in my world, one set of examples is especially interesting: user interactions.

User Experience designers spend their entire careers developing mental models of what constitutes an engaging, satisfying, intuitive or meaningful interaction with a digital tool, and many of these models are eventually codified into repeatable best practices. UI toolkits for mobile, tablet, and laptop offer the same sets of common components, used over and over to make a product usable and efficient. We have rules of thumb about what works and what doesn’t, and the better designers can make nuanced choices about what makes the most sense for different audiences in different contexts: where that button should go, how wordy that description should be, what kind of visual indicator should be used to confirm a successful entry, and so on. But in the end, we are often ruled by guiding heuristics and rules.

It’s all fairly blunt, though, in the same way that “10 rules for punchier writing” is blunt: every best practice is a broad generalization, and we rely on experienced professionals to translate them for specific challenges.

But what if we could be a lot sharper about those recommendations and heuristics? What if they weren’t simply guidelines, but UI delivered “in the moment”? What if we trained a model not on words or images, but on clicks, hovers, scrolls, eye-tracking, components, flows, and other interactive behaviors? I’m not a Machine Learning expert, but I have enough repeated experience in this area to recognize the potential here: an engine for creating a user interface and experience on the fly.

A Large Interaction Model (LIM).

A model that would examine massive sets of digital interactive elements and recorded user interactions, and that could be queried and customized much as LLMs and image generators are.

Feeding this (so far hypothetical) model a prompt of “e-commerce shopping cart page” would get you an idealized or average version of this familiar interface, which could also be customized in myriad ways. Throwing in a few adjectives (“professional”, “friendly”), a target market (“contractors”, “college students”), or a type of commerce (“fly fishing gear”, “second-hand formalwear”) would quickly get you a modified version, as would the familiar Q&A process used by so many LLM users. All of this functionality could be created in the moment: ephemeral and instant.

But of course, since we’re talking about interactions, you wouldn’t necessarily need to “tell” the model how to customize its output. People’s interactions are themselves a form of information, and could potentially become the inputs that customize a model’s output. The results could be quickly A/B tested. This is the quick downfall of digital design: engines that provide better UI than we could, in a fraction of the time. And an engine that doesn’t only foresee a “happy path” customer journey, but can see all journeys, at all times, at once. And this is where things get interesting.

The Elimination of Customer Journeys, or “Reducing the Cost of Variance”

Currently, the way we design much of our software is to select a few of the most likely customer / user journeys (hopefully based on research), and create an interactive system that facilitates those journeys. We design the “happy path”. The common path. The path for the majority, because we can’t design everything. But what if we have a model that already contains an effectively infinite number of such systems? What if we can design… for everything?

With a Large Interaction Model, it wouldn’t be necessary to pre-select a journey and identify an optimal path, because that decision can be made on the fly. As a user starts engaging with our hypothetical e-commerce page, they’re leaving bits of data with every passing second: what they click on, how fast they scroll, where they linger, and so on. An LIM trained on billions of such interactions could quickly spot patterns in each user engagement, and start generating a unique experience that’s specific to that user, at that moment.
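To make the idea of "bits of data" concrete, here is a minimal sketch of how a session's clicks, scrolls, and hovers might be collapsed into a few signals that pick a layout. Everything in it is an assumption for illustration: the `InteractionEvent` fields, the component names, the thresholds, and the layout labels are hypothetical, and the hand-written rules in `choose_layout` are only a stand-in for whatever a trained Large Interaction Model would actually learn.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionEvent:
    kind: str         # hypothetical event types: "click", "scroll", "hover"
    target: str       # hypothetical component id, e.g. "reviews"
    duration_ms: int  # how long the interaction lasted


def summarize(events):
    """Collapse raw events into a few per-session signals."""
    counts = Counter(e.kind for e in events)
    dwell = Counter()
    for e in events:
        if e.kind == "hover":
            dwell[e.target] += e.duration_ms
    scrolls = [e.duration_ms for e in events if e.kind == "scroll"]
    return {
        "clicks": counts["click"],
        "avg_scroll_ms": sum(scrolls) / len(scrolls) if scrolls else 0.0,
        "most_lingered": dwell.most_common(1)[0][0] if dwell else None,
    }


def choose_layout(signals):
    """Hand-written rules standing in for a learned model's decision."""
    # Assumed threshold: short scroll gestures suggest a skimming user.
    if signals["avg_scroll_ms"] and signals["avg_scroll_ms"] < 300:
        return "compact"
    if signals["most_lingered"] == "reviews":
        return "reviews_first"
    return "default"


session = [
    InteractionEvent("hover", "reviews", 4000),
    InteractionEvent("scroll", "page", 900),
    InteractionEvent("click", "add_to_cart", 50),
]
layout = choose_layout(summarize(session))
```

In a real system the rules would be replaced by a model's prediction and the layout would be regenerated as new events arrive; the sketch only shows the shape of the loop, from raw interactions to a per-user interface decision.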

The basics of a Large Interaction Model for creating Generative UI

The immediate value here is, of course, a customized and therefore dramatically improved experience for just about everyone — no matter how small the group… or individual. Perhaps you have an online service that you really love using because it just “gets” you, even if not everyone likes it. Now imagine if every online service felt like that — and even adjusted itself from day to day to be ideal for your needs at that moment. An application or experience that truly grows and scales with you. This is possible. It seems a little insane, but imagine what Midjourney would sound like if you’d described it to someone just three years ago.

There’s also a deeper implication that I think could ultimately be more important.

When PCs started transforming the office in the 80s and 90s, it was largely a story of replacing hardware, which has a marginal cost, with software, which doesn’t. Making a copy of a document once required a copy machine, and before that a sheet of carbon paper, all of which took time, effort and money. The costs of hardware scale along with the revenue.

But making a digital copy costs nothing and is immediate. This was revolutionary, and it changed how we thought about information, working practices, and the very concept of utility. It also generated an enormous amount of value for millions of users, and tremendous wealth for software companies. A great piece of software, unlike a great car, lamp, or blender, can continue building revenue with very little additional cost, if it catches on. That’s why seven of the 10 biggest companies on earth today are software-based — and none of them were in 1980.

Design is like hardware right now in the sense that more variants of a product, even a digital one, require more effort. Every Google customer is using the same Google products, everyone uses the same Facebook, and so on. Their experiences are customized to some degree, mostly through direct adjustment by the user, but it’s all the same underlying software. If I want to design twice as many digital products, it will cost roughly twice as much.

But with generative design systems, we have zero marginal cost design. We design for intent, set a few constants (there’s always a top header menu, color palette must adhere to certain parameters, etc.), and create outcomes. From then on, it’s up to the model.
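One way to picture "setting a few constants" is as a check that every generated layout has to pass before it ships. The sketch below is only an assumption about how that might look: `DesignConstants`, the layout dictionary shape, and the component and color values are all hypothetical, chosen to mirror the two examples in the text (a top header menu that is always present, and a restricted color palette).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignConstants:
    required_components: frozenset  # components every layout must include
    allowed_colors: frozenset       # the only colors a layout may use


def violations(layout, constants):
    """Return a list of human-readable problems; empty means the layout passes."""
    problems = []
    present = set(layout.get("components", []))
    for name in sorted(constants.required_components - present):
        problems.append(f"missing required component: {name}")
    for color in layout.get("colors", []):
        if color not in constants.allowed_colors:
            problems.append(f"color outside palette: {color}")
    return problems


constants = DesignConstants(
    required_components=frozenset({"top_header_menu"}),
    allowed_colors=frozenset({"#112233", "#ffffff"}),
)

# A hypothetical model output that breaks both constants.
generated = {"components": ["cart_list"], "colors": ["#ff0000", "#ffffff"]}
issues = violations(generated, constants)
```

The designer's job in this picture moves from drawing each screen to deciding what goes into `constants`, while the model is free to vary everything else.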

This approach could potentially create a shift that’s as transformative as the one brought on by software. If the cost of design variance drops to near zero, then companies that are able to provide instant customized experiences stand to generate similar levels of value — and revenue — as the vanguards of the digital revolution. But, this also may mean that every company can copy any other company’s features… or experiences. Where does the value go in these scenarios?

What becomes of experiences where every moment can be designed by a machine better than a person?

Over the past three decades, we’ve only seen a few major shifts in digital UI design (see the image below): the late ’90s dot-com boom, the mobile wave with the iPhone and the subsequent App Store democratization of applications in 2007 and following years, and potentially the rise of the SuperApps in Asia. However, the coming shift with GenAI may be like nothing we’ve ever seen before. A much larger shift.

I’m not sure if this type of Large Interaction Model output will plant us firmly in the Age of Average, or if personalized experiences will explode what it means to be compelling UI. I fear it’s more of the former, as we narrow in on whatever type of UI creates the most control of attention or the best conversion funnel. Either way, it points to the fact that, without a doubt, Tech will get a lot bigger.

The waves of digital interface disruption — thanks to a coworker for this as well!

The landscape of the field of UX is shifting in real time — and we need to shift with it — both as companies and as individual designers. For companies, how do businesses sell a proposition in a world where experiences are infinitely copiable and instantly creatable? For designers… how do we stay relevant when Generative UI is quicker than any of us?

I tend to think nothing will replace some of the nuance of human insights and connections. But if I answer the above questions honestly, then…

I’m not sure yet either…
