Unlocking Superior UX with LLMs
Avoiding the 'blank canvas' problem
LLMs offer developers exciting new capabilities alongside new challenges.
On the one hand they are like eager, infinitely-patient interns, with broad knowledge to boot. On the other hand, they occasionally make stuff up (not unlike human interns). To make things worse, the best LLMs are compute-intensive, so they can introduce deleterious latency to applications.
Having tackled these issues while developing multiple AI applications, including Atticus, I'm eager to share insights to transform these hurdles into UX triumphs.
Stream when possible
ChatGPT brought LLMs into the mainstream, and a big reason for its success is its UX; users simply chat with the LLM, so very little training or education is required. Furthermore, ChatGPT streams its responses token by token, so users get immediate feedback, despite the fact that a full response may take close to a minute (vastly exceeding the Doherty Threshold).
The big takeaway here is clear: stream output when users are interacting directly with an LLM. Streaming isn't limited to raw output either; it's great for users to get streamed progress information (eg for a long-running process) and streamed structured output.
Tactically, for backend applications I advocate using server-sent events (it's also what many LLM APIs, including OpenAI's, use). SSEs are relatively simple compared to websockets, while still providing a responsive experience.
Chat is no silver bullet
As great as text is as a universal interface, it has its problems.
Chat interfaces often present a "blank canvas" problem: while users recognize their potential, they struggle to initiate or refine their ideas. In design terms, this is a discoverability issue. Providing templates or domain-specific examples, like "how to take a derivative of a polynomial" in a math app, can help. But as workflows become more complex and involve multiple stakeholders or integrations, this approach proves inadequate.
LLMs, like well-prepared students taking a closed-book exam, can be easily distracted or prone to inaccuracies when not referencing source material. This is akin to relying on an imperfect long-term memory. Grounding LLMs in specific data produces superior outcomes, leveraging their "short-term" memory to avoid errors. For instance, in Atticus, users upload legal documents for analysis, allowing the LLM to identify issues based on the given text, thereby reducing inaccuracies and enhancing relevance.
Merging these concepts often leads to a substantial UX improvement. Grounding an LLM in source material facilitates low-friction user experiences. A text-to-SQL application, for instance, can guide users towards insightful analysis by offering preformed queries such as "Retention by cohort", derived from its database knowledge.
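In its simplest form, grounding just means splicing the source material into the prompt and instructing the model to stay within it. A minimal sketch (the prompt wording and delimiters here are illustrative, not how Atticus or any particular product does it):

```python
def build_grounded_prompt(question: str, source_text: str) -> str:
    """Anchor the model to supplied text rather than its 'long-term memory'."""
    return (
        "Answer using ONLY the document below. "
        "If the answer is not in the document, say so.\n\n"
        f"--- DOCUMENT ---\n{source_text}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )
```

The same pattern drives the text-to-SQL example: swap the document for a schema dump, and the preformed queries you suggest are grounded in tables the database actually has.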
By and large, I recommend judicious use of context windows, but grounding in source material is critical; would you do your best work on a closed-book exam or an open-book one?
UIs still matter
While chat and text interfaces have their limitations, traditional UI elements retain significant value. Humans take seconds to process language, with a well-educated adult reading 200-300 words per minute; understanding adds further delay. In contrast, visual information processing occurs in milliseconds. Thus, visual interfaces are highly efficient, often resulting in more engaging user experiences.
For LLMs, this suggests the importance of transforming their extracted structured data into user-friendly interfaces. Instead of merely presenting raw data or textual summaries, consider visual representations like graphs when time series data is involved, for example.
Formats with simple text-based delimiters, like CSV or JSONL, are great because output can be chunked. Such formats are trivial to buffer, which makes intermediate processing easier. Structured output, especially easily-delimited output, can be combined with streaming to get the best of all worlds: immediate feedback for users, and data that can drive the UI.
Tying these ideas together: structured output that’s easy to manipulate textually enables applications to build rich UIs powered by LLMs. It’s a cheat code for building UIs fast.
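The buffering logic above is small enough to show in full. This sketch assumes the LLM's output is JSONL and arrives as arbitrary text chunks (as it does over a token stream); it yields each record as soon as its closing newline arrives, so the UI can render incrementally.

```python
import json


def iter_jsonl(chunks):
    """Incrementally parse complete JSONL records from a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A newline delimits a complete record; keep any partial line buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # trailing record without a final newline
        yield json.loads(buffer)
```

Because each yielded record is already structured data, the frontend can append a table row or a chart point per record instead of waiting for, then re-parsing, the full response.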
LLMs and generative AI are exciting tech frontiers I'm eager to master. I invite others to delve into this uncharted field. Mobile's innovations like haptic feedback have been groundbreaking; I anticipate similar strides in AI.