Malleable software in the age of LLMs
Historically, end-user programming efforts have been limited by the difficulty of turning informal user intent into executable code, but LLMs can help open up this programming bottleneck. However, user interfaces still matter, and while chatbots have their place, they are an essentially limited interaction mode. An intriguing way forward is to combine LLMs with open-ended, user-moldable computational media, where the AI acts as an assistant to help users directly manipulate and extend their tools over time.
LLMs will represent a step change in tool support for end-user programming: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming. Until now, that vision has been bottlenecked on turning fuzzy informal intent into formal, executable code; now that bottleneck is rapidly opening up thanks to LLMs.
If this hypothesis indeed comes true, we might start to see some surprising changes in the way people use software:
One-off scripts: Normal computer users have their AI create and execute scripts dozens of times a day, to perform tasks like data analysis, video editing, or automating tedious tasks.
One-off GUIs: People use AI to create entire GUI applications just for performing a single specific task, containing just the features they need, no bloat.
Build don’t buy: Businesses develop more software in-house that meets their custom needs, rather than buying SaaS off the shelf, since it’s now cheaper to get software tailored to the use case.
Modding/extensions: Consumers and businesses demand the ability to extend and mod their existing software, since it’s now easier to specify a new feature or a tweak to match a user’s workflow.
Recombination: Take the best parts of the different applications you like best, and create a new hybrid that composes them together.
Chat will never feel like driving a car, no matter how good the bot is. In their 1986 book Understanding Computers and Cognition, Terry Winograd and Fernando Flores elaborate on this point: In driving a car, the control interaction is normally transparent. You do not think “How far should I turn the steering wheel to go around that curve?” In fact, you are not even aware (unless something intrudes) of using a steering wheel…The long evolution of the design of automobiles has led to this readiness-to-hand. It is not achieved by having a car communicate like a person, but by providing the right coupling between the driver and action in the relevant domain (motion down the road).
Think about how a spreadsheet works. If you have a financial model in a spreadsheet, you can try changing a number in a cell to assess a scenario: this is the inner loop of direct manipulation at work. But you can also edit the formulas! A spreadsheet isn’t just an “app” focused on a specific task; it’s closer to a general computational medium which lets you flexibly express many kinds of tasks. The “platform developers” (the creators of the spreadsheet) have given you a set of general primitives that can be used to make many tools. We might draw the double loop of the spreadsheet interaction like this. You can edit numbers in the spreadsheet, but you can also edit formulas, which edits the tool.
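The double loop can be made concrete with a toy model. The sketch below is only an illustration, not how any real spreadsheet is implemented: the `Sheet` class, the cell names, and the 25% tax rate are all hypothetical. It shows the inner loop as changing a number and the outer loop as replacing a formula.

```python
from typing import Callable, Dict, Union

# A cell holds either a plain number or a formula (a function of the sheet).
Cell = Union[float, Callable[["Sheet"], float]]


class Sheet:
    """A toy spreadsheet (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def set(self, name: str, content: Cell) -> None:
        self.cells[name] = content

    def get(self, name: str) -> float:
        content = self.cells[name]
        # Formulas are recomputed on every read, so edits propagate at once.
        return content(self) if callable(content) else content


sheet = Sheet()
sheet.set("revenue", 1000.0)
sheet.set("costs", 600.0)
sheet.set("profit", lambda s: s.get("revenue") - s.get("costs"))
baseline = sheet.get("profit")

# Inner loop: change a number to assess a scenario.
sheet.set("costs", 700.0)
scenario = sheet.get("profit")

# Outer loop: edit the formula itself, which edits the tool.
# The 25% tax rate is an assumed example value.
sheet.set("profit", lambda s: (s.get("revenue") - s.get("costs")) * 0.75)
after_tax = sheet.get("profit")

print(baseline, scenario, after_tax)
```

Both kinds of edit go through the same `set` call, which is roughly the point of a computational medium: the user changes the tool with the same gesture they use to change the data.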
What if you had an LLM play the role of the local developer? That is, the user mainly drives the creation of the spreadsheet, but asks for technical help with some of the formulas when needed? The LLM wouldn’t just create an entire solution, it would also teach the user how to create the solution themselves next time.
This picture shows a world that I find pretty compelling. There’s an inner interaction loop that takes advantage of the full power of direct manipulation. There’s an outer loop where the user can also more deeply edit their tools within an open-ended medium. They can get AI support for making tool edits, and grow their own capacity to work in the medium. Over time, they can learn things like the basics of formulas, or how a VLOOKUP works. This structural knowledge helps the user think of possible use cases for the tool, and also helps them audit the output from the LLMs. In a ChatGPT world, the user is left entirely dependent on the AI, without any understanding of its inner mechanism. In a computational medium with AI as assistant, the user’s reliance on the AI gently decreases over time as they become more comfortable in the medium.
How can we develop transformative tools for thought?
A more powerful aim is to develop a new medium for thought. A medium such as, say, Adobe Illustrator is essentially different from any of the individual tools Illustrator contains. Such a medium creates a powerful immersive context, a context in which the user can have new kinds of thought, thoughts that were formerly impossible for them. Speaking loosely, the range of expressive thoughts possible in such a medium is an emergent property of the elementary objects and actions in that medium. If those are well chosen, the medium expands the possible range of human thought.
Memory systems make memory into a choice, rather than an event left up to chance: This changes the relationship to what we're learning, reduces worry, and frees up attention to focus on other kinds of learning, including conceptual, problem-solving, and creative.
Memory systems can be used to build genuine conceptual understanding, not just learn facts: In Quantum Country we achieve this in part through the aspiration to virtuoso card writing, and in part through a narrative embedding of spaced repetition that gradually builds context and understanding.
Mnemonic techniques such as memory palaces are great, but not versatile enough to build genuine conceptual understanding: Such techniques are very specialized, and emphasize artificial connections, not the inherent connections present in much conceptual knowledge. The mnemonic techniques are, however, useful for bootstrapping knowledge with an ad hoc structure.
What practices would lead to tools for thought as transformative as Hindu-Arabic numerals? And in what ways does modern design practice and tech industry product practice fall short? To be successful, you need an insight-through-making loop to be operating at full throttle, combining the best of deep research culture with the best of Silicon Valley product culture.
Historically, work on tools for thought has focused principally on cognition; much of the work has been stuck in Spock-space. But it should take emotion as seriously as the best musicians, movie directors, and video game designers. Mnemonic video is a promising vehicle for such explorations, possibly combining deep emotional connection with the detailed intellectual mastery the mnemonic medium aspires toward.
It's striking to contrast conventional technical books with the possibilities enabled by executable books. You can imagine starting an executable book with, say, quantum teleportation, right on the first page. You'd provide an interface – perhaps a library is imported – that would let users teleport quantum systems immediately. They could experiment with different parts of the quantum teleportation protocol, illustrating immediately the most striking ideas about it. The user wouldn't necessarily understand all that was going on. But they'd begin to internalize an accurate picture of the meaning of teleportation. And over time, at leisure, the author could unpack some of what might a priori seem to be the drier details. Except by that point the reader will be bought into those details, and they won't be so dry.
Aspiring to canonicity, one fun project would be to take the most recent IPCC climate assessment report (perhaps starting with a small part), and develop a version which is executable. Instead of a report full of assertions and references, you'd have a live climate model – actually, many interrelated models – for people to explore. If it was good enough, people would teach classes from it; if it was really superb, not only would they teach classes from it, it could perhaps become the creative working environment for many climate scientists.
In serious mediums, there's a notion of canonical media. By this, we mean instances of the medium that expand its range, and set a new standard widely known amongst creators in that medium. For instance, Citizen Kane, The Godfather, and 2001 all expanded the range of film, and inspired later film makers. It's also true in new media. YouTubers like Grant Sanderson have created canonical videos: they expand the range of what people think is possible in the video form. And something like the Feynman Lectures on Physics does it for textbooks. In each case one gets the sense of people deeply committed to what they're doing. In many of his lectures it's obvious that Feynman isn't just educating: he's reporting the results of a lifelong personal obsession with understanding how the world works. It's thrilling, and it expands the form.
There's a general principle here: good tools for thought arise mostly as a byproduct of doing original work on serious problems.
Game companies develop many genuinely new interface ideas. This perhaps seems surprising, since you'd expect such interface ideas to also suffer from the public goods problem: game designers need to invest enormous effort to develop those interface ideas, and they are often immediately copied (and improved on) by other companies, at little cost. In that sense, they are public goods, and enrich the entire video game ecosystem.
Many video games make most of their money from the first few months of sales. While other companies can (and do) come in and copy or riff on any new ideas, it often does little to affect revenue from the original game, which has already made most of its money. While this copying is no doubt irritating for the companies being copied, it's still worth it for them to make the up-front investment.
(In fact, cloning is a real issue in gaming, especially in very technically simple games. An example is the game Threes, which took the developers more than a year to make. Much of that time was spent developing beautiful new interface ideas. The resulting game was so simple that clones and near-clones began appearing within days. One near clone, a game called 2048, sparked a mini-craze, and became far more successful than Threes. At the other extreme, some game companies prolong the revenue-generating lifetime of their games with re-releases, long-lived online versions, and so on. This is particularly common for capital-intensive AAA games, such as the Grand Theft Auto series. In such cases the business model relies less on clever new ideas, and more on improved artwork (for re-release), network effects (for online versions), and branding.)
In gaming, clever new interface ideas can be distinguishing features which become a game's primary advantage in the marketplace. Indeed, new interface ideas may even help games become classics – consider the many original (at the time) ideas in games ranging from Space Invaders to Wolfenstein 3D to Braid to Monument Valley. As a result, rather than underinvesting, many companies make sizeable investments in developing new interface ideas, even though they then become public goods. In this way the video game industry has largely solved the public goods problem.
It's encouraging that the video game industry can make inroads on the public goods problem. Is there a solution for tools for thought? Unfortunately, the novelty-based short-term revenue approach of the game industry doesn't work. You want people to really master the best new tools for thought, developing virtuoso skill, not spend a few dozen hours (as with most games) getting pretty good, and then moving onto something new.
Like many other software companies, Adobe does much of its patenting defensively: they patent ideas so that patent trolls cannot sue them for similar ideas. The situation is almost exactly the reverse of what you'd like. Innovative companies can easily be attacked by patent trolls who have made broad and often rather vague claims in a huge portfolio of patents, none of which they've worked out in much detail. But when the innovative companies develop (at much greater cost) and ship a genuinely good new idea, others can often copy the essential core of that idea, while varying it enough to plausibly evade any patent. The patent system is not protecting the right things.
Many of the most fundamental and powerful tools for thought do suffer the public goods problem. And that means tech companies focus elsewhere; it means many imaginative and ambitious people decide to focus elsewhere; it means we haven't developed the powerful practices needed to do work in the area, and as a result the field is still in a pre-disciplinary stage. The result, ultimately, is that the most fundamental and powerful tools for thought are undersupplied.
Culturally, tech is dominated by an engineering, goal-driven mindset. It's much easier to set KPIs, evaluate OKRs, and manage deliverables, when you have a very specific end-goal in mind. And so it's perhaps not surprising that tech culture is much more sympathetic to AGI and BCI as overall programs of work. But historically it's not the case that humanity's biggest breakthroughs have come about in this goal-driven way. The creation of language – the ur tool for thought – is perhaps the most important occurrence of humanity's existence. And although the origin of language is hotly debated and uncertain, it seems extremely unlikely to have been the result of a goal-driven process. It's amusing to try imagining some prehistoric quarterly OKRs leading to the development of language. What sort of goals could one possibly set? Perhaps a quota of new irregular verbs? It's inconceivable!
Even the computer itself came out of an exploration that would be regarded as ridiculously speculative and poorly-defined in tech today. Someone didn't sit down and think “I need to invent the computer”; that's not a thought they had any frame of reference for. Rather, pioneers such as Alan Turing and Alonzo Church were exploring extremely basic and fundamental (and seemingly esoteric) questions about logic, mathematics, and the nature of what is provable. Out of those explorations the idea of a computer emerged, after many years; it was a discovered concept, not a goal.
Fundamental, open-ended questions seem to be at least as good a source of breakthroughs as goals, no matter how ambitious. This is difficult to imagine or convince others of in Silicon Valley's goal-driven culture. Indeed, we ourselves feel the attraction of a goal-driven culture. But empirically, open-ended exploration can be just as successful, or more so.
There's a lot of work on tools for thought that takes the form of toys, or “educational” environments. Tools for writing that aren't used by actual writers. Tools for mathematics that aren't used by actual mathematicians. And so on. Even though the creators of such tools have good intentions, it's difficult not to be suspicious of this pattern. It's very easy to slip into a cargo cult mode, doing work that seems (say) mathematical, but which actually avoids engagement with the heart of the subject. Often the creators of these toys have not ever done serious original work in the subjects for which they are supposedly building tools. How can they know what needs to be included?
AI Models in Software UI - LukeW
In the first approach, the primary interface affordance is an input that directly (for the most part) instructs one or more AI models. In this paradigm, people are authoring prompts that result in text, image, video, etc. generation. These prompts can be sequential, iterative, or unrelated. Marquee examples are OpenAI's ChatGPT interface or Midjourney's use of Discord as an input mechanism. Since there are few, if any, UI affordances to guide people, these systems need to respond to a very wide range of instructions. Otherwise people get frustrated with their primarily hidden (to the user) limitations.
The second approach doesn't include any UI elements for directly controlling the output of AI models. In other words, there are no input fields for prompt construction. Instead, instructions for AI models are created behind the scenes as people go about using application-specific UI elements. People using these systems could be completely unaware an AI model is responsible for the output they see.
The third approach is application-specific UI with AI assistance. Here people can construct prompts through a combination of application-specific UI and direct model instructions. These could be additional controls that generate portions of those instructions in the background, or the ability to directly guide prompt construction through the inclusion or exclusion of content within the application. Examples of this pattern are Microsoft's Copilot suite of products for GitHub, Office, and Windows.
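A rough sketch of how this third approach might assemble a prompt: part of it is generated from application controls in the background, and part comes from the person's direct instruction. Everything here is hypothetical (the `PromptContext` fields, the `build_prompt` function, and the wording of the generated instructions); it does not describe how Copilot or any other product works.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptContext:
    """State of hypothetical application controls."""
    tone: str = "neutral"        # e.g. chosen from a dropdown
    max_words: int = 100         # e.g. set with a slider
    # Content the user chose to include from within the application.
    included_content: List[str] = field(default_factory=list)


def build_prompt(context: PromptContext, user_instruction: str = "") -> str:
    """Combine UI-derived instructions with an optional direct instruction."""
    parts = [
        f"Write in a {context.tone} tone.",
        f"Use at most {context.max_words} words.",
    ]
    if context.included_content:
        parts.append("Use only the following material:")
        parts.extend(f"- {item}" for item in context.included_content)
    if user_instruction.strip():
        parts.append(f"User instruction: {user_instruction.strip()}")
    return "\n".join(parts)


ctx = PromptContext(tone="formal", max_words=50,
                    included_content=["Q3 sales table"])
prompt = build_prompt(ctx, "Summarize the key trend.")
print(prompt)
```

Calling `build_prompt` with no direct instruction gives something closer to the second approach, where the person never writes any part of the prompt themselves.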
They could be overlays, modals, inline menus and more. What they have in common, however, is that they supplement application-specific UIs instead of completely replacing them.
LLM Powered Assistants for Complex Interfaces - Nick Arner
Complexity can make it difficult for domain novices and experts alike to learn how to use the interface. LLMs can help reduce this barrier by being leveraged to provide assistance to the user if they’re trying to accomplish something, but don’t exactly know how to navigate the interface. The user could tell the program what they’re trying to do via a text or voice interface, or perhaps the program may be able to infer the user’s intent or goals based on what actions they’ve taken so far. Modern GUI apps are slowly starting to add more features for assisting users with navigating the space of available commands and actions via command palettes, popularised in software such as iA Writer and Superhuman.
For executing a sequence of tasks as part of a complex workflow, LLM-powered interfaces afford a richer opportunity for learning and using complex software. The program could walk them through the task they’re trying to accomplish by highlighting and selecting the interface elements in the correct order to accomplish the task, along with explanations provided.
Expert interfaces that take advantage of LLMs may end up looking like they currently do - again, complex tasks require complex interfaces. However, it may be easier and faster for users to learn how to use these interfaces thanks to built-in LLM-powered assistants. This will help them to get into flow faster, improving their productivity and feeling of satisfaction when using this complex software.
Unlike Clippy, these new types of assistant would be able to act on the interface directly. These actions will be made in accordance with the goals of the person using them, but each discrete action taken by the assistant on the interface will not be done according to explicit human actions - the goals are directed by the human user, but the steps to achieve those goals are unknown to the user, which is why they’re engaging with the assistant in the first place.
r/compsci - What is typically taught in Human Computer Interaction?
Graduate HCI classes are far better because there is so much depth to the field. Basically, through a combination of understanding human psychology, knowing the right questions to ask, and understanding how to properly model how people will use a system you can make software that flows naturally. That last point, sometimes referred to as Cognitive Engineering, is extremely important.
