Arc browser’s ambiguous user alignment
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper page - DocLLM: A layout-aware generative language model for multimodal document understanding
Research Papers in November 2023
Photoshop for text
In the near future, transforming text will become as commonplace as filtering images. A new set of tools is emerging, like Photoshop for text.
Up until now, text editors have been focused on input. The next evolution of text editors will make it easy to alter, summarize and lengthen text. You’ll be able to do this for entire documents, not just individual sentences or paragraphs. The filters will be instantaneous and as good as if you wrote the text yourself. You will also be able to do this with local files, on your device, without relying on remote servers.
Initially, many of Photoshop’s capabilities were adaptations of analog effects. For example, “dodge” and “burn” are old darkroom techniques used to alter photographs. There are countless skeuomorphic names throughout digital image editing tools that refer to analog processes.
Text seems like it would be easier to manipulate than images. But languages have far more rules than images do. A reader expects writing to follow proper spelling and grammar, a consistent tone, and a logical sequence of sentences. Until now, solving this problem required building complex rule-based algorithms. Now we can solve this problem with AI models that can teach themselves to create readable text in any language.
AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support
This paper maps concepts from AI alignment onto a basic, three step interaction cycle, yielding a corresponding set of alignment objectives: 1) specification alignment: ensuring the user can efficiently and reliably communicate objectives to the AI, 2) process alignment: providing the ability to verify and optionally control the AI's execution process, and 3) evaluation support: ensuring the user can verify and understand the AI's output.
the notion of a Process Gulf, which highlights how differences between human and AI processes can lead to challenges in AI control.
The OpenAI Keynote
what I cheered as an analyst was Altman’s clear articulation of the company’s priorities: lower price first, speed later. You can certainly debate whether that is the right set of priorities (I think it is, because the biggest need now is for increased experimentation, not optimization), but what I appreciated was the clarity.
The fact that Microsoft is benefiting from OpenAI is obvious; what this makes clear is that OpenAI uniquely benefits from Microsoft as well, in a way they would not from another cloud provider: because Microsoft is also a product company investing in the infrastructure to run OpenAI’s models for said products, it can afford to optimize and invest ahead of usage in a way that OpenAI alone, even with the support of another cloud provider, could not. In this case that is paying off in developers needing to pay less, or, ideally, have more latitude to discover use cases that result in them paying far more because usage is exploding.
You can, in effect, program a GPT, with language, just by talking to it. It’s easy to customize the behavior so that it fits what you want. This makes building them very accessible, and it gives agency to everyone.
Stephen Wolfram explained:
For decades there’s been a dichotomy in thinking about AI between “statistical approaches” of the kind ChatGPT uses, and “symbolic approaches” that are in effect the starting point for Wolfram|Alpha. But now—thanks to the success of ChatGPT—as well as all the work we’ve done in making Wolfram|Alpha understand natural language—there’s finally the opportunity to combine these to make something much stronger than either could ever achieve on their own.
This new model somewhat alleviates the problem: now, instead of having to select the correct plug-in (and thus restart your chat), you simply go directly to the GPT in question. In other words, if I want to create a poster, I don’t enable the Canva plugin in ChatGPT, I go to Canva GPT in the sidebar. Notice that this doesn’t actually solve the problem of needing to have selected the right tool; what it does do is make the choice more apparent to the user at a more appropriate stage in the process, and that’s no small thing.
ChatGPT will seamlessly switch between text generation, image generation, and web browsing, without the user needing to change context. What is necessary for the plug-in/GPT idea to ultimately take root is for the same capabilities to be extended broadly: if my conversation involved math, ChatGPT should know to use Wolfram|Alpha on its own, without me adding the plug-in or going to a specialized GPT.
the obvious technical challenges of properly exposing capabilities and training the model to know when to invoke those capabilities are a textbook example of Professor Clayton Christensen’s theory of integration and modularity, wherein integration works better when a product isn’t good enough; it is only when a product exceeds expectation that there is room for standardization and modularity.
To summarize the argument, consumers care about things in ways that are inconsistent with whatever price you might attach to their utility, they prioritize ease-of-use, and they care about the quality of the user experience and are thus especially bothered by the seams inherent in a modular solution. This means that integrated solutions win because nothing is ever “good enough”
the fact of the matter is that a lot of people use ChatGPT for information despite the fact it has a well-documented flaw when it comes to the truth; that flaw is acceptable, because to the customer ease-of-use is worth the loss of accuracy. Or look at plug-ins: the concept as originally implemented has already been abandoned, because the complexity in the user interface was more detrimental than whatever utility might have been possible. It seems likely this pattern will continue: of course customers will say that they want accuracy and 3rd-party tools; their actions will continue to demonstrate that convenience and ease-of-use matter most.
How OpenAI is building a path toward AI agents
Many of the most pressing concerns around AI safety will come with these features, whenever they arrive. The fear is that when you tell AI systems to do things on your behalf, they might accomplish them via harmful means. This is the fear embedded in the famous paperclip problem, and while that remains an outlandish worst-case scenario, other potential harms are much more plausible.Once you start enabling agents like the ones OpenAI pointed toward today, you start building the path toward sophisticated algorithms manipulating the stock market; highly personalized and effective phishing attacks; discrimination and privacy violations based on automations connected to facial recognition; and all the unintended (and currently unimaginable) consequences of infinite AIs colliding on the internet.
That same Copy Editor I described above might be able in the future to automate the creation of a series of blogs, publish original columns on them every day, and promote them on social networks via an established daily budget, all working toward the overall goal of undermining support for Ukraine.
Which actions is OpenAI comfortable letting GPT-4 take on the internet today, and which does the company not want to touch? Altman’s answer is that, at least for now, the company wants to keep it simple. Clear, direct actions are OK; anything that involves high-level planning isn’t.
For most of his keynote address, Altman avoided making lofty promises about the future of AI, instead focusing on the day-to-day utility of the updates that his company was announcing. In the final minutes of his talk, though, he outlined a loftier vision.“We believe that AI will be about individual empowerment and agency at a scale we've never seen before,” Altman said, “And that will elevate humanity to a scale that we've never seen before, either. We'll be able to do more, to create more, and to have more. As intelligence is integrated everywhere, we will all have superpowers on demand.”
Generative AI and intellectual property — Benedict Evans
A person can’t mimic another voice perfectly (impressionists don’t have to pay licence fees) but they can listen to a thousand hours of music and make something in that style - a ‘pastiche’, we sometimes call it. If a person did that, they wouldn’t have to pay a fee to all those artists, so if we use a computer for that, do we need to pay them?
I think most people understand that if I post a link to a news story on my Facebook feed and tell my friends to read it, it’s absurd for the newspaper to demand payment for this. A newspaper, indeed, doesn’t pay a restaurant a percentage when it writes a review.
one way to think about this might be that AI makes practical at a massive scale things that were previously only possible on a small scale. This might be the difference between the police carrying wanted pictures in their pockets and the police putting face recognition cameras on every street corner - a difference in scale can be a difference in principle. What outcomes do we want? What do we want the law to be? What can it be?
OpenAI hasn’t ‘pirated’ your book or your story in the sense that we normally use that word, and it isn’t handing it out for free. Indeed, it doesn’t need that one novel in particular at all. In Tim O’Reilly’s great phrase, data isn’t oil; data is sand. It’s only valuable in the aggregate of billions,, and your novel or song or article is just one grain of dust in the Great Pyramid.
it’s supposed to be inferring ‘intelligence’ (a placeholder word) from seeing as much as possible of how people talk, as a proxy for how they think.
it doesn’t need your book or website in particular and doesn’t care what you in particular wrote about, but it does need ‘all’ the books and ‘all’ the websites. It would work if one company removed its content, but not if everyone did.
What if I use an engine trained on the last 50 years of music to make something that sounds entirely new and original? No-one should be under the delusion that this won’t happen.
I can buy the same camera as Cartier-Bresson, and I can press the button and make a picture without being able to draw or paint, but that’s not what makes the artist - photography is about where you point the camera, what image you see and which you choose. No-one claims a machine made the image.
Spotify already has huge numbers of ‘white noise’ tracks and similar, gaming the recommendation algorithm and getting the same payout per play as Taylor Swift or the Rolling Stones. If we really can make ‘music in the style of the last decade’s hits,’ how much of that will there be, and how will we wade through it? How will we find the good stuff, and how will we define that? Will we care?
Synthography – An Invitation to Reconsider the Rapidly Changing Toolkit of Digital Image Creation as a New Genre Beyond Photography
With the comprehensive application of Artificial Intelligence into the creation and post production of images, it seems questionable if the resulting visualisations can still be considered ‘photographs’ in a classical sense – drawing with light. Automation has been part of the popular strain of photography since its inception, but even the amateurs with only basic knowledge of the craft could understand themselves as author of their images. We state a legitimation crisis for the current usage of the term. This paper is an invitation to consider Synthography as a term for a new genre for image production based on AI, observing the current occurrence and implementation in consumer cameras and post-production.
Natural Language Is an Unnatural Interface
On the user experience of interacting with LLMs
Prompt engineers not only need to get the model to respond to a given question but also structure the output in a parsable way (such as JSON), in case it needs to be rendered in some UI components or be chained into the input of a future LLM query. They scaffold the raw input that is fed into an LLM so the end user doesn’t need to spend time thinking about prompting at all.
From the user’s side, it’s hard to decide what to ask while providing the right amount of context.From the developer’s side, two problems arise. It’s hard to monitor natural language queries and understand how users are interacting with your product. It’s also hard to guarantee that an LLM can successfully complete an arbitrary query. This is especially true for agentic workflows, which are incredibly brittle in practice.
When we speak to other people, there is a shared context that we communicate under. We’re not just exchanging words, but a larger information stream that also includes intonation while speaking, hand gestures, memories of each other, and more. LLMs unfortunately cannot understand most of this context and therefore, can only do as much as is described by the prompt
most people use LLMs for ~4 basic natural language tasks, rarely taking advantage of the conversational back-and-forth built into chat systems:Summarization: Summarizing a large amount of information or text into a concise yet comprehensive summary. This is useful for quickly digesting information from long articles, documents or conversations. An AI system needs to understand the key ideas, concepts and themes to produce a good summary.ELI5 (Explain Like I'm 5): Explaining a complex concept in a simple, easy-to-understand manner without any jargon. The goal is to make an explanation clear and simple enough for a broad, non-expert audience.Perspectives: Providing multiple perspectives or opinions on a topic. This could include personal perspectives from various stakeholders, experts with different viewpoints, or just a range of ways a topic can be interpreted based on different experiences and backgrounds. In other words, “what would ___ do?”Contextual Responses: Responding to a user or situation in an appropriate, contextualized manner (via email, message, etc.). Contextual responses should feel organic and on-topic, as if provided by another person participating in the same conversation.
Prompting nearly always gets in the way because it requires the user to think. End users ultimately do not wish to confront an empty text box in accomplishing their goals. Buttons and other interactive design elements make life easier.The interface makes all the difference in crafting an AI system that augments and amplifies human capabilities rather than adding additional cognitive load.Similar to standup comedy, delightful LLM-powered experiences require a subversion of expectation.
Users will expect the usual drudge of drafting an email or searching for a nearby restaurant, but instead will be surprised by the amount of work that has already been done for them from the moment that their intent is made clear. For example, it would a great experience to discover pre-written email drafts or carefully crafted restaurant and meal recommendations that match your personal taste.If you still need to use a text input box, at a minimum, also provide some buttons to auto-fill the prompt box. The buttons can pass LLM-generated questions to the prompt box.
Data compression == AI??
Chatbots are accurate, show more empathy than doctors in answering questions, study finds
Jared Palmer on Twitter
Why AI Will Save the World | Andreessen Horowitz
What is the testable hypothesis? What would falsify the hypothesis? How do we know when we are getting into a danger zone? These questions go mainly unanswered apart from “You can’t prove it won’t happen!” In fact, these Baptists’ position is so non-scientific and so extreme – a conspiracy theory about math and code – and is already calling for physical violence, that I will do something I would normally not do and question their motives as well.
daviddao/awful-ai: 😈Awful AI is a curated list to track current scary usages of AI - hoping to raise awareness
Advanced ChatGPT: Full Guide
Wikipedia:Large language models - Wikipedia
I’m a Student. You Have No Idea How Much We’re Using ChatGPT.
This time, it feels different
In the past several months, I have come across people who do programming, legal work, business, accountancy and finance, fashion design, architecture, graphic design, research, teaching, cooking, travel planning, event management etc., all of whom have started using the same tool, ChatGPT, to solve use cases specific to their domains and problems specific to their personal workflows. This is unlike everyone using the same messaging tool or the same document editor. This is one tool, a single class of technology (LLM), whose multi-dimensionality has achieved widespread adoption across demographics where people are discovering how to solve a multitude of problems with no technical training, in the one way that is most natural to humans—via language and conversations.
I cannot recall the last time a single tool gained such widespread acceptance so swiftly, for so many use cases, across entire demographics.
there is significant substance beneath the hype. And that is what is worrying; the prospect of us starting to depend indiscriminately on poorly understood blackboxes, currently offered by megacorps, that actually work shockingly well.
If a single dumb, stochastic, probabilistic, hallucinating, snake oil LLM with a chat UI offered by one organisation can have such a viral, organic, and widespread adoption—where large disparate populations, people, corporations, and governments are integrating it into their daily lives for use cases that they are discovering themselves—imagine what better, faster, more “intelligent” systems to follow in the wake of what exists today would be capable of doing.
A policy for “AI anxiety”
We ended up codifying this into an actual AI policy to bring clarity to the organisation.[10] It states that no one at Zerodha will lose their job if a technology implementation (AI or non-AI) directly renders their existing responsibilities and tasks obsolete. The goal is to prevent unexpected rug-pulls from underneath the feet of humans. Instead, there will be efforts to create avenues and opportunities for people to upskill and switch between roles and responsibilities
To those who believe that new jobs will emerge at meaningful rates to absorb the losses and shocks, what exactly are those new jobs? To those who think that governments will wave magic wands to regulate AI technologies, one just has to look at how well governments have managed to regulate, and how well humanity has managed to self-regulate, human-made climate change and planetary destruction. It is not then a stretch to think that the unraveling of our civilisation and its socio-politico-economic systems that are built on extracting, mass producing, and mass consuming garbage, might be exacerbated. Ted Chiang’s recent essay is a grim, but fascinating exploration of this. Speaking of grim, we can always count on us to ruin nice things! Along the lines of Murphy’s Law,[11] I present:
Anything that can be ruined, will be ruined — Grumphy’s law
I asked GPT-4 to summarise this post and write five haikus on it. I have always operated a piece of software, but never asked it anything—that is, until now. Anyway, here is the fifth one.
Future’s tangled web,
Offloading choices to black boxes,
Humanity’s voice fades
Prompt Engineering
Think of language models like ChatGPT as a “calculator for words”
This is reflected in their name: a “language model” implies that they are tools for working with language. That’s what they’ve been trained to do, and it’s language manipulation where they truly excel.
Want them to work with specific facts? Paste those into the language model as part of your original prompt!
There are so many applications of language models that fit into this calculator for words category:
Summarization. Give them an essay and ask for a summary.
Question answering: given these paragraphs of text, answer this specific question about the information they represent.
Fact extraction: ask for bullet points showing the facts presented by an article.
Rewrites: reword things to be more “punchy” or “professional” or “sassy” or “sardonic”—part of the fun here is using increasingly varied adjectives and seeing what happens. They’re very good with language after all!
Suggesting titles—actually a form of summarization.
World’s most effective thesaurus. “I need a word that hints at X”, “I’m very Y about this situation, what could I use for Y?”—that kind of thing.
Fun, creative, wild stuff. Rewrite this in the voice of a 17th century pirate. What would a sentient cheesecake think of this? How would Alexander Hamilton rebut this argument? Turn this into a rap battle. Illustrate this business advice with an anecdote about sea otters running a kayak rental shop. Write the script for kickstarter fundraising video about this idea.
A flaw in this analogy: calculators are repeatable
Andy Baio pointed out a flaw in this particular analogy: calculators always give you the same answer for a given input. Language models don’t—if you run the same prompt through a LLM several times you’ll get a slightly different reply every time.
Investing in AI
Coming back to the internet analogy, how did Google, Amazon etc ended up so successful? Metcalf’s law explains this. It states that as more users join the network, the value of the network increases thereby attracting even more users. The most important thing here was to make people join your network. The end goal was to build the largest network possible. Google did this with search, Amazon did this with retail, Facebook did this with social.
Collecting as much data as possible is important. But you don’t want just any data. The real competitive advantage lies in having high-quality proprietary data. Think about it this way, what does it take to build an AI system? It takes 1) data, which is the input that goes into the 2) AI models which are analogous to machines and lastly it requires energy to run these models i.e. 3) compute. Today, most AI models have become standardized and are widely available. And on the other hand, the cost of compute is rapidly trending to zero. Hence AI models and compute have become a commodity. The only thing that remains is data. But even data is widely available on the internet. Thus, a company can only have a true competitive advantage when it has access to high-quality proprietary data.
Recently, Chamath Palihapitiya gave an interview where he had this interesting analogy. He compared these large language models like GPT to refrigeration. He said “People that invented refrigeration, made some money. But most of the money was made by Coca-Cola who used refrigeration to build an empire. And so similarly, companies building these large models will make some money, but the Coca-Cola is yet to be built.” What he meant by this is that right now there are lot of companies crawling the open web to scrap the data. Once that is widely available like refrigeration, we will see companies and startups coming up with proprietary data building on top of it
Society's Technical Debt and Software's Gutenberg Moment
Past innovations have made costly things become cheap enough to proliferate widely across society. He suggests LLMs will make software development vastly more accessible and productive, alleviating the "technical debt" caused by underproduction of software over decades.
Software is misunderstood. It can feel like a discrete thing, something with which we interact. But, really, it is the intrusion into our world of something very alien. It is the strange interaction of electricity, semiconductors, and instructions, all of which somehow magically control objects that range from screens to robots to phones, to medical devices, laptops, and a bewildering multitude of other things. It is almost infinitely malleable, able to slide and twist and contort itself such that, in its pliability, it pries open doorways as yet unseen.
the clearing price for software production will change. But not just because it becomes cheaper to produce software. In the limit, we think about this moment as being analogous to how previous waves of technological change took the price of underlying technologies—from CPUs, to storage and bandwidth—to a reasonable approximation of zero, unleashing a flood of speciation and innovation. In software evolutionary terms, we just went from human cycle times to that of the drosophila: everything evolves and mutates faster.
A software industry where anyone can write software, can do it for pennies, and can do it as easily as speaking or writing text, is a transformative moment. It is an exaggeration, but only a modest one, to say that it is a kind of Gutenberg moment, one where previous barriers to creation—scholarly, creative, economic, etc—are going to fall away, as people are freed to do things only limited by their imagination, or, more practically, by the old costs of producing software.
We have almost certainly been producing far less software than we need. The size of this technical debt is not knowable, but it cannot be small, so subsequent growth may be geometric. This would mean that as the cost of software drops to an approximate zero, the creation of software predictably explodes in ways that have barely been previously imagined.
Entrepreneur and publisher Tim O’Reilly has a nice phrase that is applicable at this point. He argues investors and entrepreneurs should “create more value than you capture.” The technology industry started out that way, but in recent years it has too often gone for the quick win, usually by running gambits from the financial services playbook. We think that for the first time in decades, the technology industry could return to its roots, and, by unleashing a wave of software production, truly create more value than its captures.
Software production has been too complex and expensive for too long, which has caused us to underproduce software for decades, resulting in immense, society-wide technical debt.
technology has a habit of confounding economics. When it comes to technology, how do we know those supply and demand lines are right? The answer is that we don’t. And that’s where interesting things start happening. Sometimes, for example, an increased supply of something leads to more demand, shifting the curves around. This has happened many times in technology, as various core components of technology tumbled down curves of decreasing cost for increasing power (or storage, or bandwidth, etc.).
Suddenly AI has become cheap, to the point where people are “wasting” it via “do my essay” prompts to chatbots, getting help with microservice code, and so on. You could argue that the price/performance of intelligence itself is now tumbling down a curve, much like as has happened with prior generations of technology.
it’s worth reminding oneself that waves of AI enthusiasm have hit the beach of awareness once every decade or two, only to recede again as the hyperbole outpaces what can actually be done.
ChatPDF - Chat with any PDF!
Pause Giant AI Experiments: An Open Letter - Future of Life Institute
AI-Powered Regex Solver
Universal Summarizer by Kagi
ChatGPT sends shockwaves across college campuses
Across universities, professors have been looking into ways to engage students so cheating with ChatGPT is not as attractive, such as making assignments more personalized to students’ interests and requiring students to complete brainstorming assignments and essay drafts instead of just one final paper.
Character.AI