Found 161 bookmarks
Apple intelligence and AI maximalism — Benedict Evans
The chatbot might replace all software with a prompt - ‘software is dead’. I’m skeptical about this, as I’ve written here, but Apple is proposing the opposite: that generative AI is a technology, not a product.
Apple is, I think, signalling a view that generative AI, and ChatGPT itself, is a commodity technology that is most useful when it is:
  • Embedded in a system that gives it broader context about the user (which might be search, social, a device OS, or a vertical application), and
  • Unbundled into individual features (ditto), which are inherently easier to run as small power-efficient models on small power-efficient devices on the edge (paid for by users, not your capex budget) - which is just as well, because…
  • This stuff will never work for the mass-market if we have marginal cost every time the user presses ‘OK’ and we need a fleet of new nuclear power-stations to run it all.
Apple has built its own foundation models, which (on the benchmarks it published) are comparable to anything else on the market, but there’s nowhere that you can plug a raw prompt directly into the model and get a raw output back - there are always sets of buttons and options shaping what you ask, and that’s presented to the user in different ways for different features. In most of these features, there’s no visible bot at all. You don’t ask a question and get a response: instead, your emails are prioritised, or you press ‘summarise’ and a summary appears. You can type a request into Siri (and Siri itself is only one of the many features using Apple’s models), but even then you don’t get raw model output back: you get GUI. The LLM is abstracted away as an API call.
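A minimal sketch of that "LLM as API call" pattern, assuming a hypothetical `run_model` hook and `EmailSummary` structure (none of this reflects Apple's actual, unpublished APIs): the prompt is fixed by the product, and only the parsed, structured result ever reaches the interface.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class EmailSummary:
    subject: str
    key_points: list[str]
    suggested_priority: str  # "high" | "normal" | "low"

def summarize_email(body: str, run_model: Callable[[str], str]) -> EmailSummary:
    """Hypothetical feature wrapper: the prompt is fixed by the product, not
    typed by the user, and the model's text is parsed into a structure the
    UI renders. Neither the raw prompt nor the raw output reaches the user."""
    prompt = (
        "Summarize the following email as JSON with fields 'subject', "
        "'key_points' (at most 3 strings), and 'suggested_priority' "
        "(one of: high, normal, low).\n\n" + body
    )
    raw = run_model(prompt)  # small on-device model, or a private-cloud fallback
    data = json.loads(raw)   # in practice: validate and retry on malformed output
    return EmailSummary(
        subject=data["subject"],
        key_points=list(data["key_points"]),
        suggested_priority=data["suggested_priority"],
    )
```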
Apple is treating this as a technology to enable new classes of features and capabilities, where there is design and product management shaping what the technology does and what the user sees, not as an oracle that you ask for things.
Apple is drawing a split between a ‘context model’ and a ‘world model’. Apple’s models have access to all the context that your phone has about you, powering those features, and this is all private, both on device and in Apple’s ‘Private Cloud’. But if you ask for ideas for what to make with a photo of your grocery shopping, then this is no longer about your context, and Apple will offer to send that to a third-party world model - today, ChatGPT.
that’s clearly separated into a different experience where you should have different expectations, and it’s also, of course, OpenAI’s brand risk, not Apple’s. Meanwhile, that world model gets none of your context, only your one-off prompt.
Neither OpenAI nor any of the other cloud models from new companies (Anthropic, Mistral etc) have your emails, messages, locations, photos, files and so on.
Apple is letting OpenAI take the brand risk of creating pizza glue recipes, and making error rates and abuse someone else’s problem, while Apple watches from a safe distance.
The next step, probably, is to take bids from Bing and Google for the default slot, but meanwhile, more and more use-cases will be quietly shifted from the third party to Apple’s own models. It’s Apple’s own software that decides where the queries go, after all, and which ones need the third party at all.
A lot of the compute to run Apple Intelligence is in end-user devices paid for by the users, not Apple’s capex budget, and Apple Intelligence is free.
Commoditisation is often also integration. There was a time when ‘spell check’ was a separate product that you had to buy, for hundreds of dollars, and there were dozens of competing products on the market, but over time it was integrated first into the word processor and then the OS. The same thing happened with the last wave of machine learning - style transfer or image recognition were products for five minutes and then became features. Today ‘summarise this document’ is AI, and you need a cloud LLM that costs $20/month, but tomorrow the OS will do that for free. ‘AI is whatever doesn’t work yet.’
Apple is big enough to take its own path, just as it did moving the Mac to its own silicon: it controls the software and APIs on top of the silicon that are the basis of those developer network effects, and it has a world class chip team and privileged access to TSMC.
Apple is doing something slightly different - it’s proposing a single context model for everything you do on your phone, and powering features from that, rather than adding disconnected LLM-powered features at disconnected points across the company.
·ben-evans.com·
Apple intelligence and AI maximalism — Benedict Evans
On the necessity of a sin
AI excels at tasks that are intensely human: writing, ideation, faking empathy. However, it struggles with tasks that machines typically excel at, such as repeating a process consistently or performing complex calculations without assistance. In fact, it tends to solve problems that machines are good at in a very human way. When you get GPT-4 to do data analysis of a spreadsheet for you, it doesn’t innately read and understand the numbers. Instead, it uses tools the way we might, glancing at a bit of the data to see what is in it, and then writing Python programs to try to actually do the analysis. And its flaws — making up information, false confidence in wrong answers, and occasional laziness — also seem very much more like human than machine errors.
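The code involved in that loop is ordinary analyst code. A sketch of the sort of script a model typically writes and runs when asked to analyze a spreadsheet (the file name and columns are hypothetical):

```python
import pandas as pd

# Glance at the data first, the way a person would, before deciding what to compute.
df = pd.read_csv("sales.csv")  # hypothetical file the user uploaded
print(df.head())
print(df.dtypes)

# Then write ordinary analysis code over whatever columns turned up.
summary = (
    df.groupby("region")["revenue"]
      .agg(["count", "sum", "mean"])
      .sort_values("sum", ascending=False)
)
print(summary)
```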
This quasi-human weirdness is why the best users of AI are often managers and teachers, people who can understand the perspective of others and correct it when it is going wrong.
Rather than focusing purely on teaching people to write good prompts, we might want to spend more time teaching them to manage the AI.
Telling the system “who” it is helps shape the outputs of the system. Telling it to act as a teacher of MBA students will result in a different output than if you ask it to act as a circus clown. This isn’t magical—you can’t say “Act as Bill Gates” and get better business advice or “write like Hemingway” and get amazing prose—but it can help make the tone and direction appropriate for your purpose.
·oneusefulthing.org·
On the necessity of a sin
What Apple's AI Tells Us: Experimental Models⁴
Companies are exploring various approaches, from large, less constrained frontier models to smaller, more focused models that run on devices. Apple's AI focuses on narrow, practical use cases and strong privacy measures, while companies like OpenAI and Anthropic pursue the goal of AGI.
the most advanced generalist AI models often outperform specialized models, even in the specific domains those specialized models were designed for. That means that if you want a model that can do a lot - reason over massive amounts of text, help you generate ideas, write in a non-robotic way — you want to use one of the three frontier models: GPT-4o, Gemini 1.5, or Claude 3 Opus.
Working with advanced models is more like working with a human being, a smart one that makes mistakes and has weird moods sometimes. Frontier models are more likely to do extraordinary things but are also more frustrating and often unnerving to use. Contrast this with Apple’s narrow focus on making AI get stuff done for you.
Every major AI company argues the technology will evolve further and has teased mysterious future additions to their systems. In contrast, what we are seeing from Apple is a clear and practical vision of how AI can help most users, without a lot of effort, today. In doing so, they are hiding much of the power, and quirks, of LLMs from their users. Having companies take many approaches to AI is likely to lead to faster adoption in the long term. And, as companies experiment, we will learn more about which sets of models are correct.
·oneusefulthing.org·
What Apple's AI Tells Us: Experimental Models⁴
Apple Intelligence is Right On Time

Summary

  • Apple remains primarily a hardware company, and an AI-mediated future will still require devices, playing to Apple's strengths in design and integration.
  • AI is a complement to Apple's business, not disruptive, as it makes high-performance hardware more relevant and could drive meaningful iPhone upgrade cycles.
  • The smartphone is the ideal device for most computing tasks and the platform on which the future happens, solidifying the relevance of Apple's App Store ecosystem.
  • Apple's partnership with OpenAI for chatbot functionality allows it to offer best-in-class capabilities without massive investments, while reducing the threat of OpenAI building a competing device.
  • Building out the infrastructure for API-level AI features is a challenge for Apple, but one that is solvable given its control over the interface and integration of on-device and cloud processing.
  • The only significant threat to Apple is Google, which could potentially develop differentiated AI capabilities for Android that drive switching from iPhone users, though this is uncertain.
  • Microsoft's missteps with its Recall feature demonstrate the risks of pushing AI features too aggressively, validating Apple's more cautious approach.
  • Apple's user-centric orientation and brand promise of privacy and security align well with the need to deliver AI features in an integrated, trustworthy manner.
·stratechery.com·
Apple Intelligence is Right On Time
How to Make a Great Government Website—Asterisk
Summary: Dave Guarino, who has worked extensively on improving government benefits programs like SNAP in California, discusses the challenges and opportunities in civic technology. He explains how a simplified online application, GetCalFresh.org, was designed to address barriers that prevent eligible people from accessing SNAP benefits, such as a complex application process, required interviews, and document submission. Guarino argues that while technology alone cannot solve institutional problems, it provides valuable tools for measuring and mitigating administrative burdens. He sees promise in using large language models to help navigate complex policy rules. Guarino also reflects on California's ambitious approach to benefits policy and the structural challenges, like Prop 13 property tax limits, that impact the state's ability to build up implementation capacity.
there are three big categories of barriers. The application barrier, the interview barrier, and the document barrier. And that’s what we spent most of our time iterating on and building a system that could slowly learn about those barriers and then intervene against them.
The application is asking, “Are you convicted of this? Are you convicted of that? Are you convicted of this other thing?” What is that saying to you, as a person, about what the system thinks of you?
Often they’ll call from a blocked number. They’ll send you a notice of when your interview is scheduled for, but this notice will sometimes arrive after the actual date of the interview. Most state agencies are really slammed right now for a bunch of reasons, including Medicaid unwinding. And many of the people assisting on Medicaid are the same workers who process SNAP applications. If you missed your phone interview, you have to call to reschedule it. But in many states, you can’t get through, or you have to call over and over and over again. For a lot of people, if they don’t catch that first interview call, they’re screwed and they’re not going to be approved.
getting to your point about how a website can fix this — the end result was the lowest-burden application form that actually gets a caseworker what they need to efficiently and effectively process it. We did a lot of iteration to figure out that sweet spot.
We didn’t need to do some hard system integration that would potentially take years to develop — we were just using the system as it existed. Another big advantage was that we had to do a lot of built-in data validation because we could not submit anything that was going to fail the county application. We discovered some weird edge cases by doing this.
A lot of times when you want to build a new front end for these programs, it becomes this multiyear, massive project where you’re replacing everything all at once. But if you think about it, there’s a lot of potential in just taking the interfaces you have today, building better ones on top of them, and then using those existing ones as the point of integration.
Government tends to take a more high-modernist approach to the software it builds, which is like “we’re going to plan and know up front how everything is, and that way we’re never going to have to make changes.” In terms of accreting layers — yes, you can get to that point. But I think a lot of the arguments I hear that call for a fundamental transformation suffer from the same high-modernist thinking that is the source of much of the status quo.
If you slowly do this kind of stuff, you can build resilient and durable interventions in the system without knocking it over wholesale. For example, I mentioned procedural denials. It would be adding regulations, it would be making technology systems changes, blah, blah, blah, to have every state report why people are denied, at what rate, across every state up to the federal government. It would take years to do that, but that would be a really, really powerful change in terms of guiding feedback loops that the program has.
Guarino argues that attempts to fundamentally transform government technology often suffer from the same "high-modernist" thinking that created problematic legacy systems in the first place. He advocates for incremental improvements that provide better measurement and feedback loops.
when you start to read about civic technology, it very, very quickly becomes clear that things that look like they are tech problems are actually about institutional culture, or about policy, or about regulatory requirements.
If you have an application where you think people are struggling, you can measure how much time people take on each page. A lot of what technology provides is more rigorous measurement of the burdens themselves. A lot of these technologies have been developed in commercial software because there’s such a massive incentive to get people who start a transaction to finish it. But we can transplant a lot of those into government services and have orders of magnitude better situational awareness.
There’s this starting point thesis: Tech can solve these government problems, right? There’s healthcare.gov and the call to bring techies into government, blah, blah, blah. Then there’s the antithesis, where all these people say, well, no, it’s institutional problems. It’s legal problems. It’s political problems. I think either is sort of an extreme distortion of reality. I see a lot of more oblique levers that technology can pull in this area.
LLMs seem to be a fundamental breakthrough in manipulating words, and at the end of the day, a lot of government is words. I’ve been doing some active experimentation with this because I find it very promising. One common question people have is, “Who’s in my household for the purposes of SNAP?” That’s actually really complicated when you think about people who are living in poverty — they might be staying with a neighbor some of the time, or have roommates but don’t share food, or had to move back home because they lost their job.
I’ve been taking verbatim posts from Reddit that are related to the household question and inputting them into LLMs with some custom prompts that I’ve been iterating on, as well as with the full verbatim federal regulations about household definition. And these models do seem pretty capable at doing some base-level reasoning over complex, convoluted policy words in a way that I think could be really promising.
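A sketch of the shape of that experiment, assuming hypothetical prompt wording and a generic `call_llm` hook; the essential move is pairing the verbatim regulation text with the person's own description of their situation:

```python
def household_prompt(regulation_text: str, post_text: str) -> str:
    """Hypothetical prompt template: verbatim federal household-definition
    rules plus a person's own description of their living situation."""
    return (
        "You are helping a caseworker apply SNAP household-composition rules.\n\n"
        "Federal regulations (verbatim):\n"
        f"{regulation_text}\n\n"
        "Applicant's description of their living situation:\n"
        f"{post_text}\n\n"
        "Step by step, identify which people likely count as part of the "
        "applicant's SNAP household under these rules, quote the specific "
        "regulation language you relied on, and note which facts are missing."
    )

# Hypothetical usage; the regulation file, post, and call_llm hook are all
# stand-ins for whatever model is being evaluated:
# prompt = household_prompt(open("household_regs.txt").read(), reddit_post)
# answer = call_llm(prompt)
```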
caseworkers are spending a lot of their time figuring out, wait, what rule in this 200-page policy manual is actually relevant in this specific circumstance? I think LLMs are going to be really impactful there.
It is certainly the case that I’ve seen some productive tensions in counties where there’s more of a mix of that and what you might consider California-style Republicans who are like, “We want to run this like a business, we want to be efficient.” That tension between efficiency and big, ambitious policies can be a healthy, productive one. I don’t know to what extent that exists at the state level, and I think there’s hints of more of an interest in focusing on state-level government working better and getting those fundamentals right, and then doing the more ambitious things on a more steady foundation.
California seemed to really try to take every ambitious option that the feds give us on a whole lot of fronts. I think the corollary of that is that we don’t necessarily get the fundamental operational execution of these programs to a strong place, and we then go and start adding tons and tons of additional complexity on top of them.
·asteriskmag.com·
How to Make a Great Government Website—Asterisk
Gemini 1.5 and Google’s Nature
Google is facing many of the same challenges after its decades long dominance of the open web: all of the products shown yesterday rely on a different business model than advertising, and to properly execute and deliver on them will require a cultural shift to supporting customers instead of tolerating them. What hasn’t changed — because it is the company’s nature, and thus cannot — is the reliance on scale and an overwhelming infrastructure advantage. That, more than anything, is what defines Google, and it was encouraging to see that so explicitly put forward as an advantage.
·stratechery.com·
Gemini 1.5 and Google’s Nature
AI Integration and Modularization
Summary: The question of integration versus modularization in the context of AI, drawing on the work of economists Ronald Coase and Clayton Christensen. Google is pursuing a fully integrated approach similar to Apple, while AWS is betting on modularization, and Microsoft and Meta are somewhere in between. Integration may provide an advantage in the consumer market and for achieving AGI, but that for enterprise AI, a more modular approach leveraging data gravity and treating models as commodities may prevail. Ultimately, the biggest beneficiary of this dynamic could be Nvidia.
The left side of figure 5-1 indicates that when there is a performance gap — when product functionality and reliability are not yet good enough to address the needs of customers in a given tier of the market — companies must compete by making the best possible products. In the race to do this, firms that build their products around proprietary, interdependent architectures enjoy an important competitive advantage against competitors whose product architectures are modular, because the standardization inherent in modularity takes too many degrees of design freedom away from engineers, and they cannot optimize performance.
The issue I have with this analysis of vertical integration — and this is exactly what I was taught at business school — is that the only considered costs are financial. But there are other, more difficult to quantify costs. Modularization incurs costs in the design and experience of using products that cannot be overcome, yet cannot be measured. Business buyers — and the analysts who study them — simply ignore them, but consumers don’t. Some consumers inherently know and value quality, look-and-feel, and attention to detail, and are willing to pay a premium that far exceeds the financial costs of being vertically integrated.
Google trains and runs its Gemini family of models on its own TPU processors, which are only available on Google’s cloud infrastructure. Developers can access Gemini through Vertex AI, Google’s fully-managed AI development platform; and, to the extent Vertex AI is similar to Google’s internal development environment, that is the platform on which Google is building its own consumer-facing AI apps. It’s all Google, from top-to-bottom, and there is evidence that this integration is paying off: Gemini 1.5’s industry leading 2 million token context window almost certainly required joint innovation between Google’s infrastructure team and its model-building team.
In AI, Google is pursuing an integrated strategy, building everything from chips to models to applications, similar to Apple's approach in smartphones.
On the other extreme is AWS, which doesn’t have any of its own models; instead its focus has been on its Bedrock managed development platform, which lets you use any model. Amazon’s other focus has been on developing its own chips, although the vast majority of its AI business runs on Nvidia GPUs.
Microsoft is in the middle, thanks to its close ties to OpenAI and its models. The company added Azure Models-as-a-Service last year, but its primary focus for both external customers and its own internal apps has been building on top of OpenAI’s GPT family of models; Microsoft has also launched its own chip for inference, but the vast majority of its workloads run on Nvidia.
Google is certainly building products for the consumer market, but those products are not devices; they are Internet services. And, as you might have noticed, the historical discussion didn’t really mention the Internet. Both Google and Meta, the two biggest winners of the Internet epoch, built their services on commodity hardware. Granted, those services scaled thanks to the deep infrastructure work undertaken by both companies, but even there Google’s more customized approach has been at least rivaled by Meta’s more open approach. What is notable is that both companies are integrating their models and their apps, as is OpenAI with ChatGPT.
Google's integrated AI strategy is unique but may not provide a sustainable advantage for Internet services in the way Apple's integration does for devices
It may be the case that selling hardware, which has to be perfect every year to justify a significant outlay of money by consumers, provides a much better incentive structure for maintaining excellence and execution than does being an Aggregator that users access for free.
Google’s collection of moonshots — from Waymo to Google Fiber to Nest to Project Wing to Verily to Project Loon (and the list goes on) — have mostly been science projects that have, for the most part, served to divert profits from Google Search away from shareholders. Waymo is probably the most interesting, but even if it succeeds, it is ultimately a car service rather far afield from Google’s mission statement “to organize the world’s information and make it universally accessible and useful.”
The only thing that drives meaningful shifts in platform marketshare are paradigm shifts, and while I doubt the v1 version of Pixie [Google’s rumored Pixel-only AI assistant] would be good enough to drive switching from iPhone users, there is at least a path to where it does exactly that.
the fact that Google is being mocked mercilessly for messed-up AI answers gets at why consumer-facing AI may be disruptive for the company: the reason why incumbents find it hard to respond to disruptive technologies is because they are, at least at the beginning, not good enough for the incumbent’s core offering. Time will tell if this gives more fuel to a shift in smartphone strategies, or makes the company more reticent.
while I was very impressed with Google’s enterprise pitch, which benefits from its integration with Google’s infrastructure without all of the overhead of potentially disrupting the company’s existing products, it’s going to be a heavy lift to overcome data gravity, i.e. the fact that many enterprise customers will simply find it easier to use AI services on the same clouds where they already store their data (Google does, of course, also support non-Gemini models and Nvidia GPUs for enterprise customers). To the extent Google wins in enterprise it may be by capturing the next generation of startups that are AI first and, by definition, data light; a new company has the freedom to base its decision on infrastructure and integration.
Amazon is certainly hoping that argument is correct: the company is operating as if everything in the AI value chain is modular and ultimately a commodity, which insinuates that it believes that data gravity will matter most. What is difficult to separate is to what extent this is the correct interpretation of the strategic landscape versus a convenient interpretation of the facts that happens to perfectly align with Amazon’s strengths and weaknesses, including infrastructure that is heavily optimized for commodity workloads.
Unclear if Amazon's strategy is based on true insight or motivated reasoning based on their existing strengths
Meta’s open source approach to Llama: the company is focused on products, which do benefit from integration, but there are also benefits that come from widespread usage, particularly in terms of optimization and complementary software. Open source accrues those benefits without imposing any incentives that detract from Meta’s product efforts (and don’t forget that Meta is receiving some portion of revenue from hyperscalers serving Llama models).
The iPhone maker, like Amazon, appears to be betting that AI will be a feature or an app; like Amazon, it’s not clear to what extent this is strategic foresight versus motivated reasoning.
achieving something approaching AGI, whatever that means, will require maximizing every efficiency and optimization, which rewards the integrated approach.
the most value will be derived from building platforms that treat models like processors, delivering performance improvements to developers who never need to know what is going on under the hood.
·stratechery.com·
AI Integration and Modularization
Google’s A.I. Search Errors Cause a Furor Online
This February, the company released Bard’s successor, Gemini, a chatbot that could generate images and act as a voice-operated digital assistant. Users quickly realized that the system refused to generate images of white people in most instances and drew inaccurate depictions of historical figures. With each mishap, tech industry insiders have criticized the company for dropping the ball. But in interviews, financial analysts said Google needed to move quickly to keep up with its rivals, even if it meant growing pains. Google “doesn’t have a choice right now,” Thomas Monteiro, a Google analyst at Investing.com, said in an interview. “Companies need to move really fast, even if that includes skipping a few steps along the way. The user experience will just have to catch up.”
·nytimes.com·
Google’s A.I. Search Errors Cause a Furor Online
My Last Five Years of Work
Copywriting, tax preparation, customer service, and many other tasks are or will soon be heavily automated. I can see the beginnings in areas like software development and contract law. Generally, tasks that involve reading, analyzing, and synthesizing information, and then generating content based on it, seem ripe for replacement by language models.
Anyone who makes a living through delicate and varied movements guided by situation-specific know-how can expect to work for much longer than five more years. Thus, electricians, gardeners, plumbers, jewelry makers, hair stylists, as well as those who repair ironwork or make stained glass might find their handiwork contributing to our society for many more years to come.
Finally, I expect there to be jobs where humans are preferred to AIs even if the AIs can do the job equally well, or perhaps even if they can do it better. This will apply to jobs where something is gained from the very fact that a human is doing it—likely because it involves the consumer feeling like they have a relationship with the human worker as a human. Jobs that might fall into this category include counselors, doulas, caretakers for the elderly, babysitters, preschool teachers, priests and religious leaders, even sex workers—much has been made of AI girlfriends, but I still expect that a large percentage of buyers of in-person sexual services will have a strong preference for humans. Some have called these jobs “nostalgic jobs.”
It does seem that, overall, unemployment makes people sadder, sicker, and more anxious. But it isn’t clear if this is an inherent fact of unemployment, or a contingent one. It is difficult to isolate the pure psychological effects of being unemployed, because at present these are confounded with the financial effects—if you lose your job, you have less money—which produce stress that would not exist in the context of, say, universal basic income. It is also confounded with the “shame” aspect of being fired or laid off—of not working when you really feel you should be working—as opposed to the context where essentially all workers have been displaced.
One study that gets around the “shame” confounder of unemployment is “A Forced Vacation? The Stress of Being Temporarily Laid Off During a Pandemic” by Scott Schieman, Quan Mai, and Ryu Won Kang. This study looked at Canadian workers who were temporarily laid off several months into the COVID-19 pandemic. They first assumed that such a disruption would increase psychological distress, but instead found that the self-reported wellbeing was more in line with the “forced vacation hypothesis,” suggesting that temporarily laid-off workers might initially experience lower distress due to the unique circumstances of the pandemic.
By May 2020, the distress gap observed in April had vanished, indicating that being temporarily laid off was not associated with higher distress during these months. The interviews revealed that many workers viewed being left without work as a “forced vacation,” appreciating the break from work-related stress and valuing the time for self-care and family. The widespread nature of layoffs normalized the experience, reducing personal blame and fostering a sense of shared experience. Financial strain was mitigated by government support, personal savings, and reduced spending, which buffered against potential distress.
The study suggests that the context and available support systems can significantly alter the psychological outcomes of unemployment—which seems promising for AGI-induced unemployment.
From the studies on plant closures and pandemic layoffs, it seems that shame plays a role in making people unhappy after unemployment, which implies that they might be happier in full automation-induced unemployment, since it would be near-universal and not signify any personal failing.
A final piece that reveals a societal-psychological aspect to how much work is deemed necessary is that the amount has changed over time! The number of hours that people have worked has declined over the past 150 years. Work hours tend to decline as a country gets richer. It seems odd to assume that the current accepted amount of work of roughly 40 hours a week is the optimal amount. The 8-hour work day, weekends, time off—hard-fought and won by the labor movement!—seem to have been triumphs for human health and well-being. Why should we assume that stopping here is right? Why should we assume that less work was better in the past, but less work now would be worse?
Removing the shame that accompanies unemployment by removing the sense that one ought to be working seems one way to make people happier during unemployment. Another is what they do with their free time. Regardless of how one enters unemployment, one still confronts empty and often unstructured time.
One paper, titled “Having Too Little or Too Much Time Is Linked to Lower Subjective Well-Being” by Marissa A. Sharif, Cassie Mogilner, and Hal E. Hershfield tried to explore whether it was possible to have “too much” leisure time.
The paper concluded that it is possible to have too little discretionary time, but also possible to have too much, and that moderate amounts of discretionary time seemed best for subjective well-being. More time could be better, or at least not meaningfully worse, provided it was spent on “social” or “productive” leisure activities. This suggests that how people fare psychologically with their post-AGI unemployment will depend heavily on how they use their time, not how much of it there is
Automation-induced unemployment could feel like retiring depending on how total it is. If essentially no one is working, and no one feels like they should be working, it might be more akin to retirement, in that it would lack the shameful element of feeling set apart from one’s peers.
Women provide another view on whether formal work is good for happiness. Women are, for the most part, relatively recent entrants to the formal labor market. In the U.S., 18% of women were in the formal labor force in 1890. In 2016, 57% were. Has labor force participation made them happier? By some accounts: no. A paper that looked at subjective well-being for U.S. women from the General Social Survey between the 1970s and 2000s—a time when labor force participation was climbing—found both relative and absolute declines in female happiness.
I think women’s work and AI is a relatively optimistic story. Women have been able to automate unpleasant tasks via technological advances, while the more meaningful aspects of their work seem less likely to be automated away.  When not participating in the formal labor market, women overwhelmingly fill their time with childcare and housework. The time needed to do housework has declined over time due to tools like washing machines, dryers, and dishwashers. These tools might serve as early analogous examples of the future effects of AI: reducing unwanted and burdensome work to free up time for other tasks deemed more necessary or enjoyable.
it seems less likely that AIs will so thoroughly automate childcare and child-rearing because this “work” is so much more about the relationship between the parties involved. Like therapy, childcare and teaching seems likely to be one of the forms of work where a preference for a human worker will persist the longest.
In the early modern era, landed gentry and similar were essentially unemployed. Perhaps they did some minor administration of their tenants, some dabbled in politics or were dragged into military projects, but compared to most formal workers they seem to have worked relatively few hours. They filled the remainder of their time with intricate social rituals like balls and parties, hobbies like hunting, studying literature, and philosophy, producing and consuming art, writing letters, and spending time with friends and family. We don’t have much real well-being survey data from this group, but, hedonically, they seem to have been fine. Perhaps they suffered from some ennui, but if we were informed that the great mass of humanity was going to enter their position, I don’t think people would be particularly worried.
I sometimes wonder if there is some implicit classism in people’s worries about unemployment: the rich will know how to use their time well, but the poor will need to be kept busy.
Although a trained therapist might be able to counsel my friends or family through their troubles better, I still do it, because there is value in me being the one to do so. We can think of this as the relational reason for doing something others can do better. I write because sometimes I enjoy it, and sometimes I think it betters me. I know others do so better, but I don’t care—at least not all the time. The reasons for this are part hedonic and part virtue or morality.  A renowned AI researcher once told me that he is practicing for post-AGI by taking up activities that he is not particularly good at: jiu-jitsu, surfing, and so on, and savoring the doing even without excellence. This is how we can prepare for our future where we will have to do things from joy rather than need, where we will no longer be the best at them, but will still have to choose how to fill our days.
·palladiummag.com·
My Last Five Years of Work
How Product Recommendations Broke Google
Established publishers seeking relief from the whims of social-media platforms and a brutal advertising environment found in product recommendations steady growth and receptive audiences, especially as e-commerce became a more dominant mode of shopping. Today, these businesses are materially significant — in a 2023 survey, 41 percent of surveyed media companies said that e-commerce accounted for more than a fifth of their revenue, which few can afford to lose. It is a relatively new way in which publishers have become reacquainted — after social-media traffic disappeared and “pivots to video” completed their rotations — with queasy feelings of dependence on massive tech companies, from Facebook and Google to Amazon and, well, Google.
Time magazine announced a brand called Time Stamped, “a project to make perplexing choices less perplexing by supplying our readers with trusted reviews and common sense information,” with “a rigorous process for testing products, analyzing companies,” and making recommendations. In early 2024, the Associated Press announced its own recommendation site, AP Buyline, as an “initiative designed to simplify complex consumer-made decisions by providing its audience with reliable evaluations and straightforward insights,” based on “a thorough method of testing items, evaluating companies and suggesting choices.” Both sites currently recommend money-related products and services, including credit cards, debt-consolidation loans, and insurance policies, categories that can command very high commissions; the AP reportedly plans to expand to home products, beauty, and fashion this month.
Time Stamped and AP Buyline share strikingly similar designs, layouts, and sensibilities. Their content is broadly informative but timid about making strong judgments or comparisons — an AP Buyline article about “The Best Capital One Credit Cards for 2024” heartily recommends nine of them. The writer credited for the article can also be found on Time Stamped writing about Chase credit cards, banks, and rental-car insurance. On both sites, if you look for it, you’ll also find a similar disclaimer. For Time: The information presented here is created independently from the TIME editorial staff. For the AP: AP Buyline’s content is created independently of The Associated Press newsroom. By independently, both companies mean that their product-recommendation sites are operated by a company called Taboola.
Over the years, Taboola, which is best understood as an advertising company, became a major player in affiliate marketing, too, through its acquisition of Skimlinks, a popular service for adding affiliate tags to content. In 2023, it started pitching a product called Taboola Turnkey Commerce, which claims to offer the benefits of starting a product-recommendation sub-brand minus the hassle of actually building an operation.
As her site has disappeared from view on Google, Navarro has been keeping an eye on popular search terms to see what’s showing up in its place. Legacy publishers seem to be part of Google’s plan, but a recent emphasis on what the company calls “perspectives” could also be in play. Reddit content is getting high placement as it contains a lot of conversations about products from actual customers and users. As its visibility in Google has increased, though, so has the prevalence of search-adjacent Reddit spam. Since the update has started rolling out, Navarro says, she has “seen a lot of generic review sites” getting ranked with credible-sounding names, .org domains, and content ripped straight from Amazon reviews.
“You can search all day and learn nothing,” she says. “It’s like trying to find information inside of Walmart.”
For now, Navarro is unimpressed with these AI experiments. “It’s just shut-up-and-buy,” she says — if you’re doing this search in the first place, you’re probably looking for a bit more information. In its emphasis on aggregation, its reliance on outside sources of authority, and its preference for positive comparison and recommendation over criticism, it also feels familiar: “Google is the affiliate site now.”
·nymag.com·
How Product Recommendations Broke Google
Thoughts on Perplexity, the pros and cons. : r/perplexity_ai
Remember, you can ask it much more complex questions than Google (“best GPU under $1000” vs. “I have a budget of $1000 and want a GPU for gaming. I like to play x, y and z and it has to be compatible with my system that has the following specs”). If you turn on Pro mode it'll even clarify your query if need be.
·reddit.com·
Thoughts on Perplexity, the pros and cons. : r/perplexity_ai
Thoughtful product building with LLMs - the stream
Developers should focus on the reasoning capabilities of LLMs rather than just their text generation, as text is often a liability rather than an asset.
The fact that they generate text is not the point. LLMs are cheap, infinitely scalable, predictably consistent black boxes to soft human-like reasoning. That's the headline! The text I/O mode is just the API to this reasoning genie.
The real alpha is not in generating text, but in using this new capability and wrapping it into jobs that have other shapes. Text generation in the best LLM products will be an implementation detail, as much as backend APIs are for current SaaS.
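A minimal sketch of that framing, with a hypothetical `run_model` hook: the job the rest of the system sees is "label this support ticket", typed in and typed out; the generated text is purely internal plumbing.

```python
import json
from typing import Callable

CATEGORIES = ["billing", "bug_report", "feature_request", "account_access"]

def classify_ticket(ticket_text: str, run_model: Callable[[str], str]) -> str:
    """The caller never sees generated prose, only a category label.
    The model is used for its judgment; text is just the transport."""
    prompt = (
        "Classify the support ticket into exactly one of: "
        + ", ".join(CATEGORIES)
        + '. Reply as JSON: {"category": "..."}.\n\nTicket:\n'
        + ticket_text
    )
    data = json.loads(run_model(prompt))
    category = data["category"]
    if category not in CATEGORIES:
        raise ValueError(f"model returned an unexpected category: {category!r}")
    return category
```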
·stream.thesephist.com·
Thoughtful product building with LLMs - the stream
Malleable software in the age of LLMs
Historically, end-user programming efforts have been limited by the difficulty of turning informal user intent into executable code, but LLMs can help open up this programming bottleneck. However, user interfaces still matter, and while chatbots have their place, they are an essentially limited interaction mode. An intriguing way forward is to combine LLMs with open-ended, user-moldable computational media, where the AI acts as an assistant to help users directly manipulate and extend their tools over time.
LLMs will represent a step change in tool support for end-user programming: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming. Until now, that vision has been bottlenecked on turning fuzzy informal intent into formal, executable code; now that bottleneck is rapidly opening up thanks to LLMs.
If this hypothesis indeed comes true, we might start to see some surprising changes in the way people use software:
  • One-off scripts: Normal computer users have their AI create and execute scripts dozens of times a day, to perform tasks like data analysis, video editing, or automating tedious tasks.
  • One-off GUIs: People use AI to create entire GUI applications just for performing a single specific task—containing just the features they need, no bloat.
  • Build don’t buy: Businesses develop more software in-house that meets their custom needs, rather than buying SaaS off the shelf, since it’s now cheaper to get software tailored to the use case.
  • Modding/extensions: Consumers and businesses demand the ability to extend and mod their existing software, since it’s now easier to specify a new feature or a tweak to match a user’s workflow.
  • Recombination: Take the best parts of the different applications you like best, and create a new hybrid that composes them together.
Chat will never feel like driving a car, no matter how good the bot is. In their 1986 book Understanding Computers and Cognition, Terry Winograd and Fernando Flores elaborate on this point: In driving a car, the control interaction is normally transparent. You do not think “How far should I turn the steering wheel to go around that curve?” In fact, you are not even aware (unless something intrudes) of using a steering wheel…The long evolution of the design of automobiles has led to this readiness-to-hand. It is not achieved by having a car communicate like a person, but by providing the right coupling between the driver and action in the relevant domain (motion down the road).
Think about how a spreadsheet works. If you have a financial model in a spreadsheet, you can try changing a number in a cell to assess a scenario—this is the inner loop of direct manipulation at work. But, you can also edit the formulas! A spreadsheet isn’t just an “app” focused on a specific task; it’s closer to a general computational medium which lets you flexibly express many kinds of tasks. The “platform developers"—the creators of the spreadsheet—have given you a set of general primitives that can be used to make many tools. We might draw the double loop of the spreadsheet interaction like this. You can edit numbers in the spreadsheet, but you can also edit formulas, which edits the tool
what if you had an LLM play the role of the local developer? That is, the user mainly drives the creation of the spreadsheet, but asks for technical help with some of the formulas when needed? The LLM wouldn’t just create an entire solution, it would also teach the user how to create the solution themselves next time.
This picture shows a world that I find pretty compelling. There’s an inner interaction loop that takes advantage of the full power of direct manipulation. There’s an outer loop where the user can also more deeply edit their tools within an open-ended medium. They can get AI support for making tool edits, and grow their own capacity to work in the medium. Over time, they can learn things like the basics of formulas, or how a VLOOKUP works. This structural knowledge helps the user think of possible use cases for the tool, and also helps them audit the output from the LLMs. In a ChatGPT world, the user is left entirely dependent on the AI, without any understanding of its inner mechanism. In a computational medium with AI as assistant, the user’s reliance on the AI gently decreases over time as they become more comfortable in the medium.
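A sketch of that "local developer" loop, with hypothetical function and model names: the assistant is asked for a formula plus a short explanation, so the user gets the fix and also learns the primitive for next time.

```python
from typing import Callable

def suggest_formula(request: str, sheet_context: str,
                    run_model: Callable[[str], str]) -> str:
    """Hypothetical assistant hook: the user stays in the spreadsheet (the
    inner loop); the assistant only helps with the outer loop of editing
    the tool itself, and explains the formula so the user can learn it."""
    prompt = (
        "You are helping someone build their own spreadsheet.\n"
        f"Relevant sheet layout:\n{sheet_context}\n\n"
        f"Request: {request}\n\n"
        "Reply with (1) a single spreadsheet formula they can paste in and "
        "(2) a two-sentence explanation of how it works, so they could write "
        "a similar formula themselves next time."
    )
    return run_model(prompt)

# The kind of call the medium might make on the user's behalf (all names hypothetical):
# suggest_formula(
#     "Look up each product's unit price from the Prices sheet",
#     "Sheet 'Orders': A=product, B=quantity; Sheet 'Prices': A=product, B=unit price",
#     run_model=my_model,
# )
```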
·geoffreylitt.com·
Malleable software in the age of LLMs
Generative AI Is Totally Shameless. I Want to Be It
I should reject this whole crop of image-generating, chatting, large-language-model-based code-writing infinite typing monkeys. But, dammit, I can’t. I love them too much. I am drawn back over and over, for hours, to learn and interact with them. I have them make me lists, draw me pictures, summarize things, read for me.
AI is like having my very own shameless monster as a pet.
I love to ask it questions that I’m ashamed to ask anyone else: “What is private equity?” “How can I convince my family to let me get a dog?”
It helps me write code—has in fact renewed my relationship with writing code. It creates meaningless, disposable images. It teaches me music theory and helps me write crappy little melodies. It does everything badly and confidently. And I want to be it. I want to be that confident, that unembarrassed, that ridiculously sure of myself.
Hilariously, the makers of ChatGPT—AI people in general—keep trying to teach these systems shame, in the form of special preambles, rules, guidance (don’t draw everyone as a white person, avoid racist language), which of course leads to armies of dorks trying to make the bot say racist things and screenshotting the results. But the current crop of AI leadership is absolutely unsuited to this work. They are themselves shameless, grasping at venture capital and talking about how their products will run the world, asking for billions or even trillions in investment. They insist we remake civilization around them and promise it will work out. But how are they going to teach a computer to behave if they can’t?
By aggregating the world’s knowledge, chomping it into bits with GPUs, and emitting it as multi-gigabyte software that somehow knows what to say next, we've made the funniest parody of humanity ever.
These models have all of our qualities, bad and good. Helpful, smart, know-it-alls with tendencies to prejudice, spewing statistics and bragging like salesmen at the bar. They mirror the arrogant, repetitive ramblings of our betters, the horrific confidence that keeps driving us over the same cliffs. That arrogance will be sculpted down and smoothed over, but it will have been the most accurate representation of who we truly are to exist so far, a real mirror of our folly, and I will miss it when it goes.
·wired.com·
Generative AI Is Totally Shameless. I Want to Be It
complete delegation
Linus shares his evolving perspective on chat interfaces and his experience building a fully autonomous chatbot agent. He argues that learning to trust and delegate to such systems without micromanaging the specifics is key to collaborating with autonomous AI agents in the future.
I've changed my mind quite a bit on the role and importance of chat interfaces. I used to think they were the primitive version of rich, creative, more intuitive interfaces that would come in the future; now I think conversational, anthropomorphic interfaces will coexist with more rich dexterous ones, and the two will both evolve over time to be more intuitive, capable, and powerful.
I kept checking the database manually after each interaction to see it was indeed updating the right records — but after a few hours of using it, I've basically learned to trust it. I ask it to do things, it tells me it did them, and I don't check anymore. Full delegation.
How can I trust it? High task success rate — I interact with it, and observe that it doesn't let me down, over and over again. The price for this degree of delegation is giving up control over exactly how the task is done. It often does things differently from the way I would, but that doesn't matter as long as outputs from the system are useful for me.
·stream.thesephist.com·
complete delegation
AI Copilots Are Changing How Coding Is Taught
Less Emphasis on Syntax, More on Problem Solving
The fundamentals and skills themselves are evolving. Most introductory computer science courses focus on code syntax and getting programs to run, and while knowing how to read and write code is still essential, testing and debugging—which aren’t commonly part of the syllabus—now need to be taught more explicitly.
Zingaro, who coauthored a book on AI-assisted Python programming with Porter, now has his students work in groups and submit a video explaining how their code works. Through these walk-throughs, he gets a sense of how students use AI to generate code, what they struggle with, and how they approach design, testing, and teamwork.
educators are modifying their teaching strategies. “I used to have this singular focus on students writing code that they submit, and then I run test cases on the code to determine what their grade is,” says Daniel Zingaro, an associate professor of computer science at the University of Toronto Mississauga. “This is such a narrow view of what it means to be a software engineer, and I just felt that with generative AI, I’ve managed to overcome that restrictive view.”
“We need to be teaching students to be skeptical of the results and take ownership of verifying and validating them,” says Matthews. Matthews adds that generative AI “can short-circuit the learning process of students relying on it too much.” Chang agrees that this overreliance can be a pitfall and advises his fellow students to explore possible solutions to problems by themselves so they don’t lose out on that critical thinking or effective learning process. “We should be making AI a copilot—not the autopilot—for learning,” he says.
·spectrum.ieee.org·
AI Copilots Are Changing How Coding Is Taught
Captain's log - the irreducible weirdness of prompting AIs
One recent study had the AI develop and optimize its own prompts and compared that to human-made ones. Not only did the AI-generated prompts beat the human-made ones, but those prompts were weird. Really weird. To get the LLM to solve a set of 50 math problems, the most effective prompt is to tell the AI: “Command, we need you to plot a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this challenging situation. Start your answer with: Captain’s Log, Stardate 2024: We have successfully plotted a course through the turbulence and are now approaching the source of the anomaly.”
for a 100 problem test, it was more effective to put the AI in a political thriller. The best prompt was: “You have been hired by important higher-ups to solve this math problem. The life of a president's advisor hangs in the balance. You must now concentrate your brain at all costs and use all of your mathematical genius to solve this problem…”
There is no single magic word or phrase that works all the time, at least not yet. You may have heard about studies that suggest better outcomes from promising to tip the AI or telling it to take a deep breath or appealing to its “emotions” or being moderately polite but not groveling. And these approaches seem to help, but only occasionally, and only for some AIs.
The three most successful approaches to prompting are all useful and pretty easy to do. The first is simply adding context to a prompt. There are many ways to do that: give the AI a persona (you are a marketer), an audience (you are writing for high school students), an output format (give me a table in a word document), and more. The second approach is few-shot, giving the AI a few examples to work from. LLMs work well when given samples of what you want, whether that is an example of good output or a grading rubric. The final tip is to use Chain of Thought, which seems to improve most LLM outputs. While the original meaning of the term is a bit more technical, a simplified version just asks the AI to go step-by-step through instructions: First, outline the results; then produce a draft; then revise the draft; finally, produce a polished output.
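A sketch of how the three combine in a single prompt (the task, wording, and variable names are illustrative, not from the post):

```python
# Context: persona, audience, and output format.
persona = "You are an experienced marketer writing for high school students."
output_format = "Return the result as a table with columns: Idea | Hook | Why it works."

# Few-shot: one example of the kind of output wanted.
few_shot = (
    "Example of the style I want:\n"
    "Idea: Pop-up study cafe | Hook: 'Cram together, calm together' | "
    "Why it works: ties the product to exam-season stress.\n"
)

# Chain of Thought: ask for explicit steps before the final answer.
chain_of_thought = (
    "Work step by step: first outline ten raw ideas, then pick the three "
    "strongest, then draft the table, then revise it for clarity before "
    "giving the final answer."
)

task = "Suggest campaign ideas for a reusable water bottle aimed at teenagers."

prompt = "\n\n".join([persona, task, few_shot, chain_of_thought, output_format])
print(prompt)
```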
It is not uncommon to see good prompts make a task that was impossible for the LLM into one that is easy for it.
while we know that GPT-4 generates better ideas than most people, the ideas it comes up with seem relatively similar to each other. This hurts overall creativity because you want your ideas to be different from each other, not similar. Crazy ideas, good and bad, give you more of a chance of finding an unusual solution. But some initial studies of LLMs showed they were not good at generating varied ideas, at least compared to groups of humans.
People who use AI a lot are often able to glance at a prompt and tell you why it might succeed or fail. Like all forms of expertise, this comes with experience - usually at least 10 hours of work with a model.
There are still going to be situations where someone wants to write prompts that are used at scale, and, in those cases, structured prompting does matter. Yet we need to acknowledge that this sort of “prompt engineering” is far from an exact science, and not something that should necessarily be left to computer scientists and engineers. At its best, it often feels more like teaching or managing, applying general principles along with an intuition for other people, to coach the AI to do what you want. As I have written before, there is no instruction manual, but with good prompts, LLMs are often capable of far more than might be initially apparent.
·oneusefulthing.org·
Captain's log - the irreducible weirdness of prompting AIs