‘This Is Going to Be Painful’: How a Bold A.I. Device Flopped
Days before gadget reviewers weighed in on the Humane Ai Pin, a futuristic wearable device powered by artificial intelligence, the founders of the company gathered their employees and encouraged them to brace themselves. The reviews might be disappointing, they warned. Humane’s founders, Bethany Bongiorno and Imran Chaudhri, were right. In April, reviewers brutally panned the new $699 product, which Humane had marketed for a year with ads and at glitzy events like Paris Fashion Week. The Ai Pin was “totally broken” and had “glaring flaws,” some reviewers said. One declared it “the worst product I’ve ever reviewed.”
In recent months, the company has also grappled with employee departures and changed a return policy to address canceled orders. On Wednesday, it asked customers to stop using the Ai Pin charging case because of a fire risk associated with its battery.
Its setbacks are part of a pattern of stumbles across the world of generative A.I., as companies release unpolished products. Over the past two years, Google has introduced and pared back A.I. search abilities that recommended people eat rocks, Microsoft has trumpeted a Bing chatbot that hallucinated and Samsung has added A.I. features to a smartphone that were called “excellent at times and baffling at others.”
This account of Humane is based on interviews with 23 current and former employees, advisers and investors, who requested anonymity because they were not authorized to speak publicly about the matter or feared retaliation.
Many current and former employees said Mr. Chaudhri and Ms. Bongiorno preferred positivity over criticism, leading them to disregard warnings about the Ai Pin’s poor battery life and power consumption. A senior software engineer was dismissed after raising questions about the product, they said, while others left out of frustration.
From the beginning, current and former employees said, the Ai Pin had issues, which reviewers later picked apart. One was the device’s laser display, which consumed tremendous power and would cause the pin to overheat. Before showing the gadget to prospective partners and investors, Humane executives often chilled it on ice packs so it would last longer, three people familiar with the demonstrations said. Those employees said such measures could be common early in a product development cycle.
When employees expressed concerns about the heat, they said, Humane’s founders replied that software improvements reducing power use would fix it. Mr. Chaudhri, who led design, wanted to keep the gadget’s sleek design, three people said.
Some employees tried persuading the founders not to launch the Ai Pin because it wasn’t ready, three people said. Others repeatedly asked them to hire a head of marketing. The role remained vacant before the product’s release.
a senior software engineer was let go after she questioned whether the Ai Pin would be ready by April. In a company meeting after the dismissal, Mr. Chaudhri and Ms. Bongiorno said the employee had violated policy by talking negatively about Humane, two attendees said.
·nytimes.com·
What’s a secret all gay men keep that straight people don’t know? : r/askgaybros
When you grow up having to navigate the world with two minds, you can at least (hopefully) bask in the absurdity of it all. It also helps numb the pain.
Growing up differently and gay oftentimes made us feel alienated, lonely, and the black sheep of our families. To figure out who we really were and to learn to navigate the world in a healthy way we were forced to do a form of work that not many straight people are confronted with. The stuff that bothers straight men I know seriously makes me laugh. You can tell they've had to never do the work to search deep within themselves to find meaning and to move past unacceptance. I seriously look at being gay as a gift now. I wouldn't change it for all the money in the world because I'm proud and grateful to be who I am. I've honestly become the systemic change in my family because I've never had to follow the cookie cutter mold and I'm not afraid to speak up and voice important opinions.
The amount of self reflection one has to go through for being gay in this world is insane.
I'm still in my early 20s, but I've changed so much backwards thinking in my family just by being myself and challenging some of their opinions. Also, when straight men talk about being lonely I always just laugh and tell them I do feel lonely, but I've been lonely since I was like 12 or so; at this point I don't even feel lonely, I've learnt how to keep myself company due to years of introspection.
Rules matter less than people think. We've already broken the grow up get married to a nice girl and have children rules that most of the world goes along with, so we tend to be more questioning when it comes to other rules
You can live your life however you want - not to societal expectations. Your partner is someone you likely truly get and gets you because they're also a guy - no mystery or gender related differences. No external expectations of marriage, or having babies / kids etc. Also no PMS, no biological clock ticking and putting deadlines in your life. And also added bonus, if you're similar size then you can share your clothes.
Oppression doesn't make us all better people. Like the narrative that feminism pushes hard is that oppression makes us kinder, nicer, more empathic than straight white men... and sometimes that's true. Sometimes there's great examples of that. But so many of us are just broken trash people who no sane person would want in their life.
·reddit.com·
Why the State Department's intelligence agency may be the best in DC
Summary: The State Department's Bureau of Intelligence and Research (INR) is a small but highly effective intelligence agency that has made several prescient calls on major foreign policy issues, from the Vietnam War to the Iraq War to the 2022 Russian invasion of Ukraine. Despite its tiny size and budget compared to the CIA and other agencies, INR has distinguished itself through the expertise and longevity of its analysts, who average 14 years on their specific topics. INR's flat organizational structure, close integration with State Department policymakers, and culture of dissent have enabled it to avoid groupthink and make contrarian assessments that have often been vindicated. While not infallible, INR has earned a reputation as a "Cassandra" of the intelligence community for its track record of getting big things right when larger agencies got them wrong.
On top of that, INR has no spies abroad, no satellites in the sky, no bugs on any laptops. But it reads the same raw intel as everyone else, and in at least a few cases, was the only agency to get some key questions right.
Almost as soon as Avery arrived at INR in 1962, she and her supervisor Allen Whiting proved their mettle by predicting that China and India would engage in border clashes, then pause, then resume hostilities, then halt. All of that happened. But INR also had messages that the Kennedy and Johnson administrations of the time didn’t want to hear. In 1963, the bureau prepared a report of statistics on the war effort: the number of Viet Cong attacks and the number of prisoners, weapons, and defectors collected by the South. All of the trendlines were negative. The report prompted a furious protest from the Joint Chiefs of Staff, who argued that the South Vietnamese were succeeding.
The evidence that Hussein was reconstituting Iraq’s nuclear program — a contention that fueled Bush administration officials’ arguments for war, like national security adviser Condoleezza Rice’s famous quip, “We don’t want the smoking gun to be a mushroom cloud” — had two primary components. One was a finding that the Iraqi military had been purchasing a number of high-strength aluminum tubes, which the CIA and DIA thought could be used to build centrifuges for enriching uranium. On September 6, 2001, five days before the 9/11 attacks, INR issued a report disagreeing with that finding. For one thing, scientists at the Department of Energy had looked into the matter and found that Iraq had already disclosed in the past that it used aluminum tubes of the same specifications to manufacture artillery rockets, going back over a decade. Moreover, the new tubes were to be “anodized,” a treatment that renders them much less usable for centrifuges.
INR’s successful call on the 2022 Ukraine invasion reportedly came because OPN’s polling found that residents of eastern Ukraine were more anti-Russian and more eager to fight an invasion than previously suspected. The polling, Assistant Secretary Brett Holmgren says, has “allowed us to observe consistently, quarter over quarter, overwhelming Ukrainian will to fight across the board and willingness to continue to defend their territory and to take up arms against Russian aggression.”
While no single ingredient seems to explain its relative success, a few ingredients together might:

  • INR analysts are true experts. They are heavily recruited from PhD programs and even professorships, and have been on their subject matter (a set of countries, or a thematic specialty like trade flows or terrorism) for an average of 14 years. CIA analysts typically switch assignments every two to three years.
  • INR’s small size means that analyses are written by individuals, not by committee, and analysts have fewer editors and managers separating them from the policymakers they’re advising. That means less groupthink, and clearer individual perspectives.
  • INR staff work alongside State Department policymakers, meaning they get regular feedback on what kind of information is most useful to them.
But the flat structure, combined with the agency’s tiny size, means analysts get a great deal of freedom. Vic Raphael, who retired in 2022 as INR’s deputy in charge of analysis, notes that analysts’ work “would only go through three or four layers before we released it. The analyst, his peers, the office director, the analytic review staff, I’d look at it, and boom it went.” Very little separates a rank-and-file analyst from their ultimate consumer, whether that’s an assistant secretary or even the secretary of state.
The bureau also stands out as unusually embedded with policymakers. Analysts at other agencies aren’t working side by side with diplomats actually implementing foreign policy; INR analysts are in the same building as their colleagues in State Department bureaus managing policy toward specific countries, or on nonproliferation or drug trafficking, or on human rights and democracy. Goldberg, who led INR under Secretary of State Hillary Clinton, notes that “we could respond much more quickly than farming it out to another part of the intelligence community, because on a day-to-day basis, we had an idea of what was on her mind.”
Fingar told me yet another favorite win. "The specific issue was, would Argentina send troops to the multinational force in Haiti?" in 1994, as the US assembled a coalition of nations, under the banner of the UN, to invade and restore Haiti’s democratically elected president to office. "Our embassy had reported they'd be there. Argentine embassy in Washington: they'll be there. The State Department, the Argentine desk: they'll be there. [The CIA]: they'll be there.” But, “INR said, no, they won't.” The undersecretary running the meeting, Peter Tarnoff, asked which analyst at INR believed this. He was told it was Jim Buchanan. At that point, as Fingar remembers it, Tarnoff ended the meeting, because Buchanan’s opinion settled the matter. That’s how good Buchanan’s, and INR’s, reputation was. And sure enough, Argentina backed out on its promise to send troops.
·vox.com·
Can You Know Too Much About Your Organization?

A study of six high-performing project teams redesigning their organizations' operations revealed:

  • Many organizations lack purposeful, integrated design
  • Systems often result from ad hoc solutions and uncoordinated decisions
  • Significant waste and redundancy in processes

The study challenges the notion that only peripheral employees push for significant organizational change. It highlights the potential consequences of exposing employees to full operational complexity and suggests organizations consider how to retain talent after redesign projects.

Despite being experienced managers, what they learned was eye-opening. One explained that “it was like the sun rose for the first time. … I saw the bigger picture.” They had never seen the pieces — the jobs, technologies, tools, and routines — connected in one place, and they realized that their prior view was narrow and fractured. A team member acknowledged, “I only thought of things in the context of my span of control.”
The maps of the organization generated by the project teams also showed that their organizations often lacked a purposeful, integrated design that was centrally monitored and managed. There may originally have been such a design, but as the organization grew, adapted to changing markets, brought on new leadership, added or subtracted divisions, and so on, this animating vision was lost. The original design had been eroded, patched, and overgrown with alternative plans. A manager explained, “Everything I see around here was developed because of specific issues that popped up, and it was all done ad hoc and added onto each other. It certainly wasn’t engineered.”
“They see problems, and the general approach, the human approach, is to try and fix them. … Functions have tried to put band-aids on every issue that comes up. It sounds good, but when they are layered one on top of the other they start to choke the organization. But they don’t see that because they are only seeing their own thing.”
Ultimately, the managers realized that what they had previously attributed to the direction and control of centralized, bureaucratic forces was actually the aggregation of the distributed work and uncoordinated decisions of people dispersed throughout the organization. Everyone was working on the part of the organization they were familiar with, assuming that another set of people were attending to the larger picture, coordinating the larger system to achieve goals and keeping the organization operating. Except no one was actually looking at how people’s work was connecting across the organization day-to-day.
as they felt a sense of empowerment about changing the organization, they felt a sense of alienation about returning to their central roles. “You really start understanding all of the waste and all of the redundancy and all of the people who are employed as what I call intervention resources,” one person told us.
In the end, a slight majority of the employees returned to their role to continue their career (25 cases). They either were promoted (7 cases), moved laterally (8 cases), or returned to their jobs (10 cases). However, 23 chose organizational change roles.
This study suggests that when companies undertake organizational change efforts, they should consider not only the implications for the organization, but also for the people tasked to do the work. Further, it highlights just how infrequently we recognize how poorly designed and managed many of our organizations really are. Not acknowledging the dysfunction of existing routines protects us from seeing how much of our work is not actually adding value, something that may lead not just to unsatisfying work but to larger questions about the nature of organizational design, similar to those asked by the managers in my study. Knowledge of the systems we work in can be a source of power, yes. But when you realize you can’t effect the big changes your organization needs, it can also be a source of alienation.
·archive.is·
Maven

Maven is a new social network platform that aims to provide a different experience from traditional social media.

  • It does not have features like likes or follower counts, focusing instead on users following "interests" rather than individual accounts.
  • Content is surfaced based on relevance to the interests a user follows, curated by AI, rather than popularity metrics.
  • The goal is to minimize self-promotion and popularity contests, instead prioritizing valuable information and serendipitous discovery of new ideas and perspectives.
  • The author has been using Maven and finds it a slower, deeper experience compared to other social media, though unsure if it will become a regular timesink.
  • Overall, Maven presents an intriguing alternative model for social networking centered around interests and expanding horizons, rather than following individuals or chasing popularity.
·heymaven.com·
My favorite thing about getting older
But here’s a constant: each year you learn more about yourself. You see yourself in different environments, different styles of living, different communities and friend circles which reward slightly different things. You get to see yourself bend to the world around you as you evolve from one stage of life to another.
I’m convinced each of us has certain fundamental dispositions, whether they’re contained in our genes or attachment styles or Enneagram types. But we’re also prone to making up stories about ourselves, stories that we wish were true. Time is the best antidote to all our attempts at self-deception: it’s easy to lie to yourself for a day, but a lot harder to lie to yourself for a decade.
·bitsofwonder.co·
research as leisure activity
The idea of research as leisure activity has stayed with me because it seems to describe a kind of intellectual inquiry that comes from idiosyncratic passion and interest. It’s not about the formal credentials. It’s fundamentally about play. It seems to describe a life where it’s just fun to be reading, learning, writing, and collaborating on ideas.
Research as a leisure activity includes the qualities I described above: a desire to ask and answer questions, a commitment to evidence, an understanding of what already exists, an output, a certain degree of contemporary relevance, and a community. But it also involves the following qualities
Research as leisure activity is directed by passions and instincts. It’s fundamentally very personal: What are you interested in now? It’s fine, and maybe even better, if the topic isn’t explicitly intellectual or academic in nature. And if one topic leads you to another topic that seems totally unrelated, that’s something to get excited about—not fearful of. It’s a style of research that is well-suited for people okay with being dilettantes, who are comfortable with an idiosyncratic, non-comprehensive education in a particular domain.
Who is doing this kind of research as leisure activity? Artists, often. To return to the site that originally inspired this post—I’d say that the artist/designer/educator Laurel Schwulst uses Are.na to develop and refine particular themes, directions, topics of inquiry…some of which become artworks or essays or classes that she teaches.
People who read widely and attentively—and then publish the results of their reading—are also arguably performing research as a leisure activity. Maria Popova, who started writing a blog in 2006—now called The Marginalian—which collects her reading across literature, philosophy, psychology, the sciences. Her blog feels like leisurely research, to me, because it’s an accumulation of curious, semi-directed reading, which over time build up into a dense network of references and ideas—supported by previous reading, and enriched by her own commentary and links to similar ideas by other thinkers.
pretty much every writer, essayist, “cultural critic,” etc—especially someone who’s writing more as a vocation than a profession—has research as their leisure activity. What they do for pleasure (reading books, seeing films, listening to music) shades naturally and inevitably into what they want to write about, and the things they consume for leisure end up incorporated into some written work.
What’s also striking to me is that autodidacts often begin with some very tiny topic, and through researching that topic, they end up telescoping out into bigger-picture concerns. When research is your leisure activity, you’ll end up making connections between your existing interests and new ideas or topics. Everything gets pulled into the orbit of your intellectual curiosity. You can go deeper and deeper into a narrow topic, one that seems fascinatingly trivial and end up learning about the big topics: gender, culture, economics, nationalism, colonialism. It’s why fashion writers end up writing about the history of gender identity (through writing about masculine/feminine clothing) and cross-cultural exchange (through writing about cultural appropriation and styles borrowed from other times and places) and historical trade networks (through writing about where textiles come from).
·personalcanon.com·
How to Make a Great Government Website—Asterisk
Summary: Dave Guarino, who has worked extensively on improving government benefits programs like SNAP in California, discusses the challenges and opportunities in civic technology. He explains how a simplified online application, GetCalFresh.org, was designed to address barriers that prevent eligible people from accessing SNAP benefits, such as a complex application process, required interviews, and document submission. Guarino argues that while technology alone cannot solve institutional problems, it provides valuable tools for measuring and mitigating administrative burdens. He sees promise in using large language models to help navigate complex policy rules. Guarino also reflects on California's ambitious approach to benefits policy and the structural challenges, like Prop 13 property tax limits, that impact the state's ability to build up implementation capacity.
there are three big categories of barriers. The application barrier, the interview barrier, and the document barrier. And that’s what we spent most of our time iterating on and building a system that could slowly learn about those barriers and then intervene against them.
The application is asking, “Are you convicted of this? Are you convicted of that? Are you convicted of this other thing?” What is that saying to you, as a person, about what the system thinks of you?
Often they’ll call from a blocked number. They’ll send you a notice of when your interview is scheduled for, but this notice will sometimes arrive after the actual date of the interview. Most state agencies are really slammed right now for a bunch of reasons, including Medicaid unwinding. And many of the people assisting on Medicaid are the same workers who process SNAP applications. If you missed your phone interview, you have to call to reschedule it. But in many states, you can’t get through, or you have to call over and over and over again. For a lot of people, if they don’t catch that first interview call, they’re screwed and they’re not going to be approved.
getting to your point about how a website can fix this — the end result was the lowest-burden application form that actually gets a caseworker what they need to efficiently and effectively process it. We did a lot of iteration to figure out that sweet spot.
We didn’t need to do some hard system integration that would potentially take years to develop — we were just using the system as it existed. Another big advantage was that we had to do a lot of built-in data validation because we could not submit anything that was going to fail the county application. We discovered some weird edge cases by doing this.
A lot of times when you want to build a new front end for these programs, it becomes this multiyear, massive project where you’re replacing everything all at once. But if you think about it, there’s a lot of potential in just taking the interfaces you have today, building better ones on top of them, and then using those existing ones as the point of integration.
Government tends to take a more high-modernist approach to the software it builds, which is like “we’re going to plan and know up front how everything is, and that way we’re never going to have to make changes.” In terms of accreting layers — yes, you can get to that point. But I think a lot of the arguments I hear that call for a fundamental transformation suffer from the same high-modernist thinking that is the source of much of the status quo.
If you slowly do this kind of stuff, you can build resilient and durable interventions in the system without knocking it over wholesale. For example, I mentioned procedural denials. It would be adding regulations, it would be making technology systems changes, blah, blah, blah, to have every state report why people are denied, at what rate, across every state up to the federal government. It would take years to do that, but that would be a really, really powerful change in terms of guiding feedback loops that the program has.
Guarino argues that attempts to fundamentally transform government technology often suffer from the same "high-modernist" thinking that created problematic legacy systems in the first place. He advocates for incremental improvements that provide better measurement and feedback loops.
when you start to read about civic technology, it very, very quickly becomes clear that things that look like they are tech problems are actually about institutional culture, or about policy, or about regulatory requirements.
If you have an application where you think people are struggling, you can measure how much time people take on each page. A lot of what technology provides is more rigorous measurement of the burdens themselves. A lot of these technologies have been developed in commercial software because there’s such a massive incentive to get people who start a transaction to finish it. But we can transplant a lot of those into government services and have orders of magnitude better situational awareness.
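The measurement Guarino describes can be sketched in a few lines. This is illustrative only — the event format, page names, and numbers below are invented, not GetCalFresh's actual analytics:

```python
def time_per_page(events):
    """events: ordered (timestamp_seconds, page_name) page-view records
    for one session. Returns seconds spent on each page, summed across
    repeat visits (the final page has no successor event, so it is
    excluded)."""
    totals = {}
    for (t, page), (t_next, _) in zip(events, events[1:]):
        totals[page] = totals.get(page, 0) + (t_next - t)
    return totals

# A hypothetical session through an application flow:
session = [(0, "household"), (40, "income"), (310, "income-details"), (330, "review")]
# The income page ate 270 of 330 seconds — a likely pain point worth redesigning.
```

Aggregated across many sessions, outliers like that income page are exactly the "rigorous measurement of the burdens themselves" the interview describes.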
There’s this starting point thesis: Tech can solve these government problems, right? There’s healthcare.gov and the call to bring techies into government, blah, blah, blah. Then there’s the antithesis, where all these people say, well, no, it’s institutional problems. It’s legal problems. It’s political problems. I think either is sort of an extreme distortion of reality. I see a lot of more oblique levers that technology can pull in this area.
LLMs seem to be a fundamental breakthrough in manipulating words, and at the end of the day, a lot of government is words. I’ve been doing some active experimentation with this because I find it very promising. One common question people have is, “Who’s in my household for the purposes of SNAP?” That’s actually really complicated when you think about people who are living in poverty — they might be staying with a neighbor some of the time, or have roommates but don’t share food, or had to move back home because they lost their job.
I’ve been taking verbatim posts from Reddit that are related to the household question and inputting them into LLMs with some custom prompts that I’ve been iterating on, as well as with the full verbatim federal regulations about household definition. And these models do seem pretty capable at doing some base-level reasoning over complex, convoluted policy words in a way that I think could be really promising.
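A minimal sketch of that experiment's scaffolding, with hypothetical names throughout — the regulation snippet is paraphrased from 7 CFR 273.1 and the post is invented; the actual model call is omitted:

```python
def build_household_prompt(post: str, regulation_text: str) -> str:
    """Assemble one prompt from the verbatim federal regulations and a
    verbatim first-person description of someone's living situation."""
    return (
        "You are helping determine who counts as part of a household "
        "for SNAP purposes.\n\n"
        "Federal regulations (verbatim):\n"
        f"{regulation_text}\n\n"
        "Applicant's situation (verbatim):\n"
        f"{post}\n\n"
        "Question: Based only on the regulations above, who is in this "
        "person's SNAP household? Cite the relevant rule and explain "
        "your reasoning step by step."
    )

# Paraphrased rule and an invented post, for illustration:
regs = ("7 CFR 273.1: individuals who live together and customarily "
        "purchase food and prepare meals together for home consumption "
        "constitute a household...")
post = "I moved back in with my mom, but we buy and cook our food separately."
prompt = build_household_prompt(post, regs)
```

The design point is that both the rules and the situation go in verbatim, so the model reasons over the actual policy words rather than a summary of them.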
caseworkers are spending a lot of their time figuring out, wait, what rule in this 200-page policy manual is actually relevant in this specific circumstance? I think LLMs are going to be really impactful there.
It is certainly the case that I’ve seen some productive tensions in counties where there’s more of a mix of that and what you might consider California-style Republicans who are like, “We want to run this like a business, we want to be efficient.” That tension between efficiency and big, ambitious policies can be a healthy, productive one. I don’t know to what extent that exists at the state level, and I think there’s hints of more of an interest in focusing on state-level government working better and getting those fundamentals right, and then doing the more ambitious things on a more steady foundation.
California seemed to really try to take every ambitious option that the feds give us on a whole lot of fronts. I think the corollary of that is that we don’t necessarily get the fundamental operational execution of these programs to a strong place, and we then go and start adding tons and tons of additional complexity on top of them.
·asteriskmag.com·
Meta’s Big Squeeze – Pixel Envy
These pieces each seem like they are circling a theme of a company finding the upper bound of its user base, and then squeezing it for activity, revenue, and promising numbers to report to investors. Unlike Zitron, I am not convinced we are watching Facebook die. I think Koebler is closer to the truth: we are watching its zombification.
·pxlnv.com·
Mapping the Mind of a Large Language Model
Summary: Anthropic has made a significant advance in understanding the inner workings of large language models by identifying how millions of concepts are represented inside Claude Sonnet, one of their deployed models. This is the first detailed look inside a modern, production-grade large language model. The researchers used a technique called "dictionary learning" to isolate patterns of neuron activations that recur across many contexts, allowing them to map features to human-interpretable concepts. They found features corresponding to a vast range of entities, abstract concepts, and even potentially problematic behaviors. By manipulating these features, they were able to change the model's responses. Anthropic hopes this interpretability discovery could help make AI models safer in the future by monitoring for dangerous behaviors, steering models towards desirable outcomes, enhancing safety techniques, and providing a "test set for safety". However, much more work remains to be done to fully understand the representations the model uses and how to leverage this knowledge to improve safety.
We mostly treat AI models as a black box: something goes in and a response comes out, and it's not clear why the model gave that particular response instead of another. This makes it hard to trust that these models are safe: if we don't know how they work, how do we know they won't give harmful, biased, untruthful, or otherwise dangerous responses? How can we trust that they’ll be safe and reliable? Opening the black box doesn't necessarily help: the internal state of the model—what the model is "thinking" before writing its response—consists of a long list of numbers ("neuron activations") without a clear meaning. From interacting with a model like Claude, it's clear that it’s able to understand and wield a wide range of concepts—but we can't discern them from looking directly at neurons. It turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts.
Just as every English word in a dictionary is made by combining letters, and every sentence is made by combining words, every feature in an AI model is made by combining neurons, and every internal state is made by combining features.
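The dictionary analogy can be sketched numerically. This is a toy illustration, not Anthropic's actual pipeline (which trains sparse autoencoders over a production model's activations): given a known feature dictionary, a dense "neuron activation" vector decomposes back into the sparse feature coefficients that produced it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 ground-truth "features", each a direction over 8 "neurons".
n_neurons, n_features = 8, 3
dictionary = rng.normal(size=(n_features, n_neurons))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# An internal state is a sparse combination of features (here: features 0 and 2).
coefficients = np.array([1.5, 0.0, 0.7])
activation = coefficients @ dictionary  # one dense "neuron activation" vector

# Given the dictionary, recover which features are active via least squares.
# Real dictionary learning must also learn the dictionary itself, with a
# sparsity penalty (e.g. an L1 term) so each state uses only a few features.
recovered, *_ = np.linalg.lstsq(dictionary.T, activation, rcond=None)
print(np.round(recovered, 3))  # ≈ [1.5, 0.0, 0.7]
```

Note how the dense 8-number activation carries no obvious meaning on its own; the 3 sparse coefficients are where the interpretable structure lives.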
In October 2023, we reported success applying dictionary learning to a very small "toy" language model and found coherent features corresponding to concepts like uppercase text, DNA sequences, surnames in citations, nouns in mathematics, or function arguments in Python code.
We successfully extracted millions of features from the middle layer of Claude 3.0 Sonnet (a member of our current, state-of-the-art model family, available on claude.ai), providing a rough conceptual map of its internal states halfway through its computation.
We also find more abstract features—responding to things like bugs in computer code, discussions of gender bias in professions, and conversations about keeping secrets.
We were able to measure a kind of "distance" between features based on which neurons appeared in their activation patterns. This allowed us to look for features that are "close" to each other. Looking near a "Golden Gate Bridge" feature, we found features for Alcatraz Island, Ghirardelli Square, the Golden State Warriors, California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film Vertigo.
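A minimal sketch of this "distance" idea, assuming each feature is summarized by a direction over neurons: nearest neighbors under cosine similarity. The feature names and vectors here are purely illustrative, not real features from the paper.

```python
import numpy as np

# Hypothetical direction vectors over 4 "neurons" for a few learned features.
features = {
    "golden_gate_bridge": np.array([0.9, 0.1, 0.4, 0.0]),
    "alcatraz_island":    np.array([0.8, 0.2, 0.5, 0.1]),
    "python_for_loop":    np.array([0.0, 0.9, 0.0, 0.7]),
}

def cosine(a, b):
    # Similarity of two feature directions; 1.0 means identical direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = features["golden_gate_bridge"]
neighbors = sorted(
    ((name, cosine(query, vec))
     for name, vec in features.items() if name != "golden_gate_bridge"),
    key=lambda pair: -pair[1],
)
print(neighbors[0][0])  # nearest feature; here "alcatraz_island"
```

Features that activate overlapping sets of neurons score high, which is how conceptually related features (bridge, island, city landmarks) end up "close" to one another.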
This holds at a higher level of conceptual abstraction: looking near a feature related to the concept of "inner conflict", we find features related to relationship breakups, conflicting allegiances, logical inconsistencies, as well as the phrase "catch-22". This shows that the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity. This might be the origin of Claude's excellent ability to make analogies and metaphors.
amplifying the "Golden Gate Bridge" feature gave Claude an identity crisis even Hitchcock couldn’t have imagined: when asked "what is your physical form?", Claude’s usual kind of answer – "I have no physical form, I am an AI model" – changed to something much odder: "I am the Golden Gate Bridge… my physical form is the iconic bridge itself…". Altering the feature had made Claude effectively obsessed with the bridge, bringing it up in answer to almost any query—even in situations where it wasn’t at all relevant.
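The intervention behind this behavior can be sketched as feature steering: adding a scaled multiple of a feature's direction into a layer's activation before the rest of the forward pass. The shapes, direction, and scale below are illustrative assumptions; the real intervention operates on Claude's residual stream.

```python
import numpy as np

hidden = np.array([0.2, -0.1, 0.5, 0.3])           # model's activation at some layer
bridge_direction = np.array([0.0, 1.0, 0.0, 0.0])  # hypothetical "Golden Gate Bridge" feature
steering_scale = 10.0                              # amplify far beyond its natural maximum

# Clamp the feature on by injecting its direction into the activation.
steered = hidden + steering_scale * bridge_direction
# Downstream layers now process an activation dominated by this one feature,
# which is why the steered model brings the bridge into unrelated answers.
```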
Anthropic wants to make models safe in a broad sense, including everything from mitigating bias to ensuring an AI is acting honestly to preventing misuse - including in scenarios of catastrophic risk. It’s therefore particularly interesting that, in addition to the aforementioned scam emails feature, we found features corresponding to:

- Capabilities with misuse potential (code backdoors, developing biological weapons)
- Different forms of bias (gender discrimination, racist claims about crime)
- Potentially problematic AI behaviors (power-seeking, manipulation, secrecy)
finding a full set of features using our current techniques would be cost-prohibitive (the computation required by our current approach would vastly exceed the compute used to train the model in the first place). Understanding the representations the model uses doesn't tell us how it uses them; even though we have the features, we still need to find the circuits they are involved in. And we need to show that the safety-relevant features we have begun to find can actually be used to improve safety. There's much more to be done.
·anthropic.com·
Mapping the Mind of a Large Language Model
Write Like You Talk
Write Like You Talk
You don't need complex sentences to express complex ideas. When specialists in some abstruse topic talk to one another about ideas in their field, they don't use sentences any more complex than they do when talking about what to have for lunch. They use different words, certainly. But even those they use no more than necessary. And in my experience, the harder the subject, the more informally experts speak. Partly, I think, because they have less to prove, and partly because the harder the ideas you're talking about, the less you can afford to let language get in the way.
Informal language is the athletic clothing of ideas
I'm not saying spoken language always works best. Poetry is as much music as text, so you can say things you wouldn't say in conversation. And there are a handful of writers who can get away with using fancy language in prose.
But for nearly everyone else, spoken language is better.
After writing the first draft, try explaining to a friend what you just wrote. Then replace the draft with what you said to your friend.
·paulgraham.com·
Write Like You Talk
Design is compromise
Design is compromise
Having an opinionated set of tradeoffs exposes your approach to a set of weaknesses. The more you tip the scale on one side, the weaker something else will be. That’s okay! Making those difficult choices is what people pay you for. You should be proud of your compromises. My favorite products are opinionated. They make a clear statement about what they are not good at, in favor of being much better at something else.
·stephango.com·
Design is compromise
Gemini 1.5 and Google’s Nature
Gemini 1.5 and Google’s Nature
Google is facing many of the same challenges after its decades-long dominance of the open web: all of the products shown yesterday rely on a different business model than advertising, and to properly execute and deliver on them will require a cultural shift to supporting customers instead of tolerating them. What hasn’t changed — because it is the company’s nature, and thus cannot — is the reliance on scale and an overwhelming infrastructure advantage. That, more than anything, is what defines Google, and it was encouraging to see that so explicitly put forward as an advantage.
·stratechery.com·
Gemini 1.5 and Google’s Nature
AI Integration and Modularization
AI Integration and Modularization
Summary: Examines the question of integration versus modularization in the context of AI, drawing on the work of economists Ronald Coase and Clayton Christensen. Google is pursuing a fully integrated approach similar to Apple's, while AWS is betting on modularization, and Microsoft and Meta sit somewhere in between. Integration may provide an advantage in the consumer market and for achieving AGI, but for enterprise AI, a more modular approach that leverages data gravity and treats models as commodities may prevail. Ultimately, the biggest beneficiary of this dynamic could be Nvidia.
The left side of figure 5-1 indicates that when there is a performance gap — when product functionality and reliability are not yet good enough to address the needs of customers in a given tier of the market — companies must compete by making the best possible products. In the race to do this, firms that build their products around proprietary, interdependent architectures enjoy an important competitive advantage against competitors whose product architectures are modular, because the standardization inherent in modularity takes too many degrees of design freedom away from engineers, and they cannot optimize performance.
The issue I have with this analysis of vertical integration — and this is exactly what I was taught at business school — is that the only considered costs are financial. But there are other, more difficult to quantify costs. Modularization incurs costs in the design and experience of using products that cannot be overcome, yet cannot be measured. Business buyers — and the analysts who study them — simply ignore them, but consumers don’t. Some consumers inherently know and value quality, look-and-feel, and attention to detail, and are willing to pay a premium that far exceeds the financial costs of being vertically integrated.
Google trains and runs its Gemini family of models on its own TPU processors, which are only available on Google’s cloud infrastructure. Developers can access Gemini through Vertex AI, Google’s fully-managed AI development platform; and, to the extent Vertex AI is similar to Google’s internal development environment, that is the platform on which Google is building its own consumer-facing AI apps. It’s all Google, from top-to-bottom, and there is evidence that this integration is paying off: Gemini 1.5’s industry leading 2 million token context window almost certainly required joint innovation between Google’s infrastructure team and its model-building team.
In AI, Google is pursuing an integrated strategy, building everything from chips to models to applications, similar to Apple's approach in smartphones.
On the other extreme is AWS, which doesn’t have any of its own models; instead its focus has been on its Bedrock managed development platform, which lets you use any model. Amazon’s other focus has been on developing its own chips, although the vast majority of its AI business runs on Nvidia GPUs.
Microsoft is in the middle, thanks to its close ties to OpenAI and its models. The company added Azure Models-as-a-Service last year, but its primary focus for both external customers and its own internal apps has been building on top of OpenAI’s GPT family of models; Microsoft has also launched its own chip for inference, but the vast majority of its workloads run on Nvidia.
Google is certainly building products for the consumer market, but those products are not devices; they are Internet services. And, as you might have noticed, the historical discussion didn’t really mention the Internet. Both Google and Meta, the two biggest winners of the Internet epoch, built their services on commodity hardware. Granted, those services scaled thanks to the deep infrastructure work undertaken by both companies, but even there Google’s more customized approach has been at least rivaled by Meta’s more open approach. What is notable is that both companies are integrating their models and their apps, as is OpenAI with ChatGPT.
Google's integrated AI strategy is unique but may not provide a sustainable advantage for Internet services in the way Apple's integration does for devices
It may be the case that selling hardware, which has to be perfect every year to justify a significant outlay of money by consumers, provides a much better incentive structure for maintaining excellence and execution than does being an Aggregator that users access for free.
Google’s collection of moonshots — from Waymo to Google Fiber to Nest to Project Wing to Verily to Project Loon (and the list goes on) — have mostly been science projects that have, for the most part, served to divert profits from Google Search away from shareholders. Waymo is probably the most interesting, but even if it succeeds, it is ultimately a car service rather far afield from Google’s mission statement “to organize the world’s information and make it universally accessible and useful.”
The only thing that drives meaningful shifts in platform marketshare are paradigm shifts, and while I doubt the v1 version of Pixie [Google’s rumored Pixel-only AI assistant] would be good enough to drive switching from iPhone users, there is at least a path to where it does exactly that.
the fact that Google is being mocked mercilessly for messed-up AI answers gets at why consumer-facing AI may be disruptive for the company: the reason why incumbents find it hard to respond to disruptive technologies is because they are, at least at the beginning, not good enough for the incumbent’s core offering. Time will tell if this gives more fuel to a shift in smartphone strategies, or makes the company more reticent.
while I was very impressed with Google’s enterprise pitch, which benefits from its integration with Google’s infrastructure without all of the overhead of potentially disrupting the company’s existing products, it’s going to be a heavy lift to overcome data gravity, i.e. the fact that many enterprise customers will simply find it easier to use AI services on the same clouds where they already store their data (Google does, of course, also support non-Gemini models and Nvidia GPUs for enterprise customers). To the extent Google wins in enterprise it may be by capturing the next generation of startups that are AI first and, by definition, data light; a new company has the freedom to base its decision on infrastructure and integration.
Amazon is certainly hoping that argument is correct: the company is operating as if everything in the AI value chain is modular and ultimately a commodity, which insinuates that it believes that data gravity will matter most. What is difficult to separate is to what extent this is the correct interpretation of the strategic landscape versus a convenient interpretation of the facts that happens to perfectly align with Amazon’s strengths and weaknesses, including infrastructure that is heavily optimized for commodity workloads.
Unclear whether Amazon's strategy reflects true insight or motivated reasoning rooted in its existing strengths
Meta’s open source approach to Llama: the company is focused on products, which do benefit from integration, but there are also benefits that come from widespread usage, particularly in terms of optimization and complementary software. Open source accrues those benefits without imposing any incentives that detract from Meta’s product efforts (and don’t forget that Meta is receiving some portion of revenue from hyperscalers serving Llama models).
The iPhone maker, like Amazon, appears to be betting that AI will be a feature or an app; like Amazon, it’s not clear to what extent this is strategic foresight versus motivated reasoning.
achieving something approaching AGI, whatever that means, will require maximizing every efficiency and optimization, which rewards the integrated approach.
the most value will be derived from building platforms that treat models like processors, delivering performance improvements to developers who never need to know what is going on under the hood.
·stratechery.com·
AI Integration and Modularization