Found 280 bookmarks
The Hot Mess of AI: How Does Misalignment Scale With Model...
As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how extremely capable AI models will fail: Will they fail by systematically pursuing goals we do not intend? Or will they fail by being a hot mess, and taking nonsensical actions that do not further any goal? We operationalize this question using a bias-variance decomposition of the errors made by AI models: An AI's incoherence on a task is measured over test-time randomness as the fraction of its error that stems from variance rather than bias in task outcome. Across all tasks and frontier models we measure, the longer models spend reasoning and taking actions, the more incoherent their failures become. Incoherence changes with model scale in a way that is experiment-dependent. However, in several settings, larger, more capable models are more incoherent than smaller models. Consequently, scale alone seems unlikely to eliminate incoherence. Instead, as more capable AIs pursue harder tasks, requiring more sequential action and thought, our results predict failures to be accompanied by more incoherent behavior. This suggests a future where AIs sometimes cause industrial accidents (due to unpredictable misbehavior), but are less likely to exhibit consistent pursuit of a misaligned goal. This increases the relative importance of alignment research targeting reward hacking or goal misspecification.
·arxiv.org·
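The abstract above defines incoherence as the fraction of a model's error that comes from variance rather than bias. A minimal sketch of one plausible reading follows, assuming scalar task outcomes scored by squared error against a known target; the function name `incoherence`, the zero-error convention, and the squared-error setting are illustrative assumptions, and the paper's exact definition may differ.

```python
from statistics import mean, pvariance


def incoherence(outcomes, target):
    """Share of mean squared error explained by run-to-run variance.

    `outcomes` are scalar results from repeated runs of the same task
    (hypothetical setup); `target` is the intended outcome.
    Uses the identity MSE = bias^2 + variance (population variance).
    """
    bias_sq = (mean(outcomes) - target) ** 2
    variance = pvariance(outcomes)
    total = bias_sq + variance
    if total == 0:
        # No error at all: return 0.0 by convention (an assumption).
        return 0.0
    return variance / total


# Consistently wrong runs: all error is bias, so incoherence is 0.
consistent = incoherence([1.0, 1.0, 1.0], target=0.0)

# Runs scattered around the target: all error is variance, so incoherence is 1.
scattered = incoherence([-1.0, 1.0], target=0.0)
```

Under this reading, a value near 0 corresponds to systematic failure (the same wrong outcome every time) and a value near 1 to "hot mess" failure (outcomes that are right on average but erratic).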
Context Widows
or, of GPUs, LPUs, and Goal Displacement
·artificialbureaucracy.substack.com·
Making Software: Shaders.
How to draw high fidelity graphics when all you have is an x and y coordinate.
·makingsoftware.com·
Seeing like a software company
The big idea of James C. Scott’s Seeing Like A State can be expressed in three points: Modern organizations exert control by maximising “legibility”: by…
·seangoedecke.com·
Home | Parlant
Build safe & compliant AI customer interactions using open-source foundations
·parlant.io·
Vibe Coding in Practice: Motivations, Challenges, and a Future...
AI code generation tools are transforming software development, especially for novice and non-software developers, by enabling them to write code and build applications faster and with little to no human intervention. Vibe coding is the practice where users rely on AI code generation tools through intuition and trial-and-error without necessarily understanding the underlying code. Despite widespread adoption, no research has systematically investigated why users engage in vibe coding, what they experience while doing so, and how they approach quality assurance (QA) and perceive the quality of the AI-generated code. To this end, we conduct a systematic grey literature review of 101 practitioner sources, extracting 518 firsthand behavioral accounts about vibe coding practices, challenges, and limitations. Our analysis reveals a speed-quality trade-off paradox, where vibe coders are motivated by speed and accessibility, often experiencing rapid "instant success and flow", yet most perceive the resulting code as fast but flawed. QA practices are frequently overlooked, with many skipping testing, relying on the models' or tools' outputs without modification, or delegating checks back to the AI code generation tools. This creates a new class of vulnerable software developers, particularly those who build a product but are unable to debug it when issues arise. We argue that vibe coding lowers barriers and accelerates prototyping, but at the cost of reliability and maintainability. These insights carry implications for tool designers and software development teams. Understanding how vibe coding is practiced today is crucial for guiding its responsible use and preventing a broader QA crisis in AI-assisted development.
·arxiv.org·
Colf
Prompt solutions to algorithmic problems with the fewest tokens.
·colf.dev·
Writing is thinking
On the value of human-generated scientific writing in the age of large-language models.
·nature.com·
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
·up.raindrop.io·