Found 280 bookmarks

Open-source AI tool beats giant LLMs in literature reviews — and gets citations right

Researchers can deploy the cheap and transparent model on their own computer system.

·nature.com·Feb 6, 2026

Building a C compiler with a team of parallel Claudes

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

·anthropic.com·Feb 6, 2026

The Hot Mess of AI: How Does Misalignment Scale With Model...

As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how extremely capable AI models will fail: Will they fail by systematically pursuing goals we do not intend? Or will they fail by being a hot mess, and taking nonsensical actions that do not further any goal? We operationalize this question using a bias-variance decomposition of the errors made by AI models: An AI's incoherence on a task is measured over test-time randomness as the fraction of its error that stems from variance rather than bias in task outcome. Across all tasks and frontier models we measure, the longer models spend reasoning and taking actions, the more incoherent their failures become. Incoherence changes with model scale in a way that is experiment dependent. However, in several settings, larger, more capable models are more incoherent than smaller models. Consequently, scale alone seems unlikely to eliminate incoherence. Instead, as more capable AIs pursue harder tasks, requiring more sequential action and thought, our results predict failures to be accompanied by more incoherent behavior. This suggests a future where AIs sometimes cause industrial accidents (due to unpredictable misbehavior), but are less likely to exhibit consistent pursuit of a misaligned goal. This increases the relative importance of alignment research targeting reward hacking or goal misspecification.

·arxiv.org·Feb 3, 2026

Does AI already have human-level intelligence? The evidence is clear

The vision of human-level machine intelligence laid out by Alan Turing in the 1950s is now a reality. Eyes unclouded by dread or hype will help us to prepare for what comes next.

·nature.com·Feb 3, 2026

How AI assistance impacts the formation of coding skills

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

·anthropic.com·Jan 30, 2026

StackOverflow graph of questions asked per month.

·data.stackexchange.com·Jan 4, 2026

Context Widows

or, of GPUs, LPUs, and Goal Displacement

·artificialbureaucracy.substack.com·Dec 15, 2025

Making Software: Shaders.

How to draw high fidelity graphics when all you have is an x and y coordinate.

·makingsoftware.com·Dec 11, 2025

Chatbot writing style

·archive.is·Dec 4, 2025

Seeing like a software company

The big idea of James C. Scott’s Seeing Like A State can be expressed in three points: Modern organizations exert control by maximising “legibility”: by…

·seangoedecke.com·Nov 29, 2025

Grading is broken

·archive.is·Nov 24, 2025

The False Glorification of Yann LeCun

Don’t believe everything you read

·garymarcus.substack.com·Nov 20, 2025

John von Neumann Shot Lightning From His Arse

Pop-hereditarianism is built on selective credulity

·theintrinsicperspective.com·Nov 13, 2025

The battle against Google gets organized: a new campaign aims to stop the blocking of APKs

A cold war seems to have broken out between Google and the free software community over the future of freely installing APKs on the system...

·xatakandroid.com·Nov 3, 2025

Attention Authors: Updated Practice for Review Articles and Position Papers in arXiv CS Category – arXiv blog

·blog.arxiv.org·Nov 1, 2025

Home | Parlant

Build safe & compliant AI customer interactions using open-source foundations

·parlant.io·Oct 31, 2025

johannschopplich/toon: 🎒 Token-Oriented Object Notation – JSON for LLMs at half the token cost

🎒 Token-Oriented Object Notation – JSON for LLMs at half the token cost - johannschopplich/toon

·github.com·Oct 27, 2025

The last days of humanity

An article by Michel Suárez

·elcuadernodigital.com·Oct 6, 2025

Vibe Coding in Practice: Motivations, Challenges, and a Future...

AI code generation tools are transforming software development, especially for novice and non-software developers, by enabling them to write code and build applications faster and with little to no human intervention. Vibe coding is the practice where users rely on AI code generation tools through intuition and trial-and-error without necessarily understanding the underlying code. Despite widespread adoption, no research has systematically investigated why users engage in vibe coding, what they experience while doing so, and how they approach quality assurance (QA) and perceive the quality of the AI-generated code. To this end, we conduct a systematic grey literature review of 101 practitioner sources, extracting 518 firsthand behavioral accounts about vibe coding practices, challenges, and limitations. Our analysis reveals a speed-quality trade-off paradox, where vibe coders are motivated by speed and accessibility, often experiencing rapid "instant success and flow", yet most perceive the resulting code as fast but flawed. QA practices are frequently overlooked, with many skipping testing, relying on the models' or tools' outputs without modification, or delegating checks back to the AI code generation tools. This creates a new class of vulnerable software developers, particularly those who build a product but are unable to debug it when issues arise. We argue that vibe coding lowers barriers and accelerates prototyping, but at the cost of reliability and maintainability. These insights carry implications for tool designers and software development teams. Understanding how vibe coding is practiced today is crucial for guiding its responsible use and preventing a broader QA crisis in AI-assisted development.

·arxiv.org·Oct 6, 2025

Tim Berners-Lee Invented the World Wide Web. Now He Wants to Save It

Julian Lucas profiles Sir Tim Berners-Lee, the inventor of the World Wide Web and a co-founder of Inrupt, on the occasion of his memoir, “This Is for Everyone.”

·newyorker.com·Sep 29, 2025

Colf

Prompt solutions to algorithmic problems with the fewest tokens.

·colf.dev·Sep 8, 2025

A guide to the many twitters in 2025

https://x.com/TommySiegel/status/1961111472773447765

·up.raindrop.io·Aug 28, 2025

Computer Science Grads Struggle to Find Jobs in the A.I. Age - The Ne…

archived 10 Aug 2025 14:08:41 UTC

·archive.is·Aug 10, 2025

The peer-review crisis: how to fix an overloaded system

Journals and funders are trying to boost the speed and effectiveness of review processes that are under strain.

·nature.com·Aug 9, 2025

Meta Is Going to Let Job Candidates Use AI During Coding Tests

"This is more representative of the developer environment that our future employees will work in."

·404media.co·Jul 30, 2025

Writing is thinking

On the value of human-generated scientific writing in the age of large-language models.

·nature.com·Jul 23, 2025

Ten reasons why AI will not replace computer scientists in the near future

There are many things AI cannot do, but hype and misinformation are driving prospective students away from computer science.

·theconversation.com·Jul 14, 2025

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

·up.raindrop.io·Jul 10, 2025

'Positive review only': Researchers hide AI prompts in papers

Instructions in preprints from 14 universities highlight controversy on AI in peer review

·asia.nikkei.com·Jul 5, 2025

Cognition | Blockdiff: How we built our own file format for VM disk snapshots

How we built blockdiff, an open-source tool for rapid block-level diffs and snapshots of VM disks.

·cognition.ai·Jun 23, 2025
