Forecasting rare language model behaviors \ Anthropic #Alignment #Risk #Forecasting #Scale #Anthropic #Paper #PDF #Blog · anthropic.com · Feb 25, 2025
Deliberative alignment: reasoning enables safer language models | OpenAI #Alignment #OpenAI · openai.com · Jan 8, 2025
OpenAI trained o1 and o3 to 'think' about its safety policy | TechCrunch #Alignment #OpenAI · techcrunch.com · Dec 23, 2024
12 Days of OpenAI | OpenAI #Alignment #Reasoning #Large Language Models #OpenAI · openai.com · Dec 20, 2024
Greenblatt, R. et al. (2024). Alignment faking in large language models. #Alignment #Paper #Training #Anthropic · assets.anthropic.com · Dec 18, 2024
Beyond Preferences in AI Alignment #AI #Preferences #Alignment #Paper #PDF · arxiv.org · Sep 8, 2024
Drexler, K. E. (2019). Reframing superintelligence. Future of Humanity Institute. #CAIS #Alignment #Paper #PDF · fhi.ox.ac.uk · Dec 15, 2023
Weak-to-strong generalization #OpenAI #Alignment #Paper #PDF · cdn.openai.com · Dec 15, 2023
OpenAI’s Ilya Sutskever Has a Plan for Keeping Super-Intelligent AI in Check #Alignment · wired.com · Dec 15, 2023
OpenAI Demos a Control Method for Superintelligent AI #IEEE #Alignment · spectrum.ieee.org · Dec 14, 2023
Now we know what OpenAI’s superalignment team has been up to #OpenAI #Alignment · technologyreview.com · Dec 14, 2023
Weak-to-strong generalization #OpenAI #Alignment #Proxy · openai.com · Dec 14, 2023
Superalignment Fast Grants #OpenAI #Alignment #Funding · openai.com · Dec 14, 2023
What Sam Altman’s Firing Means for the Future of OpenAI #OpenAI #Alignment · wired.com · Nov 19, 2023
LIMA: Less Is More for Alignment #Machine Learning #Alignment #Paper #PDF #Meta · arxiv.org · May 23, 2023
Does anyone believe AI alignment is a long-term solution? #Quora #Lex Page #Inflection AI #Alignment · lex.page · May 5, 2023
Using the Veil of Ignorance to align AI systems with principles of justice | Proceedings of the National Academy of Sciences #DeepMind #Alignment #Paper · pnas.org · Apr 25, 2023
Researching Alignment Research: Unsupervised Analysis #Value Alignment #Alignment #AI #Paper #PDF · arxiv.org · Apr 21, 2023
Stanford Seminar - Forecasting and Aligning AI #AI #Forecasting #Alignment #Natural Language Processing #Questions and Answers #Machine Learning #Hypermind · youtube.com · Jun 8, 2022
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents #Reinforcement Learning #Social Science #AGI #Normativity #Alignment · youtube.com · Mar 9, 2022