Forecasting rare language model behaviors \ Anthropic #Alignment #Risk #Forecasting #Scale #Anthropic #Paper #PDF #Blog · anthropic.com · Feb 25, 2025
Deliberative alignment: reasoning enables safer language models | OpenAI #Alignment #OpenAI · openai.com · Jan 8, 2025
OpenAI trained o1 and o3 to 'think' about its safety policy | TechCrunch #Alignment #OpenAI · techcrunch.com · Dec 23, 2024
12 Days of OpenAI | OpenAI #Alignment #Reasoning #Large Language Models #OpenAI · openai.com · Dec 20, 2024
Greenblatt, R. et al. (2024). Alignment faking in large language models. #Alignment #Paper #Training #Anthropic · assets.anthropic.com · Dec 18, 2024
Beyond Preferences in AI Alignment #AI #Preferences #Alignment #Paper #PDF · arxiv.org · Sep 8, 2024
Drexler, K. E. (2019). Reframing superintelligence. Future of Humanity Institute. #CAIS #Alignment #Paper #PDF · fhi.ox.ac.uk · Dec 15, 2023
Weak-to-strong generalization #OpenAI #Alignment #Paper #PDF · cdn.openai.com · Dec 15, 2023
OpenAI’s Ilya Sutskever Has a Plan for Keeping Super-Intelligent AI in Check #Alignment · wired.com · Dec 15, 2023
OpenAI Demos a Control Method for Superintelligent AI #IEEE #Alignment · spectrum.ieee.org · Dec 14, 2023
Now we know what OpenAI’s superalignment team has been up to #OpenAI #Alignment · technologyreview.com · Dec 14, 2023
Weak-to-strong generalization #OpenAI #Alignment #Proxy · openai.com · Dec 14, 2023
Superalignment Fast Grants #OpenAI #Alignment #Funding · openai.com · Dec 14, 2023
What Sam Altman’s Firing Means for the Future of OpenAI #OpenAI #Alignment · wired.com · Nov 19, 2023
LIMA: Less Is More for Alignment #Machine Learning #Alignment #Paper #PDF #Meta · arxiv.org · May 23, 2023
Does anyone believe AI alignment is a long-term solution? #Quora #Lex Page #Inflection AI #Alignment · lex.page · May 5, 2023
Using the Veil of Ignorance to align AI systems with principles of justice | Proceedings of the National Academy of Sciences #DeepMind #Alignment #Paper · pnas.org · Apr 25, 2023
Researching Alignment Research: Unsupervised Analysis #Value Alignment #Alignment #AI #Paper #PDF · arxiv.org · Apr 21, 2023
Stanford Seminar - Forecasting and Aligning AI #AI #Forecasting #Alignment #Natural Language Processing #Questions and Answers #Machine Learning #Hypermind · youtube.com · Jun 8, 2022
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents #Reinforcement Learning #Social Science #AGI #Normativity #Alignment · youtube.com · Mar 9, 2022