Deliberative alignment: reasoning enables safer language models | OpenAI
It’s Time to Move Past AI Nationalism
OpenAI trained o1 and o3 to 'think' about its safety policy | TechCrunch
12 Days of OpenAI | OpenAI
Greenblatt, R. et al. (2024). Alignment faking in large language models.
Beyond Preferences in AI Alignment
Drexler, K. E. (2019). Reframing Superintelligence. Future of Humanity Institute, Technical Report #2019-1.
Weak-to-strong generalization
OpenAI’s Ilya Sutskever Has a Plan for Keeping Super-Intelligent AI in Check
OpenAI Demos a Control Method for Superintelligent AI
Now we know what OpenAI’s superalignment team has been up to
Superalignment Fast Grants
What Sam Altman’s Firing Means for the Future of OpenAI
LIMA: Less Is More for Alignment
Does anyone believe AI alignment is a long-term solution?
Using the Veil of Ignorance to align AI systems with principles of justice | Proceedings of the National Academy of Sciences
AI Alignment Forum
Researching Alignment Research: Unsupervised Analysis
Stanford Seminar - Forecasting and Aligning AI
Building Aligned AI
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents