Test Information Space

Found 7 bookmarks

Greenblatt, R. et al. (2024). Alignment faking in large language models.

·assets.anthropic.com·Dec 18, 2024

Beyond Preferences in AI Alignment

·arxiv.org·Sep 8, 2024

Reframing Superintelligence (FHI TR 2019-1)

Drexler, K. E. (2019). Reframing superintelligence. Future of Humanity Institute.

·fhi.ox.ac.uk·Dec 15, 2023

Weak-to-strong generalization

·cdn.openai.com·Dec 15, 2023

LIMA: Less Is More for Alignment

·arxiv.org·May 23, 2023

Using the Veil of Ignorance to align AI systems with principles of justice | Proceedings of the National Academy of Sciences

·pnas.org·Apr 25, 2023

Researching Alignment Research: Unsupervised Analysis

·arxiv.org·Apr 21, 2023