Forecasting rare language model behaviors. Anthropic.
Drexler, K. E. (2019). Reframing superintelligence (Technical Report #2019-1). Future of Humanity Institute.
Weak-to-strong generalization
LIMA: Less Is More for Alignment
Researching Alignment Research: Unsupervised Analysis