Found 7 bookmarks

The State of Multilingual LLM Safety Research: From Measuring the...

#Multilingual #Large Language Models #Safety #Research #Report #Paper #PDF

·arxiv.org·Jun 3, 2025

Scaling Laws For Scalable Oversight

(A security aspect contrasting Compton might be that tactical versions are initiated to have controlled chain reactions and then vanish, not unlike Houdini or a locked Roomba mystery, so there may be a forensic science to it. Also relate to the prior paper on MAIM's version of MAD and to articles on quantum hacks.)

#Safety #Paper #PDF #Benchmark

·arxiv.org·May 10, 2025

An approach to technical AGI safety, Apr 2025

#DeepMind #AGI #Safety #Paper #PDF

·storage.googleapis.com·Apr 2, 2025

Constitutional Classifiers: Defending against Universal Jailbreaks...

#Anthropic #Classification #Safety #Large Language Models #Paper #PDF

·arxiv.org·Feb 3, 2025

International AI Safety Report 2025

#Report #International #AI #Safety #Paper #PDF

·gov.uk·Jan 29, 2025

Can Go AIs be adversarially robust?

#Robustness #Strategy #Games #Safety #AI #Paper #PDF

·arxiv.org·Jul 15, 2024

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

GS levels.

#AI #Robust AI #Reliability #Safety #Paper #PDF

·arxiv.org·May 14, 2024
