Search Test Information Space

Found 4 bookmarks

Constitutional Classifiers: Defending against Universal Jailbreaks...

#Anthropic #Classification #Safety #Large Language Models #Paper #PDF

·arxiv.org·Feb 3, 2025

Simple probes can catch sleeper agents \ Anthropic

#Training #Large Language Models #Anthropic #Paper #Classification #Cybersecurity

·anthropic.com·Apr 24, 2024

Who's Harry Potter? Approximate Unlearning in LLMs

#Machine Learning #Large Language Models #Paper #PDF #Fine-Tuning #Microsoft #Classification

·arxiv.org·Dec 27, 2023

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

#Large Language Models #Classification #Paper #PDF

·arxiv.org·Jun 12, 2023
