Constitutional Classifiers: Defending against Universal Jailbreaks...
This is one of the most important papers on AI in years. It covers the full sweep of reactions to the Great Big AI Crisis with exceptional cogency, using a striking metaphorical device.
Simple probes can catch sleeper agents (Anthropic)
Who's Harry Potter? Approximate Unlearning in LLMs
Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools
DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text
Paraphrase Detection: Human vs. Machine Content
ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text