Constitutional Classifiers: Defending against Universal Jailbreaks... #Anthropic #Classification #Safety #Large Language Models #Paper #PDF · arxiv.org · Feb 3, 2025
This is one of the most important papers on AI in years. Truly it covers the sweep of reactions to the Great Big AI Crisis with exceptional cogency, using an exceptional metaphoric device. #AI #Social Theory #Paper #Classification #Metaphor · x.com · Jul 7, 2024
Simple probes can catch sleeper agents #Training #Large Language Models #Anthropic #Paper #Classification #Cybersecurity · anthropic.com · Apr 24, 2024
Who's Harry Potter? Approximate Unlearning in LLMs #Machine Learning #Large Language Models #Paper #PDF #Fine-Tuning #Microsoft #Classification · arxiv.org · Dec 27, 2023
Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools #ChatGPT #Classification #Paper #PDF · sciencedirect.com · Jun 15, 2023
DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text #Large Language Models #Classification #Paper #PDF · arxiv.org · Jun 12, 2023
Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools #Generative Models #Classification #Writing #Paper #PDF #ChatGPT · cell.com · Jun 8, 2023
Paraphrase Detection: Human vs. Machine Content #Paraphrase #Quora #Paper #PDF #Classification · arxiv.org · May 7, 2023
ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text #ChatGPT #Classification #Political Science #Paper #PDF · arxiv.org · May 1, 2023