Search Test Information Space

Found 9 bookmarks

Constitutional Classifiers: Defending against Universal Jailbreaks...

#Anthropic #Classification #Safety #Large Language Models #Paper #PDF

·arxiv.org·Feb 3, 2025

This is one of the most important papers on AI in years. Truly it covers the sweep of reactions to the Great Big AI Crisis with exceptional cogency, using an exceptional metaphoric device.

#AI #Social Theory #Paper #Classification #Metaphor

·x.com·Jul 7, 2024

Simple probes can catch sleeper agents \ Anthropic

#Training #Large Language Models #Anthropic #Paper #Classification #Cybersecurity

·anthropic.com·Apr 24, 2024

Who's Harry Potter? Approximate Unlearning in LLMs

#Machine Learning #Large Language Models #Paper #PDF #Fine-Tuning #Microsoft #Classification

·arxiv.org·Dec 27, 2023

Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools

#ChatGPT #Classification #Paper #PDF

·sciencedirect.com·Jun 15, 2023

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

#Large Language Models #Classification #Paper #PDF

·arxiv.org·Jun 12, 2023

Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools

#Generative Models #Classification #Writing #Paper #PDF #ChatGPT

·cell.com·Jun 8, 2023

Paraphrase Detection: Human vs. Machine Content

#Paraphrase #Quora #Paper #PDF #Classification

·arxiv.org·May 7, 2023

ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text

#ChatGPT #Classification #Political Science #Paper #PDF

·arxiv.org·May 1, 2023
