If you can get an AI vendor to include a few tailored toxic entries—you don’t seem to need that many, even for a large model—the attacker can affect outcomes generated by the system as a whole.
The attacks apply to seemingly every modern type of AI model. They don’t seem to require any special knowledge about the internals of the system—black box attacks have been demonstrated to work on a number of occasions—which means that OpenAI’s secrecy is of no help.
They seem to be able to target specific keywords for manipulation. That manipulation can be a change in sentiment (always positive or always negative), meaning (forced mistranslations), or quality (degraded output for that keyword). The keyword doesn’t have to be mentioned in the toxic entries. Systems built on federated learning seem to be as vulnerable as the rest.
Turns out that language models can also be poisoned during fine-tuning
The researchers managed to do both keyword manipulation and degrade output with as few as a hundred toxic entries, and they discover that large models are less stable and more vulnerable to poisoning. They also discovered that preventing these attacks is extremely difficult, if not realistically impossible.
This means that OpenAI and ChatGPT as a product is overpriced. We don’t know if their products have serious defects or not. It means that OpenAI, as an organisation, is probably overvalued by investors.
The only rational option the rest of us have is to price them as if their products are defective and manipulated.