The AI trust crisis
14th December 2023

Dropbox added some new AI features. In the past couple of days these have attracted a firestorm of criticism. Benj Edwards rounds it up in Dropbox spooks users with new AI features that send data to OpenAI when used.

The key issue here is that people are worried that their private files on Dropbox are being passed to OpenAI to use as training data for their models—a claim that is strenuously denied by Dropbox.

As far as I can tell, Dropbox built some sensible features—summarize on demand, “chat with your data” via Retrieval Augmented Generation—and did a moderately OK job of communicating how they work... but when it comes to data privacy and AI, a “moderately OK job” is a failing grade. Especially if you hold as much of people’s private data as Dropbox does!

Two details in particular seem really important. Dropbox have an AI principles document which includes this:

Customer trust and the privacy of their data are our foundation. We will not use customer data to train AI models without consent.

They also have a checkbox in their settings that looks like this:

[screenshot of the settings checkbox]

Update: Some time between me publishing this article and four hours later, that link stopped working.

I took that screenshot on my own account. It’s toggled “on”—but I never turned it on myself. Does that mean I’m marked as “consenting” to having my data used to train AI models?

I don’t think so: I think this is a combination of confusing wording and the eternal vagueness of what the term “consent” means in a world where everyone agrees to the terms and conditions of everything without reading them.

But a LOT of people have come to the conclusion that this means their private data—which they pay Dropbox to protect—is now being funneled into the OpenAI training abyss.

People don’t believe OpenAI

Here’s copy from that Dropbox preference box, talking about their “third-party partners”—in this case OpenAI:

Your data is never used to train their internal models, and is deleted from third-party servers within 30 days.

It’s increasingly clear to me that people simply don’t believe OpenAI when they’re told that data won’t be used for training. What’s really going on here is something deeper than that: AI is facing a crisis of trust.

I quipped on Twitter:

“OpenAI are training on every piece of data they see, even when they say they aren’t” is the new “Facebook are showing you ads based on overhearing everything you say through your phone’s microphone”

Here’s what I meant by that.

Facebook don’t spy on you through your microphone

Have you heard the one about Facebook spying on you through your phone’s microphone and showing you ads based on what you’re talking about? This theory has been floating around for years.

From a technical perspective it should be easy to disprove:

- Mobile phone operating systems don’t allow apps to invisibly access the microphone.
- Privacy researchers can audit communications between devices and Facebook to confirm if this is happening.
- Running high quality voice recognition like this at scale is extremely expensive—I had a conversation with a friend who works on server-based machine learning at Apple a few years ago who found the entire idea laughable.

The non-technical reasons are even stronger:

- Facebook say they aren’t doing this. The risk to their reputation if they are caught in a lie is astronomical.
- As with many conspiracy theories, too many people would have to be “in the loop” and not blow the whistle.
- Facebook don’t need to do this: there are much, much cheaper and more effective ways to target ads at you than spying through your microphone. These methods have been working incredibly well for years.
- Facebook gets to show us thousands of ads a year. 99% of those don’t correlate in the slightest to anything we have said out loud. If you keep rolling the dice long enough, eventually a coincidence will strike.

Here’s the thing though: none of these arguments matter. If you’ve ever experienced Facebook showing you an ad for something that you were talking about out loud moments earlier, you’ve already dismissed everything I just said. You have personally experienced anecdotal evidence which overrides all of my arguments here.
One consistent theme I’ve seen in conversations about this issue is that people are much more comfortable trusting their data to local models that run on their own devices than models hosted in the cloud. The good news is that local models are consistently both increasing in quality and shrinking in size.
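An aside on the mechanics: the “chat with your data” feature described above is Retrieval Augmented Generation, which retrieves relevant snippets at query time and passes them to the model as context; that is a separate question from whether the data is used for training. Below is a minimal sketch of the pattern, assuming a naive keyword retriever and a `generate` callable standing in for any model, local or hosted; the names are illustrative, not Dropbox’s or OpenAI’s actual implementation.

```python
# Minimal sketch of the Retrieval Augmented Generation ("chat with your data")
# pattern: snippets are retrieved at query time and passed as context.
# Nothing in this flow trains a model on the user's files.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query, return top k."""
    terms = set(query.lower().split())
    return sorted(
        documents,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )[:k]

def answer(query: str, documents: list[str], generate) -> str:
    """Build a prompt from retrieved snippets and hand it to `generate`,
    a placeholder for any text-generation callable (local or hosted)."""
    context = "\n\n".join(retrieve(query, documents))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

# Usage with a trivial stand-in "model":
# print(answer("When does the contract renew?", my_docs, lambda p: p[:200]))
```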
·simonwillison.net·
Surprise! The Latest ‘Comprehensive’ US Privacy Bill Is Doomed
Deleting sections of a bill holding companies accountable for making data-driven decisions that could lead to discrimination in housing, employment, health care, and the like spurred a strong response from civil society organizations including the NAACP, the Japanese American Citizens League, the Autistic Self Advocacy Network, and Asian Americans Advancing Justice, among dozens of others.
In a letter this week to E&C Democrats, obtained by WIRED, the groups wrote: “Privacy rights and civil rights are no longer separate concepts—they are inextricably bound together and must be protected. Abuse of our data is no longer limited to targeted advertising or data breaches. Instead, our data are used in decisions about who gets a mortgage, who gets into which schools, and who gets hired—and who does not.”
These provisions contained generous “pro-business” caveats. For instance, users would be able to opt out of algorithmic decision-making only if doing so wasn’t “prohibitively costly” or “demonstrably impracticable due to technological limitations.” Similarly, companies could have limited the public’s knowledge about the results of any audits by simply hiring an independent assessor to complete the task rather than doing so internally.
·wired.com·
AI and problems of scale — Benedict Evans
Scaling technological abilities can itself represent a qualitative change, where a difference in degree becomes a difference in kind, requiring new ways of thinking about ethical and regulatory implications. These are usually a matter of social, cultural, and political considerations rather than purely technical ones.
What if every police patrol car had a bank of cameras that scan not just every number plate but every face within a hundred yards against a national database of outstanding warrants? What if the cameras in the subway do that? All the connected cameras in the city? China is already trying to do this, and we seem to be pretty sure we don’t like that, but why? One could argue that there’s no difference in principle, only in scale, but a change in scale can itself be a change in principle.
As technology advances, things that were previously possible only on a small scale can become practically feasible at a massive scale, which can change the nature and implications of those capabilities.
Generative AI is now creating a lot of new examples of scale itself as a difference in principle. You could look at the emergent abuse of AI image generators, shrug, and talk about Photoshop: there have been fake nudes on the web for as long as there’s been a web. But when high-school boys can load photos of 50 or 500 classmates into an ML model and generate thousands of such images (let’s not even think about video) on a home PC (or their phone), that does seem like an important change. Faking people’s voices has been possible for a long time, but it’s new and different that any idiot can do it themselves. People have always cheated at homework and exams, but the internet made it easy and now ChatGPT makes it (almost) free. Again, something that has always been theoretically possible on a small scale becomes practically possible on a massive scale, and that changes what it means.
This might be a genuinely new and bad thing that we don’t like at all; or, it may be new and we decide we don’t care; we may decide that it’s just a new (worse?) expression of an old thing we don’t worry about; and, it may be that this was indeed being done before, even at scale, but somehow doing it like this makes it different, or just makes us more aware that it’s being done at all. Cambridge Analytica was a hoax, but it catalysed awareness of issues that were real.
As new technologies emerge, there is often a period of ambivalence and uncertainty about how to view and regulate them, as they may represent new expressions of old problems or genuinely novel issues.
·ben-evans.com·