If you’d looked at Skype in 2004 and argued that it would own ‘voice’ on ‘computers’, that would not have been the right mental model. I think this is where we’ll go with video - there will continue to be hard engineering, but video itself will be a commodity and the question will be how you wrap it. There will be video in everything, just as there is voice in everything, and there will be a great deal of proliferation into industry verticals on one hand and into unbundling pieces of the tech stack on the other. In the verticals, video in healthcare, education or insurance is about the workflow, the data model and the route to market, and lots of interesting companies will be created there. In the stack, Slack is deploying video on top of Amazon’s building blocks, and lots of interesting companies will be created there as well. There’s lots of bundling and unbundling coming, as always. Everything will be ‘video’ and then it will disappear inside.
The calendar is often the aggregation layer - you don’t need to know what service the next call uses, just when it is. Skype needed both an account and an app, so it had a network effect (and lost even so). WhatsApp uses the telephone numbering system as an address and so piggybacked on your phone’s contact list - effectively, it used the PSTN as the social graph rather than having to build its own. But a group video call is a URL and a calendar invitation - it has no graph of its own.
One of the ways that this all feels very 1.0 is the rather artificial distinction between calls that are based on a ‘room’, where the addressing system is a URL and anyone can join without an account, and calls that are based on ‘people’, where everyone joining needs their own address, whether it’s a phone number, an account or something else. Hence Google has both Meet (URLs) and Duo (people), while Apple’s FaceTime is only people (no URLs).
When Snap launched, there were already infinite ways to share images, but Snap asked a bunch of weird questions that no-one had really asked before. Why do you have to press the camera button - why doesn’t the app open in the camera? Why are you saving your messages - isn’t that like saving all your phone calls? Fundamentally, Snap asked ‘why, exactly, are you sending a picture? What is the underlying social purpose?’ You’re not really sending someone a sheet of pixels - you’re communicating.
That’s the question Zoom and all its competitors haven’t really asked. Zoom has done a good job of asking why it was hard to get into a call, but it hasn’t asked why you’re in the call in the first place. Why, exactly, are you sending someone a video stream and watching another one? Why am I looking at a grid of little thumbnails of faces? Is that the purpose of this moment? What is the ‘mute’ button for - to block background noise, to let me talk to someone else, or is it something I turn off to raise my hand? What social purpose is ‘mute’ actually serving? What is screen-sharing for? What other questions could one ask? And so if Zoom is the Dropbox or Skype of video, we are waiting for the Snap, the Clubhouse and the Yo.