Vision Pro is an over-engineered “devkit” // Hardware bleeds genius & audacity but software story is disheartening // What we got wrong at Oculus that Apple got right // Why Meta could finally have its Android moment
Some of the topics I touch on:
- Why I believe Vision Pro may be an over-engineered “devkit”
- The genius & audacity behind some of Apple’s hardware decisions
- Gaze & pinch is an incredible UI superpower and major industry ah-ha moment
- Why the Vision Pro software/content story is so dull and unimaginative
- Why most people won’t use Vision Pro for watching TV/movies
- Apple’s bet in immersive video is a total game-changer for live sports
- Why I returned my Vision Pro… and my Top 10 wishlist to reconsider
- Apple’s VR debut is the best thing that ever happened to Oculus/Meta
- My unsolicited product advice to Meta for Quest Pro 2 and beyond
Apple really played it safe in the design of this first VR product by over-engineering it. For starters, Vision Pro ships with more sensors than what’s likely necessary to deliver Apple’s intended experience. This is typical in a first-generation product that’s been under development for so many years. It makes Vision Pro start to feel like a devkit.
A sensor party: 6 tracking cameras, 2 passthrough cameras, 2 depth sensors (plus 4 eye-tracking cameras not shown)
it’s easy to understand two particularly important decisions Apple made for the Vision Pro launch:
- Designing an incredible in-store Vision Pro demo experience, with the primary goal of getting as many people as possible to experience the magic of VR through Apple’s lenses — most of whom have no intention to even consider a $4,000 purchase. The demo is only secondarily focused on actually selling Vision Pro headsets.
- Launching an iconic woven strap that photographs beautifully even though this strap simply isn’t comfortable enough for the vast majority of head shapes. It’s easy to conclude that this decision paid off because nearly every bit of media coverage (including and especially third-party reviews on YouTube) uses the woven strap despite the fact that it’s less comfortable than the dual loop strap that’s “hidden in the box”.
Apple’s relentless and uncompromising hardware insanity is largely what made it possible for such a high-res display to exist in a VR headset, and it’s clear that this product couldn’t possibly have launched much sooner than 2024 for one simple limiting factor — the maturity of micro-OLED displays plus the existence of power-efficient chipsets that can deliver the heavy compute required to drive this kind of display (i.e. the M2).
·hugo.blog·
Natural Language Is an Unnatural Interface
On the user experience of interacting with LLMs
Prompt engineers not only need to get the model to respond to a given question but also structure the output in a parsable way (such as JSON), in case it needs to be rendered in some UI components or be chained into the input of a future LLM query. They scaffold the raw input that is fed into an LLM so the end user doesn’t need to spend time thinking about prompting at all.
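To make the scaffolding idea concrete, here is a minimal sketch of what that wrapping might look like, assuming a hypothetical callLLM client and an invented restaurant-recommendation use case (none of these names come from the article): the user’s raw input is embedded in a template that fixes the task and demands JSON, so the response can be parsed and rendered as UI components.

```typescript
// Minimal sketch of prompt scaffolding. `callLLM` is a stand-in for whatever
// chat-completion client a real product would use (an assumption, not a real API).
async function callLLM(prompt: string): Promise<string> {
  throw new Error("wire up a real model client here");
}

interface RestaurantSuggestion {
  name: string;
  cuisine: string;
  reason: string;
}

// Wrap the user's raw input in a template that pins down the task and the
// output shape, so the end user never has to think about prompting.
function scaffoldPrompt(userInput: string): string {
  return [
    "You are a restaurant recommendation assistant.",
    "Respond ONLY with a JSON array of objects shaped like:",
    '[{"name": string, "cuisine": string, "reason": string}]',
    "",
    `User request: ${userInput}`,
  ].join("\n");
}

async function getSuggestions(userInput: string): Promise<RestaurantSuggestion[]> {
  const raw = await callLLM(scaffoldPrompt(userInput));
  try {
    // The structured format lets the output feed UI components or a follow-up LLM call.
    return JSON.parse(raw) as RestaurantSuggestion[];
  } catch {
    // Models aren't guaranteed to follow the format, so parsing can fail.
    return [];
  }
}
```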
From the user’s side, it’s hard to decide what to ask while providing the right amount of context. From the developer’s side, two problems arise. It’s hard to monitor natural language queries and understand how users are interacting with your product. It’s also hard to guarantee that an LLM can successfully complete an arbitrary query. This is especially true for agentic workflows, which are incredibly brittle in practice.
When we speak to other people, there is a shared context that we communicate under. We’re not just exchanging words, but a larger information stream that also includes intonation while speaking, hand gestures, memories of each other, and more. LLMs unfortunately cannot understand most of this context and therefore can only do as much as is described by the prompt.
most people use LLMs for ~4 basic natural language tasks, rarely taking advantage of the conversational back-and-forth built into chat systems:
- Summarization: Summarizing a large amount of information or text into a concise yet comprehensive summary. This is useful for quickly digesting information from long articles, documents or conversations. An AI system needs to understand the key ideas, concepts and themes to produce a good summary.
- ELI5 (Explain Like I'm 5): Explaining a complex concept in a simple, easy-to-understand manner without any jargon. The goal is to make an explanation clear and simple enough for a broad, non-expert audience.
- Perspectives: Providing multiple perspectives or opinions on a topic. This could include personal perspectives from various stakeholders, experts with different viewpoints, or just a range of ways a topic can be interpreted based on different experiences and backgrounds. In other words, “what would ___ do?”
- Contextual Responses: Responding to a user or situation in an appropriate, contextualized manner (via email, message, etc.). Contextual responses should feel organic and on-topic, as if provided by another person participating in the same conversation.
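One way to read this list is that each task is really a reusable prompt template the product can fill in on the user’s behalf. A rough sketch of that idea (the template names and wording here are mine, not the author’s):

```typescript
// Rough sketch: the four recurring tasks captured as prompt templates.
// Wording is illustrative only; a real product would tune these carefully.
type TaskKind = "summarize" | "eli5" | "perspectives" | "respond";

const templates: Record<TaskKind, (input: string) => string> = {
  summarize: (text) =>
    `Summarize the following into a concise but comprehensive summary:\n\n${text}`,
  eli5: (concept) =>
    `Explain the following simply and without jargon, for a non-expert audience:\n\n${concept}`,
  perspectives: (topic) =>
    `Give three distinct stakeholder perspectives on the following topic, labeling each:\n\n${topic}`,
  respond: (message) =>
    `Draft an organic, on-topic reply to the following message, matching its tone:\n\n${message}`,
};

// The app picks the task (via a button, not a blank box), so the user never
// has to compose the prompt themselves.
const prompt = templates.summarize("…long article text…");
```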
Prompting nearly always gets in the way because it requires the user to think. End users ultimately do not wish to confront an empty text box in accomplishing their goals. Buttons and other interactive design elements make life easier. The interface makes all the difference in crafting an AI system that augments and amplifies human capabilities rather than adding additional cognitive load. Similar to standup comedy, delightful LLM-powered experiences require a subversion of expectation.
Users will expect the usual drudge of drafting an email or searching for a nearby restaurant, but instead will be surprised by the amount of work that has already been done for them from the moment that their intent is made clear. For example, it would be a great experience to discover pre-written email drafts or carefully crafted restaurant and meal recommendations that match your personal taste. If you still need to use a text input box, at a minimum, also provide some buttons to auto-fill the prompt box. The buttons can pass LLM-generated questions to the prompt box.
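A small sketch of that last suggestion in plain DOM TypeScript, assuming a hypothetical generateSuggestedQuestions helper backed by the model: the generated questions become buttons that pre-fill the prompt box, so the user starts from a suggestion rather than an empty field.

```typescript
// Sketch: buttons that auto-fill the prompt box with LLM-generated questions.
// `generateSuggestedQuestions` is assumed to exist and call the model under the hood.
declare function generateSuggestedQuestions(context: string): Promise<string[]>;

async function renderPromptSuggestions(
  promptBox: HTMLTextAreaElement,
  container: HTMLElement
): Promise<void> {
  const questions = await generateSuggestedQuestions(promptBox.value);
  container.replaceChildren(); // clear any previously rendered buttons
  for (const question of questions) {
    const button = document.createElement("button");
    button.textContent = question;
    // Clicking a suggestion fills the text box instead of leaving it empty.
    button.addEventListener("click", () => {
      promptBox.value = question;
      promptBox.focus();
    });
    container.appendChild(button);
  }
}
```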
·varunshenoy.substack.com·
Vision Pro — Benedict Evans
Meta, today, has roughly the right price and is working forward to the right device: Apple has started with the right device and will work back to the right price. Meta is trying to catalyse an ecosystem while we wait for the right hardware - Apple is trying to catalyse an ecosystem while we wait for the right price.
one of the things I wondered before the event was how Apple would show a 3D experience in 2D. Meta shows either screenshots from within the system (with the low visual quality inherent in the spec you can make and sell for $500) or shots of someone wearing the headset and grinning - neither are satisfactory. Apple shows the person in the room, with the virtual stuff as though it was really there, because it looks as though it is.
A lot of what Apple shows is possibility and experiment - it could be this, this or that, just as when Apple launched the watch it suggested it as fitness, social or fashion, and it turned out to work best for fitness (and is now a huge business).
Mark Zuckerberg, speaking to a Meta all-hands after Apple’s event, made the perfectly reasonable point that Apple hasn’t shown much that no-one had thought of before - there’s no ‘magic’ invention. Everyone already knows we need better screens, eye-tracking and hand-tracking, in a thin and light device.
It’s worth remembering that Meta isn’t in this to make a games device, nor really to sell devices per se - rather, the thesis is that if VR is the next platform, Meta has to make sure it isn’t controlled by a platform owner who can screw them, as Apple did with IDFA in 2021.
On the other hand, the Vision Pro is an argument that current devices just aren’t good enough to break out of the enthusiast and gaming market, incremental improvement isn’t good enough either, and you need a step change in capability.
Apple’s privacy positioning, of course, has new strategic value now that it’s selling a device you wear that’s covered in cameras
the genesis of the current wave of VR was the realisation a decade ago that the VR concepts of the 1990s would work now, and with nothing more than off-the-shelf smartphone components and gaming PCs, plus a bit more work. But ‘a bit more work’ turned out to be thirty or forty billion dollars from Meta and God only knows how much more from Apple - something over $100bn combined, almost certainly.
So it might be that a wearable screen of any kind, no matter how good, is just a staging post - the summit of a foothill on the way to the top of Everest. Maybe the real Reality device is glasses, or contact lenses projecting onto your retina, or some kind of neural connection, all of which might be a decade or decades away again, and the piece of glass in our pocket remains the right device all the way through.
I think the price and the challenge of category creation are tightly connected. Apple has decided that the capabilities of the Vision Pro are the minimum viable product - that it just isn’t worth making or selling a device without a screen so good you can’t see the pixels, pass-through where you can’t see any lag, perfect eye-tracking and perfect hand-tracking. Of course the rest of the industry would like to do that, and will in due course, but Apple has decided you must do that.
For VR, better screens are merely better, but for AR Apple thinks this level of display system is a base below which you don’t have a product at all.
For Meta, the device places you in ‘the metaverse’ and there could be many experiences within that. For Apple, this device itself doesn’t take you anywhere - it’s a screen and there could be five different ‘metaverse’ apps. The iPhone was a piece of glass that could be anything - this is trying to be a piece of glass that can show anything.
This reminds me a little of when Meta tried to make a phone, and then a Home Screen for a phone, and Mark Zuckerberg said “your phone should be about people.” I thought “no, this is a computer, and there are many apps, some of which are about people and some of which are not.” Indeed there’s also an echo of telco thinking: on a feature phone, ‘internet stuff’ was one or two icons on your portable telephone, but on the iPhone the entire telephone was just one icon on your computer. On a Vision Pro, the ‘Meta Metaverse’ is one app amongst many. You have many apps and panels, which could be 2D or 3D, or could be spaces.
·ben-evans.com·
Technical debt - Wikipedia
In software development, technical debt (also known as design debt[1] or code debt) is the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer.[2] Analogous with monetary debt,[3] if technical debt is not repaid, it can accumulate "interest", making it harder to implement changes. Unaddressed technical debt increases software entropy and cost of further rework.
Common causes of technical debt include ongoing development, where a long series of project enhancements over time renders old solutions sub-optimal.
When I think about Adobe’s reliance on entrenched menu panels and new menus with new/inconsistent interfaces, I think of this. They’ve lasted so long that new features are all stapled on as menus instead of integrated throughout the whole system. Some ideas require a rethink of the whole interface, something Adobe can’t afford because they’re moving too quickly and don’t have the resources to dedicate to something of that scale?
Parallel development on multiple branches accrues technical debt because of the work required to merge the changes into a single source base. The more changes done in isolation, the more debt.
Similarly, this reminds me of the Gmail redesign's "blue-gate", where designers on Twitter pointed out how many different tones of blue were in different aspects of the redesign. It seemed apparent that each component of the interface had its own dedicated team, and the inconsistencies in appearance/interface design came from insufficient communication between the teams.
·en.wikipedia.org·