Stephen Van Tran

OpenAI wants to build a computer you never look at. The company that redefined how millions interact with artificial intelligence through ChatGPT is now making its largest acquisition ever—a $6.5 billion all-stock deal for io, the hardware startup founded by former Apple design chief Jony Ive—and the first product will deliberately lack the one feature we’ve come to expect from every modern device: a screen.

The thesis is audacious and counterintuitive: in an era where Apple, Google, and Meta pour billions into displays—retina screens, spatial computing headsets, augmented reality glasses—OpenAI is betting that the future of computing is invisible. The device, internally codenamed “Gumdrop,” is roughly the size of an old iPod Shuffle, designed to slip into a pocket or hang around your neck, controlled entirely by voice.

This isn’t merely a hardware play. OpenAI is simultaneously overhauling its audio AI capabilities, unifying multiple engineering and research teams to ship a new audio model architecture by the end of Q1 2026. The new model reportedly handles interruptions like an actual conversation partner, speaks while you’re talking—something current models cannot manage—and produces responses that sound dramatically more natural and emotive. The company is building both the brain and the body, and timing them to arrive together.

Sam Altman described the vision in strikingly anti-smartphone terms. Where the iPhone feels like “walking through Times Square,” the OpenAI device aims for something like “sitting in the most beautiful cabin by a lake in the mountains and just enjoying the peace and calm”. It’s a bet that users are exhausted by the attention economy, that the next great interface isn’t more pixels but fewer, and that voice—the oldest human interface—might also be the most intimate.

The graveyard of AI hardware startups offers sobering context. Humane, a startup once valued at $850 million, saw its AI Pin returned more often than it was purchased. Rabbit’s R1 launched with fanfare and landed with a thud. Both devices promised ambient intelligence and delivered clunky, cloud-dependent toys that couldn’t justify their existence alongside a smartphone. Media outlets listed them among the top hardware failures of 2024, exposing the enormous gap between concept and execution in AI hardware.

OpenAI appears to have studied those failures. By acquiring Ive’s 55-person team—including former Apple designers Scott Cannon, Evans Hankey, and Tang Tan, who helped build the iPod, iPhone, and iPad—OpenAI is attempting to combine frontier AI research with the world’s most accomplished consumer hardware design culture. The question isn’t whether they can build something beautiful. The question is whether beautiful is enough when the smartphone in every pocket already does voice, and does it well.

The $6.5 billion bet on calm computing

When OpenAI announced the io acquisition in May 2025, the deal structure told a story beyond the headline number. OpenAI already held a 23% stake in io from an earlier agreement, meaning it paid an additional $5 billion to fully absorb the startup—still OpenAI’s largest acquisition ever. The company wasn’t just buying a product roadmap; it was buying a design philosophy.

Ive has been explicit about his motivations. He sees audio-first design as a chance to “right the wrongs” of past consumer gadgets, addressing the addiction loops that his own iPhone designs inadvertently created. The screenless approach isn’t a technical limitation but a moral stance: removing the visual interface entirely forces every interaction to be high-intent, leaving no room for idle scrolling.

The io team—55 engineers, scientists, physicists, and product specialists—represents more than a decade of Apple’s institutional knowledge transplanted into an AI-first context. Ive retains control of his design firm LoveFrom, which will continue to operate independently, but io now operates as a hardware division within OpenAI, with Ive taking on “deep creative and design responsibilities across OpenAI” more broadly.

The collaboration began two years before the acquisition, when Ive and Altman started quietly working on what they call the “third core device”—a gadget designed to live alongside your phone and laptop rather than replace them. That framing is crucial: unlike Humane and Rabbit, which positioned their devices as smartphone alternatives, OpenAI is explicitly positioning its device as a complement. The pitch isn’t “throw away your phone” but “stop staring at it.”

OpenAI COO Brad Lightcap articulated the strategic vision in a Wall Street Journal interview, describing an opportunity for AI access through an “ambient computer layer” rather than web browsers and mobile apps. The company wants to build AI that is “truly personal”—knowing your context, your habits, your preferences—without requiring you to pull out a screen and type.

The prototype that Altman and Ive revealed in November 2025 is notably screenless and pocketable, though the final form factor remains fluid. Internal discussions have explored smart speakers, smart glasses, and a pen-like device, all operated by voice without a display. The company isn’t planning just one gadget but a family of devices, with the first consumer release targeted for late 2026 or 2027.

Manufacturing has shifted to Foxconn, Apple’s longtime manufacturing partner, after OpenAI moved production away from China-based Luxshare due to concerns about mainland Chinese manufacturing for a device that will handle sensitive user data. The device will likely be assembled in Vietnam or the United States, aligning with OpenAI’s preference for a non-China supply chain.

Altman has set extraordinary manufacturing ambitions. In a since-leaked conversation with staff, he claimed the goal is to produce 100 million devices “faster than any company has ever shipped 100 million of something new before.” For context, the original iPhone took roughly four years to reach 100 million cumulative units. The ambition suggests OpenAI isn’t thinking of this as a niche product for early adopters but as a mass-market play from day one.

The design philosophy Ive has articulated reads as a direct critique of the complexity that has crept into modern interfaces: “incredibly intelligent, sophisticated products that you want to touch, and you feel no intimidation, and you want to use almost carelessly—that you use them almost without thought.” The goal is a device so intuitive it disappears, leaving only the conversation between human and AI.

Whether that vision translates into a product people actually want is the $6.5 billion question. The io acquisition gives OpenAI world-class design talent, but design excellence has never guaranteed commercial success in hardware. Ask Google about its Pixel phones, or Amazon about its Fire Phone. What Ive brings is the credibility to make the public take OpenAI’s hardware ambitions seriously—and that alone may be worth the premium.

Voice AI gets a new architecture

The hardware is only half the story. OpenAI is simultaneously rebuilding its audio AI from the ground up, with a new model architecture expected by the end of March 2026 that addresses fundamental limitations in current voice assistants.

Today’s gpt-realtime model uses a transformer architecture that struggles with overlapping speech. When a human interrupts, the model pauses awkwardly. When it needs to speak while processing, it can’t. The new architecture reportedly solves both problems, enabling continuous audio exchange in which the model speaks while you’re talking, something no current commercial voice AI can manage convincingly.
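
The gap is easiest to see in how today’s turn-based systems have to approximate interruption handling. The sketch below is a minimal illustration, assuming hypothetical mic, speaker, voice-activity, and reply-generation components rather than any real OpenAI interface: the assistant’s reply runs as a task that must be explicitly cancelled the moment the user starts talking again, which produces exactly the awkward pause described above. A genuinely full-duplex model would handle the overlap itself instead of needing this bolt-on barge-in logic.

```python
import asyncio

async def play(chunks, speaker):
    """Stream reply audio to the speaker, yielding so a cancel can land mid-reply."""
    for chunk in chunks:
        speaker.write(chunk)
        await asyncio.sleep(0)

async def turn_based_loop(mic, speaker, vad, generate_reply):
    """Hypothetical half-duplex assistant loop with bolt-on barge-in handling."""
    while True:
        utterance = await mic.capture_utterance()       # wait for the user to stop talking
        reply_audio = await generate_reply(utterance)    # on-device or cloud inference
        playback = asyncio.create_task(play(reply_audio, speaker))
        barge_in = asyncio.create_task(vad.wait_for_speech())  # resolves if the user talks again

        done, _ = await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)
        if barge_in in done:
            playback.cancel()   # the only option: cut the assistant off mid-sentence
        else:
            barge_in.cancel()   # reply finished without interruption
        await asyncio.gather(playback, barge_in, return_exceptions=True)
```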

The technical improvements matter because they change what voice interfaces can actually do. Current voice assistants work best for command-and-control interactions: “Set a timer for ten minutes,” “Play this song,” “What’s the weather?” They struggle with genuine conversation—the back-and-forth, the interruptions, the overlapping speech, the natural rhythm of human dialogue. OpenAI’s new model is designed for conversations that feel natural rather than transactional.

The Information reports that OpenAI has unified several engineering, product, and research teams to accelerate this work, placing the initiative under Kundan Kumar, a former researcher at venture-backed AI provider Character.AI. The organizational restructuring suggests the audio push is now a company-wide priority, not a side project.

Beyond conversational fluidity, the new model aims for emotional intelligence. OpenAI’s documentation describes responses that sound more natural and emotive, providing more accurate, in-depth answers while handling the paralinguistic cues (tone, pacing, emphasis) that give human speech much of its meaning. For a screenless device, this isn’t a nice-to-have; it’s existential. Without visual feedback, the voice must carry all the meaning.

OpenAI has already demonstrated capabilities in this direction. Its existing gpt-realtime model processes and generates audio directly through a single model and API, bypassing traditional pipelines that chain together separate speech-to-text and text-to-speech systems. The single-model approach reduces latency, preserves nuance in speech, and produces more natural responses. On the MultiChallenge audio benchmark, gpt-realtime scores 30.5% for instruction-following accuracy, up from 20.6% for the December 2024 model: a meaningful improvement, though the absolute numbers suggest how far voice AI still has to travel.
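
To make the architectural difference concrete, here is a schematic contrast with hypothetical callables standing in for the real components (this is not OpenAI’s actual API): the chained pipeline pays three inference hops and discards tone, pacing, and emphasis at the transcription step, while a speech-native model makes a single hop over the raw audio.

```python
from typing import Callable

def chained_pipeline(
    audio_in: bytes,
    stt: Callable[[bytes], str],    # speech-to-text
    llm: Callable[[str], str],      # text-only language model
    tts: Callable[[str], bytes],    # text-to-speech
) -> bytes:
    text = stt(audio_in)     # paralinguistic cues are flattened to plain text here
    reply = llm(text)        # the model reasons over words alone
    return tts(reply)        # prosody has to be re-synthesized from scratch

def speech_native(audio_in: bytes, speech_model: Callable[[bytes], bytes]) -> bytes:
    # One model consumes and emits audio directly: a single inference hop,
    # with tone and timing available to the model end to end.
    return speech_model(audio_in)
```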

The device itself will run OpenAI’s tailored AI models locally, with cloud computational support for more intensive tasks. This hybrid approach addresses the latency problem that plagued earlier AI hardware: simple interactions can happen instantly on-device, while complex reasoning can leverage OpenAI’s data centers without the user waiting. The pen-like device will reportedly include microphones and cameras for contextual awareness, converting handwritten notes to text and understanding visual context.
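
What that hybrid split could look like in practice is sketched below; the routing heuristic, interfaces, and threshold are illustrative assumptions, not OpenAI’s design. Short, latency-sensitive requests stay on the device, while anything that needs heavier reasoning is escalated to the cloud and tolerates the round trip.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HybridAssistant:
    local_model: Callable[[str], str]         # small model running on the device
    cloud_model: Callable[[str], str]         # larger model in a data center
    complexity_score: Callable[[str], float]  # crude estimate of how hard the request is
    threshold: float = 0.5

    def respond(self, request: str) -> str:
        # Simple commands ("set a timer for ten minutes") stay on-device for an
        # instant reply; open-ended questions go to the cloud for deeper reasoning.
        if self.complexity_score(request) < self.threshold:
            return self.local_model(request)
        return self.cloud_model(request)
```

In a sketch like this, the hard engineering lives in the router: misclassify a request and the user either waits on the network for a trivial command or gets a shallow on-device answer to a question that deserved more.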

The competitive timing is notable. Google has delayed the sunset of Google Assistant in favor of Gemini Live until March 2026, while Apple has pushed its “Siri V2” overhaul to a Spring 2026 target following a leadership shakeup. Both incumbents are racing to improve their voice assistants, but both are constrained by their existing hardware ecosystems. OpenAI, starting from scratch, can design hardware and software together without legacy compatibility concerns.

The broader context is a Silicon Valley-wide pivot toward audio. TechCrunch frames the moment as big tech “declaring war on screens”, with multiple companies betting that the next interface paradigm will be ears rather than eyes. Whether this reflects genuine user demand or collective wishful thinking about screen addiction remains to be seen. What’s clear is that OpenAI is positioning itself at the center of that bet, building both the models and the hardware to make it real.

Learning from the AI hardware graveyard

OpenAI’s device will not be the first AI-native hardware to hit the market. It will be the latest in a line that includes spectacular failures—and understanding those failures is essential to understanding whether OpenAI can succeed.

Humane’s AI Pin launched in early 2024 with $230 million in funding and a striking design: a small, screenless device worn on the chest, featuring a laser projector that displayed information on the user’s palm. The reviews were devastating. The Verge called it “the solution to none of your problems.” The device was slow, the battery life was poor, the voice recognition was unreliable, and perhaps most damning, everything it did could be done better by the smartphone already in users’ pockets. By late 2024, Humane was returning more units than it sold and actively seeking a buyer.

Rabbit’s R1 fared little better. The $199 device promised a new interaction paradigm based on “Large Action Models”—AI that could perform tasks across apps rather than just answering questions. The reality was a device with missing features, performance problems, and a growing sense that it was inessential. Security researchers discovered vulnerabilities that exposed user data. The founder admitted they had ignored security issues in early development. Both devices made year-end lists of the worst hardware failures of 2024.

The common thread isn’t that the devices were bad ideas but that they were incomplete executions. Both depended entirely on cloud processing, meaning every interaction required network latency. Both launched with promises of future capabilities rather than present utility. Both tried to replace the smartphone while offering a fraction of its functionality. And both failed to answer the basic question that every new device category must answer: what does this do that my phone doesn’t?

OpenAI’s approach differs in several important ways. First, the device is designed to run models locally, reducing its dependence on the cloud and the latency that comes with it. Second, OpenAI is explicitly framing the device as a complement to phones and laptops, not a replacement. Third, the company is building both the AI models and the hardware simultaneously, allowing tight integration that neither Humane nor Rabbit could achieve with off-the-shelf models.

Fourth—and perhaps most importantly—OpenAI has Jony Ive. This isn’t about celebrity; it’s about institutional knowledge. Ive’s team designed the iPod, which created an entirely new device category. They designed the iPhone, which killed the category they had just created. They understand, at a visceral level, what makes a device worth carrying—the weight, the materials, the interactions, the moments of delight that transform a gadget into a habit.

The lesson from failed AI hardware is brutally simple: technical capability is necessary but not sufficient. The Humane AI Pin was technically impressive. The Rabbit R1 was genuinely novel. Neither was useful enough, often enough, to justify its existence. OpenAI’s device will face the same bar: not “is it cool?” but “is it better than just talking to ChatGPT on my phone?”

One overlooked insight from the AI hardware failures is the positioning problem. Rabbit never clarified whether the R1 was a productivity tool, a digital assistant for professionals, or a toy for tech enthusiasts. By trying to be everything to everyone, it ended up being nothing to anyone. OpenAI’s “calm computing” framing is at least a clear positioning: this is for people who feel overwhelmed by their screens and want a quieter way to access AI.

Whether that’s a large enough market remains an open question. The people most overwhelmed by screens are often the least likely to spend $500-plus on an experimental new device category. The early adopters who buy first-generation hardware tend to be tech enthusiasts who want more capabilities, not fewer. OpenAI’s challenge is to make “doing less” feel like getting more.

The paths to victory and the ways this breaks

The bull case for OpenAI’s device writes itself. AI is becoming the default interface for an expanding range of tasks—research, writing, scheduling, coding, creative work. The current access modes—typing into ChatGPT on a laptop, talking to a phone—are clunky and context-poor. A dedicated device that knows your habits, hears your environment, and sits ready for natural conversation could become the primary way people interact with AI. If voice AI reaches parity with human conversation, and if OpenAI maintains its model lead, the company that owns the interface could capture enormous value.

Altman has articulated this as a vision for “ambient computing”—AI that surrounds you rather than requiring you to summon it. The phrase echoes Calm Technology principles from the 1990s: computing that recedes into the background, informing without demanding attention. The OpenAI device would be the first serious attempt to build a mass-market calm computing device, which is either visionary or a very expensive experiment in UX philosophy.

The bear case is equally clear. Smartphones are not going away. Every major tech company has tried and failed to create a successful post-smartphone device category. Apple’s Watch succeeded only by becoming an iPhone accessory rather than a replacement. Meta’s VR headsets remain niche despite billions in investment. Google Glass became a punchline. The smartphone is the most successful consumer electronics product in history, and it already does voice: Siri, Google Assistant, and ChatGPT itself are all available through the device in your pocket.

The technical challenges are formidable. For a screenless device to work, users must have absolute confidence in the AI’s verbal accuracy, since there’s no screen to verify output. The “hallucination” problem that plagues all large language models becomes existential: if the device confidently says something wrong, and you act on it, there’s no visual trace of the error. Current AI models still hallucinate regularly, and there’s no evidence that OpenAI has solved this problem.

Privacy concerns loom over any device that sees and hears everything. A screenless gadget with microphones and cameras, designed to understand your context and behavior, raises questions about data security that OpenAI has not yet fully addressed. The company’s shift to Foxconn manufacturing explicitly cited concerns about Chinese supply chains, suggesting awareness of the sensitivity—but geopolitical supply chain decisions don’t answer questions about what data the device collects, where it’s stored, and who has access.

The “personality” problem is reported to be a significant internal debate. What should this device feel like? How should it handle conflict, uncertainty, sensitive topics? For a text-based chatbot, these questions are manageable—the user can read carefully, ask follow-up questions, correct misunderstandings. For a voice-only device, the personality becomes the interface. Get it wrong and users feel creeped out, condescended to, or simply annoyed.

Battery life, durability, and everyday utility remain unknowns. The iPod Shuffle form factor suggests something you clip on and forget about, but a device that’s always listening requires power, and batteries that last all day while running AI models are a genuine engineering challenge. Will users remember to charge another device? Will they carry it when they already have a phone? Will the utility justify the friction?

For operators and investors, the realistic assessment is this: OpenAI is making the largest bet in AI hardware history, with the best design team available, at a moment when voice AI is meaningfully improving. The company has the resources, the talent, and the model capabilities to succeed. But success requires threading multiple needles simultaneously: technical performance, user experience, privacy, manufacturing, pricing, positioning, and timing. The failures of Humane and Rabbit prove that threading even most of those needles isn’t enough.

The following operator checklist distills what matters for anyone watching this space, building adjacent products, or considering whether OpenAI’s hardware ambitions change the competitive landscape:

  • Watch the audio model release in Q1 2026. If the new architecture delivers on promises—natural interruption handling, simultaneous speech, emotional expression—it validates the core technical thesis. If it underdelivers, the device becomes a beautiful shell waiting for better software.

  • Track Foxconn production timelines. Hardware delays are the norm in consumer electronics, and a slip from late 2026 to 2027 or beyond would give competitors time to catch up. Apple and Google are not standing still.

  • Monitor enterprise versus consumer positioning. OpenAI might pivot to enterprise customers first—ambient AI for knowledge workers, sales teams, or field service—where the value proposition is clearer and the price tolerance is higher.

  • Assess the pricing signal. A $500 device signals premium positioning; a $199 device signals mass-market ambition. The price will reveal who OpenAI thinks the customer is.

  • Evaluate the “complement” story. If OpenAI devices require a paired smartphone to work properly, the value proposition weakens. If they’re genuinely standalone, the addressable market expands.

  • Watch for developer platform announcements. If OpenAI opens the device to third-party apps and skills, it’s building a platform. If it keeps the experience closed, it’s building a product.

The device will arrive in a world where voice assistants have disappointed for a decade, where AI hardware startups have failed spectacularly, and where the smartphone remains the center of digital life. OpenAI is betting that Jony Ive’s design sensibility, frontier AI capabilities, and cultural exhaustion with screen addiction create an opening for something new. It’s a bet worth watching, even if it’s not yet a bet worth making.