Speech to Reality
On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly






Alexander Htet Kyaw, Se Hwan Jeon, Miana Smith, Neil Gershenfeld
Massachusetts Institute of Technology
Department of Architecture
Department of Electrical Engineering and Computer Science
Department of Mechanical Engineering
Media Lab | Center for Bits and Atoms

Sponsored by Morningside Academy of Design and the Steve Jobs Archive




Recent advancements in generative AI, such as text-to-3D tools, have made it possible to create 3D digital models from text inputs. However, translating these digital mesh models into tangible physical objects remains a challenge due to the constraints of the real world, such as fabrication time, geometric complexity, and material waste. Our innovation addresses these challenges by integrating generative AI, natural language processing, robotic assembly, and modular design. We present Speech to Reality, an AI-driven robotic assembly system that allows anyone to speak objects into existence in a fast, accessible, and sustainable manner. 
 


Imagine saying, “I want a chair,” and within five minutes a physical chair materializes before you. The system enables on-demand production, taking a user from a spoken prompt to a completed physical object in under five minutes. We demonstrate assembly across a range of user prompts, from functional items like chairs, tables, and shelves to more eccentric requests, such as a tall dog or the letter 'T'.



The system leverages generative AI and robotics for modular assembly, letting users assemble, disassemble, and reconfigure objects on demand through voice commands. For example, a user can prompt the robot to build a table and later reassemble it into a shelf as their needs change, enabling an accessible, on-demand, and sustainable approach to AI-driven physical production.
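
To illustrate how spoken requests might be routed to these three behaviors, here is a minimal dispatch sketch in Python. The keyword matching and action labels are illustrative assumptions for exposition, not the system's actual interface, which would more plausibly rely on a language model for intent detection.

# Hypothetical command router: maps a transcribed utterance to one of
# three high-level robot actions. The keyword matching below is a
# deliberately naive stand-in for real intent detection.
def route_command(utterance: str) -> str:
    text = utterance.lower()
    if "disassemble" in text or "take apart" in text:
        return "disassemble"
    if "reassemble" in text or ("turn" in text and "into" in text):
        return "reconfigure"
    return "build"  # default: generate and assemble a new object

assert route_command("I want a chair") == "build"
assert route_command("Reassemble the table into a shelf") == "reconfigure"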




Speech to Reality redefines how we create and interact with the physical world. The ability to make physical objects through speech could let people produce what they need on demand, simply by articulating it. The convenience, accessibility, and sustainability of this approach could democratize access to custom-made objects, allowing users to bring their ideas to life using natural language.

Generative AI can produce a wide variety of digital models in seconds, but turning AI-generated geometries into physical objects takes time and material resources. 3D printing has previously been used to fabricate generative AI outputs, but it can take hours or even days to produce a single large-scale object. Speech to Reality is an innovative system that integrates 3D generative AI, robotic assembly, and modular design to enable on-demand, sustainable production from natural language input. The solution challenges conventional manufacturing practices that rely on complex supply chains and resource-intensive production methods. Instead of waiting for mass-produced goods to be shipped, users can create exactly what they need, when they need it, and disassemble it when they no longer need it, all from a voice command.
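
As a sketch of the front end of this pipeline, the snippet below goes from a recorded voice prompt to a loaded 3D mesh. It assumes the open-source openai-whisper package for transcription; generate_mesh is a hypothetical placeholder for a text-to-3D service, since this description does not name a specific tool.

import trimesh   # mesh loading and processing
import whisper   # openai-whisper speech-to-text

def generate_mesh(prompt: str) -> str:
    """Hypothetical stand-in for a text-to-3D generator; assumed to
    write a mesh file for the given prompt and return its path."""
    raise NotImplementedError("replace with a real text-to-3D backend")

def speech_to_mesh(audio_path: str) -> trimesh.Trimesh:
    # 1. Transcribe the spoken prompt to text.
    prompt = whisper.load_model("base").transcribe(audio_path)["text"]
    # 2. Generate a 3D model from the text prompt.
    mesh_path = generate_mesh(prompt)
    # 3. Load the mesh for downstream discretization into modules.
    return trimesh.load(mesh_path, force="mesh")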



Additionally, AI-generated geometries do not inherently account for fabrication constraints and cannot be directly assembled by a robot. To address this, we discretize the AI-generated geometry into modular components that a robot can assemble. The modules used for robotic assembly are reusable, lightweight, and based on a structural cuboctahedron geometry that allows assembly from any direction. Each face is embedded with magnets, ensuring secure attachment between adjacent modules while keeping connections reversible. To ensure that these modules can actually be assembled, we further modify the geometry based on algorithmic checks and fabrication constraints, including overhang detection, connectivity search, and robotic arm reachability analysis.
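
A minimal sketch of this discretization step and two of the checks, assuming the generated mesh is voxelized into a boolean occupancy grid with one cell per module. The support-from-below overhang rule here is a simplified, illustrative heuristic; the actual pipeline also accounts for assembly direction and robot arm reachability.

from collections import deque
import numpy as np
import trimesh

def discretize(mesh: trimesh.Trimesh, pitch: float) -> np.ndarray:
    """Voxelize a mesh into a boolean occupancy grid, one cell per module."""
    return mesh.voxelized(pitch).matrix

def is_connected(occ: np.ndarray) -> bool:
    """All occupied cells must form one face-connected component;
    otherwise the object would fall apart into separate pieces."""
    filled = np.argwhere(occ)
    if len(filled) == 0:
        return True
    seen = {tuple(filled[0])}
    queue = deque([tuple(filled[0])])
    while queue:
        x, y, z = queue.popleft()
        for d in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
            n = (x + d[0], y + d[1], z + d[2])
            if (all(0 <= n[i] < occ.shape[i] for i in range(3))
                    and occ[n] and n not in seen):
                seen.add(n)
                queue.append(n)
    return len(seen) == len(filled)

def overhangs(occ: np.ndarray) -> np.ndarray:
    """Flag cells with nothing directly below them (axis 2 is 'up');
    the bottom layer is assumed to rest on the build surface."""
    supported = np.zeros_like(occ)
    supported[:, :, 0] = True             # ground layer is always supported
    supported[:, :, 1:] = occ[:, :, :-1]  # cell below is occupied
    return occ & ~supported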



Speech to Reality is a fully automated system that integrates natural language processing, 3D generative AI, modular design, and robotic assembly into a novel design solution. By transforming spoken inputs into tangible physical objects, the system dramatically lowers the barrier to entry, making physical fabrication accessible to anyone, regardless of technical expertise. Speech to Reality represents a future where on-demand, sustainable, AI-driven manufacturing is within everyone’s reach.




If you find this work relevant, feel free to cite it as follows:

BibTeX:
@misc{kyaw2024speechrealityondemandproduction,
      title={Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly}, 
      author={Alexander Htet Kyaw and Se Hwan Jeon and Miana Smith and Neil Gershenfeld},
      year={2024},
      eprint={2409.18390},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2409.18390},
}

A. H. Kyaw, S. H. Jeon, M. Smith, and N. Gershenfeld, "Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly," arXiv preprint arXiv:2409.18390, 2024. Available: https://arxiv.org/abs/2409.18390.