
Olivier Elemento: A New Paradigm for Data Interaction – Combining Voice and Augmented Reality
Olivier Elemento, Director of the Englander Institute for Precision Medicine at Weill Cornell Medicine, shared a post on LinkedIn:
“A New Paradigm for Data Interaction: Combining Voice and Augmented Reality (AR).
For decades, our view into vast, complex datasets has been confined to the edges of a physical monitor. We pan, zoom, and scroll, but we’re always peering through a small window. Augmented reality promises to break down these walls, offering a virtually infinite canvas for our data.
But a bigger canvas isn’t enough if the way we interact with it is still clunky. This led me to explore a core idea: what if the most intuitive interface for these expansive AR environments isn’t our hands, but our voice?
To put this concept to the test, I built an experimental prototype on the Apple Vision Pro. The exploration revealed several clear insights:
- The Huge Screen is a Game-Changer: The experience of viewing a pathology slide on a virtual screen the size of a wall is fundamentally different from panning on a monitor. I think the true power of AR for professional work isn’t just about 3D models, but about its ability to unbind our data from the confines of physical screens.
- Voice Unlocks True Fluidity: With a two-step AI pipeline (the OpenAI Realtime API for transcription and GPT-4 for function/tool calling), natural language becomes the control scheme. I think voice is the key to making these powerful AR data environments practical, removing the clunky manual interactions that create friction and break concentration. (A minimal sketch of this pattern follows the post.)
- Deep Focus Follows Intuitive Design: When the interface effectively disappears and the data is all-encompassing, the result is a state of deep focus. The user can remain fully immersed in the analytical task at hand, whether it’s identifying cells or spotting anomalies.
- Under the Hood: For those interested in the implementation, making the experience feel simple required a complex full-stack effort. A custom FastAPI server streams the gigapixel slides as DeepZoom-style tiles for efficient navigation. The system also handles intricate coordinate transformations so drawing stays accurate across zoom levels, and runs optimized spatial queries on the backend for real-time analysis. (A simplified tile-server sketch also follows the post.)
While this prototype used digital pathology as its example (and latency suffers at times because I used my iPhone as a hotspot), the core principles apply to any field dealing with large-scale visual data – from geographical analysis and architectural design to financial modeling. It’s an exciting new paradigm for how we see and speak to our data.”
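To make the voice-control idea concrete, here is a minimal sketch of the second step of such a pipeline: a transcribed utterance is sent to GPT-4 with a tool definition, and the returned tool call drives the slide viewer. The tool name (set_viewport), its parameters, and the system prompt are illustrative assumptions, not details from Elemento's prototype.

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical viewer control exposed to the model as a tool.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "set_viewport",
        "description": "Pan and zoom the pathology slide viewer to a region.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "number", "description": "Center x in slide coordinates"},
                "y": {"type": "number", "description": "Center y in slide coordinates"},
                "zoom": {"type": "number", "description": "Magnification, e.g. 20 for 20x"},
            },
            "required": ["x", "y", "zoom"],
        },
    },
}]

def handle_transcript(transcript: str) -> dict | None:
    """Map one transcribed utterance to a viewer command via tool calling."""
    response = client.chat.completions.create(
        model="gpt-4",  # any tool-calling-capable model works here
        messages=[
            {"role": "system", "content": "You control a whole-slide image viewer."},
            {"role": "user", "content": transcript},
        ],
        tools=TOOLS,
    )
    calls = response.choices[0].message.tool_calls
    if not calls:
        return None  # the utterance was not a viewer command
    return {"name": calls[0].function.name,
            "args": json.loads(calls[0].function.arguments)}

# Example: handle_transcript("zoom to 20x on the upper-left quadrant")
# might return {"name": "set_viewport", "args": {"x": ..., "y": ..., "zoom": 20}}
```

Keeping transcription and command interpretation as separate steps means the transcript stream stays fast and cheap, while the heavier tool-calling model is only invoked once per complete utterance.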
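The tile-serving side described under “Under the Hood” can be sketched in a similarly minimal way, assuming OpenSlide's DeepZoomGenerator and a single hard-coded slide file; the route layout and the coordinate helper below are illustrative assumptions, not the prototype's actual code.

```python
from io import BytesIO

from fastapi import FastAPI, Response
from openslide import OpenSlide  # pip install openslide-python
from openslide.deepzoom import DeepZoomGenerator

app = FastAPI()

# Hypothetical slide file; how the prototype stores slides is not public.
slide = OpenSlide("slide.svs")
dz = DeepZoomGenerator(slide, tile_size=254, overlap=1)

@app.get("/tiles/{level}/{col}/{row}.jpeg")
def tile(level: int, col: int, row: int) -> Response:
    """Return one DeepZoom tile, so the client fetches only what is in view."""
    img = dz.get_tile(level, (col, row))  # PIL image for this tile
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return Response(content=buf.getvalue(), media_type="image/jpeg")

def to_level0(x: float, y: float, level: int) -> tuple[float, float]:
    """Map a point drawn at some DeepZoom level back to full-resolution
    slide coordinates: each level below the top halves the resolution,
    which is the kind of transform needed for drawing across zoom levels."""
    scale = 2 ** (dz.level_count - 1 - level)
    return x * scale, y * scale
```

Serving fixed-size tiles is what keeps a gigapixel slide navigable: the headset only ever downloads the tiles covering the current viewport, and the same level arithmetic maps on-screen annotations back to full-resolution slide coordinates.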
Watch the full demo in the video attached to the original post.