Our vision for voice AI
EXECUTIVE SUMMARY
A new address system for the Internet of Voice
This project proposes the installation of a publicly available address system for voice AI. Just as the Domain Name System (DNS) and the Hypertext Transfer Protocol (HTTP) were needed for the original “Text Internet” to succeed, we propose developing a shared namespace system for the Natural Language Processing (NLP) AI technologies of the “Voice Internet” (e.g. smart speakers, voice assistants, chatbots).

AI business models (e.g. Amazon Alexa, Google Search, Apple Siri) depend on access to consumer data (e.g. preferences, search queries, voice commands). It is particularly difficult for OEMs (e.g. car companies, home appliance makers, FMCG electronics manufacturers) to adapt. European OEMs are experts at combining components from technology suppliers, but they lack the in-house AI know-how and data access needed to reach parity with US and Chinese companies.

Installing neutral AI interoperability standards and systems helps to address this situation. It allows OEMs to share use cases and NLP AI development resources. Progress in NLP R&D is driven by both industrial and academic research, which makes the domain particularly suited to pioneering standards and protocols for AI interoperability. Later on, neighboring AI domains (e.g. Computer Vision, Brain-Machine Interfaces) can be integrated in a similar way.

VISION
A new address system for voice AI interoperability
Natural language processing (NLP) and voice assistant technologies (smart speakers, chatbots, voice assistants) are expected to account for two thirds of all Internet search requests by 2025. The first point of contact with every customer will be a bot. Voice-based e-commerce will become a $55bn p.a. market during the same period (Gartner).

The potential of voice interfaces stems from the usability and availability of the medium: everybody speaks. Language is a tool that users of all ages already know. But integrating voice technologies is a huge challenge for European OEMs such as car manufacturers, FMCG electronics makers, and telecommunication companies.

OEMs have two options: either they use Alexa and become the microphone of the Amazon business model, or they choose a white-label NLP (e.g. Nuance, Cerence, Watson) and build custom voice assistants.

NLP is a young domain, and currently more than 1000 voice AI companies, research projects, and industry solutions compete worldwide across different languages and use cases.

Examples of voice AI use cases & applications:
- Smart speakers
- Voice assistants
- Biometric user identification
- Customer hotlines
- Chatbots
- Voice-based COVID-19 diagnostics
- Industry-specific (e.g. banking, insurance) voice assistants
- Digital receptionists
- Toys
- Text classification (e.g. legal contracts, medical files)
- Text generation (e.g. marketing, advertising)
- Search engines 

This is a situation of AI bias: voice AIs are usually only good in the domain they are designed and trained for. NLP R&D breakthroughs happen every week, so it is difficult to predict which voice AI will be best suited for a product to be released in 2-4 years.

The localization of voice interfaces is a further problem. To sell a German car with voice assistant features, or a home appliance with an integrated smart speaker, to customers in e.g. China, the USA, and France, OEMs must integrate multiple NLP technologies, because each market has a different technology leader.

Voice AI development is progressing at different speeds across languages. The market for Swedish NLPs, for instance, is much smaller than that for technologies with Mandarin capabilities. The Government of Israel recently announced that it will sponsor the development of voice AI capabilities for the Hebrew language at Amazon and Google.

Enter whoelse.ai, the first universal language for all AIs. To make it easier to combine different NLP technologies, we provide voice AIs with a simplified language for storing and exchanging voice-based user requests (intents) in a standardized format.

This way, voice interfaces can contain multiple voice AI technologies, and user requests can be answered by the voice assistant best suited to respond:

Example intent catalog implementation:

Smart Speaker for Co-Working Spaces
├── NLP 1: IBM Watson (WeWork AI)
│   ├── Air Conditioning
│   ├── Room Booking
│   ├── Catering
│   └── Register Guest
└── whoelse.ai
    ├── NLP 2: Cisco Mindmeld (PWC AI)
    │   ├── Tax Filings
    │   ├── HR Management
    │   └── Digital Lawyer
    ├── NLP 3: Nuance Mix (Lufthansa AI)
    │   ├── Ticket Booking
    │   ├── Hotel Reservation
    │   └── Rental Cars
    ├── NLP 4: Deepgram (no white label)
    │   ├── Transcription Service
    │   ├── Task Automation
    │   └── Meeting Translator
    └── NLP 5: Alexa (Amazon)
        ├── ..
        ├── ..
        └── ..

User journey:
Voice AI 1    Welcome to WeWork - how can I help you?
User Input    I want to file my taxes!
Voice AI 1    I cannot help you with that myself, but I will find the best AI available!
              (Searches the whoelse.ai intent catalog)
Voice AI 2    Welcome to PWC. Please tell me your tax code first (..)
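
The lookup step in this user journey can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the whoelse.ai implementation: the `Intent` fields, the intent names (e.g. `tax_filing`), and the `route` and `to_wire` helpers are hypothetical, and we assume that an upstream NLP has already mapped the spoken sentence to an intent name.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """A user request in a shared, vendor-neutral format (hypothetical schema)."""
    name: str            # normalized intent name, e.g. "tax_filing"
    utterance: str       # what the user actually said
    language: str = "en"


# Intents the local voice AI handles itself (NLP 1 in the example catalog).
LOCAL_AI = "NLP 1: IBM Watson (WeWork AI)"
LOCAL_INTENTS = {"air_conditioning", "room_booking", "catering", "register_guest"}

# Hypothetical shared catalog: intent name -> voice AI registered for it.
SHARED_CATALOG = {
    "tax_filing": "NLP 2: Cisco Mindmeld (PWC AI)",
    "hr_management": "NLP 2: Cisco Mindmeld (PWC AI)",
    "ticket_booking": "NLP 3: Nuance Mix (Lufthansa AI)",
    "hotel_reservation": "NLP 3: Nuance Mix (Lufthansa AI)",
    "meeting_translator": "NLP 4: Deepgram (no white label)",
}


def route(intent: Intent) -> Optional[str]:
    """Answer locally if possible, otherwise look the intent up in the shared catalog."""
    if intent.name in LOCAL_INTENTS:
        return LOCAL_AI
    return SHARED_CATALOG.get(intent.name)  # None if no voice AI is registered


def to_wire(intent: Intent) -> str:
    """Serialize an intent to JSON so it can be handed over to another voice AI."""
    return json.dumps(asdict(intent), sort_keys=True)


request = Intent(name="tax_filing", utterance="I want to file my taxes!")
print(route(request))
print(to_wire(request))
```

Keeping the catalog as a plain mapping from intent names to voice AIs means the local assistant does not need to know anything about the other NLP technologies beyond the shared intent format.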

We detailed this concept while organising a DIN industry standard initiative for NLP API interoperability. As consortium initiators, we worked with 35+ voice AI departments of OEMs (e.g. PWC, BMW, Mercedes, Bosch, Telekom) and validated the demand for such an AI exchange.

The Domain Name System was once needed for the Text Internet to succeed. This project now proposes to develop the technologies needed for the first address system of a new kind of Voice Internet.

But standardization itself is not a business model. In the current environment, de-facto monopolists such as “GAFA” usually control the adoption of technology specifications and SDKs in the industry through their market dominance.

The same is now happening in the field of voice AI interoperability. In September 2019, Amazon announced a new NLP standard initiative. The Alexa consortium agreed that invocation words (Alexa, Einstein, ..) will control which voice assistant responds to a user request.

In our opinion, this selection logic will not work, because it is unlikely that different OEMs will reach a shared agreement about the ownership of arbitrary language. Example: “Voice AI, find me a ride-share” - who should decide whether this command is directed to e.g. BMW or VW? Will it be the user, the AI, or the interface provider?

The Amazon consortium’s naming logic favours the best-known brands and is thus designed to position Alexa in the best way possible. Research also shows that consumers do not want to remember multiple brands for voice assistants and prefer natural language over synthetic input dialogs. Naming voice AIs will remain an ongoing topic of concern in the industry.

In the long term, this project solves problems stemming from the redundancy of voice AI-based information. Once voice AI technologies have reached mainstream adoption and every user is surrounded by multiple voice interfaces, several voice assistants will pick up the same speech command and want to respond to the user in parallel. A shared addressing system that lets voice AIs agree on what was (likely) said and which AI the user addressed will then inevitably be needed.
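
One possible shape for such an agreement step is sketched below in Python. It is a thought experiment under our own assumptions rather than a specified protocol: the `Hearing` record, the majority vote over transcripts, and the self-reported confidence score are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hearing:
    """What one voice assistant reports after picking up a speech command (hypothetical)."""
    assistant: str     # which voice AI heard the command
    transcript: str    # what it believes was said
    confidence: float  # its own score (0..1) that it can handle the request


def normalize(text: str) -> str:
    """Make transcripts comparable by ignoring case and surrounding whitespace."""
    return text.strip().lower()


def arbitrate(hearings: List[Hearing]) -> Tuple[str, str]:
    """Return (agreed transcript, responding assistant) for one spoken command."""
    if not hearings:
        raise ValueError("no assistant picked up the command")
    # Step 1: agree on what was (likely) said by majority vote over transcripts.
    votes = Counter(normalize(h.transcript) for h in hearings)
    agreed, _count = votes.most_common(1)[0]
    # Step 2: among assistants that heard the agreed transcript,
    # let the one most confident of handling it respond.
    candidates = [h for h in hearings if normalize(h.transcript) == agreed]
    winner = max(candidates, key=lambda h: h.confidence)
    return agreed, winner.assistant


hearings = [
    Hearing("car assistant", "find me a ride-share", 0.9),
    Hearing("smart speaker", "Find me a ride-share", 0.6),
    Hearing("smart TV", "find me a right chair", 0.95),
]
print(arbitrate(hearings))
```

In this sketch the smart TV reports the highest confidence but misheard the command, so it is excluded before a responder is chosen; only one assistant answers the user.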


