Li Zhifei, CEO of Mobvoi
Editor's Note:
What will the future world of AI look like?
At the 2016 T-EDGE Conference, Li Zhifei, CEO of Mobvoi, gave two possible answers: on the one hand, the media and the public often describe or imagine the AI world as a world of superb technologies; on the other hand, frontier AI scientists and developers tend to hold a more down-to-earth view of it.
Mobvoi certainly believes in the latter. When Li was still an engineer for Google Translate four years ago, he already dreamed of starting his own company. His goal was to define the next generation of human-machine interaction and replace unnatural forms of interaction, such as the keyboard and touchscreen, with more natural ones, such as speech.
Li and his team started with speech recognition algorithms and gradually developed a series of software applications, including Mobvoi's voice search and the smartwatch operating system Ticwear. However, after realizing that none of these applications could reach real users on their own, Li shifted his focus to user scenarios where software and hardware are intertwined.
As for the development of AI technology, Li made two predictions: on the one hand, over 99 per cent of AI startups worldwide will focus on the 2B market due to their limited user bases; on the other hand, this new wave will entirely reshape consumer electronics such as TVs, smartwatches and automobiles.
AI applications will be found everywhere. However, this doesn't mean AI will be able to create consumer needs out of thin air. After all, an independent product or business model can't be built on AI technologies alone. Instead, AI technologies have to be integrated into existing products to create a better user experience. In other words, what AI contributes is not fuel in snowy weather, but icing on the cake.
The following is the full transcript of Li Zhifei's speech at the 2016 T-EDGE Conference:
I'm so glad to stand here and give this speech. First of all, I think there are two worlds of AI: on the one hand, the media and the public often describe or imagine the AI world as a world of superb technologies, so for them, the world of AI is more like a world of sci-fi.
In fact, I suspect most of you would rather watch drones flying around or robots jumping up and down, or just put on a VR or AR headset and watch some great cartoons, than listen to my speech. After all, these are the things the media and the public talk about and most want to see.
However, there is still another world of AI, that is, the world of frontier AI scientists and developers.
For us, AI is not fancy stuff; it means tough work. Although I am an AI engineer, I just can't make most of the things the other guests mentioned happen. Even if I could, I wouldn't really know whom to sell my products to. Drones involve very futuristic technologies. For a fresh startup, it's no easy job to develop drones, let alone sell them.
In this speech, I'd like to share my experiences and my preliminary thoughts on the industry as a whole, as well as briefly introduce our products.
Defining the next generation of human-machine interaction with speech recognition
When I was still an engineer for Google Translate four years ago, I had already decided to start my own company. At that time, when I told investors that my ambition was to define the next generation of human-machine interaction, not many of them understood what I was talking about. In fact, I didn't understand it thoroughly either, but I held on to that simple belief.
Back then, that is, around 2010, the mobile internet was still a new thing even in Silicon Valley, and smartphones were not yet widespread. In fact, most people bought smartphones just to play Angry Birds. So I was already wondering whether the next generation of human-machine interaction would still be based on keyboards.
At that time, smartphone screens were small, so it was hard to imagine traditional input methods carrying over. But I was a machine translation engineer, so my thinking centered on speech-based human-machine interaction.
It was exciting even to think about the opportunity to define the next generation of human-machine interaction by aligning my technologies with the broader technological trend. At that time, we already believed the next generation of human-machine interaction should be based on natural input such as speech, instead of the unnatural kinds, such as the keyboard and touchscreen, that are most common today.
When I came back to China, my thinking was very simple: I wanted to do speech-based human-machine interaction, so I first needed to develop all the relevant technologies. It took us a year and a half to develop our own speech recognition technology. I want to point out that many people take licensing for granted and ask me whose speech recognition technology we adopted, but it's not that difficult to develop one's own. We built our own speech recognition, as well as search and recommendation systems, so that ordinary users could make the best use of them.
Combining hardware and software is the most reliable product path for AI technology
So we had developed our own technologies and attempted to integrate them into our products, such as our mobile voice app Chumenwenwen.
Unfortunately, we found that although our speech recognition technology was very advanced, it was too inconvenient for users to make the best use of it. For example, users might open our app, push the speech recognition button and say something; however, when they wanted to book a restaurant, they still had to open another app. As a result, speech recognition didn't bring users much convenience.
We soon realized that we wouldn't be able to survive financially if we continued this way. Although we didn't have many expenses, we still had to pay our employees' salaries. More fundamentally, we found that our user base had stopped growing, and few users continued to use our app. That was when we started thinking about what kind of speech-based human-machine interaction would bring real value to users.
We developed an operating system for smartwatches, but then found that there was little high-quality hardware to run it on. So we decided to make a smartwatch ourselves. In all, it took us a year and a half to develop the relevant technologies and apps, and another two and a half years to integrate our advanced AI technologies into hardware. Only then did our technologies become more competitive. This is basically what we've been doing for the past few years.
In fact, I think combining hardware and software might be the most practical way to bring AI technologies into full play. Why? To be honest, AI technology is still not mature enough. Unlike RAM or hard disks, which can simply be bought and assembled, AI technologies depend heavily on integration. A good AI application should combine advanced speech recognition, speech comprehension, hardware and the operating system; otherwise, it can't be called a qualified AI product.
Today, Google is already firmly committed to developing AI hardware. Two years ago, however, there was a clear division of labor: Google was responsible for the operating system, and hardware manufacturers were responsible for the hardware. Together, they were supposed to create a dynamic ecosystem.
However, when Google found that little progress had been made, it started to develop hardware itself and was very serious about it. I think the new trend is that AI should be integrated into people's lives and presented to users through a combination of software and hardware, so as to provide a better user experience and bring real convenience.
AI is not fuel in snowy weather, but icing on the cake
What's the future trend of AI? As an engineer, I don't really like predicting the future, since we simply don't know. Technologies evolve rapidly, and there are many boundaries and limits. In the past, I would try my best to avoid talking about the future.
It's quite difficult to predict what will happen five or ten years from now, but it's easier to predict what will happen in two or three years and how the industry will evolve. So I'd like to offer some tentative predictions about the development of the AI industry, though they could turn out to be totally wrong even about next year.
First of all, there's a trend, or rather a fact, that AI applications will be found everywhere. AI is not about offering fuel in snowy weather: no independent product or business model can be built solely on AI technologies. Instead, AI technologies have to be integrated into existing products, where possible, to improve efficiency and make those products more competitive. This is a natural but slow process. I believe we shall see more and more such applications in the next one to two years.
Large companies, since they've already accumulated gigantic user bases, can simply integrate AI technologies into their existing products and improve the user experience. Small companies, however, have to adopt either a 2B or a 2C model to catch up. My prediction is that over 99 per cent of AI startups will adopt the 2B model in the future.
Secondly, I think more consumer-level products will gradually emerge. As we can all see, smartphones, TVs and even loudspeakers have already been reshaped by AI technologies, and today's smartphones look very different from those made three years ago.
There's another consumer-level product that's going to be reshaped by AI technologies: the automobile. However, automobiles are very different from TVs or smartphones, because integrating AI technologies well into them is very complicated. Even a company that has never made hardware could develop a smart TV or loudspeaker within one or two years, but it's much more difficult to develop a smart automobile.
From smartwatches to smart vehicles
How can automobiles be made smart? There's still no clear answer. Many people have already made attempts, but there are not only electrical problems involved but also mechanical ones. So it's an entirely different challenge.
We adopted a more practical approach to this quest. Our wish is to gain an adequate user base in the short term while accumulating enough value in the long term. Our focus is the rearview mirror. To put it simply, it's like turning a mirror into a 4G smartphone. Since the smart rearview mirror is preinstalled in the car, it can help complete many tasks, such as playing music and navigation, all through speech or gesture.
Unlike other smart rearview mirrors, our product puts much greater emphasis on human-machine interaction, so that drivers no longer need to use their hands, touch buttons or stare at the screen to control the car. Instead, they can just glance at it swiftly. This is the fundamental design principle of our product.
For example, in the past, you needed to first say "Hi, Ticauto" to wake the speech recognition system and then say "Help me turn on Wi-Fi" to have the system turn on Wi-Fi; with our product, however, you can just say "Hi, Ticauto, help me turn on Wi-Fi" in a single sentence, and the system will understand and turn on Wi-Fi.
Moreover, you can simply say "I want to go home" or "I want to go to work", and the system will understand and start navigating you home or to work. In addition, since we integrate a noise reduction algorithm into the system, you can say "I don't want to listen to music. Just show me the way home" while the car is still playing music, and the system will understand, turn off the music and bring up the navigation guide.
What's more, our speech recognition can handle much more complicated tasks and understand what you are saying in context. For example, you start by asking the system to navigate to the International Trade Center, and it starts navigating. Halfway there, you might ask how far away you are, what restaurants are nearby, or where you can park. Our system will understand these follow-up questions and give you the necessary guidance or recommendations.
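To make the interaction pattern concrete, here is a minimal, purely illustrative Python sketch of the two ideas above: handling the wake word and the command in a single utterance, and resolving follow-up questions against stored navigation context. This is not Mobvoi's actual implementation; the class, the keyword matching and the responses are all hypothetical stand-ins for real speech recognition and language understanding models.

```python
# Toy sketch: single-utterance hotword handling plus contextual follow-ups.
# All names and the string matching below are hypothetical simplifications.

HOTWORD = "hi, ticauto"

class DrivingAssistant:
    def __init__(self):
        self.destination = None  # dialogue context carried across turns

    def handle(self, utterance: str) -> str:
        text = utterance.lower().strip()
        # The wake word and the command arrive together, so strip the
        # hotword prefix instead of waiting for a second utterance.
        if text.startswith(HOTWORD):
            text = text[len(HOTWORD):].lstrip(" ,")
        return self._dispatch(text)

    def _dispatch(self, text: str) -> str:
        if "wifi" in text.replace("-", ""):
            return "Turning on Wi-Fi."
        if "navigate to" in text:
            self.destination = text.split("navigate to", 1)[1].strip()
            return f"Starting navigation to {self.destination}."
        # Follow-up questions are resolved against the stored destination.
        if self.destination is not None:
            if "how far" in text:
                return f"Checking the remaining distance to {self.destination}."
            if "restaurant" in text:
                return f"Searching for restaurants along the route to {self.destination}."
            if "park" in text:
                return f"Looking for parking lots near {self.destination}."
        return "Sorry, I didn't catch that."

assistant = DrivingAssistant()
print(assistant.handle("Hi, Ticauto, help me turn on Wi-Fi"))
print(assistant.handle("Hi, Ticauto, navigate to the International Trade Center"))
print(assistant.handle("How far is it?"))               # uses the stored destination
print(assistant.handle("What restaurants are around?"))
```

The point of the sketch is simply the design choice Li describes: the system keeps enough conversational state that the driver never has to repeat the destination or re-trigger the wake word for every step.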
I shall not dive into more details due to the time limit; the key point is that we want to develop the next generation of human-machine interaction in automobiles, one that does not rely on hands or touchscreens, so that you can turn to it for help without endangering your safety.
By the way, we also integrate other modes of AI-based interaction, such as gesture. For example, you can have the car switch songs or take a picture with a simple gesture, and your car lamps will flash when you get too close to the car ahead. All these functions are aimed at providing a better experience.
Our ultimate goal is to completely integrate automobiles with AI technologies. On the user end, users will have their own accounts and can ask the system to complete specific tasks. On the backend, we shall develop a platform based on the same algorithms. As a result, we shall be able to help create a truly smart life for ordinary users.
[This article is published and edited with authorization from the author @TiGeek. Please note the source and include a hyperlink when reproducing it.]
Translated by Levin Feng (Senior Translator at PAGE TO PAGE), working for TMTpost.