TECHNOLOGY

Information-driven applications are at an inflection point. For the past decade, most applications have relied on the same basic mechanism to find the information we need: a user would submit a request, such as a text query, and the application would return a list of results.

Today, computing devices are always with us, and when we need information, we are less likely to be sitting at a desk typing queries. Fortunately, our computing devices are now equipped with numerous sensors like cameras, microphones and GPS. As a result, instead of relying on ‘hard signals’ like text queries entered on a keyboard, applications can sometimes understand what information we may need by interpreting ‘soft signals’ like audio, video and location information.

Interpreting meaning and intent from multiple streams of sensor data is no small task. Not only does it rely on complex technologies like audio processing, image analysis, and language understanding, but it typically must be performed over an extended period of time in order to recognize meaningful patterns.

While it is not always possible to uncover useful information from streams of unstructured data, there are many situations where analyzing these signals can be helpful. Real-time conversations are one area where analyzing the ‘soft signals’ of sensor information can be fruitful for understanding intent. In fact, analyzing and understanding a conversation over time can sometimes make it possible to anticipate information that may be relevant in the future.

Based on nearly two years of research and development, Expect Labs has developed a technology platform capable of analyzing and understanding continuous conversations in real time.

We call this platform our ‘Anticipatory Computing Engine’, and it has three unique capabilities designed to facilitate conversational interactions:

  1. Real-Time, Multi-Party Conversation Analysis: Our platform is designed to analyze and understand multiple concurrent streams of conversational dialogue in real time. It continuously analyzes audio signals and attempts to understand their underlying meaning. Based on this understanding, it not only attempts to identify key concepts and topics related to your conversation, but it also uses language structure and analysis to infer what types of information you may find most useful.
  2. Continuous, Predictive Modeling: Our platform observes conversations over time and generates a model to represent the meaning of each conversation. This model changes from second to second as the conversation evolves and is extrapolated to predict the topics, concepts and related information that may be relevant in the future. In essence, the platform analyzes and understands the past ten minutes of a conversation in order to predict what may be relevant in the next ten seconds.
  3. Proactive Information Discovery: Our platform does not wait for a user to explicitly ask for information. Instead, it uses its underlying predictive model to identify the information that is most likely to be relevant at every point in time. It then proactively finds and retrieves this information, whether from across the web or from a user’s social graph, and delivers it to the user, in some cases before they even request it. A simplified sketch of how these capabilities fit together follows this list.
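
To make these capabilities concrete, the following is a minimal sketch, written in Python purely for illustration, of how an anticipatory pipeline of this kind could be organized. Every name in it (ConversationModel, extract_concepts, search_fn, deliver_fn, the decay factor, and so on) is an assumption made for the example and does not describe our production system.

    # Illustrative sketch only: all names and data structures here are example
    # assumptions, not our production architecture.
    from collections import defaultdict

    class ConversationModel:
        """Rolling model of a conversation, updated after every utterance."""

        def __init__(self, decay=0.9):
            self.decay = decay                       # older mentions fade over time
            self.topic_weights = defaultdict(float)

        def update(self, concepts):
            """Fold one utterance's (concept, confidence) pairs into the model."""
            for topic in list(self.topic_weights):
                self.topic_weights[topic] *= self.decay
            for concept, confidence in concepts:
                self.topic_weights[concept] += confidence

        def predict_relevant_topics(self, k=3):
            """Return the concepts most likely to matter in the next few seconds."""
            ranked = sorted(self.topic_weights.items(), key=lambda kv: -kv[1])
            return [topic for topic, _ in ranked[:k]]

    def extract_concepts(utterance):
        """Stand-in concept extractor: treat longer words as candidate concepts."""
        return [(word.lower().strip('.,?!'), 1.0)
                for word in utterance.split() if len(word) > 4]

    def anticipate(transcript_stream, search_fn, deliver_fn):
        """Listen passively, keep the model current, and fetch results proactively."""
        model = ConversationModel()
        for utterance in transcript_stream:          # e.g. one item per speaker turn
            model.update(extract_concepts(utterance))
            for topic in model.predict_relevant_topics():
                deliver_fn(topic, search_fn(topic))  # surface results before anyone asks

In a real deployment, the stand-in extract_concepts would be replaced by full language analysis, and search_fn and deliver_fn would be backed by web and social-graph search services.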

Voice recognition is a key component of any application that attempts to understand spoken dialogue, and it is one of the technologies our platform uses to extract meaning from conversations. Most of us are aware that statistical techniques like this can be far from perfect, and these shortcomings can be frustrating when we rely on the technology to accomplish specific tasks.

Our platform takes a unique approach in the way it leverages voice recognition technology, and it is our hope that this approach will make applications more robust to typical voice recognition inaccuracies. Specifically, rather than using this technology to respond to active user commands, our platform uses voice recognition to listen passively over time and identify key concepts and the general subject matter of a conversation. As a result, our platform can usually tolerate occasional inaccuracies without significantly impairing its understanding of the meaning of a given conversation.
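
As a rough illustration of why this passive approach is forgiving of recognition errors, consider the sketch below. It assumes the recognizer returns several ranked hypotheses per utterance, each with a confidence score; the function name and data format are assumptions for the example, not the interface of any particular recognizer.

    # Illustrative sketch: aggregating many noisy hypotheses makes the overall
    # topic summary stable even when individual words are misrecognized.
    from collections import Counter

    def summarize_subject_matter(recognized_utterances, top_n=5):
        """Each utterance is a list of (transcript, confidence) hypotheses."""
        topic_scores = Counter()
        for hypotheses in recognized_utterances:
            for transcript, confidence in hypotheses:
                for word in transcript.lower().split():
                    if len(word) > 4:                # crude stand-in for concept detection
                        topic_scores[word] += confidence
        return [topic for topic, _ in topic_scores.most_common(top_n)]

Because each topic's score is accumulated across an entire conversation and weighted by confidence, a single misrecognized phrase rarely changes the overall picture.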

Furthermore, our platform employs simple user feedback mechanisms that enable each user to correct errors and fine-tune results as a conversation unfolds. These mechanisms allow our platform to benefit from the powerful capabilities that voice recognition technology affords without being hamstrung by its occasional shortcomings.
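
One way such feedback could flow back into the conversation model is sketched below; the specific actions shown ('pin' and 'dismiss') and the weight adjustments are assumptions chosen purely for illustration.

    # Illustrative sketch of a simple correction mechanism layered on the
    # topic weights maintained by the conversation model sketched earlier.
    def apply_feedback(topic_weights, topic, action):
        """Let a user confirm a relevant topic or dismiss a recognition error."""
        if action == "pin":          # user confirms the topic is relevant
            topic_weights[topic] = topic_weights.get(topic, 0.0) + 5.0
        elif action == "dismiss":    # user flags the topic as a mistake
            topic_weights.pop(topic, None)
        return topic_weights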

The current implementation of our Anticipatory Computing Engine represents a first step toward the goal of building a general-purpose conversation assistant. We are just scratching the surface of what technologies like this can do, and we are inspired by the advances we expect to see in this area over the next few years. As a result of these advances, we think that in just a few years, we may all look back and recall how old-fashioned it was that we had to type queries on our keyboards to find the information we needed.