PEOPLE using voice to communicate over some distance.
I’m not limiting this to PSTN-style telephony. I also mean two-way radios, VoIP, or any other kind of thing where you talk into one end and your voice comes out the other end and hopefully goes into someone’s ear.
Our company got started by trying to fix some issues with military or tactical voice communication.
For tactical users, effective voice communication can be a matter of life and death, and when it is, there’s no time to stop and type out a text message.
While we were working on how we could make tactical voice users more effective, we realized that many of the problems tactical users have with their communication tools are the same problems that mobile phone users have, except that mobile phone users aren’t getting shot at as often, so they tend not to notice.
Making voice useful is relevant for both tactical and consumer applications.
It requires that we confront two key assumptions about voice communication systems and how they work.
The first assumption is that distance somehow matters.
Voice communication networks are an amazing way to collapse the distance between people, allowing them to interact as if they were in the same place at the same time.
But as the price of access to these wonderful networks continues to fall, the distance that voice travels matters less and less.
We find that the HUMAN TIME we spend PRODUCING, CONSUMING, and PROCESSING information in and out of these networks--OUR TIME is now the most scarce resource in communications.
Unfortunately, our tools aren’t always set up that way.
Modern telephony is not particularly concerned with saving people time. In many cases it is quite the opposite.
Before you can say something to someone using telephones, you have to do quite a bit of waiting.
The second assumption is even bigger and more fundamental, and it is baked into voice communication: Live Only.
Whether it’s push-to-talk (PTT), a live phone call, or even voicemail, the systems are built on the assumption that voice is live, or it doesn’t work.
If a live phone call fails to connect with another person, it is diverted on a live connection to a voicemail system.
Trying to set up these live voice circuits or sessions causes waiting, interruption, inconvenience, and the bizarre phenomenon of failed rendezvous.
I call you, get voicemail, you call me right back, but I’m dialing your voicemail, so you get my voicemail, and now both of us are on the phone, live, with the other person’s voicemail system.
Live Only costs us more than time spent waiting. There are also social costs: you know when you call someone that you are probably interrupting them and putting them in an awkward situation.
Text communication is broken up into a series of messages, each of which can be dealt with at the users’ convenience.
Text is less intrusive and doesn’t require that the attention of both parties be synchronized in time. For a phone call to work, both parties need to devote their full attention to the call, at the same moment in time. Text is not so demanding. Because attention synchronization isn’t required, text even allows users to participate in multiple ongoing conversations at the same time.
For some situations and users then, text is a reasonable alternative to voice.
There ARE still situations where voice is a better mode, such as when you’re driving or walking, when you have a lot to say, or when you’re having a more personal conversation with someone. Or rather, voice COULD be a better mode, but current voice systems don’t respect the time constraints and user expectations of the modern world.
So at RebelVox we wondered, how can we bring the flexibility of text-based communication to the efficient and expressive mode of the human voice?
What we came to is that we all need to move beyond Live Only.
Voice could be a lot more useful to people if it were as flexible in time as text is.
To give users the most efficient voice communication experience, we have to unlock the constraints of time. Thankfully, it turns out that computers are already pretty good at this.
In the same way that DVRs like TiVo have changed the user experience of television through time shifting, we can provide a new and more useful user experience for voice communication by doing time shifting in both directions.
Right now, there is some crude time shifting available for voice: voicemail. It only works in one direction, and it works by first intrusively ringing a phone.
What would it mean to FULLY EMBRACE time shifting for voice communications?
Here’s a 60 second video to show you how it could work.
Pause on last frame
If coordinating dinner plans between two people using voice would have taken 7 minutes, imagine how long it would have taken for 5 people.
What you just saw was one system, the same system, used for messaging, live voice, and a seamless transition between the two.
Because it is one system, there is only one place to go, so there is no rendezvous problem.
Your non-live voice isn’t trapped in a separate messaging application, and your live voice sticks around in case you want to listen to it later.
Applying time shifting techniques to both sending and receiving, TWO WAY TIME SHIFTING, makes users more efficient.
In this system, everything you say is stored in a time shifting buffer. That buffer is then synchronized with your recipients, who apply their own time shifting to what you’ve said.
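To make that flow concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not a description of the real system: the names `TimeShiftBuffer`, `Listener`, `sync_from`, and `lag` are hypothetical, audio is treated as opaque byte strings, and buffers are assumed to be append-only and delivered in order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    seq: int      # position of this slice in the speaker's stream
    audio: bytes  # encoded audio (opaque here; the codec is out of scope)


@dataclass
class TimeShiftBuffer:
    """Hypothetical store of everything said in one direction."""
    chunks: List[Chunk] = field(default_factory=list)

    def append(self, audio: bytes) -> Chunk:
        # Everything the speaker says is stored, whether or not anyone is listening.
        chunk = Chunk(seq=len(self.chunks), audio=audio)
        self.chunks.append(chunk)
        return chunk

    def sync_from(self, other: "TimeShiftBuffer") -> int:
        # Assumption: append-only, in-order buffers, so anything past our
        # current length is what we are missing.
        missing = other.chunks[len(self.chunks):]
        self.chunks.extend(missing)
        return len(missing)


@dataclass
class Listener:
    """A recipient's private playback position over a synchronized buffer."""
    buffer: TimeShiftBuffer
    cursor: int = 0

    def next_chunk(self) -> Optional[Chunk]:
        if self.cursor >= len(self.buffer.chunks):
            return None  # caught up; nothing new to play yet
        chunk = self.buffer.chunks[self.cursor]
        self.cursor += 1
        return chunk

    def lag(self) -> int:
        # How many chunks behind the newest audio this listener is.
        return len(self.buffer.chunks) - self.cursor


# Sam talks without waiting for Jill; Jill listens when she is ready.
sam_buffer = TimeShiftBuffer()
jill_buffer = TimeShiftBuffer()
jill = Listener(jill_buffer)

sam_buffer.append(b"Hey Jill,")
sam_buffer.append(b"dinner tonight?")
copied = jill_buffer.sync_from(sam_buffer)
first = jill.next_chunk()
```

The point of the sketch is that the speaker and the listener never have to be present at the same moment: the sender only appends, and each recipient moves through the synchronized copy at their own pace.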
With two way time shifting, you can use voice without waiting, without interrupting, and seamlessly transition into live if everybody is paying attention.
Let’s take a look at the user interface so you can see how this works.
This device looks suspiciously like an iPhone. We are not an iPhone app company, although our software does run on the iPhone. I think the iPhone UI works well and looks great, so I’m going to show you what two way time shifting could look like on that kind of device.
Sam wants to say something to Jill.
He doesn’t CALL her, and he doesn’t LEAVE A MESSAGE.
Sam doesn’t have to wait for anything: not for Jill, not for a marginal network, not even when there is no network at all.
Now Jill and Sam have merged into the experience of a live, full-duplex phone call.
They hold their phones up to their faces and they talk.
The UI changes, but the underlying system is still doing the same thing.
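One way to picture that: whether the conversation looks like messaging or like a live call can be a decision made on top of the same stored audio. The Python sketch below assumes each participant reports how far behind the newest audio their playback is, in seconds. The names `Mode` and `presentation_mode` and the 0.5 second threshold are hypothetical choices for illustration only.

```python
from enum import Enum
from typing import Sequence


class Mode(Enum):
    MESSAGING = "messaging"
    LIVE = "live"


# Assumed threshold: a listener within half a second of the newest audio
# is treated as "paying attention right now".
LIVE_LAG_THRESHOLD_S = 0.5


def presentation_mode(lags_seconds: Sequence[float]) -> Mode:
    """Pick how the UI presents the conversation.

    The underlying storage and synchronization do not change; only the
    presentation does. LIVE requires every participant to be near real time.
    """
    if lags_seconds and all(lag <= LIVE_LAG_THRESHOLD_S for lag in lags_seconds):
        return Mode.LIVE
    return Mode.MESSAGING


both_caught_up = presentation_mode([0.1, 0.2])
jill_is_behind = presentation_mode([0.1, 30.0])
```

If either person falls behind, the same function simply reports messaging again, so nothing needs to be torn down or re-established.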
This time shifting stuff is great for saving time and communicating without full attention synchronization.
But we can also help save people time that they would have spent waiting for the network, either to establish a connection, or when either party goes out of range of the network.
Modern mobile phones are increasingly powerful computers, but none of this power is being applied to make voice work better.
By utilizing storage and processing capabilities at the edges of the network, we can fix the rendezvous problem, and we can even fix dropped calls.
Of course we can’t really make the network deliver packets when there is no connectivity, but we can give the users at the edges of that network an optimal user experience when that happens.
For example, users can have some new options such as Keep Talking, knowing that their voice will make it to their recipient just as soon as the network will allow.
A key insight in the development of early packet networks, specifically internetworks, was that the edges are smart, and the middle is dumb.
If something gets lost in the dumb middle, the smart edges are capable of doing error correction and retransmission themselves.
Modern phones are capable of doing this for voice over packet networks, and it would be very useful for everybody if they would.
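The Keep Talking option and the smart-edge retransmission described above can be sketched as a small sender-side queue. This is a minimal Python sketch under assumptions, not the real implementation: `EdgeOutbox`, `FlakyLink`, `say`, and `flush` are hypothetical names, and the `send` callback is assumed to return `True` only when the far end has acknowledged the chunk.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple


class EdgeOutbox:
    """Hypothetical on-device queue: keep audio until the far end acknowledges it."""

    def __init__(self, send: Callable[[int, bytes], bool]) -> None:
        self._send = send  # assumed to return True only on acknowledgement
        self._pending: "OrderedDict[int, bytes]" = OrderedDict()
        self._next_seq = 0

    def say(self, audio: bytes) -> int:
        # The user keeps talking regardless of network state.
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = audio
        self.flush()
        return seq

    def flush(self) -> int:
        # Retransmit from the oldest unacknowledged chunk, preserving order.
        delivered = 0
        for seq in list(self._pending):
            if not self._send(seq, self._pending[seq]):
                break  # network unavailable; keep this chunk and all later ones
            del self._pending[seq]
            delivered += 1
        return delivered

    def pending(self) -> int:
        return len(self._pending)


class FlakyLink:
    """Stand-in for a network that may be out of range (illustration only)."""

    def __init__(self) -> None:
        self.up = False
        self.received: List[Tuple[int, bytes]] = []

    def send(self, seq: int, audio: bytes) -> bool:
        if not self.up:
            return False
        self.received.append((seq, audio))
        return True


link = FlakyLink()
outbox = EdgeOutbox(link.send)
outbox.say(b"on my way")
outbox.say(b"be there at seven")
queued_while_down = outbox.pending()

link.up = True               # coverage returns
sent_on_reconnect = outbox.flush()
```

Nothing here asks the middle of the network to do anything clever. The device at the edge holds on to what was said and delivers it as soon as the network allows.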
This is our new user experience for voice. It is a single system that supports both live and non-live voice and lets users move seamlessly between those two extremes.
Since text is already unlocked in time, it can be blended into the same interface.
By applying time shifting techniques to voice at the edges of the network, we can save people time and let them communicate more efficiently.