Post-Bellagio Catharsis #2 – Voices Across the Digital Divide a.k.a. “There’s plenty of room at the bottom” (v 2.0)

Voice-enabled technology is one of the most promising bridging tools available to society. Large portions of the world's population remain isolated from the Internet community and, as a result, are often marginalized and disenfranchised. With voice technology, they can now use their own voice as a means to communicate with the world.

Several teams are now working to build solutions in this area, with Interactive Voice Response (IVR) systems that are accessible by phone and internet and provide a means to build bridges across the digital divide.

This presentation was delivered at the conference “Turn Up the Volume: Bringing Voice to Mobile Citizen Journalism”, organized by the International Center for Journalists at the Rockefeller Foundation’s facility in Bellagio, Italy, from Oct 8 to 12 this year to bring some of the key thinkers in this space together.

In it, we provide a broad overview of how most IVR systems work, with Swara IVR as a case study, and also talk about how mobile phones, the internet and other communication media can be linked together to form independent, community-owned communication networks.


The OSI Reference Model is familiar to anyone who ever had to attend a Networking class as part of a technical course.

The OSI model is a purely theoretical model, i.e. no real-world system exactly corresponds to this series of layers.  In that regard it is similar to the “perfectly frictionless surface” described in physics courses.

However, it does provide an excellent basis to analyse different networks and determine how they are structured, and therein lies its utility.

Here I’ve roughly split the model into two parts, Host Layers and Media Layers.

For the purposes of this discussion, Host Layers are comprised of components that go on a device within the control of the user/community.

Media Layers are comprised of those components which constitute the “backbone”, “grid” or “infrastructure” and are typically outside the control of the user/community.
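To make the split concrete, here is a minimal sketch in Python. It assumes the conventional boundary used in most textbook descriptions of the OSI model (layers 1 to 3 as Media Layers, layers 4 to 7 as Host Layers); the names `OsiLayer` and `group_of` are made up for this illustration.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven OSI layers, numbered from the bottom up."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


def group_of(layer: OsiLayer) -> str:
    """Return 'media' for layers 1-3 and 'host' for layers 4-7.

    Assumption: the boundary sits between NETWORK and TRANSPORT,
    which is the conventional split, not something specific to this talk.
    """
    return "media" if layer <= OsiLayer.NETWORK else "host"


host_layers = [layer.name for layer in OsiLayer if group_of(layer) == "host"]
media_layers = [layer.name for layer in OsiLayer if group_of(layer) == "media"]

print("Host layers: ", host_layers)
print("Media layers:", media_layers)
```

The point of the grouping is control: everything in `host_layers` can live on a device the community owns, while everything in `media_layers` usually belongs to someone else.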





Longer Computer Science-y description of what each layer really does. I sourced this from Wikipedia and you can find more here:










When we look at the OSI model in the context of voice enabled systems, we can start to group the components of the system into rough approximations of the layers in the OSI model.

For example, as mentioned above, the Media layers are typically merged and managed by one monolithic entity that controls the physical transmission media as well as the data representations and addressing system. This is typically enforced by some form of regulation that prevents private citizens from operating networks at the media layer.

Therefore, almost all voice (and other data transmission) systems depend at some level on a provider.

For infrastructural needs, such as power, the provider may be the utility company. For data transmission itself, it would likely be a cellphone operator or internet service provider (ISP).

The cellphone operator and/or ISP determine where to place infrastructure to best serve customer needs as well as how much to charge for data transmission.

In order to interface with the provider layer, a physical interface is required. This is typically a mobile phone, a GSM Gateway, a USB Dongle, a LAN cable etc.

Next, a logical interface is needed to manage the incoming data. This is usually a piece of software that reads the signals from the physical device and figures out what to do with them. It could be a software PBX system like Asterisk or an SMS gateway like FrontlineSMS or even an Email system. Typically the Interactive part of the IVR system is done here.

Next is the layer that does what I think is the most important job. It packages the data in such a way that it can be used by human beings. This could be blogging software like Loudblog or WordPress that associates the raw data, such as audio files or summary text, with meta-information that is useful in indexing and accessing it later. It could also be as simple as a script that runs periodically and sends all the accumulated content to an email inbox, as an attachment to an email containing text-based meta-information about the binary data attached.

Finally there is the application layer that takes the data presented by the translation layer and serves it up in different formats for different…well…applications :), such as reporting, monitoring and evaluation, content sharing etc.
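The five layers described above can be written down as a small lookup table. This is only a sketch: the layer names follow the text, the example components are the ones mentioned in this post, and `StackLayer`, `VOICE_STACK` and `layer_of` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackLayer:
    name: str
    role: str
    examples: tuple


# Ordered from the bottom (provider) to the top (application).
VOICE_STACK = (
    StackLayer("provider", "carries power, calls, SMS or data",
               ("utility company", "cellphone operator", "ISP")),
    StackLayer("physical interface", "connects a machine to the provider",
               ("mobile phone", "GSM gateway", "USB dongle", "LAN cable")),
    StackLayer("logical interface", "reads signals from the device and decides what to do",
               ("Asterisk", "FrontlineSMS", "email system")),
    StackLayer("translation", "packages raw data with meta-information for people",
               ("Loudblog", "WordPress", "periodic email script")),
    StackLayer("application", "serves the packaged data for a purpose",
               ("reporting", "monitoring and evaluation", "content sharing")),
)


def layer_of(component: str) -> str:
    """Return the name of the layer that lists `component` as an example."""
    wanted = component.lower()
    for layer in VOICE_STACK:
        if wanted in (example.lower() for example in layer.examples):
            return layer.name
    raise KeyError(f"unknown component: {component}")


for layer in VOICE_STACK:
    print(f"{layer.name:20s} {layer.role}")
print(layer_of("Asterisk"))
```

Only the bottom entry, the provider, is normally outside the control of the user or community; the other four can run on hardware the community owns.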



The Swara IVR system can be used as an example to illustrate this model. In our case we rely on telecom providers such as the ones listed to provide the data transfer services for voice and SMS. These are interfaced to the Swara servers via USB dongles and GSM Gateways, such as the Matrix SETU ATA 211G.

The physical interface talks to a logical interface such as an Asterisk PBX server or a Frontline SMS instance. These logical interfaces are checked by automated scripts (mostly written in Python) which export the content in MP3 and text format to email.
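As a rough idea of what such an export script might do, here is a minimal sketch using only the Python standard library. It is not the actual Swara code: the function name `build_export_email`, the addresses, the subject line and the metadata fields in the body are all assumptions made for illustration.

```python
from email.message import EmailMessage


def build_export_email(audio_bytes, filename, caller, recorded_at, recipient):
    """Wrap one recording in an email: text metadata in the body, MP3 attached.

    The metadata layout (caller / recorded_at / file) is a made-up example,
    not the format Swara uses.
    """
    msg = EmailMessage()
    msg["From"] = "ivr@example.org"  # placeholder sender address
    msg["To"] = recipient
    msg["Subject"] = f"New recording from {caller}"
    msg.set_content(
        f"caller: {caller}\nrecorded_at: {recorded_at}\nfile: {filename}\n"
    )
    # Binary attachments need an explicit MIME type; audio/mpeg is MP3.
    msg.add_attachment(
        audio_bytes, maintype="audio", subtype="mpeg", filename=filename
    )
    return msg


example = build_export_email(
    audio_bytes=b"ID3 placeholder, not real audio",
    filename="post_0001.mp3",
    caller="+910000000000",
    recorded_at="2012-10-10T09:30:00",
    recipient="moderators@example.org",
)
print(example["Subject"])
print([part.get_filename() for part in example.iter_attachments()])
```

A real script would read the MP3 files from wherever the PBX stores them and hand each message to a mail server, for example with `smtplib.SMTP.send_message`; that part is left out here because it depends on the local setup.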

The figure on the left shows email in the application layer. However in the case of Swara, email works both at the translation as well as the application layer. The email format has been successfully used for several years by thousands of organizations to share information and works extremely well in both one-to-one and one-to-many modes.







How Swara works is that a user calls a cellphone number (or goes to a URL), using their phone (or browser). Calls are received by an Interactive Voice Response system, while web requests go to a web server, running a simple blog interface.

Users on the IVR can either record fresh content or listen to content posted by others. The format is very similar to a bulletin board.
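The record-or-listen flow can be sketched as a tiny in-memory bulletin board. This is an illustration only: the key mapping (1 to record, 2 to listen), the newest-first ordering and the class name `BulletinBoard` are assumptions, and the `approved` flag stands in for the moderation that Swara applies to all messages.

```python
class BulletinBoard:
    """A toy bulletin board: callers post, moderators approve, callers listen."""

    def __init__(self):
        self._posts = []  # each post is a dict: id, audio, approved

    def record(self, audio):
        """Store a new recording as unapproved and return its id."""
        post_id = len(self._posts) + 1
        self._posts.append({"id": post_id, "audio": audio, "approved": False})
        return post_id

    def approve(self, post_id):
        """Mark a post as approved by a moderator."""
        for post in self._posts:
            if post["id"] == post_id:
                post["approved"] = True
                return
        raise KeyError(post_id)

    def listen(self, limit=3):
        """Return up to `limit` approved recordings, newest first."""
        approved = [p["audio"] for p in self._posts if p["approved"]]
        return approved[::-1][:limit]


def handle_keypress(board, key, audio=None):
    """Dispatch one menu choice. The 1/2 mapping is a hypothetical menu."""
    if key == "1":
        return board.record(audio)
    if key == "2":
        return board.listen()
    raise ValueError(f"unsupported key: {key}")


board = BulletinBoard()
first = handle_keypress(board, "1", "first.mp3")
second = handle_keypress(board, "1", "second.mp3")
board.approve(second)
print(handle_keypress(board, "2"))
```

In a real deployment the menu itself would live in the PBX dial plan or an attached script; the sketch only shows the bookkeeping behind it.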

Consistent with the bulletin board concept, all messages are moderated, typically by professional journalists who volunteer for this task. In order to simplify this task and eliminate the need to learn a new CMS, we introduced the idea of email as a translation layer as well as an application layer tool. An automated script running on Swara IVR servers exports all incoming audio content as email attachments (MP3 files). The system is also set up to accept content over email, with the email body and subject line providing the text metadata that the system needs in order to know what to do with the content.
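Accepting content over email could look roughly like the sketch below, again using only the Python standard library. How Swara actually interprets the subject and body is not described here, so treating the subject as a title and the body as a description is an assumption, as is the function name `parse_submission`.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def parse_submission(raw):
    """Extract text metadata and audio attachments from one raw email.

    Assumption: subject becomes the title and the plain-text body becomes
    the description. The real mapping may differ.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part else ""
    audio = [
        (part.get_filename(), part.get_content())
        for part in msg.iter_attachments()
        if part.get_content_maintype() == "audio"
    ]
    return {"title": str(msg["Subject"] or ""), "description": body, "audio": audio}


# Build a sample message to feed through the parser.
sample = EmailMessage()
sample["From"] = "reporter@example.org"
sample["To"] = "ivr@example.org"
sample["Subject"] = "Water pump repaired"
sample.set_content("Recorded at the village meeting.")
sample.add_attachment(
    b"ID3 fake audio bytes", maintype="audio", subtype="mpeg", filename="report.mp3"
)

parsed = parse_submission(sample.as_bytes())
print(parsed["title"])
print(parsed["description"])
print([name for name, _ in parsed["audio"]])
```

Because the interface is plain email, a moderator needs nothing more than a mail client to review, edit and forward a post.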

Once the content is on the email system, it can be pushed to blog platforms that support post-by-email, as well as social media (read Facebook, Twitter and Google +). We also send the emails directly to individuals and mailing lists.

Of course this entire system still depends on the cellphone and internet service providers, which somewhat limits its scope as a truly democratic tool.






The limits imposed at the provider level affect both the affordability and the scope of content on community platforms.

Since providers are typically commercial entities with a revenue requirement, communities that do not represent a market of significant volume are typically not attractive to them as customers. As a result, service quality for such communities and the areas that they live in is of a lower tier than the services the same providers offer in urban areas with a higher market volume.

Therefore, in order to provide some alternatives that support the general independence and sustainability of community platforms, it is important to build on tools that can constitute part of the provider layer.

The possibilities are many and it is the intention of Mojolab to work on as many of these possibilities as we can.

We have already begun work on two areas, Citizen Band Radio and Wireless mesh networks. The long term idea is to create tools that can be used by communities to run their own local networks and link them as they choose to the rest of the world.

As such we typically consider the pricing of the solution from the standpoint of a hypothetical community of 300 people all earning INR 32 (less than one dollar) a day.
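The arithmetic behind this yardstick is simple enough to write down. In the sketch below, only the community size (300) and the daily income (INR 32) come from the text; the solution cost and lifetime passed in at the bottom are made-up numbers, and the function names are hypothetical.

```python
COMMUNITY_SIZE = 300    # people, from the text
DAILY_INCOME_INR = 32   # per person per day, from the text

# Total income of the whole community per day: 300 * 32 = INR 9,600.
community_daily_income = COMMUNITY_SIZE * DAILY_INCOME_INR


def per_person_daily_cost(total_cost_inr, lifetime_days, people=COMMUNITY_SIZE):
    """Spread a solution's total cost evenly over its lifetime and the community."""
    return total_cost_inr / (lifetime_days * people)


def share_of_income(total_cost_inr, lifetime_days):
    """Fraction of one person's daily income that the solution would consume."""
    return per_person_daily_cost(total_cost_inr, lifetime_days) / DAILY_INCOME_INR


# Hypothetical example: a solution costing INR 109,500 that lasts one year.
cost = per_person_daily_cost(109_500, 365)
share = share_of_income(109_500, 365)
print(f"Community income per day: INR {community_daily_income}")
print(f"Cost per person per day:  INR {cost:.2f}")
print(f"Share of daily income:    {share:.2%}")
```

Framing the price this way keeps the question honest: a solution is affordable only if its per-person share stays small next to INR 32 a day.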




All of this is directed towards the long-term goal of building a distributed, organic, self-healing, replicable community content delivery network that is owned by the community, is sustainable, blackout resistant and accessible by everyone. We believe that such a network, such a platform, can actually help make democracy more participatory and inclusive. While that sounds Utopian, we believe it can be done if all the stakeholders play to their strengths and stay willing to collaborate. We are always on the lookout for people and organizations willing to find ways to combine strengths and resources to work towards this vision.

1 comment for “Post-Bellagio Catharsis #2 – Voices Across the Digital Divide a.k.a. “There’s plenty of room at the bottom” (v 2.0)

  1. August 10, 2013 at 7:25 pm

    I want setup a Swara so can u tell me
    1- sofware is free or cast?
    2- how to download?
    3- how to setup?
