CGNet Swara Stats Mar 22 2011 – Using Google Fusion Tables

I figured I would do a little playing around with Google Fusion Tables and here is the result. Much as I am perturbed by the new privacy policy, Google does have some classy tools.
The data is all real so I suppose this could also serve for a status report 🙂

Locations Talked About On Swara

This is an experiment using Google Fusion Tables, populated with data acquired by passing the transcribed message text on Swara through the Yahoo! Placemaker service.

Obviously there is a margin for error, since phrases in some languages might be read as place names in English. However, with some error checking thrown in (like a whitelist of countries), the error can be greatly reduced. I was able to produce the map below, which is a fast-loading (and a tad simpler to build) version of the Swara/Ushahidi mashup I did a few weeks ago.
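The whitelist check mentioned above could look something like the sketch below. The record layout (`message_id`, `name`, `country`, `lat`, `lon`) and the sample rows are made up for illustration; they are not the actual Placemaker response format or real Swara data.

```python
# Hypothetical geotagger output: one dict per place detected in a transcript.
# Field names and values are illustrative assumptions, not real Swara data.
detected_places = [
    {"message_id": 101, "name": "Raipur", "country": "India", "lat": 21.25, "lon": 81.63},
    {"message_id": 102, "name": "Paris", "country": "France", "lat": 48.86, "lon": 2.35},
    {"message_id": 103, "name": "Jagdalpur", "country": "India", "lat": 19.07, "lon": 82.03},
]

# Countries we expect callers to be talking about.
COUNTRY_WHITELIST = {"India"}


def filter_by_country(places, whitelist=COUNTRY_WHITELIST):
    """Keep only the detected places whose country is on the whitelist."""
    return [p for p in places if p.get("country") in whitelist]


plausible = filter_by_country(detected_places)
for place in plausible:
    print(place["message_id"], place["name"], place["lat"], place["lon"])
```

Anything that lands outside the whitelist is probably a word that only looks like an English place name, so it is dropped before the rows go to the map.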

The catch, of course, is that you have to be online to visualize data like this, and I didn’t like having to give my data to Google, but I suppose everything has a price…

 

Daily Calls Received By Swara Since Feb 2010:

This is a timeline of calls received by Swara every day since Feb 2010:

 

Call Length Distribution (2012 only) (x axis = number of calls, y axis = seconds)

This is my amateurish attempt at producing a histogram of call lengths (in increments of 60 seconds). Fusion Tables doesn’t come with a histogram feature. It would be nice to be able to aggregate the data into bins and produce a histogram automagically 🙂
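Until something like that exists, the binning can be done in a small script before uploading. This is only a sketch: it assumes call lengths are available as a plain list of seconds, and the sample values are invented.

```python
from collections import Counter


def bin_call_lengths(durations_sec, bin_width=60):
    """Count calls per bin; each bin is labelled by its lower edge in seconds."""
    counts = Counter((int(d) // bin_width) * bin_width for d in durations_sec)
    return sorted(counts.items())


# Invented sample durations in seconds, not real Swara data.
sample_durations = [5, 42, 61, 75, 130, 59, 185, 12]

histogram = bin_call_lengths(sample_durations, bin_width=60)
for lower_edge, n_calls in histogram:
    print(f"{lower_edge:>4}-{lower_edge + 59:<4} sec: {n_calls} calls")
```

The resulting (bin, count) pairs can be uploaded as a two-column table and drawn as a bar chart. Passing a smaller `bin_width`, such as 15, gives a finer breakdown.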

 

That’s all for now. I will push out more visualizations as I come up with them.

2 thoughts on “CGNet Swara Stats Mar 22 2011 – Using Google Fusion Tables”

  1. This is a great example of visualisation – and from what you’ve written, it sounds easy. I assume Google has access to the data you’ve submitted – but I also assume you retain the ownership of the data, right?

    Also, it’d be great if a voice recognition system could automatically tag in the locations by scanning the content of each message. But I guess this is some years off.

  2. An email conversation with Ashim Jain who responded to the statistics

    Arjun:
    Ashim, thanks for the feedback, responses inline. Would you mind if I share this conversation on the website comment stream? These are great questions to ask and in the absence of bandwidth, this is the best way to create documentation 😀

    Ashim:
    Hi Arjun,

    Took some time to read up on some parts of the Mojolab site and Yahoo Placemaker. Hmm… the latter sounds quite interesting, and I would be curious to know more about it at some point. So all these points shown on the Indian map are coming from YPS (i.e. YPS detected names of Indian towns and villages in the transcribed text)? That’s quite impressive considering that YPS would need to know so many names and that these names were accurately spelled in the transcription! It leaves Ushahidi in the dust, doesn’t it, since that map had only 2-3 dots on the Indian map on the Mojolab site?

    Arjun:
    The Ushahidi map was plotting only a couple of reports from a field visit, so not really an apples to apples comparison. But yes, this approach is much simpler than Ushahidi, except that Ushahidi can also run on your laptop, while for this you need Google and cloud power, i.e. a net connection.

    Ashim:
    Some questions related to the data:
    1. What are the reasons for so many calls to be zero seconds? Are they wrong numbers or is this people giving missed calls to the system?

    Arjun:

    The calls aren’t 0 seconds, just less than a minute. The histogram feature doesn’t exist on Google, so I will manually (or scriptomatically) sort the data into ranges, and that should fix the 0 second issue. Plus, I think a lot of people do just give missed calls to the system.

    Ashim:

    (Btw, the graph would be a tad bit easier to read if you put the Y axis zero near the X axis 0, and exchange the X and Y axes. Reason is that intuitively, the duration of a call is the independent variable so it should be on X. Also, it may be good if we don’t round the numbers to the nearest minute since that is too coarse. Maybe round it to the nearest 10 or 15 sec.)

    Arjun:

    Absolutely! I just couldn’t figure out how to do that :D. I shared the sheets with you; if you find a way before I do, please do let me know how. Also, once I do the script sort, I will set it up so that it presents the data correctly.

    Ashim:

    2. What is the approximate minimum time for a call for us to assume that some meaningful interaction happened on such a call? As an example, a 60-sec. call might just get consumed in listening to the menu choices and pressing the first 1-2 keys.

    Arjun:

    This is a great question. To be honest, I have no idea what unit of time represents a meaningful interaction. I would say at least a minute would be a bare minimum, but this is just an assumption with no mathematical backing whatsoever.

    Ashim:

    3. Some explanation of the reasons for the extreme peaks and valleys in the first chart should be there, if known to us. E.g., the valley around beginning of 2011 might be due to some system problem.

    Arjun:

    Well, I think any time we have stopped engaging directly with the community for any length of time because of lack of bandwidth, we have seen a drop. Also, we changed the number initially, which resulted in people getting lost, since we did not keep track of numbers back then. Follow-up also plays a big role. Whenever we have an impact, the listener base expands, since people share success.

    Ashim:
    4. Another few graphs:
    a) X – Time of day, Y – number of calls
    b) X – Day of week, Y – no. of calls
    c) Similarly see if there is any variation of call volume during a month or season or year
    Similarly, see the above data for length of calls. Note: remove the 0 sec. calls. Plot calls shorter than a meaningful minimum time in a different colour or on a different line.

    d) More than the number of calls, the total meaningful “listening” and “recording” times might give us better insight. So, replace the number of calls in the above with the sum of meaningful listening time (and do the same for recording time but on a different line).

    e) Some other graph or data to show: i) What keys did the users press, i.e. which choices are most popular, least popular, etc.? ii) Is there any indication that users are finding it difficult to understand the menu instructions or are they having to press ‘Fwd’ too many times, etc.?

    Wonder if we put a choice in the menu that says ‘Give us feedback about anything you don’t like or would like to see changed/added’, then we could hear people’s feedback.

    Arjun:
    All super ideas, will add to the feature list!
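
    As a starting point for graphs (a), (b) and (d), the grouping could be sketched roughly as follows. The call log layout (start timestamp, duration in seconds), the sample rows, and the 60-second threshold for a "meaningful" call are all assumptions for illustration; the threshold is just the guess from the discussion above.

```python
from collections import defaultdict
from datetime import datetime

# Assumed threshold for a "meaningful" call, per the guess in the thread.
MIN_MEANINGFUL_SEC = 60

# Invented call log rows: (start time, duration in seconds).
calls = [
    ("2012-03-19 09:15:00", 0),
    ("2012-03-19 09:40:00", 95),
    ("2012-03-20 18:05:00", 240),
    ("2012-03-20 18:30:00", 30),
    ("2012-03-25 09:10:00", 130),
]


def summarize(calls, key_fn, min_sec=MIN_MEANINGFUL_SEC):
    """Group calls by key_fn(start time), dropping 0-second calls.

    Calls shorter than min_sec are counted separately so they can be
    plotted in a different colour or on a different line.
    """
    summary = defaultdict(lambda: {"calls": 0, "short_calls": 0, "meaningful_sec": 0})
    for started_at, duration in calls:
        if duration == 0:
            continue  # missed calls are left out entirely
        key = key_fn(datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S"))
        if duration < min_sec:
            summary[key]["short_calls"] += 1
        else:
            summary[key]["calls"] += 1
            summary[key]["meaningful_sec"] += duration
    return dict(summary)


by_hour = summarize(calls, lambda t: t.hour)          # graph (a)
by_weekday = summarize(calls, lambda t: t.weekday())  # graph (b), Monday = 0

for hour in sorted(by_hour):
    print(hour, by_hour[hour])
```

    Swapping in a different `key_fn` (for example, month or year) would cover graph (c) in the same way.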
