Our technology roadmap is focused on developing a set of fungible tools that can be used to crowdsource the collection, curation, publication, and visualization of content, primarily in the form of incident reports.
Our crowdsourcing tasks fall broadly into five phases.
Crowdsourced data can be collected via any public medium, such as SMS, email, or phone.
For almost all our projects, we currently export all incoming reports to email, providing a single, familiar content-management interface.
This phase involves verifying the content, editing it for quality, and summarizing it in text.
The curation phase is best performed by a large group of people and is therefore well suited to crowdsourcing.
We again use email for this.
All curators are members of a mailing list. The mailing list acts as the data repository, and each member's inbox presents a view of that repository. As long as replies are sent to the mailing list address, the views across curator mailboxes stay consistent.
Members of the curator mailing list send their updates as replies on the same conversation, so each conversation represents the body of work done on that piece of content.
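The conversation-as-repository model above can be sketched in a few lines. This is an illustrative sketch, not the actual system: the messages, senders, and the `thread` key standing in for whatever conversation identifier the mail system exposes (e.g. a normalized Subject line) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mailing-list messages; `thread` is an assumed stand-in
# for the real conversation identifier (e.g. Subject or References header).
messages = [
    {"thread": "Report #12: road damage", "sender": "curator-a",
     "body": "Original report text"},
    {"thread": "Report #12: road damage", "sender": "curator-b",
     "body": "Verified with a local contact; fixed typos"},
    {"thread": "Report #13: water supply", "sender": "curator-a",
     "body": "Original report text"},
]

# Group messages by conversation: each conversation collects the full
# body of curation work done on one piece of content.
conversations = defaultdict(list)
for msg in messages:
    conversations[msg["thread"]].append(msg)

for thread, work in conversations.items():
    print(f"{thread}: {len(work)} message(s)")
```

Because every curator's inbox sees the same replies, this grouping is the same no matter whose mailbox you read it from.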
This phase involves taking the curated messages and publishing them on different public interfaces, such as blogs and websites.
Long before socially inclined technology projects start creating tangible impact for their stakeholders, the torrent of data generated by technology interventions begins. This data is typically valuable, since it contains new information about communities that have often never been connected to any digital media. Moreover, the data usually also contains performance information about the intervention itself. However, the data is collected in so many different forms and formats that it is usually very difficult to analyze it comprehensively across all sources.
Ever since Mojolab began, we have faced a consistent challenge in managing the data that comes at us like an avalanche from our many sources in the field. In our case, data is collected both passively by automated systems and actively by human participants. The automatically collected data is stored in several databases, each on a different remotely placed notebook server, connected only by a VPN.
The manual data is collected as a set of Excel sheets, often made by participants in the field. These sheets often vary in format, even for similar data. For example, two volunteers in two different locations may record workshop participant data in Excel sheets with completely different column names, even though both sheets represent the same type of data: lists of people.
To solve this problem, we came up with the idea of a tool that would combine tables in different formats into more comprehensive data sets.
In the simplest case, the tool would take two tables (as CSV files), for example (see figure below).
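A minimal sketch of this simplest case: combining two CSV tables whose column names differ but which both describe lists of people. The sample data, column names, and header mapping are all hypothetical, assumed only for illustration; the actual tool's format and rules are not specified here.

```python
import csv
import io

# Two hypothetical participant sheets from different volunteers;
# the column names differ, but both are lists of people.
sheet_a = """Name,Phone,Village
Asha,9876500001,Rampur
Vijay,9876500002,Bhopal
"""

sheet_b = """participant_name,contact_no,location
Meena,9876500003,Indore
"""

# Assumed canonical schema and a mapping from each sheet's headers onto it.
CANONICAL = ["name", "phone", "village"]
MAPPINGS = {
    "Name": "name", "Phone": "phone", "Village": "village",
    "participant_name": "name", "contact_no": "phone", "location": "village",
}

def normalize(csv_text):
    """Yield rows of a CSV re-keyed to the canonical column names."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {MAPPINGS[key]: value for key, value in row.items()}

# Combine both sheets into one data set under the shared schema.
combined = list(normalize(sheet_a)) + list(normalize(sheet_b))

# Write the merged table back out as a single CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=CANONICAL)
writer.writeheader()
writer.writerows(combined)
print(out.getvalue())
```

The key design choice is separating the header mapping from the merge itself: adding a third volunteer's sheet only requires extending `MAPPINGS`, not changing the combining logic.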