Kasadaka (“talking box”) is an ICT for Development (ICT4D) platform for developing voice-based technologies for people who are not connected to the Internet, cannot read or write, and speak under-resourced languages.
[This post is based on Andre Baart’s B.Sc. thesis. The text is mostly written by him]
In developing (rural) communities, the adoption of mobile phones is widespread. This allows information to be offered to these communities through voice-based services. This research explores the possibilities of creating a flexible framework (Kasadaka) for hosting voice services in rural communities. The context of the developing world poses special requirements, which have been taken into account in this research. The framework creates a voice service that incorporates dynamic data from a data store, and it can be adapted to new and changing use cases with little effort. The service is hosted on cheap, low-powered hardware and is connected to the local GSM network through a dongle. We validated the working and flexibility of the framework by adapting it to a new use case. Setting up this new voice server took less than one hour, demonstrating that the framework is suitable for rapid prototyping. This framework enables further research into the effects and possibilities of hosting voice-based information services in the developing world.
The image below shows the different components and the data flow between them when a call is made. Read more in Andre Baart’s thesis (pdf).
All information on how to get started with Kasadaka can be found on the project’s GitHub page: https://github.com/abaart/KasaDaka
In the steps below, text in italics describes actions that take place only when the call is set up.
Asterisk receives the call from the GSM dongle, answers the call, and connects it to VXI. Asterisk receives the user’s input and forwards it to VXI.
VXI requests the configured VoiceXML document from Apache. Together with the request, it sends the user input.
Apache runs the Python program (based on Flask), which needs to read data from or write data to the triple store. To do so, Python sends a SPARQL query to ClioPatria.
ClioPatria runs the query on the data present, and sends the result of the query back to the Python program.
Python renders the VoiceXML template. The dynamic data is now inserted in the VoiceXML document, and it is sent back to VXI.
VXI starts interpreting the VoiceXML document. In the document there are references to audio files. It sends requests to Apache for the referenced files.
Apache sends a request for the file to the file system.
The file is read from the file system.
Apache responds with the requested audio files.
VXI puts all the audio files in the correct order and plays them back sequentially, sending the audio to the GSM dongle.
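The middle of this flow (querying the triple store, inserting the result into a VoiceXML document, and referencing audio files) can be illustrated with a minimal Python sketch. It is not Kasadaka's actual code: the `ex:` vocabulary, the sample result, the audio base URL, and the function names `parse_bindings` and `render_vxml` are all assumptions made for illustration. The sketch uses only the standard library, and it skips the HTTP request to ClioPatria by starting from a made-up response in the standard SPARQL 1.1 JSON results format.

```python
import json
from xml.sax.saxutils import quoteattr

# Hypothetical vocabulary (ex:Product, ex:audioFile); not taken from Kasadaka.
# Shown only to illustrate the kind of query the Python program would send.
SPARQL_QUERY = """
PREFIX ex: <http://example.org/kasadaka#>
SELECT ?product ?audio WHERE {
  ?product a ex:Product ;
           ex:audioFile ?audio .
}
"""

# Made-up response in the SPARQL 1.1 Query Results JSON format,
# standing in for what the triple store would return.
SAMPLE_RESULT = json.dumps({
    "head": {"vars": ["product", "audio"]},
    "results": {"bindings": [
        {"product": {"type": "uri", "value": "http://example.org/kasadaka#millet"},
         "audio": {"type": "literal", "value": "millet.wav"}},
        {"product": {"type": "uri", "value": "http://example.org/kasadaka#honey"},
         "audio": {"type": "literal", "value": "honey.wav"}},
    ]},
})


def parse_bindings(sparql_json):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    doc = json.loads(sparql_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def render_vxml(rows, audio_base_url):
    """Build a VoiceXML document that plays one audio file per result row."""
    audio_tags = "\n".join(
        # quoteattr() escapes the URL and wraps it in quotes.
        "        <audio src=%s/>" % quoteattr(audio_base_url + row["audio"])
        for row in rows
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form id="offerings">\n'
        '    <block>\n'
        '      <prompt>\n'
        + audio_tags + '\n'
        '      </prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


if __name__ == "__main__":
    rows = parse_bindings(SAMPLE_RESULT)
    # The base URL is an assumption; it stands for wherever Apache serves audio.
    print(render_vxml(rows, "http://localhost/audio/"))
```

In a setup like the one described above, a function such as `render_vxml` would sit behind a Flask route, and the voice browser would then fetch each referenced `src` URL from Apache before playing the files in document order.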