Niki

Cat shown as an older man speaking into a mobile phone in order to make a bill payment.

About

“Even before you do anything Niki starts to talk to you.”1

Several start-ups in India work at the intersection of voice and multilingual support. Niki is one of the few that directly provides this support to the individual. It is described by its developers as an “artificial intelligence powered conversational commerce startup”.2 This means that a person can use the app to make an array of digital payments with either text or voice commands.3 Niki was founded in 2015 as a voice-based assistant app in English; now, it provides information in multiple Indian languages via voice and text. The app allows people to use voice commands to complete a range of tasks, including paying utility bills, recharging prepaid mobile accounts, booking tickets for travel and accommodation, and availing local deals. The team at Niki focuses on the segment they call ‘middle India’, which includes customers from Tier 2, Tier 3, and Tier 4 cities; their aim is to bring the benefits of the online economy, without the barriers of language, to these new internet users. In 2020, Niki's revenue grew by 1,000% and its user base grew by 22% to 550,000 users.4 The company has received funding from Ratan Tata, Unilazer Ventures, and SAP.iO, among others.5 As per a 2020 report, Niki plans to raise USD 50 million (more than INR 300 crore) by early 2021. The team intends to use these funds to expand their market share and capture 20% of the 150 million ‘Bharat household’ (Indian household) market by the financial year 2022.6

Methodology and process

This section describes the design and user research process that the team at Niki follows to understand the needs of their users and find new ways to help them.

Design process and user research

The process of creating voice datasets in Indian languages involved several steps, beginning with the selection of languages that are the focus of the project and then building speech technologies using the voice datasets. The selection of the 13 languages was based on the following criteria: optimal text selection, speaker selection, pronunciation variation, recording specification, text correction for handling out-of-the-vocabulary words, and data verification.7 To ensure the quality of data, characteristics that affect speech synthesis quality such as encoding (converting one form of data to another), sampling rate (number of samples of audio recorded every second) etc. were considered. The sentences for the speech recordings were taken through web crawlers from newspaper reports, Wikipedia pages, websites, and blogs in the respective Indian language. To achieve good coverage of topics and words, sentences were also taken from different types of literature, including children’s stories, science writing, tourism content, etc. Care was also taken to ensure that the texts were commonly used, free of errors, easy to read, and covered a wide range of words and syllables. Code-mixed sentences were avoided.
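The text-selection step described above can be illustrated with a minimal sketch. It assumes that a sentence counts as code-mixed when its letters come from more than one Unicode script; the function names, the `DEVANAGARI` default, and the `max_words` limit are hypothetical choices made for illustration and are not taken from the project. A script check of this kind will not catch English words that have been transliterated into an Indian script.

```python
import unicodedata


def scripts_in(sentence: str) -> set[str]:
    """Return the Unicode script names of the letters in a sentence."""
    scripts = set()
    for ch in sentence:
        if ch.isalpha():
            # Character names begin with the script, for example
            # "DEVANAGARI LETTER KA" or "LATIN SMALL LETTER A".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def select_sentences(candidates, script="DEVANAGARI", max_words=15):
    """Keep unique, short sentences written entirely in one script.

    `script` and `max_words` are illustrative assumptions, not
    values reported in the case study.
    """
    selected = []
    seen = set()
    for sentence in candidates:
        sentence = sentence.strip()
        if not sentence or sentence in seen:
            continue  # skip empty lines and exact duplicates
        if scripts_in(sentence) != {script}:
            continue  # skip code-mixed or other-script sentences
        if len(sentence.split()) > max_words:
            continue  # skip sentences that are hard to read aloud
        seen.add(sentence)
        selected.append(sentence)
    return selected


crawled = [
    "मुझे बिल भरना है",
    "मेरा phone recharge कर दो",  # code-mixed: Devanagari and Latin
    "मुझे बिल भरना है",            # duplicate
    "hello world",
]
print(select_sentences(crawled))
```

In this sketch only the first sentence survives: the second mixes scripts, the third is a duplicate, and the fourth is not in the target script.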

According to the team at Niki, between 2016 and 2019 they spent about 30,000 hours speaking to users. Research is a critical component of their design process. They have an in-house customer insights and research team that works towards understanding users better through various methods, including on-ground research. For instance, the team of researchers, designers, and product managers travelled to Tier 3 (and below) cities such as Chomu, Pushkar, Ajmer, and Udaipur in Rajasthan to meet and undertake usability studies with residents. Usability studies are conducted to observe and understand the needs of a group of representative users.7 They involve observing users as they attempt to complete tasks using the product.

The following are insights based on the key learnings from the user studies and the feedback collected in Tier 3 and Tier 4 cities:

Single-page and consistently guided user interface: The entire app should have a single-page user interface (UI) across services to maintain consistency. The user journey should be guided – the app should point users to the next step. Messages need to be crisp and to the point, and actionable items and messages should not be mixed. The content flows should be designed to be linear; multiple branches should be avoided.

Acknowledgement messages: The research showed that it is important to acknowledge every action of the individual, as it gives them the confidence to use the app. All the ‘call to action’ and actionable elements must be consolidated in a specifically identified area.

Apprehension about change: The research revealed that users were hesitant about trying new features for fear of failing. The team observed that they were diffident and nervous about trying anything they hadn’t previously encountered.

Multilingual and voice-based technology

The founders of Niki aim to solve two structural problems: voice and vernacular. While creating the natural language processing (NLP)8 engine for Indian languages, the team realised that sentences in most Indian languages had some words in English, creating challenges when building an interface that could understand a sentence where more than one language was used. To deal with this, they built an in-house NLP engine, with the idea that the engine could be scaled to different contexts (such as bill payment and phone recharge) and more Indian languages. To add a language that the system can understand, the team now only needs to add entries to the Natural Language Generation (NLG)9 files. Another way in which Niki was able to scale up the addition of new languages was by delaying heavy technological investments until the proof of concept had been taken to users, and by making changes and upscaling based only on the feedback and success from the user research. The NLP system designed in-house provided Niki with the capacity to launch its services in different Indian languages with ease. The pipeline that takes a spoken request and returns an answer has three main components: a) transcribing the audio, b) extracting meaning from the text, and c) responding to the individual. In the case of Niki, the first layer is provided by Google, the second layer by the NLP engine, and the third layer by dialogue management.
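The three layers described above can be sketched as a small pipeline. This is an illustration under stated assumptions, not Niki's implementation: the keyword lexicon, the template dictionary, and every function name here are hypothetical, and `transcribe` is a stand-in that simply decodes UTF-8 bytes where a real system would call an external speech-to-text service.

```python
from dataclasses import dataclass

# Hypothetical keyword lexicon. Listing English and Hindi spellings
# together lets a code-mixed sentence match the same intent.
INTENT_KEYWORDS = {
    "pay_bill": {"bill", "बिल"},
    "recharge": {"recharge", "रिचार्ज"},
}

# Hypothetical NLG entries: one set of response templates per language.
NLG_TEMPLATES = {
    "en": {
        "pay_bill": "Sure, which bill would you like to pay?",
        "recharge": "Sure, which number should I recharge?",
        "unknown": "Sorry, I did not understand. Please try again.",
    },
    "hi": {
        "pay_bill": "ज़रूर, आप कौन सा बिल भरना चाहते हैं?",
        "recharge": "ज़रूर, किस नंबर का रिचार्ज करना है?",
        "unknown": "माफ़ कीजिए, मैं समझ नहीं पाई। कृपया दोबारा बोलें।",
    },
}


@dataclass
class Understanding:
    intent: str
    language: str


def transcribe(audio: bytes) -> str:
    """Layer 1 stand-in: treat the bytes as already-transcribed text."""
    return audio.decode("utf-8")


def extract_meaning(text: str, language: str) -> Understanding:
    """Layer 2: map the (possibly code-mixed) text to an intent."""
    tokens = set(text.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return Understanding(intent, language)
    return Understanding("unknown", language)


def respond(understanding: Understanding) -> str:
    """Layer 3: pick the reply template for the user's language."""
    templates = NLG_TEMPLATES.get(understanding.language, NLG_TEMPLATES["en"])
    return templates.get(understanding.intent, templates["unknown"])


def handle(audio: bytes, language: str) -> str:
    return respond(extract_meaning(transcribe(audio), language))


# A Hindi sentence containing the English word "bill".
print(handle("मेरा bill भर दो".encode("utf-8"), "hi"))
```

In this sketch, supporting another language for output means adding one more entry to `NLG_TEMPLATES`, which loosely mirrors the idea that new languages only require new NLG entries; the understanding layer in a real system would be far more involved than keyword matching.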

Designing for hyper-localisation of conversations

Niki claims to have a scalable design that can be adapted to multiple regional languages. When expanding to a new language, the team begins by understanding colloquial usage in specific regions and for particular uses. Based on their findings, they gauge users’ intentions, and design the app’s responses accordingly. The key challenge was to design for the hyper-localisation of conversations. In India, the immense diversity of languages and dialects is difficult to capture. The team sees this as a concern to be tackled in the future.

Privacy and data collection

One of the main challenges faced by companies and researchers working on Indian language voice interfaces (VIs) is the lack of data in several Indian languages. Niki aims to tackle this by collecting data and strengthening their NLP engine to include as many languages as possible. One of the ways in which Niki has been striving to ensure accuracy in different languages is through the annotation of data, which is possible because the app has always accepted voice as an input. A language is made available to users depending on how accurately the model understands it. Their NLP and machine learning models are trained to understand what the user is saying across multiple languages. They store inputs from each interaction made by the individual using the app to improve their models for various accents and dialects. This system allows them to have data with different languages, accents, volumes, pitches, word speeds, and background noises.10
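The idea of releasing a language only once the model understands it well enough can be sketched as follows. The source does not say how accuracy is measured or what cut-off is used, so the intent-level accuracy measure, the `threshold` of 0.9, and all names below are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable

# One annotated sample: (language code, transcribed utterance,
# intent label assigned by a human annotator).
Sample = tuple[str, str, str]


def accuracy_by_language(
    samples: Iterable[Sample], predict: Callable[[str], str]
) -> dict[str, float]:
    """Share of annotated utterances the model labels correctly, per language."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, utterance, expected_intent in samples:
        total[language] += 1
        if predict(utterance) == expected_intent:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


def languages_to_release(
    samples: Iterable[Sample],
    predict: Callable[[str], str],
    threshold: float = 0.9,  # assumed cut-off, not a figure from the source
) -> list[str]:
    """Languages whose measured accuracy meets the threshold."""
    scores = accuracy_by_language(samples, predict)
    return sorted(lang for lang, score in scores.items() if score >= threshold)
```

Here `predict` stands for whatever model maps an utterance to an intent; stored and annotated interactions would supply the samples.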

Niki is one of the few apps in India that provide a privacy policy in both English and Hindi. The team also ensures that the financial data they collect is encrypted from the moment it is collected until it has been transferred. They do not share personally identifiable information (PII) with third parties without the consent of the user. They practise purpose limitation internally, giving each team access only to the data it needs for specific tasks.

Building for Bharat

In the initial years, Niki focused on creating voice- and speech-based interfaces that catered to English-speaking Indian people. During this time, they also provided voice-related services to various banks to use as their personalised voice bots. On listening to the interactions between people and these bots, the team realised that people were trying to speak to the English-only bot in their first languages. Only when they developed and launched the same chatbot in Hindi did they realise the immense potential that an Indian language–speaking chatbot had, especially in Tier 2 and Tier 3 cities in India.

During their extensive usability studies, they examined the reasons people buy smartphones and their device preferences. People preferred certain phones because they had big screens, their peers had them, or they wanted to watch videos on them. This is reinforced by other research that stated that people speaking Indian languages rely heavily on voice searches on non-transactional platforms like YouTube and Google.11 However, there seemed to be hesitation to use a voice-only input platform when money was involved; this could be due to the fear that money could be transferred unintentionally. In these cases, people preferred to ask for assistance from family members or friends to complete the transaction. During the user research, the team found that the main reason for this was the fear of losing money online. Hence, the app and the voice interface were designed to ensure user trust and create an experience similar to interacting with a family member.

Accessibility and assistance

Niki equates the issue of accessibility with removal of unfamiliarity, since they believe that unfamiliarity is a barrier to using new technologies or apps. The interface is made accessible with the provision of support in multiple Indian languages in a way that makes the individual comfortable; it offers a type of hand-holding while people engage with new functionalities. The interface also includes crisp replies and acknowledgement messages. Niki believes that this could lead to an improved form of accessibility – VIs in multiple Indian languages will open the “internet economy to new internet users”.12 This includes enabling an individual to read or interact in their own language and be comfortable using the internet for more services, including some that involve monetary transactions. Niki aims to act as a guide through the individual’s journey through the app by making them comfortable, initiating conversation in their languages, and confirming each utterance to ensure an accurate record of queries.

Challenges

“The way Hindi is spoken in Bihar is different from the way Hindi is spoken in Rajasthan.”13

A significant challenge for Niki and most voice interface developers in India is the huge diversity of languages that are spoken in the country. This heterogeneity, which extends to the dialects within a single language, creates pressure to develop interfaces that understand dialects as well as languages. Most Indian languages are still low-resource ones without enough data, which acts as a barrier to creating VIs in these languages. Consequently, the absence of less widely spoken languages from VIs prevents these interfaces from being adopted by a number of communities.

Future of Niki

“Niki’s vision is that nobody’s left behind.”14

Niki aims to use its experience of creating local-language, voice-based interfaces, especially for ‘middle India’ households in smaller cities, to expand their reach in the future. They aim to make the entire internet economy accessible to a large number of Indians who are coming online for the first time. Their current focus is to provide access to essential services such as rations, electricity, and phone recharging. Their focus has always been to improve the socio-economic lives of ‘middle India’ households, and to further this, they hope to reach more people through a voice interface that they can use in their own language.


Disclaimer: This is an independent case study conducted as a part of the Making Voices Heard Project, supported by the Mozilla Corporation. The researchers have not received any external remuneration as a part of this case study, and claim no conflict of interest.

Research and Writing by
Shweta Mohandas and Saumyaa Naidu
Review and Editing by
Puthiya Purayil Sneha and Torsha Sarkar
Research Inputs by
Sumandro Chattapadhyay

Notes

1 Interview, Niki, online, Bangalore, 24 July 2020  
2 Sangani, P., “Hindi Users Help Niki.ai Grow 300x”, The Economic Times, 5 June 2019, accessed 5 September 2021, https://economictimes.indiatimes.com/small-biz/startups/newsbuzz/hindi-users-help-niki-ai-grow-300x/articleshow/69940200.cms/  
3 Sangani, “Hindi Users”, The Economic Times.   
4 Kashyaap, S., “[Product Roadmap] How Ratan Tata-backed Niki.ai is Helping Bharat Users Perform Transactions”, YourStory, 8 April 2021, https://yourstory.com/2020/08/product-roadmap-ratan-tata-nikiai-bharat-conversational-ai-startup/amp   
5 Sangani, “Hindi Users”, The Economic Times.  
6 HT Brand Studio, “Niki Unlocks Bharat’s Internet Economy Averaging 52 Transactions per Household”, Mint, 12 January 2021, https://www.livemint.com/brand-post/niki-unlocks-bharat-s-internet-economy-averaging-52-transactions-per-household-11610451780112.html.   
7 “Usability Testing”, Interaction Design, accessed 3 November 2021, https://www.interaction-design.org/literature/topics/usability-testing   
8 NLP is the automatic manipulation of natural language, like speech and text, by software.  
9 The software process that produces a natural language output. The NLG process converts machine-readable data into human language output.  
10 hippoBrain, “E41: Full-stack Ramu Kaka - A Product to Service the Bharat Market by Sachin Jaiswal & Shishir Modi”, YouTube, 2 April 2021, https://www.youtube.com/watch?v=bn4h9jsmfcg&ab_channel=hippoBrainhippoBrain.  
11 Interview, Niki, online, Bangalore, 24 July 2020   
12 Interview, Niki, online, Bangalore, 24 July 2020   
13 Interview, Niki, online, Bangalore, 24 July 2020   
14 Interview, Niki, online, Bangalore, 24 July 2020