Claraphone
Ganna is the founder and CEO of Claraphone, which is developing real-time accent-editing technology for phone calls and teleconferencing with the goal of helping people around the world understand each other better. Ganna grew up in Poland and then worked throughout Europe before moving to the United States. Through personal experience and by observing others in these varied settings, she noticed that difficulty understanding non-native speakers was hurting work performance. She also noticed an inherent bias against non-native English speakers. As an immigrant working in many different settings with colleagues both nationally and internationally, she wanted a solution to these problems that would make communication more efficient and ensure that non-native English speakers have the same opportunities for professional advancement as their colleagues. It was through these experiences that Claraphone was started. Ganna also serves as San Francisco lead for Women in AI, an organization advocating gender equity in technology.
In an increasingly integrated global business marketplace, varied pronunciations and accents are more and more common, but they cause productivity losses, inefficiencies and communication lapses due to errors in comprehension. In addition, bias against non-native speakers is widespread as a component of institutional racism and limits access to professional opportunities and economic advancement. Claraphone is committed to solving these comprehension issues and minimizing bias against non-native speakers so that they have the same opportunities as their colleagues, based on merit rather than accent. The project I am proposing will continue our development work on Claraphone so that we can deliver real-time accent-editing technology for phone calls and teleconferencing. This technology would improve understanding and collaboration and reduce bias. Improved comprehension would strengthen international teams, making their work more collaborative, efficient and productive. From an institutional-bias standpoint, this technology would give non-native English speakers better opportunities in business and academia.
The goal of Claraphone is to eliminate or minimize pronunciation errors in real time during phone calls and teleconferencing. Claraphone addresses workplace inefficiencies in an increasingly global marketplace that stem from comprehension issues between native and non-native English speakers. In addition, non-native English speakers face bias because of their accent, which this technology can reduce. English is spoken by about 1.7 billion people worldwide, of whom over 75% (about 1.3 billion people) are non-native speakers. Almost all adults who learn English as a second language speak it with an accent. Most of us can think of a situation in which an accent made it harder to understand someone. Scientific studies report that non-standard pronunciation is hard to understand for both native and non-native listeners and increases the load on the neural systems involved in speech perception. And although we would like to assume that this does not affect our judgment of the accented person’s work, multiple studies have shown that accent is one aspect of implicit bias. For example, one study showed that in a job interview, interviewees with an ethnic-sounding name were automatically at a disadvantage, but those with an accent were even further disadvantaged.
My team and I are building Claraphone, a real-time accent-editing technology for phone calls and teleconferencing. Claraphone will preserve the speaker's voice and intonation while producing the desired pronunciation. Speakers can choose from a range of target pronunciations to better connect with their international partners. For example, speakers from India can choose an American accent when speaking to Americans or an Irish accent when speaking to Irish colleagues. Americans, on the other hand, could use an Indian accent when speaking to colleagues from India, so that all parties can understand each other better. We are planning to offer this technology as an app, an API, an SDK, and an add-on mobile service.
Our core technology is greenfield, AI-based and user-specific. We use a deep neural network to recognize the user's speech and mispronunciation patterns and to convert the speech into text. Next, we apply a vocoder to synthesize speech with the desired pronunciation. Finally, we use a splicer/mixer to replace the mispronounced parts of the original speech with the newly synthesized, clear speech. For videoconferencing, we will add a further deep neural network for lip reading. Our team is also exploring direct voice-to-voice conversion as one viable pathway for implementing the core technology.
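The splicer/mixer step can be illustrated with a minimal sketch. It assumes that audio is already available as lists of float samples at a single sample rate, that the recognition stage has already flagged the mispronounced spans as sample ranges, and that the vocoder has already produced replacement audio for each span. The `Segment` and `splice` names and the linear crossfade are hypothetical illustrations rather than our production mixer; a real-time system would work on streaming buffers and would also need to match loudness and timing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical: a mispronounced span of the original signal, given as
    sample indices [start, end), plus the synthesized audio that replaces it."""
    start: int
    end: int
    replacement: List[float]


def splice(original: List[float], segments: List[Segment], fade: int = 0) -> List[float]:
    """Replace flagged spans of `original` with synthesized audio (hypothetical sketch).

    Segments must be sorted and non-overlapping. At each splice point, up to
    `fade` samples are linearly crossfaded between the original and the
    synthesized audio to soften the transition.
    """
    out: List[float] = []
    cursor = 0  # first sample of `original` not yet copied or replaced
    for seg in segments:
        if seg.start < cursor or seg.start > seg.end or seg.end > len(original):
            raise ValueError("segments must be sorted, non-overlapping and in range")
        # Copy the untouched original audio that precedes this span.
        out.extend(original[cursor:seg.start])
        rep = list(seg.replacement)
        n = len(rep)
        # Keep the fade short enough that the two ramps never overlap.
        f = min(fade, n // 2, (seg.end - seg.start) // 2)
        for i in range(f):
            w = (i + 1) / (f + 1)  # synthesized weight ramps up at the start
            rep[i] = w * rep[i] + (1 - w) * original[seg.start + i]
        for i in range(f):
            w = (i + 1) / (f + 1)  # original weight ramps back up at the end
            j = n - f + i
            rep[j] = (1 - w) * rep[j] + w * original[seg.end - f + i]
        out.extend(rep)
        cursor = seg.end
    # Copy whatever original audio follows the last span.
    out.extend(original[cursor:])
    return out
```

For example, `splice([0.0] * 10, [Segment(3, 6, [1.0, 1.0])])` returns nine samples: the three untouched samples before the span, the two synthesized samples, and the four untouched samples after it. The optional crossfade is included because a hard cut between recorded and synthesized audio tends to produce audible clicks.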
Claraphone is being developed to improve opportunities for non-native English speakers working in offshore service industries, those collaborating with international colleagues, and those working in native English-speaking settings. As a non-native English speaker who has worked in these settings, I have firsthand experience of the difficulties my accent causes and of how this technology could reduce them. For those working in offshore service industries serving native English-speaking customers, Claraphone will improve customer satisfaction ratings by improving comprehension and decreasing bias, ultimately increasing demand for these services. For those collaborating with international colleagues, Claraphone will decrease miscommunication and increase efficiency by reducing the need for repetitions and follow-up questions. This, in turn, will lead to greater acceptance and more opportunities. Finally, for non-native speakers working directly in a native English-speaking country, Claraphone offers a similar way to decrease institutional bias and improve employment and promotion opportunities, especially as virtual communication continues to increase. At the outset, I conducted over 20 interviews with non-native speakers working in each of these settings to focus our work. As our work progresses, we will survey representatives from these settings again to reassess whether our technology is meeting their needs.
- Elevating opportunities for all people, especially those who are traditionally left behind
Although much attention has recently focused on institutional bias, little work has been done to decrease this bias, much of which is implicit and not easily seen by those making employment and promotion decisions in industry. The goal of Claraphone is to elevate the opportunities of non-native English speakers, many of whom are immigrants with fewer opportunities than their native English-speaking colleagues, whether in a native English-speaking country or in an international setting. What is often described as "poor" English is actually accented English. Claraphone aims to eliminate this implicit bias, increasing opportunity by leveling the playing field.
I am very interested in process improvement, harnessing technology and solving problems. This particular need, and the project idea that followed, came to me while I was working on an instructional video for a Fortune 100 company in Silicon Valley. When I showed the deliverable to the manager who had ordered the work, I received very positive feedback; however, the manager did not feel that my accented speech was the ideal choice for the video's voice-over. I then searched for software that could automatically reduce my accent while preserving my voice, but could not find anything. As a result, I had to hire a professional actor to record the voice-over to complete the project. This was the “aha” moment that made me reflect on all of my prior work (and life) experiences that had been hampered by my accent or the accents of others, and on how accents negatively affected interactions and opportunities for non-native English speakers. It prompted me first to consider accent reduction, and then to develop a tool that enables people to edit their accents in real time during phone calls and teleconferencing.
I am passionate about this project for multiple reasons and at multiple levels. First and foremost, as a serial immigrant I have experienced firsthand the negative impact of bias on my personal achievements. I have often felt, and still feel, that my credibility is diminished simply because of my pronunciation. I would love to use the technology myself once it is completed! Second, social justice is very important to me, as reflected in my work as San Francisco lead of Women in AI. Working in the technology industry, and specifically in Silicon Valley, I have seen and experienced multiple inequities. This tool can solve at least one of them. Finally, I have a passion for creation and creativity, both in my work and in my personal life. I found that working in industry, even at the cutting edge, did not fully satisfy my creative drive, which led me to pursue creating this technology.
I have multiple prior experiences that make me well positioned to deliver this project, and the work done since the project's inception further supports its delivery. I am an Accenture alumna with fifteen years of experience in management and team building. I also consulted for Cisco Systems for over four years and have a good understanding of the telecommunications and SaaS markets. My educational background combines IT and psychology, which enables me to understand the technology well and to successfully build and manage the team needed to deliver the product.
I have a team of three amazing people working with me on the project: Todd Troxell, a successful serial startup CTO who built early parts of Rackspace, Viddler, vidIQ, Pocket, and Lupex; Prashantkumar Brode, Ph.D., our Lead Architect, who specializes in machine learning and speech processing and has designed and developed robust speaker-dependent and speaker-independent audio-visual speech recognition systems in Marathi, Hindi and English; and Bryce Irvin, our voice-to-voice researcher, a Georgia Institute of Technology graduate skilled in signal processing and sound editing. Work to date has included market validation, recruitment and team development, prototyping, technological research and development, partial implementation of the core technology, and fundraising.
I have had to overcome adversity for much of my life, which has made me an autonomous self-starter and drawn me to social justice. I was raised in poverty in Poland by a single mother who was an undocumented immigrant. I had to start working professionally at the age of fourteen to help support my family; between the ages of fourteen and eighteen I worked approximately twenty hours a week as an interpreter, housekeeper, babysitter, customer assistant, cook and graphic designer. Despite all of that, I graduated with good grades. Later, I self-funded my entire education in IT and psychology and completed it while working over forty hours a week throughout the programs. Women made up only 3% of the graduates of my IT college, and I was one of them. I immigrated to the United Kingdom and later to the United States on my own, and overcame considerable adversity to find the right life and professional opportunities, secure housing and start my own consulting company within 18 months of arriving in the US.
Prior to immigrating to the United States, I led two successful European Union programs for individuals seeking professional advancement. The first project served over 150 participants and the second over 120. Each project lasted 10 months, with a 16-hour weekend of trainings and workshops once a month. I co-authored the curriculum and led a team of eight trainers. I designed the entire workshop flow covering 18 different topics, created training materials and designed workshop activities for the participants. I was also in charge of designing the evaluation surveys, administering them and analyzing the feedback. I led train-the-trainer sessions, coached the trainer team and led participant feedback analysis sessions. The projects received very positive feedback from participants, and over 30% of participants obtained career advancement within one year of project completion. I stay in touch with some of the participants to this day.
- For-profit, including B-Corp or similar models
Our project addresses a complex problem that larger technology firms have not tackled adequately. Larger firms are working to improve speech recognition, but progress has been slow and flawed because of the many accents that systems must understand. We have therefore innovated by reverse-engineering the solution: rather than teaching software to understand every accent, we adjust the speaker's pronunciation at the source, an approach that, to our knowledge, the larger firms are not pursuing. In addition, our core technology contains proprietary elements. The combined application of all the core components, including deep learning, a vocoder and a splicer/mixer, is unique. The lip-reading element for teleconferencing adds an extra layer of innovation.
A
- Activity A: Enable call-center operators in India to improve the clarity of their pronunciation with technology.
- Outputs A: Operators gain the ability to communicate with customers more efficiently.
- Short Term Outcomes A: Improved productivity, with fewer errors and repetitions. Bias against non-native-speaking operators is eliminated. Customer satisfaction increases. Work stress for operators decreases because customer interactions are simpler.
- Long Term Outcomes A: Call centers can charge higher rates for their services and gain a bigger share of the market. Operators are paid higher salaries, and their wealth and well-being increase.
B
- Activity B: Enable offshore and remote teams and team members to improve the clarity of their pronunciation with technology.
- Outputs B: Non-native English-speaking team members can communicate with their teams more effectively.
- Short Term Outcomes B: Improved productivity for individuals and teams. Higher earning potential due to bias elimination and more just compensation. Higher visibility within the organization due to the improved self-confidence of non-native speakers.
- Long Term Outcomes B: Improved intellectual potential and innovation in the workplace due to increased workforce diversity. Improved international collaboration. More equitable wealth distribution.
C
- Activity C: Enable immigrants residing in English-speaking countries to improve the clarity of their pronunciation with technology.
- Outputs C: Immigrants can communicate more effectively with employers, governments, businesses, educational institutions and other members of society.
- Short Term Outcomes C: Improved productivity for all parties involved in the communication process. Higher earning and upward-mobility potential due to bias elimination and more just compensation. Improved self-confidence of immigrants.
- Long Term Outcomes C: Improved intellectual potential and innovation across society due to increased diversity and immigrants' participation in all aspects of social and business life.
- Low-Income
- Middle-Income
- Refugees & Internally Displaced Persons
- Minorities & Previously Excluded Populations
- 8. Decent Work and Economic Growth
- 9. Industry, Innovation, and Infrastructure
- India
- United States
- India
- Philippines
- United States
Currently we are developing the core technology. We hope to serve more than 10 million people by 2025.
Develop a minimum viable product (an API for call centers in India) by Q1 2021, and launch the call-center technology in India and the Philippines in 2021.
Release an API for teleconferencing tools and an SDK in 2022.
Release an app for individual users in 2023.
During the next year, my team and I plan to focus on completing the core technology needed for our minimum viable product: an accent-editing API for call centers in India. Once the technology is completed, we plan to launch pilots with local call centers. These goals present the following challenges:
- Raising sufficient funding to hire the core development team
- Completing our greenfield core technology for semi-scripted call-center interactions, where the biggest potential barrier is building novel real-time speech conversion
- Finding call centers in India to partner with to implement the technology
In the next five years, my team and I will continue working on the core technology to enable an API for teleconferencing as well as an app and an add-on mobile service for users worldwide. This set of goals presents the following barriers:
- Finding and retaining the best talent available
- Achieving high-quality real-time conversion of unscripted, free-flowing speech, where the biggest obstacles are real-time performance and preserving the speaker's voice characteristics
- Determining how to integrate our technology with mainstream teleconferencing software, given that teleconferencing companies might be reluctant to integrate our API
- Entering B2B and B2C markets: creating marketing strategies and hiring a team able to implement them
I plan to focus on building a strong technical team whose members identify with the project's goals and share my passion for social justice; to date, I have successfully recruited the highly skilled technical team described above to fill these roles. I am continually identifying and applying for grant funding, including NIH SBIR, as well as seeking institutional funding. I am becoming more skilled in this process and have grown my network significantly since the project's inception, which makes significant funding more likely as our core technology and minimum viable product mature. I have also begun to find experienced mentors who can guide and support me in project oversight and connect me with the right business and technology leaders. I am hoping to win prizes such as the Elevate Prize, which would increase the project's visibility and, in this specific case, also bring the support of the MIT network.
Women in AI: a do-tank advocating for gender equality in tech. The organization will enable us to reach female talent by allowing us to post vacancies and connect with the organization's members.
Our business model is SaaS (software as a service). We want to offer our solution as an API compatible with teleconferencing tools and call-center software, an SDK for voice-activated technologies, a mobile app for individual and business users, and an add-on mobile service available through mobile providers.
Our goal is to price the solution in a way that lets us reach large groups of users quickly, giving them access to technology that can improve their lives and livelihoods.
In the initial stages of operation, we hope to fund the project through a mix of grants and venture capital. We plan to start generating revenue in the second half of 2021 and to achieve profitability in 2023.
We are at the start of our fundraising journey and have not yet raised funds.
We are looking to raise USD 1 million to complete and launch our minimum viable product: an API for call centers in India.
About USD 150,000
The Elevate Prize would give us access to a skilled mentorship group focused on using technology to address social justice issues, with an emphasis on improving opportunities for people in less-resourced settings.
Winning the Elevate Prize would also significantly advance our work by providing funding to recruit the best talent to develop the core technology required for project implementation, and by increasing the visibility of our project and goals.
- Funding and revenue model
- Mentorship and/or coaching
- Board members or advisors
- Marketing, media, and exposure
CEO