arnoldkwong7

Dynamic India – Spoken Word 3

Technical Implementations for Multiple Languages


This is the third in EkaLore’s multi-part series, the Spoken Word. It covers the opportunities for India to leverage its multi-lingual, multi-cultural society.


First, a little introduction to set the terms we’ll use to discuss the problem and then the opportunity.


At the most common level, computer applications see an input (request), a function to be performed (process), and a result (output). Decades of computer programming have steadily increased the complexity of each step in this sequence: from highly restricted inputs (button pushes) to voice commands; from local logic (deciding to turn a relay on or off) to queries against a cloud of processing; from turning an indicator light on or off to a complex visualization graphic with a verbal explanation. Beyond the difficulty of defining what is to be done, performing these sequences of events has become ever more complicated when the inputs and outputs must be handled in multiple languages.
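The input-process-output model just described can be sketched in a few lines. Everything here is a toy illustration, not a real transit or device API:

```python
# Minimal sketch of the input -> process -> output model.
# The function and the lookup table are illustrative assumptions.

def handle_request(user_input: str) -> str:
    """Input: a request string. Process: a decision. Output: a result."""
    # The 'process' step here is a trivial lookup, standing in for
    # anything from local relay logic to cloud-scale processing.
    responses = {
        "light on": "indicator: ON",
        "light off": "indicator: OFF",
    }
    return responses.get(user_input, "request not understood")

print(handle_request("light on"))   # -> indicator: ON
```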


Software developers try to subdivide each event in the sequence into ever smaller, well-defined tasks that can be accumulated and combined into a larger function ('stepwise refinement,' 'object decomposition').
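Stepwise refinement can be sketched with a ticket purchase broken into smaller, separately testable pieces. The function names, station codes, and flat fare are all invented for illustration:

```python
# Hypothetical sketch of stepwise refinement: one large task
# ('buy a ticket') decomposed into smaller, separately testable steps.

def resolve_station(name: str) -> str:
    # smallest task: map a spoken name to an internal code (toy data)
    return {"delhi": "DEL", "mumbai": "BOM"}[name.lower()]

def price_journey(origin: str, destination: str) -> int:
    # smallest task: look up a fare (flat toy fare in rupees)
    return 500 if origin != destination else 0

def buy_ticket(origin_name: str, destination_name: str) -> dict:
    # the larger function, accumulated from the smaller ones
    origin = resolve_station(origin_name)
    destination = resolve_station(destination_name)
    return {"from": origin, "to": destination,
            "fare": price_journey(origin, destination)}

print(buy_ticket("Delhi", "Mumbai"))
# -> {'from': 'DEL', 'to': 'BOM', 'fare': 500}
```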


Even a simple human task – buying a transit ticket – starts with understanding: identifying and displaying the desired origin and destination on the computer. Cryptic codes (like airport location identifiers) might have nothing to do with the common-language name for a city (hundreds of these in India alone and thousands globally). The steps to refine the date, time of day, class of service, and other requirements (like wheelchair seating) are sometimes simple (commuter services are often 'today,' 'right now,' and 'cheapest available'), while longer journeys with connections can be incredibly complicated ("Can I get there from here?").


Identifying dates, days, times of day, and intervals can be difficult for the software developer to express ('When is a night flight?', 'Prefer an early morning flight').
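One way to pin down fuzzy time-of-day words is to map each to an hour interval. The cut-off hours below are assumptions; real users (and regions) will disagree about exactly when 'night' begins:

```python
# Sketch: fuzzy time-of-day vocabulary expressed as hour intervals.
# The boundaries chosen here are assumptions, not a standard.

from datetime import time

INTERVALS = {
    "early morning": (time(4, 0), time(8, 0)),
    "morning": (time(8, 0), time(12, 0)),
    "night": (time(21, 0), time(23, 59)),
}

def matches(preference: str, departure: time) -> bool:
    # does a departure time satisfy the spoken preference?
    start, end = INTERVALS[preference]
    return start <= departure <= end

assert matches("night", time(22, 30))          # 10:30 PM is a night flight
assert not matches("early morning", time(9, 0))  # 9 AM is too late
```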


Creating the payment required, requesting payment, tendering payment, and providing confirmation information has its own complexity (Google Wallet? AliPay? Paytm?), even before considering currencies and identity confirmation.
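One common way to contain that complexity is to put each provider behind one shared interface so the create-request-tender-confirm flow stays provider-neutral. The provider names are real services, but the API shown here is entirely invented for illustration:

```python
# Sketch: abstracting payment providers behind one interface.
# The charge() method and its return shape are invented; a real
# integration would call each provider's own SDK.

class PaymentProvider:
    def __init__(self, name: str):
        self.name = name

    def charge(self, amount: int, currency: str) -> dict:
        # stand-in for request -> tender -> confirm against the provider
        return {"provider": self.name, "amount": amount,
                "currency": currency, "status": "confirmed"}

receipt = PaymentProvider("Paytm").charge(450, "INR")
print(receipt["status"])  # confirmed
```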


Enter the use of ‘artificial intelligence’ for language translation and processing.


Voice interfaces for computer access now provide 'natural language inputs,' 'spoken voice outputs,' and 'natural language understanding.' Simple in concept – the user talks, the computer works, and an answer is provided in spoken words. This becomes much more complicated when the software must provide equal functionality and capability in multiple scheduled languages and scripts.
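The 'user talks, computer works, computer answers' loop can be sketched with the answer template chosen per language. The keyword matching below is a toy stand-in for real natural language understanding, and the phrases are illustrative:

```python
# Sketch of the talk -> work -> answer loop with per-language output.
# The keyword matching stands in for real NLU; all data is toy data.

ANSWERS = {
    "en": "Your ticket to {city} is booked.",
    "hi": "{city} के लिए आपका टिकट बुक हो गया है।",
}

def understand(utterance: str) -> str:
    # stand-in for natural language understanding: find a known city
    for city in ("Delhi", "Mumbai"):
        if city.lower() in utterance.lower():
            return city
    return "unknown"

def respond(utterance: str, lang: str) -> str:
    # same function, equal capability, different output language
    return ANSWERS[lang].format(city=understand(utterance))

print(respond("book me a ticket to delhi", "en"))
# -> Your ticket to Delhi is booked.
```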


A factor in all of this is the software developer. Many developers work in only a single spoken and written language. Even among computer programming languages, a developer may have only limited fluency and capability, while a very experienced programmer may be fluent in dozens of programming languages (and dialects). Psychological studies have shown that even experienced programmers are limited, conceptually and in performance, by the extent of their knowledge across many languages and functions. Simply put, programmers tend to express their work in the language their brains are most comfortable with, and in the computer programming language they learned first. This makes thinking about how to interface with a consumer in many languages a more difficult work effort.


Cultural differences also play a role in the ability of software developers to get things done. Assumptions about how people will accept the data presented and go on to the next functions are embedded in the programming design and application functions. Once presented with costs, many people in 'shopping' mode want to go back to change the purchase, refine their choices, or save the result for later. Many computer applications make assumptions that limit shopping interactions (and have been shown to lose many buyers because of it). A simple cultural difference is how 'polite' a computer application has to be. Many 'terse' computer user interfaces seem offensive and lacking in social conventions to some users. ("There wasn't even a thank you when done.")


The difficulty of deciding what to program when working in multiple languages can be greater than the actual coding work. Testing to ensure that what was required is done is also difficult and expensive; the more languages to be supported, the more expensive the testing. Dialects and regional preferences also affect language use and brevity.


The software must be defined down to the smallest details of the functions needed. For voice (natural language inputs), the functions desired must be translated into the computer internals performing the smallest operations. "Take me to the closest airport today" might mean "Fly to the airport closest to my destination" or "Fly to the closest airport and then travel onward to my destination in the shortest time." The ambiguity accepted as part of normal conversation is often resolved by assumptions made by the software developer. Responding to a flight request with another question, "Do you want to fly to Heathrow or Gatwick?", requires a decision to have been made, and code written, in the context of air travel, when buying a ticket, and with an understanding of where you are starting from (only one airport might be possible). For these reasons, requests like "I'd like to fly into the city in time to check in to the hotel, freshen up, and then head off to dinner with my contact" are still beyond what voice commands can handle – yet obvious to a human executive assistant.
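The clarifying-question decision can be sketched as a branch on how many candidate airports a city has. The city-to-airport table is toy data, and the choice to ask rather than decide silently is exactly the kind of developer decision the paragraph describes:

```python
# Sketch: resolving an ambiguous destination. When a city has several
# airports, the developer must pre-decide: ask, or choose silently.
# Toy data; real systems consult route and schedule databases.

CITY_AIRPORTS = {
    "london": ["Heathrow", "Gatwick"],
    "bengaluru": ["Kempegowda"],
}

def next_step(city: str) -> str:
    airports = CITY_AIRPORTS.get(city.lower(), [])
    if len(airports) == 1:
        return f"Booking into {airports[0]}."       # only one choice possible
    if len(airports) > 1:
        options = " or ".join(airports)
        return f"Do you want to fly to {options}?"  # developer chose to ask
    return "Which city do you mean?"

print(next_step("London"))  # -> Do you want to fly to Heathrow or Gatwick?
```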


For the voice output experience, the software developer must consider all of the likely types of outputs and then be able to condition the program to be ready for the next context of voice inputs. Simply, the computer application needs to keep its place (context) wherever the user was active. When a computer user hops off to do something else and then comes back to where they were working, the software has to remember where it was and the related data, just as a person would be expected to keep up with what was in process when an interruption occurred.


These activities require the processing actions to be carefully laid out to support the widest variety of functions that could be requested. Voice and highly flexible user interfaces place a premium on careful functional analysis (functional decomposition and structuring). Instead of a highly specific order and precedence, the software must be ready to take on the actions requested from the voice-driven entry in whatever order the computer user drives (for example, telling the software how payment will be tendered even though the amount and transaction haven't yet been completed). The extent and flexibility of these events are preset by the software developer. Not all functions can be completed at any time (for example, reserving a ride share for a destination airport not yet selected), and the software needs to remember the preferences ('retain the context') across the session. The number of possible paths to complete a complicated transaction, along with the possible interruptions and other tasks, can grow rapidly, making software development more time-consuming and costly. This is one of the reasons why fully flexible voice-activated user interfaces can be expensive and require a longer schedule.
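Order-flexible entry with dependencies can be sketched as slot filling: the user may supply a payment method before the fare exists, but some actions are blocked until their prerequisite slots are filled. The class, slot names, and messages are all invented for illustration:

```python
# Sketch: order-flexible slot filling with a dependency check.
# The user fills slots in any order; reserving a ride share is
# blocked until a destination airport has been selected.

class Booking:
    def __init__(self):
        self.slots = {}  # filled in whatever order the user drives

    def fill(self, slot: str, value):
        self.slots[slot] = value  # retain the context

    def reserve_ride_share(self) -> str:
        # dependency: no ride share without a destination airport
        if "destination_airport" not in self.slots:
            return "cannot reserve yet: no destination airport selected"
        return f"ride share reserved at {self.slots['destination_airport']}"

b = Booking()
b.fill("payment_method", "Paytm")  # tendered before the amount exists
print(b.reserve_ride_share())      # blocked: dependency unmet
b.fill("destination_airport", "DEL")
print(b.reserve_ride_share())      # now allowed
```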


The Spoken Word series’ next post will discuss flexibility.


When it’s posted, you can find it at www.ekalore.com/india-business



