Operationalizing AI at scale: transcript (TaskUs Forward webinar)

A lightly edited transcript of the September 2, 2024 TaskUs Forward webinar, Operationalizing AI at Scale: Driving Results in CX with GenAI. Hosted by Alp Uguray (Masters of Automation), with Manish Pandya (SVP, Digital at TaskUs) and Todd Schiller (co-founder & CEO, PixieBrix). Short on time? See the highlights post.

The transcript was produced from the YouTube captions and lightly edited to fix product and company names, verbal tics, and mistranscriptions.

Alp Uguray: Hi everyone, thanks for joining us today for our webinar, Operationalizing AI at Scale: Driving Results in CX with Generative AI. I'm Alp, your host and moderator. I'm the founder of Masters of Automation, which operates at the intersection of AI leadership and enterprise transformation with AI and automation. I'm also a five-time UiPath MVP and a Global Shaper at the World Economic Forum's Boston Hub.

Today I'm joined by Manish and Todd. Welcome. Todd holds a PhD in computer science from the University of Washington and is the co-founder and CEO of PixieBrix, the first platform to extend any web application with AI, automation, and collaboration. Manish heads Digital at TaskUs, where he and his team work with clients and internal operations to bring innovations that help improve teammate experience through the use of generative AI and automation products and solutions. Manish, welcome.

Manish Pandya: Thank you. Thank you for having me.

Alp: Generative AI is revolutionizing customer experience, moving beyond basic chatbots to power smart assistance tools. These tools automate part of the process and free human agents to handle more complex interactions and deliver more personalized responses. However, success depends on proper implementation. In this webinar, with a panel of experts like Todd and Manish, we'll discuss the trends, best practices, and real-world use cases for AI in CX — including the areas where generative AI can deliver the most value, the impact of large-scale AI implementation and metrics to track progress, and key considerations for choosing an AI solution and when.

Before we jump into all that, let's do a quick round of introductions. Manish, can you share more about TaskUs?

Manish: Absolutely. TaskUs is a tech-enabled business process outsourcer which uniquely combines the exceptional human talent we have access to with generative AI technologies, and we are powering it through our TaskGPT platform so we can deliver innovative solutions and customer experience to the world's biggest and most innovative brands out there. Thank you.

Alp: Thank you, Manish. Todd?

Todd Schiller: Thanks for having me. PixieBrix is the first low-code platform to extend web applications with AI, automation, and collaboration. Unlike traditional low-code application platforms that some folks might be familiar with, PixieBrix deploys as a browser extension, so you can actually extend the applications you already use versus creating yet another screen. Since the launch of ChatGPT we've seen a real surge of interest from global BPOs like TaskUs and from brands to solve the last-mile delivery problem for AI in both customer support and contact center.

Alp: Thank you. To those watching us, please note that this webinar is being recorded. At the end we'll have a Q&A, so at any point feel free to drop your questions and we'll do our best to answer them.

Let's get started. Todd, it seems like there is a new AI headline every day. What's caused this rapid advancement to occur?

Todd: That's a great way to start the webinar. By now probably everyone has heard of LLMs, or large language models, and in particular GPTs, which stand for generative pre-trained transformers. We saw this catalyst with the launch of ChatGPT back in November 2022 — it kicked off a Cambrian explosion of public interest, models, and investment.

GPTs really have two main differences with the previous wave of AI that people may be familiar with. First, it's generative — so it can actually read and generate text fluently. The second, and this is the real big one, is that it's pre-trained, so it eliminates the upfront cost to apply AI in different situations.

We've really seen three factors that have accelerated the pace of advancement today. The first is rapid improvements to the foundation models themselves: context window size, which is how much information these models can consume, has been increasing rapidly, and response time has been improving. The second is the infrastructure we've developed over the past decade, things like cloud, automation, and integrations. And then we've paired that with rapid innovation in applied engineering, developing patterns like retrieval-augmented generation, graph databases, and vector databases to do all these different kinds of things. You may have heard terms like RAG and other terms.

I think in a lot of ways AI looks like the early days of the internet. It's subsidized by an ample wave of investment from the venture capitalists. You have providers like Google, Amazon, and Facebook fighting to create this excess amount of infrastructure and capacity due to competitive pressure. We also see a very low barrier to entry to build prototypes. So ultimately I think AI is the future, but definitely not all the ideas and the companies and the headlines that you see are going to stick. It's a very exciting time to be in right now.

Alp: Manish, as an enabling technology, AI is set to transform a lot of functions. How quickly are CX leaders adopting AI?

Manish: As Todd mentioned, there is pretty much a new headline every day, and there is advancement. What we've seen is that CX leaders are increasingly recognizing this transformative potential of generative AI. It's easy to experiment, so we also see rapid experimentation across various functions. Leaders do see that by integrating AI they'll get better customer experience, customer satisfaction, loyalty, and of course, ultimately at the end of the day, business growth.

We also see there is a distinct need for human-assisted AI, or human-in-the-loop, so that we can get optimal results. As Todd mentioned, we are in the early days, so we have some challenges with the large language models and so on. As the adoption rate is increasing, there is also a need to combine the power of AI with human oversight so that we can get optimal outcomes. At TaskUs we believe in the power of human-driven AI, or human-in-the-loop, where the technology is amplifying the capabilities of the human as they deliver — not necessarily replacing them. This has potential to deliver empathetic, personalized, and nuanced responses that only a human can deliver, and not necessarily something that an AI can replicate, although AI can assist.

We also see a varying level of maturity in terms of which industries are adopting. Especially tech-savvy sectors are rapidly piloting and experimenting with AI technologies — particularly in conversational AI, content generation (which is a mainstay of generative technologies), content writing and rewriting, sentiment analysis, and next-best actions.

Having said that, there are also challenges, particularly related to AI adoption — the large language model hallucination problem. There are ways to mitigate it, which Todd mentioned (RAG and graph and so on), but there's also a challenge of the data that feeds into the RAG model itself, where you have data entropy challenges. We also have data privacy issues and the need for skilled people to manage this.

In summary, CX leaders are experimenting with generative AI technologies at a rapid pace, but with a clear understanding that to make the outcome sticky they need to have humans in the loop and they need to have humans to manage this as well.

Alp: There's a lot at stake when dealing with customers. Todd, what are some things that might go wrong? What are the stakes?

Todd: As Manish mentioned, the current wave of AI has a lot of opportunities, but it also has a lot of challenges. A key root cause is that while large language models are fluent, they don't really understand what they're generating. A metaphor people like to use is that of parrots — parrots can make sounds and mimic the sentence structure of humans, but they have very limited understanding of what they're actually hearing or saying. That causes problems like hallucination, which is the term folks are using for LLMs making up information. The LLM generates what sounds plausible, but what sounds plausible might not actually be true or beneficial. These limitations cause unpredictable bugs that can't be eliminated via the traditional testing and QA cycles that a lot of teams are familiar with.

A fun example that made headlines last month was that McDonald's was testing voice AI for its drive-throughs. For one customer it added hundreds of chicken nuggets to an order, and they couldn't be removed. In my day I've ordered a lot of chicken nuggets, but never quite that many in one sitting.

Unfortunately, the unpredictable nature of these LLMs also makes them difficult and expensive to secure. They can be influenced by instructions provided by both users and bad actors. Some folks are familiar with the story early on of a jokester negotiating with a chatbot to buy a car for a dollar.

These make funny headlines, but they have real-world consequences for brands that adopt AI. The first is that countries like Canada have held companies liable for damages if their customer-facing AI gives misinformation — you can't just hide behind the "hey, it was AI" excuse. And then at the end of the day, you're just putting your brand at risk if your brand ends up spotlighted on social media for a bad customer interaction. That's especially true if you're being construed as trying to use AI to prevent customers from reaching customer service, versus using AI to better serve them.

Alp: Manish, as a BPO you see a broad variety of customers. How are you seeing the needs and situations vary across them?

Manish: At TaskUs we have the unique privilege of working with a broad spectrum of clients, and they cover multiple service lines — from digital customer experience, which also includes voice deflection, to trust and safety, and others as well. For successful application of generative AI, some themes are emerging from the use cases.

First is the ability for our teammates or agents to securely and consistently get answers from the knowledge base — access to client-specific information knowledge bases that the teammate needs to be able to search in natural language and get a generated response.

Second is a need for teammates and agents to write contextually. What that means is not just write proper English and grammar, but also to use the client's brand and guidelines to write and rewrite content, intelligently suggest responses — especially in real-time channels like in-app messages and chat — and some of the next-best actions that can follow after that.

Lastly, the theme that is appearing is that large language models are great at analyzing tons of data — so sentiment analysis of conversation, summarization of actions, using them for quality assurance, and also real-time transcription of voice and other media.

These are the four themes that we have seen: secure search, writing content, intelligently suggested responses, and analyzing a whole bunch of data to summarize and do sentiment analysis.

Alp: Given the variety of needs, how have you seen CX teams using AI? Are there common threads?

Manish: Having worked with this broad spectrum of clients, what we've seen is that clients are expecting privacy and security of customer data at the core, and when you're working with generative AI and large language models, this becomes even more enhanced. There has to be rigorous adherence to privacy and regulation, making sure that generative AI — which is essentially non-deterministic — is able to provide a deterministic answer.

Second is reliability and accuracy of the responses and the suggested actions. This means we need to have consistent responses which are meaningful and demonstrate intelligence — not necessarily parrot something else.

We also see that clients are expecting us to provide responses that are free from biases. A large language model is inherently trained with tons of data and content, so if there are biases within the large language model, they will surface when you're generating responses. So through proper governance, deploying guardrails — and the guardrails could be built into the large language model, or we can enhance with guardrails that we put in using NeMo Guardrails and such — plus data governance: what goes in for augmentation of the responses, that data has to be governed; versioning of the data that goes into the knowledge base, and so on.

Lastly, being able to observe how the response is generated: is it at the stage where the prompt is generated, or is it when the response is generated? And of course the time it takes.

What we're seeing is that clients are expecting the generative AI solutions to complement our teams so they can be freed up to perform high-value tasks. Essentially what they're looking for is a true co-pilot for our teams, not just chatbots and search. These are the common themes that we see emerging when CX teams are looking for generative AI solutions.

Alp: Very powerful. Todd, how is technology being leveraged more effectively?

Todd: I think we've seen three phases, and it ends up echoing a lot of what Manish just mentioned. Initially we saw the early innovators figuring out how to roll out general chat interfaces at scale in a way that actually met their company's and industry's compliance requirements. So this came with a combination of things like single sign-on, running models in their own cloud, data retention policies, redaction policies — you name it.

In the next phase, we saw companies starting to actually connect those GPTs to their corporate systems for things like information retrieval — really around that requirement to improve reliability and quality via grounding it to the enterprise's actual data, actual brand guidelines, and company-specific information.

At that point, what I think companies found out is that chatbots are the fastest way to deliver AI to everyone at your company — and if you remember, ChatGPT was actually the fastest-growing consumer product of all time — but there's actually little consistency with outcomes when you just roll out a chatbot. We're starting to see that with some of the headlines coming out. For example, I think it was two weeks ago, one large pharma company just canceled their Microsoft Copilot subscription deal due to lack of return on investment.

There are a couple of root causes for why chat by itself has not succeeded in operationalizing AI. The first is inconsistent quality based on how it's being prompted — different people are getting different results on the team. The second, which is more inherent to the chat interface, is that it's a very general way to interact. Everyone knows chat because people are used to chatting on their phones with friends, but it's very manual and it's relatively low throughput compared to other interfaces in terms of how quickly you can provide it context and get information back. Both of these end up being deal breakers when you're talking about real-time chat or real-time voice in a fast-paced customer support environment. That's especially true if you're trying to onboard new team members quickly and empower them to have that speed to competency with their teammates.

So in its current phase, what we're really seeing is companies figuring out how to embed AI into the natural flow of work, and as Manish said, it's really about making AI a true co-pilot for teammates. From PixieBrix's perspective — and I think we're showing the slide here — that really means giving AI eyes and ears to see what work the teammate is doing, giving it the ability to interact with the teammate in the different systems, and affordances to assist the teammate with information or actions. Then, to truly operationalize this, you really just have to have the flexibility to use the most effective models and most effective interfaces, versus just giving everyone a chat interface.

Alp: Manish, as a BPO you work with some very large deployments around the world. Are there any special considerations to operationalize at that scale?

Manish: Of course you need scalability — to operate across multiple geos and so on — but also flexibility of working with different large language models. There are a variety or plethora of large language models that are available today: closed-source, open models — open-source models like the Meta Llama models and such. So being able to work with them at scale, but also provide the flexibility of choosing the best model for the job. Then there are different hosting models out there as well.

Second is data security and compliance. Everybody has data security and compliance, but that gets nuanced and enhanced when you work with generative AI technologies. Of course you need to have strong information security teams and data protection teams out there who provide oversight — GDPR, HIPAA, and others — but you also need to adapt and have a deeper understanding of how these large language models are actually handling the data. That means you need to be able to separate and segregate the large language model provider from the host, which means you need to have third-party hosting, air gaps, and to be able to self-host in certain situations.

We also see, when we work at scale, that there is a variety of content that needs to be consumed, and this is not just text — PDFs, websites you need to scrape to pull in, tables, images, and so on. That means you need to have a hybrid content creation strategy, which would mean leveraging your learning experience teams toward building training materials, which we do very effectively, along with a robust data pipeline.

Fourth is operational efficiency. At the scale at which we operate, even smaller inefficiencies in generation of responses can add to your average handle time. There is a throughput issue we see with large language models — we need to make sure that whatever user experience we provide to our teammates helps in taking out all this operational inefficiency, otherwise you would have detrimental effects.

Last but not least, we work with a wide variety of clients, which means they have a wide variety of client systems. Being able to distribute the AI solutions that we build — which Todd mentioned, the last-mile problem — either through Chrome extensions, web UI which we all are used to, web widgets that can be embedded on some native applications, or even marketplace applications for the most common CRMs. So we need to invest in flexible, interoperable solutions that can adapt to different client environments to be able to service it.

Alp: From the perspective of a customer, what are some considerations to choose a tech-enabled BPO?

Manish: TaskUs is a tech-enabled BPO, so what I would provide as some parameters: one, partner with a BPO that has the ability to invest in generative AI solutions, not just deploy what products are available, but partner to create customized solutions. Second, have flexibility in deployment options. Third, be able to leverage their information security teams and data privacy teams to be able to comply with GDPR — not just from a data perspective but also from a generative AI perspective.

Provide insights on usage and adoption, which is a given, but also use the gaps that you see in the data itself to have a meaningful conversation through quality teams, through training teams, through the learning experience teams with the clients, so that we can improve our overall experience.

Last but not least, there has to be a culture of continuous innovation, and the ability to take those insights and keep up with the integrations that are there, or provide this information so that you can use all of it at this very nascent stage of generative AI to provide meaningful information on a continuous basis.

Alp: What kind of impact can CX teams expect to see when bringing on the right AI-enabled BPO like TaskUs?

Manish: I would look at the similar KPIs but also augment that — similar KPIs being reduction of average handle time, accuracy, customer satisfaction, employee engagement. But the more important thing is to augment humans with technology so that we can free up those teammates to do higher-value work, which is what would be expected as the generative AI technologies get deployed.

Alp: Todd, PixieBrix takes a very unique approach in the marketplace. What kind of outcomes are you guys seeing?

Todd: First off, I'd echo Manish — a benefit of working with customer experience and contact center is that, unlike a lot of functions out there, CX teams have good measures of effectiveness such as average handle time, quality, customer satisfaction, and employee experience. We certainly see these wins in our metrics when we use PixieBrix to deliver AI.

Some of my favorite outcomes, though, are where PixieBrix is used to enable business outcomes that weren't possible before. A couple of examples. First, one of our early customers is an AI-enabled company that equips school buses with cameras and sensors to catch unsafe and illegal driving. Due to legal enforcement regulations there are really strict quality requirements and rules around review, so you have to have humans in the loop, as well as police in the loop. What PixieBrix does is help make the operational economics work so they're able to scale their operations nationally.

In more of a contact center context, one I really love is that we have customers using PixieBrix to deliver real-time chat language translation. Given the scarcity of native language speakers in places like Europe, what this translation does is enable more customers in those geographies to actually speak to humans — and therefore receive better service. The other thing that surprises some folks is that experienced CX teammates using translation are more effective and deliver better outcomes than inexperienced people who might just be native speakers.

Alp: Manish, what does the future hold for TaskUs and PixieBrix? Is there anything our guests could be on the lookout for?

Manish: TaskUs has partnered with PixieBrix since 2021 — both as an early adopter of the technology, but also as a design partner. We have exchanged a lot of ideas about what can be built and how we can roll it out. We have rolled out the PixieBrix business platform to thousands of teammates for assisted automation, and now we are excited to continue our collaboration with PixieBrix, with plans to tighten the integration between the TaskGPT platform and PixieBrix's generative AI solutions — to deliver more innovative solutions, to have the next chapter of AI and automation, so that we can enhance our teammates' work to make it more meaningful. Thank you.

Alp: Thank you. We talked about very interesting topics today — how generative AI is moving beyond basic chatbots to create tools that can handle very complicated inquiries and personalize those interactions between AI and humans. We talked about the areas of CX where generative AI can deliver the most significant value today and provided a broad map of key questions to ask potential vendors. We also talked about metrics — what success looks like for a large-scale AI implementation, which metrics to track, what are the best practices for ongoing optimization, and what is really important when choosing a partner.

With that said, I'd like to thank all of you for your time and open up questions from the audience. We received a few of them. The first one is for you, Todd. How can we implement generative AI across different markets and/or languages?

Todd: That's a good question, and it's great to hear that we have a global audience here. First off, as Manish mentioned earlier, you have to be aware of the regulations that apply to your region — so in the EU you have both the EU AI Act as well as things like GDPR.

As far as language itself, I definitely think it's exciting that we're getting close to that universal translator from Star Trek. What's sort of interesting is that since LLMs were trained on content in multiple languages, in a lot of cases they actually do pretty well out of the box, especially for tasks like semantic search. There are some clear exceptions to that rule — Japanese is a great exception where LLMs don't do great on translation tasks.

In a lot of situations, you have to be flexible and choose the right model for the job. Often for translation you don't necessarily want to use generative AI — you want to rely on a more specialized translation model. That's for a couple of reasons. The first is that different models have better or worse performance across different language pairs — for example, going from English to the European languages versus Chinese or Japanese. The other thing is, depending on what channel you're talking about, LLMs in this current wave of technology are most likely more expensive or slower than specialized AI for real time, when you're talking about voice or chat. So I would echo Manish — you really have to be flexible and agile in which models you're adopting.
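Editor's note: the "right model for the job" idea Todd describes can be pictured as a small routing rule. The sketch below is illustrative only and was not part of the webinar. The model labels (`specialized-mt`, `general-llm`), the channel names, and the set of weaker language pairs are hypothetical placeholders, not anything PixieBrix or TaskUs described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # hypothetical model label, not a real product name
    reason: str  # why this model was picked


# Assumption: these channels need low latency and low per-message cost.
REALTIME_CHANNELS = {"chat", "voice"}

# Assumption: language pairs where the general LLM is taken to be weaker.
WEAK_LLM_PAIRS = {("en", "ja"), ("ja", "en")}


def choose_translation_model(source: str, target: str, channel: str) -> Route:
    """Pick a translation model from the language pair and the channel."""
    if channel in REALTIME_CHANNELS:
        return Route("specialized-mt", "latency and cost matter on real-time channels")
    if (source, target) in WEAK_LLM_PAIRS:
        return Route("specialized-mt", "language pair where the general LLM is assumed weaker")
    return Route("general-llm", "asynchronous channel and a well-supported language pair")
```

In practice the table of weak pairs and the latency cut-offs would come from measuring candidate models on your own traffic, which is the flexibility Todd is arguing for.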

Alp: Makes a lot of sense. Manish, the next question is for you. I think this ties to hallucinations within AI. The question is: can you talk a little bit more about how we can prevent AI from giving the wrong response to customers?

Manish: We heard earlier from Todd about all the different capabilities of large language models — context window, how they are trained, number of parameters, and so on. What that means is that they each have their limitations as well. So how do we overcome this so that we can avoid hallucination, which is where it is generating a response which sounds plausible but is not accurate?

There are techniques. First is basically retrieval-augmented generation. What you do is ground the large language model with the context you are providing, that is fetched based on the question. So you need to have a robust RAG pipeline. You need to have human oversight in the content curation process. There are capabilities out there to take any content and then curate it however you would, but as you run them at scale you will see that you need to have human oversight on the content curation as well.

When you look at the RAG pipeline, there are so many techniques out there — chunking (how big the chunks you select), how do you rerank those chunks, and so on. Those are the factors that go in.
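Editor's note: the pipeline steps Manish lists (chunking, retrieval, reranking, then grounding the model in the fetched context) can be sketched in a few lines. This is a minimal illustration and was not shown in the webinar. It uses word-count similarity as a stand-in for the embedding model and vector database a production pipeline would normally use, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows; size and overlap are tunable."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """First pass: keep the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def rerank(question: str, candidates: list[str]) -> list[str]:
    """Second pass: prefer chunks that cover more distinct question terms."""
    q_terms = set(tokenize(question))
    return sorted(candidates, key=lambda c: len(q_terms & set(tokenize(c))), reverse=True)


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model by telling it to answer only from the retrieved context."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )
```

A call such as `build_prompt(q, rerank(q, retrieve(q, chunk_words(kb))))` produces the grounded prompt that would be sent to the model. Chunk size, overlap, and the reranking signal are the knobs Manish refers to, and they are usually tuned against real questions.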

You need to implement guardrails — either choose a large language model that has built-in guardrails, or there's a framework for guardrails and such — so that you can ground the question as well as the output that is generated.
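Editor's note: as a rough picture of what checking "the question as well as the output" can look like, here is a minimal sketch of an input rail and an output rail. It was not part of the webinar and it is not the API of NeMo Guardrails or any other framework. The patterns, the fallback message, and the `generate` callable are hypothetical, and the output check (every number in the answer must appear in the retrieved context) is just one simple example of a grounding test.

```python
import re
from typing import Callable

# Assumption: a tiny, illustrative list of prompt-injection patterns.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) .*instructions",
    r"reveal .*system prompt",
]

FALLBACK = "I'm not able to answer that. Let me route you to a teammate."


def check_input(user_message: str) -> bool:
    """Input rail: return True when the message matches no blocked pattern."""
    lowered = user_message.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)


def check_output(answer: str, context: str) -> bool:
    """Output rail: return True when every number in the answer is in the context."""
    number = r"\d+(?:\.\d+)?"
    return set(re.findall(number, answer)) <= set(re.findall(number, context))


def guarded_reply(
    user_message: str,
    context: str,
    generate: Callable[[str, str], str],
) -> str:
    """Run both rails around a model call and fall back to a human handoff."""
    if not check_input(user_message):
        return FALLBACK
    answer = generate(user_message, context)
    return answer if check_output(answer, context) else FALLBACK
```

The fallback hands the conversation to a teammate rather than returning an unverified answer, which matches the human-in-the-loop approach discussed throughout the session.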

Then you need to evaluate the outputs constantly. There are frameworks available like the RAGAS framework, or even some of the other providers that can provide a quality score of the output that is generated.

Just to summarize: it's not easy to prevent that — you can mitigate and you can eliminate to a large extent. But there are techniques that can be used, and when combined together, the output is going to be one where you have prevented some of these wrong responses that are coming out.

Alp: Thank you, Manish. The next question goes to you, Todd. How do we ensure we don't become too dependent on AI and maintain a balance with human intervention?

Todd: I love that question — thanks to whoever asked that. I think we've seen some companies probably be too focused on short-term metrics and headlines, and forgetting to really consider that adopting any technology, much less AI, has second-order effects. We've seen a lot of companies deploying, for example, customer-facing chatbots, and then running around with their deflection rate metrics saying, "hey, we did a great job," but then they see drops in their customer loyalty and lifetime customer value.

So for me the key is to think holistically about your company and value chain — try to apply what a lot of people call systems thinking — and really try to understand what is unique about AI as a technology and therefore where it can best be applied.

There was a recent article, I think in last month's edition of Communications of the ACM (which is kind of the computer science magazine for industry), about what we can learn from co-pilots that have been rolled out before in other industries, like aviation and flight co-pilots. There are some really great considerations they bring up around things like making systems more intelligible or explicable: how does it explain the output that it's giving? And then, knowing that you can't solve all the problems, how do you design for the appropriate levels of trust as part of that?

I really think we have to think about where it fits into the larger system and think about what that relationship should look like, and what's unique about this new wave of AI.

Alp: The space is moving very fast and growing rapidly, and new technologies are coming out, like we talked about at the beginning of the webinar. Customers are reacting to that speed of innovation. Manish, the next question goes to you: what advancements do you expect to see over the next few years, and how can we get ahead?

Manish: Rather than talking about the next few years, let's talk about the next few months and next few quarters. All the models, as we see, are becoming faster, cheaper — some of them are becoming smaller. So all this variety of models is becoming available. What this means is that now you have more access, and hence the pace of experimentation will increase.

We are also seeing new modalities being available. In the contact center world, in the BPO world, real-time transcription — access to real-time transcription of voice — with quality is basically becoming available now. Being able to also understand not just the words but also the emotions behind them — those kinds of capabilities are also becoming available now.

You will also see that not just one-shot question and response with some follow-up queries, but being able to form a chain — which is what is called agentic AI — will emerge, where you provide a task and the generative AI solution is able to string together multiple automation capabilities together.

You also see a lot of low-hanging fruit from an automation perspective getting either automated or provided as a co-pilot to the agents and teammates, so they get freed up to do more meaningful tasks — focus on relationship, empathy, and so on.

One example I can give: conversational AI voice agents are emerging now, and what they're doing is taking a flow of, let's say, lead capture, and providing it in such a way that you're able to handle the interruptions, you're able to handle the objections, and at the end of it, it's still a linear flow in terms of capturing information and providing it to a human, so that the human would then go and perform the action — which would essentially in this case be to convert the lead into a meeting or something after that.

You'll see these kinds of trends emerge — new modalities, new capabilities, being able to have the reasoning so that you can chain up automation together, take out all the low-hanging fruit out there, which is any tedious task that humans are not meant for, and free up those humans so they can perform more complex, relationship-oriented tasks.

Alp: I'm very excited about that future. I think as we are reinventing the future of work, we're introducing more automation to remove the mundane tasks that people hate to do, while bringing explainable AI — why AI does certain things and what the impact is, the transparency in execution. As the future is going to evolve and as more technology innovations come through, and those technologies get adopted by the customers of PixieBrix and TaskUs through large language models, it's going to build a new trust between humans and AI. That trust is going to be really important and has its own requirements, and of course it has to bring business value — measuring the ROI where it needs to be, while we are driving meaningful work of the future.

With that said, thank you once again to Todd and Manish for a fantastic and insightful conversation, and to everyone who joined us in this webinar today. Thank you for spending time with us. We'll be showing a QR code for you to scan to give us your feedback. If you want to know more about generative AI capabilities, you may get in touch directly with Manish or visit TaskUs's website at taskus.com. Same goes for PixieBrix and Todd — connect at pixiebrix.com. Thank you very much.

Todd Schiller
