Understanding the Costs of Enterprise Generative AI
We are still in the early phase of enterprise adoption of generative AI. Even at this early stage, however, enterprises must understand the costs associated with using Large Language Models (LLMs) as they embark on developing generative AI applications. These costs can become significant. The goal of this post is to identify the cost categories and show the decisions that impact them. Enterprises can then use this analysis to assess the best way to move forward with the experimentation, implementation, and deployment of generative AI-based applications.
In a previous post, I discussed the three AI eras that preceded the era of generative AI, analyzed the reasons for the mismatch between reality and hype, and presented three actions enterprises should take as they embark on their generative AI efforts. One of these actions called for understanding the risks associated with the use of generative AI applications and finding ways to mitigate them. The cost of building, using, and maintaining these applications represents a significant risk. As we see from our firm’s corporate clients in the automotive, telco, and financial services industries, the number of enterprise employees who are experimenting with generative AI tools continues to grow. For this reason, it is necessary to understand this risk in greater detail and determine how to mitigate it.
Enterprise employees are using generative AI on a broadening set of tasks. In most cases, they start with the free version of a chatbot built on a foundation model, such as ChatGPT or Bard. The out-of-the-box performance of these models is sufficient for some tasks, e.g., writing emails or summarizing documents. Attracted by the good performance during initial interactions, employees typically next buy individual subscriptions that give them access to better LLMs. Before too long, these individual subscriptions become a major expense for the enterprise. This mirrors the journey enterprise employees followed when tools such as Dropbox and Slack were introduced to the market.
To address the complexity or proprietary nature of many enterprise processes, e.g., customer support, design, data analysis, and software programming, different approaches are typically necessary. Most enterprises start by applying more extensive and sophisticated prompt engineering while continuing to use out-of-the-box LLMs. When this is not sufficient or possible, they try to improve the performance of these cloud-based LLMs using fine-tuning or Retrieval-Augmented Generation (RAG). If even these approaches do not produce satisfactory results, the enterprise may elect to develop its own proprietary LLM. For example, Intuit has developed GenOS, a generative AI operating system that uses proprietary LLMs to help its customers solve tax, accounting, marketing, cash flow, and personal finance problems. Toyota is developing a generative AI application to help automotive designers during the ideation process; for this application, Toyota developed its own generative AI techniques and collaborated with LLM providers to create proprietary, task-specific models. Stellantis has built a version of ChatGPT into its DS vehicles. We need metrics that allow an enterprise to determine when to move from complex prompt engineering to fine-tuning existing LLMs, and ultimately to developing a proprietary LLM. A cost-benefit analysis must be one of the key metrics.
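The cost-benefit analysis mentioned above can be framed as a break-even calculation: each approach trades a different upfront investment against a different marginal cost per query. A minimal sketch, with all dollar figures and approach names purely illustrative (real numbers vary widely by vendor and workload):

```python
from dataclasses import dataclass

@dataclass
class ApproachCost:
    name: str
    upfront_usd: float      # one-time cost: prompt engineering, fine-tuning, or model development
    per_query_usd: float    # marginal cost per request: API fees, hosting, maintenance

def total_cost(a: ApproachCost, queries: int) -> float:
    """Total cost of an approach over a given query volume."""
    return a.upfront_usd + a.per_query_usd * queries

# Hypothetical figures for the three approaches discussed above.
approaches = [
    ApproachCost("prompt engineering on hosted LLM", 50_000, 0.02),
    ApproachCost("fine-tuning / RAG on hosted LLM", 250_000, 0.01),
    ApproachCost("proprietary LLM in own data center", 5_000_000, 0.002),
]

for volume in (1_000_000, 100_000_000):
    cheapest = min(approaches, key=lambda a: total_cost(a, volume))
    print(f"{volume:>11,} queries -> cheapest: {cheapest.name}")
```

Even this toy model illustrates the pattern: low-upfront approaches win at small query volumes, while the heavy upfront investment of a proprietary model only pays off at very large scale.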
Each approach has its pros and cons, and each approach has different costs. For example, the highest cost of complex prompt engineering is finding employees who understand the problems to be solved and can create the appropriate prompts, whereas the highest cost associated with RAG approaches is finding the right data with which to specialize an LLM. In this blog, we try to identify the cost categories rather than provide actual cost figures since these will depend on the type of model that will be necessary, the applications that will utilize it, and the model-building environment. We have identified four types of costs:
- Technology costs
- Regulatory compliance costs
- Response verification costs
- User training costs
Technology Costs
Technology costs include data, tools, data centers, and people. Data, and lots of it, is the key ingredient for developing a new LLM or modifying an existing one. Depending on the business processes the LLM will be used for, it may be necessary for the enterprise to license data in addition to its proprietary data. The data that will be used for the modification or development of the proprietary LLM must then be a) labeled, b) curated to remove problematic records (for example, records that may introduce biases) and to verify that it has the right characteristics, and c) properly managed. Depending on how the enterprise wants to treat its proprietary data, it may be able to use an LLM that is offered as a service, or it may have to host the LLM in its data centers. Each option has different costs.
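The curation step described above is where much of the data cost accumulates, because every record must pass several checks before it can be used. A minimal sketch of such a pass, with the record schema and the `flagged_biased` field being hypothetical stand-ins for whatever labeling process the enterprise uses:

```python
def curate(records: list[dict]) -> list[dict]:
    """Minimal curation pass: drop duplicate, empty, and flagged records."""
    seen = set()
    kept = []
    for r in records:
        text = r.get("text", "").strip()
        if not text:                 # empty record
            continue
        if r.get("flagged_biased"):  # labeled as problematic upstream
            continue
        if text in seen:             # exact duplicate
            continue
        seen.add(text)
        kept.append(r)
    return kept

# Illustrative input: one clean record, one duplicate, one empty, one flagged.
raw = [
    {"text": "Invoice dispute resolved via refund."},
    {"text": "Invoice dispute resolved via refund."},
    {"text": ""},
    {"text": "Problematic remark", "flagged_biased": True},
]
print(len(curate(raw)))  # 1
```

Production pipelines add near-duplicate detection, PII scrubbing, and quality scoring on top of this, each with its own labor and tooling cost.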
Employing proprietary LLMs (or even Small Language Models) requires that the enterprise understand the training, inference, and maintenance/retraining of such models. Enterprises need to license generative AI software infrastructure tools for tasks such as tokenization and vector management, large-scale neural network architecture development, model verification and integrity, and MLOps so that they can develop proprietary LLMs, as well as tools for generative AI application development. If the enterprise builds an LLM using an existing open-source model, such as Falcon or Mistral, it may incur model support costs. If instead it modifies a proprietary LLM, such as GPT-4, hosted on-premises, its costs will be higher because it will also have to pay for the model's license. In both cases, because many open-source and proprietary models exist today, the corporation will also need to consider the cost of selecting the initial LLM to train with its data.
Because LLMs are significantly more complex than the typical predictive models used in discriminative AI tasks, maintaining model integrity, including defending against the cybersecurity threats to which an LLM may be exposed, is significantly harder and more expensive in generative AI than in discriminative AI. If these are key issues for the company, it will have to develop and maintain its own LLM and host it in its data center, with all the costs such a decision entails.
The enterprise must determine whether its existing data center resources are appropriate and sufficient for developing and hosting LLMs. Generative AI startups using public cloud resources learned that the architectures of existing data centers are not optimally configured for training LLMs and subsequently using them for inference. The data centers used for generative AI require new architectures with different CPU/GPU ratios and component interconnects, and they have higher power needs than today's typical data center. Upgrading data centers is expensive.
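To see why these data center decisions matter, it helps to estimate training compute cost. A common rule of thumb puts the training compute of a dense transformer at roughly 6 × parameters × tokens FLOPs; the sketch below turns that into a back-of-envelope dollar figure. All inputs in the example (model size, token count, GPU specs, utilization, hourly rate) are assumed figures for illustration only:

```python
def training_compute_cost(params_b: float, tokens_b: float,
                          gpu_tflops: float, gpu_count: int,
                          utilization: float, usd_per_gpu_hour: float):
    """Estimate wall-clock hours and dollar cost of a training run using the
    rough ~6 * parameters * tokens FLOPs rule of thumb."""
    total_flops = 6 * (params_b * 1e9) * (tokens_b * 1e9)
    cluster_flops_per_s = gpu_tflops * 1e12 * utilization * gpu_count
    hours = total_flops / cluster_flops_per_s / 3600
    return hours, hours * gpu_count * usd_per_gpu_hour

# Illustrative run: a 7B-parameter model on 1,000B tokens, 256 GPUs at
# 312 peak TFLOPS each, 40% utilization, $2 per GPU-hour (all assumptions).
hours, cost = training_compute_cost(7, 1000, 312, 256, 0.4, 2.0)
print(f"~{hours:,.0f} hours, ~${cost:,.0f}")
```

Note that the dollar figure covers compute rental only; power, cooling, networking, and the engineering staff to keep a large run healthy come on top, which is exactly why the data center architecture question is a first-order cost decision.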
Regulatory Compliance Costs
Governments are starting to realize that the power of AI, including that of generative AI, can be transformative. Many of the resulting transformations, in biotech, healthcare, education, and other fields, will result in enormous good. Others, such as the generation of fake content, will have negative consequences. To address the problems that can arise from the use of certain data in the development of LLMs, e.g., biometric data, but also from the use of the resulting models in applications such as scoring applications, countries have started enacting regulations. The US issued a presidential executive order covering generative AI operations, while the EU recently passed its AI Act. These regulations follow China's, which has been a pioneer in this area. Japan has taken a softer regulatory approach.
Complying with these regulations, particularly for corporations that operate in multiple geographies, will become expensive. Corporations will need to ensure that the data they use, the models they build, and the applications embedding these models are compliant with each geography’s regulations. They may have to develop different models for each geography using data and model-building processes that comply with the geography’s regulations. Furthermore, existing applications that require certification, e.g., healthcare applications, will need to be re-certified once they incorporate a new AI component.
Response Verification Costs
Those of us who use LLMs have become familiar with their hallucinations and inconsistent responses. Corporations exploring the use of generative AI for customer support or software development applications are particularly concerned about customer satisfaction, and even liability, if a customer receives a wrong response or a piece of LLM-generated code creates problems in existing enterprise systems. For this reason, enterprises will need to consider the additional cost of validating the correctness and consistency of LLM responses, as well as the cost of developing and incorporating safeguards to ground the model's performance.
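One common (and itself costly) safeguard is self-consistency checking: sample the model several times and only accept an answer when a clear majority agrees, escalating everything else to human review. A minimal sketch, where `ask` stands in for whatever model API the enterprise uses and the stubbed responses are purely illustrative:

```python
import collections
from typing import Callable, Optional

def consistent_answer(ask: Callable[[str], str], prompt: str,
                      samples: int = 5, threshold: float = 0.6) -> Optional[str]:
    """Sample the model several times; accept the majority answer only if it
    clears the agreement threshold, otherwise return None (escalate to a human)."""
    answers = [ask(prompt).strip().lower() for _ in range(samples)]
    answer, count = collections.Counter(answers).most_common(1)[0]
    return answer if count / samples >= threshold else None

# Hypothetical usage with a stubbed model for illustration:
fake_responses = iter(["Paris", "Paris", "paris", "Lyon", "Paris"])
result = consistent_answer(lambda p: next(fake_responses), "Capital of France?")
print(result)  # "paris" -- 4 of 5 samples agree, above the 0.6 threshold
```

Note the cost implication: this scheme multiplies inference spend by the sample count and adds human-review labor for every escalated response, which is why response verification deserves its own line in the budget.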
User Training Costs
A final cost category involves the training of employees so that they can take maximum advantage of generative AI’s power. This could mean training them on prompt engineering, as well as on how to best use the various types of copilots that are starting to be incorporated into productivity tools and other types of enterprise applications. These costs can become significant and impact the use cases an enterprise is considering.
Every enterprise needs to develop the generative AI component of its AI strategy. As part of the analysis that will go into that strategy, the enterprise will need to determine whether it needs its own proprietary model, whether such a model should be capable of addressing multiple domains, and whether its users will be better served by a single enterprise LLM or by several smaller, task- or domain-specific models.
Image copyright: VentureBeat