The Cost of AI-Powered Software: What Business Leaders Need to Prepare For

By Mike McCormick • Nov 4, 2024

When it comes to building AI-powered software, understanding cost is essential for long-term success. In this post, Mike McCormick, VP of Technology at Artium, shares his experience with AI implementations, the common mistakes that companies make, and strategies for managing costs while maintaining high performance.

What are the biggest cost drivers in building AI-powered software?

A lot of the cost drivers in AI-powered software are similar to what you’d expect from building cloud-based applications. You're dealing with the cost of the team doing the build, the cost of infrastructure, and the cost of any COTS products you’ll need along the way. Sticking with the adage “software is to done as grass is to cut,” I prefer to consider these costs as continuous. In addition, when it comes to AI-powered software there are a few unique variables to consider.

For example, the cost of using hosted foundation models can significantly impact expenses. Right now these models are commoditizing, and vendors are pushing prices down to stay competitive, but that doesn’t necessarily mean they are affordable within your architecture. If you're making several API calls to an LLM for every user action, those costs can escalate fast. Even if prices drop, high-volume applications can see their expenses soar if they're making multiple calls per interaction. In general, my back-of-the-napkin math weighs API pricing, call volume, and the criticality of the feature to product improvements against what it costs to have our team create and maintain a more focused model/architecture for ourselves. Often you don’t have to get far beyond a prototype before the focused model/architecture costs less.
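That back-of-the-napkin math can be sketched in a few lines. All the numbers below are illustrative placeholders, not real vendor pricing or real infrastructure costs:

```python
# Back-of-the-napkin comparison of hosted-API spend vs. running a more
# focused model yourself. Every figure here is an assumption for
# illustration only.

def hosted_monthly_cost(users, actions_per_user, calls_per_action,
                        tokens_per_call, price_per_1k_tokens):
    """Monthly spend when every user action fans out into LLM API calls."""
    calls = users * actions_per_user * calls_per_action
    return calls * tokens_per_call / 1000 * price_per_1k_tokens

def self_hosted_monthly_cost(gpu_instances, instance_cost, eng_maintenance):
    """Monthly spend to run and maintain a focused model in-house."""
    return gpu_instances * instance_cost + eng_maintenance

api = hosted_monthly_cost(users=50_000, actions_per_user=30,
                          calls_per_action=3, tokens_per_call=1_500,
                          price_per_1k_tokens=0.002)
own = self_hosted_monthly_cost(gpu_instances=4, instance_cost=2_500,
                               eng_maintenance=3_000)
print(f"hosted API:  ${api:,.0f}/mo")   # $13,500/mo
print(f"self-hosted: ${own:,.0f}/mo")   # $13,000/mo
```

Even with made-up numbers, the exercise shows how quickly call volume dominates: the crossover point arrives as soon as usage scales past the prototype stage.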

When we talk about the software tools, Artium has long favored working software in production as a means of driving alignment and product development. In the generative AI era, we find that quick prototypes make a big difference in aligning stakeholders’ understanding and building momentum. It’s easy to prototype something using a hosted model, but when you scale that prototype to users, the cost can scale out of control. In that sense it is important to plan for when you cross over from the convenience of a hosted model to the lower runtime cost of building and hosting your own solutions.

Hiring AI talent is a significant cost driver. There’s a limited talent pool that can develop, fine-tune, and maintain AI architectures, so whether you're hiring, upskilling internally, or working with consultancies like Artium, the talent that helps you compete and differentiate remains in high demand. These costs appear upfront as salary and transition to retention costs as your talent continuously evaluates growth opportunities, alignment, and the roadmap of cool things you choose to build together. In a highly competitive market, engineering leaders have the most leverage on cost by creating environments that retain high-demand individuals.

What strategies do you recommend for achieving a balance between high performance and cost efficiency as usage grows?

The key to balancing performance and cost efficiency is to identify how your AI component delivers value to the user. Too often, companies jump to using LLMs or other AI technologies simply because they’re available, without fully understanding how the technology creates differentiating outcomes or how they intend to track success. When your product team can make statements like “helping users do this specific task x amount faster converts to y amount of cost savings/revenue/brand reputation/etc.” then it becomes easier for your development team to work out some quick math on how they might build the solution, how that might perform, and what that might cost. Having both the outcome and a napkin-scrawled map for achieving it (we have a focused prototyping process to answer these questions) allows you to decide when it is appropriate to apply AI and when you should skip it.
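The “x amount faster converts to y amount of savings” statement translates directly into arithmetic. A hypothetical sketch, with all figures invented for illustration:

```python
# Does the AI feature pay for itself? All figures are hypothetical
# placeholders standing in for your product team's estimates.

def monthly_value(tasks_per_month, minutes_saved_per_task, loaded_rate_per_hour):
    """Dollar value of the time the feature saves users each month."""
    return tasks_per_month * minutes_saved_per_task / 60 * loaded_rate_per_hour

value = monthly_value(tasks_per_month=8_000, minutes_saved_per_task=6,
                      loaded_rate_per_hour=75)
feature_cost = 40_000  # assumed monthly build + inference cost

print(f"value ${value:,.0f}/mo vs cost ${feature_cost:,.0f}/mo -> "
      f"{'apply AI' if value > feature_cost else 'skip it'}")
```

If the estimated value doesn’t clear the estimated cost with room to spare, that’s the signal to skip the AI component for that feature.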

What are common mistakes companies make that drive up AI operational costs?

One of the most common mistakes is underestimating the volume of interactions LLM architectures favor compared to cloud architectures, and how that volume relates to the long-term cost of hosted foundation models. A good rule of thumb is to consider the LLM a user. When you start asking yourself “how would a human get to this answer” using natural language and an interrogative process, it becomes clearer that what you think of as a single service-to-service call in a cloud architecture might turn into multiple calls to an LLM, which could significantly increase costs if you’re not careful. These hosted foundation models are incredible, but they definitely do not come with billing guardrails.
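Since the vendors don’t supply billing guardrails, you can build a thin one yourself. A minimal sketch, where the underlying client is a stand-in lambda rather than a real vendor SDK:

```python
# A minimal billing guardrail: wrap the model client, estimate spend per
# call, and refuse calls that would blow the monthly budget. The `send`
# callable below is a stub, not a real vendor API.

class BudgetExceeded(RuntimeError):
    pass

class GuardedClient:
    def __init__(self, send, price_per_1k_tokens, monthly_budget):
        self._send = send            # underlying API call
        self._price = price_per_1k_tokens
        self._budget = monthly_budget
        self.spent = 0.0             # running estimate of this month's spend

    def call(self, prompt, est_tokens):
        cost = est_tokens / 1000 * self._price
        if self.spent + cost > self._budget:
            raise BudgetExceeded(f"call would exceed ${self._budget} budget")
        self.spent += cost
        return self._send(prompt)

client = GuardedClient(send=lambda p: f"echo: {p}",
                       price_per_1k_tokens=0.002, monthly_budget=0.01)
print(client.call("hello", est_tokens=2_000))  # within budget: "echo: hello"
```

In production you would meter actual token counts from the API response rather than estimates, but the shape is the same: the guard lives in your code, not the vendor’s.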

Another mistake is not preparing for regulatory shifts. As AI continues to receive regulatory attention, companies that don’t plan for compliance can find themselves with unexpected operational costs. I think we’re just beginning to understand how regulations will shape the cost landscape, and I expect that companies in regulated industries will enjoy a brief advantage over traditionally unregulated industries as the latter adjusts to costs, process, and people necessary to remain aligned to regulation.

What role does data quality play in the cost of AI maintenance?

Data quality is a huge factor. For more traditional AI and machine learning models, many data science teams are already familiar with operational processes like creating/maintaining data pipelines and monitoring model drift. But with LLMs, things are still evolving. You can’t just dump data into a model and expect optimal results. You’ll still need to invest in engineering to structure and manage that data properly, which will evolve as your application scales.

At Artium we have a process of Continuous Alignment Testing that we use to make sure new software, new model versions, and new data still play nicely together. We use these tools to boost our confidence and consistency when working with generative models. It costs more to introduce these tools in our continuous integration pipeline but as we know from 30 years of web development, it is better to learn early than to be surprised in production. AI models are not a "set it and forget it" technology. You need continuous investment in quality to maintain long-term value.
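The idea of testing software, model versions, and data together can be sketched simply. Continuous Alignment Testing is Artium’s name; the harness below is my own illustration of the underlying pattern, using a stubbed model since generative outputs are nondeterministic:

```python
# Sample the same prompt several times and require that every response
# satisfies a property. The `model` callables here are stubs standing in
# for real LLM calls.
import json

def aligned(model, prompt, check, runs=5):
    """True only if every sampled response passes the check."""
    return all(check(model(prompt)) for _ in range(runs))

def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# Stub models: one well-behaved, one not.
good_model = lambda prompt: '{"sentiment": "positive"}'
bad_model = lambda prompt: "Sure! Here's your JSON: {oops"

print(aligned(good_model, "Classify this review.", is_valid_json))  # True
print(aligned(bad_model, "Classify this review.", is_valid_json))   # False
```

Run in CI, a check like this catches a new model version or data change that breaks your output contract before it surprises you in production.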

How does the choice of AI framework or platform impact long-term costs?

Your choice of AI framework or platform can have a huge impact on your long-term costs, especially in terms of how locked in you get with a particular vendor. Take OpenAI, for example. Their API makes it really easy to get started, and they’ve done a great job of building in features like function calling or structured data formatting. But all those extra features come with a level of platform dependence.

If you build heavily on one platform—say, you optimize your app for OpenAI’s GPT models—it’s not a simple lift-and-shift operation to move to another provider like Google’s Vertex or a self-hosted LLM. These models behave differently, and features you rely on in one might not be available in another. That’s why companies need to think about flexibility from day one.

If you're planning for the long term, you should consider what your migration path might look like. Are you using a third-party model to prove value and get to market quickly, but planning to eventually bring that capability in-house? If so, make sure you’re not building too much lock-in with any one feature set. The more flexible you can make your architecture—whether that’s through building modular systems or leaving room for different models—the more you can control your long-term costs.
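One way to leave that room is a thin adapter layer: hide each provider behind a common interface so a migration is a configuration change, not a rewrite. A sketch with stubbed adapters (the class names and responses are illustrative, not real SDKs):

```python
# Provider-agnostic model interface: swapping vendors or moving in-house
# becomes a registry lookup. Adapters below are stubs for illustration.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIAdapter:
    def complete(self, prompt: str) -> str:
        return f"[openai-stub] {prompt}"     # would call the hosted API

class SelfHostedAdapter:
    def complete(self, prompt: str) -> str:
        return f"[local-stub] {prompt}"      # would call your own model

def make_model(provider: str) -> ChatModel:
    registry = {"openai": OpenAIAdapter, "self_hosted": SelfHostedAdapter}
    return registry[provider]()

model = make_model("self_hosted")            # the migration is one line
print(model.complete("Summarize this doc."))
```

The adapters still have to reconcile behavioral differences between models, so this isn’t free, but it keeps vendor-specific features contained in one place instead of scattered through your application.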

The worst-case scenario is getting locked into a single platform and then discovering that the cost of using that model at scale is unsustainable. Flexibility is key, especially in an iterative field like AI, where the technology and the platforms are evolving rapidly.