AI/LLM
What It Really Takes to Build AI Products that Succeed
By David Wiszowaty on Sep 17, 2024
Many enterprise companies are jumping to create new products and features centered around the tech world’s hottest topic: Artificial Intelligence.
However, 80% of these initiatives reportedly fail. Many assume that a few lines of code or a quick integration of an AI model will get them to where they need to be, but the truth is far more challenging.
In this post, Artium Senior Software Engineer David Wiszowaty dives into what sets truly successful AI products apart from those that fail, and shares insights on how to build them.
Q: What are the most common misconceptions about AI in products that you've encountered?
One of the biggest misconceptions I've encountered while building AI products, especially on the engineering side, is the oversimplification of what it takes to actually build AI features. Many people think that developing an AI application is as straightforward as integrating an LLM, writing a few prompts, and voilà—you have a fully functional AI product.
While this approach can get you partially the way there, AI development is much more complex. Creating a truly valuable and reliable AI product means diving deep into fine-tuning the model, making incremental changes, and sometimes even incorporating multiple AI agents that work together to deliver the desired outcomes. There's also the need for constant evaluation and improvement. AI doesn't always deliver reliable or accurate results right out of the box, and it requires a lot of trial and error to get it to a state where it can consistently provide value to end users.
For instance, when you're developing an AI model, there are many different aspects you need to consider: data preprocessing, model training, hyperparameter tuning, integration with existing systems, and continuous monitoring.
This misconception of AI as an easy solution can lead to unrealistic expectations and disappointment when the AI doesn’t immediately perform as hoped. It’s crucial to recognize that AI development is an ongoing process that involves learning, adjustment, and refinement.
Q: How do Artium’s practices play a role in building a successful AI product?
There are several pieces to this, but one of the core practices we focus on is testing—specifically Test-Driven Development (TDD). AI is inherently non-deterministic, meaning the responses you get can vary unpredictably, which can make testing seem tricky at first. On my current project we rely heavily on testing how our AI functionality behaves, through a practice called "Continuous Alignment Testing" (CAT). This testing framework lets us validate AI behavior by running it against numerous scenarios and assessing the responses for tone, accuracy, and efficacy. In my current project, the AI model searches for people with specific skill sets and summarizes the results. We've written tests to ensure that when given a specific prompt, the AI returns reliable results and doesn’t hallucinate or make up information.
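To make this concrete, here is a minimal sketch of what a CAT-style test might look like with pytest. The module, function, and fixture names are illustrative assumptions, not our actual test suite:

```python
# A minimal Continuous Alignment Testing (CAT) sketch using pytest.
# `query_people_search` is a hypothetical stand-in for the AI feature
# under test; the fixture data below is invented for illustration.
import pytest

from my_ai_app import query_people_search  # hypothetical module

# People we know exist in the test dataset, and their actual skills.
KNOWN_PEOPLE = {
    "Ada": {"python", "machine learning"},
    "Grace": {"compilers", "cobol"},
}

# Run each scenario several times, since responses are non-deterministic.
RUNS_PER_SCENARIO = 5

@pytest.mark.parametrize("run", range(RUNS_PER_SCENARIO))
def test_search_does_not_hallucinate_people(run):
    result = query_people_search("Who knows Python?")
    for person in result.people:
        # Every returned person must actually exist in our dataset.
        assert person.name in KNOWN_PEOPLE, f"Hallucinated person: {person.name}"

@pytest.mark.parametrize("run", range(RUNS_PER_SCENARIO))
def test_search_finds_the_right_skill(run):
    result = query_people_search("Who knows Python?")
    names = {p.name for p in result.people}
    # The one person with Python in the fixture should be found consistently.
    assert "Ada" in names
```

In practice you would track pass rates across repeated runs over time rather than expecting every run to pass, since some variance is inherent to a non-deterministic system.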
Another practice we rely on at Artium that has been extremely valuable for developing successful AI features is Pair Programming. Because AI model responses are non-deterministic, there are many different situations to handle. Pairing lets us bounce ideas off each other in real time, try out scenarios together, and figure out how to update the AI prompt to better handle each one. Pairing, and even mobbing, expands the number of approaches we can try to improve an AI feature.
Q: What key factors differentiate successful AI products from those that fail?
This is a great question and one that took me some time to think about. I believe one of the key differentiators is how you approach building AI components or features. A common pitfall is treating AI simply as a chatbot or an assistant that only responds to commands. While there is value in chatbots, they aren’t the most innovative AI implementation.
On the project I am currently working on, we’re approaching AI as more of a "teammate" than just an assistant. A teammate works alongside you toward a shared goal, not just reacting to commands but proactively participating in the workflow. A great example of this is GitHub Copilot, which positions itself as a coding partner and “pairing buddy” rather than a simple assistant. With GitHub Copilot, a developer goes about their regular day-to-day workflow, and Copilot contributes suggestions where it thinks they make sense. It’s not always perfect, but there are plenty of occasions where Copilot provides valuable results the developer wouldn’t have thought to type into a chatbot interface. This kind of approach can lead to much more valuable and impactful AI implementations.
Q: What advice would you give to other engineers looking to improve the success rate of their AI projects?
The biggest piece of advice I’d offer is to think about testing very early in the development process. It is time-consuming and difficult to gauge whether a prompt is consistently producing appropriate responses, and this becomes even more apparent when you modify the prompt. Any change can cause the AI to behave very differently in some scenarios, and without automated tests and monitoring it is very difficult to know whether a prompt change broke existing behavior.
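One lightweight way to guard against such regressions is to score a prompt against a fixed set of scenarios and gate changes on a pass rate rather than expecting perfection. Here is a rough sketch; `run_prompt`, the scenarios, and the threshold are illustrative assumptions, not our project’s code:

```python
# Illustrative prompt-regression harness: run a set of scenarios against
# the current prompt and report a pass rate. `run_prompt` and the
# scenarios are hypothetical stand-ins.

def run_prompt(prompt: str, scenario_input: str) -> str:
    """Call the model with the prompt and scenario input (stubbed here)."""
    raise NotImplementedError("wire this to your model client")

SCENARIOS = [
    # (scenario input, a check the response must satisfy)
    ("Find people who know Python", lambda r: "Ada" in r),
    ("Find people who know COBOL", lambda r: "Grace" in r),
]

def pass_rate(prompt: str, runs_per_scenario: int = 3) -> float:
    passed = total = 0
    for scenario_input, check in SCENARIOS:
        # Repeat each scenario, since any single run may vary.
        for _ in range(runs_per_scenario):
            total += 1
            if check(run_prompt(prompt, scenario_input)):
                passed += 1
    return passed / total

# Gate prompt changes on a minimum pass rate rather than perfection,
# since non-determinism makes 100% unrealistic, e.g.:
# assert pass_rate(NEW_PROMPT) >= 0.9
```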
Another piece of advice is to improve visibility into what is happening behind the scenes with these AI components. When developing complex AI features, everything can feel magical and hard to understand; you find yourself asking, “Is it actually working as intended?” Logging the prompts being sent and how the AI is processing them can help you identify where things might be going wrong and where adjustments are needed.
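A simple place to start is logging every prompt and response at the boundary where your code calls the model. A minimal sketch, with `call_model` as a hypothetical placeholder for whatever client you use:

```python
# Minimal logging wrapper around a model call. `call_model` is a
# hypothetical placeholder for your actual client (OpenAI, Vertex AI, etc.).
import logging
import time

logger = logging.getLogger("ai.calls")

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your model client")

def logged_call(prompt: str) -> str:
    start = time.monotonic()
    logger.info("prompt sent: %s", prompt)
    response = call_model(prompt)
    elapsed = time.monotonic() - start
    logger.info("response received in %.2fs: %s", elapsed, response)
    return response
```

Having every prompt and response in the logs makes it much easier to answer “is it actually working as intended?” when a scenario misbehaves.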
Finally, choose tools and libraries that offer good visibility into what’s happening under the hood. Avoid libraries that abstract away too much of the inner workings; they make it harder to understand what’s happening and to debug issues when they arise. Opt for lighter wrappers or open-source solutions where you can see and understand the interactions between components. In my current project, it was very helpful to experiment with different libraries, since each offers different pieces of AI functionality. We noticed that some AI libraries do well in certain workflows but not so well in others. We experimented with several libraries and frameworks, including LangChain, CrewAI, LlamaIndex, and the Vertex AI SDK on its own. Ultimately, we chose LlamaIndex because it made it easy to implement RAG (Retrieval-Augmented Generation) and build a multi-agent AI architecture. It also provided the right balance of abstraction and transparency, making it easier to understand what happens behind the scenes.
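To give a sense of what that looks like in practice, here is a minimal RAG sketch using LlamaIndex’s high-level API. The data directory and query are illustrative placeholders, not our actual setup:

```python
# Minimal RAG sketch with LlamaIndex's high-level API (llama-index >= 0.10).
# The "data" directory and query are illustrative placeholders. Note that
# the defaults assume an OpenAI API key unless you configure Settings
# with a different LLM/embedding provider.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load and index local documents, e.g. team skill profiles.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a question grounded in the indexed documents.
query_engine = index.as_query_engine()
response = query_engine.query("Who on the team has Python experience?")
print(response)
```

Even at this level of abstraction, the underlying retrieval and indexing steps remain accessible, which reflects the balance of abstraction and transparency mentioned above.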