Crafting AI-Powered Consumer Apps

By cauri jaye • May 13, 2023

Many apps based on generative AI have hit the market since November 2022. The lion's share of these apps does nothing more than place a thin layer of an interface on top of the raw generative AI platform. We need to create deeper apps to survive the current gold rush. 

The thin layer

Take, for example, Lensa. This app allows you to upload a few selfies and uses them to generate up to 125 variations of yourself, including images of you as an astronaut, a wizard, and a sketch. They charge $7.99 for the service. You can do essentially the same thing for free by using Midjourney directly; it just requires a few more clicks and some prompt engineering. 

These types of apps have flooded the market and will continue to do so. However, they will be a flash in the pan: a repeat of the dot-com boom's rapid successes that quickly faded, leaving behind the winners of that revolution. 

The deeper win

The apps that endure will be the ones that do more than simply give users direct access to generative AI. 

They will add value to the interaction through personalization and context. They will simplify the interaction to make it intuitive and human. They will add enriched data sets or capabilities that LLM-based AIs cannot natively provide. 

These three layers - personal context, enhanced interaction, and enriched data - form the backbone of the new applications.

Personal context

LLMs learn from a mass of public data. When interacting with them through prompts we effectively instruct them where to look in the massive data space. A good prompt helps refine contextual understanding and define areas to ignore. It’s a rapid reduction from all knowledge to specific knowledge. 

We can think of it as an exercise in reducing scope across domain space, cultural context, personal psychology, and individual situation - each of these an overlapping circle in a Venn diagram. The more a prompt zeroes in on that intersection, the better the completion (or response) from the AI. 

Apps that define this scope through interaction with a user, whether through direct interaction or through data, return more value. For example, an app that mixes demographic and psychographic data from Google Analytics with a user's profile and historical data can more readily provide real value to that user through better prompt engineering. The user does not even need to be aware of any of this happening, which increases the magic of the experience. 
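
To make this concrete, here is a minimal sketch of assembling a personalized prompt from a user's profile and an analytics segment before their question ever reaches the model. The field names and the `call_llm` helper are illustrative assumptions, not any particular product's API.

```python
# Sketch: enriching a prompt with personal context the user never sees.
# `call_llm` is a placeholder for whichever chat-completion API you use.

def build_personalized_prompt(profile: dict, segment: dict, question: str) -> str:
    context = [
        f"The user is a {profile['age_range']} {profile['occupation']}.",
        f"Their history suggests they value {', '.join(profile['interests'])}.",
        f"Analytics places them in the '{segment['persona']}' segment.",
        "Answer in a tone and at a level of detail suited to this person.",
    ]
    return "\n".join(context) + f"\n\nUser question: {question}"

prompt = build_personalized_prompt(
    {"age_range": "30-40", "occupation": "teacher", "interests": ["budget travel", "history"]},
    {"persona": "curious planner"},
    "Where should I go for a week in October?",
)
# response = call_llm(prompt)
```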

Enhanced interaction

Chat interfaces are boring. We are used to them and they provide the most basic way of interacting with an LLM, but we will soon tire of them. Midjourney did something interesting by using chat commands in Discord as an interface, but many found this clunky and unusable, and a large portion of users found it too complex. 

The future of user experience and interaction design lies in making multi-modality as seamless as communication between two humans. When interacting we switch seamlessly between telling, showing, and writing.

These take many forms such as sharing or showing online content, drawing a picture on a napkin, pointing to an example, writing an SMS message, or standing up and demonstrating. Interacting with AI will head in this direction. 

Prompt engineering brings a new challenge: the user experience comes from the AI's response, yet the prompts that shape that response are often written as part of the engineering process. Business considerations also play a part.

Prompt engineering requires input and collaboration between design, product, and engineering. We have had great success pairing an engineer with a CXO or designer to work on the prompts, achieving a balance between business, UX, and technology demands. 

Enriched data

The data that trains LLMs is mostly out of date, and with every retraining, by the time the process is complete, it is out of date again. However, the training is not so much about the data as it is about the ability to reach logical completions to prompts.

ChatGPT, although trained on data from before September 2021, can still handle most completions accurately. However, when asked for more up-to-date information, it will get it wrong or say it does not know. 

Adding more recent data that can be incorporated into the responses separates a generic generative AI app from a truly useful one that adds value. 

Another failing of LLMs lies in their inability to do math reliably. This limits their use in science, engineering, and other specialized domains. However, integration with datasets and knowledge bases such as Wolfram Alpha allows them to incorporate up-to-date specialized knowledge and mathematical abilities. 

Multiple plug-ins and integrations allow the creation of consumer apps that take advantage of current knowledge to make more accurate predictions. The output of one service forms the input of the next, with the enhancement of the LLM inserted into the transfer. This unleashes the power of well-designed consumer apps. 
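
As an illustration of that chaining, the sketch below passes the output of one (stubbed) data service through an LLM step and on to the user. All three functions are invented stand-ins, not real APIs.

```python
# Sketch of the chaining pattern: service output -> LLM enhancement -> next step.
# All three functions are illustrative stubs rather than real APIs.

def fetch_current_rates(currency: str) -> dict:
    # Stand-in for a live data service (e.g. an exchange-rate or Wolfram-style API).
    return {"currency": currency, "rate_to_usd": 1.09}

def llm_explain(data: dict, question: str) -> str:
    # Stand-in for a chat-completion call that turns raw, current data into prose.
    return f"(LLM answer to '{question}' grounded in {data})"

def send_to_user(text: str) -> None:
    print(text)

rates = fetch_current_rates("EUR")                                    # output of one service...
answer = llm_explain(rates, "Is now a good time to exchange euros?")  # ...enhanced by the LLM...
send_to_user(answer)                                                  # ...feeds the next step.
```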

How to build it

We sketched the interaction between the user, the system, and the models in Miro. The ease of moving and categorizing stickies made it easy for us to take the technical, business, and UX needs into account when sketching the interactions. These sketches were then transferred to Mermaid so that a more dynamic model could live alongside the codebase. We keep it up to date, along with an explanation of each element in the flow. 

This type of sketching helps in many ways, including:

  • understanding the flow of data through the system 

  • identifying which models (or types of models) to use

  • identifying points of enhancement to the data without user interaction to increase the magic of the UX

  • identifying where prompts are needed to talk to the model

  • identifying and estimating potential latency issues

  • providing a tangible asset for communication between software engineers, data scientists, prompt engineers, product owners, and designers 

This also has the benefit of transparency and documentation for the wider team. 

Prompts

A single prompt can generate an ongoing interaction that leads to a pretty amazing response. For example, I created a single prompt that generates an entire backlog for a project, driven by the answers to five to ten questions. I created another that generates an estimated roadmap, one that creates a user journey map, and yet another to list the epics needed for a system.
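
As a sketch of how such a prompt can be parameterized by a handful of answers, the template and questions below are invented for illustration; they are not the actual prompt described above, and `call_llm` stands in for your chat-completion call.

```python
# Sketch: one prompt template, filled in with the answers to a few questions,
# asking the model for an entire backlog. Questions and wording are illustrative.

QUESTIONS = [
    "Who is the product for?",
    "What problem does it solve?",
    "What is the single most important feature?",
    "What platforms must it run on?",
    "What does success look like in six months?",
]

def backlog_prompt(answers: list[str]) -> str:
    qa = "\n".join(f"- {q} {a}" for q, a in zip(QUESTIONS, answers))
    return (
        "You are an experienced agile product owner.\n"
        "Based on the following project answers, produce a prioritized backlog "
        "of user stories with acceptance criteria.\n\n" + qa
    )

# prompt = backlog_prompt(["Freelance designers", "Invoicing is tedious", ...])
# backlog = call_llm(prompt)
```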

The real power of an AI-powered application goes beyond the single prompt, using memory and different ways of “thinking” to provide real value.

Multiple prompts

Chat interactions require a back-and-forth between the user and the agent (AI). However, to truly turn this into an app, there need to be additional inputs into the thread of the conversation. These include, but are not limited to:

  • Extraction of specific information from user inputs

  • Interpretation or moderation of user inputs before sending them to the model

  • Interpretation or moderation of the agent's outputs before sending them to the user

  • Enhancement of data in either direction based on additional context

  • Adjustment of language and tone before sending to the user

  • Summarisation of data before output to the user

Any of these interactions may require additional prompts to the model to generate new content that more directly meets the needs of the user and the system, creating value. 
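
A minimal sketch of how a few of these intermediate steps can sit between the user and the main conversation, each as its own prompt. The `call_llm` helper and the prompt wording are placeholders, not a prescribed implementation.

```python
# Sketch of the "multiple prompts" pattern: extra model calls sit between the
# user and the main conversation. `call_llm` is a placeholder helper.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API")

def moderate(user_input: str) -> str:
    return call_llm("Rewrite the following so it contains no personal data or abuse, "
                    f"keeping the meaning intact:\n{user_input}")

def extract_preferences(user_input: str) -> str:
    return call_llm(f"List any user preferences stated or implied here, as JSON:\n{user_input}")

def adjust_tone(agent_output: str, tone: str = "warm and concise") -> str:
    return call_llm(f"Rewrite the following reply in a {tone} tone:\n{agent_output}")

def handle_turn(user_input: str, conversation_prompt: str) -> str:
    safe_input = moderate(user_input)            # moderation before the model
    _prefs = extract_preferences(safe_input)     # extraction for later personalization
    raw_reply = call_llm(conversation_prompt + "\n" + safe_input)
    return adjust_tone(raw_reply)                # adjustment before the user
```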

We have found that the growing set of frameworks that help with this process, such as the very popular LangChain, do not yet provide enough value to an experienced engineer. Using platforms like Google Colab and Livebook instead creates a flow that merges code, data, and prompts to manage the development of a conversation between the LLM, the system, and the user. 

On prompting and coding 

Using Livebook allows us to develop prompts and code side by side. This fundamental change in how we interact with a machine highlights three tiers: prompts to get language responses (using the AI API), messages to get user responses (using the chat interface), and code to mediate between the two.

This method allows us to massage prompts quickly, find edge cases, adjust language and tone, assess the quality of feedback, and see the impact of new integrations. As always, rapid feedback and iteration are the key to a successful, consistent user experience. 
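
Sketched in Python rather than in a Livebook for brevity, the three tiers might separate roughly like this; the function names are illustrative only.

```python
# Sketch of the three tiers: prompts to the model, messages to the user,
# and glue code between them. Names and wording are illustrative.

def prompt_model(prompt: str) -> str:
    # tier 1: prompts -> language responses (wire to your chat-completion API)
    return "(model response placeholder)"

def message_user(text: str) -> str:
    # tier 2: messages -> user responses (here via the console)
    return input(text + "\n> ")

def run_session() -> None:
    # tier 3: code mediating between the two
    answer = message_user("What are you trying to build?")
    idea = prompt_model(f"Summarize this product idea in one sentence: {answer}")
    message_user(f"Got it: {idea}. What should we tackle first?")
```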

The next step will be to extract the Livebook content into a library that can be used by both Livebook and the consumer application, allowing for rapid development of the integrated code-data-prompt paradigm with easy release to master. John Wilger can elaborate on the details of this for interested engineers. 

The process of prompt refinement never ends; it will continue in parallel with code and interface development and beyond. Creating interfaces that let people other than coders update prompts will be a fundamental part of developing AI-based systems.

Memory

Memory plays a crucial part both in personalization for the user and in learning across the system. I’ve written extensively on AI memory, so I will not go into detail here. Reach out if you want to know more. 

Chat memory

Chat memory (session memory) is the first level. Each input from the user needs to be remembered for the next response. When using ChatGPT, this functionality is built in; however, when engaging the GPT models through the API outside of the playground, this functionality must be managed by the developer. Third-party systems have entered the market that help with this, but again, most do not provide enough value to a seasoned engineer and are better built in-house per project. This may quickly change as new solutions enter the market. 
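
A minimal sketch of managed session memory, assuming the 2023-era `openai` Python package (`openai.ChatCompletion.create`): the full message history is appended to and re-sent on every call. The model name and key handling are placeholders.

```python
# Minimal sketch of chat (session) memory: the full message history is sent
# with every request. Uses the 2023-era `openai` package interface; adapt as needed.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

messages = [{"role": "system", "content": "You are a helpful travel assistant."}]

def chat(user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})  # remember for the next turn
    return reply
```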

User memory 

When dealing with consumer apps, having memory across sessions for users is paramount. The system needs to remember the user's needs, preferences, choices, and interactions. 

This can be done in different ways. One common emerging practice is to send the history of a chat to the LLM and ask it to summarise it. This completion can then be stored and used as input in the next session. When the per-session summaries become too many, they can all be sent to the model for a summary of summaries. This degrades the memory over time but is usually sufficient to fulfill the needs of the platform. At any point, the details of any area can be pulled from the database again for a fresh summary.
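
A sketch of that summarise-and-store approach, with `call_llm` as a placeholder for the chat-completion call and an in-memory list standing in for the per-user database.

```python
# Sketch of cross-session user memory via summaries. `call_llm` and the list
# below are placeholders for a real model call and a per-user data store.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API")

session_summaries: list[str] = []   # stand-in for a per-user database table

def end_of_session(chat_history: str) -> None:
    summary = call_llm("Summarise this conversation, keeping the user's "
                       f"preferences and decisions:\n{chat_history}")
    session_summaries.append(summary)

def memory_for_next_session() -> str:
    if len(session_summaries) > 20:           # too many to send individually
        return call_llm("Summarise these session summaries into one:\n"
                        + "\n".join(session_summaries))   # lossier "summary of summaries"
    return "\n".join(session_summaries)
```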

Another method is to pull out specific data for the interactions with a user and store just these items in a database. These can then be sent back at the start of a new session. 

New methods will emerge over the coming months. Stay tuned. 

System memory

For deeper memory that retains data across an entire system and can expose new trends or be used as new training or search data, the emerging trend is to use vector databases. 

Vector databases are specialized storage systems designed to efficiently store, manage, and query large collections of high-dimensional vectors. These databases are particularly useful for AI systems, as they enable them to have a form of memory by allowing quick retrieval of similar or relevant information.

In AI systems, data is often represented as vectors in a high-dimensional space, where each dimension corresponds to a specific feature. These systems, particularly those based on deep learning, use vector representations to capture complex patterns and relationships between data points.

By providing a scalable and efficient way to store and retrieve vectors, vector databases help AI systems have memory in the sense that they can quickly access and utilize previously learned information. This capability is critical for tasks like similarity search, pattern recognition, and recommendation systems, where finding the most similar items in a large dataset is crucial.
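
To illustrate the idea without committing to any particular product, the sketch below fakes an embedding model and keeps vectors in a Python list; a real system would use an actual embedding model and a dedicated vector database.

```python
# Minimal in-memory sketch of the idea behind a vector store: keep
# (vector, text) pairs and retrieve the most similar items by cosine similarity.
# `embed` is a toy stand-in for a real embedding model.
import math

def embed(text: str) -> list[float]:
    # Toy embedding: character-frequency vector. Replace with a real model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

store: list[tuple[list[float], str]] = []

def remember(text: str) -> None:
    store.append((embed(text), text))

def recall(query: str, k: int = 3) -> list[str]:
    ranked = sorted(store, key=lambda item: cosine(embed(query), item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

remember("User prefers vegetarian recipes")
remember("User is training for a marathon")
print(recall("what should they eat before a long run?"))
```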

Multiple models

LLMs are the tip of the iceberg. They have facilitated the creation of other deep learning models, as well as created a human-acceptable interaction paradigm. These other models can generate images, videos, and audio, synthesize and enhance data, and more. We must know which model to use for each task so it can be assigned properly to get the desired result. This can be done either manually or automatically. Manually assigning a model requires predetermining the best model for the task. This decision is not just a technological one as cost, efficiency, and accuracy all play a role. 

It is very likely that open-source models will overtake walled gardens rapidly, much as open-source software effectively killed proprietary languages and frameworks - and for the same reasons: increased breadth, reduced cost, and rapid development. LLMs like Open Assistant and GPT4All open up the possibility of not having to pay OpenAI's prices while getting better data over time. 

Automated selection of models is on the rise. Microsoft’s JARVIS (also known as HuggingGPT) was the first popular example of using an LLM to determine the system’s need and then select the model to address that need automatically, without human selection or intervention. This hub-and-spoke “brain” model will likely become the norm quickly, removing engineers and business people from the model-choice decision.
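
The sketch below shows the routing idea in its simplest form: one LLM call chooses which specialist model handles the task. The model list, prompt, and `call_llm` helper are illustrative; this is the pattern behind systems like HuggingGPT, not their implementation.

```python
# Sketch of hub-and-spoke model selection: a routing call to an LLM picks the
# specialist model. Names and prompt wording are illustrative only.

AVAILABLE_MODELS = {
    "text-generation": "a general chat model",
    "image-generation": "a text-to-image model",
    "speech-to-text": "an audio transcription model",
}

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API")

def route(task_description: str) -> str:
    options = "\n".join(f"- {name}: {desc}" for name, desc in AVAILABLE_MODELS.items())
    choice = call_llm(
        "Pick exactly one model name from the list below for this task, "
        f"and reply with the name only.\nTask: {task_description}\nModels:\n{options}"
    )
    # Fall back to the general model if the router returns something unexpected.
    return choice.strip() if choice.strip() in AVAILABLE_MODELS else "text-generation"
```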

New data 

As stated at the beginning, an LLM's data is out of date as soon as the model hits the market because of the time training takes. This will change over time, as advances are being made daily in how to add data regularly without training new models. Advances such as quantum computing chips may also eventually allow for rapid retraining of models, even in real time. 

In the meantime, interim solutions have begun to emerge, and space exists for the more innovative among us to create new and better ones. These include ideas like fine-tuning, plugins, and grounding. 

Fine-tuning is the process of delivering highly structured sample data to an LLM that it takes into account when it answers a request. Fine-tuning has significant disadvantages. Principally, it does not really add your data to the responses. Rather, it lets you include thousands of examples of how you want the completion to be formatted and what kind of language you would like. It does not truly augment the data.
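
For reference, OpenAI's fine-tuning at the time took JSONL files of prompt/completion pairs, i.e. examples of the shape and language you want rather than new knowledge. The records below are invented for illustration.

```python
# Sketch: fine-tuning data is examples of the output shape you want, not new facts.
# OpenAI's 2023-era fine-tuning consumed JSONL prompt/completion pairs like these
# (the records themselves are invented for illustration).
import json

examples = [
    {"prompt": "Summarise: The meeting moved to Thursday.\n\n###\n\n",
     "completion": " Meeting rescheduled to Thursday. END"},
    {"prompt": "Summarise: Invoice #42 was paid late.\n\n###\n\n",
     "completion": " Invoice 42 paid late. END"},
]

with open("training_data.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```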

Grounding is a process of intercepting communication between a user and an LLM to insert additional, often proprietary, information so that it is taken into account in the completion. The most popular approach right now is to use plugins for LLMs. These can consult server data stores or the web, analyze the content, and incorporate it into the answers from the LLM. It is important to note that this process does not add the data to the LLM; rather, it allows the model to process it essentially as part of the prompt itself. This comes with severe limitations on the size of content and the accuracy of integration; however, it is better than nothing. 
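
A minimal sketch of grounding by interception: retrieve relevant proprietary text and place it in the prompt so the model treats it as input. The retrieval function and `call_llm` helper are placeholders, and the retrieved context counts against the model's input limit, which is the size constraint mentioned above.

```python
# Sketch of grounding: intercept the user's question, retrieve relevant
# proprietary text, and place it in the prompt so the model treats it as input.
# `search_company_docs` and `call_llm` are placeholders.

def search_company_docs(query: str) -> str:
    # Stub; in practice this is often a vector-store or keyword lookup.
    return "Refund policy: customers may return items within 30 days."

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat-completion API")

def grounded_answer(question: str) -> str:
    context = search_company_docs(question)
    prompt = (
        "Answer the question using only the context below. If the context is "
        f"insufficient, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)   # the context counts against the model's input limit
```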

Sidebar: the art of engineering 

While the tools for creating software - math, data storage, and code - are precise, the way we wield them is more like artistry. A tattoo artist is not just a machinist. A developer is not just an engineer. In this era we no longer merely program machines; we train them. We do not just talk to them in code but in natural language.

Prompt engineering requires knowledge of language, data science, nuance and metaphor, grammar, linear algebra, and ontology. It is the perfect mashup of art and science. 

The names Artium and Artisan are now more relevant than ever. 

What comes next?

More learning. More experimenting. Hardened apps. Most importantly we need to try new ideas, not be afraid of the pace of change, and balance our desire for perfection with our drive for iteration. We need to create systems that truly embrace modularity and allow AI to take over some of the burdens that would normally be on the code, on the user, or on a stakeholder/admin. 

Fault-tolerant systems with redundancies become more important. LLMs are trained on human data and include all of the biases and inaccuracies that this entails. They can hallucinate and lie, and they can be creative. This means that we need systems that can work to prevent these outcomes but accept that we cannot create systems that eliminate them. This means becoming innovative about how we design both the back and front end of our systems to account for these human-derived AI failings. 

Go forth and create.

- Cauri Jaye, Fractional CXO at Artium

*It is important to note the date of this document, as mentioned above, as the space is moving so very quickly. Some items here may rapidly become out of date, some will need revision, and others will be solidified as fact. Please keep this in mind as we explore, expand, and capitalize on this emerging space.
