AI/LLM

AI's Evolving Role in the Enterprise: Navigating the New Data Frontier 

By cauri jaye on Feb 20, 2024

This blog post is the first in a series explaining the new frontier of AI in enterprise. “AI changes everything” is a common phrase. These posts look at what these changes entail based on current trends and what we, at Artium, are seeing in the trenches as we build the platforms and products that underscore this new domain. 

As enterprises stand at the cusp of a technological revolution, the role of Artificial Intelligence (AI) in reshaping the landscape of business and innovation becomes increasingly pivotal. The journey of AI, mainly through its integration with Large Language Models (LLMs), signals a profound shift in how we harness the vast reservoirs of digital data to create intelligent systems capable of logic, reasoning, and learning. As we venture beyond 2023, the data crucial for training these digital brains undergoes a significant transformation.

The shift in data paradigms

Historically, the internet's publicly and commercially accessible digital data served as the cornerstone for training LLMs. However, the proliferation of AI-generated content has now saturated this data pool, signalling a critical shift: away from reliance on pre-processed, human-curated data and toward other sources. Synthetic data, generated to fill gaps in real-world data or to enhance datasets for AI training, plays a pivotal role, especially where privacy concerns or the rarity of data pose challenges.

Embracing synthetic data: bridging gaps in AI training

The emergence of synthetic data represents an important advancement in AI. Defined as artificially generated data that mimics real-world data, synthetic data serves as a crucial tool in situations where actual data is rare, expensive to collect, or privacy-sensitive. This innovative approach allows researchers and developers to simulate a wide range of scenarios for AI training without the limitations of scale or ethical concerns associated with real-world data collection.

Examples of Synthetic Data Use:
  • Computer Vision: Generating synthetic images to train visual recognition systems, helping them recognise objects in varied lighting and from different angles without the need to capture thousands of real photos.

  • Natural Language Processing (NLP): Creating artificial text data to improve language models, enabling them to understand and generate human-like text across diverse languages and dialects.

  • Healthcare: Producing synthetic patient records for medical research, ensuring privacy compliance while providing valuable data for disease prediction models.

Creating high-quality synthetic data is a sophisticated process that relies on advanced algorithms, such as Generative Adversarial Networks (GANs), to ensure the generated data accurately reflects the complexities of real-world phenomena.
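To make the healthcare example above concrete, here is a minimal sketch of generating synthetic patient records. This is a deliberate simplification, not a GAN: the field names, value ranges, and the age-blood-pressure correlation are all hypothetical assumptions for illustration. A real project would fit these distributions to privacy-cleared statistics from actual records.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical field ranges; a real project would fit these
# distributions to statistics derived from actual records.
AGE_RANGE = (18, 90)
CONDITIONS = ["hypertension", "diabetes", "asthma", "none"]

def synth_patient_record(patient_id: int) -> dict:
    """Generate one synthetic patient record with no link to a real person."""
    age = random.randint(*AGE_RANGE)
    return {
        "id": f"SYN-{patient_id:05d}",  # synthetic ID, not a real identifier
        "age": age,
        "condition": random.choice(CONDITIONS),
        # Crude assumed correlation: systolic pressure drifts up with age.
        "systolic_bp": round(random.gauss(110 + age * 0.4, 10)),
    }

# A thousand records, produced in milliseconds, with zero privacy exposure.
dataset = [synth_patient_record(i) for i in range(1000)]
```

Even a toy generator like this illustrates the core appeal: scale and privacy compliance come for free, while the hard work shifts to making the generated distributions faithful to reality.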

The release of Sora, OpenAI's video generation model, opens a whole new era for synthetic data. Much more than a creative tool for entertainment videos, Sora acts as a kind of physics engine that can model the real world, based on physics and potentially psychology.

Just as LLMs displayed unexpected logical abilities, image diffusion models displayed an unexpected, emergent ability of their own: they developed a sense of 3D space from 2D images. Sora appears to have an emergent ability to model physics. The effects can be seen in some of Sora's example videos, where physics at different scales is on display, like coffee in a cup moving differently from an ocean.

Organisations, governments, and companies can leverage this modelling of the real world to simulate experiences in a million ways without having to enact them, enabling advanced learning with relatively minimal resources. This will change manufacturing, customer service, AI model training, product development, service design, organisational structures, drug development, film scripts, conferences, architecture… almost anything we can imagine.

While synthetic data expands the horizons of AI training, its creation requires significant computational resources and expertise to ensure accuracy and relevance.

The limitations of synthetic data

Despite its benefits, synthetic data alone cannot fulfil the insatiable data needs of AI development. It offers a valuable supplement, especially where actual data is scarce or sensitive, but comprehensive AI training demands a more direct engagement with the real world. This is where sensor-driven data collection comes into play, offering a stream of real-world data that synthetic methods can't fully replicate. The intricacy of real-life scenarios, with their unpredictable variables and nuanced details, demands the richness of sensor-collected data to truly evolve AI's understanding of the world.

The dawn of sensor-driven AI

The Consumer Electronics Show (CES) 2024 showcased a future where everyday appliances, from the Samsung Anyplace Induction Cooktop to the BESPOKE 4-Door Flex™ Refrigerator with AI Family Hub™+, are equipped with sensors that collect real-time data, ushering in a new era of AI training. These developments highlight an emerging paradigm where AI can learn from an unfiltered reality, directly perceiving the world through sensors rather than through the lens of human consciousness.

As we embrace this shift, the regulatory and ethical considerations of collecting data via cameras and microphones emerge.
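As a concrete sketch of how those regulatory considerations might surface in practice, here is a minimal, hypothetical example of consent-aware sensor event collection. The device name, field names, and function are all illustrative assumptions, not any vendor's actual API; the point is simply that consent can be enforced at the moment of capture.

```python
import json
import time
from typing import Optional

def make_sensor_event(device_id: str, sensor: str, value: float,
                      consented: bool) -> Optional[dict]:
    """Build a sensor event record, or refuse if the user has not opted in."""
    if not consented:
        return None  # never record data the user has not consented to share
    return {
        "device_id": device_id,  # hypothetical appliance identifier
        "sensor": sensor,
        "value": value,
        "ts": time.time(),
    }

# An opted-in reading from a (hypothetical) smart cooktop, serialised
# for a downstream AI-training pipeline.
event = make_sensor_event("cooktop-42", "surface_temp_c", 182.5, consented=True)
payload = json.dumps(event)
```

Baking the consent check into the capture function, rather than filtering later, is one way to keep collection aligned with regulation by construction.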

Humanising AI: beyond design to connection

The key to integrating AI into our daily lives lies not solely in anthropomorphic design but in creating devices that resemble pets or humans, entities our psychology naturally tends to humanise. This inclination makes it more acceptable for these devices to carry sensors, as seen in the emergence of robotic assistants and pet-like robots seamlessly blending into our environments. By building devices that evoke a sense of companionship, we facilitate a deeper acceptance and integration of sensor-equipped technologies into our lives, enabling AI to gather a richer array of data from our daily routines.

Learning vs coding: navigating the complexities of AI training

We must acknowledge the nuanced perspective that this form of data collection, while seemingly invasive, is essential for the training and development of AI. People train AI rather than explicitly programming it, mirroring the human learning process. Humans absorb information from both direct experience and secondary sources like books. Book learning gives us a lot of information about the world, but all of it pre-filtered through the interpretations of others. This is akin to an AI learning from internet data. Most of our foundational learning, however, comes from our senses as we interact with the real world.

Giving AI access to digital sensors offers it its own version of real-world experience, enabling it to learn and grow its understanding. Just as absorbing film and literature teaches it about humanity and human values, sensors teach it about real life. While we, as humans, don't retain every minute detail of our experiences, the collected data shapes our understanding, abilities, and even our personalities. AI operates on a similar principle. The information it gathers from our daily lives isn't about surveilling every aspect but about understanding patterns and contexts to make a more intelligent AI.

This process, though it may initially seem 'creepy,' is akin to how we learn and grow. By framing AI's data collection in this light, we can better appreciate the necessity of such interactions for AI to truly enhance our lives.

What to do with all that data

In the landscape reshaped by AI, proprietary data emerges as a crucial asset, more valuable than ever in the quest for AI excellence. This shift opens up novel avenues for enterprises to capitalise on their unique data repositories. Enterprises can transform their proprietary data into a strategic asset, contributing to the collective knowledge base for training LLMs. This approach enhances the AI's learning curve and positions the enterprise as a leader in the AI domain.

Sharing of mostly anonymised data must happen in near real time, necessitating the creation of new, secure data pipelines accessible through APIs. Navigating this new data frontier requires a balanced approach, where enterprise leaders couple innovation with ethical stewardship. Expert teams must be guided by respect for privacy and a commitment to transparency when collecting, using, and sharing data.
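One common building block for such a pipeline is pseudonymisation before a record ever leaves the enterprise. The sketch below is a hypothetical illustration using a keyed hash; the field names, the secret, and the sample record are all invented for the example, and a production system would add much more (key management, aggregation, auditing).

```python
import hashlib
import hmac

# Hypothetical secret held by the data owner; rotating it breaks linkability
# between data shared before and after the rotation.
PEPPER = b"example-secret-rotate-me"

def pseudonymise(user_id: str) -> str:
    """Replace a real identifier with a keyed hash before the record is shared."""
    return hmac.new(PEPPER, user_id.encode(), hashlib.sha256).hexdigest()[:16]

def prepare_for_sharing(record: dict) -> dict:
    """Strip direct identifiers and pseudonymise the join key."""
    shared = {k: v for k, v in record.items() if k not in {"name", "email"}}
    shared["user_id"] = pseudonymise(record["user_id"])
    return shared

raw = {"user_id": "u-1001", "name": "Ada", "email": "ada@example.com",
       "usage_minutes": 37}
safe = prepare_for_sharing(raw)
```

Because the keyed hash is deterministic, records about the same user can still be joined downstream for training, while the raw identifier, name, and email never cross the API boundary.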

Harnessing AI for innovation and opportunity

For enterprises, this evolution of AI offers a frontier brimming with opportunities. Beyond merely augmenting existing products, the potential exists to create new revenue streams by selling this newly desired sensor data and by developing innovative AI-driven services. This paradigm shift demands a reevaluation of business models, as AI not only optimises but also transforms how value is created and delivered.

As enterprises continue to explore the potential of AI, the role of data—proprietary, sensor-derived, and synthetic—becomes increasingly central. This comprehensive approach to data utilisation fuels the advancement of AI technologies and ensures that teams feel equipped to create systems that handle the complexities of the real world. The journey ahead promises a landscape where AI-driven innovation feels limitless, guided by ethical use and strategic integration of diverse data sources.