Computer-made data for AI training is now the norm in many industries.
We have all heard of platforms like ChatGPT that are trained on massive amounts of ‘real world’ data – but did you know that there is a growing trend for training AI using data that has itself been computer-made?
I don’t know about you, but that feels like being stuck in an infinite loop that gets more and more ‘not real’ as it goes along…
[If this concept is completely new to you, you might like to read this post first: https://bellastjohn.com/harnessing-synthetic-data-advancing-ai-training-while-navigating-challenges/]
What the “Miracle on the Hudson” taught us about relying on computer-made data for AI training
In the world of aviation, Captain Chesley “Sully” Sullenberger became a household name for his miraculous emergency landing of US Airways Flight 1549 on the Hudson River in 2009, saving the lives of all 155 people on board. His exceptional skills and quick decision-making showcased the vital role of human judgment in critical situations. However, an often-overlooked aspect of this remarkable incident lies in the lessons it offers about the potential pitfalls of relying too heavily on computer-made data to train AI models.
In the aftermath of the “Miracle on the Hudson,” investigators ran flight simulations to test whether the aircraft could have made it back to an airport instead of landing on the water. While these simulations provided valuable insights into the sequence of events and the performance of the aircraft systems, the scenarios in which the simulator pilots turned back immediately after the bird strike did not reflect the human element: the time needed to recognise what had happened, along with the split-second decisions, instincts, and experience that Sullenberger and his co-pilot, Jeffrey Skiles, brought to the table.
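To see why leaving out human reaction time matters, here is a toy sketch in Python. It is not a model of Flight 1549 or of the investigators' simulator: the function name `can_reach_runway` and every number in it (altitude, distance, speed, glide ratio, delay) are made up for illustration, and it ignores the turn, the wind, and everything else a real simulation would include.

```python
def can_reach_runway(altitude_m, distance_m, speed_mps, glide_ratio, reaction_delay_s):
    """Toy check: can an unpowered aircraft glide back to a runway?

    Assumption (hypothetical model): during the reaction delay the aircraft
    keeps flying AWAY from the runway while losing height, and only then
    starts gliding straight back.
    """
    # Ground covered while the crew is still working out what happened.
    travelled_m = speed_mps * reaction_delay_s
    # Height lost over that ground, based on the glide ratio.
    altitude_after_m = altitude_m - travelled_m / glide_ratio
    # The runway is now further away than when the engines failed.
    distance_after_m = distance_m + travelled_m
    # Maximum glide range from the remaining height.
    glide_range_m = altitude_after_m * glide_ratio
    return glide_range_m >= distance_after_m


# Made-up scenario: 900 m up, 12 km from the runway, 100 m/s, glide ratio 17.
scenario = dict(altitude_m=900, distance_m=12_000, speed_mps=100, glide_ratio=17)

for delay in (0, 30):
    outcome = can_reach_runway(reaction_delay_s=delay, **scenario)
    print(f"reaction delay {delay:>2} s -> reaches runway: {outcome}")
```

With these invented numbers, the "perfect" pilot who reacts instantly makes it back and the one who needs 30 seconds does not. A dataset generated only from the first kind of run would teach a model that turning back is always safe.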
This critical omission underscores a fundamental challenge in the development and training of AI models: the danger of overlooking human reactions and response times when relying on computer-made data. Here are some important lessons we can draw from this incident:
1. The Complexity of Human Decision-Making: The quick thinking and sound judgment exhibited by Captain Sullenberger and his crew in an emergency situation like Flight 1549 cannot be replicated solely through computer-generated scenarios. Human decision-making involves intuition, experience, and a deep understanding of context—a complexity that is challenging to simulate accurately.
2. Real-World Factors and Uncertainty: Simulations are typically based on known variables and data, but real-life emergencies often involve unexpected factors and uncertainties. Human operators have the ability to adapt and make critical decisions in response to evolving situations, which is difficult to capture in a controlled synthetic environment.
3. The Role of Stress and Pressure: In high-stress situations, human performance can be significantly affected. Stress and pressure can lead to heightened awareness and focus, but they can also impair decision-making. Understanding how humans react under pressure is essential for designing AI systems that can complement human actions effectively.
4. The Need for Hybrid Models: The limitations of computer-made data in replicating human decision-making suggest the importance of hybrid models that integrate both synthetic and real-world data. Such models can better prepare AI systems for handling complex, high-stakes scenarios by accounting for the unpredictable nature of human responses.
5. Ethical Considerations: The reliance on computer-made data alone can have ethical implications, especially in fields like aviation, healthcare, and autonomous vehicles, where human lives are at stake. Ensuring that AI systems are trained to make ethical decisions in critical situations requires a nuanced understanding of human values and moral judgment.
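Lesson 4 can be sketched in a few lines of Python. This is only an illustration of the idea of blending the two kinds of data, not a recipe from any particular AI system: the function name `build_hybrid_dataset`, the `synthetic_fraction` parameter, and the sample records are all hypothetical.

```python
import random


def build_hybrid_dataset(real_records, synthetic_records, synthetic_fraction, seed=0):
    """Keep every real record and top up with synthetic ones.

    synthetic_fraction is the (approximate) share of the final dataset
    that may be synthetic, e.g. 0.5 means roughly half and half.
    Returns a shuffled list of (source, record) pairs so that each
    record's origin stays traceable.
    """
    if not 0 <= synthetic_fraction < 1:
        raise ValueError("synthetic_fraction must be in [0, 1)")

    # Number of synthetic records needed to hit the requested share.
    wanted = int(len(real_records) * synthetic_fraction / (1 - synthetic_fraction))
    wanted = min(wanted, len(synthetic_records))

    rng = random.Random(seed)  # seeded so the mix is reproducible
    chosen = rng.sample(list(synthetic_records), wanted)

    hybrid = [("real", r) for r in real_records] + [("synthetic", s) for s in chosen]
    rng.shuffle(hybrid)
    return hybrid


# Hypothetical records: 10 logged from real events, 100 from a simulator.
real = [f"real-event-{i}" for i in range(10)]
synthetic = [f"simulated-event-{i}" for i in range(100)]

hybrid = build_hybrid_dataset(real, synthetic, synthetic_fraction=0.5)
print(len(hybrid), "records in the hybrid dataset")
```

Tagging each record with its source is a deliberate choice here: it lets you check later whether a model behaves differently on the real records than on the simulated ones.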
Why use computer-made data?
I get that training AI models takes a huge amount of data, and it is obvious how using ‘fake data’ to test the security systems of financial institutions, for instance, might make sense.
…but it is not stopping there.
Examples of computer-made data for AI training
I asked ChatGPT to provide a list of 20 industries that use computer-made data for AI training, along with a concern for each:
- “Healthcare Technology Companies:
  - Concern: Privacy and security concerns when generating synthetic medical data to protect patient information.
- Financial Services and Banking Institutions:
  - Concern: Ensuring that synthetic data for fraud detection and risk assessment doesn’t introduce biases or ethical issues.
- Autonomous Vehicle Manufacturers:
  - Concern: The accuracy of synthetic data in simulating real-world driving scenarios and safety implications.
- Insurance Companies:
  - Concern: The fairness and accuracy of synthetic data used for pricing, underwriting, and claims assessment.
- Retail and E-commerce Platforms:
  - Concern: The potential for synthetic data to affect personalized recommendations and customer profiling.
- Manufacturing and Supply Chain Management Companies:
  - Concern: The reliability of synthetic data for optimizing production processes and logistics.
- Agriculture Technology Firms:
  - Concern: The effectiveness of synthetic data in modeling crop yield predictions and pest management.
- Energy and Utilities Providers:
  - Concern: The accuracy of synthetic data for grid optimization and energy demand forecasting.
- Aerospace and Defense Contractors:
  - Concern: The realism of synthetic data for simulating aircraft performance and military scenarios.
- Environmental Monitoring Organizations:
  - Concern: The validity of synthetic data in climate modeling and predicting environmental changes.
- Education Technology Startups:
  - Concern: Ensuring the relevance and fairness of synthetic data for personalized learning and educational assessments.
- Entertainment and Gaming Companies:
  - Concern: The authenticity of synthetic data used in video game environments and virtual reality experiences.
- Human Resources and Talent Management Firms:
  - Concern: Bias and fairness issues in synthetic data used for recruitment and workforce analytics.
- Real Estate and Property Management Companies:
  - Concern: The reliability of synthetic data for property valuation and market analysis.
- Telecommunications Providers:
  - Concern: Privacy considerations when generating synthetic data for network optimization and customer behavior analysis.
- Government Agencies (e.g., Census Bureau):
  - Concern: Ensuring the accuracy and fairness of synthetic data for policy-making and resource allocation.
- Logistics and Delivery Services:
  - Concern: The realism of synthetic data used to optimize delivery routes and predict package delivery times.
- Social Media Platforms:
  - Concern: The potential impact of synthetic data on content recommendation algorithms and user experiences.
- Pharmaceutical and Biotech Companies:
  - Concern: Ethical and regulatory issues when using synthetic data in drug discovery and clinical trials.
- Legal Services and LegalTech Startups:
  - Concern: Ensuring the validity and fairness of synthetic data used for legal research and predictive analytics.”
It is also interesting that when I asked a couple of friends in real estate a few days ago whether they use AI, they both replied ‘no’, citing a perceived lack of reliability in the data (a concern that also appears in the list above).
Incorporating human factors into AI training is not a straightforward task, but it is essential for creating AI systems that can operate effectively and safely in the real world. While simulations and synthetic data have their place in training models, they should complement, not replace, the rich insights gained from real-world experience.
The “Miracle on the Hudson” serves as a powerful reminder that AI, no matter how advanced, cannot entirely replace the judgment, instincts, and adaptability of humans in high-stakes situations. As we continue to develop AI systems for various applications, it is crucial that we remain cognizant of the limitations of computer-made data and strive for a more holistic approach that honours the irreplaceable role of human expertise and experience in complex decision-making processes.
~ Bella