Data’s First Lesson – It’s Not What It Appears. - Western Growers Association

May 14, 2025

Data’s First Lesson – It’s Not What It Appears.

The promise of artificial intelligence (AI) and big data is expanding, with stated implications ranging from making our lives a bit easier with automatic orders by our refrigerators for items running low, to the slightly more complex problem of solving climate change and world peace. As with most things, the truth of what AI will offer us (especially in the foreseeable future) largely lies somewhere in between a fully stocked fridge and fixing the universe.

Do AI and big data offer solutions to the produce industry? Most definitely.

Are we there yet? Most certainly not.

Western Growers Association (WGA), in collaboration with Crème Global, launched GreenLink® in 2022 as an effort to accelerate data-driven solutions for food safety. GreenLink® is a cloud-based platform that allows fresh produce growers to share food safety data confidentially, while structuring their data in formats that make visualization and meaning clearer for the individual participants and for the industry as a whole. For any given participant, GreenLink® unlocks the value of their cumulative dataset, giving insight into years of data collection and investment. Collectively, participants’ data is aggregated and anonymized, which opens additional opportunities for accelerated and augmented learnings. As an example, a grower who collects 500 tissue samples of leafy greens annually for pathogen testing instantly gains access to an industry’s worth of sample volume in GreenLink®, which now houses years of data: more than a quarter of a million tissue samples and more than six and a half million data points.
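One way to see why pooled sample volume matters is a back-of-the-envelope calculation. The sketch below is not part of GreenLink®. It assumes, purely for illustration, that every sample tested negative, that samples are independent, and that all samples were collected and tested in comparable ways (real field data may not satisfy any of these). Under those assumptions, it computes the highest contamination rate that would still be consistent, at 95% confidence, with seeing zero positives in a given number of samples. The function name and the 95% level are choices made for this example.

```python
import math

def max_rate_given_zero_positives(n_samples, confidence=0.95):
    """Upper bound on the true positive rate when 0 of n samples are positive.

    Solves (1 - p) ** n = 1 - confidence for p. Assumes independent,
    comparable samples, which real field data may not satisfy.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    alpha = 1.0 - confidence
    # expm1 keeps precision when the resulting bound is very small
    return -math.expm1(math.log(alpha) / n_samples)

# Sample counts taken from the text; the all-negative scenario is hypothetical.
single_grower = max_rate_given_zero_positives(500)
pooled = max_rate_given_zero_positives(250_000)

print(f"500 samples:     rate could still be up to {single_grower:.3%}")
print(f"250,000 samples: rate could still be up to {pooled:.5%}")
```

In this idealized scenario, 500 clean samples can only rule out rates above roughly 0.6%, while 250,000 clean samples can rule out rates above roughly 0.001%. The gain holds only to the extent that the pooled samples are truly comparable.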

With increasing amounts of data to make decisions, new questions emerge and evolve regarding the most efficient means to manage food safety risk. Do we know enough from the data to recommend change? When and how do you know that you know enough? What if the observations change? Are the findings related to certain practices, areas, or inputs? Is there something else we can measure to give us a heads-up on risk? Can you quantify the risk reduction from our effort? Are the test methods really the same? Which method should I use? How was my sample actually collected, and are we sure it is representative? Was the system designed to “find” issues? Or was the data collected to provide evidence of compliance?

While we have a way to go in our effort to completely embrace data and to unlock the promise of finding unrealized solutions, we have already learned some invaluable data lessons.

The Lesson of Data Humility: Our first lesson is that of data humility – while we may have been collecting data for years, in most cases we are starting at ground zero in our data strategy by working on the structural foundation on which we will build grander data-driven solutions. Despite decades’ worth of data collection, datasets are often limited, siloed, biased, missing elements, or generally of low quality. In these instances, no quantity of low-quality data can overcome the fractured foundation it creates; you must either revisit and rebuild that foundation (if available) or start fresh with an intentionally designed structure. High-quality data is essential for training effective data models because it shapes the model’s accuracy, reliability, and fairness. Without clean and representative inputs, even the most advanced algorithms can produce inconsistent results. As a metaphor, think of data sharing and data programs as a mirror with amazing lighting. When you glance in the mirror from afar, the data looks wonderful, but as you approach the mirror and linger on the reflection, you begin to see all the data’s flaws, wrinkles, and limitations. As such, the first official lesson comes with the realization that what we thought we knew about our process is far more limited than we had hoped.
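A first look in that mirror can be as simple as counting the obvious flaws. The following is a minimal sketch, not GreenLink® functionality: it tallies missing fields, repeated sample IDs, and the spread of test-method labels in a handful of records. The field names, the sample records, and the `audit_records` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical schema; a real program would define its own required fields.
REQUIRED_FIELDS = ("sample_id", "collected_on", "commodity", "test_method", "result")

def audit_records(records):
    """Summarize basic quality problems in a list of dict records."""
    missing = Counter()   # how often each required field is empty or absent
    methods = Counter()   # test-method labels after trimming and lowercasing
    seen_ids = set()
    duplicate_ids = 0
    for rec in records:
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
        sample_id = rec.get("sample_id")
        if sample_id is not None:
            if sample_id in seen_ids:
                duplicate_ids += 1
            seen_ids.add(sample_id)
        method = rec.get("test_method")
        if method:
            methods[str(method).strip().lower()] += 1
    return {
        "total": len(records),
        "missing_by_field": dict(missing),
        "duplicate_ids": duplicate_ids,
        "test_methods": dict(methods),
    }

# Made-up records showing a blank date, a repeated ID, and a missing result.
records = [
    {"sample_id": "A1", "collected_on": "2024-03-01", "commodity": "romaine",
     "test_method": "PCR", "result": "negative"},
    {"sample_id": "A2", "collected_on": "", "commodity": "romaine",
     "test_method": "pcr ", "result": "negative"},
    {"sample_id": "A2", "collected_on": "2024-03-03", "commodity": "spinach",
     "test_method": "culture", "result": None},
]

report = audit_records(records)
print(report)
```

A summary like this does not repair anything. It only shows where the foundation is cracked, for example whether results recorded under different test methods can fairly be compared.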

The Lesson of Being Honest: Our first lesson from data is, to be honest, somewhat demotivating – we never stop wanting the 100% on the test, and it can be deflating to recognize that our data investment didn’t quite hit the mark. Despite this, the gift of the lesson is that once we are required to slow down long enough to see the flaws in our data and collection system, we are fully equipped to restructure the questions we would like our data to answer. This restructuring does not happen only individually; with data-sharing programs, we can accelerate it across entire industries. With a good hypothesis of what we would like to know, we can set the foundation for data collection so that it gives us the best chance of teaching us what we want to know in the future. With a thoughtful construction of the data system (sampling, testing methodologies, data structure), we can change course and design a path to our intended learning destination. These foundational efforts will ultimately usher in brighter, bigger solutions augmented with AI tools that will offer even greater learnings and models.

The Lesson of Patience: Working with data in the field of food safety is a long game. Obtaining transformative insights from AI and these large datasets will not happen overnight. The programs and insights are the product of consistent, iterative work. In addition, there needs to be a willingness to refine and revisit what we have done in the past. It is easy to become impatient or to ask when we will know the outcome of these programs, but progress, especially when building programs, builds slowly and compounds; it is not a straight line. First comes a clean dataset, then a few more variables, then specifically designed trials. These refinements compound over time, ultimately enabling learning and proving hypotheses.

Data sharing isn’t just about the data you share – it’s about unearthing the limitations and gaps in existing data points. While it is always nice in retrospect to understand what happened in an adverse food safety event, our goal should be designing systems that prevent those adverse events from ever happening. If we think of most of the data we collect in our processes, manufacturing plants, and overall food supply chain, it is safe to say these data points most likely were not designed to identify the problem before it happened. Intention matters. We need a different system. Data and data-driven solutions will always be biased, intentionally or unintentionally, by how we collect the data and the initial questions we ask of it. Through data sharing and transparency amongst and across industries, we can design and standardize what we look for in food safety data. We can recognize that data collection for the sake of data collection is costing us far more than what we spend on the tests, and we can identify the low-value activities that should be deprioritized to free up resources that could be better spent elsewhere.

Interested in building intentional data collection systems?

Make your data work as hard as you do. There’s safety in numbers – join us.
