Interest in farm data has heightened since precision agriculture technology was commercialized. Venture capitalists, agricultural service providers, farmers, and policy makers are all concerned with farm data. Farm data are prone to inherent problems and analysis shortcomings that usher in the notion of “Big Data”. Farm data are known to be messy, especially data from combine yield monitors, and analysts are concerned with data validity, especially given that other people may have affected data quality at various steps along the data path.
Analysts tasked with managing yield monitor data want to know how those data were handled before reaching their computer. Analysts question whether crop sensors were properly calibrated, which observations have already been deleted, and whether any data points were relocated to adjust for flow delays.
Tools such as USDA ARS Yield Editor were developed to clean yield monitor data by flagging potentially erroneous observations and relocating data to more appropriate locations. Using these tools increases the quality of yield monitor data, but only if the proper processes are correctly performed. Documentation of changes that is inseparable from the data has not been adequately implemented. Uncertainty remains about whether correct processes were actually performed on data coming from farmers’ fields. Although discerning analysts request original yield monitor data files, these data often arrive at the analyst’s computer as delimited text files or shapefiles. A data analyst may receive yield monitor data from a farmer or crop advisor but have no information on how those data were previously manipulated or by how many people. Data may have been subjected to rigorous cleaning protocols or potentially modified with nefarious intentions. Given these uncertainties about prior data management, the prudent analyst has very little confidence that analyses will be reliable.
The Analyst’s Problem is analogous to the “whisper game” (or “telephone game”) that many people played in elementary school, in which children line up and each whispers to the next child the sentence they were just given. The last child announces what they heard, and it is compared with the original statement; the two rarely match. The moral of the story: if the analyst does not know how the data were manipulated by potentially every person along the line (e.g., farm employee, farm operator, crop consultant, technical service provider, sales agronomist, others), then results based on analysis of those data are unlikely to be trusted. Now that the Analyst’s Problem has been identified, solutions can be explored.
MORE BY TERRY GRIFFIN
Regardless of whether data originate from logged or CAN processes, or are manipulated on desktop or cloud-based computing systems, uncertainties remain regarding how the data were managed. One possible solution to the Analyst’s Problem of uncertain data quality from prior data manipulation could be distributed ledger technology (DLT). Distributed ledger technology ensures that data have not been inappropriately manipulated or, at the very least, documents what changes have been made by specific individuals.
The impacts of distributed ledger technology, aka “blockchain”, on agriculture have generated buzz in the media and academia. Distributed ledgers are a way of producing consensus about the facts necessary for commerce to function. Ledgers are the basic transactional recording technology at the heart of all modern business. The coordination of distributed ledgers affects the value proposition of Big Data, especially with respect to traceability, trust, and data quality. For farm data communities to be operational in the long run, data quality assurance is necessary. Most agriculturalists conceptually understand how DLT benefits supply chains on both sides of the farmgate; however, many struggle with how DLT actually works.
Most blockchain conversations focus on merchandising and banking transactions, and on tracking agricultural inputs and production outputs, especially for traceability regarding food safety. Applying DLT not to merchandise or agricultural products (i.e., bags of seed arriving on a farm or bushels of corn leaving the farmgate) but to their controversial co-product, yield monitor data, may have near-immediate effects on Big Data.
Distributed ledger technology allows tracking of who manipulated yield monitor data and how that person manipulated the data. More than one person may be tracked along a process that may begin as early as calibration of the yield monitor. Although determining after the fact whether a yield monitor was calibrated may be impossible, DLT could be applied to data collected in previous growing seasons, assuming the proprietary file formats from the yield monitor are available. At the very least, tracking would occur from the time the sensor measures grain flow (assuming internet connectivity) to the current data analysis. Regardless of whether the analyst uses USDA ARS Yield Editor or a commercial farm software tool, the agricultural industry needs distributed-ledger-type tracking of who performed what manipulation on the data.
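To make the idea of tracking who did what more concrete, the following is a minimal sketch of a hash-chained change log in Python. It is not a distributed ledger: there is no network, no consensus among participants, and no digital signatures, and it is not how any particular farm software or blockchain platform works. The function names (`append_entry`, `verify_chain`) and the field names are hypothetical, chosen only for illustration. The sketch shows one property that makes ledgers useful here: each entry stores a fingerprint of the data and of the previous entry, so a later edit to any recorded step becomes detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(payload: dict) -> str:
    """SHA-256 of a dict, serialized with sorted keys so the hash is stable."""
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def append_entry(ledger: list, actor: str, action: str, data: bytes) -> dict:
    """Record who did what, plus a fingerprint of the data after the change."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else GENESIS_HASH
    body = {
        "index": len(ledger),
        "actor": actor,      # hypothetical role label, e.g. "crop consultant"
        "action": action,    # free-text description of the manipulation
        "data_hash": hashlib.sha256(data).hexdigest(),
        "prev_hash": prev_hash,
    }
    entry = dict(body, entry_hash=_digest(body))
    ledger.append(entry)
    return entry


def verify_chain(ledger: list) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for i, entry in enumerate(ledger):
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["index"] != i or entry["prev_hash"] != prev_hash:
            return False
        if _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


# Illustrative usage with made-up data and actors.
ledger = []
append_entry(ledger, "farm operator", "exported raw yield file", b"raw points")
append_entry(ledger, "crop consultant", "applied flow delay shift", b"shifted points")
print(verify_chain(ledger))            # chain is intact
ledger[0]["actor"] = "someone else"    # tamper with the recorded history
print(verify_chain(ledger))            # tampering is detected
```

In a real distributed ledger, copies of such a chain would be held by multiple parties who agree on each new entry, which is what prevents a single person from quietly rewriting the whole history.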
The distributed ledger must be linked to the data being recorded at the yield monitor sensor such that the two are inseparable. Preferably, the distributed ledger would exist before data are recorded by the sensor so that information on calibration would be included. Sensor calibration, or lack thereof, has been a farm data issue since the commercialization of yield monitors. Improper or insufficiently frequent calibration has caused analysts and farmers to question the validity of data and the resulting decision recommendations. Knowledge of how and when yield monitors were calibrated enhances confidence in data analysis and decision making.
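One way to picture a ledger that starts before the data and stays tied to it is sketched below in Python. This is an illustration under stated assumptions, not a description of any yield monitor or ledger product: the helper names (`start_ledger`, `record_step`, `matches_latest`) and the calibration fields are hypothetical, and the chaining and consensus that a real distributed ledger would add are deliberately left out. The sketch opens the ledger with a calibration entry, records a fingerprint of the data at each later step, and lets an analyst check whether a received file is the one the ledger last described.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def start_ledger(calibration: dict) -> list:
    """Open the ledger with a calibration entry, before any yield data exist."""
    return [{"step": "calibration", "details": calibration, "data_hash": None}]


def record_step(ledger: list, step: str, data: bytes) -> None:
    """Append a step along with the fingerprint of the data it produced."""
    ledger.append({"step": step, "details": None, "data_hash": fingerprint(data)})


def matches_latest(ledger: list, received: bytes) -> bool:
    """Check that a received file is the one the ledger last recorded."""
    latest = ledger[-1]["data_hash"]
    return latest is not None and latest == fingerprint(received)


# Illustrative usage; the calibration fields and file contents are made up.
ledger = start_ledger({"calibrated": True, "calibration_loads": 4})
record_step(ledger, "harvest recorded", b"yield points as logged")
record_step(ledger, "cleaned", b"yield points after cleaning")

print(matches_latest(ledger, b"yield points after cleaning"))  # same file
print(matches_latest(ledger, b"yield points edited offline"))  # undocumented change
```

Because the fingerprint is computed from the file itself, a file that was changed without a corresponding ledger entry no longer matches, which is one sense in which the record and the data can be made inseparable.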
Distributed ledger technology can be applied to farm data within the farmgate and beyond. Specifically, a need exists to track how yield monitor data have been managed, manipulated, and cleaned, including calibration of the yield monitor sensor. The concept of applying DLT to yield monitor data is likely obvious to agriculturalists who have dealt with the Analyst’s Problem. Current limitations include a lack of wireless connectivity in many locations where grain is harvested.
This blog post is a summary of our more in-depth article found on AgManager.info: “Distributed ledger technology applied to farm data: Tracking yield monitor data changes with Blockchain”.