Data-Driven Strategies and Machine Learning Shaping the Future of Agriculture
Last week I attended the 2018 INFORMS Conference on Business Analytics & Operations Research in Baltimore, MD. As part of the conference, several teams of researchers from universities and industry competed in challenges to show how their work is influencing the world. I had the honor of being on one of the finalist teams for the 2018 Syngenta Crop Challenge in Analytics.
The Syngenta Crop Challenge in Analytics was established in 2015 with funding provided by prize winnings awarded to Syngenta in connection with its receipt of the 2015 Franz Edelman Award for Achievement in Operations Research and the Management Sciences. This year’s Challenge asked participants to develop a quantitative framework for predicting corn hybrid performance in new, untested locations. The basis for developing such a model was a real-world dataset with soil and weather information from the previous 15 years and hybrid performance in 150,000 tests from the previous 10 years. There was also information on 20,000 genetic markers for each hybrid tested. The main difficulty of such a task lies in the complex interactions between environment and genetics.
Five finalist teams, two from Brazil and the others from Colombia, Germany, and the U.S., presented their models and results. All models were based on modern machine learning methods developed to deal with complex datasets, rather than the traditional statistical methods developed specifically for plant breeding analysis. Our team and the team from the U.S. used Deep Learning as the main model to deal with the interactions. The teams from Colombia and Germany used Random Forests, and the other team from Brazil used a Bayesian Network. The winners were the team from CIAT (International Center for Tropical Agriculture) in Colombia, with the work “Speeding up maize hybrids breeding schemes using machine learning”. This is an important demonstration of how scientific and technological developments in other areas can affect agriculture.
Another important fact about the finalist teams is that most of them were composed of agronomists. Although data analysis may not be a priority in most undergraduate courses in agriculture-related fields, the subject has gained interest among students. The ability to understand the implications of a method's results may be more important than understanding the model itself. For the machine running the model and making the predictions, numbers are just numbers, no matter what their real-life meaning is. However, even state-of-the-art methods still need some model definitions and hyperparameters to be supplied by the people conducting the analysis. At this point, understanding the whole problem can make a significant difference in prediction performance.
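To make the point about hyperparameters concrete, here is a minimal sketch in Python. It is not the method used by any of the finalist teams. It assumes a toy k-nearest-neighbors model, a single made-up environmental covariate `x`, and synthetic yields generated on the spot; every name in it (`make_toy_rows`, `knn_predict`, `leave_one_location_out_rmse`) is hypothetical. The one hyperparameter, `k`, is chosen by holding out one location at a time, which mimics the Challenge's setting of predicting performance in untested locations.

```python
import random
from statistics import mean


def make_toy_rows(seed=0):
    """Build synthetic (location, covariate, yield) rows. Purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for loc in range(8):          # 8 hypothetical locations
        for _ in range(10):       # 10 hypothetical trials per location
            x = rng.uniform(0, 10)             # made-up environmental covariate
            y = 5 + 0.8 * x + rng.gauss(0, 1)  # made-up yield response plus noise
            rows.append({"loc": loc, "x": x, "y": y})
    return rows


def knn_predict(train, x, k):
    """Average the yields of the k training rows whose covariate is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row["x"] - x))[:k]
    return mean(row["y"] for row in nearest)


def leave_one_location_out_rmse(rows, k):
    """Hold out each location in turn so every test site is unseen during fitting."""
    squared_errors = []
    for held_out in sorted({row["loc"] for row in rows}):
        train = [r for r in rows if r["loc"] != held_out]
        test = [r for r in rows if r["loc"] == held_out]
        for r in test:
            error = knn_predict(train, r["x"], k) - r["y"]
            squared_errors.append(error ** 2)
    return mean(squared_errors) ** 0.5


rows = make_toy_rows()
candidate_ks = [1, 3, 5, 9, 15]   # the values to try are the analyst's choice
scores = {k: leave_one_location_out_rmse(rows, k) for k in candidate_ks}
best_k = min(scores, key=scores.get)

for k in candidate_ks:
    print(f"k={k:2d}  held-out RMSE={scores[k]:.3f}")
print("selected k:", best_k)
```

Two decisions here come from the analyst rather than the algorithm: which values of `k` to try, and how to split the data. Splitting by location instead of at random is exactly the kind of choice that depends on understanding the agronomic question being asked.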
One of the points where all this intersects with precision agriculture is when we add another important factor to our model of crop response: management. Yields are affected not only by the interaction between genetics and environment but also by the interaction with management practices. Every decision a farmer or crop consultant makes, from before planting until harvest, will affect the final output. Data-driven strategies such as the ones developed for this challenge have great potential to help the industry breed better seeds using less time and fewer resources. That is only the tip of the iceberg. Other data-intensive farm management technologies, which consider the whole production system and are based mostly on on-farm trials, can be used to fill the larger gap, which is to account for the management interaction. Once enough data is gathered and robust models are developed, we can expect larger increments in yields every year. This will play an essential role in answering the question behind the challenge: How will we be able to grow enough food to meet world demand?