Random Forest and Support Vector Machine models were trained on environmental and economic data from 15 Indian states to optimise crop recommendations. Although initial 10-fold cross-validation yielded high accuracy, incorporating temporal structure through time-series split validation and lag variables improved model adaptability and identified Random Forest as the preferred algorithm.
Optimising agricultural practices remains a persistent challenge, particularly in regions susceptible to environmental fluctuations and economic pressures. Addressing low productivity requires approaches beyond conventional methods, and data-driven solutions are increasingly being explored. A new study investigates whether machine learning can refine crop selection by integrating environmental and economic data to improve yields and profitability. Steven Sam and Silima Marshal D’Abreo, both from Brunel University London, describe the work in the article ‘Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection’. They evaluate Random Forest and Support Vector Machine (SVM) models across 15 Indian states and show that accounting for temporal dependencies is essential for robust predictions.
Machine Learning Optimises Crop Recommendations for Indian Agriculture
Recent research demonstrates the potential of machine learning to address persistent challenges in Indian agriculture, specifically low productivity and yield variability. Data-driven approaches can optimise crop selection and so improve outcomes for farmers. This study focuses on a system that recommends the most suitable crop from a combination of environmental and economic factors.
The study compares two algorithms for crop recommendation: Random Forest and Support Vector Machines (SVM). Random Forest is an ensemble method that builds many decision trees and combines their predictions, while an SVM seeks the hyperplane that separates the classes with the widest margin. Both models are trained on environmental conditions, including rainfall, temperature, and soil characteristics, alongside economic factors such as market prices and input costs, across multiple states and crops. Initial evaluation with 10-fold cross-validation, which splits the data into ten subsets and in turn holds out each one for testing while training on the other nine, reports accuracy above 99% for Random Forest and close to 95% for SVM.
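As a minimal sketch of the 10-fold scheme described above, the function below generates shuffled k-fold splits in plain Python. The name `k_fold_splits`, the fixed `seed`, and the toy sample count are illustrative assumptions rather than the study's code; in practice a library utility such as scikit-learn's `KFold` would usually do this job. Note that the shuffle mixes records from different years across folds.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Shuffling ignores time: a test fold may hold early years while training holds later ones.
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one extra sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = sorted(indices[start:start + size])
        train = sorted(indices[:start] + indices[start + size:])
        yield train, test
        start += size


# Usage: each of the 10 folds is held out once while the other nine train the model.
for train_idx, test_idx in k_fold_splits(25, k=10):
    print(f"train={len(train_idx):2d} test={test_idx}")
```

Every sample lands in exactly one test fold, which makes the procedure efficient with limited data but says nothing about whether a model can predict a season it has not yet seen.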
However, high cross-validation scores can be misleading. A model may overfit, performing well on training data but poorly on unseen data. Random folds also ignore the order in which agricultural data are collected, so a model can learn from later observations when predicting earlier ones, which tends to inflate accuracy. To better reflect real-world agricultural dynamics, the study incorporates temporal dependencies into model development. Time-series split validation, which trains only on past data and tests on future data, yields lower accuracy than standard cross-validation. This highlights the importance of accurately modelling the temporal aspects of crop growth and yield.
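The expanding-window idea behind time-series split validation can be sketched as follows, assuming the records are already sorted chronologically (for example by year). The helper `time_series_splits` is hypothetical and not from the study; it follows the same expanding-window scheme as scikit-learn's `TimeSeriesSplit`, where each fold trains on everything before a block of samples and tests on that block.

```python
from typing import Iterator, List, Tuple


def time_series_splits(
    n_samples: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield expanding-window (train, test) index splits.

    Assumes samples are already sorted in chronological order (e.g. by year).
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("n_samples must exceed n_splits")
    # The first training window absorbs any remainder so the last test block ends at the final sample.
    first_test_start = n_samples - n_splits * test_size
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train = list(range(test_start))  # every sample strictly before the test block
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Usage: the training window grows while the test block moves forward in time.
for train_idx, test_idx in time_series_splits(12, n_splits=3):
    print(f"train=0..{train_idx[-1]} test={test_idx}")
```

Unlike shuffled cross-validation, every training index precedes every test index, so the model is never scored on a period earlier than the data it learned from.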
The study then adds lag variables, which use past values of a variable as predictors while preserving temporal order. For example, rainfall in the previous month might be a significant predictor of yield for a particular crop. This approach improves on standard time-series splitting, reaching 83.62% accuracy for Random Forest and 74.38% for SVM. These figures are lower than the cross-validation results, but they come from an evaluation that respects temporal order, so they are a more credible guide to performance in future seasons. This supports the value of incorporating historical data into predictive modelling.
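A minimal sketch of lag-feature construction follows, assuming tabular records with hypothetical `state`, `year`, and `rainfall` fields and one record per state per consecutive time step; these field names and the helper `add_lag_features` are illustrative, not taken from the study. Lags are computed within each state after sorting by time, so a feature never mixes states or looks ahead.

```python
from collections import defaultdict
from typing import Any, Dict, List


def add_lag_features(
    rows: List[Dict[str, Any]],
    column: str,
    lags: List[int],
    group_key: str = "state",  # hypothetical field name
    time_key: str = "year",    # hypothetical field name
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with `<column>_lag<k>` features added per group.

    Assumes each group's records are consecutive time steps with no gaps.
    Rows without enough history for the largest lag are dropped, so a lag
    never borrows a value from another group or from the future.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    max_lag = max(lags)
    out: List[Dict[str, Any]] = []
    for group_rows in groups.values():
        history = sorted(group_rows, key=lambda r: r[time_key])  # enforce temporal order
        for i in range(max_lag, len(history)):
            new_row = dict(history[i])  # copy so the input rows are not modified
            for k in lags:
                new_row[f"{column}_lag{k}"] = history[i - k][column]
            out.append(new_row)
    return out


# Usage with made-up values (illustration only, not data from the study):
toy = [
    {"state": "X", "year": 2001, "rainfall": 20.0},
    {"state": "X", "year": 2000, "rainfall": 10.0},
    {"state": "X", "year": 2002, "rainfall": 30.0},
]
print(add_lag_features(toy, "rainfall", lags=[1]))
```

Dropping rows that lack full history is one design choice; an alternative is to keep them with missing values and let an imputation step handle the gaps.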
Consequently, the Random Forest model with lag variables emerges as the more effective of the two algorithms for crop recommendation in the Indian context. By balancing predictive accuracy with the ability to account for temporal dynamics, it offers a practical, adaptable tool for raising agricultural productivity and gives farmers actionable guidance on maximising yields and profitability.
Future research should expand the dataset to include a wider range of crops, regions, and socio-economic factors. Integrating real-time data sources, such as weather forecasts and market prices, could strengthen the model’s predictions further and give farmers more valuable insights.
👉 More information
🗞 Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection
🧠 DOI: https://doi.org/10.48550/arXiv.2505.21201
