Automatic Product Classification in International Trade: Machine Learning and Large Language Models
IDB Working Papers (Revise and Resubmit at Review of International Economics), 2024
Accurate product classification is crucial in international trade. In this study, we apply and assess several algorithms to automatically classify agricultural and food products based on text descriptions sourced from different public agencies, including customs authorities and the United States Department of Agriculture (USDA). We find that while traditional machine learning (ML) models tend to perform well within the dataset on which they are trained, their precision drops dramatically when they are applied to external datasets. In contrast, large language models (LLMs) perform consistently well across all datasets. The top-performing LLMs, Claude 3.5 Sonnet and GPT-4, achieve accuracy rates of approximately 80% when classifying products into 6-digit Harmonized System (HS) categories and above 90% for 2-digit HS Chapters. Our analysis highlights the valuable role that artificial intelligence can play in facilitating product classification at scale and, more generally, in enhancing the categorization of unstructured data.
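Accuracy is reported above at two levels of the HS hierarchy, 6-digit categories and 2-digit Chapters. The minimal sketch below illustrates how a single set of predictions can be scored at both levels by comparing code prefixes. It is not the authors' evaluation code: the function name `accuracy_at_level`, the normalization rule (dropping dots and spaces, requiring at least 6 digits), and the example codes are all hypothetical and chosen for illustration, not taken from the paper's data.

```python
from typing import Sequence


def _normalize_hs6(code: str) -> str:
    """Strip dots/spaces and return the first 6 HS digits (hypothetical rule)."""
    digits = code.replace(".", "").replace(" ", "")
    if len(digits) < 6 or not digits[:6].isdigit():
        raise ValueError(f"expected at least 6 HS digits, got {code!r}")
    return digits[:6]


def accuracy_at_level(
    true_codes: Sequence[str], predicted_codes: Sequence[str], n_digits: int
) -> float:
    """Share of predictions whose first `n_digits` HS digits match the true code.

    n_digits=6 scores HS 6-digit categories; n_digits=2 scores HS Chapters.
    """
    if n_digits not in (2, 4, 6):
        raise ValueError("n_digits must be 2, 4, or 6")
    if len(true_codes) != len(predicted_codes):
        raise ValueError("true and predicted lists must have the same length")
    if not true_codes:
        raise ValueError("cannot score an empty list")
    hits = sum(
        _normalize_hs6(t)[:n_digits] == _normalize_hs6(p)[:n_digits]
        for t, p in zip(true_codes, predicted_codes)
    )
    return hits / len(true_codes)


# Illustrative (made-up) labels and model outputs.
true = ["0201.30", "0805.10", "1001.99", "0901.21", "2009.12"]
pred = ["0202.30", "0805.10", "1001.19", "0901.21", "0805.10"]

acc6 = accuracy_at_level(true, pred, 6)  # 2 of 5 exact 6-digit matches
acc2 = accuracy_at_level(true, pred, 2)  # 4 of 5 Chapter matches
print(f"HS6 accuracy: {acc6:.2f}, HS2 accuracy: {acc2:.2f}")
```

Because a correct 6-digit code always implies the correct 2-digit Chapter, accuracy measured at the Chapter level can never be lower than accuracy at the 6-digit level for the same predictions. This is consistent with the gap between the roughly 80% and above-90% figures reported for the top LLMs.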
Recommended citation: Marra de Artiñano, I., Riottini Depetris, F., & Volpe Martincus, C. (2024). "Automatic Product Classification in International Trade: Machine Learning and Large Language Models." (No. 12962). Inter-American Development Bank.