Balancing Efficiency and Performance in NLP: A Cross-Comparison of Shallow Machine Learning and Large Language Models via AutoML
Abstract
This study critically examines the resource efficiency and performance of Shallow Machine Learning (SML) methods versus Large Language Models (LLMs) in text classification tasks, exploring the balance between accuracy and environmental sustainability. Leveraging Automated Machine Learning (AutoML), we introduce a novel optimization strategy that prioritizes computational efficiency and ecological impact alongside traditional performance metrics. Our analysis reveals that while the pipelines we developed did not surpass state-of-the-art (SOTA) models in raw performance, they offer a significantly reduced carbon footprint. We discover optimal SML pipelines with competitive performance and up to 70 times lower carbon emissions than hybrid or fully LLM-based pipelines, such as standard BERT and DistilBERT variants. Similarly, we obtain hybrid pipelines (combining SML and LLMs) with 20% to 50% lower carbon emissions than fine-tuned alternatives and only a marginal decrease in performance. This research challenges the prevailing reliance on computationally intensive LLMs for NLP tasks and underscores the untapped potential of AutoML in shaping the next wave of environmentally conscious AI models.