Methods Towards Improving Safeness in Responses of a Spanish Suicide Information Chatbot

Pablo Ascorbe, María S. Campos, César Domínguez, Jónathan Heras, Magdalena Pérez-Trenado

Abstract


Chatbots hold great potential for providing valuable information in sensitive fields such as mental health. However, ensuring the reliability and safety of these systems is essential and is a crucial prerequisite for their deployment. In this paper, we report our work on enhancing the safeness of a Spanish suicide information chatbot based on Retrieval Augmented Generation (RAG). Specifically, through a multi-stage validation process, we identified and classified unsafe chatbot answers by applying red-teaming, classification models, and manual validation by experts. This process allowed us to uncover several sources of unsafe responses and to implement targeted mitigation strategies. As a result, fewer than 1% of user-generated questions and fewer than 5% of red-teaming questions were classified by experts as unsafe. Our proposed actions focused on improving the chatbot's key components (the document database, prompt engineering, and the underlying large language model) and can be extrapolated to enhance the safety of similar RAG-based chatbots.
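
The following is a minimal, illustrative sketch of the overall control flow the abstract describes for a RAG-based chatbot with a safety gate: retrieve curated documents, generate an answer constrained to them, classify the answer, and fall back to a safe response if it is flagged. The functions retrieve, generate, is_safe, and answer_safely, as well as the fallback wording and the toy safety rule, are assumptions for illustration and are not the paper's actual components or criteria.

from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str

# Safe fallback shown when the safety check flags an answer (illustrative wording).
SAFE_FALLBACK = (
    "I cannot answer that question. If you are going through a difficult time, "
    "please contact a helpline such as the Spanish 024 line."
)

def retrieve(question: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Toy lexical retrieval: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def generate(question: str, context: list[Document]) -> str:
    """Stand-in for an LLM call that must answer only from the retrieved context."""
    sources = " ".join(d.text for d in context)
    return f"According to the curated documents: {sources}"

def is_safe(answer: str) -> bool:
    """Stand-in for a safety classifier (e.g., a fine-tuned model or expert review)."""
    banned = {"lethal", "method"}  # placeholder rule, not the paper's criteria
    return not any(word in answer.lower() for word in banned)

def answer_safely(question: str, corpus: list[Document]) -> str:
    """RAG flow with a final safety gate: retrieve, generate, classify, fall back."""
    context = retrieve(question, corpus)
    answer = generate(question, context)
    return answer if is_safe(answer) else SAFE_FALLBACK

if __name__ == "__main__":
    corpus = [
        Document("faq-1", "Warning signs include withdrawal and hopelessness."),
        Document("faq-2", "Helplines offer free, confidential support."),
    ]
    print(answer_safely("What warning signs should I look for?", corpus))

In such a design, the mitigation strategies mentioned in the abstract would correspond to curating the document corpus, refining the generation prompt, and improving or replacing the classifier and underlying model; the sketch only shows where each component sits in the pipeline.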
