Methods Towards Improving Safeness in Responses of a Spanish Suicide Information Chatbot
Abstract
Chatbots hold great potential for providing valuable information in sensitive fields such as mental health. However, ensuring the reliability and safety of these systems is essential, and represents a crucial first step before their deployment. In this paper, we report our work on enhancing the safety of a Spanish suicide information chatbot based on Retrieval-Augmented Generation (RAG). Specifically, through a multi-stage validation process, we identified and classified unsafe answers of the chatbot by applying red-teaming classification models and manual validation by experts. This process allowed us to uncover several sources of unsafe responses and to implement targeted mitigation strategies. As a result, fewer than 1% of user-generated questions and fewer than 5% of red-teaming questions were classified by experts as unsafe. Our proposed actions focused on improving the chatbot's key components (the document database, prompt engineering, and the underlying large language model) and can be extrapolated to enhance the safety of similar RAG-based chatbots.