Modern autonomous systems are often controlled by reinforcement learning (RL) components that have been trained with massive amounts of data to achieve complex objectives. One major drawback of reinforcement learning is that it is hard to guarantee that the controller satisfies a safety specification, even if the specification is known upfront.
Safe reinforcement learning is an active area of research that aims to build reinforcement learning components that are guaranteed to be safe with respect to a safety criterion or specification. One promising neurosymbolic technique, called shielding, uses formal methods to synthesize a smaller controller, the shield, from the safety specification. The shield monitors the interaction between the environment and the RL controller and intervenes with a corrective action whenever the interaction is not guaranteed to be safe. The part of the shield in charge of computing the correction is called “the producer”.
In this internship we will explore how to generate producers that compute optimal corrective actions, according to criteria such as producing the least confusing correction for the underlying RL controller. We will study techniques that combine machine learning and formal methods to build producers that are both correct and useful, and evaluate them in realistic scenarios.
Applications are invited for an intern position at the IMDEA Software Institute, Madrid, Spain.
Selected candidates will work with César Sánchez and an international team of graduate students and researchers focusing on formal methods.
Candidates should have an excellent MSc or BSc degree (or be close to completing one) in computer science, mathematics, or a related discipline, an interest in the above area, and a strong commitment to research. Proven top programming skills, as well as the ability to understand and develop algorithms, are required. Good teamwork and communication skills, including excellent spoken and written English, are also required.
The position is based in Madrid, Spain, where the IMDEA Software Institute is located. The institute covers travel expenses and provides an internationally competitive stipend. The working language at the IMDEA Software Institute is English.
The duration of the position will be 6 months.
Applicants interested in the position should submit their application at https://careers.software.imdea.org/ using reference code 2026-01-intern-neuroshielding. Review of applications will begin immediately; the call closes on February 10th, 2026.
The recruitment process will comply with the IMDEA Software Institute’s OTM-R Policy (Open, Transparent and Merit-based Recruitment).
For inquiries about the position, please contact: