Irfan Ul Haq, PhD Student, IMDEA Software Institute
Variables in a program routinely interact with one another. Sometimes programmers misuse a variable and cause an undesired interaction, e.g., storing euros in a variable that should hold dollars. In this work, we propose a technique that combines Natural Language Processing (NLP) with Abstract Type Inference (ATI) to detect such undesired interactions. First, we use ATI to group variables that interact with each other into ATI clusters. We then use the semantic similarity between the variables' names to validate these interactions.
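The two-stage idea can be illustrated with a minimal sketch. It assumes the variable interactions have already been extracted from the program as pairs (for example, from assignments and arithmetic operations). It groups interacting variables with a union-find structure, a simple stand-in for ATI clustering. In place of a real NLP similarity measure, it uses a plain token-overlap (Jaccard) score on the names. All function names, the 0.2 threshold, and the example variables are hypothetical and only illustrate the concept.

```python
import re
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used here as a stand-in for ATI clustering."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def ati_clusters(interactions):
    """Group variables that interact, directly or transitively (hypothetical helper)."""
    uf = UnionFind()
    for a, b in interactions:
        uf.union(a, b)
    groups = defaultdict(set)
    for v in list(uf.parent):
        groups[uf.find(v)].add(v)
    return list(groups.values())


def name_tokens(name):
    """Split snake_case and camelCase identifiers into lowercase word tokens."""
    parts = re.split(r"_|(?<=[a-z])(?=[A-Z])", name)
    return {p.lower() for p in parts if p}


def similarity(a, b):
    """Jaccard overlap of name tokens; a crude substitute for NLP semantic similarity."""
    ta, tb = name_tokens(a), name_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def suspicious(clusters, threshold=0.2):
    """Flag variables whose name is dissimilar to every other name in their cluster."""
    flagged = []
    for cluster in clusters:
        if len(cluster) < 2:
            continue  # a lone variable has nothing to be validated against
        for v in sorted(cluster):
            best = max(similarity(v, w) for w in cluster if w != v)
            if best < threshold:
                flagged.append(v)
    return flagged


# Hypothetical example: a euro amount flows into a cluster of dollar amounts.
interactions = [("total_dollars", "price_dollars"), ("price_dollars", "amount_euros")]
print(suspicious(ati_clusters(interactions)))
```

Here all three variables fall into one cluster, and `amount_euros` shares no name tokens with the other members, so it is the one flagged. A token-overlap score misses synonyms (e.g., `count` and `n`). A real NLP-based measure is meant to address exactly that weakness, which is why this sketch should not be read as the technique's actual similarity function.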
We evaluate our approach on two open-source projects, Exim Mail Server and grep. Although both programs have been extensively tested and deployed for years, the top-ranked ATI cluster reported by our tool reveals a programming mistake in Exim.