CCE Theses and Dissertations

Degree Name

Doctor of Philosophy (PhD)


College of Computing and Engineering


Advisor

Francisco Mitropoulos

Committee Member

Michael Laszlo

Committee Member

Sumitra Mukherjee


Keywords

Code Completion, Machine Learning, Natural Language Processing, Neural Networks, Python Modules, Source Code Analysis


Abstract

Contemporary software development with modern programming languages relies on Integrated Development Environments (IDEs), smart text editors, and similar tools with code completion capabilities to increase developer efficiency. Recent code completion research has shown that combining natural language processing with recurrent neural networks built from long short-term memory (LSTM) units can improve the accuracy of code completion predictions over prior models. The accuracy of predictive systems trained on data is well known to depend on both the quality and the quantity of that training data. This dissertation demonstrates that expanding the training data set to include more references to specific Python third-party modules increases the quality of predictions for those modules without degrading the quality of predictions for the modules originally represented.
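As a rough illustration of what "more references to specific Python third-party modules" can mean for a training corpus, the sketch below uses Python's standard `ast` module to count how many files import each top-level module and to select files that import a chosen target module. The helper names (`imported_modules`, `module_file_counts`, `select_files`) are hypothetical and this is not the dissertation's data pipeline. It assumes that import statements are a reasonable proxy for module references, ignores relative imports, and skips files that do not parse as Python 3.

```python
import ast
from collections import Counter


def imported_modules(source):
    """Return the set of top-level module names imported by one source string.

    Hypothetical helper: treats import statements as a proxy for references.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()  # skip files that are not valid Python 3
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import os.path" counts as a reference to "os"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # absolute "from x.y import z" only; relative imports are ignored
            names.add(node.module.split(".")[0])
    return names


def module_file_counts(sources):
    """Count how many files in the corpus import each top-level module."""
    counts = Counter()
    for source in sources:
        counts.update(imported_modules(source))
    return counts


def select_files(sources, targets):
    """Keep only files that import at least one of the target modules."""
    wanted = set(targets)
    return [s for s in sources if imported_modules(s) & wanted]


if __name__ == "__main__":
    corpus = [
        "import numpy as np\nx = np.zeros(3)",
        "from requests import get\nimport os.path",
        "import numpy\nfrom . import helper",
        "def broken(:\n",
    ]
    print(module_file_counts(corpus))
    print(select_files(corpus, {"requests"}))
```

Counting files rather than individual import statements keeps a single file that repeats an import from inflating a module's apparent representation. A corpus-expansion step could then add files returned by `select_files` for under-represented modules.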