Date of Award: 2013
Doctor of Philosophy in Computer Science (CISD)
Graduate School of Computer and Information Sciences
Software is often large, complicated, and expensive to build and maintain. Redundant
code makes these applications even more costly and difficult to maintain. Duplicated
code is introduced into these systems for a variety of reasons, including developer
churn, deficient developer comprehension of the application, and a lack of adherence
to proper development practices.
Code redundancy has several adverse effects on a software application, including an
increased codebase size and inconsistent developer changes caused by elevated
program comprehension needs. A code clone is defined as multiple code fragments that
produce similar results when given the same input. Four types of clones are generally
recognized, ranging from the simple type-1 and type-2 clones to the more complicated
type-3 and type-4 clones. Numerous clone detection mechanisms are able to identify the
simpler types of code clone candidates, but far fewer claim the ability to find the more
difficult type-3 clones. Before CCCD, MeCC and FCD were the only clone detection
techniques capable of finding type-4 clones. A drawback of MeCC is the excessive time
required to detect clones and the likely exploration of an unreasonably large number of
possible paths. FCD requires extensive amounts of random data and a significant period
of time to discover clones.
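As a brief illustration of the clone types mentioned above, the following Python
sketch shows one function alongside a type-2 clone (same structure, renamed
identifiers) and a type-4 clone (different syntax, same behavior). The functions
and their names are hypothetical and do not come from the dissertation; the
characterization of the types follows the common usage in the clone detection
literature rather than any definition stated in this abstract.

```python
def sum_to_n(n: int) -> int:
    """Reference fragment: add the integers 1..n with a loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def accumulate(limit: int) -> int:
    """Type-2 style clone: identical structure, only identifiers renamed."""
    result = 0
    for k in range(1, limit + 1):
        result += k
    return result


def sum_to_n_closed_form(n: int) -> int:
    """Type-4 style clone: syntactically unrelated, functionally equivalent."""
    # The guard mirrors the loop versions, which return 0 for n <= 0.
    return n * (n + 1) // 2 if n > 0 else 0


if __name__ == "__main__":
    # All three fragments agree on every input tried here.
    for n in range(-3, 20):
        assert sum_to_n(n) == accumulate(n) == sum_to_n_closed_form(n)
```

A purely textual comparison would likely pair the first two functions but miss the
third, which is why type-4 clones are generally regarded as the hardest to detect.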
This dissertation presents a new process for discovering code clones known as Concolic
Code Clone Discovery (CCCD). This technique discovers code clone candidates based on
the functionality of the application, not its syntax. As a result, naming conventions and
comments in the source code have no effect on the proposed clone detection process.
CCCD finds clones by first performing concolic analysis on the targeted source code.
Concolic analysis combines concrete and symbolic execution in order to traverse all
possible paths of the targeted program. These paths are represented by the generated
concolic output. A diff tool is then used to determine whether the concolic output for
one method is identical to the output produced for another method. Duplicated output
is indicative of a code clone.
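The comparison step described above can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not the CCCD implementation: the
textual "concolic output" format (PC and RET lines), the sample outputs, and the
helper names normalize and is_clone_candidate are all hypothetical, and the
identifier renaming is one assumed way of making the comparison insensitive to
naming. Only the standard-library difflib and re modules are used.

```python
import difflib
import re

# Hypothetical concolic output, one path condition or return value per line.
# A real concolic engine will emit a different format; this is an assumption.
OUTPUT_MAX_A = "PC: (a > b)\nRET: a\nPC: !(a > b)\nRET: b\n"
OUTPUT_MAX_B = "PC: (first > second)\nRET: first\nPC: !(first > second)\nRET: second\n"
OUTPUT_MIN = "PC: (x < y)\nRET: x\nPC: !(x < y)\nRET: y\n"

IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")
KEYWORDS = {"PC", "RET"}  # markers of the assumed output format, never renamed


def normalize(output):
    """Rename identifiers to V0, V1, ... in order of first appearance."""
    names = {}

    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        return names.setdefault(word, "V%d" % len(names))

    return [IDENTIFIER.sub(rename, line) for line in output.splitlines()]


def is_clone_candidate(out_a, out_b):
    """Report a candidate when the normalized outputs produce an empty diff."""
    diff = list(difflib.unified_diff(normalize(out_a), normalize(out_b), lineterm=""))
    return not diff


if __name__ == "__main__":
    print(is_clone_candidate(OUTPUT_MAX_A, OUTPUT_MAX_B))
    print(is_clone_candidate(OUTPUT_MAX_A, OUTPUT_MIN))
```

In this sketch the two "max" outputs differ only in variable names, so they
normalize to the same lines and are reported as a clone candidate, while the "min"
output differs in its path condition and is not.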
CCCD was validated against several open-source applications, along with clones of all
four types as defined by previous research. The results demonstrate that CCCD was able
to detect all types of clone candidates with a high level of accuracy.
In the future, CCCD will be used to examine how software developers work with type-3
and type-4 clones. CCCD will also be applied to various areas of security research,
including intrusion detection mechanisms.
Daniel Edward Krutz. 2013. Code Clone Discovery Based on Concolic Analysis. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (203)