Announcement

Tuesday, April 08, 2014

Simple Code analysis with TC Tool - Analyzing code duplication

There are many code duplication tools available, both open source and commercial, such as CPD (Copy Paste Detector) and Simian. However, CDD (Code Duplication Detector) in TCToolkit, which I explained my reasons for writing in a previous blog post, has some unique advantages:
  • It uses the excellent Pygments library for parsing the source code, so every language supported by Pygments is supported by CDD for duplication checks.
  • It is reasonably fast. I spent some time over the last few weeks optimizing it for speed.
    For example, on my Dell laptop it detected 164 duplicates in 1445 files of Tomcat source code in 45 seconds.
  • It can output duplications in multiple formats:
    • Simple text format.
    • HTML format with syntax-highlighted duplicate text fragments.
    • C++/Java-style '//' code comments added to the original source code.
  • It creates a matrix visualization of duplication to help identify duplication patterns. See the example below for the Tomcat source directory (org/apache/coyote/http11).


Here is the command line that I used for the Tomcat code analysis:
cdd.py -l java -o javadups.htm
To see all the available options:
cdd.py --help
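
To give a feel for how a token-based duplication check of this kind can work, here is a minimal sketch. It is not CDD's actual implementation: the function names (tokenize, find_duplicates, duplication_matrix) and the fixed window of 5 tokens are assumptions made for illustration. It tokenizes source with Pygments, records every run of consecutive tokens, reports runs that occur more than once, and counts shared runs per pair of files, which is the kind of data a duplication matrix could be drawn from.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(source, language):
    """Return the non-comment, non-blank token strings of `source`."""
    # Imported here so the rest of the sketch also runs without Pygments.
    from pygments.lexers import get_lexer_by_name
    from pygments.token import Comment

    lexer = get_lexer_by_name(language)
    return [
        value
        for ttype, value in lexer.get_tokens(source)
        # Dropping comments and whitespace keeps reformatting from hiding copies.
        if ttype not in Comment and value.strip()
    ]


def find_duplicates(files, window=5):
    """Map each token window seen more than once to its (file, index) locations.

    `files` maps a file name to that file's list of tokens.
    """
    seen = defaultdict(list)
    for name, tokens in files.items():
        for i in range(len(tokens) - window + 1):
            seen[tuple(tokens[i:i + window])].append((name, i))
    return {win: locs for win, locs in seen.items() if len(locs) > 1}


def duplication_matrix(duplicates):
    """Count shared windows for each unordered pair of distinct files."""
    matrix = defaultdict(int)
    for locs in duplicates.values():
        names = sorted({name for name, _ in locs})
        for pair in combinations(names, 2):
            matrix[pair] += 1
    return dict(matrix)
```

With Pygments installed, find_duplicates({"A.java": tokenize(text_a, "java"), "B.java": tokenize(text_b, "java")}) would return the repeated token runs, and duplication_matrix turns them into per-pair counts. A real tool would also merge overlapping windows into longer fragments, which this sketch leaves out.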
There are a few other simple code visualization tools in TCToolkit, such as TTC (Token Tag Cloud) and CCOM (Class Concurrence Matrix). I will explain their usage in later posts.

Give it a try and tell me your opinion.
