There are many code duplication detection tools available (open source and commercial), such as CPD (Copy Paste Detector) or Simian. However, CDD (Code Duplication Detector) in TCToolkit has some unique advantages. In a previous blog post I explained why I wrote CDD.
- It uses the excellent Pygments library for parsing the source code. Hence every language supported by Pygments is also supported by CDD for duplication checks.
- It is reasonably fast. Over the last few weeks I spent some time optimizing it for speed. For example, on my Dell laptop it detected 164 duplicates in 1445 files of Tomcat source code in 45 seconds.
- It can output duplications in multiple formats:
  - In simple text format.
  - In HTML format with syntax-highlighted duplicate text fragments.
  - As C++/Java-style '// code comments' added to the original source code.
- It creates a matrix visualization of duplication to help identify duplication patterns. See the example below for the Tomcat source directory (org/apache/coyote/http11).
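To make the matrix idea concrete, here is a minimal sketch in Python. It is not CDD's actual implementation: the function names `duplication_matrix` and `render`, the `(file_a, file_b)` pair format, and the file names in the usage note are all assumptions made for illustration. Each cell counts how many duplicates two files share, so clusters of files that copy from each other show up as filled blocks.

```python
def duplication_matrix(files, duplicate_pairs):
    """Count the duplicates shared by each pair of files.

    files: ordered list of file names (hypothetical input format).
    duplicate_pairs: iterable of (file_a, file_b) tuples, one per duplicate.
    Returns a square list of lists; cell [i][j] is the number of
    duplicates found between files[i] and files[j].
    """
    index = {name: i for i, name in enumerate(files)}
    size = len(files)
    matrix = [[0] * size for _ in range(size)]
    for file_a, file_b in duplicate_pairs:
        i, j = index[file_a], index[file_b]
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count so the matrix stays symmetric
    return matrix


def render(files, matrix):
    """Render the matrix as text: '.' for no duplication, the count otherwise."""
    lines = []
    for name, row in zip(files, matrix):
        cells = " ".join("." if count == 0 else str(count) for count in row)
        lines.append(f"{cells}  {name}")
    return "\n".join(lines)
```

For instance, calling `duplication_matrix(["A.java", "B.java", "C.java"], [("A.java", "B.java"), ("A.java", "B.java"), ("C.java", "C.java")])` and passing the result to `render` gives one row per file, where the diagonal cell for C.java records a duplicate inside a single file and the off-diagonal cells record duplication between A.java and B.java.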
Here is the command line that I used for the Tomcat code analysis:
cdd.py -l java -o javadups.htm
To see all the available options:
cdd.py --help
There are a few other simple code visualization tools in TCToolkit, such as TTC (Token Tag Cloud) and CCOM (Class Concurrence Matrix). I will explain their usage in later posts.
Give it a try and tell me your opinion.