Finding duplicate code with CCFinder

Posted on August 5th, 2009 by David Luhman.

If you're attempting to refactor a large code base, give CCFinder a try.

I checked out the following tools suggested by the Wikipedia entry on duplicate code:
- Simian
- CCFinder
- PMD Target

Simian was easy to download, install and run, and it seemed to do a good job of listing the duplicate sections in the code. But it only produces a text list. I suppose I could have saved the output and sliced and diced it in Excel or something, but that would still leave much to be desired.

After checking out the screenshots of CCFinder, I decided to give it a try. It's free, and I do appreciate that, but the registration process was pretty cumbersome. Its CAPTCHA is by far the worst I've seen -- I failed it several times. There's also a password for unzipping the download, and a license you have to install.

It's also unclear whether you need Silverlight 2.x+, Python, .NET, etc. I think the installer has been improving, but the whole experience needs some clarity.

I had some problems with my Java path (admittedly my problem), but I finally got the thing installed. I think it was worth the effort.

I was analyzing a PHP code base. Unfortunately, PHP wasn't explicitly supported, so I had to rename the files to .cpp (close enough) and remove the "preprocess" option from the analyzer. Then I pointed it at my parent directory and let it go.
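If you'd rather not rename files in your working tree, the .cpp trick can be scripted against a scratch copy. Here's a minimal sketch in Python, assuming you just want every .php file copied to a separate directory with a .cpp extension. The function name and the directory names are made up for illustration, and nothing here touches CCFinder itself.

```python
import shutil
from pathlib import Path


def copy_php_as_cpp(src_root, dst_root):
    """Copy every .php file under src_root into dst_root with a .cpp extension.

    The directory layout is preserved and the original files are left untouched.
    Returns the list of files that were written.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    written = []
    # sorted() materializes the file list before any copying starts
    for php_file in sorted(src_root.rglob("*.php")):
        relative = php_file.relative_to(src_root)
        target = (dst_root / relative).with_suffix(".cpp")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(php_file, target)
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical paths: replace with your own code base and scratch directory.
    for path in copy_php_as_cpp("my_php_project", "ccfinder_scratch"):
        print(path)
```

You would then point the analyzer at the scratch directory instead of the real one, so there's nothing to rename back afterwards.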

You get a nice visual showing where the blocks of duplicate code are. This page gives a good overview of how to navigate and discover things:
http://www.ccfinder.net/doc/10.2/en/tutorial-gemx.html

I found it most useful to look at the sorted Clone-set Table (biggest clones first). The source code pane on the right was helpful, but I often went to a diff tool (e.g. WinMerge) for further analysis.

I guess it's obvious, but CCFinder also found intra-file clones (duplicates within a single file) -- something a file-to-file diff tool like WinMerge can't show you.
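To make "intra-file clone" concrete, here's a rough Python sketch that flags runs of identical lines repeated within one file. This is not how CCFinder works internally; it's a naive line-window comparison I'm using purely as an illustration, and the function name and window size are my own assumptions.

```python
from collections import defaultdict


def find_repeated_blocks(lines, window=3):
    """Find blocks of `window` consecutive lines that occur more than once.

    Lines are compared after stripping leading and trailing whitespace.
    Returns a dict mapping each repeated block (a tuple of lines) to the
    1-based line numbers where it starts.
    """
    normalized = [line.strip() for line in lines]
    seen = defaultdict(list)
    for start in range(len(normalized) - window + 1):
        block = tuple(normalized[start:start + window])
        if not any(block):
            continue  # skip windows made up entirely of blank lines
        seen[block].append(start + 1)
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__":
    sample = ["a = 1", "b = 2", "c = a + b", "print(c)", "a = 1", "b = 2", "c = a + b"]
    for block, starts in find_repeated_blocks(sample).items():
        print(starts, block)
```

A check like this only catches exact repeats of whole lines, and overlapping windows get reported separately, so a dedicated clone detector is still the better choice for real work.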

Anyway, one thumb down for CCFinder's install experience, but two thumbs up for an otherwise nice, free tool.
