Sunday, August 24, 2014

Classics of Computer Science

Sometimes you hear quotes attributed to Edsger Dijkstra like 'goto considered harmful'. However, you rarely find the details of the actual arguments. So I have decided to collect such gems, which I consider classics and which in some way helped me understand a new concept, along with references/links to the original papers/books.

I am planning to keep updating this list.

Papers/Articles of Edsger W. Dijkstra:

  1. The Humble Programmer
  2. Goto Considered Harmful: transcript, scanned PDF of original paper
    This paper introduced the concept of 'structured programming'

Papers/Articles of David Parnas:

  1. On the criteria to be used in decomposing systems into modules
    Written in 1971, this paper introduced basic concepts of object oriented design (especially the concept of encapsulation). I find that programmers still confuse 'encapsulation' with 'hiding the data' rather than 'hiding the change'.
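
    To make 'hiding the change' concrete, here is a minimal Python sketch loosely modelled on the line storage module of the KWIC example in Parnas's paper. The class and function names are my own illustrative choices, not taken from the paper. The point is that the storage format is a design decision that may change, so it is hidden behind a small interface.

```python
class LineStorage:
    """Hides one design decision: how lines of text are stored."""

    def __init__(self):
        # Hidden decision: today each line is kept as a list of words.
        # Switching to one packed string later would not affect callers.
        self._lines = []

    def add_line(self, text):
        self._lines.append(text.split())

    def word_count(self, line_no):
        return len(self._lines[line_no])

    def word(self, line_no, word_no):
        return self._lines[line_no][word_no]


def circular_shifts(storage, line_no):
    """Client code: depends only on the interface, not on the representation."""
    n = storage.word_count(line_no)
    words = [storage.word(line_no, i) for i in range(n)]
    return [" ".join(words[i:] + words[:i]) for i in range(n)]


storage = LineStorage()
storage.add_line("hiding the change")
print(circular_shifts(storage, 0))
```

    If the representation inside `LineStorage` changes, `circular_shifts` does not need to be touched. The data is hidden only as a side effect of hiding that decision.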

Papers/Articles from Google

  1. MapReduce: Simplified Data Processing on Large Clusters
    Paper that triggered the big data processing architecture revolution and inspired the open source Apache Hadoop project.
  2. Detecting influenza epidemics using search engine query data
    Demonstrated how big data analytics can be used to solve some existing problems in an entirely different way.
    "Here we present a method of analyzing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day."
Please add your comments and suggestions for inclusion in this list.

Sunday, August 17, 2014

How to increase the pain of your customer - Banking way

For the last two days I was trying to pay the stamp duty and registration fees for a new flat that I am trying to purchase.

  1. First, the builder went to Bank of Maharashtra and got a challan for stamp duty and registration fees.
  2. He gave the challan to me and asked me to pay it.
  3. I tried to pay it online, but realized that to pay online I need an account in SBI, Bank of Maharashtra, Bank of India or IDBI Bank. Unfortunately I don't have an account in any of those banks.
  4. So now I go to a Bank of Maharashtra branch.
  5. The Bank of Maharashtra manager suggested that I pay the amount in CASH. All government policies and banks are trying to reduce cash transactions. Here, for a government tax payment, we are going in 'reverse'.
  6. When I asked why, the manager explained to me that cheque clearing takes 4 to 5 days even for a local cheque. It seems RBI has a new process for clearing. Now there is NO local clearing in Pune. The cheque is scanned and sent to RBI for processing.
    1. So I pay by cheque on Friday.
    2. The cheque will get scanned on Friday evening.
    3. On Saturday it will go to RBI clearing.
    4. On Monday clearing may happen.
    5. On Tuesday Bank of Maharashtra will get the details of whether it is cleared or not. It may happen that the other bank asks for some time. In this case, it will get delayed further.
Manual clearing took 2 days while computerized clearing takes 5 days. Amazing! Instead of making life easier, RBI is making it more difficult for the average banking customer.

I think a much simpler process is possible.

Sunday, August 10, 2014

Understanding Requirement Specs - Canteen Chapati Way

A few days back I went to the SEZ tower 3 canteen for lunch. I usually take the full lunch. The full lunch menu is as specified in the attached image (lunch_menu.jpg): 3 chapatis, 1 bowl of rice etc. The chapatis I received are in the next image. (The size of each chapati is around 3-3.5 inches in diameter.) The canteen guy explained to me that he has to give only 3 chapatis as per the menu, however he is giving me ONE extra chapati. Do you see the problem?

It's the size of the 'chapati'. The chapati is small. The "requirement spec" (lunch menu) says 3 chapatis but it does not specify the size of a chapati. So the provider can take advantage of it and serve really small chapatis and still fulfill the requirement as 'specified'. One option the customer has is to provide the requirement spec in excruciating detail (e.g. specify the minimum size of a chapati, minimum size of the bowl, size of the spoon for pickle etc.). Pretty soon it will become very painful to specify and it will be an "over constrained" problem. With so many restrictions, the canteen contractor has no leeway to improve or experiment. The end result will be that I (i.e. the customer) will not be happy with the service I get.

This is a common dilemma of requirement specs. Somewhere you have to trust the "common sense" of the customer and the service provider. If the service provider doesn't have that "common sense" then as a customer you will be forced to change the provider.

Remember we (software developers) are the 'service provider' for our customers. And our customers are also going to expect "common sense" from us.

Tuesday, April 29, 2014

My code review checklist

I am not a fan of checklists (especially for code reviews). Code review checklists start small and then slowly become really large and unwieldy. After some time the checklist becomes a bottleneck, and instead of improving the effectiveness of your process, these lengthy checklists start reducing it.

However, there are situations where I used checklists and they worked very well. For example, a Customer Release checklist. There are many small things that you need to do before sending a new release to the customer. It's easy to miss a few critical steps. A release checklist has always worked very well.

I was not sure why, in a typical organization, checklists sometimes did not work well (for example, in code reviews) while at other times they worked really well. What exactly is the difference?

Some time back I read Atul Gawande's book 'The Checklist Manifesto'. It triggered my interest in checklists again. As a first step I extracted a code review checklist from my code review training content. I have used this 'mental' checklist for many years. It has worked well for me even with different programming languages (C/C++, Java, Python, C#, Javascript) and technologies.

Here is my code review checklist.

PS: Based on my experience, information from Atul Gawande's book and information from the internet, I have now prepared a 4 hour hands-on session on creating and improving checklists. Contact me if you are interested.

Tuesday, April 08, 2014

Simple Code analysis with TC Tool - Analyzing code duplication

There are many code duplication tools available (open source and commercial) like CPD - Copy Paste Detector or Simian. However, CDD (Code Duplication Detector) in TCToolkit has some unique advantages. In a previous blog post I have explained why I wrote CDD.
  • It uses the excellent Pygments library for parsing the source code. Hence all the languages supported by Pygments are supported by CDD for duplication checks.
  • It is reasonably fast. Over the last few weeks I spent some time optimizing it for speed.
    For example, on my Dell laptop it detected 164 duplicates in 1445 files of Tomcat source code in 45 seconds.
  • It can output duplications in multiple formats. 
    • In simple text format.
    • In HTML format with 'syntax highlighted' duplicate text fragments
    • It can also add Cpp/Java style '// code comments' in the original source code.
  • It creates a matrix visualization of duplication to identify any duplication patterns. See the example below for Tomcat source (org/apache/coyote/http11) directory.

Here is the command line that I used for Tomcat code analysis: -l java -o javadups.htm
To see all the options available: --help
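
To give a feel for how token-based duplicate detection works in general, here is a minimal Python sketch. It is not CDD's actual implementation: it swaps the Pygments lexer for a crude regular-expression tokenizer so that it runs with the standard library only. The function names and the `window` parameter are my own assumptions for illustration. The idea is to slide a fixed-size window over the token stream of every file and report any token sequence that shows up in more than one place.

```python
import re
from collections import defaultdict

# Crude stand-in for a real lexer: identifiers, numbers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Return (token, line_number) pairs, ignoring whitespace."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            tokens.append((match.group(), line_no))
    return tokens


def find_duplicates(files, window=8):
    """Map each repeated run of `window` tokens to the places where it starts.

    `files` is a dict of {file name: source text}.
    """
    seen = defaultdict(list)
    for name, source in files.items():
        tokens = tokenize(source)
        for i in range(len(tokens) - window + 1):
            key = tuple(tok for tok, _ in tokens[i:i + window])
            seen[key].append((name, tokens[i][1]))
    # Keep only the token runs that occur more than once.
    return {key: places for key, places in seen.items() if len(places) > 1}


files = {
    "a.java": "int x = a + b * c ;\nreturn x;",
    "b.java": "foo();\nint x = a + b * c ;",
}
for key, places in find_duplicates(files).items():
    print(" ".join(key), "->", places)
```

Because the comparison is done on tokens rather than raw text, differences in whitespace and line breaks do not hide a duplicate. A real tool would also merge overlapping windows into one longer duplicate and would usually skip comments.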
There are a few other simple code visualization tools in TCToolkit like TTC (Token Tag Cloud) or CCOM (Class Co-occurrence Matrix). I will explain their usage in later posts.

Give it a try and tell me your opinion.

Saturday, March 22, 2014

TCToolkit Update (Version 0.6.x)

When I consulted for companies on improving their source code (refactoring it, improving the performance, detecting design bottlenecks, detecting problematic files etc.), I needed a way to quickly analyze a code base. However, there were not many tools available which gave me quick insight into the code. Commercial tools like Coverity, Klocwork, Lattix etc. are expensive. Before I could use them, I had to convince my client to 'license' them, and that was difficult. Hence about 2 years back I wrote a few Python scripts to help me quickly analyze a codebase. Later I open sourced these Python scripts as 'TCToolkit'.

Recently I have done significant refactoring and updates to these scripts and also added some new scripts. Also I have moved the TCToolkit code to Bitbucket.

Important updates are listed below

  1. Improved the performance of CDD (Code Duplication Detector). On my Dell laptop, the Subversion C code base (around 450 files) can now be analyzed for duplication in about 90 seconds.
  2. Now I use the d3js library for generating the visualizations. Token Tag Cloud (TTC) now uses d3js for generating the tag cloud. CDD uses d3js for displaying the 'duplication matrix'.
  3. A new script 'CCOM' (Class Co-occurrence Matrix) is added. This script analyzes the code base and finds out which classes are used together. It displays this information in matrix form.

    For example, if class A has class B as a member variable, or a member function of class A uses class B as a parameter, then classes A and B are treated as 'co-occurring'. If a function takes two parameters that are objects of class B and class C, then classes B and C are treated as 'co-occurring'.
    If classes are co-occurring, then chances are there is some dependency between their functionality and hence changes in one MAY impact the other.
  4. A treemap script: this script generates a treemap visualization from the excellent freeware code metrics tool SourceMonitor. It also uses d3js for displaying the treemap.
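
The co-occurrence rule described for CCOM above can be sketched in a few lines of Python. This is only an illustration of the counting idea, not the CCOM script itself. It assumes the class names seen together (a class plus its member types, or the parameter types of one function) have already been extracted into groups, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(groups):
    """Count how often each pair of classes appears in the same group.

    Each group is a collection of class names that were seen together, for
    example a class plus its member types, or the parameter types of one
    function.
    """
    counts = Counter()
    for group in groups:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return counts


def as_matrix(counts):
    """Turn the pair counts into a symmetric matrix with sorted class names."""
    names = sorted({name for pair in counts for name in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return names, matrix


# Class A has a member of class B; one function takes B and C; another uses A and B.
groups = [{"A", "B"}, {"B", "C"}, {"A", "B"}]
names, matrix = as_matrix(co_occurrence(groups))
print(names)
for row in matrix:
    print(row)
```

Higher numbers in the matrix point at pairs of classes that are used together often, which is where a change in one is more likely to impact the other.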
Give it a try on your code base and see what kind of insights you get about your project.

Tuesday, February 25, 2014

Brian W. Kernighan on debugging

Just discovered this quote of Brian W. Kernighan on debugging.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
If you are not sure what it means, here is a Stack Overflow discussion about it.

And an article explaining the quote by Alfred Thompson.

It takes some time to understand this quote. It's a kind of Zen kōan.