Thoughts of a Thinking Craftsman: 2013

Sunday, December 29, 2013

Iterator: a powerful and underutilized pattern

Iterator is one of my favorite design patterns. I have used it in almost all my projects and in various ways. I had been thinking of writing a blog post on the Iterator design pattern. Recently (on 4th Dec 2013) I gave a webinar on the Iterator pattern. The links are given below.

Iterator – a powerful but under-appreciated pattern - Webinar on Techgig.
You can watch the video on YouTube (video).
I have published the slides on SlideShare. You can get the presentation here.

Tuesday, December 17, 2013

Machine gun programmers and sniper programmers

Based on my experience working with various teams, using different technologies, on both projects and products, in maintenance as well as new development, I have started classifying software developers into two categories.

Machine Gun Programmer
Sniper Programmers

Machine Gun Programmer

Typically, when you give such a programmer a problem to solve (e.g. fix a bug or add a new feature), he/she jumps into the debugger and randomly starts changing some variables. Or he/she starts googling for an answer and cuts and pastes code from codeproject.com or stackoverflow.com to see if that fixes the problem (mostly without understanding the code or the answer). Essentially he/she is firing a lot of bullets at the target, hoping that one will hit.

Many times he/she will spend hours working on some trivial problem because he/she has not understood the program flow and hence cannot analyze the root cause. So he/she tries a lot of small code changes. Eventually one change hits, alters the program's behavior and apparently fixes the bug. However, most likely it will introduce another bug in the system. My friend Anirudha Raste calls it 'programming by coincidence'.

Such programmers usually end up fixing one bug and introducing two more.

Sniper Programmers

A sniper programmer, on the other hand, behaves differently when given a problem. He/she will analyze the program flow, understand the root cause, and may go to the whiteboard and draw the flow. You may see him/her staring into space while mentally analyzing the problem. If he/she googles the problem, he/she spends time understanding each answer and validating whether it really applies to the problem at hand. Then he/she may change a few lines, and that will fix the bug permanently.

The problem is that the majority of developers I have interacted with fall into the 'machine gun programmer' category.

Sniper programmers are few and far between. If you have some sniper programmers, make sure that these gems stay in your team. Try to propagate and encourage the 'sniper programmer' attitude.

Remember, you don't need much training to use a machine gun, so practically anyone can become a 'machine gun programmer'. However, every army has separate 'sniper schools' and sniper training programs.

Becoming a 'sniper programmer' is a difficult task. It requires constant learning and updating. It is also the most rewarding.

Is your company really identifying the 'sniper programmers' and educating/training them, or just depending on a lot of 'machine gun programmers'? The answer to this question may determine the future of your company.

Wednesday, October 30, 2013

Simple framework to assess potential risks in a Software Project

If you are in a software project, how do you assess its potential risks? So far I have not seen any coherent way of assessing the possible risks. Usually problems are discovered really late in the project life cycle (i.e. just before release dates), and by that time it is too late to take any corrective action. So a common problem is how to detect possible risks as early in the project life cycle as possible.

But first, how do you define the 'success' of a project?

The project is delivered to the customer.
Your company got the expected profit margin from the project.
The customer accepted the delivery.
The customer's end users are happy with the delivery.
The number of bugs reported is low, and hence your warranty costs are low.

Ideally a 'successful' project should satisfy all of the above. However, many times you achieve only a few items from this list. For example, the customer accepted the delivery and end users are happy with the features, but a lot of bugs are reported and rework is high. Or the customer has requested new features, and implementing them requires a lot of changes in the code. Hence your costs are high and your profit margin is low. How do you assess these kinds of risks?

In the last few years, I have been working on various code analysis techniques (check my open source projects SVNPlot and TCToolkit). Based on my experience, I am convinced that analysis of code, design, version control history etc. gives you a pretty good idea about the success or failure of a project.

Recently I created a simple framework to assess the possible risks.

First we analyze the project in three ways:

Code Vs Testing quadrant
Requirement Vs Testing quadrant
Design Vs Code quadrant

Map where your project falls in each case. The quadrants into which the project maps tell you the possible risks for your project.

I find that if I mentally map the project to these quadrants based on various project metrics, I get a 'rough judgement' of the kind of problems the project will have in future.

What do you think ?

Saturday, September 07, 2013

Recovering from corrupted Subversion repository

As a habit I store all my personal data (training ppts, various experiments, etc.) in version control. Since I have been doing this for many years, I use Subversion to store all this data. Recently I upgraded to Subversion 1.8. I tried to do a fresh checkout and realized that the repository was corrupted. 'svnadmin verify' returned an error and could not fix it. I could not do a fresh 'svn checkout' and I could not use TortoiseHg as a client to Subversion. I was stuck. The error I was getting was 'E200002 : serialized hash missing terminator'.

My first step was to Google the error message to see if I could find a solution. The search results suggested running 'svnadmin verify'. It confirmed the error at revision 1266. Many others suggested taking a backed-up repository dump and restoring from it. I had taken a backup of the repository but had not taken a repository dump at the time of backup. The backup of the repository was also corrupt at the same revision. So this solution was not very useful for me.

My second step was to send a mail to the Subversion users mailing list 'users@subversion.apache.org'. You can see the mail here. Stefan Sperling suggested the 'fsfsverify' script. I tried it but it did not work for me. I also tried the 'fsfsfixer' script. That also did not work.

The best clue came from Philip Martin. I was not sure what the error 'serialized hash missing terminator' meant. I tried to read the documentation for the FSFS format. Based on that, I checked the data in the 'revision' files. It seemed to be correct.

Philip explained the error as:

"It means one of the repository files is corrupt. It could be a revision files in db/revs or it could be a revprop file in db/revprops. A serialized hash is a series of K/V pairs followed by END:"

The key was "it could be a revprop file in db/revprops". Until then I had missed checking the 'revprop' files. I checked the revprops file for revision 1266 and found that it consisted of just zeros. There was no content.

Now the problem was clear. The revprop files were corrupt, and I did not have a backup of the uncorrupted revprop files. Hence the only way was to somehow reconstruct the revprops. A typical revision property file looks like this:

K 10
svn:author
V 2
pm
K 8
svn:date
V 27
2013-09-05T18:00:22.881511Z
K 7
svn:log
V 1
m
END

Typically it contains the author, the date and the svn commit message. It may also contain merge information; in my case, there was no mergeinfo. Since it was a personal repository, the only person committing was me, so svn:author would always be me. That left the date and the message. I decided to take a simple approach: for the date, use one minute after the previous valid revision's date, and for the message, something like 'this revision is recovered from corruption'.

I wrote a Python script, which takes the repository path and a revision number and does the following:
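To make the file format concrete, here is a minimal sketch of reading and writing such a serialized hash and of building a replacement from the previous revision's properties. It is not the script described in this post; the function names (`parse_hash`, `serialize_hash`, `one_minute_later`, `rebuild_revprops`) are hypothetical, and it assumes a plain, unpacked revprop file with UTF-8 values like the example above.

```python
from datetime import datetime, timedelta

SVN_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def _read_field(data, pos, kind):
    """Read one 'K <len>' or 'V <len>' record starting at pos."""
    eol = data.index(b"\n", pos)  # ValueError if no newline is left
    tag, length = data[pos:eol].split(b" ")
    if tag != kind:
        raise ValueError("expected %r record, got %r" % (kind, tag))
    start = eol + 1
    end = start + int(length)
    return data[start:end], end + 1  # skip the newline after the payload


def parse_hash(data):
    """Parse key/value records until the END line; ValueError if malformed."""
    props = {}
    pos = 0
    while True:
        eol = data.index(b"\n", pos)
        if data[pos:eol] == b"END":
            return props
        key, pos = _read_field(data, pos, b"K")
        value, pos = _read_field(data, pos, b"V")
        props[key.decode("utf-8")] = value.decode("utf-8")


def serialize_hash(props):
    """Write the records back out, terminated by END."""
    out = []
    for key, value in props.items():
        kb, vb = key.encode("utf-8"), value.encode("utf-8")
        out.append(b"K %d\n%s\nV %d\n%s\n" % (len(kb), kb, len(vb), vb))
    out.append(b"END\n")
    return b"".join(out)


def one_minute_later(svn_date):
    """Shift an svn:date timestamp forward by one minute."""
    stamp = datetime.strptime(svn_date, SVN_DATE_FORMAT)
    return (stamp + timedelta(minutes=1)).strftime(SVN_DATE_FORMAT)


def rebuild_revprops(previous, message):
    """Build replacement revprops from the previous revision's revprops."""
    prev = parse_hash(previous)
    return serialize_hash({
        "svn:author": prev["svn:author"],
        "svn:date": one_minute_later(prev["svn:date"]),
        "svn:log": message,
    })
```

A file made only of zero bytes has no newline and no END line, so `parse_hash` raises `ValueError` on it, which mirrors the 'missing terminator' complaint.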

Runs 'svnadmin verify -r' for the given revision and checks for the 'serialized hash missing terminator' error.
If the error is reported, the script reads the revision properties of the revision just before it (i.e. revno-1) and adds one minute to the time stamp of that revision. The log message is changed to a 'recovered from corruption' message.
The original corrupted revision property file is then copied to a backup location and the corrected revision property file is written in its place.
The process repeats for the following revisions until it reaches a valid revision. At this point it stops.
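The control flow of these steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the original script: the helper names are hypothetical, the revprop path assumes a sharded, unpacked FSFS layout with 1000 revisions per shard (check your repository's db/format file before relying on this), and `rebuild` is a caller-supplied function that turns the previous revision's revprop bytes into replacement bytes. Try anything like this on a copy of the repository first.

```python
import shutil
import subprocess
from pathlib import Path


def revprop_path(repo, rev, shard_size=1000):
    # Assumption: sharded, unpacked FSFS layout (db/revprops/<shard>/<rev>).
    return Path(repo) / "db" / "revprops" / str(rev // shard_size) / str(rev)


def svnadmin_reports_corruption(repo, rev):
    """Run 'svnadmin verify -r REV REPO' and look for the specific error."""
    proc = subprocess.run(
        ["svnadmin", "verify", "-r", str(rev), str(repo)],
        capture_output=True, text=True,
    )
    return proc.returncode != 0 and "missing terminator" in proc.stderr.lower()


def repair_revprops(repo, first_bad_rev, rebuild, backup_dir,
                    is_corrupt=svnadmin_reports_corruption):
    """Rewrite consecutive corrupt revprop files; return the revisions fixed."""
    repaired = []
    rev = first_bad_rev
    while is_corrupt(repo, rev):
        bad = revprop_path(repo, rev)
        previous = revprop_path(repo, rev - 1)
        # Keep the corrupt file before overwriting it.
        Path(backup_dir).mkdir(parents=True, exist_ok=True)
        shutil.copy2(bad, Path(backup_dir) / ("%d.corrupt" % rev))
        bad.write_bytes(rebuild(previous.read_bytes()))
        repaired.append(rev)
        rev += 1
    return repaired
```

Passing the corruption check in as `is_corrupt` keeps the loop testable without a real repository; the default shells out to svnadmin.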

After this I ran 'svnadmin verify' on the entire repository and confirmed that all revisions could be read. Then I dumped the repository contents to a dump file and loaded it into a new empty repository.

I checked the log and diff of revision 1266 and a few subsequent revisions. Things seem to be on track.

PS: The Python script I wrote is very specific to my needs. However, if you need it, send me a mail.

Tuesday, June 18, 2013

Book Review : Beautiful Code – Leading Programmers Explain How They Think

As a programmer I am always looking to improve myself. One of the ways to improve is to study 'the masters'. This is the norm for artists, architects etc. A new painter studies how past masters did their work. Initially they mimic the masters' style and later develop their own. Mathematical equations, science concepts and programs have their own beauty. For a software developer, a really well written piece of software has its own 'elegance'. It is a 'work of art'. It is 'beautiful'. But it is hard to describe that beauty to someone who is not a programmer. So this is a book for software developers to understand the beauty in software.

This book gives you examples from master programmers and what they think about their own work or about other master programmers' work. It's a great way to gain insights into how master programmers think about a particular problem. The book has articles written by masters like Brian Kernighan (co-author of 'The C Programming Language'), Karl Fogel (a founding developer of Subversion), Tim Bray (co-inventor of XML), Charles Petzold (famous Windows programmer and book writer), Sanjay Ghemawat (of Google), Yukihiro Matsumoto (inventor of Ruby) etc. It also has articles from diverse domains: regular expressions, version control, language development (Ruby, Python), numerical programming, bioinformatics, Web and search, etc.

One of the most interesting articles is by Arun Mehta, on how he developed the hardware and software that let Prof. Stephen Hawking interact with the world. The spec given was 'Prof. Hawking can only press one button'. The article explains in detail how they developed the actual specs from this one-liner. He explains the basic design models, input interface, simple typing, word prediction, scrolling/editing/searching/macros etc. This software was developed in VB. Imagine you have been given this one-line spec: what will you do? How will you proceed? It's fascinating to understand the thoughts behind all these ideas and design decisions.

The first article is by Kernighan, about the regular expression matcher that Rob Pike wrote for the book 'The Practice of Programming'. It is truly 'beautiful code': small, powerful, elegant, and it does its job well. I am really tempted to show you the code.

    /* match: search for regexp anywhere in text */
    int match(char *regexp, char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp+1, text);
        do {    /* must look even if string is empty */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    /* matchhere: search for regexp at beginning of text */
    int matchhere(char *regexp, char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
            return matchhere(regexp+1, text+1);
        return 0;
    }

    /* matchstar: search for c*regexp at beginning of text */
    int matchstar(int c, char *regexp, char *text)
    {
        do {    /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

Just three small functions handle the following regular expression constructs:

c    matches any literal character c
.    matches any single character
^    matches the beginning of the input string
$    matches the end of the input string
*    matches zero or more occurrences of the previous character

I am always fascinated by small, powerful code. In 30 lines, this is one of the most powerful pieces of code that I have seen. Here is the online version of this article.

Personally I also like the following articles:
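To watch the recursion at work without a C compiler, here is a rough Python transliteration of the three functions. It is only a sketch for experimenting: it slices strings where the C code advances pointers, and the snake_case names are mine, not from the book.

```python
def match(regexp, text):
    """Search for regexp anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty tail of the string.
    for start in range(len(text) + 1):
        if match_here(regexp, text[start:]):
            return True
    return False


def match_here(regexp, text):
    """Search for regexp at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) > 1 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return not text
    if text and regexp[0] in (".", text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c, regexp, text):
    """Search for c*regexp at the beginning of text."""
    i = 0
    while True:  # a * matches zero or more instances
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (text[i] == c or c == "."):
            i += 1
        else:
            return False


print(match("^ab*c$", "abbbc"))  # True
print(match("a.c", "xxabcxx"))   # True
print(match("c$", "abcd"))       # False
```

Note how `match_star` tries the shortest match first and only then consumes one more character, just as the do-while loop in the C version does.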

Subversion’s Delta Editor by Karl Fogel.
It helped me in understanding how Subversion works behind the scenes. It also helped in developing a versioning/delta storage scheme for a project.
Framework for Integrated Test: Beauty through Fragility by Michael Feathers.
Here Feathers talks about the design of the FIT (Framework for Integrated Test) framework by Ward Cunningham. (NOTE: Ward Cunningham is the inventor of the wiki.) The FIT framework is just 3 classes.
Distributed Programming with MapReduce by Jeffrey Dean and Sanjay Ghemawat.
This article explains the concepts and infrastructure ideas that drive Google search. The Hadoop project implements these concepts and brings them to the open source world.
Linux Kernel Driver Model: The Benefits of Working Together by Greg Kroah-Hartman.
The Linux operating system runs on everything from your mobile phone (Android OS is a derivative of Linux) to desktops, servers and supercomputers. The driver model has to support diverse hardware requirements and various memory scales.

This is a book you go back to every few months, reading different articles again and gaining new insights. Enjoy.

Wednesday, March 20, 2013

Software Performance Optimization - A Different Skill

Almost every project I have worked on in the last 18 years required performance optimization. By now I have become somewhat of an expert in performance optimization in various domains. I have worked on optimizing performance in CAD/CAM algorithms, database queries and caching. I have considered alternative algorithms, alternative data structures, the impact of page faults, the impact of caching, etc.

In every domain a few things are different but some basics remain constant. The first rule of optimization is "Don't depend on your gut feel about the location of the performance bottleneck". 99% of the time your gut feel is wrong. So you need to use tools to locate the performance bottleneck. Essentially the process boils down to:

1. Identify an appropriate tool to generate the performance data. Usually this will be a 'profiler'. But sometimes other tools are required (e.g. for analyzing database queries which are taking a long time).
2. Generate the performance data using the tool.
3. Interpret the data and locate the performance bottleneck. This requires some practice (and guidance if available).
4. Study the bottleneck code and find a way to eliminate the bottleneck with the least amount of code change. It is important to keep code changes to a minimum. A large amount of code change can result in new bugs.
5. Make the code changes and test.
6. Generate the new performance data and ensure the bottleneck is fixed. If not, revert the changes.
7. If performance has improved, keep the changes and commit them.
8. Analyze the performance data again for the next bottleneck.
9. Repeat the steps 3-8.
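As a small illustration of generating and reading performance data, here is a sketch using Python's built-in cProfile and pstats modules. The workload functions (`slow_sum`, `fast_lookup`, `workload`) and the `profile` helper are made up for the example; in a real project you would profile your own entry point, possibly with a different tool.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately heavy loop: this is the bottleneck we expect to find.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_lookup(n):
    return n + 1


def workload():
    fast_lookup(10)
    return slow_sum(200_000)


def profile(func):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile(workload)
    print(report)
```

After a code change you would run the same profile again and compare the reports; if the bottleneck has not moved, revert the change.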

For most projects, 5 to 10 times speed-ups are possible. However, project teams usually find this hard to believe. Recently I worked with the SigmaTEK Systems India team on improving the performance of their Tube Nesting product. Together we were able to improve its performance by more than 5 times.

Sushil Deshpande says:

It was a real pleasure to work with Nitin on several projects, especially related to performance development.

Nitin came on board at a typical situation, where the customer was unhappy about the speed of the algorithm, and there was lot of pressure to improve it significantly more than the current speed.

Nitin showed us how to systematically analyze code using the simplest tools possible (emphasis was always on understanding, never too much on tools). His inputs and ideas on how to improve the performance, without having to compromise on the quality of the results, were extremely valuable. In addition to just code optimization using performance metrics, Nitin was very keen on evaluating the algorithm techniques as well, and provided us several alternatives right down to the core level, on alternative approaches to evaluate for performance improvement.

This experience has been a real eye opener for us, and although it sounded cliché when Nitin mentioned the very first time that he has been involved in several projects with optimization improvements of 5X or more, it was extremely satisfying to see that he guided us, using his systematic methods and principles, to performance gains of 5X+ in our project as well.
