Monday, April 22, 2013

TDD is not all about writing tests first

TDD's mantra, in words:

  • Add a failing test
  • Write code to pass the test
  • Refactor

If you want to get a firm grip on this awesome methodology, read this book by Kent Beck.

However, TDD is not just about writing your tests first or increasing your code's test coverage. Those are add-ons you get for free when you use TDD. The real purpose of TDD is to improve your design by thinking from the point of view of the consumers who are going to use your components or APIs. Those consumers can be anyone: you, your teammates, other teams, or the general public.

Then you may ask: how can writing tests first help you design better interfaces? Good question. Let's think about it this way. Why do we write software? One possible answer is: to create applications that do a few useful things that would have been impossible or too tedious to do manually. Software is created with some intent, so that someone can use it. So two things pop out: intent and use. I think TDD lets you get these two things right while creating software. Again you may ask: how does writing a test help you get these two things right? Good question again.

When you write a test before the code itself exists, the most important question on your mind is: how am I going to use this component in my system or application? This makes you think about the interfaces, the parameters, and the exceptions you want the method to throw. And there you have designed your API.

The other important thing you think about is the intent or purpose of the code you are about to write. What do you want this piece of code to do? That is what drives your assert statements.
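To make this concrete, here is a minimal sketch of what that thinking can look like in Java. Everything in it is an assumption made up for illustration: the ShoppingCart class, its addItem and totalInCents methods, and the small check helper, which stands in for a real test framework such as JUnit so the example runs on its own. The checks in main() are what you would write first; they decide the method names, the parameters, and the exception, and the class is then filled in just enough to make them pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: ShoppingCart and its methods are invented for illustration.
class ShoppingCartSketch {

    // Written second: just enough code to satisfy the checks in main().
    static class ShoppingCart {
        private final List<Long> pricesInCents = new ArrayList<>();

        void addItem(String name, long priceInCents) {
            if (priceInCents < 0) {
                throw new IllegalArgumentException("negative price for " + name);
            }
            pricesInCents.add(priceInCents);
        }

        long totalInCents() {
            long total = 0;
            for (long price : pricesInCents) {
                total += price;
            }
            return total;
        }
    }

    // Stand-in for a test framework's assert methods (an assumption of this sketch).
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Written first: each check pins down how the API is used (use)
    // and what the code is supposed to do (intent).
    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();
        cart.addItem("book", 1250);
        cart.addItem("pen", 199);
        check(cart.totalInCents() == 1449, "total should be the sum of item prices");

        boolean rejected = false;
        try {
            cart.addItem("broken", -1);
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "negative prices should be rejected");
        check(cart.totalInCents() == 1449, "a rejected item must not change the total");

        System.out.println("all checks passed");
    }
}
```

Notice that the decision to throw IllegalArgumentException for a negative price came from writing the check, not from writing the class.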

Yes, the test-first approach can be hard to digest at first if you haven't practiced it, but once you do, you will see the results and its value. You will be amazed how easy it becomes to design your components when they are easy to test.

I have been using TDD for more than a year now, and I can definitely see the difference in the way I think about and develop software. It's weird, but TDD makes you stop hating unit tests, since they are no longer a burdensome afterthought.

Then there are the many other good things that come for free when you write your tests first. It makes you confident about refactoring existing code. I am currently working on a system at Amazon that serves millions of customers and lets them play with their apps with almost zero downtime. Imagine refactoring such a service, where a simple mistake can either do something unintended that breaks the Appstore experience for millions of users or break a piece of functionality (or the Appstore client) in its entirety. In such cases we want our tests to fail first.

Then there is code coverage. You won't believe the results of running a code coverage tool on a codebase written with a TDD approach. There is no opportunity to look at the results and then 'hack' the tests to increase the coverage, since you already get amazingly high code coverage for free!

Give it a try, it's worth it.

Tuesday, April 2, 2013

A few useful Git commands

I have been using Git at work for the last couple of years, and I must say it has helped me a lot to be more productive and to push out more and more features instead of waiting for code reviews or for expensive branches (as in Perforce) to get created.

I won't go into detail on how Git has helped me and why I love this tool. Instead, I will post some very useful commands that help you get out of some very tricky situations. I found these commands in various posts while looking for solutions to problems I encountered at work. So yes, these are not 'my' solutions, but they do work.


How to get a list of un-pushed commits in git?
Assuming your remote repository is named origin and you’re dealing with the master branch:
  
  $ git log master ^origin/master

This shows what commits are in master but not in origin/master (which is the remote branch).

This same syntax can be used to see the difference between two local branches, but will show cherry-picked commits as differences which can be confusing:
   
 $ git log master ^production

Delete last commit
To soft-delete the last commit, reset to the commit before HEAD. Alternatively, you can refer to the SHA-1 of the commit you want to reset to. The --soft option removes the commit but leaves all your changed files as "Changes to be committed", as git status would put it.

$ git reset --soft HEAD~1

If you also want to get rid of any changes to tracked files in the working tree since the commit before HEAD, use --hard instead.

Now, if you have already pushed and someone has pulled, which is usually my case, you shouldn't use git reset, because it rewrites history that others already have. You can, however, do a git revert:
  
$ git revert HEAD

This will create a new commit that reverses everything introduced by the accidental commit.

Make an existing git branch track a remote branch?
Given a branch foo and a remote upstream, as of Git 1.8.0:

$ git branch --set-upstream-to=upstream/foo foo

That will cause Git to make local branch foo track remote branch foo from upstream. (Git 1.7.x used git branch --set-upstream foo upstream/foo, which later versions deprecate.)

How to undo "git commit --amend"?
Move the current HEAD so that it points at the old commit. Leave the index intact for redoing the commit.

$ git reset --soft HEAD@{1}

Commit the current tree using the commit details of the previous HEAD commit.

(Note that HEAD@{1} is pointing somewhere different from the previous command. It's now pointing at the erroneously amended commit.)

$ git commit -C HEAD@{1}

Monday, April 1, 2013

Log5j vs Log4j

Log5j is a modern facade over Log4j, the most heavily used logging framework. It not only provides a better interface for logging but also performs better. Even though Log5j has a lot of advantages, most projects (even at Amazon) use Log4j. You will find logs like this all over the code base:
    
   log.info("this is the string with value " + value + " and it does stink with factor " + stinkfactor);

All those concatenations, yuck!

I found Log5j's interface pretty awesome, especially since it lets you log like this:
    
   log.info("this is the string with value %s and it does not stink",value);

But out of curiosity, I wanted to know how good its performance is. I did a simple test and found some interesting results.

Experiment Setup
I ran unit tests with log4j configured with a FileAppender (similar to what most production environments use). The log level was set to INFO (hence DEBUG was disabled). In each test, a different type of logging statement (with or without parameters) was emitted in a loop of 2 million iterations. Randomly generated numbers were used as parameters in each iteration to avoid optimizations the compiler could apply to constant strings.

Performance (time for the test to finish, in msec)

Use case                                                       Log4j    Log5j
1. Debug log with parameters, no isDebugEnabled(),              6184      183
   using String.format() to format the log statement
2. Debug log with parameters, no isDebugEnabled(),              1350      265
   not using String.format()
3. Debug log without parameters, no isDebugEnabled()              52      142
4. Debug log with parameters, with isDebugEnabled()              128      183
   check for log4j
5. Info log with parameters, using String.format()             50192    45667
6. Info log with parameters, not using String.format()         34202    52207
7. Info log without parameters                                 31684    30958

Notes
  • Rows 1 and 5: String.format() was used only for log4j, to simulate what log5j does internally. Log5j does not need String.format().
  • Row 2: for log4j, a logging statement like "log text" + param1 + " and other param" + random() was used.
  • Row 4: isDebugEnabled() is not required for log5j.

Conclusion
  • Do not use String.format() with log4j, especially when that log level can be off in production (e.g. DEBUG is off in production). It's very expensive.
  • Log5j performs almost the same as Log4j even without any isLOGLEVELEnabled() checks (e.g. isDebugEnabled()). Hence, it results in clean code without redundant log-enabled checks.
  • Log5j provides a cleaner interface for log statements than Log4j (with its mess of variables appended within the log text), with a small, bearable overhead.
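
The DEBUG measurements above are consistent with a logger that checks the log level before it formats anything. I have not looked at how Log5j actually implements this, so the following is only a sketch of the general idea, using a hypothetical TinyLogger class that is neither the Log5j nor the Log4j API. It counts how often String.format() really runs, which makes the saving for disabled levels visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical logger used only to illustrate deferred formatting.
// It is NOT the Log5j or Log4j API.
class TinyLogger {
    enum Level { DEBUG, INFO }

    private final Level threshold;
    private final List<String> output = new ArrayList<>();
    private int formatCalls = 0;

    TinyLogger(Level threshold) {
        this.threshold = threshold;
    }

    boolean isEnabled(Level level) {
        return level.compareTo(threshold) >= 0;
    }

    // Format-style call: the message is built only if the level is enabled,
    // so the caller does not need its own isDebugEnabled()-style guard.
    void log(Level level, String format, Object... args) {
        if (!isEnabled(level)) {
            return; // disabled level: String.format() is never called
        }
        formatCalls++;
        output.add(level + " " + String.format(format, args));
    }

    List<String> output() {
        return output;
    }

    int formatCalls() {
        return formatCalls;
    }

    public static void main(String[] args) {
        TinyLogger log = new TinyLogger(Level.INFO);
        int value = 42;

        log.log(Level.DEBUG, "debug value %s", value); // skipped, nothing is formatted
        log.log(Level.INFO, "info value %s", value);   // formatted and recorded

        System.out.println(log.output());      // [INFO info value 42]
        System.out.println(log.formatCalls()); // 1
    }
}
```

With string concatenation or a caller-side String.format(), the message is built before the logger is even called, which is why that style needs an explicit level check to stay cheap.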