Clean Code

You I recently read a book by uncle Bob, which is a must read for every developer. It’s called Clean Code and it completely revolutionised the way I think about code. In particular when coding I am now much more careful about meaningful names, numbers of parameters to pass to a function, the number of lines of code per function, use of comments, error handling and tests.

Why should code be clean?

Before we get to that I think we should start by asking the question what is code? In reality code is just documentation. Yep, that’s all we do, write documentation all day long. More than anything code is the best and most unambiguous specification document of the solution.

We have all been in this situation: It’s the start of a new project and your sprint velocity is really high, you are completing a lot of features per sprint … but then all of a sudden everything slows down. This normally happens because before you write code you need to read the written code. If the code is in a bad state, it is going to take you a lot of time to read the code.

productivity

To be honest this is something you can’t avoid. You will always need to read code to write new code. The only thing you can do is make the experience of reading code better for everyone.

What is good Code?

Let us start with an easier question. What is bad code? We can all notice bad code when we see it. Basically its code which isn’t readable. Then, by definition, good code is readable code.

In good code you can read through the code, starting from a high level and going down to the next level. Each level should introduce the next layer of abstraction. When a developer is reading the code he should say: “Yes, this is obvious … yes this is also obvious”.

levelsOfAbstraction

In the above diagram we see the classical 3 tier architecture however notice that the higher you go the more high level the modules are. The user interface does not know that there is a reporting database, or that for some reason, the photos are not saved with the user data. The user interface just calls some classes from the user and reporting module and displays the data. Also within the actual layers you should start at a high level and to the lower level. Thus the public methods will very high level and the private classes will be at a lower level. The same thing should happen even in the classes. The public methods are at a high level while the private methods are at a lower level.

Let’s Code

Sam had to write some code to get an employee record:

public Employee getEmployeeData(string email, out string error)
{
  // check email
  if (!email.Contains("@"))
  {
    error = "email is not valid";
    return null;
  }

  // get user by email
  var employee = Employees.Where(x => x.Email == email);

  // check if user exists
  if(employee == null)
  {
    error = "Employee with email: " + email + " was not found";
    return null;
  }
  else if(employee.Count() > 1)
  {
    error = "More than one employee found with email";
    return null;
  }
  else
  {
    error = "";
    return employee.First();
  }
}

Sam’s code isn’t too bad. However it does have some potential flaws. First of all the method is too long. Its going to take some time to read through it. I also see an example of DRY (don’t repeat yourself). The comments are a classical example. Why did Sam think that writing “get user by email” is more clear then a lambda expression? To me the lambda expression was far more descriptive and helped me understand more what is going on. There is a wrong choice of names. The variable name “employee” is too generic

Method Signature

This is where Tom comes along and sees this code. Obeying the Boy Scout Rule he starts to refactor this code. Let us start by the method name: “GetEmployeeData”. The word “data” seems to be redundant. Everything is data, so we can eliminate that word. The next thing we notice is that method name is a lie. We are not just getting employee data, we are also validating the e-mail. A more appropriate name would be “ValidateEmailAndGetEmployee”.

The next thing is the number of parameters. We should think about who is going to be calling this method. Given that this method is getting the employee by email I would expect to pass the e-mail. However I don’t expect to pass anything else. For some reason the developer is also passing in an out parameter to record errors. In object oriented we have exceptions and these make the code much cleaner.

Helper Methods

The first 6 lines of the method are just there to check the email. So in this case we should split the code into smaller functions. This also help us to eliminate the comments which just make the code longer and more ambiguous.

Final Code


public Employee validateEmailAndGetEmployee(string email)
{
  validateEmail(email);
  return getEmployeeByEmail(email);
}

private void validateEmail(string email)
{
  if (!email.Contains("@"))
  {
    throw new EmployeeException("Email not valid");
  }
}

private Employee getEmployeeByEmail(string email)
{
  var employeesWhichMatchEmail = Employees.Where(x => x.Email == email);

  ErrorIfMoreThenOneEmployeeIsInList(employeesWhichMatchEmail);

  return employeesWhichMatchEmail.First();
}

private void ErrorIfMoreThenOneEmployeeIsInList(IEnumerable listOfEmployee)
{
  if (listOfEmployee == null)
  {
    throw new EmployeeException("More than one employee got returned");
  }

  if (listOfEmployee.Count() > 1)
  {
    throw new EmployeeException("More than one employee found with email");
  }
}

I think you agree that this code is much cleaner. First of all the method and variable names describe better what is happening. The next thing we notice is that the first method is quite high level and just contains references to helper functions. The helper functions keep on decreasing the abstraction and going down to the implementation. Like this the developer reading the code can immediately assume what every method is doing.

Okay… I think that is too much information in one post. Basically we briefly discussed the following topics:

  1. Why we should invest in clean code
  2. What the bad code looks like
  3. Boy Scout Rule
  4. The proper use of function and variable names
  5. Using comments
  6. Helper methods
  7. Error handling

Till the next post go to Amazon and buy the book.