3 Rules for Refactoring Functions

· erock's devlog

Simple steps to guarantee cleaner functions

  1. Return early
  2. Return often
  3. Reduce levels of nesting

These rules are the heartbeat to how I write code. Whenever I write a function there is nothing more important for readability and maintainability. Whenever I approach a function that needs be refactored it's because the function did not subscribe to these rules.

Let's try to build a function that lets a user sign up for our service:

 1interface Response {
 2  type: "success" | "error";
 3  payload: any;
 4}
 5
 6function register(email: string, pass: string): Response {
 7  if (isEmailValid(email) && isEmailAvailable(email)) {
 8    if (isPassValid(pass)) {
 9      const userId = createUser(email, pass);
10      return {
11        type: "success",
12        payload: { userId },
13      };
14    } else {
15      return {
16        type: "error",
17        payload: "password must be 8 characters long",
18      };
19    }
20  } else {
21    return {
22      type: "error",
23      payload: "email is invalid",
24    };
25  }
26
27  return {
28    type: "error",
29    payload: "unknown error occurred",
30  };
31}

All the named functions are stubs, the actual implementation is irrelevant to this demonstration. Before we can create the user, we need to make sure that the email and password provided to us are valid and the email address has not already been created. There are a couple issues with the current code that can be improved. First, the main purpose of the function's code is nested within multiple if-statements. This makes it harder to understand how a function's core responsibility gets activated. Because we want to return meaninful errors in this function, we have to create accompanying else-statements which are not co-located next to the logic that controls them and makes them hard to read. Max indendation is 4.

Return early #

The general idea is to figure out all the ways to exit a function as quickly as possible. This reduces cognitive overhead when reading a function. When you are hunting a bug and a function returns early for your use-case, you can move on quickly because you avoid the cognitive load required to understand the rest of the function.

 1function register(email: string, pass: string): Response {
 2  if (!isEmailValid(email) || !isEmailAvailable(email) || !isPassValid(pass)) {
 3    return {
 4      type: "error",
 5      payload: "validation failed",
 6    };
 7  }
 8
 9  const userId = createUser(email, pass);
10  return {
11    type: "success",
12    payload: { userId },
13  };
14}

This seems better. We have removed the need to read all the code when we know our email or pass is invalid. The if-statement/else-statement separation in the previous example is gone: the condition is co-located with the error state. Also the primary goal of the function's code is in the main body of the function. However, we lost something in this iteration: meaningful error messages. We no longer know what piece of validation failed because they are all inside one if-statement. Max indentation is 3.

Return often #

Don't wait until the end of a function to return. Based on experience, waiting until the end to return from a function ends up creating a lot of branching logic that a developer has to keep track of in their head when traversing a function's code.

 1function register(email: string, pass: string): Response {
 2  if (!isEmailValid(email) || !isEmailAvailable(email)) {
 3    return {
 4      type: "error",
 5      payload: "email is invalid",
 6    };
 7  }
 8
 9  if (!isPassValid(pass)) {
10    return {
11      type: "error",
12      payload: "password must be 8 characters long",
13    };
14  }
15
16  const userId = createUser(email, pass);
17  return {
18    type: "success",
19    payload: { userId },
20  };
21}

Now we have separated the email validation from the password validation and have better error messages by returning early and often. This is clear to understand and at each step of the function we filter out parameters that would prevent the core of our function from running. However, there are still some improvements we could make. We have an if-statement that requires two condititions to be met before exiting the function. This logic seems straight-forward, but when given the opportunity, I almost always split the condition into separate if-statements. Another side-effect of having one statement handle two conditions is the error messaging is a little misleading. Max indentation is 3.

 1function register(email: string, pass: string): Response {
 2  if (!isEmailValid(email)) {
 3    return {
 4      type: "error",
 5      payload: "email is invalid",
 6    };
 7  }
 8
 9  if (!isEmailAvailable(email)) {
10    return {
11      type: "error",
12      payload: "email is already taken",
13    };
14  }
15
16  if (!isPassValid(pass)) {
17    return {
18      type: "error",
19      payload: "password must be 8 characters long",
20    };
21  }
22
23  const userId = createUser(email, pass);
24  return {
25    type: "success",
26    payload: { userId },
27  };
28}

Using all of the rules we have the final result. This seems readable, traceable error messaging, and reduces the level of indentation. Now it may seem like this is lengthier than the other examples, which is true. Lines of code is not really a concern to me and is not something I consider a valuable metric to compare code for readability. Max indentation is 3.

Could we improve this code even more? Let's try to refactor some of the error handling by making the validation function return the error message.

Dynamic returns #

 1/*
 2  We switch the validation functions from returning if the email is valid to
 3  returning the error message if the email is not valid. e.g.:
 4    isValidEmail(string): boolean -> validateEmail(string): string
 5*/
 6function register(email: string, pass: string): Response {
 7  const validations = [
 8    validateEmail(email),
 9    validateEmailAvailable(email),
10    validatePass(pass),
11  ];
12
13  for (let i = 0; i < validations.length; i++) {
14    const err = validations[i];
15    if (err) {
16      return {
17        type: "error",
18        payload: err,
19      };
20    }
21  }
22
23  const userId = createUser(email, pass);
24  return {
25    type: "success",
26    payload: { userId },
27  };
28}

Is this example more readable than the previous one? There are definitely some positives to this version. It's easier to add more validations because we can add the functions to the validations array and it should just work. There are only two return statements in the function which is less than the previous example.

However, we have introduced a for-loop to iterate over the validation functions and we will return if any of those validations fail. Even though this function abstracted the concept of validation and is more scalable to adding more validations, it strikes me as very dynamic. Also, every function must subscribe to handling validations the same way, if they don't, figuring out the bug is going to be tricky to figure out because the error is going happen inside a for-loop.

To me, this function is less readable because of its dynamic nature. The end-developer now needs to mentally iterate over the validations array which is a significant amount of cognitive load. At this point it really depends on what is more important: readability or scalability.

My instincts tell me to use the previous example until it is unbearable to continue to write the validation checks one after another.

Reduce levels of nesting #

As code structure determines its function, the graphic design of code determines its maintainability. Indentation, while necessary for visualizing the flow control a program, is often assumed to be merely an aethestic appeal. However, what if indentation can help determine unnecessary code complexity? Abrupt code indentation tends to convolute control flow with minor details. 1 Linus Torvalds thinks that greater than three levels of indentation is a code smell which is part of a greater design flaw.

Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. 2

In python, indentation is a rule not a guideline, any python script with misaligned code nesting will result in an IndentationError: expected an indented block. Again in python if you import this you'll get "The Zen of Python," otherwise known as PEP20, here is snippet from it:

Flat is better than nested.

Studies by Noam Chomsky suggest that few people can understand more than three levels of nested ifs 3, and many researchers recommend avoiding nesting to more than four levels 4 5

Jeff Atwood thinks that nesting code has a high cyclomatic complexity value which is a measure of how many distinct paths there are through code. A lower cyclomatic complexity value correlates with more readable code, and also indicates how well it can be properly tested. 6

Why is this important? Rarely does readability seem to play an important role in most undergraduate discussions when it can in fact be one of the most important characteristics of widely used and maintainable code. Code that is only understood by one person is code not worth maintaining -- and as a result poorly designed.


I have no idea what I'm doing. Subscribe to my rss feed to read more posts.