Mon Mar 13 2023

Practical tips on writing clean code: Improve your coding & enhance your software

By Thijs Nieuwdorp and Andrea Kuijt

In a world that runs on code, writing high-quality code is becoming ever more important. Simply put, clean code is essential for building software that is reliable, efficient, and easy to work with.

In this blog, we will delve into some principles and practices of clean code and how you can apply them in your own projects. Whether you are a developer working hard to make a tool production ready or a data scientist working on a prototype (which should eventually end up in production!), understanding and practicing clean code is crucial for building software that stands the test of time.

If you want to improve your coding skills and make your software better, this post is for you!

Why is clean code important?

You’re not done when it works, you are done when it is right

On the road to writing better software, a sensible starting point is studying and applying the fundamentals found in Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin. This handbook contains a set of principles and rules based on years of experience in writing software.

In the book, ‘Uncle Bob’ explains why writing clean code is important. He emphasizes that code is read more often than it is written: "You may think that your job is to get code to work. That is not true; it’s only half of your job. The other half of your job is to write code that other people can maintain, can use, and can make work."

Uncle Bob advocates code that looks like it was written by someone who cares about everyone else who is going to read the code. We should stop writing code that grows more tangled and corrupted with each passing day. And we should stop creating a bug list that is a thousand pages long.

Software craftsmanship is about being a real professional. Musicians study and practice the seemingly trivial details of their disciplines, and programmers should do the same.

By ensuring your code stays clean, you will not only spare the reader of your code (and yourself) unnecessary frustration, but you will above all contribute to the main benefit of writing clean code: a reduction of technical debt.

What is technical debt?

Technical debt, or tech debt for short, is the implied cost of additional rework caused by implementing a short-term solution (a quick fix) when writing software. Even though it can only be coarsely quantified, tech debt is estimated to be immense.

Based on research done by Stripe, tech debt leads to an estimated global loss of roughly $85 billion per year. In day-to-day life, Stripe estimates that developers spend on average 23 to 33% of their time dealing with technical debt. An example of code that creates tech debt, which Uncle Bob calls ‘rude’, is copying and pasting the same piece of code multiple times instead of wrapping it in a function.

The problem is that no solution is as permanent as a temporary solution; neglected or even completely skipped tasks will not be revised once code is in production. Even though it may seem impossible to ensure high code quality in the face of tight deadlines, it is still the responsibility of every developer to prioritize quality. Also, Gartner found that by actively managing and reducing technical debt, it is possible to achieve at least a 50% faster service delivery time to the business.

In a nutshell, tidying up code after making it work, adding tests to prove that not a single part is failing, and reviewing the code before merging are the most important solutions when it comes to reducing and combatting technical debt.

How to write clean code?

We selected some principles and rules recommended by The Clean Code Handbook that will make your life, and that of the people who read your code, easier. Thijs, a Data Scientist at Xomnia and a strong believer in writing readable code, has studied The Clean Code Handbook in great detail.

After completing a code review on a simple script that was created in a Jupyter notebook, he answered the most important questions on how to start writing clean code.

Where do I start with cleaning my code?

1) Clean as you code:

Don’t worry about determining which parts of your codebase need improvement. Thijs advocates the “clean as you code” method: Whenever new code is written or existing code is touched up, make sure it meets the new standards. This prevents you from investing time in irrelevant parts of your code and will decrease your technical debt over time.

Clean Code Principle 1: “Clean as you code”

Cleaning code should only happen when new code is written or existing code is touched up.


import polars as pl
import plotly.express as px
import os

dataset = (
	pl.scan_csv("data/List of most-streamed songs on Spotify.csv")
	.with_column(
    	pl.col("Streams (Billions)").str.replace(",", "").cast(pl.Int16)
	)
)

dataset.fetch(5)

# Plot the streams of the top 10 songs
# Plot the most popular artists in the top 100
# Something with the dates?

# Create bar plot


fig = px.bar(dataset.fetch(5).select(["Song", "Streams (Billions)"]).to_pandas(), x = "Song", y = "Streams (Billions)", title="Spotify Top 5")

# Plot


fig.show()

# Write the image


if not os.path.exists("output"):
	os.mkdir("output")
fig.write_image("output/top5songs.png")

#dataset.groupby("Artist").agg(pl.sum("Streams (Billions)")).sort(pl.col("Streams (Billions)"), reverse = True).head().collect().to_pandas()

# Create top 5 of top 100 streams per artist


fig = px.bar(dataset.groupby("Artist").agg(pl.sum("Streams (Billions)")).sort(pl.col("Streams (Billions)"), reverse = True).head().collect().to_pandas(), x = "Artist", y = "Streams (Billions)", title = "Top 100 streams per artist")
fig.show()

if not os.path.exists("output"):
	os.mkdir("output")
fig.write_image("output/top5artists.png")

#dataset.groupby("Artist").agg(pl.col("Artist").count().alias("Top 100 entries")).sort(pl.col("Top 100 entries"), reverse = True).head(5).collect().to_pandas()

# Artists that appeared in the top 100 the most often


fig = px.bar(dataset.groupby("Artist").agg(pl.col("Artist").count().alias("Top 100 entries")).sort(pl.col("Top 100 entries"), reverse = True).head(5).collect().to_pandas(), x = "Artist", y = "Top 100 entries", title = "Number of top 100 entries")
fig.show()

if not os.path.exists("output"):
	os.mkdir("output")
fig.write_image("output/top5mostentries.png")

Thijs reviewed the script above, which uses a dataset containing the top 100 songs on Spotify to generate a few plots. Below, he explains how to improve the script to increase its readability and efficiency.

2) Write comments

Comments are a tool that’s often misused to cover up bad coding practices. Comments in your code can be a great tool to communicate to the people who are going to read your code. However, they come with some pitfalls.

Avoid the following types of comments:

Obsolete comments: Modern IDEs make it easy to change code, but comments are not updated along with those edits. The developer has to adjust them separately, which is easy to forget and will therefore be forgotten at times. Comments containing information that is no longer relevant only cause confusion, so it’s better to simply remove them.

Code review: These comments were used to structure the program before it was written out. They’re no longer relevant, so it is better to remove them.


# Plot the streams of the top 10 songs
# Plot the most popular artists in the top 100
# Something with the dates?

Redundant comments: Another common pitfall when writing comments is repeating what is already stated in the code. When the code has been changed and the comments haven't, this can become quite confusing. Therefore, a clean code rule of thumb is: What can be stated in names of functions or variables should not be stated in comments.

Code review: The comments say exactly what the code does. Wrapping the code in functions or variables with descriptive names should prevent the need for comments like this.


# Create bar plot
fig = px.bar(dataset.fetch(5).select(["Song", "Streams (Billions)"]).to_pandas(), x = "Song", y = "Streams (Billions)", title="Spotify Top 5")
# Plot
fig.show()

Commented-out code: Even though we understand you don’t want to remove your ‘old’ code immediately because you might need it in the future, we strongly recommend against commenting out code in order to preserve it. Tools like git keep a detailed history of the code. It is better to keep the current version clean and revert to an earlier commit when necessary than to keep code around in comments, where it will eventually rot. For example, when you rename a variable and forget to apply that change to the commented-out code, that code no longer works.

Code review: Remove the code that you are not actually using.


#dataset.groupby("Artist").agg(pl.sum("Streams (Billions)")).sort(pl.col("Streams (Billions)"), reverse = True).head().collect().to_pandas()

Clean Code Principle 2: “Only use comments to explain something that the code itself cannot explain”

Put the effort in the code, not in the comments. 

Don't use a comment when it can be explained in a variable name or function name.
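As a minimal sketch of the difference, consider the hypothetical helper `parse_stream_count` below (it is not part of the reviewed script, and the assumption that the source file writes counts with thousands separators is ours, for illustration only). A comment that restates the code adds nothing, while a comment that records the reason behind the code tells the reader something the code cannot:

```python
def parse_stream_count(raw_value: str) -> int:
    """Convert a stream count such as "3,332" into an integer."""
    # Redundant (avoid): "remove the commas and convert to int" only restates the code.
    # Useful: the (assumed) source file writes counts with thousands separators,
    # and int() raises a ValueError on a string like "3,332".
    return int(raw_value.replace(",", ""))


print(parse_stream_count("3,332"))  # prints 3332
```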

3) Determine and adhere to a naming convention

Another issue is picking the right name for objects in your code. Names are the main way to communicate functionality to the person who works on the code after you. That makes it very important to put time into precisely naming your variables and functions.

Many small, aptly named functions reduce the need for comments: a precise and concise name describes what is going on in a piece of code. An added advantage is that wrapping code in functions reduces duplication and repetition. But again, naming comes with pitfalls.

Avoid the following types of code naming conventions:

  • Ambiguous names: Use unambiguous names that describe what the functions and variables do. If a name is unclear to you, it is probably also for others.
  • Magic numbers: These are numbers that appear in the code but are never defined. For example, instead of putting 3600 in the code, define it as a constant: SECONDS_IN_AN_HOUR = 3600.
  • Variable names that don’t contain any information about their purpose: It should be clear what the programmer intended the variable to be used for. Therefore, use ‘top_5_songs_plot’ instead of ‘fig’ as a name for a variable that holds a plot of the top 5 songs.
  • Lengthy names for big-scoped and short names for small-scoped functions: As a rule, the length of a function’s name should be inversely proportional to the scope of the function.
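The magic-number rule from the list above can be sketched as follows. The constant name comes from the text; the function `seconds_to_hours` is a hypothetical example added for illustration:

```python
# Named constant instead of a bare 3600 scattered through the code.
SECONDS_IN_AN_HOUR = 3600


def seconds_to_hours(duration_in_seconds: float) -> float:
    """Convert a duration from seconds to hours."""
    return duration_in_seconds / SECONDS_IN_AN_HOUR


print(seconds_to_hours(7200))  # prints 2.0
```

The name documents what the number means, and if the value ever has to change, it changes in one place.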

Code review: In our code, we can change the name of the variable “fig”, which has a very specific purpose, from what we see below:


fig = px.bar(dataset.fetch(5).select(["Song", "Streams (Billions)"]).to_pandas(), x = "Song", y = "Streams (Billions)", title="Spotify Top 5")

… to a more informative name that better reflects what the code is there for:


top_5_songs_plot = px.bar(
	dataset.select(["Song", "Streams (Billions)"])
	.sort(pl.col("Streams (Billions)"), reverse=True)
	.head()  # head() returns the first 5 rows by default
	.collect()
	.to_pandas(),
	x="Song",
	y="Streams (Billions)",
	title="Spotify Top 5",
)

Clean Code Principle 3: “A name in your program should tell the reader the purpose and the significance of a function or variable.”

4) Write methods properly

Even though writing a method is preferred over copying and pasting code over and over again, a few rules should be considered when writing these methods. Because they also come with pitfalls, avoid the following types of methods:

  • A method that does more than one thing: In the wise words of Uncle Bob: “Functions should hardly ever be 20 lines long”. Keep your functions small, preferably limiting their scope to only one level deep. If there are multiple things happening in a function, break it down into smaller ones.
  • A function that contains unexpected functionalities: Do not put functionality in a function that is not expected based on its name. For example, do not write a function called “generate_plot” that does more than generate a plot, such as also saving it to disk. Such an I/O operation is called a side effect. The same goes for overwriting global variables, creating cookies, and changing settings.
  • A function that goes up and down in its level of abstraction: Every line in a function should be at the same level of abstraction, which is one below the name.
  • A function that both returns a value and changes the state of the system: A function should either answer a question or perform an action, not both.
  • Other tips include using searchable and pronounceable names and replacing “magic numbers” (such as the often used 3600, which is the number of seconds in an hour) with a constant that defines what it represents.
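The pitfalls around side effects and state changes listed above can be sketched as follows. This is a minimal illustration with hypothetical names (`build_report`, `save_report`) and made-up data; it uses plain text instead of a plot so that it runs with the standard library only. The function that computes a value does not touch the system, and the function that writes to disk says so in its name and returns nothing:

```python
import os
import tempfile


def build_report(song_streams: dict[str, float]) -> str:
    """Return the report text. No disk access, no global state."""
    lines = [f"{song}: {streams}" for song, streams in song_streams.items()]
    return "\n".join(lines)


def save_report(report: str, directory: str, filename: str) -> None:
    """Write the report to disk. The side effect is announced by the name."""
    os.makedirs(directory, exist_ok=True)  # creates the directory only if it is missing
    with open(os.path.join(directory, filename), "w", encoding="utf-8") as report_file:
        report_file.write(report)


# Hypothetical data and a temporary directory, for illustration only.
output_directory = tempfile.mkdtemp()
report = build_report({"Song A": 3.3, "Song B": 3.1})
save_report(report, output_directory, "report.txt")
```

As an aside, `os.makedirs(..., exist_ok=True)` covers the "check whether the directory exists, then create it" pattern in a single call.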

Code review: Instead of copying and pasting code every time you want to create and save a plot …


# Create bar plot


fig = px.bar(dataset.fetch(5).select(["Song", "Streams (Billions)"]).to_pandas(), x = "Song", y = "Streams (Billions)", title="Spotify Top 5")

# Write the image


if not os.path.exists("output"):
	os.mkdir("output")
fig.write_image("output/top5songs.png")

… use a function with an informative name and call it when you want to apply it:


from plotly.graph_objects import Figure

def write_image_to_output_directory(figure: Figure, filename: str) -> None:
	if not os.path.exists("output"):
		os.mkdir("output")
	figure.write_image(f"output/{filename}.png")

def create_and_save_plot(
	dataframe: pl.DataFrame, title: str, x_axis: str, y_axis: str
) -> None:
	# Convert to pandas for plotting, as the original script does
	plot = px.bar(dataframe.to_pandas(), x=x_axis, y=y_axis, title=title)
	# The file name is derived from the title, e.g. "spotify_top_5"
	filename = title.replace(" ", "_").lower()
	write_image_to_output_directory(plot, filename)

create_and_save_plot(
	top_5_number_of_top_100_entries,
	x_axis="Artist",
	y_axis="Top 100 entries",
	title="Total top 100 entries per artist",
)

Clean Code Principle 4: “Don’t repeat yourself. Use functions.”

5) Write tests

Even though our code review covers a Jupyter notebook that is not part of a bigger software system, it is best practice to write tests for every piece of code that you create. An interesting way of developing code is to create the necessary tests first and then fill in the code. This is called test-driven development (TDD).

According to the laws of test-driven development, you are not allowed to:

  1. Write any production code until you have first written a test that fails if the production code does not exist.
  2. Write more of a test than is sufficient to fail (not compiling counts as failing too).
  3. Write any more production code than is sufficient to pass the current failing test.

Go back and forth between your production code and your unit tests. If you work this way, you will spend far less time debugging; that time is better spent learning something new, such as another language, because debugging is not a skill you should need to rely on. Also, if you want to know how a system works, read the tests: they document in detail how the code is expected to behave. Although this development method may seem extreme, writing a lot of tests is recommended. It greatly shortens the feedback loop on the quality of your work, and thus increases your efficiency and productivity.
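A minimal sketch of one such cycle, assuming we want to extract the file-name logic used in the code review into its own function. The name `title_to_filename` is hypothetical. The test is written first and fails as long as the function does not exist; the function is then filled in with just enough code to make the test pass:

```python
# Step 1: write the test first. Running it before step 2 fails with a NameError.
def test_title_to_filename() -> None:
    assert title_to_filename("Spotify Top 5") == "spotify_top_5"


# Step 2: write just enough production code to make the failing test pass.
def title_to_filename(title: str) -> str:
    """Derive a file name from a plot title."""
    return title.replace(" ", "_").lower()


# Step 3: run the test again; it now passes silently.
test_title_to_filename()
```

In a real project, a test runner such as pytest or the standard library's unittest module would collect and run these tests for you.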

Conclusion

To put this all together, technical debt is an expensive problem that occurs when choosing a quick and limited solution over a better approach that takes longer. Clean code is a set of rules and principles you can follow to reduce tech debt, and we discussed a few of those principles.

Fixing tech debt you find while working on a codebase, a principle known as “clean as you code”, can improve not only the quality of your own work, but also your productivity. “Clean as you code” rules can be applied to both the code you create and the code you touch during normal work, in order to improve the existing code base slowly but steadily.

This blog also went over rules about commenting: Remove comments that are obsolete or redundant, as well as commented-out code. We also went over rules for naming: Use descriptive names and define “magic numbers” as constants. Last but not least, we went over rules about how to use functions: Use the single responsibility principle, avoid side effects, and use many small descriptive functions.

Now, all you have to do is get started with writing clean code!