A Guide To Navigating Large Codebases
Studying large codebases is an ongoing process. Here are some useful tips to get better at it. 🚀
I recently had the opportunity to work on a large codebase for a startup. While I had prior experience of working with large codebases, this was a clear test of my skills. I had to be productive from day 1 and the expectations were high.
Within 4 days of being onboarded, I built a custom integration and tested it end-to-end to ensure it worked perfectly. I had to go through a couple of iterations in code review, but that's normal when you first start working on a new codebase. The resulting code was clean, performant, and production-ready. In the end, I was glad that I built a production-ready integration in such a short time, but boy, it wasn't as easy as I thought.
Naturally, I had to spend time reading a lot of code and making sense of the abstractions and paradigms used. On top of that, I had to deliver quickly, so I had to prioritize which parts of the codebase to look at. This doubled the challenge - I had to make sense of everything and be productive from day 1. This experience prompted me to write this article, so that you can help yourself if you're in a similar situation.
Get your Local Setup Right.
The first and most important step is to set up your local development environment. Where certain resources are not accessible, spend time setting up mocks. If your codebase has tests, great: focus on getting all of them to run and pass. If it doesn't, simply get the entire system running locally and play around with the software manually.
This is important because you want a stable base to work from. Whenever you need to "reset the state" or make changes, you should be able to do so easily on your own machine. If it's not possible to run everything locally, try to get access to a development server where you can run things.
The idea is to have an environment where you can easily execute code and have a quick feedback loop between testing and development. This will help you iterate fast, study the codebase in a detailed way, monitor logs, and understand the product better.
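As a rough illustration of the mocking idea, here is a minimal Python sketch using the standard library's unittest.mock. The function charge_customer, the gateway object, and its charge method are all hypothetical names made up for this example; your codebase will have its own equivalents.

```python
from unittest.mock import Mock

# Hypothetical function from the codebase under study. It depends on a
# payment gateway that is not reachable from a local machine.
def charge_customer(gateway, customer_id, amount_cents):
    response = gateway.charge(customer_id=customer_id, amount=amount_cents)
    return response["status"] == "succeeded"

# Stand-in for the real gateway client: `charge` returns a canned response.
fake_gateway = Mock()
fake_gateway.charge.return_value = {"status": "succeeded", "id": "ch_local_1"}

ok = charge_customer(fake_gateway, customer_id="cust_42", amount_cents=1999)
print(ok)

# The mock also records how it was called, which helps when studying a code path.
fake_gateway.charge.assert_called_once_with(customer_id="cust_42", amount=1999)
```

With a stand-in like this, the code path runs locally without the real service, and you can change the canned response to explore failure cases.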
Just Read.
A good next step is to just open random files and start reading the code. You can find entry points or use particular methods to study parts of the codebase, but as a low-resistance way to get into the habit of reading the codebase, just open files and read.
This might seem stupid, but your brain starts drawing on past knowledge and forming insights subconsciously. You might not understand anything, but the process of reading triggers feedback loops in your mind that make you extrapolate. Spend a good 2 hours just reading the code.
Understand the Project Structure and Design Patterns.
Every project is designed differently and has a unique structure. Sometimes, it may be standard, sometimes it may not. The key is to understand how the files are organised, where specific functionality resides, and the various naming conventions used. This involves looking at the top-level folders and getting a sense of how the project is organised. Identify the key entry-points, the file structure, and commonly used libraries.
For example, a utils or lib directory might contain reusable functions and classes that simplify common tasks. Familiarizing yourself with these libraries allows you to leverage existing code rather than reinventing the wheel, making your development process more efficient. Quite often, you might end up writing a function that is already provided in the codebase's lib or utils folder, and you can save time by reusing that code instead of writing it yourself.
Common directories might include src for source code, tests for test files, docs for documentation, and config for configuration files. Configuration files, often found in the config directory, define how the application should run in different environments (e.g., development, testing, production). These files might include database settings, API keys, and other environment-specific variables. Main entry points are also essential to identify, as they are the starting points of the application. For a web application, this could be files like index.js or app.py that initialize the server and load essential modules. Understanding these key components gives you a foundational knowledge of how the project starts and runs.
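To get a first overview of a project's layout, a small script can help. The sketch below is one possible approach, not a standard tool: the function name survey, the list of entry-point file names, and the list of skipped directories are all assumptions you would adjust for your project.

```python
from collections import Counter
from pathlib import Path

# Assumed names only; extend these for the project at hand.
ENTRY_POINT_NAMES = {"index.js", "app.py", "main.py", "manage.py", "server.js"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}

def survey(root):
    """Return (file count per top-level directory, candidate entry points)."""
    root = Path(root)
    counts = Counter()
    entry_points = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Ignore directories themselves and anything inside a skipped directory.
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        # Files directly under the root are grouped under ".".
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[top] += 1
        if path.name in ENTRY_POINT_NAMES:
            entry_points.append(rel.as_posix())
    return dict(counts), sorted(entry_points)

# Example usage from the repository root:
# counts, entry_points = survey(".")
```

The per-directory counts show where most of the code lives, and the candidate entry points give you a place to start reading.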
Trace Product Features To Codebase Sections.
A lot of navigating large codebases is about tracing code paths. You want to determine which functional aspects of the software correspond to which lines of code in the codebase. This engages the mind's associative tendencies: in other words, "connecting the dots" or "tracing the origins".
Here is a simple workflow you can try:
1. Select a product feature.
2. Understand the business use-case of the feature.
3. Play around with the feature and understand how it works.
4. Test out different inputs, scenarios and user flows.
5. Monitor backend logs and check which endpoints are hit.
6. Open the code for the endpoints and study them.
7. Similarly, trace the code in the frontend and study it.
8. Make simple changes and test them.
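The log-to-code part of this workflow can be partly scripted. The Python sketch below assumes an access log in the common '"GET /path HTTP/1.1"' style and plain-text source files; the log format, the function names, and the file suffixes are assumptions, so adapt them to what your backend actually emits.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed log format: ... "GET /api/orders/17 HTTP/1.1" 200 ...
REQUEST_RE = re.compile(r'"(GET|POST|PUT|PATCH|DELETE) (\S+) HTTP/[\d.]+"')

def endpoints_hit(log_lines):
    """Count (method, path) pairs, with numeric IDs collapsed to ':id'."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            method, path = match.groups()
            path = path.split("?", 1)[0]                 # drop the query string
            path = re.sub(r"/\d+(?=/|$)", "/:id", path)  # /orders/17 -> /orders/:id
            hits[(method, path)] += 1
    return hits

def files_mentioning(root, needle, suffixes=(".py", ".js", ".ts")):
    """List source files under `root` whose text contains `needle`."""
    found = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            if needle in path.read_text(errors="ignore"):
                found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

Use the feature while capturing logs, pass the lines to endpoints_hit, then call files_mentioning with one of the paths to find both the backend handler and the frontend code that calls it.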
After spending sufficient time reading the code and analysing the product, give yourself the opportunity to make changes and modify the codebase. Make small changes, add logs, add small UI elements, play around with the data access patterns, and most importantly identify areas in the codebase which need improvement.
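One low-risk way to "add logs" is a small tracing decorator that you attach temporarily to functions you want to observe. This is a minimal sketch using Python's standard logging module; the names traced and apply_discount are hypothetical and only serve the example.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("trace")

def traced(func):
    """Log the arguments and return value each time `func` is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("-> %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("<- %s returned %r", func.__qualname__, result)
        return result
    return wrapper

# Hypothetical function standing in for code you want to observe.
@traced
def apply_discount(price_cents, percent):
    return price_cents - (price_cents * percent) // 100

apply_discount(2000, percent=15)
```

Because the decorator only wraps the call, removing it restores the original behaviour, which makes it easy to clean up before you open a pull request.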
Read Tests.
The tests in your codebase can tell you plenty about the problems and use-cases the code solves. They'll give you a better perspective on the product and the features it offers. Depending on the type of tests (unit / integration / e2e), your insights will vary, and they'll fill different gaps in your knowledge of the codebase.
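If the tests are written in Python, you can skim them as a rough list of behaviours by extracting the test names. The sketch below uses the standard ast module and assumes the common convention that test functions start with "test"; the function name list_test_cases and the sample source are made up for illustration.

```python
import ast

def list_test_cases(source):
    """Return the names of test functions in Python source, in file order."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name.startswith("test"):
            found.append((node.lineno, node.name))
    # ast.walk does not guarantee file order, so sort by line number.
    return [name for _, name in sorted(found)]

# Hypothetical test file contents.
sample = '''
def test_checkout_rejects_empty_cart(): ...

class TestRefunds:
    def test_refund_is_capped_at_amount_paid(self): ...
    def helper(self): ...
'''

for name in list_test_cases(sample):
    print(name.replace("_", " "))
```

Well-named tests read almost like a specification, so a listing like this is a quick way to see which scenarios the team cared enough about to protect.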
Leverage Version Control History
Reading commit messages can help you understand the evolution of the codebase. If the codebase has well-written commit messages, it's even better because you can read the commits and make sense of design decisions and trade-offs.
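Beyond reading messages, the history can also tell you which files change most often, which is usually a hint about where the important (or fragile) code lives. Here is a minimal Python sketch, assuming git is installed and the script runs inside a repository; the function names and the default of 200 commits are arbitrary choices for the example.

```python
import subprocess
from collections import Counter

def git_changed_files(repo_dir=".", max_commits=200):
    """Return the file paths touched by the most recent commits, one per line."""
    result = subprocess.run(
        ["git", "log", "-n", str(max_commits), "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout

def churn(log_output, top=10):
    """Count how often each file path appears in the log output."""
    paths = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(paths).most_common(top)

# Example usage inside a repository:
# for path, count in churn(git_changed_files()):
#     print(count, path)
```

Once you have a shortlist of frequently changed files, running git log on a single path and reading its commit messages is a focused way to learn the reasoning behind that part of the system.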
Read Documentation & Document Your Learnings.
Reading documentation, especially design docs, will familiarise you with the high-level workings of the application. This will help you understand the trade-offs that came into play when building the system and help you write better code. The next step is to create documentation for things that haven't been codified yet.
Conclusion
These are some basic guidelines you can follow to navigate large codebases. As a Software Engineer, you will find yourself in a position where you need to gather context and understand codebases several times in your career. Like everything else, the more you practice, the better you get at it.
Navigating codebases is an ongoing job. It's not a one-time effort that yields results, but rather a continuous process that you keep up throughout the lifetime of the project. With these tips, over time, you will understand several nuances that even experienced engineers on the team might not know! In such cases, document!
I hope this helps you become a better engineer. Keep learning and growing!
Follow me on Instagram (@adityapatange), I talk about tech, meditation, startups and hip hop! ⚡️