by Alan Skorkin on May 19, 2010
It occurs to me, that many programmers hate reading code – c'mon admit it. Just about everyone loves writing code – writing code is fun. Reading code, on the other hand, is hard work. Not only is it hard work, it is boring, cause let's face it, any code not written by you just sucks (oh we don't say it, but we're all thinking it). Even your own code begins to look increasingly sucky mere hours after you've finished writing it and the longer you leave it the suckier it seems. So, why should you waste hours looking at other people's crappy code when you can be spending this time writing awesome code of your own? Weren't we just over this; give it a couple of hours and then come back and see if your code still looks awesome. You're never going to become a master of your craft if you don't absorb the knowledge of the masters that came before you. One way to do that is to find a master in person and get them to teach you everything they know. Is this possible – certainly, is it probable – not so much, you'd have to be extremely lucky. You don't really need luck though, we're fortunate to be in a profession where the knowledge and skill of all the masters is right there for us to absorb, embedded in the code they have written. All you have to do is read it, sure it might take you a bit longer without someone sitting there explaining it to you, but it's eminently possible. To put it in perspective, try becoming a great carpenter just by looking at a bunch of well constructed furniture.
I love reading code, I have always intuitively felt that you get a lot from it, yes it can be annoying and boring, but the payoff is well worth the effort. Consider this, if you wanted to become a great writer, would you focus exclusively on writing? You could try it, but you're not going to get far. It is an accepted fact that most great writers are also voracious readers. Before you can hope to write anything decent you need to read other great writers, absorb different styles, see what others have tried before you and feed your creative self. Your knowledge will slowly grow and eventually, your own writing will begin to exhibit some maturity, you will develop a 'feel' for it. Coding is no different, why would you expect to write anything decent if you've never read any great code? The answer is you shouldn't. Reading great code is just as important for a programmer as reading great books is for a writer (I can't take credit for this thought, it belongs to Peter Norvig and he is awesome, so take heed).
Even if all of this is unconvincing, there is one fact which is undeniable. Being good at reading code is important to your survival as a professional developer. Any non-trivial project these days will be a team effort and so there will always be large chunks of code you had no hand in which you have to work with, modify and extend. And so, code reading will likely be the most used and most useful skill you can have; better bite the bullet and get good at it – fast.
How To Read Code Like … Some Kind Of Code-Reading Guy
I can't tell you how many times I've seen programmers scroll up and down through an unfamiliar piece of code, for minutes on end, with a sour expression on their face. They would later declare the code to be unreadable crap and why waste the time anyway; we can just work around the problem somehow. I am not sure what the expectation was here, absorb the meaning of the code by osmosis, or maybe intensely stare your way to enlightenment? You don't read code by just looking at it for ages, you want to understand it and make it your own. Here are some techniques I like to use, it is not an exhaustive list, but I have found these to be particularly helpful.
- Try to build and run it. Often this is a simple one step process, like when you're looking at work code (as opposed to random code). However this is not always the case and you can learn a lot about the high level structure of the code from getting it to build and execute. And, on the subject of work code, you are intimately familiar with how to build your current project are you not? Builds are often complex, but there is a lot of understanding to be gleaned from knowing how the build does its thing to produce the executable bits.
- Don't focus on the details just yet. The first thing you want to do is get a bit of a feel for the structure and style of the code you're reading. Start having a browse and try to figure out what the various bits of the code are trying to do. This will familiarise you with the high level structure of the whole codebase as well as give you some idea of the kind of code you're dealing with (well factored, spaghetti etc.). This is the time where you want to find the entry point (whatever it happens to be, main function, servlet, controller etc.) and see how the code branches out from there. Don't spend too long on this; it's a step you can come back to at any time as you gain more familiarity with the code.
- Make sure you understand all the constructs. Unless you happen to be the premier expert on your programming language there are probably some things you didn't know it could do. As you're having a high level fly-through of the code, note down any constructs you may be unfamiliar with. If there are a lot of these, your next step is obvious. You're not going to get very far if you have no idea what the code is doing syntactically. Even if there are only a few constructs you're unfamiliar with, it is probably a good idea to go look them up. You're now discovering things about your language you didn't know before, I am happy to trade a few hours of code reading just for that.
- Now that you've got a good idea about most of the constructs, it is time to do a couple of random deep-dives. Just like in step 2, start flying though the code, but this time, pick some random functions or classes and start looking through them line by line. This is where the hard work really begins, but also where you will start getting the major pay-offs. The idea here is to really get into the mindset (the groove) of the codebase you're looking at. Once again don't spend too much time on this, but do try and deeply absorb a few meaty chunks before moving on. This is another step to which you can come back again and again with a bit more context and get more out of it every time.
- There were undoubtedly things in the previous step you were confused about, so this is the perfect time to go and read some tests. You will potentially have a lot less trouble following these and gain an understating of the code under test at the same time. I am constantly surprised when developers ignore a well-written and thorough test suite while trying to read and understand some code. Of course, sometimes there are no tests.
- No tests you say, sounds like the perfect time to write some. There are many benefits here, you're aiding your own understanding, you're improving the codebase, you're writing code while reading it, which is the best of both worlds and gives you something to do with your hands. Even if there are tests already, you can always write some more for your own benefit. Testing code often requires thinking about it a little differently and concepts that were eluding you before can become clear.
- Extract curious bits of code into standalone programs. I find this to be a fun exercise when reading code, even if just for a change of pace. Even if you don't understand the low level details of the code, you may have some idea of what the code is trying to do at a high level. Why not extract that particular bit of functionality into a separate program. It will make it easier to debug when you can execute a small chunk by itself, which – in turn – may allow you to take that extra step towards the understanding you've been looking for.
- The code is dirty and smelly? Why not refactor it. I am not suggesting you rewrite the whole codebase, but refactoring even small portions of the code can really take your understanding to the next level. Start pulling out the functionality you do understand into self-contained functions. Before you know it, the original monster function is looking manageable and you can fit it in your head. Refactoring allows you to make the code your own without having to completely rewrite it. It helps to have good tests for this, but even if you don't have that, just test as you go and only pull out functionality you're sure of. Even if the tests seem totally inadequate – learn to trust your own skill as a developer, sometimes you just need to go for it (you can always revert if you have to).
- If nothing seems to help, get yourself a code reading buddy. You’re probably not the only person out there who can benefit from reading this code, so go grab someone else and try reading it together. Don't get an expert though, they'll just explain it all to you at a high level and you will miss all the nuances that you can pick up by going though the code yourself. However, if nothing works, and you just don't get it, sometimes the best thing you can do is ask. Ask your co-workers or if you're reading open source code, try and find someone on the interwebs. But remember, this is the last step, not the first one.
If I was pressed for time and needed to understand some code reasonably quickly and could only pick one of the above steps, I would pick refactoring (step 8). You will not be able to get across quite as much, but the stuff you do get across you will know solid. Either way, the thing you need to keep in mind is this. If you're new to a significant codebase, you will never get across it instantly, or even quickly. It will take days, weeks and months of patient effort – just accept it. Even having an expert right there with you doesn't significantly cut the time down (this is what the last post in my teaching and learning series will be about). If however, you're patient and methodical about your code reading (and writing) you can eventually become intimately familiar with all aspects of the project and become the go-to man when it comes to the codebase. Alternatively you can avoid reading code and always be the guy looking for someone to explain something to you. I know which one I would rather be.
Seek Out Code Reading Opportunities – Don't Avoid Them
We love to write new code, it's seductive cause this time we're going to get it just right. Ok, maybe not this time, but next time for sure. The truth is, you're always evolving in your craft and you're never going to get it just right. There is always value in writing new code, you get to practice and hone your skills, but there is often just as much (if not more) value in reading and playing with code written by others. You not only get some valuable tech knowledge from it, but often domain knowledge as well (after all, the code is the ultimate form of documentation) which is often more valuable still.
Even code that is written in an esoteric fashion, without following any kind of convention, can be valuable. You know the code I am talking about, it almost looks obfuscated, but it wasn't intended to be (for some reason it is often Perl code :)). Whenever I see code like that, I think of it this way. Just imagine the kinds of stuff you will learn if you can only decipher this stuff. Yeah, it's a major pain, but admit it, there is that niggling secret desire to be able to write such naturally obfuscated code yourself (no use denying it, you know it's true). Well, if you invest some time into reading code like that, you're much more likely to eventually be able to write it – doesn't mean you will write it, but you want to be ABLE to. Lastly, attitude is always paramount. If you view code reading as a chore then it will be a chore and you will avoid it, but if you choose to see it as an opportunity – good things will happen.