Tuesday, November 23, 2010

Artifacts of Good Code

One definition of an artifact is "any feature that is not naturally present but is a product of an extrinsic agent, method, or the like...". I'm interested in the method of coding. When coding, the obvious output or product is the code itself, but I want to talk about some of the other products of good coding.

If the code compiles, there is the binary output, and there are the compiler messages produced during compilation. But these are products of compiling, not so much of coding.


When you see good code you often know it. It looks inviting. It draws you in and lets you play with it.

The characteristics or products of good coding I want to talk about are what you often see when you find good code. They are correlated with good code, but I am not claiming or trying to prove causality; they are indicators. Think of it like judging the inside of someone's house by looking at their yard: well-trimmed plants, a neat lawn, no weeds.


So what kind of things might be in the yard of a piece of code? Some obvious ones are documentation, test cases, a specification and/or requirements and/or user stories, bug tracking, use of a source code versioning system, etc. Not all code has all (or any) of these things, but my observation is that more mature code tends to have most of them.


Let's look at why. A number of these artifacts exist to facilitate more than one person working on the code. They are a sign that the code wasn't developed by a single developer, and two heads are better than one. It's immediately a good sign that the code hasn't been looked at by just one person; several people working on it means some review from different perspectives. It's like a house with a single occupant versus a share house.


Anyway, we were talking about code. If the code is occupied by a few coders, some things will probably happen. Firstly, some kind of source code versioning system will be introduced if one isn't already in use. Without such a system it's quite hard for several people to work on the same code at the same time and expect anything to work: you need a way to mark a working version, and a way to combine the coders' contributions into newer versions and mark those too. Even for one person writing code alone, marking working versions and being able to move backwards and forwards between versions is very useful, but it becomes much more necessary when several people start working on the code.

Often the next thing that happens, as more people start working on the code and using it, is the need for documentation. The original author knew what a function or module was meant to do and what its limitations were by design. But if the code doesn't do what was meant, or the original limitations are no longer sufficient, or assumptions about them were wrong, then without documentation it's hard for others to work with or use that code. One can't change or fix a function without properly knowing what it is intended to do. That needs documenting, so one can verify that the implementation of the class/module/function does what is intended, and that its callers are using it as intended.
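As a small sketch of what I mean (the function here is a made-up example, not from any particular codebase), documenting intent, inputs, outputs and limitations can be as little as a few lines:

```python
def clamp(value, lower, upper):
    """Return value constrained to the inclusive range [lower, upper].

    Intent: callers use this to keep a numeric value within bounds.
    Inputs: value, lower, upper -- comparable numbers with lower <= upper.
    Output: lower if value < lower, upper if value > upper, else value.
    Limitation: behaviour is undefined if lower > upper; no check is made.
    """
    return max(lower, min(value, upper))
```

With even this much written down, a later coder can tell whether a surprising result is a bug in the implementation or a misuse by the caller.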

Functions shouldn't be so complicated that the intention of what they do is hard to describe, so good code tends to have nicely named functions/classes/modules too. Good code uses sensible, descriptive but concise names for functions and variables. It shows that thought went into slicing the program into functions in a logical way.

Hare-brained code is easy to spot: take a sample of the function/class names, and if you can't really guess from a name what it does, it's probably a function or class that is ill-defined and ill-conceived, and exists to factorize the code in a way that isn't well aligned with any semantic meaning. Another sign of this is many functions with very similar names doing almost the same thing with very minor variations (besides operator-overloading convenience functions that accept different types). Having 10 different functions to query the same value in different ways is a bad sign. It is probably the result of insufficient documentation: as more coders worked on the code, the original functions, their intentions undocumented, became hard to modify, and their callers were so fragile to change that new coders created new, similar functions rather than risk breaking everything.

You can more confidently fix a broken implementation of a given API if you know what the intent is from the documentation; then it is the callers that are broken if they depended on the broken implementation. Without good API documentation you cannot make this judgement as easily.
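To illustrate the naming point with a contrived example (both functions below are hypothetical, and do the same thing):

```python
# From the name alone, no reader can guess what this returns --
# the kind of function name that is a bad sign.
def get_val2(d):
    return d.get("timeout", 30)

# The descriptive name states the intent, so a reader can judge
# whether the implementation, and its callers, are correct.
def connection_timeout_seconds(settings, default=30):
    """Return the connection timeout in seconds, or default if unset."""
    return settings.get("timeout", default)
```

The second name is longer, but it carries the semantic meaning that makes the code safe to change.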


Someone on a project like this (perhaps one of the original coders, if they are still around) will eventually notice how brain-dead it is all becoming. Perhaps something happens that brings to a head the fact that the code was not documented, so there was no way to know how a function should have behaved compared with how it actually behaved. Then begins the task of documenting. It's always interesting watching coders forced to write documentation for old, crufty code they didn't originally write. Next someone introduces a policy about swear words in the documentation, and so on down a road of policies. But the point is this: documentation is a sign of good code because it signifies that someone, at some time, has had to stop and think for a moment about what this function does, what its limitations might be, what the inputs are, and what the outputs are. And they'll look at the function name and, hopefully, stop to consider its design.

It's one thing to document what a function is intended to do; it's another thing altogether to check that it actually does it. That's what testing is for, and having test cases is how one does it. Without testing, there is no way to really know that something does as intended. Tests don't need to be automated, but automated is better in most cases. Unit tests can test at a fine-grained level, but any testing and any test case is a start, and a good sign that someone is trying to check that something does what it's meant to do. Code that comes with no test cases, no sample input files and no examples might not have been tested to do what is intended of it. Having unit tests or various test cases makes it far easier for coders to fix bugs and get the code doing what it is documented to do, so it's a good sign that the code is doing, or is on its way to doing, what it is documented to do.
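Even the most modest test cases count. A sketch (the function under test is a made-up example) of what a start looks like:

```python
# A hypothetical function under test.
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# A few simple checks: even these modest cases both document and
# verify what the function is meant to do.
def test_mean():
    assert mean([2, 4]) == 3
    assert mean([5]) == 5
    assert mean([1, 2, 3]) == 2

test_mean()
```

Once cases like these exist, a coder can change the implementation and know immediately whether the documented behaviour still holds.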


Requirements or user stories are a step up from checking that the code does what was intended: they check what the intentions of the code should be. Those intentions might evolve (or devolve) over time, but so too should the user stories or requirements. They are a statement of the need for the software and the purpose it will fulfil; they give the software its context and meaning. They don't turn the code into good code, but if the code has no context and meaning, then the coders might be producing something of little value to anyone but themselves.

So these are some of the obvious ways to identify that some good coding might be going on, producing some good code. Obviously, in a company setting it is possible for managers to require that "software engineers" produce the above-mentioned artifacts as standard (but then they are no longer by-products of good coding; they have been made into products of the workers' output). This is a kind of cargo-cult way of doing things (google "cargo cult" if you haven't heard of it before). It is not the way to "inspired code". In later blogs I want to write about other signs that code is good, or even inspired, and not just cargo-cult coding dressed up to look like good code.
