Saturday, November 26, 2011

Hungarian Notation - You Should Reconsider It

I used to think that Hungarian Notation was silly. I had a hard time seeing the point of it, and it just seemed like visual clutter when reading code. Eventually, I ended up working on a team that used it, and it really grew on me! The real problem with Hungarian Notation is that it comes in so many different forms. So, its effectiveness can vary, and the perception of it can vary as well, from highly useful to friggin’ annoying. But the benefits of a consistent naming convention shouldn’t be written off just because there are some less-than-ideal examples out there. The truth is, it’s pretty useful, and it can save you a lot of grief.


What I’m talking about in this article is usually referred to as “Systems” Hungarian, where the naming prefixes are not specific to any particular application. They are generic, and usually include information on the type of a named entity, such as a variable or class member.

What’s the Point?

It’s important to remember the point of a Hungarian Notation naming convention. Contrary to common misconception, it is NOT to try to do the job of a compiler or to duplicate its type-safety mechanisms. The point is to assist the programmer in understanding code and to mitigate human error when writing it. We are only human. All bugs in software are due to human error. Even with “type safe” languages, compilers can only go so far.

All languages (that I have ever worked with) have automatic type conversions and ambiguities that should not be ignored. The “type safety” provided by languages like C++ and Java is sometimes an illusion. It only goes so far, and it is often the intended semantics of a variable’s type that matter. Our disillusionment usually comes only through the pain of debugging elusive, weird issues caused by values that a compiler considers compatible but whose semantic differences can wreak havoc.

A few examples in C++:

  • When passing a string to printf, the “%s” format specifier can be passed any pointer, or any integer, since a pointer is just a number, so any number will do (right?).
  • An instance variable and a reference are compatible and assignable. Assigning to an instance variable makes a copy, so if a reference was intended, you silently end up working with a copy instead.
  • A C++ function name can be used to invoke the function or get a pointer to a function (think of forgetting the ‘()’ on a function that _returns_ a pointer)
  • All numeric base types are automatically converted when necessary. This is true of every language I have worked with. So, if you’re doing lots of math in code, you can get unexpected results (e.g. answer = (1 * 0.5); what do you get when ‘answer’ is an ‘int’?). While the compiler may not allow you to shoot yourself in the foot, it will allow you to walk through a field of poison ivy where you’ll think you’re just fine, only to find yourself suffering greatly later, asking yourself how this could have happened and where the !!F!! is the calamine lotion???!?!
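
To make the last bullet concrete, here is a minimal C++ sketch of the ‘answer’ example. The function names are made up for illustration, and the prefixes (‘a’ for argument, ‘l’ for local, ‘i’ for integral, ‘r’ for rational) follow the convention listed later in this article. It only shows the truncation and how a prefix can make it visible; it is not taken from any real code.

```cpp
// Hypothetical helpers, invented for this illustration.

// The 'i' in liAnswer says "integral", which clashes visibly with the 0.5:
// the double result is silently truncated when it is stored in an int.
int halveTruncated(int aiCount) {
    int liAnswer = aiCount * 0.5;
    return liAnswer;
}

// The 'r' in lrAnswer says "rational", which matches the arithmetic.
double halveExact(int aiCount) {
    double lrAnswer = aiCount * 0.5;
    return lrAnswer;
}
```

Both functions are legal C++. Given 1, halveTruncated returns 0 and halveExact returns 0.5.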


As projects grow, and the code we work with gets more and more complex, naming conventions become more and more important. Incidentally, this is also why it is difficult to come up with convincing examples. The examples I use are demonstrative and do not represent the intricate real-world situations where naming conventions really pay off.

It Actually Does Help

The other day, I helped someone on my team track down a bug in their C++ code. The symptoms were indicative of some sort of memory corruption, because a null’ed pointer suddenly had a value again. It was bizarre, and it was stressing everybody out. In this particular case, it was legacy code that did not use any naming conventions or Hungarian Notation. The problem ended up being that a reference had been assigned to an instance variable, which made a copy, when the value was intended to be held as a reference. If our naming conventions had been used, then the problem might have been noticed much earlier, because the usage would have been inconsistent with the name, even though the compiler said it was OK. This simple, subtle oversight slipped by and it caused some very strange side effects in the software.
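
The actual code is not reproduced here, but a minimal C++ sketch of the same kind of mistake might look like the following. The Session type and both functions are hypothetical, and the prefixes (‘aro’ for an argument that is a reference to an object, ‘lo’ for a local object, ‘lro’ for a local reference to an object) follow the convention listed later in this article.

```cpp
// Hypothetical type, invented for this illustration.
struct Session {
    int* mpiBuffer = nullptr;   // member, pointer to integral
};

// Bug: 'lo' says "local object", i.e. a copy. Nulling its pointer
// leaves the caller's Session untouched, so the "null'ed" pointer
// still has its old value when the caller looks at it again.
void releaseViaCopy(Session& aroSession) {
    Session loSession = aroSession;
    loSession.mpiBuffer = nullptr;
}

// Intended: 'lro' says "local reference to object", so the change
// is applied to the caller's Session.
void releaseViaReference(Session& aroSession) {
    Session& lroSession = aroSession;
    lroSession.mpiBuffer = nullptr;
}
```

The two functions differ by a single ‘&’, and the compiler accepts both. With the prefix in place, a line that declares ‘loSession’ and then treats it as if it were the caller’s object at least has a chance of looking wrong.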

Now that I have been using Hungarian Notation for some time, I notice this sort of situation quite often, and I regularly see it preventing bugs in my day-to-day coding.

Scripting Languages

I do a lot of scripting. I love scripting languages because they allow you to do so much with so few lines of code. This is usually possible because they are loosely typed, and they handle type conversions automatically, depending on the context. With scripting languages, string is king. Strings can be interpreted as strings or numeric types, depending on the context. Numeric values and objects can be interpreted as strings.

Unfortunately, with this added convenience comes less clarity, and less checking by a compiler. So, it can be very easy for things to go wrong due to an unintended type conversion. Worse yet, it can be very subtle.

Scripting languages benefit most from Hungarian Notation. It has saved me more grief in scripting than I can remember.

Benefits of Hungarian Notation

The whole reason I am writing about this is because I have experienced the benefits of Hungarian Notation first hand. Here are some of the ways in which I have seen those benefits.

Knowing the Type

Yes, in compiled languages, the compiler will do a lot of the type checking for you. And that’s great. The compiler catches a lot of potential problems. But in EVERY language I write code in, seeing a type indicator in names has helped me better understand the intent of the code, what it is trying to do. And that understanding helps me avoid bugs regularly.

Knowing the Scope

Scoping issues, such as the unintentional eclipsing (shadowing) of variables, are more easily detected by the programmer. It’s an easy mistake to make; I’ve made it myself on several occasions. Yes, better names can help sometimes, but this can still happen. And unless shadowing warnings are turned on, the compiler will usually not help in this case.
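
As a rough illustration, here is a minimal C++ sketch of a local variable eclipsing a class member. Both classes are hypothetical, and the prefixes (‘m’ for member, ‘a’ for argument, ‘c’ for collection, ‘i’ for integral) follow the convention listed later in this article.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class without prefixes.
class PlainBuffer {
public:
    void resize(std::size_t size) {
        // Meant to update the member, but the stray type name declares
        // a new local that eclipses it. The member stays at 0.
        std::size_t capacity = size * 2;
        data.resize(capacity);
    }
    std::size_t capacity = 0;
    std::vector<int> data;
};

// The same class with scope prefixes.
class PrefixedBuffer {
public:
    void resize(std::size_t aiSize) {
        // Declaring a local called 'miCapacity' would look wrong at a
        // glance, because the 'm' says "member".
        miCapacity = aiSize * 2;
        mciData.resize(miCapacity);
    }
    std::size_t miCapacity = 0;
    std::vector<int> mciData;   // member, collection, integral
};
```

After resize(4), PlainBuffer still reports a capacity of 0 even though its data holds 8 elements, while PrefixedBuffer reports 8.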

It Saves Time

As I mentioned, when I’m looking at code, whether it’s my own or someone else’s, a type indicator in names allows me to know the type just by looking. As it turns out, when I’m reading code, I’m already looking at it! Neat, huh? So, this is really a freebie. I don’t have to navigate to the thing’s definition, or take the time to reach for my mouse and hover over the thing, or go look it up, or track it down in any way.

Another time saver is with IDE completion. If you can’t remember the name of the thing you’re looking for, but you DO know that it’s a global enumeration, typing in those first couple of characters can greatly reduce the number of results that come back from a code completion or IntelliSense lookup.

It Prevents Bugs

This is actually another time saver, but it is so significant that it warrants being on its own.

I can say with confidence that EVERY time I am working with and writing code that has no Hungarian Notation, I spend precious time debugging problems because of it. They turn out to be something like a scoping problem, an automatic type conversion, or some other weird thing that I would have noticed if I had been using Hungarian Notation like I should have been. It’s usually at that point that I start using it and adding it where it’s missing.

When I am using Hungarian Notation consistently, it becomes evident fairly quickly how much it’s saving me. And on top of that, I am confident that it saves me more than I even realize! After all, bugs are inadvertent, right? They happen without realizing it? Otherwise there wouldn’t be any bugs in your code!

Because much of it may be unseen, it is very, very easy to underestimate the benefit of Hungarian Notation, especially when it comes to code quality and mitigation of defects.

Disadvantages

This discussion would be incomplete without addressing some of the complaints out there. Here is the list of disadvantages from the Wikipedia page, along with my responses to them.

  • “The Hungarian notation is redundant when type-checking is done by the compiler. Compilers for languages providing type-checking ensure the usage of a variable is consistent with its type automatically; checks by eye are redundant and subject to human error.”
    I’ve already described how the compiler isn’t perfect, and the robustness of the compiler’s checking depends on the language.
  • “All modern Integrated development environments display variable types on demand, and automatically flag operations which use incompatible types, making the notation largely obsolete.”
    I mentioned this in the “It Saves Time” section. Personally, if I’m looking at some intricate code that uses several variables of different types, I’d rather just look at it than float around with the mouse reading the tooltips (which I forget after reading the next one anyway).
  • “Hungarian Notation becomes confusing when it is used to represent several properties, as in a_crszkvc30LastNameCol: a constant reference argument, holding the contents of a database column LastName of type varchar(30) which is part of the table's primary key.”
    This is NOT the kind of naming convention I am supporting. I actually agree with this comment.
  • “It may lead to inconsistency when code is modified or ported. If a variable's type is changed, either the decoration on the name of the variable will be inconsistent with the new type, or the variable's name must be changed.”
    If the name of a variable isn’t updated along with the type, then it’s just plain laziness. You’re changing the declaration of the variable; it’s RIGHT THERE IN FRONT OF YOUR FACE!! Take the 2 extra seconds to change the name! Any modern IDE can do the refactoring for you in one step.
  • “Most of the time, knowing the use of a variable implies knowing its type. Furthermore, if the usage of a variable is not known, it can't be deduced from its type.”
    Sure, if you already understand the code, then you usually know the type because you understand how the variable is used and what it means.... but that’s “most of the time”. What about the rest of the time? To the second point, no, of course knowing the type alone does not tell you the meaning of a variable or its usage; that’s what the NAME part is for! Naming prefixes are additional information, which should always be welcome when trying to read and understand code.
  • “Hungarian notation strongly reduces the benefits of using feature-rich code editors that support completion on variable names, for the programmer has to input the whole type specifier first.”
    For verbose, cumbersome Hungarian Notation, this is true. I support a terse, concise convention that rarely exceeds 3 characters, in which case it can actually make completion easier! (See section above “It Saves Time”)
  • “It makes code less readable, by obfuscating the purpose of the variable with needless type and scoping prefixes.”
    A concise Hungarian Notation is easy to understand and does not take away from readability of code. And if you don’t have a ‘need’ to prevent defects before they happen, and you don’t ‘need’ any additional helpful information when reading code, then by all means, don’t bother. Best of luck to you. By the way, you’re kidding yourself.
  • “The additional type information can insufficiently replace more descriptive names. E.g. sDatabase doesn't tell the reader what it is. databaseName might be a more descriptive name.”
    Uh, so is ‘databaseName’ a ‘char*’, or maybe a String object? Or a StringBuffer object? I guess that wasn’t quite descriptive enough, buck-o!
  • “When names are sufficiently descriptive, the additional type information can be redundant. E.g. firstName is most likely a string. So naming it sFirstName only adds clutter to the code.”
    I love it... “most likely” a string. Let’s all just start making assumptions in our code based on what is most likely, shall we? But please leave a comment if you’re writing software for an airplane or air traffic control or something so I know never to fly that airline! That ‘s’, whatever that means for that convention, is explicit, where the name alone is only implicit.
  • “It's harder to remember the names.”
    This doesn’t even make sense. Let’s just move along...
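
To illustrate why the ‘databaseName’ question above matters in C++, here is a minimal sketch. The function names and the "prod" value are hypothetical; the prefixes (‘az’ for a C-style string argument, ‘ars’ for an argument that is a reference to a string object) follow the convention listed later in this article.

```cpp
#include <cstring>
#include <string>

// 'z' says zero-terminated C string: contents must be compared with strcmp.
bool isProductionZ(const char* azDatabase) {
    return std::strcmp(azDatabase, "prod") == 0;
}

// 's' says string object: operator== compares contents.
bool isProductionS(const std::string& arsDatabase) {
    return arsDatabase == "prod";
}

// The hazard: on two C strings, == compares addresses, not contents.
bool isSameAddress(const char* azLeft, const char* azRight) {
    return azLeft == azRight;
}
```

Two C strings with identical contents stored in different places make isSameAddress return false, which is exactly the kind of thing a bare ‘databaseName’ does not warn you about.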

Another complaint I’ve heard in person is that it can be used inconsistently, which makes it less effective. While that is true, it is a social problem within the team, not a problem with the notation itself.

Too often, arguments against Hungarian Notation come down to someone's personal preference.


You’re Probably Already Doing It Without Realizing It


A lot of talented programmers don’t use Hungarian Notation, but the irony is that they often achieve the same goals in other ways. I’m sure some of it is intentional, but some is probably also done without realizing it. It’s “six of one, half-dozen of the other”, except that these other techniques are extremely inconsistent, and much more individualized, which means that they are less effective in the long run.
  • Using suffixes - this is a common naming technique for clarifying the underlying type of a value. A good example might be something like an enumeration in Java, where you have a string in ‘enumValueName’ and the actual enumeration value in ‘enumValueEnum’. Or maybe you are getting a numerical value from a file, so you read the number into ‘valueStr’, then parse it into ‘valueNum’.
  • Using ‘this’ - I see this all the time in setters and constructors. For example, a setter like setTimeout(int timeout) { this.timeout = timeout; }. The use of ‘this’ in this way is just another way of making a distinction that is automatic when you have a Hungarian naming prefix.
  • IDE colors - I’ve heard the argument that modern IDEs can format different types and scopes for you, so it’s unnecessary to use Hungarian Notation. Again, this is just another technique for achieving the same goal. The only problem is that it is completely unique to the IDE, it can be customized per developer, and is not common to all IDEs and versions. So, if I rely on my colors to help remind me what things are, and I get used to it, I will be totally lost when I go look over the shoulder of a colleague who wants to go over some of his code. And for that matter, if I switch to another IDE, or even another workspace, or even another version of the IDE, it’s likely to be inconsistent with my color and formatting settings. Character prefixes in Hungarian Notation are never impacted by the editor or IDE.

These are just examples of the kinds of things I’ve seen programmers do (including myself). They may work, but these are just things that developers do for their own benefit, or as their own attempt at creating “good” names. These tactics are often too individual, which means they are inconsistent, and will most likely be inconsistent even within a team, and that greatly reduces the benefit from one developer to the next.

What Makes an Effective Hungarian Notation


A standard naming convention with Hungarian Notation will be consistent across an entire team, and will persist for those that will maintain the code in the future.

What’s important with a naming convention is that it is concise and easy to understand. Naming prefixes cannot solve all of your naming problems, and they should not try to be more than terse hints as to the type and scope of a given entity. Many of the complaints about Hungarian Notation are targeted at long combinations of multi-character prefixes that can indeed become redundant and hard to understand.

It should be limited to indications of scope, type, possible multiplicity for arrays, and possible type modifier for pointers and references.

They’re all just words, names. The goal is to identify, as quickly as possible, what a name is a name *of*.

An Example of Concise Hungarian Notation Prefixes

The following are prefixes I use in my own coding, based on conventions I’ve used in the past, but with modifications based on my own experience. This list is somewhat generic, and it can be tailored for specific languages. But this should cover enough possibilities to remain mostly consistent between languages. (I’m open to any comments with suggested improvements; this is a living naming convention).

The prefix consists of up to three parts: scope, an optional multiplier/qualifier, and type. The first and last parts are always present. The middle part is only present for applicable types.

Character 1 - Scope

a - method or function ‘a’rgument
g - global variable
s - class ‘s’tatic member
p - function static (‘p’ersistent value)
l - local
c - closure scope
e - enumeration. This is kinda the same as a global, but it helps make the distinction.
m - member
f - function pointer or “functor” (i.e. callable object)
_ (underscore) - package or global "private", unenforced. This is an honor-system convention for indicating that a variable should only be modified by designated "internal" code.
__ (2 underscores) - same as one underscore, but for object member properties. Used to indicate that the member is intended to be used only by the class/object methods.

Character 2 - Multiplier/Qualifier

These are only present when applicable. When they are, the prefix will be at least 3 characters long. These characters should be repeated or combined as necessary, so it may be longer.

a - array
c - collection. Any collection object that’s not an array, such as a Map, List, Set, Vector, etc.
p - pointer. Indicates the value is a pointer to the type indicated in the final character
r - C++ reference

Final character - Type

b - boolean
i - integral (i.e. char, short, integer, long integer, etc)
r - rational (i.e. float, double, etc)
n - numeric (ambiguous)
e - enumeration
o - object
s - string object
z - zero-terminated C-style string
v - void/volatile (intended to hold values of different types)

Some Examples

public void foo(
    int aiIndex,  // integer argument
    String asName, // string argument
    Map<Integer, String> acsMapping // argument collection of strings
);
Socket goCommonSock; // global object
boolean gbCurrentlyConnected; // global boolean
char* lzName; // local c-style string
char[] laiName; // c-style string held in an array
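
For a slightly larger picture, here is a minimal C++ sketch that applies the prefixes above inside one small class. The Palette class, the Color enumeration and the lookup counter are all hypothetical, invented only to show the prefixes side by side.

```cpp
#include <map>
#include <string>

enum Color { eRed, eGreen, eBlue };          // e: enumeration values

int giLookups = 0;                           // g + i: global, integral

class Palette {
public:
    // a + e: argument, enumeration
    // a + r + s: argument, reference, string object
    void define(Color aeColor, const std::string& arsName) {
        mcsNames[aeColor] = arsName;
    }

    // a + p + b: argument, pointer to boolean (optional out-parameter)
    std::string nameOf(Color aeColor, bool* apbFound) const {
        ++giLookups;
        const auto loEntry = mcsNames.find(aeColor);     // l + o: local, object
        const bool lbFound = (loEntry != mcsNames.end()); // l + b: local, boolean
        if (apbFound != nullptr) {
            *apbFound = lbFound;
        }
        return lbFound ? loEntry->second : std::string();
    }

private:
    std::map<Color, std::string> mcsNames;   // m + c + s: member, collection of strings
};
```

Inside nameOf, every name says where it lives and roughly what it holds, so the dereference of ‘apbFound’ and the increment of the global ‘giLookups’ stand out without looking at any declarations.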


Think It Over

If you’ve never wanted to try any sort of Hungarian Notation, I hope this might inspire you to at least give it a shot. The truth is, the best convincing can only be done by yourself. And the only way that could ever happen is if you give it a chance. That means take off the skeptical hat for a month or two, get a good, concise naming convention (doesn’t have to be based on mine), and use it diligently for some serious coding. Even if you’re still not convinced, you may at least be able to understand and/or appreciate why people use it.
