Joel Spolsky has a popular post from 2005 on "Making Wrong Code Look Wrong". In this post, I'll address his suggestion to use Hungarian Notation. It turns out we can now do better: we can make wrong code... wrong!
Hungarian Notation is a naming scheme that conveys information about methods and variables. For example, we might use the prefix "us" to denote variables that hold unsafe values, e.g., user input. This notation makes it easy to notice the SQL injection vulnerability in the following code:
:::Java
String usUser = getUserInput();
executeSqlQuery("SELECT * FROM users WHERE user='" + usUser + "'");
The problem with Hungarian Notation is that you still have to manually inspect the code. For real projects, manual inspection is insufficient because of overloading, overridding, threading, version control merges, etc. Testing is also insufficient because it can "show the presence of bugs, but never show their absence."
Type Qualifiers
Hungarian Notation typically doesn't convey arbitrary information
about a variable/parameter. Rather, the notation describes the set of
values an expression may or may not have. In the example above, the
"us" prefix tells us that usUser
is not just a String
, but a
potentially unsafe String
. The prefix is "qualifying" the
variable's type.
Java 8 brought language support for user-defined Type Qualifiers. Instead of using naming schemes, we can now define custom Type Qualifiers that we can use in method signatures:
:::Java
public void executeSqlQuery(@Safe String query) { ... }
public @Unsafe String getUserInput() { ... }
and method bodies:
:::Java
@Unsafe String usUser = getUserInput();
executeSqlQuery("SELECT * FROM users WHERE user='" + usUser + "'");
Enforcing Type Qualifiers
Type Qualifiers are easier to read than Hungarian Notation because you don't have to parse (possibly multiple) prefixes and suffixes. In some cases, however, Type Qualifiers are harder to reason about. Using Type Qualifiers, the qualifier is typically only written on declarations. Using Hungarian Notation, the qualifier appears at every occurence of the variable.
Luckily, there are tools that can reason about Type Qualifiers for us,
just like the Java compiler reasons about types. One such tool is the
Checker Framework. Using the Checker
Framework, we can define a checker that issues a warning wherever an
@Unsafe
expression is used where a @Safe
expression is
required. It's as simple as telling the Checker Framework that @Safe
is a subtype of @Unsafe
— a safe value can be used wherever an
@Unsafe
value is expected, but not vice versa:
:::Java
@TypeQualifier
@SubtypeOf({})
@DefaultQualifierInHierarchy // By default, expressions are potentially unsafe
@Target({ ElementType.TYPE_USE, ElementType.TYPE_PARAMETER })
public @interface Unsafe {}
:::Java
@TypeQualifier
@SubtypeOf({ Unsafe.class }) // @Safe is a subtype of @Unsafe
@Target({ ElementType.TYPE_USE, ElementType.TYPE_PARAMETER })
public @interface Safe {}
What makes the Checker Framework particularly effective is its flow-sensitivity. Checkers can reason about the type of a variable throughout the method:
:::Java
String user = getUserInput();
// WARNING: user is known to be @Unsafe
executeSqlQuery("SELECT * FROM users WHERE user='" + user + "'");
// encode has return type @Safe String
user = encode(user);
// SAFE: user is known to be @Safe
executeSqlQuery("SELECT * FROM users WHERE user='" + user + "'");
In practice, flow-sensitivity means that you rarely have to annotate local variables.
For a more thorough introduction to Type Qualifiers in Java 8, check out my article on InfoQ. Complete instructions for creating your own checker can be found in the Checker Framework documentation.
Enforcing Hungarian Notation
As we've seen, Hungarian Notation and Type Qualifiers are both techniques for qualifying types. Therefore, it should come as no surprise that we can leverage Type Qualifiers and the Checker Framework to automatically enforce Hungarian Notation.
To do so, we just need to have the Checker Framework read names as if we were reasoning about the code ourselves. For our running example, we have the following "type introduction" rule:
If the name of a variable or parameter has the prefix "s", add a
@Safe
annotation to its type.
We implement the rule by adding logic to the AnnotatedTypeFactory
for our checker. The following method adds the @Safe
qualifier to
local variables:
:::Java
private Pattern safeRegex = Pattern.compile("s[A-Z_].*");
private AnnotationMirror SAFE = AnnotationUtils.fromClass(elements, Safe.class);
@Override
public Void visitVariable(VariableTree node, AnnotatedTypeMirror type) {
if (safeRegex.matcher(node.getName()).matches()) {
type.replaceAnnotation(SAFE);
}
return super.visitVariable(node, type);
}
The logic is the same for parameters and fields. You can find a full implementation of the checker on GitHub.
The result is a system that combines the conciceness of Hungarian Notation with the safety of compile-time checking:
:::Java
// Parameter sQuery is given the @Safe annotation
public void executeSqlQuery(String sQuery){ ... }
// WARNING: sUser is given the @Safe annotation
String sUser = getUserInput();
Summary
This post demonstrated how Type Qualifiers and the Checker Framework can enforce Hungarian Notation at compile-time. Compile-time checking makes wrong code not just look wrong, but actually wrong!
The source code for the checker is available on GitHub at: https://github.com/twschiller/hungarian-checker.
This post was inspired by a talk that Charles Simonyi gave on May 10, 2012 at the University of Washington about the importance of notation.