Sunday, September 13, 2015

Erlang enum

Rationale

Sooner or later, every Erlang developer with a C/C++ background looks for a mechanism similar to enum. An enum is a lightweight data type whose possible values are all enumerated. Erlang offers several ways to implement one.

Macros

Many developers simply define macros, each with a meaningful name and a value that is unique within the enumeration.
-define(AUTH_ERROR, 404).
-define(FORMAT_ERROR, 400).
-define(INTERNAL_ERROR, 500).

Using such an enum requires including the header file with the definitions. A typo in a macro name leads to a compilation error.
More advanced developers also define a macro that checks whether a value belongs to the enumeration.
-define(IS_ERROR(V), (
    V =:= ?AUTH_ERROR orelse
    V =:= ?FORMAT_ERROR orelse
    V =:= ?INTERNAL_ERROR
)).
handle_error(E) when ?IS_ERROR(E) ->
  case E of
    ?AUTH_ERROR -> abort();
    ?FORMAT_ERROR -> abort();
    ?INTERNAL_ERROR -> retry()
  end.
The problem with this approach is that the compiler does not check that the values are unique. One can easily add a new error to the enumeration with a value that is already assigned to another element.
-define(NOT_FOUND, 404).
Obviously, this can break any logic that relies on the enum.
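
The collision described above can be sketched as follows. This is a minimal illustration, not code from a real project: the module name macro_enum_demo and both function names are hypothetical. The module is accepted by the compiler even though two macros share the value 404.

```erlang
-module(macro_enum_demo).
-export([is_auth_error/1, not_found_looks_like_auth/0]).

-define(AUTH_ERROR, 404).
-define(NOT_FOUND, 404).   % accidentally reuses the value of AUTH_ERROR

%% Matches on the integer behind the macro, not on the macro name.
is_auth_error(?AUTH_ERROR) -> true;
is_auth_error(_) -> false.

%% Hypothetical helper: after preprocessing, NOT_FOUND is the same
%% integer as AUTH_ERROR, so this call returns true.
not_found_looks_like_auth() ->
    is_auth_error(?NOT_FOUND).
```

Since macros are expanded textually before compilation, the compiler only ever sees the integer 404 and cannot tell the two names apart.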

Atoms

A set of atoms can also serve as an enumeration. Atoms are unique identifiers across the entire system, and an atom-based enum requires no header file.
handle_error(E) ->
  case E of
    auth_error -> abort();
    format_error -> abort();
    internal_error -> retry()
  end.

The only problem is checking whether a given atom is an element of the enumeration. The issue arises when somebody makes a typo in an atom name, for example auth_eror instead of auth_error. One can try to address it by defining a macro that checks whether an atom belongs to the enum. The macro can be used as a guard expression.
-define(IS_ERROR(V), (
  V =:= auth_error orelse
  V =:= format_error orelse
  V =:= internal_error
)).
handle_error(E) when ?IS_ERROR(E) ->
  ...,
  ok.
...
handle_error(auth_eror).
This again requires including a header file in order to use the IS_ERROR macro in different modules, and the problem can only be detected at run time: a misspelled atom does not cause a compilation error.
In addition, an IDE cannot safely perform automatic refactoring such as renaming, because renaming an atom across the entire project is not safe, and the scope cannot be narrowed down due to the global nature of atoms. Even "find usages" might return irrelevant results.
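
The run-time nature of the check can be sketched as follows. The module name atom_enum_demo and the helper try_typo/0 are hypothetical and exist only for illustration. The module compiles, and the misspelled atom is rejected only when the guard fails during the call.

```erlang
-module(atom_enum_demo).
-export([handle_error/1, try_typo/0]).

-define(IS_ERROR(V), (
    V =:= auth_error orelse
    V =:= format_error orelse
    V =:= internal_error
)).

%% Accepts only atoms that belong to the enumeration.
handle_error(E) when ?IS_ERROR(E) ->
    ok.

%% Hypothetical helper: the typo auth_eror is a perfectly valid atom,
%% so it gets past the compiler and fails only at run time, with a
%% function_clause error.
try_typo() ->
    try
        handle_error(auth_eror)
    catch
        error:function_clause -> caught_at_runtime
    end.
```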

Records

There is one more, less obvious way to implement an enum in Erlang: define a record in which each field is an element of the enum.
-record(http_errors,{
  auth_error,
  format_error,
  internal_error
}).

Now each element of the enumeration can be referenced by the position of its record field.
handle_error(E) ->
  case E of
    #http_errors.auth_error -> abort();
    #http_errors.format_error -> abort();
    #http_errors.internal_error -> retry()
  end.
handle_error(#http_errors.auth_error).

Obviously, all enumeration elements are unique, since each record field's position is unique, and the compiler guarantees this (compare with the macro approach).
Any typo in a field name causes a compilation error (compare with the atom approach).
Automatic refactoring becomes easy for an IDE; for example, renaming a record field is trivial. The "find usages" function has a well-defined scope.
Of course, using a record requires including the header file with its definition, but that is a fair price for the compile-time checks it provides.
Since a record field's position is just an integer, it can be used to define an external API. Positions start at 2, because position 1 of the underlying tuple holds the record name. So if the record definition line ("-record(http_errors,{") is on the first line of a file and each field is on its own line, the integer value of each enumeration element equals the number of the source line on which it is defined.
  1. -record(http_errors,{
  2.   auth_error,        %%authentication error
  3.   format_error,      %%wrong json
  4.   internal_error     %%connection to db server is lost
  5. }).

For example, in the http_errors enum the auth_error element is on line 2, so its integer value is 2. This value could be used as an error code in a RESTful API. In effect, the documentation is "generated" automatically.
An enum implemented with records is very close to its C equivalent. It maps a descriptive identifier to an integer value, and it automatically increments the integer value for each new element in the enumeration. The only constraint is that the integer value of the first element is always 2 and cannot be changed.
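
The integer values behind the record fields can be inspected with a small sketch like the one below. The module name record_enum_demo and the helpers codes/0 and name_of/1 are hypothetical. The sketch relies on record_info(fields, ...), which the compiler expands to the list of field names.

```erlang
-module(record_enum_demo).
-export([codes/0, name_of/1]).

-record(http_errors, {
    auth_error,
    format_error,
    internal_error
}).

%% Field positions are compile-time integers. Position 1 of the
%% underlying tuple holds the record name, so the first field is 2.
codes() ->
    [#http_errors.auth_error,
     #http_errors.format_error,
     #http_errors.internal_error].

%% Hypothetical helper: map an integer code back to its field name,
%% for example when decoding an error code received over an API.
name_of(Code) when is_integer(Code), Code >= 2 ->
    Fields = record_info(fields, http_errors),
    case Code - 1 =< length(Fields) of
        true  -> {ok, lists:nth(Code - 1, Fields)};
        false -> error
    end;
name_of(_) ->
    error.
```

Here codes() evaluates to [2,3,4], and name_of(3) gives {ok,format_error}. Note that inserting a new field in the middle of the record shifts the codes of all fields after it, so appending new elements at the end is the safer choice for an external API.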

Conclusion

To sum up, a comparison table is provided.

                                Macro  Atom  Record
No include                        -     +      -
Uniqueness                        -     +      +
Compile-time membership check     +     -      +
Automatic rename refactoring      +     -      +
Force value                       +     -      -

The Erlang record approach seems to be the best option because of its compile-time uniqueness guarantee and membership check.