Saturday, July 5, 2014

tuples vs records vs proplists

Data structures for business logic

When a function is being designed, one of the most important questions is which data structure to use. Erlang has several compound "types": lists, records, maps and tuples. ETS is a separate kind of storage with a specific use, so in most cases it is not confused with the other data structures. Maps appeared in Erlang/OTP 17 and are not covered in this article, even though they combine properties of both lists and tuples.
A list contains a variable number of elements. One popular "subtype" is the proplist, a list of {Key, Value} tuples. Pattern matching cannot reach an arbitrary element of a list of unknown length, only its head (and tail).
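As a minimal sketch of working with a proplist (the module name proplist_demo and both function names are made up for illustration), lookup by key goes through the standard proplists module, while direct pattern matching only reaches the head of the list:

```erlang
-module(proplist_demo).
-export([age/1, first_key/1]).

%% Look up a value by key. proplists:get_value/3 returns the
%% given default (here: undefined) when the key is missing.
age(User) ->
    proplists:get_value(age, User, undefined).

%% Pattern matching reaches only the head of the list;
%% any other element requires a traversal or a lookup.
first_key([{Key, _Value} | _Rest]) -> Key;
first_key([]) -> none.
```

For example, proplist_demo:age([{name, <<"Eddy">>}, {age, 28}]) evaluates to 28, and the same call on an empty list evaluates to undefined.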
A tuple's arity is fixed, and client code in most cases has to know its exact size. Tuples lend themselves to extensive pattern matching. As a tuple grows, using it becomes error-prone.
Records tame large tuples by adding syntactic sugar at compile time; in fact, records are translated to tuples. Using records may add compile-time dependencies, because the record definition has to be visible to every module that uses it, typically through a shared header (.hrl) file.
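The translation of records to tuples can be seen in a short sketch. The module name record_demo and the user record with its three fields are assumptions chosen for illustration; a record value is a tuple whose first element is the record name:

```erlang
-module(record_demo).
-export([new/0, as_tuple/0, age/1]).

%% Illustrative record definition (normally kept in a shared .hrl file).
-record(user, {name, surname, age}).

%% Build the value with record syntax.
new() ->
    #user{name = <<"Eddy">>, surname = <<"Snow">>, age = 28}.

%% The same value written as a plain tuple:
%% the record name comes first, then the fields in declaration order.
as_tuple() ->
    {user, <<"Eddy">>, <<"Snow">>, 28}.

%% Fields are matched by name, not by position.
age(#user{age = Age}) -> Age.
```

Here record_demo:new() =:= record_demo:as_tuple() is true, which is exactly why relying on the tuple form in application code is fragile: it silently breaks when the field list changes.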
Another alternative to tuples, records and proplists is a function signature with many arguments,
save_user(Name, Surname, Age)
where each property is a separate argument. It is mostly equivalent to
save_user({Name, Surname, Age})
except that if the signature of the function changes often, it is much more work to adapt the code than with a single tuple argument.
In API design, the performance of operations on argument and return types is less crucial than simplicity of use, protection from misuse and other criteria of "good code/design". That is why no benchmarks are provided.
The recipe for good design in this scope is quite straightforward: "A data structure shall be chosen based on its characteristics and the logic of the code". It is easier to illustrate this rule with an example.

Example

A service with a RESTful interface is being developed. User information is received in JSON format in a POST request.
{
 "name": "Eddy",
 "surname": "Snow",
 "age": 28
}
There is a general-purpose JSON object parsing function (arrays are not covered here for simplicity) that accepts a binary as its argument. The data structure for the return value is less obvious.
First of all, client code needs to detect parsing errors. Assuming that exceptions are not used, the function should return the tuple {ok, Result} on success or {error, Reason} on failure. A tuple fits error handling perfectly here; there is no need to introduce a record, since the tuple size is two.
Next is the data structure for Result. Since the function parse_json_object is generic, it can parse any object, so the shape of the result is dynamic. A proplist is a suitable type for it.
-spec parse_json_object(JSON::binary()) ->
  {ok, Result::list()} | {error, Reason::term()}.

Once the user information is parsed correctly, it may need to be validated and stored in the database, so some kind of internal user object is required. It could be tempting to keep using the proplist, but a record fits much better here because it brings a number of compile-time checks.
-record(
  user,
  {
     name,
     surname,
     age
  }
).
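The step from the parsed proplist to the internal record could look roughly like the sketch below. It assumes that parse_json_object returns atom keys (name, surname, age); a real parser may well return binary keys instead. The module name user_conv, the function from_proplist/1 and the {missing, Keys} error shape are hypothetical:

```erlang
-module(user_conv).
-export([from_proplist/1]).

-record(user, {name, surname, age}).

%% Build a #user{} from a parsed proplist.
%% Returns {ok, User} or {error, {missing, Keys}} listing absent keys.
%% Assumption: keys are the atoms name, surname and age.
from_proplist(Props) ->
    Required = [name, surname, age],
    case [K || K <- Required, not proplists:is_defined(K, Props)] of
        [] ->
            {ok, #user{name    = proplists:get_value(name, Props),
                       surname = proplists:get_value(surname, Props),
                       age     = proplists:get_value(age, Props)}};
        Missing ->
            {error, {missing, Missing}}
    end.
```

The dynamic proplist stays at the boundary of the system, and everything after this conversion works with a value whose fields are checked by the compiler.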

Advantages and disadvantages of records

Elements of a record are accessed by name, and the Erlang compiler verifies that only existing fields are read or set. If a tuple is used instead, data can be accessed only by element index, which is error-prone for large tuples.
It is sometimes claimed that records are not suitable for storing in Riak in Erlang binary format, because once the record declaration changes, stored data no longer matches the new version. In fact, it is worth implementing a serialisation layer for this task, as versioning might also be required for a proplist or a tuple.
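One possible shape for such a serialisation layer is sketched below. The version tag 1, the module name user_store and the stored layout are all assumptions made for illustration, not a prescribed format; the point is only that the stored term carries an explicit version and named fields, so it does not depend on the record's field order:

```erlang
-module(user_store).
-export([encode/1, decode/1]).

-record(user, {name, surname, age}).

%% Stored format, version 1: {1, [{Field, Value}]}.
%% The record itself is never written to storage.
encode(#user{name = Name, surname = Surname, age = Age}) ->
    term_to_binary({1, [{name, Name}, {surname, Surname}, {age, Age}]}).

%% Decode a stored binary back into the current record definition.
%% A future format would get its own clause, e.g. {2, Props}.
decode(Bin) ->
    case binary_to_term(Bin) of
        {1, Props} ->
            {ok, #user{name    = proplists:get_value(name, Props),
                       surname = proplists:get_value(surname, Props),
                       age     = proplists:get_value(age, Props)}};
        _Other ->
            {error, unknown_version}
    end.
```

When the record gains or loses a field, only encode/1 and decode/1 need to learn about the new version; data written earlier stays readable.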
Another argument against records is hot code upgrade, which has the same problem with a changed record declaration. In that case the rule of "not using records as tuples" may be broken in the code_change callback.
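A sketch of such a code_change callback follows. It assumes, purely for illustration, that the process state is the user record itself and that the new version adds an email field; the module name user_upgrade is hypothetical. The old state has to be matched as a plain tuple because the old record definition is no longer available in the new code:

```erlang
-module(user_upgrade).
-export([code_change/3]).

%% New record definition; the email field is a hypothetical addition.
-record(user, {name, surname, age, email}).

%% Old state: {user, Name, Surname, Age}, i.e. the previous record layout
%% seen as a raw tuple. Convert it to the new record.
code_change(_OldVsn, {user, Name, Surname, Age}, _Extra) ->
    {ok, #user{name = Name, surname = Surname, age = Age,
               email = undefined}};
%% Any other state is passed through unchanged.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

This is the one place where treating a record as a tuple is hard to avoid, so it helps to keep that knowledge confined to the upgrade code.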

Conclusion

To sum up, the general recommendations for choosing a data structure are:

  1. Do not use tuples with more than 3 elements.
  2. Use proplists only if the number of an "object's" properties varies.
  3. Consider records as the main alternative to large tuples.
