Erlang advisor

Tuesday, October 27, 2015

When to spawn a process?

Process spawning model

One of the most common mistakes in Erlang development is choosing the wrong process spawning model: people spawn either too many processes or too few. The internet offers many recommendations on when to spawn a process. One of them is "process per message", which encourages spawning a process for each concurrent entity. That advice is not always clear and is easy to misunderstand. The example below illustrates several variants of the spawning model.

Public key encryption example

The task is to build an Erlang application that provides public key encryption. Encryption can take significant time, and the algorithm is implemented in pure Erlang. Clients of the application call a function encrypt/1, passing the data to be encrypted as the argument. The application should manage keys without exposing that complexity to end users. The encryption key is stored in a file on disk.

0 processes

The first naive implementation could be the following:

encrypt(Data) ->
  {ok, Key} = file:read_file(?KEY_FILE_PATH),
  encrypt(Key, Data).

On each encryption request the key file is read and its content is passed to the algorithm together with the data. Slow disk operations are something we would probably like to minimize in our system.

1 process

A more advanced implementation is a gen_server that reads the key in the init/1 callback, caches it in its state, and performs the encryption in handle_call/3.

encrypt(Data) ->
  gen_server:call(?SERVER, {data, Data}).

start_link() ->
  gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
  {ok, Key} = file:read_file(?KEY_FILE_PATH),
  {ok, #state{key = Key}}.

handle_call({data, Data}, _From, State) ->
  Enc = encrypt(State#state.key, Data),
  {reply, Enc, State}.

This approach eliminates the constant disk reads, but the server's message queue might be overwhelmed by requests, because a new encryption cannot start until the previous one has finished.

1+N processes

Some people improve the performance of the previous example by spawning a new process in handle_call to do the actual encryption.

handle_call({data, Data}, From, State) ->
  erlang:spawn(fun() ->
    Enc = encrypt(State#state.key, Data),
    gen_server:reply(From, Enc)
  end),
  {noreply, State}.

That speeds things up, but leads to complicated and error-prone logic in the main "dispatcher" process, which now has to monitor every process it spawns and take care of reporting errors back to clients.

Pool of processes

Another alternative is to use one of the pooling libraries, which manage pre-allocated workers and take care of task dispatching. The worker's code is basically the same as in the single-process example, except that the worker must not be registered under a name, either locally or globally.

start_link() ->
  gen_server:start_link(?MODULE, [], []).

Again 1 process

But if we analyze the original task, we see that the only thing cached in the state is the key. So the only thing that needs to be done in handle_call is to return the key, and the heavy encryption call can be moved into the context of the client's process.

encrypt(Data) ->
  Key = gen_server:call(?SERVER, get_key),
  encrypt(Key, Data).

handle_call(get_key, _From, State) ->
  {reply, State#state.key, State}.

Getting the key from the process's state is a relatively fast operation, so the gen_server is no longer a bottleneck.
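Put together, this last variant could look roughly like the module below. It is only a sketch under stated assumptions: the module name key_cache, passing the key path to start_link/1 instead of using a macro, and the XOR placeholder standing in for the real (slow) algorithm are all illustrative and not part of the original application.

```erlang
-module(key_cache).
-behaviour(gen_server).

%% Client API
-export([start_link/1, encrypt/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

-define(SERVER, ?MODULE).
-record(state, {key :: binary()}).

%% Assumption: the key path is passed in rather than taken from a macro.
start_link(KeyPath) ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, KeyPath, []).

%% Only the cheap key lookup goes through the server;
%% the heavy work runs in the calling process.
encrypt(Data) ->
    Key = gen_server:call(?SERVER, get_key),
    encrypt(Key, Data).

%% The key file is read once, when the server starts.
init(KeyPath) ->
    {ok, Key} = file:read_file(KeyPath),
    {ok, #state{key = Key}}.

handle_call(get_key, _From, State) ->
    {reply, State#state.key, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Placeholder only: a repeating-key XOR, NOT real encryption.
%% It stands in for the slow pure-Erlang algorithm.
encrypt(Key, Data) when byte_size(Key) > 0 ->
    KeySize = byte_size(Key),
    Positions = lists:seq(0, byte_size(Data) - 1),
    << <<(Byte bxor binary:at(Key, Pos rem KeySize))>>
       || {Pos, Byte} <- lists:zip(Positions, binary_to_list(Data)) >>.
```

Any number of clients can now call key_cache:encrypt/1 concurrently; they only queue up for the short get_key call.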

Conclusion

These examples illustrate how a good process spawning model leads to efficient, simple and elegant code.
The rule of thumb for spawning could be the following:
A new process should be started to serialize access to a shared resource. The resource could be cached memory, a file descriptor (socket) and so on. People from the C/C++ world can think of a process as a mutex. This rule might not fit all scenarios, but it is a good starting point for design. It leads to elegant and highly concurrent code.
There are some exceptions, of course. First of all, the "let it crash" principle encourages spawning a process for code that is likely to crash.
Also, cached memory should not be confused with memory for objects (in OOP terms). For example, in an online real-time strategy game each user owns a number of units. Each unit keeps its state in memory, but the developer should not spawn a process per unit. The shared resource in that case is the user's session, which is represented by a TCP socket. By analogy with an OS mutex, it is unlikely that a synchronization primitive is needed for each unit in the game.
Another exception is a situation where an algorithm is easier to describe as a Finite State Machine (FSM). It might be reasonable to spawn a process that implements the gen_fsm behaviour.

Sunday, September 13, 2015

Erlang enum

Rationale

Sooner or later every Erlang developer with a C/C++ background looks for a mechanism similar to enum. An enum is a lightweight data type with all possible values enumerated. There are several ways to implement one in Erlang.

Macros

Many developers just define macros, each with a meaningful name and a unique (within the enumeration) value.

-define(AUTH_ERROR, 404).
-define(FORMAT_ERROR, 400).
-define(INTERNAL_ERROR, 500).

Using such enums requires including the header file with the definitions. A typo in a macro's name leads to a compilation error.

Advanced developers even define a macro that checks whether a value belongs to the enumeration type.

-define(IS_ERROR(V), (
    V =:= ?AUTH_ERROR orelse
    V =:= ?FORMAT_ERROR orelse
    V =:= ?INTERNAL_ERROR
)).
handle_error(E) when ?IS_ERROR(E) ->
  case E of
    ?AUTH_ERROR -> abort();
    ?FORMAT_ERROR -> abort();
    ?INTERNAL_ERROR -> retry()
  end.

The problem with this approach is that the uniqueness of values is not checked by the compiler. One can easily add a new error to the enumeration with a value that is already assigned to a different element.

-define(NOT_FOUND, 404).

Obviously, this can break logic that relies on the enum.
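Since the compiler does not help here, one possible mitigation is a small run-time check, for example called from a unit test. The sketch below assumes a hypothetical module http_error_codes that lists every macro of the enumeration in all/0; the list has to be kept in sync by hand, so this is a safety net rather than a guarantee.

```erlang
-module(http_error_codes).
-export([all/0, values_unique/1]).

-define(AUTH_ERROR, 404).
-define(FORMAT_ERROR, 400).
-define(INTERNAL_ERROR, 500).

%% Every element of the enumeration, listed by hand (assumption:
%% whoever adds a macro also adds it here).
all() ->
    [?AUTH_ERROR, ?FORMAT_ERROR, ?INTERNAL_ERROR].

%% true if no value occurs twice. lists:usort/1 sorts and removes
%% duplicates, so a shorter result means a clash.
values_unique(Values) ->
    length(lists:usort(Values)) =:= length(Values).
```

Adding a clashing -define(NOT_FOUND, 404) together with its entry in all/0 would make values_unique(all()) return false.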

Atoms

A set of atoms can also be used as an enumeration. Atoms are unique identifiers within the entire system. No header file is required for an atom-based enum.

handle_error(E) ->
  case E of
    auth_error -> abort();
    format_error -> abort();
    internal_error -> retry()
  end.

The only problem is checking whether a given atom is an element of the enumeration. The issue arises when somebody makes a typo in an atom name, for example auth_eror instead of auth_error. One can try to resolve it by defining a macro that checks whether an atom belongs to the enum. The macro can be used as a guard expression.

-define(IS_ERROR(V), (
  V =:= auth_error orelse
  V =:= format_error orelse
  V =:= internal_error
)).
handle_error(E) when ?IS_ERROR(E) ->
  ...,
  ok.
...
handle_error(auth_eror).

This again requires including a header file to use the IS_ERROR macro in different modules, and the problem is only found at run time: an error in an atom does not lead to a compilation error.
In addition, an IDE cannot do automatic refactoring such as renaming, because it is not safe to rename an atom across the entire project, and the scope cannot be narrowed down due to the global nature of atoms. Even "find usages" might return irrelevant results.

Records

There is one more, non-obvious way to implement an enum in Erlang: define a record in which each field is an element of the enum.

-record(http_errors,{
  auth_error,
  format_error,
  internal_error
}).

Now each element of the enumeration can be referenced as the position of a record field.

handle_error(E) ->
  case E of
    #http_errors.auth_error -> abort();
    #http_errors.format_error -> abort();
    #http_errors.internal_error -> retry()
  end.
handle_error(#http_errors.auth_error).

Obviously all enumeration elements are unique, since a record field's position is unique, and this is guaranteed by the compiler (compare with the macro approach).
Any typo in a field name causes a compilation error (compare with the atom approach).
Automatic refactoring becomes easy for an IDE; for example, renaming a record field is trivial. The "find usages" function has a well-defined scope.
Of course, using a record requires including the header file with its definition, but that is a fair price for the compile-time checks it provides.
Since a record field's position is just an integer, it can be used for defining an external API. If the record definition line ("-record(http_errors,{") is on the first line of a file and each field is on a new line, each enumeration element has an integer value equal to the number of the source line where it is defined.

-record(http_errors,{
auth_error, %%authentication error
format_error, %%wrong json
internal_error %%connection to db server is lost
}).

For example, in the http_errors enum the auth_error element is on line 2, so its integer value is 2. This value could be used as an error code in a RESTful API. It turns out that the documentation is "generated" automatically.
The enum implementation using records is very close to its C equivalent. It maps a verbose identifier to an integer value and automatically increments the integer value for each new element in the enumeration. The only constraint is that the integer value of the first element is always 2 (position 1 of the underlying tuple holds the record name) and cannot be changed.
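The mapping between names and integers can be checked with a small module. This is a sketch only: the module name http_errors_enum and the helper functions code/1 and name/1 are made up for illustration, while the #Record.field syntax and record_info/2 are standard Erlang.

```erlang
-module(http_errors_enum).
-export([code/1, name/1]).

-record(http_errors, {
  auth_error,
  format_error,
  internal_error
}).

%% #http_errors.Field is replaced at compile time by the integer
%% position of the field in the record tuple.
code(auth_error) -> #http_errors.auth_error;
code(format_error) -> #http_errors.format_error;
code(internal_error) -> #http_errors.internal_error.

%% Reverse mapping: record_info(fields, ...) is also resolved at
%% compile time and yields the field names in declaration order.
%% Position 2 is the first field, hence the Code - 1.
name(Code) when is_integer(Code), Code >= 2 ->
    lists:nth(Code - 1, record_info(fields, http_errors)).
```

A misspelled field such as #http_errors.auth_eror would be rejected by the compiler instead of failing at run time.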

Conclusion

To sum up, a comparison table is provided.

	Macro	Atom	Record
No include	-	+	-
Uniqueness	-	+	+
Compile time membership check	+	-	+
Automatic rename refactoring	+	-	+
Force value	+	-	-

The record approach seems to be the best option because of its compile-time uniqueness guarantee and membership check.

Monday, June 8, 2015

poolboy pitfall 2

The poolboy issue described in this article has been fixed, so to avoid it just make sure that you use at least version 1.5.1 of the library. Unfortunately, another problem popped up.

Retries with poolboy

For some resources it is important to retry within a given time before reporting failure to the caller. At first sight the poolboy library provides all the necessary features.

Internally it uses a supervisor for restarting terminated processes (for retries);
The client can specify a timeout to wait for worker checkout.

So in the worker's code one just needs to terminate the process if the resource is not available, and retries will be organised by poolboy.

handle_call(die, _From, State) ->
    {stop, {error, died}, dead, State}.

The issue

In fact the situation is a bit more complicated. Whenever a gen_server callback returns a stop tuple, it just instructs the underlying OTP code to terminate the process. The documentation does not state whether the caller gets the response first and then the process terminates, or vice versa. In addition, poolboy's supervisor is notified about the worker's termination, but the caller is not, unless it sets up an additional link or monitor.

So worker termination and checkout from the pool are not synchronized. As a result, poolboy:checkout might return the Pid of a worker that has already terminated. Further use of the Pid leads to an exit exception: {noproc, ...}. Obviously the same can happen with poolboy:transaction.

The client can handle this error by reporting failure, but that does not fulfill the requirement of retrying for a given time period.

The issue is reported, in the best traditions of TDD, as a pull request with failing tests.

Workarounds

Since it was not fixed quickly, the issue seems to be quite fundamental. To continue using poolboy, some workaround is required.

The issue popped up when I tried to organise retry logic by means of poolboy. An obvious workaround would be to move this logic to either the worker or the client, but my perfectionism did not allow me to expose such complexity at that level.
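For comparison, the client-side variant of that obvious workaround might look like the helper below. It is a hypothetical sketch, not poolboy code: the module name retry, the function until_deadline/2 and the fixed 10 ms pause are assumptions, and erlang:monotonic_time(millisecond) needs a reasonably recent OTP release.

```erlang
-module(retry).
-export([until_deadline/2]).

%% Pause between attempts; an arbitrary value chosen for this sketch.
-define(PAUSE_MS, 10).

%% Calls Fun() until it returns a value or TimeoutMs has elapsed.
%% Only exits with reason {noproc, _} are retried; any other
%% exception propagates to the caller unchanged.
until_deadline(Fun, TimeoutMs) when is_function(Fun, 0), TimeoutMs >= 0 ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    attempt(Fun, Deadline).

attempt(Fun, Deadline) ->
    try
        {ok, Fun()}
    catch
        exit:{noproc, _} ->
            case erlang:monotonic_time(millisecond) < Deadline of
                true ->
                    timer:sleep(?PAUSE_MS),
                    attempt(Fun, Deadline);
                false ->
                    {error, timeout}
            end
    end.
```

Every call site that talks to the pool would have to be wrapped in such a helper, which is exactly the kind of complexity leaking into client code.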

Fortunately, the committers of the project suggested a simple technique to overcome the problem. The worker should check itself back into the pool in case of success, so that termination and check-in are synchronised.

handle_call(die, _From, State) ->
    {stop, {error, died}, dead, State};
handle_call(ok, _From, State) ->
    poolboy:checkin(pool_name, self()),
    {reply, ok, State}.

This trick implies that poolboy:transaction cannot be used anymore. It also breaks separation of concerns and abstraction, because the worker starts "knowing" about the pool. But overall I find it a "good deal" compared to other workarounds because of its simplicity.

Friday, June 5, 2015

OOP in Erlang. Part 2. Polymorphism

Encapsulation was covered in the previous article of my OOP in Erlang series.

Polymorphism

Polymorphism is the ability to present the same interface for different instances (types).

Processes

First of all, since a process is a unit of encapsulation in Erlang, polymorphism can be implemented at that level.
A client can send the same message to different processes, which handle the message differently.
For example, we can implement the gen_server behaviour in two modules.
serv1.erl

handle_call(message, _From, State) ->
  {reply, 1, State}.

serv2.erl

handle_call(message, _From, State) ->
  {reply, 2, State}.

Then the client chooses one of the gen_servers and starts it, saving the Pid of the process. After that gen_server:call(Pid, message) can be called, and the caller experiences different behaviour depending on the module chosen at the beginning.
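The same idea fits into a single file if plain processes are used instead of two gen_server modules. The sketch below is an illustration under that assumption; the module name poly_proc, the message format and the one-second timeout are made up, and the spawned processes are never stopped.

```erlang
-module(poly_proc).
-export([start/1, call/2]).

%% The caller picks the "type" once, when the process is started.
start(serv1) -> spawn(fun serv1_loop/0);
start(serv2) -> spawn(fun serv2_loop/0).

%% Same interface for every Pid, whatever code runs behind it.
call(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {Request, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        exit(timeout)
    end.

%% Two implementations that answer the same message differently.
serv1_loop() ->
    receive
        {message, From, Ref} ->
            From ! {Ref, 1},
            serv1_loop()
    end.

serv2_loop() ->
    receive
        {message, From, Ref} ->
            From ! {Ref, 2},
            serv2_loop()
    end.
```

After start/1 the caller holds only a Pid and no longer needs to know which implementation is behind it.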

Dynamic types

Erlang is a dynamically typed language; as a result, the module and even the function name can be taken from a variable in a function call. For example, the interfaces of dict and orddict from OTP are unified, and the following code shows a polymorphism implementation.

Module =
       case Ordering of
           unordered ->
               dict;
           ordered ->
               orddict
       end,
Container = Module:from_list(Values),
Module:find(Key, Container).

Pattern matching

Pattern matching is a powerful feature of the language, which can be used to implement polymorphism. The idea is that a function changes its behaviour depending on the arguments it gets. The most obvious data type to match on is a record.

move(#duck{}) -> move_duck();
move(#fish{}) -> move_fish().

Extensive use of pattern matching for that purpose decouples an "object" from its behaviour (business logic); as a result, adding a new "type" requires changes in multiple modules. This is very similar to the anemic domain model, which has some advantages though.
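A complete version of the record-based dispatch could look as follows. The module name animals, the name field and the returned tuples are illustrative assumptions; only the idea of matching on #duck{} and #fish{} comes from the snippet above.

```erlang
-module(animals).
-export([new/1, move/1]).

-record(duck, {name}).
-record(fish, {name}).

%% "Constructors": build the record that represents each type.
new({duck, Name}) -> #duck{name = Name};
new({fish, Name}) -> #fish{name = Name}.

%% One function, with the clause chosen by the record type
%% of the argument.
move(#duck{name = Name}) -> {Name, walks};
move(#fish{name = Name}) -> {Name, swims}.
```

Supporting a third animal means adding a record plus a clause to both new/1 and move/1, and to every other function that dispatches on the type, which is the cost mentioned above.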

Conclusion

Polymorphism in Erlang can be implemented in different ways. Even though processes provide the clearest interface for that, a developer should not create processes only for modeling business logic, because:

Logic changes quite often and changing all boilerplate code of processes is an overhead.
System becomes more complicated with each new type of process spawned.

Wednesday, March 18, 2015

Resource management idioms in Erlang

Resource management problem

Resource management is an important question in any programming language. It stays important even if the runtime provides a garbage collector, because there are other resources besides memory. A resource is something that the client has to initialise and later clean up, for example a socket, a file descriptor or a connection to an RDBMS.

Language specific solutions

The resource management problem is solved differently in different languages.

C

Every C developer has to take care of resource deallocation manually. For example, each memory allocation with malloc must be followed by free. At first sight this approach might not look that difficult, because there are no exceptions in C. But in fact it makes code much more complicated. The situation is slightly better in frameworks such as GLib.

C++

C++ is the successor of C. It introduces classes with constructors and destructors and guarantees that an object's destructor is called when the object goes out of scope. As a result the Resource Acquisition Is Initialisation (RAII) idiom emerged, together with smart pointers, which are an implementation of the idiom. C++ also introduces exceptions, and RAII helps with exception safety. So a resource can be represented as an object that is initialised in the constructor and deallocated in the destructor.

Java

Even though objects in Java have a finalize method, which is called before destruction, it cannot be safely used for resource deallocation the way a C++ destructor can. That is because the Java runtime uses a garbage collector and the moment of clean-up is not guaranteed.
Instead, a try-finally block can be used:

Type res = new Type();
try
{
    res.use();
}
finally
{
    res.deallocate();
}

The same approach is used in other languages with a garbage collector (C#, Python, etc.), sometimes with additional syntactic sugar.

Erlang

In Erlang a developer can use all three techniques of resource management listed above.

Manual management

I assume that everybody is aware of the disadvantages of C-style resource management and tries to limit its use, so I will not go into details.

RAII

Some OTP behaviours (gen_server, gen_fsm) have init and terminate callbacks, which can be used for resource allocation and deallocation respectively. terminate is called right before process termination and suits most cases of resource deallocation. The only potential issue is strict synchronisation of the clean-up with another process, which requires sending an additional message.
Well-known examples of the RAII idiom in OTP are ETS and gen_tcp. An ETS table is deleted by default when the process that created it terminates. A socket opened in the context of a process is closed on process termination as well.
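A minimal sketch of the init/terminate pairing is a gen_server that owns a file handle. The module name file_owner and its API are assumptions made for illustration, and gen_server:stop/1 requires OTP 18 or later.

```erlang
-module(file_owner).
-behaviour(gen_server).

%% Client API
-export([start_link/1, write/2, stop/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link(Path) ->
    gen_server:start_link(?MODULE, Path, []).

write(Pid, Data) ->
    gen_server:call(Pid, {write, Data}).

stop(Pid) ->
    gen_server:stop(Pid).

%% "Constructor": acquire the resource when the process starts.
init(Path) ->
    %% Trap exits so that terminate/2 also runs when the parent
    %% (for example a supervisor) shuts this process down.
    process_flag(trap_exit, true),
    case file:open(Path, [write]) of
        {ok, Fd} -> {ok, Fd};
        {error, Reason} -> {stop, Reason}
    end.

handle_call({write, Data}, _From, Fd) ->
    {reply, file:write(Fd, Data), Fd}.

handle_cast(_Msg, Fd) ->
    {noreply, Fd}.

%% "Destructor": release the resource right before the process ends.
terminate(_Reason, Fd) ->
    file:close(Fd).
```

The server state is just the file handle, so the lifetime of the resource is tied to the lifetime of the process.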

try-catch

A resource deallocation function is guaranteed to be called if it is placed in the after clause of a try block. For example, this is how the poolboy:transaction function is implemented.

transaction(Pool, Fun, Timeout) ->
    Worker = poolboy:checkout(Pool, true, Timeout),
    try
        Fun(Worker)
    after
        ok = poolboy:checkin(Pool, Worker)
    end.
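The same shape works for any resource with an open/close pair. Below is a sketch for files; the module name with_resource and the function with_file/3 are hypothetical, while file:open/2 and file:close/1 are the standard OTP calls.

```erlang
-module(with_resource).
-export([with_file/3]).

%% Opens Path, passes the handle to Fun and always closes it again,
%% whether Fun returns normally or raises an exception.
with_file(Path, Modes, Fun) when is_function(Fun, 1) ->
    case file:open(Path, Modes) of
        {ok, Fd} ->
            try
                {ok, Fun(Fd)}
            after
                %% Runs on both the normal and the exceptional path.
                _ = file:close(Fd)
            end;
        {error, _Reason} = Error ->
            %% Nothing was acquired, so there is nothing to release.
            Error
    end.
```

An exception raised by Fun still reaches the caller; the after clause only makes sure the handle is closed first.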

Erlang-specific recommendations

The rule of thumb for choosing between a try-catch block and RAII (a separate process) is quite simple. Try-catch is easier and shorter, but it is not applicable if the allocated resource has to survive until the next message/callback is processed in the caller process. It also cannot be used if you need to share a resource between multiple processes.

Do not try to save a few lines of boilerplate code for a new gen_server/gen_fsm module for resource management when one is needed. It will eventually pay off with cleaner code.

General recommendations

Try to avoid manual (C-style) resource management.

Do not handle multiple resources either in a single try-catch block or in a single object that implements the RAII idiom, because it is difficult to implement correctly.

Saturday, February 28, 2015

poolboy pitfall

Description of library

Poolboy is a popular Erlang library for organising pools of workers. For example, it is often used for RDBMS connections. Its API is extremely simple: after start the client uses just one function, poolboy:transaction, which wraps poolboy:checkout and poolboy:checkin in a try/after block.

transaction(Pool, Fun, Timeout) ->
    Worker = poolboy:checkout(Pool, true, Timeout),
    try
        Fun(Worker)
    after
        ok = poolboy:checkin(Pool, Worker)
    end.

Restarting of terminated workers, queueing and other complicated things are completely hidden from the user.

Hidden restrictions

But there is one pitfall that developers should be aware of. By default checkout is a blocking operation (and it is used with default settings in the transaction function), which means that client code will not return until a worker is allocated. But nothing lasts forever: poolboy:checkout is implemented as gen_server:call/3 and has a timeout argument (the default is 5 seconds).

-define(TIMEOUT, 5000).

checkout(Pool, Block, Timeout) ->
    try
        gen_server:call(Pool, {checkout, Block}, Timeout)
    catch
        Class:Reason ->
            gen_server:cast(Pool, {cancel_waiting, self()}),
            erlang:raise(Class, Reason, erlang:get_stacktrace())
    end.

If the timeout occurs, the client exits with a timeout reason. An attempt to handle this situation has even worse consequences.
Poolboy correctly recovers from termination of a process that has checked out a worker (this test case passes). But if poolboy:checkout exits with an error and the client tries to handle it without terminating the process, the worker might stay blocked (this test case fails).

transaction_timeout() ->
    {ok, Pid} = new_pool(1, 0),
    ?assertEqual({ready,1,0,0}, pool_call(Pid, status)),
    WorkerList = pool_call(Pid, get_all_workers),
    ?assertMatch([_], WorkerList),
    ?assertExit(
        {timeout, _},
        poolboy:transaction(Pid,
            fun(Worker) ->
                ok = pool_call(Worker, work)
            end,
            0)),
    ?assertEqual(WorkerList, pool_call(Pid, get_all_workers)),
    ?assertEqual({ready,1,0,0}, pool_call(Pid, status)).

One could say that this issue can only be hit with an unrealistic timeout value for checkout, but it can also happen when a worker starts slowly, since the worker is started in the same gen_server:handle_call if overflow is allowed. The call message might also be queued for a long time if a worker is being restarted after termination; as a result the same exit occurs in poolboy:checkout.

new_worker(Sup) ->
    {ok, Pid} = supervisor:start_child(Sup, []),
    true = link(Pid),
    Pid.

The issue has been reported to the author of poolboy together with a PR that reproduces the problem in a unit test. I do not think this can be easily fixed within the current architecture of poolboy, but the following simple rule in client code can prevent problems.

Recommendation to avoid issues

Losing a worker in the pool can be avoided simply by not handling exits in the process that calls checkout. If handling a timeout on worker checkout is necessary, just spawn a special process that calls poolboy:transaction or poolboy:checkout, "let it crash", and handle its exit as you want.
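Such a special process could be sketched as follows. The helper is generic and hypothetical (the module name isolated and the function run/1 are not part of poolboy); it runs any fun in a monitored process and turns the exit of that process into a return value.

```erlang
-module(isolated).
-export([run/1]).

%% Runs Fun in a separate, monitored process. The helper process is
%% allowed to crash; the caller only sees the outcome as a value.
run(Fun) when is_function(Fun, 0) ->
    %% A unique tag distinguishes "Fun returned" from any exit reason
    %% that Fun itself might produce.
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> exit({Tag, Fun()}) end),
    receive
        {'DOWN', MRef, process, Pid, {Tag, Result}} ->
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

With poolboy this might be used as isolated:run(fun() -> poolboy:transaction(Pool, Work, 5000) end), where Pool and Work are placeholders. A checkout timeout then comes back as an {error, Reason} tuple, while the process that performed the checkout has really terminated, which is the case poolboy recovers from.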

Monday, January 26, 2015

OOP in Erlang. Part 1. Encapsulation

Introduction

There is a popular statement that principles of object-oriented programming (OOP) are not applicable in functional languages, and in Erlang in particular. Let's try to analyse it.
The main concepts of object-oriented design (OOD) are:

Encapsulation.
Polymorphism.
Inheritance.

Encapsulation

The concept of encapsulation can be found in Erlang at least in the process model.

Processes

Each process has its own state, which can only be modified by handling a message sent to the process. A process is free to ignore any kind of message and handle only particular ones.
One of Erlang's "design patterns" is process per message, which encourages spawning a process for each concurrent instance in a system. In an HTTP service, request handling is concurrent, but encapsulation at that level alone is not enough for a good web framework. Also, good developers never force a client to call gen_server:call/2 on a gen_server they have implemented; they wrap such calls in functions in client API modules.

Modules

An Erlang module has an export attribute, which allows some of the functions implemented in it to be called from outside. What else is that, if not the split into public and private functions of an OOP language?
Modules can represent objects even more easily than processes; examples are queue and proplists. One could object that nothing protects a module's function from being called with a wrong argument, but the situation is the same in Python, where private functions are distinguished from public ones only by naming conventions.
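As a small illustration of the public/private split, consider the hypothetical module below (the name counter and its functions are made up for this sketch). Only the functions listed in the export attribute can be called from other modules.

```erlang
-module(counter).
%% The "public" interface of the module.
-export([new/0, increment/1, value/1]).

new() ->
    wrap(0).

increment(Counter) ->
    wrap(unwrap(Counter) + 1).

value(Counter) ->
    unwrap(Counter).

%% Not exported, hence "private": these two functions are the only
%% place that knows how a counter is represented.
wrap(N) -> {counter, N}.

unwrap({counter, N}) -> N.
```

Calling counter:wrap(1) from another module fails with an undef error, so the representation can change without touching client code, as long as clients stick to the exported functions.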

Conclusion

Erlang has two levels of encapsulation, which can be combined to create a good design for a system.

To be continued.