Friday, July 18, 2014

Misuse of environment variables

Application environment variables

Environment variables are the main configuration mechanism for Erlang applications. The configuration of an Erlang node is essentially a list of application names, each paired with a list of that application's environment variables.
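For reference, this is roughly what such a node configuration looks like as a sys.config file. It is a sketch: app_env_var, var1 and var2 are the names used in this post, while the values and the kernel entries are made up for illustration.

```erlang
%% sys.config, passed to the node with: erl -config sys
%% Each element is {ApplicationName, [{Key, Value}, ...]}.
%% Values below are illustrative only.
[
 {app_env_var, [{var1, 10},
                {var2, 3}]},
 {kernel, [{inet_dist_listen_min, 9100},
           {inet_dist_listen_max, 9105}]}
].
```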
Variables are usually set once when the application starts, and can then be read at any time during execution and from anywhere in the code. They can also be set or updated at runtime.
An application usually reads only its own environment variables, but it can access the environment of other applications as well.
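A minimal sketch of the calls involved, assuming a hypothetical module env_api_demo and an application named app_env_var (missing_key is also just an illustrative name). Only standard functions of the application module are used.

```erlang
-module(env_api_demo).
-export([demo/0]).

%% Walks through the basic application environment API.
%% app_env_var, var2 and missing_key are illustrative names.
demo() ->
    %% Set or overwrite a value at runtime; this works even if the
    %% application is not started.
    ok = application:set_env(app_env_var, var2, 3),
    %% Read a variable of a named application (possibly not our own).
    {ok, 3} = application:get_env(app_env_var, var2),
    %% A missing key yields 'undefined' rather than raising an error...
    undefined = application:get_env(app_env_var, missing_key),
    %% ...unless a default is supplied via get_env/3.
    42 = application:get_env(app_env_var, missing_key, 42),
    %% All variables of an application as a list of {Key, Value}.
    Env = application:get_all_env(app_env_var),
    {var2, 3} = lists:keyfind(var2, 1, Env),
    ok.
```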
With such freedom, this mechanism is often misused. Let's look at an example.

Testing

An environment variable is read in a gen_server's handle_call/3 callback:

handle_call(use, _From, State) ->
 {ok, Var2} = application:get_env(var2),
 {reply, Var2, State}.

This works fine until we start writing unit tests: application:get_env/1 returns undefined, the {ok, Var2} match fails with badmatch, and the process crashes. The next attempt might be to set the variable in the test fixture:

basic_test_() ->
 {
 setup,
 fun() ->
   application:set_env(app_env_var, var2, 3),
   start_link()
 end,

If this test runs in the gen_server's test suite without starting the application, it crashes again. application:get_env/1 reads the environment of the application the calling process belongs to, and a process started outside the running application does not belong to app_env_var (application:get_application/0 returns undefined until the application is started). The only reasonable fix in this case is to name the application explicitly when reading the variable:

{ok, Var2} = application:get_env(app_env_var, var2)
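Put together, a complete version of the gen_server and its fixture might look like the sketch below. The module name env_reader, the use/0 API function and the teardown are assumptions made for this illustration; only the handle_call clause and the set_env call follow the snippets above.

```erlang
-module(env_reader).
-behaviour(gen_server).
-include_lib("eunit/include/eunit.hrl").

-export([start_link/0, use/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical module name; registered locally for simplicity.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

use() ->
    gen_server:call(?MODULE, use).

init([]) ->
    {ok, no_state}.

handle_call(use, _From, State) ->
    %% The application is named explicitly, so this works even when the
    %% process does not belong to a started application (e.g. in EUnit).
    {ok, Var2} = application:get_env(app_env_var, var2),
    {reply, Var2, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

basic_test_() ->
    {setup,
     fun() ->
             ok = application:set_env(app_env_var, var2, 3),
             {ok, Pid} = start_link(),
             Pid
     end,
     fun(Pid) ->
             %% Stop the server without taking the fixture process down.
             unlink(Pid),
             exit(Pid, shutdown)
     end,
     fun(_Pid) -> [?_assertEqual(3, use())] end}.
```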

In short, reading application environment variables in low-level functions makes testing harder.

Another approach

To avoid these testing difficulties, the variables can be used differently: all environment variables are read in the start/2 callback of the application callback module and passed down to the supervisor and on to the other processes in the chain.

start(_StartType, _StartArgs) ->
 {ok, Var1} = application:get_env(var1),
 app_env_var_sup:start_link(Var1).
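The next link in the chain could look like the following sketch of app_env_var_sup. It assumes a hypothetical worker module app_env_var_srv that takes the value through its start_link/1, and it uses map-based child specs (OTP 18 and later); the restart intensity values are arbitrary.

```erlang
-module(app_env_var_sup).
-behaviour(supervisor).

-export([start_link/1, init/1]).

%% Var1 arrives as an argument from the application's start/2 callback
%% instead of being read from the environment here.
start_link(Var1) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, Var1).

init(Var1) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    %% app_env_var_srv is a hypothetical worker that keeps Var1
    %% in its own state.
    Child = #{id => app_env_var_srv,
              start => {app_env_var_srv, start_link, [Var1]}},
    {ok, {SupFlags, [Child]}}.
```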

In tests, instead of setting the variable, the appropriate value is passed to the gen_server's start_link function, which saves it in the state.

basic_test_() ->
 {
 setup,
 fun() -> start_link(3) end,
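A self-contained sketch of such a worker with its test follows. The module name app_env_var_srv, the use/0 function, the map-based state and the teardown are assumptions for illustration; the point is that the fixture no longer touches the application environment at all.

```erlang
-module(app_env_var_srv).
-behaviour(gen_server).
-include_lib("eunit/include/eunit.hrl").

-export([start_link/1, use/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Var1) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Var1, []).

use() ->
    gen_server:call(?MODULE, use).

%% The configuration value is stored in the state once, at start.
init(Var1) ->
    {ok, #{var1 => Var1}}.

handle_call(use, _From, #{var1 := Var1} = State) ->
    {reply, Var1, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% No application:set_env/3 needed: the value is simply passed in.
basic_test_() ->
    {setup,
     fun() -> {ok, Pid} = start_link(3), Pid end,
     fun(Pid) -> unlink(Pid), exit(Pid, shutdown) end,
     fun(_Pid) -> [?_assertEqual(3, use())] end}.
```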

This way of using environment variables requires extra code to pass values down to the place where they are actually used. Another disadvantage is that changing a variable's value requires a restart (of the application or the node).
An environment variable can be compared to a global variable in languages like C++ or Java: it is accessible everywhere, which can be convenient for some tasks, but it has all the well-known disadvantages of globals.

Performance

One more argument against using environment variables is performance. Two gen_servers were compared: gs1 returns a value kept in its state, gs2 reads the value from an application environment variable. Here are the results.

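Server                              Time for 100000 calls, microseconds
gs1 (value read from state)         291802
gs2 (value read from environment)   325638

Each result is the total time for 100000 runs of the same code, i.e. roughly 2.9 vs 3.3 microseconds per call; reading the environment variable was about 12% slower.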
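A comparable measurement can be sketched with timer:tc/1. This is not the benchmark from the repository: for brevity a single hypothetical gen_server, env_bench, answers two kinds of requests (one from state, one from the environment), and absolute timings will differ between machines and OTP releases.

```erlang
-module(env_bench).
-behaviour(gen_server).

-export([run/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Times N synchronous calls for each access strategy.
%% Returns {StateMicros, EnvMicros}.
run(N) ->
    ok = application:set_env(app_env_var, var2, 3),
    {ok, Pid} = gen_server:start_link(?MODULE, 3, []),
    {TState, ok} = timer:tc(fun() -> loop(Pid, from_state, N) end),
    {TEnv, ok} = timer:tc(fun() -> loop(Pid, from_env, N) end),
    unlink(Pid),
    exit(Pid, shutdown),
    {TState, TEnv}.

loop(_Pid, _Req, 0) ->
    ok;
loop(Pid, Req, N) ->
    3 = gen_server:call(Pid, Req),
    loop(Pid, Req, N - 1).

%% The state is simply the configured value.
init(Var2) ->
    {ok, Var2}.

handle_call(from_state, _From, Var2) ->
    {reply, Var2, Var2};
handle_call(from_env, _From, State) ->
    {ok, Var2} = application:get_env(app_env_var, var2),
    {reply, Var2, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling env_bench:run(100000) in a shell gives a pair comparable in shape to the table above.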

Conclusion

The code for all unit tests and benchmarks is available in a GitHub repository.
Passing environment variables' values as arguments from the start/2 callback adds some boilerplate and requires restarting the application when the configuration changes, but it makes the code much cleaner and easier to test than reading env variables directly. It is important to find a balance between the two approaches.
For global functionality such as logging, passing the configuration to every call as extra arguments costs a lot of readability. If an application's configuration changes often and/or restarting it is undesirable, reading env variables directly is preferable. In most other cases, the suggested approach fits better.
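As an illustration of the logging case, here is a sketch of a hypothetical helper, log_level, that reads its threshold from the environment on every call, so a later application:set_env/3 takes effect without a restart. The key name log_level, the level names and the info default are assumptions.

```erlang
-module(log_level).
-export([enabled/1]).

%% Hypothetical helper: is a message of Level loud enough to be logged?
%% The threshold is read on every call, defaulting to 'info' if unset.
enabled(Level) ->
    Threshold = application:get_env(app_env_var, log_level, info),
    rank(Level) >= rank(Threshold).

rank(debug)   -> 0;
rank(info)    -> 1;
rank(warning) -> 2;
rank(error)   -> 3.
```

The trade-off is the one described above: no extra arguments anywhere, at the price of a global dependency and an environment lookup per call.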
