Lower bound
Sometimes lower bound algorithm, which returns an ordered subset of container's elements which are greater or equal to given value, is required in development. Surprisingly there is no implementation of it in OTP.
OTP containers
In Erlang there is a number of key-value containers with efficient search operation:- ETS;
- mnesia;
- gb_trees;
- dict;
- orddict.
Several implementations of sets: gb_sets, sets, ordsets, which are more or less the same as key-value container besides that only keys are present.
DETS - is a specialised file storage. Maps are not covered in this article since the appeared in Erlang 17.
Basically there are some alternatives to base lower bound algorithm implementation on.dict, sets
In documentation it's stated that the representation of a dictionary is not defined. Thereby implementation of lower bound algorithm requires reverse engineering and would be based on data structure which might be changed in next version.
In order to find out if optimisation exists for range matching in ETS, test is created. Two functions are implemented: first just finds element in ETS and returns it's key (key can be used as iterator), second - returns key of first element of lower bound . Both functions are implemented based on ets:select/3 with limit 1. ETS is initialised with 20000000 elements. Each element is a tuple {Key, Value}, where Key = Value. Integers incremented from 1 to 20000000 are used as keys. In performance test both functions are called with different element to search for in the same ETS. Here are function execution times.
From provided values it's obvious that ets:select/3 makes full table scan if the key is not bound. It's important to notice that select function stops after first matching element is found because of limit argument set to one.
Although optimisation of key range matching may appear in next version of Erlang at least for some types (ordered_set), ETS and mnesia are not suitable for lower bound algorithm implementation at the moment.
orddict, ordsets
Sorted lists from the first sight is one of the best data for lower bound implementation, but in fact lists in Erlang do not provide random access to it's elements with constant complexity (singly linked list). In ordict:find/2 source code one can see that it just traverses the entire list starting from the head. So binary search is not used, as a result not efficient lower bound implementation is possible.ETS, mnesia
ETS and mnesia are based on native code of Erlang VM and should have the best performance. Both containers have iterators so that resulting subset of lower bound algorithm could be traversed without copying. Unfortunately in ets:match_object as well as in mnesia:match_object key must be bound for efficient search, otherwise the entire container is searched.In order to find out if optimisation exists for range matching in ETS, test is created. Two functions are implemented: first just finds element in ETS and returns it's key (key can be used as iterator), second - returns key of first element of lower bound . Both functions are implemented based on ets:select/3 with limit 1. ETS is initialised with 20000000 elements. Each element is a tuple {Key, Value}, where Key = Value. Integers incremented from 1 to 20000000 are used as keys. In performance test both functions are called with different element to search for in the same ETS. Here are function execution times.
Key | Bound key time, s | Range key time,s |
---|---|---|
10000 | 0.000 | 0.002 |
100000 | 0.000 | 0.015 |
1000000 | 0.000 | 0.163 |
10000000 | 0.000 | 1.332 |
20000000 | 0.000 | 1.537 |
Although optimisation of key range matching may appear in next version of Erlang at least for some types (ordered_set), ETS and mnesia are not suitable for lower bound algorithm implementation at the moment.
Jacob Erlbeck found in ETS documentation of ets:next/2, that "If the table is of type ordered_set, the function returns the next key in order, even if the object does no longer exist." In combination with ets:lookup/2 this function might be used for efficient lower_bound algorith implementation on ETS. Thank you, Jacob.
ReplyDeleteI have found another lower_bound implementation in the current 18.0 release: gb_sets:iterator_from/2 and gb_trees:iterator_from/2 are probably exactly what you have looked for. The iterator can then be used with next/1 for an efficient traversal of the subsequent elements.
ReplyDelete1> S = gb_sets:from_list([1,5,8,11,56,78,89]).
2> I1 = gb_sets:iterator_from(10, S).
3> {E1, I2} = gb_sets:next(I1).
{11,..}
4> J1 = gb_sets:iterator_from(11, S).
5> gb_sets:next(J1).
{11,...}