Minborg

Minborg
Minborg

Monday, December 15, 2014

Java 8, BYO Super Efficient Maps! Quadruple Your Map Performance

Background

Maps are used ubiquitously in almost all Java applications. There are several built-in implementations of the Map interface tailored for different purposes. All these Map implementations rely on calculating a hash value for the key, and using that hash value, we are able to put or get objects from a backing array. But sometimes we know more about the keys than that they are just Objects. In this post I will show how you can take advantage of that fact to write super-fast and super-efficient Map implementations. The implementation devised here is really nothing more than an offsetted array decorated with the Map interface methods. Since it does not have to deal with hashing, rehashing and hash collision, it is unsurprising that it is so fast. Its potential memory efficiency is however, something to elaborate on.

Performance

When comparing one of the Map implementation depicted in this post (namely the IntArrayMap) we will see that it is substantially more memory efficient than any other of the built-in Map implementation.

           HashMap   5470 MB     54 bytes per entry 
           TreeMap   5599 MB     55 bytes per entry
     LinkedHashMap   6271 MB     62 bytes per entry 
 ConcurrentHashMap   5471 MB     54 bytes per entry 
       IntArrayMap    400 MB      4 bytes per entry

(Bytes per entry is excluding size of value object itself.)

Applying the same comparative test but for get() method performance, we will also be able to conclude that the IntArrayMap is much faster than the other Maps.

           HashMap    44864 ms,   111 MTPS
           TreeMap   364277 ms,    13 MTPS
     LinkedHashMap    74616 ms,    67 MTPS
 ConcurrentHashMap    62002 ms,    80 MTPS
       IntArrayMap(*) 29635 ms,   168 MTPS
       IntArrayMap(**) 9520 ms,   525 MTPS

All tests are made using a JIT warmup of 5 000 000 000 iterations and then an additional 5 000 000 000 iterations for actual testing.

So the IntArrayMap is potentially ten times more memory efficient and more than four times quicker than any other built-in Map.

IntArrayMap(*) using the get(Object) method.
IntArrayMap(**) using the get(int) method.

Restrictions

In this post, the keys in the Map implementations must be an Object that extends Number, for example Integer, Long or Byte. The Object must not be able to hold any decimals (excluding the classes Double, Float, BigInteger and BigDecimal). Furthermore the key space must be known in advance. This might first appear as an unacceptable restriction and I agree that this certainly is the case for many applications. However, there are, more often than one might think, times when we know the key space in advance. For example, when dealing with Enums or other static mappings or smaller dynamic maps. A final restriction in the solutions provided in this post is that the key set must be within the range of Integer.

Overall Solution

Because the key space is know, the solutions can rely on direct mapping of the key onto an array that holds the values. So, when we create a Map, we also provide the minBound and maxBound index we want to use. If we, for example, create a Map with key space (10-14) we will create an array with 5 elements (maxBound-minBound+1) and when we get() from that map, we will take the key, subtract it with 10 (the minBound) and then get the value in that array position. The solution is related to the way EnumMap works.

Benefits


  • Much more memory efficient because the key is not stored in the Map. It is instead derived implicitly from the index.
  • Deterministic behavior since we do not have to re-hash when the Map is too small and needs to be grown.
  • Faster put() and get() operations because no hash code needs to be computed.
  • Reduced object creation rate (no boxing/un-boxing) because the put() and get() methods can work directly with primitive classes (e.g. int instead of Integer)


Drawbacks


  • Static key space that must be decided upon before map creation.
  • Keys must extend Number.
  • Less memory efficient for sparse maps (i.e. maps with few entries in the key space).
  • Key space must be within Integer range.
  • The Map implementations does not support null values.


Base Number Array Map

This is how the overall abstract base class NumberArrayMap looks like (yes, I know it it a big class. There are many things a Map has to do):

public abstract class NumberArrayMap<K extends Number, V> extends AbstractMap<K, V> implements Map<K, V>, Serializable {

    private static final long serialVersionUID = -2304239764179123L;

    private int minBound;
    private int maxBound;
    //
    private transient int size;
    private transient int modCount;
    private transient int capacity;
    private transient V[] values;

    private transient Set<Map.Entry<K, V>> entrySet;

    protected NumberArrayMap(int minBound, int maxBound) {
        if (minBound > maxBound) {
            throw new IllegalArgumentException("Error: minBound (" + minBound + ") must not be greater than maxBound (" + maxBound + ")");
        }
        this.minBound = minBound;
        this.maxBound = maxBound;
        init();
    }

    private void init() {
        initCapacity();
        clear();
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }

    @Override
    public boolean containsKey(Object key) {
        if (key instanceof Number) {
            return containsKey(((Number) key).intValue());
        }
        return false;
    }

    public boolean containsKey(int key) {
        return isInRange(key) && values[index(key)] != null;
    }

    @Override
    public boolean containsValue(Object value) {
        for (final V v : values) {
            if (v != null && v.equals(value)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public V get(Object key) {
        if ((key instanceof Number)) {
            return get(((Number) key).intValue());
        }
        return null;
    }

    public V get(int key) {
        if (isInRange(key)) {
            return values[index(key)];
        }
        return null;
    }

    @Override
    public V put(K key, V value) {
        return put(key.intValue(), value);
    }

    public V put(int key, V value) {
        assertInRange(key);
        Objects.requireNonNull(value, "This Map does not support null values");
        final V oldValue = get(key);
        if (oldValue == null) {
            // A new value, increment size
            size++;
        }
        values[index(key)] = value;
        modCount++;
        return oldValue;
    }

    @Override
    public V remove(Object key) {
        if (key instanceof Number) {
            return remove(((Number) key).intValue());
        }
        return null;
    }

    public V remove(int key) {
        final V oldValue = get(key);
        if (oldValue != null) {
            // An existing value, decrement size
            size--;
        }
        values[index(key)] = null;
        modCount++;
        return oldValue;
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
        for (final Entry<? extends K, ? extends V> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    @Override
    public void clear() {
        @SuppressWarnings("unchecked")
        final V[] newValues = (V[]) new Object[capacity];
        this.values = newValues;
        size = 0;
        modCount++;
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
        Set<Entry<K, V>> es;
        return (es = entrySet) == null ? (entrySet = new NumberArrayMap<K, V>.EntrySet()) : es;
    }

    final class EntrySet extends AbstractSet<Map.Entry<K, V>> {

        @Override
        public final int size() {
            return size;
        }

        @Override
        public final void clear() {
            NumberArrayMap.this.clear();
        }

        @Override
        public final Iterator<Map.Entry<K, V>> iterator() {
            return new NumberArrayMap<K, V>.EntryIterator();
        }

        @Override
        public final boolean contains(Object o) {
            if (!(o instanceof Map.Entry)) {
                return false;
            }
            final Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
            final Object key = e.getKey();
            final V valueCandidate = get(key);
            return valueCandidate != null && valueCandidate.equals(e.getValue());
        }

        @Override
        public final boolean remove(Object o) {
            if (o instanceof Map.Entry) {
                final Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
                final Object key = e.getKey();
                final V valueCandidate = get(key);
                if (valueCandidate != null && valueCandidate.equals(e.getValue())) {
                    return NumberArrayMap.this.remove(key) != null;
                }
            }
            return false;
        }

        @Override
        public final void forEach(Consumer<? super Map.Entry<K, V>> action) {
            Objects.requireNonNull(action);
            if (size > 0) {
                int mc = modCount;
                for (int i = minBound; i <= maxBound; ++i) {
                    final V value = get(i);
                    if (value != null) {
                        action.accept(makeEntry(makeKeyFromInt(i), value));
                    }
                    if (modCount != mc) {
                        throw new ConcurrentModificationException();
                    }
                }
            }
        }
    }

    private abstract class BucketIterator<N> implements Iterator<N> {

        private Entry<K, V> next;
        private Entry<K, V> current;
        private int expectedModCount;
        private int currentIndex;

        public BucketIterator() {
            expectedModCount = modCount;
            currentIndex = 0;
            if (size > 0) {
                // advance to first entry
                forwardToNext(minBound);
            }
        }

        @Override
        public final boolean hasNext() {
            return next != null;
        }

        public final Entry<K, V> nextEntry() {
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            if (next == null) {
                throw new NoSuchElementException();
            }
            current = next;
            forwardToNext(currentIndex + 1);
            return current;
        }

        @Override
        public final void remove() {
            final Entry<K, V> p = current;
            if (p == null) {
                throw new IllegalStateException();
            }
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            current = null;
            final K key = p.getKey();
            NumberArrayMap.this.remove(key);
            expectedModCount = modCount;
        }

        private void forwardToNext(int start) {
            if (start <= maxBound) {
                for (int i = start; i <= maxBound; ++i) {
                    final V value = get(i);
                    if (value != null) {
                        currentIndex = i;
                        next = makeEntry(makeKeyFromInt(i), value);
                        return;
                    }
                }
            }
            // We have reached the end...
            next = null;
        }

    }

    final class KeyIterator extends BucketIterator<K> {

        @Override
        public final K next() {
            return nextEntry().getKey();
        }
    }

    final class ValueIterator extends BucketIterator<V> {

        @Override
        public final V next() {
            return nextEntry().getValue();
        }
    }

    final class EntryIterator extends BucketIterator<Map.Entry<K, V>> {

        @Override
        public final Map.Entry<K, V> next() {
            return nextEntry();
        }
    }

    protected abstract K makeKeyFromInt(int k);

    private Entry<K, V> makeEntry(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    private int index(int key) {
        return key - minBound;
    }

    private boolean isInRange(int key) {
        return (key >= minBound && key <= maxBound);
    }

    private void assertInRange(int key) {
        if (!isInRange(key)) {
            throw new ArrayIndexOutOfBoundsException("Key " + key + " out of range. Key shoud be >= " + minBound + " and <= " + maxBound);
        }
    }

    private void initCapacity() {
        this.capacity = maxBound - minBound + 1;
    }

    private void writeObject(java.io.ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(minBound);
        s.writeInt(maxBound);
        s.writeInt(size);
        for (final Entry<K, V> e : entrySet()) {
            final V value = e.getValue();
            if (value != null) {
                s.writeObject(e.getKey());
                s.writeObject(e.getValue());
            }
        }
    }

    private void readObject(java.io.ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        minBound = s.readInt();
        maxBound = s.readInt();
        init();
        final int noItems = s.readInt();
        if (noItems < 0) {
            throw new InvalidObjectException("Illegal noItems count: " + noItems);
        }
        if (noItems > 0) {
            // Read the keys and values, and put the mappings in the NumberMap
            for (int i = 0; i < noItems; i++) {
                @SuppressWarnings("unchecked")
                final K key = (K) s.readObject();
                @SuppressWarnings("unchecked")
                final V value = (V) s.readObject();
                put(key, value);
            }
        }
    }

    @SuppressWarnings("unchecked")
    @Override
    public Object clone() throws CloneNotSupportedException {
        NumberArrayMap<K, V> result;
        try {
            result = (NumberArrayMap<K, V>) super.clone();
        } catch (CloneNotSupportedException e) {
            // this shouldn't happen, since we are Cloneable
            throw new InternalError(e);
        }
        result.init();
        result.putAll(this);
        return result;
    }


}


All concrete implementing classes that inherit from this class must implement the K makeKeyFromInt(int k) method in order to be able to reproduce a key for any given index. This is due to the reason that we do not store the keys as the built-in Map implementations do. If, for example, the key type K is Byte, then we need to cast k to byte like this: return (byte)k;  We will see how that works in the next chapter.

It is worth noticing that the normal contains(), get(), remove() and put() methods all have corresponding methods that works directly with int as key. This is important form a performance perspective because that way, we can avoid unnecessary boxing/un-boxing of keys.

By letting our class inherit from AbstractMap we get a number of methods for free, including the hashCode(), equals() and toString() methods. We also elect to override some of the AbstractMap methods for performance reasons.



Sub Classes

Now that the abstract NumberArrayMap class is implemented, it is very easy to create concrete Map implementations for different Number classes such as Integer and Long.  Here is a list of the ones I have made:

public class ByteArrayMap<V> extends NumberArrayMap<Byte, V> {

    private static final long serialVersionUID = -2304239764179124L;
  
    public ByteArrayMap() {
        this(Byte.MIN_VALUE, Byte.MAX_VALUE);
    }

    public ByteArrayMap(byte minBound, byte maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Byte makeKeyFromInt(int k) {
        return (byte) k;
    }

}

public class ShortArrayMap<V> extends NumberArrayMap<Short, V> {

    private static final long serialVersionUID = -2304239764179125L;
  
    public ShortArrayMap(short minBound, short maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Short makeKeyFromInt(int k) {
        return (short)k;
    }

}


public class IntArrayMap<V> extends NumberArrayMap<Integer, V> {

    private static final long serialVersionUID = -2304239764179126L;
  
    public IntArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Integer makeKeyFromInt(int k) {
        return k;
    }

}


public class LongArrayMap<V> extends NumberArrayMap<Long, V> {

    private static final long serialVersionUID = -2304239764179127L;

    public LongArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Long makeKeyFromInt(int k) {
        return (long) k;
    }

}



// This class will erroneously map Numbers with equal integers (e.g. 1.2 and 1.3) to the same key
// The same goes for Double and Float, so use only integer Number implementations
@Depricated

public class BigDecimalArrayMap<V> extends NumberArrayMap<BigDecimal, V> {

    private static final long serialVersionUID = -2304239764179128L;

    public BigDecimalArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected BigDecimal makeKeyFromInt(int k) {
        return new BigDecimal(k);
    }

}



Working with Enums

Often, Enums comes with additional properties of some kind. Consider the following Enum class:

public enum SomeEnum 

    ADAM(0, 4), BERT(1, 42), CHARLIE(2, 134), SOME_DUDE(10, 13);

    private SomeEnum(int value, int luckyNumber) {
        this.value = value;
        this.luckyNumber = luckyNumber;
    }

    private final int value;
    private final int luckyNumber;
    public int getValue() {         return value;     }
    public int getLuckyNumber() {         return luckyNumber;     } }


By implementing a Map support class, we can very easily implement lookup methods (or "finders" as they sometimes are called). Here is the support class:

public class IntEnumArrayMap<E extends Enum<?>> extends IntArrayMap<E> {

    private static final long serialVersionUID = -2304239764179128L;

    public static <E extends Enum<?>> IntEnumArrayMap<E> fromEnums(E[] enums, ToIntFunction<E> function) {
        final int min = Stream.of(enums).mapToInt(function).min().getAsInt();
        final int max = Stream.of(enums).mapToInt(function).max().getAsInt();
        final IntEnumArrayMap<E> result = new IntEnumArrayMap<>(min, max);
        Stream.of(enums).forEach((e) -> result.put(function.applyAsInt(e), e));
        return result;
    }

    public IntEnumArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }
}

Now we can implement our finders like this:

public enum SomeEnum {

    ADAM(0, 4), BERT(1, 42), CHARLIE(2, 134), SOME_DUDE(10, 13);

    private SomeEnum(int value, int luckyNumber) {
        this.value = value;
        this.luckyNumber = luckyNumber;
    }

    private final int value;
    private final int luckyNumber;

    private final static IntEnumArrayMap<SomeEnum> valueMap =
             IntEnumArrayMap.fromEnums(values(), SomeEnum::getValue);
    private final static IntEnumArrayMap<SomeEnum> luckyNumberMap =
             IntEnumArrayMap.fromEnums(values(), SomeEnum::getLuckyNumber);

    public static SomeEnum findByValue(int value) {
        return valueMap.get(value);
    }

    public static SomeEnum findByLuckyNumber(int value) {
        return luckyNumberMap.get(value);
    }

    public int getValue() {
        return value;
    }

    public int getLuckyNumber() {
        return luckyNumber;
    }

}

Very convenient and efficient! We know the key set of the Map we are creating so the NumberArrayMap seams like a perfect fit. Note how nice we just provide the method we want to use for mapping (e.g. SomeEnum::getValue) using Java 8's new function interface.


Future work

The implementation above relies on the inherited basic Spliterator from AbstractMap. It would be advantages to write a better Spliterator that can parallelize the entrySet(). Any one up to the challenge?

Perhaps it would be possible to create other type of Map implementations that can handle other types of key sets (i.e other than Number) that can be mapped to int. Enum is one candidate (using the ordinal() as key) but there is already an EnumMap.

Grab the source code using "git clone https://github.com/minborg/javapot.git" and expand the code and also see how I made the performance tests.

Concusion

If you know your key space, you can create much more efficient Map implementations than the built-in Java ones. Enums are a perfect place where these Map implementations can live. Static maps and small maps that are created dynamically are also a perfect match.

Let's Map the World!

Wednesday, December 10, 2014

Java 8, Initializing Maps in the Smartest Way

Background

Create maps
Illustration: Elis Minborg
When I was a kid, I was taught that, in the Swedish language, one should write out numbers using their text representation in the range from zero to twelve, otherwise one should use their number representation. For example, one should write "There are two birds and 21 flowers" (but in Swedish then of course).

In Java, we are likely to use Maps to perform such translations. The question in this post is: How does one initialize Maps in the best way? What are the alternatives and what are their pros and cons?


Objective

The objective in this post is to write a method that returns a Map with a translation from an Integer to a String. The Map shall be Unmodifiable, because we might want to reuse the Map in several places in our code and we want to be sure that is has not been tampered with. The Map shall contain the mapping pairs (or Entries as they are called in Java): 0->"zero", 1->"one", 2->"two", ... , 12->"twelve".

The Imperative Way

The straight forward solution is to declare a Map and then just put() entries into the Map. After that is done, we return an Unmodifiable view of the newly created map like this:
protected static Map<Integer, String> imperative() {
        final Map<Integer, String> numMap = new HashMap<>();
        numMap.put(0, "zero");
        numMap.put(1, "one");
        numMap.put(2, "two");
        numMap.put(3, "three");
        numMap.put(4, "four");
        numMap.put(5, "five");
        numMap.put(6, "six");
        numMap.put(7, "seven");
        numMap.put(8, "eight");
        numMap.put(9, "nine");
        numMap.put(10, "ten");
        numMap.put(11, "eleven");
        numMap.put(12, "twelve");
        return Collections.unmodifiableMap(numMap);
    }
In this solution, as opposed to the other solutions that are shown later in this post, we have to declare an intermediate result variable (numMap). We then have to reference this result variable over and over again for each put() operation , which is disturbing. There is a risk that this result variable can "leak", effectively rendering the Unmodifiable map modifiable, because the Collections.unmodifiableMap() method just provides a view of the provided Map. So if we retain a reference to the provided Map, we can still change it.

Double Brace Initialization

The Double Brace Initialization Idiom firsts appears as very appealing. In the beginning, I liked it and started to use it frequently in my code. This is how it works:
    @SuppressWarnings("serial")
    protected static Map<Integer, String> doubleBracket() {

        return Collections.unmodifiableMap(new HashMap<Integer, String>() {
            {
                put(0, "zero");
                put(1, "one");
                put(2, "two");
                put(3, "three");
                put(4, "four");
                put(5, "five");
                put(6, "six");
                put(7, "seven");
                put(8, "eight");
                put(9, "nine");
                put(10, "ten");
                put(11, "eleven");
                put(12, "twelve");
            }
        });
    }
There is actually no magic with it at all. The first pair of braces will create an anonymous class and the second pair of braces will create an instance initializer block that is run when the anonymous inner class is instantiated. In any class, you can have such brackets containing code that will be run when your create instances of your class (or when the class is used the first time, if you precede the brackets with the keyword static). This looks much better because now you do not have to create an intermediate result and you do not have to reference a variable each time you call put().

However, there are a number of snags with this scheme that is not immediately apparent. As previously stated, when you declare a class with double braces, an anonymous class will be created. Because the anonymous class is not static, it will also hold a reference (this$0) to its containing class (if defined within another class, which is the most normal case). This means that the containing instance (this$0) can not be garbage-collected as long as your Map is alive. A big concern according to my view.

Another disadvantage is that you must provide the types of the Map explicitly, because the compiler is not able to infer the types using the diamond (<>) operator. Also, the new anonymous class does not define a serialVersionUID variable, so we must suppress the resulting warning using the @SuppressWarnings("serial") annotation if you want it to compile nicely. You should really think twice before you use the Double Brace Idiom.

The Builder Pattern

This is one of my favorites and you can read more about this pattern in my previous posts "Creating Objects Using The Builder Pattern" and "The Interface Builder Pattern". The idea here is to create a Builder that has methods that returns the Builder itself again and again (like the StringBuilder does). This way the Builder can be called several time until the final build() method is called and the result is returned. I have created builders that can be used to create instance of Map and ConcurrentMap and placed them in a utility class called Maps:
public class Maps {

    private Maps() {
    }

    public static <K, V> MapBuilder<K, V> builder() {
        return builder(HashMap::new);
    }

    public static <K, V> MapBuilder<K, V> builder(Supplier<Map<K, V>> mapSupplier) {
        return new MapBuilder<>(mapSupplier.get());
    }

    public static <K, V> ConcurrentMapBuilder<K, V> concurrentBuilder() {
        return concurrentBuilder(ConcurrentHashMap::new);
    }

    public static <K, V> ConcurrentMapBuilder<K, V> concurrentBuilder(Supplier<ConcurrentMap<K, V>> mapSupplier) {
        return new ConcurrentMapBuilder<>(mapSupplier.get());
    }

    private static class BaseBuilder<M extends Map<K, V>, K, V> {

        protected final M map;

        public BaseBuilder(M map) {
            this.map = map;
        }

        public BaseBuilder<M, K, V> put(K key, V value) {
            map.put(key, value);
            return this;
        }

        public M build() {
            return map;
        }

    }

    public static class MapBuilder<K, V> extends BaseBuilder<Map<K, V>, K, V> {

        private boolean unmodifiable;

        public MapBuilder(Map<K, V> map) {
            super(map);
        }

        @Override
        public MapBuilder<K, V> put(K key, V value) {
            super.put(key, value);
            return this;
        }

        public MapBuilder<K, V> unmodifiable(boolean unmodifiable) {
            this.unmodifiable = unmodifiable;
            return this;
        }

        @Override
        public Map<K, V> build() {
            if (unmodifiable) {
                return Collections.unmodifiableMap(super.build());
            } else {
                return super.build();
            }
        }

    }

    public static class ConcurrentMapBuilder<K, V> extends BaseBuilder<ConcurrentMap<K, V>, K, V> {

        public ConcurrentMapBuilder(ConcurrentMap<K, V> map) {
            super(map);
        }

        @Override
        public ConcurrentMapBuilder<K, V> put(K key, V value) {
            super.put(key, value);
            return this;
        }

    }

}
For the sake of completeness, I have also included methods to create concurrent maps and also two builder()  methods that takes a map Provider, allowing any type of Map to be created using the Builder. Using the support methods, we can now create maps easily like this:

    protected static Map<Integer, String> builderPattern() {
        return Maps.<Integer, String>builder().
                put(0, "zero").
                put(1, "one").
                put(2, "two").
                put(3, "three").
                put(4, "four").
                put(5, "five").
                put(6, "six").
                put(7, "seven").
                put(8, "eight").
                put(9, "nine").
                put(10, "ten").
                put(11, "eleven").
                put(12, "twelve").
                unmodifiable(true).
                build();
    }

Although we must enter the generic types of the Map (i.e. <Integer, String>) it looks much better. We can avoid all the other disadvantages that the Double Brace Idiom had. Note the Builder method unmodifiable() that sets a flag, controlling if the build() method shall return a normal Map or an unmodifiable Map. Very convenient! Another advantage is that this pattern works with older Java versions and that no extra objects are created during map creation.

The Java 8 Way

Java 8 has a number of features that can be used to create and initialize maps. Take a look at this method:      

    protected static Map<Integer, String> stdJava8() {

        return Collections.unmodifiableMap(Stream.of(
                new SimpleEntry<>(0, "zero"),
                new SimpleEntry<>(1, "one"),
                new SimpleEntry<>(2, "two"),
                new SimpleEntry<>(3, "three"),
                new SimpleEntry<>(4, "four"),
                new SimpleEntry<>(5, "five"),
                new SimpleEntry<>(6, "six"),
                new SimpleEntry<>(7, "seven"),
                new SimpleEntry<>(8, "eight"),
                new SimpleEntry<>(9, "nine"),
                new SimpleEntry<>(10, "ten"),
                new SimpleEntry<>(11, "eleven"),
                new SimpleEntry<>(12, "twelve"))
                .collect(Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue())));
    }

Here we create a Stream of map entries. At least two implementations of Entry already exists in the standard Java libraries and here I have used SimpleEntry. After the Stream is constructed, we collect all the entries and creates a Map by splitting up each Entry in a key and a value. It is important to include the diamond operator for each SimpleEntry or else the toMap() method will not be able to infer the types of the Entry.

As can be seen, many intermediate objects need to be created before the final Map can be constructed. This can be a disadvantage if many instances are created rapidly. One advantage with this method is that it is using a standard Java 8 stream pattern, so it is very easy to start out with this pattern and then modify it to do something else.

The Java 8 Way Simplified  

We can further simplify the process of creating maps the Java 8 way, by adding a small number of support methods to our Maps class like this:      

    public static <K, V> Map.Entry<K, V> entry(K key, V value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    public static <K, U> Collector<Map.Entry<K, U>, ?, Map<K, U>> entriesToMap() {
        return Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue());
    }

    public static <K, U> Collector<Map.Entry<K, U>, ?, ConcurrentMap<K, U>> entriesToConcurrentMap() {
        return Collectors.toConcurrentMap((e) -> e.getKey(), (e) -> e.getValue());
    }



 Now we can create the desired Map using this method:  

    protected static Map<Integer, String> extJava8() {
        return Collections.unmodifiableMap(Stream.of(
                entry(0, "zero"),
                entry(1, "one"),
                entry(2, "two"),
                entry(3, "three"),
                entry(4, "four"),
                entry(5, "five"),
                entry(6, "six"),
                entry(7, "seven"),
                entry(8, "eight"),
                entry(9, "nine"),
                entry(10, "ten"),
                entry(11, "eleven"),
                entry(12, "twelve")).
                collect(entriesToMap()));
    }

Do not forget to import the support methods statically. Now, this looks much nicer than the previous attempt because is is much shorter and does not contain so much boiler plate code.

Conclusion

There are many ways to create and initialize a Map. The choice is really up to you. Personally, I like the Builder Pattern best because it is very efficient, flexible and straight forward to use. The Java 8 pattern is also nice. A final warning on using the Double Brace Idiom should also be made.

How do we do the actual translation of text using the Map we have created using any of the various methods in this post? One nice way of doing it is to use Java 8's Map method getOrDefault() like this:

     public static String toText(Map<Integer, String> map, int val) {
         return map.getOrDefault(val, Integer.toString(val));
     }
    
     final Map<Integer, String> map = imperative();
     System.out.println("There are " + toText(map, 2) + " birds and " + toText(map, 21) + " flowers.");


   "There are two birds and 21 flowers."

UPDATE: Learn how you can implement a type safe dynamic Map builder in this newer post.

Friday, December 5, 2014

Java 8, Implementing a ConcurrentHashSet

Background

In the package java.util.concurrent there are numerous classes that enables concurrent access to various objects and data structures. However, there is a lack of a concurrent Set in the standard libraries. In this post I will show how to fix this problem.

The easy way out

There is a very simple way of getting a kind of concurrent Set with elements of type K in Java:
Map<K, Boolean> concurrentMap = new ConcurrentHashMap();
Set<K> concurrentSet = Collections.newSetFromMap(concurrentMap);

For example, one can get a concurrent Set of strings like this
Set<String> concurrentSet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

The concurrentSet will exhibit the same concurrency properties as the underlying ConcurrentMap does (i.e. it will provide concurrent lock free access in this case) which is good. It also inherits all the other Map properties such as null key capability etc.

Of course you can provide any map to the newSetFromMap() including the non-concurrent HashMap (keys are in any order), LinkedHashMap  (preserves insertion order) or TreeMap (keeps the keys in their natural order). The resulting Set will inherit those underlying properties. For example, if you provide LinkedHashMap, you will get a non-concurrent Set where the keys will be retrieve in insertion order.

Limitations and Drawbacks

When you are using the Collections.newSetFromMap() method, the Map you provide must be empty from the beginning. You must also take care not to keep an additional reference to the backing map, because newSetFromMap() does not perform any defensive copy of the map. If you keep such a reference, you can alter the Set using the Map reference too, which is unsafe. The value type Boolean used in the underlying Map also strikes me as odd. Why was that particular type selected one might ask oneself? Object would be more general, one might argue.

The most important drawback in my opinion, is that the returned Set really is just a Set that just happens to be concurrent (again, if you provide a ConcurrentMap). For example, it does not implement the method that corresponds to the features ConcurrentMap brings over Map, like putIfAbsent() and numerous other new Java 8 methods like computeIfAbsent(). There is no way to really guarantee that the Set is concurrent.

The Interface Proposal

So what do we want to accomplish here? Well, wouldn't it be nice if we could define an interface ConcurrentSet and use it just like we are using the interface ConcurrentMap. Let us take a look at the latter interface (as defined in Java 8):

public interface ConcurrentMap<K, V> extends Map<K, V> {

    V getOrDefault(Object key, V defaultValue);

    void forEach(BiConsumer<? super K, ? super V> action);

    V putIfAbsent(K key, V value);

    boolean remove(Object key, Object value);

    boolean replace(K key, V oldValue, V newValue);

    void replaceAll(BiFunction<? super K, ? super V, ? extends V> function)

    V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction);

    V computeIfPresent(K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction);

    V compute(K key,BiFunction<? super K, ? super V, ? extends V> remappingFunction);

    V merge(K key, V value, BiFunction<? super V, ? super V, ? extends V> remappingFunction);

}

As can be seen, most of the methods are dealing with values rather than keys. Thus, most of the methods can not be applied to a ConcurrentSet. So, we can discard most methods and perhaps redefine others so that they only deal with keys. First I thought that we should skip all methods except the putIfAbsent() method, but I was wrong since it is equivalent to add() (thanks for pointing that out Louis Wasserman). So, we drop all methods and end up with just a Marker Interface with no extra methods over Set. Still, it is useful because we can show our intent with the new interface.

public interface ConcurrentSet<E> extends Set<E> {

}

Additional concurrent methods can be added later if desired. Please leave a comment below if you can think of new methods that would be nice to add to a ConcurrentSet.

What's Alredy There?

If we take a closer look at the Collections.newSetFromMap() method, we see that it basically delegates the methods from a backing map (and the backing map's keySet). Below, I have shown a shortened version of the SetFromMap class, so you will get the basic idea. The same concept is also used by Java's ConcurrentSkipListSet, so this certainly seams to be common practice.

 private static class SetFromMap<E> extends AbstractSet<E>
        implements Set<E>, Serializable
    {
        private final Map<E, Boolean> m;  // The backing map
        private transient Set<E> s;       // Its keySet

        SetFromMap(Map<E, Boolean> map) {
            if (!map.isEmpty())
                throw new IllegalArgumentException("Map is non-empty");
            m = map;
            s = map.keySet();
        }

        public void clear()               {        m.clear(); }
        public int size()                 { return m.size(); }
        public boolean isEmpty()          { return m.isEmpty(); }
        public boolean contains(Object o) { return m.containsKey(o); }
        public boolean remove(Object o)   { return m.remove(o) != null; }
        public boolean add(E e) { return m.put(e, Boolean.TRUE) == null; }
        public Iterator<E> iterator()     { return s.iterator(); }
        public Object[] toArray()         { return s.toArray(); }
        public <T> T[] toArray(T[] a)     { return s.toArray(a); }
        public String toString()          { return s.toString(); }
        public int hashCode()             { return s.hashCode(); }
        public boolean equals(Object o)   { return o == this || s.equals(o); }
        public boolean containsAll(Collection<?> c) {return s.containsAll(c);}
        public boolean removeAll(Collection<?> c)   {return s.removeAll(c);}
        public boolean retainAll(Collection<?> c)   {return s.retainAll(c);

       // The rest comes here...
}

An Implementation Proposal

Using the Delegation Pattern, we can easily come up with a similar implementation of the new ConcurrentSet interface as depicted here under:

/**
 * A hash set supporting full concurrency of retrievals and
 * high expected concurrency for updates.
 *
 * @param <E> the type of elements maintained by this set
 * @author pemi
 *
public class ConcurrentHashSet<E> implements ConcurrentSet<E>, Serializable {

    private final ConcurrentMap<E, Object> m;
    private transient Set<E> s;

    public ConcurrentHashSet() {
        this.m = new ConcurrentHashMap<>();
        init();
    }

    public ConcurrentHashSet(int initialCapacity) {
        this.m = new ConcurrentHashMap<>(initialCapacity);
        init();
    }

    public ConcurrentHashSet(int initialCapacity, float loadFactor) {
        this.m = new ConcurrentHashMap<>(initialCapacity, loadFactor);
        init();
    }

    public ConcurrentHashSet(int initialCapacity, float loadFactor, int concurrencyLevel) {
        this.m = new ConcurrentHashMap<>(initialCapacity, loadFactor, concurrencyLevel);
        init();
    }

    public ConcurrentHashSet(Set<? extends E> s) {
        this(Math.max(Objects.requireNonNull(s).size(), 16));
        addAll(s);
    }

    // New type of constructor
    public ConcurrentHashSet(Supplier<? extends ConcurrentMap<E, Object>> concurrentMapSupplier) {
        final ConcurrentMap<E, Object> newMap = concurrentMapSupplier.get();
        if (!(newMap instanceof ConcurrentMap)) {
            throw new IllegalArgumentException("The supplied map does not implement "+ConcurrentMap.class.getSimpleName());
        }
        this.m = newMap;
        init();
    }
   
    private void init() {
        this.s = m.keySet();
    }
   
    @Override public void clear()               {        m.clear(); }
    @Override public int size()                 { return m.size(); }
    @Override public boolean isEmpty()          { return m.isEmpty(); }
    @Override public boolean contains(Object o) { return m.containsKey(o); }
    @Override public boolean remove(Object o)   { return m.remove(o) != null; }
    @Override public boolean add(E e)           { return m.put(e, Boolean.TRUE) == null; }
    @Override public Iterator<E> iterator()     { return s.iterator(); }
    @Override public Object[] toArray()         { return s.toArray(); }
    @Override public <T> T[] toArray(T[] a)     { return s.toArray(a); }
    @Override public String toString()          { return s.toString(); }
    @Override public int hashCode()             { return s.hashCode(); }
    @Override public boolean equals(Object o)   { return s.equals(o); }
    @Override public boolean containsAll(Collection<?> c) {return s.containsAll(c);}
    @Override public boolean removeAll(Collection<?> c)   {return s.removeAll(c);}
    @Override public boolean retainAll(Collection<?> c)   {return s.retainAll(c);}

    @Override
    public boolean addAll(Collection<? extends E> c) {
        // Use Java 8 Stream 
        return Objects.requireNonNull(c).stream().map((e) -> add(e)).filter((b)->b).count() > 0;
    }


    // Override default methods in Collection
    @Override public void forEach(Consumer<? super E> action) { s.forEach(action);}
    @Override public boolean removeIf(Predicate<? super E> filter) { return s.removeIf(filter);}
    @Override public Spliterator<E> spliterator()     {return s.spliterator();}
    @Override public Stream<E> stream()               {return s.stream();}
    @Override public Stream<E> parallelStream()       {return s.parallelStream();}

    private static final long serialVersionUID = -913526372691027123L;

    private void readObject(java.io.ObjectInputStream stream)
       throws IOException, ClassNotFoundException
    {
        stream.defaultReadObject();
        init();
    }

}
Note the new constructor ConcurrentHashSet(Supplier<ConcurrentMap<E, Object>> concurrentMapSupplier) that allows us to provide any ConcurrentMap as the underlying map at creation time. By providing a Supplier rather than a concrete Map instance, we avoid the double reference problem and the "map must be empty" problem associated with the Collections.newSetFromMap() method. For example, we can create a ConcurrentSet with the keys in their natural order by calling new ConcurrentHashSet(ConcurrentSkipListMap::new).

Worth noticing is also the addAll() method that uses Java 8's stream library to iteratively add new elements to the Set. We the filter out all add() calls that returned true and if there were more than zero such additions, we return true (i.e. there was a modification of the set).

Game, Set and match...