Minborg

Minborg
Minborg

Wednesday, May 23, 2018

Making Pivot Tables with Java Streams from Databases

Making Pivot Tables with Java Streams from Databases

Raw data from database rows and tables does not provide so much insight to human readers. Instead, humans are much more likely to see data patterns if we perform some kind of aggregation on the data
before it is being presented to us. A pivot table is a specific form of aggregation where we can apply operations like sorting, averaging, or summing, and also often grouping of columns values.

In this article, I will show how you can compute pivot tables of data from a database in pure Java without writing a single line of SQL. You can easily reuse and modify the examples in this article to fit your own specific needs.

In the examples below, I have used open-source Speedment, which is a Java Stream ORM, and the open-source Sakila film database content for MySQL. Speedment works for any major relational database type such as MySQL, PostgreSQL, Oracle, MariaDB, Microsoft SQL Server, DB2, AS400 and more.

Pivoting

I will construct a Map of Actor objects and, for each Actor, a corresponding List of film ratings of films that a particular Actor has appeared in. Here is an example of how a pivot entry for a specific Actor might look like expressed verbally:

“John Doe participated in 9 films that were rated ‘PG-13’ and 4 films that were rated ‘R’”.

We are going to compute pivot values for all actors in the database. The Sakila database has three tables of interest for this particular application:

1) “film” containing all the films and how the films are rated (e.g. “PG-13”, “R”, etc.).
2) “actors” containing (made up) actors (e.g. “MICHAEL BOLGER”, “LAURA BRODY”, etc.).
3) “film_actor” which links films and actors together in a many-to-many relation.

The first part of the solution involves joining these three tables together. Joins are created using Speedment’s JoinComponent which can be obtained like this:

// Visit https://github.com/speedment/speedment
// to see how a Speedment app is created. It is easy!
Speedment app = …;

JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);

Once we have the JoinComponent, we can start defining Join relations that we need to compute our pivot table:
Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
        .from(FilmActorManager.IDENTIFIER)
        .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
        .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
        .build(Tuples::of);
The build() takes a method reference Tuples::of that will resolve to a constructor that takes three entities of type; FilmActor, Film and Actor and that will create a compound immutable Tuple3 comprising those specific entities. Tuples are built into Speedment.

Armed with our Join object we now can create our pivot Map using a standard Java Stream obtained from the Join object:

Map<Actor, Map<String, Long>> pivot = join.stream()
    .collect(
        groupingBy(
            // Applies Actor as a first classifier
            Tuple3::get2,
            groupingBy(
                // Applies rating as second level classifier
                tu -> tu.get1().getRating().get(),
                counting() // Counts the elements 
                )
            )
        );

Now that the pivot Map has been computed, we can print its content like this:
// pivot keys: Actor, values: Map<String, Long>
pivot.forEach((k, v) -> { 
    System.out.format(
        "%22s  %5s %n",
        k.getFirstName() + " " + k.getLastName(),
        V
    );
});
This will produce the following output:
        MICHAEL BOLGER  {PG-13=9, R=3, NC-17=6, PG=4, G=8} 
           LAURA BRODY  {PG-13=8, R=3, NC-17=6, PG=6, G=3} 
     CAMERON ZELLWEGER  {PG-13=8, R=2, NC-17=3, PG=15, G=5}
...


Mission completed! In the code above, the method Tuple3::get2 will retrieve the third element from the tuple (an Actor) whereas the method tu.get1() will retrieve the second element from the tuple (a Film).

Speedment will render SQL code automatically from Java and convert the result to a Java Stream. If we enable Stream logging, we can see exactly how the SQL was rendered:
SELECT 
    A.`actor_id`,A.`film_id`,A.`last_update`, 
    B.`film_id`,B.`title`,B.`description`,
    B.`release_year`,B.`language_id`,B.`original_language_id`,
    B.`rental_duration`,B.`rental_rate`,B.`length`,
    B.`replacement_cost`,B.`rating`,B.`special_features`,
    B.`last_update`, C.`actor_id`,C.`first_name`,
    C.`last_name`,C.`last_update`
FROM 
    `sakila`.`film_actor` AS A
INNER JOIN 
    `sakila`.`film` AS B ON (B.`film_id` = A.`film_id`) 
INNER JOIN 
    `sakila`.`actor` AS C ON (C.`actor_id` = A.`actor_id`)

Joins with Custom Tuples

As we noticed in the example above, we have no actual use of the FilmActor object in the Stream since it is only used to link Film and Actor entities together during the Join phase. Also, the generic Tuple3 had general get0(), get1() and get2() methods that did not say anything about what they contained.

All this can be fixed by defining our own custom “tuple” called ActorRating like this:

private static class ActorRating {
    private final Actor actor;
    private final String rating;

    public ActorRating(FilmActor fa, Film film, Actor actor) {
        // fa is not used. See below why
        this.actor = actor;
        this.rating = film.getRating().get();
    }

    public Actor actor() {
        return actor;
    }

    public String rating() {
        return rating;
    }

}


When Join objects are built using the build() method, we can provide a custom constructor that we want to apply on the incoming entities from the database. This is a feature that we are going use as depicted below:
Join<ActorRating> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build(ActorRating::new); // Use a custom constructor

Map<Actor, Map<String, Long>> pivot = join.stream()
    .collect(
        groupingBy(
            ActorRating::actor,
            groupingBy(
                ActorRating::rating,
                counting()
            )
         )
    );
In this example, we proved a class with a constructor (the method reference ActorRating:new gets resolved to new ActorRating(fa, actor, film)) that just discards the linking FilmActor object altogether. The class also provided better names for its properties which made the code more readable. The solution with the custom ActorRating class will produce exactly the same output result as the first example but it looks much nicer when used. I think the effort of writing a custom tuple is worth the extra effort over using generic Tuples in most cases.

Using Parallel Pivoting

One cool thing with Speedment is that it supports the Stream method parallel() out-of-the-box. So, if you have a server with many CPUs, you can take advantage of all those CPU cores when running database queries and joins. This is how parallel pivoting would look like:

Map<Actor, Map<String, Long>> pivot = join.stream()
    .parallel()  // Make our Stream parallel
    .collect(
        groupingBy(
            ActorRating::actor,
            groupingBy(
                ActorRating::rating,
                counting()
            )
         )
    );
We only have to add a single line of code to get parallel aggregation. The default parallel split strategy kicks in when we reach 1024 elements. Thus, parallel pivoting will only take place on tables or joins larger than this. It should be noted that the Sakila database only contains 1000 films, so we would have to run the code on a bigger database to actually be able to benefit from parallelism.

Take it for a Spin!

In this article, we have shown how you can compute pivot data from a database in Java without writing a single line of SQL code. Visit Speedment open-source on GitHub to learn more.

Read more about other features in the the User's Guide.

Wednesday, May 16, 2018

Java Stream ORM Now with JOINs

Java Stream ORM Now with JOINs

Speedment is a Java Stream ORM Toolkit and Runtime that allows you to view database tables as standard Java Streams. Because you do not have to mix Java and SQL, the application becomes much more compact making it faster to develop, less prone to errors and easier to maintain. Streams are also strictly type-safe and lazily constructed so that only a minimum amount of data is pulled in from the database as elements are consumed by the streams.

The new version Speedment 3.1.1 “Homer” now also supports dynamically joined tables to be viewed as standard Java Streams. This is a big deal when developing Java applications that explore relations between database tables.

In the examples below, I have used the open-source Sakila film database content for MySQL that you can download here. Speedment works for any major relational database type such as Oracle, MySQL, Microsoft SQL Server, PostgreSQL, DB2, MariaDB, AS400 and more.

Streaming over a Single Table

The following code snippet will create a List of all Film objects that has a Film.RATING of “PG-13” and where the List is sorted in Film.LENGTH order:
List<Film> list = films.stream()
    .filter(Film.RATING.equal("PG-13"))
    .sorted(Film.LENGTH)
    .collect(toList());

The stream will be automatically rendered to a SQL query under the hood. If we enable Stream logging, we will see the following (prepared statement “?”-variables given as values in the end):
SELECT 
    `film_id`,`title`,`description`,`release_year`,
    `language_id`,`original_language_id`,
    `rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,
    `last_update` 
FROM 
    `sakila`.`film` 
WHERE 
    (`rating`  = ? COLLATE utf8_bin) 
ORDER BY
    `length` ASC

values:[PG-13]

Thus, the advantage is that you can express your database queries using type-safe Java and then consume the result by means of standard Java Streams. You do not have to write any SQL code.

Joining Several Tables

Appart from the table “film”, the Sakila database also contains other tables. One of these is a table called “language”. Each Film entity has a foreign key to the Language being spoken in the film using a column named “language_id”.

In this example, I will show how we can create a standard Java Stream that represents a join of these two tables. This way, we can get a Java Stream of matching pairs of Film/Language entities.

Join objects are created using the JoinComponent which can be obtained like this:
// Visit https://github.com/speedment/speedment
// to see how a Speedment app is created. It is easy!
Speedment app = …;

JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);

Once we have grabbed the JoinComponent, we can start creating Join objects like this:
Join<Tuple2<Film, Language>> join = joinComponent
    .from(FilmManager.IDENTIFIER)
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);

Now that we have defined our Join object we can create the actual Java Stream:
join.stream()
    .map(t2 -> String.format(
        "The film '%s' is in %s",
            t2.get0().getTitle(), // get0() -> Film
            t2.get1().getName()   // get1() -> Language
    ))
    .forEach(System.out::println);

This will produce the following output:
The film 'ACADEMY DINOSAUR' is in English
The film 'ACE GOLDFINGER' is in English
The film 'ADAPTATION HOLES' is in English
...

In the code above, the method t2.get0() will retrieve the first element from the tuple (a Film) whereas the method t2.get1() will retrieve the second element from the tuple (a Language). Default generic tuples are built into Speedment and thus Tuple2 is not a Guava class. Speedment does not depend on any other library. Below you will see how you can use any class constructor for the joined tables. Again, Speedment will render SQL code automatically from Java and convert the result to a Java Stream. If we enable Stream logging, we can see exactly how the SQL code was rendered:
SELECT
    A.`film_id`,A.`title`,A.`description`,
    A.`release_year`,A.`language_id`,A.`original_language_id`,
    A.`rental_duration`,A.`rental_rate`,A.`length`,
    A.`replacement_cost`,A.`rating`,A.`special_features`,
    A.`last_update`,
    B.`language_id`,B.`name`,B.`last_update` 
FROM 
    `sakila`.`film` AS A
INNER JOIN 
    `sakila`.`language` AS B 
ON
    (B.`language_id` = A.`language_id`)

Interestingly, the Join object can be created once and be re-used over and over again to create new Streams.

Many-to-Many Relations

The Sakila database also defines a handful of Many-to-Many relations. For example, the table “film_actor” contains rows links films to actors. Each film can have multiple actors and each actor might have appeared in multiple films. Every row in the table links a particular Film to a specific Actor. For example, If a Film depicts 12 Actor entities, then FilmActor contains 12 entries all having the same film_id but different actor_ids. The purpose of this example is to create a complete list of all films and the appearing actors in a Java Stream. This is how we can join the three tables together:
Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build(Tuples::of);

    join.stream()
        .forEach(System.out::println);


The code above will produce the following output (formatted for readability):
...
Tuple3Impl {
    FilmActorImpl { actorId = 137, filmId = 249, lastUpdate = 2006-02-15 05:05:03.0 },
    FilmImpl { filmId = 249, title = DRACULA CRYSTAL, description =...,
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,...}
}

Tuple3Impl {
    FilmActorImpl { actorId = 137, filmId = 254, lastUpdate = 2006-02-15 05:05:03.0 },
    FilmImpl { filmId = 254, title = DRIVER ANNIE, description = ...,
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS, ...}
}

Tuple3Impl {
    FilmActorImpl { actorId = 137, filmId = 263, lastUpdate = 2006-02-15 05:05:03.0 },
    FilmImpl { filmId = 263, title = DURHAM PANKY, description = ... },
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,... }
}
...

Joins with Custom Tuples

As we noticed in the example above, we have no actual use of the FilmActor object in the Stream since it is only used to link Film and Actor objects together during the Join phase.

When Join objects are built using the build() method, we can provide a custom constructor that we want to apply on the incoming entities from the database. The constructor can be of any type so you can write your own Java objects that holds, for example, Film and Actor or any of the columns they contain and that are of interest.

In this example, I proved a (lambda) constructor that just discards the linking FilmActor objects altogether:
Join<Tuple2<Film, Actor>> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build((fa, f, a) -> Tuples.of(f, a));

    join.stream()
        .forEach(System.out::println);

The code above will produce the following output (formatted for readability):
...
Tuple2Impl {
    FilmImpl { filmId = 249, title = DRACULA CRYSTAL, description = ... },
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS, ...}
}
Tuple2Impl {
    FilmImpl { filmId = 254, title = DRIVER ANNIE, description = A... }, 
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,...}
}
Tuple2Impl {
    FilmImpl { filmId = 263, title = DURHAM PANKY, description = ... }, 
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,...}
}
...
Thus, we only get matching pairs of Film and Actor entities where there is an appearance of an actor in a film. The linking object FilmActor is never seen in the Stream.

Take it for a Spin!

Over the course of this article, you have learned how to stream over one or several database tables using Speedment.

Visit Speedment open-source on GitHub and try it out!

Read all about the new JOIN functionality in the User's Guide.