JDK 16's Top 5 for Scala

Just yesterday we explored the importance of Scala.js and how great it is to be on the JavaScript platform, and today the next version of Java was released. The perfect opportunity to cover both!

Here are my top 5 improvements in JDK 16, from the Scala perspective.

JEP 395: Records

Java finally gets its own flavour of case classes! 🎉

When interfacing with Java code, we can never be sure whether an object's constituent values will change, and with them its hashCode. This lack of guarantees makes it harder to know whether the final program will work as expected. Thanks to record types, it is now only a matter of time before Scala libraries can perform generic programming on Java records. We get more guarantees but, even more importantly, a wider audience will be able to relate to how Scala works, in particular its emphasis on immutability.
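As a rough illustration of what this could look like from the Java side, here is a minimal sketch. The Point record and the toMap helper are hypothetical names chosen for this example; the reflection calls (Class.getRecordComponents and RecordComponent.getAccessor) are the standard ones that ship with records. The Scala counterpart of the declaration would be case class Point(x: Int, y: Int).

```java
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

final class RecordSketch {
    // Hypothetical record: fields are final, and equals/hashCode/toString
    // are derived from the components, much like a Scala case class.
    record Point(int x, int y) {}

    // Generic programming over any record: walk its components reflectively
    // and collect name -> value pairs in declaration order.
    static Map<String, Object> toMap(Record r) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (RecordComponent c : r.getClass().getRecordComponents()) {
            try {
                out.put(c.getName(), c.getAccessor().invoke(r));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));                  // value equality
        System.out.println(a.hashCode() == b.hashCode()); // stable, component-based hash
        System.out.println(toMap(a));
    }
}
```

Because the components cannot be reassigned, the hash of a record built from immutable values stays the same for its whole lifetime, which is the guarantee discussed above.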

JEP 338: Vector API (Incubator)

Note: this is entirely different from the Vector data type in Scala.

This one is going to be interesting to the data science folks: many computations performed today are done serially. For example, if you wanted to sum two arrays pair-wise to get another array, you would have to write your code to do the first pair, then the second pair, and so forth. With vector operations at the hardware level, you only specify that you want to sum these two arrays and nothing else; the CPU then takes over and gives you back a new vector, computed in the way it thinks is the most efficient. While Java supports auto-vectorization (meaning it notices your code could be vectorized, and then makes it so for you automatically), it is not guaranteed to kick in. Vectorization in NumPy's native code is a large part of what makes Python so fast for data scientists. High performance: win ✔️.
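To make the difference concrete, here is a small sketch of the pair-wise sum written both ways. It follows the loop shape shown in the JEP, but the class and method names (VectorSketch, addScalar, addVector) are my own, it assumes both arrays have the same length, and because the module is incubating it needs --add-modules jdk.incubator.vector at compile time and run time. The API may still change in later releases.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

final class VectorSketch {
    // The widest float vector shape the current CPU supports.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Serial version: one pair per loop iteration.
    static float[] addScalar(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    // Explicitly vectorized version: SPECIES.length() pairs per iteration.
    static float[] addVector(float[] a, float[] b) {
        float[] c = new float[a.length];
        int i = 0;
        int upper = SPECIES.loopBound(a.length); // largest multiple of the vector length
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(c, i);
        }
        // Leftover elements that do not fill a whole vector.
        for (; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {10f, 20f, 30f, 40f, 50f};
        System.out.println(java.util.Arrays.toString(addVector(a, b)));
    }
}
```

The point of the explicit version is predictability: the code states that it wants vector instructions instead of hoping the JIT's auto-vectorizer recognizes the loop.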

JEP 389: Foreign Linker API (Incubator)

Scala ❤️s️ interop, especially if it is easy to use! The JNI is highly performant but requires you to write C code and compile it; I have done this before for extremely high-performance code and it worked, but it is definitely more involved and error-prone. For simplicity, JNA is the way to go, but performance can be an order of magnitude worse. There are other libraries like JNR and JavaCPP, but for detail on those, see the technical analysis by Vladimir Ivanov from Oracle. This JEP gives us the best of both worlds and provides an easier path to integrating with native libraries.

This is incredibly important and useful in many respects: from the data science perspective it means easier integration with quantitative computing/HPC libraries in the C/C++ world, and in IoT, for example, it means direct interaction with device drivers. We must keep in mind that Scala also has Scala Native, which has the advantage of not bringing the whole JDK with it at runtime, and also offers closer-to-the-metal optimizations. A few years ago I wrote a whole article about interfacing with native libraries like libpcap: "Capturing Packets with Scala Native and libpcap". I would say this benefits the hardware hacking, IoT and Data Science crowds the most.

JEP 393: Foreign-Memory Access API (Incubator)

Related to the above, this API makes direct memory access an official feature; currently the most commonly used approach to direct memory access is sun.misc.Unsafe, which is an unofficial API and has complications associated with it. The reasons to use direct memory access are two-fold: one is customised off-heap storage, and the other is data transfer, at maximum data rates, between a Java application and the native libraries/apps it may be speaking to. Off-heap storage is particularly important for projects like Lucene and memcached, which know exactly how they want to store data most efficiently and do not want to suffer through garbage collection. This is again likely to benefit the Data Science folks the most, as libraries like Apache Arrow, which is currently used in PySpark-Spark communication, use sun.misc.Unsafe.
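The JEP's own API is still incubating and its shape may change, so the sketch below deliberately does not use it. Instead it shows the off-heap idea with java.nio.ByteBuffer.allocateDirect, the long-standing supported way to hold data outside the garbage-collected heap (limited to 2 GB per buffer, since its indexes are ints). OffHeapSketch and sumOffHeap are hypothetical names for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffHeapSketch {
    // Copies the values into a direct (off-heap) buffer, then reads them
    // back and sums them. The buffer contents are not moved by the GC.
    static long sumOffHeap(long[] values) {
        int bytes = Math.multiplyExact(values.length, Long.BYTES);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < values.length; i++) {
            buf.putLong(i * Long.BYTES, values[i]); // absolute write at a byte offset
        }
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += buf.getLong(i * Long.BYTES);     // absolute read at a byte offset
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(new long[] {1L, 2L, 3L}));
    }
}
```

A direct buffer like this is also what gets handed to native code for fast data transfer, which is the second use case mentioned above.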

JEP 380: Unix-Domain Socket Channels

This one is aimed more at server-side folks. UNIX sockets are an alternative to TCP for communication between local processes and are supported by both Windows and UNIX operating systems. The two key benefits of UNIX sockets are security and performance. They are more secure because they are reachable only from the local machine, not from the network, and access can be controlled with filesystem permissions rather than passwords. They are more performant because network routing and filtering are avoided completely. I am mostly familiar with UNIX sockets because of PHP, PostgreSQL and nginx (FastCGI), which use them for client-server communication.

This should be very useful to the http4s and skunk projects, which could both benefit from tighter integration with the systems they are deployed to. For those not familiar with http4s, it is an HTTP library that lets you write Scala HTTP applications and run them either inside a Servlet container or standalone; the http4s-blaze backend could speak directly to nginx via a UNIX socket for higher performance. skunk is a PostgreSQL client written in pure Scala without JDBC, which could likewise speak to a local PostgreSQL via a UNIX socket for security and performance.
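For a feel of the new API, here is a minimal sketch of two channels talking over a UNIX socket inside one process. The class and method names (UnixSocketSketch, sendLocally) and the socket file name are made up for this example; it stays single-threaded and relies on the kernel buffering a short message until the server accepts, which a real server would not do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class UnixSocketSketch {
    // Sends a short message (up to 1024 bytes) from a client channel to a
    // server channel over a UNIX socket and returns what the server received.
    static String sendLocally(String message) {
        try {
            Path dir = Files.createTempDirectory("uds");
            Path socketFile = dir.resolve("demo.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);
            try (ServerSocketChannel server =
                     ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address); // creates the socket file and starts listening
                try (SocketChannel client =
                         SocketChannel.open(StandardProtocolFamily.UNIX)) {
                    client.connect(address);
                    ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                    while (out.hasRemaining()) {
                        client.write(out);
                    }
                    client.shutdownOutput(); // signals end-of-stream to the server side
                    try (SocketChannel peer = server.accept()) {
                        ByteBuffer in = ByteBuffer.allocate(1024);
                        while (in.hasRemaining() && peer.read(in) != -1) {
                            // keep reading until end-of-stream or the buffer is full
                        }
                        in.flip();
                        return StandardCharsets.UTF_8.decode(in).toString();
                    }
                }
            } finally {
                // The socket file is not removed automatically on close.
                Files.deleteIfExists(socketFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendLocally("hello over a UNIX socket"));
    }
}
```

Note that the address is a filesystem path, not a host and port, so the usual file permissions decide who may connect.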

Summary

Fantastic improvements in Java 16/JDK 16! These are my top 5 out of many, so please check out the rest on openjdk.java.net: compatibility, garbage collection, packaging and so forth.

More power to the web, data science, Big Data & IoT!

🔗JEP 395: Records

🔗JEP 338: Vector API (Incubator)

🔗JEP 389: Foreign Linker API (Incubator)

🔗JEP 393: Foreign-Memory Access API (Incubator)

🔗JEP 380: Unix-Domain Socket Channels