Just yesterday we explored the importance of Scala.js and how
great it is to be on the JavaScript platform,
and today the next version of Java was released.
The perfect opportunity to cover both!
Here are my top 5 improvements in JDK 16, from the Scala perspective.
JEP 395: Records
Java finally gets its own flavour of case classes!
When interfacing with Java code, we can never be sure whether an object's constituent values, and thus its hashCode, will change; this lack of guarantees makes it harder to know whether the final program will simply work as expected.
Thanks to records, it is only a matter of time before Scala libraries are able to perform generic programming on Java records. We get more guarantees but, even more importantly, a wider audience will be able to relate to how Scala works, in particular to immutability.
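To make the comparison with case classes concrete, here is a minimal Java 16 sketch; `RecordDemo`, `Point` and the `withX` helper are hypothetical names chosen for illustration. A record's fields are final and its equals/hashCode are derived from its components, though a component that is itself a mutable object (an array, say) can still change.

```java
public class RecordDemo {

    // One line declares the components; the compiler generates the canonical
    // constructor, the accessors x() and y(), equals, hashCode and toString.
    public record Point(int x, int y) {

        // Records have no generated copy method (unlike Scala's case classes),
        // so this hypothetical helper builds a modified copy by hand.
        public Point withX(int newX) {
            return new Point(newX, y);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));                  // true: equality is by component values
        System.out.println(a.hashCode() == b.hashCode()); // true: equal records share a hashCode
        System.out.println(a);                            // Point[x=1, y=2]
        System.out.println(a.withX(5));                   // Point[x=5, y=2]; a itself is unchanged
    }
}
```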
JEP 338: Vector API (Incubator)
Note: this is entirely different from the Vector data type in Scala.
This one is going to be interesting to the data science folks: many computations performed today are done serially. For example, if you wanted to sum two arrays pair-wise to get another array, you would have to write your code to add the first pair, then the second pair, and so forth. With vector (SIMD) operations at the hardware level, a single CPU instruction adds several pairs at once; the Vector API lets you express such computations directly, and the JIT compiler maps them to the most efficient instructions available on the CPU it runs on.
While Java supports auto-vectorization (the JIT compiler notices that your code could be vectorized and does it for you automatically), it is not guaranteed to kick in. NumPy's vectorized native code is what makes Python so fast for data scientists.
High performance: win.
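For the pair-wise sum described above, a minimal sketch with the incubating API might look like the following. It assumes JDK 16 or later, compiled and run with `--add-modules jdk.incubator.vector`; because the module is still incubating, the API may change between releases. `PairwiseSum` and `add` are hypothetical names.

```java
import java.util.Arrays;

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class PairwiseSum {

    // The widest vector shape for floats that the current CPU supports.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    public static float[] add(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        float[] c = new float[a.length];
        int i = 0;
        // Largest multiple of the lane count that fits within the array.
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            // Load one vector's worth of lanes from each input, add them lane-wise, store.
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(c, i);
        }
        // Scalar tail for the elements left over after the last full vector.
        for (; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float[] b = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(Arrays.toString(add(a, b)));
    }
}
```

The scalar tail loop is needed because the vector width varies by CPU, so the array length is generally not a multiple of it.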
JEP 389: Foreign Linker API (Incubator)
Scala ❤️s interop, especially when it is easy to use! JNI is highly performant but requires you to write C code and compile it; I have done this before for extremely high-performance code and it worked, but it is definitely more involved and error-prone. For simplicity, JNA is the way to go, but its performance can be an order of magnitude worse. There are other libraries, like JNR and JavaCPP; for details on those, see the technical analysis by Vladimir Ivanov from Oracle.
This JEP gives us the best of both worlds and provides an easier path to integrating with native
libraries.
This is incredibly important and useful in many respects: easier integration with quantitative computing/HPC libraries from the C/C++ world for data science, and direct interaction with device drivers, for example in IoT. We must keep in mind that Scala also has Scala Native, which has the advantage of not bringing the whole JDK along at runtime, as well as closer-to-the-metal optimizations. A few years ago I wrote a whole article about interfacing with native libraries like libpcap: "Capturing Packets with Scala Native and libpcap". I would say this benefits the hardware hacking, IoT and data science crowds the most.
JEP 393: Foreign-Memory Access API (Incubator)
Related to the above, this API makes direct memory access an official feature; currently, the most commonly used approach to direct memory access is sun.misc.Unsafe, which is an unofficial API with complications of its own. The reasons to use direct memory access are two-fold: customised off-heap storage, and data transfers at maximum rates between a Java application and the native libraries or apps it talks to. Off-heap storage is particularly important for projects like Lucene and memcached, which know exactly how they want to store data most efficiently and do not want to suffer through garbage collection. Again, this is likely to benefit the data science folks the most, as libraries like Apache Arrow, currently used for PySpark-Spark communication, use sun.misc.Unsafe.
JEP 380: Unix-Domain Socket Channels
This one is more for the server-side folks. UNIX domain sockets are an alternative to TCP for communication between processes on the same machine, and are supported by both Windows and UNIX operating systems. Their two key benefits are security and performance. They are more secure because they are not reachable from the outside world and access to them is controlled by filesystem permissions, so they don't need passwords. They perform better because network routing and filtering are bypassed completely. I am mostly familiar with UNIX sockets through PHP, PostgreSQL and nginx (FastCGI), which use them for client-server communication.
This should be very useful to the http4s and skunk projects, which could both benefit from tighter integration with the systems they are deployed to.
For those not familiar with http4s, it is a library for writing Scala HTTP applications, which can run inside a Servlet container or as standalone servers; an http4s-blaze server could speak directly to nginx via a UNIX socket for higher performance.
skunk is a PostgreSQL client written in pure Scala without JDBC, which could also speak to PostgreSQL locally via a UNIX socket for security and performance.
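On the Java side, JEP 380 is a small addition to the existing NIO channel API. A minimal sketch, assuming JDK 16 or later and an OS with UNIX domain socket support, sends a message from a client to a server over a temporary socket file within a single process; `UnixSocketDemo`, `roundTrip` and the `demo.sock` file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnixSocketDemo {

    // Sends `message` from a client to a server over a UNIX domain socket
    // and returns what the server side received.
    public static String roundTrip(String message) {
        Path dir = null;
        Path socketFile = null;
        try {
            // Hypothetical location: a fresh temp directory holding "demo.sock".
            dir = Files.createTempDirectory("uds-demo");
            socketFile = dir.resolve("demo.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);

            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address); // creates the socket file and starts listening
                // The protocol family is inferred from the UnixDomainSocketAddress.
                try (SocketChannel client = SocketChannel.open(address);
                     SocketChannel accepted = server.accept()) {
                    client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                    client.shutdownOutput(); // signal end-of-stream to the server side

                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    while (buf.hasRemaining() && accepted.read(buf) != -1) {
                        // keep reading until end-of-stream or the buffer is full
                    }
                    buf.flip();
                    return StandardCharsets.UTF_8.decode(buf).toString();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // The socket file is not removed automatically when the channel closes.
            try {
                if (socketFile != null) Files.deleteIfExists(socketFile);
                if (dir != null) Files.deleteIfExists(dir);
            } catch (IOException ignored) {
                // best-effort cleanup
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello over a UNIX socket"));
    }
}
```

Because binding fails if the socket file already exists, a long-running server would typically also remove a stale file before calling bind.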
Summary
Fantastic improvements to Java 16/JDK 16! These are my top 5 out of many, so please check out the rest on openjdk.java.net, covering compatibility, garbage collection, packaging and so forth.
More power to the web, data science, Big Data & IoT!