Saturday, March 25, 2017

Using Groovy to Quickly Analyze Terracotta HealthCheck Properties

One of the considerations when configuring Terracotta servers with tc-config.xml is the specification of health check properties between Terracotta servers (L2-L2), from clients to server (L1-L2), and from server to client (L2-L1). Terracotta checks the combination of these properties' configurations in high-availability scenarios to ensure that these combinations fall in certain ranges. This blog post demonstrates using Groovy to parse and analyze a given tc-config.xml file to determine whether Terracotta will provide a WARN-level message regarding these properties' configurations.

The "About HealthChecker" section of the Terracotta 4.3.2 BigMemory Max High-Availability Guide (PDF) describes the purpose of the HealthChecker: "HealthChecker is a connection monitor similar to TCP keep-alive. HealthChecker functions between Terracotta server instances (in High Availability environments), and between Terracotta server instances and clients. Using HealthChecker, Terracotta nodes can determine if peer nodes are reachable, up, or in a GC operation. If a peer node is unreachable or down, a Terracotta node using HealthChecker can take corrective action."

The Terracotta 4.3.2 BigMemory Max High-Availability Guide includes a table under the section HealthChecker Properties that lists the Terracotta properties used in the calculations that determine whether warnings about misconfigured high availability should be logged. Similarly named properties exist for each of the combinations (l2.healthcheck.l1.* for server-to-clients [L2L1], l2.healthcheck.l2.* for server-to-server [L2L2], and l1.healthcheck.l2.* for clients-to-server [L1L2]). The properties significant to the high availability configuration checks (the * portion of the property names just referenced) are ping.enabled, ping.idletime, ping.interval, ping.probes, socketConnect, socketConnectCount, and socketConnectTimeout. This post's associated Groovy script assumes that one has the ping.enabled and socketConnect properties for L2-L2, L1-L2, and L2-L1 all configured to true (which is the default for both properties for all three combinations). Note that the script treats a property that is missing from the file as false, so these properties must appear explicitly in the <tc-properties> section for the script to proceed.

The Terracotta class com.tc.l2.ha.HASettingsChecker detects two combinations of these properties that lead to WARN-level log messages starting with the phrase, "High Availability Not Configured Properly: ...". The two warning messages specifically state, "High Availability Not Configured Properly: L1L2HealthCheck should be less than L2-L2HealthCheck + ElectionTime + ClientReconnectWindow" and "High Availability Not Configured Properly: L1L2HealthCheck should be more than L2-L2HealthCheck + ElectionTime".
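The two warnings amount to a range check on the L1-L2 health check time. The following Java sketch illustrates that check using the same strict comparisons as this post's Groovy script; the class, enum, and method names are hypothetical and are not part of Terracotta, and Terracotta's own handling of the exact boundary values may differ.

```java
/** Hypothetical illustration of the range check; not Terracotta's HASettingsChecker. */
final class HaRangeCheck {
    private HaRangeCheck() {}

    /** Outcome of comparing the L1-L2 health check time against the allowed range. */
    enum Result { TOO_LOW, IN_RANGE, TOO_HIGH }

    /**
     * All arguments are assumed to be in the same time unit.
     *
     * @param l1l2 calculated L1-L2 health check time
     * @param l2l2 calculated L2-L2 health check time
     */
    static Result check(long l1l2, long l2l2, long electionTime, long clientReconnectWindow) {
        long lowerBound = l2l2 + electionTime;                  // L2-L2HealthCheck + ElectionTime
        long upperBound = lowerBound + clientReconnectWindow;   // ... + ClientReconnectWindow
        if (l1l2 < lowerBound) {
            return Result.TOO_LOW;   // "should be more than" warning
        }
        if (l1l2 > upperBound) {
            return Result.TOO_HIGH;  // "should be less than" warning
        }
        return Result.IN_RANGE;
    }
}
```

With an assumed L2-L2 time of 85000, election time of 5000, and client reconnect window of 120000, an L1-L2 time must fall between 90000 and 210000 to avoid both warnings.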

The Terracotta class HASettingsChecker implements the formula outlined in the "Calculating HealthChecker Maximum" section of the High Availability Guide in its method interNodeHealthCheckTime(int,int,int,int,int):

    pingIdleTime + ((socketConnectCount) * (pingInterval * pingProbes + socketConnectTimeout * pingInterval))
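As a quick sanity check of this formula, the following Java sketch evaluates it for one set of values. The class and method names are hypothetical (this is not Terracotta's own code), and the sample values are assumptions chosen only to make the arithmetic easy to follow; substitute the values from your own tc-config.xml.

```java
/** Hypothetical helper that mirrors the formula above; not Terracotta's own class. */
final class HealthCheckMaximum {
    private HealthCheckMaximum() {}

    /**
     * pingIdleTime + socketConnectCount *
     *    (pingInterval * pingProbes + socketConnectTimeout * pingInterval)
     */
    static long maximumTime(long pingIdleTime, long pingInterval, long pingProbes,
                            long socketConnectCount, long socketConnectTimeout) {
        return pingIdleTime
             + socketConnectCount * (pingInterval * pingProbes + socketConnectTimeout * pingInterval);
    }

    public static void main(String[] args) {
        // Illustrative values only (assumed for this example):
        // idletime=5000, interval=1000, probes=3, socketConnectCount=10, socketConnectTimeout=5
        long maximum = maximumTime(5000, 1000, 3, 10, 5);
        System.out.println("Maximum time: " + maximum);
    }
}
```

For these values the arithmetic is 5000 + 10 * (1000 * 3 + 5 * 1000) = 5000 + 10 * 8000 = 85000, expressed in the same unit as ping.idletime and ping.interval.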

The following Groovy script parses an indicated tc-config.xml file and applies the same health check properties check to the relevant properties defined in that file's <tc-properties> section. The Groovy script shown here has no external dependencies other than a valid tc-config.xml file to be parsed and analyzed. The script would be shorter and require less future maintenance if it accessed the String constants defined in com.tc.properties.TCPropertiesConsts instead of defining its own hard-coded versions of these.

checkTCServerProperties.groovy

#!/usr/bin/env groovy

// Define the command-line options accepted by this script.
def cli = new CliBuilder(
   usage: 'checkTCServerProperties -f <pathToTcConfigXmlFile> [-v] [-h]',
   header: '\nAvailable options (use -h for help):\n',
   footer: '\nParses referenced tc-config.xml file and analyzes its health check parameters.\n')
cli.with
{
   h(longOpt: 'help', 'Usage Information', required: false)
   f(longOpt: 'file', 'Path to tc-config.xml File', args: 1, required: true)
   v(longOpt: 'verbose', 'Specifies verbose output', args: 0, required: false)
}
def opt = cli.parse(args)

if (!opt) return
if (opt.h) cli.usage()

String tcConfigFileName = opt.f
boolean verbose = opt.v

println "Checking ${tcConfigFileName}'s properties..."
def tcConfigXml = new XmlSlurper().parse(tcConfigFileName)
TreeMap<String, String> properties = new TreeMap<>()
tcConfigXml."tc-properties".property.each
{ tcProperty ->
   String tcPropertyName = tcProperty.@name
   String tcPropertyValue = tcProperty.@value
   properties.put(tcPropertyName, tcPropertyValue)
}
if (verbose)
{
   properties.each
   { propertyName, propertyValue ->
      println "${propertyName}: ${propertyValue}"
   }
}

boolean isL2L1PingEnabled = extractBoolean(properties, "l2.healthcheck.l1.ping.enabled")
boolean isL2L2PingEnabled = extractBoolean(properties, "l2.healthcheck.l2.ping.enabled")
boolean isL1L2PingEnabled = extractBoolean(properties, "l1.healthcheck.l2.ping.enabled")
boolean isPingEnabled = isL2L1PingEnabled && isL2L2PingEnabled && isL1L2PingEnabled
println "Health Check Ping ${isPingEnabled ? 'IS' : 'is NOT'} enabled."
if (!isPingEnabled)
{
   System.exit(-1)
}

Long pingIdleTimeL2L1 = extractLong(properties, "l2.healthcheck.l1.ping.idletime")
Long pingIdleTimeL2L2 = extractLong(properties, "l2.healthcheck.l2.ping.idletime")
Long pingIdleTimeL1L2 = extractLong(properties, "l1.healthcheck.l2.ping.idletime")

Long pingIntervalL2L1 = extractLong(properties, "l2.healthcheck.l1.ping.interval")
Long pingIntervalL2L2 = extractLong(properties, "l2.healthcheck.l2.ping.interval")
Long pingIntervalL1L2 = extractLong(properties, "l1.healthcheck.l2.ping.interval")

Long pingProbesL2L1 = extractLong(properties, "l2.healthcheck.l1.ping.probes")
Long pingProbesL2L2 = extractLong(properties, "l2.healthcheck.l2.ping.probes")
Long pingProbesL1L2 = extractLong(properties, "l1.healthcheck.l2.ping.probes")

boolean socketConnectL2L1 = extractBoolean(properties, "l2.healthcheck.l1.socketConnect")
boolean socketConnectL2L2 = extractBoolean(properties, "l2.healthcheck.l2.socketConnect")
boolean socketConnectL1L2 = extractBoolean(properties, "l1.healthcheck.l2.socketConnect")

if (!socketConnectL2L1 || !socketConnectL2L2 || !socketConnectL1L2)
{
   println "Socket connect is disabled."
   System.exit(-2)
}

Long socketConnectTimeoutL2L1 = extractLong(properties, "l2.healthcheck.l1.socketConnectTimeout")
Long socketConnectTimeoutL2L2 = extractLong(properties, "l2.healthcheck.l2.socketConnectTimeout")
Long socketConnectTimeoutL1L2 = extractLong(properties, "l1.healthcheck.l2.socketConnectTimeout")

Long socketConnectCountL2L1 = extractLong(properties, "l2.healthcheck.l1.socketConnectCount")
Long socketConnectCountL2L2 = extractLong(properties, "l2.healthcheck.l2.socketConnectCount")
Long socketConnectCountL1L2 = extractLong(properties, "l1.healthcheck.l2.socketConnectCount")

Long maximumL2L1 = calculateMaximumTime(pingIdleTimeL2L1, pingIntervalL2L1, pingProbesL2L1, socketConnectCountL2L1, socketConnectTimeoutL2L1)
Long maximumL2L2 = calculateMaximumTime(pingIdleTimeL2L2, pingIntervalL2L2, pingProbesL2L2, socketConnectCountL2L2, socketConnectTimeoutL2L2)
Long maximumL1L2 = calculateMaximumTime(pingIdleTimeL1L2, pingIntervalL1L2, pingProbesL1L2, socketConnectCountL1L2, socketConnectTimeoutL1L2)

if (verbose)
{
   println "L2-L1 Maximum Time: ${maximumL2L1}"
   println "L2-L2 Maximum Time: ${maximumL2L2}"
   println "L1-L2 Maximum Time: ${maximumL1L2}"
}

long electionTime = 5000
long clientReconnectWindow = 120000

long maximumL2L2Election = maximumL2L2 + electionTime
long maximumL2L2ElectionReconnect = maximumL2L2Election + clientReconnectWindow

if (verbose)
{
   println "L2-L2 Maximum Time + ElectionTime: ${maximumL2L2Election}"
   println "L2-L2 Maximum Time + ElectionTime + Client Reconnect Window: ${maximumL2L2ElectionReconnect}"   
}

if (maximumL1L2 < maximumL2L2Election)
{
   print "WARNING: Will lead to 'High Availability Not Configured Properly: L1L2HealthCheck should be more than L2-L2HealthCheck + ElectionTime' "
   println "because ${maximumL1L2} < ${maximumL2L2Election}."
}
else if (maximumL1L2 > maximumL2L2ElectionReconnect)
{
   print "WARNING: Will lead to 'High Availability Not Configured Properly: L1L2HealthCheck should be less than L2-L2HealthCheck + ElectionTime + ClientReconnectWindow' "
   println "because ${maximumL1L2} > ${maximumL2L2ElectionReconnect}."
}

/**
 * Extract a Boolean value for the provided property name from the provided
 * properties.
 *
 * @return Boolean value associated with the provided property name.
 */
boolean extractBoolean(TreeMap<String, String> properties, String propertyName)
{
   return  properties != null && properties.containsKey(propertyName)
         ? Boolean.valueOf(properties.get(propertyName))
         : false
}

/**
 * Extract a Long value for the provided property name from the provided
 * properties.
 *
 * @return Long value associated with the provided property name.
 */
Long extractLong(TreeMap<String, String> properties, String propertyName)
{
   return  properties != null && properties.containsKey(propertyName)
         ? Long.valueOf(properties.get(propertyName))
         : 0
}

/**
 * Provides the maximum time as calculated using the following formula:
 *
 * Maximum Time =
 *      (ping.idletime) + socketConnectCount *
 *      [(ping.interval * ping.probes) + (socketConnectTimeout * ping.interval)]
 */
Long calculateMaximumTime(Long pingIdleTime, Long pingInterval, Long pingProbes,
   Long socketConnectCount, Long socketConnectTimeout)
{
   return pingIdleTime + socketConnectCount * pingInterval * (pingProbes + socketConnectTimeout)
}

This script will also be available on GitHub. At some point, I may address some of its weaknesses and limitations in that GitHub version. Specifically, as shown above, this script currently assumes the default values for "election time" and "client reconnect window", but these could be parsed from the tc-config.xml file.

The following screen snapshots demonstrate this script in action against various tc-config.xml files. The first image depicts the script's behavior when ping is not enabled. The second image depicts the script's behavior when socket checking is not enabled. The third and fourth images depict the two warnings one might encounter when properties for high availability configuration are not configured properly. The fifth image depicts a fully successful execution of the script that indicates a configuration of health check properties that are in the expected ranges.

Ping Not Enabled (not default)

Socket Not Enabled (not default)

HealthCheck Properties Warning #1

HealthCheck Properties Warning #2

HealthCheck Properties Enabled and Configured Properly

I have used a simple spreadsheet to perform these calculations and that works fairly well. However, the Groovy script discussed in this post parses a candidate tc-config.xml file automatically, so there is no need to copy and paste values into the spreadsheet. The Groovy script could be adapted to use Terracotta-provided Java classes (such as the constants in com.tc.properties.TCPropertiesConsts), as discussed earlier. Several other enhancements could also make the script more useful, such as parsing the client reconnect window and election time from the tc-config.xml file rather than assuming the default values.
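For anyone who would rather do the parsing step in plain Java, the following sketch reads the same name/value pairs with the JDK's built-in DOM parser. It assumes the layout that the Groovy script relies on (property elements with name and value attributes inside a tc-properties element); the class and method names are hypothetical, and the XML is passed as a String only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the tc-properties section of a tc-config.xml document. */
final class TcPropertiesReader {
    private TcPropertiesReader() {}

    /** Returns the property names and values, sorted by name. */
    static Map<String, String> read(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> properties = new TreeMap<>();
            NodeList sections = document.getElementsByTagName("tc-properties");
            for (int i = 0; i < sections.getLength(); i++) {
                // Collect every <property name="..." value="..."/> in this section.
                NodeList entries = ((Element) sections.item(i)).getElementsByTagName("property");
                for (int j = 0; j < entries.getLength(); j++) {
                    Element entry = (Element) entries.item(j);
                    properties.put(entry.getAttribute("name"), entry.getAttribute("value"));
                }
            }
            return properties;
        } catch (Exception e) {
            // Wrap parser and I/O failures so that callers need no checked exception handling.
            throw new IllegalArgumentException("Unable to parse the supplied XML", e);
        }
    }
}
```

The returned map can then be fed to the same calculations that the Groovy script performs.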

Tuesday, March 21, 2017

Project Amber: Smaller, Productivity-Oriented Java Language Features

Brian Goetz's recent message Welcome to Amber! introduces Project Amber (part of OpenJDK and proposed originally in January). Goetz opens the message with the introduction, "Welcome to Project Amber, our incubation ground for selected productivity-oriented Java language JEPs." Goetz reiterates that Project Amber is not for discussing ideas for arbitrary potential new language features, but rather is for collecting new language features for which a JDK Enhancement Proposal (JEP) already exists ("let's keep the focus on the specific features that have been adopted").

Three JEPs are already associated with Project Amber: JEP 286 ("Local-Variable Type Inference"), JEP 301 ("Enhanced Enums"), and JEP 302 ("Lambda Leftovers"). Goetz also writes that "the 'data classes' and 'pattern matching' features, already discussed publicly are intended to be adopted by Amber when we're ready to propose JEPs on them."

Work on Project Amber will proceed on the Amber repository that is "based on the jdk10 repo."

I was enthusiastic about the announcement of Project Coin with JDK 7 and have really enjoyed using its features. I feel a similar excitement about Project Amber and look forward to using its features on a regular basis. Nicolai Parlog has written that Project Amber Will Revolutionize Java.

Tuesday, March 14, 2017

Deprecating Java's Finalizer

JDK-8165641 ("Deprecate Object.finalize") has been opened to "deprecate Object.finalize()" because "finalizers are inherently problematic and their use can lead to performance issues, deadlocks, hangs, and other problematic behavior" and because "the timing of finalization is unpredictable with no guarantee that a finalizer will be called." I recently experienced and wrote about some of these nasty consequences of using Object.finalize() in the post Java's Finalizer is Still There.

In the message RFR 9: 8165641 : Deprecate Object.finalize, Roger Riggs invites review and comment on the changes associated with this issue [150 new lines that include the addition of @Deprecated to java.lang.Object.finalize() and numerous additions of @SuppressWarnings("deprecation") annotations on current JDK classes' implementations of Object.finalize() methods].

The proposed addition of Javadoc @deprecated-associated text for the Object.finalize() method restates descriptive information included in JDK-8165641 and in Roger Riggs's message. This includes the recommendations to "implement java.lang.AutoCloseable if appropriate" for "classes whose instances hold non-heap resources" and to "provide a method to enable explicit release of those resources." The descriptive information also states, "The {@link java.lang.ref.Cleaner} and {@link java.lang.ref.PhantomReference} provide more flexible and efficient ways to release resources when an object becomes unreachable." See JDK-8138696 for more background on JDK 9-introduced java.lang.ref.Cleaner. The deprecation of Object.finalize() includes the enhanced @Deprecated annotation to state since when the method has been deprecated [@Deprecated(since="9")].
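To make those recommendations concrete, here is a minimal sketch (requiring Java 9 or later) of a class that implements AutoCloseable for explicit release and registers a java.lang.ref.Cleaner action as a fallback. The class name NativeBuffer and the counter that stands in for freeing a non-heap resource are hypothetical and exist only for illustration.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical resource holder: explicit close() plus a Cleaner as a safety net. */
final class NativeBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /**
     * The cleanup action must not hold a reference to the NativeBuffer itself;
     * otherwise the buffer could never become unreachable.
     */
    private static final class Release implements Runnable {
        private final AtomicInteger releaseCount;

        Release(AtomicInteger releaseCount) {
            this.releaseCount = releaseCount;
        }

        @Override
        public void run() {
            // Stand-in for releasing a non-heap resource such as native memory.
            releaseCount.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeBuffer(AtomicInteger releaseCount) {
        this.cleanable = CLEANER.register(this, new Release(releaseCount));
    }

    /** Explicit release; the cleaning action runs at most once, even if close() is called repeatedly. */
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

Used in a try-with-resources statement, the resource is released deterministically when the block exits. The Cleaner runs the action only if a caller forgets to close the object and it later becomes unreachable, and (like finalization) the timing of that fallback is not predictable.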

Although the proposed deprecation of Object.finalize() won't remove the ability to use the Java finalizer or reduce any of its potential negative consequences, it will at least provide an even more obvious warning about the risks of using that approach and, as currently documented, will point to better alternatives to consider.

Thursday, March 2, 2017

Java's Finalizer is Still There

When I was first learning Java and transitioning from C++ to Java, I remember being told repeatedly and frequently reading that one should not treat the Java finalizer like a C++ destructor and should not count on it. The frequency and insistent nature of this advice had such an effect on me that I cannot recall ever having written a finalize() method in all the years I've written, read, reviewed, maintained, modified, and debugged Java code. Until recently, however, the effects of finalize() were not something I thought much about, probably because I have not used finalize(). A recent experience with finalize() has moved the effects of Java finalizers from an "academic exercise" to a real issue "in the wild."

The method-level Javadoc document comment for Object.finalize() provides some interesting details on the Java finalizer. It begins by providing an overall description of the method, "Called by the garbage collector on an object when garbage collection determines that there are no more references to the object. A subclass overrides the finalize method to dispose of system resources or to perform other cleanup." Another portion of this Javadoc comment warns of a couple issues commonly associated with use of Java finalizers: "The Java programming language does not guarantee which thread will invoke the finalize method for any given object. It is guaranteed, however, that the thread that invokes finalize will not be holding any user-visible synchronization locks when finalize is invoked. If an uncaught exception is thrown by the finalize method, the exception is ignored and finalization of that object terminates."

Josh Bloch devotes an item in Effective Java to the subject of Java finalizers. Item 7 of Effective Java's Second Edition is titled simply and concisely, "Avoid finalizers." Although many of the items in Effective Java use verbs such as "Prefer" or "Consider," this item uses the stronger verb "Avoid." Bloch does outline some examples where finalizers might be used, but his description of the inherent issues that remain and the many things to consider to mitigate those issues persuade most of us to avoid them as much as possible.

Bloch starts the Effective Java item "Avoid finalizers" with the emphasized (in bold) statement, "Finalizers are unpredictable, often dangerous, and generally unnecessary." Bloch emphasizes that developers should "never do anything time-critical in a finalizer" because "there is no guarantee [Java finalizers will] be executed promptly" and he emphasizes that developers should "never depend on a finalizer to update critical persistent state" because there is "no guarantee that [Java finalizers will] get executed at all." Bloch notes that exceptions thrown in finalizers are not caught and warns of the danger of this because "uncaught exceptions can leave objects in a corrupt state."

The negative effect of Java finalizers that I had recent experience with is also described by Bloch. His "Avoid finalizers" item emphasizes (in bold), "there is a severe performance penalty for using finalizers" because it takes considerably longer "to create and destroy objects with finalizers." In our case, we were using a third-party library that internally used finalize() methods to deallocate native memory (C/C++ through JNI). Because there were very many instances of these classes with finalize() methods, it appears that the system thread that handles Java finalization was falling behind and was locking on objects it was finalizing.

Garbage collection was also impacted adversely, with the collector kicking off more frequently than we'd normally see. We realized quickly that the garbage collection logs were indicating garbage collection issues that were not easily traceable to typical heap size issues or memory leaks of our own classes. Running the highly useful jcmd against the JVM process with jcmd <pid> GC.class_histogram helped us to see the underlying culprit quickly. That class histogram showed enough instances of java.lang.ref.Finalizer to warrant it being listed third from the top. Because that class is typically quite a bit further down the class histogram, I don't typically see it or think about it. When we realized that three more of the top eight entries in the class histogram were classes from the third-party library and that they implemented finalize() methods, we were able to explain the behavior and lay blame on the finalizers (four of the top eight classes in the histogram made it a pretty safe accusation).

The Java Language Specification provides several details related to Java finalizers in Section 12.6 ("Finalization of Class Instances"). The section begins by describing Java finalizers: "The particular definition of finalize() that can be invoked for an object is called the finalizer of that object. Before the storage for an object is reclaimed by the garbage collector, the Java Virtual Machine will invoke the finalizer of that object." Some of the intentionally indeterminate characteristics of Java finalizers described in this section of the Java Language Specification are quoted here (I have added any emphasis):

  • "The Java programming language does not specify how soon a finalizer will be invoked."
  • "The Java programming language does not specify which thread will invoke the finalizer for any given object."
  • "Finalizers may be called in any order, or even concurrently."
  • "If an uncaught exception is thrown during the finalization, the exception is ignored and finalization of that object terminates."

I found myself enjoying working with the team that resolved this issue because I was able to experience in "real life" what I had only read about and knew about in an "academic" sense. It is always satisfying to apply a favorite tool (such as jcmd) and to apply previous experiences (such as recognizing what looked out of place in the jcmd class histogram) to resolve a new issue.