Notice: Blog Moved

Posted: August 3, 2016 in Uncategorized

I have moved my personal blog from WordPress to Blogger. Both of them are excellent blogging platforms, but I feel Blogger suits my purposes best.

As a consequence of this move, those who were subscribed to this blog will need to update their RSS feed/email subscription settings and use the ones from Blogger. Sorry for the trouble!

My old articles will still remain here.

Thanks for reading! Hope to see you again on my new blog!

Update

Please see the comments from Jean-philippe Bempel in the comment section. He mentioned a real example of how a deadlock can happen from JVM optimization. One of the reasons I like to blog as much as possible is that I can learn from the community if I misunderstood something. Thank you!

What is a volatile variable?

volatile is a keyword in Java. You cannot use this as a variable or method name. Period.

Seriously, jokes aside, what is a volatile variable? When should we use it?

Ha ha, sorry, couldn’t help it.

We typically use the volatile keyword when we share variables between more than one thread in a multi-threaded environment, and we want to avoid any memory inconsistency errors due to the caching of these variables in the CPU cache.

Consider the following example of producer/consumer, where we are producing/consuming items one at a time –

public class ProducerConsumer {
  private String value = "";
  private boolean hasValue = false;

  public void produce(String value) {
    while (hasValue) {
      try {
        Thread.sleep(500);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }

    System.out.println("Producing " + value + " as the next consumable");
    this.value = value;
    hasValue = true;
  }

  public String consume() {
    while (!hasValue) {
      try {
        Thread.sleep(500);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }

    String value = this.value;
    hasValue = false;
    System.out.println("Consumed " + value);
    return value;
  }
}

In the above class, the produce method generates a new value by storing its argument into value, and changing the hasValue flag to true. The while loop checks if the value flag (hasValue) is true, which signifies the presence of a new value not yet consumed, and if it’s true then it requests the current thread to sleep. This sleeping loop only stops if the hasValue flag has been changed to false, which is only possible if the new value has been consumed by the consume method. The consume method requests the current thread to sleep if no new value is available. When a new value is produced by the produce method it terminates its sleeping loop, consumes it, and clears the value flag.

Now imagine that two threads are using an object of this class – one is trying to produce values (the writer thread), and another one is consuming them (the reader thread). The following test illustrates this approach –

public class ProducerConsumerTest {

  @Test
  public void testProduceConsume() throws InterruptedException {
    ProducerConsumer producerConsumer = new ProducerConsumer();
    List<String> values = Arrays.asList("1", "2", "3", "4", "5", "6", "7", "8",
        "9", "10", "11", "12", "13");
    Thread writerThread = new Thread(() -> values.stream()
        .forEach(producerConsumer::produce));
    Thread readerThread = new Thread(() -> {
      for (int i = 0; i < values.size(); i++) {
        producerConsumer.consume();
      }
    });

    writerThread.start();
    readerThread.start();

    writerThread.join();
    readerThread.join();
  }
}

This example will produce the expected output most of the time, but it also has a strong chance of running into a deadlock!

How?

Let’s talk about computer architecture a bit.

We know that a computer consists of CPUs and Memory Units (and many other parts). Even though the main memory is where all of our program instructions and variables/data reside, during program execution CPUs can store copies of variables in their internal memory (which is known as CPU cache) for performance gain. Since modern computers now have more than one CPU, there is more than one CPU cache as well.

In a multi-threaded environment, it’s possible for more than one thread to execute at the same time, each one on a different CPU (although this is totally dependent on the underlying OS), and each one of them may copy variables from main memory into its corresponding CPU cache. When a thread accesses these variables, it will then access these cached copies, not the actual ones in the main memory.

Now let’s assume that the two threads in our test are running on two different CPUs, and the hasValue flag has been cached on either one of them (or both). Now consider the following execution sequence –

  1. writerThread produces a value, and changes the hasValue to true. However, this update is only reflected in the cache, not in the main memory.
  2. readerThread is trying to consume a value, but its cached copy of the hasValue flag is set to false. So even though a value has been produced by the writerThread, it cannot consume it as the thread cannot break out of the sleeping loop (hasValue is false).
  3. Since the readerThread is not consuming the newly generated value, writerThread cannot proceed either as the flag is not being cleared, and hence it will be stuck in its sleeping loop.
  4. And we have a deadlock on our hands!

This situation will only change once the updated hasValue flag becomes visible to the other thread. Without volatile (or some other form of synchronization), Java gives no guarantee about when, or even whether, that happens.

What’s the solution then? And how does volatile fit into this example?

If we just mark the hasValue flag as volatile, we can be sure that this type of deadlock will not occur –

private volatile boolean hasValue = false;

Marking a variable as volatile will force each thread to read the value of that variable directly from the main memory. Also each write to a volatile variable will be flushed into the main memory immediately. If the threads decide to cache the variable, it will be synced with the main memory on each read/write.

After this change, consider the previous execution steps which led to deadlock –

  1. Writer thread produces a value, and changes the hasValue to true. This time the update will be directly reflected into the main memory (even if it’s cached).
  2. Reader thread is trying to consume a value, and checking the value of hasValue. This time every read will force the value to be fetched directly from the main memory, so it will pick up the change made by the writer thread.
  3. Reader thread consumes the generated value, and clears the value of the flag. This new value will go to the main memory (if it’s cached, then the cached copy will also be updated).
  4. Writer thread will pick up this change as every read is now accessing the main memory. It will continue to produce new values.

And voila! We are all happy ^_^ !
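
For anyone who wants to try this without JUnit, here is a minimal, self-contained sketch of the same hand-off with the volatile flag in place. The class name VolatileHandoffDemo, the transfer helper and the 1 ms sleep are my own assumptions for illustration; they are not part of the original example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VolatileHandoffDemo {
  private String value = "";
  // volatile: a write by one thread is visible to later reads by the other
  private volatile boolean hasValue = false;

  void produce(String newValue) {
    while (hasValue) {
      pause();            // the previous value has not been consumed yet
    }
    value = newValue;     // plain write, published by the volatile write below
    hasValue = true;
  }

  String consume() {
    while (!hasValue) {
      pause();            // nothing to consume yet
    }
    String result = value;
    hasValue = false;
    return result;
  }

  private static void pause() {
    try {
      Thread.sleep(1);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Hands every item from a writer thread to a reader thread, one at a time.
  static List<String> transfer(List<String> items) {
    VolatileHandoffDemo channel = new VolatileHandoffDemo();
    List<String> received = new ArrayList<>();
    Thread writer = new Thread(() -> items.forEach(channel::produce));
    Thread reader = new Thread(() -> {
      for (int i = 0; i < items.size(); i++) {
        received.add(channel.consume());
      }
    });
    writer.start();
    reader.start();
    try {
      writer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return received;
  }

  public static void main(String[] args) {
    System.out.println(transfer(Arrays.asList("1", "2", "3"))); // [1, 2, 3]
  }
}
```

Note that this only works because there is exactly one producer and one consumer; with more threads on either side the flag alone would not be enough.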

I see. Is this all volatile does, forcing threads to read/write variables directly from memory?

Actually it has some further implications. Accessing a volatile variable establishes a happens-before relationship between program statements.

What is a happens-before relationship?

A happens-before relationship between two program statements is a sort of guarantee which ensures that any memory writes by one statement are visible to the other statement.

How does it relate with volatile?

When we write to a volatile variable, it creates a happens-before relationship with each subsequent read of that same variable. So any memory writes that have been done until that volatile variable write, will subsequently be visible to any statements that follow the read of that volatile variable.

Err… OK… I sort of got it, but maybe an example would be good.

Ok, sorry about the vague definition. Consider the following example –

// Definition: Some variables
private int first = 1;
private int second = 2;
private int third = 3;
private volatile boolean hasValue = false;

// First Snippet: A sequence of write operations being executed by Thread 1
first = 5;
second = 6;
third = 7;
hasValue = true;

// Second Snippet: A sequence of read operations being executed by Thread 2
System.out.println("Flag is set to : " + hasValue);
System.out.println("First: " + first);  // will print 5
System.out.println("Second: " + second); // will print 6
System.out.println("Third: " + third);  // will print 7

Let’s assume that the above two snippets are being executed by two different threads – thread 1 and 2. When the first thread changes hasValue, it will not only flush this change to main memory, but it will also cause the previous three writes (and any other previous writes) to be flushed into the main memory as well! As a result, once the second thread has read hasValue and seen true, it will also see all the writes made by thread 1 when it accesses these three variables, even if they were all cached before (and these cached copies will be updated as well)!

This is exactly why we did not have to mark the value variable in our first example with volatile as well. Since we wrote to that variable before accessing hasValue, and read from it after reading hasValue, it was automatically synced with the main memory.
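
The fragments above can be stitched into a runnable sketch. The class HappensBeforeDemo and its method names are assumptions of mine; the important detail is that the reader waits until its volatile read actually returns true before it touches the three plain fields, because the guarantee only applies from that point on.

```java
import java.util.Arrays;

class HappensBeforeDemo {
  private int first = 1;
  private int second = 2;
  private int third = 3;
  private volatile boolean hasValue = false;

  // Thread 1: three plain writes followed by the volatile write.
  void write() {
    first = 5;
    second = 6;
    third = 7;
    hasValue = true;
  }

  // Thread 2: waits until the volatile read observes true, then reads the rest.
  int[] readWhenReady() {
    while (!hasValue) {
      Thread.yield();
    }
    return new int[] {first, second, third};
  }

  static int[] run() {
    HappensBeforeDemo demo = new HappensBeforeDemo();
    int[][] seen = new int[1][];
    Thread reader = new Thread(() -> seen[0] = demo.readWhenReady());
    Thread writer = new Thread(demo::write);
    reader.start();
    writer.start();
    try {
      writer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(run())); // [5, 6, 7]
  }
}
```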

This has another interesting consequence. The JVM is famous for its program optimization. Sometimes it reorders the program statements to boost performance without changing the output of the program. As an example, it can change the following sequence of statements –

first = 5;
second = 6;
third = 7;

into this –

second = 6;
third = 7;
first = 5;

However, when the statements involve accessing a volatile variable, then it will never move a statement occurring before a volatile write after it. Which means, it will never transform this –

first = 5;  // write before volatile write
second = 6;  // write before volatile write
third = 7;   // write before volatile write
hasValue = true;

into this –

first = 5;
second = 6;
hasValue = true;
third = 7;  // Order changed to appear after volatile write! This will never happen!

even though from the perspective of program correctness both of them seem to be equivalent. Note that the JVM is still allowed to reorder the first three writes among themselves as long as they all appear before the volatile write.

Similarly, the JVM will also not change the order of a statement which appears after a volatile variable read to appear before the access. Which means the following –

System.out.println("Flag is set to : " + hasValue);  // volatile read
System.out.println("First: " + first);  // Read after volatile read
System.out.println("Second: " + second); // Read after volatile read
System.out.println("Third: " + third);  // Read after volatile read

will never be transformed by the JVM into this –

System.out.println("First: " + first);  // Read before volatile read! Will never happen!
System.out.println("Flag is set to : " + hasValue); // volatile read
System.out.println("Second: " + second); 
System.out.println("Third: " + third);  

However, the JVM can certainly reorder the last three reads among themselves, as long as they keep appearing after the volatile read.

I sense a performance penalty has to be paid for volatile variables.

You got that right, since volatile variables force main memory access, and accessing main memory is always way slower than accessing CPU caches. It also prevents certain program optimizations by the JVM, further reducing performance.

Can we always use volatile variables to maintain data consistency across threads?

Unfortunately not. When more than one thread reads and writes the same variable, marking it as volatile is not enough to maintain consistency. Consider the following UnsafeCounter class –

public class UnsafeCounter {
  private volatile int counter;

  public void inc() {
    counter++;
  }

  public void dec() {
    counter--;
  }

  public int get() {
    return counter;
  }
}

and the following test –

public class UnsafeCounterTest {

  @Test
  public void testUnsafeCounter() throws InterruptedException {
    UnsafeCounter unsafeCounter = new UnsafeCounter();
    Thread first = new Thread(() -> {
      for (int i = 0; i < 5; i++) {
        unsafeCounter.inc();
      }
    });
    Thread second = new Thread(() -> {
      for (int i = 0; i < 5; i++) {
        unsafeCounter.dec();
      }
    });

    first.start();
    second.start();
    first.join();
    second.join();

    System.out.println("Current counter value: " + unsafeCounter.get());
  }
}

The code is pretty self-explanatory. We are incrementing the counter in one thread, and decrementing it in another the same number of times. After running this test we expect the counter to hold 0, but this is not guaranteed. Most of the time it will be 0, and some of the time it will be -1, -2, 1, 2, i.e., any integer in the range [-5, 5].

Why does this happen? It happens because neither the increment nor the decrement operation of the counter is atomic – they do not happen all at once. Both of them consist of multiple steps, and the sequences of steps can overlap with each other. So you can think of an increment operation as follows –

  1. Read the value of the counter.
  2. Add one to it.
  3. Write back the new value of the counter.

and a decrement operation as follows –

  1. Read the value of the counter.
  2. Subtract one from it.
  3. Write back the new value of the counter.

Now, let’s consider the following execution steps –

  1. First thread has read the value of the counter from memory. Initially it’s set to zero. It then adds one to it.
  2. Second thread has also read the value of the counter from memory, and saw that it’s set to zero. It then subtracts one from it.
  3. First thread now writes back the new value of counter to memory, changing it to 1.
  4. Second thread now writes back the new value of counter to memory, which is -1.
  5. First thread’s update is lost.
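
Since the real race is hard to reproduce on demand, here is a single-threaded replay of the steps listed above. It is only a sketch: the class LostUpdateReplay and the two local variables standing in for each thread's working copy are assumptions for illustration, not something the JVM exposes.

```java
class LostUpdateReplay {
  static int counter = 0;

  // Replays the interleaving on one thread; each local variable plays the
  // role of one thread's private working copy of the counter.
  static int replay() {
    counter = 0;
    int firstThreadCopy = counter + 1;   // step 1: first thread reads 0, adds one
    int secondThreadCopy = counter - 1;  // step 2: second thread reads 0, subtracts one
    counter = firstThreadCopy;           // step 3: first thread writes 1
    counter = secondThreadCopy;          // step 4: second thread writes -1
    return counter;                      // step 5: the increment is lost
  }

  public static void main(String[] args) {
    System.out.println(replay()); // -1
  }
}
```

One increment and one decrement were performed, yet the counter ends at -1 instead of 0. Making the counter volatile does not help here, because each individual read and write is already visible; the problem is the gap between them.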

How do we prevent this?

By using synchronization –

public class SynchronizedCounter {
  private int counter;

  public synchronized void inc() {
    counter++;
  }

  public synchronized void dec() {
    counter--;
  }

  public synchronized int get() {
    return counter;
  }
}

Or by using an AtomicInteger –

public class AtomicCounter {
  private AtomicInteger atomicInteger = new AtomicInteger();

  public void inc() {
    atomicInteger.incrementAndGet();
  }

  public void dec() {
    atomicInteger.decrementAndGet();
  }

  public int get() {
    return atomicInteger.intValue();
  }
}

My personal choice is the one using AtomicInteger as the synchronized one hampers performance greatly by allowing only one thread to access any of the inc/dec/get methods.
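
To convince yourself that the atomic version really fixes the lost update, a small stress run like the one below can help. This is a sketch under my own assumptions: the class AtomicCounterDemo and the iteration count are made up for illustration, and it uses AtomicInteger directly rather than the AtomicCounter wrapper above.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
  // One thread increments and another decrements the same counter,
  // each the given number of times; returns the final value.
  static int run(int iterations) {
    AtomicInteger counter = new AtomicInteger();
    Thread first = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        counter.incrementAndGet();
      }
    });
    Thread second = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        counter.decrementAndGet();
      }
    });
    first.start();
    second.start();
    try {
      first.join();
      second.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return counter.get();
  }

  public static void main(String[] args) {
    System.out.println(run(100_000)); // 0
  }
}
```

Because incrementAndGet and decrementAndGet are atomic read-modify-write operations, no update can be lost and the result is 0 no matter how the threads interleave.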

I notice that the synchronized version does not mark the counter as volatile. Does this mean……..?

Yup. Using the synchronized keyword also establishes a happens-before relationship between statements. Exiting a synchronized method/block (releasing the lock) happens-before every subsequent entry into a synchronized method/block that uses the same lock, so everything written before the lock was released is visible to the thread that acquires it next. For a full list of what establishes a happens-before relationship, please go here.
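
Here is a minimal sketch of that visibility guarantee with no volatile in sight. The class SynchronizedVisibilityDemo and its publish/poll methods are hypothetical names of mine. Both methods lock the same object, so whatever publish wrote before releasing the lock is visible to a later poll that acquires it.

```java
class SynchronizedVisibilityDemo {
  private int payload = 0;        // deliberately not volatile
  private boolean ready = false;  // deliberately not volatile

  synchronized void publish(int value) {
    payload = value;
    ready = true;
  }

  // Returns the payload once it has been published, or null before that.
  synchronized Integer poll() {
    if (!ready) {
      return null;
    }
    return payload;
  }

  static int run() {
    SynchronizedVisibilityDemo demo = new SynchronizedVisibilityDemo();
    Thread writer = new Thread(() -> demo.publish(42));
    writer.start();
    Integer seen = demo.poll();
    while (seen == null) {
      Thread.yield();
      seen = demo.poll();
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(run()); // 42
  }
}
```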

That’s all I have to say about volatile for the time being. All the examples have been uploaded to my GitHub repo.

The following is a conversation that I was having with myself a few days ago while revisiting some old Java concepts.

What is constructor chaining?

When we create an object in Java, all the constructors in the inheritance hierarchy are called and run. This is known as constructor chaining.

Can you show me an example?

Yes.

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
  class Parent {
    public Parent() {
      System.out.println("This is parent");
    }
  }

  class Child extends Parent {
    public Child() {
      System.out.println("This is child");
    }
  }

  public static void main (String[] args)
  {
    Ideone one = new Ideone();
    Child c = one.new Child();
  }
}

If you run the above example, you will see the following output –

This is parent
This is child

As you can see, the compiler implicitly put a call to the no-arg constructor of the parent into the child constructor, which resulted in the above output.

What will happen if the parent class does not have any constructor?

Each and every class in Java has a constructor. If you do not write one explicitly, then the compiler will provide a no-arg constructor by default.
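
A quick way to see the compiler-provided constructor is to write no constructors at all and ask reflection what is there. The names in this sketch (DefaultConstructorDemo and its helper methods) are assumptions made up for illustration.

```java
class DefaultConstructorDemo {
  // No constructor written: the compiler supplies a no-arg one.
  static class Parent {
    String greeting = "This is parent";
  }

  // No constructor here either: its default constructor calls Parent's.
  static class Child extends Parent {
  }

  static String greetingFromChild() {
    return new Child().greeting;
  }

  static int declaredConstructors(Class<?> type) {
    return type.getDeclaredConstructors().length;
  }

  public static void main(String[] args) {
    System.out.println(greetingFromChild());                // This is parent
    System.out.println(declaredConstructors(Parent.class)); // 1
  }
}
```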

What will happen if the parent class does not have a no-arg constructor, like this –

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
  class Parent {
    public Parent(String hello) {
      System.out.println(hello);
    }
  }

  class Child extends Parent {
    public Child() {
      System.out.println("This is child");
    }
  }

  public static void main (String[] args)
  {
    Ideone one = new Ideone();
    Child c = one.new Child();
  }
}

A compile-time error will be issued if you do not explicitly call any of the available parent constructors, because in this case the compiler tries to automatically call the no-arg constructor of the parent, and since it does not have any, an error will occur.

But I thought the compiler will always define the no-arg constructor for me?

Nope.

The moment you defined a constructor for the parent by yourself, the compiler stopped interfering. Which means, now it will not automatically define the default constructor for you. If you want a no-arg constructor now, you will have to define one by yourself.

So if I now explicitly define a no-arg constructor in the parent, the error will be resolved?

That is one of the two ways to solve it. The other one is given below –

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
  class Parent {
    public Parent(String hello) {
      System.out.println(hello);
    }
  }

  class Child extends Parent {
    public Child() {
      super("Hi Parent!");
      System.out.println("This is child");
    }
  }

  public static void main (String[] args)
  {
    Ideone one = new Ideone();
    Child c = one.new Child();
  }
}

Now you will not get any error!

Using super you can explicitly call a parent constructor, providing the required arguments and thus choosing an appropriate overloaded version. This is exactly how the compiler called the parent constructors in the first and the second examples, except the super call was invisible to us. The compiler automatically put it when it compiled our code.

You need to be aware of one thing though – the super call should be the first statement of the child constructor, otherwise the compiler will report an error. The same rule applies to this(), and as a consequence, you cannot use super() and this() in the same constructor at the same time.

What is this()?

You use this() to call the constructor of the same class. Usually you use it to call an overloaded version of the constructor which contains common initialization logic for the class, like below –

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
  class AClass {
    public AClass() {
      this("Say Hi!");
      System.out.println("This is default");
    }

    public AClass(String hi) {
      System.out.println(hi);
      System.out.println("This is with one arg");
    }
  }

  public static void main (String[] args)
  {
    Ideone one = new Ideone();
    AClass c = one.new AClass();
  }
}

If you run the above example, you will see –

Say Hi!
This is with one arg
This is default

If I use this() inside a constructor to call another constructor, will the parent constructor be called twice?

Nope –

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
  class Parent {
    public Parent() {
      System.out.println("Greetings, underlings!");
    }
  }

  class Child extends Parent {
    public Child() {
      this("Hi single-arg child!");
      System.out.println("This is no-arg child");
    }

    public Child(String sayHi) {
      System.out.println(sayHi);
      System.out.println("This is single-arg child");
    }
  }

  public static void main (String[] args) throws java.lang.Exception
  {
    Ideone one = new Ideone();
    Child c = one.new Child();
  }
}

The output of the above program is

Greetings, underlings!
Hi single-arg child!
This is single-arg child
This is no-arg child

We know that Object is implicitly extended by each and every class in Java. Does that mean its no-arg constructor is also invoked during constructor chaining?

Yes, that’s right. Every constructor chain ends with a call to the no-arg constructor of Object.
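
We cannot make the Object constructor print anything, but the rest of the chain is easy to observe. The sketch below uses a three-level hierarchy with made-up names (ChainOrderDemo, Grandparent) and records the order in which the constructor bodies run.

```java
import java.util.ArrayList;
import java.util.List;

class ChainOrderDemo {
  static final List<String> ORDER = new ArrayList<>();

  static class Grandparent {
    Grandparent() {
      // the compiler inserts super() here, which calls Object()
      ORDER.add("Grandparent");
    }
  }

  static class Parent extends Grandparent {
    Parent() {
      ORDER.add("Parent");
    }
  }

  static class Child extends Parent {
    Child() {
      ORDER.add("Child");
    }
  }

  static List<String> run() {
    ORDER.clear();
    new Child();
    return new ArrayList<>(ORDER);
  }

  public static void main(String[] args) {
    System.out.println(run()); // [Grandparent, Parent, Child]
  }
}
```

The constructors are entered from the bottom of the hierarchy upwards, but their bodies complete from the top downwards, which is why the topmost class is logged first.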

I feel sleepy. It will be nice to have a summary of our conversation at this point.

Sure.

  • When the parent has no/only the default no-arg constructor –
    1. If the child constructor does not explicitly invoke the parent constructor, then the compiler inserts one call to the default parent constructor as the first line of the child constructors.
    2. If the child explicitly calls the parent constructor, then no automatic call to parent constructor is issued.
  • When the parent has default/no-arg constructor as well as other overloaded constructor(s) –
    1. If the child constructor does not explicitly invoke the parent constructor, then the compiler inserts one call to the default parent constructor as the first line of the child constructors.
    2. If the child explicitly calls any of the parent constructors (does not matter which one), then no automatic call to the parent is issued.
  • When the parent has no default/no-arg constructor, which means it has explicit constructors and all of them require arguments –
    1. If the child does not explicitly invoke any of the existing parent constructors, then a compile time error is issued.
    2. If the child constructor explicitly calls any of the existing parent constructors then the program continues without any error.
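
The second bullet of the summary can be checked with a short sketch. All names here (SummaryCasesDemo, ImplicitChild, ExplicitChild) are assumptions for illustration. The parent has both a no-arg and a single-arg constructor, and the log shows which one each child ends up calling.

```java
import java.util.ArrayList;
import java.util.List;

class SummaryCasesDemo {
  static final List<String> LOG = new ArrayList<>();

  static class Parent {
    Parent() {
      LOG.add("Parent()");
    }

    Parent(String hello) {
      LOG.add("Parent(String)");
    }
  }

  static class ImplicitChild extends Parent {
    ImplicitChild() {
      // no explicit call, so the compiler inserts super()
      LOG.add("ImplicitChild");
    }
  }

  static class ExplicitChild extends Parent {
    ExplicitChild() {
      super("Hi Parent!");  // explicit call, so no automatic super() is added
      LOG.add("ExplicitChild");
    }
  }

  static List<String> construct(boolean explicit) {
    LOG.clear();
    if (explicit) {
      new ExplicitChild();
    } else {
      new ImplicitChild();
    }
    return new ArrayList<>(LOG);
  }

  public static void main(String[] args) {
    System.out.println(construct(false)); // [Parent(), ImplicitChild]
    System.out.println(construct(true));  // [Parent(String), ExplicitChild]
  }
}
```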

Thank you. Let’s have this type of conversation again. Good night.

Sure. Anytime.

Good night to you too.

In my last article I showed two different ways to read/write persistent entity state – field and property. When field access mode is used, JPA directly reads the state values from the fields of an entity using reflection. It directly translates the field names into database column names if we do not specify the column names explicitly.  In case of property access mode, the getter/setter methods are used to read/write the state values. In this case we annotate the getter methods of the entity states instead of the fields using the same annotations. If we do not explicitly specify the database column names then they are determined following the JavaBean convention, that is by removing the “get” portion from the getter method name and converting the first letter of the rest of the method name to lowercase character.
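
The naming rule for property access can be written down in a few lines. This is a simplified sketch of the convention described above, not what any JPA provider actually runs: the helper propertyName is hypothetical, and it ignores "is" getters and other edge cases of the JavaBean rules.

```java
class PropertyNameDemo {
  // Strip the "get" prefix and lower-case the first remaining letter.
  static String propertyName(String getterName) {
    if (!getterName.startsWith("get") || getterName.length() == 3) {
      throw new IllegalArgumentException("Not a getter: " + getterName);
    }
    String rest = getterName.substring(3);
    return Character.toLowerCase(rest.charAt(0)) + rest.substring(1);
  }

  public static void main(String[] args) {
    System.out.println(propertyName("getPostcode")); // postcode
    System.out.println(propertyName("getId"));       // id
  }
}
```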

We can specify which access mode to use for an entity by using the @Access annotation in the entity class declaration. This annotation takes an argument of type AccessType (defined in the javax.persistence package) enum, which has two different values corresponding to two different access modes – FIELD and PROPERTY. As an example, we can specify property access mode for the Address entity in the following way –

@Entity
@Table(name = "tbl_address")
@Access(AccessType.PROPERTY)
public class Address {
  private Integer id;
  private String street;
  private String city;
  private String province;
  private String country;
  private String postcode;
  private String transientColumn;

  @Id
  @GeneratedValue
  @Column(name = "address_id")
  public Integer getId() {
    return id;
  }

  public Address setId(Integer id) {
    this.id = id;
    return this;
  }

  public String getStreet() {
    return street;
  }

  public Address setStreet(String street) {
    this.street = street;
    return this;
  }

  public String getCity() {
    return city;
  }

  public Address setCity(String city) {
    this.city = city;
    return this;
  }

  public String getProvince() {
    return province;
  }

  public Address setProvince(String province) {
    this.province = province;
    return this;
  }

  public String getCountry() {
    return country;
  }

  public Address setCountry(String country) {
    this.country = country;
    return this;
  }

  public String getPostcode() {
    return postcode;
  }

  public Address setPostcode(String postcode) {
    this.postcode = postcode;
    return this;
  }
}

Couple of points to note about the above example –

  1. As discussed before, we are now annotating the getter method of the entity id with the @Id, @GeneratedValue and @Column annotations.
  2. Since now column names will be determined by parsing the getter methods, we do not need to mark the transientColumn field with the @Transient annotation anymore. However, if the Address entity had any other method whose name started with “get”, then we would need to apply @Transient to it.

If an entity has no explicit access mode information, just like our Address entity that we created in the first part of this series, then JPA assumes a default access mode. This assumption is not made at random. Instead, JPA first tries to figure out the location of the @Id annotation. If the @Id annotation is used on a field, then field access mode is assumed. If the @Id annotation is used on a getter method, then property access mode is assumed. So even if we remove the @Access annotation from the Address entity in the above example the mapping will still be valid and JPA will assume property access mode –

@Entity
@Table(name = "tbl_address")
public class Address {
  private Integer id;
  private String street;
  private String city;
  private String province;
  private String country;
  private String postcode;
  private String transientColumn;

  @Id
  @GeneratedValue
  @Column(name = "address_id")
  public Integer getId() {
    return id;
  }

  // Rest of the class........
}
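
The lookup rule described above (the default access mode follows the placement of @Id) can be imitated with plain reflection. To keep the sketch free of any JPA dependency it declares its own stand-in Id annotation, and defaultAccess is a hypothetical helper of mine; a real persistence provider does considerably more than this.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class DefaultAccessDemo {
  // Stand-in for javax.persistence.Id so the sketch needs no JPA jar.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Id {}

  static class FieldMapped {
    @Id
    private Integer id;
  }

  static class PropertyMapped {
    private Integer id;

    @Id
    public Integer getId() {
      return id;
    }
  }

  // Hypothetical helper, not a JPA API: looks for the stand-in @Id
  // on fields first, then on methods.
  static String defaultAccess(Class<?> entity) {
    for (Field field : entity.getDeclaredFields()) {
      if (field.isAnnotationPresent(Id.class)) {
        return "FIELD";
      }
    }
    for (Method method : entity.getDeclaredMethods()) {
      if (method.isAnnotationPresent(Id.class)) {
        return "PROPERTY";
      }
    }
    return "UNKNOWN";
  }

  public static void main(String[] args) {
    System.out.println(defaultAccess(FieldMapped.class));    // FIELD
    System.out.println(defaultAccess(PropertyMapped.class)); // PROPERTY
  }
}
```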

Some important points to remember about the access modes –

  1. You should never declare a field as public if you use field access mode. All fields of the entity should have either private (best!), protected or default (package-private) access. The reason behind this is that declaring the fields as public will allow any other class to directly access the entity state, which could easily defeat the provider implementation. For example, suppose that you have an entity whose fields are all public. Now if this entity is a managed entity (which means it has been saved into the database) and any other class changes the value of its id, and then you try to save the changes back to the database, you may face unpredictable behavior (I will try to elaborate on this topic in a future article). Even the entity class itself should only manipulate the fields directly during initialization (i.e., inside the constructors).
  2. In case of property access mode, if we apply the annotations on the setter methods rather than on the getter methods, then they will simply be ignored.

It’s also possible to mix both of these access types. Suppose that you want to use field access mode for all but one state of an entity, and for that one remaining state you would like to use property access mode because you want to perform some conversion before writing/after reading the state value to and from the database. You can do this easily by following the steps below –

  1. Mark the entity with the @Access annotation and specify AccessType.FIELD as the access mode for all the fields.
  2. Mark the field for which you do not want to use the field access mode with the @Transient annotation.
  3. Mark the getter method of the property with the @Access annotation and specify AccessType.PROPERTY as the access mode.

The following example demonstrates this approach as the postcode has been changed to use property access mode –

@Entity
@Table(name = "tbl_address")
@Access(AccessType.FIELD)
public class Address {
  @Id
  @GeneratedValue
  @Column(name = "address_id")
  private Integer id;

  private String street;
  private String city;
  private String province;
  private String country;
 
  /**
    * postcode is now marked as Transient
    */
  @Transient
  private String postcode;
 
  @Transient
  private String transientColumn;

  public Integer getId() {
    return id;
  }

  public Address setId(Integer id) {
    this.id = id;
    return this;
  }

  public String getStreet() {
    return street;
  }

  public Address setStreet(String street) {
    this.street = street;
    return this;
  }

  public String getCity() {
    return city;
  }

  public Address setCity(String city) {
    this.city = city;
    return this;
  }

  public String getProvince() {
    return province;
  }

  public Address setProvince(String province) {
    this.province = province;
    return this;
  }

  public String getCountry() {
    return country;
  }

  public Address setCountry(String country) {
    this.country = country;
    return this;
  }

  /**
    * We are now using property access mode for reading/writing
    * postcode
    */
  @Access(AccessType.PROPERTY)
  public String getPostcode() {
    return postcode;
  }

  public Address setPostcode(String postcode) {
    this.postcode = postcode;
    return this;
  }
}

The important thing to note here is that if we do not annotate the class with the @Access annotation to explicitly specify the field access mode as the default one, and we annotate both the fields and the getter methods, then the resultant behavior of the mapping will be undefined. Which means the outcome will totally depend on the persistence provider i.e., one provider might choose to use the field access mode as default, one might use property access mode, or one might decide to throw an exception!

That’s it for today. If you find any problems/have any questions, please do not hesitate to comment!

Until next time.

Resources

  1. Pro JPA 2 by Mike Keith, Merrick Schincariol
  2. Java Persistence WikiBook

In my last post I showed a simple way of persisting an entity. I explained the default approach that JPA uses to determine the default table for an entity. Let’s assume that we want to override this default name. We may like to do so because the data model has been designed and fixed before and the table names do not match our class names (I have seen people create tables with a “tbl_” prefix, for example). So how should we override the default table names to match the existing data model?

Turns out, it’s pretty simple. If we need to override the default table names assumed by JPA, then there are a couple of ways to do it –

  1. We can use the name attribute of the @Entity annotation to provide an explicit entity name to match with the database table name. For our example we could have used @Entity(name = "tbl_address") in our Address class if our table name was tbl_address.
  2. We can use a @Table (defined in the javax.persistence package) annotation just below the @Entity annotation and use its name attribute to specify the table name explicitly –

@Entity
@Table(name = "tbl_address")
public class Address {
  // Rest of the class
}

From these two approaches the @Table annotation provides more options to customize the mapping. For example, some databases like PostgreSQL have a concept of schemas, using which you can further categorize/group your tables. Because of this feature you can create two tables with the same name in a single database (although they will belong to two different schemas). To access these tables you then add the schema name as the table prefix in your query. So if a PostgreSQL database has two different schemas named public (which is sort of like default schema for a PostgreSQL database) and document, and both of these schemas contain tables named document_collection, then both of these two queries are perfectly valid –

-- fetch from the table under public schema
SELECT *
FROM   public.document_collection;

-- fetch from the table under document schema
SELECT *
FROM   document.document_collection;

In order to map an entity to the document_collection table in the document schema, you then use the @Table annotation with its schema attribute set to document –

@Entity
@Table(name="document_collection", schema="document")
public class DocumentCollection {
  // rest of the class
}

When specified this way, the schema name will be added as a prefix to the table name when the persistence provider goes to the database to access the table, just like we did in our queries.
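The prefixing can be pictured with a small sketch. Again, the Table annotation here is a local stand-in declared only for this example, and qualifiedName and selectAll are hypothetical helpers that I made up; the SQL a real provider generates will look different.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Simplified illustration only. Table is a local stand-in,
// NOT javax.persistence.Table.
class QualifiedTableNameSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Table {
    String name() default "";
    String schema() default "";
  }

  @Table(name = "document_collection", schema = "document")
  static class DocumentCollection { }

  @Table(name = "document_collection")
  static class DefaultSchemaDocumentCollection { }

  // Prefixes the table name with the schema name when one is given.
  static String qualifiedName(Class<?> type) {
    Table table = type.getAnnotation(Table.class);
    String name = (table == null || table.name().isEmpty())
        ? type.getSimpleName()
        : table.name();
    if (table != null && !table.schema().isEmpty()) {
      return table.schema() + "." + name;
    }
    return name;
  }

  static String selectAll(Class<?> type) {
    return "SELECT * FROM " + qualifiedName(type);
  }

  public static void main(String[] args) {
    // SELECT * FROM document.document_collection
    System.out.println(selectAll(DocumentCollection.class));
    // SELECT * FROM document_collection
    System.out.println(selectAll(DefaultSchemaDocumentCollection.class));
  }
}
```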

What if rather than specifying the schema name in the @Table annotation you append the schema name in the table name itself, like this –

@Entity
@Table(name = "document.document_collection")
public class DocumentCollection {
  // rest of the class
}

Inlining the schema name with the table name this way is not guaranteed to work across all JPA implementations, because the JPA specification does not define this behavior (it is non-standard). So it’s better not to make a habit of doing this even if your persistence provider supports it.

Let’s turn our attention to the columns next. In order to determine the default columns, JPA does something similar to the following –

  1. At first it checks to see if any explicit column mapping information is given. If no column mapping information is found, it tries to guess the default values for columns.
  2. To determine the default values, JPA needs to know the access type of the entity state, i.e., the way the state of the entity is read and written. JPA supports two access types: field and property. Our example uses field access (JPA actually inferred this from the placement of the @Id annotation, but more on this later). With field access, the state is read from and written to the entity fields directly, using the Reflection API.
  3. Once the access type is known, JPA determines the column names. For the field access type, JPA uses the field name as the column name, which means that a field named status is mapped to a column named status.
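The steps above can be sketched roughly as follows. This is only my simplified picture of the idea, not how any provider actually implements it: the Id annotation is a local stand-in, usesFieldAccess and defaultColumnNames are hypothetical helpers, and for property access the sketch only understands plain getXxx getters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified illustration only. Id is a local stand-in, NOT javax.persistence.Id.
class AccessTypeSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.FIELD, ElementType.METHOD})
  @interface Id { }

  // @Id on a field: field access.
  static class FieldAccessEntity {
    @Id private Integer id;
    private String status;
  }

  // @Id on a getter: property access. Field names differ on purpose.
  static class PropertyAccessEntity {
    private Integer identifier;
    private String currentStatus;

    @Id public Integer getId() { return identifier; }
    public void setId(Integer id) { this.identifier = id; }
    public String getStatus() { return currentStatus; }
    public void setStatus(String status) { this.currentStatus = status; }
  }

  // The access type is inferred from where the @Id stand-in is placed.
  static boolean usesFieldAccess(Class<?> type) {
    for (Field field : type.getDeclaredFields()) {
      if (field.isAnnotationPresent(Id.class)) {
        return true;
      }
    }
    return false;
  }

  // Field access: field names. Property access: names derived from getters.
  // The result is sorted because reflection does not guarantee any order.
  static List<String> defaultColumnNames(Class<?> type) {
    List<String> names = new ArrayList<>();
    if (usesFieldAccess(type)) {
      for (Field field : type.getDeclaredFields()) {
        if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
          continue;
        }
        names.add(field.getName());
      }
    } else {
      for (Method method : type.getDeclaredMethods()) {
        String name = method.getName();
        boolean isGetter = name.startsWith("get") && name.length() > 3
            && method.getParameterCount() == 0;
        if (method.isSynthetic() || Modifier.isStatic(method.getModifiers()) || !isGetter) {
          continue;
        }
        names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(defaultColumnNames(FieldAccessEntity.class));    // [id, status]
    System.out.println(defaultColumnNames(PropertyAccessEntity.class)); // [id, status]
  }
}
```

Notice that the second entity yields id and status even though its fields are named identifier and currentStatus, because with property access the names come from the getters.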

At this point it should be clear to us how the states of the Address entities got saved into the corresponding columns. Each of the fields of the Address entity has an equivalent column in the database table tbl_address, so JPA directly saved them into their corresponding columns. The id field was saved into the id column, city field into the city column and so on.

OK then, let’s move on to overriding column names. As far as I know there is only one annotation-based way (if you happen to know of any other way please comment in!) to override the default column names for entity states, which is by using the @Column annotation (defined in the javax.persistence package). So if the id column of the tbl_address table is renamed to address_id, then we could either rename our field to address_id, or we could use the @Column annotation with its name attribute set to address_id –

@Entity
@Table(name = "tbl_address")
public class Address {
  @Id
  @GeneratedValue
  @Column(name = "address_id")
  private Integer id;

  // Rest of the class
}
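Conceptually the override works like the sketch below: use the name from the annotation if there is one, otherwise fall back to the field name. The Column annotation is a local stand-in and columnNameFor and columnMapping are hypothetical helpers that I wrote only for this illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration only. Column is a local stand-in,
// NOT javax.persistence.Column.
class ColumnNameSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Column {
    String name() default "";
  }

  static class Address {
    @Column(name = "address_id")
    private Integer id;

    private String city;
  }

  // An explicit name wins; otherwise the field name is the column name.
  static String columnNameFor(Field field) {
    Column column = field.getAnnotation(Column.class);
    if (column != null && !column.name().isEmpty()) {
      return column.name();
    }
    return field.getName();
  }

  // Maps each field name to its column name, sorted by field name.
  static Map<String, String> columnMapping(Class<?> type) {
    Map<String, String> mapping = new TreeMap<>();
    for (Field field : type.getDeclaredFields()) {
      if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
        continue;
      }
      mapping.put(field.getName(), columnNameFor(field));
    }
    return mapping;
  }

  public static void main(String[] args) {
    System.out.println(columnMapping(Address.class)); // {city=city, id=address_id}
  }
}
```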

You can see that in all of the above cases the defaults that JPA uses are quite sensible, and in most cases you will be happy with them. However, changing the defaults is also very easy and can be done very quickly.

What if we have a field in the Address entity that we do not wish to save in the database? Suppose that the Address entity has a field named transientColumn which does not have any corresponding column in the database table –

@Entity
@Table(name = "tbl_address")
public class Address {
  @Id
  @GeneratedValue
  @Column(name = "address_id")
  private Integer id;

  private String street;
  private String city;
  private String province;
  private String country;
  private String postcode;
  private String transientColumn;

  // Rest of the class
}

If you run your application with the above change then (with schema validation enabled, as the stack trace shows) you will get an exception at startup which looks something like this –

Exception in thread "main" java.lang.ExceptionInInitializerError
at com.keertimaan.javasamples.jpaexample.Main.main(Main.java:33)
Caused by: javax.persistence.PersistenceException: Unable to build entity manager factory
at org.hibernate.jpa.HibernatePersistenceProvider.createEntityManagerFactory(HibernatePersistenceProvider.java:83)
at org.hibernate.ejb.HibernatePersistence.createEntityManagerFactory(HibernatePersistence.java:54)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:55)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:39)
at com.keertimaan.javasamples.jpaexample.persistenceutil.PersistenceManager.<init>(PersistenceManager.java:31)
at com.keertimaan.javasamples.jpaexample.persistenceutil.PersistenceManager.<clinit>(PersistenceManager.java:26)
… 1 more
Caused by: org.hibernate.HibernateException: Missing column: transientColumn in jpa_example.tbl_address
at org.hibernate.mapping.Table.validateColumns(Table.java:365)
at org.hibernate.cfg.Configuration.validateSchema(Configuration.java:1336)
at org.hibernate.tool.hbm2ddl.SchemaValidator.validate(SchemaValidator.java:155)
at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:525)
at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1857)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl$4.perform(EntityManagerFactoryBuilderImpl.java:850)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl$4.perform(EntityManagerFactoryBuilderImpl.java:843)
at org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl.withTccl(ClassLoaderServiceImpl.java:398)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.build(EntityManagerFactoryBuilderImpl.java:842)
at org.hibernate.jpa.HibernatePersistenceProvider.createEntityManagerFactory(HibernatePersistenceProvider.java:75)
… 6 more

The exception is saying that the persistence provider could not find any column named transientColumn in the database table. We did not do anything to tell the persistence provider that we do not wish to save this field in the database, so it treated the field like any other field in the entity that is mapped to a database column.
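The check that failed can be pictured as a simple comparison between the columns the mapping expects and the columns the table really has. The sketch below is a hypothetical simplification (missingColumns is a helper I made up, and both column lists are hard-coded); it is not Hibernate's validation code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration only; not the provider's actual schema validation.
class SchemaValidationSketch {

  // Returns every mapped column that the table does not contain.
  static List<String> missingColumns(Collection<String> mappedColumns,
                                     Set<String> tableColumns) {
    List<String> missing = new ArrayList<>();
    for (String column : mappedColumns) {
      if (!tableColumns.contains(column)) {
        missing.add(column);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Columns derived from the fields of the Address entity.
    List<String> mapped = Arrays.asList("address_id", "street", "city",
        "province", "country", "postcode", "transientColumn");
    // Columns that (we assume) exist in tbl_address.
    Set<String> actual = new LinkedHashSet<>(Arrays.asList("address_id",
        "street", "city", "province", "country", "postcode"));

    System.out.println(missingColumns(mapped, actual)); // [transientColumn]
  }
}
```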

In order to fix this problem, we can do any of the following –

  1. We can annotate the transientColumn field with the @Transient (defined in javax.persistence package) annotation to let the persistence provider know that we do not wish to save this field, and it does not have any corresponding column in the table.
  2. We can mark the field with Java’s built-in transient keyword.

The difference between these two approaches that comes to my mind is serialization. If we use the transient keyword instead of the annotation, then the transientColumn field is skipped whenever an Address entity is serialized (for example, to be sent from one JVM to another), so after deserialization it holds its default value (null here), just like any other transient field in Java. With the annotation this does not happen, and the transientColumn field retains its value across serialization. As a rule of thumb, I always use the annotation if I do not need to worry about serialization (and in most cases I don’t).
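The serialization behavior is easy to verify with plain Java, no JPA needed. In the sketch below the class and field names are made up for the demonstration; annotatedField stands for a field that would carry @Transient in a real entity (for Java serialization it is just a normal field), while keywordField uses the transient keyword.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Plain Java serialization demo; the names are hypothetical.
class TransientKeywordSketch {

  static class Address implements Serializable {
    private static final long serialVersionUID = 1L;

    String city;
    String annotatedField;         // imagine @Transient on this field
    transient String keywordField; // skipped by Java serialization

    Address(String city, String annotatedField, String keywordField) {
      this.city = city;
      this.annotatedField = annotatedField;
      this.keywordField = keywordField;
    }
  }

  // Serializes the object to a byte array and reads it back.
  static Address roundTrip(Address original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(original);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (Address) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Address copy = roundTrip(new Address("Springfield", "kept", "dropped"));
    System.out.println(copy.city);           // Springfield
    System.out.println(copy.annotatedField); // kept
    System.out.println(copy.keywordField);   // null
  }
}
```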

So using the annotation, we can fix the problem right away –

@Entity
@Table(name = "tbl_address")
public class Address {
  @Id
  @GeneratedValue
  @Column(name = "address_id")
  private Integer id;

  private String street;
  private String city;
  private String province;
  private String country;
  private String postcode;

  @Transient
  private String transientColumn;

  // Rest of the class
}
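Putting both options side by side, here is a rough sketch of how the set of persistent fields could be filtered. The Transient annotation is a local stand-in, persistentFieldNames is a hypothetical helper, and the cachedLabel and instances fields were added only to show the other exclusions (transient keyword and static).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified illustration only. Transient is a local stand-in,
// NOT javax.persistence.Transient.
class PersistentFieldsSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Transient { }

  static class Address {
    private Integer id;
    private String city;

    @Transient
    private String transientColumn;   // excluded by the annotation

    private transient String cachedLabel; // excluded by the keyword
    private static int instances;         // static fields are never persistent
  }

  // Keeps only the fields that would be mapped to columns.
  static List<String> persistentFieldNames(Class<?> type) {
    List<String> names = new ArrayList<>();
    for (Field field : type.getDeclaredFields()) {
      int modifiers = field.getModifiers();
      boolean excluded = field.isSynthetic()
          || Modifier.isStatic(modifiers)
          || Modifier.isTransient(modifiers)
          || field.isAnnotationPresent(Transient.class);
      if (!excluded) {
        names.add(field.getName());
      }
    }
    Collections.sort(names); // reflection does not guarantee any order
    return names;
  }

  public static void main(String[] args) {
    System.out.println(persistentFieldNames(Address.class)); // [city, id]
  }
}
```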

So that’s it for today folks. If you find any mistakes/have any input, please feel free to comment in!

Until next time.