Archive for January, 2010

After the previous post, I was still uncertain about the best safe way to manage a cache of objects without paying too high a price for the apparently unavoidable synchronization!

Then I found these articles:


Concurrent structures and collections in Java 5

Java theory and practice: Concurrent collections classes

So it suffices to replace the class HashMap with ConcurrentHashMap, like this:


private ConcurrentMap<String, HeavyObject> cache 
  = new ConcurrentHashMap<String, HeavyObject>();

public HeavyObject getHeavyObject(String key) {
    HeavyObject anObject = cache.get(key);
    if (anObject  == null) {
        anObject = getHeavyObjectFromDataBase(key);
        cache.put(key, anObject);
    }
    return anObject;
}

It is still not perfect, because two threads may load and add the same object within a very short time. But at least, when an object is already in the cache, there is apparently very little overhead compared to traditional synchronization, and there is no risk of corrupting the map by concurrent updates.

Notice that I am not using the putIfAbsent method: since I will have paid the price of retrieving the object from the database anyway, I prefer to refresh the cache with the most recent object.
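For comparison, here is a minimal sketch of what the putIfAbsent variant could look like. The names FirstWinsCache, loadFromDatabase and the loads counter are hypothetical stand-ins (loadFromDatabase plays the role of getHeavyObjectFromDataBase), and the nested HeavyObject is only a placeholder. The trade-off is the opposite of the one chosen above: the first stored object wins, so every caller shares the same instance, but a more recent object read by a second thread is discarded.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class FirstWinsCache {
    // Placeholder for the expensive object of the article (assumption).
    static class HeavyObject {
        final String key;
        HeavyObject(String key) { this.key = key; }
    }

    private final ConcurrentMap<String, HeavyObject> cache =
        new ConcurrentHashMap<String, HeavyObject>();

    // Counts database reads so that duplicate loads stay visible.
    final AtomicInteger loads = new AtomicInteger();

    // Hypothetical loader standing in for getHeavyObjectFromDataBase(key).
    private HeavyObject loadFromDatabase(String key) {
        loads.incrementAndGet();
        return new HeavyObject(key);
    }

    public HeavyObject getHeavyObject(String key) {
        HeavyObject cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        HeavyObject loaded = loadFromDatabase(key);
        // putIfAbsent returns the value already mapped to the key,
        // or null if our freshly loaded value was stored.
        HeavyObject previous = cache.putIfAbsent(key, loaded);
        return previous != null ? previous : loaded;
    }
}
```

Two threads can still both hit the database for the same key; putIfAbsent only guarantees that they end up returning the same object.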

If you use JMX to reload objects after they have been modified externally in the database, you simply need to do this:


public void reloadObject(String key) {
    // Re-read the object and overwrite any stale cache entry.
    HeavyObject anObject = getHeavyObjectFromDataBase(key);
    cache.put(key, anObject);
}

This is the best way I found so far. I will use it now for all projects.

Thank you, Neil Coffey and Brian Goetz!


Let’s say that you need random access to some persisted objects and that retrieving them is a costly operation.

You will naturally come up with the idea of putting these objects into a cache. Each time you need a particular object, you will first look in the cache, in case it has already been read. If it has not, you will read it from the database or the related service and you will put it in the cache for later use, before delivering it to the requester. This is a classical method.

For example, assuming we want to put instances of HeavyObject in a cache and that the key of these objects is a String:


private Map<String, HeavyObject> cache = new HashMap<String, HeavyObject>();

public synchronized HeavyObject getHeavyObject(String key) {
    HeavyObject anObject = cache.get(key);
    if (anObject  == null) {
        anObject = getHeavyObjectFromDataBase(key);
        cache.put(key, anObject);
    }
    return anObject;
}

If you are in a multi-threaded environment, you need to synchronize the getHeavyObject() method. Otherwise the cache map could be (and eventually will be) corrupted by concurrent modification. In the mildest case, where the cached objects are immutable, the only symptom might be reading more objects than necessary. But a HashMap modified concurrently can also end up in an inconsistent internal state, and that is simply not acceptable.

With synchronization you ensure data integrity. The problem is that you pay a price for this synchronization even when the hit ratio is high, meaning that most requested objects are already in the cache.

You could be tempted by a technique called “double-check” to avoid the synchronization time penalty. I will not describe this technique here, because it has been explained numerous times and you can easily find articles on it. I will simply say that its classic form has been shown not to work reliably, due to the way the JVM and the hardware may reorder memory accesses. (Since Java 5 it can be made correct by declaring the checked field volatile.)

So, what else can we do?

I use the following approach and have never had any problem with it.

I use two caches, the primary cache and the secondary one.


private Map<String, HeavyObject> primaryCache = 
                            new HashMap<String, HeavyObject>();
private Map<String, HeavyObject> secondCache = 
                            new HashMap<String, HeavyObject>();

public HeavyObject getHeavyObject(String key) {
    HeavyObject anObject = primaryCache.get(key);
    if (anObject  == null) {
        synchronized(this) {
            anObject = secondCache.get(key);
            if (anObject  == null) {
                anObject = getHeavyObjectFromDataBase(key);
                secondCache.put(key, anObject);
                primaryCache = 
                    new HashMap<String, HeavyObject>(secondCache);
            }
        }
    }
    return anObject;
}

Initially both caches are empty. When an object is requested, we first look into the primary cache without synchronization. If it is not found, we enter a synchronized block, which ensures that only one thread at a time can execute this piece of code.

If the record is not found in the second cache either, it is retrieved from the database and added to the secondary cache. Then, the secondary cache is simply cloned to make the primary cache.

After this process, both caches contain the same set of objects.

Now let’s say that thread A looks for object 1 and, a few microseconds later, thread B looks for the very same object. Both lookups in the primary cache will fail because object 1 is not in the cache yet. Then only one thread enters the synchronized block, let’s say thread A, while thread B waits for the lock to be released.

When thread A releases the lock, object 1 will have been put into both caches. Thread B will then enter the synchronized block but will not load anything, since it will find the object in the secondary cache.

With this technique we can have the best of both worlds: no synchronization when there is a hit, and a synchronized update of the cache when there is a miss.

I have just finished explaining this pattern and, quite frankly, I am not so sure anymore that it is bulletproof!

Maybe I was just lucky so far!

Have I just re-invented the “double-check” in disguise?

So, if you are kings of the JVM, your input would be more than welcome!
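One possible way to remove the doubt, sketched here under the assumption of the Java 5 (or later) memory model: the weak spot of the code above is that primaryCache is read outside the synchronized block without being volatile, so a reader is not guaranteed to see a fully built map. Declaring the reference volatile, and never modifying a map once it has been published, closes that gap. The names TwoLevelCache, loadFromDatabase and the loads counter are hypothetical; loadFromDatabase stands in for getHeavyObjectFromDataBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class TwoLevelCache {
    // Placeholder for the expensive object of the article (assumption).
    static class HeavyObject {
        final String key;
        HeavyObject(String key) { this.key = key; }
    }

    // volatile: a thread that sees a new reference also sees the
    // complete content of the map it points to.
    private volatile Map<String, HeavyObject> primaryCache =
        new HashMap<String, HeavyObject>();

    // Only read and written while holding the lock on this.
    private final Map<String, HeavyObject> secondCache =
        new HashMap<String, HeavyObject>();

    // Counts database reads, for illustration only.
    final AtomicInteger loads = new AtomicInteger();

    // Hypothetical loader standing in for getHeavyObjectFromDataBase(key).
    private HeavyObject loadFromDatabase(String key) {
        loads.incrementAndGet();
        return new HeavyObject(key);
    }

    public HeavyObject getHeavyObject(String key) {
        // Lock-free read of a snapshot that is never modified again.
        HeavyObject anObject = primaryCache.get(key);
        if (anObject == null) {
            synchronized (this) {
                anObject = secondCache.get(key);
                if (anObject == null) {
                    anObject = loadFromDatabase(key);
                    secondCache.put(key, anObject);
                    // Publish a fresh copy instead of mutating the old one.
                    primaryCache = new HashMap<String, HeavyObject>(secondCache);
                }
            }
        }
        return anObject;
    }
}
```

The price of this variant is that the whole map is copied on every miss, so it suits caches that fill up early and are then mostly read.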


Scala part 2 – properties

In the preceding post of this series, we scratched the surface of the Scala programming language.

Today we will continue to dissect our small first example, before adding more flesh to the bone.

This example was:

package com.jmg.insurance

class Product (insuranceAmount: Int, deductible: Int) {
  val rate = .05
  def premium = insuranceAmount * rate
}

object Main {
  def main(args: Array[String]): Unit = {
    val p1 = new Product(10000, 500)
    println(p1)
  }
}

There are a few little things not explained yet. First, you have probably already guessed that in Scala, like in most object-oriented languages, the members of a class may be of one of these two kinds:

  • a variable (or a value)
  • a method

In Scala, methods start with the keyword def. Think of define or definition.

A variable declaration starts with the keyword var, while a value declaration starts with the keyword val.

The only difference between a var and a val is that the former is mutable and the latter is not.

If we try to modify the value of rate after it has been declared, the code won’t compile:

package com.jmg.insurance

class Product (insuranceAmount: Int, deductible: Int) {
  val rate = .05
  rate = .06                 // does not compile
  def premium = insuranceAmount * rate
}

But if we change the code like this:

package com.jmg.insurance

class Product (insuranceAmount: Int, deductible: Int) {
  var rate = .05
  rate = .06                 // now it works 
  def premium = insuranceAmount * rate
}

Now it compiles just fine. Of course it is quite a contrived illustration!

So this statement in Scala: val rate = .05 is roughly equivalent to this Java statement: public final double rate = .05;

Every method, value or variable in Scala is public by default. Thus, there is no public modifier, but there is a private or a protected one.

For every public value in a class, Scala transparently adds an accessor method of the same name. For example, with this:

package com.jmg.insurance

class Product (insuranceAmount: Int, deductible: Int) {
  val rate = .05
  def premium = insuranceAmount * rate
}

object Main {
  def main(args: Array[String]): Unit = {
    val p1 = new Product(10000, 500)
    println(p1.rate)
  }
}

Since rate is public, we are allowed to call the generated accessor method:

p1.rate

You may wonder what this construct is, the one starting with object, after the Product class declaration.

Your trained Java expert eyes will have guessed that it contains the famous main method, used to bootstrap a program.

In Java, we need to have a class with a static method called main to trigger the execution of a system.

In Scala, there is no such thing as a static method! But we can define a singleton object by replacing class with object as the first keyword.

This singleton object’s name could be anything instead of Main. However, the bootstrap method, like in Java, must be called main.

It must declare an array of String as its only parameter. Command line parameters are transmitted this way.

Notice that the syntax to declare an array of strings in Scala is:
arrayName: Array[String]
instead of String[] arrayName as in Java.

Compared to Java, there are far fewer “special cases” in Scala. In Java, an array is a special beast, implemented as a language construct. In Scala, an array is just another class from the library.

The class Array is generic. You have to declare which type of elements you want the array to store.

Therefore the syntax is: args: Array[String]. You can read this as: declare args as an array of strings.

Now back to the product class example.

package com.jmg.insurance

class Product (insuranceAmount: Int, deductible: Int) {
  var rate = .05
  def premium = insuranceAmount * rate
}

object Main {
  def main(args: Array[String]): Unit = {
    val p1 = new Product(10000, 500)
    p1.rate = .06
    println(p1)
  }
}

If we choose to declare var rate = .05, Scala transparently adds an accessor method (as in the val example) and a mutator method, allowing us to modify rate from outside the class (see the assignment p1.rate = .06 in main).

I can hear your disagreement! This is just a blatant violation of the encapsulation principle!

But it is not! While it looks as if we are changing a variable of the class from the outside, in fact we are just calling a mutator method, automatically generated by the Scala compiler. It is exactly as if we had written:

package com.jmg.insurance

class Product (insuranceAmount: Int, deductible: Int) {
  private var internal_rate = 0.05
  def rate = internal_rate                               // accessor
  def rate_=(r: Double): Unit = { internal_rate = r }    // mutator
  def premium = insuranceAmount * rate
}

object Main {
  def main(args: Array[String]): Unit = {
    val p1 = new Product(10000, 500)
    p1.rate = 0.06
    println(p1.rate)
  }
}

This time, we have defined a private variable and named it internal_rate. Then we have defined two public methods: rate, which returns the variable’s value, and rate_=(r: Double), the mutator. Ending a method name with _=, as in rate_=, is the way to declare a mutator method that will be called like this: p1.rate = 0.06. Notice that this is just syntactic sugar. We can also call it like this: p1.rate_=(0.06), but the first form is much more readable, isn’t it?

Notice that the client code is the same whether the methods are defined implicitly or explicitly. This means that you can safely start with the simplest form, the implicit definition, and later on, if you want to react to a property change and validate, for example, that the modification does not break your object’s integrity, you can switch to the explicit definition of the two methods, and the client code won’t be affected.
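For readers coming from Java, here is a rough sketch of what the explicit form with validation corresponds to on the Java side. The setRate name and the rule that a rate must lie strictly between 0 and 1 are assumptions made up for the illustration; they are not part of the Scala example.

```java
class Product {
    private final int insuranceAmount;
    private final int deductible;   // kept only to mirror the Scala constructor
    private double rate = 0.05;

    Product(int insuranceAmount, int deductible) {
        this.insuranceAmount = insuranceAmount;
        this.deductible = deductible;
    }

    // Accessor, playing the role of the Scala method rate.
    public double rate() {
        return rate;
    }

    // Mutator, playing the role of rate_=. The range check is a
    // hypothetical integrity rule chosen for this sketch.
    public void setRate(double newRate) {
        if (newRate <= 0.0 || newRate >= 1.0) {
            throw new IllegalArgumentException(
                "rate must be between 0 and 1, got " + newRate);
        }
        rate = newRate;
    }

    public double premium() {
        return insuranceAmount * rate;
    }
}
```

The difference is that in Java the client has to call setRate(...) from day one, whereas in Scala the client keeps writing p1.rate = 0.06 before and after the validation is introduced.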
