Ch 31 -- Persistence and Java Serialization

Java 1.1 Unleashed

- 31 -
Persistence and Java Serialization

by Jim Mathis

IN THIS CHAPTER

Introducing Object Persistence
Using Java Object Serialization
Making Your Objects Persistent
Looking Inside Object Serialization
Introducing Persistent Stores

Your Java programs create and process various pieces of data or information. Some information is temporary, such as the visual elements of the AWT display; other pieces of information have meaning well beyond the execution lifetime of your program. In a traditional system, such permanent information is stored in files or databases; various conventions such as file types and extensions are used to identify the format of the data or the programs that can process the data. This model perpetuates the structured programming view of the separation of code and data.

In object-oriented languages such as Java, data and code are tightly bound together so that one does not exist without the other. To support permanent information in an object-oriented manner, you require a scheme to preserve the qualities of an object-oriented language when the object instances are temporarily written to a file and later restored. Persistence in an object-oriented programming language deals with the capability of objects to exist beyond the lifetime of the program that created them.

In this chapter, you learn about object persistence in general; more specifically, you learn how to make objects persistent using the new facilities provided in Java Development Kit version 1.1: the Serializable and Externalizable interfaces. You indirectly use serialization in the Remote Method Invocation package, covered in Chapters 17, "The RMI Package," and 54, "Remote Objects and the Java IDL System," and in JavaBeans components, covered in Chapters 38 through 41.
Introducing Object Persistence

Persistence describes something that exists beyond its expected lifetime or that lasts after program completion (for example, a network drive-letter assignment). As applied to an object-oriented programming language, persistence describes objects that exist beyond the scope, in terms of time or location, of the original program that created the objects. You can store a persistent object in a file for later use or transmit a persistent object to another machine.

To provide persistence, you need the following:

A way of converting the in-memory layout of objects into a serial, byte-stream form suitable for storage or for Internet transmission.

A way of creating an object from the serial form that preserves the object-oriented properties of the programming language and produces an object identical to the original.

A mechanism to trigger saving or restoring this object state, either automatically or on command.

Extending an Object's Lifetime

Java objects have a lifetime. An object begins its life when it is created by the new operator (for example, new String("hi")). After it is created, the object exists until it is destroyed by the Java virtual machine's garbage collector. (An object can be garbage collected only when the Java program no longer holds a reference to the object.)

To understand your need for persistent objects, consider an AddressBook class that contains names and addresses. You enter information into an address book so that it is available when you need it at a later date. If you use an AddressBook class to represent a real address book, you find that it does not support the "save it now, use it later" paradigm. All instances of the AddressBook class are destroyed when the Java program ends. To be useful, your AddressBook objects must exist for an extended period of time; they must be persistent.

Persistence is usually implemented by preserving the state (attributes) of an object between executions of the program. To preserve the state, the object is converted to a sequence of bytes (that is, it is serialized) and put on some kind of long-term storage medium (usually a disk). When the object is needed again, it is restored from the long-term medium; the restoration process creates a new Java object identical to the original. Although the restored object is not "the same object," its state and behavior are identical.

Persistence is different from the load() and save() behavior of the Properties class. In the Properties class, keys and values are limited to strings, and only the contents of the strings are stored. Any subclass of Properties can load data stored by any other, possibly incompatible, subclass of Properties, because no type information is stored. With persistence, type information (such as the class name, field names, and field types) is stored with the data, so accidentally restoring the data to the wrong type of object is prevented. Generally, objects are restored as exactly the same class they were before they were saved; however, interoperability between different versions of the same class can be supported.
What Does JDK 1.1 Support?

Java supports object persistence by providing standard mechanisms for encoding objects from their in-memory form to a byte-stream format and for decoding them back. Object persistence is no longer a task that each programmer must implement independently. JDK 1.1 adds the following persistence-related classes and interfaces for you to use:

Class or Interface Description

ObjectOutputStream Use this output stream to convert objects from the in-memory form to serial form. This stream implements the ObjectOutput interface.

ObjectInputStream Use this input stream to restore objects from the serial form. This stream implements the ObjectInput interface.

Serializable Implement this interface to indicate that the class can be converted to a serial form. The interface itself defines no methods; a serializable class can optionally define private writeObject() and readObject() methods to control its encoding.

Externalizable Implement this subinterface of Serializable to define methods that provide complete control over the encoding process of the object.

ObjectInputValidation Use this callback interface to validate the decoding of an object.

The JDK 1.1 provides facilities for the versioning of class implementations while preserving the strong type-checking and type-safe casting provided by Java. It is inevitable that the implementation of a particular class will evolve over time; you may have to add methods or fields, or implement additional interfaces, to support new versions of the JDK or to add functionality to your classes. It is unacceptable to lose stored objects whenever you make any change to the class definition. Java provides a form of interoperation between different versions of the same class as long as you follow some simple rules in creating the new version of an existing class. Versioning is covered in more detail in "Supporting Class Versioning," later in this chapter.
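Versioning is covered later in the chapter, but one mechanism is worth previewing as a minimal sketch: a class can fix its stream version identifier by declaring a static final long serialVersionUID field, and ObjectStreamClass.lookup() reports the identifier that serialization will use for the class. The class name Customer and the value 1L are illustrative assumptions, not part of the chapter's examples.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class VersionDemo {
    // Hypothetical class. The explicit serialVersionUID lets a later version
    // of Customer declare the same identifier and stay stream-compatible.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
    }

    // Returns the version identifier serialization uses for Customer.
    static long streamVersion() {
        return ObjectStreamClass.lookup(Customer.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Customer stream version: " + streamVersion());
    }
}
```

A later version of Customer that adds a field could keep serialVersionUID = 1L; when an old stream is read, the new field simply receives its default value.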

In the next section, you learn how to use these classes to make objects persistent.
Using Java Object Serialization

The JDK 1.1 uses a manually controlled object persistence scheme; the storage and retrieval of persistent Java objects is completely under your control. To make an object persistent, you create an ObjectOutputStream and serialize the object by calling writeObject(). You specify where the output of the ObjectOutputStream, and hence the object, is sent. Wrap the ObjectOutputStream around a FileOutputStream, and you write the state of the object to a file, as shown in the following code:

```java
FileOutputStream fout = new FileOutputStream(filename);
ObjectOutputStream out = new ObjectOutputStream(fout);
out.writeObject(obj);
```

Wrap the ObjectOutputStream around an OutputStream retrieved from a socket (using getOutputStream()), and you transfer the state of the object (but not the code) across the Internet to another machine as shown here:

```java
Socket s = new Socket(remotehost, remoteport);
ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
out.writeObject(obj);
```

Wrap the ObjectOutputStream around a ByteArrayOutputStream, and you can access the encoded form of the object for storage in a database or for other processing using the following code:

```java
ByteArrayOutputStream bout = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bout);
out.writeObject(obj);
byte barray[] = bout.toByteArray();
```

Conversely, to restore an object, you create an ObjectInputStream and call readObject(). You wrap the ObjectInputStream around the various types of input streams in the same way you wrap the ObjectOutputStream around the corresponding output streams. The next object in the stream is reconstituted and returned from the readObject() call. Because readObject() is declared to return Object, you usually cast the returned object to a more specific type. Java's type-safe casting ensures that a cast to an incorrect type fails with a ClassCastException instead of silently succeeding. You can also use instanceof or methods in the reflection package to determine the type of received objects.
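The following minimal sketch puts the two halves together, using a ByteArrayOutputStream and a ByteArrayInputStream so that nothing touches the disk. The helper names toBytes() and fromBytes() are hypothetical. It shows that readObject() returns a new object whose state equals the original's, that the result is checked with instanceof before the cast, and that an incorrect cast fails with ClassCastException.

```java
import java.io.*;

public class RoundTrip {
    // Serializes obj into a byte array.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);
        out.writeObject(obj);
        out.close();                       // flushes any buffered data
        return bout.toByteArray();
    }

    // Restores the first object stored in data.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String original = new String("hello");
        Object restored = fromBytes(toBytes(original));

        // readObject() is declared to return Object: test the type, then cast.
        if (restored instanceof String) {
            String s = (String) restored;
            System.out.println("equal state: " + s.equals(original));   // true
            System.out.println("same object: " + (s == original));      // false
        }

        // A cast to the wrong type is rejected at run time.
        try {
            Integer wrong = (Integer) restored;
            System.out.println("unexpected: " + wrong);
        } catch (ClassCastException e) {
            System.out.println("cast to Integer rejected");
        }
    }
}
```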

The next section provides a more detailed description of the serialization API.
The Serialization API

Table 31.1 describes the constructor and commonly used methods for the ObjectOutputStream class; Table 31.2 describes the constructor and commonly used methods for the ObjectInputStream class. Because these streams also implement the DataOutput or DataInput interface (as appropriate), methods are provided for writing Java data types to the stream in a standard, machine-independent format. Note that a few of the methods can be called only while the stream is actively encoding or decoding an object; these methods are covered in more detail in "Making Your Objects Persistent," later in this chapter.
Table 31.1. Commonly used ObjectOutputStream constructor and methods.

Constructor/Method Description

ObjectOutputStream(OutputStream) Creates an ObjectOutputStream that writes the serialized object to the indicated OutputStream.

writeObject(Object) Serializes and writes the object to the OutputStream.

close() Closes the stream.

flush() Flushes the stream, forcing a write of any buffered data to the underlying OutputStream.

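reset() Resets the encoding state of the ObjectOutputStream: the stream forgets which objects it has already written, so a later writeObject() call encodes such an object in full again instead of writing a back-reference to it. Data already written to the underlying stream is not affected.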

writeInt(int) Writes a 32-bit int to the stream.

writeUTF(String) Writes a String to the stream in UTF format.

defaultWriteObject() Writes the nonstatic and nontransient fields of the current class to this stream using the default encoding format. You can call this method only from the writeObject() method of the class being encoded.
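The effect of reset() is easiest to see by writing one object twice. In this minimal sketch (the Counter class and the writeTwice() method are hypothetical), the second writeObject() call without a reset produces only a back-reference, so the reader gets the same instance back with its first state; after reset(), the object is encoded again in full and is restored as a separate instance carrying the changed value.

```java
import java.io.*;

public class ResetDemo {
    // Hypothetical mutable class.
    static class Counter implements Serializable {
        int value;
    }

    // Writes one Counter twice (changing it in between), optionally calling
    // reset() before the second write, then reads both copies back.
    static Counter[] writeTwice(boolean resetBetween)
            throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);

        c.value = 1;
        out.writeObject(c);
        c.value = 2;                 // change the state after the first write
        if (resetBetween) {
            out.reset();             // forget that c was already written
        }
        out.writeObject(c);          // without reset(): only a back-reference
        out.close();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()));
        Counter first = (Counter) in.readObject();
        Counter second = (Counter) in.readObject();
        in.close();
        return new Counter[] { first, second };
    }

    public static void main(String[] args) throws Exception {
        Counter[] plain = writeTwice(false);
        System.out.println("no reset:   same object = " + (plain[0] == plain[1])
                + ", second value = " + plain[1].value);     // true, 1
        Counter[] reset = writeTwice(true);
        System.out.println("with reset: same object = " + (reset[0] == reset[1])
                + ", second value = " + reset[1].value);     // false, 2
    }
}
```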

Table 31.2. Commonly used ObjectInputStream constructor and methods.

Constructor/Method Description

ObjectInputStream(InputStream) Creates an ObjectInputStream that reads from the specified InputStream.

readObject() Reads an object from the input stream; is the opposite of writeObject().

close() Closes the input stream.

available() Returns the number of bytes that can be read without blocking.

readInt() Reads a 32-bit int from the stream.

readUTF() Reads a UTF-formatted String from the stream.

defaultReadObject() Reads the nonstatic and nontransient fields of the current class from the stream, using the default encoding format. You can call this method only from the readObject() method of the class being decoded.

registerValidation(ObjectInputValidation, int) Registers a handler to validate the restored object. You can call this method only while an object is being decoded, as described in "Validating the Restored Object," later in this chapter. The int specifies a relative callback priority; typically, you just specify a priority of zero.
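Because the object streams also provide the DataOutput and DataInput methods listed in these tables, primitive values and objects can share a single stream. This minimal sketch (the MixedStream class is hypothetical) shows the one rule that matters: the reader must consume items in exactly the order the writer produced them.

```java
import java.io.*;

public class MixedStream {
    // Writes an int, a UTF string, and an array object to one stream, then
    // reads them back in the same order and summarizes what was read.
    static String roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);
        out.writeInt(42);                        // DataOutput method
        out.writeUTF("header");                  // DataOutput method
        out.writeObject(new int[] { 1, 2, 3 });  // arrays are serializable
        out.close();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()));
        int number = in.readInt();               // same order as written
        String label = in.readUTF();
        int[] values = (int[]) in.readObject();
        in.close();
        return number + " " + label + " " + values.length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());         // prints "42 header 3"
    }
}
```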

The Serializable interface defines no methods but is an indication that the class is compatible with serialization and may have private readObject() and writeObject() methods to control serialization.

Only the data in the objects and the declarations of the classes are encoded in the byte stream; the Java virtual machine bytecodes that implement the methods of the classes are not stored when an object is serialized. When an object is retrieved from the stream, the class declaration is read and the normal class-loading mechanisms (for example, searching through the CLASSPATH) are used to load the code. If a matching class is not found, readObject() throws ClassNotFoundException. The JDK support for persistence does not deal with the issues of code distribution (which must be addressed if you are going to build Java agents that can migrate from machine to machine).
Object References

One complication of serializing an object is the correct handling of other referenced objects. When you store an object or send it across the Internet, the object must include a copy of all the objects it references, all the objects those objects reference, and so on. The object has to include all these other objects because all these objects are part of the total state of the one object you explicitly serialized. Serializing an object that has many object references produces a larger-than-expected serialization output. Saving an apparently simple object, such as a button, may entail many kilobytes of data.

Listing 31.1 is a very simple program that creates a frame with a button. You can find the chapter31.ex1 program on the CD-ROM that accompanies this book. The classes in this program, like many classes in JDK 1.1, are already designed to be persistent. After creating the frame, you serialize the button first without an event listener (lines 25 through 29) and then with the frame as the registered action event listener (lines 31 through 36). Just the button, without an action listener, requires approximately 1,064 bytes to serialize; add the application frame as an action listener (which is a common practice in the new JDK 1.1 event model), and the size more than doubles to approximately 2,609 bytes. Because of object references, serializing the button results in serializing the complete application.

NOTE: The writeObject() method does not explicitly synchronize on the object being serialized. If you have multiple threads using the same object, and one thread can possibly be serializing an object while another thread is manipulating fields of the same object, you must take steps to be thread safe: You can either add explicit synchronization code or make a clone of the object before serialization.
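The first option in the note, explicit synchronization, might look like this minimal sketch. It assumes that every method that modifies the hypothetical Accounts class is synchronized, so holding the object's lock for the duration of writeObject() keeps another thread from changing the fields halfway through the encoding. The invariant checked here (the two balances always total 100) is likewise an illustrative assumption.

```java
import java.io.*;

public class SafeWrite {
    // Hypothetical shared object; every mutator is synchronized.
    static class Accounts implements Serializable {
        private long checking = 100;
        private long savings = 0;

        // Moves money between the fields; the total must always be 100.
        synchronized void transfer(long amount) {
            checking -= amount;
            savings += amount;
        }

        synchronized long total() {
            return checking + savings;
        }
    }

    // Holds the object's lock while it is serialized, so transfer() cannot
    // run in the middle of writeObject().
    static byte[] snapshot(Accounts acct) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);
        synchronized (acct) {
            out.writeObject(acct);
        }
        out.close();
        return bout.toByteArray();
    }

    // Takes a snapshot while another thread is busy transferring, restores
    // it, and reports whether the restored copy still satisfies the invariant.
    static boolean snapshotIsConsistent() throws Exception {
        final Accounts acct = new Accounts();
        Thread mover = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 200000; i++) {
                    acct.transfer(1);
                }
            }
        });
        mover.start();
        byte[] data = snapshot(acct);
        mover.join();

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        Accounts copy = (Accounts) in.readObject();
        in.close();
        return copy.total() == 100;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("consistent snapshot: " + snapshotIsConsistent());
    }
}
```

An alternative with the same effect is to declare the class's own private writeObject() method synchronized and have it call defaultWriteObject(); the object's lock is then taken automatically whenever the object is serialized.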

Listing 31.1. chapter31.ex1: Serializing a button.

```java
01 package chapter31.ex1;
02 import java.awt.*;
03 import java.awt.event.*;
04 import java.io.*;
05
06 public class Ex1 extends Frame
07     implements ActionListener {
08   Button button;
09
10   public Ex1() {
11     super("Button Serialization Test");
12     setSize(300,200);
13     button = new Button("Push");
14     add(button);
15     setVisible(true);
16   }
17
18   public void actionPerformed(ActionEvent event) {
19     // perform indicated action
20   }
21
22   public static void main(String args[]) {
23     Ex1 test = new Ex1();
24     try {
25       ByteArrayOutputStream bout = new ByteArrayOutputStream();
26       ObjectOutputStream out = new ObjectOutputStream(bout);
27       out.writeObject(test.button);
28       System.out.println("Serializing just the button takes " +
29                          bout.size() + " bytes");
30       // add action listener to button
31       test.button.addActionListener(test);
32       bout = new ByteArrayOutputStream();
33       out = new ObjectOutputStream(bout);
34       out.writeObject(test.button);
35       System.out.println("Serializing this button takes " +
36                          bout.size() + " bytes");
37       out.writeObject(test);
38       System.out.println("Serializing the button and frame takes " +
39                          bout.size() + " bytes");
40       System.exit(0);
41     }
42     catch (Exception e) {
43       e.printStackTrace(System.out);
44     }
45   }
46 }
```

You may have several objects that reference a single object, or you may have objects that reference each other. When these objects are serialized, you want only one copy of each unique object written. For example, in Listing 31.1, the button has a reference to the frame (its action listener), and the frame has a reference to the button. You do not want two copies of the button object in the serialized output. Worse, you must avoid an infinite loop in which serializing the button serializes the frame, which in turn serializes the button, and so on. ObjectOutputStream solves this problem by writing the contents of each object only once; every later occurrence of the same object in that stream is written as a reference to the earlier copy. If you are interested, the section "Looking Inside Object Serialization," later in this chapter, covers the implementation details of how object references are handled.

Here's an example of how object references are encoded: In lines 37 through 39 of Listing 31.1, you can write the frame object to the same stream (without resetting the ObjectOutputStream state), and the resulting size increases by only 5 bytes. These 5 bytes are simply a reference to the frame object that is already encoded in the stream.
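Both behaviors, a cycle written without infinite recursion and a repeated object costing only a reference, can be checked with a small sketch. The Node class and the helper methods are hypothetical. The chapter measured 5 bytes for the repeated reference under JDK 1.1; the code below simply reports whatever a given runtime produces rather than assuming an exact figure.

```java
import java.io.*;

public class SharedRefs {
    // Hypothetical node type; two nodes can point at each other.
    static class Node implements Serializable {
        String name;
        Node peer;
        Node(String name) { this.name = name; }
    }

    // Builds a two-node cycle, writes both nodes to one stream, and reads
    // them back. Each node is encoded once; the cycle is preserved.
    static Node[] roundTripCycle() throws IOException, ClassNotFoundException {
        Node a = new Node("a");
        Node b = new Node("b");
        a.peer = b;
        b.peer = a;

        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);
        out.writeObject(a);     // writes a, and b through a.peer
        out.writeObject(b);     // b is already in the stream: only a reference
        out.close();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()));
        Node a2 = (Node) in.readObject();
        Node b2 = (Node) in.readObject();
        in.close();
        return new Node[] { a2, b2 };
    }

    // Number of bytes added to a stream by writing the same object again.
    static int repeatCost() throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);
        Node n = new Node("n");
        out.writeObject(n);
        out.flush();
        int before = bout.size();
        out.writeObject(n);
        out.flush();
        return bout.size() - before;
    }

    public static void main(String[] args) throws Exception {
        Node[] copies = roundTripCycle();
        System.out.println("cycle preserved: "
                + (copies[0].peer == copies[1] && copies[1].peer == copies[0]));
        System.out.println("repeat write adds " + repeatCost() + " bytes");
    }
}
```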
Making Your Objects Persistent

Although you can indirectly use object serialization by simply referencing JDK package objects (most are already serializable), or subclassing JDK package classes (a subclass of a serializable class is assumed to be serializable), you eventually have to deal directly with serialization issues when you create new classes. In the following sections, you learn how to make your classes persistent.
The Serializable and Externalizable Interfaces

A class implements Serializable or Externalizable to indicate that its instances can be serialized for persistent storage or for transmission over the Internet. A class that is not meaningful when removed from its execution environment and later restored implements neither interface. Attempts to serialize such a class, either directly or indirectly because of a reference from another object being serialized, throw NotSerializableException.
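A minimal sketch of the failure case, with hypothetical Session and Connection classes: Session is serializable, but it holds a nontransient reference to Connection, which is not, so serializing a Session fails with NotSerializableException. With default settings, the exception message names the offending class.

```java
import java.io.*;

public class NotSerializableDemo {
    // Hypothetical class that does NOT implement Serializable.
    static class Connection {
        String host = "localhost";
    }

    // Serializable, but holds a nontransient reference to a Connection.
    static class Session implements Serializable {
        String user = "dave";
        Connection conn = new Connection();
    }

    // Attempts to serialize a Session. Returns the exception message
    // (normally the name of the class that could not be serialized),
    // or null if the write unexpectedly succeeded.
    static String tryWrite() throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            out.writeObject(new Session());
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("refused to serialize: " + tryWrite());
    }
}
```

Declaring the conn field transient would let a Session serialize, with conn restored as null.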
Suitability Tests

You test whether or not a class is suitable for persistence by considering what happens to the object if it is serialized and later restored. Generally, objects that are tied to system resources (such as process identifiers, file descriptors, network sockets, and so on) are not candidates for serialization. For example, if you serialize an open FileInputStream object, can you continue reading from the stream when the object is serialized and restored at a later time? Probably not, because the open file descriptor used by the underlying operating system is gone, and the file may even no longer exist. The FileInputStream object state includes information outside the defined instance variables and is not normally available for serialization. In this example, such external information includes the file system descriptors and the file contents.

You can attempt to save the external state of the FileInputStream by serializing the complete path name of the file, the date of last file modification, and the logical byte offset for the next read. You would throw an exception if the file was not found or had a different modification date when the FileInputStream object was restored. Such nonstandard serialization is covered in "Custom Serialization," later in this chapter. When you implement custom encoding, consider whether the semantic meaning of the class is preserved or whether you are attempting to serialize the wrong class.

In this example, instead of making the FileInputStream class persistent, we define a new class, FileIdentifier, that holds the filename, modification date, and other identifying information; this new class also has a method that creates a FileInputStream. This approach clearly indicates that the reference to the file is persistent and not the input stream. Similarly, in JDK 1.1, the URL class is persistent but the URLConnection class is not.
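The FileIdentifier idea might be sketched as follows. This is an illustrative assumption rather than a JDK class: it records only the absolute path and the modification time (omitting the read offset discussed earlier), and its open() method refuses to create a stream if the file is missing or has changed since the identifier was saved.

```java
import java.io.*;

// Hypothetical sketch: persist how to find the file, not the open stream.
public class FileIdentifier implements Serializable {
    private final String path;
    private final long lastModified;

    public FileIdentifier(File file) {
        this.path = file.getAbsolutePath();
        this.lastModified = file.lastModified();
    }

    // Opens a fresh stream on the identified file, refusing if the file is
    // missing or has been modified since this identifier was created.
    public FileInputStream open() throws IOException {
        File file = new File(path);
        if (!file.exists()) {
            throw new FileNotFoundException(path);
        }
        if (file.lastModified() != lastModified) {
            throw new IOException("file changed since identifier was saved: " + path);
        }
        return new FileInputStream(file);
    }

    public String getPath() {
        return path;
    }
}
```

Because FileIdentifier holds only a string and a long, default serialization handles it without any custom code.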

Another clue to potential serialization problems is native methods. Such methods often interact with external software whose state is not automatically captured by serialization.
The Externalizable Interface

The Serializable and Externalizable interfaces differ in the amount of control they give you over the serialization process and the extent of customization you can make. The Externalizable interface extends Serializable for situations in which the class requires complete control over the encoding process; only the class identification of the object being serialized is automatically written to the output stream. By implementing Externalizable, your class controls whether or not the state of its superclasses is stored in the stream and exactly which fields are stored. For an Externalizable class, you must implement the following public methods:

Method Description

void readExternal(ObjectInput) The object implements the readExternal() method to restore its contents by calling the methods of DataInput for primitive types and readObject() for objects, strings, and arrays.

void writeExternal(ObjectOutput) The object implements the writeExternal() method to save its contents by calling the methods of DataOutput for its primitive values or calling the writeObject() method of ObjectOutput for objects, strings, and arrays.

The writeExternal() method obviously must encode the data of the object in a form and sequence supported by readExternal(). Unlike Serializable, the Externalizable interface does not handle code versioning automatically; you must provide your own versioning approach.
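A minimal Externalizable sketch follows; the ExternalPoint class and its FORMAT_VERSION field are hypothetical. Because Externalizable leaves versioning to you, the class writes its own format number first. Note also that an Externalizable class needs a public no-argument constructor: the stream creates the instance with that constructor and then calls readExternal().

```java
import java.io.*;

public class ExternalPoint implements Externalizable {
    // Hypothetical format number, written first so that readExternal()
    // can recognize encodings it does not understand.
    private static final int FORMAT_VERSION = 1;

    private int x;
    private int y;

    // Required: the stream calls this constructor, then readExternal().
    public ExternalPoint() {
    }

    public ExternalPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(FORMAT_VERSION);
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int version = in.readInt();
        if (version != FORMAT_VERSION) {
            throw new InvalidObjectException("unsupported format version " + version);
        }
        x = in.readInt();       // same order as writeExternal()
        y = in.readInt();
    }

    public int getX() { return x; }
    public int getY() { return y; }
}
```

Passing an ExternalPoint to writeObject() invokes writeExternal(), and readObject() invokes readExternal(); apart from the class identification, nothing else about the object is saved.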

Because the state of superclasses can be indirectly manipulated by the publicly accessible readExternal() and writeExternal() methods, you must use Externalizable with extreme care so that you do not create a security problem. In most cases, you use the Serializable interface because its built-in, default object-encoding rules are suitable for most purposes, and it provides control over which fields are serialized. The remaining examples in this chapter use the Serializable interface.
Implementing Serializable: The Default Case

For many classes, you simply add implements Serializable to the class definition to use the default Java runtime serialization format to serialize the objects. Listing 31.2 (provided as chapter31.ex2 on the CD-ROM that accompanies this book) shows a simple AddressBook class that is used as the basis for the examples in the following sections. For brevity, only the essential fields and methods are defined and the javadoc comments are removed from the listing; a practical address book class would have to do much more than this limited version.

In this example, you have an AddressBook class that holds AddressEntry objects in a hash table and uses a ServerSocket (unused in this example, but perhaps it can handle network-client lookups). An AddressEntry contains a name and address string; the AddressBook lookup() method returns an AddressEntry based on a search key.
Listing 31.2. The serializable AddressBook class.

```java
01 package chapter31.ex2;
02 import java.io.*;
03 import java.net.*;
04 import java.util.*;
05
06 class AddressEntry implements Serializable {
07   String name;
08   String address;
09
10   /** create an AddressEntry from the supplied strings */
11   public AddressEntry(String name, String address) {
12     if ((name == null) || (address == null))
13       throw new IllegalArgumentException();
14     this.name = name;
15     this.address = address;
16   }
17
18   public boolean equals(AddressEntry e) {
19     return (e != null) && (name.equalsIgnoreCase(e.name)) &&
20            (address.equalsIgnoreCase(e.address));
21   }
22 }
23
24 class AddressBook implements Serializable {
25   Hashtable table;
26   transient ServerSocket socket;
27
28   public AddressBook() {
29     table = new Hashtable();
30     try {
31       socket = new ServerSocket(2020);
32     }
33     catch (IOException e) {
34       socket = null;
35     }
36   }
37
38   public AddressEntry lookup(String key) {
39     return (AddressEntry) table.get(key);
40   }
41
42   public AddressEntry add(String key, AddressEntry entry) {
43     return (AddressEntry) table.put(key, entry);
44   }
45
46   public int size() {
47     return table.size();
48   }
49
50   public boolean equals(AddressBook b) {
51     if ((b == null) || (size() != b.size()))
52       return false;
53     Enumeration keys = table.keys();
54     while (keys.hasMoreElements()) {
55       String key = (String)keys.nextElement();
56       AddressEntry mine = lookup(key);
57       AddressEntry other = b.lookup(key);
58       if (!mine.equals(other))
59         return false;
60     }
61     return true;
62   }
63 }
64
65 public class Ex2 {
66   public static void main(String args[]) {
67     String fname = "addrbook2.out";
68     AddressEntry dave = new AddressEntry("Dave", "Main Street");
69     AddressEntry tom = new AddressEntry("Tom", "1st Street");
70     AddressEntry bill = new AddressEntry("Bill", "Downtown");
71
72     AddressBook addr = new AddressBook();
73     addr.add("Dave", dave);
74     addr.add("Tom", tom);
75     addr.add("Bill", bill);
76     addr.add("SysAdmin", bill);
77
78     try {
79       FileOutputStream fout = new FileOutputStream(fname);
80       ObjectOutputStream out = new ObjectOutputStream(fout);
81       out.writeObject(addr);
82       out.close();
83
84       FileInputStream fin = new FileInputStream(fname);
85       ObjectInputStream in = new ObjectInputStream(fin);
86       AddressBook copy = (AddressBook) in.readObject();
87       if (copy.lookup("Bill") != copy.lookup("SysAdmin"))
88         System.out.println("Multiple keys to object not restored");
89       if (addr.equals(copy))
90         System.out.println("Objects are equal");
91       else
92         System.out.println("Objects are different");
93     }
94     catch (Exception e) {
95       e.printStackTrace(System.out);
96     }
97   }
98 }
```

Compared to a nonpersistent JDK 1.0.2 implementation, you make only three changes to enable the AddressBook to be persistent. First, in line 24, you declare the AddressBook class to be serializable using the following declaration:

24 class AddressBook implements Serializable {

With this declaration, default serialization stores every nonstatic, nontransient field of the object. This is exactly the behavior you want for the hash table because it holds the address-book information and is defined in the JDK as being serializable. However, because ServerSocket is not serializable, your second change is to declare the socket field as transient in line 26:

26 transient ServerSocket socket;

The transient modifier instructs the serialization routines not to serialize this field; after the object is restored, a transient field holds its default value (null for the socket). Fields declared as static are also not serialized. When you serialize the hash table, each key and entry is serialized, so their classes must implement Serializable or Externalizable. Your third change is in line 6, where you declare AddressEntry to be serializable as follows:

06 class AddressEntry implements Serializable {

Because the strings stored in an AddressEntry object are themselves serializable, no further changes are necessary.
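To see what the transient and static rules mean after a round trip, consider this minimal sketch with a hypothetical Settings class. The transient fields come back with default values (null and 0), the static counter is untouched because it belongs to the class rather than to the instance, and the constructor does not run during restoration. The transient Object field plays the same role as the ServerSocket in AddressBook: it holds a reference that cannot be serialized.

```java
import java.io.*;

public class TransientDemo {
    // Hypothetical class mixing ordinary, transient, and static fields.
    static class Settings implements Serializable {
        static int instances = 0;      // static: belongs to the class, never written
        String owner;                  // written by default serialization
        transient Object cache;        // transient: skipped, restored as null
        transient int hits;            // transient primitive: restored as 0

        Settings(String owner) {
            this.owner = owner;
            this.cache = new Object(); // java.lang.Object is not Serializable
            this.hits = 5;
            instances++;               // not executed when an object is restored
        }
    }

    static Settings roundTrip(Settings s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bout);
        out.writeObject(s);
        out.close();
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()));
        Settings copy = (Settings) in.readObject();
        in.close();
        return copy;
    }

    public static void main(String[] args) throws Exception {
        Settings copy = roundTrip(new Settings("dave"));
        System.out.println("owner=" + copy.owner + " cache=" + copy.cache
                + " hits=" + copy.hits + " instances=" + Settings.instances);
        // prints "owner=dave cache=null hits=0 instances=1"
    }
}
```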

In the main() method, you perform a simple test of persistence by creating a small address book, storing it, restoring a copy, and verifying that the saved and restored objects are equal. Here, equal means that the two objects have the same search keys, and each key retrieves AddressEntry objects that contain equal name and address strings.

To check that object references are handled correctly, you define two search keys ("Bill" and "SysAdmin") that refer to the same AddressEntry object. Because the definition of equals does not detect this subtle difference, you add the following explicit test:

87 if (copy.lookup("Bill") != copy.lookup("SysAdmin"))
88   System.out.println("Multiple keys to object not restored");

Running this example, you see that both tests pass: the saved and restored address books are equal, and the entry shared by two keys is restored as a single object. In the next section, you implement custom encoding and explore the division of effort between the object being serialized and the ObjectOutputStream.
Setting Up Custom Serialization

In most cases, the use of the transient modifier is sufficient to control the encoding of your class. However, there are times when you want even more control to better deal with versioning issues or to produce a more compact representation. In these situations, you add both a private writeObject() and a private readObject() method to your class definition, defined exactly as follows:

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException

private void writeObject(ObjectOutputStream out) throws IOException

Because the ObjectOutputStream implements the DataOutput methods, and ObjectInputStream implements the DataInput methods, you use methods such as writeInt() to store the contents of fields. This arrangement writes the data in a platform-independent manner.
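As a minimal sketch of these signatures (the CompactFlags class is hypothetical), the following class lets defaultWriteObject() handle its ordinary field and then stores a transient boolean array as a count plus a single int bit mask, which is more compact than the default encoding of the array. Its readObject() method reads the values back in the same order and checks the count before trusting it.

```java
import java.io.*;

public class CompactFlags implements Serializable {
    private String name;                  // written by defaultWriteObject()
    private transient boolean[] flags;    // written by hand as a bit mask

    public CompactFlags(String name, boolean[] flags) {
        if (flags.length > 32) {
            throw new IllegalArgumentException("at most 32 flags");
        }
        this.name = name;
        this.flags = flags.clone();
    }

    // Private, with exactly this signature, so the stream will call it.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();         // the nontransient field: name
        int mask = 0;
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                mask |= 1 << i;
            }
        }
        out.writeInt(flags.length);
        out.writeInt(mask);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();           // restores name
        int count = in.readInt();         // same order as writeObject()
        int mask = in.readInt();
        if (count < 0 || count > 32) {
            throw new InvalidObjectException("bad flag count " + count);
        }
        flags = new boolean[count];
        for (int i = 0; i < count; i++) {
            flags[i] = (mask & (1 << i)) != 0;
        }
    }

    public String getName() { return name; }
    public int flagCount() { return flags.length; }
    public boolean getFlag(int i) { return flags[i]; }
}
```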

Listing 31.3 (provided as chapter31.ex3 on the CD-ROM that accompanies this book) shows the readObject() and writeObject() methods you use for custom encoding of the AddressBook class. These methods serialize only the search key and AddressEntry from the address book rather than serializing the complete AddressBook, including the hash table. By serializing just the data from the hash table rather than the hash table object itself, you can easily switch to a different internal data storage technique (such as binary trees) in later versions of the class. This representation is slightly smaller (105 bytes) because the class definition for the hash table is not included in the serialized output.

NOTE: Because the program examples have similar code, the source code listings in this chapter have been condensed to highlight only the changed classes or methods. The full source code for each example is included on the CD-ROM that accompanies this book.

Listing 31.3. Customizing the serialization format.

```java
/* the same AddressEntry class is used as defined in Listing 31.2 */

class AddressBook implements Serializable {
  // do not automatically serialize the hash table
  transient Hashtable table;
  transient ServerSocket socket;   // as in Listing 31.2

  /* this class has the same constructor, lookup(), add(),
     size() and equals() methods as Listing 31.2 */

  /** override writeObject to provide a custom serial format
   * for an AddressBook */
  private void writeObject(ObjectOutputStream out)
      throws IOException {
    out.writeInt(table.size());
    Enumeration keys = table.keys();
    while (keys.hasMoreElements()) {
      String key = (String) keys.nextElement();
      AddressEntry entry = (AddressEntry) table.get(key);
      out.writeObject(key);
      out.writeObject(entry);
    }
  }

  /** override readObject to restore an AddressBook using the
   * customized format. At this point, the instance has been
   * created, but its fields still hold default values (table is null). */
  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    table = new Hashtable();
    for (int count = in.readInt(); count > 0; count--) {
      String key = (String) in.readObject();
      AddressEntry entry = (AddressEntry) in.readObject();
      if ((entry.name == null) || (entry.address == null))
        throw new InvalidObjectException
          ("name and address fields must be non-null");
      table.put(key, entry);
    }
  }
}
```

In this example, you check for non-null AddressEntry fields in the readObject() method. In the next section, you add object validation to this example to ensure that the restored address book satisfies the same restrictions as those you impose in the constructor.
Validating the Restored Object

If your program logic requires certain conditions to operate correctly (such as a given field being non-null), you commonly place such checks in the constructor so that an invalid object cannot be created. But because restoring a Serializable object does not invoke its constructor, such checks must be performed elsewhere. ObjectInputStream provides a registration and callback scheme for this purpose: from your class's readObject() method, you register an object that implements the ObjectInputValidation interface, and the stream calls it back once decoding is complete.

A class that implements ObjectInputValidation must define the validateObject() method. This method is called to validate the object just before the outermost ObjectInputStream.readObject() call returns. By then, the complete object graph has been restored, so all object references are in place.

Listing 31.4 (provided as chapter31.ex4 on the CD-ROM) shows the addition of input object validation to the AddressBook example. In this case, you require that the name and address fields of the AddressEntry be non-null, and the constructor enforces that requirement. However, a different version of AddressEntry (either older or newer) may not include such a check and can potentially produce persistent objects with null name and address fields.
Listing 31.4. Validating deserialized objects.

import java.io.*;
import java.util.*;

/* the same AddressEntry class is used as defined in Listing 31.2 */

class AddressBook implements Serializable, ObjectInputValidation {
    // not transient: the hashtable is saved by the default mechanism,
    // so validateObject() can examine the restored entries
    Hashtable table;

    /* this class has the same constructor, lookup(), add(),
       size(), and equals() methods as Listing 31.2 */

    /** must override writeObject() if you also override readObject().
     *  Just do the normal serialization processing. */
    private void writeObject(ObjectOutputStream out)
            throws IOException {
        out.defaultWriteObject();
    }

    /** override readObject for the sole purpose of registering a
     *  validation callback, which is invoked after the complete
     *  object graph has been reconstructed. Otherwise, do the
     *  normal deserialization processing. */
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        // register validation callback and read in object
        in.registerValidation(this, 0);
        in.defaultReadObject();
    }

    /** validateObject is called after the root object has been
     *  reconstructed to validate the entire contents of the
     *  AddressBook, rather than register individual callbacks
     *  for each AddressEntry instance. */
    public void validateObject() throws InvalidObjectException {
        // ensure every entry has both a name and an address
        Enumeration entries = table.elements();
        while (entries.hasMoreElements()) {
            AddressEntry entry = (AddressEntry) entries.nextElement();
            if ((entry.name == null) || (entry.address == null))
                throw new InvalidObjectException
                    ("name and address fields must be non-null");
        }
        System.out.println("validation passed");
    }
}

You can call registerValidation() only while an object is being deserialized; the method throws NotActiveException if called at other times. To get this timing, you override readObject() for the sole purpose of registering the callback and then call defaultReadObject() to use the standard serialization format. Right before the AddressBook object is returned, the validation methods are called. Registering a single validation method for the AddressBook is much more efficient than registering a callback for each AddressEntry.

NOTE: So that your writeObject() and readObject() methods can be called, you must define both methods, even if, as in the case of registering a validation object, you need only one.
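The following self-contained sketch checks both behaviors described above on a current JDK. First, a validation callback rejects bad data. Second, registerValidation() raises NotActiveException when it is called outside readObject(). NameList and ValidationDemo are hypothetical names used only for illustration. The sketch defines writeObject() as well, as the note above advises.

```java
import java.io.*;
import java.util.ArrayList;

// Hypothetical class: a list of names that must all be non-null after restoring.
class NameList implements Serializable, ObjectInputValidation {
    ArrayList<String> names = new ArrayList<>();

    // Defined alongside readObject(), as the note advises for JDK 1.1.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.registerValidation(this, 0);   // legal only while a read is in progress
        in.defaultReadObject();
    }

    // Invoked after the complete object graph has been read.
    public void validateObject() throws InvalidObjectException {
        for (String n : names)
            if (n == null)
                throw new InvalidObjectException("names must be non-null");
    }
}

class ValidationDemo {
    // Serializes and restores the list; returns "ok" or the exception's simple name.
    static String restore(NameList list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return "ok";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Calls registerValidation() when no readObject() call is active.
    static String registerOutsideReadObject() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new ObjectOutputStream(bytes).close();   // writes a stream header only
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            in.registerValidation(new NameList(), 0);
            return "registered";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        NameList bad = new NameList();
        bad.names.add(null);
        System.out.println(restore(bad));
        System.out.println(registerOutsideReadObject());
    }
}
```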

In the next section, you learn how to define your classes to support interoperation of objects between different versions of your class, such as between a version that has validation and one that does not, or a version that adds new fields.
Supporting Class Versioning

When you store an object in an ObjectOutputStream, the serialVersionUID of the class is written to the stream to ensure that the correct class is found when the object is restored. The serialVersionUID is a hash computed over various attributes of the class, including the class name, implemented interface names, field names, and method names. When the object is restored, the class is loaded by name and then verified to be identical by comparing the serialVersionUID of the class from the stream with the serialVersionUID computed for the class just loaded. If you add fields, interfaces, or methods to a class definition, the serialVersionUID changes and the new class is assumed to be incompatible with the old. This strict class compatibility matching poses a problem; a user's most important investment is his or her data, which becomes unreadable when even the most trivial changes are made to the class definition.

To provide for interoperation between different versions of the same class, you must explicitly declare the serialVersionUID of the class you are compatible with and follow some simple rules in creating the new version:

Do not delete a field or change it to be transient or static

Do not change a field's built-in type (for example, do not change a short to an int)

Do not change a class from Serializable to Externalizable or vice versa

You can add fields, add methods, change access modifiers, and implement new interfaces. However, in all cases, you must deal with the consequences of class change, such as missing fields when restoring an object saved by an earlier version of the class.

Listing 31.5 (provided as chapter31.ex5 on the CD-ROM) shows a new version of the AddressEntry class that adds a new interface (line 1), adds new fields (lines 3 and 5), makes some fields protected (line 4), and adds new methods (lines 36 through 42). Normally, this new definition would be incompatible with the old one shown in Listing 31.2, and you could not restore the AddressBook saved by chapter31.ex2 using the chapter31.ex5 application.

To make this new class compatible with the AddressEntry class shown in Listing 31.2, you compute the serialVersionUID of the old class and indicate that your new class is compatible by including the following statement:

static final long serialVersionUID = -2357486172207358492L;

You obtain this serialVersionUID using the serialver tool included in JDK 1.1.
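You can also read this value from code. java.io.ObjectStreamClass exposes the serialVersionUID that serialization uses, which should match what the serialver tool reports. The sketch below assumes that correspondence. PinnedEntry and ComputedEntry are hypothetical classes: the first pins the value declared in Listing 31.5, and the second lets the JDK compute its hash.

```java
import java.io.*;

// Hypothetical class that declares its serialVersionUID explicitly.
class PinnedEntry implements Serializable {
    static final long serialVersionUID = -2357486172207358492L;
    String name;
}

// Hypothetical class whose serialVersionUID is computed from its structure.
class ComputedEntry implements Serializable {
    String name;
}

class SerialVersionDemo {
    // ObjectStreamClass.lookup() returns null for classes that are not Serializable.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("PinnedEntry:   " + uidOf(PinnedEntry.class));
        System.out.println("ComputedEntry: " + uidOf(ComputedEntry.class));
    }
}
```

The computed value changes whenever the class's fields, methods, or interfaces change. The pinned value stays fixed, which is what lets a later version declare compatibility with an earlier one.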

Because the earlier version of AddressEntry did not have a phone field, you add logic (lines 22 through 24) to deal with the missing field. When two entries have different version numbers, the phone field is not compared in deciding whether they are equal. Fields that are missing from the ObjectInputStream are set to a type-specific default value: zero for numeric fields, false for boolean fields, and null for object references. An entry restored from the older format therefore has a version of 0 and a null phone. The main() method of this example (included in the CD-ROM source) tests version interoperability by restoring objects from the output of the program in Listing 31.2.
Listing 31.5. Interoperating with an older program version.

01 class AddressEntry implements Serializable, Cloneable {
02     String name;
03     protected int version;
04     protected String address;
05     protected String phone;
06
07     // indicate we are compatible with earlier name/address only
08     // version of AddressEntry defined in listing 31.2
09     static final long serialVersionUID = -2357486172207358492L;
10
11     /** create an AddressEntry from the supplied strings */
12     public AddressEntry(String name, String address, String phone) {
13         if ((name == null) || (address == null))
14             throw new IllegalArgumentException();
15         this.name = name;
16         this.address = address;
17         this.phone = phone;
18         this.version = 1;
19     }
20
21     public boolean equals(AddressEntry e) {
22         if (version != e.version)
23             return (name.equalsIgnoreCase(e.name)) &&
24                    (address.equalsIgnoreCase(e.address));
25
26         if (phone == null)
27             return (name.equalsIgnoreCase(e.name)) &&
28                    (address.equalsIgnoreCase(e.address)) &&
29                    (e.phone == null);
30         else
31             return (name.equalsIgnoreCase(e.name)) &&
32                    (address.equalsIgnoreCase(e.address)) &&
33                    (phone.equalsIgnoreCase(e.phone));
34     }
35
36     public String getAddress() {
37         return address;
38     }
39
40     public String getPhone() {
41         return phone;
42     }
43 }

This completes the explanation of the commonly used functions of JDK 1.1 object serialization. In the next section, you take a look inside the implementation of serialization.
Looking Inside Object Serialization

In the next few sections, you briefly look inside the implementation of serialization in JDK 1.1 to understand the mechanism and get hints about how to reduce the size of the serialized output. You can skip these sections if you are not concerned about performance.
Comparison of Encoding Sizes

Listing 31.6 provides a baseline for measuring the additional overhead of object persistence compared with plain data persistence. In Listing 31.6 (provided as chapter31.ex6 on the CD-ROM that accompanies this book), you implement a manual encoding of the AddressBook objects by writing fields in the DataOutputStream format. In the hand-coded save() and restore() methods, you save the contents of the entire address book by enumerating the keys of the hash table. You write each key as a UTF-encoded string and each AddressEntry as two more UTF-encoded strings. The data representation used by DataOutputStream could be compacted further, but it is also the format that object serialization uses. For example, an int always takes four bytes, even if its value could be represented in a single byte. JDK serialization uses a fixed-size encoding technique instead of a variable-size one; the resulting output is slightly larger but encodes and decodes slightly faster. Running this example produced a serialized output of 95 bytes.
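The following sketch makes the trade-off concrete. It compares the four bytes DataOutputStream.writeInt() always produces with a hypothetical variable-length encoding: an unsigned LEB128-style varint that stores seven bits per byte. The varint is shown only for contrast; JDK serialization does not use it.

```java
import java.io.*;

class IntEncodingDemo {
    // Number of bytes DataOutputStream.writeInt() produces: always 4.
    static int fixedSize(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(value);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical variable-length scheme, not used by JDK serialization:
    // 7 data bits per byte, high bit set when more bytes follow; the int is
    // treated as unsigned, so negative values take 5 bytes.
    static int varSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int v = value;
        while ((v & ~0x7F) != 0) {
            bytes.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        bytes.write(v);
        return bytes.size();
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 300, -1})
            System.out.println(v + ": fixed " + fixedSize(v) + ", varint " + varSize(v));
    }
}
```

The varint saves space for small values but must examine every byte while decoding. This is the speed-for-size trade the paragraph above describes.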

NOTE: UTF (Unicode Transformation Format) is a compact way of exchanging or storing a Unicode string; DataOutputStream.writeUTF() uses a slightly modified form of UTF-8. The encoding consists of a 16-bit length, giving the number of encoded bytes, followed by one to three bytes per character. Characters \u0001 through \u007F take one byte, \u0000 and \u0080 through \u07FF take two bytes, and all other characters take three. For English (ASCII) text, UTF encoding is therefore essentially an ASCII string with a two-byte length.
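The following sketch measures what DataOutputStream.writeUTF() actually writes for a few strings. The UtfSizeDemo name is hypothetical. The output shows the two-byte length, one byte for each ASCII character, two bytes for a character such as e-acute, and three bytes for a CJK character.

```java
import java.io.*;

class UtfSizeDemo {
    // Returns the number of bytes DataOutputStream.writeUTF() produces for s.
    static int utfSize(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utfSize("Bill"));    // 6: length prefix plus 4 one-byte chars
        System.out.println(utfSize("\u00e9"));  // 4: length prefix plus a 2-byte char
        System.out.println(utfSize("\u4e2d"));  // 5: length prefix plus a 3-byte char
    }
}
```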

Listing 31.6. The AddressBook example using save() and restore().

import java.io.*;
import java.util.*;

class AddressEntry {
    String name;
    String address;

    /** create an AddressEntry from the information in the file */
    public AddressEntry(DataInputStream in) throws IOException {
        name = in.readUTF();
        address = in.readUTF();
    }

    /** writes the contents of the entry to the output stream */
    void save(DataOutputStream out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(address);
    }
}

class AddressBook {
    Hashtable table;

    /* same constructor, lookup(), add(), and size() methods
       as defined in Listing 31.2 */

    public void save(String fname) throws IOException {
        FileOutputStream fout = new FileOutputStream(fname);
        DataOutputStream out = new DataOutputStream(fout);

        Enumeration keys = table.keys();
        while (keys.hasMoreElements()) {
            String key = (String) keys.nextElement();
            AddressEntry entry = (AddressEntry) table.get(key);
            out.writeUTF(key);
            entry.save(out);  // writes the value again even if another key shares it
        }
        out.close();
    }

    void restore(String fname) throws IOException {
        FileInputStream fin = new FileInputStream(fname);
        DataInputStream in = new DataInputStream(fin);

        // no record count is stored, so read until the file is exhausted
        while (fin.available() > 0) {
            String key = in.readUTF();
            AddressEntry entry = new AddressEntry(in);  // a new copy for every key
            table.put(key, entry);
        }
        in.close();
    }
}

Listing 31.2, earlier in this chapter, implemented the Serializable interface and used the standard encoding format. When you ran that example, you produced a serialized output of 333 bytes--which is 238 bytes larger than the same information content produced by the code in Listing 31.6. These extra bytes are used to convey type information. Encoded in the output stream is a complete description of the nonstatic, nontransient variables of all the classes serialized. This description includes the variable name and type. Listing the variables ensures that values are correctly restored even when new fields are added to a later version of the class. This is an important mechanism that supports version interoperability (as previously described). Listing 31.3 provided nonstandard serialization that did not include the hash table; running that example produced a serialized output of 228 bytes--133 bytes larger than the program in Listing 31.6 but 105 bytes smaller than the program in Listing 31.2. These extra 105 bytes were used to encode the hash table's class description and internal fields.
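A rough way to see where the extra bytes go is to serialize an object and search the output for its class and field names. The following sketch uses a hypothetical Contact class with the same two String fields as AddressEntry. It compares the serialized output with a hand-coded DataOutputStream encoding like the one in Listing 31.6. Exact serialized sizes vary between JDK versions, so the sketch checks only that the names are present and that the serialized form is larger.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical class holding the same data as an AddressEntry.
class Contact implements Serializable {
    String name = "Bill";
    String address = "1 Main St";
}

class OverheadDemo {
    // Object serialization: class description plus field values.
    static byte[] serialized(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hand-coded data persistence, as in Listing 31.6: field values only.
    static byte[] raw(Contact c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(c.name);
                out.writeUTF(c.address);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the ASCII text appears anywhere in the byte array.
    static boolean contains(byte[] data, String ascii) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(ascii);
    }

    public static void main(String[] args) {
        Contact c = new Contact();
        System.out.println("raw: " + raw(c).length + " bytes, serialized: "
                + serialized(c).length + " bytes");
    }
}
```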

One subtle difference between the hand-coding approach and the Serializable approach is the object that controls the encoding. In Listing 31.6, you call the save() method of the AddressBook class and pass the stream as a parameter; in this case, the AddressBook and AddressEntry classes completely control the encoding. In Listing 31.2, you call the writeObject() method of the ObjectOutputStream and pass the AddressBook as a parameter; in that case, the stream controls the encoding. If you have to change the object encoding format rather than simply changing which fields are serialized, it is much easier to change the writeObject() method of the ObjectOutputStream than change the save() method of every persistent class.
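Current JDKs let a subclass of ObjectOutputStream intercept every object it writes through the protected replaceObject() hook, which you turn on with enableReplaceObject(true). The hypothetical stream below upper-cases every String it serializes, and none of the serialized classes change. This shows that the stream, not the objects, controls the encoding. The code is a sketch of one hook, not a complete custom format.

```java
import java.io.*;
import java.util.Locale;

// Hypothetical stream subclass: every String written through it is upper-cased.
class UpperCaseObjectOutputStream extends ObjectOutputStream {
    UpperCaseObjectOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);   // activate the replaceObject() hook
    }

    @Override
    protected Object replaceObject(Object obj) {
        return (obj instanceof String) ? ((String) obj).toUpperCase(Locale.ROOT) : obj;
    }
}

class StreamControlDemo {
    // Writes obj through the custom stream and reads it back with a plain stream.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new UpperCaseObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] copy = (String[]) roundTrip(new String[] {"bill", "main st"});
        System.out.println(copy[0] + ", " + copy[1]);   // prints BILL, MAIN ST
    }
}
```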
Object Encoding Format

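Each field of a serialized class is described in the stream by a one-byte type code followed by the field's name. Primitive data types (such as int, long, and boolean) each have their own type code. The objects written to the stream, including fundamental classes such as String, are introduced by one-byte tag values. For example, this field declaration is encoded as an I followed by the UTF encoding of the field name, size: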

int size;

Table 31.3 lists the flag bytes commonly used in encoding.
Table 31.3. Common encoding flag-byte values.

Field type codes                 Stream tag values
Code   Field type                Code   Meaning
'B'    byte                      0x70   null reference (TC_NULL)
'C'    char                      0x71   reference to a previously written object (TC_REFERENCE)
'D'    double                    0x72   class descriptor (TC_CLASSDESC)
'F'    float                     0x73   new object (TC_OBJECT)
'I'    int                       0x74   string (TC_STRING)
'J'    long                      0x75   array (TC_ARRAY)
'S'    short                     0x76   class (TC_CLASS)
'Z'    boolean                   0x77   start of block data (TC_BLOCKDATA)
                                 0x78   end of block data (TC_ENDBLOCKDATA)
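The tag values are easy to observe directly. The sketch below dumps the bytes that a current JDK's ObjectOutputStream produces for a single String and for a null reference; the class name StreamBytesDemo is hypothetical. Each stream starts with the header 0xACED 0x0005, the STREAM_MAGIC and STREAM_VERSION constants in java.io.ObjectStreamConstants. The TC_STRING and TC_NULL tags from Table 31.3 follow the header.

```java
import java.io.*;

class StreamBytesDemo {
    // Writes each object with writeObject() and returns the raw stream bytes.
    static byte[] bytesOf(Object... objects) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            for (Object o : objects)
                out.writeObject(o);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data)
            sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(bytesOf("Bill")));          // AC ED 00 05 74 00 04 42 69 6C 6C
        System.out.println(hex(bytesOf((Object) null)));   // AC ED 00 05 70
    }
}
```

For "Bill", the header is followed by 0x74 (TC_STRING), the two-byte length 0x0004, and the four ASCII characters.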

Encoding Object References

The chapter31.ex6 example program in Listing 31.6 also shows one of the most common problems with hand-coded serialization routines: proper handling of object references. In this example, two names reference the single AddressEntry for Bill. The test code checks that a single AddressEntry is created. As coded, the test fails because two copies of Bill's AddressEntry will be created.

Rather than serializing an object's value (as is done in the chapter31.ex6 program), the JDK serialization methods serialize an object reference and enough information to recreate the instance. Each time writeObject() serializes a field that contains an object reference, the reference is first looked up in a hash table maintained for each ObjectOutputStream. If this instance has already been serialized to the stream, an entry is found in the table, and a TC_REFERENCE flag and reference number are written to the stream. If no entry is found, a reference number is assigned to the object, an entry is made in the hash table, and writeObject() starts outputting the fields of this object. The result is that each unique object is serialized only once to a given ObjectOutputStream.
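The following sketch, with a hypothetical SharedEntry class, checks both halves of that description on a current JDK. First, writing an instance that the stream has already seen costs only five bytes: the TC_REFERENCE tag plus a four-byte handle. Second, reading the stream back yields one shared instance rather than two copies. That sharing is exactly the behavior the hand-coded Listing 31.6 lacks.

```java
import java.io.*;

// Hypothetical entry class; two keys may refer to one instance.
class SharedEntry implements Serializable {
    String name = "Bill";
}

class SharedReferenceDemo {
    // Returns how many bytes a second writeObject() of the same instance adds.
    static int bytesForRepeat(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.flush();
            int afterFirst = bytes.size();
            out.writeObject(obj);          // already in the stream's handle table
            out.flush();
            return bytes.size() - afterFirst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes obj twice, reads two objects back, and compares their identity.
    static boolean restoredAsOneInstance(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Object first = in.readObject();
                Object second = in.readObject();
                return first == second;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```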

This completes a brief description of the internal mechanisms. In the next section, you learn about new research that aims to make object persistence even easier to use.
Introducing Persistent Stores

The persistence mechanisms specified for JDK 1.1 require you to explicitly manage the saving and restoring of objects. But just as databases can hide the details of storing, organizing, and retrieving data, a persistent store can automate the handling of persistent objects. An example of such a system is the University of Glasgow's PJava project. The following sections provide a very brief introduction to the interesting topic of persistent object stores. More information about persistent Java research projects in general, and about PJava in particular, can be found at http://www.dcs.gla.ac.uk/pjava.
Persistent Stores and Relational Databases

By far the most common type of client/server database system is the relational database (some examples are Oracle, Informix, Sybase, and DB2). Relational databases are organized in tabular data structures: tables, columns, and rows. Data from different tables can be joined to create new ways of looking at the data.

Relational databases, with their tabular data structures, do not mesh well with object-oriented programming languages for the following reasons:

Relational data structures do not provide for class encapsulation. Java programmers are encouraged to model their domain by using classes, providing an API, and hiding all data within the class. Relational structures expose all data and do not allow encapsulation by an API.

Because each Java class defines its own data type, which can contain arbitrarily complex structures, some classes may be difficult or impossible to model efficiently in a relational structure. Examples include multidimensional arrays, dictionaries, and object references.

It is difficult to represent class inheritance in a relational database. Although it is possible, deep class inheritance trees can result in n-way joins on the database server, causing poor performance.

Tools that attempt to solve the object and relational mismatch are available. These tools map relational data structures into object-oriented classes using relatively simple rules (for example, they map tables to classes, columns to attributes, and foreign key attributes to object relationships). Although some of these products have been successful, they often suffer from performance issues--particularly when complex navigation is performed through the mapped data structures. Additionally, these products limit the type expressiveness of the language because not all the data types expressible in the object-oriented language can be easily expressed in a relational database.

Persistent stores are different from relational databases. Persistent stores do the following:

Eliminate the use of relational data structures; instead, they store whole objects directly in the store (whose underlying storage may be as simple as a flat file)

Enable the programmer to write classes in a normal, object-oriented fashion to represent data that will be made persistent

Enable the programmer to take advantage of more data types than is possible when using a relational database

Provide a simpler interface than a relational database interface

Creating and Using Persistent Store Objects

Different persistent storage interfaces have different methods for creating persistent objects (or for making existing objects persistent). Some interfaces require the programmer to specify whether an object is to be persistent at the time the object is created. Other persistent stores implement a concept referred to as persistent roots. Persistent root objects are explicitly identified as persistent. Any object that can be reached from a persistent root, whether directly or through a chain of references, is also considered persistent and is saved in the persistent store. This concept is called persistence via reachability.
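Persistence via reachability amounts to a graph traversal that starts from the roots. The sketch below uses a hypothetical PNode class, which is not part of PJava or any other store's API. It collects every object reachable from a root in an identity-based set, so cycles and shared references are visited only once. A real store would save this set and skip the unreachable objects.

```java
import java.util.*;

// Hypothetical in-memory object graph node; not part of any persistent-store API.
class PNode {
    final String label;
    final List<PNode> refs = new ArrayList<>();

    PNode(String label) { this.label = label; }
}

class Reachability {
    // Breadth-first walk from the root; the result is the set of objects that
    // persistence via reachability would treat as persistent.
    static Set<PNode> reachableFrom(PNode root) {
        Set<PNode> seen = Collections.newSetFromMap(new IdentityHashMap<PNode, Boolean>());
        Deque<PNode> work = new ArrayDeque<>();
        seen.add(root);
        work.add(root);
        while (!work.isEmpty()) {
            PNode n = work.poll();
            for (PNode r : n.refs)
                if (seen.add(r))     // add() returns false for nodes already visited
                    work.add(r);
        }
        return seen;
    }
}
```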

Retrieving objects from a persistent store is significantly different from retrieving data through SQL. When using SQL, the programmer must explicitly request data (using SELECT statements). With persistent stores, programmers seldom make explicit queries for objects. Persistent stores usually provide a mechanism to request only "top-level" objects, either through direct query or through a request for a particular persistent root.

Persistent storage interfaces almost universally employ a process known as swizzling (or object faulting) to retrieve objects from the database. Objects are retrieved on the fly, as they are needed. After obtaining a reference to a top-level object, programmers normally use that object to access related objects. When attempting to access an object that has not yet been retrieved from the database, the object is swizzled in. The attempt to access the object is trapped by the database interface, which then retrieves the object's storage block from the database, restores the object, and then allows the object access to continue.
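Systems such as PJava perform swizzling transparently inside the virtual machine, so application code never sees it. The following user-level sketch only imitates the idea. A hypothetical LazyRef holds an object ID until its first get() call. At that point it faults the object in from a hypothetical CountingStore and caches it, so later accesses do not touch the store.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical lazy reference: holds only an object ID until first use.
class LazyRef<T> {
    private final long oid;
    private final Function<Long, T> loader;
    private T target;                      // null until swizzled in

    LazyRef(long oid, Function<Long, T> loader) {
        this.oid = oid;
        this.loader = loader;
    }

    T get() {
        if (target == null)
            target = loader.apply(oid);    // the "fault": fetch from the store
        return target;
    }

    boolean isLoaded() { return target != null; }
}

// Hypothetical backing store keyed by object ID; counts how often it is read.
class CountingStore {
    private final Map<Long, String> blocks = new HashMap<>();
    int loads = 0;

    void put(long oid, String value) { blocks.put(oid, value); }

    String load(long oid) {
        loads++;
        return blocks.get(oid);
    }
}
```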

Finally, persistent stores usually have a mechanism to identify objects uniquely: the object ID. Every object in a persistent store is assigned its own unique object ID, which can be used to differentiate objects of the same class whose values are equal.
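You can track identity by instance rather than by value with java.util.IdentityHashMap. The hypothetical ObjectIdTable below hands out a new ID the first time it sees an instance, much as an ObjectOutputStream assigns reference numbers. Two equal String values still receive different IDs because they are different objects.

```java
import java.util.*;

// Hypothetical ID table: one ID per distinct instance, regardless of value.
class ObjectIdTable {
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long next = 1;

    long idOf(Object obj) {
        Long id = ids.get(obj);
        if (id == null) {                  // first time this instance is seen
            id = next++;
            ids.put(obj, id);
        }
        return id;
    }
}
```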
The PJava Project

The stated goal of the PJava project is to provide orthogonal persistence in Java; that is, to create a persistent storage mechanism that can store objects of any type. Any object, without respect to type, can be made persistent. Many persistent stores and object databases do not support orthogonal persistence because it is extremely hard to implement in most programming languages.

Although the PJava project makes no changes to the Java language, the project's approach centers around a specially modified Java virtual machine that can interact with the persistent store to save and retrieve objects on an as-needed basis. Because of the special virtual machine modifications, PJava remains a research activity with no current plan for being commercially introduced.
Summary

A standard mechanism for object persistence was a significant omission from JDK 1.0 that has been partially corrected in JDK 1.1. As you have seen, use of the serialization facilities is easy and straightforward in most cases. In this chapter, you learned how to provide custom serialization and version interoperability.

However, some burden still falls on the programmer to manage the saving and restoring of objects. Ongoing research on projects such as PJava shows how even that burden can eventually be reduced by using persistent object stores.

©Copyright, Macmillan Computer Publishing. All rights reserved.


- 31 - Persistence and Java Serialization

Introducing Object Persistence

Extending an Object's Lifetime

What Does JDK 1.1 Support?

Using Java Object Serialization

The Serialization API

Table 31.1. Commonly used ObjectOutputStream constructor and methods.

Table 31.2. Commonly used ObjectInputStream constructor and methods.

Object References

Listing 31.1. chapter31.ex1: Serializing a button.

Making Your Objects Persistent

The Serializable and Externalizable Interfaces

Suitability Tests

The Externalizable Interface

Implementing Serializable: The Default Case

Listing 31.2. The serializable AddressBook class.

Setting Up Custom Serialization

Listing 31.3. Customizing the serialization format.

Validating the Restored Object

Listing 31.4. Validating deserialized objects.

Supporting Class Versioning

Listing 31.5. Interoperating with an older program version.

Looking Inside Object Serialization

Comparison of Encoding Sizes

Listing 31.6. The AddressBook example using save() and restore().

Object Encoding Format

Table 31.3. Common encoding flag-byte values.

Encoding Object References

Introducing Persistent Stores

Persistent Stores and Relational Databases

Creating and Using Persistent Store Objects

The PJava Project

Summary
