Object Persistence & Distribution - Online Article

If you're developing Internet or intranet software, two of the most interesting new features of the final JDK 1.1 release are object serialization and the new Remote Method Invocation (RMI) API. Developers will soon be able to use these two techniques to let objects on one Java Virtual Machine invoke methods on objects in another, a real boon to anyone developing applications for multiplatform, distributed computing. They'll also make it easy for one applet or Java application to communicate with others, whether the other code is running on a machine down the hall or on one across an ocean, connected only by the Internet. Finally, serialization will let you store and retrieve objects, and read them from and write them to streams as easily as numbers and characters.

This article is the first of a multipart series exploring these two new JDK 1.1 features. Based on the idea that you have to walk before you can run, this article explains the basics of serialization and how you can use it to handle objects in a distributed environment.

Who Needs Serialization?

In most applications, data persistence is handled either by using text files or commercial databases, depending on the complexity of the application, and the availability of budgetary and programmer resources. For simple applications, text files can work well. They are flexible, relatively straightforward to work with, and not limited to use by one program. However, text files are not object friendly. When the file format becomes more complex than a simple table, or a parameter list--which is often the case for object-oriented applications--the code for managing such files can become unwieldy, and consume valuable programmer time.

At the other end of the spectrum, relational and object databases work well for programs that require the special features that databases offer: transactions with rollback, record locking, indexing and the like. But they are generally expensive, can be difficult to manage, and are often overkill. Many project managers tend to equate persistence with databases: If a design requirement states that data must be saved, then it is often assumed that a database must be used. In many cases, all that is needed is an object-oriented file format that is well-integrated with the programming environment.

A similar situation exists in the less frequented world of distributed programming. Sockets are flexible and easy to use, much like files, but have the same problems when transmitting complex formatted data. Distributed object middleware based on CORBA has facilities for transmitting objects, but is a rather expensive solution.

Java object serialization provides a great medium-weight solution for saving objects to files and sending them over a network. Even for large projects that do use commercial databases or communications middleware, it can still be used as a valuable file format for auxiliary files or miscellaneous communication. In addition, the Java Remote Method Invocation and JavaBeans APIs both use object serialization for storing and communicating with objects. So in any Java application involving persistence or distribution, object serialization can be a powerful programming tool.

The design of object serialization allows for most common cases to be handled easily, but there are also many features that allow it to be scaled up to handle complex tasks. This article focuses on those aspects that will be most commonly used. There are two examples to show how objects can be easily saved to files, and a third showing how to use sockets to send objects over a network.

For a complete reference on serialization, particularly for those who want to get into the guts of how serialization is implemented, or explore its outer reaches, refer to the Java Object Serialization Specification.

How Does Serialization Work?

Java object serialization allows objects to be easily written to and read from streams, such as file streams or socket data streams. This gives programmers a quick way to save individual objects or large structures of objects to files, or to send them over network connections.

From the programmer's perspective, most of the work is done automatically. The serialization mechanism keeps track of the types of objects, the references between them, and many details about how data is stored. The serialization API is structured so that the most common cases can be handled very simply, while allowing for incremental increases in customization for more complex cases.

Example 1: How to Write Objects to a Stream


The best way to understand the concepts behind serialization is to see an example at work.

In many cases, all a class needs to do to become serializable is add implements Serializable to its class declaration. Here is a simple tree node that does so:

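import java.io.Serializable;
import java.util.Enumeration;
import java.util.Vector;

// A simple tree node; implementing Serializable is all it needs to do.
class TreeNode implements Serializable {
  Vector children;   // the child TreeNodes of this node
  TreeNode parent;   // null for the top of the tree
  String name;

  public TreeNode(String s) {
    children = new Vector(5);
    name = s;
  }

  // Adds a child and points it back at this node.
  public void addChild(TreeNode n) {
    children.addElement(n);
    n.parent = this;
  }

  // Prints this node's name followed by all of its children.
  public String toString() {
    Enumeration e = children.elements();
    StringBuffer buff = new StringBuffer(100);
    buff.append("[ " + name + " : ");
    while(e.hasMoreElements()) {
      buff.append(e.nextElement().toString());
    }
    buff.append(" ] ");
    return buff.toString();
  }
}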


The Serializable interface has no methods that must be implemented; however, any data fields of a class that implements this interface must also be serializable (unless they are declared transient, as described later). The references to other TreeNode objects, such as the one held by the parent field or those stored as children, are serializable because TreeNode itself implements Serializable. Primitive types such as int, arrays, and many core JDK 1.1 classes such as String, Vector, and Hashtable are serializable. Thus the Vector and String data fields in the TreeNode class are serializable as part of the JDK 1.1.
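To see the rule about serializable fields in action, the following minimal sketch tries to write an object that holds a field of a non-serializable type. The class names Handle, Holder, and FieldCheckDemo are hypothetical, and the code assumes a current JDK (it uses try-with-resources) rather than JDK 1.1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper type that does NOT implement Serializable.
class Handle {
    int id = 7;
}

// Hypothetical serializable class that holds a non-serializable field.
class Holder implements Serializable {
    String label = "holder";
    Handle handle = new Handle();
}

class FieldCheckDemo {
    // Returns true if writing the object fails with NotSerializableException.
    static boolean failsToSerialize(Object o) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(failsToSerialize(new Holder()));
        System.out.println(failsToSerialize("plain string"));
    }
}
```

Declaring the handle field transient, or making Handle implement Serializable, would be two ways to let the write succeed.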

Here is a snippet of code to build a tree of TreeNodes:

  TreeNode top = new TreeNode("top");
  top.addChild(new TreeNode("left child"));
  top.addChild(new TreeNode("right child"));

The work of writing an object to a stream is done by the ObjectOutputStream class, which implements the generic ObjectOutput interface. This interface is an extension of the DataOutput interface that adds the capability of writing objects, as well as basic types like strings and numbers.

An ObjectOutputStream is created on top of some other OutputStream, such as a file stream or a socket. Here is an example of creating one from a file stream:

FileOutputStream fOut = 
  new FileOutputStream("test.out");
ObjectOutput out = new ObjectOutputStream(fOut);

The programmer can then use the writeObject method to write an object onto the stream:

out.writeObject(top);

Flushing and closing the streams completes the job:

out.flush();
out.close();

The ObjectOutputStream writes the given object to the stream, as well as any other objects reachable from that object. Here, that means the entire tree is written. When writing an object to a stream, the ObjectOutputStream keeps track of references between objects, so that complex data structures are maintained. The ObjectOutputStream also keeps track of the types of objects, so that they can be read back from the stream correctly.

How to Read Objects from a Stream

To read an object from a stream, use the ObjectInputStream class, which implements the ObjectInput interface. Just as with ObjectOutputStream, an ObjectInputStream is created from an InputStream. The readObject method is used to read an object from a stream:

FileInputStream fIn = new FileInputStream("test.out");
ObjectInputStream in = new ObjectInputStream(fIn);
TreeNode n = (TreeNode)in.readObject();

For each object that is detected in the stream, a new one is created in memory and its data fields filled in with data from the stream. This includes restoring references between the objects stored in the stream.
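As a rough illustration of how references are restored, the sketch below writes two objects that point at each other and reads them back through an in-memory byte array. Pair and GraphRoundTripDemo are hypothetical names, and a current JDK is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: two Pair objects can refer to each other.
class Pair implements Serializable {
    String name;
    Pair partner;

    Pair(String name) {
        this.name = name;
    }
}

class GraphRoundTripDemo {
    // Serializes an object to a byte array and reads it straight back.
    static Object roundTrip(Object o)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Pair a = new Pair("a");
        Pair b = new Pair("b");
        a.partner = b;
        b.partner = a;   // a cycle: a -> b -> a

        Pair copy = (Pair) roundTrip(a);
        // The copy is a new object, but the cycle is intact.
        System.out.println(copy != a);
        System.out.println(copy.partner.name);
        System.out.println(copy.partner.partner == copy);
    }
}
```

The cycle is written only once; the second time the stream meets an object it has already written, it records a back reference instead of writing the object again.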

Note that readObject returns an Object that must be cast to a TreeNode before it can be used as a TreeNode. How do you know what the real type of the object is? Well, the answer is that either you just know, as is the case here, or you can query the returned Object for more information about itself, using the getClass method or some of the new reflection capabilities in the JDK 1.1.
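When the type is not known in advance, one possible approach is to test the returned Object before casting it. The sketch below writes three objects of different types and describes each one as it is read back; TypeCheckDemo and its methods are hypothetical names, and a current JDK is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Vector;

class TypeCheckDemo {
    // Inspects an object of unknown type before casting it.
    static String describe(Object o) {
        if (o instanceof String) {
            return "String: " + o;
        } else if (o instanceof Vector) {
            return "Vector with " + ((Vector<?>) o).size() + " elements";
        }
        // Fall back to the runtime class name.
        return "other: " + o.getClass().getName();
    }

    // Writes three objects of different types, then describes each on read.
    static String[] writeThenDescribe()
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            Vector<String> v = new Vector<>();
            v.add("x");
            v.add("y");
            out.writeObject("hello");
            out.writeObject(v);
            out.writeObject(Integer.valueOf(42));
        }
        String[] result = new String[3];
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            for (int i = 0; i < result.length; i++) {
                result[i] = describe(in.readObject());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : writeThenDescribe()) {
            System.out.println(line);
        }
    }
}
```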

Options for Custom Streaming

With the Serializable interface, most of the work of serializing objects is done automatically. However, the programmer has several options for customization:

The transient keyword can be used when defining data fields to keep those fields from being written to a stream. When an object is read from a stream, transient data fields are set to their default values, such as 0 for integers and null for strings. The programmer can restore transient data by implementing a readObject method, as described below.
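A minimal sketch of the transient behavior described above follows. The Login class, its fields, and TransientDemo are hypothetical, and a current JDK is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with one persistent and two transient fields.
class Login implements Serializable {
    String user;
    transient String password;   // never written to the stream
    transient int attempts;      // never written to the stream

    Login(String user, String password, int attempts) {
        this.user = user;
        this.password = password;
        this.attempts = attempts;
    }
}

class TransientDemo {
    // Writes the object to a byte array and reads a copy back.
    static Login roundTrip(Login login)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(login);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Login) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Login copy = roundTrip(new Login("ann", "secret", 3));
        // user survives; the transient fields come back as null and 0.
        System.out.println(copy.user + " " + copy.password + " " + copy.attempts);
    }
}
```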

An object can get further control over its serialization by implementing writeObject and readObject methods. This can be useful for writing a customized version of the object, or simply writing additional data to a stream.

When ObjectOutputStream writes an object to a stream, it looks for a writeObject method in the object. If there is one, the ObjectOutputStream uses that to write the object to the stream. An object that implements writeObject can use the defaultWriteObject method of ObjectOutputStream to write the default representation out, then can use other ObjectOutput methods such as writeChar or writeDouble to write more data out.

On the reader side of things, the programmer can implement a readObject method to gain control over how an object is read from a stream. This method is called on a new object just after it has been instantiated in memory, but before its data has been read from the stream. Similarly to defaultWriteObject, the defaultReadObject method of ObjectInputStream can be used to read in the default representation of the object.

In both cases, these methods are only responsible for writing and reading data for the object in which they are defined, not for subclasses and superclasses.
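The division of labor between a superclass and a subclass can be sketched as follows. Each class handles only the fields it declares: Shape recomputes its own transient field, while Circle writes and reads one extra value of its own. The class names and fields are hypothetical, and a current JDK is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical superclass: restores its own derived, transient field.
class Shape implements Serializable {
    String label;
    transient int labelLength;

    Shape(String label) {
        this.label = label;
        this.labelLength = label.length();
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();          // reads label only
        labelLength = label.length();    // recompute the transient field
    }
}

// Hypothetical subclass: writes one extra value after its default data.
class Circle extends Shape {
    double radius;
    transient String note;

    Circle(String label, double radius, String note) {
        super(label);
        this.radius = radius;
        this.note = note;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();        // writes radius only
        out.writeUTF(note);              // extra data for this class
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();          // reads radius only
        note = in.readUTF();             // extra data, same order as written
    }
}

class HierarchyDemo {
    // Writes the circle to a byte array and reads a copy back.
    static Circle roundTrip(Circle c)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Circle) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Circle copy = roundTrip(new Circle("circle", 2.0, "extra"));
        System.out.println(copy.label + " " + copy.labelLength
            + " " + copy.radius + " " + copy.note);
    }
}
```

Both methods are private, so neither overrides the other; the serialization mechanism calls each one in turn for its own level of the class hierarchy.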

Object Validation

An object that implements the method readObject can register itself with the ObjectInputStream to have a validation method called after the entire object graph has been read from a stream. Only the method readObject can register a validation method. For the first example, this would mean that the validation method would be called after the entire tree has been reconstructed in memory, but before the call to ObjectInputStream.readObject has returned. The validation method is defined in the ObjectInputValidation interface as a simple validateObject method. This allows an object to do extra work, such as checking for internal consistency errors between objects, that cannot be done inside the readObject method. Example 2, below, shows you how to do this.

The Externalizable Interface

With the Externalizable interface, the programmer takes full responsibility for reading and writing the object from a stream, including subclass and superclass data. This allows for complete flexibility, such as when a data format has already been defined, or the programmer has a specific format in mind. It also requires more programming, which is beyond the scope of this article, but might be an interesting topic for a future article.
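For a flavor of what that extra programming looks like, here is a minimal sketch of an Externalizable class that writes and reads its own three fields. Point3 and ExternalizableDemo are hypothetical names, and a current JDK is assumed. Note that an Externalizable class needs a public no-argument constructor, which the stream calls before readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class that takes full control of its stream format.
class Point3 implements Externalizable {
    int x, y, z;

    // Required: called by the stream before readExternal.
    public Point3() {
    }

    Point3(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
        out.writeInt(z);
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        // Fields must be read in the order they were written.
        x = in.readInt();
        y = in.readInt();
        z = in.readInt();
    }
}

class ExternalizableDemo {
    // Writes the point to a byte array and reads a copy back.
    static Point3 roundTrip(Point3 p)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point3) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point3 copy = roundTrip(new Point3(1, 2, 3));
        System.out.println(copy.x + "," + copy.y + "," + copy.z);
    }
}
```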

Example 2: Using Validation

In this example, two transient data fields, timeStamp and magicNumber, are added to the TreeNode. The timeStamp holds the date when the object is instantiated. It will be updated whenever the object is read in from a stream, so there is no point in making it persistent. The magicNumber keeps track of the number of times the object has been read in from a stream, and is used to demonstrate how to read and write extra data to an object stream along with an object's default data. In addition, the validation callback is used to validate the object references in the tree.

When an object is written to a stream and then read back, the object returned by ObjectInputStream.readObject is not the same object that was originally written, but a new clone that is built from the data written to the stream. For this reason, identity issues become more complex when objects are in a persistent or distributed environment, and have to be considered in the design of the program.

The magicNumber field in this example indicates the generation of a TreeNode clone being read in. If the magicNumber of a TreeNode is zero, then it was instantiated using the new operator. If it is one or greater, then it has been read in from an object stream.

The following code snippet shows the declarations for the two new fields, as well as the addition of the implements ObjectInputValidation clause, and the updated constructor.

class TreeNode
  implements Serializable, ObjectInputValidation {
  ...
  /* the timeStamp will mark when this object is
     instantiated or read from a stream */
  transient Date timeStamp;
  /* the magic number will keep track of how many times
     a node has been read in from a stream */
  transient int magicNumber = 0;

  public TreeNode(String s) {
    ...
    timeStamp = new Date();
  }
  ...
}

Next is the implementation of the validateObject method. This method checks the structure of the tree by checking that each child of a tree node has that node as its parent. In practice, you can use validation methods to determine whether any errors have occurred in the stream, or to handle versioning issues (discussed later in this article).

  // check that each child has
  // this object as its parent
  public void validateObject()
    throws InvalidObjectException {
    Enumeration e = children.elements();
    while(e.hasMoreElements()) {
      if(((TreeNode)e.nextElement()).parent != this)
        throw new InvalidObjectException(
          getClass().getName());
    }
  }

The writeObject and readObject methods complete the additions to the code:

  // write the default fields, then append the magic number;
  // errors propagate to the caller instead of being swallowed
  private void writeObject(ObjectOutputStream stream)
    throws IOException {
    stream.defaultWriteObject();
    stream.writeInt(magicNumber);
  }

  // register the validation callback, read the default fields,
  // then restore the transient data
  private void readObject(ObjectInputStream stream)
    throws IOException, ClassNotFoundException {
    stream.registerValidation(this, 0);
    stream.defaultReadObject();
    magicNumber = stream.readInt() + 1;
    timeStamp = new Date();  // refreshed each time the node is read in
  }

In the writeObject method, the defaultWriteObject method is used to write out the default representation of the object. After that, the magic number is written to the object stream. This shows how to gain control over how an object is written to the stream. Because an ObjectOutputStream implements the DataOutput interface, regular data types such as Strings and ints can be written out along with objects.

The implementation of the readObject method uses the ObjectInputStream.registerValidation method to register the TreeNode's validateObject method as a callback. The first argument to the call (in this case the TreeNode being read in) is the object on which the validateObject method is called. The second argument is a priority code used to order multiple validation callbacks. Callbacks with a higher priority are called first.
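The effect of the priority argument can be sketched with two objects that register themselves at different priorities and record the order in which they are validated. The Checked class, its static LOG list, and ValidationOrderDemo are hypothetical and exist only for this demonstration; a current JDK is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that logs when its validation callback runs.
class Checked implements Serializable, ObjectInputValidation {
    // Shared log of validation calls, for demonstration only.
    static final List<String> LOG = new ArrayList<>();

    String name;
    int priority;

    Checked(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();                  // fills in name and priority
        in.registerValidation(this, priority);   // must be called from readObject
    }

    @Override
    public void validateObject() throws InvalidObjectException {
        LOG.add(name);
    }
}

class ValidationOrderDemo {
    // Returns the names in the order their callbacks were invoked.
    static List<String> validationOrder()
            throws IOException, ClassNotFoundException {
        Checked.LOG.clear();
        // The low-priority object is written (and so registered) first.
        Checked[] items = { new Checked("low", 1), new Checked("high", 10) };

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(items);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();   // callbacks run before this call returns
        }
        return new ArrayList<>(Checked.LOG);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validationOrder());
    }
}
```

Even though the low-priority object registers first, the higher-priority callback runs first.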

After the callback is registered, the defaultReadObject method is used to read in the TreeNode object, then the magicNumber is read in as an int, incremented, and assigned to the magicNumber data field.

The completed program is an application that creates a tree, prints it to System.out, writes it to a file, reads it back in from the file, and then prints it to System.out again.

Versioning Issues

One of the more difficult problems that any object-oriented persistence mechanism must deal with is versioning objects. Invariably, the class definitions for all objects change over time, and this means that at some point, the serialization mechanism may have to read in an object whose structure is out of date compared with the current version of the class it belongs to.

The Java Object Serialization Specification outlines the numerous cases that can occur. Large-scale changes to an object, or changes in its location in an object hierarchy, require the programmer to convert out-of-date objects manually, but some cases can be handled automatically, or nearly so.

For example, one common case is when new data fields have been added to a class. In this case the new object is created, the data from the old version of the object is read into the appropriate data fields, and the new fields are set to their default values. The class can do further initialization by implementing a readObject method or using a validation callback.

Using Sockets to Distribute Objects

Because the input and output streams of a socket look just like file streams, ObjectOutputStream and ObjectInputStream can write and read objects to them just as easily. This example takes the tree from the first example, and shows how to send it from a client application to a server using a socket connection. The server then adds a new node to the tree and sends it back to the client.

Using object serialization over a socket connection allows two Java applications to easily communicate with each other in terms of objects, rather than characters, making it easier to get a distributed application up and running. The communication link could also be between an applet and a server application running concurrently with a web server, greatly extending the capabilities of the applet.

For convenience, this example assumes both client and server are on the same machine. The code can be easily modified to work with programs located on different machines.

The client starts communication by opening a socket to the server, the host name and port number having been set earlier in the program.

    client = new Socket(host, port);

Next the client constructs a tree, as in the first example, then writes the tree to the output stream of the socket.

TreeNode top = new TreeNode("top");
top.addChild(new TreeNode("left child"));
top.addChild(new TreeNode("right child"));
ObjectOutput out =
  new ObjectOutputStream(client.getOutputStream());
out.writeObject(top);
out.flush();

In the server program, after a socket connection has been accepted, and input and output streams are established, the tree is read in from the input, a new node is added, and the tree is written back out again:

socket = server.accept();
out = new ObjectOutputStream(socket.getOutputStream());
in = new ObjectInputStream(socket.getInputStream());
TreeNode n = (TreeNode)in.readObject();
n.addChild(new TreeNode("server node"));
out.writeObject(n);
out.flush();

Back on the client side, the tree returned by the server is read back in from the socket and printed to System.out.

ObjectInputStream in =
  new ObjectInputStream(client.getInputStream());
TreeNode n = (TreeNode)in.readObject();
System.out.println("read tree: \n" + n.toString());
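The client and server fragments above can be combined into one self-contained sketch that runs both ends in a single process over the loopback interface. To keep it short, it sends an ArrayList of strings rather than the TreeNode class; SocketRoundTripDemo, serveOnce, and exchange are hypothetical names, and a current JDK is assumed. Each side flushes its ObjectOutputStream before opening its ObjectInputStream, because the ObjectInputStream constructor blocks until it has read the stream header sent by the other end.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class SocketRoundTripDemo {

    // Server side: read a list, append one entry, and send it back.
    static void serveOnce(ServerSocket server) {
        try (Socket socket = server.accept()) {
            ObjectOutputStream out =
                new ObjectOutputStream(socket.getOutputStream());
            out.flush();   // push the stream header before waiting for input
            ObjectInputStream in =
                new ObjectInputStream(socket.getInputStream());
            @SuppressWarnings("unchecked")
            List<String> names = (List<String>) in.readObject();
            names.add("server node");
            out.writeObject(names);
            out.flush();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Client side: send the list and return the server's reply.
    static List<String> exchange(ArrayList<String> names)
            throws IOException, ClassNotFoundException {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            new Thread(() -> serveOnce(server)).start();

            try (Socket client = new Socket(
                     InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                ObjectOutputStream out =
                    new ObjectOutputStream(client.getOutputStream());
                out.writeObject(names);
                out.flush();
                ObjectInputStream in =
                    new ObjectInputStream(client.getInputStream());
                @SuppressWarnings("unchecked")
                List<String> reply = (List<String>) in.readObject();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> names = new ArrayList<>();
        names.add("left child");
        names.add("right child");
        System.out.println("read list: " + exchange(names));
    }
}
```

As with the tree, the list the client gets back is a new object built from the stream, not the one it sent.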

Security Issues

One aspect of serialization that requires special consideration, particularly when using sockets, is security. A serialized object traveling across the internet is subject to the same privacy violations as email or any other unencrypted communication. It may be read by unintended parties, or it may be tampered with while in transit.

In general, sensitive data in serializable objects, such as file descriptors, or other handles to system resources, should be made both private and transient. This prevents the data from being written when the object is serialized. And when the object is read back from a stream, only the originating class can assign a value to the private data field. A validation callback can also be used to check the integrity of a group of objects when they are read from a stream.

The best overall way to avoid security problems is to encrypt the serialization stream, ensuring both privacy and integrity. This can be done on an object basis by implementing custom readObject and writeObject methods, or by using the Externalizable interface. For a global solution, the classes ObjectInputStream and ObjectOutputStream can be customized to encrypt the entire object stream.
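One way to approximate whole-stream encryption without subclassing the object stream classes is to wrap the underlying byte stream in the javax.crypto cipher streams, which is the technique sketched below. It is a swapped-in illustration rather than the customization described above, and it assumes a current JDK with AES/GCM support. EncryptedStreamDemo and its methods are hypothetical names, and key distribution is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EncryptedStreamDemo {

    // Builds an AES/GCM cipher for the given mode, key, and IV.
    static Cipher cipher(int mode, SecretKey key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, key, new GCMParameterSpec(128, iv));
        return c;
    }

    // Serializes the object through an encrypting stream.
    static byte[] seal(Serializable o, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(
                 new CipherOutputStream(
                     bytes, cipher(Cipher.ENCRYPT_MODE, key, iv)))) {
            out.writeObject(o);
        }   // closing the stream finishes the cipher operation
        return bytes.toByteArray();
    }

    // Decrypts the bytes and deserializes the object they contain.
    static Object open(byte[] data, SecretKey key, byte[] iv)
            throws IOException, ClassNotFoundException,
                   GeneralSecurityException {
        try (ObjectInputStream in = new ObjectInputStream(
                 new CipherInputStream(
                     new ByteArrayInputStream(data),
                     cipher(Cipher.DECRYPT_MODE, key, iv)))) {
            return in.readObject();
        }
    }

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    static byte[] newIv() {
        byte[] iv = new byte[12];   // 12 bytes is the usual GCM IV size
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] iv = newIv();
        byte[] data = seal("secret tree", key, iv);
        System.out.println(open(data, key, iv));
    }
}
```

An authenticated mode such as GCM is chosen here because it is designed to detect tampering as well as hide the contents; a fresh IV should be used for every stream written with the same key.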
