URL Class - Tutorial - Java Programming Language

Bringing this down to a more concrete level is the Java URL class. The URL class represents a URL address and provides a simple API for accessing web resources, such as documents and apps on servers. It can use an extensible set of protocol and content handlers to perform the necessary communication and even data conversion. With the URL class, an app can open a connection to a server on the network and retrieve content with just a few lines of code. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying your apps. A URL is represented by an instance of the java.net.URL class. A URL object manages all the component information within a URL string and provides methods for retrieving the object it identifies. We can construct a URL object from a URL string or from its component parts:

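try {
 URL aDoc =
  new URL( "http://foo.bar.com/documents/homepage.html" );
 // The file component needs its leading slash to name the same resource
 URL sameDoc =
  new URL( "http", "foo.bar.com", "/documents/homepage.html" );
}
catch ( MalformedURLException e ) {
 // Thrown if the protocol is missing or no handler is found for it
}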

These two URL objects point to the same network resource, the homepage.html document on the server foo.bar.com. Whether the resource actually exists and is available isn't known until we try to access it. When initially constructed, the URL object contains only data about the object's location and how to access it. No connection to the server has been made.

We can examine the various parts of the URL with the getProtocol( ), getHost( ), and getFile( ) methods. We can also compare it to another URL with the sameFile( ) method (which has an unfortunate name for something which may not point to a file). sameFile( ) determines whether two URLs point to the same resource. It can be fooled, but sameFile( ) does more than compare the URL strings for equality; it takes into account the possibility that one server may have several names as well as other factors. (It doesn't go as far as to fetch the resources and compare them, however.)

When a URL is created, its specification is parsed to identify just the protocol component. If the protocol doesn't make sense, or if Java can't find a protocol handler for it, the URL constructor throws a MalformedURLException. A protocol handler is a Java class that implements the communications protocol for accessing the URL resource. For example, given an http URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified web server. As of Java 5.0, URL protocol handlers are guaranteed to be provided for http, https (secure HTTP), and ftp as well as local file URLs and jar URLs that refer to files inside JAR archives. Outside of that, it gets a little dicey. We'll talk more about the issues surrounding content and protocol handlers a bit later in this chapter.
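As a minimal sketch of the accessor methods and the protocol check described above, the following class pulls a URL apart and tests whether a protocol handler can be found. The class name UrlParts, its helper methods, and the made-up protocol name nosuchproto are assumptions chosen for illustration; they are not part of the Java API.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {
    // Joins the protocol, host, and file components of a URL string with " | ".
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return url.getProtocol() + " | " + url.getHost() + " | " + url.getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable URL: " + spec, e);
        }
    }

    // Returns true if a URL object can be constructed, which requires
    // a recognizable protocol with an available protocol handler.
    static boolean hasHandler(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(describe("http://foo.bar.com/documents/homepage.html"));
        System.out.println(hasHandler("nosuchproto://foo.bar.com/"));

        // sameFile( ) leaves the fragment (the part after '#') out of the comparison.
        URL page = new URL("http://foo.bar.com/documents/homepage.html");
        URL section = new URL("http://foo.bar.com/documents/homepage.html#intro");
        System.out.println(page.sameFile(section));
    }
}
```

Nothing here opens a connection to fetch the document; constructing the URL only parses the string and looks up a handler. Note that sameFile( ) may try to resolve host names, so it can touch the name service even though it never retrieves the resource.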

Stream Data

The lowest level and most general way to get data back from a URL is to ask for an InputStream from the URL by calling openStream( ). Getting the data as a stream may also be useful if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of the byte stream yourself. Working in this mode is basically the same as working with a byte stream from socket communications, but the URL protocol handler has already dealt with all of the server communications and is providing you with just the content portion of the transaction. Not all types of URLs support the openStream( ) method because not all types of URLs refer to concrete data; you'll get an UnknownServiceException if the URL doesn't. The following code prints the contents of an HTML file on a web server:

try {
 URL url = new URL("http://server/index.html");
 BufferedReader bin = new BufferedReader (
  new InputStreamReader( url.openStream( ) ));
 String line;
 while ( (line = bin.readLine( )) != null )
  System.out.println( line );
 bin.close( );
} catch (Exception e) {
 e.printStackTrace( ); // report the failure instead of silently ignoring it
}

We ask for an InputStream with openStream( ) and wrap it in a BufferedReader to read the lines of text. Because we specify the http protocol in the URL, we enlist the services of an HTTP protocol handler. Note that we haven't talked about content handlers yet. In this case, since we're reading directly from the input stream, no content handler (no transformation of the content data) is involved.

One note about applets. In the applet environment, you typically have additional security restrictions that limit the URLs to which you may communicate. To be sure that you can access the specified URL and use the correct protocol handler, you should construct URLs relative to the base URL that identifies the applet's codebase, the location of the applet code. This ensures that any data you load comes via the same protocol and from the same server as your applet. For example:

new URL( getCodeBase( ), "foo/bar.gif" );

Alternatively, if you are just trying to get data files or media associated with an applet, there is a more general way; see the discussion of getResource( ).
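Returning to openStream( ): the stream technique from this section can be tried without a web server by pointing a file URL at a local document. The sketch below assumes the content is UTF-8 text; the class name StreamDemo and the helpers readAll( ) and sampleFileUrl( ) are hypothetical names introduced for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamDemo {
    // Reads everything the URL's stream provides, line by line, as UTF-8 text.
    static String readAll(URL url) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) for us.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Writes the content to a temporary file and returns a file: URL for it.
    static URL sampleFileUrl(String content) {
        try {
            Path tmp = Files.createTempFile("sample", ".html");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            return tmp.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URL url = sampleFileUrl("<html>\n<body>Hello</body>\n</html>\n");
        System.out.print(readAll(url));
    }
}
```

The reading loop is the same one used for the http example; only the protocol handler behind openStream( ) differs.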

Getting the Content as an Object

As we said previously, reading raw content from a stream is the most general mechanism for accessing data over the Web. openStream( ) leaves the parsing of data up to you. The URL class, however, supports a more sophisticated, pluggable, content-handling mechanism that we'll discuss now, but be aware that it is not widely used because of a lack of standardization and limitations in how you can deploy new handlers. Although the Java community made some progress in Java 5.0 in standardizing a small set of protocol handlers, no such effort was made to standardize content handlers. This means that although this part of the discussion is interesting, its usefulness is limited.

The way it's supposed to work is that when Java knows the type of content being retrieved from a URL, and a proper content handler is available, you can retrieve the URL's content as an appropriate Java object by calling the URL's getContent( ) method. In this mode of operation, getContent( ) initiates a connection to the host, fetches the data for you, determines the type of data, and then invokes a content handler to turn the bytes into a Java object. It acts sort of as if you had read a serialized Java object. Java will try to determine the type of the content by looking at its MIME type, its file extension, or even by examining the bytes directly.

For example, given the URL http://foo.bar.com/index.html, a call to getContent( ) uses the HTTP protocol handler to retrieve data and might use an HTML content handler to turn the data into an appropriate document object. Similarly, a GIF file might be turned into an AWT Image or an ImageProducer object using a GIF content handler. If we access the GIF file using an FTP URL, Java would use the same content handler but a different protocol handler to receive the data. Since the content handler must be able to return any type of object, the return type of getContent( ) is Object. This might leave us wondering what kind of object we got. In a moment, we'll describe how we could ask the protocol handler about the object's MIME type. Based on this, and whatever other knowledge we have about the kind of object we are expecting, we can cast the Object to its appropriate, more specific type. For example, if we expect an image, we might cast the result of getContent( ) to ImageProducer:

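try {
 ImageProducer ip = (ImageProducer)myURL.getContent( );
} catch ( ClassCastException e ) { ... } // the content was not an ImageProducer
  catch ( IOException e ) { ... } // the data could not be retrieved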

Various kinds of errors can occur when trying to retrieve the data. For example, getContent( ) can throw an IOException if there is a communications error. Other kinds of errors can occur at the app level: some knowledge of how the app-specific content and protocol handlers deal with errors is necessary. One problem that could arise is that a content handler for the data's MIME type wouldn't be available. In this case, getContent( ) invokes a special "unknown type" handler that returns the data as a raw InputStream (back to square one). In some situations, we may also need knowledge of the protocol handler. For example, consider a URL that refers to a nonexistent file on an HTTP server. When requested, the server returns the familiar "404 Not Found" message. To deal with protocol-specific operations like this, we may need to talk to the protocol handler, which we'll discuss next.
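Because the type of object returned by getContent( ) isn't known in advance, testing the result with instanceof is a safer pattern than casting blindly. The following is a sketch only: ContentDemo, kindOf( ), and sampleUrl( ) are hypothetical names, and the example uses a local temporary file with a .bin suffix on the assumption that no content handler is registered for that kind of data, so the fallback behavior described above applies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

class ContentDemo {
    // Calls getContent( ) and reports, in broad terms, what kind of object came back.
    static String kindOf(URL url) {
        try {
            Object content = url.getContent();
            if (content instanceof InputStream) {
                // No specific content handler applied; we got the data as a stream.
                ((InputStream) content).close();
                return "raw stream";
            }
            return content == null ? "nothing" : content.getClass().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a small binary file and returns a file: URL that refers to it.
    static URL sampleUrl() {
        try {
            Path tmp = Files.createTempFile("sample", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            return tmp.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOf(sampleUrl()));
    }
}
```

If a content handler is available for the data, kindOf( ) returns the class name of whatever object the handler produced, which is exactly the app-level knowledge you need before attempting a cast.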

Managing Connections

Upon calling openStream( ) or getContent( ) on a URL, the protocol handler is consulted and a connection is made to the remote server or location. Connections are represented by a URLConnection object, subtypes of which manage different protocol-specific communications and offer additional metadata about the source. The HttpURLConnection class, for example, handles basic web requests and also adds some HTTP-specific capabilities such as interpreting "404 Not Found" messages and other web server errors. We'll talk more about HttpURLConnection later in this chapter. We can get a URLConnection from our URL directly with the openConnection( ) method. One of the things we can do with the URLConnection is ask for the object's content type, before reading data. For example:

URLConnection connection = myURL.openConnection( );
String mimeType = connection.getContentType( );
InputStream in = connection.getInputStream( );

Despite its name, a URLConnection object is initially created in a raw, unconnected state. In this example, the network connection was not actually initiated until we called the getContentType( ) method. The URLConnection does not talk to the source until data is requested or its connect( ) method is explicitly invoked. Prior to connection, network parameters and protocol-specific features can be set up. For example, as of Java 5.0, we can set timeouts on the initial connection to the server and on reads:

URLConnection connection = myURL.openConnection( );
connection.setConnectTimeout( 10000 ); // milliseconds
connection.setReadTimeout( 10000 ); // milliseconds
InputStream in = connection.getInputStream( );

As we'll see in the section "Using the POST Method," by casting the URLConnection to its specific subtype we can get at the protocol-specific information.
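Putting the pieces of this section together, here is a rough sketch that configures a connection before any network traffic occurs and then checks for the HTTP-specific subtype. ConnectionDemo and prepare( ) are hypothetical names; the host is the placeholder used earlier in this chapter, so the request in main( ) is not expected to succeed as written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class ConnectionDemo {
    // Opens an unconnected URLConnection and sets both timeouts before any network I/O.
    static URLConnection prepare(String spec, int timeoutMillis) {
        try {
            URLConnection connection = new URL(spec).openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection = prepare("http://foo.bar.com/index.html", 10000);
        System.out.println(connection.getConnectTimeout());

        if (connection instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) connection;
            // getResponseCode( ) contacts the server, so this call needs network access.
            if (http.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND) {
                System.out.println("404 Not Found");
            }
            http.disconnect();
        }
    }
}
```

Calling prepare( ) alone never contacts the server; the connection stays in its unconnected state until a method such as getResponseCode( ) or getInputStream( ) asks for data.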

Handlers in Practice

The content- and protocol-handler mechanisms we've described are very flexible; to handle new types of URLs, you need only add the appropriate handler classes. One interesting application of this would be Java-based web browsers that could handle new and specialized kinds of URLs by downloading the necessary handlers over the Net. The idea has been touted since the earliest days of Java. Unfortunately, it has never come to fruition. There is no API for dynamically downloading new content and protocol handlers. In fact, there is no standard API for determining what content and protocol handlers exist on a given platform.

Although Java 5.0 mandates protocol handlers for HTTP, HTTPS, FTP, FILE, and JAR, earlier versions of Java made no such guarantees. While in practice you will generally find these basic protocol handlers with all versions of Java, that's not entirely comforting, and the story for content handlers is even less clear. The standard Java classes don't, for example, include content handlers for HTML, GIF, JPEG, or other common data types. Furthermore, although content and protocol handlers are part of the Java API and an intrinsic part of the mechanism for working with URLs, specific content and protocol handlers aren't defined. Even those protocol handlers that have been required in Java 5.0 are still packaged as part of the Sun implementation classes and are not truly part of the core API for all to see. There are two real issues:

There isn't a complete standard that says that certain types of handlers have to be provided in each environment along with the core Java API. Instead, we have to rely on the app to decide what kinds of data types it needs. This may make sense but is frustrating when it should be reasonable to expect certain basic types to be handled in all environments.
No standard tells you what kind of object the content handler should return. Maybe GIF data should be returned as an ImageProducer object, but at the moment, that's an app-level decision. If you're writing your own app and your own content handlers, that isn't an issue: you can make any decision you want. (In practical terms, few developers take this approach.) But if you're writing content handlers for arbitrary apps, you need to know what they expect.

In summary, the Java content- and protocol-handler mechanism is a forward-thinking approach that never quite materialized. The promise of web browsers that dynamically extend themselves for new types of protocols and new content is, like flying cars, always just a few years away. Although the basic mechanics of the protocol-handler mechanism are useful (especially now with Java 5.0), for decoding content in your own apps you should probably turn to other, newer frameworks that have a bit more specificity.

Other Handler Frameworks

The idea of dynamically downloadable handlers could also be applied to other kinds of handler-like components. For example, the Java XML community is fond of referring to XML as a way to apply semantics to documents and to Java as a portable way to supply the behavior that goes along with those semantics. It's possible that an XML viewer could be built with downloadable handlers for displaying XML tags. The JavaBeans APIs also touch upon this subject with the Java Activation Framework (JAF), which provides a way to detect the data stream type and "encapsulate access to it" in a Java bean. If this sounds suspiciously like the content handler's job, it is. Unfortunately, it looks like these APIs will not be merged and, outside of the Java Mail API, the JAF has not been widely used. Fortunately, for working with URL streams of images, music, and video, very mature APIs are available. The Java Advanced Imaging API (JAI) includes a well-defined, extensible set of handlers for most image types, and the Java Media Framework (JMF) can play most common music and video types found online.

Writing Content and Protocol Handlers

Although content and protocol handlers are used fairly extensively in the internals of Java, they have not been leveraged much by developers for their own apps. We discussed some of the reasons for this earlier. But, if you're adventurous and want to try utilizing content and protocol handlers in your own apps or you have a need to retrofit an existing app, you can find out more in the extras folder on the CD accompanying this tutorial. There you'll find a full chapter with examples covering everything you'll need to write your own content and protocol handlers.
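To give a flavor of what a protocol handler involves, here is a deliberately tiny sketch, not the approach taken in the chapter mentioned above. It invents a hypothetical "echo" protocol whose content is simply the text after the colon, and hands the handler directly to the URL constructor that accepts a URLStreamHandler. The names EchoHandlerDemo, EchoHandler, and fetch( ) are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

class EchoHandlerDemo {
    // Hypothetical protocol: the URL echo:some-text has "some-text" as its content.
    static class EchoHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL target) {
            return new URLConnection(target) {
                @Override
                public void connect() {
                    connected = true;   // nothing to connect to; just record the state
                }

                @Override
                public InputStream getInputStream() {
                    byte[] data = getURL().getPath().getBytes(StandardCharsets.UTF_8);
                    return new ByteArrayInputStream(data);
                }

                @Override
                public String getContentType() {
                    return "text/plain";
                }
            };
        }
    }

    // Builds a URL that uses our handler explicitly, then reads its content.
    static String fetch(String spec) {
        try {
            URL url = new URL(null, spec, new EchoHandler());
            try (InputStream in = url.openStream()) {
                // readAllBytes( ) requires Java 9 or later.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch("echo:hello-world"));
    }
}
```

Passing the handler to the constructor keeps the example self-contained; making a new protocol available to every URL in an app requires registering the handler with the runtime, which is beyond this sketch.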