What Is a Protocol Handler?
The way the URL, URLStreamHandler, URLConnection, and URLStreamHandlerFactory classes work together can be confusing. Everything starts with a URL, which represents a pointer to a particular Internet resource. Each URL specifies the protocol used to access the resource; typical values for the protocol include mailto, http, and ftp. When you construct a URL object from the URL's string representation, the constructor strips the protocol field and passes it to the URLStreamHandlerFactory. The factory's job is to take the protocol, locate the right subclass of URLStreamHandler for the protocol, and create a new instance of that stream handler, which is stored as a field within the URL object. Each app has at most one URLStreamHandlerFactory; once the factory has been installed, attempting to install another will throw an Error.

Now that the URL object has a stream handler, it asks the stream handler to finish parsing the URL string. When the program later asks for a connection, the stream handler creates an instance of a URLConnection subclass that knows how to talk to servers using this protocol. URLStreamHandler subclasses and URLConnection subclasses always come in pairs; the stream handler for a protocol always knows how to find an appropriate URLConnection for its protocol. It is worth noting that the stream handler does most of the work of parsing the URL. The format of the URL, although standard, depends on the protocol; therefore, it must be parsed by a URLStreamHandler, which knows about a particular protocol, and not by the URL object, which is generic and has no knowledge of specific protocols. This also means that if you are writing a new stream handler, you can define a new URL format that's appropriate to your task.
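To see that the handler, not the URL class, does the parsing, here is a minimal sketch built around a hypothetical isbn: scheme with a non-hierarchical format such as isbn:0-596-00721-3. The IsbnHandler class, the scheme itself, and the choice to store the digits as the URL's path are assumptions made for illustration. The handler overrides parseURL() and toExternalForm() and is passed directly to the three-argument URL(URL context, String spec, URLStreamHandler handler) constructor, so no factory or java.protocol.handler.pkgs setting is involved.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class IsbnParsingDemo {

    // Hypothetical handler for an "isbn:" scheme such as isbn:0-596-00721-3.
    // The generic URL class never interprets this format; the handler does.
    static class IsbnHandler extends URLStreamHandler {

        @Override
        protected void parseURL(URL u, String spec, int start, int limit) {
            // spec.substring(start, limit) is everything after "isbn:"
            String isbn = spec.substring(start, limit).replace("-", "");
            // Store the digits as the path; there is no host, port, query, or fragment
            setURL(u, u.getProtocol(), null, -1, null, null, isbn, null, null);
        }

        @Override
        protected String toExternalForm(URL u) {
            return u.getProtocol() + ":" + u.getPath();
        }

        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            // This sketch only parses; it never talks to a server
            throw new IOException("isbn connections are not implemented in this sketch");
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // No isbn handler is installed, so the one-argument constructor rejects the protocol
        try {
            new URL("isbn:0-596-00721-3");
            System.out.println("unexpected: found an isbn handler");
        } catch (MalformedURLException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }

        // Supplying the handler explicitly lets it parse the string
        URL u = new URL(null, "isbn:0-596-00721-3", new IsbnHandler());
        System.out.println(u.getProtocol()); // isbn
        System.out.println(u.getPath());     // 0596007213
        System.out.println(u);               // isbn:0596007213
    }
}
```

Because toString() delegates to the handler's toExternalForm(), the URL prints in the handler's own normalized format rather than the string it was built from.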
The URLConnection class, which you learned about in the previous chapter, represents an active connection to an Internet resource. It is responsible for interacting with the server. A URLConnection knows how to generate requests and interpret the headers that the server returns. The output from a URLConnection is the raw data requested, with all traces of the protocol (headers, etc.) stripped, ready for processing by a content handler. In most apps, you don't need to worry about URLConnection objects and stream handlers; they are hidden by the URL class, which provides a simple interface to the functionality you need. When you call the openStream() and getContent() methods of the URL class, you are really calling the getInputStream() and getContent() methods of the URLConnection class.

We have seen that interacting directly with a URLConnection can be convenient when you need a little more control over communication with a server, such as when downloading binary files or posting data to a server-side program. However, the URLConnection and URLStreamHandler classes are even more important when you need to add new protocols. By writing subclasses of these classes, you can add support for standard protocols such as finger, whois, or NTP that Java doesn't support out of the box. Furthermore, you're not limited to established protocols with well-known services. You can create new protocols that perform database queries, search across multiple Internet search engines, view pictures from binary newsgroups, and more. You can add new kinds of URLs as needed to represent the new types of resources. Java apps can also be built so that they load new protocol handlers at runtime. Unlike browsers such as Mozilla and Internet Explorer, which contain explicit knowledge of all the protocols and content types they can handle, a Java browser can be a relatively lightweight skeleton that loads new handlers as needed. Supporting a new protocol just means adding some new classes in predefined locations, not writing an entirely new release of the browser.

What's involved in adding support for a new protocol? As I said earlier, you need to write two new classes: a subclass of URLConnection and a subclass of URLStreamHandler. You may also need to write a class that implements the URLStreamHandlerFactory interface.

The URLConnection subclass handles the interaction with the server, converts anything the server sends into an InputStream, and converts anything the client sends into an OutputStream. This subclass must implement the abstract method connect(); it may also override the concrete methods getInputStream(), getOutputStream(), and getContentType().

The URLStreamHandler subclass parses the string representation of the URL into its separate parts and creates a new URLConnection object that understands that URL's protocol. This subclass must implement the abstract openConnection() method, which returns the new URLConnection to its caller. If the String representation of the URL doesn't look like a standard hierarchical URL, you should also override the parseURL() and toExternalForm() methods.

Finally, you may need to create a class that implements the URLStreamHandlerFactory interface. The URLStreamHandlerFactory helps the app find the right protocol handler for each type of URL. The URLStreamHandlerFactory interface has a single method, createURLStreamHandler(), which returns a URLStreamHandler object. This method must find the appropriate subclass of URLStreamHandler given only the protocol (e.g., ftp); that is, it must understand the package and class-naming conventions used for stream handlers. Since URLStreamHandlerFactory is an interface, you can place the createURLStreamHandler() method in any convenient class, perhaps the main class of your app.

When it first encounters a protocol, Java looks for URLStreamHandler classes in this order:
- First, Java checks to see whether a URLStreamHandlerFactory is installed. If it is, the factory is asked for a URLStreamHandler for the protocol.
- If a URLStreamHandlerFactory isn't installed or if Java can't find a URLStreamHandler for the protocol, Java looks in the packages named in the java.protocol.handler.pkgs system property for a sub-package that shares the protocol name and a class called Handler. The value of this property is a list of package names separated by a vertical bar (|). Thus, to indicate that Java should seek protocol handlers in the com.macfaq.net.www and org.cafeaulait.protocols packages, you would add this line to your properties file:

  java.protocol.handler.pkgs=com.macfaq.net.www|org.cafeaulait.protocols

  To find an FTP protocol handler (for example), Java first looks for the class com.macfaq.net.www.ftp.Handler. If that's not found, Java next tries to instantiate org.cafeaulait.protocols.ftp.Handler.
- Finally, if all else fails, Java looks for a URLStreamHandler named sun.net.www.protocol.name.Handler, where name is replaced by the name of the protocol; for example, sun.net.www.protocol.ftp.Handler.

In the early days of Java (circa 1995), Sun promised that protocols could be installed at runtime from the server that used them. For instance, in 1996, James Gosling and Henry McGilton wrote: "The HotJava Browser is given a reference to an object (a URL). If the handler for that protocol is already loaded, it will be used. If not, the HotJava Browser will search first the local system and then the system that is the target of the URL." (The Java Language Environment, A White Paper, May 1996, http://java.oracle.com/docs/white/langenv/HotJava.doc1.html) However, the loading of protocol handlers from web sites was never implemented, and Sun doesn't talk much about it anymore.
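The class-naming convention in the lookup list above reduces to string concatenation. The hypothetical candidateClassNames() helper below sketches only the naming: it splits the property value on the vertical bar, appends protocol.Handler to each package prefix in order, and then adds the sun.net.www.protocol fallback. It does not load any classes and ignores details of the real lookup such as caching and error handling.

```java
import java.util.ArrayList;
import java.util.List;

public class HandlerLookup {

    // Hypothetical helper: lists the class names the lookup rules would try,
    // in order, for one protocol. It mirrors the naming convention only; it
    // does not load classes or reproduce the JDK's internal code.
    static List<String> candidateClassNames(String pkgs, String protocol) {
        List<String> names = new ArrayList<>();
        if (pkgs != null) {
            // java.protocol.handler.pkgs holds package prefixes separated by '|'
            for (String prefix : pkgs.split("\\|")) {
                String p = prefix.trim();
                if (!p.isEmpty()) {
                    names.add(p + "." + protocol + ".Handler");
                }
            }
        }
        // Last resort: the built-in handler package
        names.add("sun.net.www.protocol." + protocol + ".Handler");
        return names;
    }

    public static void main(String[] args) {
        // Use the real property if it is set; otherwise the example value from the text
        String pkgs = System.getProperty("java.protocol.handler.pkgs",
                "com.macfaq.net.www|org.cafeaulait.protocols");
        for (String name : candidateClassNames(pkgs, "ftp")) {
            System.out.println(name);
        }
    }
}
```

With the property unset, this prints the three FTP handler class names from the list, in the order Java would try them.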
Most of the time, an end user who wants to permanently install an extra protocol handler in a program such as HotJava will place the necessary classes in the program's class path and add the package prefix to the java.protocol.handler.pkgs property. However, a programmer who just wants to add a custom protocol handler to their program at compile time will write and install a URLStreamHandlerFactory that knows how to find their custom protocol handlers. The factory can tell an app to look for URLStreamHandler classes in any place that's convenient: on a web site, in the same directory as the app, or somewhere in the user's class path.

When each of these classes has been written and compiled, you're ready to write an app that uses the new protocol handler. Assuming that you're using a URLStreamHandlerFactory, pass the factory object to the static URL.setURLStreamHandlerFactory() method like this:
URL.setURLStreamHandlerFactory(new MyURLStreamHandlerFactory( ));
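A factory like the MyURLStreamHandlerFactory in the call above might look like the following minimal sketch. The echo protocol, the EchoHandler class, and the map-based design are hypothetical. The factory returns null for any protocol it doesn't recognize, which lets Java continue with its other lookup steps, so http URLs keep working after installation. The main() method also shows that a second call to URL.setURLStreamHandlerFactory() throws an Error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

public class MyURLStreamHandlerFactory implements URLStreamHandlerFactory {

    // Protocols this factory knows about, mapped to handler constructors
    private final Map<String, Supplier<URLStreamHandler>> handlers = new HashMap<>();

    public MyURLStreamHandlerFactory() {
        handlers.put("echo", EchoHandler::new);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        Supplier<URLStreamHandler> supplier = handlers.get(protocol.toLowerCase(Locale.ROOT));
        // Returning null tells Java to continue with its other lookup steps
        return supplier == null ? null : supplier.get();
    }

    // Hypothetical "echo" protocol: the resource's content is simply the URL's path
    static class EchoHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override
                public void connect() {
                    connected = true;
                }

                @Override
                public InputStream getInputStream() {
                    connect();
                    byte[] body = getURL().getPath().getBytes(StandardCharsets.UTF_8);
                    return new ByteArrayInputStream(body);
                }
            };
        }
    }

    // Reads a stream to the end and decodes it as UTF-8
    static String readAll(InputStream in) throws IOException {
        try (InputStream input = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = input.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        URL.setURLStreamHandlerFactory(new MyURLStreamHandlerFactory());

        // echo: URLs are now resolved through the factory
        URL echo = new URL("echo:/hello");
        System.out.println(readAll(echo.openStream())); // /hello

        // http still works because the factory returned null for it
        URL http = new URL("http://www.example.com/");
        System.out.println(http.getHost()); // www.example.com

        // A second installation attempt throws an Error
        try {
            URL.setURLStreamHandlerFactory(new MyURLStreamHandlerFactory());
        } catch (Error e) {
            System.out.println("second factory rejected: " + e.getMessage());
        }
    }
}
```

Keeping a map from protocol names to handler constructors puts all of the factory's knowledge in one place; a factory could instead build class names by convention and load them reflectively.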
URL.setURLStreamHandlerFactory() can be called only once in the lifetime of an app; if it is called a second time, it throws an Error. Untrusted code will generally not be allowed to install factories or change the java.protocol.handler.pkgs property. Consequently, protocol handlers are primarily of use to standalone apps such as HotJava; Netscape and Internet Explorer use their own native C code instead of Java to handle protocols, so they're limited to a fixed set of protocols.

To summarize, here's the sequence of events:
1. The program constructs a URL object.
2. The constructor uses the arguments it's passed to determine the protocol part of the URL, e.g., http.
3. The URL() constructor tries to find a URLStreamHandler for the given protocol like this:
   - If the protocol has been used before, the URLStreamHandler object is retrieved from a cache.
   - Otherwise, if a URLStreamHandlerFactory has been set, the protocol string is passed to the factory's createURLStreamHandler() method.
   - If the protocol hasn't been seen before and there's no URLStreamHandlerFactory (or the factory returns null), the constructor attempts to instantiate a URLStreamHandler object named protocol.Handler in one of the packages listed in the java.protocol.handler.pkgs property.
   - Failing that, the constructor attempts to instantiate a URLStreamHandler object named protocol.Handler in the sun.net.www.protocol package.
   - If any of these attempts succeeds in retrieving a URLStreamHandler object, the URL constructor sets the URL object's handler field. If none of the attempts succeeds, the constructor throws a MalformedURLException.
4. The program calls the URL object's openConnection() method.
5. The URL object asks the URLStreamHandler to return a URLConnection object appropriate for this URL. If there's any problem, an IOException is thrown. Otherwise, a URLConnection object is returned.
6. The program uses the methods of the URLConnection class to interact with the remote resource.
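The following minimal sketch traces that sequence with a hypothetical greeting protocol. GreetingHandler and GreetingConnection are illustrative names, and the connection serves an in-memory string rather than talking to a server. To stay self-contained, the example passes the handler straight to the URL constructor, so the handler lookup in step 3 is skipped; a log records the order in which the handler and connection methods run.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SequenceTrace {

    // Records which protocol-handler methods run, in order
    static final List<String> log = new ArrayList<>();

    // Hypothetical connection that serves an in-memory greeting instead of contacting a server
    static class GreetingConnection extends URLConnection {
        GreetingConnection(URL u) {
            super(u);
        }

        @Override
        public void connect() {
            if (!connected) {
                log.add("connect");
                connected = true;
            }
        }

        @Override
        public InputStream getInputStream() {
            connect(); // connect lazily if the caller has not done so
            log.add("getInputStream");
            String body = "Hello from " + getURL().getHost();
            return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public String getContentType() {
            return "text/plain";
        }
    }

    // Hypothetical handler paired with GreetingConnection
    static class GreetingHandler extends URLStreamHandler {
        @Override
        protected void parseURL(URL u, String spec, int start, int limit) {
            log.add("parseURL");
            super.parseURL(u, spec, start, limit); // standard hierarchical parsing
        }

        @Override
        protected URLConnection openConnection(URL u) {
            log.add("openConnection");
            return new GreetingConnection(u);
        }
    }

    static String run() throws IOException {
        log.clear();
        // Steps 1-3: construct the URL; passing the handler skips the lookup in step 3
        URL u = new URL(null, "greeting://example.com/", new GreetingHandler());
        // Steps 4-5: the URL asks its handler for a connection
        URLConnection conn = u.openConnection();
        // Step 6: talk to the "resource" through the URLConnection
        conn.connect();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = conn.getInputStream()) {
            byte[] buffer = new byte[256];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return conn.getContentType() + ": " + new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // text/plain: Hello from example.com
        System.out.println(log);   // [parseURL, openConnection, connect, getInputStream]
    }
}
```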
Instead of calling openConnection() in step 4, the program can call openStream() or getContent(). In this case, the URLStreamHandler still instantiates a URLConnection object of the appropriate class. However, instead of returning the URLConnection object itself, the URL object returns the result of the URLConnection's getInputStream() or getContent() method.
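The delegation behind openStream() can be observed with a small counting handler. In this minimal sketch, the count protocol and the CountingHandler class are hypothetical; the handler increments a counter each time it creates a URLConnection. Two calls to openStream() produce two connections, and each call hands back only the connection's input stream, never the connection itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

public class OpenStreamDelegation {

    // Counts how many URLConnection objects the handler has created
    static int connectionsCreated = 0;

    // Hypothetical handler whose connections return a short text body
    static class CountingHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            connectionsCreated++;
            return new URLConnection(u) {
                @Override
                public void connect() {
                    connected = true;
                }

                @Override
                public InputStream getInputStream() {
                    connect();
                    String body = "resource " + getURL().getPath();
                    return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    // Reads a stream to the end and decodes it as UTF-8
    static String readAll(InputStream in) throws IOException {
        try (InputStream input = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = input.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        URL u = new URL(null, "count:/report", new CountingHandler());
        // openStream() returns the new connection's input stream, not the connection
        System.out.println(readAll(u.openStream())); // resource /report
        System.out.println(readAll(u.openStream())); // resource /report
        // Each call went through the handler and built a fresh connection
        System.out.println(connectionsCreated);      // 2
    }
}
```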