URL Class
The java.net.URL class is an abstraction of a Uniform Resource Locator such as http://www.hamsterdance.com/ or ftp://ftp.redhat.com/pub/. It extends java.lang.Object, and it is a final class that cannot be subclassed. Rather than relying on inheritance to configure instances for different kinds of URLs, it uses the strategy design pattern. Protocol handlers are the strategies, and the URL class itself forms the context through which the different strategies are selected:
public final class URL extends Object implements Serializable
Although storing a URL as a string would be trivial, it is helpful to think of URLs as objects with fields that include the scheme (a.k.a. the protocol), hostname, port, path, query string, and fragment identifier (a.k.a. the ref), each of which may be set independently. Indeed, this is almost exactly how the Unlike the The simplest Like all constructors, this may only be called after the Example 7-1 is a simple program for determining which protocols a virtual machine supports. It attempts to construct a The results of this program depend on which virtual machine runs it. Here are the results from Java 1.4.1 on Mac OS X 10.2, which turns out to support all the protocols except Telnet, LDAP, RMI, NFS, and JDBC:
Results using Sun's Linux 1.4.2 virtual machine were identical. Other 1.4 virtual machines derived from the Sun code will show similar results. Java 1.2 and later are likely to be the same except for maybe HTTPS, which was only recently added to the standard distribution. VMs that are not derived from the Sun codebase may vary somewhat in which protocols they support. For example, here are the results of running The nonsupport of RMI and JDBC is actually a little deceptive; in fact, the JDK does support these protocols. However, that support is through various parts of the The second constructor builds a This constructor sets the port to -1 so the default port for the protocol will be used. The This creates a The other arguments are the same as for the This code creates a Screenshot-1 shows the results of Example 7-2 in Mozilla 1.4 with Java 1.4 installed. This browser supports HTTP, HTTPS, FTP, mailto, file, gopher, doc, netdoc, verbatim, systemresource, and jar but not HTTPS, ldap, Telnet, jdbc, rmi, jndi, finger or daytime.
This constructor builds an absolute For instance, you may be parsing an HTML document at http://www.ibiblio.org/javafaq/index-network-dev-java-programming-language.html.gz and encounter a link to a file called mailinglists.html with no further qualifying information. In this case, you use the URL to the document that contains the link to provide the missing information. The constructor computes the new The filename is removed from the path of Of course, the output from this applet depends on the document base. In the run shown in Screenshot-2, the original When using this constructor with Two constructors allow you to specify the protocol handler used for the URL. The first constructor builds a relative All The Besides the constructors discussed here, a number of other methods in the Java class library return URLs are composed of five pieces:
For example, given the URL http://www.ibiblio.org/javafaq/books/jnp/index-network-dev-java-programming-language.html.gz?=1565922069#toc, the scheme is http, the authority is www.ibiblio.org, the path is /javafaq/books/jnp/index-network-dev-java-programming-language.html.gz, the fragment identifier is toc, and the query string is =1565922069. However, not all URLs have all these pieces. For instance, the URL http://www.faqs.org/rfcs/rfc2396.html has a scheme, an authority, and a path, but no fragment identifier or query string. The authority may further be divided into the user info, the host, and the port. For example, in the URL http://admin@www.blackstar.com:8080/, the authority is admin@www.blackstar.com:8080. This has the user info admin, the host www.blackstar.com, and the port 8080. Read-only access to these parts of a URL is provided by five public methods: The The The most recent virtual machines get this method right but some older ones, including Sun's JDK 1.3.0, may return a host string that is not necessarily a valid hostname or address. In particular, URLs that incorporate usernames, like ftp://anonymous:anonymous@wuarchive.wustl.edu/, sometimes include the user info in the host. For example, consider this code fragment:
Java 1.3 sets The The The If the URL does not have a file part, Java 1.2 and earlier append a slash to the URL and return the slash as the filename. For example, if the URL is http://www.slashdot.org (rather than something like http://www.slashdot.org/, The Note that the The The In Java 1.2 and earlier, you need to extract the query string from the value returned by Some URLs include usernames and occasionally even password information. This information comes after the scheme and before the host; an @ symbol delimits it. For instance, in the URL http://elharo@java.oracle.com/, the user info is elharo. Some URLs also include passwords in the user info. For instance, in the URL ftp://mp3:secret@ftp.example.com/c%3a/stuff/mp3/, the user info is mp3:secret. However, most of the time including a password in a URL is a security risk. If the URL doesn't have any user info, Between the scheme and the path of a URL, you'll find the authority. The term authority is taken from the Uniform Resource Identifier specification (RFC 2396), where this part of the URI indicates the authority that resolves the resource. In the most general case, the authority includes the user info, the host, and the port. For example, in the URL ftp://mp3:mp3@138.247.121.61:21000/c%3a/, the authority is mp3:mp3@138.247.121.61:21000. However, not all URLs have all parts. For instance, in the URL http://conferences.oracle.com/java/speakers/, the authority is simply the hostname conferences.oracle.com. The Here's the result of running this against several of the URL examples in this chapter:
Naked URLs aren't very exciting. What's interesting is the data contained in the documents they point to. The These methods differ in that they return the data at the URL as an instance of different classes.
The This code fragment catches an And here are the first few lines of output when There are quite a few more lines in that web page; if you want to see them, you can fire up your web browser. The shakiest part of this program is that it blithely assumes that the remote URL is text, which is not necessarily true. It could well be a GIF or JPEG image, an MP3 sound file, or something else entirely. Even if it is text, the document encoding may not be the same as the default encoding of the client system. The remote host and local client may not have the same default character set. As a general rule, for pages that use a character set radically different from ASCII, the HTML will include a An XML document will likely have an XML declaration instead:
In practice, there's no easy way to get at this information other than by parsing the file and looking for a header like this one, and even that approach is limited. Many HTML files hand-coded in Latin alphabets don't have such a The Use this method when you want to communicate directly with the server. The This overrides any proxy server set with the usual The Here's the result of trying to get the content of http://www.oracle.com:
The exact class may vary from one version of Java to the next (in earlier versions, it's been Here's what happens when you try to load a Java applet using Here's what happens when you try to load an audio file using The last result is the most unusual because it is as close as the Java core API gets to a class that represents a sound file. It's not just an interface through which you can load the sound data. This example demonstrates the biggest problems with using Starting in Java 1.3, it is possible for a content handler to provide different views of an object. This overloaded variant of the You then have to test for the type of the returned object using The The The output is:
The The Java 1.5 adds a Like all good classes, An Object is equal to a When you run this program, you discover:
The The last method in the This method sets the java.net.URL
class is organized, though the details vary a little between different versions of Java. The fields of java.net.URL are only visible to other members of the java.net package; classes that aren't in java.net can't access a URL's fields directly. However, you can set these fields using the URL constructors and retrieve their values using the various getter methods (getHost( ), getPort( ), and so on). URLs are effectively immutable. After a URL object has been constructed, its fields do not change. This has the side effect of making them thread-safe.
Creating New URLs
Unlike InetAddress objects, you can construct instances of java.net.URL. There are six constructors, differing in the information they require. Which constructor you use depends on the information you have and the form it's in. All these constructors throw a MalformedURLException if you try to create a URL for an unsupported protocol and may throw a MalformedURLException
if the URL is syntactically incorrect. Exactly which protocols are supported is implementation-dependent. The only protocols that have been available in all major virtual machines are http and file, and the latter is notoriously flaky. Java 1.5 also requires virtual machines to support https, jar, and ftp; many virtual machines prior to Java 1.5 support these three as well. Most virtual machines also support ftp, mailto, and gopher as well as some custom protocols like doc, netdoc, systemresource, and verbatim used internally by Java. The Netscape virtual machine supports the http, file, ftp, mailto, telnet, ldap, and gopher protocols. The Microsoft virtual machine supports http, file, ftp, https, mailto, gopher, doc, and systemresource, but not telnet, netdoc, jar, or verbatim. Of course, support for all these protocols is limited in applets by the security policy. For example, just because an untrusted applet can construct a URL
object from a file URL does not mean that the applet can actually read the file the URL refers to. Just because an untrusted applet can construct a URL
object from an HTTP URL that points to a third-party web site does not mean that the applet can connect to that site. If the protocol you need isn't supported by a particular VM, you may be able to install a protocol handler for that scheme. This is subject to a number of security checks in applets and is really practical only for apps. Other than verifying that it recognizes the URL scheme, Java does not make any checks about the correctness of the URLs it constructs. The programmer is responsible for making sure that URLs created are valid. For instance, Java does not check that the hostname in an HTTP URL does not contain spaces or that the query string is x-www-form-URL-encoded. It does not check that a mailto URL actually contains an email address. Java does not check the URL to make sure that it points at an existing host or that it meets any other requirements for URLs. You can create URLs for hosts that don't exist and for hosts that do exist but that you won't be allowed to connect to.
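The point that construction checks only the scheme can be illustrated with a minimal sketch. The class name ConstructionChecks, the helper constructs, the host under the reserved .invalid domain, and the scheme nosuchscheme are all assumptions made up for this illustration; the sketch also assumes no extra protocol handlers have been installed in the virtual machine.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ConstructionChecks {

  // Returns true if the URL constructor accepts the string, and false
  // if it throws a MalformedURLException.
  static boolean constructs(String spec) {
    try {
      new URL(spec);
      return true;
    }
    catch (MalformedURLException ex) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Hypothetical host that cannot exist; no lookup happens at construction.
    System.out.println(constructs("http://no-such-host.invalid/index.html"));
    // Hypothetical scheme with no protocol handler installed.
    System.out.println(constructs("nosuchscheme://www.example.com/"));
  }
}
```

Under those assumptions the first call succeeds even though the host can never be reached, while the second fails because the scheme is not recognized.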
Constructing a URL from a string
The simplest URL constructor just takes an absolute URL in string form as its single argument:
public URL(String url) throws MalformedURLException
Like all constructors, this may only be called after the new operator, and like all URL constructors, it can throw a MalformedURLException. The following code constructs a URL object from a String, catching the exception that might be thrown:
try {
URL u = new URL("http://www.audubon.org/");
}
catch (MalformedURLException ex) {
System.err.println(ex);
}
Example 7-1 is a simple program for determining which protocols a virtual machine supports. It attempts to construct a URL object for each of 16 protocols (9 standard protocols, 3 custom protocols for various Java APIs, and 4 undocumented protocols used internally by HotJava). If the constructor succeeds, you know the protocol is supported. Otherwise, a MalformedURLException is thrown and you know the protocol is not supported.
Example 7-1. ProtocolTester
/* Which protocols does a virtual machine support? */
import java.net.*;

public class ProtocolTester {

  public static void main(String[] args) {
    // hypertext transfer protocol
    testProtocol("http://www.adc.org");
    // secure http
    testProtocol("https://www.amazon.com/exec/obidos/order2/");
    // file transfer protocol
    testProtocol("ftp://metalab.unc.edu/pub/languages/java/javafaq/");
    // Simple Mail Transfer Protocol
    testProtocol("mailto:elharo@metalab.unc.edu");
    // telnet
    testProtocol("telnet://dibner.poly.edu/");
    // local file access
    testProtocol("file:///etc/passwd");
    // gopher
    testProtocol("gopher://gopher.anc.org.za/");
    // Lightweight Directory Access Protocol
    testProtocol(
     "ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress");
    // JAR
    testProtocol(
     "jar:http://cafeaulait.org/books/javaio/ioexamples/javaio.jar!"
     + "/com/macfaq/io/StreamCopier.class");
    // NFS, Network File System
    testProtocol("nfs://utopia.poly.edu/usr/tmp/");
    // a custom protocol for JDBC
    testProtocol("jdbc:mysql://luna.metalab.unc.edu:3306/NEWS");
    // rmi, a custom protocol for remote method invocation
    testProtocol("rmi://metalab.unc.edu/RenderEngine");
    // custom protocols for HotJava
    testProtocol("doc:/UsersGuide/release.html");
    testProtocol("netdoc:/UsersGuide/release.html");
    testProtocol("systemresource://www.adc.org/+/index-network-dev-java-programming-language.html.gz");
    testProtocol("verbatim:http://www.adc.org/");
  }

  private static void testProtocol(String url) {
    try {
      URL u = new URL(url);
      System.out.println(u.getProtocol( ) + " is supported");
    }
    catch (MalformedURLException ex) {
      // The scheme is everything before the first colon
      String protocol = url.substring(0, url.indexOf(':'));
      System.out.println(protocol + " is not supported");
    }
  }
}
% java ProtocolTester
http is supported
https is supported
ftp is supported
mailto is supported
telnet is not supported
file is supported
gopher is supported
ldap is not supported
jar is supported
nfs is not supported
jdbc is not supported
rmi is not supported
doc is supported
netdoc is supported
systemresource is supported
verbatim is supported
For example, here are the results of running ProtocolTester with the open source Kaffe VM 1.1.1:
% java ProtocolTester
http is supported
https is not supported
ftp is supported
mailto is not supported
telnet is not supported
file is supported
gopher is not supported
ldap is not supported
jar is supported
nfs is not supported
jdbc is not supported
rmi is not supported
doc is not supported
netdoc is not supported
systemresource is not supported
verbatim is not supported
The nonsupport of RMI and JDBC is actually a little deceptive; in fact, the JDK does support these protocols. However, that support is through various parts of the java.rmi and java.sql packages, respectively. These protocols are not accessible through the URL class like the other supported protocols (although I have no idea why Sun chose to wrap up RMI and JDBC parameters in URL clothing if it wasn't intending to interface with these via Java's quite sophisticated mechanism for handling URLs).
Constructing a URL from its component parts
The second constructor builds a URL from three strings specifying the protocol, the hostname, and the file:
public URL(String protocol, String hostname, String file) throws MalformedURLException
This constructor sets the port to -1 so the default port for the protocol will be used. The file argument should begin with a slash and include a path, a filename, and optionally a fragment identifier. Forgetting the initial slash is a common mistake, and one that is not easy to spot. Like all URL constructors, it can throw a MalformedURLException. For example:
try {
URL u = new URL("http", "www.eff.org", "/blueribbon.html#intro");
}
catch (MalformedURLException ex) {
// All VMs should recognize http
}
This creates a URL object that points to http://www.eff.org/blueribbon.html#intro, using the default port for the HTTP protocol (port 80). The file specification includes a reference to a named anchor. The code catches the exception that would be thrown if the virtual machine did not support the HTTP protocol. However, this shouldn't happen in practice.
For the rare occasions when the default port isn't correct, the next constructor lets you specify the port explicitly as an int:
public URL(String protocol, String host, int port, String file) throws MalformedURLException
The other arguments are the same as for the URL(String protocol, String host, String file) constructor and carry the same caveats. For example:
try {
URL u = new URL("http", "fourier.dur.ac.uk", 8000, "/~dma3mjh/jsci/");
}
catch (MalformedURLException ex) {
System.err.println(ex);
}
This code creates a URL object that points to http://fourier.dur.ac.uk:8000/~dma3mjh/jsci/, specifying port 8000 explicitly.
Example 7-2 is an alternative protocol tester that can run as an applet, making it useful for testing support of browser virtual machines. It uses the three-argument constructor rather than the one-argument constructor in Example 7-1. It also stores the schemes to be tested in an array and uses the same host and file for each scheme. This produces seriously malformed URLs like mailto://www.peacefire.org/bypass/SurfWatch/, once again demonstrating that all Java checks for at object construction is whether it recognizes the scheme, not whether the URL is appropriate.
Example 7-2. A protocol tester applet
import java.net.*;
import java.applet.*;
import java.awt.*;

public class ProtocolTesterApplet extends Applet {

  TextArea results = new TextArea( );

  public void init( ) {
    this.setLayout(new BorderLayout( ));
    this.add("Center", results);
  }

  public void start( ) {
    String host = "www.peacefire.org";
    String file = "/bypass/SurfWatch/";
    String[] schemes = {"http", "https", "ftp", "mailto",
                        "telnet", "file", "ldap", "gopher",
                        "jdbc", "rmi", "jndi", "jar",
                        "doc", "netdoc", "nfs", "verbatim",
                        "finger", "daytime", "systemresource"};
    for (int i = 0; i < schemes.length; i++) {
      try {
        // Only the scheme is checked; the host and file are never contacted
        URL u = new URL(schemes[i], host, file);
        results.append(schemes[i] + " is supported\r\n");
      }
      catch (MalformedURLException ex) {
        results.append(schemes[i] + " is not supported\r\n");
      }
    }
  }
}
Screenshot-1. The ProtocolTesterApplet running in Mozilla 1.4
Constructing relative URLs
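This constructor builds an absolute URL from a relative URL and a base URL:
public URL(URL base, String relative) throws MalformedURLException
For instance, you may be parsing an HTML document at http://www.ibiblio.org/javafaq/index-network-dev-java-programming-language.html.gz and encounter a link to a file called mailinglists.html with no further qualifying information. In this case, you use the URL to the document that contains the link to provide the missing information. The constructor computes the new URL as http://www.ibiblio.org/javafaq/mailinglists.html. For example: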
try {
URL u1 = new URL("http://www.ibiblio.org/javafaq/index-network-dev-java-programming-language.html.gz");
URL u2 = new URL (u1, "mailinglists.html");
}
catch (MalformedURLException ex) {
System.err.println(ex);
}
The filename is removed from the path of u1 and the new filename mailinglists.html is appended to make u2. This constructor is particularly useful when you want to loop through a list of files that are all in the same directory. You can create a URL for the first file and then use this initial URL to create URL objects for the other files by substituting their filenames. You also use this constructor when you want to create a URL relative to the applet's document base or code base, which you retrieve using the getDocumentBase( ) or getCodeBase( ) methods of the java.applet.Applet class. Example 7-3 is a very simple applet that uses getDocumentBase( ) to create a new URL object:
Example 7-3. A URL relative to the web page
import java.net.*;
import java.applet.*;
import java.awt.*;

public class RelativeURLTest extends Applet {

  public void init ( ) {
    try {
      URL base = this.getDocumentBase( );
      // Resolve the filename against the page that contains the applet
      URL relative = new URL(base, "mailinglists.html");
      this.setLayout(new GridLayout(2,1));
      this.add(new Label(base.toString( )));
      this.add(new Label(relative.toString( )));
    }
    catch (MalformedURLException ex) {
      this.add(new Label("This shouldn't happen!"));
    }
  }
}
Of course, the output from this applet depends on the document base. In the run shown in Screenshot-2, the original URL (the document base) refers to the file RelativeURL.html; the constructor creates a new URL that points to the mailinglists.html file in the same directory.
Screenshot-2. A base and a relative URL
When using this constructor with getDocumentBase( ), you frequently put the call to getDocumentBase( ) inside the constructor, like this:
URL relative = new URL(this.getDocumentBase( ), "mailinglists.html");
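The two-argument constructor also handles relative references other than a bare filename. The following is a minimal sketch rather than anything taken from the examples above: the class name RelativeResolution, the helper resolve, the base filename index.html, and the relative references ../books/ and /top.html are all assumptions chosen for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeResolution {

  // Resolves a relative reference against a base URL and returns the
  // resulting absolute URL in string form.
  static String resolve(String base, String relative) {
    try {
      return new URL(new URL(base), relative).toString();
    }
    catch (MalformedURLException ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  public static void main(String[] args) {
    // Hypothetical base document
    String base = "http://www.ibiblio.org/javafaq/index.html";
    // A sibling file replaces the last path segment of the base
    System.out.println(resolve(base, "mailinglists.html"));
    // A parent-directory reference climbs one level before descending
    System.out.println(resolve(base, "../books/"));
    // A leading slash replaces the whole path
    System.out.println(resolve(base, "/top.html"));
  }
}
```

In each case the scheme and host come from the base URL; only the path is recomputed.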
Specifying a URLStreamHandler // Java 1.2
Two constructors allow you to specify the protocol handler used for the URL. The first constructor builds a relative URL from a base URL and a relative part. The second builds the URL from its component pieces:
public URL(URL base, String relative, URLStreamHandler handler) // 1.2
 throws MalformedURLException
public URL(String protocol, String host, int port, String file, // 1.2
 URLStreamHandler handler) throws MalformedURLException
All URL objects have URLStreamHandler objects to do their work for them. These two constructors change from the default URLStreamHandler subclass for a particular protocol to one of your own choosing. This is useful for working with URLs whose schemes aren't supported in a particular virtual machine as well as for adding functionality that the default stream handler doesn't provide, such as asking the user for a username and password. For example:
URL u = new URL("finger", "utopia.poly.edu", 79, "/marcus", new com.macfaq.net.www.protocol.finger.Handler( ));
com.macfaq.net.www.protocol.finger.Handler
class used here will be developed in . While the other four constructors raise no security issues in and of themselves, these two do because class loader security is closely tied to the various URLStreamHandler
classes. Consequently, untrusted applets are not allowed to specify a URLStreamHandler. Trusted applets can do so if they have the NetPermission specifyStreamHandler
. However, for reasons that will become apparent in , this is a security hole big enough to drive the Microsoft money train through. Consequently, you should not request this permission or expect it to be granted if you do request it.
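To make the handler-taking constructor concrete without the real finger handler, here is a minimal sketch that uses a stub. StubHandlerSketch, StubFingerHandler, and build are hypothetical names invented for this illustration, and the stub deliberately refuses to open connections; it only shows that supplying a handler lets a URL with an otherwise unsupported scheme be constructed. It assumes the code runs as a trusted application rather than as an untrusted applet.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class StubHandlerSketch {

  // Hypothetical stand-in for a real finger protocol handler
  static class StubFingerHandler extends URLStreamHandler {

    @Override
    protected int getDefaultPort() {
      return 79; // the well-known finger port
    }

    @Override
    protected URLConnection openConnection(URL u) throws IOException {
      throw new IOException("connections are not implemented in this sketch");
    }
  }

  // Builds a finger URL by passing the stub handler explicitly, so the
  // virtual machine does not need a built-in handler for the scheme.
  static URL build() {
    try {
      return new URL("finger", "utopia.poly.edu", 79, "/marcus",
       new StubFingerHandler());
    }
    catch (MalformedURLException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    URL u = build();
    System.out.println(u.getProtocol());
    System.out.println(u.getDefaultPort());
    System.out.println(u.getPath());
  }
}
```

A real handler would return a working URLConnection subclass from openConnection instead of throwing.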
Other sources of URL objects
Besides the constructors discussed here, a number of other methods in the Java class library return URL objects. You've already seen getDocumentBase( ) from java.applet.Applet. The other common source is getCodeBase( ), also from java.applet.Applet. This works just like getDocumentBase( ), except it returns the URL of the applet itself instead of the URL of the page that contains the applet. Both getDocumentBase( ) and getCodeBase( ) come from the java.applet.AppletStub interface, to which java.applet.Applet delegates these calls. You're unlikely to implement this interface yourself unless you're building a web browser or applet viewer. In Java 1.2 and later, the java.io.File class has a toURL( )
method that returns a file URL matching the given file. The exact format of the URL returned by this method is platform-dependent. For example, on Windows it may return something like file:/D:/JAVA/JNP3/07/ToURLTest.java. On Linux and other Unixes, you're likely to see file:/home/elharo/books/JNP3/07/ToURLTest.java. In practice, file URLs are heavily platform- and program-dependent. Java file URLs often cannot be interchanged with the URLs used by web browsers and other programs, or even with Java programs running on different platforms. Class loaders are used not only to load classes but also to load resources such as images and audio files. The static ClassLoader.getSystemResource(String name)
method returns a URL from which a single resource can be read. The ClassLoader.getSystemResources(String name) method returns an Enumeration containing a list of URLs from which the named resource can be read. Finally, the instance method getResource(String name)
searches the path used by the referenced class loader for a URL to the named resource. The URLs returned by these methods may be file URLs, HTTP URLs, or some other scheme. The name of the resource is a slash-separated list of Java identifiers, such as /com/macfaq/sounds/swale.au or com/macfaq/images/headshot.jpg. The Java virtual machine will attempt to find the requested resource in the class path-potentially including parts of the class path on the web server that an applet was loaded from-or inside a JAR archive. Java 1.4 adds the URI
class, which we'll discuss soon. URIs can be converted into URLs using the toURL( ) method, provided Java has the relevant protocol handler installed. There are a few other methods that return URL objects here and there throughout the class library, but most are simple getter methods that return only a URL you probably already know because you used it to construct the object in the first place; for instance, the getPage( ) method of javax.swing.JEditorPane and the getURL( ) method of java.net.URLConnection.
Splitting a URL into Pieces
Read-only access to these parts of a URL is provided by five public methods: getFile( ), getHost( ), getPort( ), getProtocol( ), and getRef( ). Java 1.3 adds four more methods: getQuery( ), getPath( ), getUserInfo( ), and getAuthority( ).
public String getProtocol( )
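The getProtocol( ) method returns a String containing the scheme of the URL, e.g., "http", "https", or "file". For example:
URL page = this.getCodeBase( );
System.out.println("This applet was downloaded via " + page.getProtocol( ));
public String getHost( )
The getHost( ) method returns a String containing the hostname of the URL. For example:
URL page = this.getCodeBase( );
System.out.println("This applet was downloaded from " + page.getHost( ));
The most recent virtual machines get this method right but some older ones, including Sun's JDK 1.3.0, may return a host string that is not necessarily a valid hostname or address. In particular, URLs that incorporate usernames, like ftp://anonymous:anonymous@wuarchive.wustl.edu/, sometimes include the user info in the host. For example, consider this code fragment:
URL u = new URL("ftp://anonymous:anonymous@wuarchive.wustl.edu/");
String host = u.getHost( );
Java 1.3 sets host to anonymous:anonymous@wuarchive.wustl.edu, not simply wuarchive.wustl.edu. Java 1.4 would return wuarchive.wustl.edu instead.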
public int getPort( )
The getPort( ) method returns the port number specified in the URL as an int. If no port was specified in the URL, getPort( ) returns -1 to signify that the URL does not specify the port explicitly, and will use the default port for the protocol. For example, if the URL is http://www.userfriendly.org/, getPort( ) returns -1; if the URL is http://www.userfriendly.org:80/, getPort( ) returns 80. The following code prints -1 for the port number because it isn't specified in the URL:
URL u = new URL("http://www.ncsa.uiuc.edu/demoweb/html-primer.html");
System.out.println("The port part of " + u + " is " + u.getPort( ));
public int getDefaultPort( )
The getDefaultPort( ) method returns the default port used for this URL's protocol when none is specified in the URL. If no default port is defined for the protocol, getDefaultPort( ) returns -1. For example, if the URL is http://www.userfriendly.org/, getDefaultPort( ) returns 80; if the URL is ftp://ftp.userfriendly.org:8000/, getDefaultPort( ) returns 21.
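A common use of the two port methods together is to work out which port a connection would actually use. The following is a minimal sketch under the assumption that getDefaultPort( ) is available (Java 1.4 or later); the class name EffectivePort, the helper effectivePort, and the port 8000 in the second URL are made up for this illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class EffectivePort {

  // Returns the explicit port if the URL names one; otherwise returns the
  // protocol's default port, which may itself be -1 if none is defined.
  static int effectivePort(String spec) {
    try {
      URL u = new URL(spec);
      int port = u.getPort();
      return port != -1 ? port : u.getDefaultPort();
    }
    catch (MalformedURLException ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  public static void main(String[] args) {
    // No explicit port, so the HTTP default applies
    System.out.println(effectivePort("http://www.userfriendly.org/"));
    // An explicit port takes precedence over the default
    System.out.println(effectivePort("http://www.userfriendly.org:8000/"));
  }
}
```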
public String getFile( )
The getFile( ) method returns a String that contains the path portion of a URL; remember that Java does not break a URL into separate path and file parts. Everything from the first slash (/) after the hostname until the character preceding the # sign that begins a fragment identifier is considered to be part of the file. For example:
URL page = this.getDocumentBase( );
System.out.println("This page's path is " + page.getFile( ));
If the URL does not have a file part, Java 1.2 and earlier append a slash to the URL and return the slash as the filename. For example, if the URL is http://www.slashdot.org (rather than something like http://www.slashdot.org/), getFile( ) returns /. Java 1.3 and later simply set the file to the empty string.
public String getPath( ) // Java 1.3
The getPath( ) method, available only in Java 1.3 and later, is a near synonym for getFile( ); that is, it returns a String containing the path and file portion of a URL. However, unlike getFile( ), it does not include the query string in the String it returns, just the path.
Note that the getPath( ) method does not return only the directory path and getFile( ) does not return only the filename, as you might expect. Both getPath( ) and getFile( ) return the full path and filename. The only difference is that getFile( ) also returns the query string and getPath( ) does not.
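The difference between the two methods is easiest to see side by side. This is a minimal sketch assuming Java 1.3 or later; the class name FileVersusPath, the helper parts, and the #top fragment appended to the URL are assumptions added for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class FileVersusPath {

  // Returns getFile(), getPath(), getQuery(), and getRef() for the URL,
  // in that order.
  static String[] parts(String spec) {
    try {
      URL u = new URL(spec);
      return new String[] {u.getFile(), u.getPath(), u.getQuery(), u.getRef()};
    }
    catch (MalformedURLException ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  public static void main(String[] args) {
    String[] p = parts(
     "http://www.ibiblio.org/nywc/compositions.phtml?category=Piano#top");
    System.out.println("getFile:  " + p[0]); // path plus query string
    System.out.println("getPath:  " + p[1]); // path only
    System.out.println("getQuery: " + p[2]);
    System.out.println("getRef:   " + p[3]);
  }
}
```

Neither method includes the fragment identifier; that is available only through getRef( ).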
public String getRef( )
The getRef( ) method returns the fragment identifier part of the URL. If the URL doesn't have a fragment identifier, the method returns null. In the following code, getRef( ) returns the string xtocid1902914:
URL u = new URL(
"http://www.ibiblio.org/javafaq/javafaq.html#xtocid1902914");
System.out.println("The fragment ID of " + u + " is " + u.getRef( ));
public String getQuery( ) // Java 1.3
The getQuery( ) method returns the query string of the URL. If the URL doesn't have a query string, the method returns null. In the following code, getQuery( ) returns the string category=Piano:
URL u = new URL(
"http://www.ibiblio.org/nywc/compositions.phtml?category=Piano");
System.out.println("The query string of " + u + " is " + u.getQuery( ));
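In Java 1.2 and earlier, you need to extract the query string from the value returned by getFile( ) instead.
public String getUserInfo( ) // Java 1.3
Some URLs include usernames and occasionally even password information. This information comes after the scheme and before the host; an @ symbol delimits it. For instance, in the URL http://elharo@java.oracle.com/, the user info is elharo. Some URLs also include passwords in the user info. For instance, in the URL ftp://mp3:secret@ftp.example.com/c%3a/stuff/mp3/, the user info is mp3:secret. However, most of the time including a password in a URL is a security risk. If the URL doesn't have any user info, getUserInfo( ) returns null. Mailto URLs may not behave like you expect. In a URL like mailto:elharo@metalab.unc.edu, elharo@metalab.unc.edu is the path, not the user info and the host. That's because the URL specifies the remote recipient of the message rather than the username and host that's sending the message.
public String getAuthority( ) // Java 1.3
Between the scheme and the path of a URL, you'll find the authority. The term authority is taken from the Uniform Resource Identifier specification (RFC 2396), where this part of the URI indicates the authority that resolves the resource. In the most general case, the authority includes the user info, the host, and the port. For example, in the URL ftp://mp3:mp3@138.247.121.61:21000/c%3a/, the authority is mp3:mp3@138.247.121.61:21000. However, not all URLs have all parts. For instance, in the URL http://conferences.oracle.com/java/speakers/, the authority is simply the hostname conferences.oracle.com. The getAuthority( ) method returns the authority as it exists in the URL, with or without the user info and port. Example 7-4 uses these methods to split URLs entered on the command line into their component parts. This program requires Java 1.3 or later.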
Example 7-4. The parts of a URL
import java.net.*;

public class URLSplitter {

  public static void main(String args[]) {
    for (int i = 0; i < args.length; i++) {
      try {
        URL u = new URL(args[i]);
        System.out.println("The URL is " + u);
        System.out.println("The scheme is " + u.getProtocol( ));
        System.out.println("The user info is " + u.getUserInfo( ));
        String host = u.getHost( );
        if (host != null) {
          // Some older VMs leave the user info in the host; strip it
          int atSign = host.indexOf('@');
          if (atSign != -1) host = host.substring(atSign+1);
          System.out.println("The host is " + host);
        }
        else {
          System.out.println("The host is null.");
        }
        System.out.println("The port is " + u.getPort( ));
        System.out.println("The path is " + u.getPath( ));
        System.out.println("The ref is " + u.getRef( ));
        System.out.println("The query string is " + u.getQuery( ));
      } // end try
      catch (MalformedURLException ex) {
        System.err.println(args[i] + " is not a URL I understand.");
      }
      System.out.println( );
    } // end for
  } // end main
} // end URLSplitter
% java URLSplitter \
http://www.ncsa.uiuc.edu/demoweb/html-primer.html#A1.3.3.3 \
ftp://mp3:mp3@138.247.121.61:21000/c%3a/ \
http://www.oracle.com \
http://www.ibiblio.org/nywc/compositions.phtml?category=Piano \
 http://admin@www.blackstar.com:8080/
The URL is http://www.ncsa.uiuc.edu/demoweb/html-primer.html#A1.3.3.3
The scheme is http
The user info is null
The host is www.ncsa.uiuc.edu
The port is -1
The path is /demoweb/html-primer.html
The ref is A1.3.3.3
The query string is null

The URL is ftp://mp3:mp3@138.247.121.61:21000/c%3a/
The scheme is ftp
The user info is mp3:mp3
The host is 138.247.121.61
The port is 21000
The path is /c%3a/
The ref is null
The query string is null

The URL is http://www.oracle.com
The scheme is http
The user info is null
The host is www.oracle.com
The port is -1
The path is
The ref is null
The query string is null

The URL is http://www.ibiblio.org/nywc/compositions.phtml?category=Piano
The scheme is http
The user info is null
The host is www.ibiblio.org
The port is -1
The path is /nywc/compositions.phtml
The ref is null
The query string is category=Piano

The URL is http://admin@www.blackstar.com:8080/
The scheme is http
The user info is admin
The host is www.blackstar.com
The port is 8080
The path is /
The ref is null
The query string is null
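Example 7-4 does not print the authority, so here is a minimal sketch of how getAuthority( ) relates to getUserInfo( ), getHost( ), and getPort( ), assuming Java 1.3 or later. The class name AuthorityParts and the helper describe are hypothetical; the two URLs are ones already used in this chapter.

```java
import java.net.MalformedURLException;
import java.net.URL;

class AuthorityParts {

  // Formats the authority and its three components on one line.
  static String describe(String spec) {
    try {
      URL u = new URL(spec);
      return "authority=" + u.getAuthority()
       + " userInfo=" + u.getUserInfo()
       + " host=" + u.getHost()
       + " port=" + u.getPort();
    }
    catch (MalformedURLException ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  public static void main(String[] args) {
    // Authority with user info, host, and an explicit port
    System.out.println(describe("http://admin@www.blackstar.com:8080/"));
    // Authority that is only a hostname: user info is null, port is -1
    System.out.println(describe("http://conferences.oracle.com/java/speakers/"));
  }
}
```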
Retrieving Data from a URL
Naked URLs aren't very exciting. What's interesting is the data contained in the documents they point to. The URL class has several methods that retrieve data from a URL:
public InputStream openStream( ) throws IOException
public URLConnection openConnection( ) throws IOException
public URLConnection openConnection(Proxy proxy) throws IOException // 1.5
public Object getContent( ) throws IOException
public Object getContent(Class[] classes) throws IOException // 1.3
These methods differ in that they return the data at the URL as an instance of different classes.
public final InputStream openStream( ) throws IOException
The openStream( ) method connects to the resource referenced by the URL, performs any necessary handshaking between the client and the server, and returns an InputStream from which data can be read. The data you get from this InputStream is the raw (i.e., uninterpreted) contents of the file the URL references: ASCII if you're reading an ASCII text file, raw HTML if you're reading an HTML file, binary image data if you're reading an image file, and so forth. It does not include any of the HTTP headers or any other protocol-related information. You can read from this InputStream as you would read from any other InputStream. For example:
try {
  URL u = new URL("http://www.hamsterdance.com");
  InputStream in = u.openStream( );
  int c;
  while ((c = in.read( )) != -1) System.out.write(c);
}
catch (IOException ex) {
  System.err.println(ex);
}
This code fragment catches an IOException, which also catches the MalformedURLException that the URL constructor can throw, since MalformedURLException subclasses IOException. Example 7-5 reads a URL from the command line, opens an InputStream from that URL, chains the resulting InputStream to an InputStreamReader using the default encoding, and then uses InputStreamReader's read( ) method to read successive characters from the file, each of which is printed on System.out. That is, it prints the raw data located at the URL: if the URL references an HTML file, the program's output is raw HTML.
Example 7-5. Download a web page
import java.net.*;
import java.io.*;

public class SourceViewer {

  public static void main (String[] args) {

    if (args.length > 0) {
      try {
        // Open the URL for reading
        URL u = new URL(args[0]);
        InputStream in = u.openStream( );
        // buffer the input to increase performance
        in = new BufferedInputStream(in);
        // chain the InputStream to a Reader
        Reader r = new InputStreamReader(in);
        int c;
        while ((c = r.read( )) != -1) {
          System.out.print((char) c);
        }
      }
      catch (MalformedURLException ex) {
        System.err.println(args[0] + " is not a parseable URL");
      }
      catch (IOException ex) {
        System.err.println(ex);
      }
    } // end if
  } // end main
} // end SourceViewer
Here are the first few lines of output when SourceViewer downloads http://www.oracle.com:
% java SourceViewer http://www.oracle.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us" xml:lang="en-US">
<head>
<title>oracle.com -- Welcome to Oracle -- computer tutorials, software conferences, online publishing</title>
<meta content="Oracle, oracle, computer tutorials, technical books, UNIX, unix, Perl, Java, Linux, Internet, Web, C, C++, Windows, Windows NT, Security, Sys Admin, System Administration, Oracle, PL/SQL, online tutorials,
books online, computer tutorial online, e-books, ebooks, Perl Conference, Open Source Conference, Java Conference, open source, free software, XML, Mac OS X, .Net, dot net, C#, PHP, CGI, VB, VB Script, Java Script, javascript, Windows 2000, XP, bioinformatics, web services, p2p" />
<meta content="Oracle is a leader in technical and computer tutorial documentation, online content, and conferences for UNIX, Perl, Java, Linux, Internet, Mac OS X, C, C++, Windows, Windows NT, Security, Sys Admin, System Administration, Oracle, Design and Graphics, Online Books, e-books, ebooks, Perl Conference, Java Conference, P2P Conference" />
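The program also assumes that the document uses the client's default encoding, which is not necessarily true. Pages that use a character set other than ASCII often include a META tag in the header specifying the character set in use. For instance, this META tag specifies the Big-5 encoding for Chinese:
<meta http-equiv="Content-Type" content="text/html; charset=big5">
An XML document is more likely to carry this information in an XML declaration instead:
<?xml version="1.0" encoding="Big5"?>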
Many HTML documents, however, have no such META tag. Since Windows, the Mac, and most Unixes have somewhat different interpretations of the characters from 128 to 255, the extended characters in these documents do not translate correctly on platforms other than the one on which they were created. And as if this isn't confusing enough, the HTTP header that precedes the actual document is likely to have its own encoding information, which may completely contradict what the document itself says. You can't read this header using the URL class, but you can with the URLConnection object returned by the openConnection( ) method. Encoding detection and declaration is one of the thornier parts of the architecture of the Web.
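As a rough illustration of reading the encoding from the HTTP header, the following sketch asks a URLConnection for its Content-Type and pulls out the charset parameter, falling back to a caller-supplied default when the header is missing or unusable. The class name EncodingAwareViewer and the helper methods charsetOf( ) and readText( ) are invented for this example; it ignores META tags and XML declarations entirely, so treat it as a starting point rather than a complete detection strategy.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the original examples.
class EncodingAwareViewer {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=big5". Returns the fallback if the header is
    // missing, has no charset parameter, or names an unsupported charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException ex) {
                    // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Downloads the URL and decodes it with the charset the server declared,
    // or with the fallback if the server declared none.
    static String readText(URL u, Charset fallback) throws IOException {
        URLConnection uc = u.openConnection();
        Charset cs = charsetOf(uc.getContentType(), fallback);
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(uc.getInputStream(), cs)) {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // ISO-8859-1 as the fallback is an assumption made for this sketch
            System.out.print(readText(new URL(args[0]), StandardCharsets.ISO_8859_1));
        }
    }
}
```

Run it the same way as SourceViewer, passing a URL on the command line. The header value and the document's own declaration can still disagree, so a more careful program would have to decide which one to trust.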
public URLConnection openConnection( ) throws IOException
The openConnection( ) method opens a socket to the specified URL and returns a URLConnection object. A URLConnection represents an open connection to a network resource. If the call fails, openConnection( ) throws an IOException. For example:
try {
  URL u = new URL("http://www.jennicam.org/");
  try {
    URLConnection uc = u.openConnection( );
    InputStream in = uc.getInputStream( );
    // read from the connection...
  } // end try
  catch (IOException ex) {
    System.err.println(ex);
  }
} // end try
catch (MalformedURLException ex) {
  System.err.println(ex);
}
The URLConnection gives you access to everything sent by the server: in addition to the document itself in its raw form (e.g., HTML, plain text, binary image data), you can access all the metadata specified by the protocol. For example, if the scheme is HTTP, the URLConnection lets you access the HTTP headers as well as the raw HTML. The URLConnection class also lets you write data to as well as read from a URL, for instance, in order to send email to a mailto URL or post form data. The URLConnection class is covered in detail in a later chapter. Java 1.5 adds one overloaded variant of this method that specifies the proxy server to pass the connection through:
public URLConnection openConnection(Proxy proxy) throws IOException
This overrides any proxy server set with the usual socksProxyHost, socksProxyPort, http.proxyHost, http.proxyPort, http.nonProxyHosts, and similar system properties. If the protocol handler does not support proxies, the argument is ignored and the connection is made directly if possible.
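A minimal sketch of this variant follows. It builds a java.net.Proxy of type HTTP and passes it to openConnection( ). The class name ProxiedConnection, the helper viaHttpProxy( ), and the proxy address proxy.example.com:8080 are placeholders invented for illustration; substitute the host and port of a proxy you actually have access to.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class; not part of the original examples.
class ProxiedConnection {

    // Prepares a connection to u that is routed through the given HTTP proxy.
    // createUnresolved( ) defers the DNS lookup of the proxy host, and the
    // connection is not actually made until the caller reads from it or
    // calls connect( ).
    static URLConnection viaHttpProxy(URL u, String proxyHost, int proxyPort)
            throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        return u.openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        // proxy.example.com:8080 is a placeholder, not a real proxy server
        URLConnection uc = viaHttpProxy(
                new URL("http://www.example.com/"), "proxy.example.com", 8080);
        System.out.println("Prepared a " + uc.getClass().getName());
    }
}
```

Because the proxy is given per connection, other connections in the same program continue to use whatever the system properties specify.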
public final Object getContent( ) throws IOException
The getContent( ) method is the third way to download data referenced by a URL. The getContent( ) method retrieves the data referenced by the URL and tries to make it into some type of object. If the URL refers to some kind of text object such as an ASCII or HTML file, the object returned is usually some sort of InputStream. If the URL refers to an image such as a GIF or a JPEG file, getContent( ) usually returns a java.awt.image.ImageProducer (more specifically, an instance of a class that implements the ImageProducer interface). What unifies these two disparate classes is that they are not the thing itself but a means by which a program can construct the thing:
try {
  URL u = new URL("http://mesola.obspm.fr/");
  Object o = u.getContent( );
  // cast the Object to the appropriate type
  // work with the Object...
} catch (Exception ex) {
  System.err.println(ex);
}
The getContent( ) method operates by looking at the Content-type field in the MIME header of the data it gets from the server. If the server does not use MIME headers or sends an unfamiliar Content-type, getContent( ) returns some sort of InputStream with which the data can be read. An IOException is thrown if the object can't be retrieved. Example 7-6 demonstrates this.
Example 7-6. Download an object
import java.net.*;
import java.io.*;

public class ContentGetter {

  public static void main (String[] args) {

    if (args.length > 0) {
      // Open the URL for reading
      try {
        URL u = new URL(args[0]);
        try {
          Object o = u.getContent( );
          System.out.println("I got a " + o.getClass( ).getName( ));
        } // end try
        catch (IOException ex) {
          System.err.println(ex);
        }
      } // end try
      catch (MalformedURLException ex) {
        System.err.println(args[0] + " is not a parseable URL");
      }
    } // end if
  } // end main
} // end ContentGetter
Here's the result of trying to get the content of http://www.oracle.com/:
% java ContentGetter http://www.oracle.com/
I got a sun.net.www.protocol.http.HttpURLConnection$HttpInputStream
The exact class may vary from one version of Java to the next (it may be, for example, a java.io.PushbackInputStream or a sun.net.www.http.KeepAliveStream) but it should be some form of InputStream. Here's what you get when you try to load a header image from that page:
% java ContentGetter http://www.oracle.com/graphics_new/animation.gif
I got a sun.awt.image.URLImageSource
Here's what happens when you try to load a Java class file using getContent( ):
% java ContentGetter http://www.cafeaulait.org/RelativeURLTest.class
I got a sun.net.www.protocol.http.HttpURLConnection$HttpInputStream
Here's what happens when you try to load an audio file using getContent( ):
% java ContentGetter http://www.cafeaulait.org/course/week9/spacemusic.au
I got a sun.applet.AppletAudioClip
This example demonstrates the biggest problem with using getContent( ): it's hard to predict what kind of object you'll get. You could get some kind of InputStream or an ImageProducer or perhaps an AudioClip; it's easy to check using the instanceof operator. This information should be enough to let you read a text file or display an image.
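The instanceof check might look something like the following sketch, which sorts the object returned by getContent( ) into the categories mentioned above. The class name ContentDispatcher and the method classify( ) are invented for this example, and the descriptive strings it returns are arbitrary. AudioClip is left out so that the sketch depends only on java.io and java.awt.image.

```java
import java.awt.image.ImageProducer;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper class; not part of the original examples.
class ContentDispatcher {

    // Classifies an object returned by getContent( ) by testing it against
    // the types the text mentions; anything else is reported by class name.
    static String classify(Object o) {
        if (o == null) return "nothing";
        if (o instanceof InputStream) return "a stream of raw data";
        if (o instanceof ImageProducer) return "an image source";
        return "an unrecognized " + o.getClass().getName();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            try {
                Object o = new URL(arg).getContent();
                System.out.println(arg + " is " + classify(o));
            } catch (IOException ex) {
                System.err.println(ex);
            }
        }
    }
}
```

A real program would cast the object inside each branch and then read from the stream or hand the ImageProducer to the AWT, rather than just describing what it received.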
public final Object getContent(Class[] classes) throws IOException // Java 1.3
This overloaded variant of the getContent( ) method lets you choose what class you'd like the content to be returned as. The method tries the classes in the order they appear in the array and returns the content as the first one it can supply. For instance, if you prefer an HTML file to be returned as a String, but your second choice is a Reader and your third choice is an InputStream, write:
URL u = new URL("http://www.nwu.org");
Class[] types = new Class[3];
types[0] = String.class;
types[1] = Reader.class;
types[2] = InputStream.class;
Object o = u.getContent(types);
You then have to test for the type of the returned object using instanceof. For example:
if (o == null) {
  // getContent(Class[]) returns null if none of the requested types is supported
  System.out.println("Error: none of the requested types is available");
}
else if (o instanceof String) {
  System.out.println(o);
}
else if (o instanceof Reader) {
  int c;
  Reader r = (Reader) o;
  while ((c = r.read( )) != -1) System.out.print((char) c);
}
else if (o instanceof InputStream) {
  int c;
  InputStream in = (InputStream) o;
  while ((c = in.read( )) != -1) System.out.write(c);
}
else {
  System.out.println("Error: unexpected type " + o.getClass( ));
}
Utility Methods
The URL class contains a couple of utility methods that perform common operations on URLs. The sameFile( ) method determines whether two URLs point to the same document. The toExternalForm( ) method converts a URL object to a string that can be used in an HTML link or a web browser's Open URL dialog.
public boolean sameFile(URL other)
The sameFile( ) method tests whether two URL objects point to the same file. If they do, sameFile( ) returns true; otherwise, it returns false. The test that sameFile( ) performs is quite shallow; for the most part, it just compares the corresponding fields for equality. It does, however, detect whether two hostnames are really just aliases for each other. For instance, it can tell that http://www.ibiblio.org/ and http://metalab.unc.edu/ are the same file. However, it cannot tell that http://www.ibiblio.org:80/ and http://metalab.unc.edu/ are the same file or that http://www.cafeconleche.org/ and http://www.cafeconleche.org/index-network-dev-java-programming-language.html.gz are the same file. sameFile( ) is smart enough to ignore the fragment identifier part of a URL, however. Here's a fragment of code that uses sameFile( ) to compare two URLs:
try {
  URL u1 = new URL("http://www.ncsa.uiuc.edu/HTMLPrimer.html#GS");
  URL u2 = new URL("http://www.ncsa.uiuc.edu/HTMLPrimer.html#HD");
  if (u1.sameFile(u2)) {
    System.out.println(u1 + " is the same file as \n" + u2);
  }
  else {
    System.out.println(u1 + " is not the same file as \n" + u2);
  }
}
catch (MalformedURLException ex) {
  System.err.println(ex);
}
The output is:
http://www.ncsa.uiuc.edu/HTMLPrimer.html#GS is the same file as
http://www.ncsa.uiuc.edu/HTMLPrimer.html#HD
The sameFile( ) method is similar to the equals( ) method of the URL class. The main difference between sameFile( ) and equals( ) is that equals( ) considers the fragment identifier (if any), whereas sameFile( ) does not. The two URLs shown here do not compare equal although they are the same file. Also, any object may be passed to equals( ); only URL objects can be passed to sameFile( ).
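The difference between the two methods can be checked directly. The following sketch compares two URLs that differ only in their fragment identifiers. The class name FragmentComparison and the method compare( ) are invented for this example, and the host is changed to localhost (an assumption made here, not something the text prescribes) so that the host comparison does not depend on a remote DNS lookup.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; not part of the original examples.
class FragmentComparison {

    // Returns a two-element array: { sameFile result, equals result }.
    static boolean[] compare(String first, String second)
            throws MalformedURLException {
        URL u1 = new URL(first);
        URL u2 = new URL(second);
        return new boolean[] { u1.sameFile(u2), u1.equals(u2) };
    }

    public static void main(String[] args) throws MalformedURLException {
        // Same host, port, and path; only the fragment identifiers differ.
        boolean[] r = compare("http://localhost/HTMLPrimer.html#GS",
                              "http://localhost/HTMLPrimer.html#HD");
        System.out.println("sameFile: " + r[0]);
        System.out.println("equals:   " + r[1]);
    }
}
```

For these two URLs, sameFile( ) reports true and equals( ) reports false. Giving both URLs the same fragment identifier makes equals( ) return true as well.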
public String toExternalForm( )
The toExternalForm( ) method returns a human-readable String representing the URL. It is identical to the toString( ) method. In fact, all the toString( ) method does is return toExternalForm( ). Therefore, this method is currently redundant and rarely used.
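As a small illustration of using the external form in an HTML link, the sketch below wraps a URL in an anchor element. The class name ExternalFormDemo and the method anchor( ) are invented for this example. It assumes the link text is already HTML-safe and does no escaping of its own.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; not part of the original examples.
class ExternalFormDemo {

    // Builds an HTML anchor element from a URL and link text.
    // Neither argument is escaped, so a URL containing an ampersand or
    // text containing markup would need extra handling in real code.
    static String anchor(URL u, String text) {
        return "<a href=\"" + u.toExternalForm() + "\">" + text + "</a>";
    }

    public static void main(String[] args) throws MalformedURLException {
        URL u = new URL("http://www.ibiblio.org/nywc/compositions.phtml?category=Piano");
        System.out.println(anchor(u, "Piano compositions"));
        // toString( ) and toExternalForm( ) produce the same string
        System.out.println(u.toString().equals(u.toExternalForm()));
    }
}
```

Concatenating the URL object directly into the string would give the same result, since string concatenation calls toString( ).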
public URI toURI( ) throws URISyntaxException // Java 1.5
Java 1.5 adds a toURI( ) method that converts a URL object to an equivalent URI object. We'll take up the URI class shortly. In the meantime, the main thing you need to know is that the URI class provides much more accurate, specification-conformant behavior than the URL class. For operations like absolutization and encoding, you should prefer the URI class where you have the option. In Java 1.4 and later, the URL class should be used primarily for the actual downloading of content from the remote server.
The Object Methods
URL inherits from java.lang.Object, so it has access to all the methods of the Object class. It overrides three to provide more specialized behavior: equals( ), hashCode( ), and toString( ).
public String toString( )
Like most classes, java.net.URL has a toString( ) method. Example 7-1 through Example 7-5 implicitly called this method when URLs were passed to System.out.println( ). As those examples demonstrated, the String produced by toString( ) is always an absolute URL, such as http://www.cafeaulait.org/javatutorial.html. It's uncommon to call toString( ) explicitly. Print statements call toString( ) implicitly. Outside of print statements, it's more proper to use toExternalForm( ) instead. If you do call toString( ), the syntax is simple:
URL codeBase = this.getCodeBase( );
String appletURL = codeBase.toString( );
public boolean equals(Object o)
An object is equal to a URL only if it is also a URL, both URLs point to the same file as determined by the sameFile( ) method, and both URLs have the same fragment identifier (or neither URL has a fragment identifier). Since equals( ) depends on sameFile( ), equals( ) has the same limitations as sameFile( ). For example, http://www.oracle.com/ is not equal to http://www.oracle.com/index-network-dev-java-programming-language.html.gz, and http://www.oracle.com:80/ is not equal to http://www.oracle.com/. Whether this makes sense depends on whether you think of a URL as a string or as a reference to a particular Internet resource. Example 7-7 creates URL objects for http://www.ibiblio.org/ and http://metalab.unc.edu/ and tells you if they're the same using the equals( ) method.
Example 7-7. Are http://www.ibiblio.org and http://metalab.unc.edu the same?
import java.net.*;

public class URLEquality {

  public static void main (String[] args) {

    try {
      URL ibiblio = new URL("http://www.ibiblio.org/");
      URL metalab = new URL("http://metalab.unc.edu/");
      if (ibiblio.equals(metalab)) {
        System.out.println(ibiblio + " is the same as " + metalab);
      }
      else {
        System.out.println(ibiblio + " is not the same as " + metalab);
      }
    }
    catch (MalformedURLException ex) {
      System.err.println(ex);
    }
  }
}
% java URLEquality
http://www.ibiblio.org/ is the same as http://metalab.unc.edu/
public int hashCode( )
The hashCode( ) method returns an int that is used when URL objects are used as keys in hash tables. Thus, it is called by the various methods of java.util.Hashtable. You rarely need to call this method directly, if ever. Hash codes for two different URL objects are unlikely to be the same, but it is certainly possible; there are far more conceivable URLs than there are four-byte integers.
Methods for Protocol Handlers
The last method in the URL class I'll just mention briefly here for the sake of completeness: setURLStreamHandlerFactory( ). It's primarily used by protocol handlers that are responsible for new schemes, not by programmers who just want to retrieve data from a URL. We'll discuss it in more detail in a later chapter.
public static synchronized void setURLStreamHandlerFactory(URLStreamHandlerFactory factory)
This method sets the URLStreamHandlerFactory for the application and throws a generic Error if the factory has already been set. A URLStreamHandler is responsible for parsing the URL and then constructing the appropriate URLConnection object to handle the connection to the server. Most of the time this happens behind the scenes.
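To make the moving parts a little more concrete, here is a minimal sketch of a factory that adds a made-up scheme named demo. Everything about it is hypothetical: the scheme, the class name DemoHandlerFactory, and the helper methods install( ) and fetch( ). Its handler simply serves the URL's own path back as the content, so no server is involved. A real protocol handler would open a socket and speak an actual protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

// Hypothetical factory that adds a made-up "demo" scheme.
class DemoHandlerFactory implements URLStreamHandlerFactory {

    // Returning null for other schemes lets java.net.URL fall back on
    // its built-in handlers (http, file, and so on).
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"demo".equals(protocol)) return null;
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() {
                        connected = true; // nothing to connect to
                    }
                    @Override
                    public InputStream getInputStream() {
                        // serve the URL's own path back as the "content"
                        return new ByteArrayInputStream(
                                url.getPath().getBytes(StandardCharsets.UTF_8));
                    }
                };
            }
        };
    }

    // Installs the factory; returns false if one had already been set,
    // which the URL class reports by throwing a java.lang.Error.
    static boolean install() {
        try {
            URL.setURLStreamHandlerFactory(new DemoHandlerFactory());
            return true;
        } catch (Error alreadySet) {
            return false;
        }
    }

    // Reads everything the URL's stream supplies (ASCII assumed).
    static String fetch(String spec) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new URL(spec).openStream()) {
            int c;
            while ((c = in.read()) != -1) sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("installed: " + install());
        System.out.println("installed again: " + install());
        System.out.println(fetch("demo:hello"));
    }
}
```

The second call to install( ) shows the once-only rule in action: the factory can be set a single time per virtual machine, so a library should generally leave this decision to the application.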