Workshop: Cleaning Up HTML Tags in a Web Page

This hour's workshop uses Java's file-handling and property configuration capabilities to clean up web pages. The TagCleaner app transforms all HTML markup tags on pages to upper- or lowercase, making these formatting commands more readable. Listing 20.3 contains an example web page, test.html, with markup tags capitalized in several different ways.

Listing 20.3. The Full Text of test.html
<HTML>
<Head>
<TITLE>PALINDROME</TITLE>
</head>
<Body BGCOLOR="#FFFFFF">
<div ALIGN="Center">
<P>Dennis and Edna sinned.</P>
</div>
</Body>
</HTML>


The HTML markup tags in Listing 20.3 are enclosed within "<" and ">" characters. These tags are not displayed when the page is shown in a browser; they dictate the format and positioning of elements on the page. On this page, some tag names are written in all caps, others have an initial capital letter, and some are lowercase. HTML tag names are case insensitive, a coding term that means it doesn't matter how you capitalize something, as long as you spell it properly. This page loads properly in browsers, but editing the HTML is more difficult because the tags don't stand out as clearly. In Listing 20.4, the tags have been converted consistently to uppercase.

Listing 20.4. The Full Text of test.html.clean
<HTML>
<HEAD>
<TITLE>PALINDROME</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<DIV ALIGN="Center">
<P>Dennis and Edna sinned.</P>
</DIV>
</BODY>
</HTML>


Tag names are hard to miss with the formatting in Listing 20.4. The TagCleaner app takes configuration settings from a properties file that determines two aspects of the program's behavior:

  • Whether to convert tags to lowercase or uppercase
  • Whether to display status output when the program runs

The properties file, TagCleaner.properties, is contained in Listing 20.5.

Listing 20.5. The Full Text of TagCleaner.properties
case=upper
hideOutput=false
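
As a rough illustration of how settings like these can be read, the following sketch loads the same two keys with java.util.Properties. The class name PropertiesSketch and the method parse() are hypothetical and not part of the TagCleaner app, and the text is read from a string instead of a file so the example stands alone. It also supplies fallback values through the two-argument form of getProperty(), an addition that TagCleaner itself does not make.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper class, used here only for illustration
class PropertiesSketch {

    // Parse text in .properties format into a Properties object
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] arguments) throws IOException {
        Properties props = parse("case=upper\nhideOutput=false\n");
        // The second argument is returned when the key is missing
        String caseProperty = props.getProperty("case", "upper");
        boolean hideOutput = Boolean.parseBoolean(
            props.getProperty("hideOutput", "false"));
        System.out.println("case: " + caseProperty);
        System.out.println("hideOutput: " + hideOutput);
    }
}
```

Supplying a fallback matters because getProperty() with one argument returns null when a key is absent, and calling equals() on that null value would throw a NullPointerException.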


With the properties file and a web page to use as input, you're ready to create and run the TagCleaner app. Enter the text of Listing 20.6 into an editor and save the file as TagCleaner.java when you're done.

Listing 20.6. The Full Text of TagCleaner.java
 1: import java.io.*;
 2: import java.util.*;
 3:
 4: public class TagCleaner {
 5:
 6:     public static void main(String[] arguments) {
 7:         if (arguments.length < 1) {
 8:             System.out.println("Usage: java TagCleaner filename");
 9:             System.exit(-1);
10:         }
11:         TagCleaner cleaner = new TagCleaner(arguments[0]);
12:     }
13:
14:     public TagCleaner(String filename) {
15:         try {
16:             // load configuration properties
17:             File propFile = new File("TagCleaner.properties");
18:             FileInputStream propStream = new FileInputStream(propFile);
19:             Properties props = new Properties();
20:             props.load(propStream);
21:             String caseProperty = props.getProperty("case");
22:             String hideOutput = props.getProperty("hideOutput");
23:             // set up file input and output
24:             File file = new File(filename);
25:             FileInputStream in = new FileInputStream(file);
26:             File clean = new File(filename + ".clean");
27:             FileOutputStream out = new FileOutputStream(clean);
28:             boolean eof = false;
29:             boolean inTag = false;
30:             boolean inQuote = false;
31:             if (hideOutput.equals("false")) {
32:                 System.out.print("Creating file ... ");
33:             }
34:             while (!eof) {
35:                 int input = in.read();
36:                 if (input == -1) {
37:                     eof = true;
38:                     continue;
39:                 }
40:                 // look for quote characters
41:                 if (input == '"') {
42:                     if (inQuote != true) {
43:                         inQuote = true;
44:                     } else {
45:                         inQuote = false;
46:                     }
47:                 }
48:                 // look for tag opening
49:                 if (input == '<') {
50:                     inTag = true;
51:                 }
52:                 // look for tag closing
53:                 if (input == '>') {
54:                     inTag = false;
55:                     inQuote = false;
56:                 }
57:                 if ((!inTag) | (inQuote)) {
58:                     out.write((char)input);
59:                 } else {
60:                     if (caseProperty.equals("lower")) {
61:                         out.write(Character.toLowerCase((char)input));
62:                     } else {
63:                         out.write(Character.toUpperCase((char)input));
64:                     }
65:                 }
66:             }
67:             in.close();
68:             out.close();
69:             if (hideOutput.equals("false")) {
70:                 System.out.println("done");
71:             }
72:         } catch (Exception e) {
73:             System.out.println("error\n\n" + e.toString());
74:         }
75:     }
76: }


When you're done, compile the file with your Java compiler. The app requires a web page to use as input. This page won't be altered by the app—instead, a transformed version of the page will be saved under a new filename. Run the app with the name of the page as a command-line argument, as in the following JDK example:

java TagCleaner test.html


The TagCleaner program reads the page one byte at a time. A file input stream associated with the page is created in Lines 24–25, and the page is read in the while loop contained in Lines 34–66. A boolean variable called eof is used as the condition of the loop. As long as it is equal to false, the loop continues executing. In Line 35, a single byte is read. If the value returned is -1, the end of the page has been reached, so eof is set to true in Lines 36–39, which causes the loop to end.

The altered version of the page is written to its own file, which is given the original file's name followed by ".clean" in Line 26. This file is associated with a file output stream in Line 27. Data is written to the altered page in Lines 57–65.

The TagCleaner app decides whether to change the case of a character based on this rule: If the character is contained within "<" and ">" characters, change its case, except when it is also contained within quote marks. The boolean variables inTag and inQuote keep track of whether the most recently read character from the page is contained within an HTML tag or quote marks.
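
The read-until-minus-one loop described above is often written more compactly by assigning and testing the byte in the loop condition. The following is a minimal sketch of that idiom, not the TagCleaner code itself: the class name ByteLoopSketch and the method copy() are hypothetical, and in-memory byte array streams stand in for the file streams so the example runs without any files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class, used here only for illustration
class ByteLoopSketch {

    // Copy every byte from in to out and return the number copied
    static int copy(InputStream in, OutputStream out) throws IOException {
        int count = 0;
        int input;
        // read() returns -1 when the end of the stream is reached
        while ((input = in.read()) != -1) {
            out.write(input);
            count++;
        }
        return count;
    }

    public static void main(String[] arguments) throws IOException {
        byte[] page = "<P>Dennis and Edna sinned.</P>"
            .getBytes(StandardCharsets.US_ASCII);
        // try-with-resources closes both streams, even after an exception
        try (InputStream in = new ByteArrayInputStream(page);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            int copied = copy(in, out);
            System.out.println(copied + " bytes copied");
        }
    }
}
```

With real files, a FileInputStream and FileOutputStream could be declared in the same try-with-resources header, which would close them automatically if a read or write failed partway through.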

By the Way

The reason for the exception: Quote marks surround attribute values inside HTML tags, and these values might be case sensitive. For example, consider this tag to display a graphic on a web page:

<IMG SRC="Mugshot.gif">


An IMG tag displays the file named in its SRC attribute. If TagCleaner changed the SRC value from Mugshot.gif to mugshot.gif, the page could stop displaying the graphics file on a web server that treats filenames as case sensitive.
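
To see the exception at work, here is a small sketch that applies the same inTag and inQuote rule to a string instead of a file. The class name TagCaseSketch and the method changeTagCase() are hypothetical and do not appear in the TagCleaner app; the flag handling is modeled on the loop in Listing 20.6.

```java
// Hypothetical class, used here only for illustration
class TagCaseSketch {

    // Change the case of characters inside tags, leaving quoted
    // attribute values and text outside tags untouched
    static String changeTagCase(String html, boolean lower) {
        StringBuilder result = new StringBuilder();
        boolean inTag = false;
        boolean inQuote = false;
        for (char c : html.toCharArray()) {
            if (c == '"') {
                inQuote = !inQuote;
            }
            if (c == '<') {
                inTag = true;
            }
            if (c == '>') {
                inTag = false;
                inQuote = false;
            }
            if (!inTag || inQuote) {
                result.append(c);
            } else if (lower) {
                result.append(Character.toLowerCase(c));
            } else {
                result.append(Character.toUpperCase(c));
            }
        }
        return result.toString();
    }

    public static void main(String[] arguments) {
        // prints <IMG SRC="Mugshot.gif">
        System.out.println(changeTagCase("<img src=\"Mugshot.gif\">", false));
    }
}
```

The tag name and attribute name change case, but the filename between the quote marks keeps its original capitalization.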


      