Program: htmlsub

This program makes substitutions in HTML files so that the changes happen only in normal text, never inside tags or attribute values. Suppose you had the file scooby.html that contained:

<HTML><HEAD><TITLE>Hi!</TITLE></HEAD><BODY>
<H1>Welcome to Scooby World!</H1> I have <A href="pictures.html">pictures</A> of the crazy dog himself. Here's one!<P>
<IMG src="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK> I would like to meet him some day, and get my picture taken with him.<P> P.S. I am deathly ill. <A href="shergold.html">Please send cards</A>. </BODY></HTML>

You can use htmlsub to change every occurrence of the word "picture" in the document text to "photo". It prints the new document on STDOUT:

% htmlsub picture photo scooby.html
<HTML><HEAD><TITLE>Hi!</TITLE></HEAD><BODY>
<H1>Welcome to Scooby World!</H1> I have <A href="pictures.html">photos</A> of the crazy dog himself. Here's one!<P>
<IMG src="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK> I would like to meet him some day, and get my photo taken with him.<P> P.S. I am deathly ill. <A href="shergold.html">Please send cards</A>. </BODY></HTML>
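
Notice that the href attribute still reads "pictures.html" while the link text has changed. To see why restricting the change to normal text matters, here is a minimal sketch that compares a plain global substitution with one applied only between tags. It is not how htmlsub works (htmlsub relies on HTML::Filter to find the text); the helper name sub_in_text is hypothetical, and the tag-splitting regex is a deliberate simplification that ignores comments, entities, and a ">" inside an attribute value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sub_in_text is a hypothetical helper, not part of htmlsub.
# It splits the markup into tags and text with a deliberately
# simple regex, then substitutes only in the text pieces.
sub sub_in_text {
    my ($html, $from, $to) = @_;
    my $out = '';
    # The capturing group makes split return the tags as well.
    for my $piece (split /(<[^>]*>)/, $html) {
        # Leave anything that looks like a complete tag alone.
        $piece =~ s/\Q$from\E/$to/g unless $piece =~ /\A<[^>]*>\z/;
        $out .= $piece;
    }
    return $out;
}

my $html = 'I have <A href="pictures.html">pictures</A> of him.';

# A plain substitution also rewrites the attribute value.
(my $naive = $html) =~ s/picture/photo/g;
print "naive:     $naive\n";
# naive:     I have <A href="photos.html">photos</A> of him.

print "text only: ", sub_in_text($html, 'picture', 'photo'), "\n";
# text only: I have <A href="pictures.html">photos</A> of him.
```

The naive version breaks the link by renaming its target; the text-only version leaves it intact. A real parser, as used in the program itself, is the safer way to find the text.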

The program is shown in Example 20.11.

Example 20.11: htmlsub

#!/usr/bin/perl -w
# htmlsub - make substitutions in normal text of HTML files
# from Gisle Aas <gisle@aas.no>

sub usage { die "Usage: $0 <from> <to> <file>...\n" }

my $from = shift or usage;
my $to   = shift or usage;
usage unless @ARGV;

# Build the HTML::Filter subclass to do the substituting.

package MyFilter;
require HTML::Filter;
@ISA=qw(HTML::Filter);
use HTML::Entities qw(decode_entities encode_entities);

sub text
{
   my $self = shift;
   my $text = decode_entities($_[0]);
   $text =~ s/\Q$from/$to/go;              # most important line
   $self->SUPER::text(encode_entities($text));
}

# Now use the class.

package main;
foreach (@ARGV) {
    MyFilter->new->parse_file($_);
}
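
The text method decodes entities before substituting and encodes them again afterward, so the pattern is matched against what the reader sees rather than against the raw markup. The following sketch isolates that round trip on a single string. It assumes HTML::Entities is installed (it ships with the HTML-Parser distribution); the helper name sub_with_entities and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# sub_with_entities is a hypothetical helper that mirrors the body
# of MyFilter::text for one chunk of text.
sub sub_with_entities {
    my ($raw, $from, $to) = @_;
    my $text = decode_entities($raw);    # "&amp;" becomes "&"
    $text =~ s/\Q$from\E/$to/g;          # literal, global substitution
    return encode_entities($text);       # re-escape for HTML output
}

my $raw  = 'Scooby &amp; Shaggy';
my $from = 'Scooby & Shaggy';

# Against the raw markup, the literal pattern never matches.
(my $undecoded = $raw) =~ s/\Q$from\E/Scooby and Shaggy/g;
print "$undecoded\n";
# Scooby &amp; Shaggy

# After decoding, it does.
print sub_with_entities($raw, $from, 'Scooby and Shaggy'), "\n";
# Scooby and Shaggy

# Special characters in the replacement are escaped on the way out.
print sub_with_entities('Tom &amp; Jerry', 'Tom', 'Tom <the cat>'), "\n";
# Tom &lt;the cat&gt; &amp; Jerry
```

Encoding on the way out also means a replacement string containing "<" or "&" cannot inject markup into the document.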