Matching Letters

Problem

You want to see whether a value consists only of alphabetic characters.

Solution

The obvious character class for matching regular letters isn't good enough in the general case:

if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}

That's because it doesn't respect the user's locale settings. If you need to match letters with diacritics as well, use locale and match against a negated character class:

use locale;
if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}

Discussion

Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The \w regular expression notation matches one alphabetic, numeric, or underscore character. Therefore, \W matches any character that is not one of those. The negated character class [^\W\d_] specifies a byte that must be an alphanumunder (because it is not a \W) but must be neither a digit nor an underscore. That leaves us with nothing but alphabetics, which is what we were looking for.
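The subtraction described above can be checked on plain ASCII input before any locale is involved. The following is only a minimal sketch: the helper name is_alpha and the sample strings are hypothetical and not part of the recipe, and without use locale the class matches ASCII letters only.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): true if $s consists of one
# or more word characters that are neither digits nor underscores.
sub is_alpha {
    my ($s) = @_;
    return $s =~ /^[^\W\d_]+$/ ? 1 : 0;
}

# Sample values chosen for illustration only.
for my $s ("silly", "abc123", "snake_case", "random!stuff", "") {
    printf "%-16s %s\n", "'$s'",
        is_alpha($s) ? "alphabetic" : "not alphabetic";
}
```

Of these samples, only "silly" is reported as alphabetic: the digits, the underscore, the punctuation, and the empty string each cause the match to fail.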

Here's how you'd use this in a program:

use locale;
use POSIX 'locale_h';

# the following locale string might be different on your system
unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
    die "couldn't set locale to French Canadian\n";
}

while (<DATA>) {
    chomp;
    if (/^[^\W\d_]+$/) {
        print "$_: alphabetic\n";
    } else {
        print "$_: line noise\n";
    }
}

__END__
silly
façade
coöperate
niño
Renée
Molière
hæmoglobin
naïve
tschüß
random!stuff#here

See Also

The treatment of locales in Perl in perllocale(1); your system's locale(3) manpage; we discuss locales in greater depth in ; the "Perl and the POSIX Locale" section of Chapter 7 of Mastering Regular Expressions