Patterns and Pattern Matching

We'll continue refining our solution to Task 4-1 later in this chapter. The next type of string operator is used to match portions of a variable's string value against patterns. Patterns, as we saw in Chapter 1, are strings that can contain wildcard characters (*, ?, and [] for character sets and ranges).

Table 4-2 lists bash's pattern-matching operators.

广告:个人专属 VPN,独立 IP,无限流量,多机房切换,还可以屏蔽广告和恶意软件,每月最低仅 5 美元

Table 4-2. Pattern-matching operators

Operator

Meaning

${variable #pattern}

If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.

${variable ##pattern}

If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.

${variable %pattern}

If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.

${variable %%pattern}

If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

${variable/ pattern/ string}${variable// pattern/ string}

The longest match to pattern in variable is replaced by string. In the first form, only the first match is replaced. In the second form, all matches are replaced. If the pattern begins with a #, it must match at the start of the variable. If it begins with a %, it must match with the end of the variable. If string is null, the matches are deleted. If variable is @ or *, the operation is applied to each positional parameter in turn and the expansion is the resultant list.[6]

[6] The pattern-matching and replacement operator is not available in versions of bash prior to 2.0.

These can be hard to remember; here's a handy mnemonic device: # matches the front because number signs precede numbers; % matches the rear because percent signs follow numbers.

The classic use for pattern-matching operators is in stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home/cam/book/long.file.name; then:

Expression                   Result
${path##/*/}                      long.file.name
${path#/*/}              cam/book/long.file.name
$path              /home/cam/book/long.file.name
${path%.*}         /home/cam/book/long.file
${path%%.*}        /home/cam/book/long

The two patterns used here are /*/, which matches anything between two slashes, and .*, which matches a dot followed by anything.

The longest and shortest pattern-matching operators produce the same output unless they are used with the * wildcard operator. As an example, if filename had the value alicece, then both ${filename%ce} and ${filename%%ce} would produce the result alice. This is because ce is an exact match; for a match to occur, the string ce must appear on the end $filename. Both the short and long matches will then match the last grouping of ce and delete it. If, however, we had used the * wildcard, then ${filename%ce*} would produce alice because it matches the shortest occurrence of ce followed by anything else. ${filename%%ce*} would return ali because it matches the longest occurrence of ce followed by anything else; in this case the first and second ce.

The next task will incorporate one of these pattern-matching operators.


Task 4-2

You are writing a graphics file conversion utility for use in creating a web page. You want to be able to take a PCX file and convert it to a JPEG file for use on the web page.[7]


Graphics file conversion utilities are quite common because of the plethora of different graphics formats and file types. They allow you to specify an input file, usually from a range of different formats, and convert it to an output file of a different format. In this case, we want to take a PCX file, which can't be displayed with a web browser, and convert it to a JPEG which can be displayed by nearly all browsers. Part of this process is taking the filename of the PCX file, which ends in .pcx, and changing it to one ending in .jpg for the output file. In essence, you want to take the original filename and strip off the .pcx, then append .jpg. A single shell statement will do this:

outfile=${filename%.pcx}.jpg

The shell takes the filename and looks for .pcx on the end of the string. If it is found, .pcx is stripped off and the rest of the string is returned. For example, if filename had the value alice.pcx, the expression ${filename%.pcx} would return alice. The .jpg is appended to form the desired alice.jpg, which is then stored in the variable outfile.

If filename had an inappropriate value (without the .pcx) such as alice.xpm, the above expression would evaluate to alice.xpm.jpg: since there was no match, nothing is deleted from the value of filename, and .jpg is appended anyway. Note, however, that if filename contained more than one dot (e.g., if it were alice.1.pcx—the expression would still produce the desired value alice.1.jpg).

The next task uses the longest pattern-matching operator.


Task 4-3

You are implementing a filter that prepares a text file for printer output. You want to put the file's name—without any directory prefix—on the "banner" page. Assume that, in your script, you have the pathname of the file to be printed stored in the variable pathname.


Clearly, the objective is to remove the directory prefix from the pathname. The following line will do it:

bannername=${pathname##*/}

This solution is similar to the first line in the examples shown before. If pathname were just a filename, the pattern */ (anything followed by a slash) would not match and the value of the expression would be pathname untouched. If pathname were something like book/wonderland, the prefix book/ would match the pattern and be deleted, leaving just wonderland as the expression's value. The same thing would happen if pathname were something like /home/cam/ book/wonderland: since the ## deletes the longest match, it deletes the entire /home/cam/book/.

If we used #*/ instead of ##*/, the expression would have the incorrect value home/cam/book/wonderland, because the shortest instance of "anything followed by a slash" at the beginning of the string is just a slash (/).

The construct ${ variable ##*/} is actually equivalent to the UNIX utility basename. basename takes a pathname as argument and returns the filename only; it is meant to be used with the shell's command substitution mechanism (see the following explanation). basename is less efficient than ${ variable ##*/} because it runs in its own separate process rather than within the shell. Another utility, dirname, does essentially the opposite of basename: it returns the directory prefix only. It is equivalent to the bash expression ${ variable %/*} and is less efficient for the same reason.

The last operator in the table matches patterns and performs substitutions. Task 4-4 is a simple task where it comes in useful.


Task 4-4

The directories in PATH can be hard to distinguish when printed out as one line with colon delimiters. You want a simple way to display them, one to a line.


As directory names are separated by colons, the easiest way would be to replace each colon with a LINEFEED:

$ echo -e ${PATH//:/'\n'}
/home/cam/bin
/usr/local/bin
/bin
/usr/bin
/usr/X11R6/bin

Each occurrence of the colon is replaced by \n. As we saw earlier, the -e option allows echo to interpret \n as a LINEFEED. In this case we used the second of the two substitution forms. If we'd used the first form, only the first colon would have been replaced with a \n.