Santa Clara University
Department of Mathematics and Computer Science

Math 169 Notes -- SNOBOL



A. "Hello World!" in SNOBOL

The well-known sample program can be written in SNOBOL as follows:

        OUTPUT = "Hello world!"
END

B. History

SNOBOL and its most recent version, SNOBOL4, were designed to make it easy to manipulate character strings. SNOBOL was developed at Bell Labs between 1962 and 1964, and SNOBOL4 was released in 1967. SNOBOL introduced pattern matching as a fundamental "operation" and included operations and functions specifically geared to string manipulation.

Some of the features of SNOBOL include user-defined functions and data types, automatic data-type conversion, heterogeneous arrays (i.e., arrays whose elements may be of different data types), and a "table" data type with associative lookup (i.e., an "array" with arbitrary, non-numeric subscripts).

C. Pattern Matching

A fundamental "operation" (action) in SNOBOL is pattern matching, that is, finding (or failing to find) a "pattern" within a "subject" string.

If this operation succeeds, one may wish to perform a replacement within the subject string, or perform some type of action. If this operation does not succeed ("fails"), one may wish to perform an alternative action.

The ability to detect the success or failure of a pattern match is fundamental to SNOBOL. A pattern match can also be combined, in the same statement, with an assignment that replaces the matched substring.

A pattern match is written by placing a subject string (or variable) on a statement line, followed by a pattern.

There is no explicit matching operator: the match is indicated simply by the blank space between subject and pattern. A statement may end with a goto field consisting of a colon followed by S(label), which transfers control to that label if the statement succeeds, and/or F(label), which transfers control to that label if it fails.

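Thus, the following program replaces every X with A and every Y with B in a line of input. A replacement statement changes only the first occurrence it finds, so the :S goto sends control back to the same statement until the match fails, that is, until no X (or Y) remains: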

        TEXT = INPUT
ONE     TEXT 'X' = 'A'          :S(ONE)
TWO     TEXT 'Y' = 'B'          :S(TWO)
        OUTPUT = TEXT
END
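The success/failure control flow of this program can be modeled in Python (used here only for illustration; this is not SNOBOL). In this minimal sketch, the hypothetical helper `replace_first` stands in for a statement such as `TEXT 'X' = 'A'`: it replaces only the leftmost occurrence and reports whether the statement succeeded, and each loop plays the role of an `:S(...)` goto back to the same label.

```python
def replace_first(subject: str, pattern: str, replacement: str) -> tuple:
    """Hypothetical model of the SNOBOL statement `SUBJECT PATTERN = REPLACEMENT`.

    Replaces only the leftmost occurrence of `pattern`. The boolean result
    plays the role of statement success (True) or failure (False).
    """
    i = subject.find(pattern)
    if i < 0:
        return False, subject  # match fails: subject is left unchanged
    return True, subject[:i] + replacement + subject[i + len(pattern):]


def translate_xy(text: str) -> str:
    # ONE  TEXT 'X' = 'A'  :S(ONE)  -- repeat the statement while it succeeds
    while True:
        ok, text = replace_first(text, "X", "A")
        if not ok:
            break
    # TWO  TEXT 'Y' = 'B'  :S(TWO)
    while True:
        ok, text = replace_first(text, "Y", "B")
        if not ok:
            break
    return text  # OUTPUT = TEXT


print(translate_xy("XYZZY XRAY"))  # prints ABZZB ARAB
```

All X's are rewritten first and then all Y's, in the same order as the two labeled statements run.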
Patterns can consist of alternatives (separated by the "pipe," a vertical bar with spaces on either side), and several subpatterns are concatenated by (again) the blank space "operator." The dot operator assigns the substring matched by a pattern to a variable once the entire match has succeeded. Thus,
        X = 'BREAD AND BUTTER'
        PAT = (('B' | 'R') ('E' | 'EA') ('D' | 'DS')) . Y
        X PAT
        OUTPUT = Y
END
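results in Y being assigned the value READ. The match is not anchored at the start of X: at the first position 'B' matches but neither 'E' nor 'EA' follows, so the scan moves on one character; from there 'R' and 'E' match, 'D' fails against 'A', and backtracking to the alternative 'EA' lets 'D' succeed.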
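The way SNOBOL arrives at READ (trying alternatives left to right, backing up when a later subpattern fails, and sliding the starting position forward when everything fails) can be sketched in Python. This is a hypothetical model for illustration only: `lit`, `alt`, `cat`, and `scan` are names invented here, and each pattern is modeled as a generator of possible end positions, so backtracking amounts to asking it for the next one.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical model: a pattern is a function that, given the subject and a
# cursor, yields every cursor position where it could end, in the order the
# alternatives are tried. Backtracking = asking the generator for the next one.
Pattern = Callable[[str, int], Iterator[int]]


def lit(text: str) -> Pattern:
    # A literal string matches only if it appears exactly at the cursor
    def match(subject: str, cur: int) -> Iterator[int]:
        if subject.startswith(text, cur):
            yield cur + len(text)
    return match


def alt(*choices: Pattern) -> Pattern:
    # P | Q: try the alternatives from left to right
    def match(subject: str, cur: int) -> Iterator[int]:
        for choice in choices:
            yield from choice(subject, cur)
    return match


def cat(*parts: Pattern) -> Pattern:
    # P Q: for each place P can end, try the rest of the sequence there
    def match(subject: str, cur: int) -> Iterator[int]:
        if not parts:
            yield cur
            return
        rest = cat(*parts[1:])
        for mid in parts[0](subject, cur):
            yield from rest(subject, mid)
    return match


def scan(subject: str, pat: Pattern) -> Optional[Tuple[int, int]]:
    # Unanchored match: slide the start position right until pat succeeds
    for start in range(len(subject) + 1):
        for end in pat(subject, start):
            return start, end  # first success wins
    return None


PAT = cat(alt(lit("B"), lit("R")),
          alt(lit("E"), lit("EA")),
          alt(lit("D"), lit("DS")))
X = "BREAD AND BUTTER"
found = scan(X, PAT)
Y = X[found[0]:found[1]] if found else None  # models the ". Y" assignment
print(Y)  # prints READ
```

Because `alt` tries 'D' before 'DS', a subject such as 'BEDS' would match only 'BED', mirroring SNOBOL's choice of the first alternative that lets the whole pattern succeed.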

D. Functions

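Besides some of the common string functions (e.g., LGT, "lexically greater than"), SNOBOL introduced a number of functions and primitive patterns designed specifically for string processing, many of which were carried over to Icon. Apart from TRIM and SIZE, which return values, the items below are patterns (or functions that return patterns) and are used as components of a pattern match. Cursor positions are counted from 0, the position before the first character.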

TRIM(X) -- removes all trailing blanks from the string X
SIZE(X) -- returns the length of the string X
TAB(n) -- matches the substring from the current cursor position up to cursor position n; fails if the cursor is already past n or n is beyond the end of the subject
LEN(n) -- matches any string of exactly n characters
ARB -- matches an arbitrary string: first the null string, then successively longer strings as the match backtracks
ANY(X) -- matches any single character contained in the string X (X is treated as a set of characters)
NOTANY(X) -- matches any single character NOT contained in the string X (X is treated as a set of characters)
SPAN(X) -- matches the longest nonempty run of characters, starting at the cursor, that are all in X
BREAK(X) -- matches the run of characters from the cursor up to, but not including, the first character that is in X; fails if no such character follows
POS(n) -- matches the null string if the matching cursor is at position n
REM -- matches the remainder of the subject string from the current cursor position
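As a rough illustration of how these primitives move the matching cursor, here are hypothetical Python models of the deterministic ones. Each matcher returns the new cursor position on success or `None` on failure. ARB is left out because it can succeed in several ways and would need backtracking, and TRIM and SIZE are ordinary string functions rather than patterns. The uppercase names simply mirror the SNOBOL ones and are not part of any library.

```python
from typing import Callable, Optional

# Hypothetical models of deterministic SNOBOL4 primitive patterns. A matcher
# takes the subject and the cursor (0 = before the first character) and
# returns the new cursor on success or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def LEN(n: int) -> Matcher:
    # exactly n characters
    return lambda s, c: c + n if c + n <= len(s) else None


def TAB(n: int) -> Matcher:
    # everything from the cursor up to position n
    return lambda s, c: n if c <= n <= len(s) else None


def POS(n: int) -> Matcher:
    # the null string, only when the cursor is at n
    return lambda s, c: c if c == n else None


def REM(s: str, c: int) -> Optional[int]:
    # REM is a pattern rather than a function, so it is a matcher itself
    return len(s)


def ANY(chars: str) -> Matcher:
    return lambda s, c: c + 1 if c < len(s) and s[c] in chars else None


def NOTANY(chars: str) -> Matcher:
    return lambda s, c: c + 1 if c < len(s) and s[c] not in chars else None


def SPAN(chars: str) -> Matcher:
    def m(s: str, c: int) -> Optional[int]:
        end = c
        while end < len(s) and s[end] in chars:
            end += 1
        return end if end > c else None  # needs at least one character
    return m


def BREAK(chars: str) -> Matcher:
    def m(s: str, c: int) -> Optional[int]:
        end = c
        while end < len(s) and s[end] not in chars:
            end += 1
        return end if end < len(s) else None  # a character of chars must follow
    return m


subject = "DENNIS SMOLARSKI"
print(BREAK(" ")(subject, 0), SPAN(" ")(subject, 6))  # prints 6 7
```

On the subject 'DENNIS SMOLARSKI', `BREAK(" ")` starting at cursor 0 stops at 6, just before the blank, and `SPAN(" ")` starting at 6 moves past the blank to 7.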

The following is an example making use of some of these functions to capture sections of a subject string.

        CARD = 'DENNIS SMOLARSKI 554-4124'
        X = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        PAT = BREAK(X) SPAN(X) . FIRST SPAN(' ') SPAN(X) . LAST SPAN(' ') REM . TEL
        CARD PAT
        OUTPUT = "FIRSTNAME=" FIRST
        OUTPUT = "LASTNAME=" LAST
        OUTPUT = "TELEPHONE=" TEL
END
The pattern PAT tells the matcher to skip ("break") up to the first character in the set X (i.e., the first capital letter; here it matches the null string, since the subject begins with a letter), then move over ("span") the longest run of letters and assign it to FIRST, then span the blanks, then span the next run of letters and assign it to LAST, then span the blanks again, and finally assign the REMainder of the subject string to TEL.
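The step-by-step reading of PAT can be sketched in Python. This is a hypothetical model, not part of SNOBOL: `span`, `break_`, `match_at`, and `parse_card` are illustrative names. Each step advances a cursor or fails. Captured values are returned only if every step succeeds, mirroring how the dot operator assigns only after the whole match succeeds, and the outer loop retries at later starting positions as an unanchored SNOBOL match does.

```python
from typing import Callable, Dict, List, Optional, Tuple

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # plays the role of X in the program


def span(s: str, c: int, chars: str) -> Optional[int]:
    # SPAN(chars): longest nonempty run of characters drawn from chars
    end = c
    while end < len(s) and s[end] in chars:
        end += 1
    return end if end > c else None


def break_(s: str, c: int, chars: str) -> Optional[int]:
    # BREAK(chars): up to, not including, the first character from chars
    end = c
    while end < len(s) and s[end] not in chars:
        end += 1
    return end if end < len(s) else None


Step = Tuple[Callable[[str, int], Optional[int]], Optional[str]]

# PAT = BREAK(X) SPAN(X) . FIRST SPAN(' ') SPAN(X) . LAST SPAN(' ') REM . TEL
PAT: List[Step] = [
    (lambda s, c: break_(s, c, LETTERS), None),
    (lambda s, c: span(s, c, LETTERS), "FIRST"),
    (lambda s, c: span(s, c, " "), None),
    (lambda s, c: span(s, c, LETTERS), "LAST"),
    (lambda s, c: span(s, c, " "), None),
    (lambda s, c: len(s), "TEL"),  # REM
]


def match_at(subject: str, c: int) -> Optional[Dict[str, str]]:
    captures: Dict[str, str] = {}
    for step, name in PAT:
        end = step(subject, c)
        if end is None:
            return None  # the whole match fails, so nothing is assigned
        if name is not None:
            captures[name] = subject[c:end]
        c = end
    return captures


def parse_card(card: str) -> Optional[Dict[str, str]]:
    # Unanchored match: retry from each start position until one succeeds
    for start in range(len(card) + 1):
        result = match_at(card, start)
        if result is not None:
            return result
    return None


result = parse_card("DENNIS SMOLARSKI 554-4124")
if result is not None:
    print("FIRSTNAME=" + result["FIRST"])
    print("LASTNAME=" + result["LAST"])
    print("TELEPHONE=" + result["TEL"])
```

A subject with leading blanks, such as '  JOHN DOE 555', shows what BREAK(X) is for: it skips the two blanks before SPAN(X) takes over.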

E. Sample Program

The well-known interactive "counseling" program, Eliza, is shown here written in SNOBOL.


This page is maintained by Dennis C. Smolarski, S.J. (dsmolarski@math.scu.edu).
Last updated: 14 May 2002.