A1 Understanding two-level rules

[ Guide contents | Appendix A contents | Next section: A2 Implementing two-level rules as finite state machines ]

A1.1 Generative rules and two-level rules

A1.2 Correspondences and feasible pairs

A1.3 Two-level rule notation

A1.4 Using character subsets in rules

A1.5 The four rule types

A1.6 Expressing complex environments

A1.7 Understanding two-level environments

Figure A1 Diagnostic properties of the four rule types

Section A1 describes the form and meaning of two-level rules: how they differ from the rules of generative phonology, their notation, the four types of two-level rules, and the concept of two-level environments.

A1.1 Generative rules and two-level rules

Two-level rules are similar to the rules of standard generative phonology, but differ in several crucial ways. Rule R1 is an example of a generative rule.

R1        t ---> c / ___ i

Rule R2 is the analogous two-level rule.

R2        t:c => ___ i

The difference between the two rule formalisms is not just notational; rather, their meanings are different.

Generative rules have three main characteristics. First, they are transformational rules: they transform, or rewrite, one symbol into another. Rule R1 states that t becomes (is changed into) c when it precedes i. After rule R1 rewrites t as c, t no longer "exists." Second, sequentially applied generative rules convert underlying forms to surface forms via any number of intermediate levels of representation; that is, the application of each rule creates a new intermediate level of representation. Third, generative rules are unidirectional: they can only convert underlying forms to surface forms, not vice versa.

In contrast, two-level rules are declarative; they state that certain correspondences hold between a lexical (that is, underlying) form and its surface form. Rule R2 states that lexical t corresponds to surface c before i; it is not changed into c, and it still exists after the rule is applied. Because two-level rules express a correspondence rather than rewrite symbols, they apply in parallel rather than sequentially. Thus no intermediate levels of representation are created as artifacts of a rewriting process. Only the lexical and surface levels are allowed. It is this aspect of their nature that is emphasized by the name "two-level" rules. Furthermore, because the two-level model is defined as a set of correspondences between lexical and surface representations, two-level rules are bidirectional. Given a lexical form, PC-KIMMO will return the corresponding surface form; given a surface form, it will return the corresponding lexical form.
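The declarative, bidirectional reading of rule R2 can be illustrated with a small brute-force sketch in Python. It is not how PC-KIMMO works internally (section A2 describes the finite state implementation). It assumes a toy description whose only feasible pairs are t:t, a:a, i:i, and t:c, and the function names are hypothetical. Rule R2 becomes a single test over a whole sequence of pairs, and that one test filters candidate surface forms during generation and candidate lexical forms during recognition.

```python
from itertools import product

# Toy description (an assumption): the only feasible pairs, as (lexical, surface).
FEASIBLE = {("t", "t"), ("a", "a"), ("i", "i"), ("t", "c")}

def r2_holds(pairs):
    """R2  t:c => ___ i : t:c may occur only immediately before i:i."""
    for k, pair in enumerate(pairs):
        if pair == ("t", "c"):
            nxt = pairs[k + 1] if k + 1 < len(pairs) else None
            if nxt != ("i", "i"):
                return False
    return True

def generate(lexical):
    """Lexical form -> every surface form the toy description sanctions."""
    options = [[s for (l, s) in FEASIBLE if l == ch] for ch in lexical]
    return {"".join(surf) for surf in product(*options)
            if r2_holds(list(zip(lexical, surf)))}

def recognize(surface):
    """Surface form -> every lexical form the toy description sanctions."""
    options = [[l for (l, s) in FEASIBLE if s == ch] for ch in surface]
    return {"".join(lex) for lex in product(*options)
            if r2_holds(list(zip(lex, surface)))}

print(generate("tati"))   # the set of tati and taci (printed order may vary)
print(recognize("taci"))  # {'tati'}
```

Because R2 is written with the => operator, generation returns both tati and taci; section A1.5 explains how the other operators narrow this.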

A1.2 Correspondences and feasible pairs

Two-level rules treat each word as a correspondence between its lexical representation (LR) and its surface representation (SR). For example, consider the lexical form tati and its surface form taci:

LR:  t a t i
SR:  t a c i

Each pair of lexical and surface characters is a correspondence pair or simply correspondence. We write a correspondence with the notation lexical-character:surface-character, for instance t:t, a:a, and t:c. There must be an exact one-to-one correspondence between the characters of the lexical form and the characters of the surface form.

There are two types of correspondences exemplified in these forms: default correspondences such as t:t and a:a, and special correspondences such as t:c. Together, the default and special correspondences make up the set of feasible pairs sanctioned by the description. That is, a pair is feasible only if it is explicitly declared in the description, either as a default or as a special correspondence.
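A minimal sketch of this bookkeeping, assuming a toy description: the defaults happen to be identity pairs and are listed explicitly, t:c is the only special correspondence, and the names align and is_sanctioned are hypothetical. It enforces the one-to-one alignment and rejects any pair that was not declared; rules then restrict where the declared pairs may occur.

```python
# Default correspondences (listed explicitly, as a description must) and the
# special correspondence t:c. The alphabet is a toy assumption.
DEFAULTS = {("t", "t"), ("a", "a"), ("c", "c"), ("i", "i")}
SPECIALS = {("t", "c")}
FEASIBLE = DEFAULTS | SPECIALS

def align(lexical, surface):
    """Pair lexical and surface characters one to one."""
    if len(lexical) != len(surface):
        raise ValueError("lexical and surface forms must have equal length")
    return list(zip(lexical, surface))

def is_sanctioned(lexical, surface):
    """True if every correspondence in the alignment is a feasible pair."""
    return all(pair in FEASIBLE for pair in align(lexical, surface))

print(align("tati", "taci"))          # [('t', 't'), ('a', 'a'), ('t', 'c'), ('i', 'i')]
print(is_sanctioned("tati", "taci"))  # True
print(is_sanctioned("tati", "tasi"))  # False: t:s was never declared
```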

A1.3 Two-level rule notation

Looking again at rule R2 (from section A1.1), we see that a two-level rule is made up of three parts: the correspondence, the rule operator, and the environment or context. The first part of rule R2 is the correspondence t:c. It specifies a lexical t that corresponds to a surface c.

The second part of rule R2 is the rule operator =>. Although this operator is shaped like an arrow, its meaning is quite different from the rewriting arrow of generative rules (for instance, rule R1). The rule operator specifies the relationship between the correspondence and the environment in which it occurs. There are four operators: =>, <=, <=>, and /<=. The semantics of the rule operators are discussed in detail in section A1.5, but briefly they mean the following:

=>: the correspondence only occurs in the environment
<=: the correspondence always occurs in the environment
<=>: the correspondence always and only occurs in the environment
/<=: the correspondence never occurs in the environment

The third part of rule R2 is the environment or context, written as ___i. It specifies the phonological environment in which the correspondence is found. As in standard phonological notation, an underline, called an environment line, indicates the position of the correspondence in the environment.

The environment of rule R2 contains a notational shorthand. In its full form the rule is written this way:

R3        t:c => ___ i:i

Rule R3 states that a lexical t corresponds to (or is realized as) a surface c only when it precedes a lexical i that corresponds to (or is realized as) a surface i. If the lexical and surface characters of a correspondence pair are identical, the correspondence can be written as a single character. Thus rule R2 is equivalent to rule R3.

Rule R3 illustrates that environments are also stated in terms of two-level correspondences. We cannot just say that t corresponds to c before i; we must specify whether it is a lexical or surface i. This means that a two-level rule has access to both the lexical and surface environments (see section A1.7 below on two-level environments). In contrast, rules in generative phonology can refer only to the environment of the local level of representation, which is often an intermediate level. To emphasize the two-level nature of rule R3, we can write it on multiple lines:

       t        ___i
           =>
       c        ___i

The environment of a rule can also make use of a special "wildcard" or ANY symbol (written here as @) that stands for any alphabetic character (as qualified below). For example,

R4        t:c => ___ i:@

Rule R4 states that t corresponds to c before any feasible pair whose lexical character is i (that is, a lexical i regardless of how it is realized). Here, this could include both the default correspondence i:i and any special correspondences such as i:y. Note carefully that, when used in a rule, the ANY symbol does not really mean any alphabetic character; rather, it means any alphabetic character that constitutes a feasible pair with the other character in the correspondence (see section A3.1). As a notational shorthand, a correspondence such as i:@ is simplified to just the lexical character followed by a colon, that is, i:. The ANY symbol can also be used on the lexical side of a correspondence:

R5        t:c => ___ @:i

Rule R5 states that t corresponds to c before any feasible pair whose surface character is i (that is, a surface i regardless of what lexical character it realizes). This notation is simplified to just a colon followed by the surface character, that is, :i. It should be noted that the ANY symbol can only be used in the environment part of a rule, not in the correspondence part.
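The shorthand notations above can be read as names for sets of feasible pairs. The following sketch assumes a toy set of feasible pairs that includes the special correspondences t:c and i:y plus a hypothetical e:i, and a helper called expand that is not part of PC-KIMMO. It shows why i: and :i denote only declared pairs rather than every character in the alphabet. It does not enforce the restriction that @ may appear only in the environment part of a rule.

```python
ANY = "@"

# Toy feasible pairs (an assumption), including the specials t:c, i:y and e:i.
FEASIBLE = {("t", "t"), ("a", "a"), ("c", "c"), ("i", "i"), ("e", "e"),
            ("t", "c"), ("i", "y"), ("e", "i")}

def expand(notation):
    """Return the set of feasible pairs that a correspondence notation denotes."""
    if ":" in notation:
        lex, surf = notation.split(":", 1)
        lex = lex or ANY    # ":i" abbreviates @:i
        surf = surf or ANY  # "i:" abbreviates i:@
    else:
        lex = surf = notation  # a bare "i" abbreviates i:i
    # @ matches only characters that form a feasible pair with the other side.
    return {(l, s) for (l, s) in FEASIBLE
            if lex in (ANY, l) and surf in (ANY, s)}

print(expand("i"))            # {('i', 'i')}
print(sorted(expand("i:")))   # [('i', 'i'), ('i', 'y')]
print(sorted(expand(":i")))   # [('e', 'i'), ('i', 'i')]
```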

Another important characteristic of two-level rules is that they require a one-to-one correspondence between the characters of the lexical form and the characters of the surface form. That is, there must be an equal number of characters in both lexical and surface forms, and each lexical character must map to exactly one surface character, and vice versa. Phonological processes that delete or insert characters are expressed in the two-level model as correspondences with the NULL symbol, written here as 0 (zero). In the following forms, a lexical + (morpheme boundary) corresponds to a surface 0 (that is, it is deleted) and a surface ' (stress mark) corresponds to a lexical 0 (that is, it is inserted).

LR:  0 t a t + i
SR:  ' t a c 0 i

The NULL symbol is used only internally by the rules; it is not printed in output forms and does not need to be written in input forms. PC-KIMMO will accept the lexical input form tat+i and return the surface output form 'taci. The NULL symbol can be used in both the environment and the correspondence parts of a rule.
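The following sketch shows how the aligned forms above relate to what the user types and sees. It assumes that 0 is reserved for NULL (so in this toy the digit cannot also be an ordinary character), and the helper names visible and edits are hypothetical.

```python
NULL = "0"  # the NULL symbol, as written in this guide

def visible(form):
    """Drop NULLs: they exist only inside the rules, never in input or output."""
    return form.replace(NULL, "")

def edits(lexical, surface):
    """Classify each aligned pair as an ordinary pair, a deletion, or an insertion."""
    if len(lexical) != len(surface):
        raise ValueError("aligned forms must have equal length")
    result = []
    for lex, surf in zip(lexical, surface):
        if surf == NULL:
            result.append(("delete", lex))
        elif lex == NULL:
            result.append(("insert", surf))
        else:
            result.append(("pair", lex + ":" + surf))
    return result

LR, SR = "0tat+i", "'tac0i"
print(visible(LR), visible(SR))  # tat+i 'taci
print(edits(LR, SR))
```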

Another special symbol is the BOUNDARY symbol, written here as #. It indicates a word boundary, either initial or final. It can be used only in the environment of a rule and can only correspond to another BOUNDARY symbol, that is, #:#.

A1.4 Using character subsets in rules

In generative phonology, classes of characters are referred to using distinctive features; for example, vowels are referred to using the cluster of features [+syllabic, +sonorant, -consonantal]. PC-KIMMO does not support distinctive features. Instead, classes of characters are enumerated in lists that are given single-word names (one or more characters, no spaces).

These character classes are defined in SUBSET statements in the rules file (see section A4.3). For example, the following declarations define C as the set of consonants, V as the set of vowels, S as the set of stops, and NAS as the set of nasals:

   SUBSET C     p t k b d g m n ng s l r w y
   SUBSET V     i e a o u
   SUBSET S     p t k b d g
   SUBSET NAS   m n ng

Suppose that after writing rule R2 above (section A1.1), further data show that all the alveolar obstruents are palatalized before high, front vowels. For example,

LR:  ati  ade  asi  aze
SR:  aci  aje  aSi  aZe

Rather than write a separate rule for each correspondence, we will define subset D for alveolar obstruents, subset P for their palatalized counterparts, and subset Vhf for the high, front vowels:

  SUBSET D     t d s z
  SUBSET P     c j S Z
  SUBSET Vhf   i e

Rule R2 can now be generalized by writing it with subsets:

R6        D:P => ___ Vhf
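A sketch of how subset names might be expanded, assuming that a correspondence such as D:P stands for every declared feasible pair whose lexical character is in D and whose surface character is in P (see section A3.1 for the details PC-KIMMO applies), and that the bare Vhf in the environment abbreviates Vhf:Vhf, the way a bare character abbreviates an identity pair. With only the four palatalization pairs declared, D:P covers exactly t:c, d:j, s:S, and z:Z. The feasible-pair set and function names are illustrative.

```python
# SUBSET declarations from the text, as Python sets.
SUBSETS = {"D": set("tdsz"), "P": set("cjSZ"), "Vhf": set("ie")}

# Assumed feasible pairs: identity defaults plus the four palatalizations.
FEASIBLE = ({(ch, ch) for ch in "aietdsz"}
            | {("t", "c"), ("d", "j"), ("s", "S"), ("z", "Z")})

def members(symbol):
    """A subset name stands for its members; any other symbol for itself."""
    return SUBSETS.get(symbol, {symbol})

def pairs_for(lex, surf):
    """Feasible pairs matched by lex:surf, where either side may be a subset."""
    return {(l, s) for (l, s) in FEASIBLE
            if l in members(lex) and s in members(surf)}

def r6_holds(lexical, surface):
    """R6  D:P => ___ Vhf : a palatalization pair may occur only before Vhf:Vhf."""
    pairs = list(zip(lexical, surface))
    palatal, vowel = pairs_for("D", "P"), pairs_for("Vhf", "Vhf")
    return all(k + 1 < len(pairs) and pairs[k + 1] in vowel
               for k, pair in enumerate(pairs) if pair in palatal)

print(sorted(pairs_for("D", "P")))  # [('d', 'j'), ('s', 'S'), ('t', 'c'), ('z', 'Z')]
print(r6_holds("ade", "aje"))       # True
print(r6_holds("ada", "aja"))       # False: d:j is not before a high, front vowel
```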

A1.5 The four rule types

The rule operator specifies the logical relation between the correspondence and the environment of a two-level rule. The rule operators are roughly equivalent to the conditional or implicative operators of formal logic. Rule R7 (rule R2 above) is written with the rule operator =>.

R7       t:c => ___ i

The => operator means "only but not always." Rule R7 states that lexical t corresponds to surface c only preceding i, but not necessarily always in that environment. Thus other realizations of lexical t may be found in that context, including t:t. In logical terms, the => operator means that the correspondence implies the context, but the context does not necessarily imply the correspondence. To state it negatively, rule R7 prohibits the occurrence of the correspondence t:c everywhere except preceding i.

The => rule is roughly equivalent to an optional rule in generative phonology, and is typically used in cases of so-called free variation. Rule R7 would be used if the occurrence of t and c freely varies before i. Given the lexical input form tati and rule R7, the PC-KIMMO generator will produce both surface forms taci and tati.

Rule R8 is the same as rule R7 except that it is written with the rule operator <=.

R8        t:c <= ___ i

The <= operator means "always but not only." Rule R8 states that lexical t always (obligatorily) corresponds to surface c preceding i, but not necessarily only in that environment. Thus t:c is permitted to occur in other contexts. In logical terms, the <= operator means that the context implies the correspondence, but the correspondence does not necessarily imply the context. To state it negatively, if t:¬c (where ¬c means the logical negation of c) means the correspondence of lexical t to surface not-c (that is, anything except c), then rule R8 prohibits the occurrence of t:¬c in the specified context.

The <= rule is roughly equivalent to an obligatory rule in generative phonology. It is used in cases where a correspondence is obligatory in one environment but also occurs in some other environment as specified by another rule. Given the lexical input form tati and rule R8, the PC-KIMMO generator will produce both surface forms taci and caci, unless constrained by some other rule.

Rule R9 is again the same, except that it is written with the rule operator <=>.

R9        t:c <=> ___ i

The <=> operator is the combination of the operators <= and => and means "always and only." Rule R9 states that lexical t corresponds to surface c always and only preceding i. The <=> rule is used when a correspondence obligatorily occurs in a given environment (compare the <= operator) and in no other environment (compare the => operator). It is equivalent to the biconditional logical operator and means that the correspondence is allowed if and only if it is found in the specified context. Given the lexical form tati and rule R9, the PC-KIMMO generator will return only the surface form taci. Thus rule R9 is equivalent to the combination of rules R7 and R8. It is up to the analyst to choose between writing separate <= and => rules or collapsing them into one <=> rule.

Rule R10 is written with the rule operator /<=.

R10        t:c /<= ___ i:ê

The /<= operator means "never." It means that the correspondence specified by the rule is prohibited from occurring in the specified context. A /<= rule is usually used to cover "exceptions" to a more general rule. Rule R10 states that lexical t cannot correspond to surface c preceding i:ê. Given the lexical form tati, rule R10, and a rule sanctioning an i:ê correspondence, the PC-KIMMO generator will allow the surface forms tatê and catê but disallow tacê or cacê. As the operator symbol suggests, the /<= operator is similar to the <= operator in that it does not prohibit the correspondence from occurring in other environments.

Figure A1 summarizes the diagnostic properties of rules R7 through R10. For more on the semantics of the four rule types, see section A3.3.

Figure A1 Diagnostic properties of the four rule types

--------------------------------------------------------------------------
| Rules R7-R10 | Is t:c allowed | Is preceding i the    | Must t always  |
|              | preceding i?   | only environment in   | correspond to  |
|              |                | which t:c is allowed? | c before i?    |
|------------------------------------------------------------------------|
| t:c =>  ___i |      yes       |         yes           |      no        |
| t:c <=  ___i |      yes       |         no            |      yes       |
| t:c <=> ___i |      yes       |         yes           |      yes       |
| t:c /<= ___i |      no        |         n/a           |      n/a       |
--------------------------------------------------------------------------
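The diagnostic properties in Figure A1 can be reproduced with a brute-force sketch: enumerate every surface form the feasible pairs allow for tati and keep those a rule accepts. This illustrates the semantics only; it is not how PC-KIMMO works internally (it implements rules as finite state machines, section A2). The function names and the encoding of an environment as a Python function are assumptions made for the sketch.

```python
from itertools import product

def next_is(target):
    """Environment ___ X: the following pair equals `target`."""
    return lambda pairs, k: k + 1 < len(pairs) and pairs[k + 1] == target

def obeys(pairs, corr, op, env):
    """Check one rule `corr op env` against a whole sequence of pairs."""
    lex, surf = corr
    for k, (l, s) in enumerate(pairs):
        inside = env(pairs, k)
        if op in ("=>", "<=>") and (l, s) == corr and not inside:
            return False  # corr found outside the environment
        if op in ("<=", "<=>") and l == lex and s != surf and inside:
            return False  # lex:not-surf found inside the environment
        if op == "/<=" and (l, s) == corr and inside:
            return False  # corr found inside a forbidden environment
    return True

def generate(lexical, feasible, corr, op, env):
    """All surface forms built from feasible pairs that the rule accepts."""
    options = [sorted(s for (l, s) in feasible if l == ch) for ch in lexical]
    return {"".join(surf) for surf in product(*options)
            if obeys(list(zip(lexical, surf)), corr, op, env)}

BASE = {("t", "t"), ("t", "c"), ("a", "a"), ("i", "i")}
T_C, BEFORE_I = ("t", "c"), next_is(("i", "i"))

for op in ("=>", "<=", "<=>"):  # rules R7, R8, R9
    print(op, sorted(generate("tati", BASE, T_C, op, BEFORE_I)))
# prints:
# => ['taci', 'tati']
# <= ['caci', 'taci']
# <=> ['taci']

# R10  t:c /<= ___ i:ê, with i:ê added as a feasible pair
r10 = generate("tati", BASE | {("i", "ê")}, T_C, "/<=", next_is(("i", "ê")))
print("tatê" in r10, "catê" in r10, "tacê" in r10, "cacê" in r10)
# True True False False
```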

A1.6 Expressing complex environments

Several notational conventions exist that can be used to build complex environment expressions. These involve optional elements, repeated elements, and alternative elements. As an example we will use a vowel reduction rule, which states that a vowel followed by some number of consonants followed by stress (indicated by ') is reduced to schwa (ê). For example,

LR:  bab'a  bamb'a
SR:  bêb'a  bêmb'a

Parentheses indicate an optional element. Rule R11 requires either one or two consonants.

R11        V:ê <=> ___ C(C)'

Rule R12 requires either zero, one, or two consonants.

R12        V:ê <=> ___ (C)(C)'

An asterisk indicates zero or more instances of an element. (The asterisk functions like the Kleene star in regular expressions.) Rule R13 requires zero or more consonants.

R13        V:ê <=> ___ C*'

Rule R14 requires one or more consonants.

R14        V:ê <=> ___ CC*'
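Because the optional and repeated-element notations behave like regular expressions, the right-hand environments of rules R11 to R14 can be translated mechanically into Python's re syntax. This is an illustration under several assumptions: it handles only the notations used in R11 to R14, checks a single string of characters rather than lexical:surface pairs, drops the digraph ng from subset C, and uses made-up test words.

```python
import re

# Subset C from section A1.4, minus the digraph ng (an assumption that keeps
# every consonant a single character).
CONSONANTS = "ptkbdgmnslrwy"

def env_to_regex(env):
    """Translate a right-hand environment such as C(C)' into a compiled regex."""
    parts = []
    for ch in env:
        if ch == "C":
            parts.append("[" + CONSONANTS + "]")
        elif ch == "(":
            parts.append("(?:")   # start of an optional element
        elif ch == ")":
            parts.append(")?")    # end of an optional element
        elif ch == "*":
            parts.append("*")     # zero or more of the preceding element
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

RULES = {"R11": "C(C)'", "R12": "(C)(C)'", "R13": "C*'", "R14": "CC*'"}

def matching_rules(word, k):
    """Names of the rules whose environment ___ ENV holds at position k."""
    return [name for name, env in RULES.items()
            if env_to_regex(env).match(word, k + 1)]

for word in ("ba'ta", "bab'a", "bamb'a", "bambr'a"):
    print(word, matching_rules(word, 1))
# ba'ta ['R12', 'R13']
# bab'a ['R11', 'R12', 'R13', 'R14']
# bamb'a ['R11', 'R12', 'R13', 'R14']
# bambr'a ['R13', 'R14']
```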

A correspondence may occur in more than one environment. Consider a rule of vowel lengthening whereby the correspondence a:ä (short and long a) occurs in two distinct environments: when it occurs in the syllable preceding stress (pretonic lengthening) and when it occurs in the stressed syllable (tonic lengthening). For example,

LR:  ladab'ar
SR:  ladäb'är

Rule R15 expresses pretonic lengthening and rule R16 expresses tonic lengthening.

R15        a:ä => ___ C'

R16        a:ä => '___

Note carefully that a description containing these two rules is self-contradictory. Both rules use the => operator, which permits the correspondence to occur only in the specified environment. Rule R15 says that a:ä occurs only in a pretonic syllable; rule R16 says that a:ä occurs only in a tonic syllable. Thus the two rules conflict with each other. This type of rule conflict is called an environment conflict (or a => conflict for short, since it involves => rules) and is discussed more fully in section A3.13. The conflict between rules R15 and R16 can be resolved by collapsing the two rules into one. In rule R17 the vertical bar indicates disjunction between expressions and the square brackets delimit the disjunctive expressions from the rest of the environment (which in this case is empty).

R17        a:ä => [ ___ C'| '___ ]

Rule R17 now says, correctly, that the a:ä correspondence is allowed only in either pretonic or tonic position.
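The environment conflict between R15 and R16 can be checked directly. The sketch below encodes the => operator as "every a:ä must stand in the environment" and tests the attested pair ladab'ar / ladäb'är against R15, R16, and R17. The consonant set and helper names are hypothetical, and C in the environment is read as a consonant identity pair.

```python
CONS = set("bdlr")  # hypothetical consonant subset covering the example
STRESS = ("'", "'")
A_LONG = ("a", "ä")

def is_consonant(pair):
    """C read as C:C, a consonant corresponding to itself."""
    return pair[0] == pair[1] and pair[0] in CONS

def pretonic(pairs, k):  # ___ C'
    return (k + 2 < len(pairs) and is_consonant(pairs[k + 1])
            and pairs[k + 2] == STRESS)

def tonic(pairs, k):     # '___
    return k >= 1 and pairs[k - 1] == STRESS

def only_in(pairs, env):
    """=> semantics: every a:ä in pairs must stand in env."""
    return all(env(pairs, k) for k, pair in enumerate(pairs) if pair == A_LONG)

pairs = list(zip("ladab'ar", "ladäb'är"))
r15 = only_in(pairs, pretonic)                                 # False
r16 = only_in(pairs, tonic)                                    # False
r17 = only_in(pairs, lambda p, k: pretonic(p, k) or tonic(p, k))  # True
print(r15, r16, r17)  # False False True
```

Each separate => rule rejects the lengthened vowel that the other one licenses, so no form with both lengthenings survives R15 and R16 together, while the collapsed R17 accepts it.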

Now consider rules R18 and R19, which are the same as rules R15 and R16 except that they are written with the <= operator.

R18        a:ä <= ___ C'

R19        a:ä <= '___

The <= operator means that the correspondence occurs always (obligatorily) in the environment but not only there. Rules R18 and R19 do not conflict with each other, and so do not have to be collapsed into a single rule. However, if the analyst so chooses, they can be collapsed into rule R20.

R20        a:ä <= [ ___ C'| '___ ]

Given the meanings of the rule types as explained in section A1.5, we have the following choices for writing rules for lengthening. If vowel lengthening occurs only but not always in the specified environments, we must use rule R17. If vowel lengthening occurs always but not only in the specified environments, we must use rule R20 (or rules R18 and R19). And if vowel lengthening occurs always and only in the specified environments, we must use both rules R17 and R20. If the last case is true, the analyst also has the option of collapsing rules R17 and R20 into one <=> rule, rule R21.

R21        a:ä <=> [ ___ C'| '___ ]

Rule R21 is an example of inclusive disjunction; that is, the correspondence is found in either environment (or both) in the same input word. In standard generative phonology, the two subparts of a rule of this type must be implicitly ordered with the convention that if one of the subparts of the rule applies, the rest of the subparts are not skipped but also apply (this is called conjunctive ordering in Schane 1973:90). In the two-level model, rule ordering is both unavailable and unnecessary, since all rule environments are available simultaneously.

As an example of an exclusive disjunction, consider the situation where the vowel of the ultimate (final) syllable of a word is lengthened unless it is schwa, in which case the vowel of the penultimate (next to final) syllable is lengthened. For example,

LR:  maman  mamanê
SR:  mamän  mamänê

Assume these subsets, where V is the set of short vowels and Vlng is the set of their lengthened counterparts (but ê is not lengthened):

  SUBSET V       i a u ê
  SUBSET Vlng    ï ä ü

Rules R22 and R23 (where # represents word boundary) demonstrate a => conflict (see above and section A3.13 on rule conflicts) and must be collapsed.

R22        V:Vlng => ___ C*ê#

R23        V:Vlng => ___ C*#

In the example above on tonic and pretonic lengthening (rules R15 to R21), we collapsed the rules with the vertical bar notation, allowing lengthening to occur in either or both of the environments. This is what we wanted, since tonic lengthening and pretonic lengthening are separate phonological processes and both are possible in the same word. But in the present example we are dealing with just one lengthening process, though with two alternative environments. We want lengthening to occur in one or the other of the environments but not both in the same word. Rules R22 and R23 can then be collapsed as rule R24 by using the parenthesis notation.

R24        V:Vlng => ___ C*(ê)#

A word-final schwa will not be lengthened by this rule because, even though lexical schwa belongs to the V subset, no correspondence to a surface long schwa has been declared (see section A3.1 for details on subsets and declaring feasible pairs).
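The following sketch applies the right context of rule R24 to the example words. It assumes a hypothetical consonant set containing only m and n, treats the end of the string as the word boundary #, and encodes the declared V:Vlng pairs as a dictionary with no entry for ê, which is why word-final schwa is never lengthened. Since R24 is a => rule, lengthenable only reports where lengthening is permitted; the lengthen helper then applies it at every permitted position, for illustration.

```python
import re

CONS = "mn"  # hypothetical consonant subset, enough for the example words
LONG = {"i": "ï", "a": "ä", "u": "ü"}  # declared V:Vlng pairs; ê has none

# Right context C*(ê)#; fullmatch to the end of the word plays the role of #.
RIGHT = re.compile("[" + CONS + "]*ê?")

def lengthenable(word):
    """Indices of vowels that R24 permits to surface as long vowels."""
    return [k for k, ch in enumerate(word)
            if ch in LONG and RIGHT.fullmatch(word, k + 1)]

def lengthen(word):
    """Realize every permitted vowel as long (R24 alone does not force this)."""
    chars = list(word)
    for k in lengthenable(word):
        chars[k] = LONG[chars[k]]
    return "".join(chars)

for word in ("maman", "mamanê"):
    print(word, lengthenable(word), lengthen(word))
# maman [3] mamän
# mamanê [3] mamänê
```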

A1.7 Understanding two-level environments

One of the defining features of two-level rules is that they can refer to both lexical and surface environments. (N.B.: This section is based on Dalrymple and others 1987:19-22.) This makes it possible for a two-level description to handle many phenomena that would require sequentially ordered rules in standard generative phonology. For example, consider these two rules:

Nasal Assimilation: the nasal character N (unspecified for place of articulation) takes on the place of articulation of a following stop.
Stop Voicing: a voiceless stop is voiced after a nasal.

These rules relate the lexical sequences Np, Nt, and Nk to the surface sequences mb, nd, and ngg, respectively. To account for these correspondences, generative phonology would require two rules (the following rules account only for labials):

           Nasal Assimilation
R25        N ---> m / ___ p

           Stop Voicing
R26        p ---> b / m ___

The rules must apply in this order, since if rule R26 were applied first it would destroy the environment needed for rule R25. The two-level versions of these rules do not need to be ordered; in effect they apply simultaneously:

           Nasal Assimilation
R27        N:m <=> ___ p:

           Stop Voicing
R28        p:b <=> :m ___

These rules work because of the careful specification of lexical and surface environments. Rule R27 says that a lexical N is realized as a surface m preceding a lexical p. In this context the notation p: (equivalent to p:@) stands for the correspondences p:p (by default) and p:b (from rule R28). Rule R28 says that a lexical p is realized as a surface b following a surface m. The notation :m (equivalent to @:m) stands for the correspondences m:m (by default) and N:m (from rule R27). Because the two-level model allows rule environments to have access to both the lexical and surface levels, rule ordering and intermediate levels are not needed.

A common error in writing two-level rule environments is to overspecify the environment. Consider this overspecified version of rule R27:

            Overspecified version of Nasal Assimilation
R27a        N:m <=> ___ p

Even though the symbol p seems simpler than p:, it actually is more specific, as it stands for the correspondence p:p only. Rule R27a is now in conflict with the voicing rule (R28), which says that after a surface m only the correspondence p:b can occur. Conversely, the correspondence p:b of rule R28 is in conflict with rule R27a, which allows only p:p to occur after a surface m.

Now consider another incorrectly specified version of the Nasal Assimilation rule (R27):

            Overspecified version of Nasal Assimilation
R27b        N:m <=> ___ p:b

At first this version seems correct, since it is precisely in the environment of p:b that we want the rule to apply. The problem is that rule R27b does not require N to be realized as m preceding a lexical p that is realized as anything other than surface b, and the Voicing rule (R28) does not require p to be realized as b except when it follows a surface m. Assuming that otherwise lexical N corresponds to surface n, the lexical form Np will be realized as both np and mb. Thus overspecification does not always result in a rule conflict; it may also result in overgeneration. (See also section A3.12 on two-level environments.)
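The behaviour of rule R27 and of the overspecified variants R27a and R27b can be checked with the same brute-force approach used for the four rule types: enumerate every surface form the feasible pairs allow for a lexical form and keep those that every <=> rule accepts. The sketch assumes a hypothetical lexical form aNpa, adds a:a as a feasible pair so that the nasal has vowels around it, and encodes each environment as a Python function; none of these names come from PC-KIMMO.

```python
from itertools import product

# Assumed feasible pairs: N:n (the "otherwise" case), N:m, p:p, p:b, a:a.
FEASIBLE = {("N", "n"), ("N", "m"), ("p", "p"), ("p", "b"), ("a", "a")}

def before(test):
    """Environment ___ X: the following pair passes `test`."""
    return lambda pairs, k: k + 1 < len(pairs) and test(pairs[k + 1])

def after(test):
    """Environment X ___: the preceding pair passes `test`."""
    return lambda pairs, k: k >= 1 and test(pairs[k - 1])

def obeys_biconditional(pairs, corr, env):
    """corr <=> env: for the lexical character of corr, the surface side
    equals corr's surface side exactly when the environment holds."""
    for k, (l, s) in enumerate(pairs):
        if l == corr[0] and (s == corr[1]) != env(pairs, k):
            return False
    return True

def generate(lexical, rules):
    options = [sorted(s for (l, s) in FEASIBLE if l == ch) for ch in lexical]
    return {"".join(surf) for surf in product(*options)
            if all(obeys_biconditional(list(zip(lexical, surf)), corr, env)
                   for corr, env in rules)}

R28 = (("p", "b"), after(lambda p: p[1] == "m"))         # p:b <=> :m ___
R27 = (("N", "m"), before(lambda p: p[0] == "p"))        # N:m <=> ___ p:
R27a = (("N", "m"), before(lambda p: p == ("p", "p")))   # N:m <=> ___ p
R27b = (("N", "m"), before(lambda p: p == ("p", "b")))   # N:m <=> ___ p:b

print(sorted(generate("aNpa", [R27, R28])))   # ['amba']
print(sorted(generate("aNpa", [R27a, R28])))  # []  (rule conflict)
print(sorted(generate("aNpa", [R27b, R28])))  # ['amba', 'anpa']  (overgeneration)
```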
