The following subsections document each command of the PC-KIMMO system, arranged by function. Square brackets in the command line summaries indicate optional elements. The notation {x | y} means either x or y (but not both). Command keywords and arguments in boldface are typed literally; for instance, the command summary set tracing {on | off} means to type either set tracing on or set tracing off. Command arguments in italics are replaced by elements of the specified type; for instance, the command summary show rule number means to replace number with a rule number, such as show rule 3.
Displays a list of command names.
help [command-name]
Issuing the help command with no argument displays a list of commands with a brief description of their function. Issuing the help command with the name of a specific command displays a usage summary for the command.
command-name ?
Typing a command name followed by ?, instead of a keyword, displays a message listing the keywords expected for that command.
load rules [filespec]
The load rules command loads a set of rules from the file specified on the command line. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.RUL is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The default file name extension is .RUL; thus, the command load rules english will load the file ENGLISH.RUL. If no file name is given, the default file name RULES.RUL is used. The rules file must be in the format described later in this chapter (see section 4.7.1).
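The default-name and default-extension rules above can be pictured with a short Python sketch. It is an illustration only, not PC-KIMMO's own code: the function name resolve_filespec is hypothetical, and the sketch assumes MS-DOS conventions (backslash separators and case-insensitive names, so english.RUL and ENGLISH.RUL refer to the same file).

```python
import ntpath
from typing import Optional


def resolve_filespec(filespec: Optional[str], default_name: str, default_ext: str) -> str:
    """Hypothetical model of how a load command picks the file to open.

    - No filespec at all: use the default file name (e.g. RULES.RUL).
    - Last path component has no extension: append the default
      extension (e.g. ".RUL").
    - Otherwise: use the filespec exactly as given.
    ntpath splits on backslashes, so MS-DOS paths such as
    B:\\ENGLISH\\ENGLISH are handled the same on any host platform.
    """
    if not filespec:
        return default_name
    _root, ext = ntpath.splitext(filespec)
    if ext:
        return filespec
    return filespec + default_ext


# load rules          -> RULES.RUL
# load rules english  -> english.RUL (the same file as ENGLISH.RUL on MS-DOS)
print(resolve_filespec(None, "RULES.RUL", ".RUL"))
print(resolve_filespec("english", "RULES.RUL", ".RUL"))
print(resolve_filespec(r"B:\ENGLISH\ENGLISH.RUL", "RULES.RUL", ".RUL"))
```

The same logic, with different defaults, fits the other load commands described in this section, for example LEXICON.LEX and .LEX for load lexicon, or GRAMMAR.GRM and .GRM for load grammar.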
An error in the format of the rules file will cause the program to stop loading the file, erase the rules already loaded, and report an error message with the line number where the error was encountered. Refer to section 4.10.2 on error messages for more details.
The rules file must be loaded before the lexicon and before performing any generation or recognition operations.
The load rules command can also be invoked by using the -r command line option when starting up PC-KIMMO (see section 4.3.6).
load lexicon [filespec]
The load lexicon command loads a lexicon from the file specified in the command line. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.LEX is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The default file name extension is .LEX; thus, the command load lexicon english will load the file ENGLISH.LEX. If no file name is given, the default file name LEXICON.LEX is used. The lexicon file must be in the format described later in this chapter (see section 4.7.2).
An error in the format of the lexicon file will cause the program to stop loading the file, erase the parts of the lexicon already loaded, and report an error message with the line number where the error was encountered. Refer to section 4.10.3 on error messages for more details.
The rules file must be loaded before the lexicon. The lexicon file must be loaded before using the recognizer function. The generator function can be used without loading a lexicon and indeed makes no use of a lexicon.
The load lexicon command can also be invoked by using the -l command line option when starting up PC-KIMMO (see section 4.3.6).
load synthesis-lexicon [filespec]
The load synthesis-lexicon command loads a synthesis-lexicon from the file specified in the command line. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.LEX is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The default file name extension is .LEX; thus, the command load synthesis-lexicon english will load the file ENGLISH.LEX. If no file name is given, the default file name LEXICON.LEX is used. The synthesis-lexicon file must be in the format described later in this chapter (see section 4.7.2).
An error in the format of the synthesis-lexicon file will cause the program to stop loading the file, erase the parts of the synthesis-lexicon already loaded, and report an error message with the line number where the error was encountered. Refer to section 4.10.3 on error messages for more details.
The rules file must be loaded before the synthesis-lexicon. The synthesis-lexicon file must be loaded before using the synthesizer function. The generator function can be used without loading a synthesis-lexicon and indeed makes no use of a synthesis-lexicon.
The load synthesis-lexicon command can also be invoked by using the -l command line option when starting up PC-KIMMO (see section 4.3.6).
load grammar [filespec]
The load grammar command loads a grammar from the file specified in the command line. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.GRM is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The default file name extension is .GRM; thus, the command load grammar english will load the file ENGLISH.GRM. If no file name is given, the default file name GRAMMAR.GRM is used. The grammar file must be in the format described later in this chapter (see section 4.7.3).
An error in the format of the grammar file will cause the program to stop loading the file, erase the parts of the grammar already loaded, and report an error message with the line number where the error was encountered. Refer to section 4.10.4 on error messages for more details.
The rules and lexicon files must be loaded before the grammar. Use of a grammar file is optional, even when using the recognizer function. The generator function can be used without loading a grammar and indeed makes no use of a grammar file.
The load grammar command can also be invoked by using the -g command line option when starting up PC-KIMMO (see section 4.3.6).
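The loading order required by the load commands above, and what each operation needs before it can run, amount to a small dependency table. The Python sketch below is a hypothetical model built only from the requirements stated in this section; the names LOAD_PREREQUISITES, OPERATION_PREREQUISITES, can_load and can_run are illustrative, and PC-KIMMO does not necessarily check prerequisites this way internally.

```python
from typing import Dict, List, Set

# What must already be in memory before each file can be loaded.
LOAD_PREREQUISITES: Dict[str, List[str]] = {
    "rules": [],
    "lexicon": ["rules"],
    "synthesis-lexicon": ["rules"],
    "grammar": ["rules", "lexicon"],
}

# What each operation needs; a grammar is never required.
OPERATION_PREREQUISITES: Dict[str, List[str]] = {
    "generate": ["rules"],
    "recognize": ["rules", "lexicon"],
    "synthesize": ["rules", "synthesis-lexicon"],
}


def missing(loaded: Set[str], required: List[str]) -> List[str]:
    """Return the required items that are not loaded yet, in order."""
    return [item for item in required if item not in loaded]


def can_load(loaded: Set[str], what: str) -> bool:
    """True if every prerequisite for loading `what` is already loaded."""
    return not missing(loaded, LOAD_PREREQUISITES[what])


def can_run(loaded: Set[str], operation: str) -> bool:
    """True if every file the operation depends on is already loaded."""
    return not missing(loaded, OPERATION_PREREQUISITES[operation])


loaded = {"rules"}
print(can_run(loaded, "generate"), can_run(loaded, "recognize"))  # True False
```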
The clear command erases from memory the rules, lexicon, synthesis-lexicon, and grammar currently loaded. Strictly speaking it is not needed, since the load rules command erases all existing rules, the load lexicon command erases any existing lexicon, the load synthesis-lexicon command erases any existing synthesis-lexicon, and the load grammar command erases any existing grammar.
The take command causes PC-KIMMO to read and execute commands from a file. The filespec can contain a path; for example, B:\KIMMO\ENGLISH.TAK is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The take command recognizes the default file name PCKIMMO.TAK and the default file extension .TAK. The command file can itself issue the take command to call another command file down to a depth of three files. That is, the user can specify a command file <file1> that contains the command take <file2>, that itself contains the command take <file3>. It would be an error for <file3> to contain a take command.
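The three-file nesting limit can be modeled with a recursive reader that tracks its depth. This is a hypothetical Python sketch: run_take, TakeDepthError and the dictionary standing in for files on disk are all illustrative, and the sketch does not model the default .TAK extension or comment lines.

```python
from typing import Dict, List

MAX_TAKE_DEPTH = 3  # <file1> takes <file2>, which takes <file3>; no deeper


class TakeDepthError(Exception):
    """Raised when take commands are nested more than MAX_TAKE_DEPTH files deep."""


def run_take(name: str, files: Dict[str, List[str]], depth: int = 1) -> List[str]:
    """Hypothetical model of the take command.

    `files` maps a command-file name to its lines (an in-memory stand-in
    for the disk). Returns the non-take commands in the order executed.
    """
    executed: List[str] = []
    for line in files[name]:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0].lower() == "take":
            if depth >= MAX_TAKE_DEPTH:
                raise TakeDepthError(
                    f"{name}: take nested deeper than {MAX_TAKE_DEPTH} files")
            # With no argument, fall back to the default name PCKIMMO.TAK.
            target = words[1] if len(words) > 1 else "PCKIMMO.TAK"
            executed.extend(run_take(target, files, depth + 1))
        else:
            executed.append(line.strip())
    return executed


files = {
    "file1": ["set tracing on", "take file2"],
    "file2": ["take file3", "list rules"],
    "file3": ["list pairs"],
}
print(run_take("file1", files))  # ['set tracing on', 'list pairs', 'list rules']
```

Adding a take command to file3 in this example would raise TakeDepthError, matching the rule that <file3> may not itself call take.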
On start-up, PC-KIMMO automatically tries to load default settings and commands from a "take" file named PCKIMMO.TAK or PC-KIMMO.TAK.
A command file can also be specified by using the -t command line option when starting up PC-KIMMO (see section 4.3.6). Note that a command file cannot submit forms to the special generator and recognizer prompts (see sections 4.5.10 and 4.5.11).
list pairs
The list pairs command displays on the screen the set of feasible pairs specified by the set of rules currently turned on.
list rules
The list rules command displays on the screen the current state of the rules that are loaded. The display consists of each rule by number, an indication of whether the rule is on or off, and the rule name from the header lines of its state table in the rules file.
list lexicon
The list lexicon command displays on the screen the names of the sublexicons of the lexicon currently in use.
4.5.6.1 General options
set alignment {on | off}
The set alignment command turns alignment display mode on or off. If alignment mode is on, then the results of the recognizer function are displayed on the screen in a vertically aligned format: the first (top) line displays the lexical form of each morpheme; the second line displays sublexicon names; and the third line displays glosses. If alignment mode is off, the results are displayed the usual way. The default setting is off.
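A rough Python sketch of the aligned layout follows, assuming each morpheme's column is simply padded to its widest entry. The exact spacing PC-KIMMO uses is not specified here, the function name align is hypothetical, and the sample morphemes, sublexicon names and glosses are made up for illustration.

```python
from typing import List, Tuple

# (lexical form, sublexicon name, gloss) for one morpheme
Morpheme = Tuple[str, str, str]


def align(morphemes: List[Morpheme]) -> str:
    """Hypothetical sketch of alignment display mode.

    Every morpheme's column is padded to its widest entry so that the
    lexical-form, sublexicon and gloss lines line up vertically.
    """
    widths = [max(len(part) for part in m) for m in morphemes]
    lines = []
    for row in range(3):
        cells = [m[row].ljust(w) for m, w in zip(morphemes, widths)]
        lines.append(" ".join(cells).rstrip())
    return "\n".join(lines)


print(align([("spy", "N", "`spy"), ("+s", "SUFFIX", "+PL")]))
```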
set limit {on | off}
The set limit command limits the output of a generation or recognition operation to one form. If limit is set off, PC-KIMMO backtracks after finding a correct result so that it can find every possible result. If limit is set on, PC-KIMMO stops after finding one correct result and does not backtrack to look for more. The default setting is off.
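The difference between limit on and off is whether the search may keep backtracking after its first success. In this hypothetical Python sketch, a lazy generator stands in for PC-KIMMO's backtracking search (an assumption made only for illustration), and itertools.islice stops it after one result, so the remaining alternatives are never tried.

```python
from itertools import islice
from typing import Callable, Iterator, List


def run_search(search: Callable[[], Iterator[str]], limit: bool) -> List[str]:
    """Collect results from a lazy, backtracking search.

    With limit on, islice takes one result and the generator is never
    resumed, so no further backtracking happens. With limit off, the
    generator is exhausted and every result is returned.
    """
    results = search()
    if limit:
        return list(islice(results, 1))
    return list(results)


tried: List[str] = []


def toy_search() -> Iterator[str]:
    # Hypothetical stand-in for the real search: each yield is one correct
    # result; the loop continues only if the caller asks for more.
    for form in ["result-1", "result-2", "result-3"]:
        tried.append(form)
        yield form


print(run_search(toy_search, limit=True), tried)  # ['result-1'] ['result-1']
```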
set rules {on | off} {list-of-numbers | all}
The set rules command allows you to turn selected rules on or off for testing or debugging purposes. When a rule is turned off, it is completely ignored in the recognition or generation of forms. One effect of this is to cause the recalculation of feasible pairs, considering only the rules which remain on. Use the list pairs command to see the set of feasible pairs currently in use.
On the command line, you can specify the action on or off followed by a list of rule numbers or the keyword all (in which case all rules are turned on or off). Specific rules are turned on or off by listing their rule numbers (shown by the list rules command), each separated by a space.
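The argument handling of set rules can be sketched as follows. This is a hypothetical Python model (set_rules and the dictionary of rule states are illustrative, not PC-KIMMO internals); it validates every rule number before changing anything, which is a design choice of the sketch. It does not model the recalculation of feasible pairs that follows in the real program.

```python
from typing import Dict


def set_rules(states: Dict[int, bool], args: str) -> None:
    """Hypothetical model of 'set rules {on | off} {list-of-numbers | all}'.

    `states` maps each rule number (as shown by list rules) to True (on)
    or False (off); `args` is the text typed after 'set rules'.
    """
    words = args.split()
    if len(words) < 2 or words[0].lower() not in ("on", "off"):
        raise ValueError("usage: set rules {on | off} {list-of-numbers | all}")
    value = words[0].lower() == "on"
    if words[1].lower() == "all":
        targets = list(states)
    else:
        targets = [int(w) for w in words[1:]]  # numbers separated by spaces
        unknown = [n for n in targets if n not in states]
        if unknown:
            raise ValueError(f"no such rule(s): {unknown}")
    for number in targets:
        states[number] = value


rules = {1: True, 2: True, 3: True}
set_rules(rules, "off 1 3")
print(rules)  # {1: False, 2: True, 3: False}
```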
set timing {on | off}
The set timing command uses the computer's system clock to time the execution of generation and recognition operations. It displays the result as the number of seconds the operation lasted. It applies to these commands: generate, recognize, file compare generate, file compare recognize, file compare pairs, file generate, and file recognize. The default setting is off.
set tracing {on | off | level}
The set tracing command allows you to turn the tracing mechanism on or off. When tracing is on, details of the analysis of a form are displayed on the screen during generation or recognition operations. If logging (see section 4.5.7) is on, the trace will also be written to the log file. Tracing is operative for these commands: generate, recognize, file compare generate, file compare recognize, file compare pairs, file generate, and file recognize. The default setting is off.
The amount of detail shown in the trace display is set by the tracing level. The level argument to the set tracing command can range from 0 to 3, where 0 is no tracing at all and 3 is the most detailed level of tracing. Issuing the command set tracing off sets tracing to level 0. Issuing the command set tracing on sets tracing to level 2. At level 1, no information is given as to which feasible pair is being tried or the condition of the rules (that is, what state each automaton is in). Both the generator and recognizer report each RESULT line, with all NULL symbols printed explicitly. The recognizer also displays lexicon information; that is, it reports which sublexicon is being entered or backed out of. At level 2, the feasible pairs being tried and the state of each rule (automaton) are displayed. The recognizer displays lexicon information as it does at level 1. At level 3, more detailed information is given on which feasible pairs are being tried and the state of each rule. For more information on the format of the trace display, see section 4.5.8 on trace formats.
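The mapping from the set tracing argument to a level can be summarized in a few lines. This is a hypothetical Python sketch; the names TRACE_LEVELS and tracing_level are illustrative, and the descriptions paraphrase the paragraph above rather than quoting program output.

```python
TRACE_LEVELS = {
    0: "no tracing",
    1: "RESULT lines with NULL symbols shown; the recognizer also reports "
       "which sublexicon is entered or backed out of",
    2: "feasible pairs tried and the state of each rule (automaton); "
       "the recognizer also shows lexicon information as at level 1",
    3: "most detailed information on feasible pairs and rule states",
}


def tracing_level(arg: str) -> int:
    """Map the argument of 'set tracing' to a level from 0 to 3."""
    word = arg.strip().lower()
    if word == "off":
        return 0
    if word == "on":
        return 2
    level = int(word)  # raises ValueError for non-numeric arguments
    if level not in TRACE_LEVELS:
        raise ValueError(f"tracing level must be 0 to 3, got {level}")
    return level


for arg in ("off", "on", "3"):
    print(arg, "->", tracing_level(arg), TRACE_LEVELS[tracing_level(arg)])
```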
set verbose {on | off}
The set verbose command affects the amount of information displayed on the screen during a file comparison operation (either generate, recognize, or pairs, see section 4.5.12). If verbose is set off, a file comparison operation displays only a dot for each form correctly analyzed, though any exceptional results will cause the complete form and warning messages to be displayed. If verbose is set on, a file comparison operation displays the complete contents of the file (minus comments) plus confirmation and warning messages. The default setting is off.
set warnings {on | off}
The set warnings command turns warning mode on or off. If warning mode is on, then any warning messages that occur while loading a file or while processing a form are displayed on the screen. If warning mode is off, then no warning messages are displayed. The default setting is on. (See section 4.10 for an explanation of the difference between errors and warnings.)
4.5.6.2 Options related to use of a word grammar
set ambiguities number
The set ambiguities command limits the number of analyses produced by the word grammar to the specified number. The default setting is 10. Note that this command assumes that a word grammar is loaded (see section 4.5.2) and that the grammar option is set to on (see section 4.5.6.2).
set failures {on | off}
The set failures command turns grammar failure mode on or off. When grammar failure mode is on, the partial results of forms that fail the word grammar are displayed. A form may fail the word grammar either by failing the feature constraints or by failing the constituent structure rules. In the latter case, a partial tree (bush) will be displayed. When grammar failure mode is off, forms that fail the word grammar are filtered out and no results for them are displayed. The default setting is off. Note that this command assumes that a word grammar is loaded (see section 4.5.2) and that the grammar option is set to on (see section 4.5.6.2).
set features {top | all | off}
set features {full | flat}
The command set features controls the display of feature structures returned by the recognizer when a word grammar is used. When features is set to top, the feature structure for only the top node of the tree is displayed. When features is set to all, the feature structures for all nodes of the tree are displayed. And when features is set to off, no feature structures are displayed. The default setting is top.
When features is set to full, the feature structures are displayed in a vertical, indented format; when features is set to flat, the feature structures are displayed as a linear string. The default setting is full.
Note that this command assumes that a word grammar is loaded (see section 4.5.2) and that the grammar option is set to on (see section 4.5.6.2).
set grammar {on | off}
The command set grammar turns the word grammar on or off. When grammar is on, then results from the lexicon are passed to the word grammar for parsing. When grammar is off, then the results from the lexicon are displayed without using the word grammar. The default setting is off, but is automatically turned on when a grammar is loaded.
set tree {full | flat | indented | off}
The command set tree controls the display of the tree structure returned by the recognizer when a word grammar is used. When tree is set to full, a full branching tree is displayed. When tree is set to flat, a linear bracketed string is displayed. When tree is set to indented, a north-west oriented indented tree is displayed. And when tree is set to off, no tree structure is displayed. The default setting is full. Note that this command assumes that a word grammar is loaded (see section 4.5.2) and that the grammar option is set to on (see section 4.5.6.2).
set trim-empty-features {on | off}
The command set trim-empty-features controls the display of features that have empty values. When trim-empty-features is set to on, empty features are not displayed. When trim-empty-features is set to off, empty features are displayed. The default setting is on.
set unification {on | off}
The command set unification turns feature unification in the word grammar on or off. When unification is on, any feature constraints used in conjunction with the grammar rules are used as expected. When unification is off, the feature constraints are ignored and only the constituent structure rules are used. The default setting is on. Note that this command assumes that a word grammar is loaded (see section 4.5.2) and that the grammar option is set to on (see section 4.5.6.2).
4.5.6.3 Save settings
save [filespec]
The save command writes the current settings to a "take" file named filespec. If filespec is not specified, the settings are written to a file named PCKIMMO.TAK in the current directory. On start-up, PC-KIMMO automatically tries to load default settings from PCKIMMO.TAK (or PC-KIMMO.TAK).
log [filespec]
The log command turns the logging mechanism on. When logging is on, the information displayed on the screen during execution of generation or recognition operations is also written to a disk file whose name is specified in the command line. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.LOG is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). If no file name is given, a log file named PCKIMMO.LOG is written to the default directory. If a file name with no extension is given, a log file with the extension .LOG is written to the default directory. If a log command is given when a log file is already open, then the open log file is closed before the new log file is created. Logging records the processing of these commands: generate, recognize, file compare generate, file compare recognize, file compare pairs, file generate, and file recognize. Tracing displays are also recorded in a log file.
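The naming and reopening behavior of log can be modeled with a small class. This Python sketch is hypothetical (LogSession and its opener parameter are illustrative names); it covers the default name PCKIMMO.LOG, the .LOG default extension, and closing an already open log before a new one is created, but not the choice of directory.

```python
import ntpath
from typing import Callable, Optional, TextIO


class LogSession:
    """Hypothetical model of the log and close commands."""

    def __init__(self, opener: Callable[[str], TextIO] = lambda name: open(name, "w")):
        self._opener = opener          # injectable so the sketch can be tested
        self._fh: Optional[TextIO] = None
        self.name: Optional[str] = None

    def log(self, filespec: Optional[str] = None) -> str:
        """Open a log file, closing any log that is already open."""
        if self._fh is not None:
            self.close()
        name = filespec or "PCKIMMO.LOG"
        if not ntpath.splitext(name)[1]:
            name += ".LOG"             # no extension given: add .LOG
        self._fh = self._opener(name)
        self.name = name
        return name

    def write(self, text: str) -> None:
        """Copy screen output to the log, if logging is on."""
        if self._fh is not None:
            self._fh.write(text)

    def close(self) -> None:
        """Turn logging off and close the log file."""
        if self._fh is not None:
            self._fh.close()
        self._fh = None
        self.name = None
```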
close
The close command turns logging off and closes the log file.
status
The status command displays the names of the rules, lexicon and grammar files currently loaded, the name of the log file (if logging is on), the comment delimiter character, and the status of the processing options controlled by the set command. It can also be invoked with the synonyms show status or show.
The show rule command first displays the number, on/off status, and name of the rule (similar to the list rules command). If the rule is turned on, it then displays each column header of the state table for that rule with the set of feasible pairs that it specifies. This command is used primarily for debugging purposes.
show lexicon sublexicon-name
The show lexicon command displays the contents of a sublexicon. It shows each lexical item, its gloss, and its continuation class. If the continuation class of a lexical entry names an alternation, the alternation is expanded into a list of sublexicon names. Note that this command displays the parts of the lexical entry in the following order (rather than the order in which they appear in the lexicon file): lexical item, gloss, continuation class.
The generate command accepts as input a lexical form and returns one or more surface forms. If no lexical form argument is given, PC-KIMMO supplies a special generator prompt where forms can be typed in directly without the generate keyword. Entering a blank line at the generator prompt returns the program to the main command line prompt.
The recognize command accepts as input a surface form and returns one or more lexical forms. If no surface form argument is given, PC-KIMMO supplies a special recognizer prompt where forms can be typed in directly without the recognize keyword. Entering a blank line at the recognizer prompt returns the program to the main command line prompt.
The synthesize command accepts as input a morphological form (a sequence of morpheme glosses separated by spaces) and returns one or more surface forms. If no morphological form argument is given, PC-KIMMO supplies a special synthesizer prompt where forms can be typed in directly without the synthesize keyword. Entering a blank line at the synthesizer prompt returns the program to the main command line prompt.
[file] compare generate [filespec]
The compare generate command reads lexical forms from a file, submits them to the generator for analysis, and compares the resulting surface form(s) with the expected results listed in the file. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.GEN is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). A generation comparison file has the default extension .GEN and the default file name DATA.GEN. The format of the generation comparison file is described in section 4.7.4. If a word grammar is in use, the tree option and the features option must both be set off.
Results of the comparison are reported according to the setting of the verbose option (see the set verbose command described in section 4.5.6.1). If verbose is set off, only exceptions (that is, actual results from the generator that are different from the expected results as specified in the file) are reported. A dot is displayed on the screen as each input (lexical) form is processed. If verbose is set on, each group of lexical and surface forms in the file is displayed, either with an error message for wrong comparisons or the message OK if the actual and expected results match exactly.
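The effect of the verbose option on a comparison report can be sketched as below. This hypothetical Python model assumes that "match exactly" means the same set of results regardless of order; the function name, the message wording, and the sample forms are illustrative and are not PC-KIMMO's actual output.

```python
from typing import Iterable, List, Tuple

Case = Tuple[str, List[str], List[str]]  # (input form, expected, actual)


def report_comparison(cases: Iterable[Case], verbose: bool) -> List[str]:
    """Hypothetical sketch of compare-command reporting."""
    lines: List[str] = []
    for source, expected, actual in cases:
        # Assumption: results must agree exactly, ignoring their order.
        matched = sorted(expected) == sorted(actual)
        if not verbose:
            lines.append(".")  # one dot per input form processed
            if not matched:
                lines.append(f"{source}: expected {expected}, got {actual}")
            continue
        lines.append(source)
        lines.extend("    " + form for form in actual)
        lines.append("OK" if matched else f"ERROR: expected {expected}")
    return lines


cases = [("spy+s", ["spies"], ["spies"]), ("fox+s", ["foxes"], ["foxs"])]
print("\n".join(report_comparison(cases, verbose=False)))
```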
[file] compare recognize [filespec]
The compare recognize command reads surface forms from a file, submits them to the recognizer for analysis, and compares the resulting lexical form(s) with the expected results specified in the file. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.REC is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). A recognition comparison file has the default extension .REC and the default file name DATA.REC. The format of the recognition comparison file is described in section 4.7.5. If a word grammar is in use, the tree option and the features option must both be set off.
Results of the comparison are reported according to the setting of the verbose option (see the set verbose command described in section 4.5.6.1). If verbose is set off, only exceptions (that is, actual results from the recognizer that are different from the expected results as specified in the file) are reported. A dot is displayed on the screen as each input (surface) form is processed. If verbose is set on, each group of surface and lexical forms in the file is displayed, either with an error message for wrong comparisons or the message OK if the actual and expected results match exactly.
[file] compare pairs [filespec]
The compare pairs command allows lexical:surface pairs of forms listed in the file specified on the command line to be compared in both directions. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.PAI is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). A pairs comparison file has the default extension .PAI and the default file name DATA.PAI. The format of the pairs comparison file is described in section 4.7.6. If a word grammar is in use, the tree option and the features option must both be set off.
PC-KIMMO considers each pair of forms (a lexical form followed by its surface form). The lexical form is input to the generator to produce one or more surface forms. The surface form listed in the file is compared with the generated surface forms to see if there is a successful match. The surface form listed in the file is then input to the recognizer to produce one or more lexical forms. The lexical form listed in the file is compared with the recognized lexical forms to see if there is a successful match.
Results of the comparison are reported according to the setting of the verbose option (see the set verbose command described in section 4.5.6.1). If verbose is set off, only exceptions (that is, pairs for which either comparison failed) are reported. A dot is displayed on the screen as each pair of forms is processed. If verbose is set on, each pair of lexical and surface forms in the file is displayed, either with an error message for wrong comparisons or the message OK if the forms match exactly.
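The two-way check that compare pairs performs on each lexical:surface pair can be expressed as two membership tests. In this hypothetical Python sketch, compare_pair is an illustrative name and the generate and recognize callables, backed here by toy dictionaries, stand in for the real rules and lexicon.

```python
from typing import Callable, List, Tuple

Analyzer = Callable[[str], List[str]]


def compare_pair(lexical: str, surface: str,
                 generate: Analyzer, recognize: Analyzer) -> Tuple[bool, bool]:
    """Check one lexical:surface pair in both directions.

    Returns (generation matched, recognition matched); the pair is an
    exception if either value is False.
    """
    generation_ok = surface in generate(lexical)    # lexical -> surface
    recognition_ok = lexical in recognize(surface)  # surface -> lexical
    return generation_ok, recognition_ok


# Hypothetical toy analyzers standing in for the real rules and lexicon.
GEN = {"spy+s": ["spies"]}
REC = {"spies": ["spy+s"]}
print(compare_pair("spy+s", "spies", lambda f: GEN.get(f, []), lambda f: REC.get(f, [])))
# (True, True)
```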
[file] compare synthesize [filespec]
The compare synthesize command reads morphological forms from a file, submits them to the synthesizer for analysis, and compares the resulting surface form(s) with the expected results listed in the file. A morphological form is a sequence of morpheme glosses separated by spaces. The filespec can contain a path; for example, B:\ENGLISH\ENGLISH.SYN is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). A synthesis comparison file has the default extension .SYN and the default file name DATA.SYN. The format of the synthesis comparison file is described in section 4.7.6A.
Results of the comparison are reported according to the setting of the verbose option (see the set verbose command described in section 4.5.6.1). If verbose is set off, only exceptions (that is, actual results from the synthesizer that are different from the expected results as specified in the file) are reported. A dot is displayed on the screen as each input (morphological) form is processed. If verbose is set on, each group of morphological and surface forms in the file is displayed, either with an error message for wrong comparisons or the message OK if the actual and expected results match exactly.
The file generate command reads lexical forms from a file, submits them to the generator for analysis, and returns each lexical form followed by the resulting surface form(s). The format of the generation input file is described in section 4.7.7.
If an output-filespec argument is specified, the results are written to that file; otherwise, the results are displayed on the screen. The format of the output file created by this command is identical to a generation comparison file. The filespec of either file can contain a path; for example, B:\ENGLISH\ENGLISH.FG is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The command does not recognize any default file names or extensions.
The verbose option (see the set verbose command described in section 4.5.6.1) has no effect on the file generate command.
The file recognize command reads surface forms from a file, submits them to the recognizer for analysis, and returns each surface form followed by the resulting lexical form(s). The format of the recognition input file is described in section 4.7.8.
If an output-filespec argument is specified, the results are written to that file; otherwise, the results are displayed on the screen. The format of the output file created by this command is identical to a recognition comparison file. The filespec of either file can contain a path; for example, B:\ENGLISH\ENGLISH.FR is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The command does not recognize any default file names or extensions.
The verbose option (see the set verbose command described in section 4.5.6.1) has no effect on the file recognize command.
The file synthesize command reads morphological forms from a file, submits them to the synthesizer for analysis, and returns each morphological form followed by the resulting surface form(s). The format of the synthesis input file is described in section 4.7.8A. A morphological form is a sequence of morpheme glosses separated by spaces.
If an output-filespec argument is specified, the results are written to that file; otherwise, the results are displayed on the screen. The filespec of either file can contain a path; for example, B:\ENGLISH\ENGLISH.FS is a fully specified path under MS-DOS (see section 4.5.18 for information on Macintosh directory paths). The command does not recognize any default file names or extensions.
The verbose option (see the set verbose command described in section 4.5.6.1) has no effect on the file synthesize command.
The system command allows you to execute an operating system command from within PC-KIMMO. It is available only for the MS-DOS and UNIX versions of PC-KIMMO. For example, on an IBM PC-compatible computer, the command system dir will execute the DOS directory command. If no command argument is given, then PC-KIMMO is pushed into the background and a new system command processor shell is started. While you are in the shell, you can execute any commands or programs. To leave the shell and return to PC-KIMMO, type exit. On an IBM PC-compatible computer, the system command will not work unless a copy of the DOS system file COMMAND.COM is available.
The system command has the alias ! (exclamation point), which does not require a space between it and the following command. For example, !dir performs the DOS directory command.
The edit command attempts to edit a file using the editing program specified by the operating system environment variable EDITOR. It is available only for the MS-DOS and UNIX versions of PC-KIMMO. If this environment variable is not defined, then the command will try to use EDLIN (on a DOS machine) or vi (on a UNIX machine) to edit the file. To set the environment variable, include a line such as this in your AUTOEXEC.BAT file:
SET EDITOR=filespec
where filespec specifies the path and full file name of your editing program, for example, \UTIL\EMACS.EXE. You can use the edit command, for example, to invoke your text editor and modify the rules or lexicon files. After saving the files and leaving the editor, you must load the files again in order for PC-KIMMO to utilize the changes.
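The editor lookup described above reduces to a simple fallback rule. This Python sketch is hypothetical (choose_editor is an illustrative name, and treating an empty EDITOR value as undefined is an assumption); it only decides which editor would be used and does not launch anything.

```python
import os
from typing import Mapping


def choose_editor(environ: Mapping[str, str], is_dos: bool) -> str:
    """Pick the editor the edit command would run.

    Uses EDITOR when it is set to a non-empty value; otherwise falls back
    to EDLIN on DOS or vi on UNIX.
    """
    editor = environ.get("EDITOR")
    if editor:
        return editor
    return "EDLIN" if is_dos else "vi"


# os.name == "nt" is only a rough modern stand-in for "running under DOS".
print(choose_editor(os.environ, is_dos=(os.name == "nt")))
```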
The exit command causes PC-KIMMO to exit back to the operating system.
quit
The command quit is the same as exit.
PC-KIMMO considers the directory (folder) where the program file resides to be the current or default directory. While running PC-KIMMO, it is often necessary to access files in other directories. The Macintosh version of PC-KIMMO has a special command for changing directories called CD. Its command syntax is CD pathname, where pathname is a concatenation of volume and directory names. The Macintosh uses the colon as a separator character between directory names in much the same way that MS-DOS uses a \ or backslash character. There are two types of pathnames: full pathnames and partial pathnames. A full pathname starts with the name of the root directory (or volume name). For example, a full pathname to the English directory might look like this:
MyDisk:PC-KIMMO:English
A partial pathname starts with the name of a directory whose position in the directory tree must be specified relative to the current directory. A partial pathname always starts with a colon (whereas a full pathname never does). For example, if the current directory is MyDisk, the partial pathname to specify the English directory is:
:PC-KIMMO:English
Thus if we are running PC-KIMMO while in the MyDisk directory and we want to change to the English directory, we issue the command:
CD :PC-KIMMO:English
Here are some things to remember when using Macintosh pathnames with the CD command.