Method and system for bootstrapping statistical processing into a rule-based natural language parser
Details
Inventors: Richardson, Stephen Darrow; Heidorn, George E.;
Assignee: Microsoft Corporation (Redmond, WA)
Primary Examiner: Trammell; James P.
Assistant Examiner: Nguyen; Cuong H.
Attorney, Agent or Firm: Seed and Berry LLP
A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine possible parts of speech of words from an input string and a set of rules to combine words from the input string into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser is operating in the statistics compilation mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each rule and lexicon entry when applied in the statistics compilation mode. After a sufficient body of likelihood-of-success statistics has been compiled, the facility operates the parser in an efficient parsing mode, in which the facility uses the compiled statistics to optimize the operation of the parser. In order to parse an input string in the efficient parsing mode, the facility causes the parser to apply applicable rules and lexicon entries in descending order of their likelihood of success, as indicated by the statistics compiled in the statistics compilation mode.
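The two modes described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `SuccessStats`, `compile_statistics`, and `order_by_likelihood` are hypothetical, rules and lexicon entries are represented simply as strings, and what counts as a "success" is assumed to be supplied by the caller as a boolean, since the abstract does not define it.

```python
from collections import Counter


class SuccessStats:
    """Hypothetical helper: attempt and success counts per rule or lexicon entry."""

    def __init__(self):
        self.attempts = Counter()
        self.successes = Counter()

    def record(self, item, succeeded):
        # Called once for each application attempted in statistics compilation mode.
        self.attempts[item] += 1
        if succeeded:
            self.successes[item] += 1

    def likelihood(self, item):
        # Fraction of attempts that succeeded; items never attempted score 0.0
        # (an assumption made for this sketch).
        tried = self.attempts[item]
        return self.successes[item] / tried if tried else 0.0


def compile_statistics(outcomes):
    """Statistics compilation mode: tally (item, succeeded) pairs observed over a corpus."""
    stats = SuccessStats()
    for item, succeeded in outcomes:
        stats.record(item, succeeded)
    return stats


def order_by_likelihood(applicable, stats):
    """Efficient parsing mode: try applicable items in descending order of likelihood."""
    return sorted(applicable, key=stats.likelihood, reverse=True)
```

In use, the outcomes gathered while parsing the sample strings would be passed to `compile_statistics`, and the parser would then call `order_by_likelihood` on the rules and lexicon entries applicable at each point to decide which to try first.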
DETAILED DESCRIPTION OF THE INVENTION

I. INTRODUCTION

A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, the invention comprises a statistical bootstrapping software facility ("the facility"), shown as element 208 in FIG. 2, for automatically compiling and using statistics to improve the performance of a rule-based natural language parser, which generates syntax trees to represent the organization of plain-text sentences. Such a parser uses a set of lexicon entries to identify the part of speech of words, and a set of rules to combine words from an input string into syntactic structures, or "records," eventually combining the records into a syntactic tree representing the entire input string.

A parser is said to "apply" lexicon entries and rules in order to produce new records. A parser may apply a lexicon entry when the word to which it corresponds appears in the input string, and does so by creating a new record, then copying lexical information such as part of speech, person, and number from the lexicon entry to the created record. A parser may apply a rule that combines existing records by first evaluating conditions associated with the rule. If the conditions of the applied rule are satisfied, then the facility creates a new record and adds information to the created record, such as record type and information about the combined records, as specified by the rule.

The facility functions as a parser control program for a conventional rule-based parser. FIG. 1 is a flow diagram showing the overall operation of the facility. In steps 101-103, the facility operates the parser in a statistics compilation mode, during which the facility compiles statistics indicating the success rate of the parser when it applies each lexicon entry and each rule while parsing a "corpus," or large sample of representative text.
In this mode, the facility in steps 101-102 causes the parser to apply every rule and lexicon entry which may be applied ("applicable" rules and lexicon entries) to create "records," or prospective parse tree nodes