Quoted Name Searching in Pyparsing with searchparser.py
The searchparser.py module has a flaw: it cannot parse quoted phrases that contain punctuation.
Searching for something like this:
"C. Montgomery Burns, Esq."
results in a nasty stacktrace:
Traceback (most recent call last):
  File "searchparser.py", line 302, in <module>
    if ParserTest().Test():
  File "searchparser.py", line 289, in Test
    r = self.Parse(item)
  File "searchparser.py", line 170, in Parse
    return self.evaluate(self._parser(query)[0])
  File "/var/lib/python-support/python2.5/pyparsing.py", line 1049, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/var/lib/python-support/python2.5/pyparsing.py", line 925, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/var/lib/python-support/python2.5/pyparsing.py", line 2560, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "/var/lib/python-support/python2.5/pyparsing.py", line 925, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/var/lib/python-support/python2.5/pyparsing.py", line 2431, in parseImpl
    raise maxException
pyparsing.ParseException: Expected """ (at char 2), (line:1, col:3)
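The reason is that words in searchparser.py are defined as consisting of just letters and numbers (or alphanums, in pyparsing-speak). The word match therefore stops at the first period (char 2 in the error above), and the parser expects the closing quote there. The definition is in lines 94 and 95: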
Word(alphanums)
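To see the effect in isolation, here is a minimal sketch, assuming pyparsing is installed. It uses only the Word(alphanums) definition quoted above, not the full searchparser.py grammar:

```python
from pyparsing import Word, alphanums

# The same word definition the post quotes from searchparser.py.
word = Word(alphanums)

# Without parseAll=True, parseString returns whatever prefix matched.
# The match ends at the first character outside alphanums: the period.
tokens = word.parseString("C. Montgomery Burns, Esq.")
print(tokens.asList())  # ['C']
```

Only the initial "C" is consumed, so inside a quoted phrase the parser is left sitting on the period when it wants either another word or the closing quote.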
The solution is to define a string which contains all the possible punctuation characters to expect in a quoted search string, and include it in the parser grammar for word.
For example, for people’s names, the likely punctuation characters to expect are:
punctuation = ",.'`&-"
So, with that definition placed at the start of the parser(self) method, we edit the lines that define operatorWord to look like this:
operatorWord = Group(Combine(Word(alphanums+punctuation) + Suppress('*'))).setResultsName('wordwildcard') | \
               Group(Word(alphanums+punctuation)).setResultsName('word')
With that change, we can find Monty Burns using his exact name, periods and commas included.
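The before-and-after behaviour can be checked with a small, self-contained sketch. Note the assumptions: make_quoted_phrase is a hypothetical helper, and its quoted-phrase rule is a simplified stand-in for the one in searchparser.py, not a copy of it. Only the operatorWord definition follows the post.

```python
from pyparsing import (Word, alphanums, Group, Combine, Suppress,
                       OneOrMore, ParseException)

punctuation = ",.'`&-"


def make_quoted_phrase(word_chars):
    """Hypothetical helper: a tiny quoted-phrase grammar built on operatorWord."""
    operatorWord = (
        Group(Combine(Word(word_chars) + Suppress('*'))).setResultsName('wordwildcard')
        | Group(Word(word_chars)).setResultsName('word')
    )
    # Simplified stand-in for the quotes rule: a quote, one or more words, a quote.
    return Suppress('"') + OneOrMore(operatorWord) + Suppress('"')


query = '"C. Montgomery Burns, Esq."'

# Original word definition: the period cannot be consumed, so parsing fails.
try:
    make_quoted_phrase(alphanums).parseString(query)
    original_ok = True
except ParseException:
    original_ok = False

# Patched word definition: punctuation is part of each word.
patched = make_quoted_phrase(alphanums + punctuation).parseString(query)
words = [group[0] for group in patched]

print(original_ok)  # False
print(words)        # ['C.', 'Montgomery', 'Burns,', 'Esq.']
```

The punctuation stays attached to the tokens ("Burns," rather than "Burns"), so whatever index the search runs against has to store words in the same form for the phrase to match.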
Tags: HOW-TO, pyparsing, Python, searchparser.py

September 28th, 2010 at 4:16 am
Thanks for your improvement! You could add a link to your page in the pyparsing Wiki for people who want to improve their search query parser. Thanks, Rudolph