How to parse keywords and strings from a line of text - textx

Have a file keywords.tx with
Commands:
keywords = 'this' & 'way'
;
StartWords:
keywords = 'bag'
;
Then a file mygram.tx with
import keywords
MyModel:
keyword*=StartWords[' ']
name+=Word[' ']
;
Word:
text=STRING
;
My data file has one line with "bag hello soda this way".
I would like the result to have the attributes keyword='bag', name='hello soda' and command='this way'.
I am not sure how to get the grammar to handle the pattern keywords, words, keywords while making sure the second set of keywords is not swallowed by the words. Put another way: start words, then words, then commands.

If I understood your goal you can do something like this:
from textx import metamodel_from_str
mm = metamodel_from_str('''
File:
lines+=Line;
Line:
start=StartWord
words+=Word
command=Command;
StartWord:
'bag' | 'something';
Command:
'this way' | 'that way';
Word:
!Command ID;
''')
input = '''
bag hello soda this way
bag hello soda that way
something hello this foo this way
'''
model = mm.model_from_str(input)
assert len(model.lines) == 3
l = model.lines[1]
assert l.start == 'bag'
assert l.words == ['hello', 'soda']
assert l.command == 'that way'
There are several things to note:
You don't have to specify [' '] as a separator rule in your repetitions as by default whitespaces are skipped,
To specify alternatives use |,
You can use a syntactic predicate ! to check if something is ahead and proceed only if it isn't. In the rule Word this is used to assure that commands are not consumed by the Word repetition in the Line rule.
You can add more start words and commands simply by adding more alternatives to these rules,
If you want to be more permissive and capture commands even if the user typed multiple whitespaces between the command words (e.g. "this   way"), you can either use regex matches or specify the match like this:
Command:
'this ' 'way' | 'that ' 'way';
which will match a single space as part of 'this ' and then skip any number of whitespaces before 'way'.
There is comprehensive documentation with examples on the textX site, so I suggest you take a look and go through some of the provided examples.
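The idea behind the ! predicate (refuse to match a word at a position where a command begins) can also be tried without textX. The following is only a rough sketch using Python's standard re module, not textX itself; the names START, COMMAND, WORD and parse_line are made up for this illustration.

```python
import re

# Hypothetical stand-ins for the StartWord and Command rules of the grammar.
START = r"(?:bag|something)"
COMMAND = r"(?:this\s+way|that\s+way)"
# Like "!Command ID": refuse to match a word where a command begins.
WORD = rf"(?!{COMMAND}\b)\w+"

LINE = re.compile(
    rf"\s*(?P<start>{START})\s+(?P<words>(?:{WORD}\s+)+)(?P<command>{COMMAND})\s*$"
)


def parse_line(line):
    """Split one line into start word, plain words and trailing command."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"cannot parse line: {line!r}")
    return {
        "start": m.group("start"),
        "words": m.group("words").split(),
        # Collapse repeated whitespace inside the command.
        "command": " ".join(m.group("command").split()),
    }


print(parse_line("bag hello soda this way"))
# {'start': 'bag', 'words': ['hello', 'soda'], 'command': 'this way'}
```

Because the lookahead is checked before every word, a stray "this" that is not followed by "way" is still treated as an ordinary word.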


Strip special characters and space of a DB column to compare in rails

I have 4 types of last_name:
"Camp Bell"
"CAMPBELL"
"CampBellJr."
"camp bell jr."
Now, in Rails, when a user is searched for by last name (e.g. camp bell), I want to show all 4 records. So I tried:
RAILS
stripped_name = params[last_name].gsub(/\W/, '')
#=> "campbell"
User.where("LOWER(REPLACE(last_name, '/\W/', '')) LIKE ?", "#{stripped_name}%")
This gives me only 2 records, with the following last_name values:
"CAMPBELL"
"CampBellJr."
I guess this is because MySQL's REPLACE does not work with regular expressions.
Any ideas?
EDIT
Guys, sorry for the confusion. My idea is to strip off all special characters including space. So I'm trying to use \W regex.
For example, the input can be: camp~bell... But, it should still fetch result.
You can check both for the stripped name without spaces and for names that contain both parts separated by a space, like this:
stripped_name = params[:last_name].gsub(/\W/, '')
split_names = params[:last_name].split(" ")
User.where('last_name LIKE ? OR (last_name LIKE ? AND last_name LIKE ?)', "%#{stripped_name}%", "%#{split_names[0]}%", "%#{split_names[1]}%")
The next step would be to search for the complete array of split names, not just the first two.
Here is my solution:
User.where("REPLACE(last_name, ' ', '') ILIKE CONCAT('%', REPLACE(?, ' ', ''), '%')", stripped_name)
ILIKE is like LIKE, but case-insensitive (the I stands for insensitive). Note that ILIKE is a PostgreSQL operator; MySQL does not have it, and there a plain LIKE is already case-insensitive under a case-insensitive collation.
Step by step:
last_name ILIKE '%campbell%': you need % on both sides because you want last_name to contain this string, not necessarily at the beginning or the end.
'campbell%' => matches strings that begin with campbell
'%campbell' => matches strings that end with campbell
We need to generate '%campbell%', so we use CONCAT for that.
I just use a simple REPLACE, but you may need a regex instead.
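The comparison the question is after amounts to normalizing both the stored name and the search term in the same way before matching. Here is a small in-memory sketch of that idea in Python (not a database query; the helper name normalize is made up for the illustration):

```python
import re


def normalize(name):
    """Lowercase a name and drop everything that is not a letter or digit."""
    # \W leaves underscores alone, so strip them explicitly as well.
    return re.sub(r"[\W_]+", "", name).lower()


last_names = ["Camp Bell", "CAMPBELL", "CampBellJr.", "camp bell jr."]
query = normalize("camp~bell")  # -> "campbell"

# Prefix match on the normalized form, like LIKE 'campbell%'.
matches = [n for n in last_names if normalize(n).startswith(query)]
print(matches)
# ['Camp Bell', 'CAMPBELL', 'CampBellJr.', 'camp bell jr.']
```

In a database, the same effect needs the normalization to happen on the column side too, for example through a regex-capable replace function if the server offers one, or a separately stored normalized column.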

Python3 Creating basic functions - Remove specific character strings/translating

I am playing around with python3 in idle and after discovering what Pig Latin is I am trying to write a couple functions and am confused about where to start/what python specific words/functions I should be using.
(i) 1 parameter - I am trying to translate words from Pig Latin back to English. A word would always end with "ay" and have only one hyphen. Essentially I am trying to remove the hyphen and the "ay" at the end of the Pig Latin word. I believe I need to start by finding the position of the hyphen. Then I want to extract 2 substrings from the string: the portion before the hyphen and the portion between the hyphen and the "ay" at the end. For example, when given the string "at-thay", the two substrings are "at" and "th". Following this I want to combine the two substrings to create the English word. (The example above would return "that".)
(ii) I want a function that takes a single parameter, a string, and finds the first position within that string where any of the characters "aeiouAEIOU" appears. For example, given the string "is", your function should return 0, and given the string "Pig", I want the function to return 1. If the string doesn't have any of the listed vowels in it at all, such as with the string "shh", then the function should return the length of the string, which would be 3 in this example.
(iii) I want a function that returns the translation of a single word from English to Pig Latin. The translation consists of everything from the first vowel onward then a hyphen, the portion of the word that preceded the first vowel, and the letters "ay".
I know this is a tall order but any help with any of these 3 would greatly help!
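You could do:
def unPigWord( word ):
    """This function takes any Pig Latin word and returns that word in English."""
    parts = word.split( "-" )
    # Drop the trailing "ay" and put the remaining letters back in front.
    return parts[1][:-2] + parts[0]

def findVowel( word ):
    """Return the index of the first vowel, or len(word) if there is none."""
    for i, char in enumerate( word ):
        if char in "aeiouAEIOU":
            return i
    # No vowel found; this also covers the empty string.
    return len( word )

def pigWord( word ):
    """Translate a single English word to Pig Latin."""
    fvi = findVowel( word )
    return word[fvi:] + "-" + word[0:fvi] + "ay"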
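For comparison, here is a compact variant of the same three functions, checked against the examples from the question. It is only a sketch: the snake_case names are mine, and it assumes, as the question states, that a Pig Latin word contains exactly one hyphen and ends in "ay".

```python
VOWELS = "aeiouAEIOU"


def find_vowel(word):
    """Index of the first vowel, or len(word) when there is none."""
    return next((i for i, ch in enumerate(word) if ch in VOWELS), len(word))


def to_pig_latin(word):
    """English -> Pig Latin: rest of word, hyphen, leading consonants, 'ay'."""
    first = find_vowel(word)
    return f"{word[first:]}-{word[:first]}ay"


def from_pig_latin(word):
    """Pig Latin -> English, assuming one hyphen and a trailing 'ay'."""
    head, _, tail = word.partition("-")
    return tail[:-2] + head


print(from_pig_latin("at-thay"))  # that
print(find_vowel("is"), find_vowel("Pig"), find_vowel("shh"))  # 0 1 3
print(to_pig_latin("that"))  # at-thay
```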

Confused about this nested function

I am reading the Python Cookbook 3rd Edition and came across the topic discussed in 2.6 "Searching and Replacing Case-Insensitive Text," where the authors discuss a nested function that is like below:
def matchcase(word):
    def replace(m):
        text = m.group()
        if text.isupper():
            return word.upper()
        elif text.islower():
            return word.lower()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace
If I have some text like below:
text = 'UPPER PYTHON, lower python, Mixed Python'
and I print the value of 'text' before and after, the substitution happens correctly:
x = matchcase('snake')
print("Original Text:",text)
print("After regsub:", re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE))
The last "print" command shows that the substitution correctly happens but I am not sure how this nested function "gets" the:
PYTHON, python, Python
as the word that needs to be substituted with:
SNAKE, snake, Snake
How does the inner function replace get its value 'm'?
When matchcase('snake') is called, word takes the value 'snake'.
Not clear on what the value of 'm' is.
Can any one help me understand this clearly, in this case?
Thanks.
When you pass a function as the second argument to re.sub, according to the documentation:
it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
The matchcase() function itself returns the replace() function, so when you do this:
re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
what happens is that matchcase('snake') returns replace, and then every non-overlapping occurrence of the pattern 'python' as a match object is passed to the replace function as the m argument. If this is confusing to you, don't worry; it is just generally confusing.
Here is an interactive session with a much simpler nested function that should make things clearer:
In [1]: def foo(outer_arg):
   ...:     def bar(inner_arg):
   ...:         print(outer_arg + inner_arg)
   ...:     return bar
   ...:
In [2]: f = foo('hello')
In [3]: f('world')
helloworld
So f = foo('hello') is assigning a function that looks like the one below to a variable f:
def bar(inner_arg):
    print('hello' + inner_arg)
f can then be called like this f('world'), which is like calling bar('world'). I hope that makes things clearer.
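To see exactly what re.sub hands to the callback as m, one can wrap the function in a small logger. This is just an illustrative sketch; logging_replace and seen are names invented here, and matchcase is repeated from the question so the snippet runs on its own.

```python
import re


def matchcase(word):
    def replace(m):
        text = m.group()
        if text.isupper():
            return word.upper()
        elif text.islower():
            return word.lower()
        elif text[0].isupper():
            return word.capitalize()
        return word
    return replace


seen = []  # every matched string that re.sub passes to the callback


def logging_replace(m):
    # m is a match object; m.group() is the text matched this time.
    seen.append(m.group())
    return matchcase("snake")(m)


text = "UPPER PYTHON, lower python, Mixed Python"
result = re.sub("python", logging_replace, text, flags=re.IGNORECASE)
print(seen)    # ['PYTHON', 'python', 'Python']
print(result)  # UPPER SNAKE, lower snake, Mixed Snake
```

The callback runs once per match, and each time m carries the text that was actually found, which is why the replacement can copy its case.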

Separating keywords by space and searching MySQL database

I am working with PHP and MySQL. I'll provide a simple table below:
------------------------------------------------------
| id | text |
------------------------------------------------------
| 1 | The quick brown fox jumped over the lazy dog |
------------------------------------------------------
I'd like users to be able to search using keywords separated by spaces. I've got a simple SQL query for the above which is SELECT * FROM table WHERE text LIKE %[$keyword]% which means if I search for "The quick fox", I won't get any results. Is there a way to separate the keywords by spaces so if I search for "The quick fox", it will run 3 searches. One for "The", one for "quick", and one for "fox" - removing all white spaces. Though it should only display one result since it all belongs to the same row instead of 3 since all 3 keywords matched the data in the row.
What's the best way to do this? All suggestions to do this better are always welcome. Thanks!
[EDIT]
Just thought of this now, would separating the keywords by comma (,) be a better option?
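You might consider a regular expression via REGEXP to combine the words into an OR group. Note that MySQL's word-boundary syntax depends on the version: before 8.0.4 the markers are [[:<:]] and [[:>:]] (used below), while 8.0.4 and later use \b, written as \\b inside the SQL string literal.
SELECT *
FROM tbl
WHERE
LOWER(`text`) REGEXP '[[:<:]](the|quick|fox)[[:>:]]'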
Matches:
The quick brown fox jumps over the lazy dog
Quick, get the fox!
I ate the cake
Doesn't match
Brown dogs
Inside PHP, you can construct this expression by splitting your search string on spaces and imploding it back on | after escaping each component.
$str = "the quick brown fox";
$kwds = explode(" ", $str);
$kwds = array_map("mysql_real_escape_string", $kwds);
// Word-boundary markers for MySQL before 8.0.4; inside a MySQL string literal a bare \b is a backspace, not a boundary.
$regexp = "[[:<:]](" . implode("|", $kwds) . ")[[:>:]]";
Then use REGEXP '$regexp' in your statement.
Addendum:
Since you didn't mention it in the OP, I want to be sure you are aware of MySQL's full text searching capabilities on MyISAM tables, in case they can meet your need. From your description, full text doesn't sound like exactly what you require, but you should review it as a possibility: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
I did it by a different approach of concatenating the query string.
//this line contains four words which are being used for a search
$str = "the quick brown fox";
//the query string is written in little pieces, stored in variable "$uquery"
$uquery = "select * from tablename where";
//a new variable "$keywords" is going to hold the values as an array
$keywords = explode(" ", $str);
$keywords = array_map("mysql_real_escape_string", $keywords);
//"$k" will be storing the total no. of elements in the array, should be holding "4" for this example
$k = count($keywords);
//defining variable "$i" equal to zero, and "$uquery_" as an empty string that will collect the conditions
$i = 0;
$uquery_ = "";
//initiating a while loop which will run 4 times as we have only 4 keywords in this example ($i goes from 0 to 3)
while($i < $k){
    //the following part is appended to "$uquery_", which will be concatenated with "$uquery" later
    $uquery_ .= " topic like convert(_utf8 '%$keywords[$i]%' USING latin1)";
    //for every value of "$i", "$keywords[$i]" fetches one keyword from the array, see below:
    //$keywords[0]= "the" for $i =0
    //$keywords[1]= "quick" for $i =1
    //$keywords[2]= "brown" for $i =2
    //$keywords[3]= "fox" for $i =3
    //now concatenating logical && operator after every round of the loop in "$uquery_" but not for the last round as the query needs to be ended after last keyword
    if($i != $k - 1){ $uquery_ .= " &&"; }
    //adding 1 to $i after every round
    $i++;
}
//now our main variable "$uquery" is being concatenated with "$uquery_"
$uquery .= "$uquery_";
The above piece of code will be generating following query:
select * from tablename where topic like convert(_utf8 '%the%' USING latin1) && topic like convert(_utf8 '%quick%' USING latin1) && topic like convert(_utf8 '%brown%' USING latin1) && topic like convert(_utf8 '%fox%' USING latin1)
Note: "topic" is supposed to be the column name in mysql table, you can replace it with the column name as defined in your table.
I hope it will help a few of the people. If you have any questions feel free to ask me via Live Chat feature on my website http://www.79xperts.com
Regards
Adnan Saeed
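The "one LIKE per keyword, joined with AND" approach can also be written with bound parameters instead of string concatenation, which avoids the escaping step entirely. Below is a rough sketch using Python's built-in sqlite3 module rather than PHP and MySQL; the table name docs, the column name body and the helper names are assumptions made for the example.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so keywords are matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search(conn, phrase):
    """Return rows whose body contains every whitespace-separated keyword."""
    keywords = phrase.split()
    if not keywords:
        return []
    # One "body LIKE ?" condition per keyword, all of which must hold.
    where = " AND ".join("body LIKE ? ESCAPE '\\'" for _ in keywords)
    params = [f"%{escape_like(k)}%" for k in keywords]
    sql = f"SELECT id, body FROM docs WHERE {where}"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO docs (body) VALUES (?)",
    ("The quick brown fox jumped over the lazy dog",),
)

print(search(conn, "The quick fox"))
# [(1, 'The quick brown fox jumped over the lazy dog')]
print(search(conn, "quick cat"))
# []
```

Because all conditions apply to the same row, a row that matches several keywords is still returned only once.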

Ruby override .index() in String to search for a character or its HTML equivalent

So... I've been working with WYSIWYG editors, and have realized that they occasionally replace certain characters with the character codes (HTML entities) for those characters, like &#39; for ' or &amp; for &.
How do I override String's index method such that it includes these codes?
Like, when I do somestring.index("\'hello there"), how do I get it to search for both \' and &#39;?
Note: the single quote is escaped for clarity against the double quotes.
What is the most efficient way to do this kind of string search?
Is there something like this already built in?
Also, since I'm using external tools, I don't really have a say in the format things are in.
THE SOLUTION:
search_reg_exp = Regexp.new(Regexp.escape(str).gsub(/(both|options|or|more)/, "(both|options|or|more)"))
long_str.index(search_reg_exp)
ORIGINAL ANSWER:
String#index doesn't just work for single characters; it can be used for a substring of any length, and you can give it a regular expression, which would probably be best in this case:
some_string = "Russell's teapot"
another_string = "Russell&#39;s teapot"
apostrophe_expr = /'|&#39;/
some_string.index apostrophe_expr
# => 7
another_string.index apostrophe_expr
# => 7
Another option would be to just decode the HTML entities before you start manipulating the string. There are various gems for this including html_helpers:
require 'html_helpers'
another_string = "Russell&#39;s teapot"
yet_another_string = HTML::EntityCoder.decode_entities another_string
# => "Russell's teapot"
yet_another_string.index "'"
# => 7
yet_another_string.index ?' # bonus syntax tip--Ruby 1.9.1+
# => 7
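The decode-first idea is not specific to Ruby. As a point of comparison, here is the same approach sketched with Python's standard html module; the sample strings are made up for the illustration.

```python
import html

encoded = "Russell&#39;s teapot"   # apostrophe written as a decimal entity
decoded = html.unescape(encoded)   # "Russell's teapot"

print(decoded.index("'"))  # 7
# The hexadecimal form of the same entity decodes to the same text.
print(html.unescape("Russell&#x27;s teapot") == decoded)  # True
```

Keep in mind that positions found this way refer to the decoded string, not to the original markup.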