How can I eliminate left recursion in the following grammar?

Here's the grammar, which is supposed to describe a language of nested braces with commas as delimiters:
L ::= {L} | L,L |
A few more examples of strings I'd expect the grammar to accept and reject:
Accept:
{,{,,{,}},,{,}}
{{{{}}}}
{,{}}
Reject:
{}{}
{,{}{}}
{{},{}

Done by hand:
L ::= { L } | { L } , | , L | ε
Or, instead of just winging it, we could use a more systematic approach and apply the algorithm from Wikipedia for removing immediate left recursion:
L ::= { L } L1 | L1
L1 ::= ε | , L L1
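As a rough check of the transformed grammar above, here is a minimal recursive-descent recognizer sketched in Python. The function names (`accepts`, `parse_l`, `parse_l1`) are made up for illustration, and the greedy choice in `parse_l1` (always consume a comma when one is present) is an assumption that happens to work for this language; it is not part of the Wikipedia algorithm itself.

```python
def accepts(text):
    """Return True if text is derivable from
       L  ::= { L } L1 | L1
       L1 ::= ε | , L L1
    """
    pos = 0

    def parse_l():
        nonlocal pos
        # L ::= { L } L1 | L1
        if pos < len(text) and text[pos] == "{":
            pos += 1
            parse_l()
            if pos >= len(text) or text[pos] != "}":
                raise ValueError("expected '}'")
            pos += 1
        parse_l1()

    def parse_l1():
        nonlocal pos
        # L1 ::= ε | , L L1   (take the comma branch whenever a comma is next)
        if pos < len(text) and text[pos] == ",":
            pos += 1
            parse_l()
            parse_l1()

    try:
        parse_l()
    except ValueError:
        return False
    # The whole input must be consumed for the string to be accepted.
    return pos == len(text)


for s in ["{,{,,{,}},,{,}}", "{{{{}}}}", "{,{}}", "{}{}", "{,{}{}}", "{{},{}"]:
    print(s, accepts(s))
```

Because no rule starts with L any more, each function consumes at least one character before recursing on the same nonterminal, so the parser cannot loop forever.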

First of all, that grammar won't accept your first example, since it requires the commas to be after the close brace and before the open brace. I would suggest rewriting it as
L::= {L} | ,L
This won't get rid of the left recursion, but it will at least match your acceptable answers.


Boolean Logic and Truth Tables

I have been Googling around and haven't been able to find a solution. If anyone can link me or explain this, I'd appreciate it.
I have this expression:
¬aΛb | aΛ¬b. Λ is AND, ¬ is NOT.
The truth table is:
A B Expression
--------------
T T F
T F T
F T F
F F T
I am confused as to why they aren't all FALSE. For example, if I were to consider a and b as false: ¬a and ¬b gets precedence, so they become true. But ¬a (TRUE) Λ b (FALSE) is FALSE. And since Λ gets precedence, a (FALSE) Λ ¬b (TRUE) is again FALSE. So FALSE | FALSE = FALSE, right?
Likewise, for a|b|c|d|e, where | is OR, why is the result FALSE when only d is FALSE and the others are TRUE?
T T T F T
= FALSE
The calculator you're using uses | to mean NAND, not OR. You should use + for OR; then the truth table comes out as expected. x NAND y is TRUE unless x and y are both TRUE. NAND has the same precedence as AND, so without parentheses the operators are applied left to right. A fully parenthesized version of your formula is:
((((not a) and b) nand a) and (not b))
Generating a truth table based on this gives the observed result.
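The NAND reading can be checked mechanically. The following Python sketch (the helper names `nand` and `expression` are just for illustration) evaluates the fully parenthesized formula for every input, and also folds NAND left to right over the five-variable example from the question:

```python
from functools import reduce
from itertools import product


def nand(x, y):
    # TRUE unless both operands are TRUE
    return not (x and y)


def expression(a, b):
    # ((((not a) and b) nand a) and (not b))
    return nand((not a) and b, a) and (not b)


rows = [(a, b, expression(a, b)) for a, b in product([True, False], repeat=2)]
for a, b, value in rows:
    print(*("T" if v else "F" for v in (a, b, value)))

# a|b|c|d|e with | read as left-associative NAND and only d FALSE
chained = reduce(nand, [True, True, True, False, True])
print("chained:", "T" if chained else "F")
```

The printed rows follow the same order as the table in the question (TT, TF, FT, FF).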

Can Octave combine elements and indices in expressions?

I'd like to create a matrix of elements with each value in column i the ith power of the value in column 1. Easy with a for loop, but is there a way to combine matrix elements and their indices in expressions?
Do you mean something like this?
M = M(:,1) .^ (1:size(M,2));
It is easy to generate an array of indices to manipulate and/or operate on.
Note: For older versions of MATLAB the above gives an error; you need to use bsxfun instead:
M = bsxfun(@power, M(:,1), 1:size(M,2));
Note 2: If your inputs are v=[3;5;7] and n=3 you can translate the above to
M = v .^ (1:n);
What about this:
F = @(x, n) bsxfun (@realpow, x(:), 1:n);
Example:
>> F ([3;5;7], 3)
ans =
3 9 27
5 25 125
7 49 343
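For readers who want to see what the broadcast computes element by element, here is a plain-Python sketch of the same idea (not Octave; `power_columns` is a hypothetical helper name): entry (i, k) is the i-th input value raised to the column index k.

```python
def power_columns(v, n):
    # Row i, column k (1-based) holds v[i] ** k, mirroring v .^ (1:n).
    return [[x ** k for k in range(1, n + 1)] for x in v]


M = power_columns([3, 5, 7], 3)
for row in M:
    print(*row)
```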

Postgres column with string unicode values

I have a column query_params with type TEXT, but the values are stored as the string form of a unicode dictionary. Each key and value in the column is prefixed with a u, and I'm struggling to remove it.
Is there a way to remove the u from the values and convert the dictionary entries to columns?
For example, the query SELECT query_params FROM api_log LIMIT 2 returns two rows:
{
u'state': u'CA',
u'page_size': u'1000',
u'market': u'Western',
u'requested_at': u'2014-10-28T00:00:00+00:00'
},
{
u'state': u'NY',
u'page_size': u'1000',
u'market': u'Eastern',
u'requested_at': u'2014-10-28T00:10:00+00:00'
}
Is it possible to handle unicode in Postgres and convert the values to columns:
state | page_size | market | requested_at
------+-----------+----------+---------------------------
CA | 1000 | Western | 2014-10-28T00:00:00+00:00
NY | 1000 | Eastern | 2014-10-28T00:10:00+00:00
Thanks for any help.
You should remove the u prefixes and replace single quotes with double ones to get properly formatted JSON. Then you can use the ->> operator to get its attributes:
select
v->>'state' as state,
v->>'page_size' as page_size,
v->>'market' as market,
v->>'requested_at' as requested_at
from (
select regexp_replace(query_params, 'u\''([^\'']*)\''', '"\1"', 'g')::json as v
from api_log
) s;
Test the solution in SqlFiddle.
Read about POSIX Regular Expression in the documentation.
Find an explanation of the regexp expression in regex101.com.
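The same rewrite can be tried outside the database. This is a minimal Python sketch of the substitution, assuming (as the SQL above does) that no key or value contains a quote character; `parse_py2_repr` is a hypothetical helper name and the sample row is copied from the question.

```python
import json
import re


def parse_py2_repr(text):
    # Turn every u'...' token into "..." so the text becomes valid JSON.
    # Assumption: the quoted strings contain no ' or " characters.
    as_json = re.sub(r"u'([^']*)'", r'"\1"', text)
    return json.loads(as_json)


row = (
    "{u'state': u'CA', u'page_size': u'1000', u'market': u'Western', "
    "u'requested_at': u'2014-10-28T00:00:00+00:00'}"
)
parsed = parse_py2_repr(row)
print(parsed["state"], parsed["page_size"], parsed["market"], parsed["requested_at"])
```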

MySQL select UTF-8 string with '=' but not with 'LIKE'

I have a table with some words that come from medieval books and have some accented letters that no longer exist in the modern latin1 alphabet. I can represent these letters easily with Unicode combining characters stored as UTF-8. For example, to create a "J" with a tilde, I use the sequence \u004A+\u0303 and the J becomes accented with a tilde.
The table uses utf8 encoding and the field collation is utf8_unicode_ci.
My problem is the following: If I try to select the entire string, I receive the correct answer. If I try to select using 'LIKE', I receive the wrong answer.
For example:
mysql> select word, hex(word) from oldword where word = 'hua';
+--------+--------------+
| word | hex(word) |
+--------+--------------+
| hũa | 6875CC8361 |
| huã | 6875C3A3 |
| hua | 687561 |
| hũã | 6875CC83C3A3 |
+--------+--------------+
4 rows in set (0,04 sec)
mysql> select word, hex(word) from oldword where word like 'hua';
+-------+------------+
| word | hex(word) |
+-------+------------+
| huã | 6875C3A3 |
| hua | 687561 |
+-------+------------+
2 rows in set (0,04 sec)
I don't want to search only for the entire word. I want to search for words that start with some substring. Occasionally the search term happens to be the entire word.
How could I select the partial string using like and match all the strings?
I tried to create a custom collation using this information, but the server became unstable and only after a lot of trials and errors I was able to revert to the utf8_unicode_ci collation again and the server returned to normal condition.
EDIT: There's a problem with this site and some characters don't display correctly. Please see the results on these pastebins:
http://pastebin.com/mckJTLFX
http://pastebin.com/WP87QvgB
After seeing Marcus Adams' answer I realized that the REPLACE function could be the solution for this problem, although he didn't mention this function.
I have only two different combining characters (acute and tilde), combined with other ASCII characters: for example j with tilde, j with acute, m with tilde, s with tilde, and so on. So I just have to remove these two characters when using LIKE.
After searching the manual, I learned about the UNHEX function that helped me to properly represent the combining characters alone in the query to remove them.
The combining tilde is represented by CC83 in HEX code and the acute is represented by CC81 in HEX.
So, the query that solves my problem is this one.
SELECT word, REPLACE(REPLACE(word, UNHEX("CC83"), ""), UNHEX("CC81"), "")
FROM oldword WHERE REPLACE(REPLACE(word, UNHEX("CC83"), ""), UNHEX("CC81"), "")
LIKE 'hua%';
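To see why stripping the combining characters helps, the stored bytes can be inspected in Python. This sketch swaps in a different technique from the REPLACE query: it decomposes each word with NFD normalization and drops every combining mark, which also folds the precomposed ã that MySQL's collation handles on its own. The hex values are the ones shown in the query output above; `strip_marks` is a hypothetical helper name.

```python
import unicodedata


def strip_marks(text):
    # Decompose precomposed letters, then drop all combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


hex_values = ("6875CC8361", "6875C3A3", "687561", "6875CC83C3A3")
words = [bytes.fromhex(h).decode("utf-8") for h in hex_values]

# The first word displays as three letters but holds four code points.
print(len(words[0]))

matches = [w for w in words if strip_marks(w).startswith("hua")]
print(len(matches))
```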
The problem is that LIKE performs the comparison character-by-character, and a letter written with the "combining tilde" is literally two characters, though it displays as one (assuming your client supports displaying it as such).
There will never be a case where comparing e.g. hu~a to hua character-by-character will match because it's comparing ~ with a for the third character.
Collations (and coercions) work in your favor and handle such things when comparing the string as a whole, but not when comparing character-by-character.
Even if you considered using SUBSTRING() as a hack instead of using LIKE with a wildcard % to perform a prefix search, consider the following:
SELECT SUBSTRING('hũa', 1, 3) = 'hua'
-> 0
SELECT SUBSTRING('hũa', 1, 4) = 'hua'
-> 1
You kind of have to know the length you're going for or brute force it like this:
SELECT * FROM oldword
WHERE SUBSTRING(word, 1, 3) = 'hua'
OR SUBSTRING(word, 1, 4) = 'hua'
OR SUBSTRING(word, 1, 5) = 'hua'
OR SUBSTRING(word, 1, 6) = 'hua'
According to this:
ũ collates equal to plain U in all utf8 collations on 5.6.
j́ collates equal to plain J in most collations; exceptions:
utf8_general_ci because it is actually j plus an accent. And the "general" collations only look at one character (as distinguished from byte) at a time. Most collations take into consideration multiple characters, such as ch or ll in Spanish or ss in German.
utf8_roman_ci, which is quite an oddball. j́=i=j
(LIKE does not exactly follow the regular rules of collation. I am not versed in the details, but I think the fact that the accented J is represented as 2 characters causes it to work differently in LIKE than in WHERE or ORDER BY. Furthermore, I don't know whether REPLACE() collates like LIKE or like the other places.)
You can use the % symbol like a wildcard character. For example this:
SELECT word
FROM myTable
WHERE word LIKE 'hua%';
This will pull all records that start with hua and have 0+ characters following it. Here is an SQL Fiddle example.

An explanation for BNF rule

I'm investigating the mysql's SQL parser at the moment.
And here is the interesting thing I have noticed and cannot explain:
(in sql_yacc.yy)
predicate:
...
| bit_expr BETWEEN_SYM bit_expr AND_SYM predicate
{
$$= new (YYTHD->mem_root) Item_func_between($1,$3,$5);
if ($$ == NULL)
MYSQL_YYABORT;
}
The same is on the Expression Syntax page:
predicate:
...
| bit_expr [NOT] BETWEEN bit_expr AND predicate
That means that
foo BETWEEN 1 AND bar BETWEEN 1 AND 2
is a syntactically correct query, even though it makes no sense at all.
My question: what could this be used for? What would we miss if we used
bit_expr [NOT] BETWEEN bit_expr AND bit_expr
instead?
LOL (not a LOL anymore actually)
this query executes WITHOUT errors:
select * from users where id between 1 and id between 1 and 10;
// returns row with id = 1
select * from users where id between 2 and id between 2 and 10;
Empty set (0.00 sec)
(update is added here) ... and actually it is expected.
Presumably it converts the second expression straight to 0 or 1 and uses it as an operand.
UPD:
I've filed a bug - http://bugs.mysql.com/bug.php?id=69208
It's definitely not an expected syntax at all
UPD 2: so it looks like it's just a minor typo that doesn't change the parser's behaviour at all (well, to be clear, it makes it unnoticeably slower for a common BETWEEN expression).
Your analysis is basically correct:
foo BETWEEN 1 AND bar BETWEEN 1 AND 2
is parsed as:
foo BETWEEN 1 AND (bar BETWEEN 1 AND 2)
and the second (parenthesized) predicate will presumably evaluate to either 0 or 1 (for false or true). Therefore, if bar is not between 1 and 2, the set of selected values from foo will be empty (because foo BETWEEN 1 AND 0 is shorthand for foo >= 1 AND foo <= 0 and there are no values for which that is true, even allowing for NULLs). Contrariwise, if bar is between 1 and 2, then the set of selected values from foo will be the set where foo equals 1.
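This reading can be checked against the two queries in the question with a small Python model. It is only a sketch: `between` is a hypothetical helper that returns 0 or 1 the way the inner predicate is presumed to, and the users table is assumed to contain ids 1 through 10.

```python
def between(value, low, high):
    # Models "value BETWEEN low AND high", yielding 1 for true and 0 for false.
    return 1 if low <= value <= high else 0


ids = range(1, 11)  # assumed contents of users.id

# id BETWEEN 1 AND (id BETWEEN 1 AND 10)
first = [i for i in ids if between(i, 1, between(i, 1, 10))]

# id BETWEEN 2 AND (id BETWEEN 2 AND 10)
second = [i for i in ids if between(i, 2, between(i, 2, 10))]

print(first, second)
```

Under these assumptions the first list contains only id 1 and the second is empty, matching the results reported in the question.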
An alternative question to your "what would you lose if you replaced the 'predicate' term with 'bit_expr'?" might be "would you gain anything if you replaced the 'bit_expr' with 'predicate'?"
Without a careful scrutiny of the complete grammar (or, at least, scrutiny of the parts referenced by bit_expr and predicate, and possibly a review of places where bit_expr and predicate are used), it is hard to know the answer to either question.