Regex giving infinite loopback instead of ending - mysql

Could anyone explain how my MySQL regex script would loop infinitely instead of returning true or false?
'^[[:alnum:]]+([_\\.\\-]?[[:alnum:]]+)*#[[:alnum:]]+([_\\.\\-]?[[:alnum:]]+)*(\\.[[:alnum:]]{2,4})+$'
Is there a way to make MySQL return false if it detects an infinite loop?
My entire SQL query is:
SELECT
cusworkemail NOT REGEXP '^[[:alnum:]]+([_\\.\\-]?[[:alnum:]]+)*#[[:alnum:]]+([_\\.\\-]?[[:alnum:]]+)*(\\.[[:alnum:]]{2,4})+$' AS invalid_value,
cusworkemail,
num,
cusid_list
FROM (
SELECT
IFNULL(cusworkemail, '') AS cusworkemail,
count(*) AS num,
GROUP_CONCAT(DISTINCT cusid) AS cusid_list
FROM (SELECT cusid, cusworkemail FROM dealCRM.cus WHERE cusworkemail != '' AND cusworkemail IS NOT NULL) AS t
GROUP BY cusworkemail
ORDER BY num DESC
-- LIMIT 0, 10000
) AS c
HAVING invalid_value;
The query will throw an error on regex timeout.
Here is an example of an email that will cause an infinite loop:
"sdasa#kj.nhg " with a space at the end.
Does the parser not detect that it is repeating the same internal state?

The pattern [[:alnum:]]+([_\\.\\-]?[[:alnum:]]+)* is potentially causing a lot of backtracking. It is not an infinite loop, just a lot of looping which will hit a limit.
Take for instance this string which should not match: "abcdefg"
It will first match [[:alnum:]]+ against "abcdefg", then execute ()* zero times, and then find that there is no "#". So it backtracks, making [[:alnum:]]+ match one fewer character: "abcdef". The ()* part can now execute once, but the "#" is still missing, so it needs to backtrack again. We can summarise this as:
abcdefg()
abcdef(g)
abcde(fg)
abcd(efg)
abc(defg)
ab(cdefg)
a(bcdefg)
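None of these divisions of the characters between the first and second [[:alnum:]]+ patterns helps to get past that #. The list above shows only splits into two pieces; because ()* can repeat, the engine also tries splits into three or more pieces, so the number of attempts grows exponentially with the length of the run. In the example you have given, this backtracking will occur at the end, so that there is a Cartesian product of backtracking in the part before the # and in the part after it.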
Conclusion: you should remove that ?. It is not an optional match. The optional part is already reflected by the surrounding *. Either there is a hyphen/dot/underscore and the pattern in parentheses should be executed, or there is not, and then that pattern should not be executed.
Note you have this pattern twice, so two corrections are needed.
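As a rough check that dropping the ? does not change which strings are accepted, here is a minimal sketch in Python rather than MySQL. It assumes [[:alnum:]] can be approximated by [A-Za-z0-9] (Python's re module has no POSIX classes) and uses short sample strings chosen for illustration. Python's engine is not the one MySQL uses, so this compares results only, not timing.

```python
import re

# The pattern as posted: the separator inside each group is optional.
# Assumption: [[:alnum:]] is approximated by the ASCII class [A-Za-z0-9].
ORIGINAL = re.compile(
    r"^[A-Za-z0-9]+([_.\-]?[A-Za-z0-9]+)*"
    r"#[A-Za-z0-9]+([_.\-]?[A-Za-z0-9]+)*(\.[A-Za-z0-9]{2,4})+$"
)

# The suggested fix: the separator is required whenever a group runs.
FIXED = re.compile(
    r"^[A-Za-z0-9]+([_.\-][A-Za-z0-9]+)*"
    r"#[A-Za-z0-9]+([_.\-][A-Za-z0-9]+)*(\.[A-Za-z0-9]{2,4})+$"
)

# Short, illustrative inputs; kept short so the original pattern stays fast.
SAMPLES = ["sdasa#kj.nhg", "sdasa#kj.nhg ", "abcdefg", "a.b#c-d.com", "a..b#c.com"]

# Map each sample to (original verdict, fixed verdict).
results = {
    s: (ORIGINAL.match(s) is not None, FIXED.match(s) is not None)
    for s in SAMPLES
}

for sample, (orig_ok, fixed_ok) in results.items():
    print(repr(sample), orig_ok, fixed_ok)
```

Both patterns give the same verdict on every sample, including the string with the trailing space. The fixed pattern simply leaves the engine only one way to split each run of alphanumeric characters.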

Why is the order of evaluation for expressions involving user variables undefined?

According to the MySQL manual, the output of the following query is not guaranteed to always be the same.
SET @a := 0;
SELECT
@a AS first,
@a := @a + 1 AS second,
@a := @a + 1 AS third,
@a := @a + 1 AS fourth,
@a := @a + 1 AS fifth,
@a := @a + 1 AS sixth;
Output:
first second third fourth fifth sixth
0 1 2 3 4 5
Quoting from the Manual:
However, the order of evaluation for expressions involving user
variables is undefined;
I want to know the story behind this.
So my question is: why is the order of evaluation for expressions involving user variables undefined?
The order of evaluation of expressions in the select is undefined. For the most part, you only notice this when you have variables, because only then does a different evaluation order produce different (and possibly wrong) results.
Why? The SQL standard does not require the order of evaluation, so each database is free to decide how to evaluate the expressions. Typically such decisions are left to the optimizer.
TL;DR MySQL user-defined variables are not intended to be used that way. An SQL statement describes a result set, not a series of operations. The documentation isn't clear about what variable assignments even mean. But you can't both read and write a variable in the same statement. And assignment order within a SELECT clause is not defined. And all you can assume is that assignments in an outer SELECT clause are done for some one output row.
Almost all the code you see like yours has undefined behaviour. Some sensible people demonstrate via the implementation code for operators & optimization what a particular implementation actually does. But that behaviour can't be relied on for the next release.
Read the documentation. Reading and writing the same variable in the same statement is undefined. When it's not done, any variable read is fixed within a statement. There is no order to assignments. For SELECTs with only DETERMINISTIC functions (whose values are determined by argument values) the result is defined by a conceptual evaluation execution. But there is no connection between that and user variables. What an assignment ever means is not clear: the documentation says "each select expression is evaluated only when sent to the client". This seems to be saying that there's no guarantee a row is even "selected" except in the sense of put into a result set per an outermost SELECT clause. The order of assignments in a SELECT is not defined. And even if assignments are conceptually done for every row, they can only depend on the row value, so that's the same as saying the assignment is done only once, for some row. And since assignment order is not defined, that row can be any row. So assuming that that is what the documentation means, all you can expect is that if you don't read and write from the same variable in a SELECT statement then each variable assignment in the outermost SELECT will have happened in some order for one output row.
It depends on the decisions of the database's optimizer. That's why it's uncertain. In most cases, though, the optimizer evaluates the expressions in the order we would predict.

Nested "select" query in MySQL

Hi, I am executing a nested "select" query in MySQL.
The query is:
SELECT `btitle` FROM `backlog` WHERE `bid` in (SELECT `abacklog_id` FROM `asprint` WHERE `aid`=184 )
I am not getting the expected result from the above query. If I execute
SELECT abacklog_id FROM asprint WHERE aid=184
separately,
I get abacklog_id as 42,43,44,45.
So if I then execute
SELECT `btitle` FROM `backlog` WHERE `bid` in(42,43,44,45)
I get btitle as scrum1 scrum2 scrum3 msoffice.
But if I combine those queries, I get only scrum1; the remaining 3 btitle values are missing.
You can try the following:
SELECT `age_backlog`.`ab_title` FROM `age_backlog` LEFT JOIN `age_sprint` ON `age_backlog`.`ab_id` = `age_sprint`.`as_backlog_id` WHERE `age_sprint`.`as_id` = 184
This query returns one row per matching backlog item, so you can loop over the result. If you need all titles in a single comma-separated string, join them with PHP's implode function.
I hope this helps. If you get an error, please let me know.
What you did is to store comma separated values in age_sprint.as_backlog_id, right?
Your query actually becomes
SELECT `ab_title` FROM `age_backlog` WHERE `ab_id` IN ('42,43,44,45')
Note the ' in the IN() function. You don't get separate numbers, you get one string.
Now, when you do
SELECT CAST('42,43,44,45' AS SIGNED)
which basically is the implicit cast MySQL does, the result is 42. That's why you just get scrum1 as result.
You can search for dozens of answers to this problem here on SO.
You should never ever store comma separated values in a database. It violates the first normal form. In most cases databases are in third normal form or BCNF or even higher. Lower normal forms are just used in some special cases to get the most performance, usually for reporting issues. Not for actually working with data. You want 1 row for every as_backlog_id.
Again, your primary goal should be to get a better database design, not to write some crazy functions to get each comma separated number out of the field.
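To illustrate the "1 row for every as_backlog_id" design, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the question (backlog, asprint); the exact schema and sample rows are assumptions made for the example. With one row per sprint/backlog pair, the original nested query returns all four titles.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE backlog (bid INTEGER PRIMARY KEY, btitle TEXT);
    -- Assumed normalized layout: one row per (sprint, backlog item),
    -- instead of one row holding the string '42,43,44,45'.
    CREATE TABLE asprint (aid INTEGER, abacklog_id INTEGER);

    INSERT INTO backlog VALUES
        (42, 'scrum1'), (43, 'scrum2'), (44, 'scrum3'), (45, 'msoffice');
    INSERT INTO asprint VALUES
        (184, 42), (184, 43), (184, 44), (184, 45);
""")

# The nested query from the question, unchanged apart from ORDER BY.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT btitle FROM backlog "
        "WHERE bid IN (SELECT abacklog_id FROM asprint WHERE aid = 184) "
        "ORDER BY bid"
    )
]
print(titles)  # ['scrum1', 'scrum2', 'scrum3', 'msoffice']
```

The subquery now yields four separate integers rather than one string, so no implicit cast is involved.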

MySQL Query Tuning - Why is using a value from a variable so much slower than using a literal?

UPDATE: I've answered this myself below.
I'm trying to fix a performance issue in a MySQL query. What I think I'm seeing, is that assigning the result of a function to a variable, and then running a SELECT with a compare against that variable is relatively slow.
If, for testing's sake, I replace the compare to the variable with a compare to the string literal equivalent of what I know that function will return (for a given scenario), then the query runs much faster.
For example:
...
SET @metaphone_val := double_metaphone(p_parameter); -- double_metaphone is user defined
SELECT
SQL_CALC_FOUND_ROWS
t.col1,
t.col2,
...
FROM table t
WHERE
t.pre_set_metaphone_string = @metaphone_val -- OPTION A
t.pre_set_metaphone_string = 'PRN' -- OPTION B (Literal function return value for a given name)
If I use the line in option A, the query is slow.
If I use the line in option B, then the query is fast as you would expect any simple string compare to be.
Why?
I had just finished writing the question when the answer hit me, so I'm posting it anyway for knowledge sharing!
I realised that the return value of the metaphone function was UTF8.
The compare to a latin1 field was obviously incurring a fairly heavy performance overhead.
I replaced the variable assignment with:
SET @metaphone_val := CONVERT(double_metaphone(p_parameter) USING latin1);
Now the query runs as fast as I would expect.

MySQL order by problems

I have the following code:
echo "<form><center><input type=submit name=subs value='Submit'></center></form>";
$val=$_POST['resulta']; //this is from a textarea name='resulta'
if (isset($_POST['subs'])) //from submit name='subs'
{
$aa=mysql_query("select max(reservno) as 'maxr' from reservation") or die(mysql_error()); //select maximum reservno
$bb=mysql_fetch_array($aa);
$cc=$bb['maxr'];
$lines = explode("\n", $val);
foreach ($lines as $line) {
mysql_query("insert into location_list (reservno, location) values ('$cc', '$line')")
or die(mysql_error()); //insert value of textarea then save it separately in location_list if \n is found
}
If I input the following data on the textarea (assume that I have maximum reservno '00014' from reservation table),
Davao - Cebu
Cebu - Davao
then submit it, I'll have these data in my location_list table:
loc_id || reservno || location
00001 || 00014 || Davao - Cebu
00002 || 00014 || Cebu - Davao
Then this code:
$gg=mysql_query("SELECT GROUP_CONCAT(IF((@var_ctr := @var_ctr + 1) = @cnt,
location,
SUBSTRING_INDEX(location,' - ', 1)
)
ORDER BY loc_id ASC
SEPARATOR ' - ') AS locations
FROM location_list,
(SELECT @cnt := COUNT(1), @var_ctr := 0
FROM location_list
WHERE reservno='$cc'
) dummy
WHERE reservno='$cc'") or die(mysql_error()); //QUERY IN QUESTION
$hh=mysql_fetch_array($gg);
$ii=$hh['locations'];
mysql_query("update reservation set itinerary = '$ii' where reservno = '$cc'")
or die(mysql_error());
is supposed to update reservation table with 'Davao - Cebu - Davao' but it's returning this instead, 'Davao - Cebu - Cebu'. I was previously helped by this forum to have this code working but now I'm facing another difficulty. Just can't get it to work. Please help me. Thanks in advance!
I got it working (without ORDER BY loc_id ASC) as long as I set phpMyAdmin operations loc_id ascending. But whenever I delete all data, it goes back as loc_id descending so I have to reset it. It doesn't entirely solve the problem but I guess this is as far as I can go. :)) I just have to make sure that the table column loc_id is always in ascending order. Thank you everyone for your help! I really appreciate it! But if you have any better answer, like how to set the table column always in ascending order or better query, etc, feel free to post it here. May God bless you all!
The database server is allowed to rewrite your query to optimize its execution. This might affect the order of the individual parts, in particular the order in which the various assignments are executed. I assume that some such reordering causes the result of the query to become undefined, in such a way that it works on sqlfiddle but not on your actual production system.
I can't put my finger on the exact location where things go wrong, but I believe that the core of the problem is the fact that SQL is intended to work on relations, but you try to abuse it for sequential programming. I suggest you retrieve the data from the database using portable SQL without any variable hackery, and then use PHP to perform any post-processing you might need. PHP is much better suited to express the ideas you're formulating, and no optimization or reordering of statements will get in your way there. And as your query currently only results in a single value, fetching multiple rows and combining them into a single value in the PHP code shouldn't increase complexity too much.
Edit:
While discussing another answer using a similar technique (by Omesh as well, just as the answer your code is based upon), I found this in the MySQL manual:
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed. The order of
evaluation for expressions involving user variables is undefined and
may change based on the elements contained within a given statement;
in addition, this order is not guaranteed to be the same between
releases of the MySQL Server.
So there are no guarantees about the order in which these variable assignments are evaluated, and therefore no guarantees that the query does what you expect. It might work, but it might fail suddenly and unexpectedly. Therefore I strongly suggest you avoid this approach unless you have some reliable mechanism to check the validity of the results, or really don't care about whether they are valid.
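The post-processing suggested above is only a few lines once the rows are fetched with a plain SELECT location ... ORDER BY loc_id. Here is a minimal sketch of that logic, written in Python for brevity (it ports directly to PHP). The function name build_itinerary and the sample rows are hypothetical, and each location is assumed to have the form "origin - destination".

```python
def build_itinerary(locations):
    """Join legs such as 'Davao - Cebu' into a single itinerary string.

    Every leg except the last contributes only its origin; the last leg is
    kept whole, mirroring what the GROUP_CONCAT query was meant to do.
    """
    # Drop empty lines and stray whitespace (e.g. '\r' left over from a textarea).
    legs = [loc.strip() for loc in locations if loc.strip()]
    if not legs:
        return ""
    origins = [leg.split(" - ", 1)[0] for leg in legs[:-1]]
    return " - ".join(origins + [legs[-1]])


# Rows as they would come back when ordered by loc_id (sample data from the question).
rows = ["Davao - Cebu", "Cebu - Davao"]
itinerary = build_itinerary(rows)
print(itinerary)  # Davao - Cebu - Davao
```

Because the order comes from the ORDER BY on a plain query and the joining happens in application code, no user variables are needed.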

Subtracting the value from the last row using variable assignment in MySQL

According to the MySQL documentation:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get
the results you expect, but this is not guaranteed.
http://dev.mysql.com/doc/refman/5.6/en/user-variables.html
However, in the book High Performance MySQL there are a couple of examples of using this tactic to improve query performance anyway.
Is the following an anti-pattern and if so is there a better way to write the query while maintaining good performance?
set @last = null;
select tick, count-@last as delta, @last:=count from measurement;
For clarification, my goal is to find the difference between this row and the last. My table has a primary key on tick which is a datetime column.
Update:
After trying Shlomi's suggestion, I have reverted back to my original query. It turns out that using a case statement with aggregate functions produces unexpected behavior. See for example:
case when (@delta := (max(measurement.count) - @lastCount)) AND 0 then null
when (@lastCount := measurement.count) AND 0 then null
else @delta end
It appears that MySQL evaluates the expressions that don't contain aggregate functions on a first pass through the results, and then evaluates the aggregate expressions on a second (grouping) pass. It appears to evaluate the case expression during or after that second pass and use the precalculated values from the first pass in that evaluation. The result is that in the third line @delta is always the initial value of @delta (because assignment didn't happen until the grouping pass). I attempted to incorporate a group function into the line with @delta but couldn't get it to behave as expected. So I ultimately went back to my original query, which didn't have this problem.
I would still love to hear any more suggestions about how to better handle a query like this.
Update 2:
Sorry for the lack of response on this question, I didn't have a chance to investigate further until now.
Using Shlomi's solution it looks like I had a problem because I was using a group by function when I read my @last variable but not when I set it. My code looked something like this:
CASE
WHEN (@delta := count - @last) IS NULL THEN NULL
WHEN (@last := count) IS NULL THEN NULL
ELSE (CASE WHEN cumulative THEN @delta ELSE avg(count) END)
END AS delta
MySQL appears to process expressions that don't contain aggregate functions in a first pass and ones that do in a second pass. The strange thing in the code above is that even when cumulative evaluates to true, MySQL must see the AVG aggregate function in the ELSE clause and decide to evaluate the whole inner CASE expression in the second pass. Since @delta is set in an expression without an aggregate function, it seems to be getting set on the first pass, and by the time the second pass happens MySQL is done evaluating the lines that set @delta and @last.
Ultimately I seem to have found a fix by including aggregate functions in the first expressions as well. Something like this:
CASE
WHEN (@delta := max(count) - @last) IS NULL THEN NULL
WHEN (@last := max(count)) IS NULL THEN NULL
ELSE (CASE WHEN cumulative THEN @delta ELSE avg(count) END)
END AS delta
My understanding of what MySQL is doing is purely based on testing and conjecture since I didn't read the source code, but hopefully this will help others who might run into similar problems.
I am going to accept Shlomi's answer because it really is a good solution. Just be careful how you use aggregate functions.
I've researched this issue in depth, and wrote a few improvements on the above.
I offer a solution in this post, which uses functions whose order can be expected. Also consider my talk last year.
Constructs such as CASE and functions such as COALESCE have known underlying behavior (at least until this is changed, right?).
For example, a CASE clause inspects the WHEN conditions one by one, by order of definition.
Consider a rewrite of the original query:
select
tick,
CASE
WHEN (@delta := count-@last) IS NULL THEN NULL
WHEN (@last:=count ) IS NULL THEN NULL
ELSE @delta
END AS delta
from
measurement,
(select @last := 0) s_init
;
The CASE clause has three WHEN conditions. It executes them by order until it meets the first that succeeds. I've written them such that the first two will always fail. It therefore executes the first, then turns to execute the second, then finally returns the third. Always.
I thus overcome the problem of expecting order of evaluation, which is a real and true problem, mostly evident when you start adding more complex clauses such as GROUP BY, DISTINCT, ORDER BY and such.
As a final note, my solution differs from yours in the first row of the result set -- with yours it returns NULL, with mine it returns the delta between 0 and count. Had I used NULL I would have needed to change the WHEN conditions in some other way -- making sure they would fail on NULL values.
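For readers on MySQL 8.0 or later, window functions are another option that is not discussed above: LAG() returns the previous row's value without any user variables. The sketch below runs the idea through Python's sqlite3 module as a stand-in for MySQL. It assumes SQLite 3.25 or newer (for window functions) and invents three sample rows. The table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL 8.0+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurement (tick TEXT PRIMARY KEY, count INTEGER);
    -- Sample rows invented for the illustration.
    INSERT INTO measurement VALUES
        ('2020-01-01 00:00:00', 10),
        ('2020-01-01 00:01:00', 15),
        ('2020-01-01 00:02:00', 12);
""")

# LAG(count) reads the previous row's count in tick order; the first row
# has no predecessor, so its delta is NULL (None in Python).
rows = conn.execute("""
    SELECT tick, count - LAG(count) OVER (ORDER BY tick) AS delta
    FROM measurement
    ORDER BY tick
""").fetchall()

deltas = [delta for _tick, delta in rows]
print(deltas)  # [None, 5, -3]
```

Here the row order is stated explicitly in the OVER clause, so the result does not depend on the evaluation order of user variable assignments. Whether it performs as well as the variable-based query on a given table would need to be measured.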