Wordnet MySQL statement doesn't complete - mysql

I'm using the Wordnet SQL database from here: http://wnsqlbuilder.sourceforge.net
It's all built fine and users with appropriate privileges have been set.
I'm trying to find synonyms of words and have tried to use the two example statements at the bottom of this page: http://wnsqlbuilder.sourceforge.net/sql-links.html
SELECT synsetid,dest.lemma,SUBSTRING(src.definition FROM 1 FOR 60) FROM wordsXsensesXsynsets AS src INNER JOIN wordsXsensesXsynsets AS dest USING(synsetid) WHERE src.lemma = 'option' AND dest.lemma <> 'option'
SELECT synsetid,lemma,SUBSTRING(definition FROM 1 FOR 60) FROM wordsXsensesXsynsets WHERE synsetid IN ( SELECT synsetid FROM wordsXsensesXsynsets WHERE lemma = 'option') AND lemma <> 'option' ORDER BY synsetid
However, they never complete. At least not in any reasonable amount of time and I have had to cancel all of the queries. All other queries seem to work find and when I break up the second SQL example, I can get the individual parts to work and complete in reasonable times (about 0.40 seconds)
When I try and run the full statement however, the MySQL command line client just hangs.
Is there a problem with this syntax? What is causing it to take so long?
EDIT:
Output of "EXPLAIN SELECT ..."
Output of "EXPLAIN EXTENDED ...; SHOW WARNINGS;"

I did more digging and looking into the various statements used and found the problem was in the IN command.
MySQL repeats the statement for every single row in the database. This is the cause of the hang, as it had to run through hundreds of thousands of records.
My remedy to this was to split the command into two separate database calls first getting the synsets, and then dynamically creating a bound SQL string to look for the words in the synsets.

Related

Common Table Expressions -- Using a Variable in the Predicate

I've written a common table expression to return hierarchical information and it seems to work without issue if I hard code a value into the WHERE statement. If I use a variable (even if the variable contains the same information as the hard coded value), I get the error The maximum recursion 100 has been exhausted before statement completion.
This is easier shown with a simple example (note, I haven't included the actual code for the CTE just to keep things clearer. If you think it's useful, I can certainly add it).
This Works
WITH Blder
AS
(-- CODE IS HERE )
SELECT
*
FROM Blder as b
WHERE b.PartNo = 'ABCDE';
This throws the Max Recursion Error
DECLARE #part CHAR(25);
SET #part = 'ABCDE'
WITH Blder
AS
(-- CODE IS HERE )
SELECT
*
FROM Blder as b
WHERE b.PartNo = #part;
Am I missing something silly? Or does the SQL engine handle hardcoded values and parameter values differently in this type of scenario?
Kindly put semicolon at the end of your variable assignment statement
SET #part ='ABCDE';
Your SELECT statement is written incorrectly: the SQL Server Query Optimizer is able to optimize away the potential cycle if fed the literal string, but not when it's fed a variable, which uses the plan that developed from the statistics.
SQL Server 2016 improved on the Query Optimizer, so if you could migrate your DB to SQL Server 2016 or newer, either with the DB compatibility level set to 130 or higher (for SQL Server 2016 and up), or have it kept at 100 (for SQL Server 2008) but with OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')) added to the bottom of your SELECT statement, you should get the desired result without the max recursion error.
If you are stuck on SQL Server 2008, you could also add OPTION (RECOMPILE) to the bottom of your SELECT statement to create an ad hoc query plan that would be similar to the one that worked correctly.

MySQL 5.7 RAND() and IF() without LIMIT leads to unexpected results

I have the following query
SELECT t.res, IF(t.res=0, "zero", "more than zero")
FROM (
SELECT table.*, IF (RAND()<=0.2,1, IF (RAND()<=0.4,2, IF (RAND()<=0.6,3,0))) AS res
FROM table LIMIT 20) t
which returns something like this:
That's exactly what you would expect. However, as soon as I remove the LIMIT 20 I receive highly unexpected results (there are more rows returned than 20, I cut it off to make it easier to read):
SELECT t.res, IF(t.res=0, "zero", "more than zero")
FROM (
SELECT table.*, IF (RAND()<=0.2,1, IF (RAND()<=0.4,2, IF (RAND()<=0.6,3,0))) AS res
FROM table) t
Side notes:
I'm using MySQL 5.7.18-15-log and this is a highly abstracted example (real query is much more difficult).
I'm trying to understand what is happening. I do not need answers that offer work arounds without any explanations why the original version is not working. Thank you.
Update:
Instead of using LIMIT, GROUP BY id also works in the first case.
Update 2:
As requested by zerkms, I added t.res = 0 and t.res + 1 to the second example
The problem is caused by a change introduced in MySQL 5.7 on how derived tables in (sub)queries are treated.
Basically, in order to optimize performance, some subqueries are executed at different times and / or multiple times leading to unexpected results when your subquery returns non-deterministic results (like in my case with RAND()).
There are two easy (and likewise ugly) workarounds to get MySQL to "materialize" (aka return deterministic results) these subqueries: Use LIMIT <high number> or GROUP BY id both of which force MySQL to materialize the subquery and return the expected results.
The last option is turn off derived_merge in the optimizer_switch variable: derived_merge=off (make sure to leave all the other parameters as they are).
Further readings:
https://mysqlserverteam.com/derived-tables-in-mysql-5-7/
Subquery's rand() column re-evaluated for every repeated selection in MySQL 5.7/8.0 vs MySQL 5.6

nested "select " query in mysql

hi i am executing nested "select" query in mysql .
the query is
SELECT `btitle` FROM `backlog` WHERE `bid` in (SELECT `abacklog_id` FROM `asprint` WHERE `aid`=184 )
I am not getting expected answer by the above query. If I execute:
SELECT abacklog_id FROM asprint WHERE aid=184
separately
I will get abacklog_id as 42,43,44,45;
So if again I execute:
SELECT `btitle` FROM `backlog` WHERE `bid` in(42,43,44,45)
I will get btitle as scrum1 scrum2 scrum3 msoffice
But if I combine those queries I will get only scrum1 remaining 3 atitle will not get.
You Can Try As Like Following...
SELECT `age_backlog`.`ab_title` FROM `age_backlog` LEFT JOIN `age_sprint` ON `age_backlog`.`ab_id` = `age_sprint`.`as_backlog_id` WHERE `age_sprint`.`as_id` = 184
By using this query you will get result with loop . You will be able to get all result with same by place with comma separated by using IMPLODE function ..
May it will be helpful for you... If you get any error , Please inform me...
What you did is to store comma separated values in age_sprint.as_backlog_id, right?
Your query actually becomes
SELECT `ab_title` FROM `age_backlog` WHERE `ab_id` IN ('42,43,44,45')
Note the ' in the IN() function. You don't get separate numbers, you get one string.
Now, when you do
SELECT CAST('42,43,44,45' AS SIGNED)
which basically is the implicit cast MySQL does, the result is 42. That's why you just get scrum1 as result.
You can search for dozens of answers to this problem here on SO.
You should never ever store comma separated values in a database. It violates the first normal form. In most cases databases are in third normal form or BCNF or even higher. Lower normal forms are just used in some special cases to get the most performance, usually for reporting issues. Not for actually working with data. You want 1 row for every as_backlog_id.
Again, your primary goal should be to get a better database design, not to write some crazy functions to get each comma separated number out of the field.

How could I know how much time it takes to query items in a table of MYSQL?

Our website has a problem: The visiting time of one page is too long. We have found out that it has a n*n matrix in that page; and for each item in the matrix, it queries three tables from MYSQL database. Every item in that matrix do the query quiet alike.
So I wonder maybe it is the large amount of MYSQL queries lead to the problem. And I want to try to fix it. Here is one of my confusions I list below:
1.
m = store.execute('SELECT X FROM TABLE1 WHERE I=1')
result = store.execute('SELECT Y FROM TABLE2 WHERE X in m')
2.
r = store.execute('SELECT X, Y FROM TABLE2');
result = []
for each in r:
i = store.execute('SELECT I FROM TABLE1 WHERE X=%s', each[0])
if i[0][0]=1:
result.append(each)
It got about 200 items in TABLE1 and more then 400 items in TABLE2. I don't know witch part takes the most time, so I can't make a better decision of how to write my sql statement.
How could I know how much time it takes to do some operation in MYSQL? Thank you!
Rather than installing a bunch of special tools, you could take a dead-simple approach like this (pardon my Ruby):
start = Time.new
# DB query here
puts "Query XYZ took #{Time.now - start} sec"
Hopefully you can translate that to Python. OR... pardon my Ruby again...
QUERY_TIMES = {}
def query(sql)
start = Time.new
connection.execute(sql)
elapsed = Time.new - start
QUERY_TIMES[sql] ||= []
QUERY_TIMES[sql] << elapsed
end
Then run all your queries through this custom method. After doing a test run, you can make it print out the number of times each query was run, and the average/total execution times.
For the future, plan to spend some time learning about "profilers" (if you haven't already). Get a good one for your chosen platform, and spend a little time learning how to use it well.
I use the MySQL Workbench for SQL development. It gives response times and can connect remotely to MySQL servers granted you have the permission (which in this case will give you a more accurate reading).
http://www.mysql.com/products/workbench/
Also, as you've realized it appears you have a SQL statement in a for loop. That could drastically effect performance. You'll want to take a different route with retrieving that data.

Why do our queries get stuck on the state "Writing to net" in MySql?

We have a lot of queries
select * from tbl_message
that get stuck on the state "Writing to net". The table has 98k rows.
The thing is... we aren't even executing any query like that from our application, so I guess the question is:
What might be generating the query?
...and why does it get stuck on the state "writing to net"
I feel stupid asking this question, but I'm 99,99% sure that our application is not executing a query like that to our database... we are however executing a couple of querys to that table using WHERE statement:
SELECT Count(*) as StrCount FROM tbl_message WHERE m_to=1960412 AND m_restid=948
SELECT Count(m_id) AS NrUnreadMail FROM tbl_message WHERE m_to=2019422 AND m_restid=440 AND m_read=1
SELECT * FROM tbl_message WHERE m_to=2036390 AND m_restid=994 ORDER BY m_id DESC
I have searched our application several times for select * from tbl_message but haven't found anything... But still our query-log on our mysql server is full of Select * from tbl_message queries
Since applications don't magically generate queries as they like, I think that it's rather likely that there's a misstake somewhere in your application that's causing this. Here's a few suggestions that you can use to track it down. I'm guessing that your using PHP, since your using MySQL, so I'll use that for my examples.
Try adding comments in front of all your queries in the application, like this:
$sqlSelect = "/* file.php, class::method() */";
$sqlSelect .= "SELECT * FROM foo ";
$sqlSelect .= "WHERE criteria";
The comment will show up in your query log. If you're using some kind database api wrapper, you could potentially add these messages automatically:
function query($sql)
{
$backtrace = debug_backtrace();
// The function that executed the query
$prev = $backtrace[1];
$newSql = sprintf("/* %s */ ", $prev["function"]);
$newSql .= $sql;
mysql_query($newSql) or handle_error();
}
In case you're not using a wrapper, but rather executing the queries directly, you could use the runkit extension and the function runkit_function_rename to rename mysql_query (or whatever you're using) and intercept the queries.
There are (at least) two data retrieval modes for mysql. With the c api you either call mysql_store_result() or mysql_use_result().
mysql_store_result() returns when all result data is transferred from the MySQL server to your process' memory, i.e. no data has to be transferred for further calls to mysql_fetch_row().
However, by using mysql_use_result() each record has to be fetched individually if and when mysql_fetch_row() is called. If your application does some computing that takes longer than the time period specified in net_write_timeout between two calls to mysql_fetch_row() the MySQL server considers your connection to be timed out.
Temporarily enable the query log by putting
log=
into your my.cnf file, restart mysql and watch the query log for those mystery queries (you don't have to give the log a name, it'll assume one from the host value).