MariaDB performance issue with "Where IN" clause - mysql

I have an issue with my SQL code. We developed an application that runs on MySQL, and it runs fine there. I decided to give MariaDB a try and installed it on a dev machine. On one query I have a performance issue I do not understand. The query is the following:
SELECT SAMPLES.*, UNIX_TIMESTAMP(SAMPLES.SAMPLE_DATE) as TIMESTAMP,RAWS.VALUE, DATAKEYS.RAW_ID, DATAKEYS.DATA_KEY_VALUE, DATAKEYS.DATA_KEY_ID, KEYDEF.KEY_NAME, KEYDEF.LDD_ID
FROM
PDS.TABLE_SAMPLES SAMPLES
RIGHT OUTER JOIN PDS.TABLE_RAW_VALUES RAWS ON SAMPLES.SAMPLE_ID = RAWS.SAMPLE_ID
RIGHT OUTER JOIN PDS.TABLE_SAMPLE_DATA_KEYS DATAKEYS ON(DATAKEYS.RAW_ID = RAWS.RAW_ID AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) OR
(DATAKEYS.RAW_ID = 0 AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID)
RIGHT OUTER JOIN PDS.TABLE_DATA_KEY_DEFINITION KEYDEF ON(DATAKEYS.DATA_KEY_ID = KEYDEF.DATA_KEY_ID)
WHERE
SAMPLES.SAMPLE_ID IN(1991331,1991637,1991941,2046105,2046411,2046717,2047023,2047635,2047941,2048247)
AND (SAMPLES.PARAMETER_ID = 9)
GROUP BY DATAKEYS.DATA_KEY_ID, RAWS.RAW_ID, DATAKEYS.DATA_KEY_ID
ORDER BY SAMPLES.SAMPLE_ID, DATAKEYS.RAW_ID;
As long as I have only ONE value in the "WHERE IN" condition, the query takes ~10ms to execute. That's about the same as MySQL 5.6 took.
As soon as I add another value, the query time rises to several minutes. In MySQL it rises very slowly: the query shown above takes ~150ms on MySQL and about 140 seconds on the new MariaDB installation using exactly the same datasets.
I'm no SQL expert, can you give me some clues how to optimize the query to run as expected?

The right outer joins are being converted to inner joins by the where clause. So, just use the proper join type (I'm not sure if this affects the optimization of the query, but it could):
SELECT SAMPLES.*, UNIX_TIMESTAMP(SAMPLES.SAMPLE_DATE) as TIMESTAMP,RAWS.VALUE, DATAKEYS.RAW_ID, DATAKEYS.DATA_KEY_VALUE, DATAKEYS.DATA_KEY_ID, KEYDEF.KEY_NAME, KEYDEF.LDD_ID
FROM PDS.TABLE_SAMPLES SAMPLES JOIN
PDS.TABLE_RAW_VALUES RAWS
ON SAMPLES.SAMPLE_ID = RAWS.SAMPLE_ID JOIN
PDS.TABLE_SAMPLE_DATA_KEYS DATAKEYS
ON (DATAKEYS.RAW_ID = RAWS.RAW_ID AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) OR
(DATAKEYS.RAW_ID = 0 AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) JOIN
PDS.TABLE_DATA_KEY_DEFINITION KEYDEF
ON DATAKEYS.DATA_KEY_ID = KEYDEF.DATA_KEY_ID
WHERE SAMPLES.SAMPLE_ID IN (1991331, 1991637, 1991941, 2046105, 2046411, 2046717, 2047023, 2047635, 2047941, 2048247) AND
(SAMPLES.PARAMETER_ID = 9)
GROUP BY DATAKEYS.DATA_KEY_ID, RAWS.RAW_ID, DATAKEYS.DATA_KEY_ID
ORDER BY SAMPLES.SAMPLE_ID, DATAKEYS.RAW_ID;
Next, the best index for this query, regardless of the number of values in the IN list, is the composite index PDS.TABLE_SAMPLES(PARAMETER_ID, SAMPLE_ID). This handles the WHERE clause.
Because your query runs quickly under some circumstances, I assume the other tables have the appropriate indexes for the joins.
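To illustrate the composite-index suggestion, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB. The table is a simplified, hypothetical version of TABLE_SAMPLES, the index name idx_param_sample is made up, and SQLite's planner is not MariaDB's, so this only shows the idea: the equality column comes first in the index and the IN column second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical stand-in for PDS.TABLE_SAMPLES.
conn.execute(
    "CREATE TABLE table_samples ("
    "sample_id INTEGER, parameter_id INTEGER, sample_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_samples VALUES (?, ?, ?)",
    [(i, i % 20, "2015-01-01") for i in range(1, 5001)],
)

# Composite index: equality column first, IN-list column second.
conn.execute(
    "CREATE INDEX idx_param_sample ON table_samples (parameter_id, sample_id)"
)

query = (
    "SELECT sample_id FROM table_samples "
    "WHERE sample_id IN (9, 29, 49) AND parameter_id = 9 "
    "ORDER BY sample_id"
)

# Ask SQLite how it would run the query; the last column is the plan text.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[-1] for row in plan_rows)
print(plan_text)

matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

On MySQL or MariaDB the equivalent check would be to create the index and run EXPLAIN on the real query, then compare the plans with one and with several values in the IN list.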

Instead of the IN operator, try using EXISTS with a subquery
rather than a literal list of sample IDs.

Related

Mysql - Add if condition in the Mysql Query or a where clause

I have written a join query in MySQL which works well and shows the result.
I am trying to write a MySQL query that shows 2 additional columns with some calculations
If isPercent=1 then
New Column1=price*currentPercent/100
New Column2=LineItemQuantity*price
I tried to do this calculation in PHP, but since there are hundreds of thousands of records it times out.
Here is the MySQL query:
Select
wl.LineItems_LineItemID,
wl.LineItemQuantity,
pj.IsPercent,
pj.CurrentPercent,
pj.CurrentRate,
cb.Price
from
WorkOrderLineItems wl,
PayScaleLoaclJObCodes pj,
ClientBillingRates cb
where
wl.LineItems_LineItemID=pj.JobCodeID
AND wl.LineItems_LineItemID=cb.ClientBillingRates_ID
AND pj.PayScalesLocal_ID='33'
I would write the query this way:
SELECT
wl.LineItems_LineItemID,
wl.LineItemQuantity,
pj.IsPercent,
pj.CurrentPercent,
pj.CurrentRate,
cb.Price,
IF(pj.IsPercent=1, cb.Price*pj.CurrentPercent/100, NULL) AS `New Column 1`,
IF(pj.IsPercent=1, wl.LineItemQuantity*cb.Price, NULL) AS `New Column 2`
FROM
WorkOrderLineItems wl
JOIN PayScaleLoaclJObCodes pj ON wl.LineItems_LineItemID = pj.JobCodeID
JOIN ClientBillingRates cb ON wl.LineItems_LineItemID = cb.ClientBillingRates_ID
WHERE pj.PayScalesLocal_ID = '33'
As in the comments above, I encourage you to use JOIN syntax instead of relying on old-fashioned comma-style joins.
As for the query timing out, I would guess that you don't have the right indexes to support this query. If you want help with query optimization, you should run SHOW CREATE TABLE <tablename> for each table in your query, and post the output in your question.

sphinx search field weights

I am getting a syntax error on my query in my .conf file.
Everything worked great until I added OPTION field_weights. What am I doing wrong in defining my field weights?
Here is the query for my Sphinx index:
source tx3nh_users : src {
sql_query_range = SELECT MIN(id), MAX(id) FROM tx3nh_users
sql_query = SELECT u.id, p.fullname, p.email, s.staff_title, s.bio FROM tx3nh_users AS u LEFT JOIN tx3nh_user_attributes AS p ON u.id=p.internalKey LEFT JOIN oxv5v_su_staff AS s ON u.id=s.user_id WHERE u.id>=$start AND u.id<=$end OPTION field_weights=(p.fullname=3, s.staff_title=2, s.bio=1)
}
sql_query is a SQL query that indexer runs against your actual database, so it needs to be a valid MySQL query. It is interpreted and executed by MySQL to return your actual data, which indexer then turns into a Sphinx index.
OPTION field_weights, on the other hand, is SphinxQL. You add it to the SphinxQL query when you make an actual query against the index.
sphinxQL> SELECT id FROM tx3nh_users WHERE MATCH('keyword1')
OPTION field_weights=(fullname=3, staff_title=2, bio=1)
Because it's a query-time parameter, the weights aren't written to the index, so you can choose the weights on a per-query basis rather than using the same weights for all queries.

Optimizing MySQL query with nested select statements?

I've got read-only access to a MySQL database, and I need to loop through the following query about 9000 times, each time with a different $content_path_id. I'm calling this from within a Perl script that's pulling the $content_path_id values from a file.
SELECT an.uuid FROM alf_node an WHERE an.id IN
(SELECT anp.node_id FROM alf_node_properties anp WHERE anp.long_value IN
(SELECT acd.id FROM alf_content_data acd WHERE acd.content_url_id = $content_path_id));
Written this way, it's taking forever to do each query (approximately 1 minute each). I'd really rather not wait 9000+ minutes for this to complete if I don't have to. Is there some way to speed up this query? Maybe via a join? My current SQL skills are embarrassingly rusty...
The query below is an equivalent using joins. How it performs depends on which indexes are defined on the tables. Unlike the IN version, a join can return the same uuid more than once if several property rows match; add DISTINCT if that matters.
If your Perl interface has the notion of prepared statements, you may be able to save some time by preparing once and executing with 9000 different binds.
You could also possibly save time by building one query with a big acd.content_url_id IN ($content_path_id1, $content_path_id2, ...) clause.
Select
an.uuid
From
alf_node an
Inner Join
alf_node_properties anp
On an.id = anp.node_id
Inner Join
alf_content_data acd
On anp.long_value = acd.id
Where
acd.content_url_id = $content_path_id
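The two time-saving ideas above (reusing one parameterized statement, and sending many ids in a single IN list) can be sketched as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module rather than Perl DBI against MySQL, and the three tiny tables and their rows are invented stand-ins for the real Alfresco schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented miniature versions of the three tables in the question.
conn.executescript("""
CREATE TABLE alf_node (id INTEGER PRIMARY KEY, uuid TEXT);
CREATE TABLE alf_node_properties (node_id INTEGER, long_value INTEGER);
CREATE TABLE alf_content_data (id INTEGER PRIMARY KEY, content_url_id INTEGER);
INSERT INTO alf_node VALUES (1, 'uuid-a'), (2, 'uuid-b'), (3, 'uuid-c');
INSERT INTO alf_node_properties VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO alf_content_data VALUES (10, 100), (20, 200), (30, 300);
""")

# The join from the answer, with the predicate left open.
JOIN_SQL = """
SELECT acd.content_url_id, an.uuid
FROM alf_node an
JOIN alf_node_properties anp ON an.id = anp.node_id
JOIN alf_content_data acd ON anp.long_value = acd.id
WHERE acd.content_url_id {predicate}
"""

content_path_ids = [100, 300]

# Option 1: identical statement text each time, only the bound value changes.
single_sql = JOIN_SQL.format(predicate="= ?")
one_by_one = []
for content_path_id in content_path_ids:
    one_by_one.extend(conn.execute(single_sql, (content_path_id,)).fetchall())

# Option 2: one execution with an IN list of placeholders.
placeholders = ", ".join("?" for _ in content_path_ids)
batched_sql = JOIN_SQL.format(predicate=f"IN ({placeholders})")
batched = conn.execute(batched_sql, content_path_ids).fetchall()

print(sorted(one_by_one))
print(sorted(batched))
```

Selecting content_url_id alongside uuid lets the batched result be mapped back to each input id. With 9000 ids it may be necessary to split the list into chunks, since drivers and servers limit statement size or placeholder count.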
Try this extension to Laurence's solution, which replaces the long IN list (effectively a long list of ORs) with an additional JOIN:
Select
an.uuid
From alf_node an
Join alf_node_properties anp
On an.id = anp.node_id
Join alf_content_data acd
On anp.long_value = acd.id
Join (
select 'id1' as content_path_id union all
select 'id2' as content_path_id union all
/* you get the idea */
select 'idN' as content_path_id
) criteria
On acd.content_url_id = criteria.content_path_id
I have used SQL Server syntax above but you should be able to translate it readily.

MySQL QUERY in preparing for too long

The following SQL spends 30+ seconds in the "preparing" state. Is the SQL wrong, or is it because I have close to one million rows in the database? Can this SQL be optimized so that it does not stay in "preparing" for that long?
UPDATE url_source_wp SET hash="ASDF2"
WHERE (url_source_wp.id NOT IN (
SELECT url_done_wp.url_source_wp FROM url_done_wp WHERE url_done_wp.url_group = 4)
)
AND (hash IS NULL) LIMIT 50
If preparation is your issue, you can precompile it as a stored procedure.
See http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html
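It seems like you could do this update more efficiently across a JOIN, avoiding the sub-select. Because the original condition is NOT IN, the join has to be an anti-join: a LEFT JOIN that keeps only the rows with no match.
UPDATE
url_source_wp AS s
LEFT JOIN url_done_wp AS d
ON s.id = d.url_source_wp
AND d.url_group = 4
SET
s.hash = 'ASDF2'
WHERE
s.hash IS NULL
AND d.url_source_wp IS NULL -- no matching done row, mirroring NOT IN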
You need to make sure you have indexes on s.id, d.url_source_wp, s.hash, and d.url_group. Also, note that you can't use LIMIT with multi-table syntax, so if this is important this suggestion will likely not work for you.
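Since the original UPDATE targets ids that are NOT IN the subquery, a join rewrite has to select the same rows as an anti-join (LEFT JOIN plus an IS NULL filter), whereas a plain INNER JOIN selects the opposite set. The sketch below checks this on a handful of invented rows, using Python's sqlite3 module as a stand-in for MySQL; it compares SELECTs rather than running the UPDATE, and it assumes url_done_wp.url_source_wp contains no NULLs (NOT IN behaves differently if it does).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented sample rows; table and column names follow the question.
conn.executescript("""
CREATE TABLE url_source_wp (id INTEGER PRIMARY KEY, hash TEXT);
CREATE TABLE url_done_wp (url_source_wp INTEGER, url_group INTEGER);
INSERT INTO url_source_wp VALUES (1, NULL), (2, NULL), (3, 'OLD'), (4, NULL);
INSERT INTO url_done_wp VALUES (1, 4), (2, 7), (3, 4);
""")


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Rows the original NOT IN condition would update.
not_in_ids = ids("""
    SELECT s.id FROM url_source_wp s
    WHERE s.id NOT IN (
        SELECT d.url_source_wp FROM url_done_wp d WHERE d.url_group = 4
    )
    AND s.hash IS NULL
    ORDER BY s.id
""")

# Anti-join: the group filter lives in ON, unmatched rows are kept.
anti_join_ids = ids("""
    SELECT s.id FROM url_source_wp s
    LEFT JOIN url_done_wp d
        ON s.id = d.url_source_wp AND d.url_group = 4
    WHERE s.hash IS NULL AND d.url_source_wp IS NULL
    ORDER BY s.id
""")

# Inner join: selects rows that ARE done in group 4, the opposite set.
inner_join_ids = ids("""
    SELECT s.id FROM url_source_wp s
    JOIN url_done_wp d ON s.id = d.url_source_wp
    WHERE s.hash IS NULL AND d.url_group = 4
    ORDER BY s.id
""")

print(not_in_ids, anti_join_ids, inner_join_ids)
```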

which of these mysql queries is more efficient, using left join or not

I have the following SQL query:
$select_query_1 = mysql_query("SELECT * FROM user_module_comments WHERE useid = '$hash' ORDER BY id DESC LIMIT 0, 25");
while($table = mysql_fetch_array($select_query_1)){
$user_moid = $table['canvas'];
$user_xtract_canvas = mysql_query("SELECT mcanvas FROM user_module WHERE uid = '$user_moid' LIMIT 1");
$selected = mysql_fetch_array($user_xtract_canvas);
$user_canvas_extract = $selected['mcanvas']; // this is what i need
}
Or this SQL query:
$select_query = mysql_query("SELECT user_module_comments.useid, user_module.mcanvas FROM user_module_comments LEFT JOIN user_module ON user_module.uid = user_module_comments.useid WHERE useid = '$hash' ORDER BY user_module_comments.id DESC LIMIT 0, 25");
Which of these queries is more efficient?
Thanks.
The JOIN is likely to be far, far faster than doing related queries in a loop. In general it is almost always faster to do one query than to do n queries. I only say "almost always" because I'm sure someone can come up with a use case where the opposite may be true.
There is a lot of overhead involved with MySQL compiling the SQL statement over and over in the loop, executing it, and fetching a rowset. Using the single statement eliminates all of that overhead.
You should install Xdebug and actually profile these statements in PHP to find out how long they take to execute.
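The difference in the number of statements can be made concrete with a small sketch. It uses Python's sqlite3 module instead of PHP and MySQL, the rows are invented, and the helper names run, canvases_with_loop and canvases_with_join are hypothetical. It also assumes the join should match user_module.uid against the comment's canvas column, as the loop version does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented rows; table and column names follow the question.
conn.executescript("""
CREATE TABLE user_module (uid INTEGER PRIMARY KEY, mcanvas TEXT);
CREATE TABLE user_module_comments (
    id INTEGER PRIMARY KEY, useid TEXT, canvas INTEGER
);
INSERT INTO user_module VALUES (1, 'canvas-1'), (2, 'canvas-2');
INSERT INTO user_module_comments VALUES
    (1, 'abc', 1), (2, 'abc', 2), (3, 'xyz', 1), (4, 'abc', 99);
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement and count it as one round trip."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def canvases_with_loop(user_hash):
    """One query for the comments, then one more query per comment."""
    rows = run(
        "SELECT canvas FROM user_module_comments "
        "WHERE useid = ? ORDER BY id DESC LIMIT 25",
        (user_hash,),
    )
    result = []
    for (canvas_id,) in rows:
        match = run(
            "SELECT mcanvas FROM user_module WHERE uid = ? LIMIT 1",
            (canvas_id,),
        )
        result.append(match[0][0] if match else None)
    return result


def canvases_with_join(user_hash):
    """The same data fetched with a single LEFT JOIN."""
    rows = run(
        "SELECT m.mcanvas FROM user_module_comments c "
        "LEFT JOIN user_module m ON m.uid = c.canvas "
        "WHERE c.useid = ? ORDER BY c.id DESC LIMIT 25",
        (user_hash,),
    )
    return [row[0] for row in rows]


loop_result = canvases_with_loop("abc")
loop_queries = stats["queries"]

stats["queries"] = 0
join_result = canvases_with_join("abc")
join_queries = stats["queries"]

print(loop_result, loop_queries)
print(join_result, join_queries)
```

With 25 comments per page the loop issues 26 statements where the join issues one, which is where the per-statement overhead described above adds up.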