Optimizing MySQL query with nested select statements?

I've got read-only access to a MySQL database, and I need to run the following query about 9000 times, each time with a different $content_path_id. I'm calling it from a Perl script that reads the $content_path_id values from a file.
SELECT an.uuid FROM alf_node an WHERE an.id IN
(SELECT anp.node_id FROM alf_node_properties anp WHERE anp.long_value IN
(SELECT acd.id FROM alf_content_data acd WHERE acd.content_url_id = $content_path_id));
Written this way, it's taking forever to do each query (approximately 1 minute each). I'd really rather not wait 9000+ minutes for this to complete if I don't have to. Is there some way to speed up this query? Maybe via a join? My current SQL skills are embarrassingly rusty...

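This is an equivalent query using joins. Add DISTINCT if a node can match through more than one property row, because the joins would return that node more than once, whereas IN cannot. How well it performs depends on which indexes are defined on the tables.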
If your Perl interface supports prepared statements, you may be able to save some time by preparing the statement once and executing it 9000 times with different bind values.
You could also save time by building one query with a large acd.content_url_id IN ($content_path_id1, $content_path_id2, ...) clause.
Select
an.uuid
From
alf_node an
Inner Join
alf_node_properties anp
On an.id = anp.node_id
Inner Join
alf_content_data acd
On anp.long_value = acd.id
Where
acd.content_url_id = $content_path_id
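A minimal sketch of the ideas in this answer follows. It assumes Python's built-in sqlite3 module as a stand-in for MySQL accessed through Perl DBI. It first checks that the nested-IN query and the JOIN return the same uuids on made-up data. It then runs the JOIN in two ways: once per id with a bound parameter, and once per chunk of ids with an IN list of placeholders. The table contents, index names, and the helper names uuids_one_at_a_time and uuids_batched are all hypothetical, and SQLite timings say nothing about the real MySQL tables.

```python
import sqlite3

# Made-up stand-ins for the three tables, trimmed to the columns the query uses.
SCHEMA = """
CREATE TABLE alf_node (id INTEGER PRIMARY KEY, uuid TEXT);
CREATE TABLE alf_node_properties (node_id INTEGER, long_value INTEGER);
CREATE TABLE alf_content_data (id INTEGER PRIMARY KEY, content_url_id INTEGER);
CREATE INDEX anp_long_value ON alf_node_properties (long_value);
CREATE INDEX acd_url ON alf_content_data (content_url_id);
INSERT INTO alf_node VALUES (1, 'u1'), (2, 'u2'), (3, 'u3');
INSERT INTO alf_content_data VALUES (10, 100), (11, 101), (12, 100);
INSERT INTO alf_node_properties VALUES (1, 10), (2, 11), (3, 12), (3, 10);
"""

NESTED_SQL = """
SELECT an.uuid FROM alf_node an WHERE an.id IN
  (SELECT anp.node_id FROM alf_node_properties anp WHERE anp.long_value IN
    (SELECT acd.id FROM alf_content_data acd WHERE acd.content_url_id = ?))
"""

# Node 3 matches through two property rows, so without DISTINCT the JOIN
# would return 'u3' twice for id 100; the IN version never does that.
JOIN_SQL = """
SELECT DISTINCT an.uuid
FROM alf_node an
JOIN alf_node_properties anp ON an.id = anp.node_id
JOIN alf_content_data acd ON anp.long_value = acd.id
WHERE acd.content_url_id = ?
"""


def uuids_one_at_a_time(conn, ids):
    """Execute the same parameterized statement once per id (bind, don't splice)."""
    return {i: sorted(r[0] for r in conn.execute(JOIN_SQL, (i,))) for i in ids}


def uuids_batched(conn, ids, chunk=500):
    """One query per chunk of ids, using an IN list of placeholders."""
    ids = list(ids)
    result = {i: [] for i in ids}
    for start in range(0, len(ids), chunk):
        part = ids[start:start + chunk]
        marks = ",".join("?" * len(part))  # only "?" marks go into the SQL text
        sql = f"""
        SELECT DISTINCT acd.content_url_id, an.uuid
        FROM alf_node an
        JOIN alf_node_properties anp ON an.id = anp.node_id
        JOIN alf_content_data acd ON anp.long_value = acd.id
        WHERE acd.content_url_id IN ({marks})
        """
        for url_id, uuid in conn.execute(sql, part):
            result[url_id].append(uuid)
    return {i: sorted(u) for i, u in result.items()}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ids = [100, 101, 102]
print(uuids_one_at_a_time(conn, ids))
print(uuids_batched(conn, ids))
```

Binding with ? rather than pasting the id into the string keeps the statement text identical on every call, which is what allows a driver to reuse a prepared statement. The chunk size of 500 is an arbitrary assumption; choose one that keeps each statement within your server's limits.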

Try this extension to Laurence's solution, which replaces the long IN list (effectively a long chain of ORs) with an additional JOIN:
Select
an.uuid
From alf_node an
Join alf_node_properties anp
On an.id = anp.node_id
Join alf_content_data acd
On anp.long_value = acd.id
Join (
select 'id1' as content_path_id union all
select 'id2' as content_path_id union all
/* you get the idea */
select 'idN' as content_path_id
) criteria
On acd.content_url_id = criteria.content_path_id
I have used SQL Server syntax above, but you should be able to translate it readily.
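The same derived-table idea can also be built with bound parameters instead of hand-written literals. This sketch again uses Python's sqlite3 as a stand-in for MySQL, with made-up rows. criteria_join_sql is a hypothetical helper that emits one SELECT ? AS content_path_id per id. SQLite caps a compound SELECT at 500 terms by default, and MySQL limits statement size through max_allowed_packet, so a list of 9000 ids would likely need to be split into chunks.

```python
import sqlite3


def criteria_join_sql(n):
    """Build the JOIN-against-a-derived-table query for n bound ids."""
    # One "SELECT ? AS content_path_id" per id, glued together with UNION ALL.
    criteria = " UNION ALL ".join(["SELECT ? AS content_path_id"] * n)
    return f"""
    SELECT criteria.content_path_id, an.uuid
    FROM alf_node an
    JOIN alf_node_properties anp ON an.id = anp.node_id
    JOIN alf_content_data acd ON anp.long_value = acd.id
    JOIN ({criteria}) criteria ON acd.content_url_id = criteria.content_path_id
    ORDER BY criteria.content_path_id, an.uuid
    """


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alf_node (id INTEGER PRIMARY KEY, uuid TEXT);
CREATE TABLE alf_node_properties (node_id INTEGER, long_value INTEGER);
CREATE TABLE alf_content_data (id INTEGER PRIMARY KEY, content_url_id INTEGER);
INSERT INTO alf_node VALUES (1, 'u1'), (2, 'u2'), (3, 'u3');
INSERT INTO alf_content_data VALUES (10, 100), (11, 101), (12, 100);
INSERT INTO alf_node_properties VALUES (1, 10), (2, 11), (3, 12);
""")

ids = [100, 101, 102]
# Each row pairs the requested id with a matching uuid; id 102 has no match.
rows = conn.execute(criteria_join_sql(len(ids)), ids).fetchall()
print(rows)
```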

Related

Perform a SQL Query

I have this query in PHP MySQL PDO:
SELECT p.las_plano_id, p.mensalidade_diferenciada, v.las_tipos_planos_id, t.valor_mensalidade
FROM isw_planos AS p
INNER JOIN isw_planos_vinculos AS v
ON p.las_plano_id =
(SELECT v.las_plano_id
FROM isw_planos_vinculos
WHERE v.data_encerramento IS NULL
ORDER BY v.data_adesao
DESC LIMIT 1)
INNER JOIN isw_planos_tipos AS t
ON v.las_tipos_planos_id = t.id
WHERE p.ativo = 1
But the query takes a long time to run. Is it possible to make it execute faster?
Thanks.
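I suspect the problem is the v. prefix:
This looks wrong: SELECT v.las_plano_id ... since v is the alias from the outer query, not from the subquery. Because the subquery's FROM isw_planos_vinculos has no alias v, every v. inside it refers to the outer row. That makes the subquery correlated, so it has to be evaluated again for each row combination. Please check the aliases used.
If removing v. does not help, please post the SHOW CREATE TABLE output for each table so we can see the indexes, etc.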

Rewrite MySQL query to MongoDB

I really need a little bit of help. I am new to MongoDB: is it possible to join collections and rewrite the following query for MongoDB? If it is not possible, how can I implement this query? Thanks.
SELECT SQL_NO_CACHE
b.id,
b.id_pol AS vrsta_pola,
b.id_drzavljanstvo AS drzavljanstvo,
o.naziv AS opstina,
bol.naziv AS bolnica,
b.id_odeljenje AS odeljenje_prijema,
b.datum_prijema,
b.id_uputna_dijagnoza AS uputna_dijagnoza,
b.id_osnovni_uzrok_hospitalizacije AS osnovni_uzrok_hospitalizacije,
b.datum_ispisa,
b.br_dana_lezanja,
b.id_odeljenje_otpust AS odeljenje_otpust,
b.id_vrsta_otpusta,
b.id_obdukovan,
b.id_spoljni_uzrok_povrede AS spoljni_uzrok_povrede
FROM bol_rac_fl b
LEFT JOIN bolnica bol ON b.idbolnica = bol.id_bolnica
LEFT JOIN opstina o ON b.id_opstina = o.id_opstina
WHERE b.id_opstina = 70220
I read that MongoDB cannot join collections and that $lookup is used instead, but I don't understand how to get a result similar to what MySQL returns for this query.
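Here is a sketch of how $lookup can stand in for the two LEFT JOINs. It is written as a Python function that only builds the aggregation pipeline, so no database connection is needed. It assumes the three tables were imported as collections named bol_rac_fl, bolnica, and opstina, with the same field names as the SQL columns. Only some of the SELECT columns are projected; the rest follow the same pattern.

```python
def bol_rac_fl_pipeline(id_opstina):
    """Build an aggregation pipeline shaped like the SQL query (a sketch)."""

    def left_join(from_coll, local_field, foreign_field, as_field):
        # $lookup puts the matching documents in an array under as_field;
        # $unwind with preserveNullAndEmptyArrays keeps documents that have
        # no match, which is what makes this behave like a LEFT JOIN.
        return [
            {"$lookup": {"from": from_coll, "localField": local_field,
                         "foreignField": foreign_field, "as": as_field}},
            {"$unwind": {"path": "$" + as_field,
                         "preserveNullAndEmptyArrays": True}},
        ]

    return (
        # WHERE b.id_opstina = ... goes first, before any joining
        [{"$match": {"id_opstina": id_opstina}}]
        # LEFT JOIN bolnica bol ON b.idbolnica = bol.id_bolnica
        + left_join("bolnica", "idbolnica", "id_bolnica", "bol")
        # LEFT JOIN opstina o ON b.id_opstina = o.id_opstina
        + left_join("opstina", "id_opstina", "id_opstina", "o")
        # SELECT list: "alias": "$source_field" (other columns: same pattern)
        + [{"$project": {
            "_id": 0,
            "id": 1,
            "vrsta_pola": "$id_pol",
            "opstina": "$o.naziv",
            "bolnica": "$bol.naziv",
            "odeljenje_prijema": "$id_odeljenje",
            "datum_prijema": 1,
            "datum_ispisa": 1,
        }}]
    )
```

With pymongo, this would be run as something like list(db.bol_rac_fl.aggregate(bol_rac_fl_pipeline(70220))), where db is an assumed Database handle. As with a SQL LEFT JOIN, a document that matches several joined documents comes out once per match.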

MariaDB performance issue with "Where IN" clause

I have an issue with my SQL code. We developed an application that runs on MySQL, where it performs fine. So I decided to give MariaDB a try and installed it on a development machine. On one particular query, I have a performance issue I do not understand. The query is the following:
SELECT SAMPLES.*, UNIX_TIMESTAMP(SAMPLES.SAMPLE_DATE) as TIMESTAMP,RAWS.VALUE, DATAKEYS.RAW_ID, DATAKEYS.DATA_KEY_VALUE, DATAKEYS.DATA_KEY_ID, KEYDEF.KEY_NAME, KEYDEF.LDD_ID
FROM
PDS.TABLE_SAMPLES SAMPLES
RIGHT OUTER JOIN PDS.TABLE_RAW_VALUES RAWS ON SAMPLES.SAMPLE_ID = RAWS.SAMPLE_ID
RIGHT OUTER JOIN PDS.TABLE_SAMPLE_DATA_KEYS DATAKEYS ON(DATAKEYS.RAW_ID = RAWS.RAW_ID AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) OR
(DATAKEYS.RAW_ID = 0 AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID)
RIGHT OUTER JOIN PDS.TABLE_DATA_KEY_DEFINITION KEYDEF ON(DATAKEYS.DATA_KEY_ID = KEYDEF.DATA_KEY_ID)
WHERE
SAMPLES.SAMPLE_ID IN(1991331,1991637,1991941,2046105,2046411,2046717,2047023,2047635,2047941,2048247)
AND (SAMPLES.PARAMETER_ID = 9)
GROUP BY DATAKEYS.DATA_KEY_ID, RAWS.RAW_ID, DATAKEYS.DATA_KEY_ID
ORDER BY SAMPLES.SAMPLE_ID, DATAKEYS.RAW_ID;
As long as I have only ONE value in the WHERE ... IN condition, the query takes ~10 ms to execute, about the same as on MySQL 5.6.
As soon as I add another value, the query time rises to several minutes. In MySQL it rises only slightly: the query shown above takes ~150 ms on MySQL and about 140 seconds on the new MariaDB installation, using exactly the same data set.
I'm no SQL expert, can you give me some clues how to optimize the query to run as expected?
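The right outer joins are being converted to inner joins by the WHERE clause. It filters on SAMPLES columns, and those are NULL in any row the outer joins preserve without a matching sample, so those rows are discarded anyway. So just use the proper join type (I'm not sure whether this affects the optimization of the query, but it could):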
SELECT SAMPLES.*, UNIX_TIMESTAMP(SAMPLES.SAMPLE_DATE) as TIMESTAMP,RAWS.VALUE, DATAKEYS.RAW_ID, DATAKEYS.DATA_KEY_VALUE, DATAKEYS.DATA_KEY_ID, KEYDEF.KEY_NAME, KEYDEF.LDD_ID
FROM PDS.TABLE_SAMPLES SAMPLES JOIN
PDS.TABLE_RAW_VALUES RAWS
ON SAMPLES.SAMPLE_ID = RAWS.SAMPLE_ID JOIN
PDS.TABLE_SAMPLE_DATA_KEYS DATAKEYS
ON (DATAKEYS.RAW_ID = RAWS.RAW_ID AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) OR
(DATAKEYS.RAW_ID = 0 AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) JOIN
PDS.TABLE_DATA_KEY_DEFINITION KEYDEF
ON DATAKEYS.DATA_KEY_ID = KEYDEF.DATA_KEY_ID
WHERE SAMPLES.SAMPLE_ID IN (1991331, 1991637, 1991941, 2046105, 2046411, 2046717, 2047023, 2047635, 2047941, 2048247) AND
(SAMPLES.PARAMETER_ID = 9)
GROUP BY DATAKEYS.DATA_KEY_ID, RAWS.RAW_ID, DATAKEYS.DATA_KEY_ID
ORDER BY SAMPLES.SAMPLE_ID, DATAKEYS.RAW_ID;
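Here is a small made-up example of why the WHERE clause turns the outer joins into inner joins. It uses Python's sqlite3 with lowercase stand-in tables named samples and raw_values. It uses the mirrored LEFT JOIN form, because SQLite only supports RIGHT JOIN from version 3.39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (sample_id INTEGER, parameter_id INTEGER);
CREATE TABLE raw_values (raw_id INTEGER, sample_id INTEGER);
INSERT INTO samples VALUES (1, 9), (2, 9), (3, 5);
-- raw 30 points at a sample that doesn't exist, so only an outer join keeps it
INSERT INTO raw_values VALUES (10, 1), (20, 2), (30, 99);
""")

# "samples RIGHT JOIN raw_values" is the same as "raw_values LEFT JOIN samples".
outer = """
SELECT r.raw_id, s.sample_id
FROM raw_values r LEFT JOIN samples s ON s.sample_id = r.sample_id
"""
inner = """
SELECT r.raw_id, s.sample_id
FROM raw_values r JOIN samples s ON s.sample_id = r.sample_id
"""
where = " WHERE s.sample_id IN (1, 2) AND s.parameter_id = 9 ORDER BY r.raw_id"

# Without a WHERE, the outer join keeps raw 30 paired with a NULL sample...
unfiltered = conn.execute(outer + " ORDER BY r.raw_id").fetchall()
# ...but the WHERE tests samples columns, and NULL IN (1, 2) is not true, so
# that row is dropped and the outer join returns exactly what the inner one does.
outer_filtered = conn.execute(outer + where).fetchall()
inner_filtered = conn.execute(inner + where).fetchall()
print(unfiltered, outer_filtered, inner_filtered)
```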
Next, the best index for this query, regardless of the number of values in the IN list, is the composite index PDS.TABLE_SAMPLES(PARAMETER_ID, SAMPLE_ID). This handles the WHERE clause.
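One way to confirm that the planner actually uses the composite index is to look at the query plan. This sketch runs SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 on an empty stand-in table. The index name idx_param_sample is hypothetical, and SQLite's planner is not MariaDB's. In MariaDB, the equivalent check is to run EXPLAIN on the real query and read the key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_samples (
    sample_id INTEGER NOT NULL,
    parameter_id INTEGER NOT NULL,
    sample_date TEXT
);
-- hypothetical name for the composite index suggested above
CREATE INDEX idx_param_sample ON table_samples (parameter_id, sample_id);
""")

QUERY = """
SELECT * FROM table_samples
WHERE sample_id IN (1991331, 1991637, 1991941, 2046105) AND parameter_id = 9
"""
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in plan:
    print(line)
```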
Because your query runs quickly under some circumstances, I assume the other tables have the appropriate indexes for the joins.
Instead of the IN operator, try using EXISTS with a subquery rather than a literal list of sample_ids.

MySQL query stuck in "preparing" for too long

The following SQL has a preparing time of 30+ seconds. Is the SQL wrong, or is it because I have close to one million rows in the database? Can this SQL be optimized so that it doesn't spend so long preparing?
UPDATE url_source_wp SET hash="ASDF2"
WHERE (url_source_wp.id NOT IN (
SELECT url_done_wp.url_source_wp FROM url_done_wp WHERE url_done_wp.url_group = 4)
)
AND (hash IS NULL) LIMIT 50
If preparation is your issue, you can precompile it into a stored procedure.
See: http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html
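It seems you could do this update more efficiently across a JOIN, avoiding the sub-select. Because the original selects rows that are NOT in url_done_wp, the join has to be a LEFT JOIN that keeps only the rows without a match (an INNER JOIN would update exactly the opposite rows):
UPDATE
url_source_wp AS s
LEFT JOIN url_done_wp AS d
ON s.id = d.url_source_wp
AND d.url_group = 4
SET
s.hash = 'ASDF2'
WHERE
s.hash IS NULL
AND d.url_source_wp IS NULL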
You need to make sure you have indexes on s.id, d.url_source_wp, s.hash, and d.url_group. Also note that MySQL doesn't allow LIMIT with the multiple-table UPDATE syntax, so if the LIMIT 50 matters, this suggestion will likely not work for you.
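A quick way to check whether a JOIN rewrite of the NOT IN selects the same rows is to run the variants against a few made-up rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, with SELECTs in place of the UPDATE, and compares NOT IN, a LEFT JOIN ... IS NULL anti-join, and a plain INNER JOIN. The equivalence of the first two assumes url_done_wp.url_source_wp is never NULL, because a single NULL there makes NOT IN match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE url_source_wp (id INTEGER PRIMARY KEY, hash TEXT);
CREATE TABLE url_done_wp (url_source_wp INTEGER NOT NULL, url_group INTEGER);
INSERT INTO url_source_wp VALUES (1, NULL), (2, NULL), (3, 'x'), (4, NULL);
INSERT INTO url_done_wp VALUES (1, 4), (2, 7);
""")

not_in = """
SELECT id FROM url_source_wp
WHERE id NOT IN (SELECT url_source_wp FROM url_done_wp WHERE url_group = 4)
  AND hash IS NULL
ORDER BY id
"""
# Anti-join: the url_group test belongs in ON (in WHERE it would throw away the
# unmatched rows); "d.url_source_wp IS NULL" keeps rows that found no match.
anti_join = """
SELECT s.id FROM url_source_wp s
LEFT JOIN url_done_wp d ON s.id = d.url_source_wp AND d.url_group = 4
WHERE d.url_source_wp IS NULL AND s.hash IS NULL
ORDER BY s.id
"""
# A plain INNER JOIN picks the rows that DO have a url_group 4 entry.
inner_join = """
SELECT s.id FROM url_source_wp s
JOIN url_done_wp d ON s.id = d.url_source_wp
WHERE d.url_group = 4 AND s.hash IS NULL
ORDER BY s.id
"""
for name, sql in [("NOT IN", not_in), ("LEFT JOIN", anti_join), ("INNER JOIN", inner_join)]:
    print(name, [r[0] for r in conn.execute(sql)])
```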

Why is Linq2SQL generating a nested query instead of using a JOIN?

I'm trying to understand why Linq is generating the SQL that it is for the statement below:
var dlo = new DataLoadOptions();
dlo.LoadWith<TemplateNode>(x => x.TemplateElement);
db.LoadOptions = dlo;
var data = from node in db.TemplateNodes
where node.TemplateId == someValue
orderby node.Left
select node;
Which generates the following SQL:
SELECT [t2].[Id],
[t2].[ParentId],
[t2].[TemplateId],
[t2].[ElementId],
[t2].[Left] AS [Left],
[t2].[Right] AS [Right],
[t2].[Id2],
[t2].[Content]
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [t0].[Left]) AS [ROW_NUMBER],
[t0].[Id],
[t0].[ParentId],
[t0].[TemplateId],
[t0].[ElementId],
[t0].[Left],
[t0].[Right],
[t1].[Id] AS [Id2],
[t1].[Content]
FROM [dbo].[TemplateNode] AS [t0]
INNER JOIN [dbo].[TemplateElement] AS [t1]
ON [t1].[Id] = [t0].[ElementId]
WHERE [t0].[TemplateId] = 16 /* #p0 */) AS [t2]
WHERE [t2].[ROW_NUMBER] > 1 /* #p1 */
ORDER BY [t2].[ROW_NUMBER]
There is a Foreign Key from TemplateNode.ElementId to TemplateElement.Id.
I would have expected the query to produce a JOIN, like so:
SELECT * FROM TemplateNode
INNER JOIN TemplateElement ON TemplateNode.ElementId = TemplateElement.Id
WHERE TemplateNode.TemplateId = #TemplateId
As suggested in the answers to this question, I profiled both queries: the JOIN is 3 times faster than the nested query.
I'm using a .NET 4.0 Windows Forms app to test with SQL Server 2008 SP2 64bit developer edition.
The only reason LINQ to SQL would generate the ROW_NUMBER query is the Skip method. As bizarre as the above SQL seems, SQL Server 2008's T-SQL has no simple paging construct like MySQL's LIMIT 10, 25 (OFFSET ... FETCH only arrived in SQL Server 2012), so you get the above SQL when using Skip and Take.
I would assume a Skip is being used for paging and LINQ to SQL is modifying the query accordingly. If you use an application like LINQPad, you can run different LINQ queries and see the SQL they generate.
Your example of a join is not equivalent. You cannot get the ROW_NUMBER and subsequently select only rows WHERE ROW_NUMBER > 1 with a simple join. You would have to do a sub-select or similar to get this result.
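The difference described above can be reproduced with made-up rows in SQLite, via Python's sqlite3, which needs SQLite 3.25 or newer for ROW_NUMBER(). The paged query copies the shape of the generated SQL, assuming the LINQ query had a .Skip(1) that isn't shown. The plain JOIN returns one extra row, and LIMIT -1 OFFSET 1 is SQLite's way of skipping it without a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TemplateElement (Id INTEGER PRIMARY KEY, Content TEXT);
CREATE TABLE TemplateNode (
    Id INTEGER PRIMARY KEY, TemplateId INTEGER, ElementId INTEGER,
    "Left" INTEGER, "Right" INTEGER);
INSERT INTO TemplateElement VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO TemplateNode VALUES
    (1, 16, 1, 1, 6), (2, 16, 2, 2, 3), (3, 16, 3, 4, 5), (4, 17, 1, 1, 2);
""")

# The JOIN from the question, ordered like the LINQ query.
PLAIN = """
SELECT n.Id FROM TemplateNode AS n
JOIN TemplateElement AS e ON e.Id = n.ElementId
WHERE n.TemplateId = ?
ORDER BY n."Left"
"""
# Same shape as the generated SQL: number the rows, then keep rows after the first.
PAGED = """
SELECT t2.Id FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY t0."Left") AS rn, t0.Id
    FROM TemplateNode AS t0
    JOIN TemplateElement AS t1 ON t1.Id = t0.ElementId
    WHERE t0.TemplateId = ?
) AS t2
WHERE t2.rn > 1
ORDER BY t2.rn
"""
plain = [r[0] for r in conn.execute(PLAIN, (16,))]
paged = [r[0] for r in conn.execute(PAGED, (16,))]
offset = [r[0] for r in conn.execute(PLAIN + " LIMIT -1 OFFSET 1", (16,))]
print(plain, paged, offset)
```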