"where not in (subquery)" very slow for a large table - mysql

I want to find the rows in tbl_cust_dump_data whose cust_data_card_id does not exist in tbl_cust_data.
I wrote the following query:
select * from tbl_cust_dump_data
where tbl_cust_dump_data.cust_data_card_id NOT IN
(select cust_data_card_id from tbl_cust_data);
When tbl_cust_data has more than 18,000 rows, the query doesn't return any result; it just keeps loading for a long time. It works fine on smaller sets of data.

Try:
SELECT *
FROM tbl_cust_dump_data t1
WHERE NOT EXISTS
(SELECT 0
FROM tbl_cust_data t2
WHERE t2.cust_data_card_id = t1.cust_data_card_id)
This query selects the rows from tbl_cust_dump_data whose cust_data_card_id doesn't exist in tbl_cust_data.
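Besides speed, the two forms can also return different rows when the subquery column contains NULL. A minimal sketch of that difference, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL and made-up sample rows:

```python
import sqlite3

# SQLite stands in for MySQL here; table names follow the question,
# the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_cust_data (cust_data_card_id INTEGER);
    CREATE TABLE tbl_cust_dump_data (cust_data_card_id INTEGER);
    INSERT INTO tbl_cust_data VALUES (1), (2), (NULL);
    INSERT INTO tbl_cust_dump_data VALUES (1), (2), (3), (4);
""")

# Original form: NOT IN against a subquery.
not_in_rows = conn.execute("""
    SELECT cust_data_card_id
    FROM tbl_cust_dump_data
    WHERE cust_data_card_id NOT IN
          (SELECT cust_data_card_id FROM tbl_cust_data)
""").fetchall()

# Suggested form: correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT t1.cust_data_card_id
    FROM tbl_cust_dump_data t1
    WHERE NOT EXISTS
          (SELECT 0
           FROM tbl_cust_data t2
           WHERE t2.cust_data_card_id = t1.cust_data_card_id)
    ORDER BY t1.cust_data_card_id
""").fetchall()

print(not_in_rows)      # [] : the NULL makes NOT IN evaluate to unknown for 3 and 4
print(not_exists_rows)  # [(3,), (4,)]
```

If cust_data_card_id is declared NOT NULL in tbl_cust_data, both forms return the same rows and only the execution plan differs.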

I think a similar question was asked not long ago. Make sure cust_data_card_id is indexed, and please post the result of
EXPLAIN
select * from tbl_cust_dump_data
where tbl_cust_dump_data.cust_data_card_id NOT IN
(select cust_data_card_id from tbl_cust_data);
So that we can see what can be optimized further.

Related

Can a mysql query be extended as child of another query?

Let's assume we have query1 as follows:
select * from users where status = 1
This outputs some results, which I can cache. Now the second query is:
select * from users where status = 1 and point >= 50
As you can see, the second query is in a sense a child of the first: it returns a subset of the first query's data and shares code with it. Is there a way to speed up the second query by using the first query's results, and to shorten my code by reusing the first query's code?
Yes, you can use a nested query:
select x.*
from
(
select * from users
where status = 1
) as x
where x.point >= 50;
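To check that the nested form selects the same rows as the flat second query, here is a small sketch. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL; the users rows and the id column are invented for the example.

```python
import sqlite3

# Hypothetical sample data; only the status and point columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, status INTEGER, point INTEGER);
    INSERT INTO users (id, status, point) VALUES
        (1, 1, 10), (2, 1, 50), (3, 0, 90), (4, 1, 75);
""")

# Nested form: query1 wrapped as a derived table, filtered again outside.
nested = conn.execute("""
    SELECT x.id
    FROM (SELECT * FROM users WHERE status = 1) AS x
    WHERE x.point >= 50
    ORDER BY x.id
""").fetchall()

# Flat form: both conditions in a single WHERE clause.
flat = conn.execute("""
    SELECT id FROM users
    WHERE status = 1 AND point >= 50
    ORDER BY id
""").fetchall()

print(nested)  # [(2,), (4,)]
print(flat)    # [(2,), (4,)]
```

The sketch only shows that the results match; whether the nested form is any faster depends on how the database plans the derived table.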

Refactoring a large union statement to use a SELECT which breaks early

It's a bit difficult to explain the situation, but currently I'm generating massive unions to accomplish this. They look a bit like:
(
SELECT
ipaddress
FROM post
WHERE ipaddress = 'someipaddress'
AND userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
LIMIT 1
)
UNION
(
SELECT
ipaddress
FROM post
WHERE ipaddress = 'someotheripaddress'
AND userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
LIMIT 1
)
These get huge fast, but seem to be the fastest way for me to accomplish this right now. I've tried refactoring it to something like:
SELECT
ipaddress
FROM post
WHERE ipaddress in ('all ips', .....)
AND userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
GROUP BY ipaddress
But this is around 5x slower than the massive union statement. The big issue is that the post table is huuuuuge, so the refactored SQL is forced to look through the entire table, whereas each union statement can break after finding a single instance. Is there any way to tell the SQL to break on finding the first row of each unique group?
Anyone have tips on how to refactor the huge union statement above into something cleaner?
You can write the query like this:
select i.ipaddress
from (select 'someipaddress' as ipaddress union all
      select 'someotheripaddress'
     ) i
where exists (select 1
              from post p
              where p.ipaddress = i.ipaddress and
                    p.userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
             );
This is optimized with an index on post(ipaddress, userid): one index, two columns.
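A runnable sketch of this pattern, assuming SQLite via Python's sqlite3 in place of MySQL. The IP addresses, the user ids, and the index name idx_post_ip_user are made up, and bound parameters stand in for the PHP template variables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (ipaddress TEXT, userid INTEGER);
    -- One index, two columns, as the answer suggests (index name is hypothetical).
    CREATE INDEX idx_post_ip_user ON post (ipaddress, userid);
    INSERT INTO post VALUES
        ('10.0.0.1', 1), ('10.0.0.1', 7), ('10.0.0.2', 1), ('10.0.0.3', 9);
""")

# Placeholder user ids standing in for 1, $postinfo['userid'] and the wiki user id.
excluded = (1, 42, 43)

rows = conn.execute("""
    SELECT i.ipaddress
    FROM (SELECT '10.0.0.1' AS ipaddress UNION ALL
          SELECT '10.0.0.2' UNION ALL
          SELECT '10.0.0.4') AS i
    WHERE EXISTS (SELECT 1
                  FROM post p
                  WHERE p.ipaddress = i.ipaddress
                    AND p.userid NOT IN (?, ?, ?))
    ORDER BY i.ipaddress
""", excluded).fetchall()

print(rows)  # [('10.0.0.1',)] : only this address has a post by a non-excluded user
```

Each candidate address is probed once through EXISTS, which can stop at the first matching post, which is the early-exit behaviour the LIMIT 1 unions were emulating.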

Stored procedure is very slow when using COUNT(DISTINCT)

When I run the stored procedure for the first time, it is very slow and takes 1 minute; when I run it again, it takes 10 seconds.
Below is my main SQL statement. Please help me check it. Thank you very much!
Example 1:
SELECT sql_no_cache view_address.is_facility,
       count(DISTINCT view_address.provider_id) as totalCount
FROM pv_mview_provider_address view_address
WHERE view_address.network_group_id=5047 AND view_address.carrier_group_id=93
GROUP BY view_address.is_facility;
explain:
Example 2:
SELECT SQL_NO_CACHE is_facility, count(distinct provider_id)
FROM (SELECT view_address.provider_id, view_address.is_facility
      FROM pv_mview_provider_address view_address
      WHERE view_address.network_group_id=5047 AND view_address.carrier_group_id=93
     ) as p
GROUP BY is_facility
explain:
This query takes 10 s to load the data.
The table stores 40,000,000 rows.
Thank you very much!
For this query:
select sql_no_cache a.is_facility,
count(distinct a.provider_id) as totalCount
from pv_mview_provider_address a
where a.network_group_id = 5047 and
a.carrier_group_id = 93
group by a.is_facility;
You want an index. The best index is pv_mview_provider_address(network_group_id, carrier_group_id, is_facility). However, if the reference in the from clause is a view and not a table, then you need to figure out what is happening with the view.
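A small sketch of creating that composite index and running the query, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. The index name idx_addr_net_car_fac and the sample rows are hypothetical, and the SQLite plan output is only indicative of what MySQL's EXPLAIN would show:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pv_mview_provider_address (
        provider_id INTEGER,
        network_group_id INTEGER,
        carrier_group_id INTEGER,
        is_facility INTEGER
    );
    -- Composite index in the column order given in the answer (name is made up).
    CREATE INDEX idx_addr_net_car_fac
        ON pv_mview_provider_address (network_group_id, carrier_group_id, is_facility);
    INSERT INTO pv_mview_provider_address VALUES
        (100, 5047, 93, 0), (100, 5047, 93, 0), (101, 5047, 93, 0),
        (200, 5047, 93, 1), (300, 5047, 12, 1), (400, 1, 93, 0);
""")

query = """
    SELECT is_facility, COUNT(DISTINCT provider_id) AS totalCount
    FROM pv_mview_provider_address
    WHERE network_group_id = 5047 AND carrier_group_id = 93
    GROUP BY is_facility
    ORDER BY is_facility
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(0, 2), (1, 1)]

# Print the planner's description of each step; the last column holds the detail text.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])
```

The two equality columns come first in the index so the filter can seek directly to the matching range, and is_facility follows so the rows arrive already grouped.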

MySQL SELECT 1 vs SELECT `field_id` AND COUNT 1 vs COUNT (*) or COUNT (`field_id`) Performance wise

I have a very simple question.
I want to know if a certain database row exists.
I generally use :
SELECT 1 FROM `my_table` WHERE `field_x` = 'something'
Then I fetch the result with :
$row = self::$QueryObject->fetch();
And check if any results :
if(isset($row[1]) === true){
    return(true);
}
You can also do this with:
COUNT 1 FROM `my_table` WHERE `field_x` = 'something'
And similarly with COUNT * FROM `my_table` and COUNT field_id FROM `my_table`
But I was wondering.. How does this relate to performance?
Are there any cons to using SELECT 1 or COUNT 1??
My feeling says that select INTEGER 1 means the lowest load.
But is this actually true??
Can anyone enlighten me?
Actually, all your solutions are suboptimal :) Your queries read every matching row there is to be found, even if you add a limit. Do it like this:
SELECT EXISTS ( SELECT 1 FROM `my_table` WHERE `field_x` = 'something');
EXISTS returns 1 if something was found, 0 if not. It stops searching as soon as an entry is found. What you select in the subquery doesn't matter; you can even select NULL.
Also keep in mind that COUNT(*) and COUNT(1) are very different from COUNT(column_name). COUNT(*) counts every row, while COUNT(column_name) counts only the rows where column_name is not NULL.
If you add LIMIT 1 to the end of the query, then SELECT works better than COUNT, especially when you have a large table.
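The points about EXISTS and the COUNT variants can be checked with a short sketch. It assumes SQLite via Python's sqlite3 in place of MySQL and PHP; the row_exists helper and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (field_id INTEGER, field_x TEXT);
    INSERT INTO my_table VALUES
        (1, 'something'), (NULL, 'something'), (3, 'other');
""")


def row_exists(value):
    """Return True if at least one row has field_x equal to value (hypothetical helper)."""
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM my_table WHERE field_x = ?)", (value,)
    ).fetchone()
    return bool(found)  # EXISTS yields 1 or 0


print(row_exists("something"))  # True
print(row_exists("missing"))    # False

# COUNT(*) and COUNT(1) count rows; COUNT(field_id) skips rows where field_id is NULL.
count_star, count_one, count_col = conn.execute("""
    SELECT COUNT(*), COUNT(1), COUNT(field_id)
    FROM my_table
    WHERE field_x = 'something'
""").fetchone()

print(count_star, count_one, count_col)  # 2 2 1
```

Because EXISTS always returns exactly one row holding 1 or 0, the caller does not need to test whether a row came back at all.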

Why is this SQL query with subquery very slow?

I have this query:
select *
from transaction_batch
where id IN
(
select MAX(id) as id
from transaction_batch
where status_id IN (1,2)
group by status_id
);
The inner query runs very fast (less than 0.1 seconds) to get two IDs, one for status 1 and one for status 2, and then the outer query selects by primary key, so it is indexed. The EXPLAIN output says that it's searching 135k rows using where only, and I cannot for the life of me figure out why this is so slow.
The inner query is run separately for every row of your table, over and over again.
As there is no reference to the outer query in the inner query, I suggest you split those two queries and just insert the results of the inner query in the WHERE clause.
select b.*
from transaction_batch b
inner join (
    select max(id) as id
    from transaction_batch
    where status_id in (1, 2)
    group by status_id
) bm on b.id = bm.id
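A sketch comparing the three shapes discussed here: the original IN subquery, the derived-table join, and the two-step approach of running the inner query first and passing its ids to the outer one. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_batch (id INTEGER PRIMARY KEY, status_id INTEGER);
    INSERT INTO transaction_batch VALUES
        (1, 1), (2, 2), (3, 1), (4, 3), (5, 2), (6, 3);
""")

inner_sql = """
    SELECT MAX(id) AS id
    FROM transaction_batch
    WHERE status_id IN (1, 2)
    GROUP BY status_id
"""

# 1. Original form: IN (subquery).
in_rows = conn.execute(
    f"SELECT id, status_id FROM transaction_batch WHERE id IN ({inner_sql}) ORDER BY id"
).fetchall()

# 2. Join against the subquery as a derived table.
join_rows = conn.execute(
    f"""SELECT b.id, b.status_id
        FROM transaction_batch b
        INNER JOIN ({inner_sql}) bm ON b.id = bm.id
        ORDER BY b.id"""
).fetchall()

# 3. Two steps: fetch the ids first, then bind them into the outer query.
ids = [row[0] for row in conn.execute(inner_sql)]
placeholders = ", ".join("?" for _ in ids)
two_step_rows = conn.execute(
    f"SELECT id, status_id FROM transaction_batch WHERE id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

print(in_rows)        # [(3, 1), (5, 2)]
print(join_rows)      # [(3, 1), (5, 2)]
print(two_step_rows)  # [(3, 1), (5, 2)]
```

All three return the same rows; the rewrites only change how often the database has to evaluate the inner query.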
My first post here, sorry about the lack of formatting.
I had a performance problem, shown below:
90sec: WHERE [Column] LIKE (Select [Value] From [Table]) -- Dynamic, slow
1sec: WHERE [Column] LIKE ('A','B','C') -- Hardcoded, fast
1sec: WHERE @CSV LIKE CONCAT('%',[Column],'%') -- Solution, below
I had tried joining rather than subquerying.
I had also tried a hardcoded CTE.
I had lastly tried a temp table.
None of these standard options worked, and I was not willing to use the sp_execute option.
The only solution that worked was:
DECLARE @CSV nvarchar(max) = (Select STRING_AGG([Value],',') From [Table]);
-- This yields @CSV = 'A,B,C'
...
WHERE @CSV LIKE CONCAT('%',[Column],'%')