MySQL optimization with joins and GROUP BY on a large amount of data - mysql

The following query takes a whopping 6 seconds to execute and I can't figure out why. I have indexes on the tables, but they don't seem to do much to speed up the query.
Query :
SELECT `AD`.`id`, `CAM`.`cam_name`, `CUI`.`cui_id`, `CAM`.`cam_id`, `AD`.`api_json_response_data` AS `refused_by_api`
FROM `tbl_api_data` AS `AD`
LEFT JOIN `tbl_camp_user_info` AS `CUI` ON `AD`.`cui_id` = `CUI`.`cui_id`
JOIN `tbl_campaign` AS `CAM` ON `CAM`.`cam_id` = `CUI`.`cui_campaign_id`
JOIN `tbl_usr_lead_setting` AS `ULS` ON `CUI`.`cui_id` = `ULS`.`cui_id`
WHERE `CUI`.`cui_status` = 'active'
AND `CAM`.`cam_status` = 'active'
AND `ULS`.`uls_status` = 'active'
AND `AD`.`status` = 'error'
AND `CUI`.`cui_cron_status` = '1'
AND `CUI`.`cui_created_date` >= '2021-07-01 00:00:00'
GROUP BY `AD`.`cui_id`
I have indexes on the following tables:
tbl_api_data - id, cui_id
tbl_camp_user_info - cui_id, cui_campaign_id, cui_cron_status (cui_status is not indexed)
tbl_campaign - cam_id, cam_status
tbl_usr_lead_setting - cui_id, uls_status
Total number of record in each table :
tbl_api_data - 297,297 rows
tbl_camp_user_info - 843,390 rows
tbl_campaign - 334 rows
tbl_usr_lead_setting - 879,390 rows
And query Result has 376 rows.
If I use a LIMIT on the above query, as below, the result is 10 rows, but it still takes 8.278 sec. That's also too much.
SELECT `AD`.`id`, `CAM`.`cam_name`, `CUI`.`cui_id`, `CAM`.`cam_id`, `AD`.`api_json_response_data` AS `refused_by_api`
FROM `tbl_api_data` AS `AD`
LEFT JOIN `tbl_camp_user_info` AS `CUI` ON `AD`.`cui_id` = `CUI`.`cui_id`
JOIN `tbl_campaign` AS `CAM` ON `CAM`.`cam_id` = `CUI`.`cui_campaign_id`
JOIN `tbl_usr_lead_setting` AS `ULS` ON `CUI`.`cui_id` = `ULS`.`cui_id`
WHERE `CUI`.`cui_status` = 'active'
AND `CAM`.`cam_status` = 'active'
AND `ULS`.`uls_status` = 'active'
AND `AD`.`status` = 'error'
AND `CUI`.`cui_cron_status` = '1'
AND `CUI`.`cui_created_date` >= '2021-07-01 00:00:00'
GROUP BY `AD`.`cui_id`
LIMIT 10
I've been stuck on this for the last week. I really need to optimize the above query.
Any help would be appreciated. Thank you.

Going by what you posted, you have a composite index on tbl_api_data (id, cui_id). In the SQL you are joining this table to another table on the cui_id field, and you are also using this field in the GROUP BY. However, you haven't added an index on this field, and that can be a reason.
Remember that the composite index you posted can't be used for this join and GROUP BY, because cui_id is not the leftmost field (the first field in the composite index).
So try adding a separate index on cui_id.
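A minimal sketch of that suggestion (the index name is just an example):
ALTER TABLE tbl_api_data ADD INDEX ix_ad_cui (cui_id);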

That is effectively a JOIN, not a LEFT JOIN: the WHERE clause filters on CUI columns, so any row where CUI failed to match would be discarded anyway.
These composite indexes may help:
AD: INDEX(status, cui_id, id, api_json_response_data)
CUI: INDEX(cui_status, cui_cron_status, cui_created_date, cui_id, cui_campaign_id)
CAM: INDEX(cam_status, cam_id, cam_name)
ULS: INDEX(uls_status, cui_id)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
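Spelled out as DDL, those suggestions would look something like this (the index names are illustrative; if api_json_response_data is a TEXT/BLOB column it can't be indexed without a prefix length, so leave it off in that case):
ALTER TABLE tbl_api_data ADD INDEX ix_status_cui (status, cui_id, id, api_json_response_data);
ALTER TABLE tbl_camp_user_info ADD INDEX ix_status_cron_date (cui_status, cui_cron_status, cui_created_date, cui_id, cui_campaign_id);
ALTER TABLE tbl_campaign ADD INDEX ix_status_id_name (cam_status, cam_id, cam_name);
ALTER TABLE tbl_usr_lead_setting ADD INDEX ix_status_cui (uls_status, cui_id);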
A LIMIT without an ORDER BY can lead to any arbitrary subset of the rows being returned.
"tbl_api_data - id,cui_id" -- Assuming that id is the PRIMARY KEY, this index is likely to be useless. That is, don't start a secondary index with the PK's column(s).

Related

Which and what type of indexes should I create to optimize these MySQL queries?

I'm thinking about creating a hash index for (1) because it uses equalities, and a bitmap index for (2) because the state can only be 'accepted' or 'not accepted'. What else can I use? My problem is that I can only use B-tree indexes in Oracle's MySQL.
(1)
select R.user_id
from rent as R
inner join supervise S
    on R.adress = S.adress
    and R.space_id = S.space_id
group by R.user_id
having count(distinct S.supervisor_id) = 1
(2)
select distinct P.adress, P.code
from space as P
where (P.adress, P.code) not in (
    select P.adress, P.code
    from space as P
    natural join rent as R
    natural join state as E
    where E.state = 'accepted')
Since there are no directly limiting criteria in query #1, it will likely be executed as a merge join, and no index will improve that.
For query #2, how selective is the criteria E.state = 'accepted'? If very selective (< 5-15% of query result), then index on E.state, indexes for the joins from E to R and from R to P, and index on P.adress, P.code.
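Spelled out as DDL under those assumptions (the NATURAL JOINs hide the actual join columns, so the index for the state-to-rent join is omitted here):
CREATE INDEX ix_state_state ON state (state);
CREATE INDEX ix_rent_adress_space ON rent (adress, space_id);
CREATE INDEX ix_space_adress_code ON space (adress, code);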
Composite index on each table:
INDEX(space_id, adress)
Don't use WHERE(a,b) IN ... -- it performs very poorly.
Don't use IN ( SELECT ... ) -- it often performs poorly.
Instead, use a JOIN.
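For example, query #2 could be rewritten as an anti-join, roughly like this (a sketch only: the NATURAL JOINs hide the real join columns, so the rent-to-state join condition below is an assumption):
select distinct P.adress, P.code
from space as P
left join (
    select distinct R.adress, R.space_id
    from rent as R
    join state as E on E.rent_id = R.rent_id  -- assumed join column
    where E.state = 'accepted'
) acc
    on acc.adress = P.adress
    and acc.space_id = P.space_id
where acc.adress is null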
For state, have
INDEX(state)
(or is it already the PRIMARY KEY?)
If you need more help after all that, provide SHOW CREATE TABLE and EXPLAIN SELECT ....
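For example, for query #1 that would be:
SHOW CREATE TABLE rent;
SHOW CREATE TABLE supervise;
EXPLAIN
select R.user_id
from rent as R
inner join supervise S
    on R.adress = S.adress
    and R.space_id = S.space_id
group by R.user_id
having count(distinct S.supervisor_id) = 1;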

MySQL: remove NOT IN subquery on table itself

I have the following query which displays a list of accounts with a certain margin level:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
crm_margincall.crm_account_id NOT IN
(
SELECT
x.crm_account_id
FROM
crm_margincall x
WHERE
x.crm_account_id = crm_margincall.crm_account_id
AND
x.name = 'LevelDrop'
AND
x.MarginCallLevel < 100
AND
x.id > crm_margincall.id
)
ORDER BY
id
DESC
This query, on a table of ~22,500 records, takes >10 seconds to run. This is caused by the subquery behind the NOT IN section (I tried NOT EXISTS; it isn't much faster). How can I join this table on itself to achieve the same effect?
This can be done in several ways, but a scan of 22,500 records taking 10 seconds means either a hardware issue or a very inefficient JOIN.
The most likely cause of the latter is a missing index or a misconfigured index, and to investigate this, you need to issue an EXPLAIN:
EXPLAIN SELECT ...
Totally shooting in the dark, judging from the selected columns being used, I'd try with
CREATE INDEX test_index ON crm_margincall(name, crm_account_id, MarginCallLevel, id)
Other improvements might be possible, but you'd need to prepare a sample structure with some fake data in a SQLfiddle to really allow debugging.
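For this query, the EXPLAIN would be issued like this (SELECT list trimmed to one column for brevity):
EXPLAIN
SELECT crm_margincall.id
FROM crm_margincall
LEFT JOIN crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
WHERE crm_margincall.name = 'MarginCall'
AND crm_margincall.MarginCallLevel >= 100
AND crm_margincall.crm_account_id NOT IN
(
SELECT x.crm_account_id
FROM crm_margincall x
WHERE x.crm_account_id = crm_margincall.crm_account_id
AND x.name = 'LevelDrop'
AND x.MarginCallLevel < 100
AND x.id > crm_margincall.id
)
ORDER BY id DESC;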
Try something like this. Note that the original NOT IN is an exclusion, and a derived table can't reference the outer query in MySQL, so the self-join has to be a LEFT JOIN ... IS NULL anti-join rather than an INNER JOIN:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
LEFT JOIN
crm_margincall x ON x.crm_account_id = crm_margincall.crm_account_id
AND x.name = 'LevelDrop'
AND x.MarginCallLevel < 100
AND x.id > crm_margincall.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
x.id IS NULL
ORDER BY
crm_margincall.id
DESC

SQL statement hanging up in MySQL database

I need some SQL help. I have a SELECT statement that references several tables and hangs in the MySQL database. Is there a better way to write this statement so that it runs efficiently and does not hang the DB? Any help/direction would be appreciated. Thanks.
Here is the code:
Select Max(b.BurID) As BurID
From My.AppTable a,
My.AddressTable c,
My.BurTable b
Where a.AppID = c.AppID
And c.AppID = b.AppID
And (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
There is NO mysql error in the log. Sorry.
Pro tip: use explicit JOINs rather than a comma-separated list of tables. It's easier to see the logic you're using to JOIN that way. Rewriting your query to do that gives us this.
select Max(b.BurID) As BurID
From My.AppTable AS a
JOIN My.AddressTable AS c ON a.AppID = c.AppID
JOIN My.BurTable AS b ON c.AppID = b.AppID
WHERE (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
Next pro tip: Don't use functions (like DateDiff()) in WHERE clauses, because they defeat using indexes to search. That means you should change the last line of your query to
AND a.ApplicationDate >= CurDate() - INTERVAL 30 DAY
This has the same logic as in your query, but it leaves a naked (and therefore index-searchable) column name in the search expression.
Next, we need to look at your columns to see how you are searching, and cook up appropriate indexes.
Let's start with AppTable. You're screening by specific values of Forename, Surname, and DOB. You're screening by a range of ApplicationDate values. Finally you need AppID to manage your join. So, this compound index should help. Its columns are in the correct order to use a range scan to satisfy your query, and contains the needed results.
CREATE INDEX search1 USING BTREE
ON AppTable
(Forename, Surname, DOB, ApplicationDate, AppID)
Next, we can look at your AddressTable. Similar logic applies. You'll enter this table via the JOINed AppID, and then screen by specific values of three columns. So, try this index
CREATE INDEX search2 USING BTREE
ON AddressTable
(AppID, PostcodeAnywherePostcode, PostcodeAnywhereBuildingNumber, isPrimary)
Finally, we're on to your BurTable. Use similar logic as the other two, and try this index.
CREATE INDEX search3 USING BTREE
ON BurTable
(AppID, ErrorInd, BurID)
This kind of index is called a compound covering index, and can vastly speed up the sort of summary query you have asked about.
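With those three indexes in place, running the rewritten query through EXPLAIN should normally show them in the key column of the output (a quick sanity check):
EXPLAIN
select Max(b.BurID) As BurID
From My.AppTable AS a
JOIN My.AddressTable AS c ON a.AppID = c.AppID
JOIN My.BurTable AS b ON c.AppID = b.AppID
WHERE a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And a.ApplicationDate >= CurDate() - INTERVAL 30 DAY;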

Doctrine issue - Different queries, same results but not with Doctrine

I'm having a little issue with Doctrine using symfony 1.4 (I think it uses Doctrine 1.2). I have two queries that, run as raw SQL in the MySQL console, produce the same result set. The queries can be generated using this code:
$dates = Doctrine::getTable('Picture')
->createQuery('a')
->select('substr(a.created_at,1,10) as date')
->leftjoin('a.PictureTag pt ON a.id = pt.picture_id')
->leftjoin('pt.Tag t ON t.id = pt.tag_id')
->where('a.created_at <= ?', date('Y-m-d 23:59:59'))
->orderBy('date DESC')
->groupby('date')
->limit(ITEMS_PER_PAGE)
->offset(ITEMS_PER_PAGE * $this->page)
->execute();
If I remove the two joins, it changes the query, but the result set is the same.
Using Doctrine's execute(), however, one of them produces only one row.
Does anybody have an idea what's going on here?
PS: the Picture table has id, title, file, created_at (format 'Y-m-d h:i:s'); the Tag table has id and name; and PictureTag is a relationship table with an id and the two foreign keys.
PS 2: Here are the two SQL queries produced (the first without joins):
SELECT substr(l.created_at, 1, 10) AS l__0 FROM lupa_picture l WHERE (l.created_at <= '2010-03-19 23:59:59') GROUP BY l__0 ORDER BY l__0 DESC LIMIT 4
SELECT substr(l.created_at, 1, 10) AS l__0 FROM lupa_picture l LEFT JOIN lupa_picture_tag l2 ON (l.id = l2.picture_id) LEFT JOIN lupa_tag l3 ON (l3.id = l2.tag_id) WHERE (l.created_at <= '2010-03-19 23:59:59') GROUP BY l__0 ORDER BY l__0 DESC LIMIT 4
I had something similar this week. Doctrine's generated SQL (from the Symfony debug toolbar) worked fine in phpMyAdmin, but failed when run as in your question. Try adding the following into your query:
->setHydrationMode(Doctrine::HYDRATE_SCALAR)
and see if it gives you the expected result. If so, it's down to the Doctrine_Collection using the Picture primary key as the index in the collection. If more than one result has the same index, Doctrine will refuse to add it into the collection, so you only end up with one result. I ended up running the query against a different table rather than the one I wanted, which resulted in a unique primary key, and then the results I wanted appeared.
Well, the solution was that, besides substr(), the select needs another column of the table. Using ->select('substr(a.created_at,1,10) as date, a.created_at') made it work.

Indexes in MySQL: SELECT AS or using Views

I'm in over my head with a big MySQL query (MySQL 5.0), and I'm hoping somebody here can help.
Earlier I asked how to get distinct values from a joined query
mysql count only for distinct values in joined query
The response I got worked (using a subquery with join as)
select *
from media m
inner join
( select uid
from users_tbl
limit 0,30) map
on map.uid = m.uid
inner join users_tbl u
on u.uid = m.uid
Unfortunately, my query has grown more unruly, and though I have it running, joining onto a derived table is taking too long because there are no indexes available to the derived query.
my query now looks like this
SELECT mdate.bid, mdate.fid, mdate.date, mdate.time, mdate.title, mdate.name,
mdate.address, mdate.rank, mdate.city, mdate.state, mdate.lat, mdate.`long`,
ext.link,
ext.source, ext.pre, meta, mdate.img
FROM ext
RIGHT OUTER JOIN (
SELECT media.bid,
media.date, media.time, media.title, users.name, users.img, users.rank, media.address,
media.city, media.state, media.lat, media.`long`,
GROUP_CONCAT(tags.tagname SEPARATOR ' | ') AS meta
FROM media
JOIN users ON media.bid = users.bid
LEFT JOIN tags ON users.bid=tags.bid
WHERE `long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND date = '2009-02-23'
GROUP BY media.bid, media.date
ORDER BY media.date, users.rank DESC
LIMIT 0, 30
) mdate ON (mdate.bid = ext.bid AND mdate.date = ext.date)
phew!
So, as you can see, if I understand my problem correctly, I have two derived tables without indexes (and I don't deny that I may have screwed up the JOIN statements somehow, but I kept messing with different types, and this ended up giving me the result I wanted).
What's the best way to create a query similar to this which will allow me to take advantage of the indexes?
Dare I say, I actually have one more table to add into the mix at a later date.
Currently, my query is taking .8 seconds to complete, but I'm sure if I could take advantage of the indexes, this could be significantly faster.
First, check for indexes on ext(bid, date), users(bid) and tags(bid); you should really have them.
It seems, though, that it's LONG and LAT that cause you the most problems. You should try keeping your LONG and LAT together as a coordinate POINT column, create a SPATIAL INDEX on this column, and query like this:
WHERE MBRContains(#MySquare, coordinate)
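A minimal sketch of that approach, assuming the media table from the query (note that in MySQL 5.0 a SPATIAL index requires a MyISAM table):
ALTER TABLE media ADD COLUMN coordinate POINT NOT NULL;
UPDATE media SET coordinate = GeomFromText(CONCAT('POINT(', `long`, ' ', lat, ')'));
ALTER TABLE media ADD SPATIAL INDEX ix_coordinate (coordinate);
-- the bounding box from the question then becomes:
SELECT bid FROM media
WHERE date = '2009-02-23'
AND MBRContains(
GeomFromText('POLYGON((-122.52224684058 37.07500915942, -121.79760915942 37.07500915942, -121.79760915942 37.79964684058, -122.52224684058 37.79964684058, -122.52224684058 37.07500915942))'),
coordinate);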
If you can't change your schema for some reason, you can try creating additional indices that include date as a first field:
CREATE INDEX ix_date_long ON media (date, `long`)
CREATE INDEX ix_date_lat ON media (date, lat)
These indexes will be more efficient for your query, as you combine an exact search on date with a ranged search on the axes.
Starting fresh:
Question - why are you grouping by both media.bid and media.date? Can a bid have records for more than one date?
Here's a simpler version to try:
SELECT
media.bid,
media.fid,
media.date,
media.time,
media.title,
users.name,
media.address,
users.rank,
media.city,
media.state,
media.lat,
media.`long`,
ext.link,
ext.source,
ext.pre,
users.img,
( SELECT GROUP_CONCAT(tags.tagname SEPARATOR ' | ')
FROM tags
WHERE ext.bid = tags.bid
) AS meta
FROM
ext
LEFT JOIN
media ON ext.bid = media.bid AND ext.date = media.date
JOIN
users ON ext.bid = users.bid
WHERE
`long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND ext.date = '2009-02-23'
AND users.userid IN
( -- wrapped in a derived table because MySQL does not allow LIMIT directly in an IN subquery
SELECT userid FROM ( SELECT userid FROM users ORDER BY rank DESC LIMIT 30 ) AS top30
)
ORDER BY
media.date,
users.rank DESC
LIMIT 0, 30
You might want to compare your performance against using a temp table for each selection and joining those tables together.
CREATE TEMPORARY TABLE whatever1 AS SELECT ...;
CREATE TEMPORARY TABLE whatever2 AS SELECT ...;
SELECT ... FROM whatever1 JOIN whatever2 ON ...;
DROP TEMPORARY TABLE whatever1;
DROP TEMPORARY TABLE whatever2;
If your system has enough memory to hold full tables this might work out much faster. It depends on how big your database is.