Looking to optimize this query:
SELECT gwt.z, gwt.csp, gwt.status, gwt.cd, gwt.disp, gwt.5d, gwt.6d, gwt.si, gwt.siad, gwt.prbd,
CONCAT(gwt.1, gwt.2, gwt.3, gwt.4, gwt.5, gwt.6, gwt.7, gwt.8, gwt.9),
group_concat(gws.res order by line_no), gwt.scm, gm.me, gwt.p, gwt.scd
from gwt
left outer join gws on gwt.csp = gws.csp
left join gm on gwt.scm = gm.mid
where gwt.zone = 1
and (status like '1%' or status like '2%' or status like '3%' or
status like '4%' or status like '5%' or status like '6%')
group by gwt.csp
Using EXPLAIN, gwt has 4110 rows, gws has 920k rows, and gm has 2800 rows.
The query ran fine when I was only querying status LIKE '1%', but since I've added additional statuses to display, I get a timeout error.
I would suggest the following.
Be sure that each table has an index on what looks like its primary key:
gwt.csp
gm.mid
For gwt, create another index on (zone, status) and change the filter condition to:
gwt.zone = 1 and status >= '1' and status < '7'
This is equivalent to your list, but it will allow the execution engine to use an index.
That might be enough to fix the query. Finally, you can put an index on gws.csp, to see if that speeds things up.
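For example, the two new indexes could be created like this (the index names are just placeholders; adjust them to your own conventions):

ALTER TABLE gwt ADD INDEX idx_gwt_zone_status (zone, status);
ALTER TABLE gws ADD INDEX idx_gws_csp (csp);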
Is "csp" a one-to-one relationship? You might have a problem with the query creating a giant result set, if it is not.
Since the gws table has two orders of magnitude more rows than the other tables, this is the one to focus on. If you want to design your index to target this particular query, then the first step is straightforward. Namely, you'll want to add an index on the joined column (gws.csp) and make sure to include all selected columns -- gws.res and gws.line_no(?) -- in the index.
The above should improve the speed of the query dramatically. A secondary concern would be to make sure that the gwt table has an index with status as the first column.
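A sketch of such a covering index, assuming gws.res is short enough to index in full (if it is a long text column you would need a prefix length):

ALTER TABLE gws ADD INDEX idx_gws_csp_line_res (csp, line_no, res);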
SELECT COUNT(DISTINCT r.id)
FROM views v
INNER JOIN emails e ON v.email_id = e.id
INNER JOIN recipients r ON e.recipient_id = r.id
INNER JOIN campaigns c ON e.campaign_id = c.id
WHERE c.centre_id IS NULL;
... or, "how many unique email opens have we had? (on general campaigns)"
Currently takes about a minute and a half to run on an Amazon RDS instance. Total rows for the tables involved are roughly:
campaigns: 250
recipients: 330,000
views: 530,000
emails: 1,380,000
EXPLAIN gives me:
1 SIMPLE r index PRIMARY UNIQ_146632C4E7927C74 767 NULL 329196 Using index
1 SIMPLE e ref PRIMARY,IDX_4C81E852E92F8F78,IDX_4C81E852F639F774 IDX_4C81E852E92F8F78 111 ecomms.r.id 1 Using where
1 SIMPLE v ref IDX_11F09C87A832C1C9 IDX_11F09C87A832C1C9 111 ecomms.e.id 1 Using where; Using index
1 SIMPLE c eq_ref PRIMARY,IDX_E3737470463CD7C3 PRIMARY 110 ecomms.e.campaign_id 1 Using where
What can I do to get this total faster?
You need to join recipients only if you are not enforcing a foreign key constraint between recipients.id and emails.recipient_id, and you want to exclude recipients who are not (any longer) present in the recipients table. Otherwise, omit that table from the join straight away; you can use emails.recipient_id instead of recipients.id. Omitting that join should be a big win.
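As a sketch, that variant would look something like this:

SELECT COUNT(DISTINCT e.recipient_id)
FROM views v
INNER JOIN emails e ON v.email_id = e.id
INNER JOIN campaigns c ON e.campaign_id = c.id
WHERE c.centre_id IS NULL;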
Alternatively, omit recipients from the join on the basis that it is not relevant to the question posed, which is about unique emails opened, not about unique recipients to open any email. In that case you should be able to just SELECT COUNT(*) FROM ... because each emails row is already unique.
Other than that, it looks like you're already getting good use of your indexes, though I confess I find the EXPLAIN PLAN output difficult to read, especially without headings. Still, it looks like your query doesn't read the base tables at all, so it's unlikely that adding new indexes would help.
You could try executing an OPTIMIZE TABLE on the tables involved in your query, though that will probably help less than it sounds like it should.
You should periodically run ANALYZE TABLE on the tables involved in this query, so that the query optimizer has the greatest likelihood of choosing the best possible plan. It looks like the optimizer is already choosing a reasonable plan, though, so this may not help much.
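Both statements accept a list of tables, for example:

OPTIMIZE TABLE views, emails, recipients, campaigns;
ANALYZE TABLE views, emails, recipients, campaigns;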
If you still need better performance then there are other possibilities (including moving to faster hardware), but they are too numerous to discuss here.
You want MySQL to be able to utilize the WHERE clause to limit the result set immediately. In order to do that, you need the proper indexes to join from campaigns to emails, then from emails to recipients and views.
Put an index on campaigns.centre_id to aid the search (satisfy the WHERE clause). I'm assuming campaigns.id is the primary key on that table.
Put an index on emails.campaign_id to aid the join to emails from campaigns. Add recipient_id and id to that index to make it a covering index.
Now, the EXPLAIN result should show the tables in order, starting from campaigns, then emails, then the other two. MySQL will still need an internal temporary table to apply the DISTINCT. Are you sure you need that?
I'm assuming emails.id and recipients.id are the primary keys.
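Expressed as DDL, the suggested indexes would look roughly like this (the names are arbitrary; with InnoDB the primary key is appended to secondary indexes automatically, so listing id explicitly is mostly for clarity):

ALTER TABLE campaigns ADD INDEX idx_campaigns_centre (centre_id);
ALTER TABLE emails ADD INDEX idx_emails_campaign_cover (campaign_id, recipient_id, id);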
I have a problem with a query for a web site. This is the situation:
I have 3 tables:
articoli = where all the articles are
clasart = where all the matches between article code and class code are - 32314 rows
classificazioni = where all the matches between class code and class name are - 2401 rows
and this is the query:
SELECT a.clar_classi , b.CLA_DESCRI
FROM clasart a JOIN (
SELECT art.AI_CAPOCODI, art.ai_codirest
FROM (select * from clasart where clar_azienda = 'SRL') a
JOIN (
SELECT AI_CAPOCODI, AI_CODIREST,AI_DT_CREAZ,
AI_DESCRIZI, AI_CATEMERC, concat(AI_CAPOCODI, AI_CODIREST) as codice, aI_grupscon
FROM articoli
WHERE AI_AZIENDA = 'SRL' AND AI_CATEMERC LIKE '0101______' AND AI_FLAG_NOW = 0 AND AI_CAPOCODI <> 'zzz'
) art ON trim(a.CLAR_ARTICO) = art.AI_CODIREST
JOIN classificazioni b ON a.CLAR_CLASSI = b.CLA_CODICE
WHERE b.CLA_CODICE LIKE 'AA51__'
group by CLAR_ARTICO) art ON trim(CLAR_ARTICO) = concat(art.AI_CAPOCODI, art.ai_codirest)
JOIN classificazioni b ON a.CLAR_CLASSI = b.CLA_CODICE
WHERE CLAR_AZIENDA = 'SRL' AND CLAR_CLASSI like 'CO____'
The run time is 16 seconds. The time increases to 16 seconds when I join with classificazioni.
Can you help me? Thanks
Introduce the following indexes using the queries below; after that, the query should start running within a second or two:
ALTER TABLE articoli ADD INDEX idx_artc_az_cat_flg_cap (AI_AZIENDA, AI_FLAG_NOW, AI_CAPOCODI, AI_CATEMERC);
The above query introduces a multi-column index on the articoli table. An index works much like a hash table or an array key: it lets the engine locate the matching rows directly instead of scanning. A multi-column index means fewer rows have to be compared.
Do not use trim(a.CLAR_ARTICO): make sure the values are trimmed before insertion, not at join time. Wrapping the column in a function prevents the index from being used, and the join comparison becomes expensive.
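If the existing rows might already contain stray spaces, one possible one-off cleanup (assuming you are allowed to rewrite the column in place) is:

UPDATE clasart SET CLAR_ARTICO = TRIM(CLAR_ARTICO);

After that the join can compare the raw column and still use an index.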
Let's move on to the next steps:
Introduce an index on clar_azienda using the following query:
ALTER TABLE clasart ADD INDEX idx_cls_az (clar_azienda);
If art.AI_CODIREST is not a primary/foreign key, you'll need to introduce an index on it as well. The join to classificazioni also needs an index on CLA_CODICE, which you can create with the query below:
ALTER TABLE classificazioni ADD INDEX idx_clsi_cd (CLA_CODICE);
We are almost done; you'll just need to index CLAR_CLASSI the same way I indexed the columns above. Let me also explain what each part of the index query means, so you can write your own:
ALTER TABLE <tableName> ADD INDEX <indexName> (<column to be indexed>);
Let me know if you still have issues. Remember, you can run these queries after selecting your database in phpMyAdmin (the SQL tab) or on the mysql console.
I want to get data that is separated on three tables:
app_android_devices:
id | associated_user_id | registration_id
app_android_devices_settings:
owner_id | is_user_id | notifications_receive | notifications_likes_only
app_android_devices_favorites:
owner_id | is_user_id | image_id
owner_id is either the id from app_android_devices or the associated_user_id, indicated by is_user_id.
That is because the user of my app should be able to log in to their account or use the app anonymously. If the user is logged in, they will have the same settings and likes on all devices.
associated_user_id is 0 if the device is used anonymously, or it holds the user ID from another table.
Now I've got the following query:
SELECT registration_id
FROM app_android_devices d
JOIN app_android_devices_settings s
ON ((d.id=s.owner_id AND
s.is_user_id=0)
OR (
d.associated_user_id=s.owner_id AND
s.is_user_id=1))
JOIN app_android_devices_favorites f
ON (((d.id=f.owner_id AND
f.is_user_id=0)
OR
d.associated_user_id=f.owner_id AND
f.is_user_id=1)
AND f.image_id=86)
WHERE s.notifications_receive=1
AND (s.notifications_likes_only=0 OR f.image_id=86);
This query decides whether the device should receive a push notification on a new comment. I've set the following keys:
app_android_devices: id PRIMARY, associated_user_id
app_android_devices_settings: (owner_id, is_user_id) UNIQUE, notifications_receive, notifications_likes_only
app_android_devices_favorites: (owner_id, is_user_id, image_id) UNIQUE
I've noticed that the above query is really slow. If I run EXPLAIN on that query I see that MySQL is using no keys at all, although there are possible_keys listed.
What can I do to speed this query up?
Having such complicated JOIN conditions makes life hard for everyone. It makes life hard for the developer who wants to understand your query, and for the query optimizer that wants to give you exactly what you ask for while preferring more efficient operations.
So the first thing that I want to do, when you tell me that this query is slow and not using any index, is to take it apart and put it back together with simpler JOIN conditions.
From the way you describe this query, it sounds like the is_user_id column is a sort of state variable telling you whether the user is or is not logged in to your app. This is awkward to say the least; what happens if s.is_user_id != f.is_user_id? Why store this in both tables? For that matter, why store this in your database at all, instead of in a cookie?
Perhaps there's something I'm not understanding about the functionality you're going for here. In any case, the first thing I see that I want to get rid of is the OR in your JOIN conditions. I'm going to try to avoid making too many assumptions about which values in your query represent user input; here's a slightly generic example of how you might be able to rewrite these JOIN conditions as a UNION of two SELECT statements:
SELECT ... FROM
app_android_devices d
JOIN
app_android_devices_settings s ON d.id = s.owner_id
JOIN
app_android_devices_favorites f ON d.id = f.owner_id
WHERE s.is_user_id = 0 AND f.is_user_id = 0 AND ...
UNION ALL
SELECT ... FROM
app_android_devices d
JOIN
app_android_devices_settings s ON d.associated_user_id = s.owner_id
JOIN
app_android_devices_favorites f ON d.associated_user_id = f.owner_id
WHERE s.is_user_id = 1 AND f.is_user_id = 1 AND ...
If these two queries hit your indexes and are very selective, you might not notice the additional overhead (creation of a temporary table) required by the UNION operation. It looks as though one of your result sets may even be empty, in which case the cost of the UNION should be nil.
But, maybe this doesn't work for you; here's another suggestion for an optimization you might pursue. In your original query, you have the following condition:
WHERE s.notifications_receive=1
AND (s.notifications_likes_only=0 OR f.image_id=86);
This isn't too cryptic - you want results only when the notifications_receive setting is true, and only if the notifications_likes_only setting is false or the requested image is a "favorite" image. Depending on the state of notifications_likes_only, it looks like you may not even care about the favorites table - wouldn't it be nice to avoid even reading from that table unless absolutely necessary?
This looks like a good case for EXISTS(). Instead of joining app_android_devices_favorites, try using a condition like this:
WHERE s.notifications_receive = 1
AND (s.notifications_likes_only = 0
OR EXISTS(SELECT 1 FROM app_android_devices_favorites
WHERE image_id = 86 AND owner_id = s.owner_id))
It doesn't matter what you try to SELECT in an EXISTS() subquery; some people prefer *, I like 1, but even if you gave specific columns it wouldn't affect the execution plan.
I have a table of widgets that looks like:
id (integer)
referrer (varchar(255))
width (integer)
height (integer)
... and some others.
I also have a table of events that look like:
id (integer)
widgetid (integer)
eventtype (string)
created_at (datetime)
... and some others.
I'm looking to get a sample table of data that shows, for each widget, its details together with counts of related events for certain event types (once for event types A, B and C together, and once for event type A only).
I need to use non-vendor-specific (i.e. ANSI) SQL for this; it needs to work on both PostgreSQL and MySQL.
I'm trying something akin to this:
SELECT w.id, w.referrer, w.width, w.height, COUNT(e.widgetid), COUNT(f.widgetid)
FROM widgets w
JOIN events e on (e.widgetid = w.id AND e.eventtype = 'A')
JOIN events f on (f.widgetid = w.id AND f.eventtype IN ('A','B','C'))
GROUP BY w.id;
but it's incredibly slow (naturally).
There are indexes on e.widgetid, e.eventtype and w.id.
Am I structuring this right, and how may I make this faster (indexing on the widgetid of course notwithstanding)?
I thought of doing subqueries, but without knowing the widget ID for each row (is there a function for that?) I haven't got very far.
I'm also not entirely sure which JOIN I should be using either. I think (but correct me if I'm wrong) that a LEFT or INNER JOIN would be appropriate for this.
Cheers
Your JOIN is slow because you either don't have indexes, or you have indexes but not on the values you are JOINing on.
Add a composite index on the events table covering widgetid and eventtype and I assure you it will show a substantial speed increase.
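A sketch of such an index, using syntax that works on both MySQL and PostgreSQL (the index name is just a placeholder):

CREATE INDEX idx_events_widget_type ON events (widgetid, eventtype);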
So I have a 560 MB db with the largest table 500 MB (over 10 million rows).
My query has to join 5 tables and takes about 10 seconds to finish...
SELECT DISTINCT trips.tripid AS tripid,
stops.stopdescrption AS "perron",
Date_format(segments.segmentstart, "%H:%i") AS "time",
Date_format(trips.tripend, "%H:%i") AS "arrival",
Upper(routes.routepublicidentifier) AS "lijn",
plcend.placedescrption AS "destination"
FROM calendar
JOIN trips
ON calendar.vsid = trips.vsid
JOIN routes
ON routes.routeid = trips.routeid
JOIN places plcstart
ON plcstart.placeid = trips.placeidstart
JOIN places plcend
ON plcend.placeid = trips.placeidend
JOIN segments
ON segments.tripid = trips.tripid
JOIN stops
ON segments.stopid = stops.stopid
WHERE stops.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
AND calendar.vscdate = Str_to_date('25-10-2011', "%e-%c-%Y")
AND segments.segmentstart >= Str_to_date('15:56', "%H:%i")
AND routes.routeservicetype = 0
AND segments.segmentstart > "00:00:00"
ORDER BY segments.segmentstart
What are things I can do to speed this up? Any tips are welcome, I'm pretty new to SQL...
but I can't change the structure of the db because it's not mine...
Use EXPLAIN to find the bottlenecks: http://dev.mysql.com/doc/refman/5.0/en/explain.html
Then, perhaps, add indexes.
If you don't need to select ALL rows, use LIMIT to limit returned result count.
Just looking at the query, I would say that you should make sure that you have indexes on trips.vsid, calendar.vscdate, segments.segmentstart and routes.routeservicetype. I assume that there are already indexes on all the primary keys in the tables.
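As a sketch, assuming none of these columns are indexed yet (the index names are arbitrary):

ALTER TABLE trips ADD INDEX idx_trips_vsid (vsid);
ALTER TABLE calendar ADD INDEX idx_calendar_vscdate (vscdate);
ALTER TABLE segments ADD INDEX idx_segments_start (segmentstart);
ALTER TABLE routes ADD INDEX idx_routes_servicetype (routeservicetype);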
Using EXPLAIN as Briedis suggested would show you how well the indexes work.
You might want to add covering indexes for some tables, for example an index on trips.vsid that also includes tripid and routeid. That way the database can use only the index for the data that is needed from the table, and not read from the actual table.
Edit:
The execution plan tells you that it successfully uses indexes for everything except the segments table, where it does a table scan and filters by the where condition. You should try to make a covering index for segments.segmentstart by including tripid and stopid.
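For example (the index name is just a placeholder):

ALTER TABLE segments ADD INDEX idx_segments_start_cover (segmentstart, tripid, stopid);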
Try adding a clustered index to the routes table on both routeservicetype and routeid.
Depending on the frequency of the data within the routeservicetype field, you may get an improvement by shrinking the amount of data being compared in the join to the trips table.
Looking at the explain plan, you may also want to force the sequence of the table usage by using STRAIGHT_JOIN instead of JOIN (or INNER JOIN), as I've had real improvements with this technique.
Essentially, put the table with the smallest row-count of extracted data at the beginning of the query, and the largest row-count table at the end (in this case possibly the segments table?), with the exception of simple lookups (e.g. for descriptions).
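A minimal sketch of that idea, reusing the FROM clause from the question and forcing segments to be joined after the tables to its left (the SELECT list and WHERE clause stay unchanged):

FROM calendar
JOIN trips ON calendar.vsid = trips.vsid
JOIN routes ON routes.routeid = trips.routeid
JOIN places plcstart ON plcstart.placeid = trips.placeidstart
JOIN places plcend ON plcend.placeid = trips.placeidend
STRAIGHT_JOIN segments ON segments.tripid = trips.tripid
JOIN stops ON segments.stopid = stops.stopid

Alternatively, putting STRAIGHT_JOIN directly after SELECT forces every table to be read in exactly the order it is written.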
You may also consider altering the WHERE clause to filter the segments table on stopid instead of the stops table, and creating a clustered index on the segments table on (stopid, tripid and segmentstart) - this index will effectively be able to satisfy two joins and two WHERE clauses on its own...
To build the index...
ALTER TABLE segments ADD INDEX idx_qry_helper ( stopid, tripid, segmentstart );
And the altered WHERE clause...
WHERE segments.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
:
:
At the end of the day, a 10 second response for what appears to be a complex query on a fairly large dataset, isn't all that bad!