I want to get data that is spread across three tables:
app_android_devices:
id | associated_user_id | registration_id
app_android_devices_settings:
owner_id | is_user_id | notifications_receive | notifications_likes_only
app_android_devices_favorites:
owner_id | is_user_id | image_id
owner_id is either the id from app_android_devices or the associated_user_id, indicated by is_user_id.
That is because users of my app should be able to log in to their account or use the app anonymously. If a user is logged in, they have the same settings and likes on all devices.
associated_user_id is 0 if the device is used anonymously; otherwise it holds the user's ID from another table.
Now I've got the following query:
SELECT registration_id
FROM app_android_devices d
JOIN app_android_devices_settings s
  ON ((d.id = s.owner_id AND s.is_user_id = 0)
      OR (d.associated_user_id = s.owner_id AND s.is_user_id = 1))
JOIN app_android_devices_favorites f
  ON (((d.id = f.owner_id AND f.is_user_id = 0)
       OR (d.associated_user_id = f.owner_id AND f.is_user_id = 1))
      AND f.image_id = 86)
WHERE s.notifications_receive = 1
  AND (s.notifications_likes_only = 0 OR f.image_id = 86);
I use this query to decide whether a device should receive a push notification for a new comment. I've set the following keys:
app_android_devices: id PRIMARY, associated_user_id
app_android_devices_settings: (owner_id, is_user_id) UNIQUE, notifications_receive, notifications_likes_only
app_android_devices_favorites: (owner_id, is_user_id, image_id) UNIQUE
I've noticed that the above query is really slow. When I run EXPLAIN on it, I see that MySQL uses no keys at all, although possible_keys are listed.
What can I do to speed this query up?
Having such complicated JOIN conditions makes life hard for everyone. It makes life hard for the developer who wants to understand your query, and for the query optimizer that wants to give you exactly what you ask for while preferring more efficient operations.
So the first thing that I want to do, when you tell me that this query is slow and not using any index, is to take it apart and put it back together with simpler JOIN conditions.
From the way you describe this query, it sounds like the is_user_id column is a sort of state variable telling you whether the user is or is not logged in to your app. This is awkward to say the least; what happens if s.is_user_id != f.is_user_id? Why store this in both tables? For that matter, why store this in your database at all, instead of in a cookie?
Perhaps there's something I'm not understanding about the functionality you're going for here. In any case, the first thing I see that I want to get rid of is the OR in your JOIN conditions. I'm going to try to avoid making too many assumptions about which values in your query represent user input; here's a slightly generic example of how you might be able to rewrite these JOIN conditions as a UNION of two SELECT statements:
SELECT ... FROM
app_android_devices d
JOIN
app_android_devices_settings s ON d.id = s.owner_id
JOIN
app_android_devices_favorites f ON d.id = f.owner_id
WHERE s.is_user_id = 0 AND f.is_user_id = 0 AND ...
UNION ALL
SELECT ... FROM
app_android_devices d
JOIN
app_android_devices_settings s ON d.associated_user_id = s.owner_id
JOIN
app_android_devices_favorites f ON d.associated_user_id = f.owner_id
WHERE s.is_user_id = 1 AND f.is_user_id = 1 AND ...
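To check that the split does not change the result, here is a small sketch that uses Python's built-in sqlite3 module rather than MySQL, on made-up rows (the data and the function name `compare_shapes` are hypothetical). It runs the original OR-style join and a UNION ALL rewrite where the `...` is filled in from the original WHERE clause, and returns both sorted result lists. The UNION ALL branches leave out `s.notifications_likes_only = 0 OR f.image_id = 86`, because the inner join's `f.image_id = 86` already makes that test true.

```python
import sqlite3

# Hypothetical toy data in SQLite (not MySQL), shaped like the tables in the question.
SCHEMA = """
CREATE TABLE app_android_devices (
    id INTEGER PRIMARY KEY, associated_user_id INTEGER, registration_id TEXT);
CREATE TABLE app_android_devices_settings (
    owner_id INTEGER, is_user_id INTEGER,
    notifications_receive INTEGER, notifications_likes_only INTEGER,
    UNIQUE (owner_id, is_user_id));
CREATE TABLE app_android_devices_favorites (
    owner_id INTEGER, is_user_id INTEGER, image_id INTEGER,
    UNIQUE (owner_id, is_user_id, image_id));
-- devices 1 and 4 are anonymous; devices 2 and 3 belong to user 7
INSERT INTO app_android_devices VALUES
    (1, 0, 'regA'), (2, 7, 'regB'), (3, 7, 'regC'), (4, 0, 'regD');
INSERT INTO app_android_devices_settings VALUES (1, 0, 1, 1), (7, 1, 1, 0), (4, 0, 1, 1);
INSERT INTO app_android_devices_favorites VALUES (1, 0, 86), (7, 1, 86);
"""

# The original shape: OR inside both JOIN conditions.
OR_JOIN = """
SELECT registration_id FROM app_android_devices d
JOIN app_android_devices_settings s
  ON (d.id = s.owner_id AND s.is_user_id = 0)
  OR (d.associated_user_id = s.owner_id AND s.is_user_id = 1)
JOIN app_android_devices_favorites f
  ON ((d.id = f.owner_id AND f.is_user_id = 0)
      OR (d.associated_user_id = f.owner_id AND f.is_user_id = 1))
  AND f.image_id = ?
WHERE s.notifications_receive = 1
  AND (s.notifications_likes_only = 0 OR f.image_id = ?)
"""

# The rewrite: one branch per owner type, each with a plain equality join.
UNION_ALL = """
SELECT registration_id FROM app_android_devices d
JOIN app_android_devices_settings s ON d.id = s.owner_id
JOIN app_android_devices_favorites f ON d.id = f.owner_id
WHERE s.is_user_id = 0 AND f.is_user_id = 0 AND f.image_id = ?
  AND s.notifications_receive = 1
UNION ALL
SELECT registration_id FROM app_android_devices d
JOIN app_android_devices_settings s ON d.associated_user_id = s.owner_id
JOIN app_android_devices_favorites f ON d.associated_user_id = f.owner_id
WHERE s.is_user_id = 1 AND f.is_user_id = 1 AND f.image_id = ?
  AND s.notifications_receive = 1
"""

def compare_shapes(image_id=86):
    """Run both query shapes on the toy data and return their sorted results."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        or_rows = sorted(r[0] for r in con.execute(OR_JOIN, (image_id, image_id)))
        union_rows = sorted(r[0] for r in con.execute(UNION_ALL, (image_id, image_id)))
        return or_rows, union_rows
    finally:
        con.close()
```

On this data both shapes return the same three devices. The rewrite assumes the settings and favorites rows for a device always share the same is_user_id, as the answer discusses.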
If these two queries hit your indexes and are very selective, you might not notice the additional overhead (creation of a temporary table) required by the UNION operation. It looks as though one of your result sets may even be empty, in which case the cost of the UNION should be nil.
But, maybe this doesn't work for you; here's another suggestion for an optimization you might pursue. In your original query, you have the following condition:
WHERE s.notifications_receive=1
AND (s.notifications_likes_only=0 OR f.image_id=86);
This isn't too cryptic - you want results only when the notifications_receive setting is true, and only if the notifications_likes_only setting is false or the requested image is a "favorite" image. Depending on the state of notifications_likes_only, it looks like you may not even care about the favorites table - wouldn't it be nice to avoid even reading from that table unless absolutely necessary?
This looks like a good case for EXISTS(). Instead of joining app_android_devices_favorites, try using a condition like this:
WHERE s.notifications_receive = 1
AND (s.notifications_likes_only = 0
     OR EXISTS(SELECT 1 FROM app_android_devices_favorites
               WHERE image_id = 86
                 AND owner_id = s.owner_id
                 AND is_user_id = s.is_user_id))
It doesn't matter what you try to SELECT in an EXISTS() subquery; some people prefer *, I like 1, but even if you gave specific columns it wouldn't affect the execution plan.
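Here is a runnable sketch of the EXISTS() version, again in SQLite through Python's sqlite3 module on hypothetical rows (the data and the `notify_targets` name are illustrative only). The favorites table is consulted only for devices whose notifications_likes_only setting is on, and the subquery is correlated on both owner_id and is_user_id so it can use the (owner_id, is_user_id, image_id) unique key.

```python
import sqlite3

# Hypothetical toy data (SQLite, not MySQL), shaped like the tables in the question.
SCHEMA = """
CREATE TABLE app_android_devices (
    id INTEGER PRIMARY KEY, associated_user_id INTEGER, registration_id TEXT);
CREATE TABLE app_android_devices_settings (
    owner_id INTEGER, is_user_id INTEGER,
    notifications_receive INTEGER, notifications_likes_only INTEGER,
    UNIQUE (owner_id, is_user_id));
CREATE TABLE app_android_devices_favorites (
    owner_id INTEGER, is_user_id INTEGER, image_id INTEGER,
    UNIQUE (owner_id, is_user_id, image_id));
-- devices 1 and 4 are anonymous; 2 and 3 belong to user 7; 5 belongs to user 9
INSERT INTO app_android_devices VALUES
    (1, 0, 'regA'), (2, 7, 'regB'), (3, 7, 'regC'), (4, 0, 'regD'), (5, 9, 'regE');
INSERT INTO app_android_devices_settings VALUES
    (1, 0, 1, 1), (7, 1, 1, 0), (4, 0, 1, 1), (9, 1, 1, 0);
INSERT INTO app_android_devices_favorites VALUES (1, 0, 86), (7, 1, 86);
"""

QUERY = """
SELECT d.registration_id
FROM app_android_devices d
JOIN app_android_devices_settings s
  ON (d.id = s.owner_id AND s.is_user_id = 0)
  OR (d.associated_user_id = s.owner_id AND s.is_user_id = 1)
WHERE s.notifications_receive = 1
  AND (s.notifications_likes_only = 0
       OR EXISTS (SELECT 1 FROM app_android_devices_favorites f
                  WHERE f.image_id = ?
                    AND f.owner_id = s.owner_id
                    AND f.is_user_id = s.is_user_id))
ORDER BY d.registration_id
"""

def notify_targets(image_id):
    """Return the registration_ids that should be notified about a comment on image_id."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return [row[0] for row in con.execute(QUERY, (image_id,))]
    finally:
        con.close()
```

Unlike the original inner join on favorites, this form also returns devices that want all notifications but have no favorite row for the image (regE here).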
Related
SELECT COUNT(DISTINCT r.id)
FROM views v
INNER JOIN emails e ON v.email_id = e.id
INNER JOIN recipients r ON e.recipient_id = r.id
INNER JOIN campaigns c ON e.campaign_id = c.id
WHERE c.centre_id IS NULL;
... or, "how many unique email opens have we had (on general campaigns)?"
Currently takes about a minute and a half to run on an Amazon RDS instance. Total rows for the tables involved are roughly:
campaigns: 250
recipients: 330,000
views: 530,000
emails: 1,380,000
EXPLAIN gives me:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | r | index | PRIMARY | UNIQ_146632C4E7927C74 | 767 | NULL | 329196 | Using index
1 | SIMPLE | e | ref | PRIMARY,IDX_4C81E852E92F8F78,IDX_4C81E852F639F774 | IDX_4C81E852E92F8F78 | 111 | ecomms.r.id | 1 | Using where
1 | SIMPLE | v | ref | IDX_11F09C87A832C1C9 | IDX_11F09C87A832C1C9 | 111 | ecomms.e.id | 1 | Using where; Using index
1 | SIMPLE | c | eq_ref | PRIMARY,IDX_E3737470463CD7C3 | PRIMARY | 110 | ecomms.e.campaign_id | 1 | Using where
What can I do to get this total faster?
You need to join recipients only if you are not enforcing a foreign key constraint between recipients.id and emails.recipient_id, and you want to exclude recipients who are no longer listed in the recipients table. Otherwise, omit that table from the join straight away; you can use emails.recipient_id instead of recipients.id. Omitting that join should be a big win.
Alternatively, omit recipients from the join on the basis that it is not relevant to the question posed, which is about unique emails opened, not about unique recipients who opened any email. In that case you can count opened emails directly, e.g. SELECT COUNT(DISTINCT v.email_id) FROM ..., or use a plain SELECT COUNT(*) FROM ... if views holds at most one row per email.
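The three counts discussed above can be compared on a tiny made-up dataset. This sketch uses SQLite through Python's sqlite3 module, with integer ids instead of whatever key type the real tables use; the rows and the `run_counts` name are hypothetical. Dropping the recipients join gives the same answer only as long as every emails.recipient_id exists in recipients, which is what a foreign key guarantees.

```python
import sqlite3

# Hypothetical miniature of the email schema (SQLite, integer ids for brevity).
SCHEMA = """
CREATE TABLE campaigns  (id INTEGER PRIMARY KEY, centre_id INTEGER);
CREATE TABLE recipients (id INTEGER PRIMARY KEY);
CREATE TABLE emails     (id INTEGER PRIMARY KEY,
                         recipient_id INTEGER REFERENCES recipients(id),
                         campaign_id  INTEGER REFERENCES campaigns(id));
CREATE TABLE views      (id INTEGER PRIMARY KEY, email_id INTEGER REFERENCES emails(id));
INSERT INTO campaigns  VALUES (1, NULL), (2, 5);   -- campaign 1 is a general campaign
INSERT INTO recipients VALUES (10), (11), (12);
INSERT INTO emails     VALUES (100, 10, 1), (101, 11, 1), (102, 12, 2), (103, 10, 1);
INSERT INTO views      VALUES (1, 100), (2, 100), (3, 103), (4, 102);
"""

QUERIES = {
    # the original query: recipients is joined only to read r.id
    "recipients_with_join": """
        SELECT COUNT(DISTINCT r.id) FROM views v
        JOIN emails e ON v.email_id = e.id
        JOIN recipients r ON e.recipient_id = r.id
        JOIN campaigns c ON e.campaign_id = c.id
        WHERE c.centre_id IS NULL""",
    # same number without touching recipients (valid while the FK holds)
    "recipients_without_join": """
        SELECT COUNT(DISTINCT e.recipient_id) FROM views v
        JOIN emails e ON v.email_id = e.id
        JOIN campaigns c ON e.campaign_id = c.id
        WHERE c.centre_id IS NULL""",
    # unique emails opened, rather than unique recipients
    "distinct_emails_opened": """
        SELECT COUNT(DISTINCT v.email_id) FROM views v
        JOIN emails e ON v.email_id = e.id
        JOIN campaigns c ON e.campaign_id = c.id
        WHERE c.centre_id IS NULL""",
}

def run_counts():
    """Return each named count computed on the toy data."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return {name: con.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}
    finally:
        con.close()
```

Recipient 10 opened two general-campaign emails (one of them twice), so the recipient counts are 1 while the distinct-email count is 2.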
Other than that, it looks like you're already getting good use of your indexes, though I confess I find EXPLAIN output difficult to read. Still, every table is accessed through an index, and recipients and views are read from the index alone ("Using index"), so it's unlikely that adding new indexes would help much.
You could try executing an OPTIMIZE TABLE on the tables involved in your query, though that probably sounds more hopeful than it should.
You should periodically run ANALYZE TABLE on the tables involved in this query, to give the query optimizer the greatest likelihood of choosing the best possible plan. It looks like the optimizer is already choosing a reasonable plan, though, so this may not help much.
If you still need better performance then there are other possibilities (including moving to faster hardware), but they are too numerous to discuss here.
You want MySQL to be able to utilize the WHERE clause to limit the result set immediately. In order to do that, you need the proper indexes to join from campaigns to emails, then from emails to recipients and views.
Put an index on campaigns.centre_id to aid the search (satisfy the WHERE clause). I'm assuming campaigns.id is the primary key on that table.
Put an index on emails.campaign_id to aid the join to emails from campaigns. Add recipient_id and id (the column that views.email_id references) to that index to make it a covering index.
Now, the EXPLAIN result should show the tables in order, starting from campaigns, then emails, then the other two. MySQL will still need an internal temporary table to apply the DISTINCT. Are you sure you need that?
I'm assuming emails.id and recipients.id are the primary keys.
I have three different tables - subscribers, unsubscribers, mass subscribers.
I'd like to print out each email from the mass subscribers table. However, an email may only be printed if it exists in neither the subscribers table nor the unsubscribers table.
I know how to do this with arrays, but I want a plain MySQL query.
What would the MySQL query be?
Thanks!
You can do that with a subquery (this is slow! Please read below the line):
SELECT email
FROM mass_subscribers
WHERE email NOT IN (SELECT email FROM subscribers)
  AND email NOT IN (SELECT email FROM unsubscribers)
However, this is very bad for performance. I suggest you restructure your database to use just one subscribers table with an extra active (TINYINT) column. When someone unsubscribes, you set that value from 1 to 0. After that you can stay within one table:
SELECT email FROM subscribers WHERE active=1
This is faster for several reasons:
No subquery
The NOT IN subquery is bad, because it selects a heap of data and compares strings
Filtering on an integer is VERY fast (especially when the column is indexed)
Apart from being faster, this is a better database structure. You don't want two tables doing almost the same thing with email addresses; that creates duplicate data and a chance for the tables to get out of sync.
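A minimal sketch of the single-table layout suggested above, run in SQLite through Python's sqlite3 module. The index name, the sample addresses and the `demo` function are hypothetical; the point is that unsubscribing becomes an UPDATE of the flag rather than a row in a second table.

```python
import sqlite3

# Hypothetical single-table layout: one row per address plus an active flag.
SCHEMA = """
CREATE TABLE subscribers (
    email  TEXT PRIMARY KEY,
    active INTEGER NOT NULL DEFAULT 1   -- 1 = subscribed, 0 = unsubscribed
);
CREATE INDEX idx_subscribers_active ON subscribers (active);
"""

def demo():
    """Subscribe three addresses, unsubscribe one, and list the active ones."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO subscribers (email) VALUES (?)",
                        [("a@example.com",), ("b@example.com",), ("c@example.com",)])
        # unsubscribing flips the flag instead of copying the row elsewhere
        con.execute("UPDATE subscribers SET active = 0 WHERE email = ?", ("b@example.com",))
        return [row[0] for row in con.execute(
            "SELECT email FROM subscribers WHERE active = 1 ORDER BY email")]
    finally:
        con.close()
```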
You sound like someone who doesn't have much experience with SQL. Your title does point in the right direction. Here is how you put the components together:
select m.*
from mass_subscribers m
where not exists (select 1 from subscribers s where s.email = m.email) and
not exists (select 1 from unsubscribers u where u.email = m.email);
NOT EXISTS happens to be a very good choice for this type of query; it is typically pretty efficient in both MySQL and other databases.
Without subqueries, using LEFT JOIN:
SELECT ms.*
FROM mass_subscribers ms
LEFT JOIN subscribers s ON s.email = ms.email
LEFT JOIN unsubscribers us ON us.email = ms.email
WHERE
s.email IS NULL
AND
us.email IS NULL
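Both anti-join forms can be checked against each other with a quick sketch in SQLite via Python's sqlite3 module (the sample addresses and the `both_anti_joins` name are hypothetical). Note that the second LEFT JOIN has to match on ms.email: matching on s.email would never find a row, because s.email is NULL for exactly the rows we want to keep.

```python
import sqlite3

# Hypothetical sample data: a@ is already subscribed, c@ has unsubscribed.
SCHEMA = """
CREATE TABLE mass_subscribers (email TEXT);
CREATE TABLE subscribers      (email TEXT);
CREATE TABLE unsubscribers    (email TEXT);
INSERT INTO mass_subscribers VALUES
    ('a@example.com'), ('b@example.com'), ('c@example.com'), ('d@example.com');
INSERT INTO subscribers   VALUES ('a@example.com');
INSERT INTO unsubscribers VALUES ('c@example.com');
"""

NOT_EXISTS = """
SELECT m.email FROM mass_subscribers m
WHERE NOT EXISTS (SELECT 1 FROM subscribers s WHERE s.email = m.email)
  AND NOT EXISTS (SELECT 1 FROM unsubscribers u WHERE u.email = m.email)
ORDER BY m.email"""

LEFT_JOIN = """
SELECT ms.email FROM mass_subscribers ms
LEFT JOIN subscribers s    ON s.email  = ms.email
LEFT JOIN unsubscribers us ON us.email = ms.email
WHERE s.email IS NULL AND us.email IS NULL
ORDER BY ms.email"""

def both_anti_joins():
    """Return the results of the NOT EXISTS and LEFT JOIN forms."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return ([r[0] for r in con.execute(NOT_EXISTS)],
                [r[0] for r in con.execute(LEFT_JOIN)])
    finally:
        con.close()
```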
What is faster:
Is it faster to use a join to get user details for posts, or to fetch only the post data (which includes the user ID), collect the user IDs, and, after the posts are queried, run one:
SELECT x,y,z FROM users WHERE id in (1,2,3,4,5,6,7...etc.)
In short, which is better:
SELECT x,y,z,userid
FROM posts
WHERE id > n
ORDER BY id
LIMIT 20
SELECT x,y,z
FROM users
WHERE id IN (1,2,3,4,5,6,7...etc.)
or:
SELECT p.x,p.y,p.z, u.username,u.useretc,u.user.etc
FROM posts p
INNER JOIN users u
ON u.id = p.userid
AND p.id > n
ORDER BY p.id
LIMIT 20
In some scenarios this could reduce the lookups against the users table to 2 instead of 20, for example on a discussion page where only two users posted.
In any case, the second way is better:
You make only one call to the database instead of two, so the channel between your DB and application server is less loaded
The second way should usually be faster and use less memory, because the optimizer can better decide how to manage its resources (it has all the requirements in one query)
In the first example you force the database to use uncached queries (the second query is not constant, because the IN list has a different number of values each time), so it parses that query more often, which costs performance
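For comparison, here is a sketch of both approaches side by side, using SQLite through Python's sqlite3 module on a hypothetical posts/users pair (the columns body and username and the helper names are illustrative only). The two-query version builds one `?` placeholder per distinct author, which is why its SQL text varies from page to page; the join version is a single fixed statement.

```python
import sqlite3

def make_db():
    """Hypothetical posts/users tables with a few rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT,
                            userid INTEGER REFERENCES users(id));
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO posts VALUES
            (1, 'hi', 1), (2, 'yo', 2), (3, 'ok', 1), (4, 'no', 1), (5, 'eh', 2);
    """)
    return con

def two_queries(con, after_id, limit=20):
    """Fetch a page of posts, then the distinct authors with one IN (...) query."""
    posts = con.execute(
        "SELECT id, body, userid FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()
    user_ids = sorted({userid for _, _, userid in posts})  # distinct authors only
    if not user_ids:
        return []
    # one placeholder per id, so the SQL text changes with the number of authors
    placeholders = ",".join("?" * len(user_ids))
    names = dict(con.execute(
        f"SELECT id, username FROM users WHERE id IN ({placeholders})", user_ids))
    return [(pid, body, names[uid]) for pid, body, uid in posts]

def one_join(con, after_id, limit=20):
    """Fetch the same page with a single JOIN."""
    return con.execute(
        "SELECT p.id, p.body, u.username FROM posts p "
        "JOIN users u ON u.id = p.userid "
        "WHERE p.id > ? ORDER BY p.id LIMIT ?",
        (after_id, limit)).fetchall()
```

Both return the same rows here; which is faster in practice depends on the workload and is worth measuring rather than assuming.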
If I'm not wrong, an INNER JOIN is normally more readable and cleaner.
I would suggest the join query, for the following reasons:
It is cleaner and more readable.
The join hits the DB only once, which will be fast. Otherwise you have to collect the details in a data structure and then use them again in another query.
The usual practice is a JOIN rather than separate queries, as it is more readable and easier to write.
But there are frameworks like http://www.notorm.com/#performance that leverage the first method (separate queries) and report impressive results.
I'm looking to optimize this query:
SELECT gwt.z, gwt.csp, gwt.status, gwt.cd, gwt.disp, gwt.`5d`, gwt.`6d`, gwt.si, gwt.siad, gwt.prbd,
       CONCAT(gwt.`1`, gwt.`2`, gwt.`3`, gwt.`4`, gwt.`5`, gwt.`6`, gwt.`7`, gwt.`8`, gwt.`9`),
       group_concat(gws.res order by line_no), gwt.scm, gm.me, gwt.p, gwt.scd
from gwt
left outer join gws on gwt.csp = gws.csp
left join gm on gwt.scm = gm.mid
where gwt.zone = 1
  and (status like '1%' or status like '2%' or status like '3%' or
       status like '4%' or status like '5%' or status like '6%')
group by gwt.csp
According to EXPLAIN, gwt has 4110 rows, gws has 920k rows, and gm has 2800 rows.
The query loaded fine when I was only querying status like 1%, but since I've added additional statuses to display, I get a timeout error.
I would suggest the following.
Be sure that each table has an index on what looks like its primary key:
gwt.csp
gm.mid
For gwt, create another index on (zone, status) and change the WHERE condition to:
gwt.zone = 1 and status >= '1' and status < '7'
This is equivalent to your list, but it will allow the execution engine to use an index.
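A quick way to convince yourself of the equivalence is to run both predicates over some awkward status values. This sketch uses SQLite (default BINARY collation) through Python's sqlite3 module, with hypothetical rows and helper names; it does not prove the same holds under every MySQL collation, only that the range matches the LIKE list for these values.

```python
import sqlite3

ROWS = [  # (csp, zone, status): hypothetical sample values, including edge cases
    (1, 1, "1"), (2, 1, "10A"), (3, 1, "2x"), (4, 1, "6"), (5, 1, "6zz"),
    (6, 1, "7"), (7, 1, "0"), (8, 1, "70"), (9, 1, "a1"), (10, 1, ""),
    (11, 1, None), (12, 2, "3"),
]

LIKE_LIST = """
SELECT csp FROM gwt
WHERE zone = 1
  AND (status LIKE '1%' OR status LIKE '2%' OR status LIKE '3%' OR
       status LIKE '4%' OR status LIKE '5%' OR status LIKE '6%')
ORDER BY csp"""

RANGE = """
SELECT csp FROM gwt
WHERE zone = 1 AND status >= '1' AND status < '7'
ORDER BY csp"""

def compare_predicates():
    """Return the csp values matched by the LIKE list and by the range predicate."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE gwt (csp INTEGER PRIMARY KEY, zone INTEGER, status TEXT)")
        # the (zone, status) index suggested above; the range form can seek on it
        con.execute("CREATE INDEX idx_gwt_zone_status ON gwt (zone, status)")
        con.executemany("INSERT INTO gwt VALUES (?, ?, ?)", ROWS)
        like_rows = [r[0] for r in con.execute(LIKE_LIST)]
        range_rows = [r[0] for r in con.execute(RANGE)]
        return like_rows, range_rows
    finally:
        con.close()
```

Values such as "7", "70", "a1", the empty string and NULL are rejected by both forms.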
That might be enough to fix the query. Finally, you can put an index on gws.csp, to see if that speeds things up.
Is "csp" a one-to-one relationship? You might have a problem with the query creating a giant result set, if it is not.
Since the gws table has two orders of magnitude more rows than the other tables, this is the one to focus on. If you want to design your index to target this particular query, then the first step is straightforward. Namely, you'll want to add an index on the joined column (gws.csp) and make sure to include all selected columns -- gws.res and gws.line_no(?) -- in the index.
The above should improve the speed of the query dramatically. A secondary concern would be to make sure that the gwt table has an index with status as the first column.
SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.is_active and s.public columns are indexed, as are u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.
Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.
Perhaps write it as a stored procedure or view. You could also try selecting all the IDs first and then running the count on the result; if you do it all as one query it may be faster. Generally, optimisation is done by using nested selects or making the server do the work, so in this context that is all I can think of.
SELECT COUNT(*) FROM
  (SELECT t.user_id FROM
     (SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) AS t
   JOIN user AS u
     ON (t.user_id = u.user_id)) AS x
Also be sure you are using the correct kind of join.