I'm trying to run
SELECT id, max(event_date) dt
FROM worker
LEFT JOIN events ON id = attendee
WHERE worker.type='CON'
GROUP BY id;
but the query takes much too long. Is there a way to make this faster?
worker and events are somewhat large (10,000 records) tables joined one-to-many. (Most workers have 0 associated events, but some have up to 10 or so.) But I wouldn't think this is so large that the query would fail. Ideas?
Posting the exact table schemas would allow for better analysis.
I'm guessing attendee and type are not indexed.
Try prefixing your query with "explain" (see http://dev.mysql.com/doc/refman/5.1/en/explain-output.html) for details on the execution plan.
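As a rough illustration of the two suggestions above (index the join and filter columns, then inspect the plan), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the statement is EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN. The column types, index names and sample rows are assumptions, since the real schema was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: the question does not show the real one.
    CREATE TABLE worker (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE events (attendee INTEGER, event_date TEXT);

    -- Hypothetical indexes on the filter column and the join column.
    CREATE INDEX idx_worker_type ON worker (type);
    CREATE INDEX idx_events_attendee ON events (attendee, event_date);

    INSERT INTO worker VALUES (1, 'CON'), (2, 'CON'), (3, 'EMP');
    INSERT INTO events VALUES (1, '2010-01-05'), (1, '2010-03-09'), (3, '2010-02-01');
""")

query = """
    SELECT worker.id, MAX(events.event_date) AS dt
    FROM worker
    LEFT JOIN events ON worker.id = events.attendee
    WHERE worker.type = 'CON'
    GROUP BY worker.id
    ORDER BY worker.id
"""

# Ask the engine how it intends to execute the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)

# Workers without events still appear, with dt set to NULL (None in Python).
rows = conn.execute(query).fetchall()
print(rows)
```

In MySQL the equivalent check is to run the same SELECT prefixed with EXPLAIN and look for a full table scan on events.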
Without a schema, this is forced to be a guess. But does the events table have both an id field and an attendee field?
Does being more explicit in your query help?
SELECT worker.id, max(events.event_date) dt
FROM worker
LEFT JOIN events ON worker.id = events.attendee
WHERE worker.type='CON'
GROUP BY worker.id;
(I've had to assume which tables each field associates to.)
I'm writing a query in mysql to join two tables. And both tables have more than 50,000 records.
Table EMP Columns
empid,
project,
code,
Status
Table EMPINFO
empid,
project,
code,
projecttype,
timespent,
skills
Each table has the candidate key [empid, project, code].
So I join the tables with an INNER JOIN, like this:
INNER JOIN
ON a.empid = b.empid
and a.project = b.project
and a.code = b.code
I'm getting the result, but if I add count(*) in an outer query to count the number of records, it takes a lot of time and sometimes the connection fails.
Is there any way to get the number of records faster?
I would also like to hear more suggestions for speeding up an inner join between tables that share the same candidate key.
INDEX(empid, project, code) -- in any order.
Are these tables 1:1? If so, why do the JOIN in order to do the COUNT?
Please provide SHOW CREATE TABLE. (If there are datatype differences, this could be a big problem.)
Please provide the actual SELECT.
How much RAM do you have? Please provide SHOW VARIABLES LIKE '%buffer%';.
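A minimal sketch of the composite-index suggestion, using Python's built-in sqlite3 module as a stand-in for MySQL. The column types, index names and sample rows are assumptions, and the indexes are declared UNIQUE only because the question describes [empid, project, code] as a candidate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column types: the question lists only column names.
    CREATE TABLE emp (empid INTEGER, project TEXT, code TEXT, status TEXT);
    CREATE TABLE empinfo (empid INTEGER, project TEXT, code TEXT,
                          projecttype TEXT, timespent INTEGER, skills TEXT);

    -- One composite index per table covering all three join columns.
    CREATE UNIQUE INDEX idx_emp_key ON emp (empid, project, code);
    CREATE UNIQUE INDEX idx_empinfo_key ON empinfo (empid, project, code);

    INSERT INTO emp VALUES
        (1, 'P1', 'A', 'active'), (2, 'P1', 'B', 'active'), (3, 'P2', 'A', 'left');
    INSERT INTO empinfo VALUES
        (1, 'P1', 'A', 'dev', 10, 'sql'), (2, 'P1', 'B', 'qa', 5, 'java');
""")

# Count matching rows directly instead of wrapping the join in an outer query.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM emp a
    INNER JOIN empinfo b
       ON a.empid = b.empid
      AND a.project = b.project
      AND a.code = b.code
""").fetchone()[0]
print(join_count)
```

If the tables really are 1:1, a plain SELECT COUNT(*) on either table gives the same number without any join.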
I have three different tables - subscribers, unsubscribers, mass subscribers.
I'd like to print out each email from the mass subscribers table. However that email can only be printed if it doesn't exist in both subscribers and unsubscribers tables.
I know how to do this with arrays, however I want a plain mysql query.
What would mysql query be?
Thanks!
You can do that with a subquery (this is slow! Please read below the line):
SELECT email
FROM subscribers
WHERE email NOT IN(SELECT email FROM unsubscribers)
However, this is very bad for performance. I suggest you change the database design to use just one table, subscribers, with an added column active (tinyint). When someone unsubscribes, you set that value from 1 to 0. After that you only need to query one table:
SELECT email FROM subscribers WHERE active=1
This is faster for a few reasons:
No subquery
The NOT IN condition is expensive, because it selects a large set of rows and compares strings
Selecting on an integer is VERY fast (especially when you index it)
Apart from being faster, this is also better for your database structure. You don't want two tables doing almost the same thing with email addresses. That creates duplicate data and a chance of the tables getting out of sync.
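The single-table design described above could look roughly like this. It is a sketch using Python's built-in sqlite3 module in place of MySQL; the index name and sample addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table; active = 1 means subscribed, 0 means unsubscribed.
    CREATE TABLE subscribers (
        email  TEXT PRIMARY KEY,
        active INTEGER NOT NULL DEFAULT 1
    );
    CREATE INDEX idx_subscribers_active ON subscribers (active);

    INSERT INTO subscribers (email) VALUES ('a@example.com'), ('b@example.com');
""")

# Unsubscribing flips the flag instead of moving the row to another table.
conn.execute("UPDATE subscribers SET active = 0 WHERE email = ?", ("b@example.com",))

active_emails = [row[0] for row in conn.execute(
    "SELECT email FROM subscribers WHERE active = 1 ORDER BY email"
)]
print(active_emails)
```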
You sound like someone who doesn't have much experience with SQL. Your title does point in the right direction. Here is how you put the components together:
select m.*
from mass_subscribers m
where not exists (select 1 from subscribers s where s.email = m.email) and
not exists (select 1 from unsubscribers u where u.email = m.email);
NOT EXISTS happens to be a very good choice for this type of query; it is typically pretty efficient in both MySQL and other databases.
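To see the NOT EXISTS form in action, here is a small sketch run against an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL). The single-column table layout and the sample addresses are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mass_subscribers (email TEXT);
    CREATE TABLE subscribers (email TEXT);
    CREATE TABLE unsubscribers (email TEXT);

    INSERT INTO mass_subscribers VALUES
        ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO subscribers VALUES ('a@example.com');
    INSERT INTO unsubscribers VALUES ('b@example.com');
""")

# Keep only the emails that appear in neither lookup table.
new_emails = [row[0] for row in conn.execute("""
    SELECT m.email
    FROM mass_subscribers m
    WHERE NOT EXISTS (SELECT 1 FROM subscribers s WHERE s.email = m.email)
      AND NOT EXISTS (SELECT 1 FROM unsubscribers u WHERE u.email = m.email)
    ORDER BY m.email
""")]
print(new_emails)
```

An index on the email column of subscribers and unsubscribers lets each subquery be answered with a lookup rather than a scan.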
Without subqueries, using join
SELECT ms.*
FROM mass_subscribers ms
LEFT JOIN subscribers s ON s.email = ms.email
LEFT JOIN unsubscribers us ON us.email = ms.email
WHERE
s.email IS NULL
AND
us.email IS NULL
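The anti-join pattern (LEFT JOIN each lookup table on the mass_subscribers email, then keep only the rows where both joined sides came back NULL) can be sketched as follows. Python's sqlite3 module stands in for MySQL, and the table layout and sample addresses are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mass_subscribers (email TEXT);
    CREATE TABLE subscribers (email TEXT);
    CREATE TABLE unsubscribers (email TEXT);

    INSERT INTO mass_subscribers VALUES
        ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO subscribers VALUES ('a@example.com');
    INSERT INTO unsubscribers VALUES ('b@example.com');
""")

# Both joins hang off ms.email; a NULL on the joined side means "no match".
unmatched = [row[0] for row in conn.execute("""
    SELECT ms.email
    FROM mass_subscribers ms
    LEFT JOIN subscribers s    ON s.email  = ms.email
    LEFT JOIN unsubscribers us ON us.email = ms.email
    WHERE s.email IS NULL
      AND us.email IS NULL
    ORDER BY ms.email
""")]
print(unmatched)
```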
What is faster:
Using a join to get the user details for posts, or fetching only the post data (which includes the userid), collecting the user IDs, and then running one query like this after the posts are fetched:
SELECT x,y,z FROM users WHERE id in (1,2,3,4,5,6,7...etc.)
Short:
What is better?:
SELECT x,y,z,userid
FROM posts
WHERE id > x
ORDER BY id
LIMIT 20
SELECT x,y,z
FROM users
WHERE id IN (1,2,3,4,5,6,7...etc.)
or:
SELECT p.x,p.y,p.z, u.username,u.useretc,u.user.etc
FROM posts p
INNER JOIN users u
ON u.id = p.userid
WHERE p.id > n
ORDER BY p.id
LIMIT 20
In some scenarios the first approach could reduce the number of user rows fetched from 20 to 2, for example a page in a discussion where only two users posted.
Anyway, the second way is better:
You make only one call to the database instead of two, so the channel between your DB and application server is less loaded
The second way should usually be faster and use less memory, because the query optimizer can decide better how to manage its resources (it has all the requirements in one query)
In the first example you force the database to run non-cached queries (the second query is not constant, because the IN list has a different number of values each time), so it parses that query more often, which costs performance
If I'm not wrong... normally dealing with INNER JOIN is more readable and cleaner.
I would suggest the join query, for the following reasons:
Cleaner and more readable.
The join hits the DB only once, which will be fast. Otherwise you have to collect the details in a data structure and then use them again in another query.
JOIN is usually preferred over separate queries because it is more readable and easier to write.
But there are frameworks like http://www.notorm.com/#performance which leverage the first method (separate queries) and have impressive results.
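For comparison, here is a sketch of both approaches side by side, run against an in-memory SQLite database via Python's sqlite3 module (a stand-in for MySQL). The column names body and username and the sample rows are made up for illustration. It only shows that the two approaches return the same data, not which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")

# Approach A: a single JOIN, one round trip.
joined = conn.execute("""
    SELECT p.id, p.body, u.username
    FROM posts p
    INNER JOIN users u ON u.id = p.userid
    WHERE p.id > ?
    ORDER BY p.id
    LIMIT 20
""", (0,)).fetchall()

# Approach B: fetch the posts, collect the distinct user IDs, then one IN query.
posts = conn.execute(
    "SELECT id, body, userid FROM posts WHERE id > ? ORDER BY id LIMIT 20", (0,)
).fetchall()
user_ids = sorted({userid for _, _, userid in posts})

# One placeholder per ID; an empty list would need special handling.
placeholders = ",".join("?" * len(user_ids))
names = dict(conn.execute(
    f"SELECT id, username FROM users WHERE id IN ({placeholders})", user_ids
).fetchall())

# Stitch the usernames back onto the posts in application code.
stitched = [(post_id, body, names[userid]) for post_id, body, userid in posts]
print(joined == stitched)
```

Approach B needs the extra stitching step in application code, which is part of why the answers above call the JOIN cleaner.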
I have a table of widgets that looks like:
id (integer)
referrer (varchar(255))
width (integer)
height (integer)
... and some others.
I also have a table of events that look like:
id (integer)
widgetid (integer)
eventtype (string)
created_at (datetime)
... and some others.
I'm looking to get a sample table of data that finds, for each widget, the details about itself and related events for certain event types (once for event types A, B and C, and once for event type A only).
I need to be using non-vendor-specific (i.e. ANSI SQL) for this, it needs to work both on PostgreSQL as well as MySQL.
I'm trying something akin to this, but it's very slow:
SELECT w.id, w.referrer, w.width, w.height, COUNT(e.widgetid), COUNT(f.widgetid)
FROM widgets w
JOIN events e on (e.widgetid = w.id AND e.eventtype = 'A')
JOIN events f on (f.widgetid = w.id AND f.eventtype IN ('A','B','C'))
GROUP BY w.id;
As expected, it's incredibly slow.
There are indexes on e.widgetid, e.eventtype and w.id.
Am I structuring this right, and how can I make this faster (indexing on the widgetid notwithstanding)?
I thought of doing subqueries, but without knowing the widget ID for each row (is there a function for that?) I haven't got very far.
I'm also not entirely sure which JOIN I should be using either. I think (but correct me if I'm wrong) that a LEFT or INNER JOIN would be appropriate for this.
Cheers
Your JOIN is slow because you don't have indexes, or you have indexes but not for the values you are JOINing with.
Add an index for id, widgetid and eventtype and I assure you it will show a substantial speed increase.
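Besides indexing, a different technique that was not proposed in the answer above may be worth trying: join events once and count each event class with a CASE expression, which avoids joining events to itself. The sketch below uses Python's sqlite3 module as a stand-in. The composite index name and the sample rows are assumptions, and every selected widget column is listed in GROUP BY so the statement stays portable across PostgreSQL and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widgets (id INTEGER PRIMARY KEY, referrer TEXT,
                          width INTEGER, height INTEGER);
    CREATE TABLE events (id INTEGER PRIMARY KEY, widgetid INTEGER,
                         eventtype TEXT, created_at TEXT);

    -- Hypothetical composite index covering the join and the event type.
    CREATE INDEX idx_events_widget_type ON events (widgetid, eventtype);

    INSERT INTO widgets VALUES (1, 'example.com', 100, 50), (2, 'example.org', 200, 80);
    INSERT INTO events (widgetid, eventtype, created_at) VALUES
        (1, 'A', '2011-01-01'), (1, 'A', '2011-01-02'), (1, 'B', '2011-01-03'),
        (1, 'D', '2011-01-04'), (2, 'C', '2011-01-05');
""")

# One pass over events; each SUM counts only the rows its CASE lets through.
rows = conn.execute("""
    SELECT w.id, w.referrer, w.width, w.height,
           SUM(CASE WHEN e.eventtype = 'A' THEN 1 ELSE 0 END) AS a_events,
           SUM(CASE WHEN e.eventtype IN ('A', 'B', 'C') THEN 1 ELSE 0 END) AS abc_events
    FROM widgets w
    LEFT JOIN events e ON e.widgetid = w.id
    GROUP BY w.id, w.referrer, w.width, w.height
    ORDER BY w.id
""").fetchall()
print(rows)
```

Joining events twice, as in the question, pairs every type-A row with every A/B/C row for the same widget, so both counts come out multiplied; the single join avoids that.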
I have a table structure like the following:
user
id
name
profile_stat
id
name
profile_stat_value
id
name
user_profile
user_id
profile_stat_id
profile_stat_value_id
My question is:
How do I evaluate a query where I want to find all users with profile_stat_id and profile_stat_value_id for many stats?
I've tried doing an inner self join, but that quickly gets crazy when searching for many stats. I've also tried doing a count on the actual user_profile table, and that's much better, but still slow.
Is there some magic I'm missing? I have about 10 million rows in the user_profile table and want the query to take no longer than a few seconds. Is that possible?
Databases can typically handle 10 million records without trouble. I have mostly used Oracle in our professional environment with large amounts of data (about 30-40 million rows), and even join queries on those tables have never taken more than a second or two to run.
One important lesson I learned whenever query performance was bad was to check that indexes are defined properly on the join fields. Here, for example, profile_stat_id and profile_stat_value_id should have indexes defined (I am assuming user_id is the primary key). This should give you a good performance increase if you have not done it already.
After defining the indexes, run the query once or twice to give the DB a chance to build the index tree and query plan before measuring the gain.
Superficially, you seem to be asking for this, which includes no self-joins:
SELECT u.name, u.id, s.name, s.id, v.name, v.id
FROM User_Profile AS p
JOIN User AS u ON u.id = p.user_id
JOIN Profile_Stat AS s ON s.id = p.profile_stat_id
JOIN Profile_Stat_Value AS v ON v.id = p.profile_stat_value_id
Any of the joins listed can be changed to a LEFT OUTER JOIN if the corresponding table need not have a matching entry. All this does is join the central User_Profile table with each of the other three tables on the appropriate joining column.
Where do you think you need a self-join?
[I have not included anything to filter on 'the many stats'; it is not at all clear to me what that part of the question means.]
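One possible reading of "many stats" is: find the users who have every one of a given list of (profile_stat_id, profile_stat_value_id) pairs. Under that assumption, the count-based approach the question mentions can be sketched as below, using Python's sqlite3 module as a stand-in. The index name and sample rows are made up, and the sketch assumes each user has at most one row per stat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_profile (user_id INTEGER,
                               profile_stat_id INTEGER,
                               profile_stat_value_id INTEGER);

    -- Hypothetical index that leads with the columns being filtered on.
    CREATE INDEX idx_profile_stat_value
        ON user_profile (profile_stat_id, profile_stat_value_id, user_id);

    INSERT INTO user_profile VALUES
        (1, 10, 100), (1, 20, 200),
        (2, 10, 100), (2, 20, 201),
        (3, 10, 100);
""")

# The (stat, value) pairs a user must have; illustrative values only.
wanted = [(10, 100), (20, 200)]

# One OR branch per wanted pair, then require that all of them matched.
conditions = " OR ".join(
    "(profile_stat_id = ? AND profile_stat_value_id = ?)" for _ in wanted
)
params = [value for pair in wanted for value in pair] + [len(wanted)]

matching_users = [row[0] for row in conn.execute(f"""
    SELECT user_id
    FROM user_profile
    WHERE {conditions}
    GROUP BY user_id
    HAVING COUNT(*) = ?
    ORDER BY user_id
""", params)]
print(matching_users)
```

The matching user IDs can then be joined to the User table, as in the query above, to pull names.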