I have two tables in MySQL. One contains users and the other contains transaction data.
I am trying to find out the top 50 users that made the most profit.
CREATE TABLE IF NOT EXISTS `t_logs` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(11) NOT NULL,
`amount` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`tid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
CREATE TABLE IF NOT EXISTS `t_users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(100) NOT NULL,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
Query:
SELECT *, (SELECT SUM(amount) FROM `t_logs` WHERE userid=u.id LIMIT 0,1) AS profit
FROM `t_users` u
ORDER BY profit DESC
LIMIT 0,50
Now the problem is that if I have 1000 entries in the t_users table and 3000 transactions in the t_logs table, this query takes 25 seconds on a VDS with Apache and 2 GB of RAM, or 9 seconds on my local computer using XAMPP (I have 16 GB of RAM).
Question is: Is there anything more that I can do to optimize all this? Maybe change the table engine from MyISAM to something else? Or maybe my query is not efficient? Or is the only solution to add more RAM to the VDS?
If we try to add 10000 users and 10000 logs, the query takes 250 seconds on the VDS. What are my options if I expect to have more than 50000 users and more than 1 million logs?
SELECT u.* , p.profit
FROM `t_users` u
LEFT JOIN (
SELECT userid, SUM(amount) as profit FROM `t_logs` GROUP BY userid) AS p
ON p.userid=u.id
ORDER BY p.profit DESC
LIMIT 0,50
You should be using a LEFT JOIN with GROUP BY instead of a subquery, and you need an index on t_logs.userid:
SELECT u.*, SUM(l.amount) AS profit
FROM t_users u
LEFT JOIN t_logs l
ON l.userid = u.id
GROUP BY u.id
ORDER BY profit DESC
LIMIT 50
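The index itself is not shown above; a minimal sketch of the DDL (the index name is illustrative). Including amount as the second column makes it a covering index, so the SUM can be computed from the index alone:
ALTER TABLE `t_logs` ADD INDEX `ix_userid_amount` (`userid`, `amount`);
With this index, the aggregation can seek per user instead of scanning t_logs once per t_users row.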
Does anyone know a more efficient way to execute this query?
SELECT SQL_CALC_FOUND_ROWS p.*, IFNULL(SUM(v.visits),0) AS visits
FROM posts AS p
LEFT JOIN visits_day v ON v.post_id=p.post_id
GROUP BY post_id
ORDER BY post_id DESC LIMIT 20 OFFSET 0
The visits_day table has one record per day, per user, per post. As the table grows, this query becomes extremely slow.
I can't add a column with the total visit count because I need to list the posts by most visits per day, per week, etc.
Does anyone know a better solution to this?
Thanks
CREATE TABLE `visits_day` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`day` date NOT NULL,
`visits` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=52302 DEFAULT CHARSET=utf8
CREATE TABLE `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`link` varchar(300) NOT NULL,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`title` varchar(500) NOT NULL,
`img` varchar(300) NOT NULL,
PRIMARY KEY (`post_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1027 DEFAULT CHARSET=utf8
With SQL_CALC_FOUND_ROWS, the query must evaluate everything; it just doesn't deliver all the rows. Getting rid of that should be beneficial.
To actually touch only 20 rows, we need to get through the WHERE, GROUP BY and ORDER BY with a single index. Otherwise, we might have to touch all the rows, sort them, then deliver 20. The obvious index is (post_id), which posts already has as its PRIMARY KEY.
Another way to do the join, and still get the desired zero (instead of NULL) for posts with no visits, is as follows. Note that it eliminates the need for GROUP BY.
SELECT p.*,
IFNULL( ( SELECT SUM(v.visits)
FROM visits_day v
WHERE v.post_id = p.post_id
),
0) AS visits
FROM posts AS p
ORDER BY post_id DESC
LIMIT 20 OFFSET 0
If you really need the count, then consider SELECT COUNT(*) FROM posts.
ON v.post_id=p.post_id in your query and WHERE post_id = p.post_id beg for INDEX(post_id) on visits_day. That will speed up both variants considerably.
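A sketch of that index (the name is illustrative; adding visits as a second column is optional, but it makes the index covering for the SUM):
ALTER TABLE visits_day ADD INDEX ix_post_visits (post_id, visits);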
I have a chat application with an API that returns the list of users the current user has talked to. But MySQL takes a long time to return that list once the messages table reaches 100,000 rows.
This is my messages table
CREATE TABLE IF NOT EXISTS `messages` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`fromid` int(11) NOT NULL,
`toid` int(11) NOT NULL,
`message` text NOT NULL,
`attachments` text NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`date` datetime NOT NULL,
`delete` varchar(50) NOT NULL,
`uuid_read` varchar(250) NOT NULL,
PRIMARY KEY (`_id`),
KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;
and this is my users table (simplified)
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(50) DEFAULT '',
`sex` tinyint(1) DEFAULT '0',
`status` varchar(255) DEFAULT '',
`avatar` varchar(30) DEFAULT '0',
`last_active` datetime DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;
And here is my query (for user with id 1930)
select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`,
SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg`
from (
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`,
`u`.`last_active` as online_status,
COUNT(`m`.`_id`) as `count`,
(COUNT(`m`.`_id`) - SUM(`m`.`status`)) as `nr_count`,
`tm`.`date` as `date`, `tm`.`message` as `last_mesg`
from `messages` as m
inner join `messages` as tm
on `tm`.`_id` = (select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid` = `m`.`fromid`)
left join `users` as u on `u`.`id` = `m`.`fromid`
where `m`.`toid` = 1930 and `m`.`delete` not like '%1930;%'
group by `u`.`id`)
UNION
(select `m`.`toid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`,
`u`.`last_active` as online_status,
COUNT(`m`.`_id`) as `count`,
0 as `nr_count`,
`tm`.`date` as `date`, `tm`.`message` as `last_mesg`
from `messages` as m
inner join `messages` as tm
on `tm`.`_id` = (select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid` = `m`.`toid`)
left join `users` as u on `u`.`id` = `m`.`toid`
where `m`.`fromid` = 1930 and `m`.`delete` not like '%1930;%'
group by `u`.`id`)
order by `date` desc
) as `f`
group by `u_id`
order by `date` desc
limit 0, 10
Please help me optimize this query.
What I need:
Who the user talked to (name, sex, etc.)
What the last message was (from me or to me)
Count of all messages
Count of unread messages (only to me)
The query works well, but takes too long.
You have some design problems in your query and database.
You should avoid reserved words as column names, such as the delete column or the count alias;
You should avoid selecting columns not listed in the GROUP BY without an aggregate function: although MySQL allows this, it is not standard and you have no control over which row's data gets selected;
Your NOT LIKE construction can misbehave, because '%1930;%' also matches '11930;', and 11930 is not the same user as 1930;
You should avoid LIKE patterns that both start and end with the % wildcard; they cannot use an index, so the text processing takes longer;
You should design a better way to represent a message deletion, probably a proper flag and/or another table to save any important data related to the action (see the sketch after this list);
Try to limit your result before the join conditions (with a derived table), so less processing is done.
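A sketch of such a deletion table (all names here are illustrative, not from the original schema):
CREATE TABLE message_deletions (
message_id INT NOT NULL,
user_id INT NOT NULL,
PRIMARY KEY (message_id, user_id)
) ENGINE=InnoDB;
The NOT LIKE filter then becomes an indexable anti-join, e.g. WHERE NOT EXISTS (SELECT 1 FROM message_deletions d WHERE d.message_id = m._id AND d.user_id = 1930).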
I tried to rewrite your query the best way I understood it. I executed my query against a messages table with ~200,000 rows and no indexes, and it ran in 0.15 seconds. But you should certainly create the right indexes to keep it performing well as the amount of data increases (see the index sketch after the query).
SELECT SQL_CALC_FOUND_ROWS
u.id,
u.login,
u.sex,
u.birthdate,
u.avatar,
u.last_active AS online_status,
g._count,
CASE WHEN m.toid = 1930
THEN g.nr_count
ELSE 0
END AS nr_count,
m.`date`,
m.message AS last_mesg
FROM
(
SELECT
MAX(_id) AS _id,
COUNT(*) AS _count,
COUNT(*) - SUM(m.status) AS nr_count
FROM messages m
WHERE 1=1
AND m.`delete` NOT LIKE '%1930;%'
AND
(0=1
OR m.fromid = 1930
OR m.toid = 1930
)
GROUP BY
CASE WHEN m.fromid = 1930
THEN m.toid
ELSE m.fromid
END
ORDER BY MAX(`date`) DESC
LIMIT 0, 10
) g
INNER JOIN messages AS m ON 1=1
AND m._id = g._id
LEFT JOIN users AS u ON 0=1
OR (m.fromid <> 1930 AND u.id = m.fromid)
OR (m.toid <> 1930 AND u.id = m.toid)
ORDER BY m.`date` DESC
;
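The right indexes are not included above; a hedged sketch, assuming lookups by either side of the conversation plus the date ordering (index names are illustrative):
ALTER TABLE messages
ADD INDEX ix_toid_date (toid, `date`),
ADD INDEX ix_fromid_date (fromid, `date`);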
I have this payments table, with about 2M entries
CREATE TABLE IF NOT EXISTS `payments` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned NOT NULL,
`date` datetime NOT NULL,
`valid_until` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `date_id` (`date`,`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2113820 ;
and this users table from ion_auth plugin/library for CodeIgniter, with about 320k entries
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`ip_address` varbinary(16) NOT NULL,
`username` varchar(100) NOT NULL,
`password` varchar(80) NOT NULL,
`salt` varchar(40) DEFAULT NULL,
`email` varchar(100) NOT NULL,
`activation_code` varchar(40) DEFAULT NULL,
`forgotten_password_code` varchar(40) DEFAULT NULL,
`forgotten_password_time` int(11) unsigned DEFAULT NULL,
`remember_code` varchar(40) DEFAULT NULL,
`created_on` int(11) unsigned NOT NULL,
`last_login` int(11) unsigned DEFAULT NULL,
`active` tinyint(1) unsigned DEFAULT NULL,
`first_name` varchar(50) DEFAULT NULL,
`last_name` varchar(50) DEFAULT NULL,
`company` varchar(100) DEFAULT NULL,
`phone` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=322435 ;
I'm trying to get both the user information and the user's last payment, ordering (ASC or DESC) by ID, first and last name, the date of the payment, or the payment expiration date, in order to build a table showing users with expired payments and users with valid ones.
I've managed to get the data correctly, but most of the time my queries take 1+ second for a single user, and 40+ seconds for 30 users. To be honest, I have no idea if it's possible to get the information in under 1 second. Also, my application will probably never reach this number of entries; a maximum of 10k payments and 300 users is more realistic.
My query, works pretty well with few entries and it's easy to change the ordering:
SELECT users.id, users.first_name, users.last_name, users.email, final.id AS payment_id, payment_date, final.valid_until AS payment_valid_until
FROM users
LEFT JOIN (
SELECT * FROM (
SELECT payments.id, payments.user_id, payments.date AS payment_date, payments.valid_until
FROM payments
ORDER BY payments.valid_until DESC
) AS p GROUP BY p.user_id
) AS final ON final.user_id = users.id
ORDER BY id ASC
LIMIT 0, 30
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY users ALL NULL NULL NULL NULL 322269 Using where; Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 50
4 DEPENDENT SUBQUERY users_deactivated unique_subquery user_id user_id 4 func 1 Using index
2 DERIVED <derived3> ALL NULL NULL NULL NULL 2072327 Using temporary; Using filesort
3 DERIVED payments ALL NULL NULL NULL NULL 2072566 Using filesort
I'm open to any suggestions and tips, since I'm new to PHP, MySQL and stuff, and don't really know if I'm doing the correct way
I would first suggest removing the ORDER BY clause from your subquery -- I don't see how it's helping as you're reordering by id in your outer query.
You should also be able to move your GROUP BY statement into your subquery:
SELECT users.id, users.first_name, users.last_name, users.email, final.id AS payment_id, payment_date, final.valid_until AS payment_valid_until
FROM users
LEFT JOIN (
SELECT payments.id, payments.user_id, payments.date AS payment_date, payments.valid_until
FROM payments
GROUP BY payments.user_id
) AS final ON final.user_id = users.id
ORDER BY users.id ASC
LIMIT 0, 30
Given your comments, how about this -- not sure it would be better than your current query, but ORDER BY can be expensive:
SELECT users.id, users.first_name, users.last_name, users.email, p.id AS payment_id, p.payment_date, p.valid_until AS payment_valid_until
FROM users
LEFT JOIN payments p ON p.user_id = users.id
LEFT JOIN (
SELECT user_id, MAX(valid_until) Max_Valid_Until
FROM payments
GROUP BY user_id
) AS maxp ON p.user_id = maxp.user_id and p.valid_until = maxp.max_valid_until
ORDER BY users.id ASC
LIMIT 0, 30
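If you keep this shape, a composite index would let the MAX(valid_until) subquery be resolved from the index alone; a sketch (the same key is suggested again further down):
ALTER TABLE payments ADD INDEX user_valid (user_id, valid_until);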
Use an index on the payments table for users, and do the GROUP BY on the payments table:
alter table payments add index (user_id);
-- your query here, ending in: ORDER BY users.id ASC ... LIMIT 0, 30
alter table payments drop index user_id;
And why don't you use the payments id instead of valid_until? Is there a reason not to trust that the ids are sequential? If you don't trust the id, add an index to the valid_until field:
alter table payments add index (valid_until desc);
and don't forget to drop it later:
alter table payments drop index valid_until;
If the query is still slow, you will need to cache the results. This means you need to improve your schema; here is a suggestion:
create table last_payment (
user_id int unsigned not null,
payment_id int unsigned not null,
constraint pk_last_payment primary key (user_id),
constraint fk_last_payment_user foreign key (user_id) references users (id),
constraint fk_last_payment foreign key (payment_id) references payments (id)
) engine=InnoDB;
alter table payments add index (user_id);
insert into last_payment (user_id, payment_id)
(select user_id, max(id) from payments group by user_id);
# use your own query here if max(id) does not refer to the last payment...
alter table payments drop index user_id;
and now comes the magic:
delimiter |
CREATE TRIGGER payments_trigger AFTER INSERT ON payments
FOR EACH ROW BEGIN
DELETE FROM last_payment WHERE user_id = NEW.user_id;
INSERT INTO last_payment (user_id, payment_id) values (NEW.user_id, NEW.id);
END;
|
delimiter ;
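Side note: since user_id is the primary key of last_payment, the DELETE + INSERT pair in the trigger body could be collapsed into a single upsert:
INSERT INTO last_payment (user_id, payment_id)
VALUES (NEW.user_id, NEW.id)
ON DUPLICATE KEY UPDATE payment_id = NEW.id;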
and now, every time you want to know the last payment made, you just query through the last_payment table:
select u.*, p.*
from users u inner join last_payment lp on (u.id = lp.user_id)
inner join payments p on (lp.payment_id = p.id)
order by lp.user_id asc;
Maybe something like this...
SELECT u.id
, u.first_name
, u.last_name
, u.email
, p.id payment_id
, p.`date` payment_date
, p.valid_until payment_valid_until
FROM users u
JOIN payments p
ON p.user_id = u.id
JOIN
( SELECT user_id, MAX(valid_until) max_valid_until FROM payments GROUP BY user_id ) x
ON x.user_id = p.user_id
AND x.max_valid_until = p.valid_until;
The problem with joining to a subquery is that MySQL internally materializes the result of the subquery before performing the join. This is expensive in resources and is probably what takes the time. The best solution is to change the query to avoid subqueries.
SELECT users.id, users.first_name, users.last_name, users.email, max(payments.id) AS payment_id, max(payments.date) as payment_date, max(payments.valid_until) AS payment_valid_until
FROM users
LEFT JOIN payments use index (user_id) on payments.user_id=users.id
group by users.id
ORDER BY id ASC
LIMIT 0, 30
This query is only correct, however, if the largest values of id, date and valid_until always come from the same payment record.
SELECT final.user_id, users.first_name, users.last_name,
users.email, final.id, MAX(final.`date`), MAX(final.valid_until)
FROM payments final
JOIN users ON final.user_id = users.id
GROUP BY final.user_id
ORDER BY final.user_id ASC
LIMIT 0, 30
The idea is to flatten the payments first.
The MAX values may, of course, come from different payment records.
Speed up
Above I did a MySQL-specific thing: selecting final.id without MAX. Better not to use that field at all.
If you could leave out the payments.id, it would be faster (with the appropriate indexes):
KEY `user_date` (`user_id`, `date` DESC ),
KEY `user_valid` (`user_id`, `valid_until` DESC ),
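As standalone DDL (note that MySQL before 8.0 parses DESC in index definitions but ignores it; descending reads can still scan the index backwards):
ALTER TABLE payments
ADD INDEX user_date (user_id, `date` DESC),
ADD INDEX user_valid (user_id, valid_until DESC);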
I have a query that gets product IDs based on keywords.
SELECT indx_search.pid
FROM indx_search
LEFT JOIN word_index_mem ON (word_index_mem.word = indx_search.word)
WHERE indx_search.word = "phone"
GROUP BY indx_search.pid
ORDER BY indx_search.pid ASC
LIMIT 0,20
This works well but now I'm trying to go a step further and implement "price range" into this query.
CREATE TABLE `price_range` (
`pid` int(11) NOT NULL,
`range_id` tinyint(3) NOT NULL,
PRIMARY KEY (`pid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
This table simply contains product IDs and a range_id. The price range values are stored here:
CREATE TABLE `price_range_values` (
`ID` tinyint(3) NOT NULL AUTO_INCREMENT,
`rangeFrom` float(10,2) NOT NULL,
`rangeTo` float(10,2) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=32 DEFAULT CHARSET=latin1
I want to group price_range.range_id with a COUNT() of how many products match each price range within the current query, while still receiving my 20 product IDs.
So something along the lines of:
SELECT indx_search.pid, price_range.range_id as PriceRangeID, COUNT(price_range.range_id) as PriceGroupTotal
FROM indx_search
LEFT JOIN windex_mem ON ( windex_mem.word = indx_search.word )
LEFT JOIN price_range ON ( price_range.pid = indx_search.pid )
WHERE indx_search.word = "memory"
GROUP BY indx_search.pid, PriceRangeID
ORDER BY indx_search.pid ASC
LIMIT 0 , 20
Is this possible to accomplish without issuing an additional query?
Try using a subquery with a LIMIT clause, e.g.:
SELECT
i_s.pid, p_r.range_id as PriceRangeID, COUNT(p_r.range_id) as PriceGroupTotal
FROM
(SELECT * FROM indx_search WHERE word = 'memory' ORDER BY pid ASC LIMIT 0, 20) i_s
LEFT JOIN
windex_mem w_m ON w_m.word = i_s.word
LEFT JOIN
price_range p_r ON p_r.pid = i_s.pid
GROUP BY
i_s.pid, PriceRangeID
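The indx_search schema is not shown, so this is an assumption, but an index covering the derived table's WHERE and ORDER BY should let it produce its 20 rows cheaply:
ALTER TABLE indx_search ADD INDEX ix_word_pid (word, pid);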
In the same database I have a table messages with the columns id, title and text. I want only the records whose title has no entry in the table lastlogon, where the equivalent of title is named username.
I have been using this SQL command in PHP; it generally took 2-3 seconds to pull up:
SELECT DISTINCT * FROM messages WHERE title NOT IN (SELECT username FROM lastlogon) LIMIT 1000
This was all good until the table lastlogon grew to about 80% of the size of messages. messages has about 8000 entries, lastlogon about 7000. Now the query takes one to two minutes to go through, and MySQL shoots up to very high CPU usage.
I tried the following but had no luck reducing the time:
SELECT id,title,text FROM messages a LEFT OUTER JOIN lastlogon b ON (a.title = b.username) LIMIT 1000
Why is it suddenly taking so long for such a small number of entries? I tried restarting MySQL and Apache multiple times. I am using Debian Linux.
Edit: Here are the structures
--
-- Table structure for table `lastlogon`
--
CREATE TABLE IF NOT EXISTS `lastlogon` (
`username` varchar(25) NOT NULL,
`lastlogon` date NOT NULL,
`datechecked` date NOT NULL,
PRIMARY KEY (`username`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE IF NOT EXISTS `messages` (
`id` smallint(9) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`email` varchar(50) NOT NULL,
`text` mediumtext,
`folder` tinyint(2) NOT NULL,
`read` smallint(5) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`attachment` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`username` varchar(300) NOT NULL,
`error` varchar(500) NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=9010 ;
Edit 2
Edited structure with new indexes.
After putting an index on both messages.title and lastlogon.username I came up with these results:
Showing rows 0 - 29 (623 total, Query took 74.4938 sec)
First: replace the key on title with a compound key on title + id:
ALTER TABLE messages DROP INDEX title;
ALTER TABLE messages ADD INDEX title (title, id);
Now change the select to:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
LIMIT 1000;
Or
SELECT m.* FROM messages m
WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
-- GROUP BY m.id DESC -- faster than distinct, I don't think you need it though.
LIMIT 1000;
Another cause of the slowness is the SELECT m.* part.
By selecting all columns, you are forcing MySQL to do extra work.
Only select the columns you need:
SELECT m.title, m.name, m.email, ......
This will speed up the query as well.
There's another trick you can use:
Replace the limit 1000 with a cutoff date.
Step 1: Add an index on timestamp (or whatever field you want to use for the cutoff); a minimal sketch, with an illustrative index name:
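ALTER TABLE messages ADD INDEX ix_timestamp (`timestamp`);
Step 2: Use the cutoff in the query: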
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE (m.id > (SELECT MIN(M2.ID) FROM messages m2 WHERE m2.timestamp >= '2011-09-01'))
AND l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
I suggest you add an index on messages.title. Then run the query again and test the performance.
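In DDL form (the schema above already shows KEY `title` (`title`), so this only applies if that index is missing):
ALTER TABLE messages ADD INDEX title (title);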