Can I speed up this MySQL query? - mysql

Part of the query is:
SELECT * FROM `o` WHERE ....
AND `id` IN (SELECT DISTINCT `id` FROM `o` WHERE `activity` = '1' AND `date` < '20130310' ORDER BY `id` ASC)
AND `id` NOT IN (SELECT DISTINCT `id` FROM `o` WHERE `activity` = '1' AND `date` BETWEEN '20130310' AND '20130329' ORDER BY `id` ASC)
....
Description: IDs that have activity before 20130310 and have no activity between 20130310 and 20130329.
1) Can I speed up this MySQL query?
2) Does ORDER BY help to increase the speed of IN and NOT IN?

As IN() subqueries are generally slow in MySQL (at least before 5.6), you'd better use a join:
SELECT * FROM `o`
LEFT JOIN
(SELECT `id` AS `idactive` FROM `o` WHERE `activity` = '1' AND `date` BETWEEN '20130310' AND '20130329') as t
ON t.`idactive` = `o`.`id`
WHERE `activity` = '1' AND `date` < '20130310' AND `idactive` IS NULL

You can do it like this:
SELECT * FROM `o` WHERE ....
AND `id` IN (SELECT DISTINCT `id` FROM `o`
WHERE `activity` = '1' AND `date` < '20130310'
ORDER BY `id` ASC)
If each `id` appears in only one row, you don't need the second AND, because you have already chosen date < 20130310, so it will not select 20130311, 20130312 ... 20130329.
Or you can simply run this query:
SELECT * FROM `o` WHERE ....
AND `activity` = '1' AND `date` < '20130310'
GROUP BY `id`
ORDER BY `id` ASC

If id is your primary key (as it should be), one of your two date-related clauses is useless (`date` < '20130310' always implies NOT (`date` BETWEEN '20130310' AND '20130329')), allowing you to simplify the thing to:
select *
from o
where activity = 1
and date < '20130310'
In the event id is actually one among other fields (a terribly confusing choice of name, in that case) that references a separate table in a 1-n or n-m relationship, I assume you're looking for ids that have at least one activity row before the date range but none within it. In this case, your query can be simplified to:
SELECT * FROM `o` WHERE ....
AND `id` IN (SELECT `id` FROM `o` WHERE `activity` = '1' AND `date` < '20130310')
AND `id` NOT IN (SELECT `id` FROM `o` WHERE `activity` = '1' AND `date` BETWEEN '20130310' AND '20130329')
Put another way, the distinct and the order by clauses are both pointless.
In either case, if relevant, also note that your date field should be of type date, rather than of type varchar.
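To make the difference between the answers concrete, here is a minimal sketch that runs the IN / NOT IN form and the LEFT JOIN form side by side. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up for illustration, so it only shows the logic and says nothing about MySQL performance.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE o (id INTEGER, activity INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO o VALUES (?, ?, ?)",
    [
        (1, 1, "20130301"),  # id 1: active before the range only
        (2, 1, "20130305"),  # id 2: active before the range ...
        (2, 1, "20130315"),  # ... and again inside it, so it must be excluded
        (3, 1, "20130320"),  # id 3: active inside the range only
    ],
)

in_not_in = """
    SELECT DISTINCT id FROM o
    WHERE id IN (SELECT id FROM o WHERE activity = 1 AND date < '20130310')
      AND id NOT IN (SELECT id FROM o WHERE activity = 1
                     AND date BETWEEN '20130310' AND '20130329')
    ORDER BY id
"""

anti_join = """
    SELECT DISTINCT o.id FROM o
    LEFT JOIN (SELECT id AS idactive FROM o WHERE activity = 1
               AND date BETWEEN '20130310' AND '20130329') AS t
           ON t.idactive = o.id
    WHERE o.activity = 1 AND o.date < '20130310' AND t.idactive IS NULL
    ORDER BY o.id
"""

# Dropping the NOT IN part is only safe when id is unique per row.
date_filter_only = """
    SELECT DISTINCT id FROM o
    WHERE activity = 1 AND date < '20130310'
    ORDER BY id
"""

ids_in_not_in = [row[0] for row in conn.execute(in_not_in)]
ids_anti_join = [row[0] for row in conn.execute(anti_join)]
ids_date_only = [row[0] for row in conn.execute(date_filter_only)]
print(ids_in_not_in, ids_anti_join, ids_date_only)
```

With these rows, the first two queries agree and return only id 1, while the plain date filter also returns id 2, because id is not unique in the sample data.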

Related

How to use GROUP BY correctly in strict mode?

I have the table for messages like this:
CREATE TABLE `message` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`from_id` bigint(20) unsigned NOT NULL,
`to_id` bigint(20) unsigned NOT NULL,
`body` text COLLATE utf8mb4_unicode_ci NOT NULL,
`status` tinyint(4) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
and I am trying to select the last message from each userX to userY, but MySQL always says that
I have nonaggregated columns (without only_full_group_by it works well).
How can I select this in strict mode? An example of a query that does not work:
select
t1.created_at,
t1.from_id,
t1.to_id,
t1.body,
t1.status,
( select created_at from test1.message where from_id = t1.from_id order by created_at desc limit 1 ) as last_timestamp
from test1.message as t1
group by t1.from_id
having t1.created_at = last_timestamp
Don't aggregate - instead, filter.
One option uses a correlated subquery that computes the maximum created_at per user:
select m.*
from message m
where m.created_at = (
select max(m1.created_at) from message m1 where m1.from_id = m.from_id
)
You can also use the anti-left join pattern:
select m.*
from message m
left join message m1 on m1.from_id = m.from_id and m1.created_at > m.created_at
where m1.id is null
This phrases as: get the records for which no other record exists with the same from_id and a greater created_at.
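As a quick check that the two filtering patterns return the same rows, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The table is a trimmed version of the message table from the question and the three sample rows are invented for the example.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, from_id INTEGER, "
    "to_id INTEGER, body TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO message (from_id, to_id, body, created_at) VALUES (?, ?, ?, ?)",
    [
        (1, 2, "first from 1", "2020-01-01 10:00:00"),
        (1, 2, "latest from 1", "2020-01-02 10:00:00"),
        (2, 1, "only from 2", "2020-01-01 12:00:00"),
    ],
)

# Pattern 1: correlated subquery computing the newest created_at per sender.
correlated = """
    SELECT m.from_id, m.body FROM message m
    WHERE m.created_at = (SELECT MAX(m1.created_at) FROM message m1
                          WHERE m1.from_id = m.from_id)
    ORDER BY m.from_id
"""

# Pattern 2: anti-left join, keeping rows that have no newer sibling.
anti_join = """
    SELECT m.from_id, m.body FROM message m
    LEFT JOIN message m1
           ON m1.from_id = m.from_id AND m1.created_at > m.created_at
    WHERE m1.id IS NULL
    ORDER BY m.from_id
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_anti_join = conn.execute(anti_join).fetchall()
print(rows_correlated)
print(rows_anti_join)
```

Both queries return one row per from_id here. If two messages from the same sender share the same created_at, both patterns return both rows, so a tie-breaker on id may be needed.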
You don't even need to be using GROUP BY here, and without it your current query should actually be valid:
SELECT
t1.created_at,
t1.from_id,
t1.to_id,
t1.body,
t1.status,
(SELECT t2.created_at FROM test1.message t2
WHERE t2.from_id = t1.from_id
ORDER BY t2.created_at DESC LIMIT 1) AS last_timestamp
FROM test1.message AS t1
HAVING t1.created_at = last_timestamp;
MySQL has overloaded the HAVING clause to be usable in place of a WHERE clause, with the additional feature that aliases defined in the SELECT clause can actually be used there.

mysql: inner join with in and not in

I'm trying to pull rows from one table "articles" based on specific category tags from table "article_category_reference", to exclude articles that have a specific tag. I have this query right now:
SELECT DISTINCT
a.article_id,
a.`title`,
a.`text`,
a.`date`
FROM
`articles` a
INNER JOIN `article_category_reference` c ON
a.article_id = c.article_id AND c.`category_id` NOT IN (54)
WHERE
a.`active` = 1
ORDER BY
a.`date`
DESC
LIMIT 15
The problem is, it seems to grab rows even if they do have a row in the "article_category_reference" table where "category_id" matches "54". I've also tried it in the "where" clause and it makes no difference.
Keep in mind I'm using "NOT IN" as it may be excluding multiple tags.
SQL fiddle to show it: http://sqlfiddle.com/#!9/b2172/1
Tables:
CREATE TABLE `article_category_reference` (
`ref_id` int(11) NOT NULL,
`article_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `articles` (
`article_id` int(11) UNSIGNED NOT NULL,
`author_id` int(11) UNSIGNED NOT NULL,
`date` int(11) NOT NULL,
`title` varchar(120) NOT NULL,
`text` text CHARACTER SET utf8mb4 NOT NULL,
`active` int(1) NOT NULL DEFAULT '1'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
One option is to use an EXISTS clause:
SELECT DISTINCT
a.article_id,
a.title,
a.text,
a.date
FROM articles a
WHERE
a.active = 1 AND
NOT EXISTS (SELECT 1 FROM article_category_reference c
WHERE a.article_id = c.article_id AND c.category_id = 54)
ORDER BY
a.date DESC
LIMIT 15;
The logical problem with your current approach of checking the category in the WHERE clause is that it is checking individual records. You need to assert that all category records for a given article, in aggregate, do not match the category you wish to exclude. An EXISTS clause, as I have written above, is one way to do it. Using GROUP BY in a subquery is another way.
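The GROUP BY alternative mentioned above is not spelled out, so here is one possible sketch of it next to the original per-row join. It runs on Python's sqlite3 module as a stand-in for MySQL, with two made-up articles; the derived-table alias `allowed` is a name chosen for this example.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id INTEGER, title TEXT, date INTEGER, active INTEGER)"
)
conn.execute(
    "CREATE TABLE article_category_reference "
    "(ref_id INTEGER, article_id INTEGER, category_id INTEGER)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [(1, "tagged 54 and 7", 100, 1), (2, "tagged 7 only", 200, 1)],
)
conn.executemany(
    "INSERT INTO article_category_reference VALUES (?, ?, ?)",
    [(1, 1, 54), (2, 1, 7), (3, 2, 7)],
)

# The original approach: the condition is checked per joined row, so
# article 1 slips through via its category 7 row.
per_row_join = """
    SELECT DISTINCT a.article_id FROM articles a
    INNER JOIN article_category_reference c
            ON a.article_id = c.article_id AND c.category_id NOT IN (54)
    WHERE a.active = 1
    ORDER BY a.article_id
"""

# GROUP BY variant: keep only articles whose category rows, taken together,
# contain no excluded category.
grouped = """
    SELECT a.article_id FROM articles a
    INNER JOIN (
        SELECT article_id FROM article_category_reference
        GROUP BY article_id
        HAVING SUM(CASE WHEN category_id IN (54) THEN 1 ELSE 0 END) = 0
    ) allowed ON allowed.article_id = a.article_id
    WHERE a.active = 1
    ORDER BY a.article_id
"""

ids_per_row = [row[0] for row in conn.execute(per_row_join)]
ids_grouped = [row[0] for row in conn.execute(grouped)]
print(ids_per_row, ids_grouped)
```

One difference from the NOT EXISTS version: because of the inner join, this variant drops articles that have no category rows at all.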
The NOT IN condition is evaluated for each joined row. Since you have the same article_id with multiple category_id values, the rows that do satisfy the NOT IN condition will get picked.
See SQLFiddle.
To select articles that do not have any rows with category_id 54 use a subquery:
SELECT
a.article_id,
a.`title`,
a.`text`,
a.`date`
FROM `articles` a
WHERE a.`active` = 1 AND
a.`article_id` not in (
SELECT c.article_id
FROM `article_category_reference` c
WHERE c.`category_id` = 54
)
ORDER BY a.`date`
DESC
LIMIT 15

MySQL big table query optimization

I have a chat application. I have an API which returns the list of users the current user has talked to. But MySQL takes a long time to return the list of messages once the table reaches 100000 rows of data.
This is my messages table
CREATE TABLE IF NOT EXISTS `messages` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`fromid` int(11) NOT NULL,
`toid` int(11) NOT NULL,
`message` text NOT NULL,
`attachments` text NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`date` datetime NOT NULL,
`delete` varchar(50) NOT NULL,
`uuid_read` varchar(250) NOT NULL,
PRIMARY KEY (`_id`),
KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;
and this is my users table (simplified)
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(50) DEFAULT '',
`sex` tinyint(1) DEFAULT '0',
`status` varchar(255) DEFAULT '',
`avatar` varchar(30) DEFAULT '0',
`last_active` datetime DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;
And here is my query (for user with id 1930)
select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`, SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg` from
(
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, (COUNT(`m`.`_id`)-SUM(`m`.`status`)) as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid`=`m`.`fromid`) left join `users` as u on `u`.`id`=`m`.`fromid` where `m`.`toid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
UNION
(select `m`.toid as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, 0 as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid`=`m`.`toid`) left join `users` as u on `u`.`id`=`m`.`toid` where `m`.`fromid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
order by `date` desc ) as `f` group by `u_id` order by `date` desc limit 0,10
Please help me optimize this query.
What I need:
Who the user talked to (name, sex, etc.)
What the last message was (from me or to me)
Count of messages (all)
Count of unread messages (only to me)
The query works well, but takes too long.
The output must be like this
You have some design problems in your query and database.
You should avoid keywords as column names, such as the delete column or the count column;
You should avoid selecting columns that are not declared in the GROUP BY without an aggregate function. Although MySQL allows this, it is not standard and you have no control over which row's data will be selected;
Your NOT LIKE construction may cause wrong results, because '%1930;%' also matches 11930; and 11930 is not equal to 1930;
You should avoid LIKE patterns that both start and end with the % wildcard, because they cannot use an index and make the text processing take longer;
You should design a better way to represent a message deletion, probably a proper flag and/or another table to save any important data related to the action;
Try to limit your result before the join conditions (with a derived table) to do less processing.
I tried to rewrite your query as best I understood it. I executed my query on a messages table with ~200,000 rows and no indexes and it ran in 0.15 seconds. But you should certainly create the right indexes to help it perform better as the amount of data increases.
SELECT SQL_CALC_FOUND_ROWS
u.id,
u.login,
u.sex,
u.birthdate,
u.avatar,
u.last_active AS online_status,
g._count,
CASE WHEN m.toid = 1930
THEN g.nr_count
ELSE 0
END AS nr_count,
m.`date`,
m.message AS last_mesg
FROM
(
SELECT
MAX(_id) AS _id,
COUNT(*) AS _count,
COUNT(*) - SUM(m.status) AS nr_count
FROM messages m
WHERE 1=1
AND m.`delete` NOT LIKE '%1930;%'
AND
(0=1
OR m.fromid = 1930
OR m.toid = 1930
)
GROUP BY
CASE WHEN m.fromid = 1930
THEN m.toid
ELSE m.fromid
END
ORDER BY MAX(`date`) DESC
LIMIT 0, 10
) g
INNER JOIN messages AS m ON 1=1
AND m._id = g._id
LEFT JOIN users AS u ON 0=1
OR (m.fromid <> 1930 AND u.id = m.fromid)
OR (m.toid <> 1930 AND u.id = m.toid)
ORDER BY m.`date` DESC
;
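The false match on the NOT LIKE pattern is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the column name `deleted_by` is hypothetical (chosen to avoid the reserved word `delete`), and the sample values assume the "id;" list format implied by the question.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
# deleted_by is a hypothetical rename of the question's `delete` column.
conn.execute("CREATE TABLE messages (_id INTEGER PRIMARY KEY, deleted_by TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        (1, ""),          # deleted by nobody
        (2, "1930;"),     # deleted by user 1930
        (3, "11930;"),    # deleted by user 11930, NOT by 1930
        (4, "42;1930;"),  # deleted by users 42 and 1930
    ],
)

# Pattern from the question: message 3 is wrongly treated as deleted by 1930.
naive = "SELECT _id FROM messages WHERE deleted_by NOT LIKE '%1930;%' ORDER BY _id"

# Prefixing a delimiter anchors the id on both sides.
# In MySQL this would be CONCAT(';', deleted_by); || is SQLite/standard SQL.
anchored = (
    "SELECT _id FROM messages "
    "WHERE (';' || deleted_by) NOT LIKE '%;1930;%' ORDER BY _id"
)

visible_naive = [row[0] for row in conn.execute(naive)]
visible_anchored = [row[0] for row in conn.execute(anchored)]
print(visible_naive, visible_anchored)
```

The anchored pattern fixes the false match but still cannot use an index, so a separate deletion table, as suggested in the list above, remains the cleaner design.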

Need help creating SQL query

I have these two tables:
CREATE TABLE `workers` (
`id` int(7) NOT NULL AUTO_INCREMENT,
`number` int(7) NOT NULL,
`percent` int(3) NOT NULL,
`order` int(7) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `data` (
`id` bigint(15) NOT NULL AUTO_INCREMENT,
`workerId` int(7) NOT NULL,
PRIMARY KEY (`id`)
);
I want to return the first worker (ordered by order ASC) whose number of rows in the table data, times percent (from table workers) / 100, is smaller than number (from table workers).
I have tried this query:
SELECT workers.id, COUNT(data.id) AS `countOfData`
FROM `workers` as workers, `data` as data
WHERE data.workerId = workers.id
AND workers.percent * `countOfData` < workers.number
LIMIT 1
But I get the error:
#1054 - Unknown column 'countOfData' in 'where clause'
This should work:
SELECT A.id
FROM workers A
LEFT JOIN (SELECT workerId, COUNT(*) AS Quant
FROM data
GROUP BY workerId) B
ON A.id = B.workerId
WHERE (COALESCE(Quant,0) * `percent`)/100 < `number`
ORDER BY `order`
LIMIT 1
You could calculate the number of rows per worker in a subquery. The subquery can be joined to the workers table. If you use a left join, a worker with no data rows will still be considered:
select *
from workers w
left join
(
select workerId
, count(*) as cnt
from data
group by
workerId
) d
on w.id = d.workerId
where coalesce(d.cnt, 0) * w.percent / 100 < w.number
order by
w.order
limit 1
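To see why the left join matters, here is a small sketch on Python's sqlite3 module as a stand-in for MySQL. The sample workers are invented so that the only qualifying worker has no rows in data. Dividing by 100.0 is an adjustment for SQLite, which would otherwise do integer division; MySQL's / operator already returns a decimal result.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE workers (id INTEGER PRIMARY KEY, number INTEGER, '
    'percent INTEGER, "order" INTEGER)'
)
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, workerId INTEGER)")
conn.executemany(
    "INSERT INTO workers VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, 1),  # 2 rows * 100 / 100 = 2, not < 1
        (2, 2, 50, 2),   # 4 rows * 50 / 100 = 2, not < 2
        (3, 1, 100, 3),  # 0 rows * 100 / 100 = 0, which is < 1
    ],
)
conn.executemany(
    "INSERT INTO data (workerId) VALUES (?)",
    [(1,), (1,), (2,), (2,), (2,), (2,)],
)

query = """
    SELECT w.id FROM workers w
    {join} (SELECT workerId, COUNT(*) AS cnt FROM data GROUP BY workerId) d
           ON w.id = d.workerId
    WHERE COALESCE(d.cnt, 0) * w.percent / 100.0 < w.number
    ORDER BY w."order"
    LIMIT 1
"""

with_left_join = conn.execute(query.format(join="LEFT JOIN")).fetchone()
with_inner_join = conn.execute(query.format(join="INNER JOIN")).fetchone()
print(with_left_join, with_inner_join)
```

With the left join, worker 3 is returned; with an inner join the same query returns no row, because worker 3 never appears in the subquery result.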

Mysql select if only all sizes equal 0

I have some mysql tables:
`items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cat_id_p` int(11) NOT NULL,
`cat_id` int(11) DEFAULT NULL,
`brand_id` int(11) DEFAULT NULL,
...
)
`items_sizes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`item_id` int(11) NOT NULL,
`size_id` int(11) NOT NULL,
`count` int(11) DEFAULT '1',
...
)
And I need to select only items for which every items_sizes.count is < 1, and not items which have at least one count of 1 or more.
Here is sql query:
SELECT
DISTINCT `items`.*
FROM
(`items2`)
LEFT JOIN `items_sizes` ON `items_sizes`.`item_id` = `items`.`id`
WHERE ...
AND `items_sizes`.`item_id` = items.id
AND `items_sizes`.`count` < 1
GROUP BY `items`.`id`
ORDER BY `items`.`id` desc
LIMIT 30
But it does not work... Maybe I need an IF statement?
SOLVED! JUST with SUM and HAVING
SELECT DISTINCT `items`.*, sum(items_sizes.count)
FROM (`items`)
LEFT JOIN `items_sizes` ON `items_sizes`.`item_id` = `items`.`id`
WHERE ...
GROUP BY `items`.`id`
having sum(items_sizes.count)=0
ORDER BY `items`.`id` desc LIMIT 30
SELECT
DISTINCT *
FROM
`items`
WHERE
NOT EXISTS (
SELECT * FROM `items_sizes`
WHERE
`items_sizes`.`item_id` = `items`.`id`
AND `items_sizes`.`count` > 0
)
-- ...
ORDER BY `id` desc
LIMIT 30
Assuming the table name items2 in your FROM clause is a typo or the items.* is a typo and should be items2.*...
You have no aggregate functions (SUM(), COUNT(), AVG()), so there is no need for the GROUP BY. It also appears you have mixed the WHERE clause up with the ON clause used in your JOIN. The first WHERE condition should not be there:
SELECT
DISTINCT items2.*
FROM
items2
LEFT JOIN items_sizes ON items2.id = items_sizes.item_id
WHERE ...
AND items_sizes.count < 1
ORDER BY items2.id desc
LIMIT 30
Note that the part of your WHERE clause that we don't see (WHERE ...) might be significant here as well...
The LEFT JOIN is probably unnecessary, and can just be a JOIN because the items_sizes.count < 1 will eliminate the NULL values the LEFT JOIN would have returned anyway.
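The three approaches in this thread do not return the same rows, which a small experiment makes visible. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with trimmed tables and made-up rows: item 1 has only zero counts, item 2 has one size in stock, and item 3 has no size rows at all.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE items_sizes (id INTEGER PRIMARY KEY, item_id INTEGER, "
    "size_id INTEGER, count INTEGER)"
)
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO items_sizes (item_id, size_id, count) VALUES (?, ?, ?)",
    [
        (1, 10, 0), (1, 11, 0),  # item 1: every size is at 0
        (2, 10, 0), (2, 11, 3),  # item 2: one size still in stock
    ],                           # item 3: no size rows at all
)

# NOT EXISTS: no size row with a positive count.
not_exists = """
    SELECT i.id FROM items i
    WHERE NOT EXISTS (SELECT 1 FROM items_sizes s
                      WHERE s.item_id = i.id AND s.count > 0)
    ORDER BY i.id
"""

# SUM / HAVING: SUM over no rows is NULL, and NULL = 0 is not true.
sum_having = """
    SELECT i.id FROM items i
    LEFT JOIN items_sizes s ON s.item_id = i.id
    GROUP BY i.id
    HAVING SUM(s.count) = 0
    ORDER BY i.id
"""

# Per-row filter: any single size row below 1 is enough to match.
per_row = """
    SELECT DISTINCT i.id FROM items i
    JOIN items_sizes s ON s.item_id = i.id
    WHERE s.count < 1
    ORDER BY i.id
"""

ids_not_exists = [row[0] for row in conn.execute(not_exists)]
ids_sum_having = [row[0] for row in conn.execute(sum_having)]
ids_per_row = [row[0] for row in conn.execute(per_row)]
print(ids_not_exists, ids_sum_having, ids_per_row)
```

Here NOT EXISTS keeps items 1 and 3, SUM with HAVING keeps only item 1, and the per-row filter keeps items 1 and 2, so the per-row filter does not answer the question as asked. Whether an item without any size rows should be returned decides between the first two; also note that the SUM approach assumes counts are never negative.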