SQL query optimization - multiple rows from another table

I have my tables like this (currently)
CREATE TABLE `users` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `user_opts` (
`user_id` bigint(20) NOT NULL,
`opt1` varchar(64) DEFAULT NULL,
`opt2` TINYINT(4) DEFAULT NULL,
`opt3` varchar(64) DEFAULT NULL,
KEY `user_id_idx` (`user_id`)
);
I want to be able to do queries like this:
SELECT DISTINCT name
FROM users
WHERE
id = 1 AND (
EXISTS ( SELECT 1 FROM user_opts WHERE user_opts.user_id = users.id AND user_opts.opt1 = 'a' AND user_opts.opt3 = 'c') OR
EXISTS ( SELECT 1 FROM user_opts WHERE user_opts.user_id = users.id AND user_opts.opt1 = 'b' AND user_opts.opt2 = 1)
);
and this:
SELECT DISTINCT name
FROM users
WHERE
id = 1 AND (
EXISTS ( SELECT 1 FROM user_opts WHERE user_opts.user_id = users.id AND user_opts.opt1 = 'a' AND user_opts.opt3 = 'e') AND
EXISTS ( SELECT 1 FROM user_opts WHERE user_opts.user_id = users.id AND user_opts.opt1 = 'b' AND user_opts.opt2 = 1)
);
The obvious problem I'm starting to have is that the more users there are, the slower the queries get. I know I could refactor the first type of query (the one using OR) by JOINing the table, but the JOIN itself would be slow since I can't have a PK on the user_opts table.
How could I restructure my data (and the queries) so I can do efficient/fast searches? Preferably, if possible, I would like to keep the same queries for both AND and OR types, just switching the condition between the two.
DB Fiddle url
Thanks!

You can use aggregation:
select user_id
from user_opts uo
where opt3 = 'c' or opt2 = 1
group by user_id
having sum(opt3 = 'c') >= 1 and
sum(opt2 = 1) >= 1;
This handles the case when the two options are set on the same row in user_opts.
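As a rough sketch of how the aggregation approach can serve both the AND and the OR variant from the question, the snippet below runs the same GROUP BY / HAVING query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the sample rows are made up, and the helper name matching_users is hypothetical. Unlike the query above, it also includes the opt1 conditions from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_opts (user_id INTEGER NOT NULL, opt1 TEXT, opt2 INTEGER, opt3 TEXT);
INSERT INTO user_opts VALUES
  (1, 'a', NULL, 'c'),   -- user 1 matches condition A ...
  (1, 'b', 1, NULL),     -- ... and condition B
  (2, 'a', NULL, 'c'),   -- user 2 matches only condition A
  (3, 'b', 0, NULL);     -- user 3 matches neither
""")

def matching_users(conn, combine):
    """Return user ids matching condition A AND/OR condition B (hypothetical helper)."""
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be 'AND' or 'OR'")
    # Only the whitelisted keyword is interpolated; everything else is fixed SQL.
    sql = f"""
        SELECT user_id
        FROM user_opts
        GROUP BY user_id
        HAVING SUM(opt1 = 'a' AND opt3 = 'c') >= 1
           {combine} SUM(opt1 = 'b' AND opt2 = 1) >= 1
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql)]

and_users = matching_users(conn, "AND")
or_users = matching_users(conn, "OR")
```

Switching between the two query types then only means changing the keyword that joins the two HAVING conditions.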

Adding these two indexes would speed up the EXISTS:
INDEX(user_id, opt1, opt3)
INDEX(user_id, opt1, opt2)
Your schema is a variant on EAV, which is notoriously inefficient and clumsy. Is there a good reason not to have opt2 and opt3 in users?

Related

Tweets implementation MySQL and queries

I am implementing a simple follow/followers system in MySQL. So far I have three tables that look like:
CREATE TABLE IF NOT EXISTS `User` (
`user_id` INT AUTO_INCREMENT PRIMARY KEY,
`username` varchar(40) NOT NULL ,
`pswd` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL ,
`first_name` varchar(40) NOT NULL ,
`last_name` varchar(40) NOT NULL,
CONSTRAINT uc_username_email UNIQUE (username , email)
);
-- Using a middle table for users to follow others on a many-to-many base
CREATE TABLE Following (
follower_id INT(6) NOT NULL,
following_id INT(6) NOT NULL,
KEY (`follower_id`),
KEY (`following_id`)
);
CREATE TABLE IF NOT EXISTS `Tweet` (
`tweet_id` INT AUTO_INCREMENT PRIMARY KEY,
`text` varchar(280) NOT NULL ,
-- I chose varchar vs TEXT as the latter is not stored in the database server’s memory.
-- By querying text data MySQL has to read from it from the disk, much slower in comparison with VARCHAR.
`publication_date` DATETIME NOT NULL,
`username` varchar(40),
FOREIGN KEY (`username`) REFERENCES `User`(`username`)
ON DELETE CASCADE
);
Let's say I want to write a query that returns the 10 latest tweets by users followed by the user with username "Tom". What is the best way to write that query and return results with username, first name, last name, text and publication date?
Also, if one minute later I want to query the 10 latest tweets again, and someone Tom follows has tweeted during that minute, how do I query the database so that it does not select tweets that were already shown in the first query?

To answer your first question:
SELECT u1.username, u1.first_name, u1.last_name, t.text, t.publication_date
FROM Tweet t
JOIN User u1 ON t.username = u1.username
JOIN Following f ON f.following_id = u1.user_id
JOIN User u2 ON u2.user_id = f.follower_id
WHERE u2.username = 'Tom'
ORDER BY t.publication_date DESC
LIMIT 10
For the second part, add t.tweet_id to the select list, take the tweet_id from the first row of the first query (the latest tweet, assuming tweet_id grows with publication_date) and use it in the WHERE clause for the next query, i.e.
WHERE u2.username = 'Tom'
AND t.tweet_id > <value from previous query>

To get latest 10 tweets for Tom:
select flg.username, flg.first_name, flg.last_name, t.tweet_id, t.text, t.publication_date
from user flr
inner join following f on f.follower_id = flr.user_id
inner join user flg on flg.user_id = f.following_id
inner join tweet t on t.username = flg.username
where flr.username = 'Tom'
order by tweet_id desc
limit 10
To get only the tweets posted since then, pass in the max tweet_id from the previous result, and apply an additional condition in the where clause:
where flr.username = 'Tom'
and t.tweet_id > <previous_max_tweet_id>
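A minimal sketch of that "remember the highest tweet_id" idea, run against an in-memory SQLite database from Python. SQLite stands in for MySQL, the tables are trimmed to the columns the query needs, the follow table is renamed to follows for this sketch, and the function name new_tweets and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE follows (follower_id INTEGER, following_id INTEGER);
CREATE TABLE tweet (tweet_id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, username TEXT);
INSERT INTO user VALUES (1, 'Tom'), (2, 'Ann'), (3, 'Bob');
INSERT INTO follows VALUES (1, 2);  -- Tom follows Ann only
INSERT INTO tweet (text, username) VALUES ('ann 1', 'Ann'), ('bob 1', 'Bob'), ('ann 2', 'Ann');
""")

def new_tweets(conn, follower, after_id=0, limit=10):
    """Latest tweets from accounts `follower` follows, newer than after_id."""
    sql = """
        SELECT t.tweet_id, flg.username, t.text
        FROM user flr
        JOIN follows f ON f.follower_id = flr.user_id
        JOIN user flg ON flg.user_id = f.following_id
        JOIN tweet t ON t.username = flg.username
        WHERE flr.username = ? AND t.tweet_id > ?
        ORDER BY t.tweet_id DESC
        LIMIT ?
    """
    return conn.execute(sql, (follower, after_id, limit)).fetchall()

# First request: nothing seen yet.
first_page = new_tweets(conn, "Tom")
last_seen = max(row[0] for row in first_page)

# A followed user tweets in the meantime; the second request returns only that tweet.
conn.execute("INSERT INTO tweet (text, username) VALUES ('ann 3', 'Ann')")
second_page = new_tweets(conn, "Tom", after_id=last_seen)
```

The client only has to keep last_seen between requests; no server-side record of what was shown is needed.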

Mysql query with multiple selects results in high CPU load

I'm trying to write a link exchange script and have run into a bit of trouble.
Each link can be visited by an IP address a number of x times (frequency in links table). Each visit costs a number of credits (spend limit given in limit in links table)
I've got the following tables:
CREATE TABLE IF NOT EXISTS `contor` (
`key` varchar(25) NOT NULL,
`uniqueHandler` varchar(30) DEFAULT NULL,
`uniqueLink` varchar(30) DEFAULT NULL,
`uniqueUser` varchar(30) DEFAULT NULL,
`owner` varchar(50) NOT NULL,
`ip` varchar(15) DEFAULT NULL,
`credits` float NOT NULL,
`tstamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`),
KEY `uniqueLink` (`uniqueLink`),
KEY `uniqueHandler` (`uniqueHandler`),
KEY `uniqueUser` (`uniqueUser`),
KEY `owner` (`owner`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `links` (
`unique` varchar(30) NOT NULL DEFAULT '',
`url` varchar(1000) DEFAULT NULL,
`frequency` varchar(5) DEFAULT NULL,
`limit` float NOT NULL DEFAULT '0',
PRIMARY KEY (`unique`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I've got the following query:
$link = MYSQL_QUERY("
SELECT *
FROM `links`
WHERE (SELECT count(key) FROM contor WHERE ip = '$ip' AND contor.uniqueLink = links.unique) <= `frequency`
AND (SELECT sum(credits) as cost FROM contor WHERE contor.uniqueLink = links.unique) <= `limit`")
There are 20 rows in the table links.
The problem is that whenever there are about 200k rows in the table contor the CPU load is huge.
After applying the solution provided by @Barmar:
After adding a composite index on (uniqueLink, ip) and dropping all other indexes except PRIMARY, EXPLAIN gives me this:
id  select_type  table       type   possible_keys  key         key_len  ref   rows    Extra
1   PRIMARY      l           ALL    NULL           NULL        NULL     NULL  18
1   PRIMARY      <derived2>  ALL    NULL           NULL        NULL     NULL  15
2   DERIVED      pop_contor  index  NULL           contor_IX1  141      NULL  206122

Try using a join rather than a correlated subquery.
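SELECT l.*
FROM links AS l
LEFT JOIN (
SELECT uniqueLink, SUM(ip = '$ip') AS ip_visits, SUM(credits) AS total_credits
FROM contor
GROUP BY uniqueLink
) AS c
ON c.uniqueLink = l.`unique`
-- Filter in WHERE: conditions in the ON clause of a LEFT JOIN would not remove any links.
-- COALESCE treats links with no rows in contor as having 0 visits and 0 credits spent.
WHERE COALESCE(c.ip_visits, 0) <= l.frequency
AND COALESCE(c.total_credits, 0) <= l.`limit`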
If this doesn't help, try adding an index on contor.ip.

The current query is of the form:
SELECT l.*
FROM `links` l
WHERE l.frequency >= ( SELECT COUNT(ck.key)
FROM contor ck
WHERE ck.uniqueLink = l.unique
AND ck.ip = '$ip'
)
AND l.limit >= ( SELECT SUM(sc.credits)
FROM contor sc
WHERE sc.uniqueLink = l.unique
)
Those correlated subqueries are going to eat your lunch. And your lunchbox too.
I'd suggest testing an inline view that performs both of the aggregations from contor in one pass, and then join the result from that to the links table.
Something like this:
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip' AND c.key IS NOT NULL) AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
For optimal performance of the aggregation inline view query, provide a covering index that MySQL can use to optimize the GROUP BY (avoiding a Using filesort operation)
CREATE INDEX `contor_IX1` ON `contor` (`uniqueLink`, `credits`, `ip`) ;
Adding that index renders the uniqueLink index redundant, so also...
DROP INDEX `uniqueLink` ON `contor` ;
EDIT
Since the contor.key column is guaranteed to be non-NULL (it has a NOT NULL constraint), the AND c.key IS NOT NULL part of the query above is unneeded and can be removed. (I also removed the key column from the covering index definition above.)
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip') AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
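To see the one-pass aggregation joined back to links in action, here is a small sketch using Python's sqlite3 module with an in-memory database. SQLite stands in for MySQL, the columns unique and limit are renamed to link_id and credit_limit to sidestep reserved words, and the sample rows and the helper name available_links are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (link_id TEXT PRIMARY KEY, frequency INTEGER, credit_limit REAL);
CREATE TABLE contor (link_id TEXT, ip TEXT, credits REAL);
CREATE INDEX contor_ix1 ON contor (link_id, credits, ip);
INSERT INTO links VALUES ('A', 2, 10.0), ('B', 1, 10.0), ('C', 5, 1.0);
INSERT INTO contor VALUES
  ('A', '1.1.1.1', 1.0), ('A', '2.2.2.2', 1.0),
  ('B', '1.1.1.1', 1.0), ('B', '1.1.1.1', 1.0),
  ('C', '2.2.2.2', 1.5);
""")

def available_links(conn, ip):
    """Links this ip may still visit and whose spend is within the limit."""
    sql = """
        SELECT l.link_id
        FROM (SELECT c.link_id,
                     SUM(c.ip = ?) AS ip_visits,        -- visits by this ip only
                     SUM(c.credits) AS total_credits    -- spend across all visitors
              FROM contor c
              GROUP BY c.link_id) d
        JOIN links l
          ON l.link_id = d.link_id
         AND l.frequency >= d.ip_visits
         AND l.credit_limit >= d.total_credits
        ORDER BY l.link_id
    """
    # The ip is bound as a parameter instead of being interpolated into the SQL text.
    return [row[0] for row in conn.execute(sql, (ip,))]

for_first_ip = available_links(conn, "1.1.1.1")
for_second_ip = available_links(conn, "2.2.2.2")
```

Because this is an inner join on the aggregated rows, a link that has never been visited does not appear in the result; whether that is wanted depends on the application.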

Using rows from the first query in the sub-query?

I have a somewhat complicated query:
SELECT SQL_CALC_FOUND_ROWS DISTINCT l1.item_id, l1.uid, l2.id, l2.uid, u.prename, l1.item_id, l2.item_id,
(SELECT SUM(cnt) FROM
(
SELECT DISTINCT
p1.item_id,
COUNT(*) AS cnt
FROM pages_likes AS p1
JOIN pages_likes AS p2 ON p1.item_id = p2.item_id AND p1.status = p2.status
WHERE p1.uid = 391 AND p2.uid = 1091
GROUP BY p1.id
ORDER BY p1.date DESC
) AS t) AS total
FROM pages_likes l1
JOIN users u on u.id = l1.uid
JOIN pages_likes l2 on l1.item_id = l2.item_id
JOIN users_likes ul on l1.uid = ul.uid
WHERE ul.date >= DATE_SUB(NOW(),INTERVAL 1 WEEK)
AND l1.uid != 1091 AND l2.uid = 1091
AND (l1.status = 1 AND l2.status = 1)
AND u.gender = 2
GROUP BY l1.uid
ORDER BY
total DESC,
l1.uid DESC,
l1.date DESC
What I expect: It should display all users, sorted by total page likes we have in common that also are the most liked users this week.
The thing is that I inserted fixed values (391 and 1091) as user ids to test the query. Since it should be dynamic, I need to use l1.uid from the outer query inside the subquery, so it should be WHERE p1.uid = l1.uid AND p2.uid = 1091, but MySQL can't find the column.
status = 1 means user liked this page, status = 0 means user disliked this page.
Table structure here:
CREATE TABLE pages_likes
(
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT,
uid INT NOT NULL,
date DATETIME NOT NULL,
item_id INT,
status TINYINT
);
CREATE INDEX item_index ON pages_likes (item_id);
CREATE INDEX uid_index ON pages_likes (uid);
CREATE TABLE users
(
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT,
fb_uid VARCHAR(255),
email VARCHAR(100) NOT NULL,
pass VARCHAR(50) NOT NULL,
gender TINYINT NOT NULL,
birthdate DATE,
signup DATETIME NOT NULL,
lang VARCHAR(10) NOT NULL,
username VARCHAR(255),
prename VARCHAR(255) NOT NULL,
surname VARCHAR(255) NOT NULL,
projects VARCHAR(255) NOT NULL,
views INT DEFAULT 0,
verified DATETIME
);
CREATE UNIQUE INDEX id_index ON users (id);
CREATE INDEX uid_index ON users (id);
CREATE TABLE users_likes
(
id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
uid INT NOT NULL,
date DATETIME NOT NULL,
item_id INT,
status TINYINT
);
CREATE INDEX item_index ON users_likes (item_id);
CREATE INDEX uid_index ON users_likes (uid);

Have you tried different alias names for your subquery and then using the alias from the outer query? It works for me in this simple example: http://rextester.com/MJOL87502
Sadly I cannot test in your sqlfiddle, since that site often doesn't respond or throws errors (like it does now).
You could also use window functions and replace your subselect with something as simple as COUNT(*) OVER (PARTITION BY p1.id, p1.item_id), but MySQL only supports window functions from version 8.0 onward.
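One way to make the count depend on the outer row is to drop the extra derived-table layer, so that the correlated reference sits directly in the scalar subquery's own WHERE clause. The sketch below illustrates that shape with Python's sqlite3 module and an in-memory database. SQLite stands in for MySQL, the table is cut down to three columns, the sample rows are invented, and the helper name users_by_common_likes is hypothetical; it is not the asker's full query.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages_likes (uid INTEGER, item_id INTEGER, status INTEGER);
INSERT INTO pages_likes VALUES
  (1091, 10, 1), (1091, 20, 1), (1091, 30, 0),
  (391, 10, 1), (391, 20, 1), (391, 30, 1),
  (500, 10, 1), (500, 30, 1);
""")

def users_by_common_likes(conn, me):
    """Other users with the number of pages rated the same way as `me`."""
    sql = """
        SELECT o.uid,
               (SELECT COUNT(*)
                FROM pages_likes p1
                JOIN pages_likes p2
                  ON p2.item_id = p1.item_id AND p2.status = p1.status
                WHERE p1.uid = o.uid      -- correlated with the outer row
                  AND p2.uid = ?) AS total
        FROM (SELECT DISTINCT uid FROM pages_likes WHERE uid <> ?) o
        ORDER BY total DESC, o.uid DESC
    """
    return conn.execute(sql, (me, me)).fetchall()

ranking = users_by_common_likes(conn, 1091)
```

Here the scalar subquery refers to o.uid directly, with no subquery in a FROM clause between it and the outer row.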

MySQL left join doesnt give me what i expect

I'd like some help with a left join statement that's not doing what I, probably incorrectly, think it should do.
There are two tables:
cd:
CREATE TABLE `cd` (
`itemID` int(11) NOT NULL AUTO_INCREMENT,
`title` text NOT NULL,
`artist` text NOT NULL,
`genre` text NOT NULL,
`tracks` int(11) NOT NULL,
PRIMARY KEY (`itemID`)
)
loans
CREATE TABLE `loans` (
`itemID` int(11) NOT NULL,
`itemType` varchar(20) NOT NULL,
`userID` int(11) NOT NULL,
`dueDate` date NOT NULL,
PRIMARY KEY (`itemID`,`itemType`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and I want to select all CDs that are not in loans, using a left join and then a where dueDate is null:
select
t.itemID,
t.artist as first,
t. title as second,
(select AVG(rating) from ac9039.ratings where itemType = 'cd' and itemId = t.itemID) as `rating avarage`,
(select COUNT(rating) from ac9039.ratings where itemType = 'cd' and itemId = t.itemID) as `number of ratings`
from
cd t left join loans l
on t.itemID = l.itemID
where l.itemType = 'cd' and l.dueDate is null;
This one, however, returns an empty table even though there are plenty of rows in cd with itemIDs that are not in loans.
I was under the impression that the left join should preserve the right-hand side and fill the columns from the left-hand side with null values,
but this does not seem to be the case. Can anyone enlighten me?

Your WHERE condition is the cause. l.itemType = 'cd' can never be true when l.dueDate IS NULL is true. (All of the columns in loans are NOT NULL, so dueDate can only be NULL when there is no matching record, and in that case itemType is NULL too.)
Another point is that your query is semantically off. You are trying to get the records from the cd table for which the loans table contains no matching row.
The second table acts as a condition, so it belongs in the WHERE clause.
Consider using NOT EXISTS to achieve your goal:
SELECT
t.itemID,
t.artist as first,
t. title as second,
(select AVG(rating) from ac9039.ratings where itemType = 'cd' and itemId = t.itemID) as `rating avarage`,
(select COUNT(rating) from ac9039.ratings where itemType = 'cd' and itemId = t.itemID) as `number of ratings`
FROM
cd t
WHERE
NOT EXISTS (SELECT 1 FROM loans l WHERE t.itemID = l.itemID AND l.itemType = 'cd')
Depending on your data model, you may have to add another condition to the subquery so that loans that are already past due (dueDate earlier than the current time) are ignored.
This is the case when you do not delete outdated loan records.
NOT EXISTS (SELECT 1 FROM loans l WHERE t.itemID = l.itemID AND l.itemType = 'cd' AND l.dueDate > NOW())
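The difference between filtering the joined table in WHERE and in ON can be checked with a small experiment. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the tables are trimmed to a few columns, the rows are made up, and the helper name ids is hypothetical. It runs the original query, a LEFT JOIN variant with the itemType test moved into ON, and the NOT EXISTS form.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cd (itemID INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE loans (itemID INTEGER NOT NULL, itemType TEXT NOT NULL, dueDate TEXT NOT NULL,
                    PRIMARY KEY (itemID, itemType));
INSERT INTO cd VALUES (1, 'On loan'), (2, 'Free'), (3, 'Book with same id on loan');
INSERT INTO loans VALUES (1, 'cd', '2030-01-01'), (3, 'book', '2030-01-01');
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Original shape: the WHERE test on l.itemType discards every unmatched (NULL) row.
broken = ids("""
    SELECT t.itemID FROM cd t LEFT JOIN loans l ON t.itemID = l.itemID
    WHERE l.itemType = 'cd' AND l.dueDate IS NULL
""")

# Anti-join: the itemType test moves into ON, WHERE only keeps unmatched rows.
left_join = ids("""
    SELECT t.itemID FROM cd t
    LEFT JOIN loans l ON t.itemID = l.itemID AND l.itemType = 'cd'
    WHERE l.itemID IS NULL
    ORDER BY t.itemID
""")

# Equivalent NOT EXISTS form.
not_exists = ids("""
    SELECT t.itemID FROM cd t
    WHERE NOT EXISTS (SELECT 1 FROM loans l
                      WHERE l.itemID = t.itemID AND l.itemType = 'cd')
    ORDER BY t.itemID
""")
```

With this sample data the original query returns nothing, while both rewritten forms return the two CDs that have no 'cd' loan, including the one whose id is shared with a loaned book.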

MySQL update a table and select from the same table in a subquery

I have a table of links:
CREATE TABLE `linktable` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`idParent` BIGINT(20) UNSIGNED NOT NULL,
`Role` ENUM('Contacts','Expert','...') NULL DEFAULT NULL,
`idChild` BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `UK_Parent_Child_Role` (`idParent`, `idChild`, `Role`)
)
I want to update this table without breaking the unique key.
With other databases I would write something like this:
Update linktable lt1 Set lt1.idParent = :ziNew Where lt1.idParent = :ziOld
and not exists (select * from linktable lt2 where lt2.idParent = :ziNew and lt1.role = lt2.role and lt1.idChild = lt2.idChild);
How can I do this with MySQL?

Using your same syntax for variables, you would do this with a join:
Update linktable lt1 left outer join
(select *
from linktable lt2
where lt2.idParent = :ziNew
) lt2
on lt1.role = lt2.role and lt1.idChild = lt2.idChild
Set lt1.idParent = :ziNew
Where lt1.idParent = :ziOld and lt2.idParent is null;
The problem in MySQL is that the subquery reads from the same table that is being updated, which MySQL does not allow in the WHERE clause of an UPDATE. If it were a different table, then the original form with not exists would still work.
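To make the intended effect concrete (move the children of one parent to another while skipping rows that would collide with the unique key), here is a small sketch using Python's sqlite3 module and an in-memory database. SQLite accepts the NOT EXISTS form from the question, so this only illustrates which rows should change; on MySQL the join form above would be used instead. The helper name reparent and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; it accepts the self-referencing NOT EXISTS form.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE linktable (
  id INTEGER PRIMARY KEY,
  idParent INTEGER NOT NULL,
  Role TEXT,
  idChild INTEGER NOT NULL,
  UNIQUE (idParent, idChild, Role)
);
INSERT INTO linktable (idParent, Role, idChild) VALUES
  (1, 'Expert', 10),   -- would collide with (2, 'Expert', 10) after the move
  (1, 'Expert', 11),   -- safe to move
  (2, 'Expert', 10);
""")

def reparent(conn, old, new):
    """Move links from parent `old` to `new`, skipping rows that would duplicate."""
    cur = conn.execute("""
        UPDATE linktable
        SET idParent = ?
        WHERE idParent = ?
          AND NOT EXISTS (SELECT 1 FROM linktable lt2
                          WHERE lt2.idParent = ?
                            AND lt2.Role = linktable.Role
                            AND lt2.idChild = linktable.idChild)
    """, (new, old, new))
    return cur.rowcount

moved = reparent(conn, 1, 2)
rows = conn.execute(
    "SELECT idParent, idChild FROM linktable ORDER BY idParent, idChild"
).fetchall()
```

The row that would have duplicated an existing (parent, child, role) combination stays with the old parent, so it can be reviewed or deleted separately.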
