My problem is that I have two different queries that work well in different situations.
SCHEMA
messages: message_id, entity_id, message, timestamp
subscription: user_id, entity_id
users: user_id
entities: entity_id
Situation 1: Lots of message entries, and at least one relevant subscription entry
Situation 2: Few message entries and/or few, or zero, subscription entries that are relevant
My two queries are:
SELECT messages.*
FROM messages
STRAIGHT_JOIN subscription ON subscription.entity_id = messages.entity_id
WHERE subscription.user_id = 1
ORDER BY messages.timestamp DESC
LIMIT 50
This query works well in situation 1 (.000x seconds): lots of message entries, and at least one relevant subscription entry. The same query takes 1.7+ seconds in situation 2.
SELECT messages.*
FROM messages
INNER JOIN subscription ON subscription.entity_id = messages.entity_id
WHERE subscription.user_id = 1
ORDER BY messages.timestamp DESC
LIMIT 50
This query works well in situation 2 (.000x seconds): few message entries and/or few, or zero, relevant subscription entries. The same query takes 1.3+ seconds in situation 1.
Is there a query that gets the best of both worlds? If not, what's the best way to handle this case?
Indexes:
( subscription.user_id, subscription.entity_id )
( subscription.entity_id )
( messages.entity_id, messages.timestamp )
( messages.timestamp )
EXPLAIN output with LIMIT 50:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | messages | index | idx_timestamp | idx_timestamp | 4 | NULL | 50 | |
| 1 | SIMPLE | subscription | eq_ref | PRIMARY,entity_id,user_id | PRIMARY | 16 | const, messages.entity_id | 1 | Using index |
EXPLAIN output without LIMIT:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | messages | ALL | entity_id_2,entity_id | NULL | NULL | NULL | 255069 | Using filesort |
| 1 | SIMPLE | subscription | eq_ref | PRIMARY,entity_id,user_id | PRIMARY | 16 | const, messages.entity_id | 1 | Using index |
CREATE TABLE statements:
subscription (~5,000 rows):
CREATE TABLE `subscription` (
`user_id` bigint(20) unsigned NOT NULL,
`entity_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`entity_id`),
KEY `entity_id` (`entity_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
messages (~255,000 rows):
CREATE TABLE `messages` (
`message_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`entity_id` bigint(20) unsigned NOT NULL,
`message` varchar(255) NOT NULL DEFAULT '',
`timestamp` int(10) unsigned NOT NULL,
PRIMARY KEY (`message_id`),
KEY `entity_id` (`entity_id`,`timestamp`),
KEY `idx_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Try moving the condition from the WHERE clause into the ON clause with an AND:
SELECT messages.*
FROM messages
STRAIGHT_JOIN subscription ON subscription.entity_id = messages.entity_id
AND subscription.user_id = 1
ORDER BY messages.timestamp DESC
LIMIT 50
Or:
SELECT messages.*
FROM messages
INNER JOIN subscription ON subscription.entity_id = messages.entity_id
AND subscription.user_id = 1
ORDER BY messages.timestamp DESC
LIMIT 50
Or maybe this way, driving the join from subscription:
SELECT messages.*
FROM subscription
STRAIGHT_JOIN messages ON subscription.entity_id = messages.entity_id
WHERE subscription.user_id = 1
ORDER BY messages.timestamp DESC
LIMIT 50
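As a rough check that these rewrites differ only in join order and not in results, here is a small sketch that runs both orders against toy data. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption: SQLite has no STRAIGHT_JOIN, but its CROSS JOIN keeps the left table as the outer loop, which is the closest analog). The data, index names, and helper names are hypothetical.

```python
import sqlite3

# Hypothetical schema mirroring the question's tables and indexes.
SCHEMA = """
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    entity_id  INTEGER NOT NULL,
    message    TEXT    NOT NULL DEFAULT '',
    timestamp  INTEGER NOT NULL
);
CREATE TABLE subscription (
    user_id   INTEGER NOT NULL,
    entity_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, entity_id)
);
CREATE INDEX idx_entity_ts ON messages (entity_id, timestamp);
CREATE INDEX idx_timestamp ON messages (timestamp);
"""

# messages is the outer loop (like the STRAIGHT_JOIN query in the question).
MESSAGES_FIRST = """
SELECT m.message_id
FROM messages AS m CROSS JOIN subscription AS s
WHERE s.entity_id = m.entity_id AND s.user_id = ?
ORDER BY m.timestamp DESC
LIMIT ?
"""

# subscription is the outer loop (like the last suggestion above).
SUBSCRIPTION_FIRST = """
SELECT m.message_id
FROM subscription AS s CROSS JOIN messages AS m
WHERE s.entity_id = m.entity_id AND s.user_id = ?
ORDER BY m.timestamp DESC
LIMIT ?
"""


def build_demo():
    """Create an in-memory database; user 1 subscribes to entities 1 and 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subscription VALUES (?, ?)",
                     [(1, 1), (1, 3), (2, 2)])
    # Nine messages spread round-robin over entities 1..3, timestamps increasing.
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)",
                     [(i, i % 3 + 1, f"msg {i}", 1000 + i) for i in range(1, 10)])
    return conn


def latest_ids(conn, sql, user_id, limit=50):
    """Return message ids from the user's subscriptions, newest first."""
    return [row[0] for row in conn.execute(sql, (user_id, limit))]


if __name__ == "__main__":
    conn = build_demo()
    print(latest_ids(conn, MESSAGES_FIRST, 1))
    print(latest_ids(conn, SUBSCRIPTION_FIRST, 1))
```

Both orders return the same rows; which one is faster depends on how many messages and how many relevant subscriptions exist, which is exactly the trade-off described in the question.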
I have two tables that I want to join on one attribute (Sensor_id). Then I want to GROUP BY the same attribute, but I need the result ordered by the Timestamp attribute DESC. So I used a subquery to first ORDER BY Timestamp DESC, and then the outer query does the GROUP BY Sensor_id.
First table: Sensors_colocation
=========================================================================================
| Sensor_id | Sensor_longitude | Sensor_latitude | Paese | Pseudonimo | limit1 | limit2 |
=========================================================================================
Second table: log
===========================================
| Id | Mac_reali | Mac_random | Timestamp |
===========================================
Using
SELECT * FROM log AS L JOIN Sensors_colocation AS S ON L.Id = S.Sensor_id ORDER BY L.Id ASC, L.Timestamp DESC
I get what I want on both of my servers.
The problem appears when I run the full query:
SELECT * FROM (
SELECT * FROM log AS L JOIN Sensors_colocation AS S ON L.Id = S.Sensor_id
ORDER BY L.Id ASC, L.Timestamp DESC) AS temp
GROUP BY temp.Id
On one server I get the results sorted by Timestamp DESC and grouped by Id. On the other server (which has the same structure but different data) I get the results sorted by Timestamp ASC and grouped by Id. I don't understand why the ORDER BY in my inner query is not respected when I use a subquery.
Can you help me?
EDIT: My goal is to get all the attributes of the joined tables, but only the latest entry (by Timestamp) for every Id.
EDIT2:
Not working (10.1.41-MariaDB-0+deb9u1):
CREATE TABLE `log` (
`Id` int(11) NOT NULL,
`Mac_reali` int(11) NOT NULL,
`Mac_random` int(11) NOT NULL,
`Timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `log`
ADD PRIMARY KEY (`Id`,`Timestamp`);
CREATE TABLE `Sensors_colocation` (
`Sensor_id` int(11) NOT NULL,
`Sensor_longitude` decimal(7,6) NOT NULL,
`Sensor_latitude` decimal(8,6) NOT NULL,
`Paese` varchar(32) NOT NULL,
`Pseudonimo` varchar(32) NOT NULL,
`limit1` int(11) NOT NULL,
`limit2` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `Sensors_colocation`
ADD PRIMARY KEY (`Sensor_id`);
Working (5.6.33-log):
CREATE TABLE IF NOT EXISTS `log` (
`Id` int(11) NOT NULL,
`Mac_reali` int(11) NOT NULL,
`Mac_random` int(11) NOT NULL,
`Timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`Id`,`Timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `Sensors_colocation` (
`Sensor_id` int(11) NOT NULL,
`Sensor_longitude` decimal(7,6) NOT NULL,
`Sensor_latitude` decimal(8,6) NOT NULL,
`Paese` varchar(32) NOT NULL,
`Pseudonimo` varchar(32) NOT NULL,
`limit1` int(11) NOT NULL,
`limit2` int(11) NOT NULL,
PRIMARY KEY (`Sensor_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
EDIT3:
Consider the output of the inner query (I have omitted some attributes that we don't need):
Id | Mac_reali | Timestamp | Sensor_id | Pseudonimo
1 | 30 | "2019-09-29 17:27:33" | 1 | Manarola(Stazione)
1 | 23 | "2019-09-29 17:25:33" | 1 | Manarola(Stazione)
1 | 57 | "2019-09-29 17:23:33" | 1 | Manarola(Stazione)
2 | 12 | "2019-09-29 17:28:42" | 2 | Vernazza(Stazione)
2 | 33 | "2019-09-29 17:26:42" | 2 | Vernazza(Stazione)
2 | 12 | "2019-09-29 17:24:42" | 2 | Vernazza(Stazione)
3 | 23 | "2019-09-29 17:33:42" | 3 | Monterosso(Stazione)
3 | 17 | "2019-09-29 17:31:42" | 3 | Monterosso(Stazione)
3 | 16 | "2019-09-29 17:29:42" | 3 | Monterosso(Stazione)
From the outer query on the "working" server I get:
Id | Mac_reali | Timestamp | Sensor_id | Pseudonimo
1 | 30 | "2019-09-29 17:27:33" | 1 | Manarola(Stazione)
2 | 12 | "2019-09-29 17:28:42" | 2 | Vernazza(Stazione)
3 | 23 | "2019-09-29 17:33:42" | 3 | Monterosso(Stazione)
From the "not working" server I get the opposite with respect to Timestamp (as if the ORDER BY were ignored):
Id | Mac_reali | Timestamp | Sensor_id | Pseudonimo
1 | 57 | "2019-09-29 17:23:33" | 1 | Manarola(Stazione)
2 | 12 | "2019-09-29 17:24:42" | 2 | Vernazza(Stazione)
3 | 16 | "2019-09-29 17:29:42" | 3 | Monterosso(Stazione)
To get all the attributes of the joined tables but only the latest entry (by Timestamp) for every Id, consider this approach, which uses a correlated subquery to ensure that there is no other log record for the same Id with a greater timestamp:
SELECT *
FROM log l
INNER JOIN Sensors_colocation s ON l.id = s.sensor_id
WHERE NOT EXISTS (
SELECT 1
FROM log l1
WHERE l1.id = l.id AND l1.timestamp > l.timestamp
)
ORDER BY l.id ASC, l.timestamp DESC
If you are running MySQL 8.0, you can get the same result by using the window function ROW_NUMBER() to rank records by descending timestamp within groups of records that share the same id, and then filtering on the top record of each group (this also works on MariaDB 10.2 and later, but not on the 10.1 server):
SELECT *
FROM (
SELECT
l.*,
s.*,
ROW_NUMBER() OVER(PARTITION BY l.id ORDER BY l.timestamp DESC) rn
FROM log l
INNER JOIN Sensors_colocation s ON l.id = s.sensor_id
) x
WHERE rn = 1
Note: for performance, you need an index on log(id, timestamp); the PRIMARY KEY (`Id`,`Timestamp`) in both schemas above already provides it.
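Both queries can be checked against the EDIT3 sample rows with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL/MariaDB (SQLite 3.25 or later is assumed for ROW_NUMBER()), and the tables are reduced to the columns shown in EDIT3.

```python
import sqlite3

# Reduced tables: only the columns shown in EDIT3. Timestamps are stored as
# 'YYYY-MM-DD HH:MM:SS' text, which sorts in chronological order.
SCHEMA = """
CREATE TABLE log (
    Id        INTEGER NOT NULL,
    Mac_reali INTEGER NOT NULL,
    Timestamp TEXT    NOT NULL,
    PRIMARY KEY (Id, Timestamp)
);
CREATE TABLE Sensors_colocation (
    Sensor_id  INTEGER PRIMARY KEY,
    Pseudonimo TEXT NOT NULL
);
"""

LOG_ROWS = [
    (1, 30, "2019-09-29 17:27:33"), (1, 23, "2019-09-29 17:25:33"), (1, 57, "2019-09-29 17:23:33"),
    (2, 12, "2019-09-29 17:28:42"), (2, 33, "2019-09-29 17:26:42"), (2, 12, "2019-09-29 17:24:42"),
    (3, 23, "2019-09-29 17:33:42"), (3, 17, "2019-09-29 17:31:42"), (3, 16, "2019-09-29 17:29:42"),
]
SENSOR_ROWS = [(1, "Manarola(Stazione)"), (2, "Vernazza(Stazione)"), (3, "Monterosso(Stazione)")]

# Keep a log row only if no row for the same Id has a later Timestamp.
NOT_EXISTS_SQL = """
SELECT l.Id, l.Mac_reali, l.Timestamp, s.Pseudonimo
FROM log AS l
JOIN Sensors_colocation AS s ON l.Id = s.Sensor_id
WHERE NOT EXISTS (
    SELECT 1 FROM log AS l1
    WHERE l1.Id = l.Id AND l1.Timestamp > l.Timestamp
)
ORDER BY l.Id
"""

# Rank rows per Id by descending Timestamp and keep rank 1.
ROW_NUMBER_SQL = """
SELECT Id, Mac_reali, Timestamp, Pseudonimo
FROM (
    SELECT l.Id, l.Mac_reali, l.Timestamp, s.Pseudonimo,
           ROW_NUMBER() OVER (PARTITION BY l.Id ORDER BY l.Timestamp DESC) AS rn
    FROM log AS l
    JOIN Sensors_colocation AS s ON l.Id = s.Sensor_id
) AS x
WHERE rn = 1
ORDER BY Id
"""


def latest_per_sensor(sql):
    """Load the sample rows into a fresh in-memory database and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", LOG_ROWS)
    conn.executemany("INSERT INTO Sensors_colocation VALUES (?, ?)", SENSOR_ROWS)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in latest_per_sensor(NOT_EXISTS_SQL):
        print(row)
```

Both versions return the rows the "working" server happened to produce, but they do so by an explicit rule rather than by relying on the order of rows inside a derived table.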
I have a multi-tenant application with a single database. I have an "entity" table where all objects are stored. The "shared_entity" table stores objects that Tenant X shares with Tenant Y. For example, "Tenant 2" can share "Entity with ID 4" with "Tenant 1".
In the example below, "Entity with ID 4" is shared with "Tenant 1" and "Tenant 3".
CREATE TABLE `entity` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`tenant_id` int(10) unsigned NOT NULL,
`added_at` timestamp NOT NULL,
`color` varchar(20) NOT NULL,
`size` varchar(5) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
CREATE TABLE `shared_entity` (
`tenant_to` int(10) unsigned NOT NULL,
`tenant_from` int(10) unsigned NOT NULL,
`entity_id` int(10) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The sample data is
select * from entity;
+----+-----------+---------------------+--------+------+
| id | tenant_id | added_at | color | size |
+----+-----------+---------------------+--------+------+
| 1 | 1 | 2019-03-07 00:00:00 | red | m |
| 2 | 1 | 2019-03-07 00:00:00 | green | xl |
| 3 | 2 | 2019-03-07 00:00:00 | green | xl |
| 4 | 2 | 2019-03-07 00:00:00 | red | m |
| 5 | 3 | 2019-03-07 00:00:00 | yellow | l |
+----+-----------+---------------------+--------+------+
select * from shared_entity;
+-----------+-------------+-----------+
| tenant_to | tenant_from | entity_id |
+-----------+-------------+-----------+
| 1 | 2 | 4 |
| 3 | 2 | 4 |
+-----------+-------------+-----------+
Now I need to create a simple search query. So far I have found two ways to do it. The first uses self-joins:
SELECT e.* FROM `entity` as e
LEFT JOIN entity as e1 ON (e.id = e1.id AND e1.tenant_id = 1)
LEFT JOIN entity as e2 ON (e.id = e2.id AND e2.id IN (4))
WHERE (e1.id IS NOT NULL OR e2.id IS NOT NULL) AND e.`color` = 'red';
The second uses a subquery and UNION:
SELECT * FROM
(
SELECT * FROM entity as e1 WHERE e1.tenant_id = 1
UNION
SELECT * FROM entity as e2 WHERE e2.id IN(4)
) as entity
WHERE color = 'red';
Both queries return the expected result:
+----+-----------+---------------------+-------+------+
| id | tenant_id | added_at | color | size |
+----+-----------+---------------------+-------+------+
| 1 | 1 | 2019-03-07 00:00:00 | red | m |
| 4 | 2 | 2019-03-07 00:00:00 | red | m |
+----+-----------+---------------------+-------+------+
But which approach is better for large tables? How should I create the right index? Or is there a better solution?
You could also use the following query to get the same results:
SELECT *
FROM entity
WHERE (tenant_id = 1 or id = 4) AND color = 'red'
It is not clear to me why you need all the joins.
Every table should have a PRIMARY KEY. shared_entity needs PRIMARY KEY(tenant_from, tenant_to, entity_id); any order would probably suffice.
As for performance, hogan's suggestion, together with INDEX(color), is fine for a small table:
SELECT *
FROM entity
WHERE (tenant_id = 1 OR id = 4)
AND color = 'red'
But OR prevents most forms of optimization. If color is selective enough, this is not a problem; it will simply scan through all the "red" items, checking each one's tenant_id and id.
If there are thousands of red items, this will run faster:
( SELECT *
FROM entity
WHERE tenant_id = 1
AND color = 'red' )
UNION DISTINCT
( SELECT *
FROM entity
WHERE id = 4
AND color = 'red' )
together with
INDEX(color, tenant_id) -- in either order
-- PRIMARY KEY(id) -- already exists and is unique
UNION DISTINCT can be replaced with the faster UNION ALL if you know that tenant 1 and id 4 don't refer to the same row.
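A minimal sketch comparing the OR and UNION forms on the sample data, using Python's sqlite3 as a stand-in for MySQL. SQLite spells UNION DISTINCT as plain UNION and does not accept parenthesized SELECTs in a compound query, so those parts are adapted. The third query, which reads the shared ids from shared_entity instead of hard-coding id 4, is a hypothetical extension, not something proposed in the answers above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entity (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    added_at  TEXT    NOT NULL,
    color     TEXT    NOT NULL,
    size      TEXT    NOT NULL
);
CREATE INDEX idx_color_tenant ON entity (color, tenant_id);
CREATE TABLE shared_entity (
    tenant_to   INTEGER NOT NULL,
    tenant_from INTEGER NOT NULL,
    entity_id   INTEGER NOT NULL,
    PRIMARY KEY (tenant_from, tenant_to, entity_id)
);
"""

ENTITIES = [
    (1, 1, "2019-03-07 00:00:00", "red", "m"),
    (2, 1, "2019-03-07 00:00:00", "green", "xl"),
    (3, 2, "2019-03-07 00:00:00", "green", "xl"),
    (4, 2, "2019-03-07 00:00:00", "red", "m"),
    (5, 3, "2019-03-07 00:00:00", "yellow", "l"),
]
SHARED = [(1, 2, 4), (3, 2, 4)]  # (tenant_to, tenant_from, entity_id)

OR_SQL = "SELECT id FROM entity WHERE (tenant_id = ? OR id = ?) AND color = ?"

UNION_SQL = """
SELECT id FROM entity WHERE tenant_id = ? AND color = ?
UNION
SELECT id FROM entity WHERE id = ? AND color = ?
"""

# Hypothetical: take the shared ids from shared_entity instead of hard-coding them.
SHARED_SQL = """
SELECT e.id FROM entity AS e WHERE e.tenant_id = ? AND e.color = ?
UNION
SELECT e.id FROM entity AS e
JOIN shared_entity AS se ON se.entity_id = e.id
WHERE se.tenant_to = ? AND e.color = ?
"""


def connect():
    """Create an in-memory database loaded with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?, ?, ?)", ENTITIES)
    conn.executemany("INSERT INTO shared_entity VALUES (?, ?, ?)", SHARED)
    return conn


def ids(conn, sql, params):
    """Run sql and return the matching ids sorted (the queries have no ORDER BY)."""
    return sorted(row[0] for row in conn.execute(sql, params))


if __name__ == "__main__":
    conn = connect()
    print(ids(conn, UNION_SQL, (1, "red", 4, "red")))
```

Each branch of the UNION can use its own index (color plus tenant_id for the first, the primary key or the shared_entity key for the second), which is the reason the answer prefers it over OR on large tables.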
I have two tables, item and prices.
The prices table holds the price of each item in the item table. There is one extra field in the prices table, named counter, which is incremented by one periodically. So, for each counter value there will be a set of N rows in the prices table, where N is the number of items in the item table.
CREATE TABLE `item` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
);
CREATE TABLE `prices` (
`id` bigint(9) NOT NULL AUTO_INCREMENT,
`price` float(10,2) NOT NULL,
`ts` datetime NOT NULL,
`counter` int(10) unsigned DEFAULT NULL,
`item_id` int(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ts` (`ts`),
KEY `counter` (`item_id`,`counter`) USING BTREE
) ENGINE=InnoDB
Now I need to find the maximum price of each item on any particular day, along with the counter value.
There could be multiple rows with the same price for an item; I need to select the first occurrence.
I have tried the following query.
SELECT p.id, p.counter, max.price
FROM prices AS p
JOIN (
SELECT item_id, MAX( price ) as price
FROM prices
WHERE ts
BETWEEN '2017-07-28 00:00:00'
AND '2017-07-28 23:59:59'
GROUP BY item_id
) AS max
ON max.item_id = p.item_id
AND p.price = max.price;
This doesn't give the desired result.
How do I correct my query?
Thanks.
Edit - Sample data
select id, item_id, price,counter from prices order by item_id, price;
+-------+---------+-------+---------+
| id | item_id | price | counter |
+-------+---------+-------+---------+
| 30192 | 54 | 18.95 | 200 |
| 15061 | 54 | 19.15 | 100 |
| 7503 | 54 | 19.45 | 50 |
| 22598 | 54 | 19.75 | 150 |
| 30127 | 100 | 30.20 | 200 |
| 22569 | 100 | 30.35 | 150 |
| 15033 | 100 | 30.35 | 100 |
| 7460 | 100 | 30.90 | 50 |
| 15084 | 115 | 25.35 | 100 |
| 7533 | 115 | 25.65 | 50 |
| 22623 | 115 | 25.75 | 150 |
| 30152 | 115 | 26.20 | 200 |
+-------+---------+-------+---------+
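I need to get the following output:
+-------+---------+-------+---------+
| id | item_id | price | counter |
+-------+---------+-------+---------+
| 22598 | 54 | 19.75 | 150 |
| 7460 | 100 | 30.90 | 50 |
| 30152 | 115 | 26.20 | 200 |
+-------+---------+-------+---------+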
PS: I have ignored the timestamp for the time being.
You shouldn't do a join; just try a subselect, i.e.:
SELECT p.item_id, p.max_price FROM
(SELECT item_id, MAX( price ) AS max_price
FROM prices
WHERE ts
BETWEEN '2017-07-28 00:00:00'
AND '2017-07-28 23:59:59'
GROUP BY item_id) AS p;
-- the derived table p only exposes item_id and max_price, so id and counter cannot be selected from it
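One way to get id and counter along with the maximum price, sketched under several assumptions. ROW_NUMBER() requires MySQL 8.0; SQLite 3.25 or later, used here through Python's sqlite3 as a stand-in, also supports it. "First occurrence" is taken to mean the lowest counter, with the lowest id as a final tie-breaker. The ts filter is left out, as in the PS; it would go in the inner WHERE. This swaps in a window-function approach rather than the subselect above.

```python
import sqlite3

# (id, item_id, price, counter) from the question's sample data.
SAMPLE = [
    (30192, 54, 18.95, 200), (15061, 54, 19.15, 100), (7503, 54, 19.45, 50), (22598, 54, 19.75, 150),
    (30127, 100, 30.20, 200), (22569, 100, 30.35, 150), (15033, 100, 30.35, 100), (7460, 100, 30.90, 50),
    (15084, 115, 25.35, 100), (7533, 115, 25.65, 50), (22623, 115, 25.75, 150), (30152, 115, 26.20, 200),
]

# Rank each item's rows by price (highest first); among equal prices the lowest
# counter, then the lowest id, wins (assumed meaning of "first occurrence").
MAX_PRICE_SQL = """
SELECT id, item_id, price, counter
FROM (
    SELECT id, item_id, price, counter,
           ROW_NUMBER() OVER (PARTITION BY item_id
                              ORDER BY price DESC, counter ASC, id ASC) AS rn
    FROM prices
) AS ranked
WHERE rn = 1
ORDER BY item_id
"""


def max_price_rows(rows):
    """Load rows into a fresh in-memory prices table and return one row per item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, item_id INTEGER, "
                 "price REAL NOT NULL, counter INTEGER)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    return conn.execute(MAX_PRICE_SQL).fetchall()


if __name__ == "__main__":
    for row in max_price_rows(SAMPLE):
        print(row)
```

Because the winning row is picked as a whole, id and counter always belong to the same row as the maximum price, and ties resolve deterministically instead of returning several rows per item.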
Listings table
+------------+---------+
| name | id |
+------------+---------+
| Example 1 | 1 |
| Example 2 | 2 |
| Example 3 | 3 |
| Example 4 | 4 |
| Example 5 | 5 |
| Example 6 | 6 |
+------------+---------+
Categories table
+------------+---------+
| name | id |
+------------+---------+
| Catname 1 | 1 |
| Catname 2 | 2 |
| Catname 3 | 3 |
+------------+---------+
ListingCats table
+--------+---------+
| cat_id | list_id |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
| 3 | 3 |
| 2 | 2 |
| 1 | 5 |
| 2 | 6 |
+--------+---------+
I am trying to build 2 queries which should be simple.
The first query needs to count how many listings in the listings table correlate to a given category ID in the listingcats table.
The second needs to get all of the data (*) in the rows from the listings table that correlate to the given category ID in the listingcats table.
I have tried a number of joins, and for some reason none of them work properly. Can anyone help, based on the example tables above? The 'given' category ID in this case would be '1'.
For the first query, you can use a simple join and return a count:
SELECT COUNT(Name)
FROM Listings l
JOIN ListingCats lc ON l.id = lc.list_id
WHERE lc.cat_id = 1
This returns all rows from the listings table whose id has a corresponding list_id in the listingcats table, restricted to those with a cat_id of 1. The COUNT aggregate function then returns the number of rows.
For the second one, you can use the same query without the aggregate function and select all values:
SELECT * FROM Listings l
JOIN ListingCats lc ON l.id = lc.list_id
WHERE lc.cat_id = 1
Try those and let me know whether they work; if not, I will work through them with you.
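EDIT
After looking back at the question: if you are given a specific cat_id, you don't even need a join for the count, because the listingcats table alone records which listings belong to the category (assuming every list_id refers to an existing listing and no pair is duplicated). If the given id is 1:
SELECT COUNT(*)
FROM ListingCats lc
WHERE lc.cat_id = 1
And for the second one, a subquery can replace the join:
SELECT * FROM Listings l WHERE l.id IN (SELECT lc.list_id FROM ListingCats lc WHERE lc.cat_id = 1)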
CREATE TABLE `listings` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(10) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `categories` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(10) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `listings_cats` (
`cat_id` int(10) unsigned NOT NULL,
`list_id` int(10) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SELECT c.id, c.name, COUNT(lc.list_id) as the_count
FROM categories c
JOIN listings_cats lc ON (lc.cat_id = c.id)
GROUP BY c.id;
SELECT l.id, l.name, c.name AS category_name
FROM listings l JOIN listings_cats lc ON (lc.list_id = l.id)
JOIN categories c ON (lc.cat_id = c.id);
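A quick sketch to sanity-check the category-1 count and listing queries against the sample data, joining on list_id. It uses Python's sqlite3 as a stand-in for MySQL, with the table names from the CREATE TABLE statements above; the PRIMARY KEY on listings_cats and the helper names are assumptions added for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE listings (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE listings_cats (
    cat_id  INTEGER NOT NULL,
    list_id INTEGER NOT NULL,
    PRIMARY KEY (cat_id, list_id)  -- assumed; the original table has no key
);
"""

# Listings for a category: join on list_id, then filter on cat_id.
COUNT_SQL = """
SELECT COUNT(*) FROM listings AS l
JOIN listings_cats AS lc ON lc.list_id = l.id
WHERE lc.cat_id = ?
"""
ROWS_SQL = """
SELECT l.* FROM listings AS l
JOIN listings_cats AS lc ON lc.list_id = l.id
WHERE lc.cat_id = ?
ORDER BY l.id
"""


def build_demo():
    """Create an in-memory database with the sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO listings VALUES (?, ?)",
                     [(i, f"Example {i}") for i in range(1, 7)])
    conn.executemany("INSERT INTO categories VALUES (?, ?)",
                     [(i, f"Catname {i}") for i in range(1, 4)])
    conn.executemany("INSERT INTO listings_cats VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3), (2, 2), (1, 5), (2, 6)])
    return conn


def count_in_category(conn, cat_id):
    return conn.execute(COUNT_SQL, (cat_id,)).fetchone()[0]


def listings_in_category(conn, cat_id):
    return conn.execute(ROWS_SQL, (cat_id,)).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(count_in_category(conn, 1), listings_in_category(conn, 1))
```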
Suppose we have a table:
SELECT * FROM users_to_courses;
+---------+-----------+------------+---------+
| user_id | course_id | pass_date | file_id |
+---------+-----------+------------+---------+
| 1 | 1 | 2014-01-01 | 1 |
| 1 | 1 | 2014-01-01 | 2 |
| 1 | 1 | 2014-02-01 | 3 |
| 1 | 1 | 2014-02-01 | 4 |
+---------+-----------+------------+---------+
Schema:
CREATE TABLE `users_to_courses` (
`user_id` int(10) unsigned NOT NULL,
`course_id` int(10) unsigned NOT NULL,
`pass_date` date NOT NULL,
`file_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`, `course_id`, `pass_date`, `file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
One user can pass a given course multiple times, and each time multiple certificates can be generated. user_id and course_id link to the users and courses tables; file_id links to the files table, which stores information about the certificate files.
In our example, user #1 has passed course #1 twice, and each time 2 certificates were issued: 4 records in total.
How can I get, for user_id=1 and for every course, the MAX(pass_date) and all the files attached to that date? So far I have only managed to get this:
SELECT
users_to_courses.course_id,
MAX(users_to_courses.pass_date) AS max_passed_date,
GROUP_CONCAT(users_to_courses.file_id SEPARATOR ',') AS files
FROM
users_to_courses
WHERE
users_to_courses.user_id=1
GROUP BY
users_to_courses.course_id;
+-----------+-----------------+---------+
| course_id | max_passed_date | files |
+-----------+-----------------+---------+
| 1 | 2014-02-01 | 1,2,3,4 |
+-----------+-----------------+---------+
I need this:
+-----------+-----------------+---------+
| course_id | max_passed_date | files |
+-----------+-----------------+---------+
| 1 | 2014-02-01 | 3,4 |
+-----------+-----------------+---------+
I think this requires a compound GROUP BY.
Try the query below. It first gets the max date for every (user_id, course_id) pair in a derived table, and then the outer query joins only the records with that date. You can use the same query for more than one user by adding utc.user_id to the GROUP BY (and relaxing the utc.user_id=1 filter).
SELECT
utc.course_id,
mdt.maxDate AS max_passed_date,
GROUP_CONCAT(utc.file_id ORDER BY utc.file_id SEPARATOR ',') AS files
FROM
users_to_courses utc
join
(SELECT MAX(pass_date) AS maxDate, course_id cId, user_id uId
FROM users_to_courses GROUP BY user_id, course_id) AS mdt
ON
mdt.uId = utc.user_id
AND
mdt.cId = utc.course_id
AND
mdt.maxDate = utc.pass_date
WHERE
utc.user_id=1
GROUP BY
utc.course_id;
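A sketch of the derived-table approach, run against the sample rows plus a few hypothetical extra rows (a second course and a second user) in Python's sqlite3 as a stand-in for MySQL. SQLite's group_concat takes the separator as a second argument and older versions do not guarantee its ordering, so the helper sorts the file ids in Python. Moving the user filter into the derived table is a design choice made here, not part of the original answer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users_to_courses (
    user_id   INTEGER NOT NULL,
    course_id INTEGER NOT NULL,
    pass_date TEXT    NOT NULL,
    file_id   INTEGER NOT NULL,
    PRIMARY KEY (user_id, course_id, pass_date, file_id)
);
"""

ROWS = [
    # sample rows from the question
    (1, 1, "2014-01-01", 1), (1, 1, "2014-01-01", 2),
    (1, 1, "2014-02-01", 3), (1, 1, "2014-02-01", 4),
    # hypothetical extra rows: a second course and a second user
    (1, 2, "2014-01-15", 6), (1, 2, "2014-03-01", 5),
    (2, 1, "2014-05-01", 7),
]

# The derived table finds the latest pass_date per course for the user; the
# join keeps only the rows from that date, whose file ids are then concatenated.
LATEST_SQL = """
SELECT utc.course_id, mdt.max_date, group_concat(utc.file_id, ',') AS files
FROM users_to_courses AS utc
JOIN (SELECT user_id, course_id, MAX(pass_date) AS max_date
      FROM users_to_courses
      WHERE user_id = ?
      GROUP BY user_id, course_id) AS mdt
  ON mdt.user_id = utc.user_id
 AND mdt.course_id = utc.course_id
 AND mdt.max_date = utc.pass_date
GROUP BY utc.course_id, mdt.max_date
ORDER BY utc.course_id
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users_to_courses VALUES (?, ?, ?, ?)", ROWS)
    return conn


def latest_pass_files(conn, user_id):
    """Return (course_id, max_passed_date, files) for each course the user passed."""
    result = []
    for course_id, max_date, files in conn.execute(LATEST_SQL, (user_id,)):
        # group_concat order is not guaranteed, so sort the ids for a stable string
        ordered = ",".join(sorted(files.split(","), key=int))
        result.append((course_id, max_date, ordered))
    return result


if __name__ == "__main__":
    print(latest_pass_files(build_demo(), 1))
```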