MySQL JOIN with SUBQUERY very slow - mysql

I have a forum and I would like to list the latest topics together with the author's name and the last user who answered.
Table Topic (forum)
| idTopic | IdParent | User | Title | Text |
--------------------------------------------------------
| 1 | 0 | Max | Help! | i need somebody |
--------------------------------------------------------
| 2 | 1 | Leo | | What?! |
Query:
SELECT
Question.*,
Response.User AS LastResponseUser
FROM Topic AS Question
LEFT JOIN (
SELECT User, IdParent
FROM Topic
ORDER BY idTopic DESC
) AS Response
ON ( Response.IdParent = Question.idTopic )
WHERE Question.IdParent = 0
GROUP BY Question.idTopic
ORDER BY Question.idTopic DESC
Output:
| idTopic | IdParent | User | Title | Text | LastResponseUser |
---------------------------------------------------------------------------
| 1 | 0 | Max | Help! | i need somebody | Leo |
---------------------------------------------------------------------------
Example:
http://sqlfiddle.com/#!2/22f72/4
The query works, but it is very slow (roughly 0.90 seconds over 25,000 records).
How can I make it faster?
UPDATE
comparison between the proposed solutions
http://sqlfiddle.com/#!2/94068/22

If you keep your current schema, I'd recommend adding indexes (in particular a clustered index, i.e. a primary key) and simplifying your SQL so that MySQL can do the work of optimising the statement, rather than forcing it to run a subquery, sort the results, and then run the main query.
CREATE TABLE Topic (
idTopic INT
,IdParent INT
,User VARCHAR(100)
,Title VARCHAR(255)
,Text VARCHAR(255)
,CONSTRAINT Topic_PK PRIMARY KEY (idTopic) -- the primary key is already unique; no extra UNIQUE constraint needed
,INDEX Topic_idParentIdTopic_IX (idParent, idTopic)
);
INSERT INTO Topic (idTopic, IdParent, User, Title, Text) VALUES
(1, 0, 'Max', 'Help!', 'i need somebody'),
(2, 1, 'Leo', '', 'What!?');
SELECT Question.*
, Response.User AS LastResponseUser
FROM Topic AS Question
LEFT JOIN Topic AS Response
ON Response.IdParent = Question.idTopic
WHERE Question.IdParent = 0
order by Question.idTopic
;
http://sqlfiddle.com/#!2/7f1bc/1
Update
In the comments you mentioned you only want the most recent response. For that, try this:
SELECT Question.*
, Response.User AS LastResponseUser
FROM Topic AS Question
LEFT JOIN (
select a.user, a.idParent
from Topic as a
left join Topic as b
on b.idParent = a.idParent
and b.idTopic > a.idTopic
where b.idTopic is null
) AS Response
ON Response.IdParent = Question.idTopic
WHERE Question.IdParent = 0
order by Question.idTopic
;
http://sqlfiddle.com/#!2/7f1bc/3

Assuming the highest idTopic under a topic belongs to its last responder,
and assuming you also want to return topics that have no responses:
Select A.IDTopic, A.IDParent, A.User, A.Title, A.Text,
case when B.User is null then 'No Response' else B.User end as LastResponseUser
FROM topic A
LEFT JOIN Topic B
on A.IdTopic = B.IDParent
and B.IDTopic = (Select max(IDTopic) from Topic
where IDParent=B.IDParent group by IDParent)
WHERE A.IDParent =0
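To sanity-check the "latest response per topic" logic, here is a minimal sketch that runs a correlated MAX(idTopic) join against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, so it says nothing about MySQL timings. The function name latest_responders, the index name, and the two extra sample rows are made up for illustration.

```python
import sqlite3


def latest_responders(rows):
    """Return (idTopic, User, LastResponseUser) for every top-level topic."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Topic (idTopic INTEGER PRIMARY KEY, IdParent INT, "
        "User TEXT, Title TEXT, Text TEXT)"
    )
    # Composite index so the MAX() lookup per parent can be served by the index.
    con.execute("CREATE INDEX Topic_parent_ix ON Topic (IdParent, idTopic)")
    con.executemany("INSERT INTO Topic VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT q.idTopic, q.User, r.User AS LastResponseUser
        FROM Topic AS q
        LEFT JOIN Topic AS r
          ON r.IdParent = q.idTopic
         AND r.idTopic = (SELECT MAX(t.idTopic) FROM Topic AS t
                          WHERE t.IdParent = q.idTopic)
        WHERE q.IdParent = 0
        ORDER BY q.idTopic DESC
        """
    ).fetchall()
    con.close()
    return result


sample = [
    (1, 0, "Max", "Help!", "i need somebody"),
    (2, 1, "Leo", "", "What?!"),
    (3, 1, "Ann", "", "Me too"),           # made-up second reply
    (4, 0, "Zoe", "Hello", "first post"),  # made-up topic without replies
]
print(latest_responders(sample))
```

Topics without any reply come back with None (NULL) as the last responder, which is what the LEFT JOIN is for.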

Related

SQL where not exists with multiple rows and status

I have the following tables (minified for the sake of simplicity):
CREATE TABLE IF NOT EXISTS `product_bundles` (
bundle_id int AUTO_INCREMENT PRIMARY KEY,
-- More columns here for bundle attributes
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `product_bundle_parts` (
`part_id` int AUTO_INCREMENT PRIMARY KEY,
`bundle_id` int NOT NULL,
`sku` varchar(255) NOT NULL,
-- More columns here for product attributes
KEY `bundle_id` (`bundle_id`),
KEY `sku` (`sku`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `products` (
`product_id` mediumint(8) AUTO_INCREMENT PRIMARY KEY,
`sku` varchar(64) NOT NULL DEFAULT '',
`status` char(1) NOT NULL default 'A',
-- More columns here for product attributes
KEY (`sku`)
) ENGINE=InnoDB;
And I want to show only the 'product bundles' that are currently completely in stock and defined in the database (since these get retrieved from a third party vendor, there is no guarantee the SKU is defined). So I figured I'd need an anti-join to retrieve it accordingly:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
WHERE parts.bundle_id = bundles.bundle_id
AND products.status = 'A'
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Now, I sincerely thought this would filter out the products by status, however, that seems not to be the case. I then changed one thing up a bit, and the query never finished (although I believe it to be correct):
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
AND products.status = 'A'
WHERE parts.bundle_id = bundles.bundle_id
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Example data:
product_bundles
bundle_id | etc.
1 |
2 |
3 |
product_bundle_parts
part_id | bundle_id | sku
1 | 1 | 'sku11'
2 | 1 | 'sku22'
3 | 1 | 'sku33'
4 | 1 | 'sku44'
5 | 2 | 'sku55'
6 | 2 | 'sku66'
7 | 3 | 'sku77'
8 | 3 | 'sku88'
products
product_id | sku | status
101 | 'sku11' | 'A'
102 | 'sku22' | 'A'
103 | 'sku33' | 'A'
104 | 'sku44' | 'A'
105 | 'sku55' | 'D'
106 | 'sku66' | 'A'
107 | 'sku77' | 'A'
108 | 'sku99' | 'A'
Example result: Since the product status of product #105 is 'D' and 'sku88' from part #8 was not found:
bundle_id | etc.
1 |
I am running Server version: 10.3.25-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
So there are a few questions I have.
Why does the first query not filter out products that do not have the status 'A'?
Why does the second query not finish?
Are there alternative ways of achieving the same thing in a more efficient manner? This looks rather cumbersome.
First of all, I've read that SQL_CALC_FOUND_ROWS is much slower than running two separate queries (a COUNT(*) and then the SELECT *; or, if you run the query from another programming language such as PHP, executing the SELECT * and then counting the rows of the result set).
Second: in your first query the conditions products.status = 'A' and products.product_id IS NULL sit together in the WHERE clause and can never both be true (a part with no matching product gets a NULL status as well), so the subquery is always empty and NOT EXISTS lets every bundle through. What you need is the bundles in which ALL parts have an active product.
I'd change it to the following:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE NOT EXISTS (
SELECT 'x'
FROM product_bundle_parts AS parts
LEFT JOIN products ON (parts.sku = products.sku)
WHERE parts.bundle_id = bundles.bundle_id
AND COALESCE(products.status, 'X') != 'A'
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
I changed products.status = 'A' to COALESCE(products.status, 'X') != 'A': this way the query returns only the bundles that have no inactive or missing products. The COALESCE makes a part with no matching product (NULL status) count as inactive, so the separate condition AND products.product_id IS NULL is no longer needed; written out explicitly it would have required an OR, at a cost in performance.
You can see my solution in SQLFiddle.
Finally, to find out why your second query doesn't finish, check the structure of your tables and how they are indexed. Running EXPLAIN on the query can help you spot possible problems: just put the keyword EXPLAIN before the SELECT and you'll get your "report" (EXPLAIN SELECT * ....).
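The NOT EXISTS ... COALESCE filter can be checked against the example data from the question. The sketch below loads that data into an in-memory SQLite database via Python's sqlite3 module; SQLite stands in for MariaDB, the tables are trimmed to the columns the query needs, and the function and index names are assumptions made for this example.

```python
import sqlite3


def complete_bundles():
    """Return ids of bundles whose parts all map to a product with status 'A'."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product_bundles (bundle_id INTEGER PRIMARY KEY);
        CREATE TABLE product_bundle_parts (
            part_id INTEGER PRIMARY KEY, bundle_id INT NOT NULL, sku TEXT NOT NULL);
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY, sku TEXT NOT NULL, status TEXT NOT NULL);
        CREATE INDEX parts_bundle_ix ON product_bundle_parts (bundle_id);
        CREATE INDEX products_sku_ix ON products (sku);
        INSERT INTO product_bundles VALUES (1), (2), (3);
        INSERT INTO product_bundle_parts VALUES
            (1, 1, 'sku11'), (2, 1, 'sku22'), (3, 1, 'sku33'), (4, 1, 'sku44'),
            (5, 2, 'sku55'), (6, 2, 'sku66'), (7, 3, 'sku77'), (8, 3, 'sku88');
        INSERT INTO products VALUES
            (101, 'sku11', 'A'), (102, 'sku22', 'A'), (103, 'sku33', 'A'),
            (104, 'sku44', 'A'), (105, 'sku55', 'D'), (106, 'sku66', 'A'),
            (107, 'sku77', 'A'), (108, 'sku99', 'A');
    """)
    rows = con.execute("""
        SELECT b.bundle_id
        FROM product_bundles AS b
        WHERE NOT EXISTS (
            SELECT 1
            FROM product_bundle_parts AS p
            LEFT JOIN products AS pr ON pr.sku = p.sku
            WHERE p.bundle_id = b.bundle_id
              -- a missing product has NULL status, which COALESCE turns into 'X'
              AND COALESCE(pr.status, 'X') != 'A'
        )
        ORDER BY b.bundle_id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]


print(complete_bundles())
```

Bundle 2 drops out because of the 'D' product and bundle 3 because 'sku88' is undefined. Note that a bundle with no parts at all would also pass this filter, since NOT EXISTS is then trivially true.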

SQL Order by parent and child

Basically, I need help with my query. I want the rows in the right order: each child must appear under its parent's name, and names must be sorted A-Z. But if I add a subChild under a child (Split 1), the order comes out wrong: Split 1 should be under Room Rose.
P.S. A subChild can also have its own subChild.
HERE I PROVIDE A DEMO
I'd appreciate your help getting this ordered correctly.
SELECT A.venueID
, B.mainVenueID
, A.venueName
FROM tblAdmVenue A
LEFT
JOIN tblAdmVenueLink B
ON A.venueID = B.subVenueID
ORDER
BY COALESCE(B.mainVenueID, A.venueID)
, B.mainVenueID IS NOT NULL
, A.venueID
I want it return an order something like this.
venueName
--------------
Banquet
Big Room
-Room Daisy
-Room Rose
-Split 1
Hall
-Meeting Room WP
This recursive approach does not seem to work either:
WITH venue_ctg AS (
SELECT A.venueID, A.venueName, B.mainVenueID
FROM tblAdmVenue A LEFT JOIN tblAdmVenueLink B
ON A.venueID = B.subVenueID
WHERE B.mainVenueID IS NULL
UNION ALL
SELECT A.venueID, A.venueName, B.mainVenueID
FROM tblAdmVenue A LEFT JOIN tblAdmVenueLink B
ON A.venueID = B.subVenueID
WHERE B.mainVenueID IS NOT NULL
)
SELECT *
FROM venue_ctg ORDER BY venueName
output given
For your data you can use this:
To display this correctly, you can use a SEPARATOR such as a comma, split the returned data, and check the hierarchy.
-- schema
CREATE TABLE tblAdmVenue (
venueID VARCHAR(225) NOT NULL,
venueName VARCHAR(225) NOT NULL,
PRIMARY KEY(venueID)
);
CREATE TABLE tblAdmVenueLink (
venueLinkID VARCHAR(225) NOT NULL,
mainVenueID VARCHAR(225) NOT NULL,
subVenueID VARCHAR(225) NOT NULL,
PRIMARY KEY(venueLinkID)
-- FOREIGN KEY (DepartmentId) REFERENCES Departments(Id)
);
-- data
INSERT INTO tblAdmVenue (venueID, venueName)
VALUES ('LA43', 'Big Room'), ('LA44', 'Hall'),
('LA45', 'Room Daisy'), ('LA46', 'Room Rose'),
('LA47', 'Banquet'), ('LA48', 'Split 1'),
('LA49', 'Meeting Room WP');
INSERT INTO tblAdmVenueLink (venueLinkID, mainVenueID, subVenueID)
VALUES ('1', 'LA43', 'LA45'), ('2', 'LA43', 'LA46'),
('3', 'LA46', 'LA48'), ('4', 'LA44', 'LA49');
with recursive cte (subVenueID, mainVenueID,level) as (
select subVenueID,
mainVenueID, 1 as level
from tblAdmVenueLink
union
select p.subVenueID,
cte.mainVenueID,
cte.level+1
from tblAdmVenueLink p
inner join cte
on p.mainVenueID = cte.subVenueID
)
select
CONCAT(GROUP_CONCAT(b.venueName ORDER BY level DESC SEPARATOR '-->') ,'-->',a.venueName)
from cte c
LEFT JOIN tblAdmVenue a ON a.venueID = c.subVenueID
LEFT JOIN tblAdmVenue b ON b.venueID = c.mainVenueID
GROUP BY subVenueID;
| CONCAT(GROUP_CONCAT(b.venueName ORDER BY level DESC SEPARATOR '-->') ,'-->',a.venueName) |
| :----------------------------------------------------------------------------------------- |
| Big Room-->Room Daisy |
| Big Room-->Room Rose |
| Big Room-->Room Rose-->Split 1 |
| Hall-->Meeting Room WP |
db<>fiddle here
You want your data ordered in alphabetical order and depth first.
A common solution for this is to traverse the structure from the top element, concatenating the path to each item as you go. You can then directly use the path for ordering.
Here is how to do it in MySQL 8.0 with a recursive query
with recursive cte(venueID, venueName, mainVenueID, path, depth) as (
select v.venueID, v.venueName, cast(null as char(100)), venueName, 0
from tblAdmVenue v
where not exists (select 1 from tblAdmVenueLink l where l.subVenueID = v.venueID)
union all
select v.venueID, v.venueName, c.venueID, concat(c.path, '/', v.venueName), c.depth + 1
from cte c
inner join tblAdmVenueLink l on l.mainVenueID = c.venueID
inner join tblAdmVenue v on v.venueID = l.subVenueID
)
select * from cte order by path
The anchor of the recursive query selects top nodes (ie rows whose ids do not exist in column subVenueID of the link table). Then, the recursive part follows the relations.
As a bonus, I added a depth column that represents the depth of each node, starting at 0 for top nodes.
Demo on DB Fiddle:
venueID | venueName | mainVenueID | path | depth
:------ | :-------------- | :---------- | :------------------------- | ----:
LA47 | Banquet | null | Banquet | 0
LA43 | Big Room | null | Big Room | 0
LA45 | Room Daisy | LA43 | Big Room/Room Daisy | 1
LA46 | Room Rose | LA43 | Big Room/Room Rose | 1
LA48 | Split 1 | LA46 | Big Room/Room Rose/Split 1 | 2
LA44 | Hall | null | Hall | 0
LA49 | Meeting Room WP | LA44 | Hall/Meeting Room WP | 1
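As a rough cross-check of the path-based ordering, the same idea can be run in an in-memory SQLite database through Python's sqlite3 module. This is a sketch under assumptions: SQLite replaces MySQL 8.0, so concatenation uses || instead of CONCAT, and the function name and the dash prefix per depth level are choices made only for this example.

```python
import sqlite3


def venues_depth_first():
    """Return venue names in depth-first, alphabetical order, one '-' per level."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tblAdmVenue (venueID TEXT PRIMARY KEY, venueName TEXT NOT NULL);
        CREATE TABLE tblAdmVenueLink (
            venueLinkID TEXT PRIMARY KEY,
            mainVenueID TEXT NOT NULL,
            subVenueID TEXT NOT NULL);
        INSERT INTO tblAdmVenue VALUES
            ('LA43', 'Big Room'), ('LA44', 'Hall'), ('LA45', 'Room Daisy'),
            ('LA46', 'Room Rose'), ('LA47', 'Banquet'), ('LA48', 'Split 1'),
            ('LA49', 'Meeting Room WP');
        INSERT INTO tblAdmVenueLink VALUES
            ('1', 'LA43', 'LA45'), ('2', 'LA43', 'LA46'),
            ('3', 'LA46', 'LA48'), ('4', 'LA44', 'LA49');
    """)
    rows = con.execute("""
        WITH RECURSIVE cte(venueID, venueName, path, depth) AS (
            -- anchor: venues that never appear as a sub venue
            SELECT v.venueID, v.venueName, v.venueName, 0
            FROM tblAdmVenue v
            WHERE NOT EXISTS (SELECT 1 FROM tblAdmVenueLink l
                              WHERE l.subVenueID = v.venueID)
            UNION ALL
            -- recursive step: append each child name to its parent's path
            SELECT v.venueID, v.venueName, c.path || '/' || v.venueName, c.depth + 1
            FROM cte c
            JOIN tblAdmVenueLink l ON l.mainVenueID = c.venueID
            JOIN tblAdmVenue v ON v.venueID = l.subVenueID
        )
        SELECT depth, venueName FROM cte ORDER BY path
    """).fetchall()
    con.close()
    return ["-" * depth + name for depth, name in rows]


for line in venues_depth_first():
    print(line)
```

One design caveat: with '/' as the separator, a sibling whose name extends the parent's name with a character that sorts before '/' (a space, for instance) could land between a parent and its children. A separator that sorts below every character used in names, such as a tab, avoids that.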
Use only one table, not two. The first table has all the info needed.
Then start the CTE with the rows WHERE mainVenueID IS NULL, no JOIN needed.
This may be a good tutorial: https://stackoverflow.com/a/18660789/1766831
Its 'forest' is close to what you want.
I suppose you have:
table tblAdmVenue A is the venue list; and
table tblAdmVenueLink B is the tree relation table for parent-child
For your question on how to get the correct sort order, I think one trick is to concatenate the parent venue names into a path.
with recursive q0(venueID, venueName, mainVenueID, venuePath) as (
-- anchor: venues that are not a sub venue of anything
select
A.venueID,
A.venueName,
cast(null as char(225)),
cast(A.venueName as char(1000))
from tblAdmVenue A
left join tblAdmVenueLink B on A.venueID = B.subVenueID
where B.mainVenueID is null
union all
-- recursive step: a tab separator keeps children right after their parent
select
A.venueID,
A.venueName,
q0.venueID,
concat(q0.venuePath, char(9), A.venueName)
from q0
inner join tblAdmVenueLink B on q0.venueID = B.mainVenueID
inner join tblAdmVenue A on A.venueID = B.subVenueID
)
select venueID, venueName, mainVenueID
from q0
order by venuePath

Create trigger for several rows

I have two tables, users and orders. After every UPDATE of a row in orders, I want to update DATA in the users table, namely to concat(OLD.DATA, ID of the row that was updated).
Table 'users'.
ID NAME DATA
1 John 1|2
2 Michael 3|4
3 Someone 5
Table 'orders'.
ID USER CONTENT
1 1 ---
2 1 ---
3 2 ---
4 2 ---
5 3 ---
For example:
SELECT `data` from `users` where `id` = 2; // Result: 3|4
UPDATE `orders` SET '...' WHERE `id` > 0;
**NEXT LOOP**
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 1;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 1;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 2;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 2;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 3;
Result:
SELECT data from users where id = 1; // Result: 1|2|1|2
SELECT data from users where id = 2; // Result: 3|4|3|4
SELECT data from users where id = 3; // Result: 5|5
How can I do it?
I think you are making the same mistake I made not too long ago, ie storing an array/object in a column.
I would recommend using the following tables in your scenario:
users
+-----------+-----------+
| id | user_name |
+-----------+-----------+
| 1 | John |
+-----------+-----------+
| 2 | Michael |
+-----------+-----------+
orders
+-----------+-----------+------------+
| id | user_id |date_ordered|
+-----------+-----------+------------+
| 1 | 1 | 2019-03-05 |
+-----------+-----------+------------+
| 2 | 2 | 2019-03-05 |
+-----------+-----------+------------+
Where user_id is the foreign key to users
sales
+-----------+-----------+------------+------------+------------+
| id | order_id | item_sku | qty | price |
+-----------+-----------+------------+------------+------------+
| 1 | 1 | 1001 | 1 | 2.50 |
+-----------+-----------+------------+------------+------------+
| 2 | 1 | 1002 | 2 | 3.00 |
+-----------+-----------+------------+------------+------------+
| 3 | 2 | 1001 | 2 | 2.00 |
+-----------+-----------+------------+------------+------------+
where order_id is the foreign key to orders
Now for the confusing part. You will need to use a series of JOINs to access the relevant data for each user.
SELECT
t3.id AS user_id,
t3.user_name,
t1.id AS order_id,
t1.date_ordered,
SUM((t2.price * t2.qty)) AS order_total
FROM orders t1
JOIN sales t2 ON (t2.order_id = t1.id)
LEFT JOIN users t3 ON (t1.user_id = t3.id)
WHERE user_id=1
GROUP BY order_id;
This will return:
+-----------+--------------+------------+------------+--------------+
| user_id | user_name | order_id |date_ordered| order_total |
+-----------+--------------+------------+------------+--------------+
| 1 | John | 1 | 2019-03-05 | 8.50 |
+-----------+--------------+------------+------------+--------------+
These types of JOIN statements come up in basically any project using a relational database (that is, if you are designing your DB correctly). Typically I create a view for each of these complicated queries, which can then be accessed with a simple SELECT * FROM orders_view
For example:
CREATE
ALGORITHM = UNDEFINED
DEFINER = `root`@`localhost`
SQL SECURITY DEFINER
VIEW orders_view AS (
SELECT
t3.id AS user_id,
t3.user_name,
t1.id AS order_id,
t1.date_ordered,
SUM((t2.price * t2.qty)) AS order_total
FROM orders t1
JOIN sales t2 ON (t2.order_id = t1.id)
LEFT JOIN users t3 ON (t1.user_id = t3.id)
GROUP BY order_id
)
This can then be accessed by:
SELECT * FROM orders_view WHERE user_id=1;
Which would return the same results as the query above.
Depending on your needs, you will probably need to add a few more tables (addresses, products etc.) and several more rows to each of these tables. Very often you will find that you need to JOIN 5+ tables into a view, and sometimes you might need to JOIN the same table twice.
I hope this helps despite it not exactly answering your question!
It is probably a bad idea to update the USERS table after inserting into (or updating) the ORDERS table. Avoid storing data twice. In your case: you can always get all "order ids" for a user by querying the ORDERS table. Thus, you don't need to store them in the USERS table (again). Example (tested with MySQL 8.0, see dbfiddle):
Tables and data
create table users( id integer primary key, name varchar(30) ) ;
insert into users( id, name ) values
(1, 'John'),(2, 'Michael'),(3, 'Someone') ;
create table orders(
id integer primary key
, userid integer references users (id)
, content varchar(3)
);
insert into orders ( id, userid, content ) values
(101, 1, '---'),(102, 1, '---')
,(103, 2, '---'),(104, 2, '---'),(105, 3, '---') ;
Maybe a VIEW - similar to the one below - will do the trick. (Advantage: you don't need additional columns or tables.)
-- View
-- Inner SELECT: group order ids per user (table ORDERS).
-- Outer SELECT: fetch the user name (table USERS)
create or replace view userorders (
userid, username, userdata
)
as
select
U.id, U.name, O.orders_
from (
select
userid
, group_concat( id order by id separator '|' ) as orders_
from orders
group by userid
) O join users U on O.userid = U.id ;
Once the view is in place, you can just SELECT from it, and you will always get the current "userdata" eg
select * from userorders ;
-- result
userid username userdata
1 John 101|102
2 Michael 103|104
3 Someone 105
-- add some more orders
insert into orders ( id, userid, content ) values
(1000, 1, '***'),(4000, 1, '***'),(7000, 1, '***')
,(2000, 2, ':::'),(5000, 2, ':::'),(8000, 2, ':::')
,(3000, 3, '###'),(6000, 3, '###'),(9000, 3, '###') ;
select * from userorders ;
-- result
userid username userdata
1 John 101|102|1000|4000|7000
2 Michael 103|104|2000|5000|8000
3 Someone 105|3000|6000|9000
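The "derive it, don't store it" idea can be tried out without a MySQL server. The sketch below is an approximation using Python's sqlite3 module: SQLite's group_concat takes the separator as a second argument and gives no ordering guarantee, so the ids are sorted in Python afterwards. The helper name user_orders is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        userid INTEGER REFERENCES users (id),
        content TEXT
    );
    INSERT INTO users VALUES (1, 'John'), (2, 'Michael'), (3, 'Someone');
    INSERT INTO orders VALUES
        (101, 1, '---'), (102, 1, '---'),
        (103, 2, '---'), (104, 2, '---'), (105, 3, '---');
    CREATE VIEW userorders AS
        SELECT u.id AS userid, u.name AS username,
               group_concat(o.id, '|') AS userdata
        FROM users AS u
        JOIN orders AS o ON o.userid = u.id
        GROUP BY u.id, u.name;
""")


def user_orders():
    """Read the view; the id list is computed on every read, never stored."""
    rows = con.execute(
        "SELECT userid, username, userdata FROM userorders ORDER BY userid"
    ).fetchall()
    # SQLite does not promise an order inside group_concat, so sort the ids here.
    return [
        (uid, name, "|".join(sorted(data.split("|"), key=int)))
        for uid, name, data in rows
    ]


before = user_orders()
con.execute("INSERT INTO orders VALUES (1000, 1, '***')")  # a new order for John
after = user_orders()
print(before)
print(after)
```

No trigger and no second copy of the data is involved: the new order shows up in the view as soon as it is inserted.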

Efficient multiple subqueries from multiple tables with OR and ORDER BY/LIMIT

This question is about writing an efficient SQL query that uses multiple subqueries:
I have 3 tables. I want to get details from table 1, based on filtering done from table 2 and table 3. Currently I am using IN clause on table 2 and table 3 but it takes around 6 seconds for 2M users. I tried join also but it was slower than subquery.
Table1:
mysql> describe users;
Field | Type | Null | Key | Default
| uuid | varchar(36) | NO | PRI | NULL
| firstname | varchar(512) | YES | | NULL
| status | varchar(512) | YES | | NULL
| createdAt | timestamp | YES | | CURRENT_TIMESTAMP
Table 2:
describe homes;
| Field | Type | Null | Key | Default | Extra
| uuid | varchar(50) | NO | PRI | NULL
| phoneNumberHash | varchar(512) | YES | MUL | NULL
| secondaryPhoneNumberHash | varchar(512) | YES | MUL | NULL
Table 3:
describe utility_tags:
| Field | Type | Null | Key | Default |
| tag_name | varchar(50) | NO | MUL | NULL |
| tag_value | varchar(50) | NO | MUL | NULL |
| user_id | varchar(50) | NO | MUL | NULL |
I have index on all the required fields ie.
User Table : Index on uuid
Home Table : Separate Index on phoneNumberHash and secondaryPhoneNumberHash
Utility_Tags: Separate Index on tag_name and tag_value
Query I am running:
SELECT uuid, firstname
FROM users
WHERE ( uuid in (
SELECT `uuid`
FROM `homes`
WHERE ( ( `phoneNumberHash` = '02c' OR `secondaryPhoneNumberHash` = '02c' ))
)
OR uuid in (
SELECT `user_id`
FROM `utility_tags`
WHERE ( `tag_name` = 'ACCOUNT_NUMBER' AND `tag_value`= '13' )
))
AND `status` != 'DELETED'
ORDER BY `createdAt` DESC LIMIT 10 OFFSET 0;
The query is slow and takes around 6 sec when there are 2M rows in user and homes table.
I tried join query:
SELECT users.uuid, firstname
FROM users inner join homes on homes.uuid=users.uuid
inner join utility_tags on utility_tags.user_id=users.uuid
WHERE ( phoneNumberHash = '02c' OR secondaryPhoneNumberHash = '02cd0' )
OR ( tag_name = 'ACCOUNT_NUMBER' AND tag_value= '1311851988' )
AND `status` != 'DELETED'
ORDER BY `createdAt` DESC
LIMIT 10 OFFSET 0;
This takes around 30 seconds.
Any help is highly appreciated.
You are selecting certain rows from your users table based on matches in your other tables. You're using a complex IN( ... ) clause for that.
Let's look at the contents of that clause for optimization possibilities. Here's one way you generate a set of uuid values.
SELECT uuid
FROM homes
WHERE phoneNumberHash = '02c'
OR secondaryPhoneNumberHash = '02c'
Here's the other
SELECT user_id
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value= '13'
Let's recast all this as a UNION of several sets of uuid values, like this.
SELECT uuid FROM homes WHERE phoneNumberHash = '02c'
UNION
SELECT uuid FROM homes WHERE secondaryPhoneNumberHash = '02c'
UNION
SELECT user_id AS uuid
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value= '13'
That union of three queries does the same thing as all your OR clauses. The first two of those queries should (if you're using InnoDB) be optimized by the indexes on phoneNumberHash and secondaryPhoneNumberHash respectively. The third query in that union needs a compound index on (tag_name, tag_value, user_id) to perform efficiently.
The cool thing about UNION is it does the same sort of set creation as OR, but lets you write queries within the UNION that are more likely to use indexes. I suggest you experiment with this UNION query and appropriate indexes until you're happy with its performance. Then you can use it in your outer query.
(It's possible that the query planner has become smart enough to handle phoneNumberHash = '02c' OR secondaryPhoneNumberHash = '02c' as a UNION all by itself, exploiting your two indexes one after the other. Recent MySQL versions have made great progress in query planning.)
So that leaves us with the outer query:
SELECT uuid, firstname
FROM users
WHERE matching uuids
AND status != 'DELETED'
ORDER BY createdAt DESC
LIMIT 10 OFFSET 0
This is hard to make sargable. The query planner doesn't like != operators. It likes = best because index equality scans are cheap. It likes <, <=, >=, and > OK because range scans are almost as cheap. But you're stuck with !=.
Also, the query planner hates ORDER BY ... LIMIT because it has to sort a whole mess of rows just to discard all except a tiny number.
The following compound covering index MAY optimize this query: (createdAt, status, uuid, firstname). The query planner may be able to dodge the separate ORDER BY if it has an index that provides both the match criteria and the needed results. It's also possible that this index will be better. (createdAt, status, uuid, status, firstname) You'll need to try them both. Don't keep them both, only the one that helps best.
Putting it all together:
SELECT u.uuid, u.firstname
FROM users u
JOIN (
SELECT uuid FROM homes WHERE phoneNumberHash = '02c'
UNION
SELECT uuid FROM homes WHERE secondaryPhoneNumberHash = '02c'
UNION
SELECT user_id AS uuid
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value= '13'
) s ON s.uuid = u.uuid
WHERE status != 'DELETED'
ORDER BY createdAt DESC
LIMIT 10 OFFSET 0
Things get interesting on megarow tables when you want subsecond query response. http://use-the-index-luke.com/ is a fine reference for this stuff.
Your main problem is you're selecting from users first - move it to last so its index can be used (subqueries can't be indexed).
Also, SQL OR is notorious, mainly because (almost always) at most 1 index can be used.
Select from the subquery first, so the index into users can be used
Ensure there are indexes on all looked-up columns, ie (uuid), (phoneNumberHash), (secondaryPhoneNumberHash) and (tag_name, tag_value)
Break up your query to eradicate OR
Try this:
SELECT users.uuid, firstname
FROM (
SELECT uuid
FROM homes
WHERE phoneNumberHash = '02c'
UNION
SELECT uuid
FROM homes
WHERE secondaryPhoneNumberHash = '02c'
UNION
SELECT user_id
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value = '13'
) x
JOIN users ON users.uuid = x.uuid
AND status != 'DELETED'
ORDER BY createdAt DESC
LIMIT 10 OFFSET 0
Notice also that the test for status != 'DELETED' is in the join condition rather than the WHERE clause. For an inner join the optimizer normally treats the two placements the same way, so check with EXPLAIN whether this actually helps, especially if there are a lot of deleted users.
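Before swapping one form for the other, it is worth confirming that the UNION rewrite returns the same rows as the original OR ... IN query. Here is a small sketch that does so on made-up data, using Python's sqlite3 module as a stand-in for MySQL; the sample users, the integer createdAt values, and the index names are all assumptions for illustration, and it demonstrates equivalence only, not speed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (uuid TEXT PRIMARY KEY, firstname TEXT,
                        status TEXT, createdAt INTEGER);
    CREATE TABLE homes (uuid TEXT PRIMARY KEY, phoneNumberHash TEXT,
                        secondaryPhoneNumberHash TEXT);
    CREATE TABLE utility_tags (tag_name TEXT NOT NULL, tag_value TEXT NOT NULL,
                               user_id TEXT NOT NULL);
    CREATE INDEX homes_phone_ix ON homes (phoneNumberHash);
    CREATE INDEX homes_phone2_ix ON homes (secondaryPhoneNumberHash);
    CREATE INDEX tags_ix ON utility_tags (tag_name, tag_value, user_id);
    INSERT INTO users VALUES
        ('u1', 'Ann', 'ACTIVE', 1), ('u2', 'Bob', 'ACTIVE', 2),
        ('u3', 'Cy', 'DELETED', 3), ('u4', 'Di', 'ACTIVE', 4),
        ('u5', 'Ed', 'ACTIVE', 5);
    INSERT INTO homes VALUES
        ('u1', '02c', NULL), ('u2', 'zzz', '02c'),
        ('u3', '02c', '02c'), ('u5', 'abc', 'def');
    INSERT INTO utility_tags VALUES
        ('ACCOUNT_NUMBER', '13', 'u4'), ('ACCOUNT_NUMBER', '13', 'u1'),
        ('ACCOUNT_NUMBER', '99', 'u5');
""")

# Original shape: two IN subqueries combined with OR.
or_sql = """
    SELECT uuid, firstname
    FROM users
    WHERE (uuid IN (SELECT uuid FROM homes
                    WHERE phoneNumberHash = '02c'
                       OR secondaryPhoneNumberHash = '02c')
           OR uuid IN (SELECT user_id FROM utility_tags
                       WHERE tag_name = 'ACCOUNT_NUMBER' AND tag_value = '13'))
      AND status != 'DELETED'
    ORDER BY createdAt DESC
    LIMIT 10 OFFSET 0
"""

# Rewritten shape: one index-friendly SELECT per condition, glued with UNION.
union_sql = """
    SELECT u.uuid, u.firstname
    FROM users AS u
    JOIN (
        SELECT uuid FROM homes WHERE phoneNumberHash = '02c'
        UNION
        SELECT uuid FROM homes WHERE secondaryPhoneNumberHash = '02c'
        UNION
        SELECT user_id AS uuid FROM utility_tags
        WHERE tag_name = 'ACCOUNT_NUMBER' AND tag_value = '13'
    ) AS s ON s.uuid = u.uuid
    WHERE u.status != 'DELETED'
    ORDER BY u.createdAt DESC
    LIMIT 10 OFFSET 0
"""

or_rows = con.execute(or_sql).fetchall()
union_rows = con.execute(union_sql).fetchall()
print(or_rows)
print(union_rows)
```

UNION (rather than UNION ALL) matters here: user u1 matches both a phone hash and a tag, and the de-duplication keeps the join from returning that user twice.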

Ordering result over non-trivial expression

I have a posts/comments table that I am unable to order correctly.
I need it ordered primarily by id, but a row whose parent_id does not equal its id should be placed after its parent, and such children should themselves be ordered by id.
Here is my current database.
CREATE TABLE `questions` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`parent_id` int(10) NOT NULL,
`entry_type` varchar(8) NOT NULL,
`entry_content` varchar(1024) NOT NULL,
`entry_poster_id` varchar(10) NOT NULL,
`entry_status` varchar(1) NOT NULL,
`entry_score` varchar(10) NOT NULL,
`time_posted` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;
--
-- Dumping data for table `questions`
--
INSERT INTO `questions` VALUES(1, 1, 'question', 'How do I does SQL?', 'CodyC', '0', '2', '1308641965');
INSERT INTO `questions` VALUES(2, 1, 'answer', 'Easy, you eat cheese!', 'PatrickS', '0', '-4', '1308641965');
INSERT INTO `questions` VALUES(3, 2, 'comment', 'WTF are you on noobass?!', 'FraserK', '0', '100', '1308641965');
INSERT INTO `questions` VALUES(4, 1, 'answer', 'blah', '5', '0', '0', '1308642204');
INSERT INTO `questions` VALUES(5, 4, 'comment', 'blah2', '4', '0', '0', '1308642247');
INSERT INTO `questions` VALUES(6, 2, '2', '3', '3', '3', '3', '3');
and my current query
SELECT *
FROM questions
WHERE parent_id =1
OR parent_id
IN (
SELECT id
FROM questions
WHERE parent_id =1
AND parent_id != id
)
How do I order the rows so that each object comes after its parent, where id = parent_id means a base-level row that has no parent?
This seems to work:
SELECT *
FROM questions
order by case when parent_id != id then parent_id else id end, id;
But it depends whether you want grandchildren before children etc. Your question doesn't specify.
However, if you use this technique you can make your ordering term(s) as complicated as you like - it doesn't need to be a selected column - just make up what you need.
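To see concretely what this ORDER BY expression does with the sample rows, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. Only the columns relevant to sorting are kept, and row 6 is labelled a comment, which is an assumption (the original INSERT stores '2' as its entry_type).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE questions (id INTEGER PRIMARY KEY, "
    "parent_id INT NOT NULL, entry_type TEXT)"
)
con.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [
        (1, 1, "question"),
        (2, 1, "answer"),
        (3, 2, "comment"),
        (4, 1, "answer"),
        (5, 4, "comment"),
        (6, 2, "comment"),  # assumed type, see the note above
    ],
)

# Sort key: the parent's id for child rows, the row's own id for base rows.
ordered_ids = [
    row[0]
    for row in con.execute(
        "SELECT id FROM questions "
        "ORDER BY CASE WHEN parent_id != id THEN parent_id ELSE id END, id"
    )
]
print(ordered_ids)
```

The result is 1, 2, 4, 3, 6, 5: both answers come before any comment, so comment 3 is not placed directly under answer 2. That is the caveat about grandchildren mentioned above; a strictly depth-first order needs a second sort key or a recursive query.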
This looks a bit complicated in MySQL, but you can use PHP for it. Use a recursive function; this will be easy to handle.
Here is a function from a code bank. It simply creates an unordered list tree. You can modify it to suit your requirements.
function output_lis_pages($parentID = 0)
{
$stack = array(); //create a stack for our <li>'s
$arr = array();
$sql = "select pageid, pagetitle, pagelink, parentid
from pages
where parentid = $parentID
order by orderid";
$crs = mysql_query($sql);
if(mysql_num_rows($crs)==0)
{
// no child menu exists for this page
return false;
}
else
{
while($crow = mysql_fetch_array($crs))
{
$arr [] = array(
'pagetitle'=> stripslashes($crow["pagetitle"]),
'pagelink'=> $crow["pagelink"],
'parentid'=>$crow["parentid"],
'pageid'=>$crow["pageid"]
);
}
}
foreach($arr as $a)
{
$str = '';
//if the item's parent matches the parentID we're outputting...
if($a['parentid']==$parentID)
{
if($a['pagelink']=="")
$tmplink = "page.php?pageid=".$a['pageid'];
else
$tmplink = $a['pagelink'];
$str.='<li>'.$a['pagetitle']."";
$subStr = output_lis_pages($a['pageid']);
if($subStr){
$str.="\n".'<ul>'.$subStr.'</ul>'."\n";
}
$str.='</li>'."\n";
$stack[] = $str;
}
}
//If we have <li>'s return a string
if(count($stack)>0)
{
return join("\n",$stack);
}
//If no <li>'s in the stack, return false
return false;
}
SELECT *
, CASE WHEN parent_id = 1 THEN id ELSE parent_id END AS sort_level
FROM questions
WHERE parent_id = 1
OR parent_id
IN (
SELECT id
FROM questions
WHERE parent_id = 1
AND parent_id != id
)
ORDER BY sort_level
, id
You've run into the old bugbear of relational database systems. They aren't fun to work with when your data is hierarchic. You have the issue of trying to produce what is really a particular walk of a graph from database records. That is tough without recursive features in your SQL dialect. Here is a link that might help:
http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/
See also, on StackOverflow: What are the options for storing hierarchical data in a relational database?
After a week of trying I could not get it to work in the query, so I decided just to do it in PHP; this also takes some load off the MySQL engine. Here is my PHP for anyone who wishes to reference it.
$question_id = $database->escape_string($question_id); //escape input
$q = "SELECT * FROM questions WHERE parent_id = $question_id OR parent_id IN (SELECT id FROM questions WHERE parent_id = $question_id AND parent_id != id) ORDER BY parent_id , id";
$database->dbquery($q);//query the DB
while($row = $database->result->fetch_assoc()){//Process results to standard array.
//other irrelevant stuff happens here
$unsorted[] = $row;
}
$question = array_shift($unsorted);//take the question off the array
$sorted[] = $question;//add it to the start of the sorted array
$question_id = $question['id'];
foreach($unsorted as $row){//this creates a multidimensional hierarchy of the answers->comments
if($row['parent_id'] == $question_id){//if its an answer
$sorted_multi[$row['id']] = array();//create a new answer sub-array
$sorted_multi[$row['id']][] = $row;//append it
}else{
$sorted_multi[$row['parent_id']][] = $row;//append the answer to the correct sub-array
}
}
foreach($sorted_multi as $temp){//converts the multidimensional into a single dimension appending it to the sorted array.
foreach($temp as $row){
$sorted[] = $row;
}
}
Tedious yes, but it works out better in the end because of other unforeseen processing that needs to be done post-mysql.
Thanks for all the responses though :):):)
Looking at your question and reading your comment - "i need to get ONE specific question, with ALL the answers and comments" - I think you want to show every question followed by its answers, each followed by its comments. Right?
And if so, this is your query:
SELECT `id`,
(CASE
WHEN `entry_type` = 'question' THEN CONCAT(`id`, '-', `parent_id`)
WHEN `entry_type` = 'answer' THEN CONCAT(`id`, '-', `parent_id`)
WHEN `entry_type` = 'comment' THEN CONCAT(`parent_id`, '-', `id`)
END) `sort_order`,
`entry_type`, `entry_content`
FROM `questions`
ORDER BY `sort_order`;
The above query will give you every question, followed by its first answer, followed by the comments to its first answer; then the second answer, followed by the comments to the second answer and so on.
So for the INSERTs that you had given, this will be the output:
+----+------------+------------+--------------------------+
| id | sort_order | entry_type | entry_content |
+----+------------+------------+--------------------------+
| 1 | 1-1 | question | How do I does SQL? |
| 2 | 2-1 | answer | Easy, you eat cheese! |
| 3 | 2-3 | comment | WTF are you on noobass?! |
| 6 | 2-6 | comment | 3 |
| 4 | 4-1 | answer | blah |
| 5 | 4-5 | comment | blah2 |
+----+------------+------------+--------------------------+
Hope it helps.
EDIT: Updated query to fetch answers and comments for only ONE question
SELECT `id`,
(CASE
WHEN (`entry_type` IN ('question', 'answer')) THEN `id`
WHEN `entry_type` = 'comment' THEN `parent_id`
END) `sort_order_1`,
(CASE
WHEN (`entry_type` IN ('question', 'answer')) THEN `parent_id`
WHEN `entry_type` = 'comment' THEN `id`
END) `sort_order_2`,
(CASE
WHEN (`entry_type` IN ('question', 'answer')) THEN `parent_id`
WHEN `entry_type` = 'comment' THEN (SELECT `Q1`.`parent_id` FROM `questions` `Q1` WHERE `Q1`.`id` = `Q`.`parent_id`)
END) `question_id`,
`entry_type`, `entry_content`
FROM `questions` `Q`
HAVING `question_id` = 1
ORDER BY `sort_order_1`, `sort_order_2`;
OUTPUT:
+----+--------------+--------------+-------------+------------+--------------------------+
| id | sort_order_1 | sort_order_2 | question_id | entry_type | entry_content |
+----+--------------+--------------+-------------+------------+--------------------------+
| 1 | 1 | 1 | 1 | question | How do I does SQL? |
| 2 | 2 | 1 | 1 | answer | Easy, you eat cheese! |
| 3 | 2 | 3 | 1 | comment | WTF are you on noobass?! |
| 6 | 2 | 6 | 1 | comment | 3 |
| 4 | 4 | 1 | 1 | answer | blah |
| 5 | 4 | 5 | 1 | comment | blah2 |
+----+--------------+--------------+-------------+------------+--------------------------+
You can change the HAVING part to fetch answers and comments for a specific question. Hope this helps!
EDIT 2: another possible implementation might be (but I think it might have some performance implications for large tables):
SELECT `a`.`id` AS `question_id`, `a`.`entry_content` AS `question`,
`b`.`id` AS `answer_id`, `b`.`entry_content` AS `answer`,
`c`.`id` AS `comment_id`, `c`.`entry_content` AS `comment`
FROM `questions` `a`
LEFT JOIN `questions` `b` ON (`a`.`id` = `b`.`parent_id` AND `b`.`entry_type` = 'answer')
LEFT JOIN `questions` `c` ON (`b`.`id` = `c`.`parent_id` AND `c`.`entry_type` = 'comment')
WHERE `a`.`entry_type` = 'question'
AND `a`.`id` = 1
ORDER BY `a`.`id`, `b`.`id`, `c`.`id`;
OUTPUT:
+----+--------------------+------+-----------------------+------+--------------------------+
| question_id | question | answer_id | answer | comment_id | comment |
+----+--------------------+------+-----------------------+------+--------------------------+
| 1 | How do I does SQL? | 2 | Easy, you eat cheese! | 3 | WTF are you on noobass?! |
| 1 | How do I does SQL? | 2 | Easy, you eat cheese! | 6 | 3 |
| 1 | How do I does SQL? | 4 | blah | 5 | blah2 |
+----+--------------------+------+-----------------------+------+--------------------------+
Simply use the "ORDER BY" clause to select the ordering you want!
SELECT *
FROM questions
WHERE parent_id =1
OR parent_id
IN (
SELECT id
FROM questions
WHERE parent_id =1
AND parent_id != id
)
ORDER BY Parent_id , id