Selecting parent records when child matches criteria - mysql

I am trying to limit the returned users to those that are "recent", but where a user has a parent, I also need to return the parent.
CREATE TABLE `users` (
`id` int(0) NOT NULL,
`parent_id` int(0) NULL,
`name` varchar(255) NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `times` (
`id` int(11) NOT NULL,
`time` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (1, NULL, 'Alan');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (2, 1, 'John');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (3, NULL, 'Jerry');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (4, NULL, 'Bill');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (5, 1, 'Carl');
INSERT INTO `times`(`id`, `time`) VALUES (2, '2019-01-01 14:40:38');
INSERT INTO `times`(`id`, `time`) VALUES (4, '2019-01-01 14:40:38');
http://sqlfiddle.com/#!9/91db19
In this case I would want to return Alan, John and Bill, but not Jerry, because Jerry doesn't have a record in the times table, nor is he a parent of someone with a record. I am on the fence about Carl: I don't mind getting the results for him, but I don't need them.
I am filtering tens of thousands of users against hundreds of thousands of times records, so performance is important. In general, about 3000 unique ids come from times, and each could be either an id or a parent_id.
The above is a stripped-down example of what I am trying to do. The full query includes more joins and CASE statements, but in general the example above is what we should work with. Here is a sample of the query I am using (the full query is nearly 100 lines):
SELECT id AS reference_id,
CASE WHEN (id != parent_id)
THEN
parent_id
ELSE null END AS parent_id,
parent_id AS family_id,
Rtrim(last_name) AS last_name,
Rtrim(first_name) AS first_name,
Rtrim(email) AS email,
missedappt AS appointment_missed,
appttotal AS appointment_total,
To_char(birth_date, 'YYYY-MM-DD 00:00:00') AS birthday,
To_char(first_visit_date, 'YYYY-MM-DD 00:00:00') AS first_visit,
billing_0_30
FROM users AS p
RIGHT JOIN(
SELECT p.id,
s.parent_id,
Count(p.id) AS appttotal,
missedappt,
billing0to30 AS billing_0_30
FROM times AS p
JOIN (SELECT missedappt, parent_id, id
FROM users) AS s
ON p.id = s.id
LEFT JOIN (SELECT parent_id, billing0to30
FROM aging) AS aging
ON aging.parent_id = p.id
WHERE p.apptdate > To_char(Timestampadd(sql_tsi_year, -1, Now()), 'YYYY-MM-DD')
GROUP BY p.id,
s.parent_id,
missedappt,
billing0to30
) AS recent ON recent.patid = p.patient_id
This example is for a Faircom C-Tree database, but I also need to implement a similar solution in Sybase, MySQL, and Pervasive, so I'm trying to understand what I should do for the best performance.
Essentially, I need to somehow get the RIGHT JOIN to also include each user's parent.

NOTES:
Based on your fiddle config, I'm assuming you're using MySQL 5.6 and thus don't have support for Common Table Expressions (CTEs).
I'm assuming each name (child or parent) is to be presented as a separate record in the final result set.
We want to limit the number of times we have to join the times and users tables (a CTE would make this a bit easier to code and read).
The main query (times -> users(u1) -> users(u2)) will give us child and parent names in separate columns, so we'll use a 2-row dynamic table plus a case statement to pivot the columns into their own rows (NOTE: I don't work with MySQL and didn't have time to research whether there's a pivot capability in MySQL 5.6).
-- we'll let 'distinct' filter out any duplicates (eg, 2 'children' have same 'parent')
select distinct
final.name
from
-- cartesian product of 'allnames' and 'pass' will give us
-- duplicate lines of id/parent_id/child_name/parent_name so
-- we'll use a 'case' statement to determine which name to display
(select case when pass.pass_no = 1
then allnames.child_name
else allnames.parent_name
end as name
from
-- times join users left join users; gives us pairs of
-- child_name/parent_name or child_name/NULL
(select u1.id,u1.parent_id,u1.name as child_name,u2.name as parent_name
from times t
join users u1
on u1.id = t.id
left
join users u2
on u2.id = u1.parent_id) allnames
join
-- poor man's pivot code:
-- 2-row dynamic table; no join clause w/ allnames will give us a
-- cartesian product; the 'case' statement will determine which
-- name (child vs parent) to display
(select 1 as pass_no
union
select 2) pass
) final
-- eliminate 'NULL' as a name in our final result set
where final.name is not NULL
order by 1
Result set:
name
==============
Alan
Bill
John
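To check the pivot query against the question's sample data, here is a minimal sketch that runs it with Python's built-in sqlite3 module instead of MySQL 5.6. This assumes the query uses nothing MySQL-specific (SQLite accepts a JOIN without an ON clause as a cross join); only the DDL is simplified, and recent_names_with_parents is a hypothetical helper.

```python
import sqlite3

# The answer's pivot query, with comments removed and the layout tidied.
PIVOT_QUERY = """
    SELECT DISTINCT final.name
    FROM (SELECT CASE WHEN pass.pass_no = 1
                      THEN allnames.child_name
                      ELSE allnames.parent_name
                 END AS name
          FROM (SELECT u1.id, u1.parent_id,
                       u1.name AS child_name, u2.name AS parent_name
                FROM times t
                JOIN users u1 ON u1.id = t.id
                LEFT JOIN users u2 ON u2.id = u1.parent_id) allnames
          -- no join condition: every allnames row is paired with pass 1 and pass 2
          JOIN (SELECT 1 AS pass_no UNION SELECT 2) pass) final
    WHERE final.name IS NOT NULL
    ORDER BY 1
"""

def recent_names_with_parents():
    """Load the sample rows into an in-memory SQLite database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
        CREATE TABLE times (id INTEGER PRIMARY KEY, time TEXT);
        INSERT INTO users VALUES (1, NULL, 'Alan'), (2, 1, 'John'), (3, NULL, 'Jerry'),
                                 (4, NULL, 'Bill'), (5, 1, 'Carl');
        INSERT INTO times VALUES (2, '2019-01-01 14:40:38'), (4, '2019-01-01 14:40:38');
    """)
    names = [row[0] for row in conn.execute(PIVOT_QUERY)]
    conn.close()
    return names

print(recent_names_with_parents())  # ['Alan', 'Bill', 'John']
```

Carl is not returned because he has no times record; he would only appear if the query also walked from parents down to their other children.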

Related

Search Column after LEFT JOIN

Currently I have two tables.
Customers:
id  name  status
==  ====  ======
1   adam  1
2   bob   1
3   cain  2
Orders:
customer_id  item
===========  ======
1            apple
1            banana
1            bonbon
2            carrot
3            egg
I'm trying to do an INNER JOIN first then use the resulting table to query against.
So a user can type in a partial name or partial item and get all the names and items.
For example, if a user types in "b", it would return:
customer_id  name  status  items
===========  ====  ======  ===================
1            adam  1       apple/banana/bonbon
2            bob   1       carrot
What I am currently doing is:
SELECT * FROM(
SELECT customers.* , GROUP_CONCAT(orders.item SEPARATOR '|') as items
FROM customers
LEFT JOIN orders
ON customers.id = orders.customer_id
group by customers.id
) as t
WHERE t.status = 1 AND ( t.name LIKE "%b%" OR t.items LIKE "%b%")
Which does work, but it is incredibly slow (over 2 seconds).
The strange part though is if I run the queries individually the subquery executes in .0004 seconds and the outer query executes in .006 seconds.
But for some reason combining them increases the wait time a lot.
Is there a more efficient way to do this?
CREATE TABLE IF NOT EXISTS `customers` (
`id` int(6),
`name` varchar(255) ,
`status` int(6),
PRIMARY KEY (`id`,`name`,`status`)
);
INSERT INTO `customers` (`id`, `name` , `status`) VALUES
('1', 'Adam' , 1),
('2', 'bob' , 1),
('3', 'cain' , 2);
CREATE TABLE IF NOT EXISTS `orders` (
`customer_id` int(6),
`item` varchar(255) ,
PRIMARY KEY (`customer_id`,`item`)
);
INSERT INTO `orders` (`customer_id`, `item`) VALUES
('1', 'apple'),
('1', 'banana'),
('1', 'bonbon'),
('2', 'carrot'),
('3', 'egg');
According to the query, you are trying to perform a full-text search on the fields name and item. I would suggest adding full-text indexes to them using ngram tokenisation as you are looking up by part of a word:
ALTER TABLE customers ADD FULLTEXT INDEX ft_idx_name (name) WITH PARSER ngram;
ALTER TABLE orders ADD FULLTEXT INDEX ft_idx_item (item) WITH PARSER ngram;
In this case, your query would look as follows:
SELECT
customers.*, GROUP_CONCAT(orders.item SEPARATOR '|')
FROM
customers
LEFT JOIN orders on customers.id = orders.customer_id
WHERE
orders.customer_id IS NOT NULL
AND customers.status = 1
AND (MATCH(customers.name) AGAINST('bo')
OR MATCH(orders.item) AGAINST('bo'))
GROUP BY
customers.id
If needed, you could modify the ngram_token_size MySQL system variable; its default value is 2, which means at least two characters must be entered to perform a search.
If requirements evolve, another approach is to use a dedicated search engine such as Elasticsearch.
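To see why ngram_token_size matters for one-letter searches, here is a simplified Python model of n-gram tokenization. It is an illustration under assumptions, not MySQL's actual parser: the hypothetical helpers treat a term as matching when it shares at least one n-gram with the text, roughly how natural-language mode combines the tokens of a search term.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping n-character tokens (simplified model)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def could_match(term: str, document: str, n: int = 2) -> bool:
    """A term matches if any of its n-grams also occurs in the document.
    A term shorter than n yields no tokens, so it can never match."""
    doc_tokens = set(ngram_tokens(document, n))
    return any(tok in doc_tokens for tok in ngram_tokens(term, n))

print(ngram_tokens("bonbon"))           # ['bo', 'on', 'nb', 'bo', 'on']
print(could_match("bo", "bonbon"))      # True
print(could_match("b", "bonbon"))       # False: "b" produces no 2-grams
print(could_match("b", "bonbon", n=1))  # True once the token size is 1
```

Lowering the token size lets single characters match, at the cost of a larger index with far less selective tokens.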
SELECT * FROM(
SELECT customers.* , GROUP_CONCAT(orders.item SEPARATOR '|') as items
FROM customers
LEFT JOIN orders
ON customers.id = orders.customer_id AND customers.name LIKE "%adam" AND orders.item LIKE "%b"
group by customers.id
) as t
It should be faster to filter the records while performing the LEFT JOIN, rather than in the outer query after grouping.
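The idea of filtering before grouping can also be sketched another way, which is not taken from the answers above: put the status and search conditions in the WHERE clause of a single grouped query, and use EXISTS so that a matching item still returns all of that customer's items. This minimal sketch uses Python's sqlite3 module as a stand-in for MySQL, so it writes SQLite's group_concat(item, '|') where MySQL uses GROUP_CONCAT(item SEPARATOR '|'). The search_customers helper is hypothetical, and whether this beats the derived-table version on real data is something to confirm with EXPLAIN.

```python
import sqlite3

SEARCH_QUERY = """
    SELECT c.id, c.name, c.status, group_concat(o.item, '|') AS items
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE c.status = 1
      AND (c.name LIKE ?
           -- EXISTS checks the items without removing non-matching ones
           -- from the group_concat list
           OR EXISTS (SELECT 1 FROM orders AS o2
                      WHERE o2.customer_id = c.id AND o2.item LIKE ?))
    GROUP BY c.id, c.name, c.status
    ORDER BY c.id
"""

def search_customers(term: str):
    """Return (id, name, status, items) for active customers matching term."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER, name TEXT, status INTEGER,
                                PRIMARY KEY (id, name, status));
        CREATE TABLE orders (customer_id INTEGER, item TEXT,
                             PRIMARY KEY (customer_id, item));
        INSERT INTO customers VALUES (1, 'Adam', 1), (2, 'bob', 1), (3, 'cain', 2);
        INSERT INTO orders VALUES (1, 'apple'), (1, 'banana'), (1, 'bonbon'),
                                  (2, 'carrot'), (3, 'egg');
    """)
    pattern = f"%{term}%"
    rows = conn.execute(SEARCH_QUERY, (pattern, pattern)).fetchall()
    conn.close()
    return rows

for row in search_customers("b"):
    print(row)  # item order inside the concatenated string is not guaranteed
```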

Insert into multiple selects from different tables

I have the tables users (id, email), permissions (id, description) and users_permissions (user_id, permission_id, created), which form a many-to-many relation.
I need to select the user with a given email and assign them every permission from the permissions table that they do not already have.
For now I am trying to at least assign all permissions, but I am getting the error
Subquery returns more than 1 row
My query:
insert into `users_permissions` (`user_id`, `permission_id`, `created`)
select
(select `id` from `users` where `email` = 'user-abuser#gmail.com') as `user_id`,
(select `id` from `permissions`) as `permission_id`,
now() as `created`;
A subquery in the SELECT list must return at most one row; if it returns more, MySQL raises this error.
Instead, use a CROSS JOIN between users and permissions, which produces one row per permission for the selected user:
INSERT INTO `users_permissions` (`user_id`, `permission_id`, `created`)
SELECT
u.id,
p.id,
NOW()
FROM
users AS u
CROSS JOIN permissions AS p
WHERE u.email = 'user-abuser#gmail.com'
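The CROSS JOIN above inserts every permission, while the question asks only for the ones the user does not have yet. A minimal sketch of that extra step adds a NOT EXISTS check against users_permissions. It runs on SQLite through Python's sqlite3 as a stand-in for MySQL, uses CURRENT_TIMESTAMP (valid in both) instead of NOW(), and the helper name and sample rows are hypothetical.

```python
import sqlite3

def grant_missing_permissions(conn: sqlite3.Connection, email: str) -> int:
    """Hypothetical helper: give the user with this email every permission
    they do not already have. Returns the number of rows inserted."""
    count_sql = "SELECT COUNT(*) FROM users_permissions"
    before = conn.execute(count_sql).fetchone()[0]
    conn.execute("""
        INSERT INTO users_permissions (user_id, permission_id, created)
        SELECT u.id, p.id, CURRENT_TIMESTAMP
        FROM users AS u
        CROSS JOIN permissions AS p
        WHERE u.email = ?
          -- skip (user, permission) pairs that already exist
          AND NOT EXISTS (SELECT 1 FROM users_permissions AS up
                          WHERE up.user_id = u.id AND up.permission_id = p.id)
    """, (email,))
    return conn.execute(count_sql).fetchone()[0] - before

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE permissions (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE users_permissions (user_id INTEGER, permission_id INTEGER, created TEXT,
                                    PRIMARY KEY (user_id, permission_id));
    INSERT INTO users VALUES (1, 'user-abuser#gmail.com'), (2, 'someone-else#example.com');
    INSERT INTO permissions VALUES (1, 'read'), (2, 'write'), (3, 'delete');
    INSERT INTO users_permissions VALUES (1, 2, '2020-01-01 00:00:00');
""")
print(grant_missing_permissions(conn, "user-abuser#gmail.com"))  # 2
print(grant_missing_permissions(conn, "user-abuser#gmail.com"))  # 0
```

Because the check is part of the INSERT ... SELECT, running the statement again is harmless.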

MySQL union within derived table (related_id=a AND related_id=b) OR (related_id=z)

I have the following tables: users, tags, tags_data.
tags_data contains tag_id and user_id columns to link the users with tags in a 1 user to many tags relationship.
What is the best way of listing all users that have either tag_id 1001 AND 1003, OR tag_id 1004?
EDIT: By this I mean there could be other related tags as well, or not, just so long as there is definitely either 1004 OR (1001 AND 1003).
At the moment I've got two methods of doing this, both using a UNION in a derived table, either in the FROM clause or in an INNER JOIN clause...
SELECT subsel.user_id, users.name
FROM ( SELECT user_id
FROM tags_data
WHERE tag_id IN (1001, 1003)
GROUP BY user_id
HAVING COUNT(tag_id)=2
UNION
SELECT user_id
FROM tags_data
WHERE tag_id=1004
) AS subsel
LEFT JOIN users ON subsel.user_id=users.user_id
Or
SELECT users.user_id, users.name
FROM users
INNER JOIN ( SELECT user_id
FROM tags_data
WHERE tag_id IN (1001, 1003)
GROUP BY user_id
HAVING COUNT(tag_id)=2
UNION
SELECT user_id
FROM tags_data
WHERE tag_id=1004
) AS subsel ON users.user_id=subsel.user_id
There are other tables which I'll be LEFT JOINing on to this. There are 50k+ rows in the users table and 150k+ rows in the tags_data table.
This is a batch job to export data to another system so not a real-time query run by an end user, so performance isn't massively critical. However I'd like to try and get the best result I can. The query for the derived table should actually be pretty fast and it makes sense to narrow the scope of the result set down before I then add further joins, functions and calculated fields to the results returned to the client. I will be running these on a larger dataset later to see if there is any performance difference but running EXPLAIN shows an almost identical execution plan.
Generally I try and avoid UNIONs unless absolutely necessary. But I think in this case I almost have to have a UNION somewhere by definition, because of the two effectively unrelated criteria.
Is there another method that I could be using here?
And is there some sort of specific database terminology for this sort of problem?
Full example schema:
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`tag_name` varchar(255) NOT NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1006 ;
INSERT INTO `tags` (`tag_id`, `tag_name`) VALUES
(1001, 'tag1001'),
(1002, 'tag1002'),
(1003, 'tag1003'),
(1004, 'tag1004'),
(1005, 'tag1005');
CREATE TABLE IF NOT EXISTS `tags_data` (
`tags_data_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`tags_data_id`),
KEY `user_id` (`user_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
INSERT INTO `tags_data` (`tags_data_id`, `user_id`, `tag_id`) VALUES
(1, 1, 1001),
(2, 1, 1002),
(3, 1, 1003),
(4, 5, 1001),
(5, 5, 1003),
(6, 5, 1005),
(7, 8, 1004),
(8, 9, 1001),
(9, 9, 1002),
(10, 9, 1004);
CREATE TABLE IF NOT EXISTS `users` (
`user_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
INSERT INTO `users` (`user_id`, `name`) VALUES
(1, 'user1'),
(2, 'user2'),
(3, 'user3'),
(4, 'user4'),
(5, 'user5'),
(6, 'user6'),
(7, 'user7'),
(8, 'user8'),
(9, 'user9'),
(10, 'user10');
If you are looking for performance on MySQL, you should definitely avoid nested queries and unions: most of them result in a temporary table being created and then scanned without indexes. There are rare cases where the derived temporary table still uses indexes, but those only apply in specific circumstances and MySQL distributions.
My suggestion would be to rewrite the query to inner/outer joins only, like this:
select distinct u.* from users as u
left outer join tags_data as t on
t.user_id=u.user_id and t.tag_id=1003
inner join tags_data as t2 on
t2.user_id=u.user_id
and (t2.tag_id=1004 or (t2.tag_id=1001 and t.tag_id=1003));
If you can be sure that no user can have both 1004 and (1001 and 1003) tags, you may also remove the "distinct" from this query, which would avoid a temporary table creation.
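One way to sanity-check the join-only query is to run it on the question's sample rows. This minimal sketch uses Python's sqlite3 module as a stand-in for MySQL (assuming the query itself is portable, which it appears to be); load_sample is a hypothetical helper that loads only the users and tags_data rows.

```python
import sqlite3

def load_sample(conn):
    """Load the question's users and tags_data sample rows (tags table omitted)."""
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tags_data (tags_data_id INTEGER PRIMARY KEY,
                                user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
        CREATE INDEX tags_data_user_tag ON tags_data (user_id, tag_id);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 11)])
    conn.executemany("INSERT INTO tags_data (user_id, tag_id) VALUES (?, ?)",
                     [(1, 1001), (1, 1002), (1, 1003), (5, 1001), (5, 1003),
                      (5, 1005), (8, 1004), (9, 1001), (9, 1002), (9, 1004)])

JOIN_ONLY_QUERY = """
    SELECT DISTINCT u.user_id, u.name
    FROM users AS u
    LEFT OUTER JOIN tags_data AS t
           ON t.user_id = u.user_id AND t.tag_id = 1003
    INNER JOIN tags_data AS t2
           ON t2.user_id = u.user_id
          -- t.tag_id is NULL when the user has no 1003 tag
          AND (t2.tag_id = 1004 OR (t2.tag_id = 1001 AND t.tag_id = 1003))
    ORDER BY u.user_id
"""

def join_only_users():
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    rows = conn.execute(JOIN_ONLY_QUERY).fetchall()
    conn.close()
    return rows

print(join_only_users())
# [(1, 'user1'), (5, 'user5'), (8, 'user8'), (9, 'user9')]
```

User 9 qualifies through tag 1004 alone, while users 1 and 5 qualify through the 1001 and 1003 pair.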
You should also definitely use indexes, like these:
create index tags_data__user_id__idx on tags_data(user_id);
create index tags_data__tag_id__idx on tags_data(tag_id);
With these indexes, a table of 150k+ rows should be easy to query.
Use an inner query that groups up all tags for each user into one value, then use a simple filter in the where clause:
select u.*
from users u
join (
select user_id, group_concat(tag_id order by tag_id) tags
from tags_data
group by user_id
) t on t.user_id = u.user_id
where tags rlike '1001.*1003|1004'
See SQLFiddle of this query running against your sample data.
If there were many tags, you could add where tag_id in (1001, 1003, 1004) to the inner query to reduce the size of the tags list as a small optimization. Testing will show whether this makes much difference.
This should perform pretty well, because each table is scanned only once.
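The single-scan idea can also be expressed without string matching. The sketch below is an alternative that does not appear in the answers: conditional aggregation, where each tag_id = ... comparison evaluates to 0 or 1 and SUM counts the matching tags per user, so the result does not depend on the order of a concatenated string. It runs on SQLite via Python's sqlite3 as a stand-in for MySQL (MySQL also treats comparisons as 0/1 integers); load_sample is a hypothetical helper.

```python
import sqlite3

def load_sample(conn):
    """Load the question's users and tags_data sample rows (tags table omitted)."""
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tags_data (tags_data_id INTEGER PRIMARY KEY,
                                user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 11)])
    conn.executemany("INSERT INTO tags_data (user_id, tag_id) VALUES (?, ?)",
                     [(1, 1001), (1, 1002), (1, 1003), (5, 1001), (5, 1003),
                      (5, 1005), (8, 1004), (9, 1001), (9, 1002), (9, 1004)])

AGGREGATE_QUERY = """
    SELECT u.user_id, u.name
    FROM users AS u
    JOIN (SELECT user_id
          FROM tags_data
          WHERE tag_id IN (1001, 1003, 1004)
          GROUP BY user_id
          -- each comparison is 0 or 1, so SUM counts matching tags per user
          HAVING (SUM(tag_id = 1001) > 0 AND SUM(tag_id = 1003) > 0)
              OR SUM(tag_id = 1004) > 0) AS matched
      ON matched.user_id = u.user_id
    ORDER BY u.user_id
"""

def aggregate_users():
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    rows = conn.execute(AGGREGATE_QUERY).fetchall()
    conn.close()
    return rows

print(aggregate_users())
# [(1, 'user1'), (5, 'user5'), (8, 'user8'), (9, 'user9')]
```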
Efficient, but inelegant, and not flexible at all:
SELECT users.*
FROM users
LEFT JOIN tags_data AS tag1001
ON (tag1001.user_id = users.user_id AND tag1001.tag_id = 1001)
LEFT JOIN tags_data AS tag1003
ON (tag1003.user_id = users.user_id AND tag1003.tag_id = 1003)
LEFT JOIN tags_data AS tag1004
ON (tag1004.user_id = users.user_id AND tag1004.tag_id = 1004)
WHERE (tag1001.tag_id AND tag1003.tag_id) OR (tag1004.tag_id);
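For reference, here is the three-LEFT-JOIN query with its truthiness tests spelled out as IS NOT NULL checks, run against the sample data with Python's sqlite3 as a stand-in for MySQL. The explicit checks and the hypothetical load_sample helper are assumptions of this sketch; each alias matches at most one row per user here, so no DISTINCT is needed.

```python
import sqlite3

def load_sample(conn):
    """Load the question's users and tags_data sample rows (tags table omitted)."""
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tags_data (tags_data_id INTEGER PRIMARY KEY,
                                user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 11)])
    conn.executemany("INSERT INTO tags_data (user_id, tag_id) VALUES (?, ?)",
                     [(1, 1001), (1, 1002), (1, 1003), (5, 1001), (5, 1003),
                      (5, 1005), (8, 1004), (9, 1001), (9, 1002), (9, 1004)])

LEFT_JOIN_QUERY = """
    SELECT users.user_id, users.name
    FROM users
    LEFT JOIN tags_data AS tag1001
           ON tag1001.user_id = users.user_id AND tag1001.tag_id = 1001
    LEFT JOIN tags_data AS tag1003
           ON tag1003.user_id = users.user_id AND tag1003.tag_id = 1003
    LEFT JOIN tags_data AS tag1004
           ON tag1004.user_id = users.user_id AND tag1004.tag_id = 1004
    -- a missing tag leaves its alias NULL
    WHERE (tag1001.tag_id IS NOT NULL AND tag1003.tag_id IS NOT NULL)
       OR tag1004.tag_id IS NOT NULL
    ORDER BY users.user_id
"""

def left_join_users():
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    rows = conn.execute(LEFT_JOIN_QUERY).fetchall()
    conn.close()
    return rows

print(left_join_users())
# [(1, 'user1'), (5, 'user5'), (8, 'user8'), (9, 'user9')]
```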

How can I combine four queries into one query?

Structure of my tables:
posts (id, name, user_id, about, time)
comments (id, post_id, user_id, text, time)
users_votes (id, user, post_id, time)
users_favs ( id, user_id, post_id, time)
How can I combine these four queries (not with UNION):
SELECT `id`, `name`, `user_id`, `time` FROM `posts` WHERE `user_id` = 1
SELECT `post_id`, `user_id`, `text`, `time` FROM `comments` WHERE `user_id` = 1
SELECT `user`, `post_id`, `time` FROM `users_votes` WHERE `user` = 1
SELECT `user_id`, `post_id`, `time` FROM `users_favs` WHERE `user_id` = 1
Should I use JOINs?
What would the SQL query for this be?
You don't want to join these together.
The kind of JOIN you'd use to retrieve this would end up doing a cross-product of all the rows it finds. This means that if you had 4 posts, 2 comments, 3 votes, and 6 favorites you'd get 4*2*3*6 rows in your results instead of 4+2+3+6 when doing separate queries.
The only time you'd want to JOIN is when the two things are intrinsically related, for example when you want to retrieve the posts associated with a favorite, a vote, or a comment.
Based on your example, there's no such commonality in these things.
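The row multiplication described above can be checked directly. This minimal sketch uses Python's sqlite3 module with a simplified schema (only an id and the user column per table, an assumption for brevity) and the answer's example counts of 4 posts, 2 comments, 3 votes and 6 favorites for user 1.

```python
import sqlite3

# (table, user column, rows for user 1), using the counts from the answer
TABLES = [("posts", "user_id", 4), ("comments", "user_id", 2),
          ("users_votes", "user", 3), ("users_favs", "user_id", 6)]

def compare_row_counts():
    """Return (rows from one JOIN query, total rows from four separate queries)."""
    conn = sqlite3.connect(":memory:")
    for table, column, count in TABLES:
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {column} INTEGER)")
        conn.executemany(f"INSERT INTO {table} ({column}) VALUES (?)", [(1,)] * count)
    joined = conn.execute("""
        SELECT COUNT(*)
        FROM posts AS p
        JOIN comments AS c ON c.user_id = p.user_id
        JOIN users_votes AS v ON v.user = p.user_id
        JOIN users_favs AS f ON f.user_id = p.user_id
        WHERE p.user_id = 1
    """).fetchone()[0]
    separate = sum(
        conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} = 1").fetchone()[0]
        for table, column, _ in TABLES
    )
    conn.close()
    return joined, separate

print(compare_row_counts())  # (144, 15): 4*2*3*6 joined rows vs 4+2+3+6
```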

mysql insert multi row query result into table

I came across a scenario where I need to "upgrade" a table with data I obtain from another query. I am adding missing values, so I need to insert, but I can't seem to get it right.
The destination table is the following
CREATE TABLE `documentcounters` (
`UID` int,
`DataChar`,
`SeqNum` ,
`LastSignature`,
`DocumentType`,
`SalesTerminal`,
`Active`,
PRIMARY KEY (`UID`)
) ENGINE=InnoDB
and I am trying to do something like
INSERT INTO documentcounters
SELECT Q1.in_headers, -1,NULL, 17,0,0 FROM
(SELECT DISTINCT(DocumentSeries) as in_headers FROM transactionsheaders )AS Q1
LEFT JOIN
(SELECT DISTINCT(DataChar) as in_counters FROM documentcounters)AS Q2
ON Q1.in_headers=Q2.in_counters WHERE Q2.in_counters IS NULL;
I left UID out because I want the insert statement to create it, but I get a "Column count doesn't match" error, which makes sense (darn!)
Doing something like
INSERT INTO `documentcounters`
(`DataChar`,`SeqNum`,`LastSignature`,`DocumentType`,`SalesTerminal`,`Active`)
VALUES
(
(SELECT Q1.in_headers FROM
(SELECT DISTINCT(DocumentSeries) as in_headers FROM transactionsheaders )AS Q1
LEFT JOIN
(SELECT DISTINCT(DataChar) as in_counters FROM documentcounters)AS Q2
ON Q1.in_headers=Q2.in_counters WHERE Q2.in_counters IS NULL),-1,NULL,17,0,0
);
yields a "Subquery returns more than 1 row" error.
Any ideas how I can make this work?
Cheers
INSERT INTO `documentcounters`
(`DataChar`,`SeqNum`,`LastSignature`,`DocumentType`,`SalesTerminal`,`Active`)
SELECT Q1.in_headers, -1,NULL, 17,0,0 FROM
(SELECT DISTINCT(DocumentSeries) as in_headers FROM transactionsheaders )AS Q1
LEFT JOIN
(SELECT DISTINCT(DataChar) as in_counters FROM documentcounters)AS Q2
ON Q1.in_headers=Q2.in_counters WHERE Q2.in_counters IS NULL;
This will work if UID is defined as auto_increment.
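A minimal sketch of that fix, run with Python's sqlite3 as a stand-in for MySQL: in SQLite an INTEGER PRIMARY KEY column is filled automatically, which plays the role of AUTO_INCREMENT here. The column types and sample rows are hypothetical, since the question's CREATE TABLE omits the types.

```python
import sqlite3

def add_missing_counters():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactionsheaders (DocumentSeries TEXT);
        CREATE TABLE documentcounters (
            UID INTEGER PRIMARY KEY,  -- assigned automatically, like AUTO_INCREMENT
            DataChar TEXT, SeqNum INTEGER, LastSignature TEXT,
            DocumentType INTEGER, SalesTerminal INTEGER, Active INTEGER);
        INSERT INTO transactionsheaders VALUES ('A'), ('B'), ('A'), ('C');
        INSERT INTO documentcounters
            (DataChar, SeqNum, LastSignature, DocumentType, SalesTerminal, Active)
            VALUES ('A', 5, NULL, 17, 0, 0);
    """)
    # The answer's statement: the column list omits UID, so six values fit six columns.
    conn.execute("""
        INSERT INTO documentcounters
            (DataChar, SeqNum, LastSignature, DocumentType, SalesTerminal, Active)
        SELECT Q1.in_headers, -1, NULL, 17, 0, 0
        FROM (SELECT DISTINCT DocumentSeries AS in_headers FROM transactionsheaders) AS Q1
        LEFT JOIN (SELECT DISTINCT DataChar AS in_counters FROM documentcounters) AS Q2
               ON Q1.in_headers = Q2.in_counters
        WHERE Q2.in_counters IS NULL
    """)
    rows = conn.execute(
        "SELECT UID, DataChar, SeqNum FROM documentcounters ORDER BY UID").fetchall()
    conn.close()
    return rows

for row in add_missing_counters():
    print(row)  # 'A' keeps UID 1; 'B' and 'C' get UIDs 2 and 3 with SeqNum -1
```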
If you want the INSERT to create the UID values, then UID must be defined as an auto-incrementing column.
CREATE TABLE `documentcounters` (
`UID` INT NOT NULL AUTO_INCREMENT,
...