MySQL join, empty rows in junction table

I have three tables I'd like to join in a way that produces all records from one table and any matching records or NULL from another table. It is imperative that all records from the first table be returned. I thought I had done this before but I can't remember when or where and MySQL just isn't playing ball.
SELECT VERSION();
5.0.51a-3ubuntu5.7
/* Some table that may or may not be needed, included to give context */
CREATE TABLE `t1` (
`a` int(4) NOT NULL,
`a_name` varchar(10) NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `t2` ( /* "One table" */
`b` int(2) NOT NULL,
`b_name` varchar(10) NOT NULL,
PRIMARY KEY (`b`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `t3` ( /* "Another table" */
`a` int(4) NOT NULL,
`b` int(2) NOT NULL,
PRIMARY KEY (`a`,`b`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO t1 VALUES (1, '1-one'),(2, '1-two'),(3, '1-three');
INSERT INTO t2 VALUES (1, '2-one'),(2, '2-two'),(3, '2-three'),
(4, '2-four'),(5, '2-five');
INSERT INTO t3 VALUES (1,1),(1,2),(1,3),(1,4),(2,2),(2,5);
t3 is a junction table for t1 and t2. The result set I'm looking for should look like this for any a=n:
n=1
b | b_name | a
-------------------
1 | 2-one | 1
2 | 2-two | 1
3 | 2-three | 1
4 | 2-four | 1
5 | 2-five | NULL
n=2
b | b_name | a
-------------------
1 | 2-one | NULL
2 | 2-two | 2
3 | 2-three | NULL
4 | 2-four | NULL
5 | 2-five | 2
n=7
b | b_name | a
-------------------
1 | 2-one | NULL
2 | 2-two | NULL
3 | 2-three | NULL
4 | 2-four | NULL
5 | 2-five | NULL
The value of a in the result set actually isn't important as long as it unambiguously reflects the presence or absence of records in t3 (does that make sense?).
The query
SELECT t2.b, t2.b_name, a
FROM t2
LEFT /* OUTER */ JOIN t3 ON t3.b = t2.b
WHERE (
a = 2
OR
a IS NULL
);
returns
b | b_name | a
-------------------
2 | 2-two | 2
5 | 2-five | 2
Can this be done?

SELECT t2.b, t2.b_name, MAX(IF(a=2, a, NULL)) AS a
FROM t2
LEFT OUTER JOIN t3
ON t3.b = t2.b
GROUP by t2.b
ORDER BY b ASC;

Something like this? If not, can you give an example of the output you'd like to get.
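Here is a minimal sketch for checking both behaviours. It uses Python's standard-library sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. t1 is left out because the result only needs t2 and t3. SQLite has no IF(), so MAX(IF(a=2, a, NULL)) is written as MAX(CASE WHEN ... END).

The ON_FILTER query is an alternative and is not part of the answer above. It moves the a = n test from WHERE into the LEFT JOIN's ON clause, so t2 rows without a matching t3 row survive with a NULL a. The names GROUPED, ON_FILTER and rows_for are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables t2 and t3 from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (b INTEGER PRIMARY KEY, b_name TEXT NOT NULL);
    CREATE TABLE t3 (a INTEGER NOT NULL, b INTEGER NOT NULL, PRIMARY KEY (a, b));
    INSERT INTO t2 VALUES (1, '2-one'), (2, '2-two'), (3, '2-three'),
                          (4, '2-four'), (5, '2-five');
    INSERT INTO t3 VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 5);
""")

# The answer's approach: join all of t3, then keep a only where it equals n.
GROUPED = """
    SELECT t2.b, t2.b_name, MAX(CASE WHEN t3.a = ? THEN t3.a END) AS a
    FROM t2
    LEFT OUTER JOIN t3 ON t3.b = t2.b
    GROUP BY t2.b, t2.b_name
    ORDER BY t2.b
"""

# Alternative: filter t3 in the ON clause so unmatched t2 rows keep a NULL a.
ON_FILTER = """
    SELECT t2.b, t2.b_name, t3.a
    FROM t2
    LEFT JOIN t3 ON t3.b = t2.b AND t3.a = ?
    ORDER BY t2.b
"""

def rows_for(query, n):
    """Run one of the queries above for a = n and return all rows."""
    return conn.execute(query, (n,)).fetchall()

for n in (1, 2, 7):
    print(n, rows_for(ON_FILTER, n))
```

(a, b) is the primary key of t3, so the ON-clause variant can match at most one t3 row per t2 row. That is why it needs no GROUP BY.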
select * from t1
left join t3 using(a)
left join t2 using(b)


MySQL Update Table from Another Table Recursively

I have two tables, table1 and table2, where table1 is updated to fill in missing (null) values based on matching fields in table2 to create a more complete table1.
I have tried numerous queries such as
UPDATE table1 INNER JOIN table2...SET...
and
UPDATE table1 SET ... (SELECT)...
However, my results are always incomplete. Note that I have a much larger dataset in both tables (in terms of both columns and rows); I just used this as a simpler example.
Rules:
1) The `keyword` from table2 looks for a match in `keyword` in table1 and must accept partial matches.
2) No values can be overwritten in table1 (update null values only)
3) The lookup order is per run_order in table2.
Specific example:
Table1:
|-----|-------------------------------|----------|-----|---------|-------|
|t1_id|keyword |category |month|age |skill |
|-----|-------------------------------|----------|-----|---------|-------|
| 1 |childrens-crafts-christmas |kids | | | |
| 2 |kids-costumes | | |tween | |
| 3 |diy-color-printable |printable | | |easy |
| 4 |toddler-halloween-costume-page | | | | |
|-----|-------------------------------|----------|-----|---------|-------|
Table2:
|-----|---------|---------|----------|-----|----------|-------|
|t2_id|run_order|keyword |category |month|age |skill |
|-----|---------|---------|----------|-----|----------|-------|
| 1 | 1 |children | | |4-11 yrs | |
| 2 | 2 |printable| | | |easy |
| 3 | 3 |costume |halloween | 10 |0-12 years| |
| 4 | 4 |toddler | | |1-3 years | |
| 5 | 5 |halloween|holiday | 10 | | |
|-----|---------|---------|----------|-----|----------|-------|
Result:
|-----|-------------------------------|----------|-----|---------|-------|
|t1_id|keyword |category |month|age |skill |
|-----|-------------------------------|----------|-----|---------|-------|
| 1 |childrens-crafts-christmas |kids | |4-11 yrs | |
| 2 |kids-costumes |halloween | 10 |tween | |
| 3 |diy-color-printable |printable | | |easy |
| 4 |toddler-halloween-costume-page |holiday | 10 |1-3 years| |
|-----|-------------------------------|----------|-----|---------|-------|
MySQL for schema and table data:
DROP TABLE IF EXISTS table1;
DROP TABLE IF EXISTS table2;
CREATE TABLE `table1` (
`t1_id` INT NOT NULL AUTO_INCREMENT,
`keyword` VARCHAR(200) NULL,
`category` VARCHAR(45) NULL,
`month` VARCHAR(45) NULL,
`age` VARCHAR(45) NULL,
`skill` VARCHAR(45) NULL,
PRIMARY KEY (`t1_id`));
CREATE TABLE `table2` (
`t2_id` INT NOT NULL AUTO_INCREMENT,
`run_order` INT NULL,
`keyword` VARCHAR(200) NULL,
`category` VARCHAR(45) NULL,
`month` INT NULL,
`age` VARCHAR(45) NULL,
`skill` VARCHAR(45) NULL,
PRIMARY KEY (`t2_id`));
INSERT INTO `table1` (`keyword`, `category`) VALUES ('childrens-crafts-christmas', 'kids');
INSERT INTO `table1` (`keyword`, `age`) VALUES ('kids-costumes', 'tween');
INSERT INTO `table1` (`keyword`, `category`, `skill`) VALUES ('diy-color-printable', 'printable', 'easy');
INSERT INTO `table1` (`keyword`) VALUES ('toddler-halloween-costume-page');
INSERT INTO `table2` (`run_order`, `keyword`, `age`) VALUES (1, 'children', '4-11 yrs');
INSERT INTO `table2` (`run_order`, `keyword`, `skill`) VALUES (2, 'printable', 'easy');
INSERT INTO `table2` (`run_order`, `keyword`, `category`, `month`, `age`) VALUES (3, 'costume', 'halloween', 10, '0-12 years');
INSERT INTO `table2` (`run_order`, `keyword`, `age`) VALUES (4, 'toddler', '1-3 years');
INSERT INTO `table2` (`run_order`, `keyword`, `category`, `month`) VALUES (5, 'halloween', 'holiday', 10);
You want to update empty values in table1 with the value in the corresponding column of the first matching record in table2, ordered by run_order. A typical solution is to use a combination of correlated subqueries to find the matching records in table2 and keep only the one with the lowest run_order.
Here is a query that will update null categories with this logic:
update table1 t1
set category = (
select category
from table2 t2
where t2.run_order = (
select min(t22.run_order)
from table2 t22
where
t1.keyword like concat('%', t22.keyword, '%')
and t22.category is not null
)
)
where t1.category is null
This assumes that run_order is unique in table2 (which seems relevant in your use case).
You can extend the logic for more columns with coalesce(). Here is the solution for columns category and month:
update table1 t1
set
category = coalesce(
t1.category,
(
select category
from table2 t2
where t2.run_order = (
select min(t22.run_order)
from table2 t22
where
t1.keyword like concat('%', t22.keyword, '%')
and t22.category is not null
)
)
),
month = coalesce(
t1.month,
(
select month
from table2 t2
where t2.run_order = (
select min(t22.run_order)
from table2 t22
where
t1.keyword like concat('%', t22.keyword, '%')
and t22.month is not null
)
)
)
where t1.category is null or t1.month is null
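Here is a sketch for trying the correlated-subquery approach. It uses Python's sqlite3 module and an in-memory SQLite database in place of MySQL. SQLite writes CONCAT('%', x, '%') as '%' || x || '%'.

The sketch runs the single-column form of the UPDATE once per column. fill_missing and its priority parameter are hypothetical names. priority is interpolated into the SQL, so it should only ever be "MIN" or "MAX".

```python
import sqlite3

def fill_missing(priority):
    """Fill NULL columns of table1 from the table2 row whose run_order is the
    MIN or MAX (per `priority`) among rows whose keyword occurs inside
    table1.keyword and whose value for that column is not NULL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (t1_id INTEGER PRIMARY KEY, keyword TEXT,
                             category TEXT, month TEXT, age TEXT, skill TEXT);
        CREATE TABLE table2 (t2_id INTEGER PRIMARY KEY, run_order INTEGER, keyword TEXT,
                             category TEXT, month INTEGER, age TEXT, skill TEXT);
        INSERT INTO table1 (keyword, category) VALUES ('childrens-crafts-christmas', 'kids');
        INSERT INTO table1 (keyword, age) VALUES ('kids-costumes', 'tween');
        INSERT INTO table1 (keyword, category, skill) VALUES ('diy-color-printable', 'printable', 'easy');
        INSERT INTO table1 (keyword) VALUES ('toddler-halloween-costume-page');
        INSERT INTO table2 (run_order, keyword, age) VALUES (1, 'children', '4-11 yrs');
        INSERT INTO table2 (run_order, keyword, skill) VALUES (2, 'printable', 'easy');
        INSERT INTO table2 (run_order, keyword, category, month, age) VALUES (3, 'costume', 'halloween', 10, '0-12 years');
        INSERT INTO table2 (run_order, keyword, age) VALUES (4, 'toddler', '1-3 years');
        INSERT INTO table2 (run_order, keyword, category, month) VALUES (5, 'halloween', 'holiday', 10);
    """)
    for col in ("category", "month", "age", "skill"):
        # Same shape as the answer: pick the run_order first, then read that row's value.
        conn.execute(f"""
            UPDATE table1
            SET {col} = (
                SELECT t2.{col} FROM table2 t2
                WHERE t2.run_order = (
                    SELECT {priority}(t22.run_order) FROM table2 t22
                    WHERE table1.keyword LIKE '%' || t22.keyword || '%'
                      AND t22.{col} IS NOT NULL
                )
            )
            WHERE {col} IS NULL
        """)
    return conn.execute(
        "SELECT t1_id, category, month, age, skill FROM table1 ORDER BY t1_id"
    ).fetchall()

print(fill_missing("MIN"))
```

With MIN, as in the answer, row 4 (toddler-halloween-costume-page) picks up category 'halloween' and age '0-12 years' from run_order 3 ('costume'). The question's expected result shows 'holiday' and '1-3 years' for that row, which is what MAX produces. It may be worth confirming which end of run_order should win.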
You can use a join between the 2 tables with a LIKE expression, plus the IFNULL() function to ensure you don't overwrite non-null values.
UPDATE table1 t1
INNER JOIN table2 t2 ON t1.keyword like concat("%",t2.keyword,"%")
SET
t1.category = ifnull(t1.category,t2.category),
t1.month = ifnull(t1.month,t2.month),
t1.age = ifnull(t1.age,t2.age),
t1.skill = ifnull(t1.skill,t2.skill);

Creating summary VIEW from fields from multiple tables

I am trying to write a SELECT query for creating a view in MySQL. Each row in the view should display a weekly summary (sum, avg) of user values collected from multiple tables. The tables are similar to each other but not identical. The view should also include rows for weeks in which one of the tables has no values. Something like this:
| week_year | sum1 | avg1 | sum2 | user_id |
| --------- | ---- | ---- | ---- | ------- |
| 201840 | | | 3 | 1 |
| 201844 | 45 | 55 | | 1 |
| 201845 | 55 | 65 | | 1 |
| 201849 | 65 | 75 | | 1 |
| 201849 | 75 | 85 | 3 | 2 |
The tables (simplified) are as follows:
CREATE TABLE IF NOT EXISTS `t1` (
`user_id` INT NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL,
`value1` int(3) NOT NULL,
`value2` int(3) NOT NULL,
PRIMARY KEY (`user_id`,`date`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `t2` (
`id` INT NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL,
`value3` int(3) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `t3` (
`t3_id` INT NOT NULL,
`user_id` INT NOT NULL
) DEFAULT CHARSET=utf8;
My current solution doesn't seem reasonable and I am not sure how it would perform in case of thousands of rows:
select ifnull(yearweek(q1.date1), yearweek(q1.date2)) as week_year,
sum(value1) as sum1,
avg(value2) as avg1,
sum(value3) as sum2,
q1.user_id
from (select t2.date as date2,
t1.date as date1,
ifnull(t3.user_id, t1.user_id) as user_id,
t1.value1,
t1.value2,
t2.value3
from t2
join t3 on t3.t3_id=t2.id
left join t1 on yearweek(t1.date) = yearweek(t2.date) and t1.user_id = t3.user_id
union
select t2.date as date2,
t1.date as date1,
ifnull(t3.user_id, t1.user_id) as user_id,
t1.value1,
t1.value2,
t2.value3
from t2
join t3 on t3.t3_id=t2.id
right join t1 on yearweek(t1.date) = yearweek(t2.date) and t1.user_id = t3.user_id) as q1
group by week_year, user_id;
Is the current solution okay performance-wise, or are there better options? If a third (or fourth) table is added in the future, how would I manage the query? Should I consider creating a separate table that is updated with triggers?
Thanks in advance.
Another way you can do it is to union all the data and then group it. You'll have to perf test to see which is better:
SELECT
yearweek(u.date) AS week_year,
SUM(u.value1) AS sum1,
AVG(u.value2) AS avg1,
SUM(u.value3) AS sum2,
u.user_id
FROM
(
SELECT user_id, date, value1, value2, CAST(NULL AS SIGNED) AS value3 FROM t1
UNION ALL
SELECT t3.user_id, t2.date, NULL, NULL, t2.value3 FROM t2 INNER JOIN t3 ON t2.id = t3.t3_id
) AS u
GROUP BY
u.user_id,
yearweek(u.date);
MySQL does not accept INT as a CAST target, so SIGNED is used for the NULL placeholder. Every derived table also needs an alias (u here).
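Here is a small sketch of the UNION ALL then GROUP BY idea. It uses Python's sqlite3 module and an in-memory SQLite database instead of MySQL. SQLite has no YEARWEEK(), so strftime('%Y%W', ...) stands in for it, and its week numbering is not the same as MySQL's.

The sample rows are hypothetical. They are chosen so that one week has data only in t2 and another week has data in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (user_id INTEGER, date TEXT, value1 INTEGER, value2 INTEGER,
                     PRIMARY KEY (user_id, date));
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, date TEXT, value3 INTEGER);
    CREATE TABLE t3 (t3_id INTEGER, user_id INTEGER);
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO t1 VALUES (1, '2018-12-03', 10, 20), (1, '2018-12-05', 30, 40),
                          (2, '2018-12-06', 5, 6);
    INSERT INTO t2 VALUES (1, '2018-12-04', 3), (2, '2018-10-01', 7);
    INSERT INTO t3 VALUES (1, 1), (2, 1);
""")

# Stack the per-table rows first (padding missing columns with NULL),
# then aggregate once per (user, week).
QUERY = """
    SELECT strftime('%Y%W', u.date) AS week_year,
           SUM(u.value1) AS sum1,
           AVG(u.value2) AS avg1,
           SUM(u.value3) AS sum2,
           u.user_id
    FROM (
        SELECT user_id, date, value1, value2, NULL AS value3 FROM t1
        UNION ALL
        SELECT t3.user_id, t2.date, NULL, NULL, t2.value3
        FROM t2 INNER JOIN t3 ON t2.id = t3.t3_id
    ) AS u
    GROUP BY u.user_id, week_year
    ORDER BY u.user_id, week_year
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Under this approach, adding a third or fourth table means adding one more UNION ALL branch. That branch fills its own columns and pads the others with NULL.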

Update MySQL field with variable number of other fields

I have a MySQL table set up like this:
+----+----------+---------+
| id | parentid | content |
+----+----------+---------+
| 1 | 0 | a |
| 2 | 1 | b |
| 3 | 0 | c |
| 4 | 3 | d |
| 5 | 3 | e |
| 6 | 3 | f |
+----+----------+---------+
What I would like to do is concatenate the content of the children onto the end of the parent, in ascending order of id (then delete the children, but I will do that later). So the result should look like this (without children):
+----+----------+---------+
| id | parentid | content |
+----+----------+---------+
| 1 | 0 | ab |
| 3 | 0 | cdef |
+----+----------+---------+
The issue I'm running into is that, as you can see, a parent may have more than one child. So far the query I have is:
UPDATE table
SET content = CONCAT(content,
...
) ORDER BY id ASC
I'm not sure what to place in the ... section to grab all of the children and append them in the order they were retrieved. Maybe I'm going about it the wrong way. Any help will be greatly appreciated.
One option is:
/*Table structure for table `table` */
DROP TABLE IF EXISTS `table`;
CREATE TABLE `table` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`parentid` INT(11) UNSIGNED NOT NULL,
`content` VARCHAR(4) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB;
/*Data for the table `table` */
INSERT INTO `table`(`parentid`,`content`)
VALUES
(0,'a'),(1,'b'),(0,'c'),(3,'d'),(3,'e'),(3,'f');
UPDATE `table`
INNER JOIN
(SELECT `t0`.`id`, CONCAT(`t0`.`content`, GROUP_CONCAT(`t1`.`content` ORDER BY `t1`.`id` SEPARATOR '')) AS `content`
FROM `table` `t0`
INNER JOIN `table` `t1` ON `t0`.`id` = `t1`.`parentid`
WHERE `t0`.`parentid` = 0
GROUP BY `t1`.`parentid`) `der`
SET `table`.`content` = `der`.`content`
WHERE `table`.`id` = `der`.`id`;
DELETE FROM `table` WHERE `parentid` > 0;
You have to perform so many changes to the table that it might be better to create another one with the new data and drop the current one.
The query that will get you the results you're looking for is:
SELECT
min(t1.id) id,
0 parentid,
group_concat(t1.content ORDER BY t1.id separator '') content
FROM t t1
LEFT JOIN t t2 ON t1.parentid = t2.id
GROUP BY coalesce(t2.id, t1.id);
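Here is a sketch of the second answer's grouping. It uses Python's sqlite3 module and an in-memory SQLite database rather than MySQL. Only recent SQLite releases accept ORDER BY inside an aggregate such as group_concat(). For that reason, this version keeps the answer's grouping key, coalesce(t2.id, t1.id), but does the ordered concatenation in Python.

The table name t follows the answer. merged is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, parentid INTEGER NOT NULL, content TEXT);
    INSERT INTO t (parentid, content)
    VALUES (0, 'a'), (1, 'b'), (0, 'c'), (3, 'd'), (3, 'e'), (3, 'f');
""")

# Same grouping key as the answer: a child maps to its parent's id,
# and a top-level row (no parent row exists) maps to its own id.
rows = conn.execute("""
    SELECT coalesce(t2.id, t1.id) AS grp, t1.content
    FROM t t1
    LEFT JOIN t t2 ON t1.parentid = t2.id
    ORDER BY t1.id
""").fetchall()

merged = {}
for grp, content in rows:
    # Rows arrive in ascending id order, so appending keeps that order per group.
    merged[grp] = merged.get(grp, "") + content

print(merged)
```

Like the SQL answer, this handles one level of nesting only. A grandchild would be grouped under its own parent, not under the top-level row.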

How to rewrite a NOT IN subquery as join

Let's assume that the following tables in MySQL describe documents contained in folders.
mysql> select * from folder;
+----+----------------+
| ID | PATH |
+----+----------------+
| 1 | matches/1 |
| 2 | matches/2 |
| 3 | shared/3 |
| 4 | no/match/4 |
| 5 | unreferenced/5 |
+----+----------------+
mysql> select * from DOC;
+----+------+------------+
| ID | F_ID | DATE |
+----+------+------------+
| 1 | 1 | 2000-01-01 |
| 2 | 2 | 2000-01-02 |
| 3 | 2 | 2000-01-03 |
| 4 | 3 | 2000-01-04 |
| 5 | 3 | 2000-01-05 |
| 6 | 3 | 2000-01-06 |
| 7 | 4 | 2000-01-07 |
| 8 | 4 | 2000-01-08 |
| 9 | 4 | 2000-01-09 |
| 10 | 4 | 2000-01-10 |
+----+------+------------+
The ID columns are the primary keys, and the column F_ID of table DOC is a NOT NULL foreign key that references the primary key of table FOLDER. Using the DATE of documents in the WHERE clause, I would like to find which folders contain only the selected documents. For documents earlier than 2000-01-05, this could be written as:
SELECT DISTINCT d1.F_ID
FROM DOC d1
WHERE d1.DATE < '2000-01-05'
AND d1.F_ID NOT IN (
SELECT d2.F_ID
FROM DOC d2 WHERE NOT (d2.DATE < '2000-01-05')
);
and it correctly returns '1' and '2'. By reading
http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html
the performance for big tables could be improved if the subquery is replaced with a join. I already found questions related to NOT IN and JOINs, but not exactly what I was looking for. So, any ideas on how this could be written with joins?
The general answer is:
select t.*
from t
where t.id not in (select id from s)
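Can be rewritten as follows. The two are equivalent only when neither t.id nor s.id can be NULL, because NOT IN never evaluates to true if the subquery returns a NULL: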
select t.*
from t left outer join
(select distinct id from s) s
on t.id = s.id
where s.id is null
I think you can apply this to your situation.
select distinct d1.F_ID
from DOC d1
left outer join (
select F_ID
from DOC
where date >= '2000-01-05'
) d2 on d1.F_ID = d2.F_ID
where d1.date < '2000-01-05'
and d2.F_ID is null
If I understand your question correctly, you want to find the F_IDs of folders that only contain documents from before '2000-01-05'. In that case, simply:
SELECT F_ID
FROM DOC
GROUP BY F_ID
HAVING MAX(DATE) < '2000-01-05'
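Here is a sketch that runs three forms against the sample DOC rows: the question's NOT IN query, the LEFT JOIN rewrite, and the GROUP BY ... HAVING answer. It uses Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. The helper f_ids is hypothetical.

The sketch only checks that the three forms agree on this data. Which one is fastest on MySQL still has to be measured with EXPLAIN on realistic table sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DOC (ID INTEGER PRIMARY KEY, F_ID INTEGER NOT NULL, DATE TEXT NOT NULL);
    INSERT INTO DOC VALUES
        (1, 1, '2000-01-01'), (2, 2, '2000-01-02'), (3, 2, '2000-01-03'),
        (4, 3, '2000-01-04'), (5, 3, '2000-01-05'), (6, 3, '2000-01-06'),
        (7, 4, '2000-01-07'), (8, 4, '2000-01-08'), (9, 4, '2000-01-09'),
        (10, 4, '2000-01-10');
""")

# The question's NOT IN form.
NOT_IN = """
    SELECT DISTINCT d1.F_ID FROM DOC d1
    WHERE d1.DATE < '2000-01-05'
      AND d1.F_ID NOT IN (SELECT d2.F_ID FROM DOC d2 WHERE NOT (d2.DATE < '2000-01-05'))
"""

# Anti-join: LEFT JOIN the "late" documents and keep rows with no match.
ANTI_JOIN = """
    SELECT DISTINCT d1.F_ID FROM DOC d1
    LEFT OUTER JOIN (SELECT F_ID FROM DOC WHERE DATE >= '2000-01-05') d2
      ON d1.F_ID = d2.F_ID
    WHERE d1.DATE < '2000-01-05' AND d2.F_ID IS NULL
"""

# Aggregate form: a folder qualifies if even its newest document is early enough.
HAVING_MAX = """
    SELECT F_ID FROM DOC GROUP BY F_ID HAVING MAX(DATE) < '2000-01-05'
"""

def f_ids(sql):
    """Return the sorted folder ids produced by one of the queries above."""
    return sorted(row[0] for row in conn.execute(sql))

for name, sql in (("NOT IN", NOT_IN), ("LEFT JOIN", ANTI_JOIN), ("HAVING", HAVING_MAX)):
    print(name, f_ids(sql))
```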
Sample Table and Insert Statements
CREATE TABLE `tleft` (
`id` int(2) NOT NULL,
`name` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tright` (
`id` int(2) NOT NULL,
`t_left_id` int(2) DEFAULT NULL,
`description` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tleft` (`id`, `name`)
VALUES
(1, 'henry'),
(2, 'steve'),
(3, 'jeff'),
(4, 'richards'),
(5, 'elon');
INSERT INTO `tright` (`id`, `t_left_id`, `description`)
VALUES
(1, 1, 'sample'),
(2, 2, 'sample');
Left Join: SELECT l.id,l.name FROM tleft l LEFT JOIN tright r ON l.id = r.t_left_id;
Returns Id: 1, 2, 3, 4, 5
Right Join: SELECT l.id,l.name FROM tleft l RIGHT JOIN tright r ON l.id = r.t_left_id;
Returns Id: 1, 2
Subquery NOT IN tright: select id from tleft where id not in (select t_left_id from tright);
Returns Id: 3, 4, 5
Equivalent join for the above subquery:
SELECT l.id,l.name FROM tleft l LEFT JOIN tright r ON l.id = r.t_left_id WHERE r.t_left_id IS NULL;
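A condition added with AND in the ON clause is applied during the JOIN, while the WHERE clause is applied after the JOIN.
Example: SELECT l.id,l.name FROM tleft l LEFT JOIN tright r ON l.id = r.t_left_id AND r.description ='hello' WHERE r.t_left_id IS NULL;
Returns Id: 1, 2, 3, 4, 5 (no tright row has description 'hello', so every tleft row is unmatched)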
Hope this helps
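Here is a sketch that checks the results listed above. It uses Python's sqlite3 module and an in-memory SQLite database instead of MySQL. The RIGHT JOIN case is skipped because older SQLite versions do not support it.

To make the ON-versus-WHERE difference visible, the sketch also runs the same 'hello' condition placed in WHERE instead of ON. The helper ids is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tleft (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tright (id INTEGER PRIMARY KEY, t_left_id INTEGER, description TEXT);
    INSERT INTO tleft VALUES (1, 'henry'), (2, 'steve'), (3, 'jeff'), (4, 'richards'), (5, 'elon');
    INSERT INTO tright VALUES (1, 1, 'sample'), (2, 2, 'sample');
""")

def ids(sql):
    """Return the first column of every row, in query order."""
    return [row[0] for row in conn.execute(sql)]

BASE = "SELECT l.id FROM tleft l LEFT JOIN tright r ON l.id = r.t_left_id"

left_join = ids(BASE + " ORDER BY l.id")
anti_join = ids(BASE + " WHERE r.t_left_id IS NULL ORDER BY l.id")
# Extra condition inside ON: it only restricts which tright rows may match.
in_on = ids(BASE + " AND r.description = 'hello' WHERE r.t_left_id IS NULL ORDER BY l.id")
# Same condition in WHERE: it filters the joined rows, and it is never true here.
in_where = ids(BASE + " WHERE r.description = 'hello' AND r.t_left_id IS NULL ORDER BY l.id")

print(left_join, anti_join, in_on, in_where)
```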

Join two tables where table A has a date value and needs to find the next date in B below the date in A

I have this table "A":
| id | date |
===================
| 1 | 2010-01-13 |
| 2 | 2011-04-19 |
| 3 | 2011-05-07 |
| .. | ... |
and this table "B":
| date | value |
======================
| 2009-03-29 | 0.5 |
| 2010-01-30 | 0.55 |
| 2011-08-12 | 0.67 |
Now I am looking for a way to JOIN those two tables, mapping the "value" column in "B" to the dates in "A". The tricky part for me is that table "B" only stores the change date and the new value. So when I need the value for a row in table "A", the SQL needs to look back for the closest date in "B" below the date it is asking the value for.
So in the end the JOIN of those tables should look like this:
| id | date | value |
===========================
| 1 | 2010-01-13 | 0.5 |
| 2 | 2011-04-19 | 0.55 |
| 3 | 2011-05-07 | 0.55 |
| .. | ... | ... |
How can I do this?
-- Create and fill first table
CREATE TABLE `id_date` (
`id` int(11) NOT NULL auto_increment,
`iddate` date NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `id_date` VALUES(1, '2010-01-13');
INSERT INTO `id_date` VALUES(2, '2011-04-19');
INSERT INTO `id_date` VALUES(3, '2011-05-07');
-- Create and fill second table
CREATE TABLE `date_val` (
`mydate` date NOT NULL,
`myval` varchar(4) collate utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `date_val` VALUES('2009-03-29', '0.5');
INSERT INTO `date_val` VALUES('2010-01-30', '0.55');
INSERT INTO `date_val` VALUES('2011-08-12', '0.67');
-- Get the result table as asked in question
SELECT iddate, t2.mydate, t2.myval
FROM `id_date` t1
JOIN date_val t2 ON t2.mydate <= t1.iddate
AND t2.mydate = (
SELECT MAX( t3.mydate )
FROM `date_val` t3
WHERE t3.mydate <= t1.iddate )
What we're doing:
for each date in the id_date table (your table A),
we find the date in the date_val table (your table B)
which is the highest date in the date_val table that is not later than id_date.iddate
You could use a subquery with limit 1 to look up the latest value in table B:
select id
, date
, (
select value
from B
where B.date < A.date
order by
B.date desc
limit 1
) as value
from A
I have been inspired by the other answers but ended with my own solution using common table expressions:
WITH datecombination (id, adate, bdate) AS
(
SELECT id, A.date, MAX(B.Date) as Bdate
FROM tableA A
LEFT JOIN tableB B
ON B.date <= A.date
GROUP BY A.id, A.date
)
SELECT DC.id, DC.adate, B.value FROM datecombination DC
LEFT JOIN tableB B
ON DC.bdate = B.date
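Here is a sketch comparing the three lookups above on the question's sample rows. It uses Python's sqlite3 module and an in-memory SQLite database in place of MySQL, with the question's table names A and B and columns date and value. The CTE version joins on B.date, since table B has no bdate column.

The LIMIT 1 answer uses a strict B.date < A.date, while the other two use <=. On this data no dates coincide, so all three agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, date TEXT NOT NULL);
    CREATE TABLE B (date TEXT NOT NULL, value REAL NOT NULL);
    INSERT INTO A VALUES (1, '2010-01-13'), (2, '2011-04-19'), (3, '2011-05-07');
    INSERT INTO B VALUES ('2009-03-29', 0.5), ('2010-01-30', 0.55), ('2011-08-12', 0.67);
""")

# Correlated MAX() subquery (first answer), adapted to the A/B column names.
MAX_SUBQUERY = """
    SELECT A.id, A.date, B.value
    FROM A JOIN B ON B.date = (SELECT MAX(b2.date) FROM B b2 WHERE b2.date <= A.date)
    ORDER BY A.id
"""

# Scalar subquery with ORDER BY ... LIMIT 1 (second answer).
LIMIT_ONE = """
    SELECT A.id, A.date,
           (SELECT B.value FROM B WHERE B.date < A.date ORDER BY B.date DESC LIMIT 1) AS value
    FROM A
    ORDER BY A.id
"""

# CTE (third answer): first find each row's latest B date, then fetch its value.
CTE = """
    WITH datecombination (id, adate, bdate) AS (
        SELECT A.id, A.date, MAX(B.date)
        FROM A LEFT JOIN B ON B.date <= A.date
        GROUP BY A.id, A.date
    )
    SELECT DC.id, DC.adate, B.value
    FROM datecombination DC LEFT JOIN B ON DC.bdate = B.date
    ORDER BY DC.id
"""

results = {name: conn.execute(sql).fetchall()
           for name, sql in (("max", MAX_SUBQUERY), ("limit", LIMIT_ONE), ("cte", CTE))}
for name, rows in results.items():
    print(name, rows)
```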
INNER JOIN returns rows when there is at least one match in both tables. Try this.
Select A.id,A.date,b.value
from A inner join B
on A.date=b.date