MySQL query with multiple full joins - mysql

I have a query which gets all the jobs from a database. Some jobs don't have a languagepair, and I need to get those too while still getting the languagepair information for jobs which do have one. I understand this is done with a full join, but full joins do not exist in MySQL; from what I read, I need to do some sort of UNION.
If I get NULLs as source & target for jobs that do not have a languagepair, that is fine.
This is the query I have at the moment:
SELECT jobName, source.name AS source, target.name AS target FROM (
(SELECT jobs.name AS jobName, lp.sourceId, lp.targetId FROM jobs **JOIN languagePairs** lp
ON lp.id = jobs.languagePairId)
UNION
(SELECT jobs.name AS jobName, lp.sourceId, lp.targetId FROM collectiveJobs JOIN jobs ON jobs.id = collectiveJobs.jobId
**JOIN languagePairs lp** on jobs.languagePairId = lp.id
WHERE collectiveJobs.freelancerId = 1)
) AS jobs **JOIN languages** source ON source.id = sourceId **JOIN languages** target ON target.id = targetId;
I think but I am not sure the full joins need to happen at the bold joins. There also needs to be some sort of checking for null (I think) in the query.
Of course I could do this programmatically, but it would be nice to have one query for it.
DB schema:
create table languages
(
id int auto_increment primary key,
name varchar(255) not null
);
create table languagePairs
(
id int auto_increment
primary key,
sourceId int not null,
targetId int not null,
constraint languagePair_sourceId_targetId_uindex
unique (sourceId, targetId),
constraint languagePair_language_id_fk_source
foreign key (sourceId) references languages (id),
constraint languagePair_language_id_fk_target
foreign key (targetId) references languages (id)
);
create table jobs
(
id int auto_increment
primary key,
name varchar(255) null,
freelancerId int null,
languagePairId int null,
constraint jobs_freelancers_id_fk
foreign key (freelancerId) references freelancers (id),
constraint jobs_languagePairs_id_fk
foreign key (languagePairId) references languagePairs (id)
);
create table collectiveJobs
(
id int auto_increment
primary key,
jobId int not null,
freelancerId int not null,
constraint collectiveJobs_freelancerId_jobId_uindex
unique (freelancerId, jobId),
constraint collectiveJobs_freelancers_id_fk
foreign key (freelancerId) references freelancers (id),
constraint collectiveJobs_jobs_id_fk
foreign key (jobId) references jobs (id)
);
create table freelancers
(
id int auto_increment primary key
);
Sample data:
INSERT INTO datamundi.jobs (id, name, freelancerId, languagePairId) VALUES (1, 'Job 1', 1, 1);
INSERT INTO datamundi.jobs (id, name, freelancerId, languagePairId) VALUES (2, 'Job 2', 1, null);
If I execute the query only Job 1 gets shown.
MySQL version on development machine: mysql Ver 8.0.19 for Linux on x86_64 (MySQL Community Server - GPL)
MySQL version on production server: mysql Ver 8.0.17 for Linux on x86_64 (MySQL Community Server - GPL)
All help is truly appreciated.

I can't really delve into your specific example, but the good news is you are using MySQL 8.x. The workaround for a FULL OUTER JOIN between two tables (a and b) in MySQL is:
select * from a left join b on <predicate>
union
select * from a right join b on <predicate>
Now if you need to join complex selects instead of simple tables, then CTEs come to your rescue. For example, if the left side were a complex SELECT you would do:
with s as ( <complex-select-here> )
select * from s left join b on <predicate>
union
select * from s right join b on <predicate>
If both are complex SELECTs then:
with s as ( <complex-select-here> ),
t as ( <complex-select-here> )
select * from s left join t on <predicate>
union
select * from s right join t on <predicate>
No sweat.
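As a rough, runnable illustration of the UNION workaround, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The tables a and b, their columns and their rows are hypothetical. Because older SQLite builds lack RIGHT JOIN, the second branch is written as a LEFT JOIN with the tables swapped, which should return the same rows as the right join. Note that UNION also removes duplicate rows.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);            -- hypothetical table
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INT, note TEXT);   -- hypothetical table
INSERT INTO a VALUES (1, 'matched'), (2, 'only in a');
INSERT INTO b VALUES (10, 1, 'matched'), (20, NULL, 'only in b');
""")

rows = conn.execute("""
SELECT a.label, b.note FROM a LEFT JOIN b ON b.a_id = a.id
UNION
-- same rows as "a RIGHT JOIN b", written as a swapped LEFT JOIN
SELECT a.label, b.note FROM b LEFT JOIN a ON b.a_id = a.id
""").fetchall()

# Unmatched rows from both sides survive, padded with NULL (None in Python).
full_rows = set(rows)
print(full_rows)
```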

This works with all LEFT joins. I am sorry, I should have tried that first.
SELECT jobName, source.name AS source, target.name AS target FROM (
(SELECT jobs.name AS jobName, lp.sourceId, lp.targetId FROM jobs LEFT JOIN languagePairs lp
ON jobs.languagePairId = lp.id)
UNION
(SELECT jobs.name AS jobName, lp.sourceId, lp.targetId FROM collectiveJobs JOIN jobs ON jobs.id = collectiveJobs.jobId
LEFT JOIN languagePairs lp on jobs.languagePairId = lp.id
WHERE collectiveJobs.freelancerId = 1)
) AS jobs LEFT JOIN languages source ON source.id = sourceId LEFT JOIN languages target ON target.id = targetId;
Not sure why I thought I needed a FULL JOIN...
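For anyone wanting to check the difference quickly, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The two jobs come from the sample data in the question; the language and languagePairs rows are made up, and the freelancer and collectiveJobs parts are left out to keep it short.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE languagePairs (id INTEGER PRIMARY KEY, sourceId INT NOT NULL, targetId INT NOT NULL);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, languagePairId INT);
INSERT INTO languages VALUES (1, 'English'), (2, 'Dutch');    -- hypothetical rows
INSERT INTO languagePairs VALUES (1, 1, 2);                   -- hypothetical row
INSERT INTO jobs VALUES (1, 'Job 1', 1), (2, 'Job 2', NULL);  -- jobs from the question
""")

# The same query is run twice, once with inner joins and once with left joins.
QUERY = """
SELECT jobs.name, source.name, target.name
FROM jobs
{join} languagePairs lp ON lp.id = jobs.languagePairId
{join} languages source ON source.id = lp.sourceId
{join} languages target ON target.id = lp.targetId
ORDER BY jobs.id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # the job without a languagepair is dropped
print(left_rows)   # the job without a languagepair is kept, with NULLs
```

A FULL JOIN would only be needed if language pairs without any job also had to appear in the result.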

Why does the index not work as expected in MySQL?

I really want to know why my index is not working.
I have two tables, post and post_log.
create table post
(
id int auto_increment
primary key,
comment int null,
is_used tinyint(1) default 1 not null,
is_deleted tinyint(1) default 0 not null
);
create table post_log
(
id int auto_increment
primary key,
post_id int not null,
created_at datetime not null,
user int null,
constraint post_log_post_id_fk
foreign key (post_id) references post (id)
);
create index post_log_created_at_index
on post_log (created_at);
When I run the query below, the created_at index is used as expected.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.is_used is TRUE
AND p.is_deleted is FALSE;
When I run the query below, the index is not used and the post table gets a full scan.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.is_used = 1
AND p.is_deleted = 0;
And the query below does not use the index either.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
and p.comment = 111
What is the difference between 'tinyint = 1' and 'tinyint is true'?
And why does the first query work correctly while the others don't?
When making the query plan, MySQL has to decide whether to first filter the post_log table using the index, or first filter the post table using the is_used and is_deleted columns.
= 1 tests for the specific value 1, while IS TRUE is true for any non-zero value. I guess it decides that when you're searching for specific values, it will be more efficient to filter the post table first because there will likely be fewer matches (since these columns aren't indexed, it doesn't know that 0 and 1 are the only values).
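The semantic difference between the two tests can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite 3.23 or later is assumed for the TRUE keyword). The flags table is hypothetical. This only shows which values each predicate matches; it says nothing about how MySQL's optimizer chooses its plan.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flags (v INTEGER);          -- hypothetical table
INSERT INTO flags VALUES (0), (1), (2);
""")

# For each value: does "= 1" match, and does "IS TRUE" match?
rows = conn.execute(
    "SELECT v, v = 1, v IS TRUE FROM flags ORDER BY v"
).fetchall()

for v, equals_one, is_true in rows:
    print(v, equals_one, is_true)
```

The value 2 fails the = 1 test but passes IS TRUE, so the two conditions are only interchangeable when the column is known to hold nothing but 0 and 1.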

Improve Efficiency Of This Query: Update With Joins and Subqueries

I have a MySQL database with
- a table of Parcels which need to be sent to people (here 16,000 records), with indexes on account_no and service
- a table of Price Rates (500,000 records), where the rate depends on delivery area, customer price rate and type of service (e.g. next day), with indexes on area, price rate and service
- a table of the first part of the postcode (or zip code), which gives the area (3000)
- a table of customer accounts, containing the price rate (1600), with an index on price rate
The query finds the price it will cost to send each parcel and updates the customer price for that parcel, identified by its unique id.
It is taking 70 seconds to update 16,000 parcel records with the price to send each parcel.
UPDATE
tbl_parcel AS t20, (
SELECT
id, service, rate_group, area,
(
SELECT
rate
FROM
tbl_rates_all t4
WHERE
t4.service = t10.service
AND t4.area = t10.area
AND t4.rate_group = t10.rate_group
)
AS price
FROM
(
SELECT
id,
t1.service,
rate_group,
area
FROM
tbl_parcel t1
JOIN
tbl_account t2
ON t1.account_no = t2.account_no
JOIN
tbl_pr_postcode t3
ON LEFT(full_pcode, locate(' ', full_pcode) - 1) = t3.postcode
) t10
) AS src
SET
t20.customer_price = src.price
WHERE
t20.id = src.id
Takes 70 seconds for the 16000 parcel records
Ultimately it is this part that is killing the efficiency
FROM
tbl_rates_all t4
WHERE
t4.service = t10.service
AND t4.area = t10.area
AND t4.rate_group = t10.rate_group
I could have separate rates tables for each rate, as this was the original design, so a variable would select e.g. tbl_rates001, which might only have 3000 records and not 500,000. The problem with doing this in MySQL was that building a table name on the fly is not possible without a prepared statement, so I thought this method was no good. It's a shame you can't use a user variable to hold the price rate number and then append it to the rates table name.
I'm quite new to databases and queries so if something is screaming at you that would help then thanks for any input
regards
ADDITION AS REQUESTED: SCHEMA
CREATE TABLE `tbl_x_rate_all` (
`id` bigint(20) NOT NULL,
`service` varchar(4) NOT NULL,
`chargetype` char(1) NOT NULL,
`area` smallint(6) NOT NULL,
`rate` float(7,2) NOT NULL,
`rate_group` smallint(6) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `tbl_x_rate_all` ADD PRIMARY KEY (`id`), ADD KEY `rate_group` (`rate_group`), ADD KEY `area` (`area`), ADD KEY `service` (`service`),
ADD KEY `chargetype` (`chargetype`);
Assuming id, rate_group and area come from t1 inside t10, your query is a (probably slower) version of the one below:
UPDATE
tbl_parcel AS t20
INNER JOIN (
SELECT
t1.id,
t4.rate as price
FROM tbl_parcel t1
JOIN tbl_account t2 ON t1.account_no = t2.account_no
JOIN tbl_pr_postcode t3 ON LEFT(full_pcode, locate(' ', full_pcode) - 1) = t3.postcode
LEFT JOIN tbl_rates_all t4 ON t1.service = t4.service AND t1.area = t4.area
AND t1.rate_group = t4.rate_group
) src ON t20.id = src.id
SET
t20.customer_price = src.price
WHERE
t20.id = src.id
I am guessing you can also lose the subquery, since subqueries tend to be cumbersome:
UPDATE
tbl_parcel AS t20
INNER JOIN tbl_parcel t1 ON t20.id = t1.id
INNER JOIN tbl_account t2 ON t1.account_no = t2.account_no
INNER JOIN tbl_pr_postcode t3 ON LEFT(full_pcode, locate(' ', full_pcode) - 1) = t3.postcode
LEFT JOIN tbl_rates_all t4 ON t1.service = t4.service AND t1.area = t4.area
AND t1.rate_group = t4.rate_group
SET
t20.customer_price = t4.rate
WHERE
t20.id = t1.id
-- can also replace with
-- TRUE
-- or lose it altogether
;
You could try adding indexes on the join columns of t1 and t4 if you have reason to believe that the join is the bottleneck:
create index tbl_rates_all_service_area_rate_group_index
on tbl_rates_all (service, area, rate_group);
create index tbl_parcel_service_area_rate_group_index
on tbl_parcel (service, area, rate_group);
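To check whether a composite index like this is actually picked up for a three-column lookup, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its planner is not MySQL's, the column types are approximated, and the lookup values are made up. On MySQL itself you would run EXPLAIN on the real statement instead.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_rates_all (
    id INTEGER PRIMARY KEY,
    service TEXT NOT NULL,
    area INT NOT NULL,
    rate REAL NOT NULL,
    rate_group INT NOT NULL
);
CREATE INDEX tbl_rates_all_service_area_rate_group_index
    ON tbl_rates_all (service, area, rate_group);
""")

# Ask the planner how it would run the per-parcel rate lookup.
plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT rate FROM tbl_rates_all
    WHERE service = ? AND area = ? AND rate_group = ?
    """,
    ("ND", 12, 3),  # hypothetical lookup values
).fetchall()

# The human-readable description is the last column of each plan row.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```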
@Rich --
Nothing's jumping out. You might be able to get some marginal improvements by building some temp tables, handling the SET outside your main query, and replacing the nested queries with lateral joins. (OUTER APPLY is SQL Server syntax; the closest MySQL equivalent is a LATERAL derived table, available from MySQL 8.0.14.)
If you're fairly new to MySQL / databases in general, the EXPLAIN statement can be very useful in optimization:
https://dev.mysql.com/doc/refman/5.5/en/using-explain.html

How to get values through connected tables?

I have a question. I have two tables: the first one contains comments, and the second one links each comment id to the id of the album the comment was left on.
CREATE TABLE `review` (
`id` VARCHAR(32) NOT NULL,
`user_id` VARCHAR(32) NOT NULL,
`comment` MEDIUMTEXT NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `review_album` (
`review_id` VARCHAR(32) NOT NULL,
`album_id` VARCHAR(32) NOT NULL,
PRIMARY KEY (`review_id`, `album_id`),
INDEX `review_album_review_idx` (`review_id`)
);
I tried this way:
SELECT * from review_album JOIN review WHERE album_id = '300001'
But I got the result twice.
How can I get comment text for a specific album_id?
The general syntax is:
SELECT column-names
FROM table-name1 JOIN table-name2
ON column-name1 = column-name2
WHERE condition
The general syntax with INNER is:
SELECT column-names
FROM table-name1 INNER JOIN table-name2
ON column-name1 = column-name2
WHERE condition
Note: The INNER keyword is optional: it is the default as well as the most commonly used JOIN operation.
Reference: https://www.dofactory.com/sql/join
Try an inner join:
SELECT *
FROM review_album
JOIN review ON review_album.review_id=review.id
WHERE album_id = '300001'
You have forgotten the ON condition. Every time you have a join you should specify the join condition; otherwise every row of one table is combined with every row of the other.
Here is the solution:
SELECT *
FROM review_album RA
JOIN review R ON RA.review_id = R.id
WHERE album_id = '300001'
Here is the documentation for joins: https://www.w3schools.com/sql/sql_join.asp
Try using this:
SELECT *
FROM review_album ra
JOIN review r ON ra.review_id = r.id
WHERE album_id = '300001'
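To see why the missing ON condition repeats results, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout follows the question; the review rows and the second album id are made up.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review (id TEXT PRIMARY KEY, user_id TEXT NOT NULL, comment TEXT NOT NULL);
CREATE TABLE review_album (
    review_id TEXT NOT NULL,
    album_id TEXT NOT NULL,
    PRIMARY KEY (review_id, album_id)
);
-- hypothetical sample rows
INSERT INTO review VALUES ('r1', 'u1', 'Great album'), ('r2', 'u2', 'Not my taste');
INSERT INTO review_album VALUES ('r1', '300001'), ('r2', '300002');
""")

# Without ON, the matching review_album row is paired with every review.
without_on = conn.execute("""
SELECT r.comment FROM review_album ra JOIN review r
WHERE ra.album_id = '300001' ORDER BY r.id
""").fetchall()

# With ON, only the review that belongs to the album is returned.
with_on = conn.execute("""
SELECT r.comment FROM review_album ra JOIN review r ON ra.review_id = r.id
WHERE ra.album_id = '300001'
""").fetchall()

print(without_on)
print(with_on)
```

With two rows in review, the query without ON returns two rows for the album, which matches the doubled result described in the question.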

Mysql query to check if all sub_items of a combo_item are active

I am trying to write a query that looks through all combo_items and only returns the ones where all sub_items that it references have Active=1.
I think I should be able to count how many sub_items there are in a combo_item total and then compare it to how many are Active, but I am failing pretty hard at figuring out how to do that...
My table definitions:
CREATE TABLE `combo_items` (
`c_id` int(11) NOT NULL,
`Label` varchar(20) NOT NULL,
PRIMARY KEY (`c_id`)
)
CREATE TABLE `sub_items` (
`s_id` int(11) NOT NULL,
`Label` varchar(20) NOT NULL,
`Active` int(1) NOT NULL,
PRIMARY KEY (`s_id`)
)
CREATE TABLE `combo_refs` (
`r_id` int(11) NOT NULL,
`c_id` int(11) NOT NULL,
`s_id` int(11) NOT NULL,
PRIMARY KEY (`r_id`)
)
So for each combo_item, there are at least 2 rows in the combo_refs table linking to the multiple sub_items. My brain is about to make bigbadaboom :(
I would just join the three tables usually and then combo-item-wise sum up the total number of sub-items and the number of active sub-items:
SELECT ci.c_id, ci.Label, SUM(1) AS total_sub_items, SUM(si.Active) AS active_sub_items
FROM combo_items AS ci
INNER JOIN combo_refs AS cr ON cr.c_id = ci.c_id
INNER JOIN sub_items AS si ON si.s_id = cr.s_id
GROUP BY ci.c_id
Of course, instead of using SUM(1) you could just say COUNT(ci.c_id), but I wanted an analog of SUM(si.Active).
The approach proposed assumes Active to be 1 (active) or 0 (not active).
To get only those combo-items whose sub-items are all active, compare the two sums in a HAVING clause. Just adding WHERE si.Active = 1 is not enough: it filters out the inactive rows before grouping, so any combo-item with at least one active sub-item would still be returned.
SELECT ci.c_id, ci.Label
FROM combo_items AS ci
INNER JOIN combo_refs AS cr ON cr.c_id = ci.c_id
INNER JOIN sub_items AS si ON si.s_id = cr.s_id
GROUP BY ci.c_id
HAVING SUM(si.Active) = SUM(1)
By the way, INNER JOIN ensures that there is at least one sub-item per combo-item at all.
(I have not tested it.)
See this answer:
MySQL: Selecting foreign keys with fields matching all the same fields of another table
Select ...
From combo_items As C
Where Exists (
Select 1
From sub_items As S1
Join combo_refs As CR1
On CR1.s_id = S1.s_id
Where CR1.c_id = C.c_id
)
And Not Exists (
Select 1
From sub_items As S2
Join combo_refs As CR2
On CR2.s_id = S2.s_id
Where CR2.c_id = C.c_id
And S2.Active = 0
)
The first subquery ensures that at least one sub_item exists. The second ensures that none of the sub_items are inactive.
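To compare how the different formulations behave on data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout follows the question; all rows are made up. It runs a plain WHERE filter on Active, a HAVING comparison of active versus total sub-items, and the EXISTS / NOT EXISTS form.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combo_items (c_id INTEGER PRIMARY KEY, Label TEXT NOT NULL);
CREATE TABLE sub_items (s_id INTEGER PRIMARY KEY, Label TEXT NOT NULL, Active INT NOT NULL);
CREATE TABLE combo_refs (r_id INTEGER PRIMARY KEY, c_id INT NOT NULL, s_id INT NOT NULL);
-- hypothetical rows: combo 1 has only active sub-items, combo 2 has one inactive
INSERT INTO combo_items VALUES (1, 'all active'), (2, 'one inactive');
INSERT INTO sub_items VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 0);
INSERT INTO combo_refs VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 3);
""")

JOINED = """
SELECT ci.c_id FROM combo_items ci
JOIN combo_refs cr ON cr.c_id = ci.c_id
JOIN sub_items si ON si.s_id = cr.s_id
"""

# WHERE filter: keeps combos with at least one active sub-item.
where_only = conn.execute(
    JOINED + "WHERE si.Active = 1 GROUP BY ci.c_id ORDER BY ci.c_id"
).fetchall()

# HAVING: keeps combos whose active count equals their total count.
having = conn.execute(
    JOINED + "GROUP BY ci.c_id HAVING SUM(si.Active) = COUNT(*) ORDER BY ci.c_id"
).fetchall()

# EXISTS / NOT EXISTS: at least one sub-item, and none inactive.
not_exists = conn.execute("""
SELECT c.c_id FROM combo_items c
WHERE EXISTS (
    SELECT 1 FROM combo_refs cr1 WHERE cr1.c_id = c.c_id
) AND NOT EXISTS (
    SELECT 1 FROM sub_items s2
    JOIN combo_refs cr2 ON cr2.s_id = s2.s_id
    WHERE cr2.c_id = c.c_id AND s2.Active = 0
)
ORDER BY c.c_id
""").fetchall()

print(where_only, having, not_exists)
```

On this data only the HAVING and NOT EXISTS versions exclude the combo-item that has an inactive sub-item.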

Violation of Primary Key constraint error

I am using SSMS 2008 and trying to insert with this query but am getting the following error:
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_j5c_MasterMeasures'. Cannot insert duplicate key in object 'dbo.j5c_MasterMeasures'.
The statement has been terminated.
Here is my query:
insert into J5C_MasterMeasures (studentid, measuredate, measureid, RIT)
select A.studentid, A.measuredate, B.measurename+' ' +B.LabelName, A.score_14
from [J5C_Measures_Sys] A
join [J5C_ListBoxMeasures_Sys] B on A.MeasureID = B.MeasureID
join sysobjects so on so.name = 'J5C_Measures_Sys'
join syscolumns sc on so.id = sc.id
join [J5C_MeasureNamesV2_Sys] v on v.Score_field_id = sc.name
where so.type = 'u' and sc.name = 'score_14' and a.score_14 is not null
AND A.STUDENTID IS NOT NULL AND A.MEASUREDATE IS NOT NULL AND B.MEASURENAME IS NOT NULL
group by a.studentid, a.measuredate, B.measurename, B.LabelName, A.score_14
--HAVING COUNT(*) > 1
The strange thing is that if I run just the SELECT query (without the INSERT) and include the HAVING COUNT statement, it returns 0 records for > 1. So I don't know where the duplicate is coming from!
Based on your earlier question, I believe that your primary key is A.studentid, A.measuredate, B.measurename. Please correct me if I am wrong on this.
Since you are grouping by two additional columns B.LabelName and A.score_14 in addition to your columns of your composite primary key, if there are any duplicates - which there can be provided they have different values of either B.LabelName or A.score_14 - you will violate your primary key constraint and this error will be thrown.
Your data will just not be unique enough to satisfy your primary key - which states that ONLY ONE ROW with a unique combination of A.studentid, A.measuredate, B.measurename can exist in your table
Is there data already in the J5C_MasterMeasures table? If so, make sure that what you are inserting doesn't already exist.
Use:
select A.studentid, A.measuredate, B.measurename+' ' +B.LabelName, A.score_14
from [J5C_Measures_Sys] A
join [J5C_ListBoxMeasures_Sys] B on A.MeasureID = B.MeasureID
join sysobjects so on so.name = 'J5C_Measures_Sys'
join syscolumns sc on so.id = sc.id
join [J5C_MeasureNamesV2_Sys] v on v.Score_field_id = sc.name
where so.type = 'u' and sc.name = 'score_14' and a.score_14 is not null
AND A.STUDENTID IS NOT NULL AND A.MEASUREDATE IS NOT NULL AND B.MEASURENAME IS NOT NULL
AND NOT EXISTS(SELECT NULL
FROM J5C_MasterMeasures x
WHERE x.studentid = a.studentid
AND x.measuredate = a.measuredate
AND x.measureid = B.measurename +' '+ B.LabelName)
group by a.studentid, a.measuredate, B.measurename, B.LabelName, A.score_14
The NOT EXISTS will filter out already existing data.
You should double-check your HAVING test. Make sure you are only including the column(s) that comprise the PK in your GROUP BY clause.
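The effect of the extra GROUP BY column can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for SQL Server. The measures and master tables, their columns and their rows are simplified, hypothetical versions of the ones in the question. The HAVING test finds nothing while the score is part of the grouping, yet the insert still collides on the key.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical, simplified source table: same key, two different scores
CREATE TABLE measures (studentid INT, measuredate TEXT, measurename TEXT, score INT);
INSERT INTO measures VALUES
    (1, '2010-01-01', 'Reading', 200),
    (1, '2010-01-01', 'Reading', 210);
-- hypothetical target table with the composite primary key
CREATE TABLE master (
    studentid INT, measuredate TEXT, measureid TEXT, RIT INT,
    PRIMARY KEY (studentid, measuredate, measureid)
);
""")

# Grouping by the key columns plus the score: every group has one row.
grouped_with_score = conn.execute("""
SELECT studentid, measuredate, measurename, score FROM measures
GROUP BY studentid, measuredate, measurename, score
HAVING COUNT(*) > 1
""").fetchall()

# Grouping by the key columns only reveals the duplicate key.
grouped_by_key = conn.execute("""
SELECT studentid, measuredate, measurename, COUNT(*) FROM measures
GROUP BY studentid, measuredate, measurename
HAVING COUNT(*) > 1
""").fetchall()

# The insert still produces two rows with the same key and is rejected.
try:
    conn.execute("""
    INSERT INTO master
    SELECT studentid, measuredate, measurename, score FROM measures
    GROUP BY studentid, measuredate, measurename, score
    """)
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

print(grouped_with_score, grouped_by_key, insert_failed)
```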