Sum of cost in MySQL

I have two tables. For example, the 1st has id and name.
The 2nd has id, a link to the 1st table by id (FIRST_TABLE_ID), and COST.
CREATE TABLE FIRST_TABLE (id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR (100));
CREATE TABLE SECOND_TABLE (id INT PRIMARY KEY, FIRST_TABLE_ID INT NOT NULL, cost DECIMAL(10,2),
FOREIGN KEY (FIRST_TABLE_ID) REFERENCES FIRST_TABLE (ID));
INSERT INTO FIRST_TABLE (NAME) VALUES
('ONE'),
('TWO'),
('THREE');
INSERT INTO SECOND_TABLE (ID, FIRST_TABLE_ID, COST) VALUES
(1, 1, 500),
(2, 2, 400),
(3, 3, 150),
(4, 1, 500),
(5, 2, 400),
(6, 3, 150);
How do I get the sum of COST (from the 2nd table) for each NAME (from the 1st table)?
What I tried:
select FIRST_TABLE.NAME, sum(SECOND_TABLE.COST) TOTAL_COST
from FIRST_TABLE
left join SECOND_TABLE on FIRST_TABLE_ID = SECOND_TABLE.ID
group by FIRST_TABLE.ID
The problem is:
I get the same wrong sum of cost, 1050, for every NAME.
ONE - 1050
TWO - 1050
THREE - 1050
How do I get the correct values for every NAME?
And what would it look like if I have three tables and, for each key in the 1st, I have to get the sum from the 2nd table and from the 3rd table?

Here:
from FIRST_TABLE
left join SECOND_TABLE on FIRST_TABLE_ID = SECOND_TABLE.ID
The join condition is actually equivalent to:
on SECOND_TABLE.FIRST_TABLE_ID = SECOND_TABLE.ID
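Both operands of the equality relate to the same table, so every row of FIRST_TABLE is joined to the same three rows of SECOND_TABLE (those where FIRST_TABLE_ID = ID), which gives 500 + 400 + 150 = 1050 for every name. This is not what you want. Instead, use: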
select FIRST_TABLE.NAME, sum(SECOND_TABLE.COST) TOTAL_COST
from FIRST_TABLE
left join SECOND_TABLE on SECOND_TABLE.FIRST_TABLE_ID = FIRST_TABLE.ID
group by FIRST_TABLE.ID
I would also recommend using table aliases to shorten the query and make it more readable:
select t1.NAME, sum(t2.COST) TOTAL_COST
from FIRST_TABLE t1
left join SECOND_TABLE t2 on t2.FIRST_TABLE_ID = t1.ID
group by t1.ID
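To see the difference between the two join conditions on the question's sample data, here is a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for MySQL, so the DDL is adapted slightly (INTEGER PRIMARY KEY instead of AUTO_INCREMENT); the variable names `broken` and `fixed` are only for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FIRST_TABLE (id INTEGER PRIMARY KEY, name VARCHAR(100));
CREATE TABLE SECOND_TABLE (
    id INTEGER PRIMARY KEY,
    FIRST_TABLE_ID INT NOT NULL REFERENCES FIRST_TABLE (id),
    cost DECIMAL(10,2)
);
INSERT INTO FIRST_TABLE (name) VALUES ('ONE'), ('TWO'), ('THREE');
INSERT INTO SECOND_TABLE (id, FIRST_TABLE_ID, cost) VALUES
    (1, 1, 500), (2, 2, 400), (3, 3, 150),
    (4, 1, 500), (5, 2, 400), (6, 3, 150);
""")

# Original condition: the unqualified FIRST_TABLE_ID resolves to
# SECOND_TABLE.FIRST_TABLE_ID, so both sides come from the same table.
broken = conn.execute("""
    SELECT t1.name, SUM(t2.cost)
    FROM FIRST_TABLE t1
    LEFT JOIN SECOND_TABLE t2 ON FIRST_TABLE_ID = t2.id
    GROUP BY t1.id
    ORDER BY t1.id
""").fetchall()

# Corrected condition: link the child row to its parent row.
fixed = conn.execute("""
    SELECT t1.name, SUM(t2.cost)
    FROM FIRST_TABLE t1
    LEFT JOIN SECOND_TABLE t2 ON t2.FIRST_TABLE_ID = t1.id
    GROUP BY t1.id
    ORDER BY t1.id
""").fetchall()

print(broken)
print(fixed)
```

With the sample rows, `broken` should show 1050 for every name, while `fixed` should show 1000 for ONE, 800 for TWO and 300 for THREE.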

Related

Getting sum from a left table of leftjoined table

Below are the tables and the SQL query. I am doing a left join and trying to get SUM of a column that's in the left table and count from the right table.
Is it possible to get both in 1 query?
https://www.db-fiddle.com/f/3QuxG1DLgWJ8aGXNbnnwU1/1
select
s.test,
count(distinct s.name),
sum(s.score) score, -- need accurate score
count(a.id) attempts -- need accurate attempt count
from question s
left join attempts a on s.id = a.score_id
group by s.test
create table question (
id int auto_increment primary key,
test varchar(25),
name varchar(25),
score int
);
create table attempts (
id int auto_increment primary key,
score_id int,
attempt_no int
);
insert into question (test, name, score) values
('test1','name1', 10),
('test1','name2', 15),
('test1','name3', 20),
('test1','name4', 25),
('test2','name1', 15),
('test2','name2', 25),
('test2','name3', 30),
('test2','name4', 20);
insert into attempts (score_id, attempt_no) values
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(2, 1),
(2, 2),
(2, 3),
(2, 4);
You need to pre-aggregate before the join:
select q.test, count(distinct q.name),
sum(q.score) score, -- need accurate score
sum(a.num_attempts) attempts -- need accurate attempt count
from question q left join
(select a.score_id, count(*) as num_attempts
from attempts a
group by a.score_id
) a
on q.id = a.score_id
group by q.test;
Here is a db-fiddle.
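The effect of pre-aggregating can be checked on the question's sample data. This is a rough sketch that assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the COALESCE around the attempt count is an addition of this sketch (not part of the answer) so that tests without attempts report 0 instead of NULL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id INTEGER PRIMARY KEY, test VARCHAR(25), name VARCHAR(25), score INT);
CREATE TABLE attempts (id INTEGER PRIMARY KEY, score_id INT, attempt_no INT);
INSERT INTO question (test, name, score) VALUES
    ('test1','name1',10), ('test1','name2',15), ('test1','name3',20), ('test1','name4',25),
    ('test2','name1',15), ('test2','name2',25), ('test2','name3',30), ('test2','name4',20);
INSERT INTO attempts (score_id, attempt_no) VALUES
    (1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4);
""")

# Naive join: each question row is repeated once per attempt,
# so its score is added to the sum several times.
naive = conn.execute("""
    SELECT q.test, SUM(q.score), COUNT(a.id)
    FROM question q
    LEFT JOIN attempts a ON q.id = a.score_id
    GROUP BY q.test
    ORDER BY q.test
""").fetchall()

# Pre-aggregated join: attempts are collapsed to one row per question first,
# so the join can no longer multiply question rows.
pre_aggregated = conn.execute("""
    SELECT q.test, SUM(q.score), COALESCE(SUM(a.num_attempts), 0)
    FROM question q
    LEFT JOIN (SELECT score_id, COUNT(*) AS num_attempts
               FROM attempts
               GROUP BY score_id) a
      ON q.id = a.score_id
    GROUP BY q.test
    ORDER BY q.test
""").fetchall()

print(naive)
print(pre_aggregated)
```

On this data the naive query inflates the test1 score to 145, while the pre-aggregated query reports 70; both report 8 attempts for test1.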
As Gordon said above, you can pre-aggregate, but his answer will unfortunately get you the incorrect number of attempts. This is due to an issue with how you're structuring your DB schema. It looks like your question table really records scores of attempts at questions, and your attempts table is unnecessary. You should really have a question table that simply contains an ID and a name for the question, and an attempts table that contains an attempt ID, question ID, name, and score.
create table question (
id int auto_increment primary key,
test varchar(25)
);
create table attempts (
id int auto_increment primary key,
question_id int,
name varchar(25),
score int
);
Then your query becomes as simple as:
select
q.id as question_id,
count(distinct a.name) as attempters,
sum(a.score) as total_score,
count(a.id) as total_attempts
from question q join attempts a on q.id = a.question_id
group by q.id

MySQL: filter child records, include all siblings

There are two MySQL tables:
tparent(id int, some data...)
tchild(id int, parent_id int, some data...)
I need to return all columns (parent plus all children) where at least one of the children matches some criteria.
My current solution:
-- prepare sample data
DROP TABLE IF EXISTS tparent;
DROP TABLE IF EXISTS tchild;
CREATE TABLE tparent (id int, c1 varchar(10), c2 date, c3 float);
CREATE TABLE tchild(id int, parent_id int, c4 float, c5 varchar(20), c6 date);
CREATE UNIQUE INDEX tparent_id_IDX USING BTREE ON tparent (id);
CREATE UNIQUE INDEX tchild_id_IDX USING BTREE ON tchild (id);
INSERT INTO tparent
VALUES
(1, 'a', '2021-01-01', 1.23)
, (2, 'b', '2021-02-01', 1.32)
, (3, 'c', '2021-01-03', 2.31);
INSERT INTO tchild
VALUES
(10, 1, 22.333, 'argh1', '2000-01-01')
, (20, 1, 33.222, 'argh2', '2000-01-02')
, (30, 1, 44.555, 'argh3', '2000-02-02')
, (40, 2, 33.222, 'argh4', '2000-03-02')
, (50, 3, 33.222, 'argh5', '2000-04-02')
, (60, 3, 33.222, 'argh6', '2000-05-02');
-- the query
WITH parent_filter AS
(
SELECT
parent_id
FROM
tchild
WHERE
c4>44
)
SELECT
p.*,
c.*
FROM
tparent p
JOIN tchild c ON p.id = c.parent_id
JOIN parent_filter pf ON p.id = pf.parent_id;
It returns 3 rows for parent id 1 and child ids 10, 20, 30, because child id 30 has a matching record. It does not return data for any other parent id.
However, I am querying tchild twice here (first in the CTE, then again in the main query). As both tables are relatively big (tens to hundreds of millions of rows, 2-5 child records per parent record on average), I am hitting performance / timing issues.
Is there a better way of achieving this filtering? I.e. without having to query tchild table more than once?
Did you try this version?
SELECT *
FROM tparent p
JOIN tchild c ON p.id = c.parent_id AND <criteria>
This way you limit the tchild table by the criteria before the actual join. Note that it returns only the child rows that match the criteria, not their siblings.
Perhaps you can use this instead:
select p.*, c.*
from tparent p
join tchild c
on p.id = c.parent_id
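where exists (select 1 from tchild c2 where c2.parent_id = p.id and <criteria>)
This should retrieve all rows of the parent and child join for every parent that has at least one child record meeting the criteria (the criteria apply to c2, and the subquery must be correlated to the parent through c2.parent_id = p.id).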
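A correlated EXISTS, where the subquery is tied to the outer parent row, can be tried against the question's sample data as follows. This is only a sketch: it assumes Python's sqlite3 module with an in-memory SQLite database instead of MySQL, uses the question's `c4 > 44` as the criteria, and says nothing about performance on large tables (an index on tchild.parent_id would probably matter, but that is not measured here).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tparent (id INT, c1 VARCHAR(10), c2 DATE, c3 FLOAT);
CREATE TABLE tchild (id INT, parent_id INT, c4 FLOAT, c5 VARCHAR(20), c6 DATE);
INSERT INTO tparent VALUES
    (1, 'a', '2021-01-01', 1.23),
    (2, 'b', '2021-02-01', 1.32),
    (3, 'c', '2021-01-03', 2.31);
INSERT INTO tchild VALUES
    (10, 1, 22.333, 'argh1', '2000-01-01'),
    (20, 1, 33.222, 'argh2', '2000-01-02'),
    (30, 1, 44.555, 'argh3', '2000-02-02'),
    (40, 2, 33.222, 'argh4', '2000-03-02'),
    (50, 3, 33.222, 'argh5', '2000-04-02'),
    (60, 3, 33.222, 'argh6', '2000-05-02');
""")

# Keep every parent/child pair whose parent has at least one child with c4 > 44.
# The subquery is correlated through c2.parent_id = p.id.
rows = conn.execute("""
    SELECT p.id, c.id
    FROM tparent p
    JOIN tchild c ON p.id = c.parent_id
    WHERE EXISTS (SELECT 1
                  FROM tchild c2
                  WHERE c2.parent_id = p.id AND c2.c4 > 44)
    ORDER BY c.id
""").fetchall()

print(rows)
```

Only child 30 satisfies the filter, yet all three children of parent 1 (ids 10, 20, 30) are returned, and no rows come back for parents 2 and 3.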

Delete all duplicate rows in mysql

I have MySQL data that was imported from a CSV file and contains many duplicate rows.
I picked out all the non-duplicates using DISTINCT.
Now I need to delete all duplicates using an SQL command.
Note: I don't need any of the duplicates; I only want to keep the non-duplicates.
Thanks.
For example, if the number 0123332546666 is repeated 11 times, I want to delete all of them.
MySQL table format:
ID, PhoneNumber
Just COUNT the number of duplicates (with GROUP BY) and filter by HAVING. Then supply the query result to DELETE statement:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
http://sqlfiddle.com/#!9/a012d21/1
complete fiddle:
schema:
CREATE TABLE Table1
(`ID` int, `PhoneNumber` int)
;
INSERT INTO Table1
(`ID`, `PhoneNumber`)
VALUES
(1, 888),
(2, 888),
(3, 888),
(4, 889),
(5, 889),
(6, 111),
(7, 222),
(8, 333),
(9, 444)
;
delete query:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
You could try a left join with a subquery that finds the min id for each PhoneNumber, and delete the rows that do not match:
delete m
from m_table m
left join (
select min(id) as id, PhoneNumber
from m_table
group by PhoneNumber
) t on t.id = m.id
where t.PhoneNumber is null
Otherwise, if you want to delete all the duplicates without keeping even a single row, you could use:
delete m
from m_table m
INNER join (
select PhoneNumber
from m_table
group by PhoneNumber
having count(*) > 1
) t on t.PhoneNumber= m.PhoneNumber
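The two behaviours discussed above (removing every duplicated number entirely, or keeping one row per number) can be compared on the fiddle's sample data. This sketch assumes Python's sqlite3 module and an in-memory SQLite database; SQLite has no multi-table `DELETE m FROM ... JOIN` form, so the statements are rewritten with IN / NOT IN subqueries. MySQL generally refuses to select from the table being deleted in a plain subquery, which is why the answers above wrap it in a derived table or a join. The helper `fresh_table` is hypothetical.

```python
import sqlite3

def fresh_table():
    """Hypothetical helper: build the fiddle's sample table in a new in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Table1 (ID INT, PhoneNumber INT);
    INSERT INTO Table1 (ID, PhoneNumber) VALUES
        (1, 888), (2, 888), (3, 888), (4, 889), (5, 889),
        (6, 111), (7, 222), (8, 333), (9, 444);
    """)
    return conn

# Variant A: delete every row whose PhoneNumber occurs more than once.
conn_a = fresh_table()
conn_a.execute("""
    DELETE FROM Table1
    WHERE PhoneNumber IN (SELECT PhoneNumber
                          FROM Table1
                          GROUP BY PhoneNumber
                          HAVING COUNT(*) > 1)
""")
remaining_a = [row[0] for row in conn_a.execute("SELECT ID FROM Table1 ORDER BY ID")]

# Variant B: keep the row with the smallest ID for each PhoneNumber.
conn_b = fresh_table()
conn_b.execute("""
    DELETE FROM Table1
    WHERE ID NOT IN (SELECT MIN(ID) FROM Table1 GROUP BY PhoneNumber)
""")
remaining_b = [row[0] for row in conn_b.execute("SELECT ID FROM Table1 ORDER BY ID")]

print(remaining_a)
print(remaining_b)
```

Variant A leaves only the rows whose number was never repeated (IDs 6 to 9); variant B additionally keeps IDs 1 and 4 as the single survivors of 888 and 889.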
Instead of deleting from the table, I would suggest creating a new one:
create table table2 as
select min(id) as id, phonenumber
from table1
group by phonenumber
having count(*) = 1;
Why? Deleting rows has a lot of overhead. If you are bringing the data in from an external source, then treat the first landing table as a staging table and the second as the final table.

Selecting parent records when child matches criteria

I am trying to limit returned results of users to results that are "recent" but where users have a parent, I also need to return the parent.
CREATE TABLE `users` (
`id` int(0) NOT NULL,
`parent_id` int(0) NULL,
`name` varchar(255) NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `times` (
`id` int(11) NOT NULL,
`time` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (1, NULL, 'Alan');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (2, 1, 'John');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (3, NULL, 'Jerry');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (4, NULL, 'Bill');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (5, 1, 'Carl');
INSERT INTO `times`(`id`, `time`) VALUES (2, '2019-01-01 14:40:38');
INSERT INTO `times`(`id`, `time`) VALUES (4, '2019-01-01 14:40:38');
http://sqlfiddle.com/#!9/91db19
In this case I would want to return Alan, John and Bill, but not Jerry because Jerry doesn't have a record in the times table, nor is he a parent of someone with a record. I am on the fence about what to do with Carl, I don't mind getting the results for him, but I don't need them.
I am filtering tens of thousands of users with hundreds of thousands of times records, so performance is important. In general I have about 3000 unique ids coming from times that could be either an id or a parent_id.
The above is a stripped-down example of what I am trying to do; the full one includes more joins and case statements. In general the above example should be what we work with, but here is a sample of the query I am using (the full query is nearly 100 lines):
SELECT id AS reference_id,
CASE WHEN (id != parent_id)
THEN
parent_id
ELSE null END AS parent_id,
parent_id AS family_id,
Rtrim(last_name) AS last_name,
Rtrim(first_name) AS first_name,
Rtrim(email) AS email,
missedappt AS appointment_missed,
appttotal AS appointment_total,
To_char(birth_date, 'YYYY-MM-DD 00:00:00') AS birthday,
To_char(first_visit_date, 'YYYY-MM-DD 00:00:00') AS first_visit,
billing_0_30
FROM users AS p
RIGHT JOIN(
SELECT p.id,
s.parentid,
Count(p.id) AS appttotal,
missedappt,
billing0to30 AS billing_0_30
FROM times AS p
JOIN (SELECT missedappt, parent_id, id
FROM users) AS s
ON p.id = s.id
LEFT JOIN (SELECT parent_id, billing0to30
FROM aging) AS aging
ON aging.parent_id = p.id
WHERE p.apptdate > To_char(Timestampadd(sql_tsi_year, -1, Now()), 'YYYY-MM-DD')
GROUP BY p.id,
s.parent_id,
missedappt,
billing0to30
) AS recent ON recent.patid = p.patient_id
This example is for a Faircom C-Tree database, but I also need to implement a similar solution in Sybase, MySQL, and Pervasive, so I am just trying to understand what I should do for best performance.
Essentially what I need to do is somehow get the RIGHT JOIN to also include the user's parent.
NOTES:
based on your fiddle config I'm assuming you're using MySQL 5.6 and thus don't have support for Common Table Expressions (CTE)
I'm assuming each name (child or parent) is to be presented as separate records in the final result set
We want to limit the number of times we have to join the times and users tables (a CTE would make this a bit easier to code/read).
The main query (times -> users(u1) -> users(u2)) will give us child and parent names in separate columns, so we'll use a 2-row dynamic table plus a case statement to pivot the columns into their own rows (NOTE: I don't work with MySQL and didn't have time to research if there's a pivot capability in MySQL 5.6)
-- we'll let 'distinct' filter out any duplicates (eg, 2 'children' have same 'parent')
select distinct
final.name
from
-- cartesian product of 'allnames' and 'pass' will give us
-- duplicate lines of id/parent_id/child_name/parent_name so
-- we'll use a 'case' statement to determine which name to display
(select case when pass.pass_no = 1
then allnames.child_name
else allnames.parent_name
end as name
from
-- times join users left join users; gives us pairs of
-- child_name/parent_name or child_name/NULL
(select u1.id,u1.parent_id,u1.name as child_name,u2.name as parent_name
from times t
join users u1
on u1.id = t.id
left
join users u2
on u2.id = u1.parent_id) allnames
join
-- poor man's pivot code:
-- 2-row dynamic table; no join clause w/ allnames will give us a
-- cartesian product; the 'case' statement will determine which
-- name (child vs parent) to display
(select 1 as pass_no
union
select 2) pass
) final
-- eliminate 'NULL' as a name in our final result set
where final.name is not NULL
order by 1
Result set:
name
==============
Alan
Bill
John
MySQL fiddle
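The two-row pivot described above can be exercised on the question's sample data. The following is a condensed sketch of the same idea, assuming Python's sqlite3 module and an in-memory SQLite database rather than MySQL 5.6; the column types are simplified and the join to the two-row table is written as an explicit CROSS JOIN.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, parent_id INT, name VARCHAR(255));
CREATE TABLE times (id INTEGER PRIMARY KEY, time DATETIME);
INSERT INTO users (id, parent_id, name) VALUES
    (1, NULL, 'Alan'), (2, 1, 'John'), (3, NULL, 'Jerry'),
    (4, NULL, 'Bill'), (5, 1, 'Carl');
INSERT INTO times (id, time) VALUES
    (2, '2019-01-01 14:40:38'), (4, '2019-01-01 14:40:38');
""")

# times -> users (child) -> users (parent), then a 2-row table 'pass' turns the
# child_name / parent_name columns into separate rows; NULL parents are dropped.
names = [row[0] for row in conn.execute("""
    SELECT DISTINCT final.name
    FROM (SELECT CASE WHEN pass.pass_no = 1
                      THEN u1.name
                      ELSE u2.name
                 END AS name
          FROM times t
          JOIN users u1 ON u1.id = t.id
          LEFT JOIN users u2 ON u2.id = u1.parent_id
          CROSS JOIN (SELECT 1 AS pass_no UNION ALL SELECT 2) pass
         ) final
    WHERE final.name IS NOT NULL
    ORDER BY final.name
""")]

print(names)
```

John and Bill come from the first pass (users with a times record) and Alan from the second pass (John's parent); Jerry and Carl do not appear.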

Left join with condition

Suppose I have these tables
create table bug (
id int primary key,
name varchar(20)
)
create table blocking (
pk int primary key,
id int,
name varchar(20)
)
insert into bug values (1, 'bad name')
insert into bug values (2, 'bad condition')
insert into bug values (3, 'about box')
insert into blocking values (0, 1, 'qa bug')
insert into blocking values (1, 1, 'doc bug')
insert into blocking values (2, 2, 'doc bug')
and I'd like to join the tables on id columns and the result should be like this:
id          name                 blockingName
----------- -------------------- --------------------
1           bad name             qa bug
2           bad condition        NULL
3           about box            NULL
This means:
I'd like to return all rows from #bug
there should be only 'qa bug' value in column 'blockingName' or NULL (if no matching row in #blocking was found)
My naive select was like this:
select * from #bug t1
left join #blocking t2 on t1.id = t2.id
where t2.name is null or t2.name = 'qa bug'
but this does not work, because it seems that the condition is first applied to #blocking table and then it is joined.
What is the simplest/typical solution for this problem? (I have a solution with nested select, but I hope there is something better)
Simply put the "qa bug" criteria in the join:
select t1.*, t2.name from #bug t1
left join #blocking t2 on t1.id = t2.id AND t2.name = 'qa bug'
The correct select is:
create table bug (
id int primary key,
name varchar(20)
)
insert into bug values (1, 'bad name')
insert into bug values (2, 'bad condition')
insert into bug values (3, 'about box')
CREATE TABLE blocking
(
pk int IDENTITY(1,1) PRIMARY KEY,
id int,
name varchar(20)
)
insert into blocking values (1, 'qa bug')
insert into blocking values (1, 'doc bug')
insert into blocking values (2, 'doc bug')
select
t1.id, t1.name,
(select b.name from blocking b where b.id=t1.id and b.name='qa bug')
from bug t1
It looks like you want to select only one row from #blocking and join that to #bug. I would do:
select t1.id, t1.name, t2.name as `blockingName`
from `#bug` t1
left join (select * from `#blocking` where name = "qa bug") t2
on t1.id = t2.id
select *
from #bug t1
left join #blocking t2 on t1.id = t2.id and t2.name = 'qa bug'
Make sure the inner query only returns one row.
You may have to add a top 1 to it if it returns more than one.
select
t1.id, t1.name,
(select b.name from #blocking b where b.id=t1.id and b.name='qa bug')
from #bug t1
Here's a demo: http://sqlfiddle.com/#!2/414e6/1
select
bug.id,
bug.name,
blocking.name as blockingType
from
bug
left outer join blocking on
bug.id = blocking.id AND
blocking.name = 'qa bug'
order by
bug.id
By adding the "blocking.name" condition to the left outer join, rather than to the where clause, you indicate that it should also be considered "outer", or optional. When part of the where clause, it is considered required (which is why the null values were being filtered out).
BTW - sqlfiddle.com is my site.
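The difference between filtering in the where clause and filtering in the join condition can be reproduced with the question's rows. This is a small sketch assuming Python's sqlite3 module and an in-memory SQLite database (the question itself uses SQL Server temp tables, so the `#` prefixes are dropped here); `where_version` and `on_version` are illustrative names.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the original server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bug (id INT PRIMARY KEY, name VARCHAR(20));
CREATE TABLE blocking (pk INT PRIMARY KEY, id INT, name VARCHAR(20));
INSERT INTO bug VALUES (1, 'bad name'), (2, 'bad condition'), (3, 'about box');
INSERT INTO blocking VALUES (0, 1, 'qa bug'), (1, 1, 'doc bug'), (2, 2, 'doc bug');
""")

# Filter in WHERE: applied after the join, so bug 2 (whose only match is
# 'doc bug') is removed from the result altogether.
where_version = conn.execute("""
    SELECT t1.id, t2.name
    FROM bug t1
    LEFT JOIN blocking t2 ON t1.id = t2.id
    WHERE t2.name IS NULL OR t2.name = 'qa bug'
    ORDER BY t1.id
""").fetchall()

# Filter in ON: only decides which blocking rows may match; every bug row is
# kept, with NULL when no 'qa bug' row exists for it.
on_version = conn.execute("""
    SELECT t1.id, t2.name
    FROM bug t1
    LEFT JOIN blocking t2 ON t1.id = t2.id AND t2.name = 'qa bug'
    ORDER BY t1.id
""").fetchall()

print(where_version)
print(on_version)
```

The where version loses bug 2, while the on version returns all three bugs with 'qa bug' for bug 1 and NULL (None in Python) for bugs 2 and 3.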