SUM data based on a Group By statement for another table - mysql

I am trying to create a query that allows me to get the sum of a total stored in one table based on values in another table.
Specifically, I have one table called 'winning_bids', that I want to join with another table, called 'objects'. 'winning_bids' contains a User ID, and an Object ID (primary key of 'objects' table). The 'objects' table contains an Object ID, and the value of the object. I want to sum the value from the 'objects' table for each user, grouped by the User ID from the 'winning_bids' table.
I tried something like this, but it does not work:
SELECT SUM(o.value) AS total, w.uid
FROM winning_bids w
LEFT JOIN objects o ON (o.id = w.oid)
GROUP BY w.uid
This statement merely returns all of the User IDs, but with the total for only the first User ID in each row.
Any help would be appreciated, thanks.

It works fine for me.
Here is what I did to test your query:
CREATE TABLE winning_bids (uid INT NOT NULL, oid INT NOT NULL);
INSERT INTO winning_bids (uid, oid) VALUES
(1, 1),
(1, 2),
(2, 3);
CREATE TABLE objects (id INT NOT NULL, value INT NOT NULL);
INSERT INTO objects (id, value) VALUES
(1, 1),
(2, 20),
(3, 300);
SELECT SUM(o.value) AS total, w.uid
FROM winning_bids w
LEFT JOIN objects o ON (o.id = w.oid)
GROUP BY w.uid;
Result:
total | uid
21    | 1
300   | 2
If you still think it doesn't work, can you please post example input data that gives the wrong result when you run your query, and also specify what you believe the correct result should be?
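The same test can be reproduced in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite, not MySQL, runs the query here); the tables and rows are copied from the test above, and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3


def total_per_user():
    """Rebuild the answer's test tables in SQLite and run the same GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE winning_bids (uid INT NOT NULL, oid INT NOT NULL);
        INSERT INTO winning_bids (uid, oid) VALUES (1, 1), (1, 2), (2, 3);
        CREATE TABLE objects (id INT NOT NULL, value INT NOT NULL);
        INSERT INTO objects (id, value) VALUES (1, 1), (2, 20), (3, 300);
    """)
    rows = conn.execute("""
        SELECT SUM(o.value) AS total, w.uid
        FROM winning_bids w
        LEFT JOIN objects o ON (o.id = w.oid)
        GROUP BY w.uid
        ORDER BY w.uid  -- added only to make the row order deterministic
    """).fetchall()
    conn.close()
    return rows


print(total_per_user())  # [(21, 1), (300, 2)]
```

Each user gets their own total (1 + 20 for user 1, 300 for user 2), which matches the result shown above.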

Related

SQL: create global alias for nested SELECT to be used in another nested SELECT

Let's assume I have a database with two tables: people, which contains each person's id and birth year, and parents, which contains (parent_id, child_id) pairs representing the parent-child relationships between people. To keep the explanation simple, let's assume each person has either 0 children or 1 child. Here is an example of the data in the database (as a set of SQL statements to create it in MySQL):
CREATE TABLE people (
id INTEGER NOT NULL AUTO_INCREMENT,
birth_year INTEGER NOT NULL,
CONSTRAINT PRIMARY KEY(id)
);
CREATE TABLE parents (
parent_id INTEGER NOT NULL,
child_id INTEGER NOT NULL,
CONSTRAINT PRIMARY KEY(parent_id,child_id)
);
-- Not creating FOREIGN KEYS, because it's just an example
INSERT INTO people (id, birth_year) VALUES (1, 1937);
INSERT INTO people (id, birth_year) VALUES (2, 1943);
INSERT INTO people (id, birth_year) VALUES (3, 1974);
INSERT INTO people (id, birth_year) VALUES (4, 2001);
INSERT INTO people (id, birth_year) VALUES (5, 2020);
INSERT INTO parents (parent_id, child_id) VALUES (1, 4);
INSERT INTO parents (parent_id, child_id) VALUES (3, 5);
Now I want to write a query that retrieves the id of the person who was youngest when their child was born (for example, if I was born in 1234 and my child was born in 1300, I was 1300 - 1234 = 66 when my child was born, and I want to find the person who had their child at an earlier age than everyone else).
I have written several queries for this, but each of them either failed, produced duplicates, or both. The one I like most is:
SELECT id AS pid, -- Parent id
(SELECT child_id FROM parents WHERE parent_id=pid) AS cid -- Child id
FROM people WHERE
EXISTS(SELECT cid) -- Only selecting parents who have children (not sure this part is correct)
ORDER BY (birth_year - (SELECT birth_year FROM people WHERE id=cid)) ASC -- Order by age when they got their child
LIMIT 1;
But this one fails in MySQL with the error:
ERROR 1054 (42S22) at line 24: Unknown column 'cid' in 'field list'
How do I fix the error? Another concern is that the result includes not only the parent's id but also the id of one of their children. Is it possible to avoid that?
Is there perhaps a better way to select the data I'm looking for?
You can get the ages by using joins:
-- parent's age at the child's birth = child's birth year - parent's birth year
select p.id as parent_id, (c.birth_year - p.birth_year) as parent_age
from parents pa join
people p
on pa.parent_id = p.id join
people c
on pa.child_id = c.id;
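To check the join against the question's sample data, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module (an assumption: SQLite stands in for MySQL; the helper names make_family_db and parent_ages are hypothetical). It uses the corrected alias (c.id) and subtracts the parent's birth year from the child's, so ages come out positive.

```python
import sqlite3


def make_family_db():
    """In-memory copy of the question's people/parents data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE people (id INTEGER PRIMARY KEY, birth_year INTEGER NOT NULL);
        CREATE TABLE parents (parent_id INTEGER NOT NULL, child_id INTEGER NOT NULL,
                              PRIMARY KEY (parent_id, child_id));
        INSERT INTO people (id, birth_year) VALUES
            (1, 1937), (2, 1943), (3, 1974), (4, 2001), (5, 2020);
        INSERT INTO parents (parent_id, child_id) VALUES (1, 4), (3, 5);
    """)
    return conn


def parent_ages(conn):
    # Parent's age at the child's birth = child's birth year - parent's birth year.
    return conn.execute("""
        SELECT p.id AS parent_id, c.birth_year - p.birth_year AS parent_age
        FROM parents pa
        JOIN people p ON pa.parent_id = p.id
        JOIN people c ON pa.child_id = c.id
        ORDER BY p.id
    """).fetchall()


print(parent_ages(make_family_db()))  # [(1, 64), (3, 46)]
```

Only people who appear as a parent_id are returned, so no EXISTS check is needed, and the child's id is not part of the output.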
To get all the rows with the minimum, use window functions (available in MySQL 8.0 and later):
select x.*
from (select p.id as parent_id, (c.birth_year - p.birth_year) as parent_age,
min(c.birth_year - p.birth_year) over () as min_age
from parents pa join
people p
on pa.parent_id = p.id join
people c
on pa.child_id = c.id
) x
where parent_age = min_age;
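The window-function version can be sketched the same way. This assumes SQLite 3.25 or later (the first SQLite release with window functions) as a stand-in for MySQL 8.0; youngest_parents is a hypothetical helper name. MIN(...) OVER () attaches the overall minimum age to every row, so ties would all be kept.

```python
import sqlite3


def make_family_db():
    """In-memory copy of the question's people/parents data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE people (id INTEGER PRIMARY KEY, birth_year INTEGER NOT NULL);
        CREATE TABLE parents (parent_id INTEGER NOT NULL, child_id INTEGER NOT NULL,
                              PRIMARY KEY (parent_id, child_id));
        INSERT INTO people (id, birth_year) VALUES
            (1, 1937), (2, 1943), (3, 1974), (4, 2001), (5, 2020);
        INSERT INTO parents (parent_id, child_id) VALUES (1, 4), (3, 5);
    """)
    return conn


def youngest_parents(conn):
    # Keep every parent whose age at the child's birth equals the overall minimum.
    return conn.execute("""
        SELECT parent_id, parent_age
        FROM (SELECT p.id AS parent_id,
                     c.birth_year - p.birth_year AS parent_age,
                     MIN(c.birth_year - p.birth_year) OVER () AS min_age
              FROM parents pa
              JOIN people p ON pa.parent_id = p.id
              JOIN people c ON pa.child_id = c.id) AS x
        WHERE parent_age = min_age
        ORDER BY parent_id
    """).fetchall()


print(youngest_parents(make_family_db()))  # [(3, 46)]
```

Person 3 (born 1974, child born 2020) was 46, younger than person 1 (64), so only person 3 is returned.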

SQL Server: how to insert-into-select-from, with a scalar function in the select clause

I am having trouble inserting data from a SELECT statement that calls a scalar function. I have posted a sample script below, with a short explanation and my question in the comments.
--target table (in my case this is a regular table, not a table variable; I use a table variable here only to keep the example simple)
declare @tblItems table (
Id int identity(1,1)
,ItemID int
,TranNo varchar(20)
,Qty decimal(18,3)
,SomeCalculatedValue decimal(18,3)
)
--a dummy table that acts as the source table
declare @tblTemp table (
Id int identity(1,1)
,ItemID int
,TranNo varchar(20)
,Qty decimal(18,3)
,SomeCalculatedValue decimal(18,3)
)
--put some dummy data in the target table
insert into @tblItems(ItemID, TranNo, Qty, SomeCalculatedValue)
values
(1, 'GRN-001', 10, 0),
(2, 'GRN-002', 20, 0),
(3, 'GRN-003', 15, 0),
(4, 'GRN-004', 32, 0),
(5, 'GRN-005', 18, 0)
;
--insert 3 new rows into the source table; later I want to insert them into the target table
insert into @tblTemp(ItemID, TranNo, Qty, SomeCalculatedValue)
values
(1, 'GRN-006', 6, 0), -- this row works fine,
(1, 'GRN-007', 3, 0), -- but this row has a problem, because the calculation does not consider the previous row (TranNo='GRN-006')
(2, 'GRN-008', 8, 0)
;
--here is the actual work: I need to copy the data from the source table into the target table,
--and the key requirement is the column 'SomeCalculatedValue'.
--It should call a scalar function, and within that function I perform some calculations based on the same target table:
--for each ItemID passed, the scalar function performs calculations on the existing rows
--for that ItemID. To simplify, you can think of it as a running total (it is not actually a running total, but the concept
--is the same: each row's value is based on the previous row's value)
insert into @tblItems(ItemID, TranNo, Qty, SomeCalculatedValue)
select
ItemID
,TranNo
,Qty
,[dbo].[GetCalculatedValue] (ItemID, Qty) as SomeCalculatedValue -- this function performs some calculations
from @tblTemp
select * from @tblItems
I have two tables, @tblItems and @tblTemp. I need to insert rows from @tblTemp into @tblItems, but in the SELECT over @tblTemp I call a scalar function, say GetCalculatedValue(ItemID, Qty), which performs some calculations for the given ItemID using the rows in the target table, and returns the value to insert into @tblItems for each row. It is not really a running total, but for the sake of understanding you can think of it as one, because each row's value depends on the previous rows.
So the problem is that when @tblTemp has more than one row for a particular ItemID, the calculation should take the rows already inserted into account. But I think this INSERT INTO ... SELECT statement inserts all rows at once, so it does not see the earlier rows for that ItemID that come from the same SELECT. You can review the code; I have added comments to explain.
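The behaviour described above can be sketched outside SQL Server. This minimal example uses Python's sqlite3 module as a stand-in and assumes, as the question suggests, that GetCalculatedValue behaves like a per-ItemID running total of Qty (the real function is not shown, so this is only an approximation). snapshot_values mimics a per-row calculation that only reads rows already in the target table, which is why GRN-007 misses GRN-006. insert_with_running_values is one set-based alternative, a swapped-in technique rather than the asker's function: it computes every value with a window SUM() over the existing and new rows before inserting anything. Window functions need SQLite 3.25 or later; the helper names are hypothetical.

```python
import sqlite3


def make_db():
    """SQLite stand-ins for @tblItems (target) and @tblTemp (source)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblItems (Id INTEGER PRIMARY KEY AUTOINCREMENT, ItemID INT,
                               TranNo TEXT, Qty REAL, SomeCalculatedValue REAL);
        CREATE TABLE tblTemp (Id INTEGER PRIMARY KEY AUTOINCREMENT, ItemID INT,
                              TranNo TEXT, Qty REAL, SomeCalculatedValue REAL);
        INSERT INTO tblItems (ItemID, TranNo, Qty, SomeCalculatedValue) VALUES
            (1, 'GRN-001', 10, 0), (2, 'GRN-002', 20, 0), (3, 'GRN-003', 15, 0),
            (4, 'GRN-004', 32, 0), (5, 'GRN-005', 18, 0);
        INSERT INTO tblTemp (ItemID, TranNo, Qty, SomeCalculatedValue) VALUES
            (1, 'GRN-006', 6, 0), (1, 'GRN-007', 3, 0), (2, 'GRN-008', 8, 0);
    """)
    return conn


def snapshot_values(conn):
    # Each source row only sees rows already in tblItems, like the per-row function
    # call described in the question: GRN-007 does not include GRN-006.
    return conn.execute("""
        SELECT t.TranNo,
               t.Qty + (SELECT COALESCE(SUM(i.Qty), 0) FROM tblItems i WHERE i.ItemID = t.ItemID)
        FROM tblTemp t
        ORDER BY t.Id
    """).fetchall()


def insert_with_running_values(conn):
    # Running total per ItemID over the existing rows (src = 0) followed by the new rows (src = 1).
    rows = conn.execute("""
        SELECT ItemID, TranNo, Qty, running
        FROM (SELECT src, Id, ItemID, TranNo, Qty,
                     SUM(Qty) OVER (PARTITION BY ItemID ORDER BY src, Id) AS running
              FROM (SELECT 0 AS src, Id, ItemID, TranNo, Qty FROM tblItems
                    UNION ALL
                    SELECT 1 AS src, Id, ItemID, TranNo, Qty FROM tblTemp) AS combined
             ) AS numbered
        WHERE src = 1
        ORDER BY Id
    """).fetchall()
    # All values are computed before anything is inserted.
    conn.executemany(
        "INSERT INTO tblItems (ItemID, TranNo, Qty, SomeCalculatedValue) VALUES (?, ?, ?, ?)",
        rows,
    )
    # The five original rows have Id 1..5, so Id > 5 selects the newly inserted ones.
    return conn.execute(
        "SELECT TranNo, SomeCalculatedValue FROM tblItems WHERE Id > 5 ORDER BY Id"
    ).fetchall()


print(snapshot_values(make_db()))
print(insert_with_running_values(make_db()))
```

With the sample data, the snapshot approach gives GRN-007 a value of 13 (3 + 10), while the window version gives 19 (3 + 6 + 10), because it orders the new rows after the existing ones within each ItemID.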

Remove duplicate in SSIS package with preference over a column data

I have duplicate rows in data coming from an Excel sheet. In the SSIS package, I am using a Sort transformation that sorts in ascending order by the primary key column, ID. But before removing the duplicates, I want to check whether the email column contains an address in my company's domain. If so, I want to keep that row and remove the others. What should I do? Please refer to the image attached below.
In the data above, I want to remove John's two rows whose email address is john@gmail.com. In Maria's case, I want to remove the two rows with the address maria@gmail.com, keeping the row whose email address is in the mycompany.com domain. If a user has multiple rows with mycompany.com addresses, I want to keep any one of them.
Any suggestions?
You can do this in SQL, as Kobi showed, which may be easier. But if you prefer to do it in SSIS, these are the main steps of the data flow:
Conditional split: first, separate the rows with a mycompany address from those without.
Sort and non_mycompany sort: sort both outputs on id and remove duplicates.
mycompany_multicast: create two copies of the mycompany rows.
Merge join: left join the rows without mycompany to the rows with mycompany. Note the join order: the purpose is to find rows without mycompany that have no matching id among the mycompany rows.
Conditional split1: keep the rows without mycompany that have no matching id among the mycompany rows. Check the id coming from the mycompany side: if it is NULL, the row has no match.
Union all: union the two streams to get the final result.
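The data flow above can be sketched in plain Python to see what each step does to the rows. This is an illustration of the logic only, not SSIS code; the sample rows are borrowed from the test data in the SQL answer (with @ addresses), and first_per_id and dedupe_prefer_domain are hypothetical helper names.

```python
# Sample rows (id, name, email) copied from the SQL answer's test data.
ROWS = [
    (100, "john", "john@mycompany.com"), (100, "john", "john@gmail.com"),
    (100, "john", "john@gmail.com"),
    (200, "maria", "maria@mycompany.com"), (200, "maria", "maria@gmail.com"),
    (200, "maria", "maria@gmail.com"),
    (300, "jean", "jean@mycompany.com"), (300, "jean", "jean@gmail.com"),
    (300, "jean", "jean@mycompany.com"), (300, "jean", "jean@mycompany.com"),
    (400, "tom", "tom@gmail.com"), (400, "tom", "tom@gmail.com"),
]


def first_per_id(rows):
    """Sort on id and keep one row per id (the Sort transformation's duplicate removal)."""
    kept = {}
    for row in sorted(rows, key=lambda r: r[0]):
        kept.setdefault(row[0], row)
    return list(kept.values())


def dedupe_prefer_domain(rows, domain="@mycompany.com"):
    # Conditional split: company addresses vs. everything else, each deduplicated on id.
    company = first_per_id([r for r in rows if r[2].endswith(domain)])
    others = first_per_id([r for r in rows if not r[2].endswith(domain)])
    # Merge join (left join others -> company) + Conditional split1:
    # keep only non-company rows whose id has no company row.
    company_ids = {r[0] for r in company}
    unmatched = [r for r in others if r[0] not in company_ids]
    # Union all, sorted on id for readability.
    return sorted(company + unmatched, key=lambda r: r[0])


for row in dedupe_prefer_domain(ROWS):
    print(row)
```

John, Maria and Jean each keep one mycompany.com row, and Tom, who has no company address, keeps a single gmail.com row.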
You can use a statement like this:
WITH T AS
(
-- within each id, rows with a @mycompany.com address are numbered first
SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY CASE WHEN email LIKE '%@mycompany.com' THEN 0 ELSE 1 END) AS rn
FROM persons
)
DELETE FROM T
WHERE rn > 1;
It groups the rows by ID, orders each group so that the preferred @mycompany.com address comes first, numbers the rows within each group, and finally deletes every row whose row number is greater than 1 (these are the duplicates).
Here is the data to test:
CREATE TABLE Persons (
id NUMERIC(5),
NAME VARCHAR(200),
email VARCHAR(400) );
INSERT INTO persons
VALUES (100, 'john', 'john@mycompany.com'),
(100, 'john', 'john@gmail.com'),
(100, 'john', 'john@gmail.com');
INSERT INTO persons
VALUES (200, 'maria', 'maria@mycompany.com'),
(200, 'maria', 'maria@gmail.com'),
(200, 'maria', 'maria@gmail.com');
INSERT INTO persons
VALUES (300, 'jean', 'jean@mycompany.com'),
(300, 'jean', 'jean@gmail.com'),
(300, 'jean', 'jean@mycompany.com'),
(300, 'jean', 'jean@mycompany.com');
INSERT INTO persons
VALUES (400, 'tom', 'tom@gmail.com'),
(400, 'tom', 'tom@gmail.com');
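The same ROW_NUMBER idea can be tried on this data without SQL Server. This minimal sketch uses Python's sqlite3 module (SQLite 3.25+ assumed for window functions). SQLite cannot DELETE through a CTE the way SQL Server can, so the sketch numbers the rows in a subquery and deletes by rowid instead; dedupe_persons is a hypothetical helper name.

```python
import sqlite3

# Test rows from the answer above.
PERSONS = [
    (100, "john", "john@mycompany.com"), (100, "john", "john@gmail.com"),
    (100, "john", "john@gmail.com"),
    (200, "maria", "maria@mycompany.com"), (200, "maria", "maria@gmail.com"),
    (200, "maria", "maria@gmail.com"),
    (300, "jean", "jean@mycompany.com"), (300, "jean", "jean@gmail.com"),
    (300, "jean", "jean@mycompany.com"), (300, "jean", "jean@mycompany.com"),
    (400, "tom", "tom@gmail.com"), (400, "tom", "tom@gmail.com"),
]


def dedupe_persons(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (id NUMERIC(5), name VARCHAR(200), email VARCHAR(400))")
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?)", rows)
    # Number rows per id with company addresses first, then delete every rn > 1 by rowid.
    conn.execute("""
        DELETE FROM persons
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY id
                           ORDER BY CASE WHEN email LIKE '%@mycompany.com' THEN 0 ELSE 1 END
                       ) AS rn
                FROM persons
            ) AS numbered
            WHERE rn > 1
        )
    """)
    result = conn.execute("SELECT id, name, email FROM persons ORDER BY id").fetchall()
    conn.close()
    return result


for row in dedupe_persons(PERSONS):
    print(row)
```

One row remains per id: the mycompany.com row where one exists, otherwise one of the duplicates (Tom's gmail.com row).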

multiple values insert join in mysql

I have a query that inserts multiple rows into table a.
For example:
insert into a (id,name) values (5,'john'),(6,'smith');
Though I also need to select some third value from other table with this id.
For example:
insert into a (id,name,money) values (5,'john',(select money from b where id=5)),(6,'smith',(select money from b where id=6));
The problem with the above is that it's a bit repetitive and also uses subselects.
I wonder whether it's possible to rewrite this using a JOIN (which would also ensure that a matching row exists in table b for the given id, rather than inserting a NULL).
Any ideas?
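An INSERT ... SELECT takes its rows from a single SELECT, so you need to rewrite this to select multiple rows rather than insert a list of VALUES. Could you create a temporary table containing the two rows and INSERT from it with a JOIN? Use an inner JOIN (not a LEFT JOIN) so that ids without a matching row in b are skipped instead of getting a NULL money value: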
CREATE TEMPORARY TABLE _tmp_a (id INT PRIMARY KEY, name VARCHAR(255));
INSERT INTO _tmp_a (id, name) VALUES (5, 'john'), (6, 'smith');
INSERT INTO a (id, name, money) SELECT _tmp_a.id, _tmp_a.name, b.money FROM _tmp_a JOIN b ON b.id=_tmp_a.id;
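Here is a minimal runnable sketch of the temporary-table approach, using Python's sqlite3 module as a stand-in for MySQL. The money amounts are hypothetical, and the extra row (7, 'doe') is a hypothetical addition with no match in b, included only to show that the inner JOIN skips it rather than inserting NULL.

```python
import sqlite3


def insert_with_join():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE b (id INTEGER PRIMARY KEY, money REAL);
        CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, money REAL);
        -- hypothetical amounts; id 7 deliberately has no row in b
        INSERT INTO b (id, money) VALUES (5, 100.0), (6, 250.0);
        CREATE TEMPORARY TABLE _tmp_a (id INT PRIMARY KEY, name VARCHAR(255));
        INSERT INTO _tmp_a (id, name) VALUES (5, 'john'), (6, 'smith'), (7, 'doe');
        -- inner JOIN: rows without a match in b are skipped instead of getting NULL money
        INSERT INTO a (id, name, money)
        SELECT t.id, t.name, b.money FROM _tmp_a t JOIN b ON b.id = t.id;
        DROP TABLE _tmp_a;
    """)
    rows = conn.execute("SELECT id, name, money FROM a ORDER BY id").fetchall()
    conn.close()
    return rows


print(insert_with_join())  # [(5, 'john', 100.0), (6, 'smith', 250.0)]
```

Swapping JOIN for LEFT JOIN would also insert (7, 'doe', None), which is the NULL case the question wants to avoid.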

How do I select an ID by its result set or get a better data model?

I'm having trouble with a query, and I figured I could either get help writing the query or find a better way to model my data. My tables are structured as follows:
Table A( a_id, a2, a3, a4) pk: a_id
Table B( b_id, a_id, b3) pk: b_id, a_id
Table B can have any number of rows for each b_id, but only one for each (b_id, a_id) pair. I want to treat the rows that share a b_id as a set, and check whether a given set already exists so that it is not duplicated. For example, say I had a tuple in table C
Table C( c_id, b_id ) pk:c_id
with a reference to a b_id of 1. If another tuple were inserted into C that would require inserting into table B the same set that b_id 1 already represents, I would want the new tuple to reference b_id 1 as well, instead of inserting a duplicate set into table B and using its new b_id.
edit:
See this sqlfiddle. Say I wanted to insert a new object which is represented by the following inserts:
INSERT INTO B VALUES (3, 1, 2);
INSERT INTO B VALUES (3, 2, 11);
INSERT INTO B VALUES (3, 3, 5);
INSERT INTO C VALUES (2,3);
How can I query the database (or restructure it) so that I can recognize that the sets in Table B represented by b_id 1 and b_id 3 are the same? I would then change my logic so that the object being inserted is represented by the single statement:
INSERT INTO C VALUES (2,1);
A real-world-like example:
Imagine a player in a game. Each player in the game is a tuple in Table C. Each player can wear any number of pieces of clothing (Table B). A piece of clothing is defined by the part of the body it covers (Table A) and its color (b3).
I want to find the player wearing a specific set of clothing. Let's say that player wears the same set again: I shouldn't have to add more data to table B; I should be able to say he wore it last game and just reference that set of clothing.
You need a programming language (e.g. PHP) to build the list of conditions in a loop:
SELECT B.b_id, BB.num, count(*)
FROM B, (
SELECT b_id, count(*) num
FROM B
WHERE (a_id=1 and b3=1)
OR (a_id=2 and b3=11)
OR (a_id=3 and b3=5)
OR (a_id=4 and b3=6)
-- use a programming language (e.g. PHP) to generate one OR condition per element of your set here
GROUP BY 1
) as BB
WHERE B.b_id = BB.b_id
GROUP BY 1,2
HAVING count(*) = 4 and count(*) = BB.num
-- the 4 must be replaced by the number of OR conditions generated above
The subquery finds each b_id together with the number of its rows that match the given pairs; joining back to B then checks that the b_id has no other rows, i.e. that the two sets are exactly the same.
This means you must supply exactly the same set of [a_id, b3] pairs to get the correct b_id: not a subset, not a superset, but an exact match.
In your example data, to return b_id=1 you need to provide 3 [a_id, b3] pairs; to return b_id=2 you need to provide 4.
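The "loop" the answer mentions can be sketched in Python instead of PHP: generate one parameterized condition per pair, then run the same exact-match query. This uses sqlite3 as a stand-in for MySQL; the rows in B are hypothetical (the sqlfiddle data is not reproduced here), with b_id 1 assumed to hold the same three pieces as the new object in the question and b_id 2 a four-piece set. find_matching_set is a hypothetical helper name.

```python
import sqlite3


def make_demo_b():
    """Hypothetical contents of table B (the original sqlfiddle data is not shown here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE B (b_id INTEGER, a_id INTEGER, b3 INTEGER, PRIMARY KEY (b_id, a_id))")
    conn.executemany("INSERT INTO B VALUES (?, ?, ?)", [
        (1, 1, 2), (1, 2, 11), (1, 3, 5),              # a three-piece set
        (2, 1, 1), (2, 2, 11), (2, 3, 5), (2, 4, 6),   # a four-piece set
    ])
    return conn


def find_matching_set(conn, pairs):
    """Return the b_ids whose rows in B are exactly the given (a_id, b3) pairs."""
    pairs = sorted(set(pairs))  # duplicate pairs would inflate the expected count
    if not pairs:
        return []
    # One "(a_id = ? AND b3 = ?)" per pair: the loop the answer does in PHP.
    conditions = " OR ".join("(a_id = ? AND b3 = ?)" for _ in pairs)
    sql = f"""
        SELECT B.b_id
        FROM B
        JOIN (SELECT b_id, COUNT(*) AS num
              FROM B
              WHERE {conditions}
              GROUP BY b_id) AS BB ON B.b_id = BB.b_id
        GROUP BY B.b_id, BB.num
        HAVING COUNT(*) = ? AND COUNT(*) = BB.num
        ORDER BY B.b_id
    """
    params = [value for pair in pairs for value in pair] + [len(pairs)]
    return [row[0] for row in conn.execute(sql, params)]


conn = make_demo_b()
print(find_matching_set(conn, [(1, 2), (2, 11), (3, 5)]))  # [1]
print(find_matching_set(conn, [(2, 11), (3, 5)]))          # [] (a subset is not a match)
```

The pairs (1, 2), (2, 11), (3, 5) correspond to the question's new inserts for b_id 3, so the query reports that b_id 1 already holds that set; a subset of a stored set returns nothing.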
If b_id, a_id do not constitute a unique identity for table B then you don't really have a pk.
Anyway, adding UNIQUE (a_id, b3) to your table B definition will prevent these duplicate inserts:
INSERT INTO B VALUES (3, 1, 2);
INSERT INTO B VALUES (3, 2, 8);
INSERT INTO B VALUES (3, 3, 10);
A foreign key constraint would then prevent:
INSERT INTO C VALUES (2,3);
But I don't think this approach is sufficient for what you are trying to do. For example, would the inserts into B above be OK if the existing records in B included another record, say (1, 5, 20)?
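To see what the UNIQUE (a_id, b3) constraint does and does not catch, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The existing rows for b_id 1 are hypothetical, and make_b_with_unique and try_insert are hypothetical helper names.

```python
import sqlite3


def make_b_with_unique():
    """Table B with the suggested UNIQUE (a_id, b3) added; rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE B (
            b_id INTEGER NOT NULL,
            a_id INTEGER NOT NULL,
            b3   INTEGER NOT NULL,
            PRIMARY KEY (b_id, a_id),
            UNIQUE (a_id, b3)
        )
    """)
    conn.executemany("INSERT INTO B VALUES (?, ?, ?)", [(1, 1, 2), (1, 2, 11), (1, 3, 5)])
    return conn


def try_insert(conn, row):
    """Return True if the row was inserted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO B VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_b_with_unique()
print(try_insert(conn, (3, 1, 2)))  # False: (a_id=1, b3=2) already belongs to b_id 1
print(try_insert(conn, (4, 1, 7)))  # True: a new (a_id, b3) pair
```

The first call also illustrates the limitation raised above: the constraint rejects any piece of clothing already used in another set, even when the new set as a whole is different, so it cannot by itself detect whole-set duplicates.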