I have a table that contains random data against a key with duplicate entries. I'm looking to remove the duplicates (a projection as it is called in relational algebra), but rather than discarding the attached data, sum it together. For example:
orderID cost
1 5
1 2
1 10
2 3
2 3
3 15
Should remove duplicates from orderID whilst summing each orderID's values:
orderID cost
1 17 (5 + 2 + 10)
2 6
3 15
My assumption is I'd use SELECT DISTINCT somehow, but I don't know how I'd go about doing so. I understand GROUP BY might be able to do something but I am unsure.
This is a very basic aggregation:
SELECT orderId, SUM(cost) AS cost
FROM MyTable
GROUP BY orderId
This says, for each "orderId" grouping, sum the "cost" field and return one value per group.
You can use the group by clause to get one row per distinct value of the column(s) you're grouping by - orderId in this case. You can then apply an aggregate function to the columns you aren't grouping by - sum, in this case:
SELECT orderId, SUM(cost)
FROM mytable
GROUP BY orderId
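To try the aggregation against the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite and an in-memory database is my assumption; the table and column names follow the answers above.

```python
import sqlite3

# In-memory SQLite database; MyTable mirrors the sample data in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (orderId INTEGER, cost INTEGER)")
conn.executemany(
    "INSERT INTO MyTable (orderId, cost) VALUES (?, ?)",
    [(1, 5), (1, 2), (1, 10), (2, 3), (2, 3), (3, 15)],
)

# One output row per distinct orderId, with that group's costs summed.
order_totals = conn.execute(
    "SELECT orderId, SUM(cost) AS cost "
    "FROM MyTable GROUP BY orderId ORDER BY orderId"
).fetchall()

print(order_totals)  # [(1, 17), (2, 6), (3, 15)]
```

SELECT DISTINCT alone cannot do this, because it only removes duplicate rows and has no way to combine the cost values that belong to the same orderId.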
I have been practising SQL and came across behaviour I couldn't explain. (I am also the one who asked this question: Over() function does not cover all rows in the table, but that is a different problem.)
Suppose I have a table like this:
MovieRating table:
movie_id user_id rating created_at
1 1 3 2020-01-12
1 2 4 2020-02-11
1 3 2 2020-02-12
1 4 1 2020-01-01
2 1 5 2020-02-17
2 2 2 2020-02-01
2 3 2 2020-03-01
3 1 3 2020-02-22
3 2 4 2020-02-25
What I am trying to do is rank the movies by rating, for which I have this SQL query:
SELECT
movie_id,
rank() over(partition by movie_id order by avg(rating) desc) as rank_rate
FROM
MovieRating
From my previous question, I learnt that the over() function operates on the window selected by the query, which is basically the set of rows this query returns:
SELECT movie_id FROM MovieRating
So I would expect to see at least 3 rows here, for id 1, 2 and 3.
The result is however just one row:
{"headers": ["movie_id", "rank_rate"], "values": [[1, 1]]}
Why is that? Is something wrong with my understanding of how the over() function works?
You need an aggregation query and use RANK() window function on its results:
SELECT movie_id,
AVG(rating) AS average_rating, -- you may remove this line if you don't actually need the average rating
RANK() OVER (ORDER BY AVG(rating) DESC) AS rank_rate
FROM MovieRating
GROUP BY movie_id
ORDER BY rank_rate;
Your query is an aggregation query without a group by clause, which means that it operates on the whole table and not on each movie_id. Such queries return only 1 row with the result of the aggregation.
When you apply the RANK() window function, it operates on that single row and not on the table.
I think you mean to get one row for each movie, with its average rating.
You should use GROUP BY, not a window function:
SELECT movie_id, AVG(rating) AS avg_rating
FROM MovieRating
GROUP BY movie_id
ORDER BY avg_rating DESC;
https://www.db-fiddle.com/f/o9qLFbJEwhaHDWoTS9Qfwp/1
The reason you only got one row is that when you use an aggregate function like AVG(), that implicitly makes the query into an aggregating query. The result of the query is one row per group.
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html says:
If you use an aggregate function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
In other words, the whole table is considered one "group" if you use AVG() but don't specify a GROUP BY expression. Because the whole table is a single group, the result is one row.
Windows defined by windowing functions are not the same as groups defined by aggregate functions. The window functions are applied after the rows have been reduced by aggregation. Since there was only one group and therefore one row in your result, the rank was 1.
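Both effects can be reproduced with the question's data. The sketch below uses Python's sqlite3 module, which is my assumption (the thread is about MySQL), and window functions need SQLite 3.25 or later. The ranking is done in an outer query over the grouped rows, which is an equivalent way of writing the grouped RANK() query from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MovieRating "
    "(movie_id INTEGER, user_id INTEGER, rating INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO MovieRating VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, "2020-01-12"), (1, 2, 4, "2020-02-11"),
        (1, 3, 2, "2020-02-12"), (1, 4, 1, "2020-01-01"),
        (2, 1, 5, "2020-02-17"), (2, 2, 2, "2020-02-01"),
        (2, 3, 2, "2020-03-01"),
        (3, 1, 3, "2020-02-22"), (3, 2, 4, "2020-02-25"),
    ],
)

# An aggregate with no GROUP BY treats the whole table as a single group,
# so the result has exactly one row.
ungrouped = conn.execute(
    "SELECT COUNT(*), AVG(rating) FROM MovieRating"
).fetchall()

# Aggregate per movie first, then rank the rows that remain.
ranked = conn.execute(
    """
    SELECT movie_id,
           average_rating,
           RANK() OVER (ORDER BY average_rating DESC) AS rank_rate
    FROM (
        SELECT movie_id, AVG(rating) AS average_rating
        FROM MovieRating
        GROUP BY movie_id
    )
    ORDER BY rank_rate
    """
).fetchall()

print(len(ungrouped))  # 1
print(ranked)          # [(3, 3.5, 1), (2, 3.0, 2), (1, 2.5, 3)]
```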
I am still getting started learning Access.
I have 3 tables. Table one has Date as primary key and will have all dates. Tables 2 and 3 (Table 3 is mislabeled in the example image as a second Table 2) will both have 2 columns, Date and Amount. Tables 2 and 3 could have multiple rows with the same date (different amounts) and some may miss dates. I am looking for an output query that would have 1 row for every date in table 2 & 3 that has an amount (some dates may not have an amount in either table) and sums all those amounts for that date in 1 row. Below are example tables and the desired output query. Thanks so much for the newbie help!
I now have this code (Note that I have eliminated Table 1):
SELECT Table2.Dat, Sum(Table2.Amount) AS [Sum Of Amount], Sum(Table2.Tax) AS [Sum Of Tax]
FROM Table2
GROUP BY Table2.Dat
UNION ALL SELECT Table3.Dat, Sum(Table3.Amount) AS [Sum Of Amount], Sum(Table3.Tax) AS [Sum Of Tax]
FROM Table3
GROUP BY Table3.Dat;
This sums the amounts for the same date within each separate table, but does not combine matching dates across both tables. I imagine it needs another GROUP BY, but I have not been successful in forming it correctly.
Current Results from code above
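Try the query below. It combines the per-table sums with UNION ALL rather than UNION, because UNION removes duplicate rows and would drop one of the two rows whenever both tables have the same sum for the same date.
SELECT tt.mDate AS TransactionDate, Sum(tt.SumOfAmount) AS AmountTotal
FROM (SELECT Table2.tDate AS mDate, Sum(Table2.Amount) AS SumOfAmount
FROM Table2
GROUP BY tDate
UNION ALL
SELECT Table3.tDate AS mDate, Sum(Table3.Amount) AS SumOfAmount
FROM Table3
GROUP BY tDate) AS tt
GROUP BY tt.mDate;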
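A quick way to check the shape of this query is to run it outside Access. The sketch below uses Python's sqlite3 module with made-up rows; both the data and the use of SQLite are my assumptions, since the original example tables were only posted as images. It combines the per-table sums with UNION ALL before the outer GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: some dates repeat within a table,
# and some dates exist in only one of the two tables.
conn.executescript(
    """
    CREATE TABLE Table2 (tDate TEXT, Amount INTEGER);
    CREATE TABLE Table3 (tDate TEXT, Amount INTEGER);
    INSERT INTO Table2 VALUES ('2020-01-01', 10), ('2020-01-01', 5), ('2020-01-02', 7);
    INSERT INTO Table3 VALUES ('2020-01-01', 15), ('2020-01-03', 4);
    """
)

# Sum per date inside each table, stack the two results, then sum again per date.
date_totals = conn.execute(
    """
    SELECT tt.mDate AS TransactionDate, SUM(tt.SumOfAmount) AS AmountTotal
    FROM (SELECT tDate AS mDate, SUM(Amount) AS SumOfAmount
          FROM Table2
          GROUP BY tDate
          UNION ALL
          SELECT tDate AS mDate, SUM(Amount) AS SumOfAmount
          FROM Table3
          GROUP BY tDate) AS tt
    GROUP BY tt.mDate
    ORDER BY tt.mDate
    """
).fetchall()

print(date_totals)
# [('2020-01-01', 30), ('2020-01-02', 7), ('2020-01-03', 4)]
```

In this made-up data, 2020-01-01 sums to 15 in both tables, which is exactly the case where the two intermediate rows are identical and must both be kept.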
I have 2 columns holding the ids of the users participating in a transaction, source_id and destination_id. I'm building a function to count all transactions grouped by any user participating in them, either as source or as destination.
The problem is, when I do:
select count (*) from transactions group by source_id, destination_id
it first groups by source and then by destination, but I want to group them together. Is it possible using only SQL?
Sample Data
source_user_id destination_user_id
1 4
3 4
4 1
3 2
Desired result:
Id Count
4 - 3 (4 appears 3 times in any of the columns)
3 - 2 (3 appears 2 times in any of the columns)
1 - 2 (1 appears 2 times in any of the columns)
2 - 1 (2 appears 1 time in any of the columns)
As you can see on the example result, I want to know the number of times an id will appear in any of the 2 fields.
Use union all to get the id's into one column and get the counts.
select id,count(*)
from (select source_id as id from tbl
union all
select destination_id from tbl
) t
group by id
order by count(*) desc,id
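Running this against the question's four sample rows gives the desired counts. This is a sketch with Python's sqlite3 module; using SQLite, the table name tbl from the answer, and the cnt alias are assumptions on my part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (source_id INTEGER, destination_id INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 4), (3, 4), (4, 1), (3, 2)],
)

# Stack both columns into a single id column, then count per id.
# UNION ALL keeps every occurrence; plain UNION would collapse repeated ids.
id_counts = conn.execute(
    """
    SELECT id, COUNT(*) AS cnt
    FROM (SELECT source_id AS id FROM tbl
          UNION ALL
          SELECT destination_id FROM tbl) t
    GROUP BY id
    ORDER BY cnt DESC, id
    """
).fetchall()

print(id_counts)  # [(4, 3), (1, 2), (3, 2), (2, 1)]
```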
edited to add: Thank you for clarifying your question. The following isn't what you need.
Sounds like you want to use the concatenate function.
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat
GROUP BY CONCAT(source_id,"_",destination_id)
The underscore is intended to distinguish "source_id=1, destination_id=11" from "source_id=11, destination_id=1". (We want them to be 1_11 and 11_1 respectively.) If you expect these IDs to contain underscores, you'd have to handle this differently, but I assume they're integers.
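It may look like this. Each inner query counts the rows per id for one column, and the outer query adds the two counts together, so it needs sum rather than count, and union all rather than union (plain union would drop a row whenever an id has the same count in both columns).
select id, sum(total) as total from
(select source_id as id, count(destination_id) as total from transactions group by source_id
union all
select destination_id as id, count(source_id) as total from transactions group by destination_id) q group by id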
I just wanted to add different columns from different tables... Has anyone any idea on how to do that?
Consider I have 3 tables as below
tv sales
AC sales
cooler sales
And the tables data as follows
1)Tv Sales
Id Date NoOfSales Totalamount
1 03/05/2014 10 10000
2 04/05/2014 20 20000
3 05/05/2014 30 30000
2)Ac Sales
Id Date NoOfSales Totalamount
1 03/05/2014 10 50000
2 04/05/2014 20 60000
3 05/05/2014 30 70000
3)cooler Sales
Id Date NoOfSales Totalamount
1 03/05/2014 10 30000
2 04/05/2014 20 60000
3 05/05/2014 30 70000
Now I want to add the "Totalamount" from all the tables for a particular "date"
for example I need totalamount on 03/05/2014 as 90000
In MySQL, the easiest way to do this is with union all and aggregation:
select date, sum(totalamount) as TotalSales
from ((select date, totalamount from TvSales
) union all
(select date, totalamount from AcSales
) union all
(select date, totalamount from CoolerSales
)
) t
group by date;
The reason to use union all rather than a join is that the dates may differ between the tables, and a join can lose rows for dates that are missing from one of them.
Also, having three tables with the same format is an indication of poor database design. You should really have one table with the sales and a column indicating which type of product each row refers to.
You could solve your problem by making a union of the information you want to aggregate from the different tables and then summing the amounts. This would look like:
SELECT t.Date,SUM(t.Totalamount)
FROM
(
SELECT Date,Totalamount
FROM tvSales
UNION ALL
SELECT Date,Totalamount
FROM acSales
UNION ALL
SELECT Date,Totalamount
FROM coolerSales
) t
WHERE t.Date='03/05/2014'
GROUP BY t.Date
The SELECTs in the union must return the same number of columns with compatible types. The result takes its column names from the first SELECT, so if the names differ across the tables, give the 2 columns common aliases in the 3 select queries and work with those aliases in the main query. Also, the UNION should include the ALL keyword so that identical records from different tables are not eliminated as duplicates.
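The question's own data shows why the ALL keyword matters: AcSales and CoolerSales contain identical (date, amount) pairs for 04/05/2014 and 05/05/2014. Here is a sketch with Python's sqlite3 module that compares the two set operators. SQLite, the helper name totals_by_date, and storing the dates as text are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
sales = {
    "TvSales": [(1, "03/05/2014", 10, 10000), (2, "04/05/2014", 20, 20000),
                (3, "05/05/2014", 30, 30000)],
    "AcSales": [(1, "03/05/2014", 10, 50000), (2, "04/05/2014", 20, 60000),
                (3, "05/05/2014", 30, 70000)],
    "CoolerSales": [(1, "03/05/2014", 10, 30000), (2, "04/05/2014", 20, 60000),
                    (3, "05/05/2014", 30, 70000)],
}
for table, table_rows in sales.items():
    conn.execute(
        f"CREATE TABLE {table} "
        "(Id INTEGER, Date TEXT, NoOfSales INTEGER, Totalamount INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", table_rows)


def totals_by_date(set_op):
    """Sum Totalamount per date; set_op is either 'UNION ALL' or 'UNION'."""
    query = f"""
        SELECT t.Date, SUM(t.Totalamount)
        FROM (
            SELECT Date, Totalamount FROM TvSales
            {set_op}
            SELECT Date, Totalamount FROM AcSales
            {set_op}
            SELECT Date, Totalamount FROM CoolerSales
        ) t
        GROUP BY t.Date
        ORDER BY t.Date
    """
    return conn.execute(query).fetchall()


with_all = totals_by_date("UNION ALL")
without_all = totals_by_date("UNION")

print(with_all)
# [('03/05/2014', 90000), ('04/05/2014', 140000), ('05/05/2014', 170000)]
print(without_all)
# [('03/05/2014', 90000), ('04/05/2014', 80000), ('05/05/2014', 100000)]
```

With plain UNION the duplicated 60000 and 70000 rows are each counted once, so two of the three daily totals come out too low.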
My query is given below:
select vend_id,
COUNT(*) as num_prods
from Products
group by vend_id;
Please tell me how this part works: select vend_id, COUNT(vend_id) as opposed to select COUNT(vend_id)?
select COUNT(vend_id)
That will return the number of rows where the vendor ID is not null
select vend_id, COUNT(*) as num_prods
from Products
group by vend_id
That will group the rows by vendor ID and return, for each ID, how many rows it has.
An example:
ID name salary start_date city region
----------- ---------- ----------- ----------------------- ---------- ------
1 Jason 40420 1994-02-01 00:00:00.000 New York W
2 Robert 14420 1995-01-02 00:00:00.000 Vancouver N
3 Celia 24020 1996-12-03 00:00:00.000 Toronto W
4 Linda 40620 1997-11-04 00:00:00.000 New York N
5 David 80026 1998-10-05 00:00:00.000 Vancouver W
6 James 70060 1999-09-06 00:00:00.000 Toronto N
7 Alison 90620 2000-08-07 00:00:00.000 New York W
8 Chris 26020 2001-07-08 00:00:00.000 Vancouver N
If you run this query, you will get one row per city, and you can apply a function (in this case, count) to each group. So, for each city, you will get the count of rows. You can also use other functions.
SELECT City, COUNT(*) as Employees
FROM Employee
GROUP BY City
The result is:
City Employees
--------- ---------
New York 3
Toronto 2
Vancouver 3
As you can see, you get the number of rows for each city.
When you simply select COUNT(vend_id) with no GROUP BY clause, you get one row with the total count of rows with a non-NULL vendor ID - that last bit is important and is one reason why you may prefer COUNT(*) so as to avoid "missing" rows. Some people may argue that COUNT(*) is somehow less efficient but that's true in no DBMS I've used. In any case, if you are using a brain-dead DBMS, you can always try COUNT(1).
When you group by vend_id, you get one row per vendor ID with the count being the number of rows for that ID.
In step-by-step detail (conceptually, though there are almost certainly efficiencies to be gained by optimising), the first query:
SELECT COUNT(vend_id) AS num_prods FROM products
Get a list of all rows in products.
Count the rows where vend_id is not NULL, then deliver one row containing that count in the single num_prods column.
For the grouping one:
SELECT vend_id, COUNT(vend_id) AS num_prods FROM products GROUP BY vend_id
Get a list of all rows in products.
For each value of vend_id:
Count the rows matching that vend_id where vend_id is not NULL, then deliver one row containing the vend_id in the first column and that count in the second num_prods column.
Note that those rows with a null vend_id do not contribute to the aggregate function (count in this case).
In the first query, that simply means they don't appear in the overall total.
In the second case, it means that the output row still exists but the count will be zero. That's another good reason to use COUNT(*) or COUNT(1).
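The NULL behaviour described above is easy to see with a few rows. Below is a sketch using Python's sqlite3 module; the vendor IDs and product names are made up for illustration, and SQLite is my assumption since the thread does not name a DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (prod_name TEXT, vend_id INTEGER)")
# Hypothetical rows: two products have no vendor (NULL vend_id).
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("anvil", 1001), ("fuse", 1001), ("sling", 1002),
     ("mystery box", None), ("unlabelled crate", None)],
)

# No GROUP BY: a single row. COUNT(vend_id) skips NULLs, COUNT(*) does not.
overall = conn.execute(
    "SELECT COUNT(vend_id), COUNT(*) FROM Products"
).fetchone()

# GROUP BY: one row per vend_id, including a group for NULL.
# SQLite sorts NULL first in ascending order.
per_vendor = conn.execute(
    """
    SELECT vend_id, COUNT(vend_id) AS non_null_count, COUNT(*) AS row_count
    FROM Products
    GROUP BY vend_id
    ORDER BY vend_id
    """
).fetchall()

print(overall)     # (3, 5)
print(per_vendor)  # [(None, 0, 2), (1001, 2, 2), (1002, 1, 1)]
```

The NULL group still produces an output row, but COUNT(vend_id) reports 0 for it while COUNT(*) reports the 2 rows that are actually there.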
select vend_id will only select the vend_id field, whereas select * will select all the fields.
select vend_id, COUNT(vend_id) and select COUNT(vend_id) give the same result for the count column as long as you use group by vend_id. When you use select vend_id, COUNT(vend_id), you must group by vend_id.