Finding max without aggregate operation in relational algebra - relational-database

How can I find the maximum without using aggregate operation in relational algebra?
The schema of Database is as follows
Item(IName, Brand)
Shop(SName, City, Address)
Sells(IName, SName, Price)
How can I find the item name and shop name of the item that is sold at the maximum price, without using an aggregate function in relational algebra? I know how to solve this using aggregate functions, but not without them.

In SQL, but not in relational algebra
You can sort by Price in descending order and limit the result to 1 row.
Example in MySQL
SELECT IName, SName, Price
FROM Sells
ORDER BY Price DESC
LIMIT 0, 1;
The highest-priced item will be the first record in the result, and because of the limit you ignore everything else.
Alternatively, you can find the r1 rows that have no r2 row with a higher price:
SELECT r1.IName, r1.SName, r1.Price
FROM Sells r1
LEFT JOIN Sells r2
ON r1.Price < r2.Price
WHERE r2.Price IS NULL
LIMIT 0, 1;
The query above selects the rows for which no higher-priced row exists. LEFT JOIN allows the right-hand side to be missing, so every r1 row appears in the join result at least once, and the r1 rows that have the greatest price are paired only with an all-NULL r2. Filtering on r2.Price IS NULL therefore removes every r1 row that had a higher-priced pair. We still have a LIMIT clause because of ties: if several rows share the same maximum Price, LIMIT keeps only one of them (without an ORDER BY, which one is not specified).
By default, the first approach I have shown should be preferred, but if it is not an option for some reason, you can use the second approach.
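A minimal sketch of the LEFT JOIN / IS NULL technique, assuming SQLite through Python's standard sqlite3 module instead of MySQL, with made-up rows in a Sells table. The LIMIT is dropped on purpose so you can see that every row tied for the maximum survives the filter.

```python
import sqlite3

# In-memory database with the Sells table from the question; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (IName TEXT, SName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("pen", "A", 2.0), ("lamp", "B", 30.0), ("chair", "C", 30.0), ("cup", "A", 5.0)],
)

# Self anti-join: keep r1 rows for which no r2 row has a higher price.
# Without LIMIT, both rows that tie at the maximum price are returned.
rows = conn.execute(
    """
    SELECT r1.IName, r1.SName, r1.Price
    FROM Sells r1
    LEFT JOIN Sells r2 ON r1.Price < r2.Price
    WHERE r2.Price IS NULL
    ORDER BY r1.IName
    """
).fetchall()
print(rows)  # [('chair', 'C', 30.0), ('lamp', 'B', 30.0)]
```

Adding the LIMIT back reduces this to a single one of the tied rows.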
EDIT
Relational algebra
@philipxy pointed out in the comment section that relational algebra has no null, sorting, or limit. As a result, in relational algebra you need an expression that selects the records whose price is not exceeded by the price of any other record.
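One standard way to write this is sketched below. It uses a renamed copy of Sells so that two prices can be compared (the names S', I', N', P' are introduced here only for illustration): first collect every row that is beaten by some other row, then subtract those rows from Sells. Set difference leaves all rows that share the maximum price, so ties are kept.

```latex
\begin{aligned}
S' &= \rho_{S'(I',\,N',\,P')}(\mathit{Sells}) \\
\mathit{NotMax} &= \pi_{\mathit{IName},\,\mathit{SName},\,\mathit{Price}}\bigl(\sigma_{\mathit{Price} < P'}(\mathit{Sells} \times S')\bigr) \\
\mathit{Result} &= \pi_{\mathit{IName},\,\mathit{SName}}\bigl(\mathit{Sells} - \mathit{NotMax}\bigr)
\end{aligned}
```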

Related

Mysql - Sum by other column value

Here's the problem. I have a long but not very complex query:
SELECT SUM(x.value)
FROM valuetable AS x
LEFT JOIN jointable_1 AS y
LEFT JOIN jointable_2 AS z
etc
...
GROUP BY y.id, z.id
There are n LEFT JOINs, and I need to keep it this way, because a new LEFT JOIN must be addable at any time. I obviously get n duplicate values in the SUM, since the join tables can have multiple matching rows, and I cannot move any of them into a subquery because I need a flexible WHERE clause. I need only one x.value per x.id in the SUM; that's also obvious.
-I cannot add x.id to GROUP BY, since I need one row with the sum per y.id.
-I cannot use the calculation:
SUM(x.value)*COUNT(DISTINCT x.id)/COUNT(*)
since any number of x.value entries can end up in the sum, as different x.id values have different numbers of joined rows.
-I cannot use DISTINCT x.value, since different x.id rows can have the same x.value.
-I don't know how to create a subquery for sum, since I cannot use the aggregated value (for example GROUP_CONCAT(DISTINCT x.id)) in subquery, or can I?
Anyway, that's it. I know I can rearrange the query (subqueries instead of joins, a different FROM clause), but I want to leave that as a last resort. Is there a way to achieve what I want?
Sorry to say, there's no general way to do what you want without subqueries (or maybe views).
A bit of jargon: "Cardinality". For our purpose it's the number of rows in a table or a result set. (For our purpose a result set is a kind of virtual table.)
For aggregate functions like SUM(col) and COUNT(*) to give good results, we must attend to the cardinality of the table being summarized. A query like this one
SELECT DATE(sale_time) sale_date,
store_id,
SUM(sale_amount) total_sales
FROM sale
GROUP BY DATE(sale_time), store_id
summarizes a result set with the same cardinality as the underlying table, so it produces useful results.
But, if we do this
SELECT DATE(sale.sale_time) sale_date,
sale.store_id,
SUM(sale.sale_amount) total_sales,
COUNT(promo.promo_id) promos
FROM sale
LEFT JOIN promo ON sale.store_id = promo.store_id
AND DATE(sale.sale_time) = promo.promo_date
GROUP BY DATE(sale.sale_time), sale.store_id
we wreck the cardinality of the summarized result set. This will never work unless we know for sure that each store had either zero or one promo records for each given day. Why not? The LEFT JOIN operation affects the cardinality of the virtual table being summarized. That means some sale_amount values may show up in the SUM more than once, and therefore the SUM won't be correct or trustworthy.
How can you prevent LEFT JOIN operations from messing up your cardinality? Make sure your LEFT JOIN's ON clause matches each row on the left to exactly zero rows, or exactly one row, on the right. That is, make sure the (virtual) tables on either side of the JOIN have appropriate cardinality.
(In entity-relationship jargon, your SUM fails because you join two entities with a one-to-many relationship before you do the sum.)
The theoretically cleanest way to do it is to perform both aggregate operations before the join. This joins two virtual tables in a way that makes the LEFT JOIN either one-to-none or one-to-one:
SELECT sales.sale_date,
sales.store_id,
sales.total_sales,
promos.promo_count
FROM (
SELECT DATE(sale_time) sale_date,
store_id,
SUM(sale_amount) total_sales
FROM sale
GROUP BY DATE(sale_time), store_id
) sales
LEFT JOIN (
SELECT store_id,
promo_date,
COUNT(*) promo_count
FROM promo
GROUP BY store_id, promo_date
) promos ON sales.store_id = promos.store_id
AND sales.sale_date = promos.promo_date
Although this SQL is complex, most servers handle this kind of pattern efficiently.
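A minimal sketch that contrasts the two patterns, assuming SQLite through Python's sqlite3 module instead of MySQL, and using made-up rows in the sale and promo tables named above. One store has two sales and two promos on the same day, so joining before aggregating counts every sale twice.

```python
import sqlite3

# Hypothetical sale/promo tables with the column names used above; data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sale (store_id INTEGER, sale_time TEXT, sale_amount REAL);
    CREATE TABLE promo (promo_id INTEGER, store_id INTEGER, promo_date TEXT);
    INSERT INTO sale VALUES (1, '2024-01-01 09:00', 10), (1, '2024-01-01 15:00', 20);
    INSERT INTO promo VALUES (100, 1, '2024-01-01'), (101, 1, '2024-01-01');
    """
)

# Join first, aggregate second: each sale row is repeated once per matching promo row.
naive = conn.execute(
    """
    SELECT DATE(sale.sale_time), sale.store_id,
           SUM(sale.sale_amount), COUNT(promo.promo_id)
    FROM sale
    LEFT JOIN promo ON sale.store_id = promo.store_id
                   AND DATE(sale.sale_time) = promo.promo_date
    GROUP BY DATE(sale.sale_time), sale.store_id
    """
).fetchall()

# Aggregate first, join second: each side has at most one row per (store, date).
fixed = conn.execute(
    """
    SELECT sales.sale_date, sales.store_id, sales.total_sales, promos.promo_count
    FROM (SELECT DATE(sale_time) AS sale_date, store_id,
                 SUM(sale_amount) AS total_sales
          FROM sale GROUP BY DATE(sale_time), store_id) AS sales
    LEFT JOIN (SELECT store_id, promo_date, COUNT(*) AS promo_count
               FROM promo GROUP BY store_id, promo_date) AS promos
      ON sales.store_id = promos.store_id
     AND sales.sale_date = promos.promo_date
    """
).fetchall()
print(naive)  # [('2024-01-01', 1, 60.0, 4)]  both totals inflated
print(fixed)  # [('2024-01-01', 1, 30.0, 2)]
```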
Troubleshooting tip: If you see SUM() ... FROM ... JOIN ... GROUP BY all at the same level of a query, you may have cardinality problems.

Understanding correlation in mysql

I have a table with duplicate IDs representing a person who has placed an order. Each of these orders has a date. Each order has a status code from 1 to 4. 4 means a cancelled order. I am using the following query:
SELECT
personID, MAX(date), status
FROM
orders
WHERE
status = 4
GROUP BY
personID
The problem is, while this DOES return a unique record for each person with their most recent order date, it does NOT give me the correct status. In other words, I assumed that the status would be correctly correlated to the MAX(date), and it is not. It simply pulls, seemingly at random, one of the statuses from one of the orders. Can I tell it, in basic terms, to give me the EXACT status from the same record as the MAX(date)?
Unfortunately, there is no simple way to get what you want. Most other RDBMS vendors don't even consider queries using aggregate functions valid unless all non-aggregated result fields are in the GROUP BY. The general solution for these kinds of questions usually involves a subquery to get the "last" records, which is then joined to the original table to get those rows.
Depending on the structure of your data this may or may not be possible. For instance, if you have multiple rows with the same personID and date there is no way to determine from those alone which one's status should be used.
To get the result you want, you could use:
SELECT personId, date, status
FROM orders
WHERE (personID,date) IN (SELECT personID, MAX(date)
FROM orders
-- WHERE status = 4
GROUP BY personID);
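The same idea can also be written as a join against the subquery, which some engines that lack row-value IN support will accept. A minimal sketch, assuming SQLite through Python's sqlite3 module and made-up rows in the orders table from the question:

```python
import sqlite3

# Hypothetical orders table with the question's columns; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (personID INTEGER, date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-05", 4), (1, "2024-02-10", 2), (2, "2024-01-20", 4), (2, "2023-12-01", 1)],
)

# The subquery finds each person's latest date; joining back to orders retrieves
# that exact row, so status comes from the same record as MAX(date).
latest = conn.execute(
    """
    SELECT o.personID, o.date, o.status
    FROM orders AS o
    JOIN (SELECT personID, MAX(date) AS max_date
          FROM orders
          GROUP BY personID) AS m
      ON o.personID = m.personID AND o.date = m.max_date
    ORDER BY o.personID
    """
).fetchall()
print(latest)  # [(1, '2024-02-10', 2), (2, '2024-01-20', 4)]
```

As noted above, if one person has two rows with the same latest date, both rows come back.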
As for:
It simply pulls, seemingly at random, one of the statuses from one of the orders.
It works as intended:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Related: Group by clause in mySQL and postgreSQL, why the error in postgreSQL?

SUM of column that has been GROUPED BY

I am using the following query:
SELECT mgap_growth
FROM mgap_orders
WHERE account_manager_id = '159795'
GROUP BY mgap_ska_report_category
mgap_growth is a column with identical amounts that differ only per mgap_ska_report_category, which is the reason for the grouping. Now that I have normalized the individual amounts per category, how can I use SUM to tally their total?
Here is a screenshot of the data:
I only need the SUM of the growth amounts per category, not of all of the mgap_growth records, but I'm unsure how to SUM after the grouping.
Thanks!
EDIT FOR ADDITIONAL QUERY:
Let me throw another issue into the mix: we know I need to SUM only once per category, but what if I needed to GROUP BY CUSTOMER? I just found out that there are multiple customers in the data; each is duplicated per growth record but differs by category. I really need to use two groupings: one for category, to single out and SUM the growth amount, and another to single out the customer.
Here is an image describing the data:
If I understand you correctly, you need to sum the results from the subquery.
SELECT SUM(mgap_growth) AS total_mgap_growth
FROM (SELECT mgap_growth
from mgap_orders
WHERE account_manager_id = '159795'
GROUP BY mgap_ska_report_category) AS x
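A minimal sketch of this sum-over-subquery, assuming SQLite through Python's sqlite3 module and made-up rows. It uses MAX(mgap_growth) in the inner query instead of the bare column (a swapped-in choice, so the inner query stays valid under stricter GROUP BY rules); this relies on the question's statement that mgap_growth is identical within each category.

```python
import sqlite3

# Hypothetical mgap_orders rows: the growth value repeats within each category.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mgap_orders (account_manager_id TEXT, "
    "mgap_ska_report_category TEXT, mgap_growth REAL)"
)
conn.executemany(
    "INSERT INTO mgap_orders VALUES (?, ?, ?)",
    [("159795", "A", 100.0)] * 3 + [("159795", "B", 50.0)] * 2,
)

# Summing every row counts each category's growth once per duplicate row.
(naive_total,) = conn.execute(
    "SELECT SUM(mgap_growth) FROM mgap_orders WHERE account_manager_id = '159795'"
).fetchone()

# Collapse each category to a single value first, then sum those values.
(total,) = conn.execute(
    """
    SELECT SUM(growth) AS total_mgap_growth
    FROM (SELECT mgap_ska_report_category, MAX(mgap_growth) AS growth
          FROM mgap_orders
          WHERE account_manager_id = '159795'
          GROUP BY mgap_ska_report_category) AS x
    """
).fetchone()
print(naive_total, total)  # 400.0 150.0
```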
This should show the total growth per category for that particular account manager:
SELECT sum(mgap_growth) AS Growth, mgap_ska_report_category as Category
FROM mgap_orders
WHERE account_manager_id = '159795'
GROUP BY mgap_ska_report_category
Rather than thinking of doing the SUM after the grouping, you can do the two together in the one statement. You were 99% of the way there with what you had already.
To answer your additional question in the comment, you can add another column to group by. The order that you list them in the group by section is the important part. The overall grouping comes first. So assuming 'Customer' is the customer column name you would do this:
SELECT mgap_ska_report_category as Category, Customer, sum(mgap_growth) AS Growth
FROM mgap_orders
WHERE account_manager_id = '159795'
GROUP BY mgap_ska_report_category, customer
WITH ROLLUP
Note that changing the order of the SELECT columns in the top line was just for aesthetics: you can put them in any order and get the same data, but this order is the easiest to read.
This shows the growth per customer by category for that particular account manager.
Edited again to add WITH ROLLUP. This will give you the totals per category as well. Try it with and without WITH ROLLUP to see how it changes things.

How do I use MAX() to return the row that has the max value?

I have table orders with fields id, customer_id and amt:
I want to get the customer_id with the largest amt, and the value of that amt.
I made the query:
SELECT customer_id, MAX(amt) FROM orders;
But the result of this query contained an incorrect value of customer_id.
Then I built this query:
SELECT customer_id, MAX(amt) AS maximum FROM orders GROUP BY customer_id ORDER BY maximum DESC LIMIT 1;
and got the correct result.
But I do not understand why my first query did not work properly. What am I doing wrong?
And is it possible to change my second query to get the information I need in a simpler, more correct way?
MySQL will allow you to leave GROUP BY off of a query, thus returning the MAX(amt) in the entire table with an arbitrary customer_id. Most other RDBMS require the GROUP BY clause when using an aggregate.
I don't see anything wrong with your 2nd query -- there are other ways to do it, but yours will work fine.
Some versions of SQL give you a warning or error when you select a field, have an aggregate operator like MAX or SUM, and the field you are selecting does not appear in GROUP BY.
You need a more complicated query to fetch the customer_id corresponding to the max amt. Unfortunately, SQL is not as naive as you think. One such way to do this is:
select customer_id from orders where amt = ( select max(amt) from orders);
A solution using joins may perform better, though.
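A minimal sketch of the scalar-subquery form, assuming SQLite through Python's sqlite3 module and made-up rows in which two customers tie for the largest amt. Unlike ORDER BY ... LIMIT 1, this returns every customer whose amt equals the maximum.

```python
import sqlite3

# Hypothetical orders table with the question's columns; customers 11 and 12 tie for the top amt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amt REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amt) VALUES (?, ?)",
    [(10, 50.0), (11, 80.0), (12, 80.0), (10, 20.0)],
)

# The scalar subquery computes MAX(amt); the outer WHERE keeps every row that matches it.
top = conn.execute(
    """
    SELECT customer_id, amt
    FROM orders
    WHERE amt = (SELECT MAX(amt) FROM orders)
    ORDER BY customer_id
    """
).fetchall()
print(top)  # [(11, 80.0), (12, 80.0)]
```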
To understand why what you were trying to do doesn't make sense, replace MAX with SUM. From the stance of how aggregate operators are interpreted, it's a mere coincidence that MAX returns something that corresponds to an actual row. SUM does not have this property, for instance.
In practice, your first query behaves as if the whole table were GROUP BY-ed into one big group.
Also, MySQL is free to choose each output value from different source rows from the same group.
http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause.
The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The problem with MAX() is that it selects the highest value of the specified field, considering that field alone. The other values in the same row are not considered or given any preference in the result. MySQL will usually return whatever value is in the first row of the GROUP (in this case the GROUP is composed of the entire table, since no group was specified), dropping the information from the other rows during the aggregation.
To solve this, you could do this:
SELECT customer_id, amt FROM orders ORDER BY amt DESC LIMIT 1
It should return the customer_id and the highest amt while preserving the relation between them, because no aggregation is done.

SQL combine COUNT and AVG query with SELECT

I need to get the average rating and the total number of ratings for a particular user and then select all single ratings (rating_value, rating_text, creator) as well:
$rating_query = mysql_query("SELECT COUNT(1) as rating_count
,AVG(rating_value), rating_value, rating_text, creator
FROM user_rating WHERE rated_user = $user_id");
This query would return the COUNT(1) result and the AVG(rating_value) for every row, but I only need those values once.
Is there any way to do this without making 2 separate queries?
There may be a trick I'm not aware of, but I don't think that's possible to do in a single query. You could try using a GROUP BY clause if that would make sense for you, but I'm guessing it probably doesn't from the column names you're using. Any relation requires a single atomic value at any given row and column, even if that value is null. What you are requesting is that columns 1 and 2 in every row but the first have no value, and again I don't think this is possible.