MySQL SELECT JOIN and GROUP BY

Here is my query:
SELECT v2.mac, v2.userag_hash, v2.area, count(*), count(distinct v2.video_id)
FROM video v2 JOIN (
SELECT distinct v.mac, v.userag_hash
from video v
WHERE v.date_pl >= '2012-01-30 00:00' AND
v.date_pl <= '2012-02-05 23:55'
ORDER BY rand() LIMIT 50
) table2
ON v2.mac = table2.mac AND
v2.userag_hash = table2.userag_hash AND
v2.date_pl >= '2012-01-30 00:00' AND
v2.date_pl <= '2012-02-05 23:55'
GROUP BY v2.mac, v2.userag_hash
I have one table, "video", in the database. It contains several thousand users' data. I want to randomly select 50 users and run the calculation on the selected rows (each user is identified by a unique combination of mac and userag_hash). This query's result is:
usermac1, userag_hash1, area1, 10, 5
usermac2, userag_hash2, area2, 20, 8
...
But if I don't use "GROUP BY" at the end of the query, it returns only one row:
usermac, userag_hash, areax, 1500, 700 (don't know what this row stands for)
I am wondering whether "1500, 700" are the sums of the last two columns of the previous results, i.e. 1500 = 10+20+... and 700 = 5+8+...

Since you select non-aggregated columns next to an aggregate function (COUNT, used twice) and can still run the query without GROUP BY at all, you must be relying on MySQL's non-standard handling of GROUP BY.
SELECT v2.mac, v2.userag_hash, v2.area, count(*), count(distinct v2.video_id)
...
Whatever your data is, MySQL will return exactly one row when you use aggregate functions without GROUP BY, which is:
<undefined value>, <undefined value>, <undefined value>, count of all rows, count of distinct non-NULL values of v2.video_id.
So I think you have 1500 rows and 700 distinct non-NULL values of v2.video_id. COUNT(DISTINCT ...) ignores NULLs; to count NULL as a value of its own, try:
count(distinct IFNULL(v2.video_id,'nullvaluehere'))
which converts NULLs to a non-NULL placeholder so they are included.
The "undefined values" could be the first row, last row, first where something is non null, first in an index, first in some cache, etc. There is no definition of what should happen when you write an invalid query.
Every SQL database I'm aware of other than MySQL will give you an error message and not even run the query. For the query to be valid, every non-aggregated column in the SELECT list must appear in the GROUP BY, e.g. mac, userag_hash and area must all be in the GROUP BY.
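On the question of whether "1500, 700" are sums of the per-user rows, a small experiment may help. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a made-up five-row version of the video table (the data and the reduced column list are assumptions for illustration only). It suggests that the ungrouped COUNT(*) is the sum of the per-user counts, while the ungrouped COUNT(DISTINCT video_id) need not be, because the same video can be watched by several users.

```python
import sqlite3

# Hypothetical miniature of the "video" table; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (mac TEXT, userag_hash TEXT, video_id INTEGER)")
conn.executemany(
    "INSERT INTO video VALUES (?, ?, ?)",
    [
        ("mac1", "h1", 1), ("mac1", "h1", 1), ("mac1", "h1", 2),
        ("mac2", "h2", 2), ("mac2", "h2", 3),
    ],
)

# One row per user: every selected non-aggregated column is in the GROUP BY.
per_user = conn.execute(
    "SELECT mac, userag_hash, COUNT(*), COUNT(DISTINCT video_id) "
    "FROM video GROUP BY mac, userag_hash ORDER BY mac"
).fetchall()

# Only aggregates and no GROUP BY: exactly one row for the whole set.
overall = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT video_id) FROM video"
).fetchone()

print(per_user)  # per-user row counts and distinct video counts
print(overall)   # totals over all rows
```

Here video 2 is watched by both users, so it is counted once in the overall distinct count but once per user in the grouped result.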

Related

group by id in descending order in select statement

I have a MySQL statement where I'm trying to select distinct rows with the latest date.
This is the SQL statement:
SELECT st.seno, tc.pl, tc.sno, st.val1, st.val2, st.date
FROM tc
LEFT JOIN st ON tc.seno = st.seno AND tc.pl = st.pl AND st.seno = 1304239136
WHERE tc.pl = 1
ORDER BY st.date DESC
This is the data returned:
I want the rows to be distinct by 'tc.sno', so I only want the first and third rows returned, because the middle row's date is earlier than the top one's for 'sno' 3. The 'sno' values can always be different, so I do not want to hardcode those numbers. I tried 'GROUP BY', but it just picks the first value of the date; I also tried 'HAVING' and combining select statements, but can't seem to get it working. (The val1 and val2 in row 2 could be anything; it is just a coincidence that they match in this example.)

MIN() causing MySQL to return all-NULL row instead of 0 rows

I have two tables, services and extraFees, related 1:n through services.id = extraFees.serviceId. My problem is that when I execute the following query for a non-existing combination of s.id and s.category, I still get 1 row returned, with all fields NULL.
SELECT 100 + (s.feeRate * MIN(ef.extra)) AS extraFees FROM services s
LEFT JOIN extraFees ef ON s.id=ef.serviceId WHERE s.id=12 AND s.category='PRG'
I know the culprit is MIN(), because if I replace it with a number or NULL, I get 0 row(s) returned, which is what I want.
SELECT 100 + (s.feeRate * 5) AS extraFees FROM services s
LEFT JOIN extraFees ef ON s.id=ef.serviceId WHERE s.id=12 AND s.category='PRG'
SELECT 100 + (s.feeRate * NULL) AS extraFees FROM services s
LEFT JOIN extraFees ef ON s.id=ef.serviceId WHERE s.id=12 AND s.category='PRG'
Why does this happen and how can I avoid it?
An aggregate function will return a result. Try this:
CREATE TABLE x(id int);
SELECT MIN(id) FROM x
and you will get 1 row: NULL. You can wrap it in a subquery if you want to ignore the NULL result:
SELECT y.id
FROM (SELECT MIN(id) AS id FROM x) y
WHERE y.id IS NOT NULL
or make use of the HAVING clause.
This happens because MIN() returns NULL if there were no matching rows.
How can I avoid it?
Try to wrap the call to MIN() in a sub-query to filter out the NULL.
EDIT:
"I've read that doc, but I don't see how the NULL returned by MIN() is different from the NULL I manually specify in my last query."
I suspect this is because your second set of queries are not aggregate queries.
An aggregate query without GROUP BY always returns exactly one row holding the result of the aggregation over the matching rows.
If there are no matching rows, that single row still comes back, and MIN() evaluates to NULL in it.
The second set of queries, on the other hand, are not aggregate queries; they return one row per matching row, so they can return an empty set as a logical result.
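The difference between the two kinds of query can be checked on an empty table. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; the table x is the one from the answer above). It runs the aggregate query, a plain non-aggregate query, and the subquery workaround.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER)")  # deliberately left empty

# Aggregate without GROUP BY: always one row, here holding NULL (None).
aggregate_rows = conn.execute("SELECT MIN(id) FROM x").fetchall()

# Non-aggregate query: one output row per matching row, so nothing at all.
plain_rows = conn.execute("SELECT id FROM x").fetchall()

# Workaround from the answer: filter the NULL out in an outer query.
filtered_rows = conn.execute(
    "SELECT y.id FROM (SELECT MIN(id) AS id FROM x) AS y WHERE y.id IS NOT NULL"
).fetchall()

print(aggregate_rows, plain_rows, filtered_rows)
```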

Get number of rows returned by subquery along with the columns returned by subquery

I find it really annoying not to be able to get the number of rows without having to use GROUP BY. I just need the total count of rows that my subquery returned.
Here is what my subquery looks like:
select sales_flat_order.increment_id, sales_flat_order.created_at, sales_flat_order.status, dispatch.dispatch_date,
DATEDIFF(TO_DATE(dispatch.dispatch_date), TO_DATE(sales_flat_order.created_at)) as delay
FROM
magentodb.sales_flat_order
LEFT OUTER JOIN erpdb.dispatch
ON
sales_flat_order.increment_id == dispatch.order_num
where
TO_DATE(created_at) >= DATE_SUB(current_date(),6)
AND
TO_DATE(created_at) <= DATE_SUB(current_date(), 3)
AND
sales_flat_order.status NOT IN ('canceled', 'exchange', 'rto', 'pending_auth', 'pending_payment' ,'partial_refund','refund', 'refund_cash', 'partial_refund_cash', 'holded')
)
AS TempFiltered
Now I add one extra WHERE clause in my outer query so that it returns fewer rows. Call the number of rows returned by the outer query x and the number returned by the subquery y.
I then need the percentage of x to y (i.e. the number of rows returned by the outer query relative to the subquery).
I do not want to repeat my subquery only to get the count of its rows. How do I get it?
This is what I have so far, but of course it is wrong. I cannot get the count of all my rows without either dropping the other select columns or using them in the GROUP BY. How do I resolve this?
SELECT tempfiltered.delay, count(*) as countOfOrders,(100*count(*))/tempfiltered.Total) over () as percentage
FROM
(
select count(*) as Total, sales_flat_order.increment_id, sales_flat_order.created_at, sales_flat_order.status, dispatch.dispatch_date,
DATEDIFF(TO_DATE(dispatch.dispatch_date), TO_DATE(sales_flat_order.created_at)) as delay
FROM
magentodb.sales_flat_order
LEFT OUTER JOIN erpdb.dispatch
ON
sales_flat_order.increment_id == dispatch.order_num
where
TO_DATE(created_at) >= DATE_SUB(current_date(),6)
AND
TO_DATE(created_at) <= DATE_SUB(current_date(), 3)
AND
sales_flat_order.status NOT IN ('canceled', 'exchange', 'rto', 'pending_auth', 'pending_payment' ,'partial_refund','refund', 'refund_cash', 'partial_refund_cash', 'holded')
)
AS TempFiltered
Where
DATEDIFF(TO_DATE(TempFiltered.dispatch_date), TO_DATE(TempFiltered.created_at)) > 1
GROUP BY tempfiltered.delay
ORDER BY tempfiltered.delay
You could change the subquery into a SELECT INTO query, and put the data in a temporary table, and use that in the main query, and separately just select count(*) of that temporary table. That should pretty much satisfy your requirement.
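A rough sketch of that idea, using Python's sqlite3 module rather than the asker's engine: the table orders and its rows are invented for illustration, and CREATE TEMP TABLE ... AS SELECT is SQLite's spelling of the "SELECT INTO" step, so the exact syntax on your system is an assumption you would need to check. The filtered set is materialised once, counted once, and the count is then reused for the percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the joined order/dispatch data; delay is in days.
conn.execute("CREATE TABLE orders (increment_id INTEGER, delay INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 2), (4, 2), (5, 3)],
)

# Materialise the subquery result once instead of repeating the subquery.
conn.execute(
    "CREATE TEMP TABLE temp_filtered AS SELECT increment_id, delay FROM orders"
)

# Total number of rows the subquery produced.
total = conn.execute("SELECT COUNT(*) FROM temp_filtered").fetchone()[0]

# Outer query with the extra filter; each group is a share of the total.
rows = conn.execute(
    "SELECT delay, COUNT(*) AS count_of_orders, "
    "100.0 * COUNT(*) / ? AS percentage "
    "FROM temp_filtered WHERE delay > 1 "
    "GROUP BY delay ORDER BY delay",
    (total,),
).fetchall()

print(total)
print(rows)
```

Passing the total in as a bound parameter keeps the grouped query free of any extra non-aggregated column, which is what caused the trouble in the attempt above.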

mysql get difference instead of SUM

I have the following query :
SELECT SUM(P_QTY)
FROM rankhistory
WHERE P_ID= '1'
AND RH_DATE>=1438556400
AND RH_DATE<1438642800
The above query returns 268
The result set contains two elements of P_QTY which are 160 and 108
Now what I want is be able to receive the difference instead of the sum, so what I want my query to return is 52, how can I achieve that through sql query?
Please note that the query can match more than two rows, and the intent is to get the total change. For example, if the rows contain 168, 160 and 150, the result should be 18.
There's no aggregate function for difference, but since you know exactly which rows to use, you can select each value as its own subquery, then subtract the two columns from one another in a single select statement.
SELECT a.op, b.op, a.op - b.op as diff
FROM (SELECT 10 as op) a
JOIN (SELECT 8 as op) b
Expressed in accordance with your schema, it would probably look like this:
SELECT a.op, b.op, a.op - b.op as diff
FROM (SELECT P_QTY as op FROM rankhistory WHERE P_QTY = 160) a
JOIN (SELECT P_QTY as op FROM rankhistory WHERE P_QTY = 108) b
To use this approach regularly in an application, however, you'll want to select the rows by ID or something else easily selectable and meaningful.
It sounds like you want something else, though. Perhaps you're interested in the difference between max and min during a date range?
SELECT MAX(P_QTY) - MIN(P_QTY) as diff
FROM rankhistory
WHERE P_ID = '1'
AND RH_DATE >= 1438556400
AND RH_DATE < 1438642800
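A quick way to try the MAX minus MIN idea is sketched below with Python's sqlite3 module standing in for MySQL. The rows are made up around the two values from the question, and the filter mirrors the question's own WHERE clause (same P_ID, lower bound inclusive, upper bound exclusive). Note the assumption baked into this approach: MAX minus MIN equals the total change only when the quantity moves in one direction over the period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rankhistory (P_ID TEXT, P_QTY INTEGER, RH_DATE INTEGER)"
)
conn.executemany(
    "INSERT INTO rankhistory VALUES (?, ?, ?)",
    [
        ("1", 160, 1438556400),  # inside the range (lower bound is inclusive)
        ("1", 108, 1438600000),  # inside the range
        ("1", 999, 1438642800),  # exactly on the upper bound, excluded by "<"
        ("2", 500, 1438600000),  # other P_ID, excluded
    ],
)

diff = conn.execute(
    "SELECT MAX(P_QTY) - MIN(P_QTY) FROM rankhistory "
    "WHERE P_ID = '1' AND RH_DATE >= 1438556400 AND RH_DATE < 1438642800"
).fetchone()[0]

print(diff)
```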

How SELECT is executed before GROUP BY in this query?

I was looking at the order in which the SQL is executed and I found out that it is:
FROM,
WHERE,
GROUP BY,
HAVING,
SELECT,
ORDER BY
But in the query below, the alias "_index", which is defined in the SELECT, is used in the GROUP BY. How is this possible?
SELECT COUNT(ab.id) AS count, COUNT(ab.id)/365.24 AS average,
IF((SUBSTR(ab.begin, 1, 7) = '2014-08'), '2014-08-18 00:00:00.0 CEST',
IF((SUBSTR(ab.begin, 1, 7) = '2014-09'), '2014-09-18 00:00:00.0 CEST',
IF((SUBSTR(ab.begin, 1, 7) = '2014-10'), '2014-10-18 00:00:00.0 CEST',
IF((SUBSTR(ab.begin, 1, 7) = '2014-11'), '2014-11-18 00:00:00.0 CET',
'0')))) AS _index
FROM active_begin AS ab
INNER JOIN asources AS a ON a.id = ab.asource AND a.unit IN (4, 3, 1)
WHERE (1408226400000 <= ab.begin_time AND ab.begin_time < 1417388400000)
GROUP BY _index
PS. Refer to this for the order: http://www.bennadel.com/blog/70-sql-query-order-of-operations.htm
Thanks in advance.
I think this (the ability to use a column alias defined in the SELECT clause in the GROUP BY) is probably a non-standard extension that some databases allow (but not all).
You are supposed to repeat the exact definition again or wrap everything in a sub-select.
Lucky if your database lets you get away with it.
There is no "order" to how SQL is executed. The SQL optimizer can choose to execute the operations in any order it decides is best for the query.
There is an order to how the clauses are interpreted. So, table aliases and columns are defined in the from clause -- this is interpreted first. Then the subsequent clauses are interpreted. In general, this explains why you cannot use a column alias defined in a select in a where clause, because the where clause is interpreted first.
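The two portable alternatives mentioned above (repeating the expression, or wrapping it in a sub-select) can be sketched as follows. This uses Python's sqlite3 module as a stand-in database, and the table is a simplified, hypothetical version of active_begin with a text column named begin_text, so the names and data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_begin (id INTEGER, begin_text TEXT)")
conn.executemany(
    "INSERT INTO active_begin VALUES (?, ?)",
    [(1, "2014-08-03"), (2, "2014-08-20"), (3, "2014-09-01")],
)

# Variant 1: repeat the full expression in GROUP BY instead of using the alias.
repeated = conn.execute(
    "SELECT SUBSTR(begin_text, 1, 7) AS month_key, COUNT(id) "
    "FROM active_begin "
    "GROUP BY SUBSTR(begin_text, 1, 7) "
    "ORDER BY 1"
).fetchall()

# Variant 2: compute the expression in a sub-select, so that month_key is an
# ordinary column by the time the outer GROUP BY is interpreted.
wrapped = conn.execute(
    "SELECT month_key, COUNT(id) "
    "FROM (SELECT id, SUBSTR(begin_text, 1, 7) AS month_key "
    "      FROM active_begin) AS t "
    "GROUP BY month_key "
    "ORDER BY month_key"
).fetchall()

print(repeated)
print(wrapped)
```

Both forms avoid depending on whether a particular database resolves SELECT aliases inside GROUP BY.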