Given the following MySQL table:
| id | category | Hour | quantity |
|----|----------|------|----------|
| 0  | Sunday   | 10   | 32       |
| 0  | Sunday   | 11   | 19       |
| 0  | Sunday   | 12   | 48       |
| 0  | Sunday   | 19   | 7        |
| 1  | Monday   | 09   | 45       |
| 1  | Monday   | 10   | 17       |
| 1  | Monday   | 12   | 18       |
| 2  | Tuesday  | 08   | 16       |
| 2  | Tuesday  | 09   | 39       |
| 2  | Tuesday  | 10   | 24       |
| 2  | Tuesday  | 11   | 37       |
| 2  | Tuesday  | 12   | 40       |
I need to compute a fifth column, which must be "quantity" divided by the number of rows that share the same id: id 0 has 4 rows, id 1 has 3, and id 2 has 5.
| id | category | Hour | quantity | avg  |
|----|----------|------|----------|------|
| 0  | Sunday   | 10   | 32       | 8    |
| 0  | Sunday   | 11   | 19       | 4.75 |
| 0  | Sunday   | 12   | 48       | 12   |
| 0  | Sunday   | 19   | 7        | 1.75 |
| 1  | Monday   | 09   | 45       | 15   |
| 1  | Monday   | 10   | 17       | 5.67 |
| 1  | Monday   | 12   | 18       | 6    |
| 2  | Tuesday  | 08   | 16       | 3.2  |
| 2  | Tuesday  | 09   | 39       | 7.8  |
| 2  | Tuesday  | 10   | 24       | 4.8  |
| 2  | Tuesday  | 11   | 37       | 7.4  |
| 2  | Tuesday  | 12   | 40       | 8    |
How can I get the result in a MySQL query?
The first table is the result of this query:
SELECT id, category, Hour, COUNT(*) AS quantity
FROM table_1
GROUP BY id, Hour
ORDER BY id, Hour;
This is what I tried in order to get the number of rows for each id, but I get a much larger number: the count of id=0 occurrences in the base table rather than the number of id=0 rows in the previous query's result:
select id, Hour, count(id) as q
FROM table_1
GROUP by id
This is MySQL 5.6.
This is really quite ugly and cumbersome, but it was the only way to get the results without having a primary key to work with:
SELECT
    t.id,
    t.category,
    t.hour,
    quantity,
    ROUND(quantity / count, 2) AS avg
FROM table_1 t
JOIN (SELECT id, Hour, COUNT(*) AS quantity
      FROM table_1
      GROUP BY id, category, Hour) AS qty
    ON t.id = qty.id AND t.hour = qty.hour
JOIN (SELECT id, COUNT(DISTINCT hour) AS count
      FROM table_1
      GROUP BY id) AS counts
    ON t.id = counts.id
GROUP BY id, hour;
It seems to be working locally for me, at least, guessing at what your original dataset looks like.
There may well be a simpler way, however.
Edit: On second check, the 'quantity' subquery doesn't really add much that I can see, so it can be replaced with a plain count(*), making for a simpler query:
SELECT
    t.id,
    t.category,
    t.hour,
    COUNT(*) AS quantity,
    ROUND(COUNT(*) / count, 2) AS avg
FROM table_1 t
JOIN (SELECT id, COUNT(DISTINCT hour) AS count
      FROM table_1
      GROUP BY id) AS counts
    ON t.id = counts.id
GROUP BY id, hour;
You need to do the counting in a subquery that just groups by id. Join the subquery to the main query and do the division.
SELECT table_1.id, category, hour, COUNT(*) AS quantity, COUNT(*) / count AS avg
FROM table_1
JOIN (SELECT id, COUNT(DISTINCT hour) AS count
      FROM table_1
      GROUP BY id) AS counts
    ON table_1.id = counts.id
GROUP BY table_1.id, table_1.hour
ORDER BY table_1.id, table_1.hour;
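For what it's worth, if this ever moves past 5.6 to MySQL 8.0 or later, a window function does the per-id row count in a single pass. This is only a minimal sketch, assuming the same table_1 as above (untested against your data):

-- Sketch for MySQL 8.0+: count the grouped rows per id with a window function
-- instead of joining a second aggregated subquery.
SELECT id, category, Hour, quantity,
       ROUND(quantity / COUNT(*) OVER (PARTITION BY id), 2) AS avg
FROM (
    SELECT id, category, Hour, COUNT(*) AS quantity
    FROM table_1
    GROUP BY id, category, Hour
) AS grouped
ORDER BY id, Hour;

Here COUNT(*) OVER (PARTITION BY id) counts the grouped rows per id (4, 3 and 5 in your example), which is exactly the divisor you want.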
The t_loan table looks like:
+-----------+----------+---------------+------------------+-----------------------+------------+--------------+------+
| pk_IdLoan | fk_IdCar | fk_IdCustomer | fk_Source_Agency | fk_Destination_Agency | RentalDate | DeliveryDate | Cost |
+-----------+----------+---------------+------------------+-----------------------+------------+--------------+------+
I wrote a query:
(SELECT fk_IdCustomer, MONTHNAME(RentalDate) AS Month, YEAR(RentalDate) As Year, COUNT(*)
FROM t_loan
GROUP BY fk_IdCustomer, Month, Year);
which results in
+---------------+-------------+------+----------+
| fk_IdCustomer | Month | Year | COUNT(*) |
+---------------+-------------+------+----------+
| 1 | July | 2016 | 3 |
| 1 | November | 2017 | 1 |
| 1 | September | 2016 | 7 |
| 5 | May | 2016 | 1 |
| 6 | January | 2016 | 1 |
| 6 | September | 2017 | 2 |
+---------------+-------------+------+----------+
Now I want to get, for each customer, the month and year with the highest COUNT(*), e.g.:
+---------------+-------------+------+----------+
| fk_IdCustomer | Month | Year | COUNT(*) |
+---------------+-------------+------+----------+
| 1 | September | 2016 | 7 |
| 5 | May | 2016 | 1 |
| 6 | September | 2017 | 2 |
+---------------+-------------+------+----------+
How to achieve this?
This is a bit painful in MySQL before 8.0, which doesn't support CTEs or window functions. One method is:
SELECT fk_IdCustomer, MONTHNAME(RentalDate) AS Month,
YEAR(RentalDate) As Year, COUNT(*) as cnt
FROM t_loan l
GROUP BY fk_IdCustomer, Month, Year
HAVING cnt = (SELECT COUNT(*)
FROM t_loan l2
WHERE l2.fk_IdCustomer = l.fk_IdCustomer
GROUP BY MONTHNAME(RentalDate), YEAR(RentalDate)
ORDER BY COUNT(*) DESC
LIMIT 1
);
Note: If there are duplicates, you will get all matching values.
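For completeness: on MySQL 8.0+ the same result falls out of ROW_NUMBER(). This is only a sketch against the t_loan table from the question; note that on ties it keeps a single arbitrary row per customer, unlike the HAVING version above, which returns all matching rows.

-- Sketch for MySQL 8.0+: rank each customer's month/year by its count and keep the top one.
SELECT fk_IdCustomer, Month, Year, cnt
FROM (
    SELECT fk_IdCustomer,
           MONTHNAME(RentalDate) AS Month,
           YEAR(RentalDate) AS Year,
           COUNT(*) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY fk_IdCustomer ORDER BY COUNT(*) DESC) AS rn
    FROM t_loan
    GROUP BY fk_IdCustomer, MONTHNAME(RentalDate), YEAR(RentalDate)
) AS ranked
WHERE rn = 1;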
I have a script that works perfectly, but I need to add values from another table.
The current script is:
select v.id, vm.producto_id, sum(vm.total), count(v.id)
from visita v, reporte r, visitamaquina vm, maquina m,
(select r.id, empleado_id, fecha, cliente_id from ruta r, rutacliente rc where r.id=rc.ruta_id and
fecha>='2016-10-01' and fecha<='2016-10-30' group by fecha, cliente_id, empleado_id) as rem
where rem.fecha=v.fecha and v.cliente_Id=rem.cliente_id and r.visita_id=v.id and vm.visita_id=v.id and m.id=vm.maquina_id
group by vm.visita_id, vm.producto_id
The current script returns this (I need some extra columns, but for this purpose I only show the ones with issues):
| Producto_Id | Id | Total | count(id) |
|---------------|--------------|-----------|-----------|
| 1 | 31 | 21 | 2 |
| 2 | 31 | 15 | 3 |
| 3 | 31 | 18 | 2 |
Table VisitaMaquina has multiple records for the same producto_id; it contains:
| Producto_Id | Visita_Id | Total |
|---------------|--------------|-----------|
| 1 | 31 | 8 |
| 1 | 31 | 13 |
| 2 | 31 | 9 |
The same situation happens with the reporteproducto table, where producto_id is repeated multiple times.
Table reporteproducto has
| Producto_Id | Visita_Id | Quantity |
|---------------|--------------|-----------|
| 1 | 31 | 4 |
| 1 | 31 | 7 |
| 2 | 31 | 5 |
My previous query works fine, and I just need to add the sum of quantity.
I used this script, and this is what I got:
select v.id, vm.producto_id, sum(vm.total), sum(quantity), count(id)
from visita v, reporte r, visitamaquina vm, maquina m, reporteproducto rp,
(select r.id, empleado_id, fecha, cliente_id from ruta r, rutacliente rc where r.id=rc.ruta_id and
fecha>='2016-10-01' and fecha<='2016-10-30' group by fecha, cliente_id, empleado_id) as rem
where rem.fecha=v.fecha and v.cliente_Id=rem.cliente_id and r.visita_id=v.id and vm.visita_id=v.id and m.id=vm.maquina_id and rp.visita_Id=v.id and rp.producto_id=vm.producto_id
group by vm.visita_id, vm.producto_id
I got this
|Producto_Id | Visita_Id | Total |Quantity | count(id)
|---------------|--------------|-----------|-----------|-----------|
| 1 | 31 | 42 | 11 | 4 |
| 2 | 31 | 45 | 18 | 6 |
| 3 | 31 | 36 | 44 | 4 |
The desired result is (focus on producto_id=1):
|Producto_Id | Visita_Id | Total |Quantity |
|---------------|--------------|-----------|-----------|
| 1 | 31 | 21 | 11 |
| 2 | 31 | 15 | 18 |
| 3 | 31 | 18 | 44 |
Any Idea on how to solve this?
Better to pre-aggregate the sub-tables that have multiple rows per combination of your outer GROUP BY columns. In your case, visitamaquina and reporteproducto should each be grouped by visita_id, producto_id, since both have repeated rows for the same combination (e.g. visita_id=31 and producto_id=1).
You can replace the visitamaquina vm and reporteproducto rp table references with subqueries like the following:
(select visita_id, Producto_Id, sum(Total) as Total from visitamaquina
group by visita_id, Producto_Id) vm,
(select Producto_Id, Visita_Id, sum(Quantity) as Quantity from reporteproducto
group by Producto_Id, Visita_Id) rp
Also, I see m.id=vm.maquina_id in your WHERE clause; maybe that contributes to the problem. If visitamaquina and reporteproducto both have repeated values for visita_id, producto_id, then Total and Quantity should both come out multiplied. In your output the Quantity looks right, which is odd.
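Putting both changes together (the pre-aggregated subqueries, and dropping the maquina join, since the aggregated vm no longer exposes maquina_id), the whole query would look roughly like this. It is only a sketch built from the query in the question; I have not run it against your schema:

-- Sketch: the original query with visitamaquina and reporteproducto pre-aggregated
-- per (visita_id, producto_id), so their repeated rows can no longer multiply each other.
SELECT v.id, vm.producto_id, vm.total, rp.quantity
FROM visita v, reporte r,
     (SELECT visita_id, producto_id, SUM(total) AS total
      FROM visitamaquina
      GROUP BY visita_id, producto_id) AS vm,
     (SELECT visita_id, producto_id, SUM(quantity) AS quantity
      FROM reporteproducto
      GROUP BY visita_id, producto_id) AS rp,
     (SELECT empleado_id, fecha, cliente_id
      FROM ruta r2, rutacliente rc
      WHERE r2.id = rc.ruta_id
        AND fecha >= '2016-10-01' AND fecha <= '2016-10-30'
      GROUP BY fecha, cliente_id, empleado_id) AS rem
WHERE rem.fecha = v.fecha
  AND v.cliente_id = rem.cliente_id
  AND r.visita_id = v.id
  AND vm.visita_id = v.id
  AND rp.visita_id = v.id
  AND rp.producto_id = vm.producto_id
GROUP BY vm.visita_id, vm.producto_id;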
My mistake; I actually got this:
|Producto_Id | Visita_Id | Total |Quantity | count(id)
|---------------|--------------|-----------|-----------|-----------|
| 1 | 31 | 42 | 22 | 4 |
| 2 | 31 | 45 | 36 | 6 |
| 3 | 31 | 36 | 88 | 4 |
I have these tables (SQL Fiddle):
items table:
+----+----------+
| id | name |
+----+----------+
| 1 | Facebook |
| 2 | Twitter |
| 3 | Amazon |
+----+----------+
prices table:
+----+-----------+---------+-----------------------------+
| id | buy | item_id | created_at |
+----+-----------+---------+-----------------------------+
| 1 | 43000 | 1 | June, 18 2014 17:31:04+0000 |
| 2 | 44000 | 1 | June, 19 2014 17:31:04+0000 |
| 3 | 30000 | 2 | June, 20 2014 17:31:04+0000 |
| 4 | 33000 | 2 | June, 21 2014 17:31:04+0000 |
| 5 | 20000 | 3 | June, 22 2014 17:31:04+0000 |
| 6 | 21000 | 3 | June, 23 2014 17:31:04+0000 |
+----+-----------+---------+-----------------------------+
I want to get the last price per item, plus the buy value of the price just before it, based on the price date.
Desired output:
+----+---------+-----------------+---------+
| id | buy | last_before_buy | item_id |
+----+---------+-----------------+---------+
| 10 | 45000 | 43000 | 3 |
| 7 | 33000 | 31000 | 2 |
| 4 | 23000 | 23000 | 1 |
+----+---------+-----------------+---------+
Here's another way to do it:
select a.id, a.buy, b.buy last_before_buy, a.item_id
from (select * from prices WHERE (created_at <= NOW() - INTERVAL 5 DAY) order by id desc) a
join (select * from prices order by id desc) b on a.item_id = b.item_id and a.id > b.id
group by a.item_id;
fiddle
You can do this with the substring_index()/group_concat() trick:
select max(id) as id,
substring_index(group_concat(buy order by created_at desc), ',', 1) as buy,
substring_index(substring_index(group_concat(buy order by created_at desc), ',', 2), ',', -1) as lastbuy,
item_id
from prices p
group by item_id;
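And if the server is MySQL 8.0 or newer, window functions avoid the group_concat()/substring_index() trick entirely. A sketch against the prices table from the fiddle:

-- Sketch for MySQL 8.0+: keep the latest price row per item and carry the
-- previous price along with LAG().
SELECT id, buy, last_before_buy, item_id
FROM (
    SELECT id, buy, item_id,
           LAG(buy) OVER (PARTITION BY item_id ORDER BY created_at) AS last_before_buy,
           ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY created_at DESC) AS rn
    FROM prices
) AS p
WHERE rn = 1;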
I have three tables with schema as below:
Table: Apps
| ID (bigint) | USERID (Bigint)| START_TIME (datetime) |
-------------------------------------------------------------
| 1 | 13 | 2013-05-03 04:42:55 |
| 2 | 13 | 2013-05-12 06:22:45 |
| 3 | 13 | 2013-06-12 08:44:24 |
| 4 | 13 | 2013-06-24 04:20:56 |
| 5 | 13 | 2013-06-26 08:20:26 |
| 6 | 13 | 2013-09-12 05:48:27 |
Table: Hosts
| ID (bigint) | APPID (Bigint)| DEVICE_ID (Bigint) |
-------------------------------------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 1 |
| 4 | 3 | 3 |
| 5 | 1 | 4 |
| 6 | 2 | 3 |
Table: Usage
| ID (bigint) | APPID (Bigint)| HOSTID (Bigint) | Factor (varchar) |
-------------------------------------------------------------------------------------
| 1 | 1 | 1 | Low |
| 2 | 1 | 3 | High |
| 3 | 2 | 2 | Low |
| 4 | 3 | 4 | Medium |
| 5 | 1 | 5 | Low |
| 6 | 2 | 2 | Medium |
Now, given an input userid, I want to get the count of Usage rows for each month (across all apps), for each Factor, month-wise, for the last 6 months.
If a DEVICE_ID appears more than once in a month (based on START_TIME, joining Apps and Hosts), only the latest rows of Usage (based on the combination of Apps, Hosts and Usage) should be considered when calculating the count.
Example output of the query for the above data (for input userid = 13):
| MONTH | USAGE_COUNT | FACTOR |
-------------------------------------------------------------
| 5 | 0 | High |
| 6 | 0 | High |
| 7 | 0 | High |
| 8 | 0 | High |
| 9 | 0 | High |
| 10 | 0 | High |
| 5 | 2 | Low |
| 6 | 0 | Low |
| 7 | 0 | Low |
| 8 | 0 | Low |
| 9 | 0 | Low |
| 10 | 0 | Low |
| 5 | 1 | Medium |
| 6 | 1 | Medium |
| 7 | 0 | Medium |
| 8 | 0 | Medium |
| 9 | 0 | Medium |
| 10 | 0 | Medium |
How is this calculated?
For month May 2013 (05-2013), there are two Apps rows in table Apps.
In table Hosts, these apps are associated with device_ids 1, 1, 1, 4, 3.
For this month (05-2013), for device_id=1, the latest value of start_time is 2013-05-12 06:22:45 (from tables Hosts and Apps), so in table Usage, look for the combination appid=2 and hostid=2, for which there are two rows, one with factor Low and the other Medium.
For this month (05-2013), for device_id=4, following the same procedure we get one entry, i.e. Low.
Similarly all the other values are calculated.
To get the last 6 months via a query, I'm trying the following:
SELECT MONTH(DATE_ADD(NOW(), INTERVAL aInt MONTH)) AS aMonth
FROM
(
    SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5
) AS months  -- MySQL requires every derived table to have an alias
Please check sqlfiddle: http://sqlfiddle.com/#!2/55fc2
Because the calculation you're doing involves the same join multiple times, I started by creating a view.
CREATE VIEW `app_host_usage`
AS
SELECT a.id "appid", h.id "hostid", u.id "usageid",
a.userid, a.start_time, h.device_id, u.factor
FROM apps a
LEFT OUTER JOIN hosts h ON h.appid = a.id
LEFT OUTER JOIN `usage` u ON u.appid = a.id AND u.hostid = h.id
WHERE a.start_time > DATE_ADD(NOW(), INTERVAL -7 MONTH)
The WHERE condition is there because I made the assumption that you don't want July 2005 and July 2006 to be grouped together in the same count.
With that view in place, the query becomes
SELECT months.Month, COUNT(DISTINCT device_id), factors.factor
FROM
(
-- Get the last six months
SELECT (MONTH(NOW()) + aInt + 11) % 12 + 1 "Month" FROM
(SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5) LastSix
) months
JOIN
(
-- Get all known factors
SELECT DISTINCT factor FROM `usage`
) factors
LEFT OUTER JOIN
(
-- Get factors for each device...
SELECT
MONTH(start_time) "Month",
device_id,
factor
FROM app_host_usage a
WHERE userid=13
AND start_time IN (
-- ...where the corresponding usage row is connected
-- to an app row with the highest start time of the
-- month for that device.
SELECT MAX(start_time)
FROM app_host_usage a2
WHERE a2.device_id = a.device_id
GROUP BY MONTH(start_time)
)
GROUP BY MONTH(start_time), device_id, factor
) usageids ON usageids.Month = months.Month
AND usageids.factor = factors.factor
GROUP BY factors.factor, months.Month
ORDER BY factors.factor, months.Month
which is insanely complicated, but I've added comments explaining what each part does. See this sqlfiddle: http://sqlfiddle.com/#!2/5c871/1/0
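One further thought, offered only as a sketch: if you want the month buckets to be year-aware (so that, say, May 2013 and May 2014 can never share a bucket), the last-six-months derived table can emit 'YYYY-MM' strings instead of bare month numbers, and the rest of the query can join on that key instead of Month. The ym alias below is my own:

-- Sketch: the last six calendar months as 'YYYY-MM' strings.
SELECT DATE_FORMAT(DATE_ADD(NOW(), INTERVAL aInt MONTH), '%Y-%m') AS ym
FROM (SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2
      UNION SELECT -3 UNION SELECT -4 UNION SELECT -5) AS LastSix;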
Is there any way to count a given run of timestamps that are close to each other, but not necessarily in a fixed time frame?
I.e., not grouped by hour or minute, but rather grouped by how close the current row's timestamp is to the next row's timestamp: if the next row is within "x" seconds/minutes, add that row to the group, otherwise start a new group.
Given this data:
+----+---------+---------------------+
| id | item_id | event_date |
+----+---------+---------------------+
| 1 | 1 | 2013-05-17 11:59:59 |
| 2 | 1 | 2013-05-17 12:00:00 |
| 3 | 1 | 2013-05-17 12:00:02 |
| 4 | 1 | 2013-05-17 12:00:03 |
| 5 | 3 | 2013-05-17 14:05:00 |
| 6 | 3 | 2013-05-17 14:05:01 |
| 7 | 3 | 2013-05-17 15:30:00 |
| 8 | 3 | 2013-05-17 15:30:01 |
| 9 | 3 | 2013-05-17 15:30:02 |
| 10 | 1 | 2013-05-18 09:12:00 |
| 11 | 1 | 2013-05-18 09:13:30 |
| 12 | 1 | 2013-05-18 09:13:45 |
| 13 | 1 | 2013-05-18 09:14:00 |
| 14 | 2 | 2013-05-20 15:45:00 |
| 15 | 2 | 2013-05-20 15:45:03 |
| 16 | 2 | 2013-05-20 15:45:10 |
| 17 | 2 | 2013-05-23 07:36:00 |
| 18 | 2 | 2013-05-23 07:36:10 |
| 19 | 2 | 2013-05-23 07:36:12 |
| 20 | 2 | 2013-05-23 07:36:15 |
| 21 | 1 | 2013-05-24 11:55:00 |
| 22 | 1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+
Desired Results:
+---------+-------+---------------------+
| item_id | total | last_date_in_group |
+---------+-------+---------------------+
| 1 | 4 | 2013-05-17 12:00:03 |
| 3 | 2 | 2013-05-17 14:05:01 |
| 3 | 3 | 2013-05-17 15:30:02 |
| 1 | 4 | 2013-05-18 09:14:00 |
| 2 | 3 | 2013-05-20 15:45:10 |
| 2 | 4 | 2013-05-23 07:36:15 |
| 1 | 2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+
This is a little complicated. To start, you need the time of the next event for each record. The following subquery adds in such a time (nexted), if it is within bounds:
select t.*,
(select event_date
from t t2
where t2.item_id = t.item_id and
t2.event_date > t.event_date and
<date comparison here>
order by event_date limit 1
) as nexted
from t
This uses a correlated subquery. The <date comparison here> is for whatever date comparison you want. When there is no record, the value will be NULL.
Now, with this information (nexted), there is a trick to get the grouping. For any record, the group identifier is the first event time at or after it where nexted is NULL; this will be the last event in the series. Unfortunately, this requires two levels of nested correlated subqueries (or joins with aggregations). The result looks a bit unwieldy:
select item_id, grouping, MIN(event_date) as start_date, MAX(event_date) as end_date,
       COUNT(*) as num_dates
from (select t.*,
             -- grouping: the first event at or after this one whose nexted is NULL,
             -- i.e. the last event of the run this record belongs to
             (select min(t2.event_date)
              from (select t1.*,
                           (select event_date
                            from t t3
                            where t3.item_id = t1.item_id and
                                  t3.event_date > t1.event_date and
                                  <date comparison here>
                            order by event_date limit 1
                           ) as nexted
                    from t t1
                   ) t2
              where t2.item_id = t.item_id and
                    t2.event_date >= t.event_date and
                    t2.nexted is null
             ) as grouping
      from t
     ) s
group by item_id, grouping;
What about approaching it by finding each individual record's local associations, and then grouping on the max event date from each record's discoveries? This is based on a fixed time interval (5 minutes in my example):
SELECT item_id, MAX(total) AS total, MAX(last_date_in_group) AS last_date_in_group
FROM (
    SELECT t1.item_id, COUNT(*) AS total,
           COALESCE(GREATEST(t1.event_date, MAX(t2.event_date)), t1.event_date) AS last_date_in_group
    FROM table_name t1
    LEFT JOIN table_name t2
           ON t2.item_id = t1.item_id
          AND t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
    GROUP BY t1.id
) t
GROUP BY item_id, last_date_in_group;
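For anyone landing here on MySQL 8.0 or later, the usual gaps-and-islands idiom with window functions gives this grouping directly. A sketch, assuming the same table and a 5-minute threshold:

-- Sketch for MySQL 8.0+: start a new group whenever the gap to the previous
-- event of the same item exceeds 5 minutes, then aggregate per group.
SELECT item_id,
       COUNT(*) AS total,
       MAX(event_date) AS last_date_in_group
FROM (
    SELECT item_id, event_date,
           SUM(new_group) OVER (PARTITION BY item_id ORDER BY event_date) AS grp
    FROM (
        SELECT item_id, event_date,
               CASE WHEN TIMESTAMPDIFF(MINUTE,
                        LAG(event_date) OVER (PARTITION BY item_id ORDER BY event_date),
                        event_date) <= 5
                    THEN 0 ELSE 1 END AS new_group
        FROM table_name
    ) AS flagged
) AS grouped
GROUP BY item_id, grp
ORDER BY last_date_in_group;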