Select distinct values from query excluding sorting-purpose column - mysql

Description:
I want to select the categories of my site's content. Most of them will be created by users, so I will have to deal with a large number of categories in the table. I also want to reflect current content trends on my site. My solution is:
Select all categories from the past 2 days and sort them by number of appearances (descending),
Union query (distinct)
Select all categories older than the past 2 days and sort them the same way.
This gives me the most popular categories from a short recent period plus the most popular categories overall.
Query:
(SELECT category, COUNT(*) AS number FROM data WHERE date BETWEEN ADDDATE(NOW(), INTERVAL -2 DAY) AND NOW() GROUP BY category)
UNION
(SELECT category, COUNT(*) AS number FROM data WHERE date < ADDDATE(NOW(), INTERVAL -2 DAY) GROUP BY category)
ORDER BY number DESC LIMIT 50
Output:
+----------+--------+
| category | number |
+----------+--------+
| 2 | 3 |
| 4 | 3 |
| 6 | 3 |
| 5 | 2 |
| 1 | 2 |
| 2 | 1 |
+----------+--------+
6 rows in set (0.00 sec)
Note the duplicated category (id 2). UNION DISTINCT (the default) does not exclude it because it compares whole rows, that is, both columns, so:
+----------+--------+
| category | number |
+----------+--------+
| 2 | 3 | //is not equal to
| 2 | 1 | //below values
+----------+--------+
//won't be excluded
Problem to solve:
I need to select distinct values from only category column.
(number is only for sorting purposes and used only in this query)

If I understand your question correctly, this should be the query that you need:
SELECT category
FROM (
SELECT category, COUNT(*) AS number
FROM data WHERE date BETWEEN ADDDATE(NOW(), INTERVAL -2 DAY) AND NOW()
GROUP BY category
UNION ALL
SELECT category, COUNT(*) AS number
FROM data WHERE date < ADDDATE(NOW(), INTERVAL -2 DAY)
GROUP BY category
ORDER BY number DESC
) s
GROUP BY category
ORDER BY MAX(number) DESC
LIMIT 50
I removed the parentheses around the two queries that make up your UNION, because the ORDER BY of the UNION is applied to the combined result anyway. I also used UNION ALL instead of UNION because the categories are grouped again in the outer query; I would try both UNION and UNION ALL to see which one is faster.
Then I group again by category, order by the MAX(number) of each category, and keep only the first 50 rows.
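To check the behaviour on the sample output from the question, here is a minimal sketch that runs the same shape of query through Python's sqlite3 module. SQLite only stands in for MySQL here, so the cutoff is passed as a parameter instead of being computed with ADDDATE(NOW(), ...). The function name top_categories, the sample rows, and the extra tie-break on category are assumptions made for the illustration.

```python
import sqlite3


def top_categories(rows, cutoff, limit=50):
    """Return distinct categories ordered by their best count in either period.

    rows   -- iterable of (category, date) pairs, dates as ISO strings
    cutoff -- ISO date string separating "recent" rows from older rows
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (category INTEGER, date TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    sql = """
        SELECT category
        FROM (
            SELECT category, COUNT(*) AS number
            FROM data WHERE date >= :cutoff
            GROUP BY category
            UNION ALL
            SELECT category, COUNT(*) AS number
            FROM data WHERE date < :cutoff
            GROUP BY category
        ) s
        GROUP BY category
        -- Assumption: ties are broken by category so the output is stable.
        ORDER BY MAX(number) DESC, category
        LIMIT :limit
    """
    result = [r[0] for r in conn.execute(sql, {"cutoff": cutoff, "limit": limit})]
    conn.close()
    return result


# Hypothetical rows mirroring the counts in the question's output:
# category 2 appears 3 times recently and once in the older period.
SAMPLE_ROWS = (
    [(2, "2024-01-10")] * 3 + [(4, "2024-01-10")] * 3 + [(6, "2024-01-10")] * 3
    + [(5, "2024-01-01")] * 2 + [(1, "2024-01-01")] * 2 + [(2, "2024-01-01")]
)

print(top_categories(SAMPLE_ROWS, "2024-01-09"))
```

Category 2 shows up only once in the result because the outer GROUP BY collapses both of its rows and MAX(number) keeps the larger count for sorting.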

Combine multiple table and use Group By Function in MYSQL

I have 5 different datasets from 5 different tables. From those tables I have taken the grouped data below.
select number,count(*) as total from tb01 group by number limit 5;
select number,count(*) as total from tb02 group by number limit 5;
Like that I can retrieve 5 different datasets. Here is an example.
+-----------+-------+
| number | total |
+-----------+-------+
| 114000259 | 1 |
| 114000400 | 1 |
| 114000686 | 1 |
| 114000858 | 1 |
| 114003895 | 1 |
+-----------+-------+
Now I need to combine those 5 different tables such as below tabular format.
+-----------+-------+-------+-------+
| number | tb01 | tb02 | tb03 |
+-----------+-------+-------+-------+
| 114000259 | 1 | 2 | 1 |
| 114000400 | 1 | 0 | 1 |
| 114000686 | 1 | 3 | 1 |
| 114000858 | 1 | 1 | 5 |
| 114003895 | 1 | 0 | 1 |
+-----------+-------+-------+-------+
Can someone help me combine those 5 grouped datasets and get the combined result shown above?
Note: the headers do not need to match the table names; they can be anything.
Also, I don't need the LIMIT 5; it is only there to show a sample of 5 rows. I have a large dataset.
It's a job for JOINs and subqueries. My answer will consider three tables. It should be obvious how to expand it to five.
Your first subquery: get all possible numbers.
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
Then you have a subquery for each table to get the count.
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
Then you LEFT JOIN everything and SELECT from that.
SELECT numbers.number,
tb01.total tb01,
tb02.total tb02,
tb03.total tb03
FROM (
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
) numbers
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb01 GROUP BY number
) tb01 ON numbers.number = tb01.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
) tb02 ON numbers.number = tb02.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb03 GROUP BY number
) tb03 ON numbers.number = tb03.number
You can add ORDER BY and LIMIT clauses to that overall query as necessary.
The first subquery together with the LEFT JOIN ensures that you get results even if some of your tables are missing number rows. (Some DBMSs have FULL OUTER JOIN, but MySQL does not.)
Pro tip: If you use LIMIT without ORDER BY, you get an unpredictable subset of your rows. Unpredictable is worse than random, because you get the same subset in testing with small tables, but when your tables grow you may start getting different subsets. You'll never catch the problem in unit testing. LIMIT without ORDER BY is a serious error.
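As a rough check of the LEFT JOIN approach, the sketch below runs the three-table version against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL. The function name combined_counts, the subquery aliases c1 to c3, the sample numbers, and the COALESCE calls (which turn missing counts into 0, as in the desired output) are assumptions added for the illustration.

```python
import sqlite3


def combined_counts(tables):
    """tables: dict mapping 'tb01', 'tb02', 'tb03' to lists of numbers."""
    conn = sqlite3.connect(":memory:")
    for name in ("tb01", "tb02", "tb03"):
        conn.execute(f"CREATE TABLE {name} (number INTEGER)")
        conn.executemany(
            f"INSERT INTO {name} VALUES (?)", [(n,) for n in tables[name]]
        )
    sql = """
        SELECT numbers.number,
               COALESCE(c1.total, 0) AS tb01,  -- assumption: show 0, not NULL
               COALESCE(c2.total, 0) AS tb02,
               COALESCE(c3.total, 0) AS tb03
        FROM (
            SELECT number FROM tb01 UNION
            SELECT number FROM tb02 UNION
            SELECT number FROM tb03
        ) numbers
        LEFT JOIN (
            SELECT number, COUNT(*) AS total FROM tb01 GROUP BY number
        ) c1 ON numbers.number = c1.number
        LEFT JOIN (
            SELECT number, COUNT(*) AS total FROM tb02 GROUP BY number
        ) c2 ON numbers.number = c2.number
        LEFT JOIN (
            SELECT number, COUNT(*) AS total FROM tb03 GROUP BY number
        ) c3 ON numbers.number = c3.number
        ORDER BY numbers.number
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# Hypothetical sample data: number 3 is missing from tb01, number 2 from tb02.
SAMPLE_TABLES = {
    "tb01": [1, 2],
    "tb02": [1, 1, 3],
    "tb03": [2, 3, 3, 3],
}

for row in combined_counts(SAMPLE_TABLES):
    print(row)
```

Each join condition compares numbers.number with the subquery for its own table; pointing one of them at a different subquery would silently produce wrong counts for that column.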

How to select distinct based on condition (another column) [duplicate]

I am trying to select distinct values from a table based on a date column. That is, I want to extract the distinct rows with the highest value in the date column.
ID| house | people| date
------------------------------
1 | a | 5 | 2021-10-20
2 | a | 5 | 2022-01-20
3 | b | 4 | 2021-10-20
4 | b | 4 | 2022-01-20
After the query runs, I want the result below:
a | 5 | 2022-01-20
b | 4 | 2022-01-20
I have tried the query below, but I have no idea how to add the condition (show the distinct row with the highest date value).
SELECT DISTINCT house, people FROM Table
I tried SELECT DISTINCT house, people FROM Table WHERE MAX(date) but got some errors.
Any ideas?
You can get the row number for each row, partitioned by house and ordered by date desc, then select only the rows with row number = 1:
select house, people, date
from(select house, people, date, row_number() over(partition by house order by date desc) rn
from table_name) t
where rn = 1
Fiddle
You will need aggregation via group by and the max date, filtering out rows that are older to 1) ensure that your grouping occurs faster and 2) ignore items that have no newer date values.
SELECT house, people, max(`date`)
FROM Table
WHERE `date` > '2021-10-20 00:00:00'
GROUP BY house, people
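The aggregation idea can be tried on the question's sample rows with a small sketch using Python's sqlite3 module as a stand-in for MySQL. The table name houses and the function name latest_per_house are hypothetical, and the hardcoded date filter is left out so that only the grouping is shown. It assumes, as in the sample, that people is the same for every row of a house; if it varied, this grouping would return one row per (house, people) pair.

```python
import sqlite3


def latest_per_house(rows):
    """rows: iterable of (id, house, people, date) with ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE houses (id INTEGER, house TEXT, people INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO houses VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT house, people, MAX(date) AS latest
        FROM houses
        GROUP BY house, people  -- assumption: people is constant per house
        ORDER BY house
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Sample rows copied from the question.
SAMPLE_HOUSES = [
    (1, "a", 5, "2021-10-20"),
    (2, "a", 5, "2022-01-20"),
    (3, "b", 4, "2021-10-20"),
    (4, "b", 4, "2022-01-20"),
]

print(latest_per_house(SAMPLE_HOUSES))
```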

Calculate unique items seen by users via sql

I need help to resolve the next case.
The data that users want to see is accessed through pagination requests, and these requests are later stored in the database in the following form:
+----+---------+-------+--------+
| id | user id | first | amount |
+----+---------+-------+--------+
| 1 | 1 | 0 | 5 |
| 2 | 1 | 10 | 10 |
| 3 | 1 | 10 | 5 |
| 4 | 1 | 15 | 10 |
| 5 | 2 | 0 | 10 |
| 6 | 2 | 0 | 5 |
| 7 | 2 | 10 | 5 |
+----+---------+-------+--------+
The table is ordered by user id asc, first asc, amount desc.
The task is to write an SQL statement that calculates the total unique amount of data each user has seen.
For the first user total amount must be 20, since the request with id=1 returned first 5 items, with id=2 returned another 10 items. Request with id=3 returns data already 'seen' by request with id=2. Request with id=4 intersects with id=2, but still returns 5 'unseen' pieces of data.
For the second user total amount must be 15.
As a result of the SQL statement, I should get the following output:
+---------+-------+
| user id | total |
+---------+-------+
| 1 | 20 |
+---------+-------+
| 2 | 15 |
+---------+-------+
I am using MySQL 5.7, so window functions are not available to me. I have been stuck on this task for a day already and still cannot get the desired output. If it is not possible with this setup, I will end up calculating the results in the application code. I would appreciate any suggestions or help with this task, thank you!
This is a type of gaps and islands problem. In this case, use a cumulative max to determine if one request intersects with a previous request. If not, that is the beginning of an "island" of adjacent requests. A cumulative sum of the beginnings assigns an "island", then an aggregation counts each island.
So, the islands look like this:
select userid, min(first), max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp;
You then want this summed by userid, so that is one more level of aggregation:
with islands as (
select userid, min(first) as first, max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp
)
select userid, sum(last - first) as total
from islands
group by userid;
Here is a db<>fiddle.
This logic is similar to Gordon's, but runs on older releases of MySQL, too.
select userid
-- overall length minus gaps
,max(maxlast)-min(minfirst) + sum(gaplen) as total
from
(
select userid
,prevlast
,min(first) as minfirst -- first of group
,max(last) as maxlast -- last of group
-- if there was a gap, calculate length of gap
,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
from
(
select t.*
,first + amount as last -- last value in range
,( -- maximum end of all previous rows
select max(first + amount)
from t as t2
where t2.userid = t.userid
and t2.first < t.first
) as prevlast
from t
) as dt
group by userid, prevlast
) as dt
group by userid
order by userid
See fiddle
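For comparison, the fallback mentioned in the question (computing the totals in application code) comes down to merging overlapping ranges per user. The following is a minimal Python sketch of that idea, not a translation of either SQL answer; the function name unique_seen and the tuple layout are assumptions.

```python
from collections import defaultdict


def unique_seen(requests):
    """Total number of distinct items seen per user.

    requests -- iterable of (user_id, first, amount); each request covers
                the half-open range [first, first + amount).
    """
    by_user = defaultdict(list)
    for user_id, first, amount in requests:
        by_user[user_id].append((first, first + amount))

    totals = {}
    for user_id, spans in by_user.items():
        total = 0
        covered_to = None  # end of the range covered so far
        for start, end in sorted(spans):
            if covered_to is None or start > covered_to:
                # A gap before this request: the whole range is new.
                total += end - start
                covered_to = end
            elif end > covered_to:
                # Overlaps or touches the covered range: count only the new tail.
                total += end - covered_to
                covered_to = end
        totals[user_id] = total
    return totals


# Rows from the question as (user id, first, amount).
SAMPLE_REQUESTS = [
    (1, 0, 5), (1, 10, 10), (1, 10, 5), (1, 15, 10),
    (2, 0, 10), (2, 0, 5), (2, 10, 5),
]

print(unique_seen(SAMPLE_REQUESTS))
```

On the sample rows this gives 20 for user 1 and 15 for user 2, matching the totals the question expects.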

SQL - Multiple grouping within report

I have a table structured similarly to this:
ID
Incident_Name
Category
Source
I need a report that shows all incidents grouped by category, plus the number of incidents in each category that have a certain source value.
Category | Amount | Percentage of Total | Source_1 | Source_2 | Source 3
----------------------------------------------------------------------------
Category 1 | 5 | 25% | 1 | 3 | 2
Category 2 | 15 | 75% | 10 | 2 | 3
I'm using MySQL. How would I go about doing this?
Grouping and getting the amount/percentage is fine, but I'm not sure how to do the rest.
SELECT Category, COUNT(*) AS Amount, (COUNT(*) / (SELECT COUNT(*) FROM MyTable)) * 100 AS 'Percentage of Total'
FROM MyTable
GROUP BY Category;
Any advice?
I think this is what you are trying to do.
SELECT Category,
COUNT(*) AS Amount,
(COUNT(*) / (SELECT COUNT(*) FROM MyTable)) * 100 AS 'Percentage of Total',
SUM(source=someval1) as Source_1, -- this may need a change
SUM(source=someval2) as Source_2, -- this may need a change
SUM(source=someval3) as Source_3 -- this may need a change
FROM MyTable
GROUP BY Category;
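The SUM(condition) trick works because a comparison evaluates to 1 or 0, so summing it counts the matching rows. Here is a minimal sketch of that using Python's sqlite3 module as a stand-in for MySQL. The source values 's1' to 's3', the sample incidents, the function name category_report, and the plain Percentage alias are assumptions; the percentage is written as COUNT(*) * 100.0 / total to avoid integer division in SQLite.

```python
import sqlite3


def category_report(rows):
    """rows: iterable of (ID, Incident_Name, Category, Source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable "
        "(ID INTEGER, Incident_Name TEXT, Category TEXT, Source TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT Category,
               COUNT(*) AS Amount,
               COUNT(*) * 100.0 / (SELECT COUNT(*) FROM MyTable) AS Percentage,
               SUM(Source = 's1') AS Source_1,  -- hypothetical source values
               SUM(Source = 's2') AS Source_2,
               SUM(Source = 's3') AS Source_3
        FROM MyTable
        GROUP BY Category
        ORDER BY Category
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Hypothetical incidents: three in category A, one in category B.
SAMPLE_INCIDENTS = [
    (1, "inc-1", "A", "s1"),
    (2, "inc-2", "A", "s2"),
    (3, "inc-3", "A", "s2"),
    (4, "inc-4", "B", "s1"),
]

for row in category_report(SAMPLE_INCIDENTS):
    print(row)
```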

select non group by columns with count in Mysql

I have a table tbl with three columns:
id | fk | dateof
1 | 1 | 2016-01-01
2 | 1 | 2016-01-02
3 | 2 | 2016-02-01
4 | 2 | 2016-03-01
5 | 3 | 2016-04-01
I want to get the results like this
Id count of Id max(dateof)
2 | 2 | 2016-01-02
4 | 2 | 2016-03-01
5 | 1 | 2016-04-01
My try
SELECT id,tbl.dateof dateof
FROM tbl
INNER JOIN
(SELECT fk, MAX(dateof) dateof ,
count(id) cnt_of_id -- How to get this count value in the result
FROM tbl
GROUP BY fk) temp
ON tbl.fk = temp.fk AND tbl.dateof = temp.dateof
This is an aggregation query, but you don't seem to want the column being aggregated. That is ok (although you cannot distinguish the fk that defines each row):
select count(*) as CountOfId, max(dateof) as maxdateof
from t
group by fk;
In other words, your subquery is pretty much all you need.
If you have a reasonable amount of data, you can use a MySQL trick:
select substring_index(group_concat(id order by dateof desc), ',', 1) as id,
count(*) as CountOfId, max(dateof) as maxdateof
from t
group by fk;
Note: this is limited by the maximum intermediate size for group_concat(). This parameter can be changed and it is typically large enough for this type of query on a moderately sized table.
You obviously want one result row per fk, so group by it. Then you want the max ID, the row count and the max date for each fk:
select
max(id) as max_id,
count(*) as cnt,
max(dateof) as max_date_of
from tbl
group by fk;
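Another option is to keep the join from the question and simply select the count from the subquery, which returns the id of the row holding the latest date rather than the largest id. Below is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the function name latest_with_count is hypothetical. It assumes dateof is unique within each fk, otherwise tied rows would each appear in the result.

```python
import sqlite3


def latest_with_count(rows):
    """rows: iterable of (id, fk, dateof) with ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, fk INTEGER, dateof TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    sql = """
        SELECT tbl.id, temp.cnt_of_id, temp.dateof
        FROM tbl
        INNER JOIN (
            SELECT fk, MAX(dateof) AS dateof, COUNT(id) AS cnt_of_id
            FROM tbl
            GROUP BY fk
        ) temp
            -- Assumption: dateof is unique per fk, so one row matches.
            ON tbl.fk = temp.fk AND tbl.dateof = temp.dateof
        ORDER BY tbl.id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Sample rows copied from the question.
SAMPLE_TBL = [
    (1, 1, "2016-01-01"),
    (2, 1, "2016-01-02"),
    (3, 2, "2016-02-01"),
    (4, 2, "2016-03-01"),
    (5, 3, "2016-04-01"),
]

for row in latest_with_count(SAMPLE_TBL):
    print(row)
```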