I have two MySQL tables A and B, both with this schema:
ID   entity_id   asset   asset_type
-----------------------------------
0    12345       x       1
..   ...         ...     ...
I would like to get the top 10/50/whatever entity_ids with the largest difference in row count between the two tables. I think I could do this manually by getting the row count per entity_id, like so:
select count(*), entity_id
from A
group by entity_id
order by count(*) desc;
and then manually comparing it with the same query for table B. But I'm wondering if there's a way to do this in a single query that compares the row counts for each distinct entity_id and returns the differences. A few notes:
There is an index on entity_id for both tables
Table B will always have an equivalent or greater number of rows for each entity_id
Sample output:
entity_id   difference
----------------------
12345       100
3232        75
5992        40
and so on, for the top 10/50.
Aggregate in each table and join the results to get the difference:
SELECT a.entity_id, b.counter - a.counter diff
FROM (SELECT entity_id, COUNT(*) counter FROM A GROUP BY entity_id) a
INNER JOIN (SELECT entity_id, COUNT(*) counter FROM B GROUP BY entity_id) b
ON a.entity_id = b.entity_id
ORDER BY diff DESC LIMIT 10
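The INNER JOIN keeps only entity_ids that appear in both tables. If B can contain an entity_id that never appears in A (allowed by "B always has an equivalent or greater number of rows"), a LEFT JOIN from B's counts with COALESCE would report it as well. A minimal sketch comparing the two, using Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL; the row counts and the helper fill() are invented for illustration:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the row counts below are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER, entity_id INTEGER, asset TEXT, asset_type INTEGER);
CREATE TABLE B (ID INTEGER, entity_id INTEGER, asset TEXT, asset_type INTEGER);
CREATE INDEX a_entity ON A (entity_id);
CREATE INDEX b_entity ON B (entity_id);
""")


def fill(table, counts):
    """Hypothetical helper: insert counts[entity_id] dummy rows per entity_id."""
    rows = []
    for entity_id, n in counts.items():
        for _ in range(n):
            rows.append((len(rows), entity_id, "x", 1))
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)


fill("A", {1: 1, 2: 2})        # entity 3 has no rows in A at all
fill("B", {1: 4, 2: 2, 3: 3})  # B has >= rows for every entity_id

# The answer's query: entity 3 disappears because it has no match in A.
INNER = """
SELECT a.entity_id, b.counter - a.counter AS diff
FROM (SELECT entity_id, COUNT(*) AS counter FROM A GROUP BY entity_id) a
INNER JOIN (SELECT entity_id, COUNT(*) AS counter FROM B GROUP BY entity_id) b
  ON a.entity_id = b.entity_id
ORDER BY diff DESC, a.entity_id LIMIT 10
"""

# Variant: start from B's counts and treat a missing A count as 0.
LEFT = """
SELECT b.entity_id, b.counter - COALESCE(a.counter, 0) AS diff
FROM (SELECT entity_id, COUNT(*) AS counter FROM B GROUP BY entity_id) b
LEFT JOIN (SELECT entity_id, COUNT(*) AS counter FROM A GROUP BY entity_id) a
  ON a.entity_id = b.entity_id
ORDER BY diff DESC, b.entity_id LIMIT 10
"""

inner_rows = conn.execute(INNER).fetchall()
left_rows = conn.execute(LEFT).fetchall()
print(inner_rows)  # [(1, 3), (2, 0)]
print(left_rows)   # [(1, 3), (3, 3), (2, 0)]
```

The extra entity_id sort key only makes ties come out in a stable order; drop it if that does not matter.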
Related
There is a table named work that contains the following data:
Id Name a_Column work_datetime
-----------------------------------------
1 A A_1 1592110166
2 A A_2 1592110166
3 A A_3 1592110164
4 B B_1 1582111665
5 B B_2 1592110166
6 C C_1 1592110166
If I run a query that groups by Name and selects max(work_datetime), there can be two matching rows for the group Name='A', but I need only one of them, the one with a_Column='A_1', so that the final desired output is:
Id Name a_Column work_datetime
-----------------------------------------
1 A A_1 1592110166
5 B B_2 1592110166
6 C C_1 1592110166
MySQL doesn't seem to support handling duplicate records in a GROUP BY.
Is there any way I can achieve the required result?
A simple option that works on all versions of MySQL is to filter with a subquery:
select w.*
from work w
where w.id = (
select id
from work w1
where w1.name = w.name
order by work_datetime desc, a_column
limit 1
)
For each name, this brings the row with the latest work_datetime; ties are broken by picking the row with the smallest a_column (which is how I understood your requirement).
For performance, you want an index on (name, work_datetime, a_column, id), since the subquery filters on name and then sorts by work_datetime and a_column.
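To check the tie-breaking against the sample rows from the question, here is a small sketch using Python's sqlite3 with an in-memory database as a stand-in for MySQL. The index name work_name_dt is made up, and SQLite's planner is not MySQL's, so this only checks the result, not the performance:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work (id INTEGER PRIMARY KEY, name TEXT, "
    "a_column TEXT, work_datetime INTEGER)"
)
conn.executemany(
    "INSERT INTO work VALUES (?, ?, ?, ?)",
    [
        (1, "A", "A_1", 1592110166),
        (2, "A", "A_2", 1592110166),
        (3, "A", "A_3", 1592110164),
        (4, "B", "B_1", 1582111665),
        (5, "B", "B_2", 1592110166),
        (6, "C", "C_1", 1592110166),
    ],
)
# Hypothetical index name; the columns follow the subquery's filter and sort.
conn.execute("CREATE INDEX work_name_dt ON work (name, work_datetime, a_column, id)")

# For each name, keep the row whose id is first by latest time, then smallest a_column.
LATEST_PER_NAME = """
SELECT w.*
FROM work w
WHERE w.id = (
    SELECT id
    FROM work w1
    WHERE w1.name = w.name
    ORDER BY work_datetime DESC, a_column
    LIMIT 1
)
ORDER BY w.id
"""
rows = conn.execute(LATEST_PER_NAME).fetchall()
for row in rows:
    print(row)
```

For Name='A', rows 1 and 2 tie on work_datetime, and a_column 'A_1' sorts before 'A_2', so row 1 is kept.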
Since version 8 you can use row_number() to number the rows of each name in descending order of time. Do that in a derived table, and then select only the rows where this number is 1.
SELECT x.id,
x.name,
x.a_column,
x.work_datetime
FROM (SELECT w.id,
w.name,
w.a_column,
w.work_datetime,
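row_number() OVER (PARTITION BY w.name
ORDER BY w.work_datetime DESC,
w.a_column) rn
FROM work w) x
WHERE x.rn = 1;
With row_number() there are no duplicates: each name gets exactly one row numbered 1. Ordering by work_datetime DESC puts the latest time first, and the secondary sort on a_column breaks ties, so for Name='A' the row with A_1 wins. Without that secondary key, one of the tied rows would be chosen arbitrarily. If you want to retain the duplicates, replace row_number() with rank() and order by w.work_datetime DESC only.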
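A quick way to see the row_number() and rank() behaviour on the question's rows is the sketch below. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL 8 and assumes the bundled SQLite is 3.25 or newer (needed for window functions). Note the DESC: with an ascending order, rn = 1 would pick the earliest row instead of the latest.

```python
import sqlite3

# In-memory SQLite (3.25+ for window functions) as a stand-in for MySQL 8.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work (id INTEGER PRIMARY KEY, name TEXT, "
    "a_column TEXT, work_datetime INTEGER)"
)
conn.executemany(
    "INSERT INTO work VALUES (?, ?, ?, ?)",
    [
        (1, "A", "A_1", 1592110166),
        (2, "A", "A_2", 1592110166),
        (3, "A", "A_3", 1592110164),
        (4, "B", "B_1", 1582111665),
        (5, "B", "B_2", 1592110166),
        (6, "C", "C_1", 1592110166),
    ],
)

# Latest time first per name; a_column breaks ties, so exactly one rn = 1 per name.
FIRST_PER_NAME = """
SELECT x.id, x.name, x.a_column, x.work_datetime
FROM (SELECT w.id, w.name, w.a_column, w.work_datetime,
             row_number() OVER (PARTITION BY w.name
                                ORDER BY w.work_datetime DESC, w.a_column) AS rn
      FROM work w) x
WHERE x.rn = 1
ORDER BY x.id
"""

# rank() on time alone gives every row tied at the latest time rank 1.
TIED_AT_LATEST = """
SELECT x.id
FROM (SELECT w.id,
             rank() OVER (PARTITION BY w.name
                          ORDER BY w.work_datetime DESC) AS rnk
      FROM work w) x
WHERE x.rnk = 1
ORDER BY x.id
"""

first_rows = conn.execute(FIRST_PER_NAME).fetchall()
tied_ids = [r[0] for r in conn.execute(TIED_AT_LATEST)]
print(first_rows)
print(tied_ids)  # [1, 2, 5, 6]
```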
I have two tables, member_asrama and asrama. I want to count the member_asrama rows for each asrama with a condition, but the result does not show the rows where the count is 0.
Table member_asrama:
id asrama_id period_id
1 1 1
Table asrama
id name
1 A
2 B
My query
SELECT asrama.id,asrama.name, COUNT(*) as cnt
FROM asrama
left join member_asrama
on asrama.id = member_asrama.`asrama_id`
where member_asrama.`period_id` = 1
group by asrama.id
Result
asrama.id asrama.name cnt
1 A 1
The result I want:
asrama.id asrama.name cnt
1 A 1
2 B 0
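Basically, the condition needs to be in the ON clause. In the WHERE clause, member_asrama.period_id is NULL for an asrama with no matching member rows, so the comparison fails and that row is filtered out, which effectively turns the LEFT JOIN into an INNER JOIN: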
select a.id, a.name, count(ma.asrama_id) as cnt
from asrama a left join
member_asrama ma
on a.id = ma.asrama_id and
ma.period_id = 1
group by a.id, a.name;
Note other changes:
COUNT() counts a column from member_asrama, which is NULL when there is no match. That allows 0 in the results.
Table aliases make the query easier to write and to read.
Backticks make the query harder to write and read -- and they are not necessary.
I included both columns in the GROUP BY. Technically, this is not necessary if id is a primary/unique key. However, it is a good habit if you are learning SQL.
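The two points above (filter in ON, count a column) can be seen side by side on the question's data. A small sketch using Python's sqlite3 with an in-memory database as a stand-in for MySQL; LEFT JOIN and COUNT semantics here are standard SQL:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asrama (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE member_asrama (id INTEGER PRIMARY KEY, asrama_id INTEGER, period_id INTEGER);
INSERT INTO asrama VALUES (1, 'A'), (2, 'B');
INSERT INTO member_asrama VALUES (1, 1, 1);
""")

# Filter in WHERE: B's NULL-extended row fails period_id = 1 and is dropped.
IN_WHERE = """
SELECT a.id, a.name, COUNT(*) AS cnt
FROM asrama a LEFT JOIN member_asrama ma ON a.id = ma.asrama_id
WHERE ma.period_id = 1
GROUP BY a.id, a.name
ORDER BY a.id
"""

# Filter in ON, but COUNT(*): B survives, yet its NULL row is still counted as 1.
ON_COUNT_STAR = """
SELECT a.id, a.name, COUNT(*) AS cnt
FROM asrama a LEFT JOIN member_asrama ma
  ON a.id = ma.asrama_id AND ma.period_id = 1
GROUP BY a.id, a.name
ORDER BY a.id
"""

# Filter in ON and COUNT(ma.asrama_id): NULLs are not counted, so B gets 0.
IN_ON = """
SELECT a.id, a.name, COUNT(ma.asrama_id) AS cnt
FROM asrama a LEFT JOIN member_asrama ma
  ON a.id = ma.asrama_id AND ma.period_id = 1
GROUP BY a.id, a.name
ORDER BY a.id
"""

where_rows = conn.execute(IN_WHERE).fetchall()
star_rows = conn.execute(ON_COUNT_STAR).fetchall()
on_rows = conn.execute(IN_ON).fetchall()
print(where_rows)  # [(1, 'A', 1)]
print(star_rows)   # [(1, 'A', 1), (2, 'B', 1)]
print(on_rows)     # [(1, 'A', 1), (2, 'B', 0)]
```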
I have a table t1 with 5 columns and 80000 rows:
+---+--------+-------+--------+------------+
|id |category|groupe |subject | description|
+---+--------+-------+--------+------------+
|1 |categ1 |group1 |subject1| desc1 |
|2 |categ1 |group2 |subject2| desc2 |
|3 |categ1 |group2 |subject5| desc3 |
|4 |categ2 |group1 |subject5| desc4 |
|5 |categ2 |group3 |subject1| desc5 |
|6 |categ2 |group3 |subject2| desc6 |
|7 |categ3 |group1 |subject1| desc7 |
|8 |categ3 |group1 |subject4| desc8 |
+---+--------+-------+--------+------------+
I need to extract the rows whose category value occurs at least 30 times AND whose groupe value occurs at least 30 times AND whose subject value occurs at least 30 times.
For example, if "categ3" appears 30 times or more, I need the rows with categ3; the same goes for groupe and subject.
But with the query below, the final result can contain fewer than 30 categ3 rows, because the groupe and subject filters remove some of the ids that have categ3.
You can see an example on db<>fiddle: with a threshold of 10 occurrences, the correct query should return 118 rows.
select *
from t1
where category in (SELECT category FROM t1 GROUP BY category HAVING COUNT(category) >= 30)
and groupe in (SELECT groupe FROM t1 GROUP BY groupe HAVING COUNT(groupe) >= 30)
and subject in (SELECT subject FROM t1 GROUP BY subject HAVING COUNT(subject) >= 30)
This query returns the intersection of the ids whose category, groupe and subject values each occur at least 30 times in the whole table, but that intersection reduces the row count,
so within the result some category values can end up with fewer than 30 rows.
In short, I need at least 30 occurrences of each value within the intersected result.
I think I need a recursive filter that repeats until the number of input rows equals the number of output rows, but I don't know how to do that. Any ideas?
Thanks 😊
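The repeat-until-stable filter described in the question can be sketched as a loop: delete every row whose category, groupe or subject value occurs fewer than k times among the rows still remaining, and stop when a pass deletes nothing. Deleting rows only lowers counts, so the loop converges. This is a minimal sketch, assuming Python's sqlite3 with an in-memory database as a stand-in for MySQL, a tiny invented t1 (description column omitted), and a hypothetical scratch table t1_work. A MySQL port would need adjustments, for example because MySQL does not allow a DELETE to select from its own target table in a subquery.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; tiny invented data, description omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY, category TEXT, groupe TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, "c1", "g1", "s1"),
        (2, "c1", "g1", "s1"),
        (3, "c2", "g1", "s2"),
        (4, "c2", "g2", "s2"),
    ],
)

# The question's approach: every count is taken over the whole table, once.
ONE_SHOT = """
SELECT id FROM t1
WHERE category IN (SELECT category FROM t1 GROUP BY category HAVING COUNT(*) >= ?)
  AND groupe   IN (SELECT groupe   FROM t1 GROUP BY groupe   HAVING COUNT(*) >= ?)
  AND subject  IN (SELECT subject  FROM t1 GROUP BY subject  HAVING COUNT(*) >= ?)
ORDER BY id
"""

# One pruning pass: counts are taken over the rows that are still left.
PRUNE = """
DELETE FROM t1_work
WHERE category IN (SELECT category FROM t1_work GROUP BY category HAVING COUNT(*) < ?)
   OR groupe   IN (SELECT groupe   FROM t1_work GROUP BY groupe   HAVING COUNT(*) < ?)
   OR subject  IN (SELECT subject  FROM t1_work GROUP BY subject  HAVING COUNT(*) < ?)
"""


def one_shot(k):
    return [r[0] for r in conn.execute(ONE_SHOT, (k, k, k))]


def stable_filter(k):
    """Hypothetical helper: prune a scratch copy of t1 until a pass deletes nothing."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS t1_work")
    cur.execute("CREATE TEMP TABLE t1_work AS SELECT * FROM t1")
    while True:
        cur.execute(PRUNE, (k, k, k))
        if cur.rowcount == 0:  # input rows == output rows: fixed point reached
            break
    return [r[0] for r in cur.execute("SELECT id FROM t1_work ORDER BY id")]


print(one_shot(2))       # [1, 2, 3]: c2 and s2 occur only once in this result
print(stable_filter(2))  # [1, 2]
```

With k = 2, the first pass removes row 4 (g2 occurs once); that leaves c2 with a single row, so the second pass removes row 3, and the third pass deletes nothing.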
Add some DISTINCTs, while grouping on all 3 columns:
select *
from dataset t
where t.category in (SELECT distinct category FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.groupe in (SELECT distinct groupe FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.subject in (SELECT distinct subject FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
A test on db<>fiddle here
For reference's sake, this query selects only the rows whose (category, groupe, subject) tuple occurs 30 times or more,
which will naturally return fewer rows than the query above.
SELECT *
FROM dataset
WHERE (category, groupe, subject) IN (
SELECT category, groupe, subject
FROM dataset
GROUP BY category, groupe, subject
HAVING COUNT(*) >= 30
)
Pro tip: This is a case where describing your requirement takes a lot of thought. As you think about it, think of SQL as a processor of sets of rows. It is always worthwhile to describe the requirement as carefully as you can, especially when it is as tricky as this one. Often it's helpful to describe the problem domain, rather than just talking about columns and values.
I guess you need the sets of rows meeting your three different criteria (more than x duplicates). You can use a set of id values for those rows because they are apparently a primary key (unique).
Here's one set of IDs:
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5)
I believe you need all the rows lying in the intersection of those three sets. That is, you want any rows having all three items recurring frequently. You can get that with
id IN set1 AND id IN set2 AND id IN set3
If you need the union of those sets you can use this instead. This gives you the rows with any of the three items recurring frequently.
id IN set1 OR id IN set2 OR id IN set3
So here's the query.
SELECT *
FROM dataset
WHERE id IN (
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE groupe IN (
SELECT groupe FROM dataset GROUP BY groupe HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE subject IN (
SELECT subject FROM dataset GROUP BY subject HAVING COUNT(*) >= 5))
I used 5 for the repeat threshold. You can use another number.
If you want your result set to contain only those rows with at least ten items in the result set, rather than in the dataset, you would use this query.
select d.*
from dataset d
join (
select count(*), groupe, category, subject
from dataset
group by groupe, category, subject
having count(*) >= 10
) e ON d.groupe=e.groupe AND d.category = e.category AND d.subject = e.subject
I have data in a MySQL table in the format shown below. I want to retrieve counts under two different conditions, as in the queries below, and combine them into a single query: the first query's result in one column and the second query's result in another column, like so:
Expected output:
count totalcount
--------------------------
3 6
Queries:
select count(*) as count from entries where
date between '2014-08-12' and '2014-08-14';
select count(*) as totalcount from entries ;
Data in mysql table:
id date
------------------------
1 2014-08-14
2 2014-08-13
3 2014-08-12
4 2014-08-11
5 2014-08-10
6 2014-08-09
sql fiddle http://sqlfiddle.com/#!2/faeb26/6
select sum(date between '2014-08-12' and '2014-08-14') as count, count(*) as totalcount from entries;
In MySQL a boolean expression evaluates to 1 (true) or 0 (false), so summing it counts the rows where it is true. Therefore just use SUM() instead of COUNT().
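The SUM-of-a-boolean trick can be checked on the question's six rows. A minimal sketch using Python's sqlite3 with an in-memory database as a stand-in for MySQL; like MySQL, SQLite evaluates a comparison to 1 or 0, and ISO-formatted date strings compare correctly as text:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [(1, "2014-08-14"), (2, "2014-08-13"), (3, "2014-08-12"),
     (4, "2014-08-11"), (5, "2014-08-10"), (6, "2014-08-09")],
)

# Each BETWEEN yields 1 or 0 per row, so SUM() counts the rows in range,
# while COUNT(*) counts every row, all in one pass over the table.
row = conn.execute(
    """
    SELECT SUM(date BETWEEN '2014-08-12' AND '2014-08-14') AS count,
           COUNT(*) AS totalcount
    FROM entries
    """
).fetchone()
print(row)  # (3, 6)
```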
Just put the two queries together:
select count(*) as count, b.totalcount from entries,
(select count(*) as totalcount from entries) b
where date between '2014-08-12' and '2014-08-14';
select sum(c) as count, sum(tc) as totalcount
from (select count(*) as c, 0 as tc from entries where date between '2014-08-12' and '2014-08-14'
union all
select 0 as c, count(*) as tc from entries) t;
Simply combine the two queries as subqueries in an outer SELECT. Try this:
SELECT (select count(*) as count from entries where
date between '2014-08-12' and '2014-08-14'
) as count, (select count(*) as totalcount from entries) as totalcount;
I have the following table (user_record) with millions of rows like this:
no uid s
================
1 a 999
2 b 899
3 c 1234
4 a 1322
5 b 933
-----------------
The uid can be duplicated. What I need is to show the top ten records (including uid and s) with no duplicate uid, ordered by s (desc). I can do this in two steps with the following SQL statements:
SELECT distinct(uid) FROM user_record ORDER BY s DESC LIMIT 10
SELECT uid,s FROM user_record WHERE uid IN(Just Results)
I just want to know: is there a more efficient way to do this in one statement?
Any help is greatly appreciated.
PS: I also have the following SQL statement:
select * from(select uid,s from user_record order by s desc) as tb group by tb.uid order by tb.s desc limit 10
but it's slow
The simplest approach is to use MAX() to get the highest s for every uid and sort on that value:
SELECT uid, MAX(s) max_s
FROM TableName
GROUP BY uid
ORDER BY max_s DESC
LIMIT 10
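The MAX()/GROUP BY query can be tried on the five sample rows from the question. A small sketch using Python's sqlite3 with an in-memory database as a stand-in for MySQL, with user_record in place of the TableName placeholder; the column no is quoted because NO is an SQLite keyword:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE user_record ("no" INTEGER PRIMARY KEY, uid TEXT, s INTEGER)')
conn.executemany(
    "INSERT INTO user_record VALUES (?, ?, ?)",
    [(1, "a", 999), (2, "b", 899), (3, "c", 1234), (4, "a", 1322), (5, "b", 933)],
)

# One row per uid carrying its highest s, then the top ten by that value.
top = conn.execute(
    """
    SELECT uid, MAX(s) AS max_s
    FROM user_record
    GROUP BY uid
    ORDER BY max_s DESC
    LIMIT 10
    """
).fetchall()
print(top)  # [('a', 1322), ('c', 1234), ('b', 933)]
```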
The disadvantage of the query above is that it doesn't handle duplicates: for instance, multiple uids may have the same s, and that value may turn out to be among the highest. If you want the highest values of s including duplicates, you can compute them in a subquery and join the result back to the original table:
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT DISTINCT s
FROM TableName
ORDER BY s DESC
LIMIT 10
) b ON a.s = b.s
ORDER BY a.s DESC