MySQL efficient and correct linking of GROUP BY results

Is there an efficient way to write queries that join the results of several GROUP BYs on a common table? How does MySQL handle merging the results of aggregate functions on a subgroup with full fields from the original table?
I am using this and it's slow (and I also need conditions other than CONDITION=1):
SELECT a.CID,a.AS_ALL,b.AS_ACTIVE FROM
(SELECT CID,COUNT(DISTINCT RAID) AS AS_ALL FROM MYTABLE GROUP BY CID) a
LEFT JOIN
(SELECT CID,COUNT(DISTINCT RAID) AS AS_ACTIVE FROM MYTABLE WHERE CONDITION=1 GROUP BY CID ) b ON a.CID=b.CID;
Also, is it safe to use something like the following? Will MySQL always correctly merge COLUMN_A with the results of the aggregation?
SELECT COLUMN_A, COUNT(DISTINCT COLUMN_A), SUM(COLUMN_A), SUM(COLUMN_B) FROM ATABLE WHERE CONDITION=1 GROUP BY COLUMN_C
Thank you for any advice.

Try this:
SELECT CID, COUNT(DISTINCT RAID) AS AS_ALL, COUNT(DISTINCT IF(CONDITION=1, RAID, NULL)) AS AS_ACTIVE
FROM MYTABLE GROUP BY CID
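To check that a single-pass conditional aggregate keeps the DISTINCT semantics of the two-subquery version from the question, the two can be compared on a small data set. This is only a minimal sketch: it runs against an in-memory SQLite database as a stand-in for MySQL, the sample rows are made up, the column is called COND here in place of the question's CONDITION, and CASE WHEN is used as a portable way to write the conditional expression.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (CID INTEGER, RAID INTEGER, COND INTEGER)")
conn.executemany(
    "INSERT INTO MYTABLE VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 10, 1), (1, 11, 0), (2, 20, 0), (2, 21, 0)],
)

# Approach from the question: two grouped subqueries joined on CID.
two_pass = conn.execute("""
    SELECT a.CID, a.AS_ALL, COALESCE(b.AS_ACTIVE, 0)
    FROM (SELECT CID, COUNT(DISTINCT RAID) AS AS_ALL
          FROM MYTABLE GROUP BY CID) a
    LEFT JOIN (SELECT CID, COUNT(DISTINCT RAID) AS AS_ACTIVE
               FROM MYTABLE WHERE COND = 1 GROUP BY CID) b
      ON a.CID = b.CID
    ORDER BY a.CID
""").fetchall()

# Single pass: CASE yields NULL for non-matching rows, and COUNT skips NULLs.
one_pass = conn.execute("""
    SELECT CID,
           COUNT(DISTINCT RAID) AS AS_ALL,
           COUNT(DISTINCT CASE WHEN COND = 1 THEN RAID END) AS AS_ACTIVE
    FROM MYTABLE
    GROUP BY CID
    ORDER BY CID
""").fetchall()

print(two_pass)
print(one_pass)
```

One difference to keep in mind: for a CID with no matching rows, the LEFT JOIN version returns NULL unless it is wrapped in COALESCE, while the single-pass version returns 0.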

Related

Count Distinct on multiple values within same column in SQL Aggregation

Objective:
I want to show the number of distinct IDs for any selected combination.
In the example below, I have data at a granular level: ID-level data.
For this, I use COUNT DISTINCT, which gives me '1' for each of the combinations below.
But suppose I want to find the number of IDs who made both E-commerce and Face to face transactions. If I just use this data, I would be showing the sum of the E-comm and Face to face counts, and the result would be '2' instead of '1'.
This is not limited to E-comm/Face to face; I want to apply the same logic to all columns.
Please let me know if you have an alternative approach to address this issue.
First aggregate in your table to get the distinct ids for each TranType:
SELECT TranType, COUNT(DISTINCT id) counter_distinct
FROM tablename
GROUP BY TranType
and then join to the table:
SELECT t.*, g.counter_distinct
FROM tablename t
INNER JOIN (
SELECT TranType, COUNT(DISTINCT id) counter_distinct
FROM tablename
GROUP BY TranType
) g ON g.TranType = t.TranType
Or use a correlated subquery:
SELECT t1.*,
(SELECT COUNT(DISTINCT t2.id) FROM tablename t2 WHERE t2.TranType = t1.TranType) counter_distinct
FROM tablename t1
Regarding "the number of IDs who made both E-commerce and Face to face transactions":
You can get the list of ids using:
select id
from t
where tran_type in ('Ecomm', 'Face to face')
group by id
having count(distinct tran_type) = 2;
You can get the count using a subquery:
select count(*)
from (select id
from t
where tran_type in ('Ecomm', 'Face to face')
group by id
having count(distinct tran_type) = 2
) i;
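The HAVING approach can be tried on a few rows to see how it differs from adding up per-type distinct counts. A minimal sketch, assuming an in-memory SQLite database in place of MySQL and invented sample data; the table and column names follow the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tran_type TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "Ecomm"), (1, "Face to face"), (1, "Ecomm"),
     (2, "Ecomm"), (3, "Face to face"),
     (4, "Ecomm"), (4, "Face to face")],
)

# Ids that have both transaction types: each such id keeps 2 distinct types.
both = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT id
          FROM t
          WHERE tran_type IN ('Ecomm', 'Face to face')
          GROUP BY id
          HAVING COUNT(DISTINCT tran_type) = 2) i
""").fetchone()[0]

# Per-type distinct counts, for comparison; ids 1 and 4 appear in both groups.
per_type = dict(conn.execute(
    "SELECT tran_type, COUNT(DISTINCT id) FROM t GROUP BY tran_type"
).fetchall())

print(both, per_type)
```

Here ids 1 and 4 are counted once each by the HAVING query, whereas summing the per-type counts would count them twice.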

Combining simple selects in mysql (join or union?)

I have two queries that are both very quick (20 ms). When I combine them with a join, I get a 30-second query and the data is wrong. What's wrong?
SELECT
count(profile.id),
date(profile.createdAt)
FROM profile
GROUP BY date(profile.createdAt)
ORDER BY date(profile.createdAt) DESC;
and
SELECT
count(product.id),
date(product.createdAt)
FROM product
GROUP BY date(product.createdAt)
ORDER BY date(product.createdAt) desc;
Joining them, I get a very slow query:
SELECT
count(profile._id),
date(profile.createdAt),
count(product._id),
date(product.createdAt)
FROM profile
INNER JOIN product
ON date(product.createdAt) = date(profile.createdAt)
GROUP BY
date(product.createdAt),
date(profile.createdAt)
ORDER BY date(product.createdAt) desc;
The logical error with your current approach is that you are double counting one or both of the counts due to the join. You may try doing the aggregations in separate subqueries, and then join those subqueries:
SELECT
t1.createdAt,
COALESCE(t1.profile_cnt, 0) AS profile_cnt,
COALESCE(t2.product_cnt, 0) AS product_cnt
FROM
(
SELECT DATE(createdAt) AS createdAt, COUNT(id) AS profile_cnt
FROM profile
GROUP BY DATE(createdAt)
) t1
INNER JOIN
(
SELECT DATE(createdAt) AS createdAt, COUNT(id) AS product_cnt
FROM product
GROUP BY DATE(createdAt)
) t2
ON t1.createdAt = t2.createdAt;
If the two tables don't both contain the same dates, then the above query might drop certain dates. To avoid this, we could join with a calendar table which includes all dates we want to appear in the output.
Regarding performance, you are doing a join of two aggregation queries, so it is not expected to be that performant. Also, calling DATE to cast createdAt to a pure date is expensive, and maybe could be avoided by maintaining a dedicated date column.
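The double counting is easy to reproduce on a handful of rows. The sketch below is an illustration only: it uses an in-memory SQLite database rather than MySQL, and the timestamps are made up. It joins the raw rows on the date first, then repeats the query with each table aggregated before the join.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the timestamps are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, createdAt TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, createdAt TEXT);
    INSERT INTO profile (createdAt) VALUES
        ('2020-01-01 08:00:00'), ('2020-01-01 09:30:00'), ('2020-01-02 10:00:00');
    INSERT INTO product (createdAt) VALUES
        ('2020-01-01 11:00:00'), ('2020-01-01 12:00:00'), ('2020-01-01 13:00:00');
""")

# Joining raw rows pairs every profile with every product of the same day,
# so both counts become (profiles that day) * (products that day).
naive = conn.execute("""
    SELECT DATE(profile.createdAt), COUNT(profile.id), COUNT(product.id)
    FROM profile
    INNER JOIN product ON DATE(product.createdAt) = DATE(profile.createdAt)
    GROUP BY DATE(profile.createdAt)
""").fetchall()

# Aggregating first leaves one row per day on each side of the join.
fixed = conn.execute("""
    SELECT t1.createdAt, t1.profile_cnt, t2.product_cnt
    FROM (SELECT DATE(createdAt) AS createdAt, COUNT(id) AS profile_cnt
          FROM profile GROUP BY DATE(createdAt)) t1
    INNER JOIN (SELECT DATE(createdAt) AS createdAt, COUNT(id) AS product_cnt
                FROM product GROUP BY DATE(createdAt)) t2
      ON t1.createdAt = t2.createdAt
""").fetchall()

print(naive)
print(fixed)
```

With 2 profiles and 3 products on the same day, the raw join produces 6 rows for that day, which is also a plausible reason the original query gets slow on real data. The 2020-01-02 profile disappears from both results because of the INNER JOIN.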
I think the problem is that you are joining on the result of the date function, which is likely doing a lot under the hood. That function has to execute for every record in each table.
If you can, join with the primary keys/foreign keys of the tables to take advantage of indexes.

Two select statements on same table and get Count(*)

I'm trying to do two queries on the same table to get the Count(*) value.
I have this
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this
SELECT `b`.`Count(*)` FROM `rank` as b WHERE `b`.`points` >= `a`.`points`
I tried searching but did not find how to do a Count(*) in the same query.
Typically you would not intermingle a non-aggregate and an aggregate query in MySQL. You might do this in databases which support analytic (window) functions, such as SQL Server or MySQL 8.0 and later, but not in earlier versions of MySQL. That being said, your second query can be handled using a correlated subquery in the SELECT clause of the first query. So you may try the following:
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM rank b WHERE b.points >= a.points) AS cnt
FROM rank a
WHERE a.id = 1;
As I understand the question, you want to find out, for a given id, how many rows in the table have points greater than or equal to that row's points. This can be achieved using a self join. Because of the a.id != b.id condition, the row itself is not counted.
select count(*) from rank a join rank b on(a.id != b.id) where a.id=1 and b.points >= a.points;
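The two answers return slightly different numbers, which a small test table makes visible. This is a sketch under assumptions: in-memory SQLite instead of MySQL, invented names and points, and the table name written in backticks (SQLite accepts them, and RANK is a reserved word in MySQL 8.0).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names and points are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE `rank` (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO `rank` VALUES (?, ?, ?)",
    [(1, "ann", 50), (2, "bob", 70), (3, "cy", 50), (4, "dee", 30)],
)

# Correlated subquery: counts every row with at least as many points,
# including the row for id = 1 itself.
row = conn.execute("""
    SELECT a.name, a.points,
           (SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points) AS cnt
    FROM `rank` a
    WHERE a.id = 1
""").fetchone()

# Self join with a.id != b.id: the same comparison, but the row itself is excluded.
others = conn.execute("""
    SELECT COUNT(*)
    FROM `rank` a JOIN `rank` b ON a.id != b.id
    WHERE a.id = 1 AND b.points >= a.points
""").fetchone()[0]

print(row, others)
```

Which of the two is wanted depends on whether the row for the given id should count toward its own position.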

MySQL CROSS JOIN FROM syntax

I have the following query working
SELECT newTable.Score, COUNT(1) AS Total, COUNT(1) / t.count * 100 AS `Frequency`
FROM mytable newTable
CROSS JOIN (SELECT COUNT(1) AS count FROM mytable) t
GROUP BY newTable.Score
ORDER BY Frequency DESC
However, two things I don't understand from the MySQL docs:
1) I don't understand why there isn't a comma, or a join type, specified in the from clause.
Reading the MySQL docs, this seems necessary.
2) What does the 't' represent in the CROSS JOIN clause?
Any advice appreciated.
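The t works the same way as newTable: each is an alias, newTable for the table mytable and t for the temporary table that the subquery builds. That is also why there is no comma or join type between mytable and newTable: newTable is not a second table, just another name for mytable.
It is easier to read when the optional AS keyword is used: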
SELECT newTable.Score, COUNT(1) AS Total, COUNT(1) / t.count * 100 AS `Frequency`
FROM mytable as newTable
CROSS JOIN (SELECT COUNT(1) AS count FROM mytable) as t
GROUP BY newTable.Score
ORDER BY Frequency DESC
An alias name replaces the original name of the table with a new one to be used in your query. And you need to give subqueries a name to refer to them in your query too.
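To see both aliases at work, the query can be run on a few rows. A minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and made-up scores; the subquery column is named cnt here instead of count, and the percentage is written as `COUNT(1) * 100.0 / t.cnt` because SQLite would otherwise use integer division.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the scores are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Score INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(5,), (5,), (5,), (3,)])

# newTable aliases the base table; t aliases the one-row derived table
# holding the overall row count, which CROSS JOIN attaches to every row.
rows = conn.execute("""
    SELECT newTable.Score,
           COUNT(1) AS Total,
           COUNT(1) * 100.0 / t.cnt AS Frequency
    FROM mytable AS newTable
    CROSS JOIN (SELECT COUNT(1) AS cnt FROM mytable) AS t
    GROUP BY newTable.Score
    ORDER BY Frequency DESC
""").fetchall()

print(rows)
```

Since the derived table has exactly one row, the cross join does not multiply the rows of mytable; it only makes the total available for the division.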

mysql - how do I get count of counts

I have a table with duplicate skus.
skua
skua
skub
skub
skub
skuc
skuc
skud
SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku;
shows me all the skus that have duplicates and the number of duplicates
skua 2
skub 3
skuc 2
skud 1
I am trying to find how many there are with 2 duplicates, 3 duplicates etc.
i.e.
duplicated count
1 1 (skud)
2 2 (skua, and skuc)
3 1 (skub)
and I don't know how to write the sql. I imagine it needs a subselect...
thanks
Just use your current query as an inline view, and use the rows from that just like it was from a table.
e.g.
SELECT t.Count AS `duplicated`
, COUNT(1) AS `count`
FROM ( SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku ) t
GROUP BY t.Count
MySQL refers to an inline view as a "derived table", and that name makes sense once we understand how MySQL actually processes it. MySQL runs the inner query and materializes the result as an internal temporary table (which storage engine is used depends on the MySQL version and the size of the result); once that is done, MySQL runs the outer query against that temporary table. (You'll see that if you run an EXPLAIN on the query.)
Above, I left your query just as you formatted it; I'd tend to reformat your query, so that entire query looks like this:
SELECT t.Count AS `duplicated`
, COUNT(1) AS `count`
FROM ( SELECT p.sku
, COUNT(1) AS `Count`
FROM products p
GROUP BY p.sku
) t
GROUP BY t.Count
(Just makes it easier for me to see the inner query, and easier to extract it and run it separately. And qualifying all column references (with a table alias or table name) is a best practice.)
select dup_count as duplicated,
count(*) as `count`,
group_concat(sku) as skus
from
(
SELECT sku, COUNT(1) AS dup_count
FROM products
GROUP BY sku
) tmp_tbl
group by dup_count
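Running the count-of-counts query on the sample skus from the question gives the table the asker was after. This is a sketch only: it uses an in-memory SQLite database in place of MySQL, and the outer alias is called cnt rather than count. The order of names inside GROUP_CONCAT is not guaranteed, so the names are sorted in Python before printing.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are the skus from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [(s,) for s in ["skua", "skua", "skub", "skub", "skub",
                    "skuc", "skuc", "skud"]],
)

# Inner query: rows per sku. Outer query: how many skus share each row count.
rows = conn.execute("""
    SELECT dup_count AS duplicated,
           COUNT(*) AS cnt,
           GROUP_CONCAT(sku) AS skus
    FROM (SELECT sku, COUNT(1) AS dup_count
          FROM products
          GROUP BY sku) tmp_tbl
    GROUP BY dup_count
    ORDER BY dup_count
""").fetchall()

# Sort the concatenated names so the output does not depend on engine order.
result = [(d, c, sorted(s.split(","))) for d, c, s in rows]
print(result)
```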