How to group by a field that has both select and count - MySQL

I am wondering how to group by a field that has both a select count() and a count() statement. I know that we have to put all select fields in the GROUP BY, but it won't let me do so because of the second count() statement in the field.
create table C as(
select a.id, a.date_id,
(select count(b.hits)*1.00 where b.hits >= '9')/count(b.hits) AS percent -- <-- error here
from A a join B b
on a.id = b.id
group by 1,2,3) with no data primary index(id);
This is my error:
[SQLState HY000] GROUP BY and WITH...BY clauses may not contain aggregate functions. Error Code: 3625
When I add a select to the second count on the third line, I only get 1 or 0, which is not right:
((select count(b.hits)*1.00 where b.hits >= '9')/(select count(b.hits))) AS percent
Do I need to do a self join instead, or is there any way I can just use nested queries?

You need to fix the GROUP BY. But you can probably simplify the query to:
create table C as
select a.id, a.date_id,
avg(b.hits >= 9) as percent
from A a join
B b
on a.id = b.id
group by a.id, a.date_id
with no data primary index(id);
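A minimal sketch of the AVG idea, run through Python's built-in sqlite3 module instead of MySQL so that it is self-contained. The table contents are invented for illustration. The sketch relies on a comparison such as b.hits >= 9 evaluating to 1 or 0. That holds in MySQL and SQLite but not in every database. Under that behaviour, the per-group average is the fraction of rows that match.

```python
import sqlite3

# Invented sample data shaped like tables A and B in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, date_id INTEGER);
    CREATE TABLE B (id INTEGER, hits INTEGER);
    INSERT INTO A VALUES (1, 20200101), (2, 20200102);
    INSERT INTO B VALUES (1, 10), (1, 5), (1, 9), (1, 3), (2, 12), (2, 20);
""")

# b.hits >= 9 yields 1 or 0, so AVG() is the share of rows with hits >= 9.
rows = conn.execute("""
    SELECT a.id, a.date_id, AVG(b.hits >= 9) AS percent
    FROM A a
    JOIN B b ON a.id = b.id
    GROUP BY a.id, a.date_id
    ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 20200101, 0.5), (2, 20200102, 1.0)]
```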

It looks like you only need to group on 2 columns, not 3, plus you shouldn't need a sub-select:
create table C as(
select a.id, a.date_id,
SUM(CASE WHEN b.hits >= '9' THEN 1.00 ELSE 0 END)/COUNT(b.hits) AS percent
from A a join B b
on a.id = b.id
group by 1,2) with no data primary index(id);
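Whether / truncates when both operands are integers depends on the engine. MySQL's / always returns a decimal result. SQLite truncates integer division, and Teradata does too, as far as I know. The CREATE TABLE ... WITH NO DATA PRIMARY INDEX syntax in the question suggests Teradata may be the engine actually in use. The sketch below uses Python's sqlite3 with invented data to compare THEN 1 with THEN 1.0 in the CASE expression.

```python
import sqlite3

# Invented sample data shaped like tables A and B in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, date_id INTEGER);
    CREATE TABLE B (id INTEGER, hits INTEGER);
    INSERT INTO A VALUES (1, 20200101), (2, 20200102);
    INSERT INTO B VALUES (1, 10), (1, 5), (1, 9), (1, 3), (2, 12), (2, 20);
""")

def percent_rows(then_value):
    # then_value is a fixed literal chosen below, not user input.
    sql = f"""
        SELECT a.id,
               SUM(CASE WHEN b.hits >= 9 THEN {then_value} ELSE 0 END) / COUNT(b.hits) AS percent
        FROM A a
        JOIN B b ON a.id = b.id
        GROUP BY a.id, a.date_id
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()

int_rows = percent_rows("1")    # integer / integer truncates in SQLite
dec_rows = percent_rows("1.0")  # a REAL numerator keeps the fraction
print(int_rows)  # [(1, 0), (2, 1)]
print(dec_rows)  # [(1, 0.5), (2, 1.0)]
```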

Related

How to select only the last date row of a joined Table?

I am joining Table B into A. Table A has the basic information I want to retrieve and also the unique ID.
Table B has multiple rows for each ID, with another column containing dates. Now I only want to select the last date from Table B and join it into A.
I found the MAX() function of SQL but it says the other fields are not in the GROUP BY clause or an aggregation function.
This is my (simplified) query:
SELECT
MAX("B"."ENDDATE") AS FINALEND,
"A"."ID",
"A"."COLOR",
"A"."MAKE",
"A"."WHEELS"
FROM "A"
JOIN "B" ON "A"."ID" = "B"."ID"
My expected result is for each ID a row with the basic information from Table A and the last Date from all matching rows from Table B. My result now is multiple rows for every row in B.
Do I need to add a GROUP BY for every other column? Or what am I missing?
Thanks for any input :)
On MySQL 8+, we can use ROW_NUMBER here:
WITH cte AS (
SELECT a.*, b.ENDDATE,
ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.ENDDATE DESC) rn
FROM A a
INNER JOIN B b ON b.ID = a.ID
)
SELECT ID, COLOR, MAKE, WHEELS, ENDDATE AS FINALEND
FROM cte
WHERE rn = 1;
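To see the ROW_NUMBER approach on concrete rows, here is a sketch using Python's sqlite3 module with invented sample data for A and B. Window functions need SQLite 3.25 or later. Each ID's B rows are numbered newest first, and only row 1 is kept. ENDDATE is stored as ISO-8601 text here, an assumption that makes string ordering match date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COLOR TEXT, MAKE TEXT, WHEELS INTEGER);
    CREATE TABLE B (ID INTEGER, ENDDATE TEXT);
    INSERT INTO A VALUES (1, 'red', 'Ford', 4), (2, 'blue', 'Fiat', 4);
    INSERT INTO B VALUES (1, '2021-01-05'), (1, '2021-03-01'), (2, '2020-12-31');
""")

# Number each ID's B rows from newest to oldest and keep only the first.
rows = conn.execute("""
    WITH cte AS (
        SELECT a.*, b.ENDDATE,
               ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.ENDDATE DESC) AS rn
        FROM A a
        INNER JOIN B b ON b.ID = a.ID
    )
    SELECT ID, COLOR, MAKE, WHEELS, ENDDATE AS FINALEND
    FROM cte
    WHERE rn = 1
    ORDER BY ID
""").fetchall()
print(rows)  # [(1, 'red', 'Ford', 4, '2021-03-01'), (2, 'blue', 'Fiat', 4, '2020-12-31')]
```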
On earlier versions of MySQL, we can join to a subquery which finds the latest record for each ID in the B table:
SELECT a.ID, a.COLOR, a.MAKE, a.WHEELS, b1.ENDDATE AS FINALEND
FROM A a
INNER JOIN B b1 ON b1.ID = a.ID
INNER JOIN
(
SELECT ID, MAX(ENDDATE) AS MAXENDDATE
FROM B
GROUP BY ID
) b2
ON b2.ID = b1.ID AND b2.MAXENDDATE = b1.ENDDATE;
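The two queries behave differently when an ID has two B rows sharing the latest ENDDATE. The MAX() join returns both rows, while ROW_NUMBER() keeps exactly one. The sketch below uses Python's sqlite3 with invented data that includes a deliberate tie for ID 1, and shows the join producing a duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COLOR TEXT, MAKE TEXT, WHEELS INTEGER);
    CREATE TABLE B (ID INTEGER, ENDDATE TEXT);
    INSERT INTO A VALUES (1, 'red', 'Ford', 4), (2, 'blue', 'Fiat', 4);
    -- ID 1 has two rows sharing its latest ENDDATE
    INSERT INTO B VALUES (1, '2021-01-05'), (1, '2021-03-01'), (1, '2021-03-01'), (2, '2020-12-31');
""")

# Join each B row to its ID's maximum ENDDATE; ties survive as extra rows.
rows = conn.execute("""
    SELECT a.ID, a.COLOR, a.MAKE, a.WHEELS, b1.ENDDATE AS FINALEND
    FROM A a
    INNER JOIN B b1 ON b1.ID = a.ID
    INNER JOIN (
        SELECT ID, MAX(ENDDATE) AS MAXENDDATE
        FROM B
        GROUP BY ID
    ) b2 ON b2.ID = b1.ID AND b2.MAXENDDATE = b1.ENDDATE
    ORDER BY a.ID
""").fetchall()
for row in rows:
    print(row)
# (1, 'red', 'Ford', 4, '2021-03-01') appears twice, (2, 'blue', 'Fiat', 4, '2020-12-31') once
```

If a tie should collapse to one row, either SELECT DISTINCT on this query or the ROW_NUMBER() version avoids the duplicate.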

SQL Select two records if they have certain time difference of column A and have same column B value

I have 14000 records in my sql table. They have columns ID, test_subject_id and date_created. I want to fetch all the records that have been created within a time difference of 3 minutes(difference in date_created values) and both records should have the same test_subject_id.
You should use a self join; I assume an inner join is what will work for you:
SELECT a.ID, a.date_created, b.ID, b.date_created
FROM accounts a
INNER JOIN accounts b
ON a.test_subject_id = b.test_subject_id
AND a.ID < b.ID
AND ABS(TIMESTAMPDIFF(SECOND, a.date_created, b.date_created)) <= 180
Note: TIMESTAMPDIFF is used assuming date_created has type DATETIME; see the MySQL documentation for TIMESTAMPDIFF for details.
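Here is a sketch of the self join on invented rows, using Python's sqlite3. SQLite has no TIMESTAMPDIFF, so the sketch computes the difference from strftime('%s', ...) Unix seconds instead. That substitution is an assumption of the sketch, not MySQL syntax. The a.ID < b.ID condition reports each pair once and never pairs a row with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (ID INTEGER, test_subject_id INTEGER, date_created TEXT);
    INSERT INTO accounts VALUES
        (1, 7, '2021-01-01 10:00:00'),
        (2, 7, '2021-01-01 10:02:30'),
        (3, 7, '2021-01-01 10:10:00'),
        (4, 8, '2021-01-01 10:01:00');
""")

# SQLite has no TIMESTAMPDIFF; strftime('%s', ...) gives Unix seconds instead.
pairs = conn.execute("""
    SELECT a.ID, b.ID
    FROM accounts a
    INNER JOIN accounts b
        ON a.test_subject_id = b.test_subject_id
       AND a.ID < b.ID  -- each pair once, never a row with itself
       AND ABS(CAST(strftime('%s', a.date_created) AS INTEGER)
             - CAST(strftime('%s', b.date_created) AS INTEGER)) <= 180
    ORDER BY a.ID, b.ID
""").fetchall()
print(pairs)  # [(1, 2)]
```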
You can use EXISTS:
SELECT t1.*
FROM tablename t1
WHERE EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.test_subject_id = t1.test_subject_id
AND t2.ID <> t1.ID
AND ABS(TIMESTAMPDIFF(SECOND, t1.date_created, t2.date_created)) <= 180
)
ORDER BY t1.test_subject_id, t1.date_created;
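Excluding the row itself (t2.ID <> t1.ID) matters here. Without it, every row finds itself at a difference of 0 seconds, so EXISTS is true for all rows. This sketch runs the query both ways on invented data with Python's sqlite3. As before, it uses strftime('%s', ...) in place of MySQL's TIMESTAMPDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (ID INTEGER, test_subject_id INTEGER, date_created TEXT);
    INSERT INTO tablename VALUES
        (1, 7, '2021-01-01 10:00:00'),
        (2, 7, '2021-01-01 10:02:30'),
        (3, 7, '2021-01-01 10:10:00'),
        (4, 8, '2021-01-01 10:01:00');
""")

def ids_with_close_neighbour(exclude_self):
    # Fixed SQL fragment (not user input): skip comparing a row with itself.
    guard = "AND t2.ID <> t1.ID" if exclude_self else ""
    sql = f"""
        SELECT t1.ID
        FROM tablename t1
        WHERE EXISTS (
            SELECT 1
            FROM tablename t2
            WHERE t2.test_subject_id = t1.test_subject_id
              {guard}
              AND ABS(CAST(strftime('%s', t1.date_created) AS INTEGER)
                    - CAST(strftime('%s', t2.date_created) AS INTEGER)) <= 180
        )
        ORDER BY t1.test_subject_id, t1.date_created
    """
    return [row[0] for row in conn.execute(sql)]

with_guard = ids_with_close_neighbour(True)
without_guard = ids_with_close_neighbour(False)
print(with_guard)     # [1, 2]
print(without_guard)  # [1, 2, 3, 4]
```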

counting join subquery by mainquery

Why is the result of countTypes NULL?
I need the counts based on the main query (the a.id values).
I get the counts only if I put the same condition in a WHERE clause, but I need a solution without a duplicated WHERE clause.
Thanks in advance.
items
id type
1 2
2 2
3 1
4 1
5 3
SELECT a.id, countTypes FROM items AS a
INNER JOIN (
SELECT id, JSON_ARRAYAGG(JSON_OBJECT(CONCAT('countType', type), count)) as countTypes
FROM (SELECT id, type, count(id) AS count FROM items GROUP BY type) AS b
) AS c ON c.id = a.id
WHERE a.id >= '100' AND a.id <= '200'
Your innermost subquery is this:
SELECT id, type, count(id) AS count FROM items GROUP BY type
Here you have run afoul of MySQL's notorious nonstandard extension to GROUP BY. Your query actually means
SELECT ANY_VALUE(id), type, count(id) AS count FROM items GROUP BY type
Here ANY_VALUE() formally means MySQL may return an unpredictable value for that column. Unpredictable can be confusing; that's why MySQL's extension is notorious.
You need
SELECT id, type, count(id) AS count FROM items GROUP BY id, type
for this subquery to be predictable. But I guess you want the count of types in items, so what you want is this.
SELECT type, count(*) AS count FROM items GROUP BY type
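Running the suggested query on the sample items table from the question, via Python's sqlite3, gives one deterministic row per type. The second query is only a guess at the intended result. If each item should carry the count for its own type, the per-type counts can be joined back on type rather than on id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, type INTEGER);
    INSERT INTO items VALUES (1, 2), (2, 2), (3, 1), (4, 1), (5, 3);
""")

# Deterministic: every selected column is either grouped or aggregated.
per_type = conn.execute(
    "SELECT type, COUNT(*) AS count FROM items GROUP BY type ORDER BY type"
).fetchall()
print(per_type)  # [(1, 2), (2, 2), (3, 1)]

# Hypothetical reading of the goal: attach each item's type count by joining on type.
per_item = conn.execute("""
    SELECT a.id, a.type, c.count
    FROM items a
    JOIN (SELECT type, COUNT(*) AS count FROM items GROUP BY type) c ON c.type = a.type
    ORDER BY a.id
""").fetchall()
print(per_item)  # [(1, 2, 2), (2, 2, 2), (3, 1, 2), (4, 1, 2), (5, 3, 1)]
```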
Your outer query says
SELECT a.id
FROM items a
JOIN (subquery) c ON c.id = a.id
WHERE a.id >= '100'
AND a.id <= '200'
Your outer query returns nothing from your subquery, so it's not clear what result you need. INNER JOIN operations can serve to filter a result set. But because of your ANY_VALUE(id) situation, your filter is unpredictable.

Check if table A's primary key exists in table B

Table A:
ID, Name, etc.
Table B:
ID, TableA-ID.
I run
SELECT * FROM A;
and I want to return a boolean value in the same result for the condition (whether A.ID exists in table B).
There are several ways of achieving what you need. Below are three possibilities. They all differ in execution plans and in how the database actually executes them, so depending on your record count one may be more efficient than another. It's best to compare the plans yourself.
1) Use a LEFT JOIN and check whether a non-null column from B is not null to confirm the record exists. Then apply DISTINCT if the relationship is 1:N, so that rows from A are shown without duplicates.
select distinct a.*, b.id is not null as exists_b
from a
left join b on
a.id = b.`tablea-id`
2) Use an EXISTS subquery, which will be evaluated for each row returned from table A.
select a.*, exists(select 1 from b where a.id = b.`tablea-id`) as exists_b
from a
3) Use the EXISTS subquery expression and its negation in two queries to check whether a record does or does not have a match in table B. Then use UNION ALL to combine both results into one.
select *, true as exists_b
from a
where exists (
select 1
from b
where a.id = b.`tablea-id`
)
union all
select *, false as exists_b
from a
where not exists (
select 1
from b
where a.id = b.`tablea-id`
)
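A quick way to check that the three options agree is to run them side by side. This sketch uses Python's sqlite3 with invented rows in which a.id 1 has two matches in b (the 1:N case). SQLite accepts the hyphenated tablea-id column name quoted with backticks, the same way MySQL does. The last query counts the plain LEFT JOIN rows to show why option 1 needs DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, `tablea-id` INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    -- a.id 1 has two matches in b (1:N), a.id 2 has none
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 3);
""")

queries = {
    # 1) LEFT JOIN, de-duplicated because a.id 1 matches two b rows
    "left_join": """
        SELECT DISTINCT a.*, b.id IS NOT NULL AS exists_b
        FROM a LEFT JOIN b ON a.id = b.`tablea-id`
        ORDER BY a.id""",
    # 2) EXISTS evaluated per row of a
    "exists": """
        SELECT a.*, EXISTS (SELECT 1 FROM b WHERE a.id = b.`tablea-id`) AS exists_b
        FROM a ORDER BY a.id""",
    # 3) matching and non-matching rows combined with UNION ALL (1/0 instead of true/false)
    "union_all": """
        SELECT *, 1 AS exists_b FROM a
        WHERE EXISTS (SELECT 1 FROM b WHERE a.id = b.`tablea-id`)
        UNION ALL
        SELECT *, 0 AS exists_b FROM a
        WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.id = b.`tablea-id`)
        ORDER BY 1""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each: [(1, 'x', 1), (2, 'y', 0), (3, 'z', 1)]

# Without DISTINCT the LEFT JOIN repeats a.id 1 once per matching b row.
raw_count = len(conn.execute("SELECT a.id FROM a LEFT JOIN b ON a.id = b.`tablea-id`").fetchall())
print(raw_count)  # 4
```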
select A.*, IFNULL((select 1 from B where B.`TableA-ID` = A.ID limit 1),0) as `exists` from A;
The above statement returns 1 if the key exists and 0 if it does not. LIMIT 1 is important if there can be multiple matching records in B, because in MySQL a scalar subquery that returns more than one row raises an error.
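Here is the same check with the IFNULL(scalar subquery) form, as a sketch on invented data in Python's sqlite3. One caveat: SQLite silently uses the first row of a scalar subquery that returns several rows. The sketch therefore cannot reproduce the MySQL error that LIMIT 1 avoids. The LIMIT 1 is kept to match the MySQL statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, `TableA-ID` INTEGER);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (10, 1), (11, 1), (12, 3);
""")

# The scalar subquery yields 1 when some B row points at A.ID, else NULL, which IFNULL turns into 0.
rows = conn.execute("""
    SELECT A.*,
           IFNULL((SELECT 1 FROM B WHERE B.`TableA-ID` = A.ID LIMIT 1), 0) AS `exists`
    FROM A
    ORDER BY A.ID
""").fetchall()
print(rows)  # [(1, 'x', 1), (2, 'y', 0), (3, 'z', 1)]
```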

Speeding up select where column condition exists in another table without duplicates

If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try with inner join and distinct
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
But if you select more columns than a.id, remember that DISTINCT applies to the whole selected row, not only to id, so list exactly the columns you need, otherwise you may get wrong results:
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index on b that also includes condition1, e.g. KEY(id_table_a, condition1).
If you can, also run ANALYZE TABLE on both tables:
ANALYZE TABLE a, b;
Another technique is to try reversing the lead table:
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
Use the most selective table to lead the query.
With this order the index usage seems to differ: http://sqlfiddle.com/#!9/35eb9e/15 (the last one adds "Using where").
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
b.id_table_a=a.id &&
b.condition1=1 &&
b.condition2=(select b.condition2 from b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know whether there is a flaw in this; please let me know if there is. If anyone has a way to compress this somehow, I will gladly accept your answer.
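The self-answer avoids duplicates because (id_table_a, condition1, condition2) is the primary key of b. Fixing condition2 to the single value returned by the correlated subquery pins the join to exactly one b row per a.id. The subquery has no ORDER BY, so which condition2 it picks is arbitrary, but any matching one works. The sketch below checks this on invented data with Python's sqlite3. It writes && as AND because SQLite does not support &&, and it aliases the inner b as b2 for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, column1 TEXT);
    CREATE TABLE b (
        id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER,
        PRIMARY KEY (id_table_a, condition1, condition2)
    );
    INSERT INTO a VALUES (1, 'c'), (2, 'a'), (3, 'b'), (4, 'd');
    -- a.id 1 has two b rows with condition1 = 1, so a plain join would repeat it
    INSERT INTO b VALUES (1, 1, 0), (1, 1, 1), (2, 1, 0), (3, 0, 0), (4, 1, 1), (4, 0, 1);
""")

# The scalar subquery picks one condition2 per a.id, so at most one b row joins.
rows = conn.execute("""
    SELECT a.id FROM a
    INNER JOIN b ON
        b.id_table_a = a.id AND
        b.condition1 = 1 AND
        b.condition2 = (SELECT b2.condition2 FROM b b2
                        WHERE b2.id_table_a = a.id AND b2.condition1 = 1 LIMIT 1)
    ORDER BY a.column1
    LIMIT 5
""").fetchall()
print(rows)  # [(2,), (1,), (4,)]
```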
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Move the condition into the ON clause of the join; that way the index on table b can be used to filter. Also use INNER JOIN rather than LEFT JOIN.
Then you should have less results which have to be grouped.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
SELECT a.id, a.column1
FROM a
JOIN b ON a.id = b.id_table_a && b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering has to happen somewhere. This way it happens on the smallest rowset possible.
See SQLFiddle.
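For the outer ORDER BY column1 to work, the derived table x has to expose column1. The inner SELECT therefore needs a.column1 alongside a.id. Because column1 belongs to a, adding it does not create new duplicates. Here is a sketch on invented data with Python's sqlite3, using AND instead of MySQL's &&:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, column1 TEXT);
    CREATE TABLE b (
        id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER,
        PRIMARY KEY (id_table_a, condition1, condition2)
    );
    INSERT INTO a VALUES (1, 'c'), (2, 'a'), (3, 'b'), (4, 'd');
    INSERT INTO b VALUES (1, 1, 0), (1, 1, 1), (2, 1, 0), (3, 0, 0), (4, 1, 1), (4, 0, 1);
""")

# Inner query: fast join with duplicates; outer query: de-dupe, order, limit.
rows = conn.execute("""
    SELECT DISTINCT * FROM (
        SELECT a.id, a.column1
        FROM a
        JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
    ) x
    ORDER BY column1
    LIMIT 50
""").fetchall()
print(rows)  # [(2, 'a'), (1, 'c'), (4, 'd')]
```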
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1)
create index b_id_table_a_condition1 on b(id_table_a, condition1)
These are covering indexes, i.e. indexes that contain all the columns the query needs, which means the result can be produced by index-only access.
Then try this:
SELECT * FROM (
SELECT a.id, MIN(a.column1) column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id) x
ORDER BY column1
LIMIT 50
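Here is the GROUP BY variant on the same kind of invented data, as a Python sqlite3 sketch. The sketch also creates the two indexes from the answer, with b's index on id_table_a, the column name used elsewhere in the question. Whether the planner uses them as covering indexes depends on the engine, and this sketch does not check that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, column1 TEXT);
    CREATE TABLE b (
        id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER,
        PRIMARY KEY (id_table_a, condition1, condition2)
    );
    INSERT INTO a VALUES (1, 'c'), (2, 'a'), (3, 'b'), (4, 'd');
    INSERT INTO b VALUES (1, 1, 0), (1, 1, 1), (2, 1, 0), (3, 0, 0), (4, 1, 1), (4, 0, 1);
    CREATE INDEX a_id_column1 ON a(id, column1);
    CREATE INDEX b_id_table_a_condition1 ON b(id_table_a, condition1);
""")

# GROUP BY a.id collapses duplicates; MIN() just carries column1 through for ordering.
rows = conn.execute("""
    SELECT * FROM (
        SELECT a.id, MIN(a.column1) AS column1
        FROM a
        JOIN b ON a.id = b.id_table_a
              AND b.condition1 = 1
        GROUP BY a.id) x
    ORDER BY column1
    LIMIT 50
""").fetchall()
print(rows)  # [(2, 'a'), (1, 'c'), (4, 'd')]
```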
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
WHERE b.id_table_a > :offset
ORDER BY a.column1
LIMIT 50
) sub
Because duplicates are removed, you might get fewer than 50 rows. Just repeat the query until you get enough rows. Start with :offset = 0 and use the last ID from the previous result as :offset in the following queries.
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability which is high enough for you.
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
ORDER BY a.column1
LIMIT 1000
) sub
LIMIT 50
For example, if you have an average of 10 duplicates per ID, LIMIT 1000 in the inner query will return an average of 100 distinct rows. It's very unlikely that you get fewer than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.