SQL query: group and count distinct values - mysql

Given this DB table:
A_column B_column
---------------
A1 1
A1 2
A1 2
A2 1
A2 1
A2 1
A3 2
A3 3
A3 4
A3 5
How do I write a SQL SELECT query that prints the number of unique values in B_column for each value in A_column, so that the output looks like this:
A1 2
A2 1
A3 4
I tried this, but it doesn't seem to work properly:
SELECT A_column, count(B_column) FROM table GROUP BY A_column

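Use DISTINCT inside COUNT. COUNT(B_column) counts every non-NULL value, duplicates included, so here it returns 3, 3 and 4. COUNT(DISTINCT B_column) counts each distinct value only once. (table is a reserved word in MySQL, so quote it with backticks or use your real table name.)
SELECT A_column, COUNT(DISTINCT B_column) FROM `table` GROUP BY A_column;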
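To see the difference between the two aggregates without a MySQL server, here is a minimal sketch. It runs both queries against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for MySQL here, and the table name my_table is hypothetical (it avoids the reserved word table).

```python
import sqlite3

# The question's sample data: (A_column, B_column)
ROWS = [
    ("A1", 1), ("A1", 2), ("A1", 2),
    ("A2", 1), ("A2", 1), ("A2", 1),
    ("A3", 2), ("A3", 3), ("A3", 4), ("A3", 5),
]


def count_per_group(distinct):
    """Return (A_column, count) pairs, counting B_column with or without DISTINCT."""
    conn = sqlite3.connect(":memory:")
    # my_table is a hypothetical name; "table" itself is a reserved word.
    conn.execute("CREATE TABLE my_table (A_column TEXT, B_column INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
    agg = "COUNT(DISTINCT B_column)" if distinct else "COUNT(B_column)"
    result = conn.execute(
        f"SELECT A_column, {agg} FROM my_table GROUP BY A_column ORDER BY A_column"
    ).fetchall()
    conn.close()
    return result


print(count_per_group(distinct=False))  # [('A1', 3), ('A2', 3), ('A3', 4)]
print(count_per_group(distinct=True))   # [('A1', 2), ('A2', 1), ('A3', 4)]
```

The plain COUNT reproduces the "doesn't work properly" result from the question. The DISTINCT version gives the expected 2, 1, 4.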

Related

How to filter values in column by values in other column in sql

I have a dataframe:
id value
a1 0
a1 1
a1 2
a1 3
a2 0
a2 1
a3 0
a3 1
a3 2
a3 3
I want to filter the ids and keep only those that have every value from 0 to 3 (0, 1, 2, 3). In this example, id a2 must be removed because it only has the values 0 and 1. So the desired result is:
id value
a1 0
a1 1
a1 2
a1 3
a3 0
a3 1
a3 2
a3 3
How can I do that?
A simple aggregation approach would be:
SELECT id, value
FROM yourTable
WHERE id IN (
SELECT id
FROM yourTable
WHERE value IN (0, 1, 2, 3)
GROUP BY id
HAVING COUNT(DISTINCT value) = 4
);
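Here is a minimal sketch that runs the aggregation query above, unchanged apart from a final ORDER BY, against the question's data in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL/Presto here. The HAVING COUNT(DISTINCT value) = 4 check keeps an id only if all four target values are present, even if the id has extra or duplicate rows.

```python
import sqlite3

# The question's sample data: (id, value)
ROWS = [("a1", 0), ("a1", 1), ("a1", 2), ("a1", 3),
        ("a2", 0), ("a2", 1),
        ("a3", 0), ("a3", 1), ("a3", 2), ("a3", 3)]

QUERY = """
SELECT id, value
FROM yourTable
WHERE id IN (
    SELECT id
    FROM yourTable
    WHERE value IN (0, 1, 2, 3)
    GROUP BY id
    HAVING COUNT(DISTINCT value) = 4
)
ORDER BY id, value
"""


def ids_with_all_values(rows):
    """Return all rows whose id has every one of the values 0, 1, 2, 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id TEXT, value INTEGER)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in ids_with_all_values(ROWS):
    print(row)  # a1 and a3 rows only; a2 is filtered out
```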
I would suggest window functions, especially if you are using Presto. Assuming the rows have only those four values and the rows are unique:
select t.*
from (select t.*, count(*) over (partition by id) as cnt
from t
) t
where cnt = 4;
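If values can be outside this range, you can use conditional aggregation. Use SUM rather than COUNT here. COUNT counts every non-NULL value, so it would also count the 0 produced by the ELSE branch and match every row:
select t.*
from (select t.*,
sum(case when value in (0, 1, 2, 3) then 1 else 0 end) over (partition by id) as cnt
from t
) t
where cnt = 4;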
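The following sketch shows why the aggregate matters when a CASE expression returns 0 instead of NULL. It is built on two assumptions: a hypothetical extra id a4 with one out-of-range value (7), and an in-memory SQLite database in place of Presto/MySQL. It runs the windowed query once with COUNT and once with SUM. Like the query above, it still assumes that no in-range value repeats within an id.

```python
import sqlite3

# Question data plus a hypothetical id a4 that has 0, 1, 2 and an out-of-range 7.
ROWS = [("a1", 0), ("a1", 1), ("a1", 2), ("a1", 3),
        ("a2", 0), ("a2", 1),
        ("a3", 0), ("a3", 1), ("a3", 2), ("a3", 3),
        ("a4", 0), ("a4", 1), ("a4", 2), ("a4", 7)]


def matching_ids(agg):
    """Return the ids whose windowed conditional aggregate equals 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    query = f"""
        SELECT DISTINCT id
        FROM (SELECT t.*,
                     {agg}(CASE WHEN value IN (0, 1, 2, 3) THEN 1 ELSE 0 END)
                         OVER (PARTITION BY id) AS cnt
              FROM t) AS x
        WHERE cnt = 4
        ORDER BY id
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(matching_ids("COUNT"))  # ['a1', 'a3', 'a4']  (COUNT also counts the 0s)
print(matching_ids("SUM"))    # ['a1', 'a3']
```

An equivalent fix is to keep COUNT and drop the ELSE branch, because the CASE expression then yields NULL for out-of-range values and COUNT skips NULLs.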

Remove rows that do not differ from the previous row in MySQL

Suppose I have a table that records changes to my database over time:
TimeOfChange FieldA FieldB FieldC
-------------------------------------
2019-01-01 A1 B1 C1 /*(R1)*/
2019-01-02 A2 B2 C1 /*(R2)*/
2019-01-03 A2 B2 C1 /*(R3)*/
2019-01-05 A1 B1 C2 /*(R4)*/
2019-01-07 A1 B1 C1 /*(R5)*/
My database has many rows where nothing significant changed; e.g. row (R3) is the same as (R2).
I would like to remove these rows. I have found many references on how to use a common table expression to remove duplicate rows from a table, so it's possible to remove the duplicate rows (ignoring the TimeOfChange column). But this would remove (R5) as well, because it is the same as (R1). I only want to remove the rows that have the same A, B, C values as the previous row when ordered by the TimeOfChange column. How do I do that?
Edit: You can assume that all TimeOfChange values are unique.
Assuming the TimeOfChange is unique, you can do:
delete
from data
where TimeOfChange in (
select TimeOfChange
from (
select d2.TimeOfChange
from data d1
join data d2
where d2.TimeOfChange in (
select min(x.TimeOfChange)
from data x
where x.TimeOfChange>d1.TimeOfChange
) and d1.FieldA=d2.FieldA and d1.FieldB=d2.FieldB and d1.FieldC=d2.FieldC
) as q
);
So you first determine which row is the "next" one and then check whether that "next" row has the same values as the "current" one. Those "next" rows form the result set you want to use in the DELETE. The extra derived table (select TimeOfChange from (...) as q) works around MySQL's restriction on selecting, in a subquery, from the same table you are deleting from.
You will probably get much better performance if you move this logic into a stored procedure and store the ids of the rows to be deleted in a temporary table.
See DB Fiddle
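If window functions are available (MySQL 8.0+), LAG gives a more direct way to compare each row with its predecessor. This is a swapped-in technique, not the answer's method. Here is a minimal sketch that runs it against the question's rows in an in-memory SQLite database through Python's sqlite3 module. It assumes unique TimeOfChange values and non-NULL fields; with nullable fields you would need MySQL's null-safe <=> operator (SQLite's IS) instead of =.

```python
import sqlite3

# The question's sample rows R1..R5: (TimeOfChange, FieldA, FieldB, FieldC)
ROWS = [
    ("2019-01-01", "A1", "B1", "C1"),  # R1
    ("2019-01-02", "A2", "B2", "C1"),  # R2
    ("2019-01-03", "A2", "B2", "C1"),  # R3, same as R2
    ("2019-01-05", "A1", "B1", "C2"),  # R4
    ("2019-01-07", "A1", "B1", "C1"),  # R5, same as R1 but not as R4
]

DELETE_REPEATS = """
DELETE FROM data
WHERE TimeOfChange IN (
    SELECT TimeOfChange
    FROM (
        SELECT TimeOfChange, FieldA, FieldB, FieldC,
               LAG(FieldA) OVER (ORDER BY TimeOfChange) AS prevA,
               LAG(FieldB) OVER (ORDER BY TimeOfChange) AS prevB,
               LAG(FieldC) OVER (ORDER BY TimeOfChange) AS prevC
        FROM data
    ) AS q
    -- The first row has NULL prev values, so it never matches.
    WHERE FieldA = prevA AND FieldB = prevB AND FieldC = prevC
)
"""


def remove_repeats(rows):
    """Delete rows identical to their predecessor; return remaining timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (TimeOfChange TEXT PRIMARY KEY,"
        " FieldA TEXT, FieldB TEXT, FieldC TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
    conn.execute(DELETE_REPEATS)
    kept = [r[0] for r in conn.execute(
        "SELECT TimeOfChange FROM data ORDER BY TimeOfChange")]
    conn.close()
    return kept


print(remove_repeats(ROWS))
# ['2019-01-01', '2019-01-02', '2019-01-05', '2019-01-07']
```

LAG is evaluated on the rows as they were before the DELETE, so a run of identical rows collapses to its first row in one pass. The rows that remain always differ from their neighbours.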
Presuming you really meant "when the same A, B, C occurred on the most recent prior day that had any data", this query should identify the rows that need to be removed:
SELECT t2.TimeOfChange, t2.FieldA, t2.FieldB, t2.FieldC
FROM (
SELECT tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
, MAX(tPrev.TimeOfChange) AS prevTimeOfChange
FROM t AS tMain
LEFT JOIN t AS tPrev ON tMain.TimeOfChange > tPrev.TimeOfChange
GROUP BY tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
) AS t2
INNER JOIN t AS tPrev2
ON t2.prevTimeOfChange = tPrev2.TimeOfChange
AND t2.FieldA = tPrev2.FieldA
AND t2.FieldB = tPrev2.FieldB
AND t2.FieldC = tPrev2.FieldC
This can then be used in a DELETE with some indirection to force a temp table to be created.
DELETE td
FROM t AS td
WHERE (td.TimeOfChange, td.FieldA, td.FieldB, td.FieldC)
IN (SELECT * FROM ([the query above]) AS tt) -- Yes, you have to wrap the query from above in a select * so mysql will not reject it.
;
However, having gotten this far, what happens when this data:
2019-01-01 A1 B1 C1
2019-01-02 A2 B2 C1
2019-01-03 A2 B2 C1
2019-01-04 A1 B1 C2
2019-01-05 A1 B1 C3
2019-01-05 A1 B1 C1
2019-01-06 A1 B1 C3
2019-01-07 A1 B1 C1
becomes
2019-01-01 A1 B1 C1
2019-01-02 A2 B2 C1
2019-01-04 A1 B1 C2
2019-01-05 A1 B1 C3
2019-01-05 A1 B1 C1
2019-01-07 A1 B1 C1
Does a second pass now need to be made to remove the 2019-01-07 entry?
Are you going to run the query repeatedly until no rows are affected?

Distinct rows when looking at selected columns

I have a table with 6 columns that lists order lines.
The columns are Order, Orderline, ProductName, Description, UpdateDate, Location.
What I want to do is look at the values of the first 4 columns (Order, Line, ProductName, Description).
If the rows are not identical in these four columns, I want to return
Order, Line, Name, Description and the update dates.
If they are identical, I want to
return just one of the rows (the first or the last).
Order LineNumber ProductName description UpdateDate Location
Order1 1 a1 b1 d1 n
Order1 1 a1 b1 d2 m
Order1 1 a3 b3 d5 L
Order2 1 a1 b1 d3 o
Order2 2 a2 b2 d4 m
I want the result to be:
Order LineNumber ProductName description UpdateDate Location
Order1 1 a1 b1 d1 n
Order1 1 a3 b3 d5 L
Order2 1 a1 b1 d3 o
Order2 2 a2 b2 d4 m
For Order1:
line 1 is repeated 3 times.
In 2 of the 3 rows, ProductName a1 and description b1 are identical, so one of these two rows will be returned.
In 1 of the 3 rows, ProductName a3 and description b3 are unique, so this row will be returned as well.
For Order2:
both lines are unique in Name and description, so both lines will be returned.
Any help appreciated.
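You can use a window function: number the rows within each (Order, LineNumber, ProductName, Description) group by UpdateDate and keep the first one (use ORDER BY UpdateDate DESC to keep the last one instead):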
SELECT *
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Order], LineNumber, ProductName, [DESCRIPTION] ORDER BY UpdateDate) AS RowNum
FROM YourTable) DerivedTable
WHERE RowNum = 1
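Here is a minimal sketch that checks the ROW_NUMBER approach against the question's sample rows, using an in-memory SQLite database through Python's sqlite3 module. The square brackets in [Order] are SQL Server quoting; the sketch quotes the reserved column name as "Order" instead, and in MySQL 8.0+ you would write `Order` with backticks. The final ORDER BY was added only to make the output deterministic.

```python
import sqlite3

# The question's sample rows:
# (Order, LineNumber, ProductName, Description, UpdateDate, Location)
ROWS = [
    ("Order1", 1, "a1", "b1", "d1", "n"),
    ("Order1", 1, "a1", "b1", "d2", "m"),
    ("Order1", 1, "a3", "b3", "d5", "L"),
    ("Order2", 1, "a1", "b1", "d3", "o"),
    ("Order2", 2, "a2", "b2", "d4", "m"),
]

QUERY = """
SELECT "Order", LineNumber, ProductName, Description, UpdateDate, Location
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY "Order", LineNumber, ProductName, Description
               ORDER BY UpdateDate
           ) AS RowNum
    FROM YourTable
) AS DerivedTable
WHERE RowNum = 1
ORDER BY "Order", LineNumber, ProductName
"""


def first_rows():
    """Keep the earliest row of each (Order, LineNumber, ProductName, Description) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE YourTable ("Order" TEXT, LineNumber INTEGER, ProductName TEXT,'
        " Description TEXT, UpdateDate TEXT, Location TEXT)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in first_rows():
    print(row)  # the four rows of the expected result
```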

Mysql query to get only 4 records in each row

I am facing an issue with a MySQL query. The database records are as follows:
id name
1 a1
2 a1
3 a1
4 a1
5 a1
6 a1
7 a1
8 a1
My expected result is as follows:
id name
1,2,3,4 a1
5,6,7,8 a1
That is, 4 records in each row.
Can anyone please help me?
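Please try this. MySQL user variables start with @, and the increment needs its own parentheses; otherwise 1 / 4 is evaluated first:
SELECT GROUP_CONCAT(id ORDER BY id) AS id, name
FROM (SELECT
CEIL((@rownum := @rownum + 1) / 4) AS pageNo,
id,
name
FROM
(SELECT @rownum := 0) r, temp
ORDER BY id) tempTable
GROUP BY pageNo, name
Setting user variables inside expressions is deprecated in MySQL 8.0. On that version, ROW_NUMBER() OVER (ORDER BY id) is the cleaner way to number the rows.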
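Here is a minimal sketch of the same 4-per-row grouping using ROW_NUMBER instead of user variables. It runs against an in-memory SQLite database through Python's sqlite3 module. The bucket number is (ROW_NUMBER() - 1) divided by 4 with integer division; in MySQL 8.0+ you would write that division with DIV. The table name records is hypothetical and replaces the question's temp, which is also the name of SQLite's temporary schema. The function name chunk_ids is hypothetical as well.

```python
import sqlite3


def chunk_ids(n_rows, chunk_size=4):
    """Group ids into rows of chunk_size, e.g. '1,2,3,4' and '5,6,7,8'."""
    conn = sqlite3.connect(":memory:")
    # "records" is a hypothetical stand-in for the question's table "temp".
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO records (id, name) VALUES (?, ?)",
        [(i, "a1") for i in range(1, n_rows + 1)],
    )
    rows = conn.execute(
        """
        SELECT GROUP_CONCAT(id) AS ids, name
        FROM (
            SELECT id, name,
                   -- 0 for rows 1-4, 1 for rows 5-8, ... (integer division)
                   (ROW_NUMBER() OVER (ORDER BY id) - 1) / ? AS bucket
            FROM records
        ) AS numbered
        GROUP BY bucket, name
        ORDER BY bucket
        """,
        (chunk_size,),
    ).fetchall()
    conn.close()
    # SQLite does not promise an order inside GROUP_CONCAT, so sort the ids here.
    return [
        (",".join(str(i) for i in sorted(int(x) for x in ids.split(","))), name)
        for ids, name in rows
    ]


print(chunk_ids(8))  # [('1,2,3,4', 'a1'), ('5,6,7,8', 'a1')]
```

If the row count is not a multiple of 4, the last group is simply shorter.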

Tree structure - Fetch last level data from two tables

I have data in the following structure:
A
  A1
    A2
  B
    B1
  C
    C1
      C2
        C3
This information is stored in two tables named group1 and group2.
group1 holds the first-level and middle-level data.
group2 holds the last-level and middle-level data.
That is:
group1
group_name group_id
A 1
A1 2
B 3
C 4
C1 5
C2 6
group2
group2_name parent_id
A1 1
A2 2
B 1
B1 3
C 1
C1 4
C2 5
C3 6
Now I want to get the last level of information under group A.
My output should be:
group2_name
A2
B1
C3
I can fetch the level-2 information using the query below:
select group2.group2_name from group2
inner join
group1 on group1.group_id = group2.parent_id
where group1.group_name = 'A'
How can I get the above output?
Here is SQLFIDDLE Demo
Kindly help me.
You could use this:
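select
group2.group2_name
from
group2 left join group1
on group1.group_name = group2.group2_name
where
group1.group_name is null
This returns all elements of group2 that are not present in group1, i.e. the nodes that have no children. With the sample data every node descends from A, so no extra filter is needed. A name filter such as like 'A%' would keep only A2.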
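Here is a minimal sketch that checks the anti-join against the question's two tables, using an in-memory SQLite database through Python's sqlite3 module. To restrict the result to leaves under one particular root rather than all leaves, it also shows a recursive CTE (available in MySQL 8.0+). The CTE is a swapped-in technique, not part of the answer above, and the names leaves and leaves_under are hypothetical. The CTE walks parent_id links down from the chosen root and then keeps the descendants that do not appear in group1.

```python
import sqlite3

GROUP1 = [("A", 1), ("A1", 2), ("B", 3), ("C", 4), ("C1", 5), ("C2", 6)]
GROUP2 = [("A1", 1), ("A2", 2), ("B", 1), ("B1", 3),
          ("C", 1), ("C1", 4), ("C2", 5), ("C3", 6)]

# The anti-join: group2 names with no matching group1 row (nodes without children).
LEAVES = """
SELECT group2.group2_name
FROM group2
LEFT JOIN group1 ON group1.group_name = group2.group2_name
WHERE group1.group_name IS NULL
ORDER BY group2.group2_name
"""

# Recursive variant: collect every descendant of one root, then keep the leaves.
LEAVES_UNDER = """
WITH RECURSIVE descendants(name) AS (
    SELECT g2.group2_name
    FROM group2 AS g2
    JOIN group1 AS g1 ON g1.group_id = g2.parent_id
    WHERE g1.group_name = ?
    UNION ALL
    SELECT g2.group2_name
    FROM descendants AS d
    JOIN group1 AS g1 ON g1.group_name = d.name
    JOIN group2 AS g2 ON g2.parent_id = g1.group_id
)
SELECT name FROM descendants
WHERE name NOT IN (SELECT group_name FROM group1)
ORDER BY name
"""


def _connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group1 (group_name TEXT, group_id INTEGER)")
    conn.execute("CREATE TABLE group2 (group2_name TEXT, parent_id INTEGER)")
    conn.executemany("INSERT INTO group1 VALUES (?, ?)", GROUP1)
    conn.executemany("INSERT INTO group2 VALUES (?, ?)", GROUP2)
    return conn


def leaves():
    conn = _connect()
    result = [r[0] for r in conn.execute(LEAVES)]
    conn.close()
    return result


def leaves_under(root):
    conn = _connect()
    result = [r[0] for r in conn.execute(LEAVES_UNDER, (root,))]
    conn.close()
    return result


print(leaves())            # ['A2', 'B1', 'C3']
print(leaves_under("C"))   # ['C3']
```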
Or (depending on how your database is structured) this:
select
concat(left(group2_name,1),
case when max(mid(group2_name,2,length(group2_name)-1)+0)>0 then
max(mid(group2_name,2,length(group2_name)-1)+0)
else '' end)
from group2
group by left(group2_name,1)
Here I am grouping by the first character of the name and taking the maximum of its numeric suffix. This relies on the names following the letter-plus-number pattern of the sample data, where it returns A2, B1 and C3. It does not follow the parent_id links.