bigquery - filter out unique results only - unique

My table looks like the following:
Entry-Key Name Surname Age
10a Smith Alex 35
11b Finn John 41
10a Smith Al 35
10c Finn Berta 28
11b Fin John 41
I need to get unique rows out of it. GROUP BY does not work properly because there are sometimes inaccuracies in the Name/Surname columns.
My idea was to group by just the Entry-Key, find the first appearance of each key in the table, and take only that row. I know how to do it in Excel, but since the table has some 100,000 rows, Excel is not a real option.
The idea is to end up with this table:
10a Smith Alex 35
11b Finn John 41
10c Finn Berta 28
Please help!

For your logic you can do the below query:
select key, first(name), first(surname), first(age) from
(select '10a' as key, 'Smith' as name, 'Alex' as surname, 35 as age),
(select '11b' as key, 'Finn' as name, 'John' as surname, 41 as age),
(select '10a' as key, 'Smith' as name, 'Al' as surname, 35 as age),
(select '10c' as key, 'Finn' as name, 'Berta' as surname, 28 as age),
(select '11b' as key, 'Fin' as name, 'John' as surname, 41 as age)
group by key
This returns:
+-----+-----+-------+-------+-----+
| Row | key | f0_   | f1_   | f2_ |
+-----+-----+-------+-------+-----+
| 1   | 10a | Smith | Alex  | 35  |
| 2   | 11b | Finn  | John  | 41  |
| 3   | 10c | Finn  | Berta | 28  |
+-----+-----+-------+-------+-----+
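The "first appearance per key" idea can also be tried outside BigQuery. Below is a minimal sketch using Python's built-in sqlite3 module, not the BigQuery query above. The table name `entries` and the column name `entry_key` are made up for illustration, and SQLite's implicit `rowid` is assumed to stand in for "order of appearance"; BigQuery tables have no such column, so there you would need an explicit ordering column.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (entry_key TEXT, name TEXT, surname TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("10a", "Smith", "Alex", 35),
        ("11b", "Finn", "John", 41),
        ("10a", "Smith", "Al", 35),
        ("10c", "Finn", "Berta", 28),
        ("11b", "Fin", "John", 41),
    ],
)

# Keep only the earliest-inserted row for each key
# (assumption: rowid grows with insertion order in this fresh table).
first_rows = conn.execute(
    """
    SELECT entry_key, name, surname, age
    FROM entries
    WHERE rowid IN (SELECT MIN(rowid) FROM entries GROUP BY entry_key)
    ORDER BY rowid
    """
).fetchall()

for row in first_rows:
    print(row)
```

Unlike taking first() of each column separately, this keeps whole rows, so all values in a result row come from the same source row.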

Why does this left outer join include rows twice?

In the following case:
CREATE TABLE Persons (
groupId int,
age int,
Person varchar(255)
);
insert into Persons (Person, groupId, age) values('Bob' , 1 , 32);
insert into Persons (Person, groupId, age) values('Jill' , 1 , 34);
insert into Persons (Person, groupId, age)values('Shawn' , 1 , 42);
insert into Persons (Person, groupId, age) values('Shawn' , 1 , 42);
insert into Persons (Person, groupId, age) values('Jake' , 2 , 29);
insert into Persons (Person, groupId, age) values('Paul' , 2 , 36);
insert into Persons (Person, groupId, age) values('Laura' , 2 , 39);
The following query:
SELECT *
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.groupId = b.groupId AND o.age < b.age
returns (executed in http://sqlfiddle.com/#!9/cae8023/5):
1 32 Bob 1 34 Jill
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 42 Shawn (null) (null) (null)
1 42 Shawn (null) (null) (null)
2 29 Jake 2 36 Paul
2 29 Jake 2 39 Laura
2 36 Paul 2 39 Laura
2 39 Laura (null) (null) (null)
I don't understand the result.
I was expecting
1 32 Bob 1 34 Jill
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 42 Shawn (null) (null) (null)
2 29 Jake 2 36 Paul
2 29 Jake 2 39 Laura
2 39 Laura (null) (null) (null)
The reason I was expecting that is that, in my understanding, a left join picks each row from the left table, tries to match it against each row of the right table, and adds a joined row for every match. If no row satisfies the condition, it adds the left row with null values for the right columns.
So if that is correct, why do we have in the fiddle output, after
1 34 Jill 1 42 Shawn
rows for Bob and Jill repeated?
Your condition for joining rows is that the groupId is equal and o.age < b.age.
Bob's age is 32. That is less than Jill's age of 34. It is also less than Shawn's age of 42. So the condition is satisfied in two pairings of joined rows.
The joined row has all the columns from the row referenced as o and all the columns from the row referenced as b.
Note that you have entered two rows for Shawn. Bob's row actually matches Jill's row and both rows for Shawn. So you get three rows for Bob.
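To make the counting concrete, here is a small sketch that runs the same join in SQLite through Python's sqlite3 module (an assumption for convenience; the question used MySQL) and counts how many output rows each left-hand person produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (groupId INTEGER, age INTEGER, Person TEXT)")
conn.executemany(
    "INSERT INTO Persons (Person, groupId, age) VALUES (?, ?, ?)",
    [
        ("Bob", 1, 32),
        ("Jill", 1, 34),
        ("Shawn", 1, 42),
        ("Shawn", 1, 42),  # the duplicate row from the question
        ("Jake", 2, 29),
        ("Paul", 2, 36),
        ("Laura", 2, 39),
    ],
)

# COUNT(*) counts joined output rows, including the NULL-padded ones
# produced for left rows that have no match.
rows_per_person = dict(
    conn.execute(
        """
        SELECT o.Person, COUNT(*)
        FROM Persons o
        LEFT JOIN Persons b
          ON o.groupId = b.groupId AND o.age < b.age
        GROUP BY o.Person
        """
    ).fetchall()
)

total_rows = sum(rows_per_person.values())
print(rows_per_person, total_rows)
```

Bob gets three rows (Jill plus both Shawn rows), Jill gets two, and the two Shawn rows on the left each produce one NULL-padded row, which adds up to the 11 rows in the fiddle output.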
When I test your query on my local MySQL instance (8.0.31), I get the result in the following order, which is different from your sqlfiddle's result:
+---------+------+--------+---------+------+--------+
| groupId | age | Person | groupId | age | Person |
+---------+------+--------+---------+------+--------+
| 1 | 32 | Bob | 1 | 42 | Shawn |
| 1 | 32 | Bob | 1 | 42 | Shawn |
| 1 | 32 | Bob | 1 | 34 | Jill |
| 1 | 34 | Jill | 1 | 42 | Shawn |
| 1 | 34 | Jill | 1 | 42 | Shawn |
| 1 | 42 | Shawn | NULL | NULL | NULL |
| 1 | 42 | Shawn | NULL | NULL | NULL |
| 2 | 29 | Jake | 2 | 39 | Laura |
| 2 | 29 | Jake | 2 | 36 | Paul |
| 2 | 36 | Paul | 2 | 39 | Laura |
| 2 | 39 | Laura | NULL | NULL | NULL |
+---------+------+--------+---------+------+--------+
Without an explicit ORDER BY clause, the row order is not guaranteed. In practice InnoDB returns rows in the order they are read from the index. In this case it reads both tables in clustered-index order (the table declares no primary key, so InnoDB falls back to a hidden row ID that follows insertion order), because there's no other index to optimize the join. You can see that the order of rows from the left table matches that order.
I'm not sure how to explain why the Bob-Shawn rows are before the Bob-Jill row, because that's not primary key order for the joined table. It could be that the order is messed up in the join buffer while doing an unindexed join.
The sqlfiddle might be doing something in the client that reorders rows.
You inserted the record for Shawn twice. Your script should be:
CREATE TABLE Persons (
groupId int,
age int,
Person varchar(255)
);
insert into Persons (Person, groupId, age) values('Bob' , 1 , 32);
insert into Persons (Person, groupId, age) values('Jill' , 1 , 34);
insert into Persons (Person, groupId, age)values('Shawn' , 1 , 42);
insert into Persons (Person, groupId, age) values('Jake' , 2 , 29);
insert into Persons (Person, groupId, age) values('Paul' , 2 , 36);
insert into Persons (Person, groupId, age) values('Laura' , 2 , 39);
SELECT *
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.groupId = b.groupId AND o.age < b.age
;
This will give you the following results:
1 32 Bob 1 34 Jill
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 42 Shawn (null) (null) (null)
2 29 Jake 2 36 Paul
2 29 Jake 2 39 Laura
2 36 Paul 2 39 Laura
2 39 Laura (null) (null) (null)

query to display the student table with students of age more than 18 with unique city

My table in MySQL :
mysql> select * from student;
+-----+----------+--------+------+---------+
| ano | name | gender | age | place |
+-----+----------+--------+------+---------+
| 114 | ron | m | 18 | cbe |
| 115 | dhruv | m | 18 | cbe |
| 116 | mini | f | 23 | chennai |
| 117 | yash | m | 20 | chennai |
| 118 | aathmika | f | 19 | delhi |
| 119 | aadhi | m | 9 | pune |
+-----+----------+--------+------+---------+
There was a question:
Create a query to display the student table with students of age more than 18 with unique city.
According to me, the required output is:
+-----+----------+--------+------+---------+
| ano | name | gender | age | place |
+-----+----------+--------+------+---------+
| 116 | mini | f | 23 | chennai |
| 118 | aathmika | f | 19 | delhi |
+-----+----------+--------+------+---------+
Or
+-----+----------+--------+------+---------+
| ano | name | gender | age | place |
+-----+----------+--------+------+---------+
| 117 | yash | m | 20 | chennai |
| 118 | aathmika | f | 19 | delhi |
+-----+----------+--------+------+---------+
I've tried the following :
mysql> select distinct place from student where age>18;
+---------+
| place |
+---------+
| chennai |
| delhi |
+---------+
2 rows in set (0.05 sec)
I tried to add a unique key to the place field to delete the second record with cbe, but my assumption was wrong.
mysql> alter table student add constraint unique(place);
ERROR 1062 (23000): Duplicate entry 'cbe' for key 'place'
mysql> alter table student modify place char(10) unique;
ERROR 1062 (23000): Duplicate entry 'cbe' for key 'place'
mysql> alter table student change place place char(10) unique;
ERROR 1062 (23000): Duplicate entry 'cbe' for key 'place'
mysql> select place from student where age>18 group by place having count(place)=1;
+-------+
| place |
+-------+
| delhi |
+-------+
Also,
mysql> select distinct place,name,ano,age from student where age>18;
+---------+----------+-----+------+
| place | name | ano | age |
+---------+----------+-----+------+
| chennai | mini | 116 | 23 |
| chennai | yash | 117 | 20 |
| delhi | aathmika | 118 | 19 |
+---------+----------+-----+------+
3 rows in set (0.00 sec)
When I select more fields along with distinct place, its distinct characteristic is lost!
What changes should I make to any of the above queries to get the desired output?
Thanks in advance.
Create a query to display the student table with students of age more than 18 with unique city
I understand this as: the student should be more than 18, and their place should appear only once in the table. Only one row meets this criterion: ano 118 (aathmika is 19 years old, and no other student lives in delhi).
You could phrase this as:
select s.*
from student s
where
age > 18
and not exists(select 1 from student s1 where s1.place = s.place and s1.ano <> s.ano)
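As a quick check of this reading, the sketch below loads the sample rows into SQLite via Python's sqlite3 module (an assumption for convenience; the question uses MySQL) and runs the same NOT EXISTS query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (ano INTEGER, name TEXT, gender TEXT, age INTEGER, place TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (114, "ron", "m", 18, "cbe"),
        (115, "dhruv", "m", 18, "cbe"),
        (116, "mini", "f", 23, "chennai"),
        (117, "yash", "m", 20, "chennai"),
        (118, "aathmika", "f", 19, "delhi"),
        (119, "aadhi", "m", 9, "pune"),
    ],
)

# Students older than 18 whose place is not shared with any other student.
strict_rows = conn.execute(
    """
    SELECT s.*
    FROM student s
    WHERE s.age > 18
      AND NOT EXISTS (
          SELECT 1 FROM student s1
          WHERE s1.place = s.place AND s1.ano <> s.ano
      )
    """
).fetchall()

print(strict_rows)
```

Only the delhi row survives here, because chennai is shared by mini and yash.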
Important note: check your sql_mode before executing this query. It only runs when ONLY_FULL_GROUP_BY is disabled, and which row is returned for each place is then indeterminate.
You can try the following query:
select * from student where age > 18 group by place;
The where clause filters on age, and the group by then leaves one row per place.
SQL
SELECT *
FROM student s1
WHERE age > 18
AND NOT EXISTS
(SELECT * FROM student s2
WHERE s2.ano < s1.ano /* Or could alternatively use > here */
AND s2.age > 18
AND s2.place = s1.place);
Demo
dbfiddle.uk demo
I have added a couple of extra rows in the demo for testing purposes. Please note that some of the other answers fail with this data.
For each distinct place you want only 1 row returned under the condition age > 18.
Since you don't care which row is returned when more than one row satisfies your conditions, you can GROUP BY place and get just one ano for each place with this query:
SELECT ANY_VALUE(ano) ano
FROM student
WHERE age > 18
GROUP by place
The aggregate function ANY_VALUE() will choose 1 value of ano for each place.
The above query can be joined to the table student on the column ano and return your expected results:
SELECT s.*
FROM student s
INNER JOIN (
SELECT ANY_VALUE(ano) ano
FROM student
WHERE age > 18
GROUP by place
) t ON t.ano = s.ano
See the demo.
Note that instead of ANY_VALUE() you could also use MIN() or MAX(), since you don't care which row will be returned for each place.
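Here is a minimal sketch of the MIN() variant, run in SQLite through Python's sqlite3 module since SQLite has no ANY_VALUE() (the choice of SQLite is an assumption for convenience, not part of the answer above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (ano INTEGER, name TEXT, gender TEXT, age INTEGER, place TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (114, "ron", "m", 18, "cbe"),
        (115, "dhruv", "m", 18, "cbe"),
        (116, "mini", "f", 23, "chennai"),
        (117, "yash", "m", 20, "chennai"),
        (118, "aathmika", "f", 19, "delhi"),
        (119, "aadhi", "m", 9, "pune"),
    ],
)

# Pick the smallest ano per place among students older than 18,
# then join back to fetch the full rows.
one_per_place = conn.execute(
    """
    SELECT s.*
    FROM student s
    INNER JOIN (
        SELECT MIN(ano) AS ano
        FROM student
        WHERE age > 18
        GROUP BY place
    ) t ON t.ano = s.ano
    ORDER BY s.ano
    """
).fetchall()

print(one_per_place)
```

Swapping MIN for MAX would return yash instead of mini for chennai, which is the second output the question accepts.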
From the question, I think the result should have students with age>18 and the city should be only once in the table.
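select * from student group by place having count(*) = 1 and min(age) > 18;
This query groups by place and returns the rows whose place occurs only once in the table and whose age is above 18. Because each surviving group has a single row, min(age) is simply that row's age. Note that select * together with group by requires ONLY_FULL_GROUP_BY to be disabled.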
If you are not bothered by the ambiguity in the results (as you noted, mini and yash are both from chennai and above 18), I think the following queries may help you:
SELECT ano, name, age, place FROM student WHERE ano IN (SELECT MIN(ano) FROM student WHERE age > 18 GROUP BY place)
Results:
116 mini 23 chennai
118 aathmika 19 delhi
Or you can use:
SELECT ano, name, age, place FROM student WHERE ano IN (SELECT MAX(ano) FROM student WHERE age > 18 GROUP BY place)
Results:
117 yash 20 chennai
118 aathmika 19 delhi

Create a new column from existing columns

I want to create a new column from this example table, but with one more condition that I haven't been able to figure out so far: the new column should hold the average holdings specific to each city.
Name | City | Holdings
Tom Jones | London | 42
Rick James| Paris | 36
Mike Tim | NY | 70
Milo James| London | 83
So in this example table London has more than one instance and accordingly it will have a unique value of '62.5' indicating an average of holdings that's specific to the value in the city column.
Name | City | Holdings | City Avg. Holdings
Tom Jones | London | 42 | 62.5
Rick James| Paris | 36 | 36
Mike Tim | NY | 70 | 70
Milo James| London | 83 | 62.5
In MySQL 8.0, this is straight-forward with window functions:
select t.*, avg(holdings) over(partition by city) avg_city_holdings
from mytable t
In earlier versions, you can join:
select t.*, a.avg_city_holdings
from mytable t
left join (select city, avg(holdings) avg_city_holdings from mytable group by city) a
on a.city = t.city
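The join version can be tried quickly in SQLite through Python's sqlite3 module. This is only a sketch: the table name `mytable` comes from the answer above, the holdings are the ones shown in the expected-result table, and SQLite is an assumption made so the example runs without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, city TEXT, holdings INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Tom Jones", "London", 42),
        ("Rick James", "Paris", 36),
        ("Mike Tim", "NY", 70),
        ("Milo James", "London", 83),
    ],
)

# Compute the per-city average once, then attach it to every row of that city.
with_city_avg = conn.execute(
    """
    SELECT t.name, t.city, t.holdings, a.avg_city_holdings
    FROM mytable t
    LEFT JOIN (
        SELECT city, AVG(holdings) AS avg_city_holdings
        FROM mytable
        GROUP BY city
    ) a ON a.city = t.city
    ORDER BY t.rowid
    """
).fetchall()

for row in with_city_avg:
    print(row)
```

Both London rows get 62.5, that is (42 + 83) / 2, while Paris and NY each keep their single value as the average.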

Use MySQL to create a string out of a subquery?

What I'm hoping to do is create a string out of a table WITHIN a query so that I may be able to place that string in another query I'm creating. Say, I have this for a table:
index | position | name
----------------------------------------
1 | member | John Smith
2 | chair | Mary Jones
3 | member | Mary Jones
4 | contact | Grace Adams
5 | director | Grace Adams
6 | member | Grace Adams
7 | treasurer | Bill McDonnell
8 | vice chair | Bill McDonnell
9 | member | Ishmael Rodriguez
I'm looking for the result as follows:
name | positions
----------------------------------------
John Smith | member
Mary Jones | chair,member
Grace Adams | contact,director,member
Bill McDonnell | treasurer,vice chair
Ishmael Rodriguez | member
I was hoping I could use some variant of CONCAT_WS() to get my result, like this...
SELECT
a.NAME,
CONCAT_WS(
',',
(
SELECT
position
FROM
TABLE
WHERE
NAME = a.NAME
)
)AS positions FROM ---------------
Obviously, this isn't working out for me. Any ideas?
Use GROUP_CONCAT:
SELECT name, GROUP_CONCAT(position) result
FROM tableName
GROUP BY name
ORDER BY MIN(`index`)
SQLFiddle Demo
Use GROUP_CONCAT like so:
SELECT name, GROUP_CONCAT(position SEPARATOR ',')
FROM Table
GROUP BY name
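SQLite happens to offer a group_concat() aggregate as well, so the idea can be sketched with Python's sqlite3 module. The table name `members` and the column name `idx` (in place of `index`) are made up for this example. SQLite does not promise any order inside the concatenated string, so the sketch sorts the pieces in Python; in MySQL you could write GROUP_CONCAT(position ORDER BY position) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; `idx` replaces the reserved word `index`.
conn.execute("CREATE TABLE members (idx INTEGER, position TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        (1, "member", "John Smith"),
        (2, "chair", "Mary Jones"),
        (3, "member", "Mary Jones"),
        (4, "contact", "Grace Adams"),
        (5, "director", "Grace Adams"),
        (6, "member", "Grace Adams"),
        (7, "treasurer", "Bill McDonnell"),
        (8, "vice chair", "Bill McDonnell"),
        (9, "member", "Ishmael Rodriguez"),
    ],
)

rows = conn.execute(
    "SELECT name, group_concat(position) FROM members GROUP BY name"
).fetchall()

# Split and sort each list so the result does not depend on SQLite's
# unspecified concatenation order.
positions = {name: sorted(joined.split(",")) for name, joined in rows}

for name, held in positions.items():
    print(name, ",".join(held))
```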

Tricky SQL query involving consecutive values

I need to perform a relatively easy to explain but (given my somewhat limited skills) hard to write SQL query.
Assume we have a table similar to this one:
exam_no | name | surname | result | date
---------+------+---------+--------+------------
1 | John | Doe | PASS | 2012-01-01
1 | Ryan | Smith | FAIL | 2012-01-02 <--
1 | Ann | Evans | PASS | 2012-01-03
1 | Mary | Lee | FAIL | 2012-01-04
... | ... | ... | ... | ...
2 | John | Doe | FAIL | 2012-02-01 <--
2 | Ryan | Smith | FAIL | 2012-02-02
2 | Ann | Evans | FAIL | 2012-02-03
2 | Mary | Lee | PASS | 2012-02-04
... | ... | ... | ... | ...
3 | John | Doe | FAIL | 2012-03-01
3 | Ryan | Smith | FAIL | 2012-03-02
3 | Ann | Evans | PASS | 2012-03-03
3 | Mary | Lee | FAIL | 2012-03-04 <--
Note that exam_no and date aren't necessarily related as one might expect from the kind of example I chose.
Now, the query that I need to do is as follows:
From the latest exam (exam_no = 3) find all the students that have failed (John Doe, Ryan Smith and Mary Lee).
For each of these students find the date of the first of the batch of consecutively failing exams. Another way to put it would be: for each of these students find the date of the first failing exam that comes after their last passing exam. (Look at the arrows in the table).
The resulting table should be something like this:
name | surname | date_since_failing
------+---------+--------------------
John | Doe | 2012-02-01
Ryan | Smith | 2012-01-02
Mary | Lee | 2012-03-04
How can I perform such a query?
Thank you for your time.
You can take advantage of the fact that if someone passed the most recent exam, then they have not failed any exams since their most recent pass: therefore the problem reduces to finding the first exam failed since the most recent pass:
SELECT name, surname, MIN(date) date_since_fail
FROM results NATURAL LEFT JOIN (
SELECT name, surname, MAX(date) lastpass
FROM results
WHERE result = 'PASS'
GROUP BY name, surname
) t
WHERE result = 'FAIL' AND date > IFNULL(lastpass,0)
GROUP BY name, surname
See it on sqlfiddle.
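The same approach can be checked against the sample rows from the question. The sketch below uses SQLite through Python's sqlite3 module (an assumption for convenience), spells out the join condition instead of using NATURAL, and handles students with no pass at all through an explicit IS NULL test. Like the query above, it orders exams by date rather than by exam_no.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (exam_no INTEGER, name TEXT, surname TEXT, result TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", "PASS", "2012-01-01"),
        (1, "Ryan", "Smith", "FAIL", "2012-01-02"),
        (1, "Ann", "Evans", "PASS", "2012-01-03"),
        (1, "Mary", "Lee", "FAIL", "2012-01-04"),
        (2, "John", "Doe", "FAIL", "2012-02-01"),
        (2, "Ryan", "Smith", "FAIL", "2012-02-02"),
        (2, "Ann", "Evans", "FAIL", "2012-02-03"),
        (2, "Mary", "Lee", "PASS", "2012-02-04"),
        (3, "John", "Doe", "FAIL", "2012-03-01"),
        (3, "Ryan", "Smith", "FAIL", "2012-03-02"),
        (3, "Ann", "Evans", "PASS", "2012-03-03"),
        (3, "Mary", "Lee", "FAIL", "2012-03-04"),
    ],
)

# For each student: earliest FAIL dated after their last PASS
# (or earliest FAIL overall if they never passed).
failing_since = conn.execute(
    """
    SELECT r.name, r.surname, MIN(r.date) AS date_since_failing
    FROM results r
    LEFT JOIN (
        SELECT name, surname, MAX(date) AS lastpass
        FROM results
        WHERE result = 'PASS'
        GROUP BY name, surname
    ) p ON p.name = r.name AND p.surname = r.surname
    WHERE r.result = 'FAIL'
      AND (p.lastpass IS NULL OR r.date > p.lastpass)
    GROUP BY r.name, r.surname
    ORDER BY r.name
    """
).fetchall()

for row in failing_since:
    print(row)
```

Ann Evans drops out because her only failure predates her latest pass, so no FAIL row survives the date filter for her.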
I would use a subquery that fetches the last passed exam,
something like:
SET @query_exam_no = 3;
SELECT
name,
surname,
MIN(IF(last_passed_exam IS NULL OR date > last_passed_exam, date, NULL)) AS date_failing_since
FROM
exam_results
LEFT JOIN (
SELECT
name,
surname,
MAX(date) AS last_passed_exam
FROM exam_results
WHERE result = 'PASS'
GROUP BY name, surname
) AS last_passed_exams USING (name, surname)
GROUP BY name, surname
HAVING
MAX(IF(exam_no = @query_exam_no, result, NULL)) = 'FAIL'
This is enough:
select t.name,
t.surname,
t.date as 'date_since_failing'
from tablename t
inner join
(
select name,
surname,
max(exam_no) as exam_no
from tablename
group by name, surname
having min(result) = 'FAIL'
) aux on t.name = aux.name and t.surname = aux.surname and t.exam_no = aux.exam_no
The condition you are asking for isn't needed; you can do it without it. Here is a working example.
select
e.name,
e.sur_name,
min(e.date) as `LastFailed`
from exams as e
where e.result = 'Fail'
group by e.name, e.sur_name
order by e.name
This produces this result
name sur_name LastFailed
Ann Evans 2012-02-03
John Doe 2012-02-01
Mary Lee 2012-01-04
Ryan Smith 2012-01-02