I have this type of table:
A.code A.name
1. X
2. Y
3. X
4. Z
5. Y
And I need to write a query that gives me all duplicated names like this:
A.name
X
Y
Z
Without using "group by".
A correlated subquery is your friend here. Because the subquery references the alias of the outer table, it is evaluated once for every row of the outer query.
Inside the subquery, the same table is queried again without the alias to check whether the current row's name occurs more than once.
SELECT DISTINCT name FROM Names AS CorrelatedNamesTable
WHERE
(
SELECT COUNT(Name) FROM Names WHERE Name = CorrelatedNamesTable.Name
) > 1
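As a rough check of the correlated subquery above, here is a minimal sketch that runs it against the sample data from the question using Python's built-in sqlite3 module. The table name Names and the column names code and name are assumptions taken from the answer and the question's header, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

# In-memory database holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Names (code INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Names (code, name) VALUES (?, ?)",
    [(1, "X"), (2, "Y"), (3, "X"), (4, "Z"), (5, "Y")],
)

# The inner COUNT runs once per outer row and counts rows sharing its name.
duplicated = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT name
        FROM Names AS CorrelatedNamesTable
        WHERE (
            SELECT COUNT(name) FROM Names
            WHERE name = CorrelatedNamesTable.name
        ) > 1
        ORDER BY name
        """
    )
]
print(duplicated)  # ['X', 'Y']
```

Z is left out because it occurs only once; dropping the WHERE clause would return X, Y and Z.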
Try using DISTINCT for the column. Please note in tables with a large number of rows, this is not the best performance option.
SELECT DISTINCT A.Name FROM A
SELECT DISTINCT a1.name FROM A a1, A a2 WHERE a1.name=a2.name AND a1.code<>a2.code
This assumes code is unique ;).
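To see what the self-join does on the sample data, the sketch below runs it with and without DISTINCT using Python's sqlite3 module. The table and column names follow the question; the choice of SQLite is an assumption made only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (code INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO A (code, name) VALUES (?, ?)",
    [(1, "X"), (2, "Y"), (3, "X"), (4, "Z"), (5, "Y")],
)

join_sql = """
    SELECT {distinct} a1.name
    FROM A a1, A a2
    WHERE a1.name = a2.name AND a1.code <> a2.code
    ORDER BY a1.name
"""

# Every duplicated pair matches in both directions (1-3 and 3-1),
# so the plain join lists each duplicated name more than once.
raw_names = [r[0] for r in conn.execute(join_sql.format(distinct=""))]
unique_names = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]

print(raw_names)     # ['X', 'X', 'Y', 'Y']
print(unique_names)  # ['X', 'Y']
```

The a1.code <> a2.code condition is what keeps a row from matching itself, which is why code has to be unique.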
Is there a way to go through a FOR LOOP in a SELECT-query? (1)
I am asking because I do not know how to collect, in a single SELECT-query, some data from table t_2 for each row of table t_1 (please see the UPDATE for an example). Yes, it's true that we can GROUP BY a UNIQUE INDEX, but what if it's not present? Or how do we request all rows from t_1, each concatenated with a specific related row from t_2? So it seems that in a Perfect World we would be able to loop through a table with a proper SQL command (R). Maybe ANY(...) will help?
Here I've tried to find the maximal count of repetitions in column prop among all values of that column in table t.
I.e., I've tried to carry out something like Pandas' t.groupby(prop).max() in an SQL query (Q1):
SELECT Max(C) FROM (SELECT Count(t_1.prop) AS C
FROM t AS t_1
WHERE t_1.prop = ANY (SELECT prop
FROM t AS t_2));
But it only throws the error:
Every derived table must have its own alias.
I don't understand this error. Why does it happen? (2)
Yes, we can implement Pandas' value_counts(...) far more easily by using SELECT prop, COUNT(*) FROM t GROUP BY prop. But I wanted to do it in a "looping" way, staying in a "single non-grouping SELECT-query mode", for reason (R).
This sub-query, which attempts to imitate Pandas' t.value_counts(...) (Q2):
SELECT Count(t_1.prop) AS C FROM t AS t_1 WHERE t_1.prop = ANY(SELECT prop FROM t AS t_2)
results in 6, which is simply the number of rows in t. The result is logical: the ANY clause returned TRUE for every row, and once all rows had been gathered, COUNT(...) returned the number of gathered (i.e. all) rows.
By the way, it seems to me that the "full" previous SELECT-query (Q1) should return that very 6.
So the main question is: how do we loop in such a query? Is there such an opportunity?
UPDATE
The answer to question (2) is found here, thanks to Luuk. I just assigned an alias to the (...) subquery, as in SELECT Max(C) FROM (...) AS sq, and it worked. And of course, I got 6. So question (1) is still unclear.
I've also tried to do an iteration this way (Q3):
SELECT (SELECT prop_2 FROM t_2 WHERE t_2.prop_1 = t_1.prop) AS isq FROM t_1;
Here, prop_2 in t_2 relates to prop_1 (a.k.a. prop in t_1) as many to one, so isq (the inner select query) returns several rows of prop_2 values for each prop value in t_1.
And that is why (Q3) throws the error:
Subquery returns more than 1 row.
Again, logical. So, I couldn't create a loop in a single non-grouping SELECT-query.
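One way to get a per-row "loop" without GROUP BY is a correlated scalar subquery that returns a single aggregated value for each outer row, so it cannot hit the "more than 1 row" error. The sketch below is only an illustration: the six prop values are invented, SQLite (via Python's sqlite3) stands in for the real database, and the derived table carries the alias sq that the first error message asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (prop TEXT)")
# Six hypothetical rows; only the row count matches the question.
conn.executemany(
    "INSERT INTO t (prop) VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],
)

# For each row of t_1, the inner query counts the rows of t sharing its prop.
per_row = conn.execute(
    """
    SELECT t_1.prop,
           (SELECT COUNT(*) FROM t AS t_2 WHERE t_2.prop = t_1.prop) AS c
    FROM t AS t_1
    ORDER BY t_1.prop
    """
).fetchall()

# Wrapping that in an aliased derived table gives the maximal repetition count.
max_count = conn.execute(
    """
    SELECT MAX(c) FROM (
        SELECT (SELECT COUNT(*) FROM t AS t_2 WHERE t_2.prop = t_1.prop) AS c
        FROM t AS t_1
    ) AS sq
    """
).fetchone()[0]

print(per_row)    # [('a', 3), ('a', 3), ('a', 3), ('b', 2), ('b', 2), ('c', 1)]
print(max_count)  # 3
```

The difference from (Q2) is the correlation t_2.prop = t_1.prop inside the subquery: the count is restricted to the current row's value instead of covering the whole table.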
This query will return the value for b with the highest count:
SELECT b, count(*)
FROM table1
GROUP BY b
ORDER BY count(*) DESC
LIMIT 1;
see: DBFIDDLE
EDIT: Without GROUP BY
SELECT b,C1
FROM (
SELECT
b,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A) C1,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A DESC) C2
FROM table1
) x
WHERE x.C2=1
see: DBFIDDLE
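The window-function query above can be tried out with the sketch below. The sample rows are invented, column a is assumed to be unique within each b, and the final ORDER BY ... LIMIT 1 is an addition here to pick the top value. It uses Python's sqlite3 module, which needs an SQLite build of 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO table1 (a, b) VALUES (?, ?)",
    [(1, "p"), (2, "p"), (3, "q"), (4, "p"), (5, "q"), (6, "r")],
)

# C1 numbers the rows of each b ascending by a, C2 descending.
# The row with C2 = 1 is the last one, so its C1 equals the group size.
counts_sql = """
    SELECT b, C1
    FROM (
        SELECT
            b,
            ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) AS C1,
            ROW_NUMBER() OVER (PARTITION BY b ORDER BY a DESC) AS C2
        FROM table1
    ) x
    WHERE x.C2 = 1
"""

per_value = conn.execute(counts_sql + " ORDER BY b").fetchall()
top = conn.execute(counts_sql + " ORDER BY C1 DESC LIMIT 1").fetchone()

print(per_value)  # [('p', 3), ('q', 2), ('r', 1)]
print(top)        # ('p', 3)
```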
Basically, I have two tables. From table A, I want the total number of rows, which I can get with SELECT COUNT(*) FROM A as the first query. From table B, I want to select all columns, which I can do with SELECT * FROM B as the second query. My question is how to use a single query to add the result of the first query as a column to the result of the second. In other words, I want every row from table B to carry an extra column holding the total number of rows in table A.
CROSS JOIN it:
SELECT * FROM
(SELECT COUNT(*) as cnt FROM A) a
CROSS JOIN
B
A join makes the result set wider; a union makes it taller. Any time you want to grow the number of columns you have to join, but if you haven't got anything to join ON you can use a CROSS JOIN, as it doesn't require any ON predicates.
You could alternatively use an INNER JOIN with a predicate that is always true, an old-style join syntax without any associated WHERE, or you can put a select that returns a single value as a subquery in the select list without any coordinating predicates. Most DBAs would probably assert that none of these are preferable to the CROSS JOIN syntax, because CROSS JOIN is an explicit statement of your intent, whereas the others might just look like you forgot something.
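A small sketch of the CROSS JOIN and of the scalar-subquery alternative, run through Python's sqlite3 module. The columns of A and B and their contents are invented for illustration, and the derived table is aliased a_count here rather than a purely to keep it visually distinct from table A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO A (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO B (id, label) VALUES (?, ?)",
    [(10, "first"), (20, "second")],
)

# The derived table has exactly one row, so the cross join keeps
# B's row count and only adds the cnt column.
joined = conn.execute(
    """
    SELECT B.id, B.label, a_count.cnt
    FROM (SELECT COUNT(*) AS cnt FROM A) a_count
    CROSS JOIN B
    ORDER BY B.id
    """
).fetchall()

# Alternative mentioned above: an uncorrelated scalar subquery in the select list.
scalar = conn.execute(
    """
    SELECT B.id, B.label, (SELECT COUNT(*) FROM A) AS cnt
    FROM B
    ORDER BY B.id
    """
).fetchall()

print(joined)  # [(10, 'first', 3), (20, 'second', 3)]
```

Both forms return the same rows for this data.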
I'm trying to write an SQL query that returns all the unique names and the number of occurrences of each name.
This is what I came up with, but it merely gets the count of all distinct names and not the count for each name separately.
select distinct(etunimi) as etunimi,
(select count(distinct(etunimi)) as määrä from jasenet)
from jasenet;
Is this the right way to go about solving this problem, or is there another way of achieving this? Thank you.
If you group by a column then aggregate functions like count() apply to each group and not the complete result set.
select etunimi, count(*)
from jasenet
group by etunimi
That's because you haven't referenced the outer query's column inside the subquery.
So, it should be correlated like this:
select distinct etunimi,
(select count(*)
from jasenet j1
where j1.etunimi = j.etunimi
) as määrä
from jasenet j;
However, I would also suggest using a GROUP BY clause, which is usually more efficient than a correlated subquery.
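To compare the two approaches side by side, here is a minimal sketch using Python's sqlite3 module. The sample names are invented, and the count column is aliased n rather than määrä just to keep the example ASCII-only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jasenet (etunimi TEXT)")
conn.executemany(
    "INSERT INTO jasenet (etunimi) VALUES (?)",
    [("Matti",), ("Liisa",), ("Matti",), ("Pekka",), ("Liisa",), ("Matti",)],
)

# GROUP BY: count(*) is computed once per group of equal names.
grouped = conn.execute(
    """
    SELECT etunimi, COUNT(*) AS n
    FROM jasenet
    GROUP BY etunimi
    ORDER BY etunimi
    """
).fetchall()

# Correlated subquery: the inner count is tied to the outer row's name.
correlated = conn.execute(
    """
    SELECT DISTINCT etunimi,
           (SELECT COUNT(*) FROM jasenet j1 WHERE j1.etunimi = j.etunimi) AS n
    FROM jasenet j
    ORDER BY etunimi
    """
).fetchall()

print(grouped)  # [('Liisa', 2), ('Matti', 3), ('Pekka', 1)]
```

Both queries give the same per-name counts on this data.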
I have this query:
SELECT (@a:=@a+1) AS priority
FROM (SELECT t1.name FROM t1 LIMIT 100) x, (SELECT @a:=0) r
A few questions:
1 - What is the comma doing between the SELECTs? I have never seen a comma between commands, and I don't know what it means.
2 - Why is the second SELECT given a name?
3 - Why is the second SELECT inside brackets?
4 - Performance-wise: does it select the first 100 rows from t1 and then assign them a number? What is going on here?
It is performing a CROSS JOIN (a Cartesian product of the rows) but without the explicit syntax. The following 2 queries produce identical results:
SELECT *
FROM TableA, TableB
SELECT *
FROM TableA
CROSS JOIN TableB
The query in the question uses 2 "derived tables" instead. I would encourage you to use the explicit join syntax CROSS JOIN and never use just commas. The biggest issue with using just commas is you have no idea if the Cartesian product is deliberate or accidental.
Both "derived tables" have been given an alias - and that is a good thing. How else would you reference some item of the first or second "derived table"? e.g. Imagine they were both queries that had the column ID in them, you would then be able to reference x.ID or r.ID
As for what the overall query is doing, first note that the second query returns just a single row. So even though the syntax produces a CROSS JOIN, it does not expand the total number of rows, because 100 * 1 = 100. In effect the subquery "r" adds a "placeholder" @a (initially at value zero) to every row. Once that @a is available on each row, you can increment the value by 1 for each row, and as a result that column produces a row number.
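The row-count argument can be checked with the sketch below, which uses Python's sqlite3 module. SQLite has no MySQL-style user variables, so the single-row derived table r only carries a constant column named start (a hypothetical name), and the numbering is done with ROW_NUMBER() instead, a different technique swapped in for the variable counter. The t1 contents and the limit of 3 are invented, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT)")
conn.executemany(
    "INSERT INTO t1 (name) VALUES (?)",
    [("alpha",), ("bravo",), ("charlie",), ("delta",), ("echo",)],
)

# x has 3 rows and r has 1 row, so the cross join yields 3 * 1 = 3 rows.
crossed = conn.execute(
    """
    SELECT x.name, r.start
    FROM (SELECT t1.name FROM t1 ORDER BY t1.name LIMIT 3) x
    CROSS JOIN (SELECT 0 AS start) r
    ORDER BY x.name
    """
).fetchall()

# Window-function stand-in for the incrementing counter.
numbered = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY x.name) AS priority, x.name
    FROM (SELECT t1.name FROM t1 ORDER BY t1.name LIMIT 3) x
    ORDER BY priority
    """
).fetchall()

print(crossed)   # [('alpha', 0), ('bravo', 0), ('charlie', 0)]
print(numbered)  # [(1, 'alpha'), (2, 'bravo'), (3, 'charlie')]
```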
x and r are effectively anonymous views produced by the SELECT statements. If you imagine that instead of using SELECTs in brackets, you defined a view using the select statement and then referred to the view, the syntax would be clear.
The selects are given names so that you can refer to these names in WHERE conditions, joins or in the list of fields to select.
That is the syntax. You have to have brackets.
Yes, it selects the first 100 rows. I am not sure what you mean by "gives them a number".
I have a database that has the following columns:
-------------------
id|domain|hit_count
-------------------
And I would like to perform this query on it:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY MIN(hit_count)
I would like this query to give me the id of the row that had the smallest hit_count for $domain. The only problem is that if I have two rows that have the same domain, say www.bestbuy.com, the query will just group by whichever one came first, and then although I will get the correct lowest hit_count, the id may or may not be the id of the row that has the lowest hit_count.
Does anyone know of a way for me to perform this query and to get the id that matches up with MIN(hit_count)? Thanks!
Try this:
SELECT id,MIN(hit_count),domain FROM table GROUP BY domain HAVING domain='$domain'
See, when you're using aggregates, either via aggregate functions (and min() is such a function) or via GROUP BY or HAVING operators, your data is being grouped. In your case it is grouped by domain. You have 2 fields in your select list, id and min(hit_count).
Now, for each group the database knows which hit_count to pick, as you've specified this explicitly via the aggregate function. But what about id: which one should be included?
When ONLY_FULL_GROUP_BY is disabled, MySQL returns an arbitrary value from the group for such columns, which I find an error-prone approach. Most other RDBMSes raise an error for such a query.
The rule is: if you use aggregates, then every column in the select list should be either an argument of an aggregate function or listed in the GROUP BY clause.
To achieve the desired result, you need a subquery:
SELECT id, domain, hit_count
FROM `table`
WHERE domain = '$domain'
AND hit_count = (SELECT min(hit_count) FROM `table` WHERE domain = '$domain');
I've used backticks, as table is a reserved word in SQL.
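Here is a minimal sketch of the subquery approach, run with Python's sqlite3 module. The table name hits and the sample rows are hypothetical, and the domain is passed as a bound parameter rather than interpolated as '$domain', which is a design choice of this example and not something the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "hits" is a made-up name that avoids quoting the reserved word table.
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, domain TEXT, hit_count INTEGER)"
)
conn.executemany(
    "INSERT INTO hits (id, domain, hit_count) VALUES (?, ?, ?)",
    [
        (1, "www.bestbuy.com", 7),
        (2, "www.bestbuy.com", 3),
        (3, "www.example.com", 1),
        (4, "www.bestbuy.com", 9),
    ],
)

# The subquery finds the minimum for the domain; the outer query then
# returns the row(s) that actually hold that hit_count.
lowest = conn.execute(
    """
    SELECT id, domain, hit_count
    FROM hits
    WHERE domain = :domain
      AND hit_count = (SELECT MIN(hit_count) FROM hits WHERE domain = :domain)
    """,
    {"domain": "www.bestbuy.com"},
).fetchall()

print(lowest)  # [(2, 'www.bestbuy.com', 3)]
```

If several rows tie for the minimum, all of them come back, so a LIMIT or a tie-breaking ORDER BY may be needed.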
SELECT
id,
hit_count
FROM
table
WHERE
domain='$domain'
AND hit_count = (SELECT MIN(hit_count) FROM table WHERE domain='$domain')
Try this:
SELECT id,hit_count
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY hit_count ASC;
This should also work:
select id, MIN(hit_count) from table where domain="$domain";
I had the same question. Please see that question below:
min(column) is not returning me correct data of other columns
You are using GROUP BY, which means each row in the result represents a group of values.
One of those values is the group name (the value of the field you grouped by). The rest are arbitrary values from within that group.
For example the following table:
F1 | F2
1 aa
1 bb
1 cc
2 gg
2 hh
If you group by F1: SELECT F1,F2 from T GROUP BY F1
You will get two rows:
1 and one value from (aa,bb,cc)
2 and one value from (gg,hh)
If you want a deterministic result set, you need to tell the software which algorithm to apply to the group. Several examples:
MIN
MAX
COUNT
SUM
etc etc
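The sketch below applies a few of these aggregates to the F1/F2 example table, using Python's sqlite3 module. Every selected column is either the grouping column or an aggregate, so the result is deterministic; the ORDER BY is added only to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (F1 INTEGER, F2 TEXT)")
conn.executemany(
    "INSERT INTO T (F1, F2) VALUES (?, ?)",
    [(1, "aa"), (1, "bb"), (1, "cc"), (2, "gg"), (2, "hh")],
)

# One output row per F1 value; each aggregate states which value
# (or summary) of F2 to report for that group.
summary = conn.execute(
    """
    SELECT F1, MIN(F2), MAX(F2), COUNT(*)
    FROM T
    GROUP BY F1
    ORDER BY F1
    """
).fetchall()

print(summary)  # [(1, 'aa', 'cc', 3), (2, 'gg', 'hh', 2)]
```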
The simplest way: your query is OK, just add the DESC keyword after GROUP BY domain.
SELECT
id,
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
Explanation:
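When you use GROUP BY with an aggregate function, the non-aggregated columns come from the first record of the group; with the DESC keyword they come from the last record of that group instead. Note that MySQL does not guarantee which row supplies the non-aggregated values, and the GROUP BY ... DESC syntax was removed in MySQL 8.0.
For testing purposes, use this query, which only adds group_concat.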
SELECT
group_concat(id),
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
If you can have duplicated domains, group by id:
SELECT id,MIN(hit_count)
FROM domain WHERE domain='$domain'
GROUP BY id ORDER BY MIN(hit_count)