I am trying to fetch the distinct count of every column in a single query. Consider the table below.
COL1 | COL2 | COL3
A | 5 | C
B | 5 | C
C | 5 | C
C | 5 | C
D | 7 | C
Expected result
DC_COL1 | DC_COL2 | DC_COL3 #DC - Distinct count
4 | 2 | 1
As far as I know, the above result cannot be achieved in a single query (a single full table scan) using valid GROUP BY functions, so what optimisations could be done here?
Firing an individual query for each column might result in a full table scan per column. Even though the entire table might already be in the buffer pool after the distinct count query for the first column, this will still be a performance issue on large tables.
It can be done in a single table scan:
SELECT
COUNT(DISTINCT COL1) DC_COL1,
COUNT(DISTINCT COL2) DC_COL2,
COUNT(DISTINCT COL3) DC_COL3
FROM tablename
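To check the query against the sample data from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere); the column types are guesses.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (COL1 TEXT, COL2 INTEGER, COL3 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [("A", 5, "C"), ("B", 5, "C"), ("C", 5, "C"), ("C", 5, "C"), ("D", 7, "C")],
)

# One statement, one pass over the table, one distinct count per column.
distinct_counts = conn.execute(
    """
    SELECT COUNT(DISTINCT COL1) AS DC_COL1,
           COUNT(DISTINCT COL2) AS DC_COL2,
           COUNT(DISTINCT COL3) AS DC_COL3
    FROM tablename
    """
).fetchone()

print(distinct_counts)  # (4, 2, 1)
```

The result matches the expected row in the question. Note that COUNT(DISTINCT ...) ignores NULL values, which does not matter for this sample.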
I have two tables, Table1 and Table2.
Based on a value in a column of Table1, can I inner join a certain number of columns from Table2, joining on ID?
Table1:
id | col_number |
1 | 2
2 | 3
Table2:
id | col1 | col2 | col3
1 | BRK | GOOG | APPL
2 | AMZN | INTC | TSLA
Expected outcome if the query is run for ID 1:
id | col_number | col1 | col2
1 | 2 | BRK | GOOG
I haven’t been able to find many examples of conditional inner joins easy enough for me to attempt to understand them. Those I have found are conditional on different tables, not columns.
Fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=4efaf735c1f28fa8a9e55d77ca30fa71
The select-list of an SQL query must be fixed before the query is parsed and prepared, and that happens before the query begins reading any rows of data. This means you can't make a query that returns a different number of columns depending on the data values in some of the rows it reads.
Also, any query result must have the same number of columns in every row, not a dynamic number of columns.
You could, however, make some of the expressions return NULL in some columns depending on a data value.
SELECT table1.id, table1.col_number,
CASE WHEN table1.col_number >= 1 THEN table2.col1 ELSE NULL END AS col1,
CASE WHEN table1.col_number >= 2 THEN table2.col2 ELSE NULL END AS col2,
CASE WHEN table1.col_number >= 3 THEN table2.col3 ELSE NULL END AS col3
FROM table1 JOIN table2 USING (id);
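A runnable sketch of the CASE approach on the question's data follows. It assumes Python's sqlite3 module in place of MySQL (an assumption for portability); `ELSE NULL` is left out because a CASE without ELSE already yields NULL.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); data copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col_number INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES (1, 2), (2, 3);
    INSERT INTO table2 VALUES (1, 'BRK', 'GOOG', 'APPL'), (2, 'AMZN', 'INTC', 'TSLA');
    """
)

# The column count is fixed at three; col_number only decides which are NULL.
rows = conn.execute(
    """
    SELECT table1.id, table1.col_number,
           CASE WHEN table1.col_number >= 1 THEN table2.col1 END AS col1,
           CASE WHEN table1.col_number >= 2 THEN table2.col2 END AS col2,
           CASE WHEN table1.col_number >= 3 THEN table2.col3 END AS col3
    FROM table1 JOIN table2 USING (id)
    ORDER BY table1.id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 2, 'BRK', 'GOOG', None)
# (2, 3, 'AMZN', 'INTC', 'TSLA')
```

For id 1 the third column is still present in the result, but it is NULL (None in Python), which is as close to the expected outcome as a fixed select-list allows.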
If I have three tables, each having 1000 rows with numbers from 1 to 1000, how many comparisons does MySQL perform during the following JOIN queries:
SELECT *
FROM table_1 t1
JOIN table_2 t2 ON t1.id_1 = t2.id_2
JOIN table_3 t3 ON t2.id_2 = t3.id_3;
and
SELECT *
FROM table_1 t1
JOIN table_2 t2
JOIN table_3 t3
WHERE t1.id_1 = t2.id_2 AND t2.id_2 = t3.id_3;
Note that there are no indexes on any of the tables. Is there a difference between the two queries? My thinking is that the second query is written as if MySQL would create a Cartesian product of all three tables and then filter out the rows matching the condition (having to scan 1000 x 1000 x 1000 rows), but that internally it is translated to the first query, which would scan 1000 x 1000 rows in the first JOIN followed by another 1000 x 1000 rows in the second JOIN. The result should be the same either way (1000 rows, with all three columns holding the same number from 1 to 1000).
Which one is it: 1000 x 1000 x 1000 rows, or 1000 x 1000 + 1000 x 1000 rows?
Question arose after reading the following from "Query Optimization" chapter from MySQL by Paul DuBois(4th edition, page 306):
Those two queries are treated the same.
For clarity, ON should state how the tables are related; WHERE should filter.
But, for optimization, those are equivalent for JOIN.
For LEFT JOIN, it does matter where you put the conditions.
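The difference between ON and WHERE can be seen on a tiny example. This is only a sketch of the result semantics, not of what the MySQL optimizer does internally; it assumes Python's sqlite3 module as a stand-in for MySQL, and the tables `t1`, `t2` and the `flag` column are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); t1, t2 and flag are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, flag INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 1), (2, 0);
    """
)

def run(sql):
    """Run a query and return all rows."""
    return conn.execute(sql).fetchall()

select = "SELECT t1.id, t2.id FROM t1 "
order = " ORDER BY t1.id"

# Inner join: the extra condition gives the same rows in ON or in WHERE.
inner_on = run(select + "JOIN t2 ON t1.id = t2.id AND t2.flag = 1" + order)
inner_where = run(select + "JOIN t2 ON t1.id = t2.id WHERE t2.flag = 1" + order)

# Left join: ON keeps unmatched t1 rows (padded with NULL), WHERE drops them.
left_on = run(select + "LEFT JOIN t2 ON t1.id = t2.id AND t2.flag = 1" + order)
left_where = run(select + "LEFT JOIN t2 ON t1.id = t2.id WHERE t2.flag = 1" + order)

print(inner_on, inner_where)  # [(1, 1)] [(1, 1)]
print(left_on)                # [(1, 1), (2, None), (3, None)]
print(left_where)             # [(1, 1)]
```

With the LEFT JOIN, a condition in ON only decides which t2 rows count as matches, while the same condition in WHERE filters the joined rows afterwards and discards the NULL-padded ones.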
As an extra wrinkle to your question: if the tables had different numbers of rows, the Optimizer would probably start with the smallest table.
See this to find out how to count the rows: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts
Please provide SHOW CREATE TABLE so we can understand id_1, etc.
It would be foolish to run such code without indexes.
What is the expected output? Something like this?
+---+---+---+
| n | n | n |
+---+---+---+
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
...
During a table join, when does MySQL use this function?
The single result column that replaces two common columns is defined
using the coalesce operation. That is, for two t1.a and t2.a the
resulting single join column a is defined as a = COALESCE(t1.a, t2.a),
where:
COALESCE(x, y) = (CASE WHEN x IS NOT NULL THEN x ELSE y END)
https://dev.mysql.com/doc/refman/8.0/en/join.html
I know what the function does, but I want to know when it is used during the join operation. This just makes no sense to me! Can someone show me an example?
That passage refers to redundant column elimination in NATURAL JOIN and in joins with USING. It describes how the duplicate common columns are excluded from display.
The order of operation is described above the section you referenced.
First, coalesced common columns of the two joined tables, in the order in which they occur in the first table
Second, columns unique to the first table, in order in which they occur in that table
Third, columns unique to the second table, in order in which they occur in that table
Example
t1
| a | b | c |
| 1 | 1 | 1 |
t2
| a | b | d |
| 1 | 1 | 1 |
The join with using
SELECT * FROM t1 JOIN t2 USING (b);
would result in column b being coalesced (because of USING), followed by the remaining columns of the first table, followed by the remaining columns of the second table.
| b | a | c | a | d |
| 1 | 1 | 1 | 1 | 1 |
Whereas a natural join
SELECT * FROM t1 NATURAL JOIN t2;
would result in the common columns of both tables (a and b) being coalesced, followed by the columns unique to the first table, followed by those unique to the second table.
| a | b | c | d |
| 1 | 1 | 1 | 1 |
I have inherited a table where one column is a comma-separated list of primary keys for a different table:
id | other_ids | value
---|-----------|-------
1 | a,b,c | 100
2 | d,e | 200
3 | f,g | 3000
I would like to convert this table to one where each other_id gets a row of its own:
id | other_id
---|---------
1 | a
1 | b
1 | c
2 | d
2 | e
3 | f
3 | g
However, I cannot think of a way to do this.
The table is > 10 GB in size, so I would like to do this inside the database, if possible.
first time post, please be kind.
Try this
select id,SUBSTRING_INDEX(other_ids,',',1) as other_id from reverseconcat
UNION
select id,SUBSTRING_INDEX(SUBSTRING_INDEX(other_ids,',',2),',',-1) as other_id from reverseconcat
UNION
select id,SUBSTRING_INDEX(SUBSTRING_INDEX(other_ids,',',3),',',-1) as other_id from reverseconcat
order by id
I can't really take any credit, though. I found this on http://www.programering.com/a/MzMyUzNwATg.html
I am unsure how this will perform on a huge dataset. You will also need to add one more UNION branch for each additional element if any other_ids list has more than 3 entries.
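To see why the nested SUBSTRING_INDEX calls pick out the n-th element, and why UNION rather than UNION ALL is needed, here is a small Python sketch. The `substring_index` function is a rough emulation of the MySQL function written for this illustration, not a MySQL API, and `nth_item` is a hypothetical helper name.

```python
def substring_index(s, delim, count):
    """Rough emulation of MySQL's SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter (or the whole string).
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""

def nth_item(s, n):
    """Keep the first n items, then take the last item of that prefix."""
    return substring_index(substring_index(s, ",", n), ",", -1)

table = [(1, "a,b,c"), (2, "d,e"), (3, "f,g")]
max_items = 3  # one UNION branch per position, as in the query above

# A set models UNION: duplicate (id, other_id) pairs are removed.
rows = sorted(
    {(row_id, nth_item(ids, n)) for row_id, ids in table for n in range(1, max_items + 1)}
)

print(nth_item("d,e", 3))  # e  (the list is too short, so the last item repeats)
for row in rows:
    print(row)
```

For lists shorter than three items the third branch returns the last item again, and the duplicate is removed only because UNION deduplicates. The flip side is that a list with a genuinely repeated value, such as `a,a`, would also collapse to a single row.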
If you have the other table, then you can use a join and find_in_set():
select t.id, ot.pk as other_id
from t join
othertable ot
on find_in_set(ot.pk, t.other_ids) > 0;
Alright, so I have a table with two columns of IDs. I want to make one of the columns distinct, and once it is distinct, select all of those rows whose second column has a certain ID.
Originally I tried:
select distinct inp_kll_id from kb3_inv_plt where inp_plt_id = 581;
However this does the where clause first, and then returns distinct values.
Alternatively:
select * from (select distinct(inp_kll_id) from kb3_inv_plt) as inp_kll_id where inp_plt_id = 581;
However this cannot find the column inp_plt_id because distinct only returns the column, not the whole table.
Any suggestions?
Edit:
Each kll_id may have one or more plt_id. I would like unique kll_id's for a certain kb3_inv_plt id.
| inp_kll_id | inp_plt_id |
| 1941 | 41383 |
| 1942 | 41276 |
| 1942 | 38005 |
| 1942 | 39052 |
| 1942 | 40611 |
| 1943 | 5868 |
| 1943 | 4914 |
| 1943 | 39511 |
| 1944 | 39511 |
| 1944 | 41276 |
| 1944 | 40593 |
| 1944 | 26555 |
If by "make distinct" you mean "pick only the inp_kll_ids that occur just once" (which is not the SQL semantics of DISTINCT), this should work:
select inp_kll_id
from kb3_inv_plt
group by inp_kll_id
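-- max() is used because inp_plt_id is not in the GROUP BY; with one row per group it is that row's value
having count(*) = 1 and max(inp_plt_id) = 581;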
Get all the distinct values first (alias 'a' in the following example) and then join them back to the table with the specified criteria (alias 'b' in the following example).
SELECT *
FROM (
SELECT
DISTINCT inp_kll_id
FROM kb3_inv_plt
) a
LEFT JOIN kb3_inv_plt b
ON a.inp_kll_id = b.inp_kll_id
WHERE b.inp_plt_id = 581
"in this table are two columns with ID's. I want to make one of the columns distinct, and once it is distinct to select all of those from the second column of a certain ID."
SELECT distinct tableX.ID2
FROM tableX
WHERE tableX.ID1 = 581
I think your understanding of distinct may be different from how it works. This will indeed apply the where clause first, and then get a distinct list of unique entries of tableX.ID2, which is exactly what you ask for in the first part of your question.
By making a row distinct, you're ensuring no other rows are exactly the same. You aren't making a column distinct. Let's say your table has this data:
ID1 ID2
10 4
10 3
10 7
4 6
When you select distinct ID1,ID2 - you get the same as select * because the rows are already distinct.
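The point that DISTINCT works on whole result rows, and is applied after WHERE, can be checked with the four sample rows above. This is a minimal sketch assuming Python's sqlite3 module in place of MySQL; the table name `tableX` is taken from the earlier query.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); data copied from the table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (ID1 INTEGER, ID2 INTEGER)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?)",
    [(10, 4), (10, 3), (10, 7), (4, 6)],
)

all_rows = conn.execute("SELECT * FROM tableX").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT ID1, ID2 FROM tableX").fetchall()
distinct_id1 = conn.execute("SELECT DISTINCT ID1 FROM tableX ORDER BY ID1").fetchall()
# WHERE filters first; DISTINCT then removes duplicate result rows.
filtered = conn.execute(
    "SELECT DISTINCT ID2 FROM tableX WHERE ID1 = 10 ORDER BY ID2"
).fetchall()

print(sorted(all_rows) == sorted(distinct_rows))  # True
print(distinct_id1)  # [(4,), (10,)]
print(filtered)      # [(3,), (4,), (7,)]
```

Selecting both columns with DISTINCT returns every row because no two rows are identical, while selecting only ID1 collapses the three rows with ID1 = 10 into one.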
Can you add information to clear up what you are trying to do?