MySQL: Conditionally inner join X number of columns

I have two tables, Table1 and Table2.
Based on a value in a column of Table1, can I inner join a certain number of columns from Table2, joining on id?
Table1:
id | col_number
1 | 2
2 | 3
Table2:
id | col1 | col2 | col3
1 | BRK | GOOG | APPL
2 | AMZN | INTC | TSLA
Expected outcome, if the query is run for id 1:
id | col_number | col1 | col2
1 | 2 | BRK | GOOG
I haven't found many examples of conditional inner joins simple enough for me to follow. The ones I have found are conditional on different tables, not columns.
Fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=4efaf735c1f28fa8a9e55d77ca30fa71

The select-list of an SQL query is fixed when the query is parsed and prepared, and that happens before the query begins reading any rows of data. This means you can't write a query that returns a different number of columns depending on the data values in some of the rows it reads.
Also, any query result must have the same number of columns in every row, not a dynamic number of columns.
You could, however, make some of the expressions return NULL in some columns depending on a data value.
SELECT table1.id, table1.col_number,
CASE WHEN table1.col_number >= 1 THEN table2.col1 ELSE NULL END AS col1,
CASE WHEN table1.col_number >= 2 THEN table2.col2 ELSE NULL END AS col2,
CASE WHEN table1.col_number >= 3 THEN table2.col3 ELSE NULL END AS col3
FROM table1 JOIN table2 USING (id);
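A quick way to check the CASE approach is to run it against the question's sample rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption: this CASE/USING query is portable between the two engines). Every row comes back with five columns; for id 1, `col3` is simply NULL (`None` in Python) because `col_number` is 2.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: the CASE and
# USING syntax used here behaves the same in both engines).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col_number INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES (1, 2), (2, 3);
    INSERT INTO table2 VALUES (1, 'BRK', 'GOOG', 'APPL'), (2, 'AMZN', 'INTC', 'TSLA');
""")

# The result always has five columns; CASE blanks out the ones past col_number.
rows = conn.execute("""
    SELECT table1.id, table1.col_number,
           CASE WHEN table1.col_number >= 1 THEN table2.col1 ELSE NULL END AS col1,
           CASE WHEN table1.col_number >= 2 THEN table2.col2 ELSE NULL END AS col2,
           CASE WHEN table1.col_number >= 3 THEN table2.col3 ELSE NULL END AS col3
    FROM table1 JOIN table2 USING (id)
    ORDER BY table1.id
""").fetchall()

for row in rows:
    print(row)
```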

Related

Fetch distinct count of all columns in a single MySQL query

I am trying to fetch the distinct count of every column in a single query. Consider the table below.
COL1 | COL2 | COL3
A | 5 | C
B | 5 | C
C | 5 | C
C | 5 | C
D | 7 | C
Expected result
DC_COL1 | DC_COL2 | DC_COL3 #DC - Distinct count
4 | 2 | 1
As far as I know, the above result cannot be achieved in a single query (a single full table scan) using valid GROUP BY functions. What optimisations could be done here?
Firing a separate query for each column might cause a full table scan per column. Even if the entire table is already in the buffer pool after the distinct-count query for the first column, this will still be a performance issue on large tables.
It can be done in a single table scan:
SELECT
COUNT(DISTINCT COL1) DC_COL1,
COUNT(DISTINCT COL2) DC_COL2,
COUNT(DISTINCT COL3) DC_COL3
FROM tablename
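As a sanity check on the sample data, the sketch below runs that statement through Python's `sqlite3` module, standing in for MySQL here (assumption: `COUNT(DISTINCT ...)` behaves the same in both). It only confirms the 4 / 2 / 1 result; it says nothing about how MySQL plans the scan.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (COL1 TEXT, COL2 INTEGER, COL3 TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    ("A", 5, "C"), ("B", 5, "C"), ("C", 5, "C"), ("C", 5, "C"), ("D", 7, "C"),
])

# One statement, three independent distinct-count aggregates.
counts = conn.execute("""
    SELECT COUNT(DISTINCT COL1) AS DC_COL1,
           COUNT(DISTINCT COL2) AS DC_COL2,
           COUNT(DISTINCT COL3) AS DC_COL3
    FROM tablename
""").fetchone()

print(counts)  # (4, 2, 1)
```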

Count all records that do not exist in another table - SQL Query

I have two tables, and I'm trying to count all records from Table1 and Table1_delta where the pagename from Table1_delta is not yet listed in Table1. If a pagename from Table1_delta is listed in Table1, its status must be 1 for it to be included in the count.
Sample table structure:
Table1
+-----------+--------+
| pagename | status |
+-----------+--------+
| pagename1 | 2 |
| pagename2 | 1 |
+-----------+--------+
Table1_delta
+-----------+
| pagename |
+-----------+
| pagename1 |
| pagename2 |
| pagename3 |
| pagename4 |
+-----------+
The sample data should return 3.
pagename3 and pagename4 are not listed in Table1 (that gives 2), and pagename2 in Table1 has status = 1 (that gives 1). In total, there are 3 pagenames from Table1_delta that are either not listed in Table1 or listed there with status = 1. What would the query for this look like? I'm using MySQL v5.6.17. Thanks!
Here is an alternative solution using joins:
SELECT COUNT(*)
FROM Table1_delta t1 LEFT JOIN Table1 t2
ON t1.pagename = t2.pagename
WHERE t2.status IS NULL OR t2.status = 1
Here is what the joined rows from the above query look like before the WHERE filter is applied:
+-----------+--------+
| pagename | status |
+-----------+--------+
| pagename1 | 2 | # this row is NOT counted
| pagename2 | 1 | # +1 this row has status = 1 and is counted
| pagename3 | null | # +1 this row has status = null and is counted
| pagename4 | null | # +1 this row is also null and is counted
+-----------+--------+
Check out the link below for a running demo.
SQLFiddle
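For readers without access to the fiddle, here is a minimal sketch that rebuilds the sample tables in an in-memory SQLite database via Python's `sqlite3` module (a stand-in for MySQL 5.6, assumed to treat this LEFT JOIN identically). It collects the joined rows, which should line up with the annotated table, and then the filtered count.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (pagename TEXT, status INTEGER);
    CREATE TABLE Table1_delta (pagename TEXT);
    INSERT INTO Table1 VALUES ('pagename1', 2), ('pagename2', 1);
    INSERT INTO Table1_delta VALUES
        ('pagename1'), ('pagename2'), ('pagename3'), ('pagename4');
""")

# Joined rows before the WHERE filter; unmatched pagenames get a NULL status.
joined = conn.execute("""
    SELECT t1.pagename, t2.status
    FROM Table1_delta t1 LEFT JOIN Table1 t2 ON t1.pagename = t2.pagename
    ORDER BY t1.pagename
""").fetchall()

# The answer's query: keep unmatched rows (NULL status) or rows with status = 1.
total = conn.execute("""
    SELECT COUNT(*)
    FROM Table1_delta t1 LEFT JOIN Table1 t2 ON t1.pagename = t2.pagename
    WHERE t2.status IS NULL OR t2.status = 1
""").fetchone()[0]

print(joined)
print(total)  # 3
```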
Try using joins:
Select count(Table1_delta.pagename) from Table1_delta
INNER JOIN Table1 ON
Table1_delta.pagename != Table1.pagename
AND Table1.status != 1
If I've understood correctly:
SELECT COUNT(*) FROM Table1_delta
WHERE pagename NOT IN
(SELECT pagename FROM Table1 WHERE status = 1)
Update
As requested in the comments, here's what this query does:
First, the subquery: SELECT pagename FROM Table1 WHERE status = 1, retrieves the pagename field from those Table1 records where status is 1.
So in the example case, it'll return a single row, containing pagename2.
Then the main query counts all the records in Table1_delta (SELECT COUNT(*) FROM Table1_delta) whose pagename is not among the values returned by the subquery (WHERE pagename NOT IN (<subquery>)).
So this would match 3 entries (pagename1, pagename3, pagename4), and that's the count you get.
Historically, sub-queries have been considered slower than joins, but frankly, RDBMSs have come a long way in optimizing queries, and for a simple case like this one the subquery is "probably" faster (I haven't measured). It really depends on the actual case and database, but the SQL is much more self-explanatory than the join version, IMO. Your mileage may vary.
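The three steps described in the update can be traced on the sample data. This sketch again uses Python's `sqlite3` module as a stand-in for MySQL, and the `NOT_IN_FILTER` string is just a hypothetical helper to avoid repeating the WHERE clause. As the explanation says, the rows counted are pagename1, pagename3 and pagename4.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (pagename TEXT, status INTEGER);
    CREATE TABLE Table1_delta (pagename TEXT);
    INSERT INTO Table1 VALUES ('pagename1', 2), ('pagename2', 1);
    INSERT INTO Table1_delta VALUES
        ('pagename1'), ('pagename2'), ('pagename3'), ('pagename4');
""")

# Step 1: the subquery on its own.
subquery_rows = [row[0] for row in conn.execute(
    "SELECT pagename FROM Table1 WHERE status = 1")]

# Hypothetical helper: the shared FROM/WHERE part of the main query.
NOT_IN_FILTER = """
    FROM Table1_delta
    WHERE pagename NOT IN (SELECT pagename FROM Table1 WHERE status = 1)
"""

# Step 2: which Table1_delta rows survive the NOT IN filter.
counted = [row[0] for row in conn.execute(
    "SELECT pagename " + NOT_IN_FILTER + " ORDER BY pagename")]

# Step 3: the count the answer's query returns.
total = conn.execute("SELECT COUNT(*) " + NOT_IN_FILTER).fetchone()[0]

print(subquery_rows)  # ['pagename2']
print(counted)        # ['pagename1', 'pagename3', 'pagename4']
print(total)          # 3
```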

How to separate a large table into two tables and combine the sub-tables

To speed up searching, I need to split a large table into two tables.
Example table:
+--------+--------+-------+------+
| source | target | count | prob |
+--------+--------+-------+------+
| test1 | test2 | 2 | 1 |
| cat | dog | 3 | 1.5|
| dog | cat | 1 | 0.5|
+--------+--------+-------+------+
Using the code below:
INSERT INTO Table2 (source,target,count,prob)
SELECT source,target,count,prob FROM Table1 WHERE count <2;
then deleting the originals:
DELETE FROM Table1 WHERE count<2;
After the split, count keeps accumulating in Table1, and new rows for the same pair can appear after the split.
For example:
the row source = 'dog', target = 'cat', count = 1 will be moved to Table2, while Table1 keeps accumulating, either by adding to the count or by adding a new row source = 'dog', target = 'cat', count = 3.
How could I combine Table1 and Table2? (Table2 will not change after the split.)
You can combine the results with UNION:
SELECT source, target, count, prob FROM tbl1
UNION
SELECT source, target, count, prob FROM tbl2
Just note that there are many better ways to improve performance on large tables.
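A minimal sketch of the split-then-recombine flow, using Python's `sqlite3` module as a stand-in for MySQL. The extra `('dog', 'cat', 3, NULL)` row is a hypothetical stand-in for the later activity the question describes, with `prob` left NULL because no value is given. Keep in mind that `UNION` removes exact duplicate rows (it means `UNION DISTINCT`), while `UNION ALL` keeps them; the second query, which sums `count` per pair, is an illustrative variant and not part of the answer.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (source TEXT, target TEXT, count INTEGER, prob REAL);
    CREATE TABLE Table2 (source TEXT, target TEXT, count INTEGER, prob REAL);
    INSERT INTO Table1 VALUES
        ('test1', 'test2', 2, 1), ('cat', 'dog', 3, 1.5), ('dog', 'cat', 1, 0.5);

    -- Split: copy low-count rows into Table2, then delete them from Table1.
    INSERT INTO Table2 (source, target, count, prob)
        SELECT source, target, count, prob FROM Table1 WHERE count < 2;
    DELETE FROM Table1 WHERE count < 2;

    -- Hypothetical later activity: a new ('dog', 'cat') row lands in Table1.
    -- prob is NULL because the question gives no value for it.
    INSERT INTO Table1 VALUES ('dog', 'cat', 3, NULL);
""")

# Recombine both tables as in the answer (UNION also drops exact duplicates).
# ORDER BY 1, 2, 3 sorts by output column position.
combined = conn.execute("""
    SELECT source, target, count, prob FROM Table1
    UNION
    SELECT source, target, count, prob FROM Table2
    ORDER BY 1, 2, 3
""").fetchall()

# Variant (not from the answer): UNION ALL keeps every row, then SUM merges
# the counts per (source, target) pair.
totals = conn.execute("""
    SELECT source, target, SUM(count) AS count
    FROM (
        SELECT source, target, count FROM Table1
        UNION ALL
        SELECT source, target, count FROM Table2
    ) AS both_tables
    GROUP BY source, target
    ORDER BY source, target
""").fetchall()

print(combined)
print(totals)
```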

SQL: select rows in one table whose key rows in another table have given values

It is difficult to explain what I want in the title, so I'll try to do it with an example here. I have 2 tables:
[Table1]
set_id | data
-------+-----
1 | 123
2 | 456
3 | 789
4 | 987
[Table2]
set_id | single_id
-------+----------
1 | 10
2 | 10
2 | 13
3 | 10
3 | 13
3 | 14
4 | 10
4 | 15
I need to select the row in Table1 whose set_id has, in Table2, exactly the single_ids given in the query and no others. For example:
For query (10, 13) resulting row should be 2 | 456.
For query (10) resulting row should be 1 | 123.
For query (10, 13, 14) resulting row should be 3 | 789.
How can this be done?
This is an example of a set-within-sets subquery. I think the most general approach is to use aggregation with a having clause:
select t1.set_id, t1.data
from table1 t1 join
table2 t2
on t1.set_id = t2.set_id
group by t1.set_id
having sum(t2.single_id = 10) > 0 and
sum(t2.single_id = 13) > 0 and
sum(t2.single_id not in (10, 13)) = 0;
Each condition in the having clause tests one thing. The first checks that a row with 10 is present; the second, that a row with 13 is present; and the last, that no other values are present.
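To try the HAVING approach with each of the question's three inputs, the sketch below generates one `SUM(t2.single_id = ?) > 0` test per wanted value plus the final `NOT IN` test. It runs on Python's `sqlite3` module as a stand-in for MySQL; both engines evaluate a comparison to 1 or 0, which is what lets `SUM()` count matches. The helper name `sets_matching` is hypothetical, and it also groups by `t1.data` so the query does not rely on MySQL's handling of non-aggregated columns.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (set_id INTEGER PRIMARY KEY, data INTEGER);
    CREATE TABLE table2 (set_id INTEGER, single_id INTEGER);
    INSERT INTO table1 VALUES (1, 123), (2, 456), (3, 789), (4, 987);
    INSERT INTO table2 VALUES
        (1, 10), (2, 10), (2, 13), (3, 10), (3, 13), (3, 14), (4, 10), (4, 15);
""")

def sets_matching(wanted):
    """Hypothetical helper: (set_id, data) rows whose single_ids are exactly `wanted`."""
    wanted = list(wanted)
    if not wanted:
        raise ValueError("need at least one single_id")
    # One "this value is present" test per wanted value...
    present = " AND ".join("SUM(t2.single_id = ?) > 0" for _ in wanted)
    # ...plus one "no other value is present" test.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT t1.set_id, t1.data
        FROM table1 t1 JOIN table2 t2 ON t1.set_id = t2.set_id
        GROUP BY t1.set_id, t1.data
        HAVING {present}
           AND SUM(t2.single_id NOT IN ({placeholders})) = 0
    """
    # Parameters fill the "present" tests first, then the NOT IN list.
    return conn.execute(sql, wanted + wanted).fetchall()

print(sets_matching([10, 13]))      # [(2, 456)]
print(sets_matching([10]))          # [(1, 123)]
print(sets_matching([10, 13, 14]))  # [(3, 789)]
```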
EDIT:
In MySQL, there is actually another approach which might seem more intuitive:
select t1.set_id, t1.data
from table1 t1 join
table2 t2
on t1.set_id = t2.set_id
group by t1.set_id
having group_concat(distinct t2.single_id order by t2.single_id) = '10,13';
That is, concatenate the distinct values together, in order, and compare them to a constant string.

Use string matching to de-dupe results of query

I have a table with the format:
Id | Loc |
-------|-----|
789-A | 4 |
123 | 1 |
123-BZ | 1 |
123-CG | 2 |
456 | 2 |
456 | 3 |
789 | 4 |
I want to exclude certain rows from a query's results based on whether a duplicate exists. In this case, though, the definition of a duplicate row is pretty complex:
If any row returned by the query (let's refer to this hypothetical row as ThisRow) has a counterpart row also contained within the query results where Loc is identical to ThisRow.Loc AND Id is of the form <ThisRow.Id>-<an alphanumeric suffix> then ThisRow should be considered a duplicate and excluded from the query results.
For example, using the table above, SELECT * FROM table should return the results set below:
Id | Loc |
-------|-----|
789-A | 4 |
123-BZ | 1 |
123-CG | 2 |
456 | 2 |
456 | 3 |
I understand how to write the string matching conditional:
ThisRow.Id REGEXP '^PossibleDuplicateRow.Id-[A-Za-z0-9]*'
and the straight comparison of Loc:
ThisRow.Loc = PossibleDuplicateRow.Loc
What I don't understand is how to form these conditionals into a (self-referential?) query.
Here's one way:
SELECT *
FROM myTable t1
WHERE NOT EXISTS
(
SELECT 1
FROM myTable t2
WHERE t2.Loc = t1.Loc
AND t2.Id LIKE CONCAT(t1.Id, '-%')
)
SQL Fiddle example
Or, the same query using an anti-join (which should be a little faster):
SELECT t1.*
FROM myTable t1
LEFT OUTER JOIN myTable t2
ON t2.Loc = t1.Loc
AND t2.Id LIKE CONCAT(t1.Id, '-%')
WHERE t2.Id IS NULL
SQL Fiddle example
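Both queries can be checked against the sample table. This sketch uses Python's `sqlite3` module as a stand-in for MySQL; because `CONCAT()` is missing from older SQLite versions, the LIKE pattern is built with SQLite's `||` operator instead (in MySQL, `||` means logical OR by default, so keep `CONCAT` there). Both result lists should match the expected output in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Id TEXT, Loc INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    ("789-A", 4), ("123", 1), ("123-BZ", 1), ("123-CG", 2),
    ("456", 2), ("456", 3), ("789", 4),
])

# NOT EXISTS: drop t1 when some row at the same Loc has Id '<t1.Id>-...'.
not_exists = conn.execute("""
    SELECT t1.Id, t1.Loc
    FROM myTable t1
    WHERE NOT EXISTS (
        SELECT 1 FROM myTable t2
        WHERE t2.Loc = t1.Loc AND t2.Id LIKE (t1.Id || '-%')
    )
    ORDER BY t1.Id, t1.Loc
""").fetchall()

# Anti-join: same test, keeping only t1 rows that found no t2 partner.
anti_join = conn.execute("""
    SELECT t1.Id, t1.Loc
    FROM myTable t1
    LEFT OUTER JOIN myTable t2
      ON t2.Loc = t1.Loc AND t2.Id LIKE (t1.Id || '-%')
    WHERE t2.Id IS NULL
    ORDER BY t1.Id, t1.Loc
""").fetchall()

print(not_exists)
print(anti_join)
```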
In the example data you give, each prefix row has at most one suffixed counterpart at the same loc. For example, you don't have a row "123-AZ, 1", which would make the prefix row "123, 1" conflict with two rows.
If this is a real characteristic of the data, then you can eliminate dups without a self join, by using aggregation:
select max(id), loc
from t
group by (case when locate('-', id) = 0 then id
else left(id, locate('-', id) - 1)
end), loc
I offer this because an aggregation should be much faster than a non-equijoin.
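Here is a sketch of the aggregation on the same sample data, again via Python's `sqlite3` as a stand-in for MySQL. SQLite has no `LOCATE()`, so it uses `instr(Id, '-')`, which takes the string first and the substring second (MySQL's `INSTR` has the same order, while MySQL's `LOCATE` takes the substring first). `MAX(Id)` picks the suffixed Id because a string that extends another string sorts after it.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id TEXT, Loc INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("789-A", 4), ("123", 1), ("123-BZ", 1), ("123-CG", 2),
    ("456", 2), ("456", 3), ("789", 4),
])

# Group on (prefix before '-', Loc) and keep the largest Id in each group.
dedup = conn.execute("""
    SELECT MAX(Id), Loc
    FROM t
    GROUP BY CASE WHEN instr(Id, '-') = 0 THEN Id
                  ELSE substr(Id, 1, instr(Id, '-') - 1)
             END,
             Loc
    ORDER BY 1, 2
""").fetchall()

print(dedup)
```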