MySQL query results in a specific way

I have two tables, my_table1 and my_table2.
my_table1 contains numbers from 1 to 10 and my_table2 contains letters a, b, c and d.
I want to do a query which returns the following:
1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 d
All the way until the end.
Is there any possible way to do this in SQL?
Thanks in advance.

That is a cross join. You can write it in the simple (old) form by just listing both tables, select * from table1, table2, but this syntax is outdated, and your queries will become very hard to read if you mix it with the more modern explicit joins that were introduced in 1992. So I'd choose to write the explicit cross join.
Also, it looks like you want the results sorted. If you're lucky this happens automatically, but you cannot be sure that this will always happen, so best to specify it if you need it. If not, omit the order by clause, because it does make the query slower.
select
n.nr,
l.letter
from
my_table1 n
cross join my_table2 l
order by
n.nr,
l.letter
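To try the query above without a MySQL server, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite standing in for MySQL is an assumption of this sketch, and the column names nr and letter are the ones assumed in the answer, not something the question confirms.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table1 (nr INTEGER)")
conn.execute("CREATE TABLE my_table2 (letter TEXT)")
conn.executemany("INSERT INTO my_table1 (nr) VALUES (?)", [(n,) for n in range(1, 11)])
conn.executemany("INSERT INTO my_table2 (letter) VALUES (?)", [(c,) for c in "abcd"])

# Explicit cross join: every number is paired with every letter.
rows = conn.execute(
    """
    SELECT n.nr, l.letter
    FROM my_table1 n
    CROSS JOIN my_table2 l
    ORDER BY n.nr, l.letter
    """
).fetchall()

for nr, letter in rows[:8]:
    print(nr, letter)
print(len(rows), "rows in total")
```

With 10 numbers and 4 letters the result has 10 * 4 = 40 rows, starting with 1 a and ending with 10 d.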

That is a CROSS JOIN. In MySQL it is syntactically equivalent to an INNER JOIN without a join condition, or to a plain comma between the tables:
SELECT * FROM my_table1, my_table2;
cf. https://dev.mysql.com/doc/refman/5.7/en/join.html

Related

How to add a column of a constant value from a query to another query result?

Basically, I have two tables. From table A, I want to calculate the total number of rows from it. I can use SELECT COUNT(*) FROM A as the first query to get it. From other table B, I want to select all things(columns) from it. I can use SELECT * FROM B as the second query. My question is how to use a single query to add the result from the first query as a column to the second query result. In other words, I want to have an extra column with the value of total number of rows from Table A to all things from Table B, by using a single query.
CROSS JOIN it:
SELECT * FROM
(SELECT COUNT(*) as cnt FROM A) a
CROSS JOIN
B
A join makes the result set wider. A union makes the result set taller. Any time you want to grow the number of columns you have to join, but if you haven't got anything to join ON, you can use a CROSS JOIN, as it doesn't require any ON predicate.
You could alternatively use an INNER JOIN with a predicate that is always true, the old-style join syntax without any associated WHERE, or a select that returns a single value as a subquery in the select list without any coordinating predicates. Most DBAs would probably assert that none of these is preferable to the CROSS JOIN syntax, because CROSS JOIN is an explicit statement of your intent, whereas the others might just look like you forgot something.
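As a rough illustration of the CROSS JOIN answer, the sketch below runs it on toy data from Python with an in-memory SQLite database. SQLite as the engine, the columns id and label, and the derived-table alias totals are all assumptions made for the example. The alias is deliberately not called a, to avoid any confusion with table A in a case-insensitive engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, "x"), (2, "y")])

# The one-row derived table is paired with every row of B,
# so each row of B gains the same cnt column.
rows = conn.execute(
    """
    SELECT B.*, totals.cnt
    FROM (SELECT COUNT(*) AS cnt FROM A) totals
    CROSS JOIN B
    ORDER BY B.id
    """
).fetchall()

for row in rows:
    print(row)
```

Because the derived table has exactly one row, the row count of B is unchanged; only a column is added.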

Is there a way to multiply results in SQL?

I am building a website which populates from a database. I'm testing now, and I'd like to see what my site will look like with a lot of data (mainly so I can watch performance, build out pagination, and address any issues with presentation). I have about 10 pieces of data in my table, which is great, but I'd like to display about 2,000 on my page.
Is there a way I can read from the same SELECT * FROM table statement over and over again in the same query in order to read the table multiple times?
I can do this by feeding all my results into a variable and echoing that variable multiple times, but it won't allow me to set a LIMIT or give me the proper count of rows from the query.
I'm surprised I haven't found a way to do this by Googling. It seems like it would be an easy, built-in thing.
If there's not, can you suggest any other way I can do this without modifying my original table?
Use a cross join. A cross join gives you the Cartesian product of the rows from the joined tables, so it can generate a lot of data in a short amount of time, which is useful for extensive testing.
Example:
SELECT * FROM A
CROSS JOIN B;
You can cross join on the same table as well.
As of MySQL 8 you can use a recursive query to get your rows multifold:
with recursive cte (a, b, c) as
(
select a, b, 1 from mytable
union all
select a, b, c + 1 from cte where c < 10 -- ten times as many
)
select a, b from cte;
(You can of course alter the generated values in the part after union all, e.g.: select a + 5, b * 2, c + 1 from cte where c < 10.)
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3a2699c167e1f4a7ffbe4e9b17ac7241
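For a quick local check of the recursive approach, here is a sketch that runs the same CTE from Python against an in-memory SQLite database, which also supports WITH RECURSIVE. Using SQLite instead of MySQL 8, the two-column layout of mytable, and the :copies parameter are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL 8 (assumption)
conn.execute("CREATE TABLE mytable (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO mytable (a, b) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 11)],  # 10 original rows
)

# Each pass of the recursive part re-emits the rows with c + 1
# until c reaches :copies, so the table is repeated :copies times.
CTE = """
WITH RECURSIVE cte (a, b, c) AS (
    SELECT a, b, 1 FROM mytable
    UNION ALL
    SELECT a, b, c + 1 FROM cte WHERE c < :copies
)
"""

all_rows = conn.execute(CTE + "SELECT a, b FROM cte", {"copies": 200}).fetchall()
first_page = conn.execute(
    CTE + "SELECT a, b FROM cte LIMIT 25", {"copies": 200}
).fetchall()

print(len(all_rows), "rows,", len(first_page), "on the first page")
```

Since the multiplication happens inside the query, LIMIT and row counts behave as they would on a genuinely large table, and the original table is not modified.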

a comma between SELECT statements

I have this query:
SELECT (@a:=@a+1) AS priority
FROM (SELECT t1.name FROM t1 LIMIT 100) x, (SELECT @a:=0) r
a few questions:
1 - What is the comma doing between the SELECTS? I have never seen a comma between commands, and I don't know what it means
2 - why is the second SELECT given a name?
3 - why is the second SELECT inside brackets?
4 - Performance-wise: does it select the first 100 rows from t1 and then assign them a number? What is going on here?
It is performing a CROSS JOIN (a Cartesian product of the rows) but without the explicit syntax. The following 2 queries produce identical results:
SELECT *
FROM TableA, TableB
SELECT *
FROM TableA
CROSS JOIN TableB
The query in the question uses 2 "derived tables" instead. I would encourage you to use the explicit join syntax CROSS JOIN and never use just commas. The biggest issue with using just commas is you have no idea if the Cartesian product is deliberate or accidental.
Both "derived tables" have been given an alias - and that is a good thing. How else would you reference some item of the first or second "derived table"? e.g. Imagine they were both queries that had the column ID in them, you would then be able to reference x.ID or r.ID
Regarding what the overall query is doing: first note that the second query returns just a single row. So even though the syntax produces a CROSS JOIN, it does not expand the total number of rows, because 100 * 1 = 100. In effect the subquery "r" adds a "placeholder" @a (initially zero) to every row. Once @a is available on each row, you can increment it by 1 for each row, and as a result that column produces a row number.
x and r are effectively anonymous views produced by the SELECT statements. If you imagine that instead of using SELECTs in brackets, you defined a view using the select statement and then referred to the view, the syntax would be clear.
The selects are given names so that you can refer to these names in WHERE conditions, joins or in the list of fields to select.
That is the syntax. You have to have brackets.
Yes, it selects the first 100 rows. I am not sure what you mean by "gives them a number".

How to optimize this Levenshtein distance calculation

Table a has around 8,000 rows and table b has around 250,000 rows. Without the levenshtein function the query takes just under 2 seconds. With the function included it is taking about 25 minutes.
SELECT
*
FROM
library a,
classifications b
WHERE
a.`release_year` = b.`year`
AND a.`id` IS NULL
AND levenshtein_ratio(a.title, b.title) > 82
I'm assuming that levenshtein_ratio is a function that you wrote (or maybe included from somewhere else). If so, the database server would not be able to optimize that in the normal sense of using an index. So it means that it simply needs to call it for each record that results from the other join conditions. With an inner join, that could be an extremely large number with those table sizes (a maximum of 8000*250000 = 2 billion). You can check the total number of times it would need to be called with this:
SELECT
count(*)
FROM
library a,
classifications b
WHERE
a.`release_year` = b.`year`
AND a.`id` IS NULL
That is an explanation of why it is slow (not really an answer to the question of how to optimize it). To optimize it, you likely need to add additional limiting factors to the join condition to reduce the number of calls to the user-defined function.
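To get a feel for how much the join condition limits the number of function calls, here is a small sketch that runs the counting query on made-up toy data from Python, using an in-memory SQLite database as a stand-in for MySQL. The titles, years, and row counts are invented for illustration, and levenshtein_ratio itself is not called.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE library (id INTEGER, title TEXT, release_year INTEGER)")
conn.execute("CREATE TABLE classifications (id INTEGER, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO library VALUES (?, ?, ?)",
    [(None, "Alien", 1999), (None, "Aliens", 1999),
     (None, "Brazil", 2005), (None, "Casino", 2010)],
)
conn.executemany(
    "INSERT INTO classifications VALUES (?, ?, ?)",
    [(1, "Alien", 1999), (2, "Alien 3", 1999), (3, "Heat", 1999),
     (4, "Brazil", 2005), (5, "Brasil", 2005), (6, "Ronin", 2012)],
)

# Upper bound: every library row paired with every classification row.
all_pairs = conn.execute(
    "SELECT COUNT(*) FROM library a CROSS JOIN classifications b "
    "WHERE a.id IS NULL"
).fetchone()[0]

# Pairs that survive the year condition; only these would need
# the expensive string comparison.
year_pairs = conn.execute(
    "SELECT COUNT(*) FROM library a "
    "JOIN classifications b ON a.release_year = b.year "
    "WHERE a.id IS NULL"
).fetchone()[0]

print(all_pairs, "pairs without the year condition,", year_pairs, "with it")
```

Every further condition that can be checked cheaply before the function runs shrinks the second number, and that number is what drives the runtime.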
You are giving too little information to actually help you.
1) My first guess would be to try to create other WHERE conditions that reduce the amount of rows to be scanned.
2) If that is not possible...Given that the titles from table library and classifications are known, one idea would be to create a table where all the data is already calculated like this:
TABLE levenshtein_ratio
id_table_library
id_table_classifications
precalculated_levenshtein_ratio
so you would populate the table using this query:
insert into levenshtein_ratio select a.id, b.id, levenshtein_ratio(a.title, b.title) from library a, classifications b
and then your query would be:
SELECT
*
FROM
library a LEFT JOIN
classifications b ON a.`release_year` = b.`year`
LEFT JOIN levenshtein_ratio c ON c.id_table_library = a.id AND c.id_table_classifications = b.id
WHERE
a.`id` IS NULL
AND precalculated_levenshtein_ratio > 82
This query will then probably take no more than the original 2 seconds.
The problem with this solution is the fact that the data in tables a and b can change, so you will need to create a trigger to keep it updated.
Change your query to use proper joins (the syntax has been around since SQL-92).
Also, your Levenshtein condition can be moved into the join condition, which may give you a performance benefit:
SELECT *
FROM library a
JOIN classifications b
ON a.`release_year` = b.`year`
AND levenshtein_ratio(a.title, b.title) > 82
WHERE a.`id` IS NULL
Also, make sure there's an index on b.year:
create index b_year on classifications(year);

How to find non-existing data from another Table by JOIN?

I have two tables TABLE1 which looks like:
id name address
1 mm 123
2 nn 143
and TABLE2, which looks like:
name age
mm 6
oo 9
I want to get the names that don't exist in TABLE2 by comparing TABLE1 with TABLE2.
So basically, I have to get the 2nd row, whose name nn doesn't exist in TABLE2. The output should look like this:
id name address
2 nn 143
I've tried this but it doesn't work:
SELECT w.* FROM TABLE1 W INNER JOIN TABLE2 V
ON W.NAME <> V.NAME
and it's still getting the existing records.
An INNER JOIN doesn't help here.
One way to solve this is by using a LEFT JOIN:
SELECT w.*
FROM TABLE1 W
LEFT JOIN TABLE2 V ON W.name = V.name
WHERE ISNULL(V.name);
The relational operator you require is semi difference a.k.a. antijoin.
Most SQL products lack an explicit semi difference operator or keyword. Standard SQL-92 doesn't have one (it has a MATCH (subquery) semijoin predicate but, although it is tempting to think otherwise, the semantics of NOT MATCH (subquery) are not the same as for semi difference; FWIW the truly relational language Tutorial D successfully uses NOT MATCHING for semi difference).
Semi difference can of course be written using other SQL predicates. The most commonly seen are: outer join with a test for nulls in the WHERE clause, closely followed by EXISTS or IN (subquery). Using EXCEPT (equivalent to MINUS in Oracle) is another possible approach if your SQL product supports it and again depending on the data (specifically, when the headings of the two tables are the same).
Personally, I prefer to use EXISTS in SQL for semi difference because the join clauses are closer together in the written code and it doesn't result in projection over the joined table, e.g.
SELECT *
FROM TABLE1 W
WHERE NOT EXISTS (
SELECT *
FROM TABLE2 V
WHERE W.NAME = V.NAME
);
As with NOT IN (subquery) (same for the outer join approach), you need to take extra care if the WHERE clause within the subquery involves nulls (hint: if WHERE clause in the subquery evaluates UNKNOWN due to the presence of nulls then it will be coerced to be FALSE by EXISTS, which may yield unexpected results).
UPDATE (3 years on): I've since flipped to preferring NOT IN (subquery) because it is more readable and if you are worried about unexpected results with nulls (and you should be) then stop using them entirely, I did many more years ago.
One way in which it is more readable is there is no requirement for the range variables W and V e.g.
SELECT * FROM TABLE1 WHERE name NOT IN ( SELECT name FROM TABLE2 );
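To compare the three antijoin spellings from the answers side by side, here is a sketch that runs them on the question's sample data from Python with an in-memory SQLite database. SQLite standing in for MySQL is an assumption, and the left join tests V.name IS NULL instead of ISNULL(V.name) because SQLite has no one-argument ISNULL function. The extra row with a NULL name is invented to illustrate the null caveat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE TABLE1 (id INTEGER, name TEXT, address INTEGER)")
conn.execute("CREATE TABLE TABLE2 (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", [(1, "mm", 123), (2, "nn", 143)])
conn.executemany("INSERT INTO TABLE2 VALUES (?, ?)", [("mm", 6), ("oo", 9)])

QUERIES = {
    "left join": """
        SELECT W.* FROM TABLE1 W
        LEFT JOIN TABLE2 V ON W.name = V.name
        WHERE V.name IS NULL
    """,
    "not exists": """
        SELECT * FROM TABLE1 W
        WHERE NOT EXISTS (SELECT * FROM TABLE2 V WHERE W.name = V.name)
    """,
    "not in": """
        SELECT * FROM TABLE1
        WHERE name NOT IN (SELECT name FROM TABLE2)
    """,
}

def run_all():
    """Run every variant and return its rows keyed by label."""
    return {label: conn.execute(sql).fetchall() for label, sql in QUERIES.items()}

before = run_all()

# Hypothetical extra row with a NULL name to show the null caveat.
conn.execute("INSERT INTO TABLE2 VALUES (NULL, 4)")
after = run_all()

for label in QUERIES:
    print(label, before[label], after[label])
```

On the original data all three variants return the nn row. Once TABLE2 contains a NULL name, the NOT IN variant returns no rows at all, because nn NOT IN (..., NULL) evaluates to UNKNOWN, while the other two still return nn.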