MYSQL WHERE-IN Subquery Runs Forever - mysql

I have a MySQL table. Let's call it Widgets. The Widget table has 3 fields: id, type_id, and name. I want, in one query, to get all the widgets that share a type_id with the Widget named 'doodad'. I've written 2 queries:
Give me the type_id of the widget with the name 'doodad'.
Give me all widgets with that type_id.
This works. Each query independently achieves its goal.
But when I combine them into a single nested query, it runs forever, infinite loop style. It looks like this:
SELECT * FROM widgets WHERE type_id IN (
SELECT type_id FROM widgets WHERE name = 'doodad'
);
Can anyone explain this? Is it because I am writing a nested query which is operating on the same table twice?
Little wheel, why spinnest thou?

There is an issue with MySQL and IN where even uncorrelated subqueries are treated as though they were correlated and are re-evaluated for each row.
In the EXPLAIN plan, the select type will likely show as DEPENDENT SUBQUERY rather than just SUBQUERY, as would be desired.
I suggest trying the approach described at the end of this article of using a derived table to materialize the inner result set.
Or alternatively you could look at the constify procedure here to see if it will assist you in getting around this issue.
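One common shape of the derived-table rewrite is sketched below. It runs against an in-memory SQLite database through Python's sqlite3 module, purely so the example is self-contained. SQLite does not have the MySQL optimizer problem, so this only demonstrates that the rewritten query returns the same rows as the IN form. The sample rows are invented, and whether the rewrite helps on a given server depends on the MySQL version, so check with EXPLAIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption, for portability);
# the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widgets (id INTEGER PRIMARY KEY, type_id INTEGER, name TEXT);
    INSERT INTO widgets (id, type_id, name) VALUES
        (1, 10, 'doodad'), (2, 10, 'gizmo'), (3, 20, 'sprocket'), (4, 10, 'thingy');
""")

# The original form from the question.
IN_QUERY = """
    SELECT id FROM widgets
    WHERE type_id IN (SELECT type_id FROM widgets WHERE name = 'doodad')
    ORDER BY id
"""

# The inner query becomes a derived table (alias t) that is joined once.
# DISTINCT keeps the join from duplicating rows if several widgets are
# named 'doodad' and share a type_id.
DERIVED_QUERY = """
    SELECT w.id
    FROM widgets w
    JOIN (SELECT DISTINCT type_id FROM widgets WHERE name = 'doodad') AS t
      ON t.type_id = w.type_id
    ORDER BY w.id
"""

in_ids = [row[0] for row in conn.execute(IN_QUERY)]
derived_ids = [row[0] for row in conn.execute(DERIVED_QUERY)]
```

Both lists hold the ids of every widget whose type_id matches that of 'doodad'.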

Using a JOIN risks duplicating results. An EXISTS works much like an IN, without the duplication risk:
SELECT x.*
FROM widgets x
WHERE EXISTS (SELECT NULL
FROM widgets y
WHERE y.name = 'doodad'
AND y.type_id = x.type_id)

Related

Can MySQL/MariaDB stop doing a selection by reaching certain condition? Or select all rows before condition

For example, I have table
id;name
1;John
2;Mary
3;Cat
4;Cheng
I want the selection to stop right after 3;Cat and still contain all the rows that exist before 3;Cat
I think this could be described with such a query
SELECT * FROM table WHERE condition ORDER BY id LIMIT name = 'Cat'
but of course there is no such construction as LIMIT name='Cat' in SQL.
Maybe something else fits?
Currently I'm using a broad select, but it has to fetch as many as 1200 rows to be sure it contains at least one of the expected records.
This is a not-so-bad answer:
https://stackoverflow.com/a/22232897/1475428
Solution might look like
SELECT * FROM mytable WHERE id <= (SELECT MIN(id) FROM mytable WHERE name = 'Cat') ORDER BY id
Here MIN finds the id of the first matching row, and the outer query uses it as an upper bound, which works like a conditional LIMIT.
This looks ugly; I still think there might be a better solution.
This is quite awkward to do in a single query. That means you probably should not try to do it in a single query.
Sometimes it's simpler to do a complex task in several steps. It's easier to write, it's easier to debug, it's easier to modify if you need to, and it's easier for future programmers to read your code if they need to take over responsibility.
So first query for the condition, and find out the id of the row you want to stop at:
SELECT MIN(id) FROM mytable WHERE name = 'Cat';
This returns either an id value, or else NULL if there is no row matching the condition.
If that result was not NULL, then use that value to run a simple query:
SELECT * FROM mytable WHERE id <= ? ORDER BY id
Else if the result was NULL, then default to a query with the fixed LIMIT you want:
SELECT * FROM mytable ORDER BY id LIMIT ?
If you have special conditions that aren't supported by simple SQL, then break it up into different queries that are each simple, and use a little bit of application logic to choose which query to run.
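The steps above can be sketched in application code as follows. This is a minimal illustration, not a definitive implementation: it uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, and the function name rows_up_to and its fallback_limit parameter are hypothetical.

```python
import sqlite3

def rows_up_to(conn, name, fallback_limit):
    """Return rows ordered by id, up to and including the first row whose
    name matches. Falls back to a fixed LIMIT when no row matches."""
    # Step 1: find the id to stop at (NULL/None if nothing matches).
    stop_id = conn.execute(
        "SELECT MIN(id) FROM mytable WHERE name = ?", (name,)
    ).fetchone()[0]
    if stop_id is not None:
        # Step 2a: a simple bounded query.
        return conn.execute(
            "SELECT id, name FROM mytable WHERE id <= ? ORDER BY id", (stop_id,)
        ).fetchall()
    # Step 2b: no match, so use the fixed limit instead.
    return conn.execute(
        "SELECT id, name FROM mytable ORDER BY id LIMIT ?", (fallback_limit,)
    ).fetchall()

# Sample data taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO mytable VALUES (1, 'John'), (2, 'Mary'), (3, 'Cat'), (4, 'Cheng');
""")
```

Calling rows_up_to(conn, 'Cat', 2) returns the first three rows, while a name that does not exist falls back to the first fallback_limit rows.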

Please breakdown this MySQL statement for me

I came up with this solution in my class by piecing together Internet knowledge. Please break this down for me; I would love to know how I made it work. Specifically the t.s and the closing t.
SELECT
CourseType,
GPA,
NumberOfStudents * 100 / t.s AS `Percentage of Students`
FROM View1
CROSS JOIN
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1) t;
Your query uses a subquery. A subquery is a query that runs within another query. In your case, your subquery is:
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1)
When you use a subquery in the FROM clause (a derived table), MySQL requires you to give it an alias. An alias is just a name you give a subquery, so you can refer to it in the main query.
In your example, you named your subquery "t".
Fields can also have aliases. In your subquery, you created a field SUM(NumberOfStudents), and you named it s.
Going back to your question, you use the aliases to address fields inside the subquery. In your case, when you write NumberOfStudents * 100 / t.s you are basically saying:
"I want to multiply NumberOfStudents by 100 and divide it by the field s from my subquery t".
The other concept that is important in your query is the Cross join. A cross join is the Cartesian product of two tables.
You can find a great and intuitive explanation of how a cross join works in the following link:
https://www.sqlshack.com/sql-cross-join-with-examples/
In this case, the use is simpler than that. Your subquery returns only one row with one value, which is the sum of all students. Since a cross join pairs every row of one table with every row of the other, your cross join simply attaches that total to every row of View1, so the main query can use it as a constant when calculating the percentage of students.
A better way to do this uses window functions:
SELECT v.CourseType, v.GPA,
v.NumberOfStudents * 100 / SUM(v.NumberOfStudents) OVER () AS Percentage_of_Students
FROM View1 v;
If you are learning SQL, you might as well learn the correct way to express logic.
Notes:
Use meaningful table aliases (abbreviations for the table/view names).
Qualify column references. This is less important in a query with only one table reference, but it is a good habit.
Window functions allow you to summarize data across multiple rows, without using an explicit JOIN.
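As a rough check that the CROSS JOIN form and the window-function form compute the same thing, here is a small sketch using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. View1 is modelled as a plain table with invented rows. The queries use 100.0 rather than 100 because SQLite truncates integer division, which MySQL's / operator does not. Window functions need MySQL 8.0 or later (SQLite 3.25 or later for this sketch).

```python
import sqlite3

# View1 is modelled as an ordinary table with made-up rows (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE View1 (CourseType TEXT, GPA REAL, NumberOfStudents INTEGER);
    INSERT INTO View1 VALUES
        ('Lab', 3.5, 10), ('Lecture', 3.0, 30), ('Seminar', 3.2, 60);
""")

# The derived table t has exactly one row, so the cross join attaches
# the grand total t.s to every row of View1.
CROSS_JOIN_QUERY = """
    SELECT v.CourseType, v.NumberOfStudents * 100.0 / t.s
    FROM View1 v
    CROSS JOIN (SELECT SUM(NumberOfStudents) AS s FROM View1) t
    ORDER BY v.CourseType
"""

# SUM(...) OVER () computes the same grand total without a join.
WINDOW_QUERY = """
    SELECT v.CourseType,
           v.NumberOfStudents * 100.0 / SUM(v.NumberOfStudents) OVER ()
    FROM View1 v
    ORDER BY v.CourseType
"""

cross_rows = conn.execute(CROSS_JOIN_QUERY).fetchall()
window_rows = conn.execute(WINDOW_QUERY).fetchall()
```

With a total of 100 students in the sample data, each percentage equals the row's NumberOfStudents, and both queries return identical rows.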

Join Performances When Searching For NULL Value

I need to find the values that exist in the LoyaltyTransactionsBasketItemsStores table but not in the DimProductConsolidate table. I need the item code and its corresponding company. This is my query
SELECT
A.ProductReference, A.CompanyCode
FROM
(SELECT ProductReference, CompanyCode FROM dwhdb.LoyaltyTransactionsBasketItemsStores GROUP BY ProductReference) A
LEFT JOIN
(SELECT LoyaltyVariantArticleCode FROM dwhdb.DimProductConsolidate) B ON B.LoyaltyVariantArticleCode = A.ProductReference
WHERE
B.LoyaltyVariantArticleCode IS NULL
It is a pretty straightforward query, but when I run it, it takes over an hour and still does not finish. I then ran EXPLAIN, and this is the result
But when I remove CompanyCode from my query, performance improves a lot. This is the EXPLAIN result
I want to know why this is happening, and whether there is a way to get ProductReference and its company with much better performance.
Your current query is rife with syntax and structural errors. I would use exists logic here:
SELECT a.ProductReference, a.CompanyCode
FROM dwhdb.LoyaltyTransactionsBasketItemsStores a
WHERE NOT EXISTS (SELECT 1 FROM dwhdb.DimProductConsolidate b
WHERE b.LoyaltyVariantArticleCode = a.ProductReference);
Your current query does a GROUP BY in the first subquery, but it never selects aggregates, only other non-aggregate columns (CompanyCode is neither grouped nor aggregated). On most other databases, and on MySQL with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), this syntax is not allowed. There is also no need for two subqueries here. Just select from the basket table and assert that no matching record exists in the other table.
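A minimal sketch of the NOT EXISTS anti-join, again using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. The dwhdb schema prefix is dropped, the rows are invented, and the index is an assumption: an index on the lookup column usually makes this kind of probe cheap, but whether one exists in the real schema is unknown. DISTINCT is added here only to list each pair once, which the original GROUP BY appeared to be aiming for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LoyaltyTransactionsBasketItemsStores (
        ProductReference TEXT, CompanyCode TEXT);
    CREATE TABLE DimProductConsolidate (LoyaltyVariantArticleCode TEXT);
    -- Hypothetical index on the column probed by NOT EXISTS.
    CREATE INDEX idx_dim_article
        ON DimProductConsolidate (LoyaltyVariantArticleCode);
    INSERT INTO LoyaltyTransactionsBasketItemsStores VALUES
        ('P1', 'C1'), ('P2', 'C1'), ('P3', 'C2'), ('P3', 'C2');
    INSERT INTO DimProductConsolidate VALUES ('P1');
""")

# Keep only basket rows with no matching product in the dimension table.
ANTI_JOIN_QUERY = """
    SELECT DISTINCT a.ProductReference, a.CompanyCode
    FROM LoyaltyTransactionsBasketItemsStores a
    WHERE NOT EXISTS (SELECT 1
                      FROM DimProductConsolidate b
                      WHERE b.LoyaltyVariantArticleCode = a.ProductReference)
    ORDER BY a.ProductReference
"""

missing = conn.execute(ANTI_JOIN_QUERY).fetchall()
```

In the sample data, only P1 has a match, so P2 and P3 are reported with their company codes.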

SQL Query: Joining on a SUM()

I'm trying to run a query that sums the value of items and then JOIN on the value of that SUM.
So in the below code, the Contract_For is what I'm trying to Join on, but I'm not sure if that's possible.
SELECT `items_value`.`ContractId` as `Contract`,
`items_value`.`site` as `SiteID`,
SUM(`items_value`.`value`) as `Contract_For`,
`contractitemlists`.`Text` as `Contracted_Text`
FROM items_value
LEFT JOIN contractitemlists ON (`items_value`.`Contract_For`) = `contractitemlists`.`Ref`
WHERE `items_value`.`ContractID`='2';
When I've faced similar issues in the past, I've just created a view that holds the SUM, then joined to that in another view.
At the moment, the above sample is meant to work for just one dummy value, but it's intended to be a stored procedure, where the user selects the ContractID. The error I get at the moment is 'Unknown Column items_value.Contract_For'
You cannot reference aliases or aggregate expressions from the SELECT clause anywhere but HAVING and ORDER BY*; you need to make the first "part" a subquery, and then JOIN to that.
It might be easier to understand, though a bit oversimplified and not precisely correct, if you look at it this way as far as order of evaluation goes...
FROM (Note: JOIN is only within a FROM)
WHERE
GROUP BY
SELECT
HAVING
ORDER BY
In actual implementation, "under the hood", most SQL implementations actually use information from each section to optimize other sections (like using some where conditions to reduce records JOINed in a FROM); but this is the conceptual order that must be adhered to.
*In some versions of MSSQL, you cannot use aliases from the SELECT in HAVING or ORDER BY either.
Your query needs to be something like this:
SELECT s.*
, `cil`.`Text` as `Contracted_Text`
FROM (
SELECT `iv`.`ContractId` as `Contract`
, `iv`.`site` as `SiteID`
, SUM(`iv`.`value`) as `Contract_For`
FROM items_value AS iv
WHERE `iv`.`ContractID`='2'
) AS s
LEFT JOIN contractitemlists AS cil ON `s`.`Contract_For` = cil.`Ref`
;
But as others have mentioned, the lack of a GROUP BY is something to be looked into; as in "what if there are multiple site values."
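A sketch of the same subquery-then-JOIN shape with a GROUP BY added, run through Python's sqlite3 module with an in-memory database as a stand-in for MySQL. Grouping by ContractId and site is an assumption about the intended result (one total per contract and site), the sample rows are invented, and the helper name contract_totals is hypothetical.

```python
import sqlite3

def contract_totals(conn, contract_id):
    """Sum item values per (contract, site) in a derived table, then join
    the resulting total against contractitemlists.Ref."""
    query = """
        SELECT s.Contract, s.SiteID, s.Contract_For,
               cil."Text" AS Contracted_Text
        FROM (
            SELECT iv.ContractId AS Contract,
                   iv.site AS SiteID,
                   SUM(iv.value) AS Contract_For
            FROM items_value AS iv
            WHERE iv.ContractId = ?
            GROUP BY iv.ContractId, iv.site
        ) AS s
        LEFT JOIN contractitemlists AS cil ON s.Contract_For = cil.Ref
        ORDER BY s.SiteID
    """
    return conn.execute(query, (contract_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_value (ContractId INTEGER, site TEXT, value INTEGER);
    CREATE TABLE contractitemlists (Ref INTEGER, "Text" TEXT);
    INSERT INTO items_value VALUES (2, 'A', 100), (2, 'A', 50), (3, 'B', 200);
    INSERT INTO contractitemlists VALUES (150, 'Standard'), (200, 'Premium');
""")
```

The alias Contract_For only exists once the derived table s has been evaluated, which is why the join condition can use it there but not in the original single-level query.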

Subquery for fetching table name

I have a query like this :
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo WHERE group='BEST'
Basically, the table name for the main query is fetched through the subquery.
I get an error that #1054 - Unknown column 'group' in 'where clause'
When I investigate (removing the where clause), I find that the query only returns the subquery result at all times.
Subquery table adm_linkedfields has structure id | name | linktable
Currently I am using MySQL with PDO, but the query should be compatible with the major DBs (viz. Oracle, MSSQL, PgSQL and MySQL)
Update:
The subquery should return the name of the table for the main query. In this case it will return tbl_company
The table tbl_company for the main query has this structure :
id | name | group
Thanks in advance.
Dynamic SQL doesn't work like that. What you created is an inline view (a derived table); read up on that. What's more, you can't write a dynamic SQL query that will work on every database. If you have a limited number of linktables, you could try using LEFT JOINs or UNIONs to select from all of them, but unless you have a good reason, you don't want that.
Just select the tablename in one query and then make another one to access the right table (by creating the query string in php).
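The two-query approach can be sketched like this. The answer suggests PHP. This illustration uses Python's sqlite3 module with an in-memory database instead, so treat it as a pattern rather than drop-in code. The function name fetch_linked_rows and the identifier check are assumptions. Table names cannot be bound as query parameters, so the looked-up name is validated before it is placed in the query string. The reserved word group is quoted with double quotes here, where MySQL would normally use backticks.

```python
import re
import sqlite3

# Conservative pattern for a plain table name (an assumption about naming).
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def fetch_linked_rows(conn, field_name, group_value):
    """Query 1 looks up the table name; query 2 selects from that table."""
    row = conn.execute(
        "SELECT linktable FROM adm_linkedfields WHERE name = ?", (field_name,)
    ).fetchone()
    if row is None:
        return []
    table = row[0]
    # Identifiers cannot be bound with ?, so reject anything suspicious.
    if not IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    query = f'SELECT id, name, "group" FROM "{table}" WHERE "group" = ?'
    return conn.execute(query, (group_value,)).fetchall()

# Sample data following the structures described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adm_linkedfields (id INTEGER PRIMARY KEY, name TEXT, linktable TEXT);
    CREATE TABLE tbl_company (id INTEGER PRIMARY KEY, name TEXT, "group" TEXT);
    INSERT INTO adm_linkedfields VALUES (1, 'company', 'tbl_company');
    INSERT INTO tbl_company VALUES (1, 'Acme', 'BEST'), (2, 'Globex', 'OTHER');
""")
```

Only the table name is interpolated into the SQL string; the group value still goes through a bound parameter.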
Here is an issue:
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo
WHERE group='BEST';
You are selecting from a derived table (DT) that contains only one column, linktable, so you can't reference any other column in the WHERE clause of the outer block. Think in terms of blocks: the outer SELECT refers to a DT that contains only one column.
Your problem is similar when you try to do:
create table t1(x1 int);
select * from t1 where z1 = 7; -- error: unknown column z1
Your query is:
SELECT *
FROM (SELECT linktable
FROM adm_linkedfields
WHERE name = 'company'
) cbo
WHERE group='BEST'
First, if you are interested in cross-database compatibility, do not name columns or tables after SQL reserved words. group is a really, really bad name for a column.
Second, the from clause is returning a table containing a list of names (of tables, but that is irrelevant). There is no column called group, so that is the problem you are having.
What can you do to fix this? A naive solution would be to run the subquery and use the resulting table name in a dynamic statement to execute the query you want.
The fundamental problem is your data structure. Having multiple tables with the same structure is generally a sign of a bad design. You basically have two choices.
One. If you have control over the database structure, put all the data in a single table, linktable for instance. This would have the information for all companies, and a column for group (or whatever you rename it). This solution is compatible across all databases. If you have lots and lots of data in the tables (think tens of millions of rows), then you might think about partitioning the data for performance reasons.
Two. If you don't have control over the data, create a view that concatenates all the tables together. Something like:
create view vw_linktable as
select 'table1' as which, t.* from table1 t union all
select 'table2', t.* from table2 t
This is also compatible across all databases.