Warning: This is a soft question, where you'll be answering to someone who has just started teaching himself SQL from the ground up. I haven't gotten my database software set up yet, so I can't provide tables to run queries against. Some patience required.
Warnings aside, I'm experimenting with basic SQL but I'm having a little bit of a rough time getting a clear answer about the inner workings of subqueries and their execution order within my query.
Let us say my query looks something like this:
SELECT * FROM someTable
WHERE someFirstValue = someSecondValue
AND EXISTS (
SELECT * FROM someOtherTable
WHERE someTable.someFirstValue = someOtherTable.someThirdValue
)
;
The reason I'm here, is because I don't think I understand fully what is going on in this query.
Now I don't want to seem lazy, so I'm not going to ask you guys to "tell me what's going on here", so instead, I'll provide my own theory first:
The first row in someTable is checked to see if someFirstValue is the same as someSecondValue in that row.
If it isn't, it goes onto the second row and checks it too. It continues like this until a row passes this little inspection.
If a row does pass, it opens up a new query. If the table produced by this query contains even a single row, it returns TRUE, but if it's empty it returns FALSE.
My theory ends here, and my confusion begins.
Will this inner query now compare only the rows that passed the first WHERE? Or will it check all the items in someTable and someOtherTable?
Rephrased; will only the rows that passed the first WHERE be compared in the someTable.someFirstValue = someOtherTable.someThirdValue subquery?
Or will the subquery compare all the elements from someTable to all the elements in someOtherTable regardless of which passed the first WHERE and which didn't?
UPDATE: Assume I'm using MySQL 5.5.32. If that matters.
The answer is that SQL is a descriptive language that describes the result set being produced from a query. It does not specify how the query is going to be run.
In your case the query has several options on how it might run, depending on the database engine, what the tables look like, and indexes. The query itself:
SELECT t.*
FROM someTable t
WHERE t.someFirstValue = t.someSecondValue AND
EXISTS (SELECT *
FROM someOtherTable t2
WHERE t.someFirstValue = t2.someThirdValue
);
Says: "Get me all columns from someTable where someFirstValue = someSecondValue and there is a corresponding row in someOtherTable where that table's column someThirdValue is the same as someFirstValue".
One possible way to approach this query would be to scan someTable and first check for the first condition. When the two columns match, then look up someFirstValue in an index on someOtherTable(someThirdValue) and keep the row if the values match. As I say, this is one approach, and there are others.
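To make the "describes the result, not the execution" point concrete, here is a minimal sketch that runs the query against a few made-up rows. It uses Python's built-in sqlite3 module rather than MySQL purely so that it is self-contained; the sample data is an assumption, while the table and column names come from the question. Swapping the order of the two conditions gives the same rows, since the result is defined by the conditions and not by the order in which an engine evaluates them.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someTable (someFirstValue INTEGER, someSecondValue INTEGER);
    CREATE TABLE someOtherTable (someThirdValue INTEGER);
    INSERT INTO someTable VALUES (1, 1), (2, 2), (3, 4), (5, 5);
    INSERT INTO someOtherTable VALUES (1), (3), (5), (5);
""")

# The query from the answer, with explicit columns and a stable ordering.
query = """
    SELECT t.someFirstValue, t.someSecondValue
    FROM someTable t
    WHERE t.someFirstValue = t.someSecondValue
      AND EXISTS (SELECT 1
                  FROM someOtherTable t2
                  WHERE t.someFirstValue = t2.someThirdValue)
    ORDER BY t.someFirstValue
"""

# Same conditions written in the opposite order.
swapped_query = """
    SELECT t.someFirstValue, t.someSecondValue
    FROM someTable t
    WHERE EXISTS (SELECT 1
                  FROM someOtherTable t2
                  WHERE t.someFirstValue = t2.someThirdValue)
      AND t.someFirstValue = t.someSecondValue
    ORDER BY t.someFirstValue
"""

rows = conn.execute(query).fetchall()
swapped_rows = conn.execute(swapped_query).fetchall()
print(rows)
print(swapped_rows == rows)
```

A row of someTable is kept only when both conditions hold for it; a row such as (3, 4) is dropped even though 3 appears in someOtherTable, and (2, 2) is dropped because nothing in someOtherTable matches it.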
Ok, for a moment, throw out of your mind "good database design". Let's say I have two tables, and they have some of the same columns.
item
-------
id
title
color
and
item_detail
-------
id
weight
color
In a good normal query, you'd choose the columns you want within the query, like so:
SELECT item.title, item_detail.color, item_detail.weight ...
But what if you are stuck with a query that was built with star/all:
SELECT * ...
In this case you would get two color columns pulled back in your results, one for each table. Is there a way in MySQL to choose one color column over the other, so only one shows up in the results, without a full rewrite of the statement? So that I could say that the table item_detail takes priority?
Probably not but I thought I'd ask.
Err. No there is not.
But define "without a full rewrite of the statement". As far as I can see you'd just need to rewrite the select * portion of the query.
If you cannot touch the statement at all, then you are free to ignore the column in your application (the order of the columns does not change between calls)... or you could create a view...
It's hard to know which constraints you are dealing with when you say "But what if you are stuck with a query".
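The view option mentioned above could look roughly like the sketch below. It runs on Python's built-in sqlite3 module instead of MySQL so that it is self-contained; the view name item_with_detail and the sample rows are made up for illustration. The view lists its columns once, taking color from item_detail, so a later SELECT * against the view returns a single color column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, color TEXT);
    CREATE TABLE item_detail (id INTEGER PRIMARY KEY, weight REAL, color TEXT);
    INSERT INTO item VALUES (1, 'mug', 'white');
    INSERT INTO item_detail VALUES (1, 0.3, 'blue');

    -- Hypothetical view name; only item_detail's color is exposed.
    CREATE VIEW item_with_detail AS
        SELECT item.id, item.title, item_detail.weight, item_detail.color
        FROM item
        JOIN item_detail ON item_detail.id = item.id;
""")

# SELECT * now sees exactly one color column.
cursor = conn.execute("SELECT * FROM item_with_detail")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

This still means changing what the star query selects from, so it only helps if the FROM clause is something you are allowed to touch.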
I'm trying to do what I think is a set of simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table but that doesn't impact on my problem here; however, these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
There are various ways to tackle the problem. Needless to say it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. The optimization of your query by using CASE/COUNT(*) or CASE/LIMIT combinations in queries to sort out empty tables would be one option. However, if-like queries cost more time.
You could split the SQL code to downgrade the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
You said that an API is involved; maybe you are able to handle the temporary "no data" tables differently, or even avoid storing them?
Another option would be to enable query caching:
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html
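Separately from the options above, the set logic in the question can be written as membership tests instead of joins, which sidesteps the empty-table problem: an IN against an empty table is simply false for every row. The sketch below is an illustration only, run on Python's built-in sqlite3 module with made-up rows, not on MySQL. It says nothing about speed; how well a given MySQL version optimizes IN subqueries varies, so timings would have to be measured on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Moment (source INTEGER, time TEXT, value REAL);
    CREATE TABLE Temp1 (source INTEGER);
    CREATE TABLE Temp2 (source INTEGER);  -- deliberately left empty
    CREATE TABLE Temp3 (source INTEGER);
    INSERT INTO Moment VALUES
        (1, '2017-02-01', 1.0),
        (1, '2016-10-01', 2.0),
        (2, '2016-12-01', 3.0),
        (3, '2017-03-01', 4.0);
    INSERT INTO Temp1 VALUES (1), (2);
    INSERT INTO Temp3 VALUES (1), (2), (3);
""")

# Each "source in TempN" becomes an IN subquery; no Cartesian product is
# formed, so the empty Temp2 only disables its own branch of the OR.
query = """
    SELECT m.source, m.time, m.value
    FROM Moment AS m
    WHERE m.source IN (SELECT source FROM Temp1)
      AND (
            (m.source IN (SELECT source FROM Temp2) AND m.time > '2017-01-01')
         OR (m.source IN (SELECT source FROM Temp3) AND m.time > '2016-11-15')
      )
    ORDER BY m.source, m.time
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because no join is involved, a Moment row is returned at most once even when a source appears several times in a temporary table.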
I am trying to use the results of another query to use as a criteria for another. In my specific example, I might have four houses that are 'A', 'B', 'C', 'D' (the unique values of a field in a table called Homes).
I want to go through another query and say for each house type, what percent of residents (in Residents table) are married, which I want to do by using Count() to count the number for each Home type.
Do I need to loop through the results using VBA? Asking on a higher level, is there a way to use the results from a query as inputs into another - more than just limit the results of the new query to the results of the prior query?
Edit:
In semi-pseudo code:
For each (result of previous query) Do
New query WHERE field1 = (row of previous query)
End Do
What I am trying to ask, is there a way to accomplish this in Access using SQL? Or is this something that has to be done in VBA?
I know that if it can be done in SQL that would be the best performing and best practice, but I'm relatively inexperienced in SQL and online resources aren't always helpful because Access has its own particular flavor of SQL.
Since you are using VBA to run this, you can loop through your recordsets, and yes, you can use a value from one query in the next query. There are a lot of resources out there to help.
VBA: Working with RecordSets
Looping through Record Sets
Code through all records
To answer your general question: yes, there is. You can use a nested query, e.g. select column_a from table_a where column_a = (select column_b from table_b where column_b = x)
You can nest as many levels deep as you want, but the caveat is that a nested query compared with = must return a single column and a single value. You can also use select statements as your columns, e.g.
select (select column_b from table_b) as col_b from table_a ... This is not the exact syntax; I would have to dig out some examples from an old project to find that.
Nested queries are useful, but for the level of precision you are looking for, a stored procedure or a view is probably a better option. Just for ease of use, I would look at creating a view of the data that you want and then querying from that to start with. More flexible than a nested query.
You need to join two tables using a common column and then get your specific column from any of the table
SELECT A.REQUIRED_FIELD from TABLEA AS A
INNER JOIN TABLEB AS B ON A.FOREIGN_KEY=B.FOREIGN_KEY
WHERE CONDITION
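For the specific "percent married per house type" question, the join idea can be combined with GROUP BY so that no loop is needed. The sketch below is a rough illustration on Python's built-in sqlite3 module, not Access; the column names home_type, name and married (stored as 0/1) are assumptions, since the question does not give the schema, and Access's SQL dialect differs in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the question only names the two tables.
    CREATE TABLE Homes (home_type TEXT PRIMARY KEY);
    CREATE TABLE Residents (name TEXT, home_type TEXT, married INTEGER);
    INSERT INTO Homes VALUES ('A'), ('B'), ('C'), ('D');
    INSERT INTO Residents VALUES
        ('Ann', 'A', 1), ('Bob', 'A', 0),
        ('Cy',  'B', 1), ('Di',  'B', 1),
        ('Ed',  'C', 0);
""")

# One row per home type: the GROUP BY does the "for each" that the
# pseudo-code loop was doing by hand.
query = """
    SELECT h.home_type,
           COUNT(r.name) AS residents,
           100.0 * SUM(r.married) / COUNT(r.name) AS pct_married
    FROM Homes AS h
    LEFT JOIN Residents AS r ON r.home_type = h.home_type
    GROUP BY h.home_type
    ORDER BY h.home_type
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps home types that have no residents; for those the percentage comes back as NULL (None in Python) rather than as a division error.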
Is there a way that I can do a select as such
select * from attributes where product_id = 500
would return
id name description
1 wheel round and black
2 horn makes loud noise
3 window solid object you can see through
and the query
select * from attributes where product_id = 234
would return the same results as would any query to this table.
Now obviously I could just remove the where clause and go about my day. But this involves editing code that I don't really want to modify, so I'm trying to fix this at the database level.
So is there a "magical" way to ignore what is in the where clause and return whatever I want using a view or something ?
Even if it was possible, I doubt it would work. Both of those WHERE clauses expect one thing to be returned, therefore the code would probably just use the first row returned, not all of them.
It would also give the database a behaviour that would make future developers pull their hair out trying to understand.
Do it properly and fix the code.
or you could pass "product_id" instead of an integer, if there's no code checking for that...so the query would become:
select * from attributes where product_id = product_id;
this would give you every row in the table.
If you can't edit the query, maybe you can append to it? You could stick
OR 1=1
on the end.
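A quick way to see what these two tricks actually return is to try them on a toy table. The sketch below uses Python's built-in sqlite3 module with made-up rows, including one row whose product_id is NULL (an assumption added to show an edge case). The self-comparison skips that row, because NULL = NULL is not true in SQL, while OR 1=1 keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        name TEXT,
        description TEXT
    );
    -- Hypothetical rows; the third has no product_id at all.
    INSERT INTO attributes VALUES
        (1, 500,  'wheel',  'round and black'),
        (2, 234,  'horn',   'makes loud noise'),
        (3, NULL, 'window', 'solid object you can see through');
""")

def ids(where_clause):
    """Return the ids matched by the given WHERE clause, in id order."""
    sql = f"SELECT id FROM attributes WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

filtered = ids("product_id = 500")             # the normal filter
self_compare = ids("product_id = product_id")  # column compared to itself
or_true = ids("product_id = 500 OR 1=1")       # tautology appended

print(filtered, self_compare, or_true)
```

So "every row" holds for the self-comparison only when product_id is never NULL.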
You may be able to use result set metadata to get what you want, but a result set won't have descriptions of fields. The specific API to get result set metadata from a prepared query varies by programming language, and you haven't said what language you're using.
You can query the INFORMATION_SCHEMA for the products table.
SELECT ordinal_position, column_name, column_comment
FROM INFORMATION_SCHEMA.columns
WHERE table_name = 'products' AND table_schema = 'mydatabase';
You can restructure the database into an Entity-Attribute-Value design, but that's a much more ambitious change than fixing your code.
Or you can abandon SQL databases altogether, and use a semantic data store like RDF, which allows you to query metadata of an entity in the same way you query data.
As far out as this idea seems I'm always interested in crazy ways to do things.
I think the best solution I could come up with is to use a view that uses the products table to get all the products then the attributes table to get the attributes, so every possible product is accounted for and all will get the same result
It's common to have a table where, for example, the fields are account, value, and time. What's the best design pattern for retrieving the last value for each account? Unfortunately the last keyword in a grouping gives you the last physical record in the database, not the last record by any sorting. Which means IMHO it should never be used. The two clumsy approaches I use are either a subquery approach or a secondary query to determine the last record, and then joining to the table to find the value. Isn't there a more elegant approach?
could you not do:
select account,last(value),max(time)
from table
group by account
I tested this (granted for a very small, almost trivial record set) and it produced proper results.
Edit:
That also doesn't work after some more testing. I did a fair bit of Access programming in a past life and feel like there is a way to do what you're asking in one query, but I'm drawing a blank at the moment. Sorry.
After literally years of searching I finally found the answer at the link below #3. The sub-queries above will work, but are very slow -- debilitatingly slow for my purposes.
The more popular answer is a tri-level query: 1st level finds the max, 2nd level gets the field values based on the 1st query. The result is then joined in as a table to the main query. Fast but complicated and time-consuming to code/maintain.
This link works, still runs pretty fast and is a lot less work to code/maintain. Thanks to the authors of this site.
http://access.mvps.org/access/queries/qry0020.htm
The subquery option sounds best to me, something like the following pseudo-SQL. It may be possible or necessary to optimize it via a join; that will depend on the capabilities of the SQL engine.
select *
from table
where account+time in (select account+max(time)
from table
group by account
order by time)
This is a good trick for returning the last record in a table:
SELECT TOP 1 * FROM TableName ORDER BY Time DESC
Check out this site for more info.
#Tom
It might be easier for me in general to do the "In" query that you've suggested. Generally I do something like
select T1.account, T1.value
from table as T1
where T1.time = (select max(T2.time) from table as T2 where T1.account = T2.account)
#shs
Yes, that select last(value) SHOULD work, but it doesn't... My understanding, although I can't produce an authoritative source, is that last(value) gives the last physical record in the Access file, which means it could be the first one timewise but the last one physically. So I don't think you should use last(value) for anything other than a really bad random row.
I'm trying to find the latest date in a group using the Access 2003 query builder, and ran into the same problem trying to use LAST for a date field. But it looks like using MAX finds the latest date.
Perhaps the following SQL is clumsy, but it seems to work correctly in Access.
SELECT
a.account,
a.time,
a.value
FROM
tablename AS a INNER JOIN [
SELECT
account,
Max(time) AS MaxOftime
FROM
tablename
GROUP BY
account
]. AS b
ON
(a.time = b.MaxOftime)
AND (a.account = b.account)
;
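The same join-on-the-maximum pattern can be tried outside Access. The sketch below runs it on Python's built-in sqlite3 module with a made-up readings table (the table name and rows are assumptions, and the derived table uses standard parentheses instead of Access's bracket syntax). The rows are inserted out of time order on purpose, to show that the result depends on MAX(time) and not on physical position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (account TEXT, value REAL, time TEXT);
    -- Hypothetical rows, deliberately not in chronological order.
    INSERT INTO readings VALUES
        ('acct1', 10.0, '2024-01-01'),
        ('acct1', 12.5, '2024-03-01'),
        ('acct1', 11.0, '2024-02-01'),
        ('acct2',  7.0, '2024-01-15'),
        ('acct2',  5.0, '2024-01-10');
""")

# Inner query: latest time per account. Outer join: fetch the full row
# that carries that time.
query = """
    SELECT a.account, a.time, a.value
    FROM readings AS a
    INNER JOIN (
        SELECT account, MAX(time) AS max_time
        FROM readings
        GROUP BY account
    ) AS b
      ON a.account = b.account
     AND a.time = b.max_time
    ORDER BY a.account
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows of one account share the same latest time, this pattern returns both of them, so a tie-breaker would be needed where that can happen.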