How is this advanced SQL syntax constructed? - mysql

I have completed the beginner and advanced SQL courses at W3Schools, and cannot find any other free advanced online SQL course. I am having a problem with the SQL syntax in the accepted answer in this SO thread, which covers calculating the median value of a column. My questions are:
After 'from' come two variables. Does that mean that data are selected from two tables, and if so, how would the formula be if I just require the median value of one column of one table?
The OP/TS named these columns 'id' and 'val'. Why then is 'x.val' selected?

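The FROM data x, data y part of SELECT x.val FROM data x, data y cross-joins the table data with itself: x and y are two aliases for the same table, so only one table is involved. I'm pretty sure the query ends up finding the median of a single column, and the self-join is a trick to help calculate that median.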
To understand this better (and note that I don't totally understand it), try this:
Set up a table with some sample data - say 5-8 rows to begin with
Promote the HAVING values to the SELECT list
Get rid of the HAVING clause
So your query will look something like this:
SELECT x.val, SUM(SIGN(1-SIGN(y.val-x.val))), (COUNT(*)+1)/2
FROM data x, data y
GROUP BY x.val
Then take a look at the results and you'll be able to get more insight into the logic. Also see if you can follow the calculation when you track it row by row.
Finally, note that the query isn't so much advanced as it is specialized. I mean, it is advanced and all, but it's the math gymnastics rather than the query semantics that are probably giving you trouble. Don't sweat it if you don't understand this right off the bat :)
As for why val is selected - that's the column the OP is trying to calculate the median for. The id is probably there because it's generally a good idea to have a PK on every row. It's not needed for the calculation so it's not included in the query.
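Here is a minimal sketch of that experiment, assuming five made-up sample values. It runs against an in-memory SQLite database through Python's sqlite3 module rather than against MySQL, so SIGN() is registered by hand and the division uses 2.0 to mimic MySQL's decimal division; the column aliases and the ORDER BY are my additions for readability.

```python
import sqlite3


def sign(v):
    """Return -1, 0 or 1, like MySQL's SIGN()."""
    return (v > 0) - (v < 0)


def median_diagnostics(values):
    """Run the query with the HAVING terms promoted to the SELECT list."""
    conn = sqlite3.connect(":memory:")
    # SQLite stand-in for MySQL's built-in SIGN() (assumption: not MySQL itself).
    conn.create_function("SIGN", 1, sign)
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany("INSERT INTO data (val) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT x.val,
               SUM(SIGN(1 - SIGN(y.val - x.val))) AS rows_at_or_below,
               (COUNT(*) + 1) / 2.0               AS target
        FROM data x, data y
        GROUP BY x.val
        ORDER BY x.val
        """
    ).fetchall()
    conn.close()
    return rows


rows = median_diagnostics([40, 10, 50, 30, 20])
for row in rows:
    print(row)

# The original HAVING clause keeps only the row where the two columns match.
median = [val for val, at_or_below, target in rows if at_or_below == target]
print(median)  # [30]
```

For each x.val, the middle column counts how many rows of y have a value less than or equal to it, and the last column is the position the median must occupy. Only 30 has exactly three values at or below it, which is why the HAVING clause picks it.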

Related

Does DISTINCT automatically sort the result in MySQL?

Here is the GROUP_CONCAT() tutorial on GeeksforGeeks.
In "Queries 2", the output is in ascending order, but there is no ORDER BY clause.
[Screenshot of "Queries 2"]
Could anyone tell me why?
Any help would be really appreciated!
This is one of those oddballs where there is likely an implicit sort happening behind the scenes to optimize the DISTINCT execution by mysql.
You can test this yourself pretty easily:
CREATE TABLE t1 (c1 VARCHAR(50));
INSERT INTO t1 VALUES ('zebra'),('giraffe'),('cattle'),('fox'),('octopus'),('yak');
SELECT GROUP_CONCAT(c1) FROM t1;
SELECT GROUP_CONCAT(DISTINCT c1) FROM t1;
GROUP_CONCAT(c1)
zebra,giraffe,cattle,fox,octopus,yak
GROUP_CONCAT(DISTINCT c1)
cattle,fox,giraffe,octopus,yak,zebra
It's not uncommon to find sorted results where no ORDER BY was specified. The output of window functions is a good example of this.
You can imagine if you were tasked, as a human, to only pick distinct items from a list. You would likely first sort the list and then pick out duplicates, right? And when you hand the list back to the person that requested this from you, you wouldn't scramble the data back up to be unsorted, I would assume. Why do the extra work? What you are seeing here is a byproduct of the optimized execution path chosen by the mysql server.
The key takeaway is "byproduct". If I specifically wanted the output of GROUP_CONCAT to be sorted, I would specify exactly what I want and I would not rely on this implicit sorting behavior. We can't guess what the execution path will be. There are a lot of decisions an RDBMS makes when SQL is submitted to optimize the execution and depending on data size and other steps it needs to take in the sql, this behavior may work on one sql statement and not another. Likewise, it may work one day, and not another.
TL;DR Never omit an ORDER BY clause from a query if you rely on the order for something.
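A minimal sketch of asking for the order explicitly, using the same sample data. In MySQL the ordering can go inside the aggregate itself (the statement in MYSQL_QUERY below, which is not executed here). The runnable part uses Python's sqlite3 module as a stand-in, so it sorts the distinct rows with ORDER BY and joins them in the application; the function name is hypothetical.

```python
import sqlite3

# MySQL form of the fix (shown for reference only, not executed in this sketch).
MYSQL_QUERY = "SELECT GROUP_CONCAT(DISTINCT c1 ORDER BY c1 SEPARATOR ',') FROM t1"


def distinct_sorted(conn):
    """Return the distinct c1 values as a comma-separated string, explicitly ordered."""
    rows = conn.execute("SELECT DISTINCT c1 FROM t1 ORDER BY c1").fetchall()
    return ",".join(c1 for (c1,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 VARCHAR(50))")
conn.executemany(
    "INSERT INTO t1 VALUES (?)",
    [(w,) for w in ["zebra", "giraffe", "cattle", "fox", "octopus", "yak"]],
)

result = distinct_sorted(conn)
print(result)  # cattle,fox,giraffe,octopus,yak,zebra
```

With the ORDER BY spelled out, the result no longer depends on whichever execution path the server happens to choose.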
Does DISTINCT automatically sort the result in MySQL?
No. NO! Be careful!
SQL is all about sets of rows. Without ORDER BY clauses, SQL queries return the rows of their result sets in an "unpredictable" order. "Unpredictable" is like random, but worse. If the order is truly random, you have a chance to catch any ordering problem when you're testing. Unpredictable means the server returns rows in any convenient order. This means everything works as you expect until some day in the future when it doesn't, without warning. (MySQL might start using some kind of parallel algorithm in the future.)
Now it is true that DISTINCT result sets from modestly sized tables are often generated using a sorting / deduplicating algorithm in the server. But that is an implementation detail. MySQL and other database servers are complex enough that relying on implementation details is not wise. The good news: if you include an ORDER BY clause that asks for the same order that algorithm happens to produce, performance usually does not change.
SQL is declarative, not procedural. We specify what we want, not how to get it. It's probably the only declarative language most of us ever see, so it's easy to make the mistake of thinking it is procedural.

Function in MySQL that operates on multiple columns

Is it possible to create a custom function in MySQL, like SUM, MAX, and so on, that accepts multiple columns and does some operation on each row?
The reason I am asking is that I tried to implement my logic in a stored procedure, but unfortunately couldn't find a way to select data from a table whose name is an input parameter.
Somebody suggested using dynamic SQL, but then I cannot get the cursor. So my only hope is to use a custom-defined function.
To make the question clearer, here is what I want to do:
I want to calculate the distance of a route where each row in the database table represents coordinates (latitude and longitude). Unfortunately, the data I have is really big, and if I query the data and do the calculations in Java, it takes more than half a minute to transfer the data to the web server, so I want to do the calculations on the SQL machine.
Select something1, something2 from table_name where table name is a variable
Multiple identically-structured tables (prerequisite for this sort of query) is contrary to the Principle of Orthogonal Design.
Don't do it. At least not without very good reason—with suitable indexes, (tens of) millions of records per table is easily enough for MySQL to handle without any need for partitioning; and even if one does need to partition the data, there are better ways than this manual kludge (which can give rise to ambiguous, potentially inconsistent data and lead to redundancy and complexity in your data manipulation code).
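As a rough sketch of the alternative: keep all routes in one table and make the route an ordinary column, so that "which route" becomes a query parameter instead of a table name. The table name route_points, its columns, and the sample coordinates are all hypothetical, and the example runs on an in-memory SQLite database through Python's sqlite3 module rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for every route; route_id replaces the per-route table name.
conn.execute(
    """
    CREATE TABLE route_points (
        route_id INTEGER NOT NULL,
        seq      INTEGER NOT NULL,   -- position of the point along the route
        lat      REAL    NOT NULL,
        lng      REAL    NOT NULL,
        PRIMARY KEY (route_id, seq)
    )
    """
)
conn.executemany(
    "INSERT INTO route_points VALUES (?, ?, ?, ?)",
    [
        (1, 1, 48.0, 11.5),
        (1, 2, 48.5, 11.75),
        (2, 1, 52.5, 13.25),
        (2, 2, 52.75, 13.5),
        (2, 3, 53.0, 13.75),
    ],
)


def points_for_route(conn, route_id):
    """Fetch one route's points in order; the route is a bound parameter."""
    return conn.execute(
        "SELECT lat, lng FROM route_points WHERE route_id = ? ORDER BY seq",
        (route_id,),
    ).fetchall()


route_two = points_for_route(conn, 2)
print(route_two)
```

Because the route is just a value in a WHERE clause, neither dynamic SQL nor a variable table name is needed, and the same approach works inside a stored procedure or function.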

MySQL which is done first - filtering the WHERE conditions or calculating fields?

When performing a MySQL query, is all the data filtered by the WHERE conditions before any fields are calculated?
I have an application where I need to order by distance but need to work out the distances from lat/lng in the query (found this which should cover that part of it (page 8)) but it mentions that the query is fairly slow. I want to know if MySQL will work out the distance for every database entry and then filter those results, or if the results will be filtered first?
EDIT:
I didn't make it quite clear - I only need to order the results at this stage. Filtering by distance would be good too but that will always be secondary to other criteria - i.e. the type would always need to be hotel before it mattered how close it was to the co-ordinates
WHERE works directly on the rows. That's also the reason why you can't use column aliases in a WHERE clause: doing so causes an unknown column error.
You could use HAVING or a subquery. In both cases the result will be calculated for all rows, then filtered.
UPDATE:
You have to filter by distance, right? So you can't filter rows first, then calculate distance. But what you can do, is to use the spatial extension. Then you'll have spatial indexes which could make your queries pretty fast, I guess (never had anything to do with things like this).
Read more about it here.
UPDATE 2:
Actually, I don't get what your question is at all. You want to order by something you have to calculate first. So what's the question? Will it be fast? I don't know. All I can tell you is that no index can be used for the sort, since MySQL does not support indexes on calculated columns (at least not before version 5.7, which added indexable generated columns). A temporary table will probably be used to sort the data as well, depending on how much data there is.
I'd suggest try things out.
Just do
SELECT coordA, coordB
FROM whereever
ORDER BY coordA - coordB
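and that's it. If you run into problems then, ask again.
To filter afterwards, compute the value in a subquery, then filter and sort in the outer query (an ORDER BY inside the subquery is not something to rely on):
SELECT * FROM (
SELECT coordA, coordB, coordA - coordB AS distance
FROM whereever
) sq
WHERE distance > $foo
ORDER BY distance
If that's too slow, play around with indexes and/or the extension mentioned earlier.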
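For the case in the question's edit (type must be hotel first, distance only decides the order), a minimal sketch could look like the following. The table places, its columns, and the sample rows are hypothetical, the squared planar distance is only a stand-in for a real lat/lng distance formula, and the query runs on SQLite through Python's sqlite3 module rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, type TEXT, lat REAL, lng REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?)",
    [
        ("Hotel Far", "hotel", 3.0, 4.0),
        ("Cafe Near", "cafe", 0.5, 0.5),
        ("Hotel Near", "hotel", 1.0, 1.0),
        ("Hotel Mid", "hotel", 2.0, 2.0),
    ],
)


def nearest(conn, place_type, lat, lng):
    """Filter on the cheap column in WHERE, then order by the computed value."""
    return conn.execute(
        """
        SELECT name,
               (lat - :lat) * (lat - :lat) + (lng - :lng) * (lng - :lng) AS dist_sq
        FROM places
        WHERE type = :type
        ORDER BY dist_sq
        """,
        {"type": place_type, "lat": lat, "lng": lng},
    ).fetchall()


hotels = nearest(conn, "hotel", 0.0, 0.0)
print(hotels)  # [('Hotel Near', 2.0), ('Hotel Mid', 8.0), ('Hotel Far', 25.0)]
```

The WHERE condition only uses a stored column, so it can be indexed and the distance expression only has to be evaluated for the rows that survive it.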

Are calculated values used in ORDER BY ever recalculated during sorting?

Suppose we have a query that orders by a calculated value that is not selected, for example:
select * from table
order by col1 * col2 - col3
During typical sorting operations in most languages, it is common for the sort value to be used multiple times during the sort, as a given record is compared to potentially many other records. It is possible that mysql has such an implementation.
Can anyone say definitively if mysql calculates such values once per row and stores them temporarily while the sort completes, or if the values are recalculated whenever a comparison is made (which may be 1-n times)?
I have tagged this mysql, but I would welcome comments/answers regarding/including other popular databases
And the answer is... mysql executes the calculation once per row.
Due to the lack of credible answers, I ran a definitive test on sqlfiddle that orders by the result of a non-deterministic function (one that must be called every time its value is needed) which also records each call in another table. It shows that the number of calls equals the number of rows.
I just used the MySQL Query Browser to EXPLAIN a query that sorts on a calculation over several columns, and it reported that it used filesort. So it would appear that it uses a temporary index (the value is calculated once).
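To make the two strategies from the question concrete, here is a small Python sketch. It is not evidence about MySQL's internals, only an illustration of the difference between computing the sort value once per row and recomputing it on every comparison. The sample rows are made up; each tuple stands for (col1, col2, col3).

```python
import functools


def sort_value(row):
    col1, col2, col3 = row
    return col1 * col2 - col3


def sort_once_per_row(rows):
    """Compute the sort value once per row, then sort on the stored values."""
    calls = 0

    def counted(row):
        nonlocal calls
        calls += 1
        return sort_value(row)

    # sorted() calls the key function exactly once per element.
    return sorted(rows, key=counted), calls


def sort_recompute_on_compare(rows):
    """Recompute both sort values every time two rows are compared."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 2
        va, vb = sort_value(a), sort_value(b)
        return (va > vb) - (va < vb)

    return sorted(rows, key=functools.cmp_to_key(compare)), calls


rows = [(3, 4, 1), (1, 1, 5), (2, 5, 0), (6, 1, 2), (2, 2, 9)]
once_sorted, once_calls = sort_once_per_row(rows)
cmp_sorted, cmp_calls = sort_recompute_on_compare(rows)
print(once_calls, cmp_calls)
```

Both versions produce the same order, but the first evaluates the expression once per row, which matches what the call-counting test above observed for MySQL.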
If you have a particular example that you wanted to test - you could make sure that it did the same for you.
Here is a link to one of the developer's blogs that details how order by works:
http://s.petrunia.net/blog/?p=24

Mysql: use value as alias in query

Given a table
create table mymy(A int(2),B int(2))
is it possible to use a field value as an alias? Something like (not really):
select A as valueOf(B) from mymy.
No. You can't. The values are not known until the query is run. And even if you could, you'd have a lot of possibly different values in one column. Which one should be used?
The only valid reason I can imagine for such a request is that you have some kind of EAV design and you want to have a Pivot result.
If that's the case, you could use dynamic SQL (run a query, get the results, build another query based on those results, and run that one). But this kind of operation is better done on the application side (get the results and format them there, as you prefer).
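A minimal sketch of that two-step approach on the application side, assuming the goal is a pivot with one output column per distinct value of B. The function name, the B_ prefix for the aliases, and the sample rows are hypothetical, and it runs on SQLite through Python's sqlite3 module. In MySQL the generated aliases would be quoted with backticks instead of double quotes.

```python
import sqlite3


def pivot_by_b(conn):
    """Sum A per distinct B, returning one aliased column per value of B."""
    # Step 1: run a query to find out which values of B exist.
    b_values = [b for (b,) in conn.execute("SELECT DISTINCT B FROM mymy ORDER BY B")]
    # Step 2: build a second query from those results. int() ensures that only
    # integers are ever spliced into the SQL text. Assumes the table is not empty.
    columns = ", ".join(
        f'SUM(CASE WHEN B = {int(b)} THEN A END) AS "B_{int(b)}"' for b in b_values
    )
    cursor = conn.execute(f"SELECT {columns} FROM mymy")
    names = [col[0] for col in cursor.description]
    return dict(zip(names, cursor.fetchone()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymy (A int(2), B int(2))")
conn.executemany("INSERT INTO mymy VALUES (?, ?)", [(10, 1), (20, 2), (5, 1), (7, 3)])

pivot = pivot_by_b(conn)
print(pivot)  # {'B_1': 15, 'B_2': 20, 'B_3': 7}
```

The aliases are fixed when the second query is built, which is the only point at which the values of B are known.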