I've been experimenting with this particular table:
http://www.quackit.com/sql/tutorial/sql_order_by.cfm
and it seems that when I order by two or more columns, I get the same results as ordering by one column.
For example:
SELECT * FROM Individual ORDER BY last_name;
is basically the same as saying:
SELECT * FROM Individual ORDER BY last_name, first_name;
What's the whole point of ordering by multiple columns in SQL? I really see no practical use for it. Are there things you can accomplish with it that you can't accomplish by sorting on a single column?
It is not the same.
While ORDER BY last_name may produce a result like
last_name | first_name
Doe | John
Doe | Jane
ORDER BY last_name,first_name is always
last_name | first_name
Doe | Jane
Doe | John
If two or more people have the same last name, the second sort column orders them by first name.
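A minimal sketch of the tie-breaking behaviour, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a cut-down Individual table with only the two name columns (the extra "Brown, Zoe" row is hypothetical). John is inserted before Jane so that insertion order and sorted order differ:

```python
import sqlite3

# In-memory SQLite database standing in for the Individual table
# (assumption: only last_name and first_name are modeled).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Individual (last_name TEXT, first_name TEXT)")
# John goes in before Jane, so insertion order differs from sorted order.
conn.executemany(
    "INSERT INTO Individual VALUES (?, ?)",
    [("Doe", "John"), ("Doe", "Jane"), ("Brown", "Zoe")],
)

# One sort key: the relative order of the two Does is left unspecified.
by_last = conn.execute(
    "SELECT last_name, first_name FROM Individual ORDER BY last_name"
).fetchall()

# Two sort keys: first_name breaks ties between equal last_name values.
by_last_first = conn.execute(
    "SELECT last_name, first_name FROM Individual "
    "ORDER BY last_name, first_name"
).fetchall()

print(by_last_first)  # [('Brown', 'Zoe'), ('Doe', 'Jane'), ('Doe', 'John')]
```

With only `ORDER BY last_name`, the engine is free to return the Does in either order; the second key is what makes the result deterministic.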
It could be that there is an index on the last_name and first_name columns and the rows are being returned in index order.
I am working on filling in a document using a MySQL database. What I want is to get results for a given WHERE condition. Here is the students table:
students
+-----+------------+-----------------+-----+
| id | nickname | student_name | ... |
+-----+------------+-----------------+-----+
| 1 | Joy | Anderson | ... |
| 2 | Prank | Campbell | ... |
+-----+------------+-----------------+-----+
I ran the following query against the database:
SELECT nickname FROM students WHERE student_name in ('Anderson', 'Campbell')
I expected a result like this:
Joy
Prank
The expected result above follows the order of the values in the WHERE condition (WHERE student_name in ('Anderson', 'Campbell')): Joy matches Anderson and Prank matches Campbell. But the actual result is this:
Prank
Joy
Now I don't know what to do to get my expected result. Can anyone give me some ideas or information about this situation?
You have fallen into a common SQL trap. Rows such as your rows in students and members of sets such as ('Anderson', 'Campbell') have no built-in order. The server doesn't know anything about Anderson coming before Campbell even though your query shows them that way.
Your only recourse is to use an appropriate ORDER BY clause. Without an ORDER BY clause, results are shown in an order that's formally unpredictable. In your case ORDER BY student_name at the end of your query will make your row ordering predictable.
Unpredictable is a complex idea. It's like random except worse. Random usually implies a result is likely to be different each time. Unpredictable means it's the same every time, until it isn't.
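A minimal sketch of both options, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The first query simply adds `ORDER BY student_name`. The second shows one way (a CASE expression that maps each listed value to its position) to follow whatever order the caller lists the names in. The helper name `nicknames_in_list_order` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, nickname TEXT, student_name TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Joy", "Anderson"), (2, "Prank", "Campbell")],
)

# Alphabetical order by student_name makes the result predictable.
by_name = [row[0] for row in conn.execute(
    "SELECT nickname FROM students "
    "WHERE student_name IN ('Anderson', 'Campbell') "
    "ORDER BY student_name"
)]


def nicknames_in_list_order(conn, names):
    """Hypothetical helper: return nicknames in the order `names` is given."""
    placeholders = ", ".join("?" for _ in names)
    # CASE student_name WHEN ? THEN 0 WHEN ? THEN 1 ... END gives each row
    # the position of its name in the list; sorting on that follows the list.
    case_expr = " ".join(f"WHEN ? THEN {i}" for i in range(len(names)))
    sql = (
        f"SELECT nickname FROM students WHERE student_name IN ({placeholders}) "
        f"ORDER BY CASE student_name {case_expr} END"
    )
    # Parameters are bound left to right: first the IN list, then the CASE.
    return [row[0] for row in conn.execute(sql, list(names) + list(names))]


by_list = nicknames_in_list_order(conn, ["Anderson", "Campbell"])
reversed_list = nicknames_in_list_order(conn, ["Campbell", "Anderson"])
```

In MySQL specifically, `ORDER BY FIELD(student_name, 'Anderson', 'Campbell')` is a shorter way to sort by list position; SQLite has no FIELD() function, hence the CASE expression here.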
I have the following sample CARS table (intentionally denormalized for demonstration purposes):
| CAR_ID | OWNER_ID | OWNER_NAME | COLOR |
|--------|----------|------------|-------|
| 1 | 1 | John | White |
| 2 | 1 | John | Black |
| 3 | 2 | Mike | White |
| 4 | 2 | Mike | Black |
| 5 | 2 | Mike | Brown |
| 6 | 3 | Tony | White |
If I want to count the number of cars per owner and return this:
| OWNER_ID | OWNER_NAME | TOTAL |
|----------|------------|-------|
| 1 | John | 2 |
| 2 | Mike | 3 |
| 3 | Tony | 1 |
I know I can write the following query:
SELECT owner_id, owner_name, COUNT(*) total FROM cars
GROUP BY owner_id, owner_name
However, removing owner_name from the GROUP BY clause gives me the same results.
What is the difference between these two queries?
Under what circumstances should I group by all non-aggregated fields in the SELECT statement, and when shouldn't I?
Can you give an example in which removing a non-aggregated field from the GROUP BY clause would return different results, and explain why?
The first thing to make clear is that SQL is not MySQL.
In standard SQL-92 it is not allowed to group by a subset of the non-aggregated fields. The reason is very simple. Suppose I'm running this query:
SELECT color, owner_name, COUNT(*) FROM cars
GROUP BY color
That query would not make any sense. Even trying to explain it would be impossible. Sure, it is selecting colors and counting the number of cars per color. However, it is also selecting the owner_name field, and there can be many owners for a given color, as is the case with White. So if there can be many owner_name values for a single color, which happens to be the only field in the GROUP BY clause... then which owner_name will be returned?
If an owner_name needs to be returned, then some criterion must be added to select only one of them, e.g., the first one alphabetically, which in this case would be John. That criterion means adding an aggregate function, MIN(owner_name), and then the query makes sense again, because it groups by, at least, all the non-aggregated fields in the SELECT statement.
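A minimal sketch of the MIN(owner_name) fix, assuming Python's built-in sqlite3 module as a stand-in for MySQL and the sample cars data from the question. Because owner_name is now aggregated, each color gets a well-defined owner:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (car_id INTEGER, owner_id INTEGER, owner_name TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?)",
    [
        (1, 1, "John", "White"),
        (2, 1, "John", "Black"),
        (3, 2, "Mike", "White"),
        (4, 2, "Mike", "Black"),
        (5, 2, "Mike", "Brown"),
        (6, 3, "Tony", "White"),
    ],
)

# owner_name is wrapped in MIN(), so every selected column is either grouped
# by or aggregated, and the value returned per color is fully determined.
rows = conn.execute(
    "SELECT color, MIN(owner_name), COUNT(*) FROM cars "
    "GROUP BY color ORDER BY color"
).fetchall()

print(rows)  # [('Black', 'John', 2), ('Brown', 'Mike', 1), ('White', 'John', 3)]
```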
As you can see, there is a clear and practical reason for standard SQL to be inflexible about grouping. If it weren't, you could face awkward situations in which the value of a column would be unpredictable, and that is not a nice word, particularly if the query being run is showing you your bank account transactions.
Having said that, why would MySQL allow queries that might not make sense? And even worse, the error in the query above could easily be detected just by analyzing the query! The short answer is: performance. The long answer is that there are certain situations in which, because of how the data is related, taking an unpredictable value from the group still yields a predictable value.
If you haven't figured it out yet, the only way you can predict the value you'll get from taking an unpredictable element from a group is if all the elements in the group are the same. A clear example of this situation is the sample query in your own question. Look at how owner_id and owner_name relate in the table: given any owner_id, e.g. 2, there can only be one distinct owner_name. Even though there are several rows for that owner, whichever one is chosen, you will get Mike as the result. In formal database jargon, owner_id functionally determines owner_name.
Let's take a closer look at that fully working MySQL query:
SELECT owner_id, owner_name, COUNT(*) total FROM cars
GROUP BY owner_id
Given any owner_id, this query returns the same owner_name, so adding owner_name to the GROUP BY clause will not return more rows. Likewise, wrapping it in an aggregate function such as MAX(owner_name) will not return fewer rows. The resulting data will be exactly the same. Either change immediately turns the query into legal standard SQL, because every non-aggregated field is then grouped by. So there are 3 approaches that produce the same results.
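A minimal sketch comparing the three approaches, assuming Python's built-in sqlite3 module as a stand-in for MySQL. SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts a "bare" non-aggregated column in an aggregate query; the three results only coincide because owner_id functionally determines owner_name in this data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (car_id INTEGER, owner_id INTEGER, owner_name TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?)",
    [
        (1, 1, "John", "White"),
        (2, 1, "John", "Black"),
        (3, 2, "Mike", "White"),
        (4, 2, "Mike", "Black"),
        (5, 2, "Mike", "Brown"),
        (6, 3, "Tony", "White"),
    ],
)

queries = {
    # Non-standard: owner_name is neither grouped by nor aggregated.
    "bare": "SELECT owner_id, owner_name, COUNT(*) FROM cars "
            "GROUP BY owner_id ORDER BY owner_id",
    # Standard: owner_name is added to the GROUP BY clause.
    "grouped": "SELECT owner_id, owner_name, COUNT(*) FROM cars "
               "GROUP BY owner_id, owner_name ORDER BY owner_id",
    # Standard: owner_name is wrapped in an aggregate function.
    "aggregated": "SELECT owner_id, MAX(owner_name), COUNT(*) FROM cars "
                  "GROUP BY owner_id ORDER BY owner_id",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

If the data ever broke the dependency (say, a second name stored under owner_id 2), the "grouped" query would return an extra row, the "aggregated" one would pick the alphabetically last name, and the "bare" one would return whichever name the engine happened to read.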
However, as I mentioned before, this non-standard grouping has a performance advantage. The MySQL documentation on GROUP BY handling explains this in more detail, but here is the most important part:
You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. [...] The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
One thing that is worth mentioning is that the results are not necessarily wrong but rather indeterminate. In other words, getting the expected results does not mean you have written the right query. Writing the right query will always give you the expected results.
As you can see, it might be worth applying this MySQL extension to the GROUP BY clause. Still, if this is not 100% clear yet, there is a rule of thumb that will make sure your grouping is always correct: always group by, at least, all the non-aggregated fields in the SELECT clause. You might waste a few CPU cycles in certain situations, but that is better than returning indeterminate results. If you're still terrified of not grouping correctly, then enabling the ONLY_FULL_GROUP_BY SQL mode, which makes MySQL reject such queries, could be a last resort :)
May your grouping be correct and performant... or at least correct.
Does anyone know how to create a query that finds out whether the data in one column contains (as with LIKE) the data in another column?
For example
ID||First_Name || Last_Name
------------------------
1 ||Matt || Doe
------------------------
2 ||Smith || John Doe
------------------------
3 ||John || John Smith
I want to find all rows where Last_Name contains First_Name. The answer should be ID 3.
Thanks in advance.
Here's one way to do it:
Select *
from TABLE
where instr(last_name, first_name) >= 1;
Try this:
select * from TABLE where last_name LIKE CONCAT('%', first_name, '%')
WHERE Last_Name LIKE CONCAT('%', First_Name, '%')
You could also use INSTR(), but take note: both methods perform full-table scans, which are generally a no-no in high-performance MySQL.
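A minimal sketch of both techniques, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical table name `people` (the answers use the placeholder TABLE). INSTR(haystack, needle) takes the string being searched first, which is easy to get backwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the question's table is unnamed.
conn.execute("CREATE TABLE people (ID INTEGER, First_Name TEXT, Last_Name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Matt", "Doe"), (2, "Smith", "John Doe"), (3, "John", "John Smith")],
)

# instr(haystack, needle) returns the 1-based position of needle inside
# haystack, or 0 when it is absent, so Last_Name (the searched column) is first.
instr_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM people WHERE instr(Last_Name, First_Name) > 0"
)]

# The same test with LIKE. SQLite concatenates with ||; in MySQL (default
# sql_mode) || means OR, so there you would write CONCAT('%', First_Name, '%').
like_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM people WHERE Last_Name LIKE '%' || First_Name || '%'"
)]

# With the arguments swapped, the query asks whether First_Name contains
# Last_Name instead, which matches nothing in this data.
swapped_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM people WHERE instr(First_Name, Last_Name) > 0"
)]
```

One design difference: with LIKE, any `%` or `_` inside First_Name acts as a wildcard, whereas INSTR() matches the text literally.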
I have data stored in a MySQL database according to the Entity-Attribute-Value (EAV) pattern, specifically user profile values from Drupal 6. I need an SQL query or view that returns the data as a normal relational table. The tables have the following layout:
Table: users
user_id username
---------------------
1 steve
2 michelle
Table: profile_fields
field_id field_name
------------------------
1 first_name
2 last_name
Table: profile_values
field_id user_id value
---------------------------
1 1 Steve
2 1 Smith
1 2 Michelle
2 2 Addams
And I need to somehow get the following result from a query:
user_id first_name last_name
-----------------------------------
1 Steve Smith
2 Michelle Addams
I understand that this is impossible to do in a single SQL query in the general case. But this is not the general case, and I have two advantages:
I know the content of the "profile_fields" table, and I am 100% sure that this data will not change for the time period that this query will be used.
It doesn't have to be in a single query - it can be a query, some PHP code to analyze the results and then another query.
This can be done in a single SQL query using correlated subqueries, one per output column, as follows:
SELECT
u.user_id,
(select value from profile_values f1 WHERE f1.field_id=1 and u.user_id=f1.user_id) AS first_name,
(select value from profile_values f2 WHERE f2.field_id=2 and u.user_id=f2.user_id) AS last_name
FROM users u
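A minimal sketch that runs the correlated-subquery query, assuming Python's built-in sqlite3 module as a stand-in for MySQL and the sample data from the question. It also shows conditional aggregation (MAX(CASE ...) with GROUP BY), which is a different technique from the answer's, swapped in here only for comparison; it reads profile_values once instead of once per column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, username TEXT);
    CREATE TABLE profile_values (field_id INTEGER, user_id INTEGER, value TEXT);
    INSERT INTO users VALUES (1, 'steve'), (2, 'michelle');
    INSERT INTO profile_values VALUES
        (1, 1, 'Steve'), (2, 1, 'Smith'), (1, 2, 'Michelle'), (2, 2, 'Addams');
""")

# One correlated scalar subquery per known field_id
# (1 = first_name, 2 = last_name, per the fixed profile_fields table).
subquery_rows = conn.execute("""
    SELECT u.user_id,
           (SELECT value FROM profile_values f1
             WHERE f1.field_id = 1 AND f1.user_id = u.user_id) AS first_name,
           (SELECT value FROM profile_values f2
             WHERE f2.field_id = 2 AND f2.user_id = u.user_id) AS last_name
    FROM users u
    ORDER BY u.user_id
""").fetchall()

# Alternative technique: pivot with conditional aggregation.
pivot_rows = conn.execute("""
    SELECT u.user_id,
           MAX(CASE WHEN v.field_id = 1 THEN v.value END) AS first_name,
           MAX(CASE WHEN v.field_id = 2 THEN v.value END) AS last_name
    FROM users u
    LEFT JOIN profile_values v ON v.user_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()

print(subquery_rows)  # [(1, 'Steve', 'Smith'), (2, 'Michelle', 'Addams')]
```

Both rely on a user having at most one value per field_id. If duplicates crept in, MySQL would reject the scalar subquery for returning more than one row, while MAX(CASE ...) would quietly pick the largest value.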
I have a table of employees and their schedule, like so:
Emp_Name | Date
-------- -----
Smith | 08-01-2009
Jones | 08-01-2009
Goodman | 08-02-2009
Smith | 08-02-2009
Jones | 08-02-2009
Goodman | 08-03-2009
How would I write a query that returns only the names of employees working on both 08-02-2009 and 08-03-2009?
I'm getting caught up because all I can think of are ways to get the names for EITHER match, but I can't seem to find the right way to get the results for only the names that have matches for all search criteria across multiple rows.
So, based on the example conditions, I should get only Goodman. But if I use a condition like WHERE Date IN (list of dates), I get Goodman, Smith and Jones, since Smith and Jones both work on 08-02-2009. If I try some kind of JOIN, I would not always end up with the same number of columns, since the number of days each employee works varies.
I thought UNION might be the way to go, but I wouldn't always know how many conditions someone was searching by.
Here's my first stab at it. There's probably a more efficient way than using HAVING, though...
SELECT Emp_Name,COUNT(DISTINCT date) as daycount
FROM employee
WHERE date IN ('08-02-2009', '08-03-2009')
GROUP BY Emp_Name
HAVING daycount=2
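A minimal sketch of the COUNT(DISTINCT ...)/HAVING approach generalized to any number of dates, which addresses the concern about not knowing how many conditions will be searched. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, dates stored as 'MM-DD-YYYY' strings as in the question, and a hypothetical helper name `working_on_all`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Emp_Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        ("Smith", "08-01-2009"),
        ("Jones", "08-01-2009"),
        ("Goodman", "08-02-2009"),
        ("Smith", "08-02-2009"),
        ("Jones", "08-02-2009"),
        ("Goodman", "08-03-2009"),
    ],
)


def working_on_all(conn, dates):
    """Hypothetical helper: names of employees scheduled on every given date."""
    dates = sorted(set(dates))  # dedupe so the HAVING target count is exact
    if not dates:
        raise ValueError("need at least one date")
    placeholders = ", ".join("?" for _ in dates)
    # Keep only rows on the requested dates, then require that each employee
    # covers as many distinct dates as were requested.
    sql = (
        f"SELECT Emp_Name FROM employee WHERE Date IN ({placeholders}) "
        "GROUP BY Emp_Name HAVING COUNT(DISTINCT Date) = ? ORDER BY Emp_Name"
    )
    return [row[0] for row in conn.execute(sql, [*dates, len(dates)])]


print(working_on_all(conn, ["08-02-2009", "08-03-2009"]))  # ['Goodman']
```

Only the placeholder list and the target count change with the number of dates, so the same query shape works for two conditions or twenty.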
If you don't need a generic solution, you can use this:
select distinct Emp_Name
from employee
where Emp_Name in (select Emp_Name from employee where Date = '08-02-2009')
and Emp_Name in (select Emp_Name from employee where Date = '08-03-2009')
...
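A minimal sketch of the non-generic IN-subquery approach, assuming Python's built-in sqlite3 module as a stand-in for MySQL. Each condition needs its own `Emp_Name IN (...)`, and DISTINCT matters: the outer query scans every schedule row, so an employee who qualifies is returned once per row they have in the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Emp_Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        ("Smith", "08-01-2009"),
        ("Jones", "08-01-2009"),
        ("Goodman", "08-02-2009"),
        ("Smith", "08-02-2009"),
        ("Jones", "08-02-2009"),
        ("Goodman", "08-03-2009"),
    ],
)

# One IN-subquery per required date; both must hold for the same name.
sql = """
    SELECT DISTINCT Emp_Name
    FROM employee
    WHERE Emp_Name IN (SELECT Emp_Name FROM employee WHERE Date = '08-02-2009')
      AND Emp_Name IN (SELECT Emp_Name FROM employee WHERE Date = '08-03-2009')
"""
names = [r[0] for r in conn.execute(sql)]

# Without DISTINCT, both of Goodman's schedule rows pass the filter.
without_distinct = [r[0] for r in conn.execute(sql.replace("DISTINCT ", ""))]

print(names)             # ['Goodman']
print(without_distinct)  # ['Goodman', 'Goodman']
```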