MySQL "GROUP BY" experiment - mysql

I am testing SQL and I am stuck on one query. It is a useless query but I want to understand it.
select count(*), floor(rand()*2) as x from table_name group by x;
The result is either two rows, or the error: Duplicate entry '0' (or '1') for key 'group_key'.
What leads to this error?

rand() is going to generate a random number for every row in your table. You are then grouping by the results of all of those random numbers. You will get one row for each unique value.
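The grouping described above can be imitated outside the database. This is only a rough sketch in Python, not MySQL itself: the function name group_by_random_bucket and the row count of 1000 are made up for illustration. It draws one random 0 or 1 per row and counts the rows per value, which is what the query is meant to do when each row's random value is evaluated exactly once.

```python
import math
import random
from collections import Counter


def group_by_random_bucket(row_count, seed=None):
    """Mimic SELECT COUNT(*), FLOOR(RAND()*2) AS x ... GROUP BY x (hypothetical helper)."""
    rng = random.Random(seed)
    # One random bucket (0 or 1) is drawn per row, then rows are counted per bucket.
    buckets = (math.floor(rng.random() * 2) for _ in range(row_count))
    return Counter(buckets)


# Assumed table size of 1000 rows, purely for illustration.
counts = group_by_random_bucket(1000, seed=42)
for bucket, total in sorted(counts.items()):
    print(bucket, total)
```

Since only the values 0 and 1 can occur, the result can never have more than two groups.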

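The main point here is not to group by strange synthetic data.
It is better to group by actual fields, because MySQL has some bugs in this area.
Like this one:
http://bugs.mysql.com/bug.php?id=58081
Or this one:
https://bugs.mysql.com/bug.php?id=60808
Apparently MySQL is trying to create a unique index on the grouping key in a temporary table, and that fails. The MySQL documentation warns that a RAND() expression in a GROUP BY or ORDER BY clause can be evaluated more than once for the same row, returning a different value each time, which would explain the duplicate key.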

Related

SQL query performance - single record, filtered by non-indexed column, sorted by indexed column, record is close in the sequence of sorted records

I have the following (My)SQL query:
SELECT * FROM table WHERE nidx = x ORDER BY id DESC LIMIT 1
With the following assumptions:
id is an indexed field
nidx is a non-indexed field (let's say it has a numerical type)
x is a constant
the record with nidx = x is relatively close to the start of the ordered sequence of records (let's say it is guaranteed to be somewhere among the first 1000 records in that order)
I have two questions:
Can I assume that this is an efficient query or should I add an index to nidx column?
Does the answer to the first question depend on the specific RDBMS (so it may be different for MySQL, PostgreSQL, MSSQL, SQLite, etc.)? If yes, how is it for MySQL?
Ordering is applied after filtering. The ORDER BY clause does not help the search in this case. Equally, unless you have some clear constraint on the table that indicates the values will be close, the optimiser doesn't know that, so it won't help.
What -might- help, if you can't / won't add an index on nidx, is to first get the records around id = x and then search those.
Something like...
SELECT
    *
FROM
    table
WHERE
    id BETWEEN x - 1000 AND x + 1000
    AND nidx = x
ORDER BY
    id
LIMIT
    1
-Hopefully- this will allow the optimiser to build a plan where the 2000 records around id = x are found first through the index on id, and only those 2000 records are then checked one by one for nidx = x.
You'll have to try it and see, and use EXPLAIN to find out exactly what's being done in what order.
In general, however, this is a hack; don't rely on it too much. It is better to fix the indexing, and that advice holds for all platforms.
Just add the index. :)
Considering the number of records, an index would be preferable.
Example in MySQL:
ALTER TABLE table ADD INDEX nidx_index (nidx)
You can also create a unique index if the values in the column are unique:
ALTER TABLE table ADD UNIQUE INDEX nidx_index (nidx)
You could add an index on the nidx field, but keep in mind that this will make UPDATE, INSERT and DELETE queries slower, because the index has to be maintained on every write.
ORDER BY and GROUP BY are usually the most expensive parts of a query, because they are performed at the end, on the rows that survived the filter. If the ORDER BY is not necessary, I would remove it.
Finally, you can use the EXPLAIN command to diagnose SQL queries:
EXPLAIN SELECT * FROM table WHERE nidx = x ORDER BY id DESC
Here is a short tutorial on improving a query using EXPLAIN:
https://dev.mysql.com/doc/workbench/en/wb-tutorial-visual-explain-dbt3.html
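The effect of adding the index can be checked by comparing the query plan before and after. The following is a rough sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs EXPLAIN QUERY PLAN (SQLite's counterpart to EXPLAIN) and the plan text will not look like MySQL's. The table name records, the payload column and the sample data are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, nidx INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records (nidx, payload) VALUES (?, ?)",
    [(i % 500, f"row {i}") for i in range(5000)],
)

QUERY = "SELECT * FROM records WHERE nidx = ? ORDER BY id DESC LIMIT 1"


def plan(connection, sql, params):
    """Return the planner's description of each step as a list of strings."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the human-readable detail


plan_before = plan(conn, QUERY, (7,))

# Same idea as ALTER TABLE ... ADD INDEX nidx_index (nidx) in MySQL.
conn.execute("CREATE INDEX nidx_index ON records (nidx)")
plan_after = plan(conn, QUERY, (7,))

latest = conn.execute(QUERY, (7,)).fetchone()
print("before:", plan_before)
print("after: ", plan_after)
print("latest match:", latest)
```

On MySQL the same comparison would be done by running EXPLAIN on the query before and after the ALTER TABLE.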

How to speed up the query search with the issue of GROUP BY?

I have a performance problem when using GROUP BY while selecting all the columns from a table; the query is very slow.
Select * from employee
group by customer_id;
The query above cannot be changed; it is mandatory and fixed. It takes 17720 ms, which is too long, and the result must come back faster; my target is below 1 minute. Since the table has many columns and records, the query takes a long time. Is there any solution to this problem? Thanks.
For as simple as your query is, it appears almost pointless... You would not have duplicate employee IDs within an employee table, and doing a group by would still result in returning every row, every column.
However, that said, to optimize a GROUP BY, you would need an index on that column ... which I would think would already exist as the employee ID would probably be the primary key to the table.
Additionally, you don't have any aggregate columns that would warrant a GROUP BY. Are you instead just trying to LOOK for a specific employee? If so, that would be a different query using a WHERE clause for the criteria you are looking for.
FEEDBACK...
You updated your question and did a group by CUSTOMER ID (not employee ID). Ok, but what do you really mean to group by..
OR... Did you want to ORDER by a customer... In other words, I want a list of all employees, but want them sorted by the customer they are associated with... If this is the case, you would want something like...
select *
from employees
ORDER BY
customerID,
employeeLastName,
employeeFirstName
I can't see your table structure(s), but if the employee table DOES have a column for the customer ID each employee is associated with, this query would list all employees for the same customer together: the output is sorted by customer and then, within each customer, by the employee's name (last, first).
If you have another table(s) with relationships between employees and customers, we would need to see that too to better offer an answer.
Columns with heavy types like BLOB, TEXT or NVARCHAR(200 or more) will slow down your query a lot if you have many records. I suggest checking whether it is really necessary to load them all from the start.
Also, your GROUP BY seems weird. What exactly are you trying to achieve with it?
The GROUP BY is not just weird, it is wrong. If you don't specify all the non-aggregate columns in the GROUP BY, you get seemingly random values for each column. Remove the GROUP BY or explain why you think you need it.
Or maybe the "*" is not correct. OK, you cannot show us your real column names, at least show us the real pattern to the SELECT, even if it has bogus column names.
I'm also confused as to why you call it a "search". There is no WHERE clause, which is where "search" criteria goes.

get last record in file

I have a table (rather badly designed, but anyway) which consists only of strings. The worst part is that there is a script which adds records from time to time. Records will never be deleted.
I believe that MySQL stores records in a random access file, and that I could get the last or any other record using C or something similar, since I know the maximum length of a record and I can find EOF.
When I do something like "SELECT * FROM table" in MySQL, I get all the records in the right order, because MySQL reads this file from the beginning to the end. I need only the last one(s).
Is there a way to get the LAST record (or records) using MySQL query only, without ORDER BY?
Well, I suppose I've found a solution here, so my current query is
SELECT
    @i := @i + 1 AS iterator,
    t.*
FROM
    table t,
    (SELECT @i := 0) i
ORDER BY
    iterator DESC
LIMIT 5
If there's a better solution, please let me know!
The order is not guaranteed unless you use an ORDER BY. It just happens that the records you're getting back are sorted the way you need them.
This is where keys (a primary key, for example) matter.
You can modify your table by adding an auto_increment primary key column.
Then you can query
select * from your_table where id =(select max(id) from your_table);
and get the last inserted row.
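A minimal sketch of that suggestion, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite spells the column INTEGER PRIMARY KEY AUTOINCREMENT rather than AUTO_INCREMENT). The column name line and the ten sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (line) VALUES (?)",
    [(f"entry {n}",) for n in range(1, 11)],
)

# The row with the highest id is the most recently inserted one.
last_row = conn.execute(
    "SELECT * FROM your_table WHERE id = (SELECT MAX(id) FROM your_table)"
).fetchone()

# For the last few rows, an explicit ORDER BY on the key is the reliable way.
last_five = conn.execute(
    "SELECT * FROM your_table ORDER BY id DESC LIMIT 5"
).fetchall()

print(last_row)
print(last_five)
```

With the key in place, ORDER BY id DESC LIMIT 5 can walk the primary key backwards, so the ORDER BY the question wanted to avoid should no longer be a concern.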

optimizing a complex query in mysql

I have two questions here, but I am asking them together as I think they are related.
I am working with a complex query (multiple joins + subqueries) and the table is pretty huge as well (around 200,000 records).
Part of this query (a LEFT JOIN) has to find the record with the second-lowest value in a certain column among all the records associated with the primary key of the first table. For now I have isolated this part and am thinking along the lines of -
SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 1,1;
But there is a special case: if there is only 1 record in the table, the query must return that record instead of NULL. So my first question is, how do I write a query for this?
Secondly, considering the size of the table and the time it's already taking to run even after creating indexes, I understand that adding any more complexity to achieve the above might increase the query time dramatically.
I cannot decompose joins because I need to get some of the columns for the ORDER BY clause (the application has an option to sort the result by these columns, the above column "myvalue" being one of them)
What would be the way(s) to approach this problem ?
Thanks
Something like this might work
COALESCE(
(SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 1,1),
(SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 0,1))
It selects the first non null value from the list provided.
As for the complexity of the query, post the whole thing so we can take a look at it.
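The COALESCE fallback can be tried on small tables to confirm both cases. This is only a sketch under assumptions: it runs against SQLite through Python's sqlite3 module rather than MySQL, it writes LIMIT 1 OFFSET 1 instead of LIMIT 1,1 (the same thing, in a form both engines accept), and the helper name second_lowest_id and the sample rows are invented.

```python
import sqlite3

# Second-lowest myvalue if it exists, otherwise the lowest (the only row).
SECOND_LOWEST_OR_ONLY = """
SELECT COALESCE(
    (SELECT id FROM tbl ORDER BY myvalue ASC LIMIT 1 OFFSET 1),
    (SELECT id FROM tbl ORDER BY myvalue ASC LIMIT 1 OFFSET 0)
)
"""


def second_lowest_id(rows):
    """Load (id, myvalue) pairs into a fresh table and run the query (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, myvalue INTEGER)")
    conn.executemany("INSERT INTO tbl (id, myvalue) VALUES (?, ?)", rows)
    return conn.execute(SECOND_LOWEST_OR_ONLY).fetchone()[0]


print(second_lowest_id([(1, 30), (2, 10), (3, 20)]))  # several rows
print(second_lowest_id([(5, 99)]))                    # a single row
print(second_lowest_id([]))                           # empty table
```

With several rows the first subquery wins; with a single row it yields NULL and COALESCE falls through to the second one. An empty table still gives NULL.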

MySQL: SELECT(x) WHERE vs COUNT WHERE?

This is going to be one of those questions but I need to ask it.
I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.
With my current knowledge, I see two options (pseudo code):
[id = primary key]
OPTION 1:
SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.
OPTION 2:
SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.
Is either of these preferable for any reason, or is there perhaps an even better solution?
Thanks.
If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.
create index my_unique_index on table(x)
If you want to enforce the uniqueness, that is not even an option, you must have
create unique index my_unique_index on table(x)
Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on.
For clarity, I would write
select count(*) from table where x = ?
I would avoid LIMIT 1 for two other reasons:
It is non-standard SQL. I am not religious about that, use the MySQL-specific stuff where necessary (i.e. for paging data), but it is not necessary here.
If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.
AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.
Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.
If I were to compare speeds, I would guess that not doing a count in MySQL is faster. I don't have any proof, but my guess is that MySQL has to get all of the matching rows and then count them. Although... on second thought, it would have to do that in the first option as well, so that the code knows how many rows there are. But since you have COUNT(id) rather than COUNT(*), I would say it might be slightly slower.
Intuitively, the first one could be faster, since it can abort the table (or index) scan as soon as it finds the first match. But you should retrieve x, not id: if the engine is using an index on x, it then doesn't need to go to the block where the row actually is.
Another option could be:
select exists(select 1 from mytable where x = ?) from dual
Which already returns a boolean.
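A small sketch of the EXISTS form, run against SQLite through Python's sqlite3 module as a stand-in for MySQL. SQLite has no dual table, so the FROM dual part is left out here. The helper name row_exists and the sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE UNIQUE INDEX my_unique_index ON mytable (x)")
conn.executemany("INSERT INTO mytable (x) VALUES (?)", [(1,), (2,), (3,)])


def row_exists(connection, x):
    """Return True if at least one row has the given x (hypothetical helper)."""
    (found,) = connection.execute(
        "SELECT EXISTS(SELECT 1 FROM mytable WHERE x = ?)", (x,)
    ).fetchone()
    # The database reports 1 or 0; convert it to a Python bool.
    return bool(found)


print(row_exists(conn, 1))
print(row_exists(conn, 99))
```

The query always returns exactly one row, so the calling code does not need to distinguish "no rows" from "a row containing zero".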
Typically, you use a GROUP BY with a HAVING clause to determine whether there are duplicate rows in a table. Say you have a table with an id and a name (assuming id is the primary key) and you want to know whether name is unique or repeated. You would use
select name, count(*) as total from mytable group by name having total > 1;
The above will return each name that is repeated, together with the number of times it occurs.
If you just want one query to get your answer as true or false, you can use a nested query, e.g.
select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;
The above should return true, if your table has duplicate rows, otherwise false.
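Both queries can be tried on a handful of rows. This is a sketch only, using SQLite through Python's sqlite3 module instead of MySQL, so the HAVING clause repeats COUNT(*) rather than the alias and the IF(...) wrapper is replaced by a plain comparison. The sample names are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("ann",), ("bob",), ("ann",), ("cid",), ("ann",), ("bob",)],
)

# Each repeated name with the number of times it occurs.
repeated = conn.execute(
    "SELECT name, COUNT(*) AS total FROM mytable "
    "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# Single true/false answer: is there at least one repeated name?
(has_duplicates,) = conn.execute(
    "SELECT COUNT(*) >= 1 FROM "
    "(SELECT name FROM mytable GROUP BY name HAVING COUNT(*) > 1) a"
).fetchone()

print(repeated)
print(bool(has_duplicates))
```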