MySQL, Selecting based on subquery using MIN() and GROUP BY - mysql

First, to describe my data set. I am using SNOMED CT codes and trying to make a usable list out of them. The relevant columns are rowId, conceptID, and Description. rowId is unique, the other two are not. I want to select a very specific subset of those codes:
SELECT *
FROM SnomedCode
WHERE LENGTH(Description)=MIN(LENGTH(Description))
GROUP BY conceptID
The result should be a list of 400,000 unique conceptIDs (out of 1.4 million) and the shortest applicable description for each code. The query above is obviously malformed (and would only return rows where LENGTH(description)=1 because the shortest description in the table is 1 character long.) What am I missing?

SELECT conceptID, MAX(Description)
FROM SnomedCode A
WHERE LENGTH(Description)=(SELECT MIN(LENGTH(B.Description))
FROM SnomedCode B
WHERE B.conceptID = A.conceptID)
GROUP BY conceptID
The "GROUP BY" and "MAX(Description)" are not really necessary, but were added as a tiebreaker for different descriptions with same length for a conceptID, as the requirements include unique conceptIDs.
MAX was chosen to penalize possible leading spaces. Otherwise MIN(Description) works as well.
BTW, this query takes quite some time if you have over million records. Test it with "AND conceptID in (list-of-conceptIDs-to-test)" added in the WHERE clause.
The table SnomedCode must have an index on conceptID. If not, the query will take forever.

Related

Mysql query is returning false results

I have a products table with a product_id field that is an auto-ID and an integer.
When I search:
SELECT * FROM `products` WHERE product_id = '73N716507Y5928128'
it actually returns the row whose product_id is 73. And I may be new to programming but I KNOW that 73 != 73N716507Y5928128.
What can I do to fix this?
The reason BTW for this query is that I am searching multiple tables and multiple fields for a search term and using logic to determine what the user is searching for...
Any help would be greatly appreciated.
In mysql, when you compare (=, <, >, <=, >=, <=>) a numeric field and a character field, the character field is converted to a number first, disregarding any trailing non-numeric characters.
Presumably you want the product_id= part to stay unchanged to take advantage of an index. You can add an additional condition to test if the input is in fact a number:
SELECT * FROM `products`
WHERE product_id = '73N716507Y5928128'
AND CONCAT(0+'73N716507Y5928128')='73N716507Y5928128';
Your business logic should ensure that only integers are passed in the where clause. However, you may also try the following, it will fix the issue. But if products table has millions of records, this will not be a good idea due to table full scan. But if there are say 10K or less products in the table, full scan will not be significant at all.
SELECT * FROM `products` WHERE binary(product_id) = '73N716507Y5928128'

How to speed up the query search with the issue of GROUP BY?

I have the issue of using GROUP BY when select all the column from the table and in result with the poor performance in term of speed.
Select * from employee
group by customer_id;
The query above wouldn't be change,it is mandatory and fixed.It takes 17720ms is to long and the result must take shorter time, which is below 1 minute as my desired result.Since the table has many column and record, so it take much time in query searching.Is there any solution to solve this problem.Thanks.
For as simple as your query is, it appears almost pointless... You would not have duplicate employee IDs within an employee table, and doing a group by would still result in returning every row, every column.
However, that said, to optimize a GROUP BY, you would need an index on that column ... which I would think would already exist as the employee ID would probably be the primary key to the table.
Additionally, you don't have any aggregate columns what would warrant a group by. Are you instead just trying to LOOK for a specific employee? If so, that would be a different query using a WHERE clause for the criteria you are looking for.
FEEDBACK...
You updated your question and did a group by CUSTOMER ID (not employee ID). Ok, but what do you really mean to group by..
OR... Did you want to ORDER by a customer... In other words, I want a list of all employees, but want them sorted by the customer they are associated with... If this is the case, you would want something like...
select *
from employees
ORDER BY
customerID,
employeeLastName,
employeeFirstName
Without seeing your table structure(s), but if the employee table DOES have a column for the customer ID they are associated with, this query would put all employees for the same customer in a common PRE-SORT output by customer, then within that customer, sorted by the employees name (last, first).
If you have another table(s) with relationships between employees and customers, we would need to see that too to better offer an answer.
Column with heavy type LIKE BLOB, TEXT, NVARCHAR(200 or more) will slowdown your query by a lot if you have a lot of records. I suggest to check if it is really necessary to load them all from the start.
Also, you GROUP BY seem weird. What exactly are you trying to achieve with it?
The GROUP BY is not just weird, it is wrong. If you don't specify all the non-aggregate columns in the GROUP BY, you get seemingly random values for each column. Remove the GROUP BY or explain why you think you need it.
Or maybe the "*" is not correct. OK, you cannot show us your real column names, at least show us the real pattern to the SELECT, even if it has bogus column names.
I'm also confused as to why you call it a "search". There is no WHERE clause, which is where "search" criteria goes.

Whats wrong with this MYSQL query

I have the following SQL query , it seems to run ok , but i am concerned as my site grows it may not perform as expected ,I would like some feeback as to how effective and efficient this query really is:
select * from articles where category_id=XX AND city_id=XXX GROUP BY user_id ORDER BY created_date DESC LIMIT 10;
Basically what i am trying to achieve - is to get the newest articles by created_date limited to 10 , articles must only be selected if the following criteria are met :
City ID must equal the given value
Category ID must equal the given value
Only one article per user must be returned
Articles must be sorted by date and only the top 10 latest articles must be returned
You've got a GROUP BY clause which only contains one column, but you are pulling all the columns there are without aggregating them. Do you realise that the values returned for the columns not specified in GROUP BY and not aggregated are not guaranteed?
You are also referencing such a column in the ORDER BY clause. Since the values of that column aren't guaranteed, you have no guarantee what rows are going to be returned with subsequent invocations of this script even in the absence of changes to the underlying table.
So, I would at least change the ORDER BY clause to something like this:
ORDER BY MAX(created_date)
or this:
ORDER BY MIN(created_date)
some potential improvements (for best query performance):
make sure you have an index on all columns you querynote: check if you really need an index on all columns because this has a negative performance when the BD has to build the index. -> for more details take a look here: http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html
SELECT * would select all columns of the table. SELECT only the ones you really require...

optimizing a complex query in mysql

I have two questions here but i am asking them at once as i think they are inter-related.
I am working with a complex query (Multiple joins + sub queries) and the table is pretty huge as well (around 2,00,000 records in this table).
A part of this query (a LEFT JOIN) is required to find a record which has a second lowest value in a cetain column among all the records associated with the primary key of the first table. For now I have isolated this part and thinking on the lines of -
SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 1,1;
But there is a case where, if there is only 1 record in the table, it must return that record instead of NULL. So my first question is how do write a query for this ?
Secondly, considering the size of the table and the time its already taking to run even after creating indexes, I understand that adding any more complexity to it in order to achieve the above part might affect the querying time dramatically.
I cannot decompose joins because I need to get some of the columns for the ORDER BY clause (the application has an option to sort the result by these columns, the above column "myvalue" being one of them)
What would be the way(s) to approach this problem ?
Thanks
Something like this might work
COALESCE(
(SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 1,1),
(SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 0,1))
It selects the first non null value from the list provided.
As for the complexity of the query, post the whole thing so we can take a look at it.

MySQL: SELECT(x) WHERE vs COUNT WHERE?

This is going to be one of those questions but I need to ask it.
I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.
With my current knowledge, I see two options (pseudo code):
[id = primary key]
OPTION 1:
SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.
OPTION 2:
SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.
Is either of these preferable for any reason, or is there perhaps an even better solution?
Thanks.
If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.
create index my_unique_index on table(x)
If you want to enforce the uniqueness, that is not even an option, you must have
create unique index my_unique_index on table(x)
Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on.
For clarity, I would write
select count(*) from table where x = ?
I would avoid LIMIT 1 for two other reasons:
It is non-standard SQL. I am not religious about that, use the MySQL-specific stuff where necessary (i.e. for paging data), but it is not necessary here.
If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.
AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.
Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.
If I were to compare speeds, I would say not doing a count in MySQL would be faster. I don't have any proof, but my guess would be that MySQL has to get all of the rows and then count how many there are. Altough...on second thought, it would have to do that in the first option as well so the code will know how many rows there are as well. But since you have COUNT(id) vs COUNT(*), I would say it might be slightly slower.
Intuitively, the first one could be faster since it can abort the table(or index) scan when finds the first value. But you should retrieve x not id, since if the engine it's using an index on x, it doesn't need to go to the block where the row actually is.
Another option could be:
select exists(select 1 from mytable where x = ?) from dual
Which already returns a boolean.
Typically, you use group by having clause do determine if there are duplicate rows in a table. If you have a table with id and a name. (Assuming id is the primary key, and you want to know if name is unique or repeated). You would use
select name, count(*) as total from mytable group by name having total > 1;
The above will return the number of names which are repeated and the number of times.
If you just want one query to get your answer as true or false, you can use a nested query, e.g.
select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;
The above should return true, if your table has duplicate rows, otherwise false.