SELECT other if value is no - mysql

I have a column called is_thumbnail with a default value of no. I select a row based off of is_thumbnail = 'yes'
$query = $conn->prepare("SELECT * FROM project_data
WHERE project_id = :projectId
AND is_thumbnail = 'yes'
LIMIT 1");
There is a chance that no rows will have a value of yes. In that case, I want to select the first row with that same projectId regardless of the value of is_thumbnail
Now I know I can see what the query returns and then run another query. I was wondering if it was possible to do this in a single query, or is there some way I can take advantage of PDO? I just started using PDO. Thanks!
Example data:
id project_id image is_thumbnail
20 2 50f5c7b5b8566_20120803_185833.jpg no
19 2 50f5c7b2767d1_4link 048.jpg no
18 2 50f5c7af2fb22_4link 047.jpg no

$query = $conn->prepare("SELECT * FROM project_data
WHERE project_id = :projectId
ORDER BY is_thumbnail DESC LIMIT 1");
Since 'yes' sorts after 'no' alphabetically, ordering by is_thumbnail DESC returns a 'yes' row when one exists and falls back to a 'no' row otherwise.
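A self-contained sketch of this preference-ordering idea, using SQLite via Python's sqlite3 so it can be run as-is (the table and sample rows follow the question; MySQL behaves the same way here). Sorting on the boolean expression is_thumbnail = 'yes' makes the preference explicit instead of relying on 'yes' sorting after 'no' alphabetically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE project_data (
    id INTEGER PRIMARY KEY, project_id INTEGER,
    image TEXT, is_thumbnail TEXT DEFAULT 'no')""")
rows = [
    (18, 2, "50f5c7af2fb22_4link 047.jpg", "no"),
    (19, 2, "50f5c7b2767d1_4link 048.jpg", "no"),
    (20, 2, "50f5c7b5b8566_20120803_185833.jpg", "no"),
]
conn.executemany("INSERT INTO project_data VALUES (?, ?, ?, ?)", rows)

# Sorting on the boolean expression puts any 'yes' row first;
# if none exists, some other row for the project is returned.
QUERY = """SELECT * FROM project_data
           WHERE project_id = ?
           ORDER BY (is_thumbnail = 'yes') DESC
           LIMIT 1"""

fallback = conn.execute(QUERY, (2,)).fetchone()
print(fallback)            # a 'no' row, since no thumbnail exists yet

conn.execute("UPDATE project_data SET is_thumbnail = 'yes' WHERE id = 19")
preferred = conn.execute(QUERY, (2,)).fetchone()
print(preferred[0])        # -> 19
```

The parameter placeholder plays the same role as :projectId in the PDO version.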

Given that the schema described in the question shows multiple rows per project_id, a solution that relies solely on ORDER BY is_thumbnail may perform poorly if, for instance, a single project has many related rows: the sort cannot use an index, so its cost grows with the number of rows. An alternative that avoids the sort is:
SELECT * FROM (
SELECT *
FROM project_data
WHERE project_id = :projectId AND is_thumbnail = "yes"
ORDER BY id DESC
LIMIT 1
UNION
SELECT *
FROM project_data
WHERE project_id = :projectId AND is_thumbnail = "no"
ORDER BY id DESC
LIMIT 1
) AS t
ORDER BY t.is_thumbnail = "yes" DESC
LIMIT 1
While this solution is a bit more complex to understand, it is able to use a compound index on (project_id, is_thumbnail, id) to quickly find exactly one row matching the requested conditions. The outer select ensures a stable ordering of the yes/no rows if both are found.
Note that you could also just issue two queries, and probably get similar or better performance. In order to use the above UNION and sub-select, MySQL will require temporary tables, which aren't great in busy environments.

Related

MySQL ORDER BY from subquery lost by GROUP BY

I have a table x :
id lang externalid
1 nl 10
2 nl 11
3 fr 10
From this table I want all the rows for a certain lang and externalid; if the externalid doesn't exist for that lang, I want the row with any other lang.
The subquery sorts the table correctly, but when I add the GROUP BY, the ordering from the subquery is lost. This works in older MySQL versions but not in 5.7.
SELECT * FROM
(
SELECT
*
FROM
x
ORDER BY FIELD(lang, "fr") DESC, id
)
as y
group by externalid
I want the query to return the records with id 2 & 3. So for each distinct external id, if possible the lang = 'fr', else any other lang.
How can I solve this problem?
You are talking of a given externalid and lang, so there is no need to GROUP BY externalid; use a plain WHERE clause instead.
Combined with ORDER BY and LIMIT you get the record you want (i.e. the desired language if such a record exists, otherwise another one).
select *
from mytable
where externalid = 10
order by lang = 'fr' desc
limit 1;
UPDATE: Okay, according to your comment you want to get the "best" record per externalid. In standard SQL you'd use ROW_NUMBER for this. Other DBMSs have their own solutions, e.g. Oracle's KEEP FIRST or PostgreSQL's DISTINCT ON. MySQL doesn't support any of these. One way would be to emulate ROW_NUMBER with variables. Another is to use the above query as a subquery per externalid to find the best records:
select *
from mytable
where id in
(
select
(
select m.id
from mytable m
where m.externalid = e.externalid
order by m.lang = 'fr' desc
limit 1
) as best_id
from (select distinct externalid from mytable) e
);
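A runnable sketch of this per-externalid subquery, using SQLite via Python's sqlite3 with the sample rows from the question (SQLite, like MySQL, sorts on the boolean expression lang = 'fr', so the same trick carries over):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, lang TEXT, externalid INTEGER)")
conn.executemany("INSERT INTO x VALUES (?, ?, ?)",
                 [(1, "nl", 10), (2, "nl", 11), (3, "fr", 10)])

# For each distinct externalid, pick the 'fr' row if one exists,
# otherwise any row in another language.
best = conn.execute("""
    SELECT * FROM x
    WHERE id IN (
        SELECT (SELECT m.id FROM x m
                WHERE m.externalid = e.externalid
                ORDER BY (m.lang = 'fr') DESC
                LIMIT 1)
        FROM (SELECT DISTINCT externalid FROM x) e
    )
    ORDER BY id""").fetchall()
print(best)   # -> [(2, 'nl', 11), (3, 'fr', 10)]
```

This returns exactly the records with id 2 and 3 that the question asks for.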
Your subquery generates a result set (a virtual table) that's passed to your outer query.
All SQL queries, without exception, generate their results in unpredictable order unless you specify the order completely in an ORDER BY clause.
Unpredictable is like random, except worse. Random implies you'll get a different order every time you run the query. Unpredictable means you'll get the same order every time, until you don't.
MySQL ordinarily ignores ORDER BY clauses in subqueries (there are a few exceptions, mostly related to subquery LIMIT clauses). Move your ORDER BY to the top level query.
Edit. You are also misusing MySQL's notorious nonstandard extension to GROUP BY.

Fetch entries in table from last using LIMIT a,b or limit offset

I want to fetch the latest entries in a table containing more than 1,000,000 entries. For instance, I am using this query:
SELECT id FROM tablea WHERE flag = "N" ORDER BY id LIMIT 510045,200;
and it gives me entries starting from 510045 and ending at 510245. Does MySQL have something where I can get entries starting from 510245 and going back to 510045? I mean fetching the data from the end, and I don't want to fetch only 200 entries.
You should ORDER BY ... DESC and, if you want, add a LIMIT to define how many entries you want.
Example:
SELECT id FROM tablea WHERE flag = "N" ORDER BY id DESC;
-- this will help to find the last entries
But if you want to fetch the latest entries that you didn't get in the last query, you should keep the value of the last ID and use it as a reference for the next check.
Example (supposing the last ID from the previous query execution was 55304):
SELECT id FROM tablea WHERE flag = "N" AND id > 55304 ORDER BY id DESC;
If what you want is rows where the id is greater than 510245, just use a WHERE condition:
Select * FROM table WHERE flag = 'n' AND id > 510245
This should do it
As I understand your requirement, you may try this:
Select * FROM table WHERE flag = 'N' AND id > 510245 ORDER BY id
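The "remember the last ID" idea above (often called keyset or seek pagination) can be sketched like this; SQLite via Python's sqlite3 is used so the example runs as-is, with table and column names mirroring the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (id INTEGER PRIMARY KEY, flag TEXT)")
conn.executemany("INSERT INTO tablea VALUES (?, ?)",
                 [(i, "N") for i in range(1, 1001)])

# First fetch: the latest 200 ids, newest first.
page = conn.execute("""SELECT id FROM tablea WHERE flag = 'N'
                       ORDER BY id DESC LIMIT 200""").fetchall()
last_seen = page[0][0]          # remember the newest id delivered so far

# Later, fetch only entries newer than the remembered id:
conn.executemany("INSERT INTO tablea VALUES (?, ?)",
                 [(i, "N") for i in range(1001, 1006)])
fresh = conn.execute("""SELECT id FROM tablea
                        WHERE flag = 'N' AND id > ?
                        ORDER BY id DESC""", (last_seen,)).fetchall()
print([r[0] for r in fresh])    # -> [1005, 1004, 1003, 1002, 1001]
```

Unlike LIMIT with a large offset, the `id > ?` condition lets the database seek straight to the new rows via the primary key.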
One more thing here:
The MySQL version I was working with did not support a subquery containing LIMIT inside IN. So, @strawberry, thanks for giving me the hint to solve the question. I used the subquery as an inner-joined table instead (shown below):
SELECT T1.id FROM tablea AS T1
INNER JOIN (SELECT id FROM tablea WHERE flag = "N" ORDER BY id LIMIT 510045,200) AS T2
ON T2.id = T1.id ORDER BY T2.id DESC;
This gave me the required results. Thanks everyone for your help !!

Count matches in the previous 100 rows

As the title states, I'm trying to count the number of matches within the last 100 records in a certain table.
This query works, but the data is very dynamic, with lots of inserts on that particular table; similar queries run at the same time, and they all end up extremely slow (20s), probably blocking each other.
Because caching the result is not acceptable (the data has to be live), I'm thinking of moving the outer query into PHP; even though I know that would be slower than a proper SQL solution, it would still be faster than 20s.
Here's the query
SELECT count(*) as matches
FROM (
SELECT first_name FROM customers
WHERE division = 'some_division'
AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0, 100
) as entries
WHERE first_name = 'some_first_name_here'
What I'm looking for is a more optimized way of performing the same task, without having to implement it in PHP, since that's the naive/obviously wrong approach.
the table looks something like this:
id first_name last_name type division request_time
Just to set things straight, this is obviously not the actual table / data due to NDA reasons, but, the table looks exactly the same with different column names.
So again, what I'm trying to achieve is to pull a count of matches found WITHIN the last 100 records which have some constraints.
for example,
how many times does the name 'John' appear within the last 100 employees added in the HR division?
I see.
How about something like this...
SELECT i
FROM
( SELECT CASE WHEN first_name = 'some_first_name_here' THEN @i:=@i+1 END i
FROM customers
, (SELECT @i:=0) val
WHERE division = 'some_division'
AND type = 'employee'
ORDER
BY request_time
DESC
LIMIT 0,100
) n
ORDER
BY i DESC
LIMIT 1;
Try this:
SELECT SUM(matches)
FROM
(
SELECT IF(first_name = 'some_first_name_here', 1, 0) AS matches
FROM customers
WHERE division = 'some_division' AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0,100
) AS entries
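The shape shared by the question and both answers — restrict to the last N rows first, then count matches inside that window — can be sketched with SQLite via Python's sqlite3 (a window of 5 instead of 100 keeps the example easy to check; names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY, first_name TEXT,
    division TEXT, type TEXT, request_time INTEGER)""")
names = ["John", "Ann", "John", "Bea", "Cid", "John", "Dee"]
conn.executemany(
    "INSERT INTO customers (first_name, division, type, request_time) "
    "VALUES (?, 'HR', 'employee', ?)",
    [(n, i) for i, n in enumerate(names)])

# The derived table holds only the last 5 requests; the outer
# query then counts 'John' inside that window.
matches = conn.execute("""
    SELECT count(*) FROM (
        SELECT first_name FROM customers
        WHERE division = 'HR' AND type = 'employee'
        ORDER BY request_time DESC
        LIMIT 5
    ) AS entries
    WHERE first_name = 'John'""").fetchone()[0]
print(matches)   # -> 2 ('Dee', 'John', 'Cid', 'Bea', 'John' are the last 5)
```

The crucial point is that the LIMIT must apply before the name filter; filtering first would count matches over the whole table.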

MySQL Optimize UNION query

I'm trying to optimize a query.
My question seems to be similar to MySQL, Union ALL and LIMIT and the answer might be the same (I'm afraid). However in my case there's a stricter limit (1) as well as an index on a datetime column.
So here we go:
For simplicity, let's have just one table with three columns:
md5 (varchar)
value (varchar)
lastupdated (datetime)
There's an index on (md5, lastupdated), so selecting on an md5 key, ordering by lastupdated and limiting to 1 will be optimized.
The search shall return a maximum of one record matching one of 10 md5 keys. The keys have a priority. So if there's a record with prio 1 it will be preferred over any record with prio 2, 3 etc.
Currently UNION ALL is used:
select * from
(
(
select 0 prio, value
from mytable
where md5 = '7b76e7c87e1e697d08300fd9058ed1db'
order by lastupdated desc
limit 1
)
union all
(
select 1 prio, value
from mytable
where md5 = 'eb36cd1c563ffedc6adaf8b74c259723'
order by lastupdated desc
limit 1
)
) x
order by prio
limit 1;
It works, but the UNION seems to execute all 10 queries if 10 keys are provided.
However, from a business perspective, it would be ok to run the selects sequentially and stop after the first match.
Is that possible through plain SQL?
Or would the only option be a stored procedure?
There's a much better way to do this that doesn't need UNION. You really want the groupwise max for each key, with a custom ordering.
Groupwise Max
Order by FIELD()
There's no way the optimizer for UNION ALL can figure out what you're up to.
I don't know if you can do this, but suppose you had a md5prio table with the list of hash codes you know you're looking for. For example.
prio md5
0 '7b76e7c87e1e697d08300fd9058ed1db'
1 'eb36cd1c563ffedc6adaf8b74c259723'
etc
in it.
Then your query could be:
select mytable.*
from mytable
join md5prio on mytable.md5 = md5prio.md5
order by md5prio.prio, mytable.lastupdated desc
limit 1
This might save the repeated queries. You'll definitely need your index on mytable.md5. I am not sure whether your compound index on lastupdated will help; you'll need to try it.
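A runnable sketch of the md5prio join, using SQLite via Python's sqlite3 (md5prio is the hypothetical priority table proposed above; the hash values are shortened for readability):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (md5 TEXT, value TEXT, lastupdated INTEGER)")
conn.execute("CREATE TABLE md5prio (prio INTEGER, md5 TEXT)")
conn.executemany("INSERT INTO md5prio VALUES (?, ?)",
                 [(0, "7b76e7c8"), (1, "eb36cd1c")])
# No row exists for the prio-0 key; two rows exist for the prio-1 key.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [("eb36cd1c", "older", 100), ("eb36cd1c", "newer", 200)])

# Best priority first, then most recent; a single row wins.
row = conn.execute("""
    SELECT mytable.* FROM mytable
    JOIN md5prio ON mytable.md5 = md5prio.md5
    ORDER BY md5prio.prio, mytable.lastupdated DESC
    LIMIT 1""").fetchone()
print(row)   # -> ('eb36cd1c', 'newer', 200)
```

Because no prio-0 record exists, the query falls through to the newest prio-1 record, which is exactly the desired fallback behaviour.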
In your case, the most efficient solution may be to build an index on (md5, lastupdated). This index should be used to resolve each subquery very efficiently (looking up the values in the index and then looking up one data page).
Unfortunately, the groupwise max referenced by Gavin will produce multiple rows when there are duplicate lastupdated values (admittedly, perhaps not a concern in your case).
There is, actually, a MySQL way to get this answer, using group_concat and substring_index:
select p.prio,
substring_index(group_concat(mt.value order by mt.lastupdated desc), ',', 1)
from mytable mt join
(select 0 as prio, '7b76e7c87e1e697d08300fd9058ed1db' as md5 union all
select 1 as prio, 'eb36cd1c563ffedc6adaf8b74c259723' as md5 union all
. . .
) p
on mt.md5 = p.md5
group by p.prio
order by p.prio
limit 1;

SQL query: Delete all records from the table except latest N?

Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?
Something like this, only it doesn't work :)
delete from table order by id ASC limit ((select count(*) from table ) - N)
Thanks.
You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.
This works (tested in MySQL 5.0.67):
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
The intermediate subquery is required. Without it we'd run into two errors:
SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.
Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.
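The accepted pattern can be exercised end-to-end with SQLite via Python's sqlite3 (SQLite doesn't enforce MySQL's error 1093, but the doubly nested form above runs there unchanged):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 101)])

# Keep the 10 newest ids; delete everything else.
conn.execute("""
    DELETE FROM t
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT id FROM t
            ORDER BY id DESC
            LIMIT 10          -- keep this many records
        ) foo
    )""")
remaining = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining)   # ids 91 through 100 remain
```

The inner derived table materializes the "keep" set first, which is precisely what sidesteps both MySQL errors described above.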
I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.
The solutions that actually worked were the Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN method.
Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.
What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:
DELETE FROM `test_sandbox`
WHERE id <= (
SELECT id
FROM (
SELECT id
FROM `test_sandbox`
ORDER BY id DESC
LIMIT 1 OFFSET 42 -- keep this many records
) foo
);
It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.
Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.
It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
Test case
I tested the three working methods and the new method above in two test cases.
Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).
+-----------+------------------------+----------------------+
| | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN | 3.2542 seconds | 0.1629 seconds |
| NOT IN v2 | 4.5863 seconds | 0.1650 seconds |
| <=,OFFSET | 0.0204 seconds | 0.1076 seconds |
+-----------+------------------------+----------------------+
What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.
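The <= / OFFSET method can be checked the same way with SQLite via Python's sqlite3; note that it relies on id ordering matching the deletion order, as the answer assumes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sandbox (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test_sandbox (id) VALUES (?)",
                 [(i,) for i in range(1, 101)])

# The subquery finds the id of the (N+1)-th newest row; everything at
# or below it is deleted, keeping exactly the latest N rows.
conn.execute("""
    DELETE FROM test_sandbox
    WHERE id <= (
        SELECT id FROM (
            SELECT id FROM test_sandbox
            ORDER BY id DESC
            LIMIT 1 OFFSET 10   -- keep this many records
        ) foo
    )""")
kept = [r[0] for r in conn.execute("SELECT id FROM test_sandbox ORDER BY id")]
print(kept)   # ids 91 through 100 remain
```

If the table holds fewer than N+1 rows, the subquery yields NULL, the comparison matches nothing, and no rows are deleted, which is the desired no-op.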
Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.
DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);
ERROR 1093 (HY000): You can't specify target table 'mytable' for update
in FROM clause
Nor can MySQL support LIMIT in a subquery. These are limitations of MySQL.
DELETE FROM mytable WHERE id NOT IN
(SELECT id FROM mytable ORDER BY id DESC LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
The best answer I can come up with is to do this in two stages:
SELECT id FROM mytable ORDER BY id DESC LIMIT n;
Collect the id's and make them into a comma-separated string:
DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );
(Normally interpolating a comma-separate list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source, they are known to be integer values from the database itself.)
note: Though this doesn't get the job done in a single query, sometimes a simpler, get-it-done solution is the most effective.
DELETE i1.*
FROM items i1
LEFT JOIN
(
SELECT id
FROM items ii
ORDER BY
id DESC
LIMIT 20
) i2
ON i1.id = i2.id
WHERE i2.id IS NULL
If your id is incremental then use something like
delete from table where id < (select max(id) from table)-N
To delete all the records except the last N you may use the query reported below.
It's a single query but with multiple statements, so it's not really "a single query" in the sense intended by the original question.
You also need a variable and a built-in (in the query) prepared statement due to a bug in MySQL.
Hope it may be useful anyway...
nnn are the rows to keep and theTable is the table you're working on.
I'm assuming you have an autoincrementing record named id
SELECT @ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT @ROWS_TO_DELETE := IF(@ROWS_TO_DELETE<0,0,@ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING @ROWS_TO_DELETE;
The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 record, keeping the last 1,000. It runs in 0.08 seconds.
The script from the accepted answer...
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
Takes 0.55 seconds. About 7 times more.
Test environment: mySQL 5.5.25 on a late 2011 i7 MacBookPro with SSD
DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)
Try the query below:
DELETE FROM tablename WHERE id < (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)
the inner subquery returns MAX(id) - 10, and the outer query deletes every record with a smaller id, keeping the top 10 (assuming ids are contiguous).
What about:
SELECT del.id FROM table del
LEFT JOIN table keep
ON del.id < keep.id
GROUP BY del.id HAVING count(keep.id) >= N;
It returns the rows that have at least N newer rows, i.e. every row except the latest N — the candidates for deletion.
Could be useful?
Using id for this task is not an option in many cases. For example - table with twitter statuses. Here is a variant with specified timestamp field.
delete from table
where access_time <=
(
select access_time from
(
select access_time from table
order by access_time desc limit 150000,1
) foo
)
Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword 'Limit' isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL 2008, and is based on this SO post. https://stackoverflow.com/a/1104447/993856
-- Keep the last 10 most recent passwords for this user.
DECLARE @UserID int; SET @UserID = 1004
DECLARE @ThresholdID int -- Position of 10th password.
SELECT @ThresholdID = UserPasswordHistoryID FROM
(
SELECT ROW_NUMBER()
OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
FROM UserPasswordHistory
WHERE UserID = @UserID
) sub
WHERE (RowNum = 10) -- Keep this many records.
DELETE UserPasswordHistory
WHERE (UserID = @UserID)
AND (UserPasswordHistoryID < @ThresholdID)
Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!
If you need to delete the records based on some other column as well, then here is a solution:
DELETE
FROM articles
WHERE id IN
(SELECT id
FROM
(SELECT id
FROM articles
WHERE user_id = :userId
ORDER BY created_at DESC LIMIT 500, 10000000) abc)
AND user_id = :userId
This should work as well:
DELETE `table` FROM `table`
LEFT JOIN (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT N
) AS Temp ON `table`.id = Temp.id
WHERE Temp.id IS NULL
DELETE FROM `table` WHERE id NOT IN (
SELECT id FROM (
SELECT id FROM `table` ORDER BY id DESC LIMIT 0, 10
) AS keeper
)
Stumbled across this and thought I'd update.
This is a modification of something that was posted before. I would have commented, but unfortunately don't have 50 reputation...
LOCK TABLES TestTable WRITE, TestTable AS TestTableRead READ;
DELETE FROM TestTable
WHERE ID <= (
SELECT ID
FROM TestTable AS TestTableRead -- (the 'AS' alias is required for some reason)
ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records
);
UNLOCK TABLES;
The use of WHERE and OFFSET avoids the doubly nested sub-query.
You also cannot read and write from the same table in the same query, as you might modify entries while they are being used; the locks work around this, and also make the statement safe when other processes access the database in parallel.
For performance and further explanation see the linked answer.
Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB
For further details on locks, see here
Why not
DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789
i.e. just delete all but the first row (order is DESC!), using a very large number as the second LIMIT argument? Unfortunately, MySQL's DELETE ... LIMIT accepts only a row count, not an offset, so this syntax is rejected and one of the subquery-based approaches is needed after all. See here
Answering this after a long time... I came across the same situation and, instead of using the answers mentioned above, came up with this:
DELETE FROM table_name ORDER BY id LIMIT 10
This deletes the oldest 10 records, keeping the latest ones (you need to know how many surplus rows there are, since the LIMIT is a fixed count).