I need to update a table except its top 1000 records. My query is like this:
UPDATE tableA
SET price = 100
WHERE price = 200 AND
item_id = 12 AND
status NOT IN (1,2,3);
I know the subquery approach would work here, but I have a huge table in which 200,000 records satisfy the WHERE condition, and it is growing. So I think that if I follow the subquery approach, it will not scale as the DB grows.
I have also seen LIMIT used in an UPDATE query, but that only caps the number of rows updated; in my case the update should skip a certain offset and then cover all remaining records.
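For example, this form of UPDATE is accepted, but LIMIT in an UPDATE takes only a row count; there is no offset variant (an illustrative statement, not my actual query):
UPDATE tableA
SET price = 100
WHERE price = 200 AND item_id = 12 AND status NOT IN (1,2,3)
ORDER BY ID
LIMIT 1000; -- updates the first 1000 matches; no "skip N, update the rest" form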
It is also possible to find the total count and specify it with LIMIT, but the COUNT() query is failing for me.
You can use a user-defined variable:
SET @X = (SELECT ID FROM tableA
          WHERE price = 200 AND item_id = 12 AND status NOT IN (1,2,3)
          ORDER BY ID LIMIT 1000,1); -- the ID of the 1001st matching row

UPDATE tableA SET price = 100
WHERE price = 200 AND item_id = 12 AND status NOT IN (1,2,3)
AND ID >= @X;
Yes, you will need some way to define what the "first N rows" are; user-defined variables just give you more options for doing it. And if you cannot do it efficiently in a single SELECT, you will need to think about reworking the table somehow: a different indexing approach, splitting the table, caching some values, etc.
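For instance, if this pattern is frequent, a covering index matching the filter columns and ID could make both the variable lookup and the UPDATE above cheaper (a sketch; the index name and column order are assumptions, not part of the original answer):
ALTER TABLE tableA
ADD INDEX idx_price_item_status_id (price, item_id, status, ID);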
I am not sure if this is the right solution, but if you have a unique ID column in your table, let's say ID, then you can add the predicate very easily by saying WHERE ID > 1000, which will consider only rows from the 1001st position onward (assuming the IDs are sequential and start at 1), like:
UPDATE tableA SET price = 100
WHERE price = 200
AND item_id = 12
AND ID > 1000
Related
I'm trying to use "keyset pagination"; no problem there, I run my query and save the last id found for the next one.
My doubt is how to reset the count, to return to 0.
Currently, I run an additional query every time to check whether the id I saved is equal to SELECT MAX(id) FROM users; if it is, I reset the saved id to 0, otherwise I update it to keep the correct position.
Is there a better way?
I was thinking of something like this (it's "pseudo-SQL", just to show my idea):
SELECT 0 OR MAX(id) FROM users_table WHERE (SELECT MAX(id) FROM users_table) =/!= :actual_count
Update
Perhaps it is better to use an example:
Suppose I have 1000 entries in my table and I browse them 100 entries per request to an endpoint.
INSERT INTO util_table (`key`, `value`) VALUES ("last_visited_id", 0)
SELECT * FROM users
WHERE id >= (SELECT `value` FROM util_table WHERE `key` = "last_visited_id")
ORDER BY id ASC
LIMIT 100
After this query, I update the value of the last_visited_id key in the util table.
So that the second time, I can continue counting from where I left off (100 to 200).
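For example, the bookkeeping step might look like this (a minimal sketch; the parameter name is illustrative):
UPDATE util_table
SET `value` = :last_id_from_this_page
WHERE `key` = "last_visited_id";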
Now let's say I redo the query a tenth time so that I end up with rows from 900 to 1000.
From the eleventh time onward, if I just kept saving the id value (1000, 1100, 1200, etc.), the query would return an empty result.
And with this we are back to my question: what is the best method to reset that key to 0?
If you are, say, displaying 10 items per 'page', SELECT 11 each time.
Then observe how many rows the SELECT returned:
<= 10 -- That's the 'last' page.
11 -- There are more page(s). Show 10 on this page; fetch 10 for the next page (that will include re-fetching the 11th).
There is no need to fetch MAX(id).
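Applied to the users table from the question, with 100 per page, the look-ahead fetch might look like this (a sketch under those assumptions):
SELECT * FROM users
WHERE id > :last_visited_id -- 0 on the first request
ORDER BY id ASC
LIMIT 101; -- 100 to show + 1 to peek ahead
If 101 rows come back, display the first 100 and save the 100th row's id for the next request; if 100 or fewer come back, this is the last page and the saved id can be reset to 0.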
More discussion: http://mysql.rjweb.org/doc.php/pagination
I want to remove duplicates based on the combination of listings.product_id and listings.channel_listing_id.
This simple query returns 400,000 rows (the ids of the rows I want to keep):
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
While this variation returns 1,600,000 rows, which is every record in the table, not only those with is_verified = 0:
SELECT *
FROM (
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
) AS keepem
I'd expect them to return the same number of rows.
What's the reason for this? How can I avoid it (in order to use the subselect in the where condition of the DELETE statement)?
EDIT: I found that doing a SELECT DISTINCT in the outer SELECT "fixes" it (it returns 400,000 records as it should). I'm still not sure whether I should trust this subquery, since there is no DISTINCT in the DELETE statement.
EDIT 2: Seems to be just a bug in the way phpMyAdmin reports the total count of the rows.
Your query as it stands is ambiguous. Suppose you have two listings with the same product_id and channel_listing_id. Then which id is supposed to be returned? The first? The second? Or both, ignoring the GROUP BY?
What if there is more than one id with different product and channel ids?
Try removing the ambiguity by selecting MAX(id) AS id and adding DISTINCT.
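For example, keeping the highest id per (product_id, channel_listing_id) pair, the keep-list could drive the DELETE like this (a sketch; the extra derived table is there because MySQL will not otherwise let a DELETE reference its own target table in a subquery):
DELETE FROM `listings`
WHERE is_verified = 0
AND id NOT IN (
    SELECT id FROM (
        SELECT MAX(id) AS id -- one unambiguous id per pair
        FROM `listings`
        WHERE is_verified = 0
        GROUP BY product_id, channel_listing_id
    ) AS keepem
);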
Are there any foreign keys to worry about? If not, you could pour the original table into a copy, empty the original, and copy back only the non-duplicates. Messier, but you only run SELECTs and DELETEs that are guaranteed to succeed, and you also get to keep a backup.
Assign aliases in order to avoid field reference ambiguity:
SELECT
keepem.*
FROM
(
SELECT
innerStat.id
FROM
`listings` AS innerStat
WHERE
innerStat.is_verified = 0
GROUP BY
innerStat.product_id,
innerStat.channel_listing_id
) AS keepem
As the title states, I'm trying to count the number of matches within the last 100 records in a certain table.
This query works, but the data is very dynamic, with lots of inserts on that particular table; similar queries are being run at the same time, and they all end up being extremely slow (20s), probably blocking each other.
Because caching the result is not acceptable (the data has to be live), I'm thinking of moving the outer query into PHP; even though I know that would normally be slower, it would still be faster than 20s.
Here's the query:
SELECT count(*) as matches
FROM (
SELECT first_name FROM customers
WHERE division = 'some_division'
AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0, 100
) as entries
WHERE first_name = 'some_first_name_here'
What I'm looking for is a more optimized way of performing the same task, without having to implement it in PHP, since that's the naive, obviously wrong approach.
The table looks something like this:
id first_name last_name type division request_time
Just to set things straight, this is obviously not the actual table/data, for NDA reasons, but the table looks exactly the same with different column names.
So again, what I'm trying to achieve is to pull a count of matches found WITHIN the last 100 records that satisfy some constraints.
For example: how many times does the name 'John' appear within the last 100 employees added in the HR division?
I see.
How about something like this...
SELECT i
FROM
( SELECT CASE WHEN first_name = 'some_first_name_here' THEN @i := @i + 1 END AS i
    FROM customers
       , (SELECT @i := 0) val -- initialize the counter; must sit in FROM, not after WHERE
   WHERE division = 'some_division'
     AND type = 'employee'
   ORDER BY request_time DESC
   LIMIT 0, 100
) n
ORDER BY i DESC
LIMIT 1;
Try this:
SELECT SUM(matches)
FROM
(
SELECT IF(first_name = 'some_first_name_here', 1, 0) AS matches
FROM customers
WHERE division = 'some_division' AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0,100
) AS entries
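Both variants still make MySQL find the newest 100 matching rows first, so if these queries run constantly, an index matching the filter and the ordering would let it pick those rows without a filesort (a suggestion beyond the original answers; the index name is an assumption):
ALTER TABLE customers
ADD INDEX idx_division_type_reqtime (division, type, request_time);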
I have a column called is_thumbnail with a default value of 'no'. I select a row based on is_thumbnail = 'yes':
$query = $conn->prepare("SELECT * FROM project_data
WHERE project_id = :projectId
AND is_thumbnail = 'yes'
LIMIT 1");
There is a chance that no rows will have the value 'yes'. In that case, I want to select the first row with the same projectId, regardless of the value of is_thumbnail.
Now I know I could check what the query returns and then run another query. I was wondering whether it is possible to do this in a single query, or whether there is some way I can take advantage of PDO? I just started using PDO. Thanks!
Example data:
id project_id image is_thumbnail
20 2 50f5c7b5b8566_20120803_185833.jpg no
19 2 50f5c7b2767d1_4link 048.jpg no
18 2 50f5c7af2fb22_4link 047.jpg no
$query = $conn->prepare("SELECT * FROM project_data
WHERE project_id = :projectId
ORDER BY is_thumbnail DESC LIMIT 1");
Since 'yes' sorts after 'no', ordering by is_thumbnail DESC puts any thumbnail row first; if the project has no 'yes' row, you still get one of its other rows.
Given that the schema described in the question shows multiple rows for a given project_id, solutions using only ORDER BY is_thumbnail ... may not perform well if, for instance, a single project has many related rows: sorting those rows can be fairly costly, and it won't be able to use an index. An alternative solution, which may be necessary in that case, is:
SELECT * FROM (
    (SELECT *
     FROM project_data
     WHERE project_id = :projectId AND is_thumbnail = 'yes'
     ORDER BY id DESC
     LIMIT 1)
    UNION
    (SELECT *
     FROM project_data
     WHERE project_id = :projectId AND is_thumbnail = 'no'
     ORDER BY id DESC
     LIMIT 1)
) AS t
ORDER BY t.is_thumbnail = 'yes' DESC
LIMIT 1
While this solution is a bit more complex to understand, it can use a compound index on (project_id, is_thumbnail, id) to quickly find exactly one row for each branch of the UNION (each branch needs its own parentheses so that its ORDER BY and LIMIT apply to that branch alone). The outer SELECT ensures a stable preference for the 'yes' row when both are found.
Note that you could also just issue two queries and probably get similar or better performance. To execute the above UNION and sub-select, MySQL will need temporary tables, which aren't great in busy environments.
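For reference, the compound index mentioned above could be created like this (the index name is an assumption):
ALTER TABLE project_data
ADD INDEX idx_project_thumb_id (project_id, is_thumbnail, id);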
Which of the following notations is better?
SELECT id,name,data FROM table WHERE id = X
OR
SELECT id,name,data FROM table WHERE id = X LIMIT 1
I think it should not have "LIMIT".
Thanks!
If there is a unique constraint on id, then they will be exactly the same.
If there isn't a unique constraint (which I would find highly surprising on a column called id) then which is better depends on what you want to do:
If you want to find all rows matching the condition, don't use LIMIT.
If you want to find any row matching the condition (and you don't care which), use LIMIT 1.
Always use LIMIT with a SELECT statement when you only need one record, because it can speed up your query. So use:
SELECT id,name,data FROM table WHERE id = X LIMIT 1
For example:
If there are 1000 records in your table and id is not unique, then with
SELECT id,name,data FROM table WHERE id = X
the server will keep scanning through the records even after it finds that id.
But if you use LIMIT like this
SELECT id,name,data FROM table WHERE id = X LIMIT 1
then it will stop executing as soon as it finds the first matching record.