I have an SQL query that gets values from 4 tables, and it takes a lot of time. I need to simplify the query.
What I need is to display only 50 records, but my table has 90,000 records, so I decided to apply batch processing:
first select 50 records from the first table and then check them against the 3 other tables.
If those 50 satisfy the conditions I will display them; otherwise I have to continue with the next 50.
But I don't have an idea how to implement this.
select file_name,
A.id,
A.reference,
d.username,
c.update_date
from A_Table A,
(select reference
from B_Table
where code = 'xxx'
group by reference
having count(*) > 1) B,
C_Table c,
D_Table d
where A.reference = B.reference
and A.id = c.id
and A.code = 'ICG'
and c.updated_by = d.user_id
order by 3
limit 20;
The query looks fine.
Adding some indexes will help a lot.
Assuming the id columns (A_Table.id and C_Table.id) are already PRIMARY KEY columns, you won't need to index them.
ALTER TABLE A_Table
ADD INDEX (reference),
ADD INDEX (code);
ALTER TABLE B_Table
ADD INDEX (reference),
ADD INDEX (code, reference);
ALTER TABLE C_Table
ADD INDEX (updated_by);
ALTER TABLE D_Table
ADD INDEX (user_id);
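To check that the new indexes are actually picked up, the same query can be run through EXPLAIN (this verification step is an addition, not part of the original answer) and the key column inspected for each table:
EXPLAIN
select file_name, A.id, A.reference, d.username, c.update_date
from A_Table A,
     (select reference
      from B_Table
      where code = 'xxx'
      group by reference
      having count(*) > 1) B,
     C_Table c,
     D_Table d
where A.reference = B.reference
  and A.id = c.id
  and A.code = 'ICG'
  and c.updated_by = d.user_id
order by 3
limit 20;
Any row that still shows type = ALL with key = NULL is a remaining full table scan.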
Related
I am trying to delete records from a table with duplicate column values, but it's taking forever. Basically it gets stuck with no response for hours. I have a fairly large table with over 1.3M records. Is the query inefficient? Is there any way to optimize it?
delete n1 from ids n1, ids n2 where n1.id > n2.id and n1.user_id = n2.user_id
The database is remote, and I am using PuTTY to run the queries.
Add an index:
ALTER TABLE ids ADD INDEX (user_id, id);
This makes it efficient to find all the rows with the same user ID and higher IDs.
It will also help to rewrite the delete as a join against a subquery:
DELETE n1
FROM ids AS n1
JOIN (SELECT user_id, MIN(id) AS minid
FROM ids
GROUP BY user_id) AS n2
ON n1.user_id = n2.user_id AND n1.id > n2.minid
This will still be faster with the above index.
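Before running the DELETE on a 1.3M-row table it can be worth previewing how many rows it would remove; this count query is an addition here, not part of the original answer:
-- dry run: count the rows the DELETE above would remove
SELECT COUNT(*)
FROM ids AS n1
JOIN (SELECT user_id, MIN(id) AS minid
      FROM ids
      GROUP BY user_id) AS n2
  ON n1.user_id = n2.user_id AND n1.id > n2.minid;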
Yes, that query is very inefficient. Even if you used explicit joins, keep in mind that every row "N" is matched against every row before "N", every row "N-1" is matched against the rows before it, and so on.
Try something like this:
DROP TEMPORARY TABLE IF EXISTS keeps;
CREATE TEMPORARY TABLE keeps (
user_id INT,
keepID INT,
INDEX (user_id, keepID)
);
INSERT INTO keeps (user_id, keepID)
SELECT user_id, MIN(id) As keepID
FROM ids
GROUP BY user_id;
DELETE FROM ids WHERE (user_id, id) NOT IN (SELECT user_id, keepID FROM keeps);
DROP TEMPORARY TABLE IF EXISTS keeps;
I'm also tempted to suggest something like the query below, but I can't remember whether MySQL allows the table being deleted from to appear in a subquery of the same DELETE (it usually rejects this with error 1093), which is why I suggested the temp table approach first.
DELETE a
FROM ids AS a
WHERE EXISTS (
SELECT *
FROM ids AS b
WHERE b.id < a.id
AND b.user_id = a.user_id
)
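If MySQL does refuse that form with "You can't specify target table ... for update in FROM clause", the usual workaround (shown here as a sketch, not part of the original answer) is to wrap the subquery in one more derived table so it gets materialized first:
-- keep_ids is just an alias for the materialized list of ids to keep
DELETE FROM ids
WHERE id NOT IN (
    SELECT minid
    FROM (SELECT MIN(id) AS minid
          FROM ids
          GROUP BY user_id) AS keep_ids
);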
So I've got a massive, slow SQL query and I've narrowed it down to a slow sub-query, so I want to rewrite it as a JOIN. But I'm stuck (due to the MAX and GROUP BY).
SELECT *
FROM local.advice AS aa
LEFT JOIN webdb.account AS oa ON oa.shortname = aa.shortname
WHERE aa.aa_id = ANY (SELECT MAX(dup.aa_id)
FROM local.advice AS dup
GROUP BY dup.shortname)
AND oa.cat LIKE '111'
ORDER BY aa.ram, aa.cpu DESC
LIMIT 0, 30
Here is a different version of your query where the subquery is converted into a join clause:
select * from local.advice aa
JOIN webdb.account oa ON oa.shortname = aa.shortname
JOIN (
    select max(aa_id) as aa_id, shortname
    from local.advice
    group by shortname
) x ON x.aa_id = aa.aa_id
where oa.cat = '111'
order by aa.ram, aa.cpu DESC
limit 0, 30
Also, you may need to add these indexes if they are not present already:
alter table local.advice add index shortname_idx(shortname);
alter table webdb.account add index cat_shortname_idx(cat,shortname);
alter table local.advice add index ram_idx(ram);
alter table local.advice add index cpu_idx(cpu);
I am assuming aa_id is a primary key, so I did not add an index for it.
Make sure to take a backup of the tables before adding the indexes.
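To see which indexes already exist before creating overlapping ones (a small addition to the answer above):
-- list the current indexes on both tables
SHOW INDEX FROM local.advice;
SHOW INDEX FROM webdb.account;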
I am running a delete which removes all of the duplicates within a table. A duplicate is defined as a row where tag_id, user_id, and is_self are all the same. My technique here is pretty standard: since the tags_users table itself needs to be referenced to know whether a duplicate exists, a temp table is created so that the delete can be performed against the same table that is being referenced. The problem is that this table has about a million rows, so the query takes about an hour to run. I know this is related to the slow speed of building this temp table and then referencing it, as it is un-indexed.
DELETE FROM tags_users WHERE id IN (
SELECT id FROM (
SELECT A.id
FROM tags_users AS A, tags_users AS B
WHERE A.id > B.id
  AND A.user_id = B.user_id
  AND A.tag_id = B.tag_id
  AND A.is_self = B.is_self
GROUP BY A.id
) temp_dup_delete
);
I have reviewed the EXPLAIN output for this query, listed here (please note I'm on MySQL 5.5, so I'm using EXPLAIN SELECT 1 to simulate EXPLAIN DELETE). I think the best possible solution is to define an index on the temp table, but I cannot figure out how to do this yet. The crux of my question is: is there a way to improve the speed of this query, considering the way it defines a temp table? Thank you to anyone who can help.
Here is an alternative approach. Use an aggregation query to find the minimum id for each set of key values -- this seems to be the row you want to keep.
Then, use left outer join to match to this table and delete all the rows in the original data that do not match.
delete tu
from tags_users tu left outer join
(select tag_id, user_id, is_self, min(id) as minid
from tags_users
group by tag_id, user_id, is_self
) tui
on tui.minid = tu.id
where tui.minid is null;
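On a million-row table the GROUP BY in that subquery is itself a full scan. A composite index over the grouping columns plus id lets it be resolved from the index alone; the index name below is illustrative, not from the original answer:
-- covers the GROUP BY (tag_id, user_id, is_self) and the MIN(id) lookup
ALTER TABLE tags_users ADD INDEX tag_user_self_id_idx (tag_id, user_id, is_self, id);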
I have a table called scheduler. It contains the following columns:
ID
sequence_id
schedule_time (timestamp)
processed
source_order
I need to delete the duplicate rows from the table, keeping only 1 row that has the same schedule_time and source_order for a particular sequence_id, where processed = 0.
DELETE yourTable FROM yourTable LEFT OUTER JOIN (
SELECT MIN(ID) AS minID FROM yourTable WHERE processed = 0 GROUP BY schedule_time, source_order
) AS keepRowTable ON yourTable.ID = keepRowTable.minID
WHERE keepRowTable.ID IS NULL AND processed = 0
I adapted this from this post ;P How can I remove duplicate rows? Have you seen it?
Fixed version (the query above referenced keepRowTable.ID, which does not exist in the subquery; it should be keepRowTable.minID):
DELETE yourTable FROM yourTable LEFT OUTER JOIN (
SELECT MIN(ID) AS minID FROM yourTable WHERE processed = 0 GROUP BY schedule_time, source_order
) AS keepRowTable ON yourTable.ID = keepRowTable.minID
WHERE keepRowTable.minID IS NULL AND processed = 0
For MySQL:
DELETE a FROM tbl a, tbl b
WHERE a.Id > b.Id
  and a.sequence_id = b.sequence_id
  and a.processed = 0;
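The query above only compares sequence_id. A variant matching the full duplicate definition from the question (same schedule_time and source_order within a sequence_id, only for processed = 0) might look like this sketch, using the question's scheduler table name:
-- keep the lowest ID per (sequence_id, schedule_time, source_order) group among unprocessed rows
DELETE a
FROM scheduler a, scheduler b
WHERE a.ID > b.ID
  AND a.sequence_id = b.sequence_id
  AND a.schedule_time = b.schedule_time
  AND a.source_order = b.source_order
  AND a.processed = 0
  AND b.processed = 0;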
The fastest way to remove duplicates is definitely to force them out by adding a unique key, leaving only one copy of each in the table:
ALTER IGNORE TABLE scheduler ADD UNIQUE KEY dedup_key (
    sequence_id,
    schedule_time,
    processed,
    source_order
);
Now if you already have such a key you might need to drop it first, and so on, but the point is that when you add a unique key with IGNORE to a table that contains duplicates, the behavior is to delete all the extra records / duplicates. So after you have added this key, you just need to drop it again to be able to make new duplicates :-)
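Dropping the key again afterwards, as described above, is a one-liner (dedup_key being the name given to the key in the ALTER statement):
-- remove the unique key once the duplicates are gone, if duplicates must be allowed again
ALTER TABLE scheduler DROP INDEX dedup_key;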
Now if you need more complex filtering (deciding which one of the duplicates to keep, something you cannot express through an index, although that is unlikely), you can create a table and fill it with exactly what you want to keep, all in the same query:
CREATE TABLE tmp SELECT ..fields.. FROM original_table GROUP BY ..what you need.. ;
DROP TABLE original_table;
ALTER TABLE tmp RENAME TO original_table;
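For the scheduler table in this question, a concrete sketch of that rebuild (assuming duplicates are defined by sequence_id, schedule_time and source_order; note that CREATE TABLE ... SELECT does not copy indexes or the primary key, so they must be re-added afterwards):
-- keep the lowest ID for each duplicate group
CREATE TABLE tmp
SELECT MIN(ID) AS ID, sequence_id, schedule_time, processed, source_order
FROM scheduler
GROUP BY sequence_id, schedule_time, processed, source_order;
DROP TABLE scheduler;
ALTER TABLE tmp RENAME TO scheduler;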
I have this query below:
SELECT a.id, b.item_name
FROM table_1 as a
INNER JOIN table_2 as b on a.item_id = b.item_id
There is an index on a.bid as the primary key, an index on a.item_id, an index on b.item_id as the primary key, and an index on b.item_name.
However, when I run the query through EXPLAIN, the driving table becomes table_1 and no index is used on it, so it's doing a full scan. Why wouldn't it use the index for b.item_id?
It needs to do a full scan on at least one table, as you are requesting all records. It sounds like it picked table_1. The index should be used on table_2.
If you additionally add a where clause, you can avoid that table scan. But if you actually need all rows, then a scan is the quickest way to get them.
It is doing a table scan because you ask for all rows that match. Add a WHERE a.id = <any id> and it should use the index.
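For instance (a sketch, with 123 standing in for a real id), an equality predicate lets the join be driven by the primary key instead of a full scan:
EXPLAIN
SELECT a.id, b.item_name
FROM table_1 AS a
INNER JOIN table_2 AS b ON a.item_id = b.item_id
WHERE a.id = 123;  -- 123 is a placeholder id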