My data schema is really simple; let's say it's about farms.
tableA is the main table, with an important field "is_active" indicating whether the farm is trusted (more or less).
tableB is a data store of serialized arrays of farm statistics.
I want to retrieve all data about active farms, so I just do something like this:
SELECT * FROM tableA LEFT JOIN tableB ON id_tableA = id_tableB WHERE is_active = 1 ORDER BY id_tableA DESC;
Right now the query takes 15 seconds to execute straight from a SQL shell. For comparison, if I want to retrieve all data from tableB, like:
SELECT * FROM tableB ORDER BY id_tableB DESC;
it takes less than 1 second (approx. 1200 rows)...
Any ideas how to improve the original query?
Thanks
Create indexes on the keys joining the two tables.
Check this link on how to create indexes in MySQL:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
You'll have to create an index.
You could create the following indexes:
mysql> create index ix_a_active_id on tableA (is_active, id_tableA);
mysql> create index ix_b_id on tableB (id_tableB);
The first creates a composite index on BOTH is_active and the id; is_active comes first so MySQL can filter down to the active rows before joining and sorting.
The second creates an index on the id for tableB.
I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
Data in it looks like this:
I need to delete all "deletable" record pairs. An example is given in red.
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (it can have only two values, 0 and 1, so one record must have 0 and the other must have 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or should I forget about a delete query, make a new table with all the non-deletables in it, and then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the rows to keep are:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
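Putting the pieces together, the full offline rebuild might be sketched like this (an assumption-laden sketch: it relies on the NOT EXISTS logic above being the correct keep condition, and temp_registration is just a scratch name):

```sql
-- Index to speed up the correlated subquery (and the original self-join)
CREATE INDEX ix_reg_pair ON registration (txid, vinvoutnr, vin);

-- Materialize the rows to keep
CREATE TABLE temp_registration AS
SELECT r.*
FROM registration r
WHERE NOT EXISTS (SELECT 1
                  FROM registration r2
                  WHERE r2.txid = r.txid
                    AND r2.vinvoutnr = r.vinvoutnr
                    AND r2.vin <> r.vin);

-- Swap the surviving rows back in
TRUNCATE TABLE registration;
INSERT INTO registration SELECT * FROM temp_registration;
DROP TABLE temp_registration;
```

Since the database can be offline, TRUNCATE plus a bulk INSERT avoids the per-row delete overhead that made the 5-hour version so slow.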
Given that you expect to remove the majority of your data, it does sound like the simplest approach would be to create a new table with the correct data and then drop the original, as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.
I am trying to inner join multiple tables (table_A, table_B and table_C) with a table_X. table_X is selected from another table (table_Y) using LIKE, and takes a very long time to create. How do I do this task efficiently?
Currently I run the following query for table_A, and repeat the process for table_B and table_C.
SELECT * FROM
Table_A INNER JOIN
(SELECT ID FROM table_Y where ID LIKE "%keyword%") as table_X
USING (ID)
Since table_X takes a lot of time to create, I would like to select from table_A, table_B and table_C in one query. How do I do it?
Several things to note:
My expected result is three separate tables, not one combined table.
I do not have permission to create a temporary table in the database.
A query returns a result set, not a table, and a query can only return a single result set.
You will need three separate queries to get your desired results.
If your core goal is to reduce the cost of your table_X subquery, you could create a temporary table to store the results of that subquery, and then join to it in your queries against table_A, table_B, and table_C.
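Assuming you did have the CREATE TEMPORARY TABLES privilege, that approach might look roughly like this (table and column names taken from the question; the keyword filter stands in for your expensive condition):

```sql
CREATE TEMPORARY TABLE table_X AS
SELECT ID FROM table_Y WHERE ID LIKE '%keyword%';

-- Each query references the temporary table only once,
-- so the single-reference limitation noted below does not bite here.
SELECT * FROM table_A INNER JOIN table_X USING (ID);
SELECT * FROM table_B INNER JOIN table_X USING (ID);
SELECT * FROM table_C INNER JOIN table_X USING (ID);

DROP TEMPORARY TABLE table_X;
```

This still produces three separate result sets, which matches the stated requirement, while paying the cost of the expensive subquery only once.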
Edit, things to keep in mind:
True TEMPORARY tables are only visible to the connection in which they are created, and will be automatically dropped when the connection is closed; but they will still persist if a connection is reused (from a connection pool, for example), so it is still good practice to drop them explicitly. True temporary tables also have limits on how they can be used, most notably that they can only be referenced once in any given query (no self joins, joins to multiple references, or unions with multiple parts referencing the same table).
Assuming you have the proper permissions, you can instead create normal tables that you intend to drop when finished; but care must be taken, because such tables are generally visible to all connections, and a disconnect will not "clean up" such tables. They can perform better, and do not have the limitations of true temporary tables, but you need to weigh the risks against the benefits.
If you do not have any create table permissions, most of your data processing is happening client side, and you do not expect enormous results from the costly subquery, you could collect the subquery results first and use them in dynamic construction of the later queries.
very pseudo code:
query: SELECT ID FROM table_Y WHERE [expensive condition(s)];
code: convert ID values received into a comma separated list
query: SELECT [stuff] FROM Table_A WHERE ID IN ([ID values from expensive query]);
query: SELECT [other_stuff] FROM Table_B WHERE ID IN ([ID values from expensive query]);
query: SELECT [more_stuff] FROM Table_C WHERE ID IN ([ID values from expensive query]);
I have a huge MySQL table which contains more than 33 million records. How could I scan my table to find the non-duplicate records? Unfortunately a plain SELECT statement doesn't work, because it's such a huge table.
Please provide me a solution.
First, create a snapshot of your database or the tables you want to compare.
Optionally, you can also limit the range of data you want to compare, for example only 3 years of data. This way your select query won't hog all the resources.
The snapshot will be a bunch of files, each representing a table and containing your primary key or business key for each record (I am assuming you can compare data based on the aforementioned key; if that's not the case, record all the fields in your file).
Next, read each record from the file and do a select against the corresponding table. If there is more than one matching record, you know it is a duplicate.
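If a set-based query is feasible at all, the same check can be sketched with GROUP BY instead of per-record lookups. The column and table names here are hypothetical placeholders, since the question does not give a schema; key_col1 and key_col2 stand in for whatever makes up the business key:

```sql
-- Rows whose business key appears exactly once (the non-duplicates)
SELECT key_col1, key_col2
FROM huge_table
GROUP BY key_col1, key_col2
HAVING COUNT(*) = 1;
```

With a composite index on the key columns this can run as an index scan, which may be viable even at 33M rows; changing the HAVING to COUNT(*) > 1 lists the duplicates instead.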
Thanks
Look at the explain plan and see what the DB is actually doing for the NOT IN.
You could try refactoring, with an index on subscriber as Roy suggested if necessary. I'm not familiar enough with MySQL to know whether the optimizer will execute these identically.
SELECT *
FROM contracts
WHERE NOT EXISTS
( SELECT 1
FROM edms
WHERE edms.subscriber=contracts.subscriber
);
-- or
SELECT C.*
FROM contracts AS C
LEFT
JOIN edms AS E
ON E.subscriber = C.subscriber
WHERE E.subscriber IS NULL;
My MySql schema looks like the following
create table TBL1 (id, person_id, ....otherData)
create table TBL2 (id, tbl1_id, month,year, ...otherData)
I am querying this schema as
select * from TBL1 join TBL2 on (TBL2.tbl1_id=TBL1.id)
where TBL1.person_id = ?
and TBL2.month=?
and TBL2.year=?
The current problem is that there are about 18K records in TBL1 associated with a given person_id, and also about 20K records in TBL2 associated with the same month/year values.
For now I have two indexes:
index1 on TBL1(person_id) and index2 on TBL2(month, year)
When the database runs the query it uses either index1 (ignoring the month and year params) or index2 (ignoring the person_id param). So in both cases it scans about 20K records and doesn't perform as expected.
Is there any way to create a single index on both tables, or to tell MySQL to merge the indexes when querying?
No, an index can belong to only one table. You will need to look at the EXPLAIN for this query to see if you can determine where the performance issue is coming from.
Do you have indexes on TBL2.tbl1_id and TBL1.id?
No. Indexes are on single tables.
You need compound indexes on both tables that include the join column. If you add "id" to both indexes, the query optimizer should pick that up.
Can you post an "EXPLAIN"?
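A sketch of what those compound indexes might look like, using the schema from the question (the index names are made up):

```sql
-- Filter column(s) first, then the join column,
-- so each index covers both the WHERE clause and the join.
CREATE INDEX ix_tbl1_person ON TBL1 (person_id, id);
CREATE INDEX ix_tbl2_period ON TBL2 (month, year, tbl1_id);
```

The idea is that MySQL can then resolve TBL1.person_id = ? and TBL2.month/year = ? from the index leading columns, and perform the join on the trailing id columns without touching rows that fail the filters.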
I have 2 InnoDB tables. In TableA I have a column (guidNew) whose values I want to assign to a column in TableB (owner), depending on the relation between the column in TableA (guid) and TableB (owner).
Basically TableB (owner) has multiple entries that correspond to one TableA (guid); this is a many-to-one relation. I want to change the TableB (owner) values to the new TableA (guidNew) values.
This is an example of the query:
UPDATE `TableB`, `TableA`
SET
`TableB`.`owner` = `TableA`.`guidNew`
WHERE `TableB`.`guid` != 0
AND `TableB`.`owner` = `TableA`.`guid`;
Now I do not know whether this is working or not, because there are more than 2 million entries. Is there a way to track its progress AND, more importantly, a way to do it faster?
Make sure that you have indexed the guid and owner columns.
Try using the EXPLAIN command to see how the query is being executed:
EXPLAIN SELECT TableB.owner, TableA.guidNew
FROM TableB, TableA
WHERE TableB.guid != 0
AND TableB.owner = TableA.guid
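The index advice above might be sketched like this, based only on the columns the UPDATE actually touches (the index names are made up):

```sql
-- Lets the join TableB.owner = TableA.guid use index lookups
CREATE INDEX ix_tablea_guid ON TableA (guid);
-- Covers both the owner join and the guid != 0 filter on TableB
CREATE INDEX ix_tableb_owner ON TableB (owner, guid);
```

Without these, the UPDATE has to scan one table in full for every candidate row of the other, which is where multi-hour runtimes on 2M+ rows typically come from.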