Faster SQL query on two table matching - mysql

I have two tables that I am trying to perform matching on with the following query:
select * from task1,task2
where task1.From_Number=task2.To_Number
and task1.Start_Time<task2.Start_Time;
It works eventually but takes forever. The tables have 33 columns; one has around 45k rows and the other 500k. There are duplicates in various columns and no column is unique, so there is no primary key. The tables were imported from spreadsheets.
They are phone call logs and, as mentioned, there are several duplicates in each column. What can I do to make this query run faster? I am only matching on a few columns, but I need to output all columns to a CSV.

The best thing you can do is create a non-unique index on the matched columns in each table.
Read the MySQL documentation on creating an index.
Something like:
create index task1_idx
  on task1 (From_Number, Start_Time);
And:
create index task2_idx
  on task2 (To_Number, Start_Time);

Related

MYSQL: how to speed up an sql Query for getting data

I am using a MySQL database.
I have a table daily_price_history of stock values stored with the following fields. It has more than 11 million rows:
id
symbolName
symbolId
volume
high
low
open
datetime
close
So for each stock symbolName there are various daily stock values, and the table now has more than 11 million rows.
The following SQL tries to get the last 100 days of daily data for a set of 1500 symbols:
SELECT `daily_price_history`.`id`,
       `daily_price_history`.`symbolId_id`,
       `daily_price_history`.`volume`,
       `daily_price_history`.`close`
FROM `daily_price_history`
WHERE (`daily_price_history`.`id` IN
          (SELECT U0.`id`
           FROM `daily_price_history` U0
           WHERE (U0.`symbolName` = `daily_price_history`.`symbolName`
                  AND U0.`datetime` >= 1598471533546))
       AND `daily_price_history`.`symbolName` IN (A, AA, ...... 1500 symbols Names));
I have the table indexed on symbolName and also on datetime.
Getting 130K rows (i.e. 1500 × 100 ≈ 150,000) takes 20 seconds.
I also have weekly_price_history and monthly_price_history tables. When I run a similar query against them, it takes less time for the same number (130K) of rows, because those tables hold less data than the daily one:
weekly_price_history: getting 150K rows takes 3s; it has 2.5 million rows in total.
monthly_price_history: getting 150K rows takes 1s; it has 800K rows in total.
So how can I speed this up when the table is large?
As a starter: I don't see the point of the subquery at all. Presumably, your query could filter directly in the where clause:
select id, symbolid_id, volume, close
from daily_price_history
where datetime >= 1598471533546 and symbolname in ('A', 'AA', ...)
Then, you want an index on (datetime, symbolname):
create index idx_daily_price_history
on daily_price_history(datetime, symbolname)
;
The first column of the index matches the predicate on datetime. It is not very likely, however, that the database will be able to use the index to filter symbolname against a large list of values.
An alternative would be to put the list of values in a table, say symbolnames.
create table symbolnames (
symbolname varchar(50) primary key
);
insert into symbolnames values ('A'), ('AA'), ...;
Then you can do:
select p.id, p.symbolid_id, p.volume, p.close
from daily_price_history p
inner join symbolnames s on s.symbolname = p.symbolname
where p.datetime >= 1598471533546
That should allow the database to use the above index. We can go one step further and add the four columns of the select clause to the index, making it a covering index:
create index idx_daily_price_history_2
on daily_price_history(datetime, symbolname, id, symbolid_id, volume, close)
;
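A quick way to verify whether the new index is actually picked up is EXPLAIN, run here against the simplified query (the symbol list is truncated for brevity):

```sql
explain
select id, symbolid_id, volume, close
from daily_price_history
where datetime >= 1598471533546
  and symbolname in ('A', 'AA');
```

If the key column of the EXPLAIN output names idx_daily_price_history_2 and the Extra column shows "Using index", the query is being served entirely from the covering index without touching the table rows.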
When you add INDEX(a,b), remove INDEX(a) as being no longer necessary.
Your dataset and query may be a case for using PARTITIONing.
PRIMARY KEY(symbolname, datetime)
PARTITION BY RANGE(datetime) ...
This will do "partition pruning": datetime >= 1598471533546. Then the PRIMARY KEY will do most of the rest of the work for symbolname in ('A', 'AA', ...).
Aim for about 50 partitions; the exact number does not matter. Too many partitions may hurt performance; too few won't provide effective pruning.
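A partitioning sketch along those lines might look like the following. The boundary values are placeholders, and MySQL requires the partitioning column to appear in every unique key, which the PRIMARY KEY above satisfies:

```sql
-- Range-partition on the epoch-millisecond datetime column
-- so that datetime >= ... prunes old partitions entirely.
ALTER TABLE daily_price_history
PARTITION BY RANGE (datetime) (
    PARTITION p2020h1 VALUES LESS THAN (1593561600000),  -- placeholder boundary
    PARTITION p2020h2 VALUES LESS THAN (1609459200000),  -- placeholder boundary
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```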
Yes, get rid of the subquery as GMB suggests.
Meanwhile, it sounds like Django is getting in the way.
Some discussion of partitioning: http://mysql.rjweb.org/doc.php/partitionmaint

how to compare huge table of mysql

I have a huge MySQL table containing more than 33 million records. How can I compare the rows in this table to find the non-duplicate records? Unfortunately, a plain select statement doesn't work, because the table is so large.
Please provide me with a solution.
First, create a snapshot of your database or of the tables you want to compare.
Optionally, you can also limit the range of data to compare, for example only 3 years of data. This way your select query won't hog all the resources.
The snapshot will be a bunch of files, each representing a table and containing the primary key or business key for each record (I am assuming you can compare data based on such a key; if that's not the case, record all the fields in your file).
Next, read each record from the file and run a select against the corresponding table. If more than one record comes back, you know it is a duplicate.
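If the rows really can be compared on a single key, a grouping query may be simpler than a file-based snapshot and lets the database do the counting. A sketch; the table and column names here are assumptions, not from the question:

```sql
-- Keys that occur exactly once, i.e. the non-duplicate records.
SELECT business_key        -- hypothetical key column
FROM huge_table            -- hypothetical table name
GROUP BY business_key
HAVING COUNT(*) = 1;
```

An index on the key column lets the grouping read the index rather than the full rows.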
Thanks
Look at the explain plan and see what the DB is actually doing for the NOT IN.
You could try refactoring, with an index on subscriber as Roy suggested, if necessary. I'm not familiar enough with MySQL to know whether the optimizer will execute these identically:
SELECT *
FROM contracts
WHERE NOT EXISTS
( SELECT 1
FROM edms
WHERE edms.subscriber=contracts.subscriber
);
-- or
SELECT C.*
FROM contracts AS C
LEFT JOIN edms AS E
  ON E.subscriber = C.subscriber
WHERE E.subscriber IS NULL;
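For either form, the index on subscriber that Roy suggested would let the anti-join probe edms by index rather than scanning it. The index name is an assumption:

```sql
-- Lets NOT EXISTS / the LEFT JOIN look up subscribers by index.
CREATE INDEX idx_edms_subscriber ON edms (subscriber);
```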

how to create table with unique rows out of two tables

I have two tables with the same columns, but there are duplicates. I want to create a third table without the duplicate rows. What is the best way to do this, keeping in mind that the tables have over a million records?
The tables have two columns, ean and price.
If both tables have the same structure, then I think you should try union: unlike union all, union removes duplicate rows from the combined result.
Sample query, assuming the two source tables are table_src_1 and table_src_2, and the final table is table_unique_records:
create table table_unique_records as
(select * from table_src_1
 union
 select * from table_src_2);

SQL query to retrieve primary key column from a table which is stored in multiple databases

I am working on a database-related project. I want to find the highest value in the primary key column of the same table (say tbrmenuitem) stored in multiple databases.
So, is it possible with one query, or do I have to fire different queries at different times to connect to the multiple databases? (That is, a first query to get the table name in each database, a second to find that table's primary key, and then MAX() on the primary key column?)
You can query tables in other databases on a server similar to how you would any other tables. You just need to qualify the table name with the name of the schema (database).
SELECT MAX(max) FROM (
  SELECT MAX(id_column) AS max
  FROM test2.test2table
  UNION ALL
  SELECT MAX(id_column) AS max
  FROM test.test1table
) AS t;
What this does is select the MAX() of a column from the table test2table in the test2 database.
SELECT MAX(id_column) AS max
FROM test2.test2table
It then appends, with UNION ALL, the result of a similar query performed on the test1table table in the test database.
UNION ALL
SELECT MAX(id_column) AS max
FROM test.test1table
This is then wrapped in a subquery, and the outer query pulls the maximum of the values returned by the UNION ALL.

Is there a way to index information on different tables in MySQL

My MySql schema looks like the following
create table TBL1 (id, person_id, ....otherData)
create table TBL2 (id, tbl1_id, month,year, ...otherData)
I am querying this schema as
select * from TBL1 join TBL2 on (TBL2.tbl1_id=TBL1.id)
where TBL1.person_id = ?
and TBL2.month=?
and TBL2.year=?
The current problem is that there are about 18K records in TBL1 associated with a given person_id, and also about 20K records in TBL2 associated with the same month/year values.
For now I have two indexes:
index1 on TBL1(person_id) and index2 on TBL2(month, year)
When the database runs the query it uses either index1 (ignoring the month and year parameters) or index2 (ignoring the person_id parameter). So, in both cases it scans about 20K records and doesn't perform as expected.
Is there any way to create a single index spanning both tables, or to tell MySQL to merge the indexes when querying?
No, an index can belong to only one table. You will need to look at the EXPLAIN for this query to see if you can determine where the performance issue is coming from.
Do you have indexes on TBL2.tbl1_id and TBL1.id?
No. Indexes are on single tables.
You need compound indexes on both tables that include the join column. If you add "ID" to both indexes, the query optimizer should pick that up.
Can you post an "EXPLAIN"?
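The compound indexes described above might look like this (the index names are assumptions):

```sql
-- Each index covers the filter column(s) plus the join column,
-- so each side of the join can be resolved from its own index.
create index idx_tbl1_person on TBL1 (person_id, id);
create index idx_tbl2_date   on TBL2 (month, year, tbl1_id);
```

With these in place, MySQL can filter TBL1 by person_id and TBL2 by month/year, and still perform the join on id = tbl1_id without touching the base rows on either side.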