A MySQL database has only one table, products. Each record contains one integer auto-increment primary key, one integer modified_time field and ten varchar fields.
Twice a day, a CRON job launches a process that receives 700,000 XML records with new/updated/old products from another server. Usually about 100,000 products a day are updated or new (the other 600,000 don't change).
The question is which way will be faster (both options are sketched in SQL below):
1) Do something like DROP TABLE and recreate the same table, or DELETE FROM products, then INSERT everything we receive (700k records).
2) Loop over every XML record, compare the modified_time field and UPDATE if the XML modified_time is newer than the modified_time in the database.
As I understand it, the second way will lead to 700k SELECT queries, 100k UPDATE queries and some DELETE queries.
So which way will be faster? Thanks in advance.
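For concreteness, here is roughly what the two options look like in SQL. This is only a sketch: name and sku stand in for the ten varchar fields, and the id/timestamp values are made up.
-- Option 1: full reload
TRUNCATE TABLE products;
INSERT INTO products (id, modified_time, name, sku)
VALUES (1, 1700000000, 'Widget A', 'WA-1'),
       (2, 1700000000, 'Widget B', 'WB-1');   -- batched multi-row inserts built from the XML feed

-- Option 2: issued once per XML record by the importing script
SELECT modified_time FROM products WHERE id = 1;
UPDATE products
SET name = 'Widget A', sku = 'WA-2', modified_time = 1700000100
WHERE id = 1 AND modified_time < 1700000100;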
Related
I'm working on automating the process of building a database. This database needs daily updates after the initial build.
The database has 51 tables, divided into 3 schemas (17 tables in each schema), and holds a total of 20 million records, each keyed by a PK called manage_number.
I need to update 2,000-3,000 records every day, but I don't know which method to use.
Make a table for PK indexing
This method creates a separate table in the same database that stores, for each PK, the name of the table it lives in, i.e. a metadata table recording which table each manage_number is stored in. This is the approach currently applied (sketched further below). The problem is that the build now takes 5-6 times longer than before (it went from 2 minutes to 12 minutes).
Multi-table update query
This performs the update query against all 17 tables with the same schema at once. However, in this case the load in the FROM clause (a join across that many tables) is expected to be very high, and I think it will be a burden on the server.
The update query might look like the one below.
UPDATE table1, table2, table3, ..., table17
SET data_here
WHERE manage_number = 'TARGET_NUMBER';
Please share which way is better, or let me know if you have a better way.
Thank you.
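To make the first option ("Make a table for PK indexing") concrete, a minimal sketch of the lookup-table idea might look like the following; manage_number_index, data_column, table5 and the VARCHAR sizes are hypothetical, not taken from the real schema.
-- Metadata table: which of the 51 tables holds each manage_number
CREATE TABLE manage_number_index (
  manage_number VARCHAR(32) NOT NULL PRIMARY KEY,
  table_name    VARCHAR(64) NOT NULL
);

-- Daily update: first look up the owning table...
SELECT table_name FROM manage_number_index WHERE manage_number = 'TARGET_NUMBER';
-- ...then run the update against that single table only (here assumed to be table5)
UPDATE table5 SET data_column = 'new value' WHERE manage_number = 'TARGET_NUMBER';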
I have a table with 2,255,440 records.
A cron job runs every minute and inserts up to 50-100 records on each execution.
The inserts are working fine.
The problem is that there is another cron job, also running every minute, which updates these records according to the data received from the other server.
Each of these update queries takes around 6-7 seconds.
The records are updated with a query like this:
UPDATE `$month`
SET `acctstoptime`='$data->acctstoptime',
`acctsessiontime`='$data->acctsessiontime',
`acctinputoctets`='$data->acctinputoctets',
`acctoutputoctets`='$data->acctoutputoctets',
`acctterminatecause`='$data->acctterminatecause'
WHERE `radacctid`=$data->radacctid
Is there a single-column index on the radacctid column?
If not, you should create one.
CREATE INDEX:
Indexes are used to retrieve data from the database more quickly than
otherwise. The users cannot see the indexes, they are just used to
speed up searches/queries.
Syntax:
CREATE INDEX [index name] ON [table name]([column name]);
Arguments:
index name - Name of the index.
table name - Name of the table.
column name - Name of the column.
Example
Code:
CREATE INDEX radacctid ON table_name(radacctid);
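As a side note, you can first check which indexes already exist on the table; a quick way (table_name is again a placeholder) is:
SHOW INDEX FROM table_name;   -- lists every index on the table and the columns it covers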
I have a very large staging table that I want to process a few rows at a time into an indexed table.
As the time to write the indexes results in longer-than-desired locks on the target table, I usually do this a few hundred thousand rows at a time. I pick a cut-off value by ordering on a unique column with LIMIT and OFFSET, and then repeatedly churn away at the staging table.
SELECT unique_id INTO @cut_off FROM staging_X ORDER BY unique_id LIMIT 1 OFFSET 99999;  -- the 100,000th value becomes the batch cut-off
START TRANSACTION;
INSERT INTO my_indexed_table ([columns])
SELECT [columns] FROM staging_X WHERE unique_id <= @cut_off;
DELETE FROM staging_X WHERE unique_id <= @cut_off;  -- remove the processed rows from the staging table
COMMIT;
I've done this for a couple of tables successfully, but am now faced with the largest table in my list. This one has more than 100 million rows. It is created by Apache Spark, so I have no control over setting up partitions or anything.
I've been wondering if I can just use LIMIT with a constant value on both the INSERT and DELETE queries without trying to sort the data. But I cannot find anything that states that the rows will be returned in a reliably repeatable order.
For reference, I am using MySQL 5.7 and InnoDB tables.
Update
On request, the data is something like this:
uuid - Text
timestamp1 - Bigint - unixtime
timestamp2 - Bigint - unixtime
timestamp3 - Bigint - unixtime
timestamp4 - Bigint - unixtime
url - Text
metric1 - Int
metric2 - Int
There are about 30 million rows per day and I can only process this weekly. I can throttle the provision of the data and create multiple tables (I cannot create partitions) with a limited row count each, but ideally I'd just like to be able to reliably get the first N rows, insert them elsewhere and delete them, without trying to sort the data.
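For reference, the pattern being asked about would look something like the lines below; whether the INSERT and the DELETE really touch the same rows when no ORDER BY is given is exactly the open question here. The column names follow the description above and the table names reuse the ones from the earlier snippet.
-- Copy an arbitrary batch of 100,000 rows, then try to delete "the same" batch
INSERT INTO my_indexed_table (uuid, timestamp1, timestamp2, timestamp3, timestamp4, url, metric1, metric2)
SELECT uuid, timestamp1, timestamp2, timestamp3, timestamp4, url, metric1, metric2
FROM staging_X LIMIT 100000;
DELETE FROM staging_X LIMIT 100000;   -- no ORDER BY, so nothing guarantees these are the rows copied above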
I have 2 database servers with identical schema.
Basically, I have a crontab job (every minute) that fetches all new or updated records from the last 2 minutes from database 1 to database 2.
Database 2 is a MySQL server.
(Database 1 is not a MySQL server.)
To make sure I have all the data, I use a REPLACE INTO statement.
REPLACE INTO myTable (ID, Data, CreateTime, UpdateTime) VALUES ....
Normally, it works well.
However, there are too many connections to database 1. Occasionally a fetch takes over a minute due to connection problems and collides with the fetch of the next minute. The data batch fetched at time T then reaches database 2 later than the batch fetched at time T+1, so a record with older data overwrites the newer one.
I want the following to happen:
If the record does not exist in database 2, insert it.
If the record exists in database 2, update it only if createTime >= existing createTime or updateTime >= existing updateTime. (If both createTime and updateTime are the same as the existing ones, it is OK to overwrite the current record.)
The REPLACE INTO above can be rewritten as INSERT ... ON DUPLICATE KEY UPDATE, but my main problem is that I need a WHERE-like condition to control the updating.
Thanks in advance.
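For what it's worth, one common way to express that kind of condition inside INSERT ... ON DUPLICATE KEY UPDATE is to wrap each assignment in IF(), so a row is only changed when the incoming timestamps are not older. This is only a sketch: it assumes ID is the primary (or a unique) key and uses literal example values in place of the fetched data.
INSERT INTO myTable (ID, Data, CreateTime, UpdateTime)
VALUES (123, 'payload', '2020-01-01 00:00:00', '2020-01-02 00:00:00')
ON DUPLICATE KEY UPDATE
  -- assignments run left to right, so Data is set before the timestamp columns the conditions compare against
  Data       = IF(VALUES(CreateTime) >= CreateTime OR VALUES(UpdateTime) >= UpdateTime, VALUES(Data), Data),
  CreateTime = IF(VALUES(CreateTime) >= CreateTime OR VALUES(UpdateTime) >= UpdateTime, VALUES(CreateTime), CreateTime),
  UpdateTime = IF(VALUES(CreateTime) >= CreateTime OR VALUES(UpdateTime) >= UpdateTime, VALUES(UpdateTime), UpdateTime);
(On MySQL 8.0.20 and later, VALUES() in this position is deprecated in favour of a row alias, but it still works.)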
I have a table with about 35 million rows. Each row has about 35 integer values and one time value (last updated).
The table has two indexes:
Primary - uses two of the integer columns.
Secondary - uses the first integer from the primary key plus another integer column.
I would like to delete old records (about 20 million of them) according to the date field.
What is the fastest way:
1. Delete as-is according to the date field?
2. Create another index on the date field and then delete by date?
There will be a one-time deletion of a large portion of the data and then incremental weekly deletions of much smaller parts.
Is there another way to do it more efficiently?
It might be quicker to create a new table containing the rows you want to keep, drop the old table and then rename the new table.
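A rough sketch of that approach (the table names and the cut-off date are placeholders; CREATE TABLE ... LIKE copies the column definitions and both indexes):
CREATE TABLE my_table_new LIKE my_table;
INSERT INTO my_table_new
SELECT * FROM my_table WHERE last_updated >= '2020-01-01';   -- keep only the ~15 million recent rows
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;
DROP TABLE my_table_old;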
For the weekly deletions, an index on the date field would speed things up.
Fastest (but not easiest), I think, is to keep your records segmented into multiple tables based on date, e.g. one table per week, and then have a union table over all of those tables for the regular queries across the whole thing (so your queries would be unaltered). Each week you would create a new table and redefine the union table.
When you wish to drop old records, you simply recreate the union table to leave the old tables out, and then drop the tables that were left out (remember to truncate before you drop, depending on your filesystem). This is probably the fastest way to get there with MySQL.
A mess to manage though :)
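One way to read "union table" here: the MERGE storage engine provides this directly, but only over MyISAM tables; with InnoDB, a view over the weekly tables gives the same query-side effect. A minimal sketch with hypothetical weekly table names:
-- Redefine the union view each week, listing only the weekly tables you want to keep
CREATE OR REPLACE VIEW records_all AS
  SELECT * FROM records_2020_w01
  UNION ALL
  SELECT * FROM records_2020_w02;

-- Dropping old data then amounts to dropping the tables that were left out of the view
DROP TABLE records_2019_w52;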