How do I migrate and sync up a table efficiently?

I have a table A with millions of records, and it is still growing. A new indexed column needs to be added to table A, but migrating such a large table could be a headache. So table B was created at some point from table A. The question is how to sync up these two tables efficiently.
There are multiple scenarios in which new records will be added to table A.

To 'sync' two tables without actually merging them, you can create a UNION view. A view can be queried just like a table, although it stores no data of its own. This assumes both tables have the same number of columns; if not, you'll need to relate them through primary and foreign keys.
CREATE OR REPLACE VIEW viewname AS
SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
If the two tables don't have the same number of columns, you'll need at least one field in common between them (a primary key in one table, a foreign key in the other). To combine the tables on those keys, use a JOIN like this:
CREATE OR REPLACE VIEW viewname AS
SELECT TableA.FieldName, TableB.FieldName, TableA.FieldName
FROM TableA
LEFT JOIN TableB
ON TableA.primarykeyField = TableB.foreignkeyField
UNION ALL
SELECT TableA.FieldName, TableB.FieldName, TableA.FieldName
FROM TableA
RIGHT JOIN TableB
ON TableA.primarykeyField = TableB.foreignkeyField
WHERE TableA.primarykeyField IS NULL
Which type of join you want depends on your data, but I think a FULL JOIN will give you the best results. FULL JOINs aren't supported in MySQL, but combining a LEFT JOIN and a RIGHT JOIN with UNION ALL mimics one, provided the RIGHT JOIN half keeps only the rows that have no match in TableA (otherwise every matched row would appear twice).
Or if you simply want to copy all the records from table A to table B you could use this.
INSERT INTO TableB
SELECT * FROM TableA;
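Since new records keep arriving in table A, a one-off copy is not enough; the copy has to be repeatable. Below is a minimal sketch of an incremental sync that inserts only the rows table B does not have yet. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere, and the column names (`id`, `payload`, `extra`) are hypothetical, not taken from the question.

```python
import sqlite3


def sync_missing_rows(conn):
    """Copy rows from TableA that are not yet in TableB; return how many were copied."""
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO TableB (id, payload)
        SELECT a.id, a.payload
        FROM TableA AS a
        LEFT JOIN TableB AS b ON b.id = a.id
        WHERE b.id IS NULL          -- anti-join: keep only rows missing from TableB
        """
    )
    conn.commit()
    return conn.total_changes - before


def demo_sync():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical schema: TableB is TableA plus the new column `extra`.
        CREATE TABLE TableA (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE TableB (id INTEGER PRIMARY KEY, payload TEXT, extra TEXT);
        INSERT INTO TableA VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO TableB (id, payload) VALUES (1, 'a');
        """
    )
    first = sync_missing_rows(conn)                   # catches up rows 2 and 3
    conn.execute("INSERT INTO TableA VALUES (4, 'd')")
    second = sync_missing_rows(conn)                  # picks up only the new row
    ids = [row[0] for row in conn.execute("SELECT id FROM TableB ORDER BY id")]
    conn.close()
    return first, second, ids
```

The same INSERT ... SELECT with a LEFT JOIN ... IS NULL filter should also be valid in MySQL; on a table with millions of rows it relies on `id` being indexed in both tables.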

Related

Most efficient way to select data from one sql table and see if it matches data on another table in the same database

I have a database with 2 tables, both tables have around 200,000 records.
Let's call these tables TableA and TableB.
Currently I have a function that triggers a select query; this query grabs all records in TableA that match a condition. Once I have that data, I have a foreach loop that uses the data from TableA to see if it matches any record in TableB.
The problem is that this takes a while because there are so many records. I know the way I'm doing it works, because it does what it's supposed to, but the script takes a good 3 minutes to finish. Is there a faster, more efficient way to do something like this?
Thank you in advance for the help.
PS: I'm using PHP.

The most efficient way to achieve what you want is to:
1. Create a primary key column for each table (if you do not already have one). Example schema where column "id" is a unique identifier for the table row:
TableA
id  firstname  lastname
1   Michael    Douglas
2   Michael    Jackson
TableB
id  table_a_id  pet
1   1           cat
2   2           ape
3   1           dog
Google, or search here on Stack Overflow, for how to create or add a primary key on a MySQL table. An example of creating TableA with a primary key:
CREATE TABLE `TableA` (
`id` int(11) unsigned AUTO_INCREMENT,
`firstname` varchar(100),
`lastname` varchar(100),
PRIMARY KEY (`id`)
)
2. Create an SQL-query to fetch what you need. For example:
To get all rows with at least one match in BOTH tables:
SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
FROM TableA
INNER JOIN TableB
ON TableA.id = TableB.table_a_id;
To instead get all rows from TableA, and only the matching rows from TableB:
SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
FROM TableA
LEFT JOIN TableB
ON TableA.id=TableB.table_a_id;
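To make the difference concrete, here is a small sketch that compares the per-row loop described in the question with a single INNER JOIN. It runs against an in-memory SQLite database through Python's sqlite3 module purely so it is self-contained (the question uses PHP and MySQL). The index on `table_a_id` and the third, pet-less person are assumptions added for illustration.

```python
import sqlite3


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TableA (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
        CREATE TABLE TableB (id INTEGER PRIMARY KEY, table_a_id INTEGER, pet TEXT);
        -- Assumed index: lets the database look up matches without scanning TableB.
        CREATE INDEX idx_tableb_table_a_id ON TableB (table_a_id);
        INSERT INTO TableA VALUES
            (1, 'Michael', 'Douglas'),
            (2, 'Michael', 'Jackson'),
            (3, 'Sample', 'Person');      -- hypothetical row with no pets
        INSERT INTO TableB VALUES (1, 1, 'cat'), (2, 2, 'ape'), (3, 1, 'dog');
        """
    )
    return conn


def matches_with_loop(conn):
    """Slow pattern from the question: one extra query per TableA row."""
    result = []
    rows = conn.execute("SELECT id, firstname, lastname FROM TableA ORDER BY id").fetchall()
    for a_id, first, last in rows:
        pets = conn.execute(
            "SELECT pet FROM TableB WHERE table_a_id = ? ORDER BY id", (a_id,)
        ).fetchall()
        for (pet,) in pets:
            result.append((a_id, first, last, pet))
    return result


def matches_with_join(conn):
    """Same result from a single query; the database does the matching."""
    return conn.execute(
        """
        SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
        FROM TableA
        INNER JOIN TableB ON TableA.id = TableB.table_a_id
        ORDER BY TableA.id, TableB.id
        """
    ).fetchall()
```

Both functions return the same rows, but the loop issues one query per TableA record (roughly 200,000 round trips at the sizes in the question), while the join issues one.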

The answer to your question ultimately depends on what you mean by "if it matches."
Let's assume, for a moment, that you have primary keys on each of these tables, TableA and TableB, and that you're NOT matching on those, but that you have one or more other columns (the actual data that you're storing in each row) which you are considering for your matching. Let's call those ColA and ColB.
In that case you could use:
SELECT TableA.id, TableB.id, TableA.ColA, TableB.ColB
FROM TableA
LEFT JOIN TableB
ON (TableA.ColA = TableB.ColA)
AND (TableA.ColB = TableB.ColB);
... notice that we're using a compound expression on which to JOIN. You'd want to add an AND (TableA.XXX = TableB.XXX) for each column that you want to consider significant in your matching.
Of course I'm assuming that these tables don't share a common surrogate key (otherwise MicKri's JOIN would be simpler ... or a "NATURAL JOIN" would be even simpler still).
What you're doing, conceptually, is defining a pair of (mathematical) sets and finding the intersection between them. The complication of doing this in SQL is that real-world tables often have these extra columns (surrogate primary keys, and foreign keys) which aren't attributes of the underlying entities ... but which serve to map relationships among them.
In my example I'm just showing a way to formulate a JOIN query that finds the intersection based only on the attributes that are significant for your purposes.
(By the way, the parentheses in my example are there for human legibility. They should not be required by your SQL engine ... though they don't hurt, either).
Here's one of a number of visual explanations of SQL JOINs that's handy for learning this sort of thing. An INNER JOIN is an intersection. The ON and WHERE clauses define the subsets of the data (columns and rows, respectively) which are to be related.
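As a rough illustration of the intersection idea, the sketch below matches rows on two data columns at once and ignores the surrogate ids. It uses an INNER JOIN (so only the intersection is returned) against an in-memory SQLite database via Python's sqlite3 module; the sample values are made up.

```python
import sqlite3


def matching_pairs(conn):
    """Return (TableA.id, TableB.id) for rows that agree on both ColA and ColB."""
    return conn.execute(
        """
        SELECT TableA.id, TableB.id
        FROM TableA
        INNER JOIN TableB
           ON TableA.ColA = TableB.ColA
          AND TableA.ColB = TableB.ColB
        ORDER BY TableA.id, TableB.id
        """
    ).fetchall()


def demo_matching():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TableA (id INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT);
        CREATE TABLE TableB (id INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT);
        -- Made-up data: only ('x','y') and ('q','r') exist in both tables.
        INSERT INTO TableA VALUES (1, 'x', 'y'), (2, 'x', 'z'), (3, 'q', 'r');
        INSERT INTO TableB VALUES (10, 'x', 'y'), (11, 'x', 'w'), (12, 'q', 'r');
        """
    )
    pairs = matching_pairs(conn)
    conn.close()
    return pairs
```

Rows that agree on ColA alone (ids 2 and 11) are not paired, because every condition in the ON clause has to hold. With a LEFT JOIN instead, unmatched TableA rows would also come back, with NULL in the TableB columns.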

Deleting multiple records from a very big table (query takes forever)

I am using the following query to delete multiple records except one from my table. It works well with small tables, but it got stuck when I tried it with a table that has more than 130,000 records. The thing is, I don't even get an error. phpMyAdmin just gets stuck and the query (the "loading..." yellow line) basically takes forever.
My table structure
person_id (AI & PK)
person_name ( I want to delete multiple person_name records except one)
query
DELETE t2
FROM `person` t1
INNER JOIN `person` t2
ON t1.person_name = t2.person_name
AND t1.person_id < t2.person_id;
UPDATE: I don't have an index on the person table. But my three other tables (person_job, person_image, book_who_wrote_it) contain foreign keys referencing the person table (person_id).

First, do you have an index on person(person_name, person_id)? That would be the place to start.
Deleting lots of rows incurs overhead. Often, it is faster to put the results in another table and reinsert them:
create temporary table tmp_person as
select p.*
from person p join
(select person_name, max(person_id) as max_person_id
from person
group by person_name
) pp
on p.person_id = pp.max_person_id;
truncate table person;
insert into person
select * from tmp_person;
Be sure you validate tmp_person before truncating person! Truncate does not log the deletion of each row, so it is much, much, much faster than delete under most circumstances.
NOTE:
If you really only have two columns in person, then you can simplify the first query to:
create temporary table tmp_person as
select person_name, max(person_id) as max_person_id
from person
group by person_name;
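The copy-and-reinsert idea can be tried out end to end with the sketch below. It is only an approximation of the MySQL version: it runs on SQLite through Python's sqlite3 module, and because SQLite has no TRUNCATE statement it uses an unqualified DELETE in its place. The sample names are invented.

```python
import sqlite3


def dedupe_by_copy(conn):
    """Keep one row per person_name (the highest person_id) via a temporary copy."""
    conn.execute("DROP TABLE IF EXISTS tmp_person")
    conn.execute(
        """
        CREATE TEMPORARY TABLE tmp_person AS
        SELECT p.*
        FROM person p
        JOIN (SELECT person_name, MAX(person_id) AS max_person_id
              FROM person
              GROUP BY person_name) pp
          ON p.person_id = pp.max_person_id
        """
    )
    # Stand-in for TRUNCATE TABLE person, which SQLite does not support.
    conn.execute("DELETE FROM person")
    conn.execute("INSERT INTO person SELECT * FROM tmp_person")
    conn.commit()


def demo_dedupe_by_copy():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
        INSERT INTO person VALUES
            (1, 'ann'), (2, 'bob'), (3, 'ann'), (4, 'ann'), (5, 'bob'), (6, 'cid');
        """
    )
    dedupe_by_copy(conn)
    rows = conn.execute("SELECT person_id, person_name FROM person ORDER BY person_id").fetchall()
    conn.close()
    return rows
```

Note that this keeps the highest id per name, while the DELETE in the question keeps the lowest; swapping MAX for MIN changes which row survives. Either way, rows in other tables that reference a removed person_id would need to be repointed first.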

Try this:
DELETE
FROM `person`
where person_id not in
(select * from
(select min(person_id) from person group by person_name)x)
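A runnable sketch of the delete-in-place variant follows, again on SQLite via Python's sqlite3 module rather than MySQL. It keeps the lowest person_id per name, as the original query does, and first creates the composite index suggested in the other answer; the index name and sample data are made up. SQLite does not need the extra derived-table wrapper that MySQL requires when a DELETE reads from the table it modifies.

```python
import sqlite3


def delete_duplicates(conn):
    """Delete all but the lowest person_id for each person_name; return rows deleted."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_person_name_id ON person (person_name, person_id)"
    )
    before = conn.total_changes
    conn.execute(
        """
        DELETE FROM person
        WHERE person_id NOT IN (
            SELECT MIN(person_id) FROM person GROUP BY person_name
        )
        """
    )
    conn.commit()
    return conn.total_changes - before


def demo_delete_duplicates():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
        INSERT INTO person VALUES
            (1, 'ann'), (2, 'bob'), (3, 'ann'), (4, 'ann'), (5, 'bob'), (6, 'cid');
        """
    )
    deleted = delete_duplicates(conn)
    remaining = [r[0] for r in conn.execute("SELECT person_id FROM person ORDER BY person_id")]
    conn.close()
    return deleted, remaining
```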

Syncing primary key between two tables stored in two different database

I have two MySQL databases, each containing one table; let's call them TABLE_A and TABLE_B. Both tables have the fields id and title. Both databases are on the same server and the same user can access both.
Now, TABLE_A is a subset of TABLE_B as far as the title field is concerned: every title in TABLE_A is also present in TABLE_B. The id fields of the two tables, however, are in no way related.
What I need is to sync the id fields in TABLE_A with the id fields in TABLE_B according to the title, i.e. same title, same id. In case that's not clear: I have to keep the TABLE_B ids and overwrite the TABLE_A ones. And I DON'T have to add the missing titles from TABLE_B to TABLE_A.
Someone suggested using a temporary table: copy into it all the TABLE_B fields in common with TABLE_A, and then rename it to TABLE_A. I cannot go this way, as TABLE_A also has other fields that I need to keep. So I cannot entirely drop the old TABLE_A.
Moreover, id is the primary key of both tables. This means I cannot simply copy from TABLE_B to TABLE_A, as the query will fail as soon as I try to change one id to another that is already present in TABLE_A but linked to a different title.
I know how to write a Perl or PHP script to do it, but I would like to know if a pure MySQL solution exists.

You can do this
CREATE TABLE TableA_TMP AS
SELECT * FROM TableA;
ALTER TABLE TableA_TMP ADD id_new int;
UPDATE TableA_TMP A INNER JOIN TableB B ON lower(A.title) = lower(B.title)
SET id_new = B.id;
RENAME TABLE TableA TO TableA_backup;
CREATE TABLE TableA AS
select id_new as id, title,.... from TableA_TMP;
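The sequence above can be exercised with the following sketch. It is an adaptation, not the exact MySQL statements: it runs on SQLite through Python's sqlite3 module, so the UPDATE ... INNER JOIN is replaced by a correlated subquery and the rename uses ALTER TABLE ... RENAME TO. The `note` column is a hypothetical stand-in for the other fields of TableA that must be kept, and titles are assumed to be unique in TableB.

```python
import sqlite3


def resync_ids(conn):
    """Rebuild TableA so each row takes the id of the TableB row with the same title."""
    conn.executescript(
        """
        CREATE TABLE TableA_TMP AS SELECT * FROM TableA;
        ALTER TABLE TableA_TMP ADD COLUMN id_new INTEGER;
        -- Correlated subquery in place of MySQL's UPDATE ... INNER JOIN.
        UPDATE TableA_TMP
        SET id_new = (SELECT B.id FROM TableB B
                      WHERE lower(B.title) = lower(TableA_TMP.title));
        ALTER TABLE TableA RENAME TO TableA_backup;
        -- `note` is a hypothetical extra column carried over unchanged.
        CREATE TABLE TableA AS
        SELECT id_new AS id, title, note FROM TableA_TMP;
        DROP TABLE TableA_TMP;
        """
    )


def demo_resync():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TableA (id INTEGER PRIMARY KEY, title TEXT, note TEXT);
        CREATE TABLE TableB (id INTEGER PRIMARY KEY, title TEXT);
        -- The ids deliberately collide: Alpha must become 2 while Beta becomes 1.
        INSERT INTO TableA VALUES (1, 'Alpha', 'keep me'), (2, 'Beta', 'also keep');
        INSERT INTO TableB VALUES (2, 'alpha'), (1, 'beta'), (3, 'gamma');
        """
    )
    resync_ids(conn)
    rows = conn.execute("SELECT id, title, note FROM TableA ORDER BY id").fetchall()
    backup_count = conn.execute("SELECT COUNT(*) FROM TableA_backup").fetchone()[0]
    conn.close()
    return rows, backup_count
```

Because the new ids are written to a separate column and the table is rebuilt, the primary-key collisions described in the question never occur. A table made with CREATE TABLE ... AS SELECT carries no primary key or indexes, so those would have to be added back afterwards.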

MySQL: creating a table which automatically updates with data from other tables

I have two tables:
Table1 - Sales
id, Timestamp, Notes, CSR, Date, ExistingCustomer, PubCode, Price, Pub, Expiry, Size, KeyCode, UpSell, CustName, CustAddress, CustCity, CustState, CustPostal, CustCountry, CustPhone, CustEmail, CustCardName, CustCardNumber, CustCardType, CustCardExpiry, CustCardCode
Table2 - Refunds
id, Timestamp, CSR, Date, OrderId, Refunded, Saved, Publication
Basically, I want to create a table (MySQL) which will have the columns that the two tables share and which will update automatically with the values from these two tables.
ie. Table3
Timestamp, CSR, Date, Publication
And this table would automatically update whenever a new record is posted into either of the other two tables, so it would essentially be a merged table.
Because there's nothing to join these two tables on, I don't think a JOIN would work here. Is there any way I can do this?

You can use a trigger that fires on insert on both tables to make it update automatically.
As for combining tables with no common columns, view this question.
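A minimal sketch of the trigger approach is shown below, using SQLite through Python's sqlite3 module so that it runs without a MySQL server; MySQL's CREATE TRIGGER syntax differs slightly (it needs FOR EACH ROW, for instance). Only the shared columns are modelled, the trigger names are invented, and mapping the Sales column `Pub` to `Publication` is an assumption about what the question intends.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Sales   (id INTEGER PRIMARY KEY, Timestamp TEXT, CSR TEXT, Date TEXT, Pub TEXT);
CREATE TABLE Refunds (id INTEGER PRIMARY KEY, Timestamp TEXT, CSR TEXT, Date TEXT, Publication TEXT);
CREATE TABLE Table3  (Timestamp TEXT, CSR TEXT, Date TEXT, Publication TEXT);

-- Copy the shared columns into Table3 whenever a sale is recorded.
CREATE TRIGGER sales_to_table3 AFTER INSERT ON Sales
BEGIN
    INSERT INTO Table3 (Timestamp, CSR, Date, Publication)
    VALUES (NEW.Timestamp, NEW.CSR, NEW.Date, NEW.Pub);   -- assumes Pub = Publication
END;

-- Same for refunds.
CREATE TRIGGER refunds_to_table3 AFTER INSERT ON Refunds
BEGIN
    INSERT INTO Table3 (Timestamp, CSR, Date, Publication)
    VALUES (NEW.Timestamp, NEW.CSR, NEW.Date, NEW.Publication);
END;
"""


def demo_triggers():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO Sales (Timestamp, CSR, Date, Pub) VALUES (?, ?, ?, ?)",
        ("2020-01-01 10:00", "alice", "2020-01-01", "Gazette"),
    )
    conn.execute(
        "INSERT INTO Refunds (Timestamp, CSR, Date, Publication) VALUES (?, ?, ?, ?)",
        ("2020-01-02 11:00", "bob", "2020-01-02", "Herald"),
    )
    conn.commit()
    rows = conn.execute(
        "SELECT Timestamp, CSR, Date, Publication FROM Table3 ORDER BY Timestamp"
    ).fetchall()
    conn.close()
    return rows
```

Triggers like these only cover inserts; updates and deletes on Sales or Refunds would leave Table3 stale unless matching triggers are added. A view that combines the shared columns of both tables with UNION ALL avoids that bookkeeping entirely.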

You need to use a stored procedure and a trigger on insert/update of the non-merged tables.

There's got to be some way to join it, and in fact you mention Timestamp, CSR, Date, Publication.
You could join on them in a view. You could add table three and then add triggers though that would be an awful mess.
Why do you want to denormalise in this way?
How about making Table3 hold a unique key to use as a surrogate, plus your 4 join fields, and then taking those fields out of Tables 1 and 2 and replacing them with the surrogate key from Table 3?
Then it's a simple join query and there is no data duplication.

Convert MySQL query into PostgreSQL

I have this query:
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table(id int primary key)
IGNORE (
SELECT user2role.userid AS userid
FROM user2role
INNER JOIN users ON users.id=user2role.userid
INNER JOIN role ON role.roleid=user2role.roleid
WHERE role.parentrole like 'H1::H2::H3::H4::H5::%')
UNION (
SELECT groupid
FROM groups
WHERE groupid IN (2,3,4));
This query was originally written in MySQL and instead of DROP TABLE IF EXISTS it used IF NOT EXISTS. I changed that part, but I don't know what to do about the IGNORE.
First off, what is IGNORE doing?
I tried looking for PostgreSQL equivalents, but they all seem to involve complicated procedures. Do I have to write a procedure for this? And if I have to write one, what would it look like? Could I just emulate IGNORE using some PHP code instead? (The SQL queries are generated by PHP.)

You would write it like this in PostgreSQL.
IGNORE is irrelevant here, as the table has just been recreated and is guaranteed to be empty. And UNION guarantees there are no duplicate rows inserted.
DROP TABLE IF EXISTS tmp_table;
CREATE TEMP TABLE tmp_table(id int4 primary key);
INSERT INTO tmp_table
SELECT user2role.userid::int4 AS id
FROM user2role
JOIN users ON users.id = user2role.userid
JOIN role ON role.roleid = user2role.roleid
WHERE role.parentrole like 'H1::H2::H3::H4::H5::%'
UNION
SELECT groupid::int4
FROM groups
WHERE groupid in (2,3,4);
If duplicates in the SELECT cannot occur, you might consider the faster UNION ALL instead of UNION. Otherwise you need UNION to eliminate possible dupes. Read here.
If your dataset is large you might consider creating the primary key after the INSERT. That's faster.
Read the MySQL docs on the effects of IGNORE.
On revisiting the page I realized you mention IF NOT EXISTS in the original code.
You don't say so, but that only makes sense if the original code created the table only if it didn't exist already, which introduces the possibility of it being not empty before the INSERT. In this case IGNORE is relevant and needs an equivalent in PostgreSQL.
So here is an alternative answer for that interpretation of your question.
CREATE TEMP TABLE IF NOT EXISTS has been implemented in PostgreSQL 9.1.
For older versions I posted a solution on SO recently.
CREATE TEMP TABLE IF NOT EXISTS tmp_table(id int4 primary key);
INSERT INTO tmp_table
SELECT x.id
FROM (
SELECT user2role.userid::int4 AS id
FROM user2role
JOIN users ON users.id = user2role.userid
JOIN role ON role.roleid = user2role.roleid
WHERE role.parentrole like 'H1::H2::H3::H4::H5::%'
UNION
SELECT groupid::int4
FROM groups
WHERE groupid in (2,3,4)
) x
LEFT JOIN tmp_table t USING (id)
WHERE t.id IS NULL;
LEFT JOIN ... WHERE t.id IS NULL excludes any id that might already be present in tmp_table. UNION goes into a sub-select, so that clause needs only be applied once. Should be fastest.
More on LEFT JOIN here.
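The anti-join pattern is easy to check on a small data set. The sketch below runs it on SQLite through Python's sqlite3 module instead of PostgreSQL, with the users and role tables and the parentrole filter left out to keep it short; the sample ids are invented and tmp_table is pre-populated to simulate a table that already existed.

```python
import sqlite3


def insert_missing_ids(conn):
    """Insert user and group ids that are not already in tmp_table; return rows added."""
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO tmp_table (id)
        SELECT x.id
        FROM (
            SELECT userid AS id FROM user2role
            UNION                       -- removes duplicates within the new ids
            SELECT groupid FROM "groups" WHERE groupid IN (2, 3, 4)
        ) x
        LEFT JOIN tmp_table t ON t.id = x.id
        WHERE t.id IS NULL              -- skips ids that are already present
        """
    )
    conn.commit()
    return conn.total_changes - before


def demo_insert_missing():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user2role (userid INTEGER, roleid INTEGER);
        CREATE TABLE "groups" (groupid INTEGER PRIMARY KEY);
        CREATE TABLE tmp_table (id INTEGER PRIMARY KEY);
        INSERT INTO user2role VALUES (1, 10), (2, 10), (2, 11), (5, 12);
        INSERT INTO "groups" VALUES (2), (3), (4);
        INSERT INTO tmp_table VALUES (2);   -- simulates a pre-existing, non-empty table
        """
    )
    added = insert_missing_ids(conn)
    ids = [r[0] for r in conn.execute("SELECT id FROM tmp_table ORDER BY id")]
    conn.close()
    return added, ids
```

Without the LEFT JOIN filter, the insert would fail with a primary-key violation on id 2, which is the situation MySQL's IGNORE silently papers over.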
