Compare 2 Mysql tables' data having same structure - mysql

I have 2 tables, city_sessions_1 and city_sessions_2.
Both tables have the same structure:
CREATE TABLE `city_sessions_1` (
`city_id` int(11),
`session_date` date,
`start_time` varchar(12),
`end_time` varchar(12) ,
`attendance` int(11) ,
KEY `city` (`city_id`),
KEY `session_date` (`session_date`)
) ENGINE=MyISAM;
Note that these tables do not have a primary key, but they do have indexes defined. Both tables have the same number of rows, but some of the data is expected to differ.
How can I compare these 2 tables' data?

-- We start with the rows in city_sessions_1 and their matches in city_sessions_2
SELECT
* -- or whatever fields you are interested in
FROM city_sessions_1
LEFT JOIN city_sessions_2 ON city_sessions_1.city_id=city_sessions_2.city_id
WHERE
-- Choose only those differences you are interested in
city_sessions_1.session_date<>city_sessions_2.session_date
OR city_sessions_1.start_time<>city_sessions_2.start_time
OR city_sessions_1.end_time<>city_sessions_2.end_time
OR city_sessions_1.attendance<>city_sessions_2.attendance
UNION
-- We also need those rows in city_sessions_2 that have no match in city_sessions_1
SELECT
* -- or whatever fields you are interested in
FROM city_sessions_2
LEFT JOIN city_sessions_1 ON city_sessions_1.city_id=city_sessions_2.city_id
WHERE city_sessions_1.city_id IS NULL
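Since neither table has a primary key and city_id is not unique, joining on city_id alone pairs every session of a city in one table with every session of that city in the other. A minimal sketch of an alternative, assuming a "difference" means a full row that has no identical counterpart in the other table: compare all five columns with MySQL's NULL-safe equality operator `<=>`, so rows containing NULLs can still match. The `source_table` label and the aliases `s1`/`s2` are illustrative names, not part of the question.

```sql
-- Rows of city_sessions_1 with no identical row in city_sessions_2
SELECT 'city_sessions_1' AS source_table, s1.*
FROM city_sessions_1 AS s1
WHERE NOT EXISTS (
    SELECT 1
    FROM city_sessions_2 AS s2
    WHERE s2.city_id      <=> s1.city_id
      AND s2.session_date <=> s1.session_date
      AND s2.start_time   <=> s1.start_time
      AND s2.end_time     <=> s1.end_time
      AND s2.attendance   <=> s1.attendance
)
UNION ALL
-- Rows of city_sessions_2 with no identical row in city_sessions_1
SELECT 'city_sessions_2' AS source_table, s2.*
FROM city_sessions_2 AS s2
WHERE NOT EXISTS (
    SELECT 1
    FROM city_sessions_1 AS s1
    WHERE s1.city_id      <=> s2.city_id
      AND s1.session_date <=> s2.session_date
      AND s1.start_time   <=> s2.start_time
      AND s1.end_time     <=> s2.end_time
      AND s1.attendance   <=> s2.attendance
);
```

An empty result means every row has at least one identical twin in the other table. It does not prove that duplicate rows occur the same number of times on both sides.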

Related

How to optimize an UPDATE and JOIN query on practically identical tables?

I am trying to update one table based on another in the most efficient way.
Here is the table DDL of what I am trying to update
Table1
CREATE TABLE `customersPrimary` (
`id` int NOT NULL AUTO_INCREMENT,
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `groupID-IDInGroup` (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Table2
CREATE TABLE `customersSecondary` (
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Both the tables are practically identical but customersSecondary table is a staging table for the other by design. The big difference is primary keys. Table 1 has an auto incrementing primary key, table 2 has a composite primary key.
In both tables the combination of groupID and IDInGroup are unique.
Here is the query I want to optimize
UPDATE customersPrimary
INNER JOIN customersSecondary ON
(customersPrimary.groupID = customersSecondary.groupID
AND customersPrimary.IDInGroup = customersSecondary.IDInGroup)
SET
customersPrimary.name = customersSecondary.name,
customersPrimary.address = customersSecondary.address
This query works but scans EVERY row in customersSecondary.
Adding
WHERE customersPrimary.groupID = (groupID)
cuts the scan down significantly, to the rows with that groupID in customersSecondary. But this is still often far more than the number of rows being updated, since a group can be large. I think the WHERE needs improvement.
I can control table structure and add indexes. I will have to keep both tables.
Any suggestions would be helpful.
Your existing query requires a full table scan because you are saying update everything on the left based on the value on the right. Presumably the optimiser is choosing customersSecondary because it has fewer rows, or at least it thinks it has.
Is the full table scan causing you problems? Locking? Too slow? How long does it take? How frequently are the tables synced? How many records are there in each table? What is the rate of change in each of the tables?
You could add separate indices on name and address but that will take a good chunk of space. The better option is going to be to add an indexed updatedAt column and use that to track which records have been changed.
ALTER TABLE `customersPrimary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT '2000-01-01 00:00:00',
ADD INDEX `idx_customer_primary_updated` (`updatedAt`);
ALTER TABLE `customersSecondary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX `idx_customer_secondary_updated` (`updatedAt`);
And then you can add updatedAt to your join criteria and the WHERE clause -
UPDATE customersPrimary cp
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > :last_query_run_time;
For :last_query_run_time you could use the last run time if you are storing it. Otherwise, if you know you are running the query every hour you could use NOW() - INTERVAL 65 MINUTE. Notice I have used more than one hour to make sure records aren't missed if there is a slight delay for some reason. Another option would be to use SELECT MAX(updatedAt) FROM customersPrimary -
UPDATE customersPrimary cp
INNER JOIN (SELECT MAX(updatedAt) maxUpdatedAt FROM customersPrimary) t
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > t.maxUpdatedAt;
Plan A:
Something like this would first find the "new" rows, then add only those:
UPDATE primary
SET ...
JOIN ( SELECT ...
FROM secondary
LEFT JOIN primary
WHERE primary... IS NULL )
ON ...
Might secondary have changes? If so, a variant of that would work.
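Plan A, read as "insert the rows of the staging table that are not yet in the main table", could be sketched against the question's tables as follows. This is one interpretation of the outline above, not the answerer's exact statement, and the aliases `cs` and `cp` are illustrative.

```sql
-- Copy only the rows from the staging table that the main table lacks.
INSERT INTO customersPrimary (groupID, IDInGroup, name, address)
SELECT cs.groupID, cs.IDInGroup, cs.name, cs.address
FROM customersSecondary AS cs
LEFT JOIN customersPrimary AS cp
       ON cp.groupID   = cs.groupID
      AND cp.IDInGroup = cs.IDInGroup
WHERE cp.id IS NULL;  -- no match found, so the row is new
```

If customersSecondary can also hold changed versions of existing rows, one possible variant is `INSERT ... ON DUPLICATE KEY UPDATE`, which would rely on the existing UNIQUE KEY on (groupID, IDInGroup).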
Plan B:
Better yet is to TRUNCATE TABLE secondary after it is folded into primary.

Mysql annotate table with aggregated sum

I have two tables, which go like
t1
alias_id (string, unique)
finished (datetime)
sum (float)
t2
alias_id (string)
sum (float)
The tables contain payments, around 800k records each. t1 contains each payment exactly once, while t2 can have several records with the same alias_id, because some payments consist of several transactions.
I need to compare the sum field in t1 to the total of the sum fields in t2, grouped by alias_id.
Doing it in Excel works, but it is painful and takes about 4 hours. I tried loading the tables into MySQL and running a query on them, and was surprised to see it took about 8 hours to complete.
I have no idea why. Maybe my query is bad? Or maybe grouping by time and sum causes that? I could really use some general advice on the best approach to the task.
The query is below.
SELECT
s.alias_id AS id,
s.finished AS finished,
s.sum AS sum,
Sum(b.sum_aggr) AS b_sum
FROM report.rep1 s
LEFT JOIN
( SELECT alias_id, SUM(sum) AS sum_aggr
FROM report.rep2
GROUP BY 1
) b
ON b.alias_id = s.alias_id
GROUP BY 1, 2, 3;
Table DDLs:
first:
CREATE TABLE `rep1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`corp_client_id` longtext,
`agr_name` longtext,
`client_id` longtext,
`order_id` longtext,
`alias_id` longtext,
`due` longtext,
`finished` longtext,
`sum` double NOT NULL,
`currency` longtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=720886 DEFAULT CHARSET=utf8
second:
CREATE TABLE `rep2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`client_id` longtext,
`contract` longtext,
`contract_start_dt` longtext,
`contract_end_dt` longtext,
`country` longtext,
`provider` longtext,
`date` longtext,
`alias_id` longtext,
`transaction_id` longtext,
`payment_transaction` longtext,
`transaction_type` longtext,
`sum` double NOT NULL,
`transaction_type_name` longtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=655351 DEFAULT CHARSET=utf8
If you want to check that the sums match, you can simply do a left join between the tables on alias_id, compute the SUM over the second table's rows, and then compare the two values.
Try the following instead:
SELECT
s.alias_id AS id,
s.finished AS finished,
s.sum AS sum,
SUM(s2.sum) AS b_sum
FROM report.rep1 AS s
LEFT JOIN report.rep2 AS s2 ON s2.alias_id = s.alias_id
GROUP BY s.alias_id, s.finished, s.sum
EDIT: As the OP's comments revealed, alias_id is not indexed on either table. Since the alias_id field is of type longtext, it needs proper indexing; otherwise the queries will be slow no matter what. A longtext column cannot be indexed without a prefix length, so the simplest route is to first convert it to varchar:
ALTER TABLE `rep1` MODIFY COLUMN `alias_id` VARCHAR(255);
ALTER TABLE `rep2` MODIFY COLUMN `alias_id` VARCHAR(255);
You can add the indexing on both the tables as follows:
ALTER TABLE `rep1` ADD INDEX alias_id (`alias_id`);
ALTER TABLE `rep2` ADD INDEX alias_id (`alias_id`);
If alias_id is going to be Unique in the table rep1, you can use the following statement (instead of the first statement above):
ALTER TABLE `rep1` ADD UNIQUE alias_id (`alias_id`);
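With the indexes in place, a query that returns only the mismatching payments may be more useful than one that lists every row. A minimal sketch, assuming the goal is to find alias_ids whose t1 amount differs from the t2 total or that have no t2 rows at all. The 0.005 tolerance is an assumption, chosen because `sum` is a double and exact equality on floating-point totals can be unreliable; the output aliases are illustrative.

```sql
SELECT
    s.alias_id,
    s.finished,
    s.`sum`    AS t1_sum,
    b.sum_aggr AS t2_sum
FROM report.rep1 AS s
LEFT JOIN (
    -- one pre-aggregated row per alias_id
    SELECT alias_id, SUM(`sum`) AS sum_aggr
    FROM report.rep2
    GROUP BY alias_id
) AS b ON b.alias_id = s.alias_id
WHERE b.sum_aggr IS NULL                     -- payment missing from rep2
   OR ABS(s.`sum` - b.sum_aggr) > 0.005;     -- totals differ beyond the tolerance
```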

MySQL update column with value from a different table

I have two tables with the following structure and example content. Table one has membership_no set to the correct values, but table two has some incorrect values in the membership_no column. I need to query both tables, find the rows where the membership_no values are not equal, and then update table two's membership_no column with the value from table one.
Table One:
id membership_no
====================
800960 800960
800965 800965
Table Two:
id membership_no
====================
800960 800970
800965 800975
Update query so far. It is not catching all of the incorrect values from table two.
UPDATE
tabletwo
INNER JOIN
tableone ON tabletwo.id = tableone.id
SET
tabletwo.membership_no = tableone.membership_no;
EDIT: Including SHOW CREATE and SELECT queries for unmatched membership_no column values.
Table One SHOW:
CREATE TABLE `n2z7m3_kiduka_accounts_j15` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`membership_no` int(11) NOT NULL,
...
`membershipyear` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=800987 DEFAULT CHARSET=utf8
Table Two SHOW:
CREATE TABLE `n2z7m3_kiduka_accounts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`membership_no` int(11) NOT NULL,
...
`membershipyear` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=801072 DEFAULT CHARSET=utf8
SELECT query for unmatched membership_no column values:
SELECT
u.name,
a.membership_no as 'Joomla 1.5 accounts table',
j.membership_no as 'Joomla 3.0 accounts table'
FROM
n2z7m3_kiduka_accounts_j15 AS a
INNER JOIN n2z7m3_users AS u ON a.user_id = u.id
INNER JOIN n2z7m3_kiduka_accounts AS j ON a.user_id = j.membership_no
and a.membership_no != j.membership_no
ORDER BY u.name;
While Tim's Answer is perfectly valid, another variation is to add the filter qualifier to the ON clause such that:
UPDATE tabletwo
INNER JOIN
tableone ON tabletwo.id = tableone.id AND tabletwo.membership_no <> tableone.membership_no
SET
tabletwo.membership_no = tableone.membership_no;
This means that there is no WHERE filter, so the statement processes all rows but acts only on those with differing membership_no values. Because it is an INNER JOIN, a row is updated only when a matching row exists in both tables; every other row is skipped.
EDIT:
If you suspect you still have a problem, what does MySQL respond to the command? Is there a specific error message? With 80k rows, the command may take a while to process, so are you giving it time to complete, or is PHP or the system aborting it when the execution time limit expires? (Raise the execution time limits in PHP and MySQL and rerun the query, just to see whether it then completes successfully.)
Suggestion
As another suggestion, I think your UNIQUE KEY should also be your AI (auto-increment) key, so for both tables:
DROP INDEX `user_id` ON <table>; # removes the current unique index
then
CREATE UNIQUE INDEX `id` ON <table> (`id`); # adds a unique index to the A_I column
You just need to add a WHERE clause:
UPDATE
tabletwo
INNER JOIN
tableone
ON tabletwo.id = tableone.id
SET
tabletwo.membership_no = tableone.membership_no
WHERE tabletwo.membership_no <> tableone.membership_no
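Before running either UPDATE, it may help to preview exactly which rows it would touch. A minimal sketch using the same join and filter as the statement above; the aliases `t1`/`t2` and the output column names are illustrative.

```sql
-- Preview: rows in tabletwo whose membership_no disagrees with tableone
SELECT
    t2.id,
    t2.membership_no AS current_value,
    t1.membership_no AS correct_value
FROM tabletwo AS t2
INNER JOIN tableone AS t1 ON t1.id = t2.id
WHERE t2.membership_no <> t1.membership_no;
```

If a row you expect to be corrected is missing from this preview, the likely reason is that its id has no counterpart in tableone, in which case the INNER JOIN drops it before the comparison is made.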

Calling Data from 2 tables

I am kind of new to SQL. I have 2 MySQL Tables. Below is their structure.
Key_Hash Table
CREATE TABLE `key_hash` (
`primary_key` int(11) NOT NULL,
`hash` text NOT NULL,
`totalNumberOfWords` int(11) NOT NULL,
PRIMARY KEY (`primary_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--
Key_Word Table
CREATE TABLE `key_word` (
`primary_key` bigint(20) NOT NULL AUTO_INCREMENT,
`indexVal` int(11) NOT NULL,
`hashed_word` char(3) NOT NULL,
PRIMARY KEY (`primary_key`),
KEY `hashed_word` (`hashed_word`,`indexVal`)
) ENGINE=InnoDB AUTO_INCREMENT=28570982 DEFAULT CHARSET=latin1
Now, below is my query
SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v') GROUP BY `indexVal` LIMIT 100;
When you run the above query, you will get an output like below
The important thing to note here is that indexVal in the key_word table holds the same values as primary_key in the key_hash table (I think it could be a foreign key?). In other words, the primary_key values of key_hash appear as indexVal in key_word. But please note that indexVal can appear any number of times in key_word, because it is not a primary key there.
OK, so this is not exactly the query I need. I need to count how many times each unique indexVal appears in the above search, and divide it by the corresponding value of key_hash.totalNumberOfWords.
I am providing a few examples below.
Imagine I ran the above query, now the result is generated. It says
indexVal 0 appeared 10 times in search
indexVal 1 appeared 20 times in search
indexVal 300 appeared 20,000 times in search
Now keep in mind that key_hash.primary_key = key_word.indexVal. First I look up the key_hash.primary_key that equals key_word.indexVal and get the associated key_hash.totalNumberOfWords. Then I divide the count() from the above query by this key_hash.totalNumberOfWords and multiply the result by 100 (to get the value as a percentage). Below is a query I tried, but it has errors.
SELECT `indexVal`,COUNT(`indexVal`), (COUNT(`indexVal`) / (select `numberOfWords` from `key_hash` where `primary_key`=`key_word.indexVal`)*100) FROM `key_word` WHERE `hashed_word` IN ('001','01v') GROUP BY `indexVal` LIMIT 100;
How can I do this job?
You can use a JOIN instead of a sub-query:
SELECT w.indexVal
, COUNT(w.indexVal)
, COUNT(w.indexVal) / MAX(h.totalNumberOfWords) * 100
FROM key_word w
INNER JOIN key_hash h ON h.primary_key = w.indexVal
WHERE w.hashed_word IN ('001','01v')
GROUP BY w.indexVal
LIMIT 100
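An equivalent formulation, offered only as a sketch, counts the matches per indexVal first and joins the much smaller result to key_hash afterwards, which avoids wrapping totalNumberOfWords in MAX(). The aliases `c` and `h` and the column names `hits` and `hit_percentage` are illustrative; the ORDER BY is an addition so that LIMIT returns a predictable set of rows.

```sql
SELECT
    c.indexVal,
    c.hits,
    c.hits / h.totalNumberOfWords * 100 AS hit_percentage
FROM (
    -- one row per indexVal with the number of matching hashed words
    SELECT indexVal, COUNT(*) AS hits
    FROM key_word
    WHERE hashed_word IN ('001', '01v')
    GROUP BY indexVal
) AS c
INNER JOIN key_hash AS h ON h.primary_key = c.indexVal
ORDER BY c.indexVal
LIMIT 100;
```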

Is it possible to merge two tables by primary key?

I have two tables, which I need to merge, and they are:
CREATE TABLE IF NOT EXISTS `legacy_bookmarks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` text,
`title` text,
`snippet` text,
`datetime` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `datetime` (`datetime`),
FULLTEXT KEY `title` (`title`,`snippet`)
)
And:
CREATE TABLE IF NOT EXISTS `legacy_links` (
`id` mediumint(11) NOT NULL AUTO_INCREMENT,
`user_id` mediumint(11) NOT NULL,
`bookmark_id` int(11) NOT NULL,
`status` enum('public','private') NOT NULL DEFAULT 'public',
UNIQUE KEY `id` (`id`),
KEY `bookmark_id` (`bookmark_id`)
)
As you can see, "legacy_links" contains the ID for "legacy_bookmarks". Am I able to merge the two, based on this relationship?
I can easily change the name of the ID column in "legacy_bookmarks" to "bookmark_id", if that makes things any easier.
Just so you know, the order of the columns, and their types, must be exact, because the data from this combined table is then to be imported into the new "bookmarks" table.
Also, I'd need to be able to include additional columns (a "modification" column, populated with the "datetime" values), and change the order of the ones I have.
Any takers?
[Up to you to change the order of the columns]
CREATE TABLE `legacy_linkss` AS
SELECT l.id, b.url, b.title, b.snippet, b.datetime AS modification, l.user_id, l.status
FROM
`legacy_links` l
JOIN `legacy_bookmarks` b ON b.id = l.bookmark_id
;
Afterwards, once you have checked consistency and manually added the constraints, you may:
DROP TABLE `legacy_links`;
DROP TABLE `legacy_bookmarks`;
RENAME TABLE `legacy_linkss` TO `legacy_links`;
Yes, it's called a join, and you would do it like so:
SELECT *
FROM legacy_bookmarks lb
INNER JOIN legacy_links ll ON ll.bookmark_id = lb.id