Loading data from a text file into a MySQL database while eliminating duplicates - mysql

I want to load data from a text file into a database, and if a row already exists I need to skip it while loading.
I am using the query below to load the data from the text file into the MySQL database:
"Load data infile 'F:/wbrdata.txt' into table wbrdatatable
fields terminated by ','
optionally enclosed by ""
lines terminated by '\r\n'
Ignore 1 lines (channel, time, pulserate, dwellid, targetid);"
It appends the data to the existing table data. I want to skip the rows that already exist in the table (duplicates between the table and the file) while loading into the database.
How can I achieve this?
thank you
regards
sankar

Try loading the text file into a temporary table with the same structure as the target table, then remove the duplicates from the temporary table and copy the rest to the target table.
Example (suppose that wbrdatatable_temp is a temporary table holding all data from the text file):
CREATE TABLE wbrdatatable(
id INT(11) NOT NULL AUTO_INCREMENT,
column1 VARCHAR(255) DEFAULT NULL,
PRIMARY KEY (id)
);
INSERT INTO wbrdatatable VALUES
(1, '111'),
(2, '222'),
(3, '333'),
(4, '444'),
(5, '555');
CREATE TABLE wbrdatatable_temp(
id INT(11) NOT NULL AUTO_INCREMENT,
column1 VARCHAR(255) DEFAULT NULL,
PRIMARY KEY (id)
);
INSERT INTO wbrdatatable_temp VALUES
(1, '111'),
(2, '222'),
(10, '100'), -- new record that should be added
(11, '200'); -- new record that should be added
-- Copy only new records!
INSERT INTO wbrdatatable
SELECT t1.* FROM wbrdatatable_temp t1
LEFT JOIN wbrdatatable t2
ON t1.id = t2.id AND t1.column1 = t2.column1
WHERE t2.id IS NULL;
-- Test result
SELECT * FROM wbrdatatable;
+----+---------+
| id | column1 |
+----+---------+
|  1 | 111     |
|  2 | 222     |
|  3 | 333     |
|  4 | 444     |
|  5 | 555     |
| 10 | 100     | -- only new record is added
| 11 | 200     | -- only new record is added
+----+---------+
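To populate the staging table from the file in the first place, the LOAD DATA INFILE statement from the question can simply target the temporary table instead; a minimal sketch, assuming wbrdatatable_temp has the same columns as wbrdatatable:
-- Load the file into the staging table (path, options and column list taken from the question)
LOAD DATA INFILE 'F:/wbrdata.txt' INTO TABLE wbrdatatable_temp
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(channel, time, pulserate, dwellid, targetid);
-- Then run the INSERT ... LEFT JOIN shown above and drop the staging table
DROP TEMPORARY TABLE wbrdatatable_temp;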

Try this logic:
1. Upload the text file data.
2. Check each record with a SELECT statement against your database (see the SQL sketch below):
if (recordExists == true)
    do not save
else
    save
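In SQL, that per-record check can be expressed in a single statement; a minimal sketch, assuming the table and column names from the original question (the literal values stand in for one record from the file):
-- Insert the row only when an identical row is not already present
INSERT INTO wbrdatatable (channel, time, pulserate, dwellid, targetid)
SELECT 'ch1', '12:00:00', 10, 7, 42 FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM wbrdatatable
    WHERE channel = 'ch1' AND time = '12:00:00'
      AND pulserate = 10 AND dwellid = 7 AND targetid = 42
);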
Regards

Related

insert ... on duplicate key update with null values for not-null column

I have the following example table:
+-------+------+------+-----+
| Field | Type | Null | PK  |
+-------+------+------+-----+
| id    | int  | NO   | PRI |
| numb  | int  | NO   |     |
| text  | text | NO   |     |
+-------+------+------+-----+
I'm trying to update several rows with one query:
INSERT INTO example_table
VALUES (1, 100, null), (2, 100, 'abc')
ON DUPLICATE KEY UPDATE
numb = VALUES(numb), text = IFNULL(VALUES(text), text);
MySQL doesn't allow this query to execute because one of the VALUES blocks contains a NULL value for a NOT NULL column (the text column). But I am only passing data for existing rows, which will always trigger the ON DUPLICATE KEY section, and that section has the additional NULL check.
Is there any way I can disable this check?
I know I can use multiple UPDATE statements with different sets of columns, but I am interested specifically in the INSERT ... ON DUPLICATE KEY UPDATE query.
As @Shadow suggested, using an empty string does the job just fine:
INSERT INTO example_table
VALUES (1, 100, ''), (2, 100, 'abc')
ON DUPLICATE KEY UPDATE
numb = VALUES(numb), text = IF(VALUES(text) = '', text, VALUES(text));
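As a side note, VALUES() inside ON DUPLICATE KEY UPDATE is deprecated as of MySQL 8.0.20; a minimal sketch of the same statement using the row-alias syntax instead, assuming MySQL 8.0.19 or newer:
INSERT INTO example_table
VALUES (1, 100, ''), (2, 100, 'abc') AS new_row
ON DUPLICATE KEY UPDATE
    numb = new_row.numb,
    text = IF(new_row.text = '', text, new_row.text);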

Insert into mysql table only if those values are not in table

I have already searched for an answer but I can't find one that fits my situation.
I have a table called Names like this:
ID  NAME   Age
1   Paula  20
2   Mark   17
And I want to run this SQL:
INSERT INTO names (name, age) VALUES ('Chriss', 15); -- should be inserted
INSERT INTO names (name, age) VALUES ('Mark', 17);   -- should be ignored
INSERT INTO names (name, age) VALUES ('Andrea', 20); -- should be inserted
So how can I ignore the second insert query?
Create a constraint that requires NAME and Age to be unique in the table:
ALTER TABLE `tablename` ADD UNIQUE `unique_index`(`NAME`, `Age`);
You would either need to add a UNIQUE constraint or check the data at run time (if you don't have permission to change the table schema):
ALTER TABLE `Table_name`
ADD UNIQUE INDEX (`NAME`, `AGE`);
You can use:
INSERT INTO names(name,age)
SELECT * FROM (SELECT 'Chriss', 15) AS tmp
WHERE NOT EXISTS (
SELECT name FROM names WHERE name = 'Chriss' AND age = 15
) LIMIT 1;
Another way is to just make the columns name and age UNIQUE so the query fails.
Change your query to this:
INSERT INTO names (name, age)
SELECT 'Chriss', 15 FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM names WHERE `name` = 'Chriss');
INSERT INTO names (name, age)
SELECT 'Mark', 17 FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM names WHERE `name` = 'Mark');
INSERT INTO names (name, age)
SELECT 'Andrea', 20 FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM names WHERE `name` = 'Andrea');
First create a unique constraint for the columns NAME and Age:
ALTER TABLE names ADD UNIQUE un_name_age (`NAME`, `Age`);
and then use INSERT IGNORE to insert the rows:
Insert ignore into names(name,age) values
("Chriss",15),
("Mark",17),
("Andrea",20);
So if you try to insert a duplicate name the error will just be ignored and the statement will continue with the next row to insert.
See the demo.
Result:
| ID  | NAME   | Age |
| --- | ------ | --- |
| 1   | Paula  | 20  |
| 2   | Mark   | 17  |
| 3   | Chriss | 15  |
| 4   | Andrea | 20  |

Moving hex data from a varchar type field to bigint type (mysql)

I am trying to insert data from one table into another, and each table has an 'id' field that should be the same, but is stored as a different datatype. This 'id' field should represent the same unique value, allowing me to update from one to another.
In one table (the new.table one) the 'id' is stored as datatype varchar(35), and in the old.table it is datatype bigint(20) -- I believe this older table represents the integer version of the hex value stored in the new one. I am trying to update data from the new.table back into the old.table.
After searching about this for a while, I tried this simple MySQL query, but it fails:
INSERT INTO old.table (id, field2)
SELECT CAST(CONV(id,16,10) AS UNSIGNED INTEGER), field2
FROM new.table;
It fails with this error:
Out of range value for column 'id' at row 1
I have also tried a simple
SELECT CAST(CONV(id, 16,10) AS UNSIGNED INTEGER) from new.table;
And the result is mostly all the same integer, even though each hex value in new.table is unique. I've googled this for two days and could really use some help figuring out what is wrong. Thanks.
EDIT: Some of the example data from console of output of SELECT ID from new.table:
| 1d2353560110956e1b3e8610a35d903a |
| ec526762556c4f92a3ea4584a7cebfe1.11 |
| 34b8c838c18a4c5690514782b7137468.16 |
| 1233fa2813af44ca9f25bb8cac05b5b5.16 |
| 37f396d9c6e04313b153a34ab1e80304.16 |
The problem is that the id values are too large: a 32-character hex string represents up to 128 bits, which does not fit in a 64-bit BIGINT. MySQL returns the upper limit value (18446744073709551615) when the conversion overflows.
Query 1:
select CONV('FFFFFFFFFFFFFFFF1',16,10)
Results:
| CONV('FFFFFFFFFFFFFFFF1',16,10) |
|---------------------------------|
| 18446744073709551615 |
Query 2:
select CONV('FFFFFFFFFFFFFFFF',16,10)
Results:
| CONV('FFFFFFFFFFFFFFFF',16,10) |
|--------------------------------|
| 18446744073709551615 |
I would suggest implementing the id-mapping logic for your case in a function instead of using the CONV function.
EDIT
I would use a variable to generate a new row number and insert it into the old table.
CREATE TABLE new(
Id varchar(35)
);
insert into new values ('1d2353560110956e1b3e8610a35d903a');
insert into new values ('ec526762556c4f92a3ea4584a7cebfe1.11');
insert into new values ('34b8c838c18a4c5690514782b7137468.16');
insert into new values ('1233fa2813af44ca9f25bb8cac05b5b5.16');
insert into new values ('37f396d9c6e04313b153a34ab1e80304.16');
CREATE TABLE old(
Id bigint(20),
val varchar(35)
);
INSERT INTO old (id, val)
SELECT rn, id
FROM (
    SELECT *, (@Rn := @Rn + 1) rn
    FROM new CROSS JOIN (SELECT @Rn := 0) v
) t1;
Query 1:
SELECT * FROM old
Results:
| Id | val                                 |
|----|-------------------------------------|
| 1  | 1d2353560110956e1b3e8610a35d903a    |
| 2  | ec526762556c4f92a3ea4584a7cebfe1.11 |
| 3  | 34b8c838c18a4c5690514782b7137468.16 |
| 4  | 1233fa2813af44ca9f25bb8cac05b5b5.16 |
| 5  | 37f396d9c6e04313b153a34ab1e80304.16 |

How to change the data of a column in mysql?

I need to change some data in one column of table1 and then copy some of the table1 data to new_table. Here are my example tables:
table1
id | url | user1_ign | user2_ign | message | fields that are not needed anymore
new_table
id | url | user1_ign | user2_ign | message
Basically, table1 has fields that are not in new_table. My problem is that I do not know how to change the data in a field while copying it to a new table (I have already searched here).
I need to change the data in the url column. Here is the layout:
table1
id | url | user1_ign | user2_ign | message | some field
1 | jj-HasZNsh | jj | gg | hello dude! | ...
new_table
id | url | user1_ign | user2_ign | message
1 | jj-gg-HasZNsh | jj | gg | hello dude!
That is what I need to do: as you can see, I need to change the url in new_table based on user1_ign and user2_ign. Is there a way to solve this?
UPDATE
I have this kind of url in table1: number-HasZNsh or alphabet-HasZNsh.
I need them to become like this in new_table:
number-HasZNsh -> ign1-ign2-HasZNsh
alphabet-HasZNsh -> ign1-ign2-HasZNsh
This is what I need to do specifically.
You can combine an INSERT statement for your destination table with a SELECT that supplies the values to be inserted. For your url field as specified above, you can use REPLACE to replace a string inside a string:
INSERT INTO
    `new_table` (id, url, user1_ign, user2_ign, message)
SELECT
    id,
    REPLACE(url, '-', '-gg-') `url`,
    user1_ign,
    user2_ign,
    message
FROM
    `table1`
If you wish to grab data from another field for the gg part of the REPLACE line, you would use:
INSERT INTO
    `new_table` (id, url, user1_ign, user2_ign, message)
SELECT
    id,
    REPLACE(url, '-', CONCAT('-', user2_ign, '-')) `url`,
    user1_ign,
    user2_ign,
    message
FROM
    `table1`
For more information on the command syntax as used above:
REPLACE
CONCAT
INSERT INTO table1 FROM table2

Fastest way to diff datasets and update/insert lots of rows into large MySQL table?

The schema
I have a MySQL database with one large table (5 million rows say). This table has several fields for actual data, an optional comment field, and fields to record when the row was first added and when the data is deleted. To simplify to one "data" column, it looks a bit like this:
+----+------+---------+---------+---------+
| id | data | comment | created | deleted |
+----+------+---------+---------+---------+
|  1 | val1 | NULL    |       1 |       2 |
|  2 | val2 | nice    |       1 |    NULL |
|  3 | val3 | NULL    |       2 |    NULL |
|  4 | val4 | NULL    |       2 |       3 |
|  5 | val5 | NULL    |       3 |    NULL |
+----+------+---------+---------+---------+
This schema allows us to look at any past version of the data thanks to the created and deleted fields e.g.
SET @version = 1;
SELECT data, comment FROM MyTable
WHERE created <= @version AND
      (deleted IS NULL OR deleted > @version);
+------+---------+
| data | comment |
+------+---------+
| val1 | NULL    |
| val2 | nice    |
+------+---------+
The current version of the data can be fetched more simply:
SELECT data, comment FROM MyTable WHERE deleted IS NULL;
+------+---------+
| data | comment |
+------+---------+
| val2 | nice    |
| val3 | NULL    |
| val5 | NULL    |
+------+---------+
DDL:
CREATE TABLE `MyTable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`data` varchar(32) NOT NULL,
`comment` varchar(32) DEFAULT NULL,
`created` int(11) NOT NULL,
`deleted` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `data` (`data`,`comment`)
) ENGINE=InnoDB;
Updating
Periodically a new set of data and comments arrives. This can be fairly large, half a million rows say. I need to update MyTable so that this new data set is stored in it. This means:
"Deleting" old rows. Note the "scare quotes" - we don't actually delete rows from MyTable. We have to set the deleted field to the new version N. This has to be done for all rows in MyTable that are in the previous version N-1, but are not in the new set.
Inserting new rows. All rows that are in the new set and are not in version N-1 in MyTable must be added as new rows with the created field set to the new version N, and deleted as NULL.
Some rows in the new set may match existing rows in MyTable at version N-1 in which case there is nothing to do.
My current solution
Given that we have to "diff" two sets of data to work out the deletions, we can't just read over the new data and do insertions as appropriate. I can't think of a way to do the diff operation without dumping all the new data into a temporary table first. So my strategy goes like this:
-- temp table uses MyISAM for speed.
CREATE TEMPORARY TABLE tempUpdate (
`data` char(32) NOT NULL,
`comment` char(32) DEFAULT NULL,
PRIMARY KEY (`data`),
KEY (`data`, `comment`)
) ENGINE=MyISAM;
-- Bulk insert thousands of rows
INSERT INTO tempUpdate VALUES
('some new', NULL),
('other', 'comment'),
...
-- Start transaction for the update
BEGIN;
SET @newVersion = 5; -- Worked out out-of-band
-- Do the "deletions". The join selects all non-deleted rows in MyTable for
-- which the matching row in tempUpdate does not exist (tempUpdate.data is NULL)
UPDATE MyTable
LEFT JOIN tempUpdate
ON MyTable.data = tempUpdate.data AND
MyTable.comment <=> tempUpdate.comment
SET MyTable.deleted = @newVersion
WHERE tempUpdate.data IS NULL AND
MyTable.deleted IS NULL;
-- Delete all rows from the tempUpdate table that match rows in the current
-- version (deleted is null) to leave just new rows.
DELETE tempUpdate.*
FROM MyTable RIGHT JOIN tempUpdate
ON MyTable.data = tempUpdate.data AND
MyTable.comment <=> tempUpdate.comment
WHERE MyTable.id IS NOT NULL AND
MyTable.deleted IS NULL;
-- All rows left in tempUpdate are new so add them.
INSERT INTO MyTable (data, comment, created)
SELECT DISTINCT tempUpdate.data, tempUpdate.comment, @newVersion
FROM tempUpdate;
COMMIT;
DROP TEMPORARY TABLE IF EXISTS tempUpdate;
The question (at last)
I need to find the fastest way to do this update operation. I can't change the schema for MyTable, so any solution must work with that constraint. Can you think of a faster way to do the update operation, or suggest speed-ups to my existing method?
I have a Python script for testing the timings of different update strategies and checking their correctness over several versions. It's fairly long, but I can edit it into the question if people think it would be useful.
One speed-up for the loading step is LOAD DATA INFILE.
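For example, the multi-row INSERT into tempUpdate above could be replaced by a bulk file load; a minimal sketch, assuming the new data set has been exported to a CSV file (the file name and format are placeholders):
-- Bulk-load the new data set into the staging table instead of a multi-row INSERT
LOAD DATA LOCAL INFILE '/tmp/new_dataset.csv'
INTO TABLE tempUpdate
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(data, comment);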
Insofar as I've experienced audit logging, you'll be better off with two tables, e.g.:
yourtable (id, col1, col2, version) -- pkey on id
yourtable_logs (id, col1, col2, version) -- pkey on (id, version)
Then add an update trigger on yourtable that inserts the previous version into yourtable_logs.
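A minimal sketch of such a trigger, assuming the two-table layout above:
DELIMITER $$
-- Before each update, copy the current (soon-to-be previous) version into the log table
CREATE TRIGGER yourtable_log_update
BEFORE UPDATE ON yourtable
FOR EACH ROW
BEGIN
    INSERT INTO yourtable_logs (id, col1, col2, version)
    VALUES (OLD.id, OLD.col1, OLD.col2, OLD.version);
END$$
DELIMITER ;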