I am doing an insert from an imported table and adding data into multiple tables using MySQL.
Basically, when doing the insert there are some null fields, because the data has been imported from a CSV.
What I want to do is extract the data without creating multiple null entries. An example is when adding contacts that have no details: I want just one entry in the table, which can then be bound to the id within that table.
How can I do this?
My current code is
INSERT INTO Contact (FirstName, Surname, Position, TelephoneNo, EmailAddress, RegisteredDate)
SELECT DISTINCT Import.FirstName, Import.SecondName, Import.JobTitle,
       Import.ContactTelNumber, Import.EmailAddress, Import.RegistrationDate
FROM Import;
This basically imports everything and does no checks, but where can I add a check for this?
It's hard to infer exactly what you mean from your description. It would help if you showed a couple of example lines, one that you want included and one that you want to be excluded.
But you can add a variety of conditions in the WHERE clause of your SELECT. For example, if you just want to make sure that at least one column in Import is non-null, you could do this:
INSERT INTO Contact(FirstName, Surname, Position, TelephoneNo,
EmailAddress, RegisteredDate)
SELECT DISTINCT FirstName, SecondName, JobTitle,
ContactTelNumber, EmailAddress, RegistrationDate
FROM Import
WHERE COALESCE(FirstName, SecondName, JobTitle, ContactTelNumber,
EmailAddress, RegistrationDate) IS NOT NULL
COALESCE() is a function that accepts a variable number of arguments, and returns the first non-null argument. If all the arguments are null, it returns null. So if we coalesce all the columns, and we get a null, then we know that all the columns are null, and we exclude that row.
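For example, a quick way to see the behaviour (the literals here are just illustrative):

SELECT COALESCE(NULL, NULL, 'first non-null');  -- returns 'first non-null'
SELECT COALESCE(NULL, NULL, NULL);              -- returns NULL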
Re your comment:
Okay, it sounds like you want a unique constraint over the whole row, and you want to copy only rows that don't violate the unique constraint.
One way to accomplish this would be the following:
ALTER TABLE Contact ADD UNIQUE KEY (FirstName, Surname, Position, TelephoneNo,
EmailAddress, RegisteredDate);
INSERT IGNORE INTO Contact(FirstName, Surname, Position, TelephoneNo,
EmailAddress, RegisteredDate)
SELECT DISTINCT FirstName, SecondName, JobTitle,
ContactTelNumber, EmailAddress, RegistrationDate
FROM Import;
The INSERT IGNORE means that if a row causes an error such as a duplicate-key violation, that row is simply not inserted, but the insert is not aborted for the other rows.
The unique constraint creates an index, so it will take some time to run that ALTER TABLE, depending on the size of your table.
Also, it may be impractical to have a key containing many columns. MySQL indexes are limited to 16 columns, and depending on the storage engine there is also a limit on the total key length in bytes. However, I would expect that what you really want is to restrict to one row per EmailAddress or some other subset of the columns.
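If EmailAddress is the natural identifier, a narrower constraint along these lines may be all you need (a sketch, assuming EmailAddress alone should be unique):

ALTER TABLE Contact ADD UNIQUE KEY (EmailAddress);

The INSERT IGNORE above stays the same; duplicate email addresses then collapse to a single row.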
I'm trying to write a SQL statement that replaces instead of updates.
The columns of my table look like this:
(id, block, region, login, password, email, business, firstname, name, version, updatable, bodyshop_id, mac, register_date, lastvisite_date, enum_test, address1)
and when I run a statement like this:
REPLACE INTO `users` (`login`, `firstname`, `region`, `address1`, `enum_test`, `block`, `id`) VALUES ('Samira GO', 'Samira', 'all', 'lmklm', '1', '0', '2')
Samira has id number two (the target of the replace).
The row with id number one is dropped by the statement.
(The primary key of the table is id+login+email.)
(When I run this statement, MySQL tells me that 3 rows are affected.)
What I want to ask is: id, login, and email are the primary key values, so I don't understand how the statement could change a row with a different id or login.
From the MySQL REPLACE doc:
The REPLACE statement returns a count to indicate the number of rows affected. This is the sum of the rows deleted and inserted. If the count is 1 for a single-row REPLACE, a row was inserted and no rows were deleted. If the count is greater than 1, one or more old rows were deleted before the new row was inserted. It is possible for a single row to replace more than one old row if the table contains multiple unique indexes and the new row duplicates values for different old rows in different unique indexes.
So, it sounds like one row was inserted and two rows were deleted.
Examine your table definition and see if there are any UNIQUE indexes other than the PRIMARY KEY. Note also that while you say the primary key is id, login, email, your query doesn't specify email. If two rows existed that matched id and login but had different email, they may have both been deleted.
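You can inspect the table's indexes with either of these:

SHOW CREATE TABLE users;
SHOW INDEX FROM users;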
You may also want to consider whether what you really wanted is an INSERT ... ON DUPLICATE KEY UPDATE instead of a REPLACE. REPLACE functions more like a DELETE followed by an INSERT.
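As a rough sketch of the INSERT ... ON DUPLICATE KEY UPDATE alternative (the columns updated on a collision are only an example; adjust them to whatever you actually want changed):

INSERT INTO users (login, firstname, region, address1, enum_test, block, id)
VALUES ('Samira GO', 'Samira', 'all', 'lmklm', '1', '0', '2')
ON DUPLICATE KEY UPDATE
    firstname = VALUES(firstname),
    region    = VALUES(region),
    address1  = VALUES(address1);

Unlike REPLACE, this never deletes rows; it only updates the existing row that caused the key collision.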
I have a database table that was generated by importing several thousand text documents, each very large. For some reason, some files were imported multiple times.
I am trying to remove duplicate rows by using following query:
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX);
but I was getting an error
1062 - Duplicate entry
Apparently, IGNORE has been deprecated.
How can I remove duplicates from my database?
I guess I have to do a DELETE with a JOIN but I can't figure out the code.
The table is InnoDB and currently has about 40,000,000 rows (there should be about 17,000,000). Each row has a primary key.
Considering the size, I am hesitant to temporarily change the table to MyISAM.
Each row has a primary key
Is it a unique number?
Create an aux table like this (assuming ID is the PK):
-- keeps the lowest ID for each distinct combination of the listed columns
create table mytable_aux as (
select LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX, MIN(ID) AS id
from mytable
group by LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX);
Then delete everything that is not in aux table:
delete from mytable where id not in (select aux.id from mytable_aux aux) ;
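With roughly 40 million rows, NOT IN (SELECT ...) can be slow; a LEFT JOIN form of the same delete is a possible alternative (a sketch, under the same assumption that ID is the primary key and is stored in mytable_aux as id):

delete mytable
from mytable
left join mytable_aux aux on mytable.id = aux.id
where aux.id is null;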
Assuming it is just one table and you have the SQL dump available...
CREATE the table with all the relationships established but no data inserted. Keep the INSERT statements stored in a separate .sql file.
Change all the INSERT statements to INSERT IGNORE.
Import the updated .sql file containing only the INSERT IGNORE statements. The duplicates will be automatically ignored.
Please note that, without comparing manually, you won't be able to figure out which or how many records were ignored.
However, if you're absolutely sure that you really don't need the duplicates based on the relationships defined on the table, then this approach works fairly well.
Also, if you'd like to do the same with multiple tables, you'll have to CREATE all the tables at the start, define the foreign keys / dependencies, and, most importantly, arrange the new .sql file so that the table with no dependencies gets its INSERT statements loaded first. Likewise, the last set of INSERT statements will be for the table with the most dependencies.
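For example, with a hypothetical pair of tables where orders references customers, the reordered dump would look roughly like this:

-- table with no dependencies first
INSERT IGNORE INTO customers (id, name) VALUES (1, 'Acme');
-- then the tables that reference it
INSERT IGNORE INTO orders (id, customer_id, total) VALUES (10, 1, 99.50);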
Hope that helps.
If those are the only fields in your table, you can always:
create table temp_unique as
select distinct LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX
from mytable
then rename (or drop, if you dare) mytable, rename temp_unique to mytable, and then recreate your indexes (make sure to recreate any other indexes, foreign keys, and so on that already existed).
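The swap itself can be done in a single statement, for example:

RENAME TABLE mytable TO mytable_old, temp_unique TO mytable;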
If you're working on a live table, you'll have to delete the underlying records one at a time. That's quite a bit different: add a uid, then perform the deletes, as sketched below. If that's your situation, let us know and we can refactor.
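A rough sketch of that approach, assuming the table does not already have a numeric key you can use (if it does, skip the ALTER and use that column in place of uid): add a surrogate key, then delete every row that has a lower-uid twin.

ALTER TABLE mytable ADD COLUMN uid INT AUTO_INCREMENT PRIMARY KEY;

DELETE t1
FROM mytable t1
JOIN mytable t2
  ON  t1.LASTNAME    = t2.LASTNAME
  AND t1.FIRSTNAME   = t2.FIRSTNAME
  AND t1.HOUSENUMBER = t2.HOUSENUMBER
  AND t1.STREET      = t2.STREET
  AND t1.CITY        = t2.CITY
  AND t1.ZIP         = t2.ZIP
  AND t1.DOB         = t2.DOB
  AND t1.SEX         = t2.SEX
  AND t1.uid         > t2.uid;

Note that NULLs never compare equal, so rows that differ only by NULLs in these columns won't be treated as duplicates by this join.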
I have duplicated a table to create an archive table, and for some reason I can't make the append query work.
This is the SQL code:
INSERT INTO tblArc
SELECT tblCostumer.*
FROM tblCostumer, tblArc
WHERE (((tblArc.num)=[Enter Client Number you'd like to move to the archive]));
When I enter the customer number, it says "You are about to append 0 row(s)" instead of appending 1 row.
That FROM clause gives you a cross join, which is probably not what you really want ...
FROM tblCostumer, tblArc
Instead SELECT only from tblCostumer based on its primary key. For example, if the primary key is tblCostumer.num ...
INSERT INTO tblArc
SELECT tblCostumer.*
FROM tblCostumer
WHERE tblCostumer.num=[Enter Client Number you'd like to move to the archive];
And if the structures of the two tables are not the same, list the specific fields instead of ...
INSERT INTO tblArc
SELECT tblCostumer.*
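For example, with a made-up subset of matching fields (the column names here are only for illustration):

INSERT INTO tblArc (num, FirstName, Surname)
SELECT tblCostumer.num, tblCostumer.FirstName, tblCostumer.Surname
FROM tblCostumer
WHERE tblCostumer.num=[Enter Client Number you'd like to move to the archive];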
I have been trying to insert / update records in a MySQL table. I cannot use ON DUPLICATE KEY because what I am matching on has nothing to do with the primary key.
Basically, I have to update a record in the database:
INSERT INTO table (city, state, gender, value) VALUES ("delhi","delhi","M",22)
If a record with that city, state, and gender already exists, then simply overwrite the value.
Can I achieve this without sending two queries from the programming language?
Actually, you can still use ON DUPLICATE KEY; just add a unique index on the following columns, e.g.
ALTER TABLE tbl_name ADD UNIQUE index_name (city, state, gender)
Your query will now be:
INSERT INTO tbl_name (city, state, gender, value)
VALUES ('delhi','delhi','M', 22)
ON DUPLICATE KEY UPDATE value = 22;
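If you'd rather not repeat the literal 22 in two places, the VALUES() function can echo the value from the insert list:

INSERT INTO tbl_name (city, state, gender, value)
VALUES ('delhi','delhi','M', 22)
ON DUPLICATE KEY UPDATE value = VALUES(value);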
Keep in mind that constructs such as ON DUPLICATE KEY and REPLACE INTO were designed precisely to avoid having to send two queries. The only other way to avoid two queries from your application layer is to declare a stored routine in the database that does the same thing.
Therefore, add either a UNIQUE(city, state, gender) key or a primary key that spans the same columns. The difference between the two lies in the value range of each column: primary key columns are forced to be NOT NULL, whereas UNIQUE allows columns to be NULL.
The difference is subtle but can sometimes lead to unexpected results, because NULL values are never considered equal to each other and therefore do not count as duplicates. For example, let's say you have this data in your database:
nr | name
123 | NULL
If you try to insert another (123, NULL) it will not complain when you use UNIQUE(nr,name); this may seem like a bug, but it's not.
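A quick throwaway demo of that behaviour (table and values made up for illustration):

CREATE TABLE demo (nr INT, name VARCHAR(20), UNIQUE KEY (nr, name));
INSERT INTO demo VALUES (123, NULL);
INSERT INTO demo VALUES (123, NULL);   -- accepted: NULLs never count as duplicates
INSERT INTO demo VALUES (123, 'abc');
INSERT INTO demo VALUES (123, 'abc');  -- rejected: error 1062, duplicate entry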
I'm using MySQL 4.1. Some tables have duplicate entries that go against the constraints.
When I try to group rows, MySQL doesn't recognise the rows as being similar.
Example:
Table A has a column "Name" with the Unique property.
The table contains one row with the name 'Hach?' and one row with the same name but a square at the end instead of the '?' (a character I can't reproduce in this text field).
A "GROUP BY" on these 2 rows returns 2 separate rows.
This causes several problems, including the fact that I can't export and reimport the database. On reimport, an error says that an INSERT has failed because it violates a constraint.
In theory I could try to import, wait for the first error, fix the import script and the original DB, and repeat. In practice, that would take forever.
Is there a way to list all the anomalies, or to force the database to recheck constraints (and list all the values/rows that go against them)?
I can supply the .MYD file if it can be helpful.
To list all the anomalies:
SELECT name, count(*) FROM TableA GROUP BY name HAVING count(*) > 1;
There are a few ways to tackle deleting the dups and your path will depend heavily on the number of dups you have.
See this SO question for ways of removing those from your table.
Here is the solution I provided there:
-- Setup for example
create table people (fname varchar(10), lname varchar(10));
insert into people values ('Bob', 'Newhart');
insert into people values ('Bob', 'Newhart');
insert into people values ('Bill', 'Cosby');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Adam', 'Sandler');
-- Show table with duplicates
select * from people;
-- Create table with one version of each duplicate record
create table dups as
select distinct fname, lname, count(*)
from people group by fname, lname
having count(*) > 1;
-- Delete all matching duplicate records
delete people from people inner join dups
on people.fname = dups.fname AND
people.lname = dups.lname;
-- Insert single record of each dup back into table
insert into people select fname, lname from dups;
-- Show Fixed table
select * from people;
Create a new table, select all rows grouped by the unique key (in the example, the column Name), and insert them into the new table.
To find out what that character is, run the following query:
SELECT HEX(Name) FROM TableName WHERE Name LIKE 'Hach%'
You will see the hex value of that 'square' character.
If that character is 'x', you could update like this (but if that column is unique you will get some errors):
UPDATE TableName SET Name=TRIM(TRAILING 'x' FROM Name);
I'll assume this is a random MySQL 4.1 bug. Some values just change on their own for no particular reason, even if they violate MySQL constraints, and MySQL simply ignores those violations.
To solve my problem, I will write a program that tries to reinsert every line of data into the same table (to be precise: another table with the same characteristics) and logs every failure.
I will leave the incident open for a while in case someone else runs into the same problem or finds a more practical solution.