Make duplicates in varchar field not duplicate by adding an incrementing number - mysql

I'm upgrading a DB schema. A table is divided into 2 tables. These two tables are linked and the only way to link them is using a varchar name field in the original table.
Problem is that customer could have duplicate names, since name is not the Primary/Unique Key.
I came here looking for ideas to make these names unique so it can be used as a key when moving the data to the two new linked tables.
As I'm writing this, I had the idea of CONCATing the current PK to name (as in CONCAT(name,id) as newName) and use this as key.
There's also a unique code field, but this only goes into one of the new tables.
Example Schema:
tableA
a_id INT(11) PRI AUTO_INCREMENT
code VARCHAR(10) UNIQUE
name VARCHAR(30) NOT NULL
newTableB
b_id INT(11) PRI AUTO_INCREMENT
name VARCHAR(30)
newTableC
c_id INT(11) PRI AUTO_INCREMENT
b_id INT(11) FK->newTableB
code VARCHAR(10) UNIQUE
What I want:
Generate New b_id's (auto_increment)
newTableB.name imported from tableA.name
newTableC.code imported from tableA.code
newTableC.c_id imported from tableA.a_id
a_id is an FK in another fourth table that's staying the same. After the changes above, c_id will now be valid FK in this fourth table.
My Challenge:
How do I insert into these new tables while keeping the original name<->code<->a_id relationship and transferring tableA.a_id values to newTableC.c_id? I don't care if b_id gets new auto-increment values.
I'm not sure how clearly I'm describing the problem, so feel free to ask any questions.
Thanks
Dan

I've usually done a task like this in several parts. First, insert into table b, but put the id field from table a in the name column instead of the name. Then insert the data into table c from table a. Then update the table b name field by joining to table a on the a.id and b.name fields. MySQL does allow a join in an UPDATE (the multiple-table UPDATE syntax), so that last step can be written as a single statement once the data is there.
Alternatively, I temporarily add a column to table b called tableid. Then I put the data in table b including the a id, and use that to join to table a to get the b.id and table a data for table c. When it is done, I drop the tableid column from table b.
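A minimal sketch of the temporary-column approach, assuming the schema from the question. It uses Python's sqlite3 module as a stand-in for MySQL so it can run anywhere; the column name tmp_a_id and the sample rows are made up for illustration, and the MySQL DDL would differ slightly (AUTO_INCREMENT instead of AUTOINCREMENT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (
        a_id INTEGER PRIMARY KEY,
        code VARCHAR(10) UNIQUE,
        name VARCHAR(30) NOT NULL
    );
    -- sample data (hypothetical): two customers share the name 'Dan'
    INSERT INTO tableA (a_id, code, name) VALUES
        (10, 'X1', 'Dan'), (20, 'X2', 'Dan'), (30, 'X3', 'Eve');

    CREATE TABLE newTableB (
        b_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(30),
        tmp_a_id INTEGER            -- temporary link back to tableA
    );
    CREATE TABLE newTableC (
        c_id INTEGER PRIMARY KEY,
        b_id INTEGER REFERENCES newTableB(b_id),
        code VARCHAR(10) UNIQUE
    );

    -- step 1: one newTableB row per tableA row, remembering where it came from
    INSERT INTO newTableB (name, tmp_a_id)
        SELECT name, a_id FROM tableA;

    -- step 2: join on the temporary column, never on the (duplicated) name
    INSERT INTO newTableC (c_id, b_id, code)
        SELECT a.a_id, b.b_id, a.code
        FROM tableA a
        JOIN newTableB b ON b.tmp_a_id = a.a_id;
""")
# step 3 (not run here): in MySQL, ALTER TABLE newTableB DROP COLUMN tmp_a_id;

rows = conn.execute("""
    SELECT c.c_id, c.code, b.name
    FROM newTableC c JOIN newTableB b ON b.b_id = c.b_id
    ORDER BY c.c_id
""").fetchall()
print(rows)
```

Because the join goes through the temporary column, duplicate names never need to be made unique, and c_id keeps the old a_id values that the fourth table refers to.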

how about this
CREATE TABLE temp_newTableB AS (SELECT * FROM newTableB);
DELETE FROM newTableB WHERE b_id NOT IN (SELECT MIN(b_id) FROM temp_newTableB GROUP BY name);
DROP TABLE temp_newTableB ;
But is there any rule by which you will decide which b_id to keep in newTableC? Some of the b_id values with duplicate names will be removed from newTableB by the query above (as written, the lowest b_id for each name is kept).

Related

How do I insert a column of values from one table to another, non-matching schemas?

I have two tables:
Table A: lastName,firstName,clientExtension
Table B: ~45 columns, however lastName, firstName, clientExtension are also in this table. The data types for these three columns match in each table: lastName VARCHAR(150), firstName VARCHAR(150), clientExtension INT(5) unsigned.
Table A has 31 rows, no NULL values. The records in Table A are already in Table B, but my objective is to update the clientExtension value in Table B to be the clientExtension value from Table A for each agent.
This is what I have tried so far, with no luck:
INSERT INTO table_A (lastName, firstName, clientExtension)
SELECT clientExtension
FROM tableB AS tb
WHERE lastName=tb.lastName
AND firstName=tb.firstName;
I've also tried using the UPDATE function, however I can't seem to get it to work. It feels like what I'm trying to do is an INNER JOIN, except I'm not looking to create a new table with the output of the INNER JOIN, I'm looking to update existing records in Table B with the clientExtension values in Table A.
Any ideas??
This schema needs some help before you have more than a few dozen rows in those tables. If that is really your schema, then you have some problems when names change. It will take a few minutes to show a better approach, bear with me.
Then I will show the update/join pattern if you don't have it yet (on the better schema).
create table tableA
( -- assuming this is really a user table
id int auto_increment primary key, -- if you don't have this, you are going to have problems
firstName varchar(150) not null,
lastName varchar(150) not null,
clientExtension int not null -- sign, display width of no concern
);
insert tableA (firstName,lastName,clientExtension) values ('f1','l1',777),('f2','l2',888);
create table tableB
( -- assuming this is really a user table
id int auto_increment primary key, -- if you don't have this, you are going to have problems
firstName varchar(150) not null,
lastName varchar(150) not null,
clientExtension int not null
);
insert tableB (firstName,lastName,clientExtension) values ('f1','l1',0),('f2','l2',0);
update tableB b
join tableA a
on a.id=b.id
set b.clientExtension=a.clientExtension;
select * from tableA;
(same as below)
select * from tableB;
+----+-----------+----------+-----------------+
| id | firstName | lastName | clientExtension |
+----+-----------+----------+-----------------+
|  1 | f1        | l1       |             777 |
|  2 | f2        | l2       |             888 |
+----+-----------+----------+-----------------+
The long and short of it is that if you join on names that change in one table and not another, you have problems. That is why you need a primary key that won't change (as opposed to when Bob becomes Robert again).
Also, if your tables are not user tables, then the PK of an int id is just as important. The id is used in other tables without de-normalized ideas of dragging firstName, lastName over as keys in those non-user Entity tables, if you will.
What do I mean by non-user Entity tables? Well I kinda just made that up, first phrase that came to my head. It is about data normalization, and concepts like 2nd and 3rd Normal Form. Let's say you have a paystub table. A row needs to be identified by PayeeId (that is your user id from above tables), and other info such as pay period, etc. A horrible way of identifying the Payee would be by first and last name.
Plan B: (I hold my nose at doing this, but here it is)
update tableB b
join tableA a
on a.firstName=b.firstName and a.lastName=b.lastName
set b.clientExtension=a.clientExtension;
-- 2 row(s) affected

don't repeat entry row from two different table

I created two tables (PHP using XAMPP), one for employee (id, name) and another for administrator (id, name).
The id in each of the two tables is the primary key. I need to build a relation between the two tables so that an id doesn't repeat. For example, admin(1, a) uses id = 1, which should then not be used in the employee table.
Please help.
The normative approach to this problem is to use a single table. That makes it very easy to keep the id values distinct.
You can include a discriminator column that indicates whether a row represents an "employee" or an "administrator". In your example, there are two possible values.
CREATE TABLE employee
( id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT COMMENT 'pk'
, ename VARCHAR(50) NOT NULL
, admin TINYINT(1) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'boolean'
)
Some example data, to illustrate:
id   ename             admin
---  ----------------  -----
 42  Barney Rubble     0
 43  Fred Flintstone   0
 17  Mr. Slate         1
Sample queries:
-- select "employee" rows
SELECT id, ename FROM employee WHERE admin=0
-- select "administrator" rows
SELECT id, ename FROM employee WHERE admin
If you need two separate tables, as you asked about:
Bottom line is that there is no declarative constraint available in MySQL that will enforce the id values between the two tables to be "distinct" from one another.
To do that, you would have to "roll your own" solution. And that solution is not trivial, it can be rather involved.
There are some solutions to simpler problems, automatically generating unique id values. But to actually enforce uniqueness, there is no simple way to do that.
If your goal is just to enforce a constraint, such that INSERT and UPDATE statements throw an error when they attempt to violate it, you are going to need to write triggers.
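To give a rough idea of what such triggers look like, here is a small sketch, run through Python's sqlite3 module so it is self-contained. It is only an illustration under stated assumptions: the trigger names are made up, the syntax is SQLite's (a MySQL trigger would raise the error with SIGNAL instead of RAISE), and it only covers INSERT with explicitly supplied ids, not UPDATE or auto-generated ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE administrator (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- reject an employee row whose id is already taken by an administrator
    CREATE TRIGGER employee_id_distinct
    BEFORE INSERT ON employee
    WHEN EXISTS (SELECT 1 FROM administrator WHERE id = NEW.id)
    BEGIN
        SELECT RAISE(ABORT, 'id already used by administrator');
    END;

    -- and the mirror image for administrator rows
    CREATE TRIGGER administrator_id_distinct
    BEFORE INSERT ON administrator
    WHEN EXISTS (SELECT 1 FROM employee WHERE id = NEW.id)
    BEGIN
        SELECT RAISE(ABORT, 'id already used by employee');
    END;
""")

conn.execute("INSERT INTO administrator (id, name) VALUES (1, 'a')")

try:
    conn.execute("INSERT INTO employee (id, name) VALUES (1, 'b')")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True      # the trigger aborted the conflicting insert

conn.execute("INSERT INTO employee (id, name) VALUES (2, 'b')")
employee_ids = [row[0] for row in conn.execute("SELECT id FROM employee")]
print(rejected, employee_ids)
```

Even this small version needs a trigger per table and per statement type, which is part of why the single-table design above is usually the easier route.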

How to use an auto-incrementing integer primary key to combine multiple files?

How do you set up a valid auto-incrementing integer primary key on a table if you want to join it with separate files? I get data like this on a daily basis:
Interaction data:
Date | PersonID | DateTime | CustomerID | Other values...
The primary key there would be PersonID + DateTime + CustomerID. If I have an integer key, how can I get that to relate back to another table? I want to know the rows where a specific person interacted with a specific customer so I can tie back those pieces of data together into one master-file.
Survey return data:
Date | PersonID | DateTime | CustomerID | Other values...
I am normally processing all raw data first in pandas before loading it into a database. Some other files also do not have a datetime stamp and only have a date. It is rare for one person to interact with the same customer on the same day so I normally drop all rows where there are duplicates (all instances) so my sample of joins are just purely unique.
Other Data:
Date | PersonID | CustomerID | Other values...
I can't imagine how I can set it up so I know row 56,547 on 'Interaction Data' table matches with row 10,982 on 'Survey Return Data' table. Or should I keep doing it the way I am with a composite key of three columns?
(I'm assuming postgresql since you have tag-spammed this post; it's up to you to translate for other database systems).
It sounds like you're loading data with a complex natural key like (PersonID,DateTime,CustomerID) and you don't want to use the natural key in related tables, perhaps for storage space reasons.
If so, for your secondary tables you might want to CREATE UNLOGGED TABLE a table matching the original input data. COPY the data into that table. Then do an INSERT INTO ... SELECT ... into the final target table, joining on the table with the natural key mapping.
In your case, for example, you'd have table interaction:
CREATE TABLE interaction (
interaction_id serial primary key,
"PersonID" integer,
"DateTime" timestamp,
"CustomerID" integer,
UNIQUE("PersonID", "DateTime", "CustomerID"),
...
);
and for table survey_return just a reference to interaction_id:
CREATE TABLE survey_return (
survey_return_id serial primary key,
interaction_id integer not null references interaction(interaction_id),
col1 integer, -- data cols
..
);
Now create:
CREATE UNLOGGED TABLE survey_return_load (
"PersonID" integer,
"DateTime" timestamp,
"CustomerID" integer,
PRIMARY KEY ("PersonID","DateTime", "CustomerID"),
col1 integer, -- data cols
...
);
and COPY your data into it, then do an INSERT INTO ... SELECT ... to join the loaded data against the interaction table and insert the result with the derived interaction_id instead of the original natural keys:
INSERT INTO survey_return (interaction_id, col1, ...)
SELECT interaction_id, col1, ...
FROM survey_return_load l
LEFT JOIN interaction i ON ( (i."PersonID", i."DateTime", i."CustomerID") = (l."PersonID", l."DateTime", l."CustomerID") );
This will fail with a null violation if there are natural key tuples in the input survey returns that do not appear in the interaction table.
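Here is a small runnable sketch of the same load-then-map flow. It is a variation rather than the exact statements above: it uses Python's sqlite3 module as a stand-in for PostgreSQL (so no UNLOGGED tables or COPY), the sample rows are invented, and it first lists the staged rows that have no matching interaction and then inserts only the matched ones with an inner join, instead of letting the NOT NULL constraint fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interaction (
        interaction_id INTEGER PRIMARY KEY AUTOINCREMENT,
        "PersonID" INTEGER, "DateTime" TEXT, "CustomerID" INTEGER,
        UNIQUE ("PersonID", "DateTime", "CustomerID")
    );
    CREATE TABLE survey_return (
        survey_return_id INTEGER PRIMARY KEY AUTOINCREMENT,
        interaction_id INTEGER NOT NULL REFERENCES interaction(interaction_id),
        col1 INTEGER
    );
    -- staging table: the raw file as delivered, still keyed by the natural key
    CREATE TABLE survey_return_load (
        "PersonID" INTEGER, "DateTime" TEXT, "CustomerID" INTEGER, col1 INTEGER
    );
    INSERT INTO interaction ("PersonID", "DateTime", "CustomerID") VALUES
        (7, '2015-01-01 09:00', 100),
        (8, '2015-01-01 10:00', 200);
    INSERT INTO survey_return_load VALUES
        (8, '2015-01-01 10:00', 200, 5),
        (7, '2015-01-01 09:00', 100, 3),
        (9, '2015-01-02 11:00', 300, 4);   -- no matching interaction
""")

# join condition on the three-column natural key, shared by both queries
KEY_MATCH = (
    'i."PersonID" = l."PersonID" '
    'AND i."DateTime" = l."DateTime" '
    'AND i."CustomerID" = l."CustomerID"'
)

# staged rows whose natural key is unknown to the interaction table
unmatched = conn.execute(
    'SELECT l."PersonID", l."DateTime", l."CustomerID" '
    "FROM survey_return_load l LEFT JOIN interaction i ON " + KEY_MATCH +
    " WHERE i.interaction_id IS NULL"
).fetchall()

# swap the natural key for the surrogate interaction_id while inserting
conn.execute(
    "INSERT INTO survey_return (interaction_id, col1) "
    "SELECT i.interaction_id, l.col1 "
    "FROM survey_return_load l JOIN interaction i ON " + KEY_MATCH
)
loaded = conn.execute(
    "SELECT interaction_id, col1 FROM survey_return ORDER BY interaction_id"
).fetchall()
print(unmatched, loaded)
```

After this, other tables only need to carry interaction_id, and the three-column key lives in one place.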
There are always many ways. Here might be one.
Picture a potential customer (table: cust) walking into a car dealership and test driving 3 cars (table: car), with an intersection/junction table between cust and car called cust_car.
3 tables. Each with int autoinc.
Read this answer I wrote up for someone. Happy to work your tables if you need help.
SQL result table, match in second table SET type
That question had nothing to do with yours. But the solution is the same.

problem with auto_increment()

I have a table that has two fields.
create table test
(
fname char(20),
id int not null auto_increment,
primary key(id)
);
now I add 3 records to the table like below:
insert into test(fname) values
('a'),('b'),('c');
and the table looks like
fname id
a 1
b 2
c 3
now I delete b from the table so I have:
fname id
a 1
c 3
now again I insert a new record into the table
insert into test(fname) values('d');
and get:
fname id
a 1
c 3
d 4
but I want last record's id to be "2"
how can I do this?
An auto-increment column is used to identify your rows uniquely when you have no other candidate for a primary key. If you are relying on there being no gaps in the sequence, then there is a problem with how you are approaching this: your queries should not rely on anything other than the values being unique.
Also, there is a piece of a MySQL Cookbook chapter that says the same.
I don't think you can change that. This is how auto_increment works in MySQL.
You can't do that with autoincrement. It only keeps the id of the last inserted record and increments it when you insert. It doesn't keep track of delete operations.
Anyway, why do you want to do it?
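If what you really need is a gap-free number for display, one option is to compute it when you query rather than store it. A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL (the alias position is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (fname CHAR(20), id INTEGER PRIMARY KEY AUTOINCREMENT)"
)
conn.executemany("INSERT INTO test (fname) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM test WHERE fname = 'b'")
conn.execute("INSERT INTO test (fname) VALUES ('d')")

# the stored ids keep their gap: the freed value 2 is not handed out again
stored = conn.execute("SELECT fname, id FROM test ORDER BY id").fetchall()

# a gap-free position derived at query time by counting rows up to each id
numbered = conn.execute("""
    SELECT t1.fname,
           (SELECT COUNT(*) FROM test t2 WHERE t2.id <= t1.id) AS position
    FROM test t1
    ORDER BY t1.id
""").fetchall()

print(stored)    # [('a', 1), ('c', 3), ('d', 4)]
print(numbered)  # [('a', 1), ('c', 2), ('d', 3)]
```

The id stays a stable identifier for other tables to reference, while the position is free to change whenever rows are deleted.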

Database design - primary key naming conventions

I am interested to know what people think about (AND WHY) the following 3 different conventions for naming database table primary keys in MySQL?
-Example 1-
Table name: User,
Primary key column name: user_id
-Example 2-
Table name: User,
Primary key column name: id
-Example 3-
Table name: User,
Primary key column name: pk_user_id
Just want to hear ideas and perhaps learn something in the process :)
Thanks.
I would go with option 2. To me, "id" by itself seems sufficient.
Since the table is User, the column "id" within "user" indicates that it is the identification criterion for User.
However, I must add that naming conventions are all about consistency.
There is usually no right or wrong as long as there is a consistent pattern and it is applied across the application. That is probably the more important factor in how effective the naming conventions will be and how far they go towards making the application easier to understand and hence maintain.
I always prefer the option in example 1, in which the table name is (redundantly) used in the column name. This is because I prefer to see ON user.user_id = history.user_id rather than ON user.id = history.user_id in JOINs.
However, the weight of opinion on this issue generally seems to run against me here on Stack Overflow, where most people prefer example 2.
Incidentally, I prefer UserID to user_id as a column naming convention. I don't like typing underscores, and the use of the underscore as the common SQL single-character-match character can sometimes be a little confusing.
ID is the worst PK name you can have in my opinion. TablenameID works much better for reporting so you don't have to alias a bunch of columns named the same thing when doing complex reporting queries.
It is my personal belief that columns should only be named the same thing if they mean the same thing. The customer ID does not mean the same thing as the order ID, and thus they should conceptually have different names. When you have many joins and a complex data structure, it is also easier to maintain when the PK and FK have the same name. It is harder to spot an error in a join when you have ID columns. For instance, suppose you joined to four tables, all of which have an ID column. In the last join you accidentally used the alias for the first table and not the third one. If you used OrderID, CustomerID etc. instead of ID, you would get an error because the first table doesn't contain that column. If you use ID, it would happily join incorrectly.
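A small sketch of that failure mode, cut down to three tables and run through Python's sqlite3 module; the table names, columns and rows are invented for illustration. The same slip (using the first table's alias in the last join) is made against both naming styles.

```python
import sqlite3

# Style 1: every primary key is simply called "id".
generic = sqlite3.connect(":memory:")
generic.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE shipment (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO orders   VALUES (1, 2), (2, 1);   -- order 1 belongs to customer 2
    INSERT INTO shipment VALUES (1, 1);           -- shipment 1 is for order 1
""")
wrong_rows = generic.execute("""
    SELECT c.id, o.id, s.id
    FROM customer c
    JOIN orders o   ON o.customer_id = c.id
    JOIN shipment s ON s.order_id = c.id          -- slip: should be o.id
""").fetchall()
# runs without complaint and pairs shipment 1 with the wrong customer and order
print(wrong_rows)

# Style 2: keys carry the table name, so the same slip names a missing column.
named = sqlite3.connect(":memory:")
named.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE shipment (shipment_id INTEGER PRIMARY KEY, order_id INTEGER);
""")
try:
    named.execute("""
        SELECT c.customer_id, o.order_id, s.shipment_id
        FROM customer c
        JOIN orders o   ON o.customer_id = c.customer_id
        JOIN shipment s ON s.order_id = c.order_id   -- same slip, wrong alias
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)   # the database refuses: customer has no order_id column
print(error)
```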
I tend to go with the first option, user_id.
If you go with id, you usually end up with a need to alias excessively in your queries.
If you go with more_complicated_id, then you either must abbreviate, or you run out of room, and you get tired of typing such long column names.
2 cents.
I agree with @InSane and like just Id. And here's why:
If you have a table called User, and a column dealing with the user's name, do you call it UserName or just Name? The "User" seems redundant. If you have a table called Customer, and a column called Address, do you call the column CustomerAddress?
Though I have also seen where you would use UserId, and then if you have a table with a foreign key to User, the column would also be UserId. This allows for the consistency in naming, but IMO, doesn't buy you that much.
In response to Tomas' answer, there will still be ambiguity assuming that the PK for the comment table is also named id.
In response to the question, Example 1 gets my vote. [table name]_id would actually remove the ambiguity.
Instead of
SELECT u.id AS user_id, c.id AS comment_id FROM user u JOIN comment c ON u.id=c.user_id
I could simply write
SELECT user_id, comment_id FROM user u JOIN comment c ON u.user_id=c.user_id
There's nothing ambiguous about using the same ID name in both WHERE and ON. It actually adds clarity IMHO.
I've always appreciated Justinsomnia's take on database naming conventions. Give it a read: http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/
I would suggest example 2. That way there is no ambiguity between foreign keys and primary keys, as there is in example 1. You can do for instance
SELECT * FROM user, comment WHERE user.id = comment.user_id
which is clear and concise.
The third example is redundant in a design where all id's are used as primary keys.
OK so forget example 3 - it's just plain silly, so it's between 1 and 2.
the id for PK school of thought (2)
drop table if exists customer;
create table customer
(
id int unsigned not null auto_increment primary key, -- my names are id, cid, cusid, custid ????
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
id int unsigned not null auto_increment primary key, -- my names are id, oid, ordid
cid int unsigned not null -- hmmm what shall i call this ?
)engine=innodb;
insert into orders (cid) values (1),(2),(1),(1),(2);
-- so if i do a simple give me all of the customer orders query we get the following output
select
c.id,
o.id
from
customer c
inner join orders o on c.id = o.cid;
id  id1  -- big fan of column names like id1, id2, id3 : they are sooo descriptive
==  ===
1   1
2   2
1   3
1   4
2   5
-- so now i have to alias my columns like so:
select
c.id as cid, -- shall i call it cid or custid, customer_id whatever ??
o.id as oid
from
customer c
inner join orders o on c.id = o.cid; -- cid here but id in customer - where is my consistency ?
cid  oid
===  ===
1    1
2    2
1    3
1    4
2    5
the tablename_id prefix for PK/FK name school of thought (1)
(feel free to use an abbreviated form of tablename i.e cust_id instead of customer_id)
drop table if exists customer;
create table customer
(
cust_id int unsigned not null auto_increment primary key, -- pk
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
order_id int unsigned not null auto_increment primary key,
cust_id int unsigned not null
)engine=innodb;
insert into orders (cust_id) values (1),(2),(1),(1),(2);
select
c.cust_id,
o.order_id
from
customer c
inner join orders o on c.cust_id = o.cust_id; -- ahhhh, cust_id is cust_id is cust_id :)
cust_id  order_id
=======  ========
1        1
2        2
1        3
1        4
2        5
So you see, the tablename_ prefix (or abbreviated tablename_ prefix) method is of course the most consistent and easily the best convention.
I don't disagree with what most of the answers note: just be consistent. However, I wanted to add that one benefit of the redundant approach with user_id is that it allows use of the USING syntactic sugar. If it weren't for this factor, I think I'd personally opt to avoid the redundancy.
For example,
SELECT *
FROM user
INNER JOIN subscription ON user.id = subscription.user_id
vs
SELECT *
FROM user
INNER JOIN subscription USING(user_id)
It's not a crazy significant difference, but I find it helpful.
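One more small effect of USING, sketched here with Python's sqlite3 module as a stand-in (the tables, columns and rows are invented for illustration): with SELECT *, the shared column comes back once instead of twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscription (
        subscription_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES user(user_id),
        plan TEXT
    );
    INSERT INTO user VALUES (1, 'ann');
    INSERT INTO subscription VALUES (10, 1, 'basic');
""")

# join written with ON: both user_id columns appear in SELECT *
cur_on = conn.execute(
    "SELECT * FROM user "
    "INNER JOIN subscription ON user.user_id = subscription.user_id"
)
on_columns = [col[0] for col in cur_on.description]

# join written with USING: the shared column appears only once
cur_using = conn.execute(
    "SELECT * FROM user INNER JOIN subscription USING (user_id)"
)
using_columns = [col[0] for col in cur_using.description]

print(on_columns)
print(using_columns)
```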