Suppose I have a database with 3 tables of rooms: room_lvl1, room_lvl2, room_lvl3, which represent 3 levels in a building. Each level has 20+ rooms, so each table has 20+ rows representing the rooms on that level.
I need to delete all attributes in each table except the primary key, which is the room number, or update everything except the primary key to a null value.
Is there any query I could possibly run?
You can set values to NULL by doing:
update room_lvl1
set col1 = NULL,
col2 = NULL,
. . .;
Here col1, etc. are the non-primary-key columns.
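If there are many non-key columns, you can generate that statement from the catalog instead of typing it out. A minimal sketch, assuming MySQL and assuming the primary key column is named room_no (a hypothetical name; substitute yours):
-- Build the UPDATE statement from information_schema; run the generated
-- statement afterwards. room_no is a hypothetical primary key name.
SELECT CONCAT('UPDATE room_lvl1 SET ',
       GROUP_CONCAT(CONCAT('`', column_name, '` = NULL')),
       ';') AS stmt
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'room_lvl1'
  AND column_name <> 'room_no';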
That seems like a really strange thing to want to do. You could empty the tables:
truncate table room_lvl1;
That would remove all rows.
You could save, truncate, and re-insert:
create table temp_room_lvl1 as select * from room_lvl1;
truncate table room_lvl1;
insert into room_lvl1(pk)
select pk from temp_room_lvl1;
Related
I have a table like this:
uuid | username | first_seen | last_seen | score
Before, the table used an ascending "player_id" column as its primary key. I removed this player_id as I no longer needed it. I want to make 'uuid' the primary key, but there are a lot of duplicates. I want to remove all these duplicates from the table, but keep the first one (based on row order: the first row stays).
How can I do this? I've searched everywhere, but the answers all show how to do it if you have a row ID column...
I highly advocate having auto-incremented integer primary keys, so I would encourage you to go back. These are useful for several reasons, such as:
They tell you the insert order of rows.
They are more efficient as primary keys.
Because primary keys are clustered in MySQL, new rows always go at the end of the table.
But, you don't have to follow that advice. My recommendation would be to insert the data into a new table and reload into your desired table:
create temporary table tt as
select t.*
from t
group by t.uuid;
truncate table t;
alter table t add constraint pk_uuid primary key (uuid);
insert into t
select * from tt;
Note: I am using a (mis)feature of MySQL that allows you to group by one column while pulling columns not in the group by. I don't like this extension, but you do not specify how to choose the particular row you want. This will give values for the other columns from matching rows. There are other ways to get one row per uuid.
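On MySQL 8+, one explicit alternative is ROW_NUMBER(); here is a sketch, assuming your table is named t and that an arbitrary row per uuid is acceptable (add an ORDER BY inside OVER() if some column still records insert order):
create temporary table tt as
select *
from (
    select t.*, row_number() over (partition by uuid) as rn
    from t
) x
where rn = 1;
alter table tt drop column rn; -- so tt has exactly t's columns again
The truncate / add primary key / re-insert steps then proceed as above.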
I have a table where UID is the primary key. In the old system it wasn't the primary key, so people could insert data into that field, but we don't want to allow that anymore.
In this table, I have a gap between UID 2000 and 2005 (2003 is taken). How do I get the list of integers inside that gap?
UPDATED
I actually don't want the list of consecutive numbers between 2 UIDs. Some UIDs may already exist between the 2 numbers, but I don't know which ones. I just want the list of available UIDs between the 2 UIDs.
I want this list to return:
MISSING
2001
2002
2004
See Generating a range of numbers in MySQL for how to create a table that lists all the numbers in a range. Then do:
set @start = 2000;
set @end = 2005;
SELECT nt.n AS missing
FROM number_table AS nt
LEFT JOIN your_table AS t ON nt.n = t.uid
WHERE nt.n BETWEEN @start AND @end
AND t.uid IS NULL;
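If you are on MySQL 8+, a recursive CTE is one way to build that numbers table; a minimal sketch matching the names used above (number_table, n):
create table number_table as
with recursive nums (n) as (
    select 2000
    union all
    select n + 1 from nums where n < 2005
)
select n from nums;
For older versions, see the linked question for other generation techniques.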
Summary of the sections seen below.
Section 1 mimics a table that has gaps in its id, like the one in your question.
Section 2 shows a fast way to put a 4-million-row table together with an incrementing pk.
It is not actually used for this, but perhaps useful on its own;
if it seems left like a half-thought, that is because it was not entirely useful.
Section 3 creates a table, inspired by Section 2, to leave you with a worktable where ordering is important, both for inserting into it and for processing it. It holds:
the new id to use (the pk),
your current id (the one that is gap-prone),
and a column that says whether or not a row has been processed, so you can do them in batches.
Section 3 is where the action is.
xxxxxxxxxxxxxxxxxxxxxxxxxx
Section 1:
create table tbl1
( -- this mimics your current table; naturally you have one already
id bigint not null auto_increment primary key,
thing varchar(100) -- whatever columns you have
)engine=MyISAM;
insert tbl1(thing) values('a'),('a'),('b');
show table status from test; -- auto_increment=4, max is 3
alter table tbl1 auto_increment=2000;
insert tbl1(thing) values('a'),('a'),('b');
alter table tbl1 auto_increment=100000; -- 100k
insert tbl1(thing) values('a'),('a'),('b');
alter table tbl1 auto_increment=110000; -- 110k
insert tbl1(thing) values('a'),('a'),('b');
alter table tbl1 auto_increment=2000000; -- 2m
insert tbl1(thing) values('a'),('a'),('b');
show table status from test; -- auto_increment=2000003, max is 2000002
select count(*) from tbl1; -- 15 rows
xxxxxxxxxxxxxxxxxxxxxxxxxx
Section 2:
create table idFix
( colIs bigint auto_increment primary key, -- Is your Key
colShouldBe bigint null, -- Should be your new Key
processedYet tinyint null -- 1 if processed
)engine=myisam;
insert into idFix(colIs) values(null); -- prime it with 1 row
-- this is pretty fast, don't laugh
-- run the following line 22 times
insert into idFix(colIs) select null from idFix;
-- you now have 4.2m rows in idFix (4,194,304)
select count(*) from idFix;
select count(*) from idFix
where colIs not in (select id from tbl1);
-- 4,194,289
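If you would rather not paste that insert line 22 times by hand, a small procedure can loop it; a hedged sketch (fill_idFix is a made-up name):
delimiter //
create procedure fill_idFix()
begin
  declare i int default 0;
  while i < 22 do
    insert into idFix(colIs) select null from idFix; -- doubles the row count
    set i = i + 1;
  end while;
end//
delimiter ;
call fill_idFix();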
xxxxxxxxxxxxxxxxxxxxxxxxxx
Section 3:
Back up your data first. Then test the following in a scratch database.
create table idFix2
( yourIdShouldBe bigint auto_increment primary key, -- This becomes what your new key should be
yourIdIs bigint null, -- This is what your gap-prone id is right now
processedYet tinyint null -- 1 if processed, null otherwise
)engine=myisam;
-- the order by is important
insert into idFix2(yourIdIs, processedYet)
select id, null from tbl1 order by id;
-- again, the order by in the above statement is important
Now you have a table that shows what your key should be, what your key is, and processedYet set to null.
Do them in batches in a stored proc or front-end code (say Java/C#, whatever).
It is important to process them top to bottom; any other order will screw up your data.
Did I mention it is important to do it top to bottom?
I will leave my thoughts about getting everyone out of the system and requiring a table lock out of this;
only you know your system, not us.
select *
from idFix2
where processedYet is null and yourIdShouldBe <> yourIdIs
order by yourIdShouldBe -- order is important
limit 2; -- you might want to choose a bigger number :>
Did I mention it is important to do it top to bottom?
Here is the flow, using the result set from the above select statement:
(a) get the next row in the result set
(b) insert a new parent record back into tbl1, copied from the tbl1 row whose id is yourIdIs, but with pk = yourIdShouldBe.
Having the new parent row in place guarantees you won't trip the foreign key constraints of the children tweaked below.
(c) update the children that use the old yourIdIs to hang under the new yourIdShouldBe
in their tables (there can be scads of these tables); the children's foreign key constraints
will be honored because the new parent row is already in place from step (b)
(d) delete the parent row from tbl1 where the pk is yourIdIs; fear not that this will cause even more
gaps, because those will be filled as you loop through (a), which will slot-fill them
(e) update idFix2 set processedYet=1 for the row you are processing from the step (a) result set
(f) go to (a)
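To make steps (b) through (e) concrete, here is a hedged sketch of a single iteration. It assumes one child table named child with a foreign key column tbl1_id (both names hypothetical), and a result-set row where yourIdIs = 2000 and yourIdShouldBe = 4:
insert into tbl1 (id, thing)              -- (b) new parent at pk = yourIdShouldBe
select 4, thing from tbl1 where id = 2000;
update child set tbl1_id = 4              -- (c) re-point the children
where tbl1_id = 2000;
delete from tbl1 where id = 2000;         -- (d) remove the old parent row
update idFix2 set processedYet = 1        -- (e) mark this row processed
where yourIdShouldBe = 4;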
When you have no more rows with processedYet null, you are almost done.
Set the new auto_increment value to what it should be (1 more than max(id) in tbl1; let's call that number nnnn):
alter table tbl1 auto_increment=nnnn;
xxxxxxxxxxxxxxxxxxxxxxxxxx
Note the following
show table status from test where name like 'tbl1%'; -- auto_increment=2000003
I have nothing in slot 4; row 2000 will become slot 4:
insert tbl1(id,thing) values(4,'stuff from record2000 you get the drift');
show table status from test where name like 'tbl1%'; -- auto_increment=2000003 is left as is
So you are free to fill the gaps without touching auto_increment until the very end.
There it is, and your gaps go away. If it fails, consider taking a vacation day.
Oh, I forgot: you were testing this first in a scratch database anyway.
Good luck!
How do you set up a valid auto-incrementing integer primary key on a table if you want to join it with separate files? I get data like this on a daily basis:
Interaction data:
Date | PersonID | DateTime | CustomerID | Other values...
The primary key there would be PersonID + DateTime + CustomerID. If I have an integer key, how can I get that to relate back to another table? I want to know the rows where a specific person interacted with a specific customer so I can tie back those pieces of data together into one master-file.
Survey return data:
Date | PersonID | DateTime | CustomerID | Other values...
I normally process all raw data in pandas first, before loading it into a database. Some other files do not have a datetime stamp and only have a date. It is rare for one person to interact with the same customer on the same day, so I normally drop all rows that have duplicates (all instances), leaving my sample of joins purely unique.
Other Data:
Date | PersonID | CustomerID | Other values...
I can't imagine how I can set it up so that I know row 56,547 in the 'Interaction Data' table matches row 10,982 in the 'Survey Return Data' table. Or should I keep doing it the way I am, with a composite key of three columns?
(I'm assuming PostgreSQL, since you have tag-spammed this post; it's up to you to translate for other database systems.)
It sounds like you're loading data with a complex natural key like (PersonID,DateTime,CustomerID) and you don't want to use the natural key in related tables, perhaps for storage space reasons.
If so, for your secondary tables you might want to CREATE UNLOGGED TABLE a staging table matching the original input data, COPY the data into that table, and then do an INSERT INTO ... SELECT ... into the final target table, joining on the table with the natural key mapping.
In your case, for example, you'd have table interaction:
CREATE TABLE interaction (
interaction_id serial primary key,
"PersonID" integer
"DateTime" timestamp,
"CustomerID" integer,
UNIQUE("PersonID", "DateTime", "CustomerID"),
...
);
and for table survey_return just a reference to interaction_id:
CREATE TABLE survey_return (
survey_return_id serial primary key,
interaction_id integer not null references interaction(interaction_id),
col1 integer, -- data cols
..
);
Now create:
CREATE UNLOGGED TABLE survey_return_load (
"PersonID" integer
"DateTime" timestamp,
"CustomerID" integer,
PRIMARY KEY ("PersonID", "DateTime", "CustomerID"),
col1 integer, -- data cols
...
);
and COPY your data into it, then do an INSERT INTO ... SELECT ... to join the loaded data against the interaction table and insert the result with the derived interaction_id instead of the original natural keys:
INSERT INTO survey_return (interaction_id, col1, ...)
SELECT interaction_id, col1, ...
FROM survey_return_load l
LEFT JOIN interaction i ON ( (i."PersonID", i."DateTime", i."CustomerID") = (l."PersonID", l."DateTime", l."CustomerID") );
This will fail with a null violation if there are natural key tuples in the input survey returns that do not appear in the interaction table.
There are always many ways. Here might be one.
A potential customer (table: cust) walks into a car dealership and test-drives 3 cars (table: car). An intersection/junction table between cust and car sits in table cust_car.
3 tables, each with an int autoinc primary key.
Read this answer I wrote up for someone; I'm happy to work your tables if you need help:
SQL result table, match in second table SET type
That question had nothing to do with yours, but the solution is the same.
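As a hedged sketch of that shape (all column names here are illustrative):
create table cust
( cust_id int auto_increment primary key,
  full_name varchar(100)
)engine=innodb;
create table car
( car_id int auto_increment primary key,
  model varchar(100)
)engine=innodb;
create table cust_car -- the intersection/junction table
( cust_car_id int auto_increment primary key,
  cust_id int not null,
  car_id int not null,
  foreign key (cust_id) references cust(cust_id),
  foreign key (car_id) references car(car_id)
)engine=innodb;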
I have a question regarding the design of two tables.
Table 1: The main table, called Batch. Values are added here from parsing files.
Table 2: This table works like a log table; every row that is deleted from table 1 goes here.
Example
Table 1
ID text
1 'bla1'
2 'bla2'
3 'bla3'
Delete the rows where id is 2 and 3:
Table 2
ID text
2 'bla2'
3 'bla3'
Problem:
What if I insert IDs 2 and 3 again into table 1 and then delete them? Table 2 would then hold the same data twice. How can I fix this? Should I just make ID an identity column, so that when I add 2 records it looks like this (additional question: how do I keep counting if I delete everything in table 1?):
Table 1
ID text
4 'Bla3'
5 'Bla4'
Just have a unique identifier for Table 1. This identifier should be unique to this table, not to the data you load. You can then load id 100 from your source file as many times as you want; each load gets a unique identifier in Table 1.
An identity column seems to fit your requirements for this. I'd look into more audit data as well: perhaps store what file it came from, when it was loaded, who loaded it, etc.
As for filling the log table, you can just attach a trigger to your Table 1 that fills Table 2 with deleted rows; it should be pretty straightforward.
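A hedged sketch of such a trigger, assuming SQL Server and, for illustration, a log table table2 with columns [text] and erased:
CREATE TRIGGER trg_table1_delete ON table1
AFTER DELETE
AS
BEGIN
    -- "deleted" is the built-in pseudo-table holding the rows being removed
    INSERT INTO table2 ([text], erased)
    SELECT [text], GETDATE()
    FROM deleted;
END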
It seems that in your design Table 1 uses a surrogate key. In this case you should also define a natural key for your purpose. Then Table 2 will contain the natural key and the values of the erased Table 1 data.
Because you can erase the same data several times, you should add a timestamp field to your Table 2.
create table table1 (
id int identity primary key,
[text] varchar(50) not null unique,
... other data ...
)
create table table2 (
[text] varchar(50) not null,
erased datetime not null,
... other data ...
constraint table2_pk
primary key ( [text], erased )
)
I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to be working.) However, I am running into the following issue: I am running this query before creating the entry in the Product table, and I am getting a foreign key constraint failure because the entry is not in the Product table yet.
My question is: is there a way to run this query but wait until the next query (which updates the Product table) before performing the insert portion of the query above? Also note that if this query is run after the Product entry is created, it will no longer see p.Id as null and will therefore fail, so it has to be performed before the Product entry is created.
---> Edit <---
The concept I am trying to achieve is as follows:
For starters, I am importing a set of data into a temp table. The Product table is a list of all products that are (or have been at some point) added through the data set from the temp table. What I need is a separate table that records state changes for each product, as sometimes a product becomes unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product (that is activated), does not show up in the vendor data set, the record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track the activation and deactivation states of a product. Also, ProductState has a many-to-one relationship with the Product table, and the state of a product will only change once daily, therefore my PKEY is ProductId and ChangedOn.
With foreign keys, you definitely need to have the data in the Product table first, before entering the state. Think about it with this logic: "How can something that doesn't exist have a state?"
So pseudocode of what you should do:
Read in the vendor's product list
Compare them to the existing list in your Product table
If new ones are found: 3.1 insert them into the Product table, 3.2 insert them into the ProductState table
If missing from the vendor's list: 4.1 insert them into the ProductState table
All of this should be done in 1 transaction. Note that you should NOT delete things from the Product table unless you really want to delete every piece of information associated with it, i.e. also delete all the "states" that you have stored.
Rather than trying to do this all in 1 query, your best bet is to create a stored procedure that does the work step by step as above. I think it gets overly complicated (or in this case, probably impossible) to do it all in 1 query.
Edit: Something like this:
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
-- Break the tmpImport table down into 2 temporary tables: new and removed
CREATE TEMPORARY TABLE _temp_new_products AS
SELECT t.*
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL;

CREATE TEMPORARY TABLE _temp_removed_products AS
SELECT p.*
FROM `Product` p
LEFT JOIN `tmpImport` t
ON t.`ProductId` = p.`Id`
WHERE t.`ProductId` IS NULL;

-- For each entry in _temp_new_products:
-- 1. Insert into the Product table
-- 2. Insert into the ProductState table as 'Activated'
-- For each entry in _temp_removed_products:
-- 1. Insert into the ProductState table as 'Deactivated'

-- drop the temporary tables
DROP TEMPORARY TABLE _temp_new_products;
DROP TEMPORARY TABLE _temp_removed_products;
END
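The commented steps in the middle could look something like this (a sketch; the Product column list is illustrative, since only Id appears in the question):
INSERT INTO `Product` (`Id`)
SELECT `ProductId` FROM _temp_new_products;
INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `ProductId`, `ProcessedOn`, 'Activated' FROM _temp_new_products;
INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `Id`, CURDATE(), 'Deactivated' FROM _temp_removed_products;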
I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also, no: you cannot insert into multiple tables in one INSERT ... SELECT statement.
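A minimal sketch of that sequence, reusing the question's tables (the Product column list is illustrative). The set of new products is captured first, because once they are inserted into Product the LEFT JOIN test from the original query would come up empty:
CREATE TEMPORARY TABLE _new AS
SELECT t.`ProductId`, t.`ProcessedOn`
FROM `tmpImport` t
LEFT JOIN `Product` p ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL;

START TRANSACTION;
INSERT INTO `Product` (`Id`)
SELECT `ProductId` FROM _new;
INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `ProductId`, `ProcessedOn`, 'Activated' FROM _new;
COMMIT;

DROP TEMPORARY TABLE _new;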