Fastest way to modify a decimal-keyed table in MySQL? - mysql

I am dealing with a MySQL table here that is keyed in a somewhat unfortunate way. Instead of using an auto increment table as a key, it uses a column of decimals to preserve order (presumably so its not too difficult to insert new rows while preserving a primary key and order).
Before I go through and redo this table to something more sane, I need to figure out how to rekey it without breaking everything.
What I would like to do is something that takes a list of doubles (the current keys) and outputs a list of integers (which can be cast down to doubles for rekeying).
For example, input {1.00, 2.00, 2.50, 2.60, 3.00} would give output {1, 2, 3, 4, 5).
Since this is a database, I also need to be able to update the rows nicely:
UPDATE table SET `key`='3.00' WHERE `key`='2.50';
Can anyone think of a speedy algorithm to do this? My current thought is to read all of the doubles into a vector, take the size of the vector, and output a new vector with values from 1 => doubleVector.size. This seems pretty slow, since you wouldn't want to read every value into the vector if, for instance, only the last n/100 elements needed to be modified.
I think there is probably something I can do in place, since only values after the first non-integer double need to be modified, but I can't for the life of me figure anything out that would let me update in place as well. For instance, setting 2.60 to 3.00 the first time you see 2.50 in the original key list would result in an error, since the key value 3.00 is already used for the table.
Edit: I guess what this really abstracts to is this:
I need a way to convert an ordered map keyed with doubles into an ordered map keyed with integers, where at no point does there ever exist two values for one key (which is a violation of a map anyway).

I'm assuming you'll be able to take the database down at some point to make this conversion.
Note: I am NOT a MySQL user. My DB of choice is PostgreSQL, so there MAY BE SYNTAX ERRORS here between how MySQL does it and Pg does it. But this should give you a good idea.
First, make a keymap table that maps old keys to new:
create table keymap (
oldkey decimal,
newkey integer autoincrement
)
Make sure you index keymap because we're going to be lookups aplenty on it.
create unique index keymap_oldkey on keymap(oldkey);
Then fill it with old keys and let MySQL create the new ones:
insert into keymap
select distinct `key` from fribbles order by `key`
Now you'll have keymap with all the old keys, and because you haven't specified a new key, you'll have the autoincrement on the newkey column will populate, and your table will look like.
oldkey newkey
----------------
1.5 1
1.6 2
1.93 3
3.1 4
Now, add a newkey column to your tables that need it
alter table fribbles add column newkey integer
Don't make it autoincrement, because otherwise it will get populated at alter time, and we don't need that.
Now, finally, update the fribbles table:
update fribbles f
set newkey = ( select newkey from keymap m where m.oldkey = f.`key` )
Finally, now that you have newkey populated, you can drop the old one.
alter table fribbles drop column `key`;
alter table fribbles alter column newkey rename to `key`;
I hope that gives you a decent plan of attack.

I would just add an int column (that allows NULL values) to the table, then do a cursor- or code-based run where I sort by the original whacked-out double PK column and then iterate through the records writing an incremented value into the new int column. Then update the table by changing the PK to the new int column and deleting the old PK.
"Here", as they say in France.

Related

Get the last primary key from table with no auto increment

I have a table that has a primary key and for some reason i was advised not to use the AUTO_INCREMENT flag.
So every time i have to insert a new entry i search for the last value inserted (highest value) of the primary key.
Then i increment it by 1 to get a new ID.
Now the problem i face is, when inserting the first entry, there is no data in the table.
Can anyone suggest the optimal way to check
if data exists in table,
if not set id as 1 and insert new row,
else get the last id from table, increment it by 1 and then insert new row.
PS: New to mysql so having difficulty with its syntax.
Based on your statement 'i search for the last value inserted (highest value) of the primary key', I presume that you're currently doing something like this to get the maximum existing ID:
SELECT MAX(id_column) + 1 FROM my_table
If you have an empty table, this will of course return NULL. In that case, just handle the NULL using IFNULL to return 0 if there is no maximum value:
SELECT IFNULL(MAX(id_column), 0) + 1 FROM my_table
This will output 1 as the next identifier if the table has no rows.
I've given this more thought, and it turns out there is a way to generate a unique primary key without using Auto Increment or worrying about race conditions, so long as you are willing and able to use a 36 byte primary key (or, alternatively, a 128 bit binary).
The solution (at least as of MySQL 5.5) is the UUID, which stands for Universal Unique Identifier.
You would use it thus:
CREATE TABLE uu_table (
id VARCHAR(36) PRIMARY KEY,
name VARCHAR(50),
{other interesting columns}
)
Then insert new rows thus:
INSERT INTO uu_table VALUES (UUID(), 'Name of this Row', {other interesting values});
The UUID() function is guaranteed to generate a unique key 99.99{bunch more 9's}% of the time, even if generated on independent systems. That's its whole purpose, to be as unique as snowflake patterns, no matter where it is created.
There are pros and cons to this method. Best to read up on it here: https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_uuid
It is basically a 128 bit number, which you can save as a binary value after conversion from the 36 characters. I believe some versions of MySQL come with functions for that purpose. That would use less space in your database than 36 bytes, but I'll leave that as an exercise for the reader.

MySql Indexing part of a column

I need to search a medium sized MySql table (about 15 million records).
My query searches for a value ending with another value, for example:
SELECT * FROM {tableName} WHERE {column} LIKE '%{value}'
{value} is always 7 characters length.
{column} is sometimes 8 characters length (otherwise it is 7).
Is there a way to improve performence on my search?
clearly index is not an option.
I could save {column} values in reverse order on another column and index that column, but im looking to avoid this solution.
{value} is always 7 characters length
Your data is not mormalized. Fixing this is the way to fix the problem. Anything else is a hack. Having said that I accept it is not always proactical to repair damage done in the past by dummies.
However the most appropriate hack depends on a whole lot of information you've not told us about.
how frequently you will run the query
what the format of the composite data is
but im looking to avoid this solution.
Why? It's a reasonable way to address the problem. The only downside is that you need to maintain the new attribute - given that this data domain appears in different attributes in multiple (another normalization violation) means it would make more sense to implement the index in a seperate, EAV relation but you just need to add triggers on the original table to maintain sync using your existing code base. Every solution I can think will likely require a similar fix.
Here's a simplified example (no multiple attributes) to get you started:
CREATE TABLE lookup (
table_name VARCHAR(18) NOT NULL,
record_id INT NOT NULL, /* or whatever */
suffix VARCHAR(7),
PRIMARY KEY (table_name, record_id),
INDEX (suffix, table_name, record_id)
);
CREATE TRIGGER insert_suffix AFTER INSERT ON yourtable
FOR EACH ROW
REPLACE INTO lookup (table_name, record_id, suffix)
VALUES ('yourtable', NEW.id
, SUBSTR(NEW.attribute, NEW.id, RIGHT(NEW.attribute, 7
);
CREATE TRIGGER insert_suffix AFTER UPDATE ON yourtable
FOR EACH ROW
REPLACE INTO lookup (table_name, record_id, suffix)
VALUES ('yourtable', NEW.id
, RIGHT(NEW.attribute, 7)
);
CREATE TRIGGER insert_suffix AFTER DELETE ON yourtable
FOR EACH ROW
DELETE FROM lookup WHERE table_name='yourtable' AND record_id=OLD.id
;
If you have a set number of options for the first character, then you can use in. For instance:
where column in ('{value}', '0{value}', '1{value}', . . . )
This allows MySQL to use an index on the column.
Unfortunately, with a wildcard at the beginning of the pattern, it is hard to use an index. Is it possible to store the first character in another column?

mySQL valid bit - alternatives?

Currently, I have a mySQL table with columns that looks something like this:
run_date DATE
name VARCHAR(10)
load INTEGER
sys_time TIME
rec_time TIME
valid TINYINT
The column valid is essentially a valid bit, 1 if this row is the latest value for this (run_date,name) pair, and 0 if not. To make insertions simpler, I wrote a stored procedure that first runs an UPDATE table_name SET valid = 0 WHERE run_date = X AND name = Y command, then inserts the new row.
The table reads are in such a way that I usually use only the valid = 1 rows, but I can't discard the invalid rows. Obviously, this schema also has no primary key.
Is there a better way to structure this data or the valid bit, so that I can speed up both inserts and searches? A bunch of indexes on different orders of columns gets large.
In all of the suggestions below, get rid of valid and the UPDATE of it. That is not scalable.
Plan A: At SELECT time, use 'groupwise max' code to locate the latest run_date, hence the "valid" entry.
Plan B: Have two tables and change both when inserting: history, with PRIMARY KEY(name, run_date) and a simple INSERT statement; current, with PRIMARY KEY(name) and INSERT ... ON DUPLICATE KEY UPDATE. The "usual" SELECTs need only touch current.
Another issue: TIME is limited to 838:59:59 and is intended to mean 'time of day', not 'elapsed time'. For the latter, use INT UNSIGNED (or some variant of INT). For formatting, you can use sec_to_time(). For example sec_to_time(3601) -> 01:00:05.

Replace Into Query Syntax

I want to be able to update a table of the same schema using a "replace into" statement. In the end, I need to be able to update a large table with values that may have changed.
Here is the query I am using to start off:
REPLACE INTO table_name
(visual, inspection_status, inspector_name, gelpak_name, gelpak_location),
VALUES (3, 'Partially Inspected', 'Me', 'GP1234', 'A01');
What I don't understand is how does the database engine know what is a duplicate row and what isn't? This data is extremely important and I can't risk the data being corrupted. Is it as simple as "if all columns listed have the same value, it is a duplicate row"?
I am just trying to figure out an efficient way of doing this so I can update > 45,000 rows in under a minute.
As the documentation says:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
REPLACE does work much like an INSERT that just overwrites records that have the same PRIMARY KEY or UNIQUE index, however, beware.
Shlomi Noach writes about the problem with using REPLACE INTO here:
But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.
Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO ... ON DUPLICATE KEY
Also, there's the notion of "well, if the row isn't there, we create it. If it's there, it simply get's updated". This is false. The row doesn't just get updated, it is completely removed. The problem is, if there's a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it's an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.
Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it's better left alone. On a production system I've seen, changing REPLACE INTO to INSERT INTO ... ON DPLICATE KEY resulted in a ten fold more throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.
In summary, REPLACE INTO may be right for your implementation, but you might find it more appropriate (and less risky) to use INSERT ... ON DUPLICATE KEY UPDATE instead.
or something like that:
insert ignore tbl1 (select * from tbl2);
UPDATE
`tbl1` AS `dest`,
(SELECT * FROM tbl2) AS `src`
SET
dest.field=src.field,
dest.field=if (length(src.field)>0,src.field,dest.field) /* or anything like that*/
WHERE
`dest`.id = `src`.id;
CREATE TEMPORARY TABLE test
(prim INT PRIMARY KEY
,sec INT NOT NULL UNIQUE
,tert INT UNIQUE
,com VARCHAR(255)
);
INSERT INTO test (prim,sec,tert,com)
VALUES (1,2,3,'123')
,(2,3,null,'23n')
,(3,1,null,'31n');
REPLACE INTO test(prim,sec,tert,com)
VALUES (3,3,3,'333');
SELECT *
FROM test;
DROP TEMPORARY TABLE test;
fun times

How to let data "disappear" from database? MySQL

I've got a bit of a stupid question. The thing is my program has to have the function to delete data from my database. Yay, not really the problem. But how can I delete data without the danger that others can see, that there has been something deleted.
User Table:
U_ID U_NAME
1 Chris
2 Peter
OTHER TABLE
ID TIMESTAMP FK_U_D
1 2012-12-01 1
2 2012-12-02 1
Sooooo the ID's are AUTO_INCREMENT, so if I delete one of them there's a gap. Furthermore, the timestamp is also bigger than the row before, so ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter) it's obvious, because when I insert data, there are 20 or 30 data rows with the same U_ID...so it's obvious that there has been a modification.
If I set the FK_U_ID NULL --> same sh** like when I change it to another U_ID.
Is there any solution to get this work? I know that if nobody but me has access to the database, it's just no problem. But just in case, if somebody controls my program it should not be obvious that there has been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as #SLaks suggests, but then you can't use the native RDBMS auto_increment which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique, you could just store a random string of 20 characters or something, but then you have to do a DB read to see if that ID is taken and repeat (recursively) that process until you find an unused ID... could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (be void of any useful information, doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "cleanup" lookup tables, and to move the id values of some tables to ranges that are distinct from ranges in other tables... that let's me test SQL to make sure the SQL join predicates aren't inadvertently referencing the wrong table, and returning rows that look correct by accident... those are some reasons I've done reassignment if auto_increment values)
Note that the database can "automagically" update foreign key values (for InnnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE, and FOREIGN_KEY_CHECKS is not disabled.
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
JOIN (
SELECT s.id AS old_id
, #i := #i + 1 AS new_id
FROM mytable s
CROSS
JOIN (SELECT #i := 0) i
ORDER BY s.id
) c
ON t.id = c.old_id
SET t.id = c.new_id
WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.