Currently, I have a MySQL table with columns that look something like this:
run_date DATE
name VARCHAR(10)
load INTEGER
sys_time TIME
rec_time TIME
valid TINYINT
The column valid is essentially a valid bit, 1 if this row is the latest value for this (run_date,name) pair, and 0 if not. To make insertions simpler, I wrote a stored procedure that first runs an UPDATE table_name SET valid = 0 WHERE run_date = X AND name = Y command, then inserts the new row.
Reads usually use only the valid = 1 rows, but I can't discard the invalid rows. Obviously, this schema also has no primary key.
Is there a better way to structure this data or the valid bit, so that I can speed up both inserts and searches? A bunch of indexes on different orders of columns gets large.
In all of the suggestions below, get rid of valid and the UPDATE of it. That is not scalable.
Plan A: At SELECT time, use 'groupwise max' code to locate the latest run_date, hence the "valid" entry.
Plan B: Have two tables and change both when inserting: history, with PRIMARY KEY(name, run_date) and a simple INSERT statement; current, with PRIMARY KEY(name) and INSERT ... ON DUPLICATE KEY UPDATE. The "usual" SELECTs need only touch current.
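Plan B can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so INSERT OR REPLACE plays the role of INSERT ... ON DUPLICATE KEY UPDATE, and the column name load_value and the helper names are made up for the example. It also assumes rows arrive in run_date order, so the most recent insert is the "valid" one.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create the two Plan B tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE history (
            name       TEXT NOT NULL,
            run_date   TEXT NOT NULL,
            load_value INTEGER,
            PRIMARY KEY (name, run_date)
        );
        CREATE TABLE current (
            name       TEXT PRIMARY KEY,
            run_date   TEXT NOT NULL,
            load_value INTEGER
        );
        """
    )
    return conn


def record(conn: sqlite3.Connection, name: str, run_date: str, load_value: int) -> None:
    """Append to history and overwrite the row in current, in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO history (name, run_date, load_value) VALUES (?, ?, ?)",
            (name, run_date, load_value),
        )
        # SQLite's rough analogue of MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
        conn.execute(
            "INSERT OR REPLACE INTO current (name, run_date, load_value) VALUES (?, ?, ?)",
            (name, run_date, load_value),
        )
```

The usual reads then query current only, with no valid flag and no UPDATE over older rows.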
Another issue: TIME is limited to 838:59:59 and is intended to mean 'time of day', not 'elapsed time'. For the latter, use INT UNSIGNED (or some variant of INT). For formatting, you can use sec_to_time(). For example sec_to_time(3601) -> 01:00:01.
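If the formatting is done in application code instead, the arithmetic is simple. A minimal sketch, assuming non-negative whole seconds; the Python function name merely mirrors the MySQL one and, unlike MySQL, it does not clamp at 838:59:59:

```python
def sec_to_time(seconds: int) -> str:
    """Format an elapsed time in whole seconds as HH:MM:SS."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```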
I have a table that has a primary key, and for some reason I was advised not to use the AUTO_INCREMENT flag.
So every time I have to insert a new entry, I search for the last value inserted (highest value) of the primary key.
Then I increment it by 1 to get a new ID.
Now the problem I face is that when inserting the first entry, there is no data in the table.
Can anyone suggest the optimal way to check
if data exists in table,
if not set id as 1 and insert new row,
else get the last id from table, increment it by 1 and then insert new row.
PS: New to mysql so having difficulty with its syntax.
Based on your statement 'i search for the last value inserted (highest value) of the primary key', I presume that you're currently doing something like this to get the maximum existing ID:
SELECT MAX(id_column) + 1 FROM my_table
If you have an empty table, this will of course return NULL. In that case, just handle the NULL using IFNULL to return 0 if there is no maximum value:
SELECT IFNULL(MAX(id_column), 0) + 1 FROM my_table
This will output 1 as the next identifier if the table has no rows.
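The behaviour on an empty table can be checked with a small sketch. It runs the same query against an in-memory SQLite database (SQLite also has IFNULL), which is only a stand-in for MySQL; the helper names are hypothetical. Note that reading the maximum and inserting are two separate steps, so two concurrent clients could compute the same ID.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    """Create an empty my_table with a manually assigned primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id_column INTEGER PRIMARY KEY, name TEXT)")
    return conn


def next_id(conn: sqlite3.Connection) -> int:
    """Return MAX(id_column) + 1, or 1 when the table has no rows."""
    (value,) = conn.execute(
        "SELECT IFNULL(MAX(id_column), 0) + 1 FROM my_table"
    ).fetchone()
    return value
```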
I've given this more thought, and it turns out there is a way to generate a unique primary key without using Auto Increment or worrying about race conditions, so long as you are willing and able to use a 36 byte primary key (or, alternatively, a 128 bit binary).
The solution (at least as of MySQL 5.5) is the UUID, which stands for Universally Unique Identifier.
You would use it thus:
CREATE TABLE uu_table (
id VARCHAR(36) PRIMARY KEY,
name VARCHAR(50),
{other interesting columns}
)
Then insert new rows thus:
INSERT INTO uu_table VALUES (UUID(), 'Name of this Row', {other interesting values});
The UUID() function is guaranteed to generate a unique key 99.99{bunch more 9's}% of the time, even if generated on independent systems. That's its whole purpose, to be as unique as snowflake patterns, no matter where it is created.
There are pros and cons to this method. Best to read up on it here: https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_uuid
It is basically a 128 bit number, which you can save as a 16 byte binary value after conversion from the 36 characters. MySQL 8.0 provides UUID_TO_BIN() and BIN_TO_UUID() for that purpose; on older versions you have to do the conversion yourself. That would use less space in your database than 36 bytes, but I'll leave that as an exercise for the reader.
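For readers who want to do that conversion in application code, here is a minimal sketch using Python's standard uuid module. The two function names are made up for the example; the 16-byte value would go into a BINARY(16) column.

```python
import uuid


def uuid_to_bin(text: str) -> bytes:
    """Convert the 36-character text form of a UUID to its 16 raw bytes."""
    return uuid.UUID(text).bytes


def bin_to_uuid(raw: bytes) -> str:
    """Convert 16 raw bytes back to the 36-character text form."""
    return str(uuid.UUID(bytes=raw))
```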
I need to search a medium sized MySql table (about 15 million records).
My query searches for a value ending with another value, for example:
SELECT * FROM {tableName} WHERE {column} LIKE '%{value}'
{value} is always 7 characters length.
{column} is sometimes 8 characters length (otherwise it is 7).
Is there a way to improve performance on my search?
Clearly a plain index on {column} does not help, because the pattern starts with a wildcard.
I could save the {column} values in reverse order in another column and index that column, but I'm looking to avoid this solution.
{value} is always 7 characters length
Your data is not normalized. Fixing this is the way to fix the problem. Anything else is a hack. Having said that, I accept it is not always practical to repair damage done in the past by dummies.
However the most appropriate hack depends on a whole lot of information you've not told us about.
how frequently you will run the query
what the format of the composite data is
but im looking to avoid this solution.
Why? It's a reasonable way to address the problem. The only downside is that you need to maintain the new attribute. Given that this data domain appears in different attributes in multiple tables (another normalization violation), it would make more sense to implement the index in a separate EAV relation; you just need to add triggers on the original table to keep it in sync while still using your existing code base. Every solution I can think of will likely require a similar fix.
Here's a simplified example (no multiple attributes) to get you started:
CREATE TABLE lookup (
table_name VARCHAR(18) NOT NULL,
record_id INT NOT NULL, /* or whatever */
suffix VARCHAR(7),
PRIMARY KEY (table_name, record_id),
INDEX (suffix, table_name, record_id)
);
/* keep the lookup row in sync when a row is inserted */
CREATE TRIGGER insert_suffix AFTER INSERT ON yourtable
FOR EACH ROW
REPLACE INTO lookup (table_name, record_id, suffix)
VALUES ('yourtable', NEW.id
, RIGHT(NEW.attribute, 7)
);
/* trigger names must be unique, so each event gets its own name */
CREATE TRIGGER update_suffix AFTER UPDATE ON yourtable
FOR EACH ROW
REPLACE INTO lookup (table_name, record_id, suffix)
VALUES ('yourtable', NEW.id
, RIGHT(NEW.attribute, 7)
);
CREATE TRIGGER delete_suffix AFTER DELETE ON yourtable
FOR EACH ROW
DELETE FROM lookup WHERE table_name='yourtable' AND record_id=OLD.id
;
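The reversed-column idea from the question works for the same reason: a suffix match on the original value becomes a prefix match on the reversed value, and a prefix match can use an ordered index. A rough pure-Python sketch, where a sorted list stands in for the B-tree index and both function names are invented for the illustration:

```python
from bisect import bisect_left


def build_reversed_index(values):
    """Sorted (reversed value, original value) pairs, standing in for an index on a reversed column."""
    return sorted((v[::-1], v) for v in values)


def find_by_suffix(index, suffix):
    """Return the original values ending in `suffix` via a prefix range scan."""
    prefix = suffix[::-1]
    # Jump to the first entry whose reversed value is >= the reversed suffix.
    start = bisect_left(index, (prefix,))
    matches = []
    for reversed_value, original in index[start:]:
        if not reversed_value.startswith(prefix):
            break  # entries are sorted, so no later entry can match
        matches.append(original)
    return matches
```

Only the matching range is visited, instead of every row as with LIKE '%{value}'.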
If you have a fixed set of options for the first character, then you can use IN. For instance:
where column in ('{value}', '0{value}', '1{value}', . . . )
This allows MySQL to use an index on the column.
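Building that list in application code might look like the sketch below. It assumes, purely for illustration, that the optional leading character is a digit, and it uses the %s placeholder style common to Python MySQL drivers; the function name is hypothetical.

```python
from typing import List, Tuple


def suffix_in_clause(value: str, first_chars: str = "0123456789") -> Tuple[str, List[str]]:
    """Return an IN (...) clause with placeholders and the matching parameter list."""
    # The bare value covers the 7-character case; the rest cover the 8-character case.
    candidates = [value] + [c + value for c in first_chars]
    placeholders = ", ".join(["%s"] * len(candidates))
    return f"IN ({placeholders})", candidates
```

The clause is appended after the column name in the WHERE condition, and the candidates are passed as query parameters rather than concatenated into the SQL string.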
Unfortunately, with a wildcard at the beginning of the pattern, it is hard to use an index. Is it possible to store the first character in another column?
I had a table with 3 columns and 3600K rows. Using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. When calling SELECT * FROM table WHERE id=00000, MySQL took anywhere from 54 seconds to 3 minutes.
For testing, I created a table containing VARCHAR(8)-VARCHAR(5)-VARCHAR(5), with data randomly generated by numpy.random.randint. SELECT takes 3 sec without a primary key. With the same random data in VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT, SELECT took 15 sec without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (or, is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the integer 0, so MySQL has to check every row, because '0', '0000' and even ' 0' are all cast to the integer 0. Your primary key on id therefore does not help, and you end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
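A loose Python analogy of the difference, not MySQL's actual conversion code: comparing by numeric value means every stored string has to be converted and checked, while comparing by exact string identifies a single key. The function names and sample IDs are made up.

```python
def rows_matching_number(stored_ids, number):
    """Numeric comparison: every stored string must be converted and tested."""
    return [s for s in stored_ids if float(s) == number]


def rows_matching_string(stored_ids, text):
    """String comparison: only the exact key matches, which an index can locate directly."""
    return [s for s in stored_ids if s == text]
```

With the sample IDs '0', '0000', ' 0', '00000' and '00001', the numeric comparison against 0 matches the first four, whereas the string comparison against '00000' matches exactly one.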
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids, cast to integers, are unique (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that reference id, you have to drop them first and, before recreating them, change the datatype in the other columns too.
If you store your values in a known format (e.g. no leading zeros, or padded with 0s up to a length of 8), the second best option is to use this exact format in your query, and include the ' so the value is not cast to an integer. If you e.g. always pad with 0s to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any zeros, still add the ':
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
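On the application side, producing the key in the stored format is a one-liner. A small sketch, assuming the zero-padded 8-character format described above; the function name is hypothetical:

```python
def format_key(number: int, width: int = 8) -> str:
    """Render a numeric ID as the zero-padded string stored in the VARCHAR key column."""
    if number < 0 or len(str(number)) > width:
        raise ValueError("number does not fit the key format")
    return str(number).zfill(width)
```

The resulting string is then passed as a string parameter, so the comparison stays a string comparison and the primary key can be used.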
If your id column contains only numbers, define it as INT, because INT will give you better performance (it is faster).
Make the column in your table (the one defined as key) an integer and retry. Check performance first by running a test within your DB (Workbench or the plain command line). You should get a better result.
Then, and only if needed (I doubt it though), modify your Python code to convert from integer to string (and/or vice versa) when referencing the key column.
Scenario: WAMP server, InnoDB Table, auto-increment Unique ID field [INT(10)], 100+ concurrent SQL requests. VB.Net should also be used if needed.
My database has an auto-increment field which is used to generate a unique ticket/protocol number for each new piece of information stored (service order).
The issue is that this number must be reset each year. I.e. it starts at 000001/12 on 01/01/2012 00:00:00, goes up to a maximum of 999999/12, and then at 01/01/2013 00:00:00 it must start over again at 000001/13.
Obviously this could easily be accomplished using some type of algorithm, but I'm trying to find a more efficient way to do it. Consider:
It must (?) use auto-increment since the database has some concurrency (+100).
Each ticket must be unique. So 000001 on 2012 is not equal to 000001 on 2013.
It must be automatic. (no human interaction needed to make the reset, or whatever)
It should be reasonably efficient. (a watch program should check the database daily (?) but it seems not the best solution since it will 'fail' 364 times to have success only once).
The 'best' approach I could think of is to store the ticket number using year, such as:
12000001 - 12999999 (it never should reach the 999.999, anyway)
and then a watch program should set the auto-increment field to 13000000 at 01/01/2013.
Any Suggestions?
PS: Thanks for reading... ;)
So, for further reference, I've adopted the following solution:
I create n tables in the database (one for each year), each with only one auto-increment field, which is responsible for generating that year's unique IDs.
New inserts are done into the corresponding table according to the event date. After that, the algorithm takes the LAST_INSERT_ID() and stores that value in the main table using the format 000001/12 (ticket/year).
That is because each year must have its own counter, since a 2012 event could be inserted even when the current date is already in 2013.
That way events can be retroactive, no reset is needed, and it's simple to implement.
Sample code for insertion:
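$eventdate = "2012-11-30";
$eventyear = "2012";
// Note: the mysql_* functions were removed in PHP 7; mysqli and PDO offer equivalents.
// Insert into the per-year counter table to generate the next ID for that year.
$sql = "INSERT INTO tbl$eventyear VALUES (NULL)";
mysql_query($sql);
// Fetch the ID generated by the insert above on this connection.
$sql = "SELECT LAST_INSERT_ID()";
$row = mysql_fetch_row(mysql_query($sql));
$eventID = $row[0];
$sql = "INSERT INTO tblMain VALUES ('$eventID/$eventyear', ... ";
mysql_query($sql);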
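The sample above stores the raw ID and the four-digit year. Producing the 000001/12 format described earlier needs one extra formatting step, sketched here in Python. Both helper names are hypothetical, and the event date is assumed to be an ISO YYYY-MM-DD string as in the sample.

```python
from datetime import date


def counter_table(event_date: str) -> str:
    """Name of the per-year counter table for an event date, e.g. tbl2012."""
    return f"tbl{date.fromisoformat(event_date).year}"


def format_ticket(event_id: int, event_date: str) -> str:
    """Format a per-year ID as a six-digit ticket number plus two-digit year."""
    if not 1 <= event_id <= 999_999:
        raise ValueError("event_id must be between 1 and 999999")
    year = date.fromisoformat(event_date).year
    return f"{event_id:06d}/{year % 100:02d}"
```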
MongoDB uses something very similar to this that encodes the date, process id and host that generated an id along with some random entropy to create UUIDs. Not something that fulfills your requirement of monotonic increase, but something interesting to look at for some ideas on approach.
If I were implementing it, I would create a simple ID broker server that would perform the logic processing on date and create a unique slug for the id like you described. As long as you know how it's constructed, you have native MySql equivalents to get your sorting/grouping working, and the representation serializes gracefully this will work. Something with a UTC datestring and then a monotonic serial appended as a string.
Twitter had some interesting insights into custom index design when they implemented their custom ID server, Snowflake.
The idea of a broker endpoint that generates UUIDs that are not just simple integers, but also contain some business logic is becoming more and more widespread.
You can set up a combined PRIMARY KEY for both of the two columns ID and YEAR; this way you would only have one ID per year.
MySQL:
CREATE TABLE IF NOT EXISTS `ticket` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`year` YEAR NOT NULL DEFAULT '2012',
-- TEXT columns cannot take a literal DEFAULT, so none is declared
`data` TEXT NOT NULL,
PRIMARY KEY (`id`, `year`)
)
ENGINE = InnoDB DEFAULT CHARACTER SET = utf8 COLLATE = utf8_unicode_ci;
UPDATE: (in reply to the comment from @Paulo Bueno)
How to reset the auto-increment value can be found in the MySQL documentation: mysql> ALTER TABLE ticket AUTO_INCREMENT = 1;.
If you also increase the default value of the year column when resetting the auto-increment value, you'll have a continuous two-column primary key.
But I think you still need some sort of trigger program to execute the reset. Maybe a yearly cron job that launches a batch script on the first of January.
UPDATE 2:
OK, I've tested that right now, and one cannot set the auto-increment value to a number lower than any existing ID in that specific column. My mistake: I thought it would work on combined primary keys…
INSERT INTO `ticket` (`id`, `year`, `data`) VALUES
(NULL, '2012', 'dtg htg het');
-- some more rows in 2012
-- this works of course
ALTER TABLE `ticket` CHANGE `year` `year` YEAR( 4 ) NOT NULL DEFAULT '2013';
-- this does not reset the auto-increment
ALTER TABLE `ticket` AUTO_INCREMENT = 1;
INSERT INTO `ticket` (`id`, `year`, `data`) VALUES
(NULL, '2013', 'sadfadf asdf a');
-- some more rows in 2013
-- this will result in continuously counted IDs
UPDATE 3:
The MySQL documentation page has a working example that uses a grouped primary key on a MyISAM table. It uses a table similar to the one above but with the column order reversed, because the auto-increment column must be the secondary column of the multiple-column index for the counter to restart within each group. It seems this works only with MyISAM, not InnoDB. If MyISAM still fits your needs, you don't need to reset the ID; you merely increase the year and get the result you asked for.
See: http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html (second example, after "For MyISAM and BDB tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index.")
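The per-group numbering that MyISAM performs in that example amounts to MAX(id) + 1 within the group. A rough emulation of that rule, using Python's sqlite3 module as a stand-in (so this is not MyISAM itself, and the helper names are made up), looks like this:

```python
import sqlite3


def make_ticket_db() -> sqlite3.Connection:
    """Create a ticket table keyed by (year, id), mirroring the reversed column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticket ("
        " year INTEGER NOT NULL,"
        " id INTEGER NOT NULL,"
        " data TEXT NOT NULL,"
        " PRIMARY KEY (year, id))"
    )
    return conn


def insert_ticket(conn: sqlite3.Connection, year: int, data: str) -> int:
    """Insert a row whose id is the next number within its year and return that id."""
    with conn:
        conn.execute(
            "INSERT INTO ticket (year, id, data) "
            "SELECT ?, IFNULL(MAX(id), 0) + 1, ? FROM ticket WHERE year = ?",
            (year, data, year),
        )
        (new_id,) = conn.execute(
            "SELECT MAX(id) FROM ticket WHERE year = ?", (year,)
        ).fetchone()
    return new_id
```

Each year starts again at 1 without any reset job. On a server with concurrent writers, the read of the maximum and the insert would need to be protected against races, which this single-connection sketch does not address.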
Is there a possibility of getting a unique timestamp value for each record in MySQL?
I created a sample table
CREATE TABLE t1 (id int primary key, name varchar(50),
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP );
and ran some sample INSERTs, and it seems the timestamp values are duplicated.
e.g insert into t1(id,name) values(1,"test");
As of MySQL 5.6.4, TIMESTAMP columns can store fractional seconds; however, even fractional seconds aren't guaranteed to be unique. In practice they'd most often be unique, especially if you limited MySQL to a single thread.
You can use a UUID if you need a unique number that is ordered temporally.
SELECT UUID(); yields something like:
45f9b8d6-8f00-11e1-8920-842b2b55ce56
And some time later:
004b721a-8f01-11e1-8920-842b2b55ce56
The first three portions of a UUID consist of the time, however, they're in order from highest precision to least, so you'd need to reverse the first three portions using SUBSTR() and CONCAT() like this:
-- call UUID() once so that all three parts come from the same value
SELECT CONCAT(SUBSTR(u, 16, 3), '-', SUBSTR(u, 10, 4),
'-', SUBSTR(u, 1, 8))
FROM (SELECT UUID() AS u) AS t
Yields:
1e1-8f00-45f9b8d6
You obviously couldn't use a function like this as a default value, so you'd have to set it in code, but it's a guaranteed unique temporally ordered value. UUID() works at a much lower level than seconds (clock cycles), so it's guaranteed unique with each call and has low overhead (no locking like auto_increment).
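Since the value has to be set in code anyway, the same reordering can be done in the application. A minimal Python sketch, assuming the input is a version 1 (time-based) UUID in its usual text form; the function name is made up. Note that the reordered prefix keeps only the time fields and drops the clock sequence and node parts of the UUID.

```python
import uuid


def time_ordered_prefix(uuid_text: str) -> str:
    """Reorder the three time fields of a version 1 UUID from most to least significant."""
    parsed = uuid.UUID(uuid_text)
    if parsed.version != 1:
        raise ValueError("only version 1 (time-based) UUIDs carry a timestamp")
    time_low, time_mid, time_hi_version = str(parsed).split("-")[:3]
    # Drop the leading version digit of the third field, as SUBSTR(..., 16, 3) does.
    return f"{time_hi_version[1:]}-{time_mid}-{time_low}"
```

Applied to the two sample UUIDs above, the results sort in the order in which the UUIDs were generated.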
Using the UUID() on the database server may be preferred to using a similar function, such as PHP's microtime() function on the application server because your database server is more centralized. You may have more than one application (web) server, which may generate colliding values, and microtime() still doesn't guarantee unique values.
Useful reading for understanding the components of UUID
Universally unique identifier (UUID)
Extracting timestamp and MAC address from UUIDs
Yes, as long as you don't do two or more inserts or edits within one second. The only problem is that a lot of stuff can be done during a second, e.g. multiple inserts or automatic updates using a WHERE clause. That rules out the simple way to force unique timestamps: adding a unique constraint to the timestamp column.
Why should a timestamp be unique? Use auto increment or something else if you need unique index etc.
If you need more precise time values than timestamp, see:
http://dev.mysql.com/doc/refman/5.5/en/fractional-seconds.html (Note: fractional part is discarded during insert. Not helping...)
http://codeigniter.com/forums/viewthread/66849/ (Apparently double(13,3) makes it possible to add microtime into DB.)
MySQL greater than problem with microtime timestamp (int multiplied with 100 or 1000 could also work. Here decimal is preferred over double.)
Why doesn't MySQL support millisecond / microsecond precision?
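The integer approach mentioned in the list above (a timestamp multiplied by 1000 and stored in an integer column) can be sketched as follows. This is an illustration only: the function names are invented, and the datetimes are assumed to be timezone-aware UTC values. Integer arithmetic avoids the rounding issues of floating-point columns.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)
```

Higher precision still does not make the values unique, so a separate key is needed if uniqueness matters.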