MySql Indexing part of a column - mysql

I need to search a medium sized MySql table (about 15 million records).
My query searches for a value ending with another value, for example:
SELECT * FROM {tableName} WHERE {column} LIKE '%{value}'
{value} is always 7 characters long.
{column} is sometimes 8 characters long (otherwise it is 7).
Is there a way to improve the performance of my search?
Clearly an ordinary index on {column} is not an option.
I could save the {column} values in reverse order in another column and index that column, but I'm looking to avoid this solution.

"{value} is always 7 characters long"
Your data is not normalized. Fixing this is the way to fix the problem; anything else is a hack. Having said that, I accept it is not always practical to repair damage done in the past by dummies.
However, the most appropriate hack depends on a whole lot of information you've not told us about, such as:
- how frequently you will run the query
- what the format of the composite data is
"but I'm looking to avoid this solution."
Why? It's a reasonable way to address the problem. The only downside is that you need to maintain the new attribute. Given that this data domain appears in different attributes in multiple tables (another normalization violation), it would make more sense to implement the index in a separate EAV relation; you then just need to add triggers on the original table to keep it in sync, and your existing code base stays as it is. Every solution I can think of will likely require a similar fix.
Here's a simplified example (no multiple attributes) to get you started:
CREATE TABLE lookup (
  table_name VARCHAR(18) NOT NULL,
  record_id INT NOT NULL, /* or whatever */
  suffix VARCHAR(7),
  PRIMARY KEY (table_name, record_id),
  INDEX (suffix, table_name, record_id)
);
/* each trigger needs its own name */
CREATE TRIGGER insert_suffix AFTER INSERT ON yourtable
FOR EACH ROW
  REPLACE INTO lookup (table_name, record_id, suffix)
  VALUES ('yourtable', NEW.id, RIGHT(NEW.attribute, 7));
CREATE TRIGGER update_suffix AFTER UPDATE ON yourtable
FOR EACH ROW
  REPLACE INTO lookup (table_name, record_id, suffix)
  VALUES ('yourtable', NEW.id, RIGHT(NEW.attribute, 7));
CREATE TRIGGER delete_suffix AFTER DELETE ON yourtable
FOR EACH ROW
  DELETE FROM lookup WHERE table_name = 'yourtable' AND record_id = OLD.id;
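To see the lookup-table idea end to end without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It is an assumption-laden illustration, not the MySQL code above: SQLite's trigger syntax differs, `substr(x, -7)` plays the role of `RIGHT(x, 7)`, and `find_by_suffix` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (id INTEGER PRIMARY KEY, attribute TEXT NOT NULL);
    CREATE TABLE lookup (
        table_name TEXT NOT NULL,
        record_id  INTEGER NOT NULL,
        suffix     TEXT,
        PRIMARY KEY (table_name, record_id)
    );
    CREATE INDEX lookup_suffix ON lookup (suffix, table_name, record_id);
    -- SQLite spelling of the insert trigger; substr(x, -7) stands in for RIGHT(x, 7).
    CREATE TRIGGER insert_suffix AFTER INSERT ON yourtable
    BEGIN
        INSERT OR REPLACE INTO lookup (table_name, record_id, suffix)
        VALUES ('yourtable', NEW.id, substr(NEW.attribute, -7));
    END;
""")

# Two rows end with 1234567 (one is 8 characters long), one does not.
conn.executemany(
    "INSERT INTO yourtable (attribute) VALUES (?)",
    [("A1234567",), ("1234567",), ("7654321",)],
)

def find_by_suffix(conn, value):
    """Equality search on the indexed suffix column instead of LIKE '%value'."""
    rows = conn.execute(
        "SELECT record_id FROM lookup "
        "WHERE table_name = 'yourtable' AND suffix = ? ORDER BY record_id",
        (value,),
    )
    return [r[0] for r in rows]

matches = find_by_suffix(conn, "1234567")
```

The point of the design is that the search becomes an equality test on an indexed column, so the leading wildcard disappears from the query.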

If you have a fixed set of options for the first character, then you can use IN. For instance:
where column in ('{value}', '0{value}', '1{value}', . . . )
This allows MySQL to use an index on the column.
Unfortunately, with a wildcard at the beginning of the pattern, it is hard to use an index. Is it possible to store the first character in another column?
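If the application builds the query, the IN list can be generated rather than typed out. The sketch below is only an illustration: it assumes the optional first character is a digit, and the table and column names (`my_table`, `code`) and both function names are made up. It uses the `%s` placeholder style that common Python MySQL drivers accept, so the values are passed separately from the SQL text.

```python
import string

def suffix_candidates(value, prefixes):
    """Return every exact string the column could hold if it ends with `value`."""
    # The bare value covers the 7-character case; each prefix covers an 8-character case.
    return [value] + [p + value for p in prefixes]

def build_in_query(table, column, value, prefixes):
    """Build a parameterized IN query plus the list of parameters to bind."""
    candidates = suffix_candidates(value, prefixes)
    placeholders = ", ".join(["%s"] * len(candidates))
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, candidates

# Assumption: the optional leading character is one of the digits 0-9.
sql, params = build_in_query("my_table", "code", "1234567", string.digits)
```

With ten possible prefixes this yields eleven exact-match values, each of which can be resolved through an ordinary index on the column.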

Related

Can I add rows to MySQL before removing all old rows (except same primary)?

If I have a table that has these rows:
animal (primary)
-------
man
dog
cow
and I want to delete all the rows and insert my new rows (that may contain some of the same data), such as:
animal (primary)
-------
dog
chicken
wolf
I could simply do something like:
delete from animal;
and then insert the new rows.
But when I do that, for a split second, 'dog' won't be accessible through the SELECT statement.
I could simply insert ignore the new data and then delete the rest, one by one, but that doesn't feel like the right solution when I have a lot of rows.
Is there a way to insert the new data and then have MySQL automatically delete the rest afterward?
I have a program that selects data from this table every 5 minutes (and the code I'm writing now will be updating this table once every 30 minutes), so I would like to be as accurate as possible at all times, and I would rather have too many rows for a split second than too few rows for the same time.
Note: I know that this may seem like it is unnecessary but I just feel like if I leave too many of those unlikely possibilities in different places, there will be times where things go wrong.
You may want to use TRUNCATE instead of DELETE here. TRUNCATE is faster than DELETE and resets the table back to its empty state (meaning AUTO_INCREMENT counters are reset to their starting values as well).
Not sure why you're having problems with selecting a value that was deleted and re-added, maybe I'm missing some context. But if you're wiping the table clean, you might want to use truncate instead.
You could add another column timestamp and change the select statement to accommodate this scenario where it needs to check for the latest value.
If this is for school, I would argue that you need a timestamp and that is what your professor is looking for. You shouldn't need to truncate a table to get the latest values, you need to adjust the thinking behind the table and how you are querying data. Hope this helps!
Check out these:
How to make a mysql table with date and time columns?
Why not update values instead?
My other questions would be:
How are you loading this into the table?
What does that code look like?
Can you change the way you Select from the table?
What values are being "updated" and change in such a way that you need to truncate the entire table?
If you don't want to add a new column, there is another method.
1. First, update the table in a way that marks all existing rows for later deletion. For example:
UPDATE `table_name` SET `animal`=CONCAT('MUST_BE_DELETED_', `animal`)
2. Second, insert the new rows.
3. Finally, remove all marked rows:
DELETE FROM `table_name` WHERE `animal` LIKE 'MUST_BE_DELETED_%'
You could implement this by having the updated_on column as timestamp and you may even utilize some default values, but let's go with an example without them.
I presume the table would look something like this:
CREATE TABLE `my_table` (
  `animal` varchar(255) NOT NULL,
  `updated_on` timestamp,
  PRIMARY KEY (`animal`)
) ENGINE=InnoDB
This is just a dummy table example. What's important are the two queries later on.
You would simply perform a query to insert the data, such as:
insert into my_table(animal)
select animal from my_view where animal = 'dogs'
on duplicate key update
updated_on = current_timestamp;
Please notice that my_view is your table/view/query by which you supply the values to insert into your table. Also notice that you need to have primary/unique key constraint on your animal column in this example, in order to work.
Then, you proceed with the following query, to "purge" (delete) the old values:
delete from my_table
where updated_on < (
select *
from (
select max(updated_on) from my_table
) as max_date
);
Please notice that you could make a separate view in order to obtain this max_date value for updated_on entry. This entry should indicate the timestamp for your last updated/inserted values in a previous query, so you could proceed with utilizing it in a where clause in order to issue deletion of old records that you don't want/need anymore.
IMPORTANT NOTE:
Since you are running multiple queries that are supposed to act as a single operation, I'd advise you to wrap them in a single transaction and to roll back properly on the various failure outcomes (e.g. in case of MySQL exceptions). You might wish to use a stored procedure for that.
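As a rough illustration of the single-transaction advice, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It is a simpler variant than the `updated_on` approach: it inserts the new rows first and then deletes everything not in the new set, so a row such as 'dog' never disappears. `replace_rows` is a hypothetical helper, and `INSERT OR IGNORE` is SQLite's spelling of MySQL's `INSERT IGNORE`.

```python
import sqlite3

def replace_rows(conn, new_animals):
    """Insert the new rows first, then delete the stale ones, in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR IGNORE INTO animal (animal) VALUES (?)",
            [(a,) for a in new_animals],
        )
        placeholders = ", ".join("?" * len(new_animals))
        conn.execute(
            f"DELETE FROM animal WHERE animal NOT IN ({placeholders})",
            list(new_animals),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animal (animal TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO animal VALUES (?)", [("man",), ("dog",), ("cow",)])
conn.commit()

replace_rows(conn, ["dog", "chicken", "wolf"])
rows = sorted(r[0] for r in conn.execute("SELECT animal FROM animal"))
```

Because both statements commit together, a reader sees either the old set of rows or the new one, never an empty table in between.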

mySQL valid bit - alternatives?

Currently, I have a mySQL table with columns that looks something like this:
run_date DATE
name VARCHAR(10)
load INTEGER
sys_time TIME
rec_time TIME
valid TINYINT
The column valid is essentially a valid bit, 1 if this row is the latest value for this (run_date,name) pair, and 0 if not. To make insertions simpler, I wrote a stored procedure that first runs an UPDATE table_name SET valid = 0 WHERE run_date = X AND name = Y command, then inserts the new row.
The table reads are in such a way that I usually use only the valid = 1 rows, but I can't discard the invalid rows. Obviously, this schema also has no primary key.
Is there a better way to structure this data or the valid bit, so that I can speed up both inserts and searches? A bunch of indexes on different orders of columns gets large.
In all of the suggestions below, get rid of valid and the UPDATE of it. That is not scalable.
Plan A: At SELECT time, use 'groupwise max' code to locate the latest run_date, hence the "valid" entry.
Plan B: Have two tables and change both when inserting: history, with PRIMARY KEY(name, run_date) and a simple INSERT statement; current, with PRIMARY KEY(name) and INSERT ... ON DUPLICATE KEY UPDATE. The "usual" SELECTs need only touch current.
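Plan B can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. This is an illustration under assumptions: the table names `run_history` and `run_current`, the column `load_value` and the helper `record` are made up, only three of the original columns are kept, and `INSERT OR REPLACE` stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run_history (
        name TEXT NOT NULL,
        run_date TEXT NOT NULL,
        load_value INTEGER,
        PRIMARY KEY (name, run_date)
    );
    CREATE TABLE run_current (
        name TEXT PRIMARY KEY,
        run_date TEXT NOT NULL,
        load_value INTEGER
    );
""")

def record(conn, name, run_date, load_value):
    """Append to the history table and overwrite the row in the current table."""
    with conn:  # both writes succeed or fail together
        conn.execute(
            "INSERT INTO run_history VALUES (?, ?, ?)",
            (name, run_date, load_value),
        )
        conn.execute(
            "INSERT OR REPLACE INTO run_current VALUES (?, ?, ?)",
            (name, run_date, load_value),
        )

record(conn, "a", "2024-01-01", 10)
record(conn, "a", "2024-01-02", 20)

history_count = conn.execute("SELECT COUNT(*) FROM run_history").fetchone()[0]
current_rows = conn.execute("SELECT name, run_date, load_value FROM run_current").fetchall()
```

The usual reads touch only the small current table, and no UPDATE of a valid flag is needed on insert.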
Another issue: TIME is limited to 838:59:59 and is intended to mean 'time of day', not 'elapsed time'. For the latter, use INT UNSIGNED (or some variant of INT). For formatting, you can use sec_to_time(). For example sec_to_time(3601) -> 01:00:01.
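If the elapsed time is stored as an integer number of seconds, the formatting can also be done in application code. The function below is a small sketch of that arithmetic; `format_elapsed` is a hypothetical name, and it assumes a non-negative integer input.

```python
def format_elapsed(seconds):
    """Format a non-negative number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

one_hour_one_second = format_elapsed(3601)  # "01:00:01"
```

Unlike a TIME column, an integer has no 838-hour ceiling, so `format_elapsed(3600000)` gives "1000:00:00".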

MySQL performance issue on ~3million rows containing MEDIUMTEXT?

I had a table with 3 columns and 3600K rows. Using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. When calling SELECT * FROM table WHERE id=00000, MySQL took anywhere from 54 seconds to 3 minutes.
For testing I created a VARCHAR(8)-VARCHAR(5)-VARCHAR(5) table with data randomly generated by numpy.random.randint. SELECT takes 3 sec without a primary key. With the same random data in a VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT table, SELECT took 15 sec without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (or, is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the integer 0, so MySQL will have to check all rows, because '0', '0000' and even ' 0' will all be cast to the integer 0. So your primary key on id will not help and you will end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids are still unique once cast to integer (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that references id, you have to drop them first and, before recreating them, change the datatype in the other columns too.
If you store your values in a known format (e.g. no leading zeros, or padded with 0s up to a length of 8), the second best option is to use this exact format in your query, and to include the ' so the value is not cast to integer. If you e.g. always pad with 0s to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any zeros, still add the ':
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
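The difference between the two comparisons can be illustrated outside the database. The sketch below is only a loose analogy: Python's `float()` is not MySQL's conversion routine, and both function names are made up. It shows why a numeric comparison has to look at every stored string while a string comparison matches exactly one.

```python
stored_ids = ["0", "0000", " 0", "00000", "12"]

def numeric_matches(ids, number):
    """Every stored string must be converted before it can be compared."""
    return [s for s in ids if float(s) == number]

def string_matches(ids, text):
    """An exact string comparison, the kind an index on the column can answer."""
    return [s for s in ids if s == text]

as_number = numeric_matches(stored_ids, 0)
as_string = string_matches(stored_ids, "00000")
```

Here `as_number` contains four different strings, while `as_string` contains only '00000'.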
If your id column contains only numbers, define it as int, because int will give you better performance (it is faster).
Make the column in your table (the one defined as the key) an integer and retry. First check performance by running a test within your DB (Workbench or a simple command line). You should get a better result.
Then, and only if needed (I doubt it though), modify your Python code to convert from integer to string (and/or vice versa) when referencing the key column.

Is it possible to insert a new row at top of MySQL table?

All rows in MySQL tables are being inserted like this:
1
2
3
Is there any way to insert a new row at the top of the table so that the table looks like this?
3
2
1
Yes, yes, I know "order by" but let me explain the problem. I have a dating website and users can search profiles by sex, age, city, etc. There are more than 20 search criteria and it's not possible to create indexes for each possible combination. So, if I use "order by", the search usually ends with "using temporary, using filesort" and this causes a very high server load. If I remove "order by" oldest profiles are shown as first and users have to go to the last page to see the new profiles. That's very bad because first pages of search results always look the same and users have a feeling that there are no new profiles. That's why I asked this question. If it's not possible to insert last row at top of table, can you suggest anything else?
The order in which the results are returned when there's no ORDER BY clause depends on the RDBMS. In the case of MySQL, or at least most engines, if you don't explicitly specify the order it will usually appear to be ascending, from oldest to newest entries, but that is not guaranteed. Where the row is located "physically" doesn't matter. I'm not sure if all MySQL engines work that way though. For example, in PostgreSQL the "default" order shows the most recently updated rows first. This might be the way some of the MySQL engines work too.
Anyway, the point is: if you want the results ordered, always specify the sort order; don't just depend on some default that seems to work. In your case you want something trivial, the users in descending order, so just use:
SELECT * FROM users ORDER BY id DESC
I think you just need to make sure that if you always need to show the latest data first, all of your indexes need to specify the date/time field first, and all of your queries order by that field first.
If ORDER BY is slowing everything down then you need to optimise your queries or your database structure, I would say.
Maybe you could add the id 'by hand' and give it a negative value, but I (and probably nobody else) would not recommend that:
Regular insert, e.g.
insert into t values (...);
Update with set, e.g.
update t set id = -id where id = last_insert_id();
Normally you specify an auto-incrementing primary key.
However, you can just specify the primary key like so:
CREATE TABLE table1 (
  id INTEGER PRIMARY KEY DEFAULT 1, -- no auto_increment, but has a default value
  other fields .....
Now add a BEFORE INSERT trigger that changes the primary key.
DELIMITER $$
CREATE TRIGGER ai_table1_each BEFORE INSERT ON table1 FOR EACH ROW
BEGIN
DECLARE new_id INTEGER;
SELECT COALESCE(MIN(id), 0) -1 INTO new_id FROM table1;
SET NEW.id = new_id;
END $$
DELIMITER ;
Now your id will start at -1 and run down from there.
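The insert trigger assigns the id inside the database, so the application never computes it itself. If two concurrent inserts ever pick the same value, the primary key will reject one of them, and that insert can simply be retried.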
I know that a lot of time has passed since the above question was asked. But I have something to add to the comments:
I'm using MySQL version: 5.7.18-0ubuntu0.16.04.1
When no ORDER BY clause is used with SELECT, I noticed that records are displayed in the table's primary key sequence, regardless of the order in which they were added.

How do I reset sequence numbers to become consecutive?

I've got a mysql table where each row has its own sequence number in a "sequence" column. However, when a row gets deleted, it leaves a gap. So...
1
2
3
4
...becomes...
1
2
4
Is there a neat way to "reset" the sequencing, so it becomes consecutive again in one SQL query?
Incidentally, I'm sure there is a technical term for this process. Anyone?
UPDATED: The "sequence" column is not a primary key. It is only used for determining the order that records are displayed within the app.
If the field is your primary key...
...then, as stated elsewhere on this question, you shouldn't be changing IDs. The IDs are already unique and you neither need nor want to re-use them.
Now, that said...
Otherwise...
It's quite possible that you have a different field (that is, as well as the PK) for some application-defined ordering. As long as this ordering isn't inherent in some other field (e.g. if it's user-defined), then there is nothing wrong with this.
You could recreate the table using a (temporary) auto_increment field and then remove the auto_increment afterwards.
I'd be tempted to UPDATE in ascending order and apply an incrementing variable.
SET @i = 0;
UPDATE `table`
SET `myOrderCol` = @i:=@i+1
ORDER BY `myOrderCol` ASC;
(Query not tested.)
It does seem quite wasteful to do this every time you delete items, but unfortunately with this manual ordering approach there's not a whole lot you can do about that if you want to maintain the integrity of the column.
You could possibly reduce the load, such that after deleting the entry with myOrderCol equal to, say, 5:
SET @i = 4; -- start one below the deleted value so the next row becomes 5
UPDATE `table`
SET `myOrderCol` = @i:=@i+1
WHERE `myOrderCol` > 5
ORDER BY `myOrderCol` ASC;
(Query not tested.)
This will "shuffle" all the following values down by one.
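The renumbering itself is simple to express outside SQL, which may help when checking what the UPDATE should produce. This is a minimal sketch with a hypothetical function name; it maps each existing sequence value to its new consecutive position.

```python
def renumber(values):
    """Map each existing sequence value to its new consecutive position, starting at 1."""
    return {old: new for new, old in enumerate(sorted(values), start=1)}

mapping = renumber([1, 2, 4])  # 4 moves down to fill the gap left by 3
```

Values below the gap keep their position, and every value above it moves down by one.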
I'd say don't bother. Reassigning sequential values is a relatively expensive operation, and if the column value is for ordering purposes only there is no good reason to do it. The only concern you might have is if, for example, your column is UNSIGNED INT and you suspect that in the lifetime of your application you might use more than 4,294,967,295 values (including deleted rows) and go out of range. Even if that is your concern, you can do the reassigning as a one-time task 10 years later when that happens.
This is a question that I often read here and in other forums. As already written by zerkms, this is a false problem. Moreover, if your table is related to other ones you'll lose the relations.
Just for learning purposes, a simple way is to store your data in a temporary table, truncate the original one (this resets auto_increment) and then repopulate it.
Silly example:
create table seq (
id int not null auto_increment primary key,
col char(1)
) engine = myisam;
insert into seq (col) values ('a'),('b'),('c'),('d');
delete from seq where id = 3;
create temporary table tmp select col from seq order by id;
truncate seq;
insert into seq (col) select * from tmp;
but it's totally useless. ;)
If this is your PK then you shouldn't change it. PKs should be (mostly) unchanging columns. If you were to change them then not only would you need to change it in that table but also in any foreign keys where it exists.
If you do need a sequential sequence then ask yourself why. In a table there is no inherent or guaranteed order (even in the PK, although it may turn out that way because of how most RDBMSs store and retrieve the data). That's why we have the ORDER BY clause in SQL. If you want to be able to generate sequential numbers based on something else (time added into the database, etc.) then consider generating that either in your query or with your front end.
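Assuming that this is an ID field, you can look up the first unused ID after an existing one when you insert:
INSERT INTO yourTable (ID)
SELECT MIN(t1.ID) + 1
FROM yourTable t1
LEFT JOIN yourTable t2 ON t2.ID = t1.ID + 1
WHERE t2.ID IS NULL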
As others have mentioned I don't recommend doing this. It will hold a table lock while the next ID is evaluated.