MySQL find INT in string - mysql

I have a MEDIUMTEXT column that contains values produced by a GROUP_CONCAT, in the form INT,INT,INT. We can call it Concatenated_IDs.
The string can hold one INT or more.
I need to break it back down into the original values somehow, to be able to do something such as
SELECT
    table_country.name
FROM
    table_country
WHERE
    table_country.country_id IN (
        SELECT
            Concatenated_IDs
        FROM
            table_targeted_countries
        WHERE
            table_targeted_countries.email LIKE "%gmail.com")
and get the names of the countries targeted by users who registered with a Gmail address.
I have considered exploding the MEDIUMTEXT into INTs, creating one row for each INT, sort of like a reverse concat, but I am guessing that would take a large procedure.
Edit: reformulated the question title.

You should probably normalize that table, so those concatenated IDs are stored in a separate table, one ID per record. But in the meantime, you can use MySQL's FIND_IN_SET() function:
SELECT ...
WHERE FIND_IN_SET(table_country.country_id, Concatenated_IDs) > 0
Relevant docs: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
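Applied to the tables from the question, the full query might look like this sketch (reusing the question's table and column names; untested):
SELECT table_country.name
FROM table_country
JOIN table_targeted_countries
  ON FIND_IN_SET(table_country.country_id,
                 table_targeted_countries.Concatenated_IDs) > 0
WHERE table_targeted_countries.email LIKE "%gmail.com";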

Related

MySQL: LIKE CONCAT Replacement --> less performance heavy

So I have a SELECT statement that compares the content of the table_1 column "table_1_content" with the content of another column (table_2_content) in table_2, where the content of "table_2_content" can be found anywhere in "table_1_content":
$select = "SELECT * FROM table_1, table_2 WHERE `table_1_content` LIKE CONCAT('%', table_2_content, '%')";
$result = mysqli_query($con, $select);
My problem is that LIKE CONCAT is pretty performance-heavy.
Is there another way to search through two columns from different tables, so that no full table scan is performed every time the query is executed?
The LIKE in fully free-text format (% at the start and at the end of the search string) is the performance-heavy part. Is the wildcard at the start of the string necessary? If so, you might have to consider pre-processing the data in a different way, so that the search can use a single wildcard or no wildcard at all. Depending on the data, this can be done by splitting the string on a delimiter and storing the pieces in separate rows, after which a much faster comparison is possible and indexes can be used.
To put the data into multiple rows, we assume a usable separator (there can be more than one; the code just gets longer):
CREATE TABLE baseinfo (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
    -- some other columns
);
CREATE TABLE explodedstring (
    id INT NOT NULL,
    str VARCHAR(200),
    FOREIGN KEY (id) REFERENCES baseinfo(id)
);
CREATE PROCEDURE explodestring(id INT, fullstr VARCHAR(4000))
BEGIN
    -- {many examples exist already how to do this on SO}
END;
The procedure would take as input the key from the original data (id in this case) and the original string.
The output of the procedure would end up in the secondary table explodedstring, against which you can now run a normal SELECT (add an index for performance). The resulting ids tell you which records match.
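As a rough idea of what that procedure body might contain, here is a minimal sketch assuming a comma as the separator (illustrative only, not the answerer's code):
DELIMITER //
CREATE PROCEDURE explodestring(p_id INT, fullstr VARCHAR(4000))
BEGIN
    DECLARE pos INT;
    -- peel off one comma-separated piece per iteration
    WHILE CHAR_LENGTH(fullstr) > 0 DO
        SET pos = LOCATE(',', fullstr);
        IF pos = 0 THEN
            INSERT INTO explodedstring (id, str) VALUES (p_id, fullstr);
            SET fullstr = '';
        ELSE
            INSERT INTO explodedstring (id, str) VALUES (p_id, LEFT(fullstr, pos - 1));
            SET fullstr = SUBSTRING(fullstr, pos + 1);
        END IF;
    END WHILE;
END//
DELIMITER ;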

MySQL performance issue on ~3 million rows containing MEDIUMTEXT?

I had a table with 3 columns and 3600K rows, using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. Calling SELECT * FROM table WHERE id=00000 took MySQL anywhere from 54 seconds to about 3 minutes.
For testing I created a table with columns VARCHAR(8)-VARCHAR(5)-VARCHAR(5), with data randomly generated from numpy.random.randint. A SELECT took 3 seconds without a primary key. With the same random data in a VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT table, the SELECT took 15 seconds without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (Or is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the integer 0, so MySQL will have to check all rows, because '0', '0000' and even ' 0' will all be cast to the integer 0. So your primary key on id will not help and you will end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
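A quick way to see this casting behaviour for yourself (an illustration, not from the original answer):
SELECT '0' = 0, '0000' = 0, ' 0' = 0;
-- all three comparisons return 1 (true)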
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids are unique when cast to integers (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that reference id, you have to drop them first and, before recreating them, change the datatype in the referencing columns too.
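To check up front whether your ids remain unique once cast to integers, a query along these lines (an illustrative suggestion) would do:
SELECT CAST(id AS UNSIGNED) AS int_id, COUNT(*)
FROM `table`
GROUP BY int_id
HAVING COUNT(*) > 1;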
If you store your values in a known format (e.g. no leading zeros, or padded with 0s up to a length of 8), the second-best option is to use that exact format in your query and to include the quotes, so the value is not cast to an integer. If you e.g. always pad to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any leading zeros, still add the quotes:
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
If your id column contains only numbers, define it as INT, because INT will give you better performance (it is faster).
Make the key column in your table an integer and retry. First check performance by running a test within your DB (Workbench or the plain command line); you should get a better result.
Then, and only if needed (I doubt it will be), modify your Python code to convert between integer and string when referencing the key column.

How to create a random, unique (not repeated), fixed-length (9 digits) primary key in a MySQL database?

As the title says: how do I create a 9-digit numeric primary key which is random, unique, not repeated, and in the range 100000000 to 999999999?
The method must also work on a GoDaddy server; GoDaddy seems to have many limitations.
I can only think of two reliable ways of creating unique numbers.
Use a systematic process, such as auto-incrementing, where you know the numbers are unique.
Store generated numbers in a table.
You want random numbers, so the first method could be applied using a pseudo-random number generator. But the second is probably simpler to implement.
It goes something like this:
create table numbers (
numberid int auto_increment primary key,
n varchar(10) not null unique
);
Then you need to create numbers using a loop, doing the following until it succeeds:
insert into numbers (n)
    select cast(floor(rand() * 900000000) + 100000000 as char);
You can then use LAST_INSERT_ID() to get the id of the most recently inserted number.
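That "repeat until it succeeds" loop could be wrapped in a stored procedure along these lines (a sketch; the procedure name and OUT parameter are illustrative, not part of the original answer):
DELIMITER //
CREATE PROCEDURE next_number(OUT new_n VARCHAR(10))
BEGIN
    DECLARE dup INT DEFAULT 1;
    -- retry on duplicate-key error (1062) until the insert succeeds
    DECLARE CONTINUE HANDLER FOR 1062 SET dup = 1;
    WHILE dup = 1 DO
        SET dup = 0;
        SET new_n = CAST(FLOOR(RAND() * 900000000) + 100000000 AS CHAR);
        INSERT INTO numbers (n) VALUES (new_n);
    END WHILE;
END//
DELIMITER ;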
If pseudo-random is OK for you, you could create a trigger like this:
create trigger tr_setid before insert on mytable for each row
set new.id := (
    select mod((count(*) ^ 42) * 479001599 + 714320596, 900000000) + 100000000
    from mytable);
This system is not good if you also delete records from your table, as this solution assumes count(*) is one larger every time this trigger runs.
The multiplier is a prime and not a divisor of 900000000, guaranteeing that no duplicate number will be generated before all possible numbers have been visited.
The ^ operator just remaps the count(*) to make the generated series a bit less predictable.
With this trigger the first 10 records in the table will get these id values:
232387754
711389353
174384556
653386155
348394150
827395749
290390952
769392551
900374962
479376561

Can I create a mapping from integer values in a column to the text values they represent in SQL?

I have a table full of traffic accident data with column headers such as 'Vehicle_Manoeuvre', which contains integers; for example, 13 represents that the vehicle manoeuvre which caused the accident was 'overtaking moving vehicle'.
I know the mappings from integers to text, as I have a (quite large) Excel file with this data.
An example of what I want to know is the percentage of accidents that involved this type of manoeuvre, but I don't want to have to open the Excel file and look up the mappings from integers to text every time I write a query.
I could manually change the integers in all the columns (write a query with all the possible mappings of each column, add them as new columns, then delete the original columns), but this would take a long time.
Is it possible to create some kind of variable (like an array whose first column holds the integers and whose second column holds the mapped text) that SQL could use to understand how the text relates to the integers, allowing me to write the query below:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre='overtaking moving vehicle';
rather than:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre=13;
even though the data in the table is still in integer form?
You would do this with a Manoeuvres reference table:
create table Manoeuvres (
ManoeuvreId int primary key,
Name varchar(255) unique
);
insert into Manoeuvres(ManoeuvreId, Name)
values (13, 'Overtaking');
You might even have such a table already, if you know that 13 has a special meaning.
Then use a join:
SELECT COUNT(*)
FROM traffictable tt
JOIN Manoeuvres m
  ON tt.Vehicle_Manoeuvre = m.ManoeuvreId
WHERE m.Name = 'Overtaking';
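Since the question also asks for percentages, the same join extends naturally (a follow-up sketch reusing the names above, not part of the original answer):
SELECT m.Name,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM traffictable) AS pct
FROM traffictable tt
JOIN Manoeuvres m ON tt.Vehicle_Manoeuvre = m.ManoeuvreId
GROUP BY m.Name;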

LPAD with leading zero

I have a table with invoice numbers. Guidelines say that the numbers should have 6 or more digits. First I tried:
UPDATE t1 SET NUMER=CONCAT('00000',NUMER) WHERE LENGTH(NUMER)=1;
UPDATE t1 SET NUMER=CONCAT('0000',NUMER) WHERE LENGTH(NUMER)=2;
UPDATE t1 SET NUMER=CONCAT('000',NUMER) WHERE LENGTH(NUMER)=3;
UPDATE t1 SET NUMER=CONCAT('00',NUMER) WHERE LENGTH(NUMER)=4;
UPDATE t1 SET NUMER=CONCAT('0',NUMER) WHERE LENGTH(NUMER)=5;
but that isn't efficient, or even pretty. I tried the LPAD function, but then a problem appeared, because the statement
UPDATE t1 SET NUMER=LPAD(NUMER,6,'0') WHERE CHAR_LENGTH(NUMER)<=6;
returns ZERO rows affected. I also googled, and people say that putting the zero in quotes will solve the problem, but it didn't. Any help? It's a daily import.
EDIT:
The column NUMER is INT(19) and already contains data like:
NUMER
----------
1203
12303
123403
1234503
...
(by now it is filled with data of different lengths, from 3 to 7 digits)
I think you should consider that the guidelines you read apply to how an invoice should be displayed, and not how it should be stored in the database.
When a number is stored as an INT, it's a pure number. If you add zeros in front and store it again, it is still the same number.
You could select the NUMER field as follows, or create a view for that table:
SELECT LPAD(NUMER,6,'0') AS NUMER
FROM ...
Or, rather than changing the data when you select it from the database, consider padding the number with zeros when you display it, and only when you display it.
I think your requirement for historical data to stay the same is a moot point. Even for historical data, an invoice numbered 001203 is the same as an invoice numbered 1203.
However, if you absolutely must do it the way you describe, then converting to a VARCHAR field may work. Converted historical data can be stored as-is, and any new entries could be padded to the required number of zeros. But I do not recommend that.
UPDATE t1 SET NUMER=LPAD(NUMER,6,'0') WHERE CHAR_LENGTH(NUMER)<=6; will not do what you expect, since the NUMER field is an int. It creates the string '001234' from the int 1234 and then casts it back to 1234 - that is why there is no change.
Change NUMER to type int(6) zerofill and MySQL will pad it for you each time you read it.
If you really want zeros stored in the database, you have to change the type to CHAR/VARCHAR, then your LPAD update statement will work.
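In concrete terms, the two alternatives might look like this (illustrative statements against the question's t1 table; the VARCHAR length matching INT(19) is an assumption):
-- Option 1: let MySQL pad the value every time it is read
ALTER TABLE t1 MODIFY NUMER INT(6) ZEROFILL;
-- Option 2: actually store the leading zeros
ALTER TABLE t1 MODIFY NUMER VARCHAR(19);
UPDATE t1 SET NUMER = LPAD(NUMER, 6, '0') WHERE CHAR_LENGTH(NUMER) < 6;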
The field in the table is an int column, so it just stores a number. There's no way to pad out the data in the table: 1 == 001 == 000000000001; it is the same number.
You should do the padding at the application level (the system that pulls the data out of the table). What happens when the invoice number goes above 999999? You would then have to update all the data in the table to add an extra 0. This kind of thing should not be done at the database level.
You could also select the data out with an LPAD:
SELECT LPAD(NUMER,6,'0'), [other_columns] FROM t1;
Alternatively, as CBroe mentioned, you could change the datatype to INT(6) ZEROFILL so that the numbers display correctly, but this will have to be modified if they go above 999999, as mentioned above.