One data result requires leading zero (MYSQL Workbench) - mysql

I have sorted data that has a column (varchar (5)) containing data 4 characters long - except one, which is 3 characters long. So it doesn't sort numerically. What I need is 0200, but what is listed in the database is 200. This is what it's supposed to look like:
0200
111X
2222
3333
This is what it looks like:
111X
200
2222
3333
How do I add the leading zero to only this numeric value, rather than to the whole field, so that the 200 sorts before the 111X?

You can use the LPAD function to pad values:
SELECT LPAD('200', 4, '0'); -- returns 0200
Sample query:
SELECT
LPAD(TheField, 4, '0') AS FieldName
FROM
YourTable
Order By 1; -- position of the field
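To see why padding fixes the order, here is a minimal Python sketch of the same idea outside MySQL. The helper name pad_code and the sample list are illustrative assumptions, not part of the original answer.

```python
def pad_code(code: str, width: int = 4) -> str:
    """Left-pad a code with zeros, similar in spirit to LPAD(code, width, '0')."""
    # Unlike MySQL's LPAD, rjust never truncates values longer than `width`.
    return code.rjust(width, "0")


# The values from the question, in the order they currently sort.
codes = ["111X", "200", "2222", "3333"]

# Sorting the padded strings puts 0200 ahead of 111X.
sorted_codes = sorted(pad_code(c) for c in codes)
print(sorted_codes)
```

Only the 3-character value changes, because the other values already have 4 characters.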

Related

MySQL extract first 4 digits

I want to run a query in mysql which will return the record where the first 4 digits are '0123' or '0798' from the following column:
Number
0123 427 6465
0123 1451
01 23 46 47
0123 945675
07984 473456
0845 46 47
(012377) 5258
0800 586931
012 3668 6098
0 1238592371
I want the query to return all records where '0123' or '0798' are the first 4 numeric characters regardless of if there are other characters before or in between. E.g. I would want record 7 returned even though '0123' is in brackets. And I would want record 10 returned even though it is written as '0 123' i.e. there is a space in between.
Is regex relevant here? If so, what would the regex expression be?
Use a combination of LEFT and REPLACE.
REPLACE will strip out any unwanted brackets and whitespaces, and LEFT will select the first four characters, starting from left, of the newly formatted value which will be used in the WHERE clause selecting for values IN '0123', '0798'.
SELECT `number` FROM Numbers WHERE LEFT(REPLACE(REPLACE(REPLACE(`number`, '(', ''), ')', ''), ' ', ''), 4) IN ('0123', '0798')
Result:
Number
0123 427 6465
0123 1451
01 23 46 47
0123 945675
07984 473456
(012377) 5258
012 3668 6098
0 1238592371
Also, it's worth noting that number is a keyword in MySQL (non-reserved, as far as I can tell, so quoting it is not strictly required). I used backticks ` to escape it anyway; it is advisable to avoid keywords in your naming conventions.
We can use the REGEXP_REPLACE function (available in MySQL 8.0 and later) to remove every character that is not a digit, and then take the first four with the query below:
SELECT LEFT(REGEXP_REPLACE(Number, '[^0-9]+', ''), 4) as 4digitonly FROM Numbers a;
Please refer to "How to get only Digits from String in mysql?"
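The digit-stripping idea can be checked outside the database. The following is a small Python sketch, assuming the sample values from the question; the names first_four_digits and PREFIXES are made up for illustration.

```python
import re

# Prefixes the question wants to match on.
PREFIXES = {"0123", "0798"}


def first_four_digits(raw: str) -> str:
    """Drop every non-digit character, then keep the first four digits."""
    return re.sub(r"[^0-9]", "", raw)[:4]


numbers = [
    "0123 427 6465",
    "0123 1451",
    "01 23 46 47",
    "0123 945675",
    "07984 473456",
    "0845 46 47",
    "(012377) 5258",
    "0800 586931",
    "012 3668 6098",
    "0 1238592371",
]

# Keep only the rows whose first four digits are one of the wanted prefixes.
matches = [n for n in numbers if first_four_digits(n) in PREFIXES]
print(matches)
```

On this sample it keeps every row except "0845 46 47" and "0800 586931".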
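Nothing is better than regex; yes, they make us think, even think recursively :)
Here is the query (it can surely be refactored further). Each alternative is anchored at the start and allows only non-digit characters before and between the four digits, so the match really is on the first four digits rather than on 0123 or 0798 appearing anywhere in the value:
SELECT n.number FROM Numbers n WHERE n.number REGEXP '^[^0-9]*0[^0-9]*1[^0-9]*2[^0-9]*3|^[^0-9]*0[^0-9]*7[^0-9]*9[^0-9]*8'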

Is it faster to search by column integer or column string in mysql?

I have a table "transactions" with a million records
id trx secret_string (varchar(50)) secret_id (int(2.))
1 80 52987624f7cb03c61d403b7c68502fb0 1
2 28 52987624f7cb03c61d403b7c68502fb0 1
3 55 8502fb052987624f61d403b7c67cb03c 2
4 61 52987624f7cb03c61d403b7c68502fb0 1
5 39 8502fb052987624f61d403b7c67cb03c 2
..
999997 27 8502fb052987624f61d403b7c67cb03c 2
999998 94 8502fb052987624f61d403b7c67cb03c 2
999999 40 52987624f7cb03c61d403b7c68502fb0 1
1000000 35 8502fb052987624f61d403b7c67cb03c 2
As you can notice, secret_string and secret_id will always match.
Let's say, I need to select records where secret_string = "52987624f7cb03c61d403b7c68502fb0".
Is it faster to do:
SELECT id FROM transactions WHERE secret_id = 1
Than:
SELECT id FROM transactions WHERE secret_string = "52987624f7cb03c61d403b7c68502fb0"
Or it does not matter? What about for other operations such as SUM(trx), COUNT(trx), AVG(trx), etc?
Column secret_id currently does not exist, but if it is faster to search records by it, I am planning to create it upon row insertions.
Thank you
I hope I make sense.
Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.
This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.
There is another reason to use an int, and that is to normalise the database. Instead of having the text '52987624f7cb03c61d403b7c68502fb0' stored thousands of times in the table, you should store its id and have the secret string stored once in a separate table. It's the same deal for other operations such as SUM, COUNT and AVG.
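A rough sketch of that normalised layout, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The secrets table, the index name and the helper sum_trx_for are hypothetical names, and the rows are the first five from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE secrets (
        secret_id     INTEGER PRIMARY KEY,
        secret_string TEXT NOT NULL UNIQUE   -- each string is stored once
    );
    CREATE TABLE transactions (
        id        INTEGER PRIMARY KEY,
        trx       INTEGER NOT NULL,
        secret_id INTEGER NOT NULL REFERENCES secrets(secret_id)
    );
    CREATE INDEX idx_transactions_secret ON transactions(secret_id);
    """
)
conn.executemany(
    "INSERT INTO secrets VALUES (?, ?)",
    [
        (1, "52987624f7cb03c61d403b7c68502fb0"),
        (2, "8502fb052987624f61d403b7c67cb03c"),
    ],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 80, 1), (2, 28, 1), (3, 55, 2), (4, 61, 1), (5, 39, 2)],
)


def sum_trx_for(secret_string: str):
    """Resolve the string to its id once, then aggregate over the int column."""
    row = conn.execute(
        """
        SELECT SUM(t.trx)
        FROM transactions AS t
        JOIN secrets AS s ON s.secret_id = t.secret_id
        WHERE s.secret_string = ?
        """,
        (secret_string,),
    ).fetchone()
    return row[0]


print(sum_trx_for("52987624f7cb03c61d403b7c68502fb0"))
```

The string comparison then happens against the small lookup table only, while the large table is filtered on the indexed integer column.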
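As the others told you, selecting by int is definitely faster than selecting by string. However, if you need to select by secret_string: all the given strings look like hex strings, so you could consider storing them in a more compact binary form instead. Note that hex() converts in the other direction (from a value to its hex string), and a 32-character hex string is 128 bits, which does not fit in a BIGINT; unhex('52987624f7cb03c61d403b7c68502fb0') stored in a BINARY(16) column is one way to do it.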
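A quick way to check how much room such a value needs is to decode it in Python. This is only a sketch for sizing; the variable names are illustrative, and 2**63 is used as the upper bound of a signed 64-bit integer.

```python
secret = "52987624f7cb03c61d403b7c68502fb0"

# Interpret the 32 hex characters as one integer and as raw bytes.
as_int = int(secret, 16)
as_bytes = bytes.fromhex(secret)

# A signed 64-bit integer can hold values below 2**63.
fits_in_bigint = as_int < 2**63

print(len(secret), len(as_bytes), as_int.bit_length(), fits_in_bigint)
```

The binary form takes 16 bytes instead of 32 characters, and the value is too wide for a single 64-bit integer column.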

pandas to_csv: how to format floats in a column with mixed types

I have a df which contains a column with both float and text values.
df.some_column
0 48.5182
1 58.2259
2 some string
3 48.5182
4 17.4928
I want to write all the values to CSV with floats rounded to 0 decimals. So the values in this column would be:
48
58
some string
48
17
When I write this to CSV with
df.to_csv(output_path,encoding='utf-8', index=False, float_format='%.0f')
the float_format is ignored and I get decimal values. If I remove the rows with strings first, the float_format is used as expected. I looked around for a way to convert the values to int, but didn't find a way to do that on the column.
It looks like I could possibly iterate through all the values and round them, but I suspect there is some more elegant way.
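You could cast the dtype to str and then strip the fractional part from the values that look like decimal numbers, leaving any other string untouched (note that this truncates rather than rounds):
In [70]:
df['some_col'] = df['some_col'].astype(str)
# keep only the whole-number part, and only for purely numeric values
df['some_col'] = df['some_col'].str.replace(r'^(-?\d+)\.\d+$', r'\1', regex=True)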
df
Out[70]:
some_col
index
0 48
1 58
2 some string
3 48
4 17
Then when you call to_csv you don't need the float_format param
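An alternative is to format each value on its own and leave the strings alone. The sketch below is plain Python so it runs without pandas; format_cell is a hypothetical helper name, and the sample list mirrors the column from the question.

```python
import math


def format_cell(value):
    """Format finite floats with no decimals; return anything else unchanged."""
    if isinstance(value, float) and math.isfinite(value):
        # Same format string the question passed as float_format.
        return "%.0f" % value
    return value


column = [48.5182, 58.2259, "some string", 48.5182, 17.4928]
formatted = [format_cell(v) for v in column]
print(formatted)
```

With pandas this could presumably be applied as df['some_column'].map(format_cell) before calling to_csv. Note that '%.0f' rounds, so 48.5182 becomes 49; the expected output in the question shows 48, which would need truncation (for example str(int(value))) instead.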

MySQL: how to search as much as substrings matches in a table of millions of strings

Let's say I have these strings in a MySQL table:
id | hash
1 | 462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654
2 | 5c2670355b6e503f39427a435a423d6d4c7c5156344c336c6c244a7234
3 | 35785c5f45373c495b70522452564b6f4531792b275e40642854772764
...
millions of records !
Now I have a set of substrings (6 character size), for example this:
["76e262", "435a42", "75e406", "95b705", "344c33"]
What I want is to know how many of these substrings are in each string, so the result could be:
id | matches
63 | 5
34 | 5
123 | 3
153 | 3
13 | 2
9 | 1
How can I achieve this in a fast way?
Real numbers and sizes are:
1) Table with 100.000/200.000 hashes
2) Main Hash size: 256 bytes
3) Substring of mini-hashes: 16 of 32 each one
NOTE: I'd like to avoid the "%LIKE%" since it's 16 likes for each row, and millions rows
You can accomplish this by using the Aho-Corasick algorithm: http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
MySQL doesn't have a function for that, so you'd need to write your own or consider using a language like Java or C to massage the data.
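If the data is processed outside MySQL and all substrings have the same length, as in the question, a simpler technique than Aho-Corasick may be enough: collect every fixed-length window of a hash into a set and intersect it with the set of substrings. The sketch below shows that swapped-in technique, not Aho-Corasick itself; count_matches and the rows dictionary are illustrative names, and the three hashes are the sample rows from the question.

```python
def count_matches(hash_value: str, substrings) -> int:
    """Count how many distinct substrings occur in hash_value.

    Assumes every substring has the same length k, so one pass over the
    k-character windows of the hash is sufficient.
    """
    wanted = set(substrings)
    if not wanted:
        return 0
    lengths = {len(s) for s in wanted}
    if len(lengths) != 1:
        raise ValueError("all substrings must have the same length")
    k = lengths.pop()
    # Every k-character window of the hash, deduplicated.
    windows = {hash_value[i:i + k] for i in range(len(hash_value) - k + 1)}
    return len(windows & wanted)


rows = {
    1: "462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654",
    2: "5c2670355b6e503f39427a435a423d6d4c7c5156344c336c6c244a7234",
    3: "35785c5f45373c495b70522452564b6f4531792b275e40642854772764",
}
substrings = ["76e262", "435a42", "75e406", "95b705", "344c33"]

# id -> number of substrings found, highest count first.
counts = {row_id: count_matches(h, substrings) for row_id, h in rows.items()}
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

The work per hash depends on its length rather than on the number of substrings, which is the property the 16-LIKEs-per-row approach lacks.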
How about a different approach?
You could also consider having a shifting mechanism for your data and the check on the shifting. For example, if your key is 462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654 and you know that your hash will have 58 chars, then you would have these variations:
62a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f26544
2a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f265446
a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654462
276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654462a
...
Each one of these would be in a column, every one of them would be indexed.
So your query would be simply:
Select * from table where hash like "a276e262%" or s1 like "a276e262%" ...
Note that this would be MUCH faster than LIKE "%value%" as the column is indexed and the LIKE is only checking the begins with.
There are many disadvantages to this solution: space required for the extra columns, insertion and update time would increase because of the time spent calculating the shifted columns, and time required to process the result of the select. But you wouldn't need to implement the algorithm in MySQL.
You could also require that the minimum length of the string being searched is 6 chars, so you won't need to shift the whole string, only to keep the first 6 digits. If a match is found then you keep looking for the next 6 digits on the next match.
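As a rough illustration of the shifted columns, the Python sketch below generates the variations for the example key and finds which one starts with a given value, mirroring the LIKE 'value%' check. The function names are hypothetical, and the wraparound shown in the listed variations is reproduced as-is.

```python
def rotations(key: str):
    """Return every shifted variation of key; index 0 is the key itself."""
    return [key[i:] + key[:i] for i in range(len(key))]


def first_shift_with_prefix(key: str, prefix: str):
    """Return the first shift whose variation starts with prefix, else None."""
    # Because each variation wraps around, a prefix can also match across
    # the end and the start of the original key.
    for shift, rotated in enumerate(rotations(key)):
        if rotated.startswith(prefix):
            return shift
    return None


KEY = "462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654"

variations = rotations(KEY)
print(variations[1])
print(first_shift_with_prefix(KEY, "a276e262"))
```

Each variation corresponds to one of the extra indexed columns, so a prefix match in column number n means the value occurs at offset n in the original string.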

MySQL append, prefix a range of numbers with 0

I have a few thousand rows which contain 3-digit numbers, starting at 100 and ranging to 199, which I need to prefix with 0. There are also thousands of other 4-digit numbers which I don't want to change.
I need to find all the 3-digit numbers and prefix only those ranging from 100 to 199 with a 0 so that they are 4 digits, e.g. 100 > 0100, 104 > 0104 and so on.
Also, these numbers may skip values, e.g. after 110 the next is 124.
Is there a way I can do this using SQL? I don't fancy changing these manually!
Many Thanks
This is best done with a programming language. That said, here's a SQL query that will update all the existing numbers:
UPDATE tableName SET fieldName = right(concat('0000',fieldName), 4) WHERE length(fieldName) < 4
The LPAD function is what you are looking for. You can use this in your query to pad the numbers on the fly.
SELECT LPAD(CAST(num AS CHAR), 4, '0') FROM tbl WHERE num > 99 AND num < 200
If you prefer to do this on the script side, str_pad will do the same in PHP.
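For the script-side route, here is a minimal Python sketch that pads only the 3-digit values from 100 to 199 and leaves everything else alone. The function name and its default bounds are assumptions taken from the range described in the question.

```python
def pad_if_in_range(value: str, low: int = 100, high: int = 199, width: int = 4) -> str:
    """Zero-pad value to width only if it is a 3-digit number in [low, high]."""
    if value.isdecimal() and len(value) == 3 and low <= int(value) <= high:
        return value.zfill(width)
    # 4-digit numbers, out-of-range numbers and non-numeric text stay as-is.
    return value


samples = ["100", "104", "110", "124", "199", "200", "1234"]
padded = [pad_if_in_range(v) for v in samples]
print(padded)
```

The same condition could be expressed in SQL by combining the padding with a range check in the WHERE clause.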