Process TEXT BLOB fields in MySQL line by line

I have a MEDIUMTEXT blob in a table, which contains paths separated by newline characters. I'd like to add a "/" to the beginning of each line if it is not already there. Is there a way to write a query to do this with built-in functions?
I suppose an alternative would be to write a Python script to get the field, convert it to a list, process each line and update the record. There aren't that many records in the DB (about 8K+ rows), so I can take the processing delay, as long as it doesn't lock the entire DB or table.
Either way would be fine. If the second option is recommended, do I need to know about specific locking semantics before getting into this? It would run on a live production DB (of course, I'd take a DB snapshot first), and in-place updates would be best to avoid downtime.

Demo:
mysql> create table mytable (id int primary key, t text );
mysql> insert into mytable values (1, 'path1\npath2\npath3');
mysql> select * from mytable;
+----+-------------------+
| id | t |
+----+-------------------+
| 1 | path1
path2
path3 |
+----+-------------------+
1 row in set (0.00 sec)
mysql> update mytable set t = concat('/', replace(t, '\n', '\n/'));
mysql> select * from mytable;
+----+----------------------+
| id | t |
+----+----------------------+
| 1 | /path1
/path2
/path3 |
+----+----------------------+
Note that this UPDATE prepends "/" to every line, including lines that already start with one. However, I would strongly recommend storing each path on its own row, so you don't have to think about this. In SQL, each column should store one value per row, not a set of values.
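For the Python route mentioned in the question, the per-line step could look roughly like the sketch below. The function name `add_leading_slash` is made up for illustration, and leaving empty lines untouched is an assumption the question does not cover. Writing the result back would be an ordinary UPDATE per row keyed on `id`.

```python
def add_leading_slash(blob):
    """Prefix each non-empty line with '/' unless it already starts with one."""
    fixed = []
    for line in blob.split("\n"):
        if line == "" or line.startswith("/"):
            # Assumption: empty lines and already-prefixed lines stay as they are.
            fixed.append(line)
        else:
            fixed.append("/" + line)
    return "\n".join(fixed)


if __name__ == "__main__":
    print(add_leading_slash("path1\n/path2\npath3"))
```

Unlike the single REPLACE() call, this leaves lines that already begin with "/" unchanged.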

Related

Update Column by Parsing String from Other Column in MYSQL

I would like to create a new column in a MYSQL table based on the string values in an existing column.
My strategy is to first create an empty column and then update the values in the new column based on values in the existing column. However, I am stumbling on how to parse the string in order to extract the correct values.
The string is of the form 1.1.25. I want to extract the value before the first period and the value between the two periods and put these in new columns.
mytable
id|actsceneline|text
1 |1.1.1 |How are you.
2 |1.1.2 |Not bad. You?
To create the new empty column
ALTER TABLE mytable
ADD COLUMN act VARCHAR(6) NOT NULL,
ADD COLUMN scene VARCHAR(6) NOT NULL
To change the values in the new columns, I imagine I would do something like:
UPDATE mytable SET act = '1',scene = 1
And then use MySQL string functions such as INSTR, SUBSTR or a regex to extract the values and update the new columns, as in:
UPDATE mytable SET act =
SELECT SUBSTR(actsceneline, 1, LOCATE('.', text)) FROM mytable
However, I'm struggling with how to extract the values from the string.
Thanks for any suggestions.
Try using SUBSTRING_INDEX():
UPDATE mytable
SET act = SUBSTRING_INDEX(actsceneline, '.', 1),
scene = SUBSTRING_INDEX(SUBSTRING_INDEX(actsceneline, '.', 2), '.', -1);
Result given your data:
mysql> select * from mytable;
+----+--------------+---------------+-----+-------+
| id | actsceneline | text | act | scene |
+----+--------------+---------------+-----+-------+
| 1 | 1.1.1 | How are you. | 1 | 1 |
| 2 | 1.1.2 | Not bad. You? | 1 | 1 |
+----+--------------+---------------+-----+-------+
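If the nested SUBSTRING_INDEX() calls are hard to read, this rough Python sketch mimics what the function does for positive and negative counts. The helper `substring_index` is hypothetical and only mirrors the documented behaviour; it is not part of any MySQL client library.

```python
def substring_index(s, delim, count):
    """Approximate MySQL's SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: everything to the right of the count-th delimiter from the end.
    return delim.join(parts[count:])


if __name__ == "__main__":
    value = "1.2.25"
    act = substring_index(value, ".", 1)
    scene = substring_index(substring_index(value, ".", 2), ".", -1)
    print(act, scene)
```

The inner call keeps the first two fields and the outer call takes the last field of that result, which is how the UPDATE isolates the middle value.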
The best way is to first write a SELECT that produces the values you want to update.
Create a new table from your existing table:
CREATE TABLE destinationtablename
SELECT * FROM sourcetable;
Then work on destinationtablename.
Once all the work is finished, check it twice before updating the original table. Creating the new table also gives you a backup of your data.

Extract a specific value from string (MYSQL)

I have the MISC column in a MYSQL table with the following value:
'PrimeCC_Stripe/XX_582130/PMethod=VISA/CardType=VISA/489930******8888/12/2020/TraceId=7182992'
another example:
'-1/error/PMethod=VISA/CardType=VISA/489930******8888/12/2020/TraceId=714291'
or
'Cancelled by PendingDepositCleanerJob. User didn't finish the payment process properly.'
I am trying to extract the card number as another column in my query. Here it should be '489930******8888', or nothing if no card number is included in the MISC column.
What is the best option to extract this information?
A bit of string manipulation
drop table if exists t;
create table t (str varchar(100));
insert into t values
('PrimeCC_Stripe/XX_582130/PMethod=VISA/CardType=VISA/489930******8888/12/2020/TraceId=7182992'),
('Cancelled by PendingDepositCleanerJob. User didnt finish the payment process properly.'),
('123456******7891')
;
select str,
case when instr(str,'******') > 0 then
concat(
substring(str, instr(str,'******') - 6, 6),
'******',
substring(str, instr(str,'******') + 6, 4)
)
end
from t;
+----------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PrimeCC_Stripe/XX_582130/PMethod=VISA/CardType=VISA/489930******8888/12/2020/TraceId=7182992 | 489930******8888 |
| Cancelled by PendingDepositCleanerJob. User didnt finish the payment process properly. | NULL |
| 123456******7891 | 123456******7891 |
+----------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
But it won't work if you have more than one occurrence of ****** or if the number format differs (or is only partial).
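MySQL 8.0 and later support REGEXP_SUBSTR(), which can match the masked-card pattern instead of one literal value. Note that * is a regex metacharacter, so it has to go inside a character class (or be escaped):
SELECT REGEXP_SUBSTR(misc, '[0-9]{6}[*]{6}[0-9]{4}') AS card
FROM your_table; -- your_table is a placeholder for the real table name
REGEXP_SUBSTR() returns NULL when there is no match.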
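If the extraction happens client-side instead, the same idea can be sketched in Python. The pattern assumes the card is always six digits, six asterisks and four digits, as in the samples above. `extract_card` is a hypothetical helper name.

```python
import re

# Six digits, six literal asterisks, four digits (assumed from the sample rows).
CARD_PATTERN = re.compile(r"\d{6}\*{6}\d{4}")


def extract_card(misc):
    """Return the first masked card number in misc, or None if there is none."""
    match = CARD_PATTERN.search(misc)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "PrimeCC_Stripe/XX_582130/PMethod=VISA/CardType=VISA/489930******8888/12/2020/TraceId=7182992",
        "-1/error/PMethod=VISA/CardType=VISA/489930******8888/12/2020/TraceId=714291",
        "Cancelled by PendingDepositCleanerJob. User didn't finish the payment process properly.",
    ]
    for s in samples:
        print(extract_card(s))
```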

mysql strange behavior when inserting data

When inserting data to mysql via the phpmyadmin page, or via python I've seen something I can't explain:
cur.execute("INSERT INTO 28AA507A0500009E (timestamp, temp) VALUES ('2014-01-04 15:36:30',24.44)")
cur.execute("INSERT INTO 28D91F7A050000D9 (timestamp, temp) VALUES ('2014-01-04 15:36:30',24.44)")
cur.execute("INSERT INTO `28012E7A050000F5` (timestamp, temp) VALUES ('2014-01-04 15:36:30',24.44)")
Notice the last entry with the ` around the table name.
The first 2 entries work fine without the backticks.
I can also put backticks around all the table names and it still works.
Why can I remove the backticks from the first 2 lines, but not the 3rd one?
The tables are all created equally.
Edit 1:
In due respect to the following comments:
Your explanation is not entirely accurate. There is no alias in
the INSERT statement. I think that the part of the identifier after
28012E7 is just discarded as MySQL tries convert the identifier to
an integer value! – ypercube
these are table names not column names. – Sly Raskal
Well, MySQL has indeed discarded part of the table name identifier. My intention was to show how an identifier is interpreted when the system cannot find it in the list of accessible table names (I chose column/expression names in my examples). Because the engine interpreted it as a valid number rather than as an identifier for a table, it threw an error.
I chose SELECT to clarify why the table identifier was rejected without backticks. Because it represents a number, it can't be used as an identifier directly and must be surrounded with backticks.
MySQL allows an alias to follow immediately after a numeric literal, an expression in parentheses, or a string literal. Perhaps surprisingly, the space between them is optional.
In your case, 28012E7A050000F5 is parsed as the number 28012E7 in exponent form (280120000000) followed by the alias A050000F5. Hence 28012E7A050000F5 can't be used as an identifier without backticks. See the following observations:
mysql> -- select 28012E7 as A050000F5;
mysql> select 28012E7A050000F5;
+--------------+
| A050000F5 |
+--------------+
| 280120000000 |
+--------------+
1 row in set (0.00 sec)
Following are some valid examples:
mysql> -- select ( item_count * price ) as v from orders;
mysql> select ( item_count * price )v from orders;
+-----+
| v |
+-----+
| 999 |
+-----+
1 rows in set (0.30 sec)
mysql> -- select ( 3 * 2 ) as a, 'Ravinder' as name;
mysql> select ( 3 * 2 )a, 'Ravinder'name;
+---+----------+
| a | name |
+---+----------+
| 6 | Ravinder |
+---+----------+
1 row in set (0.00 sec)
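To spot which of the sensor-style table names would hit this, a rough check is to test whether the name starts with something that reads as a number in exponent form. The sketch below is only an approximation of that one rule, not a reimplementation of the MySQL lexer. `starts_like_exponent` is a hypothetical helper.

```python
import re

# Digits, then E/e, then at least one digit: the shape of 28012E7.
EXPONENT_PREFIX = re.compile(r"^\d+[eE][+-]?\d+")


def starts_like_exponent(name):
    """True if the name begins with text that reads as a number in exponent form."""
    return EXPONENT_PREFIX.match(name) is not None


if __name__ == "__main__":
    for table in ("28AA507A0500009E", "28D91F7A050000D9", "28012E7A050000F5"):
        print(table, starts_like_exponent(table))
    # The numeric part of the third name evaluates to the value shown above.
    print(int(float("28012E7")))
```

Quoting every generated table name with backticks avoids having to run such a check at all.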

How to insert the default value in temporary tables in MySQL?

I want to create a temporary table from a SELECT statement in MySQL. It involves several JOINs, and it can produce NULL values that I want MySQL to store as zeroes. It sounds like an easy problem (simply default to zero), but MySQL (5.6.12) fails to apply the default value.
For example, take the following two tables:
mysql> select * from TEST1;
+------+------+
| a | b |
+------+------+
| 1 | 2 |
| 4 | 25 |
+------+------+
2 rows in set (0.00 sec)
mysql> select * from TEST2;
+------+------+
| b | c |
+------+------+
| 2 | 100 |
| 3 | 100 |
+------+------+
2 rows in set (0.00 sec)
A left join gives:
mysql> select TEST1.*,c from TEST1 left join TEST2 on TEST1.b=TEST2.b;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 2 | 100 |
| 4 | 25 | NULL |
+------+------+------+
2 rows in set (0.00 sec)
Now, if I want to save these values in a temporary table (replacing NULL with zero), this is the code I would use:
mysql> create temporary table TEST_JOIN (a int, b int, c int default 0 not null)
select TEST1.*,c from TEST1 left join TEST2 on TEST1.b=TEST2.b;
ERROR 1048 (23000): Column 'c' cannot be null
What am I doing wrong? The worst part is that this code used to work before I did a system-wide upgrade (I don't remember which version of MySQL I had, but surely it was lower than my current 5.6). It used to produce the behavior I would expect: if it's NULL, use the default, not the frustrating error I'm getting now.
From the documentation of 5.6 (unchanged since 4.1):
Inserting NULL into a column that has been declared NOT NULL. For
multiple-row INSERT statements or INSERT INTO ... SELECT statements,
the column is set to the implicit default value for the column data
type. This is 0 for numeric types, the empty string ('') for string
types, and the “zero” value for date and time types. INSERT INTO ...
SELECT statements are handled the same way as multiple-row inserts
because the server does not examine the result set from the SELECT to
see whether it returns a single row. (For a single-row INSERT, no
warning occurs when NULL is inserted into a NOT NULL column. Instead,
the statement fails with an error.)
My current workaround is to store the NULL values in the temporary table and then replace them with zeroes, but that seems rather cumbersome with many columns (and terribly inefficient). Is there a better way to do it?
BTW, I cannot simply ignore some columns in the query (as suggested for another question), because it's a multirow query.
IFNULL(`my_column`,0);
That would set NULLs to 0. Other values stay as is.
Just wrap your values/column names with IFNULL and it will convert them to whatever default value you put into the function. E.g. 0. Or "european swallow", or whatever you want.
Then you can keep strict mode on and still handle NULLs gracefully.
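Here is a small runnable sketch of the IFNULL approach applied to the TEST1/TEST2 data from the question. It uses Python's built-in sqlite3 module as a stand-in, because that needs no server. SQLite also provides IFNULL, but its CREATE TABLE ... AS SELECT syntax differs slightly from MySQL's, so treat this as an illustration of the idea rather than MySQL-exact SQL.

```python
import sqlite3


def build_join_table():
    """Create TEST_JOIN with NULLs from the left join replaced by 0."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TEST1 (a INT, b INT);
        CREATE TABLE TEST2 (b INT, c INT);
        INSERT INTO TEST1 VALUES (1, 2), (4, 25);
        INSERT INTO TEST2 VALUES (2, 100), (3, 100);
        -- IFNULL turns the unmatched row's NULL into 0 before it is stored.
        CREATE TEMPORARY TABLE TEST_JOIN AS
        SELECT TEST1.a, TEST1.b, IFNULL(TEST2.c, 0) AS c
        FROM TEST1 LEFT JOIN TEST2 ON TEST1.b = TEST2.b;
        """
    )
    rows = conn.execute("SELECT a, b, c FROM TEST_JOIN ORDER BY a").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in build_join_table():
        print(row)
```

The row with b = 25 has no match in TEST2, so its c column is stored as 0 instead of NULL.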

In MySQL, should I quote numbers or not?

For example - I create database and a table from cli and insert some data:
CREATE DATABASE testdb CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
USE testdb;
CREATE TABLE test (id INT, str VARCHAR(100)) ENGINE=InnoDB CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
INSERT INTO test VALUES (9, 'some string');
Now I can do this and these examples do work (so - quotes don't affect anything it seems):
SELECT * FROM test WHERE id = '9';
INSERT INTO test VALUES ('11', 'some string');
So - in these examples I've selected a row by a string that actually stored as INT in mysql and then I inserted a string in a column that is INT.
I don't quite get why this works the way it works here. Why is string allowed to be inserted in an INT column?
Can I insert all MySQL data types as strings?
Is this behavior standard across different RDBMS?
MySQL is a lot like PHP, and will auto-convert data types as best it can. Since you're working with an int field (left-hand side), it'll try to transparently convert the right-hand-side of the argument into an int as well, so '9' just becomes 9.
Strictly speaking, the quotes are unnecessary, and force MySQL to do a typecasting/conversion, so it wastes a bit of CPU time. In practice, unless you're running a Google-sized operation, such conversion overhead is going to be microscopically small.
You should never put quotes around numbers. There is a valid reason for this.
The real issue comes down to type casting. When you put a number inside quotes, it is treated as a string and MySQL must convert it to a number before it can execute the query. While this may take a small amount of time, the real problems start when MySQL doesn't do a good job of converting your string. For example, MySQL will convert basic strings like '123' to the integer 123, but will convert some larger numbers, like '18015376320243459', to floating point. Since floating-point values are rounded, your queries may return inconsistent results, and the results can vary with your server hardware and software. The MySQL manual describes this under type conversion in expression evaluation.
If you are worried about SQL injections, always check the value first and use PHP to strip out any non numbers. You can use preg_replace for this: preg_replace("/[^0-9]/", "", $string)
In addition, SQL queries written with quoted numbers may not behave the same way on other databases such as PostgreSQL or Oracle.
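The rounding risk with large quoted numbers can be reproduced outside MySQL. This sketch assumes the server's floating-point type is an IEEE 754 double, like Python's float. `roundtrip_through_double` is a hypothetical helper name.

```python
def roundtrip_through_double(digits):
    """Convert a digit string to a double and back to an integer."""
    return int(float(digits))


if __name__ == "__main__":
    for s in ("123", "18015376320243459"):
        # Small values survive; the large one lands on a neighbouring double.
        print(s, roundtrip_through_double(s), roundtrip_through_double(s) == int(s))
```

A double has 53 bits of precision, so integers above 2^53 cannot all be represented exactly. That is why the second value changes.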
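Compare these two cases, where varchar_num is an indexed VARCHAR column. With the unquoted number, MySQL scans the whole index (about 3.1 million rows, 7.94 sec); with the quoted string, it does a constant lookup on one row: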
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | test_no | index | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | NULL | 3126240 | Using where; Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| 1 | SIMPLE | test_no | const | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | const | 1 | Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql>
mysql>
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set, 1 warning (7.94 sec)
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
AFAIK it is standard, but it is considered bad practice because
- using it in a WHERE clause will prevent the optimizer from using indices (explain plan should show that)
- the database has to do additional work to convert the string to a number
- if you're using this for floating-point numbers ('9.4'), you'll run into trouble if client and server use different language settings (9.4 vs 9,4)
In short: don't do it (but YMMV)
This is not standard behavior.
For MySQL 5.5, this is the default SQL mode:
mysql> select @@sql_mode;
+------------+
| @@sql_mode |
+------------+
| |
+------------+
1 row in set (0.00 sec)
ANSI and TRADITIONAL are used more rigorously by Oracle and PostgreSQL. The SQL Modes MySQL permits must be set IF AND ONLY IF you want to make the SQL more ANSI-compliant. Otherwise, you don't have to touch a thing. I've never done so.
It depends on the column type!
if you run
SELECT * FROM `users` WHERE `username` = 0;
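in MySQL/MariaDB you will get every record whose username does not start with a non-zero number, because each string is converted to a number for the comparison and non-numeric strings convert to 0.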
Always quote values if the column is of type string (char, varchar,...) otherwise you'll get unexpected results!
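The effect of comparing a string column with 0 can be approximated in Python. The conversion rule below (take the leading numeric prefix, else 0) is a simplified model of MySQL's string-to-number conversion, not its exact implementation. Both function names are hypothetical.

```python
import re

# Optional whitespace and sign, then a decimal number with optional exponent.
NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def mysql_like_number(s):
    """Simplified model: use the leading numeric prefix, or 0 if there is none."""
    match = NUMERIC_PREFIX.match(s)
    return float(match.group(0)) if match else 0.0


def rows_matching_zero(usernames):
    """Usernames that a `username = 0` comparison would match under this model."""
    return [u for u in usernames if mysql_like_number(u) == 0]


if __name__ == "__main__":
    print(rows_matching_zero(["admin", "bob", "7up", "0day"]))
```

Under this model only "7up" is excluded, since its numeric prefix is 7 rather than 0.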
You don't need to quote numbers, though some people do it for consistency. Be aware that the column type still affects the stored value.
Say we have a table called users with a column called current_balance of type FLOAT. If you run this query:
UPDATE `users` SET `current_balance`='231608.09' WHERE `user_id`=9;
the current_balance field will show 231608, because FLOAT is single-precision and keeps only about 6 to 7 significant digits. Similarly, if you run this query:
UPDATE `users` SET `current_balance`='231608.55' WHERE `user_id`=9;
the current_balance field will show 231609.
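The FLOAT behaviour can be illustrated with Python's struct module, which can round a value to a 4-byte IEEE 754 float. Formatting the result to six significant digits is an assumption about how the value gets displayed, chosen to mirror the numbers above.

```python
import struct


def as_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def six_digits(x):
    """Show the single-precision value with six significant digits."""
    return format(as_float32(x), ".6g")


if __name__ == "__main__":
    for value in (231608.09, 231608.55):
        print(value, as_float32(value), six_digits(value))
```

A DECIMAL column, for example DECIMAL(12,2), stores such values exactly and is the usual choice for money.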