Big MySQL table, REPLACE -> very slow query - mysql

I have a table with 17.6 million rows in a MyISAM database.
I want to search for an article number in it, but the match must not depend on special characters such as dots, commas, and others.
I'm using a query like this:
SELECT * FROM `table`
WHERE
replace(replace(replace( replace( `haystack` , ' ', '' ),
'/', '' ), '-', '' ), '.', '' )
LIKE 'needle'
This method is very, very slow. The table has an index on haystack, but EXPLAIN shows the query cannot use it, which means the query must scan all 17.6 million rows - taking about 3.8 seconds.
The query runs 10-15 times per page, so the page loads extremely slowly.
What should I do? Is it a bad idea to use REPLACE inside the query?

As you apply the replaces to the actual data in the table, MySQL can't use the index: it has no indexed copy of the replaced values to compare against the needle.
That said, if your replace rules are static, it might be a good idea to denormalize the data and add a new column such as haystack_search that contains the data with all the replaces applied. This column can be filled during INSERT or UPDATE, and an index on it can then be used effectively. For example:
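A minimal sketch of that denormalized column, assuming MySQL 5.7+ generated columns (on older versions you would fill a plain column from triggers or application code on INSERT/UPDATE; the column length here is a guess):
ALTER TABLE `table`
  ADD COLUMN haystack_search VARCHAR(255)
    AS (REPLACE(REPLACE(REPLACE(REPLACE(haystack, ' ', ''), '/', ''), '-', ''), '.', '')) STORED,
  ADD INDEX idx_haystack_search (haystack_search);
-- the lookup can then use the index:
SELECT * FROM `table` WHERE haystack_search = 'needle';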
Note that you probably want to use % in your LIKE pattern, as otherwise it is effectively the same as a normal equality comparison. However, if you use a search term like %needle% (i.e. with a variable start), MySQL again can't use the index and falls back to a table scan; it can only use the index when the search term has a fixed start, i.e. something like needle%.
So in the end, you might have to tune your database engine so that it can hold the table in memory. Another alternative with MyISAM tables (or, with MySQL 5.6 and up, also with InnoDB tables) is a fulltext index on your data, which again allows fairly efficient searching.

It's "bad" to apply functions to the column as it will force a scan of the column.
Perhaps this is a better method:
SELECT list
, of
, relevant
, columns
, only
FROM your_table
WHERE haystack REGEXP '^two[ /.-]needles$'
In this scenario we are searching for "two needles", where the separator between the words can be any of the characters inside the square brackets, i.e. "two needles", "two/needles", "two-needles" or "two.needles". (MySQL's LIKE has no character classes, so REGEXP is used here; note that REGEXP, like the nested REPLACEs, will still scan the table.)

You could try using LENGTH on the column; I'm not sure whether it performs better. Also, when using LIKE you should use the % wildcard:
SELECT * FROM `table`
WHERE
haystack LIKE 'needle%' AND
LENGTH(haystack) - LENGTH(REPLACE(haystack,'/','')) = 0 AND
LENGTH(haystack) - LENGTH(REPLACE(haystack,'-','')) = 0 AND
LENGTH(haystack) - LENGTH(REPLACE(haystack,'.','')) = 0;
If haystack should be exactly needle, then do this:
SELECT * FROM `table`
WHERE
haystack='needle';


Counting how many fields (in a row) are filled in SQL

I want to count how many columns in a row are not NULL.
The table is quite big (more than 100 columns), so I would like to avoid doing it manually or in PHP (which I don't use), as in Counting how many MySQL fields in a row are filled (or empty).
Is there a simple query I can use in a SELECT, something like SELECT COUNT(NOT ISNULL(*)) FROM big_table;?
Thanks in advance...
Agree with the comments above:
There is something wrong with the data if such an analysis is needed.
You can't make it completely automatic.
But I have a recipe for simplifying the process; only two steps are needed to achieve your aim.
Step 0. In step 1 you'll need the name of your table's schema. Normally the devs know which schema the table resides in, but still... here is how you can find it:
select *
from information_schema.tables
where table_name = 'test_table';
Step 1. First of all, you need the list of columns. The list by itself won't help you at all, but it is all we need to build the SELECT statement, right? So let's have the database prepare that statement for us:
select concat('select (length(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') separator ','),
')) - length(replace(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') separator ','),
'), ''###'', ''''))) / length(''###'')
from test_table')
from information_schema.columns
where table_schema = 'test'
and table_name = 'test_table'
order by table_name,ordinal_position;
Step 2. Execute the statement you got in step 1:
select (length(concat(.. list of cols ..)) -
length(replace(concat(.. list of cols .. ), '###', ''))) / length('###')
from test_table
The SELECT looks tricky, but it's simple: first, replace all NULLs with some marker you are sure will never appear in those columns. I usually replace NULLs with "###"; that is what all the IFNULLs are for.
Next, count the characters with LENGTH. In my case it was 14.
After that, replace all "###" with empty strings and count the length again; now it's 11. That is what the LENGTH(REPLACE(...)) pair is for.
Finally, divide the difference (14 - 11) by the length of the marker string ("###", i.e. 3). You get 1, which is exactly the number of NULLs in my test row.
Here's a test case you can play with
Do not hesitate to ask if needed
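For reference, here is the same trick as a tiny self-contained example (the table name, columns, and data are made up):
CREATE TABLE demo (a VARCHAR(10), b VARCHAR(10), c VARCHAR(10));
INSERT INTO demo VALUES ('x', NULL, NULL);
-- NULLs become '###'; the length lost when stripping '###' again,
-- divided by LENGTH('###'), is the number of NULL columns
SELECT (LENGTH(CONCAT(IFNULL(a,'###'), IFNULL(b,'###'), IFNULL(c,'###')))
      - LENGTH(REPLACE(CONCAT(IFNULL(a,'###'), IFNULL(b,'###'), IFNULL(c,'###')), '###', '')))
      / LENGTH('###') AS null_count
FROM demo;  -- returns 2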

Multiple find and replace using SQL query

I'd like to use an SQL query to find and replace multiple values. I've had a look at this question that shows the following answer:
UPDATE
YourTable
SET
Column1 = REPLACE(Column1,'a','b')
WHERE
Column1 LIKE '%a%'
How can I find and replace multiple values instead of just the one?
My data looks like the following; there are hundreds of rows, and I specifically want to target each product_id:123-style entry:
subscription_id,products
"128","product_id:268|quantity:1|total:3.15|meta:|tax:0;product_id:267|quantity:1|total:2.97|meta:|tax:0"
I need to replace the product IDs with new product IDs, so everything matching 268 becomes 195, and everything matching 267 becomes 194.
Is there a more efficient way to do it than taking the code block above and repeating it for each product? Can it be done in one sweep?
The simplest possible way is to chain REPLACEs together, but given the concatenated nature of the field you need to be sure you don't inadvertently target something that's not actually a product_id value. You can mitigate this by including some contextual content from the string value itself:
UPDATE YourTable
SET products = REPLACE(REPLACE(products, "product_id:267|", "product_id:194|"), "product_id:268|", "product_id:195|");
DBFiddle | MySQL 5.6 Reference Manual :: 13.2.8 REPLACE Statement
If there's some variability in how these strings might appear in a given field and you're running MySQL >=8.0, you can leverage something like REGEXP_REPLACE() to perform this same replacement using a defined RegExp pattern.
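For instance, a rough sketch assuming MySQL 8.0+ and the product_id:NNN| layout shown in the question (the \b word boundary keeps a longer id such as 2680 from matching):
UPDATE YourTable
SET products = REGEXP_REPLACE(
                 REGEXP_REPLACE(products, 'product_id:268\\b', 'product_id:195'),
                 'product_id:267\\b', 'product_id:194');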
Yes, there are ways. For example, you can create a table like
replacements(id, oldval, newval)
and do the following:
UPDATE
Yourtable
JOIN
replacements
ON
Yourtable.Column1 LIKE CONCAT('%', replacements.oldval, '%')
SET
Yourtable.Column1 = REPLACE(Yourtable.Column1, replacements.oldval, replacements.newval);
The catch is that you need to fill replacements with the oldval/newval pairs yourself; MySQL cannot guess them. Inserting them is as simple (assuming id is AUTO_INCREMENT) as:
INSERT INTO replacements(oldval, newval) VALUES
('a', 'b'),
('c', 'd'),
...
;
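For completeness, the helper table could be defined along these lines (the column sizes are just a guess):
CREATE TABLE replacements (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  oldval VARCHAR(191) NOT NULL,
  newval VARCHAR(191) NOT NULL
);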

MYSQL REGEX search many words with no order condition

I'm trying to use a regex in MySQL that searches for whole (word-bounded) words in a JSON array string, but I don't want the regex to depend on the order of the words, because I don't know it in advance.
So I first wrote my regex on regex101 (https://regex101.com/r/wNVyaZ/1) and then tried to convert it for MySQL.
WHERE `Wish`.`services` REGEXP '^([^>].*[[:<:]]Hygiène[[:>:]])([^>].*[[:<:]]Radiothérapie[[:>:]]).+';
WHERE `Wish`.`services` REGEXP '^([^>].*[[:<:]]Hygiène[[:>:]])([^>].*[[:<:]]Andrologie[[:>:]]).+';
The first query returns results because "Hygiène" comes before "Radiothérapie", but in the second query "Andrologie" comes before "Hygiène" in the data, not after it as written in the query. The query is generated automatically from a list of services chosen in no particular order, and I want to match the words (as whole words) if they exist, no matter what order they appear in.
You can search for words in JSON like the following (I tested on MySQL 5.7):
select * from wish
where json_search(services, 'one', 'Hygiène') is not null
and json_search(services, 'one', 'Andrologie') is not null;
+------------------------------------------------------------+
| services |
+------------------------------------------------------------+
| ["Andrologie", "Angiologie", "Hygiène", "Radiothérapie"] |
+------------------------------------------------------------+
See https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-search
If you can, use the JSON search queries (you need a MySQL with JSON support).
If it's advisable, consider changing the database structure and enter the various "words" as a related table. This would allow you much more powerful (and faster) queries.
JOIN has_service AS hh ON (hh.row_id = id)
JOIN services AS ss ON (hh.service_id = ss.id
    AND ss.name IN ('Hygiène', 'Angiologie', ...))
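Spelled out with guessed table and column names (assuming wish.id is the primary key), a query requiring all of the chosen services could look like:
SELECT w.*
FROM wish AS w
JOIN has_service AS hh ON (hh.row_id = w.id)
JOIN services    AS ss ON (hh.service_id = ss.id)
WHERE ss.name IN ('Hygiène', 'Andrologie')
GROUP BY w.id
HAVING COUNT(DISTINCT ss.name) = 2;  -- both services must be present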
Otherwise, in this context, consider that you're not really doing a regexp search, and you're doing a full table scan anyway (unless you're on MySQL 8.0+ or Percona Server 5.7+ (not sure) with an index covering the full extent of the services column), so several LIKE queries will actually cost you less:
WHERE (services LIKE '%"Hygiène"%'
OR services LIKE '%"Angiologie"%'
...)
or
IF(services LIKE '%"Hygiène"%', 1, 0)
+IF(services LIKE '%"Angiologie"%', 1, 0)
+ ... AS score
HAVING score > 0 -- or score = 5 if you only want rows matching all five
ORDER BY score DESC;
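As a complete statement (again with guessed names, and only the two services from the question):
SELECT w.*,
       IF(services LIKE '%"Hygiène"%', 1, 0)
     + IF(services LIKE '%"Andrologie"%', 1, 0) AS score
FROM wish AS w
HAVING score > 0          -- or score = 2 to require both words
ORDER BY score DESC;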

use wildcards with update and replace

I have a very large MySQL table with lots of data in it. One of the fields is the invoice number, stored as a string, starting at 1000.001. I inherited this from someone who left the company; they imported the data through Excel, and some of the numbers have come across as 1000.01 instead of 1000.010.
When I run this query in phpMyAdmin, it shows there are over 11k rows, so I can see them OK:
SELECT `AnalysisID` , `InvoiceNo`
FROM `STStbl000010`
WHERE `InvoiceNo` LIKE '%.__'
ORDER BY `STStbl000010`.`AnalysisID` ASC
So, simply put, I need to add a 0 (zero) to the end of those entries.
I have tried the following; however, it just reports 0 rows affected.
Can I use wildcards like this in an UPDATE with REPLACE?
UPDATE `STStbl000010AT`
SET `InvoiceNo` = replace(`InvoiceNo`, '%.__', '%.__0')
WHERE `InvoiceNo` LIKE '%.__'
Thanks
You can't use wildcards in REPLACE(); it matches literal strings only, hence no matched rows.
Luckily, if you just want to add a 0, you can concatenate the string values:
UPDATE `STStbl000010AT`
SET `InvoiceNo` = CONCAT(`InvoiceNo`,'0')
WHERE `InvoiceNo` LIKE '%.__'
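If in doubt, you can preview what the UPDATE will do first, reusing the question's filter (table name as in the UPDATE above):
SELECT AnalysisID, InvoiceNo, CONCAT(InvoiceNo, '0') AS fixed
FROM STStbl000010AT
WHERE InvoiceNo LIKE '%.__';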

mysql match string with start of string in table

I realise that it would be a lot easier if I could have changed the table when it was created, but assuming I can't, I have a table with values such as:
abcd
abde
abdf
abff
bbsdf
bcggs
... snip large amount
zza
The values in the table are not fixed length.
I have a string to match, such as abffagpokejfkjs.
If it were the other way round, I could do
SELECT * from table where value like 'abff%'
but I need to select the values that match the start of a provided string.
Is there a quick way of doing that, or does it need an iteration through the table to find a match?
Try this:
SELECT col1, col2 -- etc...
FROM your_table
WHERE 'abffagpokejfkjs' LIKE CONCAT(value, '%')
Note that this will not use an index effectively so it will be slow if you have a lot of records.
Also note that some characters in value (e.g. %) may be interpreted by LIKE as having a special meaning, which may be undesirable.
LIKE can be avoided by truncating the comparison string to each value's length:
... WHERE LEFT('abffagpokejfkjs', LENGTH(value)) = value
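Put together (column names as in the earlier snippet), that becomes:
SELECT col1, col2
FROM your_table
WHERE LEFT('abffagpokejfkjs', LENGTH(value)) = value;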