MySQL - Searching for CC numbers

I inherited a MySQL server that has CC numbers stored in plaintext. Due to PCI requirements, I need to find the numbers and mask them. The trick is that they are stored in a field along with other text, so I need a way to search for CC numbers and change just those, not the rest of the text.
I have tried the masking feature in MySQL, but it doesn't work for this version. I also looked at a few different sites but can't find anything that really helps with my particular case.
Edit
To explain better: the previous admin didn't tell the operators not to take CC info through the live chat system. The system uses SSL, but the chat history is stored in plain text in a MySQL DB. The company isn't PCI compliant (as far as getting scanned and the SAQ are concerned), so we cannot have CC numbers stored anywhere. But the numbers are given in the middle of a conversation. If they were in their own column, that wouldn't be a big deal.
EDIT
I have tried using REGEXP to search for CC #'s, but now I am getting an operand error, which I believe is caused by the lazy quantifiers.
SELECT * FROM table_name Where text regexp '^4[0-9]{12}(?:[0-9]{3})?$'
Any Ideas?

You could use a regular expression to search for 16 to 19 consecutive digits (using LIKE if the numbers are separated from the text, or just REGEXP).
The manual gives this example (where 5 is the number of characters to match, and ^ and $ anchor the pattern to the beginning and end of the value):
mysql> SELECT * FROM pet WHERE name REGEXP '^.{5}$';
+-------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+-------+--------+---------+------+------------+-------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+-------+--------+---------+------+------------+-------+
Would end up something like this (drop the ^ and $ anchors when the number sits in the middle of other text, as in your chat logs):
REGEXP '^[0-9]{16,19}$'
https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
And lookie here too:
Regex to match a digit two or four times
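For the masking step itself, one option is to rewrite the text outside the database, since MySQL before 8.0 has no REGEXP_REPLACE. Below is a minimal Python sketch, assuming card numbers appear as 13 to 19 digits optionally separated by single spaces or hyphens. It uses the Luhn checksum to skip digit runs that are not card numbers. The function names and the keep-last-four masking format are illustrative assumptions, not PCI guidance.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text):
    """Replace Luhn-valid card-like numbers, keeping only the last four digits."""
    def repl(match):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_ok(digits):
            return match.group()   # leave order numbers, phone numbers, etc.
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_RE.sub(repl, text)


masked = mask_cards("my card is 4111 1111 1111 1111, order 1234567890123456")
print(masked)
```

Each chat row would be read, passed through mask_cards, and written back only if the text changed. Testing on a copy of the table first is advisable, since the pattern is a heuristic.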

Related

Mysql LEFT JOIN on two parameters with one exact and one LIKE match on very large tables - performance

I have two tables of historical data - one (OldData) is 40,000 records from a datasource with partial/inaccurate data that I am trying to clean, the other (LookupData) is a definitive source of just over one million accurate records.
I am trying to enrich the first, smaller table with records from the larger one, and I can predict matching records by joining on surname and a numeric value known as the service number, but in the first table these numbers are often incomplete.
OldData (partial/inaccurate data) examples:
Surname | ServiceNumber
Smith | 12345
Jones | 9876
Brown | 234
LookupData examples:
Surname | ServiceNumber
SMITH | 12345
SMITH | 23456
JONES | 98765
JONES | 19182
BROWN | T12345
BROWN | 56789
Desired result:
OldData.Surname | OldData.ServiceNumber | LookupData.ServiceNumber
Smith | 12345 | 12345
Jones | 9876 | 98765
Brown | 234 | T12345
The current query that I have is
SELECT OldData.*,LookupData.ServiceNumber
FROM `OldData`
LEFT JOIN `LookupData`
ON lower(OldData.Surname) = lower(LookupData.Surname)
AND LookupData.ServiceNumber like concat('%',OldData.ServiceNumber,'%')
but this never seems to complete.
If I narrow it down to a single surname for testing, and add
WHERE OldData.Surname='Devlin'
I get the 47 rows from OldData and the accurate LookupData.ServiceNumber where any matches are found (and null where they aren't) but this query still takes 27 seconds on average.
I have indexes on both Surname fields and ServiceNumber fields.
If I'm seeking the impossible I'd at least like to know :) Thanks
Let's look at the two JOIN conditions of your query.
lower(OldData.Surname) = lower(LookupData.Surname)
Wrapping both columns in a function prevents MySQL from using the indexes on them, which slows down the search. String comparisons in MySQL are case-insensitive under the default collations, unless you use a case-sensitive or binary collation or the BINARY operator. This condition can be rewritten as
OldData.Surname = LookupData.Surname
The second JOIN condition is:
LookupData.ServiceNumber like concat('%',OldData.ServiceNumber,'%')
LIKE is not good for performance, especially when the pattern starts with %: a B-tree index is ordered by the leading characters of the value, so with a leading wildcard there is no optimized starting point for the search and a full scan is triggered. If most of your partial numbers are prefixes of the full number (as with 9876 and 98765), you could remove the leading %, although rows like 234 / T12345 would then no longer match.
Using INSTR will likely not improve performance.
You could try a regexp, like:
LookupData.ServiceNumber REGEXP OldData.ServiceNumber
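If you really need to search on both ends on large data, the usual suggestion in MySQL is Full-Text Search Functions. This would require creating a FULLTEXT index on the service number columns (and possibly converting them from numeric to text). Note, however, that full-text search matches whole words rather than arbitrary substrings, and that the AGAINST argument must be a constant string, not a column, so the search has to be run once per value being looked up, for example:
MATCH (LookupData.ServiceNumber) AGAINST ('12345')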
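If the SQL join stays too slow, another option is to do the matching in application code: bucket LookupData by case-folded surname once, then test only that bucket for each OldData row. Below is a minimal Python sketch using the sample rows from the question. The function name and the in-memory lists are assumptions for illustration; in practice the rows would be fetched from the two tables.

```python
from collections import defaultdict

old_data = [("Smith", "12345"), ("Jones", "9876"), ("Brown", "234")]
lookup_data = [
    ("SMITH", "12345"), ("SMITH", "23456"),
    ("JONES", "98765"), ("JONES", "19182"),
    ("BROWN", "T12345"), ("BROWN", "56789"),
]


def match_service_numbers(old_rows, lookup_rows):
    """Left-join old_rows to lookup_rows on surname plus partial service number."""
    by_surname = defaultdict(list)
    for surname, number in lookup_rows:
        by_surname[surname.casefold()].append(number)

    result = []
    for surname, partial in old_rows:
        candidates = by_surname.get(surname.casefold(), [])
        hits = [n for n in candidates if partial in n]
        # Mirror LEFT JOIN: one row per hit, or a single row with None.
        for hit in hits or [None]:
            result.append((surname, partial, hit))
    return result


rows = match_service_numbers(old_data, lookup_data)
for row in rows:
    print(row)
```

The substring test still runs per candidate, but only against rows that share the surname, which is the same narrowing the equality condition is meant to give the SQL join.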

MySql Regexp result word part of known word

I've been struggling with this for a while.
Is there a way to find all rows in my table where the word in the column 'word' is a part of a search word?
+---------+-----------------+
| id_word | word |
+---------+-----------------+
| 177041 | utvälj |
| 119270 | fonders |
| 39968 | flamländarens |
| 63567 | hänvisningarnas |
| 61244 | hovdansers |
+---------+-----------------+
I want to extract the row 119270, fonders. I want to do this by passing in the word 'plafonders'.
SELECT * FROM words WHERE word REGEXP 'plafonders$'
That query will of course not work in this case, would've been perfect if it had been the other way around.
Does anyone know a solution to this?
SELECT * FROM words WHERE 'plafonders' REGEXP concat(word, '$')
should accomplish what you want. Your regex:
plafonders$
is looking for plafonders at the end of the column. The query above instead builds the pattern from each row's word and anchors it to the end of the search word, e.g. the regexp is fonders$ for row 119270.
See https://regex101.com/r/Ytb3kg/1/ compared to https://regex101.com/r/Ytb3kg/2/.
Before 8.0, MySQL's REGEXP works byte-wise and is not multibyte safe, so it does not handle accented letters very well. Perhaps it will work OK in your limited situation.
Here's a slightly faster approach (though it still requires a table scan):
SELECT * FROM words
WHERE RIGHT('PLAutvälj', CHAR_LENGTH(word)) = word;
(To check the accents, I picked a different word from your table.)
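The same suffix check can also be done outside SQL, which sidesteps both the regex metacharacter and the accent concerns. Below is a small Python sketch over the sample rows, where the list and the function name are illustrative assumptions. Unlike MySQL's default collations, this comparison is case-sensitive.

```python
words = [
    (177041, "utvälj"),
    (119270, "fonders"),
    (39968, "flamländarens"),
    (63567, "hänvisningarnas"),
    (61244, "hovdansers"),
]


def rows_ending_search(search, rows):
    """Return rows whose word is a suffix of the search word."""
    return [(id_word, word) for id_word, word in rows if search.endswith(word)]


print(rows_ending_search("plafonders", words))
print(rows_ending_search("PLAutvälj", words))
```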

How to match hyphen delimited in any order

I need to match a set of characters delimited by a hyphen - for example:
B-B/w-W/Br-W-Br
Where the / are part of what I need, up to 20 spaces.
G-R-B, G/R-B-B/W-O
So I need a regex that covers between the -'s in any order (G-R-B could also be R-B-G)
I've been playing around with a bunch of combos, but I can't come up with something that will match any order.
The plan is to search this way using mysql. So, it'll be something like
select * from table1 where pinout REGEXP '';
I just can't get the regex right :/
Description
This expression will match the string provided each of the hyphen delimited values is included in the string. The color values can appear in the string in any order, so this expression will match W/Br-b-B/w and B/w-W/Br-b... or any other combination which includes those colors.
^ # match the start of the string
(?=.*?(?:^|-)W\/Br(?=-|$)) # require the string to have a w/br
(?=.*?(?:^|-)b(?=-|$)) # require the string to have a b
(?=.*?(?:^|-)B\/w(?=-|$)) # require the string to have a b/w
.* # match the entire string
MySQL (before 8.0) doesn't support lookarounds, so this will need to be broken into a group of conditions in the WHERE clause:
mysql> SELECT * FROM dog WHERE ( color REGEXP '.*(^|-)W\/Br(-|$)' and color REGEXP '.*(^|-)b(-|$)' and color REGEXP '.*(^|-)B\/w(-|$)' );
+-------+--------+---------+------+------------+---------------------+
| name | owner | species | sex | birth | color |
+-------+--------+---------+------+------------+---------------------+
| Claws | Gwen | cat | m | 1994-03-17 | B-B/w-W/Br-W-Br |
| Buffy | Harold | dog | f | 1989-05-13 | G-R-B, G/R-B-B/W-O |
+-------+--------+---------+------+------------+---------------------+
See also this working sqlfiddle: http://sqlfiddle.com/#!2/943af/1/0
Using a regex in conjunction with a MySql where statement can be found here: http://dev.mysql.com/doc/refman/5.1/en/pattern-matching.html
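If the filtering can happen in application code instead, the any-order requirement reduces to comparing the hyphen-separated parts as multisets. Below is a minimal Python sketch, assuming case-insensitive comparison (as with MySQL's default collations) and a hypothetical helper name.

```python
from collections import Counter


def contains_colors(pinout, wanted):
    """True if every wanted part occurs in pinout, regardless of order."""
    have = Counter(part.casefold() for part in pinout.split("-"))
    need = Counter(part.casefold() for part in wanted.split("-"))
    return all(have[part] >= count for part, count in need.items())


print(contains_colors("B-B/w-W/Br-W-Br", "W/Br-b-B/w"))
print(contains_colors("G-R-B", "R-B-G"))
print(contains_colors("G-R-B", "G-R-O"))
```

Using a Counter rather than a set means a search for B-B only matches a pinout that actually contains B twice.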
I might have misunderstood from your example, try this:
-*([a-zA-Z/]+)-*
The capture region can be altered to include your specific letters of interest, e.g. [GRBWOgrbwo/].
Edit: I don't think this will help you in the context you're using it, but I'll leave it here for posterity.

How do I use mysql to match against multiple possibilities from a second table?

I'm not entirely sure how to ask this question, so I'll lead by providing an example table and an example output and then follow up with a more thorough explanation of what I'm attempting to accomplish.
Imagine that I have two tables. In the first is a list of companies. Some of these companies have duplicate entries due to being imported and continuously updated from different sources. For example, the company table may look something like this:
| rawName | strippedName |
| Kohl's | kohls |
| kohls.com | kohls |
| kohls Corporation | kohls |
So in this situation, we have information that has come in from three different sources. In an attempt to allow my program to understand that all of these sources are the same store, I created the stripped name column (which I also use for creating URLs and whatnot).
In the second table, we have information about deals, coupons, shipping offers, etc. However, since these come in from their various sources, they end up with the three different rawNames that we identified above. For example, the second table might look something like this:
| merchantName | dealInformation |
| kohls.com | 10% off everything... |
| kohl's | Free shipping on... |
| kohls corporation | 1 Day Flash Sale! |
| kohls.com | Buy one get one... |
So here we have four entries that are all from the same company. However, when a user on the site visits the listing for Kohls, I want it to display all the entries from each source.
Here is what I currently have, but it doesn't seem to be doing the trick. This seems to only work if I set the LIMIT in that sub-query to 1 so that it only brings back one of the rawNames. I need it to match against all of the rawNames.
SELECT * FROM table2
WHERE merchantName = (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The quickest fix is to replace your merchantName = with merchantName IN:
SELECT * FROM table2
WHERE merchantName IN (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The = operator needs to have exactly one value on each side - the IN keyword matches a value against multiple values.
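Interpolating $strippedName straight into the SQL string also leaves the query open to SQL injection, so a bound parameter is safer. Below is a minimal sketch of the IN version with a placeholder, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table contents are the sample rows from the question. The ? placeholder style and COLLATE NOCASE are SQLite specifics, so the equivalent prepared-statement syntax in your own driver is an assumption to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE approximates MySQL's default case-insensitive comparison.
conn.executescript("""
    CREATE TABLE table1 (rawName TEXT COLLATE NOCASE, strippedName TEXT);
    CREATE TABLE table2 (merchantName TEXT COLLATE NOCASE, dealInformation TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("Kohl's", "kohls"), ("kohls.com", "kohls"), ("kohls Corporation", "kohls"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("kohls.com", "10% off everything..."),
    ("kohl's", "Free shipping on..."),
    ("kohls corporation", "1 Day Flash Sale!"),
    ("kohls.com", "Buy one get one..."),
])

stripped_name = "kohls"  # would come from the request in the real application
deals = conn.execute(
    "SELECT * FROM table2 WHERE merchantName IN "
    "(SELECT rawName FROM table1 WHERE strippedName = ?)",
    (stripped_name,),
).fetchall()
print(len(deals))
```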

Selecting the most popular keyword in a mysql database

I have a column called keywords where users enter up to 4 keywords separated by a comma, i.e.:
----------------------------------
userId | keywords |
----------------------------------
01 | php,css,html,mysql |
02 | wordpress,css,drupal,xx |
03 | mysql,html,wordpress,css|
----------------------------------
I'm trying to figure out a query to select all the keywords from everyone, explode them by the comma and then count how many there are of each.
I know I can do this quite easily with PHP but I thought there might be a way for MySQL to do it...
Any ideas?
Try to normalize the data, i.e. store four rows per user (one per keyword) instead of one comma-separated row. Counting then becomes a plain GROUP BY on the keyword column.
It is also possible to split a string into a temporary table, but I'm not sure that will help you much. I originally found this code on MySQL Forge, but that site has been shut down, so here is similar code:
http://www.pnplogic.com/blog/articles/MySQL_Convert_Delimited_String_To_Temp_Table_Result_Set.php
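For comparison, the application-side count that the question mentions is only a few lines. Below is a Python sketch over the sample rows, assuming they have already been fetched as (userId, keywords) tuples; the variable names are illustrative.

```python
from collections import Counter

rows = [
    ("01", "php,css,html,mysql"),
    ("02", "wordpress,css,drupal,xx"),
    ("03", "mysql,html,wordpress,css"),
]

# Split each comma-separated value and tally the non-empty keywords.
counts = Counter(
    keyword.strip()
    for _user_id, keywords in rows
    for keyword in keywords.split(",")
    if keyword.strip()
)

for keyword, total in counts.most_common():
    print(keyword, total)
```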