Find as close as exact matches in database - which way is better?

I have a situation:
I have a database (MySQL) which contains products and their codes like this
BLACK SUGAR BS 709
HOT SAUCE AX889/9
TOMY 8861
I got an Excel spreadsheet, which I converted to CSV, that contains prices for the products. It has two columns, code and price, like this:
BS709 23.00
AX 889 /9 10.89
8861 1.69
I made a script that updates the product prices by searching the database for each product code, using a FOREACH loop and a LIKE query with wildcards on both sides:
FOREACH row in CSV, search the database using "WHERE product_code LIKE '%code%'".
This is of course a primitive and not very successful way of updating the prices. The codes in the CSV are not an exact match (in syntax) for those in the database, so if two products in the DB contain BS709 in their code (for example BS70923), I get multiple matches.
Is there a better way of doing this?

You could strip spaces and other unwanted characters from the column using MySQL's REPLACE() before comparing. This returns all exact matches, regardless of any spaces in the stored code.
SELECT * FROM table WHERE REPLACE( product_code, ' ', '' ) LIKE 'code'
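The same idea can be sketched outside the database. This is a minimal Python sketch, assuming the product_code column holds only the code (for example "BS 709") and that whitespace is the only difference between the two sources; the helper names normalize and match_prices are hypothetical.

```python
def normalize(code: str) -> str:
    """Remove all whitespace and upper-case the code."""
    return "".join(code.split()).upper()


def match_prices(db_codes, csv_rows):
    """Pair each CSV row with the single DB code that has the same normalized form."""
    by_key = {}
    for code in db_codes:
        by_key.setdefault(normalize(code), []).append(code)

    matched, unmatched = {}, []
    for csv_code, price in csv_rows:
        candidates = by_key.get(normalize(csv_code), [])
        if len(candidates) == 1:
            matched[candidates[0]] = price
        else:
            # No match, or more than one DB code shares the normalized form.
            unmatched.append(csv_code)
    return matched, unmatched


# Assumed sample data, based on the codes in the question.
db_codes = ["BS 709", "BS70923", "AX889/9", "8861"]
csv_rows = [("BS709", "23.00"), ("AX 889 /9", "10.89"), ("8861", "1.69"), ("ZZ1", "5.00")]
matched, unmatched = match_prices(db_codes, csv_rows)
```

Because the comparison is an equality test on the normalized form, BS709 no longer matches BS70923, and rows that cannot be matched unambiguously are collected for manual review instead of being updated.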

Given your examples, I would recommend removing all spaces from both, and then just looking for when the beginning or end of a code matches exactly:
where replace(e.code, ' ', '') like concat(replace(db.code, ' ', ''), '%') or
replace(e.code, ' ', '') like concat('%', replace(db.code, ' ', '')) or
replace(db.code, ' ', '') like concat(replace(e.code, ' ', ''), '%') or
replace(db.code, ' ', '') like concat('%', replace(e.code, ' ', ''));
This may not work for the specific case when one code is a prefix of another.
In any case, if the product codes in a spreadsheet are different from the product codes in the database, I think you have bigger problems. If you cannot really fix the spreadsheets, I would recommend that you manually/semi-automatically create a synonyms table in the database. This would have the Excel product code in one column and the correct product code in the other. Then you can do the lookup just by joining this together.
Yes. That is work. But probably less work than struggling with this problem and getting poor results that have to be repeatedly updated.
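A minimal sketch of the synonyms-table lookup, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (code_synonyms, csv_prices, excel_code, product_code) are hypothetical, and the rows are taken from the question's examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per spreadsheet spelling, pointing at the canonical DB code.
    CREATE TABLE code_synonyms (excel_code TEXT PRIMARY KEY, product_code TEXT NOT NULL);
    -- The imported CSV rows.
    CREATE TABLE csv_prices (excel_code TEXT, price REAL);
""")
conn.executemany(
    "INSERT INTO code_synonyms VALUES (?, ?)",
    [
        ("BS709", "BLACK SUGAR BS 709"),
        ("AX 889 /9", "HOT SAUCE AX889/9"),
        ("8861", "TOMY 8861"),
    ],
)
conn.executemany(
    "INSERT INTO csv_prices VALUES (?, ?)",
    [("BS709", 23.00), ("AX 889 /9", 10.89), ("8861", 1.69)],
)

# The lookup is a plain equality join; no LIKE or string surgery is needed.
rows = conn.execute("""
    SELECT s.product_code, c.price
    FROM csv_prices AS c
    JOIN code_synonyms AS s ON s.excel_code = c.excel_code
    ORDER BY s.product_code
""").fetchall()
```

CSV rows with no entry in the synonyms table simply drop out of the join, so they can be listed separately and added to the table once.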

Related

Counting how many fields (in a row) are filled in SQL

I want to count how many columns in a row are not NULL.
The table is quite big (more than 100 columns), so I would rather not do it manually or with PHP (I don't use PHP), which is the approach taken in "Counting how many MySQL fields in a row are filled (or empty)".
Is there a simple query I can use in a select, like SELECT COUNT(NOT ISNULL(*)) FROM big_table; ?
Thanks in advance...

Agree with comments above:
There is something wrong in the data since there is a need for such analysis.
You can't completely make it automatic.
But I have a recipe that simplifies the process. Only two steps are needed, plus a preliminary one.
Step 0. In step 1 you'll need the name of your table's schema. Normally the developers know which schema the table resides in, but if not, here is how you can find it:
select *
from information_schema.tables
where table_name = 'test_table';
Step 1. First you need the list of columns. The list by itself won't help much, but it is all we need to build the SELECT statement. So let's have the database prepare the SELECT statement for us:
select concat('select (length(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') order by ordinal_position separator ','),
')) - length(replace(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') order by ordinal_position separator ','),
'), ''###'', ''''))) / length(''###'')
from test_table')
from information_schema.columns
where table_schema = 'test'
and table_name = 'test_table';
Step 2. Execute the statement you got in step 1.
select (length(concat(.. list of cols ..)) -
length(replace(concat(.. list of cols .. ), '###', ''))) / length('###')
from test_table
The select looks tricky, but it's simple. First, replace all NULLs with a marker string that you're sure will never appear in those columns. I usually use "###"; that is what all the ifnull calls are for.
Next, count the characters with length. In my case it was 14.
After that, replace every "###" with an empty string and measure the length again. It's 11 now. That is what the length(replace(...)) combination does.
Last, divide (14 - 11) by the length of the marker string ("###", which is 3). You'll get 1. This is exactly the number of NULLs in my test row. To get the number of filled columns, subtract this from the total column count.
Do not hesitate to ask if needed
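The length arithmetic can be checked with a small Python sketch. It assumes a row is available as a tuple with None standing in for NULL, and that the marker "###" never occurs in real values; count_nulls is a hypothetical helper, not part of the answer above.

```python
MARKER = "###"  # assumption: this string never appears in real column values


def count_nulls(row):
    """Count None values using the same length/replace trick as the SQL."""
    # Equivalent of concat(ifnull(col, '###'), ...)
    joined = "".join(MARKER if value is None else str(value) for value in row)
    # Equivalent of replace(..., '###', '')
    stripped = joined.replace(MARKER, "")
    # Each NULL contributed exactly len(MARKER) characters.
    return (len(joined) - len(stripped)) // len(MARKER)


row = ("abc", None, 42, None, "x")
nulls = count_nulls(row)
filled = len(row) - nulls

# Cross-check against a direct count of non-None values.
direct_filled = sum(value is not None for value in row)
```

In Python the direct count is obviously simpler; the point of the sketch is only to show why the subtraction and division in the generated SQL yield the number of NULLs.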

FIND_IN_SET with trim function for values

I have a column that contains a string of comma delimited values. I use FIND_IN_SET to query this column and it works fine until there is a space between the value and the ,. I cannot control the input. The only solution I have found that works is by running REPLACE on the column within the FIND_IN_SET function. Unfortunately this will remove all spaces and could return undesired results.
The example below would return both rows in the table, as opposed to only the first one.
col1 | col2
carpet , foo, bar | myVal1
abc, 123 , car pet | myVal2
Query
SELECT FIND_IN_SET('carpet', REPLACE(col1, ' ', ''));
Is there a way to limit this so that it only trims the spaces on either side of the comma?

You could try replacing ", " (comma followed by a space) and " ," (space followed by a comma) with just a comma:
SELECT
col1,
col2,
FIND_IN_SET('carpet', REPLACE(REPLACE(col1, ', ', ','), ' ,', ',')) AS output
FROM yourTable;
Note: This answer assumes that there is at most one leading/trailing space around each comma, and that your actual data itself does not contain commas. If there could be an arbitrary amount of whitespace, this answer would fail. In that case, what you really need is regex replacement. MySQL 8+ does support this, but a better bet would be to normalize your data and stop storing CSV data like this.
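To illustrate the difference between trimming around the commas and removing every space, here is a minimal Python sketch. The function find_in_set is a hypothetical stand-in that mimics MySQL's 1-based FIND_IN_SET result; it is not a MySQL API.

```python
def find_in_set(needle: str, csv_value: str) -> int:
    """Return the 1-based position of needle among the trimmed items, or 0."""
    # Split on commas first, then trim each item. Any amount of whitespace
    # around a comma is dropped, but spaces inside an item are preserved.
    items = [item.strip() for item in csv_value.split(",")]
    return items.index(needle) + 1 if needle in items else 0


row1 = "carpet , foo, bar"
row2 = "abc, 123 , car pet"

pos1 = find_in_set("carpet", row1)
pos2 = find_in_set("carpet", row2)

# Removing all spaces, as in the question, makes row2 match by accident.
naive_pos2 = find_in_set("carpet", row2.replace(" ", ""))
```

Only the first row matches with per-item trimming, while the remove-all-spaces variant also matches "car pet" in the second row.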

count multiple occurrences of substrings in MySQL

I'm pulling data from Twitter API into my DB. There is a column 'hashtags' which stores a list of hashtags used in the tweet.
Table name: brexittweets
Column: hashtags varchar(500)
I want to count the number of hashtags. For example
Hashtags
Tweet1: ['EUref', 'Brexit', 'poll']
Tweet2: ['Brexit', 'Blair']
Tweet3: ['Brexit', 'Blair', 'EUref']
Result should be:
hashtag count(hashtag)
Brexit 3
EUref 2
Blair 2
poll 1
What I was thinking of doing:
Tried to take substring between quotes ' ', but it occurs multiple times in the same row.
Tried using strpos to find instances of ' ', but it returns only the first instance.
Is there a way to do this with queries? I was thinking of trying out a procedure, but it gets complicated because I need to print these results on a web page using PHP.

If you had normalized your table so that each tag in a tweet is stored on its own row, your problem would be solved easily with COUNT and GROUP BY.

Assuming all the tags are separated by ', ' (four characters including the quotes), you can count the hashtags in each row like this:
SELECT
hashtags,
ROUND(
(LENGTH(hashtags) - LENGTH(REPLACE(hashtags, "', '", ""))) / 4
) + 1 AS count
FROM brexittweets
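The per-tag totals asked for in the question can be sketched in application code. This is a minimal Python sketch, assuming every value in the hashtags column is a well-formed list literal exactly as shown in the question; count_hashtags is a hypothetical helper and the rows are the question's sample data.

```python
import ast
from collections import Counter

# Assumed column values, copied from the question's example.
rows = [
    "['EUref', 'Brexit', 'poll']",
    "['Brexit', 'Blair']",
    "['Brexit', 'Blair', 'EUref']",
]


def count_hashtags(rows):
    """Tally every hashtag across all rows."""
    counts = Counter()
    for raw in rows:
        # literal_eval safely parses the stored text into a list of strings.
        counts.update(ast.literal_eval(raw))
    return counts


counts = count_hashtags(rows)
for tag, n in counts.most_common():
    print(tag, n)
```

With the tags stored one per row in a separate table, the same result would come from COUNT with GROUP BY inside the database.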

How to make a distinct list of words from a large column of sentences

I have a large list of sentences, about 18M records (2 GB).
id txt
---------------------------
1 Hi my name is Jim.
2 I love listing music.
....
I want to make a new table with all distinct words.
id word
---------------------------
1 Hi
2 my
3 name
...
What is the best way to do that, keeping in mind the large database and execution time?
All sentences are FULLTEXT indexed.

This is maybe crazy/naive/impossible - But you can try to:
Dump all data into a text file with SELECT txt FROM old_table INTO OUTFILE 'file_name'
Open the file with a decent text editor
Find and replace all characters you don't need (like . , ! ?)
Find and replace all whitespace with \n
CREATE TABLE words (word VARCHAR(50) PRIMARY KEY)
Import the data from file ignoring duplicates: LOAD DATA INFILE 'file_name' IGNORE INTO TABLE words
Alter the table to add the id column or use INSERT .. SELECT .. to copy the data to a new table.

Here is one method . . . it just requires scanning the table multiple times and assumes words are separated by a single space:
select substring_index(txt, ' ', 1) as word
from t
union all
select substring_index(substring_index(txt, ' ', 2), ' ', -1) as word
from t
where txt like '% %'
union all
select substring_index(substring_index(txt, ' ', 3), ' ', -1) as word
from t
where txt like '% % %'
union all
. . .
The problem is that you have to keep adding subqueries up to the maximum number of words in the text.
In other words, SQL can do what you want. However, it is not necessarily the optimal solution. You might be better off reading the data into a tool like Python and then writing it out again to the database.
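As a rough illustration of the external-tool route, here is a minimal Python sketch. It assumes the sentences can be streamed one at a time (for example from a cursor or a dump file), treats any run of letters, digits, or underscores as a word, and is case-sensitive; distinct_words is a hypothetical helper.

```python
import re

WORD = re.compile(r"\w+")  # assumption: punctuation and whitespace separate words


def distinct_words(sentences):
    """Return {word: id}, with ids assigned in order of first appearance."""
    ids = {}
    for sentence in sentences:
        for word in WORD.findall(sentence):
            # setdefault keeps the id from the first time a word was seen.
            ids.setdefault(word, len(ids) + 1)
    return ids


sample = ["Hi my name is Jim.", "I love listing music.", "Hi Jim."]
words = distinct_words(sample)
```

Only the set of distinct words has to fit in memory, not the sentences themselves, and the resulting pairs can be bulk-inserted into the new table.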

SQL find in other field data and update another field

I'm having trouble writing this SQL query for a MySQL database:
I need to update field A, but I also have field B, which contains a lot of data, for example:
ASIAN HORSE 70з рус 600A (261x175x220)
or
Бэрен polar 55/59з (555112) 480A (242x175x190)
I must fetch 70з and set it in field A, and likewise 55/59з (but for another record).
But how can I search field B for a word that ends with з, matching only that word (not all the data before з, as % would)?
I know it could sound like homework, but I really don't know how to select only a word with a given ending...

The MySQL function substring_index can be used to select pieces of a string delimited by something. For example this picks out the third "word" from MyColumn:
select substring_index(substring_index(MyColumn, ' ', 3), ' ', -1) from MyTable
(70з is the third "word" in ASIAN HORSE 70з рус 600A (261x175x220).)
Update: If, instead of the third word, you are looking for the "word" that ends with 'з', you can use:
select substring_index(substring_index(MyColumn, 'з', 1), ' ', -1) from MyTable
This will consider 'з' as the delimiter though, and removes it from the result. You can add it back with concat:
select concat(substring_index(substring_index(MyColumn, 'з', 1), ' ', -1), 'з') from MyTable
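For comparison, the same extraction can be sketched with a regular expression in Python. This is only an illustration under the assumption that words are separated by whitespace; word_ending_with is a hypothetical helper. Unlike the concat approach, it returns None when no word ends with 'з' instead of appending 'з' to some other word.

```python
import re

# A run of non-space characters that ends in 'з' and is bounded by
# whitespace or the string edges on both sides.
PATTERN = re.compile(r"(?<!\S)\S*з(?!\S)")


def word_ending_with(text: str):
    """Return the first whitespace-delimited word ending in 'з', or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


first = word_ending_with("ASIAN HORSE 70з рус 600A (261x175x220)")
second = word_ending_with("Бэрен polar 55/59з (555112) 480A (242x175x190)")
missing = word_ending_with("ASIAN HORSE 600A")
```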

If you are trying to parse the field so the third value always goes into a particular field, then you have a hard problem and probably want to create a user-defined function.
However, if you just want to see if 70з is present and set another field, then this should work:
update t
set A = '70з'
where B like '% 70з' or B like '% 70з %' or B like '70з %' or B = '70з'
This uses spaces to define the word boundaries and considers whether the string is at the beginning, in the middle, at the end, or the entire value in B.
