Soundex against individual values in CSV column - mysql

I suspect this may not be doable, but I figured I'd try anyway.
In a MySQL database, one of the columns (related) holds a comma-separated list of values: bob,sally,james,rick.
For a given row, the number of items in this column is variable.
Now, if I want to do a soundex search against all the items in that column (client request well after this db has been established and integrated), how would I go about this? I'd want to write something like
SELECT `primary` FROM `table` WHERE `related`.split(",").any() SOUNDS LIKE sample
Which is plainly nonsense code but hopefully conveys the idea.
Essentially, explode/split a CSV field into individual values to SOUNDEX compare. If I have to get all those related fields, explode them and then soundex() them individually in a PHP foreach() loop so be it (that language isn't really important, it could be Python, too, with just a touch more effort), but I'd love to avoid it if possible.
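For reference, the application-side fallback described above could look roughly like this in Python. It is only a sketch: soundex() here is the classic four-character American Soundex, which is not guaranteed to produce the same codes as MySQL's SOUNDEX() (MySQL can return longer strings), and the function names and the (primary, related) row shape are assumptions made for illustration.

```python
# Letter-to-digit table for classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code of `word` ('' if no letters)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is emitted again
    return (result + "000")[:4]


def rows_matching(rows, sample):
    """Return the primary keys whose comma-separated `related` value
    contains at least one item that sounds like `sample`."""
    target = soundex(sample)
    return [
        primary
        for primary, related in rows
        if any(soundex(item.strip()) == target for item in related.split(","))
    ]
```

With rows such as [(1, "bob,sally,james,rick"), (2, "anne,tom")], rows_matching(rows, "jaymes") returns [1], because "james" and "jaymes" both encode to J520.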

Related

How to return substring positions in LIKE query

I retrieve data from a MySQL database using a simple, case-insensitive SELECT ... FROM ... WHERE ... LIKE query. I escape any % or _ in the search term and then wrap it in % myself, so the user can only perform basic text searches and cannot inject wildcards or regex.
For every row returned by this query, I have to search again with a JS script to find all the indexes of the substring in the original string. I dislike this method because it's a different pattern matching than the one used by the LIKE query, so I can't guarantee that the algorithm is the same.
I found the MySQL functions POSITION and LOCATE, but they return only the first index if the substring is found, or 0 if it is not. Yes, LOCATE lets you set the index to start searching from, so by repeatedly searching from just past the previously returned index until the returned index is 0, you can find all indexes of the substring. But that means a lot of additional queries, and it might end up slowing down my application a lot.
So I'm now wondering: is there a way to have the LIKE query return the substring positions directly? I didn't find any, probably because I lack the MySQL vocabulary (I'm a noob).
Simple answer: No.
Longer answer: MySQL has no syntax or mechanism to return an array of anything, either from a SELECT or from a stored procedure.
Maybe answer: You could write a stored procedure that loops through one result, finding the positions and packing them into a commalist. But I cringe at how messy that code would be. I would quickly decide to write JS code, as you have already done.
Moral of the story: SQL is not a full language. It's great at storing and efficiently retrieving large sets of rows, but lousy at string manipulation or arrays (other than "rows").
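To illustrate the application-side approach, here is a minimal sketch in Python rather than JS. The function name is made up for the example, and lowercasing both strings is only an approximation of how a case-insensitive MySQL collation compares text, so the two can still disagree on unusual characters. Positions are 0-based, whereas LOCATE is 1-based.

```python
def find_all_positions(haystack, needle):
    """Return the 0-based start index of every (possibly overlapping)
    case-insensitive occurrence of `needle` in `haystack`."""
    positions = []
    if not needle:
        return positions
    lowered_haystack = haystack.lower()
    lowered_needle = needle.lower()
    start = lowered_haystack.find(lowered_needle)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlaps are found too.
        start = lowered_haystack.find(lowered_needle, start + 1)
    return positions
```

For example, find_all_positions("Banana bandana", "ana") returns [1, 3, 11].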
Commalist
If you are actually searching a simple list of things separated by commas, then FIND_IN_SET() and SUBSTRING_INDEX() in MySQL closely match what JS can do with its split (on comma) method on strings.

Store and query array or group of words in MYSQL and PHP

I am working on a project that uses PHP/MySQL as the backend for an iOS app that makes a lot of use of dictionaries and arrays containing text or strings.
I need to store this text in MySQL (coming from arrays of strings on the phone) and then query it to see whether the text contains (case insensitive) a given word or phrase.
For example, if the array consists of {Ford, Chevy, Toyota, BMW, Buick}, I might want to query it to see whether it contains Saab.
I know storing arrays in a field is not MySQL friendly, as it prevents optimization. However, it would be way too complicated to create individual tables for these collections of words, which are created by users.
So I'm looking for a reasonable way to store them, perhaps delimited with spaces or with commas, that makes reasonably efficient searches possible.
If they are stored separated by spaces, I gather you can do something with regex like:
SELECT
*
FROM
`wordgroups`
WHERE
wordgroup regexp '(^|[[:space:]])BLA([[:space:]]|$)';
But this seems funky.
Is there a better way to do this? Thanks for any insights
Consider using a FULLTEXT index. And use MATCH(...) AGAINST(... IN NATURAL LANGUAGE MODE).
FULLTEXT is very fast for "words", and IN NATURAL LANGUAGE MODE may solve your Saab example.
Using a regexp can achieve what you want; however, your query will be inefficient, since it cannot rely on any indexes.
If you want to store a list of words and their position within the array does not matter, then you may consider storing them in a single field, space delimited. But instead of using a regexp, use fulltext indexing and searching. This method has a clear advantage over searching with regexp: it uses an index. It has some drawbacks as well: there is a stopword list (these words are excluded from searching) and there is a minimum word length. The good news is that these parameters are configurable. Also, you get all the drawbacks of storing data in a delimited field, as detailed in the question "Is storing a delimited list in a database column really that bad?" here on SO.
However, if you want to use dictionaries (key - value pairs) or the position within the list may be important, then the above data structure will not do.
In this case, I would consider whether MySQL is the right choice for storing my data in the first place. If you have multi-dimensional lists, or lists containing lists, then I would definitely choose a different, NoSQL solution.
If you only need simple, two-dimensional lists / dictionaries, then you can store all of them in a single table with a similar structure as below:
list_id - unique identifier of the list, primary key
user_id - id of the user the list belongs to
key - for dictionaries this is the lookup field (indexed), for other lists it may store the position of the element. String data type.
value - the field holding the value (indexed). Data type should be string, so that it could hold different data types as well.
A search to determine whether a list holds a certain value would then be a fast and efficient lookup using the index on either the key or the value field.
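A rough sketch of that single-table layout follows, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. Several things here are assumptions for illustration: the table name list_items, the column names item_key and item_value (chosen because KEY is a reserved word in MySQL), the composite primary key (a list spans several rows, so list_id alone cannot identify a row), and COLLATE NOCASE for the case-insensitive match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list_items (
        list_id    INTEGER NOT NULL,   -- which list the element belongs to
        user_id    INTEGER NOT NULL,   -- owner of the list
        item_key   TEXT    NOT NULL,   -- dictionary key, or position in the list
        item_value TEXT    NOT NULL COLLATE NOCASE,
        PRIMARY KEY (list_id, item_key)
    );
    CREATE INDEX idx_list_items_value ON list_items (item_value);
""")

cars = ["Ford", "Chevy", "Toyota", "BMW", "Buick"]
conn.executemany(
    "INSERT INTO list_items (list_id, user_id, item_key, item_value) "
    "VALUES (?, ?, ?, ?)",
    [(1, 42, str(position), name) for position, name in enumerate(cars)],
)


def list_contains(conn, list_id, word):
    """True if the given list holds `word`, compared case-insensitively."""
    row = conn.execute(
        "SELECT 1 FROM list_items WHERE list_id = ? AND item_value = ? LIMIT 1",
        (list_id, word),
    ).fetchone()
    return row is not None
```

Here list_contains(conn, 1, "toyota") is True and list_contains(conn, 1, "Saab") is False, and the equality test can be served by the index instead of a regexp scan.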

SQL Index on Strings Helpful?

So I have used MySQL a lot in small projects, for school; however, I'm now taking over an enterprise-ish scale project, and now speed matters, not just getting the right information back. I have Googled around a lot trying to learn how indexes might make my website faster, and I am hoping to further understand how they work, not just when to use them.
So, I find myself doing a lot of SELECT DISTINCTs in order to get all the distinct values, so I can populate my dropdowns. I have heard that this would be faster if the column were indexed; however, I don't completely understand why. If the values in this column were ints, I would totally understand; basically a data structure like a BST would be created, and search times could be O(log n). However, if my column holds strings, how can it put a string in a BST? This doesn't seem possible, since there is no metric to compare a string against another string (like there is with numbers). It seems like an index would just create a list of all the possible values for that column, but it seems as if the search would still require the database to go through every single row, making the search linear, just as if the database scanned a regular table.
My second question is what does the database do once it finds the right value in the index data structure. For example, let's say I'm doing a where age = 42. So, the database goes through the data structure until it finds 42, but how does it map that lookup to the whole row? Does the index have some sort of row number associated with it?
Lastly, if I am doing these frequent SELECT DISTINCT statements, is adding an index going to help? I feel like this must be a common task for websites, as many sites have dropdowns where you can filter results, I'm just trying to figure out if I'm approaching it the right way.
Thanks in advance.
Your logic is good; however, your assumption that there is no metric to compare a string to other strings is incorrect. Strings can simply be compared in alphabetical order, giving them a perfectly usable comparison metric that can be used to build the index.
It takes a tiny bit longer to compare strings than it does ints; however, having an index still speeds things up, regardless of the comparison cost.
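A toy model may make this concrete. The sketch below is not how MySQL implements indexes internally (real engines typically use B-tree variants stored on disk); it only shows that strings sort like any other value, that each index entry carries a pointer back to its row, and that distinct values fall out of the sorted order without touching the rows. All names and data are invented for the example.

```python
import bisect

# The "table": rows in arbitrary insertion order.
rows = [
    {"name": "pear", "age": 42},
    {"name": "apple", "age": 7},
    {"name": "fig", "age": 42},
    {"name": "apple", "age": 19},
]

# The "index": (value, row position) pairs kept sorted by value.
name_index = sorted((row["name"], pos) for pos, row in enumerate(rows))


def lookup(index, value):
    """Binary-search the index and return the row positions holding `value`."""
    i = bisect.bisect_left(index, (value, -1))  # first entry >= value
    positions = []
    while i < len(index) and index[i][0] == value:
        positions.append(index[i][1])
        i += 1
    return positions


def distinct_values(index):
    """Walk the sorted index once; equal values are adjacent."""
    result = []
    for value, _ in index:
        if not result or result[-1] != value:
            result.append(value)
    return result
```

lookup(name_index, "apple") returns [1, 3], the positions of the matching rows, and distinct_values(name_index) returns ['apple', 'fig', 'pear'].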
I would like to mention however that if you are using SELECT DISTINCT as much as you say, there are probably problems with your database schema.
You should learn about normalizing your database. I recommend starting with this link: http://databases.about.com/od/specificproducts/a/normalization.htm
Normalization will give you querying mechanisms whose benefits can vastly outweigh those of indexing.
If your strings are something small like categories, then an index will help. If you have large chunks of random text, then you will likely want a full text index. If you are having to use SELECT DISTINCT a lot, your database may not be properly normalized for what you are doing. You could also put the distinct values in a separate table (that only has the distinct values), but this only helps if the content does not change a lot. Indexing strategies are particular to your application's access patterns, the data itself, and how the tables are normalized (or not).
HTH

Set Data Type in mySQL

My knowledge of relational databases is more limited, but is there a SQL command that can be used to create a column that contains a set in each row?
I am trying to create a table with 2 columns. 1 for specific IDs and a 2nd for sets that correspond to these IDs.
I read about
http://dev.mysql.com/doc/refman/5.1/en/set.html
However, the SET data type requires that you know in advance which items may be in your set. I just want a variable-length list of items that don't repeat.
It would be much better to create that list of items as multiple rows in a second table. Then you could have as many items in the list as you want, you could sort them, search for a specific item, make sure they're unique, etc.
See also my answer to Is storing a delimited list in a database column really that bad?
No, there's no MySQL data type for arbitrary sets. You can use a string containing a comma-delimited list; there are functions like FIND_IN_SET() that will operate on such values.
But this is poor database design. If you have an open-ended list, you should store it in a table with one row per value. This will allow them to be indexed, making searching faster.
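A minimal sketch of the one-row-per-value design, again with Python's sqlite3 module standing in for MySQL. The table and column names are hypothetical, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE). The composite primary key is what keeps items from repeating within a set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE set_items (
        set_id INTEGER NOT NULL,      -- the ID the set belongs to
        item   TEXT    NOT NULL,      -- one member of the set
        PRIMARY KEY (set_id, item)    -- forbids duplicates within a set
    )
""")


def add_item(conn, set_id, item):
    """Add `item` to the set; adding an existing item is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO set_items (set_id, item) VALUES (?, ?)",
        (set_id, item),
    )


def items_of(conn, set_id):
    """Return the members of the set in sorted order."""
    cursor = conn.execute(
        "SELECT item FROM set_items WHERE set_id = ? ORDER BY item",
        (set_id,),
    )
    return [row[0] for row in cursor]


for item in ["3", "4", "7", "4"]:   # the second "4" is silently ignored
    add_item(conn, 1, item)
```

After this, items_of(conn, 1) returns ['3', '4', '7'].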
MySQL doesn't support arrays, lists or other data structures like that. It does, however, support strings, so use a string together with the FIND_IN_SET() function:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_find-in-set
"SET" data type won't be a good choice here.
You can use the "VARCHAR" and store the values in CSV format. You handle them at application level.
Example: INSERT INTO my_table (id, myset) VALUES (1, '3,4,7');

Comma separated value & wildcards in mysql

I have a value in my database with comma separated data eg.
11,223,343,123
I want to get the data if it matches a certain number (in this example, the number 223).
WHERE wp_postmeta.meta_value IN
('223', '223,%', '%,223,%', '%,223')
I thought I could use wildcard for it, but with no luck. Any ideas of how to do this? Maybe it's better to do this using PHP?
Storing stuff in a comma separated list usually is a bad idea, but if you must, use the FIND_IN_SET(str,strlist) function.
WHERE FIND_IN_SET('223',wp_postmeta.meta_value)
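The IN attempt in the question fails because IN tests plain equality against each listed value, so the % signs are treated as literal characters rather than wildcards. As an illustration of what FIND_IN_SET() computes, here is a rough Python equivalent. It is a sketch only: it compares items exactly, whereas MySQL's comparison follows the column's collation.

```python
def find_in_set(needle, strlist):
    """Return the 1-based position of `needle` among the comma-separated
    items of `strlist`, or 0 if it is not one of them."""
    if strlist == "":
        return 0
    for position, item in enumerate(strlist.split(","), start=1):
        if item == needle:
            return position
    return 0
```

find_in_set("223", "11,223,343,123") returns 2, while find_in_set("23", "11,223,343,123") returns 0, which is the whole-item matching that a bare LIKE '%23%' would get wrong.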
If you can change your database and normalise it, you would get faster results. Create an extra table that links meta_values to your primary_id in your table.
The wp_postmeta table is designed to hold loads of values, and for that simple reason (and because of database normalization), you should never store comma-separated lists as values in databases.
If you absolutely must use it this way, there are some MySQL functions for it, one being FIND_IN_SET.