Advanced search in concatenated strings - MySQL

I am creating search functionality for a website where I need to take the user's full address from an input, e.g. "Address 32, City, Region, Country, Postal Code" (not necessarily in this order), and return the available restaurants around that area.
I have a table "address" where there is a field for each of the above elements.
I was thinking of concatenating the user's address from the database and comparing it with the user's input with the help of SQL REGEXP.
Is there any other approximate SQL search that can give me that, or can you suggest a different approach?
A friend suggested using Solr (http://www.simonemms.com/2011/02/08/codeigniter-solr/); however, after a little research into it, the problem still remains.

The trouble with concatenating the address together in SQL is that you will miss out on using indexes, so it will be slow. Added to which, if you do not know the order of the input elements, the chances of the input matching what is concatenated from the database (in a likely different order) are slim.
I would suggest splitting most of the address items off into different tables (i.e. a table of regions, another of countries, etc.) and just storing the ids in columns in the users table.
For a search, identify which of the search fields go with which actual field, then join on those to find the real address.
It also means you can identify typos more easily.
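The normalized approach above can be sketched as follows. This is a minimal illustration using Python's sqlite3 in place of MySQL; the table and column names are assumptions, not the asker's actual schema. The trick is that each free-text term is checked against every lookup column, so the input order doesn't matter:

```python
# Sketch of the normalized-address search: lookup tables for country/region/
# city, id columns on the restaurants table, and an order-independent match
# of each search term against all the candidate columns.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE regions   (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cities    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE restaurants (
    id INTEGER PRIMARY KEY, name TEXT,
    street TEXT, postal_code TEXT,
    city_id INTEGER REFERENCES cities(id),
    region_id INTEGER REFERENCES regions(id),
    country_id INTEGER REFERENCES countries(id)
);
INSERT INTO countries VALUES (1, 'Greece');
INSERT INTO regions   VALUES (1, 'Attica');
INSERT INTO cities    VALUES (1, 'Athens');
INSERT INTO restaurants VALUES (1, 'Taverna', 'Address 32', '10431', 1, 1, 1);
""")

def search(terms):
    """Match each free-text term against the lookup tables, in any order."""
    sql = """SELECT r.name FROM restaurants r
             JOIN cities c ON c.id = r.city_id
             JOIN regions g ON g.id = r.region_id
             JOIN countries n ON n.id = r.country_id
             WHERE {}""".format(
        " AND ".join(["? IN (c.name, g.name, n.name, r.postal_code, r.street)"] * len(terms)))
    return [row[0] for row in cur.execute(sql, terms)]

print(search(["Athens", "Greece"]))  # -> ['Taverna']
```

In MySQL proper you would index the lookup-table name columns and the id columns on the restaurants table so each join can use an index, which is exactly what the concatenation approach loses.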

Related

How to perform inexact matches on two data sets

I'm trying to compare two data sets (vendor masters) from two systems. We are moving to one system, so we want to avoid duplication. The issue is that the names, addresses, etc. could be slightly different. For example, the name might end in 'Inc' or 'Inc.', or the address could be 'St' or 'Street'. The vendor masters have been dumped to Excel, so I was thinking about pulling them into Access to compare them, but I'm not sure how to handle the inexact matches. The data fields I need to compare are: name, address, telephone number, federal tax ID (if populated), contact name.
Here is how I would proceed. You will rarely get answers like this on Stack Exchange, since your question is not focused enough. This is a rather generic set of steps, not specific to a particular tool (i.e. database or spreadsheet). As I said in my comments, you'll need to search for specific answers (or ask new questions) about the particular tools you use as you go. Without knowing all the details, Access can certainly be useful in doing some preliminary matching, but you could also utilize Excel directly or even Oracle SQL since you have it as a resource.
Back up your data.
Make a copy of your data for matching purposes.
Ensure that each record in both sets of data has a unique key (i.e. an AutoNumber field or similar), so that until you have a confirmed match the records can always be separately identified.
Create new matched-key table and/or fields containing the list of matched unique key values.
Create new "matching" fields and copy your key fields into these new fields.
Scrub the data in all possible matching fields by:
Removing periods and other punctuation.
Choosing standard abbreviations and replacing all variations with the same value in all records. Example: replace "Incorporated" and "Inc." with "Inc".
Trimming excess spaces from the ends and between terms.
Formatting all phone numbers exactly the same way, or better yet removing all spaces and punctuation for comparison purposes, excluding extension information: ##########
Parse and split multi-term fields into separate fields. Name -> First, Middle, Last Name fields; Address -> Street number, street name, extra address info.
The parsing process itself can identify and reconcile formatting differences.
Allows easier matching on terms separately.
Etc., etc.
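The scrubbing steps above can be sketched in a few lines. This is a minimal illustration; the abbreviation map is an assumption and would need to be extended for real vendor data:

```python
# Minimal sketch of the scrubbing steps: punctuation removal, abbreviation
# normalization, whitespace collapsing, and phone-number stripping.
import re

# Hypothetical abbreviation map -- extend with the variations in your data.
ABBREVIATIONS = {"incorporated": "inc", "street": "st", "avenue": "ave"}

def scrub_name(raw):
    # Lowercase, drop punctuation, collapse whitespace, normalize abbreviations.
    cleaned = re.sub(r"[^\w\s]", "", raw.lower())
    words = [ABBREVIATIONS.get(w, w) for w in cleaned.split()]
    return " ".join(words)

def scrub_phone(raw):
    # Keep digits only, excluding any extension after 'x'.
    main = re.split(r"[xX]", raw)[0]
    return re.sub(r"\D", "", main)

print(scrub_name("Acme  Widgets, Incorporated"))  # -> "acme widgets inc"
print(scrub_phone("(555) 123-4567 x89"))          # -> "5551234567"
```

The same logic can be expressed as Access/Excel formulas or SQL `REPLACE` chains; the point is that both data sets go through the identical transformation before any matching is attempted.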
Once the matching fields are sufficiently scrubbed, now match on the different fields.
Define matching priorities, that is which field or fields are likely to produce reliable matches with the least amount of uncertainty.
For records containing Tax ID numbers, that seems like the most logical place to start since an exact match on that number should be valid OR can indicate mistakes in your data.
For each type of match, update the matched-key fields mentioned above.
For each successive matching query, exclude records that already have a match in the matched-key table/fields.
Refine and repeat all these steps until you are satisfied that all matches have been found.
Add all non-matched records to your final merged record set.
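The successive matching passes above can be sketched as follows. This is an illustration using sqlite3 and made-up table names, not the asker's actual data: match on the most reliable field first (tax ID), record the matched key pairs, then exclude already-matched records from later, fuzzier passes:

```python
# Iterative matching: pass 1 on tax ID, pass 2 on scrubbed name,
# each pass excluding records already recorded in the matched table.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE vendors_a (id INTEGER PRIMARY KEY, name TEXT, tax_id TEXT);
CREATE TABLE vendors_b (id INTEGER PRIMARY KEY, name TEXT, tax_id TEXT);
CREATE TABLE matched   (a_id INTEGER, b_id INTEGER, method TEXT);
INSERT INTO vendors_a VALUES (1, 'acme widgets inc', '12-345'), (2, 'globex', NULL);
INSERT INTO vendors_b VALUES (7, 'acme widgets inc', '12-345'), (8, 'globex', NULL);
""")

# Pass 1: exact tax-ID match (highest priority).
cur.execute("""INSERT INTO matched
               SELECT a.id, b.id, 'tax_id' FROM vendors_a a
               JOIN vendors_b b ON a.tax_id = b.tax_id
               WHERE a.tax_id IS NOT NULL""")

# Pass 2: scrubbed-name match, excluding records matched in earlier passes.
cur.execute("""INSERT INTO matched
               SELECT a.id, b.id, 'name' FROM vendors_a a
               JOIN vendors_b b ON a.name = b.name
               WHERE a.id NOT IN (SELECT a_id FROM matched)
                 AND b.id NOT IN (SELECT b_id FROM matched)""")

print(cur.execute("SELECT * FROM matched ORDER BY a_id").fetchall())
# -> [(1, 7, 'tax_id'), (2, 8, 'name')]
```

Recording the match method alongside each pair also makes the later human-review step easier, since low-confidence passes can be reviewed first.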
You never said how many records you have. If possible, it may be worth your organization's time to manually verify the automated matches by listing them side by side and manually tweaking them when needed.
But even if you successfully pair non-exact matches, someone still needs to decide which record to keep for the merged system. I imagine you might have matches on company name and tax ID--essentially verifying the match--but still have different addresses and/or contact names. There is no technical answer that will help you know which data to keep or discard. Once again, human review should be done to finalize the merged records. If you set this up correctly, a couple of human eyeballs could probably go through thousands of records in just a day.

SQL query to check how many interests are matched

So I am building a swingers site. The users can search other users by their interests. This is only one of a number of parameters used to search for a user. The thing is, there are like 100 different interests. When searching for another user, they can select all the interests the user must share. While I can think of ways to do this, I know it is important that the search be as efficient as possible.
The backend uses JDBC to connect to a MySQL database; Java is the backend programming language.
I have debated using multiple columns for interests; the thing is, the SQL query need not check them all if those columns are not addressed in the JSON object sent to the server telling it the search criteria. Also, I worry I may have to make painful modifications to the table at a later point if I add new columns.
Another thing I thought about was having some kind of long byte array, or a number (used like a byte array), stored in a single column. I could & this with another number corresponding to the interests the user is searching for, but I read somewhere that this is actually quite inefficient, despite it making good sense to my mind :/
And all of this has to be part of one big SQL query with multiple tables joined into it.
One of the issues with using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I thought about generating an XML string on the client and then processing that in the SQL query.
Any suggestions?
I think the correct term is a bitmask. I could maybe have one table that maps the user's id to the bitmask for querying users' interests, and another with multiple entries, one per interest per user id, for looking up which user has which interests efficiently, if I later require this?
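For illustration, here is why the bitmask idea "makes good sense": each interest gets one bit, and a user matches a search when every searched bit is set in the user's mask. The interest names below are made up. The catch is that a condition like `interests & mask = mask` in SQL generally cannot use an ordinary index, so the database must scan every row:

```python
# Bitmask matching: a user matches when all searched interest bits are set.
INTEREST_BITS = {"hiking": 1 << 0, "cooking": 1 << 1, "travel": 1 << 2}

def mask_of(interests):
    m = 0
    for name in interests:
        m |= INTEREST_BITS[name]
    return m

user_mask = mask_of(["hiking", "cooking", "travel"])  # 0b111
search_mask = mask_of(["hiking", "travel"])           # 0b101

# User matches when all searched interests are present:
print(user_mask & search_mask == search_mask)             # -> True
print(mask_of(["cooking"]) & search_mask == search_mask)  # -> False
```

That full-scan behavior is the inefficiency you read about, and it is why the junction-table design in the answer below scales better.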
Basically, it would be great to have a separate table with all the interests, with 2 columns: id and interest.
Then have a table that links users to interests, user_interests, with the following columns: id, user_id, interest_id. Some knowledge about many-to-many relations would help a lot here.
Hope it helps!
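The many-to-many design above can be sketched as follows, including the standard query for "users who share ALL of the selected interests": filter the junction table to the searched interest ids, group by user, and keep users whose match count equals the number of searched interests. sqlite3 stands in for MySQL here, and the sample data is made up:

```python
# Junction-table design: interests, user_interests, and an all-of query
# using GROUP BY ... HAVING COUNT(DISTINCT interest_id).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE interests (id INTEGER PRIMARY KEY, interest TEXT);
CREATE TABLE user_interests (id INTEGER PRIMARY KEY, user_id INTEGER, interest_id INTEGER);
INSERT INTO interests VALUES (1, 'hiking'), (2, 'cooking'), (3, 'travel');
INSERT INTO user_interests (user_id, interest_id)
VALUES (10, 1), (10, 2), (10, 3), (20, 2);
""")

wanted = [1, 3]  # interest ids the searcher selected
placeholders = ",".join("?" * len(wanted))
rows = cur.execute(f"""
    SELECT user_id FROM user_interests
    WHERE interest_id IN ({placeholders})
    GROUP BY user_id
    HAVING COUNT(DISTINCT interest_id) = ?
""", wanted + [len(wanted)]).fetchall()

print([r[0] for r in rows])  # -> [10]
```

In MySQL you would add a composite index on `(interest_id, user_id)` so the `IN` filter and the grouping are both index-supported, and the query only touches the columns the search actually selected.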

Prevent duplicate rows with different queries

Let's say I have a products grid. In this grid there's a product called "Scarf XY".
Now a user wants to search for all items with a similar name, so she types the word "Scarf X" into a live-search box, and an async request is performed to retrieve from the DB all rows that match that word.
I would like to prevent the new query from returning the row for "Scarf XY" again.
Is there a way to, let's say, "keep track" of already returned rows even from different queries?
(Sorry for my English)
Forgot to mention: every item returned from the DB is preserved in a local array, which is why every new query may cause duplicate entries.
There is a way to do this with MySQL subqueries, but if this is meant for a site, it will be inefficient. For example, a user may type in search terms and then delete them. A system such as you described would result in eight SQL queries for a search for "Scarf XY" (one per keystroke), which would put unnecessary load on your database server.
A more modern and resource efficient way of doing this would be to supply the browser a JSON array and use something like Typeahead.js from Twitter to display the information in a search bar client-side.
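If you do want to "keep track" server-side rather than purely client-side, one simple approach is to have the client send the ids it already holds in its local array and exclude them with `NOT IN`. A sketch, with sqlite3 standing in for MySQL and illustrative names throughout:

```python
# Exclude rows the client already holds by passing their ids back
# with each live-search request.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO products VALUES (1, 'Scarf XY'), (2, 'Scarf XZ'), (3, 'Hat');
""")

def live_search(term, seen_ids):
    # "-1" is a harmless dummy so NOT IN () is never empty (or NULL).
    placeholders = ",".join("?" * len(seen_ids)) or "-1"
    sql = f"""SELECT id, name FROM products
              WHERE name LIKE ? AND id NOT IN ({placeholders})"""
    return cur.execute(sql, [term + "%"] + list(seen_ids)).fetchall()

first = live_search("Scarf", [])         # -> [(1, 'Scarf XY'), (2, 'Scarf XZ')]
second = live_search("Scarf X", [1, 2])  # rows 1 and 2 already held locally
print(second)  # -> []
```

That said, the answer's point stands: deduplicating in the client against the local array (as Typeahead.js does) avoids the extra query traffic entirely.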

Should I normalize this MySQL table?

I'm designing a web site for a friend and I'm not sure what the best way to go is with regard to one of my database tables.
To give you an idea, this is roughly what I have:
Table: member_profile
`UserID`
`PlanID`
`Company`
`FirstName`
`LastName`
`DOB`
`Phone`
`AddressID`
`website`
`AllowNonUserComments`
`AllowNonUserBlogComments`
`RequireCaptchaForNonUserComments`
`DisplayMyLocation`
the last four
AllowNonUserComments
AllowNonUserBlogComments
RequireCaptchaForNonUserComments
DisplayMyLocation
(and possibly more such boolean fields to be added in the future) will control certain website functionality based on user preference.
Basically, I'm not sure if I should move those fields to a
new table: member_profile_settings
`UserID`
`AllowNonUserComments`
`AllowNonUserBlogComments`
`RequireCaptchaForNonUserComments`
`DisplayMyLocation`
or if I should just leave them as part of the member_profile table, since every member is going to have their own settings.
The target is roughly 100,000 members in the long run and 10k to 20k in the short run. My main concern is database performance.
And while I'm at it, question #2: would it make sense to move the member's contact information, such as street address, city, state, zip, phone, etc., into the member_profile table instead of having an address table and the AddressID like I currently have?
Thank you
I would say "no" and "yes, but" as the answers to 1) and 2). For #1, your queries are going to be a lot easier to manage if you create columns for each preference. The best systems I've worked with were done that way. Moving the preferences into a separate table with "user, preference, value" triples leads to complex queries that join multiple tables just to check a setting.
For #2: there's no reason to put the address in another table, because the single "AddressID" column means there's just one address per member, anyway, and again, it's just going to complicate the queries. If you turn it around backwards and have an address table that embeds userids then that might make sense; it makes even more sense to do phone numbers that way, since people often have multiple phone numbers.
If each member in the database has exactly ONE value for each of the attributes you have listed, then your database is already normalized and thus in a quite convenient form. So, to answer #1, moving these fields to a different table would improve nothing and just make querying more difficult.
As for #2, if you wanted to contemplate the possibility of a member having multiple addresses or phone numbers, you should definitely put those in different tables, allowing many-to-one relationships. This might also make sense if you expect that a number of users will share the same address; this way, you will not be duplicating information by having to store all the same address information for multiple users, you would just reference an addresses table that would have the relevant information one time per address.
However, if you need neither multiple addresses per member nor multiple members per address, then putting the addresses information in another table is just unnecessary complexity. Which solution is more convenient depends on the needs of your specific application.
Since each member has exactly one value for each attribute, the table is already normalized. However, considering query efficiency, splitting the table is sometimes still worth considering.
Apart from the ID field, the other columns fall into two groups: a profile group and a settings group. If your website usually uses these two groups of data separately, you could consider separate tables for the different usages.
For example, if the profile fields only show up on the profile page while the settings fields are used across the whole site, it's not necessary to look up the profile fields all the time.
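The 1:1 split discussed above can be sketched as two tables sharing the same UserID. This is an optional design (the first answer recommends against it), shown here with sqlite3 standing in for MySQL and only a few representative columns:

```python
# Vertical split: a wide member_profile table and a narrow
# member_profile_settings table keyed by the same UserID.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE member_profile (
    UserID INTEGER PRIMARY KEY,
    FirstName TEXT, LastName TEXT, Phone TEXT
);
CREATE TABLE member_profile_settings (
    UserID INTEGER PRIMARY KEY REFERENCES member_profile(UserID),
    AllowNonUserComments INTEGER DEFAULT 1,
    DisplayMyLocation INTEGER DEFAULT 0
);
INSERT INTO member_profile VALUES (1, 'Jane', 'Doe', '555-0100');
INSERT INTO member_profile_settings (UserID) VALUES (1);
""")

# Site-wide checks touch only the narrow settings table...
print(cur.execute("""SELECT AllowNonUserComments
                     FROM member_profile_settings
                     WHERE UserID = 1""").fetchone())  # -> (1,)

# ...while the profile page joins both when it needs everything.
print(cur.execute("""SELECT p.FirstName, s.DisplayMyLocation
                     FROM member_profile p
                     JOIN member_profile_settings s USING (UserID)
                     WHERE p.UserID = 1""").fetchone())  # -> ('Jane', 0)
```

The trade-off is exactly as the earlier answers say: the split saves I/O only if the groups really are queried separately; otherwise it just adds a join to every profile query.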

Shall I put contact information in a separate table?

I'm planning a database that has a couple of tables containing plenty of address information: city, zip code, email address, phone #, fax #, and so on (about 11 columns' worth). One is an organizations table containing (up to) 2 addresses (the legal contact and the contact that should actually be used), and every user also has the same information tied to him.
We are going to have to run some geolocation stuff on those addresses too (like finding every address that's within X kilometers of another address).
I have a bunch of options, each with its own problem:
I could put all the information inside every table, but that would make for tables with a very large number of columns which I'd have problems indexing, and if I change my address format it'll take a while to fix.
I could put all the information inside an array and serialize it, then store the serialized information in one field; same problem as the previous method with a few less columns and much less accessibility through MySQL queries.
I could create a separate table with address information and link it to the other tables either by
putting an address_id column in the users and organizations table
putting a related_id and related_table columns in the addresses table
That should keep stuff tidier, but it might create some unforeseen problems with excessive joining or whatever.
Personally I think that solution 3.2 is the best, but I'm not too confident about it, so I'm asking for opinions.
Option 2 is definitely out, as it would put the filtering logic into your code instead of letting the DBMS handle it.
Option 1 or 3 will depend on your needs.
If you need fast access to all the data, and you usually access both addresses along with the organization information, then you might consider option 1. But this will make querying difficult (i.e. slow) if the table gets too big in MySQL.
Option 3 is good, provided you index the tables correctly.
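Option 3.1 can be sketched as follows: a shared addresses table referenced via address_id columns. This is an illustration with sqlite3 standing in for MySQL and made-up column names; it keeps the address format in one place and indexes cleanly, which also helps the geolocation queries since latitude/longitude live in a single table:

```python
# Option 3.1: one addresses table, referenced by id from both
# organizations (two address columns) and users.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY,
    street TEXT, city TEXT, zip TEXT,
    latitude REAL, longitude REAL
);
CREATE TABLE organizations (
    id INTEGER PRIMARY KEY, name TEXT,
    legal_address_id INTEGER REFERENCES addresses(id),
    contact_address_id INTEGER REFERENCES addresses(id)
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT,
    address_id INTEGER REFERENCES addresses(id)
);
-- An index on the join column keeps the lookups fast.
CREATE INDEX idx_users_address ON users(address_id);

INSERT INTO addresses VALUES (1, 'Main St 1', 'Rome', '00100', 41.9, 12.5);
INSERT INTO organizations VALUES (1, 'Acme', 1, 1);
INSERT INTO users VALUES (1, 'Bob', 1);
""")

print(cur.execute("""SELECT u.name, a.city FROM users u
                     JOIN addresses a ON a.id = u.address_id""").fetchone())
# -> ('Bob', 'Rome')
```

The 3.2 variant (related_id plus related_table columns on the addresses table) works too, but it prevents the database from enforcing foreign keys, which is one reason many people prefer 3.1's explicit address_id columns.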