The most efficient SQL schema for searching first and last names (MySQL)

I'm creating a list of members on my site, and I want to let them look for each other by first name, last name, or both. The catch is that a user can have several first names (given names plus nicknames) and more than one last name (for example a maiden name and a married name).
Once users fill out their names, one person could have, for example, 3 first names and 2 last names: first names Eleonora, Ela, El and last names Smith, Brown.
Then if someone looks for Ela Brown, Eleonora Brown, Eleonora Smith or any other combination, they should find this person.
My question is: how should I set this all up in SQL (MySQL) so that the schema and the search are efficient and fast? I didn't want to reinvent the wheel, so I'm asking the pros here.
Thanks guys
P.S. I guess the standard solution would be to have a user table, an fname table, an lname table, a userfname table with userid and fnameid, and a userlname table with userid and lnameid, but I'm not sure whether this is the best way to do it or whether the search would be fast...

Do you need to differentiate between first names and last names?
I would suggest a Users table with a UserID column,
and a UsersNames table with UserID and Name columns, in a one-to-many relationship.
If you need to, you could also add an IsLastName bit to the UsersNames table (or just a LastName column, but the bit is better imho)...
This way you search a single table and can easily locate user IDs, and you don't limit the number of names each user can have.
EDIT:
You could easily take your input string and split it out too. So if somebody put in "John Smith" you could search for both or either name simply by splitting the string and using it in the WHERE clause using either OR or AND depending on your intended functionality.
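A minimal sketch of the one-table lookup described above, written in Python with the standard-library sqlite3 module as a stand-in for MySQL so it can be run as-is. The table and column names follow the answer; the index name, the sample rows and the find_users helper are assumptions made for illustration. With match_all=True every word typed must match one of the user's names (the AND behaviour); with match_all=False any single word is enough (the OR behaviour).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE UsersNames (
        UserID     INTEGER NOT NULL REFERENCES Users(UserID),
        Name       TEXT    NOT NULL,
        IsLastName INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical index name; the point is that Name is indexed.
    CREATE INDEX idx_usersnames_name ON UsersNames (Name);
""")
conn.executemany("INSERT INTO Users (UserID) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO UsersNames (UserID, Name, IsLastName) VALUES (?, ?, ?)",
    [(1, "Eleonora", 0), (1, "Ela", 0), (1, "El", 0),
     (1, "Smith", 1), (1, "Brown", 1),
     (2, "John", 0), (2, "Brown", 1)],
)


def find_users(query, match_all=True):
    """Split the search string on whitespace and look each word up in UsersNames."""
    terms = query.split()
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT UserID FROM UsersNames "
        f"WHERE Name IN ({placeholders}) "
        "GROUP BY UserID "
    )
    params = list(terms)
    if match_all:
        # AND semantics: the user must own every distinct word that was typed.
        sql += "HAVING COUNT(DISTINCT Name) = ? "
        params.append(len(set(terms)))
    sql += "ORDER BY UserID"
    return [row[0] for row in conn.execute(sql, params)]
```

For example, find_users("Ela Brown") finds only user 1, while find_users("Brown") finds both users. The sketch ignores IsLastName; adding it to the WHERE clause would let the search tell first names from last names.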

The last time I did something like this, I processed each name into a single column in a NAMES table: all names, first, last and middle. A second table holds a link to the person record in the PERSONS table.
So each NAMES row gets linked to one or more PERSONS records. If I search for "Scott", I find the name Scott in the NAMES table, follow the links in the NAMES_TO_PERSONS(/PEOPLE?) table, and then return all the person records for that name, e.g. Scott Bruns, John Scott, David Scott Smith.
It worked very well with only a small amount of preprocessing.

Text searching is what you need - use Lucene. I've used Lucene on several projects and it's truly amazing - not hard to use and ridiculously fast.

If in your data model users may have multiple but a bounded number of name types, then the simplest solution is to create an index on each column that stores a name type. You would add a field for first name, last name, nickname, maiden name, etc. This model would perform better than a one-to-many names association.
You may also want to evaluate whether the rest of the application has general search requirements, or whether you would like the search to be more flexible. In that case you can look into a backend indexing process, such as Lucene, or full-text search. Initially, I would try to avoid this if possible, because it certainly complicates the project.

Related

SQL: store multiple values in one field or create tables?

I'm having design difficulties implementing something in a database. I have two solutions to a problem, and I was wondering which one is best.
Problem:
Let's picture a table speciality with 2 fields : speciality_id and speciality_name.
So for example :
1 - Mage
2 - Warrior
3 - Priest
Now, I have a table user with fields such as user_id, name, firstname etc ...
In this table, there is a field called speciality. The speciality stores an integer, corresponding to the speciality_id of the table speciality.
That would be acceptable for users that have only one speciality. I want to improve the model to be able to have multiple specialities for a user.
Here are my two solutions :
Solution 1: Create a table 'solution1' which links the user_id with the speciality_id, and remove the speciality field from the user table. So for a user who has 2 specialities, 2 rows will be created in the table 'solution1'.
Solution 2: Change the type of the speciality field in the user table so that it can hold several specialities, separated by a delimiter.
For example 2;3
The problem I have with the second solution is creating foreign keys between my user table and my speciality table to link them. I may also have more difficulty with the PHP in the future when I want to get the specialities for a user (I guess I will need a parser).
Which solution do you think is best?
Thanks.
Absolutely go with your first solution.
Create a third "Many-to-Many" table that allows you to relate a user to multiple specialties. This is the only way to go in your case.
When designing tables, you always want to have each column contain one and only one data element. Think about what querying your second solution would look like. What would you do when you wanted to see all users who had a given specialty?
You might try something like this:
select * from user where speciality like '%2%'
Well, what happens when you have specialties that go to 12? Now "2" matches multiple entities. You could devolve further and try to be tricky, but...you really should just make your data design as normal as possible to avoid all the mess, headache, and errors. Go with Solution 1.
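To make the false-match problem concrete, here is a small sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the speciality with id 12 ("Paladin"), the user_speciality table name and the sample users are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speciality (
        speciality_id   INTEGER PRIMARY KEY,
        speciality_name TEXT NOT NULL
    );
    -- Solution 2: delimited ids kept in a text column on the user row.
    CREATE TABLE user (
        user_id    INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        speciality TEXT
    );
    -- Solution 1: a many-to-many link table (hypothetical name).
    CREATE TABLE user_speciality (
        user_id       INTEGER NOT NULL REFERENCES user(user_id),
        speciality_id INTEGER NOT NULL REFERENCES speciality(speciality_id),
        PRIMARY KEY (user_id, speciality_id)
    );
""")
conn.executemany("INSERT INTO speciality VALUES (?, ?)",
                 [(2, "Warrior"), (3, "Priest"), (12, "Paladin")])
conn.executemany("INSERT INTO user VALUES (?, ?, ?)",
                 [(1, "alice", "2;3"), (2, "bob", "12")])
conn.executemany("INSERT INTO user_speciality VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 12)])

# Solution 2: the substring search also matches "12".
like_matches = [row[0] for row in conn.execute(
    "SELECT name FROM user WHERE speciality LIKE '%2%' ORDER BY user_id")]

# Solution 1: an exact comparison on the link table.
join_matches = [row[0] for row in conn.execute(
    "SELECT u.name FROM user u "
    "JOIN user_speciality us ON us.user_id = u.user_id "
    "WHERE us.speciality_id = 2 ORDER BY u.user_id")]
```

Here like_matches contains both alice and bob, although only alice is a Warrior, while join_matches contains alice alone.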
I think the best way is to follow solution 1, because solution 2 will end up with a lot of complexity later on.

Database Design: User Profiles like in Meetup.com

In Meetup.com, when you join a meetup group, you are usually required to complete a profile for that particular group. For example, if you join a movie meetup group, you may need to list the genres of movies you enjoy, etc.
I'm building a similar application, wherein users can join various groups and complete different profile details for each group. Assume the 2 possibilities:
Users can create their own groups and define what details to ask users that join that group (so, something a bit dynamic -- perhaps suggesting that at least an EAV design is required)
The developer decides now which groups to create and specify what details to ask users who join that group (meaning that the profile details will be predefined and "hard coded" into the system)
What's the best way to model such data?
More elaborate example:
The "Movie Goers" group asks its members to specify the following:
Name
Birthdate (to be used to compute member's age)
Gender (must select from "male" or "female")
Favorite Genres (must select 1 or more from a list of specified genres)
The "Extreme Sports" group asks its members to specify the following:
Name
Description of Activities Enjoyed (narrative form)
Postal Code
The bottom line is that each group may require different details from the members joining it. Ideally, I would like anyone to be able to create a group (à la Meetup.com). However, I also need to be able to query for members fairly well (e.g. find all women movie goers between the ages of 25 and 30).
For something like this....you'd want maximum normalization, so you wouldn't have duplicate data anywhere. Because your user-defined tables could possibly contain the same type of record, I think that you might have to go above 3NF for this.
My suggestion would be this - explode your tables so that you have something close to 6NF with EAV, so that each question that users must answer will have its own table. Then, your user-created tables will all reference one of your question tables. This avoids the duplication of data issue. (For instance, you don't want an entry in the "MovieGoers" group with the name "John Brown" and one in the "Extreme Sports" group with the name "Johnny B." for the same user; you also don't want his "what is your favorite color" answer to be "Blue" in one group and "Red" in another. Any data that can span across groups, like common questions, would be normalized in this form.)
The main drawback to this is that you'd end up with a lot of tables, and you'd probably want to create views for your statistical queries. However, in terms of pure data integrity, this would work well.
Note that you could probably get away with only factoring out the common fields, if you really wanted to. Examples of common fields would include Name, Location, Gender, and others; you could also do the same for common questions, like "what is your favorite color" or "do you have pets" or something to that extent. Group-specific questions that don't span across groups could be stored in a separate table for that group, un-exploded. I wouldn't advise this because it wouldn't be as flexible as the pure 6NF option and you run the risk of duplication (how do you predetermine which questions won't be common questions?) but if you really wanted to, you could do this.
There's a good question about 6NF here: Would like to Understand 6NF with an Example
I hope that made some sense and I hope it helps. If you have any questions, leave a comment.
Really, this is exactly the kind of problem for which SQL is not the right solution. Forget normalization. This is a job for NoSQL document stores. Every user is a document with some essential fields like id, name, pwd etc., and every group adds the possibility of extra fields. Fields unique to a group can have names prefixed with the group id; shared fields (that capture some more general concept) can use the plain field name.
Besides users (and groups), you will then have field descriptions with name, type, possible values, ..., which are also a very good fit for a document store.
If you use a key-value document store from the beginning, you gain this free-form way of structuring your data, plus the ability to query it (though not with SQL, but by whatever means the particular NoSQL database provides).
First I'd like to note that the following structure is just a basis for your DB, and you will need to expand or reduce it.
There are the following entities in the DB:
user (just a user)
group (any group)
template (a list of requirements united into a template to simplify assignment)
requirement (a single requirement, for example: date of birth, gender, favorite sport)
"Modeling":
**User**
user_id
user_name
**Group**
name
group_id
user_group
user_id (FK)
group_id (FK)
**requirement**:
requirement_id
requirement_name
requirement_type (FK) (the type: combo, free string, date; should refer to a dictionary table)
**template**
template_id
template_name
**template_requirement**
r_id (FK)
t_id (FK)
The next step is to model an appropriate schema for storing restrictions, i.e. the validation rules for any requirement in any template. We have to keep them separate because the same requirement can have different restrictions in different groups (for example: "age"). You can use the following table:
**restrictions**
group_id
template_id
requirement_id (needed alongside template_id because the same requirement can exist in different templates, and any group can consist of many templates)
restriction_type (FK) (points to another dictionary: value, length, regexp, at_least_one_value_chosen and so on)
So, as I said, this is just the basis. Feel free to simplify this schema (drop tables, or drop multiple templates per group). Or you can make it more general by adding the ability to create and publish templates, requirements and so on.
Hope you find this idea useful
You could save such data as JSON or XML (Structure, Data)
User Table
Userid
Username
Password
Groups -> JSON Array of all Groups
GroupStructure Table
Groupid
Groupname
Groupstructure -> JSON Structure (with specified Fields)
GroupData Table
Userid
Groupid
Groupdata -> JSON Data
I think this covers most of your constraints:
users
user_id, user_name, password, birth_date, gender
1, Robert Jones, *****, 2011-11-11, M
group
group_id, group_name
1, Movie Goers
2, Extreme Sports
group_membership
user_id, group_id
1, 1
1, 2
group_data
group_data_id, group_id, group_data_name
1, 1, Favorite Genres
2, 2, Favorite Activities
group_data_value
id, group_data_id, group_data_value
1,1,Comedy
2,1,Sci-Fi
3,1,Documentaries
4,2,Extreme Cage Fighting
5,2,Naked Extreme Bike Riding
user_group_data
user_id, group_id, group_data_id, group_data_value_id
1,1,1,1
1,1,1,2
1,2,2,4
1,2,2,5
I've had similar issues to this. I'm not sure if this would be the best recommendation for your specific situation but consider this.
Provide a means of storing the data as XML, JSON, or some other delimited format, kept in a field that has no specific schema.
Provide a way to store the definition of that data
Provide a lookup/index table for the data.
This is a combination of techniques indicated already.
Essentially, you would create some interface for your clients to create a "form" for what they want saved. This form would indicate what pieces of information they want from the user. It would also indicate which pieces of information you want to search on.
Save this information to the definition table.
The definition table is then used to describe the user interface for entering data.
Once user data is entered, save the data (as XML or whatever) to one table with a unique id. At the same time, another table is populated as an index with:
the id under which the XML data was saved
the name of the field the data is stored in
the value of the field
the id of the data definition
Now when a search runs, it is straightforward to look up the information in the index table by name, value and definition id, and get back the id of the XML/JSON (or whatever) data stored in the data table.
That data should be transformable once it is retrieved.
I was seriously sketchy on the details here, I hope this is enough of an answer to get you started. If you would like any explanation or additional details, let me know and I'll be happy to help.
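A rough sketch of the blob-plus-index idea from this answer, in Python with the standard-library sqlite3 and json modules standing in for MySQL. All table, column and function names here (form_data, form_index, save, search) are hypothetical, and the sketch only indexes simple scalar fields.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The free-form data, one JSON document per submitted form.
    CREATE TABLE form_data (
        data_id       INTEGER PRIMARY KEY,
        definition_id INTEGER NOT NULL,
        payload       TEXT    NOT NULL
    );
    -- One row per searchable field of each document.
    CREATE TABLE form_index (
        data_id       INTEGER NOT NULL REFERENCES form_data(data_id),
        definition_id INTEGER NOT NULL,
        field_name    TEXT    NOT NULL,
        field_value   TEXT    NOT NULL
    );
    CREATE INDEX idx_form_index_lookup
        ON form_index (definition_id, field_name, field_value);
""")


def save(definition_id, data, searchable_fields):
    """Store the document as JSON and index only the fields marked searchable."""
    cur = conn.execute(
        "INSERT INTO form_data (definition_id, payload) VALUES (?, ?)",
        (definition_id, json.dumps(data)),
    )
    data_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO form_index VALUES (?, ?, ?, ?)",
        [(data_id, definition_id, field, str(data[field]))
         for field in searchable_fields if field in data],
    )
    return data_id


def search(definition_id, field_name, field_value):
    """Find documents through the index table, then decode the stored JSON."""
    rows = conn.execute(
        "SELECT d.payload FROM form_index i "
        "JOIN form_data d ON d.data_id = i.data_id "
        "WHERE i.definition_id = ? AND i.field_name = ? AND i.field_value = ? "
        "ORDER BY d.data_id",
        (definition_id, field_name, str(field_value)),
    )
    return [json.loads(row[0]) for row in rows]
```

A field that was not listed as searchable when the document was saved simply cannot be found through search, which is the trade-off of this design.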
If you're not tied to MySQL, I suggest PostgreSQL, which provides built-in array data types.
You can define an array-of-varchar field in your groups table to store the group-specific fields. To store the values, you can do the same in the membership table.
Compared to XML types based on string parsing, this array approach will be really fast.
If you don't like the array approach, you can check out the XML data type and the optional hstore data type, which is a key-value store.

Any way to compare/match sentences with only a different word order?

I have 2 MySQL tables, each with address data of companies in it. One table is more recent, but has no telephone and no website data. Now I want to merge these tables into one recent and complete table.
But for some companies the order of the words is different, like this:
'Bakery Johnson' in table 1 and 'Johnson Bakery' in table 2.
Now I need to find a way to compare these values, as they're obviously the same company.
I think I will somehow have to split those names first, and then order the different parts alphabetically.
Any chance anybody has done something like this before, and willing to share some code or function?
UPDATE:
I found a function that sorts words inside a string. I can use this to detect name swaps as described above. It's quite SLOW though...
See : MySQL: how to sort the words in a string using a stored function?
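If sorting the words inside MySQL is too slow, the same idea can be sketched outside the database. The following Python is only an illustration of the split-and-sort approach described above; the function names name_key and match_by_word_set are made up, and it assumes both name lists fit in memory.

```python
def name_key(name):
    """Lowercase the name, split it on whitespace, sort the words and rejoin them."""
    return " ".join(sorted(name.lower().split()))


def match_by_word_set(names_table1, names_table2):
    """Pair up names from the two tables that contain the same words in any order."""
    index = {}
    for other in names_table2:
        index.setdefault(name_key(other), []).append(other)
    return [
        (name, other)
        for name in names_table1
        for other in index.get(name_key(name), [])
    ]
```

Both 'Bakery Johnson' and 'Johnson Bakery' produce the key 'bakery johnson', so they pair up. The key could also be stored in an extra indexed column in each table so that the comparison becomes a plain equality join.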
If your table is MyISAM you can run this query:
SELECT *
FROM mytable
WHERE MATCH(name) AGAINST ('+bakery +johnson' IN BOOLEAN MODE)
This will find all records containing both the words bakery and johnson (and possibly other words too).
Creating a FULLTEXT index on the table:
CREATE FULLTEXT INDEX
fx_mytable_name
ON mytable (name)
will speed up this query.
Stepping back a bit from your solution, you could take an approach similar to how modern phones resolve duplicate name conflicts.
You present the user with an option whenever they find something suspicious:
Is this a duplicate? Use our [ Merge ] option
You are merging Bakery Johnson, please select the source/original item:
[ Johnson Bakery v ] (my amazing dropdown!)
Everything not already in Johnson Bakery gets ported to Bakery Johnson (orders, for example). You may also show an intermediate screen displaying what will be merged, or let the user pick: for example, they may want the address info from Johnson Bakery and the orders from both.
It is not self-correcting as you asked, but collaboration from the users may be more accurate than AI here. I also love low-tech solutions like this, so let us know what you ended up doing.

DB Design: multiple values for single field in a record

I am working on an auto-suggest feature and as part of it I want to weight the results based on a particular user's preference. For example: If most of my users frequently type in fish, then the algorithm will return fish as the most popular result once the user types f. However, if a particular user mostly types in food, then I want to apply a weight such that it takes that particular user's preference into account.
I initially thought of doing this by having a large auto-suggest index, with a field userids and whenever a user types in a letter, the algorithm would check if that particular user's userid was present in the userids field and if present would apply a corresponding weight to that particular result.
A few records would look like:
word |count |userids
------------------------------------------------------------------------------
food |2 |aa,b,ccd
fish |12 |a,b,c,d,e,f,gg,he,jkl,sd
However, I do not think this is an approach that would scale all that well with even a few hundred active users. What would be a better way to design this DB?
Thanks in advance,
asleepysamurai
P.S. I'm a newbie when it comes to DB design, so please explain your solution in layman terms.
This is not a good idea. The table is not normalized and you will end up with complicated queries when you need to join on this field.
A better design is to have a wordid field on this table as a primary key (identifying the word) and a many-to-many table to connect words with users (words_to_users, with wordid and userid fields).
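A possible shape for that design, sketched in Python with the standard-library sqlite3 module as a stand-in for whatever database is used. The wordid, userid and words_to_users names come from the answer; the words table name, the total_count and user_count columns and the suggest helper are assumptions added to show how a per-user weight could be applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (
        wordid      INTEGER PRIMARY KEY,
        word        TEXT    NOT NULL UNIQUE,
        total_count INTEGER NOT NULL DEFAULT 0   -- how often anyone typed it
    );
    CREATE TABLE words_to_users (
        wordid     INTEGER NOT NULL REFERENCES words(wordid),
        userid     TEXT    NOT NULL,
        user_count INTEGER NOT NULL DEFAULT 1,   -- how often this user typed it
        PRIMARY KEY (wordid, userid)
    );
""")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)",
                 [(1, "food", 2), (2, "fish", 12)])
conn.executemany("INSERT INTO words_to_users VALUES (?, ?, ?)",
                 [(1, "aa", 5), (2, "b", 3)])


def suggest(prefix, userid):
    """Rank words by this user's own count first, then by global popularity."""
    rows = conn.execute(
        "SELECT w.word FROM words w "
        "LEFT JOIN words_to_users wu "
        "       ON wu.wordid = w.wordid AND wu.userid = ? "
        "WHERE w.word LIKE ? "
        "ORDER BY COALESCE(wu.user_count, 0) DESC, w.total_count DESC",
        (userid, prefix + "%"),
    )
    return [row[0] for row in rows]
```

With this sample data, user "aa" typing f gets food before fish, while a user with no history gets fish first. Each (word, user) pair is one short row, so nothing has to be parsed out of a comma-separated userids field.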

Searching a database of names

I have a MySQL database containing the names of a large collection of people. Each person in the database could have one or all of the following name types: first, last, middle, maiden or nick. I want to provide a way for people to search this database to see whether a person exists in it.
Are there any off-the-shelf products suited to searching a database of people's names?
With a bit of ingenuity, MySQL will do just what you need... The following gives a few ideas how this could be accomplished.
Your table: (I call it tblPersons)
PersonID (primary key of sorts)
First
Last
Middle
Maiden
Nick
Other columns for extra info (address, whatever...)
By keeping the table as-is and building an index on each of the name-related columns, the following query provides a workable, if inefficient, way of finding all persons who have some name matching a particular value (Jack in the example):
SELECT * from tblPersons
WHERE First = 'Jack' OR Last = 'Jack' OR Middle = 'Jack'
OR Maiden = 'Jack' OR Nick = 'Jack'
Note that the application is not limited to searching for a single name value across all the name types. The user can also enter a specific set of criteria, for example First Name 'John', Last Name 'Lennon' and Profession 'Artist' (if such info is stored in the db), etc.
Also, note that even with this single table approach, one of the features of your application could be to let the user tell the search logic whether this is a "given" name (like Paul, Samantha or Fatima) or a "surname" (like Black, McQueen or Dupont). The main purpose of this is that there are names that can be either (for example Lewis or Hillary), and by being, optionally, a bit more specific in their query, the end users can get SQL to automatically weed-out many irrelevant records. We'll get back to this kind of feature, in the context of alternative, more efficient database layout.
Introducing a "Names" table.
Instead of (or in addition to) storing the various names in the tblPersons table, we can introduce an extra table and relate it to tblPersons.
tblNames
PersonID (used to relate with tblPersons)
NameType (single letter code, say F, L, M, U, N for First, Last...)
Name
We'd then have ONE record in tblPersons for each individual, but as many records in tblNames as they have names. When a person doesn't have a particular kind of name (few people have a nickname, for example), there is no need for a corresponding record in tblNames.
Then the query would become
SELECT DISTINCT P.* FROM tblPersons P
JOIN tblNames N ON N.PersonID = P.PersonID
WHERE N.Name = 'Jack'
Such a layout would be more efficient. Furthermore, this query lends itself to offering the "given" vs. "surname" capability, just by adding to the WHERE clause
AND N.NameType IN ('F', 'M', 'N') -- for the "given" names
(or)
AND N.NameType IN ('L', 'U', 'N') -- for the "surname" types. Note that
-- we put Nick name in there, but could just as easily remove it.
Another benefit of this approach is that it allows storing other kinds of names in there. For example, the SOUNDEX form of every name could be added under its own NameType(s), making it easy to find names even if the spelling is approximate.
Finally, another improvement could be to introduce a separate lookup table containing the most common abbreviations of given names (Pete for Peter, Jack for John, Bill for William etc.), and to use this for search purposes. The name columns used for display would remain as provided in the source data, but the extra lookup/normalization at the search level would increase recall.
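The tblNames layout with the type filter and the abbreviation lookup could be sketched as follows, in Python with the standard-library sqlite3 module as a stand-in for MySQL. The tblPersons and tblNames columns and the type codes follow the answer; the tblNameVariants table, the index name, the sample people and the find_persons helper are assumptions made for illustration.

```python
import sqlite3

GIVEN = ("F", "M", "N")     # first, middle, nick
SURNAME = ("L", "U", "N")   # last, maiden, nick

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPersons (PersonID INTEGER PRIMARY KEY);
    CREATE TABLE tblNames (
        PersonID INTEGER NOT NULL REFERENCES tblPersons(PersonID),
        NameType TEXT    NOT NULL,
        Name     TEXT    NOT NULL
    );
    CREATE INDEX idx_tblnames_name ON tblNames (Name, NameType);
    -- Hypothetical lookup table: common short form -> full given name.
    CREATE TABLE tblNameVariants (Variant TEXT NOT NULL, Canonical TEXT NOT NULL);
""")
conn.executemany("INSERT INTO tblPersons (PersonID) VALUES (?)",
                 [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO tblNames VALUES (?, ?, ?)", [
    (1, "F", "John"),  (1, "L", "Lennon"),
    (2, "F", "Jack"),  (2, "L", "Black"),
    (3, "F", "Lewis"), (3, "L", "Hamilton"),
    (4, "F", "Carl"),  (4, "L", "Lewis"),
])
conn.executemany("INSERT INTO tblNameVariants VALUES (?, ?)",
                 [("Jack", "John"), ("Pete", "Peter"), ("Bill", "William")])


def find_persons(name, name_types=None, expand_variants=False):
    """Return PersonIDs having the given name, optionally limited by NameType."""
    names = [name]
    if expand_variants:
        # Also search for the full form of a short name (Jack -> John).
        names += [row[0] for row in conn.execute(
            "SELECT Canonical FROM tblNameVariants WHERE Variant = ?", (name,))]
    sql = ("SELECT DISTINCT PersonID FROM tblNames "
           f"WHERE Name IN ({', '.join('?' for _ in names)})")
    params = list(names)
    if name_types:
        sql += f" AND NameType IN ({', '.join('?' for _ in name_types)})"
        params += list(name_types)
    sql += " ORDER BY PersonID"
    return [row[0] for row in conn.execute(sql, params)]
```

Searching for "Lewis" with name_types=SURNAME returns only the person whose last name is Lewis, and searching for "Jack" with expand_variants=True also returns the person named John.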
You shouldn't need to buy a product to search a database, databases are built to handle queries.
Have you tried running your own queries on it? For example: (I'm imagining what the schema looks like)
SELECT * FROM names WHERE first_name='Matt' AND last_name='Way';
If you've tried running some queries, what problems did you encounter that makes you want to try a different solution?
What does the schema look like?
How many rows are there?
Have you tried indexing the data in any way?
Please provide some more information to help answer your question.