Finding duplicate records by searching a few characters

Finding duplicate records by searching a few characters - mysql

I know there are many questions answered under this title. But I believe I have a unique situation where I need a specific way to find and export the duplicate data from my database.
I have a database with over 20.000 contacts. I need a query to find duplicate records in the contacts table. But since there are many same last names, or first names for different people, I want to lookup the first few characters of the first name and the last name to see if there is a duplicate record matching the query.
So, the query could be explained like this: Look at the first two characters from the firstName column, then look at the first three characters from the lastName column, and show it with any similar records.
I would highly appreciate any advice. Thank you.

Here's how I'd do it, assuming your contacts table is called contacts:
select *
from contacts c
join contacts c2 on c2.id!=c.id
and left(c2.firstName,2)=left(c.firstName,2)
and left(c2.lastName,3)=left(c.lastName,3)

Related

cursor in mysql for row having multiple values in it

I have table emails_grouping in that I have one column named 'to_ids' this column contains multiple employee Id's . Now I want to change that Id's with respective employee names. employee data is in employee table.
this is in mysql.
I tried multiple ways but I'm not able to replace id's with names because , that 'to_ids' column contains multiple 'Ids'.
description to_ids
'Inactive Employees with missing Last working day', '11041,11109,899,13375,1715,1026'
above is the column which I want to change Id's with employee names.

This problem should demonstrate to you why it's a bad idea to store "lists" of id's like you're doing. You should instead store one id per row.
You can join to your employee table like this:
SELECT e.name
FROM emails_grouping AS g
JOIN employee AS e
ON FIND_IN_SET(e.id, g.to_ids)
WHERE g.description = 'Inactive Employees with missing Last working day';
But be aware that joining using a function like this is not possible to optimize. It will have very slow performance, because it can't look up the respective employee id's using an index. It has to do a table-scan of the employee table, and evaluate the id's against your comma-separated list one by one.
This is just one reason why using comma-separated lists instead of normal columns is trouble. See my answer to Is storing a delimited list in a database column really that bad?

Mysql query, selecting two values in one field

In the table above table 'sent' I use, among other things, two fields: 'opened' and emailnumber.
I'm looking for a mysql query that the people who opened email number 1 and/or email number 2 one or more times
So in the example it would be everyone, except Fred
I think it's simple, but I can't find the solution

SELECT name
FROM table
GROUP BY name
HAVING SUM(open > 0);

How can I combine these two tables so that I can sort with information on each table, but not get duplicate answers?

I have two tables. The first is named master_list. It has these fields: master_id, item_id, name, img, item_code, and length. My second table is named types_join. It has these fields: master_id and type_id. (There is a third table, but it is not being used in the queries. It is more for reference.) I need to be able to combine these two tables so that I can sift the results to only show certain ones but part of the information to sift is on one table and the other part is on the other one. I don't want duplicate answers.
For example say I only want items that have a type_id of 3 and a length of 18.
When I use
SELECT * FROM master_list LEFT JOIN types_join ON master_list.master_id=types_join.master_id WHERE types_join.type_id = 3 AND master_list.length = 18"
it finds the same thing twice.
How can I query this so I won't get duplicate answers?
Here are the samples from my tables and the result I am getting.
This is what I get with an INNER JOIN:
BTW, master_id and name both only have unique information on the master_list table. However, the types_join table does use the master_id multiple times later on, but not for Lye. That is why I know it is duplicating information.

If you want unique rows from master_list, use exists:
SELECT ml.*
FROM master_list ml
WHERE ml.length = 18 AND
EXISTS (SELECT 1
FROM types_join tj
WHERE ml.master_id = tj.master_id AND tj.type_id = 3
);
Any duplicates you get will be duplicates in master_list. If you want to remove them, you need to provide more information -- I would recommend a new question.

Thank you for the data. But as you can see enter link description here, there is nothing wrong with your query.
Have you tried create an unique index over master_id, just to make sure that you do not have duplicated rows?
CREATE UNIQUE INDEX MyMasterUnique
ON master_list(master_id);

The optimal way to store multiple-selection survey answers in a database

I'm currently working on a survey creation/administration web application with PHP/MySQL. I have gone through several revisions of the database tables, and I once again find that I may need to rethink the storage of a certain type of answer.
Right now, I have a table that looks like this:
survey_answers
id PK
eid
sesid
intvalue Nullable
charvalue Nullable
id = unique value assigned to each row
eid = Survey question that this answer is in reply to
sesid = The survey 'session' (information about the time and date of a survey take) id
intvalue = The value of the answer if it is a numerical value
charvalue = the value of the answer if it is a textual representation
This allowed me to continue using MySQL's mathematical functions to speed up processing.
I have however found a new challenge: storing questions that have multiple responses.
An example would be:
Which of the following do you enjoy eating? (choose all the apply)
Girl Scout Cookies
Bacon
Corn
Whale Fat
Now, when I want to store the result, I'm not sure of the best way to handle it.
Currently, I have a table just for multiple choice options that looks like this:
survey_element_options
id PK
eid
value
id = unique value associated with each row
eid = question/element that this option is associated with
value = textual value of that option
With this setup, I then store my returned multiple selection answers in 'survey_answers' as strings of comma separated id's of the element_options rows that were selected in the survey. (ie something like "4,6,7,9") I'm wondering if that is indeed the best solution, or if it would be more practical to create a new table that would hold each answer chosen, and then reference back to a given answer row which in turn references back to the element and ultimately the survey.
EDIT
for anyone interested, here is the approach I ended up taking (In PhpMyAdmin Relations View):
And a rudimentary query to gather the counts for a multiple select question would look like this:
SELECT e.question AS question, eo.value AS value, COUNT(eo.value) AS count
FROM survey_elements e, survey_element_options eo, survey_answer_options ao
WHERE e.id = 19
AND eo.eid = e.id
AND ao.oid = eo.id
GROUP BY eo.value

This really depends on a lot of things.
Generally, storing lists of comma separated values in a database is bad, especially if you plan to do anything remotely intelligent with that data. Especially if you want to do any kind of advanced reporting on the answers.
The best relational way to store this is to also define the answers in a second table and then link them to the users response to a question in a third table (with multiple entries per user-question, or possibly user-survey-question if the user could take multiple surveys with the same question on it.
This can get slightly complex as a a possible scenario as a simple example:
Example tables:
Users (Username, UserID)
Questions (qID, QuestionsText)
Answers (AnswerText [in this case example could be reusable, but this does cause an extra layer of complexity as well], aID)
Question_Answers ([Available answers for this question, multiple entries per question] qaID, qID, aID),
UserQuestionAnswers (qaID, uID)
Note: Meant as an example, not a recommendation

Convert primary key to not unique index and add answers for the same question under the same id.
For example.
id | eid | sesid | intval | charval
3 45 30 2
3 45 30 4
You can still add another column for regular unique PK if needed.
Keep things simple. No need for relation here.

It's a horses for courses thing really.
You can store as a comma separated string (But then what happens when you have a literal comma in one of your answers).
You can store as a one-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
intvalue Nullable
charvalue Nullable
And then loop over that table. If you picked one answer, it would create one row in this table. If you pick two answers, it will create two rows in this table, etc. Then you would remove the intvalue and charvalue from the survey_answers table.
Another choice, since you're already storing the element options in their own table, is to create a many-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
survey_element_options_id FK
Again, one row per option selected.
Another option yet again is to store a bitmask value. This will remove the need for a many-to-many table.
survey_element_options
id PK
eid FK
value Text
optionnumber unique for each eid
optionbitmask 2 ^ optionnumber
optionnumber should be unique for each eid, and increment starting with one. There will impose a limit of 63 options if you are using bigint, or 31 options if you are using int.
And then in your survey_answers
id PK
eid
sesid
answerbitmask bigint
Answerbitmask is calculated by adding all of the optionbitmask's together, for each option the user selected. For example, if 7 were stored in Answerbitmask, then that means that the user selected the first three options.
Joins can be done by:
WHERE survey_answers.answerbitmask & survey_element_options.optionbitmask > 0
So yeah, there's a few options to consider.

If you don't use the id as a foreign key in another query, or if you can query results using the sesid, try a many to one relationship.
Otherwise I'd store multiple choice answers as a serialized array, such as JSON or through php's serialize() function.

How to query 3 mysql tables and return matching results (with one to many relationships)?

I am trying to query a database to return some matching records and can't work out how to do it in the most efficient way. I have a TUsers table, a TJobsOffered table and a TJobsRequested table. The UserID is the primary key for the TUsers table and is used within the Job tables in a one to many relationship.
Ultimately I want to run a query that returns a list of all matching users based on a particular UserID (eg a matching user is one that has at least one matching record in both tables, eg if UserA has jobid 999 listed in TJobsOffered and UserB has jobid 999 listed in TJobsRequested then this is a match).
In order to try and get my head around it i've simplified it down a lot and am trying to match the records based on the jobids for the user in question, eg:
SELECT DISTINCT TJobsOffered.FUserID FROM TJobsOffered, TJobsRequested
WHERE TJobsOffered.FUserID=TJobsRequested.FUserID AND
(TJobsRequested.FJobID='12' OR TJobsRequested.FJobID='30') AND
(TJobsOffered.FJobID='86' OR TJobsOffered.FJobID='5')
This seems to work fine and returns the correct results however when I introduce the TUsers table (so I can access user information) it starts returning incorrect results. I can't work out why the following query doesn't return the same results as the one listed above as surely it's still matching the same information just with a different connector (or is the one above effectively many to many and the one below 2 sets of one to many comparisons)?
SELECT DISTINCT TUsers.Fid, TUsers.FName FROM TUsers, TJobsOffered, TJobsRequested
WHERE TUsers.Fid=TJobsRequested.FUserID AND TUsers.Fid=TJobsOffered.FUserID AND
(TJobsRequested.FJobID='12' OR TJobsRequested.FJobID='30') AND
(TJobsOffered.FJobID='86' OR TJobsOffered.FJobID='5')
If anyone could explain where i'm going wrong with the second query and how you should incorporate TUsers then that would be greatly appreciated as I can't get my head around the join. If you are able to give me any pointers as to how I can do this all in one query by just passing the user id in then that would be massively appreciated as well! :)
Thanks so much,
Dave

Try this
SELECT DISTINCT TJobsOffered.FUserID , TUsers.FName
FROM TJobsOffered
INNER JOIN TJobsRequested ON TJobsOffered.FUserID=TJobsRequested.FUserID
LEFT JOIN TUsers ON TUsers.Fid=TJobsOffered.FUserID
WHERE
(TJobsRequested.FJobID (12,30) AND
(TJobsOffered.FJobID IN (86 ,5)

You need to add "AND TJobsOffered.FUserID=TJobsRequested.FUserID" to your where clause.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008