MySQL joining two tables with only a partial match - mysql

I'm doing some volunteer work with Nepali refugees in my community and I am trying to organize their addresses. I have 656 Nepali last names in one table and about 608,000 addresses in another table. I have never used MySQL before and have only learned a little bit online to make these tables.
Not real names from table.
My tables:
AddressTable: 4 Columns
Owner_Name     | Owner_Address | Owner_CityState | Owner_Zip
---------------+---------------+-----------------+----------
Smith, John    | ************* | *************** | *****
adhikari, Prem | ************* | *************** | *****
Baker, Mary    | ************* | *************** | *****
NamesTable: 1 Column
Last_Name
-----------
Smith
adhikari
Baker
I only want the addresses for people who have Nepali last names, so I want to select all the columns from my AddressTable that match with the last names from my NamesTable by joining the tables from the Last_Name column in the NamesTable with the Owner_Name column in the AddressTable. Since the Owner_Name column has both last name and the first name I've been having trouble doing this.

Before I answer, let me just say that this is not going to work in all likelihood. Name matching like this is fraught with problems, unless you know that the data is canonically structured.
You can do this in several ways. The idea is that you need functions in the on clause. For instance:
select . . .
from addresstable a join
namestable n
on n.last_name = substring_index(owner_name, ',', 1);
This assumes that the last name is in the owner_name before the first comma.
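A fuller version of that join might look like the sketch below. The TRIM call is an added assumption (it guards against stray spaces around the surname), and the comparison is case-insensitive only if the columns use one of MySQL's case-insensitive collations, which is the usual default.

```sql
-- Sketch: keep only the addresses whose surname (the text before the
-- first comma in Owner_Name) appears in NamesTable.
SELECT a.*
FROM AddressTable AS a
JOIN NamesTable AS n
  -- TRIM is an assumption: it removes stray leading/trailing spaces.
  ON n.Last_Name = TRIM(SUBSTRING_INDEX(a.Owner_Name, ',', 1));
```

Because Owner_Name is wrapped in a function, an index on that column cannot help here; an index on NamesTable.Last_Name still can.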

I recommend using REGEXP here:
SELECT at.*
FROM AddressTable at
INNER JOIN NamesTable nt
ON at.Owner_Name REGEXP CONCAT('^', nt.Last_Name, ',');
As mentioned in the comments above, a given last name by itself may not be unique. We can modify the above query to also check the first name, assuming that the names table contains that as well:
SELECT at.*
FROM AddressTable at
INNER JOIN NamesTable nt
ON at.Owner_Name REGEXP CONCAT('^', nt.Last_Name, ',') AND
at.Owner_Name REGEXP CONCAT(' ', nt.First_Name, '$');
But even this might still have problems, because some people have a first or last name consisting of two (or more) words. Middle names are also possible.
For a better solution, you might want to break up the first, middle, and last names into separate columns before bringing your data into the database.
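If the data is already loaded, one alternative is to split the name inside MySQL instead. The following is only a sketch: the new column names, their lengths, and the index name are hypothetical, and it assumes every Owner_Name has the form "Last, First".

```sql
-- Hypothetical columns to hold the split name parts.
ALTER TABLE AddressTable
  ADD COLUMN Owner_Last_Name  VARCHAR(100),
  ADD COLUMN Owner_First_Name VARCHAR(100);

-- Text before the first comma is the last name; text after the last
-- comma is treated as the first name (assumes a "Last, First" layout).
UPDATE AddressTable
SET Owner_Last_Name  = TRIM(SUBSTRING_INDEX(Owner_Name, ',', 1)),
    Owner_First_Name = TRIM(SUBSTRING_INDEX(Owner_Name, ',', -1));

-- With a plain column, the join can use an ordinary index.
CREATE INDEX idx_owner_last_name ON AddressTable (Owner_Last_Name);

SELECT a.*
FROM AddressTable AS a
JOIN NamesTable AS n
  ON n.Last_Name = a.Owner_Last_Name;
```

Rows whose Owner_Name contains no comma end up with the whole string in both new columns, so they are worth checking by hand.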

Related

Removing reciprocal/similar records using Inner Join

I have a table with names, some of which are shorthand for others and some of which are similar but are not. For instance, Michael and Mike are reciprocal, yet Uncle Michael is not. I ran a script to find the one-way or two-way matches, e.g.
Michael | Mike
Mike | Michael
yet only
Michael | Uncle Michael
which indicates they are not matching pairs.
I'm trying to use that to then remove the shorter matching term (e.g. Mike).
I have a SqlFiddle demonstrating this. I can get as far as finding only the matching pairs, but I am unsure how to then write a DELETE t1 that removes the shorter record from each matching pair.
This might give you some insight from the database server's perspective. We can use a GROUP BY clause to group the names in a name pair, e.g. 'Mike' and 'Michael', and then count the number of distinct names in the result set. When more than one distinct name exists, we delete the shorter one. Otherwise we delete nothing, since only one distinct name exists and we probably want to keep it.
delete from Names where exists
(
select count(*) from
(select name from Names where (name='Michael' or name='Mike') group by name ) t
having count(*) >1
)
and name='Mike'
;

Update a cell that contains a specific string where the ROW is SPECIFIED and COLUMN is NOT SPECIFIED

I have a table with the following columns: name - course1 - course2 - course3. Two rows look like this:
John - physics - math - art
Sara - math - chemistry - psychology
Now John has been expelled from the math class and I want to replace "math" with "none" on his row.
When I look for a solution I find things like this:
UPDATE tableName SET `course1` = 'none' WHERE `name`='John' AND `course1`='math';
That could be useful if I knew the column where 'math' was recorded for John. But that word can be under any column. What I need is something like this:
sql_query="find the row where name='John' and then find the column where we have the word 'math' and only there replace 'math' with 'none'."
Can you kindly help me with this?
In this case, I think there is no other way besides evaluating each column, like this:
update
my_table
set
course1 = if(course1 = 'math', 'none', course1),
course2 = if(course2 = 'math', 'none', course2),
course3 = if(course3 = 'math', 'none', course3)
where
name = 'John';
First of all, if your table (say tbl_Student) doesn't have an ID that is a primary key, you are in big trouble. I recommend something like the following. I don't know exactly what you are storing, so change it to fit your needs, but keep a primary key.
Note that I have also changed your table structure. I am starting with the simplest version:
tbl_Student
--------------------
sid sName cName
--------------------
1 Shaho Math
2 Awat Physics
Now, to change Shaho's cName to anything (in your case 'none'), use this:
update tbl_Student set cName = 'none' where sid = '1'
Because sid is a primary key, you don't have to worry about duplicates here: a primary key has two main characteristics, Not Null and No Duplicates.
Let me change the structure further. In your example the number of courses is open-ended: it might be two, three, or ten.
You could add columns c1, c2, c3 ... c10, but that is not good practice. Since you are using a relational database, we can take advantage of that. Let me show you an example:
tbl_Student
--------------------
sid sName
--------------------
1 Shaho
2 Awat
tbl_Course
--------------------
cid cName
--------------------
1 Math
2 Physics
here is the inner table
tbl_StudentCourse
--------------------
scid sid cid
--------------------
1 1 1
2 1 1
3 2 1
As you can see, you can assign almost any course to a specific student. But how can I get information for a student when the tables have no direct connection?
Well, we do have one: the inner table tbl_StudentCourse connects both of them. So we can use joins to get whatever information we want. Here I select sid, sName, and cName for student one:
SELECT tbl_student.sid, tbl_student.sName, tbl_course.cName from tbl_student left join tbl_studentcourse on (tbl_student.sid = tbl_studentcourse.sid) left join tbl_course on (tbl_course.cid = tbl_studentcourse.cid) where tbl_student.sid = 1
if you want all students:
SELECT tbl_student.sid, tbl_student.sName, tbl_course.cName from tbl_student left join tbl_studentcourse on (tbl_student.sid = tbl_studentcourse.sid) left join tbl_course on (tbl_course.cid = tbl_studentcourse.cid);
So why are we using IDs as keys rather than names or courses?
Well, comparing strings is generally slower than comparing numbers, and you still might get duplicate values.
On the other hand, using a key that is not null and not duplicated changes the entire game for us.
The other advantage of the relational design is that each student is now unique and each course is now unique; you just record which student takes which course. Every other operation also becomes easier: do you want to update a student's name? Then just update it.
My advice is to go relational. If you don't, then go back to Excel (just kidding).
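A minimal sketch of that three-table layout as MySQL DDL follows. The column types, the unique key, and the key names are assumptions that the answer does not spell out. With this layout, expelling John from math becomes a single-row delete rather than a search across course columns.

```sql
CREATE TABLE tbl_Student (
  sid   INT PRIMARY KEY,
  sName VARCHAR(100) NOT NULL
);

CREATE TABLE tbl_Course (
  cid   INT PRIMARY KEY,
  cName VARCHAR(100) NOT NULL
);

-- Linking table: one row per (student, course) enrolment.
CREATE TABLE tbl_StudentCourse (
  scid INT AUTO_INCREMENT PRIMARY KEY,
  sid  INT NOT NULL,
  cid  INT NOT NULL,
  -- Assumption: a student may be enrolled in a given course only once.
  UNIQUE KEY uq_student_course (sid, cid),
  FOREIGN KEY (sid) REFERENCES tbl_Student (sid),
  FOREIGN KEY (cid) REFERENCES tbl_Course (cid)
);

-- Remove John's enrolment in math (the names are illustrative values).
DELETE sc
FROM tbl_StudentCourse AS sc
JOIN tbl_Student AS s ON s.sid = sc.sid
JOIN tbl_Course  AS c ON c.cid = sc.cid
WHERE s.sName = 'John'
  AND c.cName = 'math';
```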

How do I search for an entry out of two SQL tables and know which table it came from?

I'm trying to find a specific entry. This entry can appear in only ONE of my two tables and will never repeat in either table.
Here is a scaled-down version example of my tables:
Table 1:
Date Name Room
2020/01/23 John 201
2020/01/22 Rebecca 203
Table 2 (does NOT have the same amount of columns):
Date Name
2020/01/23 Robert
2020/01/22 Sarah
To find this entry, I need to specify a date and a name. You can assume names never repeat.
So let's say I want to find Sarah 2020/01/22
She could appear in either Table 1 or Table 2, and I don't know which one and I need to know which table she's in.
I'm not sure how I would do this in a single SQL query. So far I just have two separate ones:
SELECT date,name from Table1 WHERE name="Sarah" and date='2020/01/22'
and
SELECT date,name from Table2 WHERE name="Sarah" and date='2020/01/22'
Is there a way to do it in a single query that also tells me which table it came from? It could be another field or some indication that I can get. Thanks.
Use union all, and add another column to each result set, with a literal value that indicates the table name:
select 't1' as which, date, name from table1 where name = 'Sarah' and date = '2020-01-22'
union all
select 't2' as which, date, name from table2 where name = 'Sarah' and date = '2020-01-22'
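Both branches of a UNION ALL must return the same number of columns. If the Room column from Table 1 is also wanted, a variant along these lines pads the second branch with NULL; the alias source_table is just an illustrative name.

```sql
-- Sketch: tag each row with its source table and keep Room where it exists.
SELECT 'table1' AS source_table, date, name, room
FROM table1
WHERE name = 'Sarah' AND date = '2020-01-22'
UNION ALL
-- table2 has no room column, so NULL keeps the column counts equal.
SELECT 'table2' AS source_table, date, name, NULL
FROM table2
WHERE name = 'Sarah' AND date = '2020-01-22';
```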

\n Separated Search in Column

I have a district table in which we store each user's preferred districts in the district_id (varchar(250)) column. The value stored in this field looks like 1 2 5 6 1, with the entries separated by \n. How can I search in this specific column?
Don't. Your design is absolutely horrible and this is why you are having this issue in the first place.
When you have an N-N relationship (a user can have many preferred districts and each district can be preferred by many users) you need to make a middle table with foreign keys to both tables.
You need:
A table for districts with only information about districts.
A table with users with only information about users.
A table for preferred districts by user with the district number and the user id as columns and foreign key constraints. This will make sure that any user can have an unlimited number of preferred districts with easy querying.
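As a rough sketch, those three tables could be declared as follows. All table names, column names, and types here are hypothetical, since the question does not show the real schema.

```sql
CREATE TABLE districts (
  district_id   INT PRIMARY KEY,
  district_name VARCHAR(100) NOT NULL
);

CREATE TABLE users (
  user_id   INT PRIMARY KEY,
  user_name VARCHAR(100) NOT NULL
);

-- Middle table: one row per (user, preferred district) pair.
CREATE TABLE user_preferred_districts (
  user_id     INT NOT NULL,
  district_id INT NOT NULL,
  PRIMARY KEY (user_id, district_id),
  FOREIGN KEY (user_id)     REFERENCES users (user_id),
  FOREIGN KEY (district_id) REFERENCES districts (district_id)
);

-- Searching becomes an exact, indexable comparison.
SELECT u.user_id, u.user_name
FROM users AS u
JOIN user_preferred_districts AS p ON p.user_id = u.user_id
WHERE p.district_id = 5;
```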
I would not recommend performing searches on data stored that way, but if you are stuck it can be done with regular expressions.
You have to deal with starting and ending matches for a string as well. So a regular LIKE is not going to work.
MySQL Regular Expressions
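Give this SQL a try. To search for the number 5 as a whole entry (anchored to the start of the field, the end of the field, or a newline, so that 15 or 55 do not match):
SELECT * FROM `TABLE` WHERE `field` REGEXP '(^|\\n)5(\\n|$)';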
If you want to match using LIKE instead, it can be done with multiple rules:
SELECT * FROM `TABLE` WHERE `field` LIKE '%\\n5\\n%' OR `field` LIKE '5\\n%' OR `field` LIKE '%\\n5';
Note that you have to use a double \ to escape for a new line.
The easiest way is to just use a LIKE query, like this:
SELECT * FROM `preferred_districts` WHERE `district_id` LIKE '%6%';
To make sure it's the right one you'll receive (because this will also match id 16, 26, 674 etc.) you'll have to check manually if it's correct. In php (dunno if you use it) you could use the snippet below:
$id_field = "1\n2\n5\n6\n17";
$ids = explode("\n", $id_field);
if(in_array(6, $ids)) {
echo 'Yup, found the right one';
}
Important: although the above will work, your database design isn't how it should be. You should create (what is sometimes called) a pivot table between the districts and the users, something like below.
(Table 'users_preferred_districts')
user_id | district_id
--------+------------
2 | 1
2 | 17
9 | 21
Like this it's quite easy to retrieve the records you want...
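For instance, against the users_preferred_districts table above, lookups in either direction reduce to a plain equality test. This is a sketch that assumes both columns are integers.

```sql
-- Users who prefer district 17 (an exact match, so 7 or 170 never slip in).
SELECT user_id
FROM users_preferred_districts
WHERE district_id = 17;

-- All preferred districts of user 2.
SELECT district_id
FROM users_preferred_districts
WHERE user_id = 2;
```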
I have used mysql function FIND_IN_SET() and I got the desired result through this function.
I got help from this tutorial.
http://www.w3resource.com/mysql/string-functions/mysql-find_in_set-function.php
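FIND_IN_SET() expects a comma-separated list, so a newline-separated value has to be converted first. A sketch of how that might look is below; the table name district is taken from the question, and wrapping the column in REPLACE is an assumption about how the function was applied.

```sql
-- Turn "1\n2\n5" into "1,2,5", then test for an exact list member.
-- FIND_IN_SET returns the 1-based position, or 0 when there is no match.
SELECT *
FROM district
WHERE FIND_IN_SET('5', REPLACE(district_id, '\n', ',')) > 0;
```

If the values were saved with Windows line endings (\r\n), the stray \r would also need to be stripped. This query cannot use an index, so it scans the whole table.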

MySQL Display Table Name Along With Columns

I'm currently debugging a huge MySQL call which joins a large number of tables that share column names such as id, created_at, etc. I started picking it apart, but I was wondering if there was a way to do something like:
SELECT * AS table.column_name FROM table1 LEFT JOIN etc etc etc...
In place of having to individually name columns like:
SELECT table1.`column2` AS 'NAME', table1.`column3` AS ...
It would definitely help with speeding up the debugging process if there's a way to do it.
Thanks.
Edit:
Thanks for the answers so far. They're not quite what I'm looking for, and I think my question was a bit vague, so I'll give an example:
Suppose you have this setup in your MySQL schema:
table: students
fields: INT id | INT school_id | VARCHAR name
table: schools
fields: INT id | VARCHAR name
students contains:
1 | 1 | "John Doe"
schools contains:
1 | "Imaginary School One"
Doing the MySql call "SELECT * FROM students LEFT JOIN schools ON (students.school_id = schools.id)" will yield:
id | school_id | name | id | name
1 | 1 | "John Doe" | 1 | "Imaginary School One"
We know better and we know that the first Id and Name columns refer to the students table and the second Id and Name refer to the schools table since the data set is really small and unambiguous with its naming. However, if we had to deal with a result set that contained multiple left joins and columns with similar names, then it would start to get difficult to read and normally you'd have to trace through it by following the joins. We could start doing something like
SELECT schools.name AS 'school_name', etc etc etc...
But that gets incredibly impractical once you start dealing with large data sets.
I was wondering though if there was a way to return the result set wherein the column names would look like this instead:
students.id | students.school_id | students.name | schools.id | schools.name
Which would be useful for future references if I need to do something similar again.
What if you select the tables in order, and add a spacer column with the table name before each one?
i.e.
select 'table1', t1.*, 'table2', t2.*, 'table3', t3.*
...
At least that way you don't have to name specific columns.
Do you mean something like this?
show tables;
desc <tablename>;
If you also want the table name returned alongside the data, you can use the CONCAT function to prefix each value:
EXAMPLE:
SELECT CONCAT('tableName', field) FROM tableName
Let us know if this is what you are looking for.
Why not use the same dot notation instead of the ambiguous underscores for separating your tables from column names? Just enclose the alias in back-ticks. For example:
SELECT students.id `students.id` FROM students;
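Typing such aliases by hand for many columns is tedious, so one option is to let MySQL generate the select list from information_schema. The sketch below is an assumption-laden illustration: it uses the students and schools tables from the question and assumes they live in the current database.

```sql
-- Builds a string such as:
--   `schools`.`id` AS `schools.id`, `schools`.`name` AS `schools.name`, ...
-- which can then be pasted into the SELECT clause of the real query.
SELECT GROUP_CONCAT(
         CONCAT('`', table_name, '`.`', column_name,
                '` AS `', table_name, '.', column_name, '`')
         ORDER BY table_name, ordinal_position
         SEPARATOR ', ') AS select_list
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name IN ('students', 'schools');
```

GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so that setting may need raising for wide tables.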