Select distinct combinations from two columns - mysql

I have two columns, source and destination in table Hyperlink, to store the source and destination of hyperlinks.
source | destination
--------------------
a | b
b | c
c | d
c | b
There are two hyperlinks involving both b and c. The difference between the two hyperlinks is the direction of the hyperlink. However, my objective is to retrieve unique hyperlinks, no matter which direction. So for hyperlinks such as from b to c and from c to b, I just want to select one of them. Any one would do.
So my results should look like this:
source | destination
--------------------
a | b
b | c
c | d
So far I am able to implement this in Java, with some processing before I execute SQL statements using JDBC. However, this is going to be very tedious when the table becomes very large.
I wonder if there is any way I can do this in SQL instead.
I tried SELECT DISTINCT source,destination FROM Hyperlink but it returns me the unique permutations. I need the unique combinations.
Thanks!

This is easily achievable with the least() and greatest() functions, which MySQL supports: select distinct least(source, destination), greatest(source, destination) from Hyperlink. If your DBMS lacks them, use a CASE construct to get the smaller/greater value. With two columns this is OK, but the CASE solution gets pretty messy once more columns are involved:
select distinct
case
when source < destination then source
else destination
end as source,
case
when source > destination then source
else destination
end as destination
from Hyperlink
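As a quick check of the idea above, here is a minimal sketch that runs the normalisation on the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained). SQLite's two-argument min()/max() play the role of least()/greatest().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO Hyperlink VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")],
)

# Put the smaller value first so that b->c and c->b collapse into one row.
# In SQLite, min(x, y) and max(x, y) with two arguments are scalar functions;
# MySQL spells them LEAST(x, y) and GREATEST(x, y).
pairs = conn.execute(
    """
    SELECT DISTINCT
           min(source, destination) AS source,
           max(source, destination) AS destination
    FROM Hyperlink
    ORDER BY 1, 2
    """
).fetchall()

print(pairs)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Note that this approach always returns the pair in sorted order, so a link stored only as c -> b would come back as b | c. That is fine when any one direction will do, as the question states.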

Try the following query:
SELECT DISTINCT source, destination FROM Hyperlink
MINUS
SELECT destination, source FROM Hyperlink WHERE source < destination;
This works for Oracle. If you're using PostgreSQL, DB2 or T-SQL, use the EXCEPT keyword instead of MINUS.
EDIT:
At the time of writing there was no equivalent of these keywords in MySQL (EXCEPT was added later, in MySQL 8.0.31). On older versions you'll have to work around it by selecting the values as suggested by Jim Riordan. I'm not going to delete my answer in case anyone needs to do it in any of the other four major DBMS.
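For anyone who wants to see the set-difference approach run, here is a small sketch using the EXCEPT spelling. It assumes SQLite (through Python's sqlite3 module) as a convenient engine that supports EXCEPT; the table and sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO Hyperlink VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")],
)

# The second SELECT lists the reversed form of every "ascending" link.
# Subtracting it removes c->b because b->c is present, and leaves the rest.
remaining = conn.execute(
    """
    SELECT source, destination FROM Hyperlink
    EXCEPT
    SELECT destination, source FROM Hyperlink WHERE source < destination
    ORDER BY 1, 2
    """
).fetchall()

print(remaining)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

EXCEPT also removes duplicate rows from its result, which is why no explicit DISTINCT is needed in this sketch.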

You can use the union of two separate join queries like so:
SELECT
lhs.source, lhs.destination
FROM Hyperlink lhs
LEFT OUTER JOIN Hyperlink rhs
ON rhs.source = lhs.destination
WHERE rhs.source IS NULL
UNION
SELECT
lhs.source, lhs.destination
FROM Hyperlink lhs
JOIN Hyperlink rhs
ON rhs.source = lhs.destination
WHERE rhs.destination <> lhs.source
ORDER BY source;
The first query gets the links whose destination never appears as a source. The second gets the links whose destination does appear as a source, but where that matching row does not point straight back. It's probably not the fastest implementation, but having indexes on the source and destination columns will help it along. Whether it performs well enough for you depends on how big the Hyperlink table is or is likely to get.

I tried this query and it worked for me
SELECT table1.Source, table1.Destination FROM dbo.hyperlinks table1 WHERE NOT EXISTS
(SELECT * FROM hyperlinks table2 WHERE table1.Source = table2.Destination AND table2.Source = table1.Destination)
UNION
SELECT TOP 1 table1.Source, table1.Destination FROM hyperlinks table1 WHERE
(SELECT COUNT(*) FROM hyperlinks table2 WHERE table1.Source = table2.Destination AND table2.Source = table1.Destination) > 0
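A variation on the NOT EXISTS idea, offered only as a sketch and not as a rewrite of the query above: keep a row if it is already in ascending order, or if its reverse is not in the table at all. The extra z -> y row is made up for illustration, and SQLite (via Python's sqlite3) is assumed in place of SQL Server or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO Hyperlink VALUES (?, ?)",
    # ("z", "y") is a hypothetical row whose reverse does not exist.
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b"), ("z", "y")],
)

# Keep a link when source <= destination, or when no reversed twin exists.
# Of a pair such as b->c / c->b, only the ascending one (b->c) survives.
unique_links = conn.execute(
    """
    SELECT DISTINCT h.source, h.destination
    FROM Hyperlink AS h
    WHERE h.source <= h.destination
       OR NOT EXISTS (
            SELECT 1
            FROM Hyperlink AS r
            WHERE r.source = h.destination
              AND r.destination = h.source
          )
    ORDER BY 1, 2
    """
).fetchall()

print(unique_links)  # [('a', 'b'), ('b', 'c'), ('c', 'd'), ('z', 'y')]
```

Unlike the least()/greatest() approach, every returned row exists in the table with its original direction (z -> y stays z -> y).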

Related

How would you compare 3 tables to 1 in MySQL?

So I have a possibly silly question, but I'm looking for a basic approach or strategy for the following problem.
I have 1 master file and 3 source files; let's call them master, src1, src2, and src3. The master file is SUPPOSED to have the same records as the 3 source files combined. However, the master file has more records than the sum of all 3 sources. My goal is to validate that all records in src1-3 are inside the master file AND also extract the records from the master that aren't in any of the 3 sources. Additionally, each of the 4 files has different (but similar) headers.
I have been able to find the distinct records from src1 (and subsequent sources) and map them to the matching records in the master file by using the following:
WITH tmp1 AS (
SELECT s1.*
FROM src1 AS s1
LEFT JOIN master AS mstr
ON (
s1.name = mstr.fname
AND s1.quant = mstr.qty
AND s1.item = mstr.obj
AND s1.price = mstr.prc
AND s1.age = mstr.time_since_dob
)
) SELECT DISTINCT primaryKey FROM tmp1;
Using this, I can get a count of distinct matches between the two files that are present in src1, and if that matches the count from select distinct PK from src1 then I'm in decent shape. Granted, I know that using the criteria above I could easily get many collisions, since several records could have the same name, quantity, item, price, etc. But suffice it to say, using the above criteria I can get unique matches, since there are no matching IDs between the two tables or anything like that. Additionally, the join criteria for each source are slightly different, so I had to do the above 3 separate times and validate each source independently.
Having done the above along with some other analysis, I have been able to validate that each distinct record from src1-3 has at least 1 distinct match in the master file. I'm having an issue, however, with the second half of this challenge, where I have to select the records from the master file that did NOT have a corresponding match.
How can I select those records from the master file that were not matched? Can I do a simple
select * from master not in newView1 where newView1 is the combination of the 3 selects for the 3 sources? Again, I'm using different columns for each join condition so putting 3 sources under the same header might be difficult (but worth pursuing?). Another thing worth mentioning is that each file is ~1gb and the master file is ~3gb so time complexity is worth considering.
Thanks for all and any help.
First, use UNION ALL to collect both the matching rows and the rows that exist only in the src1-3 tables.
Next, get the rows that exist only in the master table by joining it with the tmp1 table.
Refer to the following query:
with tmp1(tbl,name,quant,item,price,age,fname,qty,obj,prc,time_since_dob) as (
select 'src1',s1.*,m.* from src1 s1 left join master1 m on
s1.name=m.fname and
s1.quant=m.qty and
s1.item=m.obj and
s1.price=m.prc and
s1.age=m.time_since_dob
union all
select 'src2',s2.*,m.* from src2 s2 left join master1 m on
s2.name=m.fname and
s2.quant=m.qty and
s2.item=m.obj and
s2.price=m.prc and
s2.age=m.time_since_dob
union all
select 'src3',s3.*,m.* from src3 s3 left join master1 m on
s3.name=m.fname and
s3.quant=m.qty and
s3.item=m.obj and
s3.price=m.prc and
s3.age=m.time_since_dob
)
select 'master',m.fname,m.qty,m.obj,m.prc,m.time_since_dob from master1 m left join tmp1 t on
m.fname=t.name and
m.qty=t.quant and
m.obj=t.item and
m.prc=t.price and
m.time_since_dob=t.age
where t.name is null
union all
select t.tbl,t.name,t.quant,t.item,t.price,t.age from tmp1 t
where t.fname is null
db fiddle
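To make the two halves of the problem concrete, here is a scaled-down sketch. It assumes only three of the compared columns (name/quant/item against fname/qty/obj), made-up rows, and SQLite via Python's sqlite3 instead of MySQL. It uses NOT EXISTS where the query above uses LEFT JOIN ... IS NULL; both are ways of writing an anti-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master (fname TEXT, qty INTEGER, obj TEXT);
    CREATE TABLE src1 (name TEXT, quant INTEGER, item TEXT);
    CREATE TABLE src2 (name TEXT, quant INTEGER, item TEXT);
    CREATE TABLE src3 (name TEXT, quant INTEGER, item TEXT);

    -- Hypothetical sample data: 'dan' exists only in master, 'eve' only in src3.
    INSERT INTO master VALUES
        ('ann', 1, 'pen'), ('bob', 2, 'cup'), ('cat', 3, 'hat'), ('dan', 4, 'map');
    INSERT INTO src1 VALUES ('ann', 1, 'pen');
    INSERT INTO src2 VALUES ('bob', 2, 'cup');
    INSERT INTO src3 VALUES ('cat', 3, 'hat'), ('eve', 5, 'key');
    """
)

# Stack the three sources under one set of headers, tagged with their origin.
ALL_SRC = """
WITH all_src AS (
    SELECT 'src1' AS tbl, name, quant, item FROM src1
    UNION ALL
    SELECT 'src2', name, quant, item FROM src2
    UNION ALL
    SELECT 'src3', name, quant, item FROM src3
)
"""

# Master rows with no counterpart in any source.
only_in_master = conn.execute(
    ALL_SRC
    + """
    SELECT m.fname, m.qty, m.obj
    FROM master AS m
    WHERE NOT EXISTS (
        SELECT 1 FROM all_src AS s
        WHERE s.name = m.fname AND s.quant = m.qty AND s.item = m.obj
    )
    ORDER BY m.fname
    """
).fetchall()

# Source rows with no counterpart in master.
only_in_sources = conn.execute(
    ALL_SRC
    + """
    SELECT s.tbl, s.name, s.quant, s.item
    FROM all_src AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM master AS m
        WHERE m.fname = s.name AND m.qty = s.quant AND m.obj = s.item
    )
    ORDER BY s.tbl, s.name
    """
).fetchall()

print(only_in_master)   # [('dan', 4, 'map')]
print(only_in_sources)  # [('src3', 'eve', 5, 'key')]
```

With files of the size mentioned in the question, an index on the compared master columns would likely matter more than the choice between NOT EXISTS and LEFT JOIN, but that is something to measure, not assume.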

Is it possible to inverse the select statement in SQL?

When I want to select all columns except foo and bar, what I normally do is just explicitly list all the other columns in the select statement.
select a, b, c, d, ... from ...
But if the table has a dozen columns, this is a tedious process for such a simple need. What I would like to do instead is something like the following pseudo statement:
select * except(foo, bar) from ...
I would also like to know if there is a function to filter out duplicate rows from a multi-column result, that is, rows that have the same content in all corresponding columns.
------------------------
A | B | C
------------------------   ====>   ------------------------
A | B | C                          A | B | C
------------------------           ------------------------
You can query INFORMATION_SCHEMA db and get the list of columns (except two) for that table, e.g.:
SELECT GROUP_CONCAT(COLUMN_NAME ORDER BY ORDINAL_POSITION)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '<your_table>' AND TABLE_SCHEMA = '<database>'
AND COLUMN_NAME NOT IN ('foo', 'bar');
Once you get the list of columns, you can use that in your select query.
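The "fetch the column list, then build the SELECT" step could look roughly like this on the client side. This is a sketch under assumptions: it uses SQLite's PRAGMA table_info in place of MySQL's INFORMATION_SCHEMA, and the table t with columns a, foo, b, bar, c is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, foo TEXT, b INTEGER, bar TEXT, c INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 'x', 2, 'y', 3)")

excluded = {"foo", "bar"}

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(t)")
    if row[1] not in excluded
]

# Quote each identifier before splicing it into the statement text.
select_list = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
query = f"SELECT {select_list} FROM t"

rows = conn.execute(query).fetchall()
print(query)  # SELECT "a", "b", "c" FROM t
print(rows)   # [(1, 2, 3)]
```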
You can create a view based on this table with all columns except these two, and then use this view every time with
select * from view
A simple GROUP BY on all columns will remove such duplicates. There are other options as well: DISTINCT and ROW_NUMBER.
select * except(foo, bar) from
This is a frequently requested feature on SO. However, it has not made it to the SQL Standard and I don't know of any SQL products that support it. I guess when the product managers ask their developers, MVPs, usergroups, etc to measure enthusiasm for this prospective feature, they mostly hear, "SELECT * FROM is considered dangerous, we need to protect new users who don't know what they are doing, etc."
You may find it useful to use NATURAL JOIN rather than INNER JOIN etc which removes what would be duplicated columns from the resulting table expression e.g.
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2
ON t1.foo = t2.foo
AND t1.bar = t2.bar;
will result in two columns named foo and two named bar (and possibly other repeated names), probably de-duplicated in some way e.g. by suffixing the range variable names t1 and t2 that INNER JOIN forced you into using.
Whereas:
SELECT *
FROM Table1 NATURAL JOIN Table2;
doesn't require the use of range variables (a good thing) because there will only be one column named foo and one named bar in the result.
And to remove duplicated rows as well as columns, change the implied SELECT ALL * into the explicit SELECT DISTINCT *, e.g.
SELECT DISTINCT *
FROM Table1 NATURAL JOIN Table2;
Doing this may reduce your need for the SELECT ALL BUT { these columns } feature you desire.
Of course, if you do that you will be told, "NATURAL JOIN is considered dangerous, we need to protect you from yourself in case you don't know what you are doing, etc." :)
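To see the column difference between the two joins, here is a small sketch that prints the result column names. It assumes SQLite (via Python's sqlite3), which supports NATURAL JOIN, and two invented tables that share the columns foo and bar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (foo INTEGER, bar INTEGER, x TEXT);
    CREATE TABLE Table2 (foo INTEGER, bar INTEGER, y TEXT);
    INSERT INTO Table1 VALUES (1, 2, 'left');
    INSERT INTO Table2 VALUES (1, 2, 'right');
    """
)


def column_names(sql):
    """Return the names of the result columns produced by a query."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]


inner_cols = column_names(
    "SELECT * FROM Table1 t1 INNER JOIN Table2 t2 "
    "ON t1.foo = t2.foo AND t1.bar = t2.bar"
)
natural_cols = column_names("SELECT * FROM Table1 NATURAL JOIN Table2")

print(len(inner_cols))  # 6: foo and bar appear once per table
print(natural_cols)     # ['foo', 'bar', 'x', 'y']
```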

Fetching value from table and passing it in url

How can I use table field values in a URL?
SQL Query wherein all 3 tables are joined
select * from nfojm_usedcar_variants cv
inner join nfojm_usedcar_products cp
inner join nfojm_usedcar_categories cc on
cc.id=cp.prod_cat_id and
cp.id=cv.v_prod_id and
cv.state='1' order by cv.id desc
Output as checked
Then it combines all 3 tables
nfojm_usedcar_variants
nfojm_usedcar_products
nfojm_usedcar_categories
However, all 3 tables have a field named id (with different values in each).
I need to pass the values of id and v_prod_id in a URL.
Say the URL is:
<a href="index.php?option=com_usedcar&pid='.$row->v_prod_id.'&vid='.$row->id.'">
But since id is a common field in most of the tables, it is not picked up correctly from nfojm_usedcar_variants.
Can someone help me modify a function so that it fetches the values of id and v_prod_id from nfojm_usedcar_variants?
thanks
If you have multiple tables in a join that share a common column name, and you need them, then alias them. Such as:
select a.id as aid,a.theName,b.id as bid,b.year
from tableA a
join tableB b
on b.id=a.id
then refer to those columns as aid and bid in your code that follows.
Try to avoid ever doing a select *. Be explicit. You never know what comes flying out of a select *, and odds are you don't need it all. Select * is fine for messing around, but not for production code. And you can't control common column names with select *. We like to control things after all, no?
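Applied to the question's three-table join, the aliasing advice might look like the sketch below. The shortened table names (variants, products, categories), the sample rows, and the use of SQLite through Python's sqlite3 are all assumptions for illustration; the original code is PHP against MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by their alias
conn.executescript(
    """
    -- Hypothetical stand-ins for the nfojm_usedcar_* tables.
    CREATE TABLE variants (id INTEGER, v_prod_id INTEGER, state TEXT);
    CREATE TABLE products (id INTEGER, prod_cat_id INTEGER);
    CREATE TABLE categories (id INTEGER);
    INSERT INTO categories VALUES (100);
    INSERT INTO products VALUES (10, 100);
    INSERT INTO variants VALUES (1, 10, '1');
    """
)

# Every id gets its own alias, so none of them shadows another.
row = conn.execute(
    """
    SELECT cv.id        AS vid,
           cv.v_prod_id AS pid,
           cp.id        AS product_id,
           cc.id        AS category_id
    FROM variants cv
    JOIN products cp ON cp.id = cv.v_prod_id
    JOIN categories cc ON cc.id = cp.prod_cat_id
    WHERE cv.state = '1'
    ORDER BY cv.id DESC
    """
).fetchone()

url = f"index.php?option=com_usedcar&pid={row['pid']}&vid={row['vid']}"
print(url)  # index.php?option=com_usedcar&pid=10&vid=1
```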

Why am I getting a different number of results when I change the formatting of the WHERE statement?

I am writing a SQL script (in MySQL Workbench, if that's relevant) that involves looking for variations of the same string in my data. This way, I can get an accurate pull even if something is misspelled. The script goes something like this:
SELECT A.Name, A.ID
FROM A
LEFT JOIN B on B.ID = A.ID
WHERE B.type = 1
AND (
B.notes LIKE '%apple%'
OR B.notes LIKE '%aple%'
OR B.notes LIKE '%appl%'
)
Because a single entry in table A can be linked to several different notes, which are kept in table B, if a single person has a note that contains "apple" and another that says "appl" they should show up twice, which is fine. After running this script, I got 9627 rows returned.
I tried the same script but changed the WHERE to this:
WHERE B.type = 1
AND (B.notes LIKE '%apple%'
OR '%aple%'
OR '%appl%')
and now I have 9929 rows.
Do these two different WHERE clauses function differently because I left out the B.notes in the second? I am working with large data sets (A has over 20,000 entries, B has over 1.6 million) so it's not particularly practical to go through and check all of the data's integrity by hand. I'm fairly new to SQL so I appreciate any information or suggestions as to why this happens and how to get an accurate result.
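I think it's because in the first case you have B.notes LIKE before each of '%aple%' and '%appl%', but in the second case you don't. Without it, each bare string is evaluated as a boolean condition of its own, not as a pattern matched against B.notes.
Repeat B.notes LIKE for every pattern, as in your first query. Note that AND B.notes IN ('apple', 'aple', 'appl') is not an equivalent shorthand: IN tests exact equality and ignores wildcards. In MySQL, B.notes REGEXP 'apple|aple|appl' is a compact alternative that does pattern matching.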
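Here is a tiny experiment showing that the two WHERE forms are different conditions. It is a sketch on four made-up notes, run on SQLite via Python's sqlite3. SQLite, like MySQL, converts a non-numeric string such as '%aple%' to 0 (false) when it is used as a condition, but it makes no attempt to reproduce the row counts from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (notes TEXT)")
conn.executemany(
    "INSERT INTO B VALUES (?)",
    [("apple pie",), ("aple",), ("appl sauce",), ("banana",)],
)


def count_where(condition):
    """Count the rows of B that satisfy the given WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM B WHERE {condition}").fetchone()[0]


# Every pattern is compared against notes.
explicit = count_where(
    "notes LIKE '%apple%' OR notes LIKE '%aple%' OR notes LIKE '%appl%'"
)

# Only the first pattern is compared; the other two are bare strings
# that are evaluated as conditions by themselves (false here).
bare_strings = count_where("notes LIKE '%apple%' OR '%aple%' OR '%appl%'")

# IN compares for exact equality, so only the note that is exactly 'aple' matches.
exact_in = count_where("notes IN ('apple', 'aple', 'appl')")

print(explicit, bare_strings, exact_in)  # 3 1 1
```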

MySQL : Get row of table on same column of three tables

I have three tables, the first one is called "File" :
JobId FilenameId FileId
5 2 1
7 3 2
And the second one is called "Filename"
Filename FilenameId
File1 2
File2 3
And the third one is called "Client" :
ClientId JobId
1 5
2 7
Now I want to get the ClientId of File1, how can I do it? I'm new to SQL.
Thanks.
Edit : this is what I tried but it's not working
Select c.ClientId
From `File` f, Filename fn, Client c
Where f.FilenameId = fn.FilenameId and f.JobId = c.JobId and fn.Filename = "File1";
First, I hate the negative banter that sometimes goes on, but yes, you need to get yourself more educated in SQL during your learning. Look here at real-life scenarios and how people offer different solutions to the same.
Now to YOUR question. First, get rid of old-style SQL where you put all the join criteria in your WHERE clause. Get started by knowing the proper relationships between the tables. Second, your WHERE clause should hold your specific criteria, such as wanting File1. From that, get to the other tables. My personal standard of SQL coding starts with which criteria I want and from which table. Ensure indexes are available for optimizing the query. THEN join to the other tables to get the other elements needed to complete the row of data. (Make good use of table aliases, and stick with them.)
First, your main criteria. Simple enough.
select
fn.FileNameID,
fn.FileName
from
FileName fn
where
fn.FileName = 'File1'
From there, do your joins to get the next pieces of information from file to client relationships
select
fn.FileNameID,
fn.FileName,
c.clientID
from
FileName fn
JOIN File f
on fn.FileNameID = f.FileNameID
JOIN Client c
on f.JobID = c.JobID
where
fn.FileName = 'File1'
Notice the hierarchical indentation from file name to the file, then from file to the client... you can visually see how the tables are related. Then, just grab your other columns as you need and add to your field list with proper aliases.
Try this:
select ClientId from Client where JobId in (select JobId from File where FilenameId in (select FilenameId from Filename where Filename="File1"));
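Both the JOIN version and the nested IN version can be checked against the sample data from the question. The sketch below assumes SQLite via Python's sqlite3 in place of MySQL and passes 'File1' as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE File (JobId INTEGER, FilenameId INTEGER, FileId INTEGER);
    CREATE TABLE Filename (Filename TEXT, FilenameId INTEGER);
    CREATE TABLE Client (ClientId INTEGER, JobId INTEGER);
    INSERT INTO File VALUES (5, 2, 1), (7, 3, 2);
    INSERT INTO Filename VALUES ('File1', 2), ('File2', 3);
    INSERT INTO Client VALUES (1, 5), (2, 7);
    """
)

# Explicit joins: Filename -> File -> Client.
joined = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.ClientId
        FROM Filename fn
        JOIN File f ON fn.FilenameId = f.FilenameId
        JOIN Client c ON f.JobId = c.JobId
        WHERE fn.Filename = ?
        """,
        ("File1",),
    )
]

# Nested subqueries: resolve FilenameId, then JobId, then ClientId.
nested = [
    row[0]
    for row in conn.execute(
        """
        SELECT ClientId FROM Client WHERE JobId IN (
            SELECT JobId FROM File WHERE FilenameId IN (
                SELECT FilenameId FROM Filename WHERE Filename = ?
            )
        )
        """,
        ("File1",),
    )
]

print(joined, nested)  # [1] [1]
```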