How to join two tables without primary key or unique key? - mysql

I have two tables named LocalVSDB and TemporaryVSDB. Both tables have the same columns:
LocalVSDB: msisdn,activateDate
TemporaryVSDB: msisdn,activateDate
But both tables also have duplicate rows for MSISDN.
I need to join these two tables. My intended result looks like this:
MSISDN LocalActivateDate TemporaryActivateDate Datediff
60103820251 2013-12-14 2013-10-05 70
601111000254 2013-12-14 2013-10-05 70
601111000254 2013-12-18 2013-09-10 80
But since there are duplicate MSISDNs, I get duplicate rows when I join. For example, a certain MSISDN has 6 rows in each table, so the join returns a total of 36 rows for that MSISDN.
I am joining using the following query:
SELECT t.msisdn,t.activateDate AS VSDB_Activate_Date,
l.activateDate AS Local_Activate_Date,
DATEDIFF(D,l.activateDate,t.activateDate) AS date_Diff
FROM temporaryVSDB2 t
INNER JOIN LocalVSDB l ON t.msisdn = l.msisdn
WHERE t.activateDate > l.activateDate
How can I get only 6 rows for such an MSISDN?
Thanks in advance.

The problem is:
where t.activateDate > l.activateDate
That means one row in table one can join to all six rows in table two. You either need to change this to an = or just get a single row from the second table based on certain criteria.

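SELECT m.msisdn, m.activateDate AS Local_Activate_Date, t.activateDate AS VSDB_Activate_Date, DATEDIFF(DAY, m.activateDate, t.activateDate) AS Duration
FROM LocalVSDB m
OUTER APPLY
(
SELECT TOP 1 d.msisdn, d.activateDate
FROM TemporaryVSDB d
WHERE d.msisdn = m.msisdn
AND d.activateDate > m.activateDate
ORDER BY d.activateDate
) t
This finds the nearest later partner record for each local row, along with the duration between them. A local row with no later partner (the last record, for instance) will have a NULL partner.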
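The "nearest partner" idea can be tried out without SQL Server. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite has no OUTER APPLY, so a correlated subquery with MIN plays that role here. The sample rows are made up for illustration and are not the asker's data.

```python
import sqlite3

# In-memory database with invented sample rows (assumption, not the asker's data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LocalVSDB (msisdn TEXT, activateDate TEXT);
CREATE TABLE TemporaryVSDB (msisdn TEXT, activateDate TEXT);
INSERT INTO LocalVSDB VALUES
  ('601', '2013-09-01'), ('601', '2013-10-01'), ('601', '2013-12-20');
INSERT INTO TemporaryVSDB VALUES
  ('601', '2013-09-10'), ('601', '2013-10-05'), ('601', '2013-12-14');
""")

# For each local row, pick the earliest temporary date that is later
# and belongs to the same msisdn; NULL when no such row exists.
nearest = conn.execute("""
SELECT msisdn, local_date, temp_date,
       CAST(julianday(temp_date) - julianday(local_date) AS INTEGER) AS duration
FROM (
  SELECT l.msisdn AS msisdn,
         l.activateDate AS local_date,
         (SELECT MIN(t.activateDate)
          FROM TemporaryVSDB t
          WHERE t.msisdn = l.msisdn
            AND t.activateDate > l.activateDate) AS temp_date
  FROM LocalVSDB l
)
ORDER BY local_date
""").fetchall()

for row in nearest:
    print(row)
```

Each local row appears exactly once, so three local rows give three result rows instead of one row per matching pair.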

You can reuse your own query and add a GROUP BY clause, provided msisdn and activateDate together identify a unique row. Any selected column that is not in the GROUP BY has to be aggregated; here MAX picks the latest local date that precedes each temporary date.
SELECT t.msisdn, t.activateDate AS VSDB_Activate_Date,
MAX(l.activateDate) AS Local_Activate_Date,
DATEDIFF(D, MAX(l.activateDate), t.activateDate) AS date_Diff
FROM temporaryVSDB2 t INNER JOIN LocalVSDB l ON t.msisdn = l.msisdn
WHERE t.activateDate > l.activateDate
GROUP BY t.msisdn, t.activateDate

Related

Working of SQL JOINS in this example

OK, so I was learning SQL joins and was curious to try all the joins on the following tables:
Table name Demo1:
A
1
1
1
1
1
Table name Demo2:
B
1
1
1
1
1
To my amazement, no matter which join I apply, I end up with the same 25 entries. I understand this for the cross join, since it gives all combinations, but why do the other joins return the same result for these two tables?
A join works like this: it picks up every entry from the first table, then, for each of those entries, it picks all entries from the second table that satisfy the ON condition.
Here every value of A equals every value of B, so every pair satisfies the condition. Hence, the number of results in this case = number of records in Demo1 * number of records in Demo2 = 25.
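The 5 * 5 count can be checked with a small script. This is a sketch using Python's sqlite3 module, assuming the join condition is A = B; only INNER, LEFT and CROSS joins are tried, since older SQLite versions lack RIGHT and FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Demo1 (A INTEGER);
CREATE TABLE Demo2 (B INTEGER);
INSERT INTO Demo1 VALUES (1), (1), (1), (1), (1);
INSERT INTO Demo2 VALUES (1), (1), (1), (1), (1);
""")

# Every row of Demo1 matches every row of Demo2, so each join type
# degenerates into the full set of 5 * 5 pairs.
queries = {
    "inner": "SELECT COUNT(*) FROM Demo1 INNER JOIN Demo2 ON A = B",
    "left": "SELECT COUNT(*) FROM Demo1 LEFT JOIN Demo2 ON A = B",
    "cross": "SELECT COUNT(*) FROM Demo1 CROSS JOIN Demo2",
}
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)
```

The join types only start to differ once some rows have no match on the other side.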

SQL Validate a column with the same column

I have the following situation. I have a table with all the article info, and I would like to compare a column with itself, because I have multiple types of article: single products and master products. The only way I can tell them apart is by SKU. For example:
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I only want to list and/or count the SKUs that are fully unique. In the example, the SKUs that are unique and have no variants are ID = 1, 2, 6 and 10. I want a query that does not count a SKU such as 11113 if it appears again in the column as part of another SKU, so the total would be 4 unique SKUs and not 6. Please let me know if this is possible.
Assuming master SKUs are always 5 characters long, try this:
select a.*
from mytable a
left join mytable b on b.sku like concat(a.sku, '-%')
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to their child SKUs, then filters out the successful joins, leaving only solitary master SKUs. The hyphen in the pattern keeps a master SKU from matching itself.
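The anti-join can be tried against the sample data from the question. The sketch below uses Python's sqlite3 module, so SQLite's || operator stands in for MySQL's concat; the pattern includes the hyphen so that a row cannot match itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, sku TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "11111"), (2, "11112"), (3, "11113"), (4, "11113-5"),
     (5, "11113-8"), (6, "11114"), (7, "11115"), (8, "11115-1-W"),
     (9, "11115-2"), (10, "11116")],
)

# Left join each 5-character SKU to its variants ('<sku>-...') and keep
# only the rows for which no variant was found.
solitary_ids = [row[0] for row in conn.execute("""
SELECT a.id
FROM mytable a
LEFT JOIN mytable b ON b.sku LIKE a.sku || '-%'
WHERE length(a.sku) = 5
  AND b.sku IS NULL
ORDER BY a.id
""")]
print(solitary_ids)
```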
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT ID, A.MasterSKU
FROM (
SELECT
MasterSKU = SUBSTRING(SKU,1,5),
MasterSKUCount = COUNT(*)
FROM MyTable
GROUP BY SUBSTRING(SKU,1,5)
HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
SELECT
ID,
MasterSKU = SUBSTRING(SKU,1,5)
FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
Now, one thing I noticed from your example: the original SKU column really looks like three columns in one. We have multiple values being joined with hyphens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.
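The grouping approach described in this answer can also be run on the question's sample data. This is a sketch in Python's sqlite3 module, where substr replaces SUBSTRING and ordinary AS aliases replace the `MasterSKU = ...` form; the alias name master is chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, SKU TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "11111"), (2, "11112"), (3, "11113"), (4, "11113-5"),
     (5, "11113-8"), (6, "11114"), (7, "11115"), (8, "11115-1-W"),
     (9, "11115-2"), (10, "11116")],
)

# Group by the first five characters, keep groups with a single row,
# then join back to the table to recover the IDs.
unique_ids = [row[0] for row in conn.execute("""
SELECT b.ID
FROM (
  SELECT substr(SKU, 1, 5) AS master
  FROM MyTable
  GROUP BY substr(SKU, 1, 5)
  HAVING COUNT(*) = 1
) AS a
JOIN MyTable AS b ON substr(b.SKU, 1, 5) = a.master
ORDER BY b.ID
""")]
print(unique_ids)
```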

mysql select statement. why does it return 6 of the same record

When I make this sql statement I get 6 of the same record returned. So if I expect to get 2 records returned, I get six of each record back so that is 12 in total.
SELECT
ce2.*
FROM customerentry ce, customerentrytrace cet, customerentry ce2
WHERE ce.accountid = 1
AND ce.companyid = 1
AND ce.accountid=cet.accountid
AND ce.accountid=ce2.accountid
AND ce.companyid=cet.companyid
AND ce.companyid=ce2.companyid
AND cet.documentno = '2012Faktura1'
AND cet.documenttype = 1
AND ce2.documentno = cet.offsetdocumentno
AND ce2.documenttype = cet.offsetdocumenttype
ORDER BY created;
I know that I can solve it by adding distinct, but I would like to know why I get 6 of the same record returned. Anyone who can help me?
Since we have no idea about your table structure, this is a guess: there are probably some columns with a 1-to-n relationship that you haven't handled in the WHERE clause of your query.
As an extra measure, you can focus on the data you actually need and add a GROUP BY clause before your ORDER BY clause.
You are using an INNER JOIN. Suppose, for example, that two entries in table cet match your WHERE clause for combining tables ce and cet; that gives you 2 rows per entry of table ce.
Taking this further, if 3 entries in table ce2 match the WHERE clause for combining tables cet and ce2, you get 3 rows per entry of table cet.
That makes 6 rows per entry of table ce, giving you 12 rows in total even if you have only 2 entries in table ce.
So think again about which join is the right one for your desired result.
Here is a link with some more explanation: Short explanation of joins
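The multiplication described above is easy to reproduce. The sketch below uses Python's sqlite3 module with heavily simplified, hypothetical versions of the three tables (one key column plus a label each); it is not the asker's schema, only an illustration of 1 * 2 * 3 = 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables: one join key and one label column each.
CREATE TABLE ce  (accountid INTEGER, entry TEXT);
CREATE TABLE cet (accountid INTEGER, trace TEXT);
CREATE TABLE ce2 (accountid INTEGER, offset_entry TEXT);
INSERT INTO ce  VALUES (1, 'e1');
INSERT INTO cet VALUES (1, 't1'), (1, 't2');
INSERT INTO ce2 VALUES (1, 'o1'), (1, 'o2'), (1, 'o3');
""")

# One ce row matches 2 cet rows, and each of those pairs matches 3 ce2 rows.
rows = conn.execute("""
SELECT ce.entry, cet.trace, ce2.offset_entry
FROM ce
JOIN cet ON cet.accountid = ce.accountid
JOIN ce2 ON ce2.accountid = ce.accountid
""").fetchall()
print(len(rows))
```

Selecting only ce.entry from this join would show the same value six times, which is the kind of repetition the question describes.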
The problem might be that you have not joined the tables properly. Please read about JOIN.
SELECT ce2.*
FROM customerentry ce INNER JOIN customerentrytrace cet ON ce.accountid=cet.accountid AND ce.companyid=cet.companyid
INNER JOIN customerentry ce2 ON ce.accountid=ce2.accountid AND ce.companyid=ce2.companyid AND ce2.documentno = cet.offsetdocumentno AND ce2.documenttype = cet.offsetdocumenttype
WHERE ce.accountid = 1
AND ce.companyid = 1
AND cet.documentno = '2012Faktura1'
AND cet.documenttype = 1
ORDER BY created;

Is it possible to write a query to compare rows to other rows in same table?

I have a table with the following structure. I need to return all rows where the district of the record immediately preceding and immediately following the row are different than the district for that row. Is this possible? I was thinking of a join on the table itself but not sure how to do it.
id | zip_code | district
__________________________
20063 10169 12
20064 10169 9
20065 10169 12
Assuming that "preceding" and "following" are in the sense of the ID column, you can do:
select *
from zip_codes z1
inner join zip_codes z2 on z1.id=z2.id + 1
inner join zip_codes z3 on z1.id=z3.id - 1
where z1.district <> z2.district and z1.district <> z3.district
Because of the inner joins, this automatically filters out the first and last rows. If you need those to count, change the joins to left outer joins.
Also, this checks whether the district is different from both neighbors. To find rows where it differs from either one (as is implied in the comment), change the AND in the WHERE clause to an OR. But note that all three rows in your example then fit that criterion, even if there are long runs of twelves above and below these rows.
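The self-join can be checked on the three sample rows from the question. This is a sketch using Python's sqlite3 module; it selects only z1's columns rather than `*` so the output stays readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zip_codes (id INTEGER, zip_code TEXT, district INTEGER)")
conn.executemany(
    "INSERT INTO zip_codes VALUES (?, ?, ?)",
    [(20063, "10169", 12), (20064, "10169", 9), (20065, "10169", 12)],
)

# z2 is the preceding row (id - 1), z3 is the following row (id + 1).
rows = conn.execute("""
SELECT z1.id, z1.district
FROM zip_codes z1
INNER JOIN zip_codes z2 ON z1.id = z2.id + 1
INNER JOIN zip_codes z3 ON z1.id = z3.id - 1
WHERE z1.district <> z2.district
  AND z1.district <> z3.district
""").fetchall()
print(rows)
```

Only the middle row comes back, because the first and last rows have no neighbor on one side and are dropped by the inner joins.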

GROUP BY does not remove duplicates

I have coded a watchlist system. In the overview of a user's watchlist, they see a list of records; however, the list shows duplicates, even though the database holds exactly the correct number of rows.
I've tried GROUP BY watch.watch_id and GROUP BY rec.record_id, but none of the groupings I've tried seems to remove the duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicates in some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
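The small result tables in this answer can be reproduced with a throwaway script. The sketch below uses Python's sqlite3 module and treats V as a plain table holding the three "partially duplicate" rows; ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER);
INSERT INTO V VALUES (1, 2, 3), (1, 2, 6), (3, 4, 5);
""")

# DISTINCT over all columns cannot merge rows that differ in c.
distinct_all = conn.execute("SELECT DISTINCT * FROM V ORDER BY a, b, c").fetchall()
# Dropping c lets DISTINCT merge the first two rows.
distinct_ab = conn.execute("SELECT DISTINCT a, b FROM V ORDER BY a, b").fetchall()
# GROUP BY with an aggregate states how the c values are combined.
summed = conn.execute("SELECT a, b, SUM(c) FROM V GROUP BY a, b ORDER BY a, b").fetchall()
smallest = conn.execute("SELECT a, b, MIN(c) FROM V GROUP BY a, b ORDER BY a, b").fetchall()

print(distinct_all)
print(distinct_ab)
print(summed)
print(smallest)
```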
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, which values it returns for the non-GROUP BY fields is undefined.
If you wanted to do this with a GROUP BY try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watchlist WHERE user_id = 1
GROUP BY watch_id)
LIMIT 0, 25
I would never recommend using SELECT DISTINCT; it's really slow on big datasets.
Try using things like EXISTS.
You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.
Also, from the results displayed, they refer to different records. That looks like a perfectly valid, expected result. If you are trying to select only distinct values, then you don't want to GROUP; you want to select distinct values.
SELECT DISTINCT ...
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.