How to make an inner join while maintaining unique rows - mysql

I have a ternary relationship in which I establish the relation between Offers, Profiles, and Skills. The ternary relationship table, called ternary for example, has the IDs of the three tables as its primary key. It could look something like this:
id_Offer - id_Profile - id_Skill
1 - 1 - 1
1 - 1 - 2
1 - 1 - 3
1 - 2 - 1
2 - 1 - 1
2 - 3 - 2
2 - 1 - 3
2 - 5 - 1
[and so on, there would be more registers for each id_Offer from Offer but I want to limit the example]
So I have 2 offers in total, with a number of profiles in each one.
The table Offer looks something like this:
Offer - business_name
1 - business-1
2 - business-1
3 - business-1
4 - business-1
5 - business-2
6 - business-2
7 - business-2
8 - business-3
So when I do a query like
select distinct id_offer, business_name, COUNT(*)
FROM Offer
GROUP BY business_name
Order by COUNT(*);
I get that for business-1 I have 4 offers.
Now if I want to take into account the offers for some Profile, I have to make a join with my ternary relationship. But even if I do something as simple as the following
select distinct business_name
from Offer
INNER JOIN ternary ON Offer.id_Offer = ternary.id_Offer
GROUP BY business_name
WHERE business_name = 'business-1'
No matter what I put in the GROUP BY, or whether I write DISTINCT or not, I do not get what I want. In reality, business-1 has 4 offers, but right now only two of them appear in the ternary table. So the query should return 2 unique offers for this name when there is no filtering by profile.
But instead I get 8 offers, because that is how many rows in the ternary table match those id_Offer values.
How should this be done? If I need no filters I can simply look at Offers table alone. But what if I need to filter by id_skill or id_Profile AND want to return the business_name?
I have seen solutions such as this, but I cannot make them work. I do not understand what the ? is or what it is called, so I cannot look it up, and I do not know whether MariaDB works the same way in this respect. When I try to build that query for my data I get:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '? ORDER BY COUNT(*) DESC' at line 1
But as I said, it is kind of hard to look for '?' as an... Operator? Function?

There are two basic solutions.
SELECT
o.business_name,
COUNT(DISTINCT o.id_offer) AS unique_offers
FROM
Offer AS o
INNER JOIN
ternary AS t
ON t.id_Offer = o.id_Offer
WHERE
o.business_name = 'business-1'
AND t.id_profile IN (1, 2, 3, 5)
GROUP BY
o.business_name
That's the simplest to write and think about. But it can also be quite expensive, because you are still joining each offer row to every matching row in ternary (4 per offer in the sample data), creating 8 rows to aggregate and process through DISTINCT.
The "better" (in my opinion) route is to filter then aggregate the ternary table in a sub-query.
SELECT
o.business_name,
COUNT(*) AS unique_offers
FROM
Offer AS o
INNER JOIN
(
SELECT id_Offer
FROM ternary
WHERE id_profile IN (1, 2, 3, 5)
GROUP BY id_Offer
)
AS t
ON t.id_Offer = o.id_Offer
WHERE
o.business_name = 'business-1'
GROUP BY
o.business_name
This ensures that t only ever has one row for any given offer. This in turn means that each row in offer only ever joins to one row in t, so there is no duplication. That means there is no need to use COUNT(DISTINCT), which relieves some overhead (by moving the work to the inner query's GROUP BY).
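To see the row multiplication and both fixes side by side, here is a minimal sketch using the sample data from the question. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL/MariaDB, which is an assumption made only so the example runs anywhere. The profile filter is left out to match the "no filtering" case the question describes.

```python
import sqlite3

# In-memory SQLite stands in for MySQL/MariaDB here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Offer (id_Offer INTEGER PRIMARY KEY, business_name TEXT);
CREATE TABLE ternary (id_Offer INTEGER, id_Profile INTEGER, id_Skill INTEGER,
                      PRIMARY KEY (id_Offer, id_Profile, id_Skill));
INSERT INTO Offer VALUES (1,'business-1'),(2,'business-1'),(3,'business-1'),
                         (4,'business-1'),(5,'business-2'),(6,'business-2'),
                         (7,'business-2'),(8,'business-3');
INSERT INTO ternary VALUES (1,1,1),(1,1,2),(1,1,3),(1,2,1),
                           (2,1,1),(2,3,2),(2,1,3),(2,5,1);
""")

# Plain join: one output row per matching ternary row, so offers repeat.
joined_rows = conn.execute("""
    SELECT COUNT(*) FROM Offer AS o
    JOIN ternary AS t ON t.id_Offer = o.id_Offer
    WHERE o.business_name = 'business-1'
""").fetchone()[0]

# Option 1: join first, then count each offer only once.
count_distinct = conn.execute("""
    SELECT COUNT(DISTINCT o.id_Offer) FROM Offer AS o
    JOIN ternary AS t ON t.id_Offer = o.id_Offer
    WHERE o.business_name = 'business-1'
""").fetchone()[0]

# Option 2: collapse ternary to one row per offer first, then join.
count_subquery = conn.execute("""
    SELECT COUNT(*) FROM Offer AS o
    JOIN (SELECT id_Offer FROM ternary GROUP BY id_Offer) AS t
      ON t.id_Offer = o.id_Offer
    WHERE o.business_name = 'business-1'
""").fetchone()[0]

print(joined_rows, count_distinct, count_subquery)  # 8 2 2
```

The plain join yields the 8 rows the question complains about, while both rewrites report the 2 offers that actually appear in ternary.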

Are you saying that you want to see offers for a particular business, but you want to limit these according to certain profiles or skills?
We limit query results in the WHERE clause. If we want to look up data in another table, we use IN or EXISTS. For instance:
select *
from offer
where business_name = 'business-1'
and id_offer in
(
select id_offer
from ternary
where id_profile = 1
and id_skill = 2
);
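On the ? from the question: it is most likely a bind-parameter placeholder from a prepared statement, which the calling program fills in with a value. That would explain error 1064 when the query is pasted into the client as-is. The sketch below shows the IN approach with such placeholders. It uses Python's sqlite3 module as a stand-in for a MySQL/MariaDB driver (an assumption), and the helper name offers_for is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL/MariaDB here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offer (id_offer INTEGER PRIMARY KEY, business_name TEXT);
CREATE TABLE ternary (id_offer INTEGER, id_profile INTEGER, id_skill INTEGER);
INSERT INTO offer VALUES (1,'business-1'),(2,'business-1'),(3,'business-1');
INSERT INTO ternary VALUES (1,1,1),(1,1,2),(1,1,3),(1,2,1),
                           (2,1,1),(2,3,2),(2,1,3),(2,5,1);
""")

def offers_for(conn, business_name, id_profile, id_skill):
    """Return offer ids for a business, filtered by profile and skill."""
    sql = """
        SELECT id_offer
        FROM offer
        WHERE business_name = ?
          AND id_offer IN (SELECT id_offer FROM ternary
                           WHERE id_profile = ? AND id_skill = ?)
        ORDER BY id_offer
    """
    # The tuple supplies one value per ?, in order of appearance.
    return [row[0] for row in conn.execute(sql, (business_name, id_profile, id_skill))]

print(offers_for(conn, "business-1", 1, 2))  # [1]
print(offers_for(conn, "business-1", 1, 1))  # [1, 2]
```

Because the subquery is only used as a filter, each offer comes back once no matter how many ternary rows match it.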

Related

MySQL Return id where occurance count > attribute value

I have two tables, Kiosk and Bike, with columns and data such as:
Kiosk
KioskID, Capacity
1, 10
2, 5
3, 15
Bike
BikeID, Location
1, 1
2, 1
3, 1
4, 2
5, 1
etc... Location is a foreign key that points to kioskid. I am trying to write a query that returns only the KioskIDs of kiosks that have capacity. In other words, if 7 bikes are parked at kiosk 1, kiosk 1 is returned. If 5 bikes are parked at kiosk 2 it is not returned. I was able to write code that returns the count of bikes at each kiosk, but am confused as to how to use this (nested query?) to return only the kiosks whose capacity>count(*).
SELECT k.kioskid, COUNT(*)
FROM kiosk AS k
JOIN bike AS b ON b.location = k.kioskid
GROUP BY k.kioskid
You were almost there. All that's needed is a HAVING clause to compare the amount of bikes per kiosk to the capacity.
SELECT k.kioskid
FROM kiosk k
left outer join bike b on b.location = k.kioskid
GROUP BY
k.kioskid
HAVING
-- count bike rows only; COUNT(*) would report 1 for a kiosk with no bikes
COUNT(b.BikeID) < MAX(k.Capacity)
As a side note, I strongly recommend renaming the location column to kioskid, to make the foreign key relation clear.
I think you might be looking for HAVING as in:
SELECT k.kioskid, COUNT(b.location) AS cap
FROM kiosk AS k
JOIN bike AS b ON b.location = k.kioskid
GROUP BY k.kioskid, k.capacity
HAVING cap < k.capacity
Correct code:
SELECT kioskid, COUNT(location), capacity
FROM kiosk AS k
JOIN bike ON location = kioskid
GROUP BY kioskid
HAVING COUNT(location) < capacity;
I found two issues after digging into this further. 1. MySQL works inside out, so any alias established by the AS clause must exist in the innermost code, in this case the HAVING clause. 2. The SQL standard requires that HAVING reference only columns in the GROUP BY clause or columns used in aggregate functions. However, MySQL supports an extension to this behavior and permits HAVING to refer to columns in the SELECT list and columns in outer subqueries as well. So by removing all aliases and including capacity in the SELECT clause, I finally got the code to work. Thanks @Lieven Keersmaekers and @Jim Dennis for your help.
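One detail worth checking when a LEFT JOIN is combined with HAVING is how an empty kiosk is counted. The sketch below compares COUNT(*) with COUNT(b.bikeid). It runs on Python's sqlite3 module as a stand-in for MySQL (an assumption), and kiosk 4 plus bikes 6 to 9 are made-up rows added only to expose the difference.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kiosk (kioskid INTEGER PRIMARY KEY, capacity INTEGER);
CREATE TABLE bike (bikeid INTEGER PRIMARY KEY,
                   location INTEGER REFERENCES kiosk(kioskid));
-- Kiosk 4 (capacity 1, no bikes) is hypothetical.
INSERT INTO kiosk VALUES (1,10),(2,5),(3,15),(4,1);
-- Bikes 6 to 9 are hypothetical and fill kiosk 2 to its capacity of 5.
INSERT INTO bike VALUES (1,1),(2,1),(3,1),(4,2),(5,1),(6,2),(7,2),(8,2),(9,2);
""")

def kiosks_with_space(count_expr):
    """Return ids of kiosks whose bike count is below capacity."""
    sql = f"""
        SELECT k.kioskid
        FROM kiosk AS k
        LEFT JOIN bike AS b ON b.location = k.kioskid
        GROUP BY k.kioskid, k.capacity
        HAVING {count_expr} < k.capacity
        ORDER BY k.kioskid
    """
    return [row[0] for row in conn.execute(sql)]

# COUNT(*) counts the NULL-extended row, so an empty kiosk looks like 1 bike.
with_count_star = kiosks_with_space("COUNT(*)")
# COUNT(b.bikeid) skips NULLs, so an empty kiosk correctly counts 0 bikes.
with_count_col = kiosks_with_space("COUNT(b.bikeid)")

print(with_count_star)  # [1, 3]
print(with_count_col)   # [1, 3, 4]
```

With an inner JOIN the two counts agree, but empty kiosks then drop out of the result entirely, which is usually not what "kiosks that have capacity" means.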

MySQL ORDER BY Column = value AND distinct?

I'm getting grey hair by now...
I have a table like this.
ID - Place - Person
1 - London - Anna
2 - Stockholm - Johan
3 - Gothenburg - Anna
4 - London - Nils
And I want to get the result where all the different persons are included, but I want to choose which Place to order by.
For example. I want to get a list where they are ordered by LONDON and the rest will follow, but distinct on PERSON.
Output like this:
ID - Place - Person
1 - London - Anna
4 - London - Nils
2 - Stockholm - Johan
Tried this:
SELECT ID, Person
FROM users
ORDER BY FIELD(Place,'London'), Person ASC
But it gives me:
ID - Place - Person
1 - London - Anna
4 - London - Nils
3 - Gothenburg - Anna
2 - Stockholm - Johan
And I really don't want Anna, or any person, to be in the result more than once.
This is one way to get the specified output, but this uses MySQL specific behavior which is not guaranteed:
SELECT q.ID
, q.Place
, q.Person
FROM ( SELECT IF(p.Person<=>@prev_person,0,1) AS r
, @prev_person := p.Person AS person
, p.Place
, p.ID
FROM users p
CROSS
JOIN (SELECT @prev_person := NULL) i
ORDER BY p.Person, !(p.Place<=>'London'), p.ID
) q
WHERE q.r = 1
ORDER BY !(q.Place<=>'London'), q.Person
This query uses an inline view to return all the rows in a particular order, by Person, so that all of the 'Anna' rows are together, followed by all the 'Johan' rows, etc. The set of rows for each person is ordered by, Place='London' first, then by ID.
The "trick" is to use a MySQL user variable to compare the values from the current row with values from the previous row. In this example, we're checking if the 'Person' on the current row is the same as the 'Person' on the previous row. Based on that check, we return a 1 if this is the "first" row we're processing for a person, otherwise we return a 0.
The outermost query processes the rows from the inline view, and excludes all but the "first" row for each Person (the 0 or 1 we returned from the inline view.)
(This isn't the only way to get the resultset. But this is one way of emulating analytic functions which are available in other RDBMS.)
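For comparison, in databases that support window functions (MySQL itself gained them in 8.0), we could use SQL something like this. The rn filter has to go in an outer query, because a window function result cannot be referenced in the WHERE clause of the same SELECT:
SELECT s.ID
, s.Place
, s.Person
FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY t.Person ORDER BY
CASE WHEN t.Place='London' THEN 0 ELSE 1 END, t.ID) AS rn
, t.ID
, t.Place
, t.Person
FROM users t
) s
WHERE s.rn=1
ORDER BY CASE WHEN s.Place='London' THEN 0 ELSE 1 END, s.Person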
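The window-function approach can be tried out with the question's four rows. This is a sketch that uses Python's sqlite3 module as a stand-in database (an assumption), and it needs an SQLite build of 3.25 or later for ROW_NUMBER().

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (ID INTEGER PRIMARY KEY, Place TEXT, Person TEXT);
INSERT INTO users VALUES (1,'London','Anna'),(2,'Stockholm','Johan'),
                         (3,'Gothenburg','Anna'),(4,'London','Nils');
""")

sql = """
    SELECT s.ID, s.Place, s.Person
    FROM (
        SELECT t.ID, t.Place, t.Person,
               -- Number each person's rows, London rows first, then by ID.
               ROW_NUMBER() OVER (
                   PARTITION BY t.Person
                   ORDER BY CASE WHEN t.Place = 'London' THEN 0 ELSE 1 END, t.ID
               ) AS rn
        FROM users AS t
    ) AS s
    WHERE s.rn = 1   -- keep only the first row per person
    ORDER BY CASE WHEN s.Place = 'London' THEN 0 ELSE 1 END, s.Person
"""
rows = conn.execute(sql).fetchall()

for row in rows:
    print(row)
# (1, 'London', 'Anna')
# (4, 'London', 'Nils')
# (2, 'Stockholm', 'Johan')
```

Unlike the user-variable trick, this does not depend on evaluation order inside a single statement, so the result is well defined.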
Followup
At the beginning of the answer, I referred to MySQL behavior that was not guaranteed. I was referring to the usage of MySQL User-Defined variables within a SQL statement.
Excerpts from MySQL 5.5 Reference Manual http://dev.mysql.com/doc/refman/5.5/en/user-variables.html
"As a general rule, other than in SET statements, you should never assign a value to a user variable and read the value within the same statement."
"For other statements, such as SELECT, you might get the results you expect, but this is not guaranteed."
"the order of evaluation for expressions involving user variables is undefined."
Try this:
SELECT ID, Place, Person
FROM users
GROUP BY Person
ORDER BY FIELD(Place,'London') DESC, Person ASC;
You want to use group by instead of distinct:
SELECT MIN(ID) AS ID, Person
FROM users
GROUP BY Person
ORDER BY MAX(FIELD(Place, 'London')) DESC, Person ASC;
The GROUP BY does the same thing as SELECT DISTINCT. But, you are allowed to mention other fields in clauses such as HAVING and ORDER BY.

Excluding 'near' duplicates from a mysql query

We have an iPhone app that sends invoice data by each of our employees several times per day. When they are in low cell signal areas tickets can come in as duplicates, however they are assigned a unique 'job id' in the mysql database, so they're viewed as unique. I could exclude the job id and make the rest of the columns DISTINCT, which gives me the filtered rows I'm looking for (since literally every data point is identical except for the job id), however I need the job ID since it's the primary reference point for each invoice and is what I point to for: approvals, edits, etc.
So my question is, how can I filter out 'near' duplicate rows in my query, while still pulling in the job id for each ticket?
The current query is below:
SELECT * FROM jobs, users
WHERE jobs.job_csuper = users.user_id
AND users.user_email = '".$login."'
AND jobs.job_approverid1 = '0'
Thanks for looking into it!
Edit (examples provided):
This is what I meant by 'near duplicate'
Job_ID - Job_title - Job_user - Job_time - Job_date
2345 - Worked on circuits - John Smith - 1.50 - 2013-01-01
2344 - Worked on circuits - John Smith - 1.50 - 2013-01-01
2343 - Worked on circuits - John Smith - 1.50 - 2013-01-01
So everything is identical except for the Job_ID column.
You want a group by:
SELECT *
FROM jobs, users
WHERE jobs.job_csuper = users.user_id
AND users.user_email = '".$login."'
AND jobs.job_approverid1 = '0'
group by <all fields from jobs except jobid>
I think the final query should look something like this:
select min(Job_ID) as JobId, Job_title, users.name as Job_user, Job_time, Job_date
FROM jobs join users
on jobs.job_csuper = users.user_id
WHERE users.user_email = '".$login."' AND jobs.job_approverid1 = '0'
group by Job_title, users.name, Job_time, Job_date
(This uses ANSI syntax for joins and is explicit about the fields coming back.)
It's better to prevent the double submission.
Given that you cannot prevent the double submission...
I would query like this:
select
min(Job_ID) as real_job_id
,count(Job_ID) as num_dup_job_ids
,group_concat(Job_ID) as all_dup_job_ids
,j.Job_title, j.Job_user, j.Job_time, j.Job_date
from
jobs j
inner join users u on u.user_id = j.job_csuper
where
whatever_else
group by
j.Job_title, j.Job_user, j.Job_time, j.Job_date
That includes more than you explicitly asked for. But it's probably good to be reminded of how many dups you have, and it gives you easy access to the duplicate id info when you need it.
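A minimal sketch of that grouping, using the three near-duplicate rows from the question plus one made-up job (2346) so there is something that is not a duplicate. It runs on Python's sqlite3 module as a stand-in for MySQL (an assumption), and the join to users is omitted to keep it short.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (Job_ID INTEGER PRIMARY KEY, Job_title TEXT, Job_user TEXT,
                   Job_time REAL, Job_date TEXT);
INSERT INTO jobs VALUES
  (2343,'Worked on circuits','John Smith',1.50,'2013-01-01'),
  (2344,'Worked on circuits','John Smith',1.50,'2013-01-01'),
  (2345,'Worked on circuits','John Smith',1.50,'2013-01-01'),
  (2346,'Replaced breaker','John Smith',0.75,'2013-01-01');  -- hypothetical row
""")

rows = conn.execute("""
    SELECT MIN(Job_ID)          AS real_job_id,
           COUNT(*)             AS num_dup_job_ids,
           GROUP_CONCAT(Job_ID) AS all_dup_job_ids,
           Job_title
    FROM jobs
    GROUP BY Job_title, Job_user, Job_time, Job_date
    ORDER BY real_job_id
""").fetchall()

for real_id, n, all_ids, title in rows:
    # GROUP_CONCAT order is not guaranteed, so sort the ids before showing them.
    print(real_id, n, sorted(int(i) for i in all_ids.split(",")), title)
```

Each group keeps its lowest Job_ID as the reference for approvals and edits, while the other ids stay available in all_dup_job_ids.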
How about creating a hash for each row and comparing them:
`SHA1(CONCAT_WS('|', field1, field2, field3, ...)) AS jobhash`

MySQL select only new records

How to write a MySQL query to achieve this task?
Table: writers
w_id w_name
---------------
1 Michael
2 Samantha
3 John
---------------
Table: articles
a_id w_id timestamp a_name
----------------------------------------
1 1 0000000001 PHP programming
2 3 0000000003 Other programming languages
3 3 0000000005 Another article
4 2 0000000015 Web design
5 1 0000000020 MySQL
----------------------------------------
Need to SELECT only those writers who published their first article not earlier than 0000000005. (only writers who published at least one article can be selected)
In this example the result would be:
2 Samantha
SQL code can be tested here http://sqlfiddle.com/#!2/7a308
Untested, but close:
SELECT w.w_id, w.w_name, MIN(a.timestamp) as min_time
from writers w
JOIN articles a on w.w_id = a.w_id
GROUP BY w.w_id, w.w_name
HAVING min_time >= 5
Here's one approach, using an inline view (or "derived table" as MySQL calls it) to get the earliest timestamp for each writer:
SELECT w.w_id
, w.w_name
-- , e.earliest_timestamp
FROM writers w
LEFT
JOIN ( SELECT a.w_id
, MIN(a.timestamp) AS earliest_timestamp
FROM articles a
GROUP BY a.w_id
) e
ON e.w_id = w.w_id
WHERE e.earliest_timestamp >= '0000000005'
ORDER BY w.w_id
This may not be the most efficient approach, but you can run just the query in the inline view (aliased as e) to see what it returns. We can then reference the result set from that query like we do a table (with some restrictions.)
(Other approaches can make better use of suitable indexes.)
I'm unclear on the datatype of earliest_timestamp column. The SQL above assumes it's character datatype. If it's integer rather than character, the WHERE clause could look like this:
WHERE e.earliest_timestamp >= 5
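The same result can be checked against the sample data with a GROUP BY and HAVING variant. This sketch runs on Python's sqlite3 module as a stand-in for MySQL (an assumption), stores timestamp as an integer (also an assumption, since the question does not state the type), and uses a made-up helper name.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE writers (w_id INTEGER PRIMARY KEY, w_name TEXT);
CREATE TABLE articles (a_id INTEGER PRIMARY KEY, w_id INTEGER,
                       timestamp INTEGER, a_name TEXT);
INSERT INTO writers VALUES (1,'Michael'),(2,'Samantha'),(3,'John');
INSERT INTO articles VALUES
  (1,1,1,'PHP programming'),(2,3,3,'Other programming languages'),
  (3,3,5,'Another article'),(4,2,15,'Web design'),(5,1,20,'MySQL');
""")

def writers_first_published_since(cutoff):
    """Writers whose earliest article is at or after the cutoff timestamp."""
    sql = """
        SELECT w.w_id, w.w_name
        FROM writers AS w
        JOIN articles AS a ON a.w_id = w.w_id   -- inner join drops writers with no articles
        GROUP BY w.w_id, w.w_name
        HAVING MIN(a.timestamp) >= ?
        ORDER BY w.w_id
    """
    return conn.execute(sql, (cutoff,)).fetchall()

print(writers_first_published_since(5))  # [(2, 'Samantha')]
print(writers_first_published_since(3))  # [(2, 'Samantha'), (3, 'John')]
```

John is excluded at cutoff 5 even though he has an article at timestamp 5, because his first article is at timestamp 3.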

GROUP BY does not remove duplicates

I have a watchlist system that I've coded, in the overview of the users' watchlist, they would see a list of records, however the list shows duplicates when in the database it only shows the exact, correct number.
I've tried GROUP BY watch.watch_id, GROUP BY rec.record_id, none of any types of group I've tried seems to remove duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicate in some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
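The options above can be run against the small example view. This is a sketch that builds V as a plain table in an in-memory SQLite database through Python's sqlite3 module, a stand-in chosen only so the example is self-contained.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER);
INSERT INTO V VALUES (1,2,3),(1,2,6),(3,4,5);
""")

def q(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# DISTINCT over all columns keeps both (1,2,...) rows because c differs.
distinct_all = q("SELECT DISTINCT a, b, c FROM V ORDER BY a, b, c")
# Dropping c makes the two rows identical, so they collapse.
distinct_ab = q("SELECT DISTINCT a, b FROM V ORDER BY a, b")
# Or keep c by saying how to combine it.
summed = q("SELECT a, b, SUM(c) AS tot FROM V GROUP BY a, b ORDER BY a, b")
smallest = q("SELECT a, b, MIN(c) AS first_c FROM V GROUP BY a, b ORDER BY a, b")

print(distinct_all)  # [(1, 2, 3), (1, 2, 6), (3, 4, 5)]
print(distinct_ab)   # [(1, 2), (3, 4)]
print(summed)        # [(1, 2, 9), (3, 4, 5)]
print(smallest)      # [(1, 2, 3), (3, 4, 5)]
```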
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, the exact rows it returns for the non-GROUP BY fields is undefined.
If you wanted to do this with a GROUP BY try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watchlist WHERE user_id = 1
GROUP BY watch_id)
LIMIT 0, 25
I would never recommend using SELECT DISTINCT; it's really slow on big datasets.
Try using things like EXISTS.
You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.
Also, from the results displayed, they have different records. That looks like a perfectly valid, expected result. If you are trying to select only distinct values, then you don't want to GROUP; you want to select distinct values.
SELECT DISTINCT ...
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.
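One way to find that underlying problem is to check each joined table for keys that occur more than once. The sketch below uses Python's sqlite3 module as a stand-in database (an assumption). The records rows are hypothetical, with record 22 duplicated on purpose, since the question does not show that table.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER,
                        record_id INTEGER, watch_date INTEGER);
CREATE TABLE records (record_id INTEGER, user_id INTEGER, city TEXT);
INSERT INTO watchlist VALUES (13,1,22,1314038274),(14,1,25,1314038995);
-- Hypothetical data: record 22 appears twice, so the join fans out.
INSERT INTO records VALUES (22,7,'Springfield'),(22,7,'Springfield'),
                           (25,8,'Shelbyville');
""")

# The join returns watch 13 twice even though watchlist holds it once.
joined = conn.execute("""
    SELECT w.watch_id
    FROM watchlist AS w
    LEFT JOIN records AS r ON r.record_id = w.record_id
    WHERE w.user_id = 1
    ORDER BY w.watch_id
""").fetchall()

# Diagnostic: list join keys that are not unique in the joined table.
dupe_keys = conn.execute("""
    SELECT record_id, COUNT(*) AS n
    FROM records
    GROUP BY record_id
    HAVING COUNT(*) > 1
""").fetchall()

print(joined)     # [(13,), (13,), (14,)]
print(dupe_keys)  # [(22, 2)]
```

Running the same diagnostic against members (grouped by user_id) would cover the second join.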