I have three tables: user, user_earnings and user_payouts. I first look up the user id, then use it to get that user's earnings and payouts. I sum each with DISTINCT and subtract the payouts from the earnings to get the CURRENT earnings. The problem is that if one of the tables has no rows for the user (for example, the payouts table), I get NULL instead of a number. How can I solve this? I tried IFNULL but it did not work.
This is my query:
SELECT SUM(DISTINCT user_earnings.amount) -
SUM(DISTINCT user_payouts.payout_amount) as total_earnings
FROM user, user_earnings, user_payouts
WHERE user.id = 103
and user_earnings.user_id = user.id
and user_payouts.user_id = user.id
EDIT: Here are the current tables.
user_earnings (the problem occurs only with user_id 103)
id user_id amount
1 102 250
2 102 1000
3 101 5000
4 102 352
18 102 375
19 102 442
20 103 338 <-----
user_payouts
id user_id payout_amount
1 102 500
2 102 100
3 101 1000
user
id payout_address
102 ***
103 ***
As you can see, user_payouts has no entry for user_id 103 because he never made a payout. That's why I get NULL (I think).
Try this:
SELECT u.id,
(SUM(DISTINCT ue.amount) -
COALESCE(SUM(DISTINCT up.payout_amount),0)) as total_earnings
FROM
user u
left outer join user_earnings ue on u.id = ue.user_id
left outer join user_payouts up on u.id = up.user_id
WHERE
u.id = 103;
Sorry, I didn't put enough effort into the above. While it works for a small sample, it breaks down with a larger set, because DISTINCT filters out any case where a single user got two payouts of the same amount.
I think this is your real solution:
SELECT
u.id,
(ue.earnings - up.payouts) as total_earnings
FROM
user u
inner join
(select distinct
u1.id,
SUM(COALESCE(ue.amount,0)) as earnings
from
user u1
left join user_earnings ue on u1.id = ue.user_id group by u1.id) ue on u.id = ue.id
inner join
(select distinct
u1.id,
SUM(COALESCE(up.payout_amount,0)) as payouts
from
user u1
left join user_payouts up on u1.id = up.user_id group by u1.id) up on u.id = up.id
WHERE
u.id = 103;
Just for fun, here's another option, much shorter:
SELECT
u.id,
((select COALESCE(SUM(ue.amount),0) from user_earnings ue where ue.user_id = u.id) -
(select COALESCE(SUM(up.payout_amount),0) from user_payouts up where up.user_id = u.id)) as total_earnings
FROM
user u
WHERE
u.id = 103;
Note1:
I would definitely NOT use DISTINCT in this manner. If the user later has another earned_amount of 338, the second 338 would not be added to the total.
Note2:
See my usage of COALESCE instead of IFNULL.
Note3:
I updated your join syntax. Please confirm I kept your join as desired.
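The DISTINCT pitfall from Note1 can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption made only so the snippet is self-contained), and the second 338 row is hypothetical data added to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_earnings (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER)"
)
# User 103 earns 338 twice; the second row is hypothetical.
conn.executemany(
    "INSERT INTO user_earnings (user_id, amount) VALUES (?, ?)",
    [(103, 338), (103, 338)],
)

# SUM(DISTINCT ...) collapses equal amounts before adding them up.
distinct_total, plain_total = conn.execute(
    "SELECT SUM(DISTINCT amount), SUM(amount) FROM user_earnings WHERE user_id = 103"
).fetchone()
print(distinct_total, plain_total)
```

Here SUM(DISTINCT amount) yields 338 while SUM(amount) yields 676, so the second earning is silently lost.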
EDIT: Updated answer and fiddle to account for duplicate earned_amounts per user as well as duplicate payout_amounts per user.
View the fiddle to see duplicates of both entered for user 103 (400 + 400 - 100 - 100).
CLICK Here for SQLFiddle
SELECT (
COALESCE(ue.earned_amount,0) - COALESCE(up.payout_amount,0)
) AS total_earnings
FROM user u
LEFT JOIN (SELECT user_id, SUM(earned_amount) AS earned_amount
FROM Earned
GROUP BY user_id) ue
ON ue.user_id = u.id
LEFT JOIN (SELECT user_id, SUM(payout_amount) AS payout_amount
FROM Payouts
GROUP BY user_id) up
ON up.user_id = u.id
WHERE u.id = 103
Performance Note:
If your Earned or Payouts tables are expected to be large, here are a few ideas for improving performance:
Limit the size of the result sets returned by the nested queries by adding a WHERE clause to each, so they only return rows for user_id 103.
Add indexes on the user_id column of both Earned and Payouts.
OR first write the results of these joins to a temp table with indexes, then join the temp table in this query.
You have 3 problems -- DISTINCT, JOIN followed by GROUP BY, and failure to deal with NULL.
SELECT u.id,
( SELECT IFNULL(SUM(amount), 0)
FROM user_earnings WHERE user_id = u.id ) -
( SELECT IFNULL(SUM(payout_amount), 0)
FROM user_payouts WHERE user_id = u.id )
AS total_earnings
FROM user u
WHERE u.id = 103;
should solve all of them.
DISTINCT was already discussed -- there could be duplicate value for a given user.
JOIN followed by GROUP BY -- I call this "inflate-deflate syndrome". The JOIN happens first, thereby inflating the number of rows, then the GROUP BY tries to compensate for it. Usually this is a performance problem; in your case it mangles the data, too.
NULL was discussed, but I think this formulation is 'correct'.
This is a 'rare' case where subqueries are better than LEFT JOIN. They avoid the over-calculation of SUM. ghenghy's first solution is probably just as good for a single u.id; mine should work for multiple ids.
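To make the inflate-deflate effect concrete, here is a small sketch that loads the question's sample rows and compares the double LEFT JOIN with the correlated subqueries. It runs on Python's built-in sqlite3 instead of MySQL, which is an assumption made for portability; COALESCE stands in for IFNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE user_earnings (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
CREATE TABLE user_payouts (id INTEGER PRIMARY KEY, user_id INTEGER, payout_amount INTEGER);
INSERT INTO user (id) VALUES (102), (103);
INSERT INTO user_earnings (id, user_id, amount) VALUES
  (1, 102, 250), (2, 102, 1000), (3, 101, 5000), (4, 102, 352),
  (18, 102, 375), (19, 102, 442), (20, 103, 338);
INSERT INTO user_payouts (id, user_id, payout_amount) VALUES
  (1, 102, 500), (2, 102, 100), (3, 101, 1000);
""")

# JOIN then GROUP BY: user 102's 5 earnings x 2 payouts become 10 rows,
# so every amount is counted more than once.
joined = conn.execute("""
    SELECT u.id, SUM(ue.amount) - COALESCE(SUM(up.payout_amount), 0)
    FROM user u
    LEFT JOIN user_earnings ue ON ue.user_id = u.id
    LEFT JOIN user_payouts up ON up.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Correlated subqueries: each table is summed on its own, so nothing inflates.
correct = conn.execute("""
    SELECT u.id,
           (SELECT COALESCE(SUM(amount), 0) FROM user_earnings WHERE user_id = u.id) -
           (SELECT COALESCE(SUM(payout_amount), 0) FROM user_payouts WHERE user_id = u.id)
    FROM user u
    ORDER BY u.id
""").fetchall()

print(joined)
print(correct)
```

For user 102 the joined version reports 1838 while the true balance is 2419 - 600 = 1819. User 103 comes out as 338 in both, which is why the problem is easy to miss on a small sample.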
Related
I have two table client and cash.
Table client:
ID Name ... other data
------------------------
1 Bob
2 Marry
3 Tom
Table cash:
ID Cash Id_client_fk
----------------------
1 500 1
2 500 3
3 500 3
4 500 1
I want to sum the cash for every client, even those that have no rows in table cash.
The foreign key Id_client_fk references client.ID.
You could use SUM() and a left join to achieve this:
SELECT u.ID, u.Name, SUM(c.Cash) cash FROM client u
LEFT JOIN cash c ON c.Id_client_fk = u.ID
GROUP BY u.ID
To replace the NULL values with 0 you could use the IF() function:
SELECT u.ID, u.Name, SUM(IF(c.Cash > 0, c.Cash, 0)) cash FROM client u
LEFT JOIN cash c ON c.Id_client_fk = u.ID
GROUP BY u.ID
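As a quick check of the LEFT JOIN approach, the sketch below loads the sample client and cash rows and uses COALESCE around the SUM, which is an alternative to the IF() form above. It runs on Python's built-in sqlite3 rather than MySQL, an assumption made so it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE cash (ID INTEGER PRIMARY KEY, Cash INTEGER, Id_client_fk INTEGER);
INSERT INTO client (ID, Name) VALUES (1, 'Bob'), (2, 'Marry'), (3, 'Tom');
INSERT INTO cash (ID, Cash, Id_client_fk) VALUES
  (1, 500, 1), (2, 500, 3), (3, 500, 3), (4, 500, 1);
""")

# LEFT JOIN keeps clients without cash rows; COALESCE turns their NULL sum into 0.
totals = conn.execute("""
    SELECT u.ID, u.Name, COALESCE(SUM(c.Cash), 0) AS cash
    FROM client u
    LEFT JOIN cash c ON c.Id_client_fk = u.ID
    GROUP BY u.ID, u.Name
    ORDER BY u.ID
""").fetchall()
print(totals)
```

Marry has no rows in cash, yet she still appears in the result with a total of 0.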
It is not quite clear. Please provide sample data and the result you want; just a few lines are enough to understand the idea.
If I understand correctly, you need something like this:
SELECT client.id, client.name, s.sum_c
FROM client
INNER JOIN (SELECT `id_client_fk`, SUM(`Cash`) sum_c
FROM `cash`
GROUP BY `id_client_fk`) s ON client.id = s.id_client_fk
ORDER BY 3 DESC
LIMIT 20;
I have two tables. One contains the user and company relationships, as shown below:
User_company
UserId CompanyId
1 2
2 1
3 1
4 2
Another table holds user information
User
Id Name City
1 Peter LA
2 Harry SF
3 John NY
4 Joe CI
How do I make a statement which will give me All the users which are in company 1? Will something like
Select * from User where Id in (Select UserId from User_company where CompanyId = 1)
work?
SELECT * from User
left join User_company on User_company.UserId=User.Id
This would work, but it can become sluggish over time as it may not scale well with more data:
SELECT *
FROM User
WHERE Id in (Select UserId from User_company where CompanyId = 1)
So would this, which is best if you need data from both tables:
SELECT *
FROM User U
INNER JOIN User_Company UC
ON U.ID = UC.UserID
WHERE UC.CompanyID = 1
As would this, which is probably the fastest if you just need data from the User table:
Select * from User U
where exists (Select * from User_Company UC where U.ID = UC.UserID and CompanyID = 1)
OUTER joins are only needed if you need all records from one table and only those that match in another.
As to which is the best above: it depends on existing indexes and other requirements. Any of the above will return what's been asked for.
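The claim that the IN, INNER JOIN and EXISTS forms return the same users can be checked with a short sketch. It loads the question's sample rows into Python's built-in sqlite3 (an assumption; the original targets a different database) and selects only the user columns so the three results are directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (Id INTEGER PRIMARY KEY, Name TEXT, City TEXT);
CREATE TABLE User_company (UserId INTEGER, CompanyId INTEGER);
INSERT INTO User (Id, Name, City) VALUES
  (1, 'Peter', 'LA'), (2, 'Harry', 'SF'), (3, 'John', 'NY'), (4, 'Joe', 'CI');
INSERT INTO User_company (UserId, CompanyId) VALUES (1, 2), (2, 1), (3, 1), (4, 2);
""")

with_in = conn.execute("""
    SELECT U.Id, U.Name FROM User U
    WHERE U.Id IN (SELECT UserId FROM User_company WHERE CompanyId = 1)
    ORDER BY U.Id
""").fetchall()

with_join = conn.execute("""
    SELECT U.Id, U.Name FROM User U
    INNER JOIN User_company UC ON U.Id = UC.UserId
    WHERE UC.CompanyId = 1
    ORDER BY U.Id
""").fetchall()

with_exists = conn.execute("""
    SELECT U.Id, U.Name FROM User U
    WHERE EXISTS (SELECT 1 FROM User_company UC
                  WHERE UC.UserId = U.Id AND UC.CompanyId = 1)
    ORDER BY U.Id
""").fetchall()

print(with_in, with_join, with_exists)
```

All three return Harry and John. The JOIN form would return a user more than once if User_company held duplicate rows for that user, which the IN and EXISTS forms do not.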
Try this
Select u.*
from User u
inner join User_company uc
on u.Id = uc.UserId
and uc.CompanyId = 1
BTW, what's wrong with the query you posted? It will work fine as well. It is just that it uses a subquery, and you may be better off replacing it with a join for performance.
Select * from User where Id in
(Select UserId from User_company where CompanyId = 1)
SELECT U.* FROM User AS U LEFT JOIN
User_company AS UC ON U.Id = UC.UserId WHERE UC.CompanyId = 1
My SQL is only returning one row when it should be returning one for each user.
Any idea where I'm going wrong? If you need additional information I can provide, but I'm just not sure where to go with this at the moment.
Here is my SQL:
SELECT uId, uForename, SUM(biProductPrice * biQuantity) AS uTotalSpent
FROM users
LEFT JOIN orders ON uId = ordUserId
LEFT JOIN basket ON ordUserId = bUserId
LEFT JOIN basketitems ON bId = biBasketId
WHERE ordStatus BETWEEN 4 AND 50
GROUP BY uId, uForename
any columns starting with u belong to the users table.
any columns starting with ord belong to the orders table.
any columns starting with b belong to the basket table.
any columns starting with bi belong to the basketitems table.
EDIT:
Everything now works fine except for my SUM. Only 2 rows have an ordStatus between 4 and 50, so they are the only ones that apply. One has a biQuantity of 8 and a biProductPrice of 100; the other has a biQuantity of 1 and a biProductPrice of 100. Why is it returning a value of 400?
Group by the user and the sum will be returned for each one:
SELECT users.uId, users.uForename, SUM(biProductPrice) AS uTotalSpent
FROM users
LEFT JOIN orders ON uId = ordUserId
LEFT JOIN basket ON ordUserId = bUserId
LEFT JOIN basketitems ON bId = biBasketId
WHERE ordStatus BETWEEN 4 AND 50
GROUP BY users.uId, users.uForename
This query suggests friendship based on how many words users have in common. in_common sets this threshold.
I was wondering if it was possible to make this query completely % based.
What I want is for a user to be suggested to the current user if 30% of their words match. For example:
current_user total words: 100
in_common threshold: 30
some_other_user total words: 10
3 of these match current_user's list.
Since 3 is 30% of 10, this is a match for the current user.
Possible?
SELECT users.name_surname, users.avatar, t1.qty, GROUP_CONCAT(words_en.word) AS in_common, (users.id) AS friend_request_id
FROM (
SELECT c2.user_id, COUNT(*) AS qty
FROM `connections` c1
JOIN `connections` c2
ON c1.user_id <> c2.user_id
AND c1.word_id = c2.word_id
WHERE c1.user_id = :user_id
GROUP BY c2.user_id
HAVING count(*) >= :in_common) as t1
JOIN users
ON t1.user_id = users.id
JOIN connections
ON connections.user_id = t1.user_id
JOIN words_en
ON words_en.id = connections.word_id
WHERE EXISTS(SELECT *
FROM connections
WHERE connections.user_id = :user_id
AND connections.word_id = words_en.id)
GROUP BY users.id, users.name_surname, users.avatar, t1.qty
ORDER BY t1.qty DESC, users.name_surname ASC
SQL fiddle: http://www.sqlfiddle.com/#!2/c79a6/9
OK, so the issue is that "users in common" is defined as an asymmetric relation. To fix it, let's assume that the in_common percentage threshold is checked against the user with the fewest words.
Try this query (fiddle), it gives you full list of users with at least 1 word in common, marking friendship suggestions:
SELECT user1_id, user2_id, user1_wc, user2_wc,
count(*) AS common_wc, count(*) / least(user1_wc, user2_wc) AS common_wc_pct,
CASE WHEN count(*) / least(user1_wc, user2_wc) > 0.7 THEN 1 ELSE 0 END AS friendship_suggestion
FROM (
SELECT u1.user_id AS user1_id, u2.user_id AS user2_id,
u1.word_count AS user1_wc, u2.word_count AS user2_wc,
c1.word_id AS word1_id, c2.word_id AS word2_id
FROM connections c1
JOIN connections c2 ON (c1.user_id < c2.user_id AND c1.word_id = c2.word_id)
JOIN (SELECT user_id, count(*) AS word_count
FROM connections
GROUP BY user_id) u1 ON (c1.user_id = u1.user_id)
JOIN (SELECT user_id, count(*) AS word_count
FROM connections
GROUP BY user_id) u2 ON (c2.user_id = u2.user_id)
) AS shared_words
GROUP BY user1_id, user2_id, user1_wc, user2_wc;
The friendship_suggestion column is in the SELECT for clarity. You probably need to filter by it, so you may just move its condition to a HAVING clause.
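For comparison, here is a minimal sketch of the rule exactly as the question states it: another user is suggested when the words shared with the current user make up at least 30% of that other user's own words. This is not the least()-based variant above. It runs on Python's built-in sqlite3, the data is hypothetical, and it assumes each (user_id, word_id) pair appears only once in connections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (user_id INTEGER, word_id INTEGER)")
rows = (
    [(1, w) for w in range(1, 6)]                                 # current user: words 1-5
    + [(2, w) for w in (1, 2, 3, 11, 12, 13, 14, 15, 16, 17)]     # shares 3 of 10 words
    + [(3, w) for w in (1, 21, 22, 23, 24, 25, 26, 27, 28, 29)]   # shares 1 of 10 words
)
conn.executemany("INSERT INTO connections (user_id, word_id) VALUES (?, ?)", rows)

# COUNT(mine.word_id) counts only words the current user also has;
# COUNT(*) counts all words of the other user. Integer math avoids rounding issues.
suggestions = conn.execute("""
    SELECT other.user_id,
           COUNT(mine.word_id) AS common_words,
           COUNT(*) AS total_words
    FROM connections AS other
    LEFT JOIN connections AS mine
           ON mine.word_id = other.word_id AND mine.user_id = :user_id
    WHERE other.user_id <> :user_id
    GROUP BY other.user_id
    HAVING COUNT(mine.word_id) * 100 >= COUNT(*) * :pct
    ORDER BY other.user_id
""", {"user_id": 1, "pct": 30}).fetchall()
print(suggestions)
```

User 2 shares 3 of 10 words (30%) and is suggested; user 3 shares 1 of 10 (10%) and is filtered out.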
I'll throw this option into your consideration. The first part of the FROM clause does nothing but get the one user you are using as the basis for finding all others with common words. Its WHERE clause restricts it to that one user (aliased OnePerson).
Then, add the connections table to the FROM clause without a join condition: since OnePerson will always be a single record, its total word count is available on every row. I didn't actually see how you worked your 100-to-30 threshold if another person only had 10 words and matched 3, so I think this count is bloat and unnecessary, as you'll see later in the WHERE clause applied to PreQuery.
So, the next table is the connections table (aliased c2), holding the words of each of the "other" people being considered.
This c2 is then joined to the connections table again, aliased OnesWords, on the common word id AND on the OnesWords user id being that of the primary user. OnesWords is in turn joined to the words table, so that if there is a match with the primary person we can grab that common word as part of the GROUP_CONCAT().
So now we grab the original single person's total words (still not sure you need it), a count of ALL the words for the other person, and a count (via SUM/CASE WHEN) of the words that are in common with the original person, grouped by the "other" user id. This result is aliased PreQuery.
From that, we can join to the users table to get the name and avatar along with the respective counts and common words, and apply a WHERE clause comparing each other user's total word count with their count of words in common with the first person (see, I didn't think you needed the original person's count as the basis of the percentage).
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
c2.user_id,
OnePerson.TotalWords,
COUNT(*) as OtherUserWords,
GROUP_CONCAT(words_en.word) AS InCommonWords,
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) as InCommonWithOne
from
( SELECT c1.user_id,
COUNT(*) AS TotalWords
from
`connections` c1
where
c1.user_id = :PrimaryPersonBasis
group by
c1.user_id ) OnePerson
CROSS JOIN `connections` c2
LEFT JOIN `connections` OnesWords
ON c2.word_id = OnesWords.word_id
AND OnesWords.user_id = OnePerson.User_ID
LEFT JOIN words_en
ON OnesWords.word_id = words_en.id
where
c2.user_id <> OnePerson.User_ID
group by
c2.user_id ) PreQuery
JOIN users u
ON PreQuery.user_id = u.id
where
PreQuery.InCommonWithOne >= PreQuery.OtherUserWords * :nPercentToConsider
order by
PreQuery.InCommonWithOne DESC,
u.name_surname
Here's a revised version WITHOUT the need to prequery the total original words of the first person.
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
c2.user_id,
COUNT(*) as OtherUserWords,
GROUP_CONCAT(words_en.word) AS InCommonWords,
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) as InCommonWithOne
from
`connections` c2
LEFT JOIN `connections` OnesWords
ON c2.word_id = OnesWords.word_id
AND OnesWords.user_id = :PrimaryPersonBasis
LEFT JOIN words_en
ON OnesWords.word_id = words_en.id
where
c2.user_id <> :PrimaryPersonBasis
group by
c2.user_id
having
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) >=
COUNT(*) * :nPercentToConsider ) PreQuery
JOIN users u
ON PreQuery.user_id = u.id
order by
PreQuery.InCommonWithOne DESC,
u.name_surname
There might be some tweaking needed on the query, but your original query leads me to believe you can easily fix simple things like alias or field name typos.
Another option might be to prequery ALL users and how many words each has up front, then use the primary person's words to compare against everyone else explicitly on those common words. This might be more efficient, as the multiple joins would work on the smaller result set. What if you have 10,000 users, user A has 30 words, and only 500 other users have one or more of those words in common? Why compare against all 10,000? A simple up-front summary of each user and their word count should be an almost instant query basis.
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
OtherUser.User_ID,
AllUsers.EachUserWords,
COUNT(*) as CommonWordsCount,
group_concat( words_en.word ) as InCommonWords
from
`connections` OneUser
JOIN words_en
ON OneUser.word_id = words_en.id
JOIN `connections` OtherUser
ON OneUser.word_id = OtherUser.word_id
AND OneUser.user_id <> OtherUser.user_id
JOIN ( SELECT
c1.user_id,
COUNT(*) as EachUserWords
from
`connections` c1
group by
c1.user_id ) AllUsers
ON OtherUser.user_id = AllUsers.User_ID
where
OneUser.user_id = :nPrimaryUserToConsider
group by
OtherUser.User_id,
AllUsers.EachUserWords ) as PreQuery
JOIN users u
ON PreQuery.user_id = u.id
where
PreQuery.CommonWordsCount >= PreQuery.EachUserWords * :nPercentToConsider
order by
PreQuery.CommonWordsCount DESC,
u.name_surname
May I suggest a different way to look at your problem?
You might look into a similarity metric, such as Cosine Similarity which will give you a much better measure of similarity between your users based on words. To understand it for your case, consider the following example. You have a vector of words A = {house, car, burger, sun} for a user u1 and another vector B = {flat, car, pizza, burger, cloud} for user u2.
Given these individual vectors you first construct another that positions them together so you can map to each user whether he/she has that word in its vector or not. Like so:
| -- | house | car | burger | sun | flat | pizza | cloud |
----------------------------------------------------------
| A | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
----------------------------------------------------------
| B | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
----------------------------------------------------------
Now you have a vector for each user where each position corresponds to the value of each word to each user. Here it represents a simple count but you can improve it using different metrics based on word frequency if that applies to your case. Take a look at the most common one, called tf-idf.
Having these two vectors, you can compute the cosine similarity between them as follows:
cos(A, B) = (A · B) / (|A| * |B|) = sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)))
This basically computes the sum of the products of each position of the two vectors, divided by the product of their magnitudes. In our example, that is 2 / (2 * sqrt(5)) ≈ 0.45, in a range that can vary between 0 and 1; the higher the value, the more similar the two vectors are.
If you choose to go this way, you don't need to do this calculation in the database. You compute the similarity in your code and just save the result in the database. There are several libraries that can do that for you. In Python, take a look at the numpy library. In Java, look at Weka and/or Apache Lucene.
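As an illustration of computing the similarity in code, here is a minimal sketch in plain Python using only the standard library (numpy would work equally well). The function name cosine_similarity is a name chosen for this example, and the word lists are the ones from the example above.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Shared vocabulary, in the same column order as the table above.
vocabulary = ["house", "car", "burger", "sun", "flat", "pizza", "cloud"]
words_a = {"house", "car", "burger", "sun"}
words_b = {"flat", "car", "pizza", "burger", "cloud"}

# 1 if the user has the word, 0 otherwise.
vec_a = [1 if word in words_a else 0 for word in vocabulary]
vec_b = [1 if word in words_b else 0 for word in vocabulary]

similarity = cosine_similarity(vec_a, vec_b)
print(round(similarity, 3))
```

The two users share "car" and "burger", so the dot product is 2, the magnitudes are 2 and sqrt(5), and the script prints 0.447.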
I will solely explain what's related to the question and omit any other unrelated details.
Current Situation:
I have two tables, coin and users.
coin has three fields id, uid, fid.
The coin table relates users (who are already registered in the system and can invite friends) to their friends (who have accepted the invitation and become members). The table only stores successful registrations made after an invitation.
id is unique index.
uid is to store users id.
fid is to store friends id (the friend who accepted the invitation and became a member of the system).
users has the usual info about users such as id, fname, lname, email ...etc and date_create
Objective:
To find the winners who have made the most invitations. In other words:
To find the top user(s) who have made the greatest number of invitations, where the invited friends must have registered before 2012-08-31. The date is stored in table 'users', column date_create. The date format is yyyy-mm-dd.
Example-
Table coin
id uid fid
1 333 777
2 444 888
3 555 999
4 333 123
5 444 456
6 333 789
Table users
id date_create
333 2012-07-15
444 2012-07-20
555 2012-07-25
777 2012-07-25
888 2012-07-25
999 2012-10-02 <-- I don't need this to be counted
123 2012-07-25
456 2012-07-25
789 2012-07-25
This means user 333 has the greatest number of invitations (3 invited users: 777, 123 and 789).
User 444 invited 2 users and user 555 invited only 1, but 555's invitation is not counted since his friend (999) registered after 2012-08-31.
What I want is:
User 333 has made 3 invitations before 2012-08-31.
User 444 has made 2 invitations before 2012-08-31.
What I have done so far (TOTALLY NOT SURE ABOUT THIS):
SELECT
c.uid,
u.fname,
u.lname,
u.phone,
u.email,
u.country,
COUNT(c.fid) AS NoOfFriends
FROM coin AS c
JOIN users AS u
ON c.uid = u.id
GROUP BY c.uid
ORDER BY NoOfFriends desc
This query brings (as far as I know) the user with most invitations regardless of when his/her friends have registered.
So my question is:
Q1) How to apply the date condition into my query?
I want the user who has the greatest number of friends invitation. His/her invited friends must have accepted the invitation and registered before 2012-08-31. Any other accepted invitations after that date should not be counted.
Please provide me with code examples.
A lot of the answers confused the actual user with the invited friend.
SELECT
c.uid,
info.fname,
info.lname,
info.phone,
info.email,
info.country,
info.date_create,
COUNT(c.fid) AS NoOfFriends
FROM coin AS c
JOIN users AS friend
ON c.fid = friend.id
LEFT JOIN users AS info
ON c.uid = info.id
WHERE friend.date_create < '20120831'
GROUP BY c.uid
EDIT: Fixed the date format.
You can use SUM in combination with CASE:
SUM(CASE WHEN <cond> THEN 1 ELSE 0 END) as NoOfFriends
The complete statement would look something like this:
SELECT
c.uid,
u.fname,
u.lname,
u.phone,
u.email,
u.country,
u.`date_create`,
SUM(CASE WHEN f.date_create < '2012-08-31' THEN 1 ELSE 0 END) AS NoOfFriends
FROM coin AS c
JOIN users AS u
ON c.uid = u.id
join users as f
on c.fid = f.id
GROUP BY c.uid, u.fname, u.lname, u.phone, u.email, u.country, u.`date_create`
ORDER BY NoOfFriends desc
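To see what the SUM(CASE WHEN ...) pattern does row by row, here is a small sketch that loads the question's sample coin and users rows and applies the question's cutoff of 2012-08-31. It runs on Python's built-in sqlite3 rather than MySQL (an assumption for portability), with dates stored as ISO text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coin (id INTEGER PRIMARY KEY, uid INTEGER, fid INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, date_create TEXT);
INSERT INTO coin (id, uid, fid) VALUES
  (1, 333, 777), (2, 444, 888), (3, 555, 999),
  (4, 333, 123), (5, 444, 456), (6, 333, 789);
INSERT INTO users (id, date_create) VALUES
  (333, '2012-07-15'), (444, '2012-07-20'), (555, '2012-07-25'),
  (777, '2012-07-25'), (888, '2012-07-25'), (999, '2012-10-02'),
  (123, '2012-07-25'), (456, '2012-07-25'), (789, '2012-07-25');
""")

# Each invitation contributes 1 if the friend registered before the cutoff, else 0.
counts = conn.execute("""
    SELECT c.uid,
           SUM(CASE WHEN f.date_create < '2012-08-31' THEN 1 ELSE 0 END) AS NoOfFriends
    FROM coin AS c
    JOIN users AS f ON c.fid = f.id
    GROUP BY c.uid
    ORDER BY NoOfFriends DESC, c.uid
""").fetchall()
print(counts)
```

User 555 still appears, with a count of 0, because the CASE expression keeps the row and adds 0 for friend 999. Filtering in a WHERE clause instead would drop user 555 from the result altogether.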
The count(expression) function counts the rows where the result of that expression is not null. If the expression just returns true or false all rows will be counted because both true and false are not null.
To use an expression with count you must make it return null if false:
COUNT(friend.date_create < '2012-08-31' or null) AS NoOfFriends
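The difference between counting a boolean expression and counting "expression OR NULL" can be verified with a two-row sketch. It uses Python's built-in sqlite3, which is an assumption; the original answer targets MySQL. The two friends are taken from the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (id INTEGER PRIMARY KEY, date_create TEXT)")
# One friend registered before the cutoff, one after.
conn.executemany(
    "INSERT INTO friends (id, date_create) VALUES (?, ?)",
    [(777, "2012-07-25"), (999, "2012-10-02")],
)

# The bare comparison is 1 or 0, never NULL, so COUNT counts every row.
# "OR NULL" turns a false comparison into NULL, which COUNT skips.
count_plain, count_or_null = conn.execute("""
    SELECT COUNT(date_create < '2012-08-31'),
           COUNT(date_create < '2012-08-31' OR NULL)
    FROM friends
""").fetchone()
print(count_plain, count_or_null)
```

The first count is 2 (both rows) and the second is 1 (only friend 777).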
Your method of grouping only by c.uid will work in MySQL, but you will come unstuck with other RDBMSs.
In this situation I would move the count to a subquery. In my opinion it is easier to read, and it nicely separates the query into the logic you are looking to apply, i.e. get all invites where the friend registered before a certain date, then join back to the user that invited them.
SELECT u.ID,
u.fname,
u.lname,
u.phone,
u.email,
u.country,
u.`date_create`,
COALESCE(FriendCount, 0) AS FriendCount,
COALESCE(FriendCount2, 0) AS FriendCount2
FROM users AS u
LEFT JOIN
( SELECT coin.UID,
COUNT(coin.fid) AS FriendCount,
COUNT(CASE WHEN Users.Date_Create < '20120831' THEN coin.fid END) AS FriendCount2
FROM coin
INNER JOIN Users
ON coin.FID = users.ID
GROUP BY coin.UID
) c
ON c.uid = u.id
ORDER BY FriendCount2 DESC
If you do not care about total friends, only those who registered before 31st August, you can simplify this to
SELECT u.ID,
u.fname,
u.lname,
u.phone,
u.email,
u.country,
u.`date_create`,
COALESCE(FriendCount, 0) AS FriendCount
FROM users AS u
LEFT JOIN
( SELECT coin.UID,
COUNT(coin.fid) AS FriendCount
FROM coin
INNER JOIN Users
ON coin.FID = users.ID
WHERE Users.Date_Create < '20120831'
GROUP BY coin.UID
) c
ON c.uid = u.id
ORDER BY FriendCount DESC
This is my answer. Thanks to everyone who tried and helped, especially to LastCoder, who rapidly provided an actual solution.
SELECT
c.uid,
info.id,
info.fname,
info.lname,
info.phone,
info.email,
info.country,
COUNT(DISTINCT c.fid) AS NoOfFriends -- modified this to get unique invitations
FROM coin c
JOIN users AS friends
ON c.fid = friends.id
LEFT JOIN users AS info
ON c.uid = info.id
WHERE friends.date_create < '20120831'
GROUP BY c.uid
ORDER BY NoOfFriends DESC
LIMIT 10
Honestly, I did not use the other solutions because I don't clearly understand SUM(CASE WHEN ...).