SQL Sum Query Behaving Strangely?

I'm having trouble getting this SQL query to work properly.
I have the following query:
SELECT apps.*,
SUM(IF(adtracking.appId = apps.id AND adtracking.id = transactions.adTrackingId, transactions.payoutAmount, 0)) AS 'revenue',
SUM(IF(adtracking.appId = apps.id AND adtracking.type = 'impression', 1, 0)) AS 'impressions'
FROM apps, adtracking, transactions
WHERE apps.userId = '$userId'
GROUP BY apps.id
Everything works, HOWEVER the 'impressions' column I am generating in the query comes out WAY larger than it should be. For example, one matching app should have only 72 impressions, yet the query returns over 3,000, even though the adtracking table doesn't have that many rows. Why is this? What is wrong here?

Your problem is that you have no join conditions, so every row of each table is joined to every row of the other tables in your query result. This is called a cartesian product.
To fix, change your FROM clause to this:
FROM apps
LEFT JOIN adtracking ON adtracking.appId = apps.id
LEFT JOIN transactions ON transactions.adTrackingId = adtracking.id
You haven't provided the schema for your tables, so I guessed the names of the relevant columns; you may have to adjust them. Also, your transactions table may join to adtracking differently than I've guessed. It's impossible to know from your question, so again you may have to alter things slightly. Hopefully you get the idea.
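To see the effect, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for MySQL. SQLite has no IF(), so CASE WHEN is used instead. The table and column names follow the guesses in this answer, and the tiny data set is made up for illustration.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apps (id INTEGER PRIMARY KEY, userId INTEGER);
CREATE TABLE adtracking (id INTEGER PRIMARY KEY, appId INTEGER, type TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, adTrackingId INTEGER, payoutAmount REAL);
INSERT INTO apps VALUES (1, 7);
INSERT INTO adtracking VALUES (1, 1, 'impression'), (2, 1, 'impression'), (3, 1, 'click');
INSERT INTO transactions VALUES (1, 3, 0.50), (2, 3, 0.25), (3, 3, 1.00), (4, 3, 0.75);
""")

impressions = ("SUM(CASE WHEN adtracking.appId = apps.id "
               "AND adtracking.type = 'impression' THEN 1 ELSE 0 END)")

# Comma join with no join conditions: apps rows x adtracking rows x transactions rows.
cartesian = conn.execute(f"""
    SELECT apps.id, {impressions} AS impressions
    FROM apps, adtracking, transactions
    WHERE apps.userId = 7
    GROUP BY apps.id
""").fetchone()

# Explicit joins: each adtracking row is paired only with its own transactions.
joined = conn.execute(f"""
    SELECT apps.id, {impressions} AS impressions
    FROM apps
    LEFT JOIN adtracking ON adtracking.appId = apps.id
    LEFT JOIN transactions ON transactions.adTrackingId = adtracking.id
    WHERE apps.userId = 7
    GROUP BY apps.id
""").fetchone()

print(cartesian)  # (1, 8): 2 impression rows, each repeated for all 4 transactions
print(joined)     # (1, 2)
```

If a single adtracking row can match several transactions, the impressions SUM would be inflated again even in the joined version, so counting impressions in a separate subquery is the safer design.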
Edit:
Note: your GROUP BY clause is incorrect. You either need to list every column of apps in the GROUP BY (not recommended), or change your select to only select the id column from apps (recommended). Change your select to this:
SELECT apps.id,
-- rest of query the same
Otherwise you'll get weird, incorrect results.

Related

Complex MySQL query problems and also SQL hangs

I am trying to write an SQL query which is pretty complex. The requirements are as follows:
I need to return these fields from the query:
track.artist
track.title
track.seconds
track.track_id
track.relative_file
album.image_file
album.album
album.album_id
track.track_number
I can select a random track with the following query:
select
track.artist, track.title, track.seconds, track.track_id,
track.relative_file, album.image_file, album.album,
album.album_id, track.track_number
FROM
track, album
WHERE
album.album_id = track.album_id
ORDER BY RAND() limit 10;
Here is where I am having trouble, though. I also have tables called "trackfilters1" through "trackfilters10". Each row has an auto-incrementing ID field, so row 10 holds the data for album_id 10. The flags field is populated with 1s and 0s. For example, if album #10 has 10 tracks, trackfilters1.flags will contain "1111111111" if all tracks are to be included in the search. If track 10 is to be excluded, it will contain "1111111110".
My problem is incorporating this filter into the query.
The latest query I have come up with is the following:
select
track.artist, track.title, track.seconds,
track.track_id, track.relative_file, album.image_file,
album.album, album.album_id, track.track_number
FROM
track, album, trackfilters1, trackfilters2
WHERE
album.album_id = track.album_id
AND
( (album.album_id = trackfilters1.id)
OR
(album.album_id=trackfilters2.id) )
AND
( (mid(trackfilters1.flags, track.track_number,1) = 1)
OR
( mid(trackfilters2.flags, track.track_number,1) = 1))
ORDER BY RAND() limit 2;
However, this causes MySQL to hang. I presume I'm doing something wrong. Does anybody know what it is? I'm open to suggestions if there is an easier way to achieve my end result; I'm not set on repairing my broken query if there's a better way to accomplish this.
Additionally, in my trials I noticed that when I had a working query and added, say, trackfilters2 to the FROM clause without using it anywhere in the query, it would hang as well. This makes me wonder: is this correct behavior? I would have thought that adding a table to the FROM list without using its data would just make the server fetch more data; I wouldn't have expected it to hang.

There's not enough information here to determine what's causing the performance issue.
But here are a few suggestions and comments.
Ditch the old-school comma syntax for the join operations, and use the JOIN keyword instead. And relocate the join predicates to an ON clause.
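The question's observation that merely listing trackfilters2 in the FROM clause makes the query hang has a simple explanation: a table added with comma syntax and no join predicate is cross-joined, so the row count is multiplied by the number of rows in that table, and ORDER BY RAND() then has to process every one of those rows. A minimal sketch using Python's sqlite3 module (SQLite standing in for MySQL, row counts made up):

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (album_id INTEGER PRIMARY KEY);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER);
CREATE TABLE trackfilters2 (id INTEGER PRIMARY KEY, flags TEXT);
""")
# 100 albums with 10 tracks each, and one filter row per album.
conn.executemany("INSERT INTO album VALUES (?)", [(a,) for a in range(1, 101)])
conn.executemany("INSERT INTO track (album_id) VALUES (?)",
                 [(a,) for a in range(1, 101) for _ in range(10)])
conn.executemany("INSERT INTO trackfilters2 VALUES (?, ?)",
                 [(a, "1" * 10) for a in range(1, 101)])

def count_rows(sql):
    """Return how many rows the given SELECT produces."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

base = count_rows("SELECT track.track_id FROM track, album "
                  "WHERE album.album_id = track.album_id")
# trackfilters2 is listed but never referenced, so it is cross-joined with every row.
unused = count_rows("SELECT track.track_id FROM track, album, trackfilters2 "
                    "WHERE album.album_id = track.album_id")
print(base, unused)  # 1000 100000
```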
And for heaven's sake, format the SQL so that it's decipherable by someone trying to read it.
There are some questions here. Will there always be a matching row in both trackfilters1 and trackfilters2 for the rows you want to return? Or could a row be missing from trackfilters2, and you still want to return the row if there's a matching row in trackfilters1? (The answer determines whether you'd want an outer join or an inner join to those tables.)
For best performance with large sets, having appropriate indexes defined is going to be critical.
Use EXPLAIN to see the execution plan.
I suggest you try writing your query like this:
SELECT track.artist
, track.title
, track.seconds
, track.track_id
, track.relative_file
, album.image_file
, album.album
, album.album_id
, track.track_number
FROM track
JOIN album
ON album.album_id = track.album_id
LEFT
JOIN trackfilters1
ON trackfilters1.id = album.album_id
LEFT
JOIN trackfilters2
ON trackfilters2.id = album.album_id
WHERE MID(trackfilters1.flags, track.track_number, 1) = '1'
OR MID(trackfilters2.flags, track.track_number, 1) = '1'
ORDER BY RAND()
LIMIT 2
And if you want help with performance, provide the output from EXPLAIN, and what indexes are defined.
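Here is a small sketch of how the suggested LEFT JOIN query applies the flags strings, using Python's sqlite3 module with made-up data in place of MySQL. SQLite has no MID(), so SUBSTR() is used; in MySQL, MID(str, pos, len) is a synonym for SUBSTRING(str, pos, len). ORDER BY RAND() and LIMIT are left out so the output is deterministic.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (album_id INTEGER PRIMARY KEY, album TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER,
                    track_number INTEGER, title TEXT);
CREATE TABLE trackfilters1 (id INTEGER PRIMARY KEY, flags TEXT);
CREATE TABLE trackfilters2 (id INTEGER PRIMARY KEY, flags TEXT);
INSERT INTO album VALUES (1, 'A'), (2, 'B');
INSERT INTO track VALUES
  (1, 1, 1, 'A1'), (2, 1, 2, 'A2'), (3, 1, 3, 'A3'),
  (4, 2, 1, 'B1'), (5, 2, 2, 'B2');
-- Album 1: track 3 excluded by filter 1. Album 2 has no trackfilters1 row at all.
INSERT INTO trackfilters1 VALUES (1, '110');
INSERT INTO trackfilters2 VALUES (1, '000'), (2, '01');
""")

rows = conn.execute("""
    SELECT track.title
    FROM track
    JOIN album ON album.album_id = track.album_id
    LEFT JOIN trackfilters1 ON trackfilters1.id = album.album_id
    LEFT JOIN trackfilters2 ON trackfilters2.id = album.album_id
    -- The character at position track_number decides whether the track is included.
    WHERE SUBSTR(trackfilters1.flags, track.track_number, 1) = '1'
       OR SUBSTR(trackfilters2.flags, track.track_number, 1) = '1'
    ORDER BY track.track_id
""").fetchall()
titles = [r[0] for r in rows]
print(titles)  # ['A1', 'A2', 'B2']
```

For album 2, the missing trackfilters1 row makes the first SUBSTR() comparison NULL, so only the trackfilters2 flags decide which tracks are returned.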

something wrong with the result of mysql query with joins and select

Good day,
I am trying to join 3 tables for my inventory report but I am getting weird results out of it.
My query:
SELECT i_inventory.xid,
count(x_transaction_details.xitem) AS occurrence,
i_inventory.xitem AS itemName,
SUM(i_items_group.or_qty) AS `openingQty`,
avg(x_transaction_details.cost) AS avg_cost,
SUM(x_transaction_details.qty) AS totalNumberSold,
SUM(i_items_group.or_qty) - SUM(x_transaction_details.qty) AS totalRemQty
FROM x_transaction_details
LEFT JOIN i_inventory ON x_transaction_details.xitem = i_inventory.xid
LEFT JOIN i_items_group ON i_inventory.xid = i_items_group.xitem
WHERE (x_transaction_details.date_at BETWEEN '2015-01-18 03:14:54' AND '2015-10-18 03:14:54')
AND i_inventory.xid = 3840
GROUP BY x_transaction_details.xitem
ORDER BY occurrence DESC
This query gives me an openingQty of 3269.
I then wrote a simpler query to verify that result.
Here's my query for checking openingQty, joining only two tables: i_items_group (where batches are stored) and i_inventory (where item information is stored).
SELECT i_inventory.xid,
i_inventory.xitem,
SUM(i_items_group.or_qty) AS openingQty,
i_items_group.cost
FROM i_inventory
INNER JOIN i_items_group ON i_inventory.xid = i_items_group.xitem
WHERE i_inventory.xid = 3840
AND (i_items_group.date_at BETWEEN '2015-01-18 03:14:54' AND '2015-10-18 03:14:54')
This returned an openingQty of 467, which is the correct value.
I also queried my x_transaction_details table to verify whether it is correct.
Here's my query:
select xitem, qty as qtySold from x_transaction_details where xitem = 3840
AND (date_at BETWEEN '2015-01-18 03:14:54' AND '2015-10-18 03:14:54')
The qtySold values total 15.
I'm confused about how my query produced 3269 when the true openingQty should be only 467.
I suspect the problem is in my query's joins: somehow the number of transactions gets mixed into the sum (I really don't know, though).
Can you please help me identify the problem and come up with the correct query?

This is a common problem with multiple SUM statements in a single query. Keep in mind how SQL does aggregation: first it generates a set of data that is not aggregated, then it aggregates it. Try your query without the GROUP BY or aggregate functions, and you'll be surprised what you turn up. There aren't enough of the right details in your post for me to determine where the breakdown is, but I can guess.
It looks like you have an xitem corresponding to some kind of product, then you have joined that to both transactions and items groups. Suppose a particular xitem matches with 3 transactions and 5 item groups. You'll get 15 records from that join. And when you sum it, any SUM calculations based on fields from the transaction table will be 5x higher than you expect, and any SUM calculations from the item groups table will be 3x higher than you expect. The key symptom here is the aggregate result being a multiple of the correct value, but seemingly different multiples for different rows.
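The multiplication described above can be reproduced in a few lines. This sketch uses Python's sqlite3 module as a stand-in for MySQL, with a simplified, hypothetical schema: one item joined to 3 transaction rows and 5 item-group rows. The numbers in the question fit the same pattern: 3269 = 7 × 467, which is what you would see if item 3840 matched seven transaction rows (an inference, since the individual rows are not shown).

```python
import sqlite3

# Hypothetical, simplified schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i_inventory (xid INTEGER PRIMARY KEY);
CREATE TABLE x_transaction_details (id INTEGER PRIMARY KEY, xitem INTEGER, qty INTEGER);
CREATE TABLE i_items_group (id INTEGER PRIMARY KEY, xitem INTEGER, or_qty INTEGER);
INSERT INTO i_inventory VALUES (1);
-- 3 transactions (true qty total 6) and 5 item groups (true or_qty total 50).
INSERT INTO x_transaction_details (xitem, qty) VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO i_items_group (xitem, or_qty) VALUES (1, 10), (1, 10), (1, 10), (1, 10), (1, 10);
""")

joined_rows, opening_qty, qty_sold = conn.execute("""
    SELECT COUNT(*), SUM(i_items_group.or_qty), SUM(x_transaction_details.qty)
    FROM x_transaction_details
    LEFT JOIN i_inventory ON x_transaction_details.xitem = i_inventory.xid
    LEFT JOIN i_items_group ON i_inventory.xid = i_items_group.xitem
    WHERE i_inventory.xid = 1
""").fetchone()

print(joined_rows)  # 15 rows: 3 transactions x 5 item groups
print(opening_qty)  # 150, i.e. 3 x the true 50
print(qty_sold)     # 30, i.e. 5 x the true 6
```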
There are multiple ways to address this kind of error. Some developers like to calculate one of the aggregates in a subquery, then do the other aggregate in the main query and group by the already-correct result from the subquery. Others like to write correlated subqueries that compute the aggregate right in the select expression:
SELECT i_inventory.xid, (SELECT SUM(i_items_group.or_qty) FROM i_items_group WHERE i_items_group.xitem = i_inventory.xid) AS `openingQty`
-- , select more fields here
FROM i_inventory
WHERE i_inventory.xid = 3840
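A sketch of the subquery approach, using Python's sqlite3 module in place of MySQL with a simplified, hypothetical schema: each table is aggregated on its own in a derived table, so the two SUMs can no longer multiply each other. The derived-table names tx and grp are made up for this example.

```python
import sqlite3

# Hypothetical, simplified schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i_inventory (xid INTEGER PRIMARY KEY, xitem TEXT);
CREATE TABLE x_transaction_details (id INTEGER PRIMARY KEY, xitem INTEGER, qty INTEGER);
CREATE TABLE i_items_group (id INTEGER PRIMARY KEY, xitem INTEGER, or_qty INTEGER);
INSERT INTO i_inventory VALUES (1, 'Widget');
INSERT INTO x_transaction_details (xitem, qty) VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO i_items_group (xitem, or_qty) VALUES (1, 10), (1, 10), (1, 10), (1, 10), (1, 10);
""")

row = conn.execute("""
    SELECT i_inventory.xid,
           i_inventory.xitem AS itemName,
           tx.occurrence,
           grp.openingQty,
           tx.totalNumberSold,
           grp.openingQty - tx.totalNumberSold AS totalRemQty
    FROM i_inventory
    -- Aggregate transactions per item before joining.
    JOIN (SELECT xitem, COUNT(*) AS occurrence, SUM(qty) AS totalNumberSold
          FROM x_transaction_details
          GROUP BY xitem) AS tx ON tx.xitem = i_inventory.xid
    -- Aggregate item groups per item before joining.
    LEFT JOIN (SELECT xitem, SUM(or_qty) AS openingQty
               FROM i_items_group
               GROUP BY xitem) AS grp ON grp.xitem = i_inventory.xid
    WHERE i_inventory.xid = 1
""").fetchone()
print(row)  # (1, 'Widget', 3, 50, 6, 44)
```

The question's date-range filters would go inside the WHERE clause of the relevant derived table, so each aggregate only sees rows from that period.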
Find what approach works best for you. But if you want to see the evidence for yourself, run this query with the aggregates gone and you'll see why those SUMs are doing what they are doing:
SELECT i_inventory.xid,
x_transaction_details.xitem AS occurrence,
i_inventory.xitem AS itemName,
i_items_group.or_qty,
x_transaction_details.cost,
x_transaction_details.qty,
i_items_group.or_qty - x_transaction_details.qty AS RemainingQty
FROM x_transaction_details
LEFT JOIN i_inventory ON x_transaction_details.xitem = i_inventory.xid
LEFT JOIN i_items_group ON i_inventory.xid = i_items_group.xitem
WHERE (x_transaction_details.date_at BETWEEN '2015-01-18 03:14:54' AND '2015-10-18 03:14:54')
AND i_inventory.xid = 3840
ORDER BY occurrence DESC

I want to fetch the cost for orders between January and December 2014 using a CASE expression, but I am getting NULL instead of the cost. Here is my code:

SELECT dbo.postst.cost as Cost2014,
case when (dbo.InvNum.OrderDate) between '2014/01/01' and '2014/12/31' then dbo.Postst.Cost end as Cost2014
from dbo.postst INNER JOIN dbo.invnum on dbo.invnum.autoindex = dbo.postst.accountLink

Here is the technique I use for tracking down an issue like this:
SELECT dbo.postst.cost as Cost2014,
dbo.InvNum.OrderDate,
case when (dbo.InvNum.OrderDate) between '2014/01/01' and '2014/12/31' then dbo.Postst.Cost end as Cost2014
from dbo.postst
left JOIN dbo.invnum on dbo.invnum.autoindex = dbo.postst.accountLink
So I add the column I am processing, to see the values I am actually working with. Then I change the inner join to a left join (temporarily). When I run the select, I can usually see why the values don't meet my expectations and why my query is not returning the correct results.
In this case, you may not have any data in the right range or the join might be incorrect and thus no records are picked up at all.
What you have is a relatively simple query. If it were more complex, I would often use SELECT * instead of the specific columns, just to see whether something in the other columns is affecting the results. This is often the case when you have a one-to-many relationship and want to get only one record, but are getting duplicates in the fields you selected.
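The same diagnostic can be tried on a toy data set. This sketch uses Python's sqlite3 module in place of SQL Server, drops the dbo schema prefix, and uses made-up rows with ISO date strings ('2014-01-01') instead of '2014/01/01'. It shows that a CASE expression with no ELSE yields NULL for any row whose OrderDate falls outside the range, and that the LEFT JOIN keeps cost rows with no matching invnum row, where OrderDate shows up as NULL.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE postst (id INTEGER PRIMARY KEY, accountLink INTEGER, cost REAL);
CREATE TABLE invnum (autoindex INTEGER PRIMARY KEY, OrderDate TEXT);
INSERT INTO invnum VALUES (1, '2014-06-15'), (2, '2015-02-01');
-- accountLink 3 has no matching invnum row.
INSERT INTO postst (accountLink, cost) VALUES (1, 100.0), (2, 200.0), (3, 300.0);
""")

rows = conn.execute("""
    SELECT postst.cost AS Cost,
           invnum.OrderDate,
           CASE WHEN invnum.OrderDate BETWEEN '2014-01-01' AND '2014-12-31'
                THEN postst.cost END AS Cost2014   -- no ELSE, so NULL otherwise
    FROM postst
    LEFT JOIN invnum ON invnum.autoindex = postst.accountLink
    ORDER BY postst.id
""").fetchall()
for r in rows:
    print(r)
# (100.0, '2014-06-15', 100.0)
# (200.0, '2015-02-01', None)
# (300.0, None, None)
```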

Trouble with Joins and Counts in MySQL

I am confused by this MySQL SELECT query. I get the correct information back, except that COUNT(messages) and COUNT(project_ideas) come back twice as large as they should be.
SELECT
create_project.title,
image1,
create_project.description,
create_project.date,
create_project.active,
create_project.completed,
create_project.project_id,
categories.name,
messages.receiver_read,
project_ideas.project_id,
COUNT(messages.ideas_id) AS num_of_messages,
COUNT(project_ideas.ideas_id) AS num_of_ideas
FROM
create_project
LEFT JOIN project_ideas ON create_project.project_id = project_ideas.project_id
LEFT JOIN messages ON messages.project_id = create_project.project_id
JOIN categories ON create_project.category = categories.category_id
WHERE
create_project.user_id = {$_SESSION['user']['user_id']}
AND create_project.active = 1
AND create_project.completed = 1
GROUP BY project_ideas.project_id
ORDER BY create_project.date ASC
Any help would be appreciated thanks.

If there is more than one row in your create_project table that matches to a single row in your messages table, then the row in messages will show up once for each matching row in create_project. Additionally, since you have many joins, you have many places for duplicate rows to show up. If a project belongs to more than one category, for example, your join against categories will result in every row from the other tables being duplicated for each category that a project belongs to. I'd wager this is actually the source of your error. And what makes it so insidious is that the GROUP BY hides the duplication everywhere except in functions that do counting and summing.
@Wrikken's comment is correct and useful. If you remove the GROUP BY, you'll see every row included in the count. There you should see that rows from the messages table are repeated. As @Wrikken also said, you can mitigate this by using COUNT(DISTINCT ...). I would try, however, to make sure your joins are correct or that your table data is correct, before papering over the problem with a COUNT(DISTINCT ...). That is to say, make sure that COUNT(DISTINCT ...) really makes logical sense in terms of the data you are looking for.
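A quick way to see the duplication and the COUNT(DISTINCT ...) mitigation, using Python's sqlite3 module as a stand-in for MySQL. The schema is simplified and hypothetical (messages gets its own message_id key here). With two ideas and two messages on one project, the two LEFT JOINs produce 2 × 2 = 4 rows, so both plain counts come back twice as large.

```python
import sqlite3

# Hypothetical, simplified schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE create_project (project_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE project_ideas (ideas_id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE messages (message_id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO create_project VALUES (1, 'Demo');
INSERT INTO project_ideas VALUES (10, 1), (11, 1);
INSERT INTO messages VALUES (20, 1), (21, 1);
""")

query = """
    SELECT create_project.project_id,
           COUNT(*) AS joined_rows,
           COUNT(messages.message_id) AS num_of_messages,
           COUNT(project_ideas.ideas_id) AS num_of_ideas,
           COUNT(DISTINCT messages.message_id) AS distinct_messages,
           COUNT(DISTINCT project_ideas.ideas_id) AS distinct_ideas
    FROM create_project
    LEFT JOIN project_ideas ON project_ideas.project_id = create_project.project_id
    LEFT JOIN messages ON messages.project_id = create_project.project_id
    GROUP BY create_project.project_id
"""
result = conn.execute(query).fetchone()
print(result)  # (1, 4, 4, 4, 2, 2)
```

Grouping by create_project.project_id rather than project_ideas.project_id also avoids collapsing every project that has no ideas into a single NULL group.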
Unrelated to your actual question, I have to point out something that I see (and have done myself before I knew better). Although MySQL lets you include columns in your select list that are not in the GROUP BY or in an aggregate function (e.g., COUNT()), it's bad practice to do so. The values returned for those columns are technically indeterminate (see: http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html). I think MySQL is wrong to allow this, but it's not my call. Other database systems would flag this as an error.

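Try this (window functions require MySQL 8.0 or later):
COUNT(messages.ideas_id) OVER(PARTITION BY messages.project_id) AS num_of_messages,
COUNT(project_ideas.ideas_id) OVER(PARTITION BY project_ideas.project_id) AS num_of_ideas
Note that a window function does not remove the duplicate rows produced by the joins; without a GROUP BY, it still counts each duplicated row.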

Removing duplicate rows in a complex MySQL Query - not sure if I'm doing this wrong, or it's a MySQL bug

I have a reasonably complex MySQL query being run on another developer's database. I am trying to copy over his data to our new database structure, so I'm running this query to get a load of the data over to copy. The main table has around 45,000 rows.
As you can see from the query below, there's a lot of fields from several different tables. My problem is that the field Ref.refno (as ref_id) is being pulled through, in some cases, two or three times. This is because in the table LandlordOnlineRef (LLRef) there are sometimes multiple rows with this same reference number - in this case, because the row should have been edited, but instead was duplicated...
Here's what I've tried:
- SELECT DISTINCT(Ref.refno) [...]: this makes no difference to the output at all, although I would have assumed it would stop selecting duplicate refno IDs.
- Adding GROUP BY ref_id to the end of my query. The query normally takes a few milliseconds to run, but with the GROUP BY it seems to run forever; I waited several minutes and nothing happened. I thought it might be struggling because I'm using LIMIT 1000, so I also tried LIMIT 10, but I get the same effect.
Is this a MySQL bug, or me?
Here's the problem query - thanks!
SELECT
-- progress
Ref.refno AS ref_id,
Ref.tenantid AS tenant_id,
Ref.productid AS product_id,
Ref.guarantorid AS guarantor_id,
Ref.agentid AS agent_id,
Ref.companyid AS company_id,
Ref.status AS status,
Ref.startdate AS ref_start_date,
Ref.enddate AS ref_end_date,
-- ReferenceDetails
RefDetails.creditscore AS credit_score,
-- LandlordOnlineRef
LLRef.propaddress AS prev_ll_address,
LLRef.rent AS prev_ll_rent,
LLRef.startdate AS prev_ll_start_date,
LLRef.enddate AS prev_ll_end_date,
LLRef.arrears AS prev_ll_arrears,
LLRef.arrearsreason AS prev_ll_arrears_reason,
LLRef.propertycondition AS prev_ll_property_condition,
LLRef.conditionreason AS prev_ll_condition_reason,
LLRef.consideragain AS prev_ll_consider_again,
LLRef.completedby AS prev_ll_completed_by,
LLRef.contactno AS prev_ll_contact_no,
LLRef.landlordagent AS prev_ll_or_agent,
-- EmpDetails
EmpRef.cempname AS emp_name,
EmpRef.cempadd1 AS emp_address_1,
EmpRef.cempadd2 AS emp_address_2,
EmpRef.cemptown AS emp_address_town,
EmpRef.cempcounty AS emp_address_county,
EmpRef.cemppostcode AS emp_address_postcode,
EmpRef.ctelephone AS emp_telephone,
EmpRef.cemail AS emp_email,
EmpRef.ccontact AS emp_contact,
EmpRef.cgross AS emp_income,
EmpRef.cyears AS emp_years,
EmpRef.cmonths AS emp_months,
EmpRef.cposition AS emp_position,
-- EmpLlodReference
ELRef.lod_ref_status AS prev_ll_status,
ELRef.lod_ref_email AS prev_ll_email,
ELRef.lod_ref_tele AS prev_ll_telephone,
ELRef.emp_ref_status AS emp_status,
ELRef.emp_ref_tele AS emp_telephone,
ELRef.emp_ref_email AS emp_email
FROM ReferenceDetails AS RefDetails
LEFT JOIN progress AS Ref ON Ref.refno
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
LEFT JOIN EmpLlodReference AS ELRef ON ELRef.refno = Ref.refno
LEFT JOIN EmpDetails AS EmpRef ON EmpRef.tenantid = Ref.tenantid
-- For testing purposes to speed things up, limit it to 1000 rows
LIMIT 1000

LEFT JOIN progress AS Ref ON Ref.refno
is basically going to turn that into a cartesian join. You're not doing an explicit comparison; you're saying "join every record where Ref.refno evaluates as true", which in MySQL means any non-NULL, non-zero value.
Shouldn't it be
LEFT JOIN progress AS Ref ON Ref.refno = RefDetails.something
?
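To see why, here is a sketch using Python's sqlite3 module in place of MySQL; both treat a bare column in an ON clause as a true/false test. The tables are trimmed down, and the RefDetails.refno join column is an assumption, since the question does not show how ReferenceDetails links to progress.

```python
import sqlite3

# Hypothetical, trimmed-down schema and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ReferenceDetails (refno INTEGER, creditscore INTEGER);
CREATE TABLE progress (refno INTEGER, status TEXT);
INSERT INTO ReferenceDetails VALUES (1, 500), (2, 600), (3, 700);
INSERT INTO progress VALUES (1, 'open'), (2, 'done'), (3, 'open');
""")

# ON Ref.refno only asks "is refno true?", so every progress row matches every detail row.
bare = conn.execute("""
    SELECT COUNT(*) FROM ReferenceDetails AS RefDetails
    LEFT JOIN progress AS Ref ON Ref.refno
""").fetchone()[0]

# An explicit comparison pairs each detail row with its own progress row.
explicit = conn.execute("""
    SELECT COUNT(*) FROM ReferenceDetails AS RefDetails
    LEFT JOIN progress AS Ref ON Ref.refno = RefDetails.refno
""").fetchone()[0]

print(bare, explicit)  # 9 3
```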

DISTINCT is not a function: it applies to all of the selected columns, separated by commas, not just to the one in parentheses. If you want to keep the renaming, wrap your query in another select: SELECT DISTINCT * FROM (YOUR_SELECT) AS t.
Are there indexes on the columns in the GROUP BY clause? LIMIT is applied after GROUP BY, so a smaller LIMIT does not reduce the work needed to build the groups.
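A minimal sketch of why SELECT DISTINCT(Ref.refno), ... did not remove the duplicates: the parentheses just wrap one expression, and DISTINCT compares entire rows, so two LandlordOnlineRef rows that share a refno but differ in any other selected column both survive. Python's sqlite3 module stands in for MySQL, and the rows are made up.

```python
import sqlite3

# Hypothetical data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LandlordOnlineRef (refno INTEGER, propaddress TEXT);
-- refno 1: one exact duplicate, plus a copy with an edited address.
INSERT INTO LandlordOnlineRef VALUES
  (1, '1 High St'), (1, '1 High St'), (1, '1 High Street'), (2, '5 Low Rd');
""")

rows = conn.execute("""
    SELECT DISTINCT(refno), propaddress FROM LandlordOnlineRef ORDER BY 1, 2
""").fetchall()
# The exact duplicate collapses, but the row with a different address remains.
print(rows)  # [(1, '1 High St'), (1, '1 High Street'), (2, '5 Low Rd')]
```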

As a general rule, never group by more columns than you have to. Use a subquery with a GROUP BY on the table that's returning duplicate rows to get rid of them.
Change:
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
to:
LEFT JOIN (SELECT refno
                , othercolumn -- placeholder: list the LLRef columns you need
             FROM LandlordOnlineRef
            GROUP BY refno, othercolumn) AS LLRef
       ON LLRef.refno = Ref.refno
I'm not sure which columns you'll want to include here, but at any table level you can replace that table with a subquery to eliminate duplicate rows before the join. As MarkBannister says, you'll need some logic to identify a unique refno within LLRef. You could also use a date column to pick the 'most recent' row, or any other logic you can think of to get back a unique LLRef row and the information related to that record.
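One way to implement the 'most recent' rule, sketched with Python's sqlite3 module in place of MySQL. The updated_at column is hypothetical, since the question does not show LandlordOnlineRef's date columns. The derived table keeps only the row with the latest updated_at for each refno before the join; if two rows tie on that date, both would still come through.

```python
import sqlite3

# Hypothetical schema and data; updated_at is an assumed date column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE progress (refno INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE LandlordOnlineRef (refno INTEGER, rent INTEGER, updated_at TEXT);
INSERT INTO progress VALUES (1, 'open'), (2, 'done');
-- refno 1 has a stale row and a newer duplicate.
INSERT INTO LandlordOnlineRef VALUES
  (1, 500, '2012-01-01'), (1, 550, '2012-03-01'), (2, 700, '2012-02-01');
""")

rows = conn.execute("""
    SELECT Ref.refno AS ref_id, LLRef.rent AS prev_ll_rent
    FROM progress AS Ref
    LEFT JOIN (
        -- Keep only the latest LandlordOnlineRef row per refno.
        SELECT l.refno, l.rent
        FROM LandlordOnlineRef AS l
        JOIN (SELECT refno, MAX(updated_at) AS latest
              FROM LandlordOnlineRef
              GROUP BY refno) AS m
          ON m.refno = l.refno AND m.latest = l.updated_at
    ) AS LLRef ON LLRef.refno = Ref.refno
    ORDER BY Ref.refno
""").fetchall()
print(rows)  # [(1, 550), (2, 700)]
```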
