Remove rows that net each other out - mysql

I have a result set that contains order IDs, a total for each order, and the quantity of items in it.
Some totals are negative (if a refund has occurred) and others are positive. I would like to count the orders whose order_total is not netted out by a matching negative value.
orders_id  order_total    products_quantity  customers_id
---------  -------------  -----------------  ------------
1140898    -99.95830000   -1                 459800
1140868    99.95830000    1                  459800
1140867    99.95833333    1                  459800
866932     -106.33333333  -2                 459800
860100     125.08333333   3                  459800
857864     106.33333333   2                  459800
Would result in
orders_id  order_total    products_quantity  customers_id
---------  -------------  -----------------  ------------
1140867    99.95833333    1                  459800
860100     125.08333333   3                  459800
I've attempted to write a cursor that iterates over each row, storing the last order_total and checking the current row for a difference.
This works as long as the negative order comes immediately before or after the positive one. Unfortunately, this won't always be the case.
Can anyone explain what approach I should take to ensure the output above is achieved?

Based on your description, the problem is impossible. Consider:
orders_id order_total customers_id
--------- ------------- --------------
1 -100 1
2 50 1
3 50 1
4 50 1
(I assume that each value should only affect the "net" for its own customer.)
In the case above, orders_id=1 might be considered to offset 2 and 3, leaving 4 in the output; 3 and 4, leaving 2; or 2 and 4, leaving 3.
What if the lines with negative amounts are not an exact match for one or more of those with positive amounts? Even if some combination of the negatives adds up to some combination of the positives, you would need to try every possible combination. Just calculating the order of that algorithm makes my head hurt (O(N!)^2, I think).
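
For the narrower case in the question, where every refund exactly equals the total of one earlier order for the same customer, the pairing can be done without trying combinations. The following is only a sketch in Python under that assumption; the function name remove_netted is hypothetical, refunds with no exact match are silently dropped, and when several orders share the refunded amount it arbitrarily cancels the first one it sees, which is exactly the ambiguity described above.

```python
from collections import Counter
from decimal import Decimal


def remove_netted(rows):
    """Drop refund rows and the orders they cancel exactly.

    rows: iterable of (orders_id, order_total, products_quantity, customers_id).
    """
    rows = list(rows)
    # Count the refunds available per (customer, refunded amount).
    refunds = Counter()
    for _, total, _, customer in rows:
        if total < 0:
            refunds[(customer, -total)] += 1

    kept = []
    for row in rows:
        _, total, _, customer = row
        if total < 0:
            continue  # refund rows never appear in the output
        if refunds[(customer, total)] > 0:
            refunds[(customer, total)] -= 1  # cancelled by one refund
        else:
            kept.append(row)
    return kept


# Sample data from the question; Decimal keeps the comparison exact.
orders = [
    (1140898, Decimal("-99.95830000"), -1, 459800),
    (1140868, Decimal("99.95830000"), 1, 459800),
    (1140867, Decimal("99.95833333"), 1, 459800),
    (866932, Decimal("-106.33333333"), -2, 459800),
    (860100, Decimal("125.08333333"), 3, 459800),
    (857864, Decimal("106.33333333"), 2, 459800),
]
remaining = remove_netted(orders)
print([row[0] for row in remaining])  # [1140867, 860100]
```

With the -100 / 50 / 50 / 50 example there is no exact match, so this sketch keeps all three positive orders rather than guessing which two the refund offsets.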

Related

sum function is returning wrong value

I have 3 tables
1.Franchiese
Id Name
1 Vivek
2.Purchase
Id Fran_id commission_amount
1 1 100
2 1 1
3.Fran_payment
Id Fran_id amount
1 1 50
My SQL Query is
select franchiese.id, franchiese.name,
       sum(fran_payment.amount) as paid,
       sum(purchase.commission_amount) as tot,
       sum(purchase.commission_amount) - sum(fran_payment.amount) as rem
from franchiese
left join fran_payment on franchiese.id = fran_payment.fran_id
left join purchase on franchiese.id = purchase.fran_id
It's giving me
Id Name Tot Paid Rem
1 vivek 101 100 1
Expected Answer
Id Name Tot Paid Rem
1 vivek 101 50 51

I think you have two identical entries in the Fran_payment table. Your SQL statement should work as intended, and is giving you logically correct values, but I think you have unexpected data in your table.

You are joining three tables with unequal numbers of matching rows. Purchase has 2 rows for this franchise, while fran_payment has only one. In the join, the fran_payment row is repeated once for each matching purchase row, so it is counted twice and the sum becomes 50 + 50 = 100. The joined data looks something like this:
ID | Name | fran_payment.amount | purchase.commission_amount
1 | Vivek | 50 | 100
1 | Vivek | 50 | 1
Instead, sum the payments on their own, before any join:
SELECT fran_id, SUM(amount) AS paid FROM fran_payment GROUP BY fran_id;
This should work.
You'll also need a subquery (or the GROUP BY above) to fetch the sum for each franchise separately; otherwise a plain SUM returns the sum of the whole column, irrespective of the ID.
SELECT id, (SELECT SUM(amount) FROM fran_payment WHERE fran_payment.fran_id = franchiese.id) AS paid FROM franchiese;
I hope that works. All the best.
PS: It's franchise, not franchiese.
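
To see the duplication and one way around it, here is a small sketch that loads the question's three tables into an in-memory SQLite database from Python. SQLite stands in for MySQL here, which is an assumption, but the aggregate-before-join query itself is plain SQL. Each child table is summed per fran_id in its own subquery first, so each franchise joins to at most one row from each side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE franchiese (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, fran_id INTEGER, commission_amount INTEGER);
    CREATE TABLE fran_payment (id INTEGER PRIMARY KEY, fran_id INTEGER, amount INTEGER);
    INSERT INTO franchiese VALUES (1, 'Vivek');
    INSERT INTO purchase VALUES (1, 1, 100), (2, 1, 1);
    INSERT INTO fran_payment VALUES (1, 1, 50);
""")

# Joining both child tables directly repeats the single payment row
# once per purchase row, so the payment is summed twice.
fan_out_paid = conn.execute("""
    SELECT SUM(fp.amount)
    FROM franchiese f
    LEFT JOIN fran_payment fp ON fp.fran_id = f.id
    LEFT JOIN purchase p ON p.fran_id = f.id
""").fetchone()[0]
print(fan_out_paid)  # 100

# Aggregate each child table first, then join the one-row-per-franchise results.
fixed = conn.execute("""
    SELECT f.id, f.name,
           COALESCE(p.tot, 0) AS tot,
           COALESCE(fp.paid, 0) AS paid,
           COALESCE(p.tot, 0) - COALESCE(fp.paid, 0) AS rem
    FROM franchiese f
    LEFT JOIN (SELECT fran_id, SUM(commission_amount) AS tot
               FROM purchase GROUP BY fran_id) p ON p.fran_id = f.id
    LEFT JOIN (SELECT fran_id, SUM(amount) AS paid
               FROM fran_payment GROUP BY fran_id) fp ON fp.fran_id = f.id
""").fetchall()
print(fixed)  # [(1, 'Vivek', 101, 50, 51)]
```

The COALESCE calls are there so a franchise with no purchases or no payments shows 0 instead of NULL.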

Perform action on selected columns depending on their name

I've got a huge table, containing three "selection"-columns and many "data"-columns.
ID Thing1 Thing2 Thing3 avgData1 avgData2 highestEtc
---- -------- -------- -------- ---------- ---------- ------------
1 1 2 2 321 654 999
2 2 1 1 123 456 11
3 2 1 1 987 789 77
4 2 1 1 765 567 11
In my queries, I'm now selecting all entries with "Thing1" = x, "Thing2" = y, "Thing3" = z (Those three columns are selection-criteria.)
The purpose of getting those lines is to perform an action on each of the following data-columns: If it starts with "avg", I want to calculate an average of the specific column on all selected entries. On another prefix I want to count which number appears the most.
Is there a way of letting the MySQL Database do all this for me? I need a SQL-Statement that calculates the averages of the columns automatically, and performs other actions too.
For example, let's say I'd select the criteria Thing1=2, Thing2=1 and Thing3=1. Is there a way of writing the statement so that it returns only ONE entry, with the calculated things?
Result
----------------- ----------------- ----
(123+987+765)/3 (456+789+567)/3 11
I have heard that this should be possible, and that it is bad practice not to let the database perform those calculations directly. Unfortunately, I have no idea how to do it.

Try this:-
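-- One row for the whole selection: no GROUP BY ID, which would return one row per ID.
SELECT AVG(avgData1) AS RESULT1, AVG(avgData2) AS RESULT2,
-- Most frequent highestEtc value among the selected rows.
(SELECT highestEtc FROM YOUR_TAB
WHERE Thing1 = 2 AND Thing2 = 1 AND Thing3 = 1
GROUP BY highestEtc
ORDER BY COUNT(*) DESC
LIMIT 1) AS RESULT3
FROM YOUR_TAB
WHERE Thing1 = 2
AND Thing2 = 1
AND Thing3 = 1;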
Hope this helps you.
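
SQL itself cannot pick an aggregate based on a column's name, so with many data columns the statement is usually generated by the application. Below is a rough Python sketch of that idea, assuming the rule from the question: columns starting with "avg" are averaged, and every other data column returns its most frequent value. The helper build_summary_query and the table name your_tab are hypothetical, ties for the most frequent value are broken by taking the smallest, and the names are interpolated into the SQL directly, so they must come from trusted code rather than user input. SQLite is used only so the example runs on its own.

```python
import sqlite3


def build_summary_query(table, columns, where):
    """Build one SELECT that aggregates each data column by its name prefix."""
    parts = []
    for col in columns:
        if col.startswith("avg"):
            parts.append(f"AVG({col})")
        else:
            # Any other prefix: most frequent value among the selected rows.
            parts.append(
                f"(SELECT {col} FROM {table} WHERE {where} "
                f"GROUP BY {col} ORDER BY COUNT(*) DESC, {col} LIMIT 1)"
            )
    return f"SELECT {', '.join(parts)} FROM {table} WHERE {where}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_tab (ID INTEGER, Thing1 INTEGER, Thing2 INTEGER, "
    "Thing3 INTEGER, avgData1 INTEGER, avgData2 INTEGER, highestEtc INTEGER)"
)
conn.executemany(
    "INSERT INTO your_tab VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 2, 321, 654, 999),
        (2, 2, 1, 1, 123, 456, 11),
        (3, 2, 1, 1, 987, 789, 77),
        (4, 2, 1, 1, 765, 567, 11),
    ],
)

query = build_summary_query(
    "your_tab",
    ["avgData1", "avgData2", "highestEtc"],
    "Thing1 = 2 AND Thing2 = 1 AND Thing3 = 1",
)
summary = conn.execute(query).fetchone()
print(summary)  # (625.0, 604.0, 11)
```

The list of data columns could be read from the table's metadata instead of being typed out, which is what makes this approach practical for a very wide table.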

How to remove duplicate observations in Stata

Let's say I have the following data:
id disease
1 0
1 1
1 0
2 0
2 1
3 0
4 0
4 0
I would like to remove the duplicate observations in Stata.
For example
id disease
1 1
2 1
3 0
4 0
For group id=1, keep observation 2
For group id=2, keep observation 2
For group id=3, keep observation 1 (because it has only 1 obs)
For group id=4, keep observation 1 (or any of them but one obs)
I am trying Stata duplicates command,
duplicates tag id if disease==0, generate(info)
drop if info==1
but it's not working as I required.

It is no surprise that duplicates does not do what you are wanting, as it does not fit your problem. For example, the observation with id == 2, disease == 0 is not a duplicate of any other observation. More generally, duplicates does not purport to be a general-purpose command for dropping observations you don't want.
Your criteria appear to be
Keep one observation for each id.
If id has any observation with value of 1, that is to be kept.
A solution to that is
bysort id (disease) : keep if _n == _N
That keeps the last observation for each distinct id: after sorting on disease within id, any observations with the disease (disease == 1) are necessarily at the end of each group.
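
For readers who don't use Stata, the same sort-then-keep-last idea can be sketched in a few lines of Python. This is only an illustration of the logic on the question's data, not a replacement for the bysort line; the function name keep_one_per_id is made up for the example.

```python
from itertools import groupby


def keep_one_per_id(observations):
    """Keep one (id, disease) pair per id, preferring disease == 1."""
    # Sorting tuples orders by id first, then by disease within each id,
    # so any disease == 1 observation ends up last in its group.
    ordered = sorted(observations)
    return [list(group)[-1] for _, group in groupby(ordered, key=lambda obs: obs[0])]


data = [(1, 0), (1, 1), (1, 0), (2, 0), (2, 1), (3, 0), (4, 0), (4, 0)]
result = keep_one_per_id(data)
print(result)  # [(1, 1), (2, 1), (3, 0), (4, 0)]
```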

mysql query to calculate values local to Cartesian products of logical groups of rows

I'm trying to write a query to process a single table that looks like this:
record_id item_id part_id part_length
----------- ------- -------- ------------
1 0 0 123.12
2 0 0 123.09
3 0 1 231.24
4 0 1 239.14
5 1 0 45.91
6 1 0 46.12
7 1 1 62.24
8 1 1 59.40
which is basically a table of inaccurate length measurements of some parts of some items recorded multiple times (not twice, actually each part has 100s of measurements). With a single select, I want to get a result like this:
record_id item_id part_id unit part_length_ratio
----------- ------- -------- ----- ----------------
1 0 0 1 123.12 / 231.24
2 0 0 1 123.09 / 239.14
3 0 1 0 231.24 / 123.12
4 0 1 0 239.14 / 123.09
5 1 0 1 45.91 / 62.24
6 1 0 1 46.12 / 59.40
7 1 1 0 62.24 / 45.91
8 1 1 0 59.40 / 46.12
which basically selects each part of an item as the unit and calculates the ratio of the lengths of the other parts of the same item to this unit, pairing up measurements taken at the same time. I wrote a script that computes this kind of table, but I would like to do it with SQL. I can understand if you fail to understand the question :)
for each item i
    for each part unit of i
        for each part other of i
            if unit != other
                print i.id other.part_id unit.part_id other.length / unit.length

As I said in a comment, tables are unordered sets: there is no first or second row...
... unless you use the id column to explicitly order the rows.
However, can you guarantee that there will always be exactly two samples for each case, and that the lower ID always matches the first sample? This appears quite fragile: in real life, there will probably be cases where a test is performed twice, is missing, or is done "late". Not to mention concurrent access to your DB.
Can't you simply add a "sample number" column?
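
With such a column in place, the ratios become a plain self-join. The sketch below assumes a hypothetical sample_no column and a table called measurements (neither name is from the question), and runs against in-memory SQLite from Python purely so it is self-contained. Each row is joined to the rows of the same item and the same sample number that belong to a different part, and that other part is the unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (record_id INTEGER, item_id INTEGER, "
    "part_id INTEGER, sample_no INTEGER, part_length REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, 0, 1, 123.12),
        (2, 0, 0, 2, 123.09),
        (3, 0, 1, 1, 231.24),
        (4, 0, 1, 2, 239.14),
        (5, 1, 0, 1, 45.91),
        (6, 1, 0, 2, 46.12),
        (7, 1, 1, 1, 62.24),
        (8, 1, 1, 2, 59.40),
    ],
)

# a is the measured part, b is the part used as the unit.
rows = conn.execute("""
    SELECT a.record_id, a.item_id, a.part_id, b.part_id AS unit,
           a.part_length / b.part_length AS part_length_ratio
    FROM measurements a
    JOIN measurements b
      ON b.item_id = a.item_id
     AND b.sample_no = a.sample_no
     AND b.part_id <> a.part_id
    ORDER BY a.record_id
""").fetchall()

for row in rows:
    print(row)
```

If an item has more than two parts, each record simply produces one output row per other part, which matches the nested loops in the question's pseudocode.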

Compare rows and get percentage

I found it hard to find a fitting title. For simplicity let's say I have the following table:
cook_id cook_rating
1 2
1 1
1 3
1 4
1 2
1 2
1 1
1 3
1 5
1 4
2 5
2 2
Now I would like to get a list of 'good' cooks. A good cook is someone for whom at least 70% of all ratings are 1, 2 or 3, rather than 4 or 5.
So in my example table, the cook with id 1 has a total of 10 ratings, 7 of which have type 1, 2 and 3. Only three have type 4 or 5. Therefore the cook with id 1 would be a 'good' cook, and the output should be the cook's id with the number of good ratings.
cook_id cook_rating
1 7
The cook with id 2, however, doesn't satisfy my condition, therefore should not be listed at all.
select cook_id, count(cook_rating) - sum(case when cook_rating = 4 OR cook_rating = 5 then 1 else 0 end) as numberOfGoodRatings from cook
where cook_rating in (1,2,3,4,5)
group by cook_id
order by numberOfGoodRatings desc
However, this doesn't take into account the fact that there might be more 4 or 5 than good ratings, resulting in negative outputs. Plus, the requirement of at least 70% is not included.

You can get this with a comparison in your HAVING clause. If you must have just the two columns in the result set, this can be wrapped as a sub-select: select cook_id, positive_ratings FROM (...)
SELECT
cook_id,
count(IF(cook_rating < 4, 1, NULL)) as positive_ratings,
count(*) as total_ratings
FROM cook
GROUP BY cook_id
HAVING (positive_ratings / total_ratings) >= 0.70
ORDER BY positive_ratings DESC
Edit: Note that count(IF(cook_rating < 4, 1, NULL)) is intended to only count rows where the rating is less than 4. The MySQL documentation says that count only counts non-NULL values, and a false comparison evaluates to 0 rather than NULL, so a bare count(cook_rating < 4) would count every rated row. That is why the comparison is wrapped in an IF that returns NULL for the rows to skip.

I suggest you change your schema slightly to make this kind of query trivial.
Suppose you add 5 columns to your cook table that simply count the number of ratings of each value:
nb_ratings_1 nb_ratings_2 nb_ratings_3 nb_ratings_4 nb_ratings_5
Updating such a table when a new rating is entered in the DB is trivial, as is recomputing those numbers if the redundancy makes you nervous. And it makes all filtering and sorting fast and easy.
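
As a rough illustration of that idea, the sketch below keeps the counters in a separate summary table, here called cook_counts (a hypothetical name, as the suggestion does not say where the columns should live), and recomputes them from the raw ratings in one statement. It uses in-memory SQLite from Python only to be runnable; SUM(cook_rating = 1) relies on a comparison evaluating to 0 or 1, which holds in both SQLite and MySQL. The 70% test is written with integers to avoid any division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cook (cook_id INTEGER, cook_rating INTEGER)")
ratings = [(1, r) for r in (2, 1, 3, 4, 2, 2, 1, 3, 5, 4)] + [(2, 5), (2, 2)]
conn.executemany("INSERT INTO cook VALUES (?, ?)", ratings)

# Recompute the per-cook counters from the raw ratings.
conn.execute("""
    CREATE TABLE cook_counts AS
    SELECT cook_id,
           SUM(cook_rating = 1) AS nb_ratings_1,
           SUM(cook_rating = 2) AS nb_ratings_2,
           SUM(cook_rating = 3) AS nb_ratings_3,
           SUM(cook_rating = 4) AS nb_ratings_4,
           SUM(cook_rating = 5) AS nb_ratings_5
    FROM cook
    GROUP BY cook_id
""")

# A cook is 'good' when good ratings are at least 70% of all ratings,
# i.e. 10 * good >= 7 * total.
good_cooks = conn.execute("""
    SELECT cook_id,
           nb_ratings_1 + nb_ratings_2 + nb_ratings_3 AS good_ratings
    FROM cook_counts
    WHERE 10 * (nb_ratings_1 + nb_ratings_2 + nb_ratings_3)
          >= 7 * (nb_ratings_1 + nb_ratings_2 + nb_ratings_3
                  + nb_ratings_4 + nb_ratings_5)
    ORDER BY good_ratings DESC
""").fetchall()
print(good_cooks)  # [(1, 7)]
```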
