Determine number of factors of a given number without overlap in MySQL - mysql

I have a data set with orders of tickets. Tickets can be bought in packs of 5, or 3, as well as individually. I need to group the data using the quantity of tickets sold per order, to determine if it was a 5 pack (divisible by five), then 3 pack, or else/then individually (1 or 2 qty). So if I have a quantity of 27, I know that order consisted of five "5 packs", and 2 individual tickets.
SUM(CASE WHEN (id % 5) = 0 THEN 1 ELSE 0 END) fivepack
I have this in my query, but stringing these together for fivepack, and threepack, doesn't eliminate the starting number from the total quantity on the next operation. So a quantity of 27, would yield a result of 5 "five packs" and 9 "three packs", and then 27 "individuals".
So given a quantity, how would you first divide by a large factor, get the remainder and divide by the smaller, then finally handle the remainder?
Edit:
The sample packs provide a discount on the purchase price (not relevant to the technical issue), so the division by the largest pack size needs to occur first. So as Gordon Linoff asked below, in the case of a quantity of 27 tickets, you would take the maximum number of 5 divisions first, then pass the remainder on to be divided by 3, and then return the final remainder as individuals.
The issue is passing the value of one operation in SQL to the next operation, and so on. So I can do Math1, pass Answer1 to Math2, and then pass Answer2 to Math3.

I don't fully understand why 27 would be 5 five packs and 2 individuals rather than any of the following:
27 individuals
9 3-packs
4 5-packs, 2 3-packs, 1-individual
8 3-packs and 3 individuals
and so on.
But, if you want a greedy approach, you can use the following arithmetic:
select floor(num / 5) as five_packs,
floor( (num - 5 * floor(num / 5)) / 3) as three_packs,
num - 5 * floor(num / 5) - 3 * floor( (num - 5 * floor(num / 5)) / 3) as singles
Here is a SQL Fiddle illustrating the logic.
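For readers who want to check the greedy arithmetic outside the database, here is a minimal Python sketch of the same idea. The function name split_into_packs and its pack_sizes parameter are made up for illustration; only the pack sizes 5 and 3 come from the question.

```python
def split_into_packs(quantity, pack_sizes=(5, 3)):
    """Greedily take as many of each pack size as fit, largest first."""
    counts = {}
    remainder = quantity
    for size in sorted(pack_sizes, reverse=True):
        # divmod returns (how many packs fit, what is left over)
        counts[size], remainder = divmod(remainder, size)
    counts[1] = remainder  # whatever is left is sold as single tickets
    return counts


print(split_into_packs(27))  # {5: 5, 3: 0, 1: 2}
```

Each step hands its remainder to the next one, which is the "pass Answer1 to Math2" chain the question asks about.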

Related

Popularity ranking algorithm based on votes from 1 to 5

I'm developing a new Website where there's some "entities" to vote.
Every vote could be a number between 1 and 5 where 1 is the worst vote and 5 is the best vote.
Now, in the same website I have a "Popular entities chart" where I list the most popular "entities" based on their vote.
Now, I can't use a simple arithmetic average because an "entity" with one vote of 5 could have the same ranking as an "entity" with 100 votes of 5.
I thought about storing for every "entity" not only the arithmetic average but also the number of votes, and running an SQL query that orders by number of votes and arithmetic average, but it seems that with this approach an entity with many votes of 1 could gain popularity (when it isn't popular).
What algorithm could I use?
For a basic solution try order by [average vote] desc, [vote count] desc this way out of two entities with the same average, the one with 100 votes will go above the one with 1 vote, but one with average of 4.5 will never go above one with average of 5.
Edit 1
If you want 100 vote average of 4.5 to win against 10 vote average of 5, why not count votes ignoring 1, 2 and 3, or [count of votes 4 and 5] - [count of votes 1 and 2]? This way count of positive votes would pull entities up in ranking.
Edit 2
You might want to give extra importance to recent votes. Something might have changed about an entity that changed user opinion of it. Could build another average of votes made last month and adjust final ranks based on it.
Edit 3
What about calculating a [popularityScore] column and just ordering by it?
-- sum instead of average
-- square root of sum will reduce importance of vote count a bit
select
entity,
sqrt(sum(vote - 3)) as popularityScore
from Votes
group by entity
order by popularityScore desc
-- 50 votes of 5 -> popularityScore = 10
-- 100 votes of 4 -> popularityScore = 10
-- 200 votes of 4 -> popularityScore = 14.14
-- 2000 votes of 4 -> popularityScore = 44.72
-- 2000 votes of 5 -> popularityScore = 63.25
-- 100000000 votes of 3 -> popularityScore = 0
Could calculate same score for last month and add it to this value.
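A rough Python sketch of the same score can help when checking the numbers by hand. The function name popularity_score and the neutral parameter are hypothetical. One assumption goes beyond the SQL: when the sum is negative, the sketch mirrors the square root (minus the root of the absolute value) so badly rated entities sort below neutral ones. The SQL itself does not say what should happen there.

```python
import math


def popularity_score(votes, neutral=3):
    """Square root of the summed offsets from the neutral vote.

    Assumption (not in the SQL): a negative sum is mirrored to
    -sqrt(|sum|) so that low-rated entities rank below neutral ones.
    """
    total = sum(v - neutral for v in votes)
    return math.copysign(math.sqrt(abs(total)), total)


print(popularity_score([5] * 50))             # 10.0
print(round(popularity_score([4] * 200), 2))  # 14.14
```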

Spotfire intersect first 'n' periods

Is there a way to use an Over and Intersect function to get the average sales for the first 3 periods (not always consecutive months, sometimes a month is skipped) for each Employee?
For example:
EmpID 1 is 71.67 ((80 + 60 + 75)/3) despite skipping "3/1/2007"
EmpID 3 is 250 ((350 + 250 + 150)/3).
I'm not sure how EmpID 2 would work because there are just two data points.
I've used a work-around by calculated column using DenseRank over Date, "asc", EmpID and then used another Boolean calculated column where DenseRank column name is <= 3, then used Over functions over the Boolean=TRUE column but I want to figure the correct way to do this.
There are Last 'n' Period functions but I haven't seen anything resembling a First 'n' Period function.
EmpID Date Sales
1 1/1/2007 80
1 2/1/2007 60
1 4/1/2007 75
1 5/1/2007 30
1 9/1/2007 100
2 2/1/2007 200
2 3/1/2007 100
3 12/1/2006 350
3 1/1/2007 250
3 3/1/2007 150
3 4/1/2007 275
3 8/1/2007 375
3 9/1/2007 475
3 10/1/2007 300
3 12/1/2007 200
I suppose the solution depends on where you want this data represented, but here is one example
If((Rank([Date],"asc",[EmpID])<=3) and (Max(Rank([Date],"asc",[EmpID])) OVER ([EmpID])>=3),Avg([Sales]) over ([EmpID]))
You can insert this as a calculated column and it will give you what you want (assuming your data is sorted by date when imported).
You may want to see the row numbering, and in that case insert this as a calculated column as well and name it RN
Rank([Date],"asc",[EmpID])
Explanation
Rank([Date],"asc",[EmpID])
This part of the function is basically applying a row number (labeled as RN in the results below) to each EmpID grouping.
Rank([Date],"asc",[EmpID])<=3
This is how we are taking the top 3 rows regardless if Months are skipped. If your data isn't sorted, we'd have to create one additional calculated column but the same logic applies.
(Max(Rank([Date],"asc",[EmpID])) OVER ([EmpID])>=3)
This is where we are basically ignoring EmpID = 2, or any EmpID who doesn't have at least 3 rows. Removing this would give you the average (dynamically) for each EmpID based on their first 1, 2, or 3 months respectively.
Avg([Sales]) over ([EmpID])
Now that our data is limited to the rows we care about, just take the average for each EmpID.
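To make the intended result concrete, here is a small Python sketch (not Spotfire syntax) of "average of the first 3 dated rows per employee". The function name first_n_average is made up, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import date


def first_n_average(rows, n=3):
    """Average sales over each employee's earliest n rows.

    rows: iterable of (emp_id, date, sales). Employees with fewer
    than n rows are skipped, like the ">= 3" check above.
    """
    by_emp = defaultdict(list)
    for emp_id, day, sales in rows:
        by_emp[emp_id].append((day, sales))
    result = {}
    for emp_id, entries in by_emp.items():
        if len(entries) < n:
            continue
        first = sorted(entries)[:n]  # earliest n rows, skipped months ignored
        result[emp_id] = sum(s for _, s in first) / n
    return result


rows = [
    (1, date(2007, 1, 1), 80), (1, date(2007, 2, 1), 60),
    (1, date(2007, 4, 1), 75), (1, date(2007, 5, 1), 30),
    (1, date(2007, 9, 1), 100),
    (2, date(2007, 2, 1), 200), (2, date(2007, 3, 1), 100),
    (3, date(2006, 12, 1), 350), (3, date(2007, 1, 1), 250),
    (3, date(2007, 3, 1), 150), (3, date(2007, 4, 1), 275),
    (3, date(2007, 8, 1), 375), (3, date(2007, 9, 1), 475),
    (3, date(2007, 10, 1), 300), (3, date(2007, 12, 1), 200),
]
averages = first_n_average(rows)
```

With this data, averages holds about 71.67 for EmpID 1 and 250.0 for EmpID 3, and EmpID 2 is left out because it has only two rows.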
@Chris - Here is the solution I came up with
Step 1: Inserted a calculated column 'rank' with the expression below
DenseRank([Date],"asc",[EmpID])
Step 2: Created a cross table visualization from the data table and limited data with the expression below

Calculate max value of list of numbers with a maximum combination of "x"

OK, I'm not sure if I can explain this right.
Let's say I have a table with three columns (id, price, maxcombo).
Maybe there are 5 rows in this table with random numbers for price (id is just an incremental unique key).
maxcombo specifies whether that price can be part of a combination of up to that many rows.
If x was 3, I would need to find the combination of 1 to 3 rows whose sum has the maximum value.
So say the table had:
1 - 100 - 1
2 - 50 - 3
3 - 10 - 3
4 - 15 - 3
5 - 20 - 2
The correct answer would be just row id 1,
since 100 alone (and it can only be alone based on the maxcombo number)
is greater than, say, 50 + 20 + 15 or 20 + 15 or 10 + 20 etc.
Does that make sense?
I mean i could just calculate all the diff combinations and see which has the largest value, but i would imagine that would take a very long time if the table was larger than 5 rows.
Was wondering any math genius or super dev out there had some advice or creative way to figure this out in a more efficient manner.
Thanks ahead of time!
I built this solution to achieve the desired query. However, it hasn't been tested in terms of efficiency.
Following the example of columns 1-3:
SELECT max(a+b+c) FROM sample_table WHERE a < 3;
EDIT:
Looking at:
The correct answer will be just row id 1
...I considered that maybe I misunderstood your question, and you want the query to return just the row id. So, I made this other one:
SELECT a FROM sum_combo WHERE a+b+c=(
SELECT max(a+b+c) FROM sum_combo WHERE a > 3
);
Which would for sure take too long in larger tables than just 5 rows.
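One way to avoid enumerating every combination, sketched in Python. This rests on an assumed reading of the question: a group of k rows is allowed only if k <= x and every row in the group has maxcombo >= k, and prices are non-negative. The function name best_combination is made up for illustration.

```python
def best_combination(rows, x):
    """rows: list of (id, price, maxcombo). Returns (best total, ids used).

    Assumed rule: a group of k rows is valid only if k <= x and every
    row in it has maxcombo >= k. Prices are assumed non-negative.
    """
    best_total, best_ids = 0, []
    for k in range(1, x + 1):
        eligible = [r for r in rows if r[2] >= k]
        if len(eligible) < k:
            continue  # not enough rows to form a group of exactly k
        # for a fixed k, the k highest-priced eligible rows give the best sum
        top = sorted(eligible, key=lambda r: r[1], reverse=True)[:k]
        total = sum(price for _, price, _ in top)
        if total > best_total:
            best_total, best_ids = total, [row_id for row_id, _, _ in top]
    return best_total, best_ids


rows = [(1, 100, 1), (2, 50, 3), (3, 10, 3), (4, 15, 3), (5, 20, 2)]
print(best_combination(rows, 3))  # (100, [1])
```

Under that reading the work is one sort per group size (x sorts in total) instead of a pass over every subset of rows.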

SSIS Conditional Split Decimals

I have a database table which contains column SLA Breach. This column (int) is composed from the numbers only (e.g. 25, 70, 30, ...) expressing hours.
In SSIS I have a conditional split in the format "SLA Breach / 8 < 2"
meaning divide the SLA Breach row number by 8 and set the condition. Based on several conditions I create the derived columns afterwards.
The conditions are as this:
ISNULL(SLA_Breach)
SLA_Breach / 8 <= 2
SLA_Breach > 2 && SLA_Breach / 8 <= 5
SLA_Breach > 5 && SLA_Breach / 8 <= 10
SLA_Breach / 8 > 10
For each conditions there is a derived column to only assign an ID (9, 1, 2, 3, 4) based on the condition.
The example:
The SLA Breach is 23. 23/8 is 2.875, so the third condition should apply and the ID 2 should be assigned to the derived column. However, in the DB table ID 1 is assigned to this row (second condition), because the decimals are not taken into account.
The other example with SLA Breach being 24 works OK. 24/8 is 3 so the third condition is applied and the correct ID is assigned.
So the problem is that SSIS does not take the decimals into account. How could this be fixed?
Thanks
Store the result of SLA_Breach / 8 into derived column of decimal type, then do condition check.
decSLA_Breach = (DT_DECIMAL,2) SLA_Breach / 8
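The difference between integer and decimal division is easy to see in a small Python sketch (not SSIS syntax). The function name sla_category is hypothetical. The thresholds and IDs are the ones listed in the question, applied in order to the value of SLA_Breach / 8.

```python
def sla_category(sla_breach_hours):
    """Map SLA breach hours to the IDs from the question (9, 1, 2, 3, 4)."""
    if sla_breach_hours is None:
        return 9
    days = sla_breach_hours / 8  # true division keeps the fraction: 23 / 8 == 2.875
    if days <= 2:
        return 1
    if days <= 5:
        return 2
    if days <= 10:
        return 3
    return 4


# 23 // 8 is integer division and drops the fraction, which is what
# produced ID 1 in the question; true division gives the expected ID 2.
print(23 // 8, sla_category(23))  # 2 2
```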

Compare rows and get percentage

I found it hard to find a fitting title. For simplicity let's say I have the following table:
cook_id cook_rating
1 2
1 1
1 3
1 4
1 2
1 2
1 1
1 3
1 5
1 4
2 5
2 2
Now I would like to get an output of 'good' cooks. A good cook is someone for whom at least 70% of the ratings are 1, 2 or 3, rather than 4 or 5.
So in my example table, the cook with id 1 has a total of 10 ratings, 7 of which have type 1, 2 and 3. Only three have type 4 or 5. Therefore the cook with id 1 would be a 'good' cook, and the output should be the cook's id with the number of good ratings.
cook_id cook_rating
1 7
The cook with id 2, however, doesn't satisfy my condition, therefore should not be listed at all.
select cook_id, count(cook_rating) - sum(case when cook_rating = 4 OR cook_rating = 5 then 1 else 0 end) as numberOfGoodRatings from cook
where cook_rating in (1,2,3,4,5)
group by cook_id
order by numberOfGoodRatings desc
However, this doesn't take into account the fact that there might be more 4 or 5 than good ratings, resulting in negative outputs. Plus, the requirement of at least 70% is not included.
You can get this with a comparison in your HAVING clause. If you must have just the two columns in the result set, this can be wrapped as a sub-select: select cook_id, positive_ratings FROM (...)
SELECT
cook_id,
sum(cook_rating < 4) as positive_ratings,
count(*) as total_ratings
FROM cook
GROUP BY cook_id
HAVING (positive_ratings / total_ratings) >= 0.70
ORDER BY positive_ratings DESC
Edit Note that positive_ratings is intended to count only rows where the rating is less than 4. The MySQL documentation says that count only counts non-NULL values, and a false comparison evaluates to 0 rather than NULL, so count(cook_rating < 4) would count every rated row. sum(cook_rating < 4) adds up the 1s from the true comparisons instead; count(IF(cook_rating < 4, 1, NULL)) is an equivalent alternative.
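As a cross-check of the expected output, here is the same filter as a short Python sketch. The function name good_cooks and the threshold parameter are made up, and the data is the sample table from the question.

```python
from collections import Counter


def good_cooks(ratings, threshold=0.70):
    """ratings: iterable of (cook_id, cook_rating).

    Returns {cook_id: number of ratings below 4} for cooks whose share
    of such ratings is at least the threshold.
    """
    total = Counter()
    positive = Counter()
    for cook_id, rating in ratings:
        total[cook_id] += 1
        if rating < 4:
            positive[cook_id] += 1
    return {c: positive[c] for c in total if positive[c] / total[c] >= threshold}


ratings = [(1, r) for r in (2, 1, 3, 4, 2, 2, 1, 3, 5, 4)] + [(2, 5), (2, 2)]
print(good_cooks(ratings))  # {1: 7}
```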
I suggest you change your schema a little to make this kind of query trivial.
Suppose you add 5 columns to your cook table, to simply count the number of each rating:
nb_ratings_1 nb_ratings_2 nb_ratings_3 nb_ratings_4 nb_ratings_5
Updating such a table when a new rating is entered in DB is trivial, just as would be recomputing those numbers if having redundancy makes you nervous. And it makes all filterings and sortings fast and easy.