MySQL Percentage Rank without the function

I need to create a percent rank over a number of columns in a database table and I am really struggling with this. Normally I would use the PERCENT_RANK() function, but our MySQL version doesn't offer it, so I am forced to use a conventional query.
I have a table that contains the columns:
UID    Total-Orders-Placed  Last-Order-Date-Diff
-----  -------------------  --------------------
12884  8                    351
10985  11                   106
30613  3                    43
30820  2                    134
23421  9                    76
I would like to add 2 ranking columns as below:
UID    Total-Orders-Placed  Last-Order-Date-Diff  rec_rank  freq_rank
-----  -------------------  --------------------  --------  ---------
12884  8                    351                   0.34      0.86
10985  11                   106                   0.64      0.91
30613  3                    43                    0.85      0.59
30820  2                    134                   0.57      0.40
23421  9                    76                    0.77      0.88
In reality I have thousands of rows and additional columns, but that's the gist. I have been able to do it perfectly in Excel, but I am really struggling to convert it into queries/views in our MySQL database so the data can be viewed in real time.
I have tried PERCENT_RANK(), but as I mentioned above, this function isn't available to us.
I have tried the queries discussed here without too much success yet: http://code.openark.org/blog/mysql/sql-ranking-without-self-join
Any help writing the code, or anything that gives me a better understanding of it, would be appreciated.
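For what it's worth, here is one common workaround, as a minimal sketch. It assumes your data lives in a table called customers (a made-up name; substitute your own) with columns uid, total_orders_placed and last_order_date_diff. PERCENT_RANK() is (rank - 1) / (rows - 1), and rank - 1 is just the number of rows with a strictly smaller value, so correlated subqueries can reproduce it without window functions:

SELECT c.uid,
       c.total_orders_placed,
       c.last_order_date_diff,
       -- rows with a strictly smaller value = rank - 1
       (SELECT COUNT(*) FROM customers x
         WHERE x.last_order_date_diff < c.last_order_date_diff)
       / ((SELECT COUNT(*) FROM customers) - 1) AS rec_rank,
       (SELECT COUNT(*) FROM customers x
         WHERE x.total_orders_placed < c.total_orders_placed)
       / ((SELECT COUNT(*) FROM customers) - 1) AS freq_rank
FROM customers c;

Wrapping the SELECT in CREATE VIEW ... AS gives the real-time view you mention. If a more recent order (smaller diff) should rank higher, flip the < to >. Be aware the correlated subqueries scan the table once per row, so on thousands of rows you will want indexes on both columns, or the user-variable technique from the openark post you linked.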

Related

5 point average in SSRS

I am trying to show a 5-point moving average in my chart. I added a trendline, but it doesn't come out right. I then created a new series to calculate the average there, but that doesn't look right either. How can I show this as a 5-point average?
This answer is based on my experience with Excel, not Reporting Services, but it is probably the same problem.
Your chart is probably a scatter plot rather than a line chart (note: this is Excel terminology). A scatter plot does not have an intrinsic ordering in the data. A line chart does.
The solution (for a scatter plot) is simply to sort the data by the x-values. The same will probably work for you. If you are pulling the data from a database, then ORDER BY can accomplish this. Otherwise, you can sort the data in the application.
Using this post as a starting point, you can see that it is possible to calculate a moving average for a chart using the SQL query that pulls the data from the database.
For example, take this table in my database, called mySalesTable:
myDate sales myDate sales myDate sales
---------- ------ ---------- ------ ---------- ------
01/01/2015 456 16/01/2015 546 31/01/2015 658
02/01/2015 487 17/01/2015 12 01/02/2015 121
03/01/2015 245 18/01/2015 62 02/02/2015 654
04/01/2015 812 19/01/2015 516 03/02/2015 261
05/01/2015 333 20/01/2015 1 04/02/2015 892
06/01/2015 449 21/01/2015 65 05/02/2015 982
07/01/2015 827 22/01/2015 15 06/02/2015 218
08/01/2015 569 23/01/2015 656 07/02/2015 212
09/01/2015 538 24/01/2015 25 08/02/2015 312
10/01/2015 455 25/01/2015 549 09/02/2015 21
11/01/2015 458 26/01/2015 261
12/01/2015 542 27/01/2015 21
13/01/2015 549 28/01/2015 21
14/01/2015 432 29/01/2015 61
15/01/2015 685 30/01/2015 321
You can pull out this data and create a moving average based on the last 5 dates by using the following query for your dataset:
SELECT mst.myDate, mst.sales, AVG(mst_past.sales) AS moving_average
FROM mySalesTable mst
JOIN mySalesTable AS mst_past
  ON mst_past.myDate BETWEEN DATEADD(D, -4, mst.myDate) AND mst.myDate
GROUP BY mst.myDate, mst.sales
ORDER BY mst.myDate ASC
This effectively joins a sub-table to each row consisting of the previous 4 dates and the current date, and finds the average sales over those dates, outputting it as the column moving_average.
You can then chart both these fields as normal to give the following output (with the data table so you can see the actual calculated moving average).
Hopefully this will help you. Please let me know if you require further assistance.

MySQL search for exact number in field

In my MySQL DB, I have a column containing this kind of value:
63 61 57 52 50 47 46 44 43 34 33 27 23 22 21 10 5 3 2 1
The numbers are separated by tabs.
I can't get the right result with a simple query like this:
SELECT * FROM mytable WHERE mycolumn = 63
I'm not sure "=" is the right method; I've also tried LIKE, IN and even FIND_IN_SET.
I need some help :)
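For what it's worth, one workaround (a sketch, reusing the mytable/mycolumn names from your query above): FIND_IN_SET only understands comma-separated lists, so convert the tabs to commas first:

SELECT *
FROM mytable
-- CHAR(9) is the tab character; after the REPLACE the column is a
-- comma-separated list, which FIND_IN_SET can match element-by-element.
WHERE FIND_IN_SET('63', REPLACE(mycolumn, CHAR(9), ',')) > 0;

A plain LIKE '%63%' fails because it also matches 163 or 630. Longer term, storing one number per row in a child table would let you write a simple WHERE clause instead.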

Want to Calculate Highest Pay Value For a Table?

I want a list of employees who have worked on the activity that has the highest Total Pay value.
Don't hard-code the answer with something like …where actid = 151… etc.
• Note: the Total Pay for an activity is the sum of (Hours Worked * matching Hourly Rate)
(e.g. Total Pay for Activity 151 is 10.5 hrs @ $50.75 + 11.5 hrs @ $25 + 3 hrs @ $33).
You must use a subquery in your solution.
ACTID  HRSWORKED  HOURLYRATE  Total Pay
-----  ---------  ----------  ---------
163    10         45.5        455
163    8          45.5        364
163    6          45.5        273
151    5          50.75       253.75
151    5.5        50.75       279.125
155    10         30          300
155    10         30          300
165    20         25          500
155    10         30          300
155    8          27          216
151    11.5       25          287.5
151    1          33          33
151    1          33          33
151    1          33          33
Your time and effort are much appreciated. Thanks!!
Without knowledge of the schema, I can only provide a possible sketch (you'll have to compute total pay and provide all necessary JOINs and predicates):
SELECT DISTINCT(employee id) -- reconfigure if more than just employee id
FROM <table(s)>
[WHERE...]
{ WHERE | AND } total pay = (SELECT MAX(total pay)
FROM <table(s)>
[WHERE...]);
I used DISTINCT because it's possible to have more than one activity with the same MAX value and overlapping employees. If you're including ACTID in the output, then you won't need DISTINCT because the same employee shouldn't be on a project twice (unless they are tracked by roles on a project in which case a single employee might have multiple roles - it all depends on the data set).
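To make that concrete against the sample columns above, here is a sketch. It assumes everything is in one table, acthours, with an EMPID column alongside ACTID, HRSWORKED and HOURLYRATE (both names are made up; adjust to your schema):

SELECT DISTINCT a.EMPID
FROM acthours AS a
INNER JOIN (SELECT ACTID, SUM(HRSWORKED * HOURLYRATE) AS total_pay
            FROM acthours
            GROUP BY ACTID) AS sums
        ON sums.ACTID = a.ACTID
-- keep only the activity (or activities, on a tie) whose total pay
-- equals the maximum total pay across all activities
WHERE sums.total_pay = (SELECT MAX(tp)
                        FROM (SELECT SUM(HRSWORKED * HOURLYRATE) AS tp
                              FROM acthours
                              GROUP BY ACTID) AS per_act);

No activity id is hard-coded, the subquery requirement is satisfied, and DISTINCT covers ties as discussed above.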

Need help regarding this IIf statement

I entered this IIf statement and Access says the expression I entered is too complex. Can someone give me advice on how to approach this? Do you think I should split the formula?
IIf([MarkUpI]=100 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1.22,
IIf([MarkUpI]=101 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1.22,
IIf([MarkUpI]=200 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1.22,
IIf([MarkUpI]=201 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1.22,
IIf([MarkUpI]=300 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1,
IIf([MarkUpI]=400 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1.05,
IIf([MarkUpI]=500 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*1.03,
IIf([MarkUpI]=600 And [Stock/NonStock1]="Stock",[QTY1]+[UnitPrice1]*22,
IIf([MarkUpI]=100 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1.22,
IIf([MarkUpI]=101 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1.05,
IIf([MarkUpI]=200 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1.22,
IIf([MarkUpI]=201 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1.05,
IIf([MarkUpI]=300 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1,
IIf([MarkUpI]=400 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1.05,
IIf([MarkUpI]=500 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*1.03,
IIf([MarkUpI]=600 And [Stock/NonStock1]="Non-Stock",[QTY1]+[UnitPrice1]*22,0))))))))))))))))
Consider using the Switch() function instead of nested IIfs.
You could also leave the cases where the factor is 1.22 as an "Else" case.
It seems you could also create a table with the different [MarkUpI] and [Stock/NonStock1] values and JOIN to that table to get what you need.
It appears Access has a limit on the number of nested IIfs, but in your case you can divide it into 2 main IIfs (one for Stock and one for Non-Stock), with the others nested inside these, and factor out the common calculation as follows:
CalcResult: [QTY1]+[UnitPrice1] *
IIf([Stock/NonStock1]="Stock",
IIf([MarkUpI]=100,1.22,
IIf([MarkUpI]=101,1.22,
IIf([MarkUpI]=200,1.22,
IIf([MarkUpI]=201,1.22,
IIf([MarkUpI]=300,1,
IIf([MarkUpI]=400,1.05,
IIf([MarkUpI]=500,1.03,
IIf([MarkUpI]=600,22,0)))))))),
IIf([Stock/NonStock1]="Non-Stock",
IIf([MarkUpI]=100,1.22,
IIf([MarkUpI]=101,1.05,
IIf([MarkUpI]=200,1.22,
IIf([MarkUpI]=201,1.05,
IIf([MarkUpI]=300,1,
IIf([MarkUpI]=400,1.05,
IIf([MarkUpI]=500,1.03,
IIf([MarkUpI]=600,22,0)))))))),0))
NOTE: I'm not saying this is the best way to do it (I also agree that a lookup table would be better), but this will get you around your problem.
If you really want to do it all in a query, then use the Switch function like this:
[QTY1]+[UnitPrice1] *
IIf([Stock/NonStock1]="Stock",
Switch([MarkUpI]=100,1.22,[MarkUpI]=101,1.22,[MarkUpI]=200,1.22,[MarkUpI]=201,1.22,[MarkUpI]=300,1,[MarkUpI]=400,1.05,[MarkUpI]=500,1.03,[MarkUpI]=600,22,True,0),
Switch([MarkUpI]=100,1.22,[MarkUpI]=101,1.05,[MarkUpI]=200,1.22,[MarkUpI]=201,1.05,[MarkUpI]=300,1,[MarkUpI]=400,1.05,[MarkUpI]=500,1.03,[MarkUpI]=600,22,True,0))
But you really want to use a lookup table as the others have strongly suggested. It would be a LOT easier to change the values later and would look like this:
Stocked    MarkupI  Amt
---------  -------  ----
Stock      100      1.22
Stock      101      1.22
Stock      200      1.22
Stock      201      1.22
Stock      300      1
Stock      400      1.05
Stock      500      1.03
Stock      600      22
Non-Stock  100      1.22
Non-Stock  101      1.05
Non-Stock  200      1.22
Non-Stock  201      1.05
Non-Stock  300      1
Non-Stock  400      1.05
Non-Stock  500      1.03
Non-Stock  600      22
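With that lookup table in place, the whole expression collapses into a join. A sketch, assuming your data is in a table called Orders and the lookup table is called Markups (both names are made up here):

SELECT o.*,
       o.[QTY1] + o.[UnitPrice1] * m.[Amt] AS CalcResult
FROM Orders AS o
INNER JOIN Markups AS m
ON (m.[Stocked] = o.[Stock/NonStock1] AND m.[MarkupI] = o.[MarkUpI]);

Changing a markup later is then a one-row UPDATE to Markups instead of editing a long expression.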

To create a new DB connection or not?

I'm running a cron job (every 15 minutes) which takes about a minute to execute. It makes lots of API calls and stores data to the database.
Right now I create a MySQL connection at the beginning and use the same connection throughout the code. Most of the time is spent making the API calls.
Will it be more efficient to create a new database connection only when it's time to store the data (below)?
1. Kill the last connection
2. Wait for the API call to complete
3. Create a new DB connection
4. Execute the query
5. Go to 1
[Edit] Here's the MySQL report. I'm new to MySQL; based on the following report, is there any reason to re-connect to the DB?
MySQL 5.1.26-rc-5.1.26r uptime 0 1:8:58 Tue Jun 15 21:25:03 2010

__ Key _________________________________________________________________
Buffer used 33.00k of 24.00M %Used: 0.13
Current 4.52M %Usage: 18.84
Write hit 33.33%
Read hit 69.16%

__ Questions ___________________________________________________________
Total 1.75k 0.4/s
COM_QUIT 319.92k 77.3/s %Total: 18312.
-Unknown 319.90k 77.3/s 18311.
DMS 1.53k 0.4/s 87.58
Com_ 199 0.0/s 11.39
QC Hits 1 0.0/s 0.06
Slow 144 0.0/s 8.24 %DMS: 9.41
DMS 1.53k 0.4/s 87.58
SELECT 1.22k 0.3/s 69.83 79.74
INSERT 155 0.0/s 8.87 10.13
UPDATE 155 0.0/s 8.87 10.13
REPLACE 0 0/s 0.00 0.00
DELETE 0 0/s 0.00 0.00
Com_ 199 0.0/s 11.39
check 86 0.0/s 4.92
show_status 41 0.0/s 2.35
set_option 23 0.0/s 1.32

__ SELECT and Sort _____________________________________________________
Scan 653 0.2/s %SELECT: 53.52
Range 0 0/s 0.00
Full join 0 0/s 0.00
Range check 0 0/s 0.00
Full rng join 0 0/s 0.00
Sort scan 0 0/s
Sort range 590 0.1/s
Sort mrg pass 0 0/s

__ Query Cache _________________________________________________________
Memory usage 43.57k of 12.00M %Used: 0.35
Block Fragmnt 25.35%
Hits 1 0.0/s
Inserts 916 0.2/s
Insrt:Prune 916:1 0.2/s
Hit:Insert 0.00:1

__ Table Locks _________________________________________________________
Waited 0 0/s %Total: 0.00
Immediate 1.65k 0.4/s

__ Tables ______________________________________________________________
Open 47 of 1024 %Cache: 4.59
Opened 54 0.0/s

__ Connections _________________________________________________________
Max used 3 of 60 %Max: 5.00
Total 319.92k 77.3/s

__ Created Temp ________________________________________________________
Disk table 2 0.0/s
Table 28 0.0/s
File 5 0.0/s

__ Threads _____________________________________________________________
Running 3 of 3
Cached 0 of 4 %Hit: 100
Created 3 0.0/s
Slow 0 0/s

__ Aborted _____________________________________________________________
Clients 0 0/s
Connects 319.86k 77.3/s

__ Bytes _______________________________________________________________
Sent 52.36M 12.7k/s
Received 23.17M 5.6k/s
It's rarely advantageous to drop connections and re-establish them. Making a connection to a DB is normally a fairly heavy process. Lots of apps create connection pools just to avoid this: make some reasonable number of connections and keep them for long periods of time, maybe even forever, letting each thread or user take connections from the pool when they need them and then give them back.
If you're having a problem with connections getting orphaned -- the query fails and you never manage to free the connection -- the real solution is to implement proper exception handling so that doesn't happen.
If you have one thread sitting on an inactive connection while other threads are failing because the database has hit its connection maximum, then, yes, you need to look at freeing up connections.
Side note: MySQL claims that it can make connections very quickly, so that this is less of an issue than for other database engines. I have never benchmarked this.
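If you want to sanity-check this on your own server, MySQL exposes the relevant counters as status variables. A minimal sketch (these are standard status variable names, but verify them against the docs for your 5.1 build):

-- Total connection attempts since startup; a value growing by dozens
-- per second suggests something is reconnecting constantly.
SHOW GLOBAL STATUS LIKE 'Connections';
-- Failed connection attempts.
SHOW GLOBAL STATUS LIKE 'Aborted_connects';
-- Connections currently open.
SHOW GLOBAL STATUS LIKE 'Threads_connected';

Comparing two snapshots a few minutes apart tells you how fast the counters are actually moving.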
It's a bit hard to give an opinion considering that we have no idea what happens in step (2).
Remember that the first rule of optimization is: "Don't do it." Unless you have good reasons to address a performance problem (the DB is slow for other users, the CPU is maxed out during your cron process, and so on), it may be better not to do anything.
If you do have some reason to improve the efficiency of the program, then you will have hard numbers to compare against (for example: your cron batch takes so long that you had to skip some runs, or it ends too late to satisfy user requirements, or it fills up the rollback structures). You can then apply your modification in your test environment (it looks like a simple fix, unless you forgot to tell us that it would be very complicated to implement) and see if it improves what you measured and found lacking at the start.
I am sorry, but "I wonder if this could be more efficient" without an idea of what problem you are really trying to address is a recipe for problems.
If the bottleneck is that you do not have enough free slots to connect to the DB, then yes, close the connection when possible.
Otherwise, use it and reuse it (at least within the same request).