How to divide two values in same column - mysql

I am trying to divide "counts". The requirement for the division is that the values should have the same batch ID and Carr_Acronym. The divisor should be the count value for "Dental - NEDB" and the dividend should be "Added from batch".
How can I do that?
Here's a data sample:
Batch   Carr_Acronym   DATE         Count   Datatype
45056   ARM            12/31/2014   20      Added from batch
45056   ARM            12/31/2014   0       Deleted from batch
45056   ARM            12/31/2014   5       Dental - NEDB
45055   CUU            12/31/2014   0       Dental - NEDB

Something like this should work. You can create a temporary table to join on and select exactly what you need; the temporary table is dropped automatically when the session ends. The query below duplicates your current table (from the question) into a temp table, joins the two (the original plus the temp copy) on matching Batch and Carr_Acronym, and divides the counts where the Datatypes have the appropriate values.
CREATE TEMPORARY TABLE IF NOT EXISTS tempTable AS (SELECT * FROM MyTable);
SELECT (`a`.`Count` / `b`.`Count`) as `result`
FROM MyTable `a`
INNER JOIN tempTable `b` ON (`a`.`Batch` = `b`.`Batch`) AND (`a`.`Carr_Acronym` = `b`.`Carr_Acronym`)
WHERE a.Datatype LIKE 'added%' AND b.Datatype LIKE 'dental%';
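As a hedged aside: since MyTable is a base table (MySQL's single-reference restriction only applies to TEMPORARY tables), a plain self-join without the temporary copy should give the same result:
SELECT (`a`.`Count` / `b`.`Count`) as `result`
FROM MyTable `a`
INNER JOIN MyTable `b` ON (`a`.`Batch` = `b`.`Batch`) AND (`a`.`Carr_Acronym` = `b`.`Carr_Acronym`)
WHERE a.Datatype LIKE 'added%' AND b.Datatype LIKE 'dental%';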

One approach is to use conditional aggregation:
select batch, Carr_Acronym,
(sum(case when datatype = 'Added from batch' then count else 0 end) /
sum(case when datatype = 'Dental - NEDB' then count end)
) as ratio
from MyTable t
group by batch, Carr_Acronym;
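With the sample data above, the conditional aggregation query should return something along these lines (20 / 5 for batch 45056; the 0 / 0 case for batch 45055 comes back as NULL, since MySQL returns NULL on division by zero):
batch   Carr_Acronym   ratio
45056   ARM            4.0000
45055   CUU            NULL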

How to check in MySQL whether the result of one SELECT contains another SELECT?

I have tables
CREATE TABLE one (
op INT,
value INT
);
and
CREATE TABLE two (
tp INT,
value INT
);
Now I want to get all op values for which the set of values for the op contains all values for a given tp.
I would write this as:
SELECT op FROM one AS o1 WHERE (
(SELECT value FROM one AS o2 WHERE o1.op = o2.op)
CONTAINS ALL
(SELECT value FROM two WHERE tp=<specific-value>)
)
Unfortunately, I couldn't find such a CONTAINS ALL operator, nor anything close to it.
Table one contains 50M entries, table two contains 1M entries. On average, there are 20 different values for a single op and tp.
Assume your tables are named ops and tps.
SELECT
ops.op
FROM ops
INNER JOIN tps ON tps.value = ops.value
WHERE tps.tp = 1
GROUP BY ops.op
HAVING COUNT(DISTINCT ops.value) = (SELECT COUNT(DISTINCT tps.value) FROM tps WHERE tps.tp = 1); -- you can replace 1 with any tp value
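The HAVING clause performs relational division: an op qualifies only when the number of distinct values it matches equals the total number of distinct values for the chosen tp. Given the table sizes in the question (50M and 1M rows), composite indexes on the join and filter columns should help; a minimal sketch, assuming the ops/tps names used above (the index names are hypothetical):
CREATE INDEX idx_tps_tp_value ON tps (tp, value);
CREATE INDEX idx_ops_value_op ON ops (value, op);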

Calculate the rate of change in SSRS

I need to calculate the rate of change in drug usage between two dates using SSRS. I am used to using SQL, therefore I am having a difficult time with SSRS.
I have the following information:
                          Date        Avg_Alcohol_Use_Month   Avg_Drug_Use_Month
First_interview_Date      1/30/2017   1.63%                   1.34%
Followup_interview_date   6/30/2017   2.58%                   0.80%
How do I create a report that reflects the rate of change in drug usage between two dates? I need to create the report in SSRS, but I don't know how to write a query in SSRS that will reflect the rate of change.
I cannot create the query in SQL because I only have access to the data through SSRS.
(This example is for SQL Server)
You can do it in SQL if you save the initial results in a table or a temporary data structure. With a correlated subquery, you can subtract the previous row's rate from the current row's rate. The rows are ordered by date, so you pick the MAX(date) before the current row for the given patient/doctor, or whatever the primary key is. In this case I have used "PatientID" to identify the patient. See below:
--Assume your values are saved in a table or other temp table
DECLARE @tmp TABLE (PatientID int, Interview_Date date, Avg_Alcohol_Use_Month decimal (4,2), Avg_Drug_Use_Month decimal (4,2))
INSERT INTO @tmp
VALUES
(1, '2017-01-30', 1.63, 1.34)
,(2, '2017-06-30', 2.58, 0.80)
,(1, '2017-03-01', 1.54, 1.23)
,(1, '2017-07-02', 3.21, 0.20)
,(2, '2017-08-23', 2.10, 4.52)
SELECT PatientID
,Interview_Date
,Avg_Alcohol_Use_Month
,Avg_Drug_Use_Month
,Avg_Alcohol_Use_Month
-
(SELECT Avg_Alcohol_Use_Month
FROM @tmp T2
WHERE T2.PatientID = T1.PatientID
AND T2.Interview_Date = (SELECT MAX(Interview_Date)
FROM @tmp T3
WHERE T3.Interview_Date < T1.Interview_Date
AND T3.PatientID = T1.PatientID
-- or whatever PK makes the row unique for the patient.
)
) AS [Alcohol Use Rate change]
,Avg_Drug_Use_Month
-
(SELECT Avg_Drug_Use_Month
FROM @tmp T2
WHERE T2.PatientID = T1.PatientID
AND T2.Interview_Date = (SELECT MAX(Interview_Date)
FROM @tmp T3
WHERE T3.Interview_Date < T1.Interview_Date
AND T3.PatientID = T1.PatientID
-- or whatever PK makes the row unique for the patient.
)
) AS [Drug Use Rate change]
FROM @tmp T1
ORDER BY PatientID, Interview_Date
Use such a query as the dataset for SSRS.
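If your SQL Server version is 2012 or later, the LAG() window function is a simpler way to get the previous interview's values; a minimal sketch, assuming the same @tmp table variable as above:
SELECT PatientID
,Interview_Date
,Avg_Alcohol_Use_Month
 - LAG(Avg_Alcohol_Use_Month) OVER (PARTITION BY PatientID ORDER BY Interview_Date) AS [Alcohol Use Rate change]
,Avg_Drug_Use_Month
 - LAG(Avg_Drug_Use_Month) OVER (PARTITION BY PatientID ORDER BY Interview_Date) AS [Drug Use Rate change]
FROM @tmp
ORDER BY PatientID, Interview_Date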

MYSQL, Creating a view and pulling information from two tables

Okay so I have two tables:
hscust and hssales_rep
I need to create a view that shows me the rep's fname and lname (as well as the customer's) and how much the customer is over on their credit balance.
This is the code I have:
CREATE VIEW OverLimit AS
SELECT
CONCAT(hssales_rep.last,hssales_rep.first) AS Rep,
CONCAT(hscust.last,hscust.first) AS Cust,
SUM(credit_limit - balance)
FROM hscust
INNER JOIN hssales_rep ON hscust.sales_rep = hssales_rep.repid
And it returns an empty result.
Any help is greatly appreciated!
(screenshots of the salesrep and cust tables omitted)
A CREATE VIEW statement doesn't return a resultset.
A SELECT statement can return an empty resultset. But we'd expect the SELECT statement in your view definition to return either a single row, or throw an error.
I suggest you break this down a bit.
1) What problem is being solved by the CREATE VIEW statement? Why do you need a view?
2) Before you write a CREATE VIEW statement, first develop and test a SELECT statement that returns the required resultset. Do that before you put that into a view definition.
I also strongly recommend that you qualify all column references in the SELECT statement either with the table name or (preferably) a short table alias.
If you want to return a row for each Cust with an aggregate function (e.g. SUM) in your SELECT list, then add an appropriate GROUP BY clause to your SELECT statement.
It's not clear why we would want to use a SUM aggregate function.
The difference between "credit_limit" and "balance" would be the available (remaining) credit. A negative value would indicate the balance was "over" the credit limit.
SELECT CONCAT(r.last,r.first) AS Rep
, CONCAT(c.last,c.first) AS Cust
, c.credit_limit - c.balance AS available_credit
FROM hscust c
JOIN hssales_rep r
ON c.sales_rep=r.repid
ORDER
BY CONCAT(r.last,r.first)
, CONCAT(c.last,c.first)
, c.custid
If we only want to return rows for customers that are "over" their credit limit, we can add a WHERE clause.
SELECT CONCAT(r.last,r.first) AS Rep
, CONCAT(c.last,c.first) AS Cust
, c.credit_limit - c.balance AS available_credit
FROM hscust c
JOIN hssales_rep r
ON c.sales_rep=r.repid
WHERE c.credit_limit - c.balance < 0
ORDER
BY CONCAT(r.last,r.first)
, CONCAT(c.last,c.first)
, c.custid
Again, get a SELECT statement working (returning the required resultset) before you wrap it in a CREATE VIEW.
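Once that SELECT returns what you expect, wrapping it in the view is mechanical; a minimal sketch reusing the names from the question (the ORDER BY is left to whatever query reads from the view):
CREATE VIEW OverLimit AS
SELECT CONCAT(r.last,r.first) AS Rep
, CONCAT(c.last,c.first) AS Cust
, c.credit_limit - c.balance AS available_credit
FROM hscust c
JOIN hssales_rep r
ON c.sales_rep=r.repid
WHERE c.credit_limit - c.balance < 0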

MySQL query index & performance improvements

I have created an application to track progress in League of Legends for me and my friends. For this purpose, I collect information about the current rank several times a day and store it in my MySQL database. To fetch the results and display them in a graph, I use the following queries:
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
WHERE
lol_summoner.name IN (". str_repeat('?, ', count($names) - 1) ."?)
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
The first query is used when I want to retrieve all players saved in the database. The grid table is a temporary table containing generated timestamps at a specific interval, so that information can be retrieved in chunks of that interval. The two placeholders in this query are the interval. The second query is used when I want to retrieve information for specific players only.
The grid table is produced by the following stored procedure, which is called with three parameters (n_first - first timestamp, n_last - last timestamp, n_increment - increment between two timestamps):
BEGIN
-- Create tmp table
DROP TEMPORARY TABLE IF EXISTS series_tmp;
CREATE TEMPORARY TABLE series_tmp (
series bigint
) engine = memory;
WHILE n_first <= n_last DO
-- Insert in tmp table
INSERT INTO series_tmp (series) VALUES (n_first);
-- Increment value by n_increment
SET n_first = n_first + n_increment;
END WHILE;
END
The query works and finishes in reasonable time (~10 seconds), but I am thankful for any help to improve the query, either by rewriting it or by adding additional indexes to the database.
/Edit:
After reviewing @Rick James' answer, I modified the queries as follows:
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
WHERE lol_summoner.name IN (<NAMES>)
GROUP by lol_summoner.name, lol.timestamp div " . $steps . "
ORDER by name, timestamp ASC
This improves the query execution time by a large margin (it now finishes well under 1 second).
Problem 1 and Solution
You need a series of integers between two values? And they differ by 1? Or by some larger value?
First, create a permanent table of the numbers from 0 to some large enough value:
CREATE TABLE Num10 ( n INT );
INSERT INTO Num10 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE Nums ( n INT, PRIMARY KEY(n))
SELECT a.n*1000 + b.n*100 + c.n*10 + d.n AS n
FROM Num10 AS a
JOIN Num10 AS b -- note "cross join"
JOIN Num10 AS c
JOIN Num10 AS d;
Now Nums has 0..9999. (Make it bigger if you might need more.)
To get a sequence of consecutive numbers from 123 through 234:
SELECT 123 + n FROM Nums WHERE n < 234-123+1;
To get a sequence of consecutive numbers from 12345 through 23456, in steps of 15:
SELECT 12345 + 15*n FROM Nums WHERE n < (23456-12345+1)/15;
JOIN to a SELECT like one of those instead of to series_tmp.
Barring other issues, that should significantly speed things up.
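For example, the first query could join to such a derived table instead of series_tmp; a minimal sketch, assuming the Nums table above, with hypothetical :first, :step, :buckets and :half_range placeholders standing in for the original ? parameters:
SELECT
lol_summoner.name as name, grid.series + :half_range as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
(SELECT :first + :step * n AS series FROM Nums WHERE n < :buckets) AS grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + :step
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC;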
Problem 2
You are GROUPing BY series, but ORDERing by timestamp. They are related, so you might get the 'right' answer. But think about it.
Problem 3
You seem to be building "buckets" (called "series"?) from "timestamps". Is this correct? If so, let's work backwards -- Turn a "timestamp" into a "bucket" number:
bucket_number = (timestamp - start) / bucket_size
By doing that throughout, you can avoid 'Problem 1' and eliminate my solution to it. That is, reformulate the entire queries in terms of buckets.
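A minimal sketch of that reformulation, with hypothetical :start and :bucket_size placeholders; it is essentially what the edited queries at the top of the question already do:
SELECT lol_summoner.name as name,
(lol.timestamp - :start) DIV :bucket_size as bucket_number,
AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY lol_summoner.name, bucket_number
ORDER BY name, bucket_number;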

Optimizing tree branch data aggregation in SQL Server 2008 (recursion)

I have a table containing stages and sub-stages of certain projects, and a table with specific tasks and estimated costs.
I need some way to aggregate each level (stages/sub-stages) to see how much it costs, but with minimal performance cost.
To illustrate this, I will use the following data structure:
CREATE TABLE stage
(
id int not null,
fk_parent int
)
CREATE TABLE task
(
id int not null,
fk_stage int not null,
cost decimal(18,2) not null default 0
)
with the following data:
==stage==
id fk_parent
1 null
2 1
3 1
==task==
id fk_stage cost
1 2 100
2 2 200
3 3 600
I want to obtain a table containing the total costs on each branch. Something like this:
Stage ID Total Cost
1 900
2 300
3 600
But I also want it to be efficient. I don't want to end up with an extremely bad solution like "The worst algorithm in the world", which is what the naive approach amounts to here: if I request the data for all the items in the stage table with their total costs, each total cost gets evaluated D times, where D is the depth (level) in the tree at which it sits. I am afraid I'll hit extremely poor performance with large amounts of data and many levels.
SO,
I decided to do something which made me ask this question here.
I decided to add 2 more columns to the stage table, for caching.
...
calculated_cost decimal(18,2),
date_calculated_cost datetime
...
So what I wanted to do is pass another variable within the code, a datetime value equal to the time when this process was started (pretty much unique). That way, if the stage row already has a date_calculated_cost equal to the one I'm carrying, I don't bother calculating it again and just return the calculated_cost value.
I couldn't do it with Functions (updates are needed to the stage table, once costs are calculated)
I couldn't do it with Procedures (recursion within running cursors is a no-go)
I am not sure temporary tables are suitable because they wouldn't allow concurrent requests to the same procedure (which are unlikely, but I want to do it the right way anyway)
I couldn't figure out other ways.
I am not expecting a definite answer to my question, but I will reward any good idea, and the best will be chosen as the answer.
1. A way to query the tables to get the aggregated cost.
Calculate the cost for each stage.
Use a recursive CTE to get the level for each stage.
Store the result in a temp table.
Add a couple of indexes to the temp table.
Update the cost in the temp table in a loop, one level at a time.
The first three steps are combined into one statement. It might be good for performance to run the first calculation, cteCost, into a temp table of its own and use that temp table in the recursive cteLevel (a sketch of that variant follows the code below).
;with cteCost as
(
select s.id,
s.fk_parent,
isnull(sum(t.cost), 0) as cost
from stage as s
left outer join task as t
on s.id = t.fk_stage
group by s.id, s.fk_parent
),
cteLevel as
(
select cc.id,
cc.fk_parent,
cc.cost,
1 as lvl
from cteCost as cc
where cc.fk_parent is null
union all
select cc.id,
cc.fk_parent,
cc.cost,
lvl+1
from cteCost as cc
inner join cteLevel as cl
on cc.fk_parent = cl.id
)
select *
into #task
from cteLevel
create clustered index IX_id on #task (id)
create index IX_lvl on #task (lvl, fk_parent)
declare @lvl int
select @lvl = max(lvl)
from #task
while @lvl > 0
begin
update T1 set
T1.cost = T1.cost + T2.cost
from #task as T1
inner join (select fk_parent, sum(cost) as cost
from #task
where lvl = @lvl
group by fk_parent) as T2
on T1.id = T2.fk_parent
set @lvl = @lvl - 1
end
select id as [Stage ID],
cost as [Total Cost]
from #task
drop table #task
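A minimal sketch of the variant mentioned above, materializing the per-stage sums into a temp table of their own (the #cteCost name is hypothetical) before running the recursive CTE:
select s.id,
s.fk_parent,
isnull(sum(t.cost), 0) as cost
into #cteCost
from stage as s
left outer join task as t
on s.id = t.fk_stage
group by s.id, s.fk_parent
-- then reference #cteCost instead of cteCost inside cteLevel,
-- and drop it together with #task when done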
2. A trigger on table task that maintains a calculated_cost field in stage.
create trigger tr_task
on task
after insert, update, delete
as
-- Table to hold the updates
declare @T table
(
id int not null,
cost decimal(18,2) not null default 0
)
-- Get the updates from inserted and deleted tables
insert into @T (id, cost)
select fk_stage, sum(cost)
from (
select fk_stage, cost
from inserted
union all
select fk_stage, -cost
from deleted
) as T
group by fk_stage
declare @id int
select @id = min(id)
from @T
-- For each updated row
while @id is not null
begin
-- Recursive update of stage
with cte as
(
select s.id,
s.fk_parent
from stage as s
where id = @id
union all
select s.id,
s.fk_parent
from stage as s
inner join cte as c
on s.id = c.fk_parent
)
update s set
calculated_cost = s.calculated_cost + t.cost
from stage as s
inner join cte as c
on s.id = c.id
cross apply (select cost
from @T
where id = @id) as t
-- Get the next id
select @id = min(id)
from @T
where id > @id
end
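One caveat worth noting: the trigger only applies deltas, so calculated_cost must be seeded with correct totals before the trigger takes over (a NULL starting value would stay NULL). A minimal sketch of that seeding, assuming it runs right after the loop in part 1 while #task still exists; the date_calculated_cost line is only relevant if you keep that column:
update s set
s.calculated_cost = t.cost,
s.date_calculated_cost = getdate()
from stage as s
inner join #task as t
on t.id = s.id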