Mysql calculation in select statement - mysql

I have been doing my office work in Excel.and my records have become too much and want to use mysql.i have a view from db it has the columns "date,stockdelivered,sales" i want to add another calculated field know as "stock balance".
i know this is supposed to be done at the client side during data entry.
i have a script that generates php list/report only based on views and tables,it has no option for adding calculation fields, so i would like to make a view in mysql if possible.
in excel i used to do it as follows.
i would like to know if this is possible in mysql.
i don't have much experience with my sql but i imagine first
one must be able to select the previous row.colomn4
then add it to the current row.colomn2 minus current row.colomn3
If there is another way to achieve the same out put please suggest.

Generally speaking, SQL wasn't really intended to yield "running totals" like you desire. Other RDBMS have introduced proprietary extensions to deliver analytic functions which enable calculations of this sort, but MySQL lacks such features.
Instead, one broadly has four options. In no particular order:
Accumulate a running total in your application, as you loop over the resultset;
Alter your schema to keep track of a running total within your database (especially good in situations like this, where new data is only ever appended "to the end");
Group a self-join:
SELECT a.Sale_Date,
SUM(a.Stock_Delivered) AS Stock_Delivered,
SUM(a.Units_Sold) AS Units_Sold,
SUM(b.Stock_Delivered - b.Units_Sold) AS `Stock Balance`
FROM sales_report a
JOIN sales_report b ON b.Sale_Date <= a.Sale_Date
GROUP BY a.Sale_Date
Accumulate the running total in a user variable:
SELECT Sale_Date,
Stock_Delivered,
Units_Sold,
#t := #t + Stock_Delivered - Units_Sold AS `Stock Balance`
FROM sales_report, (SELECT #t:=0) init
ORDER BY Sale_Date

Eggyal has four good solutions. I think the cleanest way to do a running total in MySQL is using a correlated subquery -- it eliminates the group by at the end. So I would add to the list of options:
SELECT sr.Sale_Date, sr.Stock_Delivered, sr.Units_Sold,
(select SUM(sr2.Stock_Delivered) - sum(sr2.Units_Sold)
from sales_report sr2
where sr2.sale_date <= sr.sale_date
) as StockBalance
FROM sales_report sr
ORDER BY Sale_Date

SELECT
sales_report.Stock_Delivered,
sales_report.Units_Sold,
sales_report.Stock_Delivered - sales_report.Units_Sold
FROM
sales_report;

Related

Using "dynamic" variables in a MySQL query [duplicate]

I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.

Do we have a workaround to use alias with 'where' in sql

Sales :
Q1) Return the name of the agent who had the highest increase in sales compared to the previous year
A) Initially I wrote the following query
Select name, (sales_2018-sales_2017) as increase
from sales
where increase= (select max(sales_2018-sales_2017)
from sales)
I got an error saying I cannot use increase with the keyword where because "increase" is not a column but an alias
So I changed the query to the following :
Select name, (sales_2018-sales_2017) as increase
from sales
where (sales_2018-sales_2017)= (select max(sales_2018-sales_2017)
from sales)
This query did work, but I feel there should be a better to write this queryi.e instead of writing where (sales_2018-sales_2017)= (select max(sales_2018-sales_2017) from sales). So I was wondering if there is a work around to using alias with where.
Q2) suppose the table is as following, and we are asked to return the EmpId, name who got rating A for consecutive 3 years :
I wrote the following query its working :
select id,name
from ratings
where rating_2017='A' and rating_2018='A' and rating_2019='A'
Chaining 3 columns (ratings_2017,rating_2018,rating_2019) with AND is easy, I want know if there is a better way to chain columns with AND when say we want to find a employee who has rating 'A' fro 10 consective years.
Q3) Last but not the least, I'm really interested in learning to write intermediate-complex SQL queries and take my sql skills to next level. Is there a website out there that can help me in this regard ?
1) You are referencing an expression with a table column value, and therefore you would need to define the expression first(either using an inline view/cte for increase). After that you can refer it in the query
Eg:
select *
from ( select name, (sales_2018-sales_2017) as increase
from sales
)x
where x.increase= (select max(sales_2018-sales_2017)
from sales)
Another option would be to use analytical functions for getting your desired results, if you are in mysql 8.0
select *
from ( select name
,(sales_2018-sales_2017) as increase
,max(sales_2018-sales_2017) over(partition by (select null)) as max_increase
from sales
)x
where x.increase=x.max_increase
Q2) There are alternative ways to write this. But the basic issue is with the table design where you are storing each rating year as a new column. Had it been a row it would have been more easy.
Here is another way
select id,name
from ratings
where length(concat(rating_2017,rating_2018,rating_2019))-
length(replace(concat(rating_2017,rating_2018,rating_2019)),'A','')=3
Q3) Check out some example of problems from hackerrank or https://msbiskills.com/tsql-puzzles-asked-in-interview-over-the-years/. You can also search for the questions and answers from stackoverflow to get solutions to tough problems people faced
Q1 : you can simply order and limit the query results (hence no subquery is necessary) ; also, column aliases are allowed in the ORDER BY clause
SELECT
name,
sales_2018-sales_2017 as increase
FROM sales
ORDER BY increase DESC
LIMIT 1
Q2 : your query is fine ; other options exists, but they will not make it faster or easier to maintain.
Finally, please note that your best option overall would be to modify your database layout : you want to have yearly data in rows, not in columns ; there should be only one column to store the year instead of several. That would make your queries simpler to write and to maintain (and you wouldn’t need to create a new column every new year...)

Query takes too long to run

I am running the below query to retrive the unique latest result based on a date field within a same table. But this query takes too much time when the table is growing. Any suggestion to improve this is welcome.
select
t2.*
from
(
select
(
select
id
from
ctc_pre_assets ti
where
ti.ctcassettag = t1.ctcassettag
order by
ti.createddate desc limit 1
) lid
from
(
select
distinct ctcassettag
from
ctc_pre_assets
) t1
) ro,
ctc_pre_assets t2
where
t2.id = ro.lid
order by
id
Our able may contain same row multiple times, but each row with different time stamp. My object is based on a single column for example assettag I want to retrieve single row for each assettag with latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This does assume that no ctcassettag has multiple rows with the same createddate, in which case you can get back more than one row per ctcassettag.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
INNER JOIN
(
SELECT
ctcassettag,
MAX(createddate) AS createddate
FROM
ctc_pre_assets
GROUP BY
ctcassettag
)
newest
ON newest.ctcassettag = ctc_pre_assets.ctcassettag
AND newest.createddate = ctc_pre_assets.createddate
ORDER BY
ctc_pre_assets.id
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
WHERE
ctc_pre_assets.id
=
(
SELECT
lookup.id
FROM
ctc_pre_assets lookup
WHERE
lookup.ctcassettag = ctc_pre_assets.ctcassettag
ORDER BY
lookup.createddate DESC,
lookup.id ASC
LIMIT
1
)
This does still use a correlated sub-query, which is slower than a simple nested-sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
It's also very similar to your own query, but with one less join.
Nested queries are always known to take longer time than a conventional query since. Can you append 'explain' at the start of the query and put your results here? That will help us analyse the exact query/table which is taking longer to response.
Check if the table has indexes. Unindented tables are not advisable(until unless obviously required to be unindented) and are alarmingly slow in executing queries.
On the contrary, I think the best case is to avoid writing nested queries altogether. Bette, run each of the queries separately and then use the results(in array or list format) in the second query.
First some questions that you should at least ask yourself, but maybe also give us an answer to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
SELECT id
FROM ctc_pre_assets
GROUP BY ctcassettag
HAVING createddate = max(createddate)
ORDER BY ctcassettag DESC
) ro
INNER JOIN ctc_pre_assets t2 ON t2.id = ro.lid
ORDER BY id
Using normalization is great, but there are a few caveats where normalization causes more harm than good. This seems like a situation like this, but without your tables infront of me, I can't tell for sure.
Using distinct the way you are doing, I can't help but get the feeling you might not get all relevant results - maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scaleability issues if written incorrectly. Make sure you use them the right way (google it?)
Indexes can potentially save you for a bunch of time - if you actually use them. It's not enough to set them up, you have to create queries that actually uses your indexes. Google this as well.

Write a query that will compare its results to the same results but with another date as reference

I have a query that compares the final balance of a month with the final balance of the same month but from the year before.
The query works just fine, the issue is when I want to check against more than 2 years before, a query was made by my predecessor but this query takes too much time to print the results, it just adds another query per year of what we want to see, so the higher the year, the larger the query.
Another predecessor created a pivot table to see the results to present his information, only showing up to 3 years before, the query itself is good but when we want to display the whole information due to all the joins and unions the query becomes inefficient time-wise.
The project has been recently passed on to me, I see the original(structure/backbone) query looks good in order to achieve the results of the months final balance compared to last years monthly final balance, but I wish to make a more dynamic report regardless of the year/month we're looking into, and not just entirely hard coded or with repetition of the same query over and over again. But I've literally hit a wall since I can't come up with any idea of how to make it work in a more dynamic way. I'm fairly new to reporting and data analysis and that's basically what's limiting my progress.
SELECT T2.[Segment_0]+'-'+T2.[Segment_1]+'-'+T2.[Segment_2] Cuenta,
T2.[AcctName], SUM(T0.[Debit]) Debito, SUM(T0.[Credit]) Credito,
SUM(T0.[Debit])-SUM(T0.[Credit]) Saldo
FROM [server].[DB1].[dbo].[JDT1] T0
INNER JOIN [server].[DB1].[dbo].[OJDT] T1
ON T1.[TransId] = T0.[TransId]
INNER JOIN [server].[DB1].[dbo].[oact] T2
ON T2.[AcctCode] = T0.[Account]
WHERE T0.[RefDate] >= '2007-12-31' AND T0.[RefDate] <= '2016-06-30'
GROUP BY T2.[Segment_0]+'-'+T2.[Segment_1]+'-'+T2.[Segment_2],T2.[AcctName]
I'm not looking for someone to do this for me, but for someone who can point me and guide through the best possible course of action to achieve this.
Here are some suggestions:
It isn't clear to me why you need [server].[DB1].[dbo].[OJDT] T1. Its data doesn't appear in the output and it isn't needed to join T0 to T2. If you can omit it, do so.
If you can't omit it because you need to exclude transactions from T0 that aren't in T1, use an EXISTS clause rather than joining it in.
Use a CTE to group the T0 records by Account, and then join the CTE to T2. That way T2 doesn't have to join to every record in T0, just the summarized result. You also don't need to group by your composite field and your account name, because if you do your grouping in the CTE, they won't be grouped.
Here's a sort of outline of what that would look like:
;
WITH Summed as (
SELECT Account
, SUM(Credito) as SumCredito
...
FROM [JDT1] T0
WHERE T0.[RefDate] >= ...
GROUP BY Account
)
SELECT (.. your composite segment field ..)
, AccountName
, SumCredito
FROM Summed T1
JOIN [oact] T2
ON T1.account = T2.acctcode
If you want dynamic dates, you will probably need to parameterize this and turn it into a stored proc if it isn't one already.
Push as much formatting (which includes pivoting already-grouped data from a list into a matrix) into the reporting tool as possible. Achieving dynamic pivoting is tricky in T-SQL but trivial in SSRS, to pick just one tool.
Remember, you can always dynamically set the column headers in your tool: you don't have to change the column names in your data.
Hope this helps.

Getting different results from group by and distinct

this is my first post here since most of the time I already found a suitable solution :)
However this time nothing seems to help properly.
Im trying to migrate information from some mysql Database I have just read-only access to.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables but my tables have >300k entries therefore checking whether the "time-attribute-value" is the same as in the subquery (like suggested in the first answer) would be too slow (once I did "... WHERE EXISTS ..." and the server hung up).
In addition to that I can hardly find the important information (e.g. time) in a single attribute and there never is a single primary key.Until now I did it like it was suggested in the second answer by joining with subquery that contains latest "time-attribute-entry" and some primary keys but that gets me in a huge mess after using multiple joins and unions with the results.
Therefore I would prefer using the having statement like here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate as the "time-attribute" I noticed that this queries give me two different results (more = 39721, less = 37870)
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
)AS TEMP2
Although both are applied to the same table and use "GROUP BY" / "SELECT DISTINCT" on the same entries.
Any ideas?
If nothing helps and I have to go back to my mess I will use string variables as placeholders to tidy it up but then I lose the overview of how many subqueries, joins and unions I have in one query... how many temproal tables will the server be able to cope with?
Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the having clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATAUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.
Often, the arbitrary value will not be the maximum value, so the rows get filtered. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
It does not, however, affect the outer count.