SQL Replace Join With Something Faster [closed] - mysql

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I have a view in SQL that uses a join and takes much longer than I would like. I think it would run much faster if I converted it to a subquery instead, but I'm having trouble with that.
Basically, I want to create a "target" column that calculates the 24h percent change of the price of an asset. Right now, the way I go about it is I create a first view which is the normal table, and then a second view which is a copy of the first table but with date+1, which I can then use to calculate the 24h target. Below is my SQL code; I am working in MySQL.
create view PricesView1 as
select Date,Symbol, avg(Price) as 'Price', avg(BTC_Dominance) as 'BTC_Dominance',
pkdummy,pkey from Prices group by Date,pkdummy,pkey, Symbol
having right(pkdummy,2)=22 and Date > '2018-11-22';
create view PricesView2 as
select sq.Date, sq.oldDate, sq.Symbol, sq.Price, newP.Price as 'NewPrice',
newP.BTC_Dominance as 'NewBTCdominance', newP.pkdummy from (
select date_add(Date, INTERVAL 1 DAY) as 'Date', Date as 'oldDate',Symbol,avg(Price) as 'Price',
avg(BTC_Dominance) as 'BTC_Dominance', pkdummy,pkey from Prices
group by Date,date_add(Date, INTERVAL 1 DAY),pkdummy,pkey, Symbol having right(pkdummy,2)=22)sq
join Prices newP on newP.Date=sq.Date and newP.Symbol=sq.Symbol
where right(newP.pkdummy,2)=22 and sq.Date > '2018-11-22' order by datetime desc;
#Use other two views to calculate target
create view priceTarget as
select pv1.Date, pv1.Symbol, avg(pv1.Price) as 'Initial Price', avg(pv2.NewPrice) as 'Price24hLater',
avg(((pv2.NewPrice-pv1.Price)/pv1.Price)*100) as 'Target24hChange',
avg(((pv2.NewBTCdominance-pv1.BTC_Dominance)/pv1.BTC_Dominance)*100) as 'BTCdominance24hChange',
pv1.pkey from PricesView1 pv1
join PricesView2 pv2 on pv1.Date=pv2.oldDate and pv1.Symbol=pv2.Symbol
group by pv1.Date, pv1.Symbol;
Here is a screenshot of the output of the query: SELECT * FROM priceTarget WHERE symbol = 'btc' ORDER BY date desc;
Any thoughts on how I can achieve the same result with a faster query that avoids using a join?
Any help would be very much appreciated!
EDIT: I guess it just comes down to the fact that I simply have a lot of data being loaded. I created a new first view to filter my data ahead of time and that reduced the load times from around 32 seconds to just over 10 seconds. Thanks to those who helped!

In the creation of PricesView2 there seems to be some unnecessary code.
For example, the ORDER BY at the end; and you calculate Price and BTC_Dominance but don't use them in the priceTarget view (you use the values already available from PricesView1). I suspect you left them there to get unique date/symbol rows; you can use SELECT DISTINCT to achieve the same result.
I don't know if it is intentional, but Price and BTC_Dominance are calculated from an average in PricesView1 while they are not in PricesView2.
This is my suggestion for the PricesView2:
create view PricesView2 as
select
sq.oldDate,
newP.Date,
sq.Symbol,
newP.Price as 'NewPrice',
newP.BTC_Dominance as 'NewBTCdominance',
newP.pkdummy
from (
select distinct
Date as 'oldDate',
Symbol,
pkdummy,
pkey
from Prices
where right(pkdummy,2)=22) sq
join Prices newP on
newP.Date=date_add(sq.oldDate, INTERVAL 1 DAY)
and newP.Symbol=sq.Symbol
where right(newP.pkdummy,2)=22
and sq.oldDate > '2018-11-22';
My understanding of views is that they are comparable to macros in other languages: more like code replacement than pre-computation.
So when priceTarget computes avg(pv1.Price), and pv1.Price is itself defined as avg(Price), you are averaging an average.
In addition to the changes I suggested above, I'd change PricesView2 to calculate the new price and BTC_Dominance averages so the priceTarget view doesn't have to.
Lastly, in your priceTarget view you should also group by pv1.pkey in addition to pv1.Date and pv1.Symbol.
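If you are on MySQL 8.0+, another way to avoid the self-join entirely is a window function: LAG() fetches the previous day's row in a single pass. Here is a minimal sketch of the idea, run against SQLite (whose window-function syntax matches MySQL 8's for this case), with made-up sample data and the simplifying assumption of one price row per symbol per day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prices (Date TEXT, Symbol TEXT, Price REAL);
INSERT INTO Prices VALUES
  ('2018-11-23', 'btc', 100.0),
  ('2018-11-24', 'btc', 110.0),
  ('2018-11-25', 'btc',  99.0);
""")

# LAG(Price) returns the previous row's Price within each Symbol, ordered
# by Date, so the 24h percent change is computed without any self-join.
rows = conn.execute("""
SELECT Date, Symbol, Price,
       (Price - LAG(Price) OVER (PARTITION BY Symbol ORDER BY Date)) * 100.0
       / LAG(Price) OVER (PARTITION BY Symbol ORDER BY Date) AS Target24hChange
FROM Prices
ORDER BY Date
""").fetchall()

for r in rows:
    print(r)
```

The first row of each symbol has no previous day, so its change is NULL; in the real tables you would still need the per-day averaging and the pkdummy filter from the original views.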

I would first do some analysis on the queries themselves to find out what is causing the bottleneck, i.e. how the tables are being accessed, how many rows each table returns, which indexes are being used, etc. Simple things like reordering the tables in the FROM clause could help performance. Your queries may be lacking an index or two that would greatly improve performance.
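For instance, a composite index covering the join and filter columns is often the biggest win for a query like this. A small sketch of the check (SQLite here, but CREATE INDEX syntax is the same in MySQL; the index name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Prices (Date TEXT, Symbol TEXT, Price REAL)")
conn.execute("INSERT INTO Prices VALUES ('2018-11-23', 'btc', 100.0)")

# A composite index matching the join/filter columns lets the self-join
# seek directly to (Symbol, Date) pairs instead of scanning the table.
conn.execute("CREATE INDEX idx_prices_symbol_date ON Prices (Symbol, Date)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT Price FROM Prices WHERE Symbol = 'btc' AND Date = '2018-11-23'"
).fetchall()
print(plan)  # the plan should mention idx_prices_symbol_date
```

In MySQL you would run EXPLAIN on the actual view query instead and check the `key` and `rows` columns of its output.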

Related

Do we have a workaround to use alias with 'where' in sql

Sales :
Q1) Return the name of the agent who had the highest increase in sales compared to the previous year
A) Initially I wrote the following query
Select name, (sales_2018-sales_2017) as increase
from sales
where increase= (select max(sales_2018-sales_2017)
from sales)
I got an error saying I cannot use increase in the WHERE clause because "increase" is not a column but an alias.
So I changed the query to the following :
Select name, (sales_2018-sales_2017) as increase
from sales
where (sales_2018-sales_2017)= (select max(sales_2018-sales_2017)
from sales)
This query did work, but I feel there should be a better way to write it, i.e. instead of writing where (sales_2018-sales_2017)= (select max(sales_2018-sales_2017) from sales). So I was wondering if there is a workaround to using an alias with where.
Q2) Suppose the table is as follows, and we are asked to return the EmpId and name of whoever got rating A for 3 consecutive years:
I wrote the following query and it's working:
select id,name
from ratings
where rating_2017='A' and rating_2018='A' and rating_2019='A'
Chaining 3 columns (rating_2017, rating_2018, rating_2019) with AND is easy; I want to know if there is a better way to chain columns with AND when, say, we want to find an employee who has rating 'A' for 10 consecutive years.
Q3) Last but not least, I'm really interested in learning to write intermediate-to-complex SQL queries and taking my SQL skills to the next level. Is there a website out there that can help me in this regard?
1) You are referencing an expression by its alias where a table column is expected, so you need to define the expression first (e.g. in an inline view/CTE for increase). After that you can refer to it in the query.
Eg:
select *
from ( select name, (sales_2018-sales_2017) as increase
from sales
)x
where x.increase= (select max(sales_2018-sales_2017)
from sales)
Another option would be to use analytic functions to get your desired results, if you are on MySQL 8.0:
select *
from ( select name
,(sales_2018-sales_2017) as increase
,max(sales_2018-sales_2017) over(partition by (select null)) as max_increase
from sales
)x
where x.increase=x.max_increase
Q2) There are alternative ways to write this. But the basic issue is with the table design, where you are storing each rating year as a new column. Had each year been a row, it would have been much easier.
Here is another way
select id,name
from ratings
where length(concat(rating_2017,rating_2018,rating_2019))-
length(replace(concat(rating_2017,rating_2018,rating_2019),'A',''))=3
Q3) Check out some example problems from HackerRank or https://msbiskills.com/tsql-puzzles-asked-in-interview-over-the-years/. You can also search Stack Overflow questions and answers to see solutions to tough problems people have faced.
Q1: you can simply order and limit the query results (hence no subquery is necessary); also, column aliases are allowed in the ORDER BY clause:
SELECT
name,
sales_2018-sales_2017 as increase
FROM sales
ORDER BY increase DESC
LIMIT 1
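As a quick check of that approach (SQLite here; MySQL behaves the same way for aliases in ORDER BY), with a toy sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (name TEXT, sales_2017 REAL, sales_2018 REAL);
INSERT INTO sales VALUES ('alice', 100, 180), ('bob', 200, 210), ('carol', 50, 90);
""")

# The alias 'increase' is legal in ORDER BY (unlike WHERE), so sorting
# descending and taking the first row yields the biggest gain directly.
row = conn.execute("""
SELECT name, sales_2018 - sales_2017 AS increase
FROM sales
ORDER BY increase DESC
LIMIT 1
""").fetchone()
print(row)  # → ('alice', 80.0)
```

One caveat: with ties for the maximum, LIMIT 1 returns an arbitrary one of the tied rows, whereas the subquery version returns all of them.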
Q2: your query is fine; other options exist, but they will not make it faster or easier to maintain.
Finally, please note that your best option overall would be to modify your database layout: you want yearly data in rows, not in columns; there should be a single column storing the year instead of one column per year. That would make your queries simpler to write and maintain (and you wouldn't need to create a new column every new year...)
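To make that layout suggestion concrete, here is a sketch (SQLite, with hypothetical table and column names) of a one-row-per-year ratings table, where "rated A for N consecutive years" becomes a count over a year range instead of N chained columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (id INTEGER, name TEXT, year INTEGER, rating TEXT);
INSERT INTO ratings VALUES
  (1, 'ann', 2017, 'A'), (1, 'ann', 2018, 'A'), (1, 'ann', 2019, 'A'),
  (2, 'bob', 2017, 'A'), (2, 'bob', 2018, 'B'), (2, 'bob', 2019, 'A');
""")

# With one row per (employee, year), "rated A every year from 2017-2019"
# is a HAVING count instead of N AND-ed columns; extending to 10 years
# only changes the BETWEEN bounds and the count.
rows = conn.execute("""
SELECT id, name
FROM ratings
WHERE year BETWEEN 2017 AND 2019 AND rating = 'A'
GROUP BY id, name
HAVING COUNT(DISTINCT year) = 3
""").fetchall()
print(rows)  # → [(1, 'ann')]
```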

Write a query that will compare its results to the same results but with another date as reference

I have a query that compares the final balance of a month with the final balance of the same month but from the year before.
The query works just fine. The issue is when I want to check against more than 2 years back: my predecessor's version just adds another query per extra year, so the further back we go, the larger (and slower) the query gets.
Another predecessor created a pivot table to present the results, showing up to 3 years back. The query itself is fine, but when we want to display the whole history, all the joins and unions make it inefficient time-wise.
The project has recently been passed on to me. The original (structure/backbone) query looks good for comparing a month's final balance with the same month of the previous year, but I would like to build a more dynamic report, regardless of the year/month we're looking at, rather than something hard-coded or built by repeating the same query over and over. I've hit a wall, since I can't come up with a way to make it dynamic; I'm fairly new to reporting and data analysis, and that's basically what's limiting my progress.
SELECT T2.[Segment_0]+'-'+T2.[Segment_1]+'-'+T2.[Segment_2] Cuenta,
T2.[AcctName], SUM(T0.[Debit]) Debito, SUM(T0.[Credit]) Credito,
SUM(T0.[Debit])-SUM(T0.[Credit]) Saldo
FROM [server].[DB1].[dbo].[JDT1] T0
INNER JOIN [server].[DB1].[dbo].[OJDT] T1
ON T1.[TransId] = T0.[TransId]
INNER JOIN [server].[DB1].[dbo].[oact] T2
ON T2.[AcctCode] = T0.[Account]
WHERE T0.[RefDate] >= '2007-12-31' AND T0.[RefDate] <= '2016-06-30'
GROUP BY T2.[Segment_0]+'-'+T2.[Segment_1]+'-'+T2.[Segment_2],T2.[AcctName]
I'm not looking for someone to do this for me, but for someone who can point me and guide through the best possible course of action to achieve this.
Here are some suggestions:
It isn't clear to me why you need [server].[DB1].[dbo].[OJDT] T1. Its data doesn't appear in the output and it isn't needed to join T0 to T2. If you can omit it, do so.
If you can't omit it because you need to exclude transactions from T0 that aren't in T1, use an EXISTS clause rather than joining it in.
Use a CTE to group the T0 records by Account, and then join the CTE to T2. That way T2 doesn't have to be joined to every record in T0, just to the summarized result. You also don't need to group by the composite field and the account name in the outer query, because the grouping has already happened inside the CTE.
Here's a sort of outline of what that would look like:
;
WITH Summed as (
SELECT Account
, SUM(Credito) as SumCredito
...
FROM [JDT1] T0
WHERE T0.[RefDate] >= ...
GROUP BY Account
)
SELECT (.. your composite segment field ..)
, AccountName
, SumCredito
FROM Summed T1
JOIN [oact] T2
ON T1.account = T2.acctcode
If you want dynamic dates, you will probably need to parameterize this and turn it into a stored proc if it isn't one already.
Push as much formatting (which includes pivoting already-grouped data from a list into a matrix) into the reporting tool as possible. Achieving dynamic pivoting is tricky in T-SQL but trivial in SSRS, to pick just one tool.
Remember, you can always dynamically set the column headers in your tool: you don't have to change the column names in your data.
Hope this helps.

Comparing two sets of data from same table and column without calling the table twice in MySQL

I am trying to make my query more efficient because it is still heavy, and in the future it will get a lot worse.
Here is my query:
SELECT SUM(fb_diff.shares) shares
FROM (
SELECT (SUM(fb.shares) - SUM(fbs.shares)) shares
FROM (
SELECT post_id, shares
FROM wp_facebook_total_stats
WHERE date = '2014-08-01 00:00:00'
GROUP BY post_id
) fbs
LEFT JOIN wp_facebook_total_stats fb ON fb.post_id = fbs.post_id
WHERE fb.date = '2014-09-28'
) fb_diff
It works... I get the data... But is there a way to do the same without reading the same table twice?
Because when I do EXPLAIN, I get this:
2 DERIVED fb ALL post_id NULL NULL NULL 588849 Using where
3 DERIVED wp_facebook_total_stats index post_id post_id 8 NULL 588849 Using where
If you are trying to get the difference between post shares based on different dates or elapsed time, and don't want to join the table to itself, I can see at least a couple of options:
Create a view that does this ahead of time and can be cached then query the view.
Pull the data into an array within your code by changing your select statement to group on date and post_id, then doing the math within your code to show shares differences.
Modify your schema to better meet your needs, if possible. For example add a column(s) to wp_facebook_total_stats which shows difference between shares versus previous day, previous week, previous month, etc. Whatever you will need to get the job done.
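Another option, when both snapshots live in the same table, is conditional aggregation: one scan over the table, with each row summed into the bucket for its date. A sketch with made-up data (SQLite; CASE expressions work the same in MySQL), assuming every post appears on both dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_facebook_total_stats (post_id INTEGER, date TEXT, shares INTEGER);
INSERT INTO wp_facebook_total_stats VALUES
  (1, '2014-08-01', 10), (1, '2014-09-28', 25),
  (2, '2014-08-01',  5), (2, '2014-09-28', 11);
""")

# One scan: each row falls into the "new" or "old" bucket, and the
# difference of the two sums is the total growth between the snapshots.
row = conn.execute("""
SELECT SUM(CASE WHEN date = '2014-09-28' THEN shares ELSE 0 END)
     - SUM(CASE WHEN date = '2014-08-01' THEN shares ELSE 0 END) AS shares_diff
FROM wp_facebook_total_stats
WHERE date IN ('2014-08-01', '2014-09-28')
""").fetchone()
print(row[0])  # → 21
```

If posts can be missing from one snapshot, the semantics differ slightly from the LEFT JOIN version, so check which behavior you actually want.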
Each option has its benefits and drawbacks, consider them carefully.
Hope this helps, good luck.

SQL Server: Selecting DateTime and grouping by Date

This simple SQL problem is giving me a very hard time. Either because I'm seeing the problem the wrong way or because I'm not that familiar with SQL. Or both.
What I'm trying to do: I have a table with several columns and I only need two of them: the datetime when the entry was created and the id of the entry. Note that the hours/minutes/seconds part is important here.
However, I want to group my selection according to the DATE part only. Otherwise all groups will most likely have 1 element.
Here's my query:
SELECT MyDate as DateCr, COUNT(Id) as Occur
FROM MyTable tb WITH(NOLOCK)
GROUP BY CAST(tb.MyDate as Date)
ORDER BY DateCr ASC
However I get the following error from it:
Column "MyTable.MyDate" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I don't do the cast in the GROUP BY, everything is fine. If I cast MyDate to DATE in the SELECT and keep the CAST in the GROUP BY, everything is fine once more. Apparently it wants the same DATE or DATETIME expression in the SELECT as in the GROUP BY.
My approach can be completely wrong so I am not necessarily looking to fix the above query, but to find the proper way to do it.
LE: I get the above error on line 1.
LE2: On a second look, my question indeed is not very explicit. You can ignore the above approach if it is completely wrong. Below is a sample scenario
Let me tell you what I need: I want to retrieve (1) the DateTime when each entry was created. So if I have 20 entries, then I want to get 20 DateTimes. Then if I have multiple entries created on the same DAY, I want the number of those entries. For example, let's say I created 3 entries on Monday, 1 on Tuesday and 2 today. Then from my table I need the datetimes of these 6 entries + the number of entries which were created on each day (3 for 19/03/2012, 1 for 20/03/2012 and 2 for 21/03/2012).
I'm not sure why you're objecting to performing the CONVERT in both the SELECT and the GROUP BY. This seems like a perfectly logical way to do this:
SELECT
DateCr = CONVERT(DATE, MyDate),
Occur = COUNT(Id)
FROM dbo.MyTable
GROUP BY CONVERT(DATE, MyDate)
ORDER BY DateCr;
If you want to keep the time portion of MyDate in the SELECT list, why are you bothering to group? Or how do you expect the results to look? You'll have a row for every individual date/time value, where the grouping seems to indicate you want a row for each day. Maybe you could clarify what you want with some sample data and example desired results.
Also, why are you using NOLOCK? Are you willing to trade accuracy for a haphazard turbo button?
EDIT adding a version for the mixed requirements:
;WITH d(DateCr,d,Id) AS
(
SELECT MyDate, d = CONVERT(DATE, MyDate), Id
FROM dbo.MyTable)
SELECT DateCr, Occur = (SELECT COUNT(Id) FROM d AS d2 WHERE d2.d = d.d)
FROM d
ORDER BY DateCr;
Even though this is an old post, I thought I would answer it. The solution below will work with SQL Server 2008 and above. It uses the over clause, so that the individual lines will be returned, but will also count the rows grouped by the date (without time).
SELECT MyDate as DateCr,
COUNT(Id) OVER(PARTITION BY CAST(tb.MyDate as Date)) as Occur
FROM MyTable tb WITH(NOLOCK)
ORDER BY DateCr ASC
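A quick sketch of how that window count behaves (SQLite here, whose OVER (PARTITION BY ...) syntax matches): every row keeps its full datetime, while Occur counts the rows sharing the same calendar date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Id INTEGER, MyDate TEXT);
INSERT INTO MyTable VALUES
  (1, '2012-03-19 08:00:00'), (2, '2012-03-19 14:30:00'),
  (3, '2012-03-19 21:05:00'), (4, '2012-03-20 09:15:00');
""")

# COUNT(...) OVER (PARTITION BY the date part) returns every row, each
# annotated with how many rows fall on the same day -- no GROUP BY, so
# no rows are collapsed.
rows = conn.execute("""
SELECT MyDate AS DateCr,
       COUNT(Id) OVER (PARTITION BY date(MyDate)) AS Occur
FROM MyTable
ORDER BY DateCr
""").fetchall()
print(rows)
```

This matches the asker's clarified requirement: all 4 datetimes come back, the three 2012-03-19 rows each carrying Occur = 3 and the 2012-03-20 row Occur = 1.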
Darren White

MySQL adding the difference between time values to find the avg difference.

I have a Time-formatted column that needs to be sorted newest to oldest. What I would like to do is find the difference in time between each adjoining record. The tricky part is that I need to sum all of the time differences, then divide by the count minus 1 of all the time records. Can this be done in MySQL?
I'm sorry if I am being a bit too wordy, but I can't quite glean your level of MySQL experience.
Also apologies if I don't understand your question. But here goes...
First of all, you don't need to sum and divide; MySQL has an average function for you called avg(). See here for details:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
What you want can be done with subqueries, I think. For more info on subqueries look here:
http://dev.mysql.com/doc/refman/5.0/en/select.html
Basically, you first want a query that sorts the column:
SELECT someid, time
FROM table
ORDER BY TIME
Use that in a subquery that joins the table with itself but with a shifted index (To get the time before and time after)
SELECT *
FROM table as t1 INNER JOIN table as t2 ON t1.someid = t2.someid+1
And use avg on that
SELECT avg(t1.time-t2.time)
FROM table as t1 INNER JOIN table as t2 ON t1.someid = t2.someid+1
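Worth noting: adjacent differences over a sorted column telescope, so their sum is just max minus min, and the average gap is (max - min) / (count - 1); no self-join is needed at all. A quick sketch of that identity in Python:

```python
# Adjacent differences over sorted values telescope:
# (t2-t1) + (t3-t2) + ... + (tn - t(n-1)) = tn - t1,
# so the average gap is (max - min) / (n - 1).
times = [30, 10, 45, 20]  # e.g. times converted to seconds, unsorted

ts = sorted(times)
gaps = [b - a for a, b in zip(ts, ts[1:])]
avg_gap = sum(gaps) / len(gaps)

assert avg_gap == (max(ts) - min(ts)) / (len(ts) - 1)
print(avg_gap)
```

So in MySQL this could collapse to a single aggregate query along the lines of SELECT (MAX(t)-MIN(t))/(COUNT(*)-1) FROM ..., using TIME_TO_SEC() first if the column is a TIME type.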