How can I simplify this query? Can this query be simplified? I tried some
joins but the results were not the same as this query below. Please give me
some insights.
SELECT trafficbywebsite.`adwordsCampaignID`,
trafficbywebsite.adwordsAdGroupID, trafficbywebsite.adPlacementDomain,
trafficbywebsite.counts traffic, convertedtrafficbywebsite.counts
convertedclicks
FROM
(
SELECT `adwordsAdGroupID`, `adPlacementDomain`, COUNT(*) counts
FROM
(
SELECT GA_entrances.*
FROM
GA_entrances,
GA_conversions
WHERE
GA_entrances.clientId=GA_conversions.clientId
AND (eventLabel='myurl' OR eventLabel='myotherurl')
AND YEAR(GA_entrances.timestamp)>=2016
AND MONTH(GA_entrances.timestamp)>=6
AND YEAR(GA_conversions.timestamp)>=2016
AND MONTH(GA_conversions.timestamp)>=6
GROUP BY GA_entrances.clientId
) clickers
GROUP BY `adwordsAdGroupID`, `adPlacementDomain`
) convertedtrafficbywebsite
,(
SELECT `adwordsCampaignID`, `adwordsAdGroupID`, adPlacementDomain,
COUNT(*) counts
FROM
GA_entrances
WHERE
YEAR(timestamp)>=2016
AND MONTH(timestamp)>=6
GROUP BY `adwordsAdGroupID`, `adPlacementDomain`
) trafficbywebsite
WHERE
convertedtrafficbywebsite.counts>=(trafficbywebsite.counts/10)
ORDER BY traffic DESC
Without sample data it is difficult to be certain, but it appears unlikely that you can remove one of the subqueries. What you can do, however, is improve the way you filter on the dates. The thing to avoid is applying functions to the column data to suit your filter criteria, because that forces a calculation on every row and prevents the use of an index on that column. For example, you want data from 2016-06-01 onward, which is a single date, yet you are converting every row of data to a year and a month:
AND YEAR(GA_entrances.timestamp) >= 2016
AND MONTH(GA_entrances.timestamp) >= 6
AND YEAR(GA_conversions.timestamp) >= 2016
AND MONTH(GA_conversions.timestamp) >= 6
;
There is no need for all those functions, just compare to a single date:
AND GA_entrances.timestamp >= '2016-06-01'
AND GA_conversions.timestamp >= '2016-06-01'
;
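Note that the two filters are not strictly equivalent: YEAR(...) >= 2016 AND MONTH(...) >= 6 also drops January to May of every later year, which is probably not what was intended. A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite's strftime replaces YEAR()/MONTH() here, and the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GA_entrances (clientId INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO GA_entrances VALUES (?, ?)",
    [
        (1, "2016-05-31 23:59:59"),  # just before the cutoff
        (2, "2016-06-01 00:00:00"),  # exactly at the cutoff
        (3, "2017-02-15 12:00:00"),  # later year, but month < 6
    ],
)

# Function-wrapped filter (strftime stands in for MySQL's YEAR()/MONTH()).
by_parts = [r[0] for r in conn.execute(
    """
    SELECT clientId FROM GA_entrances
    WHERE CAST(strftime('%Y', timestamp) AS INTEGER) >= 2016
      AND CAST(strftime('%m', timestamp) AS INTEGER) >= 6
    ORDER BY clientId
    """
)]

# Plain comparison against a single date: no per-row function call.
by_date = [r[0] for r in conn.execute(
    """
    SELECT clientId FROM GA_entrances
    WHERE timestamp >= '2016-06-01'
    ORDER BY clientId
    """
)]

print(by_parts)  # the February 2017 row is missing
print(by_date)
```

The single-date comparison keeps the 2017 row, while the year/month version silently loses it.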
The other thing to avoid is using commas as a way to join tables. The ANSI standard join syntax has been around for 25+ years. This is the antique way of joining:
FROM GA_entrances, GA_conversions
WHERE GA_entrances.clientId = GA_conversions.clientId
This is considered best practice:
SELECT GA_entrances.*
FROM GA_entrances
INNER JOIN GA_conversions ON GA_entrances.clientId = GA_conversions.clientId
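As a quick sanity check, both spellings return the same rows; the explicit form just keeps the join condition next to the join. A small sketch with Python's sqlite3 module standing in for MySQL (an assumption) and made-up rows; the table and column names are from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GA_entrances (clientId INTEGER, adPlacementDomain TEXT);
CREATE TABLE GA_conversions (clientId INTEGER, eventLabel TEXT);
INSERT INTO GA_entrances VALUES (1, 'a.example'), (2, 'b.example'), (3, 'c.example');
INSERT INTO GA_conversions VALUES (1, 'myurl'), (3, 'myotherurl'), (3, 'myurl');
""")

# Old-style comma join: the join condition hides in the WHERE clause.
comma_join = conn.execute("""
    SELECT e.clientId, c.eventLabel
    FROM GA_entrances e, GA_conversions c
    WHERE e.clientId = c.clientId
    ORDER BY e.clientId, c.eventLabel
""").fetchall()

# Explicit join: the condition sits in the ON clause.
inner_join = conn.execute("""
    SELECT e.clientId, c.eventLabel
    FROM GA_entrances e
    INNER JOIN GA_conversions c ON e.clientId = c.clientId
    ORDER BY e.clientId, c.eventLabel
""").fetchall()

print(comma_join == inner_join)
```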
I have this SQL query running on a PHP website. This is an old site and the query was built by a previous developer a few years ago. Now that the site data has grown to around 230 MB, this query has become pretty slow to execute: it takes around 15-20 seconds. Is there any way I can make it run faster?
SELECT DISTINCT
NULL AS bannerID,
C1.url AS url,
LOWER(C1.Organization) AS company_name,
CONCAT(
'https://mywebsite.co.uk/logos/',
C1.userlogo
) AS logo_url
FROM
Company AS C1
INNER JOIN Vacancy AS V1 ON LOWER(V1.company_name) = LOWER(C1.Organization)
WHERE
V1.LiveDate <= CURDATE()
AND url = ''
AND V1.ClosingDate >= CURDATE()
AND C1.flag_show_logo = 1
As commented, your query is suffering from being non-sargable due to the use of the LOWER function in the join condition.
Additionally, I suspect you can remove the DISTINCT by using EXISTS instead of joining your tables:
select null as bannerID,
c.url as url,
Lower(c.Organization) as company_name,
Concat('https://mywebsite.co.uk/logos/', c.userlogo) as logo_url
from Company c
where c.flag_show_logo = 1
and c.url = ''
and exists (
select * from Vacancy v
where v.LiveDate <= CURDATE()
and v.ClosingDate >= CURDATE()
and v.company_name = c.Organization
)
Avoid the sargability problem by changing the join condition to
ON V1.company_name = C1.Organization
and declaring those two columns to be the same collation, namely a collation ending with "_ci".
And have these composite indexes:
C1: INDEX(flag_show_logo, url, Organization, userlogo)
V1: INDEX(company_name, LiveDate, ClosingDate)
(These indexes should help Stu's answer, too.)
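To illustrate both suggestions together, here is a rough sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite's COLLATE NOCASE plays the role of a "_ci" collation, date('now') replaces CURDATE(), and the sample rows and index names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (
    Organization   TEXT COLLATE NOCASE,  -- case-insensitive, like a MySQL *_ci collation
    url            TEXT,
    userlogo       TEXT,
    flag_show_logo INTEGER
);
CREATE TABLE Vacancy (
    company_name TEXT COLLATE NOCASE,
    LiveDate     TEXT,
    ClosingDate  TEXT
);
-- Composite indexes as suggested above (index names are hypothetical).
CREATE INDEX idx_company ON Company (flag_show_logo, url, Organization, userlogo);
CREATE INDEX idx_vacancy ON Vacancy (company_name, LiveDate, ClosingDate);

INSERT INTO Company VALUES
    ('Acme Ltd',  '', 'acme.png',   1),
    ('Globex',    '', 'globex.png', 1),
    ('Hidden Co', '', 'hidden.png', 0);
INSERT INTO Vacancy VALUES
    ('ACME LTD',  '2000-01-01', '2999-12-31'),  -- live, different case
    ('acme ltd',  '2000-01-01', '2999-12-31'),  -- second live vacancy
    ('Globex',    '2000-01-01', '2000-12-31'),  -- already closed
    ('hidden co', '2000-01-01', '2999-12-31');  -- company has logo flag off
""")

# Plain join without DISTINCT: one row per matching vacancy.
join_rows = [r[0] for r in conn.execute("""
    SELECT LOWER(c.Organization)
    FROM Company c
    INNER JOIN Vacancy v ON v.company_name = c.Organization
    WHERE v.LiveDate <= date('now') AND v.ClosingDate >= date('now')
      AND c.url = '' AND c.flag_show_logo = 1
""")]

# EXISTS: one row per company, no DISTINCT and no LOWER() in the comparison.
exists_rows = [r[0] for r in conn.execute("""
    SELECT LOWER(c.Organization)
    FROM Company c
    WHERE c.flag_show_logo = 1 AND c.url = ''
      AND EXISTS (
          SELECT 1 FROM Vacancy v
          WHERE v.LiveDate <= date('now') AND v.ClosingDate >= date('now')
            AND v.company_name = c.Organization
      )
""")]

print(join_rows)
print(exists_rows)
```

The join yields the company once per live vacancy, which is why the original needed DISTINCT; the EXISTS version returns each company at most once.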
I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get an array of users (array of userid) who placed their 25th order during a specified period (for example, in May 2019), the date of the 25th order for each user, and the number of days it took to place the 25th order (the difference between the user's registration date and the date of the 25th order).
For example if user registered in April 2018, then placed 20 orders in 2018, and then placed 21-30th orders in Jan-May 2019 - this user should be in this array, if he placed 25th (overall for his account) order in May 2019.
How can I do this with a MySQL request?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
It does not have to be a single request: if this can't be done in one, a few requests are allowed.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR (o2.date = o1.date
AND o2.id < o1.id))) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
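One detail worth checking in a correlated count like this is operator precedence: AND binds tighter than OR, so the tie-breaker branch (same date, lower id) needs parentheses to stay scoped to the same user. A small sketch of the effect, using Python's sqlite3 module as a stand-in for MySQL (an assumption), made-up rows, and the 2nd order instead of the 25th to keep the data short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
INSERT INTO orders VALUES
    (1, '2019-05-12', 1),   -- user 1, first order
    (2, '2019-05-12', 2),   -- user 2, first order, same day as user 1
    (3, '2019-05-13', 2);   -- user 2, second order
""")

# Orders that have exactly one earlier order, i.e. the 2nd order.
TEMPLATE = """
    SELECT o1.user_id, o1.date
    FROM orders o1
    WHERE (SELECT count(*) FROM orders o2 WHERE {cond}) = 1
    ORDER BY o1.id
"""

# Tie-breaker grouped with the date test, both scoped to the same user.
grouped = (
    "o2.user_id = o1.user_id "
    "AND (o2.date < o1.date OR (o2.date = o1.date AND o2.id < o1.id))"
)
# Without parentheses the OR branch also counts other users' orders.
ungrouped = (
    "o2.user_id = o1.user_id AND o2.date < o1.date "
    "OR o2.date = o1.date AND o2.id < o1.id"
)

with_parens = conn.execute(TEMPLATE.format(cond=grouped)).fetchall()
without_parens = conn.execute(TEMPLATE.format(cond=ungrouped)).fetchall()

print(with_parens)
print(without_parens)
```

Without the parentheses, user 2's first order is wrongly reported as a second order because user 1's same-day order gets counted.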
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
1. Select all rows, order by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
2. From this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1).
3. From this, select all plus a field that equals itself plus one in case the first added field is 1, else 0.
4. From this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start with the innermost SELECT and work your way outward; this version fuses points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something that won't let you keep variable values in memory unless you commit, or has some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.
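On engines that support window functions (MySQL 8.0 and later, SQLite 3.25 and later), the per-user ranking can be written with ROW_NUMBER() instead of variables. A rough sketch, not a drop-in for the queries above: it uses Python's sqlite3 module as a stand-in for MySQL (an assumption), the column names from the question, julianday() in place of DATEDIFF(), made-up rows, and the 3rd order instead of the 25th, as the question suggests for testing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, registered TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, '2018-04-10'), (2, '2019-01-05');
INSERT INTO orders (id, date, user_id) VALUES
    (1, '2018-05-01', 1), (2, '2018-11-20', 1),
    (3, '2019-05-12', 1), (4, '2019-05-20', 1),   -- user 1: 3rd order in May 2019
    (5, '2019-02-01', 2), (6, '2019-03-01', 2),
    (7, '2019-06-02', 2);                         -- user 2: 3rd order in June 2019
""")

NTH = 3  # use 25 for the real question

rows = conn.execute("""
    WITH ranked AS (
        SELECT o.user_id,
               o.date,
               -- position of each order within its user's history
               ROW_NUMBER() OVER (PARTITION BY o.user_id
                                  ORDER BY o.date, o.id) AS rn
        FROM orders o
    )
    SELECT r.user_id,
           r.date,
           CAST(julianday(r.date) - julianday(u.registered) AS INTEGER) AS days
    FROM ranked r
    INNER JOIN users u ON u.userid = r.user_id
    WHERE r.rn = ?
      AND r.date >= '2019-05-01'
      AND r.date <  '2019-06-01'
    ORDER BY r.user_id
""", (NTH,)).fetchall()

print(rows)
```

Only user 1 is returned, since user 2's third order falls outside May 2019.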
Trying to calculate month to date revenue entered into the system for every day of the year by market. Current query works but it keeps timing out within my BI tool's mysql instance (set to 15 min). My BI tool may also not allow for mysql variables and if they do it would have to be conditional. I would ideally like to add more conditions.
/* Current query with subquery, this works syntactically, but is inefficient*/
SELECT
d.`Event Date`,
d.`market`,
(SELECT SUM(s.`Revenue`)
FROM time_from_start s
WHERE s.`created` <= d.`Event Date`
AND s.`Month/Year` = d.`Month/Year`
AND s.`market` = d.`market`) as 'Revenue to Date'
FROM time_from_start d
GROUP BY d.`Event Date`,d.market
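Try a query that replaces the correlated subquery with a self join.
This query should give the same results as your original query, provided time_from_start has at most one row per Event Date and market (with duplicate rows, the join would add the matching revenue once per duplicate):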
SELECT d1.`Event Date`, d1.market, SUM(d2.Revenue) AS Revenue_to_date
FROM time_from_start d1
LEFT JOIN time_from_start d2
ON d2.market = d1.market
AND d2.`Month/Year` = d1.`Month/Year`
AND d2.created <= d1.`Event Date`
GROUP BY d1.`Event Date`, d1.market
Also, make sure that there are indexes on the columns used in the join, for example a composite index on (market, `Month/Year`, created).
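A small check of the self join against the correlated subquery, sketched with Python's sqlite3 module as a stand-in for MySQL (an assumption) and made-up rows. `Month/Year` is added to the GROUP BY of the correlated version only to keep that query portable. The last step shows what happens when two rows share the same Event Date and market:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE time_from_start ("
    "`Event Date` TEXT, market TEXT, `Month/Year` TEXT, created TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO time_from_start VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-01-01", "NY", "01/2020", "2020-01-01", 10),
        ("2020-01-02", "NY", "01/2020", "2020-01-02", 5),
        ("2020-01-02", "LA", "01/2020", "2020-01-02", 7),
        ("2020-02-01", "NY", "02/2020", "2020-02-01", 3),
    ],
)

CORRELATED = """
    SELECT d.`Event Date`, d.market,
           (SELECT SUM(s.Revenue)
            FROM time_from_start s
            WHERE s.created <= d.`Event Date`
              AND s.`Month/Year` = d.`Month/Year`
              AND s.market = d.market) AS revenue_to_date
    FROM time_from_start d
    GROUP BY d.`Event Date`, d.market, d.`Month/Year`
    ORDER BY d.`Event Date`, d.market
"""

SELF_JOIN = """
    SELECT d1.`Event Date`, d1.market, SUM(d2.Revenue) AS revenue_to_date
    FROM time_from_start d1
    LEFT JOIN time_from_start d2
           ON d2.market = d1.market
          AND d2.`Month/Year` = d1.`Month/Year`
          AND d2.created <= d1.`Event Date`
    GROUP BY d1.`Event Date`, d1.market
    ORDER BY d1.`Event Date`, d1.market
"""

def run(sql):
    return conn.execute(sql).fetchall()

# One row per (Event Date, market): both queries agree.
correlated = run(CORRELATED)
joined = run(SELF_JOIN)

# Add a second NY row on 2020-01-01: the self join now double counts that day.
conn.execute(
    "INSERT INTO time_from_start VALUES ('2020-01-01', 'NY', '01/2020', '2020-01-01', 4)"
)
correlated_dup = run(CORRELATED)
joined_dup = run(SELF_JOIN)

print(correlated == joined)
print(correlated_dup[0], joined_dup[0])
```

If the table can hold several rows per day and market, one option is to aggregate to daily totals first and self join those.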
I have a Data as follows:
Order_id Created_on Comment
1 20-07-2015 18:35 Order Placed by User
1 20-07-2015 18:45 Order Reviewed by the Agent
1 20-07-2015 18:50 Order Dispatched
2 20-07-2015 18:36 Order Placed by User
And I am trying to find, for each order, the difference between:
the first and second dates
the second and third dates
How do I obtain this through a SQL query?
SQL is about horizontal relations - vertical relations do not exist. To a relational database they're just 2 rows, stored somewhere on a disk, and until you apply ordering to a result set the 'first and second' are just 2 randomly picked rows.
In specific cases it's possible to calculate the time difference within SQL, but rarely a good idea for performance reason, as it requires costly self-joins or subqueries. Just selecting the right data in the right order and then calculating the differences during postprocessing in C#/PHP/whatever is far more practical and faster.
I think you can use a query like this:
SELECT t1.Order_id, t1.Created_on, TIMESTAMPDIFF(MINUTE, t1.Created_on, COALESCE(MIN(t2.Created_on), t1.Created_on)) AS toNextTime
FROM yourTable t1
LEFT JOIN yourTable t2 ON t1.Order_id = t2.Order_id AND t1.Created_on < t2.Created_on
GROUP BY t1.Order_id, t1.Created_on
Posting this even though another answer has been accepted already - and I don't disagree with the accepted answer - but there is in fact a fairly neat way to do this with mySQL variables.
This query will give you the time between stages in minutes - it can't be expressed as a datetime as it's an interval between two dates:
SELECT
Order_id,
Created_on,
Comment,
if (@last_id = Order_id, TIMESTAMPDIFF(MINUTE, @last_date, Created_on), 0) as StageMins,
@last_id := Order_id,
@last_date := Created_on
FROM tblData
ORDER BY Order_id, Created_on;
SQL Fiddle here: http://sqlfiddle.com/#!9/6ffdd/10
Info on mySQL TIMESTAMPDIFF function here: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_timestampdiff
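The variable trick depends on the order in which MySQL evaluates the select expressions, which the MySQL manual does not guarantee. On MySQL 8.0 and later, the LAG() window function expresses the same idea without variables. A sketch of that alternative (not the method of the answer above), using Python's sqlite3 module as a stand-in for MySQL (an assumption), strftime('%s') in place of TIMESTAMPDIFF, and the question's data rewritten in ISO format:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblData (Order_id INTEGER, Created_on TEXT, Comment TEXT);
INSERT INTO tblData VALUES
    (1, '2015-07-20 18:35:00', 'Order Placed by User'),
    (1, '2015-07-20 18:45:00', 'Order Reviewed by the Agent'),
    (1, '2015-07-20 18:50:00', 'Order Dispatched'),
    (2, '2015-07-20 18:36:00', 'Order Placed by User');
""")

rows = conn.execute("""
    SELECT Order_id,
           Created_on,
           -- minutes since the previous stage of the same order, 0 for the first stage
           COALESCE(
               (CAST(strftime('%s', Created_on) AS INTEGER)
                - CAST(strftime('%s', LAG(Created_on) OVER (
                      PARTITION BY Order_id ORDER BY Created_on)) AS INTEGER)) / 60,
               0) AS StageMins
    FROM tblData
    ORDER BY Order_id, Created_on
""").fetchall()

stage_minutes = [r[2] for r in rows]
print(stage_minutes)
```

In MySQL itself the expression would be TIMESTAMPDIFF(MINUTE, LAG(Created_on) OVER (...), Created_on).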
I'm running a SQL query that returns results between the dates I have selected (2012-07-01 - 2012-08-01). I can tell from the values that they are wrong, though.
I'm confused because it's not reporting a syntax error, but the values returned are wrong.
The dates in my database are stored in the date column in the format YYYY-mm-dd.
SELECT `jockeys`.`JockeyInitials` AS `Initials`, `jockeys`.`JockeySurName` AS `Lastname`,
COUNT(`runs`.`JockeysID`) AS 'Rides',
COUNT(CASE
WHEN `runs`.`Finish` = 1 THEN 1
ELSE NULL
END
) AS `Wins`,
SUM(`runs`.`StakeWon`) AS 'Winnings'
FROM runs
INNER JOIN jockeys ON runs.JockeysID = jockeys.JockeysID
INNER JOIN races ON runs.RacesID = races.RacesID
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` <= STR_TO_DATE('2012,08,01', '%Y,%m,%d')
GROUP BY `jockeys`.`JockeySurName`
ORDER BY `Wins` DESC
It's hard to guess what the problem is from your question.
Are you looking to summarize all the races in July and the races on the first of August? That's a slightly strange date range.
You should try the following kind of date-range selection if you want to be more precise. You MUST use it if your races.RaceDate column is a DATETIME expression.
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,08,01', '%Y,%m,%d') + INTERVAL 1 DAY
This will pick up the July races and the races at any time on the first of August.
But, it's possible you're looking for just the July races. In that case you might try:
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,07,01', '%Y,%m,%d') + INTERVAL 1 MONTH
That will pick up everything from midnight July 1, inclusive, to midnight August 1 exclusive.
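To see how the three range styles behave when RaceDate carries a time of day, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). The rows are made up and the boundaries are written as full timestamps rather than STR_TO_DATE(...) + INTERVAL expressions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE races (RacesID INTEGER PRIMARY KEY, RaceDate TEXT);
INSERT INTO races VALUES
    (1, '2012-06-30 18:00:00'),
    (2, '2012-07-01 00:00:00'),
    (3, '2012-07-31 15:30:00'),
    (4, '2012-08-01 00:00:00'),
    (5, '2012-08-01 14:00:00');
""")

def ids(where):
    """Return the RacesID values matching a WHERE clause, in id order."""
    sql = "SELECT RacesID FROM races WHERE " + where + " ORDER BY RacesID"
    return [r[0] for r in conn.execute(sql)]

# Original style: closed range ending at midnight on August 1.
closed = ids("RaceDate >= '2012-07-01 00:00:00' AND RaceDate <= '2012-08-01 00:00:00'")

# July plus all of August 1: half-open range ending one day later.
through_aug_1 = ids("RaceDate >= '2012-07-01 00:00:00' AND RaceDate < '2012-08-02 00:00:00'")

# July only: half-open range ending at the start of August.
july_only = ids("RaceDate >= '2012-07-01 00:00:00' AND RaceDate < '2012-08-01 00:00:00'")

print(closed)
print(through_aug_1)
print(july_only)
```

The closed range picks up a race at exactly midnight on August 1 but misses the afternoon race that day, which is the kind of off-by-a-bit result the half-open forms avoid.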
Also, you're not using GROUP BY correctly. When you summarize, every column in your result set must either be a summary (SUM() or COUNT() or some other aggregate function) or mentioned in your GROUP BY clause. Some DBMSs enforce this. MySQL, unless the ONLY_FULL_GROUP_BY SQL mode is enabled, just rolls with it and can give strange results. Try this expression.
GROUP BY `jockeys`.`JockeyInitials`,`jockeys`.`JockeySurName`
My best guess is that the jockey surnames are not unique. Try changing the group by expression to:
group by `jockeys`.`JockeyInitials`, `jockeys`.`JockeySurName`
In general, it is bad practice to include columns in the SELECT clause of an aggregation query that are not included in the GROUP BY line. You can do this in MySQL (but not in other databases), because of a (mis)feature called Hidden Columns.
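A small sketch of how grouping by surname alone merges different jockeys, using Python's sqlite3 module as a stand-in for MySQL (an assumption) and made-up rows. It also groups by JockeysID, which is my addition to guard against two jockeys sharing both initials and surname:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jockeys (JockeysID INTEGER PRIMARY KEY, JockeyInitials TEXT, JockeySurName TEXT);
CREATE TABLE runs (JockeysID INTEGER, Finish INTEGER, StakeWon INTEGER);
INSERT INTO jockeys VALUES (1, 'A', 'Smith'), (2, 'B', 'Smith');
INSERT INTO runs VALUES (1, 1, 100), (1, 2, 0), (2, 1, 50);
""")

# Grouping by surname only: both Smiths collapse into one row.
by_surname = conn.execute("""
    SELECT j.JockeySurName,
           COUNT(r.JockeysID)                        AS Rides,
           COUNT(CASE WHEN r.Finish = 1 THEN 1 END)  AS Wins
    FROM runs r
    INNER JOIN jockeys j ON r.JockeysID = j.JockeysID
    GROUP BY j.JockeySurName
""").fetchall()

# Grouping by every non-aggregated column (plus the id): one row per jockey.
by_jockey = conn.execute("""
    SELECT j.JockeyInitials, j.JockeySurName,
           COUNT(r.JockeysID)                        AS Rides,
           COUNT(CASE WHEN r.Finish = 1 THEN 1 END)  AS Wins
    FROM runs r
    INNER JOIN jockeys j ON r.JockeysID = j.JockeysID
    GROUP BY j.JockeysID, j.JockeyInitials, j.JockeySurName
    ORDER BY j.JockeysID
""").fetchall()

print(by_surname)
print(by_jockey)
```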