MySQL subquery, alias, calculation from table joining query - mysql

Pretty new to MySQL here. I have the following query that isn't quite working the way I want: **SELECT round(sum(temp_pop.pop_total * demographics.coefficient)) as demand, pop_proj.pop_total from pop_proj, d.age, d.coefficient, d.educationalattainment from demographics d JOIN (SELECT year,sum(pop_total), age FROM pop_proj GROUP BY year, age) AS temp_pop WHERE d.age = CASE WHEN temp_pop.age < 18 then '00 to 17' WHEN temp_pop.age > 64 then '65 to 80+' ELSE '18 to 64' end;**
The sources are the subquery shown above in the syntax which I'm trying to join with a table called "demographics" with only three columns that shows education level (educational attainment), an age range (age) - shown in the case statement, and a coefficient, used in the calculation at the beginning of the query. The pop_proj table provides a year, age, and population total (pop_total column). I'm trying to use temp_pop as an alias for the subquery. I'm fairly sure the case is written out correctly. However, when I run the query, it tells me this: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'mysql> SELECT round(sum(temp_pop.pop_total * demographics.coefficient)) as deman' at line 1
I also want to group the results by year and education level, I just haven't added that in there yet.
Previously I had it written only slightly differently and it was telling me it didn't recognize the column name pop_total, but haven't gotten a result without an error yet. I may be totally off on how write this query, but hoping I'm getting close. I would appreciate some help! Thanks in advance!

It's difficult to understand what you're really after but this may get you a step closer.
The query below assumes that you relate demographics to pop_proj only through the contrived age-range string in your CASE, which is suspicious and may not be valid.
Compared with your attempt, the sum(pop_total) in the subquery gets an alias so the outer query can reference it, and the join condition moves into an ON clause. This doesn't attempt to group the results, but that should be straightforward once you can see a working detailed output that meets your needs.
SELECT d.age, d.coefficient, d.educationalattainment, temp_pop.pop_total
FROM demographics d JOIN (
SELECT `year`, age, sum(pop_total) as pop_total
FROM pop_proj
GROUP BY `year`, age
) temp_pop ON d.age =
CASE WHEN temp_pop.age < 18 THEN '00 to 17'
WHEN temp_pop.age > 64 THEN '65 to 80+'
ELSE '18 to 64' END
;
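For the grouping step you mentioned (by year and education level), here is a minimal sketch of how it could look once the detailed output is right. It runs the SQL through Python's built-in sqlite3 module rather than MySQL, and the sample rows and coefficients are made up purely for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pop_proj (year INTEGER, age INTEGER, pop_total INTEGER);
CREATE TABLE demographics (educationalattainment TEXT, age TEXT, coefficient REAL);
INSERT INTO pop_proj VALUES
  (2020, 10, 100), (2020, 10, 50), (2020, 30, 200), (2020, 70, 80);
INSERT INTO demographics VALUES
  ('HS', '00 to 17', 0.5), ('HS', '18 to 64', 0.25), ('HS', '65 to 80+', 0.125),
  ('BA', '18 to 64', 0.5);
""")

# Same join as the answer's query, with the aggregation added on top.
demand_sql = """
SELECT temp_pop.year,
       d.educationalattainment,
       ROUND(SUM(temp_pop.pop_total * d.coefficient)) AS demand
FROM demographics d
JOIN (
    SELECT year, age, SUM(pop_total) AS pop_total
    FROM pop_proj
    GROUP BY year, age
) temp_pop
  ON d.age = CASE WHEN temp_pop.age < 18 THEN '00 to 17'
                  WHEN temp_pop.age > 64 THEN '65 to 80+'
                  ELSE '18 to 64' END
GROUP BY temp_pop.year, d.educationalattainment
ORDER BY temp_pop.year, d.educationalattainment
"""

rows = conn.execute(demand_sql).fetchall()
for row in rows:
    print(row)
```

The same SELECT should carry over to MySQL largely unchanged, but that is an assumption you would need to verify against your real tables.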

Related

mysql Query performance is low

I have a query which has been running for around 2 hours over the last few days.
Before that it took only 2 to 3 minutes. I have not been able to find
the reason for its sudden slowness. Can anyone help me with this?
Please find the query below; its explain plan is in the linked image [1].
select
IFNULL(EMAIL,'') as EMAIL,
IFNULL(SITE_CD,'') as SITE_CD,
IFNULL(OPT_TYPE_CD,'') as OPT_TYPE_CD,
IFNULL(OPT_IN_IND,'') as OPT_IN_IND,
IFNULL(EVENT_TSP,'') as EVENT_TSP,
IFNULL(APPLICATION,'') as APPLICATION
from (
SELECT newsletter_entry.email email,
newsletter.site_cd site_cd,
REPLACE (newsletter.TYPE, 'OPTIN_','') opt_type_cd,
CASE
WHEN newsletter_event_temp.post_status = 'SUBSCRIBED' THEN 'Y'
WHEN newsletter_event_temp.post_status = 'UNSUBSCRIBED' THEN
'N'
ELSE ''
END
opt_in_ind,
newsletter_event_temp.event_date event_tsp,
entry_context.application application
FROM amg_toolkit.newsletter_entry,
amg_toolkit.newsletter,
(select NEWSLETTER_EVENT.* from amg_toolkit.NEWSLETTER_EVENT,
amg_toolkit.entry_context
where newsletter_event.EVENT_DATE >= '2017-07-11 00:01:23'
AND newsletter_event.EVENT_DATE < '2017-07-11 01:01:23'
and newsletter_event.ENTRY_CONTEXT_ID = entry_context.ENTRY_CONTEXT_ID
and entry_context.APPLICATION != 'feedbackloop') newsletter_event_temp,
amg_toolkit.entry_context
WHERE newsletter_entry.newsletter_id = newsletter.newsletter_id
AND newsletter_entry.newsletter_entry_id =
newsletter_event_temp.newsletter_entry_id
AND newsletter.TYPE IN ('OPTIN_PRIM', 'OPTIN_THRD', 'OPTIN_WRLS')
AND newsletter_event_temp.entry_context_id NOT IN
(select d.ENTRY_CONTEXT_ID from amg_toolkit.sweepstake a,
amg_toolkit.sweepstake_entry b, amg_toolkit.user_entry c,
amg_toolkit.entry_context d where a.exclude_data = 'Y' and
a.sweepstake_id=b.sweepstake_id and b.USER_ENTRY_ID=c.USER_ENTRY_ID and
c.ENTRY_CONTEXT_ID = d.ENTRY_CONTEXT_ID)
AND newsletter_event_temp.entry_context_id =
entry_context.entry_context_id
AND newsletter_event_temp.event_date >= '2017-07-11 00:01:23'
AND newsletter_event_temp.event_date < '2017-07-11 01:01:23') a;
[1]: https://i.stack.imgur.com/cgsS1.png
Don't use .*
Select only the columns you actually use in your query.
Avoid nested subselects if you don't need them.
I don't see a need for them in this query. You query the data 3 times this way instead of just once.
Slowness can be explained by an inefficient query having to deal with tables that have a growing number of records.
"NOT IN" is resource intensive. Can you do that in a better way, avoiding "NOT IN" logic?
JOINs are usually faster than subqueries. NOT IN ( SELECT ... ) can usually be turned into LEFT JOIN ... WHERE id IS NULL.
What is the a in a.exclude_data? Looks like a syntax error.
These indexes are likely to help:
newsletter_event: INDEX(ENTRY_CONTEXT_ID, EVENT_DATE) -- in this order
You also need it for newsletter_event_temp, but since that is not possible, something has to give. What version of MySQL are you running? Perhaps you could actually CREATE TEMPORARY TABLE and ADD INDEX.
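The NOT IN to LEFT JOIN ... IS NULL rewrite suggested above can be sketched as follows. This is a toy illustration run through Python's sqlite3 module, not the poster's schema: the events and excluded tables are hypothetical stand-ins for the derived newsletter_event_temp table and the sweepstake subquery. The two forms only return the same rows when the compared column cannot be NULL, which is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins: 'events' for the event rows,
-- 'excluded' for the entry_context ids that must be filtered out.
CREATE TABLE events (event_id INTEGER PRIMARY KEY, entry_context_id INTEGER NOT NULL);
CREATE TABLE excluded (entry_context_id INTEGER NOT NULL);
CREATE INDEX idx_excluded_ctx ON excluded (entry_context_id);
INSERT INTO events VALUES (1, 10), (2, 20), (3, 30), (4, 20);
INSERT INTO excluded VALUES (20), (20);
""")

# Original style: filter with NOT IN (subquery).
not_in_sql = """
SELECT e.event_id
FROM events e
WHERE e.entry_context_id NOT IN (SELECT entry_context_id FROM excluded)
ORDER BY e.event_id
"""

# Rewritten as an anti-join: keep only rows that found no match.
anti_join_sql = """
SELECT e.event_id
FROM events e
LEFT JOIN excluded x ON x.entry_context_id = e.entry_context_id
WHERE x.entry_context_id IS NULL
ORDER BY e.event_id
"""

not_in_ids = [r[0] for r in conn.execute(not_in_sql)]
anti_join_ids = [r[0] for r in conn.execute(anti_join_sql)]
print(not_in_ids, anti_join_ids)
```

Whether the anti-join is actually faster depends on the MySQL version and the available indexes, so compare the EXPLAIN output of both forms before committing to one.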

Using a COUNT value in an expression getting..does not include specified expression as part of an aggregate function

I am trying to display a warning if a bike station gets to over 90% full or less than 10% full. When I run this query I get "you are trying to execute query that does not include the iif statement... as part of an aggregate function".
Bike_locations table - Bicycle_id and Locations_ID
Locations table - Locations_ID, No_of_Spaces, Location_Address
SELECT Locations.Location_Address, Count(Bike_Locations.Bicycle_ID) AS CountOfBicycle_ID,
IIf(((([CountOfBicycle_ID]/[LOCATIONS]![No_Of_Spaces])*100)>90),"This Station is nearly full.
Need to move some bicycles out of here",IIf(((([CountOfBicycle_ID]/[LOCATIONS]![No_Of_Spaces])*100)
<10),"This station is nearly empty. Need to move some bicycles here","")) AS Warnings
FROM Locations INNER JOIN Bike_Locations ON Locations.[LOCATIONS_ID] = Bike_Locations.[LOCATIONS_ID]
GROUP BY Locations.Location_Address;
Anyone got a scooby
When you use a GROUP BY, every field in your SELECT must also appear in the GROUP BY, except for the aggregate functions, which should only be specified in the SELECT.
The aggregate function in your case is Count(Bike_Locations.Bicycle_ID).
The fields you aggregate on are:
in the SELECT : Location_Address and Warnings
in the GROUP BY : Location_Address only
The error message is telling you that you don't have the same in both statements.
2 solutions:
Remove the Warnings from the SELECT statement
Add the Warnings to the GROUP BY statement
Note that in MS Access SQL you can't (unfortunately) use the aliases specified in the SELECT inside the GROUP BY. So you have to copy over the whole expression, which would be the long IIf in your case.
Edit: better solution proposal:
I would radically change your approach, as you'll go nowhere with all those nested IIf calls.
Create the following query and name it (for instance) Stations_Occupation:
SELECT L.Locations_ID AS ID,
L.Location_Address AS Addr,
L.No_of_Spaces AS TotSpace,
BL.cnt AS OccSpace,
ROUND((BL.cnt/L.No_of_Spaces*100),0) AS OccPourc
FROM Locations L
LEFT JOIN
(
SELECT Locations_ID, COUNT(*) AS cnt
FROM Bike_Locations
GROUP BY LOCATIONS_ID
) AS BL ON L.Locations_ID = BL.Locations_ID
This query will probably be helpful in many parts of your application, not only here, as it calculates the occupation % of each station.
Some examples:
Get all stations with >90% occupation:
SELECT Addr
FROM Stations_Occupation
WHERE OccPourc > 90
Get all stations with <10% occupation:
SELECT Addr
FROM Stations_Occupation
WHERE OccPourc < 10
Get Occupation level of a specific station:
SELECT OccPourc
FROM Stations_Occupation
WHERE ID=specific_station_ID
Get number of bikes and max on a specific station:
SELECT OccSpace & "/" & TotSpace
FROM Stations_Occupation
WHERE ID=specific_station_ID
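To tie this back to the original goal (one warning text per station), here is a rough sketch of the same idea using Python's sqlite3 module instead of Access. The view mirrors Stations_Occupation; the sample stations are invented, and the COALESCE is my own addition so that a station with no bikes shows 0% rather than NULL (in Access you would need an equivalent such as Nz).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (Locations_ID INTEGER PRIMARY KEY, No_of_Spaces INTEGER, Location_Address TEXT);
CREATE TABLE Bike_Locations (Bicycle_ID INTEGER PRIMARY KEY, Locations_ID INTEGER);
INSERT INTO Locations VALUES (1, 10, 'North St'), (2, 10, 'South St'), (3, 10, 'East St');
""")
# Hypothetical data: 10 bikes at station 1, 5 at station 2, none at station 3.
conn.executemany(
    "INSERT INTO Bike_Locations (Locations_ID) VALUES (?)",
    [(1,)] * 10 + [(2,)] * 5,
)

conn.execute("""
CREATE VIEW Stations_Occupation AS
SELECT L.Locations_ID AS ID,
       L.Location_Address AS Addr,
       L.No_of_Spaces AS TotSpace,
       COALESCE(BL.cnt, 0) AS OccSpace,
       ROUND(COALESCE(BL.cnt, 0) * 100.0 / L.No_of_Spaces, 0) AS OccPourc
FROM Locations L
LEFT JOIN (
    SELECT Locations_ID, COUNT(*) AS cnt
    FROM Bike_Locations
    GROUP BY Locations_ID
) AS BL ON L.Locations_ID = BL.Locations_ID
""")

# The warning is computed on top of the view, so no aggregate appears here.
warnings = conn.execute("""
SELECT Addr,
       CASE WHEN OccPourc > 90 THEN 'nearly full'
            WHEN OccPourc < 10 THEN 'nearly empty'
            ELSE '' END AS Warning
FROM Stations_Occupation
ORDER BY ID
""").fetchall()
print(warnings)
```

Because the percentage is already a plain column of the view, the warning expression no longer mixes aggregated and non-aggregated fields, which is what triggered the original error.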

Error Converting MySQL Query to SQL Server

Trying to convert below query into SQL, query works fine on MySQL. Problem seems to be the GROUP BY area. Even when I use just 1 GROUP BY field I get same error. Using query in InformaticaCloud.
ERROR
"the FROM Config_21Cent WHERE resp_ind = 'Insurance' GROUP BY
resp_Ind;;] is empty in JDBC connection:
[jdbc:informatica:sqlserver://cbo-aps-inrpt03:1433;DatabaseName=SalesForce]."
SELECT sum(Cast(Resp_Ins_Open_dol AS decimal(10,2))) as baltotal,
carrier_code,
carrier_name,
carrier_grouping,
collector_name,
dataset_loaded,
docnum,
envoy_payer_id,
loc,
market,
master_payor_grouping,
plan_class,
plan_name,
resp_ins,
resp_ind,
resp_payor_grouping,
Resp_Plan_Type,
rspphone,
state
FROM Config_21Cent
WHERE resp_ind = 'Insurance'
GROUP BY
(resp_ins + resp_payor_grouping +
carrier_code + state + Collector_Name);
Your entire query isn't going to work. The group by statement contains a single expression, the summation of a bunch of fields. The select statement contains zillions of columns without aggregates. Perhaps you intend for something like this:
select resp_ins, resp_payor_grouping, carrier_code, state, Collector_Name,
sum(Cast(Resp_Ins_Open_dol AS decimal(10,2))) as baltotal
from Config_21Cent
WHERE resp_ind = 'Insurance'
GROUP BY resp_ins, resp_payor_grouping, carrier_code, state, Collector_Name;
This will work in both databases.
The columns in SELECT statement must be a subset (not proper subset but subset) of columns in 'GROUP BY' statement. There is no such restriction on aggregates in SELECT statement though. There could be any number of aggregates; aggregates even on columns not in GROUP BY statement can be included.
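To illustrate why grouping by one concatenated expression is not the same as grouping by the separate columns, here is a small sketch using Python's sqlite3 module. The table is cut down to three columns with made-up values, and SQLite's || operator stands in for SQL Server's string + (an assumption of this sketch, not something from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical version of Config_21Cent.
CREATE TABLE Config_21Cent (resp_ins TEXT, state TEXT, Resp_Ins_Open_dol TEXT);
INSERT INTO Config_21Cent VALUES
  ('AB', 'C',  '10.50'),
  ('A',  'BC', '4.50'),
  ('AB', 'C',  '5.00');
""")

# One concatenated grouping expression: 'AB'||'C' and 'A'||'BC' both give 'ABC',
# so rows that belong to different (resp_ins, state) pairs collapse into one group.
concat_groups = conn.execute("""
SELECT SUM(CAST(Resp_Ins_Open_dol AS REAL)) AS baltotal
FROM Config_21Cent
GROUP BY (resp_ins || state)
""").fetchall()

# Separate grouping columns keep the pairs apart and can also be selected.
column_groups = conn.execute("""
SELECT resp_ins, state, SUM(CAST(Resp_Ins_Open_dol AS REAL)) AS baltotal
FROM Config_21Cent
GROUP BY resp_ins, state
ORDER BY resp_ins, state
""").fetchall()

print(concat_groups)
print(column_groups)
```

Listing the columns individually avoids the accidental merge and lets each of them appear in the SELECT list.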

Using GROUP_CONCAT

I have three tables. TB_Main is a table of Entities. TB_BoardMembers is a table of People. TB_BoardMembersLINK is a bridging table which references the other two by ids and also has start and end dates for when a Person was on the board of an Entity. These dates are often incomplete.
I have been asked to export as part of a report a CSV with one row per Entity per year in which I have a list of board members for that year with their occupations in a single field delimited by newlines.
I don't need bml.Entity in the result but added it to try to debug. I'm getting one row where I expect 85. Tried with and without GROUP BY and the fact that the result is the same suggests I am misusing GROUP_CONCAT. How should I construct this to get the result they want?
SELECT
GROUP_CONCAT(
DISTINCT CONCAT(bm.First, ' ', bm.Last,
IF (bm.Occupation != '', ' - ', ''),
bm.Occupation) SEPARATOR "\n") as Board,
bml.Entity
FROM
TB_Main arfe,
TB_BoardMembers bm,
TB_BoardMembersLINK bml
WHERE YEAR(bml.start) <= 2011
AND (YEAR(bml.end) >= 2011 OR bml.end IS NULL)
AND bml.start > 0
AND bml.Entity = arfe.ID
GROUP BY bml.Entity
ORDER BY Board
There are a few issues with this query. The main issue appears to be that you are missing a condition to link board members to the link table, so you have a cross join, i.e. you will be returning every board member regardless of their start/end dates, and assuming you have 85 rows where the criteria match, you will actually be returning each board member 85 times. This highlights a very good reason to switch from the ANSI 89 implicit joins you are using to the ANSI 92 explicit join syntax. This article highlights some very good reasons to make the switch.
So your query would become (I've had to guess at your field names):
SELECT *
FROM TB_Main arfe
INNER JOIN TB_BoardMembersLINK bml
ON bml.Entity = arfe.ID
INNER JOIN TB_BoardMembers bm
ON bm.ID = bml.BoardMemberID
The next thing I noticed about your query is that using functions in the where clause is not very efficient at all, so because of this:
WHERE YEAR(bml.start) <= 2011
AND (YEAR(bml.end) >= 2011 OR bml.end IS NULL)
You are evaluating the YEAR function twice for every row, and removing any possible chance of using an index on bml.Start or bml.End (if any exist). Yet again Aaron Bertrand has written a nice article highlighting good practices when querying date ranges; it is targeted at SQL Server, but the principles are still the same. To keep the meaning of the YEAR() tests, compare against the boundaries of the year (started before 2012-01-01, ended on or after 2011-01-01), so your where clause would become:
WHERE bml.Start < '20120101'
AND (bml.End >= '20110101' OR bml.End IS NULL)
AND bml.start > 0
Your final query should then be:
SELECT bml.Entity,
GROUP_CONCAT(DISTINCT CONCAT(bm.First, ' ', bm.Last,
IF (bm.Occupation != '', ' - ', ''), bm.Occupation)
SEPARATOR "\n") as Board
FROM TB_Main arfe
INNER JOIN TB_BoardMembersLINK bml
ON bml.Entity = arfe.ID
INNER JOIN TB_BoardMembers bm
ON bm.ID = bml.BoardMemberID
WHERE bml.Start < '20120101'
AND (bml.End >= '20110101' OR bml.End IS NULL)
AND bml.start > 0
GROUP BY bml.Entity
ORDER BY Board;
Example on SQL Fiddle
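As a quick way to check the shape of the result, here is a minimal sketch of the explicit-join version run through Python's sqlite3 module. Everything about the data is hypothetical: the start_date and end_date column names stand in for bml.Start and bml.End, BoardMemberID is guessed as in the answer, and SQLite's group_concat with char(10) replaces MySQL's SEPARATOR "\n". The year test is written as a half-open range (started before 2012, ended in 2011 or later, or still open).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TB_Main (ID INTEGER PRIMARY KEY);
CREATE TABLE TB_BoardMembers (ID INTEGER PRIMARY KEY, First TEXT, Last TEXT, Occupation TEXT);
CREATE TABLE TB_BoardMembersLINK (Entity INTEGER, BoardMemberID INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO TB_Main VALUES (1), (2);
INSERT INTO TB_BoardMembers VALUES
  (1, 'Ann', 'Lee', 'Lawyer'), (2, 'Bob', 'Ray', ''), (3, 'Cy', 'Orr', 'Nurse');
INSERT INTO TB_BoardMembersLINK VALUES
  (1, 1, '2009-03-01', NULL),          -- open-ended, still serving in 2011
  (1, 2, '2011-06-15', '2011-12-31'),  -- joined in the middle of 2011
  (2, 3, '2010-01-01', '2010-12-31'),  -- left before 2011
  (2, 1, '2012-02-01', NULL);          -- joined after 2011
""")

board_sql = """
SELECT bml.Entity,
       group_concat(
           bm.First || ' ' || bm.Last ||
           CASE WHEN bm.Occupation != '' THEN ' - ' || bm.Occupation ELSE '' END,
           char(10)) AS Board
FROM TB_Main arfe
INNER JOIN TB_BoardMembersLINK bml ON bml.Entity = arfe.ID
INNER JOIN TB_BoardMembers bm ON bm.ID = bml.BoardMemberID
WHERE bml.start_date < '2012-01-01'
  AND (bml.end_date >= '2011-01-01' OR bml.end_date IS NULL)
GROUP BY bml.Entity
"""

# group_concat does not guarantee member order here, so sort each list in Python.
boards = {
    entity: sorted(board.split("\n"))
    for entity, board in conn.execute(board_sql)
}
print(boards)
```

With the join condition on BoardMemberID in place, each entity gets one row containing only the members whose term overlaps 2011.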
If you read up on Group_Concat
"This function returns a string result with the concatenated non-NULL values from a group."
Here in this case, there seems to be just one group, as you say there is only one entity? I am not sure if that is the case from your description. Why don't you also group by first name, last name and occupation? This may give you all the members.
I am also not sure about your joins. Without real data it's tough to explain that part, as every join works properly for some set of data, even if it's not the best way to write a query.

SQL Server LEFT JOIN fails to match rows without JOIN hint

I have what appears to be a corrupt index?
Here is what is happening. I have two table-valued functions: the first returns a set of cases and the second a set of aware dates. These two sets have a 1 (case) to 0 or 1 (aware date) relationship. Normally I query them like this:
SELECT c.CaseID, a.AwareDate
FROM Cases(@date) AS c
LEFT JOIN AwareDates(@date) AS a ON c.CaseID = a.CaseID;
The trouble is that not all of the rows from AwareDates which match seem to be JOIN'd. If I add a join hint, they then do. say;
SELECT c.CaseID, a.AwareDate
FROM Cases(@date) AS c
LEFT MERGE JOIN AwareDates(@date) AS a ON c.CaseID = a.CaseID;
What I notice from the query plan is that adding the join hint adds a sort of the AwareDate data before the join which is not there otherwise. Also, the query planner flips the join to a RIGHT OUTER JOIN when there is no hint, and of course keeps the LEFT JOIN where the hint is present.
I've done the following with no errors detected;
DBCC UPDATEUSAGE (0) WITH INFO_MESSAGES, COUNT_ROWS;
EXECUTE sp_updatestats 'resample';
DBCC CHECKDB (0) WITH ALL_ERRORMSGS, EXTENDED_LOGICAL_CHECKS;
I'm stumped... any ideas?
Here are the UDF definitions
ALTER FUNCTION dbo.Cases( @day date ) RETURNS TABLE
WITH SCHEMABINDING
AS RETURN (
SELECT
CaseID -- other 42 columns omitted
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY CaseID ORDER BY UpdateDate DESC, UpdateNumber DESC) AS RecordAge,
CaseID,
Action
FROM
dbo.CaseAudit
WHERE
convert(date,UpdateDate) <= @day
) AS History
WHERE
RecordAge = 1 -- only the most current record version
AND isnull(Action,'') != N'DEL' -- only include cases that have not been deleted
)
ALTER FUNCTION dbo.AwareDates( @day date ) RETURNS TABLE
WITH SCHEMABINDING
AS RETURN (
WITH
History AS (
SELECT row_number() OVER (PARTITION BY CaseID, ContactID ORDER BY UpdateDate DESC, UpdateNumber DESC) AS RecordAge,
CaseID, InfoReceived, ReceiveDate, ResetClock, Action
FROM dbo.ContactLogAudit WITH (NOLOCK)
WHERE convert(date,UpdateDate) <= @day
),
Notes AS (
SELECT
CaseID,
convert(date,ReceiveDate,112) AS ReceiveDate,
ResetClock
FROM History
WHERE RecordAge = 1 -- only the most current record version
AND isnull(Action,'') != N'DEL' -- only include notes that have not been deleted
AND InfoReceived = N'Y' -- only include notes that have Info Rec'd checked
AND len(ReceiveDate) = 8 AND isnumeric(ReceiveDate) = 1 AND isdate(ReceiveDate) = 1 -- only include those with a valid aware date
),
Initials AS (
SELECT CaseID, min(ReceiveDate) AS ReceiveDate
FROM Notes
GROUP BY CaseID
),
Resets AS (
SELECT CaseID, max(ReceiveDate) AS ReceiveDate
FROM Notes
WHERE ResetClock = N'Y'
GROUP BY CaseID
)
SELECT
i.CaseID AS CaseID,
i.ReceiveDate AS InitialAwareDate, -- the oldest valid aware date value (must have AE Info Received checked and a received date)
coalesce(r.ReceiveDate,i.ReceiveDate) AS AwareDate -- either the newest valid aware date value with the Reset Clock checked, otherwise the initial aware date value
FROM Initials AS i
LEFT JOIN Resets AS r
ON i.CaseID = r.CaseID
);
I have further found that if I drop the "WITH (NOLOCK)" table hint, I get correct results. The same happens if I add a join hint inside the AwareDates UDF, or even add a COLLATE Latin1_General_BIN to the LEFT JOIN condition between Initials and Resets.
Query plan row counts -- without join hint (broken)
Cases { Actual: 25,891, Estimate: 19,071.9 }
AwareDates { Actual: 24,693, Estimated: 1,463.09 }
Initials { Actual: 24,693, Estimated: 1,463.09 }
Resets { Actual: 985, Estimated: 33.2671 }
AwareDates matches 8,108 of the Cases rows in the join'd result-set
Query plan row counts -- with join hint (working)
Cases { Actual: 25,891, Estimate: 19,071.9 }
AwareDates { Actual: 24,673, Estimated: 1,837.67 }
Initials { Actual: 24,673, Estimated: 1,837.67 }
Resets { Actual: 982, Estimated: 42.6238 }
AwareDates matches 24,673 of the Cases rows in the join'd result-set
I have further whittled down the scope of the issue. I can;
SELECT * FROM AwareDates(@date);
and
SELECT * FROM AwareDates(@date) ORDER BY CaseID;
With different row counts.
You don't specify the specific version of SQL (@@version), but this seems suspiciously like a bug that was fixed in Cumulative Update 6 for SQL 2008 R2 (apparently it also applies to SQL 2008).
KB 2433265
FIX: You may receive an incorrect result when you run a query that uses the
ROW_NUMBER function together with a left outer join in SQL Server 2008
The example in the article specifies DISTINCT. The article, however, is worded ambiguously -- it's not clear whether you NEED a distinct or if DISTINCT is one of the triggers.
Your example doesn't have a DISTINCT like the article, but it appears to have been modified for the sake of asking the question (i.e. 42 columns missing). Is there a DISTINCT? Also, in the AwareDates UDF, by the time I get down to the Initials CTE you do a GROUP BY, which could have the same effect as a DISTINCT.
UPDATE
@Dennis from your comment I still can't tell if you're using SQL 2008 or 2008 R2.
If you're running 2008, the KB article says "The fix for this issue was first released in Cumulative Update 11 for SQL Server 2008 Service Pack 1." So, post SP1.
On the other hand, if you're using SQL 2008 R2, you are correct that this was fixed in CU 6, which was part of SP1. But this bug appears to have resurfaced. Look at Cumulative update package 4 for SQL Server 2008 R2 Service Pack 1 -- released post SP1.
970198 FIX: You receive an incorrect result when you run a
query that uses the row_number function in SQL Server 2008
or in SQL Server 2008 R2
In the associated KB article MS dropped the reference to distinct:
Consider the following scenario. You run a query against a table that has a
clustered index in Microsoft SQL Server 2008 or in Microsoft SQL Server 2008
R2. In the query, you use the row_number function. In this scenario, you
receive an incorrect result when a parallel execution plan is used for the
query. If you run the query many times, you may receive different results.
This seems to confirm my earlier reading of KB 2433265 -- the phrasing suggests distinct is just one of many conditions that can cause the behavior. It seems that a parallel execution plan is the culprit this time around.
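For reference, the "most current record version" pattern that both UDFs rely on (ROW_NUMBER partitioned by the key, then keep row 1) can be sketched in isolation. This uses Python's sqlite3 module with invented audit rows, so it only shows what the pattern should return; it does not reproduce the SQL Server bug or its parallel plans. It assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, hypothetical version of dbo.CaseAudit.
CREATE TABLE CaseAudit (CaseID INTEGER, UpdateDate TEXT, UpdateNumber INTEGER, Action TEXT);
INSERT INTO CaseAudit VALUES
  (1, '2012-01-01', 1, 'ADD'),
  (1, '2012-01-05', 1, 'UPD'),
  (2, '2012-01-02', 1, 'ADD'),
  (2, '2012-01-02', 2, 'DEL'),
  (3, '2012-01-03', 1, NULL),
  (3, '2012-02-01', 1, 'DEL');
""")

# Same shape as dbo.Cases: newest audit row per case as of a day,
# dropped if that newest row marks the case as deleted.
cases_sql = """
SELECT CaseID
FROM (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY CaseID
               ORDER BY UpdateDate DESC, UpdateNumber DESC) AS RecordAge,
           CaseID,
           Action
    FROM CaseAudit
    WHERE UpdateDate <= ?
) AS History
WHERE RecordAge = 1
  AND COALESCE(Action, '') != 'DEL'
ORDER BY CaseID
"""

def current_cases(day):
    """Return the CaseIDs that are live as of the given ISO date string."""
    return [row[0] for row in conn.execute(cases_sql, (day,))]

print(current_cases('2012-01-31'))
print(current_cases('2012-02-28'))
```

A known-good reference like this can be handy for comparing row counts against the server's output with and without the join hint.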