I've got a user-defined function to split lists of integers into a table of values. I'm using this to parse input to select a set of records for a given set of types or statuses.
This works:
select * from RequestStatus
where RequestStatusUID in (select [value] from dbo.SplitIDs('1,2,3', ','))
This does not:
select * from Requests
where RequestStatusUID in (select [value] from dbo.SplitIDs('1,2,3', ','))
The Requests query returns the error "Conversion failed when converting the varchar value '1,2,3' to data type int." RequestStatusUID is an int column on both tables. Both execution plans look the same to me. The function works perfectly in unrelated queries; as far as I can tell it's only the Requests table that has the problem.
CREATE TABLE [dbo].[Requests] (
[RequestUID] int IDENTITY(1,1) NOT NULL,
[UserUID] int NOT NULL,
[LocationUID] int NOT NULL,
[DateOpened] date NULL,
[DateClosed] date NULL,
[RequestStatusUID] int NOT NULL,
[DiscussionUID] int NULL,
[RequestTypeUID] int NOT NULL,
[RequestNo] varchar(16) NOT NULL,
[LastUpdateUID] int NOT NULL,
[LastUpdated] date NOT NULL,
CONSTRAINT [PK_Requests] PRIMARY KEY NONCLUSTERED ([RequestUID])
)
It does work if I use a different function that returns varchars and I convert the RequestStatusUID column to a varchar as well:
select * from Requests
where cast(RequestStatusUID as varchar(4)) in (select [value] from dbo.Split('1,2,3', ','))
For reference, here is the SplitIDs function I'm using (a modified version of Arnold Fribble's solution). The Split function is the same, minus the cast to int at the end:
ALTER FUNCTION [dbo].[SplitIDs] ( @str VARCHAR(MAX), @delim char(1)=',' )
RETURNS TABLE
AS
RETURN
(
with cte as (
    select 0 a, 1 b
    union all
    select b, cast(charindex(@delim, @str, b) + 1 as int)
    from cte
    where b > a
)
select cast(substring(@str, a,
    case when b > 1 then b - a - 1 else len(@str) - a + 1 end) as int) [value]
from cte where a > 0
)
I can use the convert-to-strings solution but I'd really like to know why this is failing in the first place.
I think you will find that this syntax performs a lot better:
SELECT r.* FROM dbo.Requests AS r
INNER JOIN dbo.SplitIDs('1,2,3', ',') AS s
ON r.RequestStatusUID = s.value;
The predicate still has a bunch of implicit converts due to your function choice, but the join eliminates an expensive table spool. You may see slightly better performance still if you use a proper column list, limited to the actual columns you need, instead of SELECT *. You should change this even if you do need all of the columns.
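For example, a sketch with an explicit column list (the columns chosen here are illustrative, taken from your table definition; list whichever ones you actually need):
SELECT r.RequestUID, r.RequestNo, r.RequestStatusUID, r.DateOpened
FROM dbo.Requests AS r
INNER JOIN dbo.SplitIDs('1,2,3', ',') AS s
ON r.RequestStatusUID = s.value;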
Your IN () query, with an expensive table spool:
My JOIN version, where the cost is transferred to the scan you're doing anyway:
And here are runtime metrics (based on a small number of rows, of course):
The conversion errors seem to be stemming from the function, so I substituted my own (below). Even after adding the foreign key we didn't initially know about, I was unable to reproduce the error. I am not sure exactly what the problem is with the original function, but all those implicit converts it creates seem to cause an issue for the optimizer at some point. So I suggest this one instead:
CREATE FUNCTION dbo.SplitInts
(
    @List VARCHAR(MAX),
    @Delimiter VARCHAR(255) = ','
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT [value] = y.i.value('(./text())[1]', 'int')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
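As a quick sanity check, a minimal usage sketch (the default keyword picks up the comma delimiter):
SELECT [value] FROM dbo.SplitInts('1,2,3', default);
-- returns three int rows: 1, 2, 3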
So, it seems to me that you want to get rid of your original function.
Also, here is one way to use a join and still make the param optional:
DECLARE @param VARCHAR(MAX) = NULL; -- also try = '1,2,3';
SELECT r.*
FROM dbo.Requests AS r
LEFT OUTER JOIN dbo.SplitInts(@param, default) AS s
ON r.RequestStatusUID = s.value
WHERE (r.RequestStatusUID = s.value OR @param IS NULL);
We are facing unpredictable execution times for our code: the same query can take less than a second, or more than a minute. It looks like our attempt to use the MySQL JSON_ functions was probably not a good idea.
select p.pk,
p.name,
... /* fields from primary table */
(SELECT JSON_ARRAYAGG(CODE)
FROM (SELECT region.region_pk AS CODE
FROM program_region region
WHERE region.program_pk = p.pk) AS a)
AS region_codes,
(SELECT JSON_ARRAYAGG(CODE)
FROM (SELECT affiliate.affiliate_pk AS CODE
FROM program_affiliate affiliate
WHERE affiliate.program_pk = p.pk) AS a) AS affiliate_codes,
(SELECT JSON_ARRAYAGG(user_id)
FROM (SELECT DISTINCT user_id
FROM (SELECT role.user_id
FROM program_role role
WHERE role.program_pk = p.pk
UNION ALL
SELECT p.created_by) AS us) AS a) AS user_ids
from program p;
Structure:
program:
pk bigint,
name varchar ...
program_affiliate and other child tables:
pk bigint,
program_pk bigint, /* FK with index */
code varchar(20)
...
We checked: other queries work as usual; only the new ones have these issues.
Has anybody else faced performance issues with the JSON functions before? Any recommendations?
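One thing worth trying, as a sketch only (it assumes the per-row correlated subqueries are the bottleneck; the rc/ac aliases are illustrative): pre-aggregate each child table once in a derived table and join, so JSON_ARRAYAGG is not evaluated per program row. The user_ids column is omitted here because of its UNION ALL with p.created_by.
select p.pk,
       p.name,
       rc.region_codes,
       ac.affiliate_codes
from program p
left join (select program_pk, JSON_ARRAYAGG(region_pk) as region_codes
           from program_region
           group by program_pk) rc on rc.program_pk = p.pk
left join (select program_pk, JSON_ARRAYAGG(affiliate_pk) as affiliate_codes
           from program_affiliate
           group by program_pk) ac on ac.program_pk = p.pk;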
I get the following errors when executing:
"The multi-part identifier "od.Ordernumber" could not be bound."
"The multi-part identifier "od.Location_code" could not be bound."
create function Mbse.udf_ordertotal
(@Numberoforder int, @location_code int)
returns int
as
begin
    declare @amount as int
    set @amount = (select sum(od.amount)
                   from Mbse.OrderDetails as od
                   where @Numberoforder = od.Ordernumber
                     and @location_code = od.Location_code)
    return @amount
end
alter table Mbse.orders
add amount as Mbse.udf_ordertotal(Mbse.OrderDetails.Ordernumber, Mbse.OrderDetails.location_code)
How can I solve this problem?
Like Jeff said in the comments, your computed column using a user-defined function to aggregate the Mbse.OrderDetails table is not a good idea, for multiple reasons. It'll be heavy, it will process RBAR (row by agonizing row, i.e. once per row), and it will prevent parallelism for any query that references that function or your Mbse.orders table, directly or even indirectly.
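For what it's worth, the immediate cause of the "could not be bound" errors is that a computed column expression can only reference columns of its own table, never columns of another table. A sketch of the syntax that would at least parse (assuming Mbse.orders has its own Ordernumber and location_code columns, as the view below implies):
alter table Mbse.orders
add amount as Mbse.udf_ordertotal(Ordernumber, location_code)
But for the reasons above you shouldn't do this at all.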
You'd be better off with proper indexing on your OrderDetails table and a view that joins it to your Orders table like so:
-- Columnstore indexes are typically very quick for aggregative queries
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_OrderDetails_Amount ON Mbse.OrderDetails (Ordernumber, Location_code, amount);
CREATE VIEW Mbse.OrdersWithTotals
AS
WITH _OrderDetailsTotals AS
(
SELECT
Ordernumber,
Location_code,
SUM(amount) AS TotalAmount
FROM Mbse.OrderDetails
GROUP BY
Ordernumber,
Location_code
)
SELECT
O.Ordernumber,
O.location_code,
ODT.TotalAmount
FROM Mbse.orders AS O
LEFT JOIN _OrderDetailsTotals AS ODT
ON O.Ordernumber = ODT.Ordernumber
AND O.location_code = ODT.Location_code;
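Your queries can then read the totals straight from the view, e.g. (the filter value here is hypothetical):
SELECT Ordernumber, location_code, TotalAmount
FROM Mbse.OrdersWithTotals
WHERE Ordernumber = 12345; -- hypothetical order number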
I am facing a challenge while filtering records in a SQL Server 2017 table which has a VARCHAR column containing JSON values:
Sample table rows with JSON column values:
Row # 1. {"Department":["QA"]}
Row # 2. {"Department":["DEV","QA"]}
Row # 3. {"Group":["Group 2","Group 12"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}
Row # 4. {"Group":["Group 20"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}
Now I need to filter records from this table based on an input parameter which can be in the following format:
Sample JSON input parameter to query:
1. `'{"Department":["QA"]}'` -> This should return Row # 1 as well as Row # 2.
2. `'{"Group":["Group 2"]}'` -> This should return only Row # 3.
So the search should work like this: if the column value contains any of the given JSON tags with a matching value, return those records.
Note - this is exactly the behaviour of PostgreSQL's jsonb containment operator, as shown below:
PostgreSQL filter clause:
TableName.JSONColumnName #> '{"Department":["QA"]}'::jsonb
Researching on the internet, I found the OPENJSON capability available in SQL Server, which works as below.
OPENJSON sample example:
SELECT *
FROM tbl_Name UA
CROSS APPLY OPENJSON(UA.JSONColumnTags)
WITH (
    [Department] NVARCHAR(500) '$.Department',
    [Market] NVARCHAR(300) '$.Market',
    [Group] NVARCHAR(300) '$.Group'
) AS OT
WHERE
    OT.Department in ('X','Y','Z')
    and OT.Market in ('A','B','C')
But the problem with this approach is that if, in the future, there is a need to support a new tag in the JSON (like 'Area'), it will also need to be added to every stored procedure where this logic is implemented.
Is there any existing SQL Server 2017 capability I am missing or any dynamic way to implement the same?
The only option I could think of when using OPENJSON would be to break down your search string into its key/value pairs, break down the table storing the JSON you want to search into its key/value pairs, and join the two.
There would be limitations to be aware of:
This solution would not work with nested arrays in your json
The search would be OR, not AND. Meaning if I passed in multiple "Department" values to search for, like '{"Department":["QA", "DEV"]}', it would return the rows with either of the values, not only those that contained both (see the note after the example results below for one way around this).
Here's a working example:
DECLARE @TestData TABLE
(
[TestData] NVARCHAR(MAX)
);
--Load Test Data
INSERT INTO @TestData (
[TestData]
)
VALUES ( '{"Department":["QA"]}' )
, ( '{"Department":["DEV","QA"]}' )
, ( '{"Group":["Group 2","Group 12"],"Cluster":["Cluster 11"],"Vertical": ["XYZ"],"Department":["QAT"]}' )
, ( '{"Group":["Group 20"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}' );
--Here is the value we are searching for
DECLARE @SeachJson NVARCHAR(MAX) = '{"Department":["QA"]}';
DECLARE @SearchJson TABLE
(
[Key] NVARCHAR(MAX)
, [Value] NVARCHAR(MAX)
);
--Load the search value into a temp table as its key\value pair.
INSERT INTO @SearchJson (
[Key]
, [Value]
)
SELECT [a].[Key]
, [b].[Value]
FROM OPENJSON(@SeachJson) [a]
CROSS APPLY OPENJSON([a].[Value]) [b];
--Break down TestData into its key\value pair and then join back to the search table.
SELECT [TestData].[TestData]
FROM (
SELECT [a].[TestData]
, [b].[Key]
, [c].[Value]
FROM @TestData [a]
CROSS APPLY OPENJSON([a].[TestData]) [b]
CROSS APPLY OPENJSON([b].[Value]) [c]
) AS [TestData]
INNER JOIN @SearchJson [srch]
ON [srch].[Key] COLLATE DATABASE_DEFAULT = [TestData].[Key]
AND [srch].[Value] = [TestData].[Value];
Which gives you the following results:
TestData
-----------------------------
{"Department":["QA"]}
{"Department":["DEV","QA"]}
I'm far from a MySQL expert, and I'm struggling with a relatively complicated query.
I have two tables:
A Data table with columns as follows:
`Location` bigint(20) unsigned NOT NULL,
`Source` bigint(20) unsigned NOT NULL,
`Param` bigint(20) unsigned NOT NULL,
`Type` bigint(20) unsigned NOT NULL,
`InitTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`ValidTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`Value` double DEFAULT NULL
A Location Group table with columns as follows:
`Group` bigint(20) unsigned NOT NULL,
`Location` bigint(20) unsigned NOT NULL,
The data table stores data of interest, where each 'value' is valid for a particular 'validtime'. However, the data in the table comes from a calculation which is run periodically. The initialisation time at which the calculation is run is stored in the 'inittime' field. A given calculation with particular inittime may result in, say 10 values being output with valid times (A - J). A more recent calculation, with a more recent inittime, may result in another 10 values being output with valid times (B - K). There is therefore an overlap in available values. I always want a result set of Values and ValidTimes for the most recent inittime (i.e. max(inittime)).
I can determine the most recent inittime using the following query:
SELECT MAX(InitTime)
FROM Data
WHERE
Location = 100060 AND
Source = 10 AND
Param = 1 AND
Type = 1;
This takes 0.072 secs to execute.
However, using this as a sub-query to retrieve data from the Data table results in an execution time of 45 seconds (it's a pretty huge table, but not super ridiculous).
Sub-Query:
SELECT Location, ValidTime, Value
FROM Data data
WHERE Source = 10
AND Location IN (SELECT Location FROM LocationGroup WHERE `Group` = 3)
AND InitTime = (SELECT MAX(data2.InitTime)
                FROM Data data2
                WHERE data.Location = data2.Location
                  AND data.Source = data2.Source
                  AND data.Param = data2.Param
                  AND data.Type = data2.Type)
ORDER BY Location, ValidTime ASC;
(Snipped ValidTime qualifiers for brevity)
I know there's likely some optimisation that would help here, but I'm not sure where to start. Instead, I created a stored procedure to effectively perform the MAX(InitTime) query, but because the MAX(InitTime) is determined by a combination of Location, Source, Param and Type, I need to pass in all the locations that comprise a particular group. I implemented a cursor-based stored procedure for this before realising there must be an easier way.
Putting aside the question of optimisation via indices, how could I efficiently perform a query on the data table using the most recent InitTime for a given location group, source, param and type?
Thanks in advance!
MySQL can do a poor job optimizing IN with a subquery (sometimes). Also, indexes might be able to help. So, I would write the query as:
SELECT d.Location, d.ValidTime, d.Value
FROM Data d
WHERE d.Source = 10 AND
EXISTS (SELECT 1 FROM LocationGroup lg WHERE d.Location = lg.Location and lg.`Group` = 3) AND
d.InitTime = (SELECT max(d2.InitTime)
FROM Data d2
WHERE d.Location = d2.Location AND
d.Source = d2.Source AND
d.Param = d2.Param AND
d.Type = d2.Type
)
ORDER BY d.Location, d.ValidTime ASC;
For this query, you want indexes on data(Location, Source, Param, Type, InitTime) and LocationGroup(Location, Group), and data(Source, Location, ValidTime).
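A sketch of those definitions (the index names are illustrative):
CREATE INDEX idx_data_loc_src_param_type_init ON Data (Location, Source, Param, Type, InitTime);
CREATE INDEX idx_locationgroup_loc_group ON LocationGroup (Location, `Group`);
CREATE INDEX idx_data_src_loc_validtime ON Data (Source, Location, ValidTime);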
I have the following table:
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`relationcode` varchar(25) DEFAULT NULL,
`email_address` varchar(100) DEFAULT NULL,
`firstname` varchar(100) DEFAULT NULL,
`latname` varchar(100) DEFAULT NULL,
`last_contact_date` varchar(25) DEFAULT NULL,
PRIMARY KEY (`id`)
)
In this table there are duplicates: relations with exactly the same relationcode and email_address. They can be in there twice or even 10 times.
I need a query that selects the ids of all records, but for the ones that are in there more than once, I would like to select only the record with the most recent last_contact_date.
I'm more into Oracle than MySQL; in Oracle I would be able to do it this way:
select * from (
select row_number () over (partition by relationcode order by to_date(last_contact_date,'dd-mm-yyyy') desc) rank,
id,
relationcode,
email_address ,
last_contact_date
from RELATIONS)
where rank = 1
But I can't figure out how to modify this query to work in MySQL. I'm not even sure it's possible to do the same thing in a single query in MySQL.
Any ideas?
The normal way to do this is a subquery to get the latest record, then join that against the table:
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM RELATIONS
INNER JOIN
(
SELECT relationcode, email_address, MAX(last_contact_date) AS latest_contact_date
FROM RELATIONS
GROUP BY relationcode, email_address
) Sub1
ON RELATIONS.relationcode = Sub1.relationcode
AND RELATIONS.email_address = Sub1.email_address
AND RELATIONS.last_contact_date = Sub1.latest_contact_date
It is possible to manually generate the kind of rank that your Oracle query uses, using variables. Bit messy though!
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM
(
    SELECT id, relationcode, email_address, firstname, latname, last_contact_date,
        @seq := IF(@relationcode = relationcode AND @email_address = email_address, @seq + 1, 1) AS seq,
        @relationcode := relationcode,
        @email_address := email_address
    FROM
    (
        SELECT id, relationcode, email_address, firstname, latname, last_contact_date
        FROM RELATIONS
        CROSS JOIN (SELECT @seq := 0, @relationcode := '', @email_address := '') Sub1
        ORDER BY relationcode, email_address, last_contact_date DESC
    ) Sub2
) Sub3
WHERE seq = 1
This uses a subquery to initialise the variables. The sequence number is incremented if the relationcode and email_address are the same as on the previous row; if not, it is reset to 1. The values are also stored in fields. The outer select then checks the sequence number (as a field, not as the variable) and records are only returned if it is 1.
Note that I have done this as multiple subqueries, partly to make it clearer to you, but also to try to force the order in which MySQL executes things. There are a couple of possible issues with how MySQL says it may order execution that could cause a problem. They never have for me, but with subqueries I would hope to force the order.
Here is a method that will work in both MySQL and Oracle. It rephrases the question as: Get me all rows from relations where the relationcode has no larger last_contact_date.
It works something like this:
select r.*
from relations r
where not exists (select 1
from relations r2
where r2.relationcode = r.relationcode and
r2.last_contact_date > r.last_contact_date
);
With the appropriate indexes, this should be pretty efficient in both databases.
Note: this assumes that last_contact_date is stored as a date, not as a string (as in your table example). Storing dates as strings is just a really bad idea and you should fix your data structure.
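For example, a conversion sketch (it assumes every existing value really is in dd-mm-yyyy format; verify that, and back up, first):
-- Normalise the strings to MySQL's native date format, then change the type.
UPDATE relations
SET last_contact_date = STR_TO_DATE(last_contact_date, '%d-%m-%Y');
ALTER TABLE relations
MODIFY last_contact_date DATE;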