We are facing unpredictable execution times in our code: the same query can take less than a second, or more than a minute. It looks like our attempt to use MySQL's JSON_ functions was probably not a good idea.
SELECT p.pk,
       p.name,
       ... /* fields from primary table */
       (SELECT JSON_ARRAYAGG(CODE)
        FROM (SELECT region.region_pk AS CODE
              FROM program_region region
              WHERE region.program_pk = p.pk) AS a) AS region_codes,
       (SELECT JSON_ARRAYAGG(CODE)
        FROM (SELECT affiliate.affiliate_pk AS CODE
              FROM program_affiliate affiliate
              WHERE affiliate.program_pk = p.pk) AS a) AS affiliate_codes,
       (SELECT JSON_ARRAYAGG(user_id)
        FROM (SELECT DISTINCT user_id
              FROM (SELECT role.user_id
                    FROM program_role role
                    WHERE role.program_pk = p.pk
                    UNION ALL
                    SELECT p.created_by) AS us) AS a) AS user_ids
FROM program p;
Structure:

program:
    pk   bigint,
    name varchar,
    ...

program_affiliate (and the other child tables):
    pk         bigint,
    program_pk bigint,  /* FK with index */
    code       varchar(20),
    ...
We checked: other queries work as usual; only the new ones have these issues.
Has anybody else faced performance issues with the JSON functions before? Any recommendations?
I get these errors when executing:
"The multi-part identifier "od.Ordernumber" could not be bound"
"The multi-part identifier "od.Location_code" could not be bound"
create function Mbse.udf_ordertotal
    (@Numberoforder int, @loction_code int)
returns int
as
begin
    declare @amount as int
    set @amount = (select sum(od.amount)
                   from Mbse.OrderDetails as od
                   where @Numberoforder = od.Ordernumber
                     and @loction_code = od.Location_code)
    return @amount
end

alter table Mbse.orders
add amount as Mbse.udf_ordertotal(Mbse.OrderDetails.Ordernumber, Mbse.OrderDetails.location_code)
How can I solve this problem?
Like Jeff said in the comments, a computed column that uses a user-defined function to aggregate the Mbse.OrderDetails table is not a good idea, for multiple reasons. It'll be heavy, it will process RBAR (row by agonizing row, i.e. once per row), and it will prevent parallelism for any queries that reference that function or your Mbse.orders table, directly or even indirectly.
You'd be better off with proper indexing on your OrderDetails table and a view that joins it to your Orders table like so:
-- Columnstore indexes are typically very quick for aggregative queries
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_OrderDetails_Amount ON Mbse.OrderDetails (Ordernumber, Location_code, amount);
CREATE VIEW Mbse.OrdersWithTotals
AS
WITH _OrderDetailsTotals AS
(
SELECT
Ordernumber,
Location_code,
SUM(amount) AS TotalAmount
FROM Mbse.OrderDetails
GROUP BY
Ordernumber,
Location_code
)
SELECT
O.Ordernumber,
O.location_code,
ODT.TotalAmount
FROM Mbse.orders AS O
LEFT JOIN _OrderDetailsTotals AS ODT
ON O.Ordernumber = ODT.Ordernumber
AND O.location_code = ODT.Location_code;
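Querying the view then looks just like querying a table; for example (the order number literal here is just an assumed sample value):

SELECT Ordernumber, location_code, TotalAmount
FROM Mbse.OrdersWithTotals
WHERE Ordernumber = 12345;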
I have the following table:
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`relationcode` varchar(25) DEFAULT NULL,
`email_address` varchar(100) DEFAULT NULL,
`firstname` varchar(100) DEFAULT NULL,
`latname` varchar(100) DEFAULT NULL,
`last_contact_date` varchar(25) DEFAULT NULL,
PRIMARY KEY (`id`)
)
This table contains duplicates: relations with exactly the same relationcode and email_address. They can be in there twice, or even 10 times.
I need a query that selects the ids of all records, but excludes the ones that are in there more than once. Of those duplicates, I would like to select only the record with the most recent last_contact_date.
I'm more into Oracle than MySQL; in Oracle I would be able to do it this way:
select * from (
    select row_number() over (partition by relationcode
                              order by to_date(last_contact_date, 'dd-mm-yyyy') desc) rank,
           id,
           relationcode,
           email_address,
           last_contact_date
    from RELATIONS)
where rank = 1
But I can't figure out how to modify this query to work in MySQL. I'm not even sure it's possible to do the same thing in a single query in MySQL.
Any ideas?
The normal way to do this is a subquery that gets the latest record, then a join of that against the table:
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM RELATIONS
INNER JOIN
(
SELECT relationcode, email_address, MAX(last_contact_date) AS latest_contact_date
FROM RELATIONS
GROUP BY relationcode, email_address
) Sub1
ON RELATIONS.relationcode = Sub1.relationcode
AND RELATIONS.email_address = Sub1.email_address
AND RELATIONS.last_contact_date = Sub1.latest_contact_date
It is possible to manually generate the kind of rank that your Oracle query uses, using variables. It's a bit messy though!
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM
(
    SELECT id, relationcode, email_address, firstname, latname, last_contact_date,
           @seq := IF(@relationcode = relationcode AND @email_address = email_address, @seq + 1, 1) AS seq,
           @relationcode := relationcode,
           @email_address := email_address
    FROM
    (
        SELECT id, relationcode, email_address, firstname, latname, last_contact_date
        FROM RELATIONS
        CROSS JOIN (SELECT @seq := 0, @relationcode := '', @email_address := '') Sub1
        ORDER BY relationcode, email_address, last_contact_date DESC
    ) Sub2
) Sub3
WHERE seq = 1
This uses a subquery to initialise the variables. The sequence number is incremented if the relation code and email address are the same as on the previous row; if not, it is reset to 1. The value is stored in a field, and the outer select then checks that sequence number (as a field, not as the variable) and returns records only where it is 1.
Note that I have done this as multiple subqueries, partly to make it clearer to you, but also to try to force the order in which MySQL executes things. There are a couple of possible issues with how MySQL says it may order execution that could cause a problem. They never have for me, but with subqueries I would hope to force the order.
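As an aside: if you can use MySQL 8.0 or later, which added window functions, a close translation of your Oracle query should work directly. A sketch, assuming last_contact_date holds dd-mm-yyyy text as in your table definition:

SELECT id, relationcode, email_address, last_contact_date
FROM (
    SELECT id, relationcode, email_address, last_contact_date,
           ROW_NUMBER() OVER (PARTITION BY relationcode, email_address
                              ORDER BY STR_TO_DATE(last_contact_date, '%d-%m-%Y') DESC) AS rn
    FROM RELATIONS
) ranked
WHERE rn = 1;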
Here is a method that will work in both MySQL and Oracle. It rephrases the question as: Get me all rows from relations where the relationcode has no larger last_contact_date.
It works something like this:
select r.*
from relations r
where not exists (select 1
from relations r2
where r2.relationcode = r.relationcode and
r2.last_contact_date > r.last_contact_date
);
With the appropriate indexes, this should be pretty efficient in both databases.
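For instance, an index along these lines should let the NOT EXISTS probe be answered from the index alone (the index name is just illustrative):

CREATE INDEX idx_relations_code_date ON relations (relationcode, last_contact_date);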
Note: This assumes that last_contact_date is stored as a date, not as a string (as in your table example). Storing dates as strings is just a really bad idea, and you should fix your data structure.
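A sketch of that fix in MySQL, assuming the dd-mm-yyyy format from your Oracle query (the temporary column name is made up):

ALTER TABLE relations ADD COLUMN last_contact_dt DATE;

UPDATE relations
SET last_contact_dt = STR_TO_DATE(last_contact_date, '%d-%m-%Y');

ALTER TABLE relations DROP COLUMN last_contact_date;
ALTER TABLE relations CHANGE last_contact_dt last_contact_date DATE;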
I've got a user-defined function to split lists of integers into a table of values. I'm using this to parse input to select a set of records for a given set of types or statuses.
This works:
select * from RequestStatus
where RequestStatusUID in (select [value] from dbo.SplitIDs('1,2,3', ','))
This does not:
select * from Requests
where RequestStatusUID in (select [value] from dbo.SplitIDs('1,2,3', ','))
The Requests query returns the error "Conversion failed when converting the varchar value '1,2,3' to data type int." RequestStatusUID is an int column in both tables. Both explain plans look the same to me. The function works fine in other, unrelated queries; as far as I can tell, only the Requests table has the problem.
CREATE TABLE [dbo].[Requests] (
[RequestUID] int IDENTITY(1,1) NOT NULL,
[UserUID] int NOT NULL,
[LocationUID] int NOT NULL,
[DateOpened] date NULL,
[DateClosed] date NULL,
[RequestStatusUID] int NOT NULL,
[DiscussionUID] int NULL,
[RequestTypeUID] int NOT NULL,
[RequestNo] varchar(16) NOT NULL,
[LastUpdateUID] int NOT NULL,
[LastUpdated] date NOT NULL,
CONSTRAINT [PK_Requests] PRIMARY KEY NONCLUSTERED ([RequestUID])
);
It does work if I use a different function that returns varchars and I convert the RequestStatusUID column to a varchar as well:
select * from Requests
where cast(RequestStatusUID as varchar(4)) in (select [value] from dbo.Split('1,2,3', ','))
For reference, here is the SplitIDs function I'm using (a modified version of Arnold Fribble's solution). The Split function is the same, without the cast to int at the end:
ALTER FUNCTION [dbo].[SplitIDs] ( @str VARCHAR(MAX), @delim CHAR(1) = ',' )
RETURNS TABLE
AS
RETURN
(
    with cte as (
        select 0 a, 1 b
        union all
        select b, cast(charindex(@delim, @str, b) + 1 as int)
        from cte
        where b > a
    )
    select cast(substring(@str, a,
           case when b > 1 then b - a - 1 else len(@str) - a + 1 end) as int) [value]
    from cte
    where a > 0
)
I can use the convert-to-strings solution but I'd really like to know why this is failing in the first place.
I think you will find that this syntax performs a lot better:
SELECT r.* FROM dbo.Requests AS r
INNER JOIN dbo.SplitIDs('1,2,3', ',') AS s
ON r.RequestStatusUID = s.value;
The predicate still has a bunch of implicit converts, due to your function choice, but the join eliminates an expensive table spool. You may see slightly better performance still if you use a proper column list, limited to the actual columns you need, instead of SELECT *. You should change this even if you do need all of the columns.
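For illustration, a column-list version (keep whichever columns from your CREATE TABLE you actually need):

SELECT r.RequestUID, r.RequestNo, r.RequestStatusUID, r.DateOpened, r.DateClosed
FROM dbo.Requests AS r
INNER JOIN dbo.SplitIDs('1,2,3', ',') AS s
    ON r.RequestStatusUID = s.value;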
In the execution plans, your IN () query includes an expensive table spool; in my JOIN version, that cost is transferred to the scan you're doing anyway. The runtime metrics (based on a small number of rows, of course) bear this out.
The conversion errors seem to stem from the function, so I substituted my own (below). Even after adding the foreign key we didn't initially know about, I was unable to reproduce the error. I am not sure exactly what the problem is with the original function, but all those implicit converts it creates seem to cause an issue for the optimizer at some point. So I suggest this one instead:
CREATE FUNCTION dbo.SplitInts
(
@List VARCHAR(MAX),
@Delimiter VARCHAR(255) = ','
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT [value] = y.i.value('(./text())[1]', 'int')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
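A quick sanity check of the function (DEFAULT passes the declared ',' delimiter through):

SELECT [value] FROM dbo.SplitInts('1,2,3', DEFAULT);
-- expected output: 1, 2, 3 as ints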
So, seems to me like you want to get rid of the function.
Also, here is one way to use a join and still make the param optional:
DECLARE @param VARCHAR(MAX) = NULL; -- also try = '1,2,3';

SELECT r.*
FROM dbo.Requests AS r
LEFT OUTER JOIN dbo.SplitInts(@param, DEFAULT) AS s
    ON r.RequestStatusUID = s.value
WHERE (r.RequestStatusUID = s.value OR @param IS NULL);
I have a table defined like the following...
CREATE TABLE actions (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    end BOOLEAN,
    type VARCHAR(15) NOT NULL,
    subtype_a VARCHAR(15),
    subtype_b VARCHAR(15)
);
I'm trying to query for the last end action of some type to happen on each unique (subtype_a, subtype_b) pair, similar to a group by (except SQLite doesn't say what row is guaranteed to be returned by a group by).
On an SQLite database of about 1MB, the query I have now can take upwards of two seconds, but I need to speed it up to take under a second (since this will be called frequently).
example query:
SELECT * FROM actions a_out
WHERE id =
(SELECT MAX(a_in.id) FROM actions a_in
WHERE a_out.subtype_a = a_in.subtype_a
AND a_out.subtype_b = a_in.subtype_b
AND a_in.status IS NOT NULL
AND a_in.type = "some_type");
If it helps, I know all the unique possibilities for a (subtype_a, subtype_b) pair, e.g.:
(a,1)
(a,2)
(b,3)
(b,4)
(b,5)
(b,6)
Beginning with version 3.7.11, SQLite guarantees which record is returned in a group:
Queries of the form: "SELECT max(x), y FROM table" returns the value of y on the same row that contains the maximum x value.
So greatest-n-per-group can be implemented in a much simpler way:
SELECT *, max(id)
FROM actions
WHERE type = 'some_type'
GROUP BY subtype_a, subtype_b
Is this any faster?
SELECT *
FROM actions
WHERE id IN (SELECT MAX(id)
             FROM actions
             WHERE type = 'some_type'
             GROUP BY subtype_a, subtype_b);
This is the greatest-n-per-group problem that comes up frequently on Stack Overflow.
Here's how I solve it:
SELECT a_out.*
FROM actions a_out
LEFT OUTER JOIN actions a_in ON a_out.subtype_a = a_in.subtype_a
    AND a_out.subtype_b = a_in.subtype_b
    AND a_out.id < a_in.id
WHERE a_out.type = 'some_type' AND a_in.id IS NULL
If you have an index on (type, subtype_a, subtype_b, id) this should run very fast.
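In SQLite that index would look something like this (the index name is just illustrative):

CREATE INDEX idx_actions_type_subtypes ON actions (type, subtype_a, subtype_b, id);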
See also my answers to similar SQL questions:
Fetch the row which has the Max value for a column
Retrieving the last record in each group
SQL join: selecting the last records in a one-to-many relationship
Or this brilliant article by Jan Kneschke: Groupwise Max.
Is there a method to give a shorthand notation to a query?
e.g.
Q1 = (select * from tablename1)
Q2 = (select * from tablename2)
select name from Q1;
select name from Q2;
I am aware of views but I do not intend to use them.
Yes. Create a view.
CREATE VIEW Q1 AS (
SELECT
name,
id,
othercol
FROM tablename1
);
/* Works with a WHERE clause too */
CREATE VIEW Q2 AS (
SELECT
name,
id,
othercol
FROM tablename2
WHERE othercol = 'some limitation'
);
SELECT name FROM Q1;
/* aggregates work too */
SELECT name, COUNT(*) AS numrows FROM Q2 GROUP BY name;
Note: It is not recommended to SELECT * in a view (or really anywhere in production code). Always be explicit about the columns in the select list so their order will be deterministic.
Alternatively, create a temporary table using the CREATE TEMPORARY TABLE ... SELECT syntax.
CREATE TEMPORARY TABLE Q1
SELECT name, id, othercol FROM tablename1;
/* select from it */
SELECT name FROM Q1 WHERE id IN (1,2,3,4,5);
/* When done, drop it. Otherwise, it will be dropped when the client connection terminates. */
DROP TABLE Q1;
I think you're talking about views. Check out the docs for them.
A view is similar to what you're looking for.
But a "view" is less a notational convenience than an "alias" or "facade" for other queries or updates. A good metaphore is "a view is a virtual table". In MSSQL, "views" can also be an effective way to enhance security.
Here is a good article on the subject:
http://msdn.microsoft.com/en-us/library/aa214068%28v=sql.80%29.aspx