I'm fairly certain that applying the usual anti-parameter-sniffing trick (copying the parameter to a local variable) to table-valued parameters is of little or no value; however, I was wondering if someone could confirm this?
(INT_LIST is a user defined table type which is a single column of type INT)
CREATE PROCEDURE [dbo].[TVPSniffTest](
@param1 varchar(50),
@idList INT_LIST readonly
)
AS
BEGIN
DECLARE @param1_sniff VARCHAR(50) = @param1 --this is worth doing
DECLARE @idList_sniff INT_LIST
INSERT INTO @idList_sniff SELECT value FROM @idList --will this help?
--query code here
END
As Jeroen already mentioned, there is no parameter sniffing issue with TVPs, and one option to mitigate the lack of statistics is to copy the TVP to a local temp table (which does maintain statistics).
But another option that is sometimes more efficient is a statement-level recompile on any query using the table variable (i.e. the TVP). The statistics won't be maintained across queries, so the recompile needs to be done on every query that involves the table variable and is more than something like a simple SELECT.
The following illustrates this behavior:
DECLARE @TableVariable TABLE (Col1 INT NOT NULL);
INSERT INTO @TableVariable (Col1)
SELECT so.[object_id]
FROM [master].[sys].[objects] so;
-- Ctrl-M to turn on "Include Actual Execution Plan".
-- For each of the 3 following queries, hover over the "Table Scan"
-- operator to see the "Estimated Number of Rows".
SELECT * FROM @TableVariable; -- Estimated Number of Rows = 1 (incorrect)
SELECT * FROM @TableVariable
OPTION (RECOMPILE); -- Estimated Number of Rows = 91 (correct)
SELECT * FROM @TableVariable; -- Estimated Number of Rows = 1 (back to incorrect)
This has no effect whatsoever -- in fact, it's detrimental to performance because you're copying the whole table first.
The optimizer maintains no statistics for either table-valued parameters or table variables. This can easily lead to bad query plans with cardinality mismatches; the solution for that is usually an intermediate temp table. In any case, parameter sniffing won't be an issue -- the table contents are never used to optimize the query plan.
Incidentally, while you can assign the parameter to a local variable to circumvent sniffing, a more flexible option is to use the OPTIMIZE FOR or RECOMPILE hints in queries that are particularly affected (or WITH RECOMPILE on the whole stored procedure, but that's a little more drastic). This prevents cluttering the procedure with copies of everything.
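For illustration, here is a hedged sketch combining both mitigations inside a procedure like the one in the question (dbo.SomeTable and its id column are made up; INT_LIST and its value column come from the question):

CREATE PROCEDURE dbo.TVPSniffTest_Alt (
    @param1 varchar(50),
    @idList INT_LIST READONLY
)
AS
BEGIN
    -- Option 1: copy the TVP into a temp table, which does get statistics
    SELECT value INTO #idListTemp FROM @idList;

    SELECT t.*
    FROM dbo.SomeTable t                  -- hypothetical target table
    JOIN #idListTemp i ON i.value = t.id;

    -- Option 2: query the TVP directly, but force a statement-level recompile
    SELECT t.*
    FROM dbo.SomeTable t
    JOIN @idList i ON i.value = t.id
    OPTION (RECOMPILE);
END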
I'm working on a system where I need to generate a six-digit code for every user that signs up, so I'm using the statement SELECT LEFT(CAST(RAND()*1000000000+999999 AS INT),6) to generate it. I have made that particular column UNIQUE. The thing is that this all happens inside a trigger. My question is: what happens if the number generated by RAND() is already in use? Will the trigger be executed again because that column is UNIQUE, or do I need to write a condition in the trigger itself? If I need to write a condition, please help me with it.
If the randomizer generates a value that has already been used, and stores it in a column that has a UNIQUE constraint, then the row will violate the constraint, and the INSERT and any other data changes made by the trigger will be cancelled.
The trigger will not retry. A retry would need to be executed by your application code, after catching the error.
It would be far simpler to use a table's auto-increment mechanism to guarantee that values are not reused.
An example. Use with caution!!!
CREATE TRIGGER tr_bi_generate_pin
BEFORE INSERT
ON test
FOR EACH ROW
BEGIN
  REPEAT
    SET NEW.pin = CEIL(255 * RAND()); -- 255 is MAXVALUE for TINYINT UNSIGNED
    SET NEW.iterations = NEW.iterations + 1;
  UNTIL NOT EXISTS ( SELECT NULL
                     FROM test
                     WHERE pin = NEW.pin )
  END REPEAT;
END
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=11c263a2eb07b8db133ae13a3d22e549
This code is relatively safe: it counts the number of iterations, and if the count reaches 256 the insertion will fail. But on a real system, without such counting and with a wider datatype, the code could hang the server in a long, effectively infinite loop. So add a check on the maximum number of iterations; a failing query is better than a hung server.
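For comparison, a hedged sketch of the auto-increment alternative mentioned earlier (the users table and its columns are made up; note that LPAD does not truncate, so once id passes 999999 the codes grow to seven digits while staying unique):

CREATE TABLE users (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

INSERT INTO users (name) VALUES ('alice');

-- Derive the six-digit code from the id, which is guaranteed unique
SELECT id, LPAD(id, 6, '0') AS code FROM users;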
I know this question has been discussed quite a lot here, but I have a particular case: I need to pass a list of parameters (comma-separated), which prevents me from simply declaring a local variable and assigning the input parameter to it.
As pointed out in the discussion above, the suggestion is to declare a local variable and assign the parameter to it. However, what should I do when my parameter is of type TEXT and contains a comma-separated list?
For example -
CREATE DEFINER=`Admin`@`%` PROCEDURE `MyReport`(
p_myparameter_HK Text
)
BEGIN
SELECT
*
FROM MyTable
WHERE
(find_in_set(MyTable.column_HK, p_myparameter_HK) <> 0 OR MyTable.column_HK IS NULL)
;
END
Performance:
If I just run the query directly: 300 ms.
If I call the stored procedure, CALL MyReport('0000_abcd_fake_000'), it keeps running endlessly.
My question is, how can I disable parameter sniffing and use a local variable instead of find_in_set, to match the plain query's performance?
The times that I have needed to pass an arbitrary list of things to a Stored Procedure, I did it this way:
CREATE (or already have) a TABLE for passing the info in. Both the caller and the Procedure know the name of the table. (Or the name could be passed in, but that adds some messy "prepare-executes".)
Do a bulk INSERT into that table. (INSERT INTO tbl (a,b) VALUES (...), (..), ...;)
Perform JOINs or whatever to use the table efficiently.
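A hedged sketch of that pattern, reusing the column name from the question (the param_list table and the sample values are made up):

-- 1. A shared table that both the caller and the procedure know about
CREATE TABLE param_list (
    column_HK VARCHAR(50) NOT NULL,
    PRIMARY KEY (column_HK)
);

-- 2. The caller bulk-inserts the list just before the CALL
INSERT INTO param_list (column_HK) VALUES
    ('0000_abcd_fake_000'),
    ('0000_efgh_fake_001');

-- 3. Inside the procedure, a JOIN replaces the find_in_set() string parsing
SELECT t.*
FROM MyTable t
JOIN param_list p ON p.column_HK = t.column_HK;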
In my case, the extra effort was worth it.
I have inherited a MySQL InnoDB table with around 500 million rows. The table has IP numbers and the name of the ISP to which that number belongs, both as strings.
Sometimes I need to update the name of an ISP to a new value, after company changes such as mergers or rebranding. But because the table is so big, a simple UPDATE ... WHERE statement doesn't work: the query usually times out, or the box runs out of memory.
So, I have written a stored procedure which uses a cursor to try and make the change one record at a time. When I run the procedure on a small sample table, it works perfectly. But, when I try to run it against the whole 500 million row table in production, I can see a temporary table gets created (because a /tmp/xxx.MYI and /tmp/xxx.MYD file appear). The temporary table file keeps growing in size until it uses all available disk space on the box (around 40 GB).
I'm not sure why this temporary table is necessary. Is the server trying to maintain some kind of rollback state? My real question is, can I change the stored procedure such that the temporary table is not created? I don't really care if some, but not all of the records get updated - I can easily add some reporting and just keep running the proc until no records are altered.
At this time, architecture changes are not really an option – I can't change the structure of the table, for example.
Thanks in advance for any help.
David
This is my stored proc:
DROP PROCEDURE IF EXISTS update_isp;
DELIMITER $$
CREATE PROCEDURE update_isp()
BEGIN
DECLARE v_finished INT DEFAULT 0;
DECLARE v_num VARCHAR(255) DEFAULT "";
DECLARE v_isp VARCHAR(255) DEFAULT "";
DECLARE ip_cursor CURSOR FOR
SELECT ip_number, isp FROM ips;
DECLARE CONTINUE HANDLER
FOR NOT FOUND SET v_finished = 1;
OPEN ip_cursor;
get_ip: LOOP
FETCH ip_cursor INTO v_num, v_isp;
IF v_finished = 1 THEN
LEAVE get_ip;
END IF;
IF v_isp = 'old name' THEN
UPDATE ips SET isp = 'new name' WHERE ip_number = v_num;
END IF;
END LOOP get_ip;
CLOSE ip_cursor;
END$$
DELIMITER ;
CALL update_isp();
I have also tried wrapping the update statement in a transaction. It didn't make any difference.
[EDIT] My assumption below, that a simple counting procedure does not create a temporary table, was wrong. The temporary table is still created, but it grows more slowly and the box does not run out of disk space before the procedure completes.
So the problem seems to be that any use of a cursor in a stored procedure results in a temporary table being created. I have no idea why, or if there is any way to prevent this.
If your update is essentially:
UPDATE ips
SET isp = 'new name'
WHERE isp = 'old name';
I am guessing that this update -- without the cursor -- will work better if you have an index on ips(isp):
create index idx_ips_isp on ips(isp);
Your original query should be fine once this index is created. There should be no performance issue updating a single row even in a very large table. The issue is in all likelihood finding the row, not updating it.
I don't think there is a solution to this problem.
From this page: http://spec-zone.ru/mysql/5.7/restrictions_cursor-restrictions.html
In MySQL, a server-side cursor is materialized into an internal temporary table. Initially, this is a MEMORY table, but it is converted to a MyISAM table when its size exceeds the minimum value of the max_heap_table_size and tmp_table_size system variables.
I misunderstood how cursors work. I assumed that my cursor functioned as a pointer to the underlying table. But, it seems MySQL must build the full result set first, and then give you a pointer to that. So, I don't really understand the benefits of cursors in MySQL. Thanks to everyone who tried to help.
David
If the table also has a numerical index, you can specify a
WHERE myindex > 123 AND myindex < 456
in your update query and repeat that over a number of intervals (with a loop, for example) until the whole table is covered.
(sorry, my rep is too low to ask in the comment section, so I'll just post my guess-answer here to be able to comment on)
You could try to fake a numerical index with
SELECT ROW_NUMBER() OVER (ORDER BY oneofyourcolumns) AS n, thetable.* FROM thetable;
and then try what I suggested above.
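Putting the two ideas together, here is a hedged sketch of the interval loop as a procedure (it assumes ips has a numeric index column named id, which is made up here; the real column may differ):

DELIMITER $$
CREATE PROCEDURE update_isp_chunked()
BEGIN
    DECLARE v_lo   BIGINT DEFAULT 0;
    DECLARE v_max  BIGINT;
    DECLARE v_step INT DEFAULT 100000;
    SELECT MAX(id) INTO v_max FROM ips;
    WHILE v_lo <= v_max DO
        -- Each pass touches only one slice, keeping transactions small
        UPDATE ips
        SET isp = 'new name'
        WHERE isp = 'old name'
          AND id > v_lo
          AND id <= v_lo + v_step;
        SET v_lo = v_lo + v_step;
    END WHILE;
END$$
DELIMITER ;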
I'm extremely new to views, so please forgive me if this is a silly question. I have a view that is really helpful in taming a pretty unwieldy query, and it lets me select against a small subset of columns. I was hoping, however, that the view's results would actually be stored somewhere, so that selecting from it wouldn't take very long.
I may be mistaken, but I get the sense (from the speed with which CREATE VIEW executes and from the duration of my queries against my view) that the view is actually run as a query prior to the outer query, every time I select against it.
I'm really hoping that I'm overlooking some mechanism whereby CREATE VIEW does the hard work of running the view's query up front, so that my subsequent selects against this static view would be really swift.
BTW, I totally understand that obviously this VIEW would be a snapshot of the data that existed at the time the VIEW was created and wouldn't reflect any new info that was inserted/updated subsequent to the VIEW's creation. That's actually EXACTLY what I need.
TIA
What you want to do is materialize your view. Have a look at http://www.fromdual.com/mysql-materialized-views.
What you're talking about are materialised views, a feature of (at least) DB2 but not MySQL as far as I know.
There are ways to emulate them by creating/populating a table periodically, or on demand, but a true materialised view knows when the underlying data has changed, and only recalculates if required.
If the data will never change once the view is created (as you seem to indicate in a comment), just create a brand new table to hold the subset of data and query that. People always complain about slow speed but rarely about data storage requirements :-)
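For instance, a snapshot table can be as simple as this (the table and column names are illustrative):

-- Build a static copy once; it will not reflect later changes,
-- which is exactly the snapshot behaviour asked for
CREATE TABLE my_snapshot AS
SELECT col_a, col_b
FROM my_unwieldy_view;

-- Subsequent queries hit the static table directly
SELECT * FROM my_snapshot WHERE col_a = 42;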
You can do this with:
A MySQL Event
A separate table (for caching)
The REPLACE INTO ... SELECT statement.
Here's a working example.
-- create dummy data for testing
CREATE TABLE MyTable (
id INT NOT NULL,
groupvar INT NOT NULL,
myvar INT
);
INSERT INTO MyTable VALUES
(1,1,1),
(2,1,1),
(3,2,1);
-- create the view, making sure rows have a unique identifier (groupvar)
CREATE VIEW MyView AS
SELECT groupvar, SUM(myvar) as myvar_sum
FROM MyTable
GROUP BY groupvar;
-- create cache table, setting primary key to unique identifier (groupvar)
CREATE TABLE MyView_Cache (PRIMARY KEY (groupvar))
SELECT *
FROM MyView;
-- create a table to keep track of when the cache has been updated (optional)
CREATE TABLE MyView_Cache_updated (update_id INT NOT NULL AUTO_INCREMENT, last_updated DATETIME NOT NULL, PRIMARY KEY (update_id));
-- create event to update cache table (e.g., daily)
DELIMITER |
CREATE EVENT MyView_Cache_Event
ON SCHEDULE EVERY 1 DAY STARTS CURRENT_TIMESTAMP + INTERVAL 1 HOUR
DO
BEGIN
REPLACE INTO MyView_Cache
SELECT *
FROM MyView;
INSERT INTO MyView_Cache_updated
SELECT NULL, NOW() AS last_updated;
END |
DELIMITER ;
You can now query MyView_Cache for faster response times, and query MyView_Cache_updated to inform users of the last time the cache was updated (in this example, daily).
Since a view is basically a SELECT statement, you can use the query cache to improve performance.
But first you should check whether:
you can add indexes to the tables involved to speed up the query (use EXPLAIN)
the data isn't changing very often, in which case you can materialize the view (make snapshots)
Use a materialised view. It can store data like counts and sums, but after updating the base table you need to refresh the view to get correct results, as it is not updated automatically. Moreover, after you query the view, the results are stored in a cache: the first query against the view fetches the data from main memory and caches it afterwards, so the memory cycles reduce to 2, versus 4 when querying the table itself. So it gets efficient from the second time on.
I'm referencing the name, description and user_id columns of the meta table, twice, and maybe more times in the future (who knows?). Those columns are used to compute the ETag of my meta resource.
Adding a column that contributes to the ETag computation in the future will force me to change the code in N places, and this is bad.
Is there any way to make it DRY and store these column names elsewhere? I'd also like to use these column names when an INSERT on meta is performed.
IF only = true THEN
-- Calculate ETag on meta fields only
UPDATE meta
SET etag = etag(CONCAT(name, description, user_id))
WHERE id = meta_id;
ELSE
-- Calculate Etag on meta fields and meta customers
BEGIN
DECLARE c_etags VARCHAR(32);
-- Compute c_etags
UPDATE meta
SET etag = etag(CONCAT(etag(CONCAT(name, description, user_id)), c_etags))
WHERE id = meta_id;
END;
END IF;
Disclaimer: this code is untested; I'm pretty new to MySQL stuff, apart from simple statements.
EDIT: etag is the MySQL MD5 function. Maybe this is one option:
CREATE PROCEDURE set_meta_etag(IN meta_id INT, IN related TEXT)
NOT DETERMINISTIC
BEGIN
UPDATE meta
SET etag = etag(CONCAT(name, description, user_id,
IF(related IS NOT NULL, related, '')))
WHERE id = meta_id;
END //
-- First call
CALL set_meta_etag(meta_id, NULL);
-- Second call
CALL set_meta_etag(meta_id, c_etags);
But it won't work for INSERT statement.
The obvious thing (for each column, if it's one I want, use it to help make the etag) isn't easy to do in SQL, because SQL historically doesn't contemplate column names stored in variables.
You could write a program in your favorite non-SQL programming language (Java, PHP, etc) to create and then define your procedure.
You could also use so-called "dynamic sql" to do this, if you were willing to do the work and take the slight performance hit. See
How To have Dynamic SQL in MySQL Stored Procedure
for information on how to PREPARE and EXECUTE statements in a stored procedure.
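A hedged sketch along those lines, keeping the column list in a single string (the etag() function is the one from the question; the procedure name and everything else here are illustrative):

DELIMITER $$
CREATE PROCEDURE set_meta_etag_dynamic(IN p_meta_id INT)
NOT DETERMINISTIC
BEGIN
    -- The one place the contributing columns are listed
    SET @etag_columns = 'name, description, user_id';
    SET @sql = CONCAT(
        'UPDATE meta SET etag = etag(CONCAT(', @etag_columns,
        ')) WHERE id = ?');
    PREPARE stmt FROM @sql;
    SET @id = p_meta_id;
    EXECUTE stmt USING @id;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;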
By the way, I have had good success building systems that keep various kinds of metadata in column comments. For example, you could write code looking for the string '[etag]' in your column comments. The comments for columns are stored in
information_schema.COLUMNS.COLUMN_COMMENT
and are very easy to process when your program is starting up.
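As an illustrative sketch (the column types shown are assumptions, since MODIFY requires restating the full definition):

-- Tag the contributing columns through their comments...
ALTER TABLE meta
    MODIFY name VARCHAR(255) COMMENT '[etag]',
    MODIFY description TEXT COMMENT '[etag]',
    MODIFY user_id INT COMMENT '[etag]';

-- ...then discover them when the program starts up
SELECT COLUMN_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'meta'
  AND COLUMN_COMMENT LIKE '%[etag]%';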
If you know this is confined to one table, you could add a trigger. Using an AFTER trigger should allow your stored proc to work for both INSERT and UPDATE. See MySQL Fire Trigger for both Insert and Update.
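A sketch of that idea, with one caveat: in MySQL only BEFORE triggers may assign to NEW columns, so the etag assignment itself has to happen in BEFORE triggers rather than AFTER (MD5 stands in for the etag function from the question):

DELIMITER $$
CREATE TRIGGER meta_etag_bi
BEFORE INSERT ON meta
FOR EACH ROW
BEGIN
    -- Note: CONCAT returns NULL if any column is NULL; wrap the
    -- columns in COALESCE if that matters for your data
    SET NEW.etag = MD5(CONCAT(NEW.name, NEW.description, NEW.user_id));
END$$

CREATE TRIGGER meta_etag_bu
BEFORE UPDATE ON meta
FOR EACH ROW
BEGIN
    SET NEW.etag = MD5(CONCAT(NEW.name, NEW.description, NEW.user_id));
END$$
DELIMITER ;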