TSQL verify sort order / UNION ALL - sql-server-2008

CREATE PROCEDURE Test
AS
BEGIN
SELECT * FROM (
SELECT 1 AS a,'test1' as b, 'query1' as c
UNION ALL
SELECT 2 AS a,'test22' as b, 'query22' as c
UNION ALL
SELECT 2 AS a,'test2' as b, 'query2' as c
UNION ALL
SELECT 3 AS a,'test3' as b, 'query3' as c
UNION ALL
SELECT 4 AS a,'test4' as b, 'query4' as c
) As sample
FOR XML RAW
END
Can we guarantee that the stored procedure returns the rows in the order given?
The usual advice is that when we insert the results of these SELECT queries into a temporary table, the insertion order is not guaranteed, so we have to use an ORDER BY clause. But most of the time the rows come back in the same order. Can we force it to return a different order? Is this related to clustered and nonclustered indexes?
In the second case, can we enforce the insertion order by adding an IDENTITY column?

When you insert data, SQL Server treats it as a set. Even when writing data to disk, it tries to use the minimum space and first inserts rows into free pages it finds in non-uniform extents. So when you query the data without an ORDER BY, the order of the result depends on the order of the pages already in the cache and the order in which the rest is read from disk. I think it is almost impossible to predict that order, as it depends on the OS, other programs, and so on.
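To make the point concrete: the only way to get a guaranteed order is to state it. The following is a minimal sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere). The seq column is a hypothetical addition that records the intended order of the UNION ALL branches, and the outer ORDER BY is what guarantees it.

```python
import sqlite3

# SQLite stands in for SQL Server here; the idea (explicit sort key + ORDER BY)
# is the same, but this is an illustration, not the asker's procedure.
conn = sqlite3.connect(":memory:")

rows = conn.execute(
    """
    SELECT a, b, c
    FROM (
        SELECT 1 AS seq, 1 AS a, 'test1' AS b, 'query1' AS c
        UNION ALL
        SELECT 2, 2, 'test22', 'query22'
        UNION ALL
        SELECT 3, 2, 'test2', 'query2'
        UNION ALL
        SELECT 4, 3, 'test3', 'query3'
        UNION ALL
        SELECT 5, 4, 'test4', 'query4'
    ) AS sample
    ORDER BY seq  -- hypothetical sort key; without this the order is not guaranteed
    """
).fetchall()

for row in rows:
    print(row)
```

Sorting by a alone would not be enough here, because two rows share a = 2; the extra key is what keeps 'test22' ahead of 'test2'.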

Related

MariaDB subquery use whole row

Usually subqueries compare single or multiple fields, and DELETE statements usually delete rows by an ID. Unfortunately I don't have an ID field, and I have to use a generic approach for different kinds of tables.
That's why I am working with a subquery that uses LIMIT and OFFSET to identify rows.
I know that approach is risky. However, is there any way to delete rows by subquerying and comparing the whole row?
DELETE FROM table WHERE * = ( SELECT * FROM table LIMIT 1 OFFSET 6 )
I am using the latest version of MariaDB
This sounds like a really strange need, but who am I to judge? :)
I would simply rely on the primary key:
DELETE FROM table WHERE id_table = (SELECT id_table FROM table LIMIT 1 OFFSET 6)
update: oh, so you don't have a primary key? You can join on the whole row this way (assuming it has five columns named a, b, c, d, e):
DELETE t
FROM table t
INNER JOIN (
SELECT a, b, c, d, e
FROM table
ORDER BY a, b, c, d, e
LIMIT 1 OFFSET 6
) ROW6 USING (a, b, c, d, e);
Any subset of columns (e.g. a, c, d) that uniquely identify a row will do the trick (and is probably what you need as a primary key anyway).
Edit: Added an ORDER BY clause as per The Impaler's excellent advice. That's what you get for knocking an example up quickly.
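A runnable sketch of the whole-row idea, using Python's built-in sqlite3 module rather than MariaDB (an assumption for portability). SQLite has no DELETE ... JOIN, so this variant compares a row value against a subquery instead; depending on the version, MariaDB may reject a subquery that reads from the table being deleted from, which is one reason to prefer the join form above there. The table name items and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table without a primary key.
conn.execute("CREATE TABLE items (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(i, f"name{i}", f"note{i}") for i in range(10)],
)

# Pick the row at offset 6 under a deterministic ordering, then delete every
# row whose columns all match it (exact duplicates would be deleted too).
conn.execute(
    """
    DELETE FROM items
    WHERE (a, b, c) = (
        SELECT a, b, c
        FROM items
        ORDER BY a, b, c
        LIMIT 1 OFFSET 6
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT a FROM items ORDER BY a")]
print(remaining)
```

Note that an equality comparison never matches a NULL column, so rows containing NULLs would need extra handling.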
DELETE FROM t
ORDER BY ... -- fill in as needed
LIMIT 6
(Works on any version)

Is there a way to multiply results in SQL?

I am building a website which populates from a database. I'm testing now, and I'd like to see what my site will look like with a lot of data (mainly so I can watch performance, build out pagination, and address any issues with presentation). I have about 10 pieces of data in my table, which is great, but I'd like to display about 2,000 on my page.
Is there a way I can read from the same SELECT * FROM table statement over and over again in the same query in order to read the table multiple times?
I can do this by feeding all my results into a variable and echoing that variable multiple times, but it won't allow me to set a LIMIT or give me the proper count of rows from the query.
I'm surprised I haven't found a way to do this by Googling. It seems like it would be an easy, built-in thing.
If there's not, can you suggest any other way I can do this without modifying my original table?
Use a CROSS JOIN. A cross join gives you the Cartesian product of the rows from the joined tables. It can generate a lot of data quickly, which is useful for extensive testing.
Example:
SELECT * FROM A
CROSS JOIN B;
You can cross join on the same table as well.
As of MySQL 8 you can use a recursive query to get your rows multifold:
with recursive cte (a, b, c) as
(
select a, b, 1 from mytable
union all
select a, b, c + 1 from cte where c < 10 -- ten times as many
)
select a, b from cte;
(You can of course alter the generated values in the part after union all, e.g.: select a + 5, b * 2, c + 1 from cte where c < 10.)
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3a2699c167e1f4a7ffbe4e9b17ac7241
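Both ideas can be combined: cross join the real table with a generated list of numbers. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL 8 (an assumption; the WITH RECURSIVE syntax is the same). The table contents and the copies CTE name are hypothetical. Because the multiplication happens inside the query, LIMIT/OFFSET and COUNT(*) behave as they would with real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 11)],  # 10 sample rows
)

# copies generates the numbers 1..200; the cross join repeats each row 200 times.
multiplied = """
    WITH RECURSIVE copies(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM copies WHERE n < 200
    )
    SELECT t.a, t.b, copies.n
    FROM mytable AS t
    CROSS JOIN copies
"""

total = conn.execute(f"SELECT COUNT(*) FROM ({multiplied}) AS m").fetchone()[0]
page = conn.execute(
    f"{multiplied} ORDER BY copies.n, t.a LIMIT ? OFFSET ?", (25, 50)
).fetchall()

print(total, len(page), page[0])
```

The original table is never modified, which was one of the requirements in the question.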

MySQL: How do I efficiently reuse the results of a query in other queries?

I'm running the exact same query four times, twice as a subquery, gathering different information each time. What is the best way to pass the results of the first query to the other three so it doesn't have to run three more times?
On the average, it returns around 2,000 rows, but can be anywhere from 0 (in which case I skip the other three) to all. The primary table has nearly 300,000 rows, is growing by about 800 per day, rows are never deleted, and thousands of rows are updated throughout the day, many multiple times.
I looked into query cache, but it doesn't look like it has a bright future:
disabled-by-default since MySQL 5.6 / MariaDB 10.1.7
deprecated as of MySQL 5.7.20
removed in MySQL 8.0
I considered using GROUP_CONCAT with IN, but somehow I doubt that would work very well (if at all) with larger queries.
This is in a library I use to format the results for other scripts, so the original query can be nearly anything. Usually, it is on indexed columns, but can be horribly complicated using stored functions and take several minutes. It always involves the primary table, but may also join in other tables (but only to filter results from the primary table).
I am using Perl 5.16 and MariaDB 10.1.32 (will upgrade to 10.2 shortly) on CentOS 7. I am using prepare_cached and placeholders. The user this library runs as has SELECT-only access to tables plus EXECUTE on a couple stored functions, but I can change that if needed.
I've minimized the below as much as I can and used metasyntactic variables (inside angle brackets) as much as possible in an attempt to make the logic clear. id is 16 bytes and the primary key of the primary table (labeled a below).
I'm accepting three parameters as input. <tables> always includes a and may include a join like a join b on a.id=b.id. <where> can be simple like e=3 or horribly complex. I'm also getting an array of data for the placeholders, but I've left that out of the below because it doesn't affect the logic.
<search> = FROM <tables> WHERE (<where>)
<foo> = k < NOW() - INTERVAL 3 HOUR
<bar> = j IS NOT NULL OR <foo>
<baz> = j IS NULL AND k > NOW() - INTERVAL 3 HOUR
so <baz> is !<bar>. Every row should match one or the other
<where> often includes 1 or more of foo/bar/baz
SELECT a.id, b, c, d, <foo> x <search> ORDER BY e, id
SELECT COUNT(*) <search> AND <baz>
I really only need to know if any of the above rows match <baz>
SELECT c, COUNT(*) t, SUM(<bar>) o FROM a WHERE c IN (SELECT c <search> GROUP BY c) GROUP BY c
SELECT d, COUNT(*) t, SUM(<bar>) o FROM a WHERE d IN (SELECT d <search> GROUP BY d) GROUP BY d
The last two get a list of all unique c or d from the rows in the original query and then count how many total rows (not just the ones in the original query) have matching c or d and how many of those match <bar>. Those results get dumped into hashes so I can look up those counts while I iterate through the rows from the original query. I'm thinking running those two queries once is more efficient than running two smaller queries for each row.
Thank you.
Edited to add solution:
A temporary table was the answer, just not quite in the way Raymond suggested. Using EXPLAIN on my queries indicates that MariaDB was already using a temporary table for each, and deleting it when each was complete.
An inner join only returns rows that exist in both tables. So by making a temporary table of IDs that match my first SELECT, and then joining it to the primary table for the other SELECTs, I only get the data I want, without having to copy all that data to the temporary table.
"To create a temporary table, you must have the CREATE TEMPORARY TABLES privilege. After a session has created a temporary table, the server performs no further privilege checks on the table. The creating session can perform any operation on the table, such as DROP TABLE, INSERT, UPDATE, or SELECT." - https://dev.mysql.com/doc/refman/5.7/en/create-temporary-table.html
I also figured out that GROUP BY sorts by default, and you can get better performance if you don't need the data sorted by telling it not to.
DROP TEMPORARY TABLE IF EXISTS `temp`;
CREATE TEMPORARY TABLE temp AS ( SELECT a.id FROM <tables> WHERE <where> );
SELECT a.id, b, c, d, <foo> x FROM a JOIN temp ON a.id=temp.id ORDER BY e, a.id;
SELECT COUNT(*) FROM a JOIN temp ON a.id=temp.id WHERE <baz>;
SELECT c, COUNT(*) t, SUM(<bar>) o FROM a WHERE c IN (SELECT c FROM a JOIN temp ON a.id=temp.id GROUP BY c ORDER BY NULL) GROUP BY c ORDER BY NULL;
SELECT d, COUNT(*) t, SUM(<bar>) o FROM a WHERE d IN (SELECT d FROM a JOIN temp ON a.id=temp.id GROUP BY d ORDER BY NULL) GROUP BY d ORDER BY NULL;
DROP TEMPORARY TABLE IF EXISTS `temp`;
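A small end-to-end sketch of the pattern, using Python's built-in sqlite3 module instead of MariaDB (an assumption so it runs without a server). The sample rows, the search condition e = 3, and the simplification of <bar> to j IS NOT NULL are all hypothetical stand-ins for the placeholders above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, c TEXT, d TEXT, e INTEGER, j INTEGER)"
)
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "p", 3, None),
        (2, "x", "q", 3, 5),
        (3, "y", "p", 1, None),
        (4, "y", "q", 3, 7),
        (5, "z", "p", 2, None),
    ],
)

# Run the (potentially expensive) search once and keep only the matching keys.
conn.execute("CREATE TEMP TABLE temp_ids AS SELECT a.id FROM a WHERE e = 3")

# Reuse the keys: the inner join restricts a to the rows found by the search.
matched = conn.execute(
    "SELECT a.id, a.c, a.d FROM a JOIN temp_ids ON a.id = temp_ids.id ORDER BY a.id"
).fetchall()

# Count ALL rows of a (not just matched ones) for each c seen in the search.
per_c = conn.execute(
    """
    SELECT c, COUNT(*) AS t, SUM(j IS NOT NULL) AS o
    FROM a
    WHERE c IN (SELECT a.c FROM a JOIN temp_ids ON a.id = temp_ids.id)
    GROUP BY c
    ORDER BY c
    """
).fetchall()

conn.execute("DROP TABLE temp_ids")
print(matched, per_c)
```

The join condition on id matters: without it the join degenerates into a cross join and every row of a is paired with every stored key.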
The best I could think of is to use a TEMPORARY table.
P.S. I'm using valid MySQL code mixed with the same pseudocode as the original poster.
CREATE TEMPORARY TABLE <name> AS ( SELECT a.* FROM <tables> WHERE (<where>) )
<foo> = k < NOW() - INTERVAL 3 HOUR
<bar> = j IS NOT NULL OR <foo>
<baz> = j IS NULL AND k > NOW() - INTERVAL 3 HOUR
so <baz> is !<bar>. Every row should match one or the other
<where> often includes 1 or more of foo/bar/baz
SELECT a.id, b, c, d, <foo> x FROM <name> ORDER BY e, id
SELECT COUNT(*) FROM <name> WHERE <baz>
SELECT c, COUNT(*) t, SUM(<bar>) o FROM a WHERE c IN (SELECT c FROM <name> GROUP BY c) GROUP BY c
SELECT d, COUNT(*) t, SUM(<bar>) o FROM a WHERE d IN (SELECT d FROM <name> GROUP BY d) GROUP BY d

How do I query for the sum of averages?

I have data that resemble stock data that is being updated every hour. So there are 24 entries every day for each stock. (just using stock as an example). But sometimes, the data may not be updated.
For example, let's assume we have 3 stocks, A, B, C. And assume that we gather data at various intervals during the day for each stock. The data would look something like this...
row A B C
1 3 4 5
2 3.5 4.1 5
3 2.9 3.8 4.3
What I want is to sum up the average value of each stock for this time period or
Avg(A) + Avg(B) + Avg(C)
In reality I have hundreds of stocks and hundreds of thousands of rows. I need this to calculate for a single day.
I tried this (stock names are in an array - stocks = array('A','B','C'))
SELECT SUM(AVG(stock_price)) FROM table WHERE date = [mydate] AND stock_name IN ('".implode("','", $stocks)."') GROUP BY stock_name
but that didn't work. Can someone provide some insight?
Thanks, in advance.
Calculate the per-stock averages in a sub-query, then sum them in the main query.
SELECT SUM(average_price) AS total_averages
FROM (SELECT AVG(price) AS average_price
      FROM table
      WHERE <conditions>
      GROUP BY stock_name) AS averages
One way to do it, use an inline view as a rowsource:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
FROM table t
WHERE t.date = [mydate]
AND t.stock_name IN ('a','b','c')
GROUP BY t.stock_name
) a
You can run just the query from the inline view (aliased as a) to verify the results it returns. The outer query runs against the set of rows returned by the inline view query. (MySQL refers to the inline view as a "derived table".)
The outer query is effectively like this:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM a
The "trick" is that "a" isn't a regular table, it's a set of rows returned by a query; but in terms of relational algebra, it works the same: it's a set of rows. If a were a regular table, we could write:
SELECT b.col
FROM (
SELECT col FROM a
) b
We don't want to do that in MySQL when we don't have to, because of the inefficient way that older MySQL versions process it (since 5.7 the optimizer can often merge a simple derived table into the outer query instead). MySQL first runs the inner query (the query in the inline view). MySQL creates a temporary MyISAM table and inserts the rows returned by the query into it. MySQL then runs the outer query against that temporary table (which MySQL refers to as a "derived table") to return the result. Creating and populating a temporary table that's a copy of a regular table is a lot of overhead, especially with large sets.
What makes this powerful is that inline view query can include JOINs, WHERE clause, aggregates, GROUP BY, whatever. As long as it returns a set of rows (with appropriate column names), we can wrap the query in parens, and reference it in another query like it was a table.
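Here is the derived-table approach run against the sample numbers from the question, as a minimal sketch with Python's built-in sqlite3 module standing in for MySQL (an assumption). The table name prices, the column price_date, and the date value are hypothetical; the stock list is bound through placeholders rather than string concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (price_date TEXT, stock_name TEXT, stock_price REAL)"
)

# The three samples per stock from the question, stored one row per reading.
samples = {"A": [3, 3.5, 2.9], "B": [4, 4.1, 3.8], "C": [5, 5, 4.3]}
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("2024-01-02", name, p) for name, ps in samples.items() for p in ps],
)

stocks = ["A", "B", "C"]
placeholders = ", ".join("?" for _ in stocks)

# Inner query: one average per stock. Outer query: sum of those averages.
query = f"""
    SELECT SUM(per_stock.avg_stock_price) AS sum_avg_stock_price
    FROM (
        SELECT AVG(stock_price) AS avg_stock_price
        FROM prices
        WHERE price_date = ?
          AND stock_name IN ({placeholders})
        GROUP BY stock_name
    ) AS per_stock
"""
(total,) = conn.execute(query, ["2024-01-02", *stocks]).fetchone()
print(round(total, 4))
```

Only the number of placeholders is interpolated into the SQL text; the stock names themselves are passed as bound parameters.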

Can there be a database-agnostic SQL query to fetch top N rows?

We want to be able to select top N rows using a SQL Query. The target database could be Oracle or MySQL. Is there an elegant approach to this? (Needless to say, we're dealing with sorted data here.)
To get the top 5 scorers from this table:
CREATE TABLE people
(id int,
name varchar(100),
score int)
try this SQL:
SELECT id,
name,
score
FROM people p
WHERE (SELECT COUNT(*)
FROM people p2
WHERE p2.score > p.score
) <=4
I believe this should work in most places.
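A quick check of the correlated-subquery approach, sketched with Python's built-in sqlite3 module (an assumption; any engine should behave the same for this query). The names and scores are made up. The data deliberately contains a tie at the cut-off, which shows that "top 5" can return more than five rows with this technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Ann", 90),
        (2, "Bob", 85),
        (3, "Cy", 80),
        (4, "Dee", 70),
        (5, "Eve", 60),
        (6, "Fay", 60),  # ties with Eve for fifth place
        (7, "Gus", 50),
    ],
)

# A row qualifies when at most 4 other rows have a strictly higher score.
top = conn.execute(
    """
    SELECT id, name, score
    FROM people p
    WHERE (SELECT COUNT(*)
           FROM people p2
           WHERE p2.score > p.score) <= 4
    ORDER BY score DESC, id
    """
).fetchall()

print([name for _, name, _ in top])
```

Both Eve and Fay have exactly four people ahead of them, so both are returned.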
No. The syntax is different.
You may, however, create views:
/* Oracle */
CREATE VIEW v_table
AS
SELECT *
FROM (
SELECT *
FROM table
ORDER BY
column
)
WHERE rownum <= n
/* MySQL */
CREATE VIEW v_table
AS
SELECT *
FROM table
ORDER BY
column
LIMIT n
I don't think that's possible even just between MySQL and MSSQL. I do have an option for simulating such behaviour though:
create views that have an auto incremented int column; say 'PagingHelperID'
write queries like: SELECT columns FROM viewname WHERE PagingHelperID BETWEEN startindex AND stopindex
This will make ordering difficult; you will need a different view for every order in which you intend to retrieve data.
You could also "rewrite" your sql on the fly when querying depending on the database and define your own method for the rewriter, but I don't think there is any "good" way to do this.
If there is a unique key on the table yes...
Select * From Table O
Where (Select Count(*) From Table I
Where [UniqueKeyValue] < O.UniqueKeyValue) < N
You can substitute your own criteria if you want the "Top" definition to be based on some other logic than on the unique key...
EDIT: If the "sort" that defines the meaning of "Top" is based on a non-unique column, or set of columns, then you can still use this, but you can't guarantee you will be able to get exactly N records out...
Select * From Table O
Where (Select Count(*) From Table I
Where nonUniqueCol < O.nonUniqueCol) < 10
If records 8, 9, 10, 11, and 12 all have the same value in [nonUniqueCol], then the query will return either only 7 records (with '<') or 12 (if you use '<=').
NOTE: As this involves a correlated sub-query, the performance can be an issue for very large tables...
Starting with MySQL 8, you can use ROW_NUMBER() filtering to get the semantics of LIMIT (MySQL) or FETCH (Oracle) in a uniform, standards compliant syntax:
SELECT t.a, t.b, t.c, t.o
FROM (
SELECT a, b, c, o, ROW_NUMBER() OVER (ORDER BY o) AS rn
FROM x
) t
WHERE t.rn <= :limit
ORDER BY t.o
But this is likely to be less efficient than using the vendor specific syntax, so if you have some means of abstracting over LIMIT and FETCH (e.g. using an ORM like jOOQ or Hibernate, or even some templating language), that should be preferred.
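The ROW_NUMBER() filter can be tried out without a MySQL or Oracle server. This is a sketch using Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The sample rows are hypothetical, and the row number is given the alias rn so the outer query can filter on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (a INTEGER, b TEXT, c TEXT, o INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?)",
    [
        (1, "b1", "c1", 5),
        (2, "b2", "c2", 3),
        (3, "b3", "c3", 1),
        (4, "b4", "c4", 4),
        (5, "b5", "c5", 2),
    ],
)

# Number the rows by o, then keep only the first :limit of them.
top = conn.execute(
    """
    SELECT t.a, t.b, t.c, t.o
    FROM (
        SELECT a, b, c, o, ROW_NUMBER() OVER (ORDER BY o) AS rn
        FROM x
    ) t
    WHERE t.rn <= :limit
    ORDER BY t.o
    """,
    {"limit": 3},
).fetchall()

print(top)
```

The derived-table alias is written without AS because Oracle does not accept AS before a table alias.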
The big problem, after looking this over, is that MySQL (before version 8) isn't ISO SQL:2003 compliant. If it were, you'd have these handy windowing functions:
SELECT * FROM
( SELECT
RANK() OVER (ORDER BY <blah>) AS ranking,
<rest of columns here>
FROM <table>
) ranked
WHERE ranking <= <N>
Alas, MySQL (and others that mimic its behavior, e.g. SQLite) do not, hence the whole limiting issue.
Check out this snippet from Wikipedia (http://en.wikipedia.org/wiki/Window_function_(SQL)#Limiting_result_rows)