MySQL split table query using varchar column

I have a table in MySQL that I want to query in parallel by executing multiple SELECT statements, each selecting a non-overlapping, equal-sized part of the table, like:
1. select * from mytable where col between 1 and 1000
2. select * from mytable where col between 1001 and 2000
...
The problem is that col in my case is a varchar. How can I split the query in this case?
In Oracle we can use NTILE in combination with rowids, but I didn't find a similar approach for MySQL.
That's why my idea is to hash the col value and take it modulo the number of equal parts I want to have.
Alternatively, dynamically generated row numbers could be used instead of hashing.
What would be an optimal solution, considering that the table is big (xxxM rows) and I want to avoid a full table scan for each of the queries?
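The hashing idea from the question can be expressed with MySQL's CRC32() function, which computes the same standard CRC-32 as Python's zlib.crc32. Below is a minimal Python sketch that generates the partition queries; the table and column names are the ones from the question, and N_PARTS = 4 is an arbitrary example. Note that each generated query would still scan the full table unless you add an indexed generated column on the hash expression (available since MySQL 5.7).

```python
import zlib

N_PARTS = 4  # number of non-overlapping partitions (example value)

def partition_queries(table="mytable", col="col", parts=N_PARTS):
    """Generate one SELECT per partition. Because CRC32(col) % parts
    yields exactly one bucket per value, the queries are disjoint and
    together cover the whole table."""
    return [
        f"SELECT * FROM {table} WHERE CRC32({col}) % {parts} = {i}"
        for i in range(parts)
    ]

for q in partition_queries():
    print(q)

# Client-side check: every varchar value maps to exactly one bucket.
buckets = [zlib.crc32(v.encode()) % N_PARTS
           for v in ["alpha", "bravo", "charlie", "delta"]]
assert all(0 <= b < N_PARTS for b in buckets)
```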

You can use LIMIT for paging, so you will have:
1. select * from mytable limit 0, 1000
2. select * from mytable limit 1000, 1000
(Add an ORDER BY on a unique column; without it MySQL does not guarantee the pages are non-overlapping.)

You can cast the varchar column to an integer with CAST(col AS SIGNED); note that MySQL does not accept CAST(col AS INT).

With an index on ID this avoids a full table scan for early pages (large offsets still walk the index):
SELECT * FROM mytable
ORDER BY ID
OFFSET 0 ROWS
FETCH NEXT 100 ROWS ONLY
(Note: OFFSET ... FETCH is standard SQL / SQL Server syntax; the MySQL equivalent is SELECT * FROM mytable ORDER BY ID LIMIT 100 OFFSET 0.)

Related

SQL Query with Arithmetic Sub-Queries

I have a simple MySQL table with about 5 columns or so. The row count of the table changes quite frequently.
One of these columns is named has_error and holds a value of either 1 or 0.
I want to create a single SQL query that is the equivalent of the following simple equation:
(Number of rows with has_error = 1 / Total number of rows in table) * 100
I can write the individual SQL queries (see below), but I am not sure how to put it all together.
SELECT COUNT(*) AS total_number_of_rows FROM my_table
SELECT COUNT(*) AS number_of_rows_with_errors FROM My_table WHERE has_error = 1
This is easy because you can just use avg(has_error):
SELECT AVG(has_error) * 100
FROM My_table;
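The identity behind this answer (the average of a 0/1 column equals the fraction of 1s) is easy to sanity-check; a minimal sketch using SQLite's in-memory database, since the SQL here is identical in MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (has_error INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)",
                 [(1,), (0,), (0,), (1,), (0,)])  # 2 errors out of 5 rows

# Single-query version from the answer:
pct = conn.execute("SELECT AVG(has_error) * 100 FROM my_table").fetchone()[0]

# Two-query version from the question, for comparison:
errors = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE has_error = 1").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

assert pct == errors / total * 100
print(pct)  # 40.0
```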

mysql query for table row range

This might be a very basic question, but I am struggling with querying specific rows in a table based only on a row range.
Let's say I have a table ABC populated with 1000 rows. I want a SQL query that fetches the first 100 rows (the range 1 to 100), then the next 100 (101 to 200), and so on until all rows are covered. And this should be done without querying/filtering on the table's id or any other column.
I am only used to filtering on specific columns in the WHERE clause, so I would appreciate it if someone could please help.
You have to use the LIMIT clause in the SELECT query. MySQL allows you to set two parameters for the clause, the offset (first parameter) and the number of rows to fetch (second parameter).
SELECT * FROM `ABC` LIMIT 0, 100
SELECT * FROM `ABC` LIMIT 100, 100
SELECT * FROM `ABC` LIMIT 200, 100
-- etc...
However, you cannot guarantee the order of these rows unless you sort by one or more specific column(s) using the ORDER BY clause.
Read more about the SELECT statement here: http://dev.mysql.com/doc/refman/5.6/en/select.html
You can use LIMIT in MySQL. LIMIT accepts 2 parameters.
This will return rows 1-10:
select * from abcd limit 10
This will return rows 11-20:
select * from abcd limit 10, 10
This will return rows 21-30:
select * from abcd limit 20, 10
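The paging loop the answers describe can be sketched as follows, using an in-memory SQLite database (which accepts the same LIMIT offset, count syntax as MySQL); the table name is the one from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABC (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ABC (id) VALUES (?)",
                 [(i,) for i in range(1, 1001)])

page_size = 100
offset = 0
pages = []
while True:
    # LIMIT offset, count -- the ORDER BY makes the pages deterministic
    rows = conn.execute(
        "SELECT id FROM ABC ORDER BY id LIMIT ?, ?",
        (offset, page_size)).fetchall()
    if not rows:
        break
    pages.append(rows)
    offset += page_size

print(len(pages))  # 10 pages of 100 rows each
```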

Speed of two queries versus one query but limit output

I am running a query where I need to know the total number of rows in a table but only need to show the first 6.
So, is it faster to run SELECT COUNT(*), then SELECT * ... LIMIT 6, and print the returned data? Or just SELECT * with no limit and count in the while loop while printing the results? With the latter I can obviously use mysql_num_rows to get the total.
The table in question will contain up to 1 million rows; the query includes a WHERE row = xxx clause, and that column will be indexed.
Use FOUND_ROWS(). Here's an example:
SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();
(Note: SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17; running a separate COUNT(*) query is the recommended replacement.)
Do two queries. Your count query will use an index and will not have to scan the whole table, only the index. The second query will only have to read the 6 rows from the table.
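The two-query approach can be sketched like this (an in-memory SQLite database standing in for MySQL, with the table name and WHERE clause borrowed from the FOUND_ROWS example above; id is the indexed column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tbl_name (id) VALUES (?)",
                 [(i,) for i in range(1, 201)])

# Query 1: the total matching count, satisfiable from the index alone.
total = conn.execute(
    "SELECT COUNT(*) FROM tbl_name WHERE id > 100").fetchone()[0]

# Query 2: only the 6 rows actually displayed.
shown = conn.execute(
    "SELECT id FROM tbl_name WHERE id > 100 ORDER BY id LIMIT 6").fetchall()

print(total, [r[0] for r in shown])  # 100 [101, 102, 103, 104, 105, 106]
```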

How to optimize a MySQL query so that a selected value in a WHERE clause is only computed once?

I need to randomly select, in an efficient way, 10 rows from my table.
I found out that the following works nicely (after the query, I just select 10 random elements in PHP from the 10 to 30 I get from the query):
SELECT * FROM product WHERE RAND() <= (SELECT 20 / COUNT(*) FROM product)
However, the subquery, though relatively cheap, is computed for every row in the table. How can I prevent that? With a variable? A join?
Thanks!
A variable would do it. Something like this (note that user variables in MySQL are prefixed with @; the # character starts a comment):
SET @myvar = (SELECT 20 / COUNT(*) FROM product);
SELECT * FROM product WHERE RAND() <= @myvar;
Or, from the MySQL math functions doc:
You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times. However, you can retrieve rows in random order like this:
mysql> SELECT * FROM tbl_name ORDER BY RAND();
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
mysql> SELECT * FROM table1, table2
    -> WHERE a=b AND c<d
    -> ORDER BY RAND() LIMIT 1000;
RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.
It's a highly MySQL-specific trick, but by wrapping it in another subquery MySQL will treat it as a constant (derived) table and compute it only once:
SELECT * FROM product WHERE RAND() <= (
    SELECT * FROM ( SELECT 20 / COUNT(*) FROM product ) AS const_table
)
SELECT * FROM product ORDER BY RAND() LIMIT 10
Don't use ORDER BY RAND(). This will result in a table scan; if you have much data at all in your table, it will not be efficient. First determine how many rows are in the table:
SELECT COUNT(*) FROM table might work for you, though you should probably cache this value for some time since it can be slow for large datasets.
EXPLAIN SELECT * FROM table will give you the table statistics (how many rows the optimizer thinks are in the table). This is much faster, but it is less accurate, and less accurate still for InnoDB.
Once you have the number of rows, write some code like:
pseudo code (Java-ish):
String sql = "SELECT * FROM product WHERE id IN (";
for (int i = 0; i < numResults; i++) {
    sql += (int) (Math.random() * tableRows) + ", ";
}
sql = sql.substring(0, sql.length() - 2);  // trim the trailing ", "
sql += ")";
This will give you a fast lookup on the PK and avoid the table scan.
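A Python equivalent of that pseudocode; using random.sample also avoids the duplicate ids the loop above can generate (gaps in the id sequence from deleted rows would still need handling, e.g. by over-sampling):

```python
import random

table_rows = 1_000_000  # from the earlier COUNT(*) or EXPLAIN estimate
num_results = 10

# Sample distinct ids, then fetch them all by primary key in one query.
ids = random.sample(range(1, table_rows + 1), k=num_results)
sql = "SELECT * FROM product WHERE id IN ({})".format(
    ", ".join(str(i) for i in ids))

print(sql)
```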

SQL select a sample of rows

I need to select sample rows from a set. For example, if my select query returns x rows and x is greater than 50, I want only 50 rows returned, not just the top 50 but 50 that are evenly spread out over the resultset. The table in this case records routes: GPS locations + DateTime.
I am ordering on DateTime and need a reasonable sample of the Latitude & Longitude values.
Thanks in advance
[ SQL Server 2008 ]
To get sample rows in SQL Server, use this query:
SELECT TOP 50 * FROM Table
ORDER BY NEWID();
If you want to get every n-th row (10th, in this example), try this query:
SELECT * From
(
SELECT *, (Dense_Rank() OVER (ORDER BY Column ASC)) AS Rank
FROM Table
) AS Ranking
WHERE Rank % 10 = 0;
More examples of queries selecting random rows for other popular RDBMS can be found here: http://www.petefreitag.com/item/466.cfm
Every n-th row to get 50 (pseudo-SQL; a window function cannot appear directly in a WHERE clause, so in practice the ranking must be computed in a derived table, as in the update below):
SELECT *
FROM table
WHERE ROW_NUMBER() OVER () % ((SELECT COUNT(*) FROM table) / 50) = 0
FETCH FIRST 50 ROWS ONLY
And if you want a random sample, go with jimmy_keen's answer.
UPDATE:
In regard to the requirement for it to run on MS SQL, I think it should be changed to this (no MS SQL Server around to test, though; note that SQL Server requires an ORDER BY inside OVER() and an alias on the derived table):
SELECT TOP 50 *
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY DateTime) AS rn,
           (SELECT COUNT(*) FROM table) / 50 AS step
    FROM table t
) AS numbered
WHERE rn % step = 0
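The every-n-th-row selection is easy to check client-side; a pure-Python sketch, with a plain list standing in for the resultset ordered by DateTime:

```python
rows = list(range(1, 1001))  # stand-in for the ordered resultset
target = 50
step = len(rows) // target   # 1000 // 50 = 20

# Keep every step-th row, evenly spread over the whole resultset.
sample = [r for i, r in enumerate(rows, start=1) if i % step == 0][:target]

print(len(sample))  # 50
print(sample[:3])   # [20, 40, 60]
```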
I suggest that you add a calculated column to your resultset that holds a random number, and then select the top 50 sorted by that column. That will give you a random sample.
For example:
SELECT TOP 50 *, RAND(Id) AS Random
FROM SourceData
ORDER BY Random
where SourceData is your source data table or view. This assumes T-SQL on SQL Server 2008, by the way. It also assumes that you have an Id column with unique ids in your data source. If your ids are very low numbers, it is good practice to multiply them by a large integer before passing them to RAND, like this:
RAND(Id * 10000000)
If you want a statistically correct sample, TABLESAMPLE is the wrong solution. A good solution, as I described here based on a Microsoft Research paper, is to create a materialized view over your table that includes an additional column like
CAST( ROW_NUMBER() OVER (...) AS BYTE ) AS RAND_COL_. You can then add an index on this column, plus other interesting columns, and get statistically correct samples for your queries fairly quickly (by using WHERE RAND_COL_ = 1).