How to use apache drill do page search - apache-drill

i want use apache drill to do a page search. But it just provide a limit key words,I don't know how to write a good sql.Do any anybody can help me?Thank you!

Drill supports both LIMIT and OFFSET operators. So, pagination can be achieved using these.
Sample query:
SELECT * FROM cp.`employee.json` order by employee_id LIMIT 20 OFFSET 10 ROWS
Some important ponits from Drill docs:
The OFFSET number must be a positive integer and cannot be larger than the number of rows in the underlying result set or no rows are returned.
You can use the OFFSET clause in conjunction with the LIMIT and ORDER BY clauses.
When used with the LIMIT option, OFFSET rows are skipped before starting to count the LIMIT rows that are returned. If the LIMIT option is not used, the number of rows in the result set is reduced by the number of rows that are skipped.
The rows skipped by an OFFSET clause still have to be scanned, so it might be inefficient to use a large OFFSET value.

Related

What is the fastest way to count the number of MySql rows left after a limited results query

If I have a mysql limited query:
SELECT * FROM my_table WHERE date > '2020-12-12' LIMIT 1,16;
Is there a faster way to check and see how many results are left after my limit?
I was trying to do a count with limit, but that wasn't working, i.e.
SELECT count(ID) AS count FROM my_table WHERE date > '2020-12-12' LIMIT 16,32;
The ultimate goal here is just to determine if there ARE any other rows to be had beyond the current result set, so if there is another faster way to do this that would be fine too.
It's best to do this by counting the rows:
SELECT count(*) AS count FROM my_table WHERE date > '2020-12-12'
That tells you how many total rows match the condition. Then you can compare that to the size of the result you got with your query using LIMIT. It's just arithmetic.
Past versions of MySQL had a function FOUND_ROWS() which would report how many rows would have matched if you didn't use LIMIT. But it turns out this had worse performance than running two queries, one to count rows and one to do your limit. So they deprecated this feature.
For details read:
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
https://dev.mysql.com/worklog/task/?id=12615
(You probably want OFFSET 0, not 1.)
It's simple to test whether there ARE more rows. Assuming you want 16 rows, use 1 more:
SELECT ... WHERE ... ORDER BY ... LIMIT 0,17
Then programmatically see whether it returned only 16 rows (no more available) or 17 (there ARE more).
Because it is piggybacking on the fetch you are already doing and not doing much extra work, it is very efficient.
The second 'page' would use LIMIT 16, 17; 3rd: LIMIT 32,17, etc. Each time, you are potentially getting and tossing an extra row.
I discuss this and other tricks where I point out the evils of OFFSET: Pagination
COUNT(x) checks x for being NOT NULL. This is [usually] unnecessary. The pattern COUNT(*) (or COUNT(1)) simply counts rows; the * or 1 has no significance.
SELECT COUNT(*) FROM t is not free. It will actually do a full index scan, which is slow for a large table. WHERE and ORDER BY are likely to add to that slowness. LIMIT is useless since the result is always 1 row. (That is, the LIMIT is applied to the result, not to the counting.)

Order by / limit execution in SQL

Lots of thread already on web, just trying to understand some nuances which had me confused!
Quoting the doc reference
If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as
soon as it has found the first row_count rows of the sorted result,
rather than sorting the entire result. If ordering is done by using an
index, this is very fast.
and a SO thread
It will order first, then get the first 20. A database will also
process anything in the WHERE clause before ORDER BY.
Taking the same query from the question :
SELECT article
FROM table1
ORDER BY publish_date
LIMIT 20
lets say table has 2000 rows, of which query is expected to return 20 rows, now, looking at mysql ref ....stops sorting as soon as it has found the first row_count rows.... confuses me as i find it little ambiguous!!
Why does it say stops sorting? isn't the limit clause being applied on an already sorted data returned via order by clause ( assuming its a non-indexed column ) or is my understanding wrong and SQL is limiting first and then sorting!!??
The optimization mentioned in the documentation generally only works if there's an index on the publish_date column. The values are stored in the index in order, so the engine simply iterates through the index of the column, fetching the associated rows, until it has fetched 20 rows.
If the column isn't indexed, the engine will generally need to fetch all the rows, sort them, and then return the first 20 of these.
It's also useful to understand how this interacts with WHERE conditions. Suppose the query is:
SELECT article
FROM table1
WHERE last_read_date > '2018-11-01'
ORDER BY publish_date
LIMIT 20
If publish_date is indexed and last_read_date is not, it will scan the publish_date index in order, test the associated last_read_date against the condition, and add article to the result set if the test succeeds. When there are 20 rows in the result set it will stop and return it.
If last_read_date is indexed and publish_date is not, it will use the last_read_date index to find the subset of all the rows that meet the condition. It will then sort these rows using the publish_date column, and return the first 20 rows from that.
If neither column is indexed it will do a full table scan to test last_read_date, sort all the rows that match the condition, and return the first 20 rows of this.
MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result
This is actually a very sensible optimisation within mysql. If you use limit to return 20 rows and mysql knows it already found them, then why would mysql (or you) care how exactly the rest of the records are sorted? It does not matter, therefore mysql stops sorting the rest of the rows.
If the order by is done on an indexed column, then mysql can tell pretty quickly, if it found the top n records.

Retrieving only a fixed number of rows in MySQL

I am testing my database design under load and I need to retrieve only a fixed number of rows (5000)
I can specify a LIMIT to achieve this, however it seems that the query builds the result set of all rows that match and then returns only the number of rows specified in the limit. Is that how it is implemented?
Is there a for MySQL to read one row, read another one and basically stop when it retrieves the 5000th matching row?
MySQL is smart in that if you specify a LIMIT 5000 in your query, and it is possible to produce that result without generating the whole result set first, then it will not build the whole result.
For instance, the following query:
SELECT * FROM table ORDER BY column LIMIT 5000
This query will need to scan the whole table unless there is an index on column, in which case it does the smart thing and uses the index to find the rows with the smallest column.
SELECT * FROM `your_table` LIMIT 0, 5000
This will display the first 5000 results from the database.
SELECT * FROM `your_table` LIMIT 1001, 5000
This will show records from 1001 to 6000 (counting from 0).
Complexity of such query is O(LIMIT) (unless you specify order by).
It means that if 10000000 rows will match your query, and you specify limit equal to 5000, then the complexity will be O(5000).
#Jarosław Gomułka is right
If you use LIMIT with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. In either case, after the initial rows have been found, there is no need to sort any remainder of the result set, and MySQL does not do so.
if the set is not sorted it terminates the SELECT operation as soon as it's got enough rows to the result set.
The exact plan the query optimizer uses depends on your query (what fields are being selected, the LIMIT amount and whether there is an ORDER BY) and your table (keys, indexes, and number of rows in the table). Selecting an unindexed column and/or ordering by a non-key column is going to produce a different execution plan than selecting a column and ordering by the primary key column. The later will not even touch the table, and only process the number of rows specified in your LIMIT.
Each database defines its own way of limiting the result set size depends on the database you are using.
While the SQL:2008 specification defines a standard syntax for limiting a SQL query, MySQL 8 does not support it.
Therefore, on MySQL, you need to use the LIMIT clause to restrict the result set to the Top-N records:
SELECT
title
FROM
post
ORDER BY
id DESC
LIMIT 50
Notice that we are using an ORDER BY clause since, otherwise, there is no guarantee which are the first records to be included in the returning result set.

How does 'LIMIT' parameter work in sql?

I have 4000 rows for example, and I define X limit.
The query stops after it finds X rows? or the query finds all the rows and then takes X rows from the found rows?
Thank you.
From MySQL Reference Manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as
soon as it has found the first row_count rows of the sorted result,
rather than sorting the entire result. If ordering is done by using an
index, this is very fast. If a filesort must be done, all rows that
match the query without the LIMIT clause must be selected, and most or
all of them must be sorted, before it can be ascertained that the
first row_count rows have been found. In either case, after the
initial rows have been found, there is no need to sort any remainder
of the result set, and MySQL does not do so.
So it looks like it's possible that the entire result set is known before the LIMIT is applied. But MySQL will try everything it can not to do so. And you can help it by providing useful indexes that match your queries.
EDIT: Furthermore, if the set is not sorted it terminates the SELECT operation as soon as it's streamed enough rows to the result set.
SELECT * FROM your_table LIMIT 0, 10
This will display the first 10 results from the database.
SELECT * FROM your_table LIMIT 5, 5
This will show records 6, 7, 8, 9, and 10
It's like telling MySql; I want you to start counting from 5+1 or the 6th record, but Select only upto 5 records
I'm assuming you're thinking about MySQL, in which according to the documentation, the answer is it depends. If you're using a LIMIT (without a HAVING), then:
If you are selecting only a few rows with LIMIT, MySQL uses indexes
in some cases when normally it would prefer to do a full table scan.
As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.
There are a few other cases which you should read about in the documentation.
Introduction to MySQL LIMIT clause
The following illustrates the LIMIT clause syntax with two arguments:
SELECT
select_list
FROM
table_name
LIMIT [offset,] row_count;
The offset specifies the offset of the first row to return. The offset of the first row is 0, not 1.
The row_count specifies the maximum number of rows to return.
The following picture illustrates the LIMIT clause:
Therefore, these two clauses are equivalent:
> LIMIT row_count;
> LIMIT 0 , row_count;
The following picture illustrates the evaluation order of the LIMIT clause in the SELECT statement:
It stops after it found the number of rows specified in the LIMIT clause. This can be verified with a large amount of data. It retrieves the result in a time that is not possible if it is getting all the rows of the table and filtering after that.
If you are using MS SQL Server, then you can write it as given below.
Select TOP [x]
*
From MyTable
Hope it helps.
Vamyip

How to get ALL rows starting from row x in MySQL

In MySQL, how can I retrieve ALL rows in a table, starting from row X? For example, starting from row 6:
LIMIT 5,0
This returns nothing, so I tried this:
LIMIT 5,ALL
Still no results (sql error).
I'm not looking for pagination functionality, just retrieving all rows starting from a particular row. LIMIT 5,2000 seems like overkill to me. Somehow Google doesn't seem to get me some answers. Hope you can help.
Thanks
According to the documentation:
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95, 18446744073709551615;
This is the maximum rows a MyISAM table can hold, 2^64-1.
There is a limit of 2^32 (~4.295E+09) rows in a MyISAM table. If you build MySQL with the --with-big-tables option, the row limitation is increased to (2^32)^2 (1.844E+19) rows. See Section 2.16.2, “Typical configure Options”. Binary distributions for Unix and Linux are built with this option.
If you're looking to get the last x number of rows, the easiest thing to do is SORT DESC and LIMIT to the first x rows. Granted, the SORT will slow your query down. But if you're opposed to setting an arbitrarily large number as the second LIMIT arg, then that's the way to do it.
The only solution I am aware of currently is to do as you say and give a ridiculously high number as the second argument to LIMIT. I do not believe there is any difference in performance to specifying a low number or a high number, mysql will simply stop returning rows at the end of the result set, or when it hits your limit.
I think you don't need to enter max value for select all by LIMIT. It is enough to find count of table and then use it as max LIMIT.
The next query should work too, and is in my opinion more effective...
SELECT * FROM mytbl WHERE id != 1 ORDER BY id asc
By ordering the query will find the id imediately and skip this one, so the next rows he won't check anymore whether the id = 1.