Is there a way to customize the Drill execution plan?

We want to execute a query using Drill with the PostGIS storage plugin. The query is:
SELECT zone, count(primary_roads.id) as roads
FROM pg.test.zones, pg.test.primary_roads
WHERE ST_Crosses(geom_linestring, geom)
AND zone IN ('Astoria Park', 'Red Hook', 'Douglaston')
GROUP BY zone
ORDER BY roads desc;
After adding logging to Drill, we see that Drill actually splits the query into two parts:
SELECT *
FROM "test"."primary_roads"
And
SELECT *
FROM "test"."zones"
WHERE "zone" = 'Astoria Park' OR "zone" = 'Red Hook' OR "zone" = 'Douglaston'
As you can see, neither the ST_Crosses function nor the GROUP BY and ORDER BY clauses are included.
So, is there a way to pass the entire query to PostGIS and avoid the split?

Internally, Drill's planner has decided that this is the best way to run the query, and unfortunately there is no way to explicitly tell Drill not to do this. To make the planner behave differently and push down the ST_Crosses function along with the GROUP BY and ORDER BY, we would have to make some code changes. If you'd like to see this optimization done in Drill, please get in touch with the Drill team on the mailing lists. You can find all the mailing list information here.

Related

MySQL ORDER BY FIELD equivalent in Redis search

How do I accomplish this custom sort by field feature available in MySQL in Redis search?
select * from product ORDER BY FIELD(id,3,2,1,4)
For some business reason, I need to enforce custom orders.
There is no equivalent of the FIELD function in RediSearch.
With FT.SEARCH / SORTBY you can sort results using a field.
In your hash, you may create a new field (NUMERIC SORTABLE) holding a value that will be used to sort the result. Of course, that works only if you don't need to specify a different order on each query.
Second option: this could be handled by using FT.AGGREGATE with an appropriate function. You may have a look at the existing functions and see if one could be used for that. If the function does not exist, you may file a feature request.
A third option is to implement your own scoring function using the extension API (but that may be over-engineering).
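To make the first option concrete, here is a minimal sketch in plain Python (no Redis client involved) of how the sort values could be derived from a desired order. The helper names and the idea of storing the rank in a field are illustrative assumptions, not part of RediSearch; `field_position` only mimics how MySQL's FIELD() behaves, including returning 0 for values that are not in the list.

```python
def field_position(value, ordering):
    """Mimic MySQL FIELD(): 1-based position of value in ordering, 0 if absent."""
    try:
        return ordering.index(value) + 1
    except ValueError:
        return 0


def rank_fields(ordering):
    """Map each id to the number one could store in a NUMERIC SORTABLE hash field."""
    return {value: position for position, value in enumerate(ordering, start=1)}


# Desired order from the question: ORDER BY FIELD(id,3,2,1,4)
custom_order = [3, 2, 1, 4]
products = [{"id": i} for i in (1, 2, 3, 4, 5)]

# Values to write into the (hypothetical) sortable field of each hash.
ranks = rank_fields(custom_order)

# The order a sort on that value would produce; ids missing from the list get 0
# and therefore sort first, as they would with FIELD() in ascending order.
sorted_ids = [
    p["id"]
    for p in sorted(products, key=lambda p: field_position(p["id"], custom_order))
]
```

Because the rank is stored with the document, this only fits the case where one fixed order is shared by all queries.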

Storing SQL queries in table and run them on-the-fly when selecting - Possible? Good practice?

I'm currently thinking about a database schema in MySQL where I store SELECT queries into a certain table column, just to execute them on-the-fly when getting selected, and having the result passed instead of the actual query.
Would this be possible somehow? Or may this be bad practice? Is it even technically possible to have a result table passed to a single field, at least so I could run the query through PDO to get back a nested result array? Are there any alternatives?
I've read that this may be achieved through stored procedures, and although I grasp the concept of those, I can't think of how I could use them to achieve that.
You could do this, but what purpose do you have for doing it?
I would suggest using views:
The syntax should be valid when the view is created, unlike storing the SQL in a field, which may have invalid syntax.
It's easier to debug and modify.
For example, let's say one of the queries you want to store is:
SELECT product_category, COUNT(*) AS category_count
FROM product
GROUP BY product_category;
You can create a new "view" object that defines this query:
CREATE VIEW prod_cat_count AS
SELECT product_category, COUNT(*) AS category_count
FROM product
GROUP BY product_category;
Now, the object called "prod_cat_count" is stored in the database. Internally, the database just knows that "prod_cat_count" is equal to the SELECT query we mentioned. When the view is created, the database validates the syntax (for example, it checks that all columns exist and that you haven't forgotten the GROUP BY).
Then, whenever you want to get this data/run this query, you can run this statement (in SQL or in application code, for example):
SELECT product_category, category_count
FROM prod_cat_count;
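For readers who want to try this end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server; the sample rows are invented). It shows that the view stores the query, not its result, so new rows show up the next time the view is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, product_category TEXT);
INSERT INTO product (product_category) VALUES ('books'), ('toys'), ('books');

CREATE VIEW prod_cat_count AS
SELECT product_category, COUNT(*) AS category_count
FROM product
GROUP BY product_category;
""")

# Application code only needs to know the view name.
query = (
    "SELECT product_category, category_count "
    "FROM prod_cat_count ORDER BY product_category"
)
rows_before = conn.execute(query).fetchall()

# The view is re-evaluated on every SELECT, so it reflects later inserts.
conn.execute("INSERT INTO product (product_category) VALUES ('toys')")
rows_after = conn.execute(query).fetchall()
```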
If you then decide you want to change the way the product categories are counted, you can adjust the view:
CREATE OR REPLACE VIEW prod_cat_count AS
SELECT product_category, COUNT(*) AS category_count
FROM product
GROUP BY product_category
ORDER BY product_category;
Hope that helps!

How do you edit an SQL query in Apache Superset?

I have set up Superset in my Jupyter notebook and it is working, i.e. the sample dashboards etc. work. When I try to create a simple table view that just does a SELECT * FROM Table to show the whole table (it is a small table), Superset keeps generating the SQL:
SELECT
FROM
(SELECT Country,
Region,
Users,
Emails
FROM `UserStats`
LIMIT 50000) AS expr_qry
LIMIT 50000
The first SELECT FROM and the AS expr_qry LIMIT 50000 are automatically generated and I cannot get rid of them (i.e. in the Slice view it shows this as the query, but won't let me edit it). Why does it generate its own SQL and where do you change this?
I tried to find workarounds for this but I feel I am missing something fundamental here.
There is a ROW_LIMIT setting in superset/config.py (the default is 50000). If you need to remove the LIMIT clause, look at the query_obj function in superset/viz.py.
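As a rough illustration of the ROW_LIMIT setting: a minimal sketch, assuming the override is placed in a superset_config.py module on the PYTHONPATH instead of editing superset/config.py directly. The value 1000 is arbitrary. This should only change the limit value; it does not remove the wrapping query.

```python
# superset_config.py (assumed override module, picked up from PYTHONPATH)

# Default row limit applied to generated queries; 50000 if left unset.
ROW_LIMIT = 1000
```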

Couchbase view on multiple columns with WHERE and ORDER BY clause as in SQL

I am new to the Couchbase NoSQL database.
I am trying to create a view that returns the same result as the SQL query below.
SELECT * FROM Employee e WHERE e.name = "DESM%" AND e.salary < 1000 ORDER BY e.id desc
Any suggestions are much appreciated.
If you look at the beer sample data that ships with Couchbase, you will find views defined there. You can run a view in the admin console. Notice that when you run a view you can provide filtering criteria and a sort order for the result, which might be an equivalent of your SQL-like functionality. Read more on Views and Indexes.
Another option is to use Couchbase v3, which comes with its own query language, N1QL, as an alternative. You can try it out online here.

How to restrict/obfuscate MySQL value when querying

So I'm building a bit of an API where users can query my database with read-only access. However, I want to block certain fields, specifically IP addresses. I'm currently using preg_replace in PHP to match and switch out IPs, but I feel like someone could get around that with some clever string-splitting MySQL functions.
Is there a way I can block/replace/obfuscate this particular field for this read-only MySQL user?
The record would be at (table.field):
`TrafficIp`.`Value`
An example query they might use would be
SELECT COUNT(*) Hits, Value IpAddress
FROM TrafficIp
INNER JOIN Traffic
ON Traffic.IpId = TrafficIp.Id
GROUP BY Value
ORDER BY Hits DESC
How would I bait and switch?
You could create a view of your table that omits the field with the IP address, and let API users query that view, but not the underlying table.
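A minimal sketch of the view approach, again using Python's sqlite3 module as a stand-in for MySQL so it runs anywhere. The view name TrafficIpPublic and the sample addresses are hypothetical. SQLite has no user permissions, so the step of granting the read-only user access to the view but not to TrafficIp is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrafficIp (Id INTEGER PRIMARY KEY, Value TEXT);
CREATE TABLE Traffic (Id INTEGER PRIMARY KEY, IpId INTEGER);
INSERT INTO TrafficIp (Id, Value) VALUES (1, '203.0.113.7'), (2, '198.51.100.23');
INSERT INTO Traffic (IpId) VALUES (1), (1), (2);

-- Hypothetical view: exposes the key but leaves out the address column.
CREATE VIEW TrafficIpPublic AS
SELECT Id FROM TrafficIp;
""")

# The example query from the question, rewritten against the view.
hits = conn.execute("""
SELECT COUNT(*) AS Hits, p.Id AS IpId
FROM TrafficIpPublic AS p
INNER JOIN Traffic AS t ON t.IpId = p.Id
GROUP BY p.Id
ORDER BY Hits DESC
""").fetchall()

# Columns visible through the view: the Value column is simply not there.
cursor = conn.execute("SELECT * FROM TrafficIpPublic")
view_columns = [column[0] for column in cursor.description]
```

Users can still count hits per address, but no string function can recover a column the view never selects.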
Really, instead of trying to do "damage control" on the back end of the query, your API should be filtering the queries before they ever make it to the database. It is highly inadvisable to just pass through raw SQL queries from the outside world, into your database.
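One way to filter at the API layer is to let clients choose from a fixed set of named queries and never accept SQL text at all. The sketch below assumes that design; the catalog, the query name, and the function are all hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Hypothetical catalog: API clients pick a query by name and supply parameters.
ALLOWED_QUERIES = {
    "hits_per_ip_id": (
        "SELECT COUNT(*) AS Hits, IpId FROM Traffic "
        "GROUP BY IpId ORDER BY Hits DESC LIMIT ?"
    ),
}


def run_named_query(conn, name, params=()):
    """Run a query from the catalog; reject anything that is not listed."""
    try:
        sql = ALLOWED_QUERIES[name]
    except KeyError:
        raise ValueError(f"unknown query: {name!r}") from None
    # Parameters are bound by the driver, never spliced into the SQL string.
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Traffic (Id INTEGER PRIMARY KEY, IpId INTEGER);
INSERT INTO Traffic (IpId) VALUES (1), (1), (2);
""")

result = run_named_query(conn, "hits_per_ip_id", (10,))
```

The trade-off is flexibility: every new report needs a new catalog entry, but no client-supplied SQL ever reaches the database.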