Multiple, unknown number of fields passed into a query - couchbase

Is it possible to create a generic query that would work for different types of documents? For example, I have "cases" and "factories".
They have different sets of fields, e.g.:
{
id: 'case_o1',
name: 'Case numero uno',
amount: 40
}
{
id: 'factory_002',
location: 'Venezuela',
workers: 200,
operating: true
}
Is it possible to create a generic query where I would pass the type of an entity (case or factory) and additional parameters and it would filter results based on those?
I could of course use a JavaScript view, but it doesn't allow me to filter by multiple fields. Let's say I want to fetch all factories located in Venezuela, with the number of workers between 20 and 55.
I started with this, but then I got stuck:
select * from `mybucket` as entity
where position(meta(entity).id, $entity_type) == 0
How do I pass multiple predicates and have the query recognize them?
I can of course list fields like this:
where position(meta(entity).id, $entity_type) == 0
and entity.location == 'Venezuela'
and entity.workers > $workers_min
and entity.workers < $workers_max
but then
I'm gonna have to create a separate query for each entity
And even then it won't solve my problem - I have no idea how to ignore predicates, what if next time $workers_min and $workers_max are not passed, does it mean I have to create a query for every single predicate (column)?
For security reasons I cannot generate free-form queries and pass them to Couchbase server, all the queries are already stored in the database, our api just picks them up out of a document and executes them
I think it's possible to create a query that would be "short-circuiting" for args that are undefined (e.g. WHERE $location IS MISSING OR entity.location == $location or something like that)
Is it possible at all to create a query that would be able to effectively filter and order a dataset based on arbitrary parameters? Or there's no way?

@Agzam, sorry, I was writing my comment when you said it. Anyway, what you are asking for is possible by using coalesces in not-too-complex expressions, but it is a REALLY bad idea because it will defeat most of the database's internal optimizations, including the use of any existing index. So unless you are dealing with a relatively small database (and you are sure it will stay roughly the same size), I suggest you try a different approach. This is, in fact, the reason I implemented sqlapi.
If you need to have all queries stored in the database beforehand, it would probably be much better to sort the given arguments by name and precalculate and store a query for each possible combination.
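A minimal sketch of that precalculation idea, in Python. The predicate map and parameter names are assumptions taken from the question's example, not a real API, and the generated strings would still need to be stored in and fetched from the database by the existing API layer:

```python
from itertools import combinations

# Hypothetical optional filters: parameter name -> N1QL predicate that uses it.
OPTIONAL_PREDICATES = {
    "location": "entity.location = $location",
    "workers_max": "entity.workers < $workers_max",
    "workers_min": "entity.workers > $workers_min",
}

BASE_QUERY = (
    "SELECT * FROM `mybucket` AS entity "
    "WHERE POSITION(META(entity).id, $entity_type) = 0"
)


def query_key(param_names):
    """Canonical lookup key: the supplied parameter names, sorted and joined."""
    return ",".join(sorted(param_names))


def precompute_queries():
    """Build one query string per combination of optional parameters."""
    queries = {}
    names = sorted(OPTIONAL_PREDICATES)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            clauses = [OPTIONAL_PREDICATES[name] for name in combo]
            queries[query_key(combo)] = " AND ".join([BASE_QUERY] + clauses)
    return queries


STORED_QUERIES = precompute_queries()

# At request time, pick the stored query matching the names actually supplied.
supplied = {"workers_min": 20, "location": "Venezuela"}
selected = STORED_QUERIES[query_key(supplied)]
```

The number of stored queries doubles with each optional parameter (8 for the three here), so this only stays manageable for a small set of filters per entity type.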

You can do it by assigning a default value to the variable when it is not used. For instance, if $location is not used you can set it to -1 as the default value.
Then the where condition would be:
WHERE ($location=-1 OR entity.location = $location)
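The default-value pattern can be tried out with a small sketch. This one uses Python with an in-memory SQLite table as a stand-in for the Couchbase bucket (an assumption made only so the example runs anywhere); the table layout, sample rows, and the find_factories helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE factories (id TEXT, location TEXT, workers INTEGER)")
conn.executemany(
    "INSERT INTO factories VALUES (?, ?, ?)",
    [
        ("factory_001", "Venezuela", 40),
        ("factory_002", "Venezuela", 200),
        ("factory_003", "Chile", 30),
    ],
)

# Each optional filter is switched off when its parameter holds the sentinel -1.
QUERY = """
    SELECT id FROM factories
    WHERE (:location = -1 OR location = :location)
      AND (:workers_min = -1 OR workers > :workers_min)
      AND (:workers_max = -1 OR workers < :workers_max)
    ORDER BY id
"""


def find_factories(location=-1, workers_min=-1, workers_max=-1):
    """Run the single stored query; omitted arguments fall back to -1."""
    params = {
        "location": location,
        "workers_min": workers_min,
        "workers_max": workers_max,
    }
    return [row[0] for row in conn.execute(QUERY, params)]


everything = find_factories()
filtered = find_factories(location="Venezuela", workers_min=20, workers_max=55)
```

The sentinel has to be a value that can never be a legitimate filter value for that field, and the performance caveat about indexes raised in the other answer still applies.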

Related

Can you construct an ActiveRecord scope with a variable query string?

Setup:
I'm using Ruby on Rails with ActiveRecord and MySQL.
I have a Coupon model.
It has an attribute called query, it is a string which could be run with a where.
For example:
@coupon.query
=> "'http://localhost:3003/hats' = :url OR 'http://localhost:3003/shoes' = :url"
If I were to run this query it would either pass or fail based on the :url value I pass in.
# passes
Coupon.where(@coupon.query, url: 'http://localhost:3003/hats')
Coupon.where(@coupon.query, url: 'http://localhost:3003/shoes')
# fails
Coupon.where(@coupon.query, url: 'http://localhost:3003/some_other_url')
This query varies between Coupon models, but it will always be compared to the current url.
I need a way to say: Given an ActiveRecord collection @coupons, only keep coupons with queries that pass.
The structure of the where is always the same, but the query changes.
Is there any way to do this without a loop? I could potentially have a lot of coupons and I am hoping to do this in an ActiveRecord scope. Something like this?
@coupons.where(self.query, url: @url)
Perhaps I need to write a user defined function in my database?
Using multiple variables in a query is easy, but where the thing you are comparing your variable to is also a variable, that has me stumped. Any suggestions are much appreciated.
I would agree with Les Nightingill's comment that this looks like something that should probably be solved at a more architectural level. I'd imagine an easy refactoring to extract a new CouponQuery model that's a 1:n table containing multiple entries for a coupon_id for each query url that should pass. Then you could use a simple join like
Coupon.joins(:coupon_queries).where(coupon_queries: { url: my_url })
If adding a new table is not an option, and if you're running on a newer MySQL version (>= 5.7), you could consider transforming the query column (or adding a new json_query column) into a MySQL JSON field and using the new JSON_CONTAINS query.
If from the user-side they should be able to manage the queries as a plain text field, you could use a before_save hook on your model to translate this into the separate table structure or JSON format respectively.
But if neither is an option for you and you really need to stick with the query column that stores a plain string, then you could use a LIKE query to match the sub-string 'your-url' = :url:
Coupon.where('query LIKE "%? = :url%"', my_url)
which, if you e.g. pass 'http://localhost:3003/hats' as my_url, would generate something like this SQL query:
SELECT `coupons`.* FROM `coupons`
WHERE (query LIKE "%'http://localhost:3003/hats' = :url%")
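To illustrate the separate-table refactoring suggested earlier in this answer, here is a minimal sketch in Python with an in-memory SQLite database standing in for Rails and MySQL. The coupon_queries table name follows the answer; the column layout, sample rows, and the coupons_for_url helper are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE coupons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE coupon_queries (
        coupon_id INTEGER REFERENCES coupons(id),
        url TEXT
    );
    INSERT INTO coupons VALUES (1, 'hats-and-shoes'), (2, 'bags-only');
    INSERT INTO coupon_queries VALUES
        (1, 'http://localhost:3003/hats'),
        (1, 'http://localhost:3003/shoes'),
        (2, 'http://localhost:3003/bags');
    """
)


def coupons_for_url(url):
    """Return the names of coupons that have a row for the given URL."""
    rows = conn.execute(
        """
        SELECT DISTINCT coupons.name
        FROM coupons
        JOIN coupon_queries ON coupon_queries.coupon_id = coupons.id
        WHERE coupon_queries.url = ?
        ORDER BY coupons.name
        """,
        (url,),
    )
    return [row[0] for row in rows]


matching = coupons_for_url("http://localhost:3003/hats")
```

Because each allowed URL is an ordinary row, the filter becomes a plain indexed join with no per-coupon loop.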

Store results of expensive function calls in a MySQL table

Let's suppose I have a set of integers of a variable length. I apply a function on this set of integers and I obtain a result.
myFunction(setOfIntegers) => myResult
Let's suppose a call to myFunction is very expensive and I would like to somehow store the results of this function calls.
In my application I am already using MySQL and what I was thinking was to somehow create a table with the setOfIntegers as a PK and myResult as an additional field.
I was thinking that I could do this by transforming the setOfIntegers to a string before storing it in the DB.
Can this be done in any other way? Or would there be a better way to store results of such function calls in order to avoid calling them a 2nd time with the same set of integers?
I don't know about Java, but Perl has my $str = join(',', @array) and PHP has $str = implode(',', $array). Then the string $str could be used as the PRIMARY KEY (assuming it is not too long). And the result would go in the other column.
Your app code (in Java) would need to first do an implode and SELECT to see if the function has already been evaluated for the given array. If not, then perform the function and end by INSERTing a new row.
If this will be multi-threaded, you could use INSERT IGNORE to deal with dups. (There are other solutions, too.)
Another note: If your set-of-integers is ordered, then what I describe is 'complete'. If it is unordered, then sort it before imploding. This will provide a canonical representation.
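A rough sketch of that flow, in Python with an in-memory SQLite table standing in for MySQL. The results table, the stand-in my_function (a sum of squares, purely hypothetical), and the helper names are assumptions; INSERT OR IGNORE is SQLite's counterpart to MySQL's INSERT IGNORE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (int_set TEXT PRIMARY KEY, result INTEGER)")

call_count = 0  # counts real evaluations, to show the cache is working


def my_function(integers):
    """Stand-in for the expensive function (hypothetical: sum of squares)."""
    global call_count
    call_count += 1
    return sum(n * n for n in integers)


def canonical_key(integers):
    """Sort before joining so the same set always yields the same string."""
    return ",".join(str(n) for n in sorted(integers))


def cached_result(integers):
    """Look the set up first; evaluate and store only on a miss."""
    key = canonical_key(integers)
    row = conn.execute(
        "SELECT result FROM results WHERE int_set = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    result = my_function(integers)
    # A concurrent writer may have inserted the same key; ignore the duplicate.
    conn.execute("INSERT OR IGNORE INTO results VALUES (?, ?)", (key, result))
    return result


first = cached_result([3, 1, 2])
second = cached_result([2, 3, 1])  # same set, different order: served from the table
```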
If the function can be implemented in MySQL directly, I would suggest using Views.
https://www.mysqltutorial.org/mysql-views-tutorial.aspx/

Why does MySQL permit non-exact matches in SELECT queries?

Here's the story. I'm doing some security testing (using zaproxy) of a Laravel (PHP framework) application running with a MySQL database as the primary store for data.
Zaproxy is reporting a possible SQL injection for a POST request URL with the following payload:
id[]=3-2&enabled[]=on
Basically, it's an AJAX request to turn on/turn off a particular feature in a list. Zaproxy is fuzzing the request: where the id value is 3-2, there should be an integer - the id of the item to update.
The problem is that this request is working. It should fail, but the code is actually updating the item where id = 3.
I'm doing things the way I'm supposed to: the model is retrieved using Eloquent's Model::find($id) method, passing in the id value from the request (which, after a bit of investigation, was determined to be the string "3-2"). AFAIK, the Eloquent library should be executing the query by binding the ID value to a parameter.
I tried executing the query using Laravel's DB class with the following code:
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3-2"));
and got the row for id = 3.
Then I tried executing the following query against my MySQL database:
SELECT * FROM table WHERE id='3-2';
and it did retrieve the row where id = 3. I also tried it with another value: "3abc". It looks like any value prefixed with a number will retrieve a row.
So ultimately, this appears to be a problem with MySQL. As far as I'm concerned, if I ask for a row where id = '3-2' and there is no row with that exact ID value, then I want it to return an empty set of results.
I have two questions:
Is there a way to change this behaviour? It appears to be at the level of the database server, so is there anything in the database server configuration to prevent this kind of thing?
This looks like a serious security issue to me. Zaproxy is able to inject some arbitrary value and make changes to my database. Admittedly, this is a fairly minor issue for my application, and the (probably) only values that would work will be values prefixed with a number, but still...
SELECT * FROM table WHERE id = ? AND ? REGEXP "^[0-9]+$";
This will be faster than what I suggested in the comments above.
Edit: Ah, I see you can't change the query. Then it is confirmed, you must sanitize the inputs in code. Another very poor and dirty option, if you are in an odd situation where you can't change query but can change database, is to change the id field to [VAR]CHAR.
I believe this is due to MySQL automatically converting your strings into numbers when comparing against a numeric data type.
https://dev.mysql.com/doc/refman/5.1/en/type-conversion.html
mysql> SELECT 1 > '6x';
-> 0
mysql> SELECT 7 > '6x';
-> 1
mysql> SELECT 0 > 'x6';
-> 0
mysql> SELECT 0 = 'x6';
-> 1
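To make the conversion rule concrete, here is a small Python sketch that approximates it: keep the longest leading numeric prefix of the string, or use 0 if there is none. This is an illustration of the documented behaviour, not MySQL's actual implementation, and the function name is made up:

```python
import re

# Optional whitespace and sign, digits with an optional fraction, optional exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def mysql_like_number(text):
    """Approximate how a string is turned into a number for a numeric comparison."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(0)) if match else 0.0


# '3-2' and '3abc' both collapse to 3, which is why WHERE id = '3-2' finds id 3.
from_fuzzed_id = mysql_like_number("3-2")
from_suffixed_id = mysql_like_number("3abc")
```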
You really just want to put armor around MySQL to prevent such a string from being compared. Maybe switch to a different SQL server.
Without rewriting a bunch of code, in all honesty the correct answer is:
This is a non-issue
Zaproxy even states that it's possibly a SQL injection attack, meaning that it does not know! It never said "umm yeah we deleted tables by passing x-y-and-z to your query"
// if this is legal and returns results
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3"));
// then why is it an issue for this
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3-2"));
// to be interpreted as
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3"));
You are parameterizing your queries so Zaproxy is off its rocker.
Here's what I wound up doing:
First, I suspect that my expectations were a little unreasonable. I was expecting that if I used parameterized queries, I wouldn't need to sanitize my inputs. This is clearly not the case. While parameterized queries eliminate some of the most pernicious SQL injection attacks, this example shows that there is still a need to examine your inputs and make sure you're getting the right stuff from the user.
So, with that said... I decided to write some code to make checking ID values easier. I added the following trait to my application:
trait IDValidationTrait
{
    /**
     * Check the ID value to see if it's valid
     *
     * This is an abstract function because it will be defined differently
     * for different models. Some models have IDs which are strings,
     * others have integer IDs
     */
    abstract public static function isValidID($id);

    /**
     * Check the ID value & fail (throw an exception) if it is not valid
     */
    public static function validIDOrFail($id)
    {
        ...
    }

    /**
     * Find a model only if the ID matches EXACTLY
     */
    public static function findExactID($id)
    {
        ...
    }

    /**
     * Find a model only if the ID matches EXACTLY or throw an exception
     */
    public static function findExactIDOrFail($id)
    {
        ...
    }
}
Thus, whenever I would normally use the find() method on my model class to retrieve a model, instead I use either findExactID() or findExactIDOrFail(), depending on how I want to handle the error.
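The method bodies are elided above, so the following is not the trait's code. It is only a minimal Python sketch of the kind of exact-match check such a method could perform for integer IDs, with hypothetical function names:

```python
import re

# Only plain ASCII digits count; "3-2", "3abc", " 3" and "" are all rejected.
_INTEGER_ID = re.compile(r"[0-9]+")


def is_valid_integer_id(value):
    """True only when the whole string is a run of digits."""
    return isinstance(value, str) and _INTEGER_ID.fullmatch(value) is not None


def valid_id_or_fail(value):
    """Return the ID as an int, or raise before any query is run."""
    if not is_valid_integer_id(value):
        raise ValueError(f"invalid id: {value!r}")
    return int(value)


accepted = valid_id_or_fail("3")
```

Rejecting the input before it reaches the database avoids relying on how the server coerces strings to numbers.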
Thank you to everyone who commented - you helped me to focus my thinking and to understand better what was going on.

Use MySQL Stored Procedure to check for malicious code

I'm attempting to write a stored procedure in MySQL that will take a single parameter, and then check that parameter for any text that contains 'DROP','INSERT','UPDATE','TRUNCATE', etc., pretty much anything that isn't a SELECT statement. I know it's not ideal, but, unfortunately, the SELECT statement is being built client-side, and to prevent some kind of man-in-the-middle change, it's just an added level of security from the server.
I've tried several ways of accomplishing it, but it's not working for me. I've come up with things similar to this:
CREATE PROCEDURE `myDatabase`.`execQuery` (in INC_query text)
BEGIN
#check to see if the incoming SQL query contains INSERT, DROP, TRUNCATE,
#or UPDATE as an added measure of security
IF (
SELECT LOCATE(LOWER(INC_query),'drop') OR
SELECT LOCATE(LOWER(INC_query),'truncate') OR
SELECT LOCATE(LOWER(INC_query),'insert') OR
SELECT LOCATE(LOWER(INC_query),'update') OR
SELECT LOCATE(LOWER(INC_query),'set')
>= 1)
THEN
SET @command = INC_query;
PREPARE statement FROM @command;
EXECUTE statement;
ELSE
SELECT * FROM database.otherTable; #just a generic output to know the procedure executed correctly, and will be removed later. Purely testing.
END IF;
END
Even if it contains any of my "filterable" words, it still executes the query. Any help would be appreciated, or if there's a better way of doing something, I'm all ears.
What if you have a column called updated_at or settings? You can't possibly expect this to work as you intend. This kind of technique is the reason there's so many references to clbuttic on the web.
You're really going to make a mess of things if you go down this road.
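A quick Python sketch of why substring matching on keywords misfires in both directions. The word list mirrors the one in the question; the function name and sample queries are made up for illustration:

```python
BLOCKED_WORDS = ("drop", "truncate", "insert", "update", "set")


def naive_is_blocked(query):
    """Substring check in the spirit of the stored procedure in the question."""
    lowered = query.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


# False positive: a harmless SELECT touching updated_at and user_settings.
harmless = "SELECT id, updated_at FROM user_settings"
harmless_blocked = naive_is_blocked(harmless)

# False negative: DELETE is not on the list, so it sails through.
destructive = "DELETE FROM users"
destructive_blocked = naive_is_blocked(destructive)
```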
The only reasonable way to approach this is to send in the parameters for the kind of query you want to construct, then construct the query in your application using a vetted white list of allowed terms. An example expressed in JSON:
{
  "select" : {
    "table" : "users",
    "columns" : [ "id", "name", "DROP TABLE users", "SUM(date)", "password_hash" ],
    "joins" : {
      "orders" : [ "users.id", "orders.user_id" ]
    }
  }
}
You just need to create a query constructor that emits this kind of thing, and another that converts it back into a valid query. You might want to list only particular columns for querying, as certain columns might be secret or internal only, not to be disclosed, like password_hash in this example.
You could also allow for patterns like (SUM|MIN|MAX|AVG)\((\w+)\) to capture specific grouping operations or JOIN conditions. It depends on how far you want to take this.
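A minimal sketch of such a query constructor in Python, handling only the table and columns parts of the structure (joins are left out to keep it short). The whitelist contents and the build_select name are assumptions; the aggregate pattern is the one mentioned above:

```python
import re

# Hypothetical whitelist: table name -> columns that may be exposed.
ALLOWED_COLUMNS = {"users": {"id", "name", "date"}}
AGGREGATE = re.compile(r"(SUM|MIN|MAX|AVG)\((\w+)\)")


def build_select(spec):
    """Turn a request structure into SQL, keeping only whitelisted terms."""
    select = spec["select"]
    table = select["table"]
    if table not in ALLOWED_COLUMNS:
        raise ValueError(f"table not allowed: {table!r}")
    accepted, rejected = [], []
    for column in select["columns"]:
        aggregate = AGGREGATE.fullmatch(column)
        # For SUM(date) and friends, the inner column name is what gets vetted.
        name = aggregate.group(2) if aggregate else column
        if name in ALLOWED_COLUMNS[table]:
            accepted.append(column)
        else:
            rejected.append(column)
    if not accepted:
        raise ValueError("no permitted columns requested")
    return f"SELECT {', '.join(accepted)} FROM {table}", rejected


SPEC = {
    "select": {
        "table": "users",
        "columns": ["id", "name", "DROP TABLE users", "SUM(date)", "password_hash"],
    }
}
sql, rejected = build_select(SPEC)
```

Only strings that exactly match a whitelisted name or a whitelisted aggregate pattern ever reach the SQL text, so there is nothing to scan for forbidden keywords. A stricter variant could raise as soon as anything lands in the rejected list instead of dropping it silently.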

SQL select everything with arbitrary IN clause

This will sound silly, but trust me it is for a good (i.e. over-engineered) cause.
Is it possible to write a SQL query using an IN clause which selects everything in that table without knowing anything about the table? Keep in mind this would mean you can't use a subquery that references the table.
In other words I would like to find a statement to replace "SOMETHING" in the following query:
SELECT * FROM table_a WHERE table_a.id IN (SOMETHING)
so that the results are identical to:
SELECT * FROM table_a
by doing nothing beyond changing the value of "SOMETHING"
To satisfy the curious I'll share the reason for the question.
1) I have a FactoryObject abstract class which grants all models that extend it some glorious factory method magic using two template methods: getData() and load()
2) Models must implement the template methods. getData is a static method that accepts ID constraints, pulls rows from the database, and returns a set of associative arrays. load is not static, accepts an associative array, and populates the object based on that array.
3) The non-abstract part of FactoryObject implements a getObject() and a getObjects() method. These call getData, create objects, and loads() the array responses from getData to create and return populated objects.
getObjects() requires ID constraints as an input, either in the form of a list or in the form of a subquery, which are then passed to getData(). I wanted to make it possible to pass in no ID constraints to get all objects.
The problem is that only the models know about their tables. getObjects() is implemented at a higher level and so it doesn't know what to pass getData(), unless there was a universal "return everything" clause for IN.
There are other solutions. I can modify the API to require getData to accept a special parameter and return everything, or I can implement a static getAll[ModelName]s() method at the model level which calls:
static function getAllModelObjects() {
return getObjects("select [model].id from [model]");
}
This is reasonable and may fit the architecture anyway, but I was curious so I thought I would ask!
Works on SQL Server:
SELECT * FROM table_a WHERE table_a.id IN (table_a.id)
Okay, I hate saying no so I had to come up with another solution for you.
Since mysql is opensource you can get the source and incorporate a new feature that understands the infinity symbol. Then you just need to get the mysql community to buy into the usefulness of this feature (steer the conversation away from security as much as possible in your attempts to do so), and then get your company to upgrade their dbms to the new version once this feature has been implemented.
Problem solved.
The answer is simple. The workaround is to add some criteria like these:
# to query on a number column
AND (-1 in (-1) OR sample_table.sample_column in (-1))
# or to query on a string column
AND ('%' in ('%') OR sample_table.sample_column in ('%'))
Therefore, in your example, the following two queries should return the same result as long as you pass -1 as the parameter value.
SELECT * FROM table_a;
SELECT * FROM table_a WHERE (-1 in (-1) OR table_a.id in (-1));
And whenever you want to filter the results, you can pass the wanted values as the parameter. For example, the following query returns only the records with an id of 1, 2 or 6.
SELECT * FROM table_a WHERE (-1 in (1, 2, 6) OR table_a.id in (1, 2, 6));
In this case, we have a default value like -1 or % and we have a parameter that can be anything. If the parameter is the default value, nothing is filtered.
I suggest the % character as the default value if you are querying over a text column, or -1 if you are querying over the PK of the table. But it is entirely up to you to substitute % or -1 with any reserved character or number that you decide on.
Similar to @brandonmoore:
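A small runnable sketch of this workaround, using Python with an in-memory SQLite table (the database choice, sample data, and the select_ids helper are assumptions; the WHERE clause is the one shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table_a VALUES (?)", [(n,) for n in range(1, 8)])


def select_ids(id_filter=(-1,)):
    """Select by ID list; the default (-1,) acts as 'no filter'."""
    marks = ", ".join("?" for _ in id_filter)
    sql = (
        f"SELECT id FROM table_a "
        f"WHERE (-1 IN ({marks}) OR id IN ({marks})) ORDER BY id"
    )
    # The same list is bound twice: once for the sentinel test, once for the filter.
    params = list(id_filter) * 2
    return [row[0] for row in conn.execute(sql, params)]


all_rows = select_ids()
some_rows = select_ids((1, 2, 6))
```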
select * from table_a where table_a.id not in ('0')
How about:
select * from table_a where table_a.id not in ('somevaluethatwouldneverpossiblyexistintable_a.id')
EDIT:
As much as I would like to continue thinking of a way to solve your problem, I know there isn't a way to so I figure I'll go ahead and be the first person to tell you so I can at least get credit for the answer. It's truly a bittersweet victory though :/
If you provide more info though maybe I or someone else can help you think of another workaround.