Laravel model scope having sum larger than x - mysql

I have the following two models:
ModelA
- id
- population
ModelB
- id
- model_a_id
- population
Now I want to define a scope "not_full". Its aim is to return only the ModelA instances whose population is larger than the total population of the related ModelB rows. This brings me to the following scope function in ModelA:
public function scopeTesting(Builder $query): Builder
{
    return $query
        ->join('model_b', 'model_a.id', '=', 'model_b.model_a_id')
        ->groupBy('model_a.id')
        ->havingRaw('SUM(model_b.population) <= model_a.population');
}
As this becomes a MySQL query, we have to deal with the ONLY_FULL_GROUP_BY mode of MySQL 8. The query fails because Eloquent uses SELECT * and MySQL doesn't know which value of model_b.id it should return for each group. However, we don't care about model_b.id; we only care about the values in ModelA. This would normally be solved by changing the select to SELECT model_a.*, but that will probably cause bugs further down the road, as every query using this scope will then only be able to return the columns of ModelA. Is there any way to get around this issue in Eloquent?
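One possible direction, sketched here rather than offered as a tested Eloquent solution, is to replace the join and GROUP BY with a correlated subquery in the WHERE clause, so the outer query only ever selects from model_a. The sketch below runs that SQL against an in-memory SQLite database from Python instead of MySQL; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model_a (id INTEGER PRIMARY KEY, population INTEGER);
    CREATE TABLE model_b (id INTEGER PRIMARY KEY, model_a_id INTEGER, population INTEGER);
    -- Hypothetical sample data
    INSERT INTO model_a VALUES (1, 10), (2, 5), (3, 7);
    INSERT INTO model_b VALUES (1, 1, 4), (2, 1, 3), (3, 2, 6);
""")

# No JOIN and no GROUP BY: the sum is computed per model_a row in a subquery,
# so the select list is not constrained by ONLY_FULL_GROUP_BY-style rules.
rows = conn.execute("""
    SELECT a.id
    FROM model_a AS a
    WHERE (SELECT COALESCE(SUM(b.population), 0)
           FROM model_b AS b
           WHERE b.model_a_id = a.id) <= a.population
    ORDER BY a.id
""").fetchall()

not_full_ids = [row[0] for row in rows]
print(not_full_ids)
```

Note one behavioural difference from the join version: a model_a row without any model_b rows is kept here (its sum counts as 0), whereas the inner join in the scope above would drop it. In Eloquent the same condition could presumably be passed to whereRaw, but that is an assumption and has not been checked against MySQL.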

Related

Very slow query when using Eloquent whereNotIn

I have a Question model backed by a very large table of questions (600,000 records), with relations to the Customer, Answer and Product models. The relations are irrelevant to this question, but I mention them to clarify that I need to use Eloquent. When I call Question::with('customer')->get(); it runs smoothly and fast.
But there is another table in which I have the question_ids of all questions which should not be shown (for specific reasons).
I tried this code:
// omitted product ids, about 95,000 records
$question_ids_in_product = DB::table('question_to_product')
    ->pluck('product_id')->all();
$questions = Question::with('customer')
    ->whereNotIn('product_id', $question_ids_in_product)
    ->paginate($perPage);
It takes a very long time and fails with this error:
SQLSTATE[HY000]: General error: 1390 Prepared statement contains too many placeholders
and sometimes Fatal error: Maximum execution time of 30 seconds exceeded
When I run it as a plain SQL query:
SELECT * FROM questions LEFT JOIN customers USING (customer_id)
WHERE question_id NOT IN (SELECT question_id FROM question_to_product)
it takes only 80 milliseconds.
How can I use Eloquent in this situation?
You can use the whereRaw method:
$questions = Question::with('customer')
    ->whereRaw('question_id NOT IN (SELECT question_id FROM question_to_product)')
    ->paginate($perPage);
But ideally, as you found out, this is a better solution:
Question::with('customer')->whereNotIn('question_id',
    function ($query) {
        $query->from('question_to_product')->select('question_id');
    }
);
Difference?
If you migrate to another database system, the whereRaw version might stop working, because it contains raw SQL.
That is why we have the Eloquent ORM, which handles these transitions and builds the appropriate queries to run.
There is no performance impact, because the generated SQL is the same (for MySQL).
P.S: For better debugging try installing this debug bar
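To illustrate why the subquery form avoids the "too many placeholders" error, here is a small sketch that compares the two query shapes. It uses an in-memory SQLite database from Python as a stand-in for MySQL and Eloquent, with made-up sample rows; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE question_to_product (question_id INTEGER, product_id INTEGER);
""")
# Hypothetical sample data: ten questions, three of which must be hidden.
conn.executemany("INSERT INTO questions VALUES (?, ?)",
                 [(i, f"q{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO question_to_product VALUES (?, ?)",
                 [(i, 100 + i) for i in (2, 4, 6)])

# Variant 1: fetch the ids first, then bind one placeholder per id.
# With about 95,000 ids this is the shape that hits the placeholder limit.
excluded = [row[0] for row in
            conn.execute("SELECT question_id FROM question_to_product")]
placeholders = ", ".join("?" for _ in excluded)
sql_in_list = (
    "SELECT question_id FROM questions "
    f"WHERE question_id NOT IN ({placeholders}) ORDER BY question_id"
)
via_list = [row[0] for row in conn.execute(sql_in_list, excluded)]

# Variant 2: let the database resolve the ids in a subquery.
# The statement contains no placeholders, however large the table grows.
sql_subquery = (
    "SELECT question_id FROM questions "
    "WHERE question_id NOT IN (SELECT question_id FROM question_to_product) "
    "ORDER BY question_id"
)
via_subquery = [row[0] for row in conn.execute(sql_subquery)]

print(via_list == via_subquery, sql_in_list.count("?"), sql_subquery.count("?"))
```

Both variants return the same rows; the difference is that the first sends every id back to the server as a bound parameter, while the second never moves the ids out of the database.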
Refer to https://laravel.com/docs/5.4/queries#where-clauses
$users = DB::table('questions')
    ->leftJoin('customers', 'customers.id', '=', 'questions.user_id')
    ->whereNotIn('question_id', [1, 2, 3])
    ->get();
It'll work 100%. If your query takes longer than 30 seconds to respond when you are using whereNotIn, use this query syntax:
$order = Order::on($databaseCredentials['database'])
    ->whereRaw('orders_id NOT IN (SELECT orders_id FROM orders)')
    ->skip($page)
    ->take(10)
    ->orderBy('orders.updated_at', 'ASC')
    ->paginate(10);

how to sort varchar column containing numeric values with linq lambdas to Entity

I am using LINQ lambdas to query MySQL (note: MySQL, not SQL Server) with Entity Framework in MVC. I have a table product, and one of its columns is price, with datatype VARCHAR (I can't change the type to INT, as it can hold values like "N/A", etc.).
I want to sort the price column numerically with LINQ lambdas. I have tried the code below. I am using Model values to filter the query.
var query = ent.Product.Where(b => b.cp == Model.CodePostal);
if (Model.order_by_flg == 2)
{
    query = query.OrderByDescending(a => a.price.PadLeft(10, '0'));
}
But it does not work and gives me the error below.
LINQ to Entities does not recognize the method 'System.String
PadLeft(Int32, Char)' method, and this method cannot be translated
into a store expression.
This is because Entity Framework can't convert it to an SQL statement.
I also tried the code below.
var query = ent.Product.Where(b => b.cp == Model.CodePostal);
if (Model.order_by_flg == 2)
{
    query = query.OrderByDescending(a => a.price.Length).ThenBy(a => a.price);
}
But I can't do this: it works for a List, but I can't materialize a list first, because I am using LINQ Skip() and Take(), so I have to sort first.
So how can I sort a price column of type VARCHAR in a LINQ lambda?
EDIT
In the table it is:
59,59,400,185,34
When I use OrderBy.ThenBy it gives
34,59,59,106,185,400
That looks right for ascending order. But when I use OrderByDescending.ThenBy it gives
106,185,400,34,59,59
So I can't use this.
NOTE: Please give reasons before downvoting so I can improve my question.
You can simulate a fixed PadLeft in LINQ to Entities with the canonical function DbFunctions.Right, like this:
Instead of this
a.price.PadLeft(10, '0')
use this
DbFunctions.Right("000000000" + a.price, 10)
I haven't tested it with the MySQL provider, but canonical functions defined in DbFunctions are supposed to be supported by any provider.
That looks right for ascending order. But when I use OrderByDescending.ThenBy it gives
106,185,400,34,59,59
That's because you're ordering by length descending, then by value ascending.
What you need is simply to sort both in descending order:
query = query.OrderByDescending(a => a.price.Length)
    .ThenByDescending(a => a.price);
This should be faster than prepending numbers to sort, since you don't need to do multiple calculations per row but can instead sort by existing data.
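The orderings discussed in the answers above can be checked outside the database. The sketch below is plain Python rather than LINQ, using the price values from the question; the slice `("000000000" + p)[-10:]` mirrors what DbFunctions.Right("000000000" + a.price, 10) is meant to compute. It assumes every value is a string of digits without leading zeros; values such as "N/A" are not handled and would need separate treatment.

```python
prices = ["59", "59", "400", "185", "34", "106"]

# Plain text ordering, descending: digits are compared character by character.
by_text_desc = sorted(prices, reverse=True)

# Length descending, then value ascending: the mixed ordering from the question.
mixed = sorted(prices, key=lambda p: (-len(p), p))

# Length descending, then value descending: numeric order for digit strings.
both_desc = sorted(prices, key=lambda p: (len(p), p), reverse=True)

# Left-pad with zeros to a fixed width of 10, then compare as text.
padded_desc = sorted(prices, key=lambda p: ("000000000" + p)[-10:], reverse=True)

print(by_text_desc)
print(mixed)
print(both_desc)
print(padded_desc)
```

The last two approaches agree on this data; the padded version only works for values of up to ten characters.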

Why does MySQL permit non-exact matches in SELECT queries?

Here's the story. I'm doing some security testing (using zaproxy) of a Laravel (PHP framework) application running with a MySQL database as the primary store for data.
Zaproxy is reporting a possible SQL injection for a POST request URL with the following payload:
id[]=3-2&enabled[]=on
Basically, it's an AJAX request to turn on/turn off a particular feature in a list. Zaproxy is fuzzing the request: where the id value is 3-2, there should be an integer - the id of the item to update.
The problem is that this request is working. It should fail, but the code is actually updating the item where id = 3.
I'm doing things the way I'm supposed to: the model is retrieved using Eloquent's Model::find($id) method, passing in the id value from the request (which, after a bit of investigation, was determined to be the string "3-2"). AFAIK, the Eloquent library should be executing the query by binding the ID value to a parameter.
I tried executing the query using Laravel's DB class with the following code:
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3-2"));
and got the row for id = 3.
Then I tried executing the following query against my MySQL database:
SELECT * FROM table WHERE id='3-2';
and it did retrieve the row where id = 3. I also tried it with another value: "3abc". It looks like any value prefixed with a number will retrieve a row.
So ultimately, this appears to be a problem with MySQL. As far as I'm concerned, if I ask for a row where id = '3-2' and there is no row with that exact ID value, then I want it to return an empty set of results.
I have two questions:
Is there a way to change this behaviour? It appears to be at the level of the database server, so is there anything in the database server configuration to prevent this kind of thing?
This looks like a serious security issue to me. Zaproxy is able to inject some arbitrary value and make changes to my database. Admittedly, this is a fairly minor issue for my application, and the (probably) only values that would work will be values prefixed with a number, but still...
SELECT * FROM table WHERE id = ? AND ? REGEXP "^[0-9]+$";
This will be faster than what I suggested in the comments above.
Edit: Ah, I see you can't change the query. Then it is confirmed, you must sanitize the inputs in code. Another very poor and dirty option, if you are in an odd situation where you can't change query but can change database, is to change the id field to [VAR]CHAR.
I believe this is due to MySQL automatically converting your strings into numbers when comparing against a numeric data type.
https://dev.mysql.com/doc/refman/5.1/en/type-conversion.html
mysql> SELECT 1 > '6x';
-> 0
mysql> SELECT 7 > '6x';
-> 1
mysql> SELECT 0 > 'x6';
-> 0
mysql> SELECT 0 = 'x6';
-> 1
You want to really just put armor around MySQL to prevent such a string from being compared. Maybe switch to a different SQL server.
Without re-writing a bunch of code then in all honesty the correct answer is
This is a non-issue
Zaproxy even states that it's possibly a SQL injection attack, meaning that it does not know! It never said "umm yeah we deleted tables by passing x-y-and-z to your query"
// if this is legal and returns results
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3"));
// then why is it an issue for this
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3-2"));
// to be interpreted as
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3"));
You are parameterizing your queries, so Zaproxy is off its rocker.
Here's what I wound up doing:
First, I suspect that my expectations were a little unreasonable. I was expecting that if I used parameterized queries, I wouldn't need to sanitize my inputs. This is clearly not the case. While parameterized queries eliminate some of the most pernicious SQL injection attacks, this example shows that there is still a need to examine your inputs and make sure you're getting the right stuff from the user.
So, with that said... I decided to write some code to make checking ID values easier. I added the following trait to my application:
trait IDValidationTrait
{
    /**
     * Check the ID value to see if it's valid
     *
     * This is an abstract function because it will be defined differently
     * for different models. Some models have IDs which are strings,
     * others have integer IDs
     */
    abstract public static function isValidID($id);

    /**
     * Check the ID value & fail (throw an exception) if it is not valid
     */
    public static function validIDOrFail($id)
    {
        ...
    }

    /**
     * Find a model only if the ID matches EXACTLY
     */
    public static function findExactID($id)
    {
        ...
    }

    /**
     * Find a model only if the ID matches EXACTLY or throw an exception
     */
    public static function findExactIDOrFail($id)
    {
        ...
    }
}
Thus, whenever I would normally use the find() method on my model class to retrieve a model, instead I use either findExactID() or findExactIDOrFail(), depending on how I want to handle the error.
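The method bodies of the trait are elided above, so the following is only a rough Python sketch of the same idea for integer IDs, not the author's implementation. The function names mirror the trait's methods; the in-memory `rows` dictionary is a hypothetical stand-in for the database table.

```python
import re

# Accept only ASCII digits with no sign, spaces, or trailing characters.
_INT_ID = re.compile(r"[1-9][0-9]*")


def is_valid_id(value) -> bool:
    """True only for a positive int or its exact decimal string form."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        return False
    if isinstance(value, int):
        return value > 0
    return isinstance(value, str) and _INT_ID.fullmatch(value) is not None


def find_exact_id(rows: dict, value):
    """Return the row for an exactly matching ID, or None."""
    if not is_valid_id(value):
        return None
    return rows.get(int(value))


def find_exact_id_or_fail(rows: dict, value):
    """Like find_exact_id, but raise instead of returning None."""
    row = find_exact_id(rows, value)
    if row is None:
        raise LookupError(f"no row with exact id {value!r}")
    return row


# Hypothetical table contents
rows = {3: {"id": 3, "enabled": True}}

print(find_exact_id(rows, "3"))    # the row with id 3
print(find_exact_id(rows, "3-2"))  # rejected before any lookup happens
```

Validating before the lookup means a value such as "3-2" or "3abc" never reaches the comparison where MySQL would convert it to the number 3.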
Thank you to everyone who commented - you helped me to focus my thinking and to understand better what was going on.

Eloquent count distinct returns wrong totals

I'm having an issue with how Eloquent formulates a query that I have no access to. When doing something like
$model->where('something')
    ->distinct()
    ->paginate();
Eloquent runs a query to get the total count, and the query looks something like
select count(*) as aggregate from .....
The problem is that if you use distinct in the query, you want something like
select count(distinct id) as aggregate from .....
to get the correct total. Eloquent is not doing that, though, and so returns wrong totals. The only way to get the distinct into the count is to pass a column through the query builder, like ->count('id'), in which case it will add it. The problem is that this query is auto-generated and I have no control over it.
Is there a way to trick it into adding the distinct to the count query?
P.S. Digging deep into the builder's code, we find an if statement that requires a column on the count() method before it adds the distinct keyword to the count. Illuminate\Database\Query\Grammars\BaseGrammar#compileAggregate
if ($query->distinct && $column !== '*')
{
    $column = 'distinct '.$column;
}
return 'select '.$aggregate['function'].'('.$column.') as aggregate';
P.S.1 I know that in SQL you could do a group by, but since I'm eager loading relations it is not a good idea, because it will add an IN (number of ids found) to each of the other queries, which slows things down significantly.
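To make the difference between the two count queries concrete, here is a small sketch run against an in-memory SQLite database from Python. The posts and tags tables and their rows are hypothetical; the join is just one way for duplicate ids to appear in a result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (post_id INTEGER, name TEXT);
    -- Hypothetical sample data: post 1 has two tags, post 2 has one.
    INSERT INTO posts VALUES (1, 'a'), (2, 'b');
    INSERT INTO tags VALUES (1, 'x'), (1, 'y'), (2, 'x');
""")

join = "FROM posts JOIN tags ON tags.post_id = posts.id"

# What a paginator's total looks like without a column: every joined row counts.
count_all = conn.execute(f"SELECT COUNT(*) AS aggregate {join}").fetchone()[0]

# What the total should be when the page itself is selected with DISTINCT.
count_distinct = conn.execute(
    f"SELECT COUNT(DISTINCT posts.id) AS aggregate {join}"
).fetchone()[0]

# The rows the distinct select actually returns.
distinct_rows = conn.execute(
    f"SELECT DISTINCT posts.id, posts.title {join} ORDER BY posts.id"
).fetchall()

print(count_all, count_distinct, len(distinct_rows))
```

The count(*) total is larger than the number of distinct rows, which is the mismatch described in the question.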
I faced the exact same problem and found two solutions:
The bad one:
$results = $model->groupBy('foo.id')->paginate();
It works, but it will cost too much memory (and time) if you have a high number of rows (which was my case).
The better one:
$ids = $model->distinct()->pluck('foo.id');
$results = $model->whereIn('foo.id', $ids)->paginate();
I tried this with 100k results, and had no problem at all.
This seems to be a wider problem, discussed here:
https://github.com/laravel/framework/issues/3191
https://github.com/laravel/framework/pull/4088
Until the fixes are shipped with one of the next Laravel releases, you can always try using raw expressions, like below (I didn't test it, but it should work):
$stuff = $model->select(DB::raw('distinct id as did'))
    ->where('whatever', '=', 'whateverelse')
    ->paginate();
Reference: http://laravel.com/docs/queries#raw-expressions
$model->where('something')->distinct()->count('id')->paginate();

Django queryset count with extra select

I have a model with PointField for location coordinates. I have a MySQL function that calculates the distance between two points called dist. I use extra() "select" to calculate distance for each returned object in the queryset. I also use extra() "where" to filter those objects that are within a specific range. Like this
query = queryset.extra(
    select={
        "distance": "dist(geomfromtext('%s'),geomfromtext('%s'))" % (loc1, loc2)
    },
    where=["1 having `distance` <= %s" % (km)]
)  # simplified example
This works fine for getting and reading the results, except when I try counting the resultset I get the error that 'distance' is not a field. After exploring a bit further, it seems that count ignores the "select" from extra and just uses "where". While the full SQL query looks like this:
SELECT (dist(geomfromtext('POINT (-4.6858300000000003 36.5154300000000021)'),geomfromtext('POINT (-4.8858300000000003 36.5154300000000021)'))) AS `distance`, `testmodel`.`id`, `testmodel`.`name`, `testmodel`.`email`, (...) FROM `testmodel` WHERE 1 having `distance` <= 50.0
The count query is much shorter and doesn't have the dist selection part:
SELECT COUNT( `testmodel`.`id`) FROM `testmodel` WHERE 1 having `distance` <= 50.0
Logically, MySQL gives an error because "distance" is undefined. Is there a way to tell Django it has to include the extra select for the count?
Thanks for any ideas!
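At the SQL level, one way to count rows filtered on a computed column is to wrap the full select in a subquery, so the alias exists before it is filtered and counted. The sketch below shows that shape using an in-memory SQLite database from Python. It is not a Django solution: the `dist` function registered here is a hypothetical one-dimensional stand-in for the MySQL function in the question, and the `position` column and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the question's dist(): absolute difference of two numbers.
conn.create_function("dist", 2, lambda a, b: abs(a - b))
conn.executescript("""
    CREATE TABLE testmodel (id INTEGER PRIMARY KEY, name TEXT, position REAL);
    INSERT INTO testmodel VALUES (1, 'near', 10.0), (2, 'far', 90.0), (3, 'edge', 50.0);
""")

origin, max_distance = 0.0, 50.0

# The inner query defines the alias; the outer queries may then filter on it.
inner = "SELECT id, name, dist(position, ?) AS distance FROM testmodel"

rows = conn.execute(
    f"SELECT id, distance FROM ({inner}) WHERE distance <= ? ORDER BY id",
    (origin, max_distance),
).fetchall()

# The count reuses the same inner query, so 'distance' is still defined.
count = conn.execute(
    f"SELECT COUNT(*) FROM ({inner}) WHERE distance <= ?",
    (origin, max_distance),
).fetchone()[0]

print(rows, count)
```

Because the count wraps the same inner select as the row query, it cannot lose the computed column the way the shortened count query in the question does.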
You could use a raw query if you are not planning to use any other database system.
params = {'point1': wktpoint1, 'point2': wktpoint2}
query = """
    SELECT
        dist(%(point1)s, %(point2)s)
    FROM
        testmodel
;"""
query_set = self.raw(query, params)
Also, if you need more GIS support, you should evaluate PostgreSQL+PostGIS (if you don't like to reinvent the wheel, you should not write your own dist function).
Django offers GIS support through GeoDjango. There you get functions like distance. You should check support here
In order to use GeoDjango you need to add a geometry field to your model and tell it to use the GeoManager. Then you can start doing geo queries, and you should have no problems with count.
With MySQL you can do something like this using GeoDjango:
### models.py
from django.contrib.gis.db import models

class YourModel(models.Model):
    your_geo_field = models.PolygonField()
    # your_geo_field = models.PointField()
    # your_geo_field = models.GeometryField()
    objects = models.GeoManager()

### your code
from django.contrib.gis.geos import fromstr
from django.contrib.gis.measure import D

a_geom = fromstr('POINT(-96.876369 29.905320)', srid=4326)
distance = 5
YourModel.objects.filter(your_geo_field__distance_lt=(a_geom, D(m=distance))).count()
You can see better examples here and the reference here