I am getting an IQueryable from my database and then I am getting another IQueryable from that first one; that is, I am filtering the first one.
My question is: does this affect performance? How many times will the code call the database? Thank you.
Code:
DataContext _dc = new DataContext();

IQueryable offers =
    (from o in _dc.Offers
     select o);

IQueryable filtered =
    (from o in offers
     select new { ... });

return View(filtered);
The code you have given will never call the database by itself, since nothing in it ever consumes the results of the query.
IQueryable collections aren't filled until you iterate through them, and you're not iterating through anything in that code sample (ah, the beauty of deferred execution).
That also means that when the query finally is enumerated, the two statements are composed and executed as a single query against the database, so there is no performance cost compared with writing one combined query.
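A minimal sketch of that behavior, reusing the question's DataContext and assuming a hypothetical Offer entity with Id and Price columns; DataContext.Log lets you watch the SQL that actually gets generated:

var dc = new DataContext();
dc.Log = Console.Out;                  // echo the generated SQL to the console

IQueryable<Offer> offers = dc.Offers;  // no database call yet
var filtered = offers
    .Where(o => o.Price > 100)         // hypothetical filter; still no call
    .Select(o => new { o.Id, o.Price });

// Only now does a single combined SELECT ... WHERE ... hit the database.
foreach (var item in filtered)
    Console.WriteLine(item);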
SO is not a replacement for developer tools. There are many good free tools that can tell you exactly what this code translates into and how it works. Use Reflector on this method, look at the code that is generated, and reason for yourself about what is going on from there.
Related
I have a project that requires we allow users to create custom columns, enter custom values, and use these custom values to execute user-defined functions.
(Similar functionality exists in Google Data Studio.)
We have exhausted all implementation strategies we can think of (executing formulas on the front end, in isolated execution environments, etc.).
Short of writing our own interpreter, the only implementation we could find that meets the performance, functionality, and scalability requirements is to execute these functions directly within MySQL. So basically we take the expressions that have been entered by the user and dynamically roll up a query that computes the results server-side in MySQL.
This obviously opens a can of worms security wise.
Quick aside: I expect to get the "you shouldn't do it that way" response. Trust me, I hate that this is the best solution we can find. The resources online describing similar problems are remarkably scarce, so if there are any suggestions for where to find information on analogous problems/solutions/implementations, I would greatly appreciate it.
With that said, assuming that we don't have alternatives, my question is: How do we go about doing this safely?
We have a few current safeguards set up:
Executing the user defined expressions against a tightly controlled subquery that limits the "inner context" that the dynamic portion of the query can pull from.
Blacklisting certain phrases that should never be used (SELECT, INSERT, UNION, etc.). This introduces issues, because a user should be able to enter something like: CASE WHEN {{var}} = "union pacific railroad" THEN... but that is a tradeoff we are willing to make.
Limiting the access of the MySQL connection making the query to only have access to the tables/functionality needed for the feature.
This gets us pretty far. But I'm still not comfortable with it. One additional option that I couldn't find any info online about was using the query execution plan as a means of detecting if the query is going outside of its bounds.
So prior to actually executing the query and getting the results, you would wrap it within an EXPLAIN statement to see what the dynamic query was doing. From the results of the EXPLAIN query, you should be able to detect any operations (subqueries, key references, UNIONs, etc.) that fall outside the bounds of what the query is allowed to do.
Is this a useful validation method? It seems to me that this would be a powerful tool for protecting against a whole class of SQL injections, but I couldn't seem to find any information about it online.
Thanks in advance!
(from Comment)
Some examples showing the actual autogenerated queries being used: there are both visual and list examples showing the query execution plan for both malicious and valid custom functions.
GRANT only SELECT on the table(s) that they are allowed to manipulate. This allows arbitrarily complex SELECT queries to be run. (The one flaw: Such queries may run for a long time and/or take a lot of resources. MariaDB has more facilities for preventing run-away selects.)
Provide limited "write" access via Stored Routines with expanded privileges, but do not pass arbitrary values into them. See SQL SECURITY: DEFINER gives the routine the privileges of the person who created it (as opposed to INVOKER, which is limited to the SELECT granted on the tables mentioned above).
Another technique that may or may not be useful is creating VIEWs with select privileges. This, for example, can let the user see most information about employees while hiding the salaries.
Related to that is the ability to GRANT different permissions on different columns, even in the same table.
(I have implemented a similar web app, and released it to everyone in the company. And I could 'sleep at night'.)
I don't see subqueries and Unions as issues. I don't see the utility of EXPLAIN other than to provide more info in case the user is a programmer trying out queries.
EXPLAIN can help in discovering long-running queries, but it is imperfect. Ditto for LIMIT.
More
I think "UDF" is either "normalization" or "EAV"; it is hard to tell which. Please provide SHOW CREATE TABLE.
This is inefficient because it builds a temp table before removing the 'NULL' items:
FROM ( SELECT ...
FROM ...
LEFT JOIN ...
) AS context
WHERE ... IS NULL
This is better because it can do the filtering sooner:
FROM ( SELECT ...
FROM ...
LEFT JOIN ...
WHERE ... IS NULL
) AS context
I wanted to share a solution I found for anyone who comes across this in the future.
To prevent someone from entering some malicious SQL injection in a "custom expression" we decided to preprocess and analyze the SQL prior to sending it to the MySQL database.
Our server is running NodeJS, so we used a parsing library to construct an abstract syntax tree from their custom SQL. From here we can traverse the tree and identify any operations that shouldn't be taking place.
The mock code (it won't run in this example) would look something like:
const { Parser } = require("node-sql-parser"); // assuming node-sql-parser; any library whose astify() returns an AST works the same way
const parser = new Parser();

const valid_types = [ "case", "when", "else", "column_ref", "binary_expr", "single_quote_string", "number" ];
const valid_tables = [ "context" ];

// Create a mock SQL expression and parse the AST
var exp = YOUR_CUSTOM_EXPRESSION;
var ast = parser.astify(exp);

// Check for attempted multi-statement injections
if (Array.isArray(ast) && ast.length > 1) {
    throw new Error("Multiple statements detected");
}

// Recursively check the AST for disallowed operations
recursive_ast_check([], "columns", ast.columns);

function recursive_ast_check(path, p_key, ast_node) {
    // If the parent key is the "type" of operation, check it against the allowed values
    if (p_key === "type") {
        if (valid_types.indexOf(ast_node) === -1) {
            throw new Error("Invalid type '" + ast_node + "' found at following path: " + JSON.stringify(path));
        }
        return;
    }

    // If the parent key is a table reference, the value should always be "context"
    if (p_key === "table") {
        if (valid_tables.indexOf(ast_node) === -1) {
            throw new Error("Invalid table reference '" + ast_node + "' found at following path: " + JSON.stringify(path));
        }
        return;
    }

    // Ignore null or empty nodes
    if (!ast_node) { return; }

    // Recursively check array values down the chain
    if (Array.isArray(ast_node)) {
        for (var i = 0; i < ast_node.length; i++) {
            recursive_ast_check([...path, p_key], i, ast_node[i]);
        }
        return;
    }

    // Recursively check object keys down the chain
    if (typeof ast_node === "object") {
        for (let key of Object.keys(ast_node)) {
            recursive_ast_check([...path, p_key], key, ast_node[key]);
        }
    }
}
This is just a mockup adapted from our implementation, but hopefully it will provide some guidance. I should also note that it is best to implement all of the strategies discussed above as well; many safeguards are better than just one.
I have the following query
@initial_matches = Listing.find_by_sql(["SELECT * FROM listings WHERE industry = ?", current_user.industry])
Is there a way I can run another SQL query on the selection from the above query, using an each..do? I want to run Geokit calculations to eliminate certain listings that are outside of a specified distance...
Your question is slightly confusing. Do you want to use each..do (Ruby) to do the filtering, or do you want to use a SQL query? Here is how you can let the Ruby process do the filtering:
refined_list = @initial_matches.map { |listing|
  listing.out_of_bounds? ? nil : listing
}.compact
If you wanted to use SQL you could simply add the additional SQL (maybe a sub-select) into your Listing.find_by_sql call.
If you want to do as you say in your comment:

WHERE location1.distance_from(location2, :units => :miles)

then you are mixing Ruby (location1.distance_from(location2, :units => :miles)) and SQL (WHERE X > 50). This is difficult, but not impossible.
However, if you have to do the distance calculation in Ruby anyway, why not do the filtering there as well? So, in the spirit of my first example:
listing2 = some_location_to_filter_by
@refined_list = @initial_matches.map { |listing|
  listing.distance_from(listing2) > 50 ? nil : listing
}.compact
This will iterate over all listings, discarding any that are further than 50 from the predetermined location.
EDIT: If this logic is done in the controller, you need to assign to @refined_list instead of refined_list, since only controller instance variables (as opposed to local ones) are accessible to the view.
In short, no. This is because after the initial query you are not left with a relational table or view; you are left with an array of ActiveRecord objects. So any processing to be done after the initial query has to be done with Ruby and ActiveRecord, not SQL.
When you query an EntitySet property on a model object in Linq-to-SQL, it returns all rows from the entityset and does any further querying client-side.
This is confirmed in a few places online and I've observed the behavior myself. The EntitySet does not implement IQueryable.
What I've had to do is convert code like:
var myChild = ... ;

// Where clause performed client-side.
var query = myChild.Parents.Where(...);

to:

var myChild = ... ;

// Where clause performed in the DB; only a minimal set of rows is returned.
var query = MyDataContext.Parents.Where(p => p.Child == myChild);
Does anyone know a better solution?
A secondary question: is this fixed in the Entity Framework?
An EntitySet is just a collection of entities. It implements IEnumerable, not IQueryable. The Active Record pattern specifies that entities be directly responsible for their own persistence. OR mapper entities don't have any direct knowledge of the persistence layer. OR Mappers place this responsibility, along with Unit Of Work, and Identity Map responsibilities into the Data Context. So if you need to query the data source, you gotta use the context (or a Table object). To change this would bend the patterns in use.
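For example, a quick sketch of going through the context's Table object (against a hypothetical model mirroring the question):

// Querying through the DataContext's Table object keeps the filter in SQL.
var query = myDataContext.GetTable<Parent>()
                         .Where(p => p.Child == myChild);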
I had a similar problem: How can I make this SelectMany use a join? After messing with LINQPad for a good amount of time I found a decent workaround. The key is to push the EntitySet you are looking at inside a SelectMany, Select, Where, etc. Once it's inside one of those, it becomes part of an Expression, and the provider can turn it into a proper query.
Using your example try this:
var query = from c in Children
            where c == myChild
            from p in c.Parents
            where p.Age > 35
            select p;
I'm not able to 100% verify this query since I don't know the rest of your model, but the first two lines of the query cause the rest of it to become an Expression that the provider turns into a join. This does work with my own example in the question linked to above.
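In method syntax, the same workaround makes the SelectMany explicit (a sketch against the same hypothetical model):

var query = Children
    .Where(c => c == myChild)
    .SelectMany(c => c.Parents)
    .Where(p => p.Age > 35);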
I'm working my way through the MVC Storefront code and trying to follow the path of a repository, a service, and a model that is a POCO outside of the dbml/data context. It was pretty easy to follow, actually, until I started writing tests and things failed in a way I just don't understand.
In my case, the primary key is a uniqueidentifier instead of an int field. The repository returns an IQueryable:
public IQueryable<Restaurant> All()
{
    return from r in _context.Restaurants
           select new Restaurant(r.Id)
           {
               Name = r.Name
           };
}
In this case, Restaurant is a Models.Restaurant of course, and not _context.Restaurants.Restaurant.
Filtering in the service class (or in repository unit tests) against All(), this works just as expected:
var results = Repository.All().Where(r => r.Name == "BW<3").ToList();
This works just fine and returns one Models.Restaurant. Now, if I try the same thing with the primary key:
var results = Repository.All().Where(r => r.Id == new Guid("088ec7f4-63e8-4e3a-902f-fc6240df0a4b")).ToList();
It fails with:
The member 'BurningPlate.Models.Restaurant.Id' has no supported translation to SQL.
I've seen some similar posts where people say it's because r.Id is on Models.Restaurant, a class the LINQ-to-SQL layer isn't aware of. To me, that would mean the first version shouldn't work either. Of course, if my primary key is an int, it works just fine.
What's really going on here? Lord knows, it's not very intuitive to have one work and one not work. What am I misunderstanding?
I think the problem here is due to using a constructor overload and expecting the query to fill it in. When you do a projection like this, you have to put everything you want populated into the projection itself; otherwise LINQ won't include it in the SQL query.
So, rewrite your bits like so:
return from r in _context.Restaurants
       select new Restaurant()
       {
           Id = r.Id,
           Name = r.Name
       };
This should fix it up.
Not having actually typed this code out, have you tried
var results = Repository.All().Where(r => r.Id.Equals(new Guid("088ec7f4-63e8-4e3a-902f-fc6240df0a4b"))).ToList();
This probably has to do with the fact that you're instantiating the Guid inside the query; I think LINQ to SQL tries to convert the expression to actual SQL before the object is created.
Try instantiating it before the query, not in the query.
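Something like this (a sketch reusing the repository and key from the question):

// Create the Guid ahead of time, then use the local variable in the query.
var targetId = new Guid("088ec7f4-63e8-4e3a-902f-fc6240df0a4b");
var results = Repository.All().Where(r => r.Id == targetId).ToList();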
Now that LINQ to SQL is a little more mature, I'd like to know of any techniques people are using to create an n-tiered solution using the technology, because it does not seem that obvious to me.
LINQ to SQL doesn't really have an n-tier story that I've seen, since the entity classes it creates live alongside the rest of the generated code; you don't really have a separate assembly that you can nicely reference through something like web services.
The only way I'd really consider it is to use the DataContext to fetch data, fill an intermediary data model, pass that through and reference it on both sides, and use that on your client side; then pass the objects back and push the data into a new DataContext, or intelligently update rows after you refetch them (see the sketch below).
That's if I'm understanding what you're trying to get at :\
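For what it's worth, a minimal sketch of that refetch-and-update idea, assuming a hypothetical EmployeeDto class that travels between the tiers:

// The DTO crosses the tiers; the LINQ to SQL entity never leaves the data layer.
public void SaveEmployeeName(EmployeeDto dto)
{
    using (var context = new MyCompanyContext())
    {
        // Refetch the row, copy the changed values across, and submit.
        var row = context.Employees.Single(e => e.EmpID == dto.EmpID);
        row.FirstName = dto.FirstName;
        row.LastName = dto.LastName;
        context.SubmitChanges();
    }
}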
I asked ScottGu the same question on his blog when I first started looking at it - but I haven't seen a single scenario or app in the wild that uses LINQ to SQL in this way. Websites like Rob Connery's Storefront are closer to the provider.
Hm, Rockford Lhotka said that LINQ to SQL is a wonderful technology for fetching data from the database. He suggests that afterwards the results must be bound to "rich domain objects" (a.k.a. CSLA objects).
Seriously speaking, LINQ to SQL does have its own support for n-tier architecture; see the DataContext.Update method.
You might want to look into the ADO.NET Entity Framework as an alternative to LINQ to SQL, although it supports LINQ as well. I believe LINQ to SQL is designed to be fairly lightweight and simple, whereas the Entity Framework is more heavy-duty and probably more suitable in large enterprise applications.
OK, I am going to give myself one possible solution.
Inserts/Updates were never an issue; you can wrap the business logic in a Save/Update method; e.g.
public class EmployeesDAL
{
    ...

    public void SaveEmployee(Employee employee)
    {
        // data formatting
        employee.FirstName = employee.FirstName.Trim();
        employee.LastName = employee.LastName.Trim();

        // business rules
        if (employee.FirstName.Length > 0 && employee.LastName.Length > 0)
        {
            MyCompanyContext context = new MyCompanyContext();

            // insert
            if (employee.EmpID == 0)
                context.Employees.InsertOnSubmit(employee);
            else
            {
                // update goes here
            }

            context.SubmitChanges();
        }
        else
        {
            throw new BusinessRuleException("Employees must have first and last names");
        }
    }
}
For fetching data, or at least fetching data that comes from more than one table, you can use stored procedures or views, because the results will not be anonymous types and so can be returned from an outside method. For instance, using a stored proc:
public ISingleResult<GetEmployeesAndManagersResult> LoadEmployeesAndManagers()
{
    MyCompanyContext context = new MyCompanyContext();
    var emps = context.GetEmployeesAndManagers();
    return emps;
}
Seriously speaking, LINQ to SQL does have its own support for n-tier architecture; see the DataContext.Update method.
Some of what I've read suggests that the business logic wraps the DataContext; in other words, you wrap the update in the way that you suggest.
The way I traditionally write business objects, I usually encapsulate the "load methods" in the BO as well; so I might have a method named LoadEmployeesAndManagers that returns a list of employees and their immediate managers (this is a contrived example). Maybe it's just me, but in my front end I'd rather see e.LoadEmployeesAndManagers() than some long LINQ statement.
Anyway, using LINQ it would probably look something like this (not checked for syntax correctness):
var emps = from e in Employees
           join m in Employees
             on e.ManagerEmpID equals m.EmpID
           select new
           {
               e,
               m.FullName
           };
Now if I understand things correctly, if I put this in, say, a class library and call it from my front end, the only way I can return this is as an IEnumerable, so I lose my strong typed goodness. The only way I'd be able to return a strongly typed object would be to create my own Employees class (plus a string field for the manager name), fill it from the results of my LINQ to SQL statement, and then return that. But this seems counterintuitive... what exactly did LINQ to SQL buy me if I have to do all that?
I think that I might be looking at things the wrong way; any enlightenment would be appreciated.
"the only way I can return this is as an IEnumerable, so I lose my strong typed goodness"
That is incorrect. In fact your query is strongly typed; it is just an anonymous type. I think the query you want is more like:
var emps = from e in Employees
           join m in Employees
             on e.ManagerEmpID equals m.EmpID
           select new Employee   // assumes Employee exposes a settable ManagerName property
           {
               EmpID = e.EmpID,
               FullName = e.FullName,
               ManagerName = m.FullName
           };
Which will return IEnumerable<Employee>.
Here is an article I wrote on the topic.
Linq-to-sql is an ORM. It does not affect the way that you design an N-tiered application. You use it the same way you would use any other ORM.
@liammclennan
Which will return IEnumerable<Employee>. ... Linq-to-sql is an ORM. It does not affect the way that you design an N-tiered application. You use it the same way you would use any other ORM.
Then I guess I am still confused. Yes, LINQ to SQL is an ORM; but as far as I can tell I am still littering my front-end code with inline SQL-type statements (LINQ, not SQL... but I still feel that this should be abstracted away from the front end).
Suppose I wrap the LINQ statement we've been using as an example in a method. As far as I can tell, the only way I can return it is this way:
public class EmployeesDAL
{
    public IEnumerable LoadEmployeesAndManagers()
    {
        MyCompanyContext context = new MyCompanyContext();
        var emps = from e in context.Employees
                   join m in context.Employees
                     on e.ManagerEmpID equals m.EmpID
                   select new
                   {
                       e,
                       m.FullName
                   };
        return emps;
    }
}
From my front end code I would do something like this:
EmployeesDAL dal = new EmployeesDAL();
var emps = dal.LoadEmployeesAndManagers();
This of course returns an IEnumerable; but I cannot use it like any other ORM as you say (unless of course I misunderstand), because I cannot do this (again, a contrived example):
txtEmployeeName.Text = emps[0].FullName
This is what I meant by "I lose strong typed goodness." I think that I am starting to agree with Crucible; that LINQ-to-SQL was not designed to be used in this way. Again, if I am not seeing things correctly, someone show me the way :)
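For what it's worth, a minimal sketch of that "create my own class" route, assuming a hypothetical EmployeeWithManager class defined alongside the DAL; it keeps the LINQ in the DAL and gives the front end back its strong typing:

public class EmployeeWithManager
{
    public int EmpID { get; set; }
    public string FullName { get; set; }
    public string ManagerName { get; set; }
}

public class EmployeesDAL
{
    public List<EmployeeWithManager> LoadEmployeesAndManagers()
    {
        MyCompanyContext context = new MyCompanyContext();
        var emps = from e in context.Employees
                   join m in context.Employees
                     on e.ManagerEmpID equals m.EmpID
                   select new EmployeeWithManager   // named type, not anonymous
                   {
                       EmpID = e.EmpID,
                       FullName = e.FullName,
                       ManagerName = m.FullName
                   };
        return emps.ToList();
    }
}

// Front end: strongly typed access now works.
// EmployeesDAL dal = new EmployeesDAL();
// var emps = dal.LoadEmployeesAndManagers();
// txtEmployeeName.Text = emps[0].FullName;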