Reindexing Magento through SQL - mysql

Is there a way to call the reindexing that you can fire off in Magento's backend through an SQL statement?
I have a bunch of scripts which add products to Magento, and we need to reindex after they run. We have a scheduled job that runs these scripts, and I want to do the reindex after they're done, so that it will always reindex after the scripts finish regardless of how long they take (sometimes they take a couple of minutes, sometimes half an hour, depending on what data needs to be changed, inserted or deleted).
The task scheduler is on a Microsoft SQL Server and Magento is on a MySQL server (we use a linked server, apparently).

No, there is not.
In Magento, "re-indexing" means "run through a list of PHP classes and run their reindexAll methods". Indexing strategy varies between indexer types. Most require reading some sort of data, doing programmatic calculations, and then inserting values into flat tables.
For example, the catalog/URL rewrite re-indexer is the class
app/code/core/Mage/Catalog/Model/Indexer/Url.php
(alias of catalog/indexer_url, PHP class of Mage_Catalog_Model_Indexer_Url)
Its reindexAll method contains
public function reindexAll()
{
    /** @var $resourceModel Mage_Catalog_Model_Resource_Url */
    $resourceModel = Mage::getResourceSingleton('catalog/url');
    $resourceModel->beginTransaction();
    try {
        Mage::getSingleton('catalog/url')->refreshRewrites();
        $resourceModel->commit();
    } catch (Exception $e) {
        $resourceModel->rollBack();
        throw $e;
    }
}
And the actual indexing is handled in the refreshRewrites method, which creates the needed Magento rewrites.
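While there's no SQL hook, the same indexers can be invoked from the command line, which fits a scheduled-job pipeline: run the import scripts, then call Magento's stock shell/indexer.php script (present in Magento 1.4 and later). The paths below are assumptions about your install:

```
# Run after the import scripts complete (install path is an assumption):
cd /path/to/magento/shell
php indexer.php info                   # list the available indexer codes
php indexer.php --reindexall           # rebuild every index
php indexer.php --reindex catalog_url  # or just one, e.g. the URL rewrite indexer above
```

Since the scheduler lives on a different (Windows/MSSQL) host, this would have to be run on the Magento box itself, e.g. via SSH or a cron entry triggered after the import job.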


Use MySQL Query Execution Plan for Detecting SQL Injections

I have a project that requires we allow users to create custom columns, enter custom values, and use these custom values to execute user defined functions.
Similar Functionality In Google Data Studio
We have exhausted all implementation strategies we can think of (executing formulas on the front end, in isolated execution environments, etc.).
Short of writing our own interpreter, the only implementation we could find that meets the performance, functionality, and scalability requirements is to execute these functions directly within MySQL. So basically taking the expressions that have been entered by the user, and dynamically rolling up a query that computes results server side in MySQL.
This obviously opens a can of worms security wise.
Quick aside: I expect to get the "you shouldn't do it that way" response. Trust me, I hate that this is the best solution we can find. The resources online describing similar problems are remarkably scarce, so if there are any suggestions for where to find information on analogous problems/solutions/implementations, I would greatly appreciate it.
With that said, assuming that we don't have alternatives, my question is: How do we go about doing this safely?
We have a few current safeguards set up:
Executing the user defined expressions against a tightly controlled subquery that limits the "inner context" that the dynamic portion of the query can pull from.
Blacklisting certain phrases that should never be used (SELECT, INSERT, UNION, etc.). This introduces issues, because a user should be able to enter something like: CASE WHEN {{var}} = "union pacific railroad" THEN... but that is a tradeoff we are willing to make.
Limiting the access of the MySQL connection making the query to only have access to the tables/functionality needed for the feature.
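As an aside on the second safeguard, the false-positive tradeoff is easy to demonstrate. This is an illustrative sketch (naiveFilter and the sample expressions are made-up names, not the asker's real implementation):

```javascript
// A naive filter that rejects any expression containing a banned keyword
// as a substring, regardless of context:
const banned = ["select", "insert", "union", "drop"];

function naiveFilter(expr) {
  const lower = expr.toLowerCase();
  // Returns true if the expression passes (no banned keyword found)
  return !banned.some(word => lower.includes(word));
}

// A legitimate business expression is rejected because of the substring "union":
const legit = 'CASE WHEN {{var}} = "union pacific railroad" THEN 1 ELSE 0 END';
console.log(naiveFilter(legit)); // false -- rejected even though it is harmless

// An actual injection attempt is rejected for the same reason:
const malicious = '1 UNION SELECT password FROM users';
console.log(naiveFilter(malicious)); // false
```

A string-level blacklist cannot tell a keyword apart from the same word inside a quoted literal; distinguishing the two requires actually parsing the expression, which is what the AST-based answer further down does.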
This gets us pretty far. But I'm still not comfortable with it. One additional option that I couldn't find any info online about was using the query execution plan as a means of detecting if the query is going outside of its bounds.
So prior to actually executing the query/getting the results, you would wrap it within an EXPLAIN statement to see what the dynamic query was doing. From the results of the EXPLAIN query, you should be able to detect any operations (subqueries, key references, UNIONs, etc.) that fall outside the bounds of what the query is allowed to do.
Is this a useful validation method? It seems to me that this would be a powerful tool for protecting against a suite of SQL injections, but I couldn't seem to find any information online.
Thanks in advance!
(from Comment)
Some Examples showing the actual autogenerated queries being used. There are both visual and list examples showing the query execution plan for both malicious and valid custom functions.
GRANT only SELECT on the table(s) that they are allowed to manipulate. This allows arbitrarily complex SELECT queries to be run. (The one flaw: Such queries may run for a long time and/or take a lot of resources. MariaDB has more facilities for preventing run-away selects.)
Provide limited "write" access via Stored Routines with expanded privileges, but do not pass arbitrary values into them. See SQL SECURITY: DEFINER has the privileges of the person creating the routine (as opposed to INVOKER, which is limited to the SELECT granted on the tables mentioned above).
Another technique that may or may not be useful is creating VIEWs with select privileges. This, for example, can let the user see most information about employees while hiding the salaries.
Related to that is the ability to GRANT different permissions on different columns, even in the same table.
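In MySQL syntax, the column-level GRANT and VIEW approaches above look roughly like this (the database, table, column, and account names are hypothetical):

```sql
-- Column-level grant: the user can SELECT only these columns of employees
GRANT SELECT (id, name, department) ON mydb.employees TO 'app_user'@'%';

-- A view that hides the salary column; grant SELECT on the view, not the base table
CREATE SQL SECURITY DEFINER VIEW mydb.employees_public AS
    SELECT id, name, department FROM mydb.employees;
GRANT SELECT ON mydb.employees_public TO 'app_user'@'%';
```

With SQL SECURITY DEFINER, the view reads the base table with the creator's privileges, so app_user never needs any grant on mydb.employees itself.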
(I have implemented a similar web app, and released it to everyone in the company. And I could 'sleep at night'.)
I don't see subqueries and Unions as issues. I don't see the utility of EXPLAIN other than to provide more info in case the user is a programmer trying out queries.
EXPLAIN can help in discovering long-running queries, but it is imperfect. Ditto for LIMIT.
More
I think "UDF" is either "normalization" or "EAV"; it is hard to tell which. Please provide SHOW CREATE TABLE.
This is inefficient because it builds a temp table before removing the 'NULL' items:
FROM ( SELECT ...
FROM ...
LEFT JOIN ...
) AS context
WHERE ... IS NULL
This is better because it can do the filtering sooner:
FROM ( SELECT ...
FROM ...
LEFT JOIN ...
WHERE ... IS NULL
) AS context
I wanted to share a solution I found for anyone who comes across this in the future.
To prevent someone from entering some malicious SQL injection in a "custom expression" we decided to preprocess and analyze the SQL prior to sending it to the MySQL database.
Our server is running NodeJS, so we used a parsing library to construct an abstract syntax tree from their custom SQL. From here we can traverse the tree and identify any operations that shouldn't be taking place.
The mock code (it won't run in this example) would look something like:
// Assumes a SQL parsing library (e.g. node-sql-parser) that exposes astify()
const valid_types = ["case", "when", "else", "column_ref", "binary_expr", "single_quote_string", "number"];
const valid_tables = ["context"];

// Create a mock sql expression and parse the AST
var exp = YOUR_CUSTOM_EXPRESSION;
var ast = parser.astify(exp);

// Check for attempted multi-statement injections
if (Array.isArray(ast) && ast.length > 1) {
    throw new Error("Multiple statements detected");
}

// Recursively check the AST for unallowed operations
recursive_ast_check([], "columns", ast.columns);

function recursive_ast_check(path, p_key, ast_node) {
    // If the parent key is the "type" of operation, check it against allowed values
    if (p_key === "type") {
        if (valid_types.indexOf(ast_node) === -1) {
            throw new Error("Invalid type '" + ast_node + "' found at following path: " + JSON.stringify(path));
        }
        return;
    }
    // If the parent key is "table", the value should always be "context"
    if (p_key === "table") {
        if (valid_tables.indexOf(ast_node) === -1) {
            throw new Error("Invalid table reference '" + ast_node + "' found at following path: " + JSON.stringify(path));
        }
        return;
    }
    // Ignore null or empty nodes
    if (!ast_node) { return; }
    // Recursively search array values down the chain
    if (Array.isArray(ast_node)) {
        for (var i = 0; i < ast_node.length; i++) {
            recursive_ast_check([...path, p_key], i, ast_node[i]);
        }
        return;
    }
    // Recursively search object keys down the chain
    if (typeof ast_node === 'object') {
        for (let key of Object.keys(ast_node)) {
            recursive_ast_check([...path, p_key], key, ast_node[key]);
        }
    }
}
This is just a mockup adapted from our implementation, but hopefully it will provide some guidance. Note that it is best to implement the other strategies discussed above as well; many safeguards are better than just one.

How are IQueryables dealt with in ASP.NET MVC Views?

I have some tables in a MySQL database representing records from a sensor. One of the features of the system I'm developing is to display these records from the database to the web user, so I used the ADO.NET Entity Data Model to create an ORM, used LINQ to SQL to get the data from the database, and stored it in a ViewModel I designed, so I can display it using the MVCContrib Grid Helper:
public IQueryable<TrendSignalRecord> GetTrends()
{
    var dataContext = new SmgerEntities();
    var trendSignalRecords = from e in dataContext.TrendSignalRecords
                             select e;
    return trendSignalRecords;
}

public IQueryable<TrendRecordViewModel> GetTrendsProjected()
{
    var projectedTrendRecords = from t in GetTrends()
                                select new TrendRecordViewModel
                                {
                                    TrendID = t.ID,
                                    TrendName = t.TrendSignalSetting.Name,
                                    GeneratingUnitID = t.TrendSignalSetting.TrendSetting.GeneratingUnit_ID,
                                    //{...}
                                    Unit = t.TrendSignalSetting.Unit
                                };
    return projectedTrendRecords;
}
I call the GetTrendsProjected method and then use LINQ to SQL to select only the records I want. It works fine in my development scenario, but when I test it in a real scenario, where the number of records is far greater (around a million), it stops working.
I put in some debug messages to test it, and everything works fine until it reaches the return View() statement, where it simply stops, throwing a MySQLException: Timeout expired. That left me wondering if the data I send to the page is retrieved by the page itself (it only searches for the displayed items in the database when the page needs them, or something like that).
All of my other pages use the same set of tools: MVCContrib Grid Helper, ADO.NET, Linq to SQL, MySQL, and everything else works alright.
You absolutely should paginate your data set before executing your query if you have millions of records. This can be done using the .Skip and .Take extension methods, which should be called before running any query against the database.
Trying to fetch millions of records from a database without pagination will very likely cause a timeout at best.
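A minimal sketch of that pagination, reusing the GetTrendsProjected method from the question (the GetTrendsPage name and the page/pageSize parameters are illustrative):

```csharp
public IQueryable<TrendRecordViewModel> GetTrendsPage(int page, int pageSize)
{
    // OrderBy comes first: Skip/Take translate to a LIMIT/OFFSET style query,
    // which needs a stable sort to page consistently.
    return GetTrendsProjected()
        .OrderBy(t => t.TrendID)
        .Skip((page - 1) * pageSize)
        .Take(pageSize);
}
```

Because everything composes on IQueryable, the paging is folded into the generated SQL and only pageSize rows cross the wire when the grid enumerates the result.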
Well, assuming the information in this blog is correct, the .AsPagination method requires you to sort your data by a particular column. It's possible that doing an OrderBy on a table with millions of records is simply a time-consuming operation that times out.

Linq to Sql: ChangeConflictException not being thrown (and no rows updated/deleted)

I'm trying to fit Linq to Sql into an N-Tier design. I am implementing concurrency by supplying original values when attaching objects to the data-context. When calling SubmitChanges, and observing the generated scripts on sql server profiler, I can see that they are being generated properly. They include where clauses that check all the object properties (they are all marked with UpdateCheck.Always).
The result is as expected, i.e., no rows are updated on updates or deleted on deletes. Yet I am not getting any exception. Isn't this supposed to throw a ChangeConflictException?
For clarity here is the design and flow for the tests I'm running: I have a client console and a service console talking to each other via WCF using WsHttpBinding.
1. Client requests data from service.
2. Service instantiates a datacontext, retrieves data, disposes context, returns data to client.
3. Client makes modifications to returned data.
4. Client requests an update of changed data from the service.
5a. Service instantiates a datacontext, attaches objects, and...
5b. I pause execution and change values in the database in order to cause a change-conflict.
5c. Service calls SubmitChanges.
Here's the code for step 5, cleaned up a bit for clarity:
public void UpdateEntities(ReadOnlyChangeSet<Entity> changeSet)
{
    using (EntityDataContext context = new EntityDataContext())
    {
        if (changeSet.AddedEntities.Count > 0)
        {
            context.Entities.InsertAllOnSubmit(changeSet.AddedEntities);
        }

        if (changeSet.RemovedEntities.Count > 0)
        {
            context.Entities.AttachAll(changeSet.RemovedEntities, false);
            context.Entities.DeleteAllOnSubmit(changeSet.RemovedEntities);
        }

        if (changeSet.ModifiedRecords.Count > 0)
        {
            foreach (var record in changeSet.ModifiedRecords)
            {
                context.Entities.Attach(record.Current, record.Original);
            }
        }

        // This is where I pause execution and make changes to the database
        context.SubmitChanges();
    }
}
I'm using some classes to track changes and maintain originals, as you can see.
Any help appreciated.
EDIT: I'm having no problems with inserts. I've only included the code that calls InsertAllOnSubmit for completeness.
So I've found the answer. It appears to be a bug in LINQ to SQL (correct me if I'm wrong). It turns out that the table being updated has a trigger on it. This trigger calls a stored procedure that has a return value, which causes inserts, updates or deletes on this table to yield a return value (from the stored procedure run by the trigger) that is NOT a row count but some other number. Apparently L2S sees this number and assumes all went well, even though no insert/update/delete actually occurred.
This is quite bizarre, especially considering the returned number has a defined column name and its value is in the 6-digit area.

Forcing LINQ to use a Stored Procedure when accessing a Database

I've done some searches (over the web and SO) but so far have been unable to find something that directly answers this:
Is there any way to force L2S to use a Stored Procedure when accessing a Database?
This is different from simply using SPROCs with L2S: the thing is, I'm relying on LINQ to lazy-load elements by accessing them through the generated "Child Property". If I use a SPROC to retrieve the elements of one table, map them to an entity in LINQ, and then access a child property, I believe that LINQ will retrieve the records from the DB using dynamic SQL, which goes against my purpose.
UPDATE:
Sorry if the text above isn't clear. What I really want is something that is like the "Default Methods" for Update, Insert and Delete, however, to Select. I want every access to be done through a SPROC, but I want to use Child Property.
Just so you don't think I'm crazy: my DAL is built using child properties, and I was accessing the database through L2S using dynamic SQL, but last week the client told me that all database access must be done through SPROCs.
I don't believe there is a switch or setting that, out of the box, would automagically map to using sprocs the way you are describing. But there is no reason why you couldn't alter the generated DBML file to do what you want. If I had two related tables, a Catalog table and a CatalogItem table, the Linq2SQL generator would naturally give me a CatalogItems property on Catalog, with code like:
private EntitySet<CatalogItem> _CatalogItems;

[global::System.Data.Linq.Mapping.AssociationAttribute(Name="CatalogItem", Storage="_CatalogItems", ThisKey="Id", OtherKey="CatalogId")]
public EntitySet<CatalogItem> CatalogItems
{
    get
    {
        return this._CatalogItems;
        // replace this line with a sproc call that ultimately
        // returns the expected type
    }
    set
    {
        this._CatalogItems.Assign(value);
        // replace this line with a sproc call that ultimately
        // does a save operation
    }
}
There is nothing stopping you from changing that code to make sproc calls there. It'd be some effort for larger applications, though, and I'd make sure you'd actually be getting the benefit from it that you think you would.
How about loading the child entities using the partial OnLoaded() method in the parent entity? That would allow you to avoid messing with generated code. Of course it would no longer be a lazy load, but it's a simple way to do it.
For example:
public partial class Supplier
{
    public List<Product> Products { get; set; }

    partial void OnLoaded()
    {
        // GetProductsBySupplierId is the SP dragged into your dbml designer
        Products = dataContext.GetProductsBySupplierId(this.Id).ToList();
    }
}
Call your stored procedure as shown in the article below, where GetProductsByCategoryName is the name of your stored procedure:
http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx

How can I control the creation of database indexes when using DataContext.CreateDatabase()

I am new to LINQ to SQL, but have done a lot of database development in the past.
The software I just started working on uses:
// MyDataContext is a sub class of DataContext, that is generated with SqlMetal
MyDataContext db = new MyDataContext (connectionString);
db.CreateDatabase();
to create the database when it is first run.
I need to add some indexes to the tables....
How can I tell the DataContext what indexes I want?
Otherwise how do I control this?
(I could use a sql script, but I like the ideal that db.CreateDatabase will always create a database that matches the data access code)
(For better, or worse the software has full access to the database server and our software often create databases on the fly to store result of model runs etc, so please don’t tell me we should not be creating databases from code)
I seem not to be the only person hitting limits on DataContext.CreateDatabase(); see also http://csainty.blogspot.com/2008/02/linq-to-sql-be-careful-of.html
As far as I know the DataContext.CreateDatabase method can only create primary keys.
When you look at the DBML directly, you will see that there are no elements for defining an index. Therefore it is, IMHO, safe to assume that CreateDatabase cannot do it.
So the only way I can think of for creating indexes "automatically" is by first calling DataContext.CreateDatabase and then calling DataContext.ExecuteCommand to add the indexes to the tables that were just created.
You can execute a SQL command in the OnCreated partial method:
public partial class DatabaseModelsDataContext : System.Data.Linq.DataContext
{
    partial void OnCreated()
    {
        var cmdText = @"
            IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'IX_MyTableColumn')
                DROP INDEX IX_MyTableColumn ON [mydb].[dbo].[MyTable];
            CREATE INDEX IX_MyTableColumn ON [mydb].[dbo].[MyTable] ([column]);";
        ExecuteCommand(cmdText);
    }
}