I am updating rows of a MySQL database with Groovy, and the method I am using is very slow. I was hoping you could improve the performance of my example:
sql.resultSetConcurrency = ResultSet.CONCUR_UPDATABLE
sql.eachRow("SELECT * FROM email") { bt ->
    bt.extendedDesc = update(bt.name, bt.direction)
}
sql.resultSetConcurrency = ResultSet.CONCUR_READ_ONLY
Then there is the update method:
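def update(name, direction) {
    // Declare result locally instead of leaking it into the script binding
    def result
    if (direction == 'Outgoing') {
        result = 'FROM: ' + name
    } else {
        result = 'TO: ' + name
    }
    // Truncate to 75 characters, ending in an ellipsis
    if (result.size() > 75) {
        result = result.substring(0, 72) + "..."
    }
    return result
}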
So it updates one field of each entry in email (extendedDesc in this example) using a method that needs 2 other fields passed to it as parameters.
It is very slow, around 600 entries updated per minute, and email has 200000+ entries =/
Is there a better method to accomplish this? Should use Groovy if possible, as it needs to run with all my other Groovy scripts.
You are doing your update as a cursor based, updatable query, which has to read every record and conditionally write something back. You're doing all the heavy lifting in the code, rather than letting the database do it.
Try using an update query to only update the records matching your criteria. You won't need eachRow to do this.
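As a rough sketch of what such a set-based statement might look like, the logic of update() can be folded into a single UPDATE. The block below is written in JavaScript only so that it can hold the SQL text next to a small reference function that mirrors the Groovy method; `updateSql` and `describe` are hypothetical names, the table and column names come from the question, and the SQL assumes MySQL's CONCAT, IF, LEFT and CHAR_LENGTH functions.

```javascript
// One statement that lets MySQL compute extendedDesc for every row.
// Assumption: name is never NULL (CONCAT returns NULL if any argument is NULL).
const updateSql = `
UPDATE email
SET extendedDesc = IF(
  CHAR_LENGTH(CONCAT(IF(direction = 'Outgoing', 'FROM: ', 'TO: '), name)) > 75,
  CONCAT(LEFT(CONCAT(IF(direction = 'Outgoing', 'FROM: ', 'TO: '), name), 72), '...'),
  CONCAT(IF(direction = 'Outgoing', 'FROM: ', 'TO: '), name)
)`;

// Reference implementation of the same rule, useful for spot-checking rows.
function describe(name, direction) {
  const prefix = direction === 'Outgoing' ? 'FROM: ' : 'TO: ';
  const result = prefix + name;
  // Same truncation as the Groovy method: 72 characters plus "..."
  return result.length > 75 ? result.substring(0, 72) + '...' : result;
}
```

From Groovy, the same SQL string could be run with sql.executeUpdate(...), which replaces the per-row cursor writes with one round trip.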
Let me preface by saying I'm very new to SQL (and back end design) in general. So for those annoyed with noob questions, please be gentle.
BACKGROUND:
I'm trying to build a product test database (storing test data for all our products) where I want a user to be able to refine a search to find test data they actually want. For example, they may start by searching for all products of a certain brand name, and then refine it with a product type, and/or refine it with a date range of when the test was done.
PROBLEM:
I'm having a hard time finding information on how to implement multi-parameter searches with mysql and node.js. I know you can do nested queries and joins and such within pure SQL syntax, but it's not abundantly clear to me how I would do this from node.js, especially when certain search criteria aren't guaranteed to be used.
Ex:
CREATE PROCEDURE `procedureName`(
IN brandname VARCHAR(20),
producttype VARCHAR(30))
BEGIN
SELECT * FROM products
WHERE brand = brandname
AND product_type = producttype;
END
I know how to pass data from node.js to this procedure, but what if the user didn't specify a product type? Is there a way to nullify this part of the query? Something like:
AND product_type = ALL;
WHAT I'VE TRIED:
I've also looked into nesting multiple SQL procedures, but passing in dynamic data to the "FROM" clause doesn't seem to be possible. Ex: if I had a brandname procedure, and a product type procedure, I don't know how/if I can pass the results from one procedure to the "FROM" clause of the other to actually refine the search.
One idea was to create tables with the results in each of these procedures, and pass those new table names to subsequent procedures, but that strikes me as an inefficient way to do this (Am I wrong? Is this a completely legit way to do this?).
I'm also looking into building a query string on the node side that would intelligently decide what search criteria have been specified by the front end, and figure out where to put SQL AND's and JOIN's and what-nots. The example below actually works, but this seems like it could get ugly quick as I add more search criteria, along with JOINS to other tables.
// Build a SQL query based on the parameters in a request URL
// Example request URL: http://localhost:3000/search?brand=brandName&type=productType
function qParams(req) {
  let q = 'SELECT * FROM products';
  const insert = [];
  if (req.query.brand) {
    const brandname = req.query.brand; // get brandname from url request
    q = q + ' WHERE brand = ?'; // Build brandname part of WHERE clause
    insert.push(brandname); // Add brandname to insert array to be used with query.
  }
  if (req.query.type) {
    const productType = req.query.type; // get product type from url request
    // The first search criterion starts the WHERE clause, later ones are joined with AND.
    q = q + (insert.length > 0 ? ' AND ' : ' WHERE ');
    q = q + 'product_type = ?'; // Add product_type to WHERE clause
    insert.push(productType); // Add product_type variable to insert array.
  }
  // Return query string and variable insert array
  return {
    q: q,
    insert: insert
  };
}
// Send Query
function qSend(req, res) {
  const results = qParams(req); // qParams is synchronous, so no await is needed
  // Send query string and variables to MySQL, send response to browser.
  con.query(results.q, results.insert, (err, rows) => {
    if (err) {
      // Throwing inside the callback would crash the process; report the failure instead
      res.status(500).send('Query failed');
      return;
    }
    res.send(rows); // res.send also ends the response
  });
}
// Handle GET request
router.use('/search', qSend);
CONCISE QUESTIONS:
Can I build 1 SQL procedure with all my search criteria as variables, and nullify those variables from node.js if certain criteria aren't used?
Is there a way to nest multiple MySQL procedures so I can pick the procedures applicable to the search criteria?
Is creating tables of results in a procedure, and passing those new table names to other procedures a reasonable way to do that?
Building the query from scratch in node is working, but it seems bloated. Is there a better way to do this?
Googling "multi-parameter search mysql nodejs" is not producing useful results for my question, i.e. I'm not asking the right question. What is the right question? What do I need to be researching?
One option is to use coalesce():
SELECT p.*
FROM products p
WHERE
p.brand = COALESCE(:brandname, p.brand)
AND p.product_type = COALESCE(:producttype, p.product_type);
It may be more efficient to do explicit null checks on the parameters:
SELECT p.*
FROM products p
WHERE
(:brandname IS NULL OR p.brand = :brandname)
AND (:producttype IS NULL OR p.product_type = :producttype);
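On the node.js side, a minimal sketch of feeding the null-check variant might look like the following. It assumes the positional `?` placeholders already used with con.query in the question, so each value has to be bound twice; `buildSearch` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: turns optional search criteria into the
// "(? IS NULL OR column = ?)" query plus its bound values.
function buildSearch(criteria = {}) {
  // Missing criteria become null, which disables that part of the WHERE clause
  const brand = criteria.brand ?? null;
  const type = criteria.type ?? null;
  const sql =
    'SELECT p.* FROM products p ' +
    'WHERE (? IS NULL OR p.brand = ?) ' +
    'AND (? IS NULL OR p.product_type = ?)';
  // Each value appears twice because positional placeholders cannot be reused
  return { sql, params: [brand, brand, type, type] };
}
```

The result could then be passed along as con.query(search.sql, search.params, callback), with the query text staying the same no matter which criteria were supplied.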
Something just came to mind and I'd like to bounce it off:
Say you have a user profile, with 10 fields that the user can edit, and not all of them are required. When issuing update commands, is it more efficient to either:
A) Collect all of the fields, filled in or not, and issue one all encompassing update statement to the server's DB
or
B) Use client side validation to check to see which fields have been filled out or changed, and have a selection of SQL methods that only send and update these fields
or
C) Create groupings, like "updateRequiredFields(...)" and "updateExtraFields(...)", which would issue one smaller transfer if the changes belong to only one group, but two transfers if both are edited
General consensus? Clearly option B is the far more verbose approach, I'm just wondering if it's worth coding it all out or if it'll actually make a noticeable impact on the server (think "scaled for big data").
You could do something like this on your DB update function:
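public function updateFields(array $fields) {
    $updateQuery = array();
    $params = array();
    foreach ($fields as $fieldKey => $fieldValue) {
        // if $fieldValue is false, leave it unchanged
        if ($fieldValue !== false) {
            // NOTE: $fieldKey must come from a known list of column names;
            // the value is bound through a placeholder rather than concatenated
            $updateQuery[] = $fieldKey . ' = ?';
            $params[] = $fieldValue;
        }
    }
    if (empty($updateQuery)) {
        return; // nothing changed, so there is nothing to update
    }
    $query = 'UPDATE UserInfo SET ' . implode(', ', $updateQuery) . ' WHERE ...';
    // run $query with $params through a prepared statement (e.g. PDO)
}
You just need to build the $fields array from what was modified on the client side, passing either the new value or false for each field that did not change.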
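The same idea can be sketched in JavaScript for comparison. Everything specific here is an assumption made for illustration: the `EDITABLE` column list, the `id` key column, and the `buildUpdate` name are all hypothetical, and `false` is kept as the "unchanged" marker from the answer above.

```javascript
// Hypothetical whitelist of columns the user may edit.
const EDITABLE = ['firstName', 'lastName', 'bio'];

// Builds an UPDATE that touches only the fields that actually changed.
function buildUpdate(changes, userId) {
  // Keep whitelisted columns whose value is not the "unchanged" marker
  const columns = Object.keys(changes).filter(
    (key) => EDITABLE.includes(key) && changes[key] !== false
  );
  if (columns.length === 0) {
    return null; // nothing to update, so no statement is needed
  }
  const sql =
    'UPDATE UserInfo SET ' +
    columns.map((column) => `${column} = ?`).join(', ') +
    ' WHERE id = ?'; // `id` is an assumed key column
  // Values are bound in the same order as the placeholders
  return { sql, params: [...columns.map((column) => changes[column]), userId] };
}
```

Because column names cannot be bound as parameters, the whitelist is what keeps user-supplied keys out of the SQL text.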
I've stumbled upon a very strange LINQ to SQL behaviour / bug, that I just can't understand.
Let's take the following tables as an example: Customers -> Orders -> Details.
Each table is a subtable of the previous table, with a regular Primary-Foreign key relationship (1 to many).
If I execute the following query:
var q = from c in context.Customers
select (c.Orders.FirstOrDefault() ?? new Order()).Details.Count();
Then I get an exception: Could not format node 'Value' for execution as SQL.
But the following queries do not throw an exception:
var q = from c in context.Customers
select (c.Orders.FirstOrDefault() ?? new Order()).OrderDateTime;
var q = from c in context.Customers
select (new Order()).Details.Count();
If I change my primary query as follows, I don't get an exception:
var q = from c in context.Customers.ToList()
        select (c.Orders.FirstOrDefault() ?? new Order()).Details.Count();
Now I could understand that the last query works, because of the following logic:
Since there is no mapping of "new Order()" to SQL (I'm guessing here), I need to work on a local list instead.
But what I can't understand is why do the other two queries work?!?
I could potentially accept working with the "local" version of context.Customers.ToList(), but how to speed up the query?
For instance in the last query example, I'm pretty sure that each select will cause a new SQL query to be executed to retrieve the Orders. Now I could avoid lazy loading by using DataLoadOptions, but then I would be retrieving thousands of Order rows for no reason what so ever (I only need the first row)...
If I could execute the entire query in one SQL statement as I would like (my first query example), then the SQL engine itself would be smart enough to only retrieve one Order row for each Customer...
Is there perhaps a way to rewrite my original query in such a way that it will work as intended and be executed in one swoop by the SQL server?
EDIT:
(longer answer for Arturo)
The queries I provided are purely for example purposes. I know they are pointless in their own right, I just wanted to show a simplistic example.
The reason your example works is because you have avoided using "new Order()" altogether. If I slightly modify your query to still use it, then I still get an exception:
var results = from e in (from c in db.Customers
                         select new { c.CustomerID, FirstOrder = c.Orders.FirstOrDefault() })
              select new { e.CustomerID, Count = (e.FirstOrder != null ? e.FirstOrder : new Order()).Details.Count() };
Although this time the exception is slightly different - Could not format node 'ClientQuery' for execution as SQL.
If I use the ?? syntax instead of (x ? y : z) in that query, I get the same exception as I originally got.
In my real-life query I don't need Count(); I need to select a couple of properties from the last table (which in my previous examples would be Details). Essentially I need to merge values of all the rows in each table. In order to give a more hefty example I'll first have to restate my tables:
Models -> ModelCategoryVariations <- CategoryVariations -> CategoryVariationItems -> ModelModuleCategoryVariationItemAmounts -> ModelModuleCategoryVariationItemAmountValueChanges
The -> sign represents a 1 -> many relationship. Do notice that there is one sign that is the other way round...
My real query would go something like this:
var q = from m in context.Models
        from mcv in m.ModelCategoryVariations
        ... // select some more tables
        select new
        {
            ModelId = m.Id,
            ModelName = m.Name,
            CategoryVariationName = mcv.CategoryVariation.Name,
            ..., // values from other tables
            Categories = (from cvi in mcv.CategoryVariation.CategoryVariationItems
                          let mmcvia = cvi.ModelModuleCategoryVariationItemAmounts.SingleOrDefault(mmcvia2 => mmcvia2.ModelModuleId == m.ModelModuleId) ?? new ModelModuleCategoryVariationItemAmount()
                          select new
                          {
                              cvi.Id,
                              Amount = (mmcvia.ModelModuleCategoryVariationItemAmountValueChanges.FirstOrDefault() ?? new ModelModuleCategoryVariationItemAmountValueChange()).Amount,
                              ... // select some more properties
                          })
        };
This query blows up at the line let mmcvia =.
If I recall correctly, by using let mmcvia = new ModelModuleCategoryVariationItemAmount(), the query would blow up at the next ?? operand, which is at Amount =.
If I start the query with from m in context.Models.ToList() then everything works...
Why are you looking only at the individual count without selecting anything related to the customer?
You can do the following.
var results = from e in
(from c in db.Customers
select new { c.CustomerID, FirstOrder = c.Orders.FirstOrDefault() })
select new { e.CustomerID, DetailCount = e.FirstOrder != null ? e.FirstOrder.Details.Count() : 0 };
EDIT:
OK, I think you are overcomplicating your query.
The problem is that you are using new WhateverObject() in your query, and T-SQL doesn't know anything about that. T-SQL knows about records on your hard drive; you are throwing in something that doesn't exist there. Only C# knows about it. DON'T USE new IN YOUR QUERIES OTHER THAN IN THE OUTERMOST SELECT STATEMENT, because that is what C# will receive, and C# knows about creating new instances of objects.
Of course it is going to work if you use the ToList() method, but performance suffers because now your application host and SQL Server are working together to produce the results, and it might take many calls to your database instead of one.
Try this instead:
Categories = (from cvi in mcv.CategoryVariation.CategoryVariationItems
              let mmcvia =
                  cvi.ModelModuleCategoryVariationItemAmounts.SingleOrDefault(
                      mmcvia2 => mmcvia2.ModelModuleId == m.ModelModuleId)
              select new
              {
                  cvi.Id,
                  Amount = mmcvia != null ?
                      mmcvia.ModelModuleCategoryVariationItemAmountValueChanges.Select(
                          x => x.Amount).FirstOrDefault() : 0,
                  ... // select some more properties
              })
Using the Select() method lets you get the first Amount or its default value. I used "0" only as an example; I don't know what your default value for Amount is.
Follow up to this question. I have the following code:
string[] names = new[] { "Bob", "bob", "BoB" };
using (MyDataContext dataContext = new MyDataContext())
{
    foreach (var name in names)
    {
        string s = name;
        if (dataContext.Users.SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
            dataContext.Users.InsertOnSubmit(new User { Name = name });
    }
    dataContext.SubmitChanges();
}
...and it inserts all three names ("Bob", "bob" and "BoB"). If this was Linq-to-Objects, it wouldn't.
Can I make it look at the pending changes as well as what's already in the table?
I don't think that would be possible in general. Imagine you made a query like this:
dataContext.Users.InsertOnSubmit(new User { GroupId = 1 });
var groups = dataContext.Groups.Where(grp => grp.Users.Any());
The database knows nothing about the new user (yet) because the insert hasn't been committed yet, so the generated SQL query might not return the Group with Id = 1. The only way the DataContext could take the not-yet-submitted insert into account in cases like this would be to fetch the whole Groups table (and possibly more tables, if they are affected by the query) and perform the query on the client, which is of course undesirable. I guess the L2S designers decided that it would be counterintuitive if some queries took not-yet-committed inserts into account while others didn't, so they chose to never take them into account.
Why don't you use something like
foreach (var name in names.Distinct(StringComparer.InvariantCultureIgnoreCase))
to filter out duplicate names before hitting the database?
Why don't you try something like this
foreach (var name in names)
{
    string s = name;
    if (dataContext.Users.SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
    {
        dataContext.Users.InsertOnSubmit(new User { Name = name });
        break;
    }
}
I am sorry, I don't understand LINQ to SQL as much.
But, when I look at the code, it seems you are telling it to insert all the records at once (similar to a transaction) using SubmitChanges and you are trying to check the existence of it from the DB, when the records are not inserted at all.
EDIT: Try putting the SubmitChanges inside the loop and see that the code will run as per your expectation.
You can query the appropriate ChangeSet collection, such as
if(
dataContext.Users.
Union(dataContext.GetChangeSet().Inserts).
Except(dataContext.GetChangeSet().Deletes).
SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
This will create a union of the values in the Users table and the pending Inserts, and will exclude pending deletes.
Of course, you might want to create a changeSet variable to prevent multiple calls to the GetChangeSet function, and you may need to appropriately cast the object in the collection to the appropriate type. In the Inserts and Deletes collections, you may want to filter it with something like
...GetChangeSet().Inserts.Where(o => o.GetType() == typeof(User)).OfType<User>()...
I'm a beginner with LINQ, so I need some help.
If I build a query in two or more steps, as in the following example, are the records downloaded from SQL Server immediately at each step, or is the query only sent to the server at the moment I start to read the data, for example when I bind the result to a control (such as a DataGrid)?
System.Linq.IQueryable<Panorami> Result = db.Panorami;
byte FoundOneContion = 0;
//step 1
if (!string.IsNullOrEmpty(Title))
{
    Result = Result.Where(p => SqlMethods.Like(p.Title, "%" + Title + "%"));
    FoundOneContion = 1;
}
//step 2
if (!string.IsNullOrEmpty(Subject))
{
    Result = Result.Where(p => SqlMethods.Like(p.Subject, "%" + Subject + "%"));
    FoundOneContion = 1;
}
if (FoundOneContion == 0)
{
    return null;
}
else
{
    return Result.OrderBy(p => p.Title).Skip(PS * CP).Take(PS);
}
If LINQ unfortunately does download all the records immediately (so my doubt was justified), is there any syntax to work around the problem? For example, the ternary operator ( condition ? true part : false part )?
I would appreciate any suggestions. Thanks, everyone!
The above method does not enumerate the query - therefore no database calls are made. The query is constructed and not executed.
You can enumerate the query by calling foreach, or calling some method that calls foreach (such as ToList, ToArray), or by calling GetEnumerator(). This will cause the query to execute.
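To make the idea of deferred execution concrete, here is a loose analogy in JavaScript using generators. It is not LINQ and does not show how LINQ to SQL is implemented; `table`, `where`, and the sample rows are all hypothetical, and the `reads` counter simply stands in for "a database call was made".

```javascript
// Counts how many times the underlying "table" is actually read.
let reads = 0;

// Hypothetical data source; its body only runs once enumeration starts.
function* table() {
  reads += 1;
  yield { title: 'Lake Como', subject: 'lakes' };
  yield { title: 'Mont Blanc', subject: 'mountains' };
  yield { title: 'Lake Garda', subject: 'lakes' };
}

// Wraps a source with a filter without consuming it.
function* where(source, predicate) {
  for (const row of source) {
    if (predicate(row)) yield row;
  }
}

// Build the query in steps, as in the question; nothing executes here.
let query = table();
query = where(query, (row) => row.title.includes('Lake'));
query = where(query, (row) => row.subject.includes('lake'));
const readsBefore = reads; // still 0: the query is only a description so far

// Enumerating the query (like foreach or ToList) runs the whole pipeline once.
const rows = [...query];
```

After the last line `reads` is 1 and `rows` holds the two lake entries, which mirrors how the stepwise Where calls above add conditions without touching the database until the result is enumerated.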