I have a function (called "powersearch", the irony!) that searches for a set of strings across a handful (about 5) of fields.
The words come in as one string and are separated by spaces.
Some fields can have exact matches, others should have "contains".
(Snipped for brevity)
//Start with all colors
IQueryable<Color> q = db.Colors;
//Filter by powersearch
if (!string.IsNullOrEmpty(searchBag.PowerSearchKeys)){
foreach (string key in searchBag.SplitSearchKeys(searchBag.PowerSearchKeys)
.Where(k=> !string.IsNullOrEmpty(k))){
//Make a local copy of the var, otherwise it gets overwritten
string myKey = key;
int year;
if (int.TryParse(myKey, out year) && year > 999){
q = q.Where(c => c.Company.Name.Contains(myKey)
|| c.StockCode.Contains(myKey)
|| c.PaintCodes.Any(p => p.Code.Equals(myKey))
|| c.Names.Any(n => n.Label.Contains(myKey))
|| c.Company.CompanyModels.Any(m => m.Model.Name.Contains(myKey))
|| c.UseYears.Any(y => y.Year.Equals(year))
);
}
else{
q = q.Where(c => c.Company.Name.Contains(myKey)
|| c.StockCode.Contains(myKey)
|| c.PaintCodes.Any(p => p.Code.Contains(myKey))
|| c.Names.Any(n => n.Label.Contains(myKey))
|| c.Company.CompanyModels.Any(m => m.Model.Name.Equals(myKey))
);
}
}
}
Because the UseYears table is rather large, I try to check it as rarely as possible by ruling out all numbers that could never be a sensible year in this case. Similar checks are not possible on the other fields, since they can contain pretty much any conceivable string.
Currently this query takes about 15 secs for a single, non-year string. That's too much.
Anything I can do to improve this?
--Edit--
Profiler shows me the following info for the part where the string is not a year:
exec sp_reset_connection
Audit login
exec sp_executesql N'
SELECT COUNT(*) AS [value]
FROM [dbo].[CLR] AS [t0]
INNER JOIN [dbo].[CO] AS [t1] ON [t1].[CO_ID] = [t0].[CO_ID]
WHERE
([t1].[LONG_NM] LIKE @p0)
OR ([t0].[EUR_STK_CD] LIKE @p1)
OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[PAINT_CD] AS [t2]
WHERE ([t2].[PAINT_CD] LIKE @p2)
AND ([t2].[CLR_ID] = [t0].[CLR_ID])
AND ([t2].[CUSTOM_ID] = [t0].[CUSTOM_ID])
)
)
OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[CLR_NM] AS [t3]
WHERE ([t3].[CLR_NM] LIKE @p3)
AND ([t3].[CLR_ID] = [t0].[CLR_ID])
AND ([t3].[CUSTOM_ID] = [t0].[CUSTOM_ID])
)
)
OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[CO_MODL] AS [t4]
INNER JOIN [dbo].[MODL] AS [t5] ON [t5].[MODL_ID] = [t4].[MODL_ID]
WHERE ([t5].[MODL_NM] = @p4)
AND ([t4].[CO_ID] = [t1].[CO_ID])
)
)
',N'@p0 varchar(10),@p1 varchar(10),@p2 varchar(10),@p3 varchar(10),@p4 varchar(8)',@p0='%mercedes%',@p1='%mercedes%',@p2='%mercedes%',@p3='%mercedes%',@p4='mercedes'
(took 3626 msecs)
Audit Logout (3673 msecs)
exec sp_reset_connection (0msecs)
Audit login
exec sp_executesql N'
SELECT TOP (30)
[t0].[CLR_ID] AS [Id],
[t0].[CUSTOM_ID] AS [CustomId],
[t0].[CO_ID] AS [CompanyId],
[t0].[EUR_STK_CD] AS [StockCode],
[t0].[SPCL_USE_CD] AS [UseCode],
[t0].[EFF_IND] AS [EffectIndicator]
FROM [dbo].[CLR] AS [t0]
INNER JOIN [dbo].[CO] AS [t1] ON [t1].[CO_ID] = [t0].[CO_ID]
WHERE
([t1].[LONG_NM] LIKE @p0)
OR ([t0].[EUR_STK_CD] LIKE @p1)
OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[PAINT_CD] AS [t2]
WHERE ([t2].[PAINT_CD] LIKE @p2)
AND ([t2].[CLR_ID] = [t0].[CLR_ID])
AND ([t2].[CUSTOM_ID] = [t0].[CUSTOM_ID])
)
)
OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[CLR_NM] AS [t3]
WHERE ([t3].[CLR_NM] LIKE @p3)
AND ([t3].[CLR_ID] = [t0].[CLR_ID])
AND ([t3].[CUSTOM_ID] = [t0].[CUSTOM_ID])
)
)
OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[CO_MODL] AS [t4]
INNER JOIN [dbo].[MODL] AS [t5] ON [t5].[MODL_ID] = [t4].[MODL_ID]
WHERE ([t5].[MODL_NM] = @p4)
AND ([t4].[CO_ID] = [t1].[CO_ID])
)
)'
,N'@p0 varchar(10),@p1 varchar(10),@p2 varchar(10),@p3 varchar(10),@p4 varchar(8)',@p0='%mercedes%',@p1='%mercedes%',@p2='%mercedes%',@p3='%mercedes%',@p4='mercedes'
(took 3368 msecs)
The database structure, sadly, is not under my control. It comes from the US and has to stay in the exact same format for compatibility reasons. Although most of the important fields are indeed indexed, they are indexed in (unnecessary) clustered primary keys. There's very little I can do about that.
Okay, let's break this down - the test case you're interested in first is a single non-year, so all we've got is this:
q = q.Where(c => c.Company.Name.Contains(myKey)
|| c.StockCode.Contains(myKey)
|| c.PaintCodes.Any(p => p.Code.Contains(myKey))
|| c.Names.Any(n => n.Label.Contains(myKey))
|| c.Company.CompanyModels.Any(m => m.Model.Name.Equals(myKey))
);
Am I right? If so, what does the SQL look like? How long does it take just to execute the SQL statement in SQL Profiler? What does the profiler say the execution plan looks like? Have you got indexes on all of the appropriate columns?
Use compiled queries.
If you don't, you can lose up to 5-10x in performance, as LINQ to SQL has to generate the SQL from the query every time you call it.
Things become worse when you use non-constants in LINQ-to-SQL as getting their values is really slow.
This assumes that you already have indexes and sane DB schema.
BTW, I am not kidding about 5-10x part.
I'm trying to figure out how to write the following query to fetch some elements which have multiple categories.
$query->matching(
$query->logicalAnd(
[
// the following 4 lines are the problem lines
$query->logicalAnd(
$query->in('categories.uid', $categories),
$query->in('categories.uid', $countryCategories)
),
// $query->in('categories.uid', $categories),
// $query->in('categories.uid', $countryCategories),
$query->logicalOr(
[
$query->equals('is_pinned', 0),
$query->lessThan('pinned_until', time())
]
),
]
)
);
The idea is to fetch the elements where categories.uid match at least one uid in $categories and at least one in $countryCategories. Both $categories and $countryCategories are arrays filled with category uids.
The query worked fine until the second line $query->in('categories.uid' [...] was inserted. As soon as that line is inserted, the query result is empty. It's probably an error in the query, but neither my colleague nor I could find a working solution.
While searching I found SQL's UNION, which I've never worked with before, but I guessed it would be the way to go if I had to write the statement by hand instead of building the query.
What I would like to know is whether it is possible to fetch the elements with the "query builder", or whether it is really necessary to write a statement. If there is a solution with the query builder, could you point it out for me? If not, how would I build the query with UNION to fetch the elements as required?
If something is unclear, please do not hesitate to ask, I will try to specify further. Thanks.
EDIT
We've debugged the query too, and I executed it in phpMyAdmin directly. It worked without "AND (sys_category.uid IN ( 41, 2 ))", but with it the result is empty. The following was the debugged query:
SELECT `tx_gijakobnews_domain_model_news`.*
FROM `tx_gijakobnews_domain_model_news` `tx_gijakobnews_domain_model_news`
LEFT JOIN `sys_category_record_mm` `sys_category_record_mm` ON ( `tx_gijakobnews_domain_model_news`.`uid` = `sys_category_record_mm`.`uid_foreign`) AND (( `sys_category_record_mm`.`tablenames` = 'tx_gijakobnews_domain_model_news') AND ( `sys_category_record_mm`.`fieldname` = 'categories'))
LEFT JOIN `sys_category` `sys_category` ON `sys_category_record_mm`.`uid_local` = `sys_category`.`uid`
WHERE ((
(`sys_category`.`uid` IN ( 15, 17, 10, 11, 12, 16, 13, 14 ))
////// this following line is where the problem begins
AND (`sys_category`.`uid` IN ( 41, 2 ))
)
/////////// the following lines are additional restrictions
/////////// which have no influence on the problem
AND ((`tx_gijakobnews_domain_model_news`.`is_pinned` = 0) OR ( `tx_gijakobnews_domain_model_news`.`pinned_until` < 1560867383))
)
AND ( `tx_gijakobnews_domain_model_news`.`sys_language_uid` IN ( 0, -1) )
AND ( `tx_gijakobnews_domain_model_news`.`pid` = 31)
AND ( ( `tx_gijakobnews_domain_model_news`.`deleted` = 0)
AND ( `tx_gijakobnews_domain_model_news`.`t3ver_state` <= 0)
AND ( `tx_gijakobnews_domain_model_news`.`pid` <> -1)
AND ( `tx_gijakobnews_domain_model_news`.`hidden` = 0)
AND ( `tx_gijakobnews_domain_model_news`.`starttime` <= 1560867360)
AND ( ( `tx_gijakobnews_domain_model_news`.`endtime` = 0)
OR ( `tx_gijakobnews_domain_model_news`.`endtime` > 1560867360) ) )
AND ( ( ( `sys_category`.`deleted` = 0)
AND ( `sys_category`.`t3ver_state` <= 0)
AND ( `sys_category`.`pid` <> -1)
AND ( `sys_category`.`hidden` = 0)
AND ( `sys_category`.`starttime` <= 1560867360)
AND ( ( `sys_category`.`endtime` = 0)
OR ( `sys_category`.`endtime` > 1560867360) ) )
OR ( `sys_category`.`uid`
IS NULL) )
ORDER BY `tx_gijakobnews_domain_model_news`.`publish_date` DESC
If there's a missing bracket, I probably removed it accidentally while formatting...
I believe the problem is that the WHERE clause is applied on a per-row basis.
That means if you have a query like the following (based on your query):
SELECT *
FROM news
LEFT JOIN sys_category_record_mm mm
ON (news.uid = mm.uid_foreign) /* AND (...) */
LEFT JOIN sys_category
ON mm.uid_local = sys_category.uid
WHERE
sys_category.uid IN (1,2,3)
AND sys_category.uid IN (4,5,6)
You might have one news entry that is in category 1 and in category 4. But the result set would contain two distinct rows:
news.uid | sys_category.uid
1 | 1
1 | 4
and the WHERE clause filters both of them out, because sys_category.uid is never both in (1, 2, 3) and in (4, 5, 6) within a single row.
The way to do that at the SQL level would probably be to join sys_category twice. But I do not believe that's possible with the (rather simple) Extbase query builder.
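The per-row behaviour and the double-join idea can be reproduced in a small sketch. It uses Python with an in-memory SQLite database as a stand-in for MySQL, and the tables `news` and `mm` and their rows are made up for illustration (they are not the real TYPO3 schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (uid INTEGER PRIMARY KEY);
    CREATE TABLE mm (uid_local INTEGER, uid_foreign INTEGER);
    INSERT INTO news (uid) VALUES (1), (2);
    -- news 1 is in categories 1 and 4; news 2 is only in category 1
    INSERT INTO mm (uid_local, uid_foreign) VALUES (1, 1), (4, 1), (1, 2);
""")

# Single join: every joined row carries exactly one category uid,
# so it can never satisfy both IN lists at the same time.
single_join = conn.execute("""
    SELECT DISTINCT news.uid
    FROM news
    LEFT JOIN mm ON news.uid = mm.uid_foreign
    WHERE mm.uid_local IN (1, 2, 3)
      AND mm.uid_local IN (4, 5, 6)
""").fetchall()

# Two joins: each alias of the relation table is checked against its own list.
double_join = conn.execute("""
    SELECT DISTINCT news.uid
    FROM news
    JOIN mm AS a ON news.uid = a.uid_foreign
    JOIN mm AS b ON news.uid = b.uid_foreign
    WHERE a.uid_local IN (1, 2, 3)
      AND b.uid_local IN (4, 5, 6)
""").fetchall()

print(single_join)  # []
print(double_join)  # [(1,)]
```

The same SELECT with two aliases should carry over to MySQL, where it could be passed to a raw statement; the enable-field conditions from the real query would still need to be added.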
Edit:
As a solution, you could use the $query->statement() method, which lets you run custom SQL queries.
$result = $query->statement('SELECT news.* FROM news');
https://docs.typo3.org/m/typo3/book-extbasefluid/master/en-us/6-Persistence/3-implement-individual-database-queries.html
You could build your own custom Query with the QueryBuilder. Something like this:
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use TYPO3\CMS\Extbase\Utility\DebuggerUtility;
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
->getQueryBuilderForTable('table_to_select_from');
$result = $queryBuilder->select('*')
->from('table_to_select_from')
->where($queryBuilder->expr()->in('field', ['1','2','3']))
->execute()
->fetchAll();
DebuggerUtility::var_dump($result);
Here's the documentation:
https://docs.typo3.org/m/typo3/reference-coreapi/master/en-us/ApiOverview/Database/QueryBuilder/Index.html
I did it way simpler in the end.
Instead of adding both restrictions by the query, I looped through the results restricted by the first sys_category-condition and then removed those which didn't meet the second sys_category-restrictions.
Repository
$query->matching(
$query->logicalAnd([
$query->in('categories.uid', $categories),
$query->logicalOr(
[
$query->equals('is_pinned', 0),
$query->lessThan('pinned_until', time())
]
),
])
);
Controller
public function getRestrictedNews($news, $countryCategories) {
$newNews = array();
foreach ($news as $newsItem) {
$newsCategories = $newsItem->getCategories();
$shouldKeep = false;
foreach ($newsCategories as $categoryItem) {
if (in_array($categoryItem->getUid(), $countryCategories)) {
$shouldKeep = true;
}
}
if ($shouldKeep) {
array_push($newNews, $newsItem);
}
}
return $newNews;
}
It may not be the best solution, but it's one that works. :-)
I am trying to do some optimisation. Currently, post-MySQL work is done on the results to set a new parameter $class_subject, so I am trying to get this calculated in MySQL already...
SELECT
class_grade.results as results,
subjects.subject as subject,
subjects_pseudonyms.pseudonym as pseudonym,
IF( subjects_pseudonyms.pseudonym = null, subjects.subject, subjects_pseudonyms.pseudonym ) as class_subject
FROM
class_grade
INNER JOIN class ON class_grade.class_ID = class.class_ID
INNER JOIN subjects ON class.subject_ID = subjects.a_ID
LEFT JOIN subjects_pseudonyms ON class.subject_pseudonym_ID = subjects_pseudonyms.a_ID
WHERE
class_grade.teacher_ID = :teacher_id AND
class_grade.class_ID = :current_class_ID AND
class_grade.report_set_ID = :report_set_ID AND
class_grade.student_ID = :current_student_ID
In the above query the pseudonym might be null, if so I am attempting to set a new variable class_subject to be either subject or pseudonym...
The query runs fine, a results example is:
[results] => 71
[subject] => Law
[pseudonym] =>
[class_subject] =>
The problem is, the class_subject is not being populated..
Is there something wrong with my IF() cond?
Thanks,
John
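You need to use IS NULL (or the ISNULL() function) instead of = NULL, e.g. IF( subjects_pseudonyms.pseudonym IS NULL, subjects.subject, subjects_pseudonyms.pseudonym ):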
http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_isnull
ISNULL() can be used instead of = to test whether a value is NULL.
(Comparing a value to NULL using = always yields false.)
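A minimal sketch of the difference, using Python with an in-memory SQLite database as a stand-in for MySQL. SQLite's CASE expression is used in place of MySQL's IF(), and the table `t` with its single row is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (subject TEXT, pseudonym TEXT)")
conn.execute("INSERT INTO t VALUES ('Law', NULL)")

broken, fixed, coalesced = conn.execute("""
    SELECT
        -- '= NULL' is never true, so the ELSE branch returns the NULL pseudonym
        CASE WHEN pseudonym = NULL THEN subject ELSE pseudonym END,
        -- 'IS NULL' is the correct test
        CASE WHEN pseudonym IS NULL THEN subject ELSE pseudonym END,
        -- COALESCE returns the first non-NULL argument
        COALESCE(pseudonym, subject)
    FROM t
""").fetchone()

print(broken)     # None
print(fixed)      # Law
print(coalesced)  # Law
```

COALESCE also exists in MySQL, so COALESCE(subjects_pseudonyms.pseudonym, subjects.subject) would be a shorter way to express the same fallback.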
I have this mysql query:
SELECT
freeAnswers.*,
(SELECT `districtCode`
FROM `geodatas`
WHERE `zipCode` = clients.zipCode
GROUP BY `zipCode`
LIMIT 0, 1) as districtCode,
clients.zipCode,
clients.gender,
clients.startAge,
clients.endAge,
clients.mail,
clients.facebook,
surveys.customerId,
surveys.activityId,
surveys.name as surveyName,
customers.companyName,
activities.name as activityName
FROM freeAnswers,
clients,
surveys,
customers,
activities
WHERE freeAnswers.surveyId = surveys.id
AND surveys.customerId = customers.id
AND activities.id = surveys.activityId
AND clients.id = freeAnswers.clientId
AND customers.id = 1
ORDER BY activityName asc
LIMIT 0, 10
The query runs correctly on my MySQL server, but when I try to use it in a Zend Framework 1.11 model
I get this error: Mysqli prepare error: Operand should contain 1 column(s)
Could anyone please help me get it running?
Best Regards,
Elaidon
Here is some code that should work. Zend_Db_Select doesn't really provide a way to select from multiple tables in the FROM clause without using a JOIN so this feels a bit hackish to me in regards to one small part of the query. Your best bet will probably be to rewrite the query using JOINs where appropriate.
$subselect = $db->select()
->from('geodatas', 'districtCode')
->where('zipCode = clients.zipCode')
->group('zipCode')
->limit(1, 0);
$from = $db->quoteIdentifier('freeAnswers') . ', ' .
$db->quoteIdentifier('clients') . ', ' .
$db->quoteIdentifier('surveys') . ', ' .
$db->quoteIdentifier('customers') . ', ' .
$db->quoteIdentifier('activities');
$select = $db->select()
->from(array('activities' => new Zend_Db_Expr($from)),
array('freeAnswers.*',
'districtCode' =>
new Zend_Db_Expr('(' . $subselect . ')'),
'clients.zipCode', 'clients.gender', 'clients.startAge',
'clients.endAge', 'clients.mail', 'clients.facebook',
'surveys.customerId', 'surveys.activityId',
'surveyName' => 'surveys.name', 'customers.companyName',
'activityName' => 'activities.name'))
->where('freeAnswers.surveyId = surveys.id')
->where('surveys.customerId = customers.id')
->where('activities.id = surveys.activityId')
->where('clients.id = freeAnswers.clientId')
->where('customers.id = ?', 1)
->order('activityName ASC')
->limit(10, 0);
The only reason I say it is hackish is because of the line:
->from(array('activities' => new Zend_Db_Expr($from)),
Since from() really only works with one table, I create a Zend_Db_Expr and specify the correlation as the last table name in the expression. If you don't pass a Zend_Db_Expr, it will either quote your comma separated table name incorrectly, or if you pass an array of table names, it just uses the first. When you pass a Zend_Db_Expr with no name, it defaults to use AS t which also doesn't work in your case. That is why I put it as is.
That returns the exact SQL you provided except for the last thing mentioned. Here is actually what it returns:
SELECT
`freeAnswers`.*,
(SELECT `geodatas`.`districtCode`
FROM `geodatas`
WHERE (zipCode = clients.zipCode)
GROUP BY `zipCode`
LIMIT 1) AS `districtCode`,
`clients`.`zipCode`,
`clients`.`gender`,
`clients`.`startAge`,
`clients`.`endAge`,
`clients`.`mail`,
`clients`.`facebook`,
`surveys`.`customerId`,
`surveys`.`activityId`,
`surveys`.`name` AS `surveyName`,
`customers`.`companyName`,
`activities`.`name` AS `activityName`
FROM `freeAnswers`,
`clients`,
`surveys`,
`customers`,
`activities` AS `activities`
WHERE (freeAnswers.surveyId = surveys.id)
AND (surveys.customerId = customers.id)
AND (activities.id = surveys.activityId)
AND (clients.id = freeAnswers.clientId)
AND (customers.id = 1)
ORDER BY `activityName` ASC
LIMIT 10
So that will work but eventually you will want to rewrite it using JOIN instead of specifying most of the WHERE clauses.
When dealing with subqueries and Zend_Db_Select, I find it easiest to write each subquery as its own query before writing the final query, then insert the subqueries where they need to go; Zend_Db handles the rest.
Hope that helps.
Given the following tables:
Orders (OrderID, OrderStatus, OrderNumber)
OrderItems(OrderItemID, OrderID, ItemID, OrderItemStatus)
Orders: 2537 records
Order Items: 1319 records
I have created indexes on
Orders(OrderStatus)
OrderItems(OrderID)
OrderItems(OrderItemStatus)
I have the following SQL statement (generated by LinqToSql) which when executed, has:
- duration = 8789
- reads = 7809.
exec sp_executesql N'SELECT COUNT(*) AS [value]
FROM [dbo].[Orders] AS [t0]
WHERE ([t0].[OrderStatus] = @p0) OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[OrderItems] AS [t1]
WHERE ([t1].[OrderID] = [t0].[OrderID]) AND ([t1].[OrderItemStatus] = @p1)
))',N'@p0 nvarchar(2),@p1 nvarchar(2)',@p0=N'KE',@p1=N'KE'
Is there anything else which I can do to make it faster?
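Make all those nvarchar parameters varchar if the columns in the table are varchar; otherwise SQL Server has to implicitly convert the column values, which can keep it from using the indexes efficiently:
))',N'@p0 varchar(2),@p1 varchar(2)',@p0='KE',@p1='KE'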
See also here: sp_executesql causing my query to be very slow
Count on a single index rather than *
This might generate some better sql.
IQueryable<int> query1 =
from oi in db.OrderItems
where oi.OrderItemStatus == theItemStatus
select oi.OrderID;
IQueryable<int> query2 =
from o in db.Orders
where o.OrderStatus == theOrderStatus
select o.OrderID;
IQueryable<int> query3 = query1.Concat(query2).Distinct();
int result = query3.Count();
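As a rough check that the Concat/Distinct rewrite counts the same orders as the original OR EXISTS query, here is a sketch in Python with an in-memory SQLite database standing in for SQL Server. The sample rows are invented, and the equivalence assumes every OrderItems.OrderID refers to an existing order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderStatus TEXT);
    CREATE TABLE OrderItems (
        OrderItemID INTEGER PRIMARY KEY,
        OrderID INTEGER REFERENCES Orders(OrderID),
        OrderItemStatus TEXT
    );
    INSERT INTO Orders VALUES (1, 'KE'), (2, 'OK'), (3, 'OK'), (4, 'KE');
    INSERT INTO OrderItems VALUES
        (10, 2, 'KE'), (11, 2, 'KE'), (12, 3, 'OK'), (13, 4, 'KE');
""")

# Shape of the original generated query: status match OR a matching item exists.
(or_exists_count,) = conn.execute("""
    SELECT COUNT(*)
    FROM Orders AS o
    WHERE o.OrderStatus = ?
       OR EXISTS (SELECT 1 FROM OrderItems AS i
                  WHERE i.OrderID = o.OrderID AND i.OrderItemStatus = ?)
""", ("KE", "KE")).fetchone()

# Shape of the rewrite: collect order ids from both sides, de-duplicate, count.
(union_count,) = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT OrderID FROM OrderItems WHERE OrderItemStatus = ?
        UNION
        SELECT OrderID FROM Orders WHERE OrderStatus = ?
    )
""", ("KE", "KE")).fetchone()

print(or_exists_count, union_count)  # 3 3
```

Whether the second shape is actually faster depends on the execution plan, so it is worth comparing both in the profiler against the real data.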
I am checking login of a user by this repository method,
public bool getLoginStatus(string emailId, string password)
{
var query = from r in taxidb.Registrations
where (r.EmailId == emailId && r.Password==password)
select r;
if (query.Count() != 0)
{
return true;
}
return false;
}
I saw in one of the previous questions that !query.Any() would be faster... Which should I use? Any suggestions?
The SQL generated will be different between the two calls. You can check by setting your context.Log property to Console.Out or something.
Here's what it will be:
SELECT COUNT(*) AS [value]
FROM [dbo].[Registrations] AS [t0]
WHERE [t0].[EmailId] = @p0 and [t0].Password = @p1
SELECT
(CASE
WHEN EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Registrations] AS [t0]
WHERE [t0].[EmailId] = @p0 and [t0].Password = @p1
) THEN 1
ELSE 0
END) AS [value]
In this case, I doubt it will make any difference, because EmailId is probably covered by a unique index, so there can only be one result. In other cases, where the count can be > 1, Any would be preferable, because the second query allows SQL Server to short-circuit the search: it only needs to find one row to prove that any exist.
You could express it quite a bit shorter like this:
return taxidb.Registrations.Any(r => r.EmailId == emailId && r.Password==password);
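To compare the two query shapes side by side, here is a minimal sketch in Python with an in-memory SQLite database standing in for SQL Server. The table contents and the function names `login_by_count` and `login_by_exists` are hypothetical, and the plain-text password column only mirrors the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Registrations (EmailId TEXT, Password TEXT)")
conn.execute("INSERT INTO Registrations VALUES ('a@example.com', 'secret')")


def login_by_count(email: str, password: str) -> bool:
    # Count() != 0 shape: the database counts every matching row.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM Registrations WHERE EmailId = ? AND Password = ?",
        (email, password),
    ).fetchone()
    return n != 0


def login_by_exists(email: str, password: str) -> bool:
    # Any() shape: the database may stop at the first matching row.
    (found,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM Registrations"
        " WHERE EmailId = ? AND Password = ?)",
        (email, password),
    ).fetchone()
    return bool(found)


print(login_by_count("a@example.com", "secret"))   # True
print(login_by_exists("a@example.com", "secret"))  # True
print(login_by_exists("a@example.com", "wrong"))   # False
```

Both functions return the same answer; the difference is only in how much work the database may have to do when many rows match.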