Django ORM query field weight? - mysql

I'm doing the following query:
People.objects.filter(
Q(name__icontains='carolina'),
Q(state__icontains='carolina'),
Q(address__icontains='carolina'),
)[:9]
I want the first results of the query to be the people who is named "Carolina" (and also matches other fields, but name first). The problem is that I don't think is any way to determine a field "weight" or "priority".
Any idea?
Thanks!

You'll need to do 3 queries for this to work:
names_match = People.objects.filter(name__icontains='carolina')[:9]
states_match = People.objects.filter(state__icontains='carolina')[:9]
addresses_match = People.objects.filter(address__icontains='carolina')[:9]
all_objects = list(names_match) + list(states_match) + list(addresses_match)
all_objects = all_objects[:9]
There are two problems with this approach, which are fairly easily worked round:
It does unnecessary queries (what if names_match contained enough items already).
It allows for duplicates (what if someone in North Carolina is called Carolina?)

This should work:
qs = People.objects.filter(name__icontains='carolina') | People.objects.filter( Q(state__icontains = 'carolina'), Q(address__icontains='carolina')).distinct()
qs = list(qs)[:9]
Or if you want a pure duplicate free list:
qs = list(set(qs))[:9] #for a duplicate free list

Related

Django: ORM/SQL query speed significantly decreased after adding additional BooleanField or (SQL tinyint) to Django Filter

Using MySQL Latest Django:
I have a vaguely complex Django query that works quite quickly--until I add an additional "AND" with a Boolean Field--
See Below:
queriedForms = queryFormtype.form_set.filter(is_public=True)
newQuery = queriedForms.filter(formrecordattributevalue__record_value__icontains=term['TVAL'], formrecordattributevalue__record_attribute_type__pk=rtypePK)
newQuery = newQuery.filter(flagged_for_deletion=False)
logger.info(newQuery.query)
term['count'] = newQuery.count()
If I either remove the initial "is_public=True" or the final "flagged_for_deletion=False)--it works incredibly fast. If I use both as filters, it increases the time for the count() function by something like 2000%
The different QuerySet.query outputs are below:
SELECT `maqluengine_form`.`id`, `maqluengine_form`.`form_name`, `maqluengine_form`.`form_number`, `maqluengine_form`.`form_geojson_string`, `maqluengine_form`.`hierarchy_parent_id`, `maqluengine_form`.`is_public`, `maqluengine_form`.`project_id`, `maqluengine_form`.`date_created`, `maqluengine_form`.`created_by_id`, `maqluengine_form`.`date_last_modified`, `maqluengine_form`.`modified_by_id`, `maqluengine_form`.`sort_index`, `maqluengine_form`.`form_type_id`, `maqluengine_form`.`flagged_for_deletion` FROM `maqluengine_form` INNER JOIN `maqluengine_formrecordattributevalue` ON (`maqluengine_form`.`id` = `maqluengine_formrecordattributevalue`.`form_parent_id`) WHERE (`maqluengine_form`.`form_type_id` = 319 AND `maqluengine_form`.`is_public` = True AND `maqluengine_formrecordattributevalue`.`record_value` LIKE %seal% AND `maqluengine_formrecordattributevalue`.`record_attribute_type_id` = 18510 AND `maqluengine_form`.`flagged_for_deletion` = False)
SELECT `maqluengine_form`.`id`, `maqluengine_form`.`form_name`, `maqluengine_form`.`form_number`, `maqluengine_form`.`form_geojson_string`, `maqluengine_form`.`hierarchy_parent_id`, `maqluengine_form`.`is_public`, `maqluengine_form`.`project_id`, `maqluengine_form`.`date_created`, `maqluengine_form`.`created_by_id`, `maqluengine_form`.`date_last_modified`, `maqluengine_form`.`modified_by_id`, `maqluengine_form`.`sort_index`, `maqluengine_form`.`form_type_id`, `maqluengine_form`.`flagged_for_deletion` FROM `maqluengine_form` INNER JOIN `maqluengine_formrecordattributevalue` ON (`maqluengine_form`.`id` = `maqluengine_formrecordattributevalue`.`form_parent_id`) WHERE (`maqluengine_form`.`form_type_id` = 319 AND `maqluengine_form`.`is_public` = True AND `maqluengine_formrecordattributevalue`.`record_value` LIKE %seal% AND `maqluengine_formrecordattributevalue`.`record_attribute_type_id` = 18510)
The first takes about 20/30 seconds to perform the count(), while the second with only 1 of the two BooleanField's takes less than a second to perform the count()
=======================================
EDIT=======================
Apologies: since the question isn't obvious enough--why is adding an additional AND with a BooleanField increasing the query time by +2000%? Is anyone able to assist in figuring out WHY that's occurring. Thanks.
EDIT=========================
Also discovered that using a exclude(is_public=False) rather than filter(is_public=True) has the same effect as the solution below. Does anyone happen to know why an exclude() works fine--whereas the filter() does not?
==============================
Solution I came up with after a night's rest:
--I keep the query as is(I need it for later because it continues getting chain filtered)
--I need the count() from this stage--which is taking substantially longer than it should with the additional BooleanField AND
--I take a temporary values list to perform a len() on instead:
queriedForms = queryFormtype.form_set.all()
newQuery = queriedForms.filter(formrecordattributevalue__record_value__icontains=term['TVAL'], formrecordattributevalue__record_attribute_type__pk=rtypePK)
newQuery = newQuery.filter(flagged_for_deletion=False)
tempQuery = newQuery.values_list('is_public',flat=True)
finalQuery = [entry for entry in tempQuery if entry != 'False'] #Remove any indices that contain "False"
term['count'] = len(finalQuery)
The following counts that use chained filters after use the same technique--it's significantly faster--if not as fast as removing one of the Booleans from the filters.

VBA code to analyze a HTML table based off certain conditions

So I need to screen scrape data off a website and return it to a spreadsheet based off if a charge amount matched as well the date was the most recent in the table. If there was simply one line in the table, the macro pulls that accordingly. So most of the code is good, I am connected to the website, pulling everything effectively. Where I am struggling is getting the logic to work where the two amounts match as well as the date being the most recent in the HTML table.
I guess what my question is how do I loop through Item(5) the column of that table and specify it to choose the most recent date, also setting the value so that it only finds the one equal to the charge amount. I only want a one to one match. I am new to this so if anyone wants to help me I would greatly appreciate it.
Set IHEC = iHTMLDoc.getElementsByTagName("TR")
If IHEC.Length > 2 Then
For index = 0 to IHEC.Length - 1
Set IHEC_TD = IHEC.Item(index).getElementsByTagName("TD")
Do Until IHEC.Length <2 Or index = IHEC.Length - 1
If IHEC.TD.Item(3).innerText = myBilledAmount Then
myItem1 = IHEC_TDItem(0).innerText
myItem2 = IHEC_TDItem(1).innerText
myItem3 = IHEC_TDItem(2).innerText
myItem4 = IHEC_TDItem(3).innerText
myItem5 = IHEC_TDItem(4).innerText
myItem6 = IHEC_TDItem(5).innerText
myItem7 = IHEC_TDItem(6).innerText
myItem8 = IHEC_TDItem(7).innerText
myItem9 = IHEC_TDItem(8).innerText
End If
End If
Loop
Next Index

Doctrine2 query behavior with groupBy

I have the following custom method in the repo:
$query = $qb->select('Client', 'Organization')
->from(':Client', 'Client')
->leftJoin('Client.organizations', 'Organization');
--different searh conditions--
Then I use paginator to get the results:
$paginator = new \Doctrine\ORM\Tools\Pagination\Paginator($query->getQuery());
$clients = $paginator->getQuery()
->setMaxResults($length)
->setFirstResult($start);
Here where the magic comes.
I set length to 10 and first result to 0. So basicaly I should get 10 results on the page. However if there are 5 Organizations for the Client (Client mtm Organization), there will be 5 results, if there are 7 Organizations - 3 results.
But if I add
$query->groupBy('Client');
Then all is "ok" with the root level of results: i.e. there are 10 Clients on the page, but there is not all of the Organizations (max 1).
Did anyone experience the same issue? Any thoughts, suggestions?
It probably counts the same item several times.
Try once to add a distinct clause to your query.
$query = $qb->select('Client', 'Organization')
->from(':Client', 'Client')
->leftJoin('Client.organizations', 'Organization');
->distinct();

Sqlalchemy: Produce OR-clause with multiple filter()-Calls

I'm new to sqlalchemy and could use some help.
I'm trying to write a small application for which i have to dynamically change a select-statement. So I do s = select([files]), and then i add filters by s = s.where(files.c.createtime.between(val1, val2)).
This works great, but only with an AND-conjunction.
So, when I want to have all entries with createtime (between 1.1.2009 and 1.2.2009) OR createtime == 5.2.2009, I got the problem that i don't know how to achieve this with different filter-calls. Because of the programs logic it's not possible to use s= s.where(_or(files.c.createtime.between(val1, val2), files.c.createtime == DateTime('2009-02-01')))
Thanks in advance,
Christof
You can build or clauses dynamically from lists:
clauses = []
if cond1:
clauses.append(files.c.createtime.between(val1, val2))
if cond2:
clauses.append(files.c.createtime == DateTime('2009-02-01'))
if clauses:
s = s.where(or_(*clauses))
If you're willing to "cheat" by making use of the undocumented _whereclause attribute on Select objects, you can incrementally specify a series of OR terms by building a new query each time based on the previous query's where clause:
s = select([files]).where(literal(False)) # Start with an empty query.
s = select(s.froms).where(or_(s._whereclause,
files.c.createtime.between(val1, val2)))
s = select(s.froms).where(or_(s._whereclause,
files.c.createtime == datetime(2009, 2, 1)))
Building up a union is another option. This is a bit clunkier, but doesn't rely on undocumented attributes:
s = select([files]).where(literal(False)) # Start with an empty query.
s = s.select().union(
select([files]).where(files.c.createtime.between(val1, val2)))
s = s.select().union(
select([files]).where(files.c.createtime == datetime(2009, 2, 1)))

Correlate 2 columns in SQL

SELECT ica.CORP_ID, ica.CORP_IDB, ica.ITEM_ID, ica.ITEM_IDB,
ica.EXP_ACCT_NO, ica.SUB_ACCT_NO, ica.PAT_CHRG_NO, ica.PAT_CHRG_PRICE,
ica.TAX_JUR_ID, ica.TAX_JUR_IDB, ITEM_PROFILE.COMDTY_NAME
FROM ITEM_CORP_ACCT ica
,ITEM_PROFILE
WHERE (ica.CORP_ID = 1000)
AND (ica.CORP_IDB = 4051)
AND (ica.ITEM_ID = 1000)
AND (ica.ITEM_IDB = 4051)
AND ica.EXP_ACCT_NO = ITEM_PROFILE.EXP_ACCT_NO
I'm trying basically say since the exp account code is '801500' then the Name should return "Miscellaneous Medic...".
It seems as if what you are showing is not possible. Have you edited the data in the editor??? You are joining using ica.EXP_ACCT_NO = ITEM_PROFILE.EXP_ACCT_NO . Therefore, every entry with EXP_ACCT_NO = 801500, should also have the same COMDTY_NAME.
However, it could be the case that your IDs are not actually numbers and that they are strings with whitespace (801500__ vs 801500 ). But since you are not performing a left-outer join, it would also mean you have an entry in ITEM_PROFILE with the same whitespace.
You also need to properly normalize your table data (unless this is a view) but it still means you have erroneous data.
Try to perform the same query, but using the TRIM function to remove whitespace: https://stackoverflow.com/a/6858168/1688441 .
Example:
SELECT ica.CORP_ID, ica.CORP_IDB, ica.ITEM_ID, ica.ITEM_IDB,
ica.EXP_ACCT_NO, ica.SUB_ACCT_NO, ica.PAT_CHRG_NO, ica.PAT_CHRG_PRICE,
ica.TAX_JUR_ID, ica.TAX_JUR_IDB, ITEM_PROFILE.COMDTY_NAME
FROM ITEM_CORP_ACCT ica
,ITEM_PROFILE
WHERE (ica.CORP_ID = 1000)
AND (ica.CORP_IDB = 4051)
AND (ica.ITEM_ID = 1000)
AND (ica.ITEM_IDB = 4051)
AND trim(ica.EXP_ACCT_NO) = trim(ITEM_PROFILE.EXP_ACCT_NO);