SAP: join partner function data based on sales type

Working with SAP data, we want to enrich sales data with the final customer. Depending on the sales type, different partner function codes correspond to the last company the sale is made to (e.g. we may have indirect or direct sales). So far we have been considering the tables VBAP/VBAK/VBPA. We extract data from each table to separate files using sap4j, then join VBAP and VBPA on VBELN, and consider partner code WE (goods recipient) or custom consignation codes indicating the last buyer for consignations.
Is there some accurate way to know who is the last buyer in the chain for a given sale?

It can be done in the following way:

import pandas as pd

def sales_tabkey(row):
    # TABKEY is assumed to be client ('001') + VBELN + POSNR
    return "001{}{}".format(row['VBELN'], row['POSNR'])

def expected_partner_function_for_sales_type(row):
    # custom consignation order types (AUART) used in this system
    consignation_codes = {'ORK', 'XKB', 'ZSOK', 'ZLZK', 'ZTSK', 'KE', 'ZED', 'ZZN'}
    if row['AUART'] in consignation_codes:
        return 'ZK'
    return 'WE'

def get_kunnrf_frame(vbap, vbak, vbpa, kna):
    df = pd.merge(vbap, vbak, on='VBELN', how='left')
    df = pd.merge(df, vbpa, on='VBELN', how='left')
    df['EXPPARVW'] = df.apply(expected_partner_function_for_sales_type, axis=1)
    # KUNNR in kna is considered end_client_id
    df = pd.merge(df, kna, on='ADRNR', how='left')[
        ['VBELN', 'POSNR', 'KUNNR', 'end_client_id', 'ADRNR', 'PARVW', 'EXPPARVW', 'AUART']
    ].drop_duplicates()
    df['TABKEY'] = df.apply(sales_tabkey, axis=1)
    # keep the rows whose partner function matches the expected one,
    # then fall back to the remaining rows for items without a match
    endclient_tabkeys = set(df.TABKEY.unique())
    dfa = df[df.PARVW == df.EXPPARVW]
    dfb = df[df.TABKEY.isin(endclient_tabkeys.difference(set(dfa.TABKEY.unique())))]
    return pd.concat([dfa, dfb])
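A hypothetical call, assuming the sap4j extracts were saved as CSV files with the SAP field names as column headers (the file names here are made up):

vbap = pd.read_csv('VBAP.csv', dtype=str)  # sales document items (VBELN, POSNR)
vbak = pd.read_csv('VBAK.csv', dtype=str)  # sales document headers (VBELN, AUART)
vbpa = pd.read_csv('VBPA.csv', dtype=str)  # partner functions (VBELN, PARVW, ADRNR)
kna = pd.read_csv('KNA1.csv', dtype=str)   # customer master, with end_client_id derived from KUNNR

end_clients = get_kunnrf_frame(vbap, vbak, vbpa, kna)
print(end_clients[['VBELN', 'POSNR', 'end_client_id']].head())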

How to take two parameters (start_date and end_date) to filter out events in Django using query_params.get()?

I want to pass two parameters in a GET request in Django. The first is start_date and the second is end_date, but I only know how to pass one. Here is my code, where I want to filter account deposit history within a range of dates.
class BusinessDonationHistoryController(BaseController):
    view = ListView

    @entity(Business, arg="business")
    def read(self, request, response, business):
        request.query_params.get()  # Here I need help
        deposits = Deposit.objects.filter(
            business=business, deposit_type=deposit_types.BUSINESS_DONATION)
        credit_accounts = CreditAccount.objects.filter(deposit__in=deposits)
        deposit_map = defaultdict(list)
        # create a mapping of deposit id and associated credit accounts
        for credit_account in credit_accounts:
            deposit_map[credit_account.deposit_id].append(credit_account)
        history = []
        for deposit in deposits:
            history.append({
                "date": deposit.created_date,
                "total_amount": from_cents(deposit.amount),
                "amount_disbursed": from_cents(sum([ca.total_amount - ca.current_amount for ca in deposit_map.get(deposit.id)]))
            })
        print(history)
I checked all the links on Stack Overflow, and even checked this link, but found nothing relevant.
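For what it's worth, query_params.get() takes the name of a single parameter (plus an optional default), so you call it once per parameter and then narrow the queryset. A minimal sketch against the code above, assuming DRF-style query_params and that the URL carries start_date/end_date parameters (names made up):

@entity(Business, arg="business")
def read(self, request, response, business):
    # e.g. /history/?start_date=2021-01-01&end_date=2021-02-01
    start_date = request.query_params.get('start_date')
    end_date = request.query_params.get('end_date')
    deposits = Deposit.objects.filter(
        business=business, deposit_type=deposit_types.BUSINESS_DONATION)
    if start_date and end_date:
        # created_date__range is inclusive on both ends
        deposits = deposits.filter(created_date__range=[start_date, end_date])
    # ...build history from the filtered deposits as before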

Using scrapy and xpath to parse data

I have been trying to scrape some data but keep getting a blank value or None. I've tried using following-sibling and failed (I probably did it wrong). Any and all help is greatly appreciated. Thank you in advance.
Website to scrape (final): https://www.unegui.mn/azhild-avna/ulan-bator/
Website to test (current, has less listings): https://www.unegui.mn/azhild-avna/mt-hariltsaa-holboo/slzhee-tehnik-hangamzh/ulan-bator/
Code Snippet:
def parse(self, response, **kwargs):
    cards = response.xpath("//li[contains(@class,'announcement-container')]")

    # parse details
    for card in cards:
        company = card.xpath(".//*[@class='announcement-block__company-name']/text()").extract_first()
        date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/text())").extract_first().split(',')
        date = date_block[0]
        city = date_block[1]

        item = {'date': date,
                'city': city,
                'company': company
                }
HTML Snippet:
<div class="announcement-block__date">
<span class="announcement-block__company-name">Электро экспресс ХХК</span>
, Өчигдөр 13:05, Улаанбаатар</div>
Expected Output:
date = Өчигдөр 13:05
city = Улаанбаатар
UPDATE: I figured out how to get my date and city data. I ended up using following-sibling to get the date, splitting by comma, and taking the 2nd and 3rd values.
date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/span/following-sibling::text())").extract_first().split(',')
date = date_block[1]
city = date_block[2]
Extra:
If anyone can tell me or point me to how I can set up my pipeline file, it would be greatly appreciated. Is it correct to use pipelines, or should I use items.py? Currently I have 3 spiders in the same project folder: apartments, jobs, cars. I need to clean and transform my data. For example, for the jobs spider I am currently working on, I want to apply the following manipulations (see the pipeline sketch after the file stubs below):
if salary is < 1000, then replace with string 'Negotiable'
if date contains the text "Өчигдөр" then replace with 'Yesterday' without deleting the time
if employer contains value 'Хувь хүн' then change company value to 'Хувь хүн'
my pipelines.py file:
from itemadapter import ItemAdapter

class ScrapebooksPipeline:
    def process_item(self, item, spider):
        return item
my items.py file:
import scrapy

class ScrapebooksItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass
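On the pipeline question: pipelines are the usual place for this kind of cleaning (items.py only declares the fields). A minimal sketch of the rules listed above, assuming item fields named salary, date and employer (made-up names, adjust to your spider), and that salary has already been parsed to a number:

from itemadapter import ItemAdapter

class ScrapebooksPipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)

        # salaries under 1000 are treated as negotiable
        salary = adapter.get('salary')
        if isinstance(salary, (int, float)) and salary < 1000:
            adapter['salary'] = 'Negotiable'

        # translate the relative day name but keep the time part
        date = adapter.get('date')
        if date and 'Өчигдөр' in date:
            adapter['date'] = date.replace('Өчигдөр', 'Yesterday')

        # listings by private individuals get that as the company value
        if adapter.get('employer') == 'Хувь хүн':
            adapter['company'] = 'Хувь хүн'

        return item

Remember to enable it through ITEM_PIPELINES in settings.py.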
I changed your xpath to a smaller scope.
extract_first() will get the first instance, so use getall() instead.
In order to get the date I had to use regex (most of the results have time but not date so if you get a blank for the date it's perfectly fine).
I can't read the language so I had to guess (kind of) for the city, but even if it's wrong you can get the point.
import scrapy
import re

class TempSpider(scrapy.Spider):
    name = 'temp_spider'
    allowed_domains = ['unegui.mn']
    start_urls = ['https://www.unegui.mn/azhild-avna/ulan-bator/']

    def parse(self, response, **kwargs):
        cards = response.xpath('//div[@class="announcement-block__date"]')

        # parse details
        for card in cards:
            company = card.xpath('.//span/text()').get()
            date_block = card.xpath('./text()').getall()
            date = date_block[1].strip()
            date = re.findall(r'(\d+-\d+-\d+)', date)
            if date:
                date = date[0]
            else:
                date = ''
            city = date_block[1].split(',')[2].strip()

            item = {'date': date,
                    'city': city,
                    'company': company
                    }
            yield item
Output:
[scrapy.core.scraper] DEBUG: Scraped from <200 https://www.unegui.mn/azhild-avna/ulan-bator/>
{'date': '2021-11-07', 'city': 'Улаанбаатар', 'company': 'Arirang'}
[scrapy.core.scraper] DEBUG: Scraped from <200 https://www.unegui.mn/azhild-avna/ulan-bator/>
{'date': '2021-11-11', 'city': 'Улаанбаатар', 'company': 'Altangadas'}
[scrapy.core.scraper] DEBUG: Scraped from <200 https://www.unegui.mn/azhild-avna/ulan-bator/>
...
...
...
Looks like you are missing indentation.
Instead of:

def parse(self, response, **kwargs):
    cards = response.xpath("//li[contains(@class,'announcement-container')]")
    # parse details
    for card in cards:
        date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/text())").extract_first().split(',')
    date = date_block[0]
    city = date_block[1]

Try this:

def parse(self, response, **kwargs):
    cards = response.xpath("//li[contains(@class,'announcement-container')]")
    # parse details
    for card in cards:
        date_block = card.xpath("normalize-space(.//div[contains(@class,'announcement-block__date')]/text())").extract_first().split(',')
        date = date_block[0]
        city = date_block[1]

Type Error: Result Set Is Not Callable - BeautifulSoup

I am having a problem with web scraping. I am trying to learn how to do it, but I can't seem to get past some of the basics. The error I'm getting is "TypeError: 'ResultSet' object is not callable".
I've tried a number of different things. I was originally trying to use "find" instead of the "find_all" function, but I was having an issue with BeautifulSoup returning a NoneType. I was unable to write an if statement that could handle that exception, so I tried using "find_all" instead.
import requests
from bs4 import BeautifulSoup

page = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = BeautifulSoup(page.text, 'html.parser')
all_company_list = soup.find_all(class_='sortable-table')
#all_company_list = soup.find(class_='sortable-table')
company_name_list_items = all_company_list('td')

for company_name in company_name_list_items:
    #print(company_name.prettify())
    companies = company_name.content[0]
I'd like this to pull in all the companies in Orange County, California that are on this list in a clean manner. As you can see, I've already managed to pull them in, but I want the list to be clean.
You've got the right idea. Instead of immediately finding all the <td> tags (which returns one <td> for each of the 140 rows and each of the 4 columns per row), if you want only the company names it might be easier to find all the rows (<tr> tags) and then take however many columns you want by iterating over the <td>s in each row.
This will get the first column, the company names:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = BeautifulSoup(page.text,'html.parser')
all_company_list = soup.find_all('tr')
company_list = [c.find('td').text for c in all_company_list[1:]]
Now company_list contains all 140 company names:
>>> len(company_list)
140
>>> company_list
['Advanced Behavioral Health', 'Advanced Management Company & R³ Construction Services, Inc.',
...
, 'Wes-Tec, Inc', 'Western Resources Title Company', 'Wunderman', 'Ytel, Inc.', 'Zillow Group']
Change c.find('td') to c.find_all('td') and iterate that list to get all the columns for each company.
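For instance, a small variant of the comprehension above that collects every column per row:

# one list of cell texts per company row (header row skipped)
company_rows = [[td.text for td in c.find_all('td')] for c in all_company_list[1:]]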
Pandas:
Pandas is often useful here. The page supports multiple sort orders, including company size and rank; I show sorting by rank.
import pandas as pd
table = pd.read_html('https://topworkplaces.com/publication/ocregister/')[0]
table.columns = table.iloc[0]
table = table[1:]
table.Rank = pd.to_numeric(table.Rank)
rank_sort_table = table.sort_values(by='Rank', axis=0, ascending = True)
rank_sort_table.reset_index(inplace=True, drop=True)
rank_sort_table.columns.names = ['Index']
print(rank_sort_table)
Depending on your sort, companies in order:
print(rank_sort_table.Company)
Requests:
Incidentally, you can use nth-of-type to select just the first column (company names), and use the table's id rather than a class name to identify it, since id selectors are faster.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = bs(r.content, 'lxml')
names = [item.text for item in soup.select('#twpRegionalList td:nth-of-type(1)')]
print(names)
Note the default sorting is alphabetical on name column rather than rank.
Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

Odoo - Search products with code instead of id

I am using Odoo 10 and I have two models, Order_Line and Products.
OrderLine
class OrderLine(models.Model):
    _name = 'order_line'
    _description = 'Order Lines'

    name = fields.Char()
    products = fields.Many2one('amgl.products', string='Products')
Products
class Products(models.Model):
    _name = 'products'
    _description = 'Products'
    _sql_constraints = [
        ('uniq_poduct_code', 'unique(product_code)', 'Product Code already exists!')
    ]

    name = fields.Char()
    product_code = fields.Char()
Now I am trying to create order_line records from a CSV file, and in the CSV file the customer provides the 'Product Code' instead of the id. How can I handle this so that we use the product code and the system automatically fills in the product associated with that product code?
Note: Product Code in the products table is also unique, so there is no chance of duplication.
CSV template:
customer/account_number,customer/first_name,customer/last_name,customer/account_type,order/transaction_id,order/products/product_code,order/quantity,order/customer_id/id
Case 1: there are no products stored in the database with any of the product codes the customer is giving to you
If the product codes haven't been created yet in the database, you should have two CSV files (Products.csv and OrderLine.csv). The first must have three columns (id, name and product_code). The second must have three columns too (id, name and products/id). So you would only have to make up an XML ID under the id column in Products.csv and reference that XML ID from the respective row of the products/id column of OrderLine.csv.
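For instance, with a made-up XML ID, Products.csv could contain:

id,name,product_code
__import__.products_b,"Product B","BBB"

and the matching row of OrderLine.csv would reference it:

id,name,products/id
__import__.order_line_2,"OL B",__import__.products_b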
Case 2: the product codes the customer has given to you belong to existing products in the database
Now, the customer has given you product codes of products which already exist in the database. In this case you don't have to create a Products.csv file, but you do need to know the XML IDs of the products that have those product codes. For that, you can go through the Odoo interface to the tree view of the products model (if this view doesn't exist, you must create it). Then select all records (click on the number 80 in the top right corner to show more records per page if you need to). Once all of them are selected, click on the More button and afterwards on Export. Select the product_code and name columns and proceed. Save the generated CSV file as, for example, Products.csv. Open it and you'll see the XML IDs of the exported products (if they didn't have an XML ID before, they do now: an export generates an XML ID for each exported record that lacks one). Now, I guess the customer has given you something like a file with the columns Name of the order line, Product code, so replace the Product code column values with the respective XML IDs of the products you have just exported. In the end you should have one file to import, OrderLine.csv, with id, name and products/id columns.
Case 3: there are some product codes belonging to existing products stored in the database and there are some ones which still don't exist
In this case you will have to combine cases 1 and 2: first, export the products as described in case 2; then create a new file with the products whose codes don't exist yet, as described in case 1; finally, replace the product codes the customer gave you with the respective XML IDs, as described in case 2.
Note: this process will take you a lot of time if you have thousands of records to import and replace them manually. In that case you should create a macro in your CSV editor that does the replacements (with search and replace). For example, with LibreOffice you can write macros in Python.
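If you would rather avoid editor macros, the replacement step can also be scripted in plain Python. A rough sketch, where the file names and the customer's column headers (Name, Product Code) are assumptions:

import csv

# build a product_code -> XML ID map from the exported Products.csv
code_to_xmlid = {}
with open('Products.csv', newline='') as f:
    for row in csv.DictReader(f):
        code_to_xmlid[row['product_code']] = row['id']

# rewrite the customer file into an importable OrderLine.csv,
# leaving unknown codes in place for a second pass (case 3)
with open('customer_file.csv', newline='') as fin, \
        open('OrderLine.csv', 'w', newline='') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=['id', 'name', 'products/id'])
    writer.writeheader()
    for i, row in enumerate(reader, start=1):
        code = row['Product Code']
        writer.writerow({
            'id': '__import__.order_line_{}'.format(i),
            'name': row['Name'],
            'products/id': code_to_xmlid.get(code, code),
        })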
Example (Case 3)
The customer has given you a file of order lines, with two lines:
Name: OL A, Product Code: AAA
Name: OL B, Product Code: BBB
You export products from the Odoo interface and you get a file with one line:
id,name,product_code
__export__.products_a,"Product A","AAA"
You look for the coincidences of the product codes in both files, and do the replacements in a copy of the customer file, so now you have this:
Name: OL A, Product Code: __export__.products_a
Name: OL B, Product Code: BBB
Then you create a new CSV Products.csv and put in there the products whose product code doesn't exist yet:
id,name,product_code
__import__.products_b,"Product B","BBB"
Now apply the replacements again comparing this new file with the one we had, and you will get this:
Name: OL A, Product Code: __export__.products_a
Name: OL B, Product Code: __import__.products_b
Convert this file to the right CSV format for Odoo, and save it as OrderLine.csv:
id,name,products/id
__import__.order_line_1,"OL A",__export__.products_a
__import__.order_line_2,"OL B",__import__.products_b
And finally, import the files, taking into account that Products.csv must be imported before OrderLine.csv.
EDIT
I think it would be better to spend a bit of time programming a macro for your CSV editor (Excel, LibreOffice, OpenOffice or whatever), but if you're desperate and need to do this only through Odoo, I came up with an awful workaround, but at least it should work too.
1. Create a new Char field named product_code in the order_line model (it would be there temporarily).
2. Modify the ORM create method of this model:
@api.model
def create(self, vals):
    product_id = False
    product_code = vals.get('product_code', False)
    if product_code:
        product = self.env['products'].search([
            ('product_code', '=', product_code)
        ])
        if product:
            product_id = product[0].id
        vals.update({
            'products': product_id,
        })
    return super(OrderLine, self).create(vals)
3. Copy the file the customer sent you, rename the headers properly, and rename the column order/products/product_code to product_code. Import the CSV file. Each imported record will call the ORM create method of the order_line model.
After the importation you'll have the order lines in the database correctly related to the products.
When you've finished, remember to remove the code you've added (and also drop the product_code column from the order_line table in the database, in order to remove junk).
Solution 1
You can create a transient model with the fields that you are using in the CSV, applying the idea of @forvas:
class ImportOrderLines(models.TransientModel):
    _name = 'import.order.lines'

    product_code = fields.Char()

    @api.model
    def create(self, vals):
        product_id = False
        product_code = vals.get('product_code', False)
        if product_code:
            product = self.env['products'].search([
                ('product_code', '=', product_code)
            ])
            if product:
                product_id = product[0].id
        self.env['order_line'].create({
            'products': product_id,
        })
        return False  # you don't need to create the record in the transient model
You can go to the list view of this transient model and import like in any other model, with the base_import view.
Solution 2
You could create a wizard to import the CSV and create the order lines.
Check the following source code. You must assign the method import_order_lines to a button in the wizard.
import base64
import magic
import csv
from cStringIO import StringIO
import codecs

from openerp import models, fields, api, _
from openerp.exceptions import Warning


class ImportDefaultCodeWizard(models.TransientModel):
    _name = 'import.default_code.wizard'

    name = fields.Char(
        string='File name',
    )
    file = fields.Binary(
        string='ZIP file to import to Odoo',
        required=True,
    )

    @api.multi
    def import_order_lines(self):
        self.ensure_one()
        content = base64.decodestring(self.file)
        if codecs.BOM_UTF8 == content[:3]:  # remove "byte order mark" (windows)
            content = content[3:]
        file_type = magic.from_buffer(content, mime=True)
        if file_type == 'text/plain':
            self._generate_order_line_from_csv(content)
            return self._show_result_wizard()
        raise Warning(
            _('WRONG FILETYPE'),
            _('You should send a CSV file')
        )

    def _show_result_wizard(self):
        return {
            'type': 'ir.actions.act_window',
            'res_model': self._name,
            'view_type': 'form',
            'view_mode': 'form',
            'target': 'new',
            'context': self.env.context,
        }

    def _generate_order_line_from_csv(self, data):
        try:
            reader = csv.DictReader(StringIO(data))
        except Exception:
            raise Warning(
                _('ERROR getting data from csv file'
                  '\nThere was some error trying to get the data from the csv file.'
                  '\nMake sure you are using the right format.'))
        n = 1
        for row in reader:
            n += 1
            self._validate_data(n, row)
            default_code = row.get('default_code', False)
            order_line = {
                'default_code': self._get_product_id(default_code),
                # here you should add all the order line fields
            }
            try:
                self.env['order_line'].create(order_line)
            except Exception:
                raise Warning(
                    _('The order line could not be created.'
                      '\nROW: %s') % n
                )

    def _validate_data(self, n, row):
        csv_fields = [
            'default_code',
        ]
        """ here is where you should add the CSV fields in order to validate them
        customer/account_number, customer/first_name, customer/last_name,
        customer/account_type, order/transaction_id, order/products/product_code, order/quantity, order/customer_id/id
        """
        for key in row:
            if key not in csv_fields:
                raise Warning(_('ERROR\nThe file format is not right.'
                                '\nCheck the column names and the CSV format'
                                '\nKEY: %s' % key))
        if row.get('default_code', False) == '':
            raise Warning(
                _('ERROR Validating data'),
                _('The product code should be filled.'
                  '\nROW: %s') % n
            )

    def _get_product_id(self, default_code):
        if default_code:
            product_obj = self.env['product.product'].search([
                ('default_code', '=', default_code),
            ])
            if len(product_obj) == 1:
                return product_obj.id
            raise Warning(
                _('ERROR Validating data'),
                _('No single product matches the code.'
                  '\nCODE: %s') % default_code
            )
        return False
You can search by product_code like so:

@api.model
def search_by_code(self, code):
    result = self.env['products'].search([('product_code', '=', code)])
    return result

Sort JSON dictionaries using a datetime format that is not consistent

I have a JSON file (POST responses from an API). I need to sort the dictionaries by a certain key in order to parse the JSON file in chronological order. After studying the data, I can sort it by the date format in metadata or by the number sequences of the S5CV[0156]P0.xml filenames.
One text example that you can load in JSON here - http://pastebin.com/0NS5BiDk
I have written 2 pieces of code to sort the list of objects by a certain key. The 1st one sorts by the 'text' of the xml. The 2nd one by [metadata][0][value].
The 1st one works, but a few of the XMLs, even if they are higher in number, actually contain documents older than I expected.
For the 2nd code the date format is not consistent and sometimes the value is not present at all. I am struggling to extract the datetime format in a consistent way. The second one also gives me an error I cannot figure out: "string indices must be integers".
# 1st code (it works but not ideal)
# load post response r1 in json (python 3.5)
j = r1.json()

# iterate through dictionaries and sort by the 4 num of xml (ex. 0156)
list = []
for row in j["tree"]["children"][0]["children"]:
    list.append(row)
newlist = sorted(list, key=lambda k: k['text'][-9:])
print(newlist)
# 2nd code. I need something to make a consistent datetime,
# skip missing values and solve the list index error
list = []
for row in j["tree"]["children"][0]["children"]:
    list.append(row)

# extract the last 3 blocks of characters from [metadata][0][value]
# (usually something like "7th april, 1922.") and transform them into
# datetime format using dparser.parse
def date(key):
    return dparser.parse(' '.join(key.split(' ')[-3:]), fuzzy=True)

def order(slist):
    try:
        return sorted(slist, key=lambda k: k[date(["metadata"][0]["value"])])
    except ValueError:
        return 0

print(order(list))
# update
orig_list = j["tree"]["children"][0]["children"]
cleaned_list = sorted((x for x in orig_list if extract_date(x) != DEFAULT_DATE),
                      key=extract_date)

first_date = extract_date(cleaned_list[0])
if first_date != DEFAULT_DATE:  # valid date found?
    cleaned_list[0]['date'] = first_date
    print(first_date)

middle_date = extract_date(cleaned_list[len(cleaned_list)//2])
if middle_date != DEFAULT_DATE:  # valid date found?
    cleaned_list[0]['date'] = middle_date
    print(middle_date)

last_date = extract_date(cleaned_list[-1])
if last_date != DEFAULT_DATE:  # valid date found?
    cleaned_list[0]['date'] = last_date
    print(last_date)
Clearly you can't use the .xml filenames to sort the data if it's unreliable, so the most promising strategy seems to be what you're attempting to do in the 2nd code.
When I mentioned needing a datetime to sort the items in my comments to your other question, I literally meant something like datetime.date instances, not strings like "28th july, 1933", which wouldn't provide the proper ordering since they would be compared lexicographically with one another, not chronologically like datetime.dates.
Here's something that seems to work. It uses the re module to search for the date pattern in the strings that usually contain them (those with a "name" associated with the value "Comprising period from"). If there's more than one date match in the string, it uses the last one. This is then converted into a date instance and returned as the value to key on.
Since some of the items don't have valid date strings, a default one is substituted for sorting purposes. In the code below, the earliest representable date is used as the default, which makes all items with date problems appear at the beginning of the sorted list. Any items following them should be in the proper order.
Not sure what you should do about items lacking date information: if it isn't there, your only options are to guess a value, ignore them, or consider it an error.
# v3.2.1
import datetime
import json
import re

# default date when one isn't found
DEFAULT_DATE = datetime.date(datetime.MINYEAR, 1, 1)  # 01/01/0001

MONTHS = ('january february march april may june july august september october'
          ' november december'.split())
# dictionary to map month names to numeric values 1-12
MONTH_TO_ORDINAL = dict(zip(MONTHS, range(1, 13)))

DMY_DATE_REGEX = (r'(3[01]|[12][0-9]|[1-9])\s*(?:st|nd|rd|th)?\s*'
                  + r'(' + '|'.join(MONTHS) + r')(?:[,.])*\s*'
                  + r'([0-9]{4})')
MDY_DATE_REGEX = (r'(' + '|'.join(MONTHS) + r')\s+'
                  + r'(3[01]|[12][0-9]|[1-9])\s*(?:st|nd|rd|th)?,\s*'
                  + r'([0-9]{4})')
DMY_DATE = re.compile(DMY_DATE_REGEX, re.IGNORECASE)
MDY_DATE = re.compile(MDY_DATE_REGEX, re.IGNORECASE)

def extract_date(item):
    metadata0 = item["metadata"][0]  # check only first item in metadata list
    if metadata0.get("name") != "Comprising period from":
        return DEFAULT_DATE
    else:
        value = metadata0.get("value", "")
        matches = DMY_DATE.findall(value)  # try dmy pattern (most common)
        if matches:
            day, month, year = matches[-1]  # use last match if more than one
        else:
            matches = MDY_DATE.findall(value)  # try mdy pattern...
            if matches:
                month, day, year = matches[-1]  # use last match if more than one
            else:
                print('warning: date patterns not found in "{}"'.format(value))
                return DEFAULT_DATE
        # convert strings found into numerical values
        year, month, day = int(year), MONTH_TO_ORDINAL[month.lower()], int(day)
        return datetime.date(year, month, day)

# test files: 'json_sample.txt', 'india_congress.txt', 'olympic_games.txt'
with open('json_sample.txt', 'r') as f:
    j = json.load(f)

orig_list = j["tree"]["children"][0]["children"]
sorted_list = sorted(orig_list, key=extract_date)
for item in sorted_list:
    print(json.dumps(item, indent=4))
To answer your latest follow-on questions, you could leave out all the items in the list that don't have recognizable dates by using extract_date() to filter them out beforehand in a generator expression with something like this:
# to obtain a list containing only entries with a parsable date
cleaned_list = sorted((x for x in orig_list if extract_date(x) != DEFAULT_DATE),
                      key=extract_date)
Once you have a sorted list of items that all have a valid date, you can do things like the following, again reusing the extract_date() function:
# extract and display dates of items in cleaned list
print('first date: {}'.format(extract_date(cleaned_list[0])))
print('middle date: {}'.format(extract_date(cleaned_list[len(cleaned_list)//2])))
print('last date: {}'.format(extract_date(cleaned_list[-1])))
Calling extract_date() on the same item multiple times is somewhat inefficient. To avoid that you could easily add the datetime.date value it returns to the object on-the-fly since it's a dictionary, and then just refer to it as often as needed with very little additional overhead:
# add an extracted datetime.date entry to some_list[i] if a valid one was found
date = extract_date(some_list[i])
if date != DEFAULT_DATE:  # valid date found?
    some_list[i]['date'] = date  # save by adding it to the object
This effectively caches the extracted date by storing it in the item itself. Afterwards, the datetime.date value can simply be referenced with some_list[i]['date'].
As a concrete example, consider this revised version displaying the dates of the first, middle, and last objects:
# display dates of items in cleaned list
print('first date: {}'.format(cleaned_list[0]['date']))
middle = len(cleaned_list)//2
print('middle date: {}'.format(cleaned_list[middle]['date']))
print('last date: {}'.format(cleaned_list[-1]['date']))
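And if you want the cached 'date' available on every element rather than just those three, a one-pass loop over the cleaned list does it:

# cache the extracted date on each item once, up front
for item in cleaned_list:
    item['date'] = extract_date(item)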