How to extract DATE from handwritten images in python - ocr

I want to extract only the date from images of handwritten text in Python, without specifying bounding-box coordinates for the date field.
The date can be in any format, e.g.:
20-april-2019
12-02-2020
12-02-20
Feb-12-19
Feb-12-20
12Feb-2020
and so on
The problem is described in this paper: https://www.researchgate.net/publication/261342693_Date_Field_Extraction_in_Handwritten_Documents
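Once some OCR step has turned the image into plain text, candidate date strings in the formats listed above could be located with a regular expression. The following is only a rough sketch under that assumption: it does not address the handwriting recognition itself, the names MONTH, SEP, DATE_RE and find_dates are made up for illustration, and only English month names are covered. It returns the matched strings rather than parsed dates, because a value such as 12-02-20 is ambiguous between day-first and month-first.

```python
import re

# English month names and common abbreviations (an assumption about the input).
MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)
SEP = r"[-/.\s]?"  # optional separator, so "12Feb-2020" also matches

DATE_RE = re.compile(
    rf"""
    \b(?:
        \d{{1,2}}{SEP}{MONTH}{SEP}\d{{2,4}}      # 20-april-2019, 12Feb-2020
      | {MONTH}{SEP}\d{{1,2}}{SEP}\d{{2,4}}      # Feb-12-19, Feb-12-20
      | \d{{1,2}}[-/.]\d{{1,2}}[-/.]\d{{2,4}}    # 12-02-2020, 12-02-20
    )\b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_dates(text):
    """Return every substring of text that looks like one of the date formats."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


# Hypothetical OCR output, used only to exercise the pattern.
ocr_text = "Received on 20-april-2019, shipped Feb-12-20, invoiced 12-02-2020 and 12Feb-2020."
print(find_dates(ocr_text))
```

OCR of handwriting often misreads characters (for example O for 0), so in practice the pattern would probably need to be loosened or combined with some cleanup of the recognized text.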

[Image: sample image for date extraction]

Related

How to extract a description part from website with proper spacing?

I accessed the website with Beautiful Soup and retrieved the description part (a div selected by class), but because it contains bulleted points, the output has no spacing between the points and is not readable:
DESCRIPTION:
COVID-19 ProjectionsGovernment-mandated social distancingHospital resource useAll bedsICU bedsInvasive ventilatorsDeaths per dayTotal deaths
The description contains both normal paragraphs and bullet points, so I cannot select li or ul to retrieve the bullet points alone.
This is my code for the description part:
def DESCRIPTION(self):
    print('\n' + "DESCRIPTION: ")
    for j in Data_Set_Info.soup.select('.iH9v7b'):
        k = j.get_text()
        print('\n' + k)
The HTML code for this webpage is:
<div class="iH9v7b"><p>COVID-19 Projections</p><ul><li>Government-mandated social distancing</li><li>Hospital resource use</li><ul><li>All beds</li><li>ICU beds</li><li>Invasive ventilators</li></ul><li>Deaths per day</li><li>Total deaths</li></ul><p></p></div>
The webpage is: https://datasetsearch.research.google.com/search?query=health&docid=B2%2BtssYi2L2wvQwVAAAAAA%3D%3D
This website lists different datasets, and each dataset has a different description. I need to get every description with proper spacing using a single program. Thanks in advance.
If you just want to get all the text with spaces in between, you can specify the character used to join text from different elements as an argument to get_text, like so:
k = j.get_text(' ')
If you want to preserve (potentially nested) lists in the output, then you'll need to recursively search through j.contents. A one-size-fits-all solution is unlikely to work for that purpose, and you will probably need a bit of experimentation.
Documentation links:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children
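Both approaches can be tried on the HTML from the question. This is a minimal sketch, assuming the bs4 package is installed; render is a hypothetical helper written for this example, not part of Beautiful Soup, and it only handles the simple structure shown here (paragraphs plus ul/ol/li).

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# HTML copied from the question.
HTML = (
    '<div class="iH9v7b"><p>COVID-19 Projections</p><ul>'
    '<li>Government-mandated social distancing</li>'
    '<li>Hospital resource use</li>'
    '<ul><li>All beds</li><li>ICU beds</li><li>Invasive ventilators</li></ul>'
    '<li>Deaths per day</li><li>Total deaths</li></ul><p></p></div>'
)


def render(node, depth=0):
    """Hypothetical helper: return text lines, indenting list items by nesting depth."""
    lines = []
    for child in node.children:
        if isinstance(child, NavigableString):
            text = child.strip()
            if text:
                lines.append(text)
        elif child.name in ("ul", "ol"):
            # A list increases the nesting depth for the items inside it.
            lines.extend(render(child, depth + 1))
        elif child.name == "li":
            item = render(child, depth)
            if item:
                # Prefix the first line of the item with an indented bullet.
                item[0] = "  " * (depth - 1) + "- " + item[0]
            lines.extend(item)
        else:
            # Any other tag (p, div, span, ...) is walked without changing depth.
            lines.extend(render(child, depth))
    return lines


soup = BeautifulSoup(HTML, "html.parser")
div = soup.select_one(".iH9v7b")

flat = div.get_text(" ", strip=True)  # every string joined with a single space
nested = render(div)                  # one line per paragraph or list item

print(flat)
print("\n".join(nested))
```

The flat version is enough when only readability matters; the recursive version keeps the bullet structure but would need adjusting for pages whose markup differs from this one.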

Obtain a text from a html code with BeautifulSoup

I've been trying to extract the text from the following code with BeautifulSoup in Python:
<a class="w-menu__link" href="https://www.universidadviu.es/grado-economia/">Grado en Economía</a>
I need to extract the text "Grado en Economía" from this and all other similar lines in the html code. For example:
<a class="w-menu__link" href="https://www.universidadviu.es/grado-derecho/">Grado en Derecho</a>
In this line I need to extract "Grado en Derecho".
I can extract the class and the href, but I don't know how to extract the rest of the text. I'm using the following code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

list_of_links_graus = []
html_graus = urlopen("https://www.universidadviu.es/grados-online-viu/")  # Insert your URL to extract
bsObj_graus = BeautifulSoup(html_graus.read(), "html.parser")  # name the parser explicitly
for link in bsObj_graus.find_all('a'):
    list_of_links_graus.append(link.get('href'))
I would also ask if someone can please edit the title of this question in order to fit the real problem, since I'm not a html expert and I suppose I'm not extracting a simple text (as the title says).
Thanks to all in advance.
Use the text attribute:
for link in bsObj_graus.find_all('a'):
    list_of_links_graus.append((link.get('href'), link.text))
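If only the menu links are wanted, the search could also be narrowed by class. Here is a small offline sketch, assuming bs4 is installed; it uses the two anchors from the question plus one made-up link (class "other") to show the filtering, instead of fetching the live page.

```python
from bs4 import BeautifulSoup

# The first two anchors come from the question; the third is hypothetical.
HTML = (
    '<a class="w-menu__link" href="https://www.universidadviu.es/grado-economia/">Grado en Economía</a>'
    '<a class="w-menu__link" href="https://www.universidadviu.es/grado-derecho/">Grado en Derecho</a>'
    '<a class="other" href="/about/">About</a>'
)

soup = BeautifulSoup(HTML, "html.parser")

# Keep only anchors with the menu class and pair each href with its visible text.
links = [
    (a.get("href"), a.get_text(strip=True))
    for a in soup.find_all("a", class_="w-menu__link")
]
print(links)
```

get_text(strip=True) behaves like the text attribute here, but also trims surrounding whitespace, which can help when the markup is spread over several lines.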

(Microsoft Excel) Adding HTML tags to cells

I'm trying to figure out how to add, for example, <p> and </p> at the beginning and end of my cell data. For example:
Before: Los Angeles
After: <p> Los Angeles </p>
I have a whole table with a lot of content to convert this way. I would appreciate the help.
Excel is not a good HTML construction tool.
If you want to concatenate HTML tags with the contents of Excel cells, you can construct the final HTML string by using the & operator between the pieces of text, like this:
="<p>"&A1&"</p>"
Edit: if you need to include formatted dates in this construct, you may want to look at the TEXT function, as shown below. Adjust to the format you need.
=TEXT(A1,"dd mmm yyyy")
Format the cell's value with TEXT using a custom format mask. The backslash can be used as an escape character to avoid conflict with reserved formatting characters, and @ is the placeholder for a text value.
=TEXT(A1, "\<\p\>@\<\/\p\>")
This format mask could also be used as a cell's custom number format.
<p>Los Angeles</p>
<p>Melbourne</p>
<p>Vancouver</p>

input type date, custom format

How can I make it so that the user has to enter a date as YYYY-MM?
The hyphen should already be present in the input box, and the user should only be able to fill in the YYYY and MM parts, nothing else.
An input of type date gives the mm/dd/yyyy format. I want YYYY-MM.
The date input doesn't allow any deviation from its standard format, so to my knowledge this is not possible in plain HTML. However, you could use a standard text input and then apply the Masked Input Plugin by digitalBush (a jQuery plugin) to mask the input format.
For example:
$("#input").mask("9999-99");
You can also set your own parameters or custom placeholders with the plugin; more information is available on its website.
You can't edit the date type format.
If you don't want to use the date picker that HTML5 gives you for free, then you could create your own input box with regex restrictions.
<form>
<input type="text" name="input1" placeholder="YYYY-MM" required pattern="[0-9]{4}-[0-9]{2}" />
<input type="submit">
</form>
If you want stricter validation (this one only checks that the input is a 4-digit number and a 2-digit number separated by a hyphen), refer to this post: "Regex date validation for yyyy-mm-dd".
You can play around with the code here:
https://jsfiddle.net/bowenyang/oth9b8o7/
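As one possible tightening, the month part could be limited to 01 through 12. The sketch below checks such a pattern in Python with re.fullmatch, which mirrors the way the HTML pattern attribute has to match the whole value; YEAR_MONTH and is_year_month are names invented for this example, and the same expression would still need to be tried in the pattern attribute itself.

```python
import re

# Four digits, a hyphen, then a month from 01 to 12.
YEAR_MONTH = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])")


def is_year_month(value):
    """Return True if value is exactly YYYY-MM with a month between 01 and 12."""
    return YEAR_MONTH.fullmatch(value) is not None


for candidate in ["2020-02", "2020-13", "2020-00", "20-02", "2020-2"]:
    print(candidate, is_year_month(candidate))
```

This still accepts any four-digit year, including 0000, so a range check on the year would be a separate step if that matters.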

CSV Hebrew text not in correct order

I am trying to import a CSV file and insert its data into a MySQL database from my PHP project. The problem is that the CSV file contains one column with Hebrew characters. The CSV was converted from an XLS file.
When I open the file in Excel, it displays correctly.
But when I use the CSV file, the order is wrong:
פרקט תלת שכבתי אלון 189x15/4 גרי ישן מעושן גימור שמן UV
Does anybody know how to resolve this problem? Thanks!
The problem is not in my PHP script, which works correctly. The problem is that the Excel cell format is not preserved when I use the data as CSV.
The problem is when an English word or number is mixed in with the text:
Example:
English:
“Can we improve the health of patients by giving them Aspirin?”
Hebrew:
“[Hebrew translated text] Aspirin?”
This is displayed as:
Aspirin [Hebrew translated text]?
Hopefully I explained the issue well enough. It is a little confusing, so if I need to clarify more, please let me know.
Any help or experience is appreciated.
As an RTL language speaker, I think I can be of help.
It all depends on the text direction the UI is using. Most applications use LTR (left-to-right) text direction by default. If you are using MySQL Workbench to see the values stored in the column, MySQL Workbench uses LTR direction as well. That is why the order looks wrong when the text is bidirectional (text mixed with numbers).
Keep in mind that a CSV file is merely UTF-8 plain text, which means the text carries no style and no direction. You only need to set your HTML direction to RTL. See the example below:
<h3>Wrong LTR Direction</h3>
<p dir="ltr">פרקט תלת שכבתי אלון 189x15/4 גרי ישן מעושן גימור שמן UV</p>
<h3>Correct RTL Direction</h3>
<p dir="rtl">פרקט תלת שכבתי אלון 189x15/4 גרי ישן מעושן גימור שמן UV</p>
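The same idea can be applied when the page is generated from the CSV data. The sketch below is written in Python rather than the PHP mentioned in the question, and the column names sku and description, the sample row, and the helper rows_to_html are all assumptions made for illustration. The stored text is left untouched; only the dir attribute on the output element changes how it is displayed.

```python
import csv
import html
import io

# Hypothetical CSV content; a real file would be opened with
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
CSV_TEXT = (
    "sku,description\n"
    "1001,פרקט תלת שכבתי אלון 189x15/4 גרי ישן מעושן גימור שמן UV\n"
)


def rows_to_html(csv_text):
    """Wrap each description in a paragraph whose direction is set to RTL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    parts = []
    for row in reader:
        description = html.escape(row["description"])
        parts.append(f'<p dir="rtl">{description}</p>')
    return "\n".join(parts)


print(rows_to_html(CSV_TEXT))
```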
Salam :)