iMacros - extracting text - extract

Can you explain to me why my EXTRACT doesn't work? I am trying to count the number of users with private profiles in my group (because mostly these are bots). So I need to check whether the string "This profile is private" exists on the user's page.
After the code runs, a blue frame appears around the DIV which means the element is pinned correctly. However, the extract result is NaN.
I tried extracting both TXT and HTM.
iimPlay("CODE:TAG POS=1 TYPE=DIV ATTR=TXT:This<SP>profile<SP>is<SP>private EXTRACT=TXT");
var pageblock = parseInt (iimGetLastExtract());
alert (pageblock);
An example page with a private profile:
https://vk.com/id646170325

That is expected: the extracted value, "This profile is private", is a string that does not start with a number, so parseInt() cannot convert it to an integer and the result is NaN.
The behaviour therefore looks normal to me.

Related

Read tif tag from a tif file using LibTiff [Edited: adding sample code]

I have a requirement where I need to read a couple of TIFF tags from an input TIFF file. The user can provide any tag ID to read, so I need to know the type of that tag's value in order to read the tag and return the value to the user.
const char* filename = "C:\\test\\Modified.tif";
TIFF* mtif = TIFFOpen(filename, "r");
uint16 flor, w, h;
uint16 gotcount = 0;
TIFFGetField(mtif, TIFFTAG_FILLORDER, &flor);
TIFFGetField(mtif, TIFFTAG_IMAGEWIDTH, &w);
TIFFGetField(mtif, TIFFTAG_IMAGELENGTH, &h);
I am using the LibTiff library. I am able to read the width and height properly, whereas the FillOrder tag value is not being returned.
I opened the file in a Tiff editor and can see that FillOrder has valid value.
Can someone help me in this? Thanks.

.text is scrambled with numbers and special keys in BeautifulSoup

Hello, I am currently using Python 3, BeautifulSoup 4, and requests to scrape some information from the UK version of supremenewyork.com. I have implemented a proxy script (that I know works) into the script. The only problem is that this website does not like programs scraping this information automatically, so they have decided to scramble the text, which I think makes it unusable as text.
My question: is there a way to get the text without using .text, and/or is there a way to get the script to read the text so that when it sees a special character like # it skips over it, or when it sees & it skips until it sees ;?
Basically, that is how this website scrambles the text. Here is an example; the text shown when you inspect the element is:
supremetshirt
Which is supposed to say "supreme t-shirt" and so on (you get the idea, they don't use letters to scramble only numbers and special keys)
This is highlighted in a box automatically when you inspect the element using a VPN on the UK Supreme website, and it is different from the text (which isn't highlighted at all). Whenever I run my script without the proxy code against my local supremenewyork.com, it works fine (but only because the text is not scrambled on my local website, and I want to pull this info from the UK website). Any ideas? Here is my code:
import requests
from bs4 import BeautifulSoup
categorys = ['jackets', 'shirts', 'tops_sweaters', 'sweatshirts', 'pants', 'shorts', 't-shirts', 'hats', 'bags', 'accessories', 'shoes', 'skate']
catNumb = 0
#use new proxy every so often for testing (will add something that pulls proxys and usses them for you.
UK_Proxy1 = '51.143.153.167:80'
proxies = {
    'http': 'http://' + UK_Proxy1 + '',
    'https': 'https://' + UK_Proxy1 + '',
}
for cat in categorys:
    catStr = str(categorys[catNumb])
    cUrl = 'http://www.supremenewyork.com/shop/all/' + catStr
    proxy_script = requests.get(cUrl, proxies=proxies).text
    bSoup = BeautifulSoup(proxy_script, 'lxml')
    print('\n*******************"'+ catStr.upper() + '"*******************\n')
    catNumb += 1
    for item in bSoup.find_all('div', class_='inner-article'):
        url = item.a['href']
        alt = item.find('img')['alt']
        req = requests.get('http://www.supremenewyork.com' + url)
        item_soup = BeautifulSoup(req.text, 'lxml')
        name = item_soup.find('h1', itemprop='name').text
        #name = item_soup.find('h1', itemprop='name')
        style = item_soup.find('p', itemprop='model').text
        #style = item_soup.find('p', itemprop='model')
        print (alt +(' --- ')+ name +(' --- ')+ style)
        #print(alt)
        #print(str(name))
        #print (str(style))
When I run this script I get this error:
name = item_soup.find('h1', itemprop='name').text
AttributeError: 'NoneType' object has no attribute 'text'
So I uncommented the lines that are commented out above and commented out their counterparts. I then got some kind of str error, so I tried print(str(name)). I am able to print the alt fine (in every version of the script, the alt is not scrambled), but when it comes to printing the name and style, all it prints is None under every alt.
I have been working on fixing this for days and have come up with no solutions. Can anyone help me solve this?
I solved my own question using this solution:
thetable = soup5.find('div', class_='turbolink_scroller')
items = thetable.find_all('div', class_='inner-article')
for item in items:
    alt = item.find('img')['alt']
    name = item.h1.a.text
    color = item.p.a.text
    print(alt,' --- ', name, ' --- ',color)
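On the scrambled text described in the question: if the raw markup really contains HTML character references (an assumption, since the scrambled source is not shown), Python's standard library can decode them without any manual skipping of & and ;. A minimal sketch; `decode_entities` is a hypothetical helper name and the sample string is made up for illustration:

```python
import html


def decode_entities(raw: str) -> str:
    """Turn HTML character references such as &#116; or &amp; back into text."""
    return html.unescape(raw)


# Hypothetical input: "t-shirt" written as decimal character references.
scrambled = "&#116;&#45;&#115;&#104;&#105;&#114;&#116;"
print(decode_entities(scrambled))
```

Note that BeautifulSoup normally decodes such references itself when it parses a page, so this would mainly matter when working with the raw response text.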

How can I display a TemporaryUploadedFile from Django in HTML as an image?

In Django, I have programmed a form in which you can upload one image. After uploading the image, the image is passed to another method with the type TemporaryUploadedFile, after executing the method it is given to the HTML page.
What I would like to do is display that TemporaryUploadedFile as an image in HTML. It sounds quite simple to me but I could not find the answer on StackOverflow or on Google to the question: How to display a TemporaryUploadedFile in HTML without having to save it first, hence my question.
All help is appreciated.
Edit 1:
To give some more information about the code and the variables while debugging.
input_image = next(iter(request.FILES.values()))
output_b64 = (input_image.content_type, str(base64.b64encode(input_image.read()), 'utf8'))
Well, you can encode the image to base64 and use a data url as the value for src.
A base64 data url consists of the file type followed by the base64 encoded data, and looks like this:
<img src="data:<file type>;base64,<base64 encoded data>">
Read the Mozilla docs for more on data urls.
Here's some relevant code:
import base64
from django.shortcuts import render

def my_view(request):
    # assuming `image` is a <TemporaryUploadedFile object>
    image_b64 = base64.b64encode(image.read())
    image_b64 = image_b64.decode('utf8')  # convert bytes to string
    image_type = image.content_type  # MIME type, e.g. image/png or image/jpeg
    # render() takes the request as its first argument
    return render(request, 'template', {'image_b64': image_b64, 'image_type': image_type})
Then in your template:
<img src="data:{{ image_type }};base64,{{ image_b64 }}">
I want to thank xyres for pushing me in the right direction. As you can see, I used some parts of his solution in the code below:
import base64
from io import BytesIO
from PIL import Image as pil_image  # assumed alias for Pillow's Image module

# As input I take one image from the form.
temp_uploaded_file = next(iter(request.FILES.values()))
# The TemporaryUploadedFile is converted to a Pillow Image
input_image = pil_image.open(temp_uploaded_file)
# Remember the format now so it can be reused when saving and in the MIME type
image_format = input_image.format
# The input image does not have a name so I set it afterwards. (This step, of course, is not mandatory)
input_image.filename = temp_uploaded_file.name
# The image is saved to an InMemoryFile
output = BytesIO()
input_image.save(output, format=image_format)
# Then the InMemoryFile is encoded
img_data = str(base64.b64encode(output.getvalue()), 'utf8')
output_b64 = ('image/' + image_format.lower(), img_data)
# Pass it to the template
return render(request, 'visualsearch/similarity_output.html', {
    "output_image": output_b64
})
In the template (the context key is output_image, so that is the name to index):
<img id="output_image" src="data:{{ output_image.0 }};base64,{{ output_image.1 }}">
The current solution works but I don't think it is perfect because I expect that it can be done with less code and faster, so if you know how this can be done better you are welcome to post your answer here.
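If the image does not need to be re-encoded, the Pillow round trip can probably be skipped, because the uploaded bytes can be base64-encoded directly, as in the edit to the question. A minimal sketch using only the standard library; `to_data_url` is a hypothetical helper name, and the placeholder bytes stand in for what `temp_uploaded_file.read()` would return:

```python
import base64


def to_data_url(content_type: str, raw: bytes) -> str:
    """Build a data URL usable as an <img> src from a MIME type and raw bytes."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


# Placeholder bytes; in the view these would come from temp_uploaded_file.read()
# and the MIME type from temp_uploaded_file.content_type.
print(to_data_url("image/png", b"abc"))
```

The view could then pass this single string to the template and use it directly as the src value.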

Second Scraper - If Statement

I am working on my second Python scraper and keep running into the same problem. I would like to scrape the website shown in the code below. I would like to be able to input parcel numbers and see whether their Property Use Code matches. However, I am not sure whether my scraper is finding the correct row in the table. I am also not sure how to write the if statement for the case where the use code is not 3730.
Any help would be appreciated.
from bs4 import BeautifulSoup
import requests
parcel = input("Parcel Number: ")
web = "https://mcassessor.maricopa.gov/mcs.php?q="
web_page = web+parcel
web_header={'User-Agent':'Mozilla/5.0(Macintosh;IntelMacOSX10_13_2)AppleWebKit/537.36(KHTML,likeGecko)Chrome/63.0.3239.132Safari/537.36'}
response=requests.get(web_page,headers=web_header,timeout=100)
soup=BeautifulSoup(response.content,'html.parser')
table=soup.find("td", class_="Property Use Code" )
first_row=table.find_all("td")[1]
if first_row is '3730':
    print (parcel)
else:
    print ('N/A')
There's no td with class "Property Use Code" in the html you're looking at - that is the text of a td. If you want to find that row, you can use
td = soup.find('td', text="Property Use Code")
and then, to get the next td in that row, you can use:
otherTd = td.find_next_sibling()
or, if you want them all:
otherTds = td.find_next_siblings()
It's not clear to me what you want to do with the values of these tds, but you'll want to use the text attribute to access them: your first_row is '3730' will always be False, because first_row is a bs4.element.Tag object here and '3730' is a str. You can, however, get useful information from otherTd.text == '3730'.
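The difference between `is` and `==` described above can be sketched without BeautifulSoup. `FakeTag` below is a hypothetical stand-in for bs4.element.Tag (an object that carries text), and `matches_use_code` is an illustrative helper, not part of any library:

```python
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag: an object that carries text."""

    def __init__(self, text):
        self.text = text


def matches_use_code(cell, expected="3730"):
    # Compare the cell's text by value; strip() drops surrounding whitespace.
    return cell.text.strip() == expected


expected = "3730"
cell = FakeTag(" 3730 ")

# Identity check: the tag object is never the same object as the string.
print(cell is expected)
# Value check on the text: this is the comparison the if statement needs.
print(matches_use_code(cell))
```

The strip() call is an extra precaution, on the assumption that table cells may contain surrounding whitespace.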

Control name from other window

I need to read a text value from another window and pass that value as a query to another application (my question concerns the first task).
I’m “spying” on another window (a third-party application we use in connection with our product) and waiting for the “accept” button to be clicked so that I can read a value from a text box. This other application, a dialog box, has multiple text boxes and command buttons.
I made a mouse hook and I activate it when this application appears. I read all mouse moves inside the window rectangle, as well as texts, captions, child window IDs, and rectangles, and I capture left/right/middle/wheel clicks. I can capture the “accept” button click: I can see the button caption, read that window, get the text, and determine which button was clicked.
I can read all EDIT class values and get their window handles, rectangles, etc., but I cannot identify them as unique items within the class collection, and I need to read specifically my desired text box value. Fortunately, the text box I’m interested in always comes first in my loop when I read texts from the EDIT class windows. However, I would like to be more specific and make sure that I’m reading the text box with a given name. I know that during development I could read that name and hard-code it in the program. My suspicion is that the control name is not saved in the binary. My understanding is that the control ID and window handle are created when the window is created and have no reference to the control name (say, txtOrderNumber). For buttons I can be specific because of the button captions (so I can determine which button was clicked), but for EDIT class items I’m stuck relying on a lucky first guess when reading the value.
My question is:
How can I get a control name from another window? For this task I’m interested in EDIT class instance names.
Here is some (shortened) code from the project:
Dim hWnd As IntPtr = FindWindow(Nothing, _windowText)
'API: FindWindowEx
'API: SendMessage
'API: GetClassName
'API: GetWindowTextLength
'API: GetWindowText
'API: WM_GETTEXT
Public Shared Function GetClassValues(_controlClass As String, _hWindow As IntPtr) As List(Of String)
    Dim cl As New List(Of String)
    'First control handle in that class
    Dim hc As IntPtr = FindWindowEx(_hWindow, IntPtr.Zero, _controlClass, vbNullString)
    Do
        Dim sv As String = GetWindowValue(hc)
        cl.Add(sv)
        'Next control (after hc) handle
        hc = FindWindowEx(_hWindow, hc, _controlClass, vbNullString)
    Loop Until hc = IntPtr.Zero
    Return cl
End Function
Public Shared Function GetWindowValue(_hWindow As IntPtr) As String
    If _hWindow = IntPtr.Zero Then Return String.Empty
    Dim sz As Integer = 256 'Buffer size in characters
    'Two bytes per character, since the text is read back as UTF-16
    Dim bf As IntPtr = Marshal.AllocHGlobal(sz * 2)
    Dim pt As IntPtr = SendMessage(_hWindow, WM_GETTEXT, sz, bf)
    Dim rs As String = Marshal.PtrToStringUni(bf)
    'Memory from AllocHGlobal is freed with FreeHGlobal (Marshal.Release is for COM pointers)
    Marshal.FreeHGlobal(bf)
    Return rs.Trim
End Function
Public Shared Function GetWindowClassName(_hWindow As IntPtr) As String
    Dim ln As Integer = 256
    Dim sb As New System.Text.StringBuilder("", ln)
    GetClassName(_hWindow, sb, ln)
    Return sb.ToString()
End Function
Public Shared Function GetWindowText(_hWindow As IntPtr) As String
    Dim ln As Integer
    'Compare against IntPtr.Zero; ToInt32 can overflow for 64-bit handles
    If _hWindow = IntPtr.Zero Then Return String.Empty
    ln = GetWindowTextLength(_hWindow)
    If ln = 0 Then Return String.Empty
    Dim sb As New System.Text.StringBuilder("", ln + 1)
    GetWindowText(_hWindow, sb, sb.Capacity)
    Return sb.ToString()
End Function
I've looked at GetWindowLong and GetDlgCtrlID API and have tried most of the flags with no success so far...
Any tip, clue, direction is appreciated.
Thank you
I made a global mouse hook; that is not a problem, and GetWindowText and WM_GETTEXT work fine. As a matter of fact, the program works and is functional at this point.
Upon detecting a target window, I save the child window handles in a list using the EnumChildWindows API, filtering for EDIT class windows only (in connection with a modified version of the GetClassValues function posted above; the argument for this function is the first EDIT class window handle). The way I access my desired text box is to use the saved list of this class's windows and access it by list index. As I mentioned earlier, Windows fortunately creates these child windows in a consistent order, so in my case this EDIT class window, the text box “object”, is always the first in the list, even though there are many in the main window.
I would like to get the name of that text box “object”, say “txtAccountNumber”, as I mentioned earlier.