I want to extract these telephone numbers from the website, either as an image or if possible as a string.
Here is an example from the website: Link
As you can see the telephone number is an image.
However I cant seem to view the image when I open the image source:
<img src="http://www.callmyname.sg/search/display_phone_number/VUhkVE1WOW5BV1lFWWxSbVhUdFRObGMzQlRBRU9nPT0=">
But when put into html and viewed in a browser, you can see the image fine.
It's a solution to prevent people like you from scraping their website :)
The url http://www.callmyname.sg/search/display_phone_number/VUhkVE1WOW5BV1lFWWxSbVhUdFRObGMzQlRBRU9nPT0= leads to a script that generates the image - probably based on the argument.
VUhkVE1WOW5BV1lFWWxSbVhUdFRObGMzQlRBRU9nPT0=
Since it ends with an equals sign, I tried to decode it as base64:
UHdTMV9nAWYEYlRmXTtTNlc3BTAEOg==
Now it looks even more like base64, so I tried another round:
PwS1_gfbTf];S6W70:
So it's clearly not plaintext (or not encoded with base64), which would be ridiculous and would let you extract the number this way. They either use some special cipher, or store the numbers in database with this as identifier.
I don't think you can steal the phone number easily, only using OCR perhaps.
When you visit the URL, you will get garbage, since they do not send proper MIME header
�PNG IHDR�,���tRNS���7X}4IDATx���_HZo�g�� E��p��l��EHTx!]�DtQ�M�.x3��.dx�*b]Dl"]�D���bQq.B����Z2$��:ȡ�wq��9�s���Cx>W�}���ٳ��ڶ����]���Ǐ�/_���ݿ���ahh���\q����������555�=���*�"�*�*�f�����}uu�e�d2���o����?00p����J%ȴds���BB�˲�`�`0RJy����n�{cc�e�H$b�ۻ����(�~�_����A4�Z��_�V|��J�w�����t:��333.��ƕ������+^����L`���֑��W��3�X�" y���$p'U"��F���y���z&�ioo��萟�*� ����\�L&Sx����p�e���ׯ_R��y�J%�~����|qq��|e�Z%:�J�{��q��nW�ՉD"�J��~�n4��������̔Ty���qF���>BwGa�z����������8��ߡc�f��B�>!�Ub�N�s���|�F�^/B���Lj��i��NfJ��͛D"����� o!t��`����fvv�eم��V���D)�����x���d2966&�n� ^,0O4��(!D��l�h46�-�~��Tً>B�"�Q�>,�P��ok#U \�BU,�P���=G SA+GIEND�B`�
but it's really just ordinary PNG image:
img http://www.callmyname.sg/search/display_phone_number/VUhkVU5scGlBV1lDWWdFelVEUUhZQWRvQlRZR013PT0=
It's a PNG image, but the server doesn't specify the right content header. It tells your browser that is't an html page in UTF-8 encoding, so you just see some garbage (including the letters PNG at the start).
The <img> tag though doesn't know how to display text so it just tries to load it as an image (and with success).
I don't see a way to extract the numbers in any other way than just reading the image. Because it contains only numbers and will have a similar format all the time, maybe you can find a simple way to parse it instead of using a full fledged OCR library.
It's actually a png-file, generated by a computer before being displayed. You can reference it fine from any other page though, and you should also be able to download it easily (right click, save as ...) Note: I tested this, make sure you save the image with the extension .png and not .html which it will default to.
<img src="http://www.callmyname.sg/search/display_phone_number/QkNOVE1RODNBV1lDWWdVM1V6ZFZNZ1JyRFQ0Rk1BPT0=">
Related
In my website, when I hover over 547864852 number it shows me tel:%E2%80%AD547864852%E2%80%AC. But on click it goes to the correct number. Can anyone suggest me what to do? This happens in my one if WordPress site.
Screenshot:
This is called URL Encoding, difference set of characters in such urls are represented in URL encoding in the back.URL encoding converts characters into a format that can be transmitted over the Internet. For more info see - https://www.w3schools.com/tags/ref_urlencode.ASP
If the link does go where you wanted it to, don't worry.
I'm attempting to customize the webUI of a Tasmota binary for an espressif based device. My goal is to embed a base64 encoded PNG into the main page so that the image is available even if the internet connection is unavailable.
I have converted a sample image to base64 via CLI as well as several online converters, and have validated the base64. if I plug the data URI into my browser, the image renders correctly, but if I compile the binary and flash the device to view the page, I receive an error and a broken thumbnail.
my current image is
iVBORw0KGgoAAAANSUhEUgAAAEgAAAAPCAMAAABwbnmhAAADAFBMVEUAAAD////+/v79/f38/Pz7+/v6+vr5+fn4+Pj39/f29vb19fX09PTz8/Py8vLx8fHw8PDv7+/u7u7t7e3s7Ozr6+vq6urp6eno6Ojn5+fm5ubl5eXk5OTj4+Pi4uLh4eHg4ODf39/e3t7c3Nzb29va2trY2NjX19fW1tbV1dXU1NTT09PS0tLR0dHQ0NDPz8/Ozs7Nzc3MzMzLy8vKysrJycnIyMjHx8fGxsbFxcXExMTDw8PCwsLBwcHAwMC9vb27u7u3t7e1tbWysrKxsbGwsLCtra2rq6upqamoqKimpqalpaWkpKSjo6OioqKhoaGgoKCenp6ampqZmZmYmJiXl5eVlZWTk5OSkpKRkZGQkJCPj4+Ojo6NjY2MjIyLi4uJiYmIiIiHh4eGhoaFhYWEhISBgYGAgIB/f39+fn59fX18fHx6enp5eXl3d3d1dXVycnJwcHBvb29ra2tqamppaWloaGhnZ2dmZmZlZWVkZGRiYmJhYWFfX19eXl5dXV1cXFxbW1taWlpZWVlYWFhXV1dWVlZVVVVUVFRTU1NSUlJRUVFQUFBPT09NTU1MTExLS0tKSkpISEhHR0dGRkZFRUVDQ0NBQUFAQEA/Pz8+Pj49PT08PDw7Ozs6Ojo5OTk4ODg3Nzc2NjY0NDQyMjIxMTEvLy8uLi4tLS0rKysqKiopKSkoKCgnJycmJiYlJSUkJCQjIyMiIiIhISEgICAfHx8dHR0cHBwbGxsZGRkYGBgWFhYVFRUUFBQTExMSEhIREREQEBAPDw8ODg4NDQ0MDAwLCwsKCgoJCQkICAgHBwcGBgYFBQUEBAQDAwMCAgIBAQH///8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAfIeyiAAABAHRSTlP//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////wD/////////////////////////////////////////////////////////////c2WY+gAAAAlwSFlzAAALEwAACxMBAJqcGAAABxhpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDYuMC1jMDAyIDc5LjE2NDQ2MCwgMjAyMC8wNS8xMi0xNjowNDoxNyAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczpkYz0iaHR0cDovL3B1cmwub3JnL2RjL2VsZW1lbnRzLzEuMS8iIHhtbG5zOnBob3Rvc2hvcD0iaHR0cDovL25zLmFkb2JlLmNvbS9waG90b3Nob3AvMS4wLyIgeG1sbnM6eG1wTU09Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9tbS8iIHhtbG5zOnN0RXZ0PSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvc1R5cGUvUmVzb3VyY2VFdmVudCMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIDIxLjIgKE1hY2ludG9zaCkiIHhtcDpDcmVhdGVEYXRlPSIyMDIxLTAxLTIxVDE3OjQxOjAxLTA4OjAwIiB4bXA6TW9kaWZ5RGF0ZT0iMjAyMS0wMS0yOFQxODo1NTozNi0wODowMCIgeG1wOk1ldGFkYXRhRGF0ZT0iMjAyMS0wMS0yOFQxODo1NTozNi0wODowMCIgZGM6Zm9ybWF0PSJpbWFnZS9wbmciIHBob3Rvc2hvcDpDb2xvck1vZGU9IjIiIHBob3Rvc2hvcDpJQ0NQcm9maWxlPSJzUkdCIElFQzYxOTY2LTIuMSIgeG1wTU06SW5zdGFuY2VJRD0ieG1wLmlpZDo2ZjJlM2NiYi05NjI1LTQwNjMtOTlmMi05MDZkNDc3ZTgzNmUiIHhtcE1NOkRvY3VtZW50SUQ9ImFkb2JlOmRvY2lkOnBob3Rvc2hvcDpmYTE1NDdjZi03MzhmLWYwNDMtOWNjNC1mMGNhMTM1ZWI3YTEiIHhtcE1NOk9yaWdpbmFsRG9jdW1lbnRJRD0ieG1wLmRpZDo3OWRlMGZkYy0zZDAyLTQ4NzYtYWVmZC03Zjg0ZWUxN2QyNDciPiA8eG1wTU06SGlzdG9yeT4gPHJkZjpTZXE+IDxyZGY6bGkgc3RFdnQ6YWN0aW9uPSJjcmVhdGVkIiBzdEV2dDppbnN0YW5jZUlEPSJ4bXAuaWlkOjc5ZGUwZmRjLTNkMDItNDg3Ni1hZWZkLTdmODRlZTE3ZDI0NyIgc3RFdnQ6d2hlbj0iMjAyMS0wMS0yMVQxNzo0MTowMS0wODowMCIgc3RFdnQ6c29mdHdhcmVBZ2VudD0iQWRvYmUgUGhvdG9zaG9wIDIxLjIgKE1hY2ludG9zaCkiLz4gPHJkZjpsaSBzdEV2dDphY3Rpb249ImNvbnZlcnRlZCIgc3RFdnQ6cGFyYW1ldGVycz0iZnJvbSBpbWFnZS9naWYgdG8gaW1hZ2UvcG5nIi8+IDxyZGY6bGkgc3RFdnQ6YWN0aW9uPSJzYXZlZCIgc3RFdnQ6aW5zdGFuY2VJRD0ieG1wLmlpZDo3ODExOTllZS1kNzdjLTRmM2ItODRmZi05ZmRlMGJlY2I5M2QiIHN0RXZ0OndoZW49IjIwMjEtMDEtMjhUMTc6MjM6NDQtMDg6MDAiIHN0RXZ0OnNvZnR3YXJlQWdlbnQ9IkFkb2JlIFBob3Rvc2hvcCAyMS4yIChNYWNpbnRvc2gpIiBzdEV2dDpjaGFuZ2VkPSIvIi8+IDxyZGY6bGkgc3RFdnQ6YWN0aW9uPSJzYXZlZCIgc3RFdnQ6aW5zdGFuY2VJRD0ieG1wLmlpZDo2ZjJlM2NiYi05NjI1LTQwNjMtOTlmMi05MDZkNDc3ZTgzNmUiIHN0RXZ0OndoZW49IjIwMjEtMDEtMjhUMTg6NTU6MzYtMDg6MDAiIHN0RXZ0OnNvZnR3YXJlQWdlbnQ9IkFkb2JlIFBob3Rvc2hvcCAyMS4yIChNYWNpbnRvc2gpIiBzdEV2dDpjaGFuZ2VkPSIvIi8+IDwvcmRmOlNlcT4gPC94bXBNTTpIaXN0b3J5PiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PrCJQdUAAADOSURBVDgRY2CkEmAYjAYxMDBoI/gMDBuROQwMoXAWXNwGmQOSycQwCKxhBQGDrFB4UDbEICFknQwMC/AaZAFVBeWmwA1C8ifDQoiaqSjmMqDoNGWYj2oSsotgIoyMWyBqOnEaZMwwDS6Bw6ADDOwIzcy4vAamdFHdhIg1DyBrPwMLsjNwGATV7IiqCm7QBUbGvQxMqCGO1SC4I0LhqrRhBrmB+XuQQgKVgWwQkqICGAtukAOQcQSh5ChQ8ChUDZjkQRiEEu6+COvODsq8BgBH7haTEIBidQAAAABJRU5ErkJggg==
and my insertion is
"<img src='data:image/png;charset=utf-8;base64,**base64string**'>"
which I have tried both with and without specifying charset, I've tried inserting my code within or around div and p tags...
this webserver is contained within a .ino which I am compiling with platformio, hence the outer " " and the inner ' '
safari is reporting: Failed to load resource: Data URL decoding failed
when I view the source, I see
<img src='data:image/png;charset=utf-8;base64,iVBORw0KGgoAAAANSUhEUgAAAR8AAAA6CAMAAAC+n61OAAADAFBMVEUAAAD////+/v79/f38/Pz7+/v6+vr5+fn4+Pj39/f29vb19fX09PTz8/Py8vLx8fHw8PDv7+/u7u7t7e3s7Ozr6+vq6urp6eno6Ojn5+fm5ubl5eXk5OTj4+Pi4uLh4eHg4ODf39/e3t7c3Nzb29va2trY2NjX19fW1tbV1dXU1NTT09PS0tLR0dHQ0NDPz8/Ozs7Nzc3MzMzLy8vKysrJycnIyMjHx8fGxsbFxcXExMTDw8PCwsLBwcHAwMC9vb27u7u3t7e1tbWysrKxsbGwsLCtra2rq6upqamoqKimpqalpaWkpKSjo6OioqKhoaGgoKCenp6ampqZmZmYmJiXl5eVlZWTk5OSkpKRkZGQkJCPj4+Ojo6NjY2MjIyLi4uJiYmIiIiHh4eGhoaFhYWEhISBgYGAgIB/f39+fn59fX18fHx6enp5eXl3d3d1dXVycnJwcHBvb29ra2tqamppaWloaGhnZ2dmZmZlZWVkZGRiYmJhYWFfX19eXl5dXV1cXFxbW1taWlpZWVlYWFhXV1dWVlZVVVVUVFRTU1NSUlJRUVFQUFBPT09NTU1MTExLS0tKSkpISEhHR0dGRkZFRUVDQ0NBQUFAQEA/Pz8+Pj49PT08PDw7Ozs6Ojo5OTk4ODg3Nzc2NjY0NDQyMjIxMTEvLy8uLi4tLS0rKysqKiopKSkoKCgnJycmJiYlJSUkJCQjIyMiIiIhISEgICAfHx8dHR0cHBwbGxsZGR</div><div id=' l1' name='l1'>
it's clearly not pulling all of the base64 or truncating it. I'm counting 799 characters in the browser debug of the base64 string, despite the full length of the string is 5168 characters. additionally, safari is complaining about eb('l1').innerHTML = s; citing "null is not an object" and I've no idea what that means or where's coming from.
I'll be the first to admit that I have a tenuous grasp of HTML, even less of CSS, and even less of web language contained within a .ino
happy to receive any pointing in any direction, if this is too big of a can of worms to digest.
oh, also I suppose it's possible to enable SPIFFS and just host the .png on the device's filesystem, but I'm probably even further away from understanding how to implement that. if that's easier, I'm more than willing to go down that road.
Flash size is 1M and binary is currently 602K, so there is some room left.
source code available at https://github.com/arendst/Tasmota/releases/
Tasmota v9.2.0
EDIT:
I have created a 5px x 5px (and consequently 1px x 1px) test image, and same issue. if I load the page and debug, I see the error. If I edit the source html and paste the correct code and close the ", the image then loads.
the browser shows:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAIAAAACDbGyAAAK3GlDQ1BEaXNwbGF5AABIiZWXB1BTWReAz3uphIQWCEVK6E2QXqWEHkBBOohKSAIJJYQUVOzK4gquBRURLCu6IqLg6grIWhALtkWw9wVZVNR1sWBDZR/wE/x35p9/9sycd7935txT3tw7cx4ALZUjFuegagC5IpkkJjSAmZScwiT1AQLaoAY4sOBwpWJWdHQkYDK+fiMIwLubI0+Aa3YjseDfiQaPL+ViYVIxTudJubkYt2L6lSuWyABwGIPpXJl4hP/AWFOCFYjxhxHOHGU8dYTTx5g56hMXE4ixCwCZyuFIMgGo/pidWcDNxOJQ0zB2EPGEIozXYezLFXB4GHdiPDk3N2+EP2NshfmLAWhmGHukfxMz87/ipyvicziZCh7ra1TIQUKpOIcz/19+mv8vuTny8RwWmFIFkrCYMUZuZ+dFKFiUPj1qnIW8cX/ktkAeFj/OXGlgyjjzOEERir050yPHOUMYwlbEkbHjxpkvDY4dZ0lejCJXhiSQNc4cyUReeXa8wi7gsxXxCwVxieNcIEyYPs7S7NiICZ9AhV0ij1HUzxeFBkzkDVH0niv9pl8hW7FXJogLU/TOmaifL2JNxJQmKWrj8YOCJ3ziFf5iWYAilzgnWuHPzwlV2KUFsYq9MuxwTuyNVnzDLE549DhDHAhADiLgAR8kkA55kAMyYEIQCEEKYuyNA9hxkvHnyUaaC8wTz5cIMwUyJgu7gXwmW8S1n8x0cnByBBi5z2NH5A1j9J4ijIsTtvxWAM8</div><div id=" l1'="" name="l1">
I can see that the base64 is truncated, and the SRC URL is no longer closed by a second set of ". if I manually insert the closing " paste the remainder of the code into the source, the image then renders.
it looks like something in the string (or perhaps a length restriction in the webserver.ino is breaking this?
Well here´s the thing, I am using the Terminal to do this. With a text editor such as nano I create a plain text file with the content: "GIF89a2017" and I save it as rare.gif
Here´s the thing, when I do file rare.gif it gives me this output: rare.gif: GIF image data, version 89a, 12338 x 14129 and that is indicating that it is a GIF image with a resolution of 12338 x 14129 and that's what I don't understand. Where´s that resolution coming from?
Another thing is, I thought extension didn't really decide what type of file it is, for example when I take a .gif and convert it into and .exe it still recognises it as a GIF image with the file command. I'm gonna guess that in the problem that I have it is recognised as a GIF image because it was created with the GIF extension but I'd like to know why.
Thanks to everyone!
Where´s that resolution coming from?
It's coming from the (bogus) GIF89 header you put in the file. The four bytes following "GIF89a" define the width and height. Each one is stored as a 16-bit unsigned integer. The characters you put there -- 2017 -- are interpreted as:
32 30 ("20") -- 0x3032 = 12338
31 37 ("17") -- 0x3731 = 14129
I'm gonna guess that in the problem that I have it is recognised as a GIF image because it was created with the GIF extension but I'd like to know why.
No, file doesn't look at extensions. It's because the file had a semi-valid GIF header. If you changed the header to something that didn't start with "GIF89a", it will no longer be recognized as a GIF.
In Safari, when I drag an image from the browser window to the desktop the image take its filename from the last part of the URL. For example:
http://www.mysite.com/images/05
the image name is
05.jpeg
Is this a behaviour consistent across all (recent IE8+) browsers?
Can I decide an arbitrary filename the image will get when dragged out of the browser?
I tried (in Safari) to set the name and alt tag of the image but this doesn't have any effect.
Maybe can I decide the filename setting it in the header of the server response when the image is served?
One method is to specify the desired filename in the header of the response when the file is served.
I'm on php so...
header('Content-disposition: inline; filename=the-image.jpg');
When the image is dragged from the browser window to the desktop the file name is the-image.jpg
Unfortunately this is not consistent across all browsers, in particular Firefox doesn't follow the rule and sticks to the last part of the URL for giving the name.
The solution that works across all browsers is to avoid specifying the name in the header of the response and set the name as the last part of the URL.
As I can manage the routes for my website the solution I adopted is to let the route to images end with a string that is ignored by the server and has the sole purpose of defining a filename for the image in case it's dragged out of the browser.
For example:
http://www.my-site.com/images/05/my%20custom%20filename.jpg
What tells the server what image the client wants is the parameter following images, so 05 in the example.
It's important to note that the filename must be URI-component encoded, escaping spaces, slashes, percents, and so on...
The filename, to be OS friendly, should then be scrubbed from slashes, back-slashes and other characters that may eventually create mess.
I'm only asking because I tried sending an email (with PEAR) which containd an image whose name was something like "header image_3021" and noticed that it didn't show up in the email.
When I checked the SRC in the recieved email the space was replaced with + and that somehow made the link point to the wrong file. Now, IIRC, + is a correct encoding for spaces in URLs yet the browser could not locate "header+image_2031".
I checked the original content of the email both with Gmail's show original and in the server logs and the space was still there, so the replacement was done either by the browser or by Gmail's rendering process.
I have since modified my upload algorithms to not allow spaces in filenames but I have to ask: What's the best way to make sure the browser will display images with spaces in their file names? Replace them with %20 myself? Let the browser do it? Just disallow them?
Definitely encode them. By doing so, you remove the nuance of how different clients will interpret the string. As #mr alien said, there are out of box php functions that will handle that for you.