excelvba selenium cannot get html address - html

I have a problem with the HTML page below. Because the page is encrypted, I have uploaded a screenshot.
I want to get the text in the 1st field. Unfortunately, nothing comes back except the text in the 2nd field.
<div class="col-sm-4" id="anneAdi">
<i class="feather icon-eye-off f-20 bak2" data-id="0" data-bak="anneAdi"></i>
<img src="/Common/bitMapResimGoster.aspx?BitMapResim= : SEVİM"></div>
I want to write SEVİM into an Excel cell.
elementler = baglan.FindElementByCss("#anneAdi > img").Attribute("src")
elementler = baglan.FindElementByCss("img").Attribute("src")
The result looks like this: %C2%A0:%C2%A0SEV%C4%B0M
I tried repeatedly but it didn't work.
Sorry for my broken English. I'm waiting for your help. Thanks.

If %C2%A0:%C2%A0SEV%C4%B0M is your problem, then that is simply how the value is URL-encoded. The img src is /Common/bitMapResimGoster.aspx?BitMapResim= : SEVİM, and the characters around the colon after BitMapResim= are non-breaking spaces rather than ordinary ones (an ordinary space would show up as %20 or +). Each non-breaking space is encoded as %C2%A0 and İ is encoded as %C4%B0, which is why the output looks like that.
See this snippet from W3Schools (https://www.w3schools.com/tags/ref_urlencode.asp):
URL Encoding (Percent Encoding)
URL encoding converts characters into a format that can be transmitted
over the Internet.
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has
to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by
two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space
with a plus (+) sign or with %20.
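To get from the encoded value back to the name, the percent-encoding has to be reversed and the non-breaking spaces and colon stripped. The following is only a minimal sketch in JavaScript rather than VBA, assuming the src attribute has already been read into a string; the variable names are illustrative and not part of the original code.

```javascript
// Assumed input: the src attribute as returned by the scraper.
const src = "/Common/bitMapResimGoster.aspx?BitMapResim=%C2%A0:%C2%A0SEV%C4%B0M";

// Take everything after the first "=" in the query string.
const query = src.slice(src.indexOf("?") + 1);
const rawValue = query.slice(query.indexOf("=") + 1);

// Reverse the percent-encoding (UTF-8): %C2%A0 -> U+00A0, %C4%B0 -> İ.
const decoded = decodeURIComponent(rawValue);

// Turn non-breaking spaces into ordinary spaces, then drop the
// leading " : " separator so only the name remains.
const name = decoded.replace(/\u00A0/g, " ").replace(/^[\s:]+/, "").trim();

console.log(name);
```

The same two steps (URL-decode, then replace U+00A0) apply in any language that offers a UTF-8 aware URL decoder.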

Related

Why do some strings contain "&nbsp;" and some " ", when my input is the same (" ")?

My problem occurs when I try to use some data/strings in a p element.
I start off with data like this:
data: function() {
  return {
    reportText: {
      text1: "This is some subject text",
      text2: "This is the conclusion",
    }
  }
}
I use this data as follows in my (vue-)html:
<p> {{ reportText.text1 }} </p>
<p> {{ reportText.text2 }} </p>
In my browser, when I inspect my elements I get to see the following results:
<p>This is some subject text</p>
<p>This is the conclusion</p>
As you can see, there is suddenly a difference: one of the p elements is shown with &nbsp; between the words while the other keeps ordinary spaces, even though I started off with both strings using only ordinary spaces. I know &nbsp; and an ordinary space technically represent the same thing, but the problem with the &nbsp; string is that it gets treated as a string with one large word instead of multiple separate words. This breaks my layout, and I can't solve it with CSS properties (word-wrap etc.).
Other things I have tried:
Tried sanitizing the strings by using .replace("&nbsp;", " "), but that doesn't do anything. I assume this is because it basically is the same, so there is nothing to really replace. Same reason why I have to use a code block on Stack Overflow to make the distinction between " " and "&nbsp;".
Logged the data from Vue to see if there is any noticeable difference, but I can't see any. If I log the data/reportText I again only see strings with ordinary spaces.
So I have the following questions:
Why does this happen? I can't seem to find any logical explanation why it sometimes uses " " and sometimes uses "&nbsp;". It seems random, but I am sure I am missing something.
Are there any other things I could try to follow the path my string takes, so I can see where the transformation from " " to "&nbsp;" happens?
Per the comments, the solution devised ended up being a simple unicode character replacement targeting the \u00A0 unicode code point (i.e. replacing unicode non-breaking spaces with ordinary spaces):
str.replace(/\u00A0/g, ' ')
Explanation:
JavaScript typically allows the use of unicode characters in two ways: you can input the rendered character directly, or you can use a unicode code point (i.e. in the case of JavaScript, a hexadecimal code prefixed with \u like \u00A0). It has no concept of an HTML entity (i.e. a character sequence between a & and ; like &nbsp;).
The inspector tool for some browsers, however, utilizes the HTML concept of the HTML entity and will often display unicode characters using their corresponding HTML entities where applicable. If you check the same source code in Chrome's inspector vs. Firefox's inspector (as of writing this answer, anyway), you will see that Chrome uses HTML entities while Firefox uses the rendered character result. While it's a handy feature to be able to see non-printable unicode characters in the inspector, Chrome's use of HTML entities is only a convenience feature, not a reflection of the actual contents of your source code.
With that in mind, we can infer that your source code contains unicode characters in their fully rendered form. Regardless of the form of your unicode character, the fix is identical: you need to target these unicode space characters explicitly and replace them with ordinary spaces.
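One way to see where a non-breaking space hides is to print the code point of every whitespace character in the string. The sketch below assumes a hypothetical sample string containing U+00A0; describeSpaces is a helper name introduced here for illustration only.

```javascript
// Hypothetical sample: looks like ordinary text, but uses U+00A0 between words.
const fromSource = "This\u00A0is\u00A0the\u00A0conclusion";

// List the code point of each whitespace character, e.g. "U+0020" or "U+00A0".
function describeSpaces(text) {
  return Array.from(text)
    .filter((ch) => /\s/.test(ch)) // \s also matches U+00A0 in JavaScript
    .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Replace every non-breaking space with an ordinary space.
const cleaned = fromSource.replace(/\u00A0/g, " ");

console.log(describeSpaces(fromSource));
console.log(describeSpaces(cleaned));
```

Logging the code points instead of the string itself avoids relying on how a console or inspector chooses to render invisible characters.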

Is it possible to remove extra space only for Chinese character but keep necessary code symbol in html?

I want to update the page, but all the sentences in Chinese contain extra spaces: every Chinese character has a space before it.
The page will be a mess if I search for \s to find all the whitespace and then delete all of it, because that also removes the spaces the code needs.
It will take a lot of time to search for \s, step through every match in the code, and remove the extra space next to each Chinese character one by one.
(I just saw that you are doing it in an editor. The following is in JavaScript. You can install Node.js and write a simple program to read in each line and replace it with the correct content, write it back out to a file. For example, Google for Node fs.)
You could use:
const s = "Oscar list 奧 斯 卡 提 名 名 單 出 爐 - 最 佳 導演全男班 today";
const result = s.replace(/([^\u0000-\u00FF])[ \t]*(?![\u0000-\u00FF])/gu, "$1");
console.log(result);
// stringify to show string:
console.log(JSON.stringify(result));
Basically, the pattern says: if a character outside the usual 8-bit extended ASCII range (captured as $1) is followed by optional spaces or tabs, and what comes next is not an 8-bit character either (checked with a lookahead, so it is not consumed), then replace the whole match with just $1.
You can change it to 7-bit ASCII if you want, which is [^\u0000-\u007F]
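As a rough sketch of the 7-bit variant, the same replacement can be wrapped in a function and applied to a fragment of markup. The function name collapseCjkSpaces and the sample strings are assumptions made for illustration.

```javascript
// Remove spaces/tabs that sit between two non-ASCII characters,
// leaving spaces next to ASCII text (tags, attributes, English words) alone.
function collapseCjkSpaces(text) {
  return text.replace(/([^\u0000-\u007F])[ \t]*(?![\u0000-\u007F])/gu, "$1");
}

const line = "Oscar list 奧 斯 卡 提 名 today";
const markup = '<p class="title">你 好</p>';

console.log(collapseCjkSpaces(line));
console.log(collapseCjkSpaces(markup));
```

Because the lookahead rejects any ASCII character, the space inside `<p class="title">` and the space before "today" are both preserved.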

Hyphen replaced by â€“ in HTML text

Why is the hyphen replaced by â€“ in HTML text?
<div class="left">Additional website – URL</div>
But when the page loads, it shows 'Additional website â€“ URL'.
I know I can use an HTML character code instead of this hyphen, but I want to know how this happens, because the div tag just above it works correctly:
<div class="left">Additional website - Name</div>
The webpage shows 'Additional website - Name'.
Look into encoding issues. Using a correct header for your site may have an effect on how it is rendered. Could you post your headers?
What you see is an en-dash (not a hyphen!) which is correctly encoded in UTF-8 in the HTML file, but decoded incorrectly by the browser. You must set your browser’s character encoding to UTF-8.
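The mis-decoding can be reproduced in a few lines. This is only a sketch: it assumes the browser fell back to Windows-1252, and the lookup table is deliberately partial, covering just the bytes that occur in this example.

```javascript
// Windows-1252 differs from Latin-1 only in bytes 0x80-0x9F.
// Partial table: only the entries needed for the dash example.
const CP1252_OVERRIDES = { 0x80: "\u20AC", 0x93: "\u201C", 0x94: "\u201D" };

// Decode bytes one at a time as if they were Windows-1252 text.
function decodeAsWindows1252(bytes) {
  return Array.from(bytes, (b) => CP1252_OVERRIDES[b] ?? String.fromCharCode(b)).join("");
}

// The en-dash (U+2013) becomes the three bytes E2 80 93 in UTF-8.
const utf8Bytes = new TextEncoder().encode("Additional website \u2013 URL");

const misread = decodeAsWindows1252(utf8Bytes);            // wrong charset
const correct = new TextDecoder("utf-8").decode(utf8Bytes); // right charset

console.log(misread);
console.log(correct);
```

The plain hyphen in the other div is a single ASCII byte, which reads the same in both encodings, so only the en-dash line is affected.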
It happened to me with the '–' character (long minus sign).
I used this simple replace to resolve it:
htmlText = htmlText.Replace('–', '-');

special chars generated when using HTML::TreeBuilder & HTML::Element

I've two questions:
If I take out any text using the text() or as_trimmed_text() function and want to push it into some element, do I need to use HTML::Entities::encode_entities? For example:
my $text=$node->as_trimmed_text();
$a->push_content($text); # Do I need to use encode_entities here?
Secondly, after processing and generating the whole HTML document using as_HTML(), it sometimes generates special characters, for example Â (&Acirc;) as an extra character where all I see is a single space in Dreamweaver.
I have two answers:
Assuming that you want the content of $a to be the same as the content of $node, you do not need to call encode_entities, as push_content inserts the passed string as a text node rather than parsing it as markup. On the other hand, if the content of $node is <span> (represented in HTML source as &lt;span&gt;) and you actually want $a to display &lt;span&gt; (represented in HTML source as &amp;lt;span&amp;gt;), you would call encode_entities on it.
Chances are that your input text contains raw UTF-8 characters which the code is interpreting as Latin-1 or a similar encoding. The "single space" characters are actually U+00A0, non-breaking space, which is represented in UTF-8 by the two bytes 0xc2 0xa0, which when interpreted in Latin-1 are "Â" and non-breaking space.
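The byte-level claim is easy to check. The sketch below is written in JavaScript rather than Perl and only illustrates the encoding mismatch, not the HTML::TreeBuilder code path; the variable names are illustrative.

```javascript
// U+00A0 (non-breaking space) encoded as UTF-8 gives two bytes.
const nbspBytes = new TextEncoder().encode("\u00A0");
const hex = Array.from(nbspBytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));

// Latin-1 maps each byte directly to the code point with the same value,
// so reading those two bytes as Latin-1 yields "Â" followed by U+00A0.
const asLatin1 = String.fromCharCode(...nbspBytes);

console.log(hex);
console.log(JSON.stringify(asLatin1));
```

So a single invisible character in the source turns into a visible Â plus another invisible non-breaking space once the bytes are read with the wrong encoding.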

When should I HTML-escape data and when should I URL-escape data?

When should I HTML-escape data in my code and when should I URL-escape it? I am confused about which one to use when...
For example, given an <input> element which asks for a URL:
<input type="text" value="DATA" name="URL">
Should I HTML-Escape DATA here or URL-escape it here?
And what about an <a> element:
<a href="URL">NAME</a>
Should URL be URL-escaped or HTML-escaped? What about NAME?
Thanks, Boda Cydo.
URL encoding ensures that special characters such as ? and & don't cause the URL to be misinterpreted on the receiving end. In practice, this means you'll need to URL encode any dynamic query string values that have a chance of containing such characters.
HTML encoding ensures that special characters such as > and " don't cause the browser to misinterpret the markup. Therefore you need to HTML encode any values output into the markup that might contain such characters.
So in your example:
DATA needs to be HTML encoded.
Any dynamic segments of URL will need to be URL encoded, then the whole string will need to be HTML encoded.
NAME needs to be HTML encoded.
HTML Escape when you're writing anything to a HTML document.
URL Escape when you're constructing a URL to call in-code, or for a browser to call (i.e. in the href attribute).
In your examples you'll want to 'Attribute' escape the attributes. (I can't remember the exact function name, but it's in HttpUtility).
In the examples you show, it should be first URL-escaped, then HTML-escaped:
<a href="http://www.example.com?arg1=this%2C+that&amp;arg2=blah">
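The two-step order (URL-escape the dynamic values first, then HTML-escape the finished string) can be sketched as follows. escapeHtml is a hand-written helper introduced here for illustration, and the parameter values are taken from the example above.

```javascript
// Minimal HTML escaper for text and attribute values (illustrative helper).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Step 1: URL-escape the dynamic query values.
const query = new URLSearchParams({ arg1: "this, that", arg2: "blah" }).toString();
const url = "http://www.example.com?" + query;

// Step 2: HTML-escape the whole URL and the link text before writing markup.
const linkText = "Tom & Jerry <NAME>";
const html = '<a href="' + escapeHtml(url) + '">' + escapeHtml(linkText) + "</a>";

console.log(url);
console.log(html);
```

Reversing the order would URL-encode the & of &amp; itself, so the query string would no longer split into the intended parameters.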