On the following website, I am trying to fetch a value (970.15 today) using MSXML2.ServerXMLHTTP and .responseText.
I have done this on another site and found the information in the response text.
On this site, though, .responseText does not contain the value anywhere, yet when I inspect the element in my browser, I do see it!
I only discovered this way of reading information from websites last week, so I am still learning. If anyone can help me, that would be great!
https://www.spglobal.com/spdji/en/indices/equity/sp-euro-50-equal-weight-50-point-decrement-index/#overview
Since it did not work, I put the .responseText value in a cell and copied it into Notepad, but I did not find 970 anywhere in the text.
Note that to put the text in a cell, I had to strip the first 50 characters; I did not understand that either!
Thanks !
G
Related
I am facing a behavior that I really don't understand.
If you go to the webpage https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=1 and inspect the source, you will see that it has the same HTML content as https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=7
=> To test it, search for "ERIKA - 710/T5" in both page sources: you will find it in both, although it should only appear for ful_iPageNumber=1.
Why does it behave like this?
Secondary question: how do I get the real content of https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=7 ?
Thank you for your help
John
Problem
You have explained that when you perform a search, you get the same results as with your pagination (page 1)
Issue
The value you are searching for is not being placed into the URL:
https://www.edel-optics.fr/Recherche.html?time=1519871844737#query=
Here #query= is empty.
You would need something like:
https://www.edel-optics.fr/Recherche.html?time=1519871844737#query=ERIKA%20-%20710/T5
Without seeing your code it's hard to say where the issue lies. It could be that the search box is not inside the form, that the submit button belongs to a different form than the search box, or that the backend script fails to pick up the GET values because of a case mismatch in the parameter name.
OK, I found a solution to this strange problem: replace the # in the URL with a ? and you will get the actual HTML content (corresponding to what is displayed).
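The likely reason this works is that everything after # is a URL fragment, which the client keeps to itself and never sends to the server; scripts on the page read it afterwards and load the matching page. A minimal Python sketch, using only the standard library, shows which part of each URL would go over the wire. The helper name request_target is made up for the example, and whether the server honours ful_iPageNumber as a query parameter is specific to this site, as reported above.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path (plus query, if any) that an HTTP client sends for url."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately ignored: it never leaves the client
    return target

with_fragment = "https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=7"
with_query = with_fragment.replace("#", "?", 1)

print(request_target(with_fragment))  # page number is not part of the request
print(request_target(with_query))     # page number is sent as a query string
```

With the fragment form, every page number produces the same request, which is consistent with page 1 and page 7 returning identical HTML.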
I'm developing ASP code that reads external websites and parses them via the HTMLDocument interface (the "HTMLFILE" object) so that I can navigate the content through the DOM. But some pages throw an error:
'htmlfile error 80070057 Invalid Argument.'
After a lot of research, I've discovered that some HTML tags are, for reasons I don't know, not rendered or handled correctly by the HTMLFILE object, which gives me that error.
Because classic ASP is so old and there is little material available today to investigate further, I'm convinced that I have to clean the HTML before sending it to the HTMLFILE object, and the best way I have come up with is to do it with a regex.
But I'm running into some problems (partly because I don't have much practice).
I have to reliably locate the HTML tag blocks that HTMLFILE does not accept so that I can remove them.
For Example:
<head>
<script> ....... </script>
<style> ....... </style>
</head>
<body>
<iframe> ........ </iframe>
<div> ..... </div>
<table>.....</table>
I have to match the full script, style and iframe blocks, leaving the rest of the document intact.
Over the last few days I've done some research and have almost got it:
<(?:script|embed|object|frameset|frame|iframe|meta|style).+(.|\s)*?>$
I've also tried to match single-line tags (for example '<BR>'), but I'm totally confused now and the pattern behaves inconsistently; for example, some lines that close other tags are wrongly selected.
I know that the best approach would be to find out why HTMLFILE is throwing this error, but the error gives no further information to debug with.
Thanks for all your time and patience.
Here is the regex candidate:
<(script|meta|style|embed|object|frameset|frame|iframe)[\s\S]*?<\/(script|meta|style|embed|object|frameset|frame|iframe)>
DEMO with explanation
EDIT
Update with lazy match for [\s\S]*?
Regex is not the best tool for this, but if you really want to use it, I think in simple cases you can also use one regex for all tags, including nested ones:
(?=(<([^>]+)>([\s\S]*?)<\/\2>))
DEMO
The 1st group holds the whole captured part, the 2nd group captures just the tag, and the 3rd group captures the content of the tag. Because the whole pattern is a lookahead, it doesn't actually consume any text; it only captures fragments. However, you can probably get the start/end index of each match and use it as you want.
I still think you should reconsider using regex, but the syntax used above is quite useful, so it is worth knowing how to use it.
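As a rough illustration of the first candidate pattern, here is a Python sketch (the question is about classic ASP, so this only demonstrates the regex idea, not the HTMLFILE side). It differs from the candidate in one respect, which is a change made for this example: the closing tag uses a backreference so that it must carry the same name as the opening tag. The sample HTML and the name strip_blocks are invented. Tags without a closing tag, such as meta, are not handled here.

```python
import re

# Opening tag from a fixed list, lazy body, closing tag with the same name (\1).
BLOCK_RE = re.compile(
    r"<(script|style|iframe|object|embed|frameset)\b[^>]*>[\s\S]*?</\1\s*>",
    re.IGNORECASE,
)

def strip_blocks(html):
    """Remove whole script/style/iframe-like blocks, leaving other markup intact."""
    return BLOCK_RE.sub("", html)

sample = (
    "<head><script>var a = 1;</script><style>p { color: red; }</style></head>"
    "<body><iframe src='x'></iframe><div>kept</div>"
    "<table><tr><td>kept</td></tr></table></body>"
)
print(strip_blocks(sample))
```

A pattern like this can still be fooled, for instance by a literal "</script>" inside a JavaScript string, which is one reason the answers recommend a real parser where one is available.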
I'm trying to send a pre-populated email using a mailto href, but I soon discovered that IE9 has a problem recognising hrefs longer than about 509 characters. Basically, clicking on the link brings up a blank page. I looked for an answer and came across this JavaScript solution, but it still doesn't work.
Here is the anchor tag:
Sign up
And here is the script:
var sMailto = "mailto:blah#email.com?body=Dear eyecare professional,%0A%0aTo help us schedule your upcoming webinar, please fill out and return the following information:%0A%0A• Name:%0A%0A• Preferred date of webinar* (any Wednesday at 6 pm EST):%0A%0A• City/State (Optional):%0A%0A• Comments/Questions/Feedback:%0A%0AUpon receipt, we will send you a link to an upcoming GoTo Meeting webinar on Macula Risk implementation in your clinic. These webinars are regularly held on Wednesdays at 6 pm EST.%0A%0A* If you would like to request training on any other date or time - please note this in the Comments section and we will do our best to accommodate your request.%0A%0AKind Regards,%0A%0AGerry Bruckheimer";
function doMailto() {
document.location.href= sMailto;
}
The weird thing is that this works in every other browser except stupid IE 9.
UPDATE: If you are experiencing a similar problem, try using window.open(url). I realise it's not a perfect solution, but it works.
The URL limit for IE9 is actually quite high, between 5120 and 5150 characters when following a link. Unfortunately a JavaScript hack won't help here; the limit will still be in effect. I doubt that's the issue, though.
The message you're sending contains some characters that I wouldn't put in a URL, particularly "•". You should URL-encode your message before putting it in a link (that last symbol encodes to %e2%80%a2 apparently). You can URL-encode it in JavaScript or encode it manually with an online tool before pasting it into the <a> tag.
Some browsers are more relaxed than others in handling strange characters in URLs (or in code in general).
Hope that helps
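To make the encoding step concrete, here is a small Python sketch of percent-encoding a message body before it goes into a mailto link. The address someone@example.com, the shortened message and the helper name build_mailto are placeholders for illustration only; in the browser, the JavaScript counterpart would be encodeURIComponent.

```python
from urllib.parse import quote

def build_mailto(address, body):
    """Build a mailto: link whose body is percent-encoded as UTF-8."""
    # safe="" also encodes "/", so nothing in the body can be mistaken for URL syntax
    return "mailto:" + address + "?body=" + quote(body, safe="")

body = "Dear eyecare professional,\n\n• Name:\n\n• Comments/Questions/Feedback:"
link = build_mailto("someone@example.com", body)
print(link)
```

After encoding, the bullet character appears as %E2%80%A2 and each line break as %0A, so the link contains only ASCII characters.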
I've got some HTML with JavaScript in it. The JavaScript creates an MSXML2 object, loads some XML from a file, and populates a span. However, the HTML that's within the XML is being stripped. Is there a way to stop it from doing this?
(pseudocode)
I've tried various combinations of mySpan = blah.GetNode("mynode").text, .value, .innerxml etc., but nothing has worked yet.
Typically, as soon as I post it on here my 2 hours of googling pay off, and I discover it's simply (pseudocode) getNode("mynode").xml!
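The same distinction exists in most XML APIs: a text accessor returns only the character data, while a serialisation accessor returns the markup as well. A small Python sketch with the standard library's ElementTree (not MSXML; the document and element names are invented) shows the difference:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><mynode>Hello <b>bold</b> world</mynode></root>")
node = doc.find("mynode")

# Character data only: nested tags are dropped, comparable to MSXML's .text
text_only = "".join(node.itertext())

# Full serialisation of the node: nested tags survive, comparable to MSXML's .xml
with_markup = ET.tostring(node, encoding="unicode")

print(text_only)
print(with_markup)
```

Note that the serialised form includes the wrapping mynode element itself, so it may need trimming before it is assigned to a span.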
When I view the source of a page in my browser (Firefox, View -> Page Source), copy it and paste it into my HTML editor, I see almost the same page (in this example www.google.com) as it appears in my browser. But when I get the HTML source through this code (on Google App Engine)
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print result.content
copy it and paste it into my HTML editor, the page then looks quite different. Why is it so? Is there something wrong with the code?
++++++++++++++++++++++++++++++
Follow-up:
By this moment (Sunday, December 13th, 2009, 1:01 PM, GMT, to be precise) I have received two comments-questions (from Aaron and Christian P.) and one answer from Alex Martelli.
Both Aaron and Christian P. are asking about what actually is different between the Fire-Fox-obtained source and Google-App-Engine-obtained source when they are both displayed through the same HTML editor.
Here I have uploaded two screenshots:
One shows the Fire-Fox-obtained source
And the other one shows Google-App-Engine-obtained source
when they are both displayed through “MS Front Page” editor.
One difference, which is quite obvious, is different encoding: In Fire-Fox code everything is displayed in English, while in the Google-App-Engine code I get a lot of various symbols, instead.
Another difference is some additional lines at the top of the page in the Google App Engine code. I think, this is what Alex Martelli was talking about in his answer (“…the fetch-and-print approach is going to have metadata around it as well…”).
One more minor difference is that the box for the Google image is split into several boxes in one code, while it remains whole in the other one.
Alex Martelli suggested that I use this code (if I understood him correctly):
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print "content-type: text/plain"
    print
I’ve tried it, but in this case nothing is displayed at all.
Thank you all for your responses and, please, continue responding – I really want to see this issue finally resolved.
++++++++++++++++++++++++++++++
Follow-up:
Okay, the issue has been resolved.
I failed to pay full attention to Alex Martelli's instructions and therefore came up with the wrong code. Here is the right one:
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print "content-type: text/plain"
    print
    print result.content
This code displays exactly what is needed - no additional lines at the top of the page.
Well, I still get the strange symbols, but I discovered that it's probably Google's problem. The thing is I am currently in Taiwan, and Google seems to be aware of that and automatically switches from www.google.com (which is in English) to www.google.com.tw (which is in Chinese), but this one, I guess, is already another topic.
Thanks to everyone who has responded here.
You have not explicitly emitted a "content type" header, and an end-of-headers empty line, so the first few lines are probably going to be lost; try adding before the final print something like
print "content-type: text/plain"
print
Beyond this, what you're getting in either case is essentially a big <script> with a little extra HTML around it -- that's all that Firefox is going to give you in the "view source" page, while the fetch-and-print approach is going to have metadata around it as well, e.g., the "doctype" (depending on what HTML editor you're targeting, this may or may not be an issue).
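To see why the missing header and blank line matter, here is a simplified Python 3 sketch (the code in the question is Python 2) of how CGI-style output is divided: everything up to the first empty line is treated as headers, and only the rest is the body. The function names and the sample page are made up, and the real App Engine runtime is more involved than this.

```python
def cgi_response(body, content_type="text/plain"):
    """Prefix body with a Content-Type header and the blank line that ends the headers."""
    return "Content-Type: " + content_type + "\n\n" + body

def split_response(raw):
    """Split raw CGI-style output into (header_lines, body) at the first blank line."""
    head, _, body = raw.partition("\n\n")
    return head.split("\n"), body

page = "<!doctype html>\n<html>\n\n<body>hi</body></html>"

# With an explicit header, the whole page arrives as the body.
print(split_response(cgi_response(page)))

# Without it, the first lines of the page are swallowed as bogus headers.
print(split_response(page))
```

In the second call the doctype and the opening html tag end up on the header side, which matches the answer's remark that the first few lines are probably going to be lost.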