I am attempting to parse a page to find out if it has a string on it that I need so I can take the appropriate action. I would usually use
Dim pageSource As String = New System.Net.WebClient().DownloadString(URL)
This cannot be used in this instance, as I need to be logged in to view the page. Because of this, I have attempted to get the document text from the WebBrowser control once the page is loaded:
RichTextBox2.Text = WebBrowser1.DocumentText
Unfortunately, this does not work, as the string I am looking for is not within the source of the page. It also does not appear to be referenced in the source, which really confuses me, yet it does appear in the Elements tab of the Google Chrome Developer Tools.
I have been looking for a way to get a list of all the elements on the web page, so that I can check whether it contains the element I need, but I have not found one.
TLDR: I am looking for a way to get all the elements of a loaded web page that do not appear in the page source.
Side Note: I cannot seem to find the element being referenced using "src" and it does not appear to be within an iframe.
Any help would be greatly appreciated as I am completely stumped.
I have looked around but if you feel I have missed something please let me know.
For those interested, I have found a workaround for my case, but it will not work in general and is only a bodge: it relies on a different value that happens to appear only under the condition I am looking for, not on the condition itself.
' List the tag name and id of every element in the loaded document, one per line
For Each element As HtmlElement In Me.WebBrowser1.Document.All
    RichTextBox3.Text &= element.TagName & "-->" & element.Id & vbCrLf
Next
This shows some of the elements on the page, I assume, but in my case not all of them and not the one I was looking for. It could still be a good starting point for others with the same issue.
Related
I'm trying to get the full HTML source of a tab in this page. I want to get the source of this tab.
Unfortunately, the HTML I'm getting is not complete.
I recorded a GIF to explain it better.
The select list shows up only when I inspect the element while it is open. If I inspect the element with the list closed, no list HTML is returned. Is it created dynamically when the user clicks on it?
I've tried expanding all of the code, but the HTML for the lists never appears. Might it be created only when I open the list?
Is there a way to get the lists' HTML?
I hope I've been clear.
I'm not sure what you want to save, but from inspecting the page, it seems the website removes and appends the HTML on demand: only when you press the expand button does JavaScript append the options to the body. Otherwise they are not shown in the Elements tab.
I don't think you can get all the HTML in one go, because the website uses JavaScript to append it, and you cannot see an element in the Elements panel of the developer tools once it has been removed.
You can save the page if you want. Just save it with Ctrl+S and you will find the basic source in there along with the stylesheet and other scripts.
I am creating a personal learning website. It will contain many "lessons", and I would like to show a menu on every active "lesson" so that the user can go to the next "lesson" without going back to the main menu to select it.
To do that, I need an external file that I can modify once and have it "called" on every "active" page of the website.
I tried using the <object> tag in HTML, but it does not work with the template I am using, and I would still have to adjust the size of the window on every page it appears in.
For some unknown reason, I could not make the third-party method shown on w3s work, nor the jQuery method I found in similar questions.
I am at an impasse with a page I am trying to modify for my institution. The page is somewhat proprietary, and not all of the files that make it up are accessible to me. The only things I can access are 5 files: a stylesheet (naturally), a head, a top, a bottom, and a print.
So this is the page:
http://s5-sandbox.parature.com/ics/support/default.asp?deptID=15028
The issue is lower down the page, in the 2 sections "Viewed Most Popular Topics" and "Most Recent Topics": the words "Viewed" and "Most" run together as "ViewedMost". Additionally, the number of views and the first word of the corresponding topic run together. For example, you'll see "3869What is my Blackboard Username and COM ID?"
Given the pages that I have access to, none of them have the ID for that section declared in them. It is in some .ASP page on Parature's backend. I've contacted their support and put in a ticket and did not receive any resolution. I tried emailing someone directly and a man told me something ridiculous:
If you right click on the area and click inspect element you are able to add a space either behind “view” or in front of “most”. Either works, I’ve tried them both. The same goes for the number and the article.
I already know full well how to use the Chrome inspector. What I told this gentleman was that I don't have access to the page to add a space, nor do I even know whether there is an element I can use to fix the formatting. I was hoping you guys would be able to see something on the page that I do not, something that would let me create the space I am looking to add.
Thank you all.
.item:after {
    content: "\00a0";
}
I need to add microdata snippets to a list that is being populated by a script during the page load.
My code is written so that the basic list element is in my HTML, and it gets duplicated as the list is populated (this happens once, when the page is loaded).
I try to add microdata to every element in the list, but when I use Google's rich snippet tool it seems to read only the basic HTML snippet and not the whole list after it has been populated. I do the exact same trick on a different page and there it seems to work fine (meaning I get a list of videoObjects, each containing the data inserted into it) [edit: the second page was created on the server side, which is why it worked there].
Any idea how to get around this problem?
As a general rule, search engines do not read content dynamically created by JavaScript. So anything your script dynamically creates will be invisible to Google. If you want them to index this content you need to create this content server side.
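To illustrate what "create this content server side" could look like, here is a minimal Python sketch that builds the whole list, microdata included, before the page is sent. The `VIDEOS` records, the `render_video_list` helper and the example.com URLs are made up for illustration. The only thing taken from the question is that the items are videoObjects, which is assumed here to mean the schema.org `VideoObject` type.

```python
from html import escape

# Hypothetical video records; a real site would load these from its own data store.
VIDEOS = [
    {"name": "Intro & setup", "url": "https://example.com/v/1"},
    {"name": "Second lesson", "url": "https://example.com/v/2"},
]


def render_video_list(videos):
    """Return an HTML <ul> whose items carry schema.org VideoObject microdata."""
    items = []
    for video in videos:
        items.append(
            '<li itemscope itemtype="https://schema.org/VideoObject">'
            f'<a itemprop="url" href="{escape(video["url"], quote=True)}">'
            f'<span itemprop="name">{escape(video["name"])}</span></a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # The printed markup is what a crawler would receive in the page source.
    print(render_video_list(VIDEOS))
```

Because every list item is already present in the response body, a tool that reads only the page source sees the full list rather than just the single template element.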
I'm trying to extract the overall comments number from a web page using Jsoup.
For example, here is a page (CNN): http://edition.cnn.com/2011/POLITICS/07/31/debt.talks/index.html?hpt=T1
I see that the class ID is cnn_strycmtsndff, but I can't find the right command to extract it.
Can someone help?
Thanks
Unfortunately, I don't think Jsoup is going to cut it. If you use the Chrome developer tools you can clearly pick out the HTML used for presenting the "(##### Comments)" section, but if you just view the source, none of that information is there. It seems they are using some JavaScript to dynamically embed the information in the page.
This is what you see in "View Source":
<div id="disqus_thread"></div><script type="text/javascript" src="http://cnn.disqus.com/embed.js"></script>
So Jsoup will never be able to see the elements with the comment information.
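The same limitation applies to any parser that works only on the downloaded markup. As a rough sketch (in Python rather than Java, using only the standard library), the snippet below feeds the "View Source" markup quoted above to a static parser and searches for the cnn_strycmtsndff class. `ClassFinder` and `find_class` are illustrative names, not part of Jsoup or any other library.

```python
from html.parser import HTMLParser

# The markup quoted above from "View Source".
STATIC_HTML = (
    '<div id="disqus_thread"></div>'
    '<script type="text/javascript" '
    'src="http://cnn.disqus.com/embed.js"></script>'
)


class ClassFinder(HTMLParser):
    """Collects the tag names of elements that carry a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # A class attribute may be missing or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.matches.append(tag)


def find_class(html, wanted_class):
    """Return the tags in `html` that have `wanted_class` among their classes."""
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    finder.close()
    return finder.matches


if __name__ == "__main__":
    # Prints [] because the comment markup is only added later by embed.js.
    print(find_class(STATIC_HTML, "cnn_strycmtsndff"))
```

The parser finds only the empty disqus_thread div and the script tag. The comment count exists only after a browser has run the script, so extracting it would need a tool that executes JavaScript.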