Red Font (In Firefox) <img> tags in HTML not seen by JSoup

EDIT: Self-Answered. JSoup does indeed find all image tags.
I'm trying to scrape something off https://www.flickr.com/explore and I ran into a problem.
In the source code, the main images on that website are written in red font, and they don't get found by my JSoup select method (or with the getElementsByTag method). It would be much easier if you guys went to the website and checked the source code yourself because of formatting issues but I'll try to include the bare minimum here.
EDIT: I just tried viewing the source code through chrome and IE, and the image tags are not red, so I'm assuming it's firefox formatting. But the question remains, JSoup doesn't see those image tags. (Second edit at the end of the post)
EDIT 3: Removed my pasted code to put this print screen in: http://i.imgur.com/o8fNPnZ.png
Notice how the red blocks are the main user uploaded images (that I want), and you can see other img tags that are not red (but those are only things like tiny logos). When I run the code
Elements imageElements = doc.select("img");
and then print it, I get all the tags that are not red.
I'm not very experienced with HTML or CSS, is there something specific that I don't know? Or is it something in my code? Is there a way to retrieve the "red" font images as well?
EDIT 2: OK so I narrowed it down to red HTML font in firefox being an error of some kind. If I hover over it, it says: No space between attributes.
Now I'm a little more confused since flickr is a huge website and it obviously still works since I see the images. Can this be some sort of "anti-scraping" thing they have going on? Is there still a way for me to download the images?

Answering my own question.
I was mistaken, JSoup does indeed find ALL the img tags. I'm not 100% sure where my mistake was since I saw it yesterday and have changed my code since then, but I'm assuming it was my misuse of .select which would exclude those images (my code in this question was simplified for argument's sake).
I'll leave this question up because it might help someone else who runs into malformed HTML in source code, since there are a few helpful tips in the comments.

Related

CSS elements in inspector displaying wrong code line?

New web dev student here. I'm in inspector looking for certain elements of my web page. However once I find elements, it shows the CSS file and line that it belongs on. But, when I go to the CSS file and look for the element, it's not on that line. Am I missing something here?
Look at both .carousel-inner rules in the image. Each one tells me it's on a different line of code. But when I open my CSS file in a text editor, those rules are not on the lines that the inspector reports.
Keep in mind that inside a generated CSS file, there can be several lines for the same definition. What could be happening is that you are checking the definition for the same class, but for a different media query.
I'd suggest you look for all the instances of the definition and see which one reflects the changes.
Ctrl+F in your CSS file should work.
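The same search can be scripted if the stylesheet is large. Below is a minimal sketch, assuming the CSS is available as a plain string; the class name SelectorFinder and the sample stylesheet are made up for illustration. It lists every line on which the selector text occurs, so repeated definitions inside media queries show up alongside the top-level one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any browser tooling.
class SelectorFinder {
    // Returns the 1-based line numbers on which the selector text occurs.
    static List<Integer> findLines(String css, String selector) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = css.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(selector)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample stylesheet (an assumption): the same class is defined
        // once at top level and once inside a media query.
        String css = String.join("\n",
            ".carousel-inner { height: 400px; }",
            "@media (max-width: 768px) {",
            "  .carousel-inner { height: 200px; }",
            "}");
        System.out.println(findLines(css, ".carousel-inner")); // prints [1, 3]
    }
}
```

This is a plain substring match, so it will also report longer selectors that merely contain the search text.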

Website weirdly broken

Hello,
I have the website DaltonEmpire (http://daltonempire.nl, check out for yourself), and when I got home today, it showed error 500. I had made really tiny HTML changes at school via my new CodeAnywhere app, but this was not supposed to happen. After some cleaning up of my PHP, just removing whitespaces, the page loaded.
But now, the background is completely gone and there are all kinds of weird &nbsp; entities between my HTML according to Chrome Developer Tools [1], which weren't there before. In my actual code, of course there's whitespace to order my HTML, but that's just spaces, no &nbsp;'s, and that never happened before.
Also, the body background is not loaded [2], and the Developer Tools indicate that the CSS responsible for the background is not included at all [3] (rather than overwritten or not loaded), even though it is clearly in a <style> block with the body selector [4]. Manually adding that [5][6] bit through the Developer Tools seems to fix this.
Has anyone any idea how this could happen/how this could be solved?
The strangest thing is, I did not change anything specific at all that I can recall. What has caused this?
I need my website to be fixed as fast as possible, as my visitors are students who get their study documents here, and their test week starts in two days.
Thanks in advance,
Isaiah van Hunen
Attachments:
[1] Weird &nbsp;'s
[2] Background not loaded
[3] Background CSS not included?
[4] Background CSS is included
[5] Adding manual Background CSS
[6] Background loads
I can help with 1).
&nbsp; is a formatting entity:
it is the entity used to represent a non-breaking space. It is essentially a standard space, the primary difference being that a browser should not break (or wrap) a line of text at the point that this occupies.
http://www.sightspecific.com/~mosh/www_faq/nbsp.html
Microsoft Word puts it into HTML files, and so do other WYSIWYG editors.
Unfortunately, CodeAnywhere seems to have the same issue.
Do you have an earlier version of the code that you can open in Notepad/Notepad++/Atom in order to add the whitespace manually there (with ` tags or the like)? That might help.
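If no clean earlier version exists, the stray non-breaking spaces can also be stripped programmatically. The following is a rough sketch, assuming the file contents are already loaded into a string; the class NbspCleaner and the sample line are hypothetical. It handles both the literal U+00A0 character and the written-out &nbsp; entity, since an editor may insert either form.

```java
// Hypothetical helper for cleaning up editor-inserted non-breaking spaces.
class NbspCleaner {
    private static final String NBSP_CHAR = "\u00A0";  // the raw character
    private static final String NBSP_ENTITY = "&nbsp;"; // the HTML entity

    // Counts both forms of non-breaking space in the source.
    static int count(String source) {
        return occurrences(source, NBSP_CHAR) + occurrences(source, NBSP_ENTITY);
    }

    // Replaces both forms with an ordinary space.
    static String normalize(String source) {
        return source.replace(NBSP_ENTITY, " ").replace(NBSP_CHAR, " ");
    }

    private static int occurrences(String text, String needle) {
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0;
                 i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample line (an assumption): two raw characters and one entity.
        String line = "<div>\u00A0\u00A0<p>Hello&nbsp;world</p></div>";
        System.out.println(count(line)); // prints 3
        System.out.println(normalize(line));
    }
}
```

Note that this replaces every non-breaking space, including any that were placed in the text on purpose, so it is worth checking the count before overwriting the file.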

Numerals as first characters in a line of html text are not displaying in Chrome

I'm observing a super weird bug on a news site that maybe someone has seen before.
In the html text, if the first characters in a line of text are numerals, they are not displayed by the browser.
The html is coming through via a CMS, which forces the line breaks in the editor, but no tags are inserted. CMS data is XSLT processed into html templates.
When this text is sent to the browser, you can see the new lines are formed (without br tags), and you see that the numerals are still within the content. But these new lines are only honored by the browser if a white-space property is set using one of the "pre" values.
It seems to be related to the white-space property, as I can use the inspector to add white-space: pre-line or pre-wrap and boom, they appear.
Really keen to hear some thoughts on this, or could this be a possible Chrome bug?
Link to an example article here:
tvnz.co.nz/national-news/flights-cancelled-130km-h-winds-hit-wellington-5508294
In the last paragraph of that article you can read/inspect to see the missing numeral values.
So I really don't understand why this happens, but it has something to do with the zoom setting... There are all kinds of articles about Chrome bugs with the zoom setting, but none seem to address exactly what you were seeing...
If you inspect the page and change the zoom from 1 to .99999 it works... Again, I got the suggestion from this link but I'm at a loss to explain exactly what is broken w/ chrome, but it does seem like a chrome bug...

How is this element manipulation implemented?

In Google Chrome, you can use shortcuts for elements with contenteditable='true':
CTRL + B : Set the highlighted text to bold, for example
What happens under the hood is that the <b> tag is added to or removed from the marked phrase, word, or whatever is selected.
How is this done? How do "they" know whether the element is already set to bold and, my primary question, where it is located?
I am asking this because i can't get rid of this problem, mentioned earlier today:
Get the highlighted text position in .html() and .text()
Edit:
I tried the following
Rich-Text-Editing
But first, it won't load correctly, though that is probably my own mistake.
Second, for learning purposes, I would like to implement my own minimal version.
As I am really new to JavaScript, I could not figure out how this is done.
document.getSelection() / window.getSelection() should work for whatever you'd like to do with the selected stuff.
Element styles get inherited. How this is kept track of depends on the CSS implementation.
Taking a look at the source code of Chrome might help a lot.

html markup: multiple/repeating html,head, body tags, etc. - consequences

I'm working on a website project with a team. They started the code, and it seems messy to me.
We have PHP includes for some sections of a page.
e.g. in the part of index.php we have:
<?php include("pages/header.tpl");?>, and inside this, we also have:
`<?php include("pages/submenus/commercial.sbm");?>`
Inside header.tpl is the menu bar,
and inside commercial.sbm are the pop-up hover submenu items.
The thing is, all 3 of these files contain <html>, <head>, <body>, <script>, and <style> tags,
so these tags are now repeated within one page, e.g. when I view the source of index.php.
I know this is not valid HTML markup, right?
My question is: what are the consequences of having this kind of code/HTML markup?
Thanks!
This really depends on which browser you're using and how it parses the file. If you use the developer tools in Chrome, Safari, or Firefox (via Firebug), you can see the end result of the parse. Browsers that implement the HTML5 parser algorithm should all give the same result for malformed markup such as duplicate head and html tags, but there are still many browsers in use that don't.
The best option is really just to fix the bad markup.
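To find where the duplicates come from, the assembled page can be checked for repeated document-level tags. What follows is only a sketch, assuming the rendered output of index.php has been saved into a string; the class DuplicateTagCheck and the sample page are invented for illustration. It uses a simple regular expression rather than a real HTML parser, so tags that appear inside comments or script strings would be counted as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lint helper, not a substitute for a validator.
class DuplicateTagCheck {
    // Matches opening <html>, <head>, and <body> tags; the word boundary
    // keeps <header> from being counted as <head>.
    private static final Pattern OPEN_TAG =
        Pattern.compile("<(html|head|body)\\b", Pattern.CASE_INSENSITIVE);

    // Counts how often each document-level tag is opened.
    static Map<String, Integer> countOpenTags(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            counts.merge(m.group(1).toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample output (an assumption): an included template brought
        // its own html, head, and body tags into the page.
        String page = "<html><head></head><body>"
            + "<html><head></head><body><header>menu</header></body></html>"
            + "</body></html>";
        countOpenTags(page).forEach((tag, n) -> {
            if (n > 1) {
                System.out.println("<" + tag + "> opened " + n + " times");
            }
        });
    }
}
```

Any tag reported more than once points at an include file that should contain only a fragment instead of a full document.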