How is this element manipulation implemented?

In Google Chrome, you can use shortcuts for elements with contenteditable='true':
Ctrl+B: sets the highlighted text to bold, for example.
Under the hood, a <b> tag is added around (or removed from) the marked phrase, word, or whatever is selected.
How is this done? How do "they" know whether the selection is already bold and, my primary question, where it is located?
I am asking because I can't solve this problem, which I mentioned earlier today:
Get the highlighted text position in .html() and .text()
Edit:
I tried the following:
Rich-Text-Editing
First, it won't load correctly, though this is probably my own mistake.
Second, for learning purposes, I would like to implement my own minimal version.
Since I am not very experienced with JavaScript, I could not figure out how this is done.

document.getSelection() (or window.getSelection()) gives you the current selection, which should be enough for whatever you'd like to do with the selected content.
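A minimal sketch of how the selection can answer both parts of the question (where the selection is, and whether it is already bold), assuming a modern browser and a single contenteditable container passed in as `editor`. `findAncestorTag` and `toggleBold` are hypothetical names, and this only illustrates the idea; it is not how Chrome implements Ctrl+B internally:

```javascript
// Walk up from `node` (stopping at `root`) and return the first element
// whose tag name is `tagName`, or null if there is none.
function findAncestorTag(node, tagName, root) {
  const wanted = tagName.toUpperCase();
  for (let n = node; n && n !== root; n = n.parentNode) {
    // nodeType 1 is an element; text nodes (nodeType 3) are skipped.
    if (n.nodeType === 1 && n.nodeName.toUpperCase() === wanted) return n;
  }
  return null;
}

// Browser-only: toggle <b> around the current selection inside `editor`.
function toggleBold(editor) {
  const sel = document.getSelection();
  if (!sel || sel.rangeCount === 0) return;
  const range = sel.getRangeAt(0);
  // "Where is it located?": the range's common ancestor is the deepest
  // node that contains the whole selection.
  const bold = findAncestorTag(range.commonAncestorContainer, 'b', editor);
  if (bold) {
    // Already bold: move the children out of the <b>, then remove it.
    const parent = bold.parentNode;
    while (bold.firstChild) parent.insertBefore(bold.firstChild, bold);
    parent.removeChild(bold);
  } else if (!range.collapsed) {
    // Not bold yet: wrap the selected content in a new <b>.
    range.surroundContents(document.createElement('b'));
  }
}
```

It could be wired to a keydown listener on the editor that checks `e.ctrlKey && e.key === 'b'`. Note that unwrapping removes the whole enclosing <b>, not just the selected part, and that `Range.surroundContents()` throws if the range only partially covers another element; a real editor has to split nodes to handle both cases.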
Element styles are inherited, so text can be bold because of an ancestor element rather than a <b> directly around it. How the browser keeps track of this depends on its CSS implementation.
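Because of that inheritance, a more robust check asks for the computed style instead of looking for a <b> tag. A small sketch, assuming a modern browser (the function names are hypothetical; the older `document.queryCommandState('bold')` also exists but is deprecated):

```javascript
// Decide whether a computed font-weight value counts as bold. In modern
// browsers getComputedStyle() reports the weight as a number string such
// as "700", whether it came from <b>, <strong> or a CSS rule.
function isBoldWeight(fontWeight) {
  if (fontWeight === 'bold' || fontWeight === 'bolder') return true;
  const n = Number.parseInt(fontWeight, 10);
  // Assumption: treat 600 and above as bold.
  return !Number.isNaN(n) && n >= 600;
}

// Browser-only: is the start of the current selection rendered bold?
function selectionLooksBold() {
  const sel = document.getSelection();
  if (!sel || sel.rangeCount === 0) return false;
  let node = sel.getRangeAt(0).startContainer;
  if (node.nodeType !== 1) node = node.parentElement; // text node -> its element
  return node ? isBoldWeight(getComputedStyle(node).fontWeight) : false;
}
```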
Taking a look at Chrome's source code might help a lot.

Related

HTML select tag without border and providing magic copy

Background
I have a text that I would like to present on a web page (Angular/Bootstrap). For some of the words, I would like the user to be able to click the word, get a drop-down of fixed alternatives, and select one of those alternatives instead.
Req 1:
Ideally there should be a minimal visual indication of whether a word has alternatives, maybe a dotted line under the word.
Req 2:
It would be great if the user could select and copy the entire text as usual, without the copied result containing any HTML such as select boxes. I realise this requirement may be impossible to fulfil, but you never know; maybe there is some clever workaround that the community knows of.
I tried using the select tag, but I didn't manage to fulfil either of the requirements above.
I have made a simplified version for your problem and requirements:
https://jsfiddle.net/aakashshah/2y7kj5j3/1/
Attaching an onclick handler to each word element can achieve the required behaviour.
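The fiddle itself isn't reproduced here, so this is a separate minimal sketch of the same idea in plain DOM JavaScript (not Angular). It assumes each swappable word is rendered as a <span class="alt"> styled with something like `.alt { border-bottom: 1px dotted; cursor: pointer; }`. Because the span only ever contains the current word, selecting and copying the text should give plain words with no <select> options mixed in. The token format and all function names are hypothetical, and clicking simply cycles through the alternatives rather than opening a real drop-down:

```javascript
// Hypothetical data model: strings are fixed words, objects are words
// the user can swap for one of a fixed list of alternatives.
const tokens = [
  'The', { value: 'quick', alternatives: ['quick', 'slow', 'lazy'] }, 'fox'
];

// Minimal HTML escaping so words cannot inject markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Render fixed words as text and swappable words as <span class="alt">.
function renderText(tokens) {
  return tokens.map((t, i) => typeof t === 'string'
    ? escapeHtml(t)
    : `<span class="alt" data-index="${i}">${escapeHtml(t.value)}</span>`
  ).join(' ');
}

// Pick the alternative after the current one, wrapping around.
function nextAlternative(t) {
  const i = t.alternatives.indexOf(t.value);
  return t.alternatives[(i + 1) % t.alternatives.length];
}

// Browser-only: render into `container` and swap words on click.
function attach(container, tokens) {
  container.innerHTML = renderText(tokens);
  container.addEventListener('click', (e) => {
    const span = e.target.closest('span.alt');
    if (!span) return;
    const t = tokens[Number(span.dataset.index)];
    t.value = nextAlternative(t);
    span.textContent = t.value;
  });
}
```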

Red Font (In Firefox) <img> tags in HTML not seen by JSoup

EDIT: Self-Answered. JSoup does indeed find all image tags.
I'm trying to scrape something off https://www.flickr.com/explore and I ran into a problem.
In the source code, the main images on that website are written in red font, and they don't get found by my JSoup select method (or with the getElementsByTag method). It would be much easier if you guys went to the website and checked the source code yourself because of formatting issues but I'll try to include the bare minimum here.
EDIT: I just tried viewing the source code in Chrome and IE, and the image tags are not red there, so I'm assuming it's Firefox formatting. But the question remains: JSoup doesn't see those image tags. (Second edit at the end of the post)
EDIT 3: Removed my pasted code and put this screenshot in instead: http://i.imgur.com/o8fNPnZ.png
Notice how the red blocks are the main user uploaded images (that I want), and you can see other img tags that are not red (but those are only things like tiny logos). When I run the code
Elements imageElements = doc.select("img");
and then print it, I get all the tags that are not red.
I'm not very experienced with HTML or CSS, is there something specific that I don't know? Or is it something in my code? Is there a way to retrieve the "red" font images as well?
EDIT 2: OK, so I narrowed it down: the red HTML in Firefox marks an error of some kind. If I hover over it, it says: "No space between attributes."
Now I'm a little more confused, since Flickr is a huge website and it obviously still works, because I can see the images. Could this be some sort of anti-scraping measure? Is there still a way for me to download the images?
Answering my own question.
I was mistaken, JSoup does indeed find ALL the img tags. I'm not 100% sure where my mistake was since I saw it yesterday and have changed my code since then, but I'm assuming it was my misuse of .select which would exclude those images (my code in this question was simplified for argument's sake).
I'll leave this question up because it might help someone else who runs into malformed HTML in page source; there are a few helpful tips in the comments.

In MXML, how can you insert a newline into the label attribute of an mx:Button?

In other MXML components, you can do stuff like use curly brackets to embed scripting, use "&#13", and other stuff like that in their text and/or label attributes. Apparently mx:Buttons' label attributes are so locked down that the normal suggestions for other components aren't working. I could try just setting the labels in the main script of an MXML file or something, but that's sloppy programming if it can be avoided (the labels' values are going to be constant in this case). Is there not some way to put a line break in the attribute in MXML?
As for using "&#13", that gave me a line break at least, but any text to the right of that sequence disappeared. This makes me think there may be a way to make that work, but so far, I haven't found such a way.
Thanks!
EDIT: One thing though: I don't want to do anything that depends on the particular canvas or panel or whatever that the button's on to be actually created or anything like that. Latency, in that case, could cause the user to see the label change.
Alright, here's basically the answer to this question:
Adobe Flex: Word Wrap in Button Label (thread)
https://stackoverflow.com/a/1654948/279112 (answer I'm referring to)
The only problem here is that, even though newlines and word-wrapping both work in his example, they don't work very well together. It's good enough for what I currently need to do, though, so I'll improve on it later. Credit goes to danii and Alex for answering this initially, as well as to Christian Nunciato for providing this particular form of the answer.

Using direct HTML tags instead of div.class

I have some special tags on my blog which need to be as simple as possible so that my colleagues who don't know anything about HTML can use them. For example
<question>...</question>
and
<answer>...</answer>
and then these are styled in CSS. It's far easier for HTML-idiots than using the <div class="answer">...</div> format.
I've just found out that IE8 displays it all wrong, while Firefox and Chrome display it correctly. Is that expected, or am I doing something wrong? Do you know of any hack to fix this? Otherwise there are tons of blog posts I'll have to change manually!
You want to create <question>...</question> etc.
These are not HTML (not even HTML5), and you will struggle to get browsers to understand them reliably.
A quick tip that might help you:
You say you've got it working in all browsers except IE. If so, you might be able to hack IE into working as well, using a technique similar to hacks like HTML5Shiv, which are used to get IE to work with the new HTML5 tags. These use JavaScript to create a DOM element with the new tag name, after which IE starts to recognise that tag as valid HTML.
It might just work. But be aware that it is a hack, and it only targets IE. And since you're using non-standard tags, you have no way of knowing whether they will break in future browsers, even if they work now. (In fact, I would say the worst-case scenario is that one of the tags you've invented gets added to the HTML standard later, because then you'll start getting weird layout glitches as it is added to the default stylesheet.)
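A sketch of that technique for the two tags in the question, assuming it runs in the page's <head> before any of the custom markup appears (the helper name is hypothetical). It sticks to ES3 syntax because IE8 does not understand `let`, `const` or `for...of`:

```javascript
// Old-IE hack in the style of HTML5Shiv: creating one element per unknown
// tag name before the markup is parsed makes IE treat those tags as real
// elements, so CSS rules for them start to apply.
function registerCustomTags(doc, tagNames) {
  for (var i = 0; i < tagNames.length; i++) {
    doc.createElement(tagNames[i]);
  }
  return tagNames.length;
}

// Only runs where a DOM exists (i.e. in the browser).
if (typeof document !== 'undefined') {
  registerCustomTags(document, ['question', 'answer']);
}
```

Unknown elements render inline by default, so the stylesheet will probably also need something like `question, answer { display: block; }`.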
If you can get it working that way, then well done. But consider yourself warned that it's not good practice.
What you have actually asked for is not HTML, but XML markup. This is perfectly fine, but shouldn't be put directly into a web page in the way you're hoping.
There are a number of well-documented ways to get raw XML code into a browser.
One option is to use XSL to transform it into valid HTML. Another would be to load it into a DOM object in JavaScript and process it with a script (this is where the 'X' in 'Ajax' comes from).
My guess is that a simple XSL transformation would do the trick for you. (In fact, your use case sounds simple enough that even basic string replacement might achieve the same result.) You can get your colleagues to write the content using <whatever> <tags> <they> <want>, and you write a script that parses it and converts it to regular HTML before merging it with the rest of the page.
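For the basic string-replacement route (not XSL), a minimal sketch with a hypothetical helper. It assumes the custom tag names are plain words, drops any attributes on them, and relies on the markup being properly nested:

```javascript
// Turn <question>...</question> into <div class="question">...</div>,
// and likewise for every name in `tagNames`.
function customTagsToDivs(html, tagNames) {
  // Builds e.g. /<(\/?)(question|answer)\b[^>]*>/gi; \b stops
  // "question" from also matching a tag like <questions>.
  const pattern = new RegExp('<(/?)(' + tagNames.join('|') + ')\\b[^>]*>', 'gi');
  return html.replace(pattern, (match, slash, name) =>
    slash ? '</div>' : '<div class="' + name.toLowerCase() + '">');
}
```

Doing this once when a post is saved, rather than on every page view, would also keep the stored posts as plain HTML.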
In the long term, this would probably be a far better solution than the hack I've described above.
Hope that helps.
I don't know if this answer fits your needs, but IMHO using custom HTML tags is basically NOT using HTML, hence the lack of compatibility.
If you need to render data in HTML, wouldn't it be better to use XML + XSLT?
You can find guides on w3schools
You can't add new elements like that. HTML has a fixed set of elements that browsers understand; if you add your own, browsers don't know what to do with them.
HTML5 has some new elements you may find useful: http://www.w3schools.com/html5/html5_new_elements.asp but these won't work in older browsers without some JavaScript to fix things. For example http://remysharp.com/2009/01/07/html5-enabling-script/
However, if you really want to add new tags, it is possible to do so and then "modify" them via JavaScript into known tags (that's actually what the HTML5 enabling script for IE does), but it won't be easy to apply CSS to the new tags.
In short, I strongly advise against adding new tags. It's not that hard to understand something like <div class="answer">.
Sounds like you want to write XML and convert it to HTML using XSLT. This is an old tutorial (it includes defining a DTD), but a further web search will turn up more results that might suit.
here you go fella:
http://net.tutsplus.com/tutorials/html-css-techniques/how-to-make-all-browsers-render-html5-mark-up-correctly-even-ie6/
You need to use createElement :)

Having the HTML of a webpage, how to obtain the visible words of that webpage?

Having the HTML of a webpage, what would be the easiest strategy to get the text that's visible on the corresponding page? I have thought of getting everything between <a>...</a> and <p>...</p>, but that is not working that well.
Keep in mind that this is for a school project, so I am not allowed to use any kind of external library (the idea is to do the parsing myself). Also, this will run while the HTML of the page is being downloaded, so I can't assume I already have the whole page. It has to show the extracted visible words as the HTML is being downloaded.
Also, it doesn't have to work for ALL cases, just be satisfactory most of the time.
I am not allowed to use any kind of external library
This is a poor requirement for a ‘software architecture’ course. Parsing HTML is extremely difficult to do correctly, certainly way outside the bounds of a course exercise. Any naïve approach you come up with involving regex hacks is going to fall over badly on common web pages.
The software-architecturally correct thing to do here is use an external library that has already solved the problem of parsing HTML (such as, for .NET, the HTML Agility Pack), and then iterate over the document objects it generates looking for text nodes that aren't in ‘invisible’ elements like <script>.
If the task of grabbing data from web pages is of your own choosing, to demonstrate some other principle, then I would advise picking a different challenge, one you can usefully solve. For example, just changing the input from HTML to XML might allow you to use the built-in XML parser.
Literally all the text that is visible sounds like a big ask for a school project, as it would depend not only on the HTML itself, but also any in-page or external styling. One solution would be to simply strip the HTML tags from the input, though that wouldn't strictly meet your requirements as you have stated them.
Assuming that near enough is good enough, you could make a first pass to strip out the content of entire elements which you know won't be visible (such as script, style), and a second pass to remove the remaining tags themselves.
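A minimal sketch of those two passes in JavaScript, assuming "near enough is good enough": it ignores CSS, decodes only a handful of entities, and treats every tag as a word break (so `wor<b>ld</b>` comes out as two words). It also works on a complete string, so on its own it does not meet the streaming requirement. The function name is hypothetical:

```javascript
function visibleText(html) {
  // Pass 1: drop comments and whole elements whose content never shows
  // on the page. \1 makes the closing tag match the opening one.
  const noInvisible = html
    .replace(/<!--[\s\S]*?-->/g, ' ')
    .replace(/<(script|style|head)\b[\s\S]*?<\/\1\s*>/gi, ' ');
  // Pass 2: drop the remaining tags, keeping the text between them.
  const noTags = noInvisible.replace(/<[^>]*>/g, ' ');
  return noTags
    .replace(/&nbsp;/g, ' ').replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&') // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, ' ')
    .trim();
}
```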
I'd consider writing a regex to remove all HTML tags, which should leave you with your desired text. This can be done in JavaScript and doesn't require anything special.
I know this is not exactly what you asked for, but it can be done using Regular Expressions:
//javascript code
//should (could) work in C# (needs escaping for quotes) :
h = h.replace(/<(?:"[^"]*"|'[^']*'|[^'">])*>/g,'');
This RegExp will remove HTML tags; note, however, that you first need to remove the script, link, style, ... elements (together with the contents of script and style).
If you decide to go this way, I can help you with the regular expressions needed.
HTML5 includes a detailed description of how to build a parser. It is probably more complicated than you are looking for, but it is the recommended way.
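The HTML5 specification describes tokenization as a state machine that consumes the input one character at a time, which also fits the requirement to show words while the page is still downloading. The sketch below is a drastically simplified state machine in that spirit, not the specification's algorithm: the class name is hypothetical, and comments, entities, a bare `<` in text and `>` inside attribute values are not handled.

```javascript
// Elements whose content is never rendered as page text.
const RAW_TEXT_TAGS = ['script', 'style'];

class StreamingTextExtractor {
  constructor() {
    this.inTag = false;    // between '<' and '>'
    this.tagBuffer = '';   // characters of the current tag after '<'
    this.skipUntil = null; // e.g. '</script' while inside a script element
    this.tail = '';        // last few characters seen while skipping
  }

  // Consume one chunk and return the visible text it completes. Each tag
  // becomes a single space so words on either side stay apart.
  feed(chunk) {
    let out = '';
    for (const ch of chunk) {
      if (this.skipUntil !== null) {
        // Inside <script>/<style>: ignore everything up to the closing
        // tag, which may itself be split across chunks.
        this.tail = (this.tail + ch).slice(-this.skipUntil.length);
        if (this.tail.toLowerCase() === this.skipUntil) {
          this.skipUntil = null;
          this.tail = '';
          this.inTag = true; // consume the rest of the closing tag
          this.tagBuffer = '';
        }
      } else if (this.inTag) {
        if (ch === '>') {
          this.inTag = false;
          // Opening tags start with a letter; closing tags start with '/'.
          const m = /^[a-z][a-z0-9]*/i.exec(this.tagBuffer);
          const name = m ? m[0].toLowerCase() : '';
          if (RAW_TEXT_TAGS.includes(name)) {
            this.skipUntil = '</' + name;
            this.tail = '';
          }
          this.tagBuffer = '';
          out += ' ';
        } else {
          this.tagBuffer += ch;
        }
      } else if (ch === '<') {
        this.inTag = true;
        this.tagBuffer = '';
      } else {
        out += ch;
      }
    }
    return out;
  }
}
```

Each downloaded chunk is passed to `feed()` and whatever it returns can be displayed immediately; a tag split across two chunks is simply completed by the next call.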
You'll need to parse every DOM element for text, and then detect whether that element is visible (its computed display is not 'none' and its computed visibility is not 'hidden'; el.style alone only reflects inline styles). Then you'll need to detect whether the element is positioned outside the viewable area of the page. Finally, you'll need to check the z-index and background of each element to detect whether any overlap is hiding some text.
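If a browser DOM is available, the first of those checks might look like this sketch (hypothetical function name). `getStyle` would normally be `el => getComputedStyle(el)`; it is a parameter only so the walk can be tried outside a browser. Positioning, overflow, z-index overlap and colours are not checked:

```javascript
// Walk the DOM under `root` and collect the text of rendered nodes.
// getStyle(el) should return the element's computed style.
function collectVisibleText(root, getStyle) {
  const parts = [];
  function walk(node, parentVisible) {
    if (node.nodeType === 3) {            // text node
      if (parentVisible) parts.push(node.nodeValue);
      return;
    }
    if (node.nodeType !== 1) return;      // skip comments etc.
    const style = getStyle(node);
    if (style.display === 'none') return; // nothing below is rendered
    // A child that sets visibility: visible inside a hidden parent has a
    // computed visibility of 'visible', so its text is still collected.
    const visible = style.visibility !== 'hidden' && style.visibility !== 'collapse';
    for (const child of node.childNodes) walk(child, visible);
  }
  walk(root, true);
  return parts.join(' ').replace(/\s+/g, ' ').trim();
}
```

In a page this would be called as `collectVisibleText(document.body, el => getComputedStyle(el))`.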
Basically, this is impossible to do within a month's time.