I have an extension that injects HTML as a notification. The problem is that every site renders this HTML differently, since my HTML inherits all of the page's CSS rules.
So I wondered if there's a way to inject this HTML and keep it from rendering differently on every website.
I would personally tag up all parts of my html and include inline css rules that specify exactly how I want the html to appear.
Not including any styling information puts you at the mercy of the designers of each site.
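A minimal sketch of that inline-style approach, written in Python purely for illustration (the extension itself would do the same thing in its own language). The names `BASE_STYLE`, `style_attr` and `notification_html` are hypothetical, and the property values are placeholders. The `!important` suffix is an added assumption: a host page rule marked `!important` would otherwise win over a plain inline declaration.

```python
from html import escape

# Hypothetical baseline: every property the notification depends on is spelled
# out, so as little as possible is left to the host page's stylesheet.
BASE_STYLE = {
    "display": "block",
    "margin": "0",
    "padding": "8px 12px",
    "font": "normal 13px/1.4 Arial, sans-serif",
    "color": "#222",
    "background": "#ffffe0",
    "border": "1px solid #999",
    "text-align": "left",
}


def style_attr(declarations, important=True):
    """Serialize a dict of CSS declarations into a style attribute value."""
    suffix = " !important" if important else ""
    return "; ".join(f"{prop}: {value}{suffix}" for prop, value in declarations.items())


def notification_html(message, overrides=None):
    """Build a notification <div> whose appearance is specified inline."""
    declarations = {**BASE_STYLE, **(overrides or {})}
    style = escape(style_attr(declarations), quote=True)
    return f'<div style="{style}">{escape(message)}</div>'


if __name__ == "__main__":
    print(notification_html("Settings saved", {"color": "green"}))
```

Any property that is not listed can still leak in from the page, so the baseline has to cover everything the notification's look depends on.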
Related
I have a task that requires parsing HTML pages and getting all the tags with their attributes. Parsing an HTML page is not a big issue, and I know many open source projects that can do the job. However, if an HTML tag uses a CSS class, or inherits attributes from a div tag that uses a CSS class, then I will also need to get all the attributes from the CSS that the tag uses.
I am working on a kind of funky 3rd party website where I have access to the CSS files but cannot modify the actual HTML or JavaScript files. I changed the background on the site, but I found later that the CSS also applied to web parts inside of iframes. The same CSS file is linked inside of these iframes, so the background rules I am applying to the outer html body {} are also applied to the inner one.
Is there any way to add another selector that only applies if it is inside of an iframe or is there no way for it to figure that out? I have tried iframe html body {} and #name-of-iframe body {} but these don't even show as being associated with the inner body in the browser.
Is this something that can be done? Any insight is appreciated.
Unfortunately you would need to run JavaScript to determine whether you're running inside an iframe or not. CSS by itself is not sufficient.
You might look for differences in the markup between the two pages. For example, maybe the <body> element has a different ID or class on pages inside the iframe.
We have a single-page web app that displays emails. Some of the emails we're viewing contain style elements that, when loaded into the DOM, affect our entire app. What's the best way to prevent this from happening? I'm currently removing style elements using the HtmlAgilityPack as shown in the post below, but I'm wondering if there's an easier way.
Regex to remove body tag attributes (C#)
Use iframes. That will put the message into a separate document, and there will be no styling interference.
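As a rough sketch of the iframe approach, assuming the app can emit the frame markup server-side and the target browsers support the `srcdoc` attribute: the helper below (a hypothetical name, `isolate_in_iframe`) wraps the message in its own document. The `sandbox` attribute is an extra assumption, not something the answer asks for; with no value it also disables scripts inside the frame.

```python
from html import escape


def isolate_in_iframe(message_html, title="Message"):
    """Wrap untrusted HTML in an iframe so its styles stay in a separate document."""
    # srcdoc carries the whole inner document as an attribute value,
    # so quotes and angle brackets must be escaped.
    return (
        f'<iframe sandbox title="{escape(title, quote=True)}" '
        f'srcdoc="{escape(message_html, quote=True)}"></iframe>'
    )


if __name__ == "__main__":
    email = '<style>body { background: black }</style><p class="x">Hello</p>'
    print(isolate_in_iframe(email))
```

The message's style elements then apply only inside the frame, so they no longer need to be stripped to protect the surrounding app.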
Since you said HTML emails, the only reliable way to make the CSS work is to use inline styles. External stylesheets are generally not supported by email clients.
I have a website that allows a user to create blog posts. There are some blacklisted tags, but most standard HTML tags are acceptable.
However, I'm having issues with how the pages get displayed.
I keep the HTML wrapped in its own div.
I would ultimately like to keep the HTML from the user separate from the main site's stylesheets, so it can avoid inheriting styles and screwing up the layout of the originating site where the HTML is being displayed.
So in the end, is there anything I can apply to a div so its contents are quarantined from the rest of the site?
Thanks!
You could use a reset stylesheet to reset the properties for that specific DIV and its children. And on the other side, you’ll probably need a CSS parser to adjust the user’s stylesheet for that specific DIV.
You can do it in a frame or an iframe. That will keep it separate in every way.
Could you wrap each piece of user-generated content in a div with the class 'username', in addition to any other class names you may add automatically?
Then the users can format and style as they please, as long as all their styles are prefixed like so: div.username selector.
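The prefixing could be automated with something like the following naive sketch. `scope_css` is a hypothetical helper, and it assumes flat rules only: at-rules such as @media, comments, and selectors containing commas inside parentheses are not handled, so a real CSS parser would be the safer choice for untrusted input.

```python
import re

# Matches "selector-list { declarations }" for flat, non-nested rules only.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def scope_css(css, scope):
    """Prefix every selector in flat CSS rules with `scope` (naive sketch)."""

    def rewrite(match):
        selectors = [s.strip() for s in match.group(1).split(",") if s.strip()]
        scoped = ", ".join(f"{scope} {s}" for s in selectors)
        return f"{scoped} {{ {match.group(2).strip()} }}"

    # Anything that does not look like a flat rule is dropped.
    return "\n".join(rewrite(m) for m in RULE.finditer(css))


if __name__ == "__main__":
    print(scope_css("p, h1 { color: red } .x { margin: 0 }", "div.username"))
```

Because every selector gains the `div.username` prefix, a user rule such as `p { ... }` can only match paragraphs inside that user's own wrapper.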
Otherwise, you may be able to use iframes.
You could use an iframe to keep them completely separate if you really wanted to be extreme about it.
Or, you could restrict them to only writing inline styles so that they can't affect your page's stylesheet:
1) strip out any style tags from the HTML the user creates. This way they can't override your styles.
2) validate their code and either fix or reject things like unclosed tags or elements that shouldn't be inside a div (like head or body tags) and make sure it all gets closed properly so it can't mess up any html from your page after the div it's contained in.
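Step 1 might look like the sketch below, which uses Python's standard `html.parser` module. The class and function names are hypothetical, and the sketch only covers removing style elements: it does not do the validation described in step 2, and it silently drops comments and doctype declarations.

```python
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit HTML while dropping <style> elements and their contents."""

    def __init__(self):
        # Keep entity and character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        elif not self.in_style:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        if tag != "style" and not self.in_style:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        elif not self.in_style:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_style:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_style:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.in_style:
            self.out.append(f"&#{name};")


def strip_style_elements(html):
    """Return `html` with every <style> element removed."""
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Inline style attributes are left alone, which matches the idea of restricting users to inline styles only.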
I am working on an app for doing screen scraping of small portions of external web pages (not an entire page, just a small subset of it).
So I have the code working perfectly for scraping the HTML, but my problem is that I want to scrape not just the raw HTML, but also the CSS styles used to format the section of the page I am extracting, so I can display it on a new page with its original formatting intact.
If you are familiar with Firebug, it is able to display which CSS styles are applicable to the specific subset of the page you have highlighted, so if I could figure out a way to do that, then I could just use those styles when displaying the content on my new page. But I have no idea how to do this.
Today I needed to scrape Facebook share dialogs to be used as dynamic preview samples in our app builder for facebook apps. I've taken Firebug 1.5 codebase and added a new context menu option "Copy HTML with inlined styles". I've copied their getElementHTML function from lib.js and modified it to do this:
remove class, id and style attributes
remove onclick and similar javascript handlers
remove all data-something attributes
remove explicit hrefs and replace them with "#"
replace all block level elements with div and inline element with span (to prevent inheriting styles on target page)
absolutize relative urls
inline all applied non-default css attributes into brand new style attribute
reduce inline style bloat by considering styling parent/child inheritance by traversing DOM tree up
indent output
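The inheritance-based pruning step in the list above can be illustrated with a small sketch. This is not the Firebug-based code the answer describes, only a simplified illustration in Python: each node is assumed to be a dict with hypothetical `tag`, `computed` and `children` keys, and `INHERITED` is a small, non-exhaustive subset of the properties that CSS inherits by default.

```python
# Properties that CSS inherits by default (a small, non-exhaustive subset).
INHERITED = {
    "color", "font-family", "font-size", "font-weight", "line-height", "text-align",
}


def prune(computed, parent_computed):
    """Keep only declarations the element would not get from its parent anyway."""
    return {
        prop: value
        for prop, value in computed.items()
        if not (prop in INHERITED and parent_computed.get(prop) == value)
    }


def inline_styles(node, parent_computed=None):
    """Return a copy of the tree with a pruned inline style string per node."""
    computed = node["computed"]
    kept = prune(computed, parent_computed or {})
    style = "; ".join(f"{prop}: {value}" for prop, value in sorted(kept.items()))
    children = [inline_styles(child, computed) for child in node.get("children", [])]
    return {"tag": node["tag"], "style": style, "children": children}


if __name__ == "__main__":
    tree = {
        "tag": "div",
        "computed": {"color": "red", "margin-top": "4px"},
        "children": [
            {"tag": "span", "computed": {"color": "red", "font-weight": "bold"}},
        ],
    }
    print(inline_styles(tree))
```

Non-inherited properties such as margins are always kept, since a child does not pick them up from its parent. Dropping an inherited value is only safe when the child has no default rule of its own for that property, which is one reason to replace elements with plain div and span first.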
It works well for simpler pages, but the solution is not 100% robust because of bugs in Firebug (or Firefox?). But it is definitely usable when operated by a web developer who can debug and fix all quirks.
Problems I've found so far:
sometimes clear css property is not emitted (it breaks layout pretty badly)
:hover and other pseudo-classes cannot be captured this way
Firefox keeps only the CSS properties/values it understands (including Mozilla-specific ones) in its model, so for example you lose -webkit-border-radius, because the CSS parser skipped it
Anyway, this solution saved me a lot of time. Originally I was manually selecting pieces of their stylesheets and postprocessing them by hand. It was slow, boring, and polluted our class namespace. Now I'm able to scrape Facebook markup in minutes instead of hours, and the exported markup does not interfere with the rest of the page.
A good start would be the following: make a pass through the patch of HTML you plan to extract, collecting each element (and its ID/classes/inline styles) to an array. Grab the styles for those element IDs & classes from the page's stylesheets immediately.
Then, from the outermost element(s) in the target patch, work your way up through the rest of the elements in the DOM in a similar fashion, eventually all the way up to the body and HTML elements, comparing against your initial array and collecting any styles that weren't declared within the target patch or its applied styles.
You'll also want to check for any * declarations and grab those as well. Then, make sure when you're reapplying the styles to your eventual output you do so in the right order, as you collected them from low-to-high in the DOM hierarchy and they'll need to be reapplied high-to-low.
A quick hack would be to pull down their CSS file and apply it to the page you are using to display the data. To avoid any interference you could load the page into an IFrame wherever you need to display it. Of course, I have to question the intention of this code. Are you allowed to republish the information you are scraping?
If you have any way to determine the "computed style" then you could effectively throw away the style sheet and, *gasp*, apply inline styles using all of the computed styles' properties.
But I don't recommend this. It will be very bloated.