I'm trying to extend the VisualEditor by adding custom functionality.
It was pretty easy to add things that are realized with a single HTML tag without parameters. But now I'm trying to add text color. I tried to replicate the LanguageAnnotation, as that's pretty similar (it uses span lang=..., while I want span style=color:...).
But it looks like there are a lot more things I have to change here, and I don't understand them.
I'd be very grateful for any kind of help here.
Edit: To provide more information: currently I am trying to replicate the .toDomElements function, which I am struggling with because I cannot find the place where the exact tag syntax is set (style instead of lang).
Well, I'm not sure if it's the best possible solution, but I've solved this problem by replicating the LanguageAnnotation, including the widget system (I chose it because it was the closest one to what I wanted to accomplish, as it uses CSS and the span tag).
I've replicated the following classes, adjusting the attributes (language has lang and dir attributes, my color-annotation only the color attribute):
ve.ce.TextColorAnnotation.js
ve.dm.TextColorAnnotation.js
ve.ui.TextColorContextItem.js
ve.ui.TextColorSearchDialog.js
ve.ui.TextColorInspector.js
ve.ui.TextColorInspectorTool.js
ve.ui.TextColorSearchWidget.js
ve.ui.TextColorInputWidget.js
ve.ui.TextColorResultWidget.js
It works pretty well, IMO. The only problem I see right now (and which I am working on next) is that with this implementation, text (or background) color can only be applied to text, but not to tables (only when selecting the text inside a cell).
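For the .toDomElements part of the question, the change from lang to style comes down to which attribute gets written onto the span. The following is only a minimal sketch, not VisualEditor's actual code: it assumes the annotation's data element keeps the color under a hypothetical attributes.color key and that the function is handed the target document as its second argument. The function name is made up, and registering the annotation with VisualEditor is left out entirely.

```javascript
// Hypothetical converter: turns a data element such as
// { type: 'textColor', attributes: { color: 'red' } }
// into a one-element array holding <span style="color: red;">.
// 'doc' is any object with a DOM-style createElement method.
function textColorToDomElements( dataElement, doc ) {
	var span = doc.createElement( 'span' );
	var color = dataElement.attributes && dataElement.attributes.color;
	if ( color ) {
		// A language annotation would write 'lang' (and 'dir') at this point;
		// a color annotation writes an inline style instead.
		span.setAttribute( 'style', 'color: ' + color + ';' );
	}
	return [ span ];
}
```

The color value is written as-is here; a real implementation would probably want to validate it before putting it into a style attribute.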
I want to parse HTML text and find specific parts, for example the text in the 3rd div of the 1st row and 2nd column of a table. I have 2 options for parsing: regular expressions and XPath. What are the advantages and disadvantages of each?
thanks
It somewhat depends on whether you have a complete HTML file of unknown but well-formed content versus having merely a snippet or an expanse of HTML of completely known content which may or may not be well-formed.
There is a difference between editing and parsing, you see.
It is one thing to be editing your own HTML file that you wrote yourself or are otherwise staring right in the face, and you issue the editor command
:100,200s!<br */>!!g
to remove the breaks from lines 100–200.
It is quite another to suck down whatever HTML happens to be at the other end of a URL and then try to make some sense out of it, sight unseen.
The first calls for a regex solution: the very one shown above, in fact. To go off writing some massively overengineered behemoth to do a full parse to set up the entire parse tree just to do the simple edit shown above is quite simply wrong. It’s also its own punishment.
On the other hand, using patterns to parse out (as opposed to lex out) an entire HTML document that can contain all kinds of wacky things you aren’t planning for just cries out for leveraging someone else’s hard work instead of recreating the wheel for yourself, and badly at that.
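For readers who are not in vi, the same kind of range-limited edit can be sketched in JavaScript. This is only an illustration of the editing case described above; the function name and the 1-based, inclusive line range are assumptions made for the example.

```javascript
// Remove <br/> tags (with optional spaces before the slash) from a
// 1-based, inclusive range of lines, similar in spirit to the editor
// command  :100,200s!<br */>!!g
function removeBreaks(text, firstLine, lastLine) {
  return text
    .split('\n')
    .map((line, i) => {
      const lineNo = i + 1;
      return lineNo >= firstLine && lineNo <= lastLine
        ? line.replace(/<br *\/>/g, '')
        : line;
    })
    .join('\n');
}
```

Lines outside the range are left untouched, which is exactly why a plain pattern is enough here: the content is known and nothing needs to be understood structurally.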
However, there’s something else nobody likes to mention, and that’s that most people just aren’t competent at regexes. They don’t really understand them. They don’t know how to test them or to craft them. They don’t know how to make them readable and maintainable.
The truth of the matter is that the overwhelming majority of regex users cannot even manage as simple and basic a thing as matching an arbitrary HTML tag using a regex, even when gotchas like alternate encodings and CDATA sections and redefined entities and <script> contents and archaic never-seen forms are all safely dispensed with.
It’s not because it’s hard to do; it isn’t, actually. It’s just that the people trying to do it understand neither regexes nor HTML particularly well, and they don’t know they don’t know, and so they get themselves in way over their heads more quickly than they realize. And then they have a complete disaster on their hands.
Plus it’s been done before, and correctly. Might as well learn from someone else’s mistakes for a change, eh? It would probably help to have a few canned regexes at your disposal to go at frequently manipulated things. This is especially useful for editing.
But for a full parse, you really shouldn’t try to embed a full HTML grammar inside your pattern. Honest, you really shouldn’t. Speaking as someone who actually can and has done this, I, unlike 99.9999% of the responders here, have the credibility of actual experience in this area when I advise against it. Sure, I can do it, but I almost never want to, and I certainly don’t want you to try it at home unsupervised. I can’t be held responsible for any damage that might ensue. :)
Sure, this may sound like “Do as I say, not as I do,” but if your level of regex mastery were at a level that allowed you to contemplate such a thing, you would not be asking this question. As I mentioned, almost no one who uses regexes can actually match an arbitrary HTML tag, simple as that is. Given that you need that sort of building block before writing your recursive descent grammar, and given that next to nobody can even manage that simple building block, well...
Given that sad state of affairs, it’s probably best to use regexes for simple edit jobs only, and leave their use for more complete solutions to real regex wizards, for they are subtle and quick to anger. Meaning of course the regexes, not (just) the wizards.
But sure, keep some canned regexes handy for doing simple editing rather than full parsing. That way you won’t be forced to redevise them each time from first principles. I do keep a few of these around, but then I also keep simple frameworks that allow me to edit a particular structural element of the HTML, like the plain text or the tag contents or the link references, etc, and those all use a full parser, letting me then surgically target just the parts I want in complete confidence I haven’t forgotten something.
More as a testament to what is possible than what is advisable, you can see some answers with more, um, “heroic” pattern matching, including recursion, here, here, here, here, here, and here.
Understand that some of those were actually written for the express purpose of showing people why they should not use regexes, because some of them are really quite sophisticated, much more so than you can expect from nonwizards. That difficulty may chase you away, which is OK, because it was sort of meant to.
But don’t let that stop you from using vi on your HTML files, nor should it scare you away from using its search or substitute commands. Don’t let the perfect be the enemy of the good. Sometimes good enough is exactly what you need, because the perfect would take more investment than it could ever be worth.
Understanding which out of several possible approaches will give you the most bang for your buck is something that takes time to learn, and no one can tell you the answer that works for you. They don’t know your dataset, your requirements, your skillset, your priorities. Therefore any categorical answer is automatically wrong. You have to evaluate these things for yourself.
I think XPath is the primary option for traversing XML-like documents. With RegExp, it will be up to you to handle the different forms of writing a tag (with multiple spaces, double quotes, single quotes, no quotes, on one line, across multiple lines, with inner data, without inner data, etc.). With XPath, this is all transparent to you, and it has many features (like accessing a node by index, selecting by attribute values, selecting siblings, and MANY others).
See how powerful it can be at http://www.w3schools.com/xpath/.
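To make the point about tag-writing variants concrete, here is a rough sketch of what a regex has to spell out just to read one attribute value from a single tag. The helper name is made up for this example, and the attribute name is assumed to be a plain identifier with no regex metacharacters.

```javascript
// Read one attribute value from the source text of a single tag,
// accepting double-quoted, single-quoted and unquoted values, with
// arbitrary whitespace (including newlines) around the '='.
function getAttribute(tagSource, name) {
  const pattern = new RegExp(
    '\\s' + name + '\\s*=\\s*(?:"([^"]*)"|\'([^\']*)\'|([^\\s"\'>]+))',
    'i'
  );
  const m = pattern.exec(tagSource);
  if (!m) return null;
  // Exactly one of the three capture groups is set, depending on quoting.
  return m[1] !== undefined ? m[1] : m[2] !== undefined ? m[2] : m[3];
}
```

Even this still misfires on attribute-like text inside another attribute's value (for example a title whose value contains " lang=x"), which is the kind of case a real parser or an XPath engine takes care of for you.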
EDIT: See also How do HTML parses work if they're not using regexp?
XPath is less likely to break if the web developer makes any minor changes. That would be my choice.
Here is the canonical Stack Overflow explanation for why you should not parse HTML with regex:
RegEx match open tags except XHTML self-contained tags
In general, you cannot parse HTML with regex because regex is not made to parse HTML. Just use XPath.
Having the HTML of a webpage, what would be the easiest strategy to get the text that's visible on the corresponding page? I have thought of getting everything that's between the <a>..</a> and <p>...</p> tags, but that is not working that well.
Keep in mind that, as this is for a school project, I am not allowed to use any kind of external library (the idea is to have to do the parsing myself). Also, this will be implemented as the HTML of the page is downloaded; that is, I can't assume I already have the whole HTML page downloaded. It has to show the extracted visible words as the HTML is being downloaded.
Also, it doesn't have to work for ALL the cases, just to be satisfactory most of the time.
I am not allowed to use any kind of external library
This is a poor requirement for a ‘software architecture’ course. Parsing HTML is extremely difficult to do correctly, certainly way outside the bounds of a course exercise. Any naïve approach you come up with involving regex hacks is going to fall over badly on common web pages.
The software-architecturally correct thing to do here is use an external library that has already solved the problem of parsing HTML (such as, for .NET, the HTML Agility Pack), and then iterate over the document objects it generates looking for text nodes that aren't in ‘invisible’ elements like <script>.
If the task of grabbing data from web pages is of your own choosing, to demonstrate some other principle, then I would advise picking a different challenge, one you can usefully solve. For example, just changing the input from HTML to XML might allow you to use the built-in XML parser.
Literally all the text that is visible sounds like a big ask for a school project, as it would depend not only on the HTML itself, but also any in-page or external styling. One solution would be to simply strip the HTML tags from the input, though that wouldn't strictly meet your requirements as you have stated them.
Assuming that near enough is good enough, you could make a first pass to strip out the content of entire elements which you know won't be visible (such as script, style), and a second pass to remove the remaining tags themselves.
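The two passes described above could look roughly like this. It is a sketch of the "near enough" approach only; the function name is made up, the tag pattern is one common formulation rather than anything authoritative, and HTML comments and character entities are not handled.

```javascript
// Approximate the visible text of an HTML string in two regex passes.
function visibleTextApprox(html) {
  // Pass 1: drop whole elements whose content is never rendered as text.
  const withoutHidden = html.replace(
    /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi,
    ' '
  );
  // Pass 2: drop the remaining tags, allowing '>' inside quoted
  // attribute values.
  const withoutTags = withoutHidden.replace(
    /<(?:"[^"]*"|'[^']*'|[^'">])*>/g,
    ' '
  );
  // Collapse the whitespace the removals leave behind.
  return withoutTags.replace(/\s+/g, ' ').trim();
}
```

Note that this needs the whole document in one string, so it does not by itself satisfy the "while the page is downloading" constraint from the question.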
I'd consider writing a regex to remove all HTML tags, and you should be left with your desired text. This can be done in JavaScript and doesn't require anything special.
I know this is not exactly what you asked for, but it can be done using Regular Expressions:
//javascript code
//should (could) work in C# (needs escaping for quotes) :
h = h.replace(/<(?:"[^"]*"|'[^']*'|[^'">])*>/g,'');
This RegExp will remove HTML tags; notice, however, that you first need to remove the script, link, style, ... elements together with their content.
If you decide to go this way, I can help you with the regular expressions needed.
HTML 5 includes a detailed description of how to build a parser. It is probably more complicated than what you are looking for, but it is the recommended way.
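For the streaming constraint in the question, a hand-rolled state machine is one possible middle ground. The sketch below is far simpler than the HTML 5 parsing algorithm and is not derived from it: it only tracks whether it is in text, inside a tag, or inside script/style content, and it keeps its state between chunks so that a tag split across two downloads is still handled. All names are made up for the example; comments, entities and whitespace between block elements are not handled.

```javascript
// Create an extractor that can be fed HTML chunk by chunk; visible-looking
// text is reported through the onText callback as soon as it is seen.
function createTextExtractor(onText) {
  let state = 'text'; // 'text' | 'tag' | 'raw'
  let quote = null;   // quote character currently open inside a tag
  let tagBuf = '';    // source of the tag being read, without < and >
  let rawEnd = '';    // closing tag that ends raw mode, e.g. '</script>'
  let tail = '';      // last few characters seen in raw mode

  function finishTag() {
    const m = /^([a-zA-Z][a-zA-Z0-9]*)/.exec(tagBuf);
    const name = m ? m[1].toLowerCase() : '';
    tagBuf = '';
    if (name === 'script' || name === 'style') {
      state = 'raw';
      rawEnd = '</' + name + '>';
      tail = '';
    } else {
      state = 'text';
    }
  }

  return {
    write(chunk) {
      let text = '';
      for (const ch of chunk) {
        if (state === 'text') {
          if (ch === '<') { state = 'tag'; quote = null; }
          else text += ch;
        } else if (state === 'tag') {
          if (quote) {
            if (ch === quote) quote = null;
            tagBuf += ch;
          } else if (ch === '>') {
            finishTag();
          } else {
            if (ch === '"' || ch === "'") quote = ch;
            tagBuf += ch;
          }
        } else {
          // Raw mode: ignore everything up to the matching closing tag.
          tail = (tail + ch).slice(-rawEnd.length).toLowerCase();
          if (tail === rawEnd) state = 'text';
        }
      }
      if (text) onText(text);
    }
  };
}
```

Calling write() once per received chunk prints words as they arrive; because the state lives in the closure, it does not matter where the network happens to cut the input.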
You'll need to parse every DOM element for text, and then detect whether that DOM element is visible (el.style.display == 'block' or 'inline'), and then you'll need to detect whether that element is positioned in such a manner that it isn't outside of the viewable area of the page. Then you'll need to detect the z-index of each element and the background of each element in order to detect if any overlapping is hiding some text.
Basically, this is impossible to do within a month's time.
I am developing a "modern" website, and I'm having a lot of trouble getting the CSS to make everything line up properly. I feel like the layout would be a lot easier if I just used a table, but I've been avoiding <table> tags, because I've been told that they are "old-fashioned" and not the right way to do things.
Is it okay to use tables? How do I decide when a table is appropriate, and when I should use CSS instead? Do I just do whatever is easier?
The answer is yes, it's fine to use tables. The general rule of thumb is that if you are displaying tabular data, a table is probably a good way to go. You should generally try to style your table with CSS as much as you can, though.
Also, this pie graph might help you:
[image: http://www.ratemyeverything.net/image/7292/0/Time_Breakdown_of_Modern_Web_Design.ashx]
EDIT: Tables are fine. For displaying data. Just like my second sentence stated. The question was "is it ok to use tables". The answer is - yes, it is ok to use tables. It is not illegal.
Even though my general rule of thumb already implies using tables for data, apparently I must also state the corollary: it's not OK to use tables for anything else, even though the poster already seemed to grasp this concept. So, for the record, the general rule of thumb is to not use tables for laying out your site.
Tables should be used to represent tabular data. CSS should be used for presentation and layout.
This question has also been exhaustively answered here:
Why not use tables for layout in HTML?
Essentially - if you have tabular data, then use a table. There's really no need now to use tables for layout - sure, they were often considered 'easier', but semantically the resulting page is horrid, and table layouts were often considered inaccessible.
See some discussion:
css-discuss
and a particularly comical URL - shouldiusetablesforlayout.com
The 'modern' approach is not about using table tags or div tags, but about using the right tag for the right purpose.
The table tag is used for tabular data. There is nothing wrong with using it for that!
For using CSS, there are a lot of tutorials and guides (good and bad) around. Indicators of a bad tutorial are heavy use of blocks (divs) that only make sense for the layout and not for the content. Good signs are tutorials that advise using the right tags for the right content and teach you how to style those tags.
Tables are only appropriate for tabular data. Imagine you have to add some spreadsheet-like data, where you have clear row/column headers, and some data inside those rows.
A product comparison, for example, is also a valid table item.
I believe that tables are OK for display of rectilinear data of arbitrary rows and/or columns. That's about it. Tables should not be used for layout purposes anymore.
In general, HTML markup should describe the structure and content of a web page—it should not be used to control presentational aspects such as layout and styling (that's what CSS is for). A <table> tag, like most have already said, should represent tabular data—something that would appear as a table of information.
The reason why people rag on tables so much is that in the old days, there was no such thing as CSS—all page layout was done directly in HTML. Tags were not thought of as describing content—all anyone really cared about was how a tag would make things look in a web browser. As a result of this, people figured that, since they could organize things into rows and columns, tables must be good for laying out elements of a web page. This became a really popular technique—in fact, I'd wager that using tables was considered the preferred method of laying out web pages for quite some time.
So when people tell you that tables are "old-fashioned," they are specifically referring to this abuse of the <table> tag that was so popular back in the old days. Like I said, there's nothing wrong with HTML tables themselves, but using them for web page layout just doesn't make sense nowadays.
(Plus, from a purely pragmatic standpoint, layouts done with HTML tables are very inflexible and hard to maintain.)
It's OK to use tables when you are showing data in a grid / tabular format. However, for the general structure of the site, it's highly recommended that you use CSS-driven div, ul, li elements to give you a more lucid website.
If you decide to work with tables anyway, you must consider the following cons:
they are not SEO friendly
they are quite rigid in terms of their structure and at times difficult to maintain as well
You may spend a little extra time on a div-based website, but it's worth every minute spent.
The whole "anti-table" movement is a reaction to a time when deeply nested tables were the only method to lay out pages, leading to HTML that was very hard to understand.
Tables are a valid method for tabular (data) layout, and if a table is the easiest way to implement a layout, then by all means use a table.
Table is always the right choice when you have the need to present data in a grid.
Quoting SitePoint's book HTML Utopia: Designing Without Tables Using CSS:
If you have tabular data and the appearance of that data is less important than its appropriate display in connection with other portions of the same data set, then a table is in order. If you have information that would best be displayed in a spreadsheet such as Excel, you have tabular data.
I would say no to using tables to construct your layout. Tables make sense only for actual tabular data you need to represent. If you spend enough time figuring the CSS out, you will find it's easier than using tables for a layout. Just remember: tables for displaying data, CSS for page layouts.
Tables are just that: Tables.
They are frowned upon because they should not be used for layout, as has been the fashionable thing to do before browsers could position stuff properly.
If what you want to mark up is, in fact, a table, then use a table. Other than that, try to stay away.
One small thing: aligning two pieces of text on the exact same line so that they won't move apart (think username and post date). There, using a table is IMHO an option.
First get it working. Then get it perfect.
Get the layout done in some way before making it perfect or better.
How many people per day will go to the page you are working on? A million? Or 20?
How much time are you going to spend on CSS issues instead of other issues? Does your boss want you to spend this much time on the issue? Does he/she know what you are doing?
Absolutely. I don't know where CSS zealots invented the idea that tables are not naturally used for "layout". Tables have been used for laying things out since their invention, whether those things be numbers, words, or pretty pictures. That's what they do. Moreover, table is part of all versions of (X)HTML so there are no deprecation concerns.
Absolutely.
All that HTML offers was originally intended for you to define the markup of your page. In my book, absolute and relative positions of elements on a page belong to markup. So both divs and tables are very much suited for this task. Pick up what works best for your particular need.
CSS adds many styling possibilities and also layout tricks, but it complements HTML's options rather than replacing them.
There is actually a very fine line between seeing something as a markup or styling issue. CSS proponents would say that with CSS you can relocate and reshuffle completely all big and little pieces of a page. I cannot however imagine putting header below, footer above and making things appear in reverse order.
Take an example. You design a notebook. You know where to place the major components: mainboard, cooling system, keyboard, display and ports. You may certainly wish to rearrange the port connectors a little, which side they are on and in which sequence they appear, but you don't really expect to put the display where the keyboard is, put the keyboard on the lid, make the fans blow in your face and have all the connectors on the bottom, to be reached through holes in your desk.
Using tables can make it slightly difficult to rearrange elements on a page. This might be true. However, in most cases you know in advance approximately how your page should look, and you would not want to change everything drastically. If you can't say that before you begin your work, you probably have no clear idea what you are doing and what for.
Moreover, only tables possess elastic properties, which allow them to stretch to the width/height of their content. Nothing else in HTML/CSS can be used to do that.
CSS design on one side allows you to create quite adjustable designs. On the other hand, it locks you out from designing a page adjustable to its content. Both wins and losses.
A table is also the only tool for making very complex and precise interfaces. For example, the SO page is very simple. It can probably be done with pure CSS. But have you seen any enterprise-class software like CRMs, SRMs etc.? That multitude of buttons, text fields, check boxes and dropdown lists, all precisely located on a screen? Good luck achieving that kind of complexity with just CSS. And these layouts migrate from desktop applications to the web each day (keyword: software-as-a-service).
So choose what best suits your current need and don't trust those CSS lovers. Actually, don't trust any fanatics at all.
I am trying to create a bracket system using HTML. I've found other solutions, however, most require lots of absolute/relative positioning or tables.
I'm looking for a way to make it flexible, so I can just change the HTML to change it from a 16-man bracket to a 64-man bracket.
[404 - link removed]
Now, I don't see much wrong with my current example; however, I'm just curious whether anyone out there has suggestions for improving or completely changing the way I am doing it.
I'd rather stay away from tables, and definitely stay away from any sort of positioning (this is meant to be flexible).
If you have any ideas, that would be great. :)
Thanks,
Andrew
That actually looks fairly good. What I would do to improve it is encapsulate the logic in a bit of JavaScript, supply the bracket information in some sort of text format, and have the JavaScript parse the text format to generate the bracket as deeply as you need it.
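A rough sketch of that idea follows. Everything in it is an assumption made for illustration: the text format (one entrant per line), the function names, the "TBD" placeholder for undecided slots, and the CSS class names. It emits plain nested markup with no positioning, so the visual layout is left to whatever stylesheet the original example uses.

```javascript
// Parse one entrant name per line; blank lines are ignored.
function parseEntrants(text) {
  return text.split('\n').map(s => s.trim()).filter(s => s !== '');
}

// Build the rounds: the first round pairs up the entrants, later rounds
// hold placeholders until results are known. Assumes a power-of-two field.
function buildRounds(entrants) {
  const n = entrants.length;
  if (n < 2 || (n & (n - 1)) !== 0) {
    throw new Error('entrant count must be a power of two, got ' + n);
  }
  const rounds = [];
  let slots = entrants;
  while (slots.length >= 2) {
    const matches = [];
    for (let i = 0; i < slots.length; i += 2) {
      matches.push([slots[i], slots[i + 1]]);
    }
    rounds.push(matches);
    slots = matches.map(() => 'TBD'); // one undecided winner per match
  }
  return rounds;
}

// Escape the characters that would otherwise be read as markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render one <ol> per round and one <li> per match.
function renderBracket(rounds) {
  return '<div class="bracket">' + rounds.map((matches, r) =>
    '<ol class="round round-' + (r + 1) + '">' +
    matches.map(([a, b]) =>
      '<li class="match"><span>' + escapeHtml(a) + '</span><span>' +
      escapeHtml(b) + '</span></li>'
    ).join('') +
    '</ol>'
  ).join('') + '</div>';
}
```

Going from a 16-entrant to a 64-entrant bracket then only means supplying a longer list, for example renderBracket(buildRounds(parseEntrants(text))).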