Which programming language is CSS / HTML defined by?

How did the developers of HTML and CSS define them? I mean, which programming language did they use?
Imagine I want to define a new HTML tag, a new CSS property, or even a new language.
For example, instead of using angle-bracket tags <>, I want to define a language that uses square brackets [] and a new CSS-like syntax:
[foo style(bar: .....)]
How do they (the developers) do this? Which programming language do they use, and which approach do they follow?
p.s.1: I'm not going to develop a new language, it is just a question.
p.s.2: I couldn't find appropriate tags, so please be patient if this question doesn't fit css,html & xml contexts.

Consider this:
You are reading English. You understand the punctuation and the meaning of the words, and from those you extract what the text means.
The same goes for HTML and browsers. HTML is itself a "markup language" (not a programming language), like English in our example. The web browser is like our brain: it understands the syntax (grammar and punctuation), extracts what it means, and shows the result to us.
So your question was more like "In what language is English written?".
As for how to create something like HTML/CSS, you first need to understand the basics of "Theory of Computation" and "Compiler Construction".
But to answer your question in brief, you need to create a dictionary (which defines the meaning of each and every word in your language) and then create a "parser" (like the one in a web browser) that understands that meaning.
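To make the "dictionary plus parser" idea concrete, here is a minimal sketch in Python for the bracket syntax from the question. Everything in it is an assumption made up for illustration: the grammar, the tag names in KNOWN_TAGS, and the property names in KNOWN_PROPERTIES. It is a toy, not how any browser actually works.

```python
import re

# Hypothetical grammar, invented for this sketch:
#   element      := "[" NAME [ "style(" declarations ")" ] "]"
#   declarations := NAME ":" VALUE { ";" NAME ":" VALUE }
ELEMENT = re.compile(r"\[\s*(?P<tag>\w+)(?:\s+style\((?P<style>[^)]*)\))?\s*\]")

# The "dictionary": the only words this toy language knows (assumed names).
KNOWN_TAGS = {"foo", "note"}
KNOWN_PROPERTIES = {"bar", "color"}


def parse_element(source):
    """Parse one bracket element into (tag, {property: value})."""
    match = ELEMENT.fullmatch(source.strip())
    if match is None:
        raise ValueError(f"syntax error: {source!r}")
    tag = match.group("tag")
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    styles = {}
    # An element without style(...) yields an empty declaration list.
    for declaration in (match.group("style") or "").split(";"):
        if not declaration.strip():
            continue
        name, separator, value = declaration.partition(":")
        name = name.strip()
        if not separator or name not in KNOWN_PROPERTIES:
            raise ValueError(f"bad declaration: {declaration!r}")
        styles[name] = value.strip()
    return tag, styles
```

For example, parse_element("[foo style(bar: red)]") gives the tag "foo" with the style mapping {"bar": "red"}, while an unknown tag or property raises ValueError. A real language would then need a second stage that decides what each parsed element means, which is the part a browser's rendering engine plays for HTML.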
This is a very broad topic, so I suggest searching the web for the two topics I mentioned above.
Hope I helped!

Related

If HTML is not a programming language, what am I doing if I am doing HTML codes?

I am creating an article about programming. If I am using C#, for example, I am a C# programmer and I am programming using C#. How about HTML? If HTML is not a programming language, and it is a markup language, what is the correct verb applicable to a person coding in HTML? Is it just coding?
Edit 2:
Wow, apparently you can call HTML/CSS a programming language because HTML5/CSS3 is Turing-complete by accident (for the first link, check the comments).
Main Answer:
"How about HTML?" I take the stance that to be programming, the language has to be Turing-complete. So by my definition you can't be a regex programmer. A more lenient definition is that it needs variables and control statements, even something as simple as an 'if' and a 'branch' instruction. So, as you point out, pure HTML is not a programming language. But HTML in the real world isn't just HTML text files!
I would call an HTML user an HTML Technologist or HTML author, but if someone said they were an HTML coder or even a programmer, I wouldn't bat an eye or try to correct them. I don't think many people write plain HTML, and the moment one adds JavaScript or allows pages to be generated by PHP, Python, or anything else, it meets the definition of programming. (edit 2: The moment you add CSS3 it becomes Turing-complete and thus a 'real' programming language)
Edit 1:
I like an answer I found about why 'real programmers' are so defensive about reminding people that HTML/CSS is not 'real programming'. The OP's question dealt with what to call HTML authors, but this question comes up because 'real programmers' are so firm in making a distinction between their work and HTML. I like this quote from Kramli (linked before):
There are times when the difference between programming languages and other languages really does matter. Quite often, however, we can all communicate perfectly effectively when we just lump them all in together.
You have three questions...
Q1: I am a C# programmer and I am programming using C#. How about HTML?
A1: I am coding in HTML
Q2: If HTML is not a programming language, and it is a markup language, what is the correct verb applicable to a person coding in HTML?
A2: The verb is "coding", but I think you are looking for the term "coder".
Q3: Is it just coding?
A3: Yes
HTML is a markup language, hence the name HyperText Markup Language.
You are effectively the modern day equivalent of a typesetter in the print industry.
If you have minimal input in the page creation process then you're probably a Coder, however if you have significant input into page layout, then the job role is normally referred to as being a Web Designer. If you're writing lots of scripts (in say PHP, Python, Ruby, Perl or whatever your least worst option is) to produce the pages in a reasonably professional manner, then you can award yourself the wonderful title of Web Developer :-)
If you devote some thought as to how all these scripts are going to hang together, and how users are going to interact with your site, then you can claim to be an Analyst. :-)
In the Internet, job roles are quite fuzzy; personally I consider myself a mix of all of the above, concentrated more on the Developer/Analyst side as whilst I understand the technical aspects of HTML and CSS, I don't have the appreciation of good design and presentation to fully claim being a Designer in a professional context.
I also suggest you read the answers to the related questions on the right of this page...
As with any language, be it musical, programmatic, mathematical, hypertext or anything in between, as a content creator you are a writer.
Specifically, for a markup language (such as HTML) you are annotating a document with tags that are separate entities from the text between them, so you could be considered an Editor, Author, or Designer because you are generally directing the content of a page.
Differences arise with HTML compared to writing technical documents using, for example, DITA. Whereas a DITA document has its architecture and tags, it does not necessarily require a style sheet to be displayed. HTML, on the other hand, is normally consumed through a web browser, so it relies on CSS styling to be shown in a readable fashion. For this reason, formatting becomes as important as content, and people writing HTML and CSS in combination are referred to as Web Designers.
If you begin throwing in programming languages such as PHP or JScript you will be referred to as a Web Developer, but the terms developer and designer are often used interchangeably.
what is the correct verb applicable to a person coding in HTML?
Coding is a process that involves using a programming language. Since HTML is not a programming language, you can use "writing" instead of "coding". As simple as that.

Parsing Random Web Pages

I need to parse a bunch of random pages and add them to a DB. I am thinking of using regular expressions, but I was wondering if there are any 'special' techniques (other than looking for content between known text/tags). The content is mostly (not always) like:
Some Title
Text related to Title
I guess I don't need to extract the complete text, just some way to know where the title/paragraph is and to extract the content from there. The content itself may have images/links that I would like to retain.
Thanks!
Please see this answer: RegEx match open tags except XHTML self-contained tags
Use Python. http://www.python.org/
Use Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/
You need to use a proper HTML parser, and extract the elements you’re interested in via the parser’s API (or via the DOM).
Since I don’t know what language you’re programming in, it’s rather difficult to recommend a parser, but some well known ones are Jericho for Java, and Beautiful Soup for Python.
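As a rough illustration of the parser approach, here is a minimal sketch using Python's standard html.parser module instead of a third-party library. The class name, the choice of h1 to h3 as "titles", and the decision to keep only href/src values are assumptions for this example, not part of the question.

```python
from html.parser import HTMLParser


class TitleAndParagraphs(HTMLParser):
    """Collect text inside heading and <p> elements, keeping link and image URLs."""

    HEADINGS = {"h1", "h2", "h3"}  # assumed: which tags count as a "title"

    def __init__(self):
        super().__init__()
        self.blocks = []     # (kind, text) pairs in document order
        self.resources = []  # href/src values seen inside collected blocks
        self._kind = None    # "title" or "paragraph" while inside a block
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._kind, self._text = "title", []
        elif tag == "p":
            self._kind, self._text = "paragraph", []
        elif self._kind and tag in ("a", "img"):
            wanted = "href" if tag == "a" else "src"
            self.resources.extend(v for k, v in attrs if k == wanted and v)

    def handle_data(self, data):
        if self._kind:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._kind and (tag in self.HEADINGS or tag == "p"):
            # Collapse runs of whitespace so the stored text is tidy.
            text = " ".join("".join(self._text).split())
            self.blocks.append((self._kind, text))
            self._kind = None


def extract(html):
    parser = TitleAndParagraphs()
    parser.feed(html)
    parser.close()
    return parser.blocks, parser.resources
```

This sketch relies on explicit closing tags, so pages that omit </p> would need extra handling. A lenient parser such as Beautiful Soup is likely the more comfortable choice for truly random pages.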

HTML parser...My recent project needs a web spider

My recent project needs a web spider with an HTML parser. It fetches web content automatically and follows the links it finds recursively.
But it needs to understand the content exactly, for example the tags.
It runs on Linux and Windows. Do you know of any open-source software that meets these needs, or do you have any suggestions?
Thanks.
Here is a StackOverflow question showing how to use a number of XML/HTML parsers in different languages. If you tell us what language you're using, I can be more specific, but your answer may already be in there.
It depends on what language you are developing in. Try googling:
html parser languagename
hpricot is a good one for Ruby, for example.
I think the subject you need to know is regular expressions.
Regular expressions are available on all platforms and in all major languages (Java, PHP, Python, C#, Ruby, JavaScript).
Using a regular expression, you can easily extract the content in whatever form you want.
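import java.util.regex.Matcher;
import java.util.regex.Pattern;

// pageContent is a String holding the page's HTML; group 1 captures the href value.
Pattern p = Pattern.compile("<a\\s[^>]*href=\"([^\"]+?)\"[^>]*>");
Matcher m = p.matcher(pageContent);
while (m.find()) {
    System.out.println(m.group(1)); // print the URL of each matched anchor
}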
The Java code above finds every anchor tag in the page whose href value is double-quoted and prints its URL.
If you don't have enough time to learn regular expressions, the following reference will help you.
http://htmlparser.sourceforge.net/
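For comparison, the same link extraction can be sketched with a parser rather than a pattern. This version uses Python's standard html.parser and urllib.parse modules. The names LinkCollector and extract_links are made up for this example, and resolving links against a base URL is an added assumption about what a spider would want.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(page_content, base_url):
    collector = LinkCollector(base_url)
    collector.feed(page_content)
    collector.close()
    return collector.links
```

Unlike the pattern above, this also picks up single-quoted and unquoted href values and upper-case tags, and it turns relative links into absolute ones that a spider can fetch next.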

Writing XSS Filter for (X)HTML Based on White List

I need to implement a simple and efficient XSS filter in C++ for CppCMS. I can't use existing high-quality filters written in PHP because CppCMS is a high-performance framework that uses C++.
The basic idea is to provide a filter that has a white list of HTML tags and a white list of options for these tags. For example, typical HTML input can consist of <b> and <i> tags and an <a> tag with href. But a straightforward implementation is not good enough, because even allowed simple links may include XSS:
Click On Me
Many other examples can be found there. So I also thought about the possibility of creating a white list of prefixes for attributes like href/src, so I always need to check whether the value starts with (https?|ftp)://
Questions:
Are these assumptions good enough for most purposes? That is, if I do not allow any style options and I check src/href against a white list of prefixes, does that solve the XSS problems? Are there problems that can't be fixed this way?
Is there a good reference for a formal grammar of HTML/XHTML, so that I can write a simple parser that cleans up all incorrect or forbidden tags like <script>?
You can take a look at the AntiSamy project, which tries to accomplish the same thing. It's Java and .NET, though.
http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project#.NET_version
http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project_.NET
Edit 1, a bit extra:
You can potentially come up with a very strict white list. It should be well structured, pretty tight, and not very flexible. When you combine flexibility with so many tags, attributes, and different browsers, you generally end up with an XSS vulnerability.
I don't know what your requirements are, but I'd go with strict and simple tag support (only b, li, h1, etc.), then strict attribute support based on the tag (for example, href is only valid on the a tag), and then whitelisting of the attribute values, as you stated: http|https|ftp, or style="color|background-color", etc.
Consider this one:
<x style="express/**/ion:(alert(/bah!/))">
You also need to think about character whitelisting or UTF-8 normalization, because different encodings can cause awkward issues, such as new lines in attributes or invalid UTF-8 sequences.
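The strict-whitelist idea can be sketched roughly as follows. This is Python rather than the C++ the question asks for, and it is only an illustration of the structure: the ALLOWED table, the DROP_WITH_CONTENT set, and the function name sanitize are assumptions made up here. It does not balance tags or handle encodings, so it should not be treated as a complete XSS filter.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed policy for this sketch: allowed tags and their allowed attributes.
ALLOWED = {"b": set(), "i": set(), "a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}
URL_PREFIX = re.compile(r"(?:https?|ftp)://", re.IGNORECASE)


class AllowlistFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skipping = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skipping += 1
            return
        if self._skipping or tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            # Attribute values arrive already entity-decoded, so a value like
            # "java&#09;script:..." is checked in its decoded form.
            if name in ALLOWED[tag] and value and URL_PREFIX.match(value):
                kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skipping = max(0, self._skipping - 1)
        elif not self._skipping and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skipping:
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    html_filter = AllowlistFilter()
    html_filter.feed(fragment)
    html_filter.close()
    return "".join(html_filter.out)
```

The design choice worth noting is that the output is rebuilt from parsed pieces instead of deleting bad parts from the input string, so anything the policy does not name never reaches the output. The example with the x tag and the style attribute above comes out as plain text only, because neither the tag nor the attribute is in the table.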
All details of HTML parsing are specified in HTML 5. However, implementing it is quite a lot of work, and it doesn't matter whether you parse HTML exactly, with all the corner cases. At worst you'll end up with a different DOM, but you have to sanitize the DOM anyway.
As you mentioned, there are various PHP implementations of this, but I don't know of any in C++, since that's not a language typically applied to web development. Overall, it's going to depend on how complex of an implementation you want to come up with.
A very restrictive whitelist is probably the "simplest" way, but if you want to be really comprehensive I would look into doing a conversion of one of the established versions to C++, as opposed to trying to write your own from scratch. There are so many tricks to worry about, that I think you'd be better off standing on the shoulders of others that have already gone through all that.
I don't know anything about using C++ for web development, but converting PHP to it doesn't seem like it would be a particularly difficult task; PHP doesn't really have any magical capabilities that C++ won't be able to duplicate. I'm sure there will be some small hitches, but overall, if you want to go the more complex route, it'd definitely still be faster to do a conversion than a full design from scratch.
HTML Purifier seems like a strong PHP implementation that is still actively maintained. There's a comparison document where the author discusses some differences between his approach and others'; it is probably worth reading.
Whatever you come up with, definitely test it with all the examples you link, and make sure it passes all those. Good luck!

Is HTML considered a programming language? [closed]

I guess the question is self-explanatory, but I'm wondering whether HTML qualifies as a programming language (obviously the "L" stands for language).
The reason for asking is more pragmatic—I'm putting together a resume and don't want to look like a fool for listing things like HTML and XML under languages, but can't figure out how to classify them.
No, HTML is not a programming language. The "M" stands for "Markup". Generally, a programming language allows you to describe some sort of process of doing something, whereas HTML is a way of adding context and structure to text.
If you're looking to add more alphabet soup to your CV, don't classify them at all. Just put them in a big pile called "Technologies" or whatever you like. Remember, however, that anything you list is fair game for a question.
HTML is so common that I'd expect almost any technology person to already know it (although not stuff like CSS and so on), so you might consider not listing every initialism you've ever come across. I tend to regard CVs listing too many things as suspicious, so I ask more questions to weed out the stuff that shouldn't be listed. :)
However, if your HTML experience includes serious web design stuff including Ajax, JavaScript, and so on, you might talk about those in your "Experience" section.
YES, a declarative programming language.
You really want to list the most important things you know that are relevant to the job you're applying for on your resume. If you list ASP.NET but don't list HTML, even though it's somewhat obvious, there are a lot of managers and/or HR types that will assume you don't know HTML since it's not listed. I've had it happen to me before.
Update - Some say no, it isn't a programming language, and you may not agree with me on this, but regardless, on a resume it IS a programming language. You get HR types looking at your resume before the hiring manager even sees it. If the manager says you need to know HTML, and it's not listed in the 'programming languages' section, then the HR person may disregard your resume, thinking you don't know it because it's not listed.
Update 6-8-2012: Any instruction that tells the computer to do something is a programming language. So even after all these years, I still stand by my answer. HTML is a programming language. Something that isn't a programming language would be XML.
No, the clue is in the M - it's a Markup Language.
On some level Chris Pietschmann is correct. SQL isn't Turing complete (at least without stored procedures), yet people will list it as a language; TeX is Turing complete, but most people regard it as a markup language.
Having said that: if you are just applying for jobs, not arguing formal logic, I would just list them all as technologies. Things like .NET aren't languages but would probably be listed as well.
The 'M' stands for 'Markup'. It's a 'Markup Language', not a programming language. Some people will disagree with this, but my opinion is that if it lacks logical constructs (conditional branching, iteration, etc.) it's not really a programming language.
As for the resume, I would suggest putting HTML and XML under a section like 'Technologies'. I usually have a section like this where I list things like version control software, OS's I've developed for, build systems, etc.
No, HTML is not a programming language. It is called "markup" for that reason.
If you're going to say that HTML is a programming language, then you might as well include things such as Word documents, as they too are based on ML, or 'Markup Language'.
Simply put--HTML defines content!
I think not exactly a programming language, but exactly what its name says: a markup language.
We cannot program using pure HTML alone; we can only annotate how to present content.
But if you consider programming to be the act of telling the computer how to present content, then it is a programming language.
In the advanced programming languages class I took in college, we had what I think is a pretty good definition of "programming language": a programming language is any (formal) language capable of expressing all computable functions, which the Church-Turing thesis implies is the set of all Turing-computable functions.
By that definition, no, HTML is not a programming language, even a declarative one. It is, as others have explained, a markup language.
But the people reviewing your resume may very well not care about such a formal distinction. I'd follow the good advice given by others and list it under a "Technologies" type of section.
I think that it definitely has its place on a resume. Knowledge of HTML is valuable, and there really is a lot to know, what with cross-browser compatibility issues and standards which should be followed.
I wouldn't list HTML under "programming languages" alongside C# or something, but it's worth noting your experience.
No - there's a big prejudice in IT against web design; but in this case the "real" programmers are on pretty firm ground.
If you've done a lot of web design work you've probably done some JavaScript, so you can put that down under 'programming languages'; if you want to list HTML as well, then I agree with the answer that suggests "Technologies".
But unless you're targeting agents who're trying to tick boxes rather than find you a good job, a bare list of things you've used doesn't really look all that good. You're better off listing the projects you've worked on and detailing the technologies you used on each; that demonstrates that you've got real experience of using them rather than just that you know some buzzwords.
I get around this problem by not having a "programming languages" section on my resume. Instead I label it simply as "languages", and I stick HTML and CSS at the end. I'd rather make life easier for the reviewer so that they can see whether mine checks-off all their requirements.
Only fools would disregard an applicant because he or she listed HTML under "languages" instead of some other label, especially since there is no industry standard. And who wants to work for fools?
Well, L is for language, but it doesn't imply programming language. After all, English or French are (natural) languages too! ;-)
As said above, put them under a subsidiary section, Technology seems to be a good term.
(Looking at my own resume, not updated in a while) I have made a section just called "Languages", so I can't go wrong... :-D
I have put "(X)HTML and CSS, XML/DTD/Schema and SVG" at the end of the section, clearly separated.
In French, I have a section "Langages" (programming and markup) and another "Langues" (French/English). In the English version, I titled both "Languages", which is clumsy now that I think of it, although context clarifies this. I should find a better formulation.
HTML is in no way a programming language.
Programming languages deal with "processing functions", etc. HTML just deals with the visual interface of a web page, whereas the actual programming, in PHP for example, handles the processing.
If anyone really knows programming, I really can't see how people can mistake HTML for an actual programming language.
In recruitment terms, having been on both sides of the fence, definitely put HTML under 'programming languages', or perhaps more safely under 'technologies'
Yes, we all know that it is a Markup Language and not a Programming Language, but a) recruitment agencies don't know and don't care, and b) employers don't know and don't care. Really.
And pointing out their ignorance will only serve you ill. And the techies who eventually see your CV will be grateful for a candidate who has heard of HTML, and won't worry about the taxonomy.
Honestly, it isn't an issue.
List it under technologies or something. I'd just leave it off if I were you as it's pretty much expected that you know HTML and XML at this point.