Is HTML considered a programming language? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I guess the question is self-explanatory, but I'm wondering whether HTML qualifies as a programming language (obviously the "L" stands for language).
The reason for asking is more pragmatic—I'm putting together a resume and don't want to look like a fool for listing things like HTML and XML under languages, but can't figure out how to classify them.

No, HTML is not a programming language. The "M" stands for "Markup". Generally, a programming language allows you to describe some sort of process of doing something, whereas HTML is a way of adding context and structure to text.
If you're looking to add more alphabet soup to your CV, don't classify them at all. Just put them in a big pile called "Technologies" or whatever you like. Remember, however, that anything you list is fair game for a question.
HTML is so common that I'd expect almost any technology person to already know it (although not stuff like CSS and so on), so you might consider not listing every initialism you've ever come across. I tend to regard CVs listing too many things as suspicious, so I ask more questions to weed out the stuff that shouldn't be listed. :)
However, if your HTML experience includes serious web design stuff including Ajax, JavaScript, and so on, you might talk about those in your "Experience" section.

YES, a declarative programming language.
You really want to list the most important things you know that are relevant to the job you're applying for on your resume. If you list ASP.NET but don't list HTML, even though it's somewhat obvious, there are a lot of managers and/or HR types who will assume you don't know HTML since it's not listed. I've had it happen to me before.
Update - Some say no, it isn't a programming language, and you may not agree with me on this, but regardless, on a resume it IS a programming language. You get HR types looking at your resume before the hiring manager even sees it. If the manager says you need to know HTML, and it's not listed in the 'programming languages' section, then the HR person may disregard your resume, thinking you don't know it because it's not listed.
Update 6-8-2012: Any instruction that tells the computer to do something is a programming language. So even after all these years, I still stand by my answer. HTML is a programming language. Something that isn't a programming language would be XML.

No, the clue is in the M - it's a Markup Language.

On some level Chris Pietschmann is correct. SQL isn't Turing complete (at least without stored procedures), yet people will list it as a language; TeX is Turing complete, but most people regard it as a markup language.
Having said that: if you are just applying for jobs, not arguing formal logic, I would just list them all as technologies. Things like .NET aren't languages but would probably be listed as well.

The 'M' stands for 'Markup'. It's a 'Markup Language', not a programming language. Some people will disagree with this, but my opinion is that if it lacks logical constructs (conditional branching, iteration, etc.), it's not really a programming language.
As for the resume, I would suggest putting HTML and XML under a section like 'Technologies'. I usually have a section like this where I list things like version control software, OS's I've developed for, build systems, etc.

No, HTML is not a programming language. It is called "markup" for that reason.
If you're going to say that HTML is a programming language, then you might as well include things such as word documents, as they too are based on ML, or 'Markup Language'.
Simply put--HTML defines content!

I think it's not exactly a programming language, but exactly what its name says: a markup language.
We cannot program using pure HTML alone; we can only annotate how to present content.
But if you consider programming to be the act of telling the computer how to present content, it is a programming language.

In the advanced programming languages class I took in college, we had what I think is a pretty good definition of "programming language": a programming language is any (formal) language capable of expressing all computable functions, which the Church-Turing thesis implies is the set of all Turing-computable functions.
By that definition, no, HTML is not a programming language, even a declarative one. It is, as others have explained, a markup language.
But the people reviewing your resume may very well not care about such a formal distinction. I'd follow the good advice given by others and list it under a "Technologies" type of section.

I think that it definitely has its place on a resume. Knowledge of HTML is valuable, and there really is a lot to know, what with cross-browser compatibility issues and standards which should be followed.
I wouldn't list HTML under "programming languages" alongside C# or something, but it's worth noting your experience.

No - there's a big prejudice in IT against web design; but in this case the "real" programmers are on pretty firm ground.
If you've done a lot of web design work you've probably done some JavaScript, so you can put that down under 'programming languages'; if you want to list HTML as well, then I agree with the answer that suggests "Technologies".
But unless you're targeting agents who're trying to tick boxes rather than find you a good job, a bare list of things you've used doesn't really look all that good. You're better off listing the projects you've worked on and detailing the technologies you used on each; that demonstrates that you've got real experience of using them rather than just that you know some buzzwords.

I get around this problem by not having a "programming languages" section on my resume. Instead I label it simply as "languages", and I stick HTML and CSS at the end. I'd rather make life easier for the reviewer so that they can see whether mine checks off all their requirements.
Only fools would disregard an applicant because he or she listed HTML under "languages" instead of some other label, especially since there is no industry standard. And who wants to work for fools?

Well, L is for language, but it doesn't imply programming language. After all, English or French are (natural) languages too! ;-)
As said above, put them under a subsidiary section; "Technologies" seems to be a good term.
(Looking at my own resume, not updated in a while) I have made a section just called "Languages", so I can't go wrong... :-D
I have put "(X)HTML and CSS, XML/DTD/Schema and SVG" at the end of the section, clearly separated.
In French, I have a section "Langages" (programming and markup) and another "Langues" (French/English). In the English version, I titled both "Languages", which is clumsy now that I think of it, although context clarifies this. I should find a better formulation.

HTML is in no way a programming language.
Programming languages deal with "processing functions", etc. HTML just deals with the visual interface of a web page, whereas the actual programming (PHP, for example) handles the processing.
If anyone really knows programming, I really can't see how people can mistake HTML for an actual programming language.

In recruitment terms, having been on both sides of the fence, definitely put HTML under 'programming languages', or perhaps more safely under 'technologies'.
Yes, we all know that it is a Markup Language and not a Programming Language, but (a) recruitment agencies don't know and don't care, and (b) employers don't know and don't care. Really.
And pointing out their ignorance will only serve you ill. And the techies who eventually see your CV will be grateful for a candidate who has heard of HTML, and won't worry about the taxonomy.
Honestly, it isn't an issue.

List it under technologies or something. I'd just leave it off if I were you as it's pretty much expected that you know HTML and XML at this point.

Related

Practically speaking, why semantic markup?

Does Google really care if I use an <h5> as a <b> tag?
What are some real-world, practical reasons I should care about semantic markup?
A few examples
Many visually impaired people rely on speech browsers to read pages back to them. These programs cannot interpret pages very well unless they are clearly explained. In other words, semantic code aids accessibility.
Search engines need to understand what your content is about in order to rank you properly.
Semantic code tends to improve your placement on search engines, as it is easier for the "search engine spiders" to understand.
However, semantic code has other benefits too:
As you can see from the example above, semantic code is shorter and so downloads faster.
Semantic code makes site updates easier because you can apply design style to headings across an entire site instead of on a per page basis.
Semantic code is easier for people to understand too so if a new web designer picks up the code they can learn it much faster.
Because semantic code does not contain design elements it is possible to change the look and feel of your site without recoding all of the HTML.
Once again, because design is held separately from your content, semantic code allows anybody to add or edit pages without having to have a good eye for design.
You simply describe the content and the cascading style sheet defines what that content looks like.
Source: boagworld
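To make the "machines need to understand your content" point concrete, here is a minimal sketch in Python (standard library only) of how a program might build a page outline from heading elements. The class name OutlineBuilder, the function outline, and the sample markup are all made up for illustration; real search engines and screen readers are far more sophisticated than this.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Collects (level, text) pairs for every heading element it sees."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only text inside a heading contributes to the outline.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(markup):
    """Return the heading outline of an HTML string (hypothetical helper)."""
    builder = OutlineBuilder()
    builder.feed(markup)
    builder.close()
    return builder.outline


# Hypothetical page: "Cats" is only bold, so it carries no structural meaning.
PAGE = "<h1>Pets</h1><p><b>Cats</b> purr.</p><h2>Dogs</h2><p>Bark.</p>"
print(outline(PAGE))
```

The bold "Cats" never shows up in the outline, while the real headings do, which is roughly why marking something up as a heading and merely making it look like one are not equivalent.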
Semantics and the Web
Semantics are the implied meaning of a subject, like a word or sentence. It aids how humans (and these days, machines) interpret subject matter. On the web, HTML serves both humans and machines, suggesting the purpose of the content enclosed within an HTML tag. Since the dawn of HTML, elements have been revised and adapted based on actual usage on the web, ideally so that authors can navigate markup with ease and create carefully structured documents, and so that machines can infer the context of the wonderful collection of data we humans can read.
Until — and perhaps even after — machines can understand language and all its nuances at the same level as a human, we need HTML to help machines understand what we mean. A computer doesn’t care if you had pizza for dinner. It likely just wants to know what on earth it should do with that information.
HTML semantics are a nuanced subject, widely debated and easily open to interpretation. Not everyone agrees on the same thing right away, and this is where problems arise.
Allow me to paint a picture:
You are busy creating a website.
You have a thought, “Oh, now I have to add an element.”
Then another thought, “I feel so guilty adding a div. Div-itis is terrible, I hear.”
Then, “I should use something else. The aside element might be appropriate.”
Three searches and five articles later, you’re fairly confident that aside is not semantically correct.
You decide on article, because at least it’s not a div.
You’ve wasted 40 minutes, with no tangible benefit to show for it.
— Divya Manian
This generated a storm of responses, both positive and negative. In "Pursuing Semantic Value", Jeremy Keith argued that being semantically correct is not fruitless, and he even gave an example of how <section> can be used to adjust a document’s outline. He concludes:
But if you can get past the blustery tone and get to the kernel of the article, it’s a fairly straightforward message: don’t get too hung up on semantics to the detriment of other important facets of web development.
— Jeremy Keith
Naming Things
Of all the possible new element names in HTML5, the spec is pretty set on things like <nav> and <footer>. If you’ve used either of those as a class or id in your own markup, it’s no coincidence. Studies of the web from the likes of Google and Opera (amongst others) looked at which names people were using to hint at the purpose of a part of their HTML documents. The authors of the HTML5 spec recognised that developers needed more semantic elements and looked at what classes and IDs were already being used to convey such meaning.
Of course, it isn’t possible to use all of the names researched, and of the millions of words in the English language that could have been used, it’s better to focus on a small subset that meets the demands of the web. Yet some people feel that the spec isn’t yet doing so.
Source: html5doctor (This goes on for quite a while so I've only put a few examples here.)
Hope this helps!

If HTML is not a programming language, what am I doing if I am doing HTML codes?

I am creating an article about programming. If I am using C#, for example, I am a C# programmer and I am programming using C#. How about HTML? If HTML is not a programming language, and it is a markup language, what is the correct verb applicable to a person coding in HTML? Is it just coding?
Edit 2:
Wow, apparently you can call HTML/CSS a programming language because HTML5/CSS3 is Turing-complete by accident (for the first link, check the comments).
Main Answer:
"How about HTML?" I take the stance that to be programming, the language has to be Turing Complete. So in my definition you can't be a Regex programmer. The more lean definition is that it needs variables & control statements, as simple as having an 'if' and a 'branch' instruction. So as you point out, pure HTML is not a programming language. But HTML in the real world isn't just html text files!
I would call an HTML user an HTML Technologist or HTML author, but if someone said they were an HTML coder or even a programmer, I wouldn't bat an eye or try to correct them. I don't think many people write plain HTML, and the moment one adds JavaScript or allows pages to be generated by PHP, Python, or anything else, it crosses the programming language definition. (edit 2: The moment you add CSS3 it becomes Turing Complete and thus a 'real' programming language)
Edit 1:
I like an answer I found about why 'real programmers' are so defensive over reminding people HTML/CSS is not 'real programming'. The OP's question dealt with what to call HTML authors but this question comes up because 'real programmers' are so firm in making a distinction between their work. I like this quote from Kramli (linked before)
There are times when the difference between programming languages and other languages really does matter. Quite often, however, we can all communicate perfectly effectively when we just lump them all in together.
You have three questions...
Q1: I am a C# programmer and I am programming using C#. How about HTML?
A1: I am coding in HTML
Q2: If HTML is not a programming language, and it is a markup language, what is the correct verb applicable to a person coding in HTML?
A2: Verb = Coding, But I think you are looking for the term Coder
Q3: Is it just coding?
A3: Yes
HTML is a markup language, hence the name HyperText Markup Language.
You are effectively the modern day equivalent of a typesetter in the print industry.
If you have minimal input in the page creation process then you're probably a Coder, however if you have significant input into page layout, then the job role is normally referred to as being a Web Designer. If you're writing lots of scripts (in say PHP, Python, Ruby, Perl or whatever your least worst option is) to produce the pages in a reasonably professional manner, then you can award yourself the wonderful title of Web Developer :-)
If you devote some thought as to how all these scripts are going to hang together, and how users are going to interact with your site, then you can claim to be an Analyst. :-)
On the Internet, job roles are quite fuzzy; personally I consider myself a mix of all of the above, concentrated more on the Developer/Analyst side, as whilst I understand the technical aspects of HTML and CSS, I don't have the appreciation of good design and presentation to fully claim being a Designer in a professional context.
I also suggest you read the answers to the related questions on the right of this page...
As with any language - be it musical, programmatic, mathematical, hypertext or anything in between - as a content creator you are a writer.
Specifically for a mark up language (such as HTML) you are annotating a document with tags that are separate entities from the text between them, and so could be considered an Editor, Author, or Designer because you are generally directing the content of a page.
Differences arise with HTML compared to writing technical documents using, for example, DITA. Whereas a DITA document has its architecture and tags, it does not necessarily require a style sheet to be displayed. HTML, on the other hand, is normally consumed through a web browser, so it requires CSS styling to be shown in a readable fashion. For this reason, formatting becomes as important as content, and people writing HTML and CSS as a combination are referred to as Web Designers.
If you begin throwing in programming languages such as PHP or JScript you will be referred to as a Web Developer, but developer and designer are often interchangeable between the two options.
what is the correct verb applicable to a person coding in HTML?
Coding is a process that involves using a programming language. Since HTML is not a programming language, you can use "writing" instead of "coding". As simple as that.

Regular Expressions vs XPath when parsing HTML text

I want to parse HTML text and find specific parts, for example the text in the 3rd div of the 1st row and 2nd column of a table. I have two options for parsing: regular expressions and XPath. What are the advantages and disadvantages of each?
thanks
It somewhat depends on whether you have a complete HTML file of unknown but well-formed content versus having merely a snippet or an expanse of HTML of completely known content which may or may not be well-formed.
There is a difference between editing and parsing, you see.
It is one thing to be editing your own HTML file that you wrote yourself or are otherwise staring right in the face, and you issue the editor command
:100,200s!<br */>!!g
To remove the breaks from lines 100–200.
It is quite another to suck down whatever HTML happens to be at the other end of a URL and then try to make some sense out of it, sight unseen.
The first calls for a regex solution: the very one shown above, in fact. To go off writing some massively overengineered behemoth to do a full parse to set up the entire parse tree just to do the simple edit shown above is quite simply wrong. It’s also its own punishment.
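For anyone who does not live in vi, the same kind of range-limited edit can be sketched in Python. This is only an illustration of the idea, not a replacement for the editor command; the function name strip_breaks and its arguments are made up here, and the pattern is the one from the substitute command above.

```python
import re

# Same pattern as the vi command: "<br", optional spaces, then "/>".
BR_TAG = re.compile(r"<br */>")


def strip_breaks(lines, first, last):
    """Remove <br/> tags from lines first..last (1-based, inclusive).

    Lines outside the range are returned untouched, mirroring how the
    range prefix (100,200 in the vi example) limits the substitution.
    """
    return [
        BR_TAG.sub("", line) if first <= number <= last else line
        for number, line in enumerate(lines, start=1)
    ]


# Hypothetical three-line "file"; only lines 2 and 3 are edited.
sample = ["a<br/>", "b<br />c", "d<br/>"]
print(strip_breaks(sample, 2, 3))
```

This works because you already know what the text looks like; nothing here attempts to understand HTML structure.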
On the other hand, using patterns to parse out (as opposed to lex out) an entire HTML document that can contain all kinds of wacky things you aren’t planning for just cries out for leveraging someone else’s hard work instead of recreating the wheel for yourself, and badly at that.
However, there’s something else nobody likes to mention, and that’s that most people just aren’t competent at regexes. They don’t really understand them. They don’t know how to test them or to craft them. They don’t know how to make them readable and maintainable.
The truth of the matter is that the overwhelming majority of regex users cannot even manage as simple and basic a thing as matching an arbitrary HTML tag using a regex, even when gotchas like alternate encodings and CDATA sections and redefined entities and <script> contents and archaic never-seen forms are all safely dispensed with.
It’s not because it’s hard to do; it isn’t, actually. It’s just that the people trying to do it understand neither regexes nor HTML particularly well, and they don’t know they don’t know, and so they get themselves in way over their heads more quickly than they realize. And then they have a complete disaster on their hands.
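A small sketch of the kind of trap being described, using Python's standard library. The pattern below is the one many people reach for first, not a recommended one, and the sample string, class name, and helper functions are invented for the illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: the title attribute legitimately contains '>'.
SAMPLE = '<a href="x.html" title="a > b">link</a>'

# Naive pattern: a '<', anything that is not '>', then '>'.
NAIVE_TAG = re.compile(r"<[^>]*>")


class TagCollector(HTMLParser):
    """Records (tag, attributes) for every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


def naive_first_tag(html):
    """Return whatever the naive regex thinks the first tag is."""
    match = NAIVE_TAG.search(html)
    return match.group(0) if match else None


def parsed_start_tags(html):
    """Return the start tags as seen by a real (tolerant) HTML parser."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.start_tags


print(naive_first_tag(SAMPLE))
print(parsed_start_tags(SAMPLE))
```

The naive pattern stops at the '>' inside the quoted attribute and returns a truncated tag, while the parser reports one a element with both attributes intact. A correct regex for this exists, but it has to know about quoting, which is exactly the part people tend to skip.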
Plus it’s been done before, and correctly. Might as well learn from someone else’s mistakes for a change, eh? It would probably help to have a few canned regexes at your disposal to go at frequently manipulated things. This is especially useful for editing.
But for a full parse, you really shouldn’t try to embed a full HTML grammar inside your pattern. Honest, you really shouldn’t. Speaking as someone who actually can do this and has done it, I, unlike 99.9999% of the responders here, have the credibility of actual experience in this area when I advise against it. Sure, I can do it, but I almost never want to, and I certainly don’t want you to try it at home unsupervised. I can’t be held responsible for any damage that might ensue. :)
Sure, this may sound like “Do as I say, not as I do,” but if your level of regex mastery were at a level that allowed you to contemplate such a thing, you would not be asking this question. As I mentioned, almost no one who uses regexes can actually match an arbitrary HTML tag, simple as that is. Given that you need that sort of building block before writing your recursive descent grammar, and given that next to nobody can even manage that simple building block, well...
Given that sad state of affairs, it’s probably best to use regexes for simple edit jobs only, and leave their use for more complete solutions to real regex wizards, for they are subtle and quick to anger. Meaning of course the regexes, not (just) the wizards.
But sure, keep some canned regexes handy for doing simple editing rather than full parsing. That way you won’t be forced to redevise them each time from first principles. I do keep a few of these around, but then I also keep simple frameworks that allow me to edit a particular structural element of the HTML, like the plain text or the tag contents or the link references, etc, and those all use a full parser, letting me then surgically target just the parts I want in complete confidence I haven’t forgotten something.
More as a testament to what is possible than what is advisable, you can see some answers with more, um, “heroic” pattern matching, including recursion, here, here, here, here, here, and here.
Understand that some of those were actually written for the express purpose of showing people why they should not use regexes, because some of them are really quite sophisticated, much more so than you can expect in nonwizards. That difficulty may chase you away, which is ok, because it was sort of meant to.
But don’t let that stop you from using vi on your HTML files, nor should it scare you away from using its search or substitute commands. Don’t let the perfect be the enemy of the good. Sometimes good enough is exactly what you need, because the perfect would take more investment than it could ever be worth.
Understanding which out of several possible approaches will give you the most bang for your buck is something that takes time to learn, and no one can tell you the answer that works for you. They don’t know your dataset, your requirements, your skillset, your priorities. Therefore any categorical answer is automatically wrong. You have to evaluate these things for yourself.
I think XPath is the primary option for traversing XML-like documents. With RegExp, it will be up to you to handle the different forms of writing a tag (with multiple spaces, double quotes, single quotes, no quotes, on one line, across multiple lines, with inner data, without inner data, etc.). With XPath, this is all transparent to you, and it has many features (like accessing a node by index, selecting by attribute values, selecting siblings, and MANY others).
See how powerful it can be at http://www.w3schools.com/xpath/.
EDIT: See also How do HTML parses work if they're not using regexp?
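As a rough illustration of the "3rd div of 1st row and 2nd column" case from the question, here is a sketch using Python's xml.etree.ElementTree. Two assumptions to keep in mind: ElementTree supports only a limited subset of XPath, and it only accepts well-formed markup, so real-world HTML would normally go through a tolerant HTML parser first. The sample document and the function name third_div_text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for the page being parsed.
DOC = """
<html><body>
  <table>
    <tr>
      <td><div>r1c1</div></td>
      <td><div>first</div><div>second</div><div>target text</div></td>
    </tr>
    <tr>
      <td><div>r2c1</div></td>
      <td><div>r2c2</div></td>
    </tr>
  </table>
</body></html>
""".strip()


def third_div_text(markup):
    """Text of the 3rd div in the 1st row, 2nd column of the first table."""
    root = ET.fromstring(markup)
    # Positional predicates are 1-based, as in XPath proper.
    node = root.find(".//table/tr[1]/td[2]/div[3]")
    return None if node is None else node.text


print(third_div_text(DOC))
```

The path expression reads almost exactly like the English description of the target, and it keeps working if attributes, quoting, or whitespace inside the tags change, which is the main argument for XPath over a hand-written pattern here.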
XPath is less likely to break if the web developer does any minor changes. That would be my choice.
Here is the canonical Stack Overflow explanation for why you should not parse HTML with regex:
RegEx match open tags except XHTML self-contained tags
In general, you cannot parse HTML with regex because regex is not made to parse HTML. Just use XPath.

Why are comments in HTML/CSS so infrequently used? [closed]

I apologise in advance if this question is asked too often, but I've been doing light web development lately and noticed this across many different (and somewhat prominent) webpages.
I see comments (of varying quality) being used in "more traditional" programming languages, but it's very, very uncommon for me to see them utilised in HTML or CSS. (I've seen it more in JavaScript, though.) I can usually figure out what's going on since HTML isn't very complicated, but why is this so?
Thanks!
Perhaps the commented version is kept locally, and minified/gzipped versions are the ones shown to the public. This makes loading times faster than those with the additional commentary.
I think because HTML (and CSS for the most part) is simple markup and rarely contains any complex logic in it (JavaScript maybe). So the markup itself is self-explanatory and requires no additional comments to explain/clarify what it is or what it does.
The pages you look at on the internet are just the end products. Bandwidth costs money, and no one wants to pay for sending notices about the inner workings of their site to visitors, most of whom never look at the page source.
The backend which generates these pages (as most web pages are generated) can have comments, of course. There are security concerns too: you don't want to give out unnecessary information about the inner workings of your site.
There are specialized tools too, just to remove unnecessary content from pages (and CSS) to create smaller files.
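To give a feel for what such tools do to comments, here is a deliberately naive sketch in Python. It is not how any particular minifier works; the function name minify_css is made up, and the approach assumes the stylesheet contains no strings or url() values with "/*" inside them, which a real tool would have to handle.

```python
import re

# Non-greedy match of a CSS comment, allowing it to span several lines.
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def minify_css(css):
    """Strip comments and collapse runs of whitespace (naive sketch)."""
    without_comments = CSS_COMMENT.sub("", css)
    return re.sub(r"\s+", " ", without_comments).strip()


# Hypothetical source file as the developer wrote it, comment included.
SOURCE = "a {\n  overflow: hidden; /* Contains descendant floats */\n}\n"
print(minify_css(SOURCE))
```

The commented version stays in the developer's repository, and only the stripped output is served, so viewing the page source tells you little about how well the original was documented.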
I would speculate because HTML is a markup language, and all the content is quite transparent - so not necessary to comment compared with a procedural language where the logic can be complex, and a hint helps you to understand it.
I would also speculate it is because a large portion of the HTML is repetitive between pages and sites, so needs no explanation as it has been seen many times before.
You would not want to end up with this would you :)
<h1>My great site</h1>
<!-- heading level one - 'My great site' -->
Comments are best used to explain tricky bits of a file, and there's not a lot that's tricky about HTML or CSS.
With that said, if I'm doing something that works in some specific browser, I'll add a comment about why it's done the way it is. I'll often also use a comment to add notes about bug fixes (especially if there's a ticket for the bug and double-especially if there's a hundred other people working on the CSS as well). But often for HTML, it's almost no effort to turn the HTML comment into a comment in the server-side language, hiding it from the browser completely. So the code could be commented, without you ever knowing it.
I personally don't use them because during web development, comments tend to be personalized, especially ones you would put into HTML. Instead, I put them in PHP in HTML to make them invisible to the source reader.
As for CSS comments, the only time I have those is when I have over 5000 lines of CSS code and need to separate it into sections. Even then, I prefer to make several smaller files which are self-explanatory through file location and file name, rather than use extensive commenting.
I believe HTML and CSS to be simple and descriptive enough for anyone to derive their meaning very fast without help from comments.
CSS, especially, is incredibly self-describing. E.g.
margin: 0;
There’s not a lot to add to that. In CSS, I always try to add comments when the purpose of some code isn’t immediately obvious, e.g.
overflow: hidden; /* Contains descendant floats */
And in HTML, I try to make class names and id values self-describing, like good class, method and variable names in programming languages.
The most important reason for that is as you stated, there is no such a huge need for comments in HTML and CSS.
Secondly, commenting in HTML is cumbersome compared to programming languages: typing a <!-- --> pair takes effort, whereas // comes naturally.
Quite simply, HTML and CSS don't need as many comments. For the most part, the CSS and HTML you write does exactly what it says. While it is certainly possible to obfuscate HTML and CSS (especially CSS), it does not happen nearly as often as it does with programming languages. In programming, you need comments because the code is not always self-explanatory and you often do strange things for non-obvious reasons.
Commenting in HTML is only needed when you are doing strange things you need to explain. Otherwise, it is just as ridiculous as making comments for code that has an obvious behavior, like this:
var i = 3; //Declares a variable called i and assigns its value to 3.
Example of why you might need a comment in HTML:
<td> </td> <!--IE does not display borders on cells without content-->
I would assume that it is because HTML is not a programming language, so you usually do not have to explain why or how something works, and also because people try to minimize the size of their pages.

Did HTML's loose standards hurt or help the internet

I was reading O'Reilly's Learning XML Book and read the following
HTML was in some ways a step backward. To achieve the simplicity necessary to be truly useful, some principles of generic coding had to be sacrificed. ... To return to the ideals of generic coding, some people tried to adapt SGML for the web ... This proved too difficult.
This reminded me of a StackOverflow Podcast where they discussed the poorly formed HTML that works on browsers.
My question is, would the Internet still be as successful if the standards were as strict as developers would want them to be now?
Lack of standard enforcement didn't hurt the adoption of the web in the slightest. If anything, it helped it. The web was originally designed for scientists (who generally have little patience for programming) to post research results. So liberal parsers allowed them to not care about the markup - good enough was good enough.
If it hadn't been successful with scientists, it never would have migrated to the rest of academia, nor from there to the wider world, and it would still today be an academic exercise.
But now that it's out in the wider world, should we clamp down? I see no incentive for anyone to do so. Browser makers want market share, and they don't get it by being pissy about which pages they display properly. Content sites want to reach people, and they don't do that by only appearing correctly in Opera. The developer lobby, such as it is, is not enough.
Besides, one of the reasons front-end developers can charge a lot of money (vs. visual designers) is because they know the ins and outs of the various browsers. If there's only one right way, then it can be done automatically, and there's no longer a need for those folks - well, not at programmer salaries, anyway.
Most of the ambiguity and inconsistency on the web today isn't from things like unclosed tags - it's from CSS semantics being inconsistent from one browser to the next. Even if all web pages were miraculously well-formed XML, it wouldn't help much.
The fact that HTML simply "marks up" text, and is not a language with operators, loops, functions and other common programming language elements, is what allows it to be loosely interpreted.
One could argue that this loose interpretation makes the markup language more accessible and easier to use, thus giving more "uneducated" people access to the language.
My personal opinion is that this has little to do with the success of the Internet. Instead, it's the ability to communicate and share information that makes the internet "successful."
It hurt the Internet big time.
I recall listening to a podcast interview with someone who worked on the HTML 2.0 spec and IIRC there was a big debate at the time surrounding the strictness of parsers adhering to the standard.
The winners of the argument used the "a well implemented system should be liberal in what it accepts and strict in what it outputs" approach which was popular at the time.
AFAICT many people now regard this approach as overly simplistic - it sounds good in principle, but actually rarely works in practice.
IMO, even if HTML was super strict from the outset, it would still have been simple enough for most people to grasp. Uptake might have been marginally slower at the outset, but a huge amount of time/money (billions of dollars) would have been saved in the medium-long term.
There is a principle that describes how HTML and web browsers are able to work and interoperate with any success at all:
Be liberal in what you accept, and conservative in what you output.
There needs to be some latitude between what is "correct" and "acceptable" HTML. Because HTML was designed to be "human +rw", we shouldn't be surprised that there are so many flavours of tag soup. Flexibility is HTML's strength wherever humans need to be involved.
However, that flexibility adds processing overhead which can be hard to justify when you need to create something for machine consumption. This is the reason for XHTML and XML: it takes away some of that flexibility in exchange for predictable input.
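The trade-off can be seen side by side with Python's standard library, taking html.parser as a stand-in for a liberal consumer and xml.etree.ElementTree as a stand-in for a strict one. This is only a sketch: browsers do far more error recovery than html.parser, and the sample tag soup, class name, and helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical tag soup: no closing </p> or </b> anywhere.
TAG_SOUP = "<p>First paragraph<p>Second <b>bold"


class TextCollector(HTMLParser):
    """Liberal consumer: gathers whatever text it can find."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def liberal_text(markup):
    """Extract text without caring whether the markup is well-formed."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flushes any text still buffered at end of input
    return "".join(collector.chunks)


def strict_parses(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(liberal_text(TAG_SOUP))
print(strict_parses(TAG_SOUP))
print(strict_parses("<p>First paragraph</p>"))
```

The liberal side always produces something a human can read, while the strict side rejects the input outright but gives a machine consumer a guarantee about what it will receive when parsing succeeds.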
If HTML had been more strict, something easier would have generated the needed network effect for the internet to become mainstream.