Remove first line from HTML Markup Field using RegEx - html

I have a single text field that contains HTML markup. The system that generates this field content always seems to generate a first line with a non-visible carriage return value in it and I can't seem to prevent it from doing so.
Does anyone know of a way (perhaps using a Regular Expression), to remove that first line from this text field?
I'd prefer to leave all other instances of the carriage return values in the field as is, so if it's a RegEx statement that will just remove the first line of a text field, that would work for me.
Any suggestions most welcomed.
Cheers,
Wayne

Many programming languages provide a trim method for this; it usually removes leading and trailing whitespace, including CR characters. You did not state which language you will be doing this in...
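If the goal is to drop only that leading blank line and leave every other line break alone, a minimal sketch could look like the following. Python is an assumption here, since the question does not name a language, and drop_leading_blank_line is a hypothetical helper name.

```python
import re

def drop_leading_blank_line(markup: str) -> str:
    """Remove one blank first line (optional spaces/tabs plus a line break)."""
    # \A anchors at the very start of the string, so later line breaks
    # are never touched. The alternation covers CRLF, bare CR and bare LF.
    return re.sub(r"\A[ \t]*(?:\r\n|\r|\n)", "", markup, count=1)

sample = "\r\n<p>first</p>\r\n<p>second</p>"
cleaned = drop_leading_blank_line(sample)
print(repr(cleaned))
```

Unlike a plain trim, this leaves trailing whitespace and any further leading content as it was.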

Related

How to match text and skip HTML tags using a regular expression?

I have a bunch of records in a QuickBase table that contain a rich text field. In other words, they each contain some paragraphs of text intermingled with HTML tags like <p>, <strong>, etc.
I need to migrate the records to a new table where the corresponding field is a plain text field. For this, I would like to strip out all HTML tags and leave only the text in the field values.
For example, from the below input, I would expect to extract just a small example link to a webpage:
<p>just a small <a href="#">
example</a> link</p><p>to a webpage</p>
As I am trying to get this done quickly and without coding or using an external tool, I am constrained to using Quickbase Pipelines' Text channel tool. The way it works is that I define a regex pattern and it outputs only the bits that match the pattern.
So far I've been able to come up with this regular expression (Python-flavored as QB's backend is written in Python) that correctly does the exact opposite of what I need. I.e. it matches only the HTML tags:
/(<[^>]*>)/
In a sense, I need the negative image of this expression but have not been able to build it myself.
Your help in "negating" the above expression is most appreciated.
Assuming any < or > outside of the tags is entity-encoded (or does not occur at all), here is an idea using a lookbehind:
(?:(?<=>)|^)[^<]+
See this demo at regex101
(?:(?<=>)|^) is an alternation between either ^ start of the string or looking behind for any >. From there [^<]+ matches one or more characters that are not < (negated character class).
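To see what the pattern returns on the sample input from the question, here is a small sketch using Python's re module (chosen because the question mentions a Python-flavored engine). The whitespace clean-up at the end is an extra step of my own, not part of the answer, and may not be available inside Pipelines itself.

```python
import re

TEXT_ONLY = re.compile(r"(?:(?<=>)|^)[^<]+")

html = '<p>just a small <a href="#">\nexample</a> link</p><p>to a webpage</p>'

# findall() returns every run of text that starts right after a ">"
# (or at the start of the string) and stops before the next "<".
parts = TEXT_ONLY.findall(html)

# Assumption: the pieces are joined with a space and runs of whitespace
# (including the line break inside the sample) are collapsed.
text = " ".join(" ".join(parts).split())
print(parts)
print(text)
```

The pattern yields one match per text run, so the pieces still have to be joined by whatever consumes the matches.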

How to limit simple form input to 50 characters

Is it possible to limit a simple form input to only 50 characters without javascript?
I have used the max_length attribute, however this includes blank spaces, which is not what I want.
I've attempted to use pattern (as suggested in another post), but I can't seem to get that to work either.
Thanks
I don't know why you don't want it to include blanks.
Usually I use max_length including blanks and leave it to the user to trim their excess whitespace. I'm not disagreeing, I honestly don't know what your requirement is.
If you want to allow leading and trailing whitespace, but are willing to leave it to the user to replace excess whitespace within the text with a single whitespace character, then this is the pattern you want:
<input pattern="^\s*.{0,50}\s*$">
Sometimes for multiline regular expressions, \A is used instead of ^ and \z is used instead of $, but the pattern attribute is compiled as a JavaScript regular expression, which does not support \A or \z.
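The behaviour of that pattern can be checked outside the browser with a rough Python sketch. Python's re is not identical to the JavaScript engine behind the pattern attribute, so treat this as an approximation. NON_BLANK_PATTERN is a hypothetical alternative of my own, based on the assumption that the asker wants to count only non-whitespace characters.

```python
import re

# The pattern from the answer above. fullmatch() mimics the implicit
# anchoring that browsers apply to a pattern attribute.
ANSWER_PATTERN = re.compile(r"^\s*.{0,50}\s*$")

# Hypothetical alternative: at most 50 non-whitespace characters, with
# any amount of whitespace between or around them.
NON_BLANK_PATTERN = re.compile(r"(?:\s*\S){0,50}\s*")

def accepts(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None

fifty_spaced = " ".join("x" * 50)      # 50 letters separated by spaces
fifty_one_spaced = " ".join("x" * 51)  # 51 letters separated by spaces

print(accepts(ANSWER_PATTERN, "x" * 50), accepts(ANSWER_PATTERN, fifty_spaced))
print(accepts(NON_BLANK_PATTERN, fifty_spaced), accepts(NON_BLANK_PATTERN, fifty_one_spaced))
```

The answer's pattern still counts blanks inside the text, while the alternative ignores them entirely.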

Regular Expression for HTML attributes

I need to write a regular expression to catch the following things in bold
class="something_A211"
style="width:380px;margin-top: 20px;"
I have no idea how to write it, can someone help me?
I need this because I have to replace these attributes with nothing in an HTML file (with Notepad++), so that I am left with a clean <tr> or <td> or any other tag.
Thank you
You can use a regex like this to capture the content:
((?:class|style)=".*?")
Working demo
However, if you just want to match and delete that you can get rid of capturing groups:
(?:class|style)=".*?"
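As an illustration of what the replacement does, here is a short Python sketch (the question itself uses Notepad++, where the same pattern would go in the Find field with an empty Replace field). WITH_LEADING_SPACE is my own variation, not part of the answer: it also consumes the whitespace in front of each attribute so that no stray spaces stay behind in the tag.

```python
import re

# Pattern from the answer above.
ANSWER_PATTERN = re.compile(r'(?:class|style)=".*?"')

# Assumed variation: also remove the whitespace before the attribute.
WITH_LEADING_SPACE = re.compile(r'\s+(?:class|style)=".*?"')

row = '<td class="something_A211" style="width:380px;margin-top: 20px;">cell</td>'

plain = ANSWER_PATTERN.sub("", row)      # leaves the two separating spaces
tidy = WITH_LEADING_SPACE.sub("", row)   # leaves a clean <td>
print(plain)
print(tidy)
```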
For all constructions like something="data", you can use this.
[^\s]*?\=\".*?\"
https://regex101.com/r/oQ5dR0/1
The link shows you what everything does.
To explain it briefly, a non-space character can come before the "=" any number of times, then come the quotes and the info inside of them.
The question mark in .*? (any character, any number of times) is needed so that only the minimum number of characters is matched (instead of running on to the last possible quote somewhere further along).
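The difference the question mark makes can be shown with a small Python sketch. The sample tag is made up for illustration.

```python
import re

tag = '<td class="a" style="b">'

# Lazy .*? stops at the first closing quote, so each attribute is its own match.
lazy = re.findall(r'[^\s]*?\=\".*?\"', tag)

# Greedy .* runs to the last quote on the line and swallows both attributes.
greedy = re.findall(r'[^\s]*?\=\".*\"', tag)

print(lazy)
print(greedy)
```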

HTML attribute value question

What does "a value that contains no U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters" mean? Can someone explain in layman's terms and give an example?
I guess it means a string that doesn't contain a line feed or carriage return character, like this_one.
here_is
one_that_
does
Update
I got this info from w3.org
Please link to this. I thought it may have been don't use them in your HTML attributes, but I just validated a page with a multiline title attribute with the W3C validator.
When you press Enter in a text editor to go to the next line, an invisible LINE FEED and/or CARRIAGE RETURN character is inserted.
Some HTML attributes cannot have any line breaks in their values, according to the specification.
That has nothing to do with HTML attributes or values. LF and CR are end of line characters. Wikipedia has an excellent article about them. What are you trying to accomplish and where are you getting this error?
In HTML, common commands will include an element, an attribute and a value. For example, in <A HREF ="somevalue"> A is the element, HREF is the attribute and somevalue is the value.
When you say values cannot have a carriage return or a line feed, then the value statement should not look like this:
<A HREF ="somevalue ENTER
somevalue continuing after a carriage return and line feed"></A>
Avoid that. Instead, that same information should be typed, letting the code wrap around on its own.
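In code terms, the requirement comes down to a check like the one below. This is only a sketch in Python, and has_line_break is a hypothetical helper name.

```python
def has_line_break(value: str) -> bool:
    """Return True if the value contains U+000A (LF) or U+000D (CR)."""
    return "\n" in value or "\r" in value

single_line = "this_one"
multi_line = "here_is\none_that_\ndoes"

print(has_line_break(single_line))
print(has_line_break(multi_line))
```

A value for which this returns True would not satisfy a "contains no LF or CR characters" rule.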

Actionscript3 E4X XML and CSS: Do I really have to use CDATA?

When working with CSS inside of XML such as
<span class="IwuvAS3"></span>
when parsed in flash, if I don't use CDATA like the following:
<![CDATA[<span class="IwuvAS3"></span>]]>
then the parsed data drops down a line for every "<" character it sees.
When parsing the data into a single-line text field, nothing was shown because it was actually down a line. As soon as I wrap it inside of CDATA it works great. I have played with prettyIndent, and as I understand it, ignoreWhite is true by default.
Is there a way to parse the data without the use of CDATA and keep the implied line breaks out?
EDIT 1 (10/10/08): Thank you, but I am actually looking for a Function or Method. Escaping each is much more cumbersome than using CDATA. The only reason I don't want to use CDATA is that I was taught to stay clear of it. If ActionScript has a method associated to E4X XML handling that will remove the requirement to wrap my XML in CDATA, I would love to know about it.
EDIT 2 (10/15/08): Thanks Philippe! I never would have thought that HTML formatting in Flash is treated as whitespace. The answer was
textField.condenseWhite = true;
<3AS3
Set the TextField's condenseWhite property to true, so that only <br/> tags will generate line breaks.
You could escape the "<" characters (and &, ", >, ', among others) as entities instead.
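To show what the escaped form of the snippet from the question looks like, here is a sketch using Python's standard html.escape. This only illustrates the transformation; it is not an ActionScript API, and the asker would need an equivalent on the Flash side.

```python
import html

snippet = '<span class="IwuvAS3"></span>'

# quote=True (the default) also escapes double and single quotes.
escaped = html.escape(snippet, quote=True)
print(escaped)
```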