What type of collation to use in a table - mysql

I have what is a simple problem that hopefully has a simple solution:
I have a site written in PHP and HTML, using a Linux server with MySQL.
It has a form where users fill in some personal info, including a textarea in which
they are meant to copy and paste a test CV.
I have also set up a back end for my client where she can query the database to see who
registered and retrieve their info.
My problem is that when I query and echo the content of the table row that contains the
CV (a lot of text), the line breaks are all gone: everything is printed on one line.
Does anyone know whether I can solve this by using the right kind of collation/character encoding
for the specific row that contains the users' CVs? I am hoping there is a collation that saves and preserves line breaks.

Collation has nothing to do with it: collations and charsets won't touch your newlines at all. To confirm this, look at the page source of the echoed text; the line breaks are still there.
HTML, however, treats line breaks like all other whitespace under normal circumstances, so they won't be visible when you echo them to a browser. You shouldn't be outputting plain text as HTML anyway, because they're not the same. You must convert the plain text to HTML first; a simple method is to call htmlspecialchars() and nl2br() on the text (in that order; otherwise htmlspecialchars() will escape your newly created br tags and turn them into &lt;br/&gt;). Failing to do so will not only create undesired output, it can also be a major security risk (XSS).
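The escape-first, then-add-breaks order can be sketched in JavaScript as follows. This is only an illustration of the idea, not the PHP functions themselves: escapeHtml, newlinesToBr, and plainTextToHtml are hypothetical helper names that roughly play the roles of htmlspecialchars() and nl2br().

```javascript
// Hypothetical helpers; the names are illustrative and not part of any library.

// Escape the characters that are significant in HTML
// (roughly the role htmlspecialchars() plays in PHP).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Insert a <br> before each line break and keep the original break
// (roughly the role nl2br() plays in PHP).
function newlinesToBr(text) {
  return text.replace(/\r\n|\n|\r/g, "<br>$&");
}

// Escape first, then add the <br> tags, so the tags are not escaped.
function plainTextToHtml(text) {
  return newlinesToBr(escapeHtml(text));
}

const cv = "Jane <Doe>\r\nPHP & MySQL";
console.log(plainTextToHtml(cv));
```

Reversing the two calls would escape the freshly inserted tags, so the visitor would see the literal text of the br tag instead of a line break.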

Use nl2br($text) to add HTML line breaks.

I don't think collation is related to this. Line breaks from the textarea arrive as \n or \r characters. Unless you are doing something "weird", those line breaks should be stored in the DB.
I think your problem arises when you echo the content of the table: the browser doesn't display \n and \r as new lines, so you have to either replace them with a <br/> element or wrap each paragraph in <p></p>.
You can use nl2br() for that.

Or how about wrapping the text in <pre> </pre>?
see: http://www.w3schools.com/tags/tag_pre.asp

Related

Line ending character LFs are automatically changed to CRLFs in HTML textarea

I noticed that all LFs are automatically changed to CRLFs if I put them into an HTML textarea.
■ Questions:
1. Where does this behavior come from, and what causes it?
   Is it because of the Windows operating system, i.e. would it not happen on a different operating system such as macOS? (I have only seen this on a Windows machine and have not yet tested on a Mac.)
   Or does it depend on the browser? (I have seen this behavior in Chrome, IE, and Firefox. Not yet tested in Safari.)
   Or does it only happen with my editor? (I am using Sakura Editor.)
2. If possible, how can I preserve the LFs so that they are not changed into CRLFs?
■ Steps to reproduce this:
1. Find a textarea you can type into, for example on the following w3schools page.
   https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_textarea
2. Prepare a text of at least 2 lines with some LFs, using an editor that can show the line-ending characters (so that you can make sure you have some LFs).
   ※ I am using Sakura Editor as an example.
3. Copy and paste the text prepared in step 2 into the textarea.
4. Once the text is in the textarea, copy the entire content of the textarea.
5. Paste the content of the textarea back into your editor.
6. The line-ending characters have all become CRLFs.
■ P.S.
Please see the screenshots for details
(The left side is the original text with 3 LFs;
the right side is the content copied back from the textarea, where all LFs have become CRLFs.)
「↓」indicates LF
「⏎」indicates CRLF
Thanks
I think I have found the answer myself, or at least some helpful information, so I will leave a record here in case others are looking for an answer to a similar question.
where and what causes this behavior?
For historical reasons, the element’s value is normalized in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized. The API value is the value used in the value IDL attribute. It is normalized so that line breaks use U+000A LINE FEED (LF) characters. Finally, there is the value, as used in form submission and other processing models in this specification. It is normalized so that line breaks use U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pairs, and in addition, if necessary given the element’s wrap attribute, additional line breaks are inserted to wrap the text at the given width.
for more information please read:
https://www.w3.org/TR/html5/forms.html#the-textarea-element
If possible, how to preserve the LF so that it does not get changed into CRLF?
I guess there are a lot of ways. Using JavaScript to replace every \r\n with \n before submitting the form would likely be a client-side solution. If it does not need to be handled on the client side, which is exactly my case, the replacement can be done on the server side to force all line-ending characters to LF.
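A minimal sketch of the replacement itself, written in JavaScript. The helper name normalizeToLf is hypothetical, and where it runs is an assumption: since the passage quoted above says the value used in form submission is normalized to CRLF, the replacement probably has to be applied to the text after submission (for example on the server) or to a value sent by script rather than by the form itself.

```javascript
// Hypothetical helper: convert CRLF pairs and lone CRs to LF.
function normalizeToLf(text) {
  // "\r\n?" matches a CR optionally followed by an LF,
  // so existing lone LFs are left untouched.
  return text.replace(/\r\n?/g, "\n");
}

// Example: text as it might arrive from a submitted textarea.
const submitted = "line 1\r\nline 2\r\nline 3";
const normalized = normalizeToLf(submitted);

// JSON.stringify makes the line-ending characters visible.
console.log(JSON.stringify(normalized));
```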

Linebreaks in middle of URL in HTML

I have a strange issue: I get random line breaks in my HTML when I copy and paste links from emails I receive.
The problem is that line breaks look exactly like any other whitespace, and on long lines I have trouble seeing whether there are any line breaks.
Normally this wouldn't be a problem, but we are also using an emailing system that doesn't like line breaks in the middle of an element.
Is there a way to find these without manually scanning all the lines, which is impossible given the number of emails we are sending?
Regex maybe?
I'm using Notepad++ as an editor.
In Notepad++, you can use the "Extended" search mode in the Find dialog. Search for "\r\n" to find all Windows-style line breaks in the file, "\r" to find all carriage returns, or "\n" to find all line feeds.
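If the check has to run over many files, a script may be easier than searching by hand. The following is only a sketch, assuming the goal is to list tags whose markup spans more than one line; findBrokenTags is a hypothetical helper name, and the pattern is a simple heuristic rather than a full HTML parser.

```javascript
// Hypothetical helper: list tags whose markup contains a line break.
function findBrokenTags(html) {
  const results = [];
  // A "<", then any characters except "<" and ">", with at least one
  // CR or LF somewhere before the closing ">".
  const tagPattern = /<[^<>]*[\r\n][^<>]*>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    // 1-based line number on which the tag starts.
    const line = html.slice(0, match.index).split(/\r\n|\r|\n/).length;
    results.push({ line: line, tag: match[0] });
  }
  return results;
}

const sample =
  '<p>ok</p>\n<a href="http://example.com/very/\nlong/path">link</a>\n';
console.log(findBrokenTags(sample));
```

Each result reports the line where the offending tag begins, so the break can then be removed in the editor.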

Strange symbol shows up on website (L SEP)?

I noticed on my website, http://www.cscc.org.sg/, there's this odd symbol that shows up.
It says L SEP. In the HTML code, it displays the same thing.
Can someone show me how to remove it?
That character is U+2028 (HTML entity code &#8232;), which is a kind of newline character. It's not actually supposed to be displayed. I'm guessing that either your server-side scripts failed to translate it into a new line or you are using a font that displays it.
But since we know the HTML and Unicode values for the character, we can add a few lines of jQuery that should get rid of it. In the code below, I'm just replacing it with a space. Just add this:
$(document).ready(function() {
    $("body").children().each(function() {
        // Replace every U+2028 (LINE SEPARATOR) in this element's markup with a space.
        $(this).html($(this).html().replace(/\u2028/g, " "));
    });
});
This should work, though please note that I have not tested it, and it may not work, as none of my browsers display the character.
But if it doesn't, you can always try pasting your text block into http://www.nousphere.net/cleanspecial.php, which will remove any special characters.
Some fonts render LS as L SEP. Such a glyph is designed for unformatted presentations of the character, such as when viewing the raw characters of a file in a binary editor. In a formatted presentation, actual line spacing should be displayed instead of the glyph.
The problem is that neither the web server nor the web browser is interpreting the LS as a newline. The web server could detect the LS and replace it with <br>. Such a feature would fit well with a web server that dynamically generates HTML anyway, but would add overhead and complexity to a web server that serves file contents without modification.
If an LS makes its way to the web browser, the web browser doesn't interpret it as formatting. Page formatting is based only on HTML tags. For example, LF and CR just affect formatting of the HTML source code, not the web page's formatting (except in <pre> sections). The browser could in principle interpret LS and PS (paragraph separator) as <br> and <p>, but the HTML standard doesn't tell browsers to do that. (It seems to me like it would be a good addition.)
To replace the raw LS character with the line separation that the content creator likely intended, you'll need to replace the LS characters with HTML markup such as <br>.
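A minimal sketch of that replacement at the string level, shown here in JavaScript. The helper name lineSeparatorsToBr is hypothetical, and the sketch assumes the text has already been HTML-escaped and is processed before it is written into the page.

```javascript
// Hypothetical helper: turn each U+2028 (LINE SEPARATOR) into a <br> tag.
// Assumes the input text has already been HTML-escaped.
function lineSeparatorsToBr(escapedText) {
  return escapedText.replace(/\u2028/g, "<br>");
}

const text = "First line\u2028Second line";
console.log(lineSeparatorsToBr(text));
```

Doing the replacement on the stored or generated text, rather than on the live page, avoids rewriting markup that scripts may already be attached to.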
This is the solution for the 'strange symbol' issue.
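$(document).ready(function () {
    // Replace every U+2028 (LINE SEPARATOR) in the page with a space.
    // A single pass over the body is enough; looping over its children is not needed.
    // Note: reassigning innerHTML rebuilds the DOM, so event handlers attached to the old nodes are lost.
    document.body.innerHTML = document.body.innerHTML.replace(/\u2028/g, ' ');
});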
The jQuery/JS solutions here do remove the character, but they broke my Revolution Slider. I ended up doing a search and replace for the character on the wp_posts table with the Better Search Replace plugin: https://wordpress.org/plugins/better-search-replace/
When you copy paste the character from a page to the plugin box, it is invisible, but it does work. Before doing DB replaces, always have a database (or full) backup ready! And be sure to uncheck the bottom checkbox to not do a dry run with the plugin.

Display the string with new line without using \n in jsp

I have a form with a textarea as one of the input fields. I want to store the data the user enters, including the \n (Enter key). The problem is that the data is stored in the database (MySQL), but the \n is not; i.e. the data is stored as:
.
When I display the same data in the browser it shows in single line.
I need to split the data with \n and it should look like:
When I display the same data in the browser it shows in single line.
Yes, it would - because a newline in HTML doesn't get rendered as a newline. If you look at the raw HTML, I suspect you'll still see the line breaks... it's just that's not how it's rendered.
In order to represent line breaks in HTML, you need to use <br /> between lines, or something similar - or display it with a <pre> tag.
Basically, you need to format your raw text as HTML, however you decide to do that. If you're just dumping the HTML straight into your page, you may well find you already have issues if the text contains HTML tags - they should be escaped.
If you can see that the \n is stored in the database, then you can write logic to emit a <br/> tag in its place when printing database values in your JSP.

How to display plain text in webpage?

I have inserted my code into a MySQL database using a textarea.
What I saved appears like this in MySQL:
This is Line 1
test
This lis Line 3
Now, my problem is displaying the saved "file" in my browser, where I expect it to appear like this:
This is Line 1
test
This lis Line 3
Has anyone run into a situation like this?
Use htmlentities on your output. You can save HTML or any other code as-is in MySQL with no special attention. You will need to escape it on output, though, so that user-supplied input cannot inject malicious markup.
http://www.php.net/manual/en/function.htmlentities.php
htmlentities("URL", ENT_QUOTES, 'UTF-8');
If you run this in PHP, the whole HTML tag is displayed as text. Likewise, you can output results from a MySQL query, wrapping the relevant content in htmlentities, to achieve what you're looking for.