acceptable url character replacement for apostrophes and hyphens - html

I have a string such as: My First - Recipe's. I want to translate this into a readable URL.
I'd like to change the spaces to - however there already exists a hyphen in the string. Also, there is an apostrophe.
I read online that using underscores is bad in clean URLs, which is why I want to use hyphens, but I cannot figure out what to change the existing hyphen to, or what to do with the apostrophe.

While you can use Unicode in web-page URLs, in practice and for usability you're restricted in the characters you can use - for example, if you want to allow people to manually type in the full address then it's best to avoid all non-alphanumeric characters, especially punctuation, and if possible also avoid visual homographs such as L, l, 1, i, I, and so on.
So if you have a web-page for an article of writing whose internal title is "Kaajh'Káalbh!" then you'll want the URL-name to be kaajh-kaalbh.
Note that the conversion from "Kaajh'Káalbh!" to "kaajh-kaalbh" loses information. It is a "one-way function": given a particular output ("kaajh-kaalbh") you cannot determine the original input, because multiple inputs lead to the same output. It could have been "Kaajh'Káalbh!" or "Kaajh Kaalbh" or "kaajh Kaalbh?" and so on.
You could argue that you could still query the database and find which rows correspond to the input, and I imagine your query would look like this:
SELECT * FROM Articles WHERE GetUrlVersionOfTitle( Title ) = 'kaajh-kaalbh'
Where GetUrlVersionOfTitle is a function in your SQL that would perform the conversion like so:
GetUrlVersionOfTitle( x ) = x.ToLower().Replace( ' ', '-' ).Replace( '\'', '-' ).Replace( etc )...
...which means your query becomes non-sargable and would have terrible runtime performance (because the database system would need to run the function on every row in your table, every time - obviously that's not good). It also doesn't solve the problem of ensuring that at most 1 row has a given URL-name (to guarantee that only 1 row matches a given URL-name input).
The solution then is to precompute the URL-name, store it in a separate column, and also have a UNIQUE constraint against it:
CREATE TABLE Articles (
ArticleId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Title nvarchar(255) NOT NULL,
UrlTitle varchar(255) NOT NULL UNIQUE,
...
)
INSERT INTO Articles( Title, UrlTitle ) VALUES ( @title, @urlTitle )
(where @urlTitle is a parameter whose value is the precomputed URL-friendly version of Title).
And then it's simple to match the article corresponding to a given URL:
In ASP.NET MVC:
[Route("~/articles/{urlTitle}")]
public ActionResult GetArticle(String urlTitle) {
Article article;
using( DbContext db = ... ) {
article = db.Articles.SingleOrDefault( a => a.UrlTitle == urlTitle );
}
return this.View( new ArticleViewModel( article ) );
}
In my own code, I generate URL-friendly titles by first converting text to a normalized Unicode representation, then stripping out diacritics, and also dropping non-digit/letter characters.
Note that this only really works for Latin script - I've never had to target a non-Latin system (e.g. Greek, Cyrillic, Arabic, Hebrew, Farsi etc) so YMMV, but the same principles apply:
public static String ConvertToUrlName(String title) {
if( title == null ) throw new ArgumentNullException(nameof(title));
// Convert to normalized Unicode
// see here: https://stackoverflow.com/a/249126/159145
title = title.Normalize( NormalizationForm.FormD );
StringBuilder sb = new StringBuilder( title.Length );
foreach(Char c in title) {
// If the character is a diacritic or other non-spacing mark, then skip it
if( CharUnicodeInfo.GetUnicodeCategory( c ) == UnicodeCategory.NonSpacingMark ) continue;
Char lower = Char.ToLowerInvariant( c ); // convert to lower-case (the foreach variable itself is read-only)
if( Char.IsLetterOrDigit( lower ) ) {
sb.Append( lower );
}
else if( Char.IsWhiteSpace( c ) ) {
sb.Append( '-' );
}
// and ignore all other character classes, such as punctuation
}
String urlTitle = sb.ToString();
return urlTitle;
}
Ta-da.
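To come back to the original question (what to do with a hyphen and an apostrophe that are already in the title), here is a minimal Go sketch of one possible rule set: drop apostrophes, and collapse every run of spaces, hyphens and other punctuation into a single hyphen. Slugify is a hypothetical helper name, not part of the code above, and this version does not strip diacritics (that would need a Unicode normalization package outside the Go standard library).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Slugify is a hypothetical helper: it lower-cases letters and digits,
// drops apostrophes, and collapses every other run of characters
// (spaces, hyphens, punctuation) into a single hyphen.
// Assumption: diacritics are kept as-is, unlike the C# version above.
func Slugify(title string) string {
	var sb strings.Builder
	pendingHyphen := false
	for _, r := range title {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Emit at most one hyphen between words, never a leading one.
			if pendingHyphen && sb.Len() > 0 {
				sb.WriteByte('-')
			}
			pendingHyphen = false
			sb.WriteRune(unicode.ToLower(r))
		case r == '\'' || r == '\u2019':
			// Drop apostrophes so that "Recipe's" becomes "recipes".
		default:
			// Spaces, existing hyphens and other punctuation act as separators.
			pendingHyphen = true
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(Slugify("My First - Recipe's")) // my-first-recipes
}
```

Because separators are collapsed, the existing " - " in the title does not produce "---" in the URL, and a trailing "!" or "?" does not leave a dangling hyphen.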

Related

MYSQL: Validate Input with Regular Expressions REGEX

I have a question regarding MySQL. I want to create a function that checks whether an input is given in a specific format.
The input should have the following shape:
x x (a) n (n) (n) (n)
with:
x = letters and numbers
n = numbers
a = letters
brackets = optional values
So my solution so far is this (user 'Nick' helped me):
CREATE FUNCTION validate_number(testnumber VARCHAR(7))
RETURNS INT
DETERMINISTIC
RETURN testnumber REGEXP '^[[:alnum:]]{2}[[:alpha:]]?[[:digit:]]{1,4}$';
And this approach works for most cases.
But when I enter a value that exceeds the maximum number of characters (7), I get no result.
example:
validate_number('00A00002345')
=> no result.
Do you guys have an idea what the problem is?
Thank you very much in advance.
You are actually pointing out the solution to the problem :)
Just change VARCHAR(7) to something bigger, e.g. VARCHAR(2000).
When I run your function, I get the error:
select validate_number('00A00002345')
Data too long for column 'testnumber' at row 1
You can increase the length of the varchar:
CREATE FUNCTION validate_number (
in_testnumber VARCHAR(32000)
)
Or, use text:
CREATE FUNCTION validate_number (
in_testnumber text
)
RETURNS INT
DETERMINISTIC
BEGIN
RETURN (in_testnumber REGEXP '^[[:alnum:]]{2}[[:alpha:]]?[[:digit:]]{1,4}$');
END;
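To see that the pattern itself already rejects over-long input once the parameter is wide enough to receive it, here is a small Go sketch using the same regular expression (Go's regexp package accepts the same POSIX bracket classes, though it treats them as ASCII-only). ValidateNumber is a hypothetical stand-in for the MySQL function, not a port of it.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the MySQL function: two alphanumerics,
// an optional letter, then one to four digits.
var numberPattern = regexp.MustCompile(`^[[:alnum:]]{2}[[:alpha:]]?[[:digit:]]{1,4}$`)

// ValidateNumber is a hypothetical stand-in for validate_number.
// A Go string has no declared maximum length, so long input simply fails to match.
func ValidateNumber(s string) bool {
	return numberPattern.MatchString(s)
}

func main() {
	fmt.Println(ValidateNumber("00A0000"))     // true
	fmt.Println(ValidateNumber("00A00002345")) // false: eight digits after the letter
}
```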

store euro currency format in mysql [duplicate]

On a new project I work on I have data in CSV format to import into a MySQL table. One of the columns is a price field which stores currency in the European format, i.e. 345,83.
The issue I have is storing this decimal separator. In most European currencies the decimal separator is "," but when I try to insert a decimal number into a field (e.g. 345,83), I get the following error: "Data truncated for column 'column_name' at row 'row #'". If I use '.' instead of ',' it works fine. Could you please help me with how to store this format in MySQL?
You can store it as a regular decimal field in the database, and format the number European-style when you display it.
Edit: just added an example of how it might be achieved:
$european_numbers = array('123.345,78', '123 456,78', ',78');
foreach($european_numbers as $number) {
echo "$number was converted to ".convert_european_to_decimal($number)."\n";
// save in database now
}
function convert_european_to_decimal($number) {
// I am sure there are better ways of doing this, but this is a nice and simple example
$number = str_replace('.', '', $number); // remove fullstop
$number = str_replace(' ', '', $number); // remove spaces
$number = str_replace(',', '.', $number); // change comma to fullstop
return $number;
}
Use number_format or money_format, whichever you prefer.
It's worse than you think. The number 1234.56 may be written in Europe as:
1234,56
1 234,56 (space as a group separator)
1.234,56 (dot as a group separator)
In .NET the number parser can work according to a given culture, so if you know the format it does the hard work for you. I'm sure you can find a PHP equivalent; it'd save you a lot of trouble.
You could import the currency field into a VARCHAR column and then copy this column into a DECIMAL column while replacing the , by a . in all rows using MySQL string-manipulation-functions.
UPDATE <<table>>
SET <<decimal-currency-col>> = REPLACE(<<varchar-currency-col>>, ',', '.');
Some data types do not have a direct correlation between SQL Server or Access and MySQL. One example would be the CURRENCY data type: MySQL does not (yet) have a CURRENCY data type, but creating a column with the definition DECIMAL(19,4) serves the same purpose. While MSSQL defaults to Unicode character types such as nCHAR and nVARCHAR, MySQL does not so tightly bind character sets to field types, instead allowing for one set of character types which can be bound to any number of character sets, including Unicode.
from http://dev.mysql.com/tech-resources/articles/migrating-from-microsoft.html
You could also consider multiplying it by 100 and storing it as INT.
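Before inserting the price into the DB (multiply first, then round and cast, otherwise the cents are lost):
$price = (int) round($price * 100);
After receiving the price from the DB (divide by 100 again before formatting):
$price = number_format($price / 100, 2, ',', ' ');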
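As a rough illustration of the integer approach without going through floating point at all, here is a Go sketch that parses a European-formatted price straight into cents and formats it back. ParseEuropeanCents and FormatCents are hypothetical helper names, and the sketch assumes non-negative amounts with at most two decimal places, and space or dot as the only group separators.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ParseEuropeanCents is a hypothetical helper that turns "1.234,56",
// "1 234,56" or "1234,56" into 123456 (cents).
// Assumptions: non-negative amounts, at most two decimal places.
func ParseEuropeanCents(s string) (int64, error) {
	// Strip the group separators: space and dot.
	s = strings.NewReplacer(" ", "", ".", "").Replace(strings.TrimSpace(s))
	parts := strings.SplitN(s, ",", 2)
	whole, frac := parts[0], ""
	if len(parts) == 2 {
		frac = parts[1]
	}
	if whole == "" {
		whole = "0" // allow input such as ",78"
	}
	if len(frac) > 2 {
		return 0, errors.New("more than two decimal places")
	}
	frac += strings.Repeat("0", 2-len(frac)) // "5" -> "50", "" -> "00"
	digits := whole + frac
	for _, r := range digits {
		if r < '0' || r > '9' {
			return 0, fmt.Errorf("unexpected character %q", r)
		}
	}
	return strconv.ParseInt(digits, 10, 64)
}

// FormatCents renders cents back with a decimal comma, e.g. 34583 -> "345,83".
func FormatCents(cents int64) string {
	return fmt.Sprintf("%d,%02d", cents/100, cents%100)
}

func main() {
	cents, err := ParseEuropeanCents("1.234,56")
	fmt.Println(cents, err)         // 123456 <nil>
	fmt.Println(FormatCents(cents)) // 1234,56
}
```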
Try replacing the "," with "."?
$price = str_replace(",", ".", $price);

Golang ORDER BY issue with MySql

I can't seem to dynamically ORDER BY with db.Select(). I've Googled without any luck...
WORKS
rows, err := db.Query("SELECT * FROM Apps ORDER BY title DESC")
DOES NOT WORK
rows, err := db.Query("SELECT * FROM Apps ORDER BY ? DESC", "title")
I'm not getting any errors, the query simply fails to order.
Placeholders ('?') can only be used to insert dynamic, escaped values for filter parameters (e.g. in the WHERE part), where data values should appear, not for SQL keywords, identifiers etc. You cannot use them to dynamically specify the ORDER BY or GROUP BY columns.
You can still do it though, for example you can use fmt.Sprintf() to assemble the dynamic query text like this:
ordCol := "title"
qtext := fmt.Sprintf("SELECT * FROM Apps ORDER BY %s DESC", ordCol)
rows, err := db.Query(qtext)
Things to keep in mind:
Doing so, you will have to manually defend against SQL injection: if the value of the column name comes from the user, you cannot accept any value and just insert it directly into the query, or else the user will be able to do all kinds of bad things. At a minimum you should only accept letters of the English alphabet + digits + underscore ('_').
Without attempting to provide a complete, all-extensive checker or escaping function, you can use this simple regexp which only accepts English letters, digits and '_':
valid := regexp.MustCompile("^[A-Za-z0-9_]+$")
if !valid.MatchString(ordCol) {
// invalid column name, do not proceed in order to prevent SQL injection
}
Examples (try it on the Go Playground):
fmt.Println(valid.MatchString("title")) // true
fmt.Println(valid.MatchString("another_col_2")) // true
fmt.Println(valid.MatchString("it's a trap!")) // false
fmt.Println(valid.MatchString("(trap)")) // false
fmt.Println(valid.MatchString("also*trap")) // false
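A stricter alternative to the character check, sketched here under the assumption that you know up front which columns may be sorted on, is an explicit allow-list. BuildOrderQuery and the created_at column are hypothetical names used only for illustration; title and Apps come from the question.

```go
package main

import (
	"fmt"
)

// sortableColumns is an allow-list of column names that may appear in ORDER BY.
// "created_at" is a made-up column for illustration.
var sortableColumns = map[string]bool{
	"title":      true,
	"created_at": true,
}

// BuildOrderQuery is a hypothetical helper: it only interpolates the column
// name into the SQL text if it is on the allow-list.
func BuildOrderQuery(col string) (string, error) {
	if !sortableColumns[col] {
		return "", fmt.Errorf("column %q is not sortable", col)
	}
	return fmt.Sprintf("SELECT * FROM Apps ORDER BY %s DESC", col), nil
}

func main() {
	q, err := BuildOrderQuery("title")
	fmt.Println(q, err) // SELECT * FROM Apps ORDER BY title DESC <nil>

	_, err = BuildOrderQuery("title; DROP TABLE Apps")
	fmt.Println(err) // column "title; DROP TABLE Apps" is not sortable
}
```

The returned string can then be passed to db.Query as in the snippet above; any real data values in the same query should still go through '?' placeholders.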

Using strcmp() in my cgi code, for an html webpage, is causing a server error

I am making an html webpage that uses cgi to access a table/database in mysql. I input a .csv file containing info on my class schedule and the html displays it in the usual schedule table.
My problem is that I can't seem to use strcmp in my parsing CGI, as it causes a server error. Here is an excerpt of my code where I use strcmp.
void parse2(char *queu)
{
//---------------------------------------------------------------
char *saveptr[1024];
char *subtoken;
char *Subject;
char *Day;
char *Start;
char *End;
char *Room;
char *Teacher;
int check = 1;
//---------------------------------------------------------------
subtoken = strtok_r(queu, ",", saveptr);
check = strcmp(subtoken, "\0");
printf("%d<br>", check);
if(check == 0){
printf("Error!");
} else {
Subject = subtoken;
Day = strtok_r(NULL, ",", saveptr);
Start = strtok_r(NULL, ",", saveptr);
End = strtok_r(NULL, ",", saveptr);
Room = strtok_r(NULL, ",", saveptr);
Teacher = strtok_r(NULL, ",", saveptr);
printf("%s\n<br/>%s\n<br/>%s\n<br/>%s\n<br/>%s\n<br/>%s\n", Subject, Day, Start, End, Room, Teacher);
//inputsql(Subject, Day, Start, End, Room, Teacher);
}
//---------------------------------------------------------------
}
Note that I have tested this code and it works fine without me calling strcmp().
I am using strcmp() to prevent a line of unwanted characters, generated after the info when retrieved using the POST method, from being entered into my database.
As you can see from the above code, I used strtok_r() to parse the line of info. Since the line of unwanted characters does not contain a comma (which is my delimiter), it should return a NULL value, correct?
Can anyone help me out? I welcome suggestions to use a different way on solving the problem I chose to solve using strcmp().
I think you should be checking subtoken == NULL, not strcmp(subtoken, "\0") == 0.
"\0" is a string containing a NUL byte, then another NUL (the terminator), so the standard library's string functions will just see an empty string. That's different to a NULL pointer (i.e. a pointer with value zero).
From STRTOK(3):
Each call to strtok() returns a pointer to a null-terminated string
containing the next token. This string does not include the
delimiting byte. If no more tokens are found, strtok() returns NULL.

JPA2 Criteria API .as(String.class) casting to char(1) - How do I work around this?

Using the criteria api, I have a query which does something like:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<SourcePath> pathQuery = cb.createQuery(SourcePath.class);
Root<SourcePath> pathRoot = pathQuery.from(SourcePath.class);
pathQuery.where(cb.equal(cb.literal(0x00010002).as(String.class), (pathRoot.get(SourcePath_.path))));
TypedQuery<SourcePath> query = entityManager.createQuery(pathQuery);
query.getResultList();
The resulting SQL query looks something like:
select ...snip aliases... from SourcePath where cast(x'00010002', char(1)) = path;
(path would be some nasty old alias, but that's irrelevant).
This query is incorrect. In particular, the cast cast(x'00010002', char(1)) is not a cast to a string, as specified by .as(String.class); instead it should be either cast(x'00010002', char) or cast(x'00010002', char(N)), where N is a suitably large number.
I've isolated the cause of this cast failure to the MySqlDialect provided by org.hibernate. Particularly:
public String getCastTypeName(int code) {
if ( code==Types.INTEGER ) {
return "signed";
}
else if ( code==Types.VARCHAR ) {
return "char";
}
...snip...
}
Which further down the chain is interpreted as a char, which is registered by the dialect: registerColumnType( Types.CHAR, "char(1)" );.
Finally, to my question. How can I work around this behaviour? Do I report it as a bug to Hibernate? Do I extend the dialect and correct the returned type from getCastTypeName? Is there an alternative to .as which will appropriately cast? Or, should I use strings everywhere I'm currently using hex values to avoid touching edge cases of Hibernate's implementation?
Thanks
idb
IMHO, you should use a String literal like cb.literal("\u0001\u0002"). Reasons:
It's the same thing as cb.literal(0x00010002).as(String.class), but less verbose.
Does not hit any "edge case"
It's clearer: should "0x00010002" be treated as big-endian or little-endian? Which encoding should be used?
You can improve legibility using constants (e.g.: cb.literal(ASCII_SOH + ASCII_STX) where SOH="\u0001" and STX="\u0002").
EDIT: Adding a better description, since I didn't see the "Or, should I use strings everywhere I'm currently using hex-values"