Which one is the best approach for validating emails


I just have a quick question; which one is the best approach between joi and regular expressions for validation?

Do a very simple validation and then send an email with a validation link. If you try to validate an email address beyond checking for an @ followed by a ., you will get it wrong unless you read the whole email spec and use a ridiculously complicated regex.

The answers here (including this one) will be very biased, as your question is very subjective.
I will try to be as objective as possible.
First, let us review what Joi and regular expressions are:
Joi is a JavaScript-only library that does schema validation of JavaScript objects. It uses regular expressions (according to the example on its GitHub page).
Regular expressions are common across languages (with some variations) and can be used, among other things, to validate strings.
If you are writing a web UI and want to validate the data filled in by the user, Joi is purpose-built for that (because it can take care of whole objects).
If you just want to check that e-mails are in the correct form, Joi is overkill. In this specific instance, you can live with just a regular expression.
Subjective part: there is a school of thought in the software engineering community that dislikes regular expressions because of readability issues. I am one of them. If you just want e-mail validation, I'd recommend you write a validator yourself (to check that it includes an '@' sign, that a valid domain name follows the '@', etc.).
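A minimal sketch of what such a hand-written check might look like, assuming the separator is '@' and that only plain ASCII domain names need to pass. The function names are made up for illustration. It is deliberately permissive about the part before the '@', since the confirmation email is the real test, and it would reject internationalized domains unless they are already in punycode form.

```javascript
// Hypothetical helper: a domain label may contain letters, digits and hyphens.
const isLabelChar = (ch) =>
  (ch >= "a" && ch <= "z") || (ch >= "0" && ch <= "9") || ch === "-";

// A label must be non-empty and must not start or end with a hyphen.
function isValidLabel(label) {
  const lower = label.toLowerCase();
  return (
    lower.length > 0 &&
    !lower.startsWith("-") &&
    !lower.endsWith("-") &&
    [...lower].every(isLabelChar)
  );
}

// Returns true if the address has something before the last '@'
// and a dotted domain name made of valid labels after it.
function looksLikeEmail(address) {
  if (typeof address !== "string") return false;
  const at = address.lastIndexOf("@");
  if (at < 1) return false; // no '@', or nothing in front of it
  const labels = address.slice(at + 1).split(".");
  if (labels.length < 2) return false; // require at least one dot in the domain
  return labels.every(isValidLabel);
}

console.log(looksLikeEmail("user@example.com"));
console.log(looksLikeEmail("user@localhost"));
```

Anything this accepts still has to be confirmed by actually sending the mail.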

Related

Store TinyMCE form data in a MySQL database and return as is

I am currently working on a CMS project and we wish to use TinyMCE as our WYSIWYG editor. We allow users to customize their content (for example, making some sections bold or using different indentation). All I want to know is: how do we store that form data in a MySQL database, and how do we return the data with those styles intact? Are there problems for the database, given that the content contains HTML tags? I think you get what I'm asking. This is our first time using a WYSIWYG editor. Thank you for any help.

To MySQL, the HTML generated by TinyMCE is just a string; you can store it as it is (after your security filter has validated it).
But your application must handle the string very carefully, because careless handling will result in cross-site scripting (XSS).
It is more difficult than preventing XSS on non-HTML fields, because escaping on output won't work: the markup has to be rendered as HTML.
The most robust way to prevent XSS on an HTML field is to filter the HTML text on the server, using either a whitelist or a blacklist.
A blacklist is easier to implement, but may miss patterns we didn't know about. A whitelist is more robust but more troublesome: you define the tags and attributes you allow, and HTML containing any other tags or attributes can be either filtered or rejected. You can implement a whitelist with Jsoup (for better performance, I suggest you customize Jsoup).
You cannot trust TinyMCE to help you, because an attacker can easily bypass client-side validation.
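To illustrate the whitelist idea only, here is a rough sketch in JavaScript (for a Node.js server) rather than with Jsoup. It is not a replacement for a maintained sanitizer library. The function names and the tag list are assumptions made up for this example. It escapes everything and then restores only attribute-free tags from the list, so every attribute (including style) is lost, self-closing forms like <br/> stay escaped, and entities that were already in the input get escaped a second time.

```javascript
// Hypothetical allowlist: only these tags, and only without attributes.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p", "ul", "ol", "li", "br"];

// Escape every character that has a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape the whole input, then turn back only exact matches of
// <tag> or </tag> for tags on the allowlist.
function sanitizeHtml(input) {
  const escaped = escapeHtml(input);
  const pattern = new RegExp(
    "&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;",
    "gi"
  );
  return escaped.replace(
    pattern,
    (match, slash, tag) => "<" + slash + tag.toLowerCase() + ">"
  );
}

console.log(sanitizeHtml("<b>bold</b><script>alert(1)</script>"));
```

Because nothing with attributes is ever restored, event handlers such as onclick cannot survive, at the cost of dropping most of the formatting TinyMCE can produce. A real whitelist would also need a carefully checked set of allowed attributes.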

How to turn off phone carrier HTML optimizations?

I made an app which provides a schedule for the pupils at my school. It gets its data from the school's online schedule service. Due to the lack of a real API, I reverse-engineered the website; the app now basically parses it with string operations.
And here's the problem: the string searches do not match on certain mobile carriers' networks, because they strip away spaces and other stuff. Is there a universal way to turn that off?

No, this is up to the carrier and even if there was a way to disable it, it would be non-standard and not worth addressing.
Additionally, you should not use string operations but a real HTML parser, like JSoup is for Java (there is a .NET port too, NSoup). If you look at the examples, it is relatively easy to use and will protect your application from space normalizations and any other change in the markup irrelevant to your application.
For data stored in inline JavaScript, you could first extract the right node from the document and then use a regex to trim the relevant parts. Or you could also use a regex on the HTML document as a whole, but remember that you can't really parse HTML using regexes.
As another strategy, request pages over HTTPS rather than HTTP (if the server supports TLS/SSL) so that they can't be manipulated by the carrier.

DRY user-input validation (clientside, serverside) using JSON-schema

As part of an extensive test case, I'm building an ajax-based CMS-like application which provides CRUD functionality on various document types, e.g. articles, tags, etc.
On the server and client side I'm considering using JSON-schema ( http://json-schema.org/ ) as a way to do user-input validation in a DRY way (i.e. one validation schema, used on both the server and the client, with no duplicate code). This seems great, because:
JSON-schema is implemented in both JS and Java, so one schema could in theory handle client-side and server-side validation
all CUD-operation requests and responses are JSON (through ajax)
However, besides the usual validations on user input, I would like to have some additional checks on the server (e.g. checking whether the name of a Tag a user wants to create already exists).
Ideally, I want these types of checks to be included in my general server-side validation code (which, as said, would be based on JSON-schema). However, I'm not entirely convinced that this is the right approach, because these additional checks are not based on the provided JSON data alone, but need additional data to validate (e.g. the names of existing tags in the system, to check whether a tag name already exists).
So, would it be a good idea (design- and architecture-wise) to incorporate additional checks like the one described above in the JSON-schema based validation framework on the server side? Would this be an elegant solution? Or would you keep them separate altogether? If not, why not, and what alternative approaches would you suggest to stay DRY concerning client and server-side validation?
What do you think?
Some additional context / goals of the test case are below for background info.
Thanks,
Geert-Jan
Some context / goals:
ajax-based CMS using REST approach
CUD-requests are performed through ajax using a rest approach (i.e: mapping on POST, PUT, DELETE respectively). Requests and responses are all done through JSON.
CMS without forms. Instead use in-place editing (e.g. using Aloha-editor: http://www.aloha-editor.org/ )
staying DRY.
templating: done through Mustache templating on client and server side. Initial rendering and incremental rendering through ajax are based on one and the same template. I wanted to go for something other than Mustache (because of its lack of expressive power), but it works for this prototype at least. (See my previous question for alternatives, to which I'm still looking for an answer: Client-side templating language with java compiler as well (DRY templating) )
DRY input-validation: as described above
Validation flow ( in case of failure):
user creates/updates/deletes item.
a validation failure on the client instantly gives feedback to the user as to what to repair. (The JavaScript JSON-schema validator would ideally return JSON which is formatted for the user.)
when client-side validation succeeds, the CUD-operation is performed using ajax.
if server-side validation fails, a status code 400 (Bad Request) is returned, with a JSON object containing the validation failure(s), which is picked up by jQuery's error callback:
$.ajax({
  // ....
  error: function(xhr, status, error) {
    // Parse the JSON body sent along with the 400 (Bad Request) response
    var validationJSON = JSON.parse(xhr.responseText);
    // handle server-side validation failure
  },
  // ....
});
The JSON object containing the server-side validation failures is presented to the user (analogously to client-side failures).
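To make the flow concrete, here is a rough sketch of one possible arrangement on the server, in plain JavaScript. Everything in it is hypothetical: tagRules is a tiny hand-rolled stand-in for a real JSON-schema document, and the function names, error format and 201 status code are assumptions for illustration. The checks that depend only on the submitted JSON run first, and the check that needs extra data (existing tag names) runs as a separate step that reports errors in the same shape.

```javascript
// Hypothetical shared rules: one definition meant for both client and server.
const tagRules = {
  name: { type: "string", required: true, minLength: 1, maxLength: 50 },
};

// Structural validation that depends only on the submitted JSON.
function validateAgainstRules(data, rules) {
  const errors = [];
  for (const [field, rule] of Object.entries(rules)) {
    const value = data[field];
    if (value === undefined || value === null) {
      if (rule.required) errors.push({ field, message: "is required" });
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push({ field, message: "must be of type " + rule.type });
      continue;
    }
    if (rule.minLength !== undefined && value.length < rule.minLength) {
      errors.push({ field, message: "is too short" });
    }
    if (rule.maxLength !== undefined && value.length > rule.maxLength) {
      errors.push({ field, message: "is too long" });
    }
  }
  return errors;
}

// Server-only check: needs data that is not part of the request.
function validateTagIsUnique(data, existingTagNames) {
  return existingTagNames.includes(data.name)
    ? [{ field: "name", message: "already exists" }]
    : [];
}

// Server-side entry point: 400 plus an error list on failure.
function handleCreateTag(data, existingTagNames) {
  let errors = validateAgainstRules(data, tagRules);
  if (errors.length === 0) {
    errors = validateTagIsUnique(data, existingTagNames);
  }
  return errors.length > 0
    ? { status: 400, body: { errors } }
    : { status: 201, body: data };
}

console.log(JSON.stringify(handleCreateTag({ name: "js" }, ["js", "css"])));
```

Since both steps return the same error shape, the client could present server-side failures the same way as client-side ones, whichever way the design question is answered.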

It is very possible and one of the most gratifying things to have a single definition of validations in one place (per model) on the server that can then generate appropriate JS for client-side and AJAX-based validations.
Yii framework for PHP has a fantastic architecture for accomplishing this in an elegant way that stores all the validation rules together in the model (divvied up into appropriate "scenarios" as needed). From there, it's a matter of flipping a few switches to make a particular form client-side or AJAX-validateable. I believe Yii's interfaces for this were based on Rails.
Anyway I would highly recommend checking out the following key points from Yii's design; even if you don't know PHP, you can use this for inspiration:
CModel::rules() => The DRY source for model validation rules
CActiveForm => Used to generate form elements based on model attributes
See for example CActiveForm::textField()
CValidator => Base class for validators which provisions for the ability to client-validate
I think it's wise to pursue DRY validation rule declaration and in my experience it is not at all unrealistic to achieve that 100% and still have rich forms—and rich validation rules. (And boy will you love life when you don't have to manage all that client-validate JS...)
Hope this helps.

How do I learn to verify that user input is sane?

I'm not sure of the terminology here, so let me specify that when I say "verify" user input, I mean watch out for users claiming 30 Feb 2021 as their birthdays, rather than guarding against injection attacks.
Are there any guides to doing this correctly, or lists of common ways people do it wrong? Strategies for ensuring correct input even before it's entered (e.g., picking out of a calendar instead of typing into a text field)?
Note that I am not interested in language-specific answers (e.g., ASP.NET Validation Controls) but rather general strategies and principles.

The freer you make the input field, the more you have to check. Some languages may make it easy for you to verify that a text field is a valid date; others may not.
Then again, some users will resent clicking on a calendar control or three drop-downs to enter their birthdate. They may prefer to just type it in. That's a trade-off.

The term you are looking for is input validation.
As you point out, if you use a control where it is impossible to enter invalid data you can help the client, but you still need to implement proper validation on the server.
"I mean watch out for users claiming 30 Feb 2021 as their birthdays, rather than guarding against injection attacks"
Why not do both? Is there a specific reason why you want to leave yourself open to injection attacks?
Assume that the user sends a string to the server, either one they entered themselves or else one that was sent by a control you placed on the page. The first part is to find a library function for parsing the string into typed data. In your example you could use DateTime.TryParse to parse a string to a date. This will fail for your given example as the given date is invalid. If you cannot find a library function for what you are trying to parse you can try to write a parser yourself. For simple validations you may be able to express it as a regular expression. For more complicated inputs you may need to write some code that performs the validation, perhaps even using a parser library to help you if the input language is particularly complicated.
The second part is to implement business validation rules specific to your needs. For example, you know that a birth date must be in the past, but not too far in the past. This will require some judgement: it's not impossible that someone using your site is 100 years old, but it's highly unlikely that they are 200 years old, since no one is believed to have lived that long.
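As a sketch of the two parts in JavaScript rather than with DateTime.TryParse: the built-in Date silently rolls 30 Feb over into March, so the parser below rebuilds the date and checks that it round-trips. The function names, the YYYY-MM-DD input format and the 130-year limit are all assumptions chosen for the example.

```javascript
// Hypothetical business limit: reject birth dates older than this.
const MAX_AGE_YEARS = 130;

// Part 1: parse "YYYY-MM-DD" into a Date, or return null if the
// text is malformed or names a day that does not exist.
function parseIsoDate(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) return null;
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date rolls invalid days over, so compare the parts we got back.
  const roundTrips =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return roundTrips ? date : null;
}

// Part 2: business rules on top of the typed value.
function validateBirthDate(text, now = new Date()) {
  const date = parseIsoDate(text);
  if (date === null) {
    return { ok: false, reason: "not a valid calendar date" };
  }
  if (date.getTime() > now.getTime()) {
    return { ok: false, reason: "birth date is in the future" };
  }
  const oldest = new Date(
    Date.UTC(now.getUTCFullYear() - MAX_AGE_YEARS, now.getUTCMonth(), now.getUTCDate())
  );
  if (date.getTime() < oldest.getTime()) {
    return { ok: false, reason: "birth date is too far in the past" };
  }
  return { ok: true, date };
}

console.log(validateBirthDate("2021-02-30").reason);
```

Passing the current time in as a parameter keeps the rule easy to test.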

i would recommend using a design pattern called "strategy". this is one of the patterns created by "the gang of four", or "gof" for short. there are some copies and variants of this pattern that you may have heard of, e.g. "inversion of control" and "dependency injection".
anyways, for an object oriented language, what you do is that you create a class called "validator", which validates data in a method called "validate". you'll have to make validate accept some relevant form of input, or overload it to have different methods for different sorts of data. or if you have access to some form of generics, you can use that.
next up, the constructor of this class should take a "validatorstrategy" object as argument. and then the actual validation will be passed through the strategy object.
to take this even further, you could then create some sort of input form generator system, where you specify input fields with your own type names. these will then generate different input fields depending on your front end language (html/android xml/java swing), and they will also affect the way in which the input is validated.
hmm.. i wonder how to solve the issue with two password input fields that need to have the exact same content to validate. how would this look in the form generating system? maybe there would be one input type named "password" which would generate one input field which doesn't show the input and has no validation, and another type named "passwordsetter" which would generate two input fields which don't show the input, and have the validation strategy of comparing the data from the two fields. creating that validation strategy could be pretty tricky though D:
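a rough sketch of the idea in javascript, including one possible strategy for the two password fields. all names here (Validator, nonEmptyText, matchingPasswords) are made up for illustration, and passing the two fields as an array is just one assumed way of feeding both values to a single strategy.

```javascript
// Context class: delegates the actual check to whichever strategy it was given.
class Validator {
  constructor(strategy) {
    this.strategy = strategy;
  }

  validate(input) {
    return this.strategy.isValid(input);
  }
}

// Strategy for a single text field: must contain something besides whitespace.
const nonEmptyText = {
  isValid: (value) => typeof value === "string" && value.trim().length > 0,
};

// Strategy for a "passwordsetter": receives both fields and compares them.
const matchingPasswords = {
  isValid: (fields) =>
    Array.isArray(fields) &&
    fields.length === 2 &&
    typeof fields[0] === "string" &&
    fields[0].length > 0 &&
    fields[0] === fields[1],
};

const nameValidator = new Validator(nonEmptyText);
const passwordSetterValidator = new Validator(matchingPasswords);

console.log(nameValidator.validate("Ada"));
console.log(passwordSetterValidator.validate(["s3cret", "s3cret"]));
console.log(passwordSetterValidator.validate(["s3cret", "secret"]));
```

the validator itself never changes; a form generator would only have to pick which strategy to hand to it for each input type.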

How can I extract addresses and phone number from HTML?

Is there a library that specializes in parsing such data?

You could use something like Google Maps. Geocode the address and, if successful, Google's API will return an XML representation of the address with all of the elements separated (and corrected or completed).
EDIT:
I'm being voted down and not sure why. Parsing addresses can be a little difficult. Here's an example of using Google to do this:
http://blog.nerdburn.com/entries/code/how-to-parse-google-maps-returned-address-data-a-simple-jquery-plugin
I'm not saying this is the only way or necessarily the best way. Just a way to parse addresses on a web site.

There are 2 parts to this: extract the complete address from the page, and parse that address into something you can use (store the various parts in a DB for example).
For the first part you will need a heuristic, most likely country-dependent: for US addresses [A-Z][A-Z],?\s*\d\d\d\d\d should give you the end of an address, provided the 2 letters turn out to be a state. Finding the beginning of the string is left as an exercise.
The second part can be done either through a call to Google maps, or as usual in Perl, using a CPAN module: Lingua::EN::AddressParse (test it on your data to see if it works well enough for you).
In any case this is a difficult task, and you will most likely never get it 100% right, so plan for manually checking the addresses before using them.
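A small sketch of the first part in JavaScript, using the regex from above with capture groups and word boundaries added (those additions, the function name and the result shape are assumptions of this example). It only finds candidate address endings, keeps the ones whose two letters are a US state or DC, and ignores ZIP+4 codes and territories.

```javascript
// Two-letter codes for the 50 US states plus DC.
const US_STATES = new Set(
  ("AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD " +
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC " +
    "SD TN TX UT VT VA WA WV WI WY DC").split(" ")
);

// The heuristic from the answer, with groups for the state and the ZIP code.
const ADDRESS_END = /\b([A-Z][A-Z]),?\s*(\d\d\d\d\d)\b/g;

// Returns every place in the text that looks like "ST 12345"
// where ST is a known state code.
function findAddressEnds(text) {
  const results = [];
  for (const match of text.matchAll(ADDRESS_END)) {
    const [, state, zip] = match;
    if (US_STATES.has(state)) {
      results.push({ state, zip, index: match.index });
    }
  }
  return results;
}

const sample =
  "Send mail to 1600 Pennsylvania Ave NW, Washington, DC 20500. Ref XX 12345.";
console.log(findAddressEnds(sample));
```

The index of each hit gives a point to search backwards from for the start of the address, which is still the hard part.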

You don't need regular expressions (yet) or a general parser like pyparsing (at all). Look at something like Beautiful Soup, which will parse even bad HTML into something like a tree of tags. From there, you can look at the source of the page, and find out what tags to drill down through to get to the data. Then, from Beautiful Soup's tree, you can search for these nodes using find_all() or CSS selectors via select() (Beautiful Soup itself does not support XPath; lxml does), and directly loop over the tags you're interested in, getting to the actual data easily. From there, you can parse the data using a quick regex or something. This will be more flexible and more future proof, and also possibly less head-exploding, than just trying to do it in pure regular expressions.
