Why are MySQL queries almost always written in capitals?

I've seen most coders use capital letters when writing MySQL queries, something like
"SELECT * FROM `table` WHERE `id` = 1 ORDER BY `id` DESC"
I've tried writing the queries in lowercase and they still work.
So is there any particular reason for not using lowercase, or is it just a matter of choice?

It's just a matter of readability. Keeping keywords in upper case and table/column names in lower case makes it easier to tell the two apart when scanning the statement.
Most SQL implementations treat keywords as case-insensitive, so you could write your statement in late 90s LeEt CoDeR StYLe, if you felt so inclined, and it would still work.
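As a quick check of the claim above, here is a minimal sketch that runs the same query in three keyword casings. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL, and the table name `items` is hypothetical; keyword handling should be the same in MySQL, but that is not verified here.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; `items` is a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `items` (`id` INTEGER)")
conn.executemany("INSERT INTO `items` (`id`) VALUES (?)", [(1,), (2,)])

# The same statement with upper-case, lower-case and mixed-case keywords.
queries = [
    "SELECT * FROM `items` WHERE `id` = 1 ORDER BY `id` DESC",
    "select * from `items` where `id` = 1 order by `id` desc",
    "SeLeCt * FrOm `items` wHeRe `id` = 1 oRdEr By `id` DeSc",
]
results = [conn.execute(q).fetchall() for q in queries]

# All three casings return the same rows.
print(results[0] == results[1] == results[2])
```

Note that only the keywords change case here; the identifiers are kept identical, since identifier case sensitivity is a separate matter.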

The case makes no difference to the SQL engine. It is just a convention, like the coding conventions used in any programming language.

You have to have a system - there are already a few questions on the site that deal with conventions and approaches. Try:
SQL formatting standards

This is for readability's sake.
Sometimes we have to write queries like that because our project's official guidelines recommend it.
But it is all for readability and uniformity within a project.

Related

Why do people on SO prefer CASE WHEN to other alternatives?

I have noticed that on SO a lot of people seem to prefer CASE ... WHEN to other alternatives.
For example all of the answers in this question use CASE ... WHEN whereas I would have used a simple IF. IF is quite a bit less to type and is prevalent in all programming languages so it seems kind of weird to me that not a single answer uses it. (I would also expect that IF is a bit faster though I did not measure it).
Even more interesting are the answers to this question. 2 out of 3 answers (among them the accepted answer) suggest using CASE ... WHEN when from my point of view COALESCE is the better solution (after all COALESCE was created for exactly the problem the OP has). (Also, in this case I am almost certain that COALESCE would be faster.)
So, my question is, is there any benefit to CASE ... WHEN (that offsets the additional typing) that I am missing or is it a case of "To a man with a hammer, everything looks like a nail"?
One reason, a good one actually, is that a CASE WHEN expression is ANSI compliant while IF is not. Were someone to face porting a MySQL query to another database the IF calls in MySQL would probably all have to be rewritten.
MySQL, like most databases, extended ANSI by introducing the IF() function. Perhaps IF, or something similar to it, will become part of the standard some day.
CASE WHEN is in the SQL standard. IF is not. As SQL databases do have vastly different dialects, it is not the worst idea to stick to code that will work on most databases for the following reasons:
If you build the habit of using code that is specific to one database, you will have trouble when working on another.
If you use code that is specific to one database, you cannot test your queries against other databases by simply copying and pasting them. You also cannot migrate your application to another database without changing your SQL queries.
CASE WHEN is the ANSI standard expression for conditional expressions. IF() is a function specific to MySQL.
In general, I prefer ANSI standard functionality when available -- although there are occasional exceptions.
As for IF() as a function specifically: it is easily confused with the IF statement in MySQL. Using it as a function seems like unnecessary confusion (admittedly, there are other databases where the CASE expression can be confused with a CASE statement in the scripting language, but that is not an issue in MySQL).
In addition, IF() is pretty close to control flow, which makes it different from most other functions anyway.
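To make the comparison concrete, here is a minimal sketch of the portable forms discussed above. It assumes SQLite (via Python's sqlite3 module) rather than MySQL, and the `people` table and its columns are made up for illustration. The MySQL-only IF() form appears only in a comment because it is not part of the standard.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; `people` is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Alice", None), ("Robert", "Bob")],
)

# Standard conditional expression: fall back to name when nickname is NULL.
case_rows = conn.execute(
    "SELECT CASE WHEN nickname IS NULL THEN name ELSE nickname END "
    "FROM people ORDER BY name"
).fetchall()

# Standard COALESCE: first non-NULL argument, same result with less typing.
coalesce_rows = conn.execute(
    "SELECT COALESCE(nickname, name) FROM people ORDER BY name"
).fetchall()

# MySQL-specific equivalent (not run here, not portable):
#   SELECT IF(nickname IS NULL, name, nickname) FROM people ORDER BY name

print(case_rows, coalesce_rows)
```

Both queries return the same rows, so the choice between them is about portability and readability, not results.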

Should SQL functions be uppercase or lowercase?

I am new to SQL and I was wondering what the right way to write functions is. I know the norm for statements like SELECT is uppercase, but what is the norm for functions? I've seen people write them in lowercase and others in uppercase.
Thanks for the help.
There's no universal norm for that. There are standards, but those can change from company to company.
SQL keywords and function names are not case sensitive.
Usually you should write SQL keywords (and reserved words) in UPPERCASE and fields and other identifiers in lowercase. But it is not necessary.
It depends on the function a bit, but there is no requirement. There is really no norm but some vendors have specific standards / grammar, which in their eyes aids with readability (/useability) of the code.
Usually 'built-in' functions (non vendor specific) are shown in uppercase.
/* On MySQL */
SELECT CURRENT_TIMESTAMP;
-> '2001-12-15 23:50:26'
Vendor specific functions are usually shown in lowercase.
A good read, on functions in general, can be found here.
SQL keywords are routinely uppercase, but there is no one "correct" way. Lowercase works just as well. (N.B. Many think that using uppercase keywords in SQL improves readability, although that is not to say that lowercase is difficult to read.) For the most part, however, no one will object to something like:
SELECT * FROM `users` WHERE id="3"
Although, if you prefer, this will work as well:
select * from `users` where id='3'
Here's a list of SQL reserved words. Note that they are written in uppercase, but it is not required:
http://developer.mimer.com/validator/sql-reserved-words.tml
Here's another good resource that I keep in my "interesting articles to read over periodically." It elaborates a bit on some somewhat interesting instances when case should be taken into consideration:
http://dev.mysql.com/doc/refman/5.0/en/identifier-case-sensitivity.html

What is better for performance: IN or OR

I need to run a query that checks whether a column value is either 2117 or 0.
Currently, I do this with an OR
select [...] AND (account_id = 2117 OR account_id = 0) AND [...]
Since I'm facing performance issues, I was wondering whether it wouldn't be better to do
select [...] AND account_id IN (0, 2117) AND [...]
The EXPLAIN command gives similar results in both cases. So maybe it's more about optimizing the parsing phase than anything else. Or maybe the two forms are totally equivalent and optimized away by MySQL, and I should just not care.
On the MySQL website, they describe the OR optimization like this:
Use x = ANY (table containing (1,2)) rather than x=1 OR x=2.
But I didn't get the syntax right or even understand why.
What do you think?
There is no contest here... IN is always much, much better.
The reason is that databases won't use an index with an OR, but will use an index with IN.
Changing OR to IN is usually the first optimization I make to queries.
Why not try running a heavy benchmark? If there's a noticeable difference then opt for the better option, otherwise just use "OR" for readability. Maybe the source code would yield some useful answers, but that might be outside the scope of efficiency.
IN is typically easier to read and easier for the engine to handle. That said, this holds only up to a limited number of values: you typically don't want an IN or OR with 20+ IDs. When you have a large set of values, create a table (even a temp table), insert the values you want to match, and join against it to get your results. This offers better flexibility when dealing with larger selections of data.
I'd say there might be a difference with lots of elements, but not with two. I'd be more inclined to tune indexes or to look at the table architecture, to find worthwhile performance improvements.
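For reference, here is a minimal sketch of the three forms mentioned in this thread: OR, IN, and a join against a temp table of wanted values. It assumes SQLite (via Python's sqlite3 module) rather than MySQL, and the `entries` and `wanted_accounts` tables are hypothetical. It only shows that the three forms return the same rows. It says nothing about which is faster on your data, which is something EXPLAIN and a benchmark on the real schema would have to answer.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, account_id INTEGER)"
)
conn.executemany(
    "INSERT INTO entries (account_id) VALUES (?)",
    [(0,), (5,), (2117,), (42,), (2117,)],
)
conn.execute("CREATE INDEX idx_entries_account ON entries (account_id)")

# Form 1: OR on the same column.
with_or = conn.execute(
    "SELECT id FROM entries "
    "WHERE (account_id = 2117 OR account_id = 0) ORDER BY id"
).fetchall()

# Form 2: IN list.
with_in = conn.execute(
    "SELECT id FROM entries WHERE account_id IN (0, 2117) ORDER BY id"
).fetchall()

# Form 3: join against a temp table holding the wanted values,
# which scales better when the list of values grows large.
conn.execute("CREATE TEMP TABLE wanted_accounts (account_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_accounts VALUES (?)", [(0,), (2117,)])
with_join = conn.execute(
    "SELECT e.id FROM entries e "
    "JOIN wanted_accounts w ON w.account_id = e.account_id ORDER BY e.id"
).fetchall()

print(with_or == with_in == with_join)
```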

Database performance benchmark

Are there any good articles out there comparing Oracle vs SQL Server vs MySQL in terms of performance?
I'd like to know things like:
INSERT performance
SELECT performance
Scalability under heavy load
Based on some real examples in order to gain a better understanding about the different RDBMS.
The question is really too broad to be answered because it all depends on what you want to do as there is no general "X is better than Y" benchmark without qualifying "at doing Z" or otherwise giving it some kind of context.
The short answer is: it really doesn't matter. Any of those will be fast enough for your needs. I can say that with 99% certainty. Even MySQL can scale to billions of rows.
That being said, they do vary. As just one example, I wrote a post about a very narrow piece of functionality: join and aggregation performance. See Oracle vs MySQL vs SQL Server: Aggregation vs Joins.
Yes, such benchmarks do exist, but they cannot be published, as Oracle's licensing prohibits publishing such things.
At least, that is the case to the best of my knowledge. I've seen a few published which do not name Oracle specifically, but instead say something like "a leading RDBMS" when they are clearly talking about Oracle, but I don't know whether that gets around it.
On the other hand, Oracle now owns MySQL, so perhaps they won't care so much, or perhaps they will. Who knows.

Best practices for searching for alternate forms of a word with Lucene

I have a site which is searchable using Lucene. I've noticed from logs that users sometimes don't find what they're looking for because they enter a singular term, but only the plural version of that term is used on the site. I would like the search to find uses of other forms of a word as well. This is a problem that I'm sure has been solved many times over, so what are the best practices for this?
Please note: this site only has English content.
Some approaches I've thought of:
1. Look up the word in some kind of thesaurus file to determine alternate forms of a given word.
   Some examples:
      Searches for "car", also add "cars" to the query.
      Searches for "carry", also add "carries" and "carried" to the query.
      Searches for "small", also add "smaller" and "smallest" to the query.
      Searches for "can", also add "can't", "cannot", "cans", and "canned" to the query.
      And it should work in reverse (i.e. search for "carries" should add "carry" and "carried").
   Drawbacks:
      Doesn't work for many new technical words unless the dictionary/thesaurus is updated frequently.
      I'm not sure about the performance of searching the thesaurus file.
2. Generate the alternate forms algorithmically, based on some heuristics.
   Some examples:
      If the word ends in "s" or "es" or "ed" or "er" or "est", drop the suffix
      If the word ends in "ies" or "ied" or "ier" or "iest", convert to "y"
      If the word ends in "y", convert to "ies", "ied", "ier", and "iest"
      Try adding "s", "es", "er" and "est" to the word.
   Drawbacks:
      Generates lots of non-words for most inputs.
      Feels like a hack.
      Looks like something you'd find on TheDailyWTF.com. :)
3. Something much more sophisticated?
I'm thinking of doing some kind of combination of the first two approaches, but I'm not sure where to find a thesaurus file (or what it's called, as "thesaurus" isn't quite right, but neither is "dictionary").
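The heuristic approach described above can be sketched roughly as follows. This is only an illustration of the four suffix rules listed in the question, written in plain Python rather than as a Lucene component. The function name `alternate_forms` is made up, and the order in which the rules are applied is an assumption.

```python
def alternate_forms(word):
    """Generate candidate alternate forms of `word` using naive suffix rules."""
    forms = set()

    # Rule: words ending in "ies"/"ied"/"ier"/"iest" convert to "y".
    for suffix in ("iest", "ies", "ied", "ier"):
        if word.endswith(suffix):
            forms.add(word[: -len(suffix)] + "y")

    # Rule: drop a trailing "s"/"es"/"ed"/"er"/"est".
    for suffix in ("est", "es", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix):
            forms.add(word[: -len(suffix)])

    # Rule: words ending in "y" convert to "ies"/"ied"/"ier"/"iest".
    if word.endswith("y"):
        for suffix in ("ies", "ied", "ier", "iest"):
            forms.add(word[:-1] + suffix)

    # Rule: try adding "s"/"es"/"er"/"est".
    for suffix in ("s", "es", "er", "est"):
        forms.add(word + suffix)

    forms.discard(word)
    return forms


print(sorted(alternate_forms("car")))
print(sorted(alternate_forms("carries")))
```

For "car" this yields "cars" but also "cares", "carer" and "carest", which illustrates the stated drawback: most of the generated candidates are non-words or unrelated words.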
Consider including the PorterStemFilter in your analysis pipeline. Be sure to perform the same analysis on queries that is used when building the index.
I've also used the Lancaster stemming algorithm with good results. Using the PorterStemFilter as a guide, it is easy to integrate with Lucene.
Word stemming works OK for English; however, for languages where word stemming is nearly impossible (like mine), option #1 is viable. I know of at least one such implementation for my language (Icelandic) for Lucene that seems to work very well.
Some of those look like pretty neat ideas. Personally, I would just add some tags to the query (query transformation) to make it fuzzy, or you can use the built-in FuzzyQuery, which uses Levenshtein edit distances and would also help with misspellings.
Fuzzy search 'query tags' use Levenshtein distance as well. Consider a search for 'car'. If you change the query to 'car~', it will find 'car' and 'cars' and so on. There are other transformations to the query that should handle almost everything you need.
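To show why a fuzzy query for 'car' also matches 'cars', here is a minimal sketch of the Levenshtein edit distance in plain Python. It is the textbook dynamic-programming version, not Lucene's implementation, and how Lucene turns the distance into a match threshold is not shown here.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes or substitutions
    needed to turn string `a` into string `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete from a
                    current[j - 1] + 1,                   # insert into a
                    previous[j - 1] + substitution_cost,  # substitute
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("car", "cars"))
print(levenshtein("car", "card"))
print(levenshtein("carry", "carried"))
```

Note that 'car' is just as close to 'card' as it is to 'cars', so fuzzy matching can pull in unrelated words, while 'carry' and 'carried' are further apart than a one-edit match would allow.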
If you're working in a specialised field (I did this with horticulture) or with a language that doesn't play nicely with normal stemming methods, you could use the query logging to create a manual stemming table.
Just create a word -> stem mapping for all the mismatches you can think of / people are searching for, then when indexing or searching replace any word that occurs in the table with the appropriate stem. Thanks to query caching this is a pretty cheap solution.
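The word -> stem mapping idea might look something like this in outline. It is a plain-Python sketch with a toy inverted index, not Lucene code. The table contents, function names and sample documents are all made up for illustration. The key point is that the same normalization runs at both index time and query time.

```python
from collections import defaultdict

# Hypothetical manual stemming table built from query-log mismatches.
STEM_TABLE = {"cars": "car", "carries": "carry", "carried": "carry"}


def normalize(text):
    """Lowercase, split on whitespace, and replace known words with their stem."""
    return [STEM_TABLE.get(token, token) for token in text.lower().split()]


def build_index(docs):
    """Toy inverted index: normalized term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every normalized query term."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Used cars for sale", 2: "She carried the box", 3: "Gardening tips"}
index = build_index(docs)
print(search(index, "car"), search(index, "carries"))
```

Words that are not in the table pass through unchanged, so the table only needs to cover the mismatches actually seen in the logs.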
Stemming is a pretty standard way to address this issue. I've found that the Porter stemmer is way too aggressive for standard keyword search. It ends up conflating words that have different meanings. Try the KStemmer algorithm.