Hi, I would like to ask which of these two is better: colName AS Name, or Name = colName? How do they differ, which of the two is more convenient and faster, and does either of them make a difference in terms of memory usage or anything else?
Thank you and godspeed!
I always use "AS" when I rename columns, and I use "=" for query conditions. The execution time and the results are the same either way, so use whichever syntax you are comfortable with.
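For illustration, a quick sketch (the table and column names are made up; note that the Name = colName alias form is SQL Server / T-SQL syntax, whereas = in a WHERE clause is a comparison in any dialect):
SELECT colName AS Name FROM myTable;              -- standard alias syntax
SELECT Name = colName FROM myTable;               -- T-SQL alias form; same result, same execution time
SELECT colName FROM myTable WHERE colName = 'x';  -- here = is a condition, not an alias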
Related
Let's say I have the following model
class Person(models.Model):
name = models.CharField(max_length=20, primary_key=True)
So I would have objects in the database like
Person.objects.create(name='alex white')
Person.objects.create(name='alex chen')
Person.objects.create(name='tony white')
I could then subsequently query for all users whose first name is alex or last name is white by doing the following
all_alex = Person.objects.filter(name__startswith='alex')
all_white = Person.objects.filter(name__endswith='white')
I do not know how Django implements this under the hood, but I am going to guess it is with a SQL LIKE 'alex%' or LIKE '%white'
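i.e., I imagine the generated SQL looks roughly like this (the table name is just my guess):
SELECT * FROM myapp_person WHERE name LIKE 'alex%';   -- startswith
SELECT * FROM myapp_person WHERE name LIKE '%white';  -- endswith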
However, according to the MySQL index documentation, the primary key index can only be used (as opposed to a full table scan) if the % appears at the end of the LIKE pattern.
Does that mean that, as the database grows, startswith will remain viable, whereas endswith will not, since it will resort to full table scans?
Am I correct or did I go wrong somewhere? Keep in mind these are not facts but just my deductions that I made from general assumptions - hence why I am asking for confirmation.
Assuming you want AND -- that is only Alex White and not Alex Chen or Tony White, ...
Even better (assuming there is an index starting with name) is
SELECT ...
WHERE name LIKE 'Alex%White'
If Django can't generate that, then it is getting in the way of efficient use of MySQL.
This construct will scan all the names starting with alex, further filtering on the rest of the expression.
If you do want OR (and 3 names), then you are stuck with
SELECT ...
WHERE ( name LIKE 'Alex%'
OR name LIKE '%White' )
And there is no choice but to scan all the names.
In some situations, perhaps this one, FULLTEXT would be better:
FULLTEXT(name) -- This index is needed for the following:
SELECT ...
WHERE MATCH(name) AGAINST('Alex White' IN BOOLEAN MODE) -- for OR
SELECT ...
WHERE MATCH(name) AGAINST('+Alex +White' IN BOOLEAN MODE) -- for AND
(Again, I don't know the Django capabilities.)
Yes, your understanding is correct.
select *
from foo
where bar like 'text1%' and bar like '%text2'
is not necessarily optimal. This could be an improvement:
select *
from (select *
from foo
where foo.bar like 'text1%') t
where t.bar like '%text2'
You need to make measurements to check whether this is actually better. If it is, the reason is that the inner query can use an index, while the outer query cannot; but the outer query runs against a set that has already been prefiltered by the inner query, so there is much less left to scan.
I am not at all a Django expert, so my answer might be wrong, but I believe chaining your filters would be helpful if each filter actually executes a query. If that is the case, then you can use the optimization described above. If filter just prepares a query and chaining filters results in a single query different from the one above, then I recommend using hand-written MySQL. However, if you do not have performance issues yet, then it is premature to optimize, since you cannot really measure how much performance you gained.
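For reference, if the chained filters do collapse into one query, I would expect it to look roughly like this (the table name myapp_person is an assumption on my part):
SELECT *
FROM myapp_person
WHERE name LIKE 'alex%'
  AND name LIKE '%white';
In that case you would need hand-written SQL to get the nested, prefiltered form shown above.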
I need to set values in a "Yes or No" column named STATUS, and I'm thinking about 2 methods.
method 1 (use letter): set value Y/N then find all rows that have value Y in field STATUS by a query like:
SELECT * FROM post WHERE status="Y"
method 2 (use number): set value 1/0 then find all rows that have value 1 in field STATUS by a query like:
SELECT * FROM post WHERE status=1
Should I use method 1 or method 2? Which one is faster? Which one is better?
The two are essentially equivalent, so this becomes a question of which is better for your application.
If you are concerned about space, then the smallest space for one character is char(1), using 8 bits. With a number, you can use the bit or set types to pack multiple flags. But this only makes a difference if you have lots of flags.
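As a rough sketch of those options (table and column names are purely illustrative):
CREATE TABLE post (
    id       INT PRIMARY KEY,
    status_c CHAR(1) NOT NULL DEFAULT 'N',            -- method 1: one character, 1 byte
    status_n TINYINT NOT NULL DEFAULT 0,              -- method 2: 0/1, also 1 byte
    flags    SET('published','featured','archived')   -- up to 8 such flags packed into 1 byte
);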
The store-it-as-a-number approach has a slight advantage, where you can count the "Yes" values by doing:
select sum(status)
(Of course, in MySQL, this is only a marginal improvement on sum(status = 'Y').)
The store-it-as-a-letter approach has a slight advantage if you decide to include "Maybe" or other values at some point in the future.
Finally, any difference in performance in different ways of representing these values is going to be very, very minimal. You would need a table with millions and millions of rows to start to notice a problem. So, use the mechanism that works best for your application and way of representing the value.
The second one is definitely faster, primarily because anything you put inside quotes has to be handled as a string by SQL. It is better to use non-string types in order to get better performance. I would suggest using METHOD 2.
The fastest way would be:
SELECT * FROM post WHERE FIND_IN_SET(`status`, 'y');
I think you should create the column as ENUM('n','y'). MySQL stores this type in an optimal way, and it will also help you store only allowed values in the field.
You can also make it more human-friendly, ENUM('no','yes'), without affecting performance, because the strings 'no' and 'yes' are stored only once per ENUM definition; MySQL stores only the index of the value per row.
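A minimal sketch of that approach (the table name and default are illustrative):
CREATE TABLE post (
    id     INT PRIMARY KEY,
    status ENUM('no','yes') NOT NULL DEFAULT 'no'   -- stored as a small integer index per row
);
SELECT * FROM post WHERE status = 'yes';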
I think method 1 is better if you are concerned with the storage perspective.
Storing an integer (an INT) takes 4 bytes of memory, whereas a character takes only 1 byte, so it's better to use method 1.
This may also improve performance a little.
I have done the SQL beginner and advanced courses at W3Schools successfully, and cannot find any other free advanced online SQL course. I am having a problem with the SQL syntax in the accepted answer in this SO thread, covering the calculation of the median value of a column. My questions are:
After 'from' come two variables. Does that mean that data are selected from two tables, and if so, what would the formula be if I just require the median value of one column of one table?
The OP/TS named the columns 'id' and 'val'. Why then is 'x.val' selected?
The SELECT x.val FROM data x, data y means the table data is cross-joined with itself, so only one table (and one column) is really involved. I'm pretty sure it eventually just finds the median of that one column, and the self-join is a trick to help calculate that median.
To understand this better (and note that I don't totally understand it), try this:
Set up a table with some sample data - say 5-8 rows to begin with
Promote the HAVING values to the SELECT list
Get rid of the HAVING clause
So your query will look something like this:
SELECT x.val, SUM(SIGN(1-SIGN(y.val-x.val))), (COUNT(*)+1)/2
FROM data x, data y
GROUP BY x.val
Then take a look at the results and you'll be able to get more insight into the logic. Also see if you can follow the calculation when you track it row by row.
Finally, note that the query isn't so much advanced as it is specialized. I mean, it is advanced and all, but it's the math gymnastics rather than the query semantics that are probably giving you trouble. Don't sweat it if you don't understand this right off the bat :)
As for why val is selected - that's the column the OP is trying to calculate the median for. The id is probably there because it's generally a good idea to have a PK on every row. It's not needed for the calculation so it's not included in the query.
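If it helps, here is a small worked sketch reconstructed from the steps above (the sample values are made up, and the HAVING line is simply the two promoted expressions moved back and compared):
CREATE TABLE data (id INT PRIMARY KEY, val INT);
INSERT INTO data VALUES (1,2),(2,3),(3,5),(4,7),(5,11);

SELECT x.val
FROM data x, data y                  -- the same table, cross-joined with itself
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2;
-- returns 5, the median of {2, 3, 5, 7, 11}
For each x.val, the SUM counts how many y.val are less than or equal to it, and (COUNT(*)+1)/2 is the middle position, so the row where they match is the median.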
I have a scenario where I need to check for 10,000 different specific names against a table with about 60,000 records of names. Assuming caching is not relevant, generally speaking, for performance purposes, is it better to:
(1) Break up into mini-queries so that there are maybe 200 different names per query?
or
(2) Write one mongocious sql statement with 10,000 "OR" clauses?
You missed out number 3: Do it another way entirely:
I would write the list to a separate table/temp table or something, then filter using a join/exists or whatever.
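A rough sketch of what I mean (all table and column names are made up):
CREATE TEMPORARY TABLE search_names (name VARCHAR(100) PRIMARY KEY);
-- bulk-insert the 10,000 names into search_names, then:
SELECT p.*
FROM people p
JOIN search_names s ON s.name = p.name;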
A first observation is that RDBMSs usually have a limit on the size of the query string, which you might exceed with so many ORs.
So a solution would be to write a stored procedure and do it in a loop.
Ignoring this, given that in case (1) the data would be accessed more times than in case (2), the latter one is preferable.
Or #4 - Use an IN() query in batches. About 1000 usually works pretty well:
SELECT * FROM table WHERE name IN ('str1', 'str2', 'str3', ...)
It's not perfect, but there's no temporary table involved, and MySQL is pretty good about optimizing IN().
I am curious about the disadvantages of quoting integers in MySQL queries.
For example
SELECT col1,col2,col3 FROM table WHERE col1='3';
VS
SELECT col1,col2,col3 FROM table WHERE col1= 3;
If there is a performance cost, what is the size of it and why does it occur? Are there any other disadvantages other than performance?
Thanks
Andrew
Edit: The reason for this question
1. Because I am curious and want to learn the difference
2. I am experimenting with a way of passing composite keys from my database around in my PHP code as pseudo-ID-keys (PIKs). These PIKs are then used to target the record.
For example, given a primary key (AreaCode,Category,RecordDtm)
My PIK in the url would look like this:
index.php?action=hello&Id=20001,trvl,2010:10:10 17:10:45
And I would select this record like this:
$Id = $_POST['Id'];//equals 20001,trvl,2010:10:10 17:10:45
$sql = "SELECT AreaCode,Category,RecordDtm,OtherColumns.... FROM table WHERE (AreaCode,Category,RecordDtm) = ({$Id});
$mysqli->query($sql):
......and so on.
At this point the query won't work because of the datetime (which must be quoted), and it is open to SQL injection because I haven't escaped those values. Given that I won't always know how my PIKs are constructed, I would write a function that splits the Id PIK at the commas, cleans each part with real_escape_string, and puts it back together with the values quoted. For example:
$Id = "'20001','trvl','2010:10:10 17:10:45'"
Of course, in this function that is breaking apart and cleaning the Id, I could check whether each value is a number or not. If it is a number, don't quote it; if it is anything but a number, quote it.
There is a performance cost whenever MySQL needs to do a type conversion from whatever you give it to the datatype of the column. So with your query
SELECT col1,col2,col3 FROM table WHERE col1='3';
If col1 is not a string type, MySQL needs to convert '3' to that type. This type of query isn't really a big deal, as the performance overhead of that conversion is negligible.
However, consider doing the same thing when, say, joining two tables that have several million rows each. If the columns in the ON clause are not the same datatype, then MySQL will have to convert several million rows every single time you run your query, and that is where the performance overhead comes in.
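A hypothetical illustration (the tables and columns are invented):
-- orders.customer_id is INT, legacy_customers.id is VARCHAR(10)
SELECT o.*
FROM orders o
JOIN legacy_customers c ON c.id = o.customer_id;
-- The mismatched types force MySQL to convert values on every compared row,
-- and an index on the VARCHAR column typically cannot be used for the join.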
Strings also have a different sort order from numbers.
Compare:
SELECT 312 < 41
(yields 0, because 312 numerically comes after 41)
to:
SELECT '312' < '41'
(yields 1, because '312' lexicographically comes before '41')
Depending on the way your query is built, using quotes might give wrong results or none at all.
Numbers should be used as such, so never use quotes unless you have a special reason to do so.
In my opinion, there is no performance/size cost in the case you have mentioned. Even if there is, it is very much negligible and won't affect your application as such.
It gives the wrong impression about the data type for the column. As an outsider, I assume the column in question is CHAR/VARCHAR & choose operations accordingly.
Otherwise MySQL, like most other databases, will implicitly convert the value to whatever the column data type is. There's no performance issue with this that I'm aware of but there's a risk that supplying a value that requires explicit conversion (using CAST or CONVERT) will trigger an error.