I am running this MySQL select query:
select * from ABC where column_value=1;
I expect to get output like this:
ID Name
1 AAA
2 BBB
3 CCC
But instead I am getting this:
ID Name
2 BBB
1 AAA
3 CCC
Can anyone give me an idea why MySQL is behaving like this?
Databases tend to use the fastest way to read data from tables. This means it may return data in any order if it finds it faster, unless you use an ORDER BY clause.
select * from ABC where column_value=1;
The query doesn't specify any sorting of the returned rows.
SQL is a language that handles sets of tuples, and a set is, by definition, an unordered collection of items.
The fact that, under some circumstances, one database engine or another returns the rows in a certain order (sorted by the value of the PK, for example) is an implementation detail. It is not required by the language and it can change at any time.
Even more, when the query doesn't specify an order for the returned rows, the database engine uses whatever method it finds more appropriate to get them fast. The order may depend on external factors and it may change over time. For example, if you remove from the table the rows returned by the query then insert them again but in a different order, a subsequent run of the same query may (and it most probably does) return the rows in a different order than before.
As an insight (that is neither exact, nor reliable), for a query that doesn't contain an ORDER BY clause over a small table, the database returns the rows in the order it finds them in the table data because it doesn't read the index.
For small tables the engine skips reading the index when it is not needed and goes directly to the table data. This way it spares a disk access that doesn't provide any additional value to the processing.
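To make the point concrete, here is a small sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained), and the table contents are made up. The only thing it is meant to show is that the query with ORDER BY has a fixed result order regardless of how the rows were inserted.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABC (ID INTEGER, Name TEXT, column_value INTEGER)")

# Insert the rows deliberately out of ID order.
conn.executemany(
    "INSERT INTO ABC (ID, Name, column_value) VALUES (?, ?, ?)",
    [(2, "BBB", 1), (1, "AAA", 1), (3, "CCC", 1)],
)

# No ORDER BY: the engine is free to return the rows in any order.
unordered = conn.execute(
    "SELECT ID, Name FROM ABC WHERE column_value = 1"
).fetchall()

# With ORDER BY: the order is part of what the query asks for.
ordered = conn.execute(
    "SELECT ID, Name FROM ABC WHERE column_value = 1 ORDER BY ID"
).fetchall()

print(unordered)
print(ordered)
```

Whatever the first print shows on your machine, only the second result is something you can rely on.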
Rows often come back in primary key order when no ORDER BY is given, but that is not guaranteed. Is it possible that you have deleted some rows?
select * from ABC where column_value=1 order by ID;
Use an ORDER BY clause to obtain the desired result.
I have a MySQL InnoDB table where I'm performing a lot of selects using different columns. I thought that adding an index on each of those fields could help performance, but after reading a bit on indexes I'm not sure if adding an index on a column you select on always helps.
I have far more selects than inserts/updates happening in my case.
My table 'students' looks like:
id | student_name | nickname | team | time_joined_school | honor_roll
and I have the following queries:
# The team column is varchar(32), and only has about 20 different values.
# The honor_roll field is a smallint and is only either 0 or 1.
1. select * from students where team = '?' and honor_roll = ?;
# The student_name field is varchar(32).
2. select * from students where student_name = '?';
# The nickname field is varchar(64).
3. select * from students where nickname like '%?%';
All the results are ordered by time_joined_school, which is a bigint(20).
So I was just going to add an index on each of the columns, does that make sense in this scenario?
Thanks
Indexes help the database more efficiently find the data you're looking for. Which is to say you don't need an index simply because you're selecting a given column, but instead you (generally) need an index for columns you're selecting based on - i.e. using a WHERE clause (even if you don't end up including the searched column in your result).
Broadly, this means you should have indexes on columns that segregate your data in logical ways, and not on extraneous, simply informative columns. Before looking at your specific queries, all of these columns seem like reasonable candidates for indexing, since you could reasonably construct queries around these columns. Examples of columns that would make less sense would be things like phone_number, address, or student_notes - you could index such columns, but generally you don't need or want to.
Specifically based on your queries, you'll want student_name, team, and honor_roll to be indexed, since you're defining WHERE conditions based on the values of these columns. You'll also benefit from indexing time_joined_school if, as you suggest, you're ORDER BYing your queries based on that column. Your LIKE query is not actually easy for most RDBMSs to handle: because the pattern starts with a wildcard, a regular index on nickname won't help. Check out How to speed up SELECT .. LIKE queries in MySQL on multiple columns? for more.
Note also that the ratio of SELECT to INSERT is not terribly relevant for deciding whether to use an index or not. Even if you only populate the table once, and it's read-only from that point on, SELECTs will run faster if you index the correct columns.
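If you want to check this kind of thing yourself, ask the database how it plans to run each query. Below is a rough sketch in Python using the standard-library sqlite3 module as a stand-in (an assumption: in MySQL you would run EXPLAIN on the query instead, and its output looks different). The index names idx_team_honor and idx_nickname are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the index names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE students (
           id INTEGER PRIMARY KEY,
           student_name TEXT,
           nickname TEXT,
           team TEXT,
           time_joined_school INTEGER,
           honor_roll INTEGER
       )"""
)
conn.execute("CREATE INDEX idx_team_honor ON students (team, honor_roll)")
conn.execute("CREATE INDEX idx_nickname ON students (nickname)")


def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


equality_plan = plan(
    "SELECT * FROM students WHERE team = ? AND honor_roll = ?", ("red", 1)
)
wildcard_plan = plan(
    "SELECT * FROM students WHERE nickname LIKE ?", ("%bob%",)
)

print(equality_plan)  # expected to mention idx_team_honor
print(wildcard_plan)  # expected to be a full scan despite idx_nickname
```

The equality lookup can use the two-column index, while the pattern with a leading wildcard falls back to scanning every row.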
Yes, indexes help accelerate your queries.
In your case you should have index on:
1) Team and honor_roll from query 1 (only 1 index with 2 fields)
2) student_name
3) time_joined_school from order
For query 3 you can't use an index because the LIKE pattern starts with a wildcard. Hope this helps.
I am trying to include a limitation in a MySQL SELECT query.
My database is structured in a way, that if a record is found in column one then only 5000 max records with the same name can be found after that one.
Example:
mark
..mark repeated 5000 times
john
anna
..other millions of names
So in this table it would be more efficient to find the first Mark, and then continue searching at most 5000 rows down from that one.
Is it possible to do something like this?
Just make a btree index on the name column:
CREATE INDEX name ON your_table(name) USING BTREE
and MySQL will silently do exactly what you want each time it looks for a name.
Try with:
SELECT name
FROM table
ORDER BY (name = 'mark') DESC
LIMIT 5000
Basically you sort mark first, then the rest follows and gets limited.
It's actually quite difficult to understand your desired output, but I think this might be heading in the right direction?
(SELECT name
FROM table
WHERE name = 'mark'
LIMIT 5000)
UNION ALL
(SELECT name
FROM table
WHERE name != 'mark'
ORDER BY name)
This will first get up to 5000 records with the name mark and then get the remainder. You can add a LIMIT to the second query if required. UNION ALL is used rather than plain UNION because UNION removes duplicate rows, which would collapse the repeated names. Note that MySQL may ignore an ORDER BY inside a UNION branch that has no LIMIT.
For performance you should ensure that the columns used by ORDER BY and WHERE are indexed accordingly.
If you make sure that the column is properly indexed, MySQL will take care of optimisation for you.
Edit:
Thinking about it, I figured that this answer is only useful if I specify how to do that. user nobody beat me to the punch: CREATE INDEX name ON your_table(name) USING BTREE
This is exactly what database indexes are designed to do; this is what they are for. MySQL will use the index itself to optimise the search.
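A quick way to convince yourself is to look at the query plan. This is only a sketch: it uses Python's standard-library sqlite3 module instead of MySQL (an assumption made to keep it runnable), and the table name people, the index name idx_name and the data are all made up.

```python
import sqlite3

# SQLite stands in for MySQL; table, index and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

names = ["mark"] * 5 + ["john", "anna"] + [f"user{i}" for i in range(1000)]
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])
conn.execute("CREATE INDEX idx_name ON people (name)")

query = "SELECT name FROM people WHERE name = ?"

# Ask the planner how it would execute the lookup.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("mark",)).fetchall()
plan_text = " | ".join(row[-1] for row in plan_rows)

marks = conn.execute(query, ("mark",)).fetchall()

print(plan_text)   # expected to mention idx_name rather than a full scan
print(len(marks))
```

With the index in place the engine jumps straight to the matching entries and stops when the name changes, so there is no need to hand-code a "search 5000 rows down" rule.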
In my database I have some records where I am sorting by a column that contains identical values:
| col1 | timestamp |
| row1 | 2011-07-01 00:00:00 |
| row2 | 2011-07-01 00:00:00 |
| row3 | 2011-07-01 00:00:00 |
SELECT ... ORDER BY timestamp
It looks like the result is in random order.
Is the random order consistent? I have this data on two MySQL servers. Can I expect the same result on both?
I'd advise against making that assumption. In standard SQL, anything not required by an explicit ORDER BY clause is implementation dependent.
I can't speak for MySQL, but on e.g. SQL Server, the output order for rows that are "equal" so far as the ORDER BY is concerned may vary every time the query is run - and could be influenced by practically anything (e.g. patch/service pack level of the server, workload, which pages are currently in the buffer pool, etc).
So if you need a specific order, the best thing you can do (both to guarantee it, and to document your query for future maintainers) is explicitly request the ordering you want.
Lots of answers already, but the bottom line answer is NO.
If you want rows returned in a particular sequence, consistently, then specify that in an ORDER BY. Without that, there is absolutely NO GUARANTEE what order rows will be returned in.
I think what you may be missing is that there can be multiple expressions listed in the ORDER BY clause. And you can include expressions that are not in the SELECT list.
In your case, for example, you could use ORDER BY timestamp, id.
(Or some other columns or expressions.)
That will order the rows first on timestamp, and then any rows that have the same value for timestamp will be ordered by id, or whatever the next expression in this list is.
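Here is a minimal sketch of that tie-breaker idea in Python, with the standard-library sqlite3 module standing in for MySQL (an assumption for the sake of a self-contained example). The id column is also an assumption, since the table in the question does not show one.

```python
import sqlite3

# SQLite stands in for MySQL; the id column is assumed, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, col1 TEXT, timestamp TEXT)"
)

# All rows share the same timestamp and are inserted out of id order.
conn.executemany(
    "INSERT INTO my_table (id, col1, timestamp) VALUES (?, ?, ?)",
    [
        (3, "row3", "2011-07-01 00:00:00"),
        (1, "row1", "2011-07-01 00:00:00"),
        (2, "row2", "2011-07-01 00:00:00"),
    ],
)

# timestamp alone cannot separate these rows; the unique id breaks the tie.
rows = conn.execute(
    "SELECT col1 FROM my_table ORDER BY timestamp, id"
).fetchall()

print(rows)
```

Because id is unique, the combination (timestamp, id) leaves no ties, so the same query should give the same order on any server holding the same data.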
The answer is: No, the order won't be consistent. I faced the same issue and solved it by adding another column to the order section. Be sure that this column is unique for each record like 'ID' or whatever it is.
In this case, you must add the 'ID' field to your table which is unique for each record. You can assign it 'AI' (auto increment) so that you are not going to deal with the maintenance.
After adding the 'ID' column, update the last part of your query like:
SELECT mt.*
FROM my_table mt
ORDER BY mt.timestamp ASC, mt.id DESC
When rows have equal values in the ORDER BY column, their relative order is not defined. For example, suppose you order by a frequency column that stores word frequencies, and two different words have the same frequency. With "select * from table_name ORDER BY frequency", either of the two words may come first. On a second run, the word that came second may now come first. It depends on buffer memory, which pages are being swapped in and out at the moment, etc.
That depends on the storage engine used. In MyISAM they'll be in natural order (i.e. the order in which they're stored on disk, which can be changed using the ALTER TABLE ... ORDER BY command). In InnoDB they'll typically come back in PK order. Other engines can have their own rules.
I am trying to find a way to get a random selection from a large dataset.
We expect the set to grow to ~500K records, so it is important to find a way that keeps performing well while the set grows.
I tried a technique from: http://forums.mysql.com/read.php?24,163940,262235#msg-262235 But it's not exactly random and it doesn't play well with a LIMIT clause: you don't always get the number of records that you want.
So I thought, since the PK is auto_increment, I could just generate a list of random IDs and use an IN clause to select the rows I want. The problem with that approach is that sometimes I need a random set of data with records having a specific status, a status that is found in at most 5% of the total set. To make that work I would first need to find out which IDs have that specific status, so that's not going to work either.
I am using mysql 5.1.46, MyISAM storage engine.
It might be important to know that the query to select the random rows is going to be run very often and the table it is selecting from is appended to frequently.
Any help would be greatly appreciated!
You could solve this with some denormalization:
Build a secondary table that contains the same pkeys and statuses as your data table
Add and populate a status group column which will be a kind of sub-pkey that you auto number yourself (1-based autoincrement relative to a single status)
Pkey  Status  StatusPkey
1     A       1
2     A       2
3     B       1
4     B       2
5     C       1
...   C       ...
n     C       m (where m = # of C statuses)
When you don't need to filter you can generate rand #s on the pkey as you mentioned above. When you do need to filter then generate rands against the StatusPkeys of the particular status you're interested in.
There are several ways to build this table. You could have a procedure that you run on an interval or you could do it live. The latter would be a performance hit though, since calculating the StatusPkey could get expensive.
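As a rough illustration of the idea, here is a sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption). The table names data and status_lookup and the helper functions are hypothetical, and the rebuild is the simple "run on an interval" variant rather than a live one.

```python
import random
import sqlite3

# SQLite stands in for MySQL; all table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (pkey INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO data (pkey, status) VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "C")],
)

# Secondary table: same pkeys and statuses plus a per-status counter.
conn.execute(
    "CREATE TABLE status_lookup ("
    "pkey INTEGER PRIMARY KEY, status TEXT, status_pkey INTEGER)"
)
conn.execute(
    "CREATE UNIQUE INDEX idx_status_lookup ON status_lookup (status, status_pkey)"
)


def rebuild_lookup():
    """Renumber every status from 1 upwards (the batch variant)."""
    conn.execute("DELETE FROM status_lookup")
    counters = {}
    rows = conn.execute("SELECT pkey, status FROM data ORDER BY pkey").fetchall()
    for pkey, status in rows:
        counters[status] = counters.get(status, 0) + 1
        conn.execute(
            "INSERT INTO status_lookup (pkey, status, status_pkey) VALUES (?, ?, ?)",
            (pkey, status, counters[status]),
        )


def random_pkey_for(status):
    """Pick a random pkey among rows with the given status, or None."""
    (m,) = conn.execute(
        "SELECT MAX(status_pkey) FROM status_lookup WHERE status = ?", (status,)
    ).fetchone()
    if m is None:
        return None
    pick = random.randint(1, m)  # inclusive on both ends
    (pkey,) = conn.execute(
        "SELECT pkey FROM status_lookup WHERE status = ? AND status_pkey = ?",
        (status, pick),
    ).fetchone()
    return pkey


rebuild_lookup()
print(random_pkey_for("B"))
```

Since the StatusPkey values have no gaps within a status, a single random number always hits exactly one row, which is what makes the lookup cheap.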
Check out this article by Jan Kneschke... It does a great job at explaining the pros and cons of different approaches to this problem...
You can do this efficiently, but you have to do it in two queries.
First get a random offset scaled by the number of rows that match your 5% conditions:
SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM MyTable WHERE ...conditions...))
This returns an integer. Next, use the integer as an offset in a LIMIT expression:
SELECT * FROM MyTable WHERE ...conditions... LIMIT 1 OFFSET ?
Not every problem must be solved in a single SQL query.
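For what it's worth, here is how the two-step approach might look from application code. This is a sketch in Python with the standard-library sqlite3 module standing in for MySQL (an assumption); the status column, the sample data and the random_row helper are made up. The random offset is drawn on the client side here, in the range 0 to count - 1, so that it always lands on an existing row.

```python
import random
import sqlite3

# SQLite stands in for MySQL; the schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO MyTable (id, status) VALUES (?, ?)",
    [(i, "rare" if i % 20 == 0 else "common") for i in range(1, 201)],
)


def random_row(status):
    """Return one random row with the given status, or None if there is none."""
    # Query 1: how many rows match the condition?
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM MyTable WHERE status = ?", (status,)
    ).fetchone()
    if count == 0:
        return None
    # 0 <= offset < count, so the offset never runs past the last match.
    offset = random.randrange(count)
    # Query 2: fetch the row at that offset among the matching rows.
    return conn.execute(
        "SELECT id, status FROM MyTable WHERE status = ? "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (status, offset),
    ).fetchone()


row = random_row("rare")
print(row)
```

The ORDER BY id is there so that the offset refers to a stable position; to get several random rows you would repeat the second query with different offsets.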
This is going to be one of those questions but I need to ask it.
I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.
With my current knowledge, I see two options (pseudo code):
[id = primary key]
OPTION 1:
SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.
OPTION 2:
SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.
Is either of these preferable for any reason, or is there perhaps an even better solution?
Thanks.
If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.
create index my_unique_index on table(x)
If you want to enforce the uniqueness, a plain index is not enough; you must have
create unique index my_unique_index on table(x)
Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on.
For clarity, I would write
select count(*) from table where x = ?
I would avoid LIMIT 1 for two other reasons:
It is non-standard SQL. I am not religious about that; use the MySQL-specific stuff where necessary (e.g. for paging data), but it is not necessary here.
If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.
AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.
Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.
If I were to compare speeds, I would say not doing a count in MySQL would be faster. I don't have any proof, but my guess would be that MySQL has to get all of the rows and then count how many there are. Although... on second thought, it would have to do that in the first option as well so the code will know how many rows there are. But since you have COUNT(id) vs COUNT(*), I would say it might be slightly slower.
Intuitively, the first one could be faster since it can abort the table (or index) scan when it finds the first value. But you should retrieve x, not id, since if the engine is using an index on x, it doesn't need to go to the block where the row actually is.
Another option could be:
select exists(select 1 from mytable where x = ?) from dual
Which already returns a boolean.
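A small sketch of how the EXISTS form could be used from code, in Python with the standard-library sqlite3 module standing in for MySQL (an assumption, as is the row_exists helper and the index name idx_x). The "from dual" part is left out because SQLite has no such table.

```python
import sqlite3

# SQLite stands in for MySQL; helper and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE INDEX idx_x ON mytable (x)")
conn.executemany("INSERT INTO mytable (x) VALUES (?)", [(1,), (2,), (3,)])


def row_exists(x):
    """Return True if at least one row has the given x value."""
    (flag,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM mytable WHERE x = ?)", (x,)
    ).fetchone()
    # The database hands back 1 or 0; convert it to a Python bool.
    return bool(flag)


print(row_exists(1))
print(row_exists(42))
```

The query always returns exactly one row, so the calling code never has to check whether a result set is empty.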
Typically, you use a GROUP BY ... HAVING clause to determine whether there are duplicate rows in a table. Say you have a table with id and name columns, where id is the primary key, and you want to know whether name is unique or repeated. You would use
select name, count(*) as total from mytable group by name having total > 1;
The above will return the names which are repeated and the number of times each one occurs.
If you just want one query to get your answer as true or false, you can use a nested query, e.g.
select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;
The above should return true, if your table has duplicate rows, otherwise false.
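Both queries can be tried out with a few rows of sample data. The sketch below is Python with the standard-library sqlite3 module standing in for MySQL (an assumption), so it writes HAVING COUNT(*) > 1 and a plain comparison instead of the MySQL-specific if(...); the sample names are made up.

```python
import sqlite3

# SQLite stands in for MySQL; the sample names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

# Which names are repeated, and how many times does each occur?
duplicates = conn.execute(
    "SELECT name, COUNT(*) AS total FROM mytable "
    "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# Single true/false answer: is there at least one repeated name?
(flag,) = conn.execute(
    "SELECT COUNT(*) >= 1 FROM ("
    "SELECT name FROM mytable GROUP BY name HAVING COUNT(*) > 1"
    ") a"
).fetchone()
has_duplicates = bool(flag)

print(duplicates)
print(has_duplicates)
```

Here carol appears once and is therefore left out of the first result, while the second query only reports whether any repeated name exists.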