Questionable SQL practice - Order By id rather than creation time - mysql

So I have an interesting question that I am not sure is considered a 'hack' or not. I looked through some questions but did not find a duplicate so here it is. Basically, I need to know if this is unreliable or considered bad practice.
I have a very simple table with a unique auto incrementing id and a created_at timestamp.
(simplified version of my problem to clarify the concept in question)
+----+---------------------+
| id | created_at          |
+----+---------------------+
| 1  | 2012-12-11 20:35:19 |
| 2  | 2012-12-12 20:35:19 |
| 3  | 2012-12-13 20:35:19 |
| 4  | 2012-12-14 20:35:19 |
+----+---------------------+
Both of these columns are added dynamically so it can be said that a new 'insert' will ALWAYS have a greater id and ALWAYS have a greater date.
OBJECTIVE -
very simply grab the results ordered by created_at in descending order
SOLUTION ONE - A query that orders by date in descending order
SELECT * FROM tablename
ORDER BY created_at DESC
SOLUTION TWO - A query that orders by ID in descending order
SELECT * FROM tablename
ORDER BY id DESC
Is solution two considered bad practice? Or is solution two the proper way of doing things? Any explanation of your reasoning would be very helpful, as I am trying to understand the concept, not just get an answer. Thanks in advance.

In typical practice you can almost always assume that an autoincrement id can be sorted to give you the records in creation order (either direction). However, you should note that this is not considered portable in terms of your data. You might move your data to another system where the keys are recreated, but the created_at data is the same.
There is actually a pretty good StackOverflow discussion of this issue.
The basic summary is the first solution, ordering by created_at, is considered best practice. Be sure, however, to properly index the created_at field to give the best performance.

You shouldn't rely on ID for anything other than that it uniquely identifies a row. It's an arbitrary number that only happens to correspond to the order in which the records were created.
Say you have this table
ID creation_date
1 2010-10-25
2 2010-10-26
3 2012-03-05
In this case, sorting on ID instead of creation_date works.
Now in the future you realize, oh, whoops, you have to change the creation date of record ID #2 to 2010-09-17. Your sorts using ID still report the records in the same order:
1 2010-10-25
2 2010-09-17
3 2012-03-05
even though with the new date they should be:
2 2010-09-17
1 2010-10-25
3 2012-03-05
Short version: Use data columns for the purpose that they were created. Don't rely on side effects of the data.
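The divergence described above can be reproduced with a small script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the table name events is hypothetical. The ORDER BY behaviour shown is plain SQL and should carry over.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "events" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, creation_date TEXT)")
conn.executemany(
    "INSERT INTO events (id, creation_date) VALUES (?, ?)",
    [(1, "2010-10-25"), (2, "2010-10-26"), (3, "2012-03-05")],
)


def ids(order_by):
    # order_by is a fixed column name chosen in this script, never user input.
    query = f"SELECT id FROM events ORDER BY {order_by}"
    return [row[0] for row in conn.execute(query)]


# Before the correction both orderings agree.
before_by_id, before_by_date = ids("id"), ids("creation_date")

# Correct the creation date of record 2, as in the example above.
conn.execute("UPDATE events SET creation_date = '2010-09-17' WHERE id = 2")

# After the correction only the date ordering reflects the new data.
after_by_id, after_by_date = ids("id"), ids("creation_date")

print("before:", before_by_id, before_by_date)
print("after: ", after_by_id, after_by_date)
```

After the update, sorting by id still returns 1, 2, 3, while sorting by creation_date returns 2, 1, 3.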

There are a couple of differences between the two options.
The first is that they can give different results.
The value of created_at might be affected by the time being adjusted on the server but the id column will be unaffected. If the time is adjusted backwards (either manually or automatically by time synchronization software) you could get records that were inserted later but with timestamps that are before records that were inserted earlier. In this case you will get a different order depending on which column you order by. Which order you consider to be "correct" is up to you.
The second is performance. It is likely to be faster to ORDER BY your clustered index.
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads.
By default the clustered key is the primary key, which in your case is presumably the id column. You will probably find that ORDER BY id is slightly faster than ORDER BY created_at.

Primary keys, especially of surrogate type, do not usually represent any kind of meaningful data aside from the fact that their mere function is to allow for uniquely identifiable records. Since dates in this case do represent meaningful data that has meaning outside of its primary function I'd say sorting according to dates is a more logical approach here.

Ordering by id orders by insertion order.
If you have use cases where insertion can be delayed, for example a batch process, then you must order by created_at to order by time.
Both are acceptable if they meet your needs.

Related

mysql optimization - copy or serialized old rows

Suppose I have a simple table with these columns:
| id | user_id | order_id |
About 1,000,000 rows are inserted into this table per month, and, as is clear, the relation between user_id and order_id is 1 to M.
The records from the last month are needed for accounting; the others are only used to show order histories to users. To archive records older than the past month, I have two options in mind:
First, create a similar table and copy old records to it each month. It will get bigger and bigger each month as orders grow.
Second, create a table like the one below:
| id | user_id | order_ids |
and each month, for each row to be inserted into this table, if the user_id already exists, just update order_ids by appending the new order_id to the end.
With this solution, the number of rows in the table grows with the number of users.
Suppose that for each solution we have an index on user_id.
Now the question is: which one is better optimized for SELECTing all order_ids per user when the server is under load?
The first one has many more records than the second, but the second needs some programming language to split order_ids.
The first choice is the better choice from among the two you have shown. With respect, I should say your second choice is a terrible idea.
MySQL (like all SQL database systems) is excellent at handling very large numbers of rows of uniformly laid out (that is, normalized) data.
But, your best choice is to do nothing except create appropriate indexes to make it easy to look up order history by date or by user. Leave all your data in this table and optimize lookup instead.
Until this table contains at least fifty million rows (at least four years' worth of data), the time you spend reprogramming your system to allow it to be split into a current and an archive version will be far more costly than just keeping it together.
If you want help figuring out which indexes you need, you should ask another question showing your queries. It's not clear from this question how you look up orders by date.
In a 1:many relationship, don't make an extra table. Instead have the user_id be a column in the Orders table. Furthermore, this is likely to help performance:
PRIMARY KEY(user_id, order_id),
INDEX(order_id)
Is a "month" a calendar month? Or "30 days ago until now"?
If it is a calendar month, consider PARTITION BY RANGE(TO_DAYS(datetime)) and have an ever-increasing list of monthly partitions. However, do not create future months in advance; create them just before they are needed. More details: http://mysql.rjweb.org/doc.php/partitionmaint
Note: This would require adding datetime to the end of the PK.
At 4 years' worth of data (48 partitions), it will be time to rethink things. (I recommend not going much beyond that number of partitions.)
Read about "transportable tablespaces". This may become part of your "archiving" process.
Use InnoDB.
With that partitioning, either of these becomes reasonably efficient:
WHERE user_id = 123
AND datetime > CURDATE() - INTERVAL 30 DAY
WHERE user_id = 123
AND datetime >= '2017-11-01' -- or whichever start-of-month you need
Each of the above will hit at most one non-empty partition more than the number of months desired.
If you want to discuss this more, please provide SHOW CREATE TABLE (in any variation), plus some of the important SELECTs.

Using index with IN clause and ordering by primary key

I am having a problem with the following task using MySQL. I have a table Records(id, enterprise, department, status), where id is the primary key, enterprise and department are foreign keys, and status is an integer value (0 - CREATED, 1 - APPROVED, 2 - REJECTED).
Now, usually the application needs to filter for a concrete enterprise, department and status:
SELECT * FROM Records WHERE status = 0 AND enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
The order by is required, since I have to provide the user with the most recent records. For this query I have created an index (enterprise, department, status), and everything works fine. However, for some privileged users the status should be omitted:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
This obviously breaks the index - it's still good for filtering, but not for sorting. So, what should I do? I don't want to create a separate index (enterprise, department), so what if I modify the query like this:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;
MySQL definitely does use the index now, since it's provided with values of status, but how quick will the sorting by primary key be? Will it take the recent 10 values for each status available, and then merge them, or will it first merge the ids for each status together, and only after that take the first ten (this way it's gonna be much slower I guess).
All of the queries will benefit from one composite index:
INDEX(enterprise, department, status, id)
enterprise and department can be swapped, but keep the rest of the columns in that order.
The first query will use that index for both the WHERE and the ORDER BY, and will thereby be able to find the 10 rows without scanning the table or doing a sort.
The second query is missing status, so my index is less than perfect. This would be better:
INDEX(enterprise, department, id)
At that point, it works like above. (Note: If the table is InnoDB, then this 3-column index is identical to your 2-column INDEX(enterprise, department) -- the PK is silently included.)
The third query gets dicier because of the IN. Still, my 4-column index will be nearly the best. It will use the first 3 columns, but it will not be able to do the ORDER BY id, so it won't use id. And it won't be able to consume the LIMIT. Hence the EXPLAIN will say Using temporary and/or Using filesort. Don't worry, performance should still be nice.
My second index is not as good for the third query.
See my Index Cookbook.
"How quick will sorting by id be"? That depends on three things:
Whether the sort can be avoided (see above);
How many rows are in the query without the LIMIT;
Whether you are selecting TEXT columns.
I was careful to say whether the INDEX is used all the way through the ORDER BY, in which case there is no sort, and the LIMIT is folded in. Otherwise, all the rows (after filtering) are written to a temp table, sorted, then 10 rows are peeled off.
The "temp table" I just mentioned is necessary for various complex queries, such as those with subqueries, GROUP BY, ORDER BY. (As I have already hinted, sometimes the temp table can be avoided.) Anyway, the temp table comes in 2 flavors: MEMORY and MyISAM. MEMORY is favorable because it is faster. However, TEXT (and several other things) prevent its use.
If MEMORY is used then Using filesort is a misnomer -- the sort is really an in-memory sort, hence quite fast. For 10 rows (or even 100) the time taken is insignificant.

If your table has more selects than inserts, are indexes always beneficial?

I have a mysql innodb table where I'm performing a lot of selects using different columns. I thought that adding an index on each of those fields could help performance, but after reading a bit on indexes I'm not sure if adding an index on a column you select on always helps.
I have far more selects than inserts/updates happening in my case.
My table 'students' looks like:
id | student_name | nickname | team | time_joined_school | honor_roll
and I have the following queries:
# The team column is varchar(32), and only has about 20 different values.
# The honor_roll field is a smallint and is only either 0 or 1.
1. select * from students where team = '?' and honor_roll = ?;
# The student_name field is varchar(32).
2. select * from students where student_name = '?';
# The nickname field is varchar(64).
3. select * from students where nickname like '%?%';
all the results are ordered by time_joined_school, which is a bigint(20).
So I was just going to add an index on each of the columns, does that make sense in this scenario?
Thanks
Indexes help the database more efficiently find the data you're looking for. Which is to say you don't need an index simply because you're selecting a given column, but instead you (generally) need an index for columns you're selecting based on - i.e. using a WHERE clause (even if you don't end up including the searched column in your result).
Broadly, this means you should have indexes on columns that segregate your data in logical ways, and not on extraneous, simply informative columns. Before looking at your specific queries, all of these columns seem like reasonable candidates for indexing, since you could reasonably construct queries around these columns. Examples of columns that would make less sense would be things like phone_number, address, or student_notes - you could index such columns, but generally you don't need or want to.
Specifically based on your queries, you'll want student_name, team, and honor_roll to be indexed, since you're defining WHERE conditions based on the values of these columns. You'll also benefit from indexing time_joined_school if, as you suggest, you're ORDER BYing your queries based on that column. Your LIKE query starts its pattern with a wildcard ('%?%'), so an ordinary B-tree index cannot be used to narrow the search, and indexing nickname won't help. Check out How to speed up SELECT .. LIKE queries in MySQL on multiple columns? for more.
Note also that the ratio of SELECT to INSERT is not terribly relevant for deciding whether to use an index or not. Even if you only populate the table once, and it's read-only from that point on, SELECTs will run faster if you index the correct columns.
Yes, indexes help accelerate your queries.
In your case you should have index on:
1) Team and honor_roll from query 1 (only 1 index with 2 fields)
2) student_name
3) time_joined_school from order
For query 3 you can't use an index, because the LIKE pattern starts with a wildcard. Hope this helps.
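As a rough illustration of the index list above, the sketch below creates the three suggested indexes and asks the planner which one it picks for queries 1 and 2. It runs against an in-memory SQLite database through Python's sqlite3 module, purely as a stand-in: MySQL's optimizer and its EXPLAIN output differ, and the index names (idx_team_honor and so on) are made up for this example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        student_name TEXT,
        nickname TEXT,
        team TEXT,
        time_joined_school INTEGER,
        honor_roll INTEGER
    );
    -- 1) one two-column index for query 1
    CREATE INDEX idx_team_honor ON students (team, honor_roll);
    -- 2) single-column index for query 2
    CREATE INDEX idx_student_name ON students (student_name);
    -- 3) index on the column used in ORDER BY
    CREATE INDEX idx_time_joined ON students (time_joined_school);
    """
)


def plan(sql, params=()):
    # The fourth column of EXPLAIN QUERY PLAN holds the readable plan detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


plan_q1 = plan(
    "SELECT * FROM students WHERE team = ? AND honor_roll = ?", ("red", 1)
)
plan_q2 = plan("SELECT * FROM students WHERE student_name = ?", ("Ann",))

print(plan_q1)
print(plan_q2)
```

On MySQL the equivalent check would be to run EXPLAIN on your real queries and look at the key column.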

How to prevent MySQL selecting one index when a better one is available?

I have a table with 30,000 rows (and growing), which I join with another table. On some pages, I need to run some 100+ of those queries, and things get slow. If I EXPLAIN the query, I notice that one table uses a primary key and is fast, but the other table is using one of its indexes, which is not the best one. Here's an overview:
SIMPLE | acc_entries | ref | ledger,date,type,status,status_ledger_date_type | type | 1 | const | 15359 | Using where
This is a sample query:
SELECT SUM(usd) AS total FROM acc_entries
LEFT JOIN acc_ledgers ON acc_entries.ledger = acc_ledgers.id
WHERE acc_entries.status = 1 AND
acc_ledgers.account = 3004 AND
date >= '2011-01-01' AND
date <= '2011-08-30' AND
type = 'credit'
As you can see, I am using in my WHERE the fields status, ledger (which is the field that joins with acc_ledgers.account), date and type. All of these fields have indexes. However, there is also a specific index that covers all of them, in that same order. It is called status_ledger_date_type, and as you can see it is one of the indexes that MySQL considers using. However, in the end MySQL opts to use type as an index. This has some 15,000 possible rows (half of the table), whereas the combined index only covers a fraction of this. So my question is: why does MySQL select this index when a better one is available, and how can I prevent this?
You can try using index hints to force the use of your desired index.
MySql docs on Index Hints
The Battle Between Force Index and the Query Optimizer
7 ways to convince MySQL to use the right index
Actually, you want your index based on your smaller granularity. The Ledger from your Acc_Entries table will join to your ACC_Ledgers table on ITS primary index of ID, so the Acc_Ledgers is not really utilizing the Ledger portion for the WHERE clause. Your index should match the WHERE clause of your common queries as closely as possible. In this case, I would have an index on
(Account, Status, Type, Date)
The reason for Account being first: smaller result set. You could have 5,000 entries. Of those, 300 entries are for the one account, so you've already eliminated a huge amount of data to go through. Then, the Status... of the 300, you could have 100 with status 1, 100 with status 2, 100 with status 3, so you've now reduced the set even more, and so on for the other criteria of type and date.
Your query otherwise is completely fine... just a personal style in writing, I try to write my queries with the WHERE conditions matching the index as closely as possible, in the same sequence too, so I would just have the Account clause first, then Status, Type and Date... but again, that's a personal style in writing queries.

How does mysql order rows with the same value?

In my database I have some records where I am sorting by a column that contains identical values:
| col1 | timestamp |
| row1 | 2011-07-01 00:00:00 |
| row2 | 2011-07-01 00:00:00 |
| row3 | 2011-07-01 00:00:00 |
SELECT ... ORDER BY timestamp
It looks like the result is in random order.
Is the random order consistent? I have this data on two MySQL servers; can I expect the same result?
I'd advise against making that assumption. In standard SQL, anything not required by an explicit ORDER BY clause is implementation dependent.
I can't speak for MySQL, but on e.g. SQL Server, the output order for rows that are "equal" so far as the ORDER BY is concerned may vary every time the query is run - and could be influenced by practically anything (e.g. patch/service pack level of the server, workload, which pages are currently in the buffer pool, etc).
So if you need a specific order, the best thing you can do (both to guarantee it, and to document your query for future maintainers) is explicitly request the ordering you want.
Lots of answers already, but the bottom line answer is NO.
If you want rows returned in a particular sequence, consistently, then specify that in an ORDER BY. Without that, there is absolutely NO GUARANTEE what order rows will be returned in.
I think what you may be missing is that there can be multiple expressions listed in the ORDER BY clause. And you can include expressions that are not in the SELECT list.
In your case, for example, you could use ORDER BY timestamp, id.
(Or some other columns or expressions.)
That will order the rows first on timestamp, and then any rows that have the same value for timestamp will be ordered by id, or whatever the next expression in this list is.
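Here is a minimal sketch of that tie-breaker, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. The table name my_table, the id column, and the extra row4 are assumptions added for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; my_table and its id column are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, col1 TEXT, timestamp TEXT)"
)
# Rows are inserted out of id order on purpose; three of them share a timestamp.
conn.executemany(
    "INSERT INTO my_table (id, col1, timestamp) VALUES (?, ?, ?)",
    [
        (3, "row3", "2011-07-01 00:00:00"),
        (1, "row1", "2011-07-01 00:00:00"),
        (2, "row2", "2011-07-01 00:00:00"),
        (4, "row4", "2011-06-30 00:00:00"),
    ],
)

# timestamp decides first; id breaks ties among rows with equal timestamps.
ordered = [
    row[0]
    for row in conn.execute("SELECT col1 FROM my_table ORDER BY timestamp, id")
]
print(ordered)
```

Because id is unique, every row has a distinct (timestamp, id) pair, so the result order is fully determined by the query rather than by the server.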
The answer is: No, the order won't be consistent. I faced the same issue and solved it by adding another column to the order section. Be sure that this column is unique for each record like 'ID' or whatever it is.
In this case, you must add the 'ID' field to your table which is unique for each record. You can assign it 'AI' (auto increment) so that you are not going to deal with the maintenance.
After adding the 'ID' column, update the last part of your query like:
SELECT mt.*
FROM my_table mt
ORDER BY mt.timestamp ASC, mt.id DESC
An ORDER BY only fixes the order of rows whose sort values differ. For example, suppose you ORDER BY a column holding word frequencies, and two different words in the table have the same frequency. With "select * from database_name ORDER BY frequency", either of the two words may show up first. On a second run, the word that came second before may now come first. It depends on buffer memory, which pages are being moved in and out at the moment, etc.
That depends on the storage engine used. In MyISAM they'll typically come back in natural order (i.e. the order in which they're stored on disk, which can be changed using the ALTER TABLE ... ORDER BY command). In InnoDB they'll typically come back in PK order. Other engines can have their own rules. None of this is guaranteed, so don't rely on it.