Which MySQL method is faster? - mysql

I am saving some text in the database, say 1000 characters,
but I display only the first 200 characters.
Method 1
I could save the first 200 characters in one column
and the remaining text in a second column of the SQL table.
Method 2
I can save everything in one column and, when displaying, I can
query for just the first 200 characters.

It would be "cleaner" to store everything in 1 column. and you can select only the first 200 characters like this
select substring(your_column, 1, 200) as your_column from your_table

It really is irrelevant, but if you are trying to optimize, then method 1 is better, as long as you limit your query to that column (or only query the columns you really need), because doing any substring work on the server side takes time and resources (times the number of requests...). Method 2 is cleaner, but you are optimizing for time, so method 1.
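For illustration, a minimal sketch of the Method 1 layout (table and column names are hypothetical):
CREATE TABLE articles (
    id      INT PRIMARY KEY,
    preview VARCHAR(200),  -- the first 200 characters, shown in listings
    body    TEXT           -- the remaining characters, loaded only on demand
);
-- the listing query then touches only the short column:
SELECT id, preview FROM articles;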

This will come down to one of two things:
If you are pulling the entire row back into PHP and then only showing the first 200 chars, then your network speed will potentially be a bottleneck on pulling the data back.
If, on the other hand, you have two columns, you will potentially have a bottleneck at the drive access that fetches the data back to your PHP - longer rows can make access across multiple rows slower.
This will come down to a server-specific weigh-up. It will really depend on how your server performs. I would suggest running some scenarios where your code tries to pull back a few hundred thousand of each to see how long it takes.

Method 2.
First, duplicate storage of data is usually bad (denormalization). This is certainly true in this case.
Second, it would take longer to write to two tables than one.
Third, you have now made updates and deletes vulnerable to annoying inconsistencies (see #1).
Fourth, unless you are searching the first 200 characters for text, getting data out will be the same for both methods (just select a substring of the first 200 characters).
Fifth, even if you are searching the first 200 characters, you can index on those, and retrieval speed should be identical (see the sketch after this answer).
Sixth, you don't want a database design that limits your UX--what if you need to change to 500 characters? You'll have a lot of work to do.
This is a very obvious case of what not to do in database design.
reference: as answered by Joe Emison http://www.quora.com/MySQL/Which-mysql-method-is-fast
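As a sketch of that fifth point, MySQL lets you index a prefix of a TEXT column (names here are hypothetical):
ALTER TABLE articles ADD INDEX idx_body_prefix (body(200));
-- the preview is still just a substring, and prefix searches can use the index:
SELECT SUBSTRING(body, 1, 200) AS preview FROM articles WHERE body LIKE 'search term%';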

Related

Does SELECT * really take more time than selecting only the needed columns?

Will it make a discernible difference in the time a website page loads? On average, my tables have 10 columns, if I just need 3 of those columns, should I just call those in the query to make it faster?
Will it make a discernible difference? Probably not under most circumstances. Here are some cases where it could possibly make a big difference:
The 7 unneeded columns are really, really big.
You are returning lots and lots of rows.
You have a big table, are getting many rows, and an index is available on the 3 columns but not the 10.
But, there are other reasons not to use *:
It will replace the columns based on the order of the columns in the database at the time the query is compiled. This can cause problems if the structure of the table changes.
If a column name changes or is removed, your query would still work but subsequent code might break. If you explicitly list the columns, then the query itself will break, making the problem easier to spot.
Typing three column names shouldn't be a big deal. Explicitly listing the columns makes the code more informative.
Let's say you had a table with 1000 columns, and you only needed 3.
What do you think would run faster and why?
This: SELECT * FROM table_name;
or this: SELECT col1, col2, col3 FROM table_name;
When you are using * you are now holding that entire selection (big or small) in memory. The bigger the selection, the more memory it's going to use/need.
So even though your table isn't necessarily big, I would still only select the data that you actually need. You might not even notice a difference in speed, but it will definitely be faster.
Yes, if you only need a handful of columns, only select those. Below are some reasons:
THE MOST OBVIOUS: Extra data needs to be sent back, making for larger packets to transmit (or pipe via a local socket). This will increase overall latency. This might not seem like much for 1 or 2 rows, but wait until you've got 100 or 1000 rows... 7 extra columns of data will significantly affect overall transit latency, especially if the result set ends up having to be broken into more TCP packets for transmission. This might not be such an issue if you're hitting a localhost socket, but move your DB to a server across a network, to another datacenter, etc., and the impact will be plain as day!
With the MySQL query cache enabled, storing unneeded data in result sets will increase your overall cache space needs - larger query caches can suffer performance hits.
A HUGE HIT CAN BE: If you need only columns that are part of a covering index, doing a select * will require follow-up point lookups for the remaining data fields in the main table, rather than just using the data from the index.
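A minimal sketch of that covering-index point (the table and index names are hypothetical):
CREATE INDEX idx_user_email ON users (user_id, email);
SELECT user_id, email FROM users WHERE user_id = 42;  -- served entirely from the index
SELECT * FROM users WHERE user_id = 42;               -- needs extra lookups into the main table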
Yes you should.
Using named columns in the SELECT is a best practice when working with a database, for multiple reasons:
Only the needed data travels from the database to the application server, reducing CPU, memory, and disk usage.
It helps detect coding errors and structure changes.
There are only a few cases where using select * is a good idea; in all other queries, do yourself a favour and use the column names.
Yes, definitely. * will get replaced with all the column names, and only after that does execution start. For example, if there are 3 columns a, b, c in a table, select a, b, c starts execution directly, whereas select * is first transformed into select a, b, c, and only after that does execution start.
The short of it is yes: if you are returning more data, it will take longer. This may be a very, very, very tiny amount of time, but yes, it will take longer. As stated above, select * can be dangerous in a production situation where you may not be the one designing/implementing the database. If you assume that columns are returned in a particular order, or that the database structure is of a particular type, and then the DBA goes in and makes some kind of change without informing you, you may have an issue with your code.
The difference is very minimal, but there is a slight difference. I think which is faster really depends on several factors:
1) How many columns are in the table?
2) How many columns do you actually need to grab?
3) How many records are you grabbing?
In your case, based on what you said about having 10 columns and only needing 3 of them, I doubt it'll make a difference whether you use 'Select *' or not, unless perhaps you're grabbing tens of thousands of records. But in more extreme cases with a lot more columns involved, I have found 'Select *' to be slightly faster, though that might not be true in all cases.
I once did some speed tests on a SQLite table with over 150 columns, where I needed to grab only about 40 of the columns, and I needed all 20,000+ records. The speed differences were very minimal (we're talking 20 to 40 milliseconds difference), but it was actually faster to grab all the columns with a 'SELECT *' rather than listing them out as 'SELECT Field1, Field2, etc'.
I assume the more records and columns in your table, the greater the speed difference this example will net you. But if you only needed 3 columns in a gigantic table I'd guess that just grabbing those 3 columns would be faster.
Bottom line though, if you really care about the minimal speed differences between 'Select *' and 'Select field1, field2, etc', then do some speed tests.
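If you do want to measure this yourself, here is one rough way to time a query from the mysql client (it needs MySQL 5.6+ for NOW(6); the table and column names are hypothetical, and the timing is approximate since it spans the whole statement):
SET @start := NOW(6);
SELECT col1, col2, col3 FROM big_table;
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6)) AS elapsed_us;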

Fast mysql query to randomly select N usernames

In my JSP application I have a search box that lets users search for user names in the database. I send an AJAX call on each keystroke and fetch 5 random names starting with the entered string.
I am using the below query:
select userid,name,pic from tbl_mst_users where name like 'queryStr%' order by rand() limit 5
But this is very slow as I have more than 2000 records in my table.
Is there any better approach which takes less time and lets me achieve the same? I need random values.
How slow is "very slow", in seconds?
The reason why your query could be slow is most likely that you didn't place an index on name. 2000 rows should be a piece of cake for MySQL to handle.
The other possible reason is that you have many columns in the SELECT clause. I assume in this case the MySQL engine first copies all this data to a temp table before sorting this large result set.
I advise the following, so that you work only with indexes, for as long as possible:
SELECT userid, name, pic
FROM tbl_mst_users
JOIN (
    -- here, MySQL works on indexes only
    SELECT userid
    FROM tbl_mst_users
    WHERE name LIKE 'queryStr%'
    ORDER BY RAND() LIMIT 5
) AS sub USING (userid); -- join the other columns only after picking the rows in the sub-query
This method is a bit better, but still does not scale well. However, it should be sufficient for small tables (2000 rows is, indeed, small).
The link provided by @user1461434 is quite interesting. It describes a solution with almost constant performance. The only drawback is that it returns only one random row at a time.
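A minimal sketch of that approach, adapted to this table (it assumes userid is a numeric, roughly gap-free primary key, and it returns a single random row):
SELECT u.userid, u.name, u.pic
FROM tbl_mst_users u
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(userid) FROM tbl_mst_users)) AS rnd) r
    ON u.userid >= r.rnd
ORDER BY u.userid
LIMIT 1;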
1. Does the table have an index on name? If not, add one.
2. MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast (a sketch follows below). (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed in MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (they probably periodically regenerate the random number column).
3. http://jan.kneschke.de/projects/mysql/order-by-rand/
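For illustration, a sketch of that MediaWiki-style trick (the rand_val column and index names are hypothetical):
ALTER TABLE tbl_mst_users ADD COLUMN rand_val DOUBLE;
UPDATE tbl_mst_users SET rand_val = RAND();
CREATE INDEX idx_rand ON tbl_mst_users (rand_val);
-- pick a random point once, then take the next row from the index:
SET @r := RAND();
SELECT userid, name, pic FROM tbl_mst_users
WHERE rand_val >= @r
ORDER BY rand_val
LIMIT 1;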

MySQL Improving speed of order by statements

I've got a table in a MySQL db with about 25000 records. Each record has about 200 fields, many of which are TEXT. There's nothing I can do about the structure - this is a migration from an old flat-file db which has 16 years of records, and many fields are "note" type free-text entries.
Users can be viewing any number of fields, and order by any single field, and any number of qualifiers. There's a big slowdown in the sort, which is generally taking several seconds, sometimes as much as 7-10 seconds.
An example statement might look like this:
select a, b, c from table where b=1 and c=2 or a=0 order by a desc limit 25
There's never a star-select, and there's always a limit, so I don't think the statement itself can really be optimized much.
I'm aware that indexes can help speed this up, but since there's no way of knowing which fields are going to be sorted on, I'd have to index all 200 columns - what I've read about this doesn't seem to be consistent. I understand there'd be a slowdown when inserting or updating records, but assuming that's acceptable, is it advisable to add an index to each column?
I've read about sort_buffer_size but it seems like everything I read conflicts with the last thing I read - is it advisable to increase this value, or any of the other similar values (read_buffer_size, etc)?
Also, the primary identifier is a crazy pattern they came up with in the nineties. This is the PK and so should be indexed by virtue of being the PK (right?). The records are (and have been) submitted to the state, and to their clients, and I can't change the format. This column needs to sort based on the logic that's in place, which involves a stored procedure with string concatenation and substring matching. This particular sort is especially slow, and doesn't seem to cache, even though this one field is indexed, so I wonder if there's anything I can do to speed up the sorting on this particular field (which is the default order by).
TYIA.
I'd have to index all 200 columns
That's not really a good idea. Because of the way MySQL uses indexes, most of them would probably never be used while still generating quite a large overhead (see chapter 7.3 in the link below for details). What you could do, however, is try to identify which columns appear most often in WHERE clauses, and index those.
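For example, if b and c from the example query above turn out to dominate your WHERE clauses, a composite index on just those two columns would be one place to start (the table name is hypothetical; treat this as a sketch):
CREATE INDEX idx_b_c ON your_table (b, c);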
In the long run, however, you will probably need to find a way to rework your data structure into something more manageable, because as it is now, it has the smell of a 'spreadsheet turned into a database', which is not a nice smell.
I've read about sort_buffer_size but it seems like everything I read conflicts with the last thing I read - is it advisable to increase this value, or any of the other similar values (read_buffer_size, etc)?
In general the answer is yes. However, the actual details depend on your hardware, OS, and which storage engine you use. See chapter 7.11 (especially 7.11.4) in the link below.
Also, the primary identifier is a crazy pattern they came up with in the nineties. [...] I wonder if there's anything I can do to speed up the sorting on this particular field (which is the default order by).
Perhaps you could add a primarySortOrder column to your table, into which you could store numeric values that map to the PK order (precalculated from the stored procedure you're using).
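A minimal sketch of that idea (the table name is hypothetical, and the commented-out UPDATE stands in for whatever your stored procedure actually computes):
ALTER TABLE records ADD COLUMN primarySortOrder INT;
-- populate it once from the existing ordering logic, then maintain it on insert:
-- UPDATE records SET primarySortOrder = ... ;
CREATE INDEX idx_primary_sort ON records (primarySortOrder);
SELECT a, b, c FROM records ORDER BY primarySortOrder DESC LIMIT 25;  -- now an index-backed sort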
And the link you've been waiting for: Chapter 7 of the MySQL manual: Optimization
Add an index to all the columns that have a large number of distinct values - say 100, or even 1000 or more. Tune this number as you go.

Is using LIMIT and OFFSET in MySQL less expensive than returning the full set of records?

It might be a silly question but I am just curious about what goes on behind the curtains.
If I want to paginate database records, I can either use LIMIT and OFFSET or simply get all the records and pick out the ones I want with more code.
I know the second option is absolutely silly; I just want to know if it is more expensive.
If I use LIMIT and OFFSET, will the database grab just what I ask for, or will it internally get all the records matching my query (even hundreds of thousands) and then use a starting index (OFFSET) and an ending index (OFFSET + LIMIT) to get the requested subset of records?
I don't even know if I used the right words to describe the doubt I have, I hope someone can shed some light.
Thanks!
Yes, it would be more expensive, for two reasons.
1) MySQL will optimize internally to only compute the rows that it needs, rather than retrieving them all internally. Note that this optimization is much less effective if you have an ORDER BY in your query, because then MySQL has to match and sort all of the rows in the dataset, rather than stopping when it finds the first X in your LIMIT.
2) When all the records are returned, they all need to be transmitted over the wire from the database to your application server. That can take time, especially for medium to large data sets.
The difference can be enormous. Not only is the network difference sometimes big (a few rows vs. hundreds to thousands), but the number of rows the database needs to examine can be large as well. For example, if you ask for 10 rows, the database can stop after finding 10 rows, rather than having to check every row.
Whenever possible, use LIMIT and OFFSET.
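For illustration, fetching page three at 25 rows per page (the table and column names are hypothetical):
SELECT id, title
FROM posts
ORDER BY id
LIMIT 25 OFFSET 50;
Note that MySQL still has to find and step over the skipped rows, so very deep offsets get slower; keyset pagination (e.g. WHERE id > last_seen_id) avoids that, at the cost of not supporting arbitrary page jumps.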

What is running optimize on a table doing to make such a huge difference?

Simple situation: a two-column table [ID, TEXT]. The TEXT column has 1-10 word phrases. 300,000 rows.
Running the query:
SELECT * FROM row
WHERE text LIKE '%word%'
...took 0.1 seconds. Ok.
So I created a 2nd column; the table now has: [ID, TEXT, TEXT2]
I made TEXT2 = TEXT (using an UPDATE table SET TEXT2 = TEXT).
Then I ran the query for '%word%' again, and it took 2.4 seconds.
This left me very, very stumped, but after quite a lot of blind alleys, I ran OPTIMIZE on the table, and it went back down to about 0.2 seconds.
Two questions:
Does anyone know how the data structure gets itself into such a mess that doubling the data increases the search time for this query by a factor of 24?
Is it standard for an un-indexed search like this to scale with the size of the underlying table data structure, as opposed to the data in the actual column being searched?
Thanks!
Sounds to me like you are the victim of query caching. The second time you run the query (after the optimize), it already has the answer cached, and therefore the result is returned instantly. Have you tried searching for different search terms? Try running the query with caching turned off, like so:
SELECT SQL_NO_CACHE * FROM row WHERE text LIKE '%word%'
See if this changes the results, or try searching for different words with a similar number of results, to make sure your server isn't just returning a cached value.
The first time it does a table scan which sounds about right for the timing - no index involved.
Then you added the index, and the MySQL optimizer doesn't notice you've got a wildcard at the front, so it scans the entire index to find the records, then needs two more reads (one to the PK, then one from there into the table) to get each data record on top of that.
OPTIMIZE probably just updates the optimizer statistics so it knows it should scan the table again.
I would think the difference is caused by the increased row length causing the table to become fragmented on disk. OPTIMIZE sorts that problem out, bringing the search time back to normal (give or take a bit).
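For reference, the statement in question, using the table name from the original query; on InnoDB it rebuilds the table and refreshes the index statistics, which defragments the rows on disk:
OPTIMIZE TABLE row;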