Execution time of insert and retrieval operation in MySQL

Which will take more execution time: an insert operation or a select operation, if both are single queries affecting only one row?
For example:
insert into example values('id','name','email')
or
select * from example where id='id';

For all benchmarking, you need to keep in mind:
It's not usually a simple matter, there are a large number of factors that can affect the figures.
Some of these factors are the number of indexes, historical patterns of read/write, database tuning, disk layout, fragmentation and so forth.
It rarely matters for something as small as a one-row operation, provided you have an intelligent setup (correct indexes and so on).
The best way to tell, for a given setup, is to test.
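For example, one minimal way to compare the two on your own setup (a sketch assuming the example table from the question; SET profiling and SHOW PROFILES are standard MySQL session commands, and the numbers will vary with your hardware and configuration):
set profiling = 1;
insert into example values('id42','name','email');   -- hypothetical row
select * from example where id='id42';
show profiles;   -- lists the duration of each statement above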
In any case, this sort of question usually arises when you want to choose the fastest of two functionally identical choices.
In this case, there is zero crossover in functionality, so I'm not sure what an answer would gain you. If you want to insert information, use insert. If you want to extract it, use select.
It's not like you can use select (no matter how fast it may be) to insert data into your database (well, other than as part of insert into ... select ...).

Related

Is select faster than insert

I have a big load file that I downloaded. It contains records that I will have to load into the database. Based on the size of the data, it will likely take 2 weeks or more to finish (since there is preprocessing etc.). A coworker asked me to make what she called a delta file, which checks the current database to see if the data already exists based on a certain field in the database, and iff it exists then we will keep that record in the load file; otherwise we will discard it.
I'm confused because to implement this I would need to do a select query for every record in the load file to check if it exists. A select would take O(n), I'm assuming, and then the insert (for a smaller data set) an additional O(1),
whereas an insert alone would just take O(1).
I'd like to 1) understand why this implementation is faster (if I don't understand things properly) and 2) hear a possible solution for implementing this delta file, if you can think of something smarter than what I suggested.
Thanks
Databases make indexes for columns specified in the schema. The way your data is indexed can make a massive difference in performance. Without an index, a select operation may be O(n) but with an index it may be O(1).
Insert operations must maintain the index. For large data-loading operations you may be well served by disabling indexing until the end, so you do a single index update on all the data instead of many index updates, one for each record you insert.
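A hedged sketch of that approach (the table name and file path are hypothetical; ALTER TABLE ... DISABLE KEYS defers maintenance of non-unique indexes on MyISAM tables, while for InnoDB the usual equivalent is dropping secondary indexes before the load and re-creating them afterwards):
ALTER TABLE example DISABLE KEYS;
LOAD DATA INFILE '/path/to/load_file' INTO TABLE example;
ALTER TABLE example ENABLE KEYS;   -- non-unique indexes are rebuilt once, at the end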
Some measurements I did the other day indicate that selects are faster than inserts in my situation. I came across this question because I am trying to learn if this is generally true or reflects something specific about the way I have it setup.

What's wrong with using a wildcard in your mySQL query? [duplicate]

This question already has answers here:
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc
Basically what's the difference in terms of security and speed in these 2 queries?
SELECT * FROM `myTable`
and
SELECT `id`, `name`, `location`, `place` etc... FROM `myTable`
Would using * increase the benchmark time on my query and perform slower than listing the columns statically?
There won't be much appreciable difference in performance if you select all columns individually instead.
The idea is to select only the data you require and no more, which can improve performance when there are a lot of unneeded columns in your query, for example, when you join several tables.
Of course, on the other side of the coin, using * makes life easier when you make changes to the table.
Security-wise, the less you select, the less potentially sensitive data can be inadvertently dumped to the user's browser. Imagine if * included the column social_security_number and somewhere in your debug code it gets printed out as an HTML comment.
Performance-wise, in many cases your database is on another server, so requesting the entire row when you only need a small part of it means a lot more data going over the network.
There is not a single, simple answer, and your literal question cannot fully be answered without more detail of the specific table structure, but I'm going with the assumption that you aren't actually talking about a specific query against a specific table, but rather about selecting columns explicitly or using the *.
SELECT * is always wasteful of something unless you are actually going to use every column that exists in the rows you're reading... maybe network bandwidth, or CPU resources, or disk I/O, or a combination, but something is being unnecessarily used, though that "something" may be in very small and imperceptible quantities ... or it may not ... but it can add up over time.
The two big examples that come to mind where SELECT * can be a performance killer are cases like...
...tables with long VARCHAR and BLOB columns:
Columns such as BLOB and VARCHAR that are too long to fit on a B-tree page are stored on separately allocated disk pages called overflow pages. We call such columns off-page columns. The values of these columns are stored in singly-linked lists of overflow pages, and each such column has its own list of one or more overflow pages
— http://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-overview.html
So if * includes columns that weren't stored on-page with the rest of the row data, you just took an I/O hit and/or wasted space in your buffer pool with accesses that could have been avoided had you selected only what you needed.
...also cases where SELECT * prevents the query from using a covering index:
If the index is a covering index for the queries and can be used to satisfy all data required from the table, only the index tree is scanned. In this case, the Extra column says Using index. An index-only scan usually is faster than ALL because the size of the index usually is smaller than the table data.
— http://dev.mysql.com/doc/refman/5.6/en/explain-output.html
When one or more columns are indexed, copies of the column data are stored, sorted, in the index tree, which also includes the primary key for finding the rest of the row data. When selecting from a table, if all of the columns you are selecting can be found within a single index, the optimizer will automatically choose to return the data to you by reading it directly from the index, instead of going to the time and effort of reading in all of the row data... and this, in some cases, is a very significant difference in the performance of a query, because it can mean substantially smaller resource usage.
If EXPLAIN SELECT does not reveal the exact same query plan when selecting the individual columns you need compared with the plan used when selecting *, then you are looking at some fairly hard evidence that you are putting the server through unnecessary work by selecting things you aren't going to use.
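As a hypothetical illustration, assume a users table with a composite index on (last_name, first_name); only the second query below can be satisfied from the index alone, which EXPLAIN would report as "Using index" in the Extra column:
EXPLAIN SELECT * FROM users WHERE last_name = 'Smith';
EXPLAIN SELECT last_name, first_name FROM users WHERE last_name = 'Smith';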
In additional cases, such as with the information_schema tables, the columns you select can make a particularly dramatic and obvious difference in performance. The information_schema tables are not actually tables -- they're server internal structures exposed via the SQL interface... and the columns you select can significantly change the performance of the query because the server has to do more work to calculate the values of some columns, compared to others. A similar situation is true of FEDERATED tables, which actually fetch data from a remote MySQL server to make a foreign table appear logically to be local. The columns you select are actually transferred across the network between servers.
Explicitly selecting the columns you need can also lead to fewer sneaky bugs. If a column you were using in code is later dropped from a table, the place in your code's data structure -- in some languages -- is going to contain an undefined value, which in many languages is the same thing you would see if the column still existed but was null... so the code thinks "okay, that's null, so..." and a logical error follows. Had you explicitly selected the columns you wanted, subsequent executions of the query would throw a hard error instead of quietly misbehaving.
MySQL's C-client API, which some other client libraries are built on, supports two modes of fetching data, one of which is mysql_store_result, which buffers the data from the server on the client side before the application actually reads it into its internal structures... so as you are "reading from the server" you may have already implicitly allocated a lot of memory on the client side to store that incoming result-set even when you think you're fetching a row at a time. Selecting unnecessary columns means even more memory needed.
SELECT COUNT(*) is an exception. The COUNT() function counts the number of non-null values seen, and * merely means "count the rows"... it doesn't examine column data, so if you want a star there, go for it.
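For instance, with a hypothetical myTable whose email column is nullable:
SELECT COUNT(*) FROM myTable;       -- counts rows; no column data is examined
SELECT COUNT(email) FROM myTable;   -- counts only rows where email IS NOT NULL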
As a favor to your future self, unless you want to go back later and rewrite all of those queries when you're trying to get more performance out of your server, you should bite the bullet and do the extra typing, now.
As a bonus, when other people see your code, they won't accuse you of laziness or inexperience.

SELECT TableName.Col1 VS SELECT Col1

This might be a weird question, but I didn't know how to research it. When doing the following query:
SELECT Foo.col1, Foo.col2, Foo.col3
FROM Foo
INNER JOIN Bar ON Foo.ID = Bar.BID
I tend to use TableName.Column instead of just col1, col2, col3.
Is there any performance difference? Is it faster to specify the table name for each column?
My guess would be that yes, it is faster, since it would take some time to look up the column name and disambiguate it.
If anyone knows a link where I could read up on this, I would be grateful. I did not even know how to title this question better, since I'm not sure how to search for it.
First of all: this should not matter. The time to look up the columns is such a minuscule fraction of the total processing time of a typical query that this is probably the wrong spot to look for additional performance.
Second: Tablename.Colname is faster than Colname alone, as it eliminates the need to search the referenced tables (and table-like structures such as views and subqueries) for a fitting column. Again: the difference is inside the statistical noise.
Third: using Tablename.Colname is a good idea, but for other reasons: if you use Colname only, and one of the tables in your query later gets a new column with the same name, you end up with the oh-so-well-known "ambiguous column name" error. Typical candidates for such columns are "comment", "lastchanged", and friends. If you qualify your column references, this maintainability problem simply disappears: your query will work as always, ignoring the new fields.
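A hypothetical illustration of that third point: suppose both Foo and Bar later gain a comment column.
SELECT comment FROM Foo INNER JOIN Bar ON Foo.ID = Bar.BID;       -- ERROR 1052: 'comment' is ambiguous
SELECT Foo.comment FROM Foo INNER JOIN Bar ON Foo.ID = Bar.BID;   -- still works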
If it's faster, the difference is surely negligible, like a few microseconds per query. All the data about the tables mentioned in the query has to be loaded into memory, so it doesn't save any disk access. It's done during query parsing, not during data processing. Even if you run the query thousands of times, it might not make up for the time spent typing those extra characters, and certainly not the time we've spent discussing it. :)
But it makes the queries longer, so there's slightly more time spent in communications. If you're sending the query over a network, that will probably negate any time saved during parsing. You can reduce this by using short table aliases, though:
SELECT t.col1, t.col2
FROM ReallyLongTableName t
As a general rule, when worrying about database performance you only need to concern yourself with aspects whose time is dependent on the number of rows in the tables. Anything that's the same regardless of the amount of data will fall into the noise, unless you're dealing with extremely tiny tables (in which case, why are you bothering with a database -- use a flat file).

Big tables and analysis in MySql

For my startup, I track everything myself rather than relying on Google Analytics. This is nice because I can actually have IPs and user ids and everything.
This worked well until my tracking table grew to about 2 million rows. The table is called acts, and records:
ip
url
note
account_id
...where available.
Now, trying to do something like this:
SELECT COUNT(distinct ip)
FROM acts
JOIN users ON(users.ip = acts.ip)
WHERE acts.url LIKE '%some_marketing_page%';
Basically never finishes. I switched to this:
SELECT COUNT(distinct ip)
FROM acts
JOIN users ON(users.ip = acts.ip)
WHERE acts.note = 'some_marketing_page';
But it is still very slow, despite having an index on note.
I am obviously not a pro at MySQL. My question is:
How do companies with lots of data track things like funnel conversion rates? Is it possible to do in mysql and I am just missing some knowledge? If not, what books / blogs can I read about how sites do this?
While getting towards 'respectable', 2 million rows is still a relatively small size for a table (and therefore faster performance is typically possible).
As you found out, leading wildcards are particularly inefficient, and we'll have to find a solution for this if that use case is common for your application.
It could just be that you do not have the right set of indexes. Before I proceed, however, I wish to stress that while indexes will typically improve DBMS performance with SELECT statements of all kinds, they systematically have a negative effect on the performance of "CUD" operations (i.e. Create/Update/Delete: the SQL INSERT, UPDATE, and DELETE verbs, the queries which write to the database rather than just read from it). In some cases the negative impact of indexes on "write" queries can be very significant.
My reason for particularly stressing the ambivalent nature of indexes is that your application appears to do a fair amount of data collection as a normal part of its operation, and you will need to watch for possible degradation as the INSERT queries get slowed down. A possible alternative is to perform the data collection into a relatively small table/database, with no or very few indexes, and to regularly import the data from this input database into the database where the actual data mining takes place. (After they are imported, the rows may be deleted from the "input database", keeping it small and fast for its INSERT function.)
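A minimal sketch of that input-database idea (the acts_staging name is hypothetical, and the details vary with your storage engine; the point is that the collection table carries no secondary indexes):
CREATE TABLE acts_staging LIKE acts;   -- then drop its secondary indexes
-- fast, index-light INSERTs land in acts_staging; periodically:
INSERT INTO acts SELECT * FROM acts_staging;
TRUNCATE TABLE acts_staging;           -- keep the input table small and fast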
Another concern/question is the width of a row in the acts table (the number of columns and the sum of their widths). Bad performance could be tied to the fact that rows are too wide, resulting in too few rows per leaf node of the table, and hence a deeper-than-needed tree structure.
Back to the indexes...
In view of the few queries in the question, it appears that you could benefit from an ip + note index (an index made of at least these two keys, in this order). A full analysis of the index situation, and frankly a possible review of the database schema, cannot be done here (not enough info for that...), but the general process for doing so is to make a list of the most common use cases and to see which database indexes could help with those cases. One can gather insight into how particular queries are handled, initially or after index(es) are added, with the MySQL command EXPLAIN.
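For instance, a sketch of that suggested index (the index name is hypothetical; verify the effect on your actual queries with EXPLAIN before and after):
ALTER TABLE acts ADD INDEX ix_ip_note (ip, note);
EXPLAIN SELECT COUNT(distinct ip) FROM acts WHERE acts.note = 'some_marketing_page';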
Normalization OR denormalization (or indeed a combination of both!) is often a viable idea for improving performance during mining operations as well.
Why the JOIN? If we can assume that no IP makes it into acts without an associated record in users then you don't need the join:
SELECT COUNT(distinct ip) FROM acts
WHERE acts.url LIKE '%some_marketing_page%';
If you really do need the JOIN it might pay to first select the distinct IPs from acts, then JOIN those results to users (you'll have to look at the execution plan and experiment to see if this is faster).
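A hedged sketch of that two-step rewrite as a derived table (compare its EXPLAIN output against the original join before committing to it):
SELECT COUNT(DISTINCT a.ip)
FROM (SELECT DISTINCT ip FROM acts WHERE note = 'some_marketing_page') a
JOIN users ON users.ip = a.ip;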
Secondly, that LIKE with a leading wild card is going to cause a full table scan of acts and also necessitate some expensive text searching. You have three choices to improve this:
Decompose the url into component parts before you store it so that the search matches a column value exactly.
Require the search term to appear at the beginning of the url field, not in the middle.
Investigate a full text search engine that will index the url field in such a way that even an internal LIKE search can be performed against indexes.
Finally, in the case of searching on acts.note, if an index on note doesn't provide sufficient improvement, I'd consider calculating and storing an integer hash of note and searching for that.
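A sketch of that hash idea (the note_hash column and index name are hypothetical; CRC32() is a built-in MySQL function, and the extra equality test on note guards against hash collisions):
ALTER TABLE acts ADD COLUMN note_hash INT UNSIGNED, ADD INDEX ix_note_hash (note_hash);
UPDATE acts SET note_hash = CRC32(note);
SELECT COUNT(DISTINCT ip) FROM acts
WHERE note_hash = CRC32('some_marketing_page')
AND note = 'some_marketing_page';   -- re-check the real value in case of collisions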
Try running EXPLAIN on your query and look to see if there are any full table scans.
Should this be a LEFT JOIN?

mysql performance comparison

Which is more efficient, and by how much?
type 1:
insert into table_name (column1, column2, ...)
select column1, column2, ... from another_table where columnX in (value_list)
type 2:
insert into table_name (column1, column2, ...)
values (column1_0, column2_0, ...), (column1_1, column2_1, ...)
The first form looks short, and the second may become extremely long when value_list contains, say, 500 or even more values.
But I've no idea whose performance will be better, though intuitively it feels like the first should be more efficient.
The first is cleaner, especially if your data is already in MySQL (which I'm assuming you are saying?). You save some network overhead sending the data, some parsing time, and you have to worry less about hitting whatever query size limit your client has.
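(The server-side limit in question is usually max_allowed_packet, which caps the size of a single statement; you can inspect it like this:)
SHOW VARIABLES LIKE 'max_allowed_packet';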
However, in general, I would expect the performance to be similar as the number of rows grows larger, especially on a well-indexed table. Most of the time for inserts with large queries is spent on things like maintaining indexes, and both of those queries, absent turning indexes off, would have to do that.
I agree with Todd: the first query is cleaner, will be faster to send to the MySQL server, and faster to compile. And it's probably true that as the number of inserted records increases, the speed differential will drop.
But the first form has substantial other benefits to consider:
It's far easier to maintain: you only have to add or modify a field every now and then.
You avoid the expense of querying another_table and processing the results to concatenate the second query (a hidden cost of that approach).
If you need to run this update more than once, the first query can be cached in the MySQL server along with its compiled form and query plan. This makes subsequent invocations of the query run a bit faster.