I am creating an ASP.NET *MVC* application using EF Code First. I have been using SQL Azure as my database, but it has turned out to be unreliable, so I am thinking of switching to MySQL/PostgreSQL.
I would like to know the performance implications of using EF Code First with MySQL/PostgreSQL.
Has anyone used this combo in production or knows anyone who has used it?
EDIT
I keep getting the following exceptions with SQL Azure.
SqlException: "*A transport-level error has occurred when receiving results from the server.*
(provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)"
SqlException: *"Database 'XXXXXXXXXXXXXXXX' on server 'XXXXXXXXXXXXXXXX' is not
currently available. Please retry the connection later.* If the problem persists, contact
customer support, and provide them the session tracing ID of '4acac87a-bfbe-4ab1-bbb6c-4b81fb315da'.
Login failed for user 'XXXXXXXXXXXXXXXX'."
First, your problem seems to be a network issue, perhaps with your ISP. You may be looking at getting a remote PostgreSQL or MySQL db instead, but I think you will run into the same problems.
Secondly, comparing MySQL and PostgreSQL performance is relatively tricky. In general, MySQL is optimized for primary-key lookups, while PostgreSQL is optimized for more complex use cases. This may be a bit low-level but....
MySQL InnoDB tables are basically btree indexes where the leaf node includes the table data. The primary key is the key of the index. If no primary key is provided, one will be created for you. This means two things:
select * from my_large_table will be slow, as there is no support for a physical-order scan.
select * from my_large_table where secondary_index_value = 2 requires two index traversals, since the secondary index can only refer to the primary key values.
In contrast a selection for a primary key value will be faster than on PostgreSQL because the index contains the data.
PostgreSQL, by comparison, stores information in an unordered way in a series of heap pages. The indexes are separate from the data. If you want to pull by primary key, you scan the index, then read the data page in which the data is found, and then pull the data. Pulling from a secondary index is not any slower. Additionally, the tables are structured such that sequential disk access is possible, so a long select * from my_large_table lets the operating system's read-ahead cache speed performance significantly.
In short, if your queries are simply joinless selection by primary key, then MySQL will give you better performance. If you have joins and such, PostgreSQL will do better.
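To make the two access patterns above concrete, here is a minimal sketch reusing the hypothetical table from above (the id column and the literal values are illustrative):

select * from my_large_table where id = 42;
-- InnoDB: one traversal of the clustered primary-key index, which already contains the row data.
-- PostgreSQL: one index traversal plus a fetch of the heap page holding the row.

select * from my_large_table where secondary_index_value = 2;
-- InnoDB: traverse the secondary index to find the primary key, then traverse the primary-key index for the row.
-- PostgreSQL: traverse the secondary index, then fetch the heap page; no slower than the primary-key case.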
I have a table containing 38 million rows. To make future queries faster, I decided to create indexes on some columns.
While creating an index on one column, it runs for approximately 2 hours and then shows:
Lost connection to MySQL server during query
However, if I restart MySQL Workbench, which I'm working in, it shows my new index there. I have two questions:
1) Is the new index that got created complete, or is it incomplete/invalid?
2) How do I resolve the lost-connection problem?
Under Edit > Preferences > SQL Editor, I have changed the DBMS connection read timeout and two other values to something large, but that doesn't help.
Index creation is an atomic operation, in the sense that it will either succeed entirely or fail entirely... so if you have an index, it will be intact and complete.
The reason you are losing your connection is most likely the network: at least one device in the network path between you and the server (such as a firewall, a NAT router, or, if this is a cloud-based server, a device in the cloud provider's infrastructure) maintains a flow state table of active TCP sessions, and when no data is transferred for some period of time, the connection is purged from that device's memory, so the connection collapses.
The MySQL client/server protocol has no layer 7 keep-alive mechanism for keeping idle connections open on the network... and from the network's perspective, the connection is completely idle during an index creation operation.
It may be possible to change kernel-level parameters on client and/or server so that some keep-alive messages are exchanged closer to layer 4, keeping the connection alive at a lower level, but this is system specific (Linux example).
Often it is also possible to speed up index creation greatly on MySQL by disabling foreign key checks, on your connection only, while adding the index. Don't do this unless you are absolutely sure that your index operation doesn't jeopardize any data integrity (i.e., don't use this if you are also adding a foreign key).
mysql> SET @@FOREIGN_KEY_CHECKS = 0;
mysql> ALTER TABLE ... ADD KEY ...;
mysql> SET @@FOREIGN_KEY_CHECKS = 1;
Note also that if you are using the GUI from Workbench to add indexes rather than actually typing SQL statements to alter the table... don't do that. Using graphical tools for DDL increases the odds of your time being wasted because they sometimes generate statements that accomplish the purpose you intended, but do it in a very inefficient and sometimes illogical way.
In many cases you can also use this:
mysql> ALTER TABLE ... ALGORITHM=INPLACE, LOCK=NONE, ADD KEY ...;
These options speed up the index operation by avoiding unnecessary locking and by attaching the index to the table as it stands, rather than copying the table. If the server doesn't like these options for the particular operation you're performing, it will tell you so, with an error, and no harm will be done. The ALGORITHM and LOCK options sometimes need to be preceded by disabling the foreign key checks, and enabling them when you are done.
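Putting the pieces together on a hypothetical table (the table, index, and column names below are made up for illustration; ALGORITHM=INPLACE and LOCK=NONE require MySQL 5.6 or later):

mysql> SET @@FOREIGN_KEY_CHECKS = 0;
mysql> ALTER TABLE big_table ALGORITHM=INPLACE, LOCK=NONE, ADD KEY idx_created_at (created_at);
mysql> SET @@FOREIGN_KEY_CHECKS = 1;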
Worth repeating: turning off foreign key checks as shown above only impacts one single connection -- yours -- and not any other connection. This doesn't disable checks for the table being altered if it is accessed by other users, or even by you, if you access the same table from another connection. This setting doesn't jeopardize data integrity as long as you don't do anything that disturbs foreign key references while you have it disabled. It's a well-known and commonly-used optimization. The checks are not needed when you're adding indexes but the server will in some cases try to validate the existing data unnecessarily.
I started benchmarking with Zend_Db_Profiler by saving queries that take too long. For one user, this query:
SELECT chapter, order, topic, id, name
FROM topics
WHERE id = '1'
AND hidden = 'no'
took 2.97 seconds. I performed an Explain:
select_type  table   possible_keys  key  key_len  ref    rows  Extra
SIMPLE       topics  id             id   4        const  42    Using where
and ran the query myself from phpMyAdmin, where it only took 0.0108 seconds. I thought that perhaps the size of the table might have an effect, as there is one column which is a varchar 8000 characters long, but it's not part of the SELECT. I also just switched over to semi-dedicated hosting, but I can't imagine that this would have had a negative effect. Any thoughts on how I could troubleshoot would be appreciated.
No. PHP and MySQL are server-side technologies, meaning your server processes them and the client has no bearing on that. If your server is slow, it will just be slower in returning the response to the client.
Sadly, your premise about the bottleneck here is not right. Also, when testing how a query behaves from your application and then from phpMyAdmin (or any other GUI), you have to clear the query cache before running the same query again. You didn't mention whether you did that.
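One way to rule the query cache out (a sketch, assuming a MySQL version that still ships the query cache; the backticks around order are needed because it is a reserved word) is to repeat the query with SQL_NO_CACHE:

SELECT SQL_NO_CACHE chapter, `order`, topic, id, name
FROM topics
WHERE id = '1'
AND hidden = 'no';

RESET QUERY CACHE; clears the cache entirely, but it needs the RELOAD privilege, which shared or semi-dedicated hosting often does not grant.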
The second part of tracking what might be wrong includes confirming that your database's configuration variables have been optimally set, that you chose the proper storage engine, and that your indexing strategy is optimal (for example, choosing an INT primary key instead of a VARCHAR, and avoiding similar atrocities).
That means that in most cases you'd go with the InnoDB storage engine. It's free, and it's quick if optimized (the server variable innodb_buffer_pool_size does wonders when set to a proper size and you have sufficient RAM). The fact that you said you use semi-dedicated hosting implies you don't have control over those configuration variables.
Only when you're sure that
1) you're not testing the same query off the cache, and
2) you've done everything within your power to make it optimal (this includes making sure that you don't have rogue processes hogging your server's resources),
can you assume there might be a problem in communication between the server and the client.
As both PHP and SQL run on the server side, the user's internet connection does not affect the speed of the query.
Maybe the database server was too loaded at the time and couldn't process the query in time.
I've got an index on columns a VARCHAR(255), b INT in an InnoDB table. Given two a,b pairs, can I use the MySQL index to determine whether the pairs are the same from a C program (i.e. without using strcmp and a numerical comparison)?
Where is a MySQL InnoDB index stored in the file system?
Can it be read and used from a separate program? What is the format?
How can I use an index to determine if two keys are the same?
Note: An answer to this question should either a) provide a method for accessing a MySQL index in order to accomplish this task or b) explain why the MySQL index cannot practically be accessed/used in this way. A platform-specific answer is fine, and I'm on Red Hat 5.8.
Below is the previous version of this question, which provides more context but seems to distract from the actual question. I understand that there are other ways to accomplish this example within MySQL, and I provide two. This is not a question about optimization, but rather of factoring out a piece of complexity that exists across many different dynamically generated queries.
I could accomplish my query using a subselect with a subgrouping, e.g.
SELECT c, AVG(max_val)
FROM (
SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b) AS t
GROUP BY c
But I've written a UDF that allows me to do it with a single select, e.g.
SELECT c, MY_UDF(a, b, val)
FROM table
GROUP BY c
The key here is that I pass the fields a and b to the UDF, and I manually manage a,b subgroups in each group. Column a is a varchar, so this involves a call to strncmp to check for matches, but it's reasonably fast.
However, I have an index my_key (a ASC, b ASC). Instead of checking for matches on a and b manually, can I just access and use the MySQL index? That is, can I get the index value in my_key for a given row or a,b pair in c (inside the UDF)? And if so, would the index value be guaranteed to be unique for any value a,b?
I would like to call MY_UDF(a, b, val) and then look up the mysql index value (a,b) in c from the UDF.
Look back at your original query
SELECT c, AVG(max_val)
FROM
(
SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b
) AS t
GROUP BY c;
You should first make sure the subselect gives you what you want by running
SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b;
If the result of the subselect is correct, then run your full query. If that result is correct, then you should do the following:
ALTER TABLE `table` ADD INDEX abc_ndx (a,b,c,val);
This will speed up the query by getting all needed data from the index only. The source table never needs to be consulted.
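As a quick check, EXPLAIN on the full query should report "Using index" in the Extra column for the derived table's scan, confirming the read is satisfied from the index alone (a sketch, reusing the placeholder names from the question):

EXPLAIN
SELECT c, AVG(max_val)
FROM
(
    SELECT c, MAX(val) AS max_val
    FROM `table`
    GROUP BY a, b
) AS t
GROUP BY c;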
Writing a UDF and calling it in a single SELECT is just masquerading a subselect and creating more overhead than the query needs. Simply placing your full query (one nested pass over the data) in a stored procedure will be more effective than getting most of the data in the UDF and executing single-row selects iteratively (something like O(n log n) running time, with possibly longer 'Sending data' states).
UPDATE 2012-11-27 13:46 EDT
You can access the index without touching the table by doing two things:
Create a decent Covering Index
ALTER TABLE table ADD INDEX abc_ndx (a,b,c,val);
Run the SELECT query I mentioned before
Since all the columns of the query are in the index, the Query Optimizer will only touch the index (or precached index pages). If the table is MyISAM, you can ...
set up the MyISAM table to have a dedicated key cache that can be preloaded on mysqld startup (see the sketch after this list)
run SELECT a,b,c,val FROM table; to load index pages into MyISAM's default keycache
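A sketch of that setup (the cache name and size are illustrative, and the preload-on-startup part would normally go in an init-file or my.cnf):

-- create a dedicated key cache and assign the table's indexes to it
SET GLOBAL hot_cache.key_buffer_size = 268435456;  -- 256 MB; size it to fit the index
CACHE INDEX `table` IN hot_cache;
-- pull the index blocks into the cache up front
LOAD INDEX INTO CACHE `table`;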
Trust me, you really do not want to access index pages against mysqld's will. What do I mean by that?
For MyISAM, the index pages are stored in the table's .MYI file. Each DML statement will take a full table lock.
For InnoDB, the index pages are loaded into the InnoDB Buffer Pool. Consequently, the associated data pages will load into the InnoDB Buffer Pool as well.
You should not try to access index pages behind mysqld's back from Python, Perl, PHP, C++, or Java, because of the constant I/O performed by MyISAM and the MVCC protocols constantly being exercised by InnoDB.
There is a NoSQL paradigm (called HandlerSocket) that would permit low-level access to MySQL tables that can cleanly bypass mysqld's normal access patterns. I would not recommend it since there was a bug in it when using it to issue writes.
UPDATE 2012-11-30 12:11 EDT
From your last comment
I'm using InnoDB, and I can see how the MVCC model complicates things. However, apparently InnoDB stores only one version (the most recent) in the index. The access pattern for the relevant tables is write-once, read-many, so if the index could be accessed, it could provide a single, reliable datum for each key.
When it comes to InnoDB, MVCC is not complicating anything. It can actually become your best friend provided:
you have autocommit enabled (it is enabled by default)
the access pattern for the relevant tables is write-once, read-many
I would expect the accessed index pages to be sitting in the InnoDB Buffer Pool virtually forever if it is read repeatedly. I would just make sure your innodb_buffer_pool_size is set high enough to hold necessary InnoDB data.
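A quick way to sanity-check that (a sketch; on MySQL versions of that era the buffer pool size is set in my.cnf and needs a restart to change):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- physical reads vs. logical read requests: the first should stay tiny relative to the second
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';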
If you just want to access an index outside of MySQL, you will have to use the API for one of the MySQL storage engines. The default engine is InnoDB. See overview here: InnoDB Internals. This describes (at a very high level) both the data layout on disk and the APIs to access it. A more detailed description is here: Embedded InnoDB.
However, rather than write your own program that uses InnoDB APIs directly (which is a lot of work), you might use one of the projects that have already done that work:
HandlerSocket: gives NoSQL access to InnoDB tables, runs in a UDF. See a very informative blog post from the developer. The goal of HandlerSocket is to provide a NoSQL interface exposed as a network daemon, but you could use the same technique (and much of the same code) to provide something that would be used by a query within MySQL.
memcached InnoDB plugin: gives memcached-style access to InnoDB tables.
HailDB: gives NoSQL access to InnoDB tables, runs on top of Embedded InnoDB. See the conference presentation. EDIT: HailDB probably won't work running side-by-side with MySQL.
I believe any of these can run side-by-side with MySQL (using the same tables live), and can be used from C, so they do meet your requirements.
If you can use/migrate to MySQL Cluster, see also NDB API, a direct API, and ndbmemcache, a way to access MySQL Cluster using memcache API.
This is hard to answer without knowing why you are trying to do this, because the implications of different approaches are very different.
You probably cannot access the key directly.
I don't think this would actually make any difference performance-wise.
If you set up covering indexes in the right order, MySQL will not fetch a single page from the hard disk but will deliver the result directly out of the index. There's nothing faster than this.
Note that your subselect may end up in a temptable on disk if its result is getting larger than your tmp_table_size or max_heap_table_size.
Check the status variable Created_tmp_disk_tables if you're not sure.
You can find more on how MySQL uses internal temporary tables here:
http://dev.mysql.com/doc/refman/5.5/en/internal-temporary-tables.html
If you want, post your table structure for a review.
No. There is no practical way for a C program to make use of a MySQL index, accessed by any means other than the MySQL engine, to check whether two (a,b) pairs (keys) are the same or not.
There are more practical solutions which don't require accessing MySQL datafiles outside of the MySQL engine or writing a user-defined function.
Q: Do you know where the mysql index is stored in the file system?
The location of the index within the file system is going to depend on the storage engine for the table. For the MyISAM engine, the indexes are stored in .MYI files under the datadir/database directory; InnoDB indexes are stored within an InnoDB-managed tablespace file. If the innodb_file_per_table variable was set when the table was created, there will be a separate .ibd file for each table under the innodb_data_home_dir/database subdirectory.
Q: Do you know what the format is?
The storage format of each storage engine (MyISAM, InnoDB, et al.) is different, and it also depends on the version. I have some familiarity with how the data is stored, in terms of what MySQL requires of the storage engine. Detailed information about the internals would be specific to each engine.
Q: What makes it impractical?
It's impractical because it's a whole lot of work, and it's going to be dependent on details of storage engines that are likely to change in the future. It would be much more practical to define the problem space, and to write a SQL statement that would return what you want.
As Quassnoi pointed out in his comment to your question, it's not at all clear what particular problem you are trying to solve by creating a UDF or accessing MySQL indexes from outside of MySQL. I'm certain that Quassnoi would have a good way to accomplish what you need with an efficient SQL statement.
I'm working on a project where some clients have internet connection issues.
When the internet connection does not work, we store information in a database located on the client PC.
When we get the connection again, we synchronise the local DB with the central one.
To avoid conflicts in record IDs between the 2 databases, we will use UUIDs [char(36)] instead of auto-increments.
Databases are MySQL with the InnoDB engine.
My question is: will this have an impact on performance for selects, joins, etc.?
Should we use varbinary(16) instead of char(36) to improve performance?
Note: we already have an existing database with 4 GB of data.
We are also open to other suggestions to resolve this offline/online issue.
Thanks
Since you didn't say which database engine is being used (MyISAM or InnoDB), it's difficult to say what the magnitude of the performance implication would be.
However, to cut the story short - yes, there will be performance implications for larger sets of data.
The reason for that is that you require 36 bytes for the primary key index, as opposed to 4 bytes (8 for a BIGINT) for an integer.
I'll give you a hint on how you can avoid conflicts:
The first is to have a different auto-increment offset on each database. If you have 2 databases, you'd have auto-increment values be odd on one and even on the other (see the sketch after these hints).
The second is to have a compound primary key. If you define your primary key as PRIMARY KEY(id, server_id), then you won't get any clashes when you replicate the data into the central DB.
You'll also know where it came from.
The downside is that you need to supply the server_id to every query you do.
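A sketch of the first hint using MySQL's built-in variables (the values are illustrative and would normally be set in each server's my.cnf rather than at runtime):

-- on the client-side database
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 1;  -- generates 1, 3, 5, ...
-- on the central database
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 2;  -- generates 2, 4, 6, ...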
For reasons that are irrelevant to this question, I'll need to run several SQLite databases instead of the more common MySQL for some of my projects. I would like to know how SQLite compares to MySQL in terms of speed and performance regarding disk I/O (the database will be hosted on a USB 2.0 pen drive).
I've read the Database Speed Comparison page at http://www.sqlite.org/speed.html and I must say I was surprised by the performance of SQLite but since those benchmarks are a bit old I was looking for a more updated benchmark (SQLite 3 vs MySQL 5), again my main concern is disk performance, not CPU/RAM.
Also, since I don't have that much experience with SQLite, I'm curious whether it has anything similar to the TRIGGER (on update, delete) events in MySQL's InnoDB engine. I also couldn't find any way to declare a field as UNIQUE like MySQL has, only PRIMARY KEY. Is there anything I'm missing?
As a final question I would like to know if a good (preferably free or open source) SQLite database manager exists.
A few questions in there:
In terms of disk I/O limits, I wouldn't imagine that the database engine makes a lot of difference. There might be a few small things, but I think it's mostly just whether the database can read/write data as fast as your application wants it to. Since you'd be using the same amount of data with either MySQL or SQLite, I'd think it won't change much.
SQLite does support triggers: CREATE TRIGGER Syntax
SQLite does support UNIQUE constraints: column constraint definition syntax. (A short sketch of both features follows after this list.)
To manage my SQLite databases, I use the Firefox Add-on SQLite Manager. It's quite good, does everything I want it to.
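A minimal sketch of both features in SQLite (the table, column, and trigger names are made up for illustration):

CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT UNIQUE,                 -- column-level UNIQUE constraint
    name  TEXT
);

CREATE TABLE audit_log (
    user_id    INTEGER,
    changed_at TEXT
);

-- fires after every UPDATE on users and records when the row changed
CREATE TRIGGER users_after_update AFTER UPDATE ON users
BEGIN
    INSERT INTO audit_log (user_id, changed_at) VALUES (NEW.id, datetime('now'));
END;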
In terms of disk I/O limits, I wouldn't imagine that the database engine makes a lot of difference.
In MySQL/MyISAM the data is stored UNORDERED, so RANGE reads ON PRIMARY KEY will theoretically need to issue several HDD SEEK operations.
In MySQL/InnoDB the data is sorted by PRIMARY KEY, so RANGE reads ON PRIMARY KEY will be done using one DISK SEEK operation (in theory).
To sum that up:
MyISAM - data is written to disk unordered. Slow primary-key range reads if the primary key is not an AUTO_INCREMENT unique field.
InnoDB - data is ordered; bad for flash drives (as data needs to be re-ordered after inserts = additional writes). Very fast for primary-key range reads, slower for writes.
InnoDB is not well suited to flash memory: seeks are very fast (so you won't get much benefit from keeping the data ordered), and the additional writes needed to maintain the order wear out flash memory.
MyISAM vs. InnoDB makes a huge difference on both conventional and flash drives (I don't know how SQLite compares), but I'd rather use MySQL/MyISAM.
I actually prefer using SQLiteSpy http://www.portablefreeware.com/?id=1165 as my SQLite interface.
It supports things like REGEXP which can come in handy.