I want to import data from a MySQL server into Oracle database, and I found a suggestion to use Oracle database link. The Oracle instance is 10.0.2.1, and the MySQL server instance should be 5.1. The connection between two servers and the hard-disk should not be bottle neck.
I want to ask about the performance of Oracle database link? How fast it is? Is it very slow, slow or fast? Is it capable of transferring 1000 rows/second?
Thank you
1000 rows/sec is definitely acheivable... the question is whether it's acheivable on your database/network infrastructure.
Even if we had a detailed knowledge of your infrastructure it would still be very hard to say... it depends on so many factors like network speed, network latency, the size of the database rows being transfered etc.
The only way to tell for sure is to test it.
I would look on this as a good thing - the process of building the test is bound to teach you a lot about how it could work... it will throw up a number of issues that you're going to have to consider at some point - how do you handle backlogs when they form? What is the max through-put you can acheive? etc. You'll learn what kind of data-transfer works best for you (e.g. single rows at a time or larger batches) You might want to try it with a mechanisms other than SQL (e.g. queues)
You say that you don't think the network / hard disk access will be an issue - again, you need to test this assumption. Every database has a limiting factor on the performance somewhere (or they'd be infinitely fast!) and it's quite often disk access that is the limiting factor. In this case I would speculate that the network may be the limiting factor, but there's no way to know for sure without measuring it.
Generally speaking dblink performance limited by network speed, but there are some pitfalls, leading to performance issues:
unnecessary joins between local and remote tables that leads to transferring large amounts of data;
lack of parallelism built into the query (unions help in this case);
implicit sorting on remote database side;
failure to comply with Oracle recommendations such as using of collocated views and hints (mainly DRIVING_SITE and NO_MERGE).
Related
I want to set up a teamspeak 3 server. I can choose between SQLite and MySQL as database. Well I usually tend to "do not use SQLite in production". But on the other hand, it's a teamspeak server. Well okay, just let me google this... I found this:
Speed
SQLite3 is much faster than MySQL database. It's because file database is always faster than unix socket. When I requested edit of channel it took about 0.5-1 sec on MySQL database (127.0.0.1) and almost instantly (0.1 sec) on SQLite 3. [...]
http://forum.teamspeak.com/showthread.php/77126-SQLite-vs-MySQL-Answer-is-here
I don't want to start a SQLite vs MySQL debate. I just want to ask: Is his argument even valid? I can't imagine it's true what he says. But unfortunately I'm not expert enough to answer this question myself.
Maybe TeamSpeak dev's have some major differences in their db architecture between SQLite and MySQL which explains a huge difference in speed (I can't imagine this).
At First Access Time will Appear Faster in SQLite
The access time for SQLite will appear faster at first instance, but this is with a small number of users online. SQLite uses a very simplistic access algorithm, its fast but does not handle concurrency.
As the database starts to grow, and the amount of simultaneous access it will start to suffer. The way servers handle multiple requests is completely different and way more complex and optimized for high concurrency. For example, SQLite will lock the whole table if an update is going on, and queue the orders.
RDBMS's Makes a lot of extra work that make them more Scalable
MySQL for example, even with a single user will create an access QUEUE, lock tables partially instead of allowing only single user-per time executions, and other pretty complex tasks in order to make sure the database is still accessible for any other simultaneous access.
This will make a single user connection slower, but pays off in the future, when 100's of users are online, and in this case, the simple
"LOCK THE WHOLE TABLE AND EXECUTE A SINGLE QUERY EACH TIME"
procedure of SQLite will hog the server.
SQLite is made for simplicity and Self Contained Database Applications.
If you are expecting to have 10 simultaneous access writing at the database at a time SQLite may perform well, but you won't want an 100 user application that constant writes and reads data to the database using SQLite. It wasn't designed for such scenario, and it will trash resources.
Considering your TeamSpeak scenario you are likely to be ok with SQLite, even for some business it is OK, some websites need databases that will be read only unless when adding new content.
For this kind of uses SQLite is a cheap, easy to implement, self contained, perfect solution that will get the job done.
The relevant difference is that SQLite uses a much simpler locking algorithm (a simple global database lock).
Using fine-grained locking (as MySQL and most other DB servers do) is much more complex, and slower if there is only a single database user, but required if you want to allow more concurrency.
I have not personally tested SQLite vs MySQL, but it is easy to find examples on the web that say the opposite (for instance). You do ask a question that is not quite so religious: is that argument valid?
First, the essence of the argument is somewhat specious. A Unix socket would be used to communicate to a database server. A "file database" seems to refer to the fact that communication is through a compiled-in interface. In the terminology of SQLite, it is server-less. Most databases store data in files, so the terminology "file database" is a little misleading.
Performance of a database involves multiple factors, such as:
Communication of query to the database.
Speed of compilation (ability to store pre-compiled queries is a plus here).
Speed of processing.
Ability to handle complex processing.
Compiler optimizations and execution engine algorithms.
Communication of results back to the application.
Having the interface be compiled-in affects the first and last of these. There is nothing that prevents a server-less database from excelling at the rest. However, database servers are typically millions of lines of code -- much larger than SQLite. A lot of this supports extra functionality. Some of it supports improved optimizations and better algorithms.
As with most performance questions, the answer is to test the systems yourself on your data in your environment. Being server-less is not an automatic performance gain. Having a server doesn't make a database "better". They are different applications designed for different optimization points.
In short:
For Local application databses, single user applications, and little simple projects keeping small data SQLite is winner.
For Network database applications, multiuser and concurrency, load balancing and growing data managements, security and roll based authentications, big projects and widely used services you should choose MySql.
In your question I do not know much about teamspeak servers and what kind of data it actually needs to keep in its database but if it just needs a local DBMS and not needs to proccess lots of concurrency and managements SQLite will be my choice.
I'm thinking about moving our production env from a self hosted solution to amazon aws. I took a look at the different services and thought about using RDS as replacement for our mysql instances. The hardware we're using for our master seems to be better than the best hardware we can get when using rds (Quadruple Extra Large DB Instance). Since I can't simply move our production env to aws and see if the performance is still good enough I'd love to make some tests in advance.
I thought about creating a full query log from our current master, configure the rds instance and start to replay the full query log against it. Actually I don't even know if this kind of testing is a good idea but I guess you'll tell me if there are better ways to make sure the performance of mysql won't drop dramatically when making the move to rds.
Is there a preferred tool to replay the full query log?
at what metrics should I take a look while running the test
cpu usage?
memory usage?
disk usage?
query time?
anything else?
Thanks in advance
I'd recommend against replaying the query log - it's almost certainly not going to give you the information you want, and will take a significant amount of effort.
Firstly, you'd need to prepare your database so that replaying the query log won't break constraints when inserting, updating or deleting data, and that subsequent "select" queries will find the records they should find. This is distinctly non-trivial on anything other than a toy database - just taking a back-up and replaying the log doesn't necessarily guarantee the ordering of DML statements will match what happened on production. This may well give you a false sense of comfort - all your select statements return in a few milliseconds, because the data they're looking for doesn't exist!
Secondly, load and performance testing rarely works by replaying what happened on production - that doesn't (usually) reflect the peak conditions that will bring your system to its knees. For instance, most production systems run happily most of the time at <50% capacity, but go through spikes during the day, when they might reach 80% or more of capacity - that's what you care about, can your new environment handle the peaks.
My recommendation would be to use a tool like JMeter to write performance scripts (either directly to the database using the JDBC driver, or through the front end if you've got a web appilcation). Your performance scripts should reflect the behaviour you see from users, and be parameterized so they're not dependent on the order in which records are created.
Set yourself some performance targets (ideally based on current production levels, with a multiplier to cover you against spikes), e.g. "100 concurrent users, with no query taking more than 1 second"), and use JMeter to simulate that load. If you reach it first time, congratulations - go home! If not, look at the performance counters to see where the bottleneck is; see if you can alleviate that bottleneck (or tune your queries, your awesome on-premise hardware may be hiding some performance issues). Typical bottlenecks are CPU, RAM, and disk I/O.
Experiment with different test scenarios - "lots of writes", "lots of reads", "lots of reporting queries", and mix them up.
The idea is to understand the bottlenecks on the system, and see how far you are from those bottleneck, and understand what you can do to alleviate them. Once you know that, your decision to migrate will be far more robust.
In our (currently MySQL) database there are over 120 million records, and we make frequent use of complex JOIN queries and application-level logic in PHP that touch the database. We're a marketing company that does data mining as our primary focus, so we have many large reports that need to be run on a daily, weekly, or monthly basis.
Concurrently, customer service operates on a replicated slave of the same database.
We would love to be able to make these reports happen in real time on the web instead of having to manually generate spreadsheets for them. However, many of our reports take a significant amount of time to pull data for (in some cases, over an hour).
We do not operate in the cloud, choosing instead to operate using two physical servers in our server room.
Given all this, what is our best option for a database?
I think you're going the wrong way about the problem.
Thinking if you drop in NoSQL that you'll get better performance is not really true. At the lowest level, you're writing and retrieving a fair chunk of data. That implies your bottleneck is (most likely) HDD I/O (which is the common bottleneck).
Sticking to the hardware you have momentarily and using a monolithic data storage isn't scalable and as you noticed - has implications when wanting to do something in real-time.
What are your options? You need to scale your server and software setup (which is what you'd have to do with any NoSQL anyway, stick in faster hard drives at some point).
You also might want to look into alternative storage engines (other than MyISAM and InnoDB - for example, one of better engines that seemingly turn random I/O to sequential I/O is TokuDB).
Implementing faster HDD subsystem would also aid to your needs (FusionIO if you have the resources to get it).
Without more information on your end (what the server setup is, what MySQL version you're using and what storage engines + data sizes you're operating with), it's all speculation.
Cassandra still needs Hadoop for MapReduce, and MongoDB has limited concurrency with regard to MapReduce...
... so ...
... 120 mio records is not that much, and MySQL should easily be able to handle that. My guess is an IO bottleneck, or you're doing lots of random reads instead of sequential reads. I'd rather hire a MySQL techie for a month or so to tune your schema and queries, instead of investing into a new solution.
If you provide more information about your cluster, we might be able to help you better. "NoSQL" by itself is not the solution to your problem.
As much as I'm not a fan of MySQL once your data gets large, I have to say that you're nowhere near needing to move to a NoSQL solution. 120M rows is not a big deal: the database I'm currently working with has ~600M in one table alone and we query it efficiently. Managing that much data from an ops perspective is the problem; querying it isn't.
It's all about proper indexes and the correct use of them when joining, and secondarily memory settings. Find your slow queries (mysql slow query log FTW!), and learn to use the explain keyword to understand whey they are slow. Then tweak your indexes so your queries are efficient. Further, make sure you understand MySQL's memory settings. There are great pages in the docs explaining how they work, and they aren't that hard to understand.
If you've done both of those things and you're still having problems, make sure disk I/O isn't an issue. Then you should look in to another solution for querying your data if it is.
NoSQL solutions like Cassandra have a lot of benefits. Cassandra is fantastic at writing data. Scaling your writes is very easy--just add more nodes! But the tradeoff is that it's harder to get the data back out. From a cost perspective, if you have expertise in MySQl, it's probably better to leverage that and scale your current solution until it hits a limit before completely switching your underlying architecture.
So I have a website that could eventually get some pretty high traffic. My DB implementation is in SQL Server 2008 at the moment. I really only have 2 tables and a few stored procs. Most of the DB could be re-designed to work without joining (although it wouldn't make sense when I can join so easily within SQL Server).
I heard that sites like Digg and Facebook use NoSQL databases for a lot of their basic data access. Is this something worth looking into, or will SQL Server not really slow me down that bad?
I use paging on my site (although this might change in the future), and I also use AJAX'd data access for most of the "live" stuff, so it doesn't really seem to be a performance hindrance at the moment, but I'm afraid it will be as the data starts expanding exponentially.
Am I going to gain a lot of performance my moving to NoSQL? Honestly, right now I don't even completely understand NoSQL, so any tips on how this will help me improve the better.
Thanks guys.
Actually Facebook use a relational database at its core, see SOCC Keynote Address: Building Facebook: Performance at Massive Scale. And so do many other web-scale sites, see Why does Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, CouchDB etc?. There is also a discussion of how to scale SQL Server to web-scale size, see How do large-scale sites and applications remain SQL-based? which is based on MySpace's architecture (more details at Scale out SQL Server by using Reliable Messaging). I'm not saying that NoSQL doesn't have its use cases, I just want to point out that there are many shades of gray between white and black.
If you're afraid that your current solution will not scale then perhaps you should look at what are the factors that prevent scalability with your current solution. Test data is cheap to produce, load the 'exponentially increased' data volume and run your test harness, see where it cracks. None of the NoSQL solutions will bring magic off-the-shelf scalability, they all require you to understand how to use them effectively and deploy them correctly. And they also require you to test with large volumes if you want to ensure success at scale. Same for traditional relational solutions.
Sql Server scales pretty well. For example, Stack Overflow used it to serve you this very page. Facebook and Google might use a form of nosql, but even if you make it really big you're unlikely to rise to that level.
With a simple table structure and data that fits on one server, it doesn't matter much what platform you use. There are a several possible reasons to need to move to NoSQL:
Data scaling - SQL works best when all the data fits on one server (up to a few TB). The reason a lot of NoSQL stores don't have join is that they were designed not to require all the objects to be on one server.
Performance scaling - NoSQL stores do tend to be faster at handling high traffic, but not necessarily by enough to matter. You can improve SQL performance quite a lot with replication and caching as long as you aren't running into data size issues. Writes generally do have to run on the one server, but in most cases you will need to improve read performance long before write performance becomes an issue.
Complex data access - some types of queries simply don't fit well into a relational model. Graph and set stores work quite differently from relational databases so are a better fit for some applications.
Easier development - If you don't already have a SQL database and all the code to support it, using a schemaless datastore can save quite a bit of development time.
I don't think so you have to move your database from SQL to NoSQL unless and untill you are serving thousands of TB data. If you properly normalize your tables and serve the data and also need to set proper archive mechanism it should work.
If you still have question what to choose and how, than check this. Let's assume that you have decided to move on to NoSQL database than there are lot of market player. Just have a look at the list which is again depending upon your need and type of data you have.
Am I going to gain a lot of performance my moving to NoSQL?
It depends.
Check out this article for 7 reasons when you DON'T want to use NoSQL. If none is your case, then read further.
The main advantage of Document-based NoSQL for the traditional enterprise needs is cheaper hosting at high scale due to lower CPU usage on querying denormalised data (the most often request). Key points:
The CPU is going nuts on JOINs and GROUP BYs in the SQL queries, when a denormilised data structure implies no/less JOINs, hence less stress on CPU.
CPU is the most expensive resource in the cloud, then storage is the cheapest. And denormalised data trades higher storage for lower CPU.
How to get there?
Master the DDD (Domain-Driven Design).
Gain good understanding of CQRS (Command Query Responsibility Segregation) and Eventual consistency.
Understand your domain and business processes.
Design model, which is tuned to the access patterns.
Review.
Repeat steps 3 - 5.
IMPORTANT NOTE: I recieved many answers and I thank you all. But all the answers are more comments than answers. My question is related on the number of roundtrips per RDBMS. An experienced person told me that MySQL has less roundtrips than Firebird. I would like that the answer stays in the same area. I agree that this is not the first thing to consider, there are many others (application design, network settings, protocol settings...), anyway I 'd like to recieve an answer to my question, not a comment. By the way I found the comments all very useful. Thanks.
When the latency is high ("when pinging the server takes time") the server roundtrips make the difference.
Now I don't want to focus on the roundtrips created in programming, but the roundtrips that occur "under the hood" in the DB engine+Protocol+DataAccessLayer.
I have been told that FireBird has more roundtrips than MySQL. But this is the only information I know.
I am currently supporting MS SQL but I'd like to change RDBMS, so to make a wise choice I would like to include also this point into "my RDBMS comparison feature matrix" to understand which is the best RDBMS to choose as an alternative to MS SQL.
So the bold sentence above would make me prefer MySQL to Firebird (for the roundtrips concept, not in general), but can anyone add informations?
And MS SQL where is it located? Is someone able to "rank" the roundtrip performance of the main RDBMS, or at least:
MS SQL, MySql, Postegresql, Firebird (I am not interested in Oracle since it is not free, and if I have to change I would change to a free RDBMS).
Anyway MySql (as mentioned several times on stackoverflow) has a not clear future and a not 100% free license. So my final choice will probably dall on PostgreSQL or Firebird.
Additional info:
somehow you can answer my question by making a simple list like:
MSSQL:3;
MySQL:1;
Firebird:2;
Postgresql:2
(where 1 is good, 2 average, 3 bad). Of course if you can post some links where the roundtrips per RDBMSs are compared it would be great
Update:
I use Delphi and I plan to use DevArt DAC (UNIDAC), so somehow the "same" Data Access component is used, so if there are significant roundtrip differences they are due to the different RDBMS used.
Further update:
I have a 2 tier application (inserting a middle tier is not an option), so by choosing a RDBMS that is optimized "roundtrip-side" I have a chance to further improve the performance of the application. This kind of "optimization" is like "buy a faster internet connection" or "put more memory on the server" or "upgrade the server CPUs". Anyway also those "optimizations" are important.
Why are you concentrating on roundtrips? Normally they shouldn't affect your performance unless you had a very slow and unreliable network. For example, the difference between ODBC and OLEDB drivers for any database is nearly an order of magnitude in favor of OLEDB.
If you go to either MySQL or Firebird using ODBC instead of OLEDB/ADO.NET drivers you incur an overhead several orders of magnituted greater than the roundtrips you might save.
How your application is coded and how and when data are accessed and transferred have a much greater impact in slow connection or high latency situations than the db network protocol itself. Some database protocols may be tuned to better work in uncommon scenarios, i.e. increasing or decreasing the data packet size.
You may also encounter slow down at the TCP/IP layer itself, which could require TCP/IP tuning as well.
Until v2.1, Firebird certainly creates more traffic than MS SQL Server. I have a friend which developed a MSSQL C/S application here in Brazil where the db is hosted in a datacenter. The client apps runs from many stores directly connecting on server over VPN/Internet using end-user broadband connections (1Mbps, mostly) for 5+ years and no trouble with it. The distances involved range from few hundred to thousands of kilometers from datacenter.
After v2.1, I can't figure out if this remains true, because I haven't made a fair comparison since and Firebird's remote protocol had been changed to optimize network traffic on slow connections. More on FirebirdSQL site.
Can't say on PostGres ou MySQL, since I didn't used any.
I can't give round trip details but i was in a very similar situation a while back when i was trying to find alternatives to MS SQL due to budgeting. myself and 4 others spent some time comparing MySQL, Postgres, and FireBird.
Having worked with MySQL for a long time we quickly ruled it out for most of our larger projects. The decision fell between Postgres and FireBird. One thing just starting off was the lack of popular support/documentation with FireBird in contrast to Postgres. Our bench tests always either had Postgres on top or on level with FireBird, never under. In terms of features; Postgres again answered our needs while FiredBird had us needing to come up with creative solutions.
Below is a feature comparison chart. i'll admit it is now a bit dated but still very helpful:
Here is also a long forum thread discussing the difference
Good luck!
Sometimes the "roundtrips" are also in the protocol or data access layer, not the "DB engine"
I will not rank the client-server DBMS's from the roundtrips side. There are a lot of options to make one DBMS the best (ask SQL Server to use the default cursor), and other the worse (create an Oracle cursor with nested datasets).
What you are looking for is, probably, the general approach, oriented on the trafic minimization and the independent work of a client from a server. That are the middle-tier data access libraries.
So, if your application is so sensitive to the trafic optimization, then look for such libraries like the DataAbstract, kbmMW or ThinDAC.