I know MySQL and am trying to learn SQL Server.
I am looking for features/keywords that are in SQL Server and not in MySQL.
E.g. TOP, CLUSTERED/NONCLUSTERED indexes, etc.
Any links/pointers are appreciated.
Thanks!
I hope you don't mind me pasting from an answer I gave to someone else. The question was about performance in general, but in covering all aspects of performance I touched on most of the engine features as well. Giving yourself a thorough education in performance will also give you a thorough education in features.
So here are basic performance items to research. This is slanted toward MSSQL but not exclusive to it:
The basic logical data storage architecture of the system you're working with. For example, b-tree, extent, page, the sizes and configurations of these, how much data is read at once, the maximum size of a row (if that is an issue in your DBMS), what is done with out-of-row data (again if that is an issue in your DBMS).
Indexes, constraints, and basic ordering of tables and row data: heaps, clustered, nonclustered, uniqueness and non-uniqueness of these indexes, primary keys, unique constraints, included columns. For all of these indexes, whether nulls are allowed, just one null is allowed, or none. Uniqueifiers. Covering indexes.
SARGability (look up SARG which is short for "Search ARGument").
Foreign keys, defaults, cascade deletes/updates, and their effect on inserts and deletes.
Whether NULLs require any storage space and if this is affected by column position. The number of bytes required to store each data type. When trailing spaces are stored or not stored for string data types. Packed vs. non-packed data types (e.g. float and decimal vs. integer). The concept of rows per page (or smallest unit of disk read) in both clustered and nonclustered indexes.
Fill factor, fragmentation, statistics, index selectivity, page splits, forwarding pointers.
When "batching" an operation can boost performance and why, and how to do it most efficiently.
INNER, LEFT, RIGHT, FULL, and CROSS JOINs. Semi-joins (EXISTS) and anti-semi-joins (NOT EXISTS). Any other language-specific syntax such as USING in MySQL and CROSS APPLY/OUTER APPLY in SQL Server. The effect of putting a join condition in the ON clause of an outer join vs. putting it in the WHERE clause (a short sketch follows this list).
Independent subqueries, correlated subqueries, derived tables, common table expressions, understanding that EXISTS and NOT EXISTS generally appear to introduce a correlated subquery, but usually are seen in the execution plan as joins (semi or anti-semi joins).
Viewing and understanding execution plans either graphically or in text. Viewing the statistics/profile of CPU, reads, writes, and duration used by whole SQL batches or individual statements. Understanding the limitations of execution plans & profiles, which practically speaking means you generally have to use both to optimize well. Caching and reuse of execution plans, expiration of plans from the cache. Parameter sniffing and parameterization. Dynamic SQL in relation to these.
The relative costs of converting data types to other data types or just working with those data types. (For example, a solid rule of thumb is that working with strings is more costly than working with numbers.)
The generally exorbitant cost of row-by-row processing as opposed to set-based. The proper use for cursors (rare, though sometimes called for). How functions can hide execution plan costs. The tempting trap of writing functions that get called for every row when the problem could be solved in sets (though this can be tricky to learn how to see, especially because traditional application programming tends to train people to think in terms of functions like this).
Seeks, scans, range scans, "skip" scans. Bookmark lookups aka an index seek followed by table seek to the same table using the value found in the index seek. Loop, merge, and hash joins. Eager & lazy spools. Join order. Estimated row count. Actual row count.
When a query is too big and should be split into more than one, using temp tables or other means.
Multi-processor capabilities and the benefits and gotchas of parallel execution.
Tempdb or other temp file usage. Lifetime and scope of temp tables, table variables (if your DB engine has such). Whether statistics are collected for these (in SQL Server temp tables use statistics and table variables do not).
Locking, lock granularity, lock types, lock escalation, blocks, deadlocks. Data access pattern (such as UPDATE first, INSERT second, DELETE last). Intent, shared, exclusive locks. Lock hints (e.g. in SQL Server UPDLOCK, HOLDLOCK, READPAST, TABLOCKX).
Transactions and transaction isolation. Read committed, read uncommitted, repeatable read, serializable, snapshot, others I can't remember now.
Data files, file groups, separate disks, transaction logs, simple recovery, full recovery, oldest open transaction aka minimum log sequence number (LSN), file growth.
Sequences, arrays, lists, identity columns, windowing functions, TOP/rownum/limiting number of rows returned.
Materialized views aka indexed views. Calculated columns.
Relationship cardinalities: 1 to 1, 1 to 0 or 1, 1 to many, many to many.
UNION, UNION ALL, and other "vertical" joins. SQL Server has EXCEPT and INTERSECT, too.
Expansion of IN () lists to OR. Expansion of IsNull(), Coalesce(), or other null-handling mechanisms to CASE statements.
The pitfalls of using DISTINCT to "fix" a query instead of dealing with the underlying problem.
How linked servers do NOT do joins across the link well: queries to a linked server often become row-by-row, and large amounts of data can be pulled across the link to perform a join locally even when this isn't sensible.
The pitfall of doing any I/O or error-prone task in a trigger. The scope of triggers (whether they fire for every row or once for each data operation).
Making the front-end, GUI, reporting tool, or other client do client-type work (such as formatting dates or numbers as strings) instead of the DB engine.
Error handling. Rolling back transactions and how a ROLLBACK always rolls back to the outermost transaction no matter how deeply nested, while a COMMIT only commits one level of work.
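To make the ON-vs-WHERE item above concrete, here is a minimal sketch against hypothetical Customers and Orders tables (the names are illustrative, not from the original question): the same predicate behaves very differently depending on where it is placed.

-- condition in the ON clause: unmatched customers are kept, with NULL order columns
SELECT c.CustomerID, o.OrderID
FROM Customers c
LEFT JOIN Orders o
  ON o.CustomerID = c.CustomerID
 AND o.OrderYear = 2023;

-- condition in the WHERE clause: the NULL rows are filtered out,
-- so the query silently behaves like an INNER JOIN
SELECT c.CustomerID, o.OrderID
FROM Customers c
LEFT JOIN Orders o
  ON o.CustomerID = c.CustomerID
WHERE o.OrderYear = 2023;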
That is not a good way to learn a new DBMS. The differences are not only in the SQL dialect, but in how the DBMS works, its behaviour, its common practices, etc. Each new DBMS should be learned as if it were your first.
NEWID() is another keyword that differs, but I think keywords are not the only thing you need to look for. Stored procedure definitions also have differences. Security is different (the dbo user, the use of Windows groups/users in MSSQL), networking is different, and I agree with zerkms's answer too that behavior and practices are different.
delete from tableA where columnA in (
    select columnA from (
        select a.*, row_number() over (order by columnA) rn
        from tableA a
    ) ranked
    where ranked.rn > 100
)
I have a theoretical question which pertains to the SQL function SUM().
Imagine we have a table which contains a column called "value".
"value" is a DECIMAL number, either positive or negative.
In our potential solution, we'd like to run a SUM() across all rows for column "value"
SELECT SUM(value)
FROM table
No problems so far, but the dataset is potentially millions of rows. Possibly even hundreds of millions of rows as the data will be retained over years.
So my questions are:
Can you run SUM() across hundreds of millions of rows?
What kind of performance could I expect on a query across that many rows? We haven't settled on a platform yet, but we're looking at MySQL or SQL Server.
You can take a look at the columnstore indexes in SQL Server. In short, you are able to create a columnstore index on your tables - different from the traditional rowstore index.
These indexes are specially designed for optimizing aggregate queries when huge amounts of data are involved (for example, in data warehouse star and snowflake schemas).
From the docs:
Columnstore indexes can achieve up to 100x better performance on analytics and data warehousing workloads and up to 10x better data compression than traditional rowstore indexes.
because:
Data compression - you gain many benefits here; for example, columnstore indexes read compressed data from disk, which means fewer bytes of data need to be read into memory;
Column elimination - columnstore indexes skip reading columns that are not required for the query result, which further reduces I/O for query execution and therefore improves query performance (unlike rowstore indexes);
Rowgroup elimination - optimize table scans using metadata to eliminate specific rowgroups based on your filtering criteria;
Batch mode execution - prior to SQL Server 2019, only queries involving such indexes can benefit from batch mode processing, which reduces execution time further (check this video to see how effective this mode is).
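As a rough sketch of how that could look (assuming a hypothetical dbo.Transactions table holding the "value" column from the question), a single nonclustered columnstore index is enough for the SUM query to scan compressed column segments instead of whole rows:

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Transactions_ColumnStore
    ON dbo.Transactions ([value], created_at);

-- the aggregate now reads compressed column segments rather than full rows
SELECT SUM([value])
FROM dbo.Transactions;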
You may certainly run SUM() across an entire table, and the performance would depend roughly on how many records that table has. Note that things like indices would not really help performance in this case, because SQL Server has to touch every record to compute the sum.
If running SUM on the entire table in production might not scale well, then one option to consider would be to maintain the sum in a separate table. Then, when a record gets inserted or deleted, you can use a trigger to update the running total appropriately. This way, accessing the sum would be roughly constant time, though you would have some additional overhead because of the trigger logic.
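A minimal sketch of that trigger approach in SQL Server, assuming a hypothetical dbo.Transactions table and a one-row dbo.TransactionTotals summary table (names and types are illustrative):

CREATE TABLE dbo.TransactionTotals (total_value DECIMAL(18, 2) NOT NULL);
INSERT INTO dbo.TransactionTotals (total_value) VALUES (0);
GO

CREATE TRIGGER trg_Transactions_Total
ON dbo.Transactions
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- add newly inserted values and subtract deleted ones
    UPDATE dbo.TransactionTotals
    SET total_value = total_value
        + COALESCE((SELECT SUM([value]) FROM inserted), 0)
        - COALESCE((SELECT SUM([value]) FROM deleted), 0);
END;
GO

Reading the total is then just SELECT total_value FROM dbo.TransactionTotals, at roughly constant cost.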
I'll throw out a couple of ideas. If the data sets that you are working with are absolutely massive, consider running an overnight job to create a view, or some kind of temp table, and refer to this aggregated data blob when you get into the office in the morning. Or, move everything to the cloud, like Azure Databricks, for instance, and run these jobs in Spark. Spark is blazing fast and runs jobs in parallel, so everything is done super-fast. Good luck.
Good day everyone, I'm currently doing research on search algorithm optimization.
As of now, I'm researching on the Database.
In a database w/ SQL Support.
I can write the query for a specific table.
Select Number from Table1 where Name = 'Test';
Select * from Table1 where Name = 'Test';
Query 1 selects Number from Table1 for rows where Name is 'Test', and query 2 selects all columns for rows where Name is 'Test'.
I understand what the queries do; however, what I'm interested in learning is the approach of the search.
Is it just a plain linear search - going from the first row to the nth row and grabbing whatever matches the condition, thus having O(n) speed - or does it have a specialized algorithm that speeds up the process?
If there are no indexes, then yes, a linear search is performed.
But databases typically use a B-tree index when you specify a column (or columns) as a key. These are special data structure formats that are specifically tuned (high B-tree branching factors) to perform well on magnetic disk hardware, where the most significant time-consuming factor is the seek operation (the magnetic head has to move to a different part of the file).
You can think of the index as a sorted/structured copy of the values in a column. It can be determined quickly whether the value being searched for is in the index. If it finds it, then it will also find a pointer that points back to the correct location of the corresponding row in the main data file (so it can go and read the other columns in the row). Sometimes a multi-column index contains all the data requested by the query, and then it doesn't need to skip back to the main file; it can just read what it found and then it's done.
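For instance, against the hypothetical Table1 from the question, the first index below supports the Name lookup, and the second is a multi-column index that also "covers" the first query (the syntax works in both MySQL and SQL Server):

-- single-column index for the Name lookup
CREATE INDEX idx_table1_name ON Table1 (Name);

-- multi-column index that also covers the first query,
-- so the engine never needs to go back to the main data file
CREATE INDEX idx_table1_name_number ON Table1 (Name, Number);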
There's other types of indexes, but I think you get the idea - duplicate data and arrange it in a way that's fast to search.
On a large database, indexes make the difference between waiting a fraction of a second, vs possibly days for a complex query to complete.
By the way, B-trees aren't a simple, easy-to-understand data structure, and the traversal algorithm is also complex. In addition, the traversal is even uglier than most of the code you will find, because in a database they are constantly loading/unloading chunks of data from disk and managing them in memory, and this significantly uglifies the code. But if you're familiar with binary search trees, then I think you understand the concept well enough.
Well, it depends on how the data is stored and what you are trying to do.
As already indicated, a common structure for maintaining entries is a B+ tree. The tree is well optimized for disk since the actual data is stored only in leaves - and the keys are stored in the internal nodes. It usually allows a very small number of disk accesses since the top k levels of the tree can be stored in RAM, and only the few bottom levels will be stored on disk and require a disk read for each.
Another alternative is a hash table. You maintain in memory (RAM) an array of "pointers" - these pointers indicate a disk address, which contains a bucket that includes all entries with the corresponding hash value. Using this method, you only need O(1) disk accesses (which is usually the bottleneck when dealing with databases), so it should be relatively fast.
However, a hash table does not allow efficient range queries (which can be efficiently done in a B+ tree).
The disadvantage of all of the above is that it requires a single key - i.e. if the hash table or B+ tree is built according to the field "id" of the relation, and then you search according to "key" - it becomes useless.
If you want to guarantee fast search for all fields of the relation - you are going to need several structures, each according to a different key - which is not very memory efficient.
Now, there are many optimizations to be considered according to the specific usage. If, for example, the number of searches is expected to be very small (say, smaller than log(log(N)) of the total operations), maintaining a B+ tree is overall less efficient than just storing the elements as a list and, on the rare occasion of a search, just doing a linear search.
Very good question, but it can have many answers depending on the structure of your table and how it is normalized...
Usually, to perform a search in a SELECT query, the DBMS sorts the table (it uses merge sort because this algorithm is good for disk I/O, not quicksort); then, depending on the indexes (if the table has any), it just matches the values. If the structure is more complex, the DBMS can perform a search in a tree, but that goes deeper than I can cover here; let me go back over the notes I took.
I recommend activating the query execution plan; here is an example of how to do so in SQL Server 2008. Then execute your SELECT statement with the WHERE clause and you will be able to begin understanding what is going on inside the DBMS.
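In SQL Server you can also turn on per-statement I/O and timing statistics before running the query; a quick sketch using the Table1/Name query from the question:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Number
FROM Table1
WHERE Name = 'Test';
-- the Messages output then shows logical reads, scan counts, and CPU/elapsed time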
Given 2 large tables (imagine hundreds of millions of rows), each with a string column, how do you get the diff?
Check out the open-source Percona Toolkit - specifically, the pt-table-sync utility.
Its primary purpose is to sync a MySQL table with its replica, but since its output is the set of MySQL commands necessary to reconcile the differences between two tables, it's a natural fit for comparing the two.
What it actually does under the hood is a bit complex, and it actually uses different approaches depending on what it can tell about your tables (indexes, etc.), but one of the basic ideas is that it does fast CRC32 checksums on chunks of the indexes, and if the checksums don't match, it examines those records more closely. Note that this method is much faster than walking both indexes linearly and comparing them.
It only gets you part of the way, though. Because the generated commands are intended to sync a replica with its master, they simply replace the current contents of the replica for all differing records. In other words, the commands generated modify all fields in the record (not just the ones that have changed). So once you use pt-table-sync to find the diffs, you'd need to wrap the results in something to examine the differing records by comparing each field in the record.
But pt-table-sync does what you already knew to be the hard part: detecting diffs, really fast. It's written in Perl; the source should provide good breadcrumbs.
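As a starting point, the host, database, and table names below are placeholders; the --print option makes pt-table-sync only print the reconciling SQL rather than executing it:

pt-table-sync --print h=host1,D=mydb,t=big_table h=host2,D=mydb,t=big_table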
I'd think about creating an index on that column in each DB, then using a program to process through each DB in parallel using an ordering on that column. It would advance in both as you have records that are equal, and in one or the other as you find they are out of sync (keeping track of the out-of-sequence records). The creation of the index could be very costly in terms of both time and space (at least initially). Keeping it updated, though, if you are going to continue adding records, may not add too much overhead. Once you have the index in place you should be able to process the difference in linear time. Producing the index -- assuming you have enough space -- should be an O(n log n) operation.
I am trying to apply for a job which asks for experience handling large-scale data sets using a relational database, like MySQL.
I would like to know which specific skill sets are required for handling large scale data using MySQL.
Handling large scale data with MySQL isn't just a specific set of skills, as there are a bazillion ways to deal with a large data set. Some basic things to understand are:
Column Indexes, how, why, and when they're used, and the pros and cons of using them.
Good database structure to balance between fast writes and easy reads.
Caching, leveraging several layers of caching and different caching technologies (memcached, redis, etc)
Examining MySQL queries to identify bottlenecks and understanding the MySQL internals to see how queries get planned and executed by the database server, in order to increase query performance
Configuring the MySQL server to be able to handle a lot of concurrent connections and access its data fast. Hardware bottlenecks, and the advantages of using different technologies to speed up your hardware (for example, storing your MySQL data on a RAID5 array to increase IO performance)
Leveraging built-in MySQL technology (like Replication) to off-load read traffic
These are just a few things that get thought about in regards to big data in MySQL. There's a TON more, which is why the company is looking for experience in the area. Knowing what to do, or having experience with things that have worked or failed for you is an absolutely invaluable asset to bring to a company that deals with high traffic, high availability, and high volume services.
edit
I would be remiss if I didn't mention a source for more information. Check out High Performance MySQL. This is an incredible book, and has a plethora of information on how to make MySQL perform in all scenarios. Definitely worth the money, and the time spent reading it.
edit -- good structure for balanced writes and reads
With this point, I was referring to the topic of normalization / de-normalization. If you're familiar with DB design, you know that normalization is the separation of data as to reduce (eliminate) the amount of duplicate data you have about any single record. This is generally a fantastic idea, as it makes tables smaller, faster to query, easier to index (individually) and reduces the number of writes you have to do in order to create/update a new record.
There are different levels of normalization (as #Adam Robinson pointed out in the comments below) which are referred to as normal forms. Almost every web application I've worked with hasn't had much benefit beyond 3NF (3rd Normal Form), the definition of which, if you were to read that Wikipedia link above, will probably make your head hurt. So in layman's terms (at the risk of dumbing it down too far...) a 3NF structure satisfies the following rules:
No duplicate columns within the same table.
Create different tables for each set of related data. (Example: a Companies table which has a list of companies, and an Employees table which has a list of each company's employees)
No sub-sets of columns which apply to multiple rows in a table. (Example: zip_code, state, and city is a sub-set of data which can be identified uniquely by zip_code. These 3 columns could be put in their own table and referenced by the Employees table (in the previous example) by zip_code.) This eliminates large sets of duplication within your tables, so any change that is required to the city/state for any zip code is a single write operation instead of 1 write for every employee who lives in that zip code. (A table sketch follows this list.)
Each sub-set of data is moved to its own table and is identified by its own primary key (this is touched on/explained in the example for #3).
Remove columns which are not fully dependent on the primary key. (An example here might be if your Employees table has start_date, end_date, and years_employed columns. The start_date and end_date are both dependent on any single employee row, but years_employed can be derived by subtracting start_date from end_date. This is important because as end_date increases, so does years_employed, so if you were to update end_date you'd also have to update years_employed: 2 writes instead of 1.)
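To make rule 3 concrete (the zip_code example above), here is a minimal sketch in MySQL-style syntax; all table and column names are illustrative:

CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city     VARCHAR(100) NOT NULL,
    state    VARCHAR(50)  NOT NULL
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY AUTO_INCREMENT,
    name        VARCHAR(100) NOT NULL,
    zip_code    VARCHAR(10)  NOT NULL,
    FOREIGN KEY (zip_code) REFERENCES zip_codes (zip_code)
);

Updating the city for a zip code is now one UPDATE against zip_codes instead of one write per employee in that zip code.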
A fully normalized (3NF) database table structure is great if you've got a very heavy write load. If your server is doing a lot of writes, it's very easy to write small bits of data, especially when each write touches fewer rows. The drawback is that all your reads become much more expensive, because you typically have to run a lot of JOIN queries when you're pulling data out. JOINs are expensive and harder to create proper indexes for when you're using WHERE clauses that span the relationship and when sorting the result sets.
If you have to perform a lot of reads (SELECTs) on your data set, using a 3NF structure can cause you some performance problems. This is because as your tables grow, you're asking MySQL to cram more and more table data (and indexes) into memory. Ideally this is what you want, but with big data sets you're just not going to have enough memory to fit all of this at once. This is when MySQL starts to create temporary tables and has to use the disk to load data and manipulate it. Once MySQL becomes reliant on the hard disk to serve up query results, you're going to see a significant performance drop. This is less so the case with solid state disks, but they are super expensive, and (imo) are not mature enough to use on mission critical data sets yet (I mean, unless you're prepared for them to fail and have a very fast backup recovery system in place... then use them and go nuts!).
This is the balancing part. You have to decide what kind of traffic the data you're reading/writing is going to be serving more of, and design that to be fast. In some instances, people don't mind writes being slow because they happen less frequently. In other cases, writes have to be very fast, and the reads don't have to be fast because the data isn't accessed that often (or at all, or even in real time).
Workloads that require a lot of reads benefit the most from a middle-tier caching layer. The idea is that your writes are still fast (because your tables are still normalized) and your reads can be slow because you're going to cache the results (in memcached or something competitive to it), so you don't hit the database very frequently. The drawback here is, if your cache gets invalidated quickly, then the cache is not reducing the read load by a meaningful amount, and that results in no added performance (and possibly even more overhead to check/invalidate the caches).
With workloads that have the requirement for high throughput in writes, with data that is read frequently, and can't be cached (constantly changes), you have to come up with another strategy. This could mean that you start to de-normalize your tables, by removing some of the normalization requirements you choose to satisfy, or something else. Instead of making smaller tables with less repetitive data, you make larger tables with more repetitive / redundant data. The advantage here is that your data is all in the same table, so you don't have to perform as many (or, any) JOINs to pull the data out. The drawback...writes are more expensive because you have to write in multiple places.
So with any given situation the developer(s) have to identify what kind of use the data structure is going to have to serve, and balance between any number of technologies and paradigms to achieve an acceptable solution that meets their needs. No two systems or solutions are the same which is why the employer is looking for someone with experience on how to deal with these large datasets. Finding these solutions is not something that can really be learned out of a book, it typically takes some experience in the field and experience with how different solutions performed.
I hope that helps. I know I rambled a bit, but it's really a lot of information. This is why DBAs make the big dollars (:
You need to know how to process the data in "chunks". That means instead of simply trying to manipulate the entire data set, you need to break it into smaller, more manageable pieces. For example, if you had a table with 1 billion records, a single update statement against the entire table would likely take a long time to complete, and may possibly bring the server to its knees.
You could, however, issue a series of update statements within a loop that would update 20,000 records at a time. Each iteration of the loop you would increment your range/counters/whatever to identify the next set of records.
Also, you commit your changes at the end of each loop, thereby allowing you to stop the process and continue where you left off.
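A hedged, MySQL-flavoured sketch of that loop, assuming a hypothetical big_table with a status column; each pass touches at most 20,000 rows and the loop stops when nothing is left to change:

DELIMITER //
CREATE PROCEDURE archive_in_chunks()
BEGIN
    REPEAT
        -- each pass updates at most 20,000 rows and commits (autocommit) before the next
        UPDATE big_table
        SET status = 'archived'
        WHERE status = 'pending'
        LIMIT 20000;
    UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;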
This is just one aspect of managing large data sets. You still need to know:
how to perform backups
proper indexing
database maintenance
You can read about and learn how to handle large datasets with MySQL, but that is not equivalent to having actual experience.
Straight and simple answer: study partitioned databases and find the appropriate MySQL data structure types for large-scale datasets, along the lines of a partitioned database architecture.
I once had a MySQL database table containing 25 million records, which made even a simple COUNT(*) query take minutes to execute. I ended up making partitions, separating them into a couple of tables. What I'm asking is, is there any pattern or design technique to handle this kind of problem (huge number of records)? Is MSSQL or Oracle better at handling lots of records?
P.S
The COUNT(*) problem stated above is just an example case; in reality the app does CRUD functionality and some aggregate queries (for reporting), but nothing really complicated. It's just that it takes quite a while (minutes) to execute some of these queries because of the table volume.
See Why MySQL could be slow with large tables and COUNT(*) vs COUNT(col)
Make sure you have an index on the column you're counting. If your server has plenty of RAM, consider increasing MySQL's buffer size. Make sure your disks are configured correctly -- DMA enabled, not sharing a drive or cable with the swap partition, etc.
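For reference, those buffer settings usually live in my.cnf; the values below are purely illustrative and should be sized to your RAM and storage engine:

[mysqld]
innodb_buffer_pool_size = 8G    # InnoDB data and index cache
key_buffer_size = 1G            # MyISAM index cache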
What you're asking with "SELECT COUNT(*)" is not easy.
In MySQL, the MyISAM non-transactional engine optimises this by keeping a record count, so SELECT COUNT(*) will be very quick.
However, if you're using a transactional engine, SELECT COUNT(*) is basically saying:
Exactly how many records exist in this table in my transaction ?
To do this, the engine needs to scan the entire table; it probably knows roughly how many records exist in the table already, but to get an exact answer for a particular transaction, it needs a scan. This isn't going to be fast using MySQL InnoDB, it's not going to be fast in Oracle, or anything else. The whole table MUST be read (excluding things stored separately by the engine, such as BLOBs).
Having the whole table in RAM will make it a bit faster, but it's still not going to be fast.
If your application relies on frequent, accurate counts, you may want to make a summary table which is updated by a trigger or some other means.
If your application relies on frequent, less accurate counts, you could maintain summary data with a scheduled task (which may impact performance of other operations less).
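One hedged sketch of that scheduled-task idea in MySQL (it assumes the event scheduler is enabled and a hypothetical big_table): refresh a small summary table every few minutes so reads never pay for the full scan.

CREATE TABLE row_count_summary (
    table_name VARCHAR(64) PRIMARY KEY,
    row_count  BIGINT NOT NULL,
    updated_at DATETIME NOT NULL
);

CREATE EVENT refresh_big_table_count
ON SCHEDULE EVERY 5 MINUTE
DO
    -- REPLACE overwrites the previous count for this table
    REPLACE INTO row_count_summary (table_name, row_count, updated_at)
    VALUES ('big_table', (SELECT COUNT(*) FROM big_table), NOW());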
Many performance issues around large tables relate to indexing problems, or a lack of indexing altogether. I'd definitely make sure you are familiar with indexing techniques and the specifics of the database you plan to use.
With regards to your slow count(*) on the huge table, I would assume you were using the InnoDB table type in MySQL. I have some tables with over 100 million records using MyISAM under MySQL and the count(*) is very quick.
With regards to MySQL in particular, there are even slight indexing differences between InnoDB and MyISAM tables which are the two most commonly used table types. It's worth understanding the pros and cons of each and how to use them.
What kind of access to the data do you need? I've used HBase (based on Google's BigTable) loaded with a vast amount of data (~30 million rows) as the backend for an application which could return results within a matter of seconds. However, it's not really appropriate if you need "real time" access - i.e. to power a website. Its column-oriented nature is also a fairly radical change if you're used to row-oriented DBMS.
Is count(*) on the whole table actually something you do a lot?
InnoDB will have to do a full table scan to count the rows, which is obviously a major performance issue if counting all of them is something you actually want to do. But that doesn't mean that other operations on the table will be slow.
With the right indexes, MySQL will be very fast at retrieving data from tables much bigger than that. The problem with indexes is that they can hurt insert speeds, particularly for large tables as insert performance drops dramatically once the space required for the index reaches a certain threshold - presumably the size it will keep in memory. But if you only need modest insert speeds, MySQL should do everything you need.
Any other database will have similar tradeoffs between retrieve speed and insert speed; they may or may not be better for your application. But I would look first at getting the indexes right, and maybe rewriting your queries, before you try other databases. For what it's worth, we picked MySQL originally because we found it performed best.
Note that MyISAM tables in MySQL store the total size of the table. They maintain this because it's useful to the optimiser in some cases, but a side effect is that count(*) on the whole table is really fast. That doesn't necessarily mean they're faster than InnoDB at anything else.
I answered a similar question in This Stackoverflow Posting in some detail, describing the merits of the architectures of both systems. To some extent it was done from a data warehousing point of view but many of the differences also matter on transactional systems.
However, 25 million rows is not a VLDB and if you are having performance problems you should look to indexing and tuning. You don't need to go to Oracle to support a 25 million row database - you've got about 3 orders of magnitude to go before you're truly in VLDB territory.
You are asking for a book's worth of an answer, and I therefore propose you get a good book on databases. There are many.
To get you started, here are some database basics:
First, you need a great data model based not just on what data you need to store but on usage patterns. Good database performance starts with good schema design.
Second, place indices on columns based upon expected lookup AND update needs, as update performance is often overlooked.
Third, don't put functions in where clauses if at all possible.
Fourth, use an -ahem- RDBMS engine that is of quality design. I would respectfully submit that while it has improved greatly in the recent past, MySQL does not qualify. (Apologies to those who wish to argue it has finally made the grade in recent times.) There is no longer any need to choose between high price and quality; Postgres (aka PostgreSQL) is available open-source and is truly fantastic - and has all the plug-ins available to meet your needs.
Finally, learn what you are asking a database engine to do - gain some insight into internals - so you can better judge what kinds of things are expensive and why.
I'm going to second #Mark Baker, and say that you need to build indices on your tables.
For other queries than the one you selected, you should also be aware that using constructs such as IN() is faster than a series of OR statements in the query. There are lots of little steps you can take to speed up individual queries.
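A small illustration of the IN() point, with made-up table and column names:

SELECT id, status FROM orders WHERE status IN ('new', 'paid', 'shipped');

-- generally preferable to spelling out the equivalent OR chain:
SELECT id, status FROM orders WHERE status = 'new' OR status = 'paid' OR status = 'shipped';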
Indexing is key to performance with this number of records, but how you write the queries can make a big difference as well. Specific performance tuning methods vary by database, but in general: avoid returning more records or fields than you actually need, make sure all join fields are indexed (as well as common WHERE clause fields), and avoid cursors (although I think this is less true in Oracle than SQL Server; I don't know about MySQL).
Hardware can also be a bottleneck especially if you are running things besides the database server on the same machine.
Performance tuning is a very technical subject and can't really be answered well in a format like this. I suggest you get a performance tuning book and read it. Here is a link to one for mySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716