Multiple tables for similar data in MySQL - mysql

I am writing a server using Netty and MySQL (with the Connector/J JDBC driver).
Please note that I am very new to server programming.
Say I have an application where each user enters about 20 pieces of information about themselves.
Some of my methods need only specific fields from that information.
Instead of using "select dataOne, dataTwo from tableOne where userNum=1~1000",
I would create a new table called tableTwo containing only dataOne and dataTwo,
and then use "select * from tableTwo where userNum=1~1000".
Is it good practice to make a table for every method I need?
If not, what would be a better practice?

You should not be replicating data.
SQL is designed so that you list the exact columns you want after the SELECT keyword.
Selecting specific columns adds no overhead; it is how SQL is meant to be used.
Replicating your data and storing it in two different tables does add overhead.
Consequences of using such a design:
In a world where we used only select *, we would need a different table for each combination of columns we want in results.
Consequently, we would be storing the same data repeatedly. If you needed 10 different column combinations, you would be storing up to 10 copies of the same values.
Finally, data manipulation statements (update, insert) would need to change the same data in multiple tables, multiplying the time needed to perform basic operations.
Such a database would not scale.
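As a rough sketch of the point above, the query below reads two columns from the one full table instead of maintaining a second copy. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table and column names follow the question, the sample rows are invented, and reading "userNum=1~1000" as a BETWEEN range is an assumption.

```python
import sqlite3


def fetch_two_columns(conn, first_user, last_user):
    """Return (dataOne, dataTwo) for a range of users from the single full table."""
    cur = conn.execute(
        "SELECT dataOne, dataTwo FROM tableOne "
        "WHERE userNum BETWEEN ? AND ? ORDER BY userNum",
        (first_user, last_user),
    )
    return cur.fetchall()


# In-memory database; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableOne ("
    "userNum INTEGER PRIMARY KEY, dataOne TEXT, dataTwo TEXT, dataThree TEXT)"
)
# Invented sample data: 2000 users with three fields each.
conn.executemany(
    "INSERT INTO tableOne VALUES (?, ?, ?, ?)",
    [(n, f"one-{n}", f"two-{n}", f"three-{n}") for n in range(1, 2001)],
)

rows = fetch_two_columns(conn, 1, 1000)
print(len(rows), rows[0])
```

No tableTwo is needed: the column list in the SELECT does the narrowing, and there is only one copy of the data to keep up to date.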

Related

Join 10 tables on a single join id called session_id that's stored in session table. Is this good/bad practice?

There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually).
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
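To make the "use indexes" and "measure" advice a bit more concrete, here is a minimal sketch that indexes the join column and then asks the engine how it would run the join. It uses Python's built-in sqlite3 module rather than MySQL (where you would read the output of EXPLAIN instead), and the page_view table, its columns and the index name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; table names are made up
conn.executescript("""
    CREATE TABLE session (session_id INTEGER PRIMARY KEY, started_at TEXT);
    CREATE TABLE page_view (id INTEGER PRIMARY KEY, session_id INTEGER, url TEXT);
    -- Index the column that the join and the filter both use.
    CREATE INDEX idx_page_view_session ON page_view (session_id);
""")
conn.execute("INSERT INTO session VALUES (1, '2020-01-01')")
conn.executemany(
    "INSERT INTO page_view (session_id, url) VALUES (?, ?)",
    [(1, "/a"), (1, "/b"), (2, "/c")],
)

# Only the fields that are actually needed are listed, not "*".
query = (
    "SELECT s.started_at, p.url FROM session AS s "
    "JOIN page_view AS p ON p.session_id = s.session_id "
    "WHERE s.session_id = ?"
)
urls = sorted(row[1] for row in conn.execute(query, (1,)))

# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,)))
print(urls)
print(plan)
```

The plan text differs between engines and versions, so treat it as a diagnostic to read, not as something to parse. What matters is whether the index on session_id shows up in it.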
Please post the query with its 11 joins and the EXPLAIN output in the original question when they are available. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname, to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality of each indexed column.
Of course, more joins hurt performance.
But it depends. If your data model is like that, you can't do much about it unless a completely new data model is designed.
1) Is it an online (real-time transactional) DB or an offline DB (data warehouse)?
If online, it is better to maintain a single table: keep the data in one table and let the number of columns grow.
If offline, it is better to maintain separate tables, because you will not always require all columns.

Is it faster to only query specific columns?

I've heard that it is faster to select columns manually ("col1, col2, col3, etc") instead of querying them all with "*".
But what if I don't even want to query all columns of a table? Would it be faster to query, for example, only "col1, col2" instead of "col1, col2, col3, col4"?
From my understanding SQL has to search through all of the columns anyway, and just the return-result changes. I'd like to know if I can achieve a gain in performance by only choosing the right columns.
(I'm doing this anyway, but a backend API of one of my applications returns all columns more often than not, so I'm thinking about letting users manually select the columns they want.)
In general, reducing the number of columns in the select is a minor optimization. It means that less data is being returned from the database server to the application calling the server. Less data is usually faster.
Under most circumstances, this is a minor improvement. There are some cases where the improvement can be more important:
If a covering index is available for the query, so the index satisfies the query without having to access data pages.
If some fields are very long, so records occupy multiple pages.
If the volume of data being retrieved is a small fraction (think < 10%) of the overall data in each record.
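The covering-index case can be seen in a small experiment. This is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table t, its columns and the index name are hypothetical. The narrow query can be answered from the index alone, while the "*" query has to go back to the table rows.

```python
import sqlite3


def plan_for(conn, sql):
    """Return SQLite's query-plan detail text for a statement."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT);
    -- The index holds col1 and col2, so it can "cover" queries that need only those.
    CREATE INDEX idx_t_col1_col2 ON t (col1, col2);
""")

narrow_plan = plan_for(conn, "SELECT col1, col2 FROM t WHERE col1 = 5")
wide_plan = plan_for(conn, "SELECT * FROM t WHERE col1 = 5")
print("narrow:", narrow_plan)
print("wide:  ", wide_plan)
```

In SQLite the narrow plan reports a covering index and the wide one does not. MySQL's EXPLAIN reports the same situation as "Using index" in its Extra column.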
Listing the columns individually is a good idea, because it protects code from changes in underlying schema. For instance, if the name of a column is changed, then a query that lists columns explicitly will break with an easy-to-understand error. This is better than a query that runs and produces erroneous results.
You should try not to use select *.
Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access.
Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to perform more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason.
Binding Problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions of SQL Server) when underlying table structures change -- the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.
I got this from this answer.
I believe this topic has already been covered here:
select * vs select column
I believe it covers your concerns as well. Please take a look.
All the column labels and values occupy some space. Sending them to the issuer of the request instead of a subset of the columns means sending more data. More data is sent slower.
If you have columns, like
id, username, password, email, bio, url
and you want to get only the username and password, then
select username, password ...
is quicker than
select * ...
because id, email, bio and url are sent as well for the latter, which makes the response larger. But the main problem with select * is different. It might be the source of inconsistencies if, for some reason the order of the columns changed. Also, it might retrieve data you do not want to retrieve. It is always better to have a whitelist with the columns you actually want to retrieve.
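If callers are allowed to choose their own columns, as the question considers, the whitelist idea could look roughly like the sketch below. The function name build_select, the users table name and the decision to leave password off the list are all assumptions made for illustration.

```python
# Columns that callers may request. Leaving "password" off the list is an
# illustrative choice, not something the original answer prescribes.
ALLOWED_COLUMNS = ("id", "username", "email", "bio", "url")


def build_select(requested, table="users"):
    """Build a SELECT statement from requested column names.

    Every name must be on the allowlist, so user input never reaches the
    SQL text unchecked and unwanted columns are never returned.
    """
    if not requested:
        raise ValueError("at least one column is required")
    unknown = [name for name in requested if name not in ALLOWED_COLUMNS]
    if unknown:
        raise ValueError(f"columns not allowed: {unknown}")
    return f"SELECT {', '.join(requested)} FROM {table}"


print(build_select(["username", "email"]))
```

Column names cannot be passed as bound parameters, which is why they are checked against a fixed list before being placed in the statement.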

Function in MySQL that operates on multiple columns

Is it possible to create a custom function in MySQL, like SUM, MAX, and so on, that accepts multiple columns and does some operation on each row?
The reason I am asking is that I tried to implement my logic in a stored procedure but unfortunately couldn't find a way to select data from a table whose name is an input parameter.
Somebody suggested using dynamic SQL, but then I cannot get the cursor. So my only hope is a custom-defined function.
To make the question clearer, here is what I want to do:
I want to calculate the distance of a route where each row in the database table represents coordinates (latitude and longitude). Unfortunately the data I have is really big, and if I query the data and do the calculations in Java it takes more than half a minute to transfer the data to the web server, so I want to do the calculations on the SQL machine.
Select something1, something2 from table_name where table name is a variable
Having multiple identically structured tables (a prerequisite for this sort of query) is contrary to the Principle of Orthogonal Design.
Don't do it. At least not without very good reason—with suitable indexes, (tens of) millions of records per table is easily enough for MySQL to handle without any need for partitioning; and even if one does need to partition the data, there are better ways than this manual kludge (which can give rise to ambiguous, potentially inconsistent data and lead to redundancy and complexity in your data manipulation code).
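One way to avoid a variable table name, sketched here under assumptions: keep all routes in a single table and make the route an ordinary column, so that it can be passed as a bound parameter. The route_point table and its columns are hypothetical, and Python's built-in sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; schema is hypothetical
conn.execute(
    "CREATE TABLE route_point ("
    "route_id INTEGER, seq INTEGER, latitude REAL, longitude REAL, "
    "PRIMARY KEY (route_id, seq))"
)
conn.executemany(
    "INSERT INTO route_point VALUES (?, ?, ?, ?)",
    [
        (1, 1, 52.0, 4.0), (1, 2, 52.1, 4.1), (1, 3, 52.2, 4.2),
        (2, 1, 48.0, 2.0), (2, 2, 48.1, 2.1),
    ],
)


def points_for_route(conn, route_id):
    """Return the ordered coordinates of one route; the route is data, not a table name."""
    return conn.execute(
        "SELECT latitude, longitude FROM route_point "
        "WHERE route_id = ? ORDER BY seq",
        (route_id,),
    ).fetchall()


print(points_for_route(conn, 1))
```

With this layout a stored procedure or function can take the route id as a normal argument, and no dynamic SQL is needed to pick the table.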

Why shouldn't we use Select * in a mysql query on a production server?

Based on the question "Selecting NOT NULL columns from a table", one of the posters said:
you shouldn't use SELECT * in production.
My Question: Is it true that we shouldn't use Select * in a mysql query on a production server? If yes, why shouldn't we use select all?
Most people do advise against using SELECT * in production, because it tends to break things. There are a few exceptions though.
SELECT * fetches all columns, while most of the time you don't need them all. This causes the SQL server to send more columns than needed, which is a waste and makes the system slower.
With SELECT *, when you later add a column, the old query will also select this new column, while typically it will not need it. Naming the columns explicitly prevents this.
Most people that write SELECT * queries also tend to grab the rows and use column order to get the columns, which WILL break your code once columns are inserted between existing columns.
Explicitly naming the columns also guarantees they are always in the same order, while SELECT * might behave differently when the table column order is modified.
But there are exceptions, for example statements like these:
INSERT INTO table_history
SELECT * FROM table
A query like that takes rows from table, and inserts them into table_history. If you want this query to keep working when new columns are added to table AND to table_history, SELECT * is the way to go.
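The exception can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical item and item_history tables. The same archive statement keeps working after an identical column is added to both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; table names are made up
conn.executescript("""
    CREATE TABLE item (id INTEGER, name TEXT);
    CREATE TABLE item_history (id INTEGER, name TEXT);
    INSERT INTO item VALUES (1, 'first');
""")

# The statement under discussion: copy every column, whatever the columns are.
archive = "INSERT INTO item_history SELECT * FROM item"
conn.execute(archive)
conn.execute("DELETE FROM item")  # archived rows are cleared in this toy example

# Later, the same new column is added to BOTH tables, in the same position.
conn.execute("ALTER TABLE item ADD COLUMN price REAL")
conn.execute("ALTER TABLE item_history ADD COLUMN price REAL")
conn.execute("INSERT INTO item VALUES (2, 'second', 9.5)")

# The unchanged archive statement still works and carries the new column along.
conn.execute(archive)
history = conn.execute(
    "SELECT id, name, price FROM item_history ORDER BY id"
).fetchall()
print(history)
```

This relies on both tables keeping the same columns in the same order. If only one of them were altered, the statement would fail with a column-count error.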
Remember that your database server isn't necessarily on the same machine as the program querying the database. The database server could be on a network with limited bandwidth; it could even be halfway across the world.
If you really do need every column, then by all means do SELECT * FROM table.
If you only need certain columns, though, it would waste bandwidth to ask for all columns using SELECT * FROM table only to throw half the columns away.
Other potential reasons it might be good to specify which exact columns you want:
The database structure may change. If your program assumes certain column names, then it may fail if the column names change, for example. Explicitly naming the columns you want to retrieve will make the program fail immediately if your assumptions about the column names are violated.
As @Konerak mentioned, naming the columns you want also ensures that the order of the columns in your result is the same, even if the table schema changes (e.g. when a column is inserted between two others). This is important if you're depending on FirstName being the element at index [2] of a result.
(Note: a more robust and self-documenting way of dealing with this is to ask for your database results as a list of key-value pairs, like a PHP associative array, Perl hash or a Python dict. That way you never need to use a number to index into the result (name = result[2] ) - instead you can use the column name: name = result["FirstName"].)
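In Python, for instance, the built-in sqlite3 module offers this through sqlite3.Row. The sketch below uses a made-up person table; other drivers expose similar dictionary-style cursors under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sqlite3.Row lets each result row be indexed by column name as well as by position.
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE person (Id INTEGER, LastName TEXT, FirstName TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Lovelace', 'Ada')")

row = conn.execute("SELECT Id, LastName, FirstName FROM person").fetchone()

by_position = row[2]        # fragile: depends on the column order
name = row["FirstName"]     # robust: depends only on the column name
print(by_position, name)
```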
Using SELECT * is very inefficient, especially for tables that have a lot of columns. You should only select the columns you need.
Besides this, using column names makes the query easier to read and maintain.

MySql DB Design question

Assuming I have an application where I expect no more than 50,000 users.
I have 7 tables in my DB schema. Is it a good idea to replicate all the
tables for every user? So, in my case, number of tables will roughly be
50,000 * 7 = 350,000.
Is it in anyway better than 7 monolithic tables?
No, I would not recommend creating a table schema per user in the same database.
MySQL should handle the load perfectly well, given the correct indexes on the tables.
What you're proposing is horizontal partitioning. http://en.wikipedia.org/wiki/Partition_%28database%29
i.e. you take all the rows in what would (logically) be one table, and you spread those rows across multiple tables, according to some partitioning scheme (in this case, one table per user).
In general, for starter applications of this nature, only consider partitioning (along with a host of other options) when performance starts to suffer.
No, it's definitely not. For two reasons:
The database design should not change just because you store more data in it.
Accessing different tables is more expensive and more complicated than accessing specific data in a single table.
Just imagine you wanted statistics for a value from each user. You would have to gather data from 50 000 tables using 50 000 separate queries instead of running a single query against one table.
A table can easily contain millions of records, but I'm not sure that the database can even contain 350 000 tables.
Certainly not. A DB is typically optimized for having multiple rows in a given table... Try to rethink your DB schema to have a field in each of your tables that holds the user's ID or another associative table that holds the user id and key for the particular entry in the data table.
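A minimal sketch of that layout, assuming a hypothetical orders table with a user_id column (Python's built-in sqlite3 module stands in for MySQL): one shared table serves every user, and per-user statistics come from a single grouped query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; schema is hypothetical
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, amount REAL)"
)
# An index on user_id keeps per-user lookups cheap as the table grows.
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)],
)

# One query produces a statistic for every user; no per-user tables needed.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM orders "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

A single user's rows are then just "WHERE user_id = ?" on the same table.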
There's a decent intro to DB design here.