Is multiple field index in MySQL a good choice? - mysql

I have a huge data set. The structure looks something like this:
K_Field1, K_Field2, K_Field3, K_Field4, D_Field5, D_Field6, D_Field7, D_field8
The problem is that only the first 4 fields (K_Field1, K_Field2, K_Field3, K_Field4) together identify a row uniquely. I created one table using these fields as its columns.
Let's say I have 1 million rows in the table using that structure. If I import a new record, I have to decide if it's already in the database. If it is, then I have to update it, if not, then I need to insert a new row.
To do that, I need to put a multiple-field index on the first 4 columns, which is - I'm afraid - not the best solution. Is there a better database structure to store and search that data, or do I have to live with the four-field index?
I'm using MySQL.

There's nothing wrong with creating an index on the first 4 columns, in fact you should:
create unique index mytable_key on mytable(K_Field1,K_Field2,K_Field3,K_Field4);
because that is the reality of your situation.
It is also the "correct" solution.
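With that unique index in place, the insert-or-update step can be collapsed into one statement. A minimal sketch using MySQL's INSERT ... ON DUPLICATE KEY UPDATE (the table name mytable is assumed; the column names come from the question):

INSERT INTO mytable
  (K_Field1, K_Field2, K_Field3, K_Field4, D_Field5, D_Field6, D_Field7, D_Field8)
VALUES
  (?, ?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
  -- when a row with the same 4-column key exists, update its data fields instead
  D_Field5 = VALUES(D_Field5),
  D_Field6 = VALUES(D_Field6),
  D_Field7 = VALUES(D_Field7),
  D_Field8 = VALUES(D_Field8);

If no row with those four key values exists, MySQL inserts one; otherwise it updates the data fields, so no separate existence check is needed.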

Related

Asking opinion about table structure

I'm working on a project to make a digital form of this paper (can't post the image),
and the data will be displayed on the Web in a simple table view. There will be NO altering, deleting, or updating; it just displays (via SELECT * of course) the data that was entered.
The data will be inserted via an Android app and stored in a single MySQL table with 30 columns.
The question is: is it a good idea to use a single table? I think there will be no complex operations in the SQL.
The other question is: am I violating any rules with this approach?
I need your opinion. Thanks.
It's totally ok to use only one table, if that suits your needs. What you can do to make the database a little bit 'smarter' is add new tables for attributes in your paper that will be repeated. So, for example, the Soil Type could be another table where there are two columns, ID and Description, and you will use it as a foreign key in each record in the main table. You need this if you want your database to be in 3NF.
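For example, a minimal sketch of that Soil Type lookup (the table and column names here are assumptions, not taken from the paper form):

CREATE TABLE soil_type (
  id INT AUTO_INCREMENT PRIMARY KEY,
  description VARCHAR(100) NOT NULL
);

CREATE TABLE form_entry (
  id INT AUTO_INCREMENT PRIMARY KEY,
  soil_type_id INT NOT NULL,
  -- ... the remaining columns from the paper form ...
  FOREIGN KEY (soil_type_id) REFERENCES soil_type(id)
);

Each repeated value is then stored once in soil_type and referenced by id from the main table.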
To sum up, yes you can have one table if that's all you need. However, adding more tables might help save some space and make your database more flexible. It's up to you to decide! :)

One column or separate columns for extra data - mysql

I was thinking: what if I have a table with columns for meta_description (varchar 300), meta_tags (varchar 300), and meta_title (varchar 200)... can I "join" all these columns into one column "extra_information" (longtext) and save the same information there, but maybe in JSON format?
Is this convenient or not and why :)?
These fields are not very important to me; I will never run any query that searches or sorts the results through this information. The meta tags, for example, are only comma-separated text; I don't need any kind of relation table for them.
What I want to know is whether this will save space in my database or work a little bit faster, things like that... But if you tell me that having 5 columns instead of just one is the same for MySQL, of course I will keep the 5 columns...
Thanks a lot!
The answer boils down to one question: does MySQL have to work with your data?
If all data is concatenated in one column, be it as JSON or comma-separated or whatnot, it is nearly off limits for any MySQL operation. You can surely SELECT it, but it is very hard to search, group or sort by anything inside that column. So, if you are absolutely sure MySQL never has to see the data itself and will only return some column with data in it, go for it.
The benefits are that the table structure does not have to change when your data changes, and the column structure is very clean.
If you need to filter, sort, group or do any other operation on it within an SQL query, leave it in separate columns.
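To illustrate the difference with a hypothetical pages table (the column names come from the question; the table name is an assumption):

-- with separate columns, this can use an ordinary index on meta_title:
SELECT * FROM pages WHERE meta_title = 'Home';

-- with everything packed into one extra_information column, you are left
-- with an unindexed substring scan over the whole table:
SELECT * FROM pages WHERE extra_information LIKE '%"meta_title": "Home"%';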

Optimized SELECT query in MySQL

I have a very large number of rows in my table, table_1. Sometimes I just need to retrieve a particular row.
I assume that when I use a SELECT query with a WHERE clause, it loops from the very first row until it matches my requirement.
Is there any way to make the query jump to a particular row and then start from that row?
Example:
Suppose there are 50,000,000 rows and the id I want to search for is 53750. What I need is for the search to be able to start from 50000, so that it saves the time of searching 49999 rows.
I don't know the exact term, since I am not an expert in SQL!
You need to create an index: http://dev.mysql.com/doc/refman/5.1/en/create-index.html
ALTER TABLE table_1 ADD UNIQUE INDEX (id);
The way I understand it, you want to select a row with id 53750. If you have a field named id you could do this:
SELECT * FROM table_1 WHERE id = 53750
Along with an index on the id field, that's the fastest way to do it, as far as I know.
ALTER TABLE table_1 ADD UNIQUE INDEX (<column>);
Would be a great first step if it has not been generated automatically. You can also use:
EXPLAIN <your query here>
to see which kind of query works best in this case. Note that if you ever change the WHERE clause in the future and some other column keeps recurring there, it is a good idea to put an index on that column as well.
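A concrete sketch of that check (the exact plan depends on your table; the output in the comments is what you would typically hope to see):

EXPLAIN SELECT * FROM table_1 WHERE id = 53750;
-- with a unique index on id, expect something like: type: const, rows: 1
-- without an index: type: ALL, rows: ~50,000,000 (a full table scan)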
Create an index on the column you want to do the SELECT on:
CREATE INDEX index_1 ON table_1 (id);
Then, select the row just like you would before.
But also, please read up on databases, database design and optimization. Your question is full of false assumptions. Don't just copy and paste our answers verbatim. Get educated!
There are several things to know about optimizing SELECT queries, like range and WHERE clause optimization; the documentation is pretty informative about this issue, read the section Optimizing SELECT Statements. Creating an index on the column you evaluate is very helpful for performance too.
One possible solution: you can create a view, then query from the view. Here are the details of creating a view and obtaining data from one:
http://www.w3schools.com/sql/sql_view.asp
Now you just split that huge number of rows into many views (i.e. rows 1-10000 in one view, 10001-20000 in another view),
and then query from the appropriate view.
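A minimal sketch of that idea (the view name is just a placeholder):

CREATE VIEW table_1_part2 AS
  SELECT * FROM table_1 WHERE id BETWEEN 50001 AND 100000;

SELECT * FROM table_1_part2 WHERE id = 53750;

Keep in mind that a view is just a stored query; without an index on id, MySQL still has to scan the underlying rows of table_1.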
I am pretty sure that any SQL database with a little self-respect does not start looping from the first row to get the desired row. But I am also not sure how they make it work, so I can't give an exact answer.
You could check what's in your WHERE clause and how the table is indexed. Do you have a proper primary key, for example one with a numeric data type? Do you have indexes on the other columns that are used in your queries?
There is also a lot to consider when installing the database server, like where to put the data and log files, how much memory to give the server, and how to configure file growth. There's a lot you can do to tune your server.
You could try to split your table into partitions.
More about alter tables to add partitions
Selecting from a specific partition
In your case you could create a partition on id for every 50,000 rows, and when you want to skip the first 50,000 you just select from partition 2. How to do this is explained quite well in the MySQL documentation.
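A rough sketch of that layout, assuming id is the primary key (the partition names are arbitrary, and SELECT ... PARTITION needs MySQL 5.6 or later):

ALTER TABLE table_1
  PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (50000),
    PARTITION p1 VALUES LESS THAN (100000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );

-- read only from the partition that can contain id 53750:
SELECT * FROM table_1 PARTITION (p1) WHERE id = 53750;

Note that with an index on id, MySQL's partition pruning restricts the scan to the right partition automatically; the explicit PARTITION clause just makes that visible.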
You may try something as simple as this:
SELECT * FROM tblname LIMIT 50000, 10;
I just tried it with phpMyAdmin. The 50,000 is the starting row (offset) and the second number is how many rows to return.
EDIT:
But if I were you I wouldn't use this one, because it simply skips records 1-49999 instead of searching them.

Create columns that autoincrement with name in a MySQL Database

I know that this might seem like a strange question, but let me try and explain it. I have a database table called 'plan' and in it the first column is called 'username' and the columns after it are called 'question1', 'question2' and so on. I now need to add a hundred or so more columns named like this, but it would be nice to have an SQL statement that would automatically do that for me.
I know this wasn't set up in the best way, but if you have a solution, please let me know :)
There isn't any SQL command or feature that would do this automatically; sure, you can generate the ALTER TABLE statements and add the columns programmatically; however, your design would be terribly flawed.
Instead of adding columns, you should create a table holding one record per answer, containing the question and the user_id (or username, whatever the PK is). If you need to identify a question by number (or ID), simply add another column called question_id.
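A minimal sketch of that design (the table and column names are assumptions):

CREATE TABLE answer (
  username VARCHAR(50) NOT NULL,
  question_id INT NOT NULL,
  answer TEXT,
  PRIMARY KEY (username, question_id)
);

-- "question 101" for a user is then just a new row, not a new column:
INSERT INTO answer (username, question_id, answer)
VALUES ('alice', 101, 'some answer');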
Write the SQL statement in Excel, with the incrementing number in a separate column, and drag down until Excel row 100. Hard to explain, but I guess you'll figure it out. You'll have 100 incrementing ADD COLUMN statements; copy, paste, and run them in a query tool.

adding data to interrelated tables..easier way?

I am a bit rusty with MySQL and trying to jump in again, so sorry if this is too easy a question.
I basically created a data model that has a table called "Master" with required fields of a name and an IDcode, and then a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) VALUES (name, updateDate)
I get an error saying IDcode on Details doesn't have a default value, so I add one; then it complains that field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details, and if no IDcode exists, I want to add an entry into the Master table. The problem is I have to first add the name to the Master table, wait for a unique ID to be generated (for IDcode), then figure that out and add it to my query when I enter the detail data. As you can imagine, the queries are going to get quite long since I have many tables.
Is there an easier way, where every time I add something it searches by name to see if a foreign key exists, and if not adds it to all the tables it's linked to? Is there a standard way people do this? I can't imagine that with all the complex databases out there, people haven't figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
P.S. This may be a different question, but I have heard of Django for Python and that it helps create queries. Would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
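For example, a minimal sketch of the load step (the file path, table name, and CSV layout are assumptions):

LOAD DATA INFILE '/tmp/details.csv'
INTO TABLE staging_details
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row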
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a master ID. You could add the missing master records by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
`key`
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.`key`
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, which is a join mechanism very suitable to fully join two tables. MySQL uses nested loops instead, which mean that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through CSV files in your editor.
I don't think there's a one-step way to do this.
What I do is issue a
INSERT IGNORE INTO master (..) VALUES (..)
to the master table, which will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master WHERE someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether this would help performance.
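Putting those two statements together, a minimal sketch (it assumes Master has an auto-increment IDcode and a unique index on Name, neither of which is stated in the question):

-- create the master row only if the name is not there yet
INSERT IGNORE INTO Master (Name) VALUES ('some name');

-- fetch the IDcode whether the row was just created or already existed
SELECT IDcode FROM Master WHERE Name = 'some name';

-- then insert the detail row using that IDcode (42 stands in for the value
-- returned by the SELECT above)
INSERT INTO Details (Master_IDcode, Name, UpdateDate) VALUES (42, 'some name', NOW());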