Fetching/Inserting huge chunks of data from/to a large table - mysql

We have an online judge (something similar to SPOJ.pl) on which we conduct 3-hour-long contests on weekends; by the end of each contest we have close to 1000 submissions. We store all these runs in a single table (which includes the submitted code). The present structure of the table is as follows:
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| rid        | int(11)  | NO   | PRI | NULL    | auto_increment |
| pid        | int(11)  | YES  |     | NULL    |                |
| tid        | int(11)  | YES  |     | NULL    |                |
| language   | tinytext | YES  |     | NULL    |                |
| name       | tinytext | YES  |     | NULL    |                |
| code       | longtext | YES  |     | NULL    |                |
| time       | tinytext | YES  |     | NULL    |                |
| result     | tinytext | YES  |     | NULL    |                |
| error      | text     | YES  |     | NULL    |                |
| access     | tinytext | YES  |     | NULL    |                |
| submittime | int(11)  | YES  |     | NULL    |                |
| output     | longtext | YES  |     | NULL    |                |
+------------+----------+------+-----+---------+----------------+
Now the problem is that every time we use an ORDER BY clause when querying this table, it ends up sorting the whole table, and with more than 1000 rows, each carrying a considerable amount of data, the time taken is significant. Please note that this is after OPTIMIZE-ing the table at regular intervals, whenever changes have been made to the submissions. We do have two options:
1. Split up the table after, say, around 100 entries.
2. Store the huge chunks of data (the submitted code) as files instead of inserting them as values into the table, to reduce the overhead.
Is there another alternative/workaround where we can keep the table structure as it is? I could really use some help here. Thanks.

My recommendation would be to do something called vertical partitioning: split the table into multiple tables, with different columns.
In this case, I would have one table that has all the small data: rid, pid, tid, language, name, time, result, access, submittime.
A second table would have: rid, code, error, output.
This way, you can do the sort on the first table and then join in the other fields after the sort. I put code, error, and output together since they sort of seem to go together.
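A minimal sketch of that split, assuming the original table is called runs and keeping rid as the shared key (the runs_meta/runs_data names are my own invention). Sorting and paging happen on the narrow table; the bulky columns are joined in only for the rows that survive the LIMIT:

-- Narrow table: everything that is cheap to scan and sort
CREATE TABLE runs_meta (
    rid        INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    pid        INT(11),
    tid        INT(11),
    language   TINYTEXT,
    name       TINYTEXT,
    time       TINYTEXT,
    result     TINYTEXT,
    access     TINYTEXT,
    submittime INT(11)
);

-- Wide table: the bulky blobs, fetched only for the rows you need
CREATE TABLE runs_data (
    rid    INT(11) NOT NULL PRIMARY KEY,   -- same value as runs_meta.rid
    code   LONGTEXT,
    error  TEXT,
    output LONGTEXT
);

-- Sort and page on the narrow table first, then join in the blobs
SELECT m.*, d.code, d.error, d.output
FROM (SELECT rid FROM runs_meta ORDER BY submittime DESC LIMIT 50) AS latest
JOIN runs_meta AS m ON m.rid = latest.rid
JOIN runs_data AS d ON d.rid = latest.rid;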

Having more than one value inserted into the same column

I have a webapp that I'm building. This webapp will take as input some products (cars, motorcycles, boats, houses, etc.), and each product will have one or more photos associated with it. The id of each photo is generated by PHP's uniqid() function.
My problem is: I can't seem to fit more than two photo ids into the same column.
+-----------+------------------------------------------+------+-----+---------+----------------+
| Field     | Type                                     | Null | Key | Default | Extra          |
+-----------+------------------------------------------+------+-----+---------+----------------+
| carid     | int(11)                                  | NO   | MUL | NULL    | auto_increment |
| brand     | enum('Alfa Romeo','Aston Martin','Audi') | NO   |     | NULL    |                |
| color     | varchar(20)                              | NO   |     | NULL    |                |
| type      | enum('gasoline','diesel','eletric')      | YES  |     | NULL    |                |
| price     | mediumint(8) unsigned                    | YES  |     | NULL    |                |
| mileage   | mediumint(8) unsigned                    | YES  |     | NULL    |                |
| model     | text                                     | YES  |     | NULL    |                |
| year      | year(4)                                  | YES  |     | NULL    |                |
| id_photos | varchar(30)                              | YES  |     | NULL    |                |
+-----------+------------------------------------------+------+-----+---------+----------------+
What I would like to happen is something like this: INSERT INTO cars(id_photos) values ('id_1st_photo', 'id_2nd_photo')
Ending up having something like this:
| 60 | Audi | Yellow | diesel | 252352 | 1234112 | R8 | 1990 | id_1st_photo id_2nd_photo |
Eventually I would have to grab those photos from the folders they are stored in (something like /var/www/website/$login/photos/id_of_photo), using the query select id_photos from cars where carid=$id.
You may find some data types that are not really appropriate for the data the server will receive, but I'm one week into MySQL and I'll worry about data types later on.
First of all, I don't know if this is even possible; if it's not, how can I design something that works like that?
I have found this question, which is quite similar to mine, but I can't seem to implement something like it: add multiple values in one column
You can insert the concatenated values into a single field, but it is not good practice. Better to create another table with a foreign key holding the id of the parent table.
You can easily adapt the approach from the linked question and even drop one of the tables it needs:
Your first table stays almost the same, but has the id_photos column removed:
+-----------+------------------------------------------+------+-----+---------+----------------+
| Field     | Type                                     | Null | Key | Default | Extra          |
+-----------+------------------------------------------+------+-----+---------+----------------+
| carid     | int(11)                                  | NO   | MUL | NULL    | auto_increment |
| brand     | enum('Alfa Romeo','Aston Martin','Audi') | NO   |     | NULL    |                |
| color     | varchar(20)                              | NO   |     | NULL    |                |
| type      | enum('gasoline','diesel','eletric')      | YES  |     | NULL    |                |
| price     | mediumint(8) unsigned                    | YES  |     | NULL    |                |
| mileage   | mediumint(8) unsigned                    | YES  |     | NULL    |                |
| model     | text                                     | YES  |     | NULL    |                |
| year      | year(4)                                  | YES  |     | NULL    |                |
+-----------+------------------------------------------+------+-----+---------+----------------+
Then you'll add a second table to store the links to the photo ids:
+-----------+------------------------------------------+------+-----+---------+----------------+
| Field     | Type                                     | Null | Key | Default | Extra          |
+-----------+------------------------------------------+------+-----+---------+----------------+
| carid     | int(11)                                  | NO   | MUL | NULL    |                |
| id_photos | varchar(30)                              | NO   |     | NULL    |                |
+-----------+------------------------------------------+------+-----+---------+----------------+
Both tables are linked by the field carid (you should even make carid in the second table a foreign key pointing to the one in the first table).
Each photo id then becomes a new row in the second table.
To query the data you will probably need a JOIN between both tables, and maybe a GROUP BY to reduce the result to one row per carid again, but that depends on your other use cases.
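A minimal sketch of that layout (the car_photos table name is my invention; GROUP_CONCAT is one way to glue the photo ids back into one row per car):

CREATE TABLE car_photos (
    carid     INT(11)     NOT NULL,
    id_photos VARCHAR(30) NOT NULL,
    FOREIGN KEY (carid) REFERENCES cars (carid)
);

-- One row per photo instead of one concatenated string
INSERT INTO car_photos (carid, id_photos) VALUES
(60, 'id_1st_photo'),
(60, 'id_2nd_photo');

-- One result row per car, photo ids joined back together
SELECT c.carid, c.brand, GROUP_CONCAT(p.id_photos) AS id_photos
FROM cars AS c
JOIN car_photos AS p ON p.carid = c.carid
WHERE c.carid = 60
GROUP BY c.carid, c.brand;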
You can insert a single string containing multiple photo names:
INSERT INTO cars(id_photos) values ('id_1st_photo, id_2nd_photo')
But this way you don't have a well-normalized database structure, so you will have problems when retrieving a single photo name.
I suggest you normalize the id_photos column into a separate table that references the master table; that way each single photo is stored in its own row.

How to simplify my SQL requests?

I have two tables here. One is Items and the other is Parts.
Items has a part_id and Parts has an item_id.
When a user presses the submit button in the ItemDetail view, the data is sent to the server and inserted into those two tables.
Here is how my code works:
1. Insert into the Items table and get the id of the new Item row.
2. Insert into the Parts table with this item_id and the other Part data.
3. Update the Items table with the new part_id.
But can I write those three SQL requests as just one request?
Here is the structure of my tables:
Items
+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name    | varchar(255)     | YES  |     | NULL    |                |
| price   | int(11)          | YES  |     | NULL    |                |
| part_id | int(10) unsigned | YES  |     | NULL    |                |
| type    | varchar(255)     | YES  |     | NULL    |                |
+---------+------------------+------+-----+---------+----------------+
Parts
+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| item_id | int(10) unsigned | NO   |     | NULL    |                |
| name    | varchar(255)     | NO   |     | NULL    |                |
| number  | varchar(255)     | YES  |     | NULL    |                |
+---------+------------------+------+-----+---------+----------------+
You shouldn't have two tables pointing to each other like this; only one of the tables should have a foreign key, not both.
Then what you are looking for is a transaction: http://dev.mysql.com/doc/refman/5.7/en/commit.html
Transactions make sure that either all queries are executed or, if there is an error somewhere, all changes are rolled back.
Looking at the logic you are using, you are doing it correctly.
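A minimal sketch of the three statements wrapped in a single transaction (the literal values here are made up for illustration):

START TRANSACTION;

INSERT INTO Items (name, price, type)
VALUES ('Engine', 1000, 'assembly');      -- hypothetical item
SET @item_id = LAST_INSERT_ID();

INSERT INTO Parts (item_id, name, number)
VALUES (@item_id, 'Piston', 'P-042');     -- hypothetical part
SET @part_id = LAST_INSERT_ID();

UPDATE Items SET part_id = @part_id WHERE id = @item_id;

COMMIT;

If any statement fails, issue a ROLLBACK instead of COMMIT and no row is left half-inserted.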
As they are two separate tables, you will need two separate INSERT statements in SQL. Of course, you can use a stored procedure so that you only need to make one call in your code, and the procedure does the two inserts.
A question here is: what language are you using? If you are using something like Entity Framework, and your relationships are defined between your entities, such as
Items
-Field 1
-Parts (FK) List<Parts>
That would work. But looking at what you have tagged, I am guessing you're not using an ASP language? If you are, let me know and I may have a better solution for you.
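If you do go the stored-procedure route, a sketch could look like this (the procedure and parameter names are my own invention):

DELIMITER //
CREATE PROCEDURE add_item_with_part(
    IN p_item_name VARCHAR(255),
    IN p_price     INT,
    IN p_type      VARCHAR(255),
    IN p_part_name VARCHAR(255),
    IN p_number    VARCHAR(255)
)
BEGIN
    DECLARE v_item_id INT UNSIGNED;

    START TRANSACTION;
    INSERT INTO Items (name, price, type)
    VALUES (p_item_name, p_price, p_type);
    SET v_item_id = LAST_INSERT_ID();

    INSERT INTO Parts (item_id, name, number)
    VALUES (v_item_id, p_part_name, p_number);

    -- LAST_INSERT_ID() now holds the new Parts id
    UPDATE Items SET part_id = LAST_INSERT_ID() WHERE id = v_item_id;
    COMMIT;
END //
DELIMITER ;

CALL add_item_with_part('Engine', 1000, 'assembly', 'Piston', 'P-042');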

Database schema: Key/Value table or all keys in one record

I guess that this is somewhat of a philosophical question. I need to collect pathology results for a group of patients and store them in a database. In the past I have used a very simple table structure (simplified):
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID                | bigint(20)   | NO   | PRI | NULL    |       |
| Updated           | datetime     | NO   | PRI | NULL    |       |
| PatientId         | varchar(255) | NO   |     | NULL    |       |
| Name              | varchar(255) | NO   |     | NULL    |       |
| Value             | varchar(255) | NO   |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
More often in schema design I see:
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID                | bigint(20)   | NO   | PRI | NULL    |       |
| PatientId         | varchar(255) | NO   |     | NULL    |       |
| Ph_Value          | varchar(255) | NO   |     | NULL    |       |
| K_Value           | varchar(255) | NO   |     | NULL    |       |
| Ca_Value          | varchar(255) | NO   |     | NULL    |       |
| Ph_Value_updated  | datetime     | NO   |     | NULL    |       |
| K_Value_updated   | datetime     | NO   |     | NULL    |       |
| Ca_Value_updated  | datetime     | NO   |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
It seems to me that the first design is much more flexible, expandable, etc. However, I do wonder about the performance hit when the records run into the millions.
The issue with the second is that there may be a couple of hundred fields that need to be recorded on occasion.
I would be really interested to get comments / advice / guidance on this.
You are absolutely right: the first schema is a lot more flexible; you can add new keys on a live database without changing the schema. However, flexibility is usually bought with time and/or space. In this case it's both: you need more space because the ID is replicated N times across the keys of the same logical row, and the joins or orderings required to bring the fields back together take time.
There is no reason to pay for flexibility unless you need it. If most of your queries need most of the columns, the second design is the more economical. However, if most of your queries ask for a single column, the flexibility may be worth the extra CPU time and database space.
In my opinion, if the name/value pairs won't change much, the second option is much better in terms of space and number of rows.
You can also optimize the first schema by putting the names in another table and storing a name_id instead of repeating the same name several times.
Yet another schema is to have a patient table plus one table per value, each containing patient_id and a value column, with the table named after that value.
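A sketch of that name_id variant of the first schema (the table names and sample values are illustrative):

CREATE TABLE result_names (
    name_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Name    VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE results (
    ID        BIGINT(20)   NOT NULL,
    Updated   DATETIME     NOT NULL,
    PatientId VARCHAR(255) NOT NULL,
    name_id   INT          NOT NULL,
    Value     VARCHAR(255) NOT NULL,
    PRIMARY KEY (ID, Updated),
    FOREIGN KEY (name_id) REFERENCES result_names (name_id)
);

-- All potassium ('K') results for one patient
SELECT r.Updated, r.Value
FROM results AS r
JOIN result_names AS n ON n.name_id = r.name_id
WHERE r.PatientId = '12345' AND n.Name = 'K';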

mysql time field conditions for expiration

I have table with auction lots:
mysql> desc lots;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(10)      | NO   | PRI | NULL    | auto_increment |
| account_id | int(10)      | YES  |     | NULL    |                |
| item_id    | int(10)      | YES  |     | NULL    |                |
| bid        | int(10)      | YES  |     | NULL    |                |
| buyout     | int(10)      | YES  |     | NULL    |                |
| leader     | int(13)      | YES  |     | NULL    |                |
| left       | int(10)      | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
left is a field in Unix timestamp format, e.g. 1391143424. It holds the time after which the lot should be considered expired.
I need to develop an efficient algorithm so that:
- when a customer clicks the SEARCH button, the results show only non-expired lots;
- once a lot expires, it is automatically moved into a completed_lots table.
I have some ideas, but I want to check whether there are better approaches:
1: add an additional search criterion, like:
WHERE '$currtime' <= left;
2: set up an asynchronous task using crontab that checks the lots table for expired lots every 5 minutes and moves them to the completed_lots table.
Are there any better ways? Maybe using other MySQL field types for Unix time, or creating some MySQL mechanism that does that job automatically?
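A sketch of both ideas; note that LEFT is a reserved word in MySQL, so the column name has to be quoted with backticks (an index on `left` would help both queries):

-- Search: return only non-expired lots
SELECT id, item_id, bid, buyout
FROM lots
WHERE `left` > UNIX_TIMESTAMP();

-- Cron job every 5 minutes: move expired lots
-- (assumes completed_lots has the same columns as lots)
SET @now = UNIX_TIMESTAMP();    -- one fixed cutoff for both statements
START TRANSACTION;
INSERT INTO completed_lots SELECT * FROM lots WHERE `left` <= @now;
DELETE FROM lots WHERE `left` <= @now;
COMMIT;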

In mysql can I have a composite primary key composed of an auto increment and another field? Also, please critique my "mysql partitioning" logic

I am experimenting with MySQL partitioning (splitting the table up to help it scale better), and I am having a problem with the keys on the table. First, I am using a Python threaded-comments module... here is the schema:
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| content_type_id | int(11)          | NO   | MUL | NULL    |       |
| object_id       | int(10) unsigned | NO   |     | NULL    |       |
| parent_id       | int(11)          | YES  | MUL | NULL    |       |
| user_id         | int(11)          | NO   | MUL | NULL    |       |
| date_submitted  | datetime         | NO   |     | NULL    |       |
| date_modified   | datetime         | NO   |     | NULL    |       |
| date_approved   | datetime         | YES  |     | NULL    |       |
| comment         | longtext         | NO   |     | NULL    |       |
| markup          | int(11)          | YES  |     | NULL    |       |
| is_public       | tinyint(1)       | NO   |     | NULL    |       |
| is_approved     | tinyint(1)       | NO   |     | NULL    |       |
| ip_address      | char(15)         | YES  |     | NULL    |       |
| id              | int(11)          | YES  |     | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+
Note: I have modified this table by dropping the id column (primary by default) and re-adding it.
Essentially, I want to have id AND content_type_id together as my primary key. I also want id to auto-increment. Is this possible?
Second question. Since I am just learning about mysql partitioning, I am wondering if my partitioning logic is sound. There are 67 different content_types, and some (maybe all) of those content types allow comments to be made on them. My plan is to partition based on the type of object that is being commented on. For instance, the images will be commented on a lot, so I put any content type pertaining to images into one partition, and another content type that can be commented on is "blog entries", so there is a separate partition for that, and so on and so on. This will allow me to spread these partitions possibly to dedicated machines as the load grows. How is my understanding of this concept so far?
Thanks so much!
Since id will be auto-incremented, it can be the primary key all by itself. Adding content_type_id to the primary key would not gain you anything with regard to the uniqueness of the key.
If you want an index over the two columns for faster performance, add a separate index on them instead of trying to put both into the primary key. Be aware, though, that enforcing uniqueness on the two columns would be a waste, since id is already guaranteed to be unique by itself; a regular (non-unique) index makes more sense if needed.
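A sketch of that suggestion, assuming the comments table name (both statements in one pass since MODIFY ... AUTO_INCREMENT needs the column to be a key):

-- Restore id as the auto-incrementing primary key
ALTER TABLE comments
    MODIFY id INT(11) NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (id);

-- Optional: a regular (non-unique) index covering both columns
CREATE INDEX idx_content_type_id ON comments (content_type_id, id);

The regular index gives fast lookups by content_type_id without paying for a redundant uniqueness check on every insert.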