I've created a table that holds items according to categories:
+------------+---------------------+------+-----+-------------------+----------------+
| Field      | Type                | Null | Key | Default           | Extra          |
+------------+---------------------+------+-----+-------------------+----------------+
| id         | bigint(20) unsigned | NO   | PRI | NULL              | auto_increment |
| name       | varchar(30)         | YES  |     | NULL              |                |
| category   | varchar(30)         | YES  | MUL | NULL              |                |
| timestamp  | timestamp           | NO   |     | CURRENT_TIMESTAMP |                |
| data       | mediumblob          | YES  |     | NULL              |                |
+------------+---------------------+------+-----+-------------------+----------------+
Old data is deleted using a sliding window technique, meaning that only the last N items in each category are kept in the table.
How can I keep track of the total number of items per category, and of the timestamp of the first item in each category?
Edit - COUNT and MIN on the original table won't work, because this is a sliding-window data structure, meaning that the earliest items have already been deleted.
Clearly you need to maintain a separate table as you delete the records. That table should summarize the categories and include fields such as:
The category's first start time
Total number of items in the category
and so on.
When you go to delete, you need to update this table. In general, I prefer to use stored procedures to handle database maintenance, so this code could be added to the stored procedure. Others prefer triggers, so you could have a delete trigger that does the same thing.
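For illustration, a minimal sketch of the trigger variant might look like this, assuming the main table is called items and using a made-up summary table called category_summary (both names and the exact columns are assumptions, not your actual schema):

CREATE TABLE category_summary (
    category        VARCHAR(30) NOT NULL PRIMARY KEY,
    first_timestamp TIMESTAMP NULL,                      -- earliest item timestamp ever seen for the category
    deleted_items   BIGINT UNSIGNED NOT NULL DEFAULT 0   -- rows removed so far by the sliding window
);

-- Single-statement delete trigger, so no DELIMITER change is needed
CREATE TRIGGER items_after_delete
AFTER DELETE ON items
FOR EACH ROW
    INSERT INTO category_summary (category, first_timestamp, deleted_items)
    VALUES (OLD.category, OLD.timestamp, 1)
    ON DUPLICATE KEY UPDATE
        deleted_items   = deleted_items + 1,
        first_timestamp = LEAST(first_timestamp, OLD.timestamp);

With something like this in place, the total per category would be deleted_items plus the live COUNT(*), and the first item's timestamp would be the summary value (falling back to the live MIN(timestamp) for categories that have not lost any rows yet).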
try with SELECT count(id) FROM table GROUP BY category
So long story short:
I have table A, which might gain more columns in the future. I'd like to write a PHP PDO prepared SELECT statement with a WHERE clause that applies the condition to ALL columns of the table. To avoid having to update the query manually whenever columns are added later on, I'd like to simply tell the query to check ALL columns of the table.
Like so:
$fetch = $connection->prepare("SELECT product_name
FROM products_tbl
WHERE _ANYCOLUMN_ = ?
");
Is this possible with MySQL?
EDIT:
To clarify what I mean by "having to expand the table" in the future:
MariaDB [foundationtests]> SHOW COLUMNS FROM products_tbl;
+----------------------+--------------+------+-----+---------+----------------+
| Field                | Type         | Null | Key | Default | Extra          |
+----------------------+--------------+------+-----+---------+----------------+
| product_id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| product_name         | varchar(100) | NO   | UNI | NULL    |                |
| product_manufacturer | varchar(100) | NO   | MUL | diverse |                |
| product_category     | varchar(100) | NO   | MUL | diverse |                |
+----------------------+--------------+------+-----+---------+----------------+
4 rows in set (0.011 sec)
Here you can see the current table. Basically, products are listed here by their name, and they are accompanied by their manufacturer (say, Bosch) and category (say, drill hammer). Now I want to add another "attribute" to the products, like their price.
In such a case, I'd have to add another column, and then I'd have to specify this new column inside my MySQL queries.
I'm working on an "online streaming" project and I need some help constructing a DB for best performance. Currently I have one table containing all the relevant information for the player, including the file, poster image, post_id, etc.
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| post_id       | int(11)     | YES  |     | NULL    |                |
| file          | mediumtext  | NO   |     | NULL    |                |
| thumbs_img    | mediumtext  | YES  |     | NULL    |                |
| thumbs_size   | mediumtext  | YES  |     | NULL    |                |
| thumbs_points | mediumtext  | YES  |     | NULL    |                |
| poster_img    | mediumtext  | YES  |     | NULL    |                |
| type          | int(11)     | NO   |     | NULL    |                |
| uuid          | varchar(40) | YES  |     | NULL    |                |
| season        | int(11)     | YES  |     | NULL    |                |
| episode       | int(11)     | YES  |     | NULL    |                |
| comment       | text        | YES  |     | NULL    |                |
| playlistName  | text        | YES  |     | NULL    |                |
| time          | varchar(40) | YES  |     | NULL    |                |
| mini_poster   | mediumtext  | YES  |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
With 100k records a query takes around 0.5 sec, and performance keeps degrading as I add more records.
+----------+------------+----------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                |
+----------+------------+----------------------------------------------------------------------+
| 1        | 0.04630675 | SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1' |
+----------+------------+----------------------------------------------------------------------+
explain SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| 1  | SIMPLE      | dle_playerFiles | ALL  | NULL          | NULL | NULL    | NULL | 61777 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
How can I improve the DB structure? How do big websites like YouTube construct their databases?
Generally, when query time is directly proportional to the number of rows, that suggests a table scan, which means that for a query like
SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'
The database is executing it literally: it iterates over every single row and checks whether it meets the criteria.
The typical solution to this is an index, which is a precomputed list of the values in a column (or set of columns), along with, for each value, a list of the rows that have it.
If you create an index on the post_id column on dle_playerFiles, then the index would essentially say
1: <some row pointer>, <some row pointer>, <some row pointer>
2: <some row pointer>, <some row pointer>, <some row pointer>
...
100: <some row pointer>, <some row pointer>, <some row pointer>
...
7000: <some row pointer>, <some row pointer>, <some row pointer>
250000: <some row pointer>, <some row pointer>, <some row pointer>
Therefore, with such an index in place, the above query would simply look at node 7000 of the index and know which rows contain that value.
Then the database only needs to read the rows where post_id is 7000 and check if their type is 1.
This will be much quicker because the database never needs to look at every row to handle a query. The costs of an index:
Storage space - this is more data and it has to be stored somewhere
Update time - databases keep indexes in sync with changes to the table automatically, which means that INSERT, UPDATE and DELETE statements will take longer because they also need to update the index. For small and efficient indexes, this tradeoff is usually worth it.
For your query, I recommend you create an index on 2 columns. Make them part of the same index, not 2 separate indexes:
create index ix_dle_playerFiles__post_id_type on dle_playerFiles (post_id, type)
Caveats to this working efficiently:
SELECT * is bad here. If you are returning every column, then the database must go to the table to read the columns because the index only contains the columns for filtering. If you really only need one or two of the columns, specify them explicitly in the SELECT clause and add them to your index. Do NOT do this for many columns as it just bloats the index.
Functions and type conversions tend to prevent index usage. Your SQL wraps the integer columns post_id and type in quotes, so they are interpreted as strings. The database may decide that the index can't be used because it would have to convert everything. Remove the quotes for good measure.
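Putting those two caveats together, the query and index might end up looking something like this (treating uuid as a stand-in for whichever one or two columns you actually need back - that part is an assumption):

SELECT * FROM dle_playerFiles WHERE post_id IN (7000) AND type = 1;

-- or, if only uuid is really needed, a covering index lets the query skip the table entirely:
create index ix_dle_playerFiles__post_id_type_uuid on dle_playerFiles (post_id, type, uuid);
SELECT uuid FROM dle_playerFiles WHERE post_id IN (7000) AND type = 1;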
If I read your Duration correctly, it appears to take 0.04630675 (seconds?) to run your query, not 0.5s.
Regardless, proper indexing can decrease the time required to return query results. Based on your query SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1', an index on post_id and type would be advisable.
Also, if you don't absolutely require all the fields to be returned, use individual column references of the fields you require instead of the *. The fewer fields, the quicker the query will return.
Another way to optimize a query is to ensure that you use the smallest data types possible - especially in primary/foreign key and index fields. Never use a bigint or an int when a mediumint, smallint or better still, a tinyint will do. Never, ever use a text field in a PK or FK unless you have no other choice (this one is a DB design sin that is committed far too often IMO, even by people with enough training and experience to know better) - you're far better off using the smallest exact numeric type possible. All this has positive impacts on storage size too.
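Purely as an illustration of that point (whether these smaller ranges actually fit your data is an assumption you would need to verify):

ALTER TABLE dle_playerFiles
    MODIFY type    TINYINT UNSIGNED NOT NULL,   -- 1 byte instead of 4 for a small set of type codes
    MODIFY season  SMALLINT UNSIGNED NULL,      -- 2 bytes is plenty for a season number
    MODIFY episode SMALLINT UNSIGNED NULL;      -- likewise for episodes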
I have a string in a database (MySQL) which looks like this:
{"StateId":73,"CallTime":"\/Date(1336365498912+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}},"Profile":{"$type":"DataWriter.DbProfile, DataWriterObjects","Name":"DataService","Provider":"mssql","ConnectionString":"Data Source=localhost\\mydb; Database=mydb; User Id=sa; Password=admin;"}}
The string is a JSON object with multiple fields. The problem is that I have multiple duplicate rows which I want to remove from the database. A row is considered a duplicate if its CallId and StateId are the same, even if the CallTime is different. So first I want to get the list of duplicates (GROUP BY), i.e. the rows that have the same CallId, ignoring any difference in CallTime. The record below has a different CallTime from the first one but the same CallId, so it is considered a duplicate (CallTime simply must not be taken into account when checking for duplicates):
{"StateId":73,"CallTime":"\/Date(1336365498913+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}},"Profile":{"$type":"DataWriter.DbProfile, DataWriterObjects","Name":"DataService","Provider":"mssql","ConnectionString":"Data Source=localhost\\mydb; Database=mydb; User Id=sa; Password=admin;"}}
So how do I do the GROUP BY? Basically everything should be matched in the GROUP BY except the CallTime value.
The table structure is
mysql> describe Statements;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| SequenceId | bigint(10)  | NO   | PRI | NULL    | auto_increment |
| Profile    | varchar(32) | YES  | MUL | NULL    |                |
| CacheItem  | text        | NO   |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
After that I want to delete the duplicates. Can anyone help me out?
I think your database is not atomic enough; you may have to split your JSON string out into separate fields.
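Until the JSON is split into proper columns, one possible workaround is to extract the key with string functions and group on that. This is only a sketch and assumes the CacheItem text always contains the "CallId":"..." fragment in the format shown above:

-- Find duplicate groups (CallTime is ignored because it is never extracted)
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(CacheItem, '"CallId":"', -1), '"', 1) AS call_id,
       COUNT(*) AS copies,
       MIN(SequenceId) AS keep_id
FROM Statements
GROUP BY call_id
HAVING COUNT(*) > 1;

-- Delete every duplicate except the row with the lowest SequenceId in each group
DELETE s
FROM Statements s
JOIN (
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(CacheItem, '"CallId":"', -1), '"', 1) AS call_id,
           MIN(SequenceId) AS keep_id
    FROM Statements
    GROUP BY call_id
) keep ON SUBSTRING_INDEX(SUBSTRING_INDEX(s.CacheItem, '"CallId":"', -1), '"', 1) = keep.call_id
      AND s.SequenceId <> keep.keep_id;

StateId can be pulled out the same way and added to the GROUP BY and the join condition if you want duplicates matched on both fields.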
I have a MySQL table contacts, with structure as follows
+--------------+----------+------+-----+---------+----------------+
| Field        | Type     | Null | Key | Default | Extra          |
+--------------+----------+------+-----+---------+----------------+
| id           | int(11)  | NO   | PRI | NULL    | auto_increment |
| contactee_id | int(11)  | NO   | MUL | 0       |                |
| contacter_id | int(11)  | NO   | MUL | 0       |                |
+--------------+----------+------+-----+---------+----------------+
contactee_id and contacter_id are both user ids, which together define a relationship between two users. In order to calculate the number of relations a user has, I have the following query:
INSERT INTO followers (id, followers)
SELECT contactee_id, 1
FROM contacts
ON DUPLICATE KEY
UPDATE followers = followers + 1
The problem with this query is that it locks the contacts table for too long (more than 16 minutes). I want to get it done in batches, so that the SQL does not lock the contacts table for too long. I thought of a few ways, but they all need to lock the entire table. Is there a way this could be done?
If you just want the count of relations, use COUNT and GROUP BY together, like:
SELECT contactee_id,count(contacter_id) FROM contacts group by contactee_id;
This will give you every contactee_id and the number of contacter_ids for each contactee.
Run the query for a limited batch of records, then save the id of the last processed record in a table or on the filesystem; start the next query from that id and update it on every cycle.
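A sketch of that batched approach (the 10000-row chunk size, the :last_id placeholder and the batch_progress bookkeeping table are all assumptions):

INSERT INTO followers (id, followers)
SELECT contactee_id, COUNT(*)
FROM contacts
WHERE id > :last_id AND id <= :last_id + 10000   -- only this slice of contacts is scanned
GROUP BY contactee_id
ON DUPLICATE KEY UPDATE followers = followers + VALUES(followers);

-- remember where the next batch should start
UPDATE batch_progress SET last_id = :last_id + 10000;

Repeating this until last_id passes the maximum id in contacts should give the same totals as the single big statement, but each run only touches one slice of the table.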
When a customer places an order, the item_id and option_id are stored in the order_items table, and from there an invoice is generated for the customer. However, the price of an item changes every few months, and that affects the information on old invoices.
What are the solutions to this problem? I do not want to store the price and item name in the order_items table.
I have read that a possible solution is to create a history_prices table (an audit system via a trigger, or a manual SQL insert query via PHP?). Is an audit table the best solution, or is there another one?
Can you provide an example of how to create the history_prices table, so that when I change item_options.option_price the old price is stored in the history_prices table?
Right now I have over 200,000 rows in the item_options table; do I need to copy the prices into history_prices?
I need an efficient way so that the invoices will not be affected by new price changes.
item_options table:
mysql> desc item_options;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| option_id     | int(11)      | NO   | PRI | NULL    | auto_increment |
| item_id       | int(11)      | YES  | MUL | NULL    |                |
| option_name   | varchar(100) | YES  |     | NULL    |                |
| option_price  | int(11)      | YES  |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
order_items table:
mysql> desc order_items;
+----------------+---------+------+-----+---------+----------------+
| Field          | Type    | Null | Key | Default | Extra          |
+----------------+---------+------+-----+---------+----------------+
| order_items_id | int(11) | NO   | PRI | NULL    | auto_increment |
| order_id       | int(11) | NO   |     | NULL    |                |
| item_id        | int(11) | NO   |     | NULL    |                |
| option_id      | int(11) | NO   |     | NULL    |                |
+----------------+---------+------+-----+---------+----------------+
Check the following designs out:
Design 1: stores a rolling history of changes to the item (new row if anything changes: name, description, price).
Design 2: new row on price change only (see the sketch below).
Alternatively, you can store the price with the order itself.
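A minimal sketch of Design 2 (new rows only when the price changes); the history_prices columns and the trigger name are assumptions, not a prescribed layout:

CREATE TABLE history_prices (
    history_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    option_id  INT NOT NULL,
    old_price  INT,
    new_price  INT,
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER item_options_price_audit
AFTER UPDATE ON item_options
FOR EACH ROW
BEGIN
    -- only write history when the price actually changed
    IF NOT (OLD.option_price <=> NEW.option_price) THEN
        INSERT INTO history_prices (option_id, old_price, new_price)
        VALUES (OLD.option_id, OLD.option_price, NEW.option_price);
    END IF;
END//
DELIMITER ;

With this shape you would not need to pre-copy the existing 200,000 item_options rows; history rows only appear from the first price change onwards.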
This is a matter of opinion; some will say that a historical price lookup will be fine, but my opinion is that it is not.
The problem with looking up a history of prices and determining the invoice price from that is that there is plenty of room for error. You will have several pieces of logic used to determine the right price, all of which are prone to errors. You could forget to convert the time zone of the invoice, which could put it on the wrong side of a price change. You could forget to make applied discounts or coupon codes date-sensitive, etc. What about ever-changing shipping charges?
It is best to store the actual invoice price with the invoice itself. Disk space is cheap, use the redundancy to sleep better at night.
The best thing you can do is create a column in order_items for the price. That is also the most straightforward approach.
If you want to create a table with price history for reporting use, that might be fine. But do not give yourself the painful headache of querying the price history just to get some items' price. The price IS an attribute of the item. The price might change due to promotion, discount, special offer, etc.
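If you go that route, a sketch could be as simple as the following (the price column mirrors option_price's int type, and the 123/456 ids in the example insert are placeholders):

ALTER TABLE order_items ADD COLUMN price INT NOT NULL DEFAULT 0;

-- copy the current price at the moment the order item is created
INSERT INTO order_items (order_id, item_id, option_id, price)
SELECT 123, io.item_id, io.option_id, COALESCE(io.option_price, 0)
FROM item_options io
WHERE io.option_id = 456;

Old invoices then keep whatever price was copied at order time, no matter how item_options changes later.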