Delete duplicate rows GROUP BY with LIKE - mysql

I have a string in database (mysql) which is like:
{"StateId":73,"CallTime":"\/Date(1336365498912+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}},"Profile":{"$type":"DataWriter.DbProfile, DataWriterObjects","Name":"DataService","Provider":"mssql","ConnectionString":"Data Source=localhost\\mydb; Database=mydb; User Id=sa; Password=admin;"}}
The string is a JSON object which contains multiple fields. The problem is that I have multiple duplicate rows which I want to remove from the database. A row is considered a duplicate if the CallId and StateId is same but the CallTime is different. So first I want to get list of the duplicates (GROUP BY) of those rows which have CallId same and ignore the difference in CallTime. The below record has different CallTime from the first one but same CallId, hence it is considered a duplicate (basically need not to consider CallTime for duplicate)
{"StateId":73,"CallTime":"\/Date(1336365498913+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}},"Profile":{"$type":"DataWriter.DbProfile, DataWriterObjects","Name":"DataService","Provider":"mssql","ConnectionString":"Data Source=localhost\\mydb; Database=mydb; User Id=sa; Password=admin;"}}
So how do I do a GROUP BY? Basically everything in the GROUP BY should be matched ignoring the CallTime value.
The table structure is
mysql> describe Statements;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| SequenceId | bigint(10) | NO | PRI | NULL | auto_increment |
| Profile | varchar(32) | YES | MUL | NULL | |
| CacheItem | text | NO | | NULL | |
+------------+-------------+------+-----+---------+----------------+
After that I want to delete the duplicates. Anyone help me out?

I think your database is not atomic enough, you may have to split out your JSON string into separate fields

Related

MySQL: WHERE Condition against all Columns without specifying them

So long story short:
I have table A which might expand in columns in the future. I'd like to write a php pdo prepared select statement with a WHERE clause which applies the where condition to ALL columns on the table. To prevent having to update the query manually if columns are added to the table later on, I'd like to just tell the query to check ALL columns on the table.
Like so:
$fetch = $connection->prepare("SELECT product_name
FROM products_tbl
WHERE _ANYCOLUMN_ = ?
");
Is this possible with mysql?
EDIT:
To clarify what I mean by "having to expand the table" in the future:
MariaDB [foundationtests]> SHOW COLUMNS FROM products_tbl;
+----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+----------------+
| product_id | int(11) | NO | PRI | NULL | auto_increment |
| product_name | varchar(100) | NO | UNI | NULL | |
| product_manufacturer | varchar(100) | NO | MUL | diverse | |
| product_category | varchar(100) | NO | MUL | diverse | |
+----------------------+--------------+------+-----+---------+----------------+
4 rows in set (0.011 sec)
Here you can see the current table. Basically, products are listed here by their name, and they are accompanied by their manufacturers (say, Bosch) and category (say, drill hammer). Now I want to add another "attribute" to the products, like their price.
In such a case, I'd have to add another column, and then I'd have to specify this new column inside my MySQL queries.

MySQL - insert rows in one table and then update another with the auto increment ID

I have 2 tables called applications and filters. The structure of the tables are as follows:
mysql> DESCRIBE applications;
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| id | tinyint(3) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| filter_id | int(3) | NO | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
mysql> DESCRIBE filters;
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| label | varchar(255) | NO | | NULL | |
| link | varchar(255) | NO | | NULL | |
| anchor | varchar(100) | NO | | NULL | |
| group_id | tinyint(3) unsigned | NO | MUL | NULL | |
| comment | varchar(255) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
7 rows in set (0.02 sec)
What I want to do is select all the records in applications and make a corresponding record in filters (so that filters.name is the same as applications.name). When the record is inserted in filters I want to get the primary key (filters.id) of the newly inserted record - which is an auto increment field - and update applications.filter_id with it. I should clarify that applications.filter_id is a field I've created for this purpose and contains no data at the moment.
I am a PHP developer and have written a script which can do this, but want to know if it's possible with a pure MySQL solution. In pseudo-code the way my script works is as follows:
Select all the records in applications
Do a foreach loop on (1)
Insert a record in filters (filters.name == applications.name)
Store the inserted ID (filters.id) to a variable and then update applications.filter_id with the variable's data.
I'm unaware of how to do the looping (2) and storing the auto increment ID (4) in MySQL.
I have read about Get the new record primary key ID from mysql insert query? so am aware of LAST_INSERT_ID() but not sure how to reference this in some kind of "loop" which goes through each of the applications records.
Please can someone advise if this is possible?
I don't think this is possible to do this with only one request to mysql.
But, i think this is a good use case for mysql triggers.
I think you should write it like this :
CREATE TRIGGER after_insert_create_application_filter AFTER INSERT
ON applications FOR EACH ROW
BEGIN
INSERT INTO filters (name) VALUES (NEW.name);
UPDATE applications SET filter_id = LAST_INSERT_ID() WHERE id = NEW.id;
END
This trigger is not tested but you should understand the way to write it.
If you don't know mysql triggers, you can read this part of the documentation.
This isn't an answer to your question, more a comment on your database design.
First of all, if the name field needs to contain the same information, they should be the same type and size (varchar(255))
Overall though, I think the schema you're using for your tables is wrong. Your description says that each record in applications can only hold one filter_id. If that is the case, there's no point in using two separate tables.
If there is a chance that there will be a many-to-one relationship, link the records via the relevant primary key. If multiple records in application can relate to a single filter, store filters.id in the applications table. If there are multiple filters for a single application, store applications.id in the filters table.
If there is a many-to-many relationship, create another table to store it:
CREATE TABLE `application_filters_mappings` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`application_id` int(10) unsigned NOT NULL,
`filters_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
);

Constructing a DB for best performance

I'm working on "online streaming" project and I need some help in constructing a DB for best performance. Currently I have one table containing all relevant information for the player including file, poster image, post_id etc.
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| post_id | int(11) | YES | | NULL | |
| file | mediumtext | NO | | NULL | |
| thumbs_img | mediumtext | YES | | NULL | |
| thumbs_size | mediumtext | YES | | NULL | |
| thumbs_points | mediumtext | YES | | NULL | |
| poster_img | mediumtext | YES | | NULL | |
| type | int(11) | NO | | NULL | |
| uuid | varchar(40) | YES | | NULL | |
| season | int(11) | YES | | NULL | |
| episode | int(11) | YES | | NULL | |
| comment | text | YES | | NULL | |
| playlistName | text | YES | | NULL | |
| time | varchar(40) | YES | | NULL | |
| mini_poster | mediumtext | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
With 100k records it takes around 0.5 sec for a query and performance constantly degrading as I have more records.
+----------+------------+----------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+----------------------------------------------------------------------+
| 1 | 0.04630675 | SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1' |
+----------+------------+----------------------------------------------------------------------+
explain SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | dle_playerFiles | ALL | NULL | NULL | NULL | NULL | 61777 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
How can I improve DB structure? How big websites like youtube construct their database?
Generally when query time is directly proportional to the number of rows, that suggests a table scan, which means for a query like
SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'
The database is executing that literally, as in, iterate over every single row and check if it meets criteria.
The typical solution to this is an index, which is a precomputed list of values for a column (or set of columns) and a list of rows which have said value.
If you create an index on the post_id column on dle_playerFiles, then the index would essentially say
1: <some row pointer>, <some row pointer>, <some row pointer>
2: <some row pointer>, <some row pointer>, <some row pointer>
...
100: <some row pointer>, <some row pointer>, <some row pointer>
...
7000: <some row pointer>, <some row pointer>, <some row pointer>
250000: <some row pointer>, <some row pointer>, <some row pointer>
Therefore, with such an index in place, the above query would simply look at node 7000 of the index and know which rows contain it.
Then the database only needs to read the rows where post_id is 7000 and check if their type is 1.
This will be much quicker because the database never needs to look at every row to handle a query. The costs of an index:
Storage space - this is more data and it has to be stored somewhere
Update time - databases keep indexes in sync with changes to the table automatically, which means that INSERT, UPDATE and DELETE statements will take longer because they need to update the data. For small and efficient indexes, this tradeoff is usually worth it.
For your query, I recommend you create an index on 2 columns. Make them part of the same index, not 2 separate indexes:
create index ix_dle_playerFiles__post_id_type on dle_playerFiles (post_id, type)
Caveats to this working efficiently:
SELECT * is bad here. If you are returning every column, then the database must go to the table to read the columns because the index only contains the columns for filtering. If you really only need one or two of the columns, specify them explicitly in the SELECT clause and add them to your index. Do NOT do this for many columns as it just bloats the index.
Functions and type conversions tend to prevent index usage. Your SQL wraps the integer types post_id and type in quotes so they are interpreted as strings. The database may feel that an index can't be used because it has to convert everything. Remove the quotes for good measure.
If I read your Duration correctly, it appears to take 0.04630675 (seconds?) to run your query, not 0.5s.
Regardless, proper indexing can decrease the time required to return query results. Based on your query SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1', an index on post_id and type would be advisable.
Also, if you don't absolutely require all the fields to be returned, use individual column references of the fields you require instead of the *. The fewer fields, the quicker the query will return.
Another way to optimize a query is to ensure that you use the smallest data types possible - especially in primary/foreign key and index fields. Never use a bigint or an int when a mediumint, smallint or better still, a tinyint will do. Never, ever use a text field in a PK or FK unless you have no other choice (this one is a DB design sin that is committed far too often IMO, even by people with enough training and experience to know better) - you're far better off using the smallest exact numeric type possible. All this has positive impacts on storage size too.

MySQL: Count items by categories

I've created a table that holds items according to categories:
+------------+---------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+-------------------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL | |
| category | varchar(30) | YES | MUL | NULL | |
| timestamp | timestamp | NO | | CURRENT_TIMESTAMP | |
| data | mediumblob | YES | | NULL | |
+------------+---------------------+------+-----+-------------------+----------------+
Old data is deleted using a sliding window technique, meaning that only the last N items in each category are kept in the table.
How can I keep track the total number of the items per category, and the timestamp of the first item in the category?
Edit - COUNT and MIN on the original table won't work, because this is a Sliding Window data structure meaning that the first items have already been deleted.
Clearly you need to keep a separate table when you delete the records. Your table should summarize the categories and include the fields:
Category first start time
Total number of items in the category
and so on.
When you go to delete, you need to update this table. In general, I prefer to use stored procedures to handle database maintenance, so this code could be added to the stored procedure. Others prefer triggers, so you could have a delete trigger that does the same thing.
try with SELECT count(id) FROM table GROUP BY category

Break this Query into batches

I have a MySQL table contacts, with structure as follows
+--------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| contactee_id | int(11) | NO | MUL | 0 | |
| contacter_id | int(11) | NO | MUL | 0 | |
+--------------+----------+------+-----+---------+----------------+
contactee_id and contacter_id are both ids, which together defines a relationship between two users. In order to calculate the count of relations, a user have, I have the following query
INSERT INTO followers (id, followers)
SELECT contactee_id, 1
FROM contacts
ON DUPLICATE KEY
UPDATE followers = followers + 1
The problem with this query is that it locks the contacts table for too long (more than 16 minutes). I want to get it done in batches, so that the SQL does not locks contacts table for too long. Few ways, I thought of, but they all need to lock the entire table. Is there a way this could be done?
If you just want the count of relations use the count and group by together like
SELECT contactee_id,count(contacter_id) FROM contacts group by contactee_id;
This will give you all the contactee_id and the number of contacter_id's for each contactee
Run query for some records and then save the id of the last record in a table or filesystem, start next query from that id and update it every cycle.