MySQL: WHERE Condition against all Columns without specifying them - mysql

So long story short:
I have a table that might gain columns in the future. I'd like to write a PHP PDO prepared SELECT statement whose WHERE condition applies to ALL columns of the table. To avoid updating the query manually whenever columns are added later on, I'd like to just tell the query to check ALL columns of the table.
Like so:
$fetch = $connection->prepare("SELECT product_name
FROM products_tbl
WHERE _ANYCOLUMN_ = ?
");
Is this possible with MySQL?
EDIT:
To clarify what I mean by "having to expand the table" in the future:
MariaDB [foundationtests]> SHOW COLUMNS FROM products_tbl;
+----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+----------------+
| product_id | int(11) | NO | PRI | NULL | auto_increment |
| product_name | varchar(100) | NO | UNI | NULL | |
| product_manufacturer | varchar(100) | NO | MUL | diverse | |
| product_category | varchar(100) | NO | MUL | diverse | |
+----------------------+--------------+------+-----+---------+----------------+
4 rows in set (0.011 sec)
Here you can see the current table. Basically, products are listed here by their name, and they are accompanied by their manufacturers (say, Bosch) and category (say, drill hammer). Now I want to add another "attribute" to the products, like their price.
In such a case, I'd have to add another column, and then I'd have to specify this new column inside my MySQL queries.
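One way this could be approached is to read the column list from the schema at query time and build the WHERE clause from it. The sketch below is not PHP/PDO: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a server, and the helper name find_in_any_column, the sample rows and the product_price column are all made up for illustration. On MySQL the column names would come from SHOW COLUMNS or information_schema.COLUMNS instead of pragma_table_info.

```python
import sqlite3

def find_in_any_column(conn, table, value):
    """Return product names from `table` where any column equals `value`."""
    # Read the column names when the query is built, so columns added
    # later are covered automatically (SQLite-specific schema lookup).
    columns = [row[0] for row in
               conn.execute("SELECT name FROM pragma_table_info(?)", (table,))]
    if not columns:
        raise ValueError(f"unknown table: {table}")
    # Identifiers cannot be bound as parameters, so they are quoted and
    # concatenated; only the searched value is bound.
    where = " OR ".join(f'"{col}" = ?' for col in columns)
    sql = f'SELECT product_name FROM "{table}" WHERE {where}'
    return conn.execute(sql, [value] * len(columns)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products_tbl (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL UNIQUE,
    product_manufacturer TEXT NOT NULL DEFAULT 'diverse',
    product_category TEXT NOT NULL DEFAULT 'diverse')""")
conn.executemany(
    "INSERT INTO products_tbl (product_name, product_manufacturer,"
    " product_category) VALUES (?, ?, ?)",
    [("hammer one", "Bosch", "drill hammer"),      # made-up sample rows
     ("hammer two", "diverse", "drill hammer")])

bosch = find_in_any_column(conn, "products_tbl", "Bosch")

# Add a column later: the same function covers it without changes.
conn.execute("ALTER TABLE products_tbl ADD COLUMN product_price TEXT")
conn.execute("UPDATE products_tbl SET product_price = '149.00'"
             " WHERE product_name = 'hammer two'")
priced = find_in_any_column(conn, "products_tbl", "149.00")
```

Note that a generated OR across every column generally cannot use a single index, and that MySQL converts a non-numeric string to 0 when comparing it with a numeric column, so a real implementation would probably restrict the list to string columns.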

Related

Index on a TEXT column with defined set of values

mysql> describe marketing_details;
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| platform_origin | varchar(100) | NO | MUL | | |
| partner_id | bigint(20) unsigned | NO | MUL | 0 | |
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
For a query like,
SELECT *
FROM partners
INNER JOIN marketing_details ON partners.id = marketing_details.partner_id
where marketing_details.platform_origin IN ('platform_A', 'platform_B');
The platform_origin column can only hold one of 4 defined values, so adding an index or a full-text index does not seem to be of use here. But the data can be huge, and there can be multiple constraints in the query. What would be a good way to optimise the query?
For this query:
SELECT . . .
FROM partners p INNER JOIN
marketing_details md
ON p.id = md.partner_id
WHERE md.platform_origin IN ('platform_A', 'platform_B');
You should try indexes on:
marketing_details(platform_origin, partner_id)
partners(id)
You probably already have the second one if id is declared as the primary key.
Another suggestion would be to add a check constraint on platform_origin so that it can only hold the 4 possible values.
It would also help to have histograms on the platform_origin column, if your version supports them. They give the optimizer more input for choosing a plan when the values in the column are skewed.
Have a look at the following link.
https://mysqlserverteam.com/histogram-statistics-in-mysql/
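The suggested composite index can be tried out on a small scale. This is only a sketch: it runs against Python's built-in sqlite3 rather than MySQL, the index name ix_md_origin_partner, the name column on partners and the sample rows are invented, and the printed plan text is SQLite's, not the output of MySQL's EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partners (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE marketing_details (
    id INTEGER PRIMARY KEY,
    platform_origin TEXT NOT NULL,
    partner_id INTEGER NOT NULL DEFAULT 0
);
-- Composite index: filter column first, join column second.
CREATE INDEX ix_md_origin_partner
    ON marketing_details (platform_origin, partner_id);
""")
conn.executemany("INSERT INTO partners (id, name) VALUES (?, ?)",
                 [(1, "p1"), (2, "p2"), (3, "p3")])
conn.executemany(
    "INSERT INTO marketing_details (platform_origin, partner_id) VALUES (?, ?)",
    [("platform_A", 1), ("platform_B", 2), ("platform_C", 3), ("platform_A", 3)])

query = """
SELECT p.name, md.platform_origin
FROM partners p
INNER JOIN marketing_details md ON p.id = md.partner_id
WHERE md.platform_origin IN ('platform_A', 'platform_B')
ORDER BY p.name, md.platform_origin
"""
rows = conn.execute(query).fetchall()

# The last column of each plan row is the human-readable step description.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

Whether the optimizer actually picks the index depends on the engine and on the data distribution, so the plan should be checked on the real table.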

Constructing a DB for best performance

I'm working on "online streaming" project and I need some help in constructing a DB for best performance. Currently I have one table containing all relevant information for the player including file, poster image, post_id etc.
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| post_id | int(11) | YES | | NULL | |
| file | mediumtext | NO | | NULL | |
| thumbs_img | mediumtext | YES | | NULL | |
| thumbs_size | mediumtext | YES | | NULL | |
| thumbs_points | mediumtext | YES | | NULL | |
| poster_img | mediumtext | YES | | NULL | |
| type | int(11) | NO | | NULL | |
| uuid | varchar(40) | YES | | NULL | |
| season | int(11) | YES | | NULL | |
| episode | int(11) | YES | | NULL | |
| comment | text | YES | | NULL | |
| playlistName | text | YES | | NULL | |
| time | varchar(40) | YES | | NULL | |
| mini_poster | mediumtext | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
With 100k records a query takes around 0.5 sec, and performance keeps degrading as I add more records.
+----------+------------+----------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+----------------------------------------------------------------------+
| 1 | 0.04630675 | SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1' |
+----------+------------+----------------------------------------------------------------------+
explain SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | dle_playerFiles | ALL | NULL | NULL | NULL | NULL | 61777 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
How can I improve the DB structure? How do big websites like YouTube construct their databases?
Generally when query time is directly proportional to the number of rows, that suggests a table scan, which means for a query like
SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'
The database is executing that literally, as in, iterate over every single row and check if it meets criteria.
The typical solution to this is an index, which is a precomputed list of values for a column (or set of columns) and a list of rows which have said value.
If you create an index on the post_id column on dle_playerFiles, then the index would essentially say
1: <some row pointer>, <some row pointer>, <some row pointer>
2: <some row pointer>, <some row pointer>, <some row pointer>
...
100: <some row pointer>, <some row pointer>, <some row pointer>
...
7000: <some row pointer>, <some row pointer>, <some row pointer>
250000: <some row pointer>, <some row pointer>, <some row pointer>
Therefore, with such an index in place, the above query would simply look at node 7000 of the index and know which rows contain it.
Then the database only needs to read the rows where post_id is 7000 and check if their type is 1.
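The difference between the two access paths can be sketched in a few lines of Python. This is a toy model only: real engines store indexes as B-trees rather than dictionaries, and the rows and function names here are made up.

```python
from collections import defaultdict

# Each tuple is (id, post_id, type); made-up sample rows.
table = [(1, 7000, 1), (2, 42, 1), (3, 7000, 2), (4, 7000, 1), (5, 100, 1)]

def full_scan(rows, post_id, type_):
    """No index: every row is inspected."""
    return [r for r in rows if r[1] == post_id and r[2] == type_]

def build_index(rows):
    """Map each post_id to the positions of the rows that contain it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[1]].append(pos)
    return index

def indexed_lookup(rows, index, post_id, type_):
    """With an index: read only the rows listed under post_id, then check type."""
    return [rows[pos] for pos in index.get(post_id, []) if rows[pos][2] == type_]

post_id_index = build_index(table)
result = indexed_lookup(table, post_id_index, 7000, 1)
```

Both functions return the same rows, but indexed_lookup touches only the three rows stored under 7000 instead of all five.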
This will be much quicker because the database never needs to look at every row to handle a query. The costs of an index:
Storage space - this is more data and it has to be stored somewhere
Update time - databases keep indexes in sync with changes to the table automatically, which means that INSERT, UPDATE and DELETE statements will take longer because they need to update the index as well as the data. For small and efficient indexes, this tradeoff is usually worth it.
For your query, I recommend you create an index on 2 columns. Make them part of the same index, not 2 separate indexes:
create index ix_dle_playerFiles__post_id_type on dle_playerFiles (post_id, type)
Caveats to this working efficiently:
SELECT * is bad here. If you are returning every column, then the database must go to the table to read the columns because the index only contains the columns for filtering. If you really only need one or two of the columns, specify them explicitly in the SELECT clause and add them to your index. Do NOT do this for many columns as it just bloats the index.
Functions and type conversions tend to prevent index usage. Your SQL wraps the values for the integer columns post_id and type in quotes, so they are passed as strings. MySQL normally converts such a constant once and can still use the index, but the reverse case (a string column compared with a number) does prevent index usage. Remove the quotes for good measure.
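The effect of the two-column index can be checked by comparing the query plan before and after creating it. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL, a cut-down version of the table and generated sample rows; the plan strings are SQLite's and will not match MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dle_playerFiles (
    id INTEGER PRIMARY KEY,
    post_id INTEGER,
    file TEXT NOT NULL,
    type INTEGER NOT NULL)""")
# Generated sample rows: 5000 files spread over 500 posts and 3 types.
conn.executemany(
    "INSERT INTO dle_playerFiles (post_id, file, type) VALUES (?, ?, ?)",
    [(n % 500, f"file{n}.mp4", n % 3) for n in range(5000)])

# Explicit columns and unquoted integers, as recommended above.
query = "SELECT post_id, type FROM dle_playerFiles WHERE post_id = 70 AND type = 1"

def plan_for(sql):
    """Return the query plan steps as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan_for(query)   # full table scan
conn.execute("CREATE INDEX ix_dle_playerFiles__post_id_type"
             " ON dle_playerFiles (post_id, type)")
plan_after = plan_for(query)    # lookup through the new index
matches = conn.execute(query).fetchall()
```

Because the query selects only columns that are part of the index, the engine can answer it from the index alone, which is the point of the SELECT * caveat.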
If I read your Duration correctly, it appears to take 0.04630675 (seconds?) to run your query, not 0.5s.
Regardless, proper indexing can decrease the time required to return query results. Based on your query SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1', an index on post_id and type would be advisable.
Also, if you don't absolutely require all the fields to be returned, use individual column references of the fields you require instead of the *. The fewer fields, the quicker the query will return.
Another way to optimize a query is to ensure that you use the smallest data types possible - especially in primary/foreign key and index fields. Never use a bigint or an int when a mediumint, smallint or better still, a tinyint will do. Never, ever use a text field in a PK or FK unless you have no other choice (this one is a DB design sin that is committed far too often IMO, even by people with enough training and experience to know better) - you're far better off using the smallest exact numeric type possible. All this has positive impacts on storage size too.

MySQL: Count items by categories

I've created a table that holds items according to categories:
+------------+---------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+-------------------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL | |
| category | varchar(30) | YES | MUL | NULL | |
| timestamp | timestamp | NO | | CURRENT_TIMESTAMP | |
| data | mediumblob | YES | | NULL | |
+------------+---------------------+------+-----+-------------------+----------------+
Old data is deleted using a sliding window technique, meaning that only the last N items in each category are kept in the table.
How can I keep track of the total number of items per category, and the timestamp of the first item in each category?
Edit - COUNT and MIN on the original table won't work, because this is a Sliding Window data structure meaning that the first items have already been deleted.
Clearly you need to keep a separate table when you delete the records. Your table should summarize the categories and include the fields:
Category first start time
Total number of items in the category
and so on.
When you go to delete, you need to update this table. In general, I prefer to use stored procedures to handle database maintenance, so this code could be added to the stored procedure. Others prefer triggers, so you could have a delete trigger that does the same thing.
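A minimal sketch of the summary-table idea, under several assumptions: it uses Python's built-in sqlite3 instead of MySQL, application code instead of a stored procedure or trigger, an integer ts column instead of a TIMESTAMP, and a hypothetical category_stats table whose name and columns are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY, name TEXT, category TEXT, ts INTEGER NOT NULL);
-- Hypothetical summary table: what has already been deleted per category.
CREATE TABLE category_stats (
    category TEXT PRIMARY KEY,
    deleted_items INTEGER NOT NULL DEFAULT 0,
    first_ts INTEGER);
""")

def trim_category(conn, category, keep):
    """Keep only the newest `keep` items and record the deleted ones."""
    # LIMIT -1 OFFSET n is SQLite syntax for "everything after the first n".
    old = conn.execute(
        "SELECT id, ts FROM items WHERE category = ?"
        " ORDER BY id DESC LIMIT -1 OFFSET ?", (category, keep)).fetchall()
    if not old:
        return
    with conn:  # the delete and the summary update commit together
        conn.execute("INSERT OR IGNORE INTO category_stats (category)"
                     " VALUES (?)", (category,))
        conn.execute(
            "UPDATE category_stats SET deleted_items = deleted_items + ?,"
            " first_ts = COALESCE(first_ts, ?) WHERE category = ?",
            (len(old), min(ts for _, ts in old), category))
        conn.executemany("DELETE FROM items WHERE id = ?",
                         [(item_id,) for item_id, _ in old])

def category_summary(conn, category):
    """Return (total items ever seen, timestamp of the first item)."""
    live_count, live_min = conn.execute(
        "SELECT COUNT(*), MIN(ts) FROM items WHERE category = ?",
        (category,)).fetchone()
    row = conn.execute(
        "SELECT deleted_items, first_ts FROM category_stats WHERE category = ?",
        (category,)).fetchone()
    deleted, first_ts = row if row else (0, None)
    return deleted + live_count, (first_ts if first_ts is not None else live_min)

# Five items, window of two, then one more item and another trim.
conn.executemany("INSERT INTO items (name, category, ts) VALUES (?, ?, ?)",
                 [(f"item{n}", "a", 100 + n) for n in range(5)])
trim_category(conn, "a", keep=2)
conn.execute("INSERT INTO items (name, category, ts) VALUES ('item5', 'a', 105)")
trim_category(conn, "a", keep=2)
summary = category_summary(conn, "a")
```

The total is the sum of the recorded deletions and the rows still in the window, so the summary stays correct no matter how often the window is trimmed.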
Try with SELECT category, COUNT(id) FROM table GROUP BY category

Delete duplicate rows GROUP BY with LIKE

I have a string in database (mysql) which is like:
{"StateId":73,"CallTime":"\/Date(1336365498912+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}},"Profile":{"$type":"DataWriter.DbProfile, DataWriterObjects","Name":"DataService","Provider":"mssql","ConnectionString":"Data Source=localhost\\mydb; Database=mydb; User Id=sa; Password=admin;"}}
The string is a JSON object which contains multiple fields. The problem is that I have multiple duplicate rows which I want to remove from the database. A row is considered a duplicate if its CallId and StateId are the same but its CallTime is different. So first I want to get a list of the duplicates (GROUP BY), that is, rows with the same CallId, ignoring any difference in CallTime. The record below has a different CallTime from the first one but the same CallId, hence it is considered a duplicate (basically, CallTime should not be considered when detecting duplicates):
{"StateId":73,"CallTime":"\/Date(1336365498913+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}},"Profile":{"$type":"DataWriter.DbProfile, DataWriterObjects","Name":"DataService","Provider":"mssql","ConnectionString":"Data Source=localhost\\mydb; Database=mydb; User Id=sa; Password=admin;"}}
So how do I do a GROUP BY? Basically everything in the GROUP BY should be matched ignoring the CallTime value.
The table structure is
mysql> describe Statements;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| SequenceId | bigint(10) | NO | PRI | NULL | auto_increment |
| Profile | varchar(32) | YES | MUL | NULL | |
| CacheItem | text | NO | | NULL | |
+------------+-------------+------+-----+---------+----------------+
After that I want to delete the duplicates. Can anyone help me out?
I think your database is not atomic enough. You may have to split your JSON string out into separate fields.
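As a rough illustration of splitting the string into fields, the sketch below pulls StateId and CallId out of each CacheItem with regular expressions and collects the SequenceIds of the later duplicates. It is written in Python rather than SQL, assumes both keys appear in the flat form shown in the question, and the function names and the third sample row are made up.

```python
import re

STATE_ID = re.compile(r'"StateId":(\d+)')
CALL_ID = re.compile(r'"CallId":"([^"]+)"')

def split_fields(cache_item):
    """Extract (StateId, CallId) from a CacheItem string; None if either is missing."""
    state, call = STATE_ID.search(cache_item), CALL_ID.search(cache_item)
    if not (state and call):
        return None
    return int(state.group(1)), call.group(1)

def duplicate_ids(rows):
    """Given (SequenceId, CacheItem) pairs, return the SequenceIds to delete.

    The lowest SequenceId of each (StateId, CallId) group is kept.
    """
    seen, to_delete = set(), []
    for seq_id, cache_item in sorted(rows):
        key = split_fields(cache_item)
        if key is None:
            continue  # leave rows that cannot be parsed untouched
        if key in seen:
            to_delete.append(seq_id)
        else:
            seen.add(key)
    return to_delete

rows = [
    (1, r'{"StateId":73,"CallTime":"\/Date(1336365498912+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}'),
    (2, r'{"StateId":73,"CallTime":"\/Date(1336365498913+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}'),
    # Made-up row: same CallId but a different StateId, so not a duplicate.
    (3, r'{"StateId":74,"CallTime":"\/Date(1336365498913+0500)\/","CallId":"1336365489.14157","Target":"agi://127.0.0.1"}'),
]
stale = duplicate_ids(rows)
```

The returned ids could then be passed to a DELETE FROM Statements WHERE SequenceId IN (...) statement; storing StateId and CallId in their own columns would make the same grouping possible directly in SQL.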

MySQL: return field for which no related entries exist in another table

First, sorry for the title. I'm not a native English speaker, so this is pretty hard to phrase. In other words, what I'm trying to achieve is this:
I'm trying to fetch all domain names from the table virtual_domains where there is no corresponding entry in the virtual_aliases table starting like "postmaster#%".
So if I have two domains:
foo.org
example.org
And they have aliases like:
info#foo.org => admin#foo.org
postmaster#foo.org => user1#foo.org
info#example.org => admin#example.org
I want the query to return only the domain "example.org", as it is missing the postmaster alias.
This is the table layout:
mysql> show columns from virtual_aliases;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| domain_id | int(11) | NO | MUL | NULL | |
| source | varchar(100) | NO | | NULL | |
| destination | varchar(100) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
mysql> show columns from virtual_domains;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
I tried for many hours with IF, CASE, LIKE queries with no success. I don't need a final solution, maybe just a hint with some explanation. Thanks!
SELECT * FROM virtual_domains AS domains
LEFT JOIN virtual_aliases AS aliases
ON domains.id = aliases.domain_id
WHERE aliases.domain_id IS NULL
LEFT JOIN returns all records from the "left" table, even if they have no corresponding records in the "right" table. Those records will have the right table's fields set to NULL. Use WHERE to strip all the others.
I guess I didn't understand you correctly the first time. You have several entries in aliases for single domain, and you want to display only those domains that don't have an entry in aliases table that starts with "postmaster"?
In this case you should use NOT IN like this:
SELECT * FROM virtual_domains AS domains
WHERE domains.id NOT IN (
SELECT domain_id
FROM virtual_aliases
WHERE source LIKE "postmaster#%"
)
select id, name from virtual_domains
where id not in (select domain_id from virtual_aliases)
SELECT vd.* FROM virtual_domains vd
LEFT JOIN virtual_aliases va ON vd.id = va.domain_id
AND va.source LIKE 'postmaster#%'
WHERE va.id IS NULL;
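To compare the approaches on the sample data from the question, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL. It assumes the alias address lives in the source column, and it shows two common anti-join forms (NOT EXISTS and LEFT JOIN ... IS NULL) that select the domains lacking a postmaster alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE virtual_domains (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE virtual_aliases (
    id INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL,
    source TEXT NOT NULL,
    destination TEXT NOT NULL);
INSERT INTO virtual_domains (id, name) VALUES (1, 'foo.org'), (2, 'example.org');
INSERT INTO virtual_aliases (domain_id, source, destination) VALUES
    (1, 'info#foo.org', 'admin#foo.org'),
    (1, 'postmaster#foo.org', 'user1#foo.org'),
    (2, 'info#example.org', 'admin#example.org');
""")

# Form 1: correlated NOT EXISTS.
not_exists = conn.execute("""
SELECT vd.name FROM virtual_domains vd
WHERE NOT EXISTS (
    SELECT 1 FROM virtual_aliases va
    WHERE va.domain_id = vd.id AND va.source LIKE 'postmaster#%')
""").fetchall()

# Form 2: LEFT JOIN restricted to postmaster aliases, keeping unmatched domains.
left_join = conn.execute("""
SELECT vd.name FROM virtual_domains vd
LEFT JOIN virtual_aliases va
    ON va.domain_id = vd.id AND va.source LIKE 'postmaster#%'
WHERE va.id IS NULL
""").fetchall()
```

The LIKE filter has to sit in the ON clause of the LEFT JOIN; moving it into WHERE would discard the unmatched domains before the IS NULL test could find them.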