MySQL namespace in create table [duplicate] - mysql

Can I specify a namespace in the CREATE TABLE syntax, as in com.sumeet.QRTZ_JOB_DETAILS below?
It does not work with the following syntax:
CREATE TABLE "com"."sumeet"."QRTZ_JOB_DETAILS"(
SCHED_NAME VARCHAR(120) NOT NULL,
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
JOB_CLASS_NAME VARCHAR(250) NOT NULL,
IS_DURABLE VARCHAR(1) NOT NULL,
IS_NONCONCURRENT VARCHAR(1) NOT NULL,
IS_UPDATE_DATA VARCHAR(1) NOT NULL,
REQUESTS_RECOVERY VARCHAR(1) NOT NULL,
JOB_DATA BLOB NULL,
PRIMARY KEY (SCHED_NAME,JOB_NAME,JOB_GROUP));

There is no concept of a hierarchy in MySQL, i.e. no folders, containers, or namespaces for your databases.
MySQL has two concepts: a database (often referred to as a schema) and tables. You'd need to name your database "com.sumeet" (which is allowed, but can be considered messy*) and your table QRTZ_JOB_DETAILS. Various tools (like phpMyAdmin) look at similarities in the names and group databases into folders accordingly. For example, if you use underscores, as in "com_sumeet", phpMyAdmin will group such databases together.
*Note: Naming your database with periods is likely more trouble than it's worth, because every reference to the database then has to be quoted (i.e. `com.test`.table vs com_test.table). MySQL quotes identifiers with backticks by default; double quotes work as identifier quotes only when the ANSI_QUOTES SQL mode is enabled. The period is the separator between database and table names, so the quoting is required to tell MySQL which part is the database and which is the table in your queries.
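To make the quoting point concrete, here is a minimal Python sketch of how a client might build a qualified table name. The helper names quote_identifier and qualified_name are hypothetical and not part of any MySQL connector. The sketch assumes MySQL's default backtick quoting, where an embedded backtick is escaped by doubling it.

```python
def quote_identifier(name: str) -> str:
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(database: str, table: str) -> str:
    """Join a quoted database name and a quoted table name with the period separator."""
    return quote_identifier(database) + "." + quote_identifier(table)


# The period inside the database name is protected by the quotes,
# so only the unquoted period separates the database from the table.
print(qualified_name("com.sumeet", "QRTZ_JOB_DETAILS"))
print(qualified_name("com_sumeet", "QRTZ_JOB_DETAILS"))
```

With an underscore in the database name the quoting becomes optional, which is the practical argument for com_sumeet over com.sumeet.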


problems with the check in a sql table [duplicate]

I've created this table in SQL:
CREATE TABLE product (
code CHAR(7) NOT NULL,
name VARCHAR(30) NOT NULL,
Description VARCHAR(500) NOT NULL,
cost DOUBLE UNSIGNED NOT NULL,
PRIMARY KEY (code),
check(substring(code,1,3) like '%[a-z]%'
and substring(code,4,4) like '%[0-9]%'),
);
The value of 'code' must consist of 3 letters followed by 4 digits, but the check doesn't work. What's wrong with the check?
Use regular expressions:
check (code regexp '^[A-Z]{3}[0-9]{4}$')
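MySQL does not extend the definition of LIKE to include character classes, so '[a-z]' in a LIKE pattern is matched as literal text. MySQL has real regular expression support through REGEXP instead. Note also that MySQL enforces CHECK constraints only from version 8.0.16 onward; earlier versions parse them and then ignore them.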
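As a quick sanity check of the pattern itself, here is a small Python sketch that runs the same regular expression against a few made-up codes. It uses Python's re module, not MySQL's regex engine, so treat it as an illustration of the pattern only. One difference worth keeping in mind: Python matching is case-sensitive, while MySQL's REGEXP follows the column collation, so with a case-insensitive collation a lowercase code such as 'abc1234' would likely pass the CHECK as well.

```python
import re

# Same pattern as in the CHECK constraint: exactly 3 letters, then exactly 4 digits.
CODE_PATTERN = re.compile(r"^[A-Z]{3}[0-9]{4}$")


def is_valid_code(code: str) -> bool:
    """Return True if the whole string is 3 uppercase letters followed by 4 digits."""
    return CODE_PATTERN.fullmatch(code) is not None


# Hypothetical sample values, one valid and several invalid.
for candidate in ["ABC1234", "AB12345", "ABC123", "ABC12345", "abc1234"]:
    print(candidate, is_valid_code(candidate))
```

The anchors ^ and $ matter: without them the pattern would also accept longer strings that merely contain a valid code somewhere inside.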

MySQL: Expression of generated column contains disallowed function? CONCAT?

I have a table with a virtual generated column that concatenates five other columns (int and char) using CONCAT_WS(). This table contains 200-odd records and is never updated - it's just used as a lookup table. Recently, after months of untroubled processing, when I update records in a child table during which a SELECT is performed on this table, I sometimes see this error (ignore the "ITEM UPDATE FAILED" - that's me):
I am in development with many changes every day, so it is impossible for me to determine if there is a correlating change. I have recently added "created" and "lastmodified" datetime fields to several tables with CURRENT_TIMESTAMP for DEFAULT or ON UPDATE, but not to this table.
Here's the table:
{EDIT} --- adding table definition:
CREATE TABLE `cpct_fixedfield` (
`id` int(11) UNSIGNED NOT NULL,
`name` varchar(256) NOT NULL,
`label` varchar(50) NOT NULL,
`field` int(11) NOT NULL,
`start` int(11) NOT NULL,
`rectype` int(11) NOT NULL ,
`mediatype` char(1) NOT NULL DEFAULT '' ,
`length` int(11) NOT NULL,
`userdefined` tinyint(1) NOT NULL,
`defaultval` varchar(5) NOT NULL,
`helpcode` varchar(10) NOT NULL,
`mandatory` varchar(2) NOT NULL ,
`idx` varchar(20) GENERATED ALWAYS AS (concat_ws('.',`field`,`rectype`,`mediatype`,`start`,`length`)) VIRTUAL NOT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The length of the data in field never exceeds 11 chars. I can view the entire table in phpMyAdmin or MySQL Workbench and the virtual field materialises in all records without complaint, which suggests to me that there is nothing wrong with either the expression for the virtual column or the data in the columns that expression draws on.
The error occurs in several contexts when I am updating a child table. All the updates occur in Stored Procedures/Functions. One section of code that seems to trigger the error is this:
SET idxvar = CONCAT_WS(".", SUBSTRING(tmpfldkey,3,1), rectype, ptype, position, "%") COLLATE utf8mb4_general_ci;
SELECT id INTO ffid FROM cpct_fixedfield WHERE idx LIKE idxvar AND idx != "0.0..6.2";
All the variables involved are varchars or ints. utf8mb4_general_ci is used throughout the database.
I cannot find any reference in the MySQL documentation to CONCAT or CONCAT_WS being unsafe, and none of the columns referenced has a default using a non-deterministic function. All the other questions I can find in this forum and elsewhere about this error have arisen because of the use of non-deterministic functions like CURRENT_TIMESTAMP() in the virtual field, or a component of the field.
I replaced the SELECT on the table with a (large) CASE statement and all was well, and in fact, after I did this then reverted to the SELECT I had no errors for many hours. But it just happened again (so I'm back to the case statement).
I have run out of ideas - I'm hoping someone has some knowledge/experience that can help.
Thanks

How to efficiently update values without a primary key in MySQL?

I am currently facing an issue with designing a database table and updating/inserting values into it.
The table is used to collect and aggregate statistics that are identified by:
the source
the user
the statistic
an optional material (e.g. item type)
an optional entity (e.g. animal)
My main issue is, that my proposed primary key is too large because of VARCHARs that are used to identify a statistic.
My current table is created like this:
CREATE TABLE `Statistics` (
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL)
In particular, the server_id is configurable, the player_id is a UUID, statistic is the representation of an enumeration that may change, material and entity likewise. The value is then aggregated using SUM() to calculate the overall statistic.
So far it works but I have to use DELETE AND INSERT statements whenever I want to update a value, because I have no primary key and I can't figure out how to create such a primary key in the constraints of MySQL.
My main question is: How can I efficiently update values in this table and insert them when they are not currently present without resorting to deleting all the rows and inserting new ones?
The main issue seems to be the restriction MySQL puts on the primary key. I don't think adding an id column would solve this.
Simply add an auto-incremented id:
CREATE TABLE `Statistics` (
statistics_id int auto_increment primary key,
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL
);
Voila! A primary key. But you probably want an index. One that comes to mind:
create index idx_statistics_server_player_statistic on `Statistics`(server_id, player_id, statistic);
Depending on what your code looks like, you might want additional or different keys in the index, or more than one index.
Try the following; I hope it solves your problem:
- First add a field, let us say "detailed", of type money to your table.
- In your project, before each INSERT statement, get the next value with SELECT MAX(detailed)+1 AS maxid FROM TABLE_NAME and store it as the row's number. You can then use that number to fetch or delete the record.
- You can also use it for updates, but an update does not need the maximum of detailed.
I hope this is clear and that it helps.
I have dug a bit more through the internet and optimized my code a lot.
I asked this question because of bad performance, which I assumed was because of the DELETE and INSERT statements following each other.
I was thinking that I could try to reduce the load by doing INSERT IGNORE statements followed by UPDATE statements or INSERT .. ON DUPLICATE KEY UPDATE statements. But they require keys to be useful which I haven't had access to, because of constraints in MySQL.
I have fixed the performance issues though:
By reducing the number of statements generated asynchronously (I know JDBC is blocking, but it worked; it just blocked thousands of threads) and disabling auto-commit, I was able to improve the performance by 600 times (from 60 seconds down to 0.1 seconds).
The next steps are to improve the connection string and gain even more performance.
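The fix described above (fewer statements, no commit per row) can be sketched as follows. This is not the author's JDBC code: it is a Python illustration that uses an in-memory SQLite database as a stand-in for the MySQL server, with a cut-down table and made-up sample data. The point it shows is one batched call inside one transaction instead of a thousand individually committed statements.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistics ("
    " server_id TEXT NOT NULL, player_id BLOB NOT NULL,"
    " statistic TEXT NOT NULL, value INTEGER NOT NULL)"
)

# Hypothetical sample data: 1000 increments of one statistic for one player.
player = bytes(16)
rows = [("srv-1", player, "blocks_mined", 1) for _ in range(1000)]

# One batched call inside one transaction: `with conn` commits once at the end
# (or rolls back on an exception) instead of committing after every row.
with conn:
    conn.executemany(
        "INSERT INTO statistics (server_id, player_id, statistic, value)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )

# The overall statistic is still computed with SUM(), as in the question.
total = conn.execute(
    "SELECT SUM(value) FROM statistics WHERE server_id = ? AND statistic = ?",
    ("srv-1", "blocks_mined"),
).fetchone()[0]
print(total)
```

In JDBC the rough equivalents would be setAutoCommit(false), addBatch()/executeBatch() on a PreparedStatement, and a single commit(); how much that gains depends on the driver settings and the server.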

MySql - Handle table size and performance

We have an analytics product. We give each customer a piece of JavaScript code, which they put on their website. When a user visits a customer's site, the JavaScript code hits our server so that we can store the page visit on behalf of that customer. Each customer has a unique domain name.
We store these page visits in a MySQL table.
The table schema is as follows:
CREATE TABLE `page_visits` (
`domain` varchar(50) DEFAULT NULL,
`guid` varchar(100) DEFAULT NULL,
`sid` varchar(100) DEFAULT NULL,
`url` varchar(2500) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
`is_new` varchar(20) DEFAULT NULL,
`ref` varchar(2500) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`stats_time` datetime DEFAULT NULL,
`country` varchar(50) DEFAULT NULL,
`region` varchar(50) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`city_lat_long` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT NULL,
KEY `sid_index` (`sid`) USING BTREE,
KEY `domain_index` (`domain`),
KEY `email_index` (`email`),
KEY `stats_time_index` (`stats_time`),
KEY `domain_statstime` (`domain`,`stats_time`),
KEY `domain_email` (`domain`,`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
We don't have a primary key for this table.
MySQL server details
It is Google Cloud MySQL (version 5.6) with a storage capacity of 10 TB.
We currently have 350 million rows in the table, and the table size is 300 GB. We store all of our customers' data in the same table even though there is no relation between one customer and another.
Problem 1: A few of our customers have a huge number of rows in the table, so queries for these customers are very slow.
Example Query 1:
SELECT count(DISTINCT sid) AS count,count(sid) AS total FROM page_visits WHERE domain = 'aaa' AND stats_time BETWEEN CONVERT_TZ('2015-02-05 00:00:00','+05:30','+00:00') AND CONVERT_TZ('2016-01-01 23:59:59','+05:30','+00:00');
+---------+---------+
| count | total |
+---------+---------+
| 1056546 | 2713729 |
+---------+---------+
1 row in set (13 min 19.71 sec)
I will add more queries here. We need results within 5-10 seconds; is that possible?
Problem 2: The table size is increasing rapidly. We might hit a table size of 5 TB by the end of this year, so we want to shard our table. We want to keep all records related to one customer on one machine. What are the best practices for this kind of sharding?
We are considering the following approaches to the above issues; please suggest best practices for overcoming them.
Create a separate table for each customer
1) What are the advantages and disadvantages of creating a separate table for each customer? We currently have 30k customers and might hit 100k by the end of this year, which means 100k tables in the DB. We access all tables simultaneously for reads and writes.
2) Keep the same table and create partitions based on date range.
UPDATE : Is a "customer" determined by the domain? Answer is Yes
Thanks
First, a critique of the excessively large datatypes:
`domain` varchar(50) DEFAULT NULL, -- normalize to MEDIUMINT UNSIGNED (3 bytes)
`guid` varchar(100) DEFAULT NULL, -- what is this for?
`sid` varchar(100) DEFAULT NULL, -- varchar?
`url` varchar(2500) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL, -- too big for IPv4, too small for IPv6; see below
`is_new` varchar(20) DEFAULT NULL, -- flag? Consider `TINYINT` or `ENUM`
`ref` varchar(2500) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL, -- normalize! (add new rows as new agents are created)
`stats_time` datetime DEFAULT NULL,
`country` varchar(50) DEFAULT NULL, -- use standard 2-letter code (see below)
`region` varchar(50) DEFAULT NULL, -- see below
`city` varchar(50) DEFAULT NULL, -- see below
`city_lat_long` varchar(50) DEFAULT NULL, -- unusable in current format; toss?
`email` varchar(100) DEFAULT NULL,
For IP addresses, use inet6_aton(), then store in BINARY(16).
For country, use CHAR(2) CHARACTER SET ascii -- only 2 bytes.
country + region + city + (maybe) latlng -- normalize this to a "location".
All these changes may cut the disk footprint in half. Smaller --> more cacheable --> less I/O --> faster.
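To illustrate the IP address remark, here is a small Python sketch showing how much smaller the binary form is than the text form. It uses Python's ipaddress module rather than MySQL; as far as I know, INET6_ATON() produces the same kind of packed value, 4 bytes for an IPv4 address and 16 bytes for an IPv6 address. The addresses are documentation examples, not real data.

```python
import ipaddress


def packed_ip(text: str) -> bytes:
    """Return the network-byte-order binary form of an IPv4 or IPv6 address."""
    return ipaddress.ip_address(text).packed


v4 = packed_ip("192.0.2.1")
v6 = packed_ip("2001:db8::1")

# Length in bytes and hex dump of each packed address.
print(len(v4), v4.hex())
print(len(v6), v6.hex())

# A fully written-out IPv6 address needs 39 characters as text,
# which is why varchar(20) is too small for it.
print(len("2001:0db8:0000:0000:0000:0000:0000:0001"))
```

A 16-byte binary column therefore holds either address family, while the text form needs up to 39 characters for IPv6 alone.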
Other issues...
To greatly speed up your sid counter, change
KEY `domain_statstime` (`domain`,`stats_time`),
to
KEY dss (domain_id,`stats_time`, sid),
That will be a "covering index", hence won't have to bounce between the index and the data 2713729 times -- the bouncing is what cost 13 minutes. (domain_id is discussed below.)
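For readers who want to see what "covering" means in practice, here is a rough Python sketch that uses an in-memory SQLite database as a stand-in, since it needs no server. SQLite's planner is not MySQL's, and the table is cut down to the relevant columns, so this only illustrates the idea: when every column the query touches is in the index, the plan reports a covering index and the table rows are never visited. In MySQL the comparable sign would be "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

# SQLite stand-in (assumption); only the columns relevant to the query are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_visits ("
    " domain_id INTEGER, stats_time TEXT, sid TEXT, url TEXT)"
)
# Same column order as the suggested KEY dss (domain_id, stats_time, sid).
conn.execute("CREATE INDEX dss ON page_visits (domain_id, stats_time, sid)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(DISTINCT sid), COUNT(sid) FROM page_visits "
    "WHERE domain_id = 1 "
    "AND stats_time BETWEEN '2015-02-04 18:30:00' AND '2016-01-01 18:29:59'"
).fetchall()

# The human-readable plan step is the last field of each row.
plan_text = " | ".join(row[3] for row in plan)
print(plan_text)
```

If url were added to the SELECT list, the index would no longer cover the query and each matching index entry would require a lookup into the table, which is the bouncing described above.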
This is redundant with the above index, DROP it:
KEY domain_index (domain)
Is a "customer" determined by the domain?
Every InnoDB table must have a PRIMARY KEY. There are 3 ways to get a PK; you picked the 'worst' one -- a hidden 6-byte integer fabricated by the engine. I assume there is no 'natural' PK available from some combination of columns? Then, an explicit BIGINT UNSIGNED is called for. (Yes that would be 8 bytes, but various forms of maintenance need an explicit PK.)
If most queries include WHERE domain = '...', then I recommend the following. (And this will greatly improve all such queries.)
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
domain_id MEDIUMINT UNSIGNED NOT NULL, -- normalized to `Domains`
PRIMARY KEY(domain_id, id), -- clustering on customer gives you the speedup
INDEX(id) -- this keeps AUTO_INCREMENT happy
Recommend you look into pt-online-schema-change for making all these changes. However, I don't know if it can work without an explicit PRIMARY KEY.
"Separate table for each customer"? No. This is a common question; the resounding answer is No. I won't repeat all the reasons for not having 100K tables.
Sharding
"Sharding" is splitting the data across multiple machines.
To do sharding, you need to have code somewhere that looks at domain and decides which server will handle the query, then hands it off. Sharding is advisable when you have write scaling problems. You did not mention such, so it is unclear whether sharding is advisable.
When sharding on something like domain (or domain_id), you could use (1) a hash to pick the server, (2) a dictionary lookup (of 100K rows), or (3) a hybrid.
I like the hybrid -- hash to, say, 1024 values, then look up into a 1024-row table to see which machine has the data. Since adding a new shard and migrating a user to a different shard are major undertakings, I feel that the hybrid is a reasonable compromise. The lookup table needs to be distributed to all clients that redirect actions to shards.
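Here is a minimal Python sketch of the hybrid approach, under some assumptions of mine: CRC32 as the hash, four made-up shard host names, and an in-process dict standing in for the 1024-row lookup table. A real setup would keep that table somewhere all clients can read it.

```python
import zlib

NUM_BUCKETS = 1024

# Hypothetical shard hosts; the bucket table starts out spread round-robin.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]
bucket_to_shard = {b: SHARDS[b % len(SHARDS)] for b in range(NUM_BUCKETS)}


def bucket_for(domain: str) -> int:
    """Hash step: map a domain to one of NUM_BUCKETS stable buckets."""
    return zlib.crc32(domain.encode("utf-8")) % NUM_BUCKETS


def shard_for(domain: str) -> str:
    """Lookup step: find which machine currently owns the domain's bucket."""
    return bucket_to_shard[bucket_for(domain)]


domain = "example.com"
before = shard_for(domain)

# Migrating one bucket to a new machine is a single-row change in the lookup
# table; domains in all other buckets stay where they are.
bucket_to_shard[bucket_for(domain)] = "db-shard-4"
after = shard_for(domain)
print(bucket_for(domain), before, after)
```

The hash keeps the lookup table small and fixed in size, while the table keeps the freedom to move a busy bucket, which a pure hash-to-server scheme would not allow without rehashing everything.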
If your 'writing' is running out of steam, see high speed ingestion for possible ways to speed that up.
PARTITIONing
PARTITIONing is splitting the data across multiple "sub-tables".
There are only a limited number of use cases where partitioning buys you any performance. You have not indicated that any apply to your use case. Read that blog and see if you think that partitioning might be useful.
You mentioned "partition by date range". Will most of the queries include a date range? If so, such partitioning may be advisable. (See the link above for best practices.) Some other options come to mind:
Plan A: PRIMARY KEY(domain_id, stats_time, id) But that is bulky and requires even more overhead on each secondary index. (Each secondary index silently includes all the columns of the PK.)
Plan B: Have stats_time include microseconds, then tweak the values to avoid having dups. Then use stats_time instead of id. But this requires some added complexity, especially if there are multiple clients inserting data. (I can elaborate if needed.)
Plan C: Have a table that maps stats_time values to ids. Look up the id range before doing the real query, then use both WHERE id BETWEEN ... AND stats_time .... (Again, messy code.)
Summary tables
Are many of the queries of the form of counting things over date ranges? Suggest having Summary Tables based perhaps on per-hour. More discussion.
COUNT(DISTINCT sid) is especially difficult to fold into summary tables. For example, the unique counts for each hour cannot be added together to get the unique count for the day. But I have a technique for that, too.
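A tiny Python example with made-up session ids shows why the hourly unique counts cannot simply be added. It only demonstrates the problem; it is not the technique hinted at above.

```python
# Hypothetical session ids seen in two consecutive hours for one domain.
hour_10 = ["s1", "s2", "s3", "s1"]
hour_11 = ["s2", "s3", "s4"]

distinct_10 = len(set(hour_10))            # 3
distinct_11 = len(set(hour_11))            # 3
sum_of_hourly = distinct_10 + distinct_11  # 6, counts s2 and s3 twice

# The real unique count needs the ids themselves, not the hourly totals.
distinct_both = len(set(hour_10) | set(hour_11))  # 4

# Plain row counts, by contrast, do add up across hours.
total_rows = len(hour_10) + len(hour_11)  # 7
print(sum_of_hourly, distinct_both, total_rows)
```

So a per-hour summary table can store COUNT(sid) and sum it later, but for COUNT(DISTINCT sid) it has to keep something richer than a single number per hour.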
I wouldn't do this if I were you. The first thing that comes to mind: on receiving a page-view message, I would send it to a queue so that a worker can pick it up and insert it into the database later (perhaps in bulk). I would also increment a counter keyed by siteid:date in Redis (for example). Doing the count in SQL is just a bad idea for this scenario.

Polymorphic Associations Pattern or AntiPattern or Both?

Multiple questions on this site and others relate to using a MySQL table definition where the name of the table is a column name.
For instance, for "notes" in a DB I am thinking of using the structure:
CREATE TABLE IF NOT EXISTS `Notes` (
`id` int(10) NOT NULL,
`table` varchar(30) NOT NULL,
`row_id` int(10) NOT NULL,
`note` varchar(500) NOT NULL,
`user_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I keep reading all over the place that this is poor database design. I have figured out that this is called polymorphic association. Polymorphic association is specifically listed as a SQL Anti-Pattern. (or in slides)
I have seen the drawbacks of the antipattern, but I have no requirement for doing any of those types of queries that I can think of.
For my app, I want to be able to write notes on just about every other row in the database. For potentially hundreds of other rows.
It is confusing that while this is listed as an AntiPattern, it seems to be a fundamental part of the Ruby ActiveRecord concept. Is the ActiveRecord layer doing magic that makes this OK (i.e. it's a polymorphic association at the record level, but not at the DB level)?
Specifically, I would like to understand when, if ever, this SQL design is safe to use.
-FT