I am trying to index JSON arrays whose contents are variable-length strings, and I can't figure out if it's possible, let alone scalable.
A very similar question about indexing JSON data using the new multi value index is here: Indexing JSON column in MySQL 8
The syntax from that question executes, but using CHAR isn't right for me and ends in an error anyway. After changing names and adjusting the CHAR length for my data:
ALTER TABLE catalog ADD INDEX idx_30144( (CAST( j_data->>'$."30144"' AS char(250) ARRAY)) );
I get this error
1034 - Incorrect key file for table 'catalog'; try to repair it
Trying this:
ALTER TABLE catalog ADD INDEX idx_30144( (CAST( j_data->>'$."30144"' AS varchar(250) ARRAY)) );
Gives this error:
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'varchar(250) ARRAY)) )' at line 1
This is an InnoDB table so obviously the 1034 error isn't accurate. It completes in around 2 seconds so while it could be running out of space, it happens too fast to see that, and there's 350 GB free on the drive.
I have over 200 JSON nodes like this that I would ideally like to index. If this is a huge storage suck I can be happy with a subset of them, but I need to know if it's possible in the first place.
One way to index such values is to add a generated column and index that column, like this:
CREATE TABLE jempn (
id BIGINT(20) NOT NULL AUTO_INCREMENT PRIMARY KEY,
j_data JSON DEFAULT NULL,
g varchar(250) GENERATED ALWAYS AS (j_data->'$."30144"' ) STORED,
INDEX idx_30144 (g)
) ENGINE=INNODB;
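On MySQL 8.0.17 or later, a multi-valued index can also be declared directly on the JSON array, without a generated column. The sketch below reuses the catalog table and j_data column from the question; the searched value 'some value' is a placeholder. It differs from the failing statements in two ways: CAST does not accept VARCHAR as a target type (hence the 1064 error), so CHAR(n) ARRAY is used, and the path is extracted with -> rather than ->>, as in the manual's multi-valued index examples. Whether this avoids the 1034 error on your data is an assumption to verify, not a guarantee.

```sql
-- Multi-valued index over the string elements of the JSON array at $."30144".
-- Requires MySQL 8.0.17+. The array is passed to CAST as JSON (->),
-- not as unquoted text (->>).
ALTER TABLE catalog
  ADD INDEX idx_30144 ( (CAST(j_data->'$."30144"' AS CHAR(250) ARRAY)) );

-- Multi-valued indexes are considered for MEMBER OF(), JSON_CONTAINS()
-- and JSON_OVERLAPS() predicates in the WHERE clause.
SELECT *
FROM catalog
WHERE 'some value' MEMBER OF(j_data->'$."30144"');
```

Running EXPLAIN on the SELECT should show whether idx_30144 is actually chosen.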
Is it possible to create an index in MySQL for the last digit of an int column?
Based on this answer, I have created partitions based on the last digit of an int column:
CREATE TABLE partition_test(
textfiled INT,
cltext TEXT,
reindexedAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
indexedAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
status TINYINT(2),
postId INT)
PARTITION BY HASH(MOD(postId, 10))
PARTITIONS 10;
I'm trying to create an index on the last digit of postId to optimize query time. Is there any way to do this, or is a simple index on postId enough?
Some failed tries:
CREATE INDEX postLastDigit USING HASH ON partition_test (MOD(postId, 10));
(1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'MOD(postId, 10))' at line 1")
and
CREATE INDEX postLastDigit ON partition_test (MOD(postId, 10));
(1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'MOD(postId, 10))' at line 1")
UPDATE:
The table has more than 100M rows.
My goal is to optimize queries like:
1)
SELECT cltext FROM partition_test
WHERE postId IN (<INT>, <INT>)
AND status IS NOT NULL
2)
SELECT cltext FROM partition_test
WHERE postId IN (<INT>, <INT>)
AND status IS NOT NULL
AND reindexedAt BETWEEN <DATE> AND <DATE>
MariaDB version: 10.1.23-MariaDB-9+deb9u1
What query are you trying to speed up? Without any indexes on the table, any query will have to scan the entire table! If you want speed, first look to indexing.
If your query is SELECT ... WHERE post_id = 123, your Partitioning might make it run about 10 times as fast. But INDEX(post_id), with or without partitioning, will make it run hundreds of times as fast.
Please provide the SELECTs so we can help you speed them up.
(OK, if you are just playing around with partitioning, the others have given you viable answers.)
"Partition Pruning" is rarely faster than a suitable index that starts with the pruning column.
After you solve your stated hashing problem, please report back whether the queries are any faster than with an index. Even pitted against an index, I predict partitioning will not run faster, and may even run a little slower.
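As a sketch of the kind of index meant here, for the two queries in the question: a composite index that starts with postId and continues with reindexedAt lets both queries locate rows by postId, and lets the second one narrow by date range within each postId. The index name is made up, and the literal values are placeholders.

```sql
-- Composite index: the equality/IN column first, the range column second.
ALTER TABLE partition_test
  ADD INDEX idx_postId_reindexedAt (postId, reindexedAt);

-- Query 1: can use the postId prefix of the index.
SELECT cltext
FROM partition_test
WHERE postId IN (101, 202)
  AND status IS NOT NULL;

-- Query 2: can use both index columns.
SELECT cltext
FROM partition_test
WHERE postId IN (101, 202)
  AND status IS NOT NULL
  AND reindexedAt BETWEEN '2018-01-01 00:00:00' AND '2018-01-31 23:59:59';
```

Prefixing either SELECT with EXPLAIN shows whether the optimizer picks the index on your data.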
You have tagged your question with mariadb and mysql. If you are using a reasonably recent version of MariaDB, you can use generated columns for indexing. If you are using MySQL, you can do the same if your MySQL version is at least 5.7.
If you are using a lower version of MySQL, you could create an additional column in your table where you store the last digit of postId for each row, and use that column for indexing / partitioning.
This would mean minimal changes to your application code: Before inserting or updating, get the last digit of postId first, and insert / update one more field. As an alternative, you eventually could use triggers to automatically fill that additional column.
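A sketch of that fallback, assuming the partition_test table from the question; the column, index and trigger names are hypothetical. The extra column is filled once for existing rows and then kept current by two triggers, so the application code does not have to change.

```sql
-- Plain column holding the last digit, plus an index on it.
ALTER TABLE partition_test
  ADD COLUMN postLastDigit TINYINT,
  ADD INDEX idx_postLastDigit (postLastDigit);

-- One-off backfill for rows that already exist (slow on a large table).
UPDATE partition_test SET postLastDigit = MOD(postId, 10);

-- Keep the column in sync on every insert and update.
CREATE TRIGGER partition_test_bi BEFORE INSERT ON partition_test
FOR EACH ROW SET NEW.postLastDigit = MOD(NEW.postId, 10);

CREATE TRIGGER partition_test_bu BEFORE UPDATE ON partition_test
FOR EACH ROW SET NEW.postLastDigit = MOD(NEW.postId, 10);
```

This assumes the table has no existing BEFORE INSERT or BEFORE UPDATE trigger, since older server versions allow only one trigger per event and timing.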
Use virtual columns. In MariaDB 10.2, you can create an index on a virtual (aka generated) column, like this:
CREATE TABLE t (
num int,
last_digit int(1) AS (num % 10) VIRTUAL,
KEY index_last_digit (last_digit)
)
Then you can use last_digit in your queries, e.g. SELECT ... WHERE last_digit=1
In older versions of MariaDB, 5.2 to 10.1 , you'd need to specify PERSISTENT attribute rather than VIRTUAL, because non-persistent generated columns could not be indexed.
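For the MariaDB 10.1 server mentioned in the question, the same table might be written as follows. This is a sketch of the PERSISTENT variant described above, reusing the names from the 10.2 example.

```sql
-- PERSISTENT stores the computed value in the row, which is what allows
-- the column to be indexed on MariaDB 5.2 through 10.1.
CREATE TABLE t (
  num INT,
  last_digit TINYINT AS (num % 10) PERSISTENT,
  KEY index_last_digit (last_digit)
);

SELECT num FROM t WHERE last_digit = 1;
```

Note that num % 10 is negative for negative values of num, so the column only holds a true "last digit" when num is non-negative.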
I have searched everywhere I could, truly, but I can't find the correct way to add new columns only after checking that the column doesn't exist. I am writing a program in C.
Here is what I am doing, and I can't find my syntax mistake. I will be very grateful for your help! I get the error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use
//create buffer to store the query
char buff[1024];
//store query in the buffer
snprintf(buff, sizeof buff, "IF NOT EXISTS(SELECT * FROM information_schema.COLUMNS WHERE COLUMN_NAME = '%i' AND TABLE_NAME = '%s' AND TABLE_SCHEMA = '%s') THEN ALTER TABLE `%s`.`%s` ADD COLUMN `%i` INT; END IF;", value1, table, database, database, table, value1);
EDIT
I am editing the post to show what I am trying to achieve.
Using nested if statements in the main function, I have created the database and the table, and have populated the table with column names. My code is designed so that all functions are interrelated: only if the connection is established does the program call the "create database" function; only if the database is created does it call the "create table" function; and only if the table is created, initially with just two columns (id and Names), does it call the function that alters the table to add the other columns.
I do so because I need a for loop to loop those additional column names, which were created previously by my previous C program.
So the table should look like this:
id name 1988 1977 1966 1955
1 name1 value value value value
2 name2 value value value value
3 name3 value value value value
Each time the program is called, each function checks first: if the database exists, it is not created from scratch; if the table exists, it is not created. Now I am stuck on how to check whether the columns exist, because if they do, I get an error and the program can't move on.
To add a column you can do it like this
snprintf(
buff,
sizeof buff,
/* only identifiers are backtick-quoted; the column type must not be */
"ALTER TABLE `%s`.`%s` ADD COLUMN IF NOT EXISTS `%s` %s",
database,
table,
column_name,
column_type
);
Note that in your format string there is a %i that doesn't look right.
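Note that ADD COLUMN IF NOT EXISTS is a MariaDB extension; MySQL itself does not accept it. The IF ... THEN ... END IF attempt in the question fails for a different reason: that construct is only valid inside stored programs, not as a standalone statement. On MySQL, one possible workaround is to look the column up in information_schema and build the ALTER TABLE conditionally with a prepared statement. The sketch below uses placeholder names (mydb, mytable, column 1988); from C, each statement would be sent as its own query on the same connection, unless multi-statement support is enabled.

```sql
-- 0 means the column does not exist yet.
SET @col_exists = (
  SELECT COUNT(*)
  FROM information_schema.COLUMNS
  WHERE TABLE_SCHEMA = 'mydb'
    AND TABLE_NAME   = 'mytable'
    AND COLUMN_NAME  = '1988'
);

-- Pick the statement to run: the ALTER if the column is missing,
-- otherwise a no-op that returns no result set.
SET @ddl = IF(@col_exists = 0,
              'ALTER TABLE `mydb`.`mytable` ADD COLUMN `1988` INT',
              'DO 0');

PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

User variables such as @col_exists are scoped to the session, so the statements must all run on the same connection.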
After giving you the answer, because this is what you asked for, I want to say that adding a column in code like that looks like a bad sign. SQL databases are pretty static in their structure, you should never need to add or remove columns from it. If you have to, then there is a problem either in the database design or the way you are handling it.
According to the comments below you need something like this
CREATE TABLE `names` (
`name` VARCHAR(128) PRIMARY KEY
) ENGINE = InnoDB;
CREATE TABLE `entries` (
`name` VARCHAR(128) NOT NULL,
`year` INTEGER NOT NULL,
-- Or the required type (FLOAT perhaps?)
`value` INTEGER NOT NULL,
-- All names MUST come from the `names` table
CONSTRAINT `name_fk` FOREIGN KEY (`name`) REFERENCES `names` (`name`),
-- Allow only one entry per `name`/`year`
CONSTRAINT `entry_pk` PRIMARY KEY (`name`, `year`)
) ENGINE = InnoDB;
Then you can insert each name into the names table and one entry per year into the entries table. You can have whatever combinations you want, and you can query all years for a given name:
SELECT * FROM `entries` WHERE `name` = ?
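To make that concrete, here is a small sketch against the two tables above. The names and values are invented sample data; the last query uses conditional aggregation to rebuild the one-column-per-year layout from the question at read time, without altering the schema.

```sql
INSERT INTO `names` (`name`) VALUES ('name1'), ('name2');

INSERT INTO `entries` (`name`, `year`, `value`) VALUES
  ('name1', 1988, 10),
  ('name1', 1977, 20),
  ('name2', 1988, 30);

-- One row per name, one column per year; a missing year yields NULL.
SELECT `name`,
       MAX(CASE WHEN `year` = 1988 THEN `value` END) AS `1988`,
       MAX(CASE WHEN `year` = 1977 THEN `value` END) AS `1977`
FROM `entries`
GROUP BY `name`;
```

Adding a new year then means inserting rows, not adding a column.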
Creating a database schema dynamically is wrong; it goes against the whole idea of a schema. A database has a schema so you can write queries and rely on them working. The language is called Structured Query Language for a reason.
I created Unique Compound Index:
Alter Table TableA Add Unique Index `UniqueRecord` (A,B,C,D)
The issue is that sometimes C can be NULL.
I noticed that
`Insert IGNORE`
Was still in some cases adding duplicate records and this turned out to be when those incoming records had C as NULL.
I tested the hypothesis that this was an issue by doing:
Select concat(A,B,C,D) as Index from TableA where C is NULL
And Index in each of those cases was in fact NULL. Once I remove the null field from the select:
Select concat(A,B,D) as Index from TableA where C is NULL
I get the expected string values vs nulls.
So the question is: other than doing an update like set C='' where C is NULL, is there some way to set up the Index so that it works? I am loath to simply make the Index A,B,D, as that might introduce unwanted dupes when C is in fact not NULL.
Update:
I did try using IfNull in the Index creation but Mysql did not like that:
Alter Table TableA Add Unique Index UniqueLocator (A,B,IfNull(C,''),D)
Mysql said:
[Err] 1064 - You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version
for the right syntax to use near 'C,''),D)' at line 1
Yes, MySQL allows NULLs in unique indexes, which is the right thing to do. But you can define column C as NOT NULL if you don't like that.
MySQL -- but not all databases -- allows duplicate NULL values in unique indexes. I believe the ANSI standard is rather ambiguous on this point (or perhaps even contradictory). You basically have two choices.
The first is to define a default value for the column. This may not be appealing in terms of code, but it will at least generate an error on duplicate insert. For instance, if "C" is a foreign key reference to an auto-incremented id, then you might use -1 or 0 as the default value. If it is a date, you might use the zero date.
The other solution is a trigger, where you manually check for the duplicate values before doing an insert (or update).
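On MySQL 5.7 or later, a variant of the default-value idea is possible without changing the stored data. Treat it as a sketch that goes beyond the two choices above: a stored generated column maps NULL to a sentinel, and the unique index is built on that column instead of on C. The column name C_key and the assumption that C is a VARCHAR(255) are hypothetical.

```sql
-- Assumes C is VARCHAR(255); adjust the type to match the real column.
-- NULL and '' both become '', so they count as the same value for uniqueness.
ALTER TABLE TableA
  ADD COLUMN C_key VARCHAR(255) GENERATED ALWAYS AS (IFNULL(C, '')) STORED;

ALTER TABLE TableA
  ADD UNIQUE INDEX UniqueLocator (A, B, C_key, D);

-- Alternative on MySQL 8.0.13+, which accepts expressions as key parts
-- when they are wrapped in their own parentheses (use one or the other):
-- ALTER TABLE TableA
--   ADD UNIQUE INDEX UniqueLocator (A, B, (IFNULL(C, '')), D);
```

Either statement fails if the table already contains rows that are duplicates under the new rule, so existing dupes have to be cleaned up first.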
Using JDBC 3 driver, one can insert a record into a table and immediately get autogenerated value for a column. This technique is used in ActiveJDBC.
Here is the table definition:
CREATE TABLE users (id int(11) NOT NULL auto_increment PRIMARY KEY, first_name VARCHAR(56), last_name VARCHAR(56), email VARCHAR(56)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is working fine on H2 and PostgreSQL, and the type of the returned value is Integer.
However, in MySQL the type is Long, while I believe it should be Integer.
When querying this same row in MySQL, the "id" comes back as Integer.
Does anyone know why getGeneratedKeys() returns java.lang.Long in MySQL, and how to fix it?
The why:
The generator that MySQL uses for keeping track of the value is BIGINT, so the driver describes it as BIGINT, and that is equivalent to Long. See LAST_INSERT_ID in the MySQL manual.
Drivers like PostgreSQL's return the actual column of the table (actually, PostgreSQL returns all columns when using getGeneratedKeys()); I assume that MySQL simply calls LAST_INSERT_ID().
How to solve it:
As indicated by Jim Garrison in the comments: Always use getInt(), or getLong(), and not getObject().
I have a query from SQL Server which I want to run in MySQL, but I cannot find any replacement for the uniqueidentifier keyword when converting the SQL Server script to a MySQL script.
Here is the query
CREATE TABLE foo(
myid uniqueidentifier NOT NULL,
barid uniqueidentifier NOT NULL
)
What would be the equivalent query in MySQL for the SQL Server script above?
CREATE TABLE FOO (
myid CHAR(38) NOT NULL,
barid CHAR(38) NOT NULL
);
According to the MS website, GUIDs are 38 characters long.
The accepted answer, although not exactly wrong, is somewhat incomplete. There certainly are more space efficient ways to store GUID/UUIDs. Please have a look at this question: "Storing MySQL GUID/UUIDs"
This is the best way I could come up with to convert a MySQL GUID/UUID generated by UUID() to a binary(16):
UNHEX(REPLACE(UUID(),'-',''))
And then storing it in a BINARY(16)
If storage space of the GUID/UUID is a primary concern this method will deliver significant savings.
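Applied to the foo table from the question, the BINARY(16) approach might look like the sketch below. The SELECT rebuilds the dashed text form from the stored bytes; the column alias myid_text is made up for the example.

```sql
CREATE TABLE foo (
  myid  BINARY(16) NOT NULL,
  barid BINARY(16) NOT NULL
);

-- Strip the dashes from the 36-character UUID text and store the 16 raw bytes.
INSERT INTO foo (myid, barid)
VALUES (UNHEX(REPLACE(UUID(), '-', '')),
        UNHEX(REPLACE(UUID(), '-', '')));

-- Convert back to the usual 8-4-4-4-12 text form for display.
SELECT LOWER(CONCAT_WS('-',
         SUBSTR(HEX(myid),  1,  8),
         SUBSTR(HEX(myid),  9,  4),
         SUBSTR(HEX(myid), 13,  4),
         SUBSTR(HEX(myid), 17,  4),
         SUBSTR(HEX(myid), 21, 12))) AS myid_text
FROM foo;
```

MySQL 8.0 also ships UUID_TO_BIN() and BIN_TO_UUID(), which perform the same conversions without the string manipulation.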
According to the MySQL website, you should map it to VARCHAR(64):
UNIQUEIDENTIFIER, VARCHAR(64)
http://dev.mysql.com/doc/workbench/en/wb-migration-database-mssql-typemapping.html
Remember also that a 16 byte value is represented in hex as 32 bytes. With the 4 dashes and the 2 curly braces, that gets us the 38 bytes in this format compatible with SQL Server with a 38 byte string. For example: {2DCBF868-56D7-4BED-B0F8-84555B4AD691}.