Can MySQL JSON columns be indexed in version 8.0.19? - mysql

According to this webpage, MySQL JSON columns cannot be indexed.
MySQL Server Blog
"JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column."
Can someone please tell me if this has changed in the latest MySQL community version 8.0.19?
What will give me the best performance? An index on a generated column, or a duplicate column (a non-JSON column with the exact same text as the JSON column) with a normal fulltext search?

This is still the case, from the documentation:
JSON columns, like columns of other binary types, are not indexed
directly; instead, you can create an index on a generated column that
extracts a scalar value from the JSON column. See Indexing a Generated
Column to Provide a JSON Column Index, for a detailed example.
and here also:
Indexing a Generated Column to Provide a JSON Column Index
As noted elsewhere, JSON columns cannot be indexed directly.
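For illustration, a minimal sketch of that documented workaround (the table name and JSON path here are hypothetical):

CREATE TABLE orders (
  id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  doc  JSON,
  -- Generated column extracting a scalar value from the JSON document
  customer_name VARCHAR(100)
      GENERATED ALWAYS AS (doc->>'$.customer.name') VIRTUAL,
  INDEX idx_customer_name (customer_name)
);

-- This lookup can use idx_customer_name:
SELECT id FROM orders WHERE customer_name = 'Alice';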

JSON is a handy way to store somewhat arbitrary information in a database table. If you will regularly use components of the JSON in RDBMS operations, such as indexing, then you should pull them out of the JSON (or copy them out).

Related

MySQL JSON column speed

I'm working on a custom CRM for a client who wants "custom fields" which need to be completely dynamic on a per-user basis. I considered EAV because I cannot simply add these columns to the table. However, after doing some digging, I have settled on using a JSON column to store the custom field values as key-value pairs.
My concern is the speed of lookups on data in this JSON column. How can I ensure that queries which search or filter (WHERE) on values in this JSON column remain fast with large numbers of rows in the table?
Or is a JSON column not the ideal solution for this?
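As a rough sketch of how that can stay fast (the table, column, and key names below are made up): a WHERE on the raw JSON value scans every row, but a frequently searched key can be materialized as an indexed generated column.

-- Full scan: the JSON column itself has no index
SELECT id FROM contacts
WHERE custom_fields->>'$.favorite_color' = 'blue';

-- Materialize the key as a virtual generated column and index it
ALTER TABLE contacts
  ADD COLUMN favorite_color VARCHAR(64)
      GENERATED ALWAYS AS (custom_fields->>'$.favorite_color') VIRTUAL,
  ADD INDEX idx_favorite_color (favorite_color);

-- Indexed lookup
SELECT id FROM contacts WHERE favorite_color = 'blue';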

json field type vs. one field for each key

I'm working on a website which has a database table with more than 100 fields.
The problem is that when the number of records gets large (more than 10,000 or so), the response becomes very slow and sometimes nothing is returned at all.
Now I want to optimize this table.
My question is: can I use the JSON type for some fields to reduce the number of columns?
My limitation is that I want to be able to search, change, and maybe remove specific data stored in the JSON.
PS: I read the question Storing JSON in database vs. having a new column for each key, but that was asked in 2013, and as we know the JSON field type was added in MySQL 5.7.
Thanks for any guidance.
First of all, having a table with 100 columns suggests you should rethink your architecture before proceeding. Otherwise it will only become more and more painful in later stages.
Maybe you are storing data as separate columns that could instead be broken down and stored as separate rows.
I suspect the SQL query you are writing is something like SELECT * ..., which fetches more columns than you actually require. Specify only the columns you need; it will definitely speed up the API response (see the sketch after this answer).
In my personal view, storing active data as JSON inside SQL is not useful. JSON should be a last resort for metadata that does not mutate and does not need to be searched.
Please make your question more descriptive about the schema of your database and the query you are making for the API.
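A quick sketch of both suggestions (all table and column names here are invented):

-- Fetch only the columns the API actually needs, instead of SELECT *
SELECT id, title, status, updated_at
FROM wide_table
WHERE status = 'active';

-- Sparse/optional attributes stored as rows in a narrow child table
-- instead of a hundred mostly-NULL columns
CREATE TABLE item_attributes (
  item_id    INT          NOT NULL,
  attr_name  VARCHAR(64)  NOT NULL,
  attr_value VARCHAR(255),
  PRIMARY KEY (item_id, attr_name)
);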

Google-BigQuery - schema parsing of CSV file

We are using the Java API to load a CSV file into Google BigQuery. Is there a way to detect the columns on load and auto-select the appropriate schema type?
For example, if a specific column contains only float values, BigQuery assigns the column the FLOAT type; if it is non-numeric, it assigns the STRING type. Is there a method to do this?
The roundabout way is to assign each column as string by default when loading the CSV.
Then do a query on each column -
SELECT COUNT(columnname) - COUNT(FLOAT(columnname)) FROM dataset.table
(assuming I am only interested in isolating columns that have "float values" that I can use for math functions from my application)
Any other method to solve this problem?
Right now, BigQuery does not support schema inference, so as you suggest, your options are:
Provide the schema explicitly when loading data.
Load all data using the string type, and cast/convert at query time.
Note that you can use the allowLargeResults feature to clean up and rewrite your imported data (you'll be charged for the query, which will increase your data ingestion costs).
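In today's standard SQL, the cast-at-query-time step might look like the sketch below (dataset, table, and column names are placeholders; the question's COUNT(FLOAT(...)) form is the legacy-SQL equivalent):

-- SAFE_CAST returns NULL instead of erroring on non-numeric strings,
-- so this counts how many rows are not parseable as floats:
SELECT COUNT(columnname) - COUNT(SAFE_CAST(columnname AS FLOAT64)) AS non_numeric_rows
FROM `dataset.table`;

-- And the cleaned-up rewrite can simply select the cast values:
SELECT SAFE_CAST(columnname AS FLOAT64) AS columnname
FROM `dataset.table`;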
For the record, schema auto-detect is now supported: https://cloud.google.com/bigquery/federated-data-sources#auto-detect

Indexes on BLOBs that contain encrypted data

I have a bunch of columns in a table that are of the type BLOB. The data that's contained in these columns are encrypted with MySQL's AES_ENCRYPT() function. Some of these fields are being used in a search section of an application I'm building. Is it worth it to put indexes on the columns that are being frequently accessed? I wasn't sure if the fact that they are BLOBs or the fact that the data itself is encrypted would make an index useless.
EDIT: Here are some more details about my specific case. There is a table with ~10 columns or so that are each BLOBs. Each record that is inserted into this table is encrypted using the AES_ENCRYPT() function. In the search portion of my application users can type in a query. I take their query and decrypt the columns like this: SELECT AES_DECRYPT(fname, MYSTATICKEY) AS fname FROM some_table, so that I can perform a search using a LIKE clause. What I am curious about is whether the index will index the encrypted data rather than the actual data that is returned from the decryption. I am guessing that if the index applies only to the encrypted binary string, then it will not help performance at all. Am I wrong on that?
Note the following:
You can't add an index of type FULLTEXT to a BLOB column (http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html).
Therefore, you will need to use another type of index. For BLOBs, you have to specify a prefix length (http://dev.mysql.com/doc/refman/5.0/en/create-index.html); the maximum length depends on the storage engine (e.g. up to 1000 bytes for MyISAM tables and 767 bytes for InnoDB tables). So unless the values you are storing are short, you won't be able to index all of the data.
AES_ENCRYPT() encrypts a string and returns a binary string. This binary string will be the value that is indexed.
Therefore, IMO, your guess is right - an index won't help the performance of your searches.
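To illustrate with a sketch (the table, column, and key names are hypothetical): the index covers the raw ciphertext bytes, so a decrypt-then-LIKE search can never use it; only a comparison against the ciphertext itself can.

-- Cannot use an index: every row must be decrypted first
SELECT id FROM some_table
WHERE AES_DECRYPT(fname, 'MYSTATICKEY') LIKE '%john%';

-- An exact match on the encrypted value could use a (prefix) index,
-- since AES_ENCRYPT() with the default block mode is deterministic:
SELECT id FROM some_table
WHERE fname = AES_ENCRYPT('john', 'MYSTATICKEY');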
Note that 'indexing an encrypted column' is a fairly common problem - there are quite a few articles online about it. For example (although this is quite old and about MS SQL, it does cover some ideas): http://blogs.msdn.com/b/raulga/archive/2006/03/11/549754.aspx
Also see: What's the best way to store and yet still index encrypted customer data? (the top answer links to the same article I found above)

I need to index a column of JSON data in MySQL

My database is MySQL and I found that there isn't a JSON type, only text. But in my situation, I need to store JSON data and index on the keys of that JSON. So the TEXT type isn't the right way. By the way, I can't separate the JSON data into different columns, because I don't know in advance what the keys are.
What I really need is a way to search JSON data by its keys in MySQL using an index or something similar, so that the search speed is fast and acceptable.
If you can't normalize your data into an RDBMS-friendly format, you should not use MySQL.
A NoSQL database might be the better approach.
MySQL's strengths lie in working with relational data.
You are trying to fit a square object through a round hole :)
-- Build a JSON document string by hand from a relational table:
SELECT CONCAT('{"objects":[',
              (SELECT GROUP_CONCAT(
                        CONCAT('{"id":"', id, '","field":"', field, '"}')
                        SEPARATOR ',')
                 FROM placeholder),
              ']}');
Anything more generic would require dynamic SQL (see the sketch below).
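As a sketch of what "dynamic SQL" means here (the table name and filter are placeholders), MySQL prepared statements let you build the query text at runtime:

-- Assemble the statement as a string, then prepare and execute it
SET @tbl := 'placeholder';
SET @sql := CONCAT('SELECT id, field FROM ', @tbl, ' WHERE id = ?');
PREPARE stmt FROM @sql;
SET @id := 1;
EXECUTE stmt USING @id;
DEALLOCATE PREPARE stmt;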
I think your situation calls for the Levenshtein Distance algorithm. You basically need to do a text search of an unknown quantity (the haystack) with a text fragment (the needle). And you need it to be fast, because indexing is out of the question.
There is a little-known (or at any rate little-used) capability of MySQL whereby you can create user-defined functions. The function can be written as a stored function or, for extra speed, compiled in C++ as a UDF. UDFs are used natively in your queries as though they were part of regular MySQL syntax.
Here are the details for implementing a Levenshtein User-Defined Function in MySQL.
An example query might be...
SELECT json FROM my_table WHERE levenshtein(json, 'needle') < 5;
The 5 refers to the 'edit distance' and actually allows for near matches to be located.