Indexing JSON column in MySQL 8 - mysql

So I'm experimenting with a JSON column. MySQL 8.0.17 is supposed to support multi-valued JSON indexes, like this:
CREATE INDEX data__nbr_idx ON a1( (CAST(data->'$.nbr' AS UNSIGNED ARRAY)) )
I have column categories with JSON like this ["books", "clothes"].
I need to get all products from "books" category. I can use "json_contains" or new "member of".
SELECT * FROM products WHERE JSON_CONTAINS(categories, '\"books\"')
SELECT * FROM products WHERE "books" MEMBER OF(categories)
And it works. The problem is that EXPLAIN reveals that these queries do a full table scan, and because of that they are slow.
So I need some index.
I changed the index example by replacing the "unsigned" type with "char(32)", since my categories are strings and not numbers. I cannot find any example for this on Google, so I assumed that char() would be fine, but it isn't.
This is my index query:
CREATE INDEX categories_index ON products((CAST(categories AS CHAR(32) ARRAY)))
also tried
CREATE INDEX categories_index ON products((CAST(categories->'$' AS CHAR(32) ARRAY)))
but the selects still do a full table scan. What am I doing wrong?
How do I correctly index a JSON column without using virtual columns?

For a multi-valued json index, the json paths have to match, so with an index
CREATE INDEX categories_index
ON products((CAST(categories->'$' AS CHAR(32) ARRAY)))
your query has to also use the path ->'$' (or the equivalent json_extract(...,'$'))
SELECT * FROM products WHERE "books" MEMBER OF(categories->'$')
Make it consistent and it should work.
It seems though that an index without an explicit path does not work as expected, so you have to specify ->'$' if you want to use the whole document. This is probably a bug, but could also be an intended behaviour of casting or autocasting to an array. If you specify the path you'll be on the safe side.
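To check whether the optimizer actually considers the index, the EXPLAIN output of a query that uses the path can be compared with one that does not. This is a minimal sketch assuming MySQL 8.0.17 or later; the table definition and sample rows are guesses at the asker's schema (only the categories column comes from the question), and the plan you see will depend on your data.

```sql
-- Hypothetical minimal schema; only `categories` comes from the question.
CREATE TABLE products (
  id INT AUTO_INCREMENT PRIMARY KEY,
  categories JSON
);

INSERT INTO products (categories) VALUES
  ('["books", "clothes"]'),
  ('["toys"]');

-- Multi-valued index on the whole document, with an explicit path.
CREATE INDEX categories_index
  ON products ((CAST(categories->'$' AS CHAR(32) ARRAY)));

-- Same expression as the index: check `possible_keys` and `key`.
EXPLAIN SELECT * FROM products
WHERE 'books' MEMBER OF (categories->'$');

-- No path: the expression does not match the index definition.
EXPLAIN SELECT * FROM products
WHERE 'books' MEMBER OF (categories);
```

According to the MySQL documentation, JSON_CONTAINS() and JSON_OVERLAPS() can use a multi-valued index as well, under the same rule that the indexed expression has to appear in the query.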

Related

How to manage JSON query performance in MySQL DB

I have a Mysql8 DB which contains JSON data. Unfortunately, the content is not always the same. To make it simple, the hierarchy is always the same, but sometimes part of the "tree" is missing or slightly different. For instance:
$.bilan.victimes.*.preview.TAGSAU (I use a star since the key is sometimes '1', '2', etc., and sometimes there is only '$.bilan.victimes', without further subkeys).
Now, I am using queries to lookup information in the JSON like:
SELECT
COUNT(fiche_id) AS USAGE_DSA,
JSON_VALUE(content, '$.bilan.victimes.*.preview.DSA') AS DSA
FROM bilan_json
WHERE STR_TO_DATE(JSON_VALUE(content, '$.bilan.victimes.*.preview.TAGSAU'),'%e/%c/%Y %H%#%i') >= '2021-01-01'
GROUP BY DSA;
This is working fine, but since there are a lot of records and the JSON can be very long, it takes an awful lot of time to display the result. This example reads only one key, but I am supposed to retrieve multiple values from the JSON, sometimes in a single query.
I've read about virtual columns (https://stackoverflow.com/questions/68118107/how-to-create-a-virtual-column-to-index-json-column-in-mysql#:~:text=if%20table%20is%20already%20created%20and%20you%20want,%60jval%60%3B%20Dont%20forget%20to%20index%20the%20Generated%20Columns) and also improving performance for JSON object (https://blogit.create.pt/goncalomelo/2018/12/20/query-performance-for-json-objects-inside-sql-server/) but I can't really figure out if I should create a virtual column per key ? And, how can I create a virtual column with a transform ? In above case, I would create something like :
ALTER TABLE bilan_json
ADD COLUMN tagsau DATETIME
GENERATED ALWAYS AS STR_TO_DATE(JSON_VALUE(content, '$.bilan.victimes.*.preview.TAGSAU'),'%e/%c/%Y %H%#%i')
AFTER content;
What would be your advice ?
Simply put, if you expect to need a field in JSON for a WHERE or ORDER BY clause, that field should be in its own column.
3 approaches:
Redundantly store it in a column as you INSERT the rows.
Use a Virtual ("Generated") column (as you suggest).
Remove it from JSON as you put it in its own column.
Once it is in a column, it can be indexed. (It is unclear how useful an index would be for the SELECT you show.)
Did you try that ALTER? Did it work? We need SHOW CREATE TABLE in order to advise further.
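For reference, here is a sketch of what the generated-column approach could look like. MySQL requires the generation expression to be wrapped in parentheses. The key "1" is a hypothetical stand-in for the wildcard (a wildcard path can match several values, which does not map to a single DATETIME), the index name is made up, and the format string is copied unchanged from the question. JSON_VALUE() needs MySQL 8.0.21 or later. Whether existing rows parse cleanly should be checked on a copy of the table first.

```sql
-- Sketch only: the key "1" is a hypothetical stand-in for the wildcard,
-- and the format string is copied unchanged from the question.
ALTER TABLE bilan_json
  ADD COLUMN tagsau DATETIME
    GENERATED ALWAYS AS (
      STR_TO_DATE(
        JSON_VALUE(content, '$.bilan.victimes."1".preview.TAGSAU'),
        '%e/%c/%Y %H%#%i'
      )
    ) VIRTUAL
    AFTER content;

-- With InnoDB, a virtual generated column can carry a secondary index.
CREATE INDEX bilan_json_tagsau_idx ON bilan_json (tagsau);

-- The date filter now compares a column instead of parsing JSON per row.
SELECT COUNT(fiche_id) AS USAGE_DSA,
       JSON_VALUE(content, '$.bilan.victimes."1".preview.DSA') AS DSA
FROM bilan_json
WHERE tagsau >= '2021-01-01'
GROUP BY DSA;
```

One generated column per key that appears in a WHERE or ORDER BY clause is usually enough; keys that are only displayed can stay in the JSON.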

How do I get the output of a SELECT statements in MariaDB/MySQL workbench to name columns in "table.column" format instead of just "column"?

I have to explore undocumented schemas to come up with query statements that will satisfy some business need.
When I SELECT * FROM foo JOIN bar ON foo.barid=bar.id; I get back a list of columns, and I can guess where the columns from one table end and the next begin. But it would be awfully convenient if it just used the columns' full names in the output, i.e., every column would display as foo.columnname or bar.columnname.
Yes, that's bulkier than optimal, and no, I'd never use it in a production solution. But for exploratory poking and prodding it would make things easier when I'm trying to figure out why a query isn't working right.
How do I turn that on by default?
CLARIFICATION: No, I'm not looking for "how to list all columns in a table/schema". I want to run queries joining tables together, see the results, and see unambiguously and easily which table a given field came from.
You can query the database information_schema to help you figure out what is what in your database. Running the following will get you close:
select table_name, column_name from information_schema.columns
https://dev.mysql.com/doc/refman/8.0/en/information-schema-columns-table.html
If you must use * you can qualify by table for example
select foo.*,'//',bar.*
will display all columns from foo first, then a divider, then all columns from bar; within foo, the left-to-right order follows the ordinal position of the columns in foo. If the displayed values are left justified, the column datatype is a string of some description (varchar, char, text, etc.); if right justified, it is a number of some sort (int, decimal, float, etc.). If a number is left justified, the underlying datatype is a string. Date datatypes in MySQL are displayed as yyyy-mm-dd, so if you see this the underlying datatype is likely to be date, and similarly for datetime.
To understand the actual datatypes and find the indexes, constraints and foreign keys on a table, use SHOW CREATE TABLE tablename. If you want all table definitions, use the Workbench export or the mysqldump utility.
Also do read up on what information_schema https://dev.mysql.com/doc/refman/8.0/en/information-schema.html can do for you and consider reverse engineering your DB in workbench.
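One way to get table.column headings without typing every alias by hand is to let information_schema generate the select list. This is a sketch that assumes the two tables are foo and bar in the current schema, as in the question; the select_list alias is made up for illustration.

```sql
-- GROUP_CONCAT output is capped by group_concat_max_len,
-- so raise it for wide tables.
SET SESSION group_concat_max_len = 100000;

-- Build "`foo`.`col` AS `foo.col`, ..." for the two example tables.
SELECT GROUP_CONCAT(
         CONCAT('`', table_name, '`.`', column_name, '` AS `',
                table_name, '.', column_name, '`')
         ORDER BY table_name, ordinal_position
         SEPARATOR ', '
       ) AS select_list
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name IN ('foo', 'bar');

-- Paste the generated list into the exploratory query, for example:
-- SELECT `foo`.`id` AS `foo.id`, `bar`.`id` AS `bar.id`, ...
-- FROM foo JOIN bar ON foo.barid = bar.id;
```

The aliases are quoted with backticks because they contain a dot; the result grid then shows each column as foo.columnname or bar.columnname.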

How to search either on id or name for certain purchase orders

We would like to filter purchase orders either based on purchase order id (primary key) or name of the purchase order using a single search box.
We used the like parameter to search on the name field, but it doesn't seem to work on the primary key. It works only when we use the equal operator for id(s). But it would be preferable if we can filter purchase orders using like for id(s). How to do this?
create table purchase_orders (
id int(11) primary key,
name varchar(255),
...
)
Option 1
SELECT *
FROM purchase_orders
WHERE id LIKE '%123%'; -- tribute to TemporaryNickName
This is horrible, performance-wise :)
Option 2a
Add a text column which receives a string version of id. Maybe add some triggers to populate it automatically.
Option 2b
Change the type of id column to CHAR or VARCHAR (I believe CHAR should be preferred for a primary key).
In both 2a. and 2b. cases, add an index (maybe a FULLTEXT one) to this column.
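For a single search box, the two conditions can also be combined in one query without changing the schema. The sketch below uses a user variable @q as a stand-in for the search term; in application code this would be a bound parameter.

```sql
SET @q = '123';

-- Substring match on both columns: simple, but scans the whole table.
SELECT id, name
FROM purchase_orders
WHERE CAST(id AS CHAR) LIKE CONCAT('%', @q, '%')
   OR name LIKE CONCAT('%', @q, '%');

-- Cheaper variant: exact id match plus prefix match on name.
-- A non-numeric @q casts to 0 (with a warning) and simply matches no id.
SELECT id, name
FROM purchase_orders
WHERE id = CAST(@q AS UNSIGNED)
   OR name LIKE CONCAT(@q, '%');
```

The second form gives up substring matching, but a pattern without a leading wildcard can use an ordinary index on name (assuming one exists), and the id comparison uses the primary key.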
I think LIKE should work. I assume that your SQL wasn't correctly written.
Let's assume that you have order name "ABCDEF" then you can find this using the following query structure.
SELECT id FROM purchase_orders WHERE name LIKE '%CD%';
To explain it, % sign means it's a wildcard. As a result this query is going to select any String that contains "CD" inside of it.
According to the table structure, name is a varchar that can hold up to 255 characters. That is quite a large string, so pattern matching with LIKE will probably consume a lot of resources and take more time. You can always search by id (WHERE id = something), which is much faster, but I don't think an order id is user-friendly data; I would instead let users search by product name. My recommendation is to use Apache Lucene or MySQL's full-text search feature (which can improve search performance).
Apache lucene
MySQL Full text search function
These tools are built to search for a pattern or word in a large list of strings much faster. Many websites use them to build their own mini search engines. I found that MySQL's full-text search has pretty much no learning curve and is straightforward to use =D
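A minimal sketch of the MySQL full-text option on the name column follows. The index name and the search term 'laptop' are made up for illustration; InnoDB has supported FULLTEXT indexes since MySQL 5.6.

```sql
-- Add a full-text index on the order name (index name is hypothetical).
ALTER TABLE purchase_orders
  ADD FULLTEXT INDEX purchase_orders_name_ft (name);

-- Word-based search; results are ordered by relevance by default.
SELECT id, name
FROM purchase_orders
WHERE MATCH(name) AGAINST ('laptop' IN NATURAL LANGUAGE MODE);
```

Note that full-text search matches whole words rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%CD%', and very short words are not indexed by default (see innodb_ft_min_token_size).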

mysql query and index how to do it

I need a hand indexing a large table, and I have no idea how indexes work on MySQL tables.
This is the query I use to fetch ordered rows from the table:
SELECT "posts.* AS `posts` , user.nickname AS nickname
FROM `posts`
LEFT JOIN user AS user ON (user.userid = posts.userid )
WHERE
posts.userid= '" . intval($bbinfo['userid']) . "'
ORDER BY posts.timestamp DESC
LIMIT $start , $_limit
"
How can I index this table? Do I have to do it after inserting each new post, or by altering the table? Where and when should I use an index, and how? Please help.
Just create the index once and define which columns it covers; after that you have nothing more to do. If the MySQL optimizer thinks your index is useful for a query, it will use it, and the index is maintained automatically whenever you insert or update data.
Now the hard part is the definition of the index.
You can see an index as an ordering, like the one used in a phone book. Your phone book is ordered by city, then by last name, and then by first name. An index is an ordering stored next to the table that the engine can use to find results faster than it would by reading the whole table.
In a phone book there is only one index, so the data itself is ordered by that index. In a database you can have several indexes, so they are stored next to the table and contain pointers to the real data.
Indexes are very important when you search data. You can easily find people named Smith in New York. It's harder to find all the Smiths in all US cities (with a phone book).
In your query you have two clauses that may benefit from an index: you are filtering by user and then ordering by timestamp.
If you create an index by user and then timestamp the engine will already have the solution of your query by simply reading the index.
So I would create this one:
CREATE index posts_user_and_timestamp_idx ON posts(userid, timestamp DESC);
And this index could be reused for all queries where you are simply filtering by users (like the phone book. You can easily extract pages about one city). But not for queries where the only filter is the timestamp (you would need an index on the timestamp only, hard to extract all smith on all cities from the phone book).
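To see whether the optimizer picks the composite index for the query from the question, run it through EXPLAIN once the index exists. In this sketch the values 42, 0 and 20 are placeholders for the PHP variables, and the invalid `posts.* AS posts` alias from the original string is dropped.

```sql
-- 42, 0 and 20 stand in for $bbinfo['userid'], $start and $_limit.
EXPLAIN
SELECT posts.*, user.nickname AS nickname
FROM posts
LEFT JOIN user ON user.userid = posts.userid
WHERE posts.userid = 42
ORDER BY posts.`timestamp` DESC
LIMIT 0, 20;
```

If the key column of the posts row shows posts_user_and_timestamp_idx and the Extra column does not mention "Using filesort", the index serves both the filter and the ordering. The DESC keyword in an index definition is honored from MySQL 8.0 on; older versions parse it but store the index ascending, which can still be read backwards.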
So in fact the main problem with indexes is that they depend heavily on the queries you usually run against the database. If you never run the same sort of query on a table twice, you will need a lot of different indexes. And an index takes a lot of space: many tables use 3 or 4 times more physical space for indexes than for the data.
You should find a MySQL admin tool that works for you since schema changes to your dbs, including adding indexes are a very common task.
I use MySQL Workbench to do most of the schema manipulation, including setting indexes on tables. This is a free admin app for MySQL databases. If you don't have it, download it.
http://dev.mysql.com/downloads/workbench/5.1.html
Open your db in Workbench, right click on the table you want to add the index to, and choose Alter Table... Then click on Indexes at the bottom of the window to see and edit the table's index list.
You can also use PHPMyAdmin, which is a little more complex and a little harder to install, IMHO.
I drilled down into my Program Files directory (Windows XP) to find the PHPMyAdmin executable file - which launched the app.
From PHPMyAdmin 3.2.1, open your schema and click on the table. This presents you with a GUI menu that lets you specify an index using the icon with the lightning bolt to the right of the column to be indexed.
You only need to add an index once. No need to worry about doing anything after every INSERT. Based on what you have in your post, I would try something like this:
CREATE INDEX posts_userid_idx ON posts(userid);
If that doesn't seem to work very well, I would then advise you to check the MySQL Documentation on CREATE INDEX and see if any of the available options would apply to your situation.
Based on your (revised) comment, you should also add a PRIMARY KEY on postid.
ALTER TABLE posts ADD PRIMARY KEY (postid);
And yes, you should be able to run both of those commands in MySQL Workbench as you would any other query.

Suggest Sphinx index scheme

In a MySQL database I have documents of different type: some have text content, meta keys, descriptions, others have code, SKU number, size and brand name and so on. The problem is, I have to search something in all of these documents and then display a single page, where the results will be grouped by the document type, such as help page, blog post, item... It's not clear for me how to implement the Sphinx index: I want to have a single index to speed up queries, but since different docs have different structure - how can I group them? I was thinking about just concatenating them, but it just doesn't feel right.
Sphinx does not actually return documents, concatenated or not, it returns primary keys of the items or attributes you have indexed. Here, in this snippet from a sphinx.conf, the SQL here is used to build an index. When the index is subsequently searched, product.id will be returned whilst text2search will be searched.
sql_query = SELECT id, CONCAT_WS( ' ', field1, field2 ) as text2search FROM product
If your documents/products reside in the same database table, this is very straightforward. You are able to retrieve and recreate your data structure on the database side when given the primary key(s) to work with.
If you are indexing items of different types in one sphinx index when each type is mapped to a different table, it's a little more involved.
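One possible shape for that case is a single indexing query that unions the tables, tags every row with a type code, and derives a document id that is unique across all types. This is only a sketch: the tables help_page, blog_post and item and all of their columns are hypothetical, and the id scheme (id * 10 + type code) is just one convention.

```sql
-- Hypothetical tables and columns; adjust to the real schema.
-- doc_id must be unique across all types, so each id is combined
-- with a per-type code in the last digit.
SELECT id * 10 + 1 AS doc_id,
       1 AS doc_type,
       CONCAT_WS(' ', title, content, meta_keys) AS text2search
FROM help_page
UNION ALL
SELECT id * 10 + 2,
       2,
       CONCAT_WS(' ', title, content, description)
FROM blog_post
UNION ALL
SELECT id * 10 + 3,
       3,
       CONCAT_WS(' ', code, sku, size, brand)
FROM item;
```

With doc_type declared as an integer attribute in sphinx.conf (sql_attr_uint), results can be filtered or grouped by document type, and the original row is recovered as doc_id DIV 10 in the table that the type code points to.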