My database is MySQL and I found that it has no JSON type, only text. But in my situation I need to store data that is JSON by nature and put an index on a hash key of that JSON data, so a 'text' column isn't the right fit. I also can't split the JSON data into separate columns, because I don't know in advance what keys the JSON will contain.
What I really need is a way to search JSON data by its keys in MySQL using an index or something similar, so the search speed is fast and acceptable.
If you can't normalize your data into an RDBMS-friendly format, you should not use MySQL.
A NoSQL database might be the better approach.
MySQL's strength is working with relational data.
You are trying to fit a square object through a round hole :)
select concat('{"objects":['
      ,(select group_concat(
              concat('{"id":"',id,'","field":"',field,'"}') separator ',')
        from placeholder)
      ,']}');
Anything more generic would require dynamic SQL.
I think your situation calls for the Levenshtein Distance algorithm. You basically need to do a text search of an unknown quantity (the haystack) with a text fragment (the needle). And you need it to be fast, because indexing is out of the question.
There is a little-known (or at any rate little-used) capability of MySQL whereby you can create User-Defined Functions. The function can be written as a stored function or, for extra speed, compiled in C++. UDFs are used natively in your queries as though they were part of regular MySQL syntax.
Here are the details for implementing a Levenshtein User-Defined Function in MySQL.
An example query might be...
SELECT json FROM my_table WHERE levenshtein(json, 'needle') < 5;
The 5 refers to the 'edit distance' and actually allows for near matches to be located.
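For reference, the edit distance such a UDF computes can be sketched in a few lines of Python; a compiled C++ version would implement the same dynamic program, just faster:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: minimum number of
    # insertions, deletions, and substitutions to turn a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

A threshold comparison like `levenshtein(...) < 5` then accepts strings within that many edits of the needle.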
I retrieve data from a MySQL database using a simple SELECT FROM WHERE LIKE case-insensitive query where I escape any % or _ in the like clause, so really the user can only perform basic text research and cannot mess up with regex because I then surround it myself with % in the LIKE clause.
For every row returned by this query, I have to search again using a JS script in order to find all the indexes of the substring in the original string. I dislike this method because it uses a different pattern matching than the LIKE query, so I can't guarantee the two algorithms agree.
I found MySQL functions POSITION or LOCATE that can achieve it, but they return only the first index if it was found or 0 if it was not found. Yes you can set the first index to search from, and by searching by passing the previously returned index as the first index until the new returned index is 0, you can find all indexes of the substring, but it means a lot of additional queries and it might end up slowing down my application a lot.
So I'm now wondering: is there a way to have the LIKE query return substring positions directly? I haven't found one, but I still lack MySQL vocabulary (I'm a noob).
Simple answer: No.
Longer answer: MySQL has no syntax or mechanism to return an array of anything -- from either a SELECT or even a Stored Procedure.
Maybe answer: You could write a Stored Procedure that loops through one result, finding the matches and packing them into a commalist. But I cringe at how messy that code would be; I would quickly decide to write JS code instead, as you have already done.
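For what it's worth, the client-side loop that mirrors repeated LOCATE() calls is short. A Python sketch (str.find is 0-based, so 1 is added to match LOCATE's 1-based positions):

```python
def all_indexes(haystack: str, needle: str) -> list[int]:
    # Mirror repeated LOCATE(needle, haystack, pos): each search resumes
    # just past the previous hit until no further match is found.
    positions, start = [], 0
    while True:
        i = haystack.find(needle, start)
        if i == -1:
            return positions
        positions.append(i + 1)  # LOCATE() positions are 1-based
        start = i + 1
```

Doing this once per row in application code is one round trip, versus one extra query per occurrence when chaining LOCATE() calls server-side.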
Moral of the story: SQL is not a full language. It's great at storing and efficiently retrieving large sets of rows, but lousy at string manipulation and arrays (other than "rows").
Commalist
If you are actually searching a simple list of things separated by commas, then FIND_IN_SET() and SUBSTRING_INDEX() in MySQL closely match what JS can do with its split (on comma) method on strings.
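A rough illustration of that correspondence, using Python's split as a stand-in for the JS method:

```python
csv = "alpha,beta,gamma"

# Splitting on commas yields what SUBSTRING_INDEX would carve out piecewise.
items = csv.split(",")

# FIND_IN_SET('beta', csv) returns a 1-based position in the list.
pos = items.index("beta") + 1
```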
I have a column that has brand names in an array format as below:
I want to extract information associated with Brand4 for example 'price'.
I tried using the below, but that's a psql query. How can I extract this information using MySQL on GCP?
SELECT Brand_name, price
FROM table_name
Where 'Brand4'=Any(Brand_name)
First, the explanation for your error message is that in MySQL, ANY() accepts a subquery, not just a single column or expression. See https://dev.mysql.com/doc/refman/8.0/en/any-in-some-subqueries.html
MySQL does not have an array type. Your Brand_name column is not an array, it's a string. It happens to contain commas and square brackets, but these are just characters in a string.
So your solutions are to use various string-search functions or expressions, as other folks have suggested.
The downside to all the string-search functions is that they cannot be optimized with a conventional index. So every search will be expensive, because it requires a table-scan.
Another solution I did not see yet is to use a fulltext index.
alter table brands add fulltext index (brand_name);
select * from brands
where match(brand_name) against ('Brand4' in boolean mode);
This may require some special handling if the brand names contain spaces or punctuation, but if they are plain words, it should work.
Read https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html to understand more about fulltext indexes.
The best solution would be to eliminate this fake "array" column by normalizing the schema to store one brand per row in another table. Then you can match strings exactly and optimize with a conventional index. But I understand you said that the table structure is not up to you.
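To illustrate that normalized layout, here is a minimal sketch using SQLite in Python as a stand-in for MySQL (the table and column names are hypothetical):

```python
import sqlite3

# One brand per row instead of a fake "array" string, so an ordinary
# index can serve exact-match lookups.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE product_brands (
        product_id INTEGER REFERENCES products(id),
        brand_name TEXT
    );
    CREATE INDEX idx_brand ON product_brands(brand_name);
    INSERT INTO products VALUES (1, 9.99), (2, 19.99);
    INSERT INTO product_brands VALUES
        (1, 'Brand4'), (1, 'Brand7'), (2, 'Brand9');
""")

# Exact string equality, index-assisted -- no pattern matching needed.
rows = con.execute("""
    SELECT p.id, p.price
    FROM products p
    JOIN product_brands b ON b.product_id = p.id
    WHERE b.brand_name = 'Brand4'
""").fetchall()
```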
This should work in MySQL (using a string function, as mentioned here):
SELECT *
FROM brands
WHERE FIND_IN_SET('Brand4',brand_name);
see: DBFIDDLE
The provided SQL query will work in MySQL if you put a subquery within the parentheses, or if you use FIND_IN_SET instead of ANY.
But, as stated in the MySQL documentation:
This function does not work properly if the first argument contains a
comma (,) character.
So, as an alternative, you could use LIKE (simple pattern matching).
Your SQL code then would be:
SELECT `brand_name`, `price`
FROM `test`
WHERE `brand_name` LIKE "%Brand4%"
See SQLFiddle for live example.
Also, you could use LOCATE.
Or any other alternative solution.
But I must say that storing list data the way you do is not the best practice out there.
There are plenty of ways this can be done better.
For example, using a M:M (many-to-many) relationship.
In case you made this design, you really have to reconsider/redesign. Databases have their own data structures, and SQL is not an imperative language but a declarative one.
If the design wasn't yours, you should consider creating a separate table out of that one column. Perhaps this is what you are trying to do.
If it is just about locating a specific string in the values of a field, use LIKE:
SELECT Brand_name, price
FROM table_name
Where brand_name like '%Brand4%'
But realize this will not always yield accurate results.
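To make that inaccuracy concrete: '%Brand4%' also matches a row containing Brand40. A small demonstration using SQLite in Python as a stand-in for MySQL (the data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (brand_name TEXT, price REAL)")
con.executemany("INSERT INTO test VALUES (?, ?)",
                [("[Brand4, Brand7]", 9.99),
                 ("[Brand40]", 19.99)])

# The substring pattern matches BOTH rows -- Brand40 is a false positive.
rows = con.execute(
    "SELECT brand_name FROM test WHERE brand_name LIKE '%Brand4%'"
).fetchall()
```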
I'm working in an existing application and I'm asked to order by a child field of a DC2Type:json_array field in Symfony. Normally I would add the field as a column in the table; in this case that is not possible.
We have a JsonSerializable invoice entity with a normal date attribute, but also a data attribute which contains the due_date. I would like to order by data[due_date] in a Symfony query. Is this at all possible?
tl;dr: No, not really.
According to Doctrine's type mapping matrix, json_array gets mapped to MySQL's MEDIUMTEXT column type, which by default obviously does not index its contents as json hence provides little to no performance advantage. (also, AFAICT, doctrine doesn't provide any json functionality besides converting db json from and to php arrays/nulls)
Maybe you could do some string-search magic to extract a value to sort by, but you still wouldn't get the performance boost a proper index provides. Depending on your data this could get noticeably slow (and eat memory).
The JSON data type is fairly "new" to the relational database world and mappers like doctrine have not yet fully adopted it either. Extending doctrine to handle this data type will probably take lots of work. Instead you could rethink your table schema to include all the fields as columns you want to order by to use all benefits a relational database provides (like indexing).
I am doing a project where I need to store billions of rows of unstructured history_data in an SQL database (Postgres) for 2-3 years. The data/columns may change from day to day.
So, for example, on day one a user might save {"user_id":"2223", "website":"www.mywebsite.org", "webpage":"mysubpageName"}.
And the following day {"name":"username", "user_id":"2223", "bookclub_id":"1"}.
I did an earlier project where we used the classic entity key/value table model for this problem. We saved maybe up to 30 key/values per entity. But when we exceeded 70-100 million rows, the queries began to run slower and slower (too many inner joins).
Therefore I am wondering if I should switch to the JSON model in Postgres. After searching the web and reading blogs, I am really confused. What are the pros and cons of changing this to JSON in Postgres?
You can think about this in terms of query complexity. If you have an index to the json documents (maybe user_id) you can do a simple index-scan to access the whole json string very fast.
You have to dissect it on the client side then, or you can pass it to functions in postgres, if e.g. you want to extract only data for specific values.
One of the most important features of Postgres when dealing with json is functional indexes. In comparison to a "normal" index, which indexes the value of a column, a functional index applies a function to the value of one (or more) columns and indexes the return value. Postgres's ->> operator extracts a json field as text, so consider you want the users that have bookclub_id = 1. You can create an index like
create index idx_bookclub_id on mytable ((jsonvalue->>'bookclub_id'));
Afterwards queries like
select * from mytable where jsonvalue->>'bookclub_id' = '1'
are lightning fast.
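To show the idea end to end, SQLite's json_extract() can stand in for the Postgres ->> expression (this needs a SQLite build with the JSON1 functions, which ships with recent Python; the names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (jsonvalue TEXT)")

# Expression index over the extracted json field -- the functional-index idea.
con.execute("""CREATE INDEX idx_bookclub_id
               ON mytable (json_extract(jsonvalue, '$.bookclub_id'))""")

con.executemany("INSERT INTO mytable VALUES (?)",
                [('{"user_id":"2223","bookclub_id":"1"}',),
                 ('{"user_id":"2224","bookclub_id":"2"}',)])

# The same expression in the WHERE clause lets the planner use the index.
rows = con.execute("""SELECT jsonvalue FROM mytable
                      WHERE json_extract(jsonvalue, '$.bookclub_id') = '1'
                   """).fetchall()
```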
Part of my app includes volume automation for songs.
The volume automation is described in the following format:
[[0,50],[20,62],[48,92]]
I consider each item in this array a 'data point' with the first value containing the position in the song and the second value containing the volume on a scale of 0-100.
I then take these values and perform a function client-side to interpolate this data with 'virtual' data points in order to create a bezier curve allowing smooth volume transition as an audio file is playing.
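As a rough sketch of the idea, here is plain linear interpolation between data points in Python; the actual app uses a bezier curve, so treat this as a simplified stand-in:

```python
def volume_at(points, pos):
    # points: [[position, volume], ...] sorted by position.
    # Clamp outside the automation range, interpolate linearly inside it.
    if pos <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if pos <= x1:
            t = (pos - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]
```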
However, the need has arisen to allow a user to save this automation into the database for recall at a later date.
The datapoints can be unlimited (though in reality should never really exceed around 40-50 with most being less than 10)
Also how should I handle the data? Should it be stored as is, in a text field? Or should I process it in some way beforehand for optimum results?
What data type would be best to use in MySQL to store an array?
Definitely not a text field, but perhaps a varchar. I wouldn't recommend parsing the results and storing them in individual columns unless you want to use that data in a database sense -- statistics etc.
If you never see yourself asking "What is the average volume that users use?" then don't bother parsing it.
To figure out how to store this data, ask yourself "How will I use it later?" If you will fetch the array and need to use it in PHP, you can use the serialize function. If you will use the values in JavaScript, then JSON encoding is probably best for you (plus many languages know how to decode it).
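For the JSON route, the round trip is one call each way; a minimal Python sketch (PHP's json_encode/json_decode behave analogously):

```python
import json

points = [[0, 50], [20, 62], [48, 92]]

encoded = json.dumps(points)   # the string you store in the column
decoded = json.loads(encoded)  # round-trips back to the original array
```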
Good luck!
I suggest you take a look at the JSON data type. This way you can store your array more efficiently than with text or varchar, and you can access your data directly from MySQL without having to parse the whole thing.
Take a look at this link : https://dev.mysql.com/doc/refman/5.7/en/json.html
If speed is the most important factor when retrieving the rows, then create a new table dedicated to holding the elements of your array. Use an integer data type and have each row represent one array index. You'll have to add another numeric column which binds these rows together so you can reassemble the array with an SQL query.
This way you help MySQL help you speed up access. If you only want certain parts of the array, you just change the range in the SQL query and you can reassemble the array however you want.
The best way to store an array is the JSON data type:
CREATE TABLE example (
  `id` int NOT NULL AUTO_INCREMENT,
  `docs` JSON,
  PRIMARY KEY (`id`)
);
INSERT INTO example (docs)
VALUES ('["hot", "cold"]');
Read more - https://sebhastian.com/mysql-array/#:~:text=Although%20an%20array%20is%20one,use%20the%20JSON%20data%20type.
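To get elements back out without parsing client-side, MySQL offers JSON_EXTRACT; the sketch below uses SQLite's json_extract in Python as a stand-in (it needs a SQLite build with the JSON1 functions, which recent Python ships with):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, docs TEXT)")
con.execute("""INSERT INTO example (docs) VALUES ('["hot", "cold"]')""")

# json_extract pulls an individual element out of the stored array;
# '$[0]' is the JSON path for the first element.
first = con.execute(
    "SELECT json_extract(docs, '$[0]') FROM example").fetchone()[0]
```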