Search in MySQL database - serialized data - mysql

Situation:
I have a User model. The attribute "meta_data" is a "text" type field in the DB.
In the model it is serialized by a custom class (serialize :meta_data, CustomJsonSerializer.new).
This means that when I have an instance of a user, I can work with meta_data like a Hash:
User.first.meta_data['username']
Problem:
I need to write a search function which will search users by a given string. I can do it by manually building the search query in Rails, e.g. User.where("email LIKE '%#{string}%'")...
But what about meta_data? Should I search this field with a LIKE statement too? If I do, it will decrease the relevance of the found records.
For example:
I have 2 users. One of them has the username "patrick", the other "sergio".
The meta_data in the DB will look like this:
1) {username: patrick}
2) {username: sergio}
I want to find sergio, so I enter the search string "ser" => but I get 2 results instead of one. The meta_data string "{uSERname: patrick}" also contains "ser", so it turns that record into an irrelevant match.
Do you have any idea how to solve it?

That's really the problem with serialized data. In theory, the serialization could be an algorithm that is not searchable at all. It could use Huffman coding, or other compression, and store the serialization in binary. You are relying on the assumption that the serialization uses JSON and that your string will still be findable as a substring of the serialized text.
Even then, the problem you are having is another issue: other data in the serialization can pollute your results.
In general, if you serialize data, you are making a choice not to be searchable.
So a solution would be to add an additional field that you populate in a way that you control. Have a values field and store a pipe (|) delimited value that you can search. So if the data is {firstname: "Patrick", lastname: "Stern"}, your meta_values field might be "Patrick|Stern".
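At the SQL level that extra column is just plain text to run LIKE against. A minimal sketch, assuming the table is called users (adjust to your schema):
ALTER TABLE users ADD COLUMN meta_values TEXT;
-- keep meta_values in sync from the application (e.g. in a save callback),
-- then search it instead of the serialized blob:
SELECT * FROM users WHERE meta_values LIKE '%ser%';
This way only the users whose searchable values actually contain "ser" match, and the serialized key names no longer cause false positives.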
Also, don't use the where method with a string that interpolates input values with #{}. That makes it vulnerable to SQL injection. Instead use:
where("meta_values LIKE :pattern", pattern: "%#{string}%")
I know that may not look very different, but ActiveRecord will sanitize the value this way. If someone puts a quote or other special character in string, ActiveRecord will escape it in the search condition.

Related

Bookshelf query builder: how to use the LIKE operator with a JSON column?

I'm using Bookshelf with a PostgreSQL database.
information is a column of type json.
I want to retrieve all rows where it is like '%pattern%'.
With a SQL query I use:
select * from table where information::text like '%pattern%';
I want to do that with the Bookshelf query builder:
model.query(function(qb) {
  qb.where('information', 'LIKE', '%pattern%')
}).fetch()
But it didn't work and I can't find how to do it in the Bookshelf docs.
Any idea?
The tricky part here is, although you might think that JSON (and JSONB) columns are text, they aren't! So there's no way to do a LIKE comparison on one. Well, there is, but you'd have to convert it to a string first:
SELECT * FROM wombats WHERE information #>> '{}' LIKE '%pattern%';
which is a really terrible idea, please don't do that! As @GMB points out in the comments, JSON is a structured format that is far more powerful than plain text. Postgres is great at handling JSON, so just ask it for what you need. Let's say your value is in a JSON property named description:
SELECT * FROM wombats
WHERE (information->'description')::TEXT
LIKE '%pattern%';
Here, even though we've identified the correct property in our JSON object, it comes out as type JSON: we still have to cast it to ::TEXT before comparing it with a string using LIKE. The Bookshelf/Knex version of all this would look like:
model
  .query(function(qb) {
    const keyword = "pattern";
    qb.whereRaw(`(information->'description')::TEXT LIKE '%${keyword}%'`)
  })
  .fetch();
Apparently this part of the raw query cannot be parameterized (in Postgres, at least), so the string substitution in JavaScript is required. This means you should be extra careful about where that string comes from (i.e. only allow a limited set of characters, or sanitise it before use), as you're bypassing Knex's usual protections.
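As a side note, Postgres also has a ->> operator that extracts a property directly as text, so the explicit ::TEXT cast can be dropped; the SQL (not the Knex call) would then read:
SELECT * FROM wombats WHERE information->>'description' LIKE '%pattern%';
The result is the same as the casted version above, so which form you use is mostly a matter of taste.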

MySQL REGEXP with JSON array

I have a JSON string stored in the database and I need to run a SQL COUNT based on a WHERE condition against that JSON string. I need it to work on MySQL 5.5.
The only solution I have found that could work is to use REGEXP in the SQL query.
Here is my JSON string stored in the custom_data column:
{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}
https://regex101.com/r/G8gfzj/1
I now need to create a SQL statement:
SELECT COUNT(..) WHERE custom_data REGEXP '[HELP_HERE]'
The condition I am looking for is that language_display has to contain either 1, 2 or 3... or whatever value I define when I build the SQL statement.
So far I have come up with this regex, but it does not work:
(?:\"language_display\":\[(?:"1")\])
where 1 is replaced with the value I am looking for. In general I could also search for "1" (with the quotes), but it would also be found in the meta_display array, which can hold different values.
I am not good with regex! Any suggestions?
I used the following regex to get matches on your test string:
\"language_display\":\[(?:\"[0-9]\"\,)*?\"3\"(?:\,\"[0-9]\")*?\]
https://regex101.com/ is a free online regex tester; it seems to work great. Start small and work big.
Sorry it doesn't work for you. It must be failing on the non-greedy '*?'; perhaps try without the '?'.
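If the non-greedy quantifier is the culprit (MySQL 5.5's REGEXP is POSIX-based and does not support *? at all), a greedy pattern that only looks inside the language_display array may be enough. A sketch, with orders standing in for your real table name:
SELECT COUNT(*) FROM orders
WHERE custom_data REGEXP '"language_display":\\[[^]]*"3"';
The [^]]* part stops the match at the closing bracket of the array, so a "3" appearing only in meta_display will not be counted.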
Have a look at how to serialize this data, with an eye to serializing the language display fields.
How to store a list in a column of a database table
Even if you were to get your idea working, it will be slow as fvck. You are better off processing each row once and generating something that is more easily searched via SQL. Even a field containing a comma-separated list would be better.
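For example, if language_display were copied out into its own comma-separated column (say language_display_csv, with values like 1,2,3), the count becomes a plain FIND_IN_SET lookup; the orders table name is again just a placeholder:
SELECT COUNT(*) FROM orders
WHERE FIND_IN_SET('3', language_display_csv) > 0;
It is still a full scan, but the per-row work is much lighter than running a regex over the whole JSON blob.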

Dynamic field name query using N1QL

I'm having a use case here which I can't seem to solve. Basically, I need to create a web service where users may query the Couchbase cluster "dynamically". I'm storing metadata for different files, and the "creation" of this metadata is up to the user: I don't have specific fields in my Java POJO; I'm inserting a Map, which gets stored as a nested object in Couchbase.
Now the query I need is pretty simple on paper and looks something like this:
@Query("#{#n1ql.selectEntity} WHERE #{#n1ql.filter} AND $1 = $2")
List<FileMetadata> findListMetadata(String pKey, String pValue);
But it doesn't seem to work: $1 never gets replaced by the pKey variable.
I'm using Couchbase 4.5 with the Spring Data connector.
Any ideas on how to solve this use case?
Placeholders like $1 can only stand in for values, not for field names, so you need to dynamically generate the query string: pKey gets inserted into the query string itself, while pValue is passed as a parameter (as you are already doing).
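For example (the bucket name files and the field author below are invented), the statement your service builds at runtime would look roughly like this, with the field name spliced into the string and the value still bound as the $1 parameter:
SELECT f.* FROM `files` f WHERE f.`author` = $1
Backtick-quote the spliced field name and check it against a whitelist of allowed keys so the dynamic part cannot be abused for injection.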

What is the DC2Type:array datatype in MySQL?

I have been working with Symfony2 and Doctrine2 recently and have noticed a peculiar datatype called DC2Type:array that certain Symfony2 roles get saved as. To me it just looks like a serialized PHP array, where a: gives the total number of elements and i: is the array index.
The value looks like this:
a:15:{i:0;s:32:"ROLE_SONATA_USER_ADMIN_USER_EDIT";i:1;s:32:"ROLE_SONATA_USER_ADMIN_USER_LIST";i:2;s:34:"ROLE_SONATA_USER_ADMIN_USER_CREATE";i:3;s:32:"ROLE_SONATA_USER_ADMIN_USER_VIEW";i:4;s:34:"ROLE_SONATA_USER_ADMIN_USER_DELETE";i:5;s:36:"ROLE_SONATA_USER_ADMIN_USER_OPERATOR";i:6;s:34:"ROLE_SONATA_USER_ADMIN_USER_MASTER";i:7;s:33:"ROLE_SONATA_USER_ADMIN_GROUP_EDIT";i:8;s:33:"ROLE_SONATA_USER_ADMIN_GROUP_LIST";i:9;s:35:"ROLE_SONATA_USER_ADMIN_GROUP_CREATE";i:10;s:33:"ROLE_SONATA_USER_ADMIN_GROUP_VIEW";i:11;s:35:"ROLE_SONATA_USER_ADMIN_GROUP_DELETE";i:12;s:37:"ROLE_SONATA_USER_ADMIN_GROUP_OPERATOR";i:13;s:35:"ROLE_SONATA_USER_ADMIN_GROUP_MASTER";i:14;s:10:"ROLE_ADMIN";}
I want to know what this datatype is.
And what does the following identifier signify:
s:
I have searched the internet but haven't found anything useful.
I also stumbled upon this cookbook entry - http://readthedocs.org/docs/doctrine-orm/en/2.0.x/cookbook/mysql-enums.html - but couldn't figure out the origin.
This is not a data type. You might have noticed that the column type is LONGTEXT; DC2Type:array is a comment on the field.
Doctrine uses the field's comment as a place to store column metadata. Since MySQL does not let you store an array natively, Doctrine uses the DC2Type:array comment so it knows how to unserialize the content when reading it back.
Take a look at the link below.
https://github.com/doctrine/dbal/issues/1614
From the link you mentioned, you can see that the comment DC2Type:enumvisibility indicates that the content of the field is a flag marking whether the record is visible or not. It is not a new data type at all; it should be considered a helper strategy at the database level. For Doctrine, it's a custom data type.
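If you want to see the comment for yourself, a plain MySQL inspection query will show it; the table name fos_user_user below is only a guess at whatever your mapping generated:
SHOW FULL COLUMNS FROM fos_user_user;
The roles column comes back as longtext with (DC2Type:array) in its Comment column, and that comment is what tells Doctrine to unserialize the value when it hydrates the entity.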
This is simply a string. Its format is a serialized PHP array. The s: prefix gives the length of each string value in the array.
e.g. s:32:"ROLE_SONATA_USER_ADMIN_USER_EDIT"
If you count the characters in the ROLE string, there are 32.
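As another (made-up) example, a two-element array breaks down like this:
a:2:{i:0;s:10:"ROLE_ADMIN";i:1;s:9:"ROLE_USER";}
a:2 says the array holds 2 elements, i:0 and i:1 are the integer keys, and s:10 / s:9 give the length of each string value.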
Hope this helps.

Converting a MySQL table with ids to a MongoDB table with _ids

I'm transferring a MySQL table to MongoDB. There is a primary key id in my MySQL table and I want this id to be converted to the _id in MongoDB.
I use PHP's MongoCollection::insert( $mysql_array );
However it doesn't work: if I set $mysql_array['_id'], it is seen by Mongo as a String instead of a MongoId. I tried $mysql_array['_id'] = new MongoId( $id ), but it doesn't let me override the default _id value. I also saw that all my MySQL integer columns are converted to strings by MongoCollection::insert(). If I could get MongoCollection::insert() to transfer integers correctly, it might work.
Typecast the _id to an integer value like this...
(int) $mysql_array['_id']
You'll find yourself doing this a lot in MongoDB.
The ObjectId is a special type in Mongo, but the _id property doesn't have to be of this type. You can't coerce a string or number into an ObjectId, and you shouldn't.
I assume the problem as you perceive it is that your insert worked, but when you looked at the data in the database the _id property didn't look like _id: ObjectId("1234") (if the original ID was 1234). This is as it should be, and it's perfectly fine.
The idea with ObjectId is that it has a predefined structure that makes it guaranteed (mostly) to be unique across a Mongo cluster, but this also means that it has to have this structure, otherwise it is not an ObjectId.
You also mention that all your integer columns are converted to strings. PHP and PHP libraries are notoriously sloppy when it comes to types, so make sure that the values aren't already strings when they come out of the MySQL result set. Worst case, you have to explicitly cast the values before inserting them into Mongo.
You won't be able to convert an arbitrary String value into a Mongo ObjectId because of its specific structure (12 bytes, displayed as 24 hex chars, generated from a 4-byte timestamp, a 3-byte machine identifier, a 2-byte PID and a 3-byte counter).
Either you abandon the MongoId type in your collection's _id fields and use your MySQL ID as a string instead (which is not a problem and makes the most sense), or you let Mongo generate the documents' _id for you, which is also a suitable solution if you want to be able to use the MongoId functions (assuming you're working with PHP):
The MongoId class
If you choose the second solution, you can still store your MySQL IDs in another field of the document, like id or mysql_id, to reference them later.
Concerning your question about (int) and (string) values: are you sure they come out of your MySQL DB as PHP integers? If so, they should usually be stored as integers in Mongo. Check with var_dump() and, in case of a mismatch, cast with (int). Maybe it would help if you posted your select/insert code...
Use MongoCollection::save() and your array should work.