I have a use case where I want to store all the SQL queries run against an entity in the database, and later query which fields and values those queries were executed with. I am developing this in Ruby.
For example, the following queries were run:
select * from entity where created_at > "2015-03-01" and created_at < "2015-03-03" ;
select * from entity where created_at > "2015-03-03" and created_at < "2015-03-04" and type in ('forward');
select * from entity where created_at > "2015-03-01" and created_at < "2015-03-03" and type in ('reverse');
Later on, I should be able to fetch the dates on created_at for which these queries were run. This can be queried for type as well, so I want to keep it completely generic.
So I was thinking of saving the conditions of these queries in JSON format. Let me know if you know of any format or gem that can help with such a case.
I was thinking of constructing JSON for these queries in the following format:
{
  "query": [
    {
      "field": "created_at",
      "value_more_than": "2015-03-03",
      "value_less_than": "2015-03-04",
      "type" : "date"
    },
    {
      "field": "type",
      "value_in": "forward",
      "type" : "String"
    }
  ]
}
And saving these for future reference and querying on them when asked. Please let me know if you are familiar with a better format or can suggest some improvement overall.
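For example, here is a minimal sketch of how the saved conditions could be queried later in MySQL, assuming a hypothetical table saved_queries with a JSON column conditions holding the document above, and assuming the created_at condition is always the first element of the query array (a real implementation would need to search every array position, e.g. with MySQL 8.0's JSON_TABLE):
-- Hypothetical schema: saved_queries(conditions JSON).
-- Fetch the date bounds that created_at queries were run with.
SELECT json_unquote(json_extract(conditions, '$.query[0].value_more_than')) AS from_date,
       json_unquote(json_extract(conditions, '$.query[0].value_less_than')) AS to_date
FROM saved_queries
WHERE json_unquote(json_extract(conditions, '$.query[0].field')) = 'created_at';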
The data in the column looks like the below:
{
  "activity": {
    "token": "e7b64be4-74d4-7a6d-a74b-xxxxxxx",
    "route": "http://example.com/enroll/confirmation",
    "url_parameters": {
      "Success": "True",
      "ContractNumber": "003992314W",
      "Barcode": "1908Y10Z",
      "price": "8.99"
    },
    "server_info": {
      "cookie": [
        "_ga=xxxx; _fbp=xxx; _hjid=xxx; XDEBUG_SESSION=XDEBUG_ECLIPSE;"
      ],
      "upgrade-insecure-requests": [
        "1"
      ],
    },
    "campaign": "Unknown/None",
    "ip": "192.168.10.1",
    "entity": "App\\Models\\User",
    "entity_id": "1d9f3066-13ce-4659-b10d-xxxxx",
  },
  "time": "2021-05-21 20:15:02"
}
The query I am using is below:
SELECT *
FROM websote.stored_events
WHERE JSON_EXTRACT(event_properties, '$.route') = 'http://example.com/enroll/confirmation'
ORDER BY created_at DESC LIMIT 500;
The query works on the other JSON values, just not the URL ones. I've tried escaping the value in MySQL like below:
SELECT *
FROM websote.stored_events
WHERE JSON_EXTRACT(event_properties, '$.route') = 'http:///example.com//enroll//confirmation'
ORDER BY created_at DESC LIMIT 500;
But still no luck. Any help on this would be appreciated!
Route is a nested property; I would have expected the path to be
JSON_EXTRACT(event_properties, '$.activity.route')
Your example data isn't valid JSON. You can't have a comma after the last element in an object or array:
"entity_id": "1d9f3066-13ce-4659-b10d-xxxxx",
},
^ here
If I remove that and the other similar cases, your data inserts into a JSON column and I can extract the element you described:
mysql> select json_extract(event_properties, '$.activity.route') as route from stored_events;
+------------------------------------------+
| route |
+------------------------------------------+
| "http://example.com/enroll/confirmation" |
+------------------------------------------+
Note the value is returned with double-quotes. This is because it's returned as a JSON document, a scalar string. If you want the raw value, you have to unquote it:
mysql> select json_unquote(json_extract(event_properties, '$.activity.route')) as route from stored_events;
+----------------------------------------+
| route |
+----------------------------------------+
| http://example.com/enroll/confirmation |
+----------------------------------------+
If you want to search for that value, you would have to do a similar expression:
select * from stored_events
where json_unquote(json_extract(event_properties, '$.activity.route'))
= 'http://example.com/enroll/confirmation'
Searching based on object properties stored in JSON has disadvantages:
It requires complex expressions that force you (and anyone else who needs to maintain your code) to learn a lot of details about how JSON works.
It cannot be optimized with an index. This query will run a table scan. You can add virtual columns with indexes (see the sketch below), but that adds complexity, and if you need to ALTER TABLE to add virtual columns, it defeats the point of using JSON to store semi-structured data.
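For reference, here is a minimal sketch of the virtual-column approach, using the table and JSON path from above (the column and index names are illustrative):
-- Illustrative names: route_vc / ix_route_vc. Add a virtual column
-- derived from the JSON path, then index it.
ALTER TABLE stored_events
  ADD COLUMN route_vc VARCHAR(255)
  GENERATED ALWAYS AS (json_unquote(json_extract(event_properties, '$.activity.route'))) VIRTUAL;

CREATE INDEX ix_route_vc ON stored_events (route_vc);

-- Equality searches on the virtual column can now use the index:
SELECT * FROM stored_events
WHERE route_vc = 'http://example.com/enroll/confirmation';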
The bottom line is that if you find yourself using JSON functions in the WHERE clause of a query, it's a sign that you should be storing the column you want to search as a normal column, not as part of a JSON document.
Then you can write code that is easy to read, easy for your colleagues to maintain, and can be optimized easily with indexes:
SELECT * FROM stored_events
WHERE route = 'http://example.com/enroll/confirmation';
You can still store other properties in the JSON document, but the ones you want to be searchable should be stored in normal columns.
You might like to view my presentation How to Use JSON in MySQL Wrong.
I am new to Couchbase and I have been going through the Couchbase documents and other online resources for a while, but I couldn't get my query working. Below is the data structure and my query:
Table1:
{
  "jobId" : "101",
  "jobName" : "abcd",
  "jobGroup" : "groupa",
  "created" : " "2018-05-06T19:13:43.318Z",
  "region" : "dev"
},
{
  "jobId" : "102",
  "jobName" : "abcd2",
  "jobGroup" : "groupa",
  "created" : " "2018-05-06T22:13:43.318Z",
  "region" : "dev"
},
{
  "jobId" : "103",
  "jobName" : "abcd3",
  "jobGroup" : "groupb",
  "created" : " "2018-05-05T19:11:43.318Z",
  "region" : "test"
}
I need to get the jobId which has the latest job information (max on created timestamp) for a given jobGroup and region (group by jobGroup and region).
My query below, which self-joins on jobId, doesn't get me there.
Query:
/*
Idea is to pull out the job which was executed latest for all possible
groups and regions and print the details of that particular job.
*/
select * from (select max(DATE_FORMAT_STR(j.created,'1111-11-11T00:00:00+00:00')) as latest, j.jobGroup, j.region from table1 j
group by jobGroup, region) as viewtable
join table1 t
on keys meta(t).id
where viewtable.latest in t.created and t.jobGroup = viewtable.jobGroup and
viewtable.region = t.region
Error Result: No result displayed
Desired result:
{
  "jobId" : "102",
  "jobName" : "abcd2",
  "jobGroup" : "groupa",
  "latest" : "2018-05-06T22:13:43.318Z",
  "region" : "dev"
},
{
  "jobId" : "103",
  "jobName" : "abcd3",
  "jobGroup" : "groupb",
  "created" : " "2018-05-05T19:11:43.318Z",
  "region" : "test"
}
If I understand your query correctly, this can be answered using 'group by' and no join. I tried entering your sample data and the following query gives the correct result:
select max([created,d])[1] max_for_group_region
from default d
group by jobGroup, region;
How does it work? It uses 'group by' to group documents by jobGroup and region, then creates a two-element array holding, for every document in the group:
the 'created' timestamp field
the document where the timestamp came from
It then applies the max function to the set of two-element arrays. The max of a set of arrays looks for the maximum value in the first array position and, if there's a tie, looks at the second position, and so on. In this case we get the two-element array with the max timestamp.
Now we have an array [ timestamp, document ], so we apply [1] to extract just the document.
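If you also want the group keys alongside the winning document, a minimal variation (a sketch, using the same default bucket as above) is:
-- Project the group keys next to the document that won the max.
select jobGroup, region, max([created, d])[1] as latest_doc
from default d
group by jobGroup, region;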
I'm seeing some inconsistencies and invalid JSON in your examples, so I'm going to do the best I can. First off, I'm using Couchbase Server 5.5 which provides the new ANSI JOIN syntax. There might be a way to do this in an earlier version of Couchbase Server.
Next, I created an index on the created field: CREATE INDEX ix_created ON bucketname(created).
Then, I use a subquery to get the latest date, aggregated by jobGroup and region. I then join the latest date from this query to the entire bucket and select the fields that (I think) you want in your desired result:
SELECT k.jobId, k.jobName, k.jobGroup, k.created AS latest, k.region
FROM (
SELECT j.jobGroup, j.region, MAX(j.created) as latestDate
FROM so j
GROUP BY j.jobGroup, j.region
) dt
LEFT JOIN so k ON k.created = dt.latestDate;
Problems with this approach:
If two documents have the exact same date, this isn't a reliable way to determine the latest. You can add a LIMIT 1 to the subquery, which would just pick one arbitrarily, or you could ORDER BY whatever your preference is.
Subquery performance: I don't know how large your data set is, but this could be pretty slow.
Requires Couchbase Server 5.5, which is currently in beta.
If you are using a different version of Couchbase Server, you may want to consider asking in the Couchbase N1QL Forums for a more expert answer.
I have created a new bucket, FooBar, on my Couchbase server.
I have a JSON document which is a list of objects with some properties; it is in my Couchbase bucket as follows:
[
  {
    "Venue": "Venue1",
    "Country": "AU",
    "Locale": "QLD"
  },
  {
    "Venue": "Venue2",
    "Country": "AU",
    "Locale": "NSW"
  },
  {
    "Venue": "Venue3",
    "Country": "AU",
    "Locale": "NSW"
  }
]
How do I get Couchbase to return a list of locations using a N1QL query?
For instance, SELECT * FROM FooBar WHERE Locale = 'QLD'
Please let me know of any indexes I would need to create as well. Additionally, how can I return only results where the object is of type Location, and not, say, another object which may have the 'Locale' property?
Chud
PS - I have also created some indexes; however, I would like an unbiased answer on how to achieve this.
Typically you would store these as separate documents, rather than in a single document as an array of objects, which is how the data is currently shown.
Since you can mix document structures, the usual pattern to distinguish them is to have something like a 'type' field. ('type' is in no way special, just the most common choice.)
So your example would look like:
{
  "Venue": "Venue1",
  "Country": "AU",
  "Locale": "QLD",
  "type": "Location"
}
...
{
  "Venue": "Venue3",
  "Country": "AU",
  "Locale": "NSW",
  "type": "Location"
}
where each JSON object would be a separate document with a unique document ID. (If you have some predefined data you want to load, look at cbimport for how to add it to your database. There are a few different formats for doing it. You can also have it generate document IDs for you.)
Then, what #vsr wrote is correct. You'd create an index on the Locale field; that will be optimal for the query you want. Note you could create an index on every document with CREATE INDEX ix1 ON FooBar(Locale); too. In this simple case it doesn't really make a difference. Read about the query Explain feature of the admin console for help understanding and optimizing queries.
Finally, the query #vsr wrote is also correct:
SELECT * FROM FooBar WHERE type = "Location" AND Locale = "QLD";
CREATE INDEX ix1 ON FooBar(Locale);
https://dzone.com/articles/designing-index-for-query-in-couchbase-n1ql
CREATE INDEX ix1 ON FooBar(Locale) WHERE type = "Location";
SELECT * FROM FooBar WHERE type = "Location" AND Locale = "QLD";
If it is an array and the field name is list:
CREATE INDEX ix1 ON FooBar(DISTINCT ARRAY v.Locale FOR v IN list END) WHERE type = "Location";
SELECT * FROM FooBar WHERE type = "Location" AND ANY v IN list SATISFIES v.Locale = "QLD" END;
I have a table item with a field called data of type JSONB. I would like to query all items that have text that equals 'Super'. I am trying to do this currently by doing this:
Item.objects.filter(Q(data__areas__texts__text='Super'))
Django debug toolbar is reporting the query used for this is:
WHERE "item"."data" #> ARRAY['areas', 'texts', 'text'] = '"Super"'
But I'm not getting back any matching results. How can I query this using Django? If it's not possible in Django, how can I query it in PostgreSQL?
Here's an example of the contents of the data field:
{
  "areas": [
    {
      "texts": [
        {
          "text": "Super"
        }
      ]
    },
    {
      "texts": [
        {
          "text": "Duper"
        }
      ]
    }
  ]
}
Try Item.objects.filter(data__areas__0__texts__0__text='Super').
It is not an exact answer, since it only looks at the first element of each array, but it can clarify some jsonb filter features; also read the Django docs on querying JSONField.
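To match at any array position rather than just the first, one option (a sketch, not part of the original answers) is jsonb containment, which Django exposes as the contains lookup: Item.objects.filter(data__contains={"areas": [{"texts": [{"text": "Super"}]}]}). The equivalent raw PostgreSQL, assuming the default table name yourappname_item:
-- @> tests recursive containment, so this matches "Super" inside any
-- element of "areas" and any element of "texts".
SELECT id, data
FROM yourappname_item
WHERE data @> '{"areas": [{"texts": [{"text": "Super"}]}]}';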
I am not sure what you want to achieve with this structure, but I was able to get the desired result only with a rather strange raw query; it can look like this:
Item.objects.raw("SELECT id, data FROM (SELECT id, data, jsonb_array_elements(\"table_name\".\"data\" #> '{areas}') as areas_data from \"table_name\") foo WHERE areas_data #> '{texts}' @> '[{\"text\": \"Super\"}]'")
Don't forget to change table_name in the query (in your case it should be yourappname_item).
I'm not sure you can use this query in real programs, but it can probably help you find your way to a better solution.
Also, there is a very good intro to jsonb query syntax.
Hope it helps.
I'm currently storing the data in the following format (JSON) in a Redis ZSET, where the score is the timestamp in milliseconds.
<timestamp_1> - [ { "key1" : 200 }, { "key2": 100 }, {"key3" : 5 }, .... {"key_n" : 1} ]
<timestamp_2> - [ { "key50" : 500 }, { "key2": 300 }, {"key3" : 290 }, ....{"key_m" : 26} ]
....
....
<timestamp_k> - [ { "key1" : 100 }, { "key2": 200 }, {"key3" : 50 }, ....{"key_p" : 150} ]
I want to extract the values for a key between a given time range.
For example, the values of key2 in the above example over the entire time range would be:
[timestamp_1:100, timestamp_2:300, ..... timestamp_k:200]
I can get this output, but I have to parse the JSON for each row and then iterate through it to get the value of the given key in each row. The parsing becomes a bottleneck as the size of each row increases (n, m, and p can be as big as 10000).
I'm looking for suggestions on whether there is a way to improve the performance in Redis. Are there any specific parsers (in Scala) that can help here?
I'm also open to using other stores such as Cassandra and Elasticsearch if they give better performance, and to formats other than JSON for storing the data in the Redis ZSET.
Cassandra will work just fine for your requirement.
You can keep key_id as the partition key and timestamp as the clustering key.
In Cassandra, you always define your query before designing your column family; here the query is: extract the values for a key between a given time range.
If you are using CQL3,
Create schema:
CREATE TABLE imp_keys (key_id text, score int, timestamp timeuuid, PRIMARY KEY(key_id, timestamp));
Access data:
SELECT score FROM imp_keys WHERE key_id='key2' AND timestamp > maxTimeuuid(start_date) AND timestamp < maxTimeuuid(end_date);
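As a sketch of how rows from the ZSET example would land in this table (placeholder values; now() stamps insertion time, so to preserve the original millisecond timestamps you would generate the timeuuid client-side instead):
-- One row per (key, timestamp) pair; scores for a key are clustered
-- together and ordered by time.
INSERT INTO imp_keys (key_id, score, timestamp) VALUES ('key2', 100, now());
INSERT INTO imp_keys (key_id, score, timestamp) VALUES ('key2', 300, now());
INSERT INTO imp_keys (key_id, score, timestamp) VALUES ('key3', 5, now());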