I have a JSON array of numbers like [16, 9, 11, 22, 23, 12]. I would like to get the index of a number within the array. For example, if I ask for the index of 9, it should return 1.
I tried the query below in MySQL, but I am getting null.
SELECT JSON_SEARCH(CAST('[16, 9, 11, 22, 23, 12]' AS JSON),'one',9)
Do you guys have a solution for this?
CAST is not necessary here, but the array values should be quoted, because JSON_SEARCH only searches string values:
JSON_SEARCH(json_doc, one_or_all, search_str[, escape_char[, path] ...])
Returns the path to the given string within a JSON document.
SELECT json_search('["16", "9", "11", "22", "23", "12"]', 'one', '9');
returns "$[1]"
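If the values have to stay numeric (unquoted), JSON_SEARCH will not find them for the same reason. On MySQL 8.0 you could instead unpack the array with JSON_TABLE and compute the index yourself; a rough sketch (the literal array stands in for your column):
SELECT jt.idx - 1 AS array_index        -- FOR ORDINALITY is 1-based, so subtract 1
FROM JSON_TABLE(
       '[16, 9, 11, 22, 23, 12]',
       '$[*]' COLUMNS (
         idx FOR ORDINALITY,
         val INT PATH '$'
       )
     ) AS jt
WHERE jt.val = 9;   -- returns 1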
I'm new to MySQL and received a task which requires a complex (for me) query. I read the documentation and a few sources, but I still cannot write it myself.
I'm selecting rows from a table where one of the cells contains JSON like this one:
{
    [
        {
            "interval" : 2,
            "start": 03,
            "end": 07,
            "day_of_week": 3
        }, {
            "interval" : 8,
            "start": 22,
            "end": 23,
            "day_of_week": 6
        }
    ]
}
I want to check whether any of the "day_of_week" values is equal to the current day of week and, if so, store that value and the "start", "end" and "day_of_week" values associated with it in variables to use them in the query.
That's not valid JSON format, so none of the MySQL JSON functions will work on it regardless. Better just fetch the whole blob of not-JSON into a client application that knows how to parse it, and deal with it there.
Even if it were valid JSON, I would ask this: why would you store data in a format you don't know how to query?
The proper solution is the following:
SELECT start, end, day_of_week
FROM mytable
WHERE day_of_week = DAYOFWEEK(CURDATE());
See how easy that is when you store data in normal rows and columns? You get to use ordinary SQL expressions, instead of wondering how you can trick MySQL into giving up the data buried in your non-JSON blob.
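For reference, here is a rough sketch of the kind of normalized table that query assumes (column names come from the JSON; the types are guesses):
CREATE TABLE mytable (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  `interval`  INT,   -- backticked because INTERVAL is a reserved word
  `start`     INT,
  `end`       INT,
  day_of_week INT
);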
JSON is the worst thing to happen to relational databases.
Re your comment:
If you need to query by day of week, then you could reorganize your JSON to support that type of query:
{
    "3": {
        "interval": 2,
        "start": 3,
        "end": 7,
        "day_of_week": 3
    },
    "6": {
        "interval": 8,
        "start": 22,
        "end": 23,
        "day_of_week": 6
    }
}
Then it's possible to get results for the current weekday this way:
SELECT data->>'$.start' AS `start`,
       data->>'$.end' AS `end`,
       data->>'$.day_of_week' AS `day_of_week`
FROM (
    SELECT JSON_EXTRACT(data, CONCAT('$."', DAYOFWEEK(CURDATE()), '"')) AS data
    FROM mytable
) AS d;
In general, when you store data in a non-relational manner, the way to optimize it is to organize the data to support a specific query.
I have a date value stored in a format like this:
"timestamp": [
2020,
2,
14,
15,
45,
47,
8000000
]
I have no idea what this format is. I haven't found any way to convert this array of numbers into a MySQL timestamp using MySQL's own functions. Could anybody give me a clue about it?
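That layout looks like a serialized java.time.LocalDateTime: [year, month, day, hour, minute, second, nanoseconds]. If that guess is right, and assuming the JSON lives in a column named data in a table named mytable (both names are placeholders), a sketch along these lines should rebuild a DATETIME:
SELECT TIMESTAMP(
         CONCAT_WS('-', data->>'$.timestamp[0]',
                        LPAD(data->>'$.timestamp[1]', 2, '0'),
                        LPAD(data->>'$.timestamp[2]', 2, '0')),
         CONCAT_WS(':', LPAD(data->>'$.timestamp[3]', 2, '0'),
                        LPAD(data->>'$.timestamp[4]', 2, '0'),
                        LPAD(data->>'$.timestamp[5]', 2, '0'))
       ) AS ts    -- [2020, 2, 14, 15, 45, 47, 8000000] -> '2020-02-14 15:45:47'
FROM mytable;
The last element (the nanoseconds) is simply dropped here.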
In MS Access, I have a field named "TargetDays" that has values like "0", "13", "20", "6", "1", "9", ".", "2", "28".
I want them to be sorted as
., 0, 1, 2, 6, 9, 13, 20, 28
I tried doing ORDER BY val(TargetDays)
But this sometimes sorts as ., 0, 1, 2, 6, 9, 13, 20, 28, and other times as 0, ., 1, 2, 6, 9, 13, 20, 28. The problem is with "." and "0".
Could someone please tell me a solution to sort in the intended order (as mentioned above)?
That happens because Val(".") and Val("0") both return 0, so your ORDER BY has no way to distinguish between those 2 characters in your [TargetDays] field ... and no way to know it should sort "." before "0".
You can include a secondary sort, based on ASCII values, to tell it what you want. An Immediate window example of the Asc() function in action ...
? Asc("."), Asc("0")
46 48
You could base your secondary sort on that function ...
ORDER BY val(TargetDays), Asc(TargetDays)
However, I don't think you should actually need to include the function because this should give you the same result ...
ORDER BY val(TargetDays), TargetDays
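Putting that together (the table name is a placeholder), something along these lines should produce the order you listed:
SELECT TargetDays
FROM YourTable
ORDER BY Val(TargetDays), TargetDays;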
I have a weather file from which I would like to extract the first "air_temp" value recorded in the JSON. The HTTP retriever I am using only supports regex for extraction (I know it is not the best method).
I've shortened the JSON file to two data entries for simplicity; there are usually 100.
{
"observations": {
"notice": [
{
"copyright": "Copyright Commonwealth of Australia 2017, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
"copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
"disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
"feedback_url": "http://www.bom.gov.au/other/feedback"
}
],
"header": [
{
"refresh_message": "Issued at 12:11 pm EST Tuesday 11 July 2017",
"ID": "IDN60901",
"main_ID": "IDN60902",
"name": "Canberra",
"state_time_zone": "NSW",
"time_zone": "EST",
"product_name": "Capital City Observations",
"state": "Aust Capital Territory"
}
],
"data": [
{
"sort_order": 0,
"wmo": 94926,
"name": "Canberra",
"history_product": "IDN60903",
"local_date_time": "11/12:00pm",
"local_date_time_full": "20170711120000",
"aifstime_utc": "20170711020000",
"lat": -35.3,
"lon": 149.2,
"apparent_t": 5.7,
"cloud": "Mostly clear",
"cloud_base_m": 1050,
"cloud_oktas": 1,
"cloud_type_id": 8,
"cloud_type": "Cumulus",
"delta_t": 3.6,
"gust_kmh": 11,
"gust_kt": 6,
"air_temp": 9.0,
"dewpt": 0.2,
"press": 1032.7,
"press_qnh": 1031.3,
"press_msl": 1032.7,
"press_tend": "-",
"rain_trace": "0.0",
"rel_hum": 54,
"sea_state": "-",
"swell_dir_worded": "-",
"swell_height": null,
"swell_period": null,
"vis_km": "10",
"weather": "-",
"wind_dir": "WNW",
"wind_spd_kmh": 7,
"wind_spd_kt": 4
},
{
"sort_order": 1,
"wmo": 94926,
"name": "Canberra",
"history_product": "IDN60903",
"local_date_time": "11/11:30am",
"local_date_time_full": "20170711113000",
"aifstime_utc": "20170711013000",
"lat": -35.3,
"lon": 149.2,
"apparent_t": 4.6,
"cloud": "Mostly clear",
"cloud_base_m": 900,
"cloud_oktas": 1,
"cloud_type_id": 8,
"cloud_type": "Cumulus",
"delta_t": 2.9,
"gust_kmh": 9,
"gust_kt": 5,
"air_temp": 7.3,
"dewpt": 0.1,
"press": 1033.1,
"press_qnh": 1031.7,
"press_msl": 1033.1,
"press_tend": "-",
"rain_trace": "0.0",
"rel_hum": 60,
"sea_state": "-",
"swell_dir_worded": "-",
"swell_height": null,
"swell_period": null,
"vis_km": "10",
"weather": "-",
"wind_dir": "NW",
"wind_spd_kmh": 4,
"wind_spd_kt": 2
}
]
}
}
The regex I am currently using is: .*air_temp": (\d+).* but this returns 9 and 7.3 (entries 1 and 2). Could someone suggest a way to return only the first value?
I have tried using a lazy quantifier group, but have had no luck.
This regex will help you, but I think you should capture and extract the first match using the features of the programming language you are using.
.*air_temp": (\d{1,3}\.\d{0,3})[\s\S]*?},
To understand the regex better: take a look at this.
Update
The above solution works if you have only two data entries. For more than two entries, use this one instead:
header[\s\S]*?"air_temp": (\d{1,3}\.\d{0,3})
Here we match the word header first and then match anything in a non-greedy way. After that, we match our expected pattern; thus we get the first match. Play with it here in regex101.
To capture negative numbers, we need to allow an optional - character. We do this with ?, which indicates zero or one occurrence of the preceding element.
So the regex becomes,
header[\s\S]*?"air_temp": (-?\d{1,3}\.\d{0,3}) Demo
But the use of \K without the global flag (in another answer, given by mickmackusa) is more efficient. To detect negative numbers, the modified version of that regex is
air_temp": \K-?\d{1,2}\.\d{1,2} demo.
Here {1,2} means one to two occurrences of the preceding element; the general form is {min,max}.
I do not know which language you are using, but this looks like the difference between using and not using the global flag.
If the global flag is not set, only the first result will be returned. If the global flag is set on your regex, it will iterate through the string, returning all possible results. You can test it easily using Regex101, https://regex101.com/r/x1bwg2/1
Laziness/greediness should have no impact on whether the global flag is used.
If \K is allowed in your coding language, use this: Demo
/air_temp": \K[\d.]+/ (117steps) this will be highly efficient in searching your very large JSON text.
If no \K is allowed, you can use a capture group: (Demo)
/air_temp": ([\d.]+)/ this will still move with decent speed through your JSON text
Notice that there is no global flag at the end of the pattern, so after one match, the regex engine stops searching.
Update:
For "less literal" matches (but it shouldn't matter if your source is reliable), you could use:
Extended character class to include -:
/air_temp": \K[\d.-]+/ #still 117 steps
or change to negated character class and match everything that isn't a , (because the value always terminates with a comma):
/air_temp": \K[^,]+/ #still 117 steps
For a very strict match (if you are looking for a pattern that means you have ZERO confidence in the input data)...
It appears that your data doesn't go beyond one decimal place, temps between 0 and 1 get a leading 0 before the decimal, and I don't think you need to worry about temps in the hundreds (right?), so you could use:
/air_temp": \K-?[1-9]?\d(?:\.\d)? #200steps
Explanation:
Optional negative sign
Optional tens digit
Required ones digit
Optional decimal which must be followed by a digit
Accuracy Test Demo
Real Data Demo
I have several large JSON objects (think GB scale), where the object values in some of the innermost levels are arrays of objects. I'm using jq 1.4 and I'm trying to break these arrays into individual objects, each of which will have a key such as g__0 or g__1, where the numbers correspond to the index in the original array, as returned by the keys function. The number of objects in each array may be arbitrarily large (in my example it is equal to 3). At the same time I want to keep the remaining structure.
For what it's worth, the original structure comes from MongoDB, but I am unable to change it at this level. I will then use this JSON file to create a schema for BigQuery, where an example column will be seeds.g__1.guid, and so on.
What I have:
{
    "port": 4500,
    "notes": "This is an example",
    "seeds": [
        {
            "seed": 12,
            "guid": "eaf612"
        },
        {
            "seed": 23,
            "guid": "bea143"
        },
        {
            "seed": 38,
            "guid": "efk311"
        }
    ]
}
What I am hoping to achieve:
{
    "port": 4500,
    "notes": "This is an example",
    "seeds": {
        "g__0": {
            "seed": 12,
            "guid": "eaf612"
        },
        "g__1": {
            "seed": 23,
            "guid": "bea143"
        },
        "g__2": {
            "seed": 38,
            "guid": "efk311"
        }
    }
}
Thanks!
The following jq program should do the trick. At least it produces the desired results for the given JSON. The program is so short and straightforward that I'll let it speak for itself:
def array2object(prefix):
  . as $in
  | reduce range(0;length) as $i ({}; .["\(prefix)\($i)"] = $in[$i]);
.seeds |= array2object("g__")
So, you essentially want to transpose (pivot) your data in the BigQuery table, so that instead of having the seed objects as repeated rows inside an array, you will have them as named columns.
Thus, my recommendation would be: first, load your data as is. Then, instead of doing the schema transformation outside of BigQuery, let's rather do it within BigQuery!
Below is an example of how to achieve the transformation you are looking for (assuming you have at most three items/objects in the array):
#standardSQL
SELECT
  port, notes,
  STRUCT(
    seeds[SAFE_OFFSET(0)] AS g__0,
    seeds[SAFE_OFFSET(1)] AS g__1,
    seeds[SAFE_OFFSET(2)] AS g__2
  ) AS seeds
FROM yourTable
You can test this with dummy data, using a CTE like the one below:
#standardSQL
WITH yourTable AS (
  SELECT
    4500 AS port, 'This is an example' AS notes,
    [STRUCT<seed INT64, guid STRING>
      (12, 'eaf612'), (23, 'bea143'), (38, 'efk311')
    ] AS seeds
  UNION ALL SELECT
    4501 AS port, 'This is an example 2' AS notes,
    [STRUCT<seed INT64, guid STRING>
      (42, 'eaf412'), (53, 'bea153')
    ] AS seeds
)
SELECT
  port, notes,
  STRUCT(
    seeds[SAFE_OFFSET(0)] AS g__0,
    seeds[SAFE_OFFSET(1)] AS g__1,
    seeds[SAFE_OFFSET(2)] AS g__2
  ) AS seeds
FROM yourTable
So, technically, if you know the max number of items/objects in the seeds array, you can just manually write the needed SQL statement and run it against your real data.
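If you are not sure what that maximum is, a quick check like this (against the same yourTable) should tell you:
#standardSQL
SELECT MAX(ARRAY_LENGTH(seeds)) AS max_items
FROM yourTable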
Hope you got the idea.
Of course, you can script/automate the process – you can find examples of similar pivoting tasks here:
https://stackoverflow.com/a/40766540/5221944
https://stackoverflow.com/a/42287566/5221944