Using the key from JSON in query - mysql

I'm new to MySQL and received a task that requires a query that is complex (for me). I read the documentation and a few other sources, but I still cannot write it myself.
I'm selecting rows from a table where one of the cells contains JSON like this:
{
[
{
"interval" : 2,
"start": 03,
"end": 07,
"day_of_week": 3
}, {
"interval" : 8,
"start": 22,
"end": 23,
"day_of_week": 6
}
]
}
I want to check whether any of the "day_of_week" values equals the current day of the week and, if so, store that value and the associated "start", "end" and "day_of_week" values in variables so I can use them in the query.

That's not valid JSON format, so none of the MySQL JSON functions will work on it regardless. Better just fetch the whole blob of not-JSON into a client application that knows how to parse it, and deal with it there.
Even if it were valid JSON, I would ask this: why would you store data in a format you don't know how to query?
The proper solution is to store each entry as its own row with ordinary columns and query it like this:
SELECT start, end, day_of_week
FROM mytable
WHERE day_of_week = DAYOFWEEK(CURDATE());
See how easy that is when you store data in normal rows and columns? You get to use ordinary SQL expressions, instead of wondering how you can trick MySQL into giving up the data buried in your non-JSON blob.
JSON is the worst thing to happen to relational databases.
Re your comment:
If you need to query by day of week, then you could reorganize your JSON to support that type of query:
{
"3":{
"interval" : 2,
"start": 3,
"end": 7,
"day_of_week": 3
},
"6": {
"interval" : 8,
"start": 22,
"end": 23,
"day_of_week": 6
}
}
Then it's possible to get results for the current weekday this way:
SELECT data->>'$.start' AS `start`,
data->>'$.end' AS `end`,
data->>'$.day_of_week' AS `day_of_week`
FROM (
SELECT JSON_EXTRACT(data, CONCAT('$."', DAYOFWEEK(CURDATE()), '"')) AS data
FROM mytable
) AS d;
In general, when you store data in a non-relational manner, the way to optimize it is to organize the data to support a specific query.
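The suggestion at the start of this answer, fetching the blob into a client application and handling it there, could look roughly like the Python sketch below. It assumes the stored text has been corrected to a valid JSON array (no outer braces, no leading zeros) and that "day_of_week" follows MySQL's DAYOFWEEK() numbering (1 = Sunday, 7 = Saturday). The names mysql_dayofweek and slots_for_day are made up for illustration.

```python
import json
from datetime import date


def mysql_dayofweek(d):
    """Map a date to MySQL's DAYOFWEEK() numbering: 1 = Sunday ... 7 = Saturday."""
    # isoweekday() is 1 = Monday ... 7 = Sunday, so shift by one with wrap-around.
    return d.isoweekday() % 7 + 1


def slots_for_day(blob, day):
    """Return every entry of the JSON array whose day_of_week equals `day`."""
    return [entry for entry in json.loads(blob) if entry["day_of_week"] == day]


# Assumed, corrected form of the value stored in the cell.
blob = """
[
  {"interval": 2, "start": 3,  "end": 7,  "day_of_week": 3},
  {"interval": 8, "start": 22, "end": 23, "day_of_week": 6}
]
"""

today = mysql_dayofweek(date.today())
for slot in slots_for_day(blob, today):
    print(slot["start"], slot["end"], slot["day_of_week"])
```

The matching values can then be passed back to MySQL as ordinary query parameters.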

Related

SQL query help in grouping

I have an SQL table with this schema:
time, country, active_users
If I just want to show the total number of active users over time, the simple select below will do that:
SELECT time, sum(active_users) as active_users GROUP BY time ORDER BY time
The returned data will look like this:
[{
"time": 1585878969,
"active_users": 2300
},....]
If I want active_users over time by country, then:
SELECT time, country, sum(active_users) as active_users GROUP BY time, country ORDER BY time, country
The returned data will look like this:
[{
"time": 1585878969,
"active_users": 1300,
"country": "India"
}, {
"time": 1585878969,
"active_users": 1000,
"country": "China"
}....]
I want data in the below format,
[{
"time": 1585878969,
"India": 1300,
"China": "1000"
}....]
Is it possible to create dynamic columns named after the values of one field, filled with the values of another field?
If such a thing is possible, what should the query be?
Other users may correct me, but I don't think MySQL can reshape a result like this on its own. MySQL always responds with a fixed set of named columns, so to get this response natively you would need an actual column per country, e.g. "China", holding that data.
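If the reshaping is done in the application instead, the rows returned by the second query can be pivoted after the fact. The following is only a sketch in Python: pivot_by_country is a hypothetical helper, and the sample rows are the ones from the question.

```python
def pivot_by_country(rows):
    """Merge rows sharing a `time` into one dict keyed by country name."""
    pivoted = {}
    for row in rows:
        # Create the output entry for this timestamp on first sight.
        entry = pivoted.setdefault(row["time"], {"time": row["time"]})
        entry[row["country"]] = row["active_users"]
    # Return the merged entries ordered by time.
    return [pivoted[t] for t in sorted(pivoted)]


rows = [
    {"time": 1585878969, "active_users": 1300, "country": "India"},
    {"time": 1585878969, "active_users": 1000, "country": "China"},
]

result = pivot_by_country(rows)
print(result)
```

Because the country names become keys at run time, this works for any number of countries without changing the query.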

Query ISODate on MongoDB with Google Sheets "dd-MM-yyyy HH:ss" as an output

I use a tool called Redash to query MongoDB (in JSON). In my collections, dates are stored in ISO format, so when my query result is imported into a sheet (with Google Sheets' importdata function), I have to convert it to the appropriate format with a formula in the sheet.
I would love to integrate this operation directly into my query, so that the date is sent to Sheets already in the appropriate "dd-MM-yyyy HH:ss" format.
Any ideas?
Many thanks.
You may be able to use the $dateToString aggregation operator inside a $project aggregation stage.
For example:
> db.test.find()
{ "_id": 0, "date": ISODate("2018-03-07T05:14:13.063Z"), "a": 1, "b": 2 }
> db.test.aggregate([
{$project: {
date: {$dateToString: {
format: '%d-%m-%Y %H:%M:%S',
date: '$date'
}},
a: '$a',
b: '$b'
}}
])
{ "_id": 0, "date": "07-03-2018 05:14:13", "a": 1, "b": 2 }
Note that although the $dateToString operator has been available since MongoDB 3.0, MongoDB 3.6 adds the capability to output the string according to a specific timezone.
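The format specifiers used above mean the same thing in Python's strftime, which can be handy for checking the expected string outside MongoDB. This is just an illustrative sketch; format_for_sheet is a made-up name and the sample value mirrors the document shown above.

```python
from datetime import datetime, timezone


def format_for_sheet(dt):
    """Format a datetime as day-month-year hours:minutes:seconds."""
    return dt.strftime("%d-%m-%Y %H:%M:%S")


# Same instant as ISODate("2018-03-07T05:14:13.063Z") in the example document.
sample = datetime(2018, 3, 7, 5, 14, 13, 63000, tzinfo=timezone.utc)
formatted = format_for_sheet(sample)
print(formatted)
```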

Regex Return First Match

I have a weather file where I would like to extract the first value for "air_temp" recorded in a JSON file. The format this HTTP retriever uses is regex (I know it is not the best method).
I've shortened the JSON file to 2 data entries for simplicity - there are usually 100.
{
"observations": {
"notice": [
{
"copyright": "Copyright Commonwealth of Australia 2017, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
"copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
"disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
"feedback_url": "http://www.bom.gov.au/other/feedback"
}
],
"header": [
{
"refresh_message": "Issued at 12:11 pm EST Tuesday 11 July 2017",
"ID": "IDN60901",
"main_ID": "IDN60902",
"name": "Canberra",
"state_time_zone": "NSW",
"time_zone": "EST",
"product_name": "Capital City Observations",
"state": "Aust Capital Territory"
}
],
"data": [
{
"sort_order": 0,
"wmo": 94926,
"name": "Canberra",
"history_product": "IDN60903",
"local_date_time": "11/12:00pm",
"local_date_time_full": "20170711120000",
"aifstime_utc": "20170711020000",
"lat": -35.3,
"lon": 149.2,
"apparent_t": 5.7,
"cloud": "Mostly clear",
"cloud_base_m": 1050,
"cloud_oktas": 1,
"cloud_type_id": 8,
"cloud_type": "Cumulus",
"delta_t": 3.6,
"gust_kmh": 11,
"gust_kt": 6,
"air_temp": 9.0,
"dewpt": 0.2,
"press": 1032.7,
"press_qnh": 1031.3,
"press_msl": 1032.7,
"press_tend": "-",
"rain_trace": "0.0",
"rel_hum": 54,
"sea_state": "-",
"swell_dir_worded": "-",
"swell_height": null,
"swell_period": null,
"vis_km": "10",
"weather": "-",
"wind_dir": "WNW",
"wind_spd_kmh": 7,
"wind_spd_kt": 4
},
{
"sort_order": 1,
"wmo": 94926,
"name": "Canberra",
"history_product": "IDN60903",
"local_date_time": "11/11:30am",
"local_date_time_full": "20170711113000",
"aifstime_utc": "20170711013000",
"lat": -35.3,
"lon": 149.2,
"apparent_t": 4.6,
"cloud": "Mostly clear",
"cloud_base_m": 900,
"cloud_oktas": 1,
"cloud_type_id": 8,
"cloud_type": "Cumulus",
"delta_t": 2.9,
"gust_kmh": 9,
"gust_kt": 5,
"air_temp": 7.3,
"dewpt": 0.1,
"press": 1033.1,
"press_qnh": 1031.7,
"press_msl": 1033.1,
"press_tend": "-",
"rain_trace": "0.0",
"rel_hum": 60,
"sea_state": "-",
"swell_dir_worded": "-",
"swell_height": null,
"swell_period": null,
"vis_km": "10",
"weather": "-",
"wind_dir": "NW",
"wind_spd_kmh": 4,
"wind_spd_kt": 2
}
]
}
}
The regex expression I am currently using is: .*air_temp": (\d+).* but this is returning 9 and 7.3 (entries 1 and 2). Could someone suggest a way to only return the first value?
I have tried using lazy quantifier group, but have had no luck.
This regex will help you, but I think you should capture and extract the first match using the features of the programming language you are working in.
.*air_temp": (\d{1,3}\.\d{0,3})[\s\S]*?},
Update
The above solution works only if you have two data entries. For more than two entries, use this one:
header[\s\S]*?"air_temp": (\d{1,3}\.\d{0,3})
Here we match the word header first and then match anything in a non-greedy way. After that, we match our expected pattern, and thus we get the first match.
To capture negative numbers, we need to allow an optional - character. We do this with ?, which means zero or one occurrence of the preceding element.
So the regex becomes:
header[\s\S]*?"air_temp": (-?\d{1,3}\.\d{0,3})
But the use of \K without the global flag (in another answer, given by mickmackusa) is more efficient. To detect negative numbers, the modified version of that regex is:
air_temp": \K-?\d{1,2}\.\d{1,2}
Here {1,2} means one or two occurrences of the preceding element. The general form is {min_occurrences,max_occurrences}.
I do not know which language you are using, but the difference appears to be whether the global flag is set.
If the global flag is not set, only the first result will be returned. If the global flag is set on your regex, it will iterate through the text and return all possible results. You can test it easily using Regex101: https://regex101.com/r/x1bwg2/1
Lazy versus greedy matching should not have any impact with regard to using or not using the global flag.
If \K is allowed in your coding language, use this:
/air_temp": \K[\d.]+/ (117 steps). This will be highly efficient in searching your very large JSON text.
If \K is not allowed, you can use a capture group:
/air_temp": ([\d.]+)/ This will still move with decent speed through your JSON text.
Notice that there is no global flag at the end of the pattern, so after one match, the regex engine stops searching.
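As an illustration of the "first match only" behaviour, here is a small Python sketch. Python's re module has no \K and no global flag; re.search simply stops at the first match, so the capture-group form of the pattern is used. The helper name first_air_temp and the shortened sample text are assumptions made for the example.

```python
import re

# Capture-group variant of the pattern; the value is in group 1.
FIRST_AIR_TEMP = re.compile(r'air_temp": ([\d.]+)')


def first_air_temp(text):
    """Return the first air_temp value as a string, or None if there is none."""
    match = FIRST_AIR_TEMP.search(text)  # search() stops after the first match
    return match.group(1) if match else None


sample = '{"data": [{"air_temp": 9.0, "dewpt": 0.2}, {"air_temp": 7.3, "dewpt": 0.1}]}'
print(first_air_temp(sample))
```

Using re.findall with the same pattern would behave like the global flag and return every value instead.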
Update:
For "less literal" matches (but it shouldn't matter if your source is reliable), you could use:
Extended character class to include -:
/air_temp": \K[\d.-]+/ #still 117 steps
or change to negated character class and match everything that isn't a , (because the value always terminates with a comma):
/air_temp": \K[^,]+/ #still 117 steps
For a very strict match (if you are looking for a pattern that means you have ZERO confidence in the input data)...
It appears that your data doesn't go beyond one decimal place, temps between 0 and 1 prepend a 0 before the decimal, and I don't think you need to worry with temps in the hundreds (right?), so you could use:
/air_temp": \K-?[1-9]?\d(?:\.\d)?/ #200 steps
Explanation:
Optional negative sign
Optional tens digit
Required ones digit
Optional decimal which must be followed by a digit
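The four parts listed above can be checked against a few hand-written values. This is a sketch in Python, so the \K is replaced by a capture group; the sample strings are invented for the test and do not come from the real feed.

```python
import re

# Strict pattern: optional sign, optional tens digit, ones digit, optional ".d".
STRICT_AIR_TEMP = re.compile(r'air_temp": (-?[1-9]?\d(?:\.\d)?)')


def strict_first_air_temp(text):
    """Return the first strictly formatted air_temp value, or None."""
    match = STRICT_AIR_TEMP.search(text)
    return match.group(1) if match else None


# Invented samples covering each optional part of the pattern.
samples = ['"air_temp": 9.0,', '"air_temp": -3.5,', '"air_temp": 0.4,', '"air_temp": 12,']
extracted = [strict_first_air_temp(s) for s in samples]
print(extracted)
```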

jq: Turn an array of objects into individual objects and use each array index as a new key

I have several large json objects (think GB scale), where the object values in some of the innermost levels are arrays of objects. I'm using jq 1.4 and I'm trying to break these arrays into individual objects, each of which will have a key such as g__0 or g__1, where the numbers correspond to the index in the original array, as returned by the keys function. The number of objects in each array may be arbitrarily large (in my example it is equal to 3). At the same time I want to keep the remaining structure.
For what it's worth the original structure comes from MongoDB, but I am unable to change it at this level. I will then use this json file to create a schema for BigQuery, where an example column will be seeds.g__1.guid and so on.
What I have:
{
"port": 4500,
"notes": "This is an example",
"seeds": [
{
"seed": 12,
"guid": "eaf612"
},
{
"seed": 23,
"guid": "bea143"
},
{
"seed": 38,
"guid": "efk311"
}
]
}
What I am hoping to achieve:
{
"port": 4500,
"notes": "This is an example",
"seeds": {
"g__0": {
"seed": 12,
"guid": "eaf612"
},
"g__1": {
"seed": 23,
"guid": "bea143"
},
"g__2": {
"seed": 38,
"guid": "efk311"
}
}
}
Thanks!
The following jq program should do the trick. At least it produces the desired results for the given JSON. The program is so short and straightforward that I'll let it speak for itself:
def array2object(prefix):
. as $in
| reduce range(0;length) as $i ({}; .["\(prefix)\($i)"] = $in[$i]);
.seeds |= array2object("g__")
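For readers who prefer to see the same transformation outside jq, here is a rough Python equivalent. It is only a sketch for documents that fit in memory (the GB-scale files from the question would need a streaming approach), and array_to_object is a hypothetical name.

```python
def array_to_object(items, prefix):
    """Turn a list into a dict keyed by prefix + original index."""
    return {f"{prefix}{index}": item for index, item in enumerate(items)}


doc = {
    "port": 4500,
    "notes": "This is an example",
    "seeds": [
        {"seed": 12, "guid": "eaf612"},
        {"seed": 23, "guid": "bea143"},
        {"seed": 38, "guid": "efk311"},
    ],
}

# Replace the array in place; the rest of the structure is left untouched.
doc["seeds"] = array_to_object(doc["seeds"], "g__")
print(doc)
```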
So, you essentially want to transpose (pivot) your data in the BigQuery table so that the array items end up in columns rather than rows.
My recommendation would be:
First, load your data as is.
Then, instead of doing the schema transformation outside of BigQuery, do it within BigQuery.
Below is an example of how to achieve the transformation you are looking for (assuming you have at most three items/objects in the array):
#standardSQL
SELECT
port, notes,
STRUCT(
seeds[SAFE_OFFSET(0)] AS g__0,
seeds[SAFE_OFFSET(1)] AS g__1,
seeds[SAFE_OFFSET(2)] AS g__2
) AS seeds
FROM yourTable
You can test this with dummy data using a CTE, as below:
#standardSQL
WITH yourTable AS (
SELECT
4500 AS port, 'This is an example' AS notes,
[STRUCT<seed INT64, guid STRING>
(12, 'eaf612'), (23, 'bea143'), (38, 'efk311')
] AS seeds
UNION ALL SELECT
4501 AS port, 'This is an example 2' AS notes,
[STRUCT<seed INT64, guid STRING>
(42, 'eaf412'), (53, 'bea153')
] AS seeds
)
SELECT
port, notes,
STRUCT(
seeds[SAFE_OFFSET(0)] AS g__0,
seeds[SAFE_OFFSET(1)] AS g__1,
seeds[SAFE_OFFSET(2)] AS g__2
) AS seeds
FROM yourTable
So, technically, if you know the maximum number of items/objects in the seeds array, you can just write the needed SQL statement manually and run it against the real data.
Hope you get the idea.
Of course, you can script/automate the process. You can find examples for similar pivoting tasks here:
https://stackoverflow.com/a/40766540/5221944
https://stackoverflow.com/a/42287566/5221944
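One simple way to script this, shown purely as a sketch, is to generate the statement text from the maximum array length. The function build_pivot_sql and its parameters are made up for illustration, and the output is meant to be reviewed before it is run against real data.

```python
def build_pivot_sql(table, max_items, prefix="g__"):
    """Build the STRUCT-based pivot query for arrays of up to max_items elements."""
    fields = ",\n".join(
        f"    seeds[SAFE_OFFSET({i})] AS {prefix}{i}" for i in range(max_items)
    )
    return (
        "#standardSQL\n"
        "SELECT\n"
        "  port, notes,\n"
        "  STRUCT(\n"
        f"{fields}\n"
        "  ) AS seeds\n"
        f"FROM {table}"
    )


sql = build_pivot_sql("yourTable", 3)
print(sql)
```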

Store a big array in the MySQL table

I want to create a table stops for all stops with these columns: id, stop_name, stop_lat, stop_long, route, arrivaltime. But I don't know how to store arrivaltime in the table, since this column is a big array.
Like this:
{
"id": 1,
"stops_name": "Amersham ",
"arrival_time": {
"mon-fri": [ "05:38", "06:07","06:37",.....50 entries],
"sat": ["05:34","06:01","06:31",...........50 entries],
"son": ["06:02","06:34","07:04",...........50 entries]
},
"stops_lat": 83.837994,
"stops_long": 18.700423
}
Is that manageable with MySQL?
Generally speaking you would split the "arrival times" out into a new table, referencing back to the table of stops. You would also generally store each time as a single row, and then select the entire collection of rows.
This works best because it lets you query on the 'time' column and search for time ranges, etc and only get the relevant rows.
For the "day", I would most likely use a SET column, which can hold one or more values. Also consider that you will likely need to store info on public holidays or other special dates as well:
https://dev.mysql.com/doc/refman/5.6/en/set.html
Stops: id, stops_name, stops_lat, stops_long (1, "Amersham", 83.837994, 18.700423)
Stops_arrivals: id, stops_id, day, time (1, 1, "Mon", "05:38"), (2, 1, "Mon", "06:07"), etc
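To make the two-table layout concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, so the column types are SQLite's and the "day" column is plain text rather than a SET. The table and column names follow the outline above, and the inserted times are a few of those from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real MySQL connection
conn.executescript("""
CREATE TABLE stops (
    id INTEGER PRIMARY KEY,
    stops_name TEXT,
    stops_lat REAL,
    stops_long REAL
);
CREATE TABLE stops_arrivals (
    id INTEGER PRIMARY KEY,
    stops_id INTEGER REFERENCES stops(id),
    day TEXT,
    time TEXT
);
""")

conn.execute("INSERT INTO stops VALUES (1, 'Amersham', 83.837994, 18.700423)")
# One row per arrival time instead of one big array per stop.
conn.executemany(
    "INSERT INTO stops_arrivals (stops_id, day, time) VALUES (?, ?, ?)",
    [(1, "Mon", "05:38"), (1, "Mon", "06:07"), (1, "Mon", "06:37"), (1, "Sat", "05:34")],
)

# Query a time range for one stop and day; zero-padded HH:MM strings sort correctly.
rows = conn.execute(
    "SELECT time FROM stops_arrivals "
    "WHERE stops_id = ? AND day = ? AND time BETWEEN ? AND ? ORDER BY time",
    (1, "Mon", "06:00", "07:00"),
).fetchall()
print(rows)
```

This is the kind of range query that becomes awkward when all times for a stop are stored as a single array value.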