Spark DDL Schema JSON Struct - json

Question
I am trying to define a nested JSON schema in PySpark, but I cannot get the ddl_schema string to work.
In SQL this would usually be ROW. I have tried STRUCT below, but I can't get the data type right. This is the error:
ParseException:
mismatched input '(' expecting {<EOF>, ',', 'COMMENT', NOT}(line 6, pos 15)
== SQL ==
driverId INT,
driverRef STRING,
number STRING,
code STRING,
name STRUCT(forename STRING, surname STRING),
---------------^^^
dob DATE,
nationality STRING,
url STRING
Data Sample
+--------+----------+------+----+--------------------+----------+-----------+--------------------+
|driverId| driverRef|number|code|                name|       dob|nationality|                 url|
+--------+----------+------+----+--------------------+----------+-----------+--------------------+
|       1|  hamilton|    44| HAM|   {Lewis, Hamilton}|1985-01-07|    British|http://en.wikiped...|
Code Sample
mnt = "/mnt/dev/root"
env = "raw"
path = "formula1/drivers"
fileFormat = "json"
inPath = f"{mnt}/{env.upper()}/{path}.{fileFormat}"
options = {'header': 'True'}
ddl_schema = """
driverId INT,
driverRef STRING,
number STRING,
code STRING,
name STRUCT(forename STRING, surname STRING),
dob DATE,
nationality STRING,
url STRING
"""
drivers_df = (spark
    .read
    .options(**options)
    .schema(ddl_schema)
    .format(fileFormat)
    .load(inPath)
)

You are using the wrong syntax for STRUCT: Spark's DDL syntax puts the field list in angle brackets, not parentheses.
Here is the right one:
name STRUCT<forename:STRING,surname:STRING>
https://spark.apache.org/docs/latest/sql-ref-datatypes.html
(search for Complex types and choose the SQL tab)
Data type | SQL name
BooleanType | BOOLEAN
ByteType | BYTE, TINYINT
ShortType | SHORT, SMALLINT
IntegerType | INT, INTEGER
LongType | LONG, BIGINT
FloatType | FLOAT, REAL
DoubleType | DOUBLE
DateType | DATE
TimestampType | TIMESTAMP
StringType | STRING
BinaryType | BINARY
DecimalType | DECIMAL, DEC, NUMERIC
YearMonthIntervalType | INTERVAL YEAR, INTERVAL YEAR TO MONTH, INTERVAL MONTH
DayTimeIntervalType | INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO MINUTE, INTERVAL DAY TO SECOND, INTERVAL HOUR, INTERVAL HOUR TO MINUTE, INTERVAL HOUR TO SECOND, INTERVAL MINUTE, INTERVAL MINUTE TO SECOND, INTERVAL SECOND
ArrayType | ARRAY<element_type>
StructType | STRUCT<field1_name: field1_type, field2_name: field2_type, …> (note: ':' is optional)
MapType | MAP<key_type, value_type>
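If the schema string is built programmatically, the nesting rule above (STRUCT<field: TYPE, ...> with angle brackets) is easy to encode. This is a minimal sketch, assuming the schema is described as (name, type) pairs; to_ddl and render_type are hypothetical helpers, not part of PySpark.

```python
# Hypothetical helpers: render (name, type) pairs as a Spark DDL column list.
# A nested list of pairs becomes STRUCT<field: TYPE, ...> (angle brackets,
# not parentheses, which is what triggered the ParseException).
from typing import List, Tuple, Union

FieldType = Union[str, list]


def render_type(field_type: FieldType) -> str:
    """Return the DDL text for one column type, recursing into nested fields."""
    if isinstance(field_type, str):
        return field_type
    inner = ", ".join(f"{name}: {render_type(sub)}" for name, sub in field_type)
    return f"STRUCT<{inner}>"


def to_ddl(fields: List[Tuple[str, FieldType]]) -> str:
    """Join top-level columns as 'name TYPE', one per line, comma-separated."""
    return ",\n".join(f"{name} {render_type(t)}" for name, t in fields)


drivers_fields = [
    ("driverId", "INT"),
    ("driverRef", "STRING"),
    ("number", "STRING"),
    ("code", "STRING"),
    ("name", [("forename", "STRING"), ("surname", "STRING")]),
    ("dob", "DATE"),
    ("nationality", "STRING"),
    ("url", "STRING"),
]

ddl_schema = to_ddl(drivers_fields)
print(ddl_schema)
```

The resulting string can be passed to .schema(ddl_schema) in the same way as in the question's reader code.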

Related

Week date range

There is a table, objects, that stores data on real estate objects. I need to use a query to calculate a new field that will display the date range from Monday to Sunday which includes the date the object was created (for example, “2020-11-16 - 2020-11-22”).
create table objects(
object_id int NOT NULL PRIMARY KEY ,
city_id int not null ,
price int ,
area_total int ,
status varchar(50) ,
class varchar(50) ,
action varchar(50) ,
date_create timestamp,
FOREIGN KEY(city_id) references avg_price_square_city(city_id)
);
Data in the table:
INSERT INTO objects (object_id, city_id, price, area_total, status, class, action, date_create)
VALUES (1, 1, 4600000, 72, 'active', 'Secondary', 'Sale', '2022-05-12 21:49:34');
INSERT INTO objects (object_id, city_id, price, area_total, status, class, action, date_create)
VALUES (2, 2, 5400000, 84, 'active', 'Secondary', 'Sale', '2022-05-19 21:49:35');
The query should display two fields: the object number and the range that includes the date it was created. How can this be done?
P.S
I wrote this query, but MySQL complains about the "-" sign:
SET @WeekRangeStart ='2022/05/10';
SET @WeekRangeEnd = '2022/05/17';
select object_id,@range := @WeekRangeStart '-' @WeekRangeEnd
FROM objects where @range = @WeekRangeStart and date_create between @WeekRangeStart and @WeekRangeEnd
UNION
select object_id,@range from objects where @`range` = @WeekRangeEnd;
Error:[42000][1064] You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '@WeekRangeEnd FROM objects where @range = @WeekRangeStart and date_create betwee' at line 1
I want the query to return:
object_id | @range
1 | 2022/05/10 - 2022/05/17
The @range column must contain the range that includes the date in date_create.
SET @WeekRangeStart = CAST('2022/05/10' as DATE);
SET @WeekRangeEnd = CAST('2022/05/17' as DATE);
SET @range = CONCAT(@WeekRangeStart,' - ',@WeekRangeEnd) ;
-- select @range;
select
object_id,
@range
FROM objects
where DATE(date_create) between @WeekRangeStart and @WeekRangeEnd
UNION
select object_id,@range from objects
;
This gives the following result:
object_id | @range
1 | 2022-05-10 - 2022-05-17
2 | 2022-05-10 - 2022-05-17
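This result is effectively the output of the SELECT after the UNION, which has no WHERE clause and therefore returns every row. Object 2 would not be returned otherwise, because its date_create (2022-05-19) is not between @WeekRangeStart and @WeekRangeEnd; object 1 is returned by both SELECTs, and UNION removes the duplicate.
You should take some time to read the UNION documentation.
The variable @range is calculated before the SELECT statement runs, so it holds the same constant value for every row.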
see: DBFIDDLE
NOTE: You should try to use the same date format everywhere, and not mix dates like '2022-05-19 21:49:35' and 2022/05/10. Use - OR use /, but do not mix them.
EDIT: After the clarification "...to calculate a new field that will display the date range from Monday to Sunday,...":
You probably wanted to do:
SET @WeekDate = CAST('2022/05/10' as DATETIME);
-- WEEKDAY() returns 0 for Monday ... 6 for Sunday, so Sunday dates stay in the same week
SELECT
ADDDATE(@WeekDate, -WEEKDAY(@WeekDate)) as Monday,
DATE_ADD(ADDDATE(@WeekDate, 7 - WEEKDAY(@WeekDate)), INTERVAL -1 SECOND) as Sunday;
output:
Monday | Sunday
2022-05-09 00:00:00 | 2022-05-15 23:59:59
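To produce the string the question asks for (one Monday-to-Sunday range per row), the date arithmetic can be sketched in Python. This assumes the week starts on Monday; Python's date.weekday(), like MySQL's WEEKDAY(), returns 0 for Monday through 6 for Sunday. week_range is a hypothetical name.

```python
# Sketch: the Monday..Sunday range containing a timestamp, formatted as text.
from datetime import datetime, timedelta


def week_range(ts: datetime) -> str:
    """Return 'YYYY-MM-DD - YYYY-MM-DD' for the Monday-to-Sunday week of ts."""
    day = ts.date()                                  # drop the time of day
    monday = day - timedelta(days=day.weekday())     # weekday(): Monday == 0
    sunday = monday + timedelta(days=6)
    return f"{monday.isoformat()} - {sunday.isoformat()}"


# Sample rows (object_id, date_create) from the objects table in the question.
objects = [
    (1, datetime(2022, 5, 12, 21, 49, 34)),
    (2, datetime(2022, 5, 19, 21, 49, 35)),
]

for object_id, date_create in objects:
    print(object_id, week_range(date_create))
```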

Calculate date of holiday using a factor table to add/subtract days from a specific date

DB-Fiddle
CREATE TABLE holidays (
id int primary key,
name VARCHAR(255),
calc_type VARCHAR(255),
calc_factor VARCHAR(255)
);
INSERT INTO holidays
(id, name, calc_type, calc_factor
)
VALUES
("1", "Holdiay_01", "fixed", "0"),
("2", "Holdiay_02", "fixed", "0"),
("3", "Holdiay_03", "fixed", "0"),
("4", "Holdiay_04", "moveable", "10"),
("5", "Holdiay_05", "moveable", "-5");
I want to use the table above to calculate the date of each holiday in an SQL query.
The holidays are divided in two different calculation types:
fixed = same date every year
(YYYY-03-01, YYYY-05-12, YYYY-08-09)
moveable = date is calculated by adding/subtracting a pre-defined amount of days (calc_factor) from a fixed date:
(YYYY-11-12) + 10, (YYYY-11-12) - 5
In the end the result should look like this:
id name date
1 Holiday_01 2019-03-01
2 Holiday_02 2019-05-12
3 Holiday_03 2019-08-09
4 Holiday_04 2019-11-22
5 Holiday_05 2019-11-07
I tried to go with something like this but could not make it work so far:
SELECT
id,
name,
CASE (
WHEN name = "Holdiay_01" THEN DATE_ADD(YEAR(CURDATE()) & MONTH(3) & DAY(1), INTERVAL calc_factor)
WHEN name = "Holdiay_02" THEN DATE_ADD(YEAR(CURDATE()) & MONTH(5) & DAY(12), INTERVAL calc_factor)
WHEN name = "Holdiay_03" THEN DATE_ADD(YEAR(CURDATE()) & MONTH(9) & DAY(8), INTERVAL calc_factor)
WHEN name = "Holdiay_04" THEN DATE_ADD(YEAR(CURDATE()) & MONTH(11) & DAY(12), INTERVAL calc_factor)
WHEN name = "Holdiay_05" THEN DATE_ADD(YEAR(CURDATE()) & MONTH(11) & DAY(12), INTERVAL calc_factor)
ELSE NULL END ) AS date
FROM holidays;
How do I have to modify my query to get the expected result?
MySQL has the CONCAT() function, which concatenates one or more string arguments; here it builds a 'YYYY-MM-DD' string from the current year and a fixed month and day.
Here is the corrected query:
SELECT
id,
name,
CASE
WHEN name = 'Holdiay_01' THEN DATE_ADD((CONCAT(YEAR(CURDATE()),'-03-01')),
INTERVAL (calc_factor) DAY)
WHEN name = 'Holdiay_02' THEN DATE_ADD((CONCAT(YEAR(CURDATE()),'-05-12')),
INTERVAL (calc_factor) DAY)
WHEN name = 'Holdiay_03' THEN DATE_ADD((CONCAT(YEAR(CURDATE()),'-08-09')),
INTERVAL (calc_factor) DAY)
WHEN name = 'Holdiay_04' THEN DATE_ADD((CONCAT(YEAR(CURDATE()),'-11-12')),
INTERVAL (calc_factor) DAY)
WHEN name = 'Holdiay_05' THEN DATE_ADD((CONCAT(YEAR(CURDATE()),'-11-12')),
INTERVAL (calc_factor) DAY)
ELSE NULL END AS date
FROM holidays;
You can also check it in DB-Fiddle.
You have a rather perverse data model: a row describes a holiday but not its date. That said, the following does what you want after fixing the holiday names:
SELECT id, name,
(CASE WHEN name = 'Holiday_01' THEN DATE_ADD(CONCAT(YEAR(CURDATE()), '-03-01'), INTERVAL calc_factor DAY)
WHEN name = 'Holiday_02' THEN DATE_ADD(CONCAT(YEAR(CURDATE()), '-05-12'), INTERVAL calc_factor DAY)
WHEN name = 'Holiday_03' THEN DATE_ADD(CONCAT(YEAR(CURDATE()), '-08-09'), INTERVAL calc_factor DAY)
WHEN name = 'Holiday_04' THEN DATE_ADD(CONCAT(YEAR(CURDATE()), '-11-12'), INTERVAL calc_factor DAY)
WHEN name = 'Holiday_05' THEN DATE_ADD(CONCAT(YEAR(CURDATE()), '-11-12'), INTERVAL calc_factor DAY)
END) as date
FROM holidays;
Here is the corresponding db<>fiddle.
Some issues with your code apart from the misspelt names:
& is a bitwise AND operator. You appear to want to construct a date, so CONCAT() is appropriate.
INTERVAL requires a time unit.
MONTH() and DAY() are functions that extract those components from a date. An argument such as "3" is converted to a date, and because it is not a valid date, the returned value is NULL.
As a matter of simplification, ELSE NULL is superfluous, because CASE expressions return NULL if no conditions match.
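The rule both answers implement (a base date in the current year, plus calc_factor days for moveable holidays) can be sketched in Python. The holidays table does not store the base month and day, so BASE_DATES is a hypothetical mapping taken from the expected output in the question, and the names are spelled Holiday_0x.

```python
# Sketch: fixed vs. moveable holidays. BASE_DATES is a hypothetical mapping,
# because the holidays table itself does not store the anchor date.
from datetime import date, timedelta

# (id, name, calc_type, calc_factor) as in the holidays table, names corrected.
HOLIDAYS = [
    (1, "Holiday_01", "fixed", 0),
    (2, "Holiday_02", "fixed", 0),
    (3, "Holiday_03", "fixed", 0),
    (4, "Holiday_04", "moveable", 10),
    (5, "Holiday_05", "moveable", -5),
]

# Hypothetical (month, day) anchor for each holiday id.
BASE_DATES = {1: (3, 1), 2: (5, 12), 3: (8, 9), 4: (11, 12), 5: (11, 12)}


def holiday_dates(year: int):
    """Yield (id, name, date): the anchor date in `year`, shifted if moveable."""
    for hid, name, calc_type, factor in HOLIDAYS:
        month, day = BASE_DATES[hid]
        offset = factor if calc_type == "moveable" else 0
        yield hid, name, date(year, month, day) + timedelta(days=offset)


for row in holiday_dates(2019):
    print(row)
```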

Split the date and time from string in codesys v 3

I'm working in CODESYS. I have a variable that holds a DATE_AND_TIME value, and I want to split it into its date and time.
currentTime: DATE_AND_TIME;
It shows a value like this:
DT#2019-08-06-10:06:53
After a CONCAT, I convert the currentTime variable into a string.
Now I want to split out the date and time values:
Time: 10:06:53
Date: 2019-08-06
Please provide the declaration and implementation parts.
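Declaration part:
VAR
    dtDateAndTime : DATE_AND_TIME;
    sDateAndTime : STRING;
    sDate : STRING;
    sTime : STRING;
END_VAR
Implementation part:
// e.g. 'DT#2019-08-06-10:06:53'
sDateAndTime := DT_TO_STRING(dtDateAndTime);
// MID(STR, LEN, POS): 10 characters starting at position 4 -> '2019-08-06'
sDate := MID(sDateAndTime, 10, 4);
// last 8 characters -> '10:06:53'
sTime := RIGHT(sDateAndTime, 8);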
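To see why those arguments select the right characters, here is the same slicing emulated in Python. mid and right are hypothetical helpers written to follow the CODESYS argument order MID(STR, LEN, POS) and RIGHT(STR, SIZE), with 1-based positions as in Structured Text.

```python
# Sketch: emulate CODESYS MID/RIGHT on the DT_TO_STRING output from the question.


def mid(s: str, length: int, pos: int) -> str:
    """Like MID(STR, LEN, POS): `length` characters starting at 1-based `pos`."""
    return s[pos - 1:pos - 1 + length]


def right(s: str, length: int) -> str:
    """Like RIGHT(STR, SIZE): the last `length` characters."""
    return s[-length:] if length > 0 else ""


s_date_and_time = "DT#2019-08-06-10:06:53"   # 'DT#' prefix occupies positions 1..3
s_date = mid(s_date_and_time, 10, 4)         # positions 4..13
s_time = right(s_date_and_time, 8)
print(s_date, s_time)
```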

Random number using Date() in Expression Builder

I want to generate a random number using Date() and format it like, for example, ddmmyyyyHhNnSs.
How do I achieve that? Is that even possible?
I was hoping to do it the easy way with the Expression Builder, but every approach I try fails ;)
This number needs to populate the txtID field, and it will be the unique identifier for each database entry.
That's quite easy - with a twist.
The problem is that Rnd returns a Single, and its resolution allows for only 10000000 unique values. As you request a resolution of one second, and there are 86400 seconds per day, that leaves a span of only 115.74 days, while the range of Date spans 3615899 days:
TotalDays = -CLng(#1/1/100#) + CLng(#12/31/9999#)
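These figures can be checked with a short Python sketch. It assumes VBA's Date serial 0 is 1899-12-30 (so CLng of a date-only value is its day count from that epoch) and uses date.toordinal() to reproduce it; vba_serial is a hypothetical helper.

```python
# Sketch: reproduce CLng(#12/31/9999#), CLng(#1/1/100#) and the 115.74-day span.
from datetime import date

VBA_EPOCH = date(1899, 12, 30)   # VBA Date serial 0


def vba_serial(d: date) -> int:
    """Whole-day VBA Date serial, i.e. what CLng returns for a date-only value."""
    return d.toordinal() - VBA_EPOCH.toordinal()


total_days = vba_serial(date(9999, 12, 31)) - vba_serial(date(100, 1, 1))
single_span_days = 10_000_000 / 86_400   # days covered at one-second resolution
print(total_days, round(single_span_days, 2))
```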
To overcome this, use Rnd twice, which will result in 1E+15 possible values, or 11574074074 days, way beyond what's needed:
RandomDouble = Rnd * Rnd
Now, to limit the possible values to fit into the range of data type Date, just follow the documentation:
RandomValue = (UpperValue - LowerValue) * Rnd + LowerValue
and apply the date values:
RandomDouble = (CLng(#12/31/9999#) - CLng(#1/1/100#)) * Rnd * Rnd + CLng(#1/1/100#)
This, however, will result in values containing unwanted milliseconds, so perform the proper conversion to a Date value using CDate, which will round to the nearest second. That gives the final expression:
RandomDate = CDate((CLng(#12/31/9999#) - CLng(#1/1/100#)) * Rnd * Rnd + CLng(#1/1/100#))
Use the value as is if your field is of data type Date or, if it is text, apply a format to it with Format(RandomDate, "yyyymmddhhnnss"). A sample output is:
01770317032120
01390126010945
50140322081227
35290813165627
09330527072433
20560513105943
61810505124235
09381019130230
17010527033132
08310306233911
If you want numeric values, use CDec to convert (CLng will fail because of overflow):
RandomNumber = CDec(Format(RandomDate, "yyyymmddhhnnss"))
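CLng overflows because a VBA Long is a 32-bit signed integer (at most 2,147,483,647), while a 14-digit timestamp is on the order of 10^13. A quick Python check, using an arbitrary example timestamp; a Double represents integers exactly up to 2^53, so a 14-digit value also fits there without loss.

```python
# Sketch: compare a 14-digit ddmmyyyyhhnnss value with the Long and Double limits.
from datetime import datetime

LONG_MAX = 2**31 - 1    # 2,147,483,647, the largest VBA Long
DOUBLE_EXACT = 2**53    # integers up to this are exactly representable in a Double

stamp = datetime(2022, 5, 12, 21, 49, 34)     # arbitrary example timestamp
digits = f"{stamp:%d%m%Y%H%M%S}"              # ddmmyyyyhhnnss
value = int(digits)
print(digits, value > LONG_MAX, value < DOUBLE_EXACT)
```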
All said, I'm with @Bohemian: if you just want a unique timestamp and have fewer than one transaction per second, just use data type Date for your field and use Now:
TimeStamp = Now()
and apply a format to this of yyyymmddhhnnss.
However, multiplying random numbers together alters the probability distribution (see: Uniform Product Distribution).
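A short simulation illustrates the effect, with Python's random module standing in for Rnd (an assumption; the point is the distribution, not Rnd itself). The product of two independent uniform(0, 1) draws has mean 1/4 rather than 1/2, and about 85% of its values fall below 0.5 (P(XY < t) = t - t ln t).

```python
# Sketch: a single uniform draw vs. the product of two uniform draws.
import random

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 100_000
single = [rng.random() for _ in range(n)]
product = [rng.random() * rng.random() for _ in range(n)]

mean_single = sum(single) / n
mean_product = sum(product) / n
# Share of values below 0.5: about 0.5 for one draw, about 0.847 for the product.
below_half_single = sum(x < 0.5 for x in single) / n
below_half_product = sum(x < 0.5 for x in product) / n
print(round(mean_single, 3), round(mean_product, 3))
print(round(below_half_single, 3), round(below_half_product, 3))
```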
Thus, a better method is to create a random date, then a random time, and possibly a random count of milliseconds. I wrote above that CDate rounds a value to the nearest second; it doesn't. Only when Access displays a date/time with milliseconds is the displayed value rounded to the second.
So I modified the function to take care of this:
Public Function DateRandom( _
Optional ByVal UpperDate As Date = #12/31/9999#, _
Optional ByVal LowerDate As Date = #1/1/100#, _
Optional ByVal DatePart As Boolean = True, _
Optional ByVal TimePart As Boolean = True, _
Optional ByVal MilliSecondPart As Boolean = False) _
As Date
' Generates a random date/time - optionally within the range of LowerDate and/or UpperDate.
' Optionally, return value can be set to include date and/or time and/or milliseconds.
'
' 2015-08-28. Gustav Brock, Cactus Data ApS, CPH.
' 2015-08-29. Modified for uniform distribution as suggested by Stuart McLachlan by
' combining a random date and a random time.
' 2015-08-30. Modified to return selectable and rounded value parts for
' Date, Time, and Milliseconds.
' 2015-08-31. An initial call of Randomize is included to prevent identical sequences.
Const SecondsPerDay As Long = 60& * 60& * 24&
Dim DateValue As Date
Dim TimeValue As Date
Dim MSecValue As Date
' Shuffle the start position of the sequence of Rnd.
Randomize
' If all parts are deselected, select date and time.
If Not DatePart And Not TimePart And Not MilliSecondPart = True Then
DatePart = True
TimePart = True
End If
If DatePart = True Then
' Remove time parts from UpperDate and LowerDate as well from the result value.
' Add 1 to include LowerDate as a possible return value.
DateValue = CDate(Int((Int(UpperDate) - Int(LowerDate) + 1) * Rnd) + Int(LowerDate))
End If
If TimePart = True Then
' Calculate a time value rounded to the second.
TimeValue = CDate(Int(SecondsPerDay * Rnd) / SecondsPerDay)
End If
If MilliSecondPart = True Then
' Calculate a millisecond value rounded to the millisecond.
MSecValue = CDate(Int(1000 * Rnd) / 1000 / SecondsPerDay)
End If
DateRandom = DateValue + TimeValue + MSecValue
End Function
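For comparison, the same date-plus-time idea can be sketched in Python. It is not a line-by-line translation of DateRandom: it omits the millisecond option, and random_datetime and ddmmyyyyhhnnss are hypothetical names. The year is zero-padded explicitly because strftime's %Y padding for years below 1000 varies by platform.

```python
# Sketch: draw a uniform whole day, then a uniform second of that day,
# instead of multiplying two random numbers.
import random
from datetime import date, datetime, time, timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def random_datetime(rng: random.Random,
                    lower: date = date(100, 1, 1),
                    upper: date = date(9999, 12, 31)) -> datetime:
    """Uniform random datetime from lower 00:00:00 to upper 23:59:59."""
    day_offset = rng.randrange((upper - lower).days + 1)   # +1 includes `upper`
    second = rng.randrange(SECONDS_PER_DAY)
    start = datetime.combine(lower, time())
    return start + timedelta(days=day_offset, seconds=second)


def ddmmyyyyhhnnss(dt: datetime) -> str:
    """Format like Access's "ddmmyyyyhhnnss", with a 4-digit zero-padded year."""
    return (f"{dt.day:02d}{dt.month:02d}{dt.year:04d}"
            f"{dt.hour:02d}{dt.minute:02d}{dt.second:02d}")


rng = random.Random()   # pass a seed, e.g. random.Random(1), for repeatable output
for _ in range(3):
    print(ddmmyyyyhhnnss(random_datetime(rng)))
```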
Format Now() and convert it to a number. A 14-digit value overflows a Long, so use CDbl rather than CLng:
select CDbl(format(now(), 'ddmmyyyyhhnnss')) as txnId
Although this is not "random", it is unique as long as there is never more than one transaction per second (which the asker confirmed in a comment).

Error in plpgsql function: array value must start with "{" or dimension information

I'm trying to format the results of this query:
CREATE OR REPLACE FUNCTION "alarmEventList"(sampleid integer
, starttime timestamp without time zone
, stoptime timestamp without time zone)
RETURNS text[] AS
$BODY$DECLARE
result text[];
BEGIN
select into result array_agg(res::text)
from (
select to_char("Timestamp", 'YYYY-MM-DD HH24:MI:SS')
,"AlertLevel"
,"Timestamp" - lag("Timestamp") over (order by "Timestamp")
from "Judgements"
WHERE "SampleID" = sampleid
and "Timestamp" >= starttime
and "Timestamp" <= stoptime
) res
where "AlertLevel" > 0;
select into result array_to_string(result,',');
return result;
END
$BODY$
LANGUAGE plpgsql VOLATILE
Right now without array_to_string() I get something like this:
{"(\"2013-10-16 15:10:40\",1,00:00:00)","(\"2013-10-16 15:11:52\",1,00:00:48)"}
and I want something like this:
2013-10-16 15:10:40,1,00:00:00 | 2013-10-16 15:11:52,1,00:00:48 |
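But when I run the query, I get this error:
array value must start with "{" or dimension information
You do not actually want an array type, but a string representation. array_to_string() returns text, and assigning that text back to the text[] variable result makes PostgreSQL parse it as an array literal, which fails because the string does not start with "{".
Returning text instead can be achieved like this: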
CREATE OR REPLACE FUNCTION "alarmEventList"(sampleid integer
, starttime timestamp
, stoptime timestamp
, OUT result text) AS
$func$
BEGIN
SELECT INTO result string_agg(concat_ws(','
,to_char("Timestamp", 'YYYY-MM-DD HH24:MI:SS')
,"AlertLevel"
,"Timestamp" - ts_lag)
, ' | ')
FROM (
SELECT "Timestamp"
,"AlertLevel"
,lag("Timestamp") OVER (ORDER BY "Timestamp") AS ts_lag
FROM "Judgements"
WHERE "SampleID" = sampleid
AND "Timestamp" >= starttime
AND "Timestamp" <= stoptime
) res
WHERE "AlertLevel" > 0;
END
$func$ LANGUAGE plpgsql
The manual on string_agg() and concat_ws().
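To see what the rewritten function returns, here is a Python emulation of concat_ws() and string_agg() applied to the two rows from the question. It mirrors the documented NULL handling (concat_ws skips NULL arguments; string_agg ignores NULL inputs and returns NULL when there is nothing to aggregate); the helper names are hypothetical.

```python
# Sketch: emulate string_agg(concat_ws(',', ...), ' | ') on sample rows.
from typing import Iterable, Optional, Sequence


def concat_ws(sep: str, *args: Optional[object]) -> str:
    """Join the non-None arguments as text, like PostgreSQL concat_ws()."""
    return sep.join(str(a) for a in args if a is not None)


def string_agg(values: Iterable[Optional[str]], delimiter: str) -> Optional[str]:
    """Join the non-None values, or return None if there are none."""
    kept = [v for v in values if v is not None]
    return delimiter.join(kept) if kept else None


# Rows shaped like the subquery output: (timestamp text, AlertLevel, interval text).
rows: Sequence[tuple] = [
    ("2013-10-16 15:10:40", 1, "00:00:00"),
    ("2013-10-16 15:11:52", 1, "00:00:48"),
]
result = string_agg((concat_ws(",", *r) for r in rows), " | ")
print(result)
```

Note that string_agg() puts the delimiter only between values, so there is no trailing " |" as in the example in the question.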