Why is the byte size of Data different when serialized with JSONSerialization?

Why does data serialized using JSONSerialization differ from data serialized with the extensions below?
let uint8Array: [UInt8] = [123, 234, 255]
let data1 = uint8Array.data // 3 bytes
let data2 = try! JSONSerialization.data(withJSONObject: uint8Array) // 13 bytes

extension Data {
    var bytes: [UInt8] {
        return [UInt8](self)
    }
}

extension Array where Element == UInt8 {
    var data: Data {
        return Data(self)
    }
}

JSON is a text format, so JSONSerialization transforms an object (here an array of integers) into Data following the JSON grammar: it adds "[", "]", and "," in this case (with other input values and options it can also add ":", "{", "}", "\n", etc.), and it encodes the text as UTF-8 (UTF-16 is also possible, but I'll skip that for clarity; see the JSON article on Wikipedia).
So the value 123 is encoded as the UTF-8 characters "1", "2", and "3": three bytes for a single number.
Your 3 numbers therefore take 9 bytes, and then we need to add "[", "]", and the two separating commas, which makes 9 + 2 + 2 = 13 bytes.
To illustrate, let's add:
extension Data {
    func hexRepresentation(separator: String) -> String {
        map { String(format: "%02hhX", $0) }.joined(separator: separator)
    }

    func intRepresentation(separator: String) -> String {
        map { String(format: "%d", $0) }.joined(separator: separator)
    }
}
And use it to see the values inside data1 and data2:
print(data1.hexRepresentation(separator: "-"))
print(data1.intRepresentation(separator: "-"))
print(data2.hexRepresentation(separator: "-"))
print(data2.intRepresentation(separator: "-"))
$>7B-EA-FF
$>123-234-255
$>5B-31-32-33-2C-32-33-34-2C-32-35-35-5D
$>91-49-50-51-44-50-51-52-44-50-53-53-93
Read whichever you prefer, the Int or the hex interpretation of the raw values.
With the first method we get 3 bytes, holding exactly the initial values of your array.
For the JSON one, we can split it as such:
$>91       -> [
$>49-50-51 -> "1", "2", "3" (the characters of 123)
$>44       -> ,
$>50-51-52 -> "2", "3", "4" (the characters of 234)
$>44       -> ,
$>50-53-53 -> "2", "5", "5" (the characters of 255)
$>93       -> ]
You can check any UTF-8 table (like https://www.utf8-chartable.de): decimal 91 (hex 5B) is "[", decimal 49 (hex 31) is "1", and so on.
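As a final check that data2 really is just the UTF-8 text of the JSON, you can decode it back into a String (a small sketch using Foundation's String(data:encoding:) initializer):

// Decode the JSON bytes back into text to confirm they are plain UTF-8.
if let text = String(data: data2, encoding: .utf8) {
    print(text) // prints: [123,234,255]
}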

Related

Skip empty or faulty rows with Serde

I have a file with valid rows that I'm parsing into a struct using Serde and the csv crate.
#[derive(Debug, Deserialize)]
struct Circle {
    x: f32,
    y: f32,
    radius: f32,
}

fn read_csv(path: &str) -> Result<Vec<Circle>, csv::Error> {
    let mut rdr = csv::ReaderBuilder::new().delimiter(b';').from_path(path)?;
    let res: Vec<Circle> = rdr
        .deserialize()
        .map(|record: Result<Circle, csv::Error>| {
            record.unwrap_or_else(|err| panic!("There was a problem parsing a row: {}", err))
        })
        .collect();
    Ok(res)
}
This code works most of the time, but some of the files I receive contain "empty" rows at the end:
x;y;radius
6398921.770;146523.553;0.13258
6398921.294;146522.452;0.13258
6398914.106;146526.867;0.13258
;;;
This makes the parsing fail with:
thread 'main' panicked at 'There was a problem parsing a row: CSV deserialize error: record 4 (line: 4, byte: 194): field 0: cannot parse float from empty string', src/main.rs:90:41
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
How can I handle faulty rows without manipulating the file contents beforehand?
Thanks!
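For what it's worth, one common way to skip the bad rows (a sketch of mine, not from the original post) is to keep only the Ok records with filter_map:

fn read_csv_skipping_bad_rows(path: &str) -> Result<Vec<Circle>, csv::Error> {
    let mut rdr = csv::ReaderBuilder::new().delimiter(b';').from_path(path)?;
    // filter_map with Result::ok silently drops any row that fails to
    // deserialize, such as the trailing ";;;" line.
    let res: Vec<Circle> = rdr
        .deserialize()
        .filter_map(|record: Result<Circle, csv::Error>| record.ok())
        .collect();
    Ok(res)
}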

Generating JSON output from MySQL using GORM in Go

So I am trying to get data from my MySQL database. I can access the database fine, but I want the output in JSON format. I did some research but didn't find a solution, so can anyone guide me on getting MySQL data as JSON using GORM?
Here is a sample of the code I have written.
package main

import (
    "fmt"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/mysql" // required to register the MySQL dialect
)

type k_movie struct {
    Id         uint32
    Title      string `gorm:"default:''"`
    Url_name   string `gorm:"default:''"`
    K_score    string ``
    Poster_url string ``
}

func main() {
    db, errDb := gorm.Open("mysql", "root:xyz#123@(127.0.0.1)/dbdump?charset=utf8mb4&loc=Local")
    if errDb != nil {
        fmt.Println(errDb)
    }
    defer db.Close() // close the database connection when done
    db.LogMode(true) // enable SQL debug logging

    // SELECT * FROM `k_movies` WHERE (id>0 and id<.....)
    var movies []k_movie
    db.Where("id>? and id<?", 0, 103697).Limit(3).Find(&movies)
    fmt.Println(movies)

    // Get the total row count
    total := 0
    db.Model(&k_movie{}).Count(&total)
    fmt.Println(total)

    var infos []k_movie // slice to receive multiple results
    db.Where("Id in (?)", []uint32{1, 2, 3, 4, 5, 6, 7, 8}).Find(&infos)
    fmt.Println(infos)
    fmt.Println(len(infos)) // number of results

    var notValue []k_movie
    db.Where("id=?", 3).Find(&notValue)
    if len(notValue) == 0 {
        fmt.Println("No data found!")
    } else {
        fmt.Println(notValue)
    }
}
And I'm getting the output in this format:
kumardivyanshu@Divyanshus-MacBook-Air ~/myproject/src/github.com/gorm_mysql % go run test.go
(/Users/kumardivyanshu/myproject/src/github.com/gorm_mysql/test.go:31)
[2021-05-13 08:59:45] [3.89ms] SELECT * FROM `k_movies` WHERE (id>0 and id<103697) LIMIT 3
[3 rows affected or returned ]
[{1 Golmaal: Fun Unlimited golmaal-fun-unlimited 847 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/32b7385e1e616d7ba3d11e1bee255ecce638a136} {2 Dabangg 2 dabangg-2 425 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/1420c4d6f817d2b923cd8b55c81bdb9d9fd1eca0} {3 Force force 519 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/cd1dc247da9d16e194f4bfb09d99f4dedfb2de00}]
(/Users/kumardivyanshu/myproject/src/github.com/gorm_mysql/test.go:36)
[2021-05-13 08:59:45] [20.22ms] SELECT count(*) FROM `k_movies`
[0 rows affected or returned ]
103697
(/Users/kumardivyanshu/myproject/src/github.com/gorm_mysql/test.go:40)
[2021-05-13 08:59:45] [2.32ms] SELECT * FROM `k_movies` WHERE (Id in (1,2,3,4,5,6,7,8))
[8 rows affected or returned ]
[{1 Golmaal: Fun Unlimited golmaal-fun-unlimited 847 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/32b7385e1e616d7ba3d11e1bee255ecce638a136} {2 Dabangg 2 dabangg-2 425 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/1420c4d6f817d2b923cd8b55c81bdb9d9fd1eca0} {3 Force force 519 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/cd1dc247da9d16e194f4bfb09d99f4dedfb2de00} {4 Eega eega 906 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/08aef7d961d4699bf2d12a7c854b6b32d1445247} {5 Fukrey fukrey 672 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/5d14bd2fb0166f4bb9ab919e31b69f2605f366aa} {6 London Paris New York london-paris-new-york 323 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/222d8a6b5c76b1d3cfa0b93d4bcf1a1f16f5e199} {7 Bhaag Milkha Bhaag bhaag-milkha-bhaag 963 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/efa8b86c753ae0110cc3e82006fadabb06f1486c} {8 Bobby Jasoos bobby-jasoos 244 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/0e9540d4c962ec33d8b63c0c563e7b64169122e0}]
8
(/Users/kumardivyanshu/myproject/src/github.com/gorm_mysql/test.go:45)
[2021-05-13 08:59:45] [1.56ms] SELECT * FROM `k_movies` WHERE (id=3)
[1 rows affected or returned ]
[{3 Force force 519 https://movieassetsdigital.sgp1.cdn.digitaloceanspaces.com/thumb/cd1dc247da9d16e194f4bfb09d99f4dedfb2de00}]
You need to define json tags on your struct; then you can use json.Marshal to get a []byte slice that represents a JSON object.
Example taken from Go by Example:

type Response2 struct {
    Page   int      `json:"page"`
    Fruits []string `json:"fruits"`
}

res2D := &Response2{
    Page:   1,
    Fruits: []string{"apple", "peach", "pear"}}
res2B, _ := json.Marshal(res2D)
fmt.Println(string(res2B))
That would print:
{"page":1,"fruits":["apple","peach","pear"]}

Rebuild JSON string to a different structure, parsing specific fields

I receive JSON like this:
{
    "raw_content": "very long string",
    "mode": "ML",
    "user_id": "4000008367",
    "user_description": "John Doe",
    "model": 3,
    "dest_contact": "test@email.it",
    "order_details": [
        "ART.: 214883 PELL GRANI 9 ESPR.BAR SKGR 1000 SGOC.: 1000 GR\nVS.ART: 305920132 COMPOS. PALLET\n36 COLLI PEZ.RA: 6 TOT.PEZZI: 216 B: 12 T: 6\nEU C.L.: 24,230\nCO SCAP- : 16,500 CA SCAP- : 15,000 CO SCCP- : 0,000\nCO SAGV- : 0,00000\nC.N. : 17,200SCAD.MIN.: 25/01/22\nCONDIZIONI PAGAMENTO : 60GG B.B. DT RIC FT FINEMESE ART62\n",
        "ART.: 287047 PELLINI BIO100%ARABICALTGR 250 SGOC.: 250 GR\nVS.ART: 315860176 COMPOS. PALLET\n36 COLLI PEZ.RA: 6 TOT.PEZZI: 216 B: 12 T: 3\nEU C.L.: 8,380\nCO SCAP- : 16,500 PR SCAP- : 15,000 CO SCCP- : 0,000\nCO SAGV- : 0,00000\nC.N. : 5,950SCAD.MIN.: 25/01/22\nCONDIZIONI PAGAMENTO : 60GG B.B. DT RIC FT FINEMESE ART62\n",
        "ART.: 3137837 CAFFE PELLINI TOP LTGR 250 SGOC.: 250 GR\nVS.ART: 315850175 COMPOS. PALLET\n30 COLLI PEZ.RA: 12 TOT.PEZZI: 360 B: 6 T: 5\nEU C.L.: 6,810\nCO SCAP- : 16,500 PR SCAP- : 12,000 CO SCCP- : 0,000\nCO SAGV- : 0,00000\nC.N. : 5,000SCAD.MIN.: 18/08/21\nCONDIZIONI PAGAMENTO : 60GG B.B. DT RIC FT FINEMESE ART62\n",
        "ART.: 7748220 ESPRES.SUP.TRADIZ. MOKPKGR 500 SGOC.: 500 GR\nVS.ART: 315930114 COMPOS. PALLET\n80 COLLI PEZ.RA: 10 TOT.PEZZI: 800 B: 10 T: 6\nEU C.L.: 7,580\nCO SCAP- : 16,500 PR SCAP- : 27,750 CO SCCP- : 0,000\nCO SAGV- : 0,00000\nC.N. : 4,570SCAD.MIN.: 25/01/22\nCONDIZIONI PAGAMENTO : 60GG B.B. DT RIC FT FINEMESE ART62\n"
    ],
    "order_footer": "\nPALLET DA CM. 80X120\nT O T A L E C O L L I 182\n\n- EX D.P.R. 322 - 18/05/82 NON SI ACCETTANO TERMINI MINIMI DI\nCONSERVAZIONE INFERIORI A QUELLI INDICATI\n- CONSEGNA FRANCO BANCHINA, PALLET MONOPRODOTTO\nCOME DA PALLETTIZZAZIONE SPECIFICATA\nCONDIZIONI DI PAGAMENTO : COME DA ACCORDI\n"
}
and I want to restructure it to this:

{
    "id": "32839ds8a32jjdas93193snkkk32jhds-k2j1", // generated, see my implementation
    "rawContent": "very long string",
    "parsedContent": { "mode": "ML", "user_id": "4000008367", "user_description": "John Doe", "order_details": [ "....." ], ... } // basically all the fields except raw_content
}
How can I do this? I'm trying to do it with maps:
var output map[string]interface{}
var message processing.Document // message is of type struct {ID string, Raw string, Parsed string}
err = json.Unmarshal([]byte(doc), &output)
if err != nil {
    // error handling
}
message.ParsedContent = "{"
for key, data := range output {
    if key == "raw_content" {
        hash := md5.Sum([]byte(data.(string)))
        message.ID = hex.EncodeToString(hash[:])
        message.RawContent = base64.RawStdEncoding.EncodeToString([]byte(data.(string)))
    } else {
        temp := fmt.Sprintf("\"%s\": \"%s\", ", key, data)
        message.ParsedContent = message.ParsedContent + temp
    }
}
message.ParsedContent = message.ParsedContent + "}"
msg, err := json.Marshal(message)
if err != nil {
    // error handling
}
fmt.Println(string(msg))
There are a few problems with this. If everything were strings it would be fine, but there are integers, and Sprintf with %s doesn't work on them (the output I get for the field "model", for example, is "model": "%!s(float64=3)"). I could special-case key == "model" and format it as an int, but as I said the fields are not always the same, and there are other integer fields that are sometimes present and sometimes not.
Also, the field "order_footer" contains escaped newlines which somehow get lost in my parsing, and this breaks the validity of the resulting JSON.
How can I solve these issues?
EDIT: As suggested, hand-parsing JSON is a bad idea. I could parse it into a struct; the "model" field actually tells me which struct to use. The struct for "model": 3, for example, is:
type MOD3 struct {
    Raw                     string   `json:"raw_content"`
    Mode                    string   `json:"mode"`
    UserID                  string   `json:"user_id"`
    UserDes                 string   `json:"user_description"`
    Model                   int      `json:"model"`
    Heading                 string   `json:"legal_heading"`
    DestContact             string   `json:"dest_contact"`
    VendorID                string   `json:"vendor_id"`
    VendorLegal             string   `json:"vendor_legal"`
    OrderID                 string   `json:"order_id"`
    OrderDate               int64    `json:"order_date"`
    OrderReference          string   `json:"order_reference"`
    DeliveryDate            int64    `json:"delivery_date"`
    OrderDestination        string   `json:"order_destination"`
    OrderDestinationAddress string   `json:"order_destination_address"`
    Items                   []string `json:"order_details"`
    OrderFooter             string   `json:"order_footer"`
}
At this point, how can I parse specific fields to the output format?
You should never, ever, ever try to generate JSON by hand. The steps should be:
1. Get the JSON and parse it into a model object.
2. Create a copy of the model object with all the changes you want.
3. Convert the copied model object to JSON.
You don't know enough about JSON to modify it on the fly. I know enough, and the only reasonable way is complete parsing, and writing back the complete changes.
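To make that concrete for this question, here is a minimal sketch (my own, following the question's Document shape with ID/RawContent/ParsedContent; the rebuild helper name is illustrative). Unmarshalling into map[string]json.RawMessage preserves numbers and escaped newlines byte-for-byte, so re-marshalling everything except raw_content stays valid JSON:

package main

import (
    "crypto/md5"
    "encoding/base64"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

type Document struct {
    ID            string `json:"id"`
    RawContent    string `json:"rawContent"`
    ParsedContent string `json:"parsedContent"`
}

// rebuild extracts raw_content and re-marshals every other field untouched.
func rebuild(doc []byte) (Document, error) {
    var fields map[string]json.RawMessage
    if err := json.Unmarshal(doc, &fields); err != nil {
        return Document{}, err
    }
    var raw string
    if err := json.Unmarshal(fields["raw_content"], &raw); err != nil {
        return Document{}, err
    }
    delete(fields, "raw_content")

    // json.Marshal regenerates valid JSON: integers stay integers and
    // escaped newlines stay escaped, because RawMessage is copied verbatim.
    parsed, err := json.Marshal(fields)
    if err != nil {
        return Document{}, err
    }
    hash := md5.Sum([]byte(raw))
    return Document{
        ID:            hex.EncodeToString(hash[:]),
        RawContent:    base64.RawStdEncoding.EncodeToString([]byte(raw)),
        ParsedContent: string(parsed),
    }, nil
}

func main() {
    doc := []byte(`{"raw_content":"very long string","mode":"ML","model":3}`)
    msg, err := rebuild(doc)
    if err != nil {
        panic(err)
    }
    out, _ := json.Marshal(msg)
    fmt.Println(string(out))
}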

Why do I always get a "trailing characters" error when trying to parse data with serde_json?

I have a server that receives requests in JSON format. When trying to parse the data, I always get a "trailing characters" error. This happens only when the JSON comes from Postman:
let type_of_request = parsed_request[1];
let content_of_msg: Vec<&str> = msg_from_client.split("\r\n\r\n").collect();
println!("{}", content_of_msg[1]);
// Will print "{"username":"user","password":"password","email":"dwadwad"}"
let res: serde_json::Value = serde_json::from_str(content_of_msg[1]).unwrap();
println!("The username is: {}", res["username"]);
When getting the data from Postman, this happens:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("trailing characters", line: 1, column: 60)', src\libcore\result.rs:997:5
but when the string is defined inside Rust:
let j = "{\"username\":\"user\",\"password\":\"password\",\"email\":\"dwadwad\"}";
let res: serde_json::Value = serde_json::from_str(j).unwrap();
println!("The username is: {}", res["username"]);
it works like a charm:
The username is: "user"
EDIT: Apparently, as I read the message into a buffer and turned the whole buffer into a string, the string kept all the NUL characters the buffer had, and those are of course the trailing characters.
Looking at the serde_json code, one finds the following comment above the relevant ErrorCode enum variant:
/// JSON has non-whitespace trailing characters after the value.
TrailingCharacters,
So as the error code implies, you've got some trailing character which is not whitespace. In your snippet, you say:
println!("{}", content_of_msg[1]);
// Will print "{"username":"user","password":"password","email":"dwadwad"}"
If you literally copied and pasted the printed output here, I'd note that I wouldn't expect the output to be wrapped in leading and trailing quotation marks. Did you add these yourself, or were they part of what was printed? If they were printed, I suspect that's the source of your problem.
Edit:
In fact, I can nearly recreate this using a raw string with leading/trailing quotation marks in Rust:
extern crate serde_json;
#[cfg(test)]
mod tests {
#[test]
fn test_serde() {
let s =
r#""{"username":"user","password":"password","email":"dwadwad"}""#;
println!("{}", s);
let _res: serde_json::Value = serde_json::from_str(s).unwrap();
}
}
Running it via cargo test yields:
test tests::test_serde ... FAILED
failures:
---- tests::test_serde stdout ----
"{"username":"user","password":"password","email":"dwadwad"}"
thread 'tests::test_serde' panicked at 'called `Result::unwrap()` on an `Err` value: Error("trailing characters", line: 1, column: 4)', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::test_serde
Note that my printed output also includes leading/trailing quotation marks and I also get a TrailingCharacter error, albeit at a different column.
Edit 2:
Based on your comment that you've added the wrapping quotations yourself, you've got a known good string (the one you've defined in Rust), and one which you believe should match it but doesn't (the one from Postman).
This is a data problem and so we should examine the data. You can adapt the below code to check the good string against the other:
#[test]
fn test_str_comp() {
    // known good string we'll compare against
    let good =
        r#"{"username":"user","password":"password","email":"dwadwad"}"#;
    // lengthened string with additional characters;
    // also, n and a in username are transposed
    let bad =
        r#"{"useranme":"user","password":"password","email":"dwadwad"}abc"#;
    let good_size = good.chars().count();
    let bad_size = bad.chars().count();
    for (idx, (c1, c2)) in (0..)
        .zip(good.chars().zip(bad.chars()))
        .filter(|(_, (c1, c2))| c1 != c2)
    {
        println!(
            "Strings differ at index {}: (good: `{}`, bad: `{}`)",
            idx, c1, c2
        );
    }
    if good_size < bad_size {
        let trailing = bad.chars().skip(good_size);
        println!(
            "bad string contains extra characters: `{}`",
            trailing.collect::<String>()
        );
    } else if good_size > bad_size {
        let trailing = good.chars().skip(bad_size);
        println!(
            "good string contains extra characters: `{}`",
            trailing.collect::<String>()
        );
    }
    assert!(false);
}
For my example, this yields the failure:
test tests::test_str_comp ... FAILED
failures:
---- tests::test_str_comp stdout ----
Strings differ at index 6: (good: `n`, bad: `a`)
Strings differ at index 7: (good: `a`, bad: `n`)
bad string contains extra characters: `abc`
thread 'tests::test_str_comp' panicked at 'assertion failed: false', src/lib.rs:52:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::test_str_comp
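Given the asker's edit (the buffer's trailing NUL bytes were the trailing characters), a common fix is to convert only the bytes actually read rather than the whole buffer. A minimal sketch, assuming the message arrives over a std::net::TcpStream:

use std::io::Read;
use std::net::TcpStream;

fn read_message(stream: &mut TcpStream) -> std::io::Result<String> {
    let mut buf = [0u8; 4096];
    // `read` reports how many bytes were actually received; slicing with
    // `..n` leaves the unused, zero-filled tail of the buffer behind.
    let n = stream.read(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}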

Using Microsoft.FSharpLu to serialize JSON to a stream

I've been using the Newtonsoft.Json and Newtonsoft.Json.FSharp libraries to create a new JSON serializer that streams to a file. I like the ability to stream to a file because I'm handling large files and, prior to streaming, often ran into memory issues.
I stream with a simple function:
open Newtonsoft.Json
open Newtonsoft.Json.FSharp
open System.IO

let writeToJson (path: string) (obj: 'a) : unit =
    let serializer = new JsonSerializer()
    // 'use' closes (and flushes) the file when the function returns
    use fileStream = new StreamWriter(path)
    serializer.Serialize(fileStream, obj)
This works great. My problem is that the JSON string is then absolutely cluttered with stuff I don't need. For example,
let m =
    [
        (1.0M, None)
        (2.0M, Some 3.0M)
        (4.0M, None)
    ]

let makeType (tup: decimal * decimal option) = { FieldA = fst tup; FieldB = snd tup }
let y = List.map makeType m

Default.serialize y
val it : string =
"[{"FieldA": 1.0},
{"FieldA": 2.0,
"FieldB": {
"Case": "Some",
"Fields": [3.0]
}},
{"FieldA": 4.0}]"
If this is written to JSON and read into R, there are nested dataframes, and any Fields associated with a Case end up as a list:
library(jsonlite)
library(dplyr)

q <- fromJSON("default.json")
x <-
  q %>%
  flatten()
x
> x
  FieldA FieldB.Case FieldB.Fields
1      1        <NA>          NULL
2      2        Some             3
3      4        <NA>          NULL
> sapply(x, class)
       FieldA   FieldB.Case FieldB.Fields
    "numeric"   "character"        "list"
I don't want to have to handle these things in R. I can do it but it's annoying and, if there are files with many, many columns, it's silly.
This morning, I started looking at the Microsoft.FSharpLu.Json documentation. This library has a Compact.serialize function. Quick tests suggest that this library will eliminate the need for nested dataframes and the lists associated with any Case and Field columns. For example:
Compact.serialize y
val it : string =
"[{
"FieldA": 1.0
},
{
"FieldA": 2.0,
"FieldB": 3.0
},
{
"FieldA": 4.0
}
]"
When this string is read into R,
q <- fromJSON("compact.json")
x <- q
x
> x
  FieldA FieldB
1      1     NA
2      2      3
3      4     NA
> sapply(x, class)
   FieldA    FieldB
"numeric" "numeric"
This is much simpler to handle in R, and I'd like to start using this library.
However, I don't know if I can get the Compact serializer to serialize to a stream. I see .serializeToFile, .deserializeStream, and .tryDeserializeStream, but nothing that can serialize to a stream. Does anyone know if Compact can handle writing to a stream? How can I make that work?
The helper to serialize to a stream is missing from the Compact module in FSharpLu.Json, but you should be able to do it yourself by following the C# example from http://www.newtonsoft.com/json/help/html/SerializingJSON.htm. Something along these lines:
let writeToJson (path: string) (obj: 'a) : unit =
    let serializer = new JsonSerializer()
    serializer.Converters.Add(new Microsoft.FSharpLu.Json.CompactUnionJsonConverter())
    use sw = new StreamWriter(path)
    use writer = new JsonTextWriter(sw)
    serializer.Serialize(writer, obj)
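A quick usage check on the list from the question (my own sketch; note that JsonSerializer writes unindented output by default, unlike the pretty-printed Compact.serialize string above):

writeToJson "compact.json" y
// compact.json should contain, modulo whitespace:
// [{"FieldA":1.0},{"FieldA":2.0,"FieldB":3.0},{"FieldA":4.0}]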