Consider a small Go application that reads a large JSON file (2 GB+), unmarshals the JSON data into a struct, and POSTs the JSON data to a web service endpoint.
The web service receiving the payload changed its functionality, and now has a limit of 25MB per payload. What would be the best approach to overcome this issue using Go? I've thought of the following, however I'm not sure it is the best approach:
Creating a function to split the large JSON file into multiple smaller ones (up to 20MB), and then iterate over the files sending multiple smaller requests.
Similar function to the one being used to currently send the entire JSON payload:
func sendDataToService(data StructData) {
	payload, err := json.Marshal(data)
	if err != nil {
		log.Println("ERROR:", err)
		return
	}
	request, err := http.NewRequest("POST", endpoint, bytes.NewBuffer(payload))
	if err != nil {
		log.Println("ERROR:", err)
		return
	}
	client := &http.Client{}
	response, err := client.Do(request)
	log.Println("INFORMATIONAL:", request)
	if err != nil {
		log.Println("ERROR:", err)
		return
	}
	defer response.Body.Close()
}
You can break the input into chunks and send each piece individually:
dec := json.NewDecoder(inputStream)
tok, err := dec.Token()
if err != nil {
	return err
}
if tok == json.Delim('[') {
	for dec.More() {
		var obj json.RawMessage
		if err := dec.Decode(&obj); err != nil {
			return err
		}
		// Here, obj contains one element of the array. You can send this
		// to the server.
	}
}
As the server-side can process data progressively, I assume that the large JSON object can be split into smaller pieces. From this point, I can propose several options.
Use HTTP requests
Pros: Pretty simple to implement on the client-side.
Cons: Making hundreds of HTTP requests might be slow. You will also need to handle timeouts - this is additional complexity.
Use WebSocket messages
If the receiving side supports WebSockets, a step-by-step flow will look like this:
Split the input data into smaller pieces.
Connect to the WebSocket server.
Start sending messages with the smaller pieces till the end of the file.
Close connection to the server.
This solution might be more performant as you won't need to connect and disconnect from the server each time you send a message, as you'd do with HTTP.
However, both solutions assume that you need to assemble all the pieces on the server side. For example, you would probably need to send a correlation ID along with the data so the server knows which file you are currently sending, plus a specific end-of-file message so the server knows when the file ends. With a WebSocket server, you could instead assume that the entire file is sent during a single connection session, if that is acceptable.
Here is my minimal .proto file:
syntax = "proto3";

message getDhtParams {}

message DhtContents {
  string dht_contents = 1;
}

service MyApp {
  rpc getDhtContent(getDhtParams) returns (DhtContents) {}
}
Two things to note related to the above proto file:
It is a minimal file. There is a lot more to it.
The server is already generated and running. The server is implemented in Python.
I am writing client in Go. And this is the fetching code I have come up with:
func fetchDht() (*pb.DhtContents, error) {
	// Set up a connection to the server.
	address := "localhost:9998"
	conn, err := grpc.Dial(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("did not connect: %v", err)
	}
	defer conn.Close()
	client := pb.NewMyAppClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	r, err := client.GetDhtContent(ctx, &pb.GetDhtParams{})
	if err != nil {
		return nil, errors.New("could not get dht contents")
	}
	return r, nil
}
For the sake of simplicity, I have stripped down the output, but it looks something like this:
dht_contents:"{'node_ids': ['dgxydhlqoopevxv'], 'peer_addrs': [['192.168.1.154', '41457']], 'peer_meta': [{'peer_id': {'nodeID': 'dgxydhlqoopevxv', 'key': 'kdlvjdictuvgxspwkdizqryr', 'mid': 'isocvavbtzkxeigkkrubzkcx', 'public_key': 'uhapwxnfeqqmnojsaijghhic', '_address': 'xklqlebqngpkxb'}, 'ip_addrs': ['192.168.1.154', '41457'], 'services': [{'service_input': '', 'service_output': '', 'price': 0}], 'timestamp': 1661319968}]}"
A few things to note about this response:
It starts with dht_contents: which I know is a field of DhtContents message.
The enclosed payload is not valid JSON, as it uses single quotes. This could be an issue on the server side; in that case I will inform the service developer.
My questions:
What is an elegant way to deal with that dht_contents? There must be a protobuf/gRPC way. I aim to get the contents between the double quotes.
How do I convert the content to JSON? I have already created the struct to unmarshal.
It would be enough if I am also able to convert the response which is of type *pb.DhtContents to []byte, from there I can convert it to JSON.
The generated code should have a method which will get rid of dht_contents:" from the start and " from the end.
In your case, that method should be called GetDhtContents().
You can modify your fetchDht function to something like this:
func fetchDht() (string, error) {
	address := "localhost:9998"
	// ...
	if err != nil {
		return "", errors.New("could not get dht contents")
	}
	return r.GetDhtContents(), nil
}
From there on, you can work on making it valid JSON by replacing the single quotes with double quotes. Or it may be handled on the service end.
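A minimal sketch of that quote-replacement idea follows. Note that the naive strings.ReplaceAll approach breaks as soon as any value itself contains a quote character, so fixing the server to emit real JSON (or switching the proto field to bytes) remains the robust option.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pythonishToJSON naively turns a Python-repr-style string into JSON by
// swapping single quotes for double quotes. This is fragile: it corrupts
// the data if any value contains a quote character.
func pythonishToJSON(s string) string {
	return strings.ReplaceAll(s, "'", `"`)
}

func main() {
	// A trimmed fragment of the response shown above.
	raw := "{'node_ids': ['dgxydhlqoopevxv']}"
	var parsed map[string]any
	if err := json.Unmarshal([]byte(pythonishToJSON(raw)), &parsed); err != nil {
		panic(err)
	}
	fmt.Println(parsed["node_ids"]) // prints the decoded slice
}
```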
The code generated from the proto file includes getter methods for the result (the "r"), so you can call r.GetDhtContents() to get the content, then convert the string to the type you want.
Suggestion: change the proto field type to bytes, then json.Unmarshal(r.GetDhtContents(), &dst).
I am trying to pull data on mails coming into an API from an email testing tool mailhog.
If I use a call to get a list of emails, e.g.
GET /api/v1/messages
I can load this data into a struct with no issues and print out values I need.
However if I use a different endpoint that is essentially a stream of new emails coming in, I have different behavior. When I run my go application I get no output whatsoever.
Do I need to do like a while loop to constantly listen to the endpoint to get the output?
My end goal is to pull some information from emails as they come in and then pass them into a different function.
Here is me trying to access the streaming endpoint
https://github.com/mailhog/MailHog/blob/master/docs/APIv1.md
res, err := http.Get("http://localhost:8084/api/v1/events")
if err != nil {
	panic(err.Error())
}
body, err := ioutil.ReadAll(res.Body)
if err != nil {
	panic(err.Error())
}
var data Email
json.Unmarshal(body, &data)
fmt.Printf("Email: %v\n", data)
If I do a curl request at the mailhog service with the same endpoint, I do get output as mails come in. However, I can't seem to figure out why I get no output via my Go app. The app stays running; I just don't get any output.
I am new to Go, so apologies if this is a really simple question.
From ioutil.ReadAll documentation:
ReadAll reads from r until an error or EOF and returns the data it read.
When you use it to read the body of a regular endpoint, it works because the payload has an EOF: the server uses the Content-Length header to tell how many bytes the response body has, and once the client has read that many bytes, it knows it has read the whole body and can stop.
Your "streaming" endpoint doesn't use Content-Length, though, because the body has an unknown size; it is supposed to write events as they come, so you can't use ReadAll in this case. Usually you are then expected to read line by line, where each line represents one event. bufio.Scanner does exactly that:
res, err := http.Get("http://localhost:8084/api/v1/events")
if err != nil {
	panic(err.Error())
}
defer res.Body.Close()

scanner := bufio.NewScanner(res.Body)
for scanner.Scan() {
	event := scanner.Bytes()
	var data Email
	if err := json.Unmarshal(event, &data); err != nil {
		panic(err.Error())
	}
	fmt.Printf("Email: %v\n", data)
}
if err := scanner.Err(); err != nil {
	panic(err.Error())
}
curl can process the response as you expect because it detects that the endpoint streams data and reacts accordingly. It may be helpful to add the response curl gets to the question.
I have a very simple Go webserver. Its job is to receive an inbound JSON payload. It then publishes the payload to one or more services that expect a byte array. The payload doesn't need to be checked, just sent over.
In this case, it receives an inbound job and sends it to Google PubSub. It might be another service - it doesn't really matter. I'm trying to find the most efficient way to convert the object to a byte array without first decoding it.
Why? Seems a bit wasteful to decode and convert to JSON on one server, only to unmarshal it later. Plus, I don't want to maintain two identical structs in two packages.
How is it possible to convert the io.ReadCloser to a byte array so I only need to unmarshal once? I tried something like this answer, but I don't think that's the most efficient way either:
From io.Reader to string in Go
My http server code looks like this:
func Collect(d DbManager) http.HandlerFunc {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json; charset=utf-8")
		code := 422
		obj := Report{}
		response := Response{}
		response.Message = "Invalid request"

		decoder := json.NewDecoder(r.Body)
		decoder.Decode(&obj)

		if obj.Device.MachineType != "" {
			msg, _ := json.Marshal(obj)
			if d.Publish(msg, *Topic) {
				code = 200
			}
			response.Message = "Ok"
		}

		a, _ := json.Marshal(response)
		w.WriteHeader(code)
		w.Write(a)
	})
}
You convert a Reader to bytes by reading it. There's not really a more efficient way to do it.
body, err := io.ReadAll(r.Body)
(In Go versions before 1.16, this was ioutil.ReadAll.) If you are unconditionally transferring bytes from an io.Reader to an io.Writer, you can just use io.Copy.
Say I have an application where the backend uses raw TCP for bi-directional communication between services. I want to send a payload consisting of a JSON object, but every few messages the JSON data gets cut off and the remainder is clumped onto the next response. I don't want to use something like WebSockets because of the time spent upgrading from HTTP. What is a good (and hopefully best) way to ensure that the JSON object goes from one node and is read by the other node as a whole JSON object?
I know sending and receiving buffers of a set size and a heartbeat message are the rule of thumb, but can I see an example? Preferably in JavaScript (Node's net stdlib) or Go (its net stdlib), because those are my most proficient languages, though I don't really care what language it is done in ultimately.
(I know there's a few questions out there asking similar things regarding ensuring delivery of message with tcp, but none asked for an example that I found)
I know TCP is a stream. I'm just asking for a way to ensure that when I write a specific JSON object to this stream, I get the same JSON object on the other end, as in "node A sends JSON object X; node B receives that same object X".
You don't need a heartbeat, or fixed size messages for delivery confirmation. If you need to ensure delivery, you need an application level acknowledgement. If you need to ensure delivery of the correct message, you'll need to include a unique message ID to acknowledge. If you need to ensure that the message is unaltered, you'll need to include a checksum or MAC.
Here it sounds like you're having trouble with message framing. While there are many ways to frame your messages (simple length-prefix, type-length-value, HTTP/1.1, etc.), a simple solution is to use the built-in json.Encoder and json.Decoder.
Example client, which sends a "PING" message every second:
type Message struct {
	Payload string
}

func sendMessages(c net.Conn) {
	message := Message{}
	encoder := json.NewEncoder(c)
	for i := 0; ; i++ {
		message.Payload = fmt.Sprintf("PING %d", i)
		err := encoder.Encode(message)
		if err != nil {
			log.Fatal(err)
		}
		time.Sleep(time.Second)
	}
}
Example Server:
type Message struct {
	Payload string
}

func receiveMessages(c net.Conn) {
	m := Message{}
	decoder := json.NewDecoder(c)
	for {
		err := decoder.Decode(&m)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("Received: %#v\n", m)
	}
}
I am stumped on what seems like a very simple problem.
I am receiving a json object from a client. It looks like this:
{
  "user": "test#example.com"
}
I need to simply pass this on to another part of the API as a POST request. This is what I've got so far:
// Decode incoming json
decoder := json.NewDecoder(r.Body)
var user UserInformation
err := decoder.Decode(&user)
if err != nil {
	log.Println(err)
}

jsonUser, _ := json.Marshal(user)
log.Println(string(jsonUser)) // Correct output

buffer := bytes.NewBuffer(jsonUser)
log.Println(string(buffer.Bytes())) // Also correct

resp, err := http.Post("http://example.com/api/has_publisher", "application/json", buffer)
if err != nil {
	log.Println(err)
}
As I cannot test this program on a live system, I verified the resulting POST request with Wireshark, only to find that the content is missing and Content-Length is 0. For some reason, http.Post doesn't read from the buffer.
Am I missing something here? I would greatly appreciate it if someone could point me in the right direction.
Thanks!
It shouldn't be the root cause, but replace
buffer := bytes.NewBuffer(jsonUser)
with
buffer := bytes.NewReader(jsonUser)
It is more likely that your test setup is the root cause. I assume you are pointing at a non-existing endpoint. This would result in a failure (TCP SYN fails) before the actual HTTP POST is sent.
Check if you can use mockable.io as an alternative to mock your backend.