Concurrent MySQL writing with GORM leads to an error

I have implemented a complex CSV import script in Golang.
It uses a worker pool. Inside that pool, workers run through thousands of small CSV files, categorizing, tagging and branding the products.
They all write to the same database table. So far so good.
The problem I'm facing is that if I choose more than 2 workers, the process randomly crashes with the following message:
The workflow is:

foreach (csv) {
    workerPool.submit(csv)
}

func worker(csv) {
    foreach (line) {
        importLine(line)
    }
}

func importLine(line) {
    product = get(line)
    product.category = determine_category(product)
    product.brand = determine_brand(product)
    save(product.brand)
    product.tags = determine_tags(product)
    // and after all
    save(product)
}
I tried to wrap the save() calls in transactions, but it didn't help.
Now I have the following questions:
Is MySQL suited to concurrent writes to a single table?
If transactions are needed to accomplish this, where should they be set?
Is the Go SQL driver (where the error ALWAYS happens in packets.go:1102) suited to do this?
Could anyone help me (maybe by hiring for a few hours)?
I'm completely stuck. I can also share the source code if that helps, but I first wanted to know whether you think it's my code or a general issue.

Open a new db connection in each goroutine (or thread, for languages that use threads).
MySQL's protocol is stateful, which means if multiple goroutines attempt to use the same connection, the requests and responses get very confused.
You would have the same problem trying to share any other kind of stateful protocol connection between goroutines.
For example, FTP is also a stateful protocol, and that may be easier to understand. A client goroutine might send a message like "get file x", and the response should be a series of messages containing the content of that file. If another goroutine tries to use the same connection while that request/response is in progress, both clients will be confused. The second goroutine will read packets that belong to a file it didn't request, and the first goroutine, which requested the file, will find that some packets it was expecting have already been read.
Similarly, MySQL's protocol does not support multiple client goroutines sharing a single connection.
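In Go terms, that means making sure no two workers ever share the same underlying connection, session, or transaction object. A minimal sketch under that assumption, using GORM v2 (importFile, the channel-based pool, and the DSN are placeholders, not the asker's actual code):

package importer

import (
    "gorm.io/driver/mysql"
    "gorm.io/gorm"
)

// worker drains a channel of CSV paths and never shares its DB handle
// with any other goroutine.
func worker(csvFiles <-chan string, dsn string) error {
    // Each worker opens its own handle, so requests and responses on the
    // MySQL protocol never interleave between goroutines.
    db, err := gorm.Open(mysql.Open(dsn), &gorm.Config{})
    if err != nil {
        return err
    }
    for file := range csvFiles {
        // One transaction per file: the brand, tag and product saves
        // either all commit or all roll back together.
        if err := db.Transaction(func(tx *gorm.DB) error {
            return importFile(tx, file)
        }); err != nil {
            return err
        }
    }
    return nil
}

// importFile stands in for the question's per-line import logic.
func importFile(tx *gorm.DB, file string) error {
    // parse the CSV here, then tx.Save(&product) for each line
    return nil
}

Note that Go's database/sql pool is itself safe for concurrent use, so the other common fix is to open one handle up front and let every worker use it, as long as no single transaction or raw connection object is shared across goroutines.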

Related

How to handle "Unexpected EOF at target" error from API calls?

I'm creating a Forge application which needs to get version information from a BIM 360 hub. Sometimes it works, but sometimes (usually after the code has already been run once this session) I get the following error:
Exception thrown: 'Autodesk.Forge.Client.ApiException' in mscorlib.dll
Additional information: Error calling GetItem: {
    "fault": {
        "faultstring": "Unexpected EOF at target",
        "detail": {
            "errorcode": "messaging.adaptors.http.flow.UnexpectedEOFAtTarget"
        }
    }
}
The above error is thrown from a call to an API, such as one of these:
dynamic item = await itemApi.GetItemAsync(projectId, itemId);
dynamic folder = await folderApi.GetFolderAsync(projectId, folderId);
var folders = await projectApi.GetProjectTopFoldersAsync(hubId, projectId);
Where the APIs are initialized as follows:
ItemsApi itemApi = new ItemsApi();
itemApi.Configuration.AccessToken = Credentials.TokenInternal;
The Ids (such as 'projectId', 'itemId', etc.) don't seem to be any different when this error is thrown and when it isn't, so I'm not sure what is causing the error.
I based my application on the .Net version of this tutorial: http://learnforge.autodesk.io/#/datamanagement/hubs/net
But I adapted it so I can retrieve multiple nodes asynchronously (for example, all of the nodes a user has access to) without changing the jstree. I did this to allow extracting information in the background without disrupting the user's workflow. The main change I made was to add another route on the server side that calls "GetTreeNodeAsync" (from the tutorial) asynchronously on the root of the tree and then calls it on each of the returned children, then each of their children, and so on. The function waits until all of the nodes are processed using Task.WhenAll, then returns data from each of the nodes to the client.
This means that there could be many API calls running asynchronously, and there might be duplicate API calls if a node was already opened in the jstree and then its information is requested for the background extraction, or if the background extraction happens more than once. This seems to be when the error is most likely to happen.
I was wondering if anyone else has encountered this error, and if you know what I can do to avoid it, or how to recover when it is caught. Currently, after this error occurs, every subsequent API call seems to throw this error as well, and the only way I've found to fix it is to rerun the code (I use Visual Studio, so I just rerun the server and client, and my browser launches automatically).
Those are sporadic errors from our Apigee router caused by latency issues in the authorization process that we are currently looking into internally.
When they occur, please stop all upcoming requests, wait a few minutes, and then retry. Take a look at stuff like this or this to help you out.
Our existing reports of similar errors also point to concurrency as one of the contributing factors, so you might want to limit your concurrent requests and see if that mitigates the issue.
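A minimal sketch of that advice applied to the question's ItemsApi call (the retry count, delay, semaphore size and the GetItemWithRetryAsync helper are all made up for illustration, not part of the Forge SDK):

using System;
using System.Threading;
using System.Threading.Tasks;
using Autodesk.Forge;
using Autodesk.Forge.Client;

public static class ForgeCalls
{
    // Limit how many Forge requests run at the same time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(4);

    public static async Task<dynamic> GetItemWithRetryAsync(ItemsApi itemApi, string projectId, string itemId)
    {
        for (int attempt = 1; ; attempt++)
        {
            await Gate.WaitAsync();
            try
            {
                return await itemApi.GetItemAsync(projectId, itemId);
            }
            catch (ApiException) when (attempt < 3)
            {
                // Back off before retrying, per the advice above.
                await Task.Delay(TimeSpan.FromSeconds(30 * attempt));
            }
            finally
            {
                Gate.Release();
            }
        }
    }
}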

Data Studio connector making multiple calls to API when it should only be making 1

I'm finalizing a Data Studio connector and noticing some odd behavior with the number of API calls.
Where I'm expecting to see a single API call, I'm seeing multiple calls.
In my Apps Script I keep a simple tally that increments by 1 on every URL fetch, and that gives me the number I expect to see for getData().
However, in my API monitoring logs (using Runscope) I'm seeing multiple API requests for the same endpoint, and varying numbers for different endpoints within a single getData() call (they should all be the same).
I can't post the code here (client project) but it's substantially the same framework as the Data Connector code on Google's docs. I have caching and backoff implemented.
Looking for any ideas or if anyone has experienced something similar?
Thanks
Per this reference, GDS will also perform semantic type detection if you aren't explicitly defining this property for your fields. If the query is for semantic type detection, the request will feature sampleExtraction: true:
When Data Studio executes the getData function of a community connector for the purpose of semantic detection, the incoming request will contain a sampleExtraction property which will be set to true.
If the GDS report includes multiple widgets with different dimensions/metrics configuration then GDS might fire multiple getData calls for each of them.
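One way to avoid the semantic-detection pass entirely is to declare explicit types in getSchema(), as the reference above suggests. A minimal sketch in the legacy connector schema format (the two fields are placeholders):

function getSchema(request) {
  return {
    schema: [
      {
        name: 'order_date',
        label: 'Order date',
        dataType: 'STRING',
        semantics: {conceptType: 'DIMENSION', semanticType: 'YEAR_MONTH_DAY'}
      },
      {
        name: 'revenue',
        label: 'Revenue',
        dataType: 'NUMBER',
        semantics: {conceptType: 'METRIC', semanticType: 'NUMBER'}
      }
    ]
  };
}

You can also check the sampleExtraction flag on the incoming request inside getData() and return a small canned sample for those passes instead of calling your API.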
Kind of a late answer but this might help others who are facing the same problem.
The widgets / search filters attached to a graph issue getData calls of their own. If your custom connector retrieves data via API calls to third-party services, and that data is agnostic to the request.fields property GDS sends along, then these API calls are multiplied by N+1 (where N = the number of widgets / search filters your report implements).
I could not find an official solution for this either, so I invented a workaround using the cache.
The graph's getData request (typically requesting more fields than the search filters) is the only one allowed to query the API endpoint. Before it starts, it stores a key in the cache: "cache_{hashOfReportParameters}_building" => true.
if (enableCache) {
  cache.putString("cache_{hashOfReportParameters}_building", 'true');
  Logger.log("Cache is being built...");
}
It then retrieves the API responses, paginating in a loop, and buffers the results.
Once it has finished, it deletes the cache key "cache_{hashOfReportParameters}_building" and caches the final merged results it buffered inside "cache_{hashOfReportParameters}_final".
The filters also invoke getData, but typically with only up to 3 requested fields. The first thing we want to do is make sure they cannot start executing before the primary getData call, so we add a small delay for requests that look like search filters / widgets after the same data set:
if (enableCache) {
  var countRequestedFields = requestedFields.asArray().length;
  Logger.log("Total requested fields: " + countRequestedFields);
  if (countRequestedFields <= 3) {
    Logger.log('This seems to be a search filter.');
    Utilities.sleep(1000);
  }
}
After that we compute a hash over all of the moving parts of the report (date range, plus all of the other parameters you have set up that could influence the data retrieved from your API endpoints):
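Something along these lines, assuming the date range and config params are the only moving parts (adjust to whatever actually influences your API calls):

var movingParts = JSON.stringify({
  dateRange: request.dateRange,
  configParams: request.configParams
});
var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, movingParts);
var hashOfReportParameters = '';
for (var i = 0; i < digest.length; i++) {
  // computeDigest returns signed bytes; normalize them into a hex string.
  hashOfReportParameters += ((digest[i] + 256) % 256).toString(16);
}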
Now the best part: as long as the main graph is still building the cache, we make these getData calls wait:
while (cache.getString('cache_{hashOfReportParameters}_building') === 'true') {
  Logger.log('A similar request is already executing, please wait...');
  Utilities.sleep(2000);
}
After this loop we attempt to retrieve the contents of "cache_{hashOfReportParameters}_final". In case that fails, it's always a good idea to have a backup plan, which is to allow the call to traverse the API again; we have seen roughly a 2% error rate when retrieving data we cached.
With the cached result (or the buffered API responses), you just transform the response into the schema GDS needs (which differs between graphs and filters).
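A sketch of that retrieval step, using the same enhanced cache object as the snippets above (fetchAllPagesFromApi and buildResponseForGds stand in for your own paginated fetch and schema mapping):

var cached = cache.getString('cache_' + hashOfReportParameters + '_final');
var apiResults;
if (cached !== null) {
  apiResults = JSON.parse(cached);
} else {
  // Backup plan: the cached copy is missing or unreadable, so traverse the API again.
  apiResults = fetchAllPagesFromApi(request);
}
return buildResponseForGds(request, apiResults);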
As you start implementing this, you'll notice yet another problem: the Apps Script cache is limited to roughly 100KB per key. There is, however, no practical limit on the number of keys you can cache, and fortunately others have hit the same need and come up with a smart solution: splitting the big chunk you need cached across multiple cache keys and gluing it back together into one object on retrieval.
See: https://github.com/lwbuck01/GASs/blob/b5885e34335d531e00f8d45be4205980d91d976a/EnhancedCacheService/EnhancedCache.gs
I cannot share the final solution we have implemented with you as it is too specific to a client - but I hope that this will at least give you a good idea on how to approach the problem.
Caching the full API result is a good idea in general: it avoids needless round trips and server load whenever near-real-time data is good enough for your needs.

Python 3.4 Sockets sendall function

import socket

def functions():
    print("hello")

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('192.168.137.1', 20000)
sock.bind(server_address)
sock.listen(1)
conn, addr = sock.accept()
print('Connected by', addr)
conn.sendall(b"Welcome to the server")
My question is how to send a function to the client.
I know that conn.sendall(b"Welcome to the server") will send data to the client, which can then be decoded.
I would like to know how to send a function to a client, like
conn.sendall(function()) - this does not work
I would also like to know which function would allow the client to receive the function I am sending.
I have looked on the Python website for a function that could do this, but I have not found one.
The functionality you are requesting is fundamentally impossible unless it is explicitly coded on the client side. If it were possible, one could write a virus that easily spreads to any remote machine. Instead, it is the client's responsibility to decode incoming data in whatever manner it chooses.
If the client really does want to receive code to execute, the issue is that the code must be represented in a form which, at the same time,
- is detached from the server context and its specifics, so it can be serialized and executed anywhere, and
- allows secure execution in some kind of sandbox, because very few clients will allow arbitrary server code to do just anything on the client side.
The latter is an extremely complex topic; read the security history of any web browser - most of the closed vulnerabilities are issues in exactly this kind of sandboxing.
(There are environments where such execution is allowed and desired, e.g. an Erlang cookie-based peering cluster. But in such a cluster, side B is also allowed to execute anything at side A.)
You should start by searching for an execution environment (a high-level virtual machine) that meets your needs for functionality and security. For Python, look at the multiprocessing module: its worker pool implementation doesn't pass the code itself, but it simplifies passing data for execution requests. Passing arbitrary Python data (without functions) is covered by the marshal and pickle modules.
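A minimal sketch of the usual pattern, assuming the client agrees up front on which commands exist (the command names and framing are made up for illustration; note that unpickling data from an untrusted peer is itself unsafe, so prefer json for anything you don't control):

import pickle

# server side: send data describing what to do, never code itself
def send_command(conn, command, *args):
    payload = pickle.dumps({"command": command, "args": args})
    conn.sendall(len(payload).to_bytes(4, "big") + payload)  # length-prefixed message

# client side: the client decides which functions exist locally
def greet(message):
    print(message)

COMMANDS = {"greet": greet}

def _recv_exactly(sock, n):
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data

def receive_command(sock):
    length = int.from_bytes(_recv_exactly(sock, 4), "big")
    msg = pickle.loads(_recv_exactly(sock, length))
    COMMANDS[msg["command"]](*msg["args"])

On the server from the question, send_command(conn, "greet", "Welcome to the server") replaces the raw sendall call; the client then runs its own local greet function.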

What are the best options for video streaming or maximum data transfer using SuperWebSocket?

Minimum to achieve: send roughly 1 MB/second or more to other WebSocket clients.
Questions:
Is video streaming possible with SuperWebSocket?
Which SuperWebSocket options/features (async mode, JsonCommands, custom sessions, etc.) can be used to achieve the fastest data transfer?
How do I sequence big data sent in chunks if it is received out of order at the client or server side? Is there anything built in to sequence these chunks, or do I have to manually send sequence numbers in the message itself?
What I have tried:
Multiple secure sessions on the same port with different paths in the JavaScript code:
ws = new WebSocket("wss://localhost:8089/1/1")
ws = new WebSocket("wss://localhost:8089/2/2")
ws = new WebSocket("wss://localhost:8089/3/3")
With the above sessions I send large data in chunks, but it is not received in the expected order on the server/client side, and after successfully sending a large chunk (size = 55000 kb) that session closes automatically!
I am looking into the sample projects of SuperWebSocket but am not sure where to go. I am open to trying any option inside SuperWebSocket. Thanks
1) I am not sure whether it does, but if it provides an API to send byte[], that may be enough.
2) No idea about this one; the documentation may explain it.
3) What do you mean, without order? WebSocket is TCP-based, so data segments sent over the same connection will arrive in the same order they were sent.
4) Why would you open different connections to the same site? There are probably also limits on how many connections you can open to the same host. One should be OK; opening several is not going to increase your bandwidth, it will only increase your problems.
I develop a WebSocket server component that handles messages as Stream-derived objects and has acceptable performance so far; you may like to give it a try.

InstanceContextMode.Single in WCF wsHttpBinding,webHttpBinding and REST

I have recently started development on a relatively simple WCF REST service which returns JSON formatted results. At first everything worked great, and the service was quickly up and running.
The main function of the service is to return a large chunk of data extracted from a database. This data rarely changes, so I decided to try to set up a caching mechanism to speed things up. To do this I planned to set InstanceContextMode.Single and ConcurrencyMode.Multiple, and then, with some thread locks, safely return a static cached result. Every 5 minutes or so, or whenever IIS decides to clear everything, the data would be re-fetched from the database.
My issue is that InstanceContextMode.Single does not behave as expected. My understanding is that a single instance of my WCF service class should be created and maintained. However, the behaviour I see is that a completely new instance of my class is created per call. This includes re-initialising all static variables.
I tried changing the web service from webHttpBinding (used for REST) to wsHttpBinding and using the service with a SOAP configuration, but this results in exactly the same behaviour.
What am I doing wrong? I have spent way too long trying to figure this out.
Any help would be great!
Strange, can you try this and tell me what happens then?
ServiceThrottlingBehavior ThrottleBehavior = new ServiceThrottlingBehavior();
ThrottleBehavior.MaxConcurrentSessions = 1;
ThrottleBehavior.MaxConcurrentCalls = 1;
ThrottleBehavior.MaxConcurrentInstances = 1;
ServiceHost Host = ...
Host.Description.Behaviors.Add(ThrottleBehavior);
And how do you know your single service instance isn't "Single"? Did you see multiple database connections in the profiler? Is that what suggested to you that your service isn't a single instance? In your service operation implementation, do you do some of the work on a separate thread?
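For comparison, here is a minimal sketch of the singleton-plus-cache setup the question describes (the service and member names are placeholders; the key detail is that the ServiceBehavior attribute goes on the service implementation class):

using System;
using System.ServiceModel;

[ServiceContract]
public interface IReportService
{
    [OperationContract]
    string GetData();
}

// Single instance, concurrent calls, thread-safe 5-minute cache.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class ReportService : IReportService
{
    private readonly object _cacheLock = new object();
    private string _cachedJson;
    private DateTime _cachedAtUtc;

    public string GetData()
    {
        lock (_cacheLock)
        {
            if (_cachedJson == null || DateTime.UtcNow - _cachedAtUtc > TimeSpan.FromMinutes(5))
            {
                _cachedJson = LoadFromDatabase(); // placeholder for the real query
                _cachedAtUtc = DateTime.UtcNow;
            }
            return _cachedJson;
        }
    }

    private string LoadFromDatabase()
    {
        return "{}"; // placeholder
    }
}

Also keep in mind that when IIS recycles the application pool, the singleton and its statics are discarded, which can look like a brand-new instance per call even with this configuration.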