My program exits before all the tasks have been done. What do I have to change in my code so that it completes all tasks before ending?
package main

import (
    "fmt"
    "math/rand"
    "time"
)

// run x tasks at random intervals
// - a task is a goroutine that runs for 2 seconds.
// - a task runs concurrently with the other tasks
// - the interval between tasks is between 0 and 2 seconds
func main() {
    // set x to the number of tasks
    x := 4
    // random number generation initialization
    random := rand.New(rand.NewSource(1234))
    for num := 0; num < x; num++ {
        // sleep for a random number of milliseconds before starting a new task
        duration := time.Millisecond * time.Duration(random.Intn(2000))
        time.Sleep(duration)
        // run a task
        go func() {
            // this is the work, expressed by sleeping for 2 seconds
            time.Sleep(2 * time.Second)
            fmt.Println("task done")
        }()
    }
}
Yes, as #Laney mentions, this can be done using both WaitGroups and channels. Refer to the code below.
WaitGroups:
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

// run x tasks at random intervals
// - a task is a goroutine that runs for 2 seconds.
// - a task runs concurrently with the other tasks
// - the interval between tasks is between 0 and 2 seconds
func main() {
    // set x to the number of tasks
    x := 4
    var wg sync.WaitGroup
    // random number generation initialization
    random := rand.New(rand.NewSource(1234))
    for num := 0; num < x; num++ {
        // sleep for a random number of milliseconds before starting a new task
        duration := time.Millisecond * time.Duration(random.Intn(2000))
        time.Sleep(duration)
        // add the task to the WaitGroup before starting it
        wg.Add(1)
        // run a task
        go func() {
            // this is the work, expressed by sleeping for 2 seconds
            time.Sleep(2 * time.Second)
            fmt.Println("task done")
            wg.Done()
        }()
    }
    // block until every task has called Done
    wg.Wait()
    fmt.Println("All tasks done")
}
Output:
task done
task done
task done
task done
All tasks done
On playground : https://play.golang.org/p/V-olyX9Qm8
Using channels:
package main

import (
    "fmt"
    "math/rand"
    "time"
)

// run x tasks at random intervals
// - a task is a goroutine that runs for 2 seconds.
// - a task runs concurrently with the other tasks
// - the interval between tasks is between 0 and 2 seconds
func main() {
    // channel to indicate completion of a task; it can also carry a result value
    results := make(chan int)
    // set x to the number of tasks
    x := 4
    t := 0 // task tracker
    // random number generation initialization
    random := rand.New(rand.NewSource(1234))
    for num := 0; num < x; num++ {
        // sleep for a random number of milliseconds before starting a new task
        duration := time.Millisecond * time.Duration(random.Intn(2000))
        time.Sleep(duration)
        // run a task
        go func() {
            // this is the work, expressed by sleeping for 2 seconds
            time.Sleep(2 * time.Second)
            fmt.Println("task done")
            results <- 1 // maybe something relevant to the task
        }()
    }
    // iterate over the channel until results for all tasks have been received
    for result := range results {
        fmt.Println("Got result", result)
        t++
        if t == x {
            close(results)
        }
    }
    fmt.Println("All tasks done")
}
Output:
task done
task done
Got result 1
Got result 1
task done
Got result 1
task done
Got result 1
All tasks done
Playground : https://play.golang.org/p/yAFdDj5nhb
In Go, as in most languages, the process exits when the entrypoint main() function returns.
Because you're spawning a number of goroutines, the main function returns before the goroutines are all done, so the process exits without finishing them.
As others have suggested, you want to block your main() function until all the goroutines are done. The two most common ways to do that are a sync.WaitGroup or channels (see Go by Example); a minimal channel sketch follows.
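For illustration, a minimal sketch of the channel approach under the same assumptions as the question (four tasks, each sleeping for 2 seconds): every goroutine sends on a done channel when it finishes, and main receives exactly x times before returning.

package main

import (
    "fmt"
    "time"
)

func main() {
    x := 4
    done := make(chan struct{})
    for num := 0; num < x; num++ {
        go func() {
            // the task's work, expressed by sleeping for 2 seconds
            time.Sleep(2 * time.Second)
            fmt.Println("task done")
            // signal completion
            done <- struct{}{}
        }()
    }
    // receive one completion signal per task before exiting
    for num := 0; num < x; num++ {
        <-done
    }
    fmt.Println("All tasks done")
}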
There are many options. For example, you can use channels or sync.WaitGroup.
The program ends when the main goroutine ends.
You may use:
a WaitGroup - a very convenient way to wait until all tasks are done
channels - a read from a channel blocks until new data arrives or the channel is closed
a naïve sleep - good only for example purposes (see the sketch below)
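For completeness, a sketch of the naïve sleep approach: main simply waits longer than any task can take. It is only suitable for demonstration, because the wait time is guessed rather than tied to actual task completion.

package main

import (
    "fmt"
    "time"
)

func main() {
    x := 4
    for num := 0; num < x; num++ {
        go func() {
            // the task's work, expressed by sleeping for 2 seconds
            time.Sleep(2 * time.Second)
            fmt.Println("task done")
        }()
    }
    // naïve: sleep longer than any task should take; nothing guarantees completion
    time.Sleep(3 * time.Second)
    fmt.Println("All tasks (hopefully) done")
}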
Related
I wrote code that behaves weirdly and slowly, and I can't understand why.
What I'm trying to do is to download data from BigQuery (using a query as an input) to a CSV file, then create a URL link to this CSV so people can download it as a report.
I'm trying to optimize the process of writing the CSV, as it takes some time and shows some weird behavior.
The code iterates over the BigQuery results and passes each row to a channel for later parsing/writing using Go's encoding/csv package.
These are the relevant parts, with some debugging:
func (s *Service) generateReportWorker(ctx context.Context, query, reportName string) error {
    it, err := s.bigqueryClient.Read(ctx, query)
    if err != nil {
        return err
    }
    filename := generateReportFilename(reportName)
    gcsObj := s.gcsClient.Bucket(s.config.GcsBucket).Object(filename)
    wc := gcsObj.NewWriter(ctx)
    wc.ContentType = "text/csv"
    wc.ContentDisposition = "attachment"

    csvWriter := csv.NewWriter(wc)

    var doneCount uint64
    go backgroundTimer(ctx, it.TotalRows, &doneCount)

    rowJobs := make(chan []bigquery.Value, it.TotalRows)
    workers := 10
    wg := sync.WaitGroup{}
    wg.Add(workers)

    // start workers pool
    for i := 0; i < workers; i++ {
        go func(c context.Context, num int) {
            defer wg.Done()
            for row := range rowJobs {
                records := make([]string, len(row))
                for j, r := range row {
                    records[j] = fmt.Sprintf("%v", r)
                }
                s.mu.Lock()
                start := time.Now()
                if err := csvWriter.Write(records); err != nil {
                    log.Errorf("Error writing row: %v", err)
                }
                if time.Since(start) > time.Second {
                    fmt.Printf("worker %d took %v\n", num, time.Since(start))
                }
                s.mu.Unlock()
                atomic.AddUint64(&doneCount, 1)
            }
        }(ctx, i)
    }

    // read results from bigquery and add to the pool
    for {
        var row []bigquery.Value
        if err := it.Next(&row); err != nil {
            if err == iterator.Done || err == context.DeadlineExceeded {
                break
            }
            log.Errorf("Error loading next row from BQ: %v", err)
        }
        rowJobs <- row
    }

    fmt.Println("***done loop!***")

    close(rowJobs)
    wg.Wait()

    csvWriter.Flush()
    wc.Close()

    url := fmt.Sprintf("%s/%s/%s", s.config.BaseURL, s.config.GcsBucket, filename)
    // ....
}
func backgroundTimer(ctx context.Context, total uint64, done *uint64) {
    ticker := time.NewTicker(10 * time.Second)
    go func() {
        for {
            select {
            case <-ctx.Done():
                ticker.Stop()
                return
            case <-ticker.C:
                fmt.Printf("progress (%d,%d)\n", atomic.LoadUint64(done), total)
            }
        }
    }()
}
The BigQuery Read func:
func (c *Client) Read(ctx context.Context, query string) (*bigquery.RowIterator, error) {
    job, err := c.bigqueryClient.Query(query).Run(ctx)
    if err != nil {
        return nil, err
    }
    it, err := job.Read(ctx)
    if err != nil {
        return nil, err
    }
    return it, nil
}
I run this code with a query that returns about 400,000 rows. The query itself takes around 10 seconds, but the whole process takes around 2 minutes.
The output:
progress (112346,392565)
progress (123631,392565)
***done loop!***
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
worker 3 took 1m16.728143875s
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
worker 3 took 1m13.525662666s
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
worker 4 took 1m17.576536375s
progress (392565,392565)
You can see that writing the first 112,346 rows was fast, but then for some reason worker 3 took 1m16s (!!!) to write a single row, which caused the other workers to wait for the mutex to be released. This happened 2 more times, which caused the whole process to take more than 2 minutes to finish.
I'm not sure what's going on or how to debug this further. Why do I have these stalls in the execution?
As suggested by #serge-v, you can write all the records to a local file and then transfer the file as a whole to GCS. To make the process take less time, you can split the file into multiple chunks and use the command gsutil -m cp -j, where:
gsutil is used to access Cloud Storage from the command line
-m is used to perform a parallel multi-threaded/multi-processing copy
cp is used to copy files
-j applies gzip transport encoding to any file upload. This also saves network bandwidth while leaving the data uncompressed in Cloud Storage.
To run this command from your Go program you can refer to this GitHub link; a rough sketch is also shown below.
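As an illustration only: the chunk directory, file pattern, and bucket below are hypothetical placeholders, and gsutil must be installed and authenticated on the machine running the program. Shelling out from Go could look roughly like this:

package main

import (
    "log"
    "os/exec"
)

func main() {
    // Hypothetical paths: copy local CSV chunks to a GCS bucket in parallel (-m),
    // applying gzip transport encoding (-j) to files with the csv extension.
    // gsutil expands the wildcard itself, so no shell is needed.
    cmd := exec.Command("gsutil", "-m", "cp", "-j", "csv", "./report-chunks/*.csv", "gs://my-bucket/reports/")
    out, err := cmd.CombinedOutput()
    if err != nil {
        log.Fatalf("gsutil failed: %v\n%s", err, out)
    }
    log.Printf("gsutil output:\n%s", out)
}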
You could try adding profiling to your Go program. Profiling will help you analyze the program's complexity and find where the time is being spent; a generic sketch is shown below.
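For instance, a CPU profile can be captured with the standard runtime/pprof package (the profile file name here is arbitrary) and inspected afterwards with go tool pprof; this is a generic sketch, not tied to the code above:

package main

import (
    "log"
    "os"
    "runtime/pprof"
)

func main() {
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // profile CPU usage for the lifetime of the program
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    // ... run the report generation here ...
}

// Inspect the result with: go tool pprof cpu.prof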
Since you are reading hundreds of thousands of rows from BigQuery, you can try using the BigQuery Storage API. It provides faster access to BigQuery-managed storage than bulk data export, so using the BigQuery Storage API rather than the row iterator you are using in the Go program can make the process faster.
For further reference you can also look into the query optimization techniques provided by BigQuery.
I have installed libvirt-dev, and compiled and run this code on an Ubuntu box:
package main

import (
    "fmt"

    "github.com/libvirt/libvirt-go"
)

func main() {
    conn, _ := libvirt.NewConnect("qemu:///system")
    defer conn.Close()

    cb := func(c *libvirt.Connect, d *libvirt.Domain, event *libvirt.DomainEventLifecycle) {
        fmt.Println(fmt.Sprintf("Event %d", event.Event))
    }

    _, err := conn.DomainEventLifecycleRegister(nil, cb)
    if err != nil {
        panic(fmt.Sprintf("cannot register libvirt domain event: %s", err))
    }
}
And got: cannot register libvirt domain event: virError(Code=1, Domain=0, Message='internal error: could not initialize domain event timer')
I'm using libvirt-go, while DigitalOcean's go-libvirt LifecycleEvents just works fine...
Any ideas?
You've not registered any event loop implementation.
The easy way is to call EventRegisterDefaultImpl before opening a libvirt connection, and then spawn a goroutine that runs EventRunDefaultImpl in an infinite loop; a sketch is shown below.
The harder way is to provide your own custom event loop implementation using EventRegisterImpl.
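A minimal sketch of the easy approach, assuming the same libvirt-go package, connection URI, and callback as in the question (error handling kept simple):

package main

import (
    "fmt"

    "github.com/libvirt/libvirt-go"
)

func main() {
    // Register the default event loop implementation before opening the connection.
    if err := libvirt.EventRegisterDefaultImpl(); err != nil {
        panic(err)
    }

    // Run event loop iterations in a background goroutine.
    go func() {
        for {
            if err := libvirt.EventRunDefaultImpl(); err != nil {
                panic(err)
            }
        }
    }()

    conn, err := libvirt.NewConnect("qemu:///system")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    cb := func(c *libvirt.Connect, d *libvirt.Domain, event *libvirt.DomainEventLifecycle) {
        fmt.Printf("Event %d\n", event.Event)
    }
    if _, err := conn.DomainEventLifecycleRegister(nil, cb); err != nil {
        panic(fmt.Sprintf("cannot register libvirt domain event: %s", err))
    }

    // Block so the callback has a chance to fire (for demonstration only).
    select {}
}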
I have written an application in Haskell that does the following:
Recursively list a directory,
Parse the JSON files from the directory list,
Look for matching key-value pairs, and
Return filenames where matches have been found.
My first version of this application was the simplest, naive version I could write, but I noticed that space usage seemed to increase monotonically.
As a result, I switched to conduit, and now my primary functionality looks like this:
conduitFilesFilter :: ProjectFilter -> Path Abs Dir -> IO [Path Abs File]
conduitFilesFilter projFilter dirname' = do
  (_, allFiles) <- listDirRecur dirname'
  C.runConduit $
    C.yieldMany allFiles
      .| C.filterMC (filterMatchingFile projFilter)
      .| C.sinkList
Now my application has bounded memory usage but it's still quite slow. Out of this, I have two questions.
1)
I used stack new to generate the skeleton for this application, and by default it uses the GHC options -threaded -rtsopts -with-rtsopts=-N.
The surprising thing (to me) is that the application uses all processors available to it (about 40 in the target machine) when I actually go to run it. However, I didn't write any part of the application to be run in parallel (I considered it, actually).
What's running in parallel?
2)
Additionally, most of the JSON files are really large (10 MB) and there are probably 500k of them to be traversed. This means my program is very slow as a result of all the Aeson decoding. My idea was to run my filterMatchingFile part in parallel, but looking at the stm-conduit library, I can't see an obvious way to run this middle action in parallel across a handful of processors.
Can anyone suggest a way to smartly parallelize my function above using stm-conduit or some other means?
Edit
I realized that I could break up my readFile -> decodeObject -> runFilterFunction into separate parts of the conduit and then I could use stm-conduit there with a bounded channel. Maybe I'll give it a shot...
I ran my application with +RTS -s (I reconfigured it to -N4) and I see the following:
115,961,554,600 bytes allocated in the heap
35,870,639,768 bytes copied during GC
56,467,720 bytes maximum residency (681 sample(s))
1,283,008 bytes maximum slop
145 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 108716 colls, 108716 par 76.915s 20.571s 0.0002s 0.0266s
Gen 1 681 colls, 680 par 0.530s 0.147s 0.0002s 0.0009s
Parallel GC work balance: 14.99% (serial 0%, perfect 100%)
TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.007s elapsed)
MUT time 34.813s ( 42.938s elapsed)
GC time 77.445s ( 20.718s elapsed)
EXIT time 0.000s ( 0.010s elapsed)
Total time 112.260s ( 63.672s elapsed)
Alloc rate 3,330,960,996 bytes per MUT second
Productivity 31.0% of total user, 67.5% of total elapsed
gc_alloc_block_sync: 188614
whitehole_spin: 0
gen[0].sync: 33
gen[1].sync: 811204
From your program description, there is no reason for it to have increasing memory usage. I think it was an accidental memory leak from a missed lazy computation. This can be easily detected by heap profiling: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#hp2ps-rendering-heap-profiles-to-postscript. Another possible reason is that the runtime does not release all memory back to the OS. Up to some threshold, it will keep holding on to memory proportional to the largest file processed. This may look like a memory leak if tracked through the process RSS size.
The -A32m option increases the nursery size. It lets your program allocate more memory before garbage collection is triggered. The stats show that very little memory is retained during GC, so the less often it happens, the more time the program spends doing actual work.
Prompted by Michael Snoyman on Haskell Cafe, who pointed out that my first version was not truly taking advantage of Conduit's streaming capabilities, I rewrote my Conduit version of the application (without using stm-conduit). This was a large improvement: my first Conduit version was operating over all data and I didn't realize this.
I also increased the nursery size and this increased my productivity by doing garbage collection less frequently.
My revised function ended up looking like this:
module Search where

import Conduit ((.|))
import qualified Conduit as C
import Control.Monad
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Resource (MonadResource)
import qualified Data.ByteString as B
import Data.List (isPrefixOf)
import Data.Maybe (fromJust, isJust)
import System.Path.NameManip (guess_dotdot, absolute_path)
import System.FilePath (addTrailingPathSeparator, normalise)
import System.Directory (getHomeDirectory)

import Filters

sourceFilesFilter :: (MonadResource m, MonadIO m) => ProjectFilter -> FilePath -> C.ConduitM () String m ()
sourceFilesFilter projFilter dirname' =
  C.sourceDirectoryDeep False dirname'
    .| parseProject projFilter

parseProject :: (MonadResource m, MonadIO m) => ProjectFilter -> C.ConduitM FilePath String m ()
parseProject (ProjectFilter filterFunc) = do
  C.awaitForever go
  where
    go path' = do
      bytes <- liftIO $ B.readFile path'
      let isProj = validProject bytes
      when (isJust isProj) $ do
        let proj' = fromJust isProj
        when (filterFunc proj') $ C.yield path'
My main just runs the conduit and prints those that pass the filter:
mainStreamingConduit :: IO ()
mainStreamingConduit = do
  options <- getRecord "Search JSON Files"
  let filterFunc = makeProjectFilter options
  searchDir <- absolutize (searchPath options)
  itExists <- doesDirectoryExist searchDir
  case itExists of
    False -> putStrLn "Search Directory does not exist" >> exitWith (ExitFailure 1)
    True -> C.runConduitRes $ sourceFilesFilter filterFunc searchDir .| C.mapM_C (liftIO . putStrLn)
I run it like this (without the stats, typically):
stack exec search-json -- --searchPath $FILES --name NAME +RTS -s -A32m -n4m
Without increasing the nursery size, I get a productivity of around 30%. With the above, however, it looks like this:
72,308,248,744 bytes allocated in the heap
733,911,752 bytes copied during GC
7,410,520 bytes maximum residency (8 sample(s))
863,480 bytes maximum slop
187 MB total memory in use (27 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 580 colls, 580 par 2.731s 0.772s 0.0013s 0.0105s
Gen 1 8 colls, 7 par 0.163s 0.044s 0.0055s 0.0109s
Parallel GC work balance: 35.12% (serial 0%, perfect 100%)
TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.006s elapsed)
MUT time 26.155s ( 31.602s elapsed)
GC time 2.894s ( 0.816s elapsed)
EXIT time -0.003s ( 0.008s elapsed)
Total time 29.048s ( 32.432s elapsed)
Alloc rate 2,764,643,665 bytes per MUT second
Productivity 90.0% of total user, 97.5% of total elapsed
gc_alloc_block_sync: 3494
whitehole_spin: 0
gen[0].sync: 15527
gen[1].sync: 177
I'd still like to figure out how to parallelize the filterProj . parseJson . readFile part, but for now I'm satisfied with this.
I figured out how to run this application using stm-conduit with some help from the Haskell wiki on parallelism and a Stack Overflow answer that talks about waiting for threads to end before main exits.
The way it works is that I create a channel that holds all of the filenames to be operated on. Then, I fork a bunch of threads that each runs a Conduit with the filepath-channel as a Source. I track all of the child threads and wait for them to finish.
Maybe this solution will be useful for someone else?
Not all of my lower-level filter functions are present, but the gist of it is that I have a Conduit that tests some JSON. If it passes, then it yields the FilePath.
Here's my main module in its entirety:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Conduit ((.|))
import qualified Conduit as C
import Control.Concurrent
import Control.Monad (forM_)
import Control.Monad.IO.Class (liftIO)
import Control.Concurrent.STM
import Control.Monad.Trans.Resource (register)
import qualified Data.Conduit.TMChan as STMChan
import Data.Maybe (isJust, fromJust)
import qualified Data.Text as T
import Options.Generic
import System.Directory (doesDirectoryExist)
import System.Exit

import Search

data Commands =
  Commands { searchPath :: String
           , par :: Maybe Int
           , project :: Maybe T.Text
           , revision :: Maybe T.Text
           } deriving (Generic, Show)

instance ParseRecord Commands

makeProjectFilter :: Commands -> ProjectFilter
makeProjectFilter options =
  let stdFilts = StdProjectFilters
        (ProjName <$> project options)
        (Revision <$> revision options)
  in makeProjectFilters stdFilts

main :: IO ()
main = do
  options <- getRecord "Search JSON Files"
  -- Would user like to run in parallel?
  let runner = if isJust $ par options
        then mainSTMConduit (fromJust $ par options)
        else mainStreamingConduit

  -- necessary things to search files: search path, filters to use, search dir exists
  let filterFunc = makeProjectFilter options
  searchDir <- absolutize (searchPath options)
  itExists <- doesDirectoryExist searchDir

  -- Run it if it exists
  case itExists of
    False -> putStrLn "Search Directory does not exist" >> exitWith (ExitFailure 1)
    True -> runner filterFunc searchDir

-- Single-threaded version with bounded memory usage
mainStreamingConduit :: ProjectFilter -> FilePath -> IO ()
mainStreamingConduit filterFunc searchDir = do
  C.runConduitRes $
    sourceFilesFilter filterFunc searchDir .| C.mapM_C (liftIO . putStrLn)

-- Multiple-threaded version of this program using channels from `stm-conduit`
mainSTMConduit :: Int -> ProjectFilter -> FilePath -> IO ()
mainSTMConduit nrWorkers filterFunc searchDir = do
  children <- newMVar []
  inChan <- atomically $ STMChan.newTBMChan 16
  _ <- forkIO . C.runResourceT $ do
    _ <- register $ atomically $ STMChan.closeTBMChan inChan
    C.runConduitRes $ C.sourceDirectoryDeep False searchDir .| STMChan.sinkTBMChan inChan True
  forM_ [1..nrWorkers] (\_ -> forkChild children $ runConduitChan inChan filterFunc)
  waitForChildren children
  return ()

runConduitChan :: STMChan.TBMChan FilePath -> ProjectFilter -> IO ()
runConduitChan inChan filterFunc = do
  C.runConduitRes $
    STMChan.sourceTBMChan inChan
      .| parseProject filterFunc
      .| C.mapM_C (liftIO . putStrLn)

waitForChildren :: MVar [MVar ()] -> IO ()
waitForChildren children = do
  cs <- takeMVar children
  case cs of
    [] -> return ()
    m:ms -> do
      putMVar children ms
      takeMVar m
      waitForChildren children

forkChild :: MVar [MVar ()] -> IO () -> IO ThreadId
forkChild children io = do
  mvar <- newEmptyMVar
  childs <- takeMVar children
  putMVar children (mvar:childs)
  forkFinally io (\_ -> putMVar mvar ())
Note: I'm using stm-conduit 3.0.0 with conduit 1.12.1, which is why I needed to include the boolean argument:
STMChan.sinkTBMChan inChan True
In version 4.0.0 of stm-conduit, this function automatically closes the channel, so the boolean argument has been removed.
I'm having serious problems with MySQL using pthreads. The error I get after ending my program is:
"Error in my_thread_global_end(): 1 threads didn't exit"
I called mysql_library_init in main before starting any threads. For the sake of it, I just started 1 thread. After the thread is closed (using pthread_join), I call mysql_library_end in main. In the pthread itself I call mysql_init. For some reason this seems incorrect, because I get the error. I use MySQL 5.6 and link with libmysqlclient.a.
The MySQL manual is extremely unclear and contradictory, so I hope someone with a logical mind can explain this to me:
"In a nonmulti-threaded environment, mysql_init invokes mysql_library_init automatically as necessary. However, mysql_library_init is not thread-safe in a multi-threaded environment, and thus neither is mysql_init. Before calling mysql_init, either call mysql_library_init prior to spawning any threads, or use a mutex to protect the mysql_library_init call. This should be done prior to any other client library call."
First line: So mysql_init ONLY invokes mysql_library_init in a non-multi-threaded environment "as necessary" (when is it necessary anyway in a non-multi-threaded environment?), so can I conclude from this that mysql_init() thinks it is NOT needed in a multi-threaded environment? I guess not, so fine, I call mysql_library_init in my main... Then I read everywhere that I should also call mysql_init within the thread after that. I want each thread to have its own connection, so fine, I also do that so each thread has its own MYSQL struct. But the manual says mysql_init is not thread-safe... Uhm, OK... So even with just 1 thread, I still have the problem...
main -> mysql_library_init
main -> create 1 pthread
pthread -> mysql_init
pthread -> mysql_real_connect
pthread -> mysql_close
....
I press Ctrl-C after a few seconds (MySQL was closed by now in the thread), so the cleanup starts:
main -> pthread_cancel
main -> pthread_join
main -> mysql_library_end
RESULT: Error in my_thread_global_end: 1 threads didn't exit
........
int main( void )
{
    if ( mysql_library_init( 0, NULL, NULL ) != 0 ) { ... }
    if ( mysql_thread_safe() ) { ... } // This goes fine

    sem_init( &queue.totalStored, 0, 0 );
    pthread_mutex_init( &mutex_bees, NULL );
    pthread_create( &workerbees[tid], &attr, BeeWork, ( void * ) tid );
    pthread_attr_destroy( &attr );

    while ( recv_signal == 0 )
    {
        errno = 0;
        sock_c = accept( sock_s, NULL, NULL );
        if ( ( sock_c == -1 ) && ( errno == EINTR ) )
        {
            // do stuff
            if ( recv_signal == SIGHUP ) { /* do stuff */ }
        } else { /* do stuff */ }
    }

    // CLEANUP
    close( sock_s );
    RC = pthread_cancel( workerbees[tid] );
    if ( RC != 0 ) { Log( L_ERR, "Unsuccessful pthread_cancel()" ); }

    // WAIT FOR THREADS TO FINISH WORK AND EXIT
    RC = pthread_join( workerbees[tid], &res );
    if ( RC != 0 ) { Log( L_ERR, "Error: Unsuccessful pthread_join()" ); }
    if ( res == PTHREAD_CANCELED )
    { /* print debug stuff */ }
    else { /* print debug stuff */ }

    mysql_library_end();
    sem_destroy( &queue.totalStored );
    exit( 0 );
}

void *BeeWork( void *t )
{
    // DISABLE SIGNALS THAT main() ALREADY LISTENS TO
    sigemptyset( &sigset );
    sigaddset( &sigset, SIGINT );
    sigaddset( &sigset, SIGTERM );
    sigaddset( &sigset, SIGQUIT );
    sigaddset( &sigset, SIGHUP );
    pthread_sigmask( SIG_BLOCK, &sigset, NULL );

    MYSQL *conn;
    conn = mysql_init( NULL );
    if ( ! mysql_real_connect( conn, server, prefs.mysql_user, prefs.mysql_pass, prefs.mysql_db, 0, prefs.mysql_sock, 0 ) ) { /* error */ }
    mysql_close( conn );

    // Do stuff
    ...

    pthread_exit( ( void * ) t );
}
I guess I can answer my own question. I found out that my pthread cleanup handler (installed with pthread_cleanup_push) was not executed, because the end of the code with pthread_exit was reached before main could cancel the thread. I had pthread_cleanup_pop( 0 ) and changed it to pthread_cleanup_pop( 1 ), so the cleanup handler also gets executed when the thread exits before main can cancel it. In this cleanup handler, mysql_thread_end now actually gets a chance to run, and that fixed the problem.
Addition: Apparently, the MySQL client library is not designed to use connection handles (MYSQL*) across threads.
Do not pass a MYSQL* from one thread to another.
Do not use std::async or similar in conjunction with MySQL functions. (Code may or may not be executed in a separate thread.)
So I have the following test Go code, which is designed to read from a binary file through stdin and send the data read to a channel (where it would then be processed further). In the version I've given here, it only reads the first two values from stdin, although that's fine as far as showing the problem is concerned.
package main

import (
    "fmt"
    "io"
    "os"
)

func input(dc chan []byte) {
    data := make([]byte, 2)
    var err error
    var n int
    for err != io.EOF {
        n, err = os.Stdin.Read(data)
        if n > 0 {
            dc <- data[0:n]
        }
    }
}

func main() {
    dc := make(chan []byte, 1)
    go input(dc)
    fmt.Println(<-dc)
}
To test it, I first build it using go build, and then send data to it using the command:
./inputtest < data.bin
The data I am using currently to test is just random binary data created using the openssl command.
The problem I am having is that it misses the first values from stdin, and only gives the second and later values. I think this has to do with the channel, as the same program with the channel removed produces the correct data. Has anyone come across this before? For example, I get the following output when running this command:
./inputtest < data.bin
[36 181]
Whereas I should be getting:
./inputtest < data.bin
[72 218]
(The binary data is the same in both instances.)
You're overwriting your buffer on every read, and you've got a buffered channel, so the reader can move on and overwrite the slice it just sent before the receiver has looked at it; you'll lose data every time there's space in the channel.
Try something like this (not tested, written on tablet, etc...):
import "os"
func input(dc chan []byte) error {
defer close(dc)
for {
data := make([]byte, 2)
n, err := os.Stdin.Read(data)
if n > 0 {
dc <- data[0:n]
}
if err != nil {
return err
}
}
return nil
}
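For completeness, a sketch of a main that consumes everything the reworked input sends, ranging over the channel until it is closed (same two-byte reads as in the question):

package main

import (
    "fmt"
    "os"
)

// input reads stdin in 2-byte chunks, allocating a fresh slice per read,
// and closes the channel when stdin is exhausted (as in the answer above).
func input(dc chan []byte) error {
    defer close(dc)
    for {
        data := make([]byte, 2)
        n, err := os.Stdin.Read(data)
        if n > 0 {
            dc <- data[0:n]
        }
        if err != nil {
            return err
        }
    }
}

func main() {
    dc := make(chan []byte, 1)
    go input(dc)
    // range keeps receiving until input closes the channel
    for chunk := range dc {
        fmt.Println(chunk)
    }
}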