I want to clarify something about the functionality of tf.data.
Is this library an example of incremental/progressive loading (see the "5. Stream Data or Use Progressive Loading" section here)?
The reason for doing this would be that the whole dataset does not need to be held in memory, only one batch of it at a time.
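Concretely, I mean something like this minimal sketch (big_file.csv is a made-up name), where only one batch of lines is materialised at a time:

import tensorflow as tf

dataset = (tf.data.TextLineDataset('big_file.csv')  # streamed from disk, never loaded whole
           .skip(1)                                  # skip the CSV header
           .batch(128)                               # only 128 lines are materialised at once
           .prefetch(1))                             # fetch the next batch while the current one is consumed

for batch in dataset.take(2):
    print(batch.shape)  # (128,) - a tensor of 128 raw CSV lines per step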
Related
Is there a way to upload a file to IPFS chunk-by-chunk using the js-ipfs-http-client package? To improve the concurrency model of my application, I’d like to avoid holding complete files in memory and instead just work with the chunks.
In the source, I can see a method called start is not implemented. Is this supposed to enable this behavior in the long run?
I asked this question on discuss.ipfs.io as well a little while back.
I have been given a task to load a JSON file into TitanDB with DynamoDB as the back end. Is there a Java tutorial for this, or if possible could you please post some sample Java code...
Thanks.
Titan is an abstraction layer, so whether you use Cassandra, DynamoDB, HBase, etc., you merely need to find Titan data-loading instructions. They are a bit dated, but you might want to start with these blog posts:
http://thinkaurelius.com/2014/05/29/powers-of-ten-part-i/
http://thinkaurelius.com/2014/06/02/powers-of-ten-part-ii/
The code examples (the schema portions in particular) work with an older version of Titan, but the concepts still apply.
You will find that the strategy for data loading with Titan has a lot to do with the size of your graph. You said you are loading "a JSON file", so I imagine you have a smaller graph in the millions of edges. In that case, a simple Groovy script will likely suffice: write a script that parses your JSON and writes the data to Titan.
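Since you asked for Java specifically, here is a minimal sketch of that idea. It assumes Titan 1.0 with your DynamoDB backend already configured in a properties file, Jackson for the JSON parsing, and a made-up input format (an array of objects with name and age fields) - adjust it to whatever your file actually looks like:

import java.io.File;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class JsonLoader {
    public static void main(String[] args) throws Exception {
        // Open the graph using your DynamoDB-backed configuration
        TitanGraph graph = TitanFactory.open("conf/titan-dynamodb.properties");

        // Parse the JSON file (assumed here to be an array of {"name": ..., "age": ...} objects)
        JsonNode records = new ObjectMapper().readTree(new File("data.json"));

        for (JsonNode record : records) {
            Vertex v = graph.addVertex("person");             // vertex label
            v.property("name", record.get("name").asText());  // hypothetical fields
            v.property("age", record.get("age").asInt());
        }

        graph.tx().commit();  // for bigger files, commit in batches instead
        graph.close();
    }
}

Edges work the same way via v.addEdge("knows", otherVertex) once you have both vertices in hand.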
I have 50MB of CSV data. Is there any way I can compress the data that gets loaded into my d3.js / dc.js charts? The page is too slow at the moment and I would like to optimise it. Any help is much appreciated.
Thanks in advance
I think it would be best to implement a lazy-loading solution. The idea is simple: you create a small, say 2MB, CSV file and render your visualization from it. At the same time you start loading your full 50MB CSV.
Here is a small snippet:
var DS = {} // your app holder for keeping global scope clean

d3.csv('data/small.csv', function(err, smallCSV) {
  // Start loading the big file immediately
  d3.csv('data/big.csv', function(err, bigCSV) {
    DS.data = bigCSV // when the big data is loaded it replaces the old partial data
    DS.drawViz()     // redraw the viz
  })

  // This portion of code also runs immediately, while the big file is still loading
  DS.data = smallCSV
  DS.drawViz() // the function which has all your d3 code and uses DS.data inside
})
The change from small to big could be done in such a way that the user has no clue that something happened in the background. Consider this example, where a fairly big data file is loaded and you can feel the lag at the start. That app could load much faster if the data were loaded in two rounds.
That's a lot of data; give us a sample of the first couple of rows. What are you doing with it, and how much of it affects what's on screen? Where does the CSV come from (i.e., local or a web service)?
If it's a matter of downloading the resource, then depending on how common and large the values are, you may be able to refactor them into 1-byte keys with the definitions pre-loaded (hash maps are O(1) access). Also, if you're using a large amount of numerical data, perhaps a different number space (i.e., something that uses fewer characters than base 10) can shave some bytes off the final size, since the CSV values are strings.
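As a rough illustration of the key idea (the column names and map contents here are made up), you ship the definitions once and expand them client-side:

var regionNames = { a: 'Alaska', c: 'California', n: 'New York' } // pre-loaded definitions

d3.csv('data/compact.csv', function (err, rows) {
  rows.forEach(function (row) {
    row.region = regionNames[row.r] // O(1) hash-map lookup expands the 1-byte code
  })
  // hand `rows` to your d3 / dc.js charts as usual
})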
It sounds like CSV may not be the way to go, though, especially if your CSV is mostly unique strings or certain numerical data that won't benefit from the above optimizations. If you're loading the CSV from a web service, you could change it so that certain chunks are returned via some passed key (or handle it smarter server-side). So you would load only what you need at any given time, and probably cache it.
Finally, you could schedule multiple async calls to load the whole thing in small chunks, similar to what was suggested by leakyMirror. Since it would probably make the most sense to use a lot of chunks, you'd want to do it with code (instead of typing out all of those callbacks) and use an async event scheduler. I know there's a popular async library (https://github.com/caolan/async) that has a bunch of ways to do this, or you can write your own callback scheduler.
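A sketch of that last idea, assuming the server can expose the data as numbered chunk files (the data/chunk-N.csv names and the count of 10 are made up) and reusing the DS holder from the snippet above, with async's eachSeries doing the scheduling:

var chunkUrls = d3.range(10).map(function (i) { return 'data/chunk-' + i + '.csv' })

DS.data = []
async.eachSeries(chunkUrls, function (url, next) {
  d3.csv(url, function (err, rows) {
    if (err) { return next(err) }
    DS.data = DS.data.concat(rows)
    DS.drawViz() // refresh the charts as each chunk arrives
    next()
  })
}, function (err) {
  if (err) { console.error(err) }
})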
This is a more general software-architecture question for the MonoTouch / Xamarin environment.
Here's my problem:
The app I am currently building downloads around 30k JSON objects (6MB) on app launch. The data is then stored locally, so all screens make local DB (SQLite) calls.
The main issue is the time it takes to perform the download. At the moment it's about 36s total on the simulator, split between the following tasks:
download ~ 10 sec
data conversion (JSON to native objects) ~ 16 sec
db insert ~ 10 sec
This is far too long, especially when I compare it with similar apps that are on the App Store. I feel like I am doing something wrong here, or am not aware of an alternative approach. Here are the improvements I've implemented:
gzip response - currently 6MB, with gzip it goes down to ~1MB
installed the ServiceStack.Text JSON serialiser, about 2.5x faster than JSON.NET (but 16 seconds is still too long)
flattened the JSON response, so I can execute db.InsertAll() on the response array (without extra looping etc.) for a more robust DB import (transactions)
one call per day limitation
Now, what I want to do is display local data on app launch and initialise the download / updater in the background. The only problem is the time the download takes, plus newly installed apps won't have any local data to display...
My questions are:
Is MVC 4 API -> JSON conversion -> SQLite import a good approach for this type of app? If not, what are the alternatives?
I've been thinking of having the server return an actual SQLite file instead, in a zipped response, or returning zipped DB commands... Or perhaps SQLite is not suitable for this type of app? Are there any better alternatives for local storage - a .NET serializer, XML, etc.?
Thanks for all your suggestions!
My suggestion would be to do your work asynchronously - and you're lucky, since C# makes that very easy. E.g.:
Start a background download;
Process (in the background) the objects as they are downloaded;
Insert (in the background) objects as they are processed;
If applicable, update the UI (from the main thread) for every X objects you add;
Since the download is (mostly, see note) network bound, your CPU will be idle for many seconds. That's a waste of time, considering your next step (processing) will be CPU bound. Even more so since the step after that will likely be I/O bound (the database).
In other words, it looks like a good idea to run all three tasks simultaneously while giving feedback on the progress (showing data or a progress bar) to the application user.
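A very rough sketch of that pipeline - Item, DownloadAndParseItems(), InsertBatch() and ReportProgress() are hypothetical stand-ins for your model type, a streaming download + ServiceStack.Text deserialisation, a transaction-wrapped db.InsertAll() and a main-thread UI update:

// using System.Collections.Concurrent; using System.Collections.Generic; using System.Threading.Tasks;

var queue = new BlockingCollection<Item>(boundedCapacity: 1000);

// Producer: download and deserialise objects as they come off the wire (network/CPU bound)
var producer = Task.Run(() =>
{
    foreach (var item in DownloadAndParseItems()) // hypothetical streaming download + parse
        queue.Add(item);
    queue.CompleteAdding();
});

// Consumer: insert into SQLite in batches while the download is still running (I/O bound)
var consumer = Task.Run(() =>
{
    var batch = new List<Item>(100);
    foreach (var item in queue.GetConsumingEnumerable())
    {
        batch.Add(item);
        if (batch.Count == 100)
        {
            InsertBatch(batch);          // e.g. db.InsertAll(batch) inside a transaction
            ReportProgress(batch.Count); // marshal to the main thread to update the UI
            batch.Clear();
        }
    }
    if (batch.Count > 0)
        InsertBatch(batch);
});

Task.WaitAll(producer, consumer);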
Note #1: A gzipped response will download faster. OTOH it will take some extra (CPU) time to uncompress locally. It should still be faster overall, but it's worth measuring both options (e.g. using Apple's Instruments tool, which works nicely with Xamarin.iOS).
Note #2: A zip file, as a response, will also need extra time (to uncompress). That's not something you want to do sequentially after the download (but you could uncompress it as it's downloaded).
I have a big TIFF file which I don't want to load into memory all at once (that would make my application use a lot of memory). Instead, I want to load one target part of it at a time and show that part on screen.
I am trying to use the LibTiff.Net library to implement this, but I haven't found a suitable API for it.
Currently I can only load it by allocating a new (very big!) array and then calling the ReadRGBAImageOriented function to load the RGBA values into it.
Does anyone have experience with this?
Thanks