Does the IPFS JavaScript client support stream (chunked) uploads? - ipfs

Is there a way to upload a file to IPFS chunk-by-chunk using the js-ipfs-http-client package? To improve the concurrency model of my application, I’d like to avoid holding complete files in memory and instead just work with the chunks.
In the source, I can see a method called start is not implemented. Is this supposed to enable this behavior in the long run?
I asked this question on discuss.ipfs.io as well a little while back.

Related

Parsing Protocol-Buffers without .proto file

I am reverse-engineering an Android app as part of a security project. My first step is to discover the protocol exchanged between the app and server. I have found that the protocol being used is protocol buffers. Given the nature of protobuf, the original .proto file is needed to be able to unserialize the protobuf-encoded message. Since I don't have that, I used protod to disassemble the Android app and recover out any .proto files used.
I have the Android app in a form where it is a bunch of .smali and .so files. Running protod against the .so files yields only one .proto file -- google/protobuf/descriptor.proto.
I was under the impression that users of protocol buffers write their own .proto files, which might reference google/protobuf/descriptor.proto, but according to protod google/protobuf/descriptor.proto is the only protofile used by the app. Could this actually be possible and google/protobuf/descriptor.proto is enough for me to unserialize the messages between the app and server?
When you write a .proto file you can set an option optimize_for to LITE_RUNTIME (see here) and this will omit the descriptors from the generated code to reduce the size of the binary. I believe this is a common practice for mobile development since code size is a scarce resource in that environment. This may explain why you found only a single .proto file. It is unlikely that the app is actually transferring any data using descriptor.proto since that is mostly an implementation detail of the protocol buffers library.
If you cannot find any other descriptors, your best bet might be to try to interpret the protocol buffers without them. You can read about the protocol buffers wire format here. An easy way to get started would be to create a proto2 message type containing no fields and attempt to parse the data as that type. You can then use the reflection API to examine what are known as the "unknown fields" in the message and try to figure out what they represent.

Why can't I read a JSON file with a different program while my Processing sketch is still open?

I'm writing data to a JSON file in Processing with the saveJSONObject command. I would like to access that JSON file with another program (MAX/MSP) while my sketch is still open. The problem is, MAX is unable to read from the file while my sketch is running. Only after I close the sketch is MAX able to import data from my file.
Is Processing keeping that file open somehow while the sketch is running? Is there any way I can get around this problem?
It might be easier to stream your data straight to MaxMSP using the OSC protocol. On the Processing side, have a look at the oscP5 library and on the Max side at the udpreceive object.
You could send your JSON object as a string and unpack that in Max (maybe using the JavaScript support already present in Max), but it might be simpler to mimic the structure of your JSON object as the arguments of the OSC message object which you simply umpack in Max directly.
Probably, because I/O is usually buffered (notably for performance reasons, ans also because the hardware is doing I/O by blocks).
Try to flush the output channel, perhaps using PrintWriter::flush or something similar.
Details are implementation specific (and might be operating system specific).

HTML5: accessing large structured local data

Summary:
Are there good HTML5/javascript options for selectively reading chunks of data (let's say to be eventually converted to JSON) from a large local file?
Problem I am trying to solve:
Some existing program locally and outputs a ton of data. I want to provide a browser-based interactive viewer that will allow folks to browse through these results. I have control over how the data is written out. I can write it all out in one big file, but since it's quite large, I can't just read the whole thing in memory. Hence, I am looking for some kind of indexed or db-like access to this from my webapp.
Thoughts on solutions:
1. Brute-force: HTML5 FileReader API has a nice slice() method for random access. So I could write out some kind of an index in the beginning of the file, use it to look up positions of other stored objects, and read them whenever they're needed. I figured I'd ask if there are already javascript libraries that do something like this (or better) before trying to implement this ugly thing.
2. HTML5 local database. Essentially, I am looking for an analog of HTML5 openDatabase() call that would open (a read-only) connection to a database based on a user-specified local file. From what I understand, there's no way to specify a file with a pre-loaded database. Furthermore, even if there was such a hack, it's not clear whether the local file format would be the same across browsers. I've seen the phonegap solution that populates the browser local database from SQL statements. I can do that too, but the data I am talking about is quite large (5-10GB): it will take a while to load, and such duplication seems rather pointless.
HTML5 does not sound like the appropriate answer for your needs. HTML5's focus is on the client side, and based on your description you're asking a lot out of the browsers, most likely more than they can handle.
I would instead recommend you look at a server-based solution to deliver the desired goal/results to the client view, something like Splunk would be a good product to consider.

What are logging libraries for?

This may be a stupid question, as most of my programming consists of one-man scientific computing research prototypes and developing relatively low-level libraries. I've never programmed in the large in an enterprise environment before. I've always wondered, what are the main things that logging libraries make substantially easier than just using good old fashioned print statements or file output, simple programming logic and a few global variables to determine how verbosely things get logged? How do you know when a few print statements or some basic file output ain't gonna cut it and you need a real logging library?
Logging helps debug problems especially when you move to production and problems occur on people's machines you can't control. Best laid plans never survive contact with the enemy, and logging helps you track how that battle went when faced with real world data.
Off the shel logging libraries are easy to plug in and play in less than 5 minutes.
Log libraries allow for various levels of logging per statement (FATAL, ERROR, WARN, INFO, DEBUG, etc).
And you can turn up or down logging to get more of less information at runtime.
Highly threaded systems help sort out what thread was doing what. Log libraries can log information about threads, timestamps, that ordinary print statements can't.
Most allow you to turn on only portions of the logging to get more detail. So one system can log debug information, and another can log only fatal errors.
Logging libraries allow you to configure logging through an external file so it's easy to turn on or off in production without having to recompile, deploy, etc.
3rd party libraries usually log so you can control them just like the other portions of your system.
Most libraries allow you to log portions or all of your statements to one or many files based on criteria. So you can log to both the console AND a log file.
Log libraries allow you to rotate logs so it will keep several log files based on many different criteria. Say after the log gets 20MB rotate to another file, and keep 10 log files around so that log data is always 100MB.
Some log statements can be compiled in or out (language dependent).
Log libraries can be extended to add new features.
You'll want to start using a logging libraries when you start wanting some of these features. If you find yourself changing your program to get some of these features you might want to look into a good log library. They are easy to learn, setup, and use and ubiquitous.
There are used in environments where the requirements for logging may change, but the cost of changing or deploying a new executable are high. (Even when you have the source code, adding a one line logging change to a program can be infeasible because of internal bureaucracy.)
The logging libraries provide a framework that the program will use to emit a wide variety of messages. These can be described by source (e.g. the logger object it is first sent to, often corresponding to the class the event has occurred in), severity, etc.
During runtime the actual delivery of the messaages is controlled using an "easily" edited config file. For normal situations most messages may be obscured altogether. But if the situation changes, it is a simpler fix to enable more messages, without needing to deploy a new program.
The above describes the ideal logging framework as I understand the intention; in practice I have used them in Java and Python and in neither case have I found them worth the added complexity. :-(
They're for logging things.
Or more seriously, for saving you having to write it yourself, giving you flexible options on where logs are store (database, event log, text file, CSV, sent to a remote web service, delivered by pixies on a velvet cushion) and on what is logged at runtime, rather than having to redefine a global variable and then recompile.
If you're only writing for yourself then it's unlikely you need one, and it may introduce an external dependency you don't want, but once your libraries start to be used by others then having a logging framework in place may well help your users, and you, track down problems.
I know that a logging library is useful when I have more than one subsystem with "verbose logging," but where I only want to see that verbose data from one of them.
Certainly this can be achieved by having a global log level per subsystem, but for me it's easier to use a "system" of some sort for that.
I generally have a 2D logging environment too; "Info/Warning/Error" (etc) on one axis and "AI/UI/Simulation/Networking" (etc) on the other. With this I can specify the logging level that I care about seeing for each subsystem easily. It's not actually that complicated once it's in place, indeed it's a lot cleaner than having if my_logging_level == DEBUG then print("An error occurred"); Plus, the logging system can stuff file/line info into the messages, and then getting totally fancy you can redirect them to multiple targets pretty easily (file, TTY, debugger, network socket...).

What is a Smalltalk "image"?

What is a Smalltalk "image"? Is it like serializing a Smalltalk run-time?
Most popular programming systems separate program code (in the form of class definitions, functions or procedures) from program state (such as objects or other forms of application data). They load the program code when an application is started, and any previous application state has to be recreated explicitly from configuration files or other data sources. Any settings the application programmer doesn't explicitly save, you have to set back up whenever you restart.
Many Smalltalk systems, however, do not differentiate between application data (objects) and code (classes). In fact, classes are objects themselves. Therefore most Smalltalk systems store the entire application state (including both Class and non-Class objects) in an image file. The image can then be loaded by the Smalltalk virtual machine to restore a Smalltalk-like system to a previous state.
http://en.wikipedia.org/wiki/Smalltalk#Image-based_persistence
The Smalltalk image is a very interesting beast. Look at it as a kind of immortality. Many current Smalltalk systems, Pharo, Squeak, VisualWorks among them, share a common ancestor, that is, a Smalltalk image from Xerox PARC. This common ancestor however is not some remote thing, but actually still alive in those modern systems. The modern variants were produced by sending messages to the objects in that image. Some of those messages actually morphed the current objects. Classes are full-blown objects, and creating new classes is done by sending messages to class objects. Some of the objects in a Smalltalk image may date back to 1972, when the first Smalltalk image bootstrapped! Smalltalk images never die, they just fade into something potentially fundamentally different. You should view your application building as not fundamentally different from creating a new Smalltalk version.
When the smalltalk VM starts, it loads a saved state of objects (yes: including open file streams, windows, threads and more) from the "image" into its memory and resumes execution where it left when the image was saved.
At any time during your work, you can "save an image" (aka: a snapshot of the current overall state) into an image file. You can keep multiple images on your disk. Useful if you work on different projects.
Images are often (but not in all smalltalk systems) portable across architectures; for example, a squeak image can be loaded into bot a windows and a mac (and even an android) squeak VM. Images are not portable across dialects, and sometimes not across versions within a dialect.
Images usually contain everything - even the debugger, compiler, editors, browsers etc. However, for deployment, it is sometimes useful to "strip" (i.e. remove unused stuff) from an image - either to hide secrets (;-) or to make it smaller (for embedded or mobile devices).
Most Smalltalks cannot live without an image, with the exception of Smalltalk/X and (I think) S#-Smalltalk (but I am on thin ice here...)
To save and transport source code, images are not useful - use either fileout in standard format or in xml or in any other transport format (there are many).
Images are also not useful for marshalling/unmarshalling; use xml, binarystorage, databases, glorb or any other serialization method for that.
It's serialising everything in the whole system, including all development work and all user data. Everything apart from the kernel of the runtime environment.
Smalltalk, like Java, runs on a Virtual Machine running symbolic bytecode, and it contains low-level things like the garbage collector. This makes Smalltalk very portable, and also very write-once-run-anywhere.
Unsurprisingly, this was the inspiration for Java. So the Smalltalk VM (StVM) is the equivalent of the Java Runtime Environment.
In Smalltalk, everything else is stored in RAM. The codebase, which is dynamically compiled on-the-fly for the StVM. All the object data that you've built up by running your vertical and horizontal end-user apps. All the customisation you've done to the windowing environment and its appearance. All the new code you've written. A song you have loaded onto the VM to play in a music player. Any other data, code or objects you're using or have loaded in.
It's all live in the PC's memory.
Periodically, you may want to save the current-state-play to disk. When you do that, you freeze the Smalltalk VM momentarily, and it copies everything into a single disk-file. That disk-file is called the image file, and by default it will have a .image suffix in most distributions on PCs (whether they are running Linux, MacOS, Windows or RiscOS).
It's a like the way you save your work-in-progress when you are in a word-processor or a spreadsheet on a typical PC. Except that this save includes the latest version of the spreadsheet code that the spreadsheet app itself is made out of.
The Smalltalk system does have other ways of securing your data. If you develop any software, or alter any of the codebase that the Smalltalk system is written in, it logs every change to disk in real-time.
You have the option of writing code, or loading an app, that will can save your source-code and it's associated data structures, to distributed source code repositories, or to repositories on your local disc. Or to relational databases. Or to object databases or the newly-fashionable NoSQL databases.
Most pre-written apps backup the data to disk(s) or database(s) on-the-fly.
The image is a save of the entire Smalltalk system, (apart from the Virtual Machine. The Virtual Machine is equivalent to the Java Runtime Environment. Everything else is stored in the image.
Write a new File System to access the underlying OS's discs? That's in the image. (and all the changes have also been logged to disk automatically by the Smalltalk system).
Enter a whole bunch of data into your Smalltalk image-based object database? That's in the image.
Want to do a factory-reset to your Smalltalk system? Simply go back to using the image file you received when you first installed Smalltalk. Want to save the image every hour on the hour, and then restore back to 4 hours ago? Just load the image file from four hours ago.
The image is a copy of everything that the Smalltalk system has in memory. Except for the small, unchanging, vital proportion of the system which is the Virtual Machine.
I recommend you read Pharo By Example. To quote from its first chapter,
"The current system image is a snapshot of a running Pharo system,
frozen in time. It consists of two files: an .image file, which contains the
state of all of the objects in the system (including classes and methods,
since they are objects too), and a .changes file, which contains a log of all
of the changes to the source code of the system. In Figure 1.1, these files
are called pharo.image and pharo.changes."
HTH
http://book.seaside.st/book/getting-started/pharo-squeak/what-is-image
All Smalltalk objects live in
something called an image. An image is
a snapshot of memory containing all
the objects at a given point in time.
Second hit on google.
Put simply, a Smalltalk image is an image of the Smalltalk environment which has been saved at a given point in time. When this image is reloaded into the Smalltalk runtime system, everything is as it was at the time the image was saved.
Images saved by one Smalltalk system cannot, in general, be loaded by a different Smalltalk system.
I find image-based development incredibly empowering. If I get interrupted I can save the image, and when I get back to it I'm right back where I was. Debuggers that were open are still open, waiting to continue. There's very little "got to figure out how to get back where I was" - it's more "OK, let's continue...".
Share and enjoy.
In almost every other language (apart from ABAP as far as I have been told by some senior SAP developers), you have a clear separation:
Code you are working on which defines logic
State of the program you are running
database, generally things you need as input for your code
In Smalltalk, all of this can be - note can - in the image.
In theory, you can deploy a Smalltalk application in a loaded image which brings all the data and logic and runs the application at startup.
In practice, as far as my experience goes, you tend not to do that for reasons.
If you stay inside the image, you have everything available at your disposal, for good or ill.
Classes are objects, methods are objects, so you can actually do things like add a
self halt
in a method, run some code which calls this method, change the method while it is being executed, recompile it and have the code continue.
You can also do wonderful things like passing method names as string to a method and then performing the argument without having this visible in code anywhere.
Both things are very good for learning and trying out. Not so good for production code or maintaining producion code.
The thing I was struggling with in the beginning was that for instance for UI creation, you create a window in the Smalltalk image which is then created "a second time" by the OS, with window handle and everything.
Of course, you can save the window with the Smalltalk image, it will also open up again (normally), but what happens internally is that there is a list of windows (i.e. all the UI components) which have been saved with the image in their Smalltalk state.
During Image Startup, there is a process that iterates over this list and asks the OS to recreate all of them.
In theory, you can do this for everything the OS offers:
file handles, resource handles, ports and so on
In practice, you probably dont want to do that.
The companies I worked at had nice beginner tutorials of code to run before saving your image in order not to get into trouble when restarting the next day.
Ideally, you could combine a Smalltalk image concept with peristence and just store all your objects in a real database.
I do not have the overview whether any Smalltalk dialect has done this, however.