Apache Arrow or feather plugin? - pyarrow

I'd like to use local feather files as sources in Intake. Does a plugin for feather/arrow not exist yet, or am I missing something?

You are right: as far as I know, there is currently no feather or arrow (i.e., framed buffers) driver for Intake, and neither format is supported by Dask either.
Given that pandas explicitly supports feather, it would be easy to build an Intake driver that supports multiple remote files, and these could even be loaded in parallel with Dask without adding any code to Dask itself.
However, first I'd like to ask: why not Parquet? That seems to be the standard format, at least partly because its reach goes well beyond Python/Arrow, and it is supported by Intake and virtually every other tabular data engine.
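For illustration, here is a minimal sketch of what that could look like today, without a dedicated driver: read each feather file with pandas and parallelize across files with dask.delayed. The "data-*.feather" glob pattern is a hypothetical example.

import glob

import dask
import dask.dataframe as dd
import pandas as pd

# One lazy task per file; pandas (via pyarrow) does the actual reading.
paths = sorted(glob.glob("data-*.feather"))
parts = [dask.delayed(pd.read_feather)(p) for p in paths]

# Stitch the lazy partitions into a single Dask dataframe.
df = dd.from_delayed(parts)
print(df.head())  # computes only the first partition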

Hide my PrimeFaces version from the inspector

I'm using PrimeFaces and PrimeFaces Extensions in my application. For every resource, such as .css and .js files, there are also "ln" and "v" query parameters in the GET request for that resource, like below:
primefaces-extensions.js?ln=primefaces-extension&v=6.1
validation.js?ln=primefaces&v=6.1
As a security concern, since these parameters show the exact version of the framework I'm using, how can I hide them?
Hiding the 'ln' is kind of useless, since with a very small amount of effort you can get the same information from the JavaScript files and the source of the page too ('PF(' is all over the place).
The 'v' however is a slightly different issue. If you use the unmodified PrimeFaces sources, hiding it is sort of useless too: with very little effort, an attacker can download your JavaScript sources, hash them, and compare the resulting hashes against a dictionary of hashes of known PrimeFaces releases (which is easy to build), and thereby learn exactly which version you use. So the only thing to do here is to modify the sources so they no longer produce known, comparable hashes (adding whitespace should already help).
But if you really want the version not to be shown, you can download the PrimeFaces sources, replace the version info with some obfuscated number, and build that custom version. Keep in mind that if you don't make any changes to the sources themselves, the dictionary lookups mentioned above still work. So it is only a minor inconvenience for attackers.
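To make the fingerprinting concrete, here is a rough sketch of the dictionary attack described above. The URL and the hash table contents are hypothetical placeholders; an attacker would pre-compute digests of every published PrimeFaces release.

import hashlib
import urllib.request

# Hypothetical lookup table: digest of a known release -> version label.
KNOWN_HASHES = {
    "0123456789abcdef0123456789abcdef": "PrimeFaces 6.1",  # placeholder digest
}

body = urllib.request.urlopen(
    "https://victim.example/javax.faces.resource/validation.js?ln=primefaces"
).read()
digest = hashlib.md5(body).hexdigest()
print(KNOWN_HASHES.get(digest, "unknown version"))

Any byte-level change to the served file (even added whitespace) changes the digest, which is why modifying the sources defeats this particular lookup.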

Choice of database for a dictionary which can be edited as plain text

I am creating a dictionary app, but I am not sure what kind of database to use!
My Requirements :
Support for Unicode characters.
Reasonably fast search.
In case of an installed version of the app, the database file can be edited using a text editor (such as notepad).
I want to implement a web version of the app (which uses the same database file as the installed version), and viewers can add/modify entries, so it requires concurrent access via the network.
Easy to parse in any programming language.
Expecting a maximum file size of 100 MB (it may not get anywhere near that, but just to make it future-proof).
So with my limited knowledge I ended up with 3 options: CSV, XML, or JSON. I prefer CSV for easy editing.
I understand that they are not as good as an RDBMS, but for this specific scenario, is it possible to use any of these, and how well can they perform?
Any alternative ideas are also welcome!
Thanks in advance.
I think that using a single file as a "database" is not such a great choice if you plan to extend your application further.
My recommendation is SQLite (https://www.sqlite.org/about.html). As currently promoted on their website, it meets almost all of your requirements except being editable with a text editor. But there are easy enough solutions for content management as well, like SqliteBrowser (http://sqlitebrowser.org/).
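As a minimal sketch of how little code the SQLite route takes (assuming a simple word/definition schema, which is my invention here): SQLite stores TEXT as Unicode, lookups are fast thanks to the implicit index on the primary key, and the same file can back both the installed and the web version.

import sqlite3

conn = sqlite3.connect("dictionary.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS entries (
        word TEXT PRIMARY KEY,   -- unique index on the lookup column comes for free
        definition TEXT NOT NULL
    )
""")
conn.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)",
             ("σχολή", "school; leisure (Ancient Greek)"))
conn.commit()

# Prefix search; Unicode works out of the box.
for word, definition in conn.execute(
        "SELECT word, definition FROM entries WHERE word LIKE ?", ("σχ%",)):
    print(word, "->", definition)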

How to find pink highlighted json errors in sublime text

I'm working with a large json file.
This JSON was produced by my own Python parser, and (as a result) there are some JSON validation errors at different points in the file. I want to identify these errors in order to improve my Python parser.
Sublime Text (2) helpfully highlights formatting errors in the JSON in pink; however, working my way through 70,000,000 lines of JSON to find these errors is somewhat challenging.
Is there any way to skip to the pink-highlighted errors in the JSON?
(Note: the JSON file is sufficiently large that using an online validator, for example, is not possible.)
Thanks!
This can be done in a fancy way using a plugin, but for your purposes probably the best way is to just enter a command into the console. Open your JSON file with the errors in it, then open the console with Ctrl+` (backtick). Paste in the following code and hit Enter:
view.show_at_center(view.find_by_selector("invalid.illegal")[0])
and the view will scroll to show the first error in the file. Fix that error, click back on the console entry line, hit the up arrow to bring back the command you just ran, and hit Enter again, and it should scroll to the next error, and so on. When there are no more errors, you'll get IndexError: list index out of range printed to the console, and the view won't scroll any more.
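For completeness, the "fancy" plugin version mentioned above could look roughly like this (an untested sketch; the command and file names are my own invention). Save it as, say, next_json_error.py in your Packages/User folder and bind "next_json_error" to a key; each invocation jumps the caret to the next region the highlighter marked as invalid.

import sublime
import sublime_plugin

class NextJsonErrorCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        caret = self.view.sel()[0].end()
        for region in self.view.find_by_selector("invalid.illegal"):
            if region.begin() > caret:
                # Move the caret and scroll the error into view.
                self.view.sel().clear()
                self.view.sel().add(sublime.Region(region.begin(), region.begin()))
                self.view.show_at_center(region)
                return
        sublime.status_message("No more errors below the caret")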
While this will work in both Sublime Text 2 and 3, I strongly urge you to upgrade to ST3 if at all possible. ST2 has been shelved and deprecated, and there will be no more bug fixes released. Development is now focused solely on ST3 (as well as being in the planning stages for ST4!). "I don't know of any good reason to not use Sublime Text 3" - Will Bond, ST core developer.
There are a ton of new features and bug fixes in the new version, even if you're just using the public beta. (BTW, don't let the word "beta" fool you - the program is rock solid, and has been for years.) If you want more cutting-edge features, and are a registered user (which you should be if you are using the program long-term or for commercial purposes), you can download the dev builds which are updated more frequently, but run the slight chance of having an undetected bug or two.
One of the major advantages of ST3 is that it now supports a new, YAML-based sublime-syntax highlighting engine, which allows for much greater flexibility than the old .tmLanguage highlighting files (which are still supported). Related to that, the syntax files have all been open-sourced and development is proceeding very rapidly on them, even though it's been a few months since the last build was released.
Probably the biggest reason to upgrade is the plugin community. The internal Python API has been updated to Python 3 (3.3.6, to be precise), which had the side effect of making many old plugins incompatible. Except in a few rare cases, most plugins now support ST3, and many are dropping ST2 support as it becomes too difficult to maintain two codebases and to develop against the much more limited API ST2 provides. So, unless you absolutely depend on an old ST2-only plugin that can't be ported, upgrading is definitely the best path to take.

HTML5: accessing large structured local data

Summary:
Are there good HTML5/JavaScript options for selectively reading chunks of data (say, to be eventually converted to JSON) from a large local file?
Problem I am trying to solve:
An existing program runs locally and outputs a ton of data. I want to provide a browser-based interactive viewer that will allow folks to browse through these results. I have control over how the data is written out. I can write it all out in one big file, but since it's quite large, I can't just read the whole thing into memory. Hence, I am looking for some kind of indexed or db-like access to it from my webapp.
Thoughts on solutions:
1. Brute force: the HTML5 FileReader API has a nice slice() method for random access. So I could write out some kind of index at the beginning of the file, use it to look up the positions of the other stored objects, and read them whenever they're needed. I figured I'd ask whether there are already JavaScript libraries that do something like this (or better) before trying to implement this ugly thing myself (see the sketch after this list for what the writer's side of such a layout could look like).
2. HTML5 local database. Essentially, I am looking for an analog of the HTML5 openDatabase() call that would open a (read-only) connection to a database backed by a user-specified local file. From what I understand, there's no way to point it at a file with a pre-loaded database. Furthermore, even if there were such a hack, it's not clear whether the local file format would be the same across browsers. I've seen the PhoneGap solution that populates the browser's local database from SQL statements. I could do that too, but the data I am talking about is quite large (5-10 GB): it would take a while to load, and such duplication seems rather pointless.
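A sketch of option 1's layout from the writing side (which, per the question, is under our control): put a small JSON index of (offset, length) pairs at the front, behind a fixed-width length field, so the browser can read the index once and then slice() individual records on demand. All names and record payloads here are hypothetical.

import json

records = {"run-001": b'{"score": 1}', "run-002": b'{"score": 2}'}

index = {}
offset = 0
payload = bytearray()
for key, blob in records.items():
    index[key] = (offset, len(blob))   # offsets relative to the payload start
    payload += blob
    offset += len(blob)

header = json.dumps(index).encode("utf-8")
with open("results.dat", "wb") as f:
    f.write(len(header).to_bytes(8, "big"))  # fixed-width header length
    f.write(header)
    f.write(payload)

The reader would slice the first 8 bytes, then the header, then add 8 + len(header) to each record's offset before slicing it out.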
HTML5 does not sound like the appropriate answer for your needs. HTML5's focus is on the client side, and based on your description you're asking a lot of the browser, most likely more than it can handle.
I would instead recommend you look at a server-based solution to deliver the desired results to the client view; something like Splunk would be a good product to consider.

How can I analyze a closed format (e.g. doc or vce)?

I want to study the .vce format. It's a binary format and seems more complicated than simple object serialization. Is there any tool or technique for analyzing a binary format?
You might need to "reverse-code-engineer" a program using this file format (http://www.openrce.org/). Tools used for this kind of analysis are: your brain, a disassembler (IDA Pro, for example), and a debugger (OllyDbg, for example). But beware: the road to successfully reverse engineering a file format is veeeeeerrry hard.
And reversing an application might be illegal depending on where you live!
You'll have to get a library that can read the format (or create one yourself).
Here are some of the Microsoft Office binary format specifications.
I believe it would only be possible through some nasty reverse engineering. It would be very useful to have access to an application that uses the format in question, so that you can generate a few simple files and compare them in a hex editor. You won't get far with this method alone, but you might be able to figure out the header.
It would also be useful to study common binary format mechanisms, such as encryption and compression. If you're talking about the Visual CertExam file format, then it is likely that the useful data will be strongly encrypted.
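One way to put the hex-editor comparison on rails: generate two files that differ in exactly one known respect (say, one question versus two), then print the byte offsets where they diverge. The file names are hypothetical.

def diff_offsets(path_a, path_b):
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    # Offsets within the common prefix where the two files differ.
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

print([hex(i) for i in diff_offsets("one_question.vce", "two_questions.vce")])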
My 2 cents:
Start by reversing the applications that read the files themselves. Android applications in particular are helpful, as the resulting Java source is easier to read (you might want to try the A+ VCE Reader for Android, for example). That program indicates that vce uses/embeds sqlite in the file (in line with what is hinted here: Reverse Engineer a File Format).
Where to go from here? You might want to explore sqlite file carving tools to see if there might be a way to programmatically identify the patterns in the file. Good luck!
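If the file really does embed SQLite, a cheap first carving step is to scan for the standard 16-byte SQLite header string and dump everything from that offset into a candidate database file. The .vce file name here is hypothetical, and (per the answer above) the embedded data may well be encrypted.

MAGIC = b"SQLite format 3\x00"  # standard SQLite file header

with open("sample.vce", "rb") as f:
    data = f.read()

pos = data.find(MAGIC)
if pos >= 0:
    print("possible SQLite database at offset", hex(pos))
    with open("carved.db", "wb") as out:
        out.write(data[pos:])  # naive carve: everything from the magic onward
else:
    print("no SQLite header found")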