Bringing functionality to data - what is this called? - terminology

In many cases we get the data from a database and bring it to our application (data to functionality). However, in some designs we do it the other way around, which is favourable in Big Data settings. Examples would be Hadoop MapReduce or Apache Spark.
What do we call the approach of bringing functionality to the data and not the other way around? I remember something like "data location sensitive" or "data location awareness" or something like that, but I cannot find the correct term on the internet anymore.

Is it called data locality?
Suppose you have a 1 GB text file and you have written MapReduce code to convert all the text in that file to upper case. First the file is broken into chunks, and the logic to convert the text to upper case is made available on each data node. The TaskTracker on each node then runs the MapReduce code only against the data block(s) present on that local node. This is known as data locality.
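For concreteness, here is a minimal sketch of that map step as a Hadoop Streaming mapper in Python (the script name and the invocation below are illustrative assumptions, not from the original answer):

#!/usr/bin/env python
# upper_mapper.py - Hadoop Streaming mapper sketch.
# Hadoop schedules each map task on (or near) the node that stores the
# input split, so this code runs where the data already lives.
import sys

for line in sys.stdin:
    sys.stdout.write(line.upper())

An invocation would look roughly like: hadoop jar hadoop-streaming.jar -input /data/big.txt -output /data/upper -mapper upper_mapper.py -file upper_mapper.py (paths are hypothetical).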

Is there a way to extract AutoCAD Drawings and Pictures from an OLEObject field in a table?

Just for some background: I'm a mechanical engineer at a company, and the older folks here created a database in Access 2003 which basically takes an AutoCAD Drawing or a Picture OLE and plops it in a nicely framed report with a bunch of other information. I've been making some modifications to that database, one of which is to store all OLEObjects as links to actual files on our shared network. Every new file that has been added to the database for the past week or so has been linked, and the guys seem to have gotten the hang of it.
My problem at this point in time is to retrieve all the objects that are already embedded in the tables. I've tried Lebans' OLE to Disk, but that doesn't seem to work with AutoCAD Drawings (which are .dwg and .dxf files), nor does it work with "Picture" objects.
I know this is a quite controversial kind of question, seeing as I'm not providing any code to start with, but I think this is too complicated for me to even begin doing and I'm in over my head. Extracting the OLEObjects by hand isn't feasible since there are over 8,000 of them spread across several databases. Is there any way to automate the extraction via code?
Thanks in advance,
Rafael.
If you access the Object property of the Bound Object Frame control, for an AutoCAD Drawing this should return the AutoCAD Document object. You can then invoke the SaveAs method of the Document object to save the file to a known location.
For example, something like:
With Me.MyBoundObjectFrame.Object
    ' .Object returns the OLE server's document (here, the AutoCAD
    ' Document object), which exposes SaveAs.
    .SaveAs "Drive:\YourPath\YourDrawing.dwg"
End With
Where MyBoundObjectFrame is the name of your Bound Object Frame control.
This works successfully in my limited testing.

Bulk uploading data to Parse.com

I have about 10 GB worth of data that I would like to import to Parse. The data is currently in JSON format, which is great for importing with the Parse importer.
However, I have no unique identifier for these objects. They do have unique properties, e.g. a URL, but the IDs pointing to specific objects need to be constant.
What would be the best way to edit this large amount of data, in bulk, on their server without running into request issues (as I'm currently on the free pricing model) and without taking too much time to alter the data?
Option 1
Import the data once and export it as JSON with the newly assigned objectIds. Then edit the records locally, matching on the URL, and replace the class with the newly edited data. Any new additions will receive a new objectId from Parse.
How much downtime would there be between import and export, given that I would need to delete the class and recreate it? Are there any other concerns with this methodology?
Option 2
Query for the URL or an array of URLs, then edit the data and re-save. This means the data will persist indefinitely, but since the edit will cover hundreds of thousands of objects, will this most likely overrun the request limit?
Option 3
Is there a better option I am missing?
The best option is to upload to Parse and then edit through their normal channels. Using various hacks it is possible to stay below the 30 pings/second offered as part of the free tier. You can iterate over the data using background jobs (written in JavaScript), though you may need to slow down your processing so you don't hit limits. The super hacky way is to download from the table to a client (iOS/Android) app and then push the edits back up to Parse. If you do this in batches (not a synchronous for loop, by the way), the latency alone will keep you under the 30 pings/second limit.
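To make the batch-and-throttle idea concrete, here is a minimal Python sketch against Parse's REST batch endpoint (the class name "Item", the placeholder credentials, and the delay value are assumptions for illustration; batches were historically capped at 50 operations):

import time
import requests  # third-party HTTP client

BATCH_URL = "https://api.parse.com/1/batch"
HEADERS = {
    "X-Parse-Application-Id": "YOUR_APP_ID",   # placeholder credentials
    "X-Parse-REST-API-Key": "YOUR_REST_KEY",
    "Content-Type": "application/json",
}

def update_in_batches(edits, batch_size=50, delay=2.0):
    # edits: list of (objectId, changed_fields) pairs for class "Item".
    for i in range(0, len(edits), batch_size):
        chunk = edits[i:i + batch_size]
        payload = {"requests": [
            {"method": "PUT",
             "path": "/1/classes/Item/" + object_id,
             "body": changed_fields}
            for object_id, changed_fields in chunk
        ]}
        resp = requests.post(BATCH_URL, json=payload, headers=HEADERS)
        resp.raise_for_status()
        time.sleep(delay)  # crude throttle to stay under the rate limit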
I'm not sure why you're worried about downtime. If the data isn't already uploaded to Parse, can't you upload it, pull it down and edit it, and re-upload it, taking as long as you'd like? Do this in a separate table from any you are using in production, and you should be just fine.

Rails model concept with multiple sources

I have a document management system. I have a data set that can run through a program (another kind of file) and be turned into images, a different kind of data, or even a new data set. I have to keep track of this "lineage".
If I was thinking in MySQL terms directly, I would add a "source" column and link each file to the file it was created from.
I can't think of a logical way to do this within the confines of Ruby on Rails. Any ideas/hints/tips?
What you are looking for is a graph database. You can try Neo4j: www.neo4j.org
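That said, the "source" column from the question maps directly onto a relational ORM as a self-referential association (in Rails, a self-joining belongs_to such as belongs_to :source, class_name: "Document"). As a minimal illustration of the shape, here is the same adjacency-list pattern sketched in Python with SQLAlchemy (all names are hypothetical):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    # Each file optionally points at the file it was created from.
    source_id = Column(Integer, ForeignKey("documents.id"), nullable=True)
    # .source walks up the lineage; .derivatives walks down it.
    source = relationship("Document", remote_side=[id], backref="derivatives")

Walking a long lineage chain this way costs one query per hop, which is exactly the pain point a graph database removes.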

Dynamic JSON file vs API

I am designing a system with 30,000 or so objects and can't decide between two approaches: either have a JSON file precomputed for each one and fetch the data by pointing to the URL of that file (I think Twitter does something similar), or have a PHP/Perl/whatever script that produces the JSON object on the fly when requested, say from a database, and sends it back. Is one more suited than the other? I guess if it takes a long time to generate the JSON data, it is better to have the JSON files already built. But what if generating is as quick as accessing the database? Although I suppose one could have a dedicated table in the database specifically for that.

The data doesn't change very often, so updating is not a constant thing; in that respect the data is static for all intents and purposes.
Anyways, any thoughts would be much appreciated!
Alex
You might want to try MongoDB, which retrieves the objects as JSON and is highly scalable and easy to set up.
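For comparison, the precomputed-file route is only a few lines. A minimal Python sketch (the sqlite3 backend, table and column names, and output path are assumptions for illustration):

import json
import sqlite3
from pathlib import Path

out_dir = Path("static/objects")
out_dir.mkdir(parents=True, exist_ok=True)

conn = sqlite3.connect("app.db")  # stand-in for the real database
for obj_id, name, price in conn.execute("SELECT id, name, price FROM objects"):
    # One file per object: the web server can then serve
    # static/objects/<id>.json without running any script at all.
    (out_dir / f"{obj_id}.json").write_text(
        json.dumps({"id": obj_id, "name": name, "price": price})
    )

Since the data rarely changes, rerunning this on each update keeps reads as cheap as static file serving.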

Storing Base64 PNG in MySQL

I am using Sencha Touch to capture data from a user on an iPad. This includes a standard form (name, email, etc.) as well as the customer's signature (see the plugin here).
Essentially, the plugin takes the coordinates from the user's signature and gives me back Base64 PNG data.
Once I have the signature data, I want to store it. My two questions are:
1. Should I store the Base64 data in my (MySQL) database along with the rest of the user's information, or should I create a static file and link as necessary?
2. If storing in the database is the way to go, what data type should I use?
There's no need to base64 encode the image. MySQL is perfectly capable of storing binary data. Just make sure you use a BLOB field type, and not TEXT: TEXT fields are subject to character set translation, which could trash your .png data, while BLOB fields are not translated.
As well, base64 encoding increases the size of the data by around 35%, so you'd be wasting a large chunk of space for no benefit whatsoever.
However, it's generally a bad idea to store images in the database. You do have the advantage of the image always being "right there", but it makes for absolutely huge dumps at backup time and all kinds of fun trying to get the image out and displayed in your app/web page.
It's invariably better to store it externally in a file named after the record's primary key, for ease of access/verification.
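If you do store the image in the database, the write path is just "decode, then bind as a parameter". A minimal Python sketch (the table and column names and the mysql-connector-python driver are assumptions):

import base64
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(user="app", password="...", database="crm")

def save_signature(user_id, base64_png):
    # Decode once on the way in and store raw bytes in a BLOB column,
    # so MySQL never applies character-set translation to the data.
    raw = base64.b64decode(base64_png)
    cur = conn.cursor()
    cur.execute("UPDATE users SET signature = %s WHERE id = %s", (raw, user_id))
    conn.commit()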
Just save the files in a BLOB field. Such a PNG file shouldn't be larger than 1 KB if you turn on some optimizations (grayscale or B/W).
Storing files outside the DB seems easy, but there are things to consider:
backup,
additional replication if multi-server,
security - access rights to the files directory, but also to the files themselves,
no transactions - e.g. the DB insert succeeds but the file write fails,
the need to distribute files across multiple directories to avoid large directory listings (depends on filesystem capabilities).
A BLOB will store Base64 data; it will get you what you need. Storing it in the database gives you built-in relational capabilities that you would have to code yourself if you stored it in a static file. Hope this helps. Good luck, sir.
Edit: Mark's right about binary vs. base64.
Set your field to the BLOB data type; it stores a base64EncodedString perfectly.