I'm using the CDH5 QuickStart VM, and I have a file like this (not the full file here):
{"user_id": "kim95",
"type": "Book",
"title": "Modern Database Systems: The Object Model, Interoperability, and
Beyond.",
"year": "1995",
"publisher": "ACM Press and Addison-Wesley",
"authors": {},
"source": "DBLP"
}
{"user_id": "marshallo79",
"type": "Book",
"title": "Inequalities: Theory of Majorization and Its Application.",
"year": "1979",
"publisher": "Academic Press",
"authors": {("Albert W. Marshall"), ("Ingram Olkin")},
"source": "DBLP"
}
and I used this script:
books = load 'data/book-seded.json'
using JsonLoader('t1:tuple(user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,source:chararray,authors:bag{T:tuple(author:chararray)})');
STORE books INTO 'book-no-seded.tsv';
The script runs, but the generated file is empty. Do you have any idea why?
Finally, only this schema worked. If I add or remove a space from this configuration, I get an error. (I also added "name" for the tuples, specified "null" when a value was empty, and changed the order of authors and source; without this exact configuration it still fails.)
{"user_id": "kim95", "type": "Book","title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":null"}], "source": "DBLP"}
{"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}
And the working script is this one :
books = load 'data/book-seded-workings-reduced.json'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');
STORE books INTO 'book-table.csv'; -- either .tsv or .csv works
Try STORE books INTO 'book-no-seded.tsv' USING org.apache.pig.piggybank.storage.JsonStorage();
You need to be sure that the LOAD schema is correct. You can run DUMP books; for a quick check.
We had to be careful with the input data and the schema when we used the Pig JsonLoader for this tutorial http://gethue.com/hadoop-tutorials-ii-1-prepare-the-data-for-analysis/.
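Since JsonLoader reads one complete JSON document per line, records that span multiple lines (like the first sample above) load as nulls and yield empty output. A minimal Python sketch for checking an input file before loading it (the sample lines here are illustrative):

```python
import json

def check_json_lines(lines):
    """Return the 1-based numbers of lines that are not valid JSON on their own."""
    bad = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except ValueError:
            bad.append(i)
    return bad

# One self-contained record and one record split across lines:
good = '{"user_id": "kim95", "type": "Book"}'
broken = '{"user_id": "marshallo79",'
print(check_json_lines([good, broken]))  # the second line fails to parse
```

Running this over the original multi-line file flags every record, which matches the empty STORE output seen above.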
I want to know how to use NGSI-LD to upload an image, even though these static files are not stored in Orion Context Broker or MongoDB. Is there a way to configure NGSI-LD to forward the images to an AWS S3 bucket or another location?
As you correctly identified, binary files are not a good candidate for context data, and should not be held directly within a context broker. The usual paradigm would be as follows:
Imagine you have a number plate reader library linked to Kurento and wish to store the images of vehicles as they pass. In this case the event from the media stream should cause two separate actions:
Upload the raw image to a storage server
Upsert the context data to the context broker including an attribute holding the URI of the stored image.
Doing things this way means you can confirm that the image is safely stored, and then send the following:
{
"vehicle_registration_number": {
"type": "Property",
"value": "X123RPD"
},
"image_download": {
"type": "Property",
"value": "http://example.com/url/to/image"
}
}
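A minimal sketch of that two-step flow in Python; the entity id, type, and URLs are illustrative assumptions, not part of any fixed API:

```python
def build_plate_entity(entity_id, vrn, image_url):
    """Build an NGSI-LD entity fragment holding the VRN and the image's URI.

    The context broker only stores the link; the raw image lives on the
    storage server it was uploaded to first.
    """
    return {
        "id": entity_id,
        "type": "Vehicle",  # assumed entity type for this example
        "vehicle_registration_number": {"type": "Property", "value": vrn},
        "image_download": {"type": "Property", "value": image_url},
    }

entity = build_plate_entity(
    "urn:ngsi-ld:Vehicle:X123RPD",
    "X123RPD",
    "http://example.com/url/to/image",
)
# In practice you would first POST the raw image to your storage server,
# confirm it is safely stored, and only then send this fragment to the
# broker (e.g. POST /ngsi-ld/v1/entityOperations/upsert).
```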
The alternative would be to simply include some link back to the source file somehow as metadata:
{
"vehicle_registration_number": {
"type": "Property",
"value": "X123RPD",
"origin": {
"type": "Property",
"value": "file://localimage"
}
}
}
Then, if you have a registration on vehicle_registration_number which somehow links back to the server holding the original file, that server could upload the image after the context broker has been updated (and then do another upsert).
Option one is simpler. Option two would make more sense if the registration is narrower. For example, only upload images of VRNs for cars whose speed attribute is greater than 70 km/h.
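For the narrower case, the filter can be expressed with NGSI-LD's q query parameter. A minimal Python sketch building such a subscription payload (the notification endpoint and the Vehicle/speed names are assumptions for this example):

```python
def build_speeding_subscription(endpoint):
    """Build an NGSI-LD subscription that only fires for fast vehicles.

    The q filter means notifications (and hence image uploads) happen
    only when the speed attribute exceeds 70.
    """
    return {
        "description": "Notify only for vehicles above 70 km/h",
        "type": "Subscription",
        "entities": [{"type": "Vehicle"}],
        "q": "speed>70",
        "notification": {
            "endpoint": {"uri": endpoint, "accept": "application/json"}
        },
    }

sub = build_speeding_subscription("http://example.com/upload-trigger")
# POST this to /ngsi-ld/v1/subscriptions on the broker.
```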
Ontologically, you could say that a Device has a relationship to a Photograph, which would mean that the Device could have an additional latestRecord attribute:
{
"latestRecord": {
"type": "Relationship",
"object": "urn:ngsi-ld:CatalogueRecordDCAT-AP:0001"
},
}
And create a separate entity holding the details of the Photograph itself, using a standard data model such as CatalogueRecordDCAT-AP which is defined here. Attributes such as source and sourceMetadata help define the location of the raw file.
{
"id": "urn:ngsi-ld:CatalogueRecordDCAT-AP:0001",
"type": "CatalogueRecordDCAT-AP",
"dateCreated": "2020-11-02T21:25:54Z",
"dateModified": "2021-07-02T18:37:55Z",
"description": "Speeding Ticket",
"dataProvider": "European open data portal",
"location": {
"type": "Point",
"coordinates": [
36.633152,
-85.183315
]
},
"address": {
"streetAddress": "2, rue Mercier",
"addressLocality": "Luxembourg",
"addressRegion": "Luxembourg",
"addressCountry": "Luxembourg",
"postalCode": "2985",
"postOfficeBoxNumber": ""
},
"areaServed": "European Union and beyond",
"primaryTopic": "Public administration",
"modificationDate": "2021-07-02T18:37:55Z",
"applicationProfile": "DCAT Application profile for data portals in Europe",
"changeType": "First version",
"source": "http://example.com/url/to/image"
"sourceMetadata": {"type" :"jpeg", "height" : 100, "width": 100},
"#context": [
"https://smartdatamodels.org/context.jsonld"
]
}
I have the following sample JSON data with this structure:
[{
"URL": "http://www.just-eat.co.uk/restaurants-bluebreeze-le3/menu",
"_id": "55f14313c7447c3da7052518",
"address": "56 Bonney Road",
"address line 2": "Leicester",
"name": "Blue Breeze Fish Bar",
"outcode": "LE3",
"postcode": "9NG",
"rating": 5.5,
"type_of_food": "Fish \u0026 Chips"
},
{
"URL": "http://www.just-eat.co.uk/restaurants-bluebreeze-le3/menu",
"_id": "55f14313c7447c3da7052519",
"address": "56 Bonney Road",
"address line 2": "Leicester",
"name": "Blue Breeze Fish Bar",
"outcode": "LE3",
"postcode": "9NG",
"rating": 5.5,
"type_of_food": "Fish \u0026 Chips"
},
...... and so on thousands of objects
]
So I am trying to import this JSON data into the Elastic App Search cluster I created, but I get an error on all documents, as follows:
Indexing Summary
Something went wrong. Please address the errors and try again.
2548 documents with errors...
They didn't give any error details; it just listed all my objects with a semicolon highlighted.
Can anyone tell me why I am not able to upload the data?
I am having a problem setting up an image in a JSON file with a local path.
I have a folder named Images and a data JSON file. I want the JSON file to reference my pictures locally.
{
"name": "John",
"id": "001",
"price": "$995",
"image": "./Images/picture1.jpg"
},
{
"name": "John",
"id": "001",
"price": "$995",
"image": "./Images/picture2.jpg"
},
{
"name": "John",
"id": "001",
"price": "$995",
"image": "./Images/picture1.jpg"
},
{
"name": "John",
"id": "001",
"price": "$995",
"image": "./Images/picture2.jpg"
},
Please tell me why this does not work.
JSON files only contain strings, without any linking. So "./Images/picture2.jpg" is just a string with no special meaning to your editor.
Some apps or editors might present some eye-candy features, perhaps like what you are looking for. But this is entirely a feature of the editor and not something JSON itself provides.
JSON files are plain text that store keys and values.
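For example, once the file is parsed, the image value is nothing more than a string; the application has to open the path itself:

```python
import json

# After parsing, "image" is just a string; nothing in JSON opens the file.
record = json.loads(
    '{"name": "John", "id": "001", "price": "$995", "image": "./Images/picture1.jpg"}'
)
print(type(record["image"]).__name__)  # str: a plain path, not an image
# An app would then have to do something like open(record["image"], "rb")
# itself, resolving the path relative to wherever it is running.
```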
I hope this helps.
We have a requirement from one of our clients to access the project files that are stored in the BIM360 Design (old Collaboration for Revit - C4R). I can not find any information in the developer pages of the Forge APIs that points to this location. I assume such an API is not part of Forge, but we were wondering if there is any other API that can provide those files.
The exact requirements are:
Constantly monitor for changes on the files located there.
When changes occur, retrieve and backup all those files to a local machine.
The question is, how, if possible, can we access the project files located at the BIM360 Design cloud?
UPDATE (10/04/2018)
We have found these commands, specifically PublishModel and GetPublishModelJob. This does something: we can at least trigger the publication on demand, without needing Revit. It is not clear to me when the items:autodesk.bim360:C4RModel pseudo-file is created. On top of that, the API does not appear to accept a preferred output folder, which makes it really cumbersome for the intended purpose of backing up the information inside BIM360 Design.
UPDATE (25/04/2018)
We have tried using both commands (PublishJob and GetPublishModelJob). We impersonated a Project Admin (via the x-user-id header), but Forge is returning a 401 error (which is not even documented). The following (with a redacted document ID) is what we tried:
{
"jsonapi": {
"version": "1.0"
},
"data": {
"type": "commands",
"attributes": {
"extension": {
"type": "commands:autodesk.bim360:C4RModelGetPublishJob",
"version": "1.0.0"
}
},
"relationships": {
"resources": {
"data": [ { "type": "items", "id": "<document_id>" } ]
}
}
}
}
And this is Forge's response:
{
"jsonapi": {
"version": "1.0"
},
"errors": [
{
"id": "a4547153-1fd4-4710-b0d1-a7184d9e7e22",
"status": "401",
"code": "C4R",
"detail": "Failed to get publish model job"
}
]
}
Any thoughts?
After discussing with @tfrascaroli in the Forge Help channel, we found that the root cause of this error is an incorrect value of x-user-id, so he didn't have the right permission to push the latest version of the C4R model to BIM 360 Docs.
The x-user-id is not a GUID and is not the id we see in the response of GET users or GET users/:user_id; it should be the value of the uid field. After replacing the x-user-id value with the uid, the error no longer shows up.
[
{
"id": "a75e8769-621e-40b6-a524-0cffdd2f784e", //!<<< We didn't use it for `x-user-id`
"account_id": "9dbb160e-b904-458b-bc5c-ed184687592d",
"status": "active",
"role": "account_admin",
"company_id": "28e4e819-8ab2-432c-b3fb-3a94b53a91cd",
"company_name": "Autodesk",
"last_sign_in": "2016-04-05T07:27:20.858Z",
"email": "john.smith#mail.com",
"name": "John Smith",
"nickname": "Johnny",
"first_name": "John",
"last_name": "Smith",
"uid": "L9EBJKCGCXBB", //!<<<<< Here is the value for the x-user-id
"image_url": "http://static-dc.autodesk.net/etc/designs/v201412151200/autodesk/adsk-design/images/autodesk_header_logo_140x23.png",
"address_line_1": "The Fifth Avenue",
"address_line_2": "#301",
"city": "New York",
"postal_code": "10011",
"state_or_province": "New York",
"country": "United States",
"phone": "(634)329-2353",
"company": "Autodesk",
"job_title": "Software Developer",
"industry": "IT",
"about_me": "Nothing here",
"created_at": "2015-06-26T14:47:39.458Z",
"updated_at": "2016-04-07T07:15:29.261Z"
}
]
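In code, that means picking uid (not id) from the GET users response when building the request headers. A minimal Python sketch using the sample above (the token and the exact header set are illustrative):

```python
# Trimmed-down version of the GET users response shown above.
users = [
    {
        "id": "a75e8769-621e-40b6-a524-0cffdd2f784e",  # GUID: not for x-user-id
        "name": "John Smith",
        "uid": "L9EBJKCGCXBB",  # this is the x-user-id value
    },
]

def forge_headers(user, access_token):
    """Build request headers that impersonate the given BIM 360 user."""
    return {
        "Authorization": "Bearer " + access_token,  # from your OAuth flow
        "x-user-id": user["uid"],  # uid, NOT the GUID in "id"
        "Content-Type": "application/vnd.api+json",
    }

headers = forge_headers(users[0], "<token>")
print(headers["x-user-id"])  # L9EBJKCGCXBB
```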
Do you have access rights to the workshared Revit file? The Publish command publishes a cloud workshared central model to Docs. To use it, you need access to the central model in the cloud. The Forge Publish command does the same thing as the publish command in Revit desktop, so you need the same access rights. To use the cloud worksharing feature, you first need a Design license assigned to you, and you also need to be a member of the Revit project. Being invited to Docs is not enough.
(As C4R/Design was merged into Docs recently, this C4R-specific license part was intentionally kept the same as the previous licensing. We also have Team for earlier versions, which makes it a bit complicated. I hope it will become easier as we move forward.)
I'm attempting to use this JSON from a SpaceX API to display the locations of all the SpaceX launch sites on a Mapbox map using Mapbox-GL. When I attempt to load this into a dataset in Mapbox Studio I get an error that says: Input failed. "type" member required on line 1.
I assume this is due to the way that the JSON is structured i.e. it doesn't have GeoJSON properties.
How can I easily adapt this JSON and convert it into GeoJSON that works with Mapbox?
The JSON file you provided isn't valid GeoJSON. You can read more about the specification of the format here: http://geojson.org/
You would want a small script to transform the SpaceX JSON file into valid GeoJSON. Currently a single record looks like this:
{
"id": "ccafs_slc_40",
"full_name": "Cape Canaveral Air Force Station Space Launch Complex 40",
"status": "active",
"location": {
"name": "Cape Canaveral",
"region": "Florida",
"latitude": 28.5618571,
"longitude": -80.577366
},
"vehicles_launched": [
"Falcon 9"
],
"details": "SpaceX primary Falcon 9 launch pad, where all east coast Falcon 9s launched prior to the AMOS-6 anomaly. Initially used to launch Titan rockets for Lockheed Martin. Back online since CRS-13 on 2017-12-15."
}
What you probably want is a Feature with a geometry type of Point like this:
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [-80.577366, 28.5618571]
},
"properties": {
"id": "ccafs_slc_40",
"full_name": "Cape Canaveral Air Force Station Space Launch Complex 40",
"status": "active",
"location": {
"name": "Cape Canaveral",
"region": "Florida"
},
"vehicles_launched": ["Falcon 9"],
"details":
"SpaceX primary Falcon 9 launch pad, where all east coast Falcon 9s launched prior to the AMOS-6 anomaly. Initially used to launch Titan rockets for Lockheed Martin. Back online since CRS-13 on 2017-12-15."
}
}
After you transformed each record of your original array, you need to wrap them in a FeatureCollection in order for mapbox-gl to render it:
{
"type": "FeatureCollection",
"features": [
//...
]
}
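A minimal Python sketch of that transformation, using the record shape shown above (note that GeoJSON coordinates are [longitude, latitude], the reverse of the API's latitude/longitude fields):

```python
def to_feature(site):
    """Convert one SpaceX launch-site record into a GeoJSON Feature."""
    loc = site["location"]
    props = {k: v for k, v in site.items() if k != "location"}
    props["location"] = {"name": loc["name"], "region": loc["region"]}
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude]
            "coordinates": [loc["longitude"], loc["latitude"]],
        },
        "properties": props,
    }

def to_feature_collection(sites):
    """Wrap all converted records in a FeatureCollection for mapbox-gl."""
    return {"type": "FeatureCollection",
            "features": [to_feature(s) for s in sites]}

site = {
    "id": "ccafs_slc_40",
    "status": "active",
    "location": {"name": "Cape Canaveral", "region": "Florida",
                 "latitude": 28.5618571, "longitude": -80.577366},
}
print(to_feature_collection([site])["features"][0]["geometry"]["coordinates"])
# [-80.577366, 28.5618571]
```

The resulting FeatureCollection can be saved as a .geojson file and loaded directly into a Mapbox Studio dataset.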