I saw the cubism graphs and they are simply amazing. I have a big JSON file with 1000 entries that have a timestamp and a value (integer). Can Cubism graph those or not?! I can't seem to find documentation on this...
Cubism is generally intended for realtime data, but you can implement a metric that simply returns static values from a JSON file. Typically you do this by using context.metric. See the stocks demo in the Cubism intro talk for an example.
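A rough sketch of that idea (variable names and the lookup are illustrative, not from the question; entries would be your parsed JSON array of {timestamp, value} pairs with timestamps in milliseconds):
var context = cubism.context()
    .step(60 * 1000)   // one value per minute; pick a step that matches your data
    .size(960);

var staticMetric = context.metric(function (start, stop, step, callback) {
  var values = [];
  for (var t = +start; t < +stop; t += step) {
    // use the last entry at or before t (entries assumed sorted by timestamp)
    var v = null;
    for (var i = 0; i < entries.length && entries[i].timestamp <= t; i++) {
      v = entries[i].value;
    }
    values.push(v);
  }
  callback(null, values);
}, "static JSON metric");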
I'm trying to get all the documents in the "businesses" collection from Firebase together with their sub-collections.
The problem is that when I query Firebase like this:
Stream<List<Business>> getBusinesses() {
  return _db.collection('businesses').snapshots().map((snapshot) => snapshot
      .docs
      .map((document) => Business.fromJson(document.data()))
      .toList());
}
the sub-collections aren't included in the JSON object returned by document.data(), so the Business object isn't fully populated: the fields that should come from the sub-collections (Appointments, ServiceProviders, Services) end up empty.
So hopefully I've explained the problem well. My question is: how can I fetch all of a document's data, including its sub-collections, and parse it into a Business object?
Thanks.
What seems to be "the problem" is actually the point of Firestore: Keeping documents shallow so you can only get the data you need. It's then up to you to structure your data the way it will likely be used in the future.
Mind you, subcollections are not fields.
What you can do here is add a query that fetches the documents in the subcollections (Appointments, ServiceProviders, Services) for each business, using the business document ID in the query.
It would typically look something like:
_db.collection('businesses').document(documentId).collection('Appointments')
Mind you, this is potentially too much data. It might be better to fetch the docs in those subcollections only when needed/requested by the user.
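A rough (untested) Dart sketch of that idea; Appointment.fromJson and the business.appointments field are assumptions about your model, and the subcollections are reached through each document's reference:
Future<List<Business>> getBusinessesWithSubcollections() async {
  final snapshot = await _db.collection('businesses').get();
  final businesses = <Business>[];
  for (final doc in snapshot.docs) {
    final business = Business.fromJson(doc.data());
    // reach the subcollection through the document's reference
    final appointments = await doc.reference.collection('Appointments').get();
    business.appointments = appointments.docs
        .map((d) => Appointment.fromJson(d.data()))
        .toList();
    // fetch 'ServiceProviders' and 'Services' the same way
    businesses.add(business);
  }
  return businesses;
}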
Is it possible, using NiFi, to load a JSON file into a structured table?
I've retrieved the following weather forecast data (from 6000 weather stations), which I'm currently loading into HDFS. It all appears on one line:
{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast","Location":[{"i":"14","lat":"54.9375","lon":"-2.8092","name":"CARLISLE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"50.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"WNW","F":"-3","G":"25","H":"67","Pp":"0","S":"13","T":"2","V":"EX","W":"1","U":"1","$":"720"}}},{"i":"22","lat":"53.5797","lon":"-0.3472","name":"HUMBERSIDE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"24.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"NW","F":"-2","G":"43","H":"63","Pp":"3","S":"25","T":"4","V":"EX","W":"3","U":"1","$":"720"}}}, .....
Ideally, I want this structured into a 6000-row table.
I've tried writing a schema to pass the above into Pig, but haven't been successful, probably because I'm not familiar enough with JSON to translate this correctly.
Casting around for an easy way to add some structure to the data, I've spotted that there's a PutHBaseJson processor in NiFi.
Can anyone advise if this PutHBaseJson processor would work with the above data structure? And if so, can anyone point me towards a decent tutorial to give me a starting point on the configuration?
Greatly appreciate any guidance.
You probably want to use the SplitJson processor to split the 6000 record JSON structure into 6000 individual flowfiles. If you need to "inject" the parameter definitions from the top-level response, you can do a ReplaceText or JoltTransformJSON operation to manipulate the individual JSON records. Here is a good article by Yolanda Davis describing how to perform Jolt transforms (JSON -> JSON) in NiFi.
Once you have the individual flowfiles containing a single JSON record, putting them into HBase is very easy. Bryan Bende wrote an article describing the necessary configurations for the PutHBaseJson processor.
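For illustration, a configuration along these lines could work for the payload above; the JsonPath is based on the sample structure shown in the question, and the PutHBaseJson values are hypothetical placeholders, not required names:
SplitJson
    JsonPath Expression: $.SiteRep.DV.Location[*]

PutHBaseJson (applied to each per-station flowfile)
    Table Name: weather_forecast          (hypothetical table name)
    Row Identifier Field Name: i          (the station id field in each record)
    Column Family: forecast               (hypothetical column family)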
I am using the envelope pattern, and my canonical model part is in XML format. I usually return the model in full or in a summary version. Retrieval of documents is pretty quick, but when returning results as part of my REST call, where I need to return JSON to the browser, the json:transform-to-json call takes roughly double the time of the call that just returns the XML.
Would a good strategy be to also keep the canonical model in JSON format in the envelope, or to keep pre-rendered JSON, in full and summary forms, in other documents outside of the envelope, which don't get searched but are mainly used when returning results? That way I wouldn't have to incur the cost of transforming the canonical model to JSON on every request.
Are there any other ways that this has been done?
Conversion from XML to JSON should be relatively light, but the mere fact it has to do something will take overhead. Doing that work upfront will definitely save time. You can put both formats in the same envelope (though JSON will have to be stored as a string then), or in a different document as you suggest. Alternatively you could also store it in document-properties. Unfortunately, that only takes XML as well, so you would be storing your JSON as a string in there too.
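For instance, a minimal XQuery sketch of rendering the JSON once at write time and storing it as a sibling document; the URIs and element names are hypothetical, and this assumes the out-of-the-box json.xqy library:
xquery version "1.0-ml";
import module namespace json = "http://marklogic.com/xdmp/json"
  at "/MarkLogic/json/json.xqy";

(: hypothetical envelope URI and canonical element :)
let $xml-uri   := "/envelopes/order-123.xml"
let $canonical := fn:doc($xml-uri)/envelope/canonical
let $rendered  := json:transform-to-json($canonical, json:config("custom"))
return
  (: on ML8 the result can be inserted as-is; on ML7 wrap the string: text { $rendered } :)
  xdmp:document-insert("/rendered/order-123.json", $rendered)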
Alternatively, have you profiled the transform to see if there is a particular reason why it slows down so much? Using XSLT versus XQuery for the transform could make a difference too.
HTH!
json:transform-to-json has 3 algorithms optimized for different purposes, which perform with different tradeoffs of flexibility, fidelity and performance.
"basic" (default) useful only to reverse json:transform-from-json()
"full" - to preserve as much information fidelity as possible, in exchange for a non 'prety' format in many cases.
"custom" - is ... custom ... designed when the json format is fixed or when you want control over the json output at the expense of handling a subset of XML accurately.
Basic and full are the most efficient. However, all variants are fairly involved and require completely traversing the XML node tree and building a JSON object tree bottom-up. In ML version 8 this is then translated into the native JSON node structure. In a REST call it would then be serialized as text.
Compared to a direct return of an XML document via fn:doc("file.xml"), there are at least 2 orders of magnitude more operations involved in the transform case.
For small documents in a REST call that is still a small fraction of the total request time, especially if the REST call performs a complex operation itself and then returns a small result. Your use case seems to be the opposite - returning an XML document directly bypasses almost all of the XQuery processing and is sent straight from internal storage to the output, or assigned to a variable.
If that is an important use case to optimize, especially if the documents can be large, then saving them as text or binary will be much faster -- at the expense of more storage used. If this is only a variant representation of the XML, try storing the JSON text as binary, as it will not incur any indexing overhead.
Otherwise, if you need to query over the JSON: in ML7 storing it as text gives you simple word queries, while in ML8 storing it as native JSON gives you structured queries -- both with efficient text serialization.
I am attempting to get a list of all the indexes and their sizes in a form that I can retrieve with Angular's $http service and then iterate through with ng-repeat, preferably with something like:
<ul ng-repeat="elsindex in elsIndexHttpResponse">
<li>{{elsindex.name}}:{{elsindex.size}}</li>
</ul>
The closest thing I have found is this:
http://localhost:9200/_cat/indices?h=index,store.size
Except:
a. its responses are not in JSON, so referencing them from the ng-repeat <li> elements isn't going to work; and
b. I would like, if possible, to get the size output in a consistent unit (like bytes).
If this involves something complicated then I'd be grateful for pointers on where I should focus.
I am using Elasticsearch v1.4.4.
Many thanks
I realize this question is already somewhat dated, but I wanted to add my 2 cents.
http://localhost:9200/_cat/indices?h=index,store.size&bytes=kb&format=json
Would actually get you exactly what you requested:
format=json -> formats the output to json
bytes=kb -> outputs the size in kilobytes
Information regarding the size unit was retrieved from the cat APIs documentation:
Possible values for the bytes argument
The format option I found by experimenting in Sense, which has some auto-completion features that are quite useful for discovering such options.
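On the Angular side, a minimal sketch of consuming that response (module and controller names here are made up); note the JSON keys are the column names you asked for, so store.size needs bracket notation in the template, e.g. {{elsindex.index}}: {{elsindex['store.size']}}:
// hypothetical AngularJS controller
angular.module('app').controller('IndexSizeCtrl', function ($scope, $http) {
  var url = 'http://localhost:9200/_cat/indices?h=index,store.size&bytes=kb&format=json';
  $http.get(url).then(function (response) {
    // response.data is an array like [{"index": "myindex", "store.size": "123"}, ...]
    $scope.elsIndexHttpResponse = response.data;
  });
});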
Cheers.
Index size in bytes is included with an indices stats API call:
curl http://localhost:9200/_stats/indexing,store
For nicely formatted JSON output, append ?pretty to the end of the URL:
curl http://localhost:9200/_stats/indexing,store?pretty
See the Indices stats API documentation for additional details and related information.
Just a slight modification of the answer above (note that pretty must be appended with & since the URL already has a query string):
curl -X GET "localhost:9200/_cat/indices?h=index,store.size&bytes=gb&pretty"
In case you want the size of a particular index, the API below works fine on Elasticsearch 7.14:
curl http://10.29.61.105:9200/employee/_stats
where employee is the desired index name.
I've been bashing against a brick wall on this ever since Monday, when the customer told me that we needed to simulate up to 50,000 pseudo-concurrent entities for the purposes of performance testing. This is the setup: I have text files full of JSON objects that look a bit like this:
{"customerId"=>"900", "assetId"=>"NN_18_144", "employee"=>"", "visible"=>false,
"GenerationDate"=>"2012-09-21T09:41:39Z", "index"=>52, "Category"=>2...}
It's one object to a line. I'm using JMeter's JMS publisher to read the lines sequentially:
${__StringFromFile(${PATH_TO_DATA_FILES}scenario_9.json)}
from the data files, each of which contains a different scenario.
What I need to do is read the files in and substitute assetId's value with a randomly selected value from a list of 50,000 non-sequential, pre-generated strings (I can't possibly have a separate file for each assetId, as that would involve littering the load injector with 50,000 files and configuring a thread group within JMeter for each). Programmatically, it's a trivial matter to perform the substitution, but it's not so simple to do it in JMeter on the fly.
Normally, I'd treat this as the interesting technical challenge that it is and spend a few days working it out, but I only have the weekend, which I suspect I'll spend sleeping overnight in the office anyway.
Can anyone help me with this, please?
Thanks.
For reading your assets, use a CSV Data Set Config; I suppose assetId will be the variable name.
Modify your expression:
${__StringFromFile(${PATH_TO_DATA_FILES}scenario_9.json, lineToSubstitute)}
To do the substitution, add a Beanshell Sampler or JSR223 Sampler (using Groovy) and code the substitution:
String assetId = vars.get("assetId");
String lineToSubstitute = vars.get("lineToSubstitute");
// one possible substitution, assuming the "assetId"=>"..." format shown in the question
String lineSubstituted = lineToSubstitute.replaceFirst("\"assetId\"=>\"[^\"]*\"", "\"assetId\"=>\"" + assetId + "\"");
vars.put("lineSubstituted", lineSubstituted);
If your JSON body is always the same or changes little, you should (see the sketch after this list):
Use an HTTP Sampler with RAW POST Body
Put the JSON body in it with variables for asset ids
Put asset ids in CSV Data Set config
Avoid using ${__StringFromFile()} as it has a cost.
If you need scripting, use a JSR223 Post Processor with the script in an external file + caching (available since 2.8) so that the script is compiled.
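For illustration, a rough sketch of what the RAW POST body could look like, mirroring the sample lines from the question; ${assetId} is the variable populated by the CSV Data Set Config, and all other values are simply copied from the example data:
{"customerId"=>"900", "assetId"=>"${assetId}", "employee"=>"", "visible"=>false,
"GenerationDate"=>"2012-09-21T09:41:39Z", "index"=>52, "Category"=>2}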