MongoDB Compass Exporting Collection with Unwanted Metadata - json

When exporting a collection using MongoDB Compass (regardless of query) to JSON, the output now includes metadata ($oid, $numberInt, $numberDouble). I've exported several collections in the past couple of weeks without issue but now every export includes metadata which is affecting how the JSON is being parsed in external software.
I've tried updating to the latest versions of MongoDB (4.0.10) and MongoDB Compass (1.18.0), both Community Edition, with no resolution.
Expected Output: {"_id":"unique_id","transaction_id":"1059833","transaction_amount":"2000"}
Actual Output: {"_id":{"$oid":"unique_id"},"transaction_id":{"$numberInt":"1059833"},"transaction_amount":{"$numberInt":"2000"}}

Try the following if you are using the mongo-java-driver:
Use new JsonWriterSettings(JsonMode.SHELL) when converting a document to JSON.
// `first` and `second` are assumed to be previously defined date bounds
Document doc = new Document("startDate", new Document("$gt", first).append("$lt", second));
System.out.println(doc.toJson(new JsonWriterSettings(JsonMode.SHELL)));
More details on this page:
https://mongodb.github.io/mongo-java-driver/3.7/bson/extended-json/
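If the exported file is consumed by software you don't control (so the driver-side approach above doesn't apply), another workaround is to post-process the export and collapse the Extended JSON type wrappers back into plain values. Below is a rough Groovy sketch; the file name, the newline-delimited layout, and the exact set of wrappers handled are assumptions, so adjust them to your export:
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
// Collapse MongoDB Extended JSON type wrappers ($oid, $numberInt, $numberLong, $numberDouble)
// back into plain values. This is a post-processing workaround, not a Compass setting.
def collapse(node) {
    if (node instanceof Map && node.size() == 1) {
        def key = node.keySet().first()
        if (key == '$oid') return node[key]
        if (key in ['$numberInt', '$numberLong']) return node[key] as Long
        if (key == '$numberDouble') return node[key] as Double
    }
    if (node instanceof Map)  return node.collectEntries { k, v -> [k, collapse(v)] }
    if (node instanceof List) return node.collect { collapse(it) }
    return node
}
// Assumes one exported document per line (adjust if your export is a single JSON array).
new File('export.json').eachLine { line ->
    if (line.trim()) {
        println JsonOutput.toJson(collapse(new JsonSlurper().parseText(line)))
    }
}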

Related

Revisions - Autodesk Forge Data Exchange API

I am following the steps from this tutorial to try to compare the parameter values of the same element in two different revisions of the same model inside ACC, using the Data Exchange API. The steps I took are as follows:
I created a Data Exchange inside ACC Docs (which corresponds to REVISION_1) using Revit Model A;
I changed Revit Model A, adding a new window to it, and afterwards uploaded the model to ACC Docs again (which corresponds to REVISION_2);
With my exchangeId and collectionId in hand, I successfully collected the snapshot revisions from the Data Exchange by calling the following in Postman:
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGE_ID>/snapshots:exchange/revisions
After that, I collected the changed assets from REVISION_1 to REVISION_2 by doing:
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGE_ID>/assets:sync?filters=exchange.snapshot.fromRevision==<REVISION_1>;exchange.snapshot.toRevision==<REVISION_2>
Again the results are as expected: it shows only the data that changed between the two revisions.
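For reference, the same filtered call can also be scripted outside Postman. Here is a minimal Groovy sketch; the token, IDs, and revision names are placeholders to substitute with your own values:
// All angle-bracket values are placeholders, including the OAuth access token.
def token        = "<ACCESS_TOKEN>"
def collectionId = "<COLLECTION_ID>"
def exchangeId   = "<EXCHANGE_ID>"
def rev1         = "<REVISION_1>"
def rev2         = "<REVISION_2>"
def url = ("https://developer.api.autodesk.com/exchange/v1/collections/${collectionId}" +
           "/exchanges/${exchangeId}/assets:sync" +
           "?filters=exchange.snapshot.fromRevision==${rev1};exchange.snapshot.toRevision==${rev2}").toString()
def conn = new URL(url).openConnection()
conn.setRequestProperty("Authorization", "Bearer " + token)
// The response body is JSON describing only the assets that changed between the two revisions.
println conn.inputStream.text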
So my main question is: now that there are two revisions, how do I retrieve the original data from REVISION_1 only? I've tried several request options without success.
By the way, it is curious that retrieving the assets from the current Data Exchange with
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGE_ID>/assets:sync now shows only information from REVISION_1, and the elements modified in REVISION_2 seem to have disappeared. Does anyone know why this happens?
I've tried the following requests without success:
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGE_ID>/assets:sync?filters=exchange.snapshot.fromRevision==<REVISION_1>
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGEID>/assets:sync?filters=exchange.snapshot.toRevision==<REVISION_1>
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGEID>/exchange.snapshot.fromRevision==latest-1
https://developer.api.autodesk.com/exchange/v1/collections/<COLLECTION_ID>/exchanges/<EXCHANGEID>/exchange.snapshot.toRevision==latest-1
I expected to retrieve the assets from REVISION_1 only, but it seems that once a new revision such as REVISION_2 exists, it overwrites the data from the previous revision in some way.

How to generate a JSON file using JMeter Report Generator

I am trying to create a statistics.json file with JMeter using ReportGenerator, populated with the results of my .jmx tests. Is it possible to do this with JMeter?
I have gone through this tutorial: https://jmeter.apache.org/usermanual/generating-dashboard.html which focuses on creating an HTML dashboard using the Report Generator, but I have a project requirement to create/update a statistics.json file as well. I have already pulled the necessary data using a JSON Extractor post processor, and I can get the custom variables from that extractor to show up in my debug response and in my CSV file (after adding some sample_variables to user.properties). Unfortunately, I have been unsuccessful in finding more information about how to create a JSON file with these responses.
In my reportgenerator.properties file, the only parts I see that relate to json are:
jmeter.reportgenerator.exporter.json.classname=org.apache.jmeter.report.dashboard.JsonExporter
jmeter.reportgenerator.exporter.json.property.output_dir=report-output
I'm looking for some settings that would allow me to edit what goes into that JSON file, but I'm having trouble finding information in the docs. Do I need to be sending or setting my custom variables in another settings file? Any help clarifying this would be much appreciated!
Looking at the JMeter source code, you cannot control what is exported into the statistics.json file externally; you would have to either amend the JsonExporter class or come up with your own implementation of AbstractDataExporter and decide what, where, and how to store.
private void createStatistic(Map<String, SamplingStatistic> statistics, MapResultData resultData) {
    LOGGER.debug("Creating statistics for result data:{}", resultData);
    SamplingStatistic statistic = new SamplingStatistic();
    ListResultData listResultData = (ListResultData) resultData.getResult("data");
    statistic.setTransaction((String) ((ValueResultData) listResultData.get(0)).getValue());
    statistic.setSampleCount((Long) ((ValueResultData) listResultData.get(1)).getValue());
    statistic.setErrorCount((Long) ((ValueResultData) listResultData.get(2)).getValue());
    statistic.setErrorPct(((Double) ((ValueResultData) listResultData.get(3)).getValue()).floatValue());
    statistic.setMeanResTime((Double) ((ValueResultData) listResultData.get(4)).getValue());
    statistic.setMinResTime((Long) ((ValueResultData) listResultData.get(5)).getValue());
    statistic.setMaxResTime((Long) ((ValueResultData) listResultData.get(6)).getValue());
    statistic.setMedianResTime((Double) ((ValueResultData) listResultData.get(7)).getValue());
    statistic.setPct1ResTime((Double) ((ValueResultData) listResultData.get(8)).getValue());
    statistic.setPct2ResTime((Double) ((ValueResultData) listResultData.get(9)).getValue());
    statistic.setPct3ResTime((Double) ((ValueResultData) listResultData.get(10)).getValue());
    statistic.setThroughput((Double) ((ValueResultData) listResultData.get(11)).getValue());
    statistic.setReceivedKBytesPerSec((Double) ((ValueResultData) listResultData.get(12)).getValue());
    statistic.setSentKBytesPerSec((Double) ((ValueResultData) listResultData.get(13)).getValue());
    statistics.put(statistic.getTransaction(), statistic);
}
An easier option would be to write your sample variables into a separate file using the Flexible File Writer plugin.
I'm leaving the accepted answer as accepted because it is correct. However, I'd like to add that I was able to meet my requirement by using a JSR223 PostProcessor with a Groovy script that creates a CSV file wherever I need it and fills it with whatever data I need.
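To give an idea of what that looks like, here is a minimal sketch of such a JSR223 PostProcessor script; the output path and the variable names (myVar1, myVar2) are assumptions standing in for whatever your JSON Extractor produces:
// JSR223 PostProcessor (language: groovy). Not thread-safe under heavy concurrency;
// with many threads, synchronize the writes or use one file per thread.
def out = new File('results/custom-stats.csv')
if (!out.exists()) {
    out.parentFile?.mkdirs()
    out << 'label,myVar1,myVar2\n'   // write the header once
}
// "vars" (JMeterVariables) and "prev" (the previous SampleResult) are standard JSR223 bindings.
out << "${prev.getSampleLabel()},${vars.get('myVar1')},${vars.get('myVar2')}\n"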

Accessing ArcGIS Pro geoprocessing history programmatically

I am writing an ArcGIS Pro Add-In and would like to view items in the geoprocessing history programmatically. The goal is to get the list of parameters and tools used, in order to better understand and recreate a workflow later, perhaps in another project where we would not have direct access to the history within ArcGIS Pro.
After a lot of searching through documentation, online posts, and debugging breakpoints in my code, I've found that some of this data does exist privately within the HistoryProjectItem class, but since it is a private member of a sealed class, it seems there is nothing I can do to access it. The other place I've seen this data is less than ideal: the user has an option to write the geoprocessing history to an XML log file that lives within /AppData/Roaming/ESRI/ArcGISPro/ArcToolbox/History. Our team has been told that this file may be a problem because certain recursive operations may cause it to balloon out of control, and after reading online, it seems that most people want this setting disabled to avoid large log files taking up space on their machine. Overall the log file doesn't seem like a great option, as we fear it could slow a user down by having the program write large log files while they are working.
I was wondering if this data is stored somewhere that I have missed that could be accessed programmatically from the add-in. It seems to me that the data within Project.Items is always stored regardless of user settings, but it appears to be inaccessible this way due to class member visibility. I'm not familiar enough with geodatabases and ArcGIS file formats to know whether a project will always have a .gdb from which we could perhaps read the history.
Any insights on how to read the geoprocessing history in a way that is minimally intrusive to the user would be ideal. Is this data available elsewhere?
This was the closest/best solution I have found so far without writing to the history logs, which most people avoid due to file-size bloat and warnings that one operation may run other operations recursively, causing the file to balloon massively.
https://community.esri.com/t5/arcgis-pro-sdk-questions/can-you-access-geoprocessing-history-programmatically-using-the/m-p/1007833#M5842
It involves reading the .aprx file (which is written to on save) by unzipping it, parsing the XML, and filtering the contents to only GPHistoryOperations. From there I was able to read all the parameters, environment options, status, and duration of the operations, which is what I was hoping to get.
// Requires the System.Linq, System.Xml, System.IO, System.Diagnostics, Newtonsoft.Json,
// ICSharpCode.SharpZipLib.Zip and ArcGIS.Core.CIM namespaces.
public static void ListHistory()
{
    // this can be run in a console app (or within a Pro add-in)
    CIMGISProject project = GetProject(@"D:\tests\topologies\topotest1.aprx");
    foreach (CIMProjectItem hist in project.ProjectItems
        .Where(itm => itm.ItemType == "GPHistory"))
    {
        Debug.Print($"+++++++++++++++++++++++++++");
        Debug.Print($"{hist.Name}");
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(hist.PropertiesXML);
        //it sure would be nice if Pro SDK had things like MdProcess class in ArcObjects
        //https://desktop.arcgis.com/en/arcobjects/latest/net/webframe.htm#MdProcess.htm
        var json = JsonConvert.SerializeXmlNode(doc, Newtonsoft.Json.Formatting.Indented);
        Debug.Print(json);
    }
}
static CIMGISProject GetProject(string aprxPath)
{
    //aprx files are actually zip files
    //https://www.nuget.org/packages/SharpZipLib
    using (var zipFile = new ZipFile(aprxPath))
    {
        var entry = zipFile.GetEntry("GISProject.xml");
        using (var stream = zipFile.GetInputStream(entry))
        {
            using (StreamReader reader = new StreamReader(stream))
            {
                var xml = reader.ReadToEnd();
                //deserialize the xml from the aprx file to hydrate a CIMGISProject
                return ArcGIS.Core.CIM.CIMGISProject.FromXml(xml);
            }
        }
    }
}
Code provided by Kirk Kuykendall

SOLR - Best approach to import 20 million documents from csv file

My current task at hand is to figure out the best approach to load millions of documents into Solr.
The data file is an export from the DB in CSV format.
Currently, I am thinking about splitting the file into smaller files and having a script post these smaller files using curl.
I have noticed that if you post a high amount of data, most of the time the request times out.
I am looking into the Data Import Handler and it seems like a good option.
Any other ideas would be highly appreciated.
Thanks
Unless a database is already part of your solution, I wouldn't add that additional complexity. Quoting the SOLR FAQ, it's your servlet container that is issuing the session time-out.
As I see it, you have a couple of options (in my order of preference):
Increase container timeout
Increase the container timeout (the "maxIdleTime" parameter, if you're using the embedded Jetty instance).
I'm assuming you only occasionally index such large files? Increasing the time-out temporarily might just be the simplest option.
Split the file
Here's a simple unix script that will do the job (splitting the file into 500,000-line chunks):
split -d -l 500000 data.csv split_files.
for file in `ls split_files.*`
do
curl 'http://localhost:8983/solr/update/csv?fieldnames=id,name,category&commit=true' -H 'Content-type:text/plain; charset=utf-8' --data-binary @$file
done
Parse the file and load in chunks
The following groovy script uses opencsv and solrj to parse the CSV file and commit changes to Solr every 500,000 lines.
import au.com.bytecode.opencsv.CSVReader
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument
@Grapes([
@Grab(group='net.sf.opencsv', module='opencsv', version='2.3'),
@Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0'),
@Grab(group='ch.qos.logback', module='logback-classic', version='1.0.0'),
])
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
new File("data.csv").withReader { reader ->
CSVReader csv = new CSVReader(reader)
String[] result
Integer count = 1
Integer chunkSize = 500000
while (result = csv.readNext()) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", result[0])
doc.addField("name_s", result[1])
doc.addField("category_s", result[2])
server.add(doc)
if (count.mod(chunkSize) == 0) {
server.commit()
}
count++
}
server.commit()
}
In SOLR 4.0 (currently in BETA), CSV's from a local directory can be imported directly using the UpdateHandler. Modifying the example from the SOLR Wiki
curl "http://localhost:8983/solr/update?stream.file=exampledocs/books.csv&stream.contentType=text/csv;charset=utf-8"
This streams the file from the local location, so there is no need to chunk it up and POST it via HTTP.
The answers above have explained the single-machine ingestion strategies really well. Here are a few more options if you have a big-data infrastructure in place and want to implement a distributed data ingestion pipeline.
Use sqoop to bring the data into hadoop, or place your csv file in hadoop manually.
Use one of the connectors below to ingest the data:
the hive-solr connector or the spark-solr connector.
PS:
Make sure no firewall blocks connectivity between the client nodes and the solr/solrcloud nodes.
Choose the right directory factory for data ingestion; if near-real-time search is not required, use StandardDirectoryFactory.
If you get the exception below in the client logs during ingestion, tune the autoCommit and autoSoftCommit configuration in the solrconfig.xml file.
SolrServerException: No live SolrServers available to handle this request
Definitely just load these into a normal database first. There are all sorts of tools for dealing with CSVs (for example, postgres' COPY), so it should be easy. Using the Data Import Handler is also pretty simple, so this seems like the most friction-free way to load your data. This method will also be faster since you won't have unnecessary network/HTTP overhead.
The reference guide says ConcurrentUpdateSolrServer could/should be used for bulk updates.
Javadocs are somewhat incorrect (v 3.6.2, v 4.7.0):
ConcurrentUpdateSolrServer buffers all added documents and writes them into open HTTP connections.
It doesn't buffer indefinitely, but only up to queueSize documents, which is a constructor parameter.
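To make that concrete, here is a rough Groovy sketch in the same spirit as the script above, swapping CommonsHttpSolrServer for ConcurrentUpdateSolrServer; the URL, field names, queue size, and thread count are illustrative assumptions:
@Grab(group='org.apache.solr', module='solr-solrj', version='4.7.0')
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer
import org.apache.solr.common.SolrInputDocument
// 10000 queued documents, 4 background threads streaming them over open HTTP connections.
def server = new ConcurrentUpdateSolrServer("http://localhost:8983/solr/", 10000, 4)
// Naive comma split; use opencsv as in the script above for real-world CSV data.
new File("data.csv").splitEachLine(",") { fields ->
    def doc = new SolrInputDocument()
    doc.addField("id", fields[0])
    doc.addField("name_s", fields[1])
    doc.addField("category_s", fields[2])
    server.add(doc)            // queued; sent asynchronously by the background threads
}
server.blockUntilFinished()    // wait for the internal queue to drain
server.commit()                // make the documents searchable
server.shutdown()
Because adds are only queued, blockUntilFinished() is called before the final commit so that nothing is left sitting in the buffer.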

Accessing JBoss JMX data via JSON

Is there a way to access the JBoss JMX data via JSON?
I am trying to pull a management console together using data from a number of different servers. I can achieve this with screen scraping, but I would prefer to use a JSON object or XML response if one exists; however, I have not been able to find one.
You should have a look at Jolokia, a full-featured JSON/HTTP adapter for JMX.
It supports, and has been tested on, JBoss as well as many other platforms. Jolokia is an agent that is deployed as a normal Java EE war, so you simply drop it into the deploy directory within your JBoss installation. There are also some client libraries available, e.g. jmx4perl, which allows for programmatic access to the agent.
There is much more to discover, and it is actively developed.
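As a quick illustration of what you get back, once the agent war is deployed you can read any MBean attribute over plain HTTP; the host, port, and /jolokia context path below are assumptions about a default deployment:
// Read one MBean attribute through the Jolokia agent; adjust host/port/context to your setup.
def base = "http://localhost:8080/jolokia"
def text = new URL(base + "/read/java.lang:type=Memory/HeapMemoryUsage").text
// Jolokia answers with a JSON envelope containing "request", "value", and "status" fields.
println text
def json = new groovy.json.JsonSlurper().parseText(text)
println json.value.used   // e.g. current heap usage in bytes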
If you are using Java, you can write a small program that makes JMX requests to the JBoss server and transforms the response into XML/JSON.
The following is a small code snippet that may help you.
// Requires java.util.Set plus imports from javax.management and javax.management.remote
// (ObjectName, MBeanServerConnection, JMXConnectorFactory, JMXServiceURL).
String strInitialProp = "javax.management.builder.initial";
System.setProperty(strInitialProp, "mx4j.server.MX4JMBeanServerBuilder");
String urlForJMX = "jnp://localhost:1099"; // for JBoss; adjust the service URL to your JBoss version and configuration
ObjectName objAll = ObjectName.getInstance("*:*");
JMXServiceURL jmxUrl = new JMXServiceURL(urlForJMX);
MBeanServerConnection jmxServerConnection = JMXConnectorFactory.connect(jmxUrl).getMBeanServerConnection();
System.out.println("Total MBeans :: " + jmxServerConnection.getMBeanCount());
Set<ObjectName> mBeanSet = jmxServerConnection.queryNames(objAll, null);
There are some JMX-REST bridges available that internally talk JMX to the MBeans and expose the result over REST calls (which can deliver JSON as the data format).
See e.g. polarrose or jmx-rest-access. There are a few others out there.