I have phonorgraph object with billions of rows and we are querying it through object set service
for example, I want to get all DriverLicences from certain city.
#Function()
public getDriverLicences(city: string): ObjectSet<DriverLicences> {
let drivers = Objects.search().DriverLicences().filter(row => row.city.exactMatch(city));
return drivers ;
}
I am facing this error when I am trying query it from slate:
ERROR 400: {"errorCode":"INVALID_ARGUMENT","errorName":"ObjectSet:PagingAboveConfiguredLimitNotAllowed","errorInstanceId":"0000-000","parameters":{}}
I understand that I am probably retrieving more than 100 000 results but I need all the results because of the implemented logic in the front is a complex slate dashboard built by another team that we cannot re-factor.
The issue here is that, specifically in the Slate <> Function connector, there is a "translation layer" that serializes the contents of the object set and provides a response data structure that materializes the property:value pairs for each object in the set.
This clearly doesn't work for large object sets where throwing so much data into the browser is likely to overwhelm the resources allocated to the tab.
From context it seems like you might be migrating an existing Slate app over to Functions; in the current version, how is the query limiting the number of results returned? It certainly must not be returning several 100 thousand results for further processing on the front end? (And if so, that might be an anti-pattern to consider addressing).
As for options that you could currently explore, you can sort your object set and then specify a smaller limit to return:
Objects.search().DriverLicences().filter(row => row.city.exactMatch(city)).orderBy(date_of_issue).take(100)
You'll find a few more details in the Functions documentation Reference entry on Ontology API: Object Sets in the section on Ordering and limiting.
You can even make a work around for the (current) lack of paging when return an ObjectSet to Slate by using the last value from the property ordered on (i.e. date_of_issue) as a filter in the subsequent request and return the next N objects.
This can work if you need a Slate table or HTML widget that renders on set of results then, on a user action, gets the next page.
I use Couchbase Java SDK 2.2.6 with Couchbase server 4.1.
I query my view with the following code
public <T> List<T> findDocuments(ViewQuery query, String bucketAlias, Class<T> clazz) {
// We specifically set reduce false and include docs to retrieve docs
query.reduce(false).includeDocs();
log.debug("Find all documents, query = {}", decode(query));
return getBucket(bucketAlias)
.query(query)
.allRows()
.stream()
.map(row -> fromJsonDocument(row.document(), clazz))
.collect(Collectors.toList());
}
private static <A> A fromJsonDocument(JsonDocument saved, Class<A> clazz) {
log.debug("Retrieved json document -> {}", saved);
A object = fromJson(saved.content(), clazz);
return object;
}
In the logs from the fromJsonDocument method I see that rows are not always sorted by the row key. Usually they are, but sometimes they are not.
If I just run this query in browser couchbase GUI, I always receive results in expected order. Is it a bug or expected that view query results are not sorted when queried with async client?
What is the behaviour in different clients, not java?
This is due to the asynchronous nature of your call in the Java client + the fact that you used includeDocs.
What includeDocs will do is that it will weave in a call to get for each document id received from the view. So when you look at the asynchronous sequence of AsyncViewRow with includeDocs, you're actually looking at a composition of a row returned by the view and an asynchronous retrieval of the whole document.
If a document retrieval has a little bit of latency compared to the one for the previous row, it could reorder the (row+doc) emission.
But good news everyone! There is a includeDocsOrdered alternative in the ViewQuery that takes exactly the same parameters as includeDocs but will ensure that AsyncViewRow come in the same order returned by the view.
This is done by eagerly triggering the get retrievals but then buffering those that arrive out of order, so as to maintain the original order without sacrificing too much performance.
That is quite specific to the Java client, with its usage of RxJava. I'm not even sure other clients have the notion of includeDocs...
I am thinking of creating a generic task/function which will read a transaction from any given ovm analysis port and if the transaction matches some constraint provided by user then i will fire an event that matching transaction found.
I want the user to pass the constraint like we pass it with array locator functions using the with clause like
out = arr.find_first with (item == 10);
The with clause in many of SystemVerilog's constructs is not very OOP friendly. There is no way to pass the with expression as an argument.
Two software design patterns you may want to try is
the factory - where you allow the user to extend these functions with a difference compare function
a policy class - you can make class whose sole purpose is to provide a compare function.
When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
It is obvious that we need to validate these JSONs (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:
Upon receiving the JSON, validate the JSON upfront as it is, before doing anything with it to complete the request.
Take the JSON as it is, start processing it (e.g. access its various key-values) and try to validate it on-the-go while performing business logic, and possibly use some exception handling to handle vogue data.
The 1st approach seems more robust compared to the 2nd, but probably more expensive (in time cost) because every request will be validated (and hopefully most of them are valid so the validation is sort of redundant).
The 2nd approach may save the compulsory validation on valid requests, but mixing the checks within business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that will map JSON to objects which is very powerful and performs that validation for you. In JAVA, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
If efficiency is your goal and you want to validate every single request, it would be ideal if you could evaluate on the front-end if you are using JavaScript you can use json2.js.
In regards to comparing your methods, here is a Pro / Cons list.
Method #1: Upon Request
Pros
The business logic integrity is maintained. As you mentioned trying to validate while processing business logic could result in invalid tests that may actually be valid and vice versa or also the validation could inadvertently impact the business logic negatively.
As Norbert mentioned, catching the errors before hand will improve efficiency. The logical question this poses is why spend the time processing, if there are errors in the first place?
The code will be cleaner and easier to read. Having validation and business logic separated will result in cleaner, easier to read and maintain code.
Cons
It could result in redundant processing meaning longer computing time.
Method #2: Validation on the Go
Pros
It's efficient theoretically by saving process and compute time doing them at the same time.
Cons
In reality, the process time that is saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error was found.
The data integrity can be comprised. It could be possible that the JSON becomes corrupt when processing it this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
What it really boils down to is Accuracy vs Speed. They generally have an inverse relationship. As you become more accurate and validate your JSON, you may have to compromise some on speed. This is really only noticeable in large data sets as computers are really fast these days. It is up to you to decide what is more important given how accurate you think you data may be when receiving it or whether that extra second or so is crucial. In some cases, it does matter (i.e. with the stock market and healthcare applications, milliseconds matter) and both are highly important. It is in those cases, that as you increase one, for example accuracy, you may have to increase speed by getting a higher performant machine.
Hope this helps.
The first approach is more robust, but does not have to be noticeably more expensive. It becomes way less expensive even when you are able to abort the parsing process due to errors: Your business logic usually takes >90% of the resources in a process, so if you have an error % of 10%, you are already resource neutral. If you optimize the validation process so that the validations from the business process are performed upfront, your error rate might be much lower (like 1 in 20 to 1 in 100) to stay resource neutral.
For an example on an implementation assuming upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: Every part of the JSON can be cast into an object. This object is typed or contains typed data:
Sample object (JAVA used as example language):
public class someInnerDataFromJSON {
String name;
String address;
int housenumber;
String buildingType;
// Getters and setters
public String getName() { return name; }
public void setName(String name) { this.name=name; }
//etc.
}
The data parsed by GSON is by using the model provided, already type checked.
This is the first point where your code can abort.
After this exit point assuming the data confirmed to the model, you can validate if the data is within certain limits. You can also write that into the model.
Assume for this buildingType is a list:
Single family house
Multi family house
Apartment
You can check data during parsing by creating a setter which checks the data, or you can check it after parsing in a first set of your business rule application. The benefit of first checking the data is that your later code will have less exception handling, so less and easier to understand code.
I would definitively go for validation before processing.
Let's say you receive some json data with 10 variables of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach($data as $varName => $varValue){
$varType = gettype($varValue);
if(!$this->isTypeValid($varName, $varType)){
// return validation error
}
}
// continue processing
Think of the scenario where you are directly processing the data and then the 10th value turns out to be of invalid type. The processing of the previous 9 variables was a waste of resources since you end up returning some validation error response anyway. On top of that you have to rollback any changes already persisted to your storage.
I only use variable type in my example but I would suggest full validation (length, max/min values, etc) of all variables before processing any of them.
In general, the first option would be the way to go. The only reason why you might need to think of the second option is if you were dealing with JSON data which was tens of MBs large or more.
In other words, only if you are trying to stream JSON and process it on the fly, you will need to think about second option.
Assuming that you are dealing with few hundred KB at most per JSON, you can just go for option one.
Here are some steps you could follow:
Go for a JSON parser like GSON that would just convert your entire
JSON input into the corresponding Java domain model object. (If GSON
doesn't throw an exception, you can be sure that the JSON is
perfectly valid.)
Of course, the objects which were constructed using GSON in step 1
may not be in a functionally valid state. For example, functional
checks like mandatory fields and limit checks would have to be done.
For this, you could define a validateState method which repeatedly
validates the states of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState(){
//Assume this validateState is part of Customer class.
if(age<12 || age>150)
throw new IllegalArgumentException("Age should be in the range 12 to 120");
if(age<18 && (guardianId==null || guardianId.trim().equals(""))
throw new IllegalArgumentException("Guardian id is mandatory for minors");
for(Account a:customer.getAccounts()){
a.validateState(); //Throws appropriate exceptions if any inconsistency in state
}
}
The answer depends entirely on your use case.
If you expect all calls to originate in trusted clients then the upfront schema validation should be implement so that it is activated only when you set a debug flag.
However, if your server delivers public api services then you should validate the calls upfront. This isn't just a performance issue - your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private api services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from 3rd party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements. But in general I'd always go for #1.
Few considerations:
For consistency I'd use method #1, for performance #2. However when using #2 you have to take into account that rolling back in case of non valid input may become complicated in the future, as the logic changes.
Json validation should not take that long. In python you can use ujson for parsing json strings which is a ultrafast C implementation of the json python module.
For validation, I use the jsonschema python module which makes json validation easy.
Another approach:
if you use jsonschema, you can validate the json request in steps. I'd perform an initial validation of the most common/important parts of the json structure, and validate the remaining parts along the business logic path. This would allow to write simpler json schemas and therefore more lightweight.
The final decision:
If (and only if) this decision is critical I'd implement both solutions, time-profile them in right and wrong input condition, and weight the results depending on the wrong input frequency. Therefore:
1c = average time spent with method 1 on correct input
1w = average time spent with method 1 on wrong input
2c = average time spent with method 2 on correct input
2w = average time spent with method 2 on wrong input
CR = correct input rate (or frequency)
WR = wrong input rate (or frequency)
if ( 1c * CR ) + ( 1w * WR) <= ( 2c * CR ) + ( 2w * WR):
chose method 1
else:
chose method 2
Given the fact that I am using a pre-existing comment/discussion solution that uses a unique string as a thread id, I need to create a user-facing slug from an arbitrary object for the thread id that fits the following constraints:
Short
"Pretty"
Human-readable
Does not reveal internals
Unique per object instance
I thought about using {FQCN}-{id}, but it violates #4 and, when web-encoded, #2. I also considered an md5 hash of the same, but that violates #3 (and potentially #1, depending on the definition of "short").
Since the objects do not have a standardized API (eg, there's no guarantee that they'll all have a getTitle() method, for example), I'm at a loss as to how to come up with a slug that fits those constraints. How would you go about creating one, and if that's not possible, what format would you use that violates as few constraints as possible?
It sounds like if you're using arbitrary objects, you'll want to let the objects decide for themselves. You'd want to have a base class or perhaps an interface that would define that the objects are "sluggable". Everything you're going to have slugs for would need to implement this.
That way, you would have
getObjectFromSlug = function () or function getObjectFromSlug ()
and
getSlug = function () or function getSlug ()
in the interface, and each of the "sluggable" objects would be required to implement this on their own. The only issue there is that you will have to manually require them to be unique on their own.