I am trying to create a view in Couchbase. Below is my query:
function (doc, meta) {
  if (doc._class == "com.abc.xyz.Account" && doc.accountId) {
    emit(doc.accountId, null);
  }
}
However, I am getting the error below:
{"error":"Error, you cannot issue more than one query at once. Please remove all text after the semicolon closing the first query."}
Not sure what the issue is with the query. I tried searching for a solution but couldn't find any.
The JavaScript function mentioned in the question appears to be a Couchbase Map/Reduce View Engine definition. These can be queried using either the SDK or the REST API, and the details are outlined in the View Querying section of the documentation. In addition, the Map/Reduce View UI (the same place where the map/reduce functions are defined) has some limited capability to query the view being defined.
The Couchbase Query UI is exclusively for N1QL/SQL++ queries. The function mentioned can be expressed in N1QL very easily:
CREATE INDEX idx_account_id ON mybucket(accountId) WHERE _class == "com.abc.xyz.Account";
So it would not be necessary to create Couchbase Map/Reduce views. And the above index can be queried, for example, as:
SELECT count(*) FROM mybucket WHERE _class == "com.abc.xyz.Account";
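For a result shaped more like the view's output (document id plus accountId), a query along these lines should also be served by that index; the keyspace and field names are simply the ones from the example above:
SELECT META().id, accountId
FROM mybucket
WHERE _class == "com.abc.xyz.Account"
  AND accountId IS NOT MISSING;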
I'm sure the function you mention is just an example and the actual use case is more complex, but the general approach still holds true. It's a good idea to start with the declarative tool (N1QL), as it is much simpler to use.
There are many useful resources for this, such as the N1QL Tutorial, to help you start using N1QL. That would be the best starting point for most use cases.
Related
I am curious what techniques database developers and architects use to create stored procedures (or functions) that return dynamically filtered data for large-scale databases.
For example, let's take a database with millions of people in it, and we want to provide a stored procedure "get-person-list" which takes a JSON parameter. Within this JSON parameter, we can define filters such as $.filter.name.first, $.filter.name.last, $.filter.phone.number, $.filter.address.city, etc.
The frontend (a web solution) allows the user to define one or more filters, so the frontend can say "Show me everyone with a first name of Ted and a last name of Smith in San Diego."
The payload would look like this:
{
  "filter": {
    "name": {
      "last": "smith",
      "first": "ted"
    },
    "address": {
      "city": "san diego"
    }
  }
}
Now, what would the best technique be to write a single stored procedure capable of handling numerous (dozens or more) filter settings (dynamically) and returning the proper result set all with the best optimization/speed?
Is it possible to do this with a CTE, or are prepared statements based on IF/THEN logic (building out the SQL to be executed based on filter values) the best/only real method?
How do big companies with huge databases and thousands of users write their calls to return complex dynamic lists of data as quickly as possible?
Everything Bill wrote is true, and good advice.
I'll take it a little further. You're proposing building a search layer into your system, which is fine.
You're proposing an interface in which you pass a JSON object to code inside the DBMS. That's not fine. That code will either have a bunch of canned queries handling the various search scenarios, or it will have a mess of string-handling code that reads the JSON, puts together appropriate queries, and then uses MySQL's PREPARE statement to run them. From my experience that is, with respect, a really bad idea.
Here's why:
The stored-procedure language has very weak string-handling support compared to host languages. No sprintf. No arrays of strings. No join or implode operators. Clunky regex, and not always present on every server. You're going to need string handling to build search queries.
Stored procedures are trickier to debug, test, deploy, and maintain than ordinary application code. That work requires special skills and special access.
You will need to maintain this code, especially if your system proves successful. You'll add requirements that will require expanding your search capabilities.
It's impossible (seriously, impossible) to know what your actual application usage patterns will be at scale. You surely will, as a consequence of growth, find usage patterns that surprise you. My point is that you can't design and build a search system and then forget about it. It will evolve along with your app.
To keep up with evolving usage patterns, you'll need to refactor some queries and add some indexes. You will be under pressure when you do that work: People will be complaining about performance. See points 1 and 2 above.
MySQL / MariaDB's stored procedures aren't compiled with an optimizing compiler, unlike Oracle and SQL Server's. So there's no compelling performance win.
So don't use a stored procedure for this. Please. Ask me how I know this sometime.
If you need a search module with a JSON interface, implement it in your favorite language (php, C#, nodejs, java, whatever). It will be easier to debug, test, deploy, and maintain.
To write a query that searches a variety of columns, you would have to write dynamic SQL. That is, write code to parse your JSON payload for the filter keys and values, and format SQL expressions in a string that is part of a dynamic SQL statement. Then prepare and execute that string.
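A minimal sketch of those mechanics directly in MySQL, with a hypothetical person table and assuming the client supplied only the first-name and city filters (only the filters actually present get appended, and values are bound as parameters to avoid injection):

-- Start from a base statement and append one predicate per supplied filter.
SET @sql := 'SELECT * FROM person WHERE 1=1';
SET @sql := CONCAT(@sql, ' AND first_name = ?');  -- $.filter.name.first was supplied
SET @sql := CONCAT(@sql, ' AND city = ?');        -- $.filter.address.city was supplied

SET @first := 'ted', @city := 'san diego';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @first, @city;
DEALLOCATE PREPARE stmt;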
In general, you can't "optimize for everything." Trying to optimize when you don't know in advance which queries your users will submit is a nigh-impossible task. There's no perfect solution.
The most common method of optimizing search is to create indexes. But you need to know the types of search in advance to create indexes. You need to know which columns will be included, and which types of search operations will be used, because the column order in an index affects optimization.
For N columns, there are N-factorial permutations of columns (just five columns already gives 120), so covering them all is clearly impractical, and MySQL only allows 64 indexes per table anyway. You simply can't create all the indexes needed to optimize every possible query your users attempt.
The alternative is to optimize queries partially, by indexing a few combinations of columns, and hope that these help the users' most common queries. Use application logs to determine what the most common queries are.
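For instance (a sketch only, assuming the hypothetical person table from earlier), the logs might justify covering the two most common filter combinations:

-- Each composite index serves one family of common filters.
ALTER TABLE person
  ADD INDEX idx_last_first (last_name, first_name),
  ADD INDEX idx_city_last  (city, last_name);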
There are other types of indexes. You could use fulltext indexing, either the implementation built in to MySQL, or else supplement your MySQL database with ElasticSearch or similar technology. These provide a different type of index that effectively indexes everything with one index, so you can search based on multiple columns.
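A sketch of the built-in MySQL flavor, again assuming the hypothetical person table (InnoDB supports FULLTEXT indexes as of MySQL 5.6):

-- One FULLTEXT index spanning several columns, searched with MATCH ... AGAINST.
ALTER TABLE person ADD FULLTEXT INDEX ft_person (first_name, last_name, city);

SELECT * FROM person
WHERE MATCH(first_name, last_name, city)
      AGAINST('+ted +smith "san diego"' IN BOOLEAN MODE);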
There's no single product that is "best." Which fulltext indexing technology meets your needs requires you to evaluate different products. This is some of the unglamorous work of software development — testing, benchmarking, and matching product features to your application requirements. There are few types of work that I enjoy less. It's a toss-up between this and resolving git merge conflicts.
It's also more work to manage copies of data in multiple datastores, making sure data changes in your SQL database are also copied into the fulltext search index. This involves techniques like ETL (extract, transform, load) and CDC (change data capture).
But you asked how big companies with huge databases do this, and this is how.
Input
I do that "all the time". The web page has a <form>. When submitted, I look for fields of that form that were filled in, then build
WHERE this = "..."
AND that = "..."
into the suitable SELECT statement.
Note: I leave out any fields that were not specified in the form; I make sure to escape the strings.
I'm walking through $_GET[] instead of JSON, so it is quite easy.
INDEXing
If you have a column for each possible field, then it is a matter of providing indexes only for the columns most likely to be searched on. (There are practical and even hard-coded limits on indexes.)
If you have stored the attributes in an EAV table structure, you have my condolences. Search the [entity-attribute-value] tag for many other poor souls who wandered into that swamp.
If you store the attributes in JSON, well that is likely to be an order of magnitude worse than EAV.
If you throw all the information into a FULLTEXT column and use MATCH, then you can get enough speed for "millions" of rows. But it comes with various caveats (word length, stoplist, endings, surprise matches, etc.).
If you would like to discuss further, then scale back your expectations and make a list of likely search keys. We can then discuss what technique might be best.
I have tables table_a and table_b in my database and they are mapped in Slick with TableQuery objects. I need to copy a restricted set of data from table_a to table_b.
Let the table query objects be tableQueryA and tableQueryB. The logic for filtering and copying data is complex, so I am thinking of turning the table query results into their Scala collection equivalents inside a for/yield and treating them as normal collections. Everything happens in one transaction. The code looks something like this:
for {
  collA <- tableQueryA.filter(.....something....).result
  collB <- tableQueryB.filter(.....somethingElse.....).result
  // ...... do something with collA and collB
} yield ...something
Is there any harm in doing it this way, i.e. handling them as Scala collections and processing them?
I am using Slick 3.2.
By doing two separate tableQueryX.filter().result calls, you'll be executing two separate queries against the database. You could replace them with one query that joins the two tables.
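Expressed in plain SQL terms (the Slick query would be the equivalent join; all table and column names here are hypothetical), the single round trip looks something like:

SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b ON b.a_id = a.id
WHERE a.some_filter = 'x'
  AND b.some_other_filter = 'y';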
It's hard to say which approach is better in terms of performance, as it depends on the number of filter or where clauses and on what kind of indexes the database can use to fulfill them. If you need top-notch performance, try both approaches and pick the one that is fastest.
If both of your queries yield a large amount of data, you need to consider memory usage for your application too, because all the data is loaded before the Scala collection API can be used.
I don't see any harm as long as the data volume is small, but it is better to filter the data at the DB level to avoid any potential out-of-memory errors.
In MySQL 5.7 a new data type for storing JSON data in MySQL tables has been added. It will obviously be a great change in MySQL. They listed some benefits:
Document Validation - Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.
Efficient Access - More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.
Performance - Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with “functional indexes” on virtual columns.
Convenience - The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column): SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;
Wow! They include some great features. Now it is easier to manipulate data, and it is possible to store more complex data in a column.
So MySQL is now flavored with NoSQL.
Now I can imagine a query for JSON data something like
SELECT * FROM t1
WHERE JSON_EXTRACT(data, "$.series") IN
(
  SELECT JSON_EXTRACT(data, "$.inverted")
  FROM t1                              -- data documents look like {"series": 3, "inverted": 8}
  WHERE JSON_EXTRACT(data, "$.inverted") < 4
);
So can I store a huge number of small relations in a few JSON columns? Is that good? Does it break normalization? If this is possible, then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature, and about the pros and cons of the MySQL JSON data type.
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.
The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.
By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.
MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.
ALTER TABLE t1
  ADD COLUMN series INT AS (JSON_EXTRACT(data, '$.series')),
  ADD INDEX (series);
Then if you query the virtual column, it can use the index and avoid the table-scan.
SELECT * FROM t1
WHERE series IN ...
This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.
But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:
SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>
I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.
When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.
I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.
There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.
MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people from migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.
You should choose to use JSON when JSON makes your queries more efficient.
Don't choose a technology just because it's new, or for the sake of fashion.
Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions. UPDATE: this reportedly works, finally, in MySQL 5.7.33.
The following, from "MySQL 5.7 brings sexy back with JSON", sounds good to me:
Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:
Data validation. JSON documents will be automatically validated and invalid documents will produce an error.
Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to lookup subobjects or nested values by key or index, allowing added flexibility and performance.
...
Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.
Note the language about document validation, as it is an important factor. I guess a battery of tests needs to be performed to compare the two approaches, those being:
MySQL with JSON datatypes
MySQL without
From what I am seeing, the net has only shallow slideshares on the topic of MySQL / JSON / performance as of now.
Perhaps your post can be a hub for it. Or perhaps performance is an afterthought, not sure, and you are just excited not to create a bunch of tables.
From my experience, the JSON implementation, at least in MySQL 5.7, is not very useful due to its poor performance.
Well, it is not so bad for reading data and validation. However, JSON modification is 10-20 times slower with MySQL than with Python or PHP.
Let's imagine a very simple JSON document:
{ "name": "value" }
Let's suppose we have to convert it to something like this:
{ "name": "value", "newName": "value" }
You can create a simple script with Python or PHP that selects all rows and updates them one by one. You are not forced to wrap it in one huge transaction, so other applications can use the table in parallel. Of course, you can also make one huge transaction if you want, so you get a guarantee that MySQL performs "all or nothing", but other applications will most probably not be able to use the database while the transaction executes.
I have a 40-million-row table, and a Python script updates it in 3-4 hours.
Now we have MySQL JSON, so we don't need Python or PHP anymore; we can do something like this:
UPDATE `JsonTable` SET `JsonColumn` = JSON_SET(`JsonColumn`, '$.newName', JSON_EXTRACT(`JsonColumn`, '$.name'));
It looks simple and excellent. However, it is 10-20 times slower than the Python version, and it runs as a single transaction, so other applications cannot modify the table data in parallel.
So, if we just want to duplicate a JSON key in a 40-million-row table, we cannot use the table at all for 30-40 hours. That makes no sense.
As for reading data, from my experience direct access to a JSON field via JSON_EXTRACT in WHERE is also extremely slow (much slower than TEXT with LIKE on a non-indexed column). Virtual generated columns perform much faster; however, if we know our data structure beforehand, we don't need JSON, we can use traditional columns instead. When we use JSON where it is really useful, i.e. when the data structure is unknown or changes often (for example, custom plugin settings), creating virtual columns on a regular basis for every possible new field doesn't look like a good idea.
Python and PHP handle JSON validation like a charm, so it is questionable whether we need JSON validation on the MySQL side at all. Why not also validate XML and Microsoft Office documents, or check spelling? ;)
I ran into this problem recently, and I can sum up the following experiences:
1. There isn't one approach that solves every problem.
2. You should use JSON properly.
One case:
I have a table named CustomField, and it must have two columns: name, fields.
name is a localized string; its content looks like:
{
  "en": "this is English name",
  "zh": "this is Chinese name"
  ...(other languages)
}
And fields should look like this:
[
  {
    "field1": "value",
    "field2": "value"
    ...
  },
  {
    "field1": "value",
    "field2": "value"
    ...
  }
  ...
]
As you can see, both the name and the fields can be saved as JSON, and it works!
However, if I use the name to search this table very frequently, what should I do? Use JSON_CONTAINS, JSON_EXTRACT...? Obviously, it's not a good idea to keep it as JSON anymore; we should move it to an independent table: CustomFieldName.
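A minimal sketch of what that independent table might look like (column names and types are just assumptions): one row per custom field and language, so name can be searched with a plain index instead of JSON functions.

CREATE TABLE CustomFieldName (
  custom_field_id INT          NOT NULL,
  lang            CHAR(5)      NOT NULL,
  name            VARCHAR(255) NOT NULL,  -- the localized name, now searchable via idx_name
  PRIMARY KEY (custom_field_id, lang),
  INDEX idx_name (name)
);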
From the above case, I think you should keep these ideas in mind:
Why does MySQL support JSON?
Why do you want to use JSON? Does your business logic really need it? Or is there something else?
Never be lazy
Thanks
I strongly disagree with some of the things said in other answers (which, to be fair, were written a few years ago).
We have very carefully started to adopt JSON fields with a healthy skepticism. Over time we've been using them more and more.
This generally describes the situation we are in:
Like 99% of applications out there, we are not doing things at a massive scale. We work with many different applications and databases, the majority of these are capable of running on modest hardware.
We have processes and know-how in place to make changes if performance does become a problem.
We have a general idea of which tables are going to be large and think carefully about how we optimize queries for them.
We also know in which cases this is not really needed.
We're pretty good at data validation and static typing at the application layer.
Lastly,
When we use JSON for storing complex data, that data is never referenced directly by other tables. We also tend never to need it in WHERE clauses on hot paths.
So with all this in mind, using a little JSON field instead of 1 or more tables vastly reduces the complexity of queries and data model. Removing this complexity makes it easier to write certain queries, makes our code simpler and just generally saves time.
Complexity and performance is something that needs to be carefully balanced. JSON fields should not be blindly applied, but for the cases where this works it's fantastic.
'JSON fields don't perform well' is a valid reason to not use JSON fields, if you are at a place where that performance difference matters.
One specific example is that we have a table where we store settings for video transcoding. The settings table has 1 'profile' per row, and the settings themselves have a maximum nesting level of 4 (arrays and objects).
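To give a made-up sense of the shape (this is not our real schema, just an illustration of the nesting):

{
  "profile": "web-1080p",
  "video": {
    "codec": "h264",
    "rate_control": { "mode": "crf", "crf": 21 },
    "filters": [ { "name": "scale", "args": { "width": 1920, "height": 1080 } } ]
  },
  "audio": { "codec": "aac", "bitrate_kbps": 160 }
}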
Despite this being a large database overall, there are only a few hundred of these records in the database. Splitting this into 5 tables would yield no benefit and lots of pain.
This is an extreme example, but we have plenty of others (with more rows) where the decision to use JSON fields is a few years in the past, and hasn't yet caused an issue.
Last point: it is now possible to directly index on JSON fields.
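For example (a sketch, assuming MySQL 8.0.13 or later and a hypothetical settings table with a JSON column named config), a functional index can be created directly over an expression on the JSON column:

-- The CAST gives the extracted value a concrete type the index can use.
CREATE INDEX idx_video_crf
  ON settings ((CAST(config->>'$.video.rate_control.crf' AS UNSIGNED)));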
I am developing a pretty big enterprise-level data analysis application based on Flex 4. I usually need to filter datagrids based on the user's selection, which requires running a query on my database. I am wondering if there is any way to filter grid data without a SQL query? That would take very little time, whereas the query is causing a 2-3 minute delay now.
If you are using ArrayCollection (or another implementation of ICollectionView), take a look at the ICollectionView.filterFunction property. You can set it to what you need after user interaction and call ICollectionView.refresh() - all associated grids should then automatically show the filtered data.
There are many ways to do this in ActionScript. However, since you use Flex, let's rely on the framework. The feature you are looking for is the filterFunction (see the docs):
Given a data object such as {name:"Jo", type:"employee"}, you can filter employees with:
myArrayCollection.filterFunction = function(data:Object):Boolean {
    return data.type == "employee";
};
myArrayCollection.refresh();
Your data grid should then be updated accordingly.
Of course, depending on the number of items being present in your list, this might run in a blink of an eye or be horribly slow =)
I have an application that allows users to filter applicants based on a very large set of criteria. The criteria are each represented by boolean columns spanning multiple tables in the database. Instead of using Active Record models I thought it was best to use pure SQL and put the bulk of the work in the database. In order to do this I have to construct a rather complex SQL query based on the criteria that the users selected and then run it through AR on the db. Is there a better way to do this? I want to maximize performance while also having maintainable, non-brittle code. Any help would be greatly appreciated.
As #hazzit said, it is difficult to answer without more details, but here's my two cents on this. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search / filtering features, I often find using raw SQL overkill and not quite maintainable.
The key question here is: can you break down your problem into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, ->{ where something: true }
scope :another_scope, ->( option ){ where an_option: option }
scope :using_arel, ->{ joins(:assoc).where Assoc.arel_table[:some_field].not_eq "foo" }
# cue a bunch of scopes
def self.search( options = {} )
  relation = all
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope( options[:another_option] ) unless options[:flag]
  # add logic as you need it
  relation
end
The beauty of this solution is that you declare a clean interface into which you can directly pour all the params from your checkboxes and fields, and that returns a relation. Breaking the query into multiple, reusable scopes helps keep the thing readable and maintainable; using a search class method ties it all together and allows thorough documentation... And all in all, using Arel helps secure the app against injections.
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
If this method is not suited to your needs, there's another option: use a full-fledged search / filtering solution like Sunspot. This uses another store, separate from your db, that indexes defined parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially when it comes to high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system will have to retrieve all of the unfiltered data from the database, which will cause tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never actually be retrieved from disk, handed over to RoR, or filtered there. Indexes exist precisely to avoid expensive operations and speed things up. (Yes, they also help maintain data integrity.)
To make this work, however, you may need to help the database a bit to do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
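As a rough sketch (all table and column names hypothetical), such indexes simply mirror the criteria you filter on most often:

-- A composite index covering a common combination of boolean criteria.
CREATE INDEX index_applicants_on_common_criteria
  ON applicants (has_degree, is_employed, willing_to_relocate);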
That said, there actually are a few types of queries that a given database is not good at. Those are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among those.