I know that Couchbase map/reduce can be used to group by using group_level, but I want to group the data itself, not just get a count. Here is the scenario.
Here is the data in Couchbase:
{"key":"key1","value":"value1"}
{"key":"key2","value":"value2"}
{"key":"key1","value":"value3"}
What I want is:
{key:"key1","value":[value1,value3]}
{key:"key2","value":value2}
Is there a way I can achieve this through the map/reduce functionality in Couchbase?
I am not sure I understand exactly what you want to do. The compound key documentation might help you: http://docs.couchbase.com/developer/dev-guide-3.0/compound-keys.html
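If a view-based reduce turns out to be awkward here, one hedged alternative (assuming a Couchbase version with N1QL support and a bucket named `data`; both the version and the bucket name are assumptions) is a GROUP BY with ARRAY_AGG, which collects the values per key:
-- Sketch only: bucket name `data` is an assumption; requires N1QL.
-- `key` and `value` are backquoted because both are reserved words in N1QL.
SELECT d.`key`, ARRAY_AGG(d.`value`) AS `value`
FROM `data` AS d
GROUP BY d.`key`;
Note that this always returns an array, even when a key has only one value, so key2 would come back as ["value2"] rather than a scalar.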
Let's say I have the following "expenses" MySQL table:
id | amount | vendor | tag
---+--------+--------+-----
1  | 100    | google | foo
2  | 450    | GitHub | bar
3  | 22     | GitLab | fizz
4  | 75     | AWS    | buzz
I'm building an API that should return expenses based on partial "vendor" or "tag" filters, so vendor="Git" should return records 2&3, and tag="zz" should return records 3&4.
I was thinking of utilizing Elasticsearch's capabilities, but I'm not sure of the correct way.
Most articles I read suggest replicating the table records (using a Logstash pipeline or other methods) to an Elasticsearch index.
So my API wouldn't even query the DB, and would return an array of documents directly from ES?
Is this considered good practice, replicating the whole table to Elasticsearch?
What about table relations? What if I want to filter by a nested table relation?
So my API wouldn't even query the DB, and would return an array of documents directly from ES?
Yes. As you are querying Elasticsearch, you will get results only from Elasticsearch. Another way is to fetch only the ids from Elasticsearch and use them to retrieve the documents from MySQL, but this might impact response time.
Is this considered good practice, replicating the whole table to Elasticsearch? What about table relations? What if I want to filter by a nested table relation?
It is not about good or bad practice; it is about what type of functionality and use case you want to implement, and based on that, the technology stack can be chosen and data can be duplicated. There are lots of companies using Elasticsearch as a secondary data source, holding duplicated data simply because their use case is a better fit for Elasticsearch or another NoSQL DB.
Elasticsearch is a NoSQL DB and it does not maintain any relationships between data. Hence, you need to denormalize your data before indexing it into Elasticsearch. You can read this article for more about denormalization and why it is required.
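As a purely illustrative sketch (the vendors table, its columns, and the join condition are hypothetical, not part of the question's schema), denormalizing for Elasticsearch usually means flattening related tables into one row per document before indexing:
-- Hypothetical flattening query: each resulting row becomes one self-contained
-- Elasticsearch document, so no joins are needed at search time.
SELECT e.id,
       e.amount,
       e.vendor,
       e.tag,
       v.country AS vendor_country
FROM expenses e
LEFT JOIN vendors v ON v.name = e.vendor;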
Elasticsearch provides the nested and join data types for parent-child relationships, but both have limitations and a performance impact.
Below is what they have mentioned for the join field type:
The join field shouldn't be used like joins in a relational database. In Elasticsearch the key to good performance is to de-normalize your data into documents. Each join field, has_child or has_parent query adds a significant tax to your query performance. It can also trigger global ordinals to be built.
Below is what they have mentioned for the nested field type:
When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Instead, consider using the flattened data type, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the flattened data type for this use case is a better option.
Most articles I read suggest replicating the table records (using a Logstash pipeline or other methods) to an Elasticsearch index.
Yes. You can use Logstash, or any language client (Java, Python, etc.), to sync data from the DB to Elasticsearch. You can check this SO answer for more information on this.
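As a rough sketch of how such a sync is often driven, a Logstash JDBC input typically runs an incremental statement like the one below (the updated_at column is an assumption about your schema; :sql_last_value is the placeholder Logstash substitutes with the last tracked value):
-- Assumed incremental sync statement for a Logstash jdbc input.
-- Only rows changed since the last run are re-indexed.
SELECT id, amount, vendor, tag, updated_at
FROM expenses
WHERE updated_at > :sql_last_value
ORDER BY updated_at ASC;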
Your Search Requirements
If you go ahead with Elasticsearch, then you can use the n-gram tokenizer or a regexp query to achieve your search requirements.
Maybe you can try TiDB: https://medium.com/@shenli3514/simplify-relational-database-elasticsearch-architecture-with-tidb-c19c330b7f30
If you want to scale your MySQL and have fast filtering and aggregating, TiDB could simplify the architecture and reduce development work.
I am facing a problem here. I am querying Couchbase using N1QL, and my query is:
SELECT *
FROM `user`
USE INDEX (ord_ts_new_idx USING GSI)
WHERE META(`user`).id LIKE 'ord::27::%'
ORDER BY ts DESC
OFFSET 0 LIMIT 5;
But the value that I am getting is not updated. If I make the same request after some time, it gives me the desired output.
The query that I used to create the index is:
CREATE INDEX ord_ts_new_idx ON `user-account`(`ts`) USING GSI;
where ts is the timestamp.
So could you please tell me if there is a way in which, I can get the updated data always?
Thanks in advance. Any type of help is appreciated.
You do not mention which client SDK you are using. N1QL provides a scan_consistency parameter (e.g. request_plus for read-your-own-writes), so it's a matter of making sure your client SDK sets it. Go to the SDK documentation and find your language of choice; for example, here is the Java SDK section, look under "read your own writes."
Just be forewarned that by doing this for everything, you could very well take a performance penalty, as the index will need to be refreshed before serving your results. So please make sure you test this.
I'm looking for recommended indexes for the Spring Batch tables, especially when using the API:
jobExplorer.findRunningJobExecutions(jobName);
Anyone?
Assuming you are using the JdbcJobExecutionDao implementation for the SimpleJobExplorer, the only columns involved in the query's where clause are as follows:
JOB_EXECUTION.JOB_INSTANCE_ID
JOB_EXECUTION.END_TIME
JOB_INSTANCE.JOB_INSTANCE_ID
JOB_INSTANCE.JOB_NAME
And the order by is using: JOB_EXECUTION.JOB_EXECUTION_ID
You can take a look at the source of JdbcJobExecutionDao for the actual queries. As for your question about what indexes to create, JOB_INSTANCE_ID seems to be a good candidate in this case.
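As a rough sketch (assuming the default BATCH_ table prefix, which is an assumption about your configuration; measure before committing to these), indexes along these lines would cover that query:
-- Sketch only: assumes the default BATCH_ table prefix.
-- Covers the filter/join on job instance plus the END_TIME IS NULL check.
CREATE INDEX idx_batch_job_exec_inst_end ON BATCH_JOB_EXECUTION (JOB_INSTANCE_ID, END_TIME);
-- Covers the lookup of job instances by name.
CREATE INDEX idx_batch_job_inst_name ON BATCH_JOB_INSTANCE (JOB_NAME);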
I have a question about deciding whether to use a MySQL database or a MongoDB database. The problem is that my decision depends heavily on these things:
I want to select records between two dates (period)
However, is this possible?
My application won't do any complex queries, just basic CRUD. It has Facebook integration, so in the current setup I sometimes have to JOIN the users table.
Either DB will allow you to filter between dates, and I wouldn't use that requirement to make the decision (see the sketch after the list below). Some questions you should answer:
Do you need to store your data in a relational system, like MySQL? Relational databases are better at cross-entity joins.
Will your data be very complicated but queried simply (e.g. by an ID)? If so, MongoDB may be a better fit, as storing and retrieving complex documents is a cinch.
Who will be querying the data, and from where? MySQL uses SQL for querying, which is a much more widely known skill than MongoDB's JSON query syntax.
These are just three questions to ask. In order to make a recommendation, we'd need to know more about your application.
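For the date-range requirement itself, the MySQL side is an ordinary range filter; a minimal sketch (the records table and created_at column are placeholders, not your actual schema):
-- Placeholder table/column names; an index on created_at keeps this fast.
SELECT *
FROM records
WHERE created_at BETWEEN '2015-01-01' AND '2015-01-31'
ORDER BY created_at;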
MySQL (SQL) or MongoDB (NoSQL): both can work for your needs, but the choice between an RDBMS and NoSQL comes down to the requirements of your application.
If your application cares about speed, no relations between the data are necessary, and your data schema changes very frequently, you can choose MongoDB; it is faster since no joins are needed and every record is stored as a document.
Otherwise, go for MySQL.
If you are looking for range queries in MongoDB - yes, Mongo supports those. For date-based range queries, have a look at this: http://cookbook.mongodb.org/patterns/date_range/
Knowing full well that my InnoDB tables don't support FULLTEXT searches, I'm wondering what my alternatives are for searching text in tables? Is the performance really that bad when using LIKE?
I see a lot of suggestions saying to make a copy of the InnoDB table in question as a MyISAM table, then run queries against that table and match keys between the two, and I just don't think that's a pretty solution.
I'm not opposed to using some third-party solution, though I'm not a huge fan of that. I'd like to explore more of what MySQL can do on its own.
Thoughts?
If you want to do it right, you should probably go with Lucene or Sphinx from the very start.
It will allow you to keep your table structure.
You'll get a huge performance boost (think ahead).
You'll get access to a lot of fancy search functions.
Both Lucene and Sphinx scale amazingly well (Lucene powers Wikipedia and Digg; Sphinx powers Slashdot).
A LIKE can only use an index when there is no leading %, so it will be a huge performance hit to do LIKE '%foo%' on a large table. If I were you, I'd look into using Sphinx. It has the ability to build its index by slurping data out of MySQL using a query that you provide. It's pretty straightforward and was designed to solve your exact problem.
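To make the index behaviour concrete, a small sketch (the articles table and title column are hypothetical):
-- Hypothetical table with an index on the searched column.
CREATE INDEX idx_articles_title ON articles (title);

-- Can use the index: the pattern is a prefix, no leading wildcard.
SELECT * FROM articles WHERE title LIKE 'foo%';

-- Cannot use the index: the leading % forces MySQL to scan every row.
SELECT * FROM articles WHERE title LIKE '%foo%';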
There's also Solr, which is an HTTP wrapper around Lucene, but I find Sphinx to be a little more straightforward.
As others have, I would urge the use of Lucene, Sphinx, or Solr.
However, if these are out and your requirements are simple, I've used the steps here to build simple search capability on a number of projects in the past.
That link is for Symfony/PHP, but you can apply the concepts to any language and application structure, assuming there is an implementation of a stemming algorithm available. However, if you don't use a data-access pattern where you can hook in to update the index when a record is updated, it's not as easily doable.
Also, a couple of downsides: if you want a single index table but need to index multiple tables, you either have to emulate referential integrity in your DAL or add an FK column for each different table you want to index (a rough sketch follows below). I'm not sure what you're trying to do, so that may rule it out entirely.
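A minimal sketch of what such a hand-rolled keyword index table might look like (all names here are illustrative, not taken from the linked article; each indexed source table gets its own nullable FK column, as described above):
-- Illustrative only: one row per stemmed keyword per source record.
CREATE TABLE search_index (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    word       VARCHAR(64) NOT NULL,  -- stemmed keyword
    article_id INT NULL,              -- FK column for an articles table
    comment_id INT NULL,              -- FK column for a comments table
    KEY idx_word (word)
);

-- Prefix lookup against the stemmed keywords for one source table.
SELECT DISTINCT article_id
FROM search_index
WHERE word LIKE 'data%' AND article_id IS NOT NULL;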