Query to get the last X minutes of data with Couchbase

I want to fetch the last X minutes of records from Couchbase using N1QL.
I know how to do this in SQL, but with a NoSQL database like Couchbase I have no clue. Please suggest how I can do this.

I want to fetch the last X minutes of records from Couchbase using N1QL [...]
Do you mean all records added in the past X minutes? If so, according to this post, Couchbase does not timestamp documents automatically:
https://dba.stackexchange.com/questions/77108/couchbase-get-document-creation-date
It's best to add the timestamp yourself, and then it's just a simple task of reading all records with a timestamp in the last X minutes using an N1QL query. The N1QL date functions are documented here: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/datefun.html
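For example, if each document carries an ISO 8601 createdAt field (the bucket name and field name below are assumptions), a query along these lines would return the records from the last 10 minutes:

SELECT META(d).id, d.*
FROM `mybucket` AS d
WHERE d.createdAt >= DATE_ADD_STR(NOW_STR(), -10, 'minute');

Note that comparing the strings directly like this only works if every document stores the timestamp in the same format and timezone; otherwise convert both sides with STR_TO_MILLIS() first.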

Related

Analytics pattern (get views per day)

I'm having trouble with the schema pattern for my project.
I would like to count the views per day for each page and calculate a rank.
I'm working with Node.js and MySQL (stores the pages) / MongoDB (stores every view with visitor info).
Is it more efficient and performant to update my MySQL table every day with a cron job at midnight (with a views-per-day row reset to 0) and then query that table, or to run an aggregation in MongoDB to get the views for the day (I have the timestamp)?
Thank you!
IMHO, MongoDB is not suitable for your case.
It would be better, faster, and cheaper to use a cache service where you can increment a counter per page.
Take a look at Redis; you can implement this kind of behaviour with it.
So the main goal is to remove MongoDB if it is only being used as a counter.
Regarding your question about performance: reading and incrementing values stored in Redis is extremely fast, because the complexity is O(1).

MySQL and LabVIEW

I have a table with 27 columns and 300,000 rows of data, out of which 8 columns contain 0, 1, or NULL. Using LabVIEW, I get the total count for each of these columns with the following query:
select
d_1_result,
d_2_value_1_result,
de_2_value_2_result,
d_3_result,
d_4_value_1_result,
d_4_value_2_result,
d_5_result
from Table_name_vp
where ( insp_time between
"15-02-02 06:00:00" and "15-02-02 23:59:59" or
inspection_time between "15-02-03 00:00:00" and "15-02-03 06:00:00")
and partname = "AbvQuene";
This query runs for the number of days the user inputs, for example 120 days.
I found that the total time taken by the query is 8 seconds, which is not good.
I want to reduce the time to 8 milliseconds.
I have also changed the engine to MyISAM.
Any suggestions for reducing the time consumed by the query? (The LabVIEW processing is not what takes the time.)
It depends on the data, and how many rows out of the 300,000 are actually selected by your WHERE clause. Obviously if all 300,000 are included, the whole table will need to be read. If it's a smaller number of rows, an index on insp_time or inspection_time (is this just a typo, are these actually the same field?) and/or partname might help. The exact index will depend on your data.
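If the filter is selective, a composite index along these lines might help (a sketch; it assumes insp_time and inspection_time are really the same column, here called insp_time):

CREATE INDEX idx_part_time ON Table_name_vp (partname, insp_time);

Running the query through EXPLAIN before and after adding the index will show whether MySQL actually uses it.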
Update 2:
I can't see any reason why you wouldn't be able to load your whole DB into memory, because it should be less than 60 MB. Do you agree?
Please post your answers to the following questions (you can edit a question after you have asked it - that's easier than commenting).
Next steps:
I should have mentioned this before: before you run a query in LabVIEW, I would always test it first using your DB admin tool (e.g. MySQL Workbench). Please post whether that worked or not.
Post your LabVIEW code.
You can try running your query with fewer than 300K rows - say 50K - and see how much your memory increases. If there's some limitation on how many rows you can query at one time, then you can break your giant query into smaller ones pretty easily and just add up the result sets. I can post an example if needed.
Update:
It sounds like there's something wrong with your schema.
For example, if you had 27 columns of doubles and datetimes (both are 8 bytes each), your total DB size would only be about 60 MB (300,000 * 27 * 8 / 1,048,576).
Please post your schema for further help (you can use SHOW CREATE TABLE tablename).
8 milliseconds is an extremely low target - I assume that's being driven by some kind of hardware timing requirement? If not, please explain that requirement, as a typical user-facing requirement is around 1 second.
To get the response time that low you will need to do the following:
Query the DB at the start of your app and load all 300,000 rows into memory (e.g. a LabVIEW array)
Update the array with new values (e.g. array append)
Run the "query" against he array (e.g. using a for loop with a case select)
On a separate thread (i.e. LabVIEW "loop") insert the new records into to the database or do it write before the app closes
This approach assumes that only one instance of the app is running at a time because synchronizing database changes across multiple instances will be very hard with that timing requirement.

Is it faster to do string/date comparisons, or to insert-replace in the database?

Problem
I have a database (SQLite or MySQL) full of rainfall values for every day in the last few years. It is very simple:
date | rain
------------------------
"2014-10-20" 3.3
Each day, I pull in a CSV file from my local meteorology bureau. They only publish CSV files with the entire year's data, no daily/weekly/etc files, so by the end of the year there are 365 rows in the file. Within each row the date is split up into the Year, Month, and Day fields.
So when it comes time to store the info in the database, I have two options.
Solution 1: Do date comparison
I would save the date at which I last ran the program, in either the database or a text file. I parse that date using Date.strptime and store it as last_run_time. Then I load the CSV file with CSV.read('raindata.csv').each do |row|, and for every row I parse the three date fields into a new Date object with rowdate = Date.strptime("#{row[2]}-#{row[3]}-#{row[4]}") and say if rowdate > last_run_time then insert the info into the database.
This way, I avoid making database calls to insert-or-replace values I already have. By the end of the year this spares me 364 database queries, but it means I do a lot of Date parsing and comparing.
Solution 2: Just let the database handle it
I would avoid all of that, and just say for each row in the CSV, insert or ignore into the database. The date field in the DB is unique so if I try to insert but already have the date, it just ignores the query. Pro: avoid making Date comparisons and parsing, con: as many as 364 unnecessary hits to the database.
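For reference, Solution 2's statement would look something like this (a sketch; the table name rainfall is an assumption based on the schema above):

-- SQLite (relies on a UNIQUE constraint or primary key on date):
INSERT OR IGNORE INTO rainfall (date, rain) VALUES ('2014-10-20', 3.3);

-- MySQL equivalent:
INSERT IGNORE INTO rainfall (date, rain) VALUES ('2014-10-20', 3.3);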
Question
Which of these two solutions is the smarter, more efficient, more resource-friendly one? Is it better to make unnecessary database queries and spare the CPU, or vice versa?
Database calls are the heaviest operations, so whichever solution makes fewer queries is the better approach.
Parsing and language functions have much, much lower cost, so process the input in the language and make fewer queries.
Hitting the database is probably 1,000 or 1,000,000 times more expensive than comparing dates. Having said that, it makes no difference, because making 364 hits to the database once a day is effectively zero load for any practical purpose.
If you need your update script to run as fast as possible, do the date comparisons. You take the risk that there will be some bugs and maybe some data will be missed at some point in the future.
If you have the extra few seconds, and you care most about data integrity and simplicity, update the whole thing daily.

Rails - how to search through a big data collection and display results at a few-second intervals?

I have a database table of hundreds of thousands of records that I fetch and then compute a geographic distance between. The problem is that this search takes around 15-20 seconds, so I am trying to speed it up.
I don't think I can do more with column indexes, as I am grabbing the whole table. Most of the time is spent computing the geographic distance (from longitude and latitude). I don't know if there's a way to speed up this computation.
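For reference, the per-row distance calculation described here usually looks something like this when pushed down into SQL (a sketch using the spherical law of cosines; the places table, lat/lng columns, and the @ref_lat/@ref_lng reference point are assumptions):

SELECT id,
       6371 * ACOS(
         COS(RADIANS(@ref_lat)) * COS(RADIANS(lat)) *
         COS(RADIANS(lng) - RADIANS(@ref_lng)) +
         SIN(RADIANS(@ref_lat)) * SIN(RADIANS(lat))
       ) AS distance_km
FROM places
ORDER BY distance_km
LIMIT 100;

Letting the database sort and limit like this avoids pulling every row into Ruby just to compute the distance there.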
Because this task is - in my mind - almost the same as searching for flight tickets, where you set a "from city" point and a "destination city" point and the search engine gradually displays found results to the user at intervals, like:
it displays some results
2 seconds later it adds more computed results
another 2 seconds later, yet more results
and so on
I think this way of displaying results would be best for my case - but how does such an engine work? How can one build a search that displays more and more new results every 2 seconds or so?
As the application is written in Ruby on Rails, candidates for this kind of search would be:
AJAX
delayed_job
Possibly something else?
Or am I thinking about this problem in the wrong way - is there a better way to solve it?
Thank you.

Reliably select from a database table at fixed time intervals

I have a fairly 'active' CDR table, and I want to select records from it every 5 minutes or so, covering those last 5 minutes. The problem is that its IDs are SHA hashes generated from a few of the other columns, so all I have to lean on is a timestamp field, which I filter by date to select the time window of records I want.
The next problem is that I obviously cannot guarantee my script will run precisely on the second every time, or that the server's wall clock will be correct (which doesn't matter), and most importantly there will almost certainly be more than one record per second - say 3 rows at '2013-08-08 14:57:05' - and before that second expires, one more might be inserted.
By the time I query for '2013-08-08 14:57:05' and get records BETWEEN '2013-08-08 14:57:05' AND '2013-08-08 15:02:05', there may be more records for '2013-08-08 14:57:05' which I would have missed.
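Spelled out, the query described above is essentially the following (a sketch; the cdr table and call_time column names are assumptions):

SELECT *
FROM cdr
WHERE call_time BETWEEN '2013-08-08 14:57:05' AND '2013-08-08 15:02:05';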
Essentially:
imprecise wall clock time
no sequential IDs
multiple records per second
query execution time
unreliable frequency of running the query
are all preventing me from getting a valid set of rows in a specified rolling time window. Any suggestions for how I can get around these?
If you are using the same clock then I see no reason why things would go wrong. A solution you might want to consider is a datetime table: every time the script runs, you update the start and stop times based on the server time, so as records are added they are guaranteed to fall within that timeframe.
I mean, you COULD do it by hardcoding, but my way would store an explicit start and stop point in the database to use. See the sketch below.
I would use cron to handle the intervals and timing somewhat - not to take the time from it, but just so you don't lock up the database by checking all the time.
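A minimal sketch of that checkpoint idea in MySQL (the table and column names are assumptions; the half-open >= ... < range means a row on the boundary second is never counted twice across consecutive runs):

-- one-row table holding the last processed boundary
-- CREATE TABLE cdr_checkpoint (last_processed DATETIME NOT NULL);

SET @stop = NOW();                                        -- new boundary from the server clock
SELECT last_processed INTO @start FROM cdr_checkpoint;    -- previous boundary

SELECT * FROM cdr
WHERE call_time >= @start AND call_time < @stop;          -- this run's window

UPDATE cdr_checkpoint SET last_processed = @stop;         -- remember the boundary for the next run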
I may not have got all the details, but to answer your question title, "Reliably select from a database table at fixed time intervals"...
I don't think you can even hope for a query to run at a "second-precise" time.
One key problem with that approach is that you will have to deal with concurrent access and locks. You might be able to send the query at a fixed time, but it might then wait on the DB server for several seconds (or execute against a fairly outdated snapshot of the DB), especially in your case, since the table is apparently "busy".
As a suggestion, if I were you, I would spend some time looking into message queueing systems (like http://www.rabbitmq.com/, just to cite one - not implying it is necessarily "your" solution). Those kinds of tools are probably better suited to your needs.