Running a Quartz job in Grails? - MySQL

I'm very new to Grails. I have a table like this in MySQL:
+----+---------+----------------+----------------+-------------+--------------------+--------------+--------+---------------------+
| id | version | card_exp_month | card_exp_year  | card_number | card_security_code | name_on_card | txn_id | date_created        |
+----+---------+----------------+----------------+-------------+--------------------+--------------+--------+---------------------+
|  9 |       0 | ASdsadsd       | Asdsadsadasdas | Asdsa       |                    | batman       | asd    | 2012-08-13 19:38:22 |
+----+---------+----------------+----------------+-------------+--------------------+--------------+--------+---------------------+
I wish to run a Quartz job against this table that compares the date_created timestamp with the present time, so that any row whose date_created is more than 30 minutes old gets deleted.
How can I do this?

You could define a Job implementing your logic (in the execute() method, delete every row whose date_created is more than 30 minutes before now) and then trigger this job on a regular basis.
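In plain SQL, the cleanup such a job would run could look like the sketch below; the table name card is an assumption, since the question doesn't name the domain class:
-- delete rows whose date_created is more than 30 minutes old
-- (table name assumed; adjust to your GORM mapping)
delete from card
where date_created < now() - interval 30 minute;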
You can read the documentation at http://quartz-scheduler.org/documentation/quartz-2.1.x/cookbook or have a look at the examples: http://svn.terracotta.org/svn/quartz/branches/quartz-2.2.x/examples/src/main/java/org/quartz/examples/example1/

Check this example for Grails Quartz:
http://www.juancarlosfernandez.net/2012/02/como-crear-un-proceso-quartz-en-grails.html

Related

MySQL: selecting dates (from timestamp) for which condition (related to other fields in the row) is fulfilled

My SQL knowledge is rather weak and I come from procedural programming, so bear with me. I have a database that contains data from a weather station; readings are collected each minute, and the (important part of the) table is:
MariaDB [weather]> describe readings;
+--------------+-----------+------+-----+-------------------+-------+
| Field        | Type      | Null | Key | Default           | Extra |
+--------------+-----------+------+-----+-------------------+-------+
| time         | timestamp | NO   | PRI | CURRENT_TIMESTAMP |       |
| inside_temp  | float     | YES  |     | NULL              |       |
| outside_temp | float     | YES  |     | NULL              |       |
+--------------+-----------+------+-----+-------------------+-------+
I want to find all days on which outside_temp never went below one value and never went above another.
I can code it externally using MySQL for queries like
select min(outside_temp), max(outside_temp) from readings where date(time)='2022-01-27';
and iterating over all days in the database to check the temperature values for each day separately, but I wonder if it is possible to do the selection with a single MySQL query (I suppose it is, just beyond my imagination).
Something like
select date(time), min(outside_temp), max(outside_temp)
from readings
group by date(time);
would give you the daily minimum and maximum in one pass.
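To keep only the days that stayed within the bounds, you can filter the groups with HAVING; the bounds below (-5 and 25) are placeholder values:
-- days whose outside_temp stayed within [-5, 25]
select date(time) as day, min(outside_temp), max(outside_temp)
from readings
group by date(time)
having min(outside_temp) >= -5 and max(outside_temp) <= 25;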

Is implementing enrichment using Spark with MySQL a bad idea?

I am trying to build one giant schema that makes it easier for data users to query. To achieve that, streaming events have to be joined with User Metadata on USER_ID and ID. In data engineering this operation is called "data enrichment", right? The tables below are an example.
# `Event` (Stream)
+---------+--------------+---------------------+
| USER_ID | EVENT        | TIMESTAMP           |
+---------+--------------+---------------------+
| 1 | page_view | 2020-04-10T12:00:11 |
| 2 | button_click | 2020-04-10T12:01:23 |
| 3 | page_view | 2020-04-10T12:01:44 |
+---------+--------------+---------------------+
# `User Metadata` (Static)
+----+-------+--------+
| ID | NAME | GENDER |
+----+-------+--------+
| 1 | Matt | MALE |
| 2 | John | MALE |
| 3 | Alice | FEMALE |
+----+-------+--------+
==> # Result
+---------+--------------+---------------------+-------+--------+
| USER_ID | EVENT        | TIMESTAMP           | NAME  | GENDER |
+---------+--------------+---------------------+-------+--------+
| 1 | page_view | 2020-04-10T12:00:11 | Matt | MALE |
| 2 | button_click | 2020-04-10T12:01:23 | John | MALE |
| 3 | page_view | 2020-04-10T12:01:44 | Alice | FEMALE |
+---------+--------------+---------------------+-------+--------+
I was developing this using Spark, with the User Metadata stored in MySQL, and then I realized it would waste Spark's parallelism if the Spark code joins against MySQL tables directly, right?
I guess the bottleneck would be MySQL if traffic increases.
Should I store those tables in a key-value store and update it periodically?
Can you give me some ideas to tackle this problem? How do you usually handle this type of operation?
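For reference, the enrichment itself is just this join, written as plain SQL (the table names event and user_metadata are assumptions from the example; Spark SQL accepts the same shape):
-- attach each user's metadata to their streaming events
select e.USER_ID, e.EVENT, e.TIMESTAMP, u.NAME, u.GENDER
from event e
join user_metadata u on u.ID = e.USER_ID;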
Solution 1:
As you suggested, you can keep a local cache copy of the table as key-value pairs and update the cache at a regular interval.
Solution 2:
You can use a MySQL-to-Kafka connector such as the one below:
https://debezium.io/documentation/reference/1.1/connectors/mysql.html
For every DML or table-alter operation on your User Metadata table, a corresponding event is fired to a Kafka topic (e.g. db_events). You can run a parallel thread in your Spark streaming job that polls db_events and updates your local key-value cache.
This would make your application near-real-time in the true sense.
One overhead I can see is the need to run a Kafka Connect service with the MySQL connector (i.e. Debezium) as a plugin.

Selecting row data and a scalar in SQL

I have a job table, where each job has some metrics like cost, time taken, etc. I'd like to select information for a set of jobs, like the requestor and job action, and in addition to that row data, select some high-level metrics (min cost, max cost, min time taken, etc.).
The data changes frequently, so I'd like to get this information in a single select. Is it possible to do this? I'm not sure if this is conceptually possible because the DB would have to return row-level data along with aggregate data.
Right now I can get all the details and calculate the min/max, something like this:
select requestor, action, cost, time_taken from job;
But then I have to write code to find the min/max, and this query has to download all the cost/time data when I'm really only interested in the min/max. I really want to do something like
select (min(cost), max(cost), min(time_taken), max(time_taken)), (requestor, action) from job;
And get the aggregate data first, and then the row-level data. Is this possible? (On a real server this runs on MySQL, but for dev I use SQLite locally, so it'd be nice if it worked there too; not required.)
The table looks something like this:
+----+-----------+--------+------+------------+
| id | requestor | action | cost | time_taken |
+----+-----------+--------+------+------------+
|  1 |     31233 | sync   |    8 |      423.3 |
|  2 |     11229 | read   |    1 |        1.3 |
|  3 |      1434 | edit   |    5 |      152.8 |
|  4 |    101781 | sync   |   12 |      712.1 |
+----+-----------+--------+------+------------+
I'd like to get back the stats:
min/max cost: 1/12
min/max time_taken: 1.3/712.1
and all the requestors and actions:
+-----------+--------+
| requestor | action |
+-----------+--------+
|     31233 | sync   |
|     11229 | read   |
|      1434 | edit   |
|    101781 | sync   |
+-----------+--------+
Do you just want aggregation?
select requestor, action,
       min(cost), max(cost), min(time_taken), max(time_taken)
from job
group by requestor, action;
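That gives per-(requestor, action) aggregates. If the goal is instead the table-wide min/max alongside every row in a single statement, window functions can do that; a minimal sketch, which requires MySQL 8.0+ or SQLite 3.25+:
-- each row, with the overall min/max repeated on every row
select requestor, action,
       min(cost)       over () as min_cost,
       max(cost)       over () as max_cost,
       min(time_taken) over () as min_time,
       max(time_taken) over () as max_time
from job;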

What's the best way to get MySQL data into date-related groupings without crushing our DB?

I have a few tables related to an app of ours in a database, and the data needs to be lumped into buckets to help compare one source against another.
For example, we have an app install table with a source and a timestamp.
Then we have an uninstall table with an app ID.
We need to be able to group the data into buckets of "0-7"; "7-14"; "15-30"; "30-60" days of age.
Then select from there the number of uninstalls that happen in similar fashion: first week, second week, second half of the month, second month.
It's not so bad if we only have 50-100k installs; however, when we throw app activity into the mix, to see if a bucket did a certain action, our actions table is in the millions, and the world ends.
Is there a way we can do this with MySQL, or is it just not practical?
It almost seems easier to set up a server-side script that processes each row individually into a rollup table.
Install:
| App ID | Timestamp           | Source
------------------------------------------
| foo-1  | 2015-11-23 03:49:12 | Google
| foo-2  | 2015-12-23 03:49:12 | Facebook
| foo-3  | 2015-12-31 01:10:01 | Google
Purchase:
| App ID | Timestamp           | Amount
------------------------------------------
| foo-1  | 2015-11-26 05:49:12 | $10.00
| foo-1  | 2015-12-27 03:49:12 | $5.00
Uninstall:
| App ID | Timestamp
--------------------------------
| foo-2  | 2015-12-15 05:49:12
Report: (FP = First Purchase, U = Uninstall)
| Source   | Total Installs | FP 0-14d | FP in 15-30 | FP in 30-60 | U in 0-14d | U in 15-30
---------------------------------------------------------------------------------------------
| Google   | 2              | 1        | -           | -           | -          | -
| Facebook | 1              | -        | -           | -           | 1          | -
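For what it's worth, the bucketing itself can be expressed with conditional aggregation over DATEDIFF. This is only a sketch of the report's first columns, with table and column names (install, purchase, uninstall, a ts timestamp column) assumed from the examples above, and it says nothing about how it performs at millions of rows:
-- per source: installs, plus first purchases and uninstalls bucketed by age in days
select i.source,
       count(*) as total_installs,
       sum(datediff(p.first_purchase, i.ts) between 0 and 14)  as fp_0_14,
       sum(datediff(p.first_purchase, i.ts) between 15 and 30) as fp_15_30,
       sum(datediff(u.ts, i.ts) between 0 and 14)              as u_0_14
from install i
left join (select app_id, min(ts) as first_purchase
           from purchase
           group by app_id) p on p.app_id = i.app_id
left join uninstall u on u.app_id = i.app_id
group by i.source;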

Different value counts on same column using LIKE

I have a table like the one below:
+------------+---------------------------------------+--------+
| sender | subject | day |
+------------+---------------------------------------+--------+
| Darshana | Re: [Dev] [Platform] Build error | Monday |
| Dushan A | (MOLDOVADEVDEV-49) GREG Startup Error | Monday |
+------------+---------------------------------------+--------+
I want to get the result below from the above table. For each given word, it should check whether the subject contains that word and, if so, add one to that word's column for the given day.
|Day | "Dev" | "startup"|
+---------+------------+----------+
| Monday | 1 | 2 |
| Friday | 0 | 3 |
I thought of using the DECODE function, but I couldn't get the expected result.
You can do this with conditional aggregation; in MySQL a boolean expression evaluates to 1 or 0, so SUM counts the matching rows:
select day,
       sum(subject like '%Dev%') as Dev,
       sum(subject like '%startup%') as startup
from your_table t
group by day;
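Note that with MySQL's default collations LIKE is case-insensitive, so '%startup%' also matches 'Startup'. If you need a case-sensitive count, one option is LIKE BINARY:
-- case-sensitive variant of the same counts
select day,
       sum(subject like binary '%Dev%') as Dev,
       sum(subject like binary '%startup%') as startup
from your_table t
group by day;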