How to schedule a Laravel 5 job to get data from an external JSON file and store the value in a database?

I'm currently working on a project in Laravel, and I want to schedule a job that grabs a value (the price of Bitcoin) from an external API (JSON file) and stores this value in my database every few minutes.
So far, I have created a job using the artisan command: artisan make:job UpdateBitcoinMarketPrice. But I have no idea what to include in the public function handle() inside of the Job class that was created.
I have gathered that I can call this job regularly from App\Console\Kernel.php with the following function:
protected function schedule(Schedule $schedule)
{
    // $schedule->command('inspire')
    //          ->hourly();

    $schedule->job(new UpdateBitcoinMarketPrice)->everyFiveMinutes();
}
Should I, for example, create a new Model that stores said value, and then create a new object every time this runs?
Should I then fetch the first row of the table whenever I want to return the value?

Job classes are very simple, normally containing only a handle() method which is called when the job is processed by the queue. You can use the constructor to inject any parameters your job needs; if you pass in an Eloquent model, Laravel will serialize it so you can use it in your handle() method.
So, to keep it simple, you can make the API call in the handle() method and store the response in the database. Keep in mind that this will fire the API call as a background job.
Something along the lines of:
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class UpdateBitcoinMarketPrice implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected $user;

    public function __construct(User $user)
    {
        // In this case Laravel serializes the User model so you can use it in
        // your background job. This can be anything that you need in order to
        // make the call.
        $this->user = $user;
    }

    // Injecting ExternalServiceClass and a Transformer (to transform the API
    // response) as examples; the service container resolves both.
    public function handle(ExternalServiceClass $service, Transformer $transform)
    {
        // Make the call to the API, parse the response,
        // and store the result in the database.
        $response = $service->postRequest($someUri, $someParams);
        $parsedResponse = $transform->serviceResponse($response);
        DatabaseModel::firstOrCreate($parsedResponse);
    }
}
The handle method is called when the job is processed by the queue. Note that you are able to type-hint dependencies on the handle method of the job, like in the example above. The Laravel service container automatically injects these dependencies.
Now, since you are going to run the job everyFiveMinutes(), you have to be careful: by default, scheduled tasks will run even if the previous instance of the task is still running.
To prevent this, you may use the withoutOverlapping method:
$schedule->job(new UpdateBitcoinMarketPrice)->everyFiveMinutes()->withoutOverlapping();

Related

When is the first good moment to communicate between services

I have a Feathers service where every record must have a unique integer number, so I need a global counter for this.
I thought the correct way to do this was to create another service named counter, where I can store that counter. Then I tried to access that service in the constructor and failed; I assume it is not ready yet. So when is a good time to access it?
I want to copy the current counter to the service instance, so I can keep it in a synchronous way and don't get duplicates when there are two concurrent requests.
Also, when I use setTimeout(() => this.prepare(), 0) to delay the setup, which seems to work, I realize that the this passed to async create() is not the same this used in the constructor. When doing setTimeout(() => this.foo = "test", 0), this.foo will not be "test" inside async create() later. Why is that?

Using @Async inside a transaction in a Spring application

I have a Spring application which updates particular entity details in a MySQL DB using a @Transactional method. Within the same method, I am trying to call an endpoint of another Spring application using @Async; that application reads the same entity from the MySQL DB and updates a value in Redis storage.
Now the problem is that every time I update some value for the entity, sometimes it's updated in Redis and sometimes it's not.
When I tried to debug, I found that sometimes the second application, when it reads the entity from MySQL, picks up the old value instead of the updated value.
Can anyone suggest what can be done to avoid this and to make sure that the second application always picks up the updated value of that entity from MySQL?
The answer from M. Deinum is good, but there is still another way to achieve this which may be simpler for your case, depending on the state of your current application.
You could simply wrap the call to the async method in an event that will be processed after your current transaction commits, so you will read the updated entity from the DB correctly every time.
It is quite simple to do this; let me show you:
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Transactional
public void doSomething() {
    // application code here

    // this code will still execute async - but only after the
    // outer transaction that surrounds this lambda has committed
    executeAfterTransactionCommits(() -> theOtherServiceWithAsyncMethod.doIt());

    // more business logic here, in the same transaction
}

private void executeAfterTransactionCommits(Runnable task) {
    TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
        @Override
        public void afterCommit() {
            task.run();
        }
    });
}
Basically what happens here is that we supply an implementation for the current transaction callback, and we only override the afterCommit method - there are other methods there that might be useful, so check them out. To avoid typing the same boilerplate code if you want to use this in other places, and simply to make the method more readable, I extracted that into a helper method.
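For completeness: newer Spring versions (4.2+) can express the same "run after commit" ordering declaratively with @TransactionalEventListener, which may read more cleanly than registering synchronizations by hand. A minimal sketch under that assumption - the event class, the service names, and the entity id are hypothetical, and @Async additionally requires @EnableAsync in your configuration:

import org.springframework.context.ApplicationEventPublisher;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;

// Hypothetical event carrying the id of the entity that was updated.
class EntityUpdatedEvent {
    final long entityId;
    EntityUpdatedEvent(long entityId) { this.entityId = entityId; }
}

@Service
class EntityService {
    private final ApplicationEventPublisher publisher;

    EntityService(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    @Transactional
    public void doSomething() {
        // ... update the entity in MySQL here ...
        // The listener below only runs after this transaction commits,
        // so it always reads committed data.
        publisher.publishEvent(new EntityUpdatedEvent(42L));
    }
}

@Component
class RedisSyncListener {
    @Async
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void onEntityUpdated(EntityUpdatedEvent event) {
        // call the second application / sync Redis here
    }
}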
The solution is not that hard: apparently you want to trigger an update after the data has been written to the database. A @Transactional method only commits after it finishes executing. If an @Async method is called at the end of the method, then depending on the duration of the commit (or of the actual REST call) the transaction might or might not have committed yet.
As something outside of your transaction can only see committed data, it might see the updated value (if already committed) or still the old one. This also depends on the isolation level of your transaction, but you generally don't want to use an exclusive lock on the database for performance reasons.
To fix this, the @Async method should not be called from inside the @Transactional method but right after it. That way the data is always committed and the other service will see the updated data.
@Service
public class WrapperService {

    private final TransactionalEntityService service1;
    private final AsyncService service2;

    public WrapperService(TransactionalEntityService service1, AsyncService service2) {
        this.service1 = service1;
        this.service2 = service2;
    }

    public void updateAndSyncEntity(Entity entity) {
        service1.update(entity); // Update in DB first
        service2.sync(entity);   // After the commit, trigger a sync with the remote system
    }
}
This service is non-transactional; as such, the service1.update, which is presumably @Transactional, will update the database and commit. When that is done you can trigger the external sync.

REST: Updating multiple records

I need to update multiple records using a single HTTP request. An example is selecting a list of emails and marking them as 'Unread'. What is the best (RESTful) way to achieve this?
The way I am doing it right now is by using a sub-resource action:
PUT http://example.com/api/emails/mark-as-unread
(in the body)
{"ids": [1, 2, 3, ...]}
I read this site - http://restful-api-design.readthedocs.io/en/latest/methods.html#actions - and it suggests using an "actions" sub-collection, e.g.:
POST http://example.com/api/emails/actions
(in the body)
{"type":"mark-as-unread", "ids":[1,2,3....]}
Quotes from the referenced webpage:
Sometimes, it is required to expose an operation in the API that inherently is non RESTful. One example of such an operation is where you want to introduce a state change for a resource, but there are multiple ways in which the same final state can be achieved, ... A great example of this is the difference between a “power off” and a “shutdown” of a virtual machine.
As a solution to such non-RESTful operations, an “actions” sub-collection can be used on a resource. Actions are basically RPC-like messages to a resource to perform a certain operation. The “actions” sub-collection can be seen as a command queue to which new action can be POSTed, that are then executed by the API. ...
It should be noted that actions should only be used as an exception, when there’s a good reason that an operation cannot be mapped to one of the standard RESTful methods. ...
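For illustration, here is what such an actions endpoint could look like. This is a minimal sketch assuming Spring MVC; the controller, the EmailAction payload class, and the repository call are all hypothetical:

import java.util.List;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class EmailActionsController {

    // Matches the body shape above: {"type": "mark-as-unread", "ids": [1, 2, 3]}
    public static class EmailAction {
        public String type;
        public List<Long> ids;
    }

    // POST /api/emails/actions
    @PostMapping("/api/emails/actions")
    public ResponseEntity<Void> perform(@RequestBody EmailAction action) {
        if ("mark-as-unread".equals(action.type)) {
            // emailRepository.markUnread(action.ids); // hypothetical bulk update
            return ResponseEntity.accepted().build();
        }
        // Unknown action types are rejected rather than silently ignored.
        return ResponseEntity.badRequest().build();
    }
}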
Create an algorithm-endpoint, like
http://example.com/api/emails/mark-unread
mark-unread is an algorithm name, a noun. It gets to be the endpoint name in REST, and the list of ids are the arguments to this algorithm. Typically people send them as URL query arguments in the POST call, like
http://example.com/api/emails/mark-unread?ids=1,2,3,4
This is very safe, as POST is non-idempotent, so you need not worry about side effects. You might decide differently: if your bulk update carries the entire state of the objects, opt for PUT
http://example.com/api/emails/bulk-change-state
in which case you would have to put the actual state into the body of the HTTP call.
I'd prefer a bunch of simple algorithms like mark-unread?ids=1,2,3,4 over one monolithic PUT, as it helps with debugging, is transparent in logs, etc.
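A minimal sketch of that algorithm endpoint, again assuming Spring MVC (the controller and the repository call are hypothetical). Spring binds the comma-separated ids query argument to a List<Long>:

import java.util.List;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class EmailBulkController {

    // POST /api/emails/mark-unread?ids=1,2,3,4
    @PostMapping("/api/emails/mark-unread")
    public ResponseEntity<Void> markUnread(@RequestParam("ids") List<Long> ids) {
        // emailRepository.markUnread(ids); // hypothetical bulk update
        return ResponseEntity.noContent().build();
    }
}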
It's a bit complicated to get an array of models into an action method as an argument. The easiest approach is to form a JSON string on your client and POST all of that to the server (to your action method). You can adopt the following approach.
Say your email model is like this:
public class Email
{
    public int EmailID { get; set; }
    public int StatusID { get; set; }
    // more properties
}
So your action method will take the form:
public bool UpdateAll(string EmailsJson)
{
    Email[] emails = JsonConvert.DeserializeObject<Email[]>(EmailsJson);
    foreach (Email eml in emails)
    {
        // do update logic
    }
    return true;
}
Using Json.NET to help with the serialization.
On the client you can write the ajax call as follows:
$.ajax({
    url: 'api/emailsvc/updateall',
    method: 'post',
    data: {
        EmailsJson: JSON.stringify([{
            EmailID: 1,
            StatusID: 2,
            // ...more json object properties.
        },
        // more json objects
        ])
    },
    success: function (result) {
        if (result)
            alert('updated successfully');
    }
});

Returning values from InputFormat via the Hadoop Configuration object

Consider a running Hadoop job, in which a custom InputFormat needs to communicate ("return", similarly to a callback) a few simple values to the driver class (i.e., to the class that has launched the job) from within its overridden getSplits() method, using the new mapreduce API (as opposed to mapred).
These values should ideally be returned in-memory (as opposed to saving them to HDFS or to the DistributedCache).
If these values were only numbers, one could be tempted to use Hadoop counters. However, in numerous tests counters do not seem to be available at the getSplits() phase and anyway they are restricted to numbers.
An alternative could be to use the Configuration object of the job, which, as the source code reveals, should be the same object in memory for both the getSplits() and the driver class.
In such a scenario, if the InputFormat wants to "return" a (say) positive long value to the driver class, the code would look something like:
// In the custom InputFormat.
public List<InputSplit> getSplits(JobContext job) throws IOException
{
    ...
    long value = ... // A value >= 0
    job.getConfiguration().setLong("value", value);
    ...
}

// In the Hadoop driver class.
Job job = ... // Get the job to be launched
...
job.submit(); // Start running the job
...
while (!job.isComplete())
{
    ...
    if (job.getConfiguration().getLong("value", -1) >= 0)
    {
        ...
    }
    else
    {
        continue; // Wait for the value to be set by getSplits()
    }
    ...
}
The above works in tests, but is it a "safe" way of communicating values?
Or is there a better approach for such in-memory "callbacks"?
UPDATE
The "in-memory callback" technique may not work in all Hadoop distributions, so, as mentioned above, a safer way is, instead of saving the values to be passed back in the Configuration object, create a custom object, serialize it (e.g., as JSON), saved it (in HDFS or in the distributed cache) and have it read in the driver class. I have also tested this approach and it works as expected.
Using the configuration is a perfectly suitable solution (admittedly for a problem I'm not sure I understand), but once the job has actually been submitted to the job tracker, you will not be able to amend this value (client side or task side) and expect to see the change on the opposite side of the comms (setting configuration values in a map task, for example, will not be persisted to the other mappers, nor to the reducers, nor will it be visible to the job tracker).
So communicating information back from within getSplits() to your client polling loop (to see when the job has actually finished defining the input splits) is fine in your example.
What's your greater aim or use case for using this?

How can I find out whether an object set was already created?

I'm working with Entity Framework and MySQL. We created a class:
public class DataBaseContext : ObjectContext, IDbContext
There is a method
public IEnumerable<T> Find<T>(Func<T, bool> whereClause) where T : class
{
    return CreateObjectSet<T>().Where(whereClause);
}
Is there a way not to create the ObjectSet every time I call the method? Can I check whether it already exists?
Whoa, that is a really bad method. You are passing Func<>, not Expression<Func<>>. It means that every time you execute your method, EF will pull all records from the database table mapped to T and run your filtering in the memory of your application - creating the object set is the last thing you should be afraid of.
Anyway, creating an object set should not be an expensive operation, and if you don't want to create it every time, you need to implement some "local caching" inside your object context instance.