I know that it's possible to do a manual import of data from an Excel file to a MySQL database.
But is it possible to automate this as a job that runs, say, every 24 hours or every 60 seconds?
In general you should be looking at your operating system's feature set and not at MySQL if you want a certain task to be executed on a periodic basis.
On Unix systems, you can have a look at cron jobs.
On Windows, there's the Task Scheduler.
Concept
The idea is that you have a script (or something else that you can execute) that does exactly what you want once. Then tell your operating system, by means of a cron job, a scheduled task, or similar, to run that script every 24 hours, every 60 seconds, or whatever you like.
Example
Run the script (or program) excel_to_db.sh every 24 hours:
0 0 * * * /home/user/scripts/excel_to_db.sh
You might want to extend this cron job to redirect the script's output (stdout and stderr) to a log file, to make it more robust.
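One possible form, assuming a writable log path such as /home/user/logs/excel_to_db.log:
0 0 * * * /home/user/scripts/excel_to_db.sh >> /home/user/logs/excel_to_db.log 2>&1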
Related
I would like to delay deletion of data from the database. I am using MySQL and nest.js. I want each entry to be deleted one week after it is created. I heard that cron is what I need. Is cron the right tool, or do I need to use something else?
A cron job (or at on Windows) or a MySQL EVENT can be created to periodically check for something and take action. Their resolution is only 1 minute.
If you need very precise resolution, another technique is required. For example, if you don't want to show a user anything that is more than 1 week old, to the second, then simply exclude it from the SELECT. That is, add something like this to the WHERE clause: AND created_date >= NOW() - INTERVAL 7 DAY.
Doing the above gives you the freedom to schedule the actual DELETE for only, say, once a day -- rather than pounding on the database only to usually find nothing to do.
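As a minimal sketch of that daily DELETE as a MySQL EVENT (the table name entries is made up; the event scheduler must be enabled):
SET GLOBAL event_scheduler = ON;

CREATE EVENT purge_old_entries
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM entries
  WHERE created_date < NOW() - INTERVAL 7 DAY;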
If you do choose to "pound on the database", be aware of the following problem. If one instance of the deleter script is running for a long time (for any of a number of reasons), it might not be finished before the next copy comes along. In some situations these scripts can stumble over each other to the extent of effectively "crashing" the server.
That leads to another solution -- a single script that runs forever. It has a simple loop:
Do the actions needed (deleting old rows)
Sleep 1 -- or 10 or 60 or whatever -- this is to be a "nice guy" and not "pound on the system".
The only tricky part is making sure the script starts up again after any server restart or a crash of the script itself.
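A minimal sketch of such a loop in TypeScript (deleteOldRows is a hypothetical helper standing in for whatever DELETE your database client issues):

// hypothetical helper -- issue the actual DELETE through your DB client here
async function deleteOldRows(): Promise<void> {
  // e.g. DELETE FROM entries WHERE created_date < NOW() - INTERVAL 7 DAY
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function main(): Promise<void> {
  for (;;) {
    try {
      await deleteOldRows();
    } catch (err) {
      console.error("delete pass failed:", err); // log and keep looping
    }
    await sleep(60_000); // sleep 60 seconds -- the "nice guy" part
  }
}

main();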
You can configure a cron job to delete the data periodically.
There are several ways to set one up.
You can write a shell script that periodically deletes entities in the DB and schedule it with the Linux crontab, or you can use an application that provides cron-style scheduling, such as Jenkins or Airflow.
AWS Lambda can also run jobs on a cron schedule.
Using the task scheduling support provided by NestJS seems to be the simplest way to solve the problem.
See this link
https://docs.nestjs.com/techniques/task-scheduling
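A minimal sketch using the @nestjs/schedule package from the linked docs (the service name and the method body are made up; this also assumes ScheduleModule.forRoot() is registered in your root module):

import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class CleanupService {
  // runs once a day at midnight; pick whatever expression fits
  @Cron(CronExpression.EVERY_DAY_AT_MIDNIGHT)
  async purgeOldEntries(): Promise<void> {
    // issue something like
    // DELETE FROM entries WHERE created_date < NOW() - INTERVAL 7 DAY
    // through your repository or query builder
  }
}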
I've got an SSIS job that pulls GL transactions from a Jade database via ODBC. If I have a large date range of transactions, I get a read timeout from Jade. Is there a way to structure an SSIS job so that it pulls a few days at a time in separate reads from the source, avoiding this timeout? I'm using a For Loop and only asking for a few days at a time, but it still fails, so I've obviously not avoided the issue.
To be clear, we're going to raise the timeout on the server from 3 to 10 minutes. We won't use a 0 (unlimited) timeout for obvious reasons, but there is always a chance that if we need to pull a large range of data for a new project, we'd hit whatever reasonable timeout we set.
I'm looking for a way to structure the job to incrementally pull smaller ranges until it's complete.
I found the timeout setting, it's on the Jade Dataflow Component, not on the ODBC connection object where I thought it would be. I probably put a value in there when I created the package about 18 months ago, and forgot about it.
The hardware, infrastructure, and redundancy are not in the scope of this question.
I am building an SSIS ETL solution that needs to import ~600,000 small, simple files per hour. With my current design, SQL Agent runs the SSIS package, and it takes "n" files and processes them.
The number of files per batch, "n", is configurable
The schedule on which SQL Agent executes the SSIS package is configurable
I wonder if the above approach is the right choice? Or, alternatively, must I have an infinite loop in the SSIS package that keeps taking and processing files?
So the question boils down to a choice between an infinite loop and batch+schedule. Is there any better option?
Thank you
In a similar situation, I run an agent job every minute and process all files present. If the job takes 5 minutes to run because there are a lot of files, the agent skips the scheduled runs until the first one finishes, so there is no worry that two processes will conflict with each other.
Is SSIS the right tool?
Maybe. Let's start with the numbers:
600000 files / 60 minutes = 10,000 files per minute
600000 files / (60 minutes * 60 seconds) = 167 files per second.
Regardless of what technology you use, you're looking at some extremes here. Windows NTFS starts to choke at around 10k files in a folder, so you'll need to employ some folder strategy to keep that count down, in addition to regular maintenance.
In 2008, the SSIS team managed to load 1 TB in 30 minutes, all sourced from disk, so SSIS can perform very well. It can also perform really poorly, which is how I've managed to gain ~36k SO Unicorn points.
Six years is a lifetime in the world of computing, so you may not need to take such drastic measures as the SSIS team did to set their benchmark, but you will need to look at their approach. I know you've stated the hardware is outside the scope of discussion, but it is very much relevant: if the file system (SAN, NAS, local disk, flash, or whatever) can't serve 600k files per hour, then you'll never be able to clear your work queue.
Your goal is to get as many workers as possible engaged in processing these files. The Work Pile Pattern can be pretty effective to this end. Basically, a process asks: Is there work to be done? If so, I'll take a bit and go work on it. And then you scale up the number of workers asking and doing work. The challenge here is to ensure you have some mechanism to prevent workers from processing the same file. Maybe that's as simple as filtering by directory or file name or some other mechanism that is right for your situation.
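As an illustrative sketch (outside SSIS itself), one simple claim mechanism is an atomic rename: a worker that loses the race gets an error and moves on. In TypeScript, assuming incoming/ and processing/ directories on the same volume:

import { promises as fs } from "fs";
import * as path from "path";

// try to claim one file by moving it into this worker's own folder;
// rename is atomic on a single volume, so two workers can't both win
async function claimOne(incoming: string, processing: string): Promise<string | null> {
  for (const name of await fs.readdir(incoming)) {
    const claimed = path.join(processing, name);
    try {
      await fs.rename(path.join(incoming, name), claimed);
      return claimed; // we won the race for this file
    } catch {
      // another worker claimed it first; try the next file
    }
  }
  return null; // the work pile is empty
}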
I think you're headed down this approach based on your problem definition with the agent jobs that handle N files, but I wanted to give your pattern a name for further research.
I would agree with Joe C's answer - schedule the SQL Agent job to run as frequently as needed. If it's already running, it won't spawn a second process. Perhaps you're going to have multiple agents that all start every minute - AgentFolderA, AgentFolderB... AgentFolderZZH and they are each launching a master package that then has subprocesses looking for work.
Use a WMI Event Watcher Task to detect whether a new file has arrived; as the next step you can call the job scheduler, or execute the SSIS package directly.
More details on WMI events:
https://msdn.microsoft.com/en-us/library/ms141130%28v=sql.105%29.aspx
I'm developing an extension for mediawiki. My extension needs to execute some database updating periodically (e.g. every 30 mins).
Reading the MediaWiki manual, I found there is a job queue implemented, but it does not have support for scheduling.
Is there any way to set a mediawiki extension job to execute periodically?
This is not what a job queue is for; it is to run a task as soon as there are free resources. Create a maintenance script and use cron to run it periodically.
Jobs run thanks to wiki visits: every n visits, one job is executed (n being $wgJobRunRate, configured in your LocalSettings.php).
It is probably not what you are looking for, but if you really want to purge this queue every 30 minutes, you can still use a cron job. For instance:
*/30 * * * * php /path/to/wiki/maintenance/runJobs.php
Based on the few details you gave, I would propose instead configuring cron to execute one of your extension's scripts, and explaining that setup in your install documentation.
Users of my application need to be able to schedule certain tasks to run at certain times (e.g. once only, every minute, every hour, etc.). My plan is to have cron run a script every minute that checks the application to see whether it has tasks to execute. If so, the script executes them.
Questions:
Is running cron every minute a good idea?
How do I model intervals in the database the way cron does (e.g. every minute, every 5th minute of every hour, etc.)?
I'm using LAMP.
Or, rather than doing any, you know, real work, simply create an interface for the users, and then publish entries in cron! Rather than having cron call you every minute, have it call scripts as directed by the users. When they add or change jobs, rewrite the crontab.
No big deal.
In Unix, cron allows each user (Unix login, that is) to have their own crontab, so you can have one dedicated to your app; you don't have to use the root crontab for this.
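A minimal sketch of the "rewrite the crontab" step in TypeScript (the job rows and the command are made up; in practice they'd come from your application's database):

import { execFileSync } from "child_process";

// hypothetical rows loaded from the application's database
const jobs = [
  { schedule: "*/5 * * * *", command: "/usr/bin/php /var/www/app/run_task.php 42" },
];

// build the crontab text and install it for the current (app) user,
// replacing whatever crontab was there before
const crontab = jobs.map((j) => `${j.schedule} ${j.command}`).join("\n") + "\n";
execFileSync("crontab", ["-"], { input: crontab });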
Do you mean that you have a series of user-defined jobs that need to be executed at user-defined intervals, and you'd like to have cron facilitate the processing of those jobs? If so, you'd want to have a database with at least 2 fields:
JOB,
OFTEN
where OFTEN is how often they'd like the job to run, using syntax similar to cron's.
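For instance, a minimal table sketch (the names are illustrative, not prescriptive):

CREATE TABLE tasks (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  job   VARCHAR(255) NOT NULL, -- the command or task identifier to run
  often VARCHAR(64)  NOT NULL  -- cron-style schedule, e.g. '*/5 * * * *'
);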
You'd then need to write a script (in Python, Ruby, or some similar language) to parse that data. This script would be what runs every minute via your actual cron.
Take a look at this Stack Overflow question, and this Stack Overflow question, regarding how to parse crontab data via Python.
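The linked questions cover Python; as an equivalent sketch in TypeScript, here is a hand-rolled matcher for the common cases (only *, plain numbers, and */step fields are handled; a real implementation would also cover ranges and lists):

// does one cron field ("*", "7", or "*/5") match a value?
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  if (field.startsWith("*/")) return value % Number(field.slice(2)) === 0;
  return Number(field) === value;
}

// does a 5-field cron expression match the given time?
function cronMatches(expr: string, now: Date): boolean {
  const [min, hour, dom, mon, dow] = expr.split(/\s+/);
  return (
    fieldMatches(min, now.getMinutes()) &&
    fieldMatches(hour, now.getHours()) &&
    fieldMatches(dom, now.getDate()) &&
    fieldMatches(mon, now.getMonth() + 1) &&
    fieldMatches(dow, now.getDay())
  );
}

// run every job whose OFTEN field matches the current minute,
// e.g. cronMatches("*/5 * * * *", new Date())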