What is the best way to distribute different builds to different slaves?
Can you give some examples of how to restrict a job to a particular node or to all nodes, or is labeling better?
I need configuration examples.
I have 4 slaves:
prod1-build
prod2-build
prod3-build and prod4-build
Is it better to give them a label such as prodbuild and restrict to prodbuild, or
can I give prod1-build || prod2-build || prod3-build || prod4-build?
Whenever the build triggers, it should pick any one of the above.
Labelling is best if you have a large number of jobs, as it's easier to manage; you just label your slaves and then add the appropriate label to each build job.
If you have a smaller number of jobs, you can also tie each individual job definition to a particular slave - in that case, just put the slave name in the "Restrict where this job can be run" field.
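For example (a rough sketch; the exact field wording can vary between Jenkins versions):
Option 1 - add the label prodbuild to all four slaves, then in each job's "Restrict where this project can be run" field enter:
prodbuild
Option 2 - tie the job to the specific slaves with a label expression in the same field:
prod1-build || prod2-build || prod3-build || prod4-build
Either way, Jenkins will schedule each triggered build on whichever matching slave has a free executor.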
I am running a finance-related web portal which involves millions of debit/credit transactions in MySQL; getting the balance of a specific user becomes slow at a certain scale, i.e. around 3 million rows.
Now I am thinking of creating a separate MySQL container for each user and recording only that user's transactions in each container, and I am sure it would be fast to calculate the balance of any user.
I have around 20 thousand users and I want to know: is it practical to create a separate MySQL container for each user, or should I go for another approach? Thanks
I would not recommend a separate MySQL instance per user.
I operated MySQL in docker containers at a past job. Even on very powerful servers, we could run only about 30 MySQL instances per server before running out of resources. Perhaps a few more if each instance is idle most of the time. Regardless, you'll need hundreds or thousands of servers to do what you're describing, and you'll need to keep adding servers as you get more users.
Have you considered how you will make reports if each user's data is in a different MySQL instance? It will be fine to make a report about any individual user, but you probably also need reports about aggregate financial activity across all the users. You cannot run a single query that spans MySQL instances, so you will have to do one query per instance and write custom code to combine the results.
You'll also have more work when you need to do backups, upgrades, schema changes, error monitoring, etc. Every one of these operations tasks will be multiplied by the number of instances.
You didn't describe how your data is organized or any of the specific queries you run, but there are techniques to optimize queries that don't require splitting the data into multiple MySQL instances: indexing, caching, partitioning, or upgrading to a more powerful server. You should learn about those optimization techniques before you split up your data; otherwise you'll just end up with thousands of little instances that are all poorly optimized.
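As a rough illustration of the indexing idea only (the table and column names here are assumptions, not taken from the question), a composite index lets the balance query read just one user's rows instead of scanning the whole table:
-- hypothetical schema: transactions(id, user_id, amount, created_at)
ALTER TABLE transactions ADD INDEX idx_user_created (user_id, created_at);
-- the balance for one user now becomes an index range scan instead of a full scan
SELECT SUM(amount) FROM transactions WHERE user_id = 12345;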
I have around 20 thousand users and I want to know: is it practical to create a separate MySQL container for each user
No, definitely not. While Docker containers are relatively lightweight, 20k of them is a lot, and they will require a lot of extra resources (memory, disk, CPU).
getting the balance of a specific user becomes slow at a certain scale, i.e. around 3 million rows.
There are several things you can try to do.
First of all, try to optimize the database/queries (this can be combined with vertical scaling, i.e. using a more powerful server for the database)
Enable replication (if not already enabled) and use secondary instances for read queries
Use partitioning and/or sharding (a partitioning sketch follows this list)
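A minimal sketch of the partitioning option, assuming a transactions table keyed by user (all names are hypothetical; note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key):
CREATE TABLE transactions (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    amount  DECIMAL(13,2) NOT NULL,
    created DATETIME NOT NULL,
    PRIMARY KEY (user_id, id),   -- clusters each user's rows together
    KEY (id)                     -- keeps AUTO_INCREMENT valid
) ENGINE=InnoDB
PARTITION BY HASH(user_id) PARTITIONS 32;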
I know this is sacrilegious, but for a table like that I like to use two tables. (The naughty part is the redundancy.)
History -- details of all the transactions.
Current -- the current balance
You seem to have just History, but frequently need to compute the Current balance for a single user. If you maintain Current as you go, that lookup will run much faster.
Further, I would do the following:
Provide Stored Procedure(s) for all actions. The typical action would be to add one row to History and update one row in Current (a sketch follows this list).
Never UPDATE or DELETE rows in History. If a correction is needed, add another row with, say, a negative amount. (This, I think, is "proper" accounting practice anyway.)
Once you have made this design change, your question becomes moot. History won't need to have frequent big scans.
Use InnoDB (not MyISAM).
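A minimal sketch of such a stored procedure, assuming tables History(user_id, amount, created) and Current(user_id, balance) with user_id as Current's primary key (the names and columns are illustrative, not from the question):
DELIMITER //
CREATE PROCEDURE add_transaction(IN p_user_id INT, IN p_amount DECIMAL(13,2))
BEGIN
    START TRANSACTION;
    -- append-only history row
    INSERT INTO `History` (user_id, amount, created)
        VALUES (p_user_id, p_amount, NOW());
    -- keep the running balance in Current
    INSERT INTO `Current` (user_id, balance)
        VALUES (p_user_id, p_amount)
        ON DUPLICATE KEY UPDATE balance = balance + p_amount;
    COMMIT;
END //
DELIMITER ;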
Another thing that may be useful -- change the main indexes on History from
PRIMARY KEY(id),
INDEX(user_id)
to
PRIMARY KEY(user_id, id), -- clusters a user's rows together
INDEX(id) -- this keeps AUTO_INCREMENT happy
So this is very much a conceptual question (as much as I'd love to build a billion user app I don't think it's going to happen).
I've read the article by Pinterest on how they scaled their MySQL fleet a number of times ( https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f ) and I still don't get how they would "open up new shards" without affecting existing users.
The article states that every table is on every shard, including the User table.
So I'm assuming that when a user registers and they are assigned a random shard, this has to be done via a function that will always return the same result regardless of the number of shards.
e.g. if I sign up with test@example.com, they would potentially use that email to work out the shard id, and this would have to take into consideration the number of currently 'open' shards. My initial assumption was that they would use something like the mod shard they mention later on in the article, e.g.
md5($email) % number_of_shards
But as they open up more shards, the function result would change.
I then thought perhaps they had a separate DB holding purely user info for authentication purposes, which would also contain a column with the assigned shard_id, but as I say, the article implies that even the user table is on each shard.
Does anyone else have any ideas or insights into how something like this might work?
You are sharding on "user", correct? I see 3 general ways to split up the users.
The modulo approach to sharding has a big problem. When you add a shard, you suddenly need to move most users to a different shard.
At the other extreme (from modulo) is the "dictionary" approach. You have some kind of lookup that says which shard each user is on. With millions of users, maintenance of the dictionary becomes a costly headache.
I prefer a hybrid:
Do modulo 4096 (or some suitably large number)
Use a dictionary with 4096 entries. This maps the 4096 values onto the current number of shards (see the sketch below).
You have a package to migrate users from one shard to another. (This is a vital component of the system -- you will use it for upgrades, serious crashes, load balancing, etc.)
Adding a shard involves moving a few of the 4096 to the new shard and changing the dictionary. The users to move would probably come from the 'busiest' shards, thereby relieving the pressure on them.
Yes, item 4 impacts some users, but only a small percentage of them. You can soften the blow by picking 'idle' or 'small' or 'asleep' users to move. This would involve computing some metric for each of the 4096 clumps.
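A minimal sketch of that dictionary, assuming 4096 buckets as above (the table, hash function and values are illustrative only):
CREATE TABLE shard_map (
    bucket   SMALLINT UNSIGNED NOT NULL PRIMARY KEY,  -- 0..4095
    shard_id SMALLINT UNSIGNED NOT NULL               -- shard currently holding this bucket
);
-- route a user: hash to a bucket, then look up the shard that owns it
SELECT shard_id FROM shard_map WHERE bucket = CRC32('test@example.com') % 4096;
-- adding a shard = migrating a few buckets' rows, then repointing those buckets
UPDATE shard_map SET shard_id = 9 WHERE bucket IN (17, 293, 1041);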
Context
I'm currently developing a tool for managing orders and communicating between technicians and services. The industrial context is broadcast and TV. Multiple clients expecting media files each made to their own specs imply widely varying workflows even within the restricted scope of a single client's orders.
One client can ask one day for a single SD file and the next for a full-blown HD package containing up to fourteen files... In a MySQL db I am trying to store accurate information about all the small tasks composing the workflow, in multiple forms:
DATETIME values every time a task is accomplished, for accurate tracking
paths to the newly created files in the company's file system in VARCHARs
archiving background info in TEXT values (info such as user comments, e.g. when an incident happens and prevents moving forward, they can comment about it in this feed)
Multiply that by 30 different file types and this is way too much for a single table. So I thought I'd break it up by client: one table per client, so that any order only ever requires the use of that one table, which doesn't manipulate more than 15 fields. Still, this is a pretty rigid solution when a client has 9 different transcoding specs and a particular order only requires one. I figure I'd need to add flag fields for each transcoding spec to indicate which ones are required for that particular order.
Concept
I then had this crazy idea that maybe I could create a temporary table to last while the order is running (that can range from about 1 day to 1 month). We rarely have more than 25 orders running simultaneously so it wouldn't get too crowded.
The idea is to make a table tailored to each order, eliminating the need for flags and unnecessary, forever-empty fields. Once the order is complete, the table would get flushed, JSON-encoded, into a TEXT or BLOB so it can be restored later if changes need to be made.
Do you have experience with DBMSs (MySQL in particular) struggling with such practices, if this has ever been done? Does this sound like a viable option? I am happy to keep trying (which I have already started), and I am seeking advice on whether to keep going or stop right here.
Thanks for your input!
Well, of course that is possible to do. However, you cannot use MySQL temporary tables for such long-term storage; you will have to use "normal" tables and have some clean-up routine...
However, I do not see why that amount of data would be too much for a single table. If your queries start to run slowly due to the amount of data, then you should add some indexes to your database. I also think there is another con: it will be much harder to build reports later on. When you have 25 tables with the same kind of data, you have to run 25 queries and merge the data.
I do not see the point, really. The same kinds of data should be in the same table.
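As a rough sketch of the single-table shape I mean, with one row per task rather than one column per file type (all names are hypothetical):
CREATE TABLE order_tasks (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_id     INT UNSIGNED NOT NULL,
    task_type    VARCHAR(50)  NOT NULL,   -- e.g. 'transcode_hd', 'qc_check', 'delivery'
    completed_at DATETIME     NULL,       -- set when the task is accomplished
    file_path    VARCHAR(500) NULL,       -- path of the newly created file, if any
    notes        TEXT         NULL,       -- user comments / incident feed
    KEY (order_id)
) ENGINE=InnoDB;
-- an order only gets rows for the tasks it actually needs, so no flag fields
SELECT task_type, completed_at FROM order_tasks WHERE order_id = 42;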
I am designing a system and I don't think it's a good idea to give the end user the ability to delete entries in the database. I think that way because often the end user, once given admin rights, might end up making a mess in the database and then turn to me to fix it.
Of course, they will need to be able to remove entries, or at least think that they did, if they are set as admin.
So, I was thinking that all the entries in the database should have an "active" field. If they try to remove an entry, it will just set the flag to "false" or something similar. Then there will be some kind of super admin that would be my company's team who could change this field.
I already saw that in another company I worked for, but I was wondering if it was a good idea. I could just make regular database backups and then roll back if they make an error, and adding this field would add some complexity to all the queries.
What do you think? Should I do it that way? Do you use this kind of trick in your applications?
In one of our databases, we distinguished between transactional and dictionary records.
In a couple of words, transactional records are things that you cannot roll back in real life, like a call from a customer. You can change the caller's name, status etc., but you cannot dismiss the call itself.
Dictionary records are things that you can change, like assigning a city to a customer.
Transactional records and things that lead to them were never deleted, while dictionary ones could be deleted all right.
By "things that lead to them" I mean that as soon as the record appears in the business rules which can lead to a transactional record, this record also becomes transactional.
Like, a city can be deleted from the database. But when a rule appeared that said "send an SMS to all customers in Moscow", the cities became transactional records as well, or we would not be able to answer the question "why did this SMS get sent".
A rule of thumb for distinguishing was this: is it only my company's business?
If one of my employees made a decision based on data from the database (like, he made a report on which some management decision was based, and then the data the report was based on disappeared), it was considered OK to delete that data.
But if the decision affected some immediate actions with customers (like calling, messing with the customer's balance etc.), everything that led to those decisions was kept forever.
It may vary from one business model to another: sometimes, it may be required to record even internal data, sometimes it's OK to delete data that affects outside world.
But for our business model, the rule from above worked fine.
A couple of reasons people do things like this are auditing and automated rollback. If a row is completely deleted then there's no way to automatically roll back that deletion if it was made in error. Also, keeping a row around along with its previous state is important for auditing - a super user should be able to see who deleted what and when, as well as who changed what, etc.
Of course, that's all dependent on your current application's business logic. Some applications have no need for auditing and it may be proper to fully delete a row.
The downside to just setting a flag such as IsActive or DeletedDate is that all of your queries must take that flag into account when pulling data. This makes it more likely that another programmer will accidentally forget this flag when writing reports...
A slightly better alternative is to archive that record into a different database. This way it's been physically moved to a location that is not normally searched. You might add a couple of fields to capture who deleted it and when, but the point is it won't be polluting your main database.
Further, you could provide an undo feature to bring it back fairly quickly; and do a permanent delete after 30 days or something like that.
UPDATE concerning views:
With views, the data still participates in your indexing scheme. If the amount of potentially deleted data is small, views may be just fine as they are simpler from a coding perspective.
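A minimal sketch of the archive-and-delete move described above, assuming archive.orders_deleted has the same columns as orders plus deleted_by and deleted_at (all names hypothetical):
START TRANSACTION;
-- copy the row into the archive, recording who deleted it and when
INSERT INTO archive.orders_deleted
    SELECT o.*, 'jdoe' AS deleted_by, NOW() AS deleted_at
    FROM orders o WHERE o.id = 42;
-- then remove it from the main table
DELETE FROM orders WHERE id = 42;
COMMIT;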
I prefer the method that you are describing. It's nice to be able to undo a mistake. More often than not, there is no easy way of going back on a DELETE query. I've never had a problem with this method and, unless you are filling your database with 'deleted' entries, there shouldn't be an issue.
I use a combination of techniques to work around this issue. For some things, adding the extra "active" field makes sense. The user then has the impression that an item was deleted because it no longer shows up on the application screen. The scenarios where I would implement this include items that are required to keep a history... let's say invoices and payments. I wouldn't want such things being deleted for any reason.
However, there are some items in the database that are not so sensitive, let's say a list of categories that I want to be dynamic... I may then allow users with admin privileges to add and delete a category, and the delete could be permanent. However, as part of the application logic I will check whether the category is used anywhere before allowing the delete.
I suggest having a second database like DB_Archives where you add every row deleted from the DB. The is_active field negates the very purpose of foreign key constraints, and YOU have to make sure that a row is not marked as deleted while it's still referenced elsewhere. This becomes overly complicated when your DB structure is massive.
This is an accepted practice that exists in many applications (Drupal's versioning system, et al.). Since MySQL scales very quickly and easily, you should be okay.
I've been working on a project lately where all the data was kept in the DB as well. The status of each individual row was kept in an integer field (data could be active, deleted, in_need_for_manual_correction, historic).
You should consider using views to access only the active/historic/... data in each table. That way your queries won't get more complicated.
Another thing that made things easy was the use of UPDATE/INSERT/DELETE triggers that handled all the flag changing inside the DB and thus kept the complex stuff out of the application (for the most part).
I should mention that the DB was an MSSQL 2005 server, but I guess the same approach should work with MySQL, too.
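For example, the view approach might look something like this (hypothetical names; the status codes are assumed):
-- status: 1 = active, 2 = deleted, 3 = needs manual correction, 4 = historic
CREATE VIEW active_orders AS
    SELECT * FROM orders WHERE status = 1;
-- application queries read the view and never repeat the flag check
SELECT * FROM active_orders WHERE customer_id = 42;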
Yes and no.
It will complicate your application much more than you expect, since every table that does not allow deletion needs an extra check (IsDeleted = false) in every query. It does not sound like much, but when you build a larger application and 9 of the 11 tables in a query require the non-deletion check, it gets tedious and error-prone. (Well yeah, then there are deleted/non-deleted views... when you remember to create/use them.)
Some schema upgrades will become a PITA, since you'll have to relax FKs and invent "suitable" values for very, very old data.
I've not tried it, but I have thought a moderate amount about a solution where you'd zip the row data to XML and store that in some "Historical" table. Then, in case of "must have that restored now OMG the world is dying!1eleven", it's possible to dig it out.
I agree with all respondents that if you can afford to keep old data around forever, it's a good idea. For performance and simplicity, I agree with the suggestion of moving "logically deleted" records to "old stuff" tables rather than adding "is_deleted" flags (moving to a totally different database seems a bit like overkill, but you can easily switch to that more drastic approach later if the amount of accumulated data eventually turns out to be a problem for a single DB with normal and "old stuff" tables).
My question is a lot like this one. However I'm on MySQL and I'm looking for the "lowest tech" solution that I can find.
The situation is that I have 2 databases that should contain the same data, but they are updated primarily when they are not able to contact each other. I suspect that there is some sort of clustering or master/slave setup that would be able to sync them just fine. However, in my case that is major overkill, as this is just a scratch DB for my own use.
What is a good way to do this?
My current approach is to have a Federated table on one of them and, every so often, stuff the data over the wire to the other with an insert/select. It gets a bit convoluted trying to deal with primary keys and whatnot. (INSERT IGNORE seems to not work correctly.)
p.s. I can easily build a query that selects the rows to transfer.
MySQL's inbuilt replication is very easy to set up and works well even when the DBs are disconnected most of the time. I'd say configuring this would be much simpler than any custom solution out there.
See http://www.howtoforge.com/mysql_database_replication for instructions, you should be up and running in 10-15 mins and you won't have to think about it again.
The only downside I can see is that it is asynchronous - i.e. you must have one designated master that gets all the changes.
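The setup boils down to something like this (a sketch only; host, user, password and log coordinates are placeholders, and the master also needs log-bin enabled and a unique server-id in my.cnf):
-- on the slave, point it at the master and start replicating
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
START SLAVE;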
My current solution is
set up a federated table on the source box that grabs the table on the target box
set up a view on the source box that selects the rows to be updated (as a join of the federated table)
set up another federated table on the target box that grabs the view on the source box
issue an INSERT...SELECT...ON DUPLICATE KEY UPDATE on the target box to run the pull (sketched below).
I guess I could just grab the source table and do it all in one shot, but based on the query logs I've been seeing, I'm guessing I'd end up with about 20K queries being run, or about 100-300MB of data transfer, depending on how things happen. The above setup should result in about 4 queries and little more data transferred than actually needs to be.
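Roughly, the pull step looks like this (a sketch with hypothetical table, view and column names; local_table needs a primary or unique key for the upsert to work):
-- on the target box: a federated table pointing at the view on the source box
CREATE TABLE pending_rows (
    id   INT NOT NULL,
    col1 VARCHAR(100),
    col2 VARCHAR(100)
) ENGINE=FEDERATED
  CONNECTION='mysql://sync_user:secret@source-box:3306/mydb/rows_to_transfer';
-- the actual pull: upsert instead of INSERT IGNORE, so changed rows get updated too
INSERT INTO local_table (id, col1, col2)
    SELECT id, col1, col2 FROM pending_rows
    ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2);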