How to avoid duplication of tickets in an NSF?

I have more than one application server for smooth access, but unfortunately I sometimes face the problem of duplicate tickets: my current setup generates the ticket number when the form is completed, in the save event. Sometimes, due to server issues and replication delays, two tickets end up with the same number.

Normally this question would be too broad, as it does not show the minimum amount of research...
Nevertheless I will answer it: here are some ways to solve this issue.
The easiest: add a server name to the ticket number.
So count as you do now, but if there are duplicates, they still differ in their server-name part:
Server1-0001
Server1-0002
Server2-0003
Server3-0004
Server1-0005
Server2-0005
Another possibility is to create the number on only ONE server. You can do this either by having an agent on that server that runs over all documents that do not yet have a number, or by "asking" that server for a number when saving.
The first is easy to implement, but on the servers that do not create the numbers it will take at most two replication intervals for a ticket to get its unique number.
The second is trickier, as you need all servers to "know" one central server and write code / agents / whatever to "get" a ticket number from that server and put it in the ticket.
All of this is not trivial and therefore too broad to answer in detail here.

Managing Historical Data Dependencies

3 tables: Device, SoftwareRevision, Message. All data entered is handled by PHP scripts on an Apache server.
A device can have one software revision. A software revision can have many devices. A device can have many messages. A message can have one device.
The issue is, the SoftwareRevision changes how the message is used in the front end application. This means that when the software is updated on the device, we need older messages to retain the information that they were received from a different software revision.
The TL;DR here is that the fully normalized way I see of doing this becomes a real pain. I've got about 5 of these situations in my current project and 3 of them are nested inside of each other.
I see three ways of doing this:
The first is the above fully normalized way. In order to find out how to use the message on the front end application, one must find the latest entry into Device_SoftwareRevision_Records that is before the datetime of the given message. This gets really fiddly when you have a more complex database and application. Just to get the current SoftwareRevision_ID for a device you have to use a MAX GROUP BY type statement (I've ended up having to use views to simplify).
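For illustration, that lookup ends up being something like this (simplified, with made-up column names such as a ChangeDate on Device_SoftwareRevision_Records):

-- Hypothetical example: find the revision that was in effect for device 42
-- at the time a given message was received.
SELECT dsr.SoftwareRevision_ID
FROM Device_SoftwareRevision_Records dsr
WHERE dsr.Device_ID = 42
  AND dsr.ChangeDate = (
        SELECT MAX(ChangeDate)
        FROM Device_SoftwareRevision_Records
        WHERE Device_ID = 42
          AND ChangeDate <= '2015-06-01 12:00:00'  -- the message's timestamp
      );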
The second is to directly link the Message to the SoftwareVersion. This means you don't have to go through the whole MAX GROUP BY WHERE blah blah. The SoftwareVersion_ID is retrieved by a PHP script and then the message is entered. Of course, this is denormalized so now there is potential for duplicate data.
And here's our fully denormalized version. The Software_Revision_Records table is purely for bookkeeping purposes. It's easy to use for the front-end application but a pain to update at the back end. The back-end updating can actually be streamlined with triggers for entering into the Software_Revision_Records table, so the only thing that can really go wrong is the message getting the wrong software revision when it is entered.
Is there a better way of doing this that I have missed? Is it such a sin to denormalize the database in this situation? Will my decision here cause the business to erupt into flames (probably not)?
If the messages are tied to the software revision for that particular device, then it might make more sense to reflect that relationship in the data model. i.e. have a foreign key from Messages to Device_SoftwareRevision_Records rather than from Messages to Device. You still have the relationship from Messages to Device indirectly, it's normalised, and there's no messing around with dates trying to figure out which messages were created while a given software revision was in place.
In cases where you do need dates, it might also be worth considering having both a start and stop date, and filling in any null dates with something like 9999-12-31 (to indicate that a record has not yet been ended). You can easily find the latest record without needing to do a max. It will also make it a lot easier to query the table if you do need to compare it to other dates - you can just do a between on a single record. In this example, you'd just look for this:
where Message.TimeStamp between Device_SoftwareRevision_Records.StartDate and Device_SoftwareRevision_Records.EndDate
That said, I would still - if at all possible - change the model to relate Messages to the correct table rather than rely on dates. Being able to do simple joins will be quicker, more convenient, more obvious if anyone new needs to learn the structure, and is likely to perform better.
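As a rough sketch of that suggestion (using the table names from the question; the column names here are only illustrative, not a definitive schema):

-- Messages reference the device/revision record directly,
-- so no date arithmetic is needed to recover the revision.
CREATE TABLE Device_SoftwareRevision_Records (
    Record_ID           INT AUTO_INCREMENT PRIMARY KEY,
    Device_ID           INT NOT NULL,
    SoftwareRevision_ID INT NOT NULL,
    StartDate           DATETIME NOT NULL,
    EndDate             DATETIME NOT NULL DEFAULT '9999-12-31 23:59:59',
    FOREIGN KEY (Device_ID) REFERENCES Device (Device_ID),
    FOREIGN KEY (SoftwareRevision_ID) REFERENCES SoftwareRevision (SoftwareRevision_ID)
);

CREATE TABLE Message (
    Message_ID INT AUTO_INCREMENT PRIMARY KEY,
    Record_ID  INT NOT NULL,       -- which device/revision pairing produced this message
    TimeStamp  DATETIME NOT NULL,
    Body       TEXT,
    FOREIGN KEY (Record_ID) REFERENCES Device_SoftwareRevision_Records (Record_ID)
);

-- Recovering the revision (and device) for a message is then a plain join:
SELECT m.Message_ID, dsr.Device_ID, dsr.SoftwareRevision_ID
FROM Message m
JOIN Device_SoftwareRevision_Records dsr ON dsr.Record_ID = m.Record_ID;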

MySQL - Database design for a large-scale real-world deployment

I would love to hear some opinions or thoughts on a MySQL database design.
Basically, I have a Tomcat server which receives different types of data from about 1000 systems out in the field. Each of these systems is unique and will be reporting unique data.
The data sent can be categorized as frequent and infrequent data. The infrequent data is only sent about once a day and doesn't change much - it is basically just configuration-based data.
Frequent data is sent every 2-3 minutes while the system is turned on, and represents the current state of the system.
This data needs to be stored in the database for each system and be accessible at any given time from a PHP page. Essentially, for any system in the field, a PHP page needs to be able to access all the data on that client system and display it. In other words, the database needs to show the state of the system.
The information itself is all text-based, and there is a lot of it. The config data (that doesn't change much) is key-value pairs, and there are currently about 100 of them.
My idea for the design was to have 100+ columns and one row for each system to hold the config data. But I am worried about having that many columns, mainly because it isn't very future-proof if I need to add columns later. I am also worried about insert speed if I do it that way. This might blow out to a 2000-row x 200-column table that gets accessed about 100 times a second, so I need to cater for this in my initial design.
I am also wondering if there are any design philosophies out there that cater for frequently changing and seldom-changing data based on the storage engine. This would make sense, as I want to keep INSERT/UPDATE time low, and I don't care too much about the SELECT time from PHP.
I would also love to know how to split up the data. I.e. if frequently changing data can be categorised in a few different ways, should I have a bunch of tables representing the data and join them on selects? I am worried about this because I will probably have to make a report to show common properties between all systems (i.e. show all systems with a certain condition).
I hope I have provided enough information here for someone to point me in the right direction; any help on the matter would be great. Or if someone has done something similar and can offer advice, I would be very appreciative. Thanks heaps :)
~ Dan
I've posted some questions in a comment. It's hard to give you advice about your rapidly changing data without knowing more about what you're trying to do.
For your configuration data, don't use a 100-column table. Wide tables are notoriously hard to handle in production. Instead, use a four-column table containing these columns:
SYSTEM_ID VARCHAR System identifier
POSTTIME DATETIME The time the information was posted
NAME VARCHAR The name of the parameter
VALUE VARCHAR The value of the parameter
The first three of these columns are your composite primary key.
This design has the advantage that it grows (or shrinks) as you add to (or subtract from) your configuration parameter set. It also allows for the storing of historical data. That means new data points can be INSERTed rather than UPDATEd, which is faster. You can run a daily or weekly job to delete history you're no longer interested in keeping.
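A minimal sketch of that table (the table name and the VARCHAR sizes are just placeholders to adjust for your data):

CREATE TABLE system_config (
    SYSTEM_ID VARCHAR(64)  NOT NULL,   -- system identifier
    POSTTIME  DATETIME     NOT NULL,   -- when the value was posted
    NAME      VARCHAR(128) NOT NULL,   -- parameter name
    VALUE     VARCHAR(255) NULL,       -- parameter value
    PRIMARY KEY (SYSTEM_ID, POSTTIME, NAME)
) ENGINE=InnoDB;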
(Edit: if you really don't need history, get rid of the POSTTIME column and use MySQL's nice extension feature INSERT ... ON DUPLICATE KEY UPDATE when you post stuff. See http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html)
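For example, with POSTTIME dropped and the primary key reduced to (SYSTEM_ID, NAME), posting a value becomes a single statement (the values here are made up):

INSERT INTO system_config (SYSTEM_ID, NAME, VALUE)
VALUES ('unit-0042', 'firmware_channel', 'stable')
ON DUPLICATE KEY UPDATE VALUE = VALUES(VALUE);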
If your rapidly changing data is similar in form (name/value pairs) to your configuration data, you can use a similar schema to store it.
You may want to create a "current data" table using the MEMORY access method for this stuff. MEMORY tables are very fast to read and write because the data is all in RAM in your MySQL server. The downside is that a MySQL crash and restart will give you an empty table, with the previous contents lost. (MySQL servers crash very infrequently, but when they do they lose MEMORY table contents.)
You can run an occasional job (every few minutes or hours) to copy the contents of your MEMORY table to an on-disk table if you need to save history.
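A sketch of that setup, assuming the same name/value layout (table names are illustrative):

-- RAM-only table holding the latest values; very fast, but emptied on restart.
CREATE TABLE current_data (
    SYSTEM_ID VARCHAR(64)  NOT NULL,
    NAME      VARCHAR(128) NOT NULL,
    VALUE     VARCHAR(255) NULL,
    POSTTIME  DATETIME     NOT NULL,
    PRIMARY KEY (SYSTEM_ID, NAME)
) ENGINE=MEMORY;

-- Occasional job: copy a snapshot into an on-disk history table.
INSERT INTO data_history (SYSTEM_ID, NAME, VALUE, POSTTIME)
SELECT SYSTEM_ID, NAME, VALUE, POSTTIME
FROM current_data;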
(Edit: You might consider adding memcached http://memcached.org/ to your web application system in the future to handle a high read rate, rather than constructing a database design for version 1 that handles a high read rate. That way you can see which parts of your overall app design have trouble scaling. I wish somebody had convinced me to do this in the past, rather than overdesigning for early versions. )

Medium-term temporary tables - creating tables on the fly to last 15-30 days?

Context
I'm currently developing a tool for managing orders and communicating between technicians and services. The industrial context is broadcast and TV. Multiple clients expecting media files each made to their own specs imply widely varying workflows even within the restricted scope of a single client's orders.
One client can ask one day for a single SD file and the next for a full-blown HD package containing up to fourteen files... In a MySQL db I am trying to store accurate information about all the small tasks composing the workflow, in multiple forms:
DATETIME values every time a task is accomplished, for accurate tracking
paths to the newly created files in the company's file system in VARCHARs
archiving background info in TEXT values (info such as user comments, e.g. when an incident happens and prevents moving forward, they can comment about it in this feed)
Multiply that by 30 different file types and this is way too much for a single table. So I thought I'd break it up by client: one table per client, so that any order only ever requires the use of that one table, which doesn't manipulate more than 15 fields. Still, this is a pretty rigid solution when a client has 9 different transcoding specs and a particular order only requires one. I figure I'd need to add flag fields for each transcoding field to indicate which ones are required for that particular order.
Concept
I then had this crazy idea that maybe I could create a temporary table to last while the order is running (that can range from about 1 day to 1 month). We rarely have more than 25 orders running simultaneously so it wouldn't get too crowded.
The idea is to make a table tailored for each order, eliminating the need for flags and unnecessary, forever-empty fields. Once the order is complete, the table would get flushed, JSON-encoded, into a TEXT or BLOB so it can be restored later if changes need to be made.
Do you have experience with DBMSs (MySQL in particular) struggling under such practices, if this has ever been done? Does this sound like a viable option? I am happy to try (which I have already started), and I am seeking advice on whether to keep going or stop right here.
Thanks for your input!
Well, of course that is possible to do. However, you cannot use MySQL temporary tables for such long-term storage; you will have to use "normal" tables and have some clean-up routine...
However, I do not see why that amount of data would be too much for a single table. If your queries start to run slow due to the amount of data, then you should add some indexes to your database. I also see another con: it will be much harder to build reports later on. When you have 25 tables with the same kind of data, you will have to run 25 queries and merge the data.
I do not see the point, really. The same kinds of data should be in the same table.
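To make "the same kinds of data in the same table" concrete, here is one possible shape - purely a sketch with invented names - where a task that doesn't apply to an order simply has no row instead of an empty column:

CREATE TABLE order_tasks (
    task_id      INT AUTO_INCREMENT PRIMARY KEY,
    order_id     INT          NOT NULL,
    client_id    INT          NOT NULL,
    task_type    VARCHAR(64)  NOT NULL,   -- e.g. 'transcode_sd', 'transcode_hd_package'
    completed_at DATETIME     NULL,       -- set when the task is accomplished
    file_path    VARCHAR(512) NULL,       -- path to the newly created file
    comments     TEXT         NULL,       -- incident notes / user comments
    KEY idx_order (order_id),
    KEY idx_client_type (client_id, task_type)
);

Reports across all clients and orders then stay single-table queries (e.g. GROUP BY task_type), instead of a merge over 25 per-client or per-order tables.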

Using Plone 4 and pas.plugins.sqlalchemy with many users

I've been using pas.plugins.sqlalchemy to provide an RDBMS backend for authentication and member data storage, using MySQL. Authentication works perfectly and member data is correctly stored and retrieved in the RDBMS. There are currently over 20,000 users.
However, user enumeration takes ages. I have checked the "Many users" option in the Plone Control Panel / Users and Groups section, but even a simple user search takes a near-infinite amount of time. By debugging the plugin.py script I noticed that enumerateUsers() is called as many times as the number of users stored; therefore, an enormous amount of CPU time is needed to complete a simple search request, as the query is matched against each username, one user at a time, one query at a time.
Am I missing something here? Isn't pas.plugins.sqlalchemy useful especially when you have a very large number of users? Currently, I have the SQL plugin as top priority in my *acl_users/plugins/User Enumeration* setup. Should I change this?
I've pretty much inherited maintenance of pas.plugins.sqlalchemy - but I haven't personally used it for more than a handful of users yet. If you file a bug at https://github.com/auspex/pas.plugins.sqlalchemy/issues, I'll see what I can do.
I don't think it makes much difference in what order the enumeration occurs - it still has to enumerate all the users in the SQL db. So it either does them before the ones found in the ZODB, or after. It sounds as if the problem begins with Zope - calling enumerateUsers() once per user seems excessive - but even so, it shouldn't be necessary to make a request to the relational db per enumeration.
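For comparison, matching a search term against all stored users should be a single round trip to the SQL database rather than one query per user - something along these lines, where the table and column names are placeholders that depend on the plugin's actual mapping:

SELECT login, fullname
FROM users
WHERE login LIKE CONCAT('%', 'searchterm', '%')
   OR fullname LIKE CONCAT('%', 'searchterm', '%')
LIMIT 50;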

What's the best way to count views on a high-traffic site?

The way I currently do it in MySQL is:
UPDATE table SET hits=hits+1 WHERE id = 1;
This keeps live stats on the site but, as I understand it, isn't the best way to go about doing this.
Edit:
Let me clarify... this is for counting hits on specific item pages. I have a listing of movies, and I want to count how many views each movie page has gotten. After it +1s, it adds the movie ID to a session var, which stores the IDs of all the pages the user viewed. If the ID of the page is in that array, it won't +1 it.
If your traffic is high enough, you shouldn't hit the database on every request. Try keeping the count in memory and sync the database on a schedule (for example, update the database on every 1000 requests or every minute.)
You could take an approach similar to Stack Overflow's view count. It basically increments the counter when a tracking image is loaded. This has a few useful aspects:
Robots often don't download images, so these don't increment the view.
Browsers cache images, so when you go back to a page, you're not causing work for the server.
The potentially slower code is run async from the rest of the page. This doesn't slow down the page from being visible.
To optimize the updates:
* Keep the counter in a single, narrow table, with a clustered index on the key (see the sketch after this list).
* Have the table served up by a different database server / host.
* Use memcached and/or a queue to allow the write to either be delayed or run async.
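A sketch of that narrow counter table (the table and column names are made up; InnoDB clusters rows on the primary key, so lookups and updates by ID stay cheap):

CREATE TABLE movie_hits (
    movie_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- clustered index
    hits     INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

UPDATE movie_hits SET hits = hits + 1 WHERE movie_id = 1;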
If you don't need to display the view count in real time, then your best bet is to include the movie ID in your URL somewhere and use log scraping to populate the database at the end of the day.
Not sure which web server you are using, but if your web server logs the requests to the site - say, one line per request in a text file - then you could just count the lines in your log files.
Your solution has a major problem in that it locks the row in the database, so concurrent requests that update the same counter have to wait for each other.
It depends really on whether you want hits or views:
1 view from 1 IP = 1 person looking at a page
1 person refreshing the same page = multiple hits but only one view
I always prefer Google Analytics etc. for something like this. You also need to make sure that this DB update is only done once per view, or you could quite easily be flooded.
I'm not sure what you're using, but in Google App Engine you could set a cron job to automatically update the count every x minutes. I think you'd use memcache to save the counts until your cron job runs. Although... GAE does have some stat reporting, but you'd probably want to have your own data as well. I think you can use memcache on other systems, and set cron jobs on them too.
Use logging software. Google Analytics is pretty and feature-filled (and generates zero load on your servers), but it'll miss non-JavaScript hits. If every single hit is important, use a server log analyzer like webalizer or awstats.
In general with MySQL:
If you use a MyISAM table: there is a lock on the whole table, so you'd better do an INSERT into a separate table. Then, with a cron job, you UPDATE the values in your movie table.
If you use an InnoDB table: there is a lock on the row only, so you can UPDATE the value directly.
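To make the MyISAM variant concrete (table names are illustrative, and the movie table is assumed to have id and hits columns as in the question): append one row per view into a separate log table, then roll the counts up on a schedule:

-- Append-only log of raw views; plain INSERTs, no read-modify-write contention.
CREATE TABLE hit_log (
    movie_id  INT UNSIGNED NOT NULL,
    viewed_at DATETIME     NOT NULL
) ENGINE=MyISAM;

-- Cron job: fold the logged views into the counters...
UPDATE movie m
JOIN (SELECT movie_id, COUNT(*) AS n FROM hit_log GROUP BY movie_id) h
    ON h.movie_id = m.id
SET m.hits = m.hits + h.n;

-- ...then clear the log (in practice, delete only the rows you have already
-- counted, so views logged in between are not lost).
TRUNCATE TABLE hit_log;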
That said, depending on the "maturity" and success of your project, you may need to implement a different solution, so:
1st advice: benchmark, benchmark, benchmark.
2nd advice: Using the data from the 1st advice, identify the bottleneck and select the solution for the issue you actually face, not a future issue you think you might have.
Here is a great video on this: http://www.youtube.com/watch?v=ZW5_eEKEC28
Hope this helps. :)