I will make my question very simple.
I have a Ruby on Rails app backed by MySQL.
When I click a button on page-1, it goes to page-2 and lists a table of 10 company names.
This list of ten companies is randomly generated (based on the logic behind the button clicked on page-1) from a COMPANIES table which has 10k company names.
How do I count the number of times each COMPANY name is displayed on page-2 over the course of a day?
Example: at the end of day 1
COMPANY_NAME | COUNT
A | 2300
B | 100
C | 500
D | 10000
Now, from the research I have done, raw inserts on every page view would be costly, and I learned there are two common ways to do it:
1) Open a Unix file and write into it. At the end of the day, INSERT the contents into the database. Negative: if the file is accessed concurrently, it will lead to lock issues.
2) Memcache the counts and bulk insert them into the DB.
What is the best way to do this in Rails?
Are there any other ways to do this?
Use Redis. Redis is good for maintaining in-memory data, and it has atomic increments (and other useful data structures).
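A minimal sketch of the end-of-day flush half of that approach, assuming a hypothetical daily_company_counts table; during the day each page-2 render would do a Redis INCR on a per-company key, and a nightly job would read those counters back once and write them in a single bulk statement:

    -- Hypothetical table for the daily totals (all names here are assumptions).
    CREATE TABLE daily_company_counts (
      company_name  VARCHAR(255) NOT NULL,
      day           DATE         NOT NULL,
      display_count INT UNSIGNED NOT NULL DEFAULT 0,
      PRIMARY KEY (company_name, day)
    );

    -- Nightly flush: one bulk statement built from the Redis counters.
    INSERT INTO daily_company_counts (company_name, day, display_count)
    VALUES ('A', '2013-05-01', 2300),
           ('B', '2013-05-01', 100),
           ('C', '2013-05-01', 500)
    ON DUPLICATE KEY UPDATE display_count = display_count + VALUES(display_count);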
Related
I have a theory question. I have an Access database and I want to track cost by task. Currently I have a task tracker table that stores the user's Hours | HourlyRate and Overtime | OvertimeRate, among other things (work order no., project no., etc.). I don't think this is the best way to store the data, as users could look at the table and see each other's rates; before now it didn't matter much, but I'm about to give this database to more users. I was thinking of putting the rate data in a separate table linked to the ID of the task table and not allowing users access to it, but then I couldn't do an AfterUpdate event, as the user won't have access to write to that table. Either that, or store the rates in a separate database with a start and end date for each given rate. For instance:
Ed | Rate $0.01 | StartDate 01/01/1999 | EndDate 12/31/1999
Ed | Rate $0.02 | StartDate 01/01/2000 | EndDate 12/31/2000
This way I can store the costing data in a separate database that the users don't have access to, and just calculate the cost information whenever I need it based on the date and the unique user ID. I was wondering what solutions others have come up with in MS Access for this type of situation.
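As an illustration of what the date-ranged rate table buys you, here is a hedged sketch of the cost lookup (Tasks, Rates, and every column name below are assumptions, not from the question; in Access this could live in a saved query):

    SELECT t.UserID,
           SUM(t.Hours * r.Rate) AS TaskCost
    FROM Tasks AS t, Rates AS r
    WHERE r.UserID = t.UserID
      AND t.WorkDate BETWEEN r.StartDate AND r.EndDate
    GROUP BY t.UserID;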
If the probability of losing a value (field) in a single update (or simply over some period of time) is greater than zero, then it is certain that at some point in the future the data will be lost.
But how relevant is this theoretical conclusion in practice?
I need a database that stores the "user likes" of a certain thing, with a table design of ID | THING | LIKES (int). In addition to that, there will be a table storing every single like by a user, with a design of ID | USER | THING.
When the number of likes of a certain THING has to be displayed, it would be too slow to count every row of the second table WHERE THING = $value, so I would just look up LIKES in the first table, and whenever a user likes a thing I would increase the number of LIKES by 1 (as in the theoretical question above).
I wouldn't worry about writing data from a "false values" point of view; most databases I know of guarantee the ACID properties.
Counting is of course slower than already having the count available to access via a keyed index.
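A short sketch of keeping the cached counter in step with the per-user rows, assuming hypothetical table names things (ID | THING | LIKES) and user_likes (ID | USER | THING); doing both statements in one transaction keeps the denormalized count consistent:

    START TRANSACTION;
    -- Record the individual like.
    INSERT INTO user_likes (user_id, thing_id) VALUES (42, 7);
    -- Bump the cached counter for that thing.
    UPDATE things SET likes = likes + 1 WHERE id = 7;
    COMMIT;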
I've developed an iPhone app that allows users to send/receive files to/from a server.
At this point I wish to add a database to my server-side application (sockets in C#) in order to keep track of the following:
a personal card (name, age, email, etc.) - a user can (but isn't obligated to) fill one out
the number of files a user has sent and received so far
app stats, which are sent every once in a while and contain info such as the number of errors in the app, the user's OS version, etc.
the number of total files sent/received in the past hour (for all users)
each user has a unique 36-digit hex number "AF41-FB11-.....-FFFF"
The DB should provide the following answers: which users are "heavy users", how many files were exchanged in the past day/hour/month, and whether there is a correlation between OS and the number of errors.
Unfortunately I'm not very familiar with DB design, so I thought about the following:
a users table, which will contain:
uniqueUserID | Personal Card | Files Sent | Files Received
an app stats table (each user can have many records):
uniqueUserID | number_of_errors | OS_version | .... | submission_date_time
a general stats table (a new record added every hour):
total_files_received_in_the_last_hour | total_files_sent_in_the_last_hour | submission_date_time
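A hedged sketch of what the users table and one of the reporting queries might look like in MySQL/InnoDB, using an auto-increment surrogate primary key while keeping the 36-character hex id as a unique secondary key (all names beyond those in the question are assumptions):

    CREATE TABLE users (
      user_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
      device_hex_id  CHAR(36)     NOT NULL,   -- the "AF41-FB11-.....-FFFF" identifier
      name           VARCHAR(100) NULL,       -- personal card fields are optional
      age            TINYINT UNSIGNED NULL,
      email          VARCHAR(255) NULL,
      files_sent     INT UNSIGNED NOT NULL DEFAULT 0,
      files_received INT UNSIGNED NOT NULL DEFAULT 0,
      PRIMARY KEY (user_id),
      UNIQUE KEY uk_device_hex_id (device_hex_id)
    ) ENGINE=InnoDB;

    -- One of the questions the schema should answer
    -- (correlation between OS version and errors), assuming an app_stats table:
    SELECT OS_version, AVG(number_of_errors) AS avg_errors
    FROM app_stats
    GROUP BY OS_version;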
My questions are:
Performance-wise, does it make sense to collect and store data per user inside the server-side application and toss it all into the DB once an hour (e.g. open a connection, UPDATE/INSERT fields, close the connection)? Or should I simply write to the DB for each transaction (send/receive file) a user performs, at the time he performs it?
Should I create a different primary key, other than the 36-digit id?
Does this design make sense?
I'm using MySQL 5.1 with InnoDB; the DBMS is on the same machine as the server-side app.
Any insights will be helpful!
I'm designing a statistics tracking system for a sales organization that manages 300+ remote sales locations around the world. The system receives daily reports on sales figures (raw dollar values, and info-stats such as how many of X item were sold, etc.).
I'm using MAMP to build the system.
I'm planning on storing these figures in one big MySQL table, where each row is one day's statistics from one location. Here is a sample:
------------------------------------------------------------------
| LocationID | Date | Sales$ | Item1Sold | Item2Sold | Item3Sold |
------------------------------------------------------------------
| Hawaii     | 3/4  | 100    | 2         | 3         | 4         |
| Turkey     | 3/4  | 200    | 1         | 5         | 9         |
------------------------------------------------------------------
Because the organization will potentially receive a statistics update from each of 300 locations on a daily basis, I am estimating that within a month the table will have 9,000 records and within a year around 108,000. MySQL table partitioning based on the year should therefore keep queries in the 100,000 record range, which I think will allow steady performance over time.
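A hedged sketch of year-based range partitioning for such a table (table and column names are assumptions; note that in MySQL the partitioning expression must use columns that are part of the primary key, which is why report_date is included in it):

    CREATE TABLE daily_sales (
      location_id INT UNSIGNED  NOT NULL,   -- assumes a numeric id per location
      report_date DATE          NOT NULL,
      sales_usd   DECIMAL(12,2) NOT NULL,
      item1_sold  INT UNSIGNED  NOT NULL DEFAULT 0,
      item2_sold  INT UNSIGNED  NOT NULL DEFAULT 0,
      item3_sold  INT UNSIGNED  NOT NULL DEFAULT 0,
      PRIMARY KEY (location_id, report_date)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (YEAR(report_date)) (
      PARTITION p2011 VALUES LESS THAN (2012),
      PARTITION p2012 VALUES LESS THAN (2013),
      PARTITION pmax  VALUES LESS THAN MAXVALUE
    );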
(If anyone sees a problem with the theories in my 'background data' above, feel free to mention it, as I have no experience building a large-scale database and this is simply what I have gathered from searching around the net.)
Now, the front end of this system is web-based and primarily PHP. I plan on using the YUI framework I found online to display graph information.
What the organization needs to see is daily/weekly graphs of the sales figures of their remote locations, along with 'breakdown' statistics such as items sold (so you can "drill down" into a monetary graph and see what percentage of that income came from item X).
So if I have the statistics by LocationID, it's a fairly simple matter to organize this information by continent. If the system needs to display a graph of the sales figures for all locations in Europe, I can run a query that JOINs a dimension table mapping each LocationID to its "continent" category, sum the figures by date, and display them on the graph. Or, to display weekly information, sum all of the daily reports in a given week and return them to my JS graph object as a JSON array, voila. Pretty simple stuff as far as I can see.
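A sketch of that kind of rollup query, assuming hypothetical daily_sales (location_id, report_date, sales_usd, ...) and locations (location_id, continent) tables:

    -- Weekly sales totals for one continent, summed from the daily fact rows.
    SELECT YEARWEEK(ds.report_date) AS year_week,
           SUM(ds.sales_usd)        AS weekly_sales
    FROM daily_sales AS ds
    INNER JOIN locations AS l ON l.location_id = ds.location_id
    WHERE l.continent = 'Europe'
    GROUP BY YEARWEEK(ds.report_date)
    ORDER BY year_week;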
Now, my thought was to create "summary" tables for these common queries. When the user wants to pull up the last 3 months of sales for Africa, and the query has to go all the way down to the daily level, with various WHERE and JOIN clauses, to sum up the appropriate LocationIDs' figures on a weekly basis and then display them to the user... well, it just seemed more efficient to have a less granular table. Such a table would need to be automatically updated whenever new daily reports land in the main table.
Here's the sort of hierarchy of data that would then need to exist:
1) Daily Figures by Location
2) Daily Figures by Continent based on Daily Figures by Location
3) Daily Figures for Planet based on Daily Figures by Continent
4) Weekly Figures by Location based on Daily Figures by Location
5) Weekly Figures By Continent based on Weekly Figures by Location
6) Weekly Figures for Planet based on Weekly Figures by Continent
So we have a kind of tree here, with the most granular information at the bottom (in one table, admittedly) and a series of less and less granular tables so that it is easier to fetch the data for long-term queries (partitioning the Daily Figures table by year will be useless if it receives queries for 3 years of weekly figures for the planet).
Now, first question: is this necessary at all? Is there a better way to achieve broad-scale query efficiency in the scenario I'm describing?
Assuming there is no particularly better way to do this, how should I go about it?
I discovered MySQL triggers, which to me would seem capable of 'cascading the updates', as it were. After an INSERT into the Daily Figures table, a trigger could theoretically read the inserted record and, based on its values, UPDATE the appropriate record of the higher-level table. For example, $100 made in Georgia on April 12th would prompt the United States table's 'April 10th-April 17th' record to be updated with a SUM of all of the daily records in that range, which would of course pick up the newly entered $100, so the new value would be correct.
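A hedged sketch of one such trigger, with assumed table names (daily_sales, locations, and a weekly_continent_sales summary table with a unique key on continent + week); as a variation on the description above, it adds the new row's amount to the matching weekly record instead of re-running the full SUM:

    -- Assumes: daily_sales(location_id, report_date, sales_usd, ...),
    --          locations(location_id, continent),
    --          weekly_continent_sales(continent, year_week, weekly_sales)
    --          with a UNIQUE KEY on (continent, year_week).
    CREATE TRIGGER trg_daily_sales_after_insert
    AFTER INSERT ON daily_sales
    FOR EACH ROW
      INSERT INTO weekly_continent_sales (continent, year_week, weekly_sales)
      SELECT l.continent, YEARWEEK(NEW.report_date), NEW.sales_usd
      FROM locations AS l
      WHERE l.location_id = NEW.location_id
      ON DUPLICATE KEY UPDATE weekly_sales = weekly_sales + NEW.sales_usd;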
Okay, so that's theoretically possible, but it seems too hard-coded. I want to build the system so that the organization can add/remove locations and set which continent they are in, which would mean the triggers would have to be reconfigured to include each new LocationID. Since MySQL doesn't allow multiple triggers for a given event and table, I would have to either store the trigger logic separately or extract it from the trigger object, then parse the particular rule being added or removed in or out, or keep an external array that I manage with PHP before this step, or... basically, a ton of annoying work.
While MySQL triggers initially seemed like my salvation, the more I look into how tricky it will be to implement them in the way I need, the more it seems like I am totally off the mark in how I am going about this, so I wanted to get some feedback from more experienced database people.
While I would appreciate intelligent answers with technical advice on how to accomplish what I'm trying to do, I will more deeply appreciate wise answers that explain the correct action (even if it's what I'm doing) and why it is correct.
What's the best storage mechanism (in terms of the database to use and the way to store all the records) for a system built to track whois record changes? The program will be run once a day, and a record should be kept of what the previous value was and what the new value is.
Suggestions on the database, and thoughts on how to store the different records/fields so that data is not redundant/duplicated, are welcome.
(Added) My thoughts on one mechanism to store the data.
Example case showing the sale of one domain, "sample.com", from PersonA to PersonB on 1/1/2010:
Table_DomainNames
DomainId | DomainName
1        | example.com
2        | sample.com

Table_ChangeTrack
DomainId | DateTime | RegistrarId | RegistrantId | (others)
2        | 1/1/2009 | 1           | 1
2        | 1/1/2010 | 2           | 2

Table_Registrars
RegistrarId | RegistrarName
1           | GoDaddy
2           | 1&1

Table_Registrants
RegistrantId | RegistrantName
1            | PersonA
2            | PersonB
All tables are "append-only". Does this model make sense? Table_ChangeTrack should be "added to" only when there is any change in ANY of the monitored fields.
Is there any way of making this more efficient / tighter from the size point-of-view??
The primary data is the existence of, or changes to, the whois records. This suggests that your primary table should be:
<id, domain, effective_date, detail_id>
where the detail_id points to actual whois data, likely normalized itself:
<detail_id, registrar_id, admin_id, tech_id, ...>
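A hedged sketch of those two tables, written in MySQL purely for illustration (column names beyond the ones listed above are assumptions):

    CREATE TABLE whois_records (
      id             INT UNSIGNED NOT NULL AUTO_INCREMENT,
      domain         VARCHAR(255) NOT NULL,
      effective_date DATE         NOT NULL,
      detail_id      INT UNSIGNED NOT NULL,   -- points to the normalized whois data
      PRIMARY KEY (id),
      KEY idx_domain_date (domain, effective_date)
    );

    CREATE TABLE whois_details (
      detail_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
      registrar_id INT UNSIGNED NOT NULL,
      admin_id     INT UNSIGNED NOT NULL,
      tech_id      INT UNSIGNED NOT NULL,
      PRIMARY KEY (detail_id)
    );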
But do note that most registrars consider the information their property (whether it is or not) and have warnings like:
TERMS OF USE: You are not authorized to access or query our Whois database through the use of electronic processes that are high-volume and automated except as reasonably necessary to register domain names or modify existing registrations...
From which you can expect that they'll cut you off if you read their databases too much.
You could:
store a checksum of a normalized form of the whois record data fields for comparison.
store the original and current versions of the data (possibly in compressed form), if required.
store diffs of each detected change (possibly in compressed form), if required.
It is much like how incremental backup systems work. Maybe you can get further inspiration from there.
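For instance, if the records ended up in MySQL, the checksum approach above could look roughly like this (table and column names are assumptions; SHA2() needs MySQL 5.5+, with MD5()/SHA1() as older alternatives):

    -- One row per daily snapshot, keyed by domain and check time.
    CREATE TABLE whois_snapshots (
      domain     VARCHAR(255) NOT NULL,
      checked_at DATETIME     NOT NULL,
      checksum   CHAR(64)     NOT NULL,   -- hex SHA-256 of the normalized record
      PRIMARY KEY (domain, checked_at)
    );

    -- Store a hash of a normalized form of the record's fields.
    INSERT INTO whois_snapshots (domain, checked_at, checksum)
    VALUES ('sample.com', NOW(),
            SHA2(CONCAT_WS('|', 'GoDaddy', 'PersonB' /* , other normalized fields */), 256));

    -- A change has occurred when the two most recent checksums for a domain differ.
    SELECT checked_at, checksum
    FROM whois_snapshots
    WHERE domain = 'sample.com'
    ORDER BY checked_at DESC
    LIMIT 2;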
You can write VBScript in an Excel file to go out and query a web page (in this case, the particular whois URL for a specific site) and then store the results back to a worksheet in Excel.