I have an application that lets devices communicate over MQTT.
When two (or more) devices are paired, they are in a session (with a session-id).
The topics are for example:
session/<session-id>/<sender-id>/phase
with a payload like
{"phase": "start", "othervars": "examplevar"}
Every session is logged to a MySQL database in the following format:
| id | session-id | sender | topic (example: phase) | payload | entry-time | ...
Now, when I want to get a whole session, I can simply query by session-id.
Another view I want to achieve looks like this:
| session-id (distinct) | begin time | end time | duration | success |
Success is a boolean: true when the session contains an entry whose payload has "phase": "success"; otherwise the session is not successful.
Now I have the problem that this query is very slow. Every time I want to access it, it has to calculate for each session whether it was successful, along with the time calculations.
Should I make a script that runs at the end of a session to calculate this information and put it in another table? The problem I have with this solution is that I will have duplicate data.
Can I make this faster with indexes? Or did I just make a huge design mistake from the beginning?
Thanks in advance
Indexes? Yes. YES!
If session-id is unique, get rid of id and use PRIMARY KEY(session_id).
success could be TINYINT NOT NULL with values 0 or 1 for fail or success.
If the "payload" is coming in as JSON, then I suggest storing the entire string in a column for future reference, plus pull out any columns that you need to search on and index them. In later versions of MySQL, there is a JSON datatype, which could be useful.
Please provide some SELECTs so we can further advise.
Oh, did I mention how important indexes are to databases?
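To make that concrete, here is a rough, untested sketch. It assumes the log table is called session_log, the columns are named session_id / entry_time / payload (hyphenated names would need backticks), entry_time is a DATETIME, and payload holds valid JSON (MySQL 5.7+):

-- Composite index so per-session lookups and time scans are cheap.
ALTER TABLE session_log ADD INDEX idx_session_time (session_id, entry_time);

-- One row per session: begin/end/duration plus a success flag derived from
-- any entry whose payload contains "phase": "success".
SELECT
    session_id,
    MIN(entry_time)                                         AS begin_time,
    MAX(entry_time)                                         AS end_time,
    TIMESTAMPDIFF(SECOND, MIN(entry_time), MAX(entry_time)) AS duration_sec,
    COALESCE(MAX(JSON_UNQUOTE(JSON_EXTRACT(payload, '$.phase')) = 'success'), 0) AS success
FROM session_log
GROUP BY session_id;

-- Optionally, pull "phase" out into an indexed generated column so the
-- success check never has to parse JSON at query time.
ALTER TABLE session_log
    ADD COLUMN phase VARCHAR(20)
        AS (JSON_UNQUOTE(JSON_EXTRACT(payload, '$.phase'))) STORED;
ALTER TABLE session_log ADD INDEX idx_session_phase (session_id, phase);

If the summary over all sessions is still too slow at scale, maintaining a small summary table at session end (your "duplicate data" idea) is a common trade-off, but try the indexes first.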
I want to keep track of a user counter through time and be able to generate stats about how the counter changes over time.
I'm pretty set (although if there are better ways I would like to hear about them) on the two main tables, user and counter_change, which would look pretty much like this:
user:
+----+----------+
| id | username |
+----+----------+
| 1  | foo      |
| 2  | bar      |
+----+----------+
counter_change:
+---------+--------------------+------------+
| user_id | counter_change_val | epoch_time |
+---------+--------------------+------------+
| 1       | 10                 | 1513242884 |
| 1       | -1                 | 1513242889 |
+---------+--------------------+------------+
I want to be able to show the current counter value (with the base value being 0) at the frontend, as well as some stats through time (e.g. yesterday your net counter was +10 or -2, etc.).
I've thought about some possible solutions but none of them seem to be the perfect solution.
Add counter to user table (or in some new counters table):
This solution seems to be the most resource-effective: at the time of inserting a counter_change, update the counter in user with the counter_change_val.
Pros:
Getting the counter's current value would consume virtually no resources.
Cons:
The sum of counter_change_val could diverge from the counter in user if a bug occurs.
It couldn't really be used for stats fields, as that would require an additional query; at that point a trigger would be handier.
Add a calculated counter to user table (or in some new counters table) on insert/update:
This solution would consist of an SQL trigger, or some sort of function at ORM level, that updates the value on an insert to the counter_change table with the sum of the counter_change_val.
This would also be used for calculated fields that imply grouping by dates, for example getting the average daily change over the last 30 days.
Pros:
Getting the counter's current value would consume virtually no resources.
Cons:
On every insert, an aggregation of all the current user's counter_change rows would be needed.
Add a view or select with the sum of counter:
This solution would consist of creating a view or select that sums the counter_change_val values when needed (roughly like the sketch after this list).
Pros:
Adds no fields to the tables.
Cons:
As it is calculated at runtime, it would add to the request response time.
Every time the counter is consulted, an aggregation of the counter_change values would be needed.
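For reference, Option 3 boils down to queries roughly like the following (just a sketch against the two tables above, untested):

-- Current counter value for one user (base value 0 when there are no rows).
SELECT COALESCE(SUM(counter_change_val), 0) AS current_counter
FROM counter_change
WHERE user_id = 1;

-- Net change per day, e.g. for "yesterday your net counter was +10".
SELECT DATE(FROM_UNIXTIME(epoch_time)) AS day,
       SUM(counter_change_val)         AS net_change
FROM counter_change
WHERE user_id = 1
GROUP BY DATE(FROM_UNIXTIME(epoch_time))
ORDER BY day;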
Actually, I am not sure that I have understood what you are trying to do. Nevertheless, I would suggest Option 1 or Option 2:
Option 1 is efficient, and it is sufficiently safe against errors if it is done right. For example, you could wrap inserting the counter_change and computing the new counter_value in a transaction; this will prevent any inconsistencies. You could do that either in the back-end software or in a trigger (e.g. upon inserting a counter_change).
Regarding Option 2, it is not clear to me why an aggregation over all the counter_change rows of the current user would be needed. You can adjust the counter_value in the user table from within an insert trigger as with option 1, and you can use transactions to make it safe.
IMHO, adjusting the current counter_value upon every insert of a counter_change is the most efficient solution. You can do it either in the back-end software or from within a trigger. In both cases, use transactions.
Option 3 should not be used because it will put too much load on the system (assume you have 1000 counter_changes per user ...).
Regarding the statistics: This is a different problem from storing the data in the first place. You probably will need some sort of aggregation for any statistical data. To speed this up, you could think about caching results and things like that.
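To illustrate, here is a minimal sketch of the trigger variant, assuming user has a counter column that defaults to 0 (the names are illustrative, and counter_change rows are only ever inserted):

DELIMITER //
CREATE TRIGGER trg_counter_change_ai
AFTER INSERT ON counter_change
FOR EACH ROW
BEGIN
    -- Keep the denormalized counter in step with every inserted change.
    UPDATE user
    SET counter = counter + NEW.counter_change_val
    WHERE id = NEW.user_id;
END//
DELIMITER ;

With InnoDB the trigger runs inside the same transaction as the INSERT, so the counter and the change log cannot drift apart as long as counter_change rows are never modified outside of plain inserts.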
I'm working on a URL shortener project with PHP & MySQL which tracks visits to each URL. I've provided a table for visits which mainly consists of these properties:
time_in_second | country | referrer | os    | browser | device  | url_id
########################################################################
1348128639     | US      | direct   | win   | chrome  | mobile  | 3404
1348128654     | US      | google   | linux | chrome  | desktop | 3404
1348124567     | UK      | twitter  | mac   | mozila  | desktop | 3404
1348127653     | IND     | direct   | win   | IE      | desktop | 3465
Now I want to query this table. For example, I want to get the visit data for the URL with url_id=3404. Because I need to provide statistics and draw graphs for this URL, I need this data:
Number of visits from each kind of OS for this URL, for example 20 Windows, 15 Linux, ...
Number of visits in each desired period of time, for example every 10 minutes over the past 24 hours
Number of visits for each country
...
As you see, some data like country may take lots of different values.
One idea I can imagine is to write a query which outputs the number of each unique value in each column; for example, in the country case for the data given above, one column for num_US, one for num_UK, and one for num_IND.
Now the question is how to implement such a high-performance query in SQL (MySQL)?
Also if you think this is not an efficient query for performance, what's your suggestion?
Any help will be appreciated deeply.
UPDATE: Look at this question: SQL; Only count the values specified in each column. I think that question is similar to mine, but the difference is in the variety of values possible for each column (as lots of values are possible for the country property), which makes the query more complex.
It looks like you need to do more than one query. You probably could write one query with different parameters, but that would make it complex and hard to maintain. I would approach it as multiple small queries: for each requirement, make a query and call them separately. For example, if you want the country query you mentioned, you could do the following:
SELECT country, COUNT(*) FROM <TABLE_NAME> WHERE url_id = 3404 GROUP BY country
By the way, I have not tested this query, so it may be inaccurate, but this is just to give you an idea. I hope this helps.
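In the same spirit, the other breakdowns are just more small GROUP BY queries, something along these lines (also untested; I'm assuming the table is named visits):

-- Visits per OS for one URL.
SELECT os, COUNT(*) AS visits
FROM visits
WHERE url_id = 3404
GROUP BY os;

-- Visits per 10-minute bucket over the past 24 hours
-- (time_in_second is a Unix timestamp; 600 seconds per bucket).
SELECT FROM_UNIXTIME(time_in_second - (time_in_second % 600)) AS bucket_start,
       COUNT(*) AS visits
FROM visits
WHERE url_id = 3404
  AND time_in_second >= UNIX_TIMESTAMP() - 86400
GROUP BY bucket_start
ORDER BY bucket_start;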
Also, another suggestion is to use Google Analytics, look into it, they do have a lot of what you already are implementing, maybe that helps as well.
Cheers.
Each of these graphs you want to draw represents a separate relation, so my off-the-cuff response is that you can't build a single query that gives you exactly the data you need for every graph you want to draw.
From this point, your choices are:
Use different queries for different graphs
Send a bunch of data to the client and let it do the required post-processing to create the exact sets of data it needs for different graphs
Farm it all out to Google Analytics (a la #wahab-mirjan)
If you go with option 2 you can minimize the amount of data you send by counting hits per (10-minute, os, browser, device, url_id) tuple. This essentially removes all duplicate rows and gives you a count. The client software would take these numbers and further reduce them by country (or whatever) to get the numbers it needs for a graph. To be honest though, I think you're buying yourself extra complexity for not very much gain.
If you insist on doing this yourself (instead of using a service), then go with a different query for each kind of graph. Start with a couple of reasonable indexes (url_id and time_in_second are obvious starting points). Use the EXPLAIN statement (or whatever your database provides) to understand how each query is executed.
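For example, a composite index plus an EXPLAIN check might look like this (a sketch; I'm assuming the table is named visits):

-- One composite index covers the url_id filter and the time-range scans.
ALTER TABLE visits ADD INDEX idx_url_time (url_id, time_in_second);

-- Then check how MySQL actually executes each per-graph query.
EXPLAIN
SELECT country, COUNT(*) AS visits
FROM visits
WHERE url_id = 3404
GROUP BY country;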
Sorry, I am new to Stack Overflow and having a problem with comment formatting. Here is my answer again; hopefully it works now:
I'm not sure how it is poor in performance. The way I am thinking, you will end up with a table that looks like this:
country | count
###############
US      | 304
UK      | 123
IND     | 23
So when you group by country and count, it will be one query. I think this will get you going in the right direction. In any case, it is just an opinion, so if you find another approach, I am interested in knowing it as well.
Apologies about the comment mess-up up there.
Cheers
I'm trying to index a very small (6-row) table into Solr, and it says that it has added/updated 6 documents, but it doesn't return anything when I search for a field. My table is as follows:
League:
field      | type   |
---------------------
id         | int    |
leaguename | string |
and here is what Solr prints when I try to do the full-import:
status: idle | config: data-config.xml | command: full-import
Indexing completed. Added/Updated: 6 documents. Deleted 0 documents.
2011-07-13 19:11:42 (time taken 0:0:0.120)
This response format is experimental. It is likely to change in the future.
Is there a way that I can view the values that the index is holding? I've tried looking in the data folder in Solr, but all the files just seem to have strange non-alphanumeric characters in them.
Luke is a desktop app which allows you to examine an index, run queries and generally muck around.
If the index is remote you will first need to transfer it to your desktop, then just open it in Luke.
http://code.google.com/p/luke/
Luke rocks!
Assuming you are running Solr on localhost and port 8983 as per the standard example,
you can do a wildcard query like
http://localhost:8983/solr/select?q=*:*
This will return all the documents with all stored fields.
From the admin screen, query a wildcard on whatever your unique key field is:
Uniquekeyfieldname:*
That will get you a count to see if something got indexed. If you want to see all fields too, then specify the field list at the end of the query string:
&fl=*
I have the following data which I want to save in my DB (this is used for sending text messages via a 3rd party API)
text_id, text_message, text_time, (array)text_contacts
text_contacts contains a normal array with all the contact_id's
How should I properly store the data in a MySQL database?
I was thinking of two ways myself:
Make the array with contact_id's a json_encoded string (no need for serializing since it's not multi-dimensional), and store it in a text field in the DB
Make a second table with the text_id and each contact_id on a new row.
note: The data stored in the text_contacts array does not need to be changed at any time.
note2: The data is used as individual contact_ids to get the phone number of each contact, and to check whether the text message has actually been sent (with a combination of text_id and phone number).
What is more efficient, and why?
This is completely dependent upon your expected usage characteristics. If you will have a near-term need to query based upon the contact_ids, then store them independently as in your second solution. If you're storing them for archival purposes, and don't expect them to be used dynamically, you're as well off saving the time and storing them in a JSON string. It's all about the usage.
IMO, go with the second table, mapping text-ids to contact-ids. Will be easier to manipulate than storing all the contacts in one field
This topic will bring in quite a few opinions, but my belief: second table, by all means (see the sketch after this list).
If you ever have a case where you actually need to search by that data, it will not require you to parse it before using it.
It is a heck of a lot easier to debug (for the same reason)
json_encode and json_decode (or equivalent) take far more time than a join does.
Lazy loading is easier, even if not necessary in most cases.
Others will find it more readable and, with a good schema definition, easier to conceptualize and maintain.
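To make the second-table option concrete, here is a rough sketch of the schema and the lookup it enables (table and column names are illustrative, not a prescribed layout):

-- One row per message, plus one row per (message, contact) pair.
CREATE TABLE text_message (
    text_id      INT UNSIGNED NOT NULL PRIMARY KEY,
    text_message TEXT         NOT NULL,
    text_time    DATETIME     NOT NULL
);

CREATE TABLE text_message_contact (
    text_id    INT UNSIGNED NOT NULL,
    contact_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (text_id, contact_id),
    KEY idx_contact (contact_id)
);

-- All messages sent to a given contact, no JSON parsing required.
SELECT m.text_id, m.text_message, m.text_time
FROM text_message m
JOIN text_message_contact mc ON mc.text_id = m.text_id
WHERE mc.contact_id = 1;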
Almost all implementations would use one table for storing the contacts (text_contacts), and then a second table would use a foreign key to reference the text_contacts table. So, say you had a text_contacts table that looked like this:
contact_id | name
1 | someone
2 | someone_else
And a text message table that looked like this:
text_id | text_message | text_time | text_contact
1 | "Hey" | 12:48 | 1
2 | "Hey" | 12:48 | 2
Each contact that has been sent a message gets a new entry in the text message table, with the last column referencing the contact_id field of the text_contacts table. This makes it much easier to retrieve messages by contact, because you can say "select * from text_messages where text_contact = 1" instead of searching through the arrays in the single-table design to find the messages sent to a specific contact.
What's the best storage mechanism (in terms of which database to use and how to store all the records) for a system built to track whois record changes? The program will be run once a day, and it should keep track of what the previous value was and what the new value is.
Suggestions on the database, and thoughts on how to store the different records/fields so that data is not redundant/duplicated, would be appreciated.
(Added) My thoughts on one mechanism to store the data:
Example case showing the sale of one domain, "sample.com", from PersonA to PersonB on 1/1/2010:
Table_DomainNames
DomainId | DomainName
1        | example.com
2        | sample.com
Table_ChangeTrack
DomainId | DateTime | RegistrarId | RegistrantId | (others)
2        | 1/1/2009 | 1           | 1            |
2        | 1/1/2010 | 2           | 2            |
Table_Registrars
RegistrarId | RegistrarName
1           | GoDaddy
2           | 1&1
Table_Registrants
RegistrantId | RegistrantName
1            | PersonA
2            | PersonB
All tables are "append-only". Does this model make sense? Table_ChangeTrack should be "added to" only when there is any change in ANY of the monitored fields.
Is there any way of making this more efficient / tighter from a size point of view?
The primary data is the existence of, or changes to, the whois records. This suggests that your primary table be:
<id, domain, effective_date, detail_id>
where the detail_id points to actual whois data, likely normalized itself:
<detail_id, registrar_id, admin_id, tech_id, ...>
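In MySQL terms, that could be sketched roughly as follows (the column names are illustrative):

CREATE TABLE whois_observation (
    id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    domain         VARCHAR(255) NOT NULL,
    effective_date DATE         NOT NULL,
    detail_id      INT UNSIGNED NOT NULL,
    KEY idx_domain_date (domain, effective_date)
);

CREATE TABLE whois_detail (
    detail_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    registrar_id INT UNSIGNED NOT NULL,
    admin_id     INT UNSIGNED NOT NULL,
    tech_id      INT UNSIGNED NOT NULL
    -- ... further normalized references (nameservers, status, etc.)
);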
But do note that most registrars consider the information their property (whether it is or not) and have warnings like:
TERMS OF USE: You are not authorized to access or query our Whois database through the use of electronic processes that are high-volume and automated except as reasonably necessary to register domain names or modify existing registrations...
From which you can expect that they'll cut you off if you read their databases too much.
You could
store the checksum of a normalized form of the whois record data fields for comparison (see the sketch below).
store the original and current version of the data (possibly in compressed form), if required.
store diffs of each detected change (possibly in compressed form), if required.
It is much like how incremental backup systems work. Maybe you can get further inspiration from there.
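For instance, the checksum idea can be as simple as storing an MD5 of the normalized fields with each change row and only inserting a new row when today's value differs (a sketch; the Checksum column is hypothetical):

-- Add a checksum of the normalized fields to each change row.
ALTER TABLE Table_ChangeTrack ADD COLUMN Checksum CHAR(32) NOT NULL DEFAULT '';

-- On the daily run, compute today's checksum in the application, e.g.
--   MD5(CONCAT_WS('|', RegistrarId, RegistrantId /* , other fields */))
-- and compare it with the most recent stored value for the domain:
SELECT Checksum
FROM Table_ChangeTrack
WHERE DomainId = 2
ORDER BY `DateTime` DESC
LIMIT 1;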
You can write VBScript in an Excel file to go out and query a web page (in this case, the particular whois URL for a specific site) and then store the results back to a worksheet in Excel.