Execute run-time query vs. storing results in database - MySQL

I am working on a very large database. Due to the privacy policy I can't share the original schema, but I can give a similar example.
Example
Let's say I have a product table. It is the main table. It has many details, and for those individual details I created other tables such as product_detail_1, product_detail_2, product_detail_3, up to product_detail_10. These detail tables contain many different values.
Now all these tables have a foreign key which refers to the primary key in the product table. So if I want any particular data, it's very easy to retrieve once I have the key.
Now my question is: if I want to display results based on particular details, which method should I follow?
Method 1:
I run the query every time I want to search.
In this case the query will be quite long, because it is likely to contain 4 to 7 WHERE clauses on different detail tables, so it will take some time to execute (a sketch of such a query is at the end of this question).
OR
Method 2:
I store the result of every search query in a new table.
The benefit of this technique is:
- If the same query is executed again, the result is already available, so it won't take much time.
So if I store the query result in a column with the TEXT datatype, is there any limitation on the column's length?
Or should I try method 1?
I'm using a MySQL database.
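For reference, a single Method 1 search would look roughly like this (table and column names here are simplified placeholders, not my real schema):

SELECT p.*
FROM product p
JOIN product_detail_1 d1 ON d1.product_id = p.id
JOIN product_detail_2 d2 ON d2.product_id = p.id
JOIN product_detail_3 d3 ON d3.product_id = p.id
WHERE d1.color = 'red'
  AND d2.weight < 500
  AND d3.origin = 'EU';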

Related

Filtering a query by another query while allowing record input

I have 2 queries, A and B.
Query A has several columns of data and B has only 1 column. When I link A & B I get exactly what I want (filtered records of A).
However, I still want to be able to input new data through the query; how do I do this?
Ok then :)
Question was how to make a query with JOINs updateable.
See: Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables
Reasons why a Query or Recordset is not Updateable
There are many reasons why your data may not be updateable. Some are pretty obvious:
The query is a Totals query (uses GROUP BY) or Crosstab query (uses TRANSFORM), so the records aren't individual records
The field is a calculated field, so it can't be edited
You don't have permissions/rights to edit the table or database
The query uses VBA functions or user defined functions and the database isn't enabled (trusted) to allow code to run
Some reasons are less obvious but can't be avoided:
Linked tables without a primary key for certain backend databases (e.g. SQL Server). Access/Jet requires the table to be keyed to make any changes. This makes sense, since Access wants to issue a SQL query for modifications but can't uniquely identify the record.
Less obvious are these situations:
Queries where some fields are summaries linked to individual records; even the individual records still can't be edited
Queries with multi-table joins that aren't on key fields
Union queries
Another resource: http://allenbrowne.com/ser-61.html
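A rough sketch of that temporary-table approach, assuming queries A and B join on an ID column (every name here is illustrative, not taken from the article):

SELECT A.* INTO tmpEditable
FROM A INNER JOIN B ON A.ID = B.ID;

You then edit (and add) records in tmpEditable, which is a plain table and therefore updateable, and afterwards push the changes back, for example:

UPDATE A INNER JOIN tmpEditable AS T ON A.ID = T.ID
SET A.SomeField = T.SomeField;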

Redshift Usage - 1 row by 400 columns per user or (20-400) rows by 4 columns per user

We are building an analytics engine which has to store an attribute preference score for each user. We are expecting 400 attributes and they may change (at what frequency is not yet known). We are planning to store this in Redshift.
My qs is:
Should we store 1 row per user with 400 cols (1 column for each attribute),
or should we go for a table structure like
(uid, attribute id, attribute value, preference score), which will be (20-400) rows by 4 columns?
Which kind of storage would lead to better performance in Redshift?
Should we really consider NoSQL for this?
Note:
1. This is a backend for a real-time application with an increasing number of users.
2. For processing, the above table has to be read with the entire information of all attributes for one user, i.e. indirectly creating a 1×400 matrix at runtime.
Please help me decide which design would be ideal for such a use case. Thank you.
You can go for tables like the ones given in this example and then use bitwise functions.
The bitwise functions are described here: http://docs.aws.amazon.com/redshift/latest/dg/r_bitwise_examples.html
For your problem, I would suggest a two-table design. It's more pain in the beginning but will help in the future.
The first table would be a key-value table, which would store all the base data and would be future-proof: you can add/remove attributes and this table will continue working.
The second table would have N (400 in your case) columns, and you can build it from the first table. For the second table, you can start with a bare minimum set of columns, let's say only 50 out of those 400, so that querying this table is really fast. The structure of this table can be refreshed periodically to match the current reporting requirements, and you will always have the base table in case you need to backfill any data.
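A minimal sketch of that two-table design (the table names, column types, and dist/sort keys are illustrative assumptions, not a definitive layout):

-- base key-value table: one row per (user, attribute)
CREATE TABLE user_attribute_scores (
    uid              BIGINT  NOT NULL,
    attribute_id     INT     NOT NULL,
    attribute_value  VARCHAR(256),
    preference_score DOUBLE PRECISION
)
DISTKEY (uid)
SORTKEY (uid, attribute_id);

-- wide reporting table, rebuilt periodically from the base table,
-- starting with only the attribute columns currently needed
CREATE TABLE user_attribute_wide (
    uid    BIGINT NOT NULL,
    attr_1 DOUBLE PRECISION,
    attr_2 DOUBLE PRECISION
    -- ... more attribute columns as reporting needs grow
)
DISTKEY (uid)
SORTKEY (uid);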

MySQL Query: Return all rows with a certain value in one column when value in another column matches specific criteria

This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.
The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.
I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.
I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).
I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!
-- join a to itself through b so that every row sharing the same b.id comes back
SELECT *
FROM a AS a1
JOIN b
  ON b.id = a1.id
JOIN a AS a2
  ON a2.id = b.id
WHERE b.some_criteria = 'something';
Include an index on (b.id, b.some_criteria).
You aren't the first person, nor will you be the last to bring down your system with an inefficient query.
The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a datamart or datawarehouse or some other database that isn't your transaction database, so that you don't take the business offline.
In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN EXTENDED syntax, which returns plan information that should help you debug your query and work on making it perform acceptably. If you update your question with the actual explain plan output for it, that would be helpful in determining what the issue is.
Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.
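Mapped onto the tables from the question, that self join might look roughly like this (the key column names TRN_ID and TLI_TransactionID are assumptions; substitute the real primary/foreign key columns):

-- t1: line items whose scan code marks them as gift cards
-- t2: every line item on the same receipt as one of those rows
SELECT DISTINCT t2.*
FROM TransactionLineItems AS t1
JOIN Transactions AS trn
  ON trn.TRN_ID = t1.TLI_TransactionID
JOIN TransactionLineItems AS t2
  ON t2.TLI_TransactionID = trn.TRN_ID
WHERE t1.TLI_ScanCode LIKE '740000%';

An index on TLI_ScanCode lets the LIKE '740000%' prefix filter use the index, and indexes on the join key columns keep the two joins cheap.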

SQL to update column in modified table

I am a reasonably competent SQL programmer but my skills are still pretty much in the domain of simple INSERT, SELECT, UPDATE statements with an occasional LIKE etc thrown in. What I am currently trying to do is rather more complex. Here is the scenario.
I have three tables.
Table 1, *users*, identifies users via a user ID, uid. Users can have one or more subaccounts.
Table 2, *accounts*, keeps a record of subaccounts for each user with, amongst other things, the columns uid and sid, where uid is the one defined in the *users* table.
Table 3, *data*, is currently storing some data, in a data column, that is associated with a particular subaccount, sid.
The thing I have just realized is that there is no particular reason to block users from using those data across subaccounts. No problem - I can change my data subset search SQL to work with the uid instead. However, given the frequency of such searches, it seems well worthwhile to simply stick a uid column into *data*.
To do that I would need to write some smart SQL that would get uid,sid pairs from the *accounts* table and use that information to update the newly created uid column in the data table. This I have to admit is beyond my knowledge of SQL.
I should mention that the system using these data is now in production and has several hundreds of users, so the option of just acting like they are not there is not available. Not terribly relevant, I think, but I should mention that uid and sid are alphanumeric strings, with both columns being indexed.
I would be most grateful to anyone here who might be able to help out with it.
MySQL can do updates based on joins, and based on my reading of your schema here's what I'd do...
-- fill in uid from the matching accounts row, only where it isn't set yet
UPDATE accounts a, data d
SET d.uid = a.uid
WHERE a.sid = d.sid
  AND d.uid IS NULL;
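If the uid column hasn't been added to the data table yet, the steps before that UPDATE might look roughly like this (the VARCHAR length is an assumption; match it to the uid column in users. The index is optional but helps both the join above and your future searches):

ALTER TABLE data ADD COLUMN uid VARCHAR(32) NULL;
ALTER TABLE data ADD INDEX idx_data_uid (uid);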

MySQL Partitioning, Delete old data from multiple related tables

I am new to MySQL partitioning, therefore any example will be appreciated.
I am trying to create a sort of ageing mechanism for data that is distributed between several MyISAM tables.
My question will actually include several sub-questions.
The relevant tables are:
The first table contains raw data with a high input frequency (each record has an auto-incremented id).
The second table contains processed results; there is one result record per raw data record (each result record stores the auto-incremented id of its source raw data record).
Questions:
I need to be able to partition the raw data table and the result data table similarly, so that each partition holds only 10 weeks of data (each raw data record contains a unix timestamp field). How do I do it? Can someone write a small example case for two such tables?
I want to be able to change the 10 weeks constraint on the fly.
I want the previous partition (from 10 weeks before) to be deleted automatically whenever the current partition is filled or a new partition is created.
I don't want the auto-increment id integer to overflow. As far as I understand, the ids are unique per partition only, so if I am not wrong the auto-increment id will start from zero for the next partition? But what if the previous partition still exists, will I have duplicate ids? How do I know to reference only the latest id when I present a result record?
I want to load raw data using LOAD DATA INFILE ... instead of multiple inserts; is MySQL partitioning functionality affected?
And the last question: would you suggest some other approach to implement the ageing mechanism? (I am writing a Java product that processes around 1 GB of raw data per day and stores the results in MySQL.)
It's hard to give a real answer on this question since it depends on your data. But let me give you some things to think about.
I assume we're talking about some kind of logs with recent data (so not spanning multiple years). You can partition by range. You could add one field to your table with the year/week number (e.g. 201201, 201202, etc). If this question is related to your question about importing into multiple tables, you can easily do this in that import script.
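As a small example, a range-partitioned raw data table could look roughly like this (the column names, the MyISAM engine, and the 10-week boundary values are illustrative assumptions; the result table would be partitioned the same way on the timestamp copied from its raw record):

CREATE TABLE raw_data (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_at INT UNSIGNED NOT NULL,  -- unix timestamp of the record
    payload    VARCHAR(255),
    PRIMARY KEY (id, created_at)       -- the partitioning column must be part of every unique key
) ENGINE=MyISAM
PARTITION BY RANGE (created_at) (
    PARTITION p2012w10 VALUES LESS THAN (1331510400),  -- 2012-03-12 00:00 UTC
    PARTITION p2012w20 VALUES LESS THAN (1337558400),  -- 2012-05-21 00:00 UTC (10 weeks later)
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);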
On the fly as in repartitioning your data on the fly (70GB?)? I would not recommend it. But you could do it if you had the week number in there. If you later want to change it to 12 days, you could add a column for the date and partition by that.
Well, it won't be deleted automatically, but a cron job can handle that, right? Just check how many partitions there are, and if there are 3(?), delete the first one.
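The cron job would essentially roll the partitions forward, for example (partition names and boundary values follow the sketch above):

ALTER TABLE raw_data REORGANIZE PARTITION pmax INTO (
    PARTITION p2012w30 VALUES LESS THAN (1343606400),  -- 2012-07-30 00:00 UTC
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
-- drop the oldest 10-week slice together with all of its rows
ALTER TABLE raw_data DROP PARTITION p2012w10;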
Every unique key of a partitioned table (including the primary key) has to include the field that you partition on, so you can never fully rely on the auto-increment id alone. I don't see a way around this.
I'm not sure what you mean.
If your data is just some logs in chronological order then you might just use separate tables for each period. Then, before you start the new period (at 00:00), check the last id of the last table, create a new table and set its auto-increment to that value + 1. Your import then decides when a new period begins, so it can easily be changed. The import script can use a small table in which it stores the next period.
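Creating the next period's table that way might look roughly like this (the table names and the last id are made up for illustration):

-- previous period's table is raw_2012w10 and its highest id was 123456
CREATE TABLE raw_2012w20 LIKE raw_2012w10;
ALTER TABLE raw_2012w20 AUTO_INCREMENT = 123457;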
LOAD DATA is really quite fast. I would just have two steps (in no particular order): LOAD DATA, and then 'DELETE ... WHERE date < 10 weeks ago'. Auto-increment will go on for as long as the datatype you're using allows. If you wanted to be super careful you could push it back to zero periodically.
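Those two steps might look roughly like this (the file path, field terminator, and column names are assumptions):

LOAD DATA INFILE '/tmp/raw_batch.csv'
INTO TABLE raw_data
FIELDS TERMINATED BY ','
(created_at, payload);

DELETE FROM raw_data
WHERE created_at < UNIX_TIMESTAMP(NOW() - INTERVAL 10 WEEK);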
Once the data is in the 'raw' table, run your routine to create the 'processed' table. We use a very similar process where I work. We keep a separate table that has 'write' and 'parse' pointers into all of our 'raw' tables. As new data comes in and gets parsed, the appropriate row pointers get set. If the 'raw' table gets truncated you can reset the 'write' pointer but leave the 'parse' pointer. (We store the offset in another table when this happens, just to be sure.)
And if I may recommend, creating an index column for each of the related columns can also enhance the performance of deleting old data from multiple related tables, since we then just compare the index numbers rather than strings.
I wonder if your tables are being sorted or not.