How to increase the Kudu table column limit from 300 to 1000? - kudu

We want to increase the Kudu table column limit from 300 to 1000.
Please suggest the steps or approach to do this. We know it might affect production, but it is a client requirement.

Not sure what happens once you exceed the 300-column limit, but here you are: --unlock-unsafe-flags --max-num-columns=1000. This will help you.

Related

Assigning remainder value to largest value in stacked SSRS bar graph

I have a 100% stacked bar graph in SSRS. Due to rounding, the categories sometimes do not add up to 100. In some instances the total would be 99 or 101.
I want to achieve the following:
1. Calculate the total of the stack
2. If the total is greater than 100 then the difference should be deducted from the smallest value in the stack
3. If the total is smaller than 100 then the difference should be added to the largest value in the stack
Is this possible?
Or what other solution is there to resolve this issue?
Thanks
Achieving this functionality dynamically within the report itself would be very difficult and likely inefficient. I would suggest doing this calculation in the database instead. For example, have your SQL query populate a temporary table, run the logic to update the table, and then return the results to the report dataset.
This depends on the type of database you're using and the permissions you have. But in general, this would be a better approach.
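For illustration, a minimal T-SQL sketch of that kind of adjustment logic, assuming a temporary table #Stack(Category, Pct) already holds the rounded percentages for a single bar (the table and column names are assumptions, not taken from the report):
DECLARE @diff INT = 100 - (SELECT SUM(Pct) FROM #Stack);
IF @diff > 0
    -- total came up short of 100: add the difference to the largest value
    UPDATE TOP (1) #Stack SET Pct = Pct + @diff
    WHERE Pct = (SELECT MAX(Pct) FROM #Stack);
ELSE IF @diff < 0
    -- total overshot 100: deduct the difference from the smallest value
    UPDATE TOP (1) #Stack SET Pct = Pct + @diff
    WHERE Pct = (SELECT MIN(Pct) FROM #Stack);
-- return the corrected values to the report dataset
SELECT Category, Pct FROM #Stack;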
Try increasing the precision of the results in your data. If your percentages carry 1 or 2 decimal places (100.00%), the rounding errors are highly unlikely to add up to a full integer either way, and 100.02% will still show as 100% when formatted without decimals.

MySQL - Basic issue with a large table

In my db there are two large tables. The first one (A) has 1.7 million rows, the second one (B) has 2.1 million. Records in A and B are of fairly identical size.
I can do any operation on A. It takes time, but it works. On B, I can't do anything. Even a simple SELECT COUNT(*) just hangs forever. The problem is I don't see any error: it just hangs (when I look at the process list it just says "updating" forever).
It seems weird to me that the small delta (percentage-wise) between 1.7 and 2.1 million could make such a difference, from being able to do everything to not even being able to do the simplest operation.
Can there be some kind of 2 million rows hard limit?
I am on Linux 2.6+, and I use innoDB.
Thanks!
Pierre
It appears it depends more on the amount of data in each row than on the total number of rows. If the rows contain little data, then the number of rows you can handle will be higher than for rows with more data. Check this link for more info:
http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html
The row size (the number of bytes needed to store one row) might be much larger for the second table. COUNT(*) may require a full table scan, i.e. reading through the entire table on disk; larger rows mean more I/O and a longer time.
The presence/absence of indexes will likely make a difference too.
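As a quick sanity check before assuming a hard limit, something like the following (standard MySQL; replace the schema name with your own) shows the row count, average row length and index size that InnoDB has recorded for the two tables:
-- table_rows is only an estimate for InnoDB, but avg_row_length and index_length
-- are enough to compare the two tables
SELECT table_name, table_rows, avg_row_length, data_length, index_length
FROM information_schema.tables
WHERE table_schema = 'your_database'
  AND table_name IN ('A', 'B');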
As I was saying in my initial post, the two tables are fairly similar, so the row size would be fairly close in both tables. That's why I was a bit surprised, and I started to think that maybe, somehow, a 2 million limit was set somewhere.
It turns out my table was corrupted. It is bizarre, since I was still able to access some records (using joins with other tables) and MySQL was not "complaining". I found out by doing a CHECK TABLE: it did not return any error, but it crashed mysqld every time...
Anyway, thank you all for your help on this.
Pierre
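For reference, the check in question and one common way to rebuild a damaged InnoDB table look like this; the rebuild step is a generic approach and an assumption, not necessarily what was done here (take a backup first, especially since CHECK TABLE was crashing mysqld in this case):
CHECK TABLE B;
-- a "null" ALTER rewrites the entire table, which often repairs it if the data is still readable
ALTER TABLE B ENGINE=InnoDB;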

Find out how much storage a row is taking up in the database

Is there a way to find out how much space (on disk) a row in my database takes up?
I would love to see it for SQL Server CE, but failing that SQL Server 2008 works (I am storing about the same data in both).
The reason I ask is that I have an Image column in my SQL Server CE db (it is a varbinary(max) in the SQL 2008 db) and I need to know how many rows I can store before I max out the memory on my device.
Maybe not 100% what you wanted, but if you want to know how much space an Image takes, just do:
SELECT [RaportID]
      ,DATALENGTH([RaportPlik]) AS 'FileSize'  -- DATALENGTH returns the number of bytes stored in the varbinary column
      ,[RaportOpis]
      ,[RaportDataOd]
      ,[RaportDataDo]
FROM [Database]
Any additional calculations (such as predictions) you need to do yourself.
A varbinary(max) column could potentially contain up to 2GB of data by itself for each row. For estimated use based on existing data, perhaps you could do some analysis using the DATALENGTH function to work out what space a typical one of your images is taking up, and extrapolate from there.
You can only make rough guesses - there is no exact answer to the question "how many rows I can store before I max out the memory on my device" since you do not have exclusive use of your device - other programs take resources too, and you can only know how much storage is available at the present time, not at some time in the future. Additionally, your images are likely compressed and therefore take variable amounts of RAM.
For guessing purposes, simply the size of your image is a good approximation of the row size; the overhead of the row structure is negligible.
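For a rough estimate, you could combine the average DATALENGTH of the images you already have with whatever storage budget you are willing to spend. The query below reuses the table and column names from the earlier answer, and the 256 MB budget is an arbitrary assumption (SQL Server 2008 syntax):
SELECT AVG(DATALENGTH([RaportPlik])) AS AvgImageBytes
      ,(256 * 1024 * 1024) / NULLIF(AVG(DATALENGTH([RaportPlik])), 0) AS EstimatedRowsIn256MB
FROM [Database];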

Best free way to store 20 million rows a day?

Daily 20-25 million rows that will be removed at midnight to make room for the next day's data. Can MySQL handle 25 million indexed rows? What would be another good solution?
You give very little information on the context, but sometimes not using a database at all and using a binary or plain-text file instead is just fine and can, depending on your requirements, be much more efficient and maintainable. For example, if it's sensor data, storing it in a binary file with each record at a known offset could be a good solution. The fact that the data would be deleted every 24 hours seems to indicate that you might not need some of the properties of a relational database, such as ACID, replication, integrated backup and so on, so perhaps a flat-file approach is just fine?
Our MySQL database has over 300 million rows indexed and we only ever experience problems with complex joins running a little slow - most can be optimized though.
Handling the rows was no problem - the key to our performance was good indexes.
Considering you are dropping the information at midnight, I would also look at MySQL partitioning, which would allow you to drop that part of the table while letting the next day's inserts continue if need be.
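For illustration, a sketch of what daily range partitioning could look like; the table, column and partition names (and the dates) are made up, and MySQL 5.1+ partitioning is assumed:
CREATE TABLE readings (
    reading_time   DATETIME NOT NULL,
    sensor_id      INT NOT NULL,
    reading_value  DOUBLE,
    KEY idx_sensor (sensor_id)
)
PARTITION BY RANGE (TO_DAYS(reading_time)) (
    PARTITION p_today    VALUES LESS THAN (TO_DAYS('2010-01-02')),
    PARTITION p_tomorrow VALUES LESS THAN (TO_DAYS('2010-01-03'))
);
-- at midnight, dropping a partition removes that day's 20-25 million rows almost instantly
ALTER TABLE readings DROP PARTITION p_today;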
The issue is not the number of rows itself -- it's what you do with the database. Are you doing only inserts during the day followed by some batch report? Or, are you doing thousands of queries per second on the data? Inserts/Updates/Deletes? If you slam enough load at any database platform, you can max it out with a single table and a single row (taking it to the most extreme). I used MySQL 4.1 w/ MyISAM (hardly the most modern of anything) on a site with a 40M row user table. It did < 5ms queries, I think. We were rendering pages in less than 200ms. However, we had lots and lots of caching set up, so the number of queries wasn't too high. And, we were doing simple statements like SELECT * FROM USER WHERE USER_NAME = 'SMITH'
Can you comment more on your use case?
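To make the earlier point about indexes concrete: a point lookup like the one in the answer above only stays in the low milliseconds if the filtered column is indexed. A minimal sketch using that table (the index name is an assumption):
CREATE INDEX idx_user_name ON USER (USER_NAME);
-- with the index in place, the lookup is an index seek rather than a scan of the full 40M-row table
SELECT * FROM USER WHERE USER_NAME = 'SMITH';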
If you are using Windows, you could do worse than use SqlExpress 2008, which should easily handle that load, depending on how many indexes you are creating on it. So long as you keep < 4GB total db size, it shouldn't be a problem.
From my experience, MySQL tends not to scale well at all. If you must have a free solution for this, I would highly recommend PostgreSQL.
Also (this may or may not be an issue for you), keep in mind that if you're dealing with that much data, the maximum size of a MySQL database is 4 terabytes, if I remember correctly.
I don't think there is a practical limit on the maximum number of rows in MySQL, so if you MUST use MySQL, I think it would work for what you want to do, but personally for a production system I wouldn't recommend it.
As a general solution I'd recommend PostgreSQL too, but depending on your specific needs, other solutions might be better/faster. For example, if you do not need to query your data while it is being written, TokyoCabinet (the table based API / TDB) might be faster and more lightweight/robust.
I haven't looked into them in MySQL, but this sounds like a perfect application for table partitions.
Use the database only as an index and store the data itself using a file-based approach; since you will be removing the data within 24 hours anyway, this would be more effective, the process would be faster, and it would not burden your server.

Can MySQL Cluster handle a terabyte database

I have to look into solutions for providing a MySQL database that can handle data volumes in the terabyte range and be highly available (five nines). Each database row is likely to have a timestamp and up to 30 float values. The expected workload is up to 2500 inserts/sec. Queries are likely to be less frequent but could be large (maybe involving 100Gb of data) though probably only involving single tables.
I have been looking at MySQL Cluster given that is their HA offering. Due to the volume of data I would need to make use of disk based storage. Realistically I think only the timestamps could be held in memory and all other data would need to be stored on disk.
Does anyone have experience of using MySQL Cluster on a database of this scale? Is it even viable? How does disk based storage affect performance?
I am also open to other suggestions for how to achieve the desired availability for this volume of data. For example, would it be better to use a third-party library like Sequoia to handle the clustering of standard MySQL instances? Or a more straightforward solution based on MySQL replication?
The only condition is that it must be a MySQL based solution. I don't think that MySQL is the best way to go for the data we are dealing with but it is a hard requirement.
Speed-wise, it can be handled. Size-wise, the question is not the size of your data, but rather the size of your indexes, as the indexes must fit fully within memory.
I'd be happy to offer a better answer, but high-end database work is very task-dependent. I'd need to know a lot more about what's going on with the data to be of further help.
Okay, I did read the part about MySQL being a hard requirement.
So with that said, let me first point out that the workload you're talking about -- 2500 inserts/sec, rare queries, queries likely to have result sets of up to 10 percent of the whole data set -- is just about pessimal for any relational database system.
(This rather reminds me of a project, long ago, where I had a hard requirement to load 100 megabytes of program data over a 9600 baud RS-422 line (also a hard requirement) in less than 300 seconds (also a hard requirement.) The fact that 1kbyte/sec × 300 seconds = 300kbytes didn't seem to communicate.)
Then there's the part about "contain up to 30 floats." The phrasing at least suggests that the number of samples per insert is variable, which in turn suggests some normalization issues -- or else needing to make each row 30 entries wide and use NULLs.
But with all that said, okay, you're talking about roughly 300 KB/sec and 2500 TPS (assuming this really is a sequence of unrelated samples: a timestamp plus 30 four-byte floats is on the order of 128 bytes per row, and 128 bytes × 2500 rows/sec is about 300 KB/sec). This set of benchmarks, at least, suggests it's not out of the realm of possibility.
This article is really helpful in identifying what can slow down a large MySQL database.
Possibly try out Hibernate Shards and run MySQL on 10 nodes with 1/2 terabyte each, so you can handle 5 terabytes ;) Well over your limit, I think?