MySQL query execution with Prisma taking too long

I've built a Node.js GraphQL server (using graphql-yoga) with a MySQL database running inside a Docker container, and I'm using Prisma to interact with it (i.e. to perform all sorts of DB operations). My DB is growing fast (7 GB consumed in one month). I have 10 tables, and one of them has 600,000 rows and is growing quickly (almost 20,000 rows are added to it each day). When the application starts, it has to fetch data from this table.

The problem is that I have to stop and restart the MySQL service each day for my application to work properly. Otherwise it either takes far too long to load the data (from the table with 600,000 rows) or it stops working entirely (and again I have to restart the MySQL service, after which it works fine, at least for a day). I don't know whether the problem is with the MySQL database, specifically with the table that has 600,000 rows and keeps growing (I'm new to MySQL), or with Prisma, which performs all the queries. Is there any way to get rid of this problem (having to stop and restart the MySQL service)?
// Table structure in the datamodel.prisma file inside the prisma folder
type Topics {
  id: ID! @unique
  createdAt: DateTime! @createdAt
  locationId: Int
  obj: Json
}

I am not sure how the Prisma API reads data from this table.
My simple suggestion is to first read the first and last IDs for the most recent date using the createdAt column: group by that column and take MIN(id) and MAX(id), selecting only the ID in this first query.
Then select the records between those two IDs, so you don't need to read all the records each time.
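A minimal sketch of that two-step read, using mysql2/promise directly rather than the Prisma client (the table name Topics, a numeric auto-increment id, and the connection settings are assumptions):

import mysql from "mysql2/promise";

async function fetchTodaysTopics() {
  const conn = await mysql.createConnection({ host: "localhost", user: "root", database: "mydb" });

  // Step 1: read only the boundary IDs for the most recent day via createdAt.
  const [bounds] = await conn.query(
    "SELECT MIN(id) AS minId, MAX(id) AS maxId FROM Topics WHERE createdAt >= CURDATE()"
  );
  const { minId, maxId } = (bounds as any[])[0];
  if (minId == null) {
    await conn.end();
    return []; // nothing created today
  }

  // Step 2: fetch only the rows inside that ID range, not the whole table.
  const [rows] = await conn.query(
    "SELECT * FROM Topics WHERE id BETWEEN ? AND ?",
    [minId, maxId]
  );
  await conn.end();
  return rows as any[];
}

An index on createdAt is what keeps the first query cheap as the table grows.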

Related

I want to figure out how to run a query on a very big table (more than 2 billion rows) without crashing the MySQL database.

I want the exact index (row position) of a particular datetime, without running a query on that non-indexed datetime column of a very large MySQL table.

Altering MySQL table column type from INT to BIGINT

I have a table with just under 50 million rows. It hit the limit for INT (2147483647). At the moment the table is not being written to.
I am planning on changing the ID column from INT to BIGINT, using the following Rails migration:
def up
  execute('ALTER TABLE table_name MODIFY COLUMN id BIGINT(8) NOT NULL AUTO_INCREMENT')
end
I have tested this locally on a dataset of 2,000 rows and it worked OK. Should running the ALTER TABLE command across the 50 million rows be fine, given that the table is not being used at the moment?
I wanted to check before I run the migration. Any input would be appreciated, thanks!
We had exactly the same scenario, but with PostgreSQL, and I know how 50M rows can exhaust the whole range of INT: it's the gaps in the IDs, generated by deleting rows over time or by other factors such as incomplete transactions.
I will explain what we ended up doing, but first: seriously, testing a data migration for 50M rows on 2,000 rows is not a good test.
There can be multiple solutions to this problem, depending on factors such as which DB provider you are using. We were using Amazon RDS, which has limits on runtime and on what they call IOPS (input/output operations per second). If you run such an intensive query on a DB with those limits, it will exhaust its IOPS quota midway through, and once the IOPS quota runs out, the DB becomes too slow to be useful. We had to cancel our query and let the IOPS catch up, which takes about 30 minutes to an hour.
If you have no such restrictions and run the DB on premises or something like that, then there is another factor: can you afford downtime?
If you can afford downtime and have no IOPS-type restriction on your DB, you can run this query directly. It will take a lot of time (maybe half an hour or so, depending on many factors), and in the meantime the table will be locked while rows are being changed. Make sure the table is receiving no writes, and no reads either, during the process, so that it runs to completion smoothly without any deadlock-type situation.
What we did to avoid downtime and the Amazon RDS IOPS limits:
In our case, we still had about 40M IDs left in the table when we realized the column was going to run out, and we wanted to avoid downtime, so we took a multi-step approach:
Create a new BIGINT column, named new_id or similar (give it a unique index from the start); it will be nullable with a default of NULL.
Write background jobs that run a few times each night and backfill the new_id column from the id column (see the sketch below). We were backfilling about 4-5M rows each night, and a lot more over weekends (as our app had no traffic on weekends).
Once the backfill has caught up, stop all access to this table (we just took our app down for a few minutes at night) and create a new sequence starting from the max(new_id) value, or reuse the existing sequence and bind it to the new_id column with a default of nextval of that sequence.
Switch the primary key from id to new_id, making new_id NOT NULL first.
Drop the id column.
Rename new_id to id.
Resume your DB operations.
The above is a minimal write-up of what we did; you can google up some nice articles about it, one is this. The approach is not new and is pretty common, so I am sure you will find MySQL-specific ones too, or you can just adjust a couple of things in the article above and you should be good to go.
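As a rough illustration of the backfill step above, here is a minimal sketch in Node.js with mysql2/promise; the table name table_name, the batch size, and the connection settings are assumptions, and the original approach used scheduled background jobs rather than a single loop:

import mysql from "mysql2/promise";

// Copy id into new_id in small batches so each UPDATE locks only a few rows
// and the I/O cost is spread out instead of hitting the DB all at once.
async function backfillNewId(batchSize = 10000, maxBatchesPerRun = 500) {
  const conn = await mysql.createConnection({ host: "localhost", user: "root", database: "mydb" });
  for (let i = 0; i < maxBatchesPerRun; i++) {
    const [result] = await conn.query(
      "UPDATE table_name SET new_id = id WHERE new_id IS NULL ORDER BY id LIMIT ?",
      [batchSize]
    );
    if ((result as any).affectedRows === 0) break; // fully caught up
  }
  await conn.end();
}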

Node.js: Fetching a million rows from MySQL and processing the stream

I have a MySQL table with over 100 million rows. This table is a production table from which a lot of read requests are served. I need to fetch, let's say, a million rows from this table, process the rows in a Node.js script, and then store the data in Elasticsearch.
I have done this a lot with MongoDB without facing any issues. I start a read stream, pause it after every 1000 rows, and once the downstream processing is complete for that set of 1000 rows, I resume the stream and keep going until all the rows are processed. I have not faced any performance challenges with MongoDB, since the find query returns a cursor that fetches results in batches, so irrespective of how big my input is, it does not cause any issues.
Now, I don't know how streaming queries in MySQL work under the hood. Will my approach work for MySQL, or will I have to execute the query again and again, e.g. the first SELECT fetches the rows where id < 1000, the next fetches the rows where id is between 1000 and 2000, and so on? Has anybody worked on a similar problem before? I found a similar question on Stack Overflow, but there was no answer.
You can use LIMIT (starting point),(number of records) to fetch one batch at a time, and after every iteration advance the starting point by the batch size.
For example, LIMIT 0,1000 starts fetching from the first row and returns 1000 records, and the next call uses LIMIT 1000,1000, which returns the next 1000 records starting from the 1000th row.
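A minimal sketch of that LIMIT-based batching with mysql2/promise (the table name big_table, the Elasticsearch helper, and the connection settings are assumptions):

import mysql from "mysql2/promise";

// Placeholder for the downstream processing described in the question.
async function indexIntoElasticsearch(batch: any[]): Promise<void> {
  // send the batch to Elasticsearch here
}

async function processInBatches(batchSize = 1000) {
  const conn = await mysql.createConnection({ host: "localhost", user: "root", database: "mydb" });
  for (let offset = 0; ; offset += batchSize) {
    // Advance the starting point by the batch size on every iteration.
    const [rows] = await conn.query(
      "SELECT * FROM big_table ORDER BY id LIMIT ?, ?",
      [offset, batchSize]
    );
    const batch = rows as any[];
    if (batch.length === 0) break; // all rows processed
    await indexIntoElasticsearch(batch);
  }
  await conn.end();
}

Note that the offset form gets slower as the offset grows, because MySQL still has to scan past the skipped rows; keyset pagination (WHERE id > last_seen_id ORDER BY id LIMIT n) keeps each batch cheap on very large tables.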

Worsening performance with MySQL data insertion and Entity Framework

I'm pulling match data from an online game API and saving the details to a locally hosted MySQL database. Each call to the API returns about 100 matches, and I'm inserting 15 matches at a time. For each match I'm inserting anywhere from 150-250 rows across 5 tables.
I've used the optimizations described here: Fastest Way of Inserting in Entity Framework
I've been able to insert about 9 matches/sec, but now that I've saved 204,000 matches the insertion rate has slowed to 2.5 matches/sec. I hope to save all matches since inception, which is probably around 300+ million matches.
I can't use SqlBulkCopy because this is a MySQL database.
Are there any further optimizations I can do? I'd like to parallelize, but I suppose I'll still be blocked on the DB.
Thanks.

MySQL vs Redis for storing follower/following users

I'm trying to find the best candidate for storing follower/following user data.
I was initially thinking of storing it in Redis as a set, mapping each user to a set of user IDs, but then I thought about the scenario where a user has over 1 million or even 10 million followers. How would Redis handle such a huge set? Also, there is no way to paginate a plain set in Redis; I would have to retrieve the whole set, which won't work if a user wants to browse who is following them.
If I store it in MySQL I can definitely do pagination, but fetching 10 million records from the database whenever I have to build the user feed might take a long time. I could do this in old-fashioned batches, but it still sounds pretty painful: whenever a user with many followers posts something, processing those 10 million records just to fetch their followers would take forever.
Would it be worthwhile to store the data in MySQL for pagination (mainly for the frontend) and in Redis for the event-driven messaging that builds the activity feed?
It's a personal decision whether to use Redis or MySQL for this task; neither will have a problem with those 10 million records.
MySQL has the LIMIT x,y clause for fetching a subset of the followers from the database.
For Redis you can use sorted sets, with the follower's user ID or the time the user started following as the score. And like MySQL, Redis supports fetching a subset of a large sorted set.
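A minimal sketch of both pagination styles, using mysql2/promise and ioredis (the table, column, and key names are assumptions):

import mysql from "mysql2/promise";
import Redis from "ioredis";

const PAGE_SIZE = 50;

// MySQL: LIMIT offset,count over a followers table indexed on (user_id, created_at).
async function followersPageMysql(userId: number, page: number) {
  const conn = await mysql.createConnection({ host: "localhost", user: "root", database: "mydb" });
  const [rows] = await conn.query(
    "SELECT follower_id FROM followers WHERE user_id = ? ORDER BY created_at DESC LIMIT ?, ?",
    [userId, page * PAGE_SIZE, PAGE_SIZE]
  );
  await conn.end();
  return rows as any[];
}

// Redis: one sorted set per user, scored by follow time, paged with ZREVRANGE.
async function followersPageRedis(userId: number, page: number) {
  const redis = new Redis();
  // When a user gains a follower: ZADD followers:<userId> <timestamp> <followerId>
  const start = page * PAGE_SIZE;
  const ids = await redis.zrevrange(`followers:${userId}`, start, start + PAGE_SIZE - 1);
  redis.disconnect();
  return ids;
}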