I have an input file with roughly 200,000 lines, and I need to query a MySQL database (engine: InnoDB) for each line. I read one line from the file at a time and build my target variable/field, then query the MySQL database with that target field. The information I need is not located in a single table but is spread across several tables with more than 100 million rows in them. Hence, I wrote a function that runs all the necessary queries and returns the concatenation of their results as text. But it is not performing fast. Is there any way to speed up this procedure, or should I follow a completely different approach? I have checked the queries with the EXPLAIN command and they are fine. Some smart solutions would be highly appreciated.
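For illustration, this is roughly the kind of batched alternative I am wondering about: bulk-load all 200,000 target values into a staging table once and join, instead of issuing one query per line. This is only a sketch; the file path, table names and column names below are hypothetical.

CREATE TEMPORARY TABLE staging_targets (
  target_value VARCHAR(255) NOT NULL,
  PRIMARY KEY (target_value)
);

-- Load every target value from the input file in one pass
LOAD DATA LOCAL INFILE '/path/to/input.txt'
INTO TABLE staging_targets
LINES TERMINATED BY '\n'
(target_value);

-- One set-based join replaces 200,000 round trips
SELECT s.target_value, t1.col_a, t2.col_b
FROM staging_targets AS s
JOIN big_table_1 AS t1 ON t1.target_field = s.target_value
JOIN big_table_2 AS t2 ON t2.target_field = s.target_value;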
Related
I have created a CSV from PostgreSQL and successfully uploaded 180 million records to Neo4j. After that, I created indices. But when I tried to create relationships using PERIODIC COMMIT in cypher-shell, the script got stuck, even when I changed PERIODIC COMMIT to 10. What should be the remedy?
It's hard to answer as we haven't seen the query, but it sounds to me like you don't have an index that identifies the nodes for each relationship creation. Maybe you can run your query with the explain keyword for a single relationship, like
explain
match (n:Node {hasid:1}),(m:Node {hasid:2})
create (n)-[:REL]->(m)
and check if the query planner uses the index. If the query planner does not use the index, change your query so that it does. Otherwise it will take a very long time.
I have a project assignment that needs big data, so I decided to test MySQL query performance with big data. I want to multiply one table in the database. I've tried it before, but the process took a very long time.
First, I tried to use INSERT INTO the table itself, and it was a long process.
Second, I tried a different way and used mysqlimport to import 1 GB of data, which took about 1.5 hours.
So if I want to enlarge the MySQL table, do you have any suggestions for me?
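For reference, the self-insert from my first attempt looks roughly like this (table and column names are hypothetical, and the id column is assumed to be AUTO_INCREMENT so it is regenerated rather than copied); each run doubles the row count:

-- Doubles the table on every execution
INSERT INTO test_table (col_a, col_b)
SELECT col_a, col_b
FROM test_table;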
Though this question should probably be flagged as "not constructive", I will still suggest something.
If your objective is "only to make the table large", as per your comment, why take all the trouble of inserting duplicates or using mysqlimport? Instead, search for and download free large sample databases and play around with them:
https://launchpad.net/test-db/+download
http://dev.mysql.com/doc/index-other.html
http://www.ozerov.de/bigdump/
If a particular table structure is explicitly needed, then run some DDL queries (ALTER TABLE) to shape the downloaded tables according to your wish.
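For example, a reshaping sketch along these lines (table and column names here are hypothetical) would adjust a downloaded table:

ALTER TABLE employees
  ADD COLUMN department_code INT NOT NULL DEFAULT 0,
  MODIFY COLUMN last_name VARCHAR(64) NOT NULL,
  ADD INDEX idx_department_code (department_code);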
Simply put, I have to write an application to synchronise several database tables. Because of the requirements, the changes should be put into a queue (in the form of an SQL statement), and here lies the problem: I'm not able to change the existing application that uses the database so that it adds the executed query directly to the queue. Therefore I need to catch all data-changing SQL queries against specific tables (> 20 tables) in the database.
I thought about the following solutions:
Catching the MySQL query directly with triggers, as described in Can a trigger access the query string (the best answer I could find for this case!), but I couldn't get the query that activates the trigger - only the query that I used within it.
Activating the General Query Log. But I have read about its heavy performance impact, so it isn't a viable solution, because it would also log the tables I don't need (> 120 tables) and the many simple queries run against the database.
Using a history table filled by triggers (a minimal sketch follows below). With this solution I wouldn't save the SQL statements of the queries (which would slow down my current synchronisation concept), but it would be feasible.
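A rough sketch of what I mean by the history table, with hypothetical table and column names - it records the change itself rather than the original statement, since a trigger cannot see the statement that fired it:

CREATE TABLE sync_queue (
  id INT AUTO_INCREMENT PRIMARY KEY,
  table_name VARCHAR(64) NOT NULL,
  action ENUM('INSERT','UPDATE','DELETE') NOT NULL,
  row_id INT NOT NULL,
  changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER customers_after_update
AFTER UPDATE ON customers
FOR EACH ROW
BEGIN
  -- Queue the primary key of the changed row for later synchronisation
  INSERT INTO sync_queue (table_name, action, row_id)
  VALUES ('customers', 'UPDATE', NEW.id);
END//
DELIMITER ;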
Does anyone know any other solution, or how I could do the impossible and access the query within a trigger?
I'm grateful for any suggestions!
Related questions:
Can a trigger access the query string
Log mysql db changing queries and users
You could set up MySQL Proxy (https://launchpad.net/mysql-proxy) between the existing application and the MySQL server, and intercept/modify/add any queries in the proxy.
I am currently working on an application in Access 2007 with a split FE and BE. The FE is local, with the BE on a network share. To eliminate some of the issues found with using linked tables over a network, I am attempting, through VBA using ADO, to load two temp tables with data from two linked tables when the application first loads, using cn.Execute "INSERT INTO TempTable1 SELECT * FROM LinkedTable1" and cn.Execute "INSERT INTO TempTable2 SELECT * FROM LinkedTable2".
LinkedTable1 has 45,552 records in it and LinkedTable2 has 45,697 records in it.
The first execute statement takes anywhere from 50-85 seconds. However, the second execute statement takes no more than 9 seconds. These times are consistent. In an attempt to see if there were issues with one of the tables and not the other, I have switched the order of the statements in my code, and the times still come out the same (the first execute is way too long and the second execute is very fast). (As a side note, I have also tried DAO using the CurrentDB.Execute command with no different results.) This would make sense to me if the first statement were processing more records than the second, but although the difference is small, the second table has more records than the first!
Does anyone have ANY suggestions on why this is happening and/or how to get this first execute statement to speed up?
Thanks in advance!
ww
What indexes do you have defined on the two temp tables, and what are their primary key definitions? Updating the indexes as the data is appended could be one reason one table is slower than the other.
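If the indexes turn out to be the culprit, one experiment worth trying is to drop the temp table's indexes before the bulk append and rebuild them afterwards. A sketch, using the table names from the question and a hypothetical index name and column:

DROP INDEX idx_temp1_id ON TempTable1;

INSERT INTO TempTable1 SELECT * FROM LinkedTable1;

-- Rebuild the index once, after the bulk append
CREATE INDEX idx_temp1_id ON TempTable1 (ID);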
My guess is that there are two sources for the difference:
the initial creation of the remote LDB file when you execute the first INSERT statement. This shows up as overhead in the first SQL command, when it's actually something that persists through both.
caching: likely the file is small enough that Jet/ACE is pulling large chunks of it across the wire (the header and metadata, plus the requested data pages) during the first operation so that there's much less data that is not already in local memory when the second command is issued.
My question is why you are having problems with linked table performance in the first place. Solve that and you then won't have to muck about with temp tables. See Tony Toews's Performance FAQ.
I have a table of about 800,000 records. It's basically a log, which I query often.
I added a condition to query only the entries from the last month, in an attempt to reduce the load on the database.
My thinking is:
a) if the database goes through only the last month's entries and then returns them, that's good.
b) if the database goes through the whole table, checking the condition against every single record, it's actually worse than having no condition at all.
What is your opinion?
How would you go about reducing the load on a database?
If the field containing the entry date is keyed/indexed, and is used by the DB software to optimize the query, that should reduce the set of rows examined to the rows matching that date range.
That said, it's commonly understood that you are better off optimizing queries, indexes, database server settings and hardware, in that order. Changing how you query your data can reduce the impact of a badly formulated query a millionfold, depending on the dataset.
If there are no obvious areas for speedup in how the query itself is formulated (joins done correctly or no joins needed, effective use of indexes), adding indexes to help your common queries would be a good next step.
If you want more information about how the database is going to execute your query, you can use the MySQL EXPLAIN command to find out. For example, it will tell you whether the query is able to use an index.
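For example, assuming the log table has a date column (the table, column and index names here are hypothetical), the index plus an EXPLAIN check could look like this; the key column in the EXPLAIN output should show the index if it is being used:

CREATE INDEX idx_log_entered_at ON log_table (entered_at);

-- Verify that only the last month's rows are examined
EXPLAIN
SELECT *
FROM log_table
WHERE entered_at >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH);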