Error while fetching rows in R - MySQL

I want to fetch some data from my SQL server in R. The way I'm doing this is:
rs = dbSendQuery(con, "myquery")
data = fetch(rs, n = -1)
This works perfectly for a small table. However, for a bigger table, the fetch command gives:
Warning message:
In fetch(ms, n = -1) : error while fetching rows
The problem remains even if I restrict the number of rows (n = 10), so I'm not sure whether it's a timeout problem or something else.
What might be the cause?
data shows:
[1] creator ratio
<0 rows> (or 0-length row.names)

There are a couple of points I want to mention which can help the OP identify and fix the problem.
1) Do not use fetch; use dbFetch instead. The R help says:
fetch() is provided for compatibility with older DBI clients - for all
new code you are strongly encouraged to use dbFetch()
2) Execute your query from the Query Editor in SQL Server Management Studio and check its performance. Fine-tune the tables used in the query with appropriate indexes (see the sketch after this list). Once you are happy with it, try it from R.
3) If the query selects many columns, it would be good to first try selecting just one or two columns.
4) I hope you are freeing resources and closing the connection later in your code. It can be done like this:
# Free all resources
dbClearResult(rs)
# Close connection
dbDisconnect(con)
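For point 2, a minimal sketch of what such index tuning could look like, using a hypothetical table mytable and the creator column that appears in the question's output (the real table and column names will differ):
-- Hypothetical: index the column the slow query filters or joins on
CREATE INDEX ix_mytable_creator ON mytable (creator);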

Related

MySQL Workbench - the best way to organize and run frequently used SQL queries during development

I'm a Java dev who uses MySQL Workbench as a database client and IntelliJ IDEA as an IDE. Every day I run SQL queries against the database, from 5 up to 50 times a day.
Is there a convenient way to save and re-run frequently used queries in MySQL Workbench/IntelliJ IDEA so that I can:
avoid re-typing a full query that has already been used
smoothly access a list of queries I've already used (e.g. by auto-completion)
If there is no way to do it using MySQL Workbench / IDEA, could you please recommend any good tools that provide this functionality?
Thanks!
Create Stored Procedures, one per query (or sequence of queries). Give them short names (to avoid needing auto-completion).
For example, to find out how many rows are in table foo (SELECT COUNT(*) FROM foo;):
One-time setup:
DELIMITER //
CREATE PROCEDURE foo_ct()
BEGIN
    SELECT COUNT(*) FROM foo;
END //
DELIMITER ;
Usage:
CALL foo_ct();
You can pass arguments in to make minor variations; see the sketch below. Passing in a table name is somewhat complex, but numbers or dates, etc., are practical and probably easy.
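For instance, a minimal sketch of a procedure that takes a number of days as an argument, assuming a hypothetical created_at column on foo:
DELIMITER //
CREATE PROCEDURE foo_recent_ct(IN p_days INT)
BEGIN
    -- Count rows created in the last p_days days (created_at is hypothetical)
    SELECT COUNT(*) FROM foo
    WHERE created_at > NOW() - INTERVAL p_days DAY;
END //
DELIMITER ;
Usage:
CALL foo_recent_ct(7);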
If you have installed SQLyog for your MySQL, then you can use the Favorites menu option, which lets you save a query; with one click it automatically writes the saved query into the Query Editor.
The previous answers are correct - depending on the version of the Query Browser they are called either Favorites or Snippets - the problem being that you can't create sub-folders to group them. Keeping tabs open is an option too, but sometimes the browser 'dies' and you're back to square one. So the obvious solution I came up with: create a database table! I have a few 'metadata' fields for descriptions - the project a query is associated with, the problem the query solves, and the actual query.
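A minimal sketch of such a table, with hypothetical column names, might look like:
CREATE TABLE query_library (
    id INT AUTO_INCREMENT PRIMARY KEY,
    project VARCHAR(100),      -- project the query is associated with
    description VARCHAR(255),  -- problem the query solves
    query_text TEXT            -- the actual query
);

-- Look up the saved queries for a project
SELECT description, query_text FROM query_library WHERE project = 'reporting';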
You could keep your query library in an SQL file and load that when WB opens (it is automatically reopened when you restart WB if the file was open on last close). When you want to run a specific query, place the caret in its text and press Ctrl+Enter (Cmd+Enter on Mac) to run only this query. How you organize that SQL file is totally up to you; you have more freedom than any "favorites" solution can give you. You can even have more than one file with grouped statements.
Additionally, MySQL Workbench has a query history (see the Output tab), which is saved to disk, so you can return to a query even months after you wrote it.

Cheapest SQL statement possible / Are there client-side SQL statements?

Questions
What is/are the cheapest SQL statement(s) (in terms of processing overhead/CPU cycles)?
Are there (this will most likely be DB-client specific) any statements that are evaluated directly by the client and do not even go to the database server?
The result doesn't matter; if an empty statement (which produces an SQL error) is the cheapest, then that is fine too, but I am more interested in non-error responses.
Background:
I have an application that queries a lot of data from the DB. However, I do not require this data. Sadly, I have no way to skip this query, but I can change the SQL query itself. So I am trying to find the cheapest SQL statement to use; ideally it should not even go to the SQL server, and the SQL client library should answer it. I will be using MySQL.
UPDATES (on comments):
Yes, it can be a no-operation. It must be something I can pass as a regular SQL string to the MySQL client library; exactly what that string could be is the question. The goal is that this query then somehow returns nothing, using the fewest resources on the SQL server as possible. In the ideal case, the client itself will realize that this query doesn't even have to go to the server, like a version check of the client library (OK, I know this is not standard SQL then, but maybe there is something I do not know about - a statement that will be "short-circuited"/answered on the client itself).
Thanks very much!
DO 0
DO executes the expressions but does not return any results. In most respects, DO is shorthand for SELECT expr, ..., but has the advantage that it is slightly faster when you do not care about the result.

Sending a terminal/shell command from MySQL and retrieving the answer while looping a cursor

I'm using PHP with MySQL on macOS.
I would like to select a large number of emails from a database and perform a DNS lookup for each email in my selection, using a dig command from the terminal/shell, something like: dig gmail.com.
Of course, I can loop over this selection in PHP, but it will be very slow compared to looping with a cursor in MySQL.
How can I send terminal commands from MySQL and retrieve the answer on macOS?
You can't execute shell commands from within an SQL query (thank god), or else it would be a horrible security vulnerability... You would have to do it from PHP.
P.S. It is, however, possible to execute shell commands from the MySQL command-line utility:
\! ls
...but if I understand your question, it won't help solve your current problem.
(I'm assuming that you really mean ADDR_SPEC when you're talking about email addresses)
but it will be very slow compared to looping with a cursor in MySQL
No, not really. The only difference is that, depending on how you implement this, the PHP approach requires you to retrieve the entire result set before you start iterating through it. However, breaking this up into smaller result sets is trivial.
Also, the limit on the performance of your algorithm is the speed of DNS lookups - and that's all about latency - so if your objective is to make this go faster, you should be running multiple requests in parallel.
The next thing you should consider is that you've probably got multiple mailboxes for each MX, e.g. user1@gmail.com, user2@gmail.com... While, if you've got DNS caching set up properly, there will be less overhead than going to the source each time, if you're working with a very large data set or will be doing this more than once, it makes a lot more sense to just work with unique MX host values, e.g.:
SELECT DISTINCT SUBSTR(addr_spec FROM LOCATE('@', addr_spec)) AS mx2chk
FROM yourtable
WHERE addr_spec LIKE '%@%'
  AND (email_checked IS NULL
       OR email_checked < NOW() - INTERVAL 300 DAY);
Indeed, if you're flagging the data, then you can use your own database to verify the MX.
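For example, a hedged sketch of that flagging, reusing the hypothetical email_checked column from the query above:
-- Mark every address on a domain whose MX has just been verified as checked now
UPDATE yourtable
SET email_checked = NOW()
WHERE SUBSTR(addr_spec FROM LOCATE('@', addr_spec)) = '@gmail.com';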
using a dig command from terminal/shell
Please don't tell me that you're running a shell from a PHP controlling process to do a DNS lookup?

MySQL C API multiple statements

So I'm building a C program that connects to a MySQL database. Everything worked perfectly. Then, to save on the number of queries, I decided that I would like to execute 10 statements at a time. I set the CLIENT_MULTI_STATEMENTS flag on the connection and separated my statements with semicolons.
When I execute the first batch of 10 statements, it succeeds and mysql_real_query() returns a 0.
When I try the second batch though, it returns 1 and doesn't work. Nowhere can I find what this error code 1 means, so I was hoping someone may have run into this problem before.
Please note that these are all UPDATE statements, so I have no need for result sets; it's just several straight-up calls to mysql_real_query().
It's not clear from the documentation whether the errors this function can cause are returned or not, but it should be possible to obtain the actual error using mysql_error().
My guess is that you still have to loop through the result sets whether you're interested in them or not.
Are these prepared statements? If that's the case then you can't use CLIENT_MULTI_STATEMENTS.
Also, note (from http://dev.mysql.com/doc/refman/5.5/en/c-api-multiple-queries.html) that:
After handling the result from the first statement, it is necessary to check whether more results exist and process them in turn if so. To support multiple-result processing, the C API includes the mysql_more_results() and mysql_next_result() functions. These functions are used at the end of a loop that iterates as long as more results are available. Failure to process the result this way may result in a dropped connection to the server.
You have to walk over all the results, regardless of whether or not you care about the values.

Is there a way to filter a SQL Profiler trace?

I'm trying to troubleshoot this problem using SQL Profiler (SQL Server 2008).
After a few days running the trace in production, the error finally happened again, and now I'm trying to diagnose the cause. The problem is that the trace has 400k rows, 99.9% of which come from "Report Server", which I don't even know why it's running, but it seems to be pinging SQL Server every second...
Is there any way to filter out some records from the trace, so I can look at the rest?
Can I do this with the current .trc file, or will I have to run the trace again?
Are there other applications for looking at the .trc file that provide this functionality?
You can load a captured trace into SQL Server Profiler: Viewing and Analyzing Traces with SQL Server Profiler.
Or you can load it into a tool like ClearTrace (free version) to perform workload analysis.
You can also load it into a SQL Server table, like so:
SELECT * INTO TraceTable
FROM ::fn_trace_gettable('C:\location of your trace output.trc', default)
Then you can run a query to aggregate the data such as this one:
SELECT
    COUNT(*) AS TotalExecutions,
    EventClass,
    CAST(TextData AS nvarchar(2000)),
    SUM(Duration) AS DurationTotal,
    SUM(CPU) AS CPUTotal,
    SUM(Reads) AS ReadsTotal,
    SUM(Writes) AS WritesTotal
FROM TraceTable
GROUP BY
    EventClass,
    CAST(TextData AS nvarchar(2000))
ORDER BY ReadsTotal DESC
Also see: MS SQL Server 2008 - How Can I Log and Find the Most Expensive Queries?
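Once the trace is in a table you can also simply filter out the noise mentioned in the question; a minimal sketch, assuming the trace captured the ApplicationName and StartTime columns:
SELECT *
FROM TraceTable
WHERE ApplicationName NOT LIKE 'Report Server%'
ORDER BY StartTime;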
It is also common to set up filters on the trace before starting it. For example, a commonly used filter is to limit capture to events that require more than a certain number of reads, say 5000.
Load the .trc locally, then save it to a table in a local DB, and then query to your heart's content.
These suggestions are great for an existing trace - if you want to filter the trace as it occurs, you can set up event filters on the trace before you start it.
The most useful filter in my experience is application name. To do this you have to ensure that every connection string used to connect to your database has an appropriate Application Name value in it, e.g.:
"...Server=MYDB1;Integrated Authentication=SSPI;Application Name=MyPortal;..."
Then, in the trace properties for a new trace, select the Events Selection tab and click Column Filters...
Select the ApplicationName filter and add values to LIKE to include only the connections you have indicated, e.g. using MyPortal in the LIKE field will only include events for connections that have that application name.
This will stop you from collecting all the crud that Reporting Services generates, for example, and make subsequent analysis a lot faster.
There are a lot of other filters available as well, so if you know what you are looking for, such as long execution (Duration) or large IO (Reads, Writes) then you can filter on that as well.
Since SQL Server 2005, you can filter a .trc file's contents directly from SQL Profiler, without importing it into a SQL table. Just follow the procedure suggested here:
http://msdn.microsoft.com/en-us/library/ms189247(v=sql.90).aspx
An additional hint: you can use '%' as a filter wildcard. For instance, if you want to filter on HOSTNAME values starting with SRV, you can use SRV%.
Here you can find a complete script to query the default trace with the complete list of events you can filter:
http://zaboilab.com/sql-server-toolbox/anayze-sql-default-trace-to-investigate-instance-events
You have to query sys.fn_trace_gettable(@TraceFileName, default), joining sys.trace_events to decode the event numbers.
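A minimal sketch of that pattern, reading the default trace path from sys.traces:
DECLARE @TraceFileName NVARCHAR(260);
SELECT @TraceFileName = path FROM sys.traces WHERE is_default = 1;

SELECT te.name AS EventName,
       t.ApplicationName,
       t.HostName,
       t.StartTime,
       t.TextData
FROM sys.fn_trace_gettable(@TraceFileName, DEFAULT) AS t
JOIN sys.trace_events AS te
    ON t.EventClass = te.trace_event_id
ORDER BY t.StartTime DESC;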