What is the best approach to combine multiple MySQL tables in R? For instance, I need to rbind 14 large MySQL tables (each >100k rows by 100 columns). I tried the approach below, which consumed most of my memory and then timed out on the MySQL side. Is there an alternative? I do not need to fetch the whole table; I just need to group the combined data by a couple of variables and calculate some metrics.
station_tbl_t <- dbSendQuery(my_db, "select * from tbl_r3_300ft
union all
select * from tbl_r4_350ft
union all
select * from tbl_r5_400ft
union all
select * from tbl_r6_500ft
union all
select * from tbl_r7_600ft
union all
select * from tbl_r8_700ft
union all
select * from tbl_r9_800ft
union all
select * from tbl_r10_900ft
union all
select * from tbl_r11_1000ft
union all
select * from tbl_r12_1200ft
union all
select * from tbl_r13_1400ft
union all
select * from tbl_r14_1600ft
union all
select * from tbl_r15_1800ft
union all
select * from tbl_r16_2000ft
")
Consider importing the MySQL tables one at a time and then row binding them in R. Be sure to select only the columns you need to save on overhead:
tbls <- c("tbl_r3_300ft", "tbl_r4_350ft", "tbl_r5_400ft",
"tbl_r6_500ft", "tbl_r7_600ft", "tbl_r8_700ft",
"tbl_r9_800ft", "tbl_r10_900ft", "tbl_r11_1000ft",
"tbl_r12_1200ft", "tbl_r13_1400ft", "tbl_r14_1600ft",
"tbl_r15_1800ft", "tbl_r16_2000ft")
sql <- "SELECT Col1, Col2, Col3 FROM"
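dfList <- lapply(paste(sql, tbls), function(s) {
  tryCatch(
    dbGetQuery(my_db, s),
    # On failure, warn and return NULL so the row bind below skips this table
    # (returning the error text instead would corrupt the combined data frame)
    error = function(e) {
      warning(conditionMessage(e))
      NULL
    }
  )
})
# ROW BIND VERSIONS ACROSS PACKAGES (pick one)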
master_df <- base::do.call(rbind, dfList)
master_df <- plyr::rbind.fill(dfList)
master_df <- dplyr::bind_rows(dfList)
master_df <- data.table::rbindlist(dfList)
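The same per-table pattern can be sketched outside R. The snippet below is only a minimal illustration in Python, assuming an in-memory SQLite database as a stand-in for the MySQL connection; the sample rows, the `unused` column and the `tbl_missing` name are made up for the demo.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL connection.
conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of two of the station tables.
tables = ["tbl_r3_300ft", "tbl_r4_350ft"]
for name in tables:
    conn.execute(
        f"CREATE TABLE {name} (Col1 TEXT, Col2 INTEGER, Col3 REAL, unused TEXT)"
    )
conn.executemany(
    "INSERT INTO tbl_r3_300ft VALUES (?, ?, ?, ?)",
    [("a", 1, 0.5, "x"), ("b", 2, 1.5, "y")],
)
conn.executemany(
    "INSERT INTO tbl_r4_350ft VALUES (?, ?, ?, ?)",
    [("a", 3, 2.5, "z")],
)


def fetch_needed_columns(conn, table_names):
    """Query each table separately, keeping only the needed columns."""
    rows, failed = [], []
    for name in table_names:
        try:
            # Table names come from a fixed list here, never from user input.
            cur = conn.execute(f"SELECT Col1, Col2, Col3 FROM {name}")
            rows.extend(cur.fetchall())  # the "row bind" step
        except sqlite3.Error as exc:
            failed.append((name, str(exc)))  # record the failure and keep going
    return rows, failed


# tbl_missing does not exist, to show that one failure does not stop the loop.
master_rows, failed = fetch_needed_columns(conn, tables + ["tbl_missing"])
print(len(master_rows), "rows fetched;", len(failed), "table(s) failed")
```

Keeping failures in a separate list, rather than mixing them into the results, means the combined rows stay uniform and can be bound or aggregated directly.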
I am trying to join 3 different tables that hold my test execution results as "PASS", "FAIL" and "SKIP". The 3 tables share 2 columns on which I need to combine my results: "BUILD_NUMBER" and "COMPONENT".
I tried several approaches but did not get the desired result.
The best result I have reached so far is below.
Sample query:
select test_execution.COMPONENT, test_execution.BUILD_NUMBER,
count(test_execution.TEST_STATUS) as PASS from (test_execution
INNER JOIN test_execution_fail ON
test_execution.BUILD_NUMBER = test_execution_fail.BUILD_NUMBER) group by
COMPONENT,BUILD_NUMBER;
My tables look like below:
CREATE TABLE test_execution_skip (
BUILD_NUMBER int,
TEST_NAME varchar(255),
TEST_CLASS varchar(255),
COMPONENT varchar(255),
TEST_STATUS varchar(255)
);
The other two tables are exactly the same, named test_execution and test_execution_fail.
The test_execution table holds 3 records (all pass values), the test_execution_fail table holds 2 records (all fail values) and the test_execution_skip table holds 1 record (a skip value).
I want to produce rows showing BUILD_NUMBER, COMPONENT, TOTAL, PASS, FAIL, SKIP, where TOTAL, PASS, FAIL and SKIP are the respective counts.
Any help is appreciated here.
Not sure if this answers your question, but you could try something like this:
WITH cte AS (
  -- UNION ALL keeps every row; a plain UNION would drop identical rows and undercount
  SELECT * FROM test_execution
  UNION ALL
  SELECT * FROM test_execution_fail
  UNION ALL
  SELECT * FROM test_execution_skip
)
SELECT t.*, (SKIP + FAIL + PASS) AS TOTAL FROM (
  SELECT
    COMPONENT,
    BUILD_NUMBER,
    SUM(IF(TEST_STATUS = 'skip', 1, 0)) AS SKIP,
    SUM(IF(TEST_STATUS = 'fail', 1, 0)) AS FAIL,
    SUM(IF(TEST_STATUS = 'pass', 1, 0)) AS PASS
  FROM cte
  GROUP BY COMPONENT, BUILD_NUMBER
) t
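To check the conditional-aggregation idea on the record counts described in the question (3 pass, 2 fail, 1 skip), here is a small sketch in Python. It assumes SQLite in place of MySQL, so `CASE WHEN` replaces `IF()`; the test names, component names and the lowercase result aliases are invented for the demo.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")

ddl = """CREATE TABLE {name} (
    BUILD_NUMBER int, TEST_NAME varchar(255), TEST_CLASS varchar(255),
    COMPONENT varchar(255), TEST_STATUS varchar(255))"""

# Hypothetical sample rows: 3 passes, 2 fails, 1 skip, as in the question.
sample = {
    "test_execution": [
        (1, "t1", "C", "ui", "pass"),
        (1, "t2", "C", "ui", "pass"),
        (1, "t3", "C", "api", "pass"),
    ],
    "test_execution_fail": [
        (1, "t4", "C", "ui", "fail"),
        (1, "t5", "C", "api", "fail"),
    ],
    "test_execution_skip": [(1, "t6", "C", "ui", "skip")],
}
for name, rows in sample.items():
    conn.execute(ddl.format(name=name))
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT * FROM test_execution
    UNION ALL
    SELECT * FROM test_execution_fail
    UNION ALL
    SELECT * FROM test_execution_skip
)
SELECT COMPONENT, BUILD_NUMBER,
       COUNT(*) AS total,
       SUM(CASE WHEN TEST_STATUS = 'pass' THEN 1 ELSE 0 END) AS passed,
       SUM(CASE WHEN TEST_STATUS = 'fail' THEN 1 ELSE 0 END) AS failed,
       SUM(CASE WHEN TEST_STATUS = 'skip' THEN 1 ELSE 0 END) AS skipped
FROM cte
GROUP BY COMPONENT, BUILD_NUMBER
ORDER BY COMPONENT
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Each output tuple is (COMPONENT, BUILD_NUMBER, total, passed, failed, skipped), one per component and build.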
I am trying to write a query that selects from a list of values using IN, but if the list contains the same value twice, I'd like the query to return the result twice.
To clarify, here is an example:
select id_exo, count(id_exo) FROM BLOC WHERE id_seance IN (10,10) group by id_exo
So inside the IN, I put the value 10 twice.
Here is the result:
id_exo  count(id_exo)
60      1
82      1
But in count, I'd like to get 2, since I put 10 twice inside my IN.
How can I achieve that?
SELECT id_exo, COUNT(id_exo)
FROM bloc
JOIN (SELECT 10 id_seance
UNION ALL
SELECT 10) val USING (id_seance)
GROUP BY id_exo
Prior to MySQL 8.0 you can join with a sub select:
select * from BLOC as b
inner join (
select 1 as 'id', 10 as 'value'
union
select 2,10
union
select 3,10) as myValues on myValues.value = b.id_seance
You need the id column because UNION removes duplicate rows; with UNION ALL the duplicates are kept and the extra column is unnecessary.
If you are on MySQL 8.0.19 or later, look at the VALUES statement:
https://dev.mysql.com/doc/refman/8.0/en/values.html
With it, you should instead be able to join with something like
VALUES ROW(10), ROW(10), ROW(10)
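The join-against-a-value-list approach can be tried end to end with a short sketch. It is written in Python and assumes SQLite rather than MySQL; the `count_with_repeats` helper and the sample rows are hypothetical. It builds one `SELECT ?` per requested value and glues them together with UNION ALL, so repeated values survive.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BLOC (id_exo INTEGER, id_seance INTEGER)")
conn.executemany(
    "INSERT INTO BLOC VALUES (?, ?)", [(60, 10), (82, 10), (99, 11)]
)


def count_with_repeats(conn, seance_ids):
    """Count rows per id_exo, once for every occurrence in seance_ids."""
    # One "SELECT ?" per requested value; UNION ALL keeps repeated values.
    values_sql = " UNION ALL ".join("SELECT ? AS id_seance" for _ in seance_ids)
    sql = f"""
        SELECT b.id_exo, COUNT(b.id_exo)
        FROM BLOC AS b
        JOIN ({values_sql}) AS val ON val.id_seance = b.id_seance
        GROUP BY b.id_exo
        ORDER BY b.id_exo
    """
    return conn.execute(sql, list(seance_ids)).fetchall()


# IN ignores the repetition, the join does not.
with_in = conn.execute(
    "SELECT id_exo, COUNT(id_exo) FROM BLOC "
    "WHERE id_seance IN (10, 10) GROUP BY id_exo ORDER BY id_exo"
).fetchall()
with_join = count_with_repeats(conn, [10, 10])
print(with_in, with_join)
```

The values are passed as bound parameters, so the list can come from application code without being concatenated into the SQL text.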
I have the MySQL query below:
Select ('GOOGLE.COM',
'MSN.COM',
'YAHOO.COM',
'YOUTUBE.COM',
'BING.COM',
'FACEBOOK.COM',
'LIVE.COM',
'MICROSOFT.COM',
'WIKIPEDIA.ORG',
'LINKEDIN.COM') as domain from dual;
I'm expecting the below resultset.
'GOOGLE.COM'
'MSN.COM'
'YAHOO.COM'
'YOUTUBE.COM'
'BING.COM'
'FACEBOOK.COM'
'LIVE.COM'
'MICROSOFT.COM'
'WIKIPEDIA.ORG'
'LINKEDIN.COM'
But I'm getting the error "Operand should contain 1 column(s)".
How can I fix my query so I can get correct results?
If you want one record per value, you can use union all:
select 'GOOGLE.COM' domain
union all select 'MSN.COM'
union all select 'YAHOO.COM'
union all select 'YOUTUBE.COM'
union all select 'BING.COM'
union all select 'FACEBOOK.COM'
union all select 'LIVE.COM'
union all select 'MICROSOFT.COM'
union all select 'WIKIPEDIA.ORG'
union all select 'LINKEDIN.COM'
In other RDBMS, such as Postgres or SQL Server, this would have been as simple as:
SELECT domain
FROM ( VALUES
('GOOGLE.COM'),
('MSN.COM'),
('YAHOO.COM'),
('YOUTUBE.COM'),
('BING.COM'),
('FACEBOOK.COM'),
('LIVE.COM'),
('MICROSOFT.COM'),
('WIKIPEDIA.ORG'),
('LINKEDIN.COM')
) AS t(domain);
But MySQL does not support this exact syntax (since 8.0.19 it offers VALUES ROW(...) instead). A workaround is to create a table that you can then join in your queries:
CREATE TABLE tmp (domain VARCHAR(50));
INSERT INTO tmp(domain) VALUES
('GOOGLE.COM'),
('MSN.COM'),
('YAHOO.COM'),
('YOUTUBE.COM'),
('BING.COM'),
('FACEBOOK.COM'),
('LIVE.COM'),
('MICROSOFT.COM'),
('WIKIPEDIA.ORG'),
('LINKEDIN.COM')
;
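As a rough illustration of the lookup-table workaround, the sketch below loads a list of domains into a temporary table and then joins it to another table. It is in Python and assumes SQLite instead of MySQL; the `visits` table, its `hits` column and the sample values are hypothetical.

```python
import sqlite3

domains = ["GOOGLE.COM", "MSN.COM", "YAHOO.COM"]

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMPORARY TABLE tmp (domain VARCHAR(50))")
# One bound parameter per row, so the list never has to be spliced into SQL.
conn.executemany("INSERT INTO tmp (domain) VALUES (?)", [(d,) for d in domains])

# Hypothetical table that the domain list is joined against.
conn.execute("CREATE TABLE visits (domain VARCHAR(50), hits INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)", [("MSN.COM", 7), ("EXAMPLE.ORG", 3)]
)

# One record per value, in insertion order.
rows = conn.execute("SELECT domain FROM tmp ORDER BY rowid").fetchall()

# The lookup table can then drive a join like any other table.
joined = conn.execute(
    "SELECT t.domain, COALESCE(v.hits, 0) "
    "FROM tmp AS t LEFT JOIN visits AS v ON v.domain = t.domain "
    "ORDER BY t.rowid"
).fetchall()
print(rows)
print(joined)
```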
I have 8 queries, all with the same design, each of which makes a new table for different criteria, and I would like to append them into one single table.
Is there any way to do this with VBA code or possibly UNION?
SELECT tbl_SCCMQ.CONTRACT_ACCOUNT_NUMBER, tbl_SCCMQ.BP_Partner, tbl_SCCMQ.CONTRACT_NUMBER, tbl_SCCMQ.BILL_TO_DATE, tbl_SCCMQ.CONTRACT_START_DATE, tbl_SCCMQ.AGEING_DATE, tbl_SCCMQ.DateDiff, tbl_SCCMQ.PAYMENT_TYPE, tbl_SCCMQ.BP_Type, tbl_SCCMQ.[Next Bill Due Date], tbl_SCCMQ.[BAND], tbl_SCCMQ.RAG, tbl_SCCMQ.BILL_STATUS INTO tbl_01_Resi_CCQ_R1_4_Never_Billed_NoSS
FROM tbl_SCCMQ
WHERE (((tbl_SCCMQ.BP_Type)="B2C") AND ((tbl_SCCMQ.RAG) Like "R*") AND ((tbl_SCCMQ.BILL_STATUS)="First") AND ((tbl_SCCMQ.BILL_BLOCK) Is Null) AND ((tbl_SCCMQ.BILL_LOCK) Is Null) AND ((tbl_SCCMQ.INVOICE_LOCK) Is Null));
Here are two of them,
qry_01_Resi_CCQ_R1_4_Never_Billed_NoSS
qry_02_SME_CCQ_R1_4_Never_Billed_NoSS
and I would like them all imported into the main table "Data".
I am quite new to Access and VBA etc.
Your question suggests you already know how to solve the problem.
Note: queries 1 to 8 must have the same number of fields, and the data types must be consistent for each field's ordinal position (as your question asserts).
SQL syntax to create a new table (Data) from the queries:
select *
INTO Data
from (
select * from query1
union all
select * from query2
union all
...
union all
select * from query8
) as queryData
or
SQL syntax to append data to existing table:
INSERT INTO Data
select *
from (
select * from query1
union all
select * from query2
union all
...
union all
select * from query8
) as queryData
VBA syntax to run the query in code:
Dim db As DAO.Database: Set db = CurrentDb
Dim strSQL As String
strSQL = "...." ' as above
db.Execute strSQL, dbFailOnError ' raise a run-time error if the query fails
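The append pattern itself (INSERT INTO ... SELECT ... UNION ALL ...) can be tried outside Access. The sketch below is in Python and assumes SQLite, with views standing in for the saved Access queries; `query1`, `query2`, the trimmed-down `tbl_SCCMQ` columns, the sample rows and the `append_queries` helper are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Access database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_SCCMQ (CONTRACT_NUMBER INTEGER, BP_Type TEXT, RAG TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_SCCMQ VALUES (?, ?, ?)",
    [(1, "B2C", "R1"), (2, "B2B", "R2"), (3, "B2C", "A1")],
)

# Views play the role of the saved queries; all share the same column layout.
conn.execute(
    "CREATE VIEW query1 AS "
    "SELECT * FROM tbl_SCCMQ WHERE BP_Type = 'B2C' AND RAG LIKE 'R%'"
)
conn.execute(
    "CREATE VIEW query2 AS "
    "SELECT * FROM tbl_SCCMQ WHERE BP_Type = 'B2B' AND RAG LIKE 'R%'"
)
conn.execute(
    "CREATE TABLE Data (CONTRACT_NUMBER INTEGER, BP_Type TEXT, RAG TEXT)"
)


def append_queries(conn, target, sources):
    """Append the rows of every source query to the target table."""
    # Names come from a fixed list in code, never from user input.
    union_sql = " UNION ALL ".join(f"SELECT * FROM {s}" for s in sources)
    with conn:  # commit on success, roll back if the insert fails
        conn.execute(f"INSERT INTO {target} {union_sql}")
    return conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]


total_rows = append_queries(conn, "Data", ["query1", "query2"])
appended = conn.execute("SELECT * FROM Data ORDER BY CONTRACT_NUMBER").fetchall()
print(total_rows, appended)
```

Building the UNION ALL text from a list keeps the statement manageable when there are eight sources instead of two.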
I'm wondering if there is a significant difference in performance between
select * from table where something
and
select column from table where something
SELECT * FROM table WHERE something
will retrieve all columns in that table, whereas
SELECT column FROM table WHERE something
would only retrieve that column.
This means that the latter would be faster. Whether the difference is SIGNIFICANT depends on your table size.
You can read this answer on a similar question for more info
Yes, there is a performance difference.
SELECT * FROM someTable WHERE something
will be heavier as compared to
SELECT column FROM someTable WHERE something
Because the first one will have to process all the columns while the second one will have to process only one. You should always prefer the second one.
For further detail, I would refer you to What is the reason not to use select *?
Here's a little benchmark I did to compare SELECT * vs SELECTing individual columns. The code below is simplified, with 100 iterations in the loop; in my test I queried only what I needed, 25 columns out of 34, vs SELECT *.
Results: SELECT * took on average 4.8 sec to complete, and SELECTing individual columns took 3.2 sec. This is very significant, so SELECT * is indeed much slower. However, on a smaller query from a table with 4 columns, selecting every column by name vs * gave virtually the same benchmarks. So in my tests, SELECT * has a performance impact on complex queries from big tables with a lot of columns.
$start_time = microtime(true);
for ($x = 0; $x < 100; $x++) {
$results = $dbp->multi(
"SELECT t1.id, t1.name, t2.field, t2.something, t2.somethingelse "
. "FROM table1name t1 INNER JOIN table2name t2 ON t2.id = t1.id "
. "ORDER BY title ASC LIMIT 1, 50");
}
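// Elapsed time as an integer number of milliseconds; number_format() returns a
// string (with thousands separators), which is not safe to do arithmetic on.
$ms = (int) round((microtime(true) - $start_time) * 1000);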
$end_time_sec = floor($ms / 60000) . 'm:' . floor(($ms % 60000) / 1000) . 's:' . str_pad(floor($ms % 1000), 3, '0', STR_PAD_LEFT) . 'ms';
echo "$ms ms / $end_time_sec";
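A comparable timing harness can be sketched in a few lines. The version below is in Python and assumes an in-memory SQLite table named `wide` with 34 text columns (mirroring the 34 vs 25 column setup above); the table, its contents and the `time_query` helper are made up, and the timings it prints will vary by machine and say nothing about MySQL specifically.

```python
import sqlite3
import time

# Assumption: in-memory SQLite with a made-up 34-column table.
conn = sqlite3.connect(":memory:")
n_cols = 34
cols = [f"c{i}" for i in range(n_cols)]
conn.execute(f"CREATE TABLE wide ({', '.join(c + ' TEXT' for c in cols)})")

row = tuple("x" * 50 for _ in cols)
placeholders = ", ".join("?" for _ in cols)
conn.executemany(f"INSERT INTO wide VALUES ({placeholders})", [row] * 2000)


def time_query(sql, iterations=20):
    """Run sql repeatedly; return (elapsed seconds, rows of the last run)."""
    start = time.perf_counter()
    rows = []
    for _ in range(iterations):
        rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, rows


subset = ", ".join(cols[:25])  # 25 of the 34 columns
all_time, all_rows = time_query("SELECT * FROM wide")
some_time, some_rows = time_query(f"SELECT {subset} FROM wide")

print(f"SELECT *          : {all_time:.4f} s for {len(all_rows[0])} columns")
print(f"SELECT 25 columns : {some_time:.4f} s for {len(some_rows[0])} columns")
```

Using `time.perf_counter()` and keeping the elapsed time as a number until the final print avoids the string arithmetic that formatting functions can introduce.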