We need a way to load a relationship explicitly, outside of a query, from a given set of cached values.
Our query is quite complicated and has a few explicit joinedload(..) options. Sadly, using too many of those slows the whole query down, so we started using the subqueryload(..) technique instead. However, this does not work as we hoped: subqueryload emits a second query that applies a DISTINCT clause to the first query, which is quite costly. What we are trying to do instead is build the set of relationship keys we need to load once the first query has finished running. Once we get back the result of this second query, how do we tell SQLAlchemy to associate the result of the first query with the result of the second query?
Here is a snippet showing what we do for now (it works, but it is quite ugly):
from sqlalchemy.orm import joinedload, lazyload

result = session.query(tableA).options(lazyload('*'), joinedload(tableB)).all()
relationship_keys = set()
for r in result:
    relationship_keys.add(r.tableC_id)
cache_relationships = session.query(tableC).filter(tableC.id.in_(relationship_keys)).all()
# Link the instances together. This will not emit SQL, as it hits the cache previously loaded, via session.get(..)
[r.tableC_relationship for r in result]
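The two-step pattern described above can be sketched without any ORM. This is only an illustration using Python's standard-library sqlite3, not SQLAlchemy; the table names (table_a, table_c), their columns, and the helper load_with_related are hypothetical. It shows the shape of the idea: collect the distinct keys from the first result, fetch them with a single IN query, and attach the related rows by key.

```python
import sqlite3


def load_with_related(conn):
    """Load table_a rows, then attach table_c rows using one IN query."""
    rows_a = conn.execute(
        "SELECT id, table_c_id FROM table_a ORDER BY id"
    ).fetchall()

    # Step 1: collect the distinct foreign keys seen in the first result.
    keys = sorted({c_id for _, c_id in rows_a if c_id is not None})

    # Step 2: fetch all related rows at once (one placeholder per key).
    related = {}
    if keys:
        placeholders = ", ".join("?" for _ in keys)
        sql = f"SELECT id, name FROM table_c WHERE id IN ({placeholders})"
        related = {c_id: name for c_id, name in conn.execute(sql, keys)}

    # Step 3: associate each first-query row with its related row by key.
    return [(a_id, related.get(c_id)) for a_id, c_id in rows_a]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_c (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, table_c_id INTEGER);
    INSERT INTO table_c VALUES (1, 'red'), (2, 'blue'), (3, 'unused');
    INSERT INTO table_a VALUES (10, 1), (11, 2), (12, 1), (13, NULL);
""")
result = load_with_related(conn)
```

Only the keys that actually appear in the first result are requested, so the second query does not need to repeat or wrap the first one.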
My customer is using MSAccess to read SQL Server Data.
Originally they created a linked table to the SQLServer base table,
then in Access, created a Query that aggregated and filtered.
Select f1,f2,sum(f3),sum(f4)
From linkedtablename
where fx = 'somevalue'
group by f1,f2
For security and performance reasons, I built a simple query in MSSQL to do the filtering and aggregating,
and asked the user to point to that instead, with a passthrough query.
So now they have an ODBC 'passthroughquery' doing a 'select * from MSSQLview'.
However, MSAccess seems to be really struggling when we do anything with this passthrough.
e.g. Adding the passthrough to a new MSAccess query window takes forever.
Seems as if Access is doing some heavy reading of the source or source metadata each time we interact with it.
Running a select against the passthrough is also taking an age ... but with the aggregating being done by MSSQL, it should be a lot faster !?
So the question is, why does MSAccess struggle so much ?
Is Access trying to profile the source data even without an explicit 'select' being done ?
Or is it trying to read metadata every time we interact with the Passthrough ?
Ultimately, I am hoping that there is some configuration setting that will force Access to treat this like a 'standard' table !
If you use a PT query, keep in mind that if you use this as a source for a client query, then ZERO additional filtering can be used, or will work.
In other words?
A PT query is one of the fastest (high performance) ways to pull data. BUT ONLY if you do NOT attempt to add additional filtering. A PT query cannot be filtered by client side (well, it can, but ONLY after pulling the PT full query source).
As a result?
Use a linked view. They perform JUST as well as a stored procedure and a PT query, but they can and do respect client side filtering. So, for example you can build a client side query against that linked view with criteria, and ONLY the records matching that query are pulled down the network pipe.
This seems somewhat counterintuitive: PT queries are fast, but they DO NOT respect additional filters client side (to be specific, you can filter against a PT query, but Access ONLY does so after pulling all records in the PT query). So, one would do VERY well to avoid using a PT query to fill, say, a combo box, or anything else client side that attempts to apply additional filtering and criteria.
To be crystal clear:
A PT query is great, but ONLY if you are going to have the PT query do the filtering in the first place. Additional filtering can be done, but that assumes the original PT query did not pull a lot of data in the first place. So the rows a PT query pulls are what you WILL GET client side.
In 99% of cases, you are FAR better off putting that query in a linked view, and thus you are free to filter and add criteria to that view (even client side), and ONLY records meeting those criteria are pulled down the network pipe. This includes using a client side query on that linked view. It also includes basing a report on that linked view where, say, you have VBA that provides a "where" clause to that report. (In this case, once again, Access will ONLY pull records based on those criteria. If you use a PT query for the report and attempt to filter, that filtering ONLY occurs AFTER all PT records have been pulled.)
So, PT queries cannot effectively REDUCE bandwidth requirements to ANY lower than what the PT query returns in the first place. However, linked views DO allow and DO respect additional filters applied, even when done client side. As a result, a PT query is not all that useful unless the PT query has the criteria pre-defined and known ahead of time.
So, I would strongly suggest you try/test/use a linked view.
In other words, put that sum and group by in a server side view and link to that.
Edit: and you CAN add the where clause client side against that view.
However, because fx is not a column returned by the view, you have to either add that column to the view, or create a stored procedure and use it this way:
CREATE PROCEDURE [dbo].[GetMyGroupSum]
    @fx nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT f1, f2, SUM(f3), SUM(f4)
    FROM dbo.TheTableToQueryOn
    WHERE fx = @fx
    GROUP BY f1, f2
END
Now, create a PT query in Access. Your code will then look like:
dim strFX as string
strFX = "somevalue"
currentdb.querydefs("MyPtQuery").sql = "EXEC GetMyGroupSum '" & strFX & "'"
You are now free to use the above query for a report, code or even launch a form based on that PT query.
In MOST cases you are better off using a view. Here, because the query does NOT return the column you need to filter on, a PT query is the solution, but in most cases it is not.
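The other option mentioned earlier in this answer, adding the fx column to the view so that it can be filtered afterwards, can be sketched as follows. This is only an illustration that uses Python's built-in sqlite3 in place of SQL Server and Access; the names source_table, v_group_sum, sum_f3 and sum_f4 are hypothetical. Because fx is part of the view's GROUP BY and select list, a where clause can be applied against the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (f1 TEXT, f2 TEXT, f3 INTEGER, f4 INTEGER, fx TEXT);
    INSERT INTO source_table VALUES
        ('a', 'x', 1, 10, 'somevalue'),
        ('a', 'x', 2, 20, 'somevalue'),
        ('a', 'x', 5, 50, 'othervalue'),
        ('b', 'y', 3, 30, 'somevalue');

    -- fx is exposed by the view, so callers can filter on it later.
    CREATE VIEW v_group_sum AS
        SELECT f1, f2, fx, SUM(f3) AS sum_f3, SUM(f4) AS sum_f4
        FROM source_table
        GROUP BY f1, f2, fx;
""")

# The filter is applied against the view, not baked into it.
rows = conn.execute(
    "SELECT f1, f2, sum_f3, sum_f4 FROM v_group_sum "
    "WHERE fx = ? ORDER BY f1, f2",
    ("somevalue",),
).fetchall()
```

For a given fx value this returns the same groups as the original query with the hard-coded where clause, while leaving the value to be chosen by whoever queries the view.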
I'm using a servlet to make the jdbc connection, write the PreparedStatements and execute ResultSets. I am able to display the data into a webpage just fine, however I also want to be able to count the number of entries. I know there are other ways to count how many rows I have using java code, but I want to use SQL statements and I saw this
SELECT COUNT(*) FROM table_name;
and made a PreparedStatement and tried to execute it. However, it does not return the value of the count; I instead get
"com.mysql.jdbc.JDBC42ResultSet@4a9b1e8b" or "com.mysql.jdbc.JDBC42PreparedStatement@4a9b1e8b" (because I tried getting the value of the count using both).
Basically, I am wondering how to get the count value into my HTML table from the servlet, not the long strings above.
Many thanks, I'm a beginner.
When you execute your SQL query with JDBC, you get a ResultSet, even when the result is a single record with a single field, as in your question.
You need to move to the first row with next() and then call the getInt or getLong method of your ResultSet to get the actual value.
rs.next(); // position the cursor on the single result row
long countValue = rs.getLong(1);
Have a look at Oracle's documentation on JDBC
You can also have a look at this post on SO
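For comparison only, here is the same idea sketched in Python with the standard-library sqlite3 module rather than JDBC: executing the statement gives back a cursor-like object, not the number, so the row has to be fetched and its first column read. The table name entries is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entries (title) VALUES (?)", [("a",), ("b",), ("c",)]
)

cursor = conn.execute("SELECT COUNT(*) FROM entries")
# Printing `cursor` here would show an object description, not the count.
row = cursor.fetchone()   # fetch the single result row
count_value = row[0]      # its first (and only) column holds the count
```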
I have a job in Talend that is designed to bring together data from two different databases: one is a MySQL database and the other an MSSQL database.
What I want to do is match a selection of loan numbers from the MySQL database (about 82,000 loan numbers) to the corresponding information we have housed in the MSSQL database.
However, the tables in MSSQL to which I am joining the data from MySQL are much larger (~ 2 million rows), are quite wide, and thus cost much more time to query. Ideally I could perform an inner join between the two tables based on the loan number, but since they are in different databases this is not possible. The inner join that is performed inside a tMap occurs after the Lookup input has already returned its data set, which is quite large (especially since this particular MSSQL query will execute a user-defined function for each loan number).
Is there any way to create a global variable out of the output from the MySQL query (namely, the loan numbers selected by the MySQL query) and use that global variable as an IN clause in the MSSQL query?
This should be possible. I'm not working in MySQL but I have something roughly equivalent here that I think you should be able to adapt to your needs.
I've never actually answered a Stackoverflow question and while I was typing this the page started telling me I need at least 10 reputation to post more than 2 pictures/links here and I think I need 4 pics, so I'm just going to write it out in words here and post the whole thing complete with illustrations on my blog in case you need more info (quite likely, I should think!)
As you can see, I've got some data coming out of the table and getting filtered by tFilterRow_1 to only show the rows I'm interested in.
The next step is to limit it to just the field I want to use in the variable. I've used tMap_3 rather than a tFilterColumns because the field I'm using is a string and I wanted to be able to concatenate single quotes around it, but if you're using an integer you might not need to do that. And of course, if you have a lot of repetition you might also want to get a tUniqRow in there as well to avoid unnecessary duplicates.
The next step is the one that does the magic. I've got a list like this:
'A1'
'A2'
'B1'
'B2'
etc, and I want to turn it into 'A1','A2','B1','B2' so I can slot it into my where clause. For this, I've used tAggregateRow_1, selecting "list" as the aggregate function to use.
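As a rough illustration of what this step produces (plain Python, not Talend; the function name build_in_list is hypothetical), the values are de-duplicated, wrapped in single quotes and joined with commas into one string:

```python
def build_in_list(values):
    """Return a comma-separated, single-quoted list for use inside IN (...)."""
    seen = []
    for value in values:
        if value not in seen:   # drop repeats, keep first-seen order
            seen.append(value)
    # Double any embedded single quote so each SQL literal stays valid.
    return ",".join("'" + v.replace("'", "''") + "'" for v in seen)


in_list = build_in_list(["A1", "A2", "B1", "A1", "B2"])
```

The resulting string is what gets stored in the context variable and slotted between the parentheses of the where clause.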
Next up, we want to take this list and put it into a context variable (I've already created the context variable in the metadata - you know how to do that, right?). Use another tMap component, feeding into a tContextLoad component. tContextLoad always has two columns in its schema, so map the output of the tAggregateRow to the "value" column and enter the name of the variable in the "key" column. In this example, my context variable is called MyList.
Now your list is loaded as a text string and stored in the context variable ready for retrieval. So open up a new input and embed the variable in the sql code like this
"SELECT distinct MY_COLUMN
from MY_SECOND_TABLE where the_selected_row in ("+
context.MyList+")"
It should be as easy as that, and when I whipped it up it worked first time, but let me know if you have any trouble and I'll see what I can do.
I have joined 5 tables and applied transformations to them, which leaves me with a single table at the end. Now I want to run a SQL query on this single table to filter records, but I don't know how to run a simple SQL query against it. I have attached a snapshot which shows the resulting table. How do I get this resulting data set as a source? I want to populate my destination after filtering this data.
I am using SSIS 2008.
Click here to see the Table on which I want to perform a simple sql query
SELECT * FROM `first_table`
WHERE `some_column` =
(
    SELECT *
    FROM `second_table`
    WHERE `some_column2` = 'something'
    LIMIT 1
)
Try this code; it should help. You can even use this to connect all those four tables with each other.
From the image you posted, it looks like you have a set of data in the data flow that you're trying to query against. You need to do one of two things at this point. Either you insert the data into a table in the database and use another data flow to query it, or you use a conditional split (or multicast and conditional splits) to filter the rows down further from there.
Without more detail about what you're actually trying to accomplish, these are the recommendations I can determine.
You could send the rows into a record set destination, but you aren't able to query it like a regular table and you'd need some C#/VB skills to access it to do more than a FOR EACH loop.
Assuming the SQL query that you want to run against the resulting table is simple, you can use a Script Component. By simple, I mean a query of this nature:
SELECT * FROM T WHERE a = 'zz' and b = 'XX' etc.
However, if your query has self joins, then you would be better off dumping the outcome of joining those 5 tables into a physical table and going from there.
It appears that query is going to be real straight-forward; in that case using a script component would be helpful.
A separate question: It's advisable to do the sorting at the database level. You are using 5 sort tasks in your solution. Can you please elucidate the reason?
I need to do a count on the items in a joined result set where a condition is true. I thus have a "from join where where" type of expression. This expression must end with a select or groupby. I do not need the column data actually and figure it is thus faster not to select it:
count = (from e in dc.entries select new {}).Count();
I have 2 questions:
1. Is there a faster way to do this in terms of the db load?
2. I have to duplicate my entire copy of the query. Is there a way to structure my query where I can have it in one place for both counts and for getting say a list with all fields?
Thanks.
Please pay special attention:
1. The query is a join and not a simple table, thus I must use a select statement.
2. I will need 2 different query bodies because I do not need to load all the actual fields for the count, but will for the list.
I assume when I use the select query it is filling up with data when I use query.Count vs Table.Count. Look forward to those who understand what I'm asking for possible better ways to do this and some detailed knowledge of what actually happens. I need to pull out the logging to look into this deeper.
Queryable.Count
The query behavior that occurs as a result of executing an expression tree that represents calling Count<TSource>(IQueryable<TSource>) depends on the implementation of the type of the source parameter. The expected behavior is that it counts the number of items in source.
In fact, if you use LinqToSql or LinqToEntities, Queryable.Count() is translated to SQL and executed in the database. No columns are loaded into memory. Check the generated SQL to confirm.
"I assume when I use the select query it is filling up with data when I use query.Count vs Table.Count"
This is not true. Check the generated SQL to confirm.
"I have to duplicate my entire copy of the query. Is there a way to structure my query where I can have it in one place for both counts and for getting say a list with all fields"
If you need both the count and the list, get the list and count it.
If you need the count sometimes and the list at other times, write a method that returns the complex IQueryable, and call .Count() in some places and .ToList() in others.
"I do not need the column data actually and figure it is thus faster not to select it."
This is basically false in your scenario. It can be true in a scenario where an index covers the result columns, but you don't have any result columns.
In your scenario, whatever index is chosen by the query optimizer, that index can be used to make the count.
To sum up: the query optimizer will perform the optimization you desire.
// You can put a where condition here
var queryEntries = from e in dc.entries select e;
// Get the count
int count = queryEntries.Count();
// Loop through the entries, so you basically return all entries
foreach (entry en in queryEntries)
{
}