Order By varbinary column that holds docx files - sql-server-2008

I'm using MS SQL Server 2008, and I have a column that stores a Word document (".docx").
Within the Word document is a definition (i.e., a term). I need to sort by those definitions when returning a dataset.
so basically...
SELECT * FROM DocumentsTable
ORDER BY DefinitionsColumn ASC
So my problem is: how can this be accomplished? The binary column only sorts on the binary value, not on the content of the Word document.
I was wondering if a full-text search/index would work. I already have that working, just not sure if I can use it with ORDER BY.
Thanks in advance.

I think you'd need to add another column and populate it with the term from inside the .docx. If it's possible at all to get SQL Server to read the .docx (maybe with a custom .NET function?), it's going to be pretty slow.
Better to populate and maintain another column.
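A minimal sketch of that approach, assuming SQL Server and hypothetical names (DefinitionText, DocumentId); the application would extract the term from the .docx whenever it writes the row:

-- One-time schema change: a plain-text column holding the definition/term.
ALTER TABLE DocumentsTable ADD DefinitionText nvarchar(400) NULL;

-- The application keeps it in sync when it stores the .docx, e.g.:
-- UPDATE DocumentsTable SET DefinitionText = @term WHERE DocumentId = @id;

-- Sorting then works on readable text instead of the varbinary value.
SELECT *
FROM DocumentsTable
ORDER BY DefinitionText ASC;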

You have a couple of options that may or may not be acceptable:
1. Store the string definition contents of the file in a field alongside the binary file column in the record.
2. Only store the string definition in the record, and build the .docx file at runtime for use within your application.

Is there an UPDATE to add commas to existing integers/decimals used as currencies in a table?

The data already exists, but since it involves large amounts, it's not very readable without commas.
I've read that this is the function to use:
(https://i.stack.imgur.com/LVoEC.png)
but how do I write a statement that modifies the existing data in a table?
As far as I know, this should be done in your frontend application. The database should only store data in its raw format. That way you can change the formatting depending on the user's locale, for example, since not all countries use "." and "," the same way in numbers.
If you just need it while you are developing your queries, and not for the end user, check whether your SQL client has an option to format the output; it is probably not something to do in the SQL query itself.
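If you do decide to format at query time for display only (leaving the stored values numeric), here is a minimal sketch, assuming SQL Server and a hypothetical Invoices table with a numeric Amount column:

SELECT
    Amount,
    FORMAT(Amount, 'N2') AS AmountWithCommas            -- SQL Server 2012+
    -- On older versions: CONVERT(varchar(32), CAST(Amount AS money), 1)
FROM Invoices;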

Google-BigQuery - schema parsing of CSV file

We are using the Java API to load a CSV file into Google BigQuery. Is there a way to detect the columns on load and auto-select the appropriate schema type?
For example, if a specific column contains only float values, BigQuery would assign it the FLOAT type; if it is non-numeric, it would assign STRING. Is there a method to do this?
The roundabout way is to assign each column as string by default when loading the CSV.
Then do a query on each column -
SELECT COUNT(columnname) - COUNT(FLOAT(columnname)) FROM dataset.table
(assuming I am only interested in isolating columns that have "float values" that I can use for math functions from my application)
Any other method to solve this problem?
Right now, BigQuery does not support schema inference, so as you suggest, your options are:
Provide the schema explicitly when loading data.
Load all data using the string type, and cast/convert at query time.
Note that you can use the allowLargeResults feature to clean up and rewrite your imported data (though you'll be charged for the query, which will increase your data ingestion costs).
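A minimal sketch of the cast-at-query-time approach, assuming BigQuery standard SQL and a hypothetical table mydataset.imported in which every column was loaded as STRING (SAFE_CAST returns NULL instead of erroring on non-numeric values):

SELECT
  SAFE_CAST(price AS FLOAT64) AS price,       -- NULL where the string is not numeric
  SAFE_CAST(quantity AS INT64) AS quantity,
  description                                 -- left as STRING
FROM mydataset.imported;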
For the record, schema auto-detect is now supported: https://cloud.google.com/bigquery/federated-data-sources#auto-detect

Is there a way to get only the numeric elements of a string in mysql?

I'm looking to make it easier for clients to search for things like phone/mobile/fax numbers. For that to happen, I want to strip both the search value and the relevant columns in my database of any non-numeric characters before comparing them. I'm using these functions to get only the numeric elements of the strings in MySQL, but they slow my queries down to a crawl when I use them.
Is there any way to do it without blowing my runtimes sky-high?
The reason your query times are exploding is that using such functions prevents any index from being used. Since you are not searching directly on a field, but on the output of a function, there is no way MySQL can use an index to execute the query.
This is in addition to the fact that you have to compute the function output for each record.
The best way around these runtimes, if you have the access and permission to do so, is to add a new column containing the stripped content you're filtering on. Add a write (INSERT/UPDATE) trigger to fill the column with the stripped value, and run a one-time script that backfills the field for all existing records. Add an index on the new column. Then, in your application, search against the new column when looking up a telephone number. The downsides are the table schema alteration and the added code in the business logic and/or data-abstraction layer.
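A minimal sketch of that approach, assuming MySQL, a hypothetical contacts table with a phone column, and the stripping function from the question (called strip_non_digits here):

ALTER TABLE contacts ADD COLUMN phone_digits VARCHAR(32);

-- One-time backfill of existing rows.
UPDATE contacts SET phone_digits = strip_non_digits(phone);

-- Keep the column in sync on writes.
CREATE TRIGGER contacts_bi BEFORE INSERT ON contacts
FOR EACH ROW SET NEW.phone_digits = strip_non_digits(NEW.phone);

CREATE TRIGGER contacts_bu BEFORE UPDATE ON contacts
FOR EACH ROW SET NEW.phone_digits = strip_non_digits(NEW.phone);

CREATE INDEX idx_contacts_phone_digits ON contacts (phone_digits);

-- The application strips the search term itself, so the query becomes a plain, indexable comparison:
-- SELECT * FROM contacts WHERE phone_digits = '0123456789';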

Search in a field with html entities

Our customer's data (SQL Server 2005) has HTML entities in it (&eacute; instead of é).
We need to search inside those fields, so a search for "équipe" must find "&eacute;quipe".
We can't change the data, because our customer's customers can edit those fields at will (with an HTML editor), so if we remove the entities, they might reappear on the next edit, and the problem will still be there.
We can't use a .NET server-side function, because we need to find the rows before they are returned to the server.
I would use a function that replaces the entities with their UTF-8 counterparts, but it's kind of tiresome, and I think it would seriously hurt search performance (something about forcing a full table scan, if I recall correctly).
Any idea?
Thanks
You would only need to examine and encode the incoming search term.
If you convert "équipe" to "&eacute;quipe" and use that in your WHERE/FTS clause, then any index on that field can still be used, if the optimizer deems it appropriate.
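A minimal sketch of encoding the incoming term on the SQL side, assuming a hypothetical Articles table with a full-text-indexed Body column; only the é entity is mapped here, and a real implementation would translate the full entity set (most likely in application code before building the query):

DECLARE @term nvarchar(100) = N'équipe';

-- Encode the term the same way the stored data is encoded.
DECLARE @encoded nvarchar(200) = REPLACE(@term, N'é', N'&eacute;');

-- Quote it as a phrase so CONTAINS does not treat & as an operator.
DECLARE @condition nvarchar(220) = N'"' + @encoded + N'"';

SELECT *
FROM Articles
WHERE CONTAINS(Body, @condition);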

Trim before destination write in SSIS?

I have a data flow with a DB source and a flat-text-file destination (delimited by the pipe character '|').
The DB source is picking up the SQL query from a variable.
The problem is that if my DB field sizes for, say, firstname and lastname are 30 characters, I get output like this (spaces represented by dots):
saurabh......................|kumar.......................
What I need is for the fields to be trimmed, so that the actual output is:
saurabh|kumar
I have more than 40 columns to write, and I would not want to manually insert RTRIM after every column in my BIG SQL query :(
I should add that the source can have up to 50,000 rows returned. I was thinking of putting a script component in between, but processing every row might have a performance impact.
Any ideas?
You have quite a few options, but some will obviously be undesirable or impossible to do because of your situation.
First, I'll assume that trailing spaces in the data are because the data types for the source columns are CHAR or NCHAR. You can change the data types in the source database to VARCHAR or NVARCHAR. This probably isn't a good idea.
If the data types in the source data are VARCHAR or NVARCHAR and the trailing spaces are in the data, you can update the data to remove the trailing spaces. This is probably not appealing either.
So, you have SSIS and the best place to handle this is in the data flow. Unfortunately, you must develop a solution for each column that has the trailing spaces. I don't think you'll find a quick and simple "fix all columns" solution.
You can do the data trimming with a script transformation, but you must write the code to do the work. Or, you can use a Derived Column transformation component. In the Derived Column transformation you would add a derived column for each column that needs trimming. For example, you would have a firstname column and a lastname column. The derived column value would replace the existing column value.
In the Derived Column transformation you would use SSIS expression syntax to trim the data. The firstname and lastname trim expressions would be
RTRIM(firstname)
RTRIM(lastname)
Performance will probably be better for the Derived Column transformation, but it may not differ much from the script solution. However, the Derived Column transformation will probably be easier to read and understand later.
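If you would rather keep the trimming in the variable-driven source query, another option (just a sketch, assuming a SQL Server source and a hypothetical dbo.People table) is to generate the RTRIM-wrapped SELECT list from the catalog instead of typing it for all 40 columns:

DECLARE @cols nvarchar(max) = N'';

-- Wrap every character column in RTRIM(); pass other columns through unchanged.
SELECT @cols = @cols
    + CASE WHEN @cols = N'' THEN N'' ELSE N', ' END
    + CASE WHEN DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar')
           THEN N'RTRIM(' + QUOTENAME(COLUMN_NAME) + N') AS ' + QUOTENAME(COLUMN_NAME)
           ELSE QUOTENAME(COLUMN_NAME)
      END
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo' AND TABLE_NAME = N'People';

-- Paste the generated statement into the SSIS query variable.
SELECT N'SELECT ' + @cols + N' FROM dbo.People' AS GeneratedQuery;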
You could try using a script component in the data flow. Unlike the control flow, a data-flow script component has inputs and outputs.
Look at this example in MSDN: http://msdn.microsoft.com/en-us/library/ms345160.aspx
If you can iterate over each column of the row (?) as it flows through the script component, you could do a .NET Trim on the column's data, then pass the trimmed row to the output.
Advantage there of course is it will trim future rows you add later.
Just an idea, I haven't tried this myself. Do post back if it works.
See this:
http://masstrimmer.codeplex.com
It will trim rows by using parallelism.