Excluding Content From SQL Bulk Insert - sql-server-2008

I want to import my IIS logs into SQL Server for reporting using BULK INSERT, but the comment lines - the ones that start with a # - cause a problem because those lines do not have the same number of fields as the data lines.
If I manually delete the comments, I can perform a bulk insert.
Is there a way to perform a bulk insert while excluding lines based on a match, such as: any line that begins with a "#"?
Thanks.

The approach I generally use with BULK INSERT and irregular data is to push the incoming data into a temporary staging table with a single VARCHAR(MAX) column.
Once it's in there, I can use more flexible decision-making tools like SQL queries and string functions to decide which rows I want to select out of the staging table and bring into my main tables. This is also helpful because BULK INSERT can be maddeningly cryptic about why and how it fails on a specific file.
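A rough sketch of that staging pattern (not from the original answer; the connection string, file path, and target table dbo.IisLogLines are placeholder assumptions):

using System.Data.SqlClient;

class StagingImport
{
    static void Main()
    {
        // Placeholders for this sketch - point these at your own server, database and log file.
        const string connectionString = "Server=.;Database=LogDb;Integrated Security=SSPI";
        const string logFile = @"C:\inetpub\logs\LogFiles\W3SVC1\u_ex140101.log";

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Stage every raw line into a single VARCHAR(MAX) column.
            new SqlCommand("CREATE TABLE #RawLog (Line VARCHAR(MAX));", conn).ExecuteNonQuery();

            // IIS W3C logs are space-delimited, so the default field terminator (tab)
            // leaves each whole line in the one column; note that the file path is
            // resolved on the SQL Server machine, not the client.
            new SqlCommand("BULK INSERT #RawLog FROM '" + logFile + "';", conn).ExecuteNonQuery();

            // Keep only the data rows (anything not starting with '#') and move them
            // into a permanent table for further parsing with string functions.
            new SqlCommand(
                "INSERT INTO dbo.IisLogLines (Line) " +
                "SELECT Line FROM #RawLog WHERE Line NOT LIKE '#%';",
                conn).ExecuteNonQuery();
        }
    }
}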
The only other option I can think of is using pre-upload scripting to trim comments and other lines that don't fit your tabular criteria before you do your bulk insert.

I recommend using logparser.exe instead. LogParser has some pretty neat capabilities on its own, but it can also be used to format the IIS log to be properly imported by SQL Server.
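As an illustration (not from the original answer; double-check the switches against LogParser's built-in help for your version), importing IIS W3C logs straight into a SQL Server table can look roughly like this:

LogParser.exe "SELECT * INTO IisLog FROM C:\inetpub\logs\LogFiles\W3SVC1\*.log" -i:IISW3C -o:SQL -server:localhost -database:LogDb -driver:"SQL Server" -createTable:ON

LogParser reads the #Fields directive itself, so the # comment lines are handled for you rather than ending up as malformed rows.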

Microsoft has a tool called "PrepWebLog" (http://support.microsoft.com/kb/296093) which strips out these hash/pound characters; however, I'm running it now (using a PowerShell script for multiple files) and am finding its performance intolerably slow.
I think it'd be faster if I wrote a C# program (or maybe even a macro).
Update: PrepWebLog just crashed on me. I'd avoid it.
Update #2: I looked at PowerShell's Get-Content and Set-Content commands but didn't like the syntax and the likely performance. So I wrote this little C# console app:
if (args.Length == 2)
{
    string path = args[0];
    string outPath = args[1];
    Regex hashString = new Regex("^#.+\r\n", RegexOptions.Multiline | RegexOptions.Compiled);
    foreach (string file in Directory.GetFiles(path, "*.log"))
    {
        string data;
        using (StreamReader sr = new StreamReader(file))
        {
            data = sr.ReadToEnd();
        }
        string output = hashString.Replace(data, string.Empty);
        using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
        {
            sw.Write(output);
        }
    }
}
else
{
    Console.WriteLine("Source and Destination Log Path required or too many arguments");
}
It's pretty quick.

Following up on what PeterX wrote, I modified the application to handle large log files, since anything sufficiently large would cause an out-of-memory exception. Also, since we're only interested in whether a line starts with a hash, we can just use the StartsWith() method on each line as it is read.
class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            string path = args[0];
            string outPath = args[1];
            string line;
            foreach (string file in Directory.GetFiles(path, "*.log"))
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
                    {
                        while ((line = sr.ReadLine()) != null)
                        {
                            if (!line.StartsWith("#"))
                            {
                                sw.WriteLine(line);
                            }
                        }
                    }
                }
            }
        }
        else
        {
            Console.WriteLine("Source and Destination Log Path required or too many arguments");
        }
    }
}

Related

RDF4J SPARQL query to JSON

I am trying to move data from a SPARQL endpoint to a JSONObject, using RDF4J.
The RDF4J documentation does not address this directly (there is some info about using endpoints, less about converting to JSON, and nothing where these two cases meet up).
So far I have:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);

Map<String, String> headers = new HashMap<String, String>();
headers.put("Accept", "SPARQL/JSON");
repo.setAdditionalHttpHeaders(headers);

try (RepositoryConnection conn = repo.getConnection())
{
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    GraphQuery query = conn.prepareGraphQuery(queryString);
    debug("Mark 2");
    try (GraphQueryResult result = query.evaluate())
this fails because "Server responded with an unsupported file format: application/sparql-results+json"
I figured a SPARQLGraphQuery should take the place of GraphQuery, but RepositoryConnection does not have a relevant prepare method.
If I exchange
try (RepositoryConnection conn = repo.getConnection())
with
try (SPARQLConnection conn = (SPARQLConnection)repo.getConnection())
I run into the problem that SPARQLConnection does not generate a SPARQLGraphQuery. The closest I can get is:
SPARQLGraphQuery query = (SPARQLGraphQuery)conn.prepareQuery(QueryLanguage.SPARQL, queryString);
which gives a runtime error as these types cannot be cast to each other.
I do not know how to proceed from here. Any help or advice is much appreciated. Thank you
this fails because "Server responded with an unsupported file format: application/sparql-results+json"
In RDF4J, SPARQL SELECT queries are tuple queries, so named because each result is a set of bindings, which are tuples of the form (name, value). In contrast, CONSTRUCT (and DESCRIBE) queries are graph queries, so called because their result is a graph, that is, a collection of RDF statements.
Furthermore, setting additional headers for the response format as you have done here is not necessary (except in rare circumstances); the RDF4J client handles this for you automatically, based on the registered set of parsers.
So, in short, simplify your code as follows:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    debug("Mark 2");
    try (TupleQueryResult result = query.evaluate()) {
        ...
    }
}
If you want to write the result of the query in JSON format, you could use a TupleQueryResultHandler, for example the SPARQLResultsJSONWriter, as follows:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    query.evaluate(new SPARQLResultsJSONWriter(System.out));
}
This will write the result of the query (in this example to standard output) using the SPARQL Query Results JSON format. If you have a non-standard format in mind, you could of course also create your own TupleQueryResultHandler implementation.
For more details on the various ways in which you can process the result (including iterating, streaming, adding to a List, or just directly sending to a result handler), see the documentation on querying a repository. As an aside, the javadoc on the RDF4J APIs is pretty extensive too, so if your Java editing environment has support for displaying that, I'd advise you to make use of it.

How to read from MySQL table Polygon data

I am developing an application for which I need location data to be stored in a MySQL table. In addition to point locations, I need regions (polygons) as well.
I am currently writing the polygon coordinates as follows:
oMySQLConnecion = new MySqlConnection(DatabaseConnectionString);
if (oMySQLConnecion.State == System.Data.ConnectionState.Closed || oMySQLConnecion.State == System.Data.ConnectionState.Broken)
{
    oMySQLConnecion.Open();
}
if (oMySQLConnecion.State == System.Data.ConnectionState.Open)
{
    string Query = @"INSERT INTO region (REGION_POLYGON) VALUES (PolygonFromText(@Parameter1))";
    MySqlCommand oCommand = new MySqlCommand(Query, oMySQLConnecion);
    oCommand.Parameters.AddWithValue("@Parameter1", PolygonString);
    int sqlSuccess = oCommand.ExecuteNonQuery();
    oMySQLConnecion.Close();
    oDBStatus.Type = DBDataStatusType.SUCCESS;
    oDBStatus.Message = DBMessageType.SUCCESSFULLY_DATA_UPDATED;
    return oDBStatus;
}
After the execution, I see the Blob in the MySQL table.
Now I want to read the data back for testing, but it does not work the way I tried below:
if (oMySQLConnecion.State == System.Data.ConnectionState.Open)
{
    string Query = @"SELECT REGION_ID, REGION_NICK_NAME, GeomFromText(REGION_POLYGON) AS POLYGON FROM region WHERE REGION_USER_ID = @Parameter1";
    MySqlCommand oCommand = new MySqlCommand(Query, oMySQLConnecion);
    oCommand.Parameters.AddWithValue("@Parameter1", UserID);
    using (var reader = oCommand.ExecuteReader())
    {
        while (reader.Read())
        {
            R_PolygonCordinates oRec = new R_PolygonCordinates();
            oRec.RegionNumber = Convert.ToInt32(reader["REGION_ID"]);
            oRec.RegionNickName = reader["REGION_NICK_NAME"].ToString();
            oRec.PolygonCodinates = reader["POLYGON"].ToString();
            polygons.Add(oRec);
        }
    }
    int sqlSuccess = oCommand.ExecuteNonQuery();
    oMySQLConnecion.Close();
    return polygons;
}
It returns an empty string.
I am not sure if I am really writing the data, since I cannot read the Blob back.
Is my reading syntax incorrect?
Note: I am using Visual Studio 2017 and the latest MySQL version with spatial classes.
Any help is highly appreciated.
Thanks
GeomFromText() takes a WKT (the standardized "well-known text" format) value as input and returns the MySQL internal geometry type as output.
This is the inverse of what you need, which is ST_AsWKT() or ST_AsText() -- take an internal-format geometry object as input and return WKT as output.
Prior to 5.6, the function is called AsWKT() or AsText(). In 5.7 these are all synonyms for exactly the same function, but the non-ST_* functions are deprecated and will be removed in the future.
https://dev.mysql.com/doc/refman/5.7/en/gis-format-conversion-functions.html#function_st-astext
I don't know for certain what the ST_ prefix means, but I assume it's "spatial type." There's some discussion in WL#8055 that may be of interest.
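Applied to the query in the question, the read side would then look roughly like this (a sketch reusing the question's table and column names; use ST_AsText() on MySQL 5.7 and AsText() on earlier versions):

string Query = @"SELECT REGION_ID, REGION_NICK_NAME,
                        ST_AsText(REGION_POLYGON) AS POLYGON
                 FROM region
                 WHERE REGION_USER_ID = @Parameter1";

MySqlCommand oCommand = new MySqlCommand(Query, oMySQLConnecion);
oCommand.Parameters.AddWithValue("@Parameter1", UserID);

using (var reader = oCommand.ExecuteReader())
{
    while (reader.Read())
    {
        // POLYGON now comes back as well-known text,
        // e.g. "POLYGON((30 10,40 40,20 40,10 20,30 10))"
        string wkt = reader["POLYGON"].ToString();
    }
}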

SSIS write DT_NTEXT into an UTF-8 csv file

I need to write the result of an SQL query into a CSV file encoded as UTF-8 (I need this encoding because there are French letters). One of the columns is too large (more than 20,000 characters), so I can't use DT_WSTR for it. The input type is DT_TEXT, so I use a Data Conversion to change it to DT_NTEXT. But then, when I want to write it to the file, I get this error message:
Error 2 Validation error. The data type for "input column" is DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead and convert the data to DT_NTEXT using the data conversion component.
Is there a way I can write the data to my file?
Thank you
I have had this kind of issue sometimes too. When working with data larger than 255 characters, SSIS sees it as blob data and will always handle it as such.
I then converted this blob stream data to readable text with a script component. After that, other transformations should be possible.
This was the case in the SSIS that came with SQL Server 2008, and I don't believe it has changed since.
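The answer doesn't show the script component itself, but the conversion can look roughly like this (a sketch; the column names LargeColumn and LargeColumnText are assumptions, standing in for your blob input column and a plain string output column added to the component):

// Inside the SSIS script component's generated ScriptMain class.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (!Row.LargeColumn_IsNull)
    {
        // Pull the raw bytes out of the DT_NTEXT/blob column...
        byte[] bytes = Row.LargeColumn.GetBlobData(0, (int)Row.LargeColumn.Length);

        // ...and decode them. DT_NTEXT is stored as UTF-16 ("Unicode").
        Row.LargeColumnText = System.Text.Encoding.Unicode.GetString(bytes);
    }
}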
I ended up doing just what Samyne says: I used a script.
First I modified my SQL stored procedure: instead of having several columns, I put all the info in one single column, as follows:
Select Column1 + '^' + Column2 + '^' + Column3 ...
Then I used this code in a script
string fileName = Dts.Variables["SLTemplateFilePath"].Value.ToString();
using (var stream = new FileStream(fileName, FileMode.Truncate))
{
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        OleDbDataAdapter oleDA = new OleDbDataAdapter();
        DataTable dt = new DataTable();
        oleDA.Fill(dt, Dts.Variables["FileData"].Value);
        foreach (DataRow row in dt.Rows)
        {
            foreach (DataColumn column in dt.Columns)
            {
                sw.WriteLine(row[column]);
            }
        }
        sw.WriteLine();
    }
}
Putting all the info in one column is optional; I just wanted to avoid handling it in the script. This way, if my SP is changed, I don't need to modify the SSIS package.

Mahout 0.7 Failed to get recommendation with a large data using MysqlJdbcDataModel

I am using Mahout to build an item-based CF recommendation engine.
I created a MahoutHelper class which has a constructor:
public MahoutHelper(String serverName, String user, String password,
        String DatabaseName, String tableName) {

    source = new MysqlConnectionPoolDataSource();

    source.setServerName(serverName);
    source.setUser(user);
    source.setPassword(password);
    source.setDatabaseName(DatabaseName);
    source.setCachePreparedStatements(true);
    source.setCachePrepStmts(true);
    source.setCacheResultSetMetadata(true);
    source.setAlwaysSendSetIsolation(true);
    source.setElideSetAutoCommits(true);

    DBmodel = new MySQLJDBCDataModel(source, tableName, "userId", "itemId",
            "value", null);
    similarity = new TanimotoCoefficientSimilarity(DBmodel);
}
and the recommend method is:
public List<RecommendedItem> recommendation() throws TasteException {
    Recommender recommender = null;
    recommender = new GenericItemBasedRecommender(DBmodel, similarity);

    List<RecommendedItem> recommendations = null;
    recommendations = recommender.recommend(userId, maxNum);
    System.out.println("query completed");

    return recommendations;
}
It uses the data source to build the data model, but the problem is that when MySQL holds only a little data (fewer than 100 rows) the program works fine for me, while when the scale grows to over 1,000,000 rows, the program gets stuck doing the recommendation and never moves forward. I have no idea how this happens. By the way, I used the same data to build a FileDataModel from a .dat file, and it took only 2~3 seconds to complete the analysis. I am confused.
Using the database directly will only work for tiny data sets, like maybe a hundred thousand data points. Beyond that, such a data-intensive application will never run quickly this way; a single recommendation can require thousands of SQL queries or more.
Instead you must load and re-load into memory. You can still pull from the database; look at ReloadFromJDBCDataModel as a wrapper.

Issue using NHibernate SchemaUpdate with MySQL - no error, but nothing happens!

I've been having some issues using SchemaUpdate with MySQL.
I seem to have implemented everything correctly, but when I run it, it doesn't update anything. It doesn't generate any errors, and it pauses for about the length of time you would expect it to take to inspect the DB schema, but it simply doesn't update anything, and when I try to get it to script the change it just doesn't do anything - it's as if it can't detect any changes to the DB schema. But I have created a new entity and a new mapping class, so I can't see why it's not picking them up.
var config = Fluently.Configure()
    .Database(() => {
        var dbConfig = MySQLConfiguration.Standard.ConnectionString(
            c => c.Server(configuration.Get<string>("server", ""))
                  .Database(configuration.Get<string>("database", ""))
                  .Password(configuration.Get<string>("password", ""))
                  .Username(configuration.Get<string>("user", ""))
        );
    });

config.Mappings(
    m => m.FluentMappings
          .AddFromAssemblyOf<User>()
          .AddFromAssemblyOf<UserMap>()
          .Conventions.AddFromAssemblyOf<UserMap>()
          .Conventions.AddFromAssemblyOf<PrimaryKeyIdConvention>()
          // .PersistenceModel.Add(new CultureFilter())
);

var export = new SchemaUpdate(config);
export.Execute(false, true);
I don't think there's anything wrong with my config, because it works perfectly well with SchemaExport - it's just SchemaUpdate where I seem to have a problem.
Any ideas would be much appreciated!
Did you try wrapping the SchemaUpdate execution in a transaction? There are some databases which need to run this in a transaction, AFAIK.
using (var tx = session.BeginTransaction())
{
    var tempFileName = Path.GetTempFileName();
    try
    {
        using (var str = new StreamWriter(tempFileName))
        {
            new SchemaExport(configuration).Execute(showBuildScript, true, false, session.Connection, str);
        }
    }
    finally
    {
        if (File.Exists(tempFileName))
        {
            File.Delete(tempFileName);
        }
    }
    tx.Commit();
}
I figured it out:
The problem is that MySQL doesn't have schemas separate from databases (in MySQL, a schema and a database are the same thing). It seems like some parts of MySQL and/or NHibernate use schemas instead, and SchemaUpdate seems to be one of them. So when I have
Database=A
in my connectionstring, and
<class ... schema="B">
in the mapping, then SchemaUpdate seems to think that this class is "for a different database" and doesn't update it.
The only fix I can think of right now would be to do a SchemaUpdate for every single schema (calling USE schema; first). But AFAIK, NHibernate has no interface to get a list of all the schemas used in the mappings (correct me if I'm wrong), so I'm afraid I'll have to iterate through the XML files manually (I use XML-based mappings) and collect them, as in the sketch below.
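For reference, that manual collection step could look something like this (just a sketch; the Mappings folder path is a placeholder, and it only inspects the schema attribute on class elements in the standard urn:nhibernate-mapping-2.2 namespace):

using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

class SchemaCollector
{
    static void Main()
    {
        XNamespace hbm = "urn:nhibernate-mapping-2.2";
        var schemas = new HashSet<string>();

        // Gather the distinct schema="" values from all hbm.xml mapping files.
        foreach (string file in Directory.GetFiles(@"C:\MyProject\Mappings", "*.hbm.xml"))
        {
            XDocument doc = XDocument.Load(file);
            foreach (XElement cls in doc.Descendants(hbm + "class"))
            {
                string schema = (string)cls.Attribute("schema");
                if (!string.IsNullOrEmpty(schema))
                {
                    schemas.Add(schema);
                }
            }
        }

        // Each schema collected here would then get its own SchemaUpdate run,
        // with the connection pointed at that schema/database first.
        foreach (string schema in schemas)
        {
            Console.WriteLine(schema);
        }
    }
}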