Jdbc river stops on MapperParsingException - mysql

I am using Elastic search version 1.2.0, Jdbc river version 1.2.0.1.
Following is my Jdbc river command.
curl -XPUT 'localhost:9200/_river/tbl_messages/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"strategy" : "simple",
"url" : "jdbc:mysql://localhost:3306/messageDB",
"user" : "username",
"password" : "password",
"sql" : "select messageAlias.id as _id,messageAlias.subject as subject from tbl_messages messageAlias",
"index" : "MessageDb",
"type" : "tbl_messages",
"maxbulkactions":1000,
"maxconcurrentbulkactions" : 4,
"autocommit" : true,
"schedule" : "0 0-59 0-23 ? * *"
}
}'
Subject column's index meta data
subject: {
type: string
}
This table has 2 Million records and subject field contains arbitrary strings. Some sample data are "You're invited ","{New York:45} We rock!!","{Invitation:27}" so on.
My problem is that when jdbc river encounters one such record with {anything inside of this}, It stalls the river and throws parsing exception. It never moves on to index next records.
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [subject]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:537)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:479)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:394)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:413)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:534)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:433)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property [Inivitation]
at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateFieldForString(StringFieldMapper.java:332)
at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateField(StringFieldMapper.java:278)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 12 more
Deleting this record in db,clearing data inside ES_HOME/data and recreating the river seems to be the only way to proceed until it encounter the above said formatted record again.
How do I make it to continue indexing irrespective of exception when parsing few records?

It is related to Elastic search and not the river.
https://github.com/jprante/elasticsearch-river-jdbc/issues/258
https://github.com/elasticsearch/elasticsearch/issues/2898

Related

Kafka JDBC Sink Connector Not Consuming Events from Remote Cluster

Standalone Kafka JDBC sink connector not inserting data into mysql database (Using confluent-community-2.12)
I've installed confluent-community-2.12 on centos7 and started the standalone jdbc sink connector to consume records from a remote kafka cluster. I can consume the records via a simple java consumer and see the record data however, when I start the sink connector it loads and connects to the remote cluster fine but just stops at the following log INFO
[2019-05-01 11:10:21,619] INFO Initializing writer using SQL dialect: MySqlDatabaseDialect (io.confluent.connect.jdbc.sink.JdbcSinkTask:57)
[2019-05-01 11:10:21,620] INFO WorkerSinkTask{id=sink-mysql-standalone-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:301)
Even after starting the connector if I produce any data to the cluster, at sink connector nothing happens.
I tried to fetch the result of the connector status using:
curl localhost:8083/connectors/sink-mysql-standalone/status
the result is as follows:
{"name":"sink-mysql-standalone","connector":{"state":"RUNNING","worker_id":"10.3.0.40:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"10.3.0.40:8083"}],"type":"sink"}
sink.properties:
name=sink-mysql-standalone
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=my-topic
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
# JDBCSink connector specific configuration
connection.url=jdbc:mysql://10.3.0.37:3306/mydb?zeroDateTimeBehavior=convertToNull&useUnicode=yes&characterEncoding=UTF-8
connection.user=myuser
connection.password=mypassword
insert.mode=upsert
table.name.format = tbl_kaf_${topic}
pk.mode=kafka
pk.fields=__connect_topic,__connect_partition,__connect_offset
fields.whitelist=messageId
auto.create=true
auto.evolve=true
The producer produces following record:
Key: id_5dfbdffe-ffbc-4fbf-925c-a14734304fa8, Value: {
"type" : "text",
"messageId" : "ID:activemq-XXXXXX-XXXXXXXXXXXXX-X:XX:1:2:2",
"correlationId" : "",
"destination" : {
"type" : "queue",
"name" : "qToKafka"
},
"replyTo" : null,
"priority" : 0,
"expiration" : 0,
"timestamp" : 1556549819473,
"redelivered" : false,
"properties" : {},
"payloadText" : "<Some XML Data>",
"payloadMap" : null,
"payloadBytes" : null
}
Please let me know what am I missing here.
Thanks

Mysql to Elasticsearch - won't create index and export data

I am attempting to import a MySQL table into Elasticsearch.It is a table containing 10 different columns with a an 8 digits VARCHAR set as a Primary Key. MySQL database is located on a remote host.
To transfer data from MySQL into Elasticsearch I've decided to use Logstash and jdbc MySQL driver.
I am assuming that Logstash will create the index for me if it isn't there.
Here's my logstash.conf script:
input{
jdbc {
jdbc_driver_library => "/home/user/logstash/mysql-connector-java-5.1.17-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://[remotehostipnumber]/databasename"
jdbc_validate_connection => true
jdbc_user => "username"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table"
}
}
output
{
elasticsearch
{
index => "tables"
document_type => "table"
document_id => "%{table_id}"
hosts => "localhost:9200"
}stdout { codec => json_lines }
}
When running logstash config test it outputs 'Configration OK' message:
sudo /opt/logstash/bin/logstash --configtest -f /home/user/logstash/logstash.conf
Also when executing the logstash.conf script, Elasticsearch outputs:
Settings: Default filter workers: 1
Logstash startup completed
But when I go to check whether the index has been created and data has also been added:
curl -XGET 'localhost:9200/tables/table/_search?pretty=true'
I get:
{
"error" : {
"root_cause" : [ {
"type" : "index_not_found_exception",
"reason" : "no such index",
"resource.type" : "index_or_alias",
"resource.id" : "tables",
"index" : "table"
} ],
"type" : "index_not_found_exception",
"reason" : "no such index",
"resource.type" : "index_or_alias",
"resource.id" : "tables",
"index" : "tables"
},
"status" : 404
}
What could be the potential reasons behind the data not being indexed?
PS. I am keeping the Elasticsearch server running in the separate terminal window, to ensure Logstash can connect and interact with it.
For those who end up here looking for the answer to the similar problem.
My database had 4m rows and it must have been too much for logstash/elasticsearch/jdbc driver to handle in one command.
After I divided the initial transfer into 4 separate chunks of work, the script run and added the desired table into the elasticsearch NoSQL db.
use following code to export data from mysql table and create index in elastic search
echo '{
"type":"jdbc",
"jdbc":{
"url":"jdbc:mysql://localhost:3306/your_database_name",
"user":"your_database_username",
"password":"your_database_password",
"useSSL":"false",
"sql":"SELECT * FROM table1",
"index":"Index_name",
"type":"Index_type",
"poll" : "6s",
"autocommit":"true",
"metrics": {
"enabled" : true
},
"elasticsearch" : {
"cluster" : "clustername",
"host" : "localhost",
"port" : 9300
}
}
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.3.4.0/lib/*" -"Dlog4j.configurationFile=file:////etc/elasticsearch/elasticsearch-jdbc-2.3.4.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON

I am trying to connect MongoDb with Hadoop. I have Hadoop-1.2.1 installed in my Ubuntu 14.04. I installed MongoDB-3.0.4 and also downloaded and added mongo-hadoop-hive-1.3.0.jar, mongo-java-driver-2.13.2.jar jars in hive session. I have downloaded mongo-connector.sh (found in this site)and included it under Hadoop_Home/lib.
I have set input and output sources like this :
hive> set MONGO_INPUT=mongodb://[user:password#]<MongoDB Instance IP>:27017/DBname.collectionName;
hive> set MONGO_OUTPUT=mongodb://[user:password#]<MongoDB Instance IP>:27017/DBname.collectionName;
hive> add JAR brickhouse-0.7.0.jar;
hive> create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
My collection in MongoDb is this :
> db.shows.find()
{ "_id" : ObjectId("559eb22fa7999b1a5f50e4e6"), "title" : "Arrested Development", "airdate" : "November 2, 2003", "network" : "FOX" }
{ "_id" : ObjectId("559eb238a7999b1a5f50e4e7"), "title" : "Stella", "airdate" : "June 28, 2005", "network" : "Comedy Central" }
{ "_id" : ObjectId("559eb23ca7999b1a5f50e4e8"), "title" : "Modern Family", "airdate" : "September 23, 2009", "network" : "ABC" }
>
Now I am trying to create a Hive table
CREATE EXTERNAL TABLE mongoTest(title STRING,network STRING)
> STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"title":"name",”airdate”:”date”,”network”:”name”}')
> TBLPROPERTIES('mongo.uri'='${hiveconf:MONGO_INPUT}');
When I run this command, it says
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON
Then I added hive-json-serde.jar and hive-serdes-1.0-SNAPSHOT.jar jars and tried to create the table again. But the error remains the same. How can I rectify this error?
I actually added these mongo-hadoop-core-1.3.0.jar , mongo-hadoop-hive-1.3.0.jar and mongo-java-driver-2.13.2.jar jars in Hadoop_Home/lib folder. Then I was able to get data from MongoDb to Hive without any errors.
There are smart-quotes which the parser is seeing - ”
”airdate”:”date”,”network”:”name”
They should be
"airdate":"date","network":"name"

MongoDB FailedToParse: Bad characters in value

I have a simple mongodb database. I'm dumping using mongodump.
dump command
mongodump --db user_profiles --out /data/dumps/user-profiles
Here is the content of the user_profiles database. It has one collection (user_data) consisting of the following:
{ "_id" : ObjectId("555a882a722f2a009fc136e4"), "username" : "thor", "passwd" : "*1D28C7B35C0CD618178988146861D37C97883D37", "email" : "thor#avengers.com", "phone" : "4023331000" }
{ "_id" : ObjectId("555a882a722f2a009fc136e5"), "username" : "ironman", "passwd" : "*626AC8265C7D53693CB7478376CE1B4825DFF286", "email" : "tony#avengers.com", "phone" : "4023331001" }
{ "_id" : ObjectId("555a882a722f2a009fc136e6"), "username" : "hulk", "passwd" : "*CB375EA58EE918755D4EC717738DCA3494A3E668", "email" : "hulk#avengers.com", "phone" : "4023331002" }
{ "_id" : ObjectId("555a882a722f2a009fc136e7"), "username" : "captain_america", "passwd" : "*B43FA5F9280F393E7A8C57D20648E8E4DFE99BA0", "email" : "steve#avengers.com", "phone" : "4023331003" }
{ "_id" : ObjectId("555a882a722f2a009fc136e8"), "username" : "daredevil", "passwd" : "*B91567A0A3D304343624C30B306A4B893F4E4996", "email" : "daredevil#avengers.com", "phone" : "4023331004" }
After copying the dump to a nfs and then trying to load the dump into a test server using mongorestore
mongorestore --host db-test --port 27017 /remote/dumps/user-profiles
I'm getting the following error:
Mon May 18 20:19:23.918 going into namespace [user_profiles.user_data]
assertion: 16619 code FailedToParse: FailedToParse: Bad characters in value: offset:30
How to resolve this FailedToParse Error
To do further testing I created a test_db with a test_collection that only had one simple value 'x':1, and even that didn't work. So I knew something else had to be going on.
Versions of your tools matters
The version of mongodump that was being used was 3.0.3. The version on another virtual machine that was using mongorestore was 2.4.x. This was the cause for the errors. Once I got the mongodb-org-tools updated on my virtual machine (see official guide) I was able to get up and running as expected.
Hopefully this helps someone in the future. Check your versions!
mongodump --version
mongorestore --version

Solr return 400(bad request) while indexing the json file from localhost

I am new in Solr . I have setup solr in local pc. I am facing problem with the indexing of json file. I have one json file in local pc which i want to index in solr. it show some error which i have mention below.
Error
D:\Solr\Example\exampledocs>java -Durl=http://localhost:8983/solr/update -Dtype=application/json -jar post.jar timeline.json
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/json..
POSTing file timeline.json
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/update
SimplePostTool: WARNING: Response: {"responseHeader":{"status":400,"QTime":0},"error":{"msg":"Unknown command: Name [8]","code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:00.167
Please help me , how can i solve it? Thanks in advance.
Log show
ERROR - 2014-07-30 18:38:52.330; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected character '{' (code 123) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '{' (code 123) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2047)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
... 32 more
Json file
{ "Name" : "Matches and Schedule", "timestamp" : { "$date" : 1400825267792 }, "_id" : { "$oid" : "537ee50494" } }
{ "Name" : "meet Modi", "timestamp" : { "$date" : 1401449841192 }, "_id" : { "$oid" : "53886d3a2c" } }
Can't comment yet.
Please post a snippet of your JSON file so we can comment.
Also, in these cases, Solr log files are your best friend, they will tell you exactly what it doesn't like about the data you are posting.
Also, you don't have to specify the host if you are sending to localhost, it is the default.
EDIT: Your JSON doesn't look correct. What do you expect it to do with stuff like:
"timestamp" : { "$date" : 1400825267792 }
First of, if you need timestamp is solr to be a date/time field it needs to be in UTC format. Second, Solr doesn't support nested elements.
Finally, if you are posting multiple documents via json, the format needs to look like this:
[ {"id":"doc1","field2":"val2"} , {"id":"doc2","field2":"val3"} ]
Note that all documents are enclosed in square brackets and separated by a comma.