Ruby & MySQL: How to handle missing elements while parsing an XML file

I am trying to parse a large XML file. Here is what the file looks like:
<post>
<row Id="22" PostTypeId="2" ParentId="9" CreationDate="2008-08-01T12:07:19.500" Score="7" Body="&lt;p&gt;The best way that I know of because of leap years and everything is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DateTime birthDate = new DateTime(2000,3,1);&lt;br&gt;int age = (int)Math.Floor((DateTime.Now - birthDate).TotalDays / 365.25D);&lt;br&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hope this helps.&lt;/p&gt;" OwnerUserId="17" LastEditorUserId="17" LastEditorDisplayName="Nick" LastEditDate="2008-08-01T15:26:37.087" LastActivityDate="2008-08-01T15:26:37.087" CommentCount="1" CommunityOwnedDate="2011-08-16T19:40:43.080" />
<row Id="29" PostTypeId="2" ParentId="13" CreationDate="2008-08-01T12:19:17.417" Score="18" Body="&lt;p&gt;There are no HTTP headers that will report the clients timezone so far although it has been suggested to include it in the HTTP specification.&lt;/p&gt;
&lt;p&gt;If it was me, I would probably try to fetch the timezone using clientside JavaScript and then submit it to the server using Ajax or something.&lt;/p&gt;" OwnerUserId="19" LastActivityDate="2008-08-01T12:19:17.417" CommentCount="0" />
</post>
The difference between these two records is that the second one doesn't have a LastEditDate attribute. I believe that is why I get the following error:
/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `dup': can't dup NilClass (TypeError)
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `_parse'
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date.rb:1732:in `parse'
from load.rb:105:in `on_start_element'
from load.rb:165:in `parse'
Here is the code segment it refers to:
if element == 'row'
  @post_st.execute(attributes['Id'], attributes['PostTypeId'], attributes['AcceptedAnswerId'], attributes['ParentId'], attributes['Score'], attributes['ViewCount'],
    attributes['Body'], attributes['OwnerUserId'] == nil ? -1 : attributes['OwnerUserId'], attributes['LastEditorUserId'], attributes['LastEditorDisplayName'],
    DateTime.parse(attributes['LastEditDate']).to_time.strftime("%F %T"), DateTime.parse(attributes['LastActivityDate']).to_time.strftime("%F %T"), attributes['Title'] == nil ? '' : attributes['Title'],
    attributes['AnswerCount'] == nil ? 0 : attributes['AnswerCount'], attributes['CommentCount'] == nil ? 0 : attributes['CommentCount'],
    attributes['FavoriteCount'] == nil ? 0 : attributes['FavoriteCount'], DateTime.parse(attributes['CreationDate']).to_time.strftime("%F %T"))
  post_id = attributes['Id']
Furthermore, I think this is the line where I look up LastEditDate:
DateTime.parse(attributes['LastEditDate']).to_time.strftime("%F %T"), DateTime.parse(attributes['LastActivityDate']).to_time.strftime("%F %T"), attributes['Title'] == nil ? '' : attributes['Title']
I guess I get the error above because the attribute doesn't exist. How do I handle this scenario, so that a missing attribute is set to a default value? I ask because I insert these records into a MySQL database as I parse them. The table has the following structure:
+--------------------------+--------------+------+-----+---------------------+-----------------------------+
| Field                    | Type         | Null | Key | Default             | Extra                       |
+--------------------------+--------------+------+-----+---------------------+-----------------------------+
| id                       | int(11)      | NO   | PRI | NULL                |                             |
| post_type_id             | int(11)      | NO   |     | NULL                |                             |
| accepted_answer_id       | int(11)      | YES  |     | NULL                |                             |
| parent_id                | int(11)      | YES  | MUL | NULL                |                             |
| score                    | int(11)      | YES  |     | NULL                |                             |
| view_count               | int(11)      | YES  |     | NULL                |                             |
| body_text                | text         | YES  |     | NULL                |                             |
| owner_id                 | int(11)      | NO   |     | NULL                |                             |
| last_editor_user_id      | int(11)      | YES  |     | NULL                |                             |
| last_editor_display_name | varchar(40)  | YES  |     | NULL                |                             |
| last_edit_date           | timestamp    | NO   |     | CURRENT_TIMESTAMP   | on update CURRENT_TIMESTAMP |
| last_activity_date       | timestamp    | NO   |     | 0000-00-00 00:00:00 |                             |
| title                    | varchar(256) | NO   |     | NULL                |                             |
| answer_count             | int(11)      | NO   |     | NULL                |                             |
| comment_count            | int(11)      | NO   |     | NULL                |                             |
| favorite_count           | int(11)      | NO   |     | NULL                |                             |
| created                  | timestamp    | NO   |     | 0000-00-00 00:00:00 |                             |
+--------------------------+--------------+------+-----+---------------------+-----------------------------+
I have set up last_edit_date as a NOT NULL column.
Based on the answer provided I made the change, but the error remains the same:
def convert_to_mysql_time(date='1973-01-01T01:01:01.000')
  DateTime.parse(date).to_time.strftime("%F %T")
end

def on_start_element(element, attributes)
  if element == 'row'
    @post_st.execute(attributes['Id'], attributes['PostTypeId'], attributes['AcceptedAnswerId'], attributes['ParentId'], attributes['Score'], attributes['ViewCount'],
      attributes['Body'], attributes['OwnerUserId'] == nil ? -1 : attributes['OwnerUserId'], attributes['LastEditorUserId'], attributes['LastEditorDisplayName'],
      convert_to_mysql_time(attributes['LastEditDate']), DateTime.parse(attributes['LastActivityDate']).to_time.strftime("%F %T"), attributes['Title'] == nil ? '' : attributes['Title'],
      attributes['AnswerCount'] == nil ? 0 : attributes['AnswerCount'], attributes['CommentCount'] == nil ? 0 : attributes['CommentCount'],
      attributes['FavoriteCount'] == nil ? 0 : attributes['FavoriteCount'], DateTime.parse(attributes['CreationDate']).to_time.strftime("%F %T"))
    post_id = attributes['Id']
Here is the error:
/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `dup': can't dup NilClass (TypeError)
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date/format.rb:1031:in `_parse'
from /soft/ruby/1.9.2/ubuntuamd1/lib/ruby/1.9.1/date.rb:1732:in `parse'
from load.rb:102:in `convert_to_mysql_time'
from load.rb:109:in `on_start_element'
from load.rb:169:in `parse'
from load.rb:169:in `<main>'

I would write a method that converts date strings to MySQL datetime strings and falls back to a default value when nil is passed in. Note that a default parameter value, as in your convert_to_mysql_time, only applies when the argument is omitted; passing nil explicitly still hands nil to DateTime.parse, which is why the error remained. For example:
require 'date'

def convert_to_my_sql_date(date)
  # Use the default when date is nil, empty, or does not respond to empty?;
  # the inline rescue covers nil and any other non-string argument.
  date = '1973-01-01T01:01:01.000' if (date.empty? rescue true)
  DateTime.parse(date).to_time.strftime("%F %T")
end
When the date is nil or empty, the default is used. You can then call it in your method like this:
convert_to_my_sql_date(attributes['LastEditDate'])

Related

Getting error while sending or writing python dataframe on real time server

I am working on sending a Python DataFrame to a database (MySQL) on a real-time server. The code works fine on my local machine but fails on the server.
Here is the code I have tried:
import pandas as pd
from sqlalchemy import create_engine

def db_write(db_config, contact_df):
    IP_ADDR = db_config["ip_addr"]
    PORT_NUMBER = db_config["port_num"]
    USER_NAME = db_config["user_name"]
    PASSWORD = db_config["password"]
    engine = create_engine("mysql+pymysql://"+USER_NAME+":"+PASSWORD+"@"+IP_ADDR+"/db_replica")
    con = engine.connect()
    contact_df.to_sql(con=con, name='users', if_exists='append', index=False)
    con.close()

# call the db_write() function
db_write(json_data['mysql_db'], processed_db_df)
I want to write the processed_db_df DataFrame into the MySQL database, but when I run the code on the real-time server I get the errors below.
sqlalchemy.exc.DataError: (pymysql.err.DataError) "Incorrect string
value: '\xE0\xB8\xAAibh...' for column 'first_name'
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) "Column
'last_name' cannot be null")
I tried setting the charset to utf8 at the end of the connection string, like below:
engine = create_engine("mysql+pymysql://"+USER_NAME+":"+PASSWORD+"@"+IP_ADDR+"/db_replica?charset=utf8")
But the issue is still not resolved.
I checked the database table schema, and it looks like this:
| Field                  | Type         | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+-------+
| Unnamed: 0             | bigint(20)   | YES  |     | NULL    |       |
| ext_lead_id            | text         | YES  |     | NULL    |       |
| activity               | text         | YES  |     | NULL    |       |
| update_date_time       | text         | YES  |     | NULL    |       |
| first_name             | text         | NO   |     | NULL    |       |
| last_name              | text         | YES  |     | NULL
Instead of text I want varchar as the data type; otherwise, please help me define a custom schema in SQLAlchemy.
Thanks in advance

Load NULL values INT

FYI:
I'm working with a CSV file from Census - FactFinder
Using MySQL 5.7
OS is Windows 10 PRO
So, I created this table:
+----------+------------+------+-----+---------+-------+
| Field    | Type       | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| SERIALNO | bigint(13) | NO   | PRI | NULL    |       |
| DIVISION | int(9)     | YES  |     | NULL    |       |
| PUMA     | int(4)     | YES  |     | NULL    |       |
| REGION   | int(1)     | YES  |     | NULL    |       |
| ST       | int(1)     | YES  |     | NULL    |       |
| ADJHSG   | int(7)     | YES  |     | NULL    |       |
| ADJINC   | int(7)     | YES  |     | NULL    |       |
| FINCP    | int(6)     | YES  |     | NULL    |       |
| HINCP    | int(6)     | YES  |     | NULL    |       |
| R60      | int(1)     | YES  |     | NULL    |       |
| R65      | int(1)     | YES  |     | NULL    |       |
+----------+------------+------+-----+---------+-------+
And tried to load data using:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
It didn't work, and this message appeared:
ERROR 1366 (HY000): Incorrect integer value: '' for column 'FINCP' at row 2
The row the error message is referring to is:
2012000000051,3,104,2,17,1045360,1056030,,8200,1,1
I believe the problem is FINCP, which is the blank value (,,) right before 8200. So I followed the instructions in this thread: MySQL load NULL values from CSV data
And updated my code to:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@SERIALNO, @DIVISION, @PUMA, @REGION, @ST, @ADJHSG, @ADJINC, @FINCP, @HINCP, @R60, @R65)
SET
SERIALNO = nullif(@SERIALNO,''),
DIVISION = nullif(@DIVISION,''),
PUMA = nullif(@PUMA,''),
REGION = nullif(@REGION,''),
ST = nullif(@ST,''),
ADJHSG = nullif(@ADJHSG,''),
ADJINC = nullif(@ADJINC,''),
FINCP = nullif(@FINCP,''),
HINCP = nullif(@HINCP,''),
R60 = nullif(@R60,''),
R65 = nullif(@R65,'');
The first error is now gone but this message appears:
' for column 'R65' at row 12t integer value: '
The row this message refers to is:
2012000000318,3,1602,2,17,1045360,1056030,,,,
There's no clear error message, so I don't know exactly what the problem is. I can only assume that the problem is the four consecutive blank values.
Another hint: if I edit the CSV and change all blanks to 0, the load goes smoothly, but I'm not a fan of editing raw data, so I would like to know other options.
Bottom line, I have two questions:
Shouldn't the data load with the first statement, since MySQL should take ,, as NULL and 0 as a plain 0?
What problem am I hitting now that I'm using SERIALNO = nullif(@SERIALNO,'')?
I want to be able to differentiate between 0 and null/blank values.
Thank you.
MySQL's LOAD DATA tool interprets \N as being a NULL value. So, if your troubled row looked like this:
2012000000318,3,1602,2,17,1045360,1056030,\N,\N,\N,\N
then you might not have this problem. If you have access to a regex replacement tool, you may try searching for the following pattern:
(?<=^)(?=,)|(?<=,)(?=,)|(?<=,)(?=$)
Then, replace with \N. This should fill in all the empty slots with \N, which semantically will be interpreted by MySQL as meaning NULL. Note that if you were to write a table out from MySQL, then nulls would be replaced with \N. The issue is that your data source and MySQL don't know about each other.
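As a rough sketch of the replacement described above, the following Python function fills empty CSV fields with \N before the file is handed to LOAD DATA. The function name fill_empty_fields is hypothetical, and the pattern is an equivalent rewrite of the one above that uses the ^ and $ anchors with re.MULTILINE. Converting Windows line endings is an added assumption: the overwritten error text in the question hints at a stray carriage return in the last field, but that is not confirmed. The sketch also assumes no field is quoted or contains a comma.

```python
import re

# An empty slot at the start of a line, between two commas, or at the end of a line.
EMPTY_FIELD = re.compile(r"^(?=,)|(?<=,)(?=,)|(?<=,)$", re.MULTILINE)


def fill_empty_fields(csv_text: str) -> str:
    """Replace every empty CSV field with \\N, which LOAD DATA reads as NULL."""
    # Assumption: convert Windows line endings so '\r' is not read as field data.
    normalised = csv_text.replace("\r\n", "\n")
    # A function replacement avoids escape processing of the backslash.
    return EMPTY_FIELD.sub(lambda match: r"\N", normalised)


if __name__ == "__main__":
    row = "2012000000318,3,1602,2,17,1045360,1056030,,,,\r\n"
    print(fill_empty_fields(row), end="")
```

Fields that contain 0 are left alone, so a literal zero and a missing value stay distinguishable after the load.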

Unexpected NOT EQUAL TO NULL comparison in MySQL [duplicate]

I have the table below in MySQL.
city_data
+------+-----------+-------------+
| id   | city_code | city_name   |
+------+-----------+-------------+
| 4830 | BHR       | Bharatpur   |
| 4831 | KEP       | Nepalgunj   |
| 4833 | OHS       | Sohar       |
| 4834 | NULL      | Shirdi      |
+------+-----------+-------------+
and the query below.
select id,city_code,city_name from city_data where city_code != 'BHR';
I was expecting 3 rows.
| 4831 | KEP       | Nepalgunj   |
| 4833 | OHS       | Sohar       |
| 4834 | NULL      | Shirdi      |
+------+-----------+-------------+
But I am getting only 2 rows.
| 4831 | KEP       | Nepalgunj   |
| 4833 | OHS       | Sohar       |
+------+-----------+-------------+
I am not able to understand why the row
| 4834 | NULL      | Shirdi      |
is not included in the result of my query. The WHERE condition (NULL != 'BHR') should have passed.
Could someone please help clear up this doubt?
According to the MySQL Reference Manual, section 3.3.4.6, "Working with NULL Values", this is why:
Because the result of any arithmetic comparison with NULL is also
NULL, you cannot obtain any meaningful results from such comparisons.
In MySQL, 0 or NULL means false and anything else means true. The
default truth value from a boolean operation is 1.
This means that NULL != 'BHR' will evaluate to NULL, which in turn will mean false to MySQL. In order for the query to work as you want, you have to append OR city_code IS NULL to your query.
You cannot compare NULL values with !=, because the result of the comparison is itself NULL. Use the IS NULL predicate instead:
select id,city_code,city_name
from city_data
where city_code != 'BHR' OR city_code IS NULL;
It is not possible to test for NULL values with comparison operators such as =, <, or <>. That is why the query behaves confusingly and the NULL record is ignored. For more information, see https://www.w3schools.com/sql/sql_null_values.asp
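To see this behaviour in action, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here as an assumption (both follow the standard SQL rule that a comparison with NULL yields NULL), and the helper name city_ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_data (id INTEGER, city_code TEXT, city_name TEXT)")
conn.executemany(
    "INSERT INTO city_data VALUES (?, ?, ?)",
    [(4830, "BHR", "Bharatpur"), (4831, "KEP", "Nepalgunj"),
     (4833, "OHS", "Sohar"), (4834, None, "Shirdi")],
)


def city_ids(where_clause: str) -> list:
    """Return the ids matched by the given WHERE clause, in id order."""
    sql = f"SELECT id FROM city_data WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# The comparison itself evaluates to NULL (None in Python), not to true.
null_comparison = conn.execute("SELECT NULL != 'BHR'").fetchone()[0]

without_null_check = city_ids("city_code != 'BHR'")
with_null_check = city_ids("city_code != 'BHR' OR city_code IS NULL")

print(null_comparison)     # None
print(without_null_check)  # [4831, 4833]
print(with_null_check)     # [4831, 4833, 4834]
```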

MySQL query executes fine, but returns (false) empty result set when using != NULL?

I have the following result set, that I'm trying to drill down
+----+---------+---------------+---------------------+----------------------+---------------+-----------+------------------+------------------+
| id | auth_id | trusts_number | buy_sell_actions_id | corporate_actions_id | fx_actions_id | submitted | created_at       | updated_at       |
+----+---------+---------------+---------------------+----------------------+---------------+-----------+------------------+------------------+
| 2  | 6       | N100723       | 2                   | NULL                 | NULL          | 0         | 08/05/2015 11:30 | 08/05/2015 15:32 |
| 5  | 6       | N100723       | NULL                | NULL                 | 1             | 0         | 08/05/2015 15:10 | 08/05/2015 15:10 |
| 6  | 6       | N100723       | NULL                | NULL                 | 2             | 1         | 08/05/2015 15:12 | 08/05/2015 15:41 |
+----+---------+---------------+---------------------+----------------------+---------------+-----------+------------------+------------------+
This result set is generated with the query
SELECT * FROM actions WHERE auth_id = 6 AND trusts_number = 'N100723'
I also want to exclude any row where fx_actions_id is NULL, so I changed the query to
SELECT * FROM actions WHERE auth_id = 6 AND trusts_number = 'N100723' AND fx_actions_id != NULL
However this returns an empty result set. I've never used "negative" query parameters in MySQL before, so I'm not sure if they should take on a different syntax or what?
Any help would be much appreciated.
Normal comparison operators don't work well with NULL. Both Something = NULL and Something != NULL will return 'unknown', which causes the row to be omitted in the result. Use the special operators IS NULL and IS NOT NULL instead:
SELECT * FROM actions
WHERE auth_id = 6
AND trusts_number = 'N100723'
AND fx_actions_id IS NOT NULL
Wikipedia on NULL and its background
Because NULL isn't a value, you should use IS NOT NULL.
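The difference between the two predicates can be checked with a short sketch against a cut-down copy of the actions table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption, and keeps only the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (id INTEGER, auth_id INTEGER, "
    "trusts_number TEXT, fx_actions_id INTEGER)"
)
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [(2, 6, "N100723", None), (5, 6, "N100723", 1), (6, 6, "N100723", 2)],
)

BASE = "SELECT id FROM actions WHERE auth_id = 6 AND trusts_number = 'N100723'"

# '!= NULL' is never true, so every row is filtered out.
not_equal_null = [r[0] for r in conn.execute(BASE + " AND fx_actions_id != NULL")]

# 'IS NOT NULL' is the predicate that actually tests for NULL.
is_not_null = [
    r[0] for r in conn.execute(BASE + " AND fx_actions_id IS NOT NULL ORDER BY id")
]

print(not_equal_null)  # []
print(is_not_null)     # [5, 6]
```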

Django: insert a row into MySQL

I am new to Django. I want to insert a new row into a MySQL database, but when I try the following, I get an error.
from django.db import models
...
class Msg(models.Model):
    MsgId = BigIntegerField(length = 20)
    ToUserName = CharField(max_length = 45)
    FromUserName = CharField(max_length = 45)
    Content = TextField(max_length = 1024, blank = True)
...
db_entry = Msg(MsgId=received_MsgId, ToUserName=received_ToUserName,
               FromUserName=received_FromUserName, MsgType=received_MsgType,
               Content=received_Content)
db_entry.save()
The following is the table all_massages that exists in my database. How can I add a new row to it, and what else do I need to do?
+--------------+---------------------+------+-----+---------+-------+
| Field        | Type                | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| MsgId        | bigint(20) unsigned | NO   | PRI | NULL    |       |
| ToUserName   | varchar(45)         | NO   |     | NULL    |       |
| FromUserName | varchar(45)         | NO   |     | NULL    |       |
| Content      | text                | YES  |     | NULL    |       |
+--------------+---------------------+------+-----+---------+-------+
If you altered the model and then ran syncdb, it won't work, because syncdb does not modify tables that already exist. Delete the existing table and run syncdb again.
The Django documentation explains how to perform raw SQL queries:
https://docs.djangoproject.com/en/dev/topics/db/sql/#performing-raw-sql-queries
To execute UPDATE, INSERT, or DELETE queries directly, see:
https://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly