Viewing a MySQL BLOB with PuTTY

I am saving a serialized object to a MySQL database BLOB.
After inserting some test objects and then trying to view the table, I am presented with lots of garbage and "PuTTYPuTTY" repeated several times.
I believe this has something to do with character encoding and the blob containing strange characters.
I just want to check whether this is going to cause problems with my database, or whether it is only a problem with PuTTY displaying the data.
Description of the QuizTable:
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Field       | Type        | Collation         | Null | Key | Default | Extra          | Privileges                      | Comment                                                                                                           |
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
| classId     | varchar(20) | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references | FK related to the ClassTable. This way each Class in the ClassTable is associated with its quiz in the QuizTable. |
| quizId      | int(11)     | NULL              | NO   | PRI | NULL    | auto_increment | select,insert,update,references | This is the quiz number associated with the quiz.                                                                 |
| quizObject  | blob        | NULL              | NO   |     | NULL    |                | select,insert,update,references | This is the actual quiz object.                                                                                   |
| quizEnabled | tinyint(1)  | NULL              | NO   |     | NULL    |                | select,insert,update,references |                                                                                                                   |
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
What I see when I try to view the table contents:
select * from QuizTable;
questionTextq ~ xp sq ~ w
t q1a1t q1a2xt 1t q1sq ~ sq ~ w
t q2a1t q2a2t q2a3xt 2t q2xt test3 | 1 |
+-------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
3 rows in set (0.00 sec)

I believe you can use the HEX() function on blobs as well as on strings. You can run a query like this:
SELECT HEX(quizObject) FROM QuizTable WHERE ....
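If you would rather inspect the bytes from a script than from PuTTY, here is a minimal sketch of the same idea (my own illustration, not from the answer; it assumes mysql-connector-python and placeholder credentials and database name):

import mysql.connector

conn = mysql.connector.connect(user='dbuser', password='dbpass',
                               host='localhost', database='quizdb')  # placeholders
cur = conn.cursor()
# HEX() keeps control bytes out of the terminal; SUBSTRING() caps the preview size.
cur.execute("SELECT quizId, HEX(SUBSTRING(quizObject, 1, 32)) FROM QuizTable")
for quiz_id, hex_preview in cur:
    print(quiz_id, hex_preview)
conn.close()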

PuTTY is reacting to what it thinks are terminal control sequences in your output stream. These sequences allow the remote host to change something about the local terminal without redrawing the entire screen, such as setting the window title, positioning the cursor, or clearing the screen.
It just so happens that when you try to 'display' data encoded like this, a lot of the binary content ends up sending such sequences.
You'll get the same reaction when catting binary files.
A BLOB column completely ignores any character-encoding settings you have; it is really intended for storing binary objects such as images or zip files.
If this field will only ever contain text, I'd suggest using a TEXT column instead.

Related

Pandas to_sql discarding rows when appending to mysql table

I'm working with articles scraped from online newspapers, a MySQL database, and Python. I want to use pandas' to_sql method on a DataFrame to append recently scraped articles to a MySQL table. It works pretty well, but I'm having some problems with the following:
Since the articles are automatically scraped from news sites, about 1% of them have issues (encoding, texts that are too long, and so on) and don't fit in the MySQL table fields. For some reason, pandas' to_sql method IGNORES these errors and discards the rows that do not fit. For example, I have the following MySQL table:
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| title       | varchar(255) | YES  |     | NULL    |                |
| description | text         | YES  |     | NULL    |                |
| content     | text         | YES  |     | NULL    |                |
| link        | varchar(300) | YES  |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
I also have a DataFrame that contains 15 rows and 4 columns (title, description, content, link).
If one of those rows has a title longer than 255 characters, it won't fit in the MySQL table. I expected an error when doing df.to_sql('press', con=con, index=False, if_exists='append'), so that I would know I have a problem to fix; but the actual result was that 14 rows were appended instead of 15.
This could work for me, but I need to know which row was discarded so I can flag it for later revision. Is it possible to tell pandas to let me know which indexes are ignored?
Thanks!
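A possible pre-check (my own sketch, not from the thread): since the schema is known, validate the length-limited columns before calling to_sql and flag the offending indexes yourself. This assumes df and con exist as in the question:

# varchar limits copied from the press table above; description/content are
# TEXT columns, which are much larger, so they are skipped here.
limits = {'title': 255, 'link': 300}

too_long = df[list(limits)].apply(lambda col: col.str.len() > limits[col.name])
bad_rows = df[too_long.any(axis=1)]     # flag these indexes for later revision
good_rows = df[~too_long.any(axis=1)]

good_rows.to_sql('press', con=con, index=False, if_exists='append')
print('Flagged indexes:', list(bad_rows.index))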

Load NULL values into INT columns

FYI:
I'm working with a CSV file from Census - FactFinder
Using MySQL 5.7
OS is Windows 10 Pro
So, I created this table:
+----------+------------+------+-----+---------+-------+
| Field    | Type       | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| SERIALNO | bigint(13) | NO   | PRI | NULL    |       |
| DIVISION | int(9)     | YES  |     | NULL    |       |
| PUMA     | int(4)     | YES  |     | NULL    |       |
| REGION   | int(1)     | YES  |     | NULL    |       |
| ST       | int(1)     | YES  |     | NULL    |       |
| ADJHSG   | int(7)     | YES  |     | NULL    |       |
| ADJINC   | int(7)     | YES  |     | NULL    |       |
| FINCP    | int(6)     | YES  |     | NULL    |       |
| HINCP    | int(6)     | YES  |     | NULL    |       |
| R60      | int(1)     | YES  |     | NULL    |       |
| R65      | int(1)     | YES  |     | NULL    |       |
+----------+------------+------+-----+---------+-------+
And tried to load data using:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
It didn't work, as this message appeared:
ERROR 1366 (HY000): Incorrect integer value: '' for column 'FINCP' at row 2
The row the error message is referring to is:
2012000000051,3,104,2,17,1045360,1056030,,8200,1,1
I believe FINCP, which holds the blank value (,,) right before 8200, is the problem, so I followed the instructions in this thread: MySQL load NULL values from CSV data
And updated my code to:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@SERIALNO, @DIVISION, @PUMA, @REGION, @ST, @ADJHSG, @ADJINC, @FINCP, @HINCP, @R60, @R65)
SET
SERIALNO = nullif(@SERIALNO,''),
DIVISION = nullif(@DIVISION,''),
PUMA = nullif(@PUMA,''),
REGION = nullif(@REGION,''),
ST = nullif(@ST,''),
ADJHSG = nullif(@ADJHSG,''),
ADJINC = nullif(@ADJINC,''),
FINCP = nullif(@FINCP,''),
HINCP = nullif(@HINCP,''),
R60 = nullif(@R60,''),
R65 = nullif(@R65,'');
The first error is now gone, but this message appears:
' for column 'R65' at row 12t integer value: '
The row this message is referring to is:
2012000000318,3,1602,2,17,1045360,1056030,,,,
The error message itself is garbled, so I don't know what exactly the problem is. I can only assume it is the four consecutive blank values at the end.
Another tip: if I edit the CSV and change every blank to 0, the code runs smoothly, but I'm not a fan of editing raw data, so I would like to know other options.
Bottom line, I have two questions:
Shouldn't the data have loaded with the first code, with MySQL taking ,, as NULL and 0 as a plain 0?
What's the problem now that I'm using SERIALNO = nullif(@SERIALNO,'')?
I want to be able to differentiate between 0 and null/blank values.
Thank you.
MySQL's LOAD DATA tool interprets \N as a NULL value. So, if your troubled row looked like this:
2012000000318,3,1602,2,17,1045360,1056030,\N,\N,\N,\N
then you might not have this problem. If you have access to a regex replacement tool, you can try searching for the following pattern:
(?<=^)(?=,)|(?<=,)(?=,)|(?<=,)(?=$)
Then replace each match with \N. This should fill in all the empty slots with \N, which MySQL will interpret as NULL. Note that if you were to write a table out from MySQL, NULLs would likewise be written as \N. The issue is simply that your data source and MySQL don't know about each other.
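If no regex tool is at hand, a simple split-based pass does the same job. This is a sketch of my own in Python, assuming the file has no quoted fields containing commas (true for this numeric census data); the output file name is hypothetical:

src_path = 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv'
dst_path = 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois_nulls.csv'

with open(src_path) as src, open(dst_path, 'w') as dst:
    for line in src:
        # rstrip('\r\n') also drops any Windows carriage return, which would
        # otherwise be glued onto the last column (R65) and break it as well.
        fields = line.rstrip('\r\n').split(',')
        dst.write(','.join(f if f != '' else r'\N' for f in fields) + '\n')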

PuTTY outputs weird stuff when selecting in MySQL

I've encountered a strange problem when using PuTTY to run the following MySQL query: select * from gts_camera
The output seems extremely weird:
As you can see, PuTTY outputs loads of "PuTTYPuTTYPuTTY..."
Maybe it's because of the table's structure:
mysql> describe gts_kamera;
+---------+----------+------+-----+-------------------+----------------+
| Field   | Type     | Null | Key | Default           | Extra          |
+---------+----------+------+-----+-------------------+----------------+
| id      | int(11)  | NO   | PRI | NULL              | auto_increment |
| datum   | datetime | YES  |     | CURRENT_TIMESTAMP |                |
| picture | longblob | YES  |     | NULL              |                |
+---------+----------+------+-----+-------------------+----------------+
This table stores some big pictures and their dates of creation.
(The weird ASCII characters you can see at the top of the picture are the content.)
Does anybody know why PuTTY outputs such strange stuff, and how to solve/clean this?
I ask because I can't type any other commands afterwards; I have to reopen the session.
Sincerely,
Michael.
This happens because of the contents of the column (you have a column defined as longblob). It may contain byte sequences that PuTTY does not understand, so the terminal breaks, just as you are seeing.
There is a configuration option that may help, though.
You can also avoid selecting the *blob columns in that table:
select id, datum from gts_camera;
Or, if you still want to see the contents, use the MySQL function HEX:
select id, datum, HEX(picture) as pic from gts_camera;
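If you actually need the pictures, fetch them from a script and write them to files instead of echoing them into the shell. A sketch of my own, assuming mysql-connector-python, placeholder credentials, and guessing at the image format:

import mysql.connector

conn = mysql.connector.connect(user='dbuser', password='dbpass',
                               host='localhost', database='gts')  # placeholders
cur = conn.cursor()
cur.execute("SELECT id, picture FROM gts_camera")
for row_id, picture in cur:
    # Write the binary column to a file; never print it to the terminal.
    with open('picture_%d.jpg' % row_id, 'wb') as f:  # .jpg is a guess
        f.write(picture)
conn.close()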

MySQL Screwed Up Output

I'm seeing some very strange output from MySQL, and I don't know whether it's my console or my data that's causing it. Here are some screenshots:
Any ideas?
edit:
mysql> describe transformed_step_a1_sfdc_lead_history;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| old_value | varchar(255) | YES  |     | NULL    |       |
| new_value | varchar(255) | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
Max
To verify whether there are any control characters, you can use the -s option; see http://dev.mysql.com/doc/refman/5.5/en/mysql-command-options.html#option_mysql_raw
It's impossible to tell exactly what the problem is from your screenshots, but the text in your database contains control characters. The usual culprit is CR (carriage return), which moves the cursor back to the beginning of the line and starts overwriting the text already there.
If you have programmatic access to your database, you will be able to dump the values with control characters rendered as printables, so that you can see what is actually in there.
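For example, here is a sketch of my own (assuming mysql-connector-python and placeholder credentials) that uses Python's repr() to render control characters visibly:

import mysql.connector

conn = mysql.connector.connect(user='dbuser', password='dbpass',
                               host='localhost', database='analytics')  # placeholders
cur = conn.cursor()
cur.execute("SELECT old_value, new_value "
            "FROM transformed_step_a1_sfdc_lead_history LIMIT 20")
for old, new in cur:
    # repr() shows control characters such as '\r' literally instead of
    # letting the terminal act on them.
    print(repr(old), repr(new))
conn.close()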

Defining a web service for usage analytics (desktop application)

Current situation
I have a desktop application (C++, Win32), and I wish to track users' usage analytics anonymously (actions, clicks, usage time, etc.).
The tracking is done via designated web services for specific actions (install, uninstall, click), and everything is written by my team and stored in our DB.
The need
Now we're adding more usage types and events with a variety of data, so we need to define the services.
Instead of having tons of different web services for each action, I want a single generic service for all usage types that is capable of receiving different data types.
For example:
"button_A_click" event, has data with 1 field: {window_name (string)}
"show_notification" event, has data with 3 fields: {source_id (int), user_action (int), index (int)}
Question
I'm looking for an elegant and convenient way to store this sort of diverse data so that I can easily query it later.
The alternatives I can think of:
Storing the per-type data as a single JSON/XML field, but it would be extremely hard to pull data out of it and write queries against those fields.
Having N extra data fields on each record, but that seems very wasteful.
Any ideas for this sort of model? Maybe something like Google Analytics? Please advise...
Technical details: the DB is MySQL, administered via phpMyAdmin.
Disclaimer:
There is a similar post, which brought to my attention services like DeskMetrics and Trackerbird, and the option of embedding Google Analytics in a C++ native application, but I'd rather the service be my own, and I want to better understand how to design this sort of model.
Thanks!
This seems like a database normalization problem.
I am going to assume that you have a table named events where all events will be stored.
Additionally, I am going to assume you have the following data attributes (for simplicity's sake): window_name, source_id, user_action, index.
To achieve normalization, we will need the following tables:
events
data_attributes
attribute_types
This is how each of the tables should be structured:
mysql> describe events;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| event_type | varchar(255)     | YES  |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
mysql> describe data_attributes;
+-----------------+------------------+------+-----+---------+----------------+
| Field           | Type             | Null | Key | Default | Extra          |
+-----------------+------------------+------+-----+---------+----------------+
| id              | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| event_id        | int(11)          | YES  |     | NULL    |                |
| attribute_type  | int(11)          | YES  |     | NULL    |                |
| attribute_name  | varchar(255)     | YES  |     | NULL    |                |
| attribute_value | int(11)          | YES  |     | NULL    |                |
+-----------------+------------------+------+-----+---------+----------------+
mysql> describe attribute_types;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| type  | varchar(255)     | YES  |     | NULL    |                |
+-------+------------------+------+-----+---------+----------------+
The idea is that you will have to populate attribute_types with all possible types you can have. Then, for each new event, you will add an entry in the events table and corresponding entries in the data_attributes table to map that event to one or more attribute types with the appropriate values.
Example:
"button_A_click" event, has data with 1 field: {window_name "Dummy Window Name"}
"show_notification" event, has data with 3 fields: {source_id: 99, user_action: 44, index: 78}
would be represented as:
mysql> select * from attribute_types;
+----+-------------+
| id | type        |
+----+-------------+
| 1  | window_name |
| 2  | source_id   |
| 3  | user_action |
| 4  | index       |
+----+-------------+
mysql> select * from events;
+----+-------------------+
| id | event_type        |
+----+-------------------+
| 1  | button_A_click    |
| 2  | show_notification |
+----+-------------------+
mysql> select * from data_attributes;
+----+----------+----------------+-------------------+-----------------+
| id | event_id | attribute_type | attribute_name    | attribute_value |
+----+----------+----------------+-------------------+-----------------+
| 1  | 1        | 1              | Dummy Window Name | NULL            |
| 2  | 2        | 2              | NULL              | 99              |
| 3  | 2        | 3              | NULL              | 44              |
| 4  | 2        | 4              | NULL              | 78              |
+----+----------+----------------+-------------------+-----------------+
To write a query for this data, you can use the COALESCE function in MySQL to get the value for you without having to check which of the columns is NULL.
Here's a quick example I hacked up:
SELECT events.event_type AS `event_type`,
       attribute_types.type AS `attribute_type`,
       COALESCE(data_attributes.attribute_name, data_attributes.attribute_value) AS `value`
FROM data_attributes,
     events,
     attribute_types
WHERE data_attributes.event_id = events.id
  AND data_attributes.attribute_type = attribute_types.id
Which yields the following output:
+-------------------+----------------+-------------------+
| event_type        | attribute_type | value             |
+-------------------+----------------+-------------------+
| button_A_click    | window_name    | Dummy Window Name |
| show_notification | source_id      | 99                |
| show_notification | user_action    | 44                |
| show_notification | index          | 78                |
+-------------------+----------------+-------------------+
EDIT: Bugger! I read C#, but I see you are using C++. Sorry about that. I leave the answer as-is as its principle could still be useful. Please regard the examples as pseudo-code.
You can define a custom class/structure that you use with a list, then serialize this data and send it to the web service. For example:
[Serializable]
public class ActionDefinition {
    public string ID;
    public ActionType Action;    // define an enum with the possible actions
    public List<string> Fields;  // or a list of 'some class' if you need more complex fields
}
List<ActionDefinition> AnalyticsCollection = new List<ActionDefinition>();
// ...
SendToWS(Serialize(AnalyticsCollection));
Now you can dynamically add as many events as you want with the needed flexibility.
On the server side you can simply parse the data:
List<ActionDefinition> AnalyticsCollection = Deserialize(GetWS());
foreach (ActionDefinition ad in AnalyticsCollection) {
    switch (ad.Action) {
        // .. check for each action type
    }
}
I would suggest adding integrity mechanisms such as a checksum. I imagine the de/serializer would be pretty custom in C++, so perhaps a simple Base64 encoding can do the trick, and the payload can be transported as plain ASCII text.
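For illustration only (Python rather than C++, and every name here is hypothetical), the wire format could be as simple as Base64 plus an MD5 digest:

import base64, hashlib, json

# Stand-in for the serialized list of ActionDefinition records.
analytics_collection = [{'ID': '1', 'Action': 'button_A_click',
                         'Fields': ['window_name']}]

payload = json.dumps(analytics_collection).encode('utf-8')
wire = base64.b64encode(payload).decode('ascii')   # plain ASCII for transport
digest = hashlib.md5(payload).hexdigest()          # simple integrity checksum

# The receiving web service decodes and verifies before parsing:
received = base64.b64decode(wire)
assert hashlib.md5(received).hexdigest() == digest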
You could make a table for each event in which you declare what each param means. Then you have a main table in which you record only the event name and param1, etc. An admin tool would be very easy: you go through all the events and describe them using the table where each event is declared. E.g., for your event button_A_click you insert into the description table:
Name            Param1
button_A_Click  WindowTitle
This way you can group your events or select only one event.
This is how I would solve it.