How to select a line from a MySQL text field - mysql

Say I have a mysql table which contains two columns, one is a job number, and the other contains the stderr of the respective job:
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| job | int(11) | YES | UNI | NULL | |
| jerrors | text | YES | | NULL | |
+----------+---------+------+-----+---------+-------+
The text contains newline characters.
I would like to select from jerrors only the lines containing a given word, i.e. something like
(\n.*word*.*\n)
How would I construct such query? If I do
SELECT job, jerrors FROM mytable WHERE jerrors LIKE "%No such file%";
then I get the whole text in the column.
Here's a sample text
Error in <TCling::RegisterModule>: cannot find dictionary module A2Dict_rdict.pcm
Error in <TCling::RegisterModule>: cannot find dictionary module A1Dict_rdict.pcm
Error in <TCling::RegisterModule>: cannot find dictionary module SpectraDict_rdict.pcm
PedestalT0 File : conf/A2-roT0.job1685_0-run1685_2.dat: No such file or directory
[A1DataDecoder] WARNING: 12 Slots for 20 pedestals! Using '16' for last 19 pedestals ...
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libMathMore.so for ROOT::Math::GSMIntegrator

Related

mariadb select * returns empty set

when I run SELECT * FROM urlcheck
it returns 'EMPTY Set (0.0 sec)'
According SHOW TABLE STATUS LIKE 'urlcheck'
The table has 3 rows.
Table structure is:
+-------------+---------------+------+-----+---------+----------------+<br>
| Field | Type | Null | Key | Default | Extra |<br>
+-------------+---------------+------+-----+---------+----------------+<br>
| id | int(11) | NO | PRI | NULL | auto_increment |<br>
| coursegroup | varchar(20) | YES | | NULL | |<br>
| url | varchar(2588) | YES | | NULL | |<br>
+-------------+---------------+------+-----+---------+----------------+<br>
I start by selecting the database with USE db
any ideas why this happened. I know this is similiar to Mysql select always returns empty setMysql select always returns empty set but that was apparently a corrupted database. I have truncated this database and add new rows and I still get the same problem. The code that adds records FWIW is
cur.execute('insert into urlcheck (coursegroup, url) values("'+coursegroup+'","'+url+'");')
db.commit
cur.close
The problem was a syntax error in my code.
should have been:
db.commit()
cur.close()
lack of parentheses caused the code not to write. I leave this here even as it redounds to my own humiliation in the hopes it helps someone else.

Pandas to_sql discarding rows when appending to mysql table

I'm working with articles scraped from online newspapers with a mysql database and python. I want to use pandas to_sql method on a dataframe for appending recently scraped articles to a mysql table. It works pretty well, but im having some problems with the following:
Since the articles are automatically scraped from news sites, about 1% of them have issues (encoding, or texts are too long or stuff like that) and dont fit on the mysql table fields. Pandas to_sql method for some reason IGNORES these errors and discards the rows that do not fit. For example I have the following mysql table:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | YES | | NULL | |
| description | text | YES | | NULL | |
| content | text | YES | | NULL | |
| link | varchar(300) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
And I also have a Dataframe that contains 15 rows and 4 columns (title, description, content, link).
If 1 of those rows has a title larger than 255 characters, it wont fit in the mysql table. I expected an error when doing df.to_sql('press', con=con, index=False, if_exists='append'), that way I know i have a problem to fix; but the actual result was that 14 ROWS where appended instead of 15.
This could work for me, but i need to know which row was discarded so i can flag it for later revision. Is it possible to tell pandas to let me know which indexes are ignored?
Thanks!

Load NULL values INT

FIY:
I'm working with a CVS file from Census - FactFinder
Using MySQL 5.7
OS is Windows 10 PRO
So, I created this table:
+----------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| SERIALNO | bigint(13) | NO | PRI | NULL | |
| DIVISION | int(9) | YES | | NULL | |
| PUMA | int(4) | YES | | NULL | |
| REGION | int(1) | YES | | NULL | |
| ST | int(1) | YES | | NULL | |
| ADJHSG | int(7) | YES | | NULL | |
| ADJINC | int(7) | YES | | NULL | |
| FINCP | int(6) | YES | | NULL | |
| HINCP | int(6) | YES | | NULL | |
| R60 | int(1) | YES | | NULL | |
| R65 | int(1) | YES | | NULL | |
+----------+------------+------+-----+---------+-------+
And tried to load data using:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
It didn`t work as this message appear:
ERROR 1366 (HY000): Incorrect integer value: '' for column 'FINCP' at
row 2
The row the error message is referring to is:
2012000000051,3,104,2,17,1045360,1056030,,8200,1,1
I believed FINCP which is the blank value ,, right before 8200 is the problem. So I followed this thread instructions: MySQL load NULL values from CSV data
And updated my code to:
LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
(#SERIALNO, #DIVISION, #PUMA, #REGION, #ST, #ADJHSG, #ADJINC, #FINCP, #HINCP, #R60, #R65)
SET
SERIALNO = nullif(#SERIALNO,''),
DIVISION = nullif(#DIVISION,''),
PUMA = nullif(#PUMA,''),
REGION = nullif(#REGION,''),
ST = nullif(#ST,''),
ADJHSG = nullif(#ADJHSG,''),
ADJINC = nullif(#ADJINC,''),
FINCP = nullif(#FINCP,''),
HINCP = nullif(#HINCP,''),
R60 = nullif(#R60,''),
R65 = nullif(#R65,'');
The first error is now gone but this message appears:
' for column 'R65' at row 12t integer value: '
The row at which this message is referring to is:
2012000000318,3,1602,2,17,1045360,1056030,,,,
There's no error message so I don't know what exactly is the problem. I can only assume that the problem is that there are four consecutive blank values.
Another tip, if I use CSV and change all blank to 0 the code goes smoothly, but I`m not a fan or editing raw data so I would like to know other options.
Bottom line, I have two questions:
Shouldn`t data be loaded with the first code as MySQL should take ,, as null and 0 a plain 0?
What's the problem I'm getting now that I'm using SERIALNO = nullif(#SERIALNO,'')
I want to be able to differentiate between 0 and null/blank values.
Thank you.
MySQL's LOAD DATA tool interprets \N as being a NULL value. So, if your troubled row looked like this:
2012000000318,3,1602,2,17,1045360,1056030,\N,\N,\N,\N
then you might not have this problem. If you have access to a regex replacement tool, you may try searching for the following pattern:
(?<=^)(?=,)|(?<=,)(?=,)|(?<=,)(?=$)
Then, replace with \N. This should fill in all the empty slots with \N, which semantically will be interpreted by MySQL as meaning NULL. Note that if you were to write a table out from MySQL, then nulls would be replaced with \N. The issue is that your data source and MySQL don't know about each other.

Truncate column names in SELECT (MySQL client)

When I'm looking into new databases to explore what is there, usually I get tables with long column names but short contents, like:
mysql> select * from Seat limit 2;
+---------+---------------------+---------------+------------------+--------------+---------------+--------------+-------------+--------------+-------------+---------+---------+----------+------------+---------------+------------------+-----------+-------------+---------------+-----------------+---------------------+-------------------+-----------------+
| seat_id | seat_created | seat_event_id | seat_category_id | seat_user_id | seat_order_id | seat_item_id | seat_row_nr | seat_zone_id | seat_pmp_id | seat_nr | seat_ts | seat_sid | seat_price | seat_discount | seat_discount_id | seat_code | seat_status | seat_sales_id | seat_checked_by | seat_checked_date | seat_old_order_id | seat_old_status |
+---------+---------------------+---------------+------------------+--------------+---------------+--------------+-------------+--------------+-------------+---------+---------+----------+------------+---------------+------------------+-----------+-------------+---------------+-----------------+---------------------+-------------------+-----------------+
| 4897 | 2016-09-01 00:05:54 | 330 | 331 | NULL | NULL | NULL | 0 | NULL | NULL | 0 | NULL | NULL | NULL | 0.00 | NULL | NULL | free | NULL | NULL | 0000-00-00 00:00:00 | NULL | NULL |
| 4898 | 2016-09-01 00:05:54 | 330 | 331 | NULL | NULL | NULL | 0 | NULL | NULL | 0 | NULL | NULL | NULL | 0.00 | NULL | NULL | free | NULL | NULL | 0000-00-00 00:00:00 | NULL | NULL |
+---------+---------------------+---------------+------------------+--------------+---------------+--------------+-------------+--------------+-------------+---------+---------+----------+------------+---------------+------------------+-----------+-------------+---------------+-----------------+---------------------+-------------------+-----------------+
Since the length of the header is longer that the contents of each row, I see a unformatted output which is hard to standard, specially when you search for little clues like fields that aren't being used and so on.
Is there any way to tell mysql client to truncate column names automatically, for example, to 10 characters as maximum? With the first 10 character is usually enough to know which column they refer to.
Of course I could stablish column aliases for that with AS, but if there's too much columns and you want to do a fast exploration, that would take too long for each table.
Other solution will be to tell mysql to remove the prefix seat_ for each column for example (of course, for each column I would need to change the used prefix).
I don't think there's any way to do that automatically. Some options are:
1) Use a graphical UI such as PhpMyAdmin to view the table contents. These typically allow you to adjust column widths.
2) End the query with \G instead of ;:
mysql> SELECT * FROM seat LIMIT 2\G
This will display the columns horizontally instead of vertically:
seat_id: 4897
seat_created: 2016-09-01 00:05:54
seat_event_id: 330
...
I often use the latter for tables with lots of columns because reading the horizontal format can be difficult, especially when it wraps around on the terminal.
3) Use the less pager in a mode that doesn't wrap lines. You can then scroll left and right with the arrow keys.
mysql> pager less -S
See How to better display MySQL table on Terminal
You can skip the column names completely by running the MySQL client with the -N or --skip-column-names option. Then the width of your columns will be determined by the widest data, not the column name. But there would be no row for the column names.
You can also use column aliases to set your own column names, but you'd have to enter these yourself manually.

viewing mysql blob with putty

I am saving a serialized object to a mysql database blob.
After inserting some test objects and then trying to view the table, i am presented with lots of garbage and "PuTTYPuTTY" several times.
I believe this has something to do with character encoding and the blob containing strange characters.
I am just wanting to check and see if this is going to cause problems with my database, or if this is just a problem with putty showing the data?
Description of the QuizTable:
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
| classId | varchar(20) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | FK related to the ClassTable. This way each Class in the ClassTable is associated with its quiz in the QuizTable. |
| quizId | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | This is the quiz number associated with the quiz. |
| quizObject | blob | NULL | NO | | NULL | | select,insert,update,references | This is the actual quiz object. |
| quizEnabled | tinyint(1) | NULL | NO | | NULL | | select,insert,update,references | |
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
What i see when i try to view the table contents:
select * from QuizTable;
questionTextq ~ xp sq ~ w
t q1a1t q1a2xt 1t q1sq ~ sq ~ w
t q2a1t q2a2t q2a3xt 2t q2xt test3 | 1 |
+-------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
3 rows in set (0.00 sec)
I believe you can use the hex function on blobs as well as strings. You can run a query like this.
Select HEX(quizObject) From QuizTable Where....
Putty is reacting to what it thinks are terminal control character strings in your output stream. These strings allow the remote host to change something about the local terminal without redrawing the entire screen, such as setting the title, positioning the cursor, clearing the screen, etc..
It just so happens that when trying to 'display' something encoded like this, that a lot of binary data ends up sending these characters.
You'll get this reaction catting binary files as well.
blob will completely ignore any character encoding settings you have. It's really intended for storing binary objects like images or zip files.
If this field will only contain text, I'd suggest using a text field.