Truncate column names in SELECT (MySQL client) - mysql

When I explore a new database to see what is in it, I often find tables with long column names but short contents, like:
mysql> select * from Seat limit 2;
+---------+---------------------+---------------+------------------+--------------+---------------+--------------+-------------+--------------+-------------+---------+---------+----------+------------+---------------+------------------+-----------+-------------+---------------+-----------------+---------------------+-------------------+-----------------+
| seat_id | seat_created | seat_event_id | seat_category_id | seat_user_id | seat_order_id | seat_item_id | seat_row_nr | seat_zone_id | seat_pmp_id | seat_nr | seat_ts | seat_sid | seat_price | seat_discount | seat_discount_id | seat_code | seat_status | seat_sales_id | seat_checked_by | seat_checked_date | seat_old_order_id | seat_old_status |
+---------+---------------------+---------------+------------------+--------------+---------------+--------------+-------------+--------------+-------------+---------+---------+----------+------------+---------------+------------------+-----------+-------------+---------------+-----------------+---------------------+-------------------+-----------------+
| 4897 | 2016-09-01 00:05:54 | 330 | 331 | NULL | NULL | NULL | 0 | NULL | NULL | 0 | NULL | NULL | NULL | 0.00 | NULL | NULL | free | NULL | NULL | 0000-00-00 00:00:00 | NULL | NULL |
| 4898 | 2016-09-01 00:05:54 | 330 | 331 | NULL | NULL | NULL | 0 | NULL | NULL | 0 | NULL | NULL | NULL | 0.00 | NULL | NULL | free | NULL | NULL | 0000-00-00 00:00:00 | NULL | NULL |
+---------+---------------------+---------------+------------------+--------------+---------------+--------------+-------------+--------------+-------------+---------+---------+----------+------------+---------------+------------------+-----------+-------------+---------------+-----------------+---------------------+-------------------+-----------------+
Since the header is wider than the contents of each row, the output is hard to read, especially when you are looking for small clues such as fields that aren't being used.
Is there any way to tell the mysql client to truncate column names automatically, for example to a maximum of 10 characters? The first 10 characters are usually enough to tell which column is which.
Of course I could set column aliases with AS, but if there are many columns and you want to do a quick exploration, writing them for each table would take too long.
Another solution would be to tell mysql to strip the prefix seat_ from each column name (of course, I would need to change the prefix for each table).

I don't think there's any way to do that automatically. Some options are:
1) Use a graphical UI such as phpMyAdmin to view the table contents. These typically allow you to adjust column widths.
2) End the query with \G instead of ;:
mysql> SELECT * FROM seat LIMIT 2\G
This will display the columns vertically instead of horizontally:
seat_id: 4897
seat_created: 2016-09-01 00:05:54
seat_event_id: 330
...
I often use \G for tables with lots of columns, because the horizontal format can be hard to read, especially when it wraps around in the terminal.
3) Use the less pager in a mode that doesn't wrap lines. You can then scroll left and right with the arrow keys.
mysql> pager less -S
See How to better display MySQL table on Terminal

You can skip the column names completely by running the MySQL client with the -N or --skip-column-names option. Then the width of your columns will be determined by the widest data, not the column name. But there would be no row for the column names.
You can also use column aliases to set your own column names, but you'd have to enter these yourself manually.
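Typing the aliases by hand is the slow part, but they can be generated. A minimal sketch, assuming you first fetch the column list with SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'Seat' ORDER BY ORDINAL_POSITION; the function name aliased_select is hypothetical. It strips an optional prefix, truncates to max_len characters, and refuses to produce duplicate aliases:

```python
def aliased_select(table, columns, prefix="", max_len=10):
    """Build a SELECT that aliases every column to a short name.

    Hypothetical helper: strips `prefix` when a column starts with it,
    then truncates to `max_len` characters. Raises ValueError if two
    columns would end up with the same alias.
    """
    aliases = []
    seen = set()
    for col in columns:
        short = col[len(prefix):] if prefix and col.startswith(prefix) else col
        short = short[:max_len]
        if short in seen:
            raise ValueError(f"alias collision: {short!r}")
        seen.add(short)
        aliases.append(f"`{col}` AS `{short}`")
    return f"SELECT {', '.join(aliases)} FROM `{table}`"
```

For example, aliased_select("Seat", ["seat_id", "seat_created"], prefix="seat_") returns SELECT `seat_id` AS `id`, `seat_created` AS `created` FROM `Seat`, which you can paste into the client and finish with LIMIT 2.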

Related

Pandas to_sql discarding rows when appending to mysql table

I'm working with articles scraped from online newspapers, using a MySQL database and Python. I want to use the pandas to_sql method on a DataFrame to append recently scraped articles to a MySQL table. It works pretty well, but I'm having some problems with the following:
Since the articles are scraped automatically from news sites, about 1% of them have issues (encoding, texts that are too long, and so on) and don't fit in the MySQL table fields. For some reason, the pandas to_sql method IGNORES these errors and discards the rows that do not fit. For example, I have the following MySQL table:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | YES | | NULL | |
| description | text | YES | | NULL | |
| content | text | YES | | NULL | |
| link | varchar(300) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
And I also have a Dataframe that contains 15 rows and 4 columns (title, description, content, link).
If one of those rows has a title longer than 255 characters, it won't fit in the MySQL table. I expected an error when running df.to_sql('press', con=con, index=False, if_exists='append'), so that I'd know I have a problem to fix; but the actual result was that 14 ROWS were appended instead of 15.
This could work for me, but I need to know which row was discarded so I can flag it for later review. Is it possible to make pandas tell me which indexes were ignored?
Thanks!
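One way to find the offending rows regardless of what to_sql does is to check them yourself before inserting. A minimal sketch, assuming the limits mirror the table above (in MySQL, VARCHAR(n) counts characters); it only catches over-long strings, not encoding problems. The function name is hypothetical; it accepts (index, row) pairs, so it works with df.iterrows() as well as enumerate() over plain dicts:

```python
def rows_exceeding_limits(indexed_rows, limits):
    """Return {index: [column, ...]} for string values longer than the limit.

    indexed_rows: iterable of (index, row) pairs, where row supports .get()
                  (e.g. df.iterrows() or enumerate(list_of_dicts)).
    limits: {column_name: max_characters}, copied from the VARCHAR sizes.
    """
    bad = {}
    for idx, row in indexed_rows:
        too_long = [
            col for col, limit in limits.items()
            if isinstance(row.get(col), str) and len(row.get(col)) > limit
        ]
        if too_long:
            bad[idx] = too_long
    return bad
```

The returned indexes can be flagged for review and dropped from the DataFrame before calling to_sql.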

MySQL - Select everything from one table, but only first matching value in second table

I'm feeling a little rusty with creating queries in MySQL. I thought I could solve this, but I'm having no luck and searching around doesn't result in anything similar...
Basically, I have two tables. I want to select everything from one table and the matching row from the second table. However, I only want to have the first result from the second table. I hope that makes sense.
The rows in the daily_entries table are unique. There will be at most one row for each day, but not necessarily one for every day. The second table, notes, contains many rows, each of which is associated with ONE row from daily_entries.
Below are examples of my tables;
Table One
mysql> desc daily_entries;
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| eid | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | | NULL | |
| location | varchar(100) | NO | | NULL | |
+----------+--------------+------+-----+---------+----------------+
Table Two
mysql> desc notes;
+---------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+----------------+
| task_id | int(11) | NO | PRI | NULL | auto_increment |
| eid | int(11) | NO | MUL | NULL | |
| notes | text | YES | | NULL | |
+---------+---------+------+-----+---------+----------------+
What I need to do is select all entries from notes, with only one result from daily_entries.
Below is an example of how I want it to look:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | | Away | 1 |
| Testing another note | 4 | | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
Below is the query that I currently have:
SELECT notes.notes, notes.task_id, daily_entries.date, daily_entries.location, daily_entries.eid
FROM daily_entries
LEFT JOIN notes ON daily_entries.eid=notes.eid
ORDER BY daily_entries.date DESC
Below is an example of how it looks with my query:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | 2014-01-01 | Away | 1 |
| Testing another note | 4 | 2014-01-01 | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
At first I thought I could simply GROUP BY daily_entries.date; however, that returned only the first row of each matching set. Can this even be done? I would greatly appreciate any help. Adding LIMIT at the end of my query limited the output to the number of rows I specified, but applied to the whole result set, which was to be expected.
Basically, there's nothing wrong with your query. I believe it is exactly what you need, because it returns the data you want. Don't look at it as duplicating your daily_entries rows; look at it as returning every note together with its associated daily entry.
Of course, you can achieve what you described in your question (there's already an answer that solves this), but think twice before doing it, because such nested queries can add noticeable performance overhead on your database server.
I'd recommend keeping your query as simple as possible, with a single LEFT JOIN (which is all you need), and letting the consuming application manipulate the data and present it the way it needs to.
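Doing the presentation step in the application can be as small as this sketch, which assumes the rows arrive in the column order of the query above (notes, task_id, date, location, eid); the function name is hypothetical. It blanks the date on every row whose eid has already been seen:

```python
def blank_repeated(rows, key_index, blank_index):
    """Yield each row as a list, emptying column blank_index on repeated keys.

    The first row for a given key (e.g. eid) keeps its value; later rows
    with the same key get "" in that column.
    """
    seen = set()
    for row in rows:
        row = list(row)
        if row[key_index] in seen:
            row[blank_index] = ""
        else:
            seen.add(row[key_index])
        yield row
```

With key_index=4 (eid) and blank_index=2 (date), the four rows from the question come out in the desired layout.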
Use mysql's non-standard group by functionality:
SELECT n.notes, n.task_id, de.date, de.location, de.eid
FROM notes n
LEFT JOIN (select * from
(select * from daily_entries ORDER BY date DESC) x
group by eid) de ON de.eid = n.eid
You need to write these queries with explicit filtering for the first note of each entry. This example uses a join to do this:
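SELECT n.notes, n.task_id,
-- show the date only on the first (lowest task_id) note of each entry
CASE WHEN nmin.min_task_id IS NOT NULL THEN de.date END AS date,
de.location, de.eid
FROM daily_entries de LEFT JOIN
notes n
ON de.eid = n.eid LEFT JOIN
(select n.eid, min(task_id) as min_task_id
from notes n
group by n.eid
) nmin
on n.task_id = nmin.min_task_id
ORDER BY de.date DESC, n.task_id;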
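The derived table nmin finds the lowest task_id per entry; to produce the blank-date layout, the SELECT list has to use it, for example through a CASE expression. A sketch that runs this idea against an in-memory SQLite database (Python's built-in sqlite3 as a stand-in for MySQL, assuming the query only uses portable SQL: LEFT JOIN, a derived table and CASE), with the sample data from the question:

```python
import sqlite3

QUERY = """
    SELECT n.notes, n.task_id,
           -- keep the date only when this note is the entry's lowest task_id
           CASE WHEN nmin.min_task_id IS NOT NULL THEN de.date END AS date,
           de.location, de.eid
    FROM daily_entries de
    LEFT JOIN notes n ON de.eid = n.eid
    LEFT JOIN (SELECT eid, MIN(task_id) AS min_task_id
               FROM notes GROUP BY eid) nmin
           ON n.task_id = nmin.min_task_id
    ORDER BY de.date DESC, n.task_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_entries (eid INTEGER PRIMARY KEY, date TEXT, location TEXT);
    CREATE TABLE notes (task_id INTEGER PRIMARY KEY, eid INTEGER, notes TEXT);
    INSERT INTO daily_entries VALUES (1, '2014-01-01', 'Away'), (2, '2014-01-02', 'Home');
    INSERT INTO notes VALUES
        (1, 1, 'Enter a note.'),
        (2, 1, 'This is a test note. To see what happens.'),
        (3, 2, 'Another note'),
        (4, 1, 'Testing another note');
""")

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # NULL dates come back as None
```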

mysql - How to filter results without specifying the columns

It might sound silly, but I'm just curious.
I have a table named posts:
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(50) | YES | | NULL | |
| body | text | YES | | NULL | |
| created | datetime | YES | | NULL | |
| modified | datetime | YES | | NULL | |
+----------+------------------+------+-----+---------+----------------+
The values:
+----+-----------------------+----------------------------------------+---------------------+---------------------+
| id | title | body | created | modified |
+----+-----------------------+----------------------------------------+---------------------+---------------------+
| 2 | A title once again!!! | And the post body follows. Tralalalala | 2013-06-03 13:13:44 | 2013-06-05 09:36:51 |
| 3 | Title strikes back | This is really exciting! Not. | 2013-06-03 13:13:46 | NULL |
| 11 | Tomcat | Tommy boy!!! FFF | 2013-06-04 16:33:22 | 2013-06-04 16:48:40 |
| 12 | FFD | dsfdsf | 2013-06-04 16:48:56 | 2013-06-04 16:55:50 |
| 13 | fdf | dfdsf | 2013-06-04 16:57:47 | 2013-06-05 09:36:54 |
| 14 | GGD | dsfdsf | 2013-06-04 17:02:33 | 2013-06-04 17:02:33 |
| 15 | GG# | dsfdsfff322 | 2013-06-05 09:36:20 | 2013-06-05 09:36:28 |
+----+-----------------------+----------------------------------------+---------------------+---------------------+
Let's say I want to search for rows that contain the value Th (not case sensitive), regardless of the FIELD. This is like a quick search function.
Normally I would do something like: SELECT * FROM posts WHERE title LIKE '%Th%' OR body LIKE '%Th%'
I did not include the other fields because they obviously cannot contain such values.
I want to know if there's a shortcut for this, like SELECT * FROM posts LIKE '%Th%'.
Please advise. Thanks.
Using plain old SQL you need to specify all the column names you wish to include.
If you want more search-box-like behavior, I'd suggest looking at MySQL's fulltext functions; see:
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
The SQL language is based on the presumption of the schema being known. Thus, there is no "search any column" type of functionality. How would it work against non-text columns? What about columns of different collations? Aside from the language not having a feature, specifying the columns expresses your intent to the next developer and that as much as anything should be an overriding consideration.
Other answers have covered that you need to specify all the columns. Here is an alternative formulation that is a bit shorter:
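SELECT *
FROM posts
WHERE CONCAT_WS(' ', title, body) LIKE '%Th%'
CONCAT_WS is used here rather than CONCAT because CONCAT returns NULL if either column is NULL, which would silently skip rows where title or body is NULL.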
If you are looking for an exact match, then you can do:
select *
from posts
where 'Th' in (title, body)
No, there is no shortcut: you have to use a WHERE clause and specify the columns. Otherwise the query engine cannot know which columns to filter on.
If you want a custom shortcut - you can write a function which takes a single parameter (the search string) and returns the required fields.
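A minimal sketch of such a shortcut on the application side, assuming a MySQL DB-API driver that uses %s placeholders (as mysqlclient, PyMySQL and mysql-connector do) and a trusted list of text columns, for example taken from information_schema.COLUMNS; the function name is hypothetical. The search term is passed as a parameter, but its own % and _ characters are not escaped, and whether the match is case-insensitive depends on the column collation:

```python
def build_search_query(table, columns, term):
    """Return (sql, params) that LIKE-matches term against every listed column.

    Column and table names are interpolated directly, so they must come from
    a trusted source; the search term itself is passed as a bound parameter.
    """
    if not columns:
        raise ValueError("need at least one column to search")
    where = " OR ".join(f"`{col}` LIKE %s" for col in columns)
    pattern = f"%{term}%"
    return f"SELECT * FROM `{table}` WHERE {where}", [pattern] * len(columns)
```

Usage would look like cursor.execute(*build_search_query("posts", ["title", "body"], "Th")).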
I'm afraid there isn't.
Not sure what your use case is... does this alternative approach work for your use case?
mysql -u{user} -p{password} -h{hostname} {database_name} -B -e "{query}" | grep -i "{search_string}"
This connects to the database, runs the specified query, and (thanks to -B, batch mode) prints one result row per line with fields separated by tabs. The Unix grep utility then filters the returned rows; -i makes the match case-insensitive.

MySQL/RDBMS: Is it okay to index long strings? Will it do the job?

Let's suppose I have a table of movies:
+------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| title | tinytext | YES | | NULL | |
| synopsis | synopsis | YES | | NULL | |
| year | int(4) | YES | | NULL | |
| ISBN | varchar(13) | YES | | NULL | |
| category | tinytext | YES | | NULL | |
| author | tinytext | YES | | NULL | |
| theme | tinytext | YES | | NULL | |
| edition | int(2) | YES | | NULL | |
| search | text | YES | | NULL | |
+------------+---------------------+------+-----+---------+----------------+
In this example, I'm using the search column as a summary of the other columns. So, a possible record would look like this:
+------------+-------------------------------------------------------------+
| Field | Value |
+------------+-------------------------------------------------------------+
| id | 1 |
| title | Awesome Book |
| synopsis | This is a cool book with a cool history |
| year | 2013 |
| ISBN | 1234567890123 |
| category | Horror |
| author | John Doe |
| theme | Programmer goes insane |
| edition | 2nd |
| search | 2013 horror john doe awesome book this is a cool book (...) |
+------------+-------------------------------------------------------------+
The search column is the one that will be scanned when a search is made. Notice that it contains all the words from the other fields, in lower case, and possibly some extra words to help the search.
I have two questions about it:
1) Knowing that this column is a text field and can get really big, is it okay to index it? Will it improve the performance as expected? Why?
2) Despite the index, is it a good idea to use this method to search or is it better to try every column on my query? How can I improve it?
Note: I don't really have this table; it's just for example purposes. Please ignore any errors in datatypes or syntax I may have made.
1) Knowing that this column is a text field and can get really big, is it okay to index it? Will it improve the performance as expected? Why?
Yes, you can index it, but no, it won't improve performance. For LIKE searches, an index on a string column only helps when the pattern matches the start of the value (LIKE 'prefix%'), so in your case someone searching '2013 horror john' would hit the index, but someone searching 'horror john 2013' would not.
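The reason is that an ordered (B-tree) index keeps values sorted, so all values sharing a prefix form one contiguous range that binary search can find, while a fragment in the middle of the value has no such range. A toy illustration using a sorted Python list and bisect as a stand-in for the index (not MySQL's actual implementation; note also that MySQL requires a prefix length, e.g. INDEX (search(255)), to index a TEXT column at all):

```python
import bisect

def prefix_matches(sorted_values, prefix):
    """Find values starting with prefix via binary search, like LIKE 'prefix%'.

    All matches sort at or after `prefix` and sit next to each other,
    so we jump to the first candidate and scan only the matching range.
    """
    start = bisect.bisect_left(sorted_values, prefix)
    end = start
    while end < len(sorted_values) and sorted_values[end].startswith(prefix):
        end += 1
    return sorted_values[start:end]

def infix_matches(values, fragment):
    """LIKE '%fragment%' has no contiguous range: every value must be checked."""
    return [v for v in values if fragment in v]
```

With the search values sorted, prefix_matches(values, "2013 horror john") jumps straight to the right row, whereas "horror john 2013" can only be found by infix_matches, i.e. a full scan.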
2) Despite the index, is it a good idea to use this method to search or is it better to try every column on my query? How can I improve it?
As Gordon Linoff writes, the best solution is probably full-text searching: it is blazingly fast for text searches, deals with "fuzzy" matching, and generally allows you to write a search function similar to the way Google works.
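The idea that makes full-text search independent of word order is an inverted index: a map from each word to the rows containing it. A toy sketch of that idea only (not MySQL's FULLTEXT implementation, which adds stopwords, minimum word length and relevance ranking); the function names are hypothetical:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word of query, in any order."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Here "horror john 2013" and "2013 horror john" find the same record, which is exactly what the prefix-only B-tree index cannot do.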
Indexing the search column is not helpful.
What you may want is full text search capabilities on the column, which you can read about here.
Which approach you use depends on whether the searches need context. If someone searches for "Clinton", do you want to restrict the search to authors named "Clinton" or to books about "Clinton"? If you don't care about context, then full text on one field is quite reasonable.
I need to add: you don't need to put all the search terms in a separate field to use full text search. You can create a full text index on multiple columns. This gives you the flexibility of using full text searches with context (by looking only in specific columns) or without context (by looking in all of them). Your question was about the search column in particular, but that is not the best way to implement the functionality that you are looking for.

Defining a webservice for usage analytics (desktop application)

Current situation
I have a desktop application (C++ Win32), and I wish to track users' usage analytics anonymously (actions, clicks, usage time, etc.)
The tracking is done via designated web services for specific actions (install, uninstall, click) and everything is written by my team and stored on our DB.
The need
Now we're adding more usage types and events with a variety of data, so we need to define the services.
Instead of having tons of different web services for each action, I want to have a single generic service for all usage types, that is capable of receiving different data types.
For example:
"button_A_click" event, has data with 1 field: {window_name (string)}
"show_notification" event, has data with 3 fields: {source_id (int), user_action (int), index (int)}
Question
I'm looking for an elegant & convenient way to store this sort of diverse data, so later I could query it easily.
The alternatives I can think of:
Storing each usage type's data as a JSON/XML object in a single field, but it would be extremely hard to pull data out and write queries on those fields
Having extra N data fields for each record, but it seems very wasteful.
Any ideas for this sort of model? Maybe something like Google Analytics? Please advise...
Technical: The DB is MySQL, managed through phpMyAdmin.
Disclaimer:
There is a similar post, which brought to my attention services like DeskMetrics and Trackerbird, or embedding Google Analytics into a native C++ application, but I'd rather have the service be my own, and better understand how to design this sort of model.
Thanks!
This seems like a database normalization problem.
I am also going to assume that you have a table named events where all events will be stored.
Additionally, I am going to assume you have the following data attributes (for simplicity's sake): window_name, source_id, user_action, index
To achieve normalization, we will need the following tables:
events
data_attributes
attribute_types
This is how each of the tables should be structured:
mysql> describe events;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| event_type | varchar(255) | YES | | NULL | |
+------------+------------------+------+-----+---------+----------------+
mysql> describe data_attributes;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| event_id | int(11) | YES | | NULL | |
| attribute_type | int(11) | YES | | NULL | |
| attribute_name | varchar(255) | YES | | NULL | |
| attribute_value | int(11) | YES | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
mysql> describe attribute_types;
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| type | varchar(255) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
The idea is that you will have to populate attribute_types with all possible types you can have. Then, for each new event, you will add an entry in the events table and corresponding entries in the data_attributes table to map that event to one or more attribute types with the appropriate values.
Example:
"button_A_click" event, has data with 1 field: {window_name "Dummy Window Name"}
"show_notification" event, has data with 3 fields: {source_id: 99, user_action: 44, index: 78}
would be represented as:
mysql> select * from attribute_types;
+----+-------------+
| id | type |
+----+-------------+
| 1 | window_name |
| 2 | source_id |
| 3 | user_action |
| 4 | index |
+----+-------------+
mysql> select * from events;
+----+-------------------+
| id | event_type |
+----+-------------------+
| 1 | button_A_click |
| 2 | show_notification |
+----+-------------------+
mysql> select * from data_attributes;
+----+----------+----------------+-------------------+-----------------+
| id | event_id | attribute_type | attribute_name | attribute_value |
+----+----------+----------------+-------------------+-----------------+
| 1 | 1 | 1 | Dummy Window Name | NULL |
| 2 | 2 | 2 | NULL | 99 |
| 3 | 2 | 3 | NULL | 44 |
| 4 | 2 | 4 | NULL | 78 |
+----+----------+----------------+-------------------+-----------------+
To write a query for this data, you can use the COALESCE function in MySQL to get the value for you without having to check which of the columns is NULL.
Here's a quick example I hacked up:
SELECT events.event_type as `event_type`,
attribute_types.type as `attribute_type`,
COALESCE(data_attributes.attribute_name, data_attributes.attribute_value) as `value`
FROM data_attributes,
events,
attribute_types
WHERE data_attributes.event_id = events.id
AND data_attributes.attribute_type = attribute_types.id
Which yields the following output:
+-------------------+----------------+-------------------+
| event_type | attribute_type | value |
+-------------------+----------------+-------------------+
| button_A_click | window_name | Dummy Window Name |
| show_notification | source_id | 99 |
| show_notification | user_action | 44 |
| show_notification | index | 78 |
+-------------------+----------------+-------------------+
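To try the design end to end, here is a sketch that builds the three tables in an in-memory SQLite database (Python's sqlite3 standing in for MySQL), loads the example rows and runs the COALESCE query with explicit JOINs, which are equivalent to the comma joins plus WHERE conditions above. As in the answer's example, attribute_name holds string values and attribute_value holds integer values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, event_type TEXT);
    CREATE TABLE attribute_types (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE data_attributes (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES events(id),
        attribute_type INTEGER REFERENCES attribute_types(id),
        attribute_name TEXT,      -- string values (e.g. a window name)
        attribute_value INTEGER   -- integer values (e.g. a source id)
    );
    INSERT INTO attribute_types VALUES
        (1, 'window_name'), (2, 'source_id'), (3, 'user_action'), (4, 'index');
    INSERT INTO events VALUES (1, 'button_A_click'), (2, 'show_notification');
    INSERT INTO data_attributes VALUES
        (1, 1, 1, 'Dummy Window Name', NULL),
        (2, 2, 2, NULL, 99),
        (3, 2, 3, NULL, 44),
        (4, 2, 4, NULL, 78);
""")

rows = conn.execute("""
    SELECT e.event_type, t.type AS attribute_type,
           COALESCE(d.attribute_name, d.attribute_value) AS value
    FROM data_attributes d
    JOIN events e ON d.event_id = e.id
    JOIN attribute_types t ON d.attribute_type = t.id
    ORDER BY d.id
""").fetchall()
for row in rows:
    print(row)
```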
EDIT: Bugger! I read C#, but I see you are using C++. Sorry about that. I'll leave the answer as-is, as its principle could still be useful. Please regard the examples as pseudo-code.
You can define a custom class/structure that you use with a collection. Then serialize this data and send it to the web service. For example:
using System;
using System.Collections.Generic;

[Serializable]
public class ActionDefinition {
    public string ID;
    public ActionType Action;    // an enum listing the possible actions
    public List<string> Fields;  // or a List<SomeClass> if you need more complex fields
}

// Illustrative action values
public enum ActionType { ButtonAClick, ShowNotification }

var analyticsCollection = new List<ActionDefinition>();
// ... add an ActionDefinition for each tracked event ...
// SendToWS and Serialize are placeholders for your own transport and serializer
SendToWS(Serialize(analyticsCollection));
Now you can dynamically add as many events as you want, with the flexibility you need.
On the server side you can simply parse the data:
// Deserialize and GetWS are placeholders as well
List<ActionDefinition> analyticsCollection = Deserialize(GetWS());
foreach (ActionDefinition ad in analyticsCollection) {
    switch (ad.Action) {
        // ... handle each action type
    }
}
I would suggest adding an integrity check such as a checksum. I imagine the (de)serializer would be fairly custom in C++, so perhaps a simple Base64 encoding can do the trick, letting the data be transported as ASCII text.
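A minimal sketch of that transport format, assuming JSON as the serialization and SHA-256 as the checksum (both swapped in for illustration; the function names are hypothetical). The digest and payload are joined by a newline and Base64-encoded into ASCII text. Note that a plain checksum only detects accidental corruption, and Base64 is an encoding, not encryption, so neither protects against a client that deliberately sends fake data:

```python
import base64
import hashlib
import json

def encode_events(events):
    """Serialize events to compact JSON, prepend a SHA-256 hex digest, Base64-encode."""
    payload = json.dumps(events, separators=(",", ":"), sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    return base64.b64encode(digest + b"\n" + payload).decode("ascii")

def decode_events(message):
    """Reverse encode_events; raise ValueError if the checksum does not match."""
    raw = base64.b64decode(message)
    digest, _, payload = raw.partition(b"\n")  # compact JSON contains no raw newlines
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("checksum mismatch")
    return json.loads(payload.decode("utf-8"))
```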
You could make a description table in which you declare, for each event, what each param means. Then you have a main table into which you only insert the event's name and param1, param2, etc. An admin tool would be very easy: you go through all events and describe them using the table where each event is declared. E.g. for your event button_A_click you insert into the description table:
Name Param1
button_A_Click WindowTitle
This lets you group your events or select only a single event.
This is how I would solve it.
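Reading a main-table row back then means pairing its generic param columns with the names declared for that event. A sketch in which a Python dict stands in for the description table (EVENT_PARAMS and describe_event are hypothetical names; in practice the names would be loaded from that table):

```python
# Hypothetical stand-in for the description table: event name -> param names
EVENT_PARAMS = {
    "button_A_click": ["window_title"],
    "show_notification": ["source_id", "user_action", "index"],
}

def describe_event(name, params):
    """Pair the param1..paramN values of a main-table row with their declared names."""
    names = EVENT_PARAMS.get(name)
    if names is None:
        raise KeyError(f"undeclared event: {name}")
    if len(params) > len(names):
        raise ValueError(f"{name} declares {len(names)} params, got {len(params)}")
    return dict(zip(names, params))
```

Keeping the meaning of each param in data rather than in code is what makes the admin tool easy: adding a new event type only needs a new description row.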