Being the sports nerd that I am, I'm looking to take the daily XML files produced by the Major League Baseball website and import them into either an Access or MySQL database. The issue I'm running into is that almost every XML file they produce is just slightly different from the last. For example, one game file may have a field named batter23 that sits next to event22, while another file calls it batter24 and it sits next to pitcher25. I know that XML files can be inconsistent, but there has to be a way to consistently get the data into a database. Is there any way to standardize these XML files? Some code that will parse each file in a list and organize them into a specific style with consistent field names? Currently I import the XML file into an Excel sheet first, where I change the file type to a CSV, but from there the field names and column locations still differ from file to file.
My goal is to have all the files in a structure where I can quickly import them into a database each day, without having to manually change column locations or field names. I'm open to any and all options, but my experience in most languages is rookie level at best, so forgive me for my lack of knowledge.
The files are pretty standard as far as XML goes... you just have to figure out what each file represents.
I took a quick look around a Red Sox v Royals game from September 14. (Go Sox!)
In year_2014/month_09/day_14/gid_2014_09_14_bosmlb_kcamlb_1/players.xml
I can see that Ortiz has an id of 120074.
If I look in batters for his player ID, I can see his stats for that game.
(year_2014/month_09/day_14/gid_2014_09_14_bosmlb_kcamlb_1/batters/120074.xml)
It goes on. Basically, in order to load these files into a database, you will have to perform some level of processing for them to make any sense.
The IDs don't appear to change between games, but I only took a cursory glance.
As for loading the data, XML::Simple in Perl can take an XML file and spit out a Perl data structure very easily. Unless you need something more heavy-duty, this should cover you.
Loading the players.xml:
#!/usr/bin/env perl
use strict; use warnings;
use Data::Dumper;
use XML::Simple;
my $players_xml = XMLin('players.xml');
print Dumper $players_xml;
Gives you something like:
$VAR1 = {
  'venue' => 'Kauffman Stadium',
  'date' => 'September 14, 2014',
  'team' => {
    'Boston Red Sox' => {
      'id' => 'BOS',
      'player' => {
        '605141' => {
          'avg' => '.283',
          'team_abbrev' => 'BOS',
          'parent_team_id' => '111',
          'hr' => '4',
          'team_id' => '111',
          'status' => 'A',
          'last' => 'Betts',
          'rl' => 'R',
          'parent_team_abbrev' => 'BOS',
          'first' => 'Mookie',
          'rbi' => '12',
          'game_position' => '2B',
          'num' => '50',
          'position' => '2B',
          'current_position' => '2B',
          'boxname' => 'Betts',
          'bats' => 'R',
          'bat_order' => '1'
        },
        ...
It's then trivial to navigate these hashes and insert DB rows as you like.
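For example, a rough sketch of walking that structure and inserting one row per player with DBI might look like the following. The database name, table, and column list here are placeholders, and XML::Simple's folding (which keys teams by name and players by id, as in the dump above) can vary with options like ForceArray, so treat it as a starting point rather than a finished loader.
#!/usr/bin/env perl
use strict; use warnings;
use DBI;
use XML::Simple;

my $players_xml = XMLin('players.xml');

# Placeholder connection and table; adjust to your own schema.
my $dbh = DBI->connect('dbi:mysql:dbname=mlb', 'user', 'password', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO players (id, first, last, position, avg) VALUES (?, ?, ?, ?, ?)'
);

# Walk team => player => attributes, exactly as laid out in the Dumper output.
for my $team_name (keys %{ $players_xml->{team} }) {
    my $players = $players_xml->{team}{$team_name}{player};
    for my $id (keys %$players) {
        my $p = $players->{$id};
        $sth->execute($id, $p->{first}, $p->{last}, $p->{position}, $p->{avg});
    }
}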
I'm hoping for a straightforward solution to do this, but so far I've been coming up empty...
I have a front-end Vue app/form that sends data back to my Laravel backend. I have a controller that validates and saves the request (not looking for feedback on this architecture at the moment unless it actually solves the problem; that's a task for another day...).
I've added a json column called "custom_redeem_fields".
For context, it's there to support more flexibility and accepts key/value pairs used in another field called "custom_redeem_instructions", which holds text with delimiters for each of the keys from "custom_redeem_fields". I'd prefer not to define these keys statically, because the whole point is to be able to add new keys at will. So custom_redeem_instructions will read something like "please visit {•URL•} and enter code {•CODE•}..." and those values will come from the custom_redeem_fields json field.
In the model, I have "custom_redeem_fields" in the fillable array, as well as set as castable to json.
protected $fillable = ['custom_redeem_fields'];
protected $casts = ['custom_redeem_fields' => 'json'];
In the controller, I have ~20 additional columns (not really relevant here, so I've only included two) so I'm trying not to call them out individually beyond their validation rules. The request typically sends one field at a time, so the user can update and save each field as they go. This was working appropriately for all the other fields I had before I added the "custom_redeem_fields.xxxx" to the mix.
$validatedData = $request->validate([
'title' => 'sometimes|required|max:255',
'text' => 'sometimes|required_unless:redeem_type,9|max:255',
'custom_redeem_fields.email' => 'sometimes|email',
'custom_redeem_fields.phone' => ['sometimes', new ValidPhone],
'custom_redeem_fields.code' => 'sometimes',
'custom_redeem_fields.url' => 'sometimes|url'
]);
$ticket = Ticket::find($id);
$ticket->update($validatedData);
Now, with the "custom_redeem_fields.xxxxx" this falls apart - the entire json object stored in "custom_redeem_fields" is overwritten with the most recent update, rather than just updating the key included in the validatedData array. So if I save:
[
    "title" => "Monty Pythons Flying Circus",
    "text" => "Monty Pythons Flying Circus is a British surreal sketch comedy series created by and starring the comedy group Monty Python, consisting of Graham Chapman, ...",
    "custom_redeem_fields" => [
        "email" => "bob#example.com",
        "phone" => "503.555.5555",
        "code" => "1xoicvjq",
        "url" => "https://example.com/"
    ]
]
and then I send:
"custom_redeem_fields" => ["email" => "pat#example.com"]
the custom redeem fields returns:
"custom_redeem_fields" => ["email" => "pat#example.com"]
rather than:
"custom_redeem_fields" => ["email" => "pat#example.com", "phone" => "503.555.5555", "code" => "1xoicvjq", "url" => "https://example.com/"]
It seems that validation rules need JSON keys to be notated with dot syntax (custom_redeem_fields.url), and Eloquent needs arrow syntax (custom_redeem_fields->url), but I'm not sure of the most straightforward way to transition between the two, which seems very not-Laravel, and the documentation is certainly lacking in this department...
Any help would be appreciated.
Thanks!
Wouldn't array_merge() solve your problem? It overwrites the values in the first parameter with those from the second, so if you pass the already existing fields as the first argument and the incoming ones as the second, it combines the two the way you want.
$customRedeemInput = [...];
$model->custom_redeem_fields = array_merge($model->custom_redeem_fields, $customRedeemInput);
$model->save();
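In the controller from the question, that merge could slot in roughly like this. This is only a sketch: it assumes the json cast returns an array (or null while the column is still empty) and that the validation rules stay as shown above.
$validatedData = $request->validate([
    // ... same rules as above ...
]);

$ticket = Ticket::find($id);

// Merge the incoming keys over what is already stored, so a partial
// update doesn't wipe out the other custom_redeem_fields keys.
if (array_key_exists('custom_redeem_fields', $validatedData)) {
    $validatedData['custom_redeem_fields'] = array_merge(
        $ticket->custom_redeem_fields ?? [],
        $validatedData['custom_redeem_fields']
    );
}

$ticket->update($validatedData);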
I need help pulling data from a field with an unknown data structure format. I am using a WordPress quiz plugin and I want to pull data from its backend table.
Data stored in answer_data is:
a:4:{
i:0;O:27:"WpProQuiz_Model_AnswerTypes":7:{s:10:"*_answer";s:17:"Kieran Trippier ";s:8:"*_html";b:0;s:10:"*_points";i:1;s:11:"*_correct";b:0;s:14:"*_sortString";s:0:"";s:18:"*_sortStringHtml";b:0;s:10:"*_mapper";N;}
i:1;O:27:"WpProQuiz_Model_AnswerTypes":7:{s:10:"*_answer";s:11:"Hugo Lloris";s:8:"*_html";b:0;s:10:"*_points";i:1;s:11:"*_correct";b:0;s:14:"*_sortString";s:0:"";s:18:"*_sortStringHtml";b:0;s:10:"*_mapper";N;}
i:2;O:27:"WpProQuiz_Model_AnswerTypes":7:{s:10:"*_answer";s:14:"Moussa Dembele";s:8:"*_html";b:0;s:10:"*_points";i:1;s:11:"*_correct";b:0;s:14:"*_sortString";s:0:"";s:18:"*_sortStringHtml";b:0;s:10:"*_mapper";N;}
i:3;O:27:"WpProQuiz_Model_AnswerTypes":7:{s:10:"*_answer";s:14:"Jan Vertonghen";s:8:"*_html";b:0;s:10:"*_points";i:1;s:11:"*_correct";b:1;s:14:"*_sortString";s:0:"";s:18:"*_sortStringHtml";b:0;s:10:"*_mapper";N;}
}
While looking at the structure of the table, I see that the data type of answer_data is longtext, but it has utf8_general_ci alongside it. I don't know what that means.
From this data I want to pull the quiz answers, i.e. Kieran Trippier, Hugo Lloris, Moussa Dembele and Jan Vertonghen.
Any help or hint will be very much appreciated.
EDIT 1:
Array (
[0] => WpProQuiz_Model_AnswerTypes Object ( [_answer:protected] => Kieran Trippier [_html:protected] => [_points:protected] => 1 [_correct:protected] => [_sortString:protected] => [_sortStringHtml:protected] => [_mapper:protected] => )
[1] => WpProQuiz_Model_AnswerTypes Object ( [_answer:protected] => Hugo Lloris [_html:protected] => [_points:protected] => 1 [_correct:protected] => [_sortString:protected] => [_sortStringHtml:protected] => [_mapper:protected] => )
[2] => WpProQuiz_Model_AnswerTypes Object ( [_answer:protected] => Moussa Dembele [_html:protected] => [_points:protected] => 1 [_correct:protected] => [_sortString:protected] => [_sortStringHtml:protected] => [_mapper:protected] => )
[3] => WpProQuiz_Model_AnswerTypes Object ( [_answer:protected] => Jan Vertonghen [_html:protected] => [_points:protected] => 1 [_correct:protected] => 1 [_sortString:protected] => [_sortStringHtml:protected] => [_mapper:protected] => ) )
How do I get the values from this array?
That is some form of serialization. It appears to be an array of 4 elements, each of which has further structure. The plugin understands it; you probably don't need to.
utf8_general_ci is the "Collation" used for comparing the LONGTEXT strings. It implies CHARACTER SET utf8, MySQL's 3-byte subset of UTF-8 (unlike utf8mb4, which covers the full 4-byte range). This still lets you store characters from most languages around the world.
One would hope that the plugin provides a way to dissect this structure, rather than leaving you guessing. Furthermore, the plugin could change the structure without notice.
Here's the hint: See PHP's serialize() and unserialize().
Using the unserialized result:
$foo seems to be an array of 'objects' of class WpProQuiz_Model_AnswerTypes. One of the 'properties' of that object seems to be $_answer. So, see if this gives you the list of answers:
foreach($foo as $obj) {
echo $obj->_answer, "\n";
}
Or, to grab the answers into an array $answers:
$answers = array();
foreach($foo as $obj) {
$answers[] = $obj->_answer;
}
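Putting the hint together, a rough end-to-end sketch might look like the following. The $answer_data variable standing in for the raw LONGTEXT value, and the WpProQuiz_Model_AnswerTypes class being loaded (e.g. when running inside WordPress with the plugin active), are assumptions. Since the properties are protected, the object is cast to an array, which exposes each protected key with a "\0*\0" prefix.
// $answer_data is assumed to hold the raw serialized LONGTEXT shown above.
$foo = unserialize($answer_data);

$answers = array();
foreach ($foo as $obj) {
    // Protected properties show up in an array cast with a "\0*\0" prefix.
    $props = (array) $obj;
    $answers[] = trim($props["\0*\0_answer"]);
}

print_r($answers); // Kieran Trippier, Hugo Lloris, Moussa Dembele, Jan Vertonghen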
I'm new to databases and to DBIx::Class, so please forgive me if this is a total newbie fault.
I just followed a tutorial and then tried to deploy the schema to my database. According to the tutorial I split the modules up into several files. After I ran createTable.pl, 'mysqlshow bla' shows me an empty database.
The database is up and running. Creating a table via a plain MySQL CREATE TABLE statement does work.
Script file which should create a table according to the schema: ../createTable.pl
#!/usr/bin/env perl
use Modern::Perl;
use MyDatabase::Main;
my ($database, $user) = ('bla', 'flo');
my $schema = MyDatabase::Main->connect("dbi:mysql:dbname=$database", "$user");
$schema->deploy( { auto_drop_tables => 1 } );
Main.pm for loading the namespaces ../MyDatabase/Main.pm
package MyDatabase::Main;
use base qw/ DBIx::Class::Schema /;
__PACKAGE__->load_namespaces();
1;
Schema file for the table ../MyDatabase/Result/Album.pm
package MyDatabase::Main::Result::Album;
use base qw/ DBIx::Class::Core /;
__PACKAGE__->load_components(qw/ Ordered /);
__PACKAGE__->position_column('rank');
__PACKAGE__->table('album');
__PACKAGE__->add_columns(albumid =>
{ accessor => 'album',
data_type => 'integer',
size => 16,
is_nullable => 0,
is_auto_increment => 1,
},
artist =>
{ data_type => 'integer',
size => 16,
is_nullable => 0,
},
title =>
{ data_type => 'varchar',
size => 256,
is_nullable => 0,
},
rank =>
{ data_type => 'integer',
size => 16,
is_nullable => 0,
default_value => 0,
}
);
__PACKAGE__->set_primary_key('albumid');
1;
I already spent some hours finding help through Google, but there isn't much related to the deploy() method.
Can anyone explain to me what my mistake is?
Thank you
You can find the documentation for all CPAN Perl modules on metacpan.org (newer, full-text indexed) and search.cpan.org.
Read the docs for DBI; you'll find an environment variable called DBI_TRACE that, when set, will print every SQL statement (to STDERR by default).
DBIx::Class has a similar one called DBIC_TRACE.
The first one should help you to see what the deploy method is doing.
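For example, running the script from the question with either variable set should show what SQL, if any, is actually being sent:
DBI_TRACE=1 perl createTable.pl
DBIC_TRACE=1 perl createTable.pl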
Is no password required for connecting to your database?
OK, today I played again with Perl and database stuff, and I found out what the mistake was.
First of all I started with DBI_TRACE and DBIC_TRACE. They produced a lot of messages, but nothing I could work with; it seemed like nothing gave me a hint about the problem.
Then I searched Google for a while about this problem and for more examples of the deploy method. At some point I noticed that my folder structure was wrong.
The Schema file for the table should be placed in
../MyDatabase/Main/Result/Album.pm
instead of being placed in
../MyDatabase/Result/Album.pm
After moving the Schema file to the correct folder everything worked well.
Shame on me for this mistake :( But thank you for your help
I have made a site that uses custom post types for a projects section.
I need to change the post type from 'projects' to 'galleries', but as I have already uploaded a bunch of projects, I was wondering how I would do this with as little hassle as possible (I do not want to have to re-upload all the images, text, etc.).
I found a few articles that tell me to run an SQL query to rename the posts:
UPDATE `wp_posts`
SET `post_type` = '<new post type name>'
WHERE `post_type` = '<old post type name>';
And this one for the taxonomy:
UPDATE `wp_term_taxonomy`
SET `taxonomy` = '<new taxonomy name>'
WHERE `taxonomy` = '<old taxonomy name>';
I just have no idea what I am supposed to do with this code. If it is SQL, do I run it in a PHP file, or is there some sort of 'terminal' that can be found in the WP dashboard or cPanel of my site?
Below is how I created my post type (Not sure if this helps)
function create_my_post_types() {
//projects
register_post_type( 'Projects', array(
    'label' => 'Projects',
    'description' => '',
    'public' => true,
    'show_ui' => true,
    'show_in_menu' => true,
    'menu_position' => 8,
    'capability_type' => 'post',
    'hierarchical' => false,
    'rewrite' => array( 'slug' => '', 'with_front' => '0' ),
    'query_var' => true,
    'exclude_from_search' => false,
    'supports' => array( 'title', 'editor', 'thumbnail' ),
    'taxonomies' => array( 'category' ),
    'labels' => array(
        'name' => 'Projects',
        'singular_name' => 'Project',
        'menu_name' => 'Projects',
        'add_new' => 'Add New Project',
        'add_new_item' => 'Add New Project',
        'edit' => 'Edit',
        'edit_item' => 'Edit Project',
        'new_item' => 'New Project',
        'view' => 'View Project',
        'view_item' => 'View Project',
        'search_items' => 'Search Projects',
        'not_found' => 'No Projects Found',
        'not_found_in_trash' => 'No Projects Found in Trash',
        'parent' => 'Parent Projects',
    ),
) );
} // end create_my_post_types
If you have cPanel access, you can look for phpMyAdmin and run the SQL there.
Go to phpMyAdmin.
Select your WordPress database from the left.
RECOMMENDED: Back up your database first, by going to the Export tab at the top and doing a quick export.
Select "SQL" from the top tabs.
Paste your SQL queries into the large textarea, and click Go.
Hope it works!
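For this particular conversion, the query pasted in would presumably look something like the following. That assumes the stored post_type value is the lowercase 'projects', so it's worth checking the wp_posts table first, since the register_post_type call above uses 'Projects'.
UPDATE `wp_posts`
SET `post_type` = 'galleries'
WHERE `post_type` = 'projects';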
It's better to go directly with a plugin:
Convert Post Types
This is a utility for converting lots of posts or pages to a custom post type (or vice versa). You can limit the conversion to posts in a single category or children of specific page. You can also assign new taxonomy terms, which will be added to the posts' existing terms.
The whole conversion process happens in the function bulk_convert_posts(), using the core functions wp_update_post and wp_set_post_terms. IMO, you should use WordPress functions to do the conversion; there are quite a few steps happening in the terms function before the MySQL command.
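If you would rather script it yourself with those same core functions, a minimal sketch for converting a single post might look like this (the post ID and term are placeholders, not the plugin's actual code):
// Convert one post to the 'galleries' post type using core functions.
$post_id = 123; // placeholder ID

wp_update_post( array(
    'ID'        => $post_id,
    'post_type' => 'galleries',
) );

// Optionally append a taxonomy term to the converted post.
wp_set_post_terms( $post_id, array( 'some-term' ), 'category', true );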
Do a database backup before proceeding with this kind of operation.
Is anyone aware of any tutorials that demonstrate how to import data in a Ruby app with FasterCSV and saving it to a SQLite or MySQL database?
Here are the specific steps involved:
Reading a file line by line (the .foreach method does this according to the documentation)
Mapping header names in file to database column names
Creating entries in database for CSV data (seems doable with .new and .save within a .foreach block)
This is a basic usage scenario but I haven't been able to find any tutorials for it, so any resources would be helpful.
Thanks!
It looks like FasterCSV is part of the Ruby core as of Ruby 1.9, so this is what I ended up doing to achieve the goals in my question above:
@importedfile = Import.find(params[:id])
filename = @importedfile.csv.path
CSV.foreach(filename, {:headers => true}) do |row|
  @post = Post.find_or_create_by_email(
    :content => row[0],
    :name => row[1],
    :blog_url => row[2],
    :email => row[3]
  )
end
flash[:notice] = "New posts were successfully processed."
redirect_to posts_path
Inside the find_or_create_by_email function is the mapping from the database columns to the columns of the CSV file: row[0], row[1], row[2], row[3].
Since it is a find_or_create function I don't need to explicitly call @post.save to save the entry to the database.
If there's a better way please update or add your own answer.
First, start with other Stack Overflow answers: Best way to read CSV in Ruby. FasterCSV?
Before jumping into writing the code, I check whether there is an existing tool to do the import. You might want to look at mysqlimport.
This is a simple example showing how to map the CSV headers to a database's columns:
require "csv"
data = <<EOT
header1, header2, header 3
1, 2, 3
2, 2, 3
3, 2, 3
EOT
header_to_table_columns = {
'header1' => 'col1',
'header2' => 'col2',
'header 3' => 'col3'
}
arr_of_arrs = CSV.parse(data)
headers = arr_of_arrs.shift.map{ |i| i.strip }
db_cols = header_to_table_columns.values_at(*headers)
arr_of_arrs.each do |ary|
# insert into the database using an ORM or by creating insert statements
end
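Inside that loop, one way to turn each row into a column-to-value hash for whatever insert mechanism you end up using (just a sketch, not tied to a particular database library):
arr_of_arrs.each do |ary|
  values = ary.map { |v| v.strip }
  row = Hash[db_cols.zip(values)]  # e.g. {"col1"=>"1", "col2"=>"2", "col3"=>"3"}
  # SomeModel.create(row)          # with ActiveRecord, or build an INSERT statement by hand
end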
Ruby is great for rolling your own import routines.
Reading a file (handy block structure to ensure that the file handle is closed properly):
File.open( filepath ) do |f|
  f.each_line do |line|
    # do something with the line...
  end
end
Mapping header names to columns (you might want to check for matching array lengths):
Hash[header_array.zip( line_array )]
Creating entries in the database using activerecord:
SomeModel.create( Hash[header_array.zip( line_array )] )
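Put together, those pieces might look roughly like this; SomeModel and filepath are placeholders, and the naive split on commas will not handle quoted fields, so swap in CSV parsing if your data needs it.
header_array = nil

File.open( filepath ) do |f|
  f.each_line do |line|
    fields = line.chomp.split(",").map(&:strip)
    if header_array.nil?
      header_array = fields  # first line holds the column names
    else
      SomeModel.create( Hash[header_array.zip(fields)] )
    end
  end
end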
It sounds like you are planning to let users upload CSV files and import them into the database. This is asking for trouble unless they are savvy about data. You might want to look into a NoSQL solution to simplify things on the import front.
This seems to be the shortest way, if you can use the ID to identify the records and if no mapping of column names is necessary:
CSV.foreach(filename, {:headers => true}) do |row|
post = Post.find_or_create_by_id row["id"]
post.update_attributes row.to_hash
end