Having issues with CSV files with columns that should simply be ignored, irregular formatting - csv

We have a series of data files, each a rectangle of data that someone exports from Excel. Sometimes the export includes extra columns that are entirely blank: 1, 2, or even 15 of them. We don't want those columns or their content, which is nominally empty.
I.e. (pseudocode)
Our object model
class object "Address" {
Name as string
Street as string
City as string
State as string
Zip as string}
Name, Street, City, State, Zip,,,,,
Bob, Windsor, Chicago, IL, 12342,,,,,
Tom, Second, St Louis, MO, 45122,,,,,
Steve, Main, Nashville, TN, 12124,,,,,
,,,,,,,,,
,,,,,,,,,
,,,,,,,,,
We want the 3 rows of five columns of data. We can add new items to the class for the four "unnecessary" columns, but that doesn't work when the number of those erroneous columns is variable or unknown. When we ignore errors and continue, we get no data. When we leave errors turned on, it complains that the class has no members for those columns.
etc.
We are expecting it to only read columns as we defined in our object model and ignore any extraneous columns

Put an array field at the end of the class and decorate it with [FieldOptional].
Here is a working program:
[DelimitedRecord(",")]
public class Contact
{
[FieldTrim(TrimMode.Both)]
public string Name { get; set; }
[FieldTrim(TrimMode.Both)]
public string Street { get; set; }
[FieldTrim(TrimMode.Both)]
public string City { get; set; }
[FieldTrim(TrimMode.Both)]
public string State { get; set; }
[FieldTrim(TrimMode.Both)]
public string Zip { get; set; }
[FieldOptional]
public string[] Ignore { get; set; }
}
internal class Program
{
static void Main(string[] args)
{
var engine = new DelimitedFileEngine<Contact>();
var records = engine.ReadString("Bob, Windsor, Chicago, IL, 12342,,,,,");
Assert.AreEqual("Bob", records[0].Name);
Assert.AreEqual("Windsor", records[0].Street);
Assert.AreEqual("Chicago", records[0].City);
Assert.AreEqual("IL", records[0].State);
Assert.AreEqual("12342", records[0].Zip);
Console.WriteLine("All OK");
Console.ReadKey();
}
}
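If adding a library is not an option, the same idea (keep the first five fields, drop whatever follows, and skip rows that are entirely empty) can be sketched with plain string splitting. This is only a minimal sketch, not FileHelpers code: `AddressParser` and `Parse` are hypothetical names, and it assumes that no field contains a quoted comma.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string Name { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

public static class AddressParser
{
    // Hypothetical helper: assumes plain comma-separated text without quoted fields.
    public static List<Address> Parse(IEnumerable<string> lines, bool hasHeader = true)
    {
        var result = new List<Address>();
        foreach (var line in lines.Skip(hasHeader ? 1 : 0))
        {
            var fields = line.Split(',').Select(f => f.Trim()).ToArray();

            // Skip rows such as ",,,,,,,,," where every field is empty.
            if (fields.All(f => f.Length == 0)) continue;

            // Skip rows that are too short to fill the model.
            if (fields.Length < 5) continue;

            // Only the first five fields are used; trailing columns are ignored.
            result.Add(new Address
            {
                Name = fields[0],
                Street = fields[1],
                City = fields[2],
                State = fields[3],
                Zip = fields[4]
            });
        }
        return result;
    }
}
```

Calling `AddressParser.Parse(File.ReadLines(path))` on the sample file would yield the three data rows, however many blank columns trail each line.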

Related

Using json key to store value, is it a good approach?

I am writing a rest api and I am quite new to the json serialization.
I know that a json object consists of pairs of <key>:<value>.
I have an object "channels" which contains multiple channel objects which consist of an id and some other attributes like "x", "y" and "z".
In our team we found two ways to represent the object "channels", the usual way that I see people implement is like this:
{
"channels":
[
{
"id":0,
"x":0,
"y":0,
"z":0
},
...
]
}
There is also this version, which uses the id as key:
{
"channels":
{
"0":
{
"x":0,
"y":0,
"z":0
},
...
}
}
Please note that the first implementation explicitly uses an array while the second relies on the <key> to access the specific channel directly.
What is the best way to represent this object? Is it ok to represent a key as value (like the id of the previous case)?
Right now there are only two channels (always with id 0 and 1), but in the future we may add more.
You should prefer the first approach, because it is much easier and more intuitive to consume the JSON that way. If someone wanted to use your API, they would likely create model classes to deserialize into. With the first approach this is easy:
public class RootObject
{
public List<Channel> channels { get; set; }
}
public class Channel
{
public int id { get; set; }
public int x { get; set; }
public int y { get; set; }
public int z { get; set; }
}
In fact, you can just take the JSON and dump it into a generator tool like http://json2csharp.com/ to get these classes (that is what I did here).
In contrast, with the second approach, the keys in the JSON representing the IDs are dynamic, which a generator won't recognize as such. So you'll get something like this, which needs to be manually corrected:
public class RootObject
{
public Channels channels { get; set; }
}
public class Channels
{
public __invalid_type__0 __invalid_name__0 { get; set; }
}
public class __invalid_type__0
{
public int x { get; set; }
public int y { get; set; }
public int z { get; set; }
}
I've seen some people try to fix it like this, which will work for your one-channel example, but obviously won't scale:
public class RootObject
{
public Channels channels { get; set; }
}
public class Channels
{
[JsonProperty("0")]
public Data Item0 { get; set; }
}
public class Data
{
public int x { get; set; }
public int y { get; set; }
public int z { get; set; }
}
To consume the JSON properly with the dynamic keys, the classes actually need to look like this:
public class RootObject
{
public Dictionary<string, Channel> channels { get; set; }
}
public class Channel
{
public int x { get; set; }
public int y { get; set; }
public int z { get; set; }
}
However, the fact that you need to use a Dictionary here is not always intuitive to the casual user. In fact, I have lost count of the number of times some flavor of the question, "How can I handle dynamic keys in JSON?" is asked on StackOverflow. Do your users a favor and don't make them have to think about it.
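For completeness, reading the keyed JSON shape with the Dictionary-based classes could look like the sketch below. It assumes Json.NET (the Newtonsoft.Json package) is referenced; `ChannelReader` is a hypothetical helper name, and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class Channel
{
    public int x { get; set; }
    public int y { get; set; }
    public int z { get; set; }
}

public class RootObject
{
    // The dynamic JSON keys ("0", "1", ...) become the dictionary keys.
    public Dictionary<string, Channel> channels { get; set; }
}

public static class ChannelReader
{
    // Hypothetical helper wrapping the Json.NET call.
    public static RootObject Parse(string json)
    {
        return JsonConvert.DeserializeObject<RootObject>(json);
    }
}
```

A caller then has to know that the id lives in the key, for example `foreach (var pair in root.channels) { int id = int.Parse(pair.Key); ... }`, which is exactly the extra step the array form avoids.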
Over and above just deserializing the JSON, the first model is also superior because the Channel object contains all the data about the channel: the id is inside the object itself. It is easy to pass around and use that way. Also it is trivial to convert a List<Channel> into a Dictionary<int, Channel> later if you need to do a key lookup:
var dict = rootObject.channels.ToDictionary(ch => ch.id);
With the second approach, the id is separate from the rest of the channel data, so if you wanted to pass the channel to a method which needed both, you would either have to pass two parameters or create a new model class to wrap everything together. In other words, it is more awkward to use.
Bottom line, I see no real upside to using the second approach at all. Go with the first.
If a channel is an object rather than an array, use the first option, because it defines a channel-specific contract. If a channel can have subsets of channels, then I suggest the second approach, because the key (which has to be unique for this to work) gives you direct access to a specific subset.

C# JSON data serialized and bound to DataGridView

I have this data class for storing data parsed from JSON formatted web data (using Json.NET library):
[Serializable()]
public class MovieData
{
public string FilePath { get; set; }
public string OrigName { get; set; }
[JsonProperty(PropertyName = "id")]
public int Id { get; set; }
[JsonProperty(PropertyName = "year")]
public int Year { get; set; }
[JsonProperty(PropertyName = "genres")]
public string[] Genres { get; set; }
}
The next class makes it possible to serialize a collection of MovieData objects:
[Serializable()]
[XmlRoot("MovieCollection")]
public class MovieCollection
{
[XmlArray("Movies")]
[XmlArrayItem("Movie", typeof(MovieData))]
public List<MovieData> movies = new List<MovieData>();
}
Finally, I need to bind such a collection of MovieData to DataGridView (or single MovieData object to DataGridViewRow), like:
dgvMovies.DataSource = movieCollection.movies;
Is it possible to bind it without hard-coding the DataGridViewColumn collection beforehand? Native data types are not a problem; the problem is the string[] Genres array, which I need to format in the DataGridView in some way, like:
"genres[0] / genres[1] / ... genres[n]"
At the moment, when I simply set DataSource to the collection, this array is ignored (it is not displayed at all).
In the MovieData class, you can add the following property:
public string GenresAsString
{
get { return String.Join("/", Genres); }
set { Genres = value.Split('/'); }
}
You will surely have to improve the setter to make it more resilient (trimming, removing empty genres) if you plan to let the user modify this value.
Otherwise you can remove the setter.
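A more defensive version of that property might look like the sketch below. It is only one possible hardening, not the answer's code: it assumes " / " as the display separator (as in the question), tolerates a null Genres array, trims each entry, and drops empty ones.

```csharp
using System;
using System.Linq;

public class MovieData
{
    public string[] Genres { get; set; }

    public string GenresAsString
    {
        // A null array is shown as an empty cell instead of throwing.
        get { return Genres == null ? string.Empty : string.Join(" / ", Genres); }
        set
        {
            Genres = (value ?? string.Empty)
                .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(g => g.Trim())        // remove padding around each genre
                .Where(g => g.Length > 0)     // drop entries that were only whitespace
                .ToArray();
        }
    }
}
```

Because GenresAsString is an ordinary string property, an auto-generated column can display it where the string[] property was skipped.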

Issue with MVC 3 model binder for property which is a list of inherited objects

The issue is very similar to this post
How to implement custom JsonConverter in JSON.NET to deserialize a List of base class objects?
However instead of trying to serialize a string manually we are attempting to use the model binding in MVC 3. So here is the scenario
[DataContract]
public class Company
{
[DataMember]
public List<Person> Employees { get; set; }
}
[DataContract]
public class Person
{
[DataMember]
public string FirstName { get; set; }
[DataMember]
public string LastName { get; set; }
}
[DataContract]
[KnownType(typeof(Person))]
public class Employee : Person
{
[DataMember]
public string Department { get; set; }
[DataMember]
public string JobTitle { get; set; }
}
[DataContract]
[KnownType(typeof(Person))]
public class Artist : Person
{
[DataMember]
public string Skill { get; set; }
}
public JsonResult PopulateCompany()
{
Company model = new Company();
model.Employees = new List<Person>
{
new Employee(),
new Employee(),
new Artist(),
};
return Json(model, JsonRequestBehavior.AllowGet);
// in the View the model is correctly deserialized. E.g. we can see the properties from Artist
}
public ActionResult PopulateCompany(Company model)
{
// the returned model is also being populated except the Person object is being added to the Employees and we can no longer access the properties of Artist.
return View(model);
}
Thank you.
The model binding process involves first initializing the model. In your case it initializes an instance of Company with a property List<Person> Employees. Based on the values that are posted back, if a key/value pair is found that matches a Person (e.g. Employees[0].FirstName: "Ian") then a new instance of Person is initialized, its properties are set, and it is added to the collection.
The DefaultModelBinder has no way of knowing that you want to initialize a different concrete type.
The easy solution is to use a view model containing a collection property for each type (e.g. public List<Employee> Employees { get; set; }; public List<Artist> Artists { get; set; }; etc).
The alternative (difficult) solution is to create a custom ModelBinder that will generate concrete types based on values in the model. This article (the section on Abstract Model Binder) is a good start for learning how to create a custom ModelBinder
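The view-model route (the easy solution) could be sketched as follows. `CompanyViewModel` and `ToCompany` are hypothetical names introduced here; the domain classes are repeated without their DataContract attributes only to keep the sketch self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Employee : Person
{
    public string Department { get; set; }
    public string JobTitle { get; set; }
}

public class Artist : Person
{
    public string Skill { get; set; }
}

public class Company
{
    public List<Person> Employees { get; set; }
}

// Hypothetical view model: one strongly typed list per concrete type, so the
// default binder always knows which class to instantiate for each item.
public class CompanyViewModel
{
    public List<Employee> Employees { get; set; }
    public List<Artist> Artists { get; set; }

    // Merge the typed lists back into the domain model after binding.
    public Company ToCompany()
    {
        var people = new List<Person>();
        people.AddRange((Employees ?? new List<Employee>()).Cast<Person>());
        people.AddRange((Artists ?? new List<Artist>()).Cast<Person>());
        return new Company { Employees = people };
    }
}
```

The POST action would then accept a `CompanyViewModel` instead of a `Company` and call `ToCompany()` once binding has finished.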

How to map different navigation properties in TPH types to the same table?

I have this existing database schema, which implies a self-referencing many-to-many relationship using a join table. The Location table may contain Country, City, District, or Area information, depending on the Discriminator field. The table RelatedLocation holds the self-referencing relations.
My domain model is as follows, where the Location class is abstract and each inherited class contains the related navigation properties.
public abstract class Location
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Country : Location
{
public virtual ICollection<District> Districts { get; set; }
}
public class District : Location
{
public virtual ICollection<Country> Countries { get; set; }
public virtual ICollection<City> Cities { get; set; }
}
public class City : Location
{
public virtual ICollection<District> Districts { get; set; }
public virtual ICollection<Area> Areas { get; set; }
}
public class Area : Location
{
public virtual ICollection<City> Cities { get; set; }
}
In OnModelCreating I use the following to map the many-to-many relation of each inherited class:
modelBuilder.Entity<Country>()
.HasMany(c => c.Districts)
.WithMany(d => d.Countries)
.Map(t => t.ToTable("RelatedLocations").MapLeftKey("ParentId").MapRightKey("RelatedId"));
modelBuilder.Entity<City>()
.HasMany(c => c.Districts)
.WithMany(d => d.Cities)
.Map(t => t.ToTable("RelatedLocations").MapLeftKey("ParentId").MapRightKey("RelatedId"));
Upon creating the model I receive an exception with the message "Each EntitySet must refer to a unique schema and table"; that is, EF complains about mapping different relations to the same table "RelatedLocations" more than once.
I do not know whether mapping this way is unsupported in EF 4.1 or whether I am mapping it the wrong way!
I doubt that the mapping you are trying is possible. I would try something similar to this:
public abstract class Location
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Location> ParentLocations { get; set; }
public virtual ICollection<Location> RelatedLocations { get; set; }
}
public class Country : Location
{
// readonly = not mapped
public IEnumerable<District> Districts
{
get { return RelatedLocations.OfType<District>(); }
}
}
public class District : Location
{
public IEnumerable<Country> Countries
{
get { return ParentLocations.OfType<Country>(); }
}
public IEnumerable<City> Cities
{
get { return RelatedLocations.OfType<City>(); }
}
}
// same approach for the other collections
And then this mapping:
modelBuilder.Entity<Location>()
.HasMany(l => l.ParentLocations)
.WithMany(l => l.RelatedLocations)
.Map(t => t.ToTable("RelatedLocations")
.MapLeftKey("ParentId")
.MapRightKey("RelatedId"));
The many-to-many mapping always goes between ParentLocations and RelatedLocations, but these collections are populated with different instances of the derived classes according to the concrete type you are working with. The readonly collections are only helpers which perform a type cast in memory (based on the lazily loaded ParentLocations and RelatedLocations) of the Location entities.
Edit
Perhaps instead of using .OfType<T>() which filters all objects of type T from the source collection, .Cast<T>() is preferable which tries to cast all objects in the source collection to type T and throws an exception if casting is not possible. It should basically lead to the same result because ICollection<Location> in your base class should always only be populated by the same derived type. For example: Country.RelatedLocations should only contain entities of type District. But maybe the exception is good in this case because it indicates that something is wrong instead of silently ignoring the entities of another type in the collections (which OfType would do).
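The difference between the two operators can be seen on a plain in-memory collection. The sketch below uses stripped-down stand-ins for the entity classes and a hypothetical `CastVersusOfType` helper; no EF is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Location
{
    public string Name { get; set; }
}

public class District : Location { }
public class City : Location { }

public static class CastVersusOfType
{
    // OfType silently skips elements that are not a District.
    public static int CountWithOfType(IEnumerable<Location> source)
    {
        return source.OfType<District>().Count();
    }

    // Cast is lazy, so ToList() forces the enumeration; the first element
    // that is not a District raises InvalidCastException.
    public static bool CastThrows(IEnumerable<Location> source)
    {
        try
        {
            source.Cast<District>().ToList();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

For a list holding two District objects and one City, `CountWithOfType` returns 2 while `CastThrows` returns true, which is the "fail loudly" behaviour argued for here.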
Edit 2
I want to emphasize that the IEnumerable collections are helpers which allow you to retrieve the entities with the derived type. The collections just perform a type cast, nothing more. They have nothing to do with the mapping to the database; EF doesn't even "see" that they exist. You can remove them and nothing would change in the EF model and the database table columns, relationships and referential constraints.
How would you add and retrieve entities in this model? Examples:
Add a new Country with a list of District entities:
var country = new Country() { RelatedLocations = new List<Location>() };
country.Name = "Palau";
// ParentLocations stays empty because Country has no parents
var district1 = new District { Name = "District1" };
var district2 = new District { Name = "District2" };
country.RelatedLocations.Add(district1); // because District is a Location
country.RelatedLocations.Add(district2);
context.Locations.Add(country); // because Country is a Location
context.SaveChanges();
Retrieve this entity again:
var country = context.Locations.OfType<Country>()
.SingleOrDefault(c => c.Name == "Palau");
// now get the districts, RelatedLocations is lazily loaded
var districts = country.RelatedLocations.Cast<District>();
// What type is districts? It's an IEnumerable<District>.
// So we can also use a helper property:
// var districts = country.Districts;
Retrieve a district:
var district = context.Locations.OfType<District>()
.SingleOrDefault(d => d.Name == "District1");
var countries = district.ParentLocations.Cast<Country>();
// or with the helper: var countries = district.Countries;
// countries collection contains Palau, because of many-to-many relation
Edit 3
Instead of creating a Country with new you can create a lazy loading proxy. Then you don't need to initialize the RelatedLocations collection. I was wondering how this could work with a derived type, but I just found that there is an overload of Create with a generic parameter for this purpose:
var country = context.Locations.Create<Country>();

EF - Saving a new entity with linked existing entities creating duplicates

public class Car {
public string SomeProperty { get; set; }
public Manufacturer Manufacturer { get; set; }
public IList<Color> Colors { get; set; }
}
public class Manufacturer {
public int Id { get; set; }
public string Name { get; set; }
}
public class Color {
public int Id { get; set; }
public string Name { get; set; }
}
I already have tables full of Colors and Manufacturers. When I create a new car I want to be able to assign it a Color and Manufacturer bound from .net MVC.
When I save my new car with
context.Cars.Add(car);
A new Car is created (great) but a new Color and Manufacturer are also created even though these objects already had an Id and Name set that matched the content in the database.
The two solutions I see are to either write a custom save method for car, and tell the context that the Manufacturer and Color are Unchanged.
context.Cars.Add(car);
context.Entry(car.Manufacturer).State = EntityState.Unchanged;
foreach (Color color in car.Colors)
context.Entry(color).State = EntityState.Unchanged;
Alternatively, to load the Manufacturer and Color from EF and then link them to the Car, instead of using the MVC bound objects.
car.Manufacturer = carRepository.GetManufacturer(car.Manufacturer.Id);
car.Colors = carRepository.GetColorsById(car.Colors);
I am not thrilled by either solution as this example is very trivial, but my real cases are far more complicated. I don't really want to have to fiddle around with EF in detail for each object I save. I have lots of complex object graphs to save and this seems very error prone.
Is there a way of making EF behave more like NHibernate, whereby you can give it something with an ID already assigned and it will assume without your intervention that it already exists?
Edit - question clarified to show collection of existing entities as well as many-to-one relationships.
Unfortunately, EF does not have anything like session.Load in NHibernate that allows you to get a proxy from an id.
The usual way to deal with this in EF is to create a separate FK field containing the scalar value that corresponds to the reference. For example:
public virtual Manufacturer Manufacturer { get; set; }
public int ManufacturerId { get; set; }
Then you only have to set ManufacturerId and it will be saved correctly.
(So much for "POCO" and "code first". Pffffff)
You can define scalar properties in your entities and bind the values to them instead. E.g. add ManufacturerId and ColorId:
public class Car {
public string SomeProperty { get; set; }
public int? ManufacturerId { get; set; }
public virtual Manufacturer Manufacturer { get; set; }
public int? ColorId { get; set; }
public virtual Color Color { get; set; }
}
Then set those scalar properties when you assign values (e.g. through a DropDownList).
This way you can avoid loading many related entities to populate the entity.