Formatting data for use with JSON and D3.js - mysql

I have the following data in my MySQL database. These three columns are a subset of a table that I have selected using a query.
Value Date Time
230.8 13/08/08 15:01:22+22
233.7 13/08/08 15:13:12+22
234.5 13/08/08 15:40:33+22
I want to represent this data on a graph of Value versus Date & Time in chronological order. What format do I need to put the above data into before using JSON? I've had a look at a few tutorials, and when I apply the same logic (like this one: http://www.d3noob.org/2013/02/using-mysql-database-as-source-of-data.html) I don't seem to get any graph at all.
Or will JSON and D3.js not work for my requirement? Do I need to look at something else, like some other JavaScript library?

Your question is a little bit vague, but I'll try to address a few of your topics to help you get started.
Firstly, I would suggest finding the visualization that fits your needs. From the data subset that you showed in the question, I would suggest maybe this one. It is interesting because if you have multiple values for different times in a given day, you could construct various time series graphs and compare them interactively. There are other options, so you should explore and find a good starting point to improve and adapt to your needs.
Regarding the origin/format of the data: if you are able to extract the data you showed into a variable (with PHP, for example), you can then manipulate it and build a structure from it. It doesn't necessarily have to be JSON and/or CSV, as long as you can handle it with d3.js's API functions. It isn't very difficult, but it does require some reading on the topic. First understand how to query for what you need in MySQL; then, if you decide to go with JSON, I would suggest starting here.
The example visualization I mentioned above uses a CSV file as its data source. Another option would be to build a CSV file (or a data structure, i.e. an array) to feed into d3.js. There are various questions covering "how to create CSV with PHP", so you shouldn't have much difficulty finding the info you need.
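For illustration, here is a minimal Python sketch of that idea; the connection details and the table/column names (a readings table with value, date and time columns) are hypothetical, and the same logic carries over to PHP.

import json
import mysql.connector  # assumes the mysql-connector-python package is installed

# Hypothetical connection details -- adjust to your own database.
conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="mydb")
cursor = conn.cursor()
cursor.execute("SELECT value, date, time FROM readings ORDER BY date, time")

# One object per row; d3.json() on the client can consume this structure directly.
rows = [{"value": float(value), "date": str(date), "time": str(time)}
        for (value, date, time) in cursor.fetchall()]
print(json.dumps(rows))  # serve this string as the HTTP response body

cursor.close()
conn.close()

On the d3 side you would then typically parse the combined date/time string into a JavaScript Date (for example with a d3 time format parser) before plotting.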
Either way, once you feel comfortable with what you know about these topics, start breaking your problem into smaller tasks and finding answers to one question at a time. If you need to, post more questions here on SO and include your attempts at coding a solution; this will definitely get you all the help you might need.

In Python it would look like this:
import json

# One entry per data point, keeping value, date and time together.
output = json.dumps(['data',
                     {'data_1': ('230.8', '13/08/08', '15:01:22+22')},
                     {'data_2': ('233.7', '13/08/08', '15:13:12+22')},
                     {'data_3': ('234.5', '13/08/08', '15:40:33+22')}])
print(output)
More information about Python and JSON can be found here.

Related

Unable to import CSV table DATE columns to BigQuery

I am unable to import a CSV table's DATE columns to BigQuery.
The dates are not recognized, even though they have the correct YYYY-MM-DD format according to this documentation:
https://cloud.google.com/bigquery/docs/schema-detect
So the DATE columns are not recognized and are renamed to _2020_01_22, _2020_01_23, ...
Is the issue that the dates are in the first row, as column names?
But how can I then import the dates when I want to use them in time-series charts (Data Studio)?
Here is a sample of the source CSV:
Province/State,Country/Region,Lat,Long,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26
Anhui,China,31.8257,117.2264,1,9,15,39,60
Beijing,China,40.1824,116.4142,14,22,36,41,68
Chongqing,China,30.0572,107.874,6,9,27,57,75
If you have a finite number of day columns, you can unpivot the table when using it. See this blog post.
Otherwise, if you don't know how many day columns are in the CSV file, choose a unique character as the CSV delimiter and load the whole file into a single-column staging table, then use the SPLIT function; you'll also need UNNEST. This approach requires a full scan and will be more expensive, especially as the file gets bigger.
The issue is that column names cannot have a date type; for this reason, when the CSV is imported, BigQuery takes the dates and transforms them into the underscore format.
The first way to tackle the problem would be to modify the CSV file, because any import that uses the first row as a header will change the date format, and it will then be harder to get back to a date type. If you have any experience with any programming language, you can do the transformation very easily. I can help with this, but I don't know your use case, so maybe it is not possible. Where does this CSV come from?
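For example, here is a minimal Python sketch of that transformation using pandas, assuming the column layout from your sample; the file names are hypothetical.

import pandas as pd

# Read the wide CSV, where every column after the first four is a date.
wide = pd.read_csv("covid_wide.csv")
id_cols = ["Province/State", "Country/Region", "Lat", "Long"]

# Unpivot ("melt") the date columns into one (date, confirmed) pair per row.
long = wide.melt(id_vars=id_cols, var_name="date", value_name="confirmed")
long["date"] = pd.to_datetime(long["date"]).dt.date

# The long file has a proper DATE column that BigQuery and Data Studio handle well.
long.to_csv("covid_long.csv", index=False)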
If modifying the CSV beforehand is not possible, then the second option is what ktopcuoglu said: importing the whole file as one column and processing it using SQL functions. This is much harder than the first option, and since you import all the data into a single column, everything will have the same data type, which will be a headache too.
If you could explain where the CSV comes from, we may be able to influence it before it is ingested by BigQuery. Otherwise, you'll need to dig into SQL a bit.
Hope it helps!
Hi, now I can help you further.
First, I found some COVID datasets among the public BigQuery datasets. The one you are taking from GitHub is already in BigQuery, but there are many others that may work better for your task, such as the one called “covid19_ecdc” inside bigquery-public-data. That one has the confirmed cases and deaths per date and country, so it should be easy to build a time series.
Second, I found an interesting link doing what you described with Python and Data Studio. It's a Kaggle discussion, so you may not be familiar with it, but it is definitely worth a look. Moreover, the author is using the dataset you are trying to use.
Hope it helps. Do not hesitate to ask!

Pattern match to identify date format

My source has different date formats, as shown below, and I'm looking for an algorithm to identify the source date pattern. I have tried the Select Values and Fuzzy Match steps in Pentaho Data Integration.
Date Column (String)
"20150210"
"20050822--"
"2014-02-May"
"20051509--"
"02-May-2014"
"2013-May-12"
"12DEC2013"
"15050815"
"May-02-2014"
"12312015"
I know that in PDI we can achieve this through a JavaScript step by writing if conditions for each pattern, but that is not a good idea: the approach brings the transformation to a crawl when dealing with huge numbers of records. I'm looking for an efficient way to detect the date pattern.
I believe this is a very common issue in ETL projects, and I'm trying to understand how enterprise vendors like SAS Data Integration, Informatica, and SSIS provide an easy way to handle it.
Is there an algorithm to identify the source pattern? If so, which one?
The formats listed above are not an exhaustive list.
One cannot simply determine a single, unambiguous format for any given input.
Consider all of the following formats completely valid:
MM-dd-yy
dd-MM-yy
yy-MM-dd
As stated in a comment by @billinkc, what would you call 01-02-05 in that case?
If anything, your problem would only be solvable if you took a data set into account (e.g. you know that the next X rows are all in the same date format). Then you can look at it as a problem with constraints that can help you determine the date format. Even then, you can't guarantee a definite answer; you can only increase the probability of getting one.
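To make that concrete, here is a minimal Python sketch of the idea; the candidate list is hypothetical and deliberately incomplete.

from datetime import datetime

# A hypothetical, non-exhaustive set of candidate patterns to test against the column.
CANDIDATES = ["%Y%m%d", "%d-%b-%Y", "%Y-%b-%d", "%b-%d-%Y", "%d%b%Y", "%m%d%Y", "%d%m%Y"]

def plausible_formats(values):
    """Return every candidate format that parses all values in the sample."""
    surviving = []
    for fmt in CANDIDATES:
        try:
            for v in values:
                datetime.strptime(v, fmt)
            surviving.append(fmt)
        except ValueError:
            continue  # this format failed on at least one value, so discard it
    return surviving

# A sample of rows that share one format usually collapses the ambiguity:
print(plausible_formats(["20150210", "20141231"]))  # ['%Y%m%d']
# A single value can stay ambiguous, which is exactly the point made above:
print(plausible_formats(["01022015"]))              # ['%m%d%Y', '%d%m%Y']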

Is JSON a good solution for data transfer between client and server?

I am trying to understand why JSON is so widely used for data transfer between client and server. I understand that it offers a simple design which is easy to understand. However, consider the following:
A JSON string includes repeated data: e.g., in the case of a table, column names (keys) are repeated in each object. Would it not be wiser to send the columns as the first object, and the rest of the objects as just the data from the table (without the column/key information)?
Once we have a JSON object, searching based on keys is expensive (in time) compared to using indexes. Imagine a table with 20-30 columns: doing this search for each key in each object would cost a lot more time compared to accessing values directly by index.
There may be many more drawbacks and advantages; add them here if you know of any.
I think if you want data transfer then you want a table-based format. JSON is not a table-based format like standard databases or Excel. This can complicate analyzing the data if there is a problem, because someone will usually use Excel for that (sorting, filtering, formulas). Building test files will also be more difficult, because you can't simply export from Excel to JSON.
But if you wanted to use JSON for data transfer, you could basically build a JSON version of a CSV file, using only arrays:
Columns: ["First_Name", "Last_Name"]
Rows: [
["Joe", "Master"],
["Alice", "Gooberg"]
.... etc
]
Seems messy to me though.
If you wanted to use objects, then you would have to embed the column names with every bit of data, which in my opinion indicates the wrong approach.
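To illustrate the trade-off discussed above, here is a small Python sketch comparing the two shapes (the records are made up for the example):

import json

# Hypothetical row-oriented data, as it might come from an ORM or a database cursor.
records = [
    {"First_Name": "Joe", "Last_Name": "Master"},
    {"First_Name": "Alice", "Last_Name": "Gooberg"},
]

# Object-per-row form: every key is repeated in every object.
verbose = json.dumps(records)

# Columns-plus-rows form: the keys are sent once, then plain arrays of values.
columns = list(records[0].keys())
compact = json.dumps({"columns": columns,
                      "rows": [[r[c] for c in columns] for r in records]})

print(len(verbose), len(compact))  # the compact form is smaller, at the cost of readability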

Performance of MySQL XML functions?

I am pretty excited about the new MySQL XML functions.
Now I can finally embed something like "object-oriented" documents in my old-school relational database.
For an example use case, consider a user who signs up at your website using Facebook Connect.
You can fetch an object for the user using the Graph API and get nice information. This information, however, can vary vastly. Some fields may or may not be set, some may be added over time, and so on.
If you are just interested in very specific fields (for example friend relations, gender, movies...), you can project them into your relational database schema.
However, using the XML functions you could store the whole object inside a field, and then your different models can access the data using the ExtractValue function. You can store everything right away without needing to worry about what you will need later.
But what will the performance be?
For example, I have a table with 50,000 entries which represent users.
I have an enum field that states "male", "female" (or various other genders to be politically correct).
Fetching, for example, all males via that enum field will be very fast.
But what about something like WHERE ExtractValue(userdata, '/gender') = 'male'?
How will the performance vary if the object gets bigger?
Can I maybe somehow put an index on specific XPath selections?
How do field types (VARCHAR/BLOB?) work together with these functions, performance-wise?
Do I need fulltext indexes?
To sum up my question:
MySQL's XML functions look great, and I am sure they are really useful if you just want to store structured data that you fetch and analyze further in your application.
But how will they hold up in procedures where internal scans/sorting/comparisons/calculations are performed on them?
Can MySQL replace document-oriented databases like CouchDB/Sesame?
What are the gains and trade-offs of the XML functions?
How and why are they better/worse than a dynamic application that stores various data as attributes?
For example, a key/value table with an XPath as the key and the value as the value, connected to the document entity.
Has anyone had any experience with this, or noticed anything worth mentioning?
I tend to make comments similar to Pekka's, but I think the reason we cannot laugh this off is your statement "This information however can vary vastly." That means it is not realistic to plan to parse it all and project it into the database.
I cannot answer all of your questions, but I can answer some of them.
Most notably, I cannot tell you about performance on MySQL. I have seen this in SQL Server and tested it, and found that SQL Server performs in-memory XML extraction very slowly; to me it seemed as if it were reading from disk, though that is a bit of an exaggeration. Others may dispute this, but that is what I found.
"Can Mysql replace document oriented databases like CouchDB/Sesame?" This question is a bit over-broad but in your case using MySQL lets you keep ACID compliance for these XML chunks, assuming you are using InnoDB, which cannot be said automatically for some of those document oriented databases.
"How and why are they better/worse than a dynamic application that stores various data as attributes?" I think this is really a matter of style. You are given XML chunks that are (presumably) documented and MySQL can navigate them. If you just keep them as-such you save a step. What would be gained by converting them to something else?
The MySQL docs suggest that the XML file will go into a clob field. Performance may suffer on larger docs. Perhaps then you will identify sub-documents that you want to regularly break out and put into a child table.
Along these same lines, if there are particular sub-docs you know you will want to ask about, you can make a child table, say "HasDocs", do a little pre-processing, and populate it with the names of the sub-docs and their counts. This would make statistical analysis faster and also make it quicker to find docs that contain certain sub-docs.
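A minimal Python sketch of that pre-processing step, assuming the XML chunks already sit in a userdata column (the document contents and table names here are made up):

import xml.etree.ElementTree as ET

def subdoc_counts(xml_text):
    """Count the immediate sub-documents (child elements) of one stored XML chunk."""
    counts = {}
    for child in ET.fromstring(xml_text):
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return counts

# Hypothetical rows, as they might come back from "SELECT id, userdata FROM users".
docs = [(1, "<user><gender>male</gender><movie/><movie/></user>"),
        (2, "<user><gender>female</gender><friend/></user>")]

# Rows for the "HasDocs" child table: (doc_id, subdoc_name, count).
hasdocs_rows = [(doc_id, tag, n)
                for doc_id, xml_text in docs
                for tag, n in subdoc_counts(xml_text).items()]
print(hasdocs_rows)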
Wish I could say more, hope this helps.

Storing JSON in an MS SQL database?

I'm developing a form generator, and wondering if it would be bad mojo to store JSON in an SQL database?
I want to keep my database & tables simple, so I was going to have
`pKey, formTitle, formJSON`
on a table, and then store
{["firstName":{"required":"true","type":"text"},"lastName":{"required":"true","type":"text"}}
in formJSON.
Any input is appreciated.
I use JSON extensively in my CMS (which hosts about 110 sites) and I find the speed of accessing the data to be very fast. I was surprised that there wasn't more speed degradation. Every object in the CMS (Page, Layout, List, Topic, etc.) has an NVARCHAR(MAX) column called JSONConfiguration. My ORM tool knows to look for that column and reconstitute it as an object if needed. Or, depending on the situation, I will just pass it to the client for jQuery or Ext JS to process.
As for readability / maintainability of my code, you might say it's improved because I now have classes that represent a lot of the JSON objects stored in the DB.
I used JSON.net for all serialization / deserialization. https://www.newtonsoft.com/json
I also use a single query to return meta-JSON along with the actual data. In the case of Ext JS, I have queries that return both the structure of the Ext JS object and the data the object will need. This cuts out one postback / SQL round trip.
I was also surprised at how fast the code was to parse a list of JSON objects and map them into a DataTable object that I then handed to a GridView.
The only downside I've seen to using JSON is indexing. If you have a property of the JSON you need to search, then you have to store it as a separate column.
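To illustrate that "promote the searchable property to its own indexed column" point, here is a small Python sketch; it uses the built-in sqlite3 module purely to keep the example self-contained, but the table shape matches the pKey/formTitle/formJSON design from the question.

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (pKey INTEGER PRIMARY KEY, formTitle TEXT, formJSON TEXT)")
# The value we need to search on lives in its own indexed column, not inside the JSON.
conn.execute("CREATE INDEX idx_forms_title ON forms (formTitle)")

form = {"firstName": {"required": "true", "type": "text"},
        "lastName":  {"required": "true", "type": "text"}}
conn.execute("INSERT INTO forms (formTitle, formJSON) VALUES (?, ?)",
             ("contact", json.dumps(form)))

# Lookups hit the indexed column; the JSON blob is only parsed after the row is found.
row = conn.execute("SELECT formJSON FROM forms WHERE formTitle = ?", ("contact",)).fetchone()
print(json.loads(row[0]))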
There are JSON DBs out there that might serve your needs better: CouchDB, MongoDB, and Cassandra.
This is a brilliant way to make an object database out of SQL Server. I do it for all config objects and everything else that doesn't need any specific querying. Extending your object is easy: just create a new property in your class and initialize it with a default value. Don't need a property any more? Just delete it from the class. Easy roll-out, easy upgrades. It's not suitable for all objects, but as long as you extract any property you need to index on into its own column, keep using it. A very modern way of using SQL Server.
It will be slower than having the form defined in code, but one extra query shouldn't cause you much harm. (Just don't let 1 extra query become 10 extra queries!)
Edit: If you are selecting the row by formTitle instead of pKey (I would, because then your code will be more readable), put an index on formTitle
We have used a modified version of XML for exactly the purpose you describe for seven or eight years and it works great. Our customers' form needs are so diverse that we could never keep up with a table/column approach. We are too far down the XML road to change very easily, but I think JSON would work just as well and maybe even better.
Reporting is no problem with a couple of good parsing functions and I would defy anyone to find a significant difference in performance between our reporting/analytics and a table/column solution to this need.
I wouldn't recommend it.
If you ever want to do any reporting or query based on these values in the future it's going to make your life a lot harder than having a few extra tables/columns.
Why are you avoiding making new tables? I say if your application requires them go ahead and add them in... Also if someone has to go through your code/db later it's probably going to be harder for them to figure out what you had going on (depending on what kind of documentation you have).
You should be able to use SisoDb for this. http://sisodb.com
I don't think it is an optimal idea to store object data as a string in SQL. You have to do the transformation outside of SQL in order to parse it, which presents a performance issue, and you lose the leverage of SQL's native data-parsing capabilities. A better way would be to store it using SQL's XML datatype. This way you kill two birds with one stone: you don't have to create a load of tables, and you still get all the native querying benefits of SQL.
XML in SQL Server 2005? Better than JSON in Varchar?