InsideView: Get bulk data in JSON Format

If you ran into this article, then like me you’re wondering why the heck the InsideView API doesn’t allow you to pull bulk data in JSON format. InsideView’s justification is likely that the CSV format (the only option you currently have for receiving bulk data) is much more space efficient since it doesn’t contain metadata per row. Instead, CSV has the header information in the first row, and then each consecutive row is represented by comma separated values (duh…). However, This has one big downside – the header information is only one level deep. Let me give a brief example to show the advantages of JSON and the downside of CSV formats.

JSON

CSV

Let’s look at the ticker field. As we can observe, there are two big issues with CSV format. First, the naming convention of the headers doesn’t match. The JSON format has the field named “tickers” and the CSV format names it “ticker.” Second, there is no way we can get the property names of the ticker object (tickerName and exchange) from the CSV format.

Despite the space advantage, in this day and age, I still expect a robust API to delegate the choice of efficiency to the developer. But enough complaining – what can we do to get around this problem?

Well, before I propose a solution, let me give you a bit of a background to my situation. As part of our solution, we needed to pull bulk data from InsideView via Subscription API and store it in a database in JSON format. We wanted to be able to pull that data when needed and convert it to dynamic objects (ExpandoObject) and access the fields by name. In other words, we needed the solution to be as flexible as possible. That said, if you are dealing with bulk CSV data and want to be able to convert it to objects or even access any of the properties by name, you have two choices: create a class with static (hard-coded) fields or create some sort of a smart mapping.

The first option is obviously very easy to implement; however, this restricts you to the existing InsideView API. If anything changes in the InsideView API, such as new fields being added, you’ll need to modify the class, recompile, and redeploy the solution. In our case, this was unacceptable, so we went for the second solution.

Enter XML mapping. To overcome the limitation of a statically defined class, we decided to create an XML mapping file that can be configured dynamically. Essentially, our XML file contains the skeleton of the JSON format. Each node in the XML file represents a field in the bulk API. Combined with nested structure and attributes that dictate the correct name mapping and whether the field is an array, we can create a proper dynamic object and serialize it as JSON to store in the DB. Below we provided an example of such an XML file and code to read it.

XML Structure

<?xml version=”1.0″ encoding=”utf-8″ ?>
<PossibleHeaders>
  <id jsonName=”companyId”></id>
  <name></name>
  <companyType></companyType>
  <companyStatus></companyStatus>
  <ticker jsonName=”tickers” array=”true”>
    <exchange></exchange>
    <tickerName></tickerName>
  </ticker>
  <industry></industry>
  <subIndustry></subIndustry>
  <street></street>
  <city></city>
  <state></state>
  <zip></zip>
  <country></country>
  <website jsonName=”websites” array=”true”></website>
  <phone></phone>
  <fax></fax>
  <revenue></revenue>
  <employees></employees>
  <subsidiary></subsidiary>
  <description></description>
  <sic></sic>
  <sicDescription></sicDescription>
  <naics></naics>
  <naicsDescription></naicsDescription>
  <financialYearEnd></financialYearEnd>
  <parentCompanyId></parentCompanyId>
  <ultimateParentCompanyId></ultimateParentCompanyId>
  <equifaxId></equifaxId>
  <fortuneRanking></fortuneRanking>
  <sources array=”true”></sources>
  <mostRecentQuarter></mostRecentQuarter>
  <revenueCurrency></revenueCurrency>
  <industryCode></industryCode>
  <subIndustryCode></subIndustryCode>
  <britishSics array=”true”>
    <britishSic></britishSic>
    <description></description>
  </britishSics>
  <parentCompanyName></parentCompanyName>
  <parentCompanyCountry></parentCompanyCountry>
  <ultimateParentCompanyName></ultimateParentCompanyName>
  <ultimateParentCompanyCountry></ultimateParentCompanyCountry>
  <revenueRange></revenueRange>
  <employeeRange></employeeRange>
  <foundationDate></foundationDate>
  <gender></gender>
  <ethnicity></ethnicity>
  <mbe></mbe>
  <wbe></wbe>
  <vbe></vbe>
  <dbe></dbe>
  <lgbt></lgbt>
  <disabled></disabled>
</PossibleHeaders>

<?xml version=“1.0” encoding=“utf-8” ?>

</ticker>

</britishSics>

</PossibleHeaders>

Code to read XML

// Parses the CSV Data based on the by matching the headers to the XML document structure then going 
        // through each record row and setting the fields accordingly. 
        private static List<DataResult> parseCSVDataByTemplate(XmlDocument csvTemplate, string csvData, string isvIdKey)
        {
            List<DataResult> processedRecords = new List<DataResult>();
            using (TextFieldParser parser = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvData ?? string.Empty))))
            {
                parser.HasFieldsEnclosedInQuotes = true;
                parser.SetDelimiters(Config.INSIDEVIEW_ID_DELIMITER);
                string[] headers = parser.ReadFields();
                string[] lineFieldData;
                string headerForField;
                Dictionary<string, dynamic> record;
                while (!parser.EndOfData)
                {
                    lineFieldData = parser.ReadFields();
                    record = new Dictionary<string, dynamic>();
                    // process each field in currrent record row
                    for (int fieldIndex = 0; fieldIndex < lineFieldData.Length; fieldIndex++)
                    {
                        XmlNode fieldNode = csvTemplate.GetElementsByTagName(headers[fieldIndex])[0];
                        headerForField = fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_JSON_KEY] != null ? 
                            fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_JSON_KEY].Value : headers[fieldIndex];
                        if (string.IsNullOrEmpty(lineFieldData[fieldIndex]))
                        {
                            record.Add(headerForField, string.Empty);
                            continue;
                        }
                        // has nested arrtibutes
                        if(fieldNode.HasChildNodes)
                        {
                            // is an array
                            if (fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_ARRAY_KEY] != null)
                            {
                                string[] nestedObjectDataStrings = lineFieldData[fieldIndex].Split(Config.INSIDEVIEW_TEMPLATE_ARRAY_DELIMITER_KEY);
                                List<Dictionary<string, string>> nestedObjects = new List<Dictionary<string, string>>(nestedObjectDataStrings.Length);
                                foreach (string nestedObjectString in nestedObjectDataStrings)
                                {
                                    nestedObjects.Add(parseNestedObject(nestedObjectString, fieldNode.ChildNodes));
                                }
                                record.Add(headerForField, nestedObjects.ToArray());
                            }
                            else
                            {
                                record.Add(headerForField, parseNestedObject(lineFieldData[fieldIndex], fieldNode.ChildNodes));
                            }
                        }
                        else
                        {
                            // is an array
                            if (fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_ARRAY_KEY] != null)
                            {
                                record.Add(headerForField, lineFieldData[fieldIndex].Split(Config.INSIDEVIEW_TEMPLATE_ARRAY_DELIMITER_KEY));
                            }
                            else
                            {
                                record.Add(headerForField, lineFieldData[fieldIndex]);
                            }
                        }
                    }
                    processedRecords.Add(new DataResult (record[isvIdKey], null, new JavaScriptSerializer().Serialize(record)));
                }
            }
            return processedRecords;
        }
        // If a field has a nested object, then this function will format it with the nested 
        // object’s field name (according to InsideView’s JSON format) and value.
        private static Dictionary<string, string> parseNestedObject(string nestedObjectString, XmlNodeList childNodes)
        {
            Dictionary<string, string> nestedObject = new Dictionary<string, string>();
            string[] nestedObjectFields = nestedObjectString.Split(Config.INSIDEVIEW_TEMPLATE_NESTED_OBJECT_ATTRIBUTE_DELIMITER_KEY);
            for (int nestedFieldIndex = 0; nestedFieldIndex < childNodes.Count; nestedFieldIndex++)
            {
                XmlNode childNode = childNodes[nestedFieldIndex];
                nestedObject.Add(childNode.Name, nestedObjectFields[nestedFieldIndex]);
            }
            return nestedObject;
        }

// Parses the CSV Data based on the by matching the headers to the XML document structure then going

// through each record row and setting the fields accordingly.

private static List<DataResult> parseCSVDataByTemplate(XmlDocument csvTemplate, string csvData, string isvIdKey)

{

List<DataResult> processedRecords = new List<DataResult>();

using (TextFieldParser parser = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvData ?? string.Empty))))

{

parser.HasFieldsEnclosedInQuotes = true;

parser.SetDelimiters(Config.INSIDEVIEW_ID_DELIMITER);

string[] headers = parser.ReadFields();

string[] lineFieldData;

string headerForField;

Dictionary<string, dynamic> record;

while (!parser.EndOfData)

{

lineFieldData = parser.ReadFields();

record = new Dictionary<string, dynamic>();

// process each field in currrent record row

for (int fieldIndex = 0; fieldIndex < lineFieldData.Length; fieldIndex++)

{

XmlNode fieldNode = csvTemplate.GetElementsByTagName(headers[fieldIndex])[0];

headerForField = fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_JSON_KEY] != null ?

fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_JSON_KEY].Value : headers[fieldIndex];

if (string.IsNullOrEmpty(lineFieldData[fieldIndex]))

{

record.Add(headerForField, string.Empty);

continue;

}

// has nested arrtibutes

if(fieldNode.HasChildNodes)

{

// is an array

if (fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_ARRAY_KEY] != null)

{

string[] nestedObjectDataStrings = lineFieldData[fieldIndex].Split(Config.INSIDEVIEW_TEMPLATE_ARRAY_DELIMITER_KEY);

List<Dictionary<string, string>> nestedObjects = new List<Dictionary<string, string>>(nestedObjectDataStrings.Length);

foreach (string nestedObjectString in nestedObjectDataStrings)

{

nestedObjects.Add(parseNestedObject(nestedObjectString, fieldNode.ChildNodes));

}

record.Add(headerForField, nestedObjects.ToArray());

}

else

{

record.Add(headerForField, parseNestedObject(lineFieldData[fieldIndex], fieldNode.ChildNodes));

}

else

{

// is an array

if (fieldNode.Attributes[Config.INSIDEVIEW_TEMPLATE_ARRAY_KEY] != null)

{

record.Add(headerForField, lineFieldData[fieldIndex].Split(Config.INSIDEVIEW_TEMPLATE_ARRAY_DELIMITER_KEY));

}

else

{

record.Add(headerForField, lineFieldData[fieldIndex]);

}

processedRecords.Add(new DataResult (record[isvIdKey], null, new JavaScriptSerializer().Serialize(record)));

}

return processedRecords;

}

// If a field has a nested object, then this function will format it with the nested

// object’s field name (according to InsideView’s JSON format) and value.

private static Dictionary<string, string> parseNestedObject(string nestedObjectString, XmlNodeList childNodes)

{

Dictionary<string, string> nestedObject = new Dictionary<string, string>();

string[] nestedObjectFields = nestedObjectString.Split(Config.INSIDEVIEW_TEMPLATE_NESTED_OBJECT_ATTRIBUTE_DELIMITER_KEY);

for (int nestedFieldIndex = 0; nestedFieldIndex < childNodes.Count; nestedFieldIndex++)

{

XmlNode childNode = childNodes[nestedFieldIndex];

nestedObject.Add(childNode.Name, nestedObjectFields[nestedFieldIndex]);

}

return nestedObject;

}

Essentially, our solution is an example of how dynamic mapping can be used to properly read the InsideView data from the original CSV format. Depending on the level of robustness required, this solution can be simplified or elaborated on. The choice is yours, now go have fun with InsideView!

InsideView: Get bulk data in JSON Format

JSON

CSV

XML Structure

Code to read XML

Related

Recent Posts

Archives

InsideView: Get bulk data in JSON Format

JSON

CSV

XML Structure

Code to read XML

Related

Recent Posts

Tags

Archives