Tutorial: Hacking a JavaScript Array Into JSON With Python



Question:

I am fetching a .js file from a remote site that contains data I want to process as JSON using the simplejson library on my Google App Engine site. The .js file looks like this:

var txns = [
    { apples: '100', oranges: '20', type: 'SELL'},
    { apples: '200', oranges: '10', type: 'BUY'}]

I have no control over the format of this file. What I did at first just to hack through it was to chop the "var txns = " bit off of the string and then do a series of .replace(old, new, [count]) on the string until it looked like standard JSON:

cleanJSON = malformedJSON.replace("'", '"').replace('apples:', '"apples":').replace('oranges:', '"oranges":').replace('type:', '"type":').replace('{', '{"transaction":{').replace('}', '}}')  

So that it now looks like:

[{ "transaction" : { "apples": "100", "oranges": "20", "type": "SELL"} },
 { "transaction" : { "apples": "200", "oranges": "10", "type": "BUY"} }]

How would you tackle this formatting issue? Is there a known way (library, script) to format a JavaScript array into JSON notation?


Solution:1

It's not too difficult to write your own little parser for that using PyParsing.

import json
from pyparsing import *

data = """var txns = [
    { apples: '100', oranges: '20', type: 'SELL'},
    { apples: '200', oranges: '10', type: 'BUY'}]"""


def js_grammar():
    # key: 'value' pairs, grouped so each pair keeps its own result names
    key = Word(alphas).setResultsName("key")
    value = QuotedString("'").setResultsName("value")
    pair = Group(key + Literal(":").suppress() + value)
    # objects are {...} lists of pairs, the array is a [...] list of objects
    object_ = nestedExpr("{", "}", delimitedList(pair, ","))
    array = nestedExpr("[", "]", delimitedList(object_, ","))
    return array + StringEnd()

JS_GRAMMAR = js_grammar()

def parse(js):
    # skip the "var txns = " assignment before handing the text to the grammar
    return JS_GRAMMAR.parseString(js[len("var txns = "):])[0]

def to_dict(object_):
    return dict((p.key, p.value) for p in object_)

result = [
    {"transaction": to_dict(object_)}
    for object_ in parse(data)]
print(json.dumps(result))

This prints the following (the key order within each object may vary):

[{"transaction": {"type": "SELL", "apples": "100", "oranges": "20"}},
 {"transaction": {"type": "BUY", "apples": "200", "oranges": "10"}}]

You can also add the assignment to the grammar itself. That said, since there are already off-the-shelf parsers for this, you are better off using those.


Solution:2

I would use the YAML parser, as it's better at most things like this. It also comes with GAE, since it is used for the config files. JSON is a subset of YAML.

All you have to do is get rid of "var txns =", then YAML should do the rest.

import yaml

string = """[{ apples: '100', oranges: '20', type: 'SELL'},
             { apples: '200', oranges: '10', type: 'BUY'}]"""

# safe_load only builds plain Python objects, which is all this data needs
txns = yaml.safe_load(string)

print(txns)

This gives you the following (the key order may vary):

[{'type': 'SELL', 'apples': '100', 'oranges': '20'},
 {'type': 'BUY', 'apples': '200', 'oranges': '10'}]

Once loaded, you can always dump it back out as JSON.
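
As a rough sketch of that last step: assuming the parsed data is a list of dicts like the output shown above, the standard json module can write it back out. The names txns and wrapped are only illustrative, and the "transaction" wrapper is included because the question asked for it.

```python
import json

# Stand-in for the list the parser returned (an assumption for this sketch;
# in practice this would be the object produced by the YAML load step).
txns = [
    {"apples": "100", "oranges": "20", "type": "SELL"},
    {"apples": "200", "oranges": "10", "type": "BUY"},
]

# Wrap each record the way the question's target format does.
wrapped = [{"transaction": txn} for txn in txns]

# sort_keys makes the key order deterministic across Python versions.
as_json = json.dumps(wrapped, sort_keys=True)
print(as_json)
```

If the wrapper is not needed, json.dumps(txns) on the list itself should be enough.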


Solution:3

If you know that's what it's always going to look like, you could use a regex to find unquoted, space-delimited text that ends with a colon and surround it with quotes.

I'm always worried about unexpected input with a regex like that, though. How do you know the remote source won't change what you get?
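
A minimal sketch of the regex idea, under the assumption that the file always has the shape shown in the question: bare identifier keys, single-quoted values, and no colons, commas or apostrophes inside the values. The function name js_array_to_python is hypothetical. Because the result goes through json.loads, input that no longer fits the pattern should fail with a ValueError instead of passing through silently.

```python
import json
import re

raw = """var txns = [
    { apples: '100', oranges: '20', type: 'SELL'},
    { apples: '200', oranges: '10', type: 'BUY'}]"""


def js_array_to_python(js_source):
    """Convert a simple `var name = [...]` JavaScript array into Python data.

    Hypothetical helper: only handles bare keys and single-quoted values
    that contain no colons, commas or apostrophes.
    """
    # Drop the leading "var <name> =" assignment and any trailing semicolon.
    body = re.sub(r"^\s*var\s+\w+\s*=\s*", "", js_source).strip().rstrip(";")
    # Quote bare keys: an identifier after "{" or "," and before ":".
    body = re.sub(r"([{,]\s*)([A-Za-z_]\w*)(\s*:)", r'\1"\2"\3', body)
    # JSON strings need double quotes.
    body = body.replace("'", '"')
    # Raises ValueError (JSONDecodeError) if the text is still not valid JSON.
    return json.loads(body)


txns = js_array_to_python(raw)
print(json.dumps(txns))
```

Letting json.loads have the final say is the design choice here: the regex does the rewriting, and the JSON parser acts as the check on whether the rewrite actually worked.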


Solution:4

You could create an intermediate page containing a JavaScript script that just loads the remote file and dumps the array as JSON. Python can then make requests to your intermediate page and get nice JSON back.

