A quicky on JSON

Posted on August 15, 2016 in misc

JSON

JSON (JavaScript Object Notation) is a data format used for storing and exchanging information. It is human-readable and easy to parse, and, most importantly, it is structured.

A simple example of JSON:

{
  "name": "jason",
  "age": 24,
  "gender":"male"
}

Fundamentally, a JSON document is built from two things, key-value pairs and lists, which can also be nested:

Key-value pairs:

You can also call them property-value pairs. Imagine you are collecting data for a classroom. Your data should then store the values of properties related to the classroom or its students. Some properties you might consider are "student_id", "name", and "class". The JSON for it would look something like this:

{
  "student_id": "514300234324",
  "name":"bran",
  "class":"12"
}

Lists:

Sometimes the value of a property is a list of values rather than a single value. The example below shows this:

{
  "hero": "ironman",
  "powers": ["cool", "ironSuit", "intelligent"]
}
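
Once parsed, a JSON list becomes an ordinary Python list, so it can be indexed and looped over. Here is a minimal sketch, assuming Python's standard json module (covered in the JSON library section of this post) and the hero example written out as a string. The variable names are only for illustration, and print is called as a function so the snippet runs on Python 3:

```python
import json

# The hero example, written as a JSON string.
hero_json = '{"hero": "ironman", "powers": ["cool", "ironSuit", "intelligent"]}'

hero = json.loads(hero_json)

# The JSON list becomes a regular Python list.
powers = hero["powers"]
print(len(powers))   # number of items in the list
print(powers[0])     # first item in the list

# Loop over the values one by one.
for power in powers:
    print(power)
```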

Nested JSON:

A JSON object can also sit inside another JSON object:

{
  "id": "1234",
  "name": {
    "firstName": "jason",
    "lastName": "b"
  }
}
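
In Python, a nested object ends up as a dictionary inside a dictionary, so the inner values are reached by chaining keys. This is a rough sketch using the standard json module. The "middleName" key is hypothetical and is only there to show what happens when a key is missing:

```python
import json

# The nested example, written as a JSON string.
nested_json = '{"id": "1234", "name": {"firstName": "jason", "lastName": "b"}}'

person = json.loads(nested_json)

# Chain the keys to reach a value inside the inner object.
first_name = person["name"]["firstName"]
print(first_name)

# dict.get returns None instead of raising KeyError when a key is absent.
# "middleName" is a made-up key that does not exist in the data.
middle_name = person["name"].get("middleName")
print(middle_name)
```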

JSON library

As you can see, all of the above examples are easy to read. They are just as easy to parse. To parse JSON data in Python we will use the json library.

The json library parses JSON data from either files or strings. It converts JSON data into Python objects (a JSON object becomes a dictionary, a JSON list becomes a list) and vice versa.
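
To make the "vice versa" part concrete, here is a small sketch of a round trip: json.dumps turns a Python dictionary into a JSON string and json.loads turns it back. It reuses the student record from earlier in this post. The sort_keys=True argument is just a choice made here so that the key order of the output is predictable:

```python
import json

# The student record from earlier, as a Python dictionary.
student = {"student_id": "514300234324", "name": "bran", "class": "12"}

# Python dictionary -> JSON string
student_json = json.dumps(student, sort_keys=True)
print(student_json)

# JSON string -> Python dictionary
restored = json.loads(student_json)
print(restored == student)
```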

Consider the following JSON string:

In [50]:
json_string = '{ "name": "brian", "age": 24, "gender":"male", "skills": ["programming", "musician"]}'

Let's parse this string using Python's json library:

In [53]:
import json
json_obj = json.loads(json_string)

And we can access the parsed data like this:

In [54]:
print json_obj["name"]
print json_obj["age"]
print json_obj["gender"]
print json_obj["skills"]
brian
24
male
[u'programming', u'musician']

Parsing JSON from files

As you may remember, in the last chapter we saved all the Twitter streaming data to a file called streamingData.json. We can use the json library to parse the data in this file:

In [75]:
data=[]
with open('./streamingData.json', 'r') as jsonFile:
    for line in jsonFile:
        data.append(json.loads(line))
        
print "Total number of tweets loaded: ", len(data)
Total number of tweets loaded:  5662
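
The loop above assumes that every line in the file is a valid JSON object. If that might not hold, a slightly more defensive version can skip blank lines and count the lines that fail to parse. The sketch below is only an illustration under that assumption: it is written for Python 3, and it uses io.StringIO with made-up records as a stand-in for the real file so that it runs on its own:

```python
import io
import json

# Stand-in for a file such as streamingData.json: one JSON object per line.
# The records here are made up for illustration.
fake_file = io.StringIO(
    '{"id": 1, "text": "first tweet"}\n'
    '\n'
    '{"id": 2}\n'
    'not valid json\n'
    '{"id": 3, "text": "third tweet"}\n'
)

records = []
skipped = 0
for line in fake_file:
    line = line.strip()
    if not line:
        continue  # blank lines carry no data
    try:
        records.append(json.loads(line))
    except ValueError:
        # json.loads raises ValueError (JSONDecodeError in Python 3) on bad input.
        skipped += 1

print(len(records), skipped)
```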

All the tweets, with their related information, are stored in the list named "data". Let's check the different properties that a tweet has:

In [83]:
print data[0].keys()   # properties
print data[0]["text"]  # how to access the  tweet message itself
[u'contributors', u'truncated', u'text', u'is_quote_status', u'in_reply_to_status_id', u'id', u'favorite_count', u'source', u'retweeted', u'coordinates', u'timestamp_ms', u'entities', u'in_reply_to_screen_name', u'in_reply_to_user_id', u'retweet_count', u'id_str', u'favorited', u'user', u'geo', u'in_reply_to_user_id_str', u'lang', u'created_at', u'filter_level', u'in_reply_to_status_id_str', u'place']
#BaşbakanaSoruyorum atılan her iki twitten biri
Engelli Öğretmenlere ait, sn @handefrt nasıl
görmüyorsun bu. #EngelliOgretmenAlimiOlacakmi"

These are all the properties of one tweet. For now we only need the actual tweet text and can ignore the other properties.

In [94]:
tweets = []
for item in data:
    if "text" in item:
        tweet = item["text"]
        tweets.append(tweet)
    
print "total no of tweets extracted from json: ", len(tweets)
total no of tweets extracted from json:  5230

The number of JSON objects differs from the number of tweets because not every JSON object contains a tweet; some might just be empty.
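
One way to see where the difference comes from is to collect the objects that have no "text" key and look at them. A minimal sketch, using a made-up stand-in for the "data" list (the records and the "note" key are hypothetical, not real tweets):

```python
# Made-up records standing in for the parsed "data" list.
data = [
    {"id": 1, "text": "hello"},
    {},                               # an empty object
    {"id": 2, "text": "world"},
    {"note": "no tweet here"},        # an object without a "text" key
]

# Keep the text of every object that has one.
tweets = [item["text"] for item in data if "text" in item]

# Collect the objects that were left out, to see what they contain.
without_text = [item for item in data if "text" not in item]

print(len(data), len(tweets), len(without_text))
```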

To do:

Work on the Twitter data that you extracted in the previous chapters:

  1. Try loading the JSON from the file into memory.
  2. Check out the various keys of any one JSON object; there are some interesting data fields that can be used for different projects.
  3. Try extracting just the tweets from this JSON dataset.