Implementing and visualizing word2vec using TensorFlow

Posted on August 15, 2017 in misc • Tagged with Text-mining, python, word2vec, tensorflow, tsne

In this project we will build and train a skip-gram model (word2vec) to obtain vectors for the words in the dataset.
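A skip-gram model learns word vectors by predicting the context words around each center word, so its training data is a list of (center, context) pairs. The sketch below covers only that data-preparation step, in plain Python. The function name `skipgram_pairs` and the fixed window size are illustrative assumptions, and this is not the TensorFlow model from the post itself.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window_size: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window_size` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clip the context window at the start and end of the sentence
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the quick brown fox jumps".split()
print(skipgram_pairs(sentence, window_size=1))
```

In a full word2vec pipeline, these pairs would be mapped to integer word ids and fed to the model in batches.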

Cleaning the tweets

Posted on August 15, 2016 in misc • Tagged with Text-mining, twitter

The noise in the data:

Once you have obtained the data you want, the next major step is to pre-process it. Most of the textual data we get from social media is very noisy, so we cannot use it straight away. We need to remove this noise before we try to get any meaningful insights from the data and do some magic with it. The noise can come from colloquial language, encoding issues, etc.
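As a rough illustration, a few typical cleaning steps can be chained with Python's `re` and `html` modules. This is a minimal sketch, not the post's exact procedure: which steps to apply (dropping URLs and @mentions, stripping `#`, shortening elongated words like "sooo") is an assumption and depends on your analysis.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated 3+ times
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(tweet: str) -> str:
    text = html.unescape(tweet)             # fix HTML entities such as "&amp;" -> "&"
    text = URL_RE.sub("", text)             # drop links
    text = MENTION_RE.sub("", text)         # drop @user mentions
    text = text.replace("#", "")            # keep the hashtag word, drop the '#'
    text = WHITESPACE_RE.sub(" ", text).strip()
    text = REPEAT_RE.sub(r"\1\1", text)     # colloquial elongation: "sooooo" -> "soo"
    return text.lower()


print(clean_tweet("@bob Loving the new phone &amp; case!! https://t.co/abc123 #happy"))
```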



Command Line Utilities for text processing

Posted on August 15, 2016 in misc • Tagged with Text-mining

Unix Command line Utilities:

You might be wondering why we need command-line utilities for text processing. The traditional Unix command-line tools are extremely helpful when it comes to text processing: most of the preprocessing and cleaning of text can be done with them. Since some of these utilities use regular expressions, it helps to learn regular expressions before learning to use these tools.



Grabbing Twitter Data

Posted on August 15, 2016 in misc • Tagged with Text-mining, twitter


Introduction:

Twitter is one of the ten most visited websites in the world. It is a microblogging platform that lets you share messages of no more than 140 characters. It also helps you discover messages related to the topics you are interested in, so it can be useful in many ways. Since it is one of the most used websites, a lot of data flows through its network, and this data could be used for various research purposes. Fortunately, Twitter decided to share this huge amount of data with the public, with some limitations of course. Twitter offers an API (Application Programming Interface) to do this. In this tutorial we will explore the Twitter API, what it has to offer, and how we can use it to obtain the data we want.



A quickie on JSON

Posted on August 15, 2016 in misc • Tagged with Text-mining, twitter

JSON

JSON is a data format used for storing and exchanging information. It is human-readable, easy to parse and, most importantly, structured.

A simple example of JSON:

{
  "name": "jason",
  "age": 24,
  "gender":"male"
}
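To see why being easy to parse matters, here is how the example above can be loaded with Python's built-in `json` module. The variable names are only for illustration.

```python
import json

raw = '{"name": "jason", "age": 24, "gender": "male"}'

# Parse the JSON text into a Python dict
person = json.loads(raw)
print(person["name"])      # jason
print(person["age"] + 1)   # 25: "age" was parsed as an int, not a string

# Serialize the dict back to JSON text, pretty-printed
print(json.dumps(person, indent=2))
```

JSON objects become Python dicts and JSON numbers become `int` or `float`, so the parsed data can be used directly without manual string handling.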

Fundamentally, a JSON document can be built using two things:



Projects

Posted on August 15, 2016 in misc • Tagged with Text-mining

Project

Since the entire tutorial, including the exercises (the "things to try" sections), is focused on data from Twitter, it would also be nice to try datasets from different domains. Data from each domain presents its own challenges, and working with it is all about making decisions and solving one problem at a time. You can try to perform sentiment analysis on data from the domains below:



Python quickie

Posted on August 15, 2016 in misc • Tagged with Text-mining, python

A simple introduction to python:

Python is a simple, fast and, most importantly, easy-to-learn language. Though I am not an expert in programming languages, I believe a good language should let you focus on the logic rather than worry about the implementation.
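As a small illustration of focusing on logic rather than implementation, counting word frequencies takes only a few lines with the standard library's `collections.Counter`. The sample sentence is made up for this sketch.

```python
from collections import Counter

text = "the cat sat on the mat the end"

# Split on whitespace and count how often each word occurs
counts = Counter(text.split())

print(counts["the"])           # 3
print(counts.most_common(1))   # [('the', 3)]
```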



Python setup and package installation

Posted on August 15, 2016 in misc • Tagged with Text-mining, python

Python : Setup and Installation

Good news for MacBook and Linux users: in either case Python comes out of the box and you need not install it. On Windows, however, please follow the setup instructions below:
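Whichever platform you are on, the short script below is one way to confirm which Python version you are running and whether a package can be imported. It is a sketch assuming Python 3.6+ (for f-strings); the package names in the list are only examples, and third-party ones such as `nltk` or `numpy` will simply report `False` if they are not installed.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found by this interpreter's import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Example packages only: two from the standard library, two third-party
    for name in ["json", "re", "nltk", "numpy"]:
        print(f"{name:6} installed: {is_installed(name)}")
```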



Regular expressions tutorial

Posted on August 15, 2016 in misc • Tagged with Text-mining, python

Regular expressions

Intro

When it comes to text mining, you need to understand that text is nothing but a sequence of characters. A character can be a digit, a letter, a symbol, a space, a newline, etc. In this chapter we are going to learn about regular expressions, a very powerful concept that helps us work with text. They can be used for finding patterns, searching for words or characters, substitution, etc.
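The three uses mentioned above (finding patterns, searching and substitution) map directly onto `re.findall`, `re.search` and `re.sub` in Python's standard `re` module. The sample text and patterns here are made up for illustration.

```python
import re

text = "Call me at 555-1234 or 555-9876. #regex is fun!"

# Finding patterns: every "3 digits, dash, 4 digits" sequence
phones = re.findall(r"\d{3}-\d{4}", text)
print(phones)    # ['555-1234', '555-9876']

# Searching: the first hashtag, or None if there is no match
match = re.search(r"#\w+", text)
hashtag = match.group() if match else None
print(hashtag)   # #regex

# Substitution: replace every digit with "X"
masked = re.sub(r"\d", "X", text)
print(masked)    # Call me at XXX-XXXX or XXX-XXXX. #regex is fun!
```

Raw strings (`r"..."`) keep backslashes such as `\d` from being interpreted by Python before the regex engine sees them.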



Sentiment Analysis: an intro

Posted on August 15, 2016 in misc • Tagged with Text-mining, python, twitter

Sentiment Analysis:

The whole idea of text mining is about gaining insights from textual data. Sentiment analysis is one of the important applications in the area of text mining. It tries to identify whether the opinion expressed in a text is positive, negative or neutral towards a given topic.

For example,

  1. I am happy about my promotion
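To make the idea concrete, here is a toy lexicon-based classifier: it counts positive and negative words in a text and compares the two counts. The word lists are hypothetical and far too small for real use; practical systems rely on larger curated lexicons or trained models, and this is not necessarily the approach used in this tutorial.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only
POSITIVE_WORDS = {"happy", "good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"sad", "bad", "terrible", "hate", "awful"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


print(classify_sentiment("I am happy about my promotion"))  # positive
```

A word-count approach like this ignores negation ("not happy") and sarcasm, which is one reason real sentiment analysis needs more than a word list.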