Implementing and visualizing word2vec using TensorFlow
Posted on August 15, 2017 in misc • Tagged with Text-mining, python, word2vec, tensorflow, tsne
In this project we will build and train a skip-gram model to obtain vectors for the words in the dataset (word2vec)
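As a rough illustration of what a skip-gram model is trained on, the sketch below generates (center, context) word pairs from a tokenized sentence. The function name `skipgram_pairs`, the `window` parameter and the sample sentence are assumptions made for this example; they are not taken from the post, and the actual TensorFlow training step is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the context window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the quick brown fox".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs)
```

Each pair would then serve as one (input, target) training example: the model learns word vectors by trying to predict the context word from the center word.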
Cleaning the tweets
Posted on August 15, 2016 in misc • Tagged with Text-mining, twitter
The noise in the data:
Once you have obtained the data you want, the next major step is to pre-process it. Most textual data from social media is noisy, so we cannot use it straight away. We need to remove this noise before we can get any meaningful insights from the data and do some magic with it. The noise can come from colloquial language, encoding issues, and so on.
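A minimal sketch of this kind of cleaning is shown below. The specific rules (dropping URLs, @mentions, non-ASCII characters and the `#` sign, then lowercasing) and the name `clean_tweet` are assumptions chosen for illustration; the full post may clean the data differently.

```python
import re

# Patterns for common sources of noise in tweets (illustrative choices).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")


def clean_tweet(text):
    """Strip URLs, mentions and non-ASCII debris, then normalize whitespace and case."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return " ".join(text.split()).lower()


raw = "Loving the new phone!!! @user http://t.co/abc #happy"
cleaned = clean_tweet(raw)
print(cleaned)
```

Which rules to apply depends on the task: for sentiment analysis, for example, emoticons or repeated punctuation may carry signal and be worth keeping.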
Continue reading
Command Line Utilities for text processing
Posted on August 15, 2016 in misc • Tagged with Text-mining
Unix command line utilities:
You might be wondering why we need command line utilities for text processing. The traditional Unix command line tools are extremely helpful here: most of the preprocessing and cleaning of text can be done with them. It helps to learn regular expressions before learning these tools, because some of these utilities use regular expressions.
Continue reading
Grabbing twitter Data
Posted on August 15, 2016 in misc • Tagged with Text-mining, twitter
Introduction:
Twitter is one of the ten most visited websites in the world. It's a microblogging platform that lets you share messages of no more than 140 characters. Not just that, it also helps you discover messages related to the topics you are interested in. It can be useful in many ways. Since it's one of the most used websites, a lot of data flows through its network, which could be used for various research purposes. Fortunately, Twitter decided to share this huge data with the public, with some limitations of course. Twitter offers an API (Application Programming Interface) to do this. In this tutorial we will explore the Twitter API, what it has to offer, and how we can use parts of it to obtain the data we want.
Continue reading
A quicky on JSON
Posted on August 15, 2016 in misc • Tagged with Text-mining, twitter
JSON
JSON is a data format used for storing and exchanging information. It is human readable and easy to parse, and, most importantly, it is structured.
A simple example of JSON:
{
"name": "jason",
"age": 24,
"gender":"male"
}
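To show how easy this format is to parse, here is a small sketch that loads the example above with Python's standard `json` module and writes it back out. The variable names are illustrative only.

```python
import json

# The same document as above, held in a Python string.
raw = '{"name": "jason", "age": 24, "gender": "male"}'

# json.loads turns JSON text into Python objects (here, a dict).
person = json.loads(raw)
print(person["name"], person["age"])

# json.dumps goes the other way; indent=2 pretty-prints the result.
round_trip = json.dumps(person, indent=2)
print(round_trip)
```

Note that the number `24` comes back as a Python `int`, not a string, which is part of what makes the format structured.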
Fundamentally, a JSON document can be built using two things:
Continue reading
projects
Posted on August 15, 2016 in misc • Tagged with Text-mining
Project
Since the entire tutorial, including the exercises (the "things to try" sections), is focused on data from Twitter, it would also be nice to try data sets from different domains. Data from different domains presents its own challenges, and it's all about making decisions and solving one problem at a time. You can try to perform sentiment analysis on data from the domains below:
Continue reading
Python quicky
Posted on August 15, 2016 in misc • Tagged with Text-mining, python
A simple introduction to Python:
Python is a simple, fast and, most importantly, easy-to-learn language. Though I am not an expert on languages, I believe a good programming language should not make you worry about implementation over logic.
Continue reading
python setup and package installation
Posted on August 15, 2016 in misc • Tagged with Text-mining, python
Python: Setup and Installation
Good news for MacBook and Linux users: in either case Python comes out of the box and you need not install it. On Windows, please follow the setup instructions below:
Continue reading
Regular expressions tutorial
Posted on August 15, 2016 in misc • Tagged with Text-mining, python
Regular expressions
Intro
When it comes to text mining, you need to understand that text is nothing but a sequence of characters. A character can be a digit, a letter, a symbol, a space, a newline, etc. In this chapter we are going to learn about regular expressions, a very powerful concept that helps us play with text. They can be used for finding patterns, searching for words or characters, substitution, etc.
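The three uses mentioned above (finding patterns, searching, substitution) can be sketched with Python's standard `re` module. The sample sentence and the patterns are made up for illustration.

```python
import re

text = "Order 66 shipped on 2016-08-15 to user_42"

# Finding patterns: every run of one or more digits.
numbers = re.findall(r"\d+", text)

# Searching: the first substring shaped like a YYYY-MM-DD date.
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
date = match.group(0) if match else None

# Substitution: mask every digit with '#'.
masked = re.sub(r"\d", "#", text)

print(numbers)
print(date)
print(masked)
```

Here `\d` matches a single digit, `+` means "one or more", and `{4}` means "exactly four", so the date pattern only matches digits arranged in that exact shape.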
Continue reading
Sentiment Analysis an intro
Posted on August 15, 2016 in misc • Tagged with Text-mining, python, twitter
Sentiment Analysis:
The whole idea of text mining is to gain insights from textual data. Sentiment analysis is one of the important applications in the area of text mining. It tries to identify whether the opinion expressed in a text is positive, negative or neutral towards a given topic.
For example,
- I am happy about my promotion
Continue reading