# AnalyticBridge

Cristian Mesiano
• Male
• Zürich
• Switzerland


## Cristian Mesiano's Discussions

### Classification Step: C 5.0 vs SVM

2 Replies

A new post on my blog: an early approach to a "document classifier". I hope it will trigger some discussion on classification techniques.

Tags: SVM, C5, Classification

Started this discussion. Last reply by Cristian Mesiano Sep 12, 2011.

# Cristian Mesiano's Page

## Latest Activity

Cristian Mesiano posted a blog post

### Document Clustering and Graph Clustering: graph entropy as linkage function

Let's continue our discussion about the applications of the graph entropy concept. Today I'm going to show how we can reuse the same concept for document clustering. What I want to highlight is that through such a methodology it's possible to: extract the relevant words from a document (as discussed here); cluster the words of a document (as…
Nov 17, 2012
Cristian Mesiano posted a blog post

### Key words through graph entropy Hierarchical clustering

In the last post I showed how to extract key words from a text through a principle called graph entropy. Today I'm going to show another application of graph entropy in order to extract clusters of key words. Why? The key words of a document depict the main topic of the content, but if the document is big there are often many different subtopics related to the main one. In this perspective, a cluster of…
Oct 24, 2012
Cristian Mesiano posted a blog post

### Graph Entropy to extract relevant words

I would like to share with you some early results of research I'm doing in the field of "graph entropy" applied to text mining problems. Why is graph entropy so important? Based on the main concept of entropy, the following assumptions are true: the entropy of a graph should be a functional of the stability of the structure (so that it depicts in some…
Sep 24, 2012
Cristian Mesiano posted a blog post

### Function minimization: Simulated Annealing led by variance criteria vs Nelder Mead

Most data mining problems can be reduced to a minimization/maximization problem. ... Examples: let's consider easy scenarios where the cost function depends on just two parameters.…
Aug 12, 2012
Cristian Mesiano posted a blog post

### Simulated Annealing: How to boost performance through Matrix Cost rescaling

One of the most widely used algorithms in Machine Learning is Simulated Annealing (SA). The reasons for its popularity lie in its simplicity of implementation and its broad spectrum of applicability ... The experiments: I considered two instances of the problem, the first one with 10 towns and the second one with 50 towns. Even if for this kind of problem there are…
Jul 3, 2012
Cristian Mesiano posted a blog post

### Outlier analysis: Chebyschev criteria vs approach based on Mutual Information

As often happens, I was doing many things at the same time, so during a break while working on a new post about applications of mutual information in data mining, I read the interesting paper on outlier detection suggested by Sandro Saitta on his blog (dataminingblog). ... Usually such behavior is not conducive to good results, but this time I think that the change of perspective has been positive!…
May 23, 2012
Cristian Mesiano posted a blog post

### Uncertainty coefficients for Features Reduction - comparison with LDA technique

... Uncertainty coefficient: consider a set of people's data labelled with two different labels, let's say blue and red, and let's assume that for these people we have a bunch of variables to describe them. Moreover, let's assume that one of the variables is the social security number (SSN) or some other unique ID for each person. Let me do some…
May 5, 2012
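The excerpt above stops before any formula is given. As a hedged illustration only, the standard definition of the uncertainty coefficient is U(X|Y) = (H(X) - H(X|Y)) / H(X), where H is the Shannon entropy. The sketch below computes it on made-up data; the function names, the four-person sample, and the `coin` variable are assumptions for illustration and are not taken from the post.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def uncertainty_coefficient(x, y):
    """U(X|Y) = (H(X) - H(X|Y)) / H(X): fraction of X's entropy explained by Y."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0  # convention chosen here: a constant X has nothing to explain
    n = len(x)
    h_x_given_y = 0.0
    for y_val, count in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == y_val]
        h_x_given_y += count / n * entropy(subset)
    return (h_x - h_x_given_y) / h_x


# Hypothetical data: two labels, a unique ID per person, and an unrelated variable.
labels = ["blue", "red", "blue", "red"]
ssn = ["001", "002", "003", "004"]
coin = ["a", "a", "b", "b"]

u_ssn = uncertainty_coefficient(labels, ssn)
u_coin = uncertainty_coefficient(labels, coin)
```

On this toy sample a unique identifier such as the SSN determines the label of every person in the sample, so its coefficient is 1, while the unrelated variable scores 0.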
Cristian Mesiano posted a blog post

### Earthquake prediction through sunspots part II: common Data mining mistakes!

While I was writing the last post I was wondering how long it would take before my followers noticed the mistakes I introduced in the experiments. Let's start the treasure hunt! 1. Don't always trust your data: often they are not homogeneous. ... A good data miner must always check the dataset! You should always ask yourself whether the data have been produced in a consistent way.…
Apr 4, 2012
Cristian Mesiano posted a blog post

### Support Vector Regression (SVR): predict earthquakes through sunspots

In the last months we discussed text mining algorithms a lot; I would now like to focus for a while on data mining aspects. Today I will talk about one of the most intriguing topics related to data mining tasks: regression analysis. ... Experiment: earthquake prediction using sunspots as regressor. Early warning: this is just a tutorial, so…
Mar 7, 2012
Cristian Mesiano posted a blog post

### Features Extraction: Co-occurrences and Graph clustering

In the last two posts we discussed co-occurrence analysis to extract features in order to classify documents and extract "meta concepts" from the corpus. We also noticed that this approach doesn't perform better than the traditional bag of words. I would now like to explore some derivations of this approach, taking advantage of graph theory. The graph of the co-occurrences is really huge and complex: how could we reduce its complexity without big information loss? The Kcore…
Feb 29, 2012
Cristian Mesiano posted a blog post

### Document Classification: latent semantic vs bag of words. Who is the best?

A few posts ago we saw an approach to extract meta "concepts" from text based on the latent semantic paradigm. In this post we apply this approach to classify documents, and we compare it with the canonical bag of words. The comparison test will be done through the ensemble method already shown in the last post. The…
Feb 20, 2012
Cristian Mesiano's blog post was featured

### Document Classification: how to boost your classifier

AdaBoost.M1 tries to improve, step by step, the accuracy of the classifier by analyzing its behavior on the training set. (Of course you cannot try to improve the classifier by working with the test set!) Here lies the problem, because if we choose an SVM as the "weak algorithm", we know that it almost always returns excellent accuracy on the training set, with results close to 100% (in terms of true positives). In this scenario, try to improve the accuracy of the classifier by assigning different weights to the…
Jan 30, 2012
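To make the reweighting problem in the excerpt above concrete, here is a minimal sketch of a single AdaBoost.M1 round, following the standard Freund and Schapire update (weighted error eps, factor beta = eps / (1 - eps) applied to correctly classified samples, then renormalisation). The function name and the four-sample example are hypothetical, and returning `None` for a perfect weak learner is a choice made for this sketch, not something stated in the post.

```python
def adaboost_m1_reweight(weights, correct):
    """One AdaBoost.M1 round.

    weights: current sample weights (summing to 1).
    correct: booleans, True where the weak learner classified the sample correctly.
    Returns (new_weights, beta), or (None, 0.0) if there is nothing to reweight.
    """
    # Weighted training error of the weak learner.
    eps = sum(w for w, ok in zip(weights, correct) if not ok)
    if eps == 0.0:
        # Perfect on the training set: every weight would be multiplied by 0,
        # so boosting has no misclassified samples to focus on.
        return None, 0.0
    if eps > 0.5:
        raise ValueError("weak learner is worse than chance; AdaBoost.M1 stops")
    beta = eps / (1.0 - eps)
    # Shrink the weights of correctly classified samples, keep the others.
    new = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], beta


# One mistake out of four: the misclassified sample ends up with half the mass.
weights, beta = adaboost_m1_reweight([0.25] * 4, [True, True, True, False])

# A weak learner with 100% training accuracy leaves nothing to reweight.
perfect, _ = adaboost_m1_reweight([0.25] * 4, [True] * 4)
```

The second call shows the situation the excerpt describes: when the weak learner already fits the training set almost perfectly, the weight update has little or nothing to act on.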
Cristian Mesiano posted a blog post

### Extract meta concepts through co-occurrences analysis and graph theory

... So what I did is the following (be aware that this is not the formal implementation of LSA!):

• Filter and take the base form of the words as usual.
• Build the multidimensional sparse matrix of the co-occurrences.
• I calculated for each instance how frequently it is found in the corpus.
• I calculated for each instance how frequently it is found in the doc.
• I weighted this TF-IDF, also considering the distance among the co-occurrences.

In this way we are able to rank all co-occurrences and set a threshold to…
Jan 13, 2012
Cristian Mesiano posted a blog post

### Clustering algorithm to approximate functions

The strategy is very easy to describe:

1. Divide the domain of your function into k sub-intervals.
2. Initialize k monomials.
3. Consider the monomials as centroids of your clustering algorithm.
4. Assign the points of the function to the monomials according to the clustering algorithm.
5. Use gradient descent to adjust the parameters of each monomial.
6. Go to 4 until the accuracy is good enough.
Dec 15, 2011
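The six-step strategy listed above can be sketched roughly as follows. This is a simplified illustration under several assumptions that are not in the post: every monomial has the form c * x**degree with a fixed degree, only the coefficient c is learned, points are assigned to the monomial with the smallest absolute residual, and a fixed number of epochs replaces the "accuracy is good enough" test. All names and the target function are hypothetical.

```python
def fit_monomials(xs, ys, k=2, degree=1, lr=0.5, epochs=200):
    """Fit k monomials c_j * x**degree, one per cluster of sample points."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k
    # Steps 1 and 3: start with one cluster per sub-interval of the domain.
    labels = [min(int((x - lo) / width), k - 1) for x in xs]
    # Step 2: initialise the k monomial coefficients.
    coeffs = [0.0] * k
    for _ in range(epochs):
        # Step 5: one gradient-descent step on the mean squared error of each monomial.
        for j in range(k):
            pts = [(x, y) for x, y, lab in zip(xs, ys, labels) if lab == j]
            if not pts:
                continue  # empty cluster: leave its coefficient unchanged
            grad = -2.0 * sum(
                x**degree * (y - coeffs[j] * x**degree) for x, y in pts
            ) / len(pts)
            coeffs[j] -= lr * grad
        # Step 4: reassign each point to the monomial with the smallest residual.
        labels = [
            min(range(k), key=lambda j: abs(y - coeffs[j] * x**degree))
            for x, y in zip(xs, ys)
        ]
    return coeffs, labels


# Hypothetical target: y = 2 * |x| sampled on [-1, 1], approximated by two linear monomials.
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * abs(x) for x in xs]
coeffs, labels = fit_monomials(xs, ys)
```

On this toy target the two coefficients settle near -2 and 2, one monomial per branch of the function. Learning the exponent as well, or using a tolerance-based stopping rule, would be natural extensions of the same loop.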
Cristian Mesiano posted a blog post

### Power Real Polynomial to approximate functions: The Gradient Method

In the real world a problem can rarely be solved using just a single algorithm; more often a solution is a chain of algorithms where the output of one is the input of the next. But quite often machine learning algorithms return extremely complex functions that don't fit directly into the next step of your strategy. In these conditions the trick of function approximation is really helpful, that is, we reduce the complexity of our original model…
Dec 8, 2011

## Profile Information

Short Bio:
I hold a Master's degree in CS and I work in an Enterprise Content Management department at a multinational company (reinsurance focused). Our tasks focus on automatic document classification, document data extraction, and selective search. We mix the latest algorithms with business requirements to offer real solutions for demanding content consumers!
My Website or LinkedIn Profile (URL):
http://textanddatamining.blogspot.com/
Field of Expertise:
Business Analytics, Predictive Modeling, Data Mining, Econometrics, Statistical Programming, Artificial Intelligence
Years of Experience in Analytical Role:
8
Professional Status:
Other
Interests:
Networking
What is your Favorite Data Mining or Analytical Website?
http://textanddatamining.blogspot.com/
Swiss RE

## Cristian Mesiano's Blog

### Document Clustering and Graph Clustering: graph entropy as linkage function

Let's continue our discussion about the applications of the graph entropy concept.

Today I'm going to show how we can reuse the same concept for document clustering.

What I want to highlight is that through such a methodology it's possible to:

1. extract from a document the relevant words (as discussed …

Posted on November 17, 2012 at 1:35am

### Key words through graph entropy Hierarchical clustering

In the last post I showed how to extract key words from a text through a principle called graph entropy.

Today I'm going to show another application of the graph entropy in order to extract clusters of key words.

Why

The key words of a document depict the main topic of the content, but if the document is big, often, there are many different sub topics related to the…

Posted on October 24, 2012 at 11:34am

### Graph Entropy to extract relevant words

I would like to share with you some early results of research I'm doing in the field of "graph entropy" applied to text mining problems.

Why is graph entropy so important?

Based on the main concept of entropy, the following assumptions are true:

• The entropy of a graph should be a functional of the…

Posted on September 24, 2012 at 2:39pm

### Function minimization: Simulated Annealing led by variance criteria vs Nelder Mead

Most data mining problems can be reduced to a minimization/maximization problem.

Examples
Let's consider easy scenarios where the cost function depends on just two parameters.…

Posted on August 12, 2012 at 3:09am

## Comment Wall (1 comment)


At 1:23pm on August 31, 2011, Marco Santambrogio said…

Hi Cristian.

Well, advice on this topic is always a delicate matter.

On text mining, I would suggest closely following the developments at SAS and Expert System.

In general, analytics is developing little by little, and not only in finance and telco.
