Extract meta concepts through co-occurrence analysis and graph theory

....
So what I did is the following (be aware that this is not the formal implementation of LSA!):
  1. Filter the words and reduce them to their base form, as usual.
  2. Build the multidimensional sparse matrix of the co-occurrences.
  3. For each co-occurrence, calculate its frequency in the corpus.
  4. For each co-occurrence, calculate its frequency in the document.
  5. Weight the resulting TF-IDF score by the distance between the co-occurring words.
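
The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code used for the post: the function name `cooccurrence_scores`, the `window` size, the `1 / distance` decay, and the smoothed IDF formula are all hypothetical choices, and the input is assumed to be already filtered and reduced to base forms (step 1).

```python
import math
from collections import Counter, defaultdict


def cooccurrence_scores(docs, window=4):
    """Score word pairs across a corpus.

    docs   -- list of documents, each a list of already-normalized tokens
    window -- maximum token distance for two words to count as co-occurring
              (an assumed parameter, not taken from the original post)
    """
    per_doc_weights = []   # distance-weighted pair frequency inside each document
    doc_freq = Counter()   # number of documents in which each pair appears

    for tokens in docs:
        weights = defaultdict(float)
        for i, left in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                right = tokens[j]
                if left == right:
                    continue
                pair = tuple(sorted((left, right)))
                # Closer words contribute more: assumed 1/distance decay.
                weights[pair] += 1.0 / (j - i)
        per_doc_weights.append(weights)
        doc_freq.update(weights.keys())

    n_docs = len(docs)
    scores = defaultdict(float)
    for weights in per_doc_weights:
        for pair, weight in weights.items():
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + n_docs) / (1 + doc_freq[pair])) + 1.0
            scores[pair] += weight * idf
    return dict(scores)


docs = [
    ["graph", "theory", "concept"],
    ["graph", "theory"],
    ["concept", "mining"],
]
scores = cooccurrence_scores(docs)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A dictionary keyed by word pair plays the role of the sparse matrix here: only pairs that actually occur are stored.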

In this way we are able to rank all co-occurrences and set a threshold to discard items having low rank.
In the last step I built a graph where I linked the co-occurrences.
As you can see in the following examples, the graphs are initially pretty complex, so to refine the results I applied a filter based on the number of connected components in the graph.
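
To make the thresholding, graph building, and filtering concrete, here is a minimal sketch using only the Python standard library. The excerpt does not spell out the exact filtering rule, so this shows one possible reading: drop every connected component smaller than a minimum size. The function names, the `threshold` and `min_size` values, and the toy scores are all made up for illustration.

```python
from collections import defaultdict


def build_graph(scores, threshold):
    """Link the two words of every pair whose score reaches the threshold."""
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components of an undirected graph as sets of nodes."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


def filter_small_components(graph, min_size=3):
    """Keep only nodes that belong to a component with at least min_size nodes."""
    keep = set()
    for component in connected_components(graph):
        if len(component) >= min_size:
            keep |= component
    return {node: nbrs & keep for node, nbrs in graph.items() if node in keep}


# Toy scores, invented for this example only.
toy_scores = {
    ("graph", "theory"): 2.5,
    ("graph", "node"): 1.8,
    ("node", "theory"): 1.2,
    ("blog", "post"): 1.1,
    ("member", "views"): 0.2,
}
graph = build_graph(toy_scores, threshold=1.0)
filtered = filter_small_components(graph, min_size=3)
```

With these toy values the low-scoring pair is discarded by the threshold, and the isolated two-word component is removed by the size filter, leaving only the densely linked group of words.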
To read the entire post, visit my blog at:
Results before filtering:
Results after filtering:


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
