Easy text extraction of SEC filings (DEF-14, 10-K), patents, or any other semi-structured document. Demo at http://www.text2data.net

Executable demo at http://www.text2data.net/index.html

I am fairly new to text mining. I found the "document extraction" problem interesting, especially for SEC documents, and tackled it in a generic way that can be applied to any document written in Latin characters. I think generic text mining of documents has practical use, but I don't really know how satisfactorily it has been solved, so I would like to hear your views.

While doing this, I may have rudely trampled on any number of conventional approaches, but the end result (the depth and breadth of extraction) seemed decent enough.

Maybe posting to this forum is like carrying coal to Newcastle, but I was encouraged by the general interest in the tool from financial and text mining companies. When I visited the websites of a few generic text mining products, I did not find any special features that were not already present, or could not be added later, in the demo.

Hope you like it.

Tags: 10-K, DEF 14, document, mining, SEC, structured, text extraction, text mining

Comment by Kinshuk Adhikary on June 3, 2009 at 8:22pm
Hi Parag,
To be frank, the USPTO abstract shown in the demo is a trivial case; the financial documents are much more challenging. However, I think the miner can do far more complicated things with patent abstracts, which I didn't have the domain knowledge to configure. For example, the individual claims could be categorized by, say, "linkedness", or specific patterns could be extracted. Only a patent lawyer would know the best ways to extract; the tool can only supply the mechanics.
Comment by parag kulkarni on June 3, 2009 at 8:25am
Hi,
I am particularly involved and interested in patent analysis, where text mining plays a big role. I also do some competitive intelligence. Since I am new to text mining, I found your thread interesting. Thanks a lot for sharing.

© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC