We have 2+ million products that need to be assigned to an already defined taxonomy (an online e-commerce catalog).
I suspect the best approach would be a Bayesian classifier.
We would really like an open-source solution. It needs to run on our own backend, not through an ASP (application service provider) model.
Any info would be greatly appreciated.
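A minimal sketch of the multinomial naive Bayes approach suggested above, in plain Python with no third-party libraries. It assumes each product has a text field (such as its title) and that some products are already labeled with taxonomy categories to train on. The class name, the tokenizer and the sample titles and category paths are hypothetical illustrations, not part of any particular library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase, keep runs of letters/digits only.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split a product title/description into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing.

    Training only updates counts, one (text, category) pair at a time,
    so labeled rows can be streamed instead of held in memory.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.doc_counts = Counter()               # products seen per category
        self.word_counts = defaultdict(Counter)   # token counts per category
        self.total_words = Counter()              # total tokens per category
        self.vocab = set()                        # all tokens seen in training

    def update(self, text, category):
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.total_words[category] += len(tokens)
        self.vocab.update(tokens)

    def fit(self, rows):
        """rows: any iterable of (text, category) pairs, e.g. a DB cursor."""
        for text, category in rows:
            self.update(text, category)
        return self

    def predict(self, text):
        if not self.doc_counts:
            raise ValueError("model has not been trained")
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_cat, best_score = None, -math.inf
        for cat, n in self.doc_counts.items():
            # Sum log-probabilities to avoid floating-point underflow.
            score = math.log(n / n_docs)  # log prior P(category)
            denom = self.total_words[cat] + self.alpha * v
            counts = self.word_counts[cat]
            for tok in tokens:
                # Smoothed log P(token | category); unseen tokens get alpha.
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_cat, best_score = cat, score
        return best_cat


# Usage with made-up training data:
train = [
    ("Apple iPhone 13 smartphone 128GB", "Electronics > Phones"),
    ("Samsung Galaxy smartphone unlocked", "Electronics > Phones"),
    ("Men's cotton crew neck t-shirt", "Apparel > Shirts"),
    ("Women's linen button-down shirt", "Apparel > Shirts"),
]
model = NaiveBayesCategorizer().fit(train)
print(model.predict("unlocked smartphone 64GB"))  # prints "Electronics > Phones"
```

Because training only increments counters, memory grows with vocabulary size times the number of categories rather than with the number of products, so the 2+ million rows can be streamed from a file or database cursor. For production use, an established open-source implementation such as scikit-learn's `MultinomialNB` (which supports incremental training through `partial_fit`) is probably a safer choice than hand-rolled code.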
Has anyone actually used Weka on a large dataset? It seems to throw OutOfMemoryError (Java heap space) no matter how much memory the machine has, and regardless of how much heap you give it on the command line (e.g. with -Xmx). The heap settings do not seem to carry over to things like the evaluation classes. Using it has been a very frustrating experience.
The same is true for RWeka.