One of the questions I get all the time has to do with pre- and post-processing of raw data in PMML. If you need to massage your data before you send it to a scoring engine, the engine is not being used to its full potential. What people often don't realize is that PMML supports commonly used data transformations, which can therefore be represented together with the model. For production use, all that needs to be done is to present the scoring engine with the raw input data: the data can be assembled at the source, and the scoring engine massages and scores it. PMML also allows for post-processing of model output. For example, its Targets element allows for score calibration or scaling.
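
As a rough illustration of the idea, a PMML fragment along these lines could carry both a pre-processing step and a post-processing step inside the model itself. This is only a sketch: the field names (age, age_norm, risk_score) and all numeric values are made up for the example, and the two elements would sit inside a model element such as RegressionModel, which is omitted here.

```xml
<!-- Sketch only: field names and numbers are hypothetical. -->

<!-- Post-processing: rescale the raw prediction as
     rescaleFactor * prediction + rescaleConstant, then round to an integer. -->
<Targets>
  <Target field="risk_score" rescaleFactor="550" rescaleConstant="300" castInteger="round"/>
</Targets>

<!-- Pre-processing: linearly map the raw input "age" from [18, 90] to [0, 1]. -->
<LocalTransformations>
  <DerivedField name="age_norm" optype="continuous" dataType="double">
    <NormContinuous field="age">
      <LinearNorm orig="18" norm="0"/>
      <LinearNorm orig="90" norm="1"/>
    </NormContinuous>
  </DerivedField>
</LocalTransformations>
```

With something like this in place, the caller would send the raw age value and the scoring engine would take care of the normalization and of the scaling of the output.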


Replies to This Discussion

In the work that I do, the operational implementation of the pre- and post-processing of the data is typically much more effort than the operational implementation of the predictive model. The ability to export an operational specification as PMML (or some other executable language) from the development environment to the deployment environment would be a great advantage.

Is there a "PMML for Dummies" somewhere that explains and gives examples of the pre- and post-processing functionality?
I don't know of any "PMML for Dummies" as of now ... but my guess is that it would not be very difficult to assemble a document with examples for all the data transformations that can take place in PMML (pre- and post-processing). Maybe it is even something I could add to the PMML Knol ... or something the DMG would be interested in providing on their website. I will bring the issue up.
I just finished writing a primer on data pre-processing in PMML. It contains several examples of how to massage data in PMML. For example, it shows how to group small and large sets of data, how to represent complex arithmetic expressions ... and much more.

You can find it by clicking HERE. Suggestions are welcome on how it can be improved. Thanks!
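
To give a flavor of these kinds of transformations, here is a small sketch of grouping values into bins and of a nested arithmetic expression. It is not taken from the primer. The field names (age, income, debt, age_group, net_ratio) and the bin boundaries are hypothetical and chosen only for illustration.

```xml
<!-- Sketch only: field names and bin boundaries are hypothetical. -->
<TransformationDictionary>

  <!-- Grouping: bin the continuous field "age" into two categories. -->
  <DerivedField name="age_group" optype="categorical" dataType="string">
    <Discretize field="age" defaultValue="unknown">
      <DiscretizeBin binValue="under_30">
        <Interval closure="closedOpen" leftMargin="0" rightMargin="30"/>
      </DiscretizeBin>
      <DiscretizeBin binValue="30_and_over">
        <Interval closure="closedOpen" leftMargin="30"/>
      </DiscretizeBin>
    </Discretize>
  </DerivedField>

  <!-- Arithmetic: net_ratio = (income - debt) / income, built from nested Apply elements. -->
  <DerivedField name="net_ratio" optype="continuous" dataType="double">
    <Apply function="/">
      <Apply function="-">
        <FieldRef field="income"/>
        <FieldRef field="debt"/>
      </Apply>
      <FieldRef field="income"/>
    </Apply>
  </DerivedField>

</TransformationDictionary>
```

Discretize handles continuous inputs. For grouping categorical values, PMML offers the MapValues element, which maps input values to output values through an inline table.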


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
