The Data Dictionary of a PMML model requires quite a bit of metadata for each field. With sparse, high-dimensional data, the Data Dictionary can be many times larger than either the training data or the trained model itself. Has anyone developed a standard extension to the PMML syntax that, for instance, simply declares that all fields share the same metadata?
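To make the size concern concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It builds a minimal `DataDictionary` with one `DataField` per feature and reports the serialized size. The field names `x0`, `x1`, … and the choice of `optype="continuous"` / `dataType="double"` for every field are hypothetical and stand in for the identical per-field metadata described above. This is not a complete PMML document: it has no `PMML` root, header, or namespace.

```python
import xml.etree.ElementTree as ET


def build_data_dictionary(n_fields: int) -> ET.Element:
    """Build a minimal DataDictionary with one DataField per feature.

    Field names x0..x{n-1} are hypothetical. Every field carries the same
    metadata (continuous, double), which is the redundancy in question.
    """
    dd = ET.Element("DataDictionary", numberOfFields=str(n_fields))
    for i in range(n_fields):
        ET.SubElement(dd, "DataField", name=f"x{i}",
                      optype="continuous", dataType="double")
    return dd


def serialized_size(elem: ET.Element) -> int:
    """Return the size in bytes of the element serialized as UTF-8 XML."""
    return len(ET.tostring(elem, encoding="utf-8"))


if __name__ == "__main__":
    # Size grows linearly with the number of features, regardless of how
    # sparse the actual data rows are.
    for n in (10, 1_000, 100_000):
        print(n, serialized_size(build_data_dictionary(n)))
```

Each `DataField` costs roughly 60 bytes here, so a dictionary with hundreds of thousands of features can run to many megabytes even when each training row has only a handful of non-zero values.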