The need for synthetically generated data is growing rapidly as the size of enterprise applications increases. Situations requiring this technology include regression testing of database applications, data mining applications, and the need to supply "realistic but not real" data for third-party application development. The common approach today is to build special-purpose data generators by hand for specific data sets. This dissertation describes a general-purpose synthetic data generation framework, which significantly speeds up the process of describing and generating synthetic data. The framework has two parts:

- SDDL, a language capable of describing complex data sets.
- SDG, a generation engine that supports parallel data generation.

Related theory in the areas of the relational model, E-R diagrams, randomness and data obfuscation is explored. Finally, the power and flexibility of the SDG/SDDL framework are demonstrated by applying it to a collection of applications.

Excerpts: "These referential integrity constraints could be enforced in SDDL with query pools as follows: <table name="Sales" length= ... table. For example, consider the following E-R diagram: Figure 5-1: E-R Diagram for "Sells In" Relation. In Figure ..."
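The listing mentions two technical ideas: enforcing referential integrity with query pools, and parallel generation in SDG. The excerpt does not show enough SDDL syntax to reproduce, so the sketch below illustrates both ideas in plain Python instead. It is not SDDL or SDG code. The `Customers` table, the `Sales` columns, and every function name are hypothetical.

The sketch makes two assumptions:

- **Referential integrity.** A "query pool" is treated as the list of parent keys that already exist. Each foreign key in `Sales` is drawn from that list.
- **Parallel generation.** The child table is split into contiguous row ranges. Each range gets its own seeded generator, so partitions can be produced independently and still be reproducible.

```python
import random

# Hypothetical attribute domain for the parent table
REGIONS = ["North", "South", "East", "West"]


def generate_customers(n, seed=0):
    """Generate a hypothetical parent table with unique keys 1..n."""
    rng = random.Random(seed)
    return [{"cust_id": i, "region": rng.choice(REGIONS)} for i in range(1, n + 1)]


def generate_sales_partition(part, n_parts, total, cust_ids, base_seed=100):
    """Generate the Sales rows assigned to one partition.

    Rows are split into contiguous sale_id ranges. Each partition uses its own
    seeded generator, so each call is independent of the others and returns
    the same rows on every run.
    """
    start = part * total // n_parts + 1
    end = (part + 1) * total // n_parts
    rng = random.Random(base_seed + part)
    rows = []
    for sale_id in range(start, end + 1):
        rows.append({
            "sale_id": sale_id,
            # Foreign key drawn only from keys that exist in the parent table
            # (the assumed query-pool idea), so no dangling references arise.
            "cust_id": rng.choice(cust_ids),
            "amount": round(rng.uniform(1.0, 500.0), 2),
        })
    return rows


def check_referential_integrity(sales, customers):
    """Return True if every Sales.cust_id refers to an existing customer."""
    keys = {c["cust_id"] for c in customers}
    return all(s["cust_id"] in keys for s in sales)


if __name__ == "__main__":
    customers = generate_customers(10)
    cust_ids = [c["cust_id"] for c in customers]
    sales = []
    # Each partition could be dispatched to a separate worker; here they run in sequence.
    for p in range(4):
        sales.extend(generate_sales_partition(p, 4, 25, cust_ids))
    print(len(sales), check_referential_integrity(sales, customers))  # prints "25 True"
```

Seeding per partition is what allows independent workers to generate disjoint slices without coordinating. In practice the slices would go to separate processes or machines rather than a sequential loop.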
Title: Synthetic Data Generation: Theory, Techniques and Applications
Publisher: ProQuest, 2008