Synthetic Data Generator
Basic Information
The Synthetic Data Generator (SDG) is an open-source framework for creating high-quality synthetic tabular data that preserves the statistical properties of the original dataset without exposing the sensitive information it contains. It is designed for use cases such as data sharing, model training, debugging, system development, and testing, where privacy-safe replicas of real data are needed. The project provides models, data connectors, a Synthesizer API, example workflows, and Colab demos. It can generate data from metadata alone when no training data is available, and it includes memory optimizations for handling large-scale datasets. It is distributed as a prebuilt Docker image and as a PyPI package, and the codebase is accompanied by documentation, benchmarks, and contribution guidance. The project is licensed under Apache-2.0.
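To make "preserves statistical properties" concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not SDG's API or its models: the function names `fit_columns` and `sample_rows` and the toy table are hypothetical, and the approach (independent per-column sampling) is a deliberately simple stand-in that ignores correlations between columns, which real synthesizers try to capture.

```python
import random
import statistics


def fit_columns(rows):
    """Summarise each column: mean/stdev for numeric columns,
    the list of observed values for categorical ones.
    (Hypothetical helper, not part of SDG.)"""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            summary[name] = ("numeric", statistics.mean(values), statistics.stdev(values))
        else:
            summary[name] = ("categorical", values)
    return summary


def sample_rows(summary, n, seed=0):
    """Draw n synthetic rows from the per-column summary.
    Each column is sampled independently, so no original row is copied,
    but cross-column relationships are NOT preserved."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for name, spec in summary.items():
            if spec[0] == "numeric":
                _, mean, stdev = spec
                row[name] = rng.gauss(mean, stdev)
            else:
                # Sampling from the observed values keeps category frequencies.
                row[name] = rng.choice(spec[1])
        synthetic.append(row)
    return synthetic


# Toy "original" table, invented for illustration only.
original = [
    {"age": 34, "plan": "basic"},
    {"age": 45, "plan": "pro"},
    {"age": 29, "plan": "basic"},
    {"age": 52, "plan": "pro"},
    {"age": 41, "plan": "basic"},
    {"age": 38, "plan": "basic"},
]

summary = fit_columns(original)
synthetic = sample_rows(summary, 1000)

original_mean = statistics.mean(row["age"] for row in original)
synthetic_mean = statistics.mean(row["age"] for row in synthetic)
print(f"original mean age:  {original_mean:.2f}")
print(f"synthetic mean age: {synthetic_mean:.2f}")
```

Note that `sample_rows` needs only the summary, not the original rows, so the sensitive records do not have to travel with the generator. A production tool would also need to model dependencies between columns and check how much the output still reveals about individuals.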