Unlabeled data? No problem. We'll hallucinate some new data and slap on the best-guess labels.
A powerful combination of generative models and self-training techniques.
Generative Models: learn the data distribution (e.g., GANs, VAEs)
Pseudo-Labeling: assigns labels to unlabeled data
Everything you need to know about generative models with pseudo-labeling
Pseudo-labeling is a semi-supervised learning technique where a model is first trained on labeled data, then used to predict labels for unlabeled data. These predicted labels (called "pseudo-labels") are then used to augment the training set, allowing the model to learn from both labeled and unlabeled data. The process is typically iterative, with the model being retrained on the expanded dataset to improve its performance.
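The loop described above can be sketched in a few lines. This is a minimal, self-contained illustration: a nearest-centroid "model" stands in for whatever classifier you actually use, and the toy data, 0.9 threshold, and 3 iterations are illustrative choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: well-separated Gaussian blobs (stand-in for real features).
X0 = rng.normal(-2.0, 1.0, size=(250, 2))
X1 = rng.normal(+2.0, 1.0, size=(250, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 250 + [1] * 250)

X_lab, y_lab = X[::10], y[::10]                          # small labeled subset
X_unlab = np.delete(X, np.arange(0, 500, 10), axis=0)    # unlabeled pool

def fit_centroids(Xt, yt):
    """A minimal nearest-centroid 'model' standing in for any classifier."""
    return np.stack([Xt[yt == c].mean(axis=0) for c in (0, 1)])

def predict_proba(centroids, Xq):
    """Softmax over negative distances gives rough class probabilities."""
    d = np.linalg.norm(Xq[:, None, :] - centroids[None, :, :], axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

centroids = fit_centroids(X_lab, y_lab)          # 1) train on labeled data
for _ in range(3):                               # 3) iterate
    probs = predict_proba(centroids, X_unlab)
    keep = probs.max(axis=1) >= 0.9              # confidence filter
    pseudo_y = probs.argmax(axis=1)              # 2) pseudo-labels
    # retrain on labeled + confidently pseudo-labeled samples
    X_train = np.vstack([X_lab, X_unlab[keep]])
    y_train = np.concatenate([y_lab, pseudo_y[keep]])
    centroids = fit_centroids(X_train, y_train)
```

In a real pipeline the retraining step would train from scratch (or fine-tune) on the expanded dataset each round; the confidence filter is what keeps wrong pseudo-labels from compounding.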
Key aspects of pseudo-labeling: the model is trained on labeled data first, its predictions on unlabeled data are filtered by confidence before being trusted, and training is repeated on the expanded dataset.
Generative models enhance pseudo-labeling in several key ways: by learning the data distribution, they can synthesize additional samples for the model to label, and their learned representations can be reused to improve label quality.
Popular combinations include Semi-Supervised GANs (where the discriminator also classifies real samples) and VAEs with pseudo-labeled latent representations.
MixMatch and FixMatch are both state-of-the-art semi-supervised learning techniques that combine consistency regularization with pseudo-labeling, but with some key differences:
| Feature | MixMatch | FixMatch |
|---|---|---|
| Core Approach | Mixes labeled and unlabeled data with MixUp augmentation | Uses weak and strong augmentations with consistency |
| Pseudo-Labeling | Sharpens label distribution from multiple augmentations | Uses model predictions on weakly augmented samples |
| Augmentation | Standard augmentations + MixUp | Weak (flip/shift) vs. strong (RandAugment + Cutout) |
| Confidence Threshold | None (uses all predictions) | Only keeps predictions above threshold (typically 0.95) |
| Complexity | More complex (temperature sharpening, MixUp) | Simpler and often more effective |
FixMatch generally achieves better performance with less hyperparameter tuning, making it more popular in practice.
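The FixMatch selection rule from the table is simple to state in code. This sketch assumes you already have class probabilities from the model on *weakly* augmented inputs; the function name and the example array are illustrative.

```python
import numpy as np

def fixmatch_pseudo_labels(weak_probs, threshold=0.95):
    """FixMatch-style selection: take hard pseudo-labels from predictions on
    weakly augmented samples, keeping only those above the confidence
    threshold. `weak_probs` is an (N, K) array of class probabilities."""
    confidence = weak_probs.max(axis=1)
    mask = confidence >= threshold           # only confident predictions count
    labels = weak_probs.argmax(axis=1)
    return labels, mask

probs = np.array([[0.97, 0.02, 0.01],       # confident -> kept
                  [0.50, 0.30, 0.20]])      # unconfident -> masked out
labels, mask = fixmatch_pseudo_labels(probs)
# labels -> [0, 0], mask -> [True, False]
```

The unsupervised loss is then cross-entropy between these hard labels and the model's predictions on *strongly* augmented versions of the same inputs, computed only where the mask is True.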
Choosing the right confidence threshold for pseudo-labeling involves balancing quantity and quality:
Pro Tip: Start with a high threshold (e.g., 0.95) to keep early pseudo-labels clean, then gradually lower it as training progresses and the model's predictions become more reliable, admitting more of the unlabeled pool.
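One simple way to implement this tip is a linear threshold schedule. The start/end values and linear shape here are assumptions for illustration; tune them for your task.

```python
def threshold_schedule(epoch, total_epochs, start=0.95, end=0.80):
    """Linearly relax the confidence threshold over training.
    Illustrative schedule; start/end are assumptions, not fixed best practice."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * frac

# For a 5-epoch run: 0.95, 0.9125, 0.875, 0.8375, 0.80
```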
Generative models with pseudo-labeling work particularly well with:
Note: The approach can still work for less ideal data types but may require more careful tuning of parameters and potentially more iterations.
Evaluating pseudo-label quality is crucial for successful semi-supervised learning. Here are several methods:
- **Held-out validation:** Maintain a small labeled validation set to track whether adding pseudo-labels improves or harms performance.
- **Confidence distribution:** Plot the distribution of prediction confidences. A healthy distribution shows most high-confidence predictions are correct.
- **Embedding visualization:** Use t-SNE or UMAP to visualize how pseudo-labeled points cluster with true labeled points.
- **Manual spot checks:** Randomly sample and inspect pseudo-labels, especially for critical applications.
- **Teacher-student agreement:** In Noisy Student approaches, measure how often teacher and student models agree on pseudo-labels.
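Two of these checks reduce to one-liners over the models' probability outputs. A sketch, assuming (N, K) probability arrays are already available; the function names are illustrative:

```python
import numpy as np

def agreement_rate(teacher_probs, student_probs):
    """Fraction of samples where teacher and student predict the same class
    (a Noisy-Student-style sanity check on (N, K) probability arrays)."""
    return float((teacher_probs.argmax(1) == student_probs.argmax(1)).mean())

def confidence_histogram(probs, bins=10):
    """Histogram of max-class confidences; a spike near 1.0 with few
    mid-range values is usually a healthy sign."""
    counts, edges = np.histogram(probs.max(axis=1), bins=bins, range=(0.0, 1.0))
    return counts, edges
```

A sharp drop in `agreement_rate` between iterations, or a confidence histogram with heavy mass in the 0.5-0.8 range, is a signal to raise the threshold or re-inspect the pseudo-labels manually.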