Overview
Machine learning tasks typically require extensive setup, from finding the right frameworks to configuring training pipelines. RepoMaster automates this process by discovering and orchestrating ML repositories from GitHub to solve your AI tasks.
What RepoMaster Can Do
Model Training: Train image classifiers, NLP models, and more using pre-built architectures
Transfer Learning: Fine-tune pre-trained models on your custom datasets
Data Preparation: Load, preprocess, and augment training data automatically
Inference: Run predictions on new data using trained models
Model Evaluation: Generate metrics, confusion matrices, and performance reports
Experiment Tracking: Track hyperparameters and results across multiple runs
How It Works
Start RepoMaster in unified backend mode, then describe your ML task in natural language:
python launcher.py --mode backend --backend-mode unified
Example User Input:
Train an image classifier on CIFAR-10 dataset using transfer learning
Task Analysis
RepoMaster understands this requires:
Image classification framework
CIFAR-10 dataset loading
Transfer learning architecture (ResNet, VGG, etc.)
Training pipeline setup
Repository Discovery
Searches GitHub for:
PyTorch/TensorFlow implementations
CIFAR-10 training examples
Transfer learning tutorials
Model zoo repositories
Pipeline Setup
Downloads CIFAR-10 dataset
Loads pre-trained model weights
Configures data augmentation
Sets up the training loop with task-appropriate default hyperparameters
Execution & Monitoring
Trains model with progress tracking
Validates on test set
Saves best model checkpoint
Generates performance metrics
Results Delivery
Trained model weights
Training/validation curves
Test accuracy and confusion matrix
Sample predictions visualization
Real-World Example: Image Classification
From USAGE.md:
Task:
Train an image classifier on CIFAR-10 dataset using transfer learning
What RepoMaster Does
1. Repository Search & Selection
🔍 Searching for image classification repositories...
✅ Found 25+ relevant repositories
📊 Top candidates:
- pytorch/vision (official PyTorch models)
- huggingface/pytorch-image-models (timm)
- tensorflow/models (TensorFlow model garden)
- keras-team/keras-applications
✅ Selected: huggingface/pytorch-image-models
Reason: Modern architectures + easy transfer learning
2. Environment Setup
📦 Setting up ML environment...
✅ Created virtual environment
✅ Installed: torch, torchvision, timm, matplotlib
✅ GPU detected: NVIDIA RTX 4090 (24GB)
3. Dataset Preparation
📊 Loading CIFAR-10 dataset...
✅ Downloading: 170MB
✅ Train set: 50,000 images (10 classes)
✅ Test set: 10,000 images
✅ Applied augmentation: RandomCrop, RandomHorizontalFlip, Normalize
4. Model Configuration
🧠 Configuring transfer learning...
✅ Base model: ResNet-50 (pre-trained on ImageNet)
✅ Modified final layer: 1000 → 10 classes
✅ Optimizer: AdamW (lr=0.001)
✅ Scheduler: CosineAnnealingLR
✅ Loss: CrossEntropyLoss
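The scheduler named in this log lowers the learning rate along a cosine curve. A minimal sketch of that curve in plain Python, assuming the decay spans all 20 epochs and ends at a minimum rate of 0 (the log states neither value); the function name `cosine_annealing_lr` is hypothetical:

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 1e-3,
                        total_epochs: int = 20, min_lr: float = 0.0) -> float:
    """Closed-form cosine annealing: base_lr at epoch 0, min_lr at total_epochs."""
    cosine = (1.0 + math.cos(math.pi * epoch / total_epochs)) / 2.0
    return min_lr + (base_lr - min_lr) * cosine


# Learning rate at the start of each epoch, plus the final value after epoch 20.
schedule = [cosine_annealing_lr(epoch) for epoch in range(21)]
```

Under these assumptions the rate starts at 0.001, passes 0.0005 halfway through, and reaches 0 at the end. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=20`, stepped once per epoch, should trace the same shape.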
5. Training
🚀 Starting training...
Epoch 1/20:
Train: 100%|████████| 391/391 [02:15<00:00]
Loss: 1.234 | Acc: 65.4%
Val Loss: 0.892 | Val Acc: 72.1%
Epoch 5/20:
Train: 100%|████████| 391/391 [02:12<00:00]
Loss: 0.453 | Acc: 84.2%
Val Loss: 0.412 | Val Acc: 86.3%
Epoch 10/20:
Train: 100%|████████| 391/391 [02:11<00:00]
Loss: 0.234 | Acc: 91.8%
Val Loss: 0.298 | Val Acc: 90.5%
...
Epoch 20/20:
Train: 100%|████████| 391/391 [02:10<00:00]
Loss: 0.089 | Acc: 96.7%
Val Loss: 0.256 | Val Acc: 92.1%
✅ Training complete! Best val accuracy: 92.1% (epoch 20)
6. Evaluation & Results
📊 Evaluating on test set...
✅ Test Accuracy: 91.8%
Per-class accuracy:
airplane: 93.2%
automobile: 94.1%
bird: 87.3%
cat: 84.5%
deer: 91.2%
dog: 86.7%
frog: 93.8%
horse: 92.4%
ship: 94.6%
truck: 93.5%
💾 Saved:
- coding/cifar10_model.pth (best weights)
- coding/training_curves.png (loss/accuracy plots)
- coding/confusion_matrix.png
- coding/sample_predictions.png
✨ Task completed successfully!
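Per-class accuracy and the confusion matrix both come from the same table of true and predicted labels. The sketch below shows one way to compute them in plain Python. The labels are toy values for three classes, not data from the run above, and the helper names are illustrative:

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal count divided by row total, i.e. recall for each true class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy labels for three classes (hypothetical).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
class_acc = per_class_accuracy(cm)
overall_acc = sum(cm[i][i] for i in range(3)) / len(y_true)
```

When every class has the same number of test samples, overall accuracy equals the mean of the per-class accuracies; with unequal class sizes, as in this toy example, the two differ.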
Common AI/ML Use Cases
Computer Vision
Covers image classification, object detection, image segmentation, and style transfer.
Image Classification
Task: Train image classifier on custom dataset in images/ folder
with 5 categories: cats, dogs, birds, cars, flowers
Capabilities:
Automatic dataset splitting (train/val/test)
Data augmentation selection
Architecture recommendation
Hyperparameter tuning
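Automatic dataset splitting usually means a stratified split, so that each category keeps the same share in every subset. A minimal sketch, assuming a 70/15/15 ratio (the ratio used in the full pipeline example later on this page); `stratified_split` and the label list are hypothetical, and this is not necessarily how RepoMaster splits data:

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical label list: 100 images for each of the five categories.
labels = [c for c in ["cats", "dogs", "birds", "cars", "flowers"] for _ in range(100)]
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting per class rather than over the whole list prevents a rare category from ending up entirely in one subset.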
Object Detection
Task: Detect and count objects in surveillance video footage
using YOLO or similar detector
Features:
Pre-trained detector selection
Video frame processing
Bounding box visualization
Object tracking across frames
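One simple way to track objects across frames is to pair each box in the previous frame with the current box it overlaps most, measured by intersection over union (IoU). The sketch below is an illustrative baseline under that assumption, not the method of any particular detector repository; the box coordinates, threshold, and function names are made up for the example:

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(prev_boxes, curr_boxes, threshold=0.3):
    """Greedily pair each previous box with its best unmatched current box."""
    matches, used = [], set()
    for i, prev in enumerate(prev_boxes):
        best_j, best_iou = None, threshold
        for j, curr in enumerate(curr_boxes):
            score = iou(prev, curr)
            if j not in used and score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches


# Two objects that each move a few pixels between frames (hypothetical boxes).
prev_frame = [(10, 10, 50, 50), (100, 100, 140, 160)]
curr_frame = [(102, 104, 142, 164), (12, 11, 52, 51)]
pairs = match_detections(prev_frame, curr_frame)
```

Each pair is `(previous_index, current_index)`; boxes left unmatched would start new tracks or end old ones in a fuller tracker.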
Image Segmentation
Task: Segment medical images to identify tumor regions
Approaches:
U-Net architecture for medical imaging
Mask R-CNN for instance segmentation
Semantic segmentation models
Style Transfer
Task: Apply artistic style to images
This task is covered in detail in the Neural Style Transfer use case.
Natural Language Processing
Covers text classification, named entity recognition, text summarization, and question answering.
Text Classification
Task: Train sentiment classifier on movie reviews dataset
to predict positive/negative sentiment
What happens:
Finds transformer models (BERT, RoBERTa)
Tokenizes text data
Fine-tunes pre-trained model
Evaluates on test set
Named Entity Recognition
Task: Extract person names, organizations, and locations
from news articles
Tools:
spaCy NER models
Transformer-based NER (BERT-NER)
Custom entity training
Text Summarization
Task: Summarize long research papers into 3-sentence abstracts
Models:
BART for abstractive summarization
T5 for text-to-text generation
Extractive summarization with BERT
Question Answering
Task: Build QA system that answers questions about
documentation corpus
Approach:
Document embedding and indexing
Question encoder
Answer extraction/generation
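The three stages of the question answering approach (index the documents, encode the question, pick the answer passage) can be sketched with a toy retriever. Bag-of-words counts stand in here for a learned embedding model, which is a deliberate simplification; the snippets and function names are hypothetical:

```python
import math
import re
from collections import Counter


def embed(text):
    """Bag-of-words term counts; a stand-in for a learned text encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical documentation snippets.
docs = [
    "Install the package with pip and create a virtual environment.",
    "Set the API key in the environment file before launching.",
    "Training checkpoints are saved to the output directory.",
]
index = [(doc, embed(doc)) for doc in docs]  # document embedding and indexing


def retrieve(question):
    """Encode the question and return the most similar passage."""
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]


best = retrieve("Where are checkpoints saved?")
```

A production system would replace `embed` with a transformer encoder and pass the retrieved passage to an extractive or generative reader to produce the final answer.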
Time Series & Tabular Data
Forecasting:
Predict next 30 days of sales based on historical data
using LSTM or Transformer model
Anomaly Detection:
Detect anomalies in server metrics time series data
using autoencoder or isolation forest
Regression:
Predict house prices from features: size, location, rooms, age
using gradient boosting (XGBoost, LightGBM)
Advanced Features
Hyperparameter Optimization
Task:
Train image classifier with automatic hyperparameter tuning:
optimize learning rate, batch size, and architecture depth
RepoMaster can integrate:
Optuna for Bayesian optimization
Ray Tune for distributed tuning
Grid search or random search
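Of the options listed, random search is the simplest to show without extra dependencies. The sketch below samples the three hyperparameters named in the task. The search ranges and the `validation_score` objective are placeholders chosen for illustration; a real run would train a model and return its validation accuracy:

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from the (assumed) search space."""
    return {
        "lr": 10 ** rng.uniform(-5, -1),            # log-uniform learning rate
        "batch_size": rng.choice([16, 32, 64, 128]),
        "depth": rng.choice([18, 34, 50]),
    }


def validation_score(config):
    """Placeholder objective: peaks at lr = 1e-3 and mildly favors deeper models."""
    lr_term = -abs(math.log10(config["lr"]) + 3)
    depth_term = config["depth"] / 100
    return lr_term + depth_term


def random_search(n_trials=50, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return max(trials, key=validation_score)


best_config = random_search()
```

Sampling the learning rate on a log scale matters because useful values span several orders of magnitude. Libraries such as Optuna and Ray Tune replace the uniform sampling with smarter strategies and add scheduling and pruning.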
Multi-GPU Training
Task:
Train large language model on 4 GPUs using distributed training
Features:
Automatic DistributedDataParallel setup
Gradient accumulation
Mixed precision training (AMP)
Model parallelism for very large models
Model Deployment
Task:
Export trained model to ONNX format for production deployment
Supported formats:
ONNX (cross-framework)
TorchScript (PyTorch)
SavedModel (TensorFlow)
TFLite (mobile)
CoreML (iOS)
Model Zoo Access
RepoMaster can access state-of-the-art pre-trained models:
Hugging Face: 100k+ models for NLP, vision, audio, and multimodal tasks
PyTorch Hub: Official PyTorch model repository
TensorFlow Hub: TensorFlow model collection
timm: PyTorch Image Models, 700+ architectures
OpenAI: GPT, CLIP, and DALL-E models
Detectron2: Facebook’s detection and segmentation library
Integration with Data Pipeline
Model Training
Train ML models using discovered repositories
Evaluation
Generate metrics, visualizations, and reports
Deployment
Export model for production use
Best Practices
Start with small experiments
Train on 10% of data first to verify pipeline works,
then scale to full dataset
Specify deployment constraints
Train image classifier optimized for CPU inference,
model size under 50MB
Use checkpointing and early stopping
Train for 50 epochs with checkpointing every 5 epochs
and early stopping if validation loss doesn't improve
Request interpretable outputs
Train model and generate feature importance plots
and example predictions with explanations
GPU Utilization:
Train with automatic mixed precision (AMP) for faster training
and larger batch sizes
Data Loading:
Use multi-worker data loading and prefetching
for maximum GPU utilization
Memory Efficiency:
Use gradient checkpointing to train larger models
within available GPU memory
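The checkpointing and early stopping prompt above maps to a small piece of training-loop logic. A minimal sketch, assuming a patience of 5 epochs (the prompt does not give one) and using made-up validation losses in place of a real training run:

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stalls after epoch 6.
val_losses = [0.90, 0.70, 0.55, 0.48, 0.45, 0.44, 0.46, 0.45, 0.47, 0.46, 0.48, 0.47]
stopper = EarlyStopping(patience=5)
checkpoint_epochs = []
stopped_at = None

for epoch, loss in enumerate(val_losses, start=1):
    if epoch % 5 == 0:
        checkpoint_epochs.append(epoch)  # a real loop would save weights here
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

With these losses, training stops at epoch 11, five epochs after the best epoch (6), and periodic checkpoints exist for epochs 5 and 10. Saving a separate "best" checkpoint whenever `best_epoch` changes is a common addition.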
Example: Full ML Pipeline
🌟 Unified Assistant started!
============================================================
📋 Task: Complete ML pipeline for custom dataset
User: Train image classifier on my photos in data/images/
with categories: landscape, portrait, architecture, nature, urban
🔧 Analyzing task...
✅ Detected: Custom image classification task
📊 Analyzing dataset...
Found: 2,347 images across 5 categories
landscape: 512 images
portrait: 445 images
architecture: 398 images
nature: 521 images
urban: 471 images
✅ Dataset is balanced, no class weighting needed
📦 Creating train/val/test splits...
✅ Train: 70% (1,643 images)
✅ Val: 15% (352 images)
✅ Test: 15% (352 images)
🔍 Searching for image classification frameworks...
✅ Selected: pytorch-image-models (timm)
🧠 Selecting architecture...
✅ Recommended: EfficientNet-B3
Reason: Best accuracy/speed tradeoff for 5 classes
⚙️ Configuring training...
Image size: 224x224
Batch size: 32
Learning rate: 1e-3
Epochs: 30
Augmentation: AutoAugment + Mixup
🚀 Starting training...
[Training progress...]
✅ Training complete!
Best val accuracy: 94.3%
Test accuracy: 93.8%
📊 Generating visualizations...
✅ Created:
- training_curves.png
- confusion_matrix.png
- sample_predictions.png
- class_activation_maps.png
💾 Saved model:
- photo_classifier.pth (52.3 MB)
- model_config.json
- class_mapping.json
✨ Task completed! Your classifier is ready to use.
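The balance check in the log above can be reproduced with inverse-frequency class weights. The sketch uses the image counts from the log and the common convention `total / (num_classes * count)`, the same formula scikit-learn uses for `class_weight="balanced"`; whether RepoMaster applies this exact rule is an assumption:

```python
# Image counts per category, copied from the log above.
counts = {"landscape": 512, "portrait": 445, "architecture": 398, "nature": 521, "urban": 471}

total = sum(counts.values())
# A weight of 1.0 means the class has exactly its fair share of samples.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}
# Ratio between the largest and smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())
```

The weights range from about 0.90 (nature) to about 1.18 (architecture), and the largest class is roughly 1.3 times the smallest, which supports treating the dataset as balanced.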
Troubleshooting
Out of GPU memory
Reduce batch size to 16 and use gradient accumulation
to maintain effective batch size of 64
Overfitting
Add stronger data augmentation and dropout,
use early stopping based on validation loss
Slow training
Enable mixed precision training and increase
number of data loading workers
Poor convergence
Try different learning rate schedule: warmup + cosine decay,
or use AdamW optimizer instead of SGD
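The first prompt in this section relies on gradient accumulation: with a batch size of 16 and a target effective batch size of 64, gradients from 64 / 16 = 4 micro-batches are averaged before each optimizer step. The sketch below checks that equivalence for a one-parameter linear model with mean squared error; the data and model are hypothetical:

```python
import random


def mse_gradient(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(64)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
w = 0.5

micro_batch, effective_batch = 16, 64
accum_steps = effective_batch // micro_batch  # 4 micro-batches per update

# Accumulate scaled micro-batch gradients before a single optimizer step.
accumulated = 0.0
for step in range(accum_steps):
    start = step * micro_batch
    grad = mse_gradient(w, xs[start:start + micro_batch], ys[start:start + micro_batch])
    accumulated += grad / accum_steps

full_batch = mse_gradient(w, xs, ys)
```

`accumulated` matches `full_batch` up to floating-point error, so the update is the same as training with a batch of 64 while only 16 samples need to fit in memory at once. Layers that use batch statistics, such as batch normalization, still see the smaller micro-batch.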
Next Steps
Neural Style Transfer: Detailed computer vision example
Data Processing: Prepare data for ML training
Repository Agent: How ML repositories are discovered
Programming Assistant: Custom ML code generation