Datasets
Supported Datasets
Currently, the project supports the following datasets out of the box:
Image Datasets
MNIST
The MNIST dataset is a large database of handwritten digits that is commonly used for training various image processing systems. It includes 60,000 training images and 10,000 testing images.
CIFAR-10
The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 different classes, with 6,000 images per class. It is widely used for training machine learning and computer vision algorithms.
ImageNetV2
The ImageNetV2 dataset contains 10,000 images across 1,000 classes, designed to mirror the structure and labeling of the original ImageNet dataset. It was created to evaluate model generalization and robustness by testing on new images while maintaining the same class definitions. ImageNetV2 is widely used for benchmarking large-scale image classification models.
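The figures quoted for the three image datasets above can be cross-checked with a little arithmetic. The following is a minimal, self-contained sketch and does not use the deltaml API: `DatasetSpec` and its constants are hypothetical helpers introduced only for illustration. It assumes that MNIST has ten classes (the digits 0 to 9) and that each CIFAR-10 pixel is stored as three one-byte color channels, neither of which is stated in the descriptions above.

```rust
/// Summary figures for a dataset, taken from the descriptions above.
/// `DatasetSpec` is a hypothetical helper, not part of deltaml.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DatasetSpec {
    name: &'static str,
    total_images: u32,
    classes: u32,
}

impl DatasetSpec {
    /// Average number of images per class (integer division).
    fn avg_images_per_class(&self) -> u32 {
        self.total_images / self.classes
    }
}

// MNIST: 60,000 training images plus 10,000 testing images.
// The class count of 10 (digits 0 to 9) is an assumption not stated above.
const MNIST: DatasetSpec = DatasetSpec {
    name: "MNIST",
    total_images: 60_000 + 10_000,
    classes: 10,
};

// CIFAR-10: 60,000 images in 10 classes.
const CIFAR10: DatasetSpec = DatasetSpec {
    name: "CIFAR-10",
    total_images: 60_000,
    classes: 10,
};

// ImageNetV2: 10,000 images across 1,000 classes.
const IMAGENET_V2: DatasetSpec = DatasetSpec {
    name: "ImageNetV2",
    total_images: 10_000,
    classes: 1_000,
};

/// Raw size in bytes of one CIFAR-10 image, assuming 32 x 32 pixels
/// with 3 color channels of 1 byte each.
fn cifar10_image_bytes() -> u32 {
    32 * 32 * 3
}

fn main() {
    for spec in [MNIST, CIFAR10, IMAGENET_V2].iter() {
        println!(
            "{}: {} images, {} classes, about {} images per class",
            spec.name,
            spec.total_images,
            spec.classes,
            spec.avg_images_per_class()
        );
    }
    println!("One raw CIFAR-10 image: {} bytes", cifar10_image_bytes());
}
```

For CIFAR-10 the per-class average matches the 6,000 images per class quoted above. For the other two datasets the value is only an average derived from the totals.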
Custom Datasets
We do not support custom datasets (yet!), but we plan to include support for them in the initial alpha release.
Usage Examples
To see examples of how to use these datasets in your projects, refer to the
examples/image_classification directory. This directory contains example
scripts demonstrating how to load and preprocess the MNIST and CIFAR-10
datasets, as well as how to train and evaluate models using these datasets.
Image Datasets
Example: Loading MNIST Dataset
use deltaml::common::DatasetOps;
use deltaml::data::MnistDataset;

#[tokio::main]
async fn main() {
    // Load the train and test data
    let mut train_data = MnistDataset::load_train().await;
    let test_data = MnistDataset::load_test().await;
}

If you are interested in more datasets, we would recommend using Nebula later on.