OCR Project
dataset.h File Reference

Training dataset loader.

#include "cnn.h"
#include <stddef.h>

Classes

struct  Sample
 A single labelled training sample.
struct  Dataset
 A collection of training samples.

Functions

Dataset * dataset_load (const char *root_dir, int n_threads)
 Load all images from root_dir into a Dataset.
void dataset_shuffle (Dataset *ds)
 Shuffle the samples in a Dataset in-place (Fisher-Yates).
void dataset_free (Dataset *ds)
 Free all memory owned by a Dataset.
void dataset_print_info (const Dataset *ds)
 Print a summary of the dataset to stdout.

Detailed Description

Training dataset loader.

Expects a root directory with one sub-directory per letter:

training_data/
  A/
    image1.png
    image2.png
    …
  B/
    …
  …
  Z/
    …
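Because each sub-directory name is a letter, the directory name can double as the class label. A minimal sketch of that mapping follows. The function name letter_dir_to_label and the 0..25 numbering are assumptions made for illustration and are not taken from dataset.h.

```c
#include <stddef.h>

/* Hypothetical helper: map a sub-directory name such as "A" to a class
 * index in 0..25. Returns -1 for anything that is not exactly one
 * uppercase letter. Assumes a character set in which 'A'..'Z' are
 * contiguous (true for ASCII). */
int letter_dir_to_label(const char *dir_name)
{
    if (dir_name == NULL || dir_name[0] == '\0' || dir_name[1] != '\0')
        return -1;                      /* NULL, empty, or longer than 1 char */
    if (dir_name[0] < 'A' || dir_name[0] > 'Z')
        return -1;                      /* not an uppercase letter */
    return dir_name[0] - 'A';
}
```

Rejecting unexpected names with -1 lets a caller skip stray directories instead of assigning them a bogus class.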

Each image is loaded with SDL2, converted to grayscale, and resized to CNN_IMG_H × CNN_IMG_W. The resulting pixel values are normalised to [0, 1].
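The grayscale and normalisation steps can be sketched without SDL2 as follows. This is an illustration only: it assumes packed 8-bit RGB input and uses the common Rec. 601 luma weights, neither of which is confirmed by this reference. The name rgb_to_unit_gray is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n_pixels packed RGB pixels (3 bytes each)
 * to grayscale floats in [0, 1].
 * Assumption: Rec. 601 luma weights; the real loader may use another
 * conversion. */
void rgb_to_unit_gray(const uint8_t *rgb, float *gray, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        float r = (float)rgb[3 * i];
        float g = (float)rgb[3 * i + 1];
        float b = (float)rgb[3 * i + 2];
        float y = (0.299f * r + 0.587f * g + 0.114f * b) / 255.0f;
        /* Clamp so that rounding can never push a value outside [0, 1]. */
        if (y < 0.0f) y = 0.0f;
        if (y > 1.0f) y = 1.0f;
        gray[i] = y;
    }
}
```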

Parallelism: loading can be parallelised over POSIX threads; the number of worker threads is controlled by the caller.
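One common way to spread such work over a caller-chosen number of POSIX threads is to give each worker a contiguous slice of the item indices. The sketch below shows that pattern with a placeholder computation in place of image loading. SliceJob, slice_worker, and run_in_slices are hypothetical names, and the actual loader may divide work differently.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread job: process items in the half-open range
 * [begin, end) and write results into the shared output array. */
typedef struct {
    size_t begin;
    size_t end;
    float *out;
} SliceJob;

static void *slice_worker(void *arg)
{
    SliceJob *job = arg;
    for (size_t i = job->begin; i < job->end; i++)
        job->out[i] = (float)i * 0.5f;  /* placeholder for "load item i" */
    return NULL;
}

/* Split n_items across n_threads workers. Slices are disjoint, so the
 * workers need no locking. Returns 0 on success, -1 on failure. */
int run_in_slices(float *out, size_t n_items, int n_threads)
{
    if (n_threads < 1)
        n_threads = 1;

    pthread_t *tids = malloc((size_t)n_threads * sizeof *tids);
    SliceJob *jobs = malloc((size_t)n_threads * sizeof *jobs);
    if (tids == NULL || jobs == NULL) {
        free(tids);
        free(jobs);
        return -1;
    }

    size_t base = n_items / (size_t)n_threads;   /* minimum slice length */
    size_t extra = n_items % (size_t)n_threads;  /* first 'extra' slices get +1 */
    size_t begin = 0;
    int started = 0;
    int rc = 0;

    for (int t = 0; t < n_threads; t++) {
        size_t len = base + ((size_t)t < extra ? 1 : 0);
        jobs[t] = (SliceJob){ begin, begin + len, out };
        begin += len;
        if (pthread_create(&tids[t], NULL, slice_worker, &jobs[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join only the threads that were actually created. */
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    free(tids);
    free(jobs);
    return rc;
}
```

With n_threads set to 1 the same code path runs a single worker, which keeps the single-threaded and multi-threaded cases identical apart from the slice count.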

Function Documentation

◆ dataset_free()

void dataset_free ( Dataset * ds)

Free all memory owned by a Dataset.

Parameters
ds  Dataset returned by dataset_load(). No-op if NULL.

◆ dataset_load()

Dataset * dataset_load ( const char * root_dir,
int n_threads )

Load all images from root_dir into a Dataset.

Opens sub-directories named A–Z. For each PNG/JPG image found, loads it with SDL2, converts to grayscale, resizes to 28×28, normalises to [0, 1], and appends a Sample to the dataset. Non-image files are silently skipped.

Parameters
root_dir  Path to the training data root (e.g. "training_data/").
n_threads  Number of POSIX loader threads (1 = single-threaded).
Returns
Pointer to a heap-allocated Dataset, or NULL on error. Free with dataset_free().
Note
Requires SDL2 and SDL2_image to be linked.
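This reference does not say how non-image files are recognised. One plausible approach is a case-insensitive check of the file extension, sketched below. The helper names and the accepted extensions (.png, .jpg, .jpeg) are assumptions; the real loader might instead rely on SDL2_image failing to decode a file.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive equality of two NUL-terminated strings. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical filter: returns 1 if file_name ends in a PNG/JPG
 * extension, 0 otherwise. Names that consist only of an extension
 * (e.g. ".png") are treated as having no extension. */
int has_image_extension(const char *file_name)
{
    const char *dot = file_name ? strrchr(file_name, '.') : NULL;
    if (dot == NULL || dot == file_name)
        return 0;
    return equals_ignore_case(dot, ".png") ||
           equals_ignore_case(dot, ".jpg") ||
           equals_ignore_case(dot, ".jpeg");
}
```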

◆ dataset_print_info()

void dataset_print_info ( const Dataset * ds)

Print a summary of the dataset to stdout.

Shows the total number of samples and the per-class distribution.

Parameters
ds  Dataset to describe.
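A per-class distribution amounts to a histogram over the sample labels. The sketch below counts labels into 26 buckets. It assumes integer labels 0..25 (one per letter) stored in a plain array, which may not match the actual Sample layout. count_per_class and SKETCH_N_CLASSES are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_N_CLASSES 26  /* assumption: one class per letter A-Z */

/* Hypothetical helper: count how many labels fall into each class.
 * Labels outside 0..SKETCH_N_CLASSES-1 are ignored. */
void count_per_class(const int *labels, size_t n,
                     size_t counts[SKETCH_N_CLASSES])
{
    for (int c = 0; c < SKETCH_N_CLASSES; c++)
        counts[c] = 0;
    for (size_t i = 0; i < n; i++) {
        if (labels[i] >= 0 && labels[i] < SKETCH_N_CLASSES)
            counts[labels[i]]++;
    }
}
```

Printing counts[c] next to the letter 'A' + c would then give a summary of the kind described above.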

◆ dataset_shuffle()

void dataset_shuffle ( Dataset * ds)

Shuffle the samples in a Dataset in-place (Fisher-Yates).

Parameters
ds  Dataset to shuffle.
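For readers unfamiliar with Fisher-Yates, here is a minimal sketch on an array of ints. It uses rand() as the random source, which is an assumption: this reference does not say which generator dataset_shuffle() uses, and the real function operates on Sample entries rather than ints. shuffle_ints is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: Fisher-Yates shuffle of an int array, in place.
 * Walks from the last element down, swapping each element with a
 * randomly chosen element at or before it.
 * Note: rand() % (i + 1) has a slight modulo bias; acceptable for a
 * sketch, not for statistically demanding uses. */
void shuffle_ints(int *items, size_t n)
{
    if (items == NULL || n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);  /* 0 <= j <= i */
        int tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
}
```

Seeding the generator once with srand() before shuffling makes the resulting order reproducible across runs with the same seed and C library.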