OCR Project
Training dataset loader.


Classes
| struct | Sample | A single labelled training sample. |
| struct | Dataset | A collection of training samples. |
Functions
| Dataset * | dataset_load (const char *root_dir, int n_threads) | Load all images from root_dir into a Dataset. |
| void | dataset_shuffle (Dataset *ds) | Shuffle the samples in a Dataset in-place (Fisher-Yates). |
| void | dataset_free (Dataset *ds) | Free all memory owned by a Dataset. |
| void | dataset_print_info (const Dataset *ds) | Print a summary of the dataset to stdout. |
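The Fisher-Yates shuffle named for dataset_shuffle() can be sketched as below. This is a minimal illustration on a plain int array, not the project's implementation: the helper name shuffle_ints is hypothetical, and using rand() as the random source is an assumption (it carries a slight modulo bias).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for shuffling Dataset samples:
 * shuffles an int array in place with Fisher-Yates. */
void shuffle_ints(int *a, size_t n)
{
    if (n < 2)
        return; /* nothing to shuffle */

    /* Walk from the last element down, swapping each element with a
     * randomly chosen element at or before it. */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1); /* 0 <= j <= i, slight modulo bias */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Seeding with srand() before the call makes the order reproducible, which can be useful when comparing training runs.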
The loader expects a root directory with one sub-directory per letter:
    training_data/
        A/
            image1.png
            image2.png
            …
        B/
            …
        …
        Z/
            …
Each image is loaded with SDL2, converted to grayscale, and resized to CNN_IMG_H × CNN_IMG_W. The resulting pixel values are normalised to [0, 1].
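The grayscale conversion and [0, 1] normalisation described above might look like the following sketch. It assumes packed 8-bit RGB input and Rec. 601 luma weights; both are assumptions, since the conversion the loader actually uses is not documented here. The helper name rgb_to_unit_gray is hypothetical, and resizing is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert packed 8-bit RGB pixels to grayscale
 * floats in [0, 1]. The Rec. 601 luma weights are an assumption. */
void rgb_to_unit_gray(const uint8_t *rgb, size_t n_pixels, float *out)
{
    for (size_t i = 0; i < n_pixels; i++) {
        float r = rgb[3 * i + 0];
        float g = rgb[3 * i + 1];
        float b = rgb[3 * i + 2];

        /* Weighted sum is in the range 0..255 (up to rounding). */
        float luma = 0.299f * r + 0.587f * g + 0.114f * b;
        float v = luma / 255.0f;

        /* Clamp so rounding error can never leave [0, 1]. */
        out[i] = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    }
}
```

Dividing by 255 maps black to 0 and white to 1, so downstream code can treat every pixel as a unit-range float regardless of the source image format.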
Parallelism: loading can be parallelised over POSIX threads; the number of worker threads is controlled by the caller.
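One way to spread the work over POSIX threads is to give each worker a contiguous range of the 26 class directories. The sketch below illustrates only that partitioning; the names WorkerArgs, worker, and load_classes_parallel are hypothetical, the worker merely counts the classes it visits instead of loading images, and the real loader may divide the work differently.

```c
#include <pthread.h>
#include <stdlib.h>

#define N_CLASSES 26 /* sub-directories A..Z */

typedef struct {
    int first;     /* first class index handled by this worker */
    int last;      /* one past the last class index */
    int processed; /* classes visited (stand-in for real loading) */
} WorkerArgs;

static void *worker(void *p)
{
    WorkerArgs *a = p;
    for (int c = a->first; c < a->last; c++)
        a->processed++; /* a real loader would scan directory 'A' + c here */
    return NULL;
}

/* Hypothetical driver: returns the number of classes processed, or -1 on error. */
int load_classes_parallel(int n_threads)
{
    if (n_threads < 1) n_threads = 1;
    if (n_threads > N_CLASSES) n_threads = N_CLASSES;

    pthread_t *tids = malloc((size_t)n_threads * sizeof *tids);
    WorkerArgs *args = malloc((size_t)n_threads * sizeof *args);
    if (!tids || !args) {
        free(tids);
        free(args);
        return -1;
    }

    int started = 0;
    for (int t = 0; t < n_threads; t++) {
        /* Integer arithmetic splits 0..26 into near-equal, gap-free ranges. */
        args[t].first = t * N_CLASSES / n_threads;
        args[t].last = (t + 1) * N_CLASSES / n_threads;
        args[t].processed = 0;
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0)
            break;
        started++;
    }

    int total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].processed;
    }

    free(tids);
    free(args);
    return started == n_threads ? total : -1;
}
```

Each worker writes only to its own WorkerArgs, so no locking is needed until the per-thread results are merged after pthread_join().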
void dataset_free(Dataset *ds)
Free all memory owned by a Dataset.
| ds | Dataset returned by dataset_load(). No-op if NULL. |

Dataset *dataset_load(const char *root_dir, int n_threads)
Load all images from root_dir into a Dataset.
Opens sub-directories named A–Z. For each PNG/JPG image found, loads it with SDL2, converts to grayscale, resizes to 28×28, normalises to [0, 1], and appends a Sample to the dataset. Non-image files are silently skipped.
| root_dir | Path to the training data root (e.g. "training_data/"). |
| n_threads | Number of POSIX loader threads (1 = single-threaded). |
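Skipping non-image files could be done with a simple extension check before handing a path to SDL2. The sketch below is one possible filter, not the loader's documented behaviour: the names equal_ignore_case and has_image_extension are hypothetical, and accepting ".jpeg" and ignoring case are assumptions beyond the PNG/JPG wording above.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive equality of two NUL-terminated strings. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical filter: returns 1 if name ends in .png, .jpg or .jpeg. */
int has_image_extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    if (!dot || dot == name)
        return 0; /* no extension, or a dot-file such as ".png" */
    return equal_ignore_case(dot, ".png") ||
           equal_ignore_case(dot, ".jpg") ||
           equal_ignore_case(dot, ".jpeg");
}
```

A directory scan would call this on each entry name and move on to the next entry when it returns 0, which matches the "silently skipped" behaviour.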


void dataset_print_info(const Dataset *ds)
Print a summary of the dataset to stdout.
Shows the total number of samples and the per-class distribution.
| ds | Dataset to describe. |
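A per-class distribution of this kind can be computed with a single pass over the labels. The sketch below assumes labels are stored as integers 0..25 (A..Z), which is an assumption about the Sample layout; the names count_per_class and print_distribution are hypothetical, and the output format is illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

#define N_CLASSES 26 /* letters A..Z */

/* Tally how many labels fall in each class; labels outside 0..25 are ignored. */
void count_per_class(const int *labels, size_t n, size_t counts[N_CLASSES])
{
    for (int c = 0; c < N_CLASSES; c++)
        counts[c] = 0;
    for (size_t i = 0; i < n; i++)
        if (labels[i] >= 0 && labels[i] < N_CLASSES)
            counts[labels[i]]++;
}

/* Print the total sample count followed by one line per class. */
void print_distribution(const int *labels, size_t n)
{
    size_t counts[N_CLASSES];
    count_per_class(labels, n, counts);

    printf("samples: %zu\n", n);
    for (int c = 0; c < N_CLASSES; c++)
        printf("  %c: %zu\n", 'A' + c, counts[c]);
}
```

Keeping the tally separate from the printing makes the counts reusable, for example to detect empty or badly under-represented classes before training.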
