OCR Project
Training dataset loader with optional POSIX thread parallelism.
#include "dataset.h"
#include "../preprocess/image.h"
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
Classes

| Type | Name | Description |
|---|---|---|
| struct | LoaderArgs | Arguments passed to each loader thread. |

Functions

| Return type | Function | Description |
|---|---|---|
| static int | is_image_file(const char *filename) | Check whether filename ends with a known image extension. |
| static void * | loader_thread(void *arg) | POSIX thread worker: load all images from one class directory. |
| static size_t | count_images(const char *dir_path) | Count the number of image files in a directory. |
| Dataset * | dataset_load(const char *root_dir, int n_threads) | Load all images from root_dir into a Dataset. |
| void | dataset_shuffle(Dataset *ds) | Shuffle the samples in a Dataset in-place (Fisher-Yates). |
| void | dataset_free(Dataset *ds) | Free all memory owned by a Dataset. |
| void | dataset_print_info(const Dataset *ds) | Print a summary of the dataset to stdout. |

static size_t count_images(const char *dir_path)
Count the number of image files in a directory.
| dir_path | Directory path. |
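A minimal sketch of how a directory count like count_images() could be written with the headers this file already includes (<dirent.h>, <sys/stat.h>). The names count_images_sketch and has_image_ext are hypothetical, as are the choices to return 0 when the directory cannot be opened and to count only regular files; the real function may behave differently.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if name ends in .png, .jpg, .jpeg or .bmp (any case). */
static int has_image_ext(const char *name)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL)
        return 0;
    return strcasecmp(dot, ".png") == 0 || strcasecmp(dot, ".jpg") == 0 ||
           strcasecmp(dot, ".jpeg") == 0 || strcasecmp(dot, ".bmp") == 0;
}

/* Count regular files with an image extension directly inside dir_path.
   Assumption: a directory that cannot be opened counts as 0 images. */
static size_t count_images_sketch(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return 0;

    size_t n = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (!has_image_ext(entry->d_name))
            continue;
        /* Build the full path so stat() can reject sub-directories. */
        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", dir_path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue; /* path too long: skip this entry */
        struct stat st;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            n++;
    }
    closedir(dir);
    return n;
}
```

Checking the extension before calling stat() avoids a system call for every non-image entry, including "." and "..".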


void dataset_free(Dataset *ds)
Free all memory owned by a Dataset.
| ds | Dataset returned by dataset_load(). No-op if NULL. |

Dataset *dataset_load(const char *root_dir, int n_threads)
Load all images from root_dir into a Dataset.
Opens sub-directories named A–Z. For each image file found (see is_image_file() for the accepted extensions), loads it with SDL2, converts it to grayscale, resizes it to 28×28, normalises it to [0, 1], and appends a Sample to the dataset. Non-image files are silently skipped.
| root_dir | Path to the training data root (e.g. "training_data/"). |
| n_threads | Number of POSIX loader threads (1 = single-threaded). |
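The per-image preprocessing described above can be illustrated without SDL2. The sketch below is a hypothetical stand-in, not the project's image_load_normalised(): it assumes the pixels are already decoded into a packed 8-bit RGB buffer, uses ITU-R BT.601 luma weights for the grayscale conversion, and uses nearest-neighbour (floor) sampling for the 28×28 resize. The real loader may use different weights or a different scaling filter.

```c
#include <stddef.h>

#define SIDE 28 /* target width and height, as documented for dataset_load() */

/* Hypothetical stand-in: rgb holds w*h pixels as R,G,B bytes (row-major).
   Writes SIDE*SIDE floats in [0, 1] to out. */
static void rgb_to_gray28(const unsigned char *rgb, int w, int h,
                          float out[SIDE * SIDE])
{
    for (int y = 0; y < SIDE; y++) {
        int sy = y * h / SIDE; /* source row for this output row */
        for (int x = 0; x < SIDE; x++) {
            int sx = x * w / SIDE; /* source column for this output column */
            const unsigned char *p = rgb + ((size_t)sy * w + sx) * 3;
            /* BT.601 luma: weighted sum of R, G, B in 0..255 */
            float gray = 0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2];
            out[y * SIDE + x] = gray / 255.0f; /* normalise to [0, 1] */
        }
    }
}
```

Nearest-neighbour sampling is the simplest resize; for downscaling large scans, an area-averaging filter usually preserves thin strokes better.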


void dataset_print_info(const Dataset *ds)
Print a summary of the dataset to stdout.
Shows the total number of samples and the per-class distribution.
| ds | Dataset to describe. |
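One way to compute the per-class distribution is a 26-bin histogram over sample labels. This sketch assumes labels 0 to 25 correspond to the class directories A to Z; class_histogram, print_distribution and the exact output format are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define N_CLASSES 26 /* assumption: one class per sub-directory A-Z */

/* Tally labels (0 = 'A' ... 25 = 'Z') into counts[]; out-of-range labels
   are ignored. Returns how many labels were tallied. */
static size_t class_histogram(const int *labels, size_t n,
                              size_t counts[N_CLASSES])
{
    size_t tallied = 0;
    for (int c = 0; c < N_CLASSES; c++)
        counts[c] = 0;
    for (size_t i = 0; i < n; i++) {
        if (labels[i] >= 0 && labels[i] < N_CLASSES) {
            counts[labels[i]]++;
            tallied++;
        }
    }
    return tallied;
}

/* Print the total, then one line per non-empty class, e.g. "  A: 2". */
static void print_distribution(const size_t counts[N_CLASSES], size_t total)
{
    printf("Samples: %zu\n", total);
    for (int c = 0; c < N_CLASSES; c++) {
        if (counts[c] > 0)
            printf("  %c: %zu\n", 'A' + c, counts[c]);
    }
}
```

Skipping empty classes keeps the output short; printing all 26 rows instead would make a missing class directory easier to spot.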

void dataset_shuffle(Dataset *ds)
Shuffle the samples in a Dataset in-place (Fisher-Yates).
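For reference, an in-place Fisher-Yates (Durstenfeld) shuffle over an array of samples can be sketched as follows. SampleSketch and rand_below are hypothetical stand-ins, since the real Sample layout is not documented here. rand() is used only for brevity: it is only guaranteed to reach 32767 and has modulo bias, so a large dataset would want a wider, unbiased generator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sample record; the real Sample layout may differ. */
typedef struct {
    int label;
    float *pixels;
} SampleSketch;

/* Index in [0, bound) from rand(); slightly biased, see the note above. */
static size_t rand_below(size_t bound)
{
    return (size_t)rand() % bound;
}

/* Walk from the last element down, swapping each element with a randomly
   chosen element at or before it. Every permutation is reachable. */
static void shuffle_samples(SampleSketch *s, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rand_below(i + 1);
        SampleSketch tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

Swapping whole structs moves only the label and the pixel pointer, so the pixel buffers themselves are never copied.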

static int is_image_file(const char *filename)
Check whether filename ends with a known image extension.
Accepted: .png, .jpg, .jpeg, .bmp (case-insensitive).
| filename | File name (not full path). |
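A sketch of the extension check documented above, assuming only the final dot-suffix counts (so archive.png.gz is rejected). is_image_file_sketch and ascii_ieq are hypothetical names; tolower() from <ctype.h> keeps the comparison within standard C instead of relying on POSIX strcasecmp().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive string equality. */
static int ascii_ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* 1 if filename ends with .png, .jpg, .jpeg or .bmp in any letter case. */
static int is_image_file_sketch(const char *filename)
{
    static const char *const exts[] = { ".png", ".jpg", ".jpeg", ".bmp" };
    const char *dot = strrchr(filename, '.'); /* last dot only */
    if (dot == NULL)
        return 0;
    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        if (ascii_ieq(dot, exts[i]))
            return 1;
    }
    return 0;
}
```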

static void *loader_thread(void *arg)
POSIX thread worker: load all images from one class directory.
Opens args->dir_path, iterates over image files, loads each one with image_load_normalised() and writes the result to args->buf.
| arg | Pointer to a heap-allocated LoaderArgs (owned by caller). |
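The worker-per-directory design can be sketched with pthread_create() and pthread_join(). Everything here is hypothetical: LoaderArgsSketch only mirrors the dir_path field mentioned above, the worker is a placeholder that does no image loading, and launching directories in batches of at most n_threads is one simple scheduling choice, not necessarily the one dataset_load() uses. Unlike the documented loader_thread(), the sketch keeps the argument structs in a caller-owned array rather than heap-allocating each one.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64 /* hypothetical cap on concurrent workers */

/* Hypothetical argument struct; mirrors the documented dir_path field only. */
typedef struct {
    const char *dir_path; /* class directory this worker would read */
    int label;            /* class index, e.g. 0 for "A" */
    size_t n_loaded;      /* filled in by the worker */
} LoaderArgsSketch;

/* Placeholder worker: a real one would open dir_path and decode images.
   Here it only records that it ran, so the threading shape is visible. */
static void *loader_thread_sketch(void *arg)
{
    LoaderArgsSketch *a = arg;
    a->n_loaded = 1; /* stand-in for "number of images loaded" */
    return NULL;
}

/* Run one worker per job, at most n_threads at a time; n_threads <= 1 runs
   them in the calling thread. Returns 0 on success, -1 if a create fails. */
static int run_loaders(LoaderArgsSketch *jobs, size_t n_jobs, size_t n_threads)
{
    if (n_threads <= 1) {
        for (size_t i = 0; i < n_jobs; i++)
            loader_thread_sketch(&jobs[i]);
        return 0;
    }
    if (n_threads > MAX_THREADS)
        n_threads = MAX_THREADS;

    pthread_t tids[MAX_THREADS];
    for (size_t start = 0; start < n_jobs; start += n_threads) {
        size_t batch = n_jobs - start < n_threads ? n_jobs - start : n_threads;
        size_t created = 0;
        while (created < batch &&
               pthread_create(&tids[created], NULL, loader_thread_sketch,
                              &jobs[start + created]) == 0)
            created++;
        /* Join every thread that started, even if a later create failed. */
        for (size_t i = 0; i < created; i++)
            pthread_join(tids[i], NULL);
        if (created < batch)
            return -1;
    }
    return 0;
}
```

Because each thread writes only to its own argument slot and all threads are joined before the results are read, no mutex is needed in this shape.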

