OCR Project
dataset.c File Reference

Training dataset loader with optional POSIX thread parallelism.

#include "dataset.h"
#include "../preprocess/image.h"
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

Classes

struct  LoaderArgs
 Arguments passed to each loader thread.

Functions

static int is_image_file (const char *filename)
 Check whether filename ends with a known image extension.
static void * loader_thread (void *arg)
 POSIX thread worker: load all images from one class directory.
static size_t count_images (const char *dir_path)
 Count the number of image files in a directory.
Dataset * dataset_load (const char *root_dir, int n_threads)
 Load all images from root_dir into a Dataset.
void dataset_shuffle (Dataset *ds)
 Shuffle the samples in a Dataset in-place (Fisher-Yates).
void dataset_free (Dataset *ds)
 Free all memory owned by a Dataset.
void dataset_print_info (const Dataset *ds)
 Print a summary of the dataset to stdout.

Detailed Description

Training dataset loader with optional POSIX thread parallelism.

Function Documentation

◆ count_images()

size_t count_images ( const char * dir_path)
static

Count the number of image files in a directory.

Parameters
dir_path  Directory path.
Returns
Number of image files (0 if directory cannot be opened).
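
A minimal sketch of the counting loop described above, using the POSIX dirent API. The names count_images_sketch() and has_png_suffix() are hypothetical; the helper is a simplified stand-in for is_image_file() that only matches a lower-case ".png" suffix, so the real function will likely count more files.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for is_image_file(): matches a lower-case
 * ".png" suffix only, to keep this sketch short. */
static int has_png_suffix(const char *name)
{
    size_t len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".png") == 0;
}

/* Count matching entries in dir_path.
 * Returns 0 if the directory cannot be opened. */
size_t count_images_sketch(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return 0;

    size_t count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (has_png_suffix(entry->d_name))
            count++;
    }
    closedir(dir);
    return count;
}
```

Counting before loading lets a caller size its sample buffer once instead of growing it per image, which is one plausible reason for a helper like this.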

◆ dataset_free()

void dataset_free ( Dataset * ds)

Free all memory owned by a Dataset.

Parameters
ds  Dataset returned by dataset_load(). No-op if NULL.

◆ dataset_load()

Dataset * dataset_load ( const char * root_dir,
int n_threads )

Load all images from root_dir into a Dataset.

Opens sub-directories named A–Z. For each PNG/JPG image found, loads it with SDL2, converts to grayscale, resizes to 28×28, normalises to [0, 1], and appends a Sample to the dataset. Non-image files are silently skipped.

Parameters
root_dir  Path to the training data root (e.g. "training_data/").
n_threads  Number of POSIX loader threads (1 = single-threaded).
Returns
Pointer to a heap-allocated Dataset, or NULL on error. Free with dataset_free().
Note
Requires SDL2 and SDL2_image to be linked.
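
The normalisation step of the pipeline can be illustrated in isolation. The sketch below assumes the image has already been converted to 8-bit grayscale and resized to 28×28; normalise_pixels_sketch() and SAMPLE_PIXELS are hypothetical names, and the actual loader may perform this step differently inside image_load_normalised().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sample size: one 28x28 grayscale image. */
#define SAMPLE_PIXELS (28 * 28)

/* Map 8-bit grayscale values (0..255) to floats in [0, 1]. */
void normalise_pixels_sketch(const uint8_t *gray, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (float)gray[i] / 255.0f;
}
```

A caller would pass SAMPLE_PIXELS as n and an output buffer of the same length.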

◆ dataset_print_info()

void dataset_print_info ( const Dataset * ds)

Print a summary of the dataset to stdout.

Shows the total number of samples and the per-class distribution.

Parameters
ds  Dataset to describe.

◆ dataset_shuffle()

void dataset_shuffle ( Dataset * ds)

Shuffle the samples in a Dataset in-place (Fisher-Yates).

Parameters
ds  Dataset to shuffle.
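
For reference, a minimal Fisher-Yates sketch over a plain int array. The layout of Dataset is not shown on this page, so shuffle_ints_sketch() is a hypothetical illustration of the algorithm rather than the actual implementation; using rand() as the random source is also an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fisher-Yates: walk from the last index down to 1, swapping each
 * element with a randomly chosen element at or before it. */
void shuffle_ints_sketch(int *items, size_t n)
{
    if (items == NULL || n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        /* 0 <= j <= i; the modulo introduces a slight bias. */
        size_t j = (size_t)rand() % (i + 1);
        int tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
}
```

With rand(), the order is reproducible unless the caller seeds the generator with srand() beforehand.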

◆ is_image_file()

int is_image_file ( const char * filename)
static

Check whether filename ends with a known image extension.

Accepted: .png, .jpg, .jpeg, .bmp (case-insensitive).

Parameters
filename  File name (not full path).
Returns
1 if the file is a recognised image, 0 otherwise.
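
One way such a check could be written in portable C is sketched below. The names is_image_file_sketch() and equal_ignore_case() are hypothetical; the extension list is taken from the description above, and the comparison only handles ASCII case.

```c
#include <ctype.h>
#include <string.h>

/* Compare two strings ignoring ASCII case; returns 1 if equal. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns 1 if filename ends with .png, .jpg, .jpeg or .bmp
 * (case-insensitive), 0 otherwise. */
int is_image_file_sketch(const char *filename)
{
    static const char *const exts[] = { ".png", ".jpg", ".jpeg", ".bmp" };
    const char *dot = strrchr(filename, '.');  /* last '.' in the name */
    if (dot == NULL)
        return 0;
    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        if (equal_ignore_case(dot, exts[i]))
            return 1;
    }
    return 0;
}
```

Because only the final extension is inspected, a name such as "archive.png.bak" is rejected.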

◆ loader_thread()

void * loader_thread ( void * arg)
static

POSIX thread worker: load all images from one class directory.

Opens args->dir_path, iterates over image files, loads each one with image_load_normalised() and writes the result to args->buf.

Parameters
arg  Pointer to a heap-allocated LoaderArgs (owned by caller).
Returns
Always NULL (errors are printed to stderr).
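
The general worker pattern can be sketched as follows: each thread receives its own argument block and writes only to its own slice of a shared buffer, so no locking is needed. WorkerArgs, worker_sketch() and run_workers_sketch() are hypothetical names; the fields of the real LoaderArgs are not listed on this page, and this sketch keeps the argument blocks on the stack rather than the heap for brevity.

```c
#include <pthread.h>
#include <stddef.h>

#define N_WORKERS 4
#define ITEMS_PER_WORKER 8

/* Hypothetical argument block; the real LoaderArgs fields may differ. */
typedef struct {
    int    class_id;  /* which class this worker handles */
    float *buf;       /* start of this worker's private output slice */
    size_t n_items;   /* number of slots in the slice */
} WorkerArgs;

/* Worker: fill the private slice, then return NULL. */
static void *worker_sketch(void *arg)
{
    WorkerArgs *args = arg;
    for (size_t i = 0; i < args->n_items; i++)
        args->buf[i] = (float)args->class_id;
    return NULL;
}

/* Start one worker per class and wait for all of them.
 * out must hold N_WORKERS * ITEMS_PER_WORKER floats.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_workers_sketch(float *out)
{
    pthread_t threads[N_WORKERS];
    WorkerArgs args[N_WORKERS];
    int started = 0;
    int status = 0;

    for (int i = 0; i < N_WORKERS; i++) {
        args[i].class_id = i;
        args[i].buf = out + (size_t)i * ITEMS_PER_WORKER;
        args[i].n_items = ITEMS_PER_WORKER;
        if (pthread_create(&threads[i], NULL, worker_sketch, &args[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Join every thread that was started before args goes out of scope. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return status;
}
```

Build with -pthread. Joining before returning keeps the argument blocks valid for the whole lifetime of each worker.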