OCR Project
solve_main.c File Reference

Entry point for the OCR + crossword-solver binary.

#include "src/cnn/cnn.h"
#include "src/cnn/model.h"
#include "src/preprocess/image.h"
#include "src/segment/segment.h"
#include "src/solver/solver.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Classes

struct  SolveArgs
 Parsed command-line options for the solve binary.

Macros

#define DEFAULT_MODEL_DIR   "models/"
#define TTA_N_CROPS   5

Functions

static void usage (const char *prog)
 Print usage information to stderr.
static int parse_args (int argc, char **argv, SolveArgs *args)
 Parse argv into a SolveArgs structure.
static int resolve_model (SolveArgs *args)
 Ensure args->model_path is filled: auto-detect if not specified.
static void forward_region (const Image *gray_img, int x1, int y1, int x2, int y2, CNN *net, float *probs_out)
 Run one forward pass on a region of the grayscale image.
static int recognise_cell (const Image *gray_img, const BoundingBox *box, int cell_size, CNN *net)
 Predict a cell using Test-Time Augmentation (TTA).
static void search_words (const CharGrid *grid, char *words)
 Search the grid for every word in the comma-separated list words.
int main (int argc, char **argv)

Detailed Description

Entry point for the OCR + crossword-solver binary.

Usage:

./solve --image <path> [--model <path>] [--words <word1,word2,...>] [-v]

Options:

  --image <path>               (required) Input crossword image.
  --model <path>               (optional) Model file to load. If absent, the most recent .bin in models/ is used.
  --words <word1,word2,...>    (optional) Comma-separated list of words to search for.
  --verbose / -v               Print per-cell recognition details and the grid.

Pipeline:

  1. Load model.
  2. Load and preprocess the image (grayscale, binarize).
  3. Segment letter cells.
  4. Recognise each cell with the CNN.
  5. Build a character grid.
  6. Solve the word search (if --words is given).

Exit codes:

  0  Success.
  1  Argument error.
  2  Model error (not found / cannot load).
  3  Image load error.
  4  Segmentation error.
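The mapping from pipeline stage to exit code can be sketched as below. This is only an illustration, not the code in solve_main.c: the enumerator names, the Stage type, run_stages() and the two demo stubs are all hypothetical; only the numeric values come from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Exit codes from the list above; the enumerator names are hypothetical. */
enum SolveExit {
    SOLVE_OK          = 0,
    SOLVE_ERR_ARGS    = 1,
    SOLVE_ERR_MODEL   = 2,
    SOLVE_ERR_IMAGE   = 3,
    SOLVE_ERR_SEGMENT = 4
};

/* One pipeline stage (hypothetical type): run() returns 0 on success. */
typedef struct {
    int (*run)(void *ctx);
    enum SolveExit on_failure;   /* exit code reported if run() fails */
} Stage;

/* Run the stages in order and stop at the first failure. */
static enum SolveExit run_stages(const Stage *stages, size_t n, void *ctx)
{
    assert(stages != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (stages[i].run(ctx) != 0)
            return stages[i].on_failure;
    }
    return SOLVE_OK;
}

/* Demo stubs standing in for real stages such as "load model". */
static int demo_ok(void *ctx)   { (void)ctx; return 0; }
static int demo_fail(void *ctx) { (void)ctx; return -1; }
```

With a table like this, main() would only need to return the value of run_stages(); later stages never run once an earlier one has failed.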

Macro Definition Documentation

◆ DEFAULT_MODEL_DIR

#define DEFAULT_MODEL_DIR   "models/"

◆ TTA_N_CROPS

#define TTA_N_CROPS   5

Number of shifted crops averaged for Test-Time Augmentation.

Function Documentation

◆ forward_region()

void forward_region ( const Image * gray_img,
int x1,
int y1,
int x2,
int y2,
CNN * net,
float * probs_out )
static

Run one forward pass on a region of the grayscale image.

Extracts [x1,x2) × [y1,y2) from gray_img (RGBA, grayscale), binarizes the region locally, resizes to 28×28 and runs cnn_forward(). The CNN output probabilities are added into probs_out.

Parameters
  gray_img          Full grayscale image (RGBA).
  x1, y1, x2, y2    Region bounds (clamped to image boundaries internally).
  net               Trained CNN.
  probs_out         Array of CNN_N_CLASSES floats; the result is added here.
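Two details of the description, the clamping of the region bounds and the accumulation into probs_out, can be sketched in isolation. The helper names below are hypothetical, and N_CLASSES = 26 is an assumption standing in for CNN_N_CLASSES (inferred from the 'A' to 'Z' class range); the cropping, binarization, resizing and the cnn_forward() call are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define N_CLASSES 26  /* assumption: stands in for CNN_N_CLASSES */

/* Clamp v into [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp the half-open region [x1,x2) x [y1,y2) to a width x height image.
 * Returns 1 if the clamped region is non-empty, 0 otherwise. */
static int clamp_region(int *x1, int *y1, int *x2, int *y2,
                        int width, int height)
{
    *x1 = clamp_int(*x1, 0, width);
    *x2 = clamp_int(*x2, 0, width);
    *y1 = clamp_int(*y1, 0, height);
    *y2 = clamp_int(*y2, 0, height);
    return *x2 > *x1 && *y2 > *y1;
}

/* Add one forward pass's probabilities into the running totals. */
static void accumulate_probs(float *probs_out, const float *pass_probs)
{
    assert(probs_out != NULL && pass_probs != NULL);
    for (int i = 0; i < N_CLASSES; ++i)
        probs_out[i] += pass_probs[i];
}
```

Because the output is added rather than overwritten, a caller that runs several passes over the same cell can reuse one probs_out array as a running sum.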

◆ main()

int main ( int argc,
char ** argv )

◆ parse_args()

int parse_args ( int argc,
char ** argv,
SolveArgs * args )
static

Parse argv into a SolveArgs structure.

Parameters
  argc    Argument count.
  argv    Argument vector.
  args    Output structure (caller-allocated and zeroed).
Returns
0 on success, -1 on error.
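A minimal sketch of a parser with the same contract (0 on success, -1 on error) is shown below. The SketchArgs layout is hypothetical: this reference only mentions model_path and words, so the image_path and verbose fields, and the function name, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SolveArgs; the real field list is not shown here. */
typedef struct {
    const char *image_path;
    const char *model_path;
    char       *words;      /* writable, since it is later split in place */
    int         verbose;
} SketchArgs;

/* Returns 0 on success, -1 on an unknown flag, a missing value,
 * or a missing --image. Expects *args to be zeroed by the caller. */
static int sketch_parse_args(int argc, char **argv, SketchArgs *args)
{
    assert(args != NULL);
    for (int i = 1; i < argc; ++i) {
        const char *a = argv[i];
        if (strcmp(a, "-v") == 0 || strcmp(a, "--verbose") == 0) {
            args->verbose = 1;
        } else if (i + 1 >= argc) {
            return -1;                 /* every other flag takes a value */
        } else if (strcmp(a, "--image") == 0) {
            args->image_path = argv[++i];
        } else if (strcmp(a, "--model") == 0) {
            args->model_path = argv[++i];
        } else if (strcmp(a, "--words") == 0) {
            args->words = argv[++i];
        } else {
            return -1;                 /* unknown flag */
        }
    }
    return args->image_path != NULL ? 0 : -1;
}
```

Leaving model_path as NULL when --model is absent gives a later step such as resolve_model() an unambiguous signal to auto-detect.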

◆ recognise_cell()

int recognise_cell ( const Image * gray_img,
const BoundingBox * box,
int cell_size,
CNN * net )
static

Predict a cell using Test-Time Augmentation (TTA).

Runs TTA_N_CROPS forward passes with slightly different crop origins (the original crop + 4 shifts of ±shift pixels in x and y), averages the softmax outputs, and returns the argmax class.

The crop window is grid-aware: it is centred on the letter's bounding-box centre and sized to cell_size × cell_size (the detected grid pitch). This guarantees a consistent white border around the letter regardless of how tight the connected-component bounding box is.
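The grid-aware window described above can be sketched as follows. SketchBox is a hypothetical stand-in for BoundingBox (whose fields are not shown in this reference), bounds are assumed to be half-open, and the cell_size == 0 fallback is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box type with half-open bounds [x1,x2) x [y1,y2). */
typedef struct { int x1, y1, x2, y2; } SketchBox;

/* Centre a cell_size x cell_size window on the centre of the tight box.
 * The result may extend past the image; clamping is left to the caller. */
static SketchBox grid_aware_crop(const SketchBox *tight, int cell_size)
{
    assert(tight != NULL && cell_size > 0);
    int cx = (tight->x1 + tight->x2) / 2;
    int cy = (tight->y1 + tight->y2) / 2;
    int half = cell_size / 2;
    SketchBox crop;
    crop.x1 = cx - half;
    crop.y1 = cy - half;
    crop.x2 = crop.x1 + cell_size;
    crop.y2 = crop.y1 + cell_size;
    return crop;
}
```

A narrow letter and a wide letter with the same centre produce the same window, which is the property the paragraph above relies on.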

Parameters
  gray_img     Full grayscale image.
  box          Tight bounding box from segmentation.
  cell_size    Full grid cell side length (pixels). Pass 0 to fall back to the padding-fraction heuristic.
  net          Trained CNN.
Returns
Best class index (0='A'…25='Z'), or -1 on error.
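The averaging step can be sketched as below. It assumes the four shifts are axis-aligned (one each of -shift and +shift in x and in y); the text does not say whether diagonal shifts are used instead. N_CLASSES, N_CROPS and both function names are hypothetical stand-ins, and the forward passes themselves are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define N_CLASSES 26  /* assumption: stands in for CNN_N_CLASSES */
#define N_CROPS   5   /* mirrors TTA_N_CROPS */

/* Fill dx/dy with the crop offsets: the original crop, then
 * -shift and +shift in x, then -shift and +shift in y. */
static void tta_offsets(int shift, int dx[N_CROPS], int dy[N_CROPS])
{
    static const int ux[N_CROPS] = { 0, -1, 1,  0, 0 };
    static const int uy[N_CROPS] = { 0,  0, 0, -1, 1 };
    assert(dx != NULL && dy != NULL);
    for (int i = 0; i < N_CROPS; ++i) {
        dx[i] = ux[i] * shift;
        dy[i] = uy[i] * shift;
    }
}

/* Average the summed probabilities over n_passes and return the
 * index of the largest average, or -1 on invalid input. */
static int tta_argmax(const float *prob_sum, int n_passes)
{
    if (prob_sum == NULL || n_passes <= 0)
        return -1;
    int best = 0;
    float best_avg = prob_sum[0] / (float)n_passes;
    for (int c = 1; c < N_CLASSES; ++c) {
        float avg = prob_sum[c] / (float)n_passes;
        if (avg > best_avg) {
            best = c;
            best_avg = avg;
        }
    }
    return best;
}
```

Given the class numbering in the Returns entry, a non-negative result maps to a letter as 'A' + index.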

◆ resolve_model()

int resolve_model ( SolveArgs * args)
static

Ensure args->model_path is filled: auto-detect if not specified.

Parameters
  args    SolveArgs with possibly empty model_path.
Returns
0 if model_path is valid, -1 if no model could be found.
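The selection rule ("most recent .bin") can be sketched on an in-memory list of candidates. This assumes "most recent" means the latest modification time, which the reference does not state; ModelCandidate and both function names are hypothetical, and the directory scan itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical directory entry: file name plus modification time. */
typedef struct {
    const char *name;
    time_t      mtime;
} ModelCandidate;

/* True if name ends in ".bin" and has at least one character before it. */
static int has_bin_suffix(const char *name)
{
    size_t len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".bin") == 0;
}

/* Return the index of the most recently modified .bin candidate,
 * or -1 if there is none. */
static int pick_latest_model(const ModelCandidate *c, size_t n)
{
    assert(c != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (!has_bin_suffix(c[i].name))
            continue;
        if (best < 0 || c[i].mtime > c[best].mtime)
            best = (int)i;
    }
    return best;
}
```

On a POSIX system the candidate list could be filled with opendir()/readdir() and stat(); a return value of -1 corresponds to the "no model could be found" case above.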

◆ search_words()

void search_words ( const CharGrid * grid,
char * words )
static

Search the grid for every word in the comma-separated list words.

Prints each result to stdout in the format:

  WORD: (r0,c0) → (r1,c1) DIRECTION

or:

  WORD: not found

Parameters
  grid     Recognised character grid.
  words    Comma-separated word list (modified in-place by strtok).
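The tokenise-and-search loop can be sketched as below. The actual search lives in src/solver/solver.h and is not shown in this reference, so the brute-force eight-direction scan here is a swapped-in illustration. SketchGrid (a row-major stand-in for CharGrid), Match, the compass direction labels and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical row-major grid standing in for CharGrid. */
typedef struct { int rows, cols; const char *cells; } SketchGrid;
typedef struct { int r0, c0, r1, c1, dir; } Match;

static const int DR[8] = { 0, 1, 1,  1,  0, -1, -1, -1 };
static const int DC[8] = { 1, 1, 0, -1, -1, -1,  0,  1 };
static const char *const DIR_NAME[8] =
    { "E", "SE", "S", "SW", "W", "NW", "N", "NE" };  /* hypothetical labels */

/* Brute-force scan: try every start cell and direction. Returns 1 on a hit. */
static int find_word(const SketchGrid *g, const char *word, Match *m)
{
    int len = (int)strlen(word);
    if (len == 0)
        return 0;
    for (int r = 0; r < g->rows; ++r)
        for (int c = 0; c < g->cols; ++c)
            for (int d = 0; d < 8; ++d) {
                int er = r + DR[d] * (len - 1);
                int ec = c + DC[d] * (len - 1);
                if (er < 0 || er >= g->rows || ec < 0 || ec >= g->cols)
                    continue;  /* word would run off the grid */
                int k = 0;
                while (k < len &&
                       g->cells[(r + DR[d] * k) * g->cols + (c + DC[d] * k)] == word[k])
                    ++k;
                if (k == len) {
                    m->r0 = r; m->c0 = c; m->r1 = er; m->c1 = ec; m->dir = d;
                    return 1;
                }
            }
    return 0;
}

/* Split words in place with strtok, print one line per word,
 * and return how many words were found. */
static int report_words(const SketchGrid *g, char *words)
{
    assert(g != NULL && words != NULL);
    int found = 0;
    for (char *w = strtok(words, ","); w != NULL; w = strtok(NULL, ",")) {
        Match m;
        if (find_word(g, w, &m)) {
            printf("%s: (%d,%d) → (%d,%d) %s\n",
                   w, m.r0, m.c0, m.r1, m.c1, DIR_NAME[m.dir]);
            ++found;
        } else {
            printf("%s: not found\n", w);
        }
    }
    return found;
}
```

Because strtok writes NUL bytes over the commas, the word list must be a writable buffer (for example an argv element or a char array), not a string literal.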

◆ usage()

void usage ( const char * prog)
static

Print usage information to stderr.

Parameters
  prog    Program name (argv[0]).