|
| static void | usage (const char *prog) |
| | Print usage information to stderr.
|
| static int | parse_args (int argc, char **argv, SolveArgs *args) |
| | Parse argv into a SolveArgs structure.
|
| static int | resolve_model (SolveArgs *args) |
| | Ensure args->model_path is filled: auto-detect if not specified.
|
| static void | forward_region (const Image *gray_img, int x1, int y1, int x2, int y2, CNN *net, float *probs_out) |
| | Run one forward pass on a region of the grayscale image.
|
| static int | recognise_cell (const Image *gray_img, const BoundingBox *box, int cell_size, CNN *net) |
| | Predict a cell using Test-Time Augmentation (TTA).
|
| static void | search_words (const CharGrid *grid, char *words) |
| | Search the grid for every word in the comma-separated list words.
|
| int | main (int argc, char **argv) |
Entry point for the OCR + crossword-solver binary.
Usage:
./solve --image <path> [--model <path>] [--words <word1,word2,...>] [-v]
Options:
- --image <path> (required) Input crossword image.
- --model <path> (optional) Model file to load. If absent, the most recent .bin file in models/ is used.
- --words <word1,word2,...> (optional) Comma-separated list of words to search for.
- --verbose / -v Print per-cell recognition details and the grid.
Pipeline:
- Load model.
- Load and preprocess the image (grayscale, binarize).
- Segment letter cells.
- Recognise each cell with the CNN.
- Build a character grid.
- Solve the word search (if --words is given).
Exit codes:
- 0 Success.
- 1 Argument error.
- 2 Model error (not found / cannot load).
- 3 Image load error.
- 4 Segmentation error.
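The options and exit codes listed for main() could be handled roughly as follows. This is a minimal sketch using getopt_long(), not the binary's actual parser: SolveArgsSketch, its field names, the SOLVE_EXIT_* constants, and the convention of returning an exit code from the parser are all assumptions made for illustration.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-in for SolveArgs; the real fields are not documented here. */
typedef struct {
    const char *image_path;  /* --image (required) */
    const char *model_path;  /* --model (optional) */
    const char *words;       /* --words (optional, comma-separated) */
    int verbose;             /* --verbose / -v */
} SolveArgsSketch;

/* Exit codes as documented for main(); the constant names are assumptions. */
enum {
    SOLVE_EXIT_OK = 0,
    SOLVE_EXIT_ARGS = 1,
    SOLVE_EXIT_MODEL = 2,
    SOLVE_EXIT_IMAGE = 3,
    SOLVE_EXIT_SEGMENT = 4
};

/* Returns SOLVE_EXIT_OK on success, SOLVE_EXIT_ARGS on any argument error. */
static int parse_args_sketch(int argc, char **argv, SolveArgsSketch *args)
{
    static const struct option long_opts[] = {
        {"image",   required_argument, NULL, 'i'},
        {"model",   required_argument, NULL, 'm'},
        {"words",   required_argument, NULL, 'w'},
        {"verbose", no_argument,       NULL, 'v'},
        {NULL, 0, NULL, 0}
    };
    int opt;

    assert(args != NULL);
    args->image_path = NULL;
    args->model_path = NULL;
    args->words = NULL;
    args->verbose = 0;

    optind = 1;  /* restart scanning so the function can be called repeatedly */
    opterr = 0;  /* suppress getopt's own diagnostics; the caller prints usage */

    while ((opt = getopt_long(argc, argv, "v", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'i': args->image_path = optarg; break;
        case 'm': args->model_path = optarg; break;
        case 'w': args->words = optarg; break;
        case 'v': args->verbose = 1; break;
        default:  return SOLVE_EXIT_ARGS;  /* unknown option or missing value */
        }
    }
    /* --image is the only required option. */
    return args->image_path ? SOLVE_EXIT_OK : SOLVE_EXIT_ARGS;
}
```

With this shape, main() would call usage() and return 1 whenever the parser reports an argument error, and leave model_path as NULL for resolve_model() to fill in.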
| static void forward_region (const Image *gray_img, int x1, int y1, int x2, int y2, CNN *net, float *probs_out) |
Run one forward pass on a region of the grayscale image.
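Extracts the region [x1,x2) × [y1,y2) from gray_img (grayscale values stored as RGBA), binarizes the region locally, resizes it to 28×28, and runs cnn_forward(). The CNN output probabilities are added to probs_out rather than overwriting it, so the caller can accumulate several passes.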
- Parameters
-
| gray_img | Full grayscale image (RGBA). |
| x1, y1, x2, y2 | Region bounds (clamped to image boundaries internally). |
| net | Trained CNN. |
| probs_out | Array of CNN_N_CLASSES floats; the result is added to its existing contents. |
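The preprocessing that forward_region describes (clamp, local binarization, resize to 28×28, accumulate) can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: ImageSketch and its layout, the mean-value threshold, the nearest-neighbour resize, and the "bright pixel maps to 1.0" polarity are all hypothetical choices, and the call to cnn_forward() is left out because its signature is not documented here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_INPUT 28  /* CNN input side length, per the description above */

/* Hypothetical image layout: 4 bytes per pixel, gray value in the first byte. */
typedef struct {
    int width, height;
    const unsigned char *rgba;
} ImageSketch;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Fill out[28*28] with a binarized, resized copy of [x1,x2) x [y1,y2).
 * Returns 0 on success, -1 if the region is empty after clamping. */
static int region_to_input(const ImageSketch *img, int x1, int y1, int x2, int y2,
                           float out[SKETCH_INPUT * SKETCH_INPUT])
{
    long sum = 0;
    float thr;
    int w, h, x, y, ox, oy;

    assert(img != NULL && img->rgba != NULL && out != NULL);
    x1 = clamp_int(x1, 0, img->width);
    x2 = clamp_int(x2, 0, img->width);
    y1 = clamp_int(y1, 0, img->height);
    y2 = clamp_int(y2, 0, img->height);
    w = x2 - x1;
    h = y2 - y1;
    if (w <= 0 || h <= 0)
        return -1;

    /* Local threshold: mean gray level of the region (an assumed choice). */
    for (y = y1; y < y2; y++)
        for (x = x1; x < x2; x++)
            sum += img->rgba[4 * ((size_t)y * img->width + x)];
    thr = (float)sum / (float)(w * h);

    /* Nearest-neighbour resize combined with binarization. */
    for (oy = 0; oy < SKETCH_INPUT; oy++) {
        int sy = y1 + oy * h / SKETCH_INPUT;
        for (ox = 0; ox < SKETCH_INPUT; ox++) {
            int sx = x1 + ox * w / SKETCH_INPUT;
            unsigned char v = img->rgba[4 * ((size_t)sy * img->width + sx)];
            out[oy * SKETCH_INPUT + ox] = (float)v > thr ? 1.0f : 0.0f;
        }
    }
    return 0;
}

/* Add one pass's probabilities into the running total, as probs_out expects. */
static void accumulate_probs(float *probs_out, const float *probs, int n)
{
    int i;
    for (i = 0; i < n; i++)
        probs_out[i] += probs[i];
}
```

Accumulating instead of overwriting means the caller has to zero probs_out before the first pass and divide by the number of passes afterwards if it wants an average.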
| static int recognise_cell (const Image *gray_img, const BoundingBox *box, int cell_size, CNN *net) |
Predict a cell using Test-Time Augmentation (TTA).
Runs TTA_N_CROPS forward passes with slightly different crop origins (the original crop + 4 shifts of ±shift pixels in x and y), averages the softmax outputs, and returns the argmax class.
The crop window is grid-aware: it is centred on the letter's bounding-box centre and sized to cell_size × cell_size (the detected grid pitch). This guarantees a consistent white border around the letter regardless of how tight the connected-component bounding box is.
- Parameters
-
| gray_img | Full grayscale image. |
| box | Tight bounding box from segmentation. |
| cell_size | Full grid cell side length (pixels). Pass 0 to fall back to the padding-fraction heuristic. |
| net | Trained CNN. |
- Returns
- Best class index (0='A'…25='Z'), or -1 on error.
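The test-time augmentation loop described for recognise_cell can be sketched as follows. Several details are assumptions for illustration only: the BoxSketch layout, the forward_fn callback that stands in for forward_region, the axis-aligned shift pattern (left, right, up, down), and the SKETCH_* constants. The padding-fraction fallback for cell_size 0 is not reproduced; the sketch simply returns -1 in that case.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_N_CLASSES 26   /* 'A'..'Z', per the documented return value */
#define SKETCH_TTA_N_CROPS 5  /* original crop + 4 shifted crops */

/* Hypothetical bounding box layout: half-open [x1,x2) x [y1,y2). */
typedef struct { int x1, y1, x2, y2; } BoxSketch;

/* Stand-in for forward_region: must ADD class probabilities into probs_out. */
typedef void (*forward_fn)(int x1, int y1, int x2, int y2,
                           float *probs_out, void *ctx);

static int tta_predict(const BoxSketch *box, int cell_size, int shift,
                       forward_fn forward, void *ctx)
{
    /* Assumed shift pattern: centre, left, right, up, down. */
    static const int dx[SKETCH_TTA_N_CROPS] = {0, -1, 1, 0, 0};
    static const int dy[SKETCH_TTA_N_CROPS] = {0, 0, 0, -1, 1};
    float probs[SKETCH_N_CLASSES] = {0};
    int cx, cy, i, best = 0;

    if (box == NULL || forward == NULL || cell_size <= 0)
        return -1;

    /* Grid-aware window: centred on the box centre, one grid pitch wide. */
    cx = (box->x1 + box->x2) / 2;
    cy = (box->y1 + box->y2) / 2;
    for (i = 0; i < SKETCH_TTA_N_CROPS; i++) {
        int x1 = cx - cell_size / 2 + dx[i] * shift;
        int y1 = cy - cell_size / 2 + dy[i] * shift;
        forward(x1, y1, x1 + cell_size, y1 + cell_size, probs, ctx);
    }

    /* The argmax of the summed outputs equals the argmax of their average. */
    for (i = 1; i < SKETCH_N_CLASSES; i++)
        if (probs[i] > probs[best])
            best = i;
    return best;
}

/* Stub forward pass for demonstration: always favours class 2 ('C') and
 * counts how often it was called through ctx (an int *). */
static void stub_forward(int x1, int y1, int x2, int y2,
                         float *probs_out, void *ctx)
{
    (void)x1; (void)y1; (void)x2; (void)y2;
    assert(probs_out != NULL && ctx != NULL);
    probs_out[2] += 0.9f;
    probs_out[7] += 0.1f;
    ++*(int *)ctx;
}
```

Calling tta_predict(&box, 28, 2, stub_forward, &calls) runs the stub five times and returns 2. Because the crop is sized from the grid pitch and not from the tight bounding box, narrow letters such as 'I' keep the same white margin as wide ones.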