This repository contains the implementation of a CLIP-Large-based multimodal framework for cover-based book genre prediction. The model uses book cover images, book titles, and OCR text as inputs.
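The description above does not say how the three inputs are combined, so the following is only a minimal sketch of one common option: late fusion by concatenating per-input embeddings and passing them to a linear classifier. The `GENRES` labels, the toy embedding size `DIM`, the random weights, and the helper names (`fuse`, `predict_genre`) are all hypothetical. In practice the embeddings would come from the CLIP image and text encoders rather than a random generator.

```python
import math
import random

GENRES = ["fantasy", "romance", "thriller"]  # hypothetical label set
DIM = 4  # toy embedding size; real CLIP embeddings are much larger


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(cover_emb, title_emb, ocr_emb):
    """Late fusion (an assumption): normalise each embedding, then concatenate."""
    return l2_normalize(cover_emb) + l2_normalize(title_emb) + l2_normalize(ocr_emb)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(fused, weights, biases):
    """Apply a linear layer plus softmax and return (label, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENRES[best], probs


# Stand-in embeddings and untrained weights, for illustration only.
rng = random.Random(0)
cover = [rng.gauss(0, 1) for _ in range(DIM)]  # would be the cover image embedding
title = [rng.gauss(0, 1) for _ in range(DIM)]  # would be the title text embedding
ocr = [rng.gauss(0, 1) for _ in range(DIM)]    # would be the OCR text embedding
weights = [[rng.gauss(0, 0.1) for _ in range(3 * DIM)] for _ in GENRES]
biases = [0.0] * len(GENRES)

fused = fuse(cover, title, ocr)
label, probs = predict_genre(fused, weights, biases)
print(label, [round(p, 3) for p in probs])
```

Concatenation keeps each input's contribution separate and lets the classifier learn its own weighting. Averaging or attention-based fusion are alternatives that this sketch does not cover.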
AI-powered image analysis: extract text, detect objects, describe scenes, summarise content, and analyse tone, all from a single image upload. It runs fully on CPU.