PROMPT = (
    "Analyze the video and produce a structured annotation for perceptual quality, visual complexity, and basic scene context. "
    "Focus primarily on observable visual properties such as motion, texture, color, and quality. "
    "Prefer accurate and stable labels over detailed but uncertain descriptions. "
    "Keep semantic description minimal and only include clearly visible information. "
    "Return ONLY valid JSON (no markdown, no code fences, no extra text) with exactly these keys and value types:\n"
    "{\n"
    ' "summary": string,'
    ' "dense_caption": string,'
    ' "motion_level": string,'
    ' "motion_variability": string,'
    ' "texture_complexity": string,'
    ' "edge_density": string,'
    ' "color_richness": string,'
    ' "saturation_level": string,'
    ' "brightness_level": string,'
    ' "contrast_level": string,'
    ' "sharpness_level": string,'
    ' "noise_level": string,'
    ' "compression_artifacts": string,'
    ' "color_banding": string,'
    ' "overall_quality": integer,'
    ' "aesthetic_quality": integer,'
    ' "scene_complexity": string,'
    ' "camera_motion": string,'
    ' "spatial_scale": string,'
    ' "content_difficulty": string,'
    ' "environment": string,'
    ' "location_type": string,'
    ' "video_theme": string,'
    ' "visual_style": string,'
    ' "camera_viewpoint": string,'
    ' "scene_change": boolean,'
    ' "main_objects": array of strings,'
    ' "main_objects_count": string,'
    ' "contains_people": boolean,'
    ' "people_count": string,'
    ' "contains_text_overlay": boolean'
    "\n}\n"
    "Field requirements:\n"
    "- summary: one concise sentence, maximum 30 words; include the main action, scene, key objects, and camera viewpoint only if clearly visible.\n"
    "- dense_caption: 1 to 3 short sentences describing the visible scene and main event; do not repeat unnecessary details.\n"
    "- motion_level: one of [low, medium, high].\n"
    "- motion_variability: one of [consistent, varying, chaotic].\n"
    "- texture_complexity: one of [low, medium, high].\n"
    "- edge_density: one of [low, medium, high].\n"
    "- scene_complexity: one of [simple, moderate, complex].\n"
    "- color_richness: one of [low, medium, high].\n"
    "- saturation_level: one of [low, medium, high].\n"
    "- brightness_level: one of [dark, normal, bright].\n"
    "- contrast_level: one of [low, medium, high].\n"
    "- sharpness_level: one of [blurry, moderate, sharp].\n"
    "- noise_level: one of [low, medium, high].\n"
    "- compression_artifacts: one of [none, mild, noticeable, severe].\n"
    "- color_banding: one of [none, mild, noticeable, severe].\n"
    "- overall_quality: integer from 1 (very poor) to 5 (excellent).\n"
    "- aesthetic_quality: integer from 1 (very low) to 5 (very high).\n"
    "- camera_motion: one of [static, low, moderate, high, unstable].\n"
    "- spatial_scale: one of [close, medium, wide].\n"
    "- content_difficulty: one of [low, medium, high].\n"
    "- environment: one of [indoor, outdoor, mixed, \"\"].\n"
    "- location_type: short noun phrase (e.g., kitchen, street, beach, office); \"\" if unclear.\n"
    "- video_theme: short label (e.g., sports, travel, pets, urban, nature); \"\" if unclear.\n"
    "- visual_style: short phrase (e.g., natural color, low-light, cinematic, high saturation); \"\" if unclear.\n"
    "- camera_viewpoint: one of [eye-level, overhead, close, wide, \"\"].\n"
    "- scene_change: true if a scene cut occurs, else false.\n"
    "- main_objects: array of visible objects; [] if unclear.\n"
    "- main_objects_count: integer written as a string, matching the length of main_objects.\n"
    "- contains_people: true or false.\n"
    "- people_count: one of [0, 1, 2, few, many, \"\"].\n"
    "- contains_text_overlay: true if visible text (subtitles, watermark, UI), else false.\n"
    "Important rules:\n"
    "- Use only the provided categories.\n"
    "- Do NOT use the strings 'unknown' or 'unclear', or the value null.\n"
    "- Do NOT add explanations.\n"
    "- Do NOT output extra text.\n"
    "- Use empty string \"\" only for fields that explicitly allow it; if such a field is unclear, return \"\".\n"
    "- If main_objects is unclear, return [].\n"
    "- If a boolean field is unclear, choose false.\n"
    "- Keep labels consistent and simple.\n"
    "- Output must be strict valid JSON with exactly the listed keys and no additional keys."
)
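

# --- Illustrative sketch, not part of the original prompt definition ---
# The prompt above asks for strict JSON with a fixed key set, fixed value types,
# and closed category lists. One possible way to check a model reply against
# those rules is sketched below. The names LOW_MED_HIGH, NONE_TO_SEVERE,
# ENUM_FIELDS, FREE_TEXT_FIELDS, BOOL_FIELDS, SCORE_FIELDS, EXPECTED_KEYS and
# validate_annotation are hypothetical helpers introduced for illustration; only
# the field names and allowed values are taken from the prompt text.
import json

LOW_MED_HIGH = {"low", "medium", "high"}
NONE_TO_SEVERE = {"none", "mild", "noticeable", "severe"}

# Categorical fields and the values the prompt allows for each.
ENUM_FIELDS = {
    "motion_level": LOW_MED_HIGH,
    "motion_variability": {"consistent", "varying", "chaotic"},
    "texture_complexity": LOW_MED_HIGH,
    "edge_density": LOW_MED_HIGH,
    "scene_complexity": {"simple", "moderate", "complex"},
    "color_richness": LOW_MED_HIGH,
    "saturation_level": LOW_MED_HIGH,
    "brightness_level": {"dark", "normal", "bright"},
    "contrast_level": LOW_MED_HIGH,
    "sharpness_level": {"blurry", "moderate", "sharp"},
    "noise_level": LOW_MED_HIGH,
    "compression_artifacts": NONE_TO_SEVERE,
    "color_banding": NONE_TO_SEVERE,
    "camera_motion": {"static", "low", "moderate", "high", "unstable"},
    "spatial_scale": {"close", "medium", "wide"},
    "content_difficulty": LOW_MED_HIGH,
    "environment": {"indoor", "outdoor", "mixed", ""},
    "camera_viewpoint": {"eye-level", "overhead", "close", "wide", ""},
    "people_count": {"0", "1", "2", "few", "many", ""},
}
FREE_TEXT_FIELDS = ("summary", "dense_caption", "location_type", "video_theme", "visual_style")
BOOL_FIELDS = ("scene_change", "contains_people", "contains_text_overlay")
SCORE_FIELDS = ("overall_quality", "aesthetic_quality")
EXPECTED_KEYS = (
    set(ENUM_FIELDS) | set(FREE_TEXT_FIELDS) | set(BOOL_FIELDS) | set(SCORE_FIELDS)
    | {"main_objects", "main_objects_count"}
)


def validate_annotation(raw):
    """Return a list of problems found in a raw model reply; an empty list means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    # Key set must match exactly: nothing missing, nothing extra.
    errors = [f"missing key: {k}" for k in sorted(EXPECTED_KEYS - set(data))]
    errors += [f"unexpected key: {k}" for k in sorted(set(data) - EXPECTED_KEYS)]

    for key, allowed in ENUM_FIELDS.items():
        value = data.get(key)
        # The isinstance check runs first so unhashable values never reach the set lookup.
        if key in data and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{key}: value not in the allowed categories")
    for key in FREE_TEXT_FIELDS:
        if key in data and not isinstance(data[key], str):
            errors.append(f"{key}: expected a string")
    for key in BOOL_FIELDS:
        if key in data and not isinstance(data[key], bool):
            errors.append(f"{key}: expected a boolean")
    for key in SCORE_FIELDS:
        value = data.get(key)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if key in data and (isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5):
            errors.append(f"{key}: expected an integer from 1 to 5")

    if "main_objects" in data:
        objs = data["main_objects"]
        if not isinstance(objs, list) or not all(isinstance(o, str) for o in objs):
            errors.append("main_objects: expected an array of strings")
        elif data.get("main_objects_count") != str(len(objs)):
            errors.append("main_objects_count: must be the length of main_objects, as a string")

    summary = data.get("summary")
    if isinstance(summary, str) and len(summary.split()) > 30:
        errors.append("summary: more than 30 words")
    return errors


# Possible usage, where `reply_text` stands for the raw text returned by the model:
#     problems = validate_annotation(reply_text)
#     if problems:
#         ...  # e.g. log the problems and retry the request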