[NAVER Cloud] Vision Reinforcement Learning LLM Development (Experienced Hire)
Responsibilities
Complete Multi-modal RL (RLVR, RLHF) training within the model production process and achieve SOTA-level performance on reasoning-related target benchmarks
• Develop and experiment with Vision RL training environments using hyperscale GPU resources (a GPU cluster interconnected via InfiniBand) together with VERL and VeOmni
• Explore target scenarios and design rewards for RLVR; process and acquire the related training data
• Train and manage Reward Models (RM) for RLHF; run reward-based policy optimization experiments
• Run Multi-modal RL training and ablations, and automate model production, on hyperscale GPU resources (a GPU cluster interconnected via InfiniBand)
• Research ways to minimize forgetting of the LLM's text performance and existing vision performance while improving reasoning ability

Qualifications
• A PhD, or at least 2 years of relevant work experience
• Detailed understanding of the architecture and training process of Vision Language Models (LLaVA, Qwen VL, DeepSeek VL, etc.)
• Hands-on, code-level experience
• Foundational knowledge of RL (RLHF, DPO, PPO, GRPO, etc.) and practical training experience
• Understanding of the characteristics and types of data used in Vision RL
• Proficiency with Python and with LLM development libraries, frameworks, and platforms (PyTorch, Hugging Face)