[NAVER Cloud] Vision Reinforcement Learning LLM Development (Experienced Hire)
Responsibilities
Complete Multi-modal RL (RLVR, RLHF) training within the model production pipeline and achieve SOTA-level performance on reasoning-related target benchmarks
• Develop and experiment with Vision RL training environments using hyperscale GPU resources (a GPU cluster interconnected via InfiniBand) together with VERL and VeOmni
• Explore target scenarios and design rewards for RLVR; process and secure the related training data
• Train and manage reward models (RM) for RLHF; run reward-based policy optimization experiments
• Run Multi-modal RL training and ablations, and automate model production, on hyperscale GPU resources (a GPU cluster interconnected via InfiniBand)
• Research ways to improve reasoning ability while minimizing forgetting of the LLM's text performance and existing vision performance
Qualifications
• A Ph.D. or at least 2 years of relevant work experience
• Detailed understanding of the architecture and training process of Vision Language Models (LLaVA, Qwen VL, DeepSeek VL, etc.)
• Hands-on experience at the code level
• Foundational knowledge of RL (RLHF, DPO, PPO, GRPO, etc.) and practical training experience
• Understanding of the characteristics and types of data used in Vision RL
• Proficiency with Python and with LLM development libraries, frameworks, and platforms (PyTorch, Hugging Face)