[NAVER Cloud] HyperCLOVA X Multimodal Model Recipe (Experiential Internship)
Responsibilities
Note: Depending on the project assigned during the internship, you will gain hands-on experience focused on some of the tasks below.
• Run the periodic large-scale training process for Vision Language Models
• Support new modalities such as video and additional scenarios such as Computer-Use
• Secure data for the Vision MoE and Vision-RLHF processes
• Design and evaluate datasets across all stages of Multimodal LLM training
• Investigate how Multimodal LLM data from a broad range of domains and tasks influence one another and affect final performance
• Secure scenario-specific data and increase its impact on final model performance
• Develop multimodal data filtering methods, and train and analyze models with them
• Develop curation methods for finding the optimal data recipe, and train and analyze models with them