[NAVER Cloud] HyperCLOVA X Multimodal Model Recipe (Experiential Internship)
Responsibilities
※ Depending on the project assigned during the internship, you will gain hands-on experience centered on some of the tasks below.
• Carry out periodic large-scale training runs of the Vision Language Model
• Support new modalities such as video and additional scenarios such as Computer-Use
• Secure data for the Vision MoE and Vision-RLHF processes
• Design and evaluate datasets spanning every stage of Multimodal LLM training
• Investigate how Multimodal LLM data across a wide range of domains and tasks influence one another and affect final performance
• Secure scenario-specific data and increase its contribution to final model performance
• Develop multimodal data filtering methods, then train and analyze models
• Develop curation methods for finding the optimal data recipe, then train and analyze models