CV
A detailed curriculum vitae is available here 📄
Publications
Preprints
- Shintaro Ozaki, Kazuki Hayashi, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. “BQA: Body Language Question Answering Dataset for Video Large Language Models” [arXiv]
International Conference
- Yusuke Ide, Yuto Nishida, Miyu Oba, Yusuke Sakai, Justin Vasselli, Hidetaka Kamigaito, Taro Watanabe. “How to Make the Most of LLMs’ Grammatical Knowledge for Acceptability Judgments.” Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-2025, Main, long), 2025/05. [paper|arXiv]
- Akari Haga, Akiyo Fukatsu, Miyu Oba, Arianna Bisazza, Yohei Oseki. “BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency.” The BabyLM Challenge at the 28th Conference on Computational Natural Language Learning, 2024/11. [Outstanding Paper Award] [paper|arXiv]
- Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara. “Can Language Models Induce Grammatical Knowledge from Indirect Evidence?” Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP-2024, Main, long), 2024/11 [paper|arXiv]
- Akari Haga, Saku Sugawara, Akiyo Fukatsu, Miyu Oba, Hiroki Ouchi, Taro Watanabe, Yohei Oseki. “Modeling Overregularization in Children with Small Language Models.” Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL-2024, Findings, long), 2024/08. [paper|arXiv]
- Miyu Oba, Akari Haga, Akiyo Fukatsu, Yohei Oseki. “BabyLM Challenge: Curriculum learning based on sentence complexity approximating language acquisition”, the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, 2023/12. [paper|arXiv]
- Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe. “Second Language Acquisition of Neural Language Models.” Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL-2023, Findings, long), 2023/07. [paper|arXiv]
Journal
- Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe. “Second Language Acquisition of Language Models” (言語モデルの第二言語獲得). Journal of Natural Language Processing (自然言語処理, domestic journal), Volume 31, Number 2, pp. 433-455, 2024/06. [paper]
Domestic Conference
- Akari Haga, Akiyo Fukatsu, Miyu Oba, Arianna Bisazza, Yohei Oseki. “The Effect of Variation Sets in Language Model Pre-training” (言語モデルの事前学習におけるバリエーションセットの効果). The 31st Annual Meeting of the Association for Natural Language Processing, 4 pages, 2025/03. [paper]
- Shintaro Ozaki, Kazuki Hayashi, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. “Do Multimodal Large Language Models Understand Non-verbal Communication?” (マルチモーダル大規模言語モデルは非言語コミュニケーションを理解しているか?). The 19th YANS Symposium (NLP若手の会), 2024/09. [Encouragement Award (奨励賞) awarded to 1st author]
- Yusuke Ide, Yuto Nishida, Miyu Oba, Yusuke Sakai, Justin Vasselli, Taro Watanabe, Hidetaka Kamigaito. “Exploring Acceptability Judgment Methods Suited to Large Language Models” (大規模言語モデルに適した容認性判断手法の検討). The 260th Meeting of IPSJ SIG Natural Language Processing, 8 pages, 2024/06. [Young Researcher Award (若手奨励賞) awarded to 1st author] [paper]
- Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara. “Analysis of Indirect Positive Evidence in Evaluating the Grammatical Knowledge of Language Models” (言語モデルの文法知識評価における間接肯定証拠の分析). The 30th Annual Meeting of the Association for Natural Language Processing, 4 pages, 2024/03. [paper]
- Akari Haga, Saku Sugawara, Akiyo Fukatsu, Miyu Oba, Hiroki Ouchi, Taro Watanabe, Yohei Oseki. “Modeling Children’s Overregularization with Small Language Models” (小規模言語モデルによる子供の過剰一般化のモデリング). The 30th Annual Meeting of the Association for Natural Language Processing, 4 pages, 2024/03. [paper]
- Miyu Oba, Akari Haga, Akiyo Fukatsu, Yohei Oseki. “Curriculum Learning Based on Sentence Complexity Mimicking the Language Acquisition Process” (言語獲得過程を模倣した文の複雑さに基づくカリキュラム学習). The 18th YANS Symposium (NLP若手の会), 2023/08.
- Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe. “Second Language Acquisition of Language Models” (言語モデルの第二言語獲得). The 29th Annual Meeting of the Association for Natural Language Processing, 4 pages, 2023/03. [Young Researcher Award (若手奨励賞)] [paper]
- Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe. “Second Language Acquisition Efficiency of Language Models” (言語モデルの第二言語獲得効率). The 254th Meeting of IPSJ SIG Natural Language Processing, 6 pages, 2022/11. [IPSJ Yamashita SIG Research Award (山下記念研究賞), Best Paper Award (優秀研究賞)] [paper]
- Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe. “Second Language Acquisition Efficiency of Language Models” (言語モデルの第二言語獲得効率). The 17th YANS Symposium (NLP若手の会), 2022/08. [Encouragement Award (奨励賞)]
Education
- 2024/04-present: Doctor of Engineering, Division of Information Science, Nara Institute of Science and Technology
- Research on natural language processing, computational (psycho)linguistics
- 2023/04-2024/03: Master of Engineering, Division of Information Science, Nara Institute of Science and Technology
- Research on natural language processing, computational (psycho)linguistics
- 2018/04-2022/03: Bachelor of Foreign Studies, Department of French Studies, Nanzan University
- Study of linguistics and French culture
- 2015/04-2018/03: High School Diploma, Meiwa High School
Experiences
- 2024/10-present: Guest Researcher
- University of Göttingen, Germany
- Supervisor: Lisa Beinborn
- Modeling second language (L2) learners’ sentence processing
- 2023/04-present: Research Assistant
- National Institute of Informatics
- Supervisor: Saku Sugawara
- Surveying papers on linguistics and cognitive science related to NLP
- Investigating language acquisition in language models inspired by human language acquisition.
- 2022/08-2023/01: Natural Language Processing R&D Engineer
- Trustworthy AI team, LINE Corp.
- Worked on ethical considerations in NLP.
- Surveyed and implemented evaluation methods for fairness in language models.
- 2020/06-2022/03: Data Scientist
- ROX Inc.
- Focused on demand forecasting and data analysis in the logistics and retail sectors.
- Developed applications leveraging forecast data for business insights.
Grants
- 2025/04-present: Research Fellowship for Young Scientists by Japan Society for the Promotion of Science (PhD Fellowship; DC2)
- 2024/11: Scholarship for Studying Abroad by the Association for Natural Language Processing
- 2024/04-present: NAIST Granite Program (PhD Fellowship; JST SPRING)
- 2022/04-2024/03: JASSO Scholarship: Full Repayment exemption due to outstanding achievements (特に優れた業績による全額返済免除)
- 2023/07: ACL SRW Travel Grants
Activities/Talks/Interviews
- 2024/08: Introduced the paper “Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs” at 最先端NLP勉強会 [slides]
- 2023/08: Introduced the paper “How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases” at 最先端NLP勉強会 [slides]
- 2023/03: Graduate student interview with Gender Equality Promotion Office [page]
- 2023/03: Talk “Second Language Acquisition of Language Models: How the Research Started and Where It Is Headed” (言語モデルの第二言語獲得 -研究のきっかけとこれから-) at the workshop “Computational Linguistics in the Era of Deep Learning” (深層学習時代の計算言語学), the 29th Annual Meeting of the Association for Natural Language Processing. [page]
- 2021/12: Fundamental Information Technology Engineer Examination (FE; 基本情報技術者試験)
- 2020/12: JPHACKS2020 (Domestic Hackathon) [ Finalists, NTT Resonant Award & Studio Arcana Award ]
- 2020/11: Call for Code 2020 (International Hackathon) [ Regional Finalists (top 4 in Japan) ]
- 2020/09: TOEIC 855
- 2020/06: Build@Mercari (Software Engineer Training Program)
Reviewer
- 2024: ARR