Hello, I am Sotaro Takeshita. I am a second-year Ph.D. student at the University of Mannheim focusing on text summarization. I am interested in:

  • text summarization
  • scholarly document processing
  • information extraction
  • multilinguality in NLP models

I like reading (both papers and books) and programming. Check out my paper search system and its LLM-based extension, as well as my OSS projects. I speak Japanese (native), English (fluent), and Spanish (basic), and I am now learning German. For programming, I like to use (neo)vim to write Python. Feel free to get in touch by email (oh.sore.sore.soutarou at gmail.com).

Education

  • Sep. 2021 - present, Ph.D. student
    • The Data and Web Science Group, University of Mannheim
    • Advisor: Prof. Dr. Simone Paolo Ponzetto
  • Apr. 2013 - Mar. 2018, B.A. in Information Science
    • Faculty of Informatics and Engineering, A National University of Electro-Communications
    • Advisor: Minami Yasuhiro
  • Apr. 2018 - Mar. 2020, M.S. in Computer Science
    • Faculty of Informatics and Engineering, A National University of Electro-Communications
    • Advisor: Minami Yasuhiro

Experience

  • Feb. 2016 - Apr. 2016, Research Internship
    • Sección de Estudios de Posgrado e Investigación de ESIME Culhuacan, Instituto Politécnico Nacional
    • Advisor: Mariko Nakano Miyatake
    • funded by A National University of Electro-Communications
  • Aug. 2017 - Sep. 2017, Recruit Holdings, data scientist internship
    • Best work prize
  • Aug. 2018 - Sep. 2018, NTT Media Intelligence Lab, NLP research internship
    • Advisor: Dr. Ryuichiro Higashinaka
  • Nov. 2018 - Dec. 2018, Research Internship
    • The University of Campinas, Faculty of Electrical and Computer Engineering
    • Advisor: Prof. Dr. Eric Rohmer
    • funded by A National University of Electro-Communications
  • Apr. 2018 - Jun. 2021, BuildIt, data scientist
  • Sep. 2019, University of Mannheim, research visit
    • Advisor: Dr. Goran Glavaš
    • funded by A National University of Electro-Communications

Projects

  • GenGO
    • A paper exploration system built with NLP technologies.
  • GenGO Chat
    • An LLM-powered RAG system that supports literature search for NLP researchers.
  • The Token
    • “The open community for all NLP people”, where I contribute on the technical side.
  • NLP TLDRs
    • A list of major NLP conference proceedings with one-sentence summaries.
  • schnitsum
    • An easy-to-use Python package for generating summaries with state-of-the-art neural network models.
  • tofunlp/sister
    • A very simple, easy-to-use package for encoding sentences in various languages into vector representations.
  • sobamchan/pytorch-lightning-transformers
    • Clean, readable code for fine-tuning transformers with pytorch-lightning.