Shutong JIN

Hi there! I'm currently a PhD student at RPL, KTH Royal Institute of Technology, under the supervision of Assoc. Prof. Florian Pokorny (main supervisor) and Prof. Erik Elmroth (co-supervisor), funded by the Wallenberg AI, Autonomous Systems and Software Program (WASP). My research follows two directions:

  • Attention as a tool for zero-shot robotic manipulation.
  • Scaling behavior of learning-based methods, studied on CloudGripper, the large-scale cloud robotics platform we designed at KTH.

Current Research Interests: Attention Mechanisms, Zero-shot Robotic Manipulation, Vision/Video Transformers/Diffusion Models, Visuo-motor Control Policies.

[Contact] [GitHub] [Google Scholar]


Recent News

  • One paper was recently accepted to WACV 2025.
  • Our dataset CloudGripper-Push-1K, comprising 1.4 TB and 1278 hours of high-quality video in total, has been accepted to IROS 2024.

Feel free to contact me if you are interested in collecting another large-scale dataset with our cloud robotics platform!


Publications

PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement [Video]

Shutong Jin*, Ruiyu Wang*, Kuangyi Chen, Florian T. Pokorny

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.

Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies

Ruiyu Wang, Zheyu Zhuang, Shutong Jin, Nils Ingelhag, Danica Kragic, Florian T. Pokorny

Preprint.

CloudGripper-Push-1K: Understanding the Generalization Gap of Physics and Background Attributes for Robotic Manipulation

Shutong Jin, Ruiyu Wang, Muhammad Zahid and Florian T. Pokorny

IEEE/RSJ IROS 2024 Workshop on Collecting, Managing, and Utilizing Data through Embodied Robots.

Best Poster Award

CloudGripper-AutoGrasper: A Cloud Robotics Toolkit for Automatic Data Collection

Axel Kaliff, Shutong Jin, Muhammad Zahid and Florian T. Pokorny

IEEE/RSJ IROS 2024 Workshop on Collecting, Managing, and Utilizing Data through Embodied Robots.

RealCraft: Attention Control as A Tool for Zero-shot Consistent Video Editing

Shutong Jin, Ruiyu Wang and Florian T. Pokorny

Preprint.

How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing [Video]

Shutong Jin, Ruiyu Wang, Muhammad Zahid and Florian T. Pokorny

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

SectionKey: 3-D Semantic Point Cloud Descriptor for Place Recognition

Shutong Jin*, Zhenyu Wu*, Chunyang Zhao, Jun Zhang, Guohao Peng and Danwei Wang

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).


Education

PhD candidate: KTH Royal Institute of Technology (Sweden)
  • Major: PhD candidate in Computer Science (2022.11 ~ Present)

Master: Nanyang Technological University (Singapore)
  • Major: MSc in Computer Control & Automation (2021.08 ~ 2022.07) (GPA: 4.75/5)

Foundation Master: Ecole Centrale de Nantes (France)
  • Major: Foundation Master in Robotics & Image Processing (2020.09 ~ 2021.06) (GPA: 4.0/4.0)

Bachelor: Wuhan University (China)
  • Major: Bachelor in Electronic Information Engineering (2017.09 ~ 2021.06) (GPA: 87/100)


Teaching


Service

  • Reviewing Activities: IROS, WACV

  • Supervised Master's Student: Axel Kaliff