Dear Editor:
On behalf of my co-authors, I thank you very much for giving us the opportunity to revise our manuscript. We are grateful to the editor and reviewers for their positive and constructive comments and suggestions on our manuscript entitled "3DGAM: Using 3D Gesture and CAD Models for Training on Mixed Reality Remote Collaboration" (No.: MTAP-D-20-00850R1).
We have carefully studied the reviewers' comments and have made revisions, which are marked in red in the paper. We have tried our best to revise our manuscript according to the comments.
We would like to express our great appreciation to you and the reviewers for the comments on our paper.
Thank you and best regards.
Yours sincerely,
Peng Wang
List of Responses
Reviewer #1:
I would recommend that you would add 3 to 4 more related references from past recent editions of the ACM'UIST and ACM'CHI conferences to section "2. Related Works". You do have at least 1 reference from for each at this time but if would be great to add more up-to-date references from each as well.
Response:
Thanks for your constructive suggestions. We fully agree with you and have added four related references, as follows. The revised portion has been marked in red in the revised manuscript for easy tracking.
[2] Bai H, Prasanth S, Jing Y, Mark B (2020). A User Study on Mixed Reality Remote Collaboration with Eye Gaze and Hand Gesture Sharing. 1--13. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM CHI '20, April 25--30, 2020, Honolulu, HI, USA.
[4] Burova A, Mäkelä J, Hakulinen J, Keskinen T, Heinonen H, Siltanen S, Turunen M (2020) Utilizing VR and Gaze Tracking to Develop AR Solutions for Industrial Maintenance. (278):1--13. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM CHI '20, April 25--30, 2020, Honolulu, HI, USA.
[30] Villanueva A, Zhu Z, Liu Z, Peppler K, Redick T, Ramani K (2020) Meta-AR-App: An Authoring Platform for Collaborative Augmented Reality in STEM Classrooms. (2): 1--14. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM CHI '20, April 25--30, 2020, Honolulu, HI, USA.
[37] Wu T Y, Gong J, Seyed T, Yang X D (2019) Proxino: Enabling Prototyping of Virtual Circuits with Physical Proxies. UIST 2019 - Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (UIST): 121--32.
Reviewer #2:
1 Overall the text is well written, but it is recommended submitting it to a last proofreading as some phrases still present issues (example: "it is important to explore VR/AR/MR technologies can be used for remote collaboration on physical tasks", "Although they found that sharing 3D gesture has great potential in remote collaboration." or "However, as authors said that the prototype is too bulky for local workers.").
Response:
Thanks for your constructive suggestions. We have carefully proofread the paper and corrected these issues; the revised sentences are as follows. The revised portion has been marked in red in the revised manuscript for easy tracking.
it is important to explore VR/AR/MR technologies that can be used for remote collaboration on physical tasks.
Although they found that sharing 3D gesture has great potential in remote collaboration for providing operational instructions, the prototype system could provide only four different types of gestures.
However, the prototype system is too bulky for local workers.
2 Finally I recommend to include a discussion in session 7 addressing possible bias in the formal user study, as the lack of gestures in the control group also could mean less communication capability.
Response:
Thanks for your constructive suggestions. We have adjusted the structure of the paper. Section 7 is now the discussion, and we address this point there. The revised portion has been marked in red in the revised manuscript for easy tracking.
Based on the interviews, we think that the lack of gestures in the 3DAM condition could also lead to less communication capability.
Reviewer #3:
- Explain the novelty and motivation in abstract.
Response:
Thanks for your constructive suggestions. The revised portion has been marked in red in the revised manuscript for easy tracking.
The motivation is as follows.
Previous research has shown that gesture-based interaction is intuitive and expressive for remote collaboration, and using 3D CAD models can provide clear instructions for assembly tasks.
The novelty is as follows.
We describe a new MR remote collaboration system which combines the use of gesture and CAD models in a complementary manner. The prototype system enables a remote expert in VR to provide instructions based on 3D gesture and CAD models (3DGAM) for a local worker who uses AR to see these instructions. Using this interface, we conducted a formal user study to explore the effect of sharing 3D gesture and CAD models in an assembly training task.
- provide the compressive results explanation.
Response:
Thanks for your constructive suggestions. We have provided a comprehensive explanation of the results in Section 6. The revised portion has been marked in red in the revised manuscript for easy tracking.
Overall, there was a statistically significant difference in performance time between the two conditions, but there was no statistically significant difference in the number of operation errors. For user experience, most participants thought that the 3DGAM interface was better than the 3DAM interface in terms of UEQ and GCE.
- provide the verify and validation section.
Response:
Thanks for your constructive suggestions. Regarding verification and validation, we consider that the formal user study (Section 5) and the results (Section 6) serve to verify and validate our research contribution. We may not have fully understood your comment; could you please give us more details about this point? Thank you very much!
4 section 6.3 is very scattered (UEQ )
Response:
Thanks for your constructive suggestions. We have improved this section. The revised portion has been marked in red in the revised manuscript for easy tracking.
In short, there were significant differences between the 3DAM and 3DGAM conditions in terms of Attractiveness, Efficiency, and Dependability for all participants. Following the guidelines [26], the category ratings range from -3 to 3, and a result is considered good when the rating is greater than one. The higher the value, the better the result. Therefore, the prototype system is good for the pump assembly task.
Reviewer #4:
- The application of using gesture and CAD model for training tasks in remote virtual collaboration is interesting. However, I think the technical contribution of the work is relatively weak. Using pre-built 3D models in the virtual environment is not new and combing the capability of sharing gesture information for remote tasks is definitely not unseen before. I'm not clear the advantages of the proposed system. And as the authors mentioned that there are still many limitations for the system, I would suggest revising the draft and emphasizing on the contribution of the work.
Response:
Thank you for your comments. In this paper, we describe a new MR remote collaboration system which combines the use of gesture and CAD models in a complementary manner. The prototype system enables a remote expert in VR to provide instructions based on 3D gesture and CAD models (3DGAM) for a local worker who uses AR to see these instructions. Therefore, our research makes the following contributions:
- Implementing an MR remote collaboration system, 3DGAM, which supports gesture-driven instructions using 3D gesture and CAD models of physical parts between both local AR and remote VR sites.
- Conducting the first user study exploring the effects of sharing 3D gesture cues and CAD models in a VR/AR/MR interface for remote collaboration in industrial assembly training tasks.
- Providing design implications for using 3DGAM cues in AR/MR collaborative interface.
- I couldn't find any comparison between the system or its individual modules and the state of the art technique for gesture recognition or virtual environment handling.
Response:
Thank you for your comments. In this research, we focused on a new approach to remote collaboration using new technologies, such as VR with the HTC Vive and gesture recognition with Leap Motion. Based on the prototype system, we further explored the effect of 3D gesture and CAD models in AR/MR remote collaboration for industrial assembly training tasks in the first formal user study.
- I'm curious if the gestures of "local" user will be captured and sent back to the "remote expert" as a feedback? If not, why does the system need live instructions instead of predefined and pre-built instructions, such as animated movements of the mechanical parts?
Response:
Thank you for your comments. In the current research, we did not capture the gestures of the "local" user and send them back to the "remote expert" as feedback. However, the remote expert can follow the state of the local scene in real time through the shared video stream. In the assembly task, some operations have special requirements for gestures (see Figure 21). In such cases, live instructions make it easy for local participants to understand the shared operations, so they take less time to complete the assembly task. During the assembly training, remote experts can give demonstrations in real time when the local participants make mistakes. Moreover, sharing live 3D gestures could improve the local user experience. Thus, we think that it is important for the system to support live instructions instead of predefined and pre-built instructions in remote collaboration on physical tasks.
Figure 21 The long bolts assembly in training.
- The overall language could use significant improvements. And there are quite a few typos and minor typesetting issues. Some of them are:
* on page 3, line 10, "The found that..." -> "They found".
* in figure 7, the subplot labels are missing.
Response:
Thanks for your constructive suggestions. We have corrected these issues. The revised portion has been marked in red in the revised manuscript for easy tracking.
Figure 7 now has subplot labels, as follows.
Figure 7 The assembly task for a water pump. (a, b) the collaborative scene on the remote VR site, (a-f) the collaborative scene on the local AR site, (a, c, e) before the assembly, (b, d, f) after the assembly is completed.
Reviewer #5:
Paper is interesting, however needs the following revisions
- Formatting of the paper is poor and should be improved.
Response:
Thanks for your constructive suggestions. We have corrected several formatting issues.
- Paper is in need of a thorough spelling and grammar check
Response:
Thanks for your constructive suggestions. We have performed a thorough spelling and grammar check.
- Figures are not well described in the text.
Response:
Thanks for your constructive suggestions. We have provided more details when describing the figures.
For Figure 1, the detailed description is in the body (3.1 Framework). In addition, we have provided more details in the caption. The revised portion has been marked in red in the revised manuscript for easy tracking.
Figure 1 The prototype MR remote collaboration framework, mainly including a local AR site, a remote VR site, a server, and related devices and software.
For Figure 3, the detailed description is in the body (the second paragraph of 3.2 Gesture-based Interaction).
For Figure 4, the detailed description is in the body (3.3 Sharing Gesture).
For Figure 5, the detailed description is in the body (3.4 Creating and loading asset-bundles).
- Some recent references from MTAP should be added to the paper.
Response:
Thanks for your constructive suggestions. We have added three recent references from MTAP as follows. The revised portion has been marked in red in the revised manuscript for easy tracking.
- Al-Khafajiy M, Baker T, Chalmers C, Asim M, Kolivand H, Fahim M, Waraich A (2019) Remote health monitoring of elderly through wearable sensors. Multimedia Tools and Applications, 78(17), 1-26.
[12] García-Pereira I, Portalés C, Gimeno J, Casas S (2019) A collaborative augmented reality annotation tool for the inspection of prefabricated buildings. Multimedia Tools and Applications, 79(4), 1-19.
[22] Petrangeli S, Pauwels D, Hooft J V D, Ziak M, Slowack J, Wauters T, Turck F D (2019) A scalable webrtc-based framework for remote video collaboration applications. Multimedia Tools and Applications, 78:7419--7452.