Dear Editor:
On behalf of my co-authors, I thank you for giving us the opportunity to revise our manuscript, and we are grateful to the editor and reviewers for their positive and constructive comments and suggestions on our manuscript entitled "3DGAM: Using 3D Gesture and CAD Models for Training on Mixed Reality Remote Collaboration" (No.: COMIND_2019_1015).
We have carefully studied the reviewers' comments and revised the manuscript accordingly; the revisions are marked in red in the paper.
We would like to express our great appreciation to you and the reviewers for your comments on our paper.
Thank you and best regards.
List of Responses
Responses to Reviewer #1:
1) Research methodology: The Authors want to compare 3DAM vs. 3DGAM, but users are in the same room and they can speak to each other. Previous works (for instance, [3]) have proved that verbal communication can overcome any other suggestion. For this reason, I think the test should be repeated placing users in different rooms; I expect the Authors could measure significant differences in terms of errors and completion times.
Response:
Thanks for your constructive suggestions. We fully agree that this condition (users in the same room who can speak to each other) may affect the results in terms of errors and completion time; we indeed overlooked this point in the current study. We simplified the prototype to the key points aligned with the research focus, using co-location instead of geographical separation, similar to prior research [1-5]. Moreover, the audio condition was the same across conditions, so we did not consider this factor with respect to errors and completion time. We have discussed this point in Section 6.5 (Limitations and future works). We thank you again for this insightful remark, which will push us to improve our further research and translate the study results into practical benefits. The revised portion is marked in red in the revised manuscript for easy tracking.
Fourth, in our current research we adopted co-located collaboration to simulate remote collaboration; specifically, in the experiment all participants were in the same room and could speak to each other. This factor may have affected the objective results in terms of performance time and errors. Therefore, we are improving the prototype based on WebRTC (https://webrtc.org) to support geographically separated remote collaboration.
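For illustration, the sketch below shows one way such a geographically separated setup might stream hand-tracking data, using the standard browser WebRTC data-channel API in TypeScript. It is only a minimal sketch under stated assumptions: the HandFrame shape and function names are hypothetical, signaling is omitted, and this is not part of the current prototype.

```typescript
// Hypothetical sketch: streaming hand-tracking frames over a WebRTC data channel.
// Signaling (offer/answer and ICE candidate exchange) is omitted for brevity.

interface HandFrame {
  timestamp: number;                              // capture time in ms
  joints: { x: number; y: number; z: number }[];  // hand/finger joint positions
}

const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// Unordered, no-retransmit delivery suits real-time gestures:
// a stale hand pose is better dropped than delivered late.
const channel = pc.createDataChannel("gestures", {
  ordered: false,
  maxRetransmits: 0,
});

function sendHandFrame(frame: HandFrame): void {
  if (channel.readyState === "open") {
    channel.send(JSON.stringify(frame));
  }
}
```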
References
[1] Elvezio C, Sukan M, Oda O, et al. Remote collaboration in AR and VR using virtual replicas[C]//ACM SIGGRAPH 2017 VR Village. ACM, 2017: 1-2.
[2] Oda O, Elvezio C, Sukan M, et al. Virtual replicas for remote assistance in virtual and augmented reality[C]//Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (UIST). ACM, 2015.
[3] Wang P, Zhang S, Bai X, et al. 2.5DHANDS: a gesture-based MR remote collaborative platform[J]. The International Journal of Advanced Manufacturing Technology, 2019, 102(5-8): 1339-1353.
[4] Le Chénéchal M, Duval T, Gouranton V, et al. Vishnu: virtual immersive support for HelpiNg users, an interaction paradigm for collaborative remote guiding in mixed reality[C]//2016 IEEE Third VR International Workshop on Collaborative Virtual Environments (3DCVE). IEEE, 2016: 9-12.
[5] Kim S, Lee G, Huang W, et al. Evaluating the combination of visual communication cues for HMD-based mixed reality remote collaboration[C]//Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 2019: Paper 173.
2) A second issue is again related to the methodology. Assembly tasks should also involve a set of basic operations (such as screwing/unscrewing, hammering, etc.) and tools (e.g., screwdriver, hammer, Allen keys, etc.); also, the selection of the right tool should be suggested, as well as the right way to perform an operation. On the other hand, the tasks selected by the Authors seem to be only a set of piece selections and piece insertions. I'd like to see a more realistic task where "the whole" assembly process is suggested.
Response:
Thanks for your constructive suggestions. You are right that assembly tasks should also involve a set of basic operations (such as screwing/unscrewing and hammering) and tools (e.g., screwdriver, hammer, Allen keys), and that the selection of the right tool should be suggested, as well as the right way to perform an operation. In fact, in our research we used a wrench, as shown in Figure 14(m). We did not elaborate on the operations with this tool because the local worker can use it according to the shared audio cues from the remote helper. Following your valuable suggestions, we have improved the paper and added an in-depth discussion of this point. The revised portion is marked in red in the revised manuscript for easy tracking.
Moreover, assembly tasks should also involve a set of basic operations (e.g., screwing/unscrewing, hammering) and tools (e.g., screwdriver, hammer, wrench). In MR remote collaboration on assembly training tasks, the selection of the right tool should be suggested, as well as the right way to perform an operation. On the other hand, the tasks selected in this research may appear to be only a set of piece selections and insertions. In fact, we used a wrench, as shown in Figure 14(m): when assembling the parts, they should first be tightened by hand and then tightened with the wrench. We did not elaborate on the wrench operations because the local worker can use the wrench according to the shared audio cues from the remote helper. Specifically, there are three kinds of bolts (B1, B2, B3) and one wrench (see Figure 14(m)). The big end of the wrench is used for the B1 and B2 bolts, and the small end for the B3 bolts, as shown in Figure 24.
Figure 14(m)
Figure 24. A set of basic operations using a wrench: (a) the wrench used; (b, c) the big end of the wrench for the B1 and B2 bolts; (d) the small end of the wrench for the B3 bolts.
3) The augmented scene seen by the local user seems to be far from the real pieces. On the other hand, augmented hints might/should be overlapped onto the real objects (this reduces the cognitive load). Is this choice meant to reduce the impact of the narrow FoV of the Microsoft HoloLens? I think the Authors should discuss this issue.
Response:
Thanks for your valuable suggestions. Indeed, we did not overlap the hints onto the real objects in order to reduce the impact of the narrow FoV of the Microsoft HoloLens. We have discussed this in subsection 6.5 (Discussion and implications) as follows. The revised portion is marked in red in the revised manuscript for easy tracking.
In this research, the prototype system did not support overlapping the augmented hints onto the real assembly parts. Although this choice may increase the cognitive load, it is a reasonable way to reduce the impact of the narrow field of view of the HoloLens to some extent.
4) Some English flaws should be fixed.
Response:
Thanks for your valuable suggestions. We have carefully checked the language throughout the manuscript.
5) Section 6.1: what do the Authors mean by "Standard error"? Is it "Standard deviation"?
Response:
Thanks for your valuable suggestions. It is indeed "Standard error", i.e., the standard deviation of the sampling distribution of the mean (SD/√n), rather than the standard deviation of the raw data. For details, please refer to this link: Standard Error.
Responses to Reviewer #2:
1) First, the authors should clearly distinguish the novelty of the proposed paper from their previous works. The authors do not explicitly refer to their previous works. However, it seems to me that some works cited in Section 2 refer to their previous works, more specifically:
[1] 2.5DHANDS: a gesture-based MR remote collaborative platform
[2] Head Pointer or Eye Gaze: Which Helps More in MR Remote Collaboration?
[3] A gesture- and head-based multimodal interaction platform for MR remote collaboration
This makes it very difficult to measure the novelty of the proposed work.
Response:
Thanks for your valuable suggestions. You are right; we indeed overlooked this. Therefore, we now clearly distinguish the novelty of the proposed paper from our previous works as follows. The revised portion is marked in red in the revised manuscript for easy tracking.
In previous works, Wang et al. proposed an MR remote collaborative platform sharing non-verbal cues (e.g., gesture, head pointer, and eye gaze), in which the remote user is in a VR environment and the local user is in a projector-based AR environment. Using this prototype system, they found that sharing gestures can improve user experience and performance [1], that the head pointer is a good proxy for eye gaze [2], and that gesture- and head-based multimodal interaction can also enhance user experience and performance in MR remote collaboration on physical tasks [3].
2) 3D gestures are used to provide remote assistance. In the images and video provided, it is possible to see that all the 3D contents displayed on the HoloLens are not superimposed on the real objects but displayed on an empty desk in front of the user. Due to this design choice, some questions arise, such as:
- the displacement between the 3D contents and the real objects leads the user to continuously switch context. Is it not one of the principles of AR to remove this type of context switch with respect to traditional solutions such as paper instructions?
Response:
Thanks for your valuable suggestions. This is indeed a weakness of the prototype system. However, compared with paper instructions, our method can provide real-time assistance based on 3D gestures and CAD models. From this perspective, although in our current research the 3D contents are not overlapped onto the real objects, the method is still better than paper instructions to some extent. We have discussed this in subsection 6.5 (Discussion and implications) as follows. The revised portion is marked in red in the revised manuscript for easy tracking.
In this research, the prototype system did not support overlapping the augmented hints onto the real assembly parts. Although this choice may increase the cognitive load, it is a reasonable way to reduce the impact of the narrow field of view of the HoloLens to some extent. Moreover, based on this prototype, we can conduct further research on natural feature tracking to improve the virtual-real fusion between the shared instructions and the real assembly parts.
3) If the instructions are displayed statically in front of the user, what is the difference with respect to using a monitor displaying the same content, or projected AR?
Response:
Thanks for your valuable suggestions. This is a very interesting question. Although the instructions are displayed statically in front of the user, we think there are some differences. Specifically, with the HoloLens the user sees the shared instructions as a stereoscopic 3D scene, rather than as a 2D scene on a monitor or in projector-based AR.
4) If the 3D content is not superimposed on the real objects, the final result perceived by the remote user is very similar to a video of the instructor performing the required operation. Why has this alternative not been taken into account, since it would be easier to implement than the proposed one? At least a comparison between the two methodologies would be relevant.
Response:
Thanks for your valuable suggestions. As you said, if the 3D content is not superimposed on the real objects, the final result perceived by the local user is very similar to a video of the instructor performing the required operation. However, there are some differences: for example, the local user can freely explore the shared 3D scene independently of the camera viewpoint, which is not possible with a video. Moreover, the shared scene provides depth perception for local users.
5) Even if the test results seem to prove that gestures (3DGAM) are better than simple animations (3DAM), this may be due to the fact that the 3D content is not superimposed on the real objects. Thus, the virtual hands allow the user to focus on the elements to operate at each step of the procedure. A similar effect could be obtained if the 3D contents were superimposed on the real objects and only the objects needed at each step of the procedure were displayed on the HoloLens.
Response:
Thanks for your valuable suggestions. We totally agree with you on this point. Therefore, we have discussed it in subsection 6.5 (Discussion and implications) as follows. The revised portion is marked in red in the revised manuscript for easy tracking.
It should be noted that even though the results showed that the 3DGAM condition is better than simple animations (3DAM), this may be because the shared 3D instructions were not overlapped onto the real objects: the shared 3D gestures thus allow the user to focus on the elements to operate at each step of the assembly procedure in the 3DGAM condition. A similar effect might be obtained if the shared 3D instructions were overlapped onto the real parts and only the assembly part needed at each step of the procedure were displayed on the AR site using the HoloLens.
6) The paper introduces the usage of gestures that are recognized by the system and reproduced as 3D content for the remote user to provide assistance. However, only one gesture can be recognized by the system. Overall, it seems to me that the system actually gets the hand and finger positions through the Leap Motion sensor and reproduces them as a real-time 3D animation on the HoloLens, whereas the grasp gesture is used to detect the interaction with the virtual objects. This concept is not very clear and should be explained better. Finally, it would be interesting to understand how the coordinate system of the hands/fingers provided by the Leap Motion is aligned with the coordinate system of the 3D virtual world.
Response:
Thanks for your valuable suggestions. Yes, you are right: our system gets the hand and finger positions through the Leap Motion on the remote VR site and reproduces them as real-time 3D gestures on the local AR site using the HoloLens, so the helper can provide instructions based on dynamic 3D gestures. The "grasp" gesture is elaborated in subsection 3.2 (Gesture-based Interaction). Finally, regarding the coordinate systems: the coordinate system of the hands/fingers provided by the Leap Motion is aligned with the coordinate system of the 3D virtual world by sharing the position of the 3D virtual scene on the VR site. That is, before collaboration begins, the remote VR user can adjust the position of the shared 3D virtual scene according to the real situation.
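For illustration only, the following minimal TypeScript sketch shows the kind of rigid transform implied by this alignment: hand joints reported in the Leap Motion's local frame are mapped into the shared scene's frame using the pose at which the VR user placed the scene. The names and conventions here are our illustrative assumptions, not the actual implementation.

```typescript
// Hypothetical sketch: mapping a Leap Motion joint position into the shared
// scene's coordinate frame, given the scene pose set by the remote VR user.

type Vec3 = { x: number; y: number; z: number };
type Quat = { x: number; y: number; z: number; w: number }; // unit quaternion

// Rotate vector v by unit quaternion q, using t = 2 * (q_xyz × v),
// v' = v + w * t + (q_xyz × t).
function rotate(q: Quat, v: Vec3): Vec3 {
  const tx = 2 * (q.y * v.z - q.z * v.y);
  const ty = 2 * (q.z * v.x - q.x * v.z);
  const tz = 2 * (q.x * v.y - q.y * v.x);
  return {
    x: v.x + q.w * tx + (q.y * tz - q.z * ty),
    y: v.y + q.w * ty + (q.z * tx - q.x * tz),
    z: v.z + q.w * tz + (q.x * ty - q.y * tx),
  };
}

// sceneOrigin and sceneRotation describe where the VR user placed the shared
// 3D virtual scene; each joint is rotated into that frame, then translated.
function toSceneFrame(joint: Vec3, sceneOrigin: Vec3, sceneRotation: Quat): Vec3 {
  const r = rotate(sceneRotation, joint);
  return { x: r.x + sceneOrigin.x, y: r.y + sceneOrigin.y, z: r.z + sceneOrigin.z };
}
```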
We have tried our best to improve the manuscript and have made some changes accordingly; these changes do not affect the content or framework of the paper. We have not listed all of the changes here, but they are marked in red in the revised paper. We sincerely appreciate the Editors' and Reviewers' earnest work, and we hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.