Do We Have “Biomechanics” Of the New Era in Our Personalized Contactless Hand-Gesture Non-Invasive Surgeon-Computer Interaction?

Klapan I, Žagar M, Majhen Z, Klapan L and Trampuš Z

Published on: 2021-03-18


In this paper, we propose and analyze a novel approach to contactless surgery. The foundation of contactless surgery is a motion-tracking system; for motion tracking, we use a complex stereo-vision depth-camera system. Once the patient data are available, VR techniques and depth-tracking cameras make it possible to analyze and segment the region of interest during pre-operative preparation. A depth-tracking camera enables an operation to be simulated, and its outcome viewed and analyzed, before the patient undergoes surgery, in the same environment as the real procedure. This allows medical specialists to optimize their surgical approaches. We believe that specific VR/AR technologies and approaches, such as analysis of the 3D workflow and the 3D virtual scene, should be incorporated so that medical specialists can make the most of these benefits during surgery. A VR-based application is also valuable because it avoids any danger to the patient. Training applications usually include an evaluation part for each step of training; the main challenge, however, is to produce each step with fidelity adequate to simulate what is actually performed on patients. A further capability of training and education simulators is that they let research surgeons simply practice what has been suggested by colleagues or even read on websites.


Augmented reality; Gesture control; Virtual surgery; Voice commands; Navigation surgery; 3D volume rendering


AI - artificial intelligence

AR - augmented reality

CA / CAS - computer-assisted / computer-assisted surgery

CS - contactless surgery

DICOM - Digital imaging and communications in medicine

DOF - degrees of freedom

FESS / NESS - functional endoscopic sinus surgery / navigation endoscopic sinus surgery

HW/SW - hardware/software

IT - information technology

LM - Leap Motion

MIS - minimally invasive surgery

MRI - magnetic resonance imaging

MSCT - multislice computer tomography

NCAS - navigation computer-assisted surgery

OMC - ostiomeatal complex

OR - operating room

ROI - region of interest

RTG - roentgenogram (conventional X-ray imaging)

SWOT - strengths, weaknesses, opportunities, threats

VC - voice command

VE/VS - virtual endoscopy/virtual surgery

VSI - virtual surgery intelligence

VR/VW - virtual reality/virtual world

3DVR - three-dimensional volume rendering


In the context of CS, our innovative research approach [1] considers overlaying additional visual data on existing 3D MRI and/or MSCT data in order to blend interactive digital elements, such as selection of regions of interest, motion capture through LM sensors [2], and other sensory projections, into our real-world environment, where VR tools and environments are extended to AR overlapping the real-world environment in real time.

When talking about medical imaging, we very often deal with a huge amount of data that must be processed and visualized. To enable noninvasive sinus surgery in the pre-operative phase, it is necessary to generate volumetric representations that yield more realistic AR/VR models. In AR, accurate surface models consisting of thousands of polygons and other mathematical shapes are generally used in place of volumetric representations. Volumetric representations are usually displayed on 2D displays in the OR with different volume rendering techniques that preserve the values and context of the original image data, overlaid with added AR. The entire volume is used in the rendering process, which makes it possible to section the rendered image and visualize the actual image data within the volume, enabling value-based measurements on the rendered image. On the other hand, there is always a trade-off between the accuracy of the representation and its complexity. Speaking about the "philosophy" of VR applications in the field of CS [3], the models used should be quite realistic, yet have a small enough number of parameters to be processed in real time during surgery [4]. In this paper, we describe our novel approach to personalized contactless hand-gesture non-invasive surgeon-computer interaction and discuss the needs and outcomes that this approach entails.
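The trade-off described above can be made concrete with a toy example. The sketch below is our illustration only, not the pipeline used in the OR; `mip_render` and `roi_mask` are hypothetical helper names. It shows maximum intensity projection, one of the simplest volume rendering techniques that uses the entire volume rather than an extracted polygonal surface, plus a naive intensity-window ROI segmentation:

```python
import numpy as np

def mip_render(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """Maximum intensity projection: collapse a 3D scalar volume
    (e.g. MSCT voxels) to a 2D image by keeping the brightest value
    along the viewing axis, so no surface extraction is needed."""
    return volume.max(axis=axis)

def roi_mask(volume: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Toy intensity-window segmentation of a region of interest;
    real pre-operative segmentation is considerably more involved."""
    return (volume >= lo) & (volume <= hi)

# Synthetic 8x8x8 "scan" with one bright voxel standing in for a structure.
vol = np.zeros((8, 8, 8))
vol[4, 3, 5] = 1000.0
image = mip_render(vol, axis=0)     # 2D projection; bright voxel at [3, 5]
mask = roi_mask(vol, 500.0, 1500.0)
```

Because the full voxel data remain available, the projection can be re-sectioned or re-windowed at any time, which is exactly what value-based measurements on the rendered image require.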

Materials and Methods

Research Methods

The basis of contactless surgery is the motion-tracking system, which consists of a stereo-vision depth-camera system with an integrated vision processor, stereo depth module, color image signal processing, and an inertial measurement unit that enables remote Full HD capture at a distance of 1 to 2 meters, with a wide field of view in all three dimensions (shown as the right-most screen, and the camera below it, in Figure 1).
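As background for why two imagers yield depth, the basic relation a stereo depth module exploits can be sketched as follows (the focal length and baseline below are hypothetical illustration values, not this camera's actual calibration):

```python
import numpy as np

def depth_from_disparity(disparity_px: np.ndarray,
                         focal_px: float,
                         baseline_m: float) -> np.ndarray:
    """Two identical imagers separated by a baseline b see the same point
    at a horizontal pixel offset d (the disparity); the point's depth is
    Z = f * b / d. The IR projector's texture helps find d reliably."""
    d = np.asarray(disparity_px, dtype=float)
    z = np.full_like(d, np.inf)
    valid = d > 0                    # zero disparity = point at infinity
    z[valid] = focal_px * baseline_m / d[valid]
    return z

# Example: 640-px focal length, 50 mm baseline (typical small stereo module).
z = depth_from_disparity(np.array([32.0, 16.0, 0.0]),
                         focal_px=640.0, baseline_m=0.050)
# 32 px disparity -> 1.0 m, 16 px -> 2.0 m: the 1-2 m working range above.
```

The inverse relation between disparity and depth also explains why depth precision degrades with distance, and hence why the system is calibrated for the 1-2 m range at which the surgeon stands.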

Figure 1: Visualization system in OR.

The camera system is equipped with a high-precision image sensor and an infrared projector. The stereo depth module has two identical camera sensors (imagers) configured with identical settings. The infrared projector improves the ability of the stereo camera system to determine depth. Captured color-sensor data are sent to a discrete image signal processor for image adjustment, image scaling, and other functions that compensate for inherent lens and sensor inaccuracies, providing better image quality. In calibrating the system, the first step is to define the gestures that will be used as control inputs and to train the surgeon (user) in order to customize the depth-quality settings that will be used in contactless surgery.
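The calibration step above, defining a gesture as a control input and tuning it per surgeon, can be sketched as follows. This is an illustrative assumption, not the actual system code: `is_pinch`, `calibrate_threshold`, and the 30 mm default are hypothetical names and values.

```python
import numpy as np

def is_pinch(thumb_tip: np.ndarray, index_tip: np.ndarray,
             threshold_mm: float = 30.0) -> bool:
    """Trigger a 'pinch' control input when thumb and index fingertips
    (tracked 3D points, in mm) come within threshold_mm of each other."""
    return float(np.linalg.norm(thumb_tip - index_tip)) < threshold_mm

def calibrate_threshold(samples_mm: list, margin: float = 1.2) -> float:
    """Customize the trigger distance to a given user: take the largest
    fingertip distance observed while the user holds a pinch during
    training, then add a safety margin."""
    return max(samples_mm) * margin

# Training session: this surgeon's pinch never exceeded 25 mm.
thr = calibrate_threshold([18.0, 22.0, 25.0])
close = is_pinch(np.array([0.0, 0.0, 0.0]), np.array([20.0, 0.0, 0.0]), thr)
apart = is_pinch(np.array([0.0, 0.0, 0.0]), np.array([40.0, 0.0, 0.0]), thr)
```

Per-user calibration of this kind is what keeps a sterile, contactless control scheme from misfiring mid-procedure.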

Project Flow

Our project flow is oriented toward solving several issues we faced during the proof of concept of contactless surgery in our previous approach, which used the Leap Motion as a tracking camera, with both software and hardware for spatial (3D) interaction. In this proposal, we will combine more powerful high-resolution depth cameras to track motion even more precisely, and we will define what is needed from the educational perspective to fully deploy the proposed procedures.

Development Guidelines of "VR-Contactless Surgery" On the IT Plan

The application of virtual reality equipment, both for training in medical education and for contactless surgery, might raise questions in the field of biomechanics. Since there are different approaches to deploying such technologies in different fields of education and surgery, we mention just a few (Figure 2). The bilateral sagittal split osteotomy (BSSO) project aims at developing a specialized training system dedicated to teaching a major maxillofacial surgery procedure based on VR technology. Since the mandible and the contained neurovascular bundle could be damaged beyond repair, VR methods should be used to increase precision. This could be a showcase for our approach to contactless surgery based on depth stereo cameras for motion and gesture tracking.

Figure 2: Virtual reality in operation planning.

But the question here is: are surgeons aware of the technical possibilities of such complex systems? In order to familiarize them as early as possible with future technologies and productivity advances, we propose using such technologies in training at medical schools. This would require capable interdisciplinary developers in the fields of, among others, IT and medicine, who would help shape the educational approach by, for example, supplying specialized headsets for studies examining the effectiveness of VR.

“Moving Beyond Frontiers” In Our Development Ideas

The next big possibility for future contactless surgery is haptic feedback, which brings additional augmented-reality options during surgery. To support this thesis, we note that, based on several predictions [5-7], by 2030 global technological change will move immersively toward the automation of different aspects of our lives, specifically for an aging society burdened with increased healthcare spending. There will be many more possibilities and technologies for medical specialists before and during surgery. For example, video-assisted thoracic surgery is gradually replacing conventional open thoracotomy as the method of choice for the treatment of early-stage non-small cell lung cancers, and thoracic surgical trainees must learn and master this technique.

On the other hand, current needs in Computer-Assisted Surgery (CAS) methods exceed contemporary educational limits. In our augmented reality approach [2-4] (Figure 3), we have demonstrated how to design spatial anatomic elements from both IT and medical perspectives. Augmented spatial anatomic elements, combined simultaneously with 3D medical images and 4D medical videos and united with touchless navigation through such complex data, should enable higher intraoperative safety and reduce operating time. With the above-mentioned methods, we will track students who use the personal 3D navigation system, currently in the development phase, to see how they cope with these state-of-the-art technologies, to track ease of use, and to educate them in virtual environments that are as close as possible to real-life applications.

Figure 3: Pre-operative data analysis based on augmented reality.

In preparation for surgery, AR tools should be used for 3D virtual diagnosis and 3D virtual surgery planning. Training and education simulators have their own strategies and protocols (Figure 4). Level of difficulty and psycho-motor skills are the main parameters taken into consideration. In other words, training and education simulators can easily be employed in the development of innovative surgical instruments.

Figure 4: Example of training simulator.

Justification for the Development and Application of Contactless Surgery-Related "Understanding" Of the Human Mind

More specifically, for our research on shifting the virtual reality experience in education, current classes and laboratories should be equipped with VR and AR tools, and different software-based augmented reality scenarios should be developed that enable the above-mentioned aim of creating a comprehensive and interdisciplinary environment. Our research proposal would also go in the direction of monitoring the learning experience and students' emotions while they resolve gamified problems, and of comparing learning outcomes with those produced by the contemporary education process. By enabling this new educational approach, we aim to achieve a sustainable environment for developing the skills that students will need in the future. There are currently different human-machine interfaces available, with both haptic feedback and depth tracking covering all six degrees of freedom; these should be covered and analyzed through the educational process for better understanding and utilization of the devices, grounded in an understanding of the human mind. As a next step, we should define a benchmark to evaluate the usefulness of our approach and to characterize the surgeons' experience.

Case Report

An adult male patient, AD, 21 years old, presented with a dental medical exam typical of a modern young person. Generally speaking, the phylogenetic development of the stomatognathic system in modern man shows a "jaw-to-tooth disproportion": the number of teeth has remained the same, while the jawbone has become smaller in volume. We could certainly describe this as a functional legacy over thousands of years, closely related to human nutrition. In the Paleolithic period, the chewing function was much more pronounced in the human race, transferring forces to the bone trajectories with, at the same time, better-restored bone structure.

Over time, in view of the modern lifestyle and habits (i.e. eating habits), the functional use of the dental system has completely changed, together with the bone base of the upper and lower jaws. Additionally, in modern man we can think of wisdom teeth as a type of atavism, since there is often no bone space for their normal emergence in the dental arch, as seen in this case report (Figure 5). Due to a lack of space in the jaw, the tooth was retained below its equator. As already discussed in the literature, such a finding is the cause of so-called tertiary compression of the dental arch (an orthodontic anomaly), with possible sensations such as temporal headache ("fictive pain").

Figure 5: Touchless gesture-controlled functional analyzing and different views of tertiary compression of the dental arch.

The therapeutic approach involves standard surgical alveotomy of the tooth with the well-known recommended premedication, such as hydrocortisone 1 g/kg, clindamycin MIP 600 mg continuously for 3-7 days, and ketoprofen (NSAID) 150 mg duo/3 days. In such cases, the current basic therapeutic approach is based entirely on 2D black-and-white X-ray scans (orthopantomogram), with no use of 3D-CBCT diagnostics. Operative approaches and their consequences have thus depended exclusively on the surgeon's experience and interpretation of such diagnostic tools in the 2D anatomical world. With the use of advanced techniques, such as "contactless diagnostics and/or surgery", we can understand all anatomic structures per viam a non-invasive virtual approach during the operation. This new, innovative, navigation-based, non-invasive, on-the-fly gesture-controlled incisionless surgery is also usable in our medical practice for operative extractions of lower wisdom teeth, as well as of other pathological formations topographically situated adjacent to the mandibular nerve.

Results and Discussion

The next step in innovative noninvasive monitoring technologies for sinus surgery may soon be smartphone-enabled AR. For example, the Tissue Analytics platform enables doctors and nurses to use their phones for 3D wound imaging, quickly identifying specific types of wounds for faster diagnosis and more efficient care [8]. This application has already proven that low-level camera functionality can be used to obtain a high level of control over medical image capture behavior. Low-level camera input data from a smartphone could be overlaid on existing 3D MRI and/or MSCT data, and a fixed-mounted mobile phone could be driven by VC. In such a case, smartphones bring all possible SW advantages for handling the demands of AR environments in noninvasive sinus surgery, as we published previously [9-11]. For example, current Apple iPhone 12 and Samsung Galaxy S21 processors are hexa-core (Apple) or octa-core (Samsung), with enough RAM to run even demanding AR applications (Samsung up to 16 GB, Apple 4 GB). These smartphones also already carry all the important sensors that might be useful for enabling AR in surgery. Paired with the inevitable rollout of faster 5G data networks, such devices will be able to send and receive enormous amounts of data, making AR faster and better than ever before [12]; more importantly, they can be used during surgery for remote monitoring or for giving a second opinion in real time.

An important issue we considered when designing our approach is that, in data acquired prior to surgery and during surgical preparation [13], there may be such a thing as too much information: an over-reliance on AR could mean that people miss what is right in front of them. In our case of CS, AR/VR visualization tools must enable interactive visualization in real time. The response time to interaction (a motion captured through LM) [1-4] is sufficiently fast for the displayed data to be immersive: medical specialists can enter "into" the data, manipulate and analyze the displayed information, and take up any viewpoint. This enables dynamic functional processes, entry into anatomical details, and on-line measurements in VR. In NESS/tele-NESS [14-16], as well as in contactless sinus surgery [17,18], it is very important to obtain high quality and dimensionality of the display and interactive access to the represented data, such as quantitative information [19] about tissue properties [20], as already published for robotic surgery [21]; in our innovative approach, these functions could be operated contactlessly by motion tracking and VC (Figure 6).

Figure 6: High-quality visualization system used in OR (Klapan Medical Group Polyclinic).


With the re-birth of Google Glass [22], AR-capable glasses acting as a head-up display, essentially an AR-type heads-up display packed with memory, a processor, a camera, a speaker and microphone, Bluetooth and Wi-Fi antennas, an accelerometer, gyroscope, compass, and a battery, this approach could also be a solution for an in-surgery wearable device, providing medical specialists with more convenient, expansive views of AR/VR applications [23] and helping with the monitoring of our innovative contactless noninvasive sinus surgery [2,3].

On the other hand, these visualizations must account for the fact that each eye sees the space differently, so AR/VR environments need to define the positioning system and orientation of AR objects in the real world [24]. It is necessary to define the orientation and rotation of user viewpoints and objects, along with their position in virtual space [25,26]; we took the same care when developing our original contactless ENT surgery [1,4].

Knowing this information is especially important when tracking where the user is looking or determining the orientation of virtual objects with respect to the visual space. Due to the lack of 3D information in the input images, a graphical 3D model of the human body has to be generated for 3D motion tracking. Using LM makes it easy to track a large variety of human motions and to identify them from silhouettes.

The most articulated 3D human model is generated with a number of rigid body parts and joints. The number of DOF is thus a key factor in the construction of the graphical 3D human model. DOF partially describes the type of transformation: rigid-body transformations have 6 DOF in 3D (three rotations and three translations), while affine transformations have 12 DOF in 3D (three rotations, three translations, three scalings, and three skews/shears). Our approach to navigating such a complex system incorporates the use of LM [4] for touchless interaction and virtual movement [24,25]. This enables 3D depth sensing, which recognizes and maps spatial environments to place 3D objects, so that medical specialists can make faster decisions during surgery based on more in-depth information. Compared with "our standard classical approach in surgery and/or telesurgery" (CAS, NESS) [27-29], this improves the quality of decision making in surgical preparation and during surgery thanks to depth sensing [30], different views [31,32], and 3D navigation [33].
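The 6-versus-12 DOF count above can be made concrete with a generic construction (our NumPy illustration, not the authors' tracking code; `rigid` and `affine` are hypothetical helper names):

```python
import numpy as np

def rigid(rx, ry, rz, tx, ty, tz):
    """Rigid-body transform in 3D: 6 DOF (three rotations + three
    translations), returned as a 4x4 homogeneous matrix."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T

def affine(rigid_params, sx, sy, sz, hxy, hxz, hyz):
    """Affine transform in 3D: the 6 rigid DOF plus three scalings and
    three shears, 12 DOF in total."""
    S = np.diag([float(sx), float(sy), float(sz)])
    H = np.array([[1.0, hxy, hxz], [0.0, 1.0, hyz], [0.0, 0.0, 1.0]])
    A = rigid(*rigid_params)
    A[:3, :3] = A[:3, :3] @ H @ S
    return A

T = rigid(0.1, 0.2, 0.3, 10.0, 20.0, 30.0)
R = T[:3, :3]   # pure rotation: orthonormal, determinant +1
A = affine((0.1, 0.2, 0.3, 10.0, 20.0, 30.0), 2.0, 2.0, 2.0, 0.0, 0.0, 0.0)
```

A rigid transform preserves lengths and volumes (its rotation block stays orthonormal), which is why 6 parameters suffice for tracking a rigid body part; the extra scale and shear parameters of the affine case are what a deformable model would need.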


The authors are grateful to Professor Heinz Stammberger, M.D.(†), Graz/Austria/EU, for his helpful discussion about NESS and contactless surgery (February/2018), and Dr. Armin Stranjak, Lead software architect at Siemens Healthcare, Erlangen, Germany, EU, for his discussion about MSCT and MRI nose/sinus scans.

Funding: We received no funding for this research.


  1. Klapan I, Duspara A, Majhen Z, Benić I, Kostelac M, Kubat G, et al. What is the future of minimally invasive surgery in rhinology: marker-based virtual reality simulation with touch free surgeon's commands, 3D-surgical navigation with additional remote visualization in the operating room, or? Front Otolaryngol Head Neck Surg. 2017; 1: 1-7.
  2. Klapan I, Duspara A, Majhen Z, Benić I, Kostelac M, Kubat G, et al. What is the future of minimally invasive sinus surgery: computer assisted navigation, 3D-surgical planner, augmented reality in the operating room with 'in the air' surgeon's commands as a "biomechanics" of the new era in personalized contactless hand-gesture noninvasive surgeon-computer interaction? J Scientific Technical Research. 2019; 19: 14678-14685.
  3. Klapan I, Duspara A, Majhen Z, Benić I, Trampuš Z, Žagar M, et al. Do we really need a new innovative navigation-non-invasive on the fly gesture-controlled incisionless surgery? J Scientific Technical Research. 2019; 20: 15394-15404.
  4. Klapan I, Majhen Z, Žagar M, Klapan L, Trampuš Z, Berlengi N, et al. Utilization of 3-D medical imaging and touch-free navigation in endoscopic surgery: does our current technologic advancement represent the future in innovative contactless noninvasive surgery in rhinology? What is next? J Scientific Technical Research. 2019; 22:16336-16344.
  5. European Strategy and Policy Analysis System. Global Trends to 2030: Can the EU meet the challenges ahead? 2015.
  6. Quantum Run State of technology in 2030 | Future Forecast. 2020.
  7. QuantumRun AR + VR forecasts. 2020.
  8. University of Southern California, Institute for Creative Technologies. Medical Virtual Reality. 2020.
  9. Knezović J, Kovač M, Klapan I, Mlinarić H, Vranješ Ž, Lukinović J, et al. Application of novel lossless compression of medical images using prediction and contextual error modeling. Coll Antropol. 2007; 31: 315-319.
  10. Klapan I, Vranješ Ž, Prgomet D, Lukinović J. Application of advanced virtual reality and 3D computer assisted technologies in tele-3D-computer assisted surgery in rhinology. Coll Antropol. 2008; 32: 217-219.
  11. Klapan I, Raos P, Galeta T, Kubat G. Virtual reality in rhinology - a new dimension of clinical experience. Ear Nose Throat. 2016; 95: 23-28.
  12. Houston Methodist Hospital. Texas Medical Center - Brainlab. 2020.
  14. Hötker AM, Pitton MB, Mildenberger P, Düber C. Speech and motion control for interventional radiology: requirements and feasibility. Int J Comput Assist Radiol Surg. 2013; 8: 997-1002.
  15. Klapan I, Raos P, Galeta T. Virtual Reality and 3D computer assisted surgery in rhinology. Ear Nose Throat. 2013; 95: 23-28.
  16. Klapan I, Vranješ Ž, Prgomet D, Lukinovi? J. Application of advanced virtual reality and 3D computer assisted technologies in tele-3D-computer assisted surgery in rhinology. Coll Antropol. 2008; 32: 217-219.
  17. Klapan I, Šimičić Lj, Rišavi R, Pasarić K, Sruk V, Schwarz D, et al. Real time transfer of live video images in parallel with 3D-modeling of the surgical field in computer-assisted telesurgery. J Telemed Telecare. 2002; 8: 125-130.
  18. Bachmann D, Weichert F, Rinkenauer G. Evaluation of the Leap Motion Controller as a New Contact-Free Pointing Device. Sensors. 2015; 15: 214-233.
  19. Memo A, Zanuttigh P. Head-mounted gesture controlled interface for human-computer interaction. Multimedia Tools and Applications. 2018; 77: 27-55.
  20. Bachmann D, Weichert F, Rinkenauer G. Review of Three-Dimensional Human-Computer Interaction with Focus on the Leap Motion Controller. Sensors (Basel). 2018; 18: 2194.
  21. Citardi MJ, Batra PS. Intraoperative surgical navigation for endoscopic sinus surgery: rationale and indications. Otolaryngol Head Neck Surg. 2007; 15: 23-27.
  22. Caversaccio M, Gerber K, Wimmer W, Williamson T, Anso J, Mantokoudis G, et al. Robotic cochlear implantation: surgical procedure and first clinical experience. Acta Otolaryngol. 2017; 137: 447-454.
  23. Stanford Neurosurgical Simulation and Virtual Reality Center. 2020.
  24. Peters TM, Linte CA, Yaniv Z, Williams J. Mixed and Augmented Reality in Medicine. CRC Press. Series: Series in Medical Physics and Biomedical Engineering. 2018; 888.
  25. Gojare B, Kanawade SY, Bodhak K, Surve S. Leap Motion Control Using Virtual Automation. Int. J. Adv. Res. Ideas Innov. Technol. 2017; 3: 322-325.
  26. Bizzotto N, Costanzo A, Bizzotto L, Regis D, Sandri A, Magnan B. Leap Motion gesture Control with OsiriX in the operating room to control imaging. Surg Innov. 2014; 21: 655-656.
  27. Grätzel C, Fong T, Grange S, Baur C. A non-contact mouse for surgeon-computer interaction. Technology and Health Care. 2004; 12: 245-257.
  28. Klapan I, Šimičić Lj, Rišavi R, Bešenski N, Bumber Ž, Stiglmajer N, et al. Dynamic 3D computer-assisted reconstruction of metallic retrobulbar foreign body for diagnostic and surgical purposes. Case report: orbital injury with ethmoid bone involvement. Orbit. 2001; 20: 35-49.
  29. Klapan I, Šimičić Lj, Bešenski N, Bumber Ž, Janjanin S, Rišavi R. Application of 3D-computer assisted techniques to sinonasal pathology. Case report: war wounds of paranasal sinuses with metallic foreign bodies. Am J Otolaryngol. 2002; 23: 27-34.
  30. Klapan I. Application of advanced Virtual Reality and 3D computer assisted technologies in computer assisted surgery and tele-3D-computer assisted surgery in rhinology. In: Kim JJ, ed., Virtual Reality, Vienna: Intech. 2011; 303-336.
  31. Qin S, Zhu X, Yang Y, Jiang Y. Real-time hand gesture recognition from depth images using convex shape decomposition method. J Signal Processing Sys. 2014; 74: 47-58.
  32. Klapan I, Šimičić Lj, Rišavi R, Bešenski N, Pasarić K, Gortan D, et al. Tele-3D-Computer Assisted Functional Endoscopic Sinus Surgery: new dimension in the surgery of the nose and paranasal sinuses. Otolaryngol Head Neck Surg. 2002; 127: 549-557.
  33. Raos P, Klapan I, Galeta T. Additive manufacturing of medical models - applications in rhinology. Coll Antropol. 2015; 39: 667-673.