At NVIDIA GTC, SimInsights presents R&D enabling AI-powered assembly task quality monitoring capabilities within HyperSkill

Session Title – No Code Authoring of Assembly Digital Twins
Session Format & Duration – 40-minute talk, 2 presenters
Audience Level – Business/Executive
Industry – Manufacturing
Theme – Computer Vision
Content Type – Industry use-case

Session Description – Learn how to create novel VR/AR-enabled manufacturing digital twins for training, guidance, and quality monitoring. You will learn how to leverage the power of NVIDIA GPUs and SDKs to train AI models for object detection and pose estimation, overcoming the limitations of small datasets and tedious ground-truth annotation. We will walk you through the lessons learned from the numerous iterations it took to develop a successful solution for a manufacturing assembly use case.

Extended Abstract & Results  

Assembly accounts for a large portion of manufacturing operations. Defects introduced during assembly can pass undetected through quality inspection, resulting in substantial rework or warranty costs. In the US alone, companies could have saved more than $25 billion in warranty claims in 2018 had all products worked as expected. Our aim is to enable non-technical users to author digital twins of workspaces, equipment, and processes for training, guidance, and quality monitoring.

To validate the solution, we created a test bed consisting of mechanical and electrical components connected via wires. The components represented varying materials, textures, shapes, and sizes. Our goal was to identify each individual component in the setup and understand the semantic relations between components. We implemented two DNN training approaches: 1) using real-world data and 2) using synthetic data generated from 3D CAD models.

For the real-world training pipeline, we captured 40-second video clips of the test bed with a regular phone camera, resized the frames from 1280×720 to 640×368, and augmented the data by varying brightness. Brightness was scaled between 0.5 and 1.5 times the original level, a range chosen so the augmented frames still resemble plausible real-world images. We resized the frames because training resource requirements grow rapidly with frame size, and the brightness augmentation helps the model cope with differing lighting conditions and shadows. To reduce the annotation effort, we built a tool based on Lucas-Kanade tracking; details are available in the supporting material. Finally, we used the DetectNetv2 object detection gem in the Isaac SDK to train a ResNet18-based model on this dataset. The model performed satisfactorily under different lighting conditions, without confusing objects.

For the synthetic-data training pipeline, which uses 3D models of our objects, we began by modifying the object detection scene provided in the Isaac SDK release to resemble our test bed. We then applied texture randomization, location randomization, and light randomization to create a synthetic dataset.
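The resizing and brightness augmentation described above can be sketched as follows. This is a minimal NumPy-only illustration (a real pipeline would typically use OpenCV's resize with proper interpolation); the nearest-neighbor downscale, function name, and RNG handling are assumptions for illustration, while the 640×368 target size and the 0.5–1.5 brightness range come from the text.

```python
import numpy as np

def preprocess_frame(frame: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Downscale a 1280x720 BGR frame to 640x368 and apply brightness
    augmentation in the 0.5x-1.5x range described in the abstract.

    Nearest-neighbor resizing via index selection keeps this sketch
    dependency-free; production code would use a proper resize routine.
    """
    h, w = frame.shape[:2]
    rows = np.linspace(0, h - 1, 368).astype(int)
    cols = np.linspace(0, w - 1, 640).astype(int)
    resized = frame[rows][:, cols]
    # Scale brightness by a random factor, clipping to the valid 8-bit
    # range so augmented frames still resemble plausible camera images.
    factor = rng.uniform(0.5, 1.5)
    return np.clip(resized.astype(np.float32) * factor, 0, 255).astype(np.uint8)
```

Applying this per frame while extracting the 40-second clips yields the augmented training set; varying only brightness (rather than, say, hue) keeps the augmented images close to what the deployed camera would actually see.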

The ability to detect the presence or absence of objects is sufficient for catching a subset of quality defects. In other cases, however, we also need to ensure that the desired objects are placed in the correct position and orientation. To detect orientations, we used the Pose CNN Decoder in the Isaac SDK, randomizing the camera as well as the position and rotation of the object while generating training data. While the initial results are promising, model performance is not yet as high as for object detection. With the latest release of the Isaac SDK, we believe we can achieve our performance targets on pose accuracy.
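The pose randomization step can be sketched as below. This is an illustrative sampler, not Isaac SDK API: the workspace bounds and function name are assumptions, and the uniform random orientation is drawn as a unit quaternion using Shoemake's method.

```python
import numpy as np

def sample_object_pose(rng: np.random.Generator,
                       xyz_min=(-0.2, -0.2, 0.0),
                       xyz_max=(0.2, 0.2, 0.1)):
    """Sample a random position and orientation for synthetic-data generation.

    Positions are drawn uniformly inside an illustrative workspace volume
    (bounds are placeholders, in meters); orientations are uniform random
    rotations represented as unit quaternions (x, y, z, w).
    """
    position = rng.uniform(xyz_min, xyz_max)
    # Shoemake's method: three uniform samples -> uniform random rotation.
    u1, u2, u3 = rng.uniform(0.0, 1.0, size=3)
    quat = np.array([
        np.sqrt(1 - u1) * np.sin(2 * np.pi * u2),
        np.sqrt(1 - u1) * np.cos(2 * np.pi * u2),
        np.sqrt(u1) * np.sin(2 * np.pi * u3),
        np.sqrt(u1) * np.cos(2 * np.pi * u3),
    ])
    return position, quat
```

Sampling a fresh pose per rendered frame (for both the object and the camera) is what gives the pose network coverage of viewpoints it will encounter at inference time.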

The object detection and pose estimation models are not limited to monitoring in AR and VR; they can also be used for fixed-camera monitoring. Using the DeepStream SDK on a Jetson Nano, we achieved real-time video analysis: the live feed from a camera connected to the Jetson Nano serves as input to the object detection model, and the detections are overlaid on the live feed by the deepstream-app. This approach can be extended to monitoring with a set of fixed cameras in a factory or warehouse.
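Conceptually, the overlay stage of such a pipeline draws each detection's bounding box onto the frame before display. The sketch below is a dependency-free stand-in for that step, not DeepStream code; in the actual deployment the capture, inference, and on-screen display all run inside the deepstream-app pipeline, and the detection-tuple format here is an assumption.

```python
import numpy as np

def draw_boxes(frame: np.ndarray, detections) -> np.ndarray:
    """Draw 2-pixel green boxes on a BGR frame for each detection.

    `detections` holds (label, x1, y1, x2, y2) tuples from a hypothetical
    detector; a real overlay would also render the label text.
    """
    out = frame.copy()
    green = np.array([0, 255, 0], dtype=frame.dtype)
    for _label, x1, y1, x2, y2 in detections:
        out[y1:y2, x1:x1 + 2] = green  # left edge
        out[y1:y2, x2 - 2:x2] = green  # right edge
        out[y1:y1 + 2, x1:x2] = green  # top edge
        out[y2 - 2:y2, x1:x2] = green  # bottom edge
    return out
```

Running this per frame on the camera feed, with detections from the trained model, approximates what the deepstream-app displays; DeepStream simply performs the same capture-infer-overlay loop with GPU-accelerated stages.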

To learn more, please watch the video recording available on the NVIDIA GTC website (login required).