Pose Driven Avatars
Render human avatars under different poses and viewpoints.
Recent neural rendering methods have made great progress in generating photorealistic human avatars. However, these methods are generally conditioned only on low-dimensional driving signals (e.g., body poses), which are insufficient to encode the complete appearance of a clothed human. Hence they fail to generate faithful details. To address this problem, we exploit driving view images (e.g., in telepresence systems) as additional inputs. We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR++), which synthesizes 3D human avatars from arbitrary driving poses and views while staying faithful to appearance details efficiently and at high quality. First, we learn to encode the driving signals of pose and view image on a dense UV manifold of the human body surface and extract UV-aligned features, preserving the structure of a skeleton-based parametric model. To handle complicated motions (e.g., self-occlusions), we then leverage the UV-aligned features to construct a 3D volumetric representation based on a dynamic neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a rough volumetric representation using a pose- and image-conditioned downsampled neural radiance field (PID-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar by a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR++ to handle complicated motions, render high-quality avatars under user-controlled poses/shapes, and most importantly, be efficient at inference time. Our experimental results also demonstrate state-of-the-art quantitative results.
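The core of the hybrid rendering idea can be sketched in a few lines: volume-render per-sample features from the coarse NeRF at low resolution via standard alpha compositing, fuse the result with 2D textural features in image space, and upsample to the output resolution. The following is a minimal numpy sketch under stated assumptions, not the paper's actual networks: the additive fusion and the nearest-neighbor upsampling are illustrative stand-ins for the learned fusion module and the GAN-based textural renderer.

```python
import numpy as np

def hybrid_render(densities, ray_feats, tex_feats, upsample=4):
    """Toy hybrid volumetric-textural rendering (illustrative only).

    densities: (H, W, S) per-sample densities along each low-res ray
    ray_feats: (H, W, S, C) per-sample features from the coarse NeRF
    tex_feats: (H, W, C) 2D textural features in image space
    """
    # NeRF-style alpha compositing at low resolution
    alpha = 1.0 - np.exp(-np.maximum(densities, 0.0))          # (H, W, S)
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[..., :1]),
                            trans[..., :-1]], axis=-1)          # transmittance
    weights = alpha * trans                                     # (H, W, S)
    vol_feats = (weights[..., None] * ray_feats).sum(axis=2)    # (H, W, C)
    # Fuse volumetric and textural features (sum as a stand-in for the
    # learned fusion), then upsample as a stand-in for the GAN renderer.
    fused = vol_feats + tex_feats
    return np.repeat(np.repeat(fused, upsample, axis=0), upsample, axis=1)
```

Rendering the NeRF only at low resolution keeps the expensive per-sample evaluation cheap; the heavy lifting of producing a sharp, high-resolution image is delegated to the fast 2D renderer operating on the fused features.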
HVTR++ takes as input driving signals consisting of multiple driving view images, a fitted coarse SMPL mesh, and a view direction, and produces a full-body avatar. At the core of the pipeline is a localized UV-aligned representation that models both the geometric driving pose signals of the parametric SMPL mesh and the sparse driving-image signals in a compact UV space. This UV-aligned representation can then be efficiently rendered into images by Hybrid Volumetric-Textural Rendering.
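The UV-aligned encoding above can be illustrated with a small numpy sketch. It is an assumption-laden toy version of the idea, not the paper's encoder: pose signals (posed SMPL vertex positions) and driving-image features (sampled at the vertices' projected pixel locations) are scattered into a shared UV grid via each vertex's UV coordinate. The function name, nearest-neighbor sampling, and per-vertex scatter are all illustrative simplifications.

```python
import numpy as np

def encode_uv_features(verts_posed, verts_uv, image_feats, cam_project,
                       uv_res=64, feat_dim=8):
    """Scatter pose and driving-image signals onto the SMPL UV manifold.

    verts_posed: (V, 3) posed SMPL vertex positions (the pose signal)
    verts_uv:    (V, 2) per-vertex UV coordinates in [0, 1)
    image_feats: (H, W, feat_dim) features from a driving-view image
    cam_project: callable mapping (V, 3) points to (V, 2) pixel coords
    """
    uv_map = np.zeros((uv_res, uv_res, 3 + feat_dim), dtype=np.float32)
    # UV texel each vertex lands in
    uv_idx = np.clip((verts_uv * uv_res).astype(int), 0, uv_res - 1)
    # Sample image features at projected vertex locations (nearest neighbor)
    px = cam_project(verts_posed)
    H, W = image_feats.shape[:2]
    px = np.clip(px.astype(int), 0, [W - 1, H - 1])
    sampled = image_feats[px[:, 1], px[:, 0]]                 # (V, feat_dim)
    # Scatter pose (3D position) and image features into the shared UV space
    uv_map[uv_idx[:, 1], uv_idx[:, 0], :3] = verts_posed
    uv_map[uv_idx[:, 1], uv_idx[:, 0], 3:] = sampled
    return uv_map
```

Because both signal types live on the same UV grid, downstream networks can reason about pose and appearance jointly while inheriting the correspondence structure of the parametric body model.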
HVTR++ can render human avatars with both pose and shape control from arbitrary viewpoints.
Rendered avatars under varying driving poses and viewpoints.
Render human avatars under different shape parameters.
@article{hu2023hvtrpp,
  author    = {Hu, Tao and Xu, Hongyi and Luo, Linjie and Yu, Tao and Zheng, Zerong and Zhang, He and Liu, Yebin and Zwicker, Matthias},
  title     = {HVTR++: Image and Pose Driven Human Avatars using Hybrid Volumetric-Textural Rendering},
  journal   = {IEEE Transactions on Visualization and Computer Graphics},
  year      = {2023}
}

@inproceedings{HVTR:3DV2022,
  author    = {Hu, Tao and Yu, Tao and Zheng, Zerong and Zhang, He and Liu, Yebin and Zwicker, Matthias},
  title     = {HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars},
  booktitle = {2022 International Conference on 3D Vision (3DV)},
  year      = {2022}
}