They're modeling a scene mathematically as a "radiance field": a function that takes a viewing position and direction as inputs and returns the color of the light arriving at that position along that direction. They use the input images to train a neural network, searching for the radiance field function that best explains those images. Once they have that function, they can synthesize images from new angles by evaluating it at the (position, direction) pairs corresponding to each pixel of the new image.
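A minimal sketch of the idea (not the actual method): here `radiance_field` is a hypothetical stand-in for the trained network, just an arbitrary smooth function of position and direction, and `render` synthesizes a new view by evaluating it once per pixel ray. A real NeRF would instead sample the network at many points along each ray and composite the results.

```python
import numpy as np

def radiance_field(position, direction):
    # Hypothetical stand-in for the trained network: maps a
    # (position, direction) pair to an RGB color. A real NeRF evaluates
    # an MLP at many sample points along the ray and composites them.
    r = 0.5 + 0.5 * np.sin(position.sum())
    g = 0.5 + 0.5 * np.cos(direction[0])
    b = 0.5
    return np.clip(np.array([r, g, b]), 0.0, 1.0)

def render(camera_pos, pixel_dirs):
    # Render a novel view: one field evaluation per pixel's ray.
    h, w, _ = pixel_dirs.shape
    image = np.zeros((h, w, 3))
    for y in range(h):
        for x in range(w):
            image[y, x] = radiance_field(camera_pos, pixel_dirs[y, x])
    return image

# Toy camera: a 4x4 image whose rays fan out along -z.
camera_pos = np.array([0.0, 0.0, 2.0])
ys, xs = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4), indexing="ij")
dirs = np.stack([xs, ys, -np.ones_like(xs)], axis=-1)
dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

img = render(camera_pos, dirs)
print(img.shape)  # (4, 4, 3)
```

The point of the sketch is the interface, not the function body: once any function with this (position, direction) → color signature is fitted to the training images, rendering from a new camera is just evaluation.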