Because it's written in Node, it depends on several other pieces of software (e.g. PostgreSQL), it does image/video processing on the fly (transcoding to various formats depending on what you upload and who's viewing), it does face/object recognition with a locally run model, and it has a few other nice features that, yeah, require more power. It's not a static HTML page of your photos.