Cameras with built-in neural network modules are becoming increasingly common today. They can process images directly on the device, without streaming video to a server. This seems very convenient: minimal latency, less network load, and a compact setup.
But this approach has obvious drawbacks. The hardware capabilities of embedded processors are limited, and they are usually designed for narrow tasks: face recognition, license plate recognition, simple motion detection. Configuring them for something non-standard, such as complex analysis of video with water, waves, and heavy visual noise, is practically impossible. The cameras are programmed for a fixed set of functions and offer little flexibility.
Moreover, when complex neural networks are required, such as architectures that analyze the dynamics of human movement underwater, the built-in chips simply lack the compute power.
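To make the capacity argument concrete, here is a rough back-of-the-envelope sketch. Everything in it is an illustrative assumption, not a measurement: the layer shapes of the hypothetical spatio-temporal model, the ~1 TOPS budget for a camera NPU, and the ~100 TFLOPS figure for a server GPU are all plausible ballpark numbers chosen to show the order-of-magnitude gap.

```python
# Back-of-the-envelope: compute demanded by a video model that analyzes
# motion dynamics (3D convolutions over short clips) vs. what a typical
# camera NPU can sustain. All numbers below are illustrative assumptions.

def conv3d_flops(t, h, w, c_in, c_out, k=3):
    """Approximate FLOPs for one 3D convolution layer (multiply + add)."""
    return t * h * w * c_out * (k ** 3) * c_in * 2

# Hypothetical model: a 16-frame clip at 224x224, four 3D conv layers,
# with spatial and temporal pooling after each layer.
t, h, w = 16, 224, 224
layers = [(3, 64), (64, 128), (128, 256), (256, 256)]

total_flops = 0
for c_in, c_out in layers:
    total_flops += conv3d_flops(t, h, w, c_in, c_out)
    h //= 2
    w //= 2
    t = max(1, t // 2)

# Assumed hardware budgets (illustrative, not vendor specs).
edge_tops = 1e12        # ~1 TOPS camera NPU
server_flops = 100e12   # ~100 TFLOPS server GPU

# Sliding-window analysis: one clip evaluated per incoming frame at 25 fps.
clips_per_second = 25
demand = total_flops * clips_per_second

print(f"model cost: {total_flops / 1e9:.1f} GFLOPs per clip")
print(f"edge chip load: {demand / edge_tops:.0%}")      # well over 100%
print(f"server GPU load: {demand / server_flops:.1%}")  # a few percent
```

Under these assumptions the per-camera demand exceeds the edge chip's entire budget several times over, while the same stream occupies only a small fraction of a server GPU, which can also be shared across many cameras. The exact numbers will vary by model and hardware, but the gap of roughly two orders of magnitude between embedded NPUs and server accelerators is the point.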