Protection against illicit drone intrusions is a matter of great concern, and the relatively stealthy nature of UAVs makes their detection difficult. To address this issue, the Deeplomatics project proposes a multimodal and modular approach that combines the advantages of different systems while adapting to the various topologies of the areas to be secured. Its originality lies in the fact that acoustic and optronic devices feed independent AI models that simultaneously localize and identify targets using both spatial audio and visual signatures. Several microphone arrays are deployed over the area to be protected. Within its coverage area (about 15 hectares), each microphone array simultaneously localizes and identifies flying drones using a deep learning approach based on the BeamLearning network. Each array is attached to a local AI that processes spatial audio measurements in real time (40 estimations per second), independently of the other units of the surveillance network. A data fusion system refines the estimates provided by the AI-enhanced microphone arrays, and the resulting position is shared in real time with an optronic system. Once this system has locked onto its target, a deep learning tracking algorithm enables autonomous visual tracking and identification. The optronic system is composed of several cameras (visible, thermal, and active imaging) mounted on a servo-turret. The active imaging system can capture scenes up to 1 km away and only captures objects within a given distance range, which naturally excludes the foreground and background from the image and enhances the capabilities of computer vision. The Deeplomatics project thus combines the benefits of acoustics and optronics to ensure real-time localization and identification of drones with high precision (less than 7° of absolute 3D error, more than 90 % detection accuracy). The modular approach also makes it possible, in the long term, to add new capture systems such as electromagnetic radars.
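
The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the Deeplomatics fusion algorithm, so the function below is a hypothetical example assuming each microphone array reports a 3D position estimate together with a scalar confidence, and the fused position is a confidence-weighted average:

```python
import numpy as np

def fuse_estimates(positions, confidences):
    """Hypothetical fusion of per-array 3D position estimates.

    positions  : (N, 3) array-like, one position estimate per microphone array
    confidences: (N,) array-like, one non-negative confidence weight per array

    Returns the confidence-weighted average position (3,) -- a simple stand-in
    for the (unspecified) Deeplomatics data fusion system.
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    p = np.asarray(positions, dtype=float)
    return (p * w[:, None]).sum(axis=0)  # weighted average along arrays

# Two hypothetical arrays reporting the same target, the first more confident
fused = fuse_estimates([[100.0, 50.0, 30.0], [104.0, 52.0, 28.0]], [0.8, 0.2])
print(fused)  # → [100.8  50.4  29.6]
```

In a real deployment the weights could come from each local AI's detection score, and a tracking filter (e.g. a Kalman filter) would typically smooth the fused positions over the 40 estimations per second before handing them to the optronic turret.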