Multimodal association is the task of linking observations from multiple modalities or data types in time series analysis, such as sensor readings, images, audio, and text. The goal is to integrate these heterogeneous sources to improve understanding and prediction of the underlying time series. For example, in a smart home application, readings from temperature, humidity, and motion sensors can be combined with camera images to monitor residents' activities; analyzing the modalities jointly can reveal anomalies or patterns that are not visible in any single modality alone. Multimodal association can be achieved with a range of techniques, including deep learning models, statistical models, and graph-based models, which are trained on the multimodal data to learn the associations and dependencies between the different data types.
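As an illustration of the deep-learning route, the sketch below fuses two modalities (a sensor time series and a sequence of image features) into a single association score. It is a minimal, self-contained example; the module names, feature dimensions, and fusion strategy are assumptions for illustration, not taken from any paper listed on this page.

```python
# Minimal late-fusion sketch (illustrative only): combine a sensor time series
# with per-frame image features to score whether the two streams belong together.
# All shapes and module names are assumptions, not from any specific paper.
import torch
import torch.nn as nn

class LateFusionAssociator(nn.Module):
    def __init__(self, sensor_dim=6, image_dim=512, hidden=64):
        super().__init__()
        # Encode each modality separately, then fuse the summary vectors.
        self.sensor_enc = nn.GRU(sensor_dim, hidden, batch_first=True)
        self.image_enc = nn.GRU(image_dim, hidden, batch_first=True)
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, sensor_seq, image_seq):
        # sensor_seq: (batch, T, sensor_dim); image_seq: (batch, T, image_dim)
        _, h_s = self.sensor_enc(sensor_seq)   # final hidden state per stream
        _, h_i = self.image_enc(image_seq)
        fused = torch.cat([h_s[-1], h_i[-1]], dim=-1)
        return self.scorer(fused).squeeze(-1)  # association score per pair

model = LateFusionAssociator()
score = model(torch.randn(4, 50, 6), torch.randn(4, 50, 512))
print(score.shape)  # torch.Size([4])
```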
In this paper, we present Vi-Fi, a multi-modal system that leverages a user's smartphone WiFi Fine Timing Measurements (FTM) and inertial measurement unit (IMU) sensor data to associate the user detected in camera footage with their corresponding smartphone identifier (e.g., WiFi MAC address). Our approach uses a recurrent multi-modal deep neural network that exploits FTM and IMU measurements, along with the distance between the user and the camera (depth information), to learn affinity matrices. As a baseline for comparison, we also present a traditional non-deep-learning approach that uses bipartite graph matching. To facilitate evaluation, we collected a multi-modal dataset comprising camera videos with depth information (RGB-D), WiFi FTM, and IMU measurements for multiple participants in diverse real-world settings. Using association accuracy as the key metric for evaluating the fidelity of Vi-Fi in associating human users on camera feed with their phone IDs, we show that Vi-Fi achieves between 81% (real-time) and 91% (offline) association accuracy.
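The bipartite-matching baseline mentioned in the abstract can be sketched with an off-the-shelf assignment solver: given an affinity matrix between camera-detected people and phones, pick the one-to-one assignment with maximum total affinity. The affinity values below are random placeholders, not Vi-Fi's actual FTM/IMU/depth-derived scores.

```python
# Bipartite matching between camera tracks (rows) and phones (columns):
# maximize total affinity under a one-to-one assignment constraint.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
affinity = rng.random((4, 4))              # 4 people on camera, 4 phones (placeholder scores)

# linear_sum_assignment minimizes cost, so negate the affinities to maximize them.
rows, cols = linear_sum_assignment(-affinity)
for person, phone in zip(rows, cols):
    print(f"camera track {person} -> phone {phone} "
          f"(affinity {affinity[person, phone]:.2f})")
```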
This work introduces WinoGAViL, an online game of vision-and-language associations used as a dynamic evaluation benchmark, and shows that the collected associations require diverse reasoning skills, including general knowledge, common sense, and abstraction.
In this paper, we present ViTag to associate user identities across multimodal data, particularly those obtained from cameras and smartphones. ViTag associates a sequence of vision-tracker-generated bounding boxes with Inertial Measurement Unit (IMU) data and Wi-Fi Fine Time Measurements (FTM) from smartphones. We formulate the problem as association by sequence-to-sequence (seq2seq) translation. In this two-step process, our system first performs cross-modal translation using a multimodal LSTM encoder-decoder network (X-Translator) that translates one modality to another, e.g., reconstructing IMU and FTM readings purely from camera bounding boxes. Second, an association module finds identity matches between the camera and phone domains by matching the translated modality against the observed data from the same modality. In contrast to existing works, our proposed approach can associate identities in multi-person scenarios where all users may be performing the same activity. Extensive experiments in real-world indoor and outdoor environments demonstrate that online association on camera and phone data (IMU and FTM) achieves an average Identity Precision Accuracy (IDP) of 88.39% over a 1 to 3 second window, outperforming the state-of-the-art Vi-Fi (82.93%). A further study on modalities within the phone domain shows that FTM improves association performance by 12.56% on average. Finally, results from our sensitivity experiments demonstrate the robustness of ViTag under different noise and environment variations.
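The translation-then-match idea can be sketched roughly as follows; this is not the authors' code, and the dimensions, architecture details, and the distance-based matching rule are assumptions made for illustration.

```python
# Rough sketch of cross-modal translation followed by matching: an LSTM
# encoder reads a bounding-box sequence, an LSTM decoder emits a same-length
# IMU+FTM sequence, and the camera track is matched to the phone whose
# observed readings are closest to the reconstruction.
import torch
import torch.nn as nn

class XTranslatorSketch(nn.Module):
    def __init__(self, bbox_dim=4, imu_ftm_dim=10, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(bbox_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, imu_ftm_dim)

    def forward(self, bbox_seq):
        # bbox_seq: (batch, T, 4) camera bounding boxes for one tracked person
        enc_out, _ = self.encoder(bbox_seq)
        dec_out, _ = self.decoder(enc_out)
        return self.head(dec_out)              # (batch, T, imu_ftm_dim)

model = XTranslatorSketch()
translated = model(torch.randn(1, 30, 4))      # reconstruct IMU+FTM from boxes
observed = torch.randn(5, 30, 10)              # readings from 5 candidate phones

# Match the camera track to the phone with the smallest reconstruction error.
dists = ((observed - translated) ** 2).mean(dim=(1, 2))
print("matched phone:", int(dists.argmin()))
```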