Eye On AI

Week Ending 5.26.2024

RESEARCH WATCH: 5.26.2024

Synergistic Global-space Camera and Human Reconstruction from Videos

Remarkable progress has been made in 3D reconstruction from videos, but camera and human modeling have largely been separate efforts. SynCHMR marries the two, reconstructing metric camera poses, scene point clouds, and human meshes in a unified world frame. This could enable new augmented/virtual reality experiences that coherently blend real environments with digital humans.

Authors:  Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang

Link:  https://arxiv.org/abs/2405.14855v1

Date: 2024-05-23

Summary:

Remarkable strides have been made in reconstructing static scenes or human bodies from monocular videos. Yet, the two problems have largely been approached independently, without much synergy. Most visual SLAM methods can only reconstruct camera trajectories and scene structures up to scale, while most HMR methods reconstruct human meshes in metric scale but fall short in reasoning with cameras and scenes. This work introduces Synergistic Camera and Human Reconstruction (SynCHMR) to marry the best of both worlds. Specifically, we design Human-aware Metric SLAM to reconstruct metric-scale camera poses and scene point clouds using camera-frame HMR as a strong prior, addressing depth, scale, and dynamic ambiguities. Conditioning on the dense scene recovered, we further learn a Scene-aware SMPL Denoiser to enhance world-frame HMR by incorporating spatio-temporal coherency and dynamic scene constraints. Together, they lead to consistent reconstructions of camera trajectories, human meshes, and dense scene point clouds in a common world frame. Project page: https://paulchhuang.github.io/synchmr

--------------------------------------------------------------------------------------------------------

A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Code completion tools can greatly boost developer productivity, but knowing when to invoke them is critical to avoid disruptions. This work develops a machine learning model to accurately predict ideal invocation timing based on code context and telemetry data, deployed in a real plugin. Such context-aware invocation could make code completion maximally effective.

Authors:  Aral de Moor, Arie van Deursen, Maliheh Izadi

Link:  https://arxiv.org/abs/2405.14753v1

Date: 2024-05-23

Summary:

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be intrusive, especially when they suggest too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and neglects to address when a developer should receive completion suggestions. To tackle this issue, we developed a machine learning model that can accurately predict when to invoke a code completion tool given the code context and available telemetry data.   To do so, we collect a dataset of 200k developer interactions with our cross-IDE code completion plugin and train several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while maintaining low enough latency. We further explore the search space for integrating additional telemetry data into a pre-trained transformer directly and obtain promising results. To further demonstrate our approach's practical potential, we deployed the model in an online environment with 34 developers and provided real-world insights based on 74k actual invocations.
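
As a rough illustration of the invocation-filtering idea, the sketch below trains a tiny classifier that decides whether to trigger a completion request from context and telemetry features. The feature set, threshold, and use of logistic regression are illustrative assumptions; the paper's filter is a small transformer trained on 200k real interactions.

    # Hypothetical invocation filter: decide whether to request a completion
    # given lightweight context/telemetry features (feature choices are made up).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Features per editing event: [ms_since_last_keystroke, cursor_at_line_end,
    #                              chars_in_current_token, last_completion_accepted]
    X = np.array([
        [120, 1, 3, 1],
        [ 30, 0, 1, 0],
        [500, 1, 0, 1],
        [ 15, 0, 4, 0],
        [800, 1, 2, 1],
        [ 40, 0, 5, 0],
    ])
    y = np.array([1, 0, 1, 0, 1, 0])  # 1 = invoking the tool was useful here

    clf = LogisticRegression().fit(X, y)

    def should_invoke(features, threshold=0.5):
        """Trigger the completion request only when predicted usefulness is high."""
        return clf.predict_proba(np.array([features]))[0, 1] >= threshold

    print(should_invoke([200, 1, 2, 1]))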

--------------------------------------------------------------------------------------------------------

Distributed Speculative Inference of Large Language Models

Large language model (LLM) inference is computationally expensive. Distributed speculative inference accelerates LLM inference without architecture changes by orchestrating multiple instances of the model and drafters in parallel. This could make LLM applications like chatbots, writing assistants, and code generation faster and more practical.

Authors:  Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel

Link:  https://arxiv.org/abs/2405.14105v1

Date: 2024-05-23

Summary:

Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) [leviathan2023fast, chen2023accelerating, miao2023specinfer] and traditional autoregressive inference (non-SI). Like other SI algorithms, DSI works on frozen LLMs, requiring no training or architectural modifications, and it preserves the target distribution.   Prior studies on SI have demonstrated empirical speedups (compared to non-SI) but require a fast and accurate drafter LLM. In practice, off-the-shelf LLMs often do not have matching drafters that are sufficiently fast and accurate. We show a gap: SI gets slower than non-SI when using slower or less accurate drafters. We close this gap by proving that DSI is faster than both SI and non-SI given any drafters. By orchestrating multiple instances of the target and drafters, DSI is not only faster than SI but also supports LLMs that cannot be accelerated with SI.   Our simulations show speedups of off-the-shelf LLMs in realistic settings: DSI is 1.29-1.92x faster than SI.
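
For readers unfamiliar with the speculative-inference loop that DSI builds on, the toy sketch below shows a single-threaded version with stand-in "models": a cheap drafter proposes a block of tokens and the target verifies them, emitting its own token at the first disagreement. DSI's contribution, not shown here, is orchestrating several target and drafter instances in parallel; the functions below are hypothetical placeholders, not the paper's implementation.

    def drafter_next(prefix):
        # Hypothetical fast drafter: guesses the next token.
        return (sum(prefix) + 1) % 5

    def target_next(prefix):
        # Hypothetical slow target model whose distribution must be preserved.
        return (sum(prefix) * 3 + 1) % 5

    def speculative_generate(prompt, new_tokens=10, lookahead=4):
        seq = list(prompt)
        while len(seq) < len(prompt) + new_tokens:
            # 1) Drafter cheaply proposes a block of `lookahead` tokens.
            draft, ctx = [], list(seq)
            for _ in range(lookahead):
                t = drafter_next(ctx)
                draft.append(t)
                ctx.append(t)
            # 2) Target verifies; keep the longest agreeing prefix, then emit
            #    the target's own token at the first disagreement.
            for t in draft:
                if target_next(seq) == t:
                    seq.append(t)                  # accepted draft token
                else:
                    seq.append(target_next(seq))   # target's correction
                    break
            else:
                seq.append(target_next(seq))       # bonus token after a full accept
        return seq[:len(prompt) + new_tokens]

    print(speculative_generate([1, 2], new_tokens=8))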

--------------------------------------------------------------------------------------------------------

Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning

Deep neural networks are prone to catastrophic forgetting when continually learning new tasks. AGILE is a new method that reduces interference between tasks through compact task attention, improving generalization across many sequential tasks. This could enable AI systems that remain plastic and accumulate knowledge over time.

Authors:  Prashant Bhat, Bharath Renjith, Elahe Arani, Bahram Zonooz

Link:  https://arxiv.org/abs/2405.13978v1

Date: 2024-05-22

Summary:

Continual learning (CL) remains a significant challenge for deep neural networks, as it is prone to forgetting previously acquired knowledge. Several approaches have been proposed in the literature, such as experience rehearsal, regularization, and parameter isolation, to address this problem. Although almost zero forgetting can be achieved in task-incremental learning, class-incremental learning remains highly challenging due to the problem of inter-task class separation. Limited access to previous task data makes it difficult to discriminate between classes of current and previous tasks. To address this issue, we propose `Attention-Guided Incremental Learning' (AGILE), a novel rehearsal-based CL approach that incorporates compact task attention to effectively reduce interference between tasks. AGILE utilizes lightweight, learnable task projection vectors to transform the latent representations of a shared task attention module toward the task distribution. Through extensive empirical evaluation, we show that AGILE significantly improves generalization performance by mitigating task interference, outperforming rehearsal-based approaches in several CL scenarios. Furthermore, AGILE can scale well to a large number of tasks with minimal overhead while remaining well-calibrated with reduced task-recency bias.

--------------------------------------------------------------------------------------------------------

Multi-Dataset Multi-Task Learning for COVID-19 Prognosis

Predicting COVID-19 outcomes from chest X-rays is a valuable but data-constrained task. This multi-dataset multi-task approach integrates datasets assessing severity scores and prognostic groups, boosting performance over single-task baselines across many models. Such multi-dataset techniques could enhance medical AI decision support systems.

Authors:  Filippo Ruffini, Lorenzo Tronchin, Zhuoru Wu, Wenting Chen, Paolo Soda, Linlin Shen, Valerio Guarrasi

Link:  https://arxiv.org/abs/2405.13771v1

Date: 2024-05-22

Summary:

In the fight against the COVID-19 pandemic, leveraging artificial intelligence to predict disease outcomes from chest radiographic images represents a significant scientific aim. The challenge, however, lies in the scarcity of large, labeled datasets with compatible tasks for training deep learning models without leading to overfitting. Addressing this issue, we introduce a novel multi-dataset multi-task training framework that predicts COVID-19 prognostic outcomes from chest X-rays (CXR) by integrating correlated datasets from disparate sources, departing from conventional multi-task learning approaches, which rely on datasets with multiple and correlated labeling schemes. Our framework hypothesizes that assessing severity scores enhances the model's ability to classify prognostic severity groups, thereby improving its robustness and predictive power. The proposed architecture comprises a deep convolutional network that receives inputs from two publicly available CXR datasets, AIforCOVID for severity prognostic prediction and BRIXIA for severity score assessment, and branches into task-specific fully connected output networks. Moreover, we propose a multi-task loss function, incorporating an indicator function, to exploit multi-dataset integration. The effectiveness and robustness of the proposed approach are demonstrated through significant performance improvements in prognosis classification tasks across 18 different convolutional neural network backbones in different evaluation strategies. This improvement is evident over single-task baselines and standard transfer learning strategies, supported by extensive statistical analysis, showing great application potential.
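
The indicator-function loss can be pictured as masking each task's term so a sample only contributes to the head its dataset labels. The sketch below is a minimal PyTorch rendition of that idea under assumed names and shapes, not the authors' exact loss or architecture.

    import torch
    import torch.nn.functional as F

    def multi_dataset_loss(prognosis_logits, severity_pred,
                           prognosis_target, severity_target, task_id):
        """task_id: 0 = prognosis dataset (e.g. AIforCOVID),
                    1 = severity-score dataset (e.g. BRIXIA)."""
        is_prognosis = (task_id == 0).float()
        is_severity = (task_id == 1).float()

        # Per-sample losses for each task-specific head.
        loss_prog = F.cross_entropy(prognosis_logits, prognosis_target, reduction="none")
        loss_sev = F.mse_loss(severity_pred.squeeze(-1), severity_target, reduction="none")

        # The indicator keeps only the term whose labels exist for each sample.
        return (is_prognosis * loss_prog + is_severity * loss_sev).mean()

    # Toy batch mixing samples from the two datasets.
    logits = torch.randn(4, 3)                  # 3 prognostic classes
    sev = torch.randn(4, 1)                     # regressed severity score
    prog_t = torch.tensor([0, 2, 0, 1])         # dummy labels (ignored where masked)
    sev_t = torch.tensor([1.0, 3.0, 2.0, 0.5])
    task = torch.tensor([0, 1, 0, 1])
    print(multi_dataset_loss(logits, sev, prog_t, sev_t, task))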

--------------------------------------------------------------------------------------------------------

Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust

Generating high-quality scientific code across many languages is challenging for current AI models like ChatGPT. This work evaluates their code generation capabilities across integration, optimization, and parallel problems, revealing promising but uneven performance. Continued research could lead to powerful AI coding assistants.

Authors:  Patrick Diehl, Noujoud Nader, Steve Brandt, Hartmut Kaiser

Link:  https://arxiv.org/abs/2405.13101v1

Date: 2024-05-21

Summary:

This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages. Our objective is to assess the effectiveness of these AI models for generating scientific programs. To this end, we asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver. The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes. While both versions of ChatGPT successfully created codes that compiled and ran (with some help), some languages were easier for the AI to use than others (possibly because of the size of the training sets used). Parallel codes -- even the simple example we chose to study here -- were also difficult for the AI to generate correctly.
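
For context, the snippet below is the kind of program the study asks the models to produce for the first task (a simple numerical integration, here via the trapezoidal rule); it is an illustrative example, not one of the paper's prompts or generated solutions.

    import math

    def trapezoid(f, a, b, n=1000):
        """Approximate the integral of f over [a, b] with n trapezoids."""
        h = (b - a) / n
        total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
        return h * total

    print(trapezoid(math.sin, 0.0, math.pi))   # ~2.0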

--------------------------------------------------------------------------------------------------------

PILOT: Equivariant diffusion for pocket conditioned de novo ligand generation with multi-objective guidance via importance sampling

De novo molecular design for drug discovery is difficult, requiring optimizing for binding affinity and synthesizability. PILOT is an equivariant diffusion model that generates 3D ligand structures conditioned on protein pockets while guiding towards desired chemical properties via importance sampling, outperforming prior methods.

Authors:  Julian Cremer, Tuan Le, Frank Noé, Djork-Arné Clevert, Kristof T. Schütt

Link:  https://arxiv.org/abs/2405.14925v1

Date: 2024-05-23

Summary:

The generation of ligands that are both tailored to a given protein pocket and exhibit a range of desired chemical properties is a major challenge in structure-based drug design. Here, we propose an in-silico approach for the $\textit{de novo}$ generation of 3D ligand structures using the equivariant diffusion model PILOT, combining pocket conditioning with large-scale pre-training and property guidance. Its multi-objective trajectory-based importance sampling strategy is designed to direct the model towards molecules that not only exhibit desired characteristics such as increased binding affinity for a given protein pocket but also maintain high synthetic accessibility. This ensures the practicality of sampled molecules, thus maximizing their potential for the drug discovery pipeline. PILOT significantly outperforms existing methods across various metrics on the common benchmark dataset CrossDocked2020. Moreover, we employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset, which encompasses a substantial portion of the human kinome. The generated structures exhibit predicted $IC_{50}$ values indicative of potent biological activity, which highlights the potential of PILOT as a powerful tool for structure-based drug design.
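
The trajectory-based importance sampling can be summarized as: run many reverse-diffusion trajectories in parallel, periodically score the partial samples against the desired properties, and resample the population with weights proportional to those scores. The sketch below illustrates that resampling loop with stand-in scorers and a placeholder denoising step; none of it is PILOT's actual equivariant model or property predictors.

    import numpy as np

    rng = np.random.default_rng(0)

    def denoise_step(x, t):
        # Placeholder for one reverse-diffusion step on a batch of candidates.
        return x + 0.1 * rng.normal(size=x.shape) / (t + 1)

    def affinity_score(x):   # stand-in for predicted binding affinity (higher = better)
        return -np.linalg.norm(x - 1.0, axis=1)

    def synth_score(x):      # stand-in for synthetic accessibility (higher = better)
        return -np.linalg.norm(x, axis=1)

    def guided_sample(n=64, dim=8, steps=50, resample_every=10, temp=1.0):
        x = rng.normal(size=(n, dim))
        for t in reversed(range(steps)):
            x = denoise_step(x, t)
            if t % resample_every == 0 and t > 0:
                # Multi-objective importance weights: promising partial trajectories
                # are duplicated, poor ones are dropped.
                score = affinity_score(x) + synth_score(x)
                w = np.exp((score - score.max()) / temp)
                w /= w.sum()
                x = x[rng.choice(n, size=n, p=w)]
        return x

    print(guided_sample().shape)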

--------------------------------------------------------------------------------------------------------

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

The Go-Explore algorithm has achieved impressive results in hard exploration problems, but requires manually designed heuristics. Intelligent Go-Explore replaces heuristics with the general intelligence of large language models, allowing for open-ended exploration driven by instincts about interestingness. This could lead to more general and capable exploratory agents.

Authors:  Cong Lu, Shengran Hu, Jeff Clune

Link:  https://arxiv.org/abs/2405.15143v1

Date: 2024-05-24

Summary:

Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration, which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these heuristics with the intelligence and internalized human notions of interestingness captured by giant foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g. discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting and previously impossible opportunity to recognize and capitalize on serendipitous discoveries that cannot be predicted ahead of time. We evaluate IGE on a range of language-based tasks that require search and exploration. In Game of 24, a multistep mathematical reasoning problem, IGE reaches 100% success rate 70.8% faster than the best classic graph search baseline. Next, in BabyAI-Text, a challenging partially observable gridworld, IGE exceeds the previous SOTA with orders of magnitude fewer online samples. Finally, in TextWorld, we show the unique ability of IGE to succeed in settings requiring long-horizon exploration where prior SOTA FM agents like Reflexion completely fail. Overall, IGE combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
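
In outline, IGE keeps the classic Go-Explore loop (archive states, return to a promising one, explore from it) but asks a foundation model which archived state is most interesting. The sketch below shows that loop on a toy number-line environment with a stub scoring function standing in for the FM call; the environment, scoring rule, and action set are all invented for illustration.

    import random

    random.seed(0)

    def fm_pick_most_interesting(states):
        # Stand-in for prompting a foundation model: "which archived state looks
        # most promising to explore from?" Here: prefer far-out, rarely visited states.
        return max(states, key=lambda s: abs(s[0]) - 0.5 * s[1])  # (state, visit count)

    def step(state, action):
        return state + action  # toy dynamics

    archive = {0: 0}   # state -> visit count
    goal = 7

    for _ in range(200):
        state, visits = fm_pick_most_interesting(list(archive.items()))
        archive[state] = visits + 1
        for action in (-1, 1, 2):           # explore a few actions from that state
            new_state = step(state, action)
            if new_state not in archive:    # archive every newly discovered state
                archive[new_state] = 0
        if goal in archive:
            break

    print("goal discovered:", goal in archive, "| archive size:", len(archive))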

--------------------------------------------------------------------------------------------------------

Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation

Neural network quantization can increase efficiency but degrade accuracy. This work introduces a 2D output representation mapping targets to a parametric curve, demonstrating substantially reduced quantization error on depth estimation and vision transformer tasks with minimal latency increase. Such techniques could enable deploying larger models on low-power devices.

Authors:  Mykhailo Uss, Ruslan Yermolenko, Olena Kolodiazhna, Oleksii Shashko, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong

Link:  https://arxiv.org/abs/2405.14024v1

Date: 2024-05-22

Summary:

Quantization is widely used to increase deep neural networks' (DNN) memory, computation, and power efficiency. Various techniques, such as post-training quantization and quantization-aware training, have been proposed to improve quantization quality. We introduce a novel approach for DNN quantization that uses a redundant representation of the DNN's output. We represent the target quantity as a point on a 2D parametric curve. The DNN model is modified to predict 2D points that are mapped back to the target quantity at a post-processing stage. We demonstrate that this mapping can reduce quantization error. For the low-order parametric Hilbert curve, the Depth-From-Stereo task, and two models represented by a U-Net architecture and a vision transformer, we achieved a quantization error reduction of about 5 times for the INT8 model on both CPU and DSP delegates. This gain comes with a minimal inference time increase (less than 7%). Our approach can be applied to other tasks, including segmentation, object detection, and key-point prediction.
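
The core trick can be reproduced in a few lines: encode the scalar target as a point on a 2D parametric curve, have the model predict the 2D point, and decode by projecting back onto the curve, so a given amount of output noise translates into a smaller error in the recovered scalar. The sketch below uses a simple zigzag curve as a stand-in for the paper's low-order Hilbert curve, with added noise simulating quantization error.

    import numpy as np

    # Densely sample a parametric curve c(t), t in [0, 1] (zigzag in the unit square).
    T = np.linspace(0.0, 1.0, 2001)
    CURVE = np.stack([T, np.abs((4 * T) % 2 - 1)], axis=1)

    def encode(t):
        """Scalar target -> 2D point on the curve (what the model learns to output)."""
        return CURVE[int(round(t * (len(T) - 1)))]

    def decode(p):
        """Noisy 2D prediction -> scalar, via nearest point on the curve."""
        return T[np.argmin(np.linalg.norm(CURVE - p, axis=1))]

    rng = np.random.default_rng(1)
    targets = rng.uniform(0, 1, size=2000)
    noise = 0.01
    err_2d = np.abs([decode(encode(t) + rng.normal(0, noise, 2)) - t for t in targets]).mean()
    err_1d = np.abs(rng.normal(0, noise, targets.shape)).mean()
    print(f"mean error, 2D curve decode: {err_2d:.4f} | direct scalar: {err_1d:.4f}")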

--------------------------------------------------------------------------------------------------------

MOSS: A Large-scale Open Microscopic Traffic Simulation System

Large-scale microscopic traffic simulation is valuable for transportation research but faces scalability and realism challenges. MOSS is an open GPU-accelerated system that enables large-scale city-level simulation with realistic travel demand synthesized from open data, offering a powerful public toolchain for simulating future smart city scenarios.

Authors:  Jun Zhang, Wenxuan Ao, Junbo Yan, Can Rong, Depeng Jin, Wei Wu, Yong Li

Link:  https://arxiv.org/abs/2405.12520v1

Date: 2024-05-21

Summary:

In the research of Intelligent Transportation Systems (ITS), traffic simulation is a key procedure for the evaluation of new methods and optimization of strategies. However, existing traffic simulation systems face two challenges. First, how to balance simulation scale with realism is a dilemma. Second, it is hard to produce realistic results, which requires both realistic travel demand data and a realistic simulator. These problems limit computer-aided optimization of traffic management strategies for large-scale road networks and reduce the usability of traffic simulations in areas where real-world travel demand data are lacking. To address these problems, we design and implement MObility Simulation System (MOSS). MOSS adopts GPU acceleration to significantly improve the efficiency and scale of microscopic traffic simulation, enabling realistic and fast simulations for large-scale road networks. It provides realistic travel Origin-Destination (OD) matrix generation through a pre-trained generative neural network model based on publicly available data on a global scale, such as satellite imagery, to help researchers build meaningful travel demand data. It also provides a complete open toolchain to help users with road network construction, demand generation, simulation, and result analysis. The whole toolchain, including the simulator, can be accessed at https://moss.fiblab.net, and the code is open-source for community collaboration.

--------------------------------------------------------------------------------------------------------

Bangladeshi Native Vehicle Detection in Wild

Accurate vehicle detection is crucial for autonomous navigation, but existing datasets often lack regional diversity. This work introduces the Bangladesh Native Vehicle Dataset (BNVD) containing over 80,000 annotated instances across 17 vehicle classes commonly found in Bangladesh, accounting for variations in geography, illumination, size, and orientation. Evaluations with four YOLO detectors achieve strong performance while revealing the dataset's considerable complexity, making it valuable for developing context-aware vehicle detection systems tailored to regional environments.

Authors:  Bipin Saha, Md. Johirul Islam, Shaikh Khaled Mostaque, Aditya Bhowmik, Tapodhir Karmakar Taton, Md. Nakib Hayat Chowdhury, Mamun Bin Ibne Reaz

Link:  https://arxiv.org/abs/2405.12150v1

Date: 2024-05-20

Summary:

The success of autonomous navigation relies on robust and precise vehicle recognition, which is hindered by the scarcity of region-specific vehicle detection datasets, impeding the development of context-aware systems. To advance terrestrial object detection research, this paper proposes a native vehicle detection dataset for the vehicle classes most commonly seen in Bangladesh. It covers 17 distinct vehicle classes, with 81,542 fully annotated instances across 17,326 images. Each image width is at least 1280px, and the dataset's average vehicle bounding box-to-image ratio is 4.7036. The Bangladesh Native Vehicle Dataset (BNVD) accounts for variations in geography, illumination, vehicle size, and orientation to be more robust to unexpected scenarios. This work provides a thorough assessment of BNVD with four successive You Only Look Once (YOLO) models, namely YOLO v5, v6, v7, and v8. The dataset's effectiveness is methodically evaluated and contrasted with other vehicle datasets already in use. BNVD exhibits a mean average precision (mAP) of 0.848 at 50% intersection over union (IoU), with corresponding precision and recall values of 0.841 and 0.774, and a mAP of 0.643 over the IoU range 0.5 to 0.95. The experiments show that BNVD serves as a reliable representation of vehicle distribution and presents considerable complexities.

--------------------------------------------------------------------------------------------------------

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

As generative video models become more powerful, controlling their output is an increasing focus. CamViG extends multimodal transformers to enable 3D camera control during video generation by conditioning on encoded camera motion signals. This allows generating coherent videos from just a single frame while manipulating the virtual camera path, opening up applications in areas like computer graphics and virtual/augmented reality.

Authors:  Andrew Marmon, Grant Schindler, José Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa

Link:  https://arxiv.org/abs/2405.13195v1

Date: 2024-05-21

Summary:

We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling the output of such models. We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement over the course of the generated video. Results demonstrate that (1) we are able to successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) the generated 3D camera paths are accurate, as measured with traditional computer vision methods.
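
A generic way to picture this kind of conditioning: encode per-frame camera motion into embedding tokens and let the video tokens attend to them in the transformer. The sketch below is that generic pattern under assumed dimensions and modules; it is not CamViG's actual architecture or camera encoding.

    import torch
    import torch.nn as nn

    d_model, n_frames = 256, 8
    camera_encoder = nn.Linear(6, d_model)   # per-frame 6-DoF motion -> one token
    transformer = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        num_layers=2,
    )

    frame_tokens = torch.randn(1, n_frames * 16, d_model)   # stand-in for video tokens
    camera_motion = torch.randn(1, n_frames, 6)             # e.g. translation + rotation per frame
    camera_tokens = camera_encoder(camera_motion)

    # Condition generation by letting video tokens attend to the camera tokens.
    out = transformer(torch.cat([camera_tokens, frame_tokens], dim=1))
    print(out.shape)   # (1, n_frames + n_frames * 16, d_model)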

--------------------------------------------------------------------------------------------------------

From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

Generating explicit chain-of-thought (CoT) reasoning steps often improves language model performance on complex tasks, but could be simplified. This work proposes iteratively fine-tuning a CoT-trained model while removing intermediate reasoning steps, allowing it to internalize the step-by-step logic. Using this approach, even small models achieved high accuracy on challenging arithmetic tasks without producing any explicit reasoning, streamlining the inference process.

Authors:  Yuntian Deng, Yejin Choi, Stuart Shieber

Link:  https://arxiv.org/abs/2405.14838v1

Date: 2024-05-23

Summary:

When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs. In this paper, we investigate if models can be taught to internalize these CoT steps. To this end, we propose a simple yet effective method for internalizing CoT steps: starting with a model trained for explicit CoT reasoning, we gradually remove the intermediate steps and finetune the model. This process allows the model to internalize the intermediate reasoning steps, thus simplifying the reasoning process while maintaining high performance. Our approach enables a GPT-2 Small model to solve 9-by-9 multiplication with up to 99% accuracy, whereas standard training cannot solve beyond 4-by-4 multiplication. Furthermore, our method proves effective on larger language models, such as Mistral 7B, achieving over 50% accuracy on GSM8K without producing any intermediate steps.
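
The internalization recipe itself is easy to picture: at each finetuning stage, strip one more leading CoT step from the training targets and continue training from the previous stage's weights, until only the answer remains. The sketch below shows that curriculum over a toy example; `finetune` is a stub, and the data formatting is an assumption rather than the paper's exact setup.

    def build_target(example, n_steps_removed):
        """Keep the question, drop the first n CoT steps, keep the answer."""
        remaining = example["cot_steps"][n_steps_removed:]
        return example["question"] + " " + " ".join(remaining + [example["answer"]])

    def finetune(model, targets):
        # Placeholder: supervised finetuning on `targets`, starting from the
        # previous stage's weights.
        return model

    data = [{"question": "What is 12 * 34?",
             "cot_steps": ["12 * 30 = 360", "12 * 4 = 48", "360 + 48 = 408"],
             "answer": "408"}]

    model = "model trained with explicit CoT"          # stage 0
    max_steps = max(len(ex["cot_steps"]) for ex in data)
    for k in range(1, max_steps + 1):                  # remove one more step per stage
        targets = [build_target(ex, k) for ex in data]
        model = finetune(model, targets)
        print(f"stage {k}: target = {targets[0]!r}")
    # After the final stage, the model is trained to emit the answer directly.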

--------------------------------------------------------------------------------------------------------

Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition

Integrating large language models (LLMs) into multimodal recognition tasks like speech and text is challenging due to mismatched output spaces. Generative Fusion Decoding (GFD) enables seamless fusion by mapping to a shared byte token space, leveraging LLM capabilities like open-ended context learning to boost performance on benchmarks while offering a unified solution across modalities.

Authors:  Chan-Jan Hsu, Yi-Chang Chen, Feng-Ting Liao, Pei-Chen Ho, Yu-Hsiang Wang, Po-Chun Hsu, Da-shan Shiu

Link:  https://arxiv.org/abs/2405.14259v1

Date: 2024-05-23

Summary:

We introduce "Generative Fusion Decoding" (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). We derive the formulas necessary to enable GFD to operate across mismatched token spaces of different models by mapping text token space to byte token space, enabling seamless fusion during the decoding process. The framework is plug-and-play, compatible with various auto-regressive models, and does not require re-training for feature alignment, thus overcoming limitations of previous fusion techniques. We highlight three main advantages of GFD: First, by simplifying the complexity of aligning different model sample spaces, GFD allows LLMs to correct errors in tandem with the recognition model, reducing computation latencies. Second, the in-context learning ability of LLMs is fully exploited by GFD, increasing robustness in long-form speech recognition and instruction-aware speech recognition. Third, GFD enables fusing recognition models deficient in Chinese text recognition with LLMs extensively trained on Chinese. Our evaluation demonstrates that GFD significantly improves performance in ASR and OCR tasks, with ASR reaching state-of-the-art on the NTUML2021 benchmark. GFD provides a significant step forward in model integration, offering a unified solution that could be widely applicable to leveraging existing pre-trained models through step-by-step fusion.
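
The essence of the byte-space fusion is that both models score the same candidate byte strings, so their mismatched tokenizers stop mattering and a weighted sum of log-probabilities suffices. The toy sketch below shows that shallow-fusion scoring with two stub scorers; the real framework derives the byte-level probabilities from each model's own token distributions.

    def recognizer_logprob(byte_seq: bytes) -> float:
        # Stand-in for the ASR/OCR model's log-probability of this byte sequence.
        return -0.1 * len(byte_seq) - (0.0 if byte_seq.startswith(b"recognize speech") else 5.0)

    def llm_logprob(byte_seq: bytes) -> float:
        # Stand-in for the LLM's log-probability (its text tokens mapped down to bytes).
        return -0.1 * len(byte_seq) - (0.0 if b"speech" in byte_seq else 3.0)

    def fused_score(byte_seq: bytes, lam: float = 0.3) -> float:
        # Shallow fusion: recognizer score plus a weighted LLM score.
        return recognizer_logprob(byte_seq) + lam * llm_logprob(byte_seq)

    candidates = [b"recognize speech", b"wreck a nice beach"]
    best = max(candidates, key=fused_score)
    print(best.decode(), {c.decode(): round(fused_score(c), 2) for c in candidates})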

--------------------------------------------------------------------------------------------------------

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

Vision-language models trained on English data can exhibit biases against other cultures and socioeconomic statuses. This work highlights how pretraining on unfiltered global data before English fine-tuning improves cross-cultural understanding without compromising performance on popular benchmarks. It also introduces geo-localization as an evaluation for assessing cultural diversity, underscoring the importance of using diverse data.

Authors:  Angéline Pouget, Lucas Beyer, Emanuele Bugliarello, Xiao Wang, Andreas Peter Steiner, Xiaohua Zhai, Ibrahim Alabdulmohsin

Link:  https://arxiv.org/abs/2405.13777v2

Date: 2024-05-24

Summary:

We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by - and even at odds with - the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.

--------------------------------------------------------------------------------------------------------

Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents

Though powerful, open-domain dialogue models can generate unsafe or toxic responses. Adversarial DPO enhances model resilience against harmful conversations via a new training algorithm that up-weights preferred responses and down-weights toxic ones generated using control tokens. This approach stabilizes training while reducing the need for artificial safe data, paving the way for more robust conversational AI assistants.

Authors:  San Kim, Gary Geunbae Lee

Link:  https://arxiv.org/abs/2405.12900v1

Date: 2024-05-21

Summary:

Recent advancements in open-domain dialogue systems have been propelled by the emergence of high-quality large language models (LLMs) and various effective training methodologies. Nevertheless, the presence of toxicity within these models presents a significant challenge that can potentially diminish the user experience. In this study, we introduce an innovative training algorithm, an improvement upon direct preference optimization (DPO), called adversarial DPO (ADPO). The ADPO algorithm is designed to train models to assign higher probability distributions to preferred responses and lower distributions to unsafe responses, which are self-generated using the toxic control token. We demonstrate that ADPO enhances the model's resilience against harmful conversations while minimizing performance degradation. Furthermore, we illustrate that ADPO offers a more stable training procedure compared to the traditional DPO. To the best of our knowledge, this is the first adaptation of the DPO algorithm that directly incorporates harmful data into the generative model, thereby reducing the need to artificially create safe dialogue data.

--------------------------------------------------------------------------------------------------------

Contactless Polysomnography: What Radio Waves Tell Us about Sleep

Sleep monitoring is crucial for understanding disease interactions, but current methods are obtrusive. This work proposes using radio waves to passively capture sleep stages and breathing patterns by developing an ML model analyzing reflected signals. Validations demonstrate high accuracy in detecting sleep stages, apnea, and revealing disease connections, promising applications in clinical trials and routine care.

Authors:  Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi

Link:  https://arxiv.org/abs/2405.11739v1

Date: 2024-05-20

Summary:

The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therapeutic responses, both in clinical trials and routine care. In this article, we develop an advanced machine learning algorithm for passively monitoring sleep and nocturnal breathing from radio waves reflected off people while asleep. Validation results in comparison with the gold standard (i.e., polysomnography) (n=849) demonstrate that the model captures the sleep hypnogram (with an accuracy of 81% for 30-second epochs categorized into Wake, Light Sleep, Deep Sleep, or REM), detects sleep apnea (AUROC = 0.88), and measures the patient's Apnea-Hypopnea Index (ICC=0.95; 95% CI = [0.93, 0.97]). Notably, the model exhibits equitable performance across race, sex, and age. Moreover, the model uncovers informative interactions between sleep stages and a range of diseases including neurological, psychiatric, cardiovascular, and immunological disorders. These findings not only hold promise for clinical practice and interventional trials but also underscore the significance of sleep as a fundamental component in understanding and managing various diseases.

--------------------------------------------------------------------------------------------------------

The Road Less Scheduled

Existing learning rate schedules require specifying the stopping step, limiting their generality. The Schedule-Free approach avoids this by dynamically optimizing without schedules, introducing no extra hyperparameters yet achieving state-of-the-art performance across convex optimization and deep learning tasks. This enables simpler, more effective training for a wide range of problems.

Authors:  Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

Link:  https://arxiv.org/abs/2405.15682v1

Date: 2024-05-24

Summary:

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available (https://github.com/facebookresearch/schedule_free).
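
The recurrence behind the method can be sketched in a few lines: take gradients at an interpolation between a fast SGD iterate and a running average, and use the average as the point that is actually evaluated, with no schedule or stopping time anywhere. The toy below applies that recurrence to a simple quadratic; the constants are illustrative, and for real training the authors' open-source implementation linked above is the reference.

    import numpy as np

    w_star = np.array([3.0, -2.0])
    def grad(w):                      # toy objective: f(w) = 0.5 * ||w - w_star||^2
        return w - w_star

    z = np.zeros(2)                   # "fast" iterate updated by SGD steps
    x = np.zeros(2)                   # averaged iterate used for evaluation
    lr, beta = 0.1, 0.9

    for t in range(1, 1001):
        y = (1 - beta) * z + beta * x     # gradient is taken at an interpolation point
        z = z - lr * grad(y)              # plain SGD step on z
        c = 1.0 / t                       # uniform averaging weight
        x = (1 - c) * x + c * z           # online average of the z iterates

    print("x after 1000 steps:", np.round(x, 4))   # approaches w_star without any schedule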

--------------------------------------------------------------------------------------------------------

AI-Protected Blockchain-based IoT environments: Harnessing the Future of Network Security and Privacy

Integrating blockchain with IoT networks has immense potential for enhancing security and privacy via decentralized, tamper-proof device management. This work explores harnessing AI to autonomously detect threats, optimize protocols, and protect user data anonymity alongside blockchain's capabilities, paving the way for more resilient, intelligent IoT systems in critical infrastructure.

Authors:  Ali Mohammadi Ruzbahani

Link:  https://arxiv.org/abs/2405.13847v1

Date: 2024-05-22

Summary:

Integrating blockchain technology with the Internet of Things offers transformative possibilities for enhancing network security and privacy in the contemporary digital landscape, where interconnected devices and expansive networks are ubiquitous. This paper explores the pivotal role of artificial intelligence in bolstering blockchain-enabled IoT systems, potentially marking a significant leap forward in safeguarding data integrity and confidentiality across networks. Blockchain technology provides a decentralized and immutable ledger, ideal for the secure management of device identities and transactions in IoT networks. When coupled with AI, these systems gain the ability to not only automate and optimize security protocols but also adaptively respond to new and evolving cyber threats. This dual capability enhances the resilience of networks against cyber-attacks, a critical consideration as IoT devices increasingly permeate critical infrastructures. The synergy between AI and blockchain in IoT is profound. AI algorithms can analyze vast amounts of data from IoT devices to detect patterns and anomalies that may signify security breaches. Concurrently, blockchain can ensure that data records are tamper-proof, enhancing the reliability of AI-driven security measures. Moreover, this research evaluates the implications of AI-enhanced blockchain systems on privacy protection within IoT networks. IoT devices often collect sensitive personal data, making privacy a paramount concern. AI can facilitate the development of new protocols that ensure data privacy and user anonymity without compromising the functionality of IoT systems. Through comprehensive analysis and case studies, this paper aims to provide an in-depth understanding of how AI-enhanced blockchain technology can revolutionize network security and privacy in IoT environments.

--------------------------------------------------------------------------------------------------------

Generative AI: The power of the new education

Generative AI will profoundly impact education, necessitating proactive integration across all subjects beyond just AI-related fields. This study proposes an accelerated, generation-focused learning methodology that not only builds technical skills but also understanding of AI ethics, risks, and applications. By deeply examining student perspectives, it aims to guide educators in effectively preparing the next generation.

Authors:  Sergio Altares-López, José M. Bengochea-Guevara, Carlos Ranz, Héctor Montes, Angela Ribeiro

Link:  https://arxiv.org/abs/2405.13487v1

Date: 2024-05-22

Summary:

The effective integration of generative artificial intelligence in education is a fundamental aspect to prepare future generations. This study proposes an accelerated learning methodology in artificial intelligence, focused on its generative capacity, as a way to achieve this goal. It recognizes the challenge of getting teachers to engage with new technologies and adapt their methods in all subjects, not just those related to AI. This methodology not only promotes interest in science, technology, engineering and mathematics, but also facilitates student understanding of the ethical uses and risks associated with AI. Students' perceptions of generative AI are examined, addressing their emotions towards its evolution, evaluation of its ethical implications, and everyday use of AI tools. In addition, AI applications commonly used by students and their integration into other disciplines are investigated. The study aims to provide educators with a deeper understanding of students' perceptions of AI and its relevance in society and in their future career paths.

--------------------------------------------------------------------------------------------------------

